Initial Agent OS scaffolding — identity, brain, memory, infra-monitor skill
@@ -0,0 +1,58 @@
# Skill: infra-monitor

Monitors server health and watches all Agent OS skills for staleness or errors. Runs on a cron schedule on 172.27.40.3.

## Inputs

Reads before executing:
- `../../identity.md`
- `../../brain.md`
- `../../memory/persistent.md`
- `learnings.md` (this skill's improvement notes)

## What to check

### Docker health (on 172.27.40.3)
- All expected containers are running (not exited/restarting)
- Flag any container that has restarted more than 3 times in the last hour
- Expected containers: portainer, nginx-proxy-manager, uptime-kuma, gitea, headscale, netbird, vaultwarden, flowise, plane, zabbix, homarr
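
The Docker rules above could be sketched as a pure function like the one below. This is a minimal sketch, not the skill's actual implementation: the names `check_docker`, `states`, and `restart_times` are hypothetical, the caller is assumed to have already collected container states (for example from `docker ps -a`) and restart timestamps, and the mapping of findings to `critical` and `warning` is an assumption, since the source only says to flag them.

```python
from datetime import datetime, timedelta

# Container names taken from the list above.
EXPECTED_CONTAINERS = [
    "portainer", "nginx-proxy-manager", "uptime-kuma", "gitea", "headscale",
    "netbird", "vaultwarden", "flowise", "plane", "zabbix", "homarr",
]


def check_docker(states, restart_times, now,
                 max_restarts=3, window=timedelta(hours=1)):
    """Return {container: (level, detail)} for every expected container.

    states:        {name: state string such as "running" or "exited"}
    restart_times: {name: [datetime of each observed restart]}
    Both inputs are hypothetical shapes chosen for this sketch.
    """
    findings = {}
    for name in EXPECTED_CONTAINERS:
        state = states.get(name)
        if state is None:
            findings[name] = ("critical", "container not found")
        elif state != "running":
            findings[name] = ("critical", f"state is {state}")
        else:
            # Count only restarts that fall inside the look-back window.
            recent = [t for t in restart_times.get(name, []) if now - t <= window]
            if len(recent) > max_restarts:
                findings[name] = ("warning", f"{len(recent)} restarts in the last hour")
            else:
                findings[name] = ("ok", "running")
    return findings
```

Keeping the rule separate from the data collection means the thresholds can be exercised without a Docker daemon.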

### Service reachability
Lightweight HTTP check (curl, timeout 5s) on each internal URL:
- http://172.27.40.3:9443 (Portainer)
- http://172.27.40.3:3002 (Uptime Kuma)
- http://172.27.40.3:3000 (Gitea)
- http://172.27.40.3:3010 (Flowise)
- http://172.27.40.3:7575 (Homarr)
- http://172.27.6.139:11434 (Ollama)
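
The source specifies curl; as an illustration, the same check could be done with Python's standard library. This is a sketch under assumptions: `is_reachable` and `SERVICES` are hypothetical names, any HTTP response (including an error status) is treated as "reachable", and proxies are bypassed because the targets are internal addresses.

```python
import http.client
import urllib.error
import urllib.request

# URLs copied from the list above.
SERVICES = {
    "Portainer": "http://172.27.40.3:9443",
    "Uptime Kuma": "http://172.27.40.3:3002",
    "Gitea": "http://172.27.40.3:3000",
    "Flowise": "http://172.27.40.3:3010",
    "Homarr": "http://172.27.40.3:7575",
    "Ollama": "http://172.27.6.139:11434",
}

# An empty ProxyHandler disables any proxy configured in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url, timeout=5.0):
    """Return True if the server answers with any HTTP response in time."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just with a 4xx/5xx status.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a malformed reply.
        return False


def check_services(timeout=5.0):
    """Probe every service once and return {name: reachable}."""
    return {name: is_reachable(url, timeout) for name, url in SERVICES.items()}
```

Whether an error status should count as healthy depends on the service, so a stricter variant might return the status code instead of a boolean.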

### Agent watchdog
For each skill directory under `../../skills/`:
- Check the modification time of `last-output.md`; flag the skill as stale if the file is older than the skill's expected run interval
- Check `../../logs/<skill-name>/` for ERROR entries from the last run
- Report each skill as healthy, stale, or erroring
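
One way to express the watchdog rules is sketched below. Several details are assumptions rather than facts from the source: "last run" is taken to mean the most recently modified file in the skill's log directory, a missing `last-output.md` counts as stale, and errors take precedence over staleness. The names `classify_skill` and `inspect_skill` are hypothetical.

```python
from pathlib import Path


def classify_skill(last_output_mtime, log_lines, max_age_seconds, now):
    """Classify one skill as "healthy", "stale", or "erroring".

    last_output_mtime: mtime of last-output.md in epoch seconds, or None
    log_lines:         lines from the skill's most recent log
    """
    if any("ERROR" in line for line in log_lines):
        return "erroring"
    if last_output_mtime is None or now - last_output_mtime > max_age_seconds:
        return "stale"
    return "healthy"


def inspect_skill(skill_dir, logs_root, max_age_seconds, now):
    """Gather the inputs for classify_skill from the filesystem."""
    skill_dir, logs_root = Path(skill_dir), Path(logs_root)
    output = skill_dir / "last-output.md"
    mtime = output.stat().st_mtime if output.exists() else None

    # Assumption: the newest file in logs/<skill-name>/ holds the last run.
    log_dir = logs_root / skill_dir.name
    log_files = []
    if log_dir.is_dir():
        log_files = sorted(
            (p for p in log_dir.iterdir() if p.is_file()),
            key=lambda p: p.stat().st_mtime,
        )
    lines = log_files[-1].read_text(errors="replace").splitlines() if log_files else []
    return classify_skill(mtime, lines, max_age_seconds, now)
```

The expected interval is passed in per skill because an hourly heartbeat and a daily digest have different staleness limits.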

### System resources (on 172.27.40.3)
- Disk usage on `/`: warn if >80%, critical if >90%
- Memory usage: flag if >85%
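
The thresholds above might be applied as in the following sketch. It assumes a Linux host where memory figures come from `/proc/meminfo`, it computes "used" memory as `MemTotal - MemAvailable`, and it maps "flag" to a warning; none of these choices is stated in the source. Function names are hypothetical.

```python
import shutil


def disk_percent(path="/"):
    """Percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total


def memory_percent(meminfo_text):
    """Percentage of memory in use, parsed from /proc/meminfo content."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are in kB
    total = fields["MemTotal"]
    return 100 * (total - fields["MemAvailable"]) / total


def disk_level(percent):
    """warn if >80%, critical if >90%."""
    if percent > 90:
        return "critical"
    if percent > 80:
        return "warning"
    return "ok"


def memory_level(percent):
    """flag if >85% (treated here as a warning)."""
    return "warning" if percent > 85 else "ok"
```

On the server, `memory_percent` would be fed the text of `/proc/meminfo`. Note that `shutil.disk_usage` can report a slightly different percentage than `df`, which excludes reserved blocks.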

## Output

Write a digest to `last-output.md` in this format:
- Summary line: X healthy, Y warnings, Z critical
- Section per category: Docker, Services, Agent Watchdog, System
- Each item: ✓ OK / ⚠ Warning / ✗ Critical + one line detail
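
The digest format above leaves the exact layout open, so the renderer below is only one possible reading: the `##` section headings, the bullet prefix, and the `name: detail` separator are assumptions, and `render_digest` is a hypothetical name.

```python
SYMBOLS = {"ok": "✓ OK", "warning": "⚠ Warning", "critical": "✗ Critical"}
CATEGORIES = ["Docker", "Services", "Agent Watchdog", "System"]


def render_digest(results):
    """Render {category: [(name, level, detail), ...]} as digest text.

    `level` must be one of "ok", "warning", "critical".
    """
    counts = {level: 0 for level in SYMBOLS}
    for items in results.values():
        for _name, level, _detail in items:
            counts[level] += 1

    # Summary line first, then one section per category in a fixed order.
    lines = [
        f"{counts['ok']} healthy, {counts['warning']} warnings, "
        f"{counts['critical']} critical"
    ]
    for category in CATEGORIES:
        lines.append("")
        lines.append(f"## {category}")
        for name, level, detail in results.get(category, []):
            lines.append(f"- {SYMBOLS[level]} {name}: {detail}")
    return "\n".join(lines) + "\n"
```

Every category is printed even when it has no items, so a reader can tell an empty section apart from a skipped check.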

Write anomalies to `context/handoff.md` so that a future notification skill can pick them up.

## Wrap-up

After writing output:
1. Update `learnings.md` with anything that went wrong or could be improved
2. Append a one-line log entry to `../../logs/infra-monitor.log`: `YYYY-MM-DD HH:MM | status | summary`
3. Update `../../memory/notes-from-last-run.md`
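
The log entry in step 2 could be produced as sketched here. The helper names are hypothetical, and collapsing whitespace in the summary is an added safeguard (so the entry stays on one line) rather than something the source requires.

```python
from datetime import datetime
from pathlib import Path


def format_log_entry(now, status, summary):
    """Build one `YYYY-MM-DD HH:MM | status | summary` line."""
    one_line = " ".join(summary.split())  # collapse newlines and extra spaces
    return f"{now.strftime('%Y-%m-%d %H:%M')} | {status} | {one_line}"


def append_log_entry(log_path, status, summary, now=None):
    """Append the entry to the log file, creating parent directories if needed."""
    now = now or datetime.now()
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(format_log_entry(now, status, summary) + "\n")
```

For example, `append_log_entry("../../logs/infra-monitor.log", "warning", "10 healthy, 1 warnings, 0 critical")` would add a single line for the current run.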

## Schedule

- **Heartbeat:** every hour; checks Docker + Ollama only (fast, <30s)
- **Full digest:** daily at 07:00; all checks
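
If a single hourly cron entry drives both modes, the skill could pick the mode from the clock, as in this sketch. That setup is an assumption: the source does not say whether the 07:00 digest replaces or accompanies that hour's heartbeat, and the check identifiers below are hypothetical labels.

```python
from datetime import datetime

HEARTBEAT_CHECKS = ["docker", "ollama"]
FULL_CHECKS = ["docker", "services", "agent_watchdog", "system"]


def checks_for(now):
    """Return the checks to run for an hourly invocation at `now`.

    Assumes cron fires once per hour, so the run that starts during
    the 07:00 hour is the daily full digest.
    """
    return FULL_CHECKS if now.hour == 7 else HEARTBEAT_CHECKS
```

Two separate cron entries, one per mode, would work equally well and would avoid the clock test altogether.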