Skill: infra-monitor

Monitors server health and watches all Agent OS skills for staleness or errors. Runs on a cron schedule on 172.27.40.3.

Inputs

Reads before executing:

All expected containers are running (not exited/restarting)
Flag any container that has restarted more than 3 times in the last hour
Expected containers: portainer, nginx-proxy-manager, uptime-kuma, gitea, headscale, netbird, vaultwarden, flowise, plane, zabbix, homarr

Lightweight HTTP check (curl, timeout 5s) on each internal URL:

For each skill directory under ../../skills/:

Write a digest to last-output.md in this format:

Pass anomalies to context/handoff.md for notification skill (future).

After writing output:

Update learnings.md with anything that went wrong or could be improved
Append a one-line log entry to ../../logs/infra-monitor.log: YYYY-MM-DD HH:MM | status | summary
Update ../../memory/notes-from-last-run.md