# Agent OS — Project CLAUDE.md

## What This Project Is

Personal Agentic Operating System for NxM / Nexum SA infrastructure. Tool-agnostic AI foundation for scheduled skills, monitoring, and automation. Plain markdown files — no databases, no vendor lock-in.

- **Runtime:** `/opt/agent-os/` on 172.27.40.3
- **Gitea:** `git.nxm.co.za/admin/agent-os` (SSH: `gitea-local:admin/agent-os.git`)
- **Owner:** Jaco Bezuidenhout, Nexum SA (PTY) Ltd

## Current Phase

| Phase | Status |
|---|---|
| 1 — NFS export + mount | DONE 2026-05-01 (NFS no longer needed — consolidated to server) |
| 2 — Identity interview → identity.md | DONE 2026-05-01 |
| 3 — infra-monitor skill | NEXT (spec at `skills/infra-monitor/skill.md`, needs update) |
| 4 — Cron scheduling (hourly heartbeat + daily digest) | Pending Phase 3 |
| 5 — Future skills (backup monitor, log digest) | Future |

## Live Agent Ecosystem (as of 2026-06-19)

All agents run as Docker containers on 172.27.40.3 unless noted. Every agent writes to `/opt/agent-os/logs//last-run.json`.

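
The logging convention above could be implemented roughly as follows. This is a minimal sketch, not the project's actual writer: it assumes the empty path segment is the agent's name (as in `/opt/agent-os/logs/infra-monitor/last-run.json`), and the function name `write_last_run`, the JSON fields (`agent`, `status`, `finished_at`, `details`), and the 0644 file mode are all hypothetical, since no schema is specified in this file.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_last_run(log_root, agent, status, details=None):
    """Atomically write <log_root>/<agent>/last-run.json and return its path.

    The field names are illustrative only, not a documented schema.
    """
    agent_dir = Path(log_root) / agent
    agent_dir.mkdir(parents=True, exist_ok=True)
    target = agent_dir / "last-run.json"
    record = {
        "agent": agent,
        "status": status,
        "finished_at": datetime.now(timezone.utc).isoformat(),
        "details": details or {},
    }
    # Write to a temp file in the same directory, then rename it over the
    # target, so a reader never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=agent_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(record, handle, indent=2)
        # mkstemp creates files as 0600; 0644 is an assumption so that
        # other agents running as different users can read the result.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, target)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return target
```

A skill would call it once at the end of a run, for example `write_last_run("/opt/agent-os/logs", "infra-monitor", "ok", {"containers_checked": 12})`.
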
### Always-on agents

| Agent | Port | Stack Path | Role |
|---|---|---|---|
| citadel-mcp | 8300 | `/opt/stacks/citadel-mcp/` | MCP tool server (37 tools: Docker, Plane, TRMM, Directus, files, web search) |
| raven-notify | 8400 | `/opt/stacks/raven-notify/` | Notification hub — Discord webhook + Gmail SMTP |
| sam-research | 8500 | `/opt/stacks/sam-research/` | SearXNG + Ollama research agent |
| qyburn-coder | 8700 | `/opt/stacks/qyburn-coder/` | LLM coding agent with approve/reject workflow |
| maester-reports | 8800 | `/opt/stacks/maester-reports/` | NIST CSF compliance reports (⚠ restart = Anthropic API cost) |
| jon-snow | 8900 | `/opt/stacks/jon-snow/` | Chief of staff orchestrator, HMAC approval gate |
| hodor-gateway | 8200 | `/opt/stacks/hodor-gateway/` | Simple Ollama gateway (POST /ask) |
| tarly-backup | 8750 | `/opt/stacks/tarly-backup/` | Backup monitoring — OPNsense configs + Proxmox |
| hermes-cloud | 8643 | `/opt/stacks/hermes-cloud/` | Claude-sonnet brain, Citadel MCP wired |
| hermes-native | VM 108 | native on 172.27.40.30 | Primary conversational agent — OpenRouter, Honcho memory, WhatsApp, dashboard at hermes.nxm.co.za:9119 |

### Scheduled/one-shot agents

| Agent | Schedule | Stack Path | Role |
|---|---|---|---|
| bran-changelog | Daily 06:00 | `/opt/stacks/bran-changelog/` | Git changelog generator |
| varys-monitor | Every 15 min | `/opt/stacks/varys-monitor/` | HTTP reachability checks for all services |

### Support agents (via Hermes Native, VM 108)

| Agent | Role |
|---|---|
| vexis (workshop profile) | Nexum workshop agent — TRMM script execution on client devices |

### Integrations running as cron (not standalone agents)

| Job | Schedule | Script |
|---|---|---|
| ovpn-status.py | Every 1 min | `/opt/stacks/monitoring/ovpn-status.py` |
| trmm-frappe-sync.py | Every 30 min | `/opt/stacks/monitoring/trmm-frappe-sync.py` |
| zenarmor-pull.py | Daily 06:00 | `/opt/stacks/monitoring/zenarmor-pull.py` |
| hub-backup.sh | Daily 02:05 | `/opt/stacks/tarly-backup/hub-backup.sh` |

## Citadel MCP Tool Registry (37 tools)

The central tool server that other agents call via MCP protocol:

**File operations:** read_file, write_file, list_files, delete_file, propose_file_change

**Docker:** docker_list_containers, docker_container_stats, docker_stack_list, docker_rebuild

**Plane (project management):** plane_add_issue, plane_get_issues, plane_list_projects, plane_create_project, plane_create_page, plane_update_issue

**TRMM (remote management):** trmm_list_agents, trmm_get_agent, trmm_list_scripts, trmm_add_script, trmm_delete_script, trmm_run_script, trmm_confirm_with_user, trmm_sync_now

**Directus (CRM):** directus_list_clients, directus_get_client, directus_get_client_services, directus_get_renewals, directus_upcoming_renewals

**Other:** get_agent_status, get_agent_output, list_agents, qyburn_task, qyburn_status, qyburn_approve, sam_research, web_search, proxmox_backup_status

## Agent Web Pages

Static HTML dashboards served at `agents.nxm.co.za//` from `/opt/sites/`:

- agents-dashboard, bran, changelog, citadel, hermes, hermes-native, hodor, jon-snow, qyburn, raven, sam, security-review, setup, stock, swarm, tarly, varys, workflow-test

## Phase 3 — infra-monitor (NEXT)

Skill scaffold at `skills/infra-monitor/skill.md`. **Spec is stale — needs update before building.**

**Goal:** Docker container state + system resource checks. Complements Varys (HTTP reachability) — do not duplicate.

**Before building:**

- Update `skills/infra-monitor/skill.md` — container list is stale (references Flowise/Netbird, missing 20+ services)
- Ollama URL is now `http://172.27.40.20:11434`
- Decide: Docker one-shot container (consistent with bran/varys) or host cron + shell script?

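
For the container-state half of the goal, one option is to parse the output of `docker ps -a --format '{{json .}}'`, which prints one JSON object per container. The sketch below is only an illustration under stated assumptions: it assumes a Docker version whose JSON output includes the `Names` and `State` fields, and that container names match the agent names listed in this file. The function name `find_unhealthy` and the `expected` list are hypothetical. It works on captured text, so it does not settle the one-shot-container versus host-cron question.

```python
import json


def find_unhealthy(docker_ps_output, expected):
    """Return {name: state} for expected containers that are not running.

    docker_ps_output: text captured from `docker ps -a --format '{{json .}}'`
    expected: iterable of container names that should be running (assumed list)
    """
    states = {}
    for line in docker_ps_output.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the captured output
        entry = json.loads(line)
        states[entry["Names"]] = entry.get("State", "unknown")

    problems = {}
    for name in expected:
        # A container absent from `docker ps -a` is reported as "missing".
        state = states.get(name, "missing")
        if state != "running":
            problems[name] = state
    return problems


if __name__ == "__main__":
    sample = "\n".join([
        '{"Names": "citadel-mcp", "State": "running"}',
        '{"Names": "raven-notify", "State": "exited"}',
    ])
    print(find_unhealthy(sample, ["citadel-mcp", "raven-notify", "jon-snow"]))
    # prints {'raven-notify': 'exited', 'jon-snow': 'missing'}
```

Keeping the parsing separate from the `docker` invocation means the same function can be exercised with canned input, whichever execution model is chosen.
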
**Output targets:**

- `/opt/sites/infra-monitor/index.html` — web dashboard at agents.nxm.co.za/infra-monitor/
- `/opt/agent-os/logs/infra-monitor/last-run.json` — machine-readable, read by Varys watchdog
- Raven alert on critical: `http://raven-notify:8400`

## Directory Structure

```
/opt/agent-os/
├── CLAUDE.md                    ← this file (project brief)
├── README.md                    ← onboarding for new LLMs
├── identity.md                  ← who the user is, hard limits
├── brain.md                     ← all infra facts, IPs, services, decisions
├── memory/
│   ├── active-projects.md       ← what's in flight right now
│   ├── persistent.md            ← facts that never expire
│   ├── recent-decisions.md      ← decisions from last 30 days
│   ├── constraints.md           ← hard limits agents must respect
│   └── notes-from-last-run.md   ← cleared each session
├── claude-code/
│   └── memory/                  ← Claude Code's persistent memory files (symlinked)
├── skills/
│   └── infra-monitor/           ← Phase 3 target (not yet built)
│       ├── skill.md
│       ├── learnings.md
│       ├── eval.json
│       ├── last-output.md
│       └── context/handoff.md
└── logs/                        ← all agent log outputs
    ├── bran-changelog/
    ├── citadel-mcp/
    ├── jon-snow/
    ├── qyburn-coder/
    ├── raven-notify/
    ├── sam-research/
    ├── tarly-backup/
    ├── trmm-frappe-sync/
    └── varys-monitor/
```

## Architecture

- **LLM inference:** Ollama at `http://172.27.40.20:11434` (gemma4, llama3.1:8b, phi4) + Anthropic API (Claude Code, Hermes)
- **Agent output pages:** `/opt/sites//` served at agents.nxm.co.za
- **Log standard:** `/opt/agent-os/logs//last-run.json`
- **Notifications:** Raven at `http://raven-notify:8400` (Discord + Gmail)
- **Task tracking:** Plane at plane.nxm.co.za
- **Client CRM:** Directus at directus.nxm.co.za
- **Client devices:** Tactical RMM at 172.27.40.4 (45 agents, 13 clients)
- **Helpdesk:** Frappe Helpdesk at helpdesk.nxm.co.za (VM 109)
- **Credentials:** `~/.nxm-keys` (chmod 600) — ONLY place credential values live

## Key Gotchas

- **maester-reports restart = Anthropic API cost** — cache is in-memory only
- **Open WebUI → Citadel MCP:** auth_type must be `none` (empty bearer key = silent failure)
- **Docker → OPNsense API:** Docker proxy network can't reach 172.27.6.1 (HTTP 400) — run from host
- **Headscale v0.28:** all write operations require numeric user ID, not username
- **Vaultwarden:** requires HTTPS — use vault.nxm.co.za, not LAN IP
- **Tailscale on Windows:** overrides DNS — disconnect when testing split DNS
- **NPM forward scheme:** HTTP even for HTTPS external — NPM handles SSL termination
- **NocoDB:** RvDM personal birthday DB only — never use for Nexum projects
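
The credentials rule under Architecture (`~/.nxm-keys`, chmod 600) can be checked before a script reads the file. This is a minimal sketch, assuming a POSIX host. The function name `assert_private` is hypothetical, and the file's contents are deliberately not parsed, because its format is not described here.

```python
import os
import stat
from pathlib import Path


def assert_private(path):
    """Return Path(path) if it is a regular file with mode 600 exactly.

    Raises PermissionError otherwise, so a caller fails loudly instead of
    reading credentials from a file that other users could access.
    """
    info = os.stat(path)
    if not stat.S_ISREG(info.st_mode):
        raise PermissionError(f"{path} is not a regular file")
    mode = stat.S_IMODE(info.st_mode)  # permission bits only
    if mode != 0o600:
        raise PermissionError(f"{path} has mode {mode:o}, expected 600")
    return Path(path)
```

A script could call `assert_private(Path.home() / ".nxm-keys")` as its first step and exit on `PermissionError`.
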