Claude Code 6cebab9a4a docs: comprehensive update — bring all Agent OS docs current for LLM onboarding
All files were 5-7 weeks stale. Updated brain.md (complete service/agent/VPN/cron
inventory), identity.md (current expertise + infra context), CLAUDE.md (full agent
ecosystem table, Citadel tool registry, gotchas), README.md (LLM quick-start guide),
all memory files (current projects, decisions, constraints, persistent facts), and
infra-monitor skill.md (current container list with criticality tiers).

Also fixed: git remote switched from HTTP+embedded-token to SSH, removed references
to decommissioned services (Netbird, WireGuard, Flowise, Zabbix), corrected Ollama
IP (172.27.40.20), TrueNAS IP (172.27.40.220), and added 20+ services/agents that
were built since the last commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-19 17:15:45 +00:00


# Agent OS — Project CLAUDE.md
## What This Project Is
Personal Agentic Operating System for NxM / Nexum SA infrastructure. Tool-agnostic AI foundation for scheduled skills, monitoring, and automation. Plain markdown files — no databases, no vendor lock-in.
- **Runtime:** `/opt/agent-os/` on 172.27.40.3
- **Gitea:** `git.nxm.co.za/admin/agent-os` (SSH: `gitea-local:admin/agent-os.git`)
- **Owner:** Jaco Bezuidenhout, Nexum SA (PTY) Ltd
## Current Phase
| Phase | Status |
|---|---|
| 1 — NFS export + mount | DONE 2026-05-01 (NFS no longer needed — consolidated to server) |
| 2 — Identity interview → identity.md | DONE 2026-05-01 |
| 3 — infra-monitor skill | NEXT (spec at `skills/infra-monitor/skill.md`, needs update) |
| 4 — Cron scheduling (hourly heartbeat + daily digest) | Pending Phase 3 |
| 5 — Future skills (backup monitor, log digest) | Future |
## Live Agent Ecosystem (as of 2026-06-19)
All agents run as Docker containers on 172.27.40.3 unless noted. Every agent writes to `/opt/agent-os/logs/<agent>/last-run.json`.
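The logging convention above could be implemented roughly as follows. This is a minimal sketch, not the agents' actual code: the function name `write_last_run` and the JSON field names (`agent`, `status`, `finished_at`, `details`) are illustrative assumptions, since this brief does not define a schema for `last-run.json`.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path

LOG_ROOT = Path("/opt/agent-os/logs")  # log root taken from this document


def write_last_run(agent: str, status: str, details: dict, log_root: Path = LOG_ROOT) -> Path:
    """Write <log_root>/<agent>/last-run.json atomically and return its path."""
    agent_dir = log_root / agent
    agent_dir.mkdir(parents=True, exist_ok=True)
    payload = {
        # Field names are assumptions for illustration, not a documented schema.
        "agent": agent,
        "status": status,
        "finished_at": datetime.now(timezone.utc).isoformat(),
        "details": details,
    }
    # Write to a temp file in the same directory, then rename it into place,
    # so a reader never sees a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=agent_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
    target = agent_dir / "last-run.json"
    os.replace(tmp_name, target)
    return target
```

An agent would call something like `write_last_run("varys-monitor", "ok", {"checked": 12})` at the end of each run. The rename step matters because other processes may read the file while a run is still finishing.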
### Always-on agents
| Agent | Port | Stack Path | Role |
|---|---|---|---|
| citadel-mcp | 8300 | `/opt/stacks/citadel-mcp/` | MCP tool server (37 tools: Docker, Plane, TRMM, Directus, files, web search) |
| raven-notify | 8400 | `/opt/stacks/raven-notify/` | Notification hub — Discord webhook + Gmail SMTP |
| sam-research | 8500 | `/opt/stacks/sam-research/` | SearXNG + Ollama research agent |
| qyburn-coder | 8700 | `/opt/stacks/qyburn-coder/` | LLM coding agent with approve/reject workflow |
| maester-reports | 8800 | `/opt/stacks/maester-reports/` | NIST CSF compliance reports (⚠ restart = Anthropic API cost) |
| jon-snow | 8900 | `/opt/stacks/jon-snow/` | Chief of staff orchestrator, HMAC approval gate |
| hodor-gateway | 8200 | `/opt/stacks/hodor-gateway/` | Simple Ollama gateway (POST /ask) |
| tarly-backup | 8750 | `/opt/stacks/tarly-backup/` | Backup monitoring — OPNsense configs + Proxmox |
| hermes-cloud | 8643 | `/opt/stacks/hermes-cloud/` | Claude-sonnet brain, Citadel MCP wired |
| hermes-native | VM 108 | native on 172.27.40.30 | Primary conversational agent — OpenRouter, Honcho memory, WhatsApp, dashboard at hermes.nxm.co.za:9119 |
### Scheduled/one-shot agents
| Agent | Schedule | Stack Path | Role |
|---|---|---|---|
| bran-changelog | Daily 06:00 | `/opt/stacks/bran-changelog/` | Git changelog generator |
| varys-monitor | Every 15 min | `/opt/stacks/varys-monitor/` | HTTP reachability checks for all services |
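For orientation, an HTTP reachability check of the kind varys-monitor performs might look like the sketch below. It is not Varys's actual implementation: the function names, the "2xx/3xx counts as reachable" rule, and the injectable `probe` parameter are all assumptions made for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if it cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, and similar


def check_services(
    targets: Dict[str, str],
    probe: Callable[[str], int] = http_status,
) -> Dict[str, dict]:
    """Probe every target URL and record whether it looks reachable."""
    results = {}
    for name, url in targets.items():
        code = probe(url)
        # Assumed rule: any 2xx or 3xx response counts as reachable.
        results[name] = {"url": url, "status_code": code, "reachable": 200 <= code < 400}
    return results
```

A call such as `check_services({"raven-notify": "http://raven-notify:8400/"})` would probe one service. The URL path is a guess, since this brief lists ports but not health endpoints. Passing a fake `probe` keeps the logic testable without network access.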
### Support agents (via Hermes Native, VM 108)
| Agent | Role |
|---|---|
| vexis (workshop profile) | Nexum workshop agent — TRMM script execution on client devices |
### Integrations running as cron (not standalone agents)
| Job | Schedule | Script |
|---|---|---|
| ovpn-status.py | Every 1 min | `/opt/stacks/monitoring/ovpn-status.py` |
| trmm-frappe-sync.py | Every 30 min | `/opt/stacks/monitoring/trmm-frappe-sync.py` |
| zenarmor-pull.py | Daily 06:00 | `/opt/stacks/monitoring/zenarmor-pull.py` |
| hub-backup.sh | Daily 02:05 | `/opt/stacks/tarly-backup/hub-backup.sh` |
## Citadel MCP Tool Registry (37 tools)
The central tool server that other agents call via the MCP protocol:
- **File operations:** read_file, write_file, list_files, delete_file, propose_file_change
- **Docker:** docker_list_containers, docker_container_stats, docker_stack_list, docker_rebuild
- **Plane (project management):** plane_add_issue, plane_get_issues, plane_list_projects, plane_create_project, plane_create_page, plane_update_issue
- **TRMM (remote management):** trmm_list_agents, trmm_get_agent, trmm_list_scripts, trmm_add_script, trmm_delete_script, trmm_run_script, trmm_confirm_with_user, trmm_sync_now
- **Directus (CRM):** directus_list_clients, directus_get_client, directus_get_client_services, directus_get_renewals, directus_upcoming_renewals
- **Other:** get_agent_status, get_agent_output, list_agents, qyburn_task, qyburn_status, qyburn_approve, sam_research, web_search, proxmox_backup_status
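MCP tool invocations are JSON-RPC 2.0 messages using the `tools/call` method. The sketch below only builds such a message for one of the tools listed above. The helper name `build_tool_call` and the `path` argument are assumptions, because this brief does not document each tool's parameters or the transport Citadel exposes on port 8300.

```python
import json
from itertools import count

_request_ids = count(1)  # simple monotonically increasing request IDs


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical arguments: the real read_file parameter names may differ.
example_request = build_tool_call("read_file", {"path": "/opt/agent-os/brain.md"})
```

In practice an MCP client library would handle session setup and transport. The message shape is shown here only to make clear what "calling a Citadel tool" means on the wire.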
## Agent Web Pages
Static HTML dashboards served at `agents.nxm.co.za/<name>/` from `/opt/sites/`:
- agents-dashboard, bran, changelog, citadel, hermes, hermes-native, hodor, jon-snow, qyburn, raven, sam, security-review, setup, stock, swarm, tarly, varys, workflow-test
## Phase 3 — infra-monitor (NEXT)
Skill scaffold at `skills/infra-monitor/skill.md`. **Spec is stale — needs update before building.**
**Goal:** Docker container state + system resource checks. Complements Varys (HTTP reachability) — do not duplicate.
**Before building:**
- Update `skills/infra-monitor/skill.md` — container list is stale (references Flowise/Netbird, missing 20+ services)
- Ollama URL is now `http://172.27.40.20:11434`
- Decide: Docker one-shot container (consistent with bran/varys) or host cron + shell script?
**Output targets:**
- `/opt/sites/infra-monitor/index.html` — web dashboard at agents.nxm.co.za/infra-monitor/
- `/opt/agent-os/logs/infra-monitor/last-run.json` — machine-readable, read by Varys watchdog
- Raven alert on critical: `http://raven-notify:8400`
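As a starting point for the build decision, the core of the check could be sketched as below. This is a hypothetical outline, not the skill's spec: the function names, the `expected_running` allowlist, and the 90% disk threshold are assumptions. An allowlist is used because one-shot containers such as bran and varys are normally in the `exited` state between runs.

```python
import json
import shutil
import subprocess


def parse_docker_ps(output: str) -> list:
    """Parse the line-delimited JSON printed by `docker ps --format '{{json .}}'`."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def list_containers() -> list:
    """Return one dict per container, including stopped ones."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{json .}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_docker_ps(result.stdout)


def disk_used_percent(path: str = "/") -> float:
    """Percentage of the filesystem at path that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def evaluate(containers: list, disk_percent: float, expected_running: set,
             disk_limit: float = 90.0) -> dict:
    """Flag expected containers that are not running, plus high disk usage."""
    running = {c["Names"] for c in containers if c.get("State") == "running"}
    down = sorted(set(expected_running) - running)
    critical = bool(down) or disk_percent >= disk_limit
    return {
        "status": "critical" if critical else "ok",
        "down": down,
        "disk_percent": round(disk_percent, 1),
    }
```

A run would combine them as `evaluate(list_containers(), disk_used_percent(), expected_running={...})`, write the result to `last-run.json`, and notify Raven only when `status` is `critical`. The same functions work whether the skill ends up as a one-shot container (with the Docker socket mounted) or a host cron job.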
## Directory Structure
```
/opt/agent-os/
├── CLAUDE.md                    ← this file (project brief)
├── README.md                    ← onboarding for new LLMs
├── identity.md                  ← who the user is, hard limits
├── brain.md                     ← all infra facts, IPs, services, decisions
├── memory/
│   ├── active-projects.md       ← what's in flight right now
│   ├── persistent.md            ← facts that never expire
│   ├── recent-decisions.md      ← decisions from last 30 days
│   ├── constraints.md           ← hard limits agents must respect
│   └── notes-from-last-run.md   ← cleared each session
├── claude-code/
│   └── memory/                  ← Claude Code's persistent memory files (symlinked)
├── skills/
│   └── infra-monitor/           ← Phase 3 target (not yet built)
│       ├── skill.md
│       ├── learnings.md
│       ├── eval.json
│       ├── last-output.md
│       └── context/handoff.md
└── logs/                        ← all agent log outputs
    ├── bran-changelog/
    ├── citadel-mcp/
    ├── jon-snow/
    ├── qyburn-coder/
    ├── raven-notify/
    ├── sam-research/
    ├── tarly-backup/
    ├── trmm-frappe-sync/
    └── varys-monitor/
```
## Architecture
- **LLM inference:** Ollama at `http://172.27.40.20:11434` (gemma4, llama3.1:8b, phi4) + Anthropic API (Claude Code, Hermes)
- **Agent output pages:** `/opt/sites/<name>/` served at agents.nxm.co.za
- **Log standard:** `/opt/agent-os/logs/<agent>/last-run.json`
- **Notifications:** Raven at `http://raven-notify:8400` (Discord + Gmail)
- **Task tracking:** Plane at plane.nxm.co.za
- **Client CRM:** Directus at directus.nxm.co.za
- **Client devices:** Tactical RMM at 172.27.40.4 (45 agents, 13 clients)
- **Helpdesk:** Frappe Helpdesk at helpdesk.nxm.co.za (VM 109)
- **Credentials:** `~/.nxm-keys` (chmod 600) — ONLY place credential values live
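Because the credentials file is the single source of secrets, a script can verify its permissions before reading it. The sketch below checks only the mode bits stated above (`chmod 600`). The helper name `is_private` is an assumption, and the file's internal format is not described in this brief, so no parsing is attempted.

```python
import os
import stat
from pathlib import Path


def is_private(path: Path) -> bool:
    """True if path is a regular file whose mode is exactly 0o600."""
    info = path.stat()
    return stat.S_ISREG(info.st_mode) and stat.S_IMODE(info.st_mode) == 0o600


def require_private(path: Path) -> Path:
    """Return path unchanged, or raise if it is readable by group or others."""
    if not is_private(path):
        mode = stat.S_IMODE(path.stat().st_mode)
        raise PermissionError(f"{path} has mode {oct(mode)}, expected 0o600")
    return path
```

An agent would call `require_private(Path.home() / ".nxm-keys")` once at startup and refuse to continue if the check fails.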
## Key Gotchas
- **maester-reports restart = Anthropic API cost:** the cache is in-memory only, so a restart discards it and regenerating reports incurs Anthropic API charges again
- **Open WebUI → Citadel MCP:** auth_type must be `none` (empty bearer key = silent failure)
- **Docker → OPNsense API:** Docker proxy network can't reach 172.27.6.1 (HTTP 400) — run from host
- **Headscale v0.28:** all write operations require numeric user ID, not username
- **Vaultwarden:** requires HTTPS — use vault.nxm.co.za, not LAN IP
- **Tailscale on Windows:** overrides DNS — disconnect when testing split DNS
- **NPM forward scheme:** set it to `http` even when the external URL is HTTPS, because NPM handles SSL termination
- **NocoDB:** RvDM personal birthday DB only — never use for Nexum projects