docs: comprehensive update — bring all Agent OS docs current for LLM onboarding
All files were 5-7 weeks stale. Updated brain.md (complete service/agent/VPN/cron inventory), identity.md (current expertise + infra context), CLAUDE.md (full agent ecosystem table, Citadel tool registry, gotchas), README.md (LLM quick-start guide), all memory files (current projects, decisions, constraints, persistent facts), and infra-monitor skill.md (current container list with criticality tiers).

Also fixed: git remote switched from HTTP+embedded-token to SSH, removed references to decommissioned services (Netbird, WireGuard, Flowise, Zabbix), corrected Ollama IP (172.27.40.20), TrueNAS IP (172.27.40.220), and added 20+ services/agents that were built since the last commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -1,28 +1,84 @@
# Agent OS — Project CLAUDE.md

## What This Project Is
Personal Agentic Operating System. Tool-agnostic AI foundation for scheduled skills, monitoring, and automation.
- Runtime: `/opt/agent-os/` on 172.27.40.3
- Gitea: `git.nxm.co.za/admin/agent-os`
- Edit clone (server): `/home/nxm/Documents/agent-os/` (clone pending)
Personal Agentic Operating System for NxM / Nexum SA infrastructure. Tool-agnostic AI foundation for scheduled skills, monitoring, and automation. Plain markdown files — no databases, no vendor lock-in.

- **Runtime:** `/opt/agent-os/` on 172.27.40.3
- **Gitea:** `git.nxm.co.za/admin/agent-os` (SSH: `gitea-local:admin/agent-os.git`)
- **Owner:** Jaco Bezuidenhout, Nexum SA (PTY) Ltd

## Current Phase

| Phase | Status |
|---|---|
| 1 — NFS export + Kubuntu mount | ✓ DONE 2026-05-01 (NFS no longer needed — consolidated to server) |
| 2 — Identity interview → identity.md populated | ✓ DONE 2026-05-01 |
| **3 — infra-monitor skill** | **NEXT** |
| 1 — NFS export + mount | DONE 2026-05-01 (NFS no longer needed — consolidated to server) |
| 2 — Identity interview → identity.md | DONE 2026-05-01 |
| 3 — infra-monitor skill | NEXT (spec at `skills/infra-monitor/skill.md`, needs update) |
| 4 — Cron scheduling (hourly heartbeat + daily digest) | Pending Phase 3 |
| 5 — Future skills (backup monitor, peer health, log digest) | Future |
| 5 — Future skills (backup monitor, log digest) | Future |

## Live Agent Ecosystem (as of 2026-06-19)

All agents run as Docker containers on 172.27.40.3 unless noted. Every agent writes to `/opt/agent-os/logs/<agent>/last-run.json`.
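
Because every agent reports through the same `last-run.json` convention, a watchdog can flag agents that have gone quiet. The following is a minimal sketch, assuming that file modification time is an acceptable freshness signal; the function name `stale_agents` and the `max_age_seconds` threshold are hypothetical, and the JSON schema of `last-run.json` is not specified in this document, so the sketch only checks that the file parses.

```python
import json
import time
from pathlib import Path
from typing import List, Optional

LOG_ROOT = Path("/opt/agent-os/logs")  # log root named in this document


def stale_agents(log_root: Path, max_age_seconds: float,
                 now: Optional[float] = None) -> List[str]:
    """Return agents whose last-run.json is missing, unparseable, or too old."""
    now = time.time() if now is None else now
    stale = []
    for agent_dir in sorted(p for p in log_root.iterdir() if p.is_dir()):
        status_file = agent_dir / "last-run.json"
        try:
            json.loads(status_file.read_text())  # schema unknown: parse check only
            age = now - status_file.stat().st_mtime
        except (OSError, ValueError):
            # Missing file, unreadable file, or invalid JSON all count as stale.
            stale.append(agent_dir.name)
            continue
        if age > max_age_seconds:
            stale.append(agent_dir.name)
    return stale
```

For example, `stale_agents(LOG_ROOT, 2 * 3600)` would list agents with no readable report in the last two hours. A per-agent threshold would likely fit better than a single value, since the scheduled agents run at very different intervals.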

### Always-on agents
| Agent | Port | Stack Path | Role |
|---|---|---|---|
| citadel-mcp | 8300 | `/opt/stacks/citadel-mcp/` | MCP tool server (37 tools: Docker, Plane, TRMM, Directus, files, web search) |
| raven-notify | 8400 | `/opt/stacks/raven-notify/` | Notification hub — Discord webhook + Gmail SMTP |
| sam-research | 8500 | `/opt/stacks/sam-research/` | SearXNG + Ollama research agent |
| qyburn-coder | 8700 | `/opt/stacks/qyburn-coder/` | LLM coding agent with approve/reject workflow |
| maester-reports | 8800 | `/opt/stacks/maester-reports/` | NIST CSF compliance reports (⚠ restart = Anthropic API cost) |
| jon-snow | 8900 | `/opt/stacks/jon-snow/` | Chief of staff orchestrator, HMAC approval gate |
| hodor-gateway | 8200 | `/opt/stacks/hodor-gateway/` | Simple Ollama gateway (POST /ask) |
| tarly-backup | 8750 | `/opt/stacks/tarly-backup/` | Backup monitoring — OPNsense configs + Proxmox |
| hermes-cloud | 8643 | `/opt/stacks/hermes-cloud/` | Claude-sonnet brain, Citadel MCP wired |
| hermes-native | VM 108 | native on 172.27.40.30 | Primary conversational agent — OpenRouter, Honcho memory, WhatsApp, dashboard at hermes.nxm.co.za:9119 |

### Scheduled/one-shot agents
| Agent | Schedule | Stack Path | Role |
|---|---|---|---|
| bran-changelog | Daily 06:00 | `/opt/stacks/bran-changelog/` | Git changelog generator |
| varys-monitor | Every 15 min | `/opt/stacks/varys-monitor/` | HTTP reachability checks for all services |
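
An HTTP reachability check of the kind attributed to varys-monitor can be sketched with the standard library alone. This is an illustration, not Varys's actual implementation: the function name `check_http`, the result fields, and the rule that any 2xx/3xx status counts as reachable are all assumptions. The `opener` parameter exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable


def check_http(url: str, timeout: float = 5.0,
               opener: Callable = urllib.request.urlopen) -> dict:
    """Probe one URL and return a small result record; never raises on network errors."""
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
            return {"url": url, "ok": 200 <= status < 400, "status": status}
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return {"url": url, "ok": False, "status": exc.code}
    except OSError as exc:
        # Covers URLError, refused connections, DNS failures, and timeouts.
        return {"url": url, "ok": False, "status": None, "error": str(exc)}
```

A caller might run `check_http("http://raven-notify:8400")` for each service in the tables above. Whether a given service answers on its root path is not stated here, so the probe URL per service would need to be confirmed.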

### Support agents (via Hermes Native, VM 108)
| Agent | Role |
|---|---|
| vexis (workshop profile) | Nexum workshop agent — TRMM script execution on client devices |

### Integrations running as cron (not standalone agents)
| Job | Schedule | Script |
|---|---|---|
| ovpn-status.py | Every 1 min | `/opt/stacks/monitoring/ovpn-status.py` |
| trmm-frappe-sync.py | Every 30 min | `/opt/stacks/monitoring/trmm-frappe-sync.py` |
| zenarmor-pull.py | Daily 06:00 | `/opt/stacks/monitoring/zenarmor-pull.py` |
| hub-backup.sh | Daily 02:05 | `/opt/stacks/tarly-backup/hub-backup.sh` |
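
The schedules in the cron table map directly onto standard five-field cron expressions. The sketch below renders them as crontab lines. The script paths come from the table; the interpreter paths (`/usr/bin/python3`, `/bin/bash`) and the helper names are assumptions, and the real crontab on the server may differ.

```python
# (cron expression, script path) pairs derived from the table above.
JOBS = [
    ("* * * * *", "/opt/stacks/monitoring/ovpn-status.py"),           # every 1 min
    ("*/30 * * * *", "/opt/stacks/monitoring/trmm-frappe-sync.py"),   # every 30 min
    ("0 6 * * *", "/opt/stacks/monitoring/zenarmor-pull.py"),         # daily 06:00
    ("5 2 * * *", "/opt/stacks/tarly-backup/hub-backup.sh"),          # daily 02:05
]


def interpreter_for(script: str) -> str:
    """Pick an interpreter by file extension (paths are assumed, not confirmed)."""
    return "/bin/bash" if script.endswith(".sh") else "/usr/bin/python3"


def crontab_lines(jobs) -> list:
    """Render one crontab line per job: schedule, interpreter, script."""
    return [f"{schedule} {interpreter_for(script)} {script}" for schedule, script in jobs]


if __name__ == "__main__":
    print("\n".join(crontab_lines(JOBS)))
```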

## Citadel MCP Tool Registry (37 tools)

The central tool server that other agents call via MCP protocol:

**File operations:** read_file, write_file, list_files, delete_file, propose_file_change
**Docker:** docker_list_containers, docker_container_stats, docker_stack_list, docker_rebuild
**Plane (project management):** plane_add_issue, plane_get_issues, plane_list_projects, plane_create_project, plane_create_page, plane_update_issue
**TRMM (remote management):** trmm_list_agents, trmm_get_agent, trmm_list_scripts, trmm_add_script, trmm_delete_script, trmm_run_script, trmm_confirm_with_user, trmm_sync_now
**Directus (CRM):** directus_list_clients, directus_get_client, directus_get_client_services, directus_get_renewals, directus_upcoming_renewals
**Other:** get_agent_status, get_agent_output, list_agents, qyburn_task, qyburn_status, qyburn_approve, sam_research, web_search, proxmox_backup_status
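
MCP tool invocations are JSON-RPC 2.0 messages using the `tools/call` method, so calling one of the tools above comes down to sending a payload like the one this sketch builds. The helper name `tool_call_request` is hypothetical, and the argument schema for each Citadel tool is not documented here, so the example passes no arguments. The transport (URL path, session handling) and the `initialize` handshake that a real client performs first are deliberately left out.

```python
import itertools
import json
from typing import Optional

_request_ids = itertools.count(1)  # JSON-RPC requires a unique id per request


def tool_call_request(tool_name: str, arguments: Optional[dict] = None) -> str:
    """Build the JSON-RPC 2.0 body for an MCP tools/call request."""
    payload = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
    }
    return json.dumps(payload)


if __name__ == "__main__":
    print(tool_call_request("docker_list_containers"))
```

In practice an MCP client library would handle this framing; the sketch only shows what goes over the wire for a single call.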

## Agent Web Pages

Static HTML dashboards served at `agents.nxm.co.za/<name>/` from `/opt/sites/`:
- agents-dashboard, bran, changelog, citadel, hermes, hermes-native, hodor, jon-snow, qyburn, raven, sam, security-review, setup, stock, swarm, tarly, varys, workflow-test

## Phase 3 — infra-monitor (NEXT)
Skill scaffold at `skills/infra-monitor/skill.md`. Ready to implement after spec update.

Skill scaffold at `skills/infra-monitor/skill.md`. **Spec is stale — needs update before building.**

**Goal:** Docker container state + system resource checks. Complements Varys (HTTP reachability) — do not duplicate.

**Before building:**
- Update `skills/infra-monitor/skill.md` — container list is stale (has Flowise, missing Open WebUI + all new agents)
- Correct Ollama URL: now `http://172.27.40.20:11434` (migrated from 172.27.6.139)
- Update `skills/infra-monitor/skill.md` — container list is stale (references Flowise/Netbird, missing 20+ services)
- Ollama URL is now `http://172.27.40.20:11434`
- Decide: Docker one-shot container (consistent with bran/varys) or host cron + shell script?

**Output targets:**
@@ -30,38 +86,59 @@ Skill scaffold at `skills/infra-monitor/skill.md`. Ready to implement after spec
- `/opt/agent-os/logs/infra-monitor/last-run.json` — machine-readable, read by Varys watchdog
- Raven alert on critical: `http://raven-notify:8400`

**Schedule:** hourly heartbeat (Docker + Ollama only) + daily 07:00 full digest
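
Since infra-monitor is not yet built, the following is only a sketch of how the machine-readable output target could be produced. Everything about the report shape is an assumption: the field names, the `ok`/`warning`/`critical` levels, and the idea of passing in a set of critical container names are hypothetical and would need to match the updated `skill.md`. The file is written to a temporary name and then renamed, so a reader such as the Varys watchdog should not see a half-written file.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def build_report(container_states: dict, critical: set) -> dict:
    """Summarise container states; any non-running critical container is 'critical'."""
    down = sorted(name for name, state in container_states.items() if state != "running")
    critical_down = [name for name in down if name in critical]
    status = "critical" if critical_down else ("warning" if down else "ok")
    return {
        "skill": "infra-monitor",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": status,
        "down": down,
        "critical_down": critical_down,
    }


def write_last_run(report: dict, log_dir: Path) -> Path:
    """Write last-run.json via a temp file plus rename in the same directory."""
    log_dir.mkdir(parents=True, exist_ok=True)
    target = log_dir / "last-run.json"
    fd, tmp_name = tempfile.mkstemp(dir=log_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(report, handle, indent=2)
    os.replace(tmp_name, target)  # replaces the old file in one step
    return target
```

A run would call `write_last_run(report, Path("/opt/agent-os/logs/infra-monitor"))` and, when `report["status"]` is `"critical"`, notify Raven. Raven's request path and payload format are not described in this document, so that call is not shown.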

## Directory Structure
```
/opt/agent-os/
├── CLAUDE.md ← this file (project brief, tracked in Gitea)
├── identity.md ← populated Phase 2
├── brain.md
├── CLAUDE.md ← this file (project brief)
├── README.md ← onboarding for new LLMs
├── identity.md ← who the user is, hard limits
├── brain.md ← all infra facts, IPs, services, decisions
├── memory/
│   ├── active-projects.md ← update at end of each session
│   ├── persistent.md
│   ├── recent-decisions.md
│   ├── constraints.md
│   └── notes-from-last-run.md
├── context/
│   ├── active-projects.md ← what's in flight right now
│   ├── persistent.md ← facts that never expire
│   ├── recent-decisions.md ← decisions from last 30 days
│   ├── constraints.md ← hard limits agents must respect
│   └── notes-from-last-run.md ← cleared each session
├── claude-code/
│   └── memory/ ← Claude Code's persistent memory files (symlinked)
├── skills/
│   └── infra-monitor/ ← Phase 3 target
│       ├── skill.md ← spec (stale container list — update before building)
│   └── infra-monitor/ ← Phase 3 target (not yet built)
│       ├── skill.md
│       ├── learnings.md
│       ├── eval.json
│       ├── last-output.md
│       └── context/handoff.md
└── logs/
└── logs/ ← all agent log outputs
    ├── bran-changelog/
    ├── citadel-mcp/
    ├── jon-snow/
    ├── qyburn-coder/
    ├── raven-notify/
    ├── sam-research/
    ├── tarly-backup/
    ├── trmm-frappe-sync/
    └── varys-monitor/
```

## Architecture
- LLM inference: Kubuntu Ollama at `http://172.27.40.20:11434`
- All agent output: `/opt/sites/<name>/` served at agents.nxm.co.za
- Log standard: `/opt/agent-os/logs/<skill>/last-run.json`
- Notifications: Raven at `http://raven-notify:8400`

## Pending — Gitea SSH Key (security debt)
Server remote uses HTTP with embedded token. Before next token rotation:
1. Add SSH key for `nxm@172.27.40.3` to Gitea (Admin → Settings → SSH Keys)
2. `cd /opt/agent-os && git remote set-url origin gitea-local:admin/agent-os.git`
- **LLM inference:** Ollama at `http://172.27.40.20:11434` (gemma4, llama3.1:8b, phi4) + Anthropic API (Claude Code, Hermes)
- **Agent output pages:** `/opt/sites/<name>/` served at agents.nxm.co.za
- **Log standard:** `/opt/agent-os/logs/<agent>/last-run.json`
- **Notifications:** Raven at `http://raven-notify:8400` (Discord + Gmail)
- **Task tracking:** Plane at plane.nxm.co.za
- **Client CRM:** Directus at directus.nxm.co.za
- **Client devices:** Tactical RMM at 172.27.40.4 (45 agents, 13 clients)
- **Helpdesk:** Frappe Helpdesk at helpdesk.nxm.co.za (VM 109)
- **Credentials:** `~/.nxm-keys` (chmod 600) — ONLY place credential values live
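
The `chmod 600` requirement on the credentials file is easy to verify before an agent reads it. The sketch below checks only the permission bits on a POSIX system; the function name `has_owner_only_mode` is hypothetical, and the internal format of `~/.nxm-keys` is not described here, so no parsing is attempted.

```python
import stat
from pathlib import Path

KEYS_FILE = Path.home() / ".nxm-keys"  # location named in this document


def has_owner_only_mode(path: Path) -> bool:
    """True when the file's permission bits are exactly 0o600 (owner read/write)."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return mode == 0o600


if __name__ == "__main__":
    if not has_owner_only_mode(KEYS_FILE):
        raise SystemExit(f"{KEYS_FILE} must be chmod 600")
```

Failing closed, as in the `__main__` guard, keeps a script from loading credentials out of a file that other users on the host could read.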

## Key Gotchas

- **maester-reports restart = Anthropic API cost** — cache is in-memory only
- **Open WebUI → Citadel MCP:** auth_type must be `none` (empty bearer key = silent failure)
- **Docker → OPNsense API:** Docker proxy network can't reach 172.27.6.1 (HTTP 400) — run from host
- **Headscale v0.28:** all write operations require numeric user ID, not username
- **Vaultwarden:** requires HTTPS — use vault.nxm.co.za, not LAN IP
- **Tailscale on Windows:** overrides DNS — disconnect when testing split DNS
- **NPM forward scheme:** HTTP even for HTTPS external — NPM handles SSL termination
- **NocoDB:** RvDM personal birthday DB only — never use for Nexum projects