docs: comprehensive update — bring all Agent OS docs current for LLM onboarding
All files were 5-7 weeks stale. Updated brain.md (complete service/agent/VPN/cron inventory), identity.md (current expertise + infra context), CLAUDE.md (full agent ecosystem table, Citadel tool registry, gotchas), README.md (LLM quick-start guide), all memory files (current projects, decisions, constraints, persistent facts), and infra-monitor skill.md (current container list with criticality tiers).

Also fixed: git remote switched from HTTP+embedded-token to SSH; removed references to decommissioned services (Netbird, WireGuard, Flowise, Zabbix); corrected the Ollama IP (172.27.40.20) and TrueNAS IP (172.27.40.220). Added 20+ services/agents built since the last commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
+18
-14
@@ -1,7 +1,7 @@
# Active Projects

Current in-flight work. Update at the end of each session.
Last updated: 2026-05-16
Last updated: 2026-06-19

---
@@ -12,8 +12,8 @@ Phases 1 (NFS + mount) and 2 (identity interview) are complete.
**Phase 3 goal:** Docker container state monitoring + system resources. Complements Varys (HTTP reachability) — do not duplicate.

Pre-work before implementing:
- [ ] Update `skills/infra-monitor/skill.md` — container list is stale (has Flowise, missing Open WebUI + all new agents: citadel, varys, bran, sam, raven, qyburn, hodor, searxng, monitoring, bni-scheduler, nocodb)
- [ ] Correct Ollama URL in skill.md: now `http://172.27.40.20:11434` (moved from 172.27.6.139)
- [ ] Update `skills/infra-monitor/skill.md` — container list is stale (references Flowise/Netbird, missing 20+ current services)
- [ ] Correct Ollama URL in skill.md: now `http://172.27.40.20:11434` (moved from 172.27.6.139 → 172.27.40.20)
- [ ] Decide implementation: Docker one-shot container (consistent with bran/varys pattern) vs host cron + shell script

Implementation tasks:
@@ -26,23 +26,27 @@ Implementation tasks:
- [ ] Hourly heartbeat cron on 172.27.40.3
- [ ] Daily 07:00 full digest cron
- [ ] Notification channel: Raven (confirmed live at http://raven-notify:8400)
- [ ] Home Assistant integration (172.27.10.6) — optional, revisit after Phase 3
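A minimal Python sketch of the Phase 3 container-state check, assuming the monitor shells out to `docker ps --all --format '{{json .}}'` on 172.27.40.3 (one JSON object per line; recent Docker CLI versions include `Names` and `State` keys). The `EXPECTED` names and criticality tiers are hypothetical placeholders, not the real inventory. One-shot containers such as bran and varys exit normally after each run, so they are deliberately left out; judging those would need their `last-run.json` instead.

```python
import json
import subprocess

# Hypothetical long-running containers and criticality tiers (placeholders, not the real list).
EXPECTED = {
    "citadel": "critical",
    "raven-notify": "critical",
    "searxng": "low",
    "nocodb": "low",
}


def parse_docker_ps(output: str) -> dict[str, str]:
    """Map container name -> state from `docker ps --format '{{json .}}'` output."""
    states = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        # "Names" may hold several comma-separated names; keep the first.
        name = row["Names"].split(",")[0]
        states[name] = row.get("State", "unknown")
    return states


def find_problems(states: dict[str, str], expected: dict[str, str]) -> list[str]:
    """List expected containers that are missing or not in the 'running' state."""
    problems = []
    for name in sorted(expected):
        tier = expected[name]
        state = states.get(name)
        if state is None:
            problems.append(f"[{tier}] {name}: missing")
        elif state != "running":
            problems.append(f"[{tier}] {name}: {state}")
    return problems


def collect_states() -> dict[str, str]:
    """Query the local Docker daemon; needs permission to run the docker CLI."""
    result = subprocess.run(
        ["docker", "ps", "--all", "--format", "{{json .}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_docker_ps(result.stdout)
```

Calling `find_problems(collect_states(), EXPECTED)` from the hourly heartbeat, and notifying only when the list is non-empty, would keep this check separate from Varys, which already covers HTTP reachability.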
## Agent OS — Phase 5: Future Skills (Future)
- backup-monitor: TrueNAS migrated to new hardware (172.27.40.220) — skill ready to build
- Netbird/Headscale peer health: Netbird API at http://172.22.0.11:80/api/
- backup-monitor: extend Tarly with deeper TrueNAS integration
- Daily log digest: summarise /opt/agent-os/logs/ via Ollama
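The "Daily log digest" idea could look roughly like this sketch. It assumes the log layout recorded in the persistent-memory file (`/opt/agent-os/logs/<agent>/last-run.json`) and Ollama's `POST /api/generate` endpoint with `"stream": false`, which returns the generated text in a `response` field. The log files are passed through as opaque JSON because their schema is not documented here; the prompt wording and the choice of `llama3.1:8b` are assumptions. Running the script on the Docker host is consistent with the no-GPU constraint, since inference happens on 172.27.40.20.

```python
import json
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://172.27.40.20:11434/api/generate"
LOG_ROOT = Path("/opt/agent-os/logs")


def collect_last_runs(root: Path) -> dict[str, object]:
    """Read every <agent>/last-run.json under root; no particular schema is assumed."""
    runs = {}
    for path in sorted(root.glob("*/last-run.json")):
        try:
            runs[path.parent.name] = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError) as exc:
            runs[path.parent.name] = {"error": f"unreadable: {exc}"}
    return runs


def build_payload(runs: dict[str, object], model: str = "llama3.1:8b") -> dict:
    """Build a non-streaming /api/generate request body (prompt text is illustrative)."""
    prompt = (
        "Summarise these Agent OS agent runs as a short daily digest. "
        "List failures first.\n\n" + json.dumps(runs, indent=2, sort_keys=True)
    )
    return {"model": model, "prompt": prompt, "stream": False}


def summarise(payload: dict, url: str = OLLAMA_URL, timeout: float = 300) -> str:
    """POST the payload to Ollama and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```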
---

## Gitea Documentation Repos
- [x] nxm-infrastructure repo — Obsidian vault imported, CLAUDE.md added 2026-05-16
- [x] nexum-projects repo — Obsidian vault imported (on Kubuntu)
- [x] agent-os repo — scaffolding created, CLAUDE.md is global symlink
## Active Infrastructure Projects

| Project | Status | Next Step |
|---|---|---|
| **Monitoring** | bezhuis+mwp+coetzee alerts live | CPU/mem/WAN/ping Grafana rules pending |
| **OpenVPN S2S** | bezhuis/mwp/coetzee DONE | fwlaw pending |
| **Tarly Backup** | Hub working | bezhuis/mwp/coetzee API key fix (backup privilege) |
| **Directus CRM** | LIVE, 12 clients seeded | Manual data enrichment (contacts, renewals) |
| **InvenTree** | LIVE (testing) | SSL cert, production use |
| **Mailcow** | MAIL-1+2 done | Blocked on Mimecast (MAIL-3→9) |
| **Vexis** | nexum-private-customer-setup + office-install done | ESET/Evolve creds or standard-setup next |
| **Maester Phase 2** | Phase 1 live | Hermes narrative + .docx generation |

---

## Pending: Gitea SSH Key (security debt)
Server remote uses HTTP with embedded token. Before rotating:
1. Add SSH key for `nxm@172.27.40.3` to Gitea (Admin → Settings → SSH Keys)
2. `cd /opt/agent-os && git remote set-url origin gitea-local:admin/agent-os.git`
## Gitea SSH Key — DONE
Server remote switched from HTTP+token to SSH (`gitea-local:admin/agent-os.git`) on 2026-06-19.
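A small sketch for confirming the switch stuck, for example as a pre-flight check in an agent. It classifies the URL printed by `git remote get-url origin`. `gitea-local` is presumably a host alias in `~/.ssh/config`, which is why the scp-style form without `://` is treated as SSH; the default repository path and the category names are illustrative, and a username or password inside an HTTP(S) URL is taken as a sign that a token is stored in `.git/config`.

```python
import subprocess
from urllib.parse import urlsplit


def classify_remote(url: str) -> str:
    """Classify a git remote URL as 'ssh', 'http-with-credentials', 'http', or 'other'."""
    if "://" not in url:
        # scp-like syntax: [user@]host:path, e.g. an ~/.ssh/config alias such as gitea-local
        return "ssh" if ":" in url else "other"
    parts = urlsplit(url)
    if parts.scheme == "ssh":
        return "ssh"
    if parts.scheme in ("http", "https"):
        # Embedded userinfo means a credential lives in the remote URL itself.
        if parts.username or parts.password:
            return "http-with-credentials"
        return "http"
    return "other"


def origin_url(repo: str = "/opt/agent-os") -> str:
    """Return the configured origin URL for the repository at `repo`."""
    result = subprocess.run(
        ["git", "-C", repo, "remote", "get-url", "origin"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```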
+33
-5
@@ -1,13 +1,41 @@
# Constraints

Hard limits agents must respect. Never work around these without explicit user confirmation.
Last updated: 2026-04-30
Last updated: 2026-06-19

---

- Never take destructive or irreversible action without explicit confirmation (delete, overwrite, drop, reset, force push)
- Never store credentials in output files, logs, or generated markdown — reference their location instead
- Never skip git hooks or bypass signing
- TrueNAS is on new hardware — use 172.27.40.220 (Servers40) for services, 172.27.6.221 for management/API
## Destructive actions
- Never delete or overwrite files without explicit confirmation
- Never restart or stop services without explicit confirmation
- Never drop, reset, or modify databases without explicit confirmation
- Never force push to git or bypass hooks
- Never run `pfctl` commands on OPNsense (risk of locking out remote access)

## Credentials
- All credentials live in `~/.nxm-keys` (chmod 600) — ONLY location
- Never store credentials in output files, logs, generated markdown, .env files, or code
- Reference the file location, never the values
- TrueNAS IPs: 172.27.40.220 (Servers40 data) / 172.27.6.221 (management/API)
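A permission check for the credentials file, as a minimal Python sketch assuming a POSIX host. It only calls `stat` and never opens the file, so it cannot leak values into logs, in line with the rules above. The function name and message wording are hypothetical.

```python
import stat
from pathlib import Path

KEYS_FILE = Path.home() / ".nxm-keys"


def keys_file_problems(path: Path = KEYS_FILE) -> list[str]:
    """Check that the credentials file exists, is a regular file, and has mode 0600."""
    try:
        st = path.stat()  # metadata only; the file contents are never read
    except FileNotFoundError:
        return [f"{path} does not exist"]
    problems = []
    if not stat.S_ISREG(st.st_mode):
        problems.append(f"{path} is not a regular file")
    mode = stat.S_IMODE(st.st_mode)
    if mode != 0o600:
        problems.append(f"{path} has mode {mode:o}, expected 600")
    return problems
```

An empty list means the file matches the chmod 600 rule; anything else can be reported by path alone.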
## Infrastructure
- Linux server (172.27.40.3) has no GPU — never schedule LLM inference to run locally there
- Ollama runs on 172.27.40.20 (Windows 11 Pro) — not on the Docker host
- Docker Compose only — no Kubernetes, no Swarm
- Docker proxy network (172.22.0.0/16) cannot reach OPNsense API at 172.27.6.1 — always run OPNsense API scripts from the host
- NPM handles SSL termination — internal services always use HTTP

## Agent-specific
- **maester-reports:** restart clears in-memory cache → re-parses all evidence PDFs via Claude Opus vision (Anthropic API cost). Avoid unnecessary restarts.
- **NocoDB:** RvDM personal birthday DB ONLY — never suggest for any Nexum project. Nexum data layer = Directus.
- **Open WebUI → Citadel MCP:** auth_type must be `none`. Empty bearer key generates illegal header → silent connection failure.
- **Qyburn task specs:** never embed code in the description field — use plain English only (14b models explain code instead of writing it)

## External communication
- Never send any external message (email, webhook, Discord notification) without explicit confirmation
- Notifications always route through Raven (http://raven-notify:8400)
- Never expose services publicly without confirming NPM + Cloudflare + firewall implications

## Naming
- S2S = always suggest Site-to-Site VPN (not Road Warrior) for permanent infrastructure endpoints
- Use `.50+` IP range for non-firewall infrastructure devices on S2S tunnels
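One way to apply the `.50+` convention when picking an address, sketched with Python's standard `ipaddress` module. It assumes the rule means "host offset 50 or higher within the subnet"; the subnet prefix is not recorded anywhere above, so a `/24` is an assumption. For example, `next_device_ip("172.16.17.0/24", {"172.16.17.2", "172.16.17.3", "172.16.17.4"})` skips the hub tunnel IPs of the firewalls and returns `172.16.17.50`.

```python
import ipaddress


def next_device_ip(subnet: str, used: set[str], start: int = 50) -> str:
    """Return the first free host address whose offset in `subnet` is >= `start`."""
    net = ipaddress.ip_network(subnet)
    used_addrs = {ipaddress.ip_address(u) for u in used}
    for host in net.hosts():
        # Offset of this host from the network address, e.g. .50 -> 50 in a /24.
        offset = int(host) - int(net.network_address)
        if offset >= start and host not in used_addrs:
            return str(host)
    raise ValueError(f"no free address at .{start}+ in {subnet}")
```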
+32
-6
@@ -1,18 +1,44 @@
# Persistent Memory

Facts that don't expire. If you'd have to re-explain it to a new agent every time, it belongs here.
Last updated: 2026-04-30
Last updated: 2026-06-19

---

## Infrastructure decisions
- RustDesk is self-hosted on 172.27.40.3 — clients connect to local server not public relay
- Netbird signal+management both route through NPM on port 443 — exposedAddress in /opt/stacks/netbird/config.yaml must be https://netbird.nxm.co.za:443 (caddy-netbird on :8443 exists but is not used externally)
- NPM handles all SSL termination — internal services use HTTP, NPM adds HTTPS
- Headscale v0.28: all write operations require numeric user ID, not username
- Tailscale on Windows overrides DNS — disconnect before testing split DNS changes
- Servers running Tailscale must run `sudo tailscale set --accept-dns=false` before joining Netbird
- Docker Compose only — no Kubernetes, no Swarm
- Docker → OPNsense API: HTTP 400 from Docker proxy network — always run OPNsense API scripts from the host
- All internal subdomains: gray-cloud CNAME → opnsense.nxm.co.za in Cloudflare. Proxied = 523 error.
- OPNsense split DNS: all subdomains resolve to 172.27.40.3 internally via Unbound host overrides

## Decommissioned services (do not reference)
- **Netbird:** Fully removed from server 2026-05-28. Orphaned clients on mwp/coetzee/b0qxxx/fwlaw firewalls pending removal.
- **WireGuard (N2W):** Fully removed 2026-05-30. Replaced by OpenVPN S2S.
- **Flowise:** Replaced by Open WebUI 2026-05-01.
- **Zabbix:** No longer running (monitoring moved to Grafana + InfluxDB + Telegraf).

## Agent OS build state
- Phase 1-2 (file structure + NFS + identity interview): not yet started
- First skill to build: infra-monitor (Docker health + agent watchdog)
- Notifications target: Home Assistant at 172.27.10.6
- Phase 1-2 complete (file structure + identity interview)
- Phase 3 (infra-monitor skill): spec written but stale, not yet implemented
- Notifications target: Raven at http://raven-notify:8400 (Discord + Gmail)
- All agent logs write to `/opt/agent-os/logs/<agent>/last-run.json`

## Credential policy
- All API keys and passwords: `~/.nxm-keys` (chmod 600)
- Never write credential values into output, logs, docs, or config files
- Reference credential location instead

## VPN topology
- **Headscale** (self-hosted Tailscale): remote access for admin devices
- **OpenVPN S2S:** site-to-site for client firewalls (bezhuis/mwp/coetzee done, fwlaw pending)
- Hub tunnel IPs: bezhuis=172.16.17.2, mwp=172.16.17.3, coetzee=172.16.17.4

## Ollama
- Host: 172.27.40.20 (Windows 11 Pro, NxM-AI), Vulkan GPU
- Models: gemma4, llama3.1:8b, phi4
- Auto-starts via Scheduled Task (S4U + AtStartup)
- Used by: hodor-gateway, sam-research, qyburn-coder, Open WebUI
@@ -1,11 +1,24 @@
# Recent Decisions

Decisions made in the last 30 days that affect current work. Archive when no longer relevant.
Last updated: 2026-04-30
Decisions made in the last 60 days that affect current work. Archive when no longer relevant.
Last updated: 2026-06-19

---

- **2026-04-30:** Chose Gitea (self-hosted git) over Obsidian for documentation — AI-writable, browser-accessible, version controlled
- **2026-04-30:** Agent OS files to live on 172.27.40.3 at /opt/agent-os/, accessed from Kubuntu via NFS
- **2026-04-29:** Chose Syncthing-free approach for Obsidian migration — NFS for Linux, SMB for Windows
- **2026-04-29:** infra-monitor will be first Agent OS skill — covers Docker health and agent watchdog in one skill
- **2026-06-19:** Agent OS git remote switched from HTTP+token to SSH (gitea-local:admin/agent-os.git) — security debt resolved
- **2026-06-19:** Comprehensive Agent OS documentation update — brain.md, identity.md, all memory files brought current for LLM onboarding
- **2026-06-18:** Coetzee OpenVPN S2S complete — monitoring-only + hub-side NAT for Active Backup to Synology DS423+
- **2026-06-18:** Tarly backup service live — OPNsense config backups to TrueNAS NFS, Proxmox monitoring
- **2026-06-17:** Directus CRM live — 6 collections, 12 clients seeded from TRMM, 5 Citadel MCP tools
- **2026-06-17:** MWP Netbird fully removed, WireGuard spoke cleaned
- **2026-06-12:** NxM-AI (Kubuntu) migrated to Windows 11 Pro — same IP 172.27.40.20, Ollama via Scheduled Task
- **2026-06-12:** Vexis office-install/uninstall scripts live-tested, windows-update scripts done
- **2026-06-11:** Workshop20 → Servers40 firewall rules (1677-1680) for TRMM + Vexis access
- **2026-06-10:** Frappe Helpdesk live — TRMM→HD sync, Citadel tools, Vexis wired
- **2026-06-10:** trmm_confirm_with_user proven working (incl. response-parsing bug fix)
- **2026-05-30:** WireGuard fully removed, replaced by OpenVPN S2S
- **2026-05-29:** Maester reports Phase 1 live — 8 automated CSF controls, Grafana dashboard
- **2026-05-28:** Netbird fully removed from server
- **2026-05-28:** ZenArmor → Grafana pipeline all 3 phases complete
- **2026-05-27:** Jon Snow Phase 3 complete — approval gate, Discord approve/reject
- **2026-04-30:** Agent OS architecture: plain markdown files at /opt/agent-os/, Gitea-tracked, cron-scheduled
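The 60-day archive rule can be applied mechanically. This sketch assumes every decision keeps the `- **YYYY-MM-DD:** ...` bullet format used above and that the reference date is the `Last updated` value; the function name is hypothetical. With 2026-06-19 as the reference, the cutoff is 2026-04-20, so even the oldest entries (2026-04-29) would still be kept.

```python
import re
from datetime import date, timedelta

# Matches the decision bullet prefix, e.g. "- **2026-04-30:** ..."
ENTRY = re.compile(r"^- \*\*(\d{4}-\d{2}-\d{2}):\*\* ")


def split_decisions(lines: list[str], today: date, window_days: int = 60):
    """Split lines into (keep, archive); dated bullets older than the window are archived."""
    cutoff = today - timedelta(days=window_days)
    keep, archive = [], []
    for line in lines:
        m = ENTRY.match(line)
        if m and date.fromisoformat(m.group(1)) < cutoff:
            archive.append(line)
        else:
            # Undated lines and recent decisions stay in the file.
            keep.append(line)
    return keep, archive
```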