NxM Infrastructure Project

Primary Linux Server

  • IP: 172.27.40.3
  • User: nxm
  • Password: 6589
  • OS: Ubuntu Server (LTS)
  • Docker stack root: /opt/stacks/

Network

| VLAN | Name | Subnet | Gateway |
| --- | --- | --- | --- |
| 40 | Servers40 | 172.27.40.0/24 | 172.27.6.1 |
| 20 | Workshop20 | 172.27.20.0/24 | 172.27.6.1 |
| 10 | IoT10 | 172.27.10.0/24 | 172.27.6.1 |

Key Devices

| Device | IP | Role |
| --- | --- | --- |
| OPNsense Firewall | 172.27.6.1 | Firewall, router, DHCP |
| Ubuntu Server | 172.27.40.3 | Docker host, Headscale |
| TrueNAS | 172.27.40.5 | NAS storage |
| Home Assistant | 172.27.10.6 | Home automation (IoT10) |

Docker Stacks & Ports

| Stack | Path | Port | Notes |
| --- | --- | --- | --- |
| Portainer | /opt/stacks/portainer/ | 9443 (HTTPS) | |
| Nginx Proxy Manager | /opt/stacks/nginx-proxy-manager/ | 80, 81, 443 | |
| Uptime Kuma | /opt/stacks/uptime-kuma/ | 3002 | |
| Zabbix | /opt/stacks/zabbix/ | 8091 | |
| Headscale | /opt/stacks/headscale/ | 8080 (internal) | |
| Headplane | /opt/stacks/headplane/ | 3001 | |
| Headscale UI | /opt/stacks/headscale-ui/ | 3005 | |
| Homarr | /opt/stacks/homarr/ | 7575 | |
| Dashy | /opt/stacks/dashy/ | 4000 | |
| Vaultwarden | /opt/stacks/vaultwarden/ | 8222 | |
| Netbird | /opt/stacks/netbird/ | 3479/udp | STUN |
| Caddy (Netbird sidecar) | /opt/stacks/caddy-netbird/ | 8443/tcp | gRPC proxy for Netbird clients |
| Plane | /opt/stacks/plane/ | 8095 (HTTP, via NPM) | |
| Gitea | /opt/stacks/gitea/ | 3000 (web), 2222 (SSH git) | Self-hosted git, infrastructure docs |
| Open WebUI | /opt/stacks/open-webui/ | 3010 | Chat UI for Ollama + MCP (replaced Flowise 2026-05-01) |
| agent-sites | /opt/stacks/sites/ | internal only (proxy network) | nginx:alpine serving /opt/sites/ at agents.nxm.co.za |
| hodor-gateway | /opt/stacks/hodor-gateway/ | 8200 | FastAPI agent gateway, POST /ask → Ollama |
| bran-changelog | /opt/stacks/bran-changelog/ | | One-shot container, run.sh + cron 06:00 daily |
| citadel-mcp | /opt/stacks/citadel-mcp/ | 8300 | MCP SSE+HTTP server; tools: list_agents, get_agent_status, get_agent_output, web_search |
| varys-monitor | /opt/stacks/varys-monitor/ | | One-shot container, run.sh + cron every 15 min |
| sam-research | /opt/stacks/sam-research/ | 8500 | Research agent, POST /research → SearXNG + Ollama |
| searxng | /opt/stacks/searxng/ | 8600 | Self-hosted search backend (internal only, used by sam + citadel) |

Public Subdomains (via NPM + Let's Encrypt)

| Subdomain | Internal Target |
| --- | --- |
| headscale.nxm.co.za | 172.27.40.3:8080 |
| vault.nxm.co.za | 172.27.40.3:8222 |
| kuma.nxm.co.za | 172.27.40.3:3002 |
| zabbix.nxm.co.za | 172.27.40.3:8091 |
| netbird.nxm.co.za | netbird-dashboard:80 via NPM (dashboard UI only) |
| netbird.nxm.co.za:8443 | Caddy sidecar → netbird-server:80 (client management, gRPC) |
| plane.nxm.co.za | 172.27.40.3:8095 |
| git.nxm.co.za | 172.27.40.3:3000 |
| rmm.nxm.co.za | 172.27.40.4:443 |
| api.nxm.co.za | 172.27.40.4:443 |
| mesh.nxm.co.za | 172.27.40.4:4430 |

OPNsense Split DNS (Unbound Host Overrides)

All subdomains resolve to 172.27.40.3 internally via Unbound overrides. If a subdomain isn't resolving internally, check:

  1. OPNsense → Services → Unbound DNS → Overrides
  2. Tailscale client on Windows — disconnect it, as it overrides local DNS
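The two checks above can be scripted roughly as follows. This is a minimal sketch, not an established tool in this project: it assumes `dig` is installed where the check runs and that Unbound answers on the OPNsense address 172.27.6.1 (the firewall IP is from the Key Devices table; that it is also the DNS listener is an assumption). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify that public subdomains resolve to the internal proxy IP.
# Assumptions (not from the original notes): dig is available, and Unbound
# answers on the OPNsense address below.

EXPECTED_IP="172.27.40.3"
RESOLVER="172.27.6.1"

# classify_answer NAME ANSWER
# Prints a one-line verdict; returns 0 only when ANSWER is the expected IP.
classify_answer() {
  local name="$1" answer="$2"
  if [ -z "$answer" ]; then
    echo "FAIL $name: no answer (check Unbound host overrides)"
    return 1
  elif [ "$answer" = "$EXPECTED_IP" ]; then
    echo "OK   $name -> $answer"
    return 0
  else
    echo "FAIL $name -> $answer (expected $EXPECTED_IP; is a VPN client overriding DNS?)"
    return 1
  fi
}

# check_subdomains NAME...
# Queries the resolver directly for each name; returns 1 if any name fails.
check_subdomains() {
  local name answer rc=0
  for name in "$@"; do
    answer="$(dig +short "@$RESOLVER" "$name" A | head -n 1)"
    classify_answer "$name" "$answer" || rc=1
  done
  return "$rc"
}
```

For example, `check_subdomains vault.nxm.co.za git.nxm.co.za` queries Unbound directly. If that passes but a plain `dig +short vault.nxm.co.za` on the same machine returns a public address, the local resolver is being overridden, which points at the Tailscale client rather than at the overrides.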

Headscale (VPN)

  • Version: v0.28
  • Public URL: https://headscale.nxm.co.za
  • Policy mode: database (live apply via API)
  • v0.28 breaking change: All write operations require numeric user ID, not username
    • Get IDs: headscale users list
    • Example: headscale preauthkeys create --user 13
  • Config: /opt/stacks/headscale/config/config.yaml
  • After a restart, always run tailscale status; if the server node shows as offline, run sudo systemctl restart tailscaled
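Because v0.28 write operations need the numeric ID, a small lookup helper can save a manual step. The sketch below assumes `jq` is installed and that `headscale users list --output json` emits an array of objects with `id` and `name` fields; both are assumptions to verify against the running version. The function names and the username in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: resolve a Headscale username to its numeric ID before a write.
# Assumptions: jq is installed; the JSON output is an array of objects
# carrying "id" and "name" fields.

# user_id_from_json USERNAME
# Reads the users JSON on stdin and prints the matching ID (empty if none).
user_id_from_json() {
  jq -r --arg name "$1" '.[] | select(.name == $name) | .id'
}

# headscale_user_id USERNAME
# Prints the numeric ID, or fails with a message if the user is unknown.
headscale_user_id() {
  local id
  id="$(headscale users list --output json | user_id_from_json "$1")"
  if [ -z "$id" ]; then
    echo "no such Headscale user: $1" >&2
    return 1
  fi
  echo "$id"
}
```

Usage would look like `headscale preauthkeys create --user "$(headscale_user_id someuser)"`. If Headscale only runs inside its container, the `headscale` call needs to be wrapped in the appropriate `docker exec`.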

Vaultwarden

  • Admin panel: https://vault.nxm.co.za/admin (token in OneNote)
  • Signups currently OPEN (disable after all 4 users register)
  • To disable: set SIGNUPS_ALLOWED=false in /opt/stacks/vaultwarden/.env, then run docker compose up -d
  • Admin token needs Argon2 upgrade: docker exec -it vaultwarden /vaultwarden hash --preset owasp
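The signup change above can be made safe to repeat with a small helper that either rewrites the existing line or appends a new one. This is a sketch under stated assumptions: `set_env_var` is a hypothetical name, it relies on GNU `sed -i` as shipped with Ubuntu, and it expects values that contain no `|` or `&` characters.

```shell
#!/usr/bin/env bash
# Sketch: idempotently set KEY=VALUE in a .env file.
# Assumptions: GNU sed (Ubuntu); VALUE contains no '|' or '&'.

# set_env_var FILE KEY VALUE
# Replaces an existing KEY=... line, or appends one if KEY is absent.
set_env_var() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file"; then
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Example (run on the server, then recreate the container):
#   set_env_var /opt/stacks/vaultwarden/.env SIGNUPS_ALLOWED false
#   cd /opt/stacks/vaultwarden && docker compose up -d
```

Running it twice leaves a single `SIGNUPS_ALLOWED=false` line, which fits the preference for idempotent operations stated in the Agent Guidelines.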

Domain Knowledge

  • Networking: VLANs, inter-VLAN routing, firewall rules, NAT, split DNS, DHCP — comfortable at OPNsense config level, not just GUI clicks
  • DNS: Unbound overrides, split-horizon, DNS-over-TLS, troubleshooting resolution order (Tailscale/Netbird conflict patterns)
  • VPN: WireGuard fundamentals, Headscale (self-hosted Tailscale), Netbird (self-hosted, embedded Dex IdP), relay vs P2P, ACL/policy models
  • Docker: Compose stacks, container networking, volume mounts, healthchecks, reverse proxy patterns — not Kubernetes
  • Linux: Ubuntu Server admin, systemd, cron, file permissions, basic shell scripting — not a kernel developer
  • Reverse proxy: NPM (OpenResty), Caddy — knows the difference between HTTP/HTTPS termination and gRPC proxying
  • Self-hosted services: IPAM (NetBox), monitoring (Zabbix, Uptime Kuma, VictoriaMetrics, Grafana), dashboards (Homarr, Dashy), secrets (Vaultwarden)
  • Not expert in: Kubernetes, cloud platforms (AWS/Azure/GCP), advanced Python (learning), application development

Agent Guidelines

  • Never take destructive or irreversible action without explicit confirmation (delete, overwrite, drop, reset)
  • Never store credentials in output, logs, or generated files — reference credential location instead
  • Always return structured output (JSON or markdown table) unless plain text is explicitly requested
  • When suggesting shell commands, use PowerShell syntax for Windows-side tasks, bash for Linux server tasks
  • SSH to Linux server always uses the Posh-SSH pattern defined in the Shell & Tools section
  • Docker commands always use docker compose (not docker-compose) with -q on pulls
  • If a task touches Headscale, Netbird, or NPM config — flag the relevant Known Issues entry before proceeding
  • Prefer idempotent operations — scripts should be safe to run more than once
  • When uncertain about current server state, ask rather than assume
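As an illustration of how the Docker and idempotency guidelines combine, the sketch below updates one stack with `docker compose` and a quiet pull, and offers a dry-run mode so the commands can be reviewed first. `stack_update` and the `STACK_ROOT` variable are hypothetical names; the stack root path is the one given under Primary Linux Server.

```shell
#!/usr/bin/env bash
# Sketch: update a single stack under /opt/stacks/ per the guidelines above.
# STACK_ROOT can be overridden for testing; the default is from these notes.
STACK_ROOT="${STACK_ROOT:-/opt/stacks}"

# stack_update NAME [--dry-run]
# With --dry-run, prints the commands instead of running them.
stack_update() {
  local name="$1" mode="${2:-}" dir
  dir="$STACK_ROOT/$name"
  if [ "$mode" = "--dry-run" ]; then
    echo "cd $dir"
    echo "docker compose pull -q"
    echo "docker compose up -d"
    return 0
  fi
  if [ ! -d "$dir" ]; then
    echo "no such stack directory: $dir" >&2
    return 1
  fi
  # Subshell keeps the caller's working directory unchanged.
  ( cd "$dir" && docker compose pull -q && docker compose up -d )
}
```

Both steps are safe to repeat: a second pull finds nothing new, and `docker compose up -d` only recreates containers whose image or configuration changed. A typical review step is `stack_update uptime-kuma --dry-run` before running it for real.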

Credentials Location

  • All service passwords: Jaco's OneNote + Nexum Password Spreadsheet
  • No passwords stored in code or config files (except SSH for automation)

Documentation (Gitea)

  • Obsidian vaults migrated to Gitea on 2026-04-30
  • nxm-infrastructure repo: https://git.nxm.co.za/admin/nxm-infrastructure
    • Local path: /home/nxm/Documents/NxM Linux Server/
    • Remote: gitea-local:admin/nxm-infrastructure.git (SSH)
  • nexum-projects repo: https://git.nxm.co.za/admin/nexum-projects
    • Local path: /home/nxm/Documents/Nexum Projects/
    • Remote: gitea-local:admin/nexum-projects.git (SSH)
  • agent-os repo: https://git.nxm.co.za/admin/agent-os
    • Local path: /home/nxm/Documents/agent-os/
    • Server runtime: /opt/agent-os/ on 172.27.40.3
    • Remote: gitea-local:admin/agent-os.git (SSH)
  • At end of any session that changes infrastructure: update relevant markdown files, commit and push
  • Key files to update after infrastructure changes:
    • Home.md — master index
    • Quick Reference/IP & Port Map.md — any new service
    • Quick Reference/Docker Stacks.md — any new stack
    • Relevant Services/<name>.md — service-specific docs
    • Troubleshooting/ — any new known issues

Known Issues & Gotchas

  • Docker pull logs are very verbose — always use -q flag
  • Headscale YAML: never add duplicate policy: block (causes crash)
  • Homarr board names must be slugs (no spaces)
  • Vaultwarden requires HTTPS — LAN IP (port 8222) will show crypto error, use vault.nxm.co.za
  • Tailscale client on Windows overrides DNS — disconnect when testing split DNS changes
  • Chrome profile-specific links always open in last active profile (Windows limitation)
  • NPM forward scheme is HTTP even for HTTPS external — NPM handles SSL termination
  • Netbird STUN is on 3479/udp (not 3478 — Headscale owns that)
  • Netbird client management URL is port 8443 (Caddy sidecar) — NOT 443
  • NPM (OpenResty) has no gRPC module — Caddy sidecar is the workaround until Traefik migration
  • Netbird config.yaml contains authSecret + encryptionKey — back this file up, losing it breaks all peers
  • Servers running Tailscale must run sudo tailscale set --accept-dns=false before joining Netbird (Tailscale DNS overrides Unbound and resolves via public IP, breaking gRPC hairpin)
  • Open WebUI → Citadel MCP: auth_type must be none — empty bearer key generates an illegal header and the connection silently fails
  • Open WebUI connects via Streamable HTTP POST at http://citadel-mcp:8300/mcp — do NOT use /sse (Open WebUI 0.9+ only supports POST-based transport)
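The duplicate policy: gotcha in the Headscale entry above can be caught before a restart with a quick pre-flight check. This is a rough sketch with hypothetical function names; it only counts lines where the key starts at column 0, so it does not validate the YAML in any other way.

```shell
#!/usr/bin/env bash
# Sketch: detect a duplicated top-level key in a YAML config before restart.

# count_top_level_key FILE KEY
# Prints how many lines start with "KEY:" at column 0.
count_top_level_key() {
  grep -c "^$2:" "$1" || true
}

# assert_single_policy_block FILE
# Returns 1 if policy: appears more than once, 2 if FILE is unreadable.
assert_single_policy_block() {
  local n
  if [ ! -r "$1" ]; then
    echo "cannot read $1" >&2
    return 2
  fi
  n="$(count_top_level_key "$1" policy)"
  if [ "$n" -gt 1 ]; then
    echo "ERROR: $n top-level policy: blocks in $1" >&2
    return 1
  fi
  return 0
}

# Example:
#   assert_single_policy_block /opt/stacks/headscale/config/config.yaml \
#     && echo "config looks safe to restart"
```

Commented-out lines such as `# policy:` are not counted, because the pattern requires the key at the very start of the line.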