Added: monitoring (InfluxDB/Grafana), qyburn-coder, netbox, bni-scheduler. Updated: citadel-mcp tool count (9→13), raven email/toggle endpoint. Subdomains added: grafana.nxm.co.za, netbox.nxm.co.za, agents.nxm.co.za, bni.nexum.co.za. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
NxM Infrastructure Project
Primary Linux Server
- IP: 172.27.40.3
- User: nxm
- Password: 6589
- OS: Ubuntu Server (LTS)
- Docker stack root: `/opt/stacks/`
Network
| VLAN | Name | Subnet | Gateway |
|---|---|---|---|
| 40 | Servers40 | 172.27.40.0/24 | 172.27.6.1 |
| 20 | Workshop20 | 172.27.20.0/24 | 172.27.6.1 |
| 10 | IoT10 | 172.27.10.0/24 | 172.27.6.1 |
Key Devices
| Device | IP | Role |
|---|---|---|
| OPNsense Firewall | 172.27.6.1 | Firewall, router, DHCP |
| Ubuntu Server | 172.27.40.3 | Docker host, Headscale |
| TrueNAS | 172.27.40.5 | NAS storage |
| Home Assistant | 172.27.10.6 | Home automation (IoT10) |
Docker Stacks & Ports
| Stack | Path | Port | Notes |
|---|---|---|---|
| Portainer | /opt/stacks/portainer/ | 9443 (HTTPS) | |
| Nginx Proxy Manager | /opt/stacks/nginx-proxy-manager/ | 80, 81, 443 | |
| Uptime Kuma | /opt/stacks/uptime-kuma/ | 3002 | |
| RustDesk | /opt/stacks/rustdesk/ | 21115-21116 TCP, 21116 UDP, 21117-21119 TCP | Self-hosted remote desktop relay |
| Headscale | /opt/stacks/headscale/ | 8080 (internal) | |
| Headplane | /opt/stacks/headplane/ | 3001 | |
| Headscale UI | /opt/stacks/headscale-ui/ | 3005 | |
| Homarr | /opt/stacks/homarr/ | 7575 | |
| Vaultwarden | /opt/stacks/vaultwarden/ | 8222 | |
| Netbird | /opt/stacks/netbird/ | 3479/udp STUN | |
| Caddy (Netbird sidecar) | /opt/stacks/caddy-netbird/ | 8443/tcp | gRPC proxy for Netbird clients |
| Plane | /opt/stacks/plane/ | 8095 (HTTP, via NPM) | |
| Gitea | /opt/stacks/gitea/ | 3000 (web), 2222 (SSH git) | Self-hosted git, infrastructure docs |
| Open WebUI | /opt/stacks/open-webui/ | 3010 | Chat UI for Ollama + MCP (replaced Flowise 2026-05-01) |
| agent-sites | /opt/stacks/sites/ | internal only (proxy network) | nginx:alpine serving /opt/sites/ at agents.nxm.co.za |
| hodor-gateway | /opt/stacks/hodor-gateway/ | 8200 | FastAPI agent gateway, POST /ask → Ollama |
| bran-changelog | /opt/stacks/bran-changelog/ | none (one-shot container) | run.sh + cron 06:00 daily |
| citadel-mcp | /opt/stacks/citadel-mcp/ | 8300 | MCP SSE+HTTP server, 13 tools incl. list_agents/web_search/qyburn_task/docker_rebuild |
| varys-monitor | /opt/stacks/varys-monitor/ | none (one-shot container) | run.sh + cron every 15 min |
| raven-notify | /opt/stacks/raven-notify/ | 8400 | Notification agent, POST /notify → Discord webhook + Gmail SMTP; POST /email/toggle |
| sam-research | /opt/stacks/sam-research/ | 8500 | Research agent, POST /research → SearXNG + Ollama |
| searxng | /opt/stacks/searxng/ | 8600 | Self-hosted search backend (internal only, used by sam + citadel) |
| monitoring | /opt/stacks/monitoring/ | 8086 (InfluxDB), 3020 (Grafana) | Metrics from Telegraf/OPNsense, alerts → Raven |
| qyburn-coder | /opt/stacks/qyburn-coder/ | 8700 | LLM coding agent, POST /task → qwen2.5-coder:14b, approve/reject via dashboard |
| netbox | /opt/stacks/netbox/ | 8100 | IPAM, network documentation, client site discovery |
| bni-scheduler | /opt/stacks/bni-scheduler/ | no host port (proxy only) | React SPA at bni.nexum.co.za, BNI Ignite speaker rotation |
Public Subdomains (via NPM + Let's Encrypt)
| Subdomain | Internal Target |
|---|---|
| headscale.nxm.co.za | 172.27.40.3:8080 |
| vault.nxm.co.za | 172.27.40.3:8222 |
| kuma.nxm.co.za | 172.27.40.3:3002 |
| zabbix.nxm.co.za | 172.27.40.3:8091 |
| netbird.nxm.co.za | netbird-dashboard:80 via NPM (dashboard UI only) |
| netbird.nxm.co.za:8443 | Caddy sidecar → netbird-server:80 (client management, gRPC) |
| plane.nxm.co.za | 172.27.40.3:8095 |
| git.nxm.co.za | 172.27.40.3:3000 |
| grafana.nxm.co.za | 172.27.40.3:3020 |
| netbox.nxm.co.za | 172.27.40.3:8100 |
| agents.nxm.co.za | agent-sites:80 via NPM — static files from /opt/sites/ |
| bni.nexum.co.za | bni-scheduler:80 via NPM (Cloudflare gray-cloud CNAME) |
| rmm.nxm.co.za | 172.27.40.4:443 |
| api.nxm.co.za | 172.27.40.4:443 |
| mesh.nxm.co.za | 172.27.40.4:4430 |
OPNsense Split DNS (Unbound Host Overrides)
All subdomains resolve to 172.27.40.3 internally via Unbound overrides. If a subdomain isn't resolving internally, check:
- OPNsense → Services → Unbound DNS → Overrides
- Tailscale client on Windows — disconnect it, as it overrides local DNS
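The internal-resolution check can be scripted from any Linux host on the LAN. This is a minimal sketch, not an established part of the setup: it assumes `dig` is installed and that Unbound answers on the OPNsense address 172.27.6.1 (the firewall IP is from Key Devices; the listening address is an assumption). The function names are hypothetical.

```shell
# Sketch: verify that split DNS overrides return the internal address.
# Function names are illustrative, not part of any existing tooling.

EXPECTED_IP="172.27.40.3"   # internal target stated in this section
RESOLVER="172.27.6.1"       # assumption: Unbound listens on the OPNsense LAN address

# Pure comparison helper: prints OK or MISMATCH for one name and its answer.
classify_answer() {
  local name="$1" answer="$2"
  if [ "$answer" = "$EXPECTED_IP" ]; then
    echo "OK $name -> $answer"
  else
    echo "MISMATCH $name -> ${answer:-no answer} (expected $EXPECTED_IP)"
  fi
}

# Query the resolver directly so a VPN client's DNS settings are bypassed.
# tail -n 1 keeps the final A record if the name goes through a CNAME.
check_split_dns() {
  local name answer
  for name in "$@"; do
    answer="$(dig +short "$name" A @"$RESOLVER" | tail -n 1)"
    classify_answer "$name" "$answer"
  done
}
```

For example, `check_split_dns kuma.nxm.co.za git.nxm.co.za vault.nxm.co.za` prints one line per name. A MISMATCH against the resolver points at a missing Unbound override; an OK here combined with a wrong answer on the client points at the client's own DNS settings, such as a connected Tailscale client.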
Headscale (VPN)
- Version: v0.28
- Public URL: https://headscale.nxm.co.za
- Policy mode: database (live apply via API)
- v0.28 breaking change: All write operations require numeric user ID, not username
  - Get IDs: `headscale users list`
  - Example: `headscale preauthkeys create --user 13`
- Config: `/opt/stacks/headscale/config/config.yaml`
- After restart, always check `tailscale status`; if the server node is offline, run `sudo systemctl restart tailscaled`
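Because v0.28 rejects usernames on write operations, a small wrapper can catch the mistake before the command reaches Headscale. The sketch below assumes Headscale runs in a container named `headscale` (the container name is an assumption; check the compose file under `/opt/stacks/headscale/`). The helper names `hs` and `create_preauthkey` are hypothetical.

```shell
# Sketch: guard Headscale write operations against non-numeric user arguments.
# Assumption: the container is called "headscale"; override via HEADSCALE_CONTAINER.
HEADSCALE_CONTAINER="${HEADSCALE_CONTAINER:-headscale}"

# Run any headscale subcommand inside the container.
hs() {
  docker exec "$HEADSCALE_CONTAINER" headscale "$@"
}

# Create a pre-auth key, refusing anything that is not a numeric user ID.
create_preauthkey() {
  case "$1" in
    ''|*[!0-9]*)
      echo "error: pass the numeric user ID from 'headscale users list', not a username" >&2
      return 1
      ;;
  esac
  hs preauthkeys create --user "$1"
}
```

Typical use is `hs users list` to read the ID column, then `create_preauthkey 13` with the ID found there.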
Vaultwarden
- Admin panel: https://vault.nxm.co.za/admin (token in OneNote)
- Signups currently OPEN (disable after all 4 users register)
- To disable: set `SIGNUPS_ALLOWED=false` in `/opt/stacks/vaultwarden/.env`, then run `docker compose up -d`
- Admin token needs Argon2 upgrade: `docker exec -it vaultwarden /vaultwarden hash --preset owasp`
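The signup change can be made in a way that is safe to run more than once. This is a minimal sketch assuming GNU `sed` (as shipped with Ubuntu) and a plain `KEY=value` `.env` file that ends with a newline; `set_env_var` is a hypothetical helper, not an existing script on the server.

```shell
# Sketch: idempotently set KEY=VALUE in an env file.
# Replaces an existing KEY=... line, otherwise appends one.
# Assumes GNU sed and values without "|" or "&" characters.
set_env_var() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file"; then
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

Usage on the server would look like `set_env_var /opt/stacks/vaultwarden/.env SIGNUPS_ALLOWED false`, followed by `docker compose up -d` from `/opt/stacks/vaultwarden/`. Running it a second time leaves the file unchanged.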
Domain Knowledge
- Networking: VLANs, inter-VLAN routing, firewall rules, NAT, split DNS, DHCP — comfortable at OPNsense config level, not just GUI clicks
- DNS: Unbound overrides, split-horizon, DNS-over-TLS, troubleshooting resolution order (Tailscale/Netbird conflict patterns)
- VPN: WireGuard fundamentals, Headscale (self-hosted Tailscale), Netbird (self-hosted, embedded Dex IdP), relay vs P2P, ACL/policy models
- Docker: Compose stacks, container networking, volume mounts, healthchecks, reverse proxy patterns — not Kubernetes
- Linux: Ubuntu Server admin, systemd, cron, file permissions, basic shell scripting — not a kernel developer
- Reverse proxy: NPM (OpenResty), Caddy — knows the difference between HTTP/HTTPS termination and gRPC proxying
- Self-hosted services: IPAM (NetBox), monitoring (Zabbix, Uptime Kuma, VictoriaMetrics, Grafana), dashboards (Homarr, Dashy), secrets (Vaultwarden)
- Not expert in: Kubernetes, cloud platforms (AWS/Azure/GCP), advanced Python (learning), application development
Agent Guidelines
- Never take destructive or irreversible action without explicit confirmation (delete, overwrite, drop, reset)
- Never store credentials in output, logs, or generated files — reference credential location instead
- Always return structured output (JSON or markdown table) unless plain text is explicitly requested
- When suggesting shell commands, use PowerShell syntax for Windows-side tasks, bash for Linux server tasks
- SSH to Linux server always uses the Posh-SSH pattern defined in the Shell & Tools section
- Docker commands always use `docker compose` (not `docker-compose`) with `-q` on pulls
- If a task touches Headscale, Netbird, or NPM config, flag the relevant Known Issues entry before proceeding
- Prefer idempotent operations — scripts should be safe to run more than once
- When uncertain about current server state, ask rather than assume
Credentials Location
- All service passwords: Jaco's OneNote + Nexum Password Spreadsheet
- No passwords stored in code or config files (except SSH for automation)
Documentation (Gitea)
- Obsidian vaults migrated to Gitea on 2026-04-30
- nxm-infrastructure repo: `https://git.nxm.co.za/admin/nxm-infrastructure`
  - Local path: `/home/nxm/Documents/NxM Linux Server/`
  - Remote: `gitea-local:admin/nxm-infrastructure.git` (SSH)
- nexum-projects repo: `https://git.nxm.co.za/admin/nexum-projects`
  - Local path: `/home/nxm/Documents/Nexum Projects/`
  - Remote: `gitea-local:admin/nexum-projects.git` (SSH)
- agent-os repo: `https://git.nxm.co.za/admin/agent-os`
  - Local path: `/home/nxm/Documents/agent-os/`
  - Server runtime: `/opt/agent-os/` on 172.27.40.3
  - Remote: `gitea-local:admin/agent-os.git` (SSH)
- At end of any session that changes infrastructure: update relevant markdown files, commit and push
- Key files to update after infrastructure changes:
  - `Home.md`: master index
  - `Quick Reference/IP & Port Map.md`: any new service
  - `Quick Reference/Docker Stacks.md`: any new stack
  - Relevant `Services/<name>.md`: service-specific docs
  - `Troubleshooting/`: any new known issues
Known Issues & Gotchas
- Docker pull logs are very verbose: always use the `-q` flag
- Headscale YAML: never add a duplicate `policy:` block (causes crash)
- Homarr board names must be slugs (no spaces)
- Vaultwarden requires HTTPS — LAN IP (port 8222) will show crypto error, use vault.nxm.co.za
- Tailscale client on Windows overrides DNS — disconnect when testing split DNS changes
- Chrome profile-specific links always open in last active profile (Windows limitation)
- NPM forward scheme is HTTP even for HTTPS external — NPM handles SSL termination
- Netbird STUN is on 3479/udp (not 3478 — Headscale owns that)
- Netbird client management URL is port 8443 (Caddy sidecar) — NOT 443
- NPM (OpenResty) has no gRPC module — Caddy sidecar is the workaround until Traefik migration
- Netbird config.yaml contains authSecret + encryptionKey — back this file up, losing it breaks all peers
- Servers running Tailscale must run `sudo tailscale set --accept-dns=false` before joining Netbird (Tailscale DNS overrides Unbound and resolves via public IP, breaking gRPC hairpin)
- Open WebUI → Citadel MCP: auth_type must be `none`; an empty bearer key generates an illegal header and the connection silently fails
- Open WebUI connects via Streamable HTTP POST at `http://citadel-mcp:8300/mcp`; do NOT use /sse (Open WebUI 0.9+ only supports POST-based transport)