docs: rewrite CLAUDE.md as standalone project brief
Symlink to ~/.claude/CLAUDE.md removed; global config is now independent. This file is now the agent-os project context only: current phase, Phase 3 infra-monitor spec, directory structure, and pending Gitea SSH key action.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -1,173 +1,67 @@
|
|||||||
# NxM Infrastructure Project
|
# Agent OS — Project CLAUDE.md
|
||||||
|
|
||||||
## Primary Linux Server
|
## What This Project Is
|
||||||
- **IP:** 172.27.40.3
|
Personal Agentic Operating System. Tool-agnostic AI foundation for scheduled skills, monitoring, and automation.
|
||||||
- **User:** nxm
|
- Runtime: `/opt/agent-os/` on 172.27.40.3
|
||||||
- **Password:** 6589
|
- Gitea: `git.nxm.co.za/admin/agent-os`
|
||||||
- **OS:** Ubuntu Server (LTS)
|
- Edit clone (server): `/home/nxm/Documents/agent-os/` (clone pending)
|
||||||
- **Docker stack root:** `/opt/stacks/`
|
|
||||||
|
|
||||||
## Network
|
## Current Phase
|
||||||
| VLAN | Name | Subnet | Gateway |
|
| Phase | Status |
|
||||||
|---|---|---|---|
|
|
||||||
| 40 | Servers40 | 172.27.40.0/24 | 172.27.6.1 |
|
|
||||||
| 20 | Workshop20 | 172.27.20.0/24 | 172.27.6.1 |
|
|
||||||
| 10 | IoT10 | 172.27.10.0/24 | 172.27.6.1 |
|
|
||||||
|
|
||||||
## Key Devices
|
|
||||||
| Device | IP | Role |
|
|
||||||
|---|---|---|
|
|
||||||
| OPNsense Firewall | 172.27.6.1 | Firewall, router, DHCP |
|
|
||||||
| Ubuntu Server | 172.27.40.3 | Docker host, Headscale |
|
|
||||||
| TrueNAS | 172.27.40.5 | NAS storage |
|
|
||||||
| Home Assistant | 172.27.10.6 | Home automation (IoT10) |
|
|
||||||
| Kubuntu (NxM-AI) | 172.27.40.20 | Ollama inference host |
|
|
||||||
|
|
||||||
## Docker Stacks & Ports
|
|
||||||
| Stack | Path | Port |
|
|
||||||
|---|---|---|
|
|
||||||
| Portainer | `/opt/stacks/portainer/` | 9443 (HTTPS) |
|
|
||||||
| Nginx Proxy Manager | `/opt/stacks/nginx-proxy-manager/` | 80, 81, 443 |
|
|
||||||
| Uptime Kuma | `/opt/stacks/uptime-kuma/` | 3002 |
|
|
||||||
| RustDesk | `/opt/stacks/rustdesk/` | 21115-21116 TCP, 21116 UDP, 21117-21119 TCP — self-hosted remote desktop relay |
|
|
||||||
| Headscale | `/opt/stacks/headscale/` | 8080 (internal) |
|
|
||||||
| Headplane | `/opt/stacks/headplane/` | 3001 |
|
|
||||||
| Headscale UI | `/opt/stacks/headscale-ui/` | 3005 |
|
|
||||||
| Homarr | `/opt/stacks/homarr/` | 7575 |
|
|
||||||
| Vaultwarden | `/opt/stacks/vaultwarden/` | 8222 |
|
|
||||||
| Netbird | `/opt/stacks/netbird/` | 3479/udp STUN |
|
|
||||||
| Caddy (Netbird sidecar) | `/opt/stacks/caddy-netbird/` | 8443/tcp — gRPC proxy for Netbird clients |
|
|
||||||
| Plane | `/opt/stacks/plane/` | 8095 (HTTP, via NPM) |
|
|
||||||
| Gitea | `/opt/stacks/gitea/` | 3000 (web), 2222 (SSH git) — self-hosted git, infrastructure docs |
|
|
||||||
| Open WebUI | `/opt/stacks/open-webui/` | 3010 — Chat UI for Ollama + MCP (replaced Flowise 2026-05-01) |
|
|
||||||
| agent-sites | `/opt/stacks/sites/` | internal only (proxy network) — nginx:alpine serving /opt/sites/ at agents.nxm.co.za |
|
|
||||||
| hodor-gateway | `/opt/stacks/hodor-gateway/` | 8200 — FastAPI agent gateway, POST /ask → Ollama |
|
|
||||||
| bran-changelog | `/opt/stacks/bran-changelog/` | one-shot container, run.sh + cron 06:00 daily |
|
|
||||||
| citadel-mcp | `/opt/stacks/citadel-mcp/` | 8300 — MCP SSE+HTTP server, 13 tools incl. list_agents/web_search/qyburn_task/docker_rebuild |
|
|
||||||
| varys-monitor | `/opt/stacks/varys-monitor/` | one-shot container, run.sh + cron every 15 min |
|
|
||||||
| raven-notify | `/opt/stacks/raven-notify/` | 8400 — Notification agent, POST /notify → Discord webhook + Gmail SMTP; POST /email/toggle |
|
|
||||||
| sam-research | `/opt/stacks/sam-research/` | 8500 — Research agent, POST /research → SearXNG + Ollama |
|
|
||||||
| searxng | `/opt/stacks/searxng/` | 8600 — Self-hosted search backend (internal only, used by sam + citadel) |
|
|
||||||
| monitoring | `/opt/stacks/monitoring/` | 8086 (InfluxDB), 3020 (Grafana) — metrics from Telegraf/OPNsense, alerts → Raven |
|
|
||||||
| qyburn-coder | `/opt/stacks/qyburn-coder/` | 8700 — LLM coding agent, POST /task → qwen2.5-coder:14b, approve/reject via dashboard |
|
|
||||||
| netbox | `/opt/stacks/netbox/` | 8100 — IPAM, network documentation, client site discovery |
|
|
||||||
| bni-scheduler | `/opt/stacks/bni-scheduler/` | no host port (proxy only, internal port 3000) — Node.js/Express + SQLite at bni.nxm.co.za, BNI Ignite speaker rotation |
|
|
||||||
| nocodb | `/opt/stacks/nocodb/` | 8150 — No-code DB, rvd.nxm.co.za, birthday/client database |
|
|
||||||
|
|
||||||
## Public Subdomains (via NPM + Let's Encrypt)
|
|
||||||
| Subdomain | Internal Target |
|
|
||||||
|---|---|
|
|---|---|
|
||||||
| headscale.nxm.co.za | 172.27.40.3:8080 |
|
| 1 — NFS export + Kubuntu mount | ✓ DONE 2026-05-01 (NFS no longer needed — consolidated to server) |
|
||||||
| vault.nxm.co.za | 172.27.40.3:8222 |
|
| 2 — Identity interview → identity.md populated | ✓ DONE 2026-05-01 |
|
||||||
| kuma.nxm.co.za | 172.27.40.3:3002 |
|
| **3 — infra-monitor skill** | **NEXT** |
|
||||||
| zabbix.nxm.co.za | 172.27.40.3:8091 |
|
| 4 — Cron scheduling (hourly heartbeat + daily digest) | Pending Phase 3 |
|
||||||
| netbird.nxm.co.za | netbird-dashboard:80 via NPM (dashboard UI only) |
|
| 5 — Future skills (backup monitor, peer health, log digest) | Future |
|
||||||
| netbird.nxm.co.za:8443 | Caddy sidecar → netbird-server:80 (client management, gRPC) |
|
|
||||||
| plane.nxm.co.za | 172.27.40.3:8095 |
|
|
||||||
| git.nxm.co.za | 172.27.40.3:3000 |
|
|
||||||
| grafana.nxm.co.za | 172.27.40.3:3020 |
|
|
||||||
| netbox.nxm.co.za | 172.27.40.3:8100 |
|
|
||||||
| agents.nxm.co.za | agent-sites:80 via NPM — static files from /opt/sites/ |
|
|
||||||
| bni.nxm.co.za | bni-scheduler:3000 via NPM |
|
|
||||||
| rvd.nxm.co.za | 172.27.40.3:8150 |
|
|
||||||
| rmm.nxm.co.za | 172.27.40.4:443 |
|
|
||||||
| api.nxm.co.za | 172.27.40.4:443 |
|
|
||||||
| mesh.nxm.co.za | 172.27.40.4:4430 |
|
|
||||||
|
|
||||||
## OPNsense Split DNS (Unbound Host Overrides)
|
## Phase 3 — infra-monitor (NEXT)
|
||||||
All subdomains resolve to 172.27.40.3 internally via Unbound overrides.
|
Skill scaffold at `skills/infra-monitor/skill.md`. Ready to implement after spec update.
|
||||||
If a subdomain isn't resolving internally, check:
|
|
||||||
1. OPNsense → Services → Unbound DNS → Overrides
|
|
||||||
2. Tailscale client on Windows — disconnect it, as it overrides local DNS
|
|
||||||
|
|
||||||
## Headscale (VPN)
|
**Goal:** Docker container state + system resource checks. Complements Varys (HTTP reachability) — do not duplicate.
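The container-state half of this goal could be sketched roughly as below. This is only an illustration, assuming the skill shells out to `docker ps -a --format '{{json .}}'` (one JSON object per line, with `Names` and `State` fields on recent Docker releases) and compares the result against an expected-container list. The function name, the sample data, and the container names are hypothetical; the real list belongs in `skills/infra-monitor/skill.md`.

```python
import json


def find_unhealthy(docker_ps_output: str, expected: list[str]) -> dict[str, str]:
    """Return {container_name: problem} for expected containers that are not running.

    `docker_ps_output` is the text printed by
    `docker ps -a --format '{{json .}}'`: one JSON object per line.
    """
    states = {}
    for line in docker_ps_output.splitlines():
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        # Fall back to "unknown" if this Docker version omits the State field.
        states[row["Names"]] = row.get("State", "unknown")

    problems = {}
    for name in expected:
        if name not in states:
            problems[name] = "missing"
        elif states[name] != "running":
            problems[name] = states[name]
    return problems


# Hypothetical sample input: container names and states are invented for illustration.
SAMPLE = "\n".join([
    '{"Names": "raven-notify", "State": "running"}',
    '{"Names": "open-webui", "State": "exited"}',
])

if __name__ == "__main__":
    print(find_unhealthy(SAMPLE, ["raven-notify", "open-webui", "citadel-mcp"]))
```

Keeping the parser separate from the `docker` invocation means it can be exercised without a Docker daemon, and it only inspects container state, leaving HTTP reachability to Varys.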
|
||||||
- Version: v0.28
|
|
||||||
- Public URL: https://headscale.nxm.co.za
|
|
||||||
- Policy mode: database (live apply via API)
|
|
||||||
- **v0.28 breaking change:** All write operations require numeric user ID, not username
|
|
||||||
- Get IDs: `headscale users list`
|
|
||||||
- Example: `headscale preauthkeys create --user 13`
|
|
||||||
- Config: `/opt/stacks/headscale/config/config.yaml`
|
|
||||||
- After restart, always check: `tailscale status` — if server node offline: `sudo systemctl restart tailscaled`
|
|
||||||
|
|
||||||
## Vaultwarden
|
**Before building:**
|
||||||
- Admin panel: https://vault.nxm.co.za/admin (token in OneNote)
|
- Update `skills/infra-monitor/skill.md` — container list is stale (has Flowise, missing Open WebUI + all new agents)
|
||||||
- Signups currently OPEN (disable after all 4 users register)
|
- Correct Ollama URL: now `http://172.27.40.20:11434` (migrated from 172.27.6.139)
|
||||||
- To disable: set `SIGNUPS_ALLOWED=false` in `/opt/stacks/vaultwarden/.env` → `docker compose up -d`
|
- Decide: Docker one-shot container (consistent with bran/varys) or host cron + shell script?
|
||||||
- Admin token needs Argon2 upgrade: `docker exec -it vaultwarden /vaultwarden hash --preset owasp`
|
|
||||||
|
|
||||||
## Domain Knowledge
|
**Output targets:**
|
||||||
- **Networking:** VLANs, inter-VLAN routing, firewall rules, NAT, split DNS, DHCP — comfortable at OPNsense config level, not just GUI clicks
|
- `/opt/sites/infra-monitor/index.html` — web dashboard at agents.nxm.co.za/infra-monitor/
|
||||||
- **DNS:** Unbound overrides, split-horizon, DNS-over-TLS, troubleshooting resolution order (Tailscale/Netbird conflict patterns)
|
- `/opt/agent-os/logs/infra-monitor/last-run.json` — machine-readable, read by Varys watchdog
|
||||||
- **VPN:** WireGuard fundamentals, Headscale (self-hosted Tailscale), Netbird (self-hosted, embedded Dex IdP), relay vs P2P, ACL/policy models
|
- Raven alert on critical: `http://raven-notify:8400`
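A minimal sketch of the machine-readable output and the critical alert listed above. The spec gives only the file path and the Raven host; the JSON field names, the alert body, and the helper names here are assumptions, and the `/notify` path is taken from the raven-notify entry in the old stack table.

```python
import json
import os
import tempfile
import urllib.request
from datetime import datetime, timezone

RAVEN_URL = "http://raven-notify:8400/notify"  # host from the spec; /notify path assumed


def build_last_run(problems: dict[str, str], now: datetime) -> dict:
    """Assemble the status document. All field names are hypothetical."""
    return {
        "skill": "infra-monitor",
        "timestamp": now.astimezone(timezone.utc).isoformat(),
        "status": "critical" if problems else "ok",
        "problems": problems,
    }


def write_last_run(path: str, payload: dict) -> None:
    """Write via a temp file and rename so a reader never sees a partial file."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; let other readers open it
    os.replace(tmp_path, path)


def build_alert_request(payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to Raven. The body shape is an assumption."""
    body = json.dumps({
        "source": payload["skill"],
        "level": payload["status"],
        "message": f"{len(payload['problems'])} container problem(s)",
    }).encode("utf-8")
    return urllib.request.Request(
        RAVEN_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

A caller would pass the request to `urllib.request.urlopen(request, timeout=...)` only when `payload["status"]` is `"critical"`, and write `last-run.json` on every run.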
|
||||||
- **Docker:** Compose stacks, container networking, volume mounts, healthchecks, reverse proxy patterns — not Kubernetes
|
|
||||||
- **Linux:** Ubuntu Server admin, systemd, cron, file permissions, basic shell scripting — not a kernel developer
|
|
||||||
- **Reverse proxy:** NPM (OpenResty), Caddy — knows the difference between HTTP/HTTPS termination and gRPC proxying
|
|
||||||
- **Self-hosted services:** IPAM (NetBox), monitoring (Zabbix, Uptime Kuma, VictoriaMetrics, Grafana), dashboards (Homarr, Dashy), secrets (Vaultwarden)
|
|
||||||
- **Not expert in:** Kubernetes, cloud platforms (AWS/Azure/GCP), advanced Python (learning), application development
|
|
||||||
|
|
||||||
## Agent Guidelines
|
**Schedule:** hourly heartbeat (Docker + Ollama only) + daily 07:00 full digest
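One way to read this schedule, sketched under the assumption that a single entry point is invoked at the top of every hour and chooses its own mode, so the 07:00 run becomes the full digest. The mode names, check names, and the idea that the digest adds system resource checks on top of the heartbeat are illustrative assumptions, not part of the spec.

```python
from datetime import datetime

DIGEST_HOUR = 7  # daily full digest at 07:00, per the schedule

HEARTBEAT_CHECKS = ("docker", "ollama")
# Assumed content of the full digest: heartbeat checks plus system resources.
DIGEST_CHECKS = HEARTBEAT_CHECKS + ("system_resources",)


def select_mode(now: datetime) -> str:
    """Pick the run mode from the local hour of the invocation."""
    return "digest" if now.hour == DIGEST_HOUR else "heartbeat"


def checks_for(mode: str) -> tuple[str, ...]:
    """Return the names of the checks to run in the given mode."""
    if mode == "digest":
        return DIGEST_CHECKS
    if mode == "heartbeat":
        return HEARTBEAT_CHECKS
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    mode = select_mode(datetime.now())
    print(mode, checks_for(mode))
```

Two separate cron entries with an explicit mode argument would work equally well; the single-entry variant just avoids running a heartbeat and a digest at the same minute.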
|
||||||
- Never take destructive or irreversible action without explicit confirmation (delete, overwrite, drop, reset)
|
|
||||||
- Never store credentials in output, logs, or generated files — reference credential location instead
|
|
||||||
- Always return structured output (JSON or markdown table) unless plain text is explicitly requested
|
|
||||||
- When suggesting shell commands, use PowerShell syntax for Windows-side tasks, bash for Linux server tasks
|
|
||||||
- SSH to Linux server always uses the Posh-SSH pattern defined in the Shell & Tools section
|
|
||||||
- Docker commands always use `docker compose` (not `docker-compose`) with `-q` on pulls
|
|
||||||
- If a task touches Headscale, Netbird, or NPM config — flag the relevant Known Issues entry before proceeding
|
|
||||||
- Prefer idempotent operations — scripts should be safe to run more than once
|
|
||||||
- When uncertain about current server state, ask rather than assume
|
|
||||||
|
|
||||||
## Credentials Location
|
## Directory Structure
|
||||||
- All service passwords: Jaco's OneNote + Nexum Password Spreadsheet
|
```
|
||||||
- No passwords stored in code or config files (except SSH for automation)
|
/opt/agent-os/
|
||||||
|
├── CLAUDE.md ← this file (project brief, tracked in Gitea)
|
||||||
|
├── identity.md ← populated Phase 2
|
||||||
|
├── brain.md
|
||||||
|
├── memory/
|
||||||
|
│ ├── active-projects.md ← update at end of each session
|
||||||
|
│ ├── persistent.md
|
||||||
|
│ ├── recent-decisions.md
|
||||||
|
│ ├── constraints.md
|
||||||
|
│ └── notes-from-last-run.md
|
||||||
|
├── context/
|
||||||
|
├── skills/
|
||||||
|
│ └── infra-monitor/ ← Phase 3 target
|
||||||
|
│ ├── skill.md ← spec (stale container list — update before building)
|
||||||
|
│ ├── learnings.md
|
||||||
|
│ ├── eval.json
|
||||||
|
│ ├── last-output.md
|
||||||
|
│ └── context/handoff.md
|
||||||
|
└── logs/
|
||||||
|
```
|
||||||
|
|
||||||
## Documentation (Gitea)
|
## Architecture
|
||||||
- Obsidian vaults migrated to Gitea on 2026-04-30
|
- LLM inference: Kubuntu Ollama at `http://172.27.40.20:11434`
|
||||||
- **nxm-infrastructure** repo: `https://git.nxm.co.za/admin/nxm-infrastructure`
|
- All agent output: `/opt/sites/<name>/` served at agents.nxm.co.za
|
||||||
- Local path: `/home/nxm/Documents/NxM Linux Server/`
|
- Log standard: `/opt/agent-os/logs/<skill>/last-run.json`
|
||||||
- Remote: `gitea-local:admin/nxm-infrastructure.git` (SSH)
|
- Notifications: Raven at `http://raven-notify:8400`
|
||||||
- **nexum-projects** repo: `https://git.nxm.co.za/admin/nexum-projects`
|
|
||||||
- Local path: `/home/nxm/Documents/Nexum Projects/`
|
|
||||||
- Remote: `gitea-local:admin/nexum-projects.git` (SSH)
|
|
||||||
- **agent-os** repo: `https://git.nxm.co.za/admin/agent-os`
|
|
||||||
- Local path: `/home/nxm/Documents/agent-os/`
|
|
||||||
- Server runtime: `/opt/agent-os/` on 172.27.40.3
|
|
||||||
- Remote: `gitea-local:admin/agent-os.git` (SSH)
|
|
||||||
- At end of any session that changes infrastructure: update relevant markdown files, commit and push
|
|
||||||
- Key files to update after infrastructure changes:
|
|
||||||
- `Home.md` — master index
|
|
||||||
- `Quick Reference/IP & Port Map.md` — any new service
|
|
||||||
- `Quick Reference/Docker Stacks.md` — any new stack
|
|
||||||
- Relevant `Services/<name>.md` — service-specific docs
|
|
||||||
- `Troubleshooting/` — any new known issues
|
|
||||||
|
|
||||||
## Known Issues & Gotchas
|
## Pending — Gitea SSH Key (security debt)
|
||||||
- Docker pull logs are very verbose — always use `-q` flag
|
Server remote uses HTTP with embedded token. Before next token rotation:
|
||||||
- Headscale YAML: never add duplicate `policy:` block (causes crash)
|
1. Add SSH key for `nxm@172.27.40.3` to Gitea (Admin → Settings → SSH Keys)
|
||||||
- Homarr board names must be slugs (no spaces)
|
2. `cd /opt/agent-os && git remote set-url origin gitea-local:admin/agent-os.git`
|
||||||
- Vaultwarden requires HTTPS — LAN IP (port 8222) will show crypto error, use vault.nxm.co.za
|
|
||||||
- Tailscale client on Windows overrides DNS — disconnect when testing split DNS changes
|
|
||||||
- Chrome profile-specific links always open in last active profile (Windows limitation)
|
|
||||||
- NPM forward scheme is HTTP even for HTTPS external — NPM handles SSL termination
|
|
||||||
- Netbird STUN is on 3479/udp (not 3478 — Headscale owns that)
|
|
||||||
- Netbird client management URL is port 8443 (Caddy sidecar) — NOT 443
|
|
||||||
- NPM (OpenResty) has no gRPC module — Caddy sidecar is the workaround until Traefik migration
|
|
||||||
- Netbird config.yaml contains authSecret + encryptionKey — back this file up, losing it breaks all peers
|
|
||||||
- Servers running Tailscale must run `sudo tailscale set --accept-dns=false` before joining Netbird (Tailscale DNS overrides Unbound and resolves via public IP, breaking gRPC hairpin)
|
|
||||||
- Open WebUI → Citadel MCP: auth_type must be `none` — empty bearer key generates an illegal header and the connection silently fails
|
|
||||||
- Open WebUI connects via Streamable HTTP POST at `http://citadel-mcp:8300/mcp` — do NOT use /sse (Open WebUI 0.9+ only supports POST-based transport)
|
|
||||||
|
|
||||||
## Project Registry
|
|
||||||
Say "let's work on [project name]" to load context. I will read the project CLAUDE.md from the path below.
|
|
||||||
|
|
||||||
| Project | Path | Status | Next |
|
|
||||||
|---|---|---|---|
|
|
||||||
| **agent-os** | `/opt/agent-os/memory/active-projects.md` + `/opt/agent-os/skills/infra-monitor/` | Phases 1-2 done | Phase 3: infra-monitor skill |
|
|
||||||
| **infra-monitor** | `/opt/agent-os/skills/infra-monitor/skill.md` | Not built | Update spec, then implement |
|
|
||||||
| **nxm-infrastructure** | `/home/nxm/Documents/NxM Linux Server/CLAUDE.md` | Active maintenance | Grafana alert rules, maester docs |
|
|
||||||
| **monitoring** | `/opt/stacks/monitoring/` | Alert rules partial | CPU/mem/WAN/ping rules pending |
|
|
||||||
| **maester-reports** | not yet created | Planned (port 8800) | NIST CSF agent, primary business goal |
|
|
||||||
| **nexum-portal** | not yet created | Planned (port 8900) | Phase 1: Authelia stack |
|
|
||||||
| **nexum-csf** | not yet created | Planned (Gitea repo) | Import NIST CSF 2.0 framework docs |
|
|
||||||
| **bni-scheduler** | `/opt/stacks/bni-scheduler/` | Live | Minor updates only |
|
|
||||||
| **nexum-projects** | Kubuntu: `/home/nxm/Documents/Nexum Projects/` | Active | Client project tracking |
|
|
||||||
|
|||||||