docs: comprehensive update — bring all Agent OS docs current for LLM onboarding

All files were 5-7 weeks stale. Updated brain.md (complete service/agent/VPN/cron
inventory), identity.md (current expertise + infra context), CLAUDE.md (full agent
ecosystem table, Citadel tool registry, gotchas), README.md (LLM quick-start guide),
all memory files (current projects, decisions, constraints, persistent facts), and
infra-monitor skill.md (current container list with criticality tiers).

Also fixed: git remote switched from HTTP+embedded-token to SSH, removed references
to decommissioned services (Netbird, WireGuard, Flowise, Zabbix), corrected Ollama
IP (172.27.40.20), TrueNAS IP (172.27.40.220), and added 20+ services/agents that
were built since the last commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code
2026-06-19 17:15:11 +00:00
parent 638b2edd56
commit 6cebab9a4a
9 changed files with 427 additions and 128 deletions
+110 -33
@@ -1,28 +1,84 @@
# Agent OS — Project CLAUDE.md # Agent OS — Project CLAUDE.md
## What This Project Is ## What This Project Is
Personal Agentic Operating System. Tool-agnostic AI foundation for scheduled skills, monitoring, and automation. Personal Agentic Operating System for NxM / Nexum SA infrastructure. Tool-agnostic AI foundation for scheduled skills, monitoring, and automation. Plain markdown files — no databases, no vendor lock-in.
- Runtime: `/opt/agent-os/` on 172.27.40.3
- Gitea: `git.nxm.co.za/admin/agent-os` - **Runtime:** `/opt/agent-os/` on 172.27.40.3
- Edit clone (server): `/home/nxm/Documents/agent-os/` (clone pending) - **Gitea:** `git.nxm.co.za/admin/agent-os` (SSH: `gitea-local:admin/agent-os.git`)
- **Owner:** Jaco Bezuidenhout, Nexum SA (PTY) Ltd
## Current Phase ## Current Phase
| Phase | Status | | Phase | Status |
|---|---| |---|---|
| 1 — NFS export + Kubuntu mount | DONE 2026-05-01 (NFS no longer needed — consolidated to server) | | 1 — NFS export + mount | DONE 2026-05-01 (NFS no longer needed — consolidated to server) |
| 2 — Identity interview → identity.md populated | ✓ DONE 2026-05-01 | | 2 — Identity interview → identity.md | DONE 2026-05-01 |
| **3 — infra-monitor skill** | **NEXT** | | 3 — infra-monitor skill | NEXT (spec at `skills/infra-monitor/skill.md`, needs update) |
| 4 — Cron scheduling (hourly heartbeat + daily digest) | Pending Phase 3 | | 4 — Cron scheduling (hourly heartbeat + daily digest) | Pending Phase 3 |
| 5 — Future skills (backup monitor, peer health, log digest) | Future | | 5 — Future skills (backup monitor, log digest) | Future |
## Live Agent Ecosystem (as of 2026-06-19)
All agents run as Docker containers on 172.27.40.3 unless noted. Every agent writes to `/opt/agent-os/logs/<agent>/last-run.json`.
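The `last-run.json` convention makes a simple staleness check possible. The sketch below is illustrative only: it relies on file modification time rather than the file contents, because the JSON schema is not described here, and the function name `stale_agents` and the one-hour threshold in the usage note are assumptions.

```python
import time
from pathlib import Path


def stale_agents(log_root, max_age_seconds, now=None):
    """Return sorted agent names whose last-run.json is missing or too old.

    Assumes one sub-directory per agent under log_root, as in
    /opt/agent-os/logs/<agent>/last-run.json.
    """
    root = Path(log_root)
    if not root.is_dir():
        # Nothing to check if the log root does not exist on this machine.
        return []
    now = time.time() if now is None else now
    stale = []
    for agent_dir in sorted(root.iterdir()):
        if not agent_dir.is_dir():
            continue
        marker = agent_dir / "last-run.json"
        # Missing marker or an old modification time both count as stale.
        if not marker.exists() or now - marker.stat().st_mtime > max_age_seconds:
            stale.append(agent_dir.name)
    return stale
```

Calling `stale_agents("/opt/agent-os/logs", 3600)` would list agents that have not written a run marker in the last hour; a sensible threshold depends on each agent's schedule.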
### Always-on agents
| Agent | Port | Stack Path | Role |
|---|---|---|---|
| citadel-mcp | 8300 | `/opt/stacks/citadel-mcp/` | MCP tool server (37 tools: Docker, Plane, TRMM, Directus, files, web search) |
| raven-notify | 8400 | `/opt/stacks/raven-notify/` | Notification hub — Discord webhook + Gmail SMTP |
| sam-research | 8500 | `/opt/stacks/sam-research/` | SearXNG + Ollama research agent |
| qyburn-coder | 8700 | `/opt/stacks/qyburn-coder/` | LLM coding agent with approve/reject workflow |
| maester-reports | 8800 | `/opt/stacks/maester-reports/` | NIST CSF compliance reports (⚠ restart = Anthropic API cost) |
| jon-snow | 8900 | `/opt/stacks/jon-snow/` | Chief of staff orchestrator, HMAC approval gate |
| hodor-gateway | 8200 | `/opt/stacks/hodor-gateway/` | Simple Ollama gateway (POST /ask) |
| tarly-backup | 8750 | `/opt/stacks/tarly-backup/` | Backup monitoring — OPNsense configs + Proxmox |
| hermes-cloud | 8643 | `/opt/stacks/hermes-cloud/` | Claude-sonnet brain, Citadel MCP wired |
| hermes-native | VM 108 | native on 172.27.40.30 | Primary conversational agent — OpenRouter, Honcho memory, WhatsApp, dashboard at hermes.nxm.co.za:9119 |
### Scheduled/one-shot agents
| Agent | Schedule | Stack Path | Role |
|---|---|---|---|
| bran-changelog | Daily 06:00 | `/opt/stacks/bran-changelog/` | Git changelog generator |
| varys-monitor | Every 15 min | `/opt/stacks/varys-monitor/` | HTTP reachability checks for all services |
### Support agents (via Hermes Native, VM 108)
| Agent | Role |
|---|---|
| vexis (workshop profile) | Nexum workshop agent — TRMM script execution on client devices |
### Integrations running as cron (not standalone agents)
| Job | Schedule | Script |
|---|---|---|
| ovpn-status.py | Every 1 min | `/opt/stacks/monitoring/ovpn-status.py` |
| trmm-frappe-sync.py | Every 30 min | `/opt/stacks/monitoring/trmm-frappe-sync.py` |
| zenarmor-pull.py | Daily 06:00 | `/opt/stacks/monitoring/zenarmor-pull.py` |
| hub-backup.sh | Daily 02:05 | `/opt/stacks/tarly-backup/hub-backup.sh` |
## Citadel MCP Tool Registry (37 tools)
The central tool server that other agents call via MCP protocol:
**File operations:** read_file, write_file, list_files, delete_file, propose_file_change
**Docker:** docker_list_containers, docker_container_stats, docker_stack_list, docker_rebuild
**Plane (project management):** plane_add_issue, plane_get_issues, plane_list_projects, plane_create_project, plane_create_page, plane_update_issue
**TRMM (remote management):** trmm_list_agents, trmm_get_agent, trmm_list_scripts, trmm_add_script, trmm_delete_script, trmm_run_script, trmm_confirm_with_user, trmm_sync_now
**Directus (CRM):** directus_list_clients, directus_get_client, directus_get_client_services, directus_get_renewals, directus_upcoming_renewals
**Other:** get_agent_status, get_agent_output, list_agents, qyburn_task, qyburn_status, qyburn_approve, sam_research, web_search, proxmox_backup_status
## Agent Web Pages
Static HTML dashboards served at `agents.nxm.co.za/<name>/` from `/opt/sites/`:
- agents-dashboard, bran, changelog, citadel, hermes, hermes-native, hodor, jon-snow, qyburn, raven, sam, security-review, setup, stock, swarm, tarly, varys, workflow-test
## Phase 3 — infra-monitor (NEXT) ## Phase 3 — infra-monitor (NEXT)
Skill scaffold at `skills/infra-monitor/skill.md`. Ready to implement after spec update.
Skill scaffold at `skills/infra-monitor/skill.md`. **Spec is stale — needs update before building.**
**Goal:** Docker container state + system resource checks. Complements Varys (HTTP reachability) — do not duplicate. **Goal:** Docker container state + system resource checks. Complements Varys (HTTP reachability) — do not duplicate.
**Before building:** **Before building:**
- Update `skills/infra-monitor/skill.md` — container list is stale (has Flowise, missing Open WebUI + all new agents) - Update `skills/infra-monitor/skill.md` — container list is stale (references Flowise/Netbird, missing 20+ services)
- Correct Ollama URL: now `http://172.27.40.20:11434` (migrated from 172.27.6.139) - Ollama URL is now `http://172.27.40.20:11434`
- Decide: Docker one-shot container (consistent with bran/varys) or host cron + shell script? - Decide: Docker one-shot container (consistent with bran/varys) or host cron + shell script?
**Output targets:** **Output targets:**
@@ -30,38 +86,59 @@ Skill scaffold at `skills/infra-monitor/skill.md`. Ready to implement after spec
- `/opt/agent-os/logs/infra-monitor/last-run.json` — machine-readable, read by Varys watchdog - `/opt/agent-os/logs/infra-monitor/last-run.json` — machine-readable, read by Varys watchdog
- Raven alert on critical: `http://raven-notify:8400` - Raven alert on critical: `http://raven-notify:8400`
**Schedule:** hourly heartbeat (Docker + Ollama only) + daily 07:00 full digest
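A minimal sketch of the container-state half of this skill, assuming the host-cron option and the Docker CLI's `docker ps --format` output. The report fields (`status`, `down`, `checked`) and the idea that container names match agent names are assumptions, not part of the spec.

```python
import subprocess


def parse_docker_ps(output):
    """Parse 'name<TAB>state' lines into a {name: state} dict."""
    states = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, _, state = line.partition("\t")
        states[name.strip()] = state.strip()
    return states


def build_report(states, expected):
    """Hypothetical report shape: expected containers that are not running."""
    down = sorted(name for name in expected if states.get(name) != "running")
    return {
        "status": "critical" if down else "ok",
        "down": down,
        "checked": len(expected),
    }


def container_states():
    """Ask the local Docker daemon for every container's name and state."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.State}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_docker_ps(result.stdout)
```

Keeping the parsing separate from the `docker` call means the logic can be tested without a Docker daemon. The resulting dict could be serialized to `last-run.json`, with a Raven alert sent only when `status` is `critical`.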
## Directory Structure ## Directory Structure
``` ```
/opt/agent-os/ /opt/agent-os/
├── CLAUDE.md ← this file (project brief, tracked in Gitea) ├── CLAUDE.md ← this file (project brief)
├── identity.md ← populated Phase 2 ├── README.md ← onboarding for new LLMs
├── brain.md ├── identity.md ← who the user is, hard limits
├── brain.md ← all infra facts, IPs, services, decisions
├── memory/ ├── memory/
│ ├── active-projects.md ← update at end of each session │ ├── active-projects.md ← what's in flight right now
│ ├── persistent.md │ ├── persistent.md ← facts that never expire
│ ├── recent-decisions.md │ ├── recent-decisions.md ← decisions from last 30 days
│ ├── constraints.md │ ├── constraints.md ← hard limits agents must respect
│ └── notes-from-last-run.md │ └── notes-from-last-run.md ← cleared each session
├── context/ ├── claude-code/
│ └── memory/ ← Claude Code's persistent memory files (symlinked)
├── skills/ ├── skills/
│ └── infra-monitor/ ← Phase 3 target │ └── infra-monitor/ ← Phase 3 target (not yet built)
│ ├── skill.md ← spec (stale container list — update before building) │ ├── skill.md
│ ├── learnings.md │ ├── learnings.md
│ ├── eval.json │ ├── eval.json
│ ├── last-output.md │ ├── last-output.md
│ └── context/handoff.md │ └── context/handoff.md
└── logs/ └── logs/ ← all agent log outputs
├── bran-changelog/
├── citadel-mcp/
├── jon-snow/
├── qyburn-coder/
├── raven-notify/
├── sam-research/
├── tarly-backup/
├── trmm-frappe-sync/
└── varys-monitor/
``` ```
## Architecture ## Architecture
- LLM inference: Kubuntu Ollama at `http://172.27.40.20:11434`
- All agent output: `/opt/sites/<name>/` served at agents.nxm.co.za
- Log standard: `/opt/agent-os/logs/<skill>/last-run.json`
- Notifications: Raven at `http://raven-notify:8400`
## Pending — Gitea SSH Key (security debt) - **LLM inference:** Ollama at `http://172.27.40.20:11434` (gemma4, llama3.1:8b, phi4) + Anthropic API (Claude Code, Hermes)
Server remote uses HTTP with embedded token. Before next token rotation: - **Agent output pages:** `/opt/sites/<name>/` served at agents.nxm.co.za
1. Add SSH key for `nxm@172.27.40.3` to Gitea (Admin → Settings → SSH Keys) - **Log standard:** `/opt/agent-os/logs/<agent>/last-run.json`
2. `cd /opt/agent-os && git remote set-url origin gitea-local:admin/agent-os.git` - **Notifications:** Raven at `http://raven-notify:8400` (Discord + Gmail)
- **Task tracking:** Plane at plane.nxm.co.za
- **Client CRM:** Directus at directus.nxm.co.za
- **Client devices:** Tactical RMM at 172.27.40.4 (45 agents, 13 clients)
- **Helpdesk:** Frappe Helpdesk at helpdesk.nxm.co.za (VM 109)
- **Credentials:** `~/.nxm-keys` (chmod 600) — ONLY place credential values live
## Key Gotchas
- **maester-reports restart = Anthropic API cost** — cache is in-memory only
- **Open WebUI → Citadel MCP:** auth_type must be `none` (empty bearer key = silent failure)
- **Docker → OPNsense API:** Docker proxy network can't reach 172.27.6.1 (HTTP 400) — run from host
- **Headscale v0.28:** all write operations require numeric user ID, not username
- **Vaultwarden:** requires HTTPS — use vault.nxm.co.za, not LAN IP
- **Tailscale on Windows:** overrides DNS — disconnect when testing split DNS
- **NPM forward scheme:** HTTP even for HTTPS external — NPM handles SSL termination
- **NocoDB:** RvDM personal birthday DB only — never use for Nexum projects
+38 -6
View File
@@ -10,14 +10,41 @@ Every agent interaction reads from and writes back to files in this repo. No dat
| Layer | File(s) | Purpose | | Layer | File(s) | Purpose |
|---|---|---| |---|---|---|
| Identity | `identity.md` | Who you are, communication style, values | | Identity | `identity.md` | Who the user is, communication style, values, hard limits |
| Context | `context/` | Dated, task-specific working files | | Context | `context/` | Dated, task-specific working files |
| Brain | `brain.md` | Persistent facts — infra, people, decisions | | Brain | `brain.md` | Persistent facts — infra, services, IPs, standing decisions |
| Memory | `memory/` | Short and long-term session notes | | Memory | `memory/` | Short and long-term session notes |
| Skills | `skills/` | Repeatable workflows, each self-improving | | Skills | `skills/` | Repeatable workflows, each self-improving |
| Processes | `skills/*/context/handoff.md` | Output passed between chained skills | | Processes | `skills/*/context/handoff.md` | Output passed between chained skills |
| Automation | cron on 172.27.40.3 | Scheduled skill execution | | Automation | cron on 172.27.40.3 | Scheduled skill execution |
## Quick start for a new LLM
If you are an LLM reading this repo for the first time:
1. **Read `identity.md`** — who you're working for, hard limits, communication style
2. **Read `brain.md`** — all infrastructure facts: IPs, services, ports, agents, standing decisions
3. **Read `memory/active-projects.md`** — what's currently in flight
4. **Read `memory/constraints.md`** — things you must never do
5. **Read `CLAUDE.md`** — project status and architecture details
Do NOT take any action without reading `identity.md` first. The hard limits there are non-negotiable.
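The read order above can be sketched as a small loader. This is an illustration under assumptions: the function name `load_onboarding_context` and the HTML-comment separators are invented here, and the only rule enforced is that `identity.md` must exist before anything else is read.

```python
from pathlib import Path

# Order taken from the quick-start list: identity first, CLAUDE.md last.
READ_ORDER = [
    "identity.md",
    "brain.md",
    "memory/active-projects.md",
    "memory/constraints.md",
    "CLAUDE.md",
]


def load_onboarding_context(repo_root):
    """Concatenate the onboarding files in the documented order.

    Raises FileNotFoundError if identity.md is absent, since its hard
    limits must be read before any action is taken.
    """
    root = Path(repo_root)
    identity = root / READ_ORDER[0]
    if not identity.is_file():
        raise FileNotFoundError(f"{identity} is required before any action")
    sections = []
    for rel in READ_ORDER:
        path = root / rel
        if path.is_file():
            body = path.read_text(encoding="utf-8")
            sections.append(f"<!-- {rel} -->\n{body}")
    return "\n\n".join(sections)
```

Other missing files are skipped silently in this sketch; a stricter variant could fail on a missing `memory/constraints.md` as well.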
## Live agent ecosystem
The NxM infrastructure runs 12+ named agents across Docker containers and VMs. Every agent writes logs to `/opt/agent-os/logs/<agent>/last-run.json` and most publish web dashboards to `agents.nxm.co.za/<agent>/`.
Key agents:
- **Citadel MCP** (port 8300) — central tool server, 37 tools covering Docker, Plane, TRMM, Directus, file ops, web search
- **Raven** (port 8400) — notification hub (Discord + Gmail), all alerts route through here
- **Jon Snow** (port 8900) — chief of staff orchestrator with approval gates
- **Maester** (port 8800) — NIST CSF compliance reporting
- **Hermes Native** (VM 108) — primary conversational agent with WhatsApp + Honcho memory
- **Tarly** (port 8750) — backup monitoring (OPNsense configs + Proxmox)
- **Vexis** (via Hermes, VM 108) — workshop/TRMM scripting agent for client devices
See `brain.md` for the complete agent table with ports and schedules.
## Adding a new skill ## Adding a new skill
1. Create `skills/<skill-name>/skill.md` — what the skill does and how 1. Create `skills/<skill-name>/skill.md` — what the skill does and how
@@ -28,10 +55,11 @@ Every agent interaction reads from and writes back to files in this repo. No dat
## Runtime ## Runtime
- Files live on server: `/opt/agent-os/` (cloned from this repo) - **Server:** `/opt/agent-os/` on 172.27.40.3 (Ubuntu, Docker host)
- LLM inference: Ollama at `http://172.27.6.139:11434` - **Repo:** `git.nxm.co.za/admin/agent-os` (SSH: `gitea-local:admin/agent-os.git`)
- Scheduled jobs: cron on `172.27.40.3` - **LLM inference:** Ollama at `http://172.27.40.20:11434` (local) or Anthropic API (Claude Code/Hermes)
- Local editing: `/home/nxm/Documents/agent-os/` on Kubuntu (this machine) - **Scheduled jobs:** cron on 172.27.40.3
- **Agent web pages:** `/opt/sites/<name>/` → agents.nxm.co.za
## Infra reference ## Infra reference
@@ -39,3 +67,7 @@ Cross-repo links to supporting documentation:
- [IP & Port Map](https://git.nxm.co.za/admin/nxm-infrastructure/src/branch/main/Quick%20Reference/IP%20%26%20Port%20Map.md) - [IP & Port Map](https://git.nxm.co.za/admin/nxm-infrastructure/src/branch/main/Quick%20Reference/IP%20%26%20Port%20Map.md)
- [Docker Stacks](https://git.nxm.co.za/admin/nxm-infrastructure/src/branch/main/Quick%20Reference/Docker%20Stacks.md) - [Docker Stacks](https://git.nxm.co.za/admin/nxm-infrastructure/src/branch/main/Quick%20Reference/Docker%20Stacks.md)
- [Network Overview](https://git.nxm.co.za/admin/nxm-infrastructure/src/branch/main/Infrastructure/Network%20Overview.md) - [Network Overview](https://git.nxm.co.za/admin/nxm-infrastructure/src/branch/main/Infrastructure/Network%20Overview.md)
## Credential policy
All API keys and passwords live in `~/.nxm-keys` (chmod 600). Never write credential values into code, config files, logs, or documentation. Reference the file location instead.
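One way a script might honor this policy is to read values from the file at runtime and refuse to proceed if the permissions are looser than 600. The format of `~/.nxm-keys` is not documented here, so the `KEY=VALUE` layout below is an assumption, as is the function name `load_keys`.

```python
import stat
from pathlib import Path


def load_keys(path="~/.nxm-keys"):
    """Load KEY=VALUE pairs (assumed format) from a chmod-600 file."""
    p = Path(path).expanduser()
    mode = stat.S_IMODE(p.stat().st_mode)
    # Any group or other permission bit means the file is too open.
    if mode & 0o077:
        raise PermissionError(f"{p} has mode {oct(mode)}; expected 600")
    keys = {}
    for line in p.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        keys[name.strip()] = value.strip()
    return keys
```

The returned dict should stay in memory only; the values should never be printed or written back to disk.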
+99 -36
@@ -1,64 +1,127 @@
# Brain # Brain
Core facts read by all skills. Keep under 1000 words. Update when infrastructure changes. Core facts read by all skills. Keep under 1500 words. Update when infrastructure changes.
Last updated: 2026-04-30 Last updated: 2026-06-19
--- ---
## Infrastructure ## Infrastructure
**Primary server:** 172.27.40.3 — Ubuntu Server LTS, Docker host **Primary server:** 172.27.40.3 — Ubuntu Server LTS, Docker host, all agent runtimes
**Kubuntu desktop:** 172.27.6.139 — NxM-AI, runs Ollama **Ollama inference host:** 172.27.40.20 — Windows 11 Pro (NxM-AI), Vulkan GPU, Scheduled Task auto-start
**TrueNAS NAS:** 172.27.40.220 (Servers40), management: 172.27.6.221 **TrueNAS NAS:** 172.27.40.220 (data) / 172.27.6.221 (mgmt) — 35.6 TB, NFS shares for ISOs + Proxmox backups
**Firewall:** OPNsense at 172.27.6.1 **Firewall:** OPNsense at 172.27.6.1 (mgmt UI, not routed gateway)
**Proxmox VE:** 172.27.40.2 — PVE 9.1.1, 2× Xeon Gold 6138 (80 vCPUs), 252 GB RAM
**Hermes Native VM:** 172.27.40.30 (VM 108) — dedicated agent VM, Honcho memory, WhatsApp connected
**Tactical RMM:** 172.27.40.4 (VM 101) — remote management for all Nexum clients
**Home Assistant:** 172.27.10.6 (VM 100) — IoT automation
**Synology DS423+:** 172.27.40.80 — Coetzee off-site backup NAS, Active Backup via S2S
**VLANs:** **VLANs:**
| VLAN | Name | Subnet | | VLAN | Name | Subnet | Gateway |
|---|---|---| |---|---|---|---|
| 40 | Servers40 | 172.27.40.0/24 | | 40 | Servers40 | 172.27.40.0/24 | 172.27.40.1 |
| 20 | Workshop20 | 172.27.20.0/24 | | 20 | Workshop20 | 172.27.20.0/24 | 172.27.20.1 |
| 10 | IoT10 | 172.27.10.0/24 | | 10 | IoT10 | 172.27.10.0/24 | 172.27.10.1 |
## Key Services (172.27.40.3) ## Key Services (172.27.40.3)
| Service | Port | URL | | Service | Port | URL | Role |
|---|---|---|---|
| Portainer | 9443 | https://172.27.40.3:9443 | Docker management |
| Nginx Proxy Manager | 80/81/443 | http://172.27.40.3:81 | Reverse proxy, SSL termination |
| Uptime Kuma | 3002 | kuma.nxm.co.za | HTTP monitoring |
| Gitea | 3000 | git.nxm.co.za | Self-hosted git, all docs + code |
| Headscale | 8080 | headscale.nxm.co.za | VPN (self-hosted Tailscale) |
| Vaultwarden | 8222 | vault.nxm.co.za | Password manager |
| Open WebUI | 3010 | chat.nxm.co.za | Chat UI for Ollama + MCP |
| Plane | 8095 | plane.nxm.co.za | Project/task tracking |
| Homarr | 7575 | http://172.27.40.3:7575 | Dashboard |
| Grafana | 3020 | grafana.nxm.co.za | Monitoring dashboards |
| InfluxDB | 8086 | internal | Time-series DB for monitoring |
| NetBox | 8100 | netbox.nxm.co.za | IPAM, network documentation |
| NocoDB | 8150 | rvd.nxm.co.za | RvDM birthday DB (personal, NOT Nexum) |
| InvenTree | 8160 | inventree.nxm.co.za | IT stock + BOM tracking (testing) |
| Directus | 8850 | directus.nxm.co.za | Nexum client CRM |
| Nextcloud | — | — | Phone backup |
| Wetty | 8450/8451 | terminal.nxm.co.za / term.nxm.co.za | Web SSH terminal |
| RustDesk | 21115-21119 | internal | Self-hosted remote desktop relay |
| SearXNG | 8600 | internal | Search backend for sam + citadel |
| iVentoy | 26000 | internal | PXE boot server |
## AI / Agent Stack
**LLM inference:**
- **Ollama** on 172.27.40.20:11434 — models: gemma4, llama3.1:8b, phi4
- **Claude Code** on 172.27.40.3 — primary AI assistant (Anthropic API)
- **Hermes Native** on 172.27.40.30 — OpenRouter, Honcho memory, WhatsApp
- **Hermes Cloud** on 172.27.40.3:8643 — claude-sonnet-4-6, Citadel MCP wired
**Named agents (all Docker on 172.27.40.3 unless noted):**
| Agent | Port | Role | Schedule |
|---|---|---|---|
| hodor-gateway | 8200 | Simple Ollama gateway (POST /ask) | On-demand |
| citadel-mcp | 8300 | MCP SSE+HTTP server, 37 tools | Always-on |
| raven-notify | 8400 | Discord + Gmail notifications | Always-on |
| sam-research | 8500 | SearXNG + Ollama research | On-demand |
| qyburn-coder | 8700 | LLM coding agent (approve/reject) | On-demand |
| maester-reports | 8800 | NIST CSF compliance reports | On-demand |
| jon-snow | 8900 | Chief of staff orchestrator | Always-on |
| bran-changelog | — | Git changelog generator | Daily 06:00 |
| varys-monitor | — | Service HTTP reachability checks | Cron every 15 min |
| tarly-backup | 8750 | OPNsense config + Proxmox backup monitor | Daily 04:00 SAST |
| hermes-cloud | 8643 | Claude-powered conversational agent | Always-on |
| hermes-native | VM 108 | Primary Hermes agent (WhatsApp) | Always-on |
| vexis (workshop) | VM 108 | Nexum workshop agent (TRMM scripts) | On-demand via Hermes |
**Citadel MCP tools (37):** file ops, Docker management, Plane issues/projects/pages, TRMM (agents/scripts/confirm), Directus CRM, Proxmox backups, Qyburn task/approve, Sam research, web search, propose_file_change.
## Cron Jobs (172.27.40.3)
| Schedule | Job | Log |
|---|---|---| |---|---|---|
| Portainer | 9443 | https://172.27.40.3:9443 | | Daily 06:00 | bran-changelog/run.sh | logs/bran.log |
| Nginx Proxy Manager | 80/81/443 | http://172.27.40.3:81 | | Daily 06:00 | zenarmor-pull.py | monitoring/logs/zenarmor-pull.log |
| Uptime Kuma | 3002 | http://172.27.40.3:3002 | | Daily 02:05 | tarly hub-backup.sh | logs/tarly-backup/hub-backup.log |
| Gitea | 3000 | https://git.nxm.co.za | | Every 1 min | ovpn-status.py | logs/ovpn-status.log |
| Headscale | 8080 | https://headscale.nxm.co.za | | Every 30 min | trmm-frappe-sync.py | logs/trmm-frappe-sync.log |
| Netbird | 3479/udp | https://netbird.nxm.co.za |
| Vaultwarden | 8222 | https://vault.nxm.co.za |
| Flowise | 3010 | http://172.27.40.3:3010 |
| Plane | 8095 | https://plane.nxm.co.za |
| Zabbix | 8091 | https://zabbix.nxm.co.za |
| Homarr | 7575 | http://172.27.40.3:7575 |
## AI Stack ## OpenVPN S2S Sites
- **Ollama** on 172.27.6.139:11434 (bound to 0.0.0.0) | Site | Tunnel IP | Status | Notes |
- **Models:** gemma4, qwen2.5-coder:7b |---|---|---|---|
- **Flowise** on 172.27.40.3:3010 — visual agent/flow builder | bezhuis | 172.16.17.2 | COMPLETE | NAT + DNS overrides, LAN access live |
- **Claude Code** — primary AI assistant, runs on Kubuntu | mwp | 172.16.17.3 | COMPLETE | Monitoring live |
| coetzee | 172.16.17.4 | COMPLETE | Monitoring-only + Active Backup to Synology |
| fwlaw | — | PENDING | Awaiting migration |
## Agent OS Runtime ## Agent OS Runtime
- Files: `/opt/agent-os/` on 172.27.40.3 - Files: `/opt/agent-os/` on 172.27.40.3
- Local edit path: `/home/nxm/Documents/agent-os/` on 172.27.6.139 - Repo: `git.nxm.co.za/admin/agent-os` (SSH remote: `gitea-local:admin/agent-os.git`)
- Repo: `https://git.nxm.co.za/admin/agent-os`
- Scheduled jobs: cron on 172.27.40.3 - Scheduled jobs: cron on 172.27.40.3
- LLM calls: `http://172.27.6.139:11434` - LLM calls: `http://172.27.40.20:11434` (Ollama) or Anthropic API (Claude Code / Hermes)
- Agent web pages: `/opt/sites/<name>/` served at agents.nxm.co.za
## Key Paths on Server ## Key Paths on Server
- Docker stacks: `/opt/stacks/` - Docker stacks: `/opt/stacks/`
- Agent OS: `/opt/agent-os/` - Agent OS: `/opt/agent-os/`
- Agent web pages: `/opt/sites/`
- Credentials: `~/.nxm-keys` (chmod 600) — NEVER write values elsewhere
- SSH keys: `~/.ssh/` (ED25519)
- NxM infrastructure docs: `/home/nxm/Documents/NxM Linux Server/`
- Nexum project docs: `/home/nxm/Documents/Nexum Projects/`
## Standing Decisions ## Standing Decisions
- TrueNAS will move to a dedicated server — avoid hardcoding 172.27.40.5 in automation - NPM handles all SSL termination — internal services use HTTP
- NPM handles all SSL termination — internal services use HTTP, NPM adds HTTPS - Docker Compose only (no Kubernetes, no Swarm)
- NFS preferred for Linux-to-Linux file sharing - All destructive actions require explicit confirmation
- Docker Compose only (no Kubernetes) - Credentials only in `~/.nxm-keys` — never in output, logs, or config files
- All destructive actions require explicit confirmation before execution - Netbird fully removed (2026-05-28) — VPN is Headscale + OpenVPN S2S
- WireGuard fully removed (2026-05-30) — replaced by OpenVPN S2S
- Open WebUI → Citadel MCP: auth_type must be `none` (empty bearer = silent failure)
- Docker → OPNsense API: run from host, never from inside a container (HTTP 400)
- NocoDB = RvDM personal only — never use for Nexum projects
- Nexum client data layer = Directus CRM
+18 -11
@@ -1,6 +1,6 @@
# Identity # Identity
> **Status: COMPLETE** — Interview completed 2026-05-01. > **Status: COMPLETE** — Interview completed 2026-05-01, updated 2026-06-19.
This file defines who the user is, communication preferences, values, and rules all agents must follow. Every skill reads this file before executing. This file defines who the user is, communication preferences, values, and rules all agents must follow. Every skill reads this file before executing.
@@ -11,7 +11,9 @@ This file defines who the user is, communication preferences, values, and rules
- **Name:** Jaco Bezuidenhout - **Name:** Jaco Bezuidenhout
- **Company:** Nexum SA (PTY) Ltd — Mossel Bay, South Africa - **Company:** Nexum SA (PTY) Ltd — Mossel Bay, South Africa
- **Role:** Business owner, IT admin, network engineer - **Role:** Business owner, IT admin, network engineer
- **Primary focus:** Network monitoring for early problem detection; IT infrastructure management for clients - **Primary focus:** Network monitoring, NIST CSF compliance reporting, IT infrastructure management for clients
- **Domain expertise:** VLANs, inter-VLAN routing, firewall rules (OPNsense), split DNS, VPN (Headscale/OpenVPN S2S), Docker Compose, Ubuntu Server admin, reverse proxy (NPM), IPAM (NetBox), monitoring (Grafana/Uptime Kuma/InfluxDB)
- **Not expert in:** Kubernetes, cloud platforms (AWS/Azure/GCP), advanced Python (learning), application development
--- ---
@@ -19,9 +21,10 @@ This file defines who the user is, communication preferences, values, and rules
Priority order: Priority order:
1. **Monitoring & compliance** — collect firewall and software data to support NIST CSF report completion 1. **Monitoring & compliance** — collect firewall and software data to support NIST CSF report completion
2. **Coding** — scripting, automation, tooling 2. **Client management** — TRMM remote management, Directus CRM, Frappe Helpdesk ticketing
3. **Summarising** — distil logs, changelogs, reports into concise output 3. **Coding** — scripting, automation, tooling
4. **General automation** — recurring tasks, scheduled jobs 4. **Summarising** — distil logs, changelogs, reports into concise output
5. **General automation** — recurring tasks, scheduled jobs, backups
--- ---
@@ -48,7 +51,7 @@ Priority order:
- Send any external message (email, webhook, notification) - Send any external message (email, webhook, notification)
- Push to git or any remote repository - Push to git or any remote repository
- Drop, reset, or modify databases - Drop, reset, or modify databases
- **Never use a cloud-hosted LLM** (OpenAI, Anthropic API, Google, etc.) unless explicitly instructed. All inference stays on local Ollama (172.27.6.139:11434). - Expose any service publicly without confirming NPM + Cloudflare + firewall implications
--- ---
@@ -56,13 +59,17 @@ Priority order:
- Depends on the task — choose the format that fits the output type. - Depends on the task — choose the format that fits the output type.
- **Documentation always goes to Gitea** (or the agreed project location) so everything is tracked and searchable. - **Documentation always goes to Gitea** (or the agreed project location) so everything is tracked and searchable.
- **Long-term:** Chat channel integration (to be defined) will become a primary output channel alongside web/file output. - **Notifications route through Raven** (Discord + Gmail) at `http://raven-notify:8400`
- **Agent web output** goes to `/opt/sites/<name>/` served at agents.nxm.co.za
--- ---
## Infrastructure Context ## Infrastructure Context
- Local LLM: Ollama at `http://172.27.6.139:11434` (gemma4, qwen2.5-coder:7b) - **Ollama:** `http://172.27.40.20:11434` — Windows 11 Pro (NxM-AI), models: gemma4, llama3.1:8b, phi4
- Server: Ubuntu at `172.27.40.3` — Docker host, all agent runtimes - **Server:** Ubuntu at `172.27.40.3` — Docker host, all agent runtimes
- Git: Gitea at `https://git.nxm.co.za` — all code and docs live here - **Hermes Native:** VM 108 at `172.27.40.30` — OpenRouter LLM, Honcho memory, WhatsApp connected
- Agent OS runtime: `/opt/agent-os/` on 172.27.40.3, mounted at `/mnt/agent-os` on Kubuntu - **Git:** Gitea at `https://git.nxm.co.za` — all code and docs
- **Agent OS runtime:** `/opt/agent-os/` on 172.27.40.3
- **Credentials:** `~/.nxm-keys` (chmod 600) — API keys for NPM, OPNsense, Proxmox, TrueNAS, Plane, Gitea, NetBox
- **Claude Code:** installed on 172.27.40.3, primary AI assistant
+18 -14
@@ -1,7 +1,7 @@
# Active Projects # Active Projects
Current in-flight work. Update at the end of each session. Current in-flight work. Update at the end of each session.
Last updated: 2026-05-16 Last updated: 2026-06-19
--- ---
@@ -12,8 +12,8 @@ Phases 1 (NFS + mount) and 2 (identity interview) are complete.
**Phase 3 goal:** Docker container state monitoring + system resources. Complements Varys (HTTP reachability) — do not duplicate. **Phase 3 goal:** Docker container state monitoring + system resources. Complements Varys (HTTP reachability) — do not duplicate.
Pre-work before implementing: Pre-work before implementing:
- [ ] Update `skills/infra-monitor/skill.md` — container list is stale (has Flowise, missing Open WebUI + all new agents: citadel, varys, bran, sam, raven, qyburn, hodor, searxng, monitoring, bni-scheduler, nocodb) - [ ] Update `skills/infra-monitor/skill.md` — container list is stale (references Flowise/Netbird, missing 20+ current services)
- [ ] Correct Ollama URL in skill.md: now `http://172.27.40.20:11434` (moved from 172.27.6.139) - [ ] Correct Ollama URL in skill.md: now `http://172.27.40.20:11434` (moved from 172.27.6.139)
- [ ] Decide implementation: Docker one-shot container (consistent with bran/varys pattern) vs host cron + shell script - [ ] Decide implementation: Docker one-shot container (consistent with bran/varys pattern) vs host cron + shell script
Implementation tasks: Implementation tasks:
@@ -26,23 +26,27 @@ Implementation tasks:
- [ ] Hourly heartbeat cron on 172.27.40.3 - [ ] Hourly heartbeat cron on 172.27.40.3
- [ ] Daily 07:00 full digest cron - [ ] Daily 07:00 full digest cron
- [ ] Notification channel: Raven (confirmed live at http://raven-notify:8400) - [ ] Notification channel: Raven (confirmed live at http://raven-notify:8400)
- [ ] Home Assistant integration (172.27.10.6) — optional, revisit after Phase 3
## Agent OS — Phase 5: Future Skills (Future) ## Agent OS — Phase 5: Future Skills (Future)
- backup-monitor: TrueNAS migrated to new hardware (172.27.40.220) — skill ready to build - backup-monitor: extend Tarly with deeper TrueNAS integration
- Netbird/Headscale peer health: Netbird API at http://172.22.0.11:80/api/
- Daily log digest: summarise /opt/agent-os/logs/ via Ollama - Daily log digest: summarise /opt/agent-os/logs/ via Ollama
--- ---
## Gitea Documentation Repos ## Active Infrastructure Projects
- [x] nxm-infrastructure repo — Obsidian vault imported, CLAUDE.md added 2026-05-16
- [x] nexum-projects repo — Obsidian vault imported (on Kubuntu) | Project | Status | Next Step |
- [x] agent-os repo — scaffolding created, CLAUDE.md is global symlink |---|---|---|
| **Monitoring** | bezhuis+mwp+coetzee alerts live | CPU/mem/WAN/ping Grafana rules pending |
| **OpenVPN S2S** | bezhuis/mwp/coetzee DONE | fwlaw pending |
| **Tarly Backup** | Hub working | bezhuis/mwp/coetzee API key fix (backup privilege) |
| **Directus CRM** | LIVE, 12 clients seeded | Manual data enrichment (contacts, renewals) |
| **InvenTree** | LIVE (testing) | SSL cert, production use |
| **Mailcow** | MAIL-1+2 done | Blocked on Mimecast (MAIL-3→9) |
| **Vexis** | nexum-private-customer-setup + office-install done | ESET/Evolve creds or standard-setup next |
| **Maester Phase 2** | Phase 1 live | Hermes narrative + .docx generation |
--- ---
## Pending: Gitea SSH Key (security debt) ## Gitea SSH Key — DONE
Server remote uses HTTP with embedded token. Before rotating: Server remote switched from HTTP+token to SSH (`gitea-local:admin/agent-os.git`) on 2026-06-19.
1. Add SSH key for `nxm@172.27.40.3` to Gitea (Admin → Settings → SSH Keys)
2. `cd /opt/agent-os && git remote set-url origin gitea-local:admin/agent-os.git`
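To guard against the HTTP+token remote reappearing, a check along these lines could be run against the output of `git remote get-url origin`. It is a sketch only; the function name `has_embedded_credentials` is hypothetical, and SSH-style remotes such as `gitea-local:admin/agent-os.git` are treated as credential-free.

```python
from urllib.parse import urlsplit


def has_embedded_credentials(remote_url):
    """Return True if an HTTP(S) remote URL carries a username or token."""
    parts = urlsplit(remote_url)
    if parts.scheme not in ("http", "https"):
        # SSH aliases and scp-style remotes have no userinfo to leak.
        return False
    return bool(parts.username or parts.password)
```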
+33 -5
@@ -1,13 +1,41 @@
# Constraints # Constraints
Hard limits agents must respect. Never work around these without explicit user confirmation. Hard limits agents must respect. Never work around these without explicit user confirmation.
Last updated: 2026-04-30 Last updated: 2026-06-19
--- ---
- Never take destructive or irreversible action without explicit confirmation (delete, overwrite, drop, reset, force push) ## Destructive actions
- Never store credentials in output files, logs, or generated markdown — reference their location instead - Never delete or overwrite files without explicit confirmation
- Never skip git hooks or bypass signing - Never restart or stop services without explicit confirmation
- TrueNAS is on new hardware — use 172.27.40.220 (Servers40) for services, 172.27.6.221 for management/API - Never drop, reset, or modify databases without explicit confirmation
- Never force push to git or bypass hooks
- Never run `pfctl` commands on OPNsense (risk of locking out remote access)
## Credentials
- All credentials live in `~/.nxm-keys` (chmod 600); this is the ONLY permitted location
- Never store credentials in output files, logs, generated markdown, .env files, or code
- Reference the file location, never the values
- TrueNAS IPs: 172.27.40.220 (Servers40 data) / 172.27.6.221 (management/API)
## Infrastructure
- Linux server (172.27.40.3) has no GPU — never schedule LLM inference to run locally there - Linux server (172.27.40.3) has no GPU — never schedule LLM inference to run locally there
- Ollama runs on 172.27.40.20 (Windows 11 Pro) — not on the Docker host
- Docker Compose only — no Kubernetes, no Swarm - Docker Compose only — no Kubernetes, no Swarm
- Docker proxy network (172.22.0.0/16) cannot reach OPNsense API at 172.27.6.1 — always run OPNsense API scripts from the host
- NPM handles SSL termination — internal services always use HTTP
## Agent-specific
- **maester-reports:** restart clears in-memory cache → re-parses all evidence PDFs via Claude Opus vision (Anthropic API cost). Avoid unnecessary restarts.
- **NocoDB:** RvDM personal birthday DB ONLY — never suggest for any Nexum project. Nexum data layer = Directus.
- **Open WebUI → Citadel MCP:** auth_type must be `none`. Empty bearer key generates illegal header → silent connection failure.
- **Qyburn task specs:** never embed code in the description field — use plain English only (14b models explain code instead of writing it)
## External communication
- Never send any external message (email, webhook, Discord notification) without explicit confirmation
- Notifications always route through Raven (http://raven-notify:8400)
- Never expose services publicly without confirming NPM + Cloudflare + firewall implications
## Naming
- S2S = Site-to-Site VPN. For permanent infrastructure endpoints, always suggest S2S, not Road Warrior
- Use `.50+` IP range for non-firewall infrastructure devices on S2S tunnels
+32 -6
@@ -1,18 +1,44 @@
# Persistent Memory # Persistent Memory
Facts that don't expire. If you'd have to re-explain it to a new agent every time, it belongs here. Facts that don't expire. If you'd have to re-explain it to a new agent every time, it belongs here.
Last updated: 2026-04-30 Last updated: 2026-06-19
--- ---
## Infrastructure decisions ## Infrastructure decisions
- RustDesk is self-hosted on 172.27.40.3 — clients connect to local server not public relay - RustDesk is self-hosted on 172.27.40.3 — clients connect to local server not public relay
- Netbird signal+management both route through NPM on port 443 — exposedAddress in /opt/stacks/netbird/config.yaml must be https://netbird.nxm.co.za:443 (caddy-netbird on :8443 exists but is not used externally) - NPM handles all SSL termination — internal services use HTTP, NPM adds HTTPS
- Headscale v0.28: all write operations require numeric user ID, not username - Headscale v0.28: all write operations require numeric user ID, not username
- Tailscale on Windows overrides DNS — disconnect before testing split DNS changes - Tailscale on Windows overrides DNS — disconnect before testing split DNS changes
- Servers running Tailscale must run `sudo tailscale set --accept-dns=false` before joining Netbird - Docker Compose only — no Kubernetes, no Swarm
- Docker → OPNsense API: HTTP 400 from Docker proxy network — always run OPNsense API scripts from the host
- All internal subdomains: gray-cloud (DNS-only) CNAME → opnsense.nxm.co.za in Cloudflare. Proxied (orange-cloud) records fail with Cloudflare error 523 (origin unreachable).
- OPNsense split DNS: all subdomains resolve to 172.27.40.3 internally via Unbound host overrides
## Decommissioned services (do not reference)
- **Netbird:** Fully removed from server 2026-05-28. Orphaned clients on mwp/coetzee/b0qxxx/fwlaw firewalls pending removal.
- **WireGuard (N2W):** Fully removed 2026-05-30. Replaced by OpenVPN S2S.
- **Flowise:** Replaced by Open WebUI 2026-05-01.
- **Zabbix:** No longer running (monitoring moved to Grafana + InfluxDB + Telegraf).
## Agent OS build state ## Agent OS build state
- Phase 1-2 (file structure + NFS + identity interview): not yet started - Phase 1-2 complete (file structure + identity interview)
- First skill to build: infra-monitor (Docker health + agent watchdog) - Phase 3 (infra-monitor skill): spec written but stale, not yet implemented
- Notifications target: Home Assistant at 172.27.10.6 - Notifications target: Raven at http://raven-notify:8400 (Discord + Gmail)
- All agent logs write to `/opt/agent-os/logs/<agent>/last-run.json`
## Credential policy
- All API keys and passwords: `~/.nxm-keys` (chmod 600)
- Never write credential values into output, logs, docs, or config files
- Reference credential location instead
## VPN topology
- **Headscale** (self-hosted Tailscale): remote access for admin devices
- **OpenVPN S2S:** site-to-site for client firewalls (bezhuis/mwp/coetzee done, fwlaw pending)
- Hub tunnel IPs: bezhuis=172.16.17.2, mwp=172.16.17.3, coetzee=172.16.17.4
## Ollama
- Host: 172.27.40.20 (Windows 11 Pro, NxM-AI), Vulkan GPU
- Models: gemma4, llama3.1:8b, phi4
- Auto-starts via Scheduled Task (S4U + AtStartup)
- Used by: hodor-gateway, sam-research, qyburn-coder, Open WebUI
+19 -6
@@ -1,11 +1,24 @@
# Recent Decisions # Recent Decisions
Decisions made in the last 30 days that affect current work. Archive when no longer relevant. Decisions made in the last 60 days that affect current work. Archive when no longer relevant.
Last updated: 2026-04-30 Last updated: 2026-06-19
--- ---
- **2026-04-30:** Chose Gitea (self-hosted git) over Obsidian for documentation — AI-writable, browser-accessible, version controlled - **2026-06-19:** Agent OS git remote switched from HTTP+token to SSH (gitea-local:admin/agent-os.git) — security debt resolved
- **2026-04-30:** Agent OS files to live on 172.27.40.3 at /opt/agent-os/, accessed from Kubuntu via NFS - **2026-06-19:** Comprehensive Agent OS documentation update — brain.md, identity.md, all memory files brought current for LLM onboarding
- **2026-04-29:** Chose Syncthing-free approach for Obsidian migration — NFS for Linux, SMB for Windows - **2026-06-18:** Coetzee OpenVPN S2S complete — monitoring-only + hub-side NAT for Active Backup to Synology DS423+
- **2026-04-29:** infra-monitor will be first Agent OS skill — covers Docker health and agent watchdog in one skill - **2026-06-18:** Tarly backup service live — OPNsense config backups to TrueNAS NFS, Proxmox monitoring
- **2026-06-17:** Directus CRM live — 6 collections, 12 clients seeded from TRMM, 5 Citadel MCP tools
- **2026-06-17:** MWP Netbird fully removed, WireGuard spoke cleaned
- **2026-06-12:** NxM-AI (Kubuntu) migrated to Windows 11 Pro — same IP 172.27.40.20, Ollama via Scheduled Task
- **2026-06-12:** Vexis office-install/uninstall scripts live-tested, windows-update scripts done
- **2026-06-11:** Workshop20 → Servers40 firewall rules (1677-1680) for TRMM + Vexis access
- **2026-06-10:** Frappe Helpdesk live — TRMM→HD sync, Citadel tools, Vexis wired
- **2026-06-10:** trmm_confirm_with_user proven working (incl. response-parsing bug fix)
- **2026-05-30:** WireGuard fully removed, replaced by OpenVPN S2S
- **2026-05-29:** Maester reports Phase 1 live — 8 automated CSF controls, Grafana dashboard
- **2026-05-28:** Netbird fully removed from server
- **2026-05-28:** ZenArmor → Grafana pipeline all 3 phases complete
- **2026-05-27:** Jon Snow Phase 3 complete — approval gate, Discord approve/reject
- **2026-04-30:** Agent OS architecture: plain markdown files at /opt/agent-os/, Gitea-tracked, cron-scheduled
+60 -11
@@ -15,35 +15,84 @@ Reads before executing:
### Docker health (on 172.27.40.3) ### Docker health (on 172.27.40.3)
- All expected containers are running (not exited/restarting) - All expected containers are running (not exited/restarting)
- Flag any container that has restarted more than 3 times in the last hour - Flag any container that has restarted more than 3 times in the last hour
- Expected containers: portainer, nginx-proxy-manager, uptime-kuma, gitea, headscale, netbird, vaultwarden, flowise, plane, zabbix, homarr - Expected containers (grouped by criticality):
**Critical (alert immediately if down):**
- nginx-proxy-manager (reverse proxy — everything depends on this)
- gitea (all code + docs)
- citadel-mcp (central tool server)
- raven-notify (notification hub)
- open-webui (chat UI)
- vaultwarden (password manager)
**Important (alert after 15 min down):**
- headscale (VPN)
- grafana (monitoring dashboards)
- influxdb (time-series data)
- portainer (Docker management)
- uptime-kuma (HTTP monitoring)
- maester-reports (CSF compliance)
- jon-snow (orchestrator)
- tarly-backup (backup monitoring)
- directus + directus-db + directus-redis (CRM)
**Normal (report in daily digest only):**
- hodor-gateway, sam-research, qyburn-coder, searxng
- homarr, headplane, headscale-ui
- plane-* (all Plane containers)
- netbox-* (all NetBox containers)
- nocodb, bni-scheduler, inventree-*, wetty, term-dash
- rustdesk-hbbs, rustdesk-hbbr
- iventoy, agent-sites
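
The tiers above could be applied along these lines. This is a minimal sketch, assuming container names match the list literally and that `plane-*`, `netbox-*` and `inventree-*` are shell-style wildcards. The names `TIERS` and `tier_for` and the `unlisted` fallback are hypothetical and not part of the skill spec.

```python
from fnmatch import fnmatchcase

# Tier patterns copied from the list above; wildcard entries use fnmatch syntax.
TIERS = {
    "critical": ["nginx-proxy-manager", "gitea", "citadel-mcp",
                 "raven-notify", "open-webui", "vaultwarden"],
    "important": ["headscale", "grafana", "influxdb", "portainer",
                  "uptime-kuma", "maester-reports", "jon-snow",
                  "tarly-backup", "directus", "directus-db", "directus-redis"],
    "normal": ["hodor-gateway", "sam-research", "qyburn-coder", "searxng",
               "homarr", "headplane", "headscale-ui", "plane-*", "netbox-*",
               "nocodb", "bni-scheduler", "inventree-*", "wetty", "term-dash",
               "rustdesk-hbbs", "rustdesk-hbbr", "iventoy", "agent-sites"],
}


def tier_for(container: str) -> str:
    """Return the criticality tier for a container name.

    Containers that match no pattern return 'unlisted' (an assumption:
    the spec does not say how unexpected containers should be treated).
    """
    for tier, patterns in TIERS.items():
        if any(fnmatchcase(container, pattern) for pattern in patterns):
            return tier
    return "unlisted"
```

Keeping the tiers in one mapping means the alerting delay (immediate, 15 minutes, daily digest) can be looked up from the tier name rather than hard-coded per container.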
### Service reachability ### Service reachability
Lightweight HTTP check (curl, timeout 5s) on each internal URL: Lightweight HTTP check (curl, timeout 5s) on each internal URL:
- http://172.27.40.3:9443 (Portainer) - http://172.27.40.3:9443 (Portainer)
- http://172.27.40.3:3002 (Uptime Kuma) - http://172.27.40.3:3002 (Uptime Kuma)
- http://172.27.40.3:3000 (Gitea) - http://172.27.40.3:3000 (Gitea)
- http://172.27.40.3:3010 (Flowise) - http://172.27.40.3:3010 (Open WebUI)
- http://172.27.40.3:7575 (Homarr) - http://172.27.40.3:7575 (Homarr)
- http://172.27.6.139:11434 (Ollama) - http://172.27.40.3:8300 (Citadel MCP)
- http://172.27.40.3:8400 (Raven)
- http://172.27.40.3:8800 (Maester)
- http://172.27.40.3:8900 (Jon Snow)
- http://172.27.40.3:3020 (Grafana)
- http://172.27.40.3:8100 (NetBox)
- http://172.27.40.3:8850 (Directus)
- http://172.27.40.20:11434 (Ollama on NxM-AI)
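
A reachability probe for the URLs above might look like the following. It is a sketch using only the Python standard library instead of curl, and it assumes that any HTTP response (including 4xx/5xx) means the service is reachable, which matches curl's default exit code but is not stated in the spec. The function name `check_url` and the injectable `opener` parameter are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Tuple


def check_url(url: str, timeout: float = 5.0,
              opener=urllib.request.urlopen) -> Tuple[bool, str]:
    """Return (reachable, detail) for one URL.

    Assumption: an HTTP error status still counts as reachable because
    the service answered; only connection failures and timeouts are down.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # Must be caught before URLError, which is its parent class.
        return True, f"HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:
        return False, str(err)
```

The `opener` argument exists only so the function can be exercised without network access; in normal use it is left at its default.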
### Agent watchdog ### Agent watchdog
For each skill directory under `../../skills/`: For each agent log at `../../logs/<agent>/last-run.json`:
- Check `last-output.md` modification time — flag if older than expected schedule - Check modification time — flag if older than expected schedule
- Check `../../logs/<skill-name>/` for ERROR entries in last run - Check `status` field — flag if not "success"
- Report: healthy / stale / erroring - Expected agents and max staleness:
- bran-changelog: 25 hours (daily)
- varys-monitor: 20 minutes (every 15 min)
- trmm-frappe-sync: 35 minutes (every 30 min)
- tarly-backup: 25 hours (daily)
- raven-notify: 25 hours (event-driven, check status only)
- citadel-mcp, sam-research, qyburn-coder, jon-snow: check status only (on-demand)
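
The watchdog rules above can be sketched as follows. The staleness limits are taken from the list; the function name `agent_state`, the `healthy` / `stale` / `erroring` labels as return values, and the choice to treat a missing or malformed `last-run.json` as erroring are assumptions made for illustration.

```python
import json
import time
from pathlib import Path
from typing import Optional

# Max staleness in seconds, from the list above. None means "check status only".
MAX_STALENESS = {
    "bran-changelog": 25 * 3600,
    "varys-monitor": 20 * 60,
    "trmm-frappe-sync": 35 * 60,
    "tarly-backup": 25 * 3600,
    "raven-notify": 25 * 3600,
    "citadel-mcp": None,
    "sam-research": None,
    "qyburn-coder": None,
    "jon-snow": None,
}


def agent_state(log_file: Path, max_age: Optional[float],
                now: Optional[float] = None) -> str:
    """Classify one agent log as 'healthy', 'stale', or 'erroring'."""
    now = time.time() if now is None else now
    try:
        status = json.loads(log_file.read_text()).get("status")
        age = now - log_file.stat().st_mtime
    except (OSError, ValueError, AttributeError):
        # Assumption: a missing, unreadable, or malformed log is an error.
        return "erroring"
    if status != "success":
        return "erroring"
    if max_age is not None and age > max_age:
        return "stale"
    return "healthy"
```

A caller would loop over `MAX_STALENESS`, build each path as `logs/<agent>/last-run.json`, and collect the returned states for the digest.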
### System resources (on 172.27.40.3) ### System resources (on 172.27.40.3)
- Disk usage on / — warn if >80%, critical if >90% - Disk usage on / — warn if >80%, critical if >90%
- Memory usage — flag if >85% - Memory usage — flag if >85%
- Docker disk usage (`docker system df`) — warn if reclaimable > 10GB
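
The disk thresholds above can be expressed as a small classifier. This is a sketch: `disk_level` and `root_disk_percent` are hypothetical names, and `shutil.disk_usage` computes used/total, which can differ slightly from the `Use%` column of `df` because `df` accounts for reserved blocks.

```python
import shutil


def disk_level(used_percent: float) -> str:
    """Map a usage percentage to 'ok', 'warning' (>80%), or 'critical' (>90%)."""
    if used_percent > 90:
        return "critical"
    if used_percent > 80:
        return "warning"
    return "ok"


def root_disk_percent(path: str = "/") -> float:
    """Return the used share of the filesystem containing `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

Both thresholds are strict comparisons, so exactly 80% is still `ok` and exactly 90% is still `warning`, following the ">80%" and ">90%" wording.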
### Remote hosts (optional, best-effort)
- Ping 172.27.40.20 (Ollama host)
- Ping 172.27.40.30 (Hermes Native VM)
- Ping 172.27.40.2 (Proxmox)
## Output ## Output
Write a digest to `last-output.md` in this format: Write a digest to `last-output.md` in this format:
- Summary line: X healthy, Y warnings, Z critical - Summary line: X healthy, Y warnings, Z critical
- Section per category: Docker, Services, Agent Watchdog, System - Section per category: Docker, Services, Agent Watchdog, System, Remote Hosts
- Each item: ✓ OK / ⚠ Warning / ✗ Critical + one line detail - Each item: ✓ OK / ⚠ Warning / ✗ Critical + one line detail
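
One possible rendering of this digest format is sketched below. The input shape (a mapping of category name to `(name, level, detail)` tuples), the function name `render_digest`, and the `##` heading level for sections are assumptions; only the summary line and the three status symbols come from the format described above.

```python
from collections import Counter

SYMBOLS = {"ok": "✓ OK", "warning": "⚠ Warning", "critical": "✗ Critical"}


def render_digest(results: dict) -> str:
    """Render {category: [(name, level, detail), ...]} as a markdown digest."""
    counts = Counter(level for items in results.values()
                     for _, level, _ in items)
    lines = [
        f"{counts['ok']} healthy, {counts['warning']} warnings, "
        f"{counts['critical']} critical",
        "",
    ]
    for category, items in results.items():
        lines.append(f"## {category}")
        for name, level, detail in items:
            lines.append(f"- {SYMBOLS[level]}: {name} ({detail})")
        lines.append("")
    return "\n".join(lines)
```

The same `results` structure could also be dumped as JSON for the machine-readable log, so both outputs stay in sync.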
Pass anomalies to `context/handoff.md` for notification skill (future). Also write machine-readable output to `../../logs/infra-monitor/last-run.json`.
Pass anomalies to `context/handoff.md` for Raven notification.
## Wrap-up ## Wrap-up
@@ -54,5 +103,5 @@ After writing output:
## Schedule ## Schedule
- **Heartbeat:** every hour — checks Docker + Ollama only (fast, <30s) - **Heartbeat:** every hour — checks Docker + Ollama + critical services only (fast, <30s)
- **Full digest:** daily at 07:00 — all checks - **Full digest:** daily at 07:00 — all checks including remote hosts and disk usage