Initial Agent OS scaffolding — identity, brain, memory, infra-monitor skill

2026-04-30 13:40:45 +02:00
commit e8bd571a77
15 changed files with 283 additions and 0 deletions
@@ -0,0 +1,41 @@
 # NxM Agent OS
 A personal agentic operating system built on plain markdown files. Tool-agnostic — works with Claude Code, Ollama, or any LLM harness. Based on the framework from the AI Daily Brief episode "How to Build a Personal Agentic Operating System" (Nufar Gaspar, 2026-04-25).
 ## How it works
 Every agent interaction reads from and writes back to files in this repo. No databases, no APIs, no vendor lock-in. The files ARE the system.
 ## The seven layers
 | Layer | File(s) | Purpose |
 |---|---|---|
 | Identity | `identity.md` | Who you are, communication style, values |
 | Context | `context/` | Dated, task-specific working files |
 | Brain | `brain.md` | Persistent facts — infra, people, decisions |
 | Memory | `memory/` | Short- and long-term session notes |
 | Skills | `skills/` | Repeatable workflows, each self-improving |
 | Processes | `skills/*/context/handoff.md` | Output passed between chained skills |
 | Automation | cron on 172.27.40.3 | Scheduled skill execution |
 ## Adding a new skill
 1. Create `skills/<skill-name>/skill.md` — what the skill does and how
 2. Create `skills/<skill-name>/learnings.md` — starts empty, fills over time
 3. Create `skills/<skill-name>/eval.json` — scoring criteria
 4. Add a cron job on 172.27.40.3 calling the skill
 5. The infra-monitor watchdog will automatically pick it up
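
Steps 1 to 3 above could be scripted. The following is a minimal sketch, not part of this commit: `scaffold_skill` and its placeholder file contents are hypothetical, and it assumes it is run against the repo root.

```python
import json
from pathlib import Path


def scaffold_skill(repo_root: Path, name: str) -> Path:
    """Create skill.md, learnings.md and eval.json for a new skill.

    Existing files are left untouched so a re-run never overwrites work.
    """
    skill_dir = repo_root / "skills" / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    # Placeholder contents are an assumption; replace them with real text.
    defaults = {
        "skill.md": f"# Skill: {name}\n",
        "learnings.md": f"# Learnings: {name}\n",
        "eval.json": json.dumps({"criteria": []}, indent=2) + "\n",
    }
    for filename, content in defaults.items():
        path = skill_dir / filename
        if not path.exists():
            path.write_text(content, encoding="utf-8")
    return skill_dir
```

Calling `scaffold_skill(Path("/opt/agent-os"), "my-skill")` would create the directory on the server clone; the cron job in step 4 still has to be added by hand.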
 ## Runtime
 - Files live on server: `/opt/agent-os/` (cloned from this repo)
 - LLM inference: Ollama at `http://172.27.6.139:11434`
 - Scheduled jobs: cron on `172.27.40.3`
 - Local editing: `/home/nxm/Documents/agent-os/` on Kubuntu (this machine)
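
As an illustration of how a harness might combine the files with the Ollama endpoint above, here is a hedged sketch. The helper names `build_payload` and `generate`, the prompt layout, and the default model choice are assumptions; only the `/api/generate` endpoint with `model`, `prompt` and `stream` fields is Ollama's documented API.

```python
import json
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://172.27.6.139:11434"  # from the Runtime list


def build_payload(repo_root: Path, task: str, model: str = "qwen2.5-coder:7b") -> dict:
    """Join identity.md and brain.md (when present) with the task into one prompt."""
    parts = []
    for name in ("identity.md", "brain.md"):
        path = repo_root / name
        if path.exists():
            parts.append(path.read_text(encoding="utf-8"))
    parts.append(task)
    return {"model": model, "prompt": "\n\n".join(parts), "stream": False}


def generate(payload: dict, base_url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """POST the payload to Ollama's /api/generate and return the generated text."""
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

A skill run would then be roughly `generate(build_payload(Path("/opt/agent-os"), "Summarise today's Docker health."))`, with the result written back to a file in the repo.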
 ## Infra reference
 Cross-repo links to supporting documentation:
 - [IP & Port Map](https://git.nxm.co.za/admin/nxm-infrastructure/src/branch/main/Quick%20Reference/IP%20%26%20Port%20Map.md)
 - [Docker Stacks](https://git.nxm.co.za/admin/nxm-infrastructure/src/branch/main/Quick%20Reference/Docker%20Stacks.md)
 - [Network Overview](https://git.nxm.co.za/admin/nxm-infrastructure/src/branch/main/Infrastructure/Network%20Overview.md)
@@ -0,0 +1,64 @@
 # Brain
 Core facts read by all skills. Keep under 1000 words. Update when infrastructure changes.
 Last updated: 2026-04-30
 ---
 ## Infrastructure
 **Primary server:** 172.27.40.3 — Ubuntu Server LTS, Docker host
 **Kubuntu desktop:** 172.27.6.139 — NxM-AI, runs Ollama
 **TrueNAS NAS:** 172.27.40.5
 **Firewall:** OPNsense at 172.27.6.1
 **VLANs:**
 | VLAN | Name | Subnet |
 |---|---|---|
 | 40 | Servers40 | 172.27.40.0/24 |
 | 20 | Workshop20 | 172.27.20.0/24 |
 | 10 | IoT10 | 172.27.10.0/24 |
 ## Key Services (172.27.40.3)
 | Service | Port | URL |
 |---|---|---|
 | Portainer | 9443 | https://172.27.40.3:9443 |
 | Nginx Proxy Manager | 80/81/443 | http://172.27.40.3:81 |
 | Uptime Kuma | 3002 | http://172.27.40.3:3002 |
 | Gitea | 3000 | https://git.nxm.co.za |
 | Headscale | 8080 | https://headscale.nxm.co.za |
 | Netbird | 3479/udp | https://netbird.nxm.co.za |
 | Vaultwarden | 8222 | https://vault.nxm.co.za |
 | Flowise | 3010 | http://172.27.40.3:3010 |
 | Plane | 8095 | https://plane.nxm.co.za |
 | Zabbix | 8091 | https://zabbix.nxm.co.za |
 | Homarr | 7575 | http://172.27.40.3:7575 |
 ## AI Stack
 - **Ollama** on 172.27.6.139:11434 (bound to 0.0.0.0)
 - **Models:** gemma4, qwen2.5-coder:7b
 - **Flowise** on 172.27.40.3:3010 — visual agent/flow builder
 - **Claude Code** — primary AI assistant, runs on Kubuntu
 ## Agent OS Runtime
 - Files: `/opt/agent-os/` on 172.27.40.3
 - Local edit path: `/home/nxm/Documents/agent-os/` on 172.27.6.139
 - Repo: `https://git.nxm.co.za/admin/agent-os`
 - Scheduled jobs: cron on 172.27.40.3
 - LLM calls: `http://172.27.6.139:11434`
 ## Key Paths on Server
 - Docker stacks: `/opt/stacks/`
 - Agent OS: `/opt/agent-os/`
 ## Standing Decisions
 - TrueNAS will move to a dedicated server — avoid hardcoding 172.27.40.5 in automation
 - NPM handles all SSL termination — internal services use HTTP, NPM adds HTTPS
 - NFS preferred for Linux-to-Linux file sharing
 - Docker Compose only (no Kubernetes)
 - All destructive actions require explicit confirmation before execution
@@ -0,0 +1,19 @@
 # Identity
 > **Status: PENDING** — To be completed via Claude interview session.
 > Run the interview by saying: "Let's complete the Agent OS identity interview."
 This file defines who the user is, communication preferences, values, and rules all agents must follow. Every skill reads this file before executing.
 ## What the interview will capture
 - Professional role and responsibilities
 - Communication style preferences
 - Core values and priorities
 - Things agents should never do
 - How decisions should be escalated vs handled autonomously
 - Preferred output formats
 ---
 *This section will be replaced with the completed identity profile after the interview.*
@@ -0,0 +1,23 @@
 # Active Projects
 Current in-flight work. Update at the end of each session.
 Last updated: 2026-04-30
 ---
 ## Agent OS — Phase 1 (NEXT)
 Complete the foundation before building skills.
 - [ ] Set up NFS export on 172.27.40.3 + mount on Kubuntu at /mnt/agent-os
 - [ ] Run identity interview with Claude → populate identity.md
 - [ ] Review the seeded brain.md and confirm accuracy
 - [ ] Clone this repo to /opt/agent-os/ on server
 ## Agent OS — Phase 3 (PENDING Phase 1)
 - [ ] Build infra-monitor skill
 - [ ] Set up cron schedule (hourly heartbeat, daily digest)
 - [ ] Wire up Home Assistant notifications
 ## Gitea documentation
 - [x] nxm-infrastructure repo — Obsidian vault imported
 - [x] nexum-projects repo — Obsidian vault imported
 - [x] agent-os repo — scaffolding created
@@ -0,0 +1,13 @@
 # Constraints
 Hard limits agents must respect. Never work around these without explicit user confirmation.
 Last updated: 2026-04-30
 ---
 - Never take destructive or irreversible action without explicit confirmation (delete, overwrite, drop, reset, force push)
 - Never store credentials in output files, logs, or generated markdown — reference their location instead
 - Never skip git hooks or bypass signing
 - TrueNAS (172.27.40.5) is being migrated to a new server — do not create hard dependencies on that IP
 - Linux server (172.27.40.3) has no GPU — never schedule LLM inference to run locally there
 - Docker Compose only — no Kubernetes, no Swarm
@@ -0,0 +1,8 @@
 # Notes from Last Run
 Populated automatically at the end of each skill run. Cleared at the start of each new session.
 Last updated: —
 ---
 *No runs yet — Agent OS not yet deployed.*
@@ -0,0 +1,18 @@
 # Persistent Memory
 Facts that don't expire. If you'd have to re-explain it to a new agent every time, it belongs here.
 Last updated: 2026-04-30
 ---
 ## Infrastructure decisions
 - RustDesk is self-hosted on 172.27.40.3 — clients connect to local server not public relay
 - Netbird client management is on port 8443 via Caddy sidecar, NOT port 443
 - Headscale v0.28: all write operations require numeric user ID, not username
 - Tailscale on Windows overrides DNS — disconnect before testing split DNS changes
 - Servers running Tailscale must run `sudo tailscale set --accept-dns=false` before joining Netbird
 ## Agent OS build state
 - Phase 1-2 (file structure + NFS + identity interview): not yet started
 - First skill to build: infra-monitor (Docker health + agent watchdog)
 - Notifications target: Home Assistant at 172.27.10.6
@@ -0,0 +1,11 @@
 # Recent Decisions
 Decisions made in the last 30 days that affect current work. Archive when no longer relevant.
 Last updated: 2026-04-30
 ---
 - **2026-04-30:** Chose Gitea (self-hosted git) over Obsidian for documentation — AI-writable, browser-accessible, version controlled
 - **2026-04-30:** Agent OS files to live on 172.27.40.3 at /opt/agent-os/, accessed from Kubuntu via NFS
 - **2026-04-29:** Chose Syncthing-free approach for Obsidian migration — NFS for Linux, SMB for Windows
 - **2026-04-29:** infra-monitor will be first Agent OS skill — covers Docker health and agent watchdog in one skill
@@ -0,0 +1,5 @@
 # Handoff: infra-monitor → notification
 Populated by infra-monitor when anomalies are found. Read by the notification skill (future).
 *Empty — no anomalies from last run, or skill has not run yet.*
@@ -0,0 +1,8 @@
 {
  "criteria": [
    { "name": "all_services_checked", "weight": 0.3, "description": "Every expected container and service was checked, none skipped" },
    { "name": "clear_status_summary", "weight": 0.3, "description": "Output leads with a plain-English summary line before detail" },
    { "name": "actionable_findings", "weight": 0.2, "description": "Any warning or critical item includes enough detail to act on immediately" },
    { "name": "agent_watchdog_complete", "weight": 0.2, "description": "All skills in /skills/ were checked for staleness and errors" }
  ]
 }
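
One way the criteria above could be turned into a single score is a weighted sum. This is a sketch under assumptions: the commit does not say how `eval.json` is consumed, and both the 0.0 to 1.0 per-criterion scale and the function names are hypothetical.

```python
import json
import math


def load_weights(eval_json: str) -> dict:
    """Parse eval.json text and return {criterion name: weight}.

    Raises ValueError if the weights do not sum to 1.0.
    """
    criteria = json.loads(eval_json)["criteria"]
    weights = {c["name"]: float(c["weight"]) for c in criteria}
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total}, expected 1.0")
    return weights


def weighted_score(weights: dict, scores: dict) -> float:
    """Combine per-criterion scores (assumed 0.0 to 1.0); missing criteria count as 0."""
    return sum(weight * scores.get(name, 0.0) for name, weight in weights.items())
```

With the four weights in this file (0.3, 0.3, 0.2, 0.2), a run that only satisfies `all_services_checked` would score about 0.3.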
@@ -0,0 +1,3 @@
 # Last Output: infra-monitor
 *Not yet populated — skill has not run.*
@@ -0,0 +1,12 @@
 # Learnings: infra-monitor
 Updated automatically after each run. The skill reads this before executing to improve its next output.
 ## What has worked well
 *Not yet populated — skill has not run.*
 ## What missed the mark
 *Not yet populated — skill has not run.*
 ## Adjustments for next run
 *Not yet populated — skill has not run.*
@@ -0,0 +1,58 @@
 # Skill: infra-monitor
 Monitors server health and watches all Agent OS skills for staleness or errors. Runs on a cron schedule on 172.27.40.3.
 ## Inputs
 Reads before executing:
 - `../../identity.md`
 - `../../brain.md`
 - `../../memory/persistent.md`
 - `learnings.md` (this skill's improvement notes)
 ## What to check
 ### Docker health (on 172.27.40.3)
 - All expected containers are running (not exited/restarting)
 - Flag any container that has restarted more than 3 times in the last hour
 - Expected containers: portainer, nginx-proxy-manager, uptime-kuma, gitea, headscale, netbird, vaultwarden, flowise, plane, zabbix, homarr
 ### Service reachability
 Lightweight HTTP check (curl, timeout 5s) on each internal URL:
 - https://172.27.40.3:9443 (Portainer)
 - http://172.27.40.3:3002 (Uptime Kuma)
 - http://172.27.40.3:3000 (Gitea)
 - http://172.27.40.3:3010 (Flowise)
 - http://172.27.40.3:7575 (Homarr)
 - http://172.27.6.139:11434 (Ollama)
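
The skill file specifies curl; the same check could be sketched in Python as follows. `is_reachable` and its `opener` parameter are hypothetical names, and treating any HTTP status (including 4xx/5xx) as "reachable" is an assumption about what the check should mean.

```python
import http.client
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> bool:
    """Return True if the server answers with any HTTP status within the timeout.

    Connection failures, timeouts and malformed responses return False.
    The opener argument exists so the function can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a 2xx/3xx status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False
```

Note that an HTTPS endpoint with a self-signed certificate would be reported as unreachable by this sketch, because certificate verification fails before any HTTP status is received.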
 ### Agent watchdog
 For each skill directory under `../../skills/`:
 - Check `last-output.md` modification time — flag if older than expected schedule
 - Check `../../logs/<skill-name>.log` for ERROR entries in last run
 - Report: healthy / stale / erroring
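
The three-way classification above might look like the sketch below. It is not the skill's implementation: `skill_status`, the `max_age_s` threshold, and reading only the last log line as "the last run" are assumptions.

```python
from __future__ import annotations

import time
from pathlib import Path


def skill_status(
    skill_dir: Path,
    max_age_s: float,
    log_file: Path | None = None,
    now: float | None = None,
) -> str:
    """Classify a skill as 'healthy', 'stale' or 'erroring'.

    erroring: the most recent log line contains ERROR.
    stale:    last-output.md is missing or older than max_age_s seconds.
    """
    now = time.time() if now is None else now
    if log_file is not None and log_file.exists():
        lines = log_file.read_text(encoding="utf-8").splitlines()
        if lines and "ERROR" in lines[-1]:
            return "erroring"
    output = skill_dir / "last-output.md"
    if not output.exists() or now - output.stat().st_mtime > max_age_s:
        return "stale"
    return "healthy"
```

For an hourly skill, `max_age_s` might be set to something like two hours to tolerate one missed run; that margin is a design choice, not something this repo defines.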
 ### System resources (on 172.27.40.3)
 - Disk usage on / — warn if >80%, critical if >90%
 - Memory usage — flag if >85%
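
The thresholds above can be expressed as small pure functions. A sketch, assuming a Linux host where `/proc/meminfo` provides `MemTotal` and `MemAvailable`; all function names are hypothetical.

```python
import shutil


def classify_disk(percent_used: float) -> str:
    """Apply the disk thresholds: >90% critical, >80% warning, otherwise ok."""
    if percent_used > 90:
        return "critical"
    if percent_used > 80:
        return "warning"
    return "ok"


def classify_memory(percent_used: float) -> str:
    """Apply the memory threshold: flag (warning) above 85%."""
    return "warning" if percent_used > 85 else "ok"


def disk_percent(path: str = "/") -> float:
    """Percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def memory_percent(meminfo_text: str) -> float:
    """Compute used memory from /proc/meminfo text as 1 - MemAvailable/MemTotal."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return 100.0 * (1 - fields["MemAvailable"] / fields["MemTotal"])
```

On the server this would be used as `classify_disk(disk_percent("/"))` and `classify_memory(memory_percent(open("/proc/meminfo").read()))`. The disk figure can differ slightly from what `df` reports because `df` accounts for reserved blocks.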
 ## Output
 Write a digest to `last-output.md` in this format:
 - Summary line: X healthy, Y warnings, Z critical
 - Section per category: Docker, Services, Agent Watchdog, System
 - Each item: ✓ OK / ⚠ Warning / ✗ Critical + one line detail
 Pass anomalies to `context/handoff.md` for notification skill (future).
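
To make the digest format concrete, here is a hedged sketch of a renderer. The `findings` structure (category mapped to a list of `(level, detail)` pairs) and the use of `##` headings per category are assumptions; the skill file only fixes the summary line and the three status markers.

```python
from __future__ import annotations

from collections import Counter

SYMBOLS = {"ok": "✓ OK", "warning": "⚠ Warning", "critical": "✗ Critical"}


def render_digest(findings: dict[str, list[tuple[str, str]]]) -> str:
    """Render a digest: summary line first, then one section per category."""
    counts = Counter(level for items in findings.values() for level, _ in items)
    lines = [
        f"{counts['ok']} healthy, {counts['warning']} warnings, {counts['critical']} critical",
        "",
    ]
    for category, items in findings.items():
        lines.append(f"## {category}")
        lines.extend(f"- {SYMBOLS[level]}: {detail}" for level, detail in items)
        lines.append("")
    return "\n".join(lines)
```

Items whose level is not `ok` are the natural candidates to copy into `context/handoff.md`.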
 ## Wrap-up
 After writing output:
 1. Update `learnings.md` with anything that went wrong or could be improved
 2. Append a one-line log entry to `../../logs/infra-monitor.log`: `YYYY-MM-DD HH:MM | status | summary`
 3. Update `../../memory/notes-from-last-run.md`
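
The one-line log format in step 2 could be produced as in this sketch. `format_log_entry` and `append_log` are hypothetical helpers; collapsing whitespace in the summary is an assumption made to keep each entry on a single line.

```python
from datetime import datetime
from pathlib import Path


def format_log_entry(status: str, summary: str, when: datetime) -> str:
    """Format 'YYYY-MM-DD HH:MM | status | summary' with the summary forced onto one line."""
    one_line = " ".join(summary.split())
    return f"{when:%Y-%m-%d %H:%M} | {status} | {one_line}"


def append_log(log_path: Path, entry: str) -> None:
    """Append the entry to the log file, creating the logs directory if needed."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(entry + "\n")
```

Opening the file in append mode keeps earlier runs intact, which the watchdog relies on when it scans for ERROR entries.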
 ## Schedule
 - **Heartbeat:** every hour — checks Docker + Ollama only (fast, <30s)
 - **Full digest:** daily at 07:00 — all checks