Initial Agent OS scaffolding — identity, brain, memory, infra-monitor skill

2026-04-30 13:40:45 +02:00
commit e8bd571a77
15 changed files with 283 additions and 0 deletions
@@ -0,0 +1,41 @@
# NxM Agent OS
A personal agentic operating system built on plain markdown files. Tool-agnostic — works with Claude Code, Ollama, or any LLM harness. Based on the framework from the AI Daily Brief episode "How to Build a Personal Agentic Operating System" (Nufar Gaspar, 2026-04-25).
## How it works
Every agent interaction reads from and writes back to files in this repo. No databases, no APIs, no vendor lock-in. The files ARE the system.
## The seven layers
| Layer | File(s) | Purpose |
|---|---|---|
| Identity | `identity.md` | Who you are, communication style, values |
| Context | `context/` | Dated, task-specific working files |
| Brain | `brain.md` | Persistent facts — infra, people, decisions |
| Memory | `memory/` | Short- and long-term session notes |
| Skills | `skills/` | Repeatable workflows, each self-improving |
| Processes | `skills/*/context/handoff.md` | Output passed between chained skills |
| Automation | cron on 172.27.40.3 | Scheduled skill execution |
## Adding a new skill
1. Create `skills/<skill-name>/skill.md` — what the skill does and how
2. Create `skills/<skill-name>/learnings.md` — starts empty, fills over time
3. Create `skills/<skill-name>/eval.json` — scoring criteria
4. Add a cron job on 172.27.40.3 calling the skill
5. The infra-monitor watchdog picks it up automatically, because it scans every directory under `skills/`
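Steps 1 to 3 above can be scripted. The following is a minimal sketch, not part of this repo: `scaffold_skill` is a hypothetical helper, and the placeholder headings and the empty `criteria` list are assumptions about what a fresh skill should contain.

```python
import json
from pathlib import Path


def scaffold_skill(repo_root: Path, name: str) -> Path:
    """Create skill.md, learnings.md and eval.json for a new skill (hypothetical helper)."""
    skill_dir = repo_root / "skills" / name
    # exist_ok=False: refuse to overwrite a skill that already exists.
    skill_dir.mkdir(parents=True, exist_ok=False)
    (skill_dir / "skill.md").write_text(f"# Skill: {name}\n", encoding="utf-8")
    (skill_dir / "learnings.md").write_text(f"# Learnings: {name}\n", encoding="utf-8")
    # eval.json starts with no criteria; names and weights are filled in by hand.
    (skill_dir / "eval.json").write_text(
        json.dumps({"criteria": []}, indent=2) + "\n", encoding="utf-8"
    )
    return skill_dir
```

Calling `scaffold_skill(Path("/opt/agent-os"), "my-skill")` would create `skills/my-skill/` with the three files; the cron job in step 4 still has to be added by hand.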
## Runtime
- Files live on server: `/opt/agent-os/` (cloned from this repo)
- LLM inference: Ollama at `http://172.27.6.139:11434`
- Scheduled jobs: cron on `172.27.40.3`
- Local editing: `/home/nxm/Documents/agent-os/` on Kubuntu (this machine)
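As an illustration of the inference path, here is a hedged sketch of calling the Ollama endpoint above through its `/api/generate` route using only the standard library. The function names are hypothetical, and the `model` argument must match a model tag actually pulled on the Ollama host.

```python
import json
import urllib.request

OLLAMA_URL = "http://172.27.6.139:11434"  # from the Runtime list above


def build_generate_request(prompt: str, model: str,
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text from the JSON reply."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Splitting request construction from sending keeps the payload easy to inspect without a running Ollama instance. The 120 second timeout is an arbitrary assumption.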
## Infra reference
Cross-repo links to supporting documentation:
- [IP & Port Map](https://git.nxm.co.za/admin/nxm-infrastructure/src/branch/main/Quick%20Reference/IP%20%26%20Port%20Map.md)
- [Docker Stacks](https://git.nxm.co.za/admin/nxm-infrastructure/src/branch/main/Quick%20Reference/Docker%20Stacks.md)
- [Network Overview](https://git.nxm.co.za/admin/nxm-infrastructure/src/branch/main/Infrastructure/Network%20Overview.md)
@@ -0,0 +1,64 @@
# Brain
Core facts read by all skills. Keep under 1000 words. Update when infrastructure changes.
Last updated: 2026-04-30
---
## Infrastructure
**Primary server:** 172.27.40.3 — Ubuntu Server LTS, Docker host
**Kubuntu desktop:** 172.27.6.139 — NxM-AI, runs Ollama
**TrueNAS NAS:** 172.27.40.5
**Firewall:** OPNsense at 172.27.6.1
**VLANs:**
| VLAN | Name | Subnet |
|---|---|---|
| 40 | Servers40 | 172.27.40.0/24 |
| 20 | Workshop20 | 172.27.20.0/24 |
| 10 | IoT10 | 172.27.10.0/24 |
## Key Services (172.27.40.3)
| Service | Port | URL |
|---|---|---|
| Portainer | 9443 | https://172.27.40.3:9443 |
| Nginx Proxy Manager | 80/81/443 | http://172.27.40.3:81 |
| Uptime Kuma | 3002 | http://172.27.40.3:3002 |
| Gitea | 3000 | https://git.nxm.co.za |
| Headscale | 8080 | https://headscale.nxm.co.za |
| Netbird | 3479/udp | https://netbird.nxm.co.za |
| Vaultwarden | 8222 | https://vault.nxm.co.za |
| Flowise | 3010 | http://172.27.40.3:3010 |
| Plane | 8095 | https://plane.nxm.co.za |
| Zabbix | 8091 | https://zabbix.nxm.co.za |
| Homarr | 7575 | http://172.27.40.3:7575 |
## AI Stack
- **Ollama** on 172.27.6.139:11434 (bound to 0.0.0.0)
- **Models:** gemma4, qwen2.5-coder:7b
- **Flowise** on 172.27.40.3:3010 — visual agent/flow builder
- **Claude Code** — primary AI assistant, runs on Kubuntu
## Agent OS Runtime
- Files: `/opt/agent-os/` on 172.27.40.3
- Local edit path: `/home/nxm/Documents/agent-os/` on 172.27.6.139
- Repo: `https://git.nxm.co.za/admin/agent-os`
- Scheduled jobs: cron on 172.27.40.3
- LLM calls: `http://172.27.6.139:11434`
## Key Paths on Server
- Docker stacks: `/opt/stacks/`
- Agent OS: `/opt/agent-os/`
## Standing Decisions
- TrueNAS will move to a dedicated server — avoid hardcoding 172.27.40.5 in automation
- NPM handles all SSL termination — internal services use HTTP, NPM adds HTTPS
- NFS preferred for Linux-to-Linux file sharing
- Docker Compose only (no Kubernetes)
- All destructive actions require explicit confirmation before execution
@@ -0,0 +1,19 @@
# Identity
> **Status: PENDING** — To be completed via Claude interview session.
> Run the interview by saying: "Let's complete the Agent OS identity interview."
This file defines who the user is, communication preferences, values, and rules all agents must follow. Every skill reads this file before executing.
## What the interview will capture
- Professional role and responsibilities
- Communication style preferences
- Core values and priorities
- Things agents should never do
- How decisions should be escalated vs handled autonomously
- Preferred output formats
---
*This section will be replaced with the completed identity profile after the interview.*
@@ -0,0 +1,23 @@
# Active Projects
Current in-flight work. Update at the end of each session.
Last updated: 2026-04-30
---
## Agent OS — Phase 1 (NEXT)
Complete the foundation before building skills.
- [ ] Set up NFS export on 172.27.40.3 + mount on Kubuntu at /mnt/agent-os
- [ ] Run identity interview with Claude → populate identity.md
- [ ] Review the seeded brain.md and confirm accuracy
- [ ] Clone this repo to /opt/agent-os/ on server
## Agent OS — Phase 3 (PENDING Phase 1)
- [ ] Build infra-monitor skill
- [ ] Set up cron schedule (hourly heartbeat, daily digest)
- [ ] Wire up Home Assistant notifications
## Gitea documentation
- [x] nxm-infrastructure repo — Obsidian vault imported
- [x] nexum-projects repo — Obsidian vault imported
- [x] agent-os repo — scaffolding created
+13
View File
@@ -0,0 +1,13 @@
# Constraints
Hard limits agents must respect. Never work around these without explicit user confirmation.
Last updated: 2026-04-30
---
- Never take destructive or irreversible action without explicit confirmation (delete, overwrite, drop, reset, force push)
- Never store credentials in output files, logs, or generated markdown — reference their location instead
- Never skip git hooks or bypass signing
- TrueNAS (172.27.40.5) is being migrated to a new server — do not create hard dependencies on that IP
- Linux server (172.27.40.3) has no GPU — never schedule LLM inference to run locally there
- Docker Compose only — no Kubernetes, no Swarm
+8
View File
@@ -0,0 +1,8 @@
# Notes from Last Run
Populated automatically at the end of each skill run. Cleared at the start of each new session.
Last updated: —
---
*No runs yet — Agent OS not yet deployed.*
+18
View File
@@ -0,0 +1,18 @@
# Persistent Memory
Facts that don't expire. If you'd have to re-explain it to a new agent every time, it belongs here.
Last updated: 2026-04-30
---
## Infrastructure decisions
- RustDesk is self-hosted on 172.27.40.3; clients connect to the local server, not the public relay
- Netbird client management is on port 8443 via Caddy sidecar, NOT port 443
- Headscale v0.28: all write operations require numeric user ID, not username
- Tailscale on Windows overrides DNS — disconnect before testing split DNS changes
- Servers running Tailscale must run `sudo tailscale set --accept-dns=false` before joining Netbird
## Agent OS build state
- Phase 1-2 (file structure + NFS + identity interview): not yet started
- First skill to build: infra-monitor (Docker health + agent watchdog)
- Notifications target: Home Assistant at 172.27.10.6
+11
View File
@@ -0,0 +1,11 @@
# Recent Decisions
Decisions made in the last 30 days that affect current work. Archive when no longer relevant.
Last updated: 2026-04-30
---
- **2026-04-30:** Chose Gitea (self-hosted git) over Obsidian for documentation — AI-writable, browser-accessible, version controlled
- **2026-04-30:** Agent OS files to live on 172.27.40.3 at /opt/agent-os/, accessed from Kubuntu via NFS
- **2026-04-29:** Chose Syncthing-free approach for Obsidian migration — NFS for Linux, SMB for Windows
- **2026-04-29:** infra-monitor will be first Agent OS skill — covers Docker health and agent watchdog in one skill
+5
View File
@@ -0,0 +1,5 @@
# Handoff: infra-monitor → notification
Populated by infra-monitor when anomalies are found. Read by the notification skill (future).
*Empty — no anomalies from last run, or skill has not run yet.*
+8
View File
@@ -0,0 +1,8 @@
{
"criteria": [
{ "name": "all_services_checked", "weight": 0.3, "description": "Every expected container and service was checked, none skipped" },
{ "name": "clear_status_summary", "weight": 0.3, "description": "Output leads with a plain-English summary line before detail" },
{ "name": "actionable_findings", "weight": 0.2, "description": "Any warning or critical item includes enough detail to act on immediately" },
{ "name": "agent_watchdog_complete", "weight": 0.2, "description": "All skills in /skills/ were checked for staleness and errors" }
]
}
+3
View File
@@ -0,0 +1,3 @@
# Last Output: infra-monitor
*Not yet populated — skill has not run.*
+12
View File
@@ -0,0 +1,12 @@
# Learnings: infra-monitor
Updated automatically after each run. The skill reads this before executing to improve its next output.
## What has worked well
*Not yet populated — skill has not run.*
## What missed the mark
*Not yet populated — skill has not run.*
## Adjustments for next run
*Not yet populated — skill has not run.*
+58
View File
@@ -0,0 +1,58 @@
# Skill: infra-monitor
Monitors server health and watches all Agent OS skills for staleness or errors. Runs on a cron schedule on 172.27.40.3.
## Inputs
Reads before executing:
- `../../identity.md`
- `../../brain.md`
- `../../memory/persistent.md`
- `learnings.md` (this skill's improvement notes)
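A harness could assemble these inputs into a single context string before calling the model. This is a minimal sketch under that assumption; `load_inputs` and the `--- path ---` separators are illustrative and not defined anywhere in this repo.

```python
from pathlib import Path

# Paths are relative to the skill directory, as listed above.
INPUTS = ["../../identity.md", "../../brain.md", "../../memory/persistent.md", "learnings.md"]


def load_inputs(skill_dir: Path, inputs=INPUTS) -> str:
    """Concatenate the input files into one string, labeling each by its relative path."""
    parts = []
    for rel in inputs:
        path = (skill_dir / rel).resolve()
        # A missing file is recorded rather than raising, so one gap does not abort the run.
        text = path.read_text(encoding="utf-8") if path.is_file() else "(missing)"
        parts.append(f"--- {rel} ---\n{text.strip()}")
    return "\n\n".join(parts)
```

Recording a missing file as `(missing)` is a design choice for the sketch; a stricter harness might prefer to fail fast when `identity.md` is absent.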
## What to check
### Docker health (on 172.27.40.3)
- All expected containers are running (not exited/restarting)
- Flag any container that has restarted more than 3 times in the last hour
- Expected containers: portainer, nginx-proxy-manager, uptime-kuma, gitea, headscale, netbird, vaultwarden, flowise, plane, zabbix, homarr
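The running-state check can be sketched by parsing the output of `docker ps -a --format '{{json .}}'`, which prints one JSON object per container. The helper below is hypothetical and assumes container names match the expected list exactly, which may not hold for Compose-generated names. It does not cover the restart-count rule.

```python
import json


def find_unhealthy(ps_json_lines: str, expected: list[str]) -> dict[str, str]:
    """Map each expected container that is not running to its state, or to 'missing'."""
    states = {}
    for line in ps_json_lines.splitlines():
        if line.strip():
            row = json.loads(line)
            states[row["Names"]] = row["State"]
    return {
        name: states.get(name, "missing")
        for name in expected
        if states.get(name) != "running"
    }
```

The input string would typically come from `subprocess.run(["docker", "ps", "-a", "--format", "{{json .}}"], capture_output=True, text=True, check=True).stdout`. An empty result means every expected container is running.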
### Service reachability
Lightweight HTTP check (curl, timeout 5s) on each internal URL:
- https://172.27.40.3:9443 (Portainer; HTTPS, as listed in brain.md)
- http://172.27.40.3:3002 (Uptime Kuma)
- http://172.27.40.3:3000 (Gitea)
- http://172.27.40.3:3010 (Flowise)
- http://172.27.40.3:7575 (Homarr)
- http://172.27.6.139:11434 (Ollama)
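The skill describes this check in terms of curl. An equivalent standard-library sketch is shown here as one possible approach, not the skill's actual implementation. `is_reachable` is a hypothetical name, and treating any HTTP status (including 4xx/5xx) as "reachable" is an assumption.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Return (reachable, detail) for a single GET with the given timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, so it is reachable even if the status is 4xx/5xx.
        return True, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, DNS failure, TLS failure, or timeout.
        return False, str(getattr(exc, "reason", exc))
```

For example, `is_reachable("http://172.27.40.3:3002")` would report on Uptime Kuma. An HTTPS endpoint with a self-signed certificate would be reported as unreachable by this sketch unless a custom SSL context is supplied.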
### Agent watchdog
For each skill directory under `../../skills/`:
- Check `last-output.md` modification time — flag if older than expected schedule
- Check `../../logs/<skill-name>.log` for ERROR entries in the last run
- Report: healthy / stale / erroring
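The three-way classification above might look like the following sketch. `classify_skill` is hypothetical; the maximum age per skill and the log location are passed in as parameters because the skill file does not fix them, and only the most recent log line is inspected as a simplification of "last run".

```python
import time
from pathlib import Path
from typing import Optional


def classify_skill(skill_dir: Path, max_age_s: float,
                   log_file: Optional[Path] = None,
                   now: Optional[float] = None) -> str:
    """Return 'erroring', 'stale', or 'healthy' for one skill directory."""
    now = time.time() if now is None else now
    # Errors take priority over staleness.
    if log_file is not None and log_file.is_file():
        lines = log_file.read_text(encoding="utf-8").splitlines()
        if lines and "ERROR" in lines[-1]:
            return "erroring"
    output = skill_dir / "last-output.md"
    # A skill that has never written output is treated as stale.
    if not output.is_file() or now - output.stat().st_mtime > max_age_s:
        return "stale"
    return "healthy"
```

With an hourly heartbeat, a `max_age_s` somewhat above 3600 would leave room for a run that starts late.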
### System resources (on 172.27.40.3)
- Disk usage on / — warn if >80%, critical if >90%
- Memory usage — flag if >85%
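The thresholds above translate into a few small functions. This is a sketch with hypothetical names; computing memory use as `1 - MemAvailable / MemTotal` from `/proc/meminfo` is an assumption about what "memory usage" means here, and `shutil.disk_usage` can differ slightly from the percentage `df` reports.

```python
import shutil


def disk_level(used_pct: float) -> str:
    """Apply the disk thresholds: warning above 80%, critical above 90%."""
    if used_pct > 90:
        return "critical"
    if used_pct > 80:
        return "warning"
    return "ok"


def disk_used_pct(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def mem_used_pct(meminfo_text: str) -> float:
    """Used memory as a percentage, from the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            fields[key] = int(rest.split()[0])  # first token is the numeric value
    return 100.0 * (1 - fields["MemAvailable"] / fields["MemTotal"])
```

On the server, `disk_level(disk_used_pct("/"))` gives the disk status, and `mem_used_pct(open("/proc/meminfo").read()) > 85` corresponds to the memory flag.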
## Output
Write a digest to `last-output.md` in this format:
- Summary line: X healthy, Y warnings, Z critical
- Section per category: Docker, Services, Agent Watchdog, System
- Each item: ✓ OK / ⚠ Warning / ✗ Critical + one line detail
Pass anomalies to `context/handoff.md` for notification skill (future).
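One way to render the digest format described above is sketched here. The input shape (a dict of category to `(item, level, detail)` tuples), the `## ` section headings, and the exact line layout are assumptions made for illustration.

```python
SYMBOLS = {"ok": "✓ OK", "warning": "⚠ Warning", "critical": "✗ Critical"}


def render_digest(results: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render {category: [(item, level, detail), ...]} as a summary line plus sections."""
    levels = [level for items in results.values() for _, level, _ in items]
    lines = [
        f"Summary: {levels.count('ok')} healthy, "
        f"{levels.count('warning')} warnings, {levels.count('critical')} critical"
    ]
    for category, items in results.items():
        lines.append("")
        lines.append(f"## {category}")
        for item, level, detail in items:
            lines.append(f"- {SYMBOLS[level]}: {item}: {detail}")
    return "\n".join(lines)
```

Items whose level is not `ok` are the natural candidates to copy into `context/handoff.md`.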
## Wrap-up
After writing output:
1. Update `learnings.md` with anything that went wrong or could be improved
2. Append a one-line log entry to `../../logs/infra-monitor.log`: `YYYY-MM-DD HH:MM | status | summary`
3. Update `../../memory/notes-from-last-run.md`
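The log entry in step 2 can be produced as follows. This is a minimal sketch; the function names are hypothetical, and collapsing whitespace in the summary is an added safeguard to keep each entry on one line.

```python
from datetime import datetime
from pathlib import Path


def format_log_entry(status: str, summary: str, when: datetime) -> str:
    """Format one entry as 'YYYY-MM-DD HH:MM | status | summary'."""
    # Collapse whitespace so the entry stays on a single line.
    return f"{when:%Y-%m-%d %H:%M} | {status} | {' '.join(summary.split())}"


def append_log_entry(log_path: Path, status: str, summary: str) -> None:
    """Append one entry stamped with the current local time, creating the log if needed."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(format_log_entry(status, summary, datetime.now()) + "\n")
```

Opening the file in append mode means earlier entries are never rewritten, which suits a log that other skills may scan for errors.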
## Schedule
- **Heartbeat:** every hour — checks Docker + Ollama only (fast, <30s)
- **Full digest:** daily at 07:00 — all checks