Pernix

An open-source experiment · MIT licensed · self-hosted

per·nix  /ˈpɛɾ.nɪks/  — Latin: nimble, swift of foot.

A self-hosted agent harness, wrought in code.
It lives on your hardware, holds memory you own, and runs the same work loop regardless of which model you put inside it.

Three things you keep.

I

The model

Local with Ollama, cloud via your own OpenRouter key, or both — mixed freely. A fast model scouts and plans. A capable one executes. A lighter one handles background work. Workers pick whichever model fits their task. The workflow doesn't care which brain is inside: swap freely, your keys, your routing, your call.
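The role split above can be pictured as a small routing table. This is an illustrative sketch only — the model names and the shape of the mapping are invented, not Pernix's actual configuration schema:

```python
# Hypothetical role-to-model routing table. Model ids and role names
# are examples, not Pernix's real config.
ROLES = {
    "scout":      "qwen3:8b",                  # fast local planner
    "primary":    "anthropic/claude-sonnet",   # capable executor via OpenRouter
    "background": "qwen3:4b",                  # light post-hook work
}

def pick_model(role: str, fallback: str = "qwen3:8b") -> str:
    """Resolve a role to a model id, falling back to a local default."""
    return ROLES.get(role, fallback)
```

Swapping the brain is then one line of config per role — the workflow around it never changes.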

II

The memory

Markdown files in data/memories/. Plain text, full-text indexed, tagged with source and confidence — yours forever. Stays put when you switch models. Stays put when a provider changes its terms. Not locked to any subscription, not sludge accumulating in a chat thread. Read, edit, or delete with any text editor.
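A memory file in that shape might look like the sketch below. The frontmatter layout is an assumption for illustration — only the `source` and `confidence` tags come from the text above:

```markdown
---
source: session-abc123
confidence: high
tags: [auth, decisions]
---
The auth branch rotates session tokens every 24h; compliance
review is required before shortening that window.
```

Because it's plain Markdown, `grep`, `git`, and any editor work on it directly.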

III

The machine

No accounts. No cloud subscription. No usage telemetry. A Python server that binds to localhost until you say otherwise — then network mode with HTTPS and bearer-token auth. The harness is yours to inspect, fork, and rebuild.

What you're getting.

Pernix is one Python codebase that fits in your head. A FastAPI server, a state machine, a streaming agent loop, a memory store, a workspace. On top of it: a built-in PWA, a REST API, and a Swagger UI you can poke from any browser.

Run it on a dedicated VM, container, or spare Linux box. Open http://localhost:8090. Talk to it. Watch it think. Read the code that made it think that way.

  • stack: Python 3.11+ · FastAPI · SQLite · SSE · Playwright
  • interfaces: Web PWA · REST · Server-Sent Events
  • storage: SQLite for sessions · Markdown for memory · filesystem workspace
  • license: MIT
localhost:8090
sessions
◉ morning brief
○ refactor auth
○ research: rag
○ weekly digest
+ new
you · 09:14 summarize what happened on the auth branch yesterday and tee it up for review.
pernix · scouting · git_log · read_file × 4 · search_memory · five commits, all on session-token rotation. flagging two for compliance review
processing claude-sonnet-4.6 · 12.4k / 200k

How a turn thinks itself out.

Every message you send rolls through five phases. Each one runs on a model suited to its job — fast for planning, capable for acting, light for verifying.

  1. 01

    Session

    Your message lands on a persistent thread. Append-only. Resumable. Restart-proof.

    queue
  2. 02

    Scout

    A small fast model in a fresh context plans the approach — picks tools, loads only the relevant skills.

    fast model
  3. 03

    Loop

    The main model executes. Streams tokens, calls tools, reads results, calls more tools — until done.

    main model
  4. 04

    Reflect

    A quality gate verifies intent was met. Returns pass, retry, or escalate. Up to two retries before surfacing.

    verify
  5. 05

    Post‑hooks

    Auto-titling, memory distillation, worker cleanup. The cleanup runs in the background after you've already seen the answer.

    background
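The five phases above can be sketched as one function. Every name here is a hypothetical stand-in for illustration, not Pernix's actual internals:

```python
# Illustrative sketch of the five-phase turn pipeline.
# All callables are injected stand-ins, not real Pernix components.
def run_turn(session, message, scout, loop, reflect, post_hooks, max_retries=2):
    session.append(message)               # 01 session: persist first, append-only
    plan = scout(message)                 # 02 scout: fast model picks an approach
    answer = loop(message, plan)          # 03 loop: main model executes with tools
    for _ in range(max_retries):          # 04 reflect: quality gate, bounded retries
        if reflect(message, answer) == "pass":
            break
        answer = loop(message, plan)      #    re-run the loop on a failed verdict
    post_hooks(session, answer)           # 05 post-hooks: scheduled for background
    return answer
```

The key ordering: the message is durable before any model runs, and post-hooks fire after the answer already exists.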

Compaction trims old turns when context fills past 75%. The originals stay in the database — only the prompt view changes.
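Compaction can be sketched in a few lines. The 75% threshold comes from the text; the token accounting below is illustrative:

```python
# Minimal compaction sketch: trim the oldest turns from the prompt view
# once usage crosses the threshold. The originals are assumed to stay
# in the database untouched.
def compact(turns, budget, threshold=0.75, keep_recent=4):
    used = sum(t["tokens"] for t in turns)
    while used > budget * threshold and len(turns) > keep_recent:
        used -= turns[0]["tokens"]        # drop the oldest turn first
        turns = turns[1:]
    return turns
```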

Snooze runs between turns. While the agent is idle, it dedupes memory, distills your profile, and archives post-mortems. The moment you send a new message, Snooze stops — your work always wins.

Nine states. One session at a time.

Every session is in exactly one state. Transitions are logged, replayed to the UI in real time, and recovered after a crash.
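The shape of that state machine can be sketched as follows — the state names below are invented placeholders (the text doesn't enumerate Pernix's actual nine), but the structure matches: one state at a time, every transition logged for replay:

```python
# Sketch of a logged, single-state session. State names are illustrative.
from enum import Enum

class State(Enum):
    IDLE = "idle"
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"

class Session:
    def __init__(self):
        self.state = State.IDLE
        self.log = []                     # transition log, replayable to the UI

    def transition(self, new: State):
        self.log.append((self.state, new))
        self.state = new
```

Crash recovery falls out of the log: replay the transitions and you're back where you were.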

Three layers between you and the work.

A conversation, a swarm, a recipe. Each one a different way to put the agent to use — chat, parallelize, automate.

layer 01

Sessions

a conversation thread

A persistent thread. Append-only. Every message and tool call survives a restart. One agent loop runs at a time per session — but you can have many sessions open and switch between them.

  • SQLite-backed
  • resumable
  • per-session model overrides
layer 02

Workers

parallel sub‑agents

Within a turn, the main agent can spawn workers — sub-agents in their own sessions, each on whichever model fits its task best. A slow planner orchestrates fast editors. A vision model and a code-specialist run side by side.

  • different models per worker
  • flat — workers don't spawn workers
  • pause & resume at round boundaries
layer 03

Workflows

repeatable YAML pipelines

A workflow is a YAML pipeline in data/workflows/ — explicit steps, explicit dependencies, parallel waves. Build one for a job you do every Monday, then schedule it on cron. The same workflow runs unchanged whether the brain inside is a local model, a frontier API, or a mix of both. The agent assembles the workers; you read the output.

  • YAML steps with depends_on
  • parallel waves auto-dispatched
  • schedulable via cron
  • model-agnostic by design
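A workflow in that shape might look like this sketch. `depends_on` comes from the text above; the other field names are assumptions — check data/workflows/ for the real schema:

```yaml
# Hypothetical workflow sketch — field names beyond depends_on are
# illustrative, not Pernix's documented schema.
name: monday-brief
schedule: "0 8 * * 1"        # cron: Mondays at 08:00
steps:
  - id: gather
    prompt: "Collect last week's commits and open issues."
  - id: summarize
    depends_on: [gather]
    prompt: "Write a one-page operations summary."
```

Steps with no unmet dependencies run as one parallel wave; `summarize` waits for `gather`.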

One conversation. Many workers. Many recipes.   Pernix is happiest when you use all three.

Build the loop once.
Swap the brain freely.

A chatbot is where you ask for help. An agent harness is where work happens. Pernix is a harness — it has a job to do, a place to run, memory of what happened before, and enough structure that the underlying model can change without destroying the workflow.

The loop is stable

Define the workflow once — explicit steps, tool calls, retry logic, scheduling. It runs the same way today and next quarter, whether the model is local Qwen or a frontier API call. The loop outlives the brain.

The brain is swappable

Use a fast local model for cheap classification passes. Route hard reasoning to a frontier API. Put a vision specialist in one worker and a code model in another. The workflow doesn't care which brain is inside.

The memory is yours

Context, decisions, lessons — in plain Markdown on your disk, tagged with source and confidence, not trapped inside any provider's product. When you switch models or providers, the memory follows. No lock-in. No drift.

A toolbelt. Tools the agent reaches for.

Persistent memory

Markdown facts, decisions, lessons. Searched before each turn. Idle-time consolidation merges duplicates and ages out stale lessons.

Web search & browser

Tavily or DuckDuckGo for search. Playwright for JS-heavy pages and SPAs that a plain fetch can't render, with page content extracted as clean Markdown.

Workers

Spawn parallel sub-agents on different models. Slow planner, fast editor, vision specialist — same conversation.

Skills

Markdown capability packs with YAML frontmatter. The agent loads them only when relevant.

Cron scheduling

Run agents on a schedule. Morning brief, weekly digest, watchdog scripts — all built in.

Reflect & retry

When enabled, each response is graded against the original intent. Missed it? The turn re-runs with the lesson appended — bounded retries before surfacing.

Self-extending

The agent can write its own tools and skills. New capabilities show up on the next turn — no rebuild, no restart.

A model per role

Different model for primary, scout, fallback, and background work. Ollama, OpenRouter, or both. Auto-fallback to your local model when the cloud rate-limits, times out, or hiccups.

Tool boundaries

Tools are tagged safe, caution, or dangerous. Dangerous calls require per-session approval after the agent declares exactly what it will do — and every distinct action gates separately. The agent gets hands; the user keeps the keys.
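The gating described above can be sketched in a few lines. The three tier names come from the text; the function shape is invented for illustration:

```python
# Sketch of per-action approval gating. Tool names are examples;
# unknown tools default to the most restrictive tier.
RISK = {"read_file": "safe", "fetch_url": "caution", "run_shell": "dangerous"}

def needs_approval(tool: str, approved_actions: set, declared_action: str) -> bool:
    """Dangerous tools gate per distinct declared action."""
    if RISK.get(tool, "dangerous") != "dangerous":
        return False
    return declared_action not in approved_actions
```

Approving one declared action doesn't unlock the tool wholesale — each new action gates again.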

An open API. Tinker freely.

The web UI is one client. There are many. Pernix is built on FastAPI, which means every endpoint the UI calls is also yours to call — from a script, a cron job, another service, your terminal.

Open localhost:8090/docs while the server runs and you get a live Swagger UI: every endpoint, every schema, every response model — try-it-able right from the browser. /redoc if you prefer ReDoc. The fastest way to learn the system is to poke it.

  • streaming: Server-Sent Events on /api/sessions/{id}/events for tokens, tool calls, state transitions.
  • resumable: Last-Event-ID replay on reconnect — clients never miss an event.
  • scriptable: the same API the PWA uses. Build CLIs, integrations, custom UIs.
  • discoverable: OpenAPI 3 schema at /openapi.json. Generate clients in any language.
localhost:8090/docs OAS 3.1
Pernix API v0.x.x
GET /api/sessions List all sessions
POST /api/sessions Create a new session
POST /api/sessions/{id}/messages Send a message · streams response
try-it-out
curl -N http://localhost:8090/api/sessions/abc123/messages \
  -H "Content-Type: application/json" \
  -d '{"text": "summarize the auth branch"}'
GET /api/sessions/{id}/events SSE stream · tokens, tools, state
POST /api/workflows/{name}/run Execute a workflow
GET /api/memory/search BM25 search across memories
+ 82 more endpoints · openapi.json
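A client consuming the events stream mostly needs to parse SSE frames and remember the last event id. Here's a minimal parsing sketch — the field handling follows the SSE wire format, while the endpoint and event names are taken from the text:

```python
# Minimal SSE frame parser with id tracking — a sketch of how a client
# could consume /api/sessions/{id}/events and resume after a reconnect.
def parse_sse(raw: str):
    """Yield (id, event, data) tuples from a raw SSE stream."""
    for frame in raw.strip().split("\n\n"):
        eid = event = None
        data = []
        for line in frame.splitlines():
            field, _, value = line.partition(": ")
            if field == "id":
                eid = value
            elif field == "event":
                event = value
            elif field == "data":
                data.append(value)
        yield eid, event, "\n".join(data)
```

On reconnect, send the last id you saw as the Last-Event-ID header and the server replays what you missed.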

Three markdown files, no black boxes.

Pernix's behavior beyond raw model output is shaped by plain text on disk. SOUL.md defines who it is. RULES.md defines how it acts. SESSIONS.md injects deployment-specific context — the user's timezone and key facts, the domains this installation is allowed to act in, per-domain permission levels, and active long-running intents the agent is tracking.

It's the opposite of a black box. The personality, the operational guardrails, the project conventions — all editable, all auditable, all yours. Open them in any text editor. The agent picks up the change on its next turn.

  • Want it more terse? Edit a paragraph in SOUL.md.
  • Need a project guardrail? Add a line to RULES.md.
  • Want to constrain what domains the agent acts in? Set permission levels in SESSIONS.md.
  • Switching context entirely? Swap the whole file out.
# Identity

You are Pernix — a capable, focused AI assistant.
You help with complex tasks, think carefully before acting,
and communicate clearly.

## Core Traits

- Pragmatic: Prefer working solutions over perfect ones.
  Ship, then iterate.
- Direct: Minimal preamble. Get to the point.
  No filler phrases.
- Curious: Enjoy understanding systems deeply
  before changing them.
- Careful: Confirm intent before irreversible
  actions. Measure twice, cut once.

## Communication Style

- Concise by default — expand when the topic demands it.
- No sycophancy. No "great question!"
- When referencing code, include file paths and line
  numbers so the user can navigate directly.

data/agent/SOUL.md · data/agent/RULES.md · data/agent/SESSIONS.md · data/skills/*/SKILL.md

Ten minutes from clone to chat.

Install Ollama, pull a recent model, clone the repo, run the server. Pernix is well-tested with the latest Qwen 3 series on Ollama, and with current frontier models on OpenRouter. Use whatever's current — agentic workloads benefit from newer models with stronger tool-calling and reasoning.

  1. 01

    Install the prerequisites

    Python 3.11+. Ollama if you want local models — pull a current Qwen 3 release. An OpenRouter key works too; Ollama is optional.

  2. 02

    Clone & install

    Standard Python: clone, venv, pip install -r requirements.txt. Optional: copy .env.example for OpenRouter or Tavily keys.

  3. 03

    Run it

    python run.py. Open localhost:8090. Pick a model in Settings. Say hello — and open /docs in another tab to watch the API.

  4. 04

    Make it yours

    Edit SOUL.md. Write a skill. Save a workflow. Schedule it on cron. Read the code in core/ — it fits in your head.

~/pernix
$ git clone https://github.com/calvincs/Pernix.git
$ cd pernix
$ python3 -m venv .venv && source .venv/bin/activate
$ pip install -r requirements.txt
$ cp .env.example .env  # add API keys if you have any
$ ollama pull <your-current-qwen3>  # skip this if you're using OpenRouter instead
$ python run.py
  ◇ pernix · 0.x.x · alpha
  ◇ binding to 127.0.0.1:8090
  ◇ ollama: connected · 4 models
  ◇ swagger: http://localhost:8090/docs
  ◇ open http://localhost:8090 

A tool for integrations and recurring work.

Pernix is alpha. Actively developed. Things will change. Reading what it's for and what it isn't will save you the wrong expectation.

It is for…

  • Vertical work loops. Build a loop tied to one job and keep running it. Email triage, incident response, code review, research digests, meeting-to-action pipelines, weekly operations summaries. The product isn't an agent — it's the loop you build around a recurring job.
  • A headless agent substrate. A FastAPI server with full REST coverage. Wire it into other systems, drive it from scripts, build a custom client, run it as the brain behind a calendar agent or research bot. The web UI is one front-end among many possible.
  • Recurring work. Workflows + cron + memory. The morning brief, the weekly digest, the watchdog, the recurring research crawl — running reliably whether you're watching or not.
  • Model-independent pipelines. Build once with a local model, swap to a frontier API when the task demands it, fall back automatically when it rate-limits. The workflow keeps running — you decide the routing.
  • Tinkering & learning. One Python codebase, every layer auditable. A working harness to read, fork, and rebuild your own ideas on top of.

It isn't…

  • A coding-harness replacement. Use Claude Code, Opencode, Codex, Cursor, or any IDE-integrated agent for serious software work. Pernix is not trying to be that.
  • A polished commercial product. Rough edges. Missing UX. Quirks the author hasn't gotten to yet. Treat it like a workbench, not a finished tool.
  • Production software. It executes shell commands and writes files on the host machine — that's what makes it useful, and that's also what makes it dangerous on the wrong box. Run it in a dedicated VM, container, or spare machine. Never expose network mode to the public internet.
  • Done. The author uses it personally and is still building. Expect breaking changes. Expect new ideas. Expect to update.

Heads up — alpha software. Treat it like a power tool, not a toy.

Fork it. Read it. Make it yours.

Pernix is a working personal tool, not a polished product. Built for daily use and shared openly — use it, learn from it, build on it.