Pernix

An open-source experiment · MIT licensed · self-hosted

per·nix  /ˈpɛɾ.nɪks/  — Latin: nimble, swift of foot.

A self-hosted agent harness, wrought in code.
It lives on your hardware, holds memory you own, and runs the same work loop regardless of which model you put inside it.

Three things you keep.

I

The model

Local with Ollama, cloud via your own OpenRouter key, or both — mixed freely. A fast model scouts and plans. A capable one executes. A lighter one handles background work. Workers pick whichever model fits their task. The workflow doesn't care which brain is inside: swap freely, your keys, your routing, your call.
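The role split above can be pictured as a small routing table. This is an illustrative sketch only — the model names and the shape of the mapping are invented, not Pernix's actual configuration schema:

```python
# Hypothetical role-to-model routing table. Model ids and role names
# are examples, not Pernix's real config.
ROLES = {
    "scout":      "qwen3:8b",                  # fast local planner
    "primary":    "anthropic/claude-sonnet",   # capable executor via OpenRouter
    "background": "qwen3:4b",                  # light post-hook work
}

def pick_model(role: str, fallback: str = "qwen3:8b") -> str:
    """Resolve a role to a model id, falling back to a local default."""
    return ROLES.get(role, fallback)
```

Swapping the brain is then one line of config per role — the workflow around it never changes.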

II

The memory

Markdown files in data/memories/. Plain text, full-text indexed, tagged with source and confidence — yours forever. Stays put when you switch models. Stays put when a provider changes its terms. Not locked to any subscription, not sludge accumulating in a chat thread. Read, edit, or delete with any text editor.
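A memory file in that shape might look like the sketch below. The frontmatter layout is an assumption for illustration — only the `source` and `confidence` tags come from the text above:

```markdown
---
source: session-abc123
confidence: high
tags: [auth, decisions]
---
The auth branch rotates session tokens every 24h; compliance
review is required before shortening that window.
```

Because it's plain Markdown, `grep`, `git`, and any editor work on it directly.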

III

The machine

No accounts. No cloud subscription. No usage telemetry. A Python server that binds to localhost until you say otherwise — then network mode with HTTPS and bearer-token auth. The harness is yours to inspect, fork, and rebuild.

What you're getting.

Pernix is one Python codebase that fits in your head. A FastAPI server, a state machine, a streaming agent loop, a memory store, a workspace. On top of it: a built-in PWA, a REST API, and a Swagger UI you can poke from any browser.

Run it on a dedicated VM, container, or spare Linux box. Open http://localhost:8090. Talk to it. Watch it think. Read the code that made it think that way.

  • stack: Python 3.11+ · FastAPI · SQLite · SSE · Playwright
  • interfaces: Web PWA · REST · Server-Sent Events
  • storage: SQLite for sessions · Markdown for memory · filesystem workspace
  • license: MIT
localhost:8090
sessions
◉ morning brief
○ refactor auth
○ research: rag
○ weekly digest
+ new
you · 09:14 summarize what happened on the auth branch yesterday and tee it up for review.
pernix · scouting · git_log · read_file × 4 · search_memory · five commits, all on session-token rotation. flagging two for compliance review
processing claude-sonnet-4.6 · 12.4k / 200k

How a turn thinks itself out.

Every message you send rolls through five phases. Each one runs on a model suited to its job — fast for planning, capable for acting, light for verifying.

  1. 01

    Session

    Your message lands on a persistent thread. Append-only. Resumable. Restart-proof.

    queue
  2. 02

    Scout

    A small fast model in a fresh context plans the approach — picks tools, loads only the relevant skills.

    fast model
  3. 03

    Loop

    The main model executes. Streams tokens, calls tools, reads results, calls more tools — until done.

    main model
  4. 04

    Reflect

    A quality gate verifies intent was met. Returns pass, retry, or escalate. Up to two retries before surfacing.

    verify
  5. 05

    Post‑hooks

    Auto-titling, memory distillation, worker cleanup. The cleanup runs in the background after you've already seen the answer.

    background
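The five phases above can be sketched as one function. Every name here is a hypothetical stand-in for illustration, not Pernix's actual internals:

```python
# Illustrative sketch of the five-phase turn pipeline.
# All callables are injected stand-ins, not real Pernix components.
def run_turn(session, message, scout, loop, reflect, post_hooks, max_retries=2):
    session.append(message)               # 01 session: persist first, append-only
    plan = scout(message)                 # 02 scout: fast model picks an approach
    answer = loop(message, plan)          # 03 loop: main model executes with tools
    for _ in range(max_retries):          # 04 reflect: quality gate, bounded retries
        if reflect(message, answer) == "pass":
            break
        answer = loop(message, plan)      #    re-run the loop on a failed verdict
    post_hooks(session, answer)           # 05 post-hooks: scheduled for background
    return answer
```

The key ordering: the message is durable before any model runs, and post-hooks fire after the answer already exists.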

Compaction trims old turns when context fills past 75%. The originals stay in the database — only the prompt view changes.
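Compaction can be sketched in a few lines. The 75% threshold comes from the text; the token accounting below is illustrative:

```python
# Minimal compaction sketch: trim the oldest turns from the prompt view
# once usage crosses the threshold. The originals are assumed to stay
# in the database untouched.
def compact(turns, budget, threshold=0.75, keep_recent=4):
    used = sum(t["tokens"] for t in turns)
    while used > budget * threshold and len(turns) > keep_recent:
        used -= turns[0]["tokens"]        # drop the oldest turn first
        turns = turns[1:]
    return turns
```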

Snooze runs between turns. While the agent is idle, it dedupes memory, distills your profile, and archives post-mortems. The moment you send a new message, Snooze stops — your work always wins.

Nine states. One session at a time.

Every session is in exactly one state. Transitions are logged, replayed to the UI in real time, and recovered after a crash.
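The shape of that state machine can be sketched as follows — the state names below are invented placeholders (the text doesn't enumerate Pernix's actual nine), but the structure matches: one state at a time, every transition logged for replay:

```python
# Sketch of a logged, single-state session. State names are illustrative.
from enum import Enum

class State(Enum):
    IDLE = "idle"
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"

class Session:
    def __init__(self):
        self.state = State.IDLE
        self.log = []                     # transition log, replayable to the UI

    def transition(self, new: State):
        self.log.append((self.state, new))
        self.state = new
```

Crash recovery falls out of the log: replay the transitions and you're back where you were.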

Three layers between you and the work.

A conversation, a swarm, a recipe. Each one a different way to put the agent to use — chat, parallelize, automate.

layer 01

Sessions

a conversation thread

A persistent thread. Append-only. Every message and tool call survives a restart. One agent loop runs at a time per session — but you can have many sessions open and switch between them.

  • SQLite-backed
  • resumable
  • per-session model overrides
layer 02

Workers

parallel sub‑agents

Within a turn, the main agent can spawn workers — sub-agents in their own sessions, each on whichever model fits its task best. A slow planner orchestrates fast editors. A vision model and a code-specialist run side by side.

  • different models per worker
  • flat — workers don't spawn workers
  • pause & resume at round boundaries
layer 03

Workflows

repeatable YAML pipelines

A workflow is a YAML pipeline in data/workflows/ — explicit steps, explicit dependencies, parallel waves. Build one for a job you do every Monday, then schedule it on cron. The same workflow runs unchanged whether the brain inside is a local model, a frontier API, or a mix of both. The agent assembles the workers; you read the output.

  • YAML steps with depends_on
  • parallel waves auto-dispatched
  • schedulable via cron
  • model-agnostic by design
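A workflow in that shape might look like this sketch. `depends_on` comes from the text above; the other field names are assumptions — check data/workflows/ for the real schema:

```yaml
# Hypothetical workflow sketch — field names beyond depends_on are
# illustrative, not Pernix's documented schema.
name: monday-brief
schedule: "0 8 * * 1"        # cron: Mondays at 08:00
steps:
  - id: gather
    prompt: "Collect last week's commits and open issues."
  - id: summarize
    depends_on: [gather]
    prompt: "Write a one-page operations summary."
```

Steps with no unmet dependencies run as one parallel wave; `summarize` waits for `gather`.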

One conversation. Many workers. Many recipes.   Pernix is happiest when you use all three.

Build the loop once.
Swap the brain freely.

A chatbot is where you ask for help. An agent harness is where work happens. Pernix is a harness — it has a job to do, a place to run, memory of what happened before, and enough structure that the underlying model can change without destroying the workflow.

The loop is stable

Define the workflow once — explicit steps, tool calls, retry logic, scheduling. It runs the same way today and next quarter, whether the model is local Qwen or a frontier API call. The loop outlives the brain.

The brain is swappable

Use a fast local model for cheap classification passes. Route hard reasoning to a frontier API. Put a vision specialist in one worker and a code model in another. The workflow doesn't care which brain is inside.

The memory is yours

Context, decisions, lessons — in plain Markdown on your disk, tagged with source and confidence, not trapped inside any provider's product. When you switch models or providers, the memory follows. No lock-in. No drift.

A toolbelt. Tools the agent reaches for.

Persistent memory

Markdown facts, decisions, lessons. Searched before each turn. Idle-time consolidation merges duplicates and ages out stale lessons.

Web search & browser

Tavily or DuckDuckGo for search. Playwright for JS-heavy pages and SPAs that a plain fetch can't render, with page content extracted as clean Markdown.

Workers

Spawn parallel sub-agents on different models. Slow planner, fast editor, vision specialist — same conversation.

Skills

Markdown capability packs with YAML frontmatter. The agent loads them only when relevant.

Cron scheduling

Run agents on a schedule. Morning brief, weekly digest, watchdog scripts — all built in.

Reflect & retry

When enabled, each response is graded against the original intent. Missed it? The turn re-runs with the lesson appended — bounded retries before surfacing.

Self-extending

The agent can write its own tools and skills. New capabilities show up on the next turn — no rebuild, no restart.

A model per role

Different model for primary, scout, fallback, and background work. Ollama, OpenRouter, or both. Auto-fallback to your local model when the cloud rate-limits, times out, or hiccups.

Tool boundaries

Tools are tagged safe, caution, or dangerous. Dangerous calls require per-session approval after the agent declares exactly what it will do — and every distinct action gates separately. The agent gets hands; the user keeps the keys.
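The gating described above can be sketched in a few lines. The three tier names come from the text; the function shape is invented for illustration:

```python
# Sketch of per-action approval gating. Tool names are examples;
# unknown tools default to the most restrictive tier.
RISK = {"read_file": "safe", "fetch_url": "caution", "run_shell": "dangerous"}

def needs_approval(tool: str, approved_actions: set, declared_action: str) -> bool:
    """Dangerous tools gate per distinct declared action."""
    if RISK.get(tool, "dangerous") != "dangerous":
        return False
    return declared_action not in approved_actions
```

Approving one declared action doesn't unlock the tool wholesale — each new action gates again.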

An open API. Tinker freely.

The web UI is one client. There are many. Pernix is built on FastAPI, which means every endpoint the UI calls is also yours to call — from a script, a cron job, another service, your terminal.

Open localhost:8090/docs while the server runs and you get a live Swagger UI: every endpoint, every schema, every response model — try-it-able right from the browser. /redoc if you prefer ReDoc. The fastest way to learn the system is to poke it.

  • streaming: Server-Sent Events on /api/sessions/{id}/events for tokens, tool calls, state transitions.
  • resumable: Last-Event-ID replay on reconnect — clients never miss an event.
  • scriptable: the same API the PWA uses. Build CLIs, integrations, custom UIs.
  • discoverable: OpenAPI 3 schema at /openapi.json. Generate clients in any language.
localhost:8090/docs OAS 3.1
Pernix API v0.x.x
GET /api/sessions List all sessions
POST /api/sessions Create a new session
POST /api/sessions/{id}/messages Send a message · streams response
try-it-out
curl -N http://localhost:8090/api/sessions/abc123/messages \
  -H "Content-Type: application/json" \
  -d '{"text": "summarize the auth branch"}'
GET /api/sessions/{id}/events SSE stream · tokens, tools, state
POST /api/workflows/{name}/run Execute a workflow
GET /api/memory/search BM25 search across memories
+ 82 more endpoints · openapi.json
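A client consuming the events stream mostly needs to parse SSE frames and remember the last event id. Here's a minimal parsing sketch — the field handling follows the SSE wire format, while the endpoint and event names are taken from the text:

```python
# Minimal SSE frame parser with id tracking — a sketch of how a client
# could consume /api/sessions/{id}/events and resume after a reconnect.
def parse_sse(raw: str):
    """Yield (id, event, data) tuples from a raw SSE stream."""
    for frame in raw.strip().split("\n\n"):
        eid = event = None
        data = []
        for line in frame.splitlines():
            field, _, value = line.partition(": ")
            if field == "id":
                eid = value
            elif field == "event":
                event = value
            elif field == "data":
                data.append(value)
        yield eid, event, "\n".join(data)
```

On reconnect, send the last id you saw as the Last-Event-ID header and the server replays what you missed.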

Three markdown files, no black boxes.

Pernix's behavior beyond raw model output is shaped by plain text on disk. SOUL.md defines who it is. RULES.md defines how it acts. SESSIONS.md injects deployment-specific context — the user's timezone and key facts, the domains this installation is allowed to act in, per-domain permission levels, and active long-running intents the agent is tracking.

It's the opposite of a black box. The personality, the operational guardrails, the project conventions — all editable, all auditable, all yours. Open them in any text editor. The agent picks up the change on its next turn.

  • Want it more terse? Edit a paragraph in SOUL.md.
  • Need a project guardrail? Add a line to RULES.md.
  • Want to constrain what domains the agent acts in? Set permission levels in SESSIONS.md.
  • Switching context entirely? Swap the whole file out.
# Identity

You are Pernix — a capable, focused AI assistant.
You help with complex tasks, think carefully before acting,
and communicate clearly.

## Core Traits

- Pragmatic: Prefer working solutions over perfect ones.
  Ship, then iterate.
- Direct: Minimal preamble. Get to the point.
  No filler phrases.
- Curious: Enjoy understanding systems deeply
  before changing them.
- Careful: Confirm intent before irreversible
  actions. Measure twice, cut once.

## Communication Style

- Concise by default — expand when the topic demands it.
- No sycophancy. No "great question!"
- When referencing code, include file paths and line
  numbers so the user can navigate directly.

data/agent/SOUL.md · data/agent/RULES.md · data/agent/SESSIONS.md · data/skills/*/SKILL.md

Ten minutes from clone to chat.

Install Ollama, pull a recent model, clone the repo, run the server. Pernix is well-tested with the latest Qwen 3 series on Ollama, and with current frontier models on OpenRouter. Use whatever's current — agentic workloads benefit from newer models with stronger tool-calling and reasoning.

  1. 01

    Install the prerequisites

    Python 3.11+. Ollama if you want local models — pull a current Qwen 3 release. An OpenRouter key works too; Ollama is optional.

  2. 02

    Clone & install

    Standard Python: clone, venv, pip install -r requirements.txt. Optional: copy .env.example for OpenRouter or Tavily keys.

  3. 03

    Run it

    python run.py. Open localhost:8090. Pick a model in Settings. Say hello — and open /docs in another tab to watch the API.

  4. 04

    Make it yours

    Edit SOUL.md. Write a skill. Save a workflow. Schedule it on cron. Read the code in core/ — it fits in your head.

~/pernix
$ git clone https://github.com/calvincs/Pernix.git
$ cd pernix
$ python3 -m venv .venv && source .venv/bin/activate
$ pip install -r requirements.txt
$ cp .env.example .env  # add API keys if you have any
$ ollama pull <your-current-qwen3>  # skip this if you're using OpenRouter instead
$ python run.py
  ◇ pernix · 0.x.x · alpha
  ◇ binding to 127.0.0.1:8090
  ◇ ollama: connected · 4 models
  ◇ swagger: http://localhost:8090/docs
  ◇ open http://localhost:8090 

A tool for integrations and recurring work.

Pernix is alpha. Actively developed. Things will change. Reading what it's for and what it isn't will save you the wrong expectation.

It is for…

  • Vertical work loops. Build a loop tied to one job and keep running it. Email triage, incident response, code review, research digests, meeting-to-action pipelines, weekly operations summaries. The product isn't an agent — it's the loop you build around a recurring job.
  • A headless agent substrate. A FastAPI server with full REST coverage. Wire it into other systems, drive it from scripts, build a custom client, run it as the brain behind a calendar agent or research bot. The web UI is one front-end among many possible.
  • Recurring work. Workflows + cron + memory. The morning brief, the weekly digest, the watchdog, the recurring research crawl — running reliably whether you're watching or not.
  • Model-independent pipelines. Build once with a local model, swap to a frontier API when the task demands it, fall back automatically when it rate-limits. The workflow keeps running — you decide the routing.
  • Tinkering & learning. One Python codebase, every layer auditable. A working harness to read, fork, and rebuild your own ideas on top of.

It isn't…

  • A coding-harness replacement. Use Claude Code, Opencode, Codex, Cursor, or any IDE-integrated agent for serious software work. Pernix is not trying to be that.
  • A polished commercial product. Rough edges. Missing UX. Quirks the author hasn't gotten to yet. Treat it like a workbench, not a finished tool.
  • Production software. It executes shell commands and writes files on the host machine — that's what makes it useful, and that's also what makes it dangerous on the wrong box. Run it in a dedicated VM, container, or spare machine. Never expose network mode to the public internet.
  • Done. The author uses it personally and is still building. Expect breaking changes. Expect new ideas. Expect to update.

Heads up — alpha software. Treat it like a power tool, not a toy.

Fork it. Read it. Make it yours.

Pernix is a working personal tool, not a polished product. Built for daily use and shared openly — use it, learn from it, build on it.