
Building My Own Agent Harness
Building My Own Agent Harness
For the last year I've been living inside other people's agent harnesses. Claude Code in the terminal. Codex in the editor. Cursor for the inline stuff. They're all impressive pieces of engineering, and most days they do exactly what I need them to.
But there's a pattern I keep running into.
I'll spend ten minutes nudging the agent toward a workflow I've already run a hundred times. I'll re-explain the shape of my codebase. I'll re-state the rules of how I write. I'll watch it reach for tools I never wanted it to use, or skip ones I rely on. The harness is doing its best to be everything to everyone, and I'm the one paying the tax.
So I started building my own.
What a harness actually is
When people say "Claude Code" or "Codex" they're usually talking about the harness, not the model. The model is the brain. The harness is everything around it: the loop that decides when to call tools, the system prompt that shapes the personality, the permission layer, the file context, the slash commands, the memory.
The harness is where the opinions live. And the opinions baked into a generic harness are, by necessity, generic.
That's fine when you're starting out. It's less fine when you've got specific ways you want to work and you're constantly nudging a tool back into the shape of your day.
Why generic stopped fitting
The friction wasn't one big thing. It was a hundred small ones.
The agent would default to verbose explanations when I just wanted the diff. It would ask permission for things I'd happily delegate, and barrel ahead on things I wanted to gate. The system prompt assumed an audience of "any developer" rather than me, with my codebase, my preferences, my house rules.
And the more domains I tried to use it across (writing, deployment, reviewing PRs, managing the admin system) the more I felt like I was renting someone else's brain instead of building my own muscle.
The fix wasn't to ditch agents. It was to stop treating the harness as fixed.
What I actually want
I don't want one mega-agent that does everything. I want a small set of focused agents, each one shaped for a specific corner of my work, each one with its own personality, its own tools, its own default behaviour.
The model can be the same. The shape around it shouldn't be.
That's the bet behind the harness: if I own the loop, the prompts, and the tool wiring, I can compose agents the way I compose anything else in the system. One for code. One for writing. One for routing things into my admin setup. Different defaults, different guardrails, different vibes.
The early prototype
The first version is built on the PI SDK, the same substrate that powers the assistant behind my Telegram setup. That was a deliberate choice. The admin system already has the tool surface I care about (tasks, memory, fitness, content, dev) and the PI SDK gives me a clean way to wire an agent loop without rebuilding the world.
The harness itself is small. A run loop, a system prompt loader, a tool registry, and a session store. That's mostly it. The interesting work isn't in the framework. It's in the profiles: the bundles of system prompt, tools, and defaults that turn the same loop into a different agent depending on what I'm doing.
It's rough. There are things I'd hate to ship to anyone else. But it already does two things the generic harnesses couldn't quite do for me.
Recipe one: the admin router
The first profile is a router for my personal admin system.
Generic harnesses don't know anything about my Telegram threads, my memory bank, my Strava sync, or the way I like to log a meal. They could, if I wired them up through MCP, but they'd still bring their own opinions about when to call what.
The router profile flips that around. It's an agent whose system prompt knows exactly what the admin system is, what tools belong to which domain, and what the rules are for crossing between them. Fitness tools don't fire in the projects thread. Memory writes are tagged by domain. The agent doesn't ask permission for the things I've already decided are safe, and it does ask for the things I want to keep an eye on.
It's the same shift I wrote about in From One Chat Window to a Personal Operations System: context segregation, but pushed down a level into the agent itself.
Recipe two: the writing agent
The second profile is the one drafting this post.
It's not Claude Code with a clever prompt. It's an agent with its own system prompt built around how I write. It has read access to every post in src/data/blog/, it knows the frontmatter shape, it knows the categories I use, and it knows the difference in tone between a build-log post and a reflection piece.
It doesn't try to "improve" my voice. It doesn't reach for em-dashes I never use. It doesn't pad. When I ask it to draft, it pulls in the closest stylistic neighbours from the existing posts and works from those.
It's a small agent doing a small job. That's the point. A generic harness would do this adequately with enough prompting. This one does it well because the prompting is baked in and I never have to re-state it.
How it fits the wider system
The harness isn't a standalone project. It's another piece of the same infrastructure the admin system runs on. Same Pi. Same Tailscale network. Same deployment pipeline. Same posture of "own the stack, shape it to fit."
I don't think every developer needs to build their own harness. The generic ones are very good and they'll keep getting better. But once you've got a clear sense of how you work, and once you're using agents across more than one domain, the cost of generic starts compounding. Every nudge, every re-explanation, every wrong default is friction you're paying on repeat.
Building your own harness is just refusing to pay that tax forever.
What I'm watching for next
A few things I haven't solved yet.
Profile sprawl is a real risk. If every workflow gets its own agent, I'm going to end up with a directory of half-maintained personalities. I want a small handful of profiles, not thirty.
Sharing context between profiles is the other open question. The writing agent doesn't need to know about my fitness logs, but it probably should know what I shipped this week. There's a version of this where the harness has a shared memory layer underneath the profiles, and I haven't figured out the shape of that yet.
And there's the obvious one: the moment a generic harness ships the exact feature I've built, I have to decide whether to throw mine away. I think the answer is no, because the value isn't the harness itself. It's that building it forced me to articulate how I actually want to work with agents. That articulation doesn't go away.
The agents we use are going to keep getting more capable. The question that interests me more is how much of the shape around them we're willing to take off the shelf.
For now, I'm enjoying writing my own.
If you're building your own agent setup, or you've thought about it and decided against, I'd love to hear why. Reach out on LinkedIn or Twitter.
Found this helpful? Share it!
Enjoyed this post? Subscribe to my newsletter for more insights on web development, career growth, and tech innovations.
Subscribe to NewsletterRelated Posts

The Digital Root System: Reclaiming Sovereignty in the Age of AI
Building Phil-specific apps with AI copilots turned me from consumer to architect—and made data sovereignty feel practical, not philosophical.

From One Chat Window to a Personal Operations System
How my personal admin system evolved from a single AI chat window into a multi-channel operations hub, and why context segregation changed everything.

The Prompt Injection Blind Spot
A conversation about AI security sent me down the rabbit hole of prompt injection, and made me look at my own admin system through very different eyes.
