Harness engineering

Every hour in 2026, it seems, a new LLM workflow or agent toolkit is pitched as the one. I wouldn't be surprised if frontier models are being trained on these right now, so that future releases ship with autonomous workflow scaffolding built in. I can't claim to have come up with a novel approach myself, but I think it’s worth documenting my journey testing and adapting these workflows while building on a limited budget (Claude and Gemini Pro subscriptions).

Trials

I started with a simple flow built on PRD.md, task_plan.md and technical_spec.md, but Claude struggled to enforce the instructions and to update stale instructions and tasks as the project's components grew more complex.

Conductor seemed like a good choice for wrangling Gemini 2.5 Pro’s poor long-context retrieval, but I was quickly hitting session limits as the state file got bigger. Conductor-beads (inspired by beads) seemed like a quick solution to this. It determined which fragments of context were required for the current task and fetched only those.
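To make the idea of fetching only the relevant fragments concrete, here is a minimal sketch. It is not how Conductor-beads actually works; the `Fragment` shape, the keyword-overlap scoring and the word budget are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One chunk of project state (hypothetical shape, not Conductor-beads' own)."""
    name: str
    text: str


def tokens(text: str) -> set[str]:
    """Lower-cased word set; a crude stand-in for real tokenisation."""
    return {word.strip(".,:;()").lower() for word in text.split()} - {""}


def select_fragments(task: str, fragments: list[Fragment], budget_words: int) -> list[Fragment]:
    """Pick the fragments that overlap most with the task, within a word budget."""
    task_words = tokens(task)
    # Score each fragment by how many task words it shares.
    scored = [(len(task_words & tokens(f.text)), f) for f in fragments]
    # Highest overlap first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)

    chosen, used = [], 0
    for score, fragment in scored:
        size = len(fragment.text.split())
        if score == 0 or used + size > budget_words:
            continue  # irrelevant, or would exceed the budget
        chosen.append(fragment)
        used += size
    return chosen
```

The point of the budget is that the state file can keep growing while the amount loaded per task stays bounded. A real tool would likely use a dependency graph or embeddings rather than word overlap.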

This still felt too spec-driven for general use. I wanted a workflow that would pause and let me explore options and push back, while making good use of agent skills. r/ClaudeAI and HN are flooded with new variants every day, among them Ralph loop implementations (which require a big budget), 'Get Shit Done' (resource-intensive, with sub-agent swarms) and Gas Town, which was too complex for my needs.

For the next attempt at building, I dropped beads and split the workflow into two core personas. The first is an opinionated ‘Orchestrator’ persona that does a few key things:

  • Manages Linear tickets and memory via Linear-Cli
  • Writes detailed specs to hand off
  • Captures workflow and tool lapses during code execution
  • Confers with UX Designer and Eng Lead personas, which discuss and debate user asks during the research and planning phases
  • Pushes back on a few parameters when exploring problems:
    • Is this a real problem?
    • What is the impact on the user?
    • Does it tie to my goal?
  • Ties it all up with a solutions brief that weighs cost, effort, speed to implement and impact

The second is a Claude Code Senior Engineer persona that implements the specs using red-green TDD and a skills bank. This role separation helped me parallelise work and optimise for context length and state.
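The Orchestrator's solutions brief can be pictured as a small ranking exercise. The sketch below is illustrative only: in practice the brief is written by the model in prose, and the 1 to 5 ratings, the equal weights and the `Option` and `solutions_brief` names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """A candidate solution, rated 1 (low) to 5 (high) on each axis."""
    name: str
    cost: int    # money or token spend
    effort: int  # work needed to build it
    speed: int   # how quickly it can ship
    impact: int  # benefit to the user

    def score(self) -> int:
        # Benefits count for, burdens count against. Equal weights are an
        # arbitrary choice made for this illustration.
        return self.impact + self.speed - self.cost - self.effort


def solutions_brief(options: list[Option]) -> str:
    """Render the options best-first as a plain-text brief."""
    ranked = sorted(options, key=lambda o: o.score(), reverse=True)
    lines = [
        f"{o.name}: score {o.score()} "
        f"(cost {o.cost}, effort {o.effort}, speed {o.speed}, impact {o.impact})"
        for o in ranked
    ]
    return "\n".join(lines)
```

Forcing every option through the same four axes is what makes the push-back useful: a cheap patch and an expensive rewrite end up side by side, on the same scale.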

Getting hooked

At this point I was familiarising myself with hooks and had become convinced that Claude Code's (CC's) plan mode could be injected with project-specific instructions to make it more thorough. Inspired by ECC, a more complete multi-agent toolkit, I adapted my workflow into a distilled version of it and implemented hooks to trigger session-level memory persistence, guided compaction, checkpoints, and pre- and post-tool-use instructions. I also integrated building.md, a journaling and decision-tracking system that doubles as a great session summariser for humans, and a design_system.md state file for design tokens and references. With everything now handled by one model, I was able to plan and implement without jumping between terminals.
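As a rough illustration of session-level memory persistence via a hook, here is a minimal sketch of a script a hook command could call. It assumes the hook runner passes a JSON event payload to the command, and that the payload carries fields named `session_id`, `hook_event_name` and `tool_name`; treat those names, and the log format, as assumptions to check against your tool's hook documentation. This is not the hook set from my workflow, just the general shape.

```python
import json
import time
from pathlib import Path


def record_checkpoint(payload: dict, log_path: Path) -> dict:
    """Append one JSON line describing a hook event to a session log."""
    entry = {
        "time": time.strftime("%Y-%m-%dT%H:%M:%S"),
        # The payload field names below are assumptions, not a confirmed schema.
        "session": payload.get("session_id", "unknown"),
        "event": payload.get("hook_event_name", "unknown"),
        "tool": payload.get("tool_name"),
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry


def load_checkpoints(log_path: Path) -> list[dict]:
    """Read the log back, e.g. to rebuild context at the start of a session."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]
```

Wired up as a hook command, the script would finish by calling something like `record_checkpoint(json.load(sys.stdin), Path("memory/session.jsonl"))`, where the path is a placeholder. An append-only JSON Lines file keeps each hook invocation cheap and leaves a trail that both a human and a later session can read.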

In simplified terms, the workflow was now driven by commands:

  1. Freeform conversation to explore problems and solutions
  2. A custom /plan mode that utilises existing state data
  3. Ticket generation in a chosen issue-tracking tool (Linear CLI / GitHub CLI with a local reference file)
  4. /tdd ‘ticket name’ to implement
  5. Optional: /code-review to assess the entire project, or /handoff to generate handoff context for a new session or another agent.

Why don’t I use subagents in my workflows already? They would be more useful if I had a bigger budget, and I'd like to see CC or Codex go wild, but unfortunately I'm limited to the Pro subscriptions for now. I still enjoyed making the most of little, so no complaints from me. It's become clear that model quality and context limits no longer seem to be the bottleneck. Instead, harnesses, sandboxes, filesystem access, skills, memory and observability matter more. That said, agent UI/UX still feels half-baked, even with TUIs getting their mainstream moment; perhaps that's something for me to experiment with next time.

You can adapt my workflow from here (simply copy and paste it into your project folder and fire away). I recently used it to build ClickSheet, a macOS tool that speeds up uploads to GSheets. I've been thinking of trialling a more complex harness based on this workflow, in the hope of making the most of the Gemini Pro subscription I have access to.