Osprey: A User-Centered Approach to Agent Orchestration

The premise

When you delegate work to a team of agents, what is on your screen?

Osprey is an orchestration platform for building, assigning, and supervising teams of autonomous agents. You assemble a group of agents, design the work they’ll do, hand out the tasks, and then watch the whole thing run, live.

But the reason this project exists isn’t the orchestration engine. It’s the interface. Most tools in this space are built engine-first: powerful underneath, bewildering on top, usable only by the person who built them. Very few product designers working on agent orchestration have spent the time to understand the technology and make it usable and human-friendly, so that something genuinely new stops feeling intimidating or complex. I built Osprey the other way around. It starts from the people who have to use an orchestration system (the person designing the automation and the person responsible for trusting it once it’s running) and works backward to the screen.

The problem

How orchestration tools are usually designed

Autonomous agents are a genuinely new idea for most people. Asking someone to learn that new idea and fight an unfamiliar, intimidating interface at the same time is how good technology fails to get adopted. The common failure modes are predictable:

The tool thinks in the machine’s terms, not the user’s. It exposes graph theory, model identifiers, and internal jargon, and expects the user to translate their actual goal into that vocabulary before anything happens.
It optimizes for the builder and forgets the operator. Enormous effort goes into the editor; the “is it working?” view is an afterthought of raw logs and numbers. But trust is won or lost in the running, not the building.
It overwhelms. Everything is dense, everything competes for attention, so nothing stands out, and the user can’t tell at a glance what matters.

Osprey is an answer to all three.

How it works

Three jobs, in the order you do them

Osprey is built around a deliberately ordinary use case (a small team of agents producing a research-and-writing deliverable) because the goal is to make the interaction model legible to users before the domain gets complicated. The same orchestration patterns extend directly to the hard cases: complex payment systems, regulatory workflows, and other critical, consequential operations. In those domains the right way to build is for a product designer and a system architect to work together to identify the agent roles, their authority, and where a human has to stay in the loop.

The product is built around three jobs a person does, in the order they do them. Each has its own dedicated space, reachable from a simple rail down the left side of the screen.

Build your team: Groups
You assemble a group of agents, your workforce for a job. Each has a name, a role, and two settings that make it a real, finite worker: how many things it can do at once, and how reliably it succeeds. The mental model is deliberate: you are hiring a team, not configuring software.
Design the work: Build
You lay out the workflow on a canvas: drag in steps, connect them, and the arrows draw themselves. The system quietly checks your work as you go and flags anything disconnected in plain language. This is direct manipulation: you build the thing by touching the thing. What you draw is what runs.
Watch it work: Monitor
This is where most tools give up, and where Osprey spends its care. A single progress figure leads; every agent gets its own lane with a live task; a running log narrates the job in plain sentences; failures surface in warning colors against an otherwise calm screen, so problems find you.
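
To make the Groups and Build descriptions concrete, here is a minimal Python sketch of an agent record and a plain-language connection check. It is an illustration under stated assumptions, not Osprey's actual data model: the names Agent, capacity, reliability, and disconnected_steps are hypothetical, as are the sample agents and steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """Hypothetical agent record: a named worker with finite limits."""
    name: str
    role: str
    capacity: int       # how many tasks it can run at once
    reliability: float  # chance that a task succeeds, from 0.0 to 1.0

    def __post_init__(self):
        if self.capacity < 1:
            raise ValueError(f"{self.name} needs a capacity of at least 1")
        if not 0.0 <= self.reliability <= 1.0:
            raise ValueError(f"{self.name} needs a reliability between 0 and 1")


def disconnected_steps(steps, connections):
    """Return plain-language warnings for steps that no arrow touches."""
    connected = {step for pair in connections for step in pair}
    return [
        f'"{step}" is not connected to anything yet.'
        for step in steps
        if step not in connected
    ]


# A small research-and-writing team and a workflow with one loose step.
group = [
    Agent("Ada", "Researcher", capacity=2, reliability=0.95),
    Agent("Basil", "Writer", capacity=1, reliability=0.9),
]
steps = ["Gather sources", "Draft", "Review"]
connections = [("Gather sources", "Draft")]

warnings = disconnected_steps(steps, connections)
for line in warnings:
    print(line)
```

The check returns sentences rather than error codes, which is the point of the example: the message names the step the user drew, in the user's own words.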

The principles

What makes the experience work

These are the ideas worth taking with you, the part meant to be borrowed from.

Speak the user’s language, not the machine’s

Every label, color, and layout choice translates the system’s internal state into something a person can read at a glance. Steps show their status as a word (running, done, failed), not just a color. Agents are named individuals, not opaque identifiers. The log tells a story in sentences. The interface does the work of translation so the user never has to.

Color means something, or it isn’t used

There are exactly four status colors, mapping to the four things you ever need to know at a glance: active, complete, needs attention, failed. Everything idle stays neutral and quiet. This is the single most important discipline in the design. Because color is reserved for meaning, scanning a live run is effortless; your eye is pulled only to what’s happening, what broke, and what’s waiting on a decision.
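
One way to enforce that discipline in code is a single lookup that owns every status color, with a neutral fallback for everything else. The sketch below is an assumption-laden illustration: the Status enum, the token names such as "status-active", and the helper functions are hypothetical, and the source does not say which hues Osprey uses.

```python
from enum import Enum


class Status(Enum):
    IDLE = "idle"
    ACTIVE = "active"
    COMPLETE = "complete"
    NEEDS_ATTENTION = "needs attention"
    FAILED = "failed"


# Exactly four statuses carry color. The token names are placeholders.
STATUS_COLOR_TOKEN = {
    Status.ACTIVE: "status-active",
    Status.COMPLETE: "status-complete",
    Status.NEEDS_ATTENTION: "status-attention",
    Status.FAILED: "status-failed",
}
NEUTRAL_TOKEN = "neutral"


def color_token(status):
    """Anything without a meaning to signal stays neutral."""
    return STATUS_COLOR_TOKEN.get(status, NEUTRAL_TOKEN)


def badge(status):
    """Pair the color with the status word so color is never the only cue."""
    return {"label": status.value, "color": color_token(status)}


print(badge(Status.FAILED))
print(badge(Status.IDLE))
```

Because every screen asks the same function for a color, a fifth status color cannot appear by accident; adding one means editing the one table where the rule lives.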

Built on a custom design system

Osprey isn’t assembled from an off-the-shelf component kit. I built it on my own design system, and proving that system out on a problem this demanding is one of the purposes of this prototype. An agentic interface stresses everything at once (dense state, live motion, hierarchy under pressure), so it’s an honest test of whether the typography, spacing, color tokens, and component primitives hold up, or where they need to bend. Every screen here is an answer in the system’s own vocabulary.

Borrow a familiar language so the new ideas get the attention

Osprey speaks a visual dialect people already read fluently: clean, modern, Material-style chrome. The agents and workflows are the genuinely new concepts; the buttons, cards, and navigation should feel instantly familiar so the user spends their attention on the ideas that are new, not on relearning where things live.

Design for the operator, not just the builder

Half of orchestration is supervision, so half the design goes there. The monitor isn’t a debug console bolted onto the editor. It’s a first-class workspace built around the questions of the person who has to trust the system: did it work, what did it do, where did it stall. Lead with the outcome; keep the mechanism one layer down, available but never in the way.

Make the picture boringly correct

A workflow drawn as a diagram only earns trust if the diagram is exactly right, every arrow meeting its step, at every zoom level, every time. The moment a connection looks even slightly off, the user stops trusting the picture, and a picture you can’t trust is worse than no picture. Getting this invisibly, relentlessly correct was the hardest part of the build, and it’s non-negotiable.
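
A common way to keep arrows attached at every zoom level is to store geometry once, in canvas coordinates, derive each arrow's endpoints from the boxes it connects, and push boxes and arrows through the same transform. The sketch below shows that general technique only; it is not a description of how Osprey is implemented, and Rect, exit_port, entry_port, to_screen, and arrow_on_screen are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """A step's box in canvas coordinates, independent of zoom and pan."""
    x: float
    y: float
    width: float
    height: float


def exit_port(rect):
    """Midpoint of the right edge, where an outgoing arrow starts."""
    return (rect.x + rect.width, rect.y + rect.height / 2)


def entry_port(rect):
    """Midpoint of the left edge, where an incoming arrow ends."""
    return (rect.x, rect.y + rect.height / 2)


def to_screen(point, zoom, pan):
    """The single canvas-to-screen transform shared by boxes and arrows."""
    return (point[0] * zoom + pan[0], point[1] * zoom + pan[1])


def arrow_on_screen(source, target, zoom, pan):
    """Derive both endpoints from the boxes, then transform them."""
    return (
        to_screen(exit_port(source), zoom, pan),
        to_screen(entry_port(target), zoom, pan),
    )


gather = Rect(0, 0, 120, 40)
draft = Rect(200, 60, 120, 40)
start, end = arrow_on_screen(gather, draft, zoom=1.5, pan=(10, 20))
print(start, end)
```

Since the arrow never stores its own coordinates, it cannot drift away from a step: move or zoom the boxes and the endpoints are recomputed from the same numbers.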

Restraint is a feature

There is almost no decorative motion, no shadow theater, nothing that moves or shines for its own sake. Transitions exist only to explain a change of state. Density is tuned to feel like a calm instrument, not a busy dashboard. The maturity of an operations tool shows in what it refuses to add. Every element has to earn its place or it’s removed.

Why it’s different

What this is, compared to what’s out there

Most agent and workflow tools descend from developer automation software and inherit its assumptions: the user is technical, the editor is the product, and the running is something you read in a log. Osprey rejects all three.

Traditional: The operator is technical; supervision is a console of raw logs and numbers, bolted on after the editor.

Osprey: The responsible operator is a first-class user; supervision gets a purpose-built space, not a console.

Traditional: The editor is the product; the running is an afterthought you decode from telemetry.

Osprey: Understanding-at-a-glance is the central design problem; the color system, agent lanes, and hierarchy are built to solve it.

Traditional: A bespoke, intimidating visual language the user must relearn from scratch.

Osprey: A familiar, mainstream visual language so a new category of software feels approachable on first contact.

Traditional: Leads with telemetry: totals and charts, everything shown equally.

Osprey: Leads with outcomes: did it work, what did it produce, with the deep detail available but out of the way.

Where it goes

From a polished console to real work

Osprey is a working design prototype. The engine that runs the workflows is a high-fidelity simulator. It schedules work across your agents, respects their limits, models how long things take and when they fail, retries, and routes decisions, so the experience can be proven against realistic behavior before being wired to live agents. The cockpit was built and instrumented first, because the cockpit is what determines whether the thing is usable at all. The honest path from here:

Start with the trigger. Real automation begins with when something happens (a schedule, an incoming message, a form submitted), not a manual Run button.
Speak in outcomes, not graph theory. “Do these at the same time” and “wait until they’re all done” instead of fan-out and join.
Keep a human in the loop. Add approval steps that pause and ask a person, plus a place where pending decisions wait. This is the most common business pattern there is.
Connect to the tools people already use, so a step can actually touch their real work.

None of these require tearing down what’s here. They build on it, and the principles above are the foundation worth keeping no matter what you build on top.
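
For readers curious what a simulator of the kind described at the start of this section might look like, here is a small tick-based sketch. It is a toy written under assumptions, not Osprey's engine: SimAgent, simulate, max_attempts, and the sample tasks are all hypothetical, and decision routing is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class SimAgent:
    name: str
    capacity: int       # tasks it can run at once
    reliability: float  # chance that one attempt succeeds


def simulate(agents, tasks, max_attempts=3, seed=0):
    """Run tasks tick by tick and return (log, outcomes).

    tasks maps a task name to its duration in ticks.
    """
    if not any(agent.capacity > 0 for agent in agents):
        raise ValueError("At least one agent needs capacity to do work.")
    rng = random.Random(seed)               # seeded so a run is repeatable
    queue = [(name, 1) for name in tasks]   # (task, attempt number)
    running = []                            # [agent, task, attempt, ticks left]
    log, outcomes, tick = [], {}, 0

    while queue or running:
        # Hand queued work to any agent with a free slot.
        for agent in agents:
            busy = sum(1 for job in running if job[0] is agent)
            while queue and busy < agent.capacity:
                task, attempt = queue.pop(0)
                running.append([agent, task, attempt, tasks[task]])
                log.append(f"t={tick}: {agent.name} started {task} (attempt {attempt}).")
                busy += 1

        tick += 1
        still_running = []
        for job in running:
            job[3] -= 1
            if job[3] > 0:
                still_running.append(job)
                continue
            agent, task, attempt, _ = job
            if rng.random() < agent.reliability:
                outcomes[task] = "done"
                log.append(f"t={tick}: {agent.name} finished {task}.")
            elif attempt < max_attempts:
                queue.append((task, attempt + 1))
                log.append(f"t={tick}: {task} failed and will be retried.")
            else:
                outcomes[task] = "failed"
                log.append(f"t={tick}: {task} failed after {attempt} attempts.")
        running = still_running

    return log, outcomes


team = [SimAgent("Ada", capacity=2, reliability=0.9),
        SimAgent("Basil", capacity=1, reliability=0.8)]
log, outcomes = simulate(team, {"research": 3, "outline": 1, "draft": 2})
for line in log:
    print(line)
```

The log is written as sentences from the start, so the same output can feed a monitor view without a translation layer in between.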
