Harmonica 0.4: a runtime for facilitation

In the last blog post, we argued that facilitation is critical for building the context layer, as the most valuable organizational knowledge has to be produced through collective dialogue (e.g. workshops), not just pulled from systems of record, and the infrastructure for facilitation can be viewed as a primitive. Until now, Harmonica has been pretty good at eliciting deeper responses through async 1:1 conversations. But it didn’t enable workflows.

We use “runtime” in the sense of “a system that executes code”. You write the logic once, and the runtime runs it, managing the moving parts and handing back a result. Any facilitation methodology has a similar kind of logic behind it: a defined sequence of stages, each with its own rules, a clear sense of what carries into the next, and the desired outcome. A protocol, if you will. Since early 2025 (and in partnership with Metagov) we’ve been trying to understand what it takes to actually run a collective dialogue process (e.g. deliberation) the way computers run software.

The first version of Harmonica was an AI interviewer. You describe what you want to achieve with this session, we generate an ad-hoc prompt, you share the invite link with participants, and the LLM (Sonnet 4.6 by default) elicits responses via 1:1 chats. Each participant answers questions, gets follow-ups based on what they said, and you get a synthesis at the end. It’s better than a survey because it adapts to each participant and supports many languages. It isn’t good for executing a protocol, though, because each conversation runs in its own bubble, and after the session ends you have to handle the results manually.

That was the biggest constraint for us. A Delphi panel runs in rounds: experts answer anonymously, the facilitator feeds back the spread of responses and the reasoning behind the outliers, and everyone answers again, revising toward the group until the estimates stabilize. The method is the loop between rounds. Wardley maps are built in sequence: start from a user need, derive the chain of capabilities that meets it, then position each one from novel to commodity, and you can’t place a component you haven’t surfaced yet. A retrospective in isolation surfaces a handful of fixes; the value compounds only when they carry forward (this cycle’s action items are next cycle’s opening check), and recurring tensions get tracked instead of rediscovered every time. None of that was possible when each session stood alone. You could run a survey, but you couldn’t execute a facilitation method.

0.4 finally removes that constraint.

The engine

Harmonica sessions can now chain into multi-step workflows. You pick a framework; the tool structures each stage as a session with its own role assignments, its own facilitator prompts (different for each role!), and context inherited from the prior stage. When everyone in a stage finishes, the chain can advance automatically: participants get an email notification when the next turn is ready, and the in-app notifications hub tracks chain progression, action-required items, and completions for the host. After the run, the results page lets you navigate between the stages and see how the collective output evolved. Sixteen methods ship as ready-to-run templates: eight multi-step chains (see below) and eight single-step sessions (Appreciative Inquiry, SWOT analysis, Retrospective, Change Readiness Assessment, Force Field Analysis, Impact Assessment, Risk Assessment, Stakeholder Analysis).
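To make the mechanics concrete, here is a minimal Python sketch of a chain that advances once everyone in a stage has answered. It is an illustration under stated assumptions, not Harmonica’s implementation: the names Stage, Chain, and summarize are hypothetical, and summarize stands in for the synthesis step that an LLM would perform.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    name: str
    prompts_by_role: dict[str, str]           # role -> facilitator prompt
    inherited_context: str = ""               # filled in from the prior stage
    responses: dict[str, str] = field(default_factory=dict)  # participant -> answer


def summarize(stage: Stage) -> str:
    """Hypothetical placeholder for the synthesis step (an LLM call in practice)."""
    answers = "; ".join(stage.responses[p] for p in sorted(stage.responses))
    return f"[{stage.name}] {answers}"


@dataclass
class Chain:
    stages: list[Stage]
    roles: dict[str, str]                     # participant -> role
    current: int = 0
    done: bool = False

    def prompt_for(self, participant: str) -> str:
        """Return the role-specific prompt, plus any context carried forward."""
        stage = self.stages[self.current]
        prompt = stage.prompts_by_role[self.roles[participant]]
        if stage.inherited_context:
            return f"{prompt}\n\nContext from the previous stage:\n{stage.inherited_context}"
        return prompt

    def submit(self, participant: str, answer: str) -> bool:
        """Record an answer; return True when it completes the current stage."""
        if self.done:
            raise RuntimeError("chain already finished")
        if participant not in self.roles:
            raise KeyError(participant)
        stage = self.stages[self.current]
        stage.responses[participant] = answer
        if set(stage.responses) != set(self.roles):
            return False                      # still waiting on someone
        carried = summarize(stage)
        if self.current + 1 < len(self.stages):
            self.current += 1
            self.stages[self.current].inherited_context = carried
        else:
            self.done = True
        return True
```

The design choice worth noticing is that the carried context is computed once, at the moment the stage closes, so every participant in the next stage starts from the same collective output.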

Cross-pollination between participants happens during the session, in line with the templates and the facilitation methods behind them. As participants respond, the facilitator surfaces what others have said (recurring themes, emerging tensions, angles the current participant hasn’t yet considered) without revealing who said what. These themes take shape live in a scratchpad, a workspace where the LLM can record its step-by-step reasoning.

Every summary now has the participant’s exact words sitting behind each claim. When the synthesis says the team is worried about onboarding time, you can read the actual quotes in the session summary and on the results page, attributed pseudonymously as Participant 3, Participant 7. The quotes are verbatim substrings, never paraphrased, never generated. Hosts control who sees the results: public, participants-only, or restricted.
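The “verbatim substring” guarantee is easy to check mechanically. The sketch below shows one way to audit it, assuming quotes and transcripts are available as plain strings keyed by pseudonymous label; Quote and ungrounded are hypothetical names, not part of Harmonica’s API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    participant: str   # pseudonymous label, e.g. "Participant 3"
    text: str


def ungrounded(quotes: list[Quote], transcripts: dict[str, str]) -> list[Quote]:
    """Return quotes that are not exact substrings of their participant's transcript.

    Empty quotes are rejected explicitly, because an empty string is a
    substring of every transcript.
    """
    return [
        q for q in quotes
        if not q.text or q.text not in transcripts.get(q.participant, "")
    ]
```

An empty result means every quote can be traced to the exact words of the participant it is attributed to; anything returned was paraphrased, generated, or attributed to the wrong person.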

The facilitation prompts are layered now. Previously the facilitation logic lived in a single blob of text covering everything. Now it composes from three layers: a base layer with your organizational context (seeded from onboarding as HARMONICA.md), a methodology layer with the logic specific to the selected template, and per-session sources you attach at setup, such as a research PDF, a prior session’s results (SESSION.md), or an MCP server the facilitator should consult during the conversation. All these layers can be reviewed and updated via session settings.
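As a rough illustration of what composing layers means, the Python sketch below assembles a prompt from a base layer, a methodology layer, and attached sources, in that order. The class name, section titles, and output layout are assumptions made for the example; the source does not describe how Harmonica actually joins the layers.

```python
from dataclasses import dataclass, field


@dataclass
class PromptLayers:
    base: str                       # organizational context, e.g. HARMONICA.md contents
    methodology: str                # logic specific to the selected template
    sources: list[tuple[str, str]] = field(default_factory=list)  # (label, text)

    def compose(self) -> str:
        """Join the non-empty layers in a fixed order: base, methodology, sources."""
        sections = [
            ("Organizational context", self.base),
            ("Methodology", self.methodology),
        ]
        sections += [(f"Source: {label}", text) for label, text in self.sources]
        return "\n\n".join(
            f"## {title}\n{body.strip()}" for title, body in sections if body.strip()
        )
```

Keeping the layers separate until the last moment is what makes each one independently reviewable and editable: changing the organizational context does not touch the methodology text, and vice versa.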

What it runs

Our initial template library is wide enough to cover a few common “change management” problems. We want to highlight three use cases in this post:

Mapping the strategic landscape, with visual outputs. Wardley mapping is a powerful open-source sensemaking methodology that lets you see the landscape you operate in; understanding value chains helps you make sound strategic decisions. You start from a user need, lay out the chain of capabilities that meets it, and place each one along an axis of evolution, from novel and uncertain on the left to commodity and well-understood on the right. The map turns implicit assumptions into a shared picture of where things actually are, which makes the next move easier to see: what to build, what to buy, what’s about to commoditize under you. It’s one of the sharpest sensemaking tools strategy has, and it is still under-used, partly because of the learning curve and partly because doing it as a group is genuinely hard. Usually one person drafts the map alone and circulates it for sign-off, at which point it’s their map and not the group’s; or a consultant runs a live workshop, with the scheduling and whiteboard logistics that implies.

That’s exactly the kind of process Harmonica can now run. Pick the Wardley template and the AI walks the group through the stages (surface the user need, derive the components, agree on the dependencies, place each on the evolution axis), with everyone contributing async and the engine carrying each stage’s output into the next. The discussion produces an actual map, drawn from what the group said. The output renders as Mermaid’s open-source Wardley syntax (v11.14.0+): standard, portable text, not a diagram locked into our tool. Edit it if it gets something wrong, then take it anywhere Mermaid renders. An async multiplayer process produces a thoughtful artifact, without making participants read long articles or play calendar tetris. More on Wardley mapping with Harmonica.
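For readers who have not seen a map as text, here is a small Python sketch that turns surfaced components and agreed dependencies into map source. Treat the output format as an assumption: the `wardley-beta` header and the `component Name [visibility, evolution]` and `A -> B` line forms follow the OnlineWardleyMaps-style syntax as we understand it, and should be checked against the Mermaid documentation for your version. Component and render_wardley are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    visibility: float   # 0 = far from the user, 1 = closest to the user need
    evolution: float    # 0 = novel, 1 = commodity


def render_wardley(
    title: str,
    components: list[Component],
    links: list[tuple[str, str]],
    header: str = "wardley-beta",   # assumed diagram keyword; verify before use
) -> str:
    """Render map text, refusing links to components nobody has surfaced yet."""
    names = {c.name for c in components}
    for c in components:
        if not (0.0 <= c.visibility <= 1.0 and 0.0 <= c.evolution <= 1.0):
            raise ValueError(f"coordinates out of range for {c.name!r}")
    for src, dst in links:
        if src not in names or dst not in names:
            raise ValueError(f"link {src!r} -> {dst!r} uses an unsurfaced component")
    lines = [header, f"title {title}"]
    lines += [f"component {c.name} [{c.visibility:.2f}, {c.evolution:.2f}]" for c in components]
    lines += [f"{src} -> {dst}" for src, dst in links]
    return "\n".join(lines)
```

The validation step mirrors the ordering constraint of the method itself: a dependency can only be drawn between components that an earlier stage has already surfaced.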

Retrospectives, so the collective forms shared memory. Most retrospectives are run as projects (with budgets, deliverables, reports nobody reads), which is why they’re the first thing cancelled when the calendar tightens. The cost of skipping them is invisible week to week and severe over years: the same debates return, and context evaporates every time a cohort rotates out. What organizations actually want is closer to what brains do at night (save the good stuff for later, prune the rest). Retros should be easy and cheap enough to run at the end of each cycle (ideally automatically), so that their output compounds: each retro produces typed decisions with the evidence behind them, action items, and emerging tensions, rather than a summary that flattens what happened. Those become atomic, linked notes the rest of the organization can browse later, with a periodic pass that surfaces recurring themes.

The full argument is in our blog post “Retrospectives shouldn’t be projects”. The reason retros get cancelled is setup cost, and 0.4 collapses it: when a session ends, the session page offers a one-click follow-up that pre-fills the same project and the same brief, so running it again next cycle has minimal friction. Each run lands in the same shared project, so the context accumulates in one place. The output of one session feeds into the next, and all our summaries can be grounded (another new feature), so the statements are auditable. The structured outputs are accessible via the API/MCP, so you can pipe them into your own knowledge base (or GitHub repo), and they stay accessible to AI agents. For a team or community that sets it up, it creates a useful context layer instead of Notion pages that get starred and forgotten.

Public sensemaking, so community members can be heard. Most organizations want to hear their employees and community members but lack the methodology and infrastructure to do it well. Public sensemaking only works if it’s easy to participate, the topic resonates, and (unlike a survey) participants get to see the results. Doing it properly has traditionally meant hiring consultants or spending months on manual work. Our Public Sensemaking Package turns the process into something repeatable: templates for running async deliberations at scale, synthesized on public pages, populating browsable knowledge bases, with every claim grounded in the dialogue it came from. The synthesis, structured rather than prose, depends on the template and the editable summary prompt. We just took the first real step there: the public pages now let anyone react to the synthesized statements from the project’s sessions (agree, disagree, or pass) and surface an opinion landscape showing where the group converges and where it splits, in the spirit of tools like Polis. Participants act on the results instead of just reading them, and the page keeps updating as more people weigh in.
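A toy version of an opinion landscape can be computed from the three reactions alone. The Python sketch below labels each statement by the share of agreement among people who took a side; the 0.7 threshold and the label strings are assumptions chosen for illustration, and tools like Polis go further by clustering participants into opinion groups, which this sketch does not attempt.

```python
from collections import Counter

VALID = {"agree", "disagree", "pass"}


def landscape(votes: dict[str, list[str]], threshold: float = 0.7) -> dict[str, str]:
    """Label each statement as converging, splitting, or lacking signal.

    `votes` maps a statement to the list of reactions it received.
    "pass" reactions are counted as abstentions and ignored in the share.
    """
    labels: dict[str, str] = {}
    for statement, reactions in votes.items():
        unknown = set(reactions) - VALID
        if unknown:
            raise ValueError(f"unknown reactions: {sorted(unknown)}")
        counts = Counter(reactions)
        decided = counts["agree"] + counts["disagree"]
        if decided == 0:
            labels[statement] = "no signal"
            continue
        share = counts["agree"] / decided
        if share >= threshold:
            labels[statement] = "converges: agree"
        elif share <= 1 - threshold:
            labels[statement] = "converges: disagree"
        else:
            labels[statement] = "splits"
    return labels
```

Because the labels are recomputed from the raw reactions each time, the landscape can keep updating as more people weigh in.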

Metagov’s gov/acc research used Harmonica to gather inputs from 50+ web3 governance experts: who is working on what, which problems converge, where the gaps are. Harmonica ran structured 1:1 interviews, each following a consistent flow with follow-ups adapted per person. The responses were distilled into 11 convergent problems, 41 solutions in progress or proposed, 59 actors across the space, with every entry having its own page in the gov/acc knowledge commons that we built on Quartz. Full case study.

Artem made the longer case for this kind of work in his recent talk for the OpenCivics community, “AI-facilitated sensemaking as civic infrastructure”.

We’re dogfooding a series of public sensemaking activities to pick the first batch of topics: what’s the most important question facing the world that you’re keen to make sense of together? Add yours to help us design the activities that would resonate the most with the current moment. Join our “session zero” today.

Agentic and open

Those pieces aren’t a longer feature list. Put them together — chained stages, the layered prompt, structured output, cross-pollination checked by evals — and what you have is a runtime: something that can take a facilitation method and run it reliably, repeatably, for a group rather than one person at a time. An AI interviewer asks questions. A runtime executes a workflow and hands back a useful artifact.

That’s what lets the facilitator act like an agent instead of a script — what we mean by agentic facilitation. A script asks the next question on the list. An agent decides in the moment: it reaches for a tool the session gave it, brings in what another participant said when the discussion calls for it, carries context from one stage into the next, and follows the conversation rather than a fixed order. An agent needs an environment to act in — and that’s what the runtime is.

Through the public MCP server, any AI agent can use Harmonica to create and manage sessions tailored to whatever context that agent already holds. Drop the harmonica-chat skill into a coding agent and “run a retro on our API redesign” becomes a session pre-loaded with your actual project: your Claude Code reads your context, picks the method, and launches it; Harmonica runs the facilitation. A researcher already did this: Maria Milosh implemented a novel cross-pollination method entirely as agent orchestration on top of the MCP, with no changes to the platform. The method was a spec the runtime executed, not a feature we shipped.

A method is just that: stages, the logic for each one, the roles, what carries from one stage to the next, and how to cross-pollinate, all written down where the runtime can read it. The sixteen templates we ship are just the specs we created as demos; we would love to see our community create better ones. We’re working on an open format: facilitation protocols anyone can describe, fork for their own upgrades, and hand to Harmonica to run, potentially published the way Karpathy published llm-wiki. We want the outputs to be portable in the same spirit: a Wardley session emits standard Mermaid text, not a diagram locked to our tool. That’s what we’re working on with Metagov: an open library of method specs (protocols + evals), with Harmonica as the runtime that executes them. More on that soon.

How it learns

Any method is only as good as the facilitator running it. The new Review tab on the results page gives you an AI critique of how the facilitator actually behaved during the session — where it stayed on one question too long, where it skipped a handoff — each finding backed by the participant’s own words. Where the review proposes a rule, you apply it in one click and it’s in for next time. Every prompt edit is tracked — who made it, when, and which surface it came from — so when a session behaves differently, you know what changed.

Underneath that is a measurement layer that runs each of Harmonica’s facilitator prompts against a scored rubric, with criteria like “does the facilitator ask one question per turn?” or “does the closing turn check whether the participant is satisfied?”, scored by a second LLM. Not every behavior fits a rubric: some rules only trigger deep in a conversation, where a per-message scoring call would add too much latency. So there are three instrumentation types, matched to three different classes of behavior: detector, judge, and smoke. The runtime doesn’t ask you to trust it. It gives you something to check. And that’s not just quality hygiene, it’s what makes agentic facilitation safe to push further: you can’t responsibly let a facilitator act more on its own without the instruments to see and score what it does. Observability and evals are the precondition for that autonomy, not an afterthought to it. The longer story is in We shipped prompt improvements against a broken scoreboard.
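Some criteria do not need a second LLM at all. As an illustration, the Python sketch below checks the “one question per turn” criterion with a plain text rule. We are assuming here that a “detector” means a cheap deterministic check of this kind; the post does not define the three types, and the function names and score format are hypothetical.

```python
import re


def questions_in(turn: str) -> int:
    """Count questions in a turn; a run like '??' counts as one question."""
    return len(re.findall(r"\?+", turn))


def one_question_per_turn(facilitator_turns: list[str]) -> dict:
    """Flag facilitator turns that ask more than one question.

    Returns the share of compliant turns and the indexes of the violations.
    Turns with no question at all are treated as compliant.
    """
    violations = [i for i, turn in enumerate(facilitator_turns) if questions_in(turn) > 1]
    total = len(facilitator_turns)
    score = 1.0 if total == 0 else (total - len(violations)) / total
    return {"score": score, "violations": violations}
```

A check like this is crude (it would miss a question phrased without a question mark), which is why criteria that need interpretation are better left to a judge model. Its advantage is that it runs instantly on every message, with no scoring call and no added latency.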

The mission

Everything we’ve shipped is in service of making proven protocols for collective sensemaking more accessible. We need them to tap into tacit organizational knowledge — but protocols that can’t be run reliably with groups of people are just literature. The ceiling we’re lifting with 0.4 is what kept facilitation from being something you could run rather than stage: isolated conversations that couldn’t carry context forward, couldn’t sequence, couldn’t compound. Methods that should run quarterly got skipped because setup cost too much. Knowledge that should accumulate got lost when the facilitator changed. Insights that should be auditable got flattened into a summary nobody trusts. We believe that solving these problems is one of the best applications for AI.

If that mission resonates, you can support us with a donation of any size — on Open Collective or Giveth. We don’t have any external funding and it means a lot. If you want to go all in, our limited lifetime deal is open: pay once to get access to every premium feature, forever. No one knows what AI will cost a year from now, so rather than promise unlimited inference, it’s paired with bring-your-own-model — you connect your own LLM and cover the cost of tokens yourself. It’s for our truest supporters. Maybe it’s you!

And we’re hosting a Town Hall on June 17 — an open event to see the latest features in action, ask questions, and shape our roadmap. Join us on Luma →

See you there!