system/devops-cycle-design.md
Status: approved model, landing incrementally. The two-workflow task
status model is now live. `Task` stages are
`designed → building → submitted` (Build) and
`submitted → reviewed → assembled → shipped → archived` (Deploy; the
`shipped → archived` archive loop closes it, §1.4), meeting at the
`submitted` seam, plus `blocked` (side) and `archived` (terminal).
`bin/task`, `bin/dor-check`, and the board speak it.

Landed since: the `Release` singleton model; the persistent-`release`
branch CLI, `bin/release init|merge|prepare|ship` (§1.1); and
`bin/agent-worktree`'s release-aware base default: `new` cuts the feature
branch from `origin/release` (falling back to `origin/main` where no `release`
branch exists) and `finish --pr` opens the PR with `--base release`.

Still to land (each its own task): migrating the heartbeat planner
`bin/devops-cycle` from the old stage names to the new ones (it is
self-consistent on the legacy snapshot today); multi-repo `ship`, meaning the
per-repo app deploy across satellites (+ gem auto-repin + partial-ship +
`test_cmd` gate), since `bin/release ship` today publishes gems then deploys
mcritchie-studio only; and the Discord progress webhook
(§5). Where this doc describes those, it is the spec for the follow-up.

Deploy-flow redesign (decided, 2026-06-22). The `submitted → shipped` half
was re-homed by role: review is delegated by Avi to two seniors (no longer
his solo gate), `assembled` is owned by Steffon (now titled Platform
Engineer), and `shipped` is owned by Avi (full e2e on the frozen ship
SHA) plus the one operator gate. §1.2 is the rewritten spec. It lands via
three build tasks: `deploy-flow-heartbeat-tooling` (planner/tooling + the
`prepare` retry/wait-for-boot fix), `stages-page-step-outlines` (the per-step
`/stages` outlines), and `seed-souls-prod-qa` (the reviewer souls, incl. a new
Alex Documentation reviewer persona distinct from the orchestrator seat).

Operator companion: the board stage guide at `/stages`, and the rendered SOP
infographic (the cycle drawn as accountability swimlanes, one row per owner)
at `/stages/sop`. Both render from
`config/devops_vocabulary.yml` (read via `Devops::Vocabulary`), the single
source of truth for the SOP vocabulary: rename a term there and it flows to the
UI in one edit, so the page and these docs cannot drift. This document remains
the canonical full SOP.
This design answers seven goals:
| Capability | Where | Reuse as |
|---|---|---|
| Task state machine: Build designed→building→submitted→reviewed, Deploy reviewed→assembled→shipped, plus blocked/archived | Task model, devops-task-board.md | The spine. Everything routes through the task. |
| kind (feature/bug/chore/qa/release/cleanup), metadata["devops"] contract | devops-task-board.md | SOP routing key + handoff record. |
| Activity log: comment / qa_feedback / handoff + scout reports | Activity, task-board API | The durable QA↔feature-agent channel. |
| Sealed-bid sizing, backend_migration advisory-lock lane, release_train lane | sizing-rubric.md, exclusive-lanes.md | Order-of-operations machinery. |
| Test lanes (pr_review_gate / local_proof / qa_acceptance / production_smoke / nightly_deep / quarantine) + config/devops_test_suites.yml + bin/devops-tests | testing.md | The when/where axis of the pyramid. |
| bin/qa-intake, bin/devops-cycle (scout packets/decisions/readiness), bin/agent-worktree, bin/qa-server, bin/deploy | parallel-agent-devops.md | The conductor toolchain the heartbeat agent drives. |
| Discord POST /api/v1/release_notes (dry-run, grouped-by-app, standardized) | release notes service | The standardized visibility primitive. |
| "Future Heartbeats" lease-model spec | devops-task-board.md | The literal blueprint for the airgapped agent. |
The job is formalize + close gaps, not greenfield.
The flow is two workflows, matching how the work actually splits and who
owns each. Building a change (the feature agent) and shipping a release
(DevOps) are different jobs at different cadences, so they are different
lifecycles that meet at one seam — submitted.
- Build (feature agent): `designed → building → submitted`. A task is specced (designed), an agent claims and builds it
  (building), and opens a PR (submitted), where the feature agent's part
  ends. A wall, bounced PR, or unready dependency parks it at blocked.
- Deploy (DevOps): `submitted → reviewed → assembled → shipped`. Review is a nested chain: Avi thin-delegates the
  submitted PR (he confirms product-acceptance and picks the primary +
  light pair with bin/reviewer-select), then hands the lane to the PRIMARY
  reviewer, who owns it end-to-end. The PRIMARY does the deep review (acceptance,
  base tests, standards, smell, scalability) and spawns the LIGHT reviewer as
  its own sub-agent; on two approvals with no blocker the PRIMARY drives
  the task to reviewed AND merges its PR into the persistent release
  branch (bin/release merge), which flips that task to assembled
  (membership at merge) and so assembles a single release candidate (RC). A
  blocker lands it at blocked (rework, with a qa_feedback note).
  Steffon (Platform Engineer) QAs the RC with the next tier (integration +
  an e2e smoke) and deploys origin/release to QA; at ship Avi runs the
  full e2e on the frozen ship SHA and, on the operator's OK, the conductor
  fast-forwards release → main (shipped). submitted is the seam: the
  feature agent hands the PR to DevOps there.

QA and production are properties of the release, not the individual task,
so there is no per-task QA stage; the one operator gate is a single OK on the RC.

    WORKFLOW 1 · Build (feature agent)         WORKFLOW 2 · Deploy (DevOps · Release model)
    designed → building → submitted ─────────► submitted  →  reviewed  →  assembled  →  shipped  →  archived
                  ▲           │                (review)      (approved)   (merged RC,   ("run the   (archive
                  └  blocked  ┘                                            e2e green,    deployment" loop,
             (rework / env / dep)                                          QA-deployed)  → prod)     §1.4)

blocked is the single "not in the pipeline's court" state — an agent hit a
wall, QA bounced the PR, or a dependency isn't ready. It records blocked_from
(captured automatically) + block_kind (environment / rework / dependency) so a
heartbeat agent routes it without re-reading the thread. archived is terminal.
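
The stage rules above can be read as a small state machine. The following is a minimal Ruby sketch of that reading, not the app's `Task` model: the class name `TaskSketch`, the method names, and the rule that any stage except `blocked` and `archived` can be blocked are assumptions made for illustration. Only the stage names, `blocked_from`, and the three `block_kind` values come from this document.

```ruby
# Illustrative sketch only: TaskSketch is a hypothetical stand-in,
# not the app's Task model.
class TaskSketch
  # Forward edges of the two workflows; they meet at "submitted".
  NEXT_STAGE = {
    "designed"  => "building",
    "building"  => "submitted", # Build ends here (PR opened)
    "submitted" => "reviewed",  # Deploy starts here
    "reviewed"  => "assembled",
    "assembled" => "shipped",
    "shipped"   => "archived"
  }.freeze
  BLOCK_KINDS = %w[environment rework dependency].freeze

  attr_reader :stage, :blocked_from, :block_kind

  def initialize(stage = "designed")
    @stage = stage
  end

  # Advance one step along the pipeline; archived (and blocked) have no next stage here.
  def advance!
    raise ArgumentError, "#{stage} has no next stage" unless NEXT_STAGE.key?(stage)

    @stage = NEXT_STAGE.fetch(stage)
  end

  # Park the task, remembering where it came from and why (assumed rule:
  # any stage except blocked/archived may be blocked).
  def block!(kind)
    raise ArgumentError, "unknown block kind: #{kind}" unless BLOCK_KINDS.include?(kind)
    raise ArgumentError, "cannot block from #{stage}" if %w[blocked archived].include?(stage)

    @blocked_from = stage
    @block_kind = kind
    @stage = "blocked"
  end
end

task = TaskSketch.new
task.advance!              # designed -> building
task.block!("rework")
puts "#{task.stage} (from #{task.blocked_from}, kind #{task.block_kind})"
```

Because `blocked_from` and `block_kind` are captured at the moment of blocking, a routing agent can decide what to do next from those two fields alone.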
The RC is a Release singleton (only one assembles at a time). Member tasks
carry its release_slug; it carries them through QA→prod and flips them to
shipped when it ships. The airgapped agent runs both workflows but never
crosses the ship gate autonomously — the operator's single OK on the RC is the
one human gate. The task API is on production, so the airgapped box only needs an
internet connection — no separate pull/sync layer.
Open every cycle by assessing the board BY STAGE: `bin/task list --stage reviewed|assembled|shipped|submitted|building`, never the default flat
`bin/task list` (it caps at the 20 newest tasks across all stages with no truncation
warning, so older actionable work silently falls off). See
parallel-agent-devops.md → Step 0.
release branch

`Release` (singleton model) coordinates one candidate from assembly through
ship. Consolidation (decided): the legacy release_train field becomes
release_slug — one concept, one name.
The integration branch is PERSISTENT. Every repo keeps a single release
branch — the same name in every repo (Release::BRANCH = "release"). Feature
PRs target release, not main; main is always an ancestor of release.
bin/release init creates the branch (= main) on every gem + app repo, once
(idempotent). Membership flips reviewed → assembled at merge: merging an
approved PR into release (bin/release merge <task>) records the task onto the
active candidate. prepare deploys origin/release to QA; ship
fast-forwards each repo's release → main and deploys prod. After a ship,
release collapses to main and re-accumulates the next candidate.
| Field | Meaning |
|---|---|
| slug | Canonical id, e.g. 2026-06-20-s3-uploads. |
| state | assembling → assembled → shipped (+ abandoned). assembled = the QA candidate is built (members merged into release) and its suite checks out. |
| branch | The persistent integration branch release (same name in every repo); feature PRs merge into it, QA deploys from it, and ship fast-forwards it into main. |
| confirmed_at / confirmed_by | The operator's "Make the release" action at assembled → shipped: the one human gate. |
| qa_url / production_url / deployed_sha / release_notes_sent_at | Deploy + notes record. |
| has_many tasks | via tasks.release_slug. |
Task gains two links:
| Field | Meaning |
|---|---|
| release_slug | The Release this task rides (null until merge; the task becomes assembled once its PR is merged into release). |
| dependencies | Array of task slugs this one needs shipped first. Now enforced by the conductor (Release::Ordering): a member sorts after every task listed here, composed under the producer-first rule (e.g. an engine gem before the apps that consume it). |
dependencies (task→task) and the exclusive lanes (resource-level:
migration, release, vault single-writer) compose: dependencies say "B needs A's
output"; lanes say "only one of these at a time." As of the gems-first
release work, dependencies is no longer spec-only: Release#ordered_members
honors it (a stable topological sort that falls back to position), so the
conductor sequences members producer-first and respects explicit
task-to-task edges. See "Gem members & producer-first ordering" below.
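
To make the ordering rule concrete, here is a minimal Ruby sketch of a stable topological sort that emits gems first, falls back to position, and honors task-to-task `dependencies`. It is not the code behind `Release#ordered_members`: the `Member` struct, its attribute names, the decision to ignore dependencies on tasks outside the release, and the cycle fallback are all assumptions made for illustration.

```ruby
# Hypothetical member record; the real model's attributes may differ.
Member = Struct.new(:slug, :kind, :position, :dependencies, keyword_init: true)

# Stable topological sort: repeatedly emit the first member (in gems-first,
# then position order) whose in-release dependencies have all been emitted.
def ordered_members(members)
  base = members.sort_by { |m| [m.kind == :gem ? 0 : 1, m.position, m.slug] }
  in_release = members.map(&:slug)
  emitted = []
  remaining = base.dup

  until remaining.empty?
    idx = remaining.index do |m|
      # Dependencies on tasks outside this release are ignored (assumption).
      (m.dependencies & in_release).all? { |dep| emitted.include?(dep) }
    end
    # A dependency cycle leaves nothing ready: fall back to base order.
    member = remaining.delete_at(idx || 0)
    emitted << member.slug
  end

  emitted
end

members = [
  Member.new(slug: "app-docs",    kind: :app, position: 0, dependencies: ["app-billing"]),
  Member.new(slug: "app-uploads", kind: :app, position: 1, dependencies: ["engine-s3"]),
  Member.new(slug: "engine-s3",   kind: :gem, position: 2, dependencies: []),
  Member.new(slug: "app-billing", kind: :app, position: 3, dependencies: ["app-uploads"])
]

p ordered_members(members)
# => ["engine-s3", "app-uploads", "app-billing", "app-docs"]
```

The gem sorts first even though its position is later, and `app-docs` is held back until the task it depends on has been emitted, which is the composition of the two rules described above.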
Membership flips at merge; QA is a deploy of origin/release. Feature PRs
already target release, so there is no branch-cut and no PR-base retarget. For
each reviewed member its PR is merged into release — the primary reviewer
runs bin/release merge <task> for the task it owns (the conductor batch-merges
any pre-existing reviewed backlog with bin/release merge <task> [<task> …]),
which gh pr merges + Release::Conductor.adopt!s, records the task onto the
active candidate, and flips it reviewed → assembled.
`merge` is batched: it takes one OR many slugs, gh pr merges
each, then adopts them all in one heroku run (the cold-start-per-PR adopt
otherwise blew the tool timeout and left half-states); with ≥2 slugs it also
prints an overlap planner (colliding files + suggested order + likely
rebases, warning-only). A merge conflict surfaces at this PR-merge step
(resolve on GitHub, or block the task for rework) — release never force-pushes. Once the desired members are
merged and Steffon's integration + e2e-smoke suite is green on
origin/release, bin/release prepare deploys origin/release to QA +
records the QA URL → the release is assembled (the QA candidate). A late PR
merging in after that reopens the RC (Release#reopen!) so it re-QAs before
shipping. At ship, Avi first runs the full e2e + highest-tier suite on the
frozen ship SHA (the exact prod code — closing the merge-forward "shipped ≠
tested" gap); on green he stops for the operator, who then Makes the
release — an explicit action (surfaced as the current release on
/deployments, not a passive status): bin/release ship fast-forwards each
repo's main up to release (so release collapses into main), deploys prod,
and flips members to shipped. The operator's action — after Avi's test
confirmation, before the deploy — is the one human gate.
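
Elsewhere this document describes `bin/release prepare` as polling `<qa_url>/up` via `wait_for_boot` and deferring the assemble until QA returns 200. A minimal Ruby sketch of that polling idea follows. The signature, the `attempts` and `delay` parameters, and the injected `probe` callable are assumptions for illustration, not the CLI's actual implementation; the probe is injected so the sketch runs without a network.

```ruby
# Poll a health probe until it reports HTTP 200 or the attempts run out.
# `probe` is any callable returning a status code (hypothetical interface).
def wait_for_boot(probe, attempts: 10, delay: 0)
  attempts.times do |i|
    return true if probe.call == 200

    # Wait between attempts, but not after the last one.
    sleep(delay) if delay.positive? && i < attempts - 1
  end
  false
end

# Simulated QA app: two "still booting" answers (000), then 200.
responses = [0, 0, 200]
probe = -> { responses.shift || 200 }

if wait_for_boot(probe, attempts: 5)
  puts "QA returned 200: safe to assemble"
else
  puts "QA did not return 200 in time"
end
```

A real probe could wrap an HTTP GET against `<qa_url>/up`, for example with Ruby's `Net::HTTP.get_response`, and return the numeric status code.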
Gem members & producer-first ordering. A release is not apps-only — it can
carry gem tasks (studio-engine, solana-studio) as first-class members
alongside apps. The classification lives in config/release_repos.yml (read by
Release::Repos): every member is a :gem (producer) or an :app (consumer).
Gems and apps are handled differently at both ends of the Deploy workflow:
- A gem member's PR merges into the release branch like any other, but there is no app artifact
  to deploy, so at prepare the conductor skips gem members in the QA-deploy
  loop. They ride the release as a record, and are QA'd indirectly through a
  consuming app (the consumer's branch is what gets deployed + tested).
- Release#ordered_members
  returns members gems-first (then apps), honoring dependencies within
  that. bin/release ship publishes every gem member to RubyGems first
  (approval-gated gem push, version from the gem's version_file), and only
  then deploys the apps. So a consuming app always builds + deploys against the
  just-published gem version, never ahead of it. If a gem fails to publish,
  the ship aborts before any app deploys.
- The gem's version_file (lib/studio/version.rb
  for studio-engine, the .gemspec for solana-studio) is part of the gem task's
  own PR; member_plan reads it for the publish + the board's 💎 gem badge.
  Post-publish, consuming apps re-pin ~> x.y and deploy as their own members
  (or follow-up tasks). See docs/agents/modules/deployment.md →
  "Releasing a gem (producer-first)" for the operator runbook.

This is the ordered release_train from §4.2 ("gem publish → consumer lockfile
bump → app deploy"), now expressed as first-class release membership rather than
a separate lane: the gem and its consumers can be members of the same release,
sequenced by kind + dependencies.
Abandon (revert, never force-push). release is permanent and shared, so a
stuck RC is not thrown away by deleting a branch — it is unwound by reverting.
Release#abandon! drops board membership (members fall back to reviewed,
release → abandoned, the singleton frees up). The git-side remediation is owned
by the conductor/CLI as a documented step:
1. Revert the merge on release (git revert -m 1 <merge-sha>, push), never a force-push, since
   release is the shared persistent branch.
2. Members fall back to reviewed; the e2e culprit goes to blocked.
3. Re-merge members into release (a new candidate
   accumulates) and re-prepare. main never moved.

Distinguish the accountable role (the soul whose rubric governs the stage)
from the executor (who moves it). The heartbeat agent executes by wearing
each lane's hat; accountability maps to a soul; there is exactly one operator
gate — the ship.
Redesigned submitted → shipped (decided, 2026-06-22; review lane re-homed
2026-06-26). The Deploy half is re-homed by role. Review is a nested chain,
not a flat peer spawn: Avi is a thin delegation pre-step — he confirms
product-acceptance and picks the primary + light pair (bin/reviewer-select),
then hands the lane to the PRIMARY reviewer, who owns it end-to-end (the deep
technical review, spawning the LIGHT as its own sub-agent, and the merge).
assembled is owned by Steffon (now titled Platform Engineer); and
shipped is owned by Avi (who runs the full e2e on the frozen ship SHA)
ahead of the one operator gate. The senior reviewer pool is {Shannon = UI ·
Carl = backend · Jasper = Web3 · Steffon = DevOps/Platform · Alex = Documentation}
(Alex is both the orchestrator and the pool's launchable Documentation review
seat — one identity). Avi picks the pair with bin/reviewer-select <task>
(wraps ReviewerSelector).
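
The selection is described in this document as domain fit plus a logged, seeded-per-task tiebreak, with the QA owner, the builder, and busy souls excluded. The Ruby sketch below shows one way such a reproducible pick could work. It is not `ReviewerSelector`: the `POOL` hash shape, the `select_reviewers` name and signature, the hashing scheme, and the simplified KEEP fallback (restoring the full pool) are assumptions for illustration only.

```ruby
require "digest"

# Pool and domains as listed in this document; the data shape is an assumption.
POOL = {
  "shannon" => "ui",
  "carl"    => "backend",
  "jasper"  => "web3",
  "steffon" => "devops",
  "alex"    => "documentation"
}.freeze

# Pick a PRIMARY and a LIGHT reviewer. Ties in domain fit are broken by a
# random number generator seeded from the task slug plus the exclusions,
# so the same inputs always produce the same pair.
def select_reviewers(task_slug, domains:, excluded: [])
  candidates = POOL.keys - excluded
  # Simplified KEEP fallback (assumption): if exclusions leave fewer than two
  # candidates, fall back to the full pool so a pair is always returned.
  candidates = POOL.keys if candidates.size < 2

  seed = Digest::SHA256.hexdigest([task_slug, *excluded.sort].join("|"))[0, 16].to_i(16)
  rng = Random.new(seed)

  # Roll once per candidate in a fixed (sorted) order, then rank by fit, then roll.
  rolls = candidates.sort.to_h { |soul| [soul, rng.rand] }
  ranked = candidates.sort_by { |soul| [domains.include?(POOL[soul]) ? 0 : 1, rolls[soul]] }
  { primary: ranked[0], light: ranked[1] }
end

pair = select_reviewers("s3-uploads", domains: ["backend"], excluded: ["steffon"])
puts "primary=#{pair[:primary]} light=#{pair[:light]}"
```

Seeding from the task identity rather than the process is what lets a preview and a later recording agree: both derive the same seed, so both roll the same pair.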
Merge timing (decided): a reviewed task is merged when its PR lands into
the persistent release (bin/release merge), which flips it reviewed → assembled. The release branch is the live RC;
main only moves when it ships.
The PRIMARY reviewer runs the merge — on two approvals (primary + light)
with no blocker it drives the task to reviewed and runs bin/release merge. The
conductor no longer runs the merge for a freshly-reviewed task (it still
batch-merges any pre-existing reviewed backlog). Bias to action: green tests
= go, because release is recoverable by revert, so we don't fear merging there.
| Stage (entity) | Accountable | Progressed by | Action | Gate |
|---|---|---|---|---|
| → submitted (task, entry) | Feature agent | Feature agent | bin/full-suite-check (certify FULL suite + rubocop) → pass bin/dor-check, record checks_run, open PR (base release), move in | self-gate |
| submitted (task): REVIEW | PRIMARY reviewer (Avi delegates) | Avi (thin gate) → PRIMARY owns lane → LIGHT | Avi confirms product-acceptance, then picks 2 reviewers from {Shannon=UI · Carl=backend · Jasper=Web3 · Steffon=DevOps/Platform · Alex=Documentation} by domain fit + a logged, seeded-per-task tiebreak via bin/reviewer-select <task>, assigning 1 PRIMARY (deep) + 1 LIGHT (ReviewerSelector, excluding the QA owner so a reviewer never QAs their own change, the task's builder so a soul never reviews their own work, and busy souls so review never lands on an agent mid-build/review elsewhere; the builder is read from devops.built_by, auto-stamped on the move to building from the soul build-claim actor (--actor <soul>) OR the task's assigned agent_slug so a bare bin/task move <slug> building records the builder with no manual flag, falling back to the → building event actor; busy souls come from bin/reviewer-select --busy a,b,c and/or --busy-auto (a board query of agents on stage=building tasks); KEEP fallback: when the builder + QA-owner + busy exclusions would leave fewer than two candidates, the least-bad ones are kept so a PRIMARY+LIGHT pair is always returned (the decision/log flags it), and a non-soul/non-pool builder is never reported excluded; the pair + primary/light is recorded on the submitted→reviewed TaskEvent.metadata["reviewers"] for the avatars UI). Avi then hands the lane to the PRIMARY, who runs the deep review and spawns the LIGHT as its own sub-agent; each confirms DoR base tests green, code standards, code smell, scalability, and acceptance. No blocker on either → the PRIMARY drives the task to reviewed ✅ AND runs the merge (next row); a blocker → blocked (rework, with qa_feedback) | 2 senior approvals (PRIMARY = Opus on migration/payment/solana/auth); ⛔ one complete qa_feedback on fail |
| reviewed ✅: MERGE (task) | the PRIMARY reviewer | the PRIMARY reviewer executes | On 2 approvals (primary + light, neither blocking) the PRIMARY runs bin/release merge for its task, honoring dependencies + lanes; membership flips at merge → member assembled. The primary owns the merge; the conductor no longer runs it for a freshly-reviewed task (it still batch-merges any reviewed backlog). Bias to action: green tests = go (release reverts cleanly; we don't fear merging there) | deterministic merge (conflicts surface at PR-merge); the primary owns it, with no separate conductor/Avi gate |
| assembled (release): QA | Steffon (Platform Engineer) | DevOps agent as Steffon | Run the next tier (integration + an e2e smoke) on origin/release; green → bin/release prepare deploys it to QA → Discord QA-deployment note → release assembled | deterministic suite; ⛔ regression → block the task. prepare retries/waits-for-boot (the /up-smoke race): bin/release prepare polls <qa_url>/up via wait_for_boot and defers the assemble (Release::Conductor.curate! then assemble!) until QA returns 200, so a slow dyno can't strand the RC assembling |
| → shipped (release) | Avi, then Mr. McRitchie | Avi tests; operator approves; conductor deploys | Avi runs the full e2e + highest-tier suite on the FROZEN ship SHA (the exact prod code, which fixes "shipped ≠ tested"). On green Avi STOPS for the operator → on go: bin/release ship ff's release → main per repo, deploys → production_smoke → Discord release notes → members shipped | 🔒 the one operator gate: after Avi's test confirmation, before deploy; rollback on smoke fail |
Clarifications:
- Test tiers per step come from
  Release::STEP_TEST_TIERS (ownership is disjoint by construction: a tier maps
  to exactly one step); ship runs Avi's full-e2e gate on the frozen SHA before
  the operator gate (bin/release ship → avi_ship_gate, then confirm).
- assembled means slightly different things at the two scopes: a task is
  assembled the moment its PR is merged into release (bin/release merge,
  driven by the two seniors' approval); the release is assembled once the
  desired members are in, Steffon's integration + e2e-smoke is green on
  origin/release, and prepare has deployed it to QA. The operator gate is at
  ship (after Avi's full-suite run on the frozen SHA), not here; at this
  scope assembled is a state the conductor flips, not a human approval.
- The PRIMARY reviewer merges into release (bin/release merge) on two approvals. Avi does not do the deep
  technical review himself, and the conductor no longer runs the merge for a
  freshly-reviewed task.
- The selection tiebreak is seeded per task (reproducible, not
  process-random) and logged (auditable), so reviews spread across the pool
  instead of always landing on the obvious domain owner. Because the seed is
  derived from the task identity (+ its exclusions), bin/reviewer-select's
  preview matches the pair recorded on the submitted→reviewed TaskEvent:
  the CLI and the in-app recorder roll identically and never disagree. (That
  match holds for the default QA owner; passing a custom --qa-owner changes
  the candidate pool + seed, so the preview is then advisory.)
- The reviewer pool excludes the QA owner of the assembled step: one soul cannot both review and
  QA the same change. (Steffon remains a valid reviewer for other PRs,
  especially DevOps/Platform ones.)
- The alex seat is the orchestrator who
  also holds the Documentation domain review seat, one identity (seeded in
  db/seeds/02_agents.rb, a launchable review agent). ReviewerSelector /
  bin/reviewer-select pick alex for docs-shaped PRs (the QA-owner and
  builder exclusions still apply, so Alex never reviews a change he built).
- […] release
  on the two approvals (primary + light), the consequence of "Review + QA, gate
  prod".
- Risk raises scrutiny (the PRIMARY review goes to Opus + full
  integration/security suite), not a second human. config/release_builder.yml
  gates only QA assembly autonomy; adding a separate human pre-merge gate for
  payment/solana would require a code/config change, not a doc-only knob.
- […] conductor-review and
  routes to a human Avi/Steffon session instead of approving the merge into
  release.

Resolved: release_train → release_slug (one field/model); feature PRs
merge into a persistent per-repo release branch, membership flipping at merge;
no per-task QA stage; Release is its own singleton model with states assembling → assembled → shipped, where the operator Makes the release (a page action on
the assembled RC). Decided 2026-06-22 (§1.2): review is delegated by Avi to
two seniors (not his solo gate); assembled is owned by Steffon (titled
Platform Engineer); and at shipped Avi runs the full e2e + highest tier
on the frozen ship SHA before the operator's Make-the-release action, so
the human gate sits after test confirmation, before the deploy.
RC assembly autonomy is the one evolving policy — so it lives in one
tunable config file, config/release_builder.yml, read by
Release::BuilderPolicy. Current policy:
- […] migration/payment/solana risk tag.
- […] (production_ship.operator_gated
  is true), regardless of QA autonomy.

Change thresholds in config/release_builder.yml, then run
bin/rails test test/models/release/builder_policy_test.rb. This policy only
decides QA assembly autonomy; bin/release ship remains separately
operator-gated.
The /deployments board and /stages page surface a short copy-paste command
per DevOps stage (source of truth: ApplicationHelper#devops_kickoffs). Pasted
into an agent session run from /Users/alex/projects, each kicks off that
stage's workflow. The feature-agent lane (designed → building → blocked → submitted) has none; the operator drives those hands-on. The DevOps lane maps
each command to a deterministic runbook. The same devops_kickoffs source also
carries one non-stage meta-trigger — Build and Deploy QA Release — rendered
as a prominent chip in the current-release section (#current-release); it is
the operator's main one-trigger that composes the per-stage commands below.
Build and Deploy QA Release (the operator's one-trigger QA-department run — stops at the ship gate)
This is Mr. McRitchie's single trigger for the whole QA department: hand it to an
agent — any model (Claude, Codex, …) — and it walks the persistent-release
model through review, merge, QA deploy, and ship-readiness. It does not ship to
production without Mr. McRitchie's confirmation; after the frozen-SHA ship
tests pass, stop and hand the operator the Run Deployment gate.
⚠️ THIS SOP STOPS BEFORE PROD. A green review + green QA + green ship-tests
produces an assembled RC and a clear `bin/release ship` handoff. Production
deploy remains the one operator gate across every release repo
(mcritchie-studio + satellites).

Cold-start framing: you are the CONDUCTOR (Deploy lane). When the operator
opens a fresh session with just `Build and Deploy QA Release`, follow this
SOP, not the feature-agent "⛔ STOP before writing code" flow in `CLAUDE.md`
(that is the Build lane). The conductor orchestrates the deploy run on
work that is already built: it delegates review (Avi confirms
product-acceptance + picks the pair; the PRIMARY reviewer does the deep
review, spawns the LIGHT, and runs `bin/release merge` for its task), then
assembles the QA candidate and deploys it. The conductor does not
review or run the per-task merge itself, and it does not create a task, take a
worktree, or write feature code. Run every command from
/Users/alex/projects/mcritchie-studio.

A non-interactive agent MUST pass
`--yes` only for approved non-production confirms. An
agent's shell has no TTY: stdin is EOF, which a confirm prompt reads as
"no". The consequence differs per command:

- `prepare` aborts without confirmation in a non-interactive shell. Always
  run `bin/release prepare --yes` for the approved QA deploy step.
- `ship` aborts loudly without confirmation, and that is intentional. Do not
  pass `--yes` unless Mr. McRitchie explicitly gives the production ship go in
  this session or an already-approved rollout prompt.
- `archive` can use `--yes` after the shipped release is verified and the
  operator has approved cleanup.
- `merge` does not prompt today; `--yes` is harmless future-proofing.

`--yes` bypasses the human confirm only; it never skips a test gate
(`avi_ship_gate` runs and can still abort the ship). `--prod` is already the
default (the board is prod), so don't add it redundantly.
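
The "stdin is EOF, which a confirm prompt reads as no" behaviour can be reproduced in a few lines of Ruby. This is a minimal sketch under assumptions, not the code in `bin/release`: the `confirmed?` helper, its parameters, and the `[y/N]` prompt text are hypothetical. What it relies on is standard Ruby: `gets` returns `nil` at end of input.

```ruby
require "stringio"

# Ask for confirmation. With no TTY, stdin is at EOF and `gets` returns nil,
# which is treated as "no" unless the caller passed --yes.
def confirmed?(question, yes_flag:, input: $stdin, output: $stdout)
  return true if yes_flag

  output.print("#{question} [y/N] ")
  answer = input.gets # nil at EOF (non-interactive shell)
  !answer.nil? && answer.strip.downcase.start_with?("y")
end

eof_stdin = StringIO.new("") # simulates an agent shell with no TTY
puts confirmed?("Deploy to QA?", yes_flag: false, input: eof_stdin) # prompt, then false
puts confirmed?("Deploy to QA?", yes_flag: true,  input: eof_stdin) # true, no prompt shown
```

In this sketch the flag only short-circuits the prompt; any test gate would run as separate code that the flag never touches, which matches the rule that `--yes` bypasses the human confirm only.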
The block-and-move rule (every review below). A reviewer block is not a
stop: bin/task block <slug> --kind rework --feedback "<one complete send-back>",
then move on — one block never holds back the PRs that passed (rework is a separate
feature-agent cycle). Surface every blocking event — a review send-back, a merge
conflict that forces a task back, or a ship-time test/preflight abort — as a little
❌ Block Resolved line (❌ Block Resolved — <slug>: <reason>; "resolved" =
recorded and routed back, not fixed). List them in the handoff only if at least
one blocking event happened — omit the section on a clean run.
1. bin/conductor plan: enumerate the Deploy
   queue by stage, name the active release candidate + its assembled members
   (the "active pending release" you fold in), and flag blocked + non-pipeline
   tasks (e.g. rolio: handle in its own app, never bin/release merge it).
2. Fetch the submitted queue (bin/task list --stage submitted) and review it in parallel: for each
   pipeline task run the nested cascade (the Review submitted PRs runbook
   below): spawn Avi as the thin gate (product-acceptance + bin/reviewer-select
   <task> --busy-auto, which records the primary+light pair), then spawn the
   PRIMARY reviewer; the PRIMARY spawns the LIGHT. Cap the fan-out at 5
   concurrent agents (the board DB's connection budget; see "Concurrency cap" in
   the operating model): launch in waves of ≤5, not the whole queue at once,
   and don't review one PR start to finish before the next. Both pass with no
   blocker → the PRIMARY drives the task to reviewed AND runs bin/release
   merge <task> (it owns the merge → assembled); any block → block-and-move.
3. The tasks the cascade approved were merged by their PRIMARY, who ran bin/release merge as the last act of owning its lane, so they are
   already assembled. Batch-merge any pre-existing reviewed backlog the
   cascade didn't own into the active candidate (or open it). Run the
   pre-merge checklist per PR: gh pr ready <n> (un-draft), gh pr edit <n> --base
   release (retarget; no-op when main == release), then batched
   bin/release merge <slug> [<slug> …] --yes (one dyno, N reviewed → assembled
   flips; the overlap warning is advisory). Deploy to QA: bin/release prepare
   --yes (records release.qa_url; a /up 000 right after a Heroku release is
   usually a still-booting dyno, so re-check curl -s -o /dev/null -w "%{http_code}"
   <url>/up). Bias to throughput: default to including, not deferring.
4. Re-check submitted. Re-fetch bin/task list --stage submitted and run
   the same cascade + block-and-move on anything new. Empty → say so, skip to step 6.
5. bin/release merge <slug> … --yes the
   round-2 reviewed tasks into the same candidate, then bin/release prepare
   --yes to refresh QA. Confirm the refreshed QA app is green.
6. Ship only on the operator's go: bin/release ship --by conductor; add --yes only after explicit approval in
   this session or an already-approved rollout prompt. It runs preflight
   (clean-main assertion) →
   avi_ship_gate (each app's test suite on the frozen ship SHA; aborts on any
   failure before anything irreversible) → producer-first gem publish → ff each
   repo's release → main → deploy + smoke /up → prod post_deploy_cmd →
   members shipped → auto-posted release notes. If a gate aborts, report it and
   don't force past.
7. After the shipped release is verified: bin/release archive --yes (shipped → archived +
   reclaim worktrees) and bin/release retro --yes (durable learnings doc).

The per-stage commands below are the building blocks this meta-trigger sequences:
One-time setup (per machine/clone). Run bin/release init once: it
creates the persistent release branch (= origin/main) on every gem + app repo
in config/release_repos.yml that doesn't already have one. Idempotent — a repo
that already has origin/release is skipped.
Review submitted PRs (submitted → reviewed → assembled)
Review is a nested chain (§1.2): Avi thin-delegates, then the PRIMARY
reviewer owns the lane — it spawns the LIGHT and runs the merge. Not a solo
avi pass, and not a flat peer spawn. For each submitted task
(bin/task list or the board):
1. Avi confirms product-acceptance (does the PR, base
   release, meet the task's acceptance criteria?) and picks the pair
   (step 2). He does not run the deep technical review.
2. Avi runs bin/reviewer-select <task>: it loads
   the app and scores the pool {shannon, carl, jasper, steffon, alex} by
   domain fit (the task's shape + repositories + risk tags vs each soul's
   domains) with a logged, seeded-per-task tiebreak, returns 1 PRIMARY + 1 LIGHT, and
   excludes the QA owner (Steffon, who QAs the assembled RC, so no self-gating),
   the builder (read from devops.built_by, auto-stamped on the build move
   from the assigned agent_slug, so a soul never reviews their own work with
   no manual flag), and any busy souls you name. alex is the
   orchestrator who also holds the launchable Documentation review seat, one
   identity. (--qa-owner SLUG excludes a different soul when Steffon isn't the
   one QAing this task; --builder SLUG overrides the recorded built_by;
   --busy a,b,c and/or --busy-auto (a board query of agents on
   stage=building tasks) drop agents mid-build/review elsewhere; the pool is
   never starved below a pair, the least-bad are kept back; --json for a
   machine-readable pick; --record writes the picked pair onto the task as a
   review intent so /deployments + the task timeline show the two seniors
   reviewing live (a green ticking timer) the moment Avi kicks off review,
   before →reviewed lands. With a custom --qa-owner or a non-empty --busy
   set the preview is advisory; --record it so the busy-aware pair is the
   one the timeline shows.)
3. The PRIMARY runs the deep review (on Opus for migration/payment/solana/auth): diff-vs-acceptance + code
   standards + code smell + scalability, plus confirming the shape's base
   tiers are green. It spawns the LIGHT reviewer as its OWN sub-agent for a
   focused pass, so the primary already holds full context when the light's verdict
   returns. Honor the ≤5-concurrent cap (operating model): across all PRs in
   flight, keep at most 5 review agents running at once; a queue wider than ~2 PRs
   reviews in waves of ≤5, never the whole batch at once.
4. On two approvals with no blocker, the PRIMARY drives the task to reviewed (bin/task move <task>
   reviewed) AND runs bin/release merge <task> to land its PR in release
   (membership flips reviewed → assembled); the conductor no longer runs this
   for a freshly-reviewed task. Any reviewer blocks → bin/task block <task>
   --kind rework --feedback "…" (one complete send-back). Bias to action: two
   green approvals = go (release reverts cleanly).

Agentic intent: the live "who's on it now". Each event carries the agent
that STARTED it, not only the one that completed it, so /deployments and the
task's consolidated Stage Timeline show who's working right now with a
green ticking timer — the Deploy mirror of the build lane's live counter. These
are append-only TaskEvents of kind: intent (completed transitions stay
kind: transition, and an intent never enters the duration spine); an intent is
"open" only inside the source-stage cycle where it was recorded. A later
transition into its target stage supersedes it, and leaving the source stage
(for example submitted → blocked) closes it even if the target never landed.
If QA blocks a PR for rework and the feature agent rebuilds/resubmits it, Avi can
record a fresh →reviewed intent for the second review round. Build-lane intent = the task's
Pokémon mascot (assigned at create). The
review pair is recorded by bin/reviewer-select <task> (step 2 — recording
is the DEFAULT now; pass --no-record/--dry for an advisory-only preview);
Steffon's QA and Avi's ship intents are auto-recorded by the deploy CLI —
bin/release prepare fires the assembled intent (actor: steffon) and
bin/release ship the shipped intent (actor: avi), both via
Release::Conductor.record_deploy_intents! over every release member (the Deploy
mirror of bin/reviewer-select's default review-intent write). So the conductor
no longer hand-runs them — the fix for the 2026-06-25 unfilled-ship-slot incident (a missed
manual bin/task intent --to shipped --actor avi left the ship crew slot blank
mid-deploy). The manual bin/task intent <task> --to assembled --actor steffon /
--to shipped --actor avi (or POST /api/v1/tasks/<slug>/intent) stays as the
fallback / one-off path. All of them
are append-only + idempotent — an identical open intent in the current
source-stage cycle is reused, and the call is a no-op once the target stage has
landed in that cycle. Actor-less
conductor moves on assembled/shipped still attribute to their role owners
(Steffon QAs assembled, Avi ships) so the Deploy crew never goes blank.
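
The open / supersede / close rules for intents can be sketched as a small in-memory model. This is an illustration under assumptions, not the real TaskEvent code: `IntentLog`, `record_intent`, `transition`, and `open_intents` are hypothetical names, and a "source-stage cycle" is modelled as a counter that bumps on every stage change.

```ruby
# Hypothetical in-memory model of append-only, idempotent intents.
# Class and method names are illustrative, not the board's real API.
class IntentLog
  Event = Struct.new(:kind, :from, :to, :actor, :cycle, keyword_init: true)

  attr_reader :events, :stage

  def initialize(stage)
    @stage  = stage
    @cycle  = 0   # bumps every time the task changes stage
    @events = []  # append-only: events are never edited or removed
  end

  # Record "actor has STARTED working toward `to`".
  def record_intent(to:, actor:)
    return nil if @stage == to # target stage already landed: no-op

    # An identical open intent in the current cycle is reused.
    existing = open_intents.find { |e| e.to == to && e.actor == actor }
    return existing if existing

    event = Event.new(kind: "intent", from: @stage, to: to, actor: actor, cycle: @cycle)
    @events << event
    event
  end

  # A completed stage change. Leaving the source stage closes every intent
  # recorded in that cycle, whether or not its target stage landed.
  def transition(to:, actor: nil)
    @events << Event.new(kind: "transition", from: @stage, to: to, actor: actor, cycle: @cycle)
    @stage = to
    @cycle += 1
  end

  # Intents are open only inside the source-stage cycle that recorded them.
  def open_intents
    @events.select { |e| e.kind == "intent" && e.cycle == @cycle }
  end
end

log = IntentLog.new("submitted")
log.record_intent(to: "reviewed", actor: "reviewer-a")
log.record_intent(to: "reviewed", actor: "reviewer-a") # reused, not duplicated
log.transition(to: "blocked")                          # closes the open intent
```

Keeping intents and transitions in one append-only list is what lets a timeline show "who is on it now" without ever rewriting history; only the open/closed interpretation changes as the task moves.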
Prepare release (reviewed → assembled — an RC for QA)
Two deterministic steps:
release (BATCHED). Run bin/release merge
<task-slug> [<task-slug> …] --yes [--prod] — it accepts one OR many
slugs (run the pre-merge checklist first — gh pr ready <n> to un-draft,
gh pr edit <n> --base release to retarget). It resolves every task's PR in
one board read, verifies each base is release, gh pr merges each, then
Release::Conductor.adopt!s all of them in a SINGLE heroku run (one dyno
spin-up, N reviewed → assembled flips) — opening/reopening the singleton
release as needed. A merge conflict surfaces here (resolve on GitHub or
block the task for rework); release is never force-pushed. Gem PRs merge into
their own repo's release like any other.
heroku run adopt per PR — three in a row blew the 2-min tool timeout, and a
mid-loop timeout left a PR merged but its task stuck reviewed (a
half-state). Batching collapses the adopts to one dyno; the adopt also runs
in an ensure, so every PR that did merge is still adopted even if a
later gh pr merge aborts — the half-state can't recur.
gh pr view <n> --json files), then prints the
pairwise file collisions, a suggested merge order
(smallest-footprint first), and which PRs will likely need a post-merge
rebase (they touch a file an earlier same-repo PR already merged). It
never blocks — merges proceed in the order given; it just surfaces the
"siblings all touched task.rb" rework before it happens. Pure logic:
Release::MergePlan.compute.
bin/release prepare --yes [--task SLUG ...]
[--slug rel-…] [--prod] (Release::Conductor.prepare!): finds the active
release, auto-records the Steffon → assembled QA intent for every member
(Release::Conductor.record_deploy_intents!(r, to_stage: "assembled", actor:
"steffon"), append-only + idempotent — a no-op for any member already past
assembled) so /deployments shows Steffon QA-ing the RC live with no hand-run
bin/task intent, runs a per-app merge-forward guard (keeps each repo's
release ahead of main), then bin/qa-server deploy <qa_app> origin/release
for each app member — gem members are skipped (no app artifact; they ride
the record and are QA'd via a consuming app). It records release.qa_url +
per-repo QA SHAs and leaves the RC assembled. --task is operator curation
(adopt the named tasks first); it does not auto-adopt every reviewed task.
Record ops run on the prod board by default (the board IS production) via
heroku run; --local opts into the stale local DB. Post-deploy hook: once each QA dyno boots (after
wait_for_boot, before the assemble flip), for every member that declares
devops.post_deploy_cmd it runs that command on the member's QA heroku app
via heroku run, records the [post-deploy] outcome on the task's
checks_run, and aborts prepare on a non-zero exit (so the RC stays
assembling, re-runnable). The {task, app, cmd} plan + the QA-vs-prod target
resolution are the unit-tested Release::PostDeploy.plan.
Run Deployment (assembled → shipped — promote the QA'd RC to prod)
Run bin/release ship [--by NAME] --prod — the one human gate; it confirms
before deploying. Preflight FIRST (before any fast-forward): ship asserts
every app checkout is on a clean main and aborts loudly — naming the
offending branch / dirty files — if any isn't. ship ff's each repo's main up to
the QA-frozen SHA, so a checkout a review agent left on a leftover pr-NNN branch
or with an uncommitted stale schema.rb would otherwise break the ff mid-ship
(after gems published + the operator gate — the worst time). The preflight catches
it up front, before anything irreversible. Pure decision:
Release::ShipSequence.preflight_offenders / .preflight_message. Live ship
crew: right after the one operator gate (so a declined ship never shows it), ship
auto-records the Avi → shipped intent for every member
(Release::Conductor.record_deploy_intents!(r, to_stage: "shipped", actor:
"avi")) so /deployments shows Avi shipping live — a green ticking timer — through
the whole deploy instead of an empty dashed ship slot until ship! lands (the
2026-06-25 incident). Append-only + idempotent (ship! supersedes it; a
partial-ship abort leaves it open — correct, Avi is still shipping — and a re-run
reuses it). Producer-first: before any app deploy, it publishes every
gem member to RubyGems in order — for each it prints the gem + target
version and asks Publish <repo> <version> to RubyGems? (approval-gated; honors
--yes / --dry-run), runs the gem's build (studio-engine: bin/release-check
--build; otherwise gem build <gemspec>), gem pushes the artifact, and tags
v<version> in the gem repo. A build/push failure aborts the ship before any
app deploys, so apps never deploy against an unpublished gem. Then for the apps
it fast-forwards each repo's main up to release (so release collapses into
main), pushes origin, deploys (git push heroku main; release phase runs
migrations), and smokes /up. After every app deploys + smokes (and before the
shipped record), the post-deploy hook runs each member's
devops.post_deploy_cmd on its production app via heroku run, records the
[post-deploy] outcome, and aborts ship on a non-zero exit — the abort
lands before ship!, so the release stays assembled (recoverable) and a re-run
resumes (the command is expected idempotent). On success it stamps deployed_sha,
flips the RC + its members to shipped (Release::Conductor.ship!), and
auto-posts release notes
(Release::Conductor.post_release_notes → the same Formatter/Discord path as
POST /api/v1/release_notes; non-fatal if the webhook is unset). After a ship,
each repo's release equals main and re-accumulates the next candidate. Run
ship from a primary checkout (not a worktree): the gem repos are resolved
as siblings at the projects root.
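
The ship preflight is described as a pure decision, which makes it easy to sketch. The version below is an assumption about its shape, not the real `Release::ShipSequence` code: the `Checkout` struct, its fields, and the module name `ShipPreflightSketch` are hypothetical.

```ruby
# Sketch of a pure preflight decision. The input shape (repo, branch,
# dirty_files) is assumed; the real Release::ShipSequence API may differ.
module ShipPreflightSketch
  Checkout = Struct.new(:repo, :branch, :dirty_files, keyword_init: true)

  # A checkout offends when it is off main or has uncommitted changes.
  def self.offenders(checkouts)
    checkouts.reject { |c| c.branch == "main" && c.dirty_files.empty? }
  end

  # Names the offending branch / dirty files so the abort is actionable.
  # Returns nil when every checkout is clean.
  def self.message(offenders)
    return nil if offenders.empty?

    lines = offenders.map do |c|
      problems = []
      problems << "on branch #{c.branch} (expected main)" unless c.branch == "main"
      problems << "dirty: #{c.dirty_files.join(', ')}" unless c.dirty_files.empty?
      "  #{c.repo}: #{problems.join('; ')}"
    end
    (["ship preflight failed:"] + lines).join("\n")
  end
end
```

Because nothing here touches git or Heroku, the decision can be unit-tested against plain structs, while the CLI stays responsible for collecting branch and dirty-file state before any fast-forward.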
Archive completed tasks (shipped → archived — the Deploy loop's conclusion)
Run bin/release archive [--dry-run] [--yes] [--prod] to close the loop. It
archives every shipped task that is not a member of Release.last_shipped
(shipped → archived), so the most recently shipped release stays on the board
as the read-only Last Release while older, superseded completed work is filed
away. The pure, unit-tested rule lives in
Release::Conductor.archive_completed! / .archivable_completed_slugs; the CLI
owns the board write plus the worktree teardown around it. After archiving it
reclaims the merged/shipped feature worktrees (bin/agent-worktree cleanup
--reclaim --yes). --dry-run previews the plan (archivable + kept) and the
reclaim list without mutating anything; --yes runs it hands-off (skips the
single confirm). Idempotent — a re-run finds nothing new to archive. Archiving
only flips a task's stage, never its release_slug, so the board's Last Release
section keeps linking to its members even after they're later archived,
preserving the release history. shipped is therefore no longer terminal —
the Deploy loop now closes at archived. The ledger commits itself: after the
reclaim appends to delete-later.md, archive commits that update to release
(best-effort, only when the ledger is the sole uncommitted change — pure guard
Release::ArtifactCommit), so it ships next round instead of piling up as
ship-preflight stash dirt the conductor has to park.
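
The archive rule itself is small enough to show in full. This is a minimal sketch assuming tasks expose `slug`, `stage`, and `release_slug`; `ArchiveTask`, `archivable_slugs`, and `archive!` are hypothetical stand-ins, not the real `Release::Conductor` methods.

```ruby
# Sketch of the archive rule: archive every shipped task that is NOT a member
# of the most recently shipped release. Names and fields are assumptions.
ArchiveTask = Struct.new(:slug, :stage, :release_slug, keyword_init: true)

def archivable_slugs(tasks, last_shipped_release_slug)
  tasks
    .select { |t| t.stage == "shipped" }
    .reject { |t| t.release_slug == last_shipped_release_slug }
    .map(&:slug)
end

# Archiving flips only the stage; release_slug is left untouched so the
# release history keeps linking to its members.
def archive!(tasks, slugs)
  tasks.each { |t| t.stage = "archived" if slugs.include?(t.slug) }
end
```

Running `archivable_slugs` a second time after `archive!` returns an empty list, which is the idempotence the CLI relies on for safe re-runs.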
Release retro (post-ship "review & learn" — completely NON-BLOCKING)
Run bin/release retro [release-slug] [--worked "…"] [--friction "…"] [--followup
"…"] [--file-tasks] [--yes] [--dry-run] after a ship to capture what the release
taught us. It defaults to the current / most-recently-shipped release, auto-gathers
the release record (member tasks + kinds, per-member submitted → shipped cycle
timing from TaskEvents, rework rounds = bounces into blocked, reviewers, and
recorded checks_run), prompts a few judgment questions (what worked / what caused
friction / follow-ups — --worked/--friction/--followup supply them from args,
--yes runs fully non-interactive), and writes a durable doc at
docs/agents/audits/retro-<slug>.md, then commits that doc to release
(best-effort, non-fatal, only when the doc is the sole uncommitted change —
Release::ArtifactCommit) so the generated retro ships next round rather than
becoming ship-preflight stash dirt. --file-tasks opens each follow-up via
bin/task create. The gather + render rule is the pure, unit-tested
Release::Retro (.gather / .render / .write_doc); the CLI reaches it through
the same read-only conductor runner and writes the returned markdown to the local
tree. It writes no agent-memory store — the doc (+ any filed tasks) is the only
record. Retro is decoupled from the pipeline by design: archive does not
depend on, trigger, or wait for it, so the loop closes whether or not a retro was
run. Unlike ship/archive, retro never deploys or mutates the board, so it does
not gate on --yes.
Both ride the same stage machine. They differ at entry and in test emphasis.
Routing lives in AGENTS.md (see §6) so an agent self-loads the right one.
test_plan = the shape's required tiers.
bin/dor-check) — --gate build before you start coding, --gate merge before handoff (§3.3).
checks_run, hand off with a handoff note, move to submitted.
Every bin/task move leaves a paper trail — for free. Each stage change
appends a TaskEvent capturing from → to, the timestamp, and the time spent
in the prior stage (the deterministic spine; it renders as the Stage Timeline
on the task page). You do nothing to get it. To also attribute model cost to a
transition, add the optional per-transition usage on the move:
bin/task move <task> submitted --model claude-opus-4-8 --tokens-in N --tokens-out N --cost D.
Usage is best-effort and opt-in; the spine is recorded regardless (and for
non-agent moves too). Details:
task-board-api.md.
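
How the duration spine falls out of plain stage moves can be sketched as follows. `StageTimeline` and its fields are hypothetical; the real TaskEvent schema is documented in task-board-api.md and may differ.

```ruby
# Sketch of the duration spine: each move appends an event with from → to,
# a timestamp, and the seconds spent in the prior stage. Names are hypothetical.
class StageTimeline
  Event = Struct.new(:from, :to, :at, :seconds_in_prior, :usage, keyword_init: true)

  attr_reader :events

  def initialize(stage, entered_at)
    @stage      = stage
    @entered_at = entered_at
    @events     = []
  end

  # `usage` (model, tokens, cost) is optional and best-effort;
  # the spine is recorded whether or not it is supplied.
  def move(to, at:, usage: nil)
    @events << Event.new(from: @stage, to: to, at: at,
                         seconds_in_prior: at - @entered_at, usage: usage)
    @stage      = to
    @entered_at = at
  end
end
```

The time in each stage is derived from two timestamps the move already has, so the caller supplies nothing extra to get the timeline.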
hotfix (production broken / funds at risk) vs normal.
hotfix may go straight to building and use an expedited review, but
never skips the regression test or the operator ship gate.
Why regression-test-first for bugs: it both proves the fix and permanently
pushes that class of bug down the pyramid, shrinking future PR-stage churn.
Not every app is a managed satellite. A standalone / client app (Rolio) uses
the studio's process — the task board, worktrees, the multi-agent merge
patterns, and the evergreen build conventions — but owns its own runtime and
deploy and is eventually handed off to a client. It rides the same Build
workflow (designed → building → submitted); the Deploy half is lighter:
main, not release — there is no persistent release branch
and no release train. bin/agent-worktree already falls back to origin/main
as the base for any repo without a release branch.
main by the app's owner; it is not assembled into a studio RC.
bin/dor-check ceremony. (Robust
error/API-failure logging is evergreen for both tiers — managed apps via
studio-engine's rescue_and_log/ErrorLog, standalone apps via plain
Rails.logger and/or their own tracker.)
Full tier decision + phased checklist:
new-app-onboarding-sop.md. Multi-agent build/merge
patterns (several agents scaffolding one app in parallel):
../modules/worktrees.md → Multi-Agent Safety &
Merge Patterns.
Shapes are deployment-agnostic. config/feature_shapes.yml and the
shape→tier contract (§3) classify the kind of change (ui-only, backend,
library, …), not the deploy tier — so they apply unchanged to a standalone app.
The shape still selects which tiers you write; the standalone tier only changes
where the PR lands and who ships it. No feature_shapes.yml change is needed.
Your insight: the pyramid must adapt to the nature of the feature, from one
general strategy, across all five repos. Three pieces: tier definitions (the
what), the shape→contract matrix (the adaptation), and the DoR gate (the
enforcement).
| Tier | General definition | Rails apps | studio-engine | solana-studio | turf-vault |
|---|---|---|---|---|---|
| Unit | Pure logic, no I/O | model/service/PORO/decoder specs | pure lib (ColorScale, Email…) | Borsh/keypair/tx builders | single instruction handler logic |
| Component | One behavior + its immediate collaborators, no full stack | request/controller specs + rendered partial + Alpine factory | UI primitive via a host harness | client method w/ stubbed RPC | instruction + its account constraints |
| Integration | Multiple objects across a boundary | request→DB→job, RPC-mocked Solana (FakeVault) | consumer-CI against both apps | client against test-validator | multi-instruction lifecycle (create→enter→settle) |
| E2E | Real browser / real chain | Playwright | (via consumers) | (via consumers) | devnet on-chain spec |
| Manual | Operator visual/UX acceptance | the release QA stop (eyeball the assembled RC, then Make the release) | — | — | contract transparency / /contract review |
Tiers are the what; the existing test lanes are the when/where.
Mapping: Unit+Component+Integration → pr_review_gate/local_proof (block
merge); E2E happy-path → local_proof, full E2E → nightly_deep; Manual →
qa_acceptance; post-deploy → production_smoke.
A feature's shape is recorded in devops.shape. It selects the minimum
tiers that must be green by the time the task is submitted for review:
| Shape | Example | Required tiers (DoR contract) |
|---|---|---|
| ui-only | "make the button blue" | Component (rendered partial / Alpine) + Manual at QA. Unit only if it adds logic. |
| ui+db | new form that persists | Unit (model/validation) + Component (request+view) + Integration (request→DB) + E2E happy path |
| backend | new job/service | Unit (service/PORO) + Integration (job + mocked I/O) |
| library | studio-engine change | Unit in engine + consumer-CI (component/integration in both apps) |
| onchain | new turf-vault instruction | Anchor unit + Anchor integration (lifecycle) + Ruby decoder unit + devnet E2E (nightly) |
| onchain-vertical | new workflow w/ wallet + DB + UI + program | all tiers + devnet E2E; almost always its own release |
The matrix is the single source of "how much testing is enough" — it removes
the per-task judgment call that currently lets thin PRs through.
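
Read as data, the matrix becomes a lookup plus a set difference. The sketch below is illustrative only: the real contract lives in config/feature_shapes.yml, whose schema is not shown here, and `REQUIRED_TIERS` / `missing_tiers` are hypothetical names covering just three of the shapes.

```ruby
# Illustrative subset of the shape → required-tier contract. Tier names follow
# the matrix above; the hash layout is an assumption, not the YAML schema.
REQUIRED_TIERS = {
  # Manual for ui-only happens later, at the QA stop, so it is not listed here.
  "ui-only" => %w[component],
  "ui+db"   => %w[unit component integration e2e],
  "backend" => %w[unit integration]
}.freeze

# Returns the tiers still missing before a task of this shape may be submitted.
def missing_tiers(shape, green_tiers)
  required = REQUIRED_TIERS.fetch(shape) { raise ArgumentError, "unknown shape: #{shape}" }
  required - green_tiers
end
```

For example, `missing_tiers("backend", %w[unit])` reports that the integration tier is still owed, which is the kind of mechanical verdict that replaces a per-task judgment call.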
A task may not advance submitted → reviewed unless, for its shape:
checks_run;
rubocop are certified green against the
exact code being shipped — not the touched-file subset. The shape's tier tags
prove the agent wrote unit/integration; they do not prove nothing else
broke. bin/full-suite-check <task> runs bin/rails test and bin/rubocop in
full and stamps fingerprint-bound [full-suite@<fp>] / [rubocop@<fp>]
checks_run lines; bin/dor-check re-grades them against the current code
fingerprint (a git tree hash — content-addressed, so it is stable across the
pre-commit→commit boundary and identical in a reviewer's fresh checkout), so a
stale (edited-since) or partial (one-lane / touched-files) record is
refused. Escape hatch — a record, exactly like post_deploy_cmd: none: a
reasoned [full-suite-bypass] <why> checks_run line passes the gate but is
flagged loudly in the verdict (use it for a pre-existing, unrelated red
tracked elsewhere — never to wave through your own break);
metadata["devops"] fields are populated (existing contract);
db/seeds,
db/migrate/), the task declares devops.post_deploy_cmd — the command
bin/release runs on the deployed app (QA on prepare, prod on ship) so a
seed/backfill isn't run by hand post-ship. Heroku's release phase auto-runs
db:migrate but not db:seed or a backfill rake; set post_deploy_cmd to
none to acknowledge a schema-only migration that needs no command.
post_deploy_cmd safety rule (both gates): bin/release runs the command
verbatim against PRODUCTION, so it must be narrow, prod-safe, and
idempotent. A declared post_deploy_cmd is rejected when it is a bare
full-suite seed — bin/rails db:seed, rails db:seed, bundle exec rails
db:seed, db:seed:replant, or rake db:seed. db/seeds.rb loads every
db/seeds/*.rb, so a bare seed would inject demo News/Content/Tasks into prod
and abort the release on the first non-idempotent seed file. Declare a
narrow command instead: a scoped single-file runner —
rails runner 'load Rails.root.join("db/seeds/NN_x.rb").to_s' — or a
dedicated idempotent rake task (e.g. bin/rails pokemon:seed). This is the
fix for a real near-miss: merge-docs-reviewer-into-alex shipped
post_deploy_cmd='bin/rails db:seed' and was caught only when QA aborted,
because reviewers read the code diff, not the deploy metadata.
This is deterministic — a bin/ gate (bin/dor-check <task>, default
--gate merge), not a judgment call. There is also a lighter --gate build
(spec-complete, no tiers) for the designed → building entry. The feature agent
runs it before handoff; the heartbeat agent re-runs --gate merge as gate zero
of review (the fingerprint-bound full-suite evidence is checkout-independent, so
gate-zero credits the same evidence the feature agent recorded). A failed DoR is
an immediate, cheap send-back that never consumes review-judgment tokens. This
is the structural fix for the review ping-pong: most "PR not ready" churn becomes
a pre-PR mechanical check.
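
The post_deploy_cmd safety rule from earlier in this section is a good example of a check that needs no judgment. A minimal sketch, assuming the rule can be expressed as one regex over the declared command; the pattern, the method name, and the verdict symbols are hypothetical and the real gate may be written differently.

```ruby
# Sketch of the bare-seed rejection. Matches the five forms listed in the
# safety rule (with or without bin/, bundle exec, rake, :replant).
BARE_SEED = /\A(?:bundle\s+exec\s+)?(?:bin\/)?(?:rails|rake)\s+db:seed(?::replant)?\z/

def post_deploy_cmd_verdict(cmd)
  normalized = cmd.to_s.strip
  return :missing            if normalized.empty?
  return :acknowledged_none  if normalized == "none"
  return :rejected_bare_seed if normalized.match?(BARE_SEED)

  :ok # e.g. a scoped runner or a dedicated idempotent rake task
end
```

A narrow command such as `bin/rails pokemon:seed` passes, while `bin/rails db:seed` is refused before any reviewer has to notice it in the deploy metadata.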
bin/dor-check itself stays a fast, deterministic verdict — it does not
run the suite; bin/full-suite-check is the (slower, run-once-before-handoff)
runner that produces the evidence (format + fingerprint live in
bin/lib/full_suite_gate.rb). It closes the retro gap where a build passed only
the files it touched while the full suite or rubocop broke post-merge. For
those who want the lanes wired locally, bin/full-suite-check --install-hook
installs an opt-in pre-push hook (off by default; runs the gate before each
push, blocks a red push; remove with --uninstall-hook) — pre-push, not
pre-commit, because a full suite on every commit is untenable. But the
authoritative gate is bin/dor-check validating the recorded evidence: the
hook is a convenience, and evidence on the task record survives a fresh checkout
where a local hook artifact would not.
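
Re-grading recorded evidence against the current fingerprint can be sketched like this. The tag formats are the ones named above; everything else (the method name, the verdict hash, the parsing) is an assumption rather than the code in bin/lib/full_suite_gate.rb.

```ruby
# Sketch of re-grading fingerprint-bound evidence from checks_run lines.
EVIDENCE_LANES = %w[full-suite rubocop].freeze

def grade_full_suite(checks_run, current_fp)
  # A reasoned bypass passes the gate but is flagged loudly.
  bypass = checks_run.find { |line| line.match?(/\A\[full-suite-bypass\]\s+\S/) }
  return { pass: true, flagged: true, reason: "bypass" } if bypass

  recorded = EVIDENCE_LANES.select { |lane| checks_run.any? { |l| l.start_with?("[#{lane}@") } }
  fresh    = EVIDENCE_LANES.select { |lane| checks_run.any? { |l| l.start_with?("[#{lane}@#{current_fp}]") } }

  reason =
    if fresh == EVIDENCE_LANES then "fresh"      # both lanes, current code
    elsif recorded.empty? then "missing"         # no evidence at all
    elsif recorded != EVIDENCE_LANES then "partial" # only one lane recorded
    else "stale"                                 # edited since the run
    end
  { pass: reason == "fresh", flagged: false, reason: reason }
end
```

Because the verdict depends only on the recorded lines and a content-addressed fingerprint, a reviewer's fresh checkout reaches the same answer as the feature agent's worktree without re-running the suite.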
| Tier | Author | When |
|---|---|---|
| Unit | Feature agent | During build, before first commit |
| Component | Feature agent | Before submitted |
| Integration | Feature agent | Before submitted (mandatory for any migration/solana/payment/auth risk tag) |
| E2E (happy path) | Feature agent | Before submitted for ui+db / vertical shapes |
| Full suite + rubocop | Feature agent | Before submitted — bin/full-suite-check <task> certifies the WHOLE suite + lint (not the touched-file subset); records fingerprint-bound evidence bin/dor-check re-grades |
| E2E (edge/regression) | QA lane (Avi/Steffon) | May add during review; becomes a follow-up task if large |
| Manual | Mr. McRitchie | At the release QA stop (this is the manual tier) |
Pruning is a recurring chore task owned by the QA/infra lane (Steffon),
on a monthly cadence, tracked like any other task.
quarantine lane + a follow-up task (never silently
skip); redundant → delete the higher-tier test when a lower tier now covers
it (push coverage down the pyramid); dead → remove tests for removed
behavior.
bin/devops-tests; daily steering still comes from task status.
Runs on the OpenClaw box every ~10 minutes. Builds directly on the
devops-cycle/qa-intake toolchain and the "Future Heartbeats" lease spec.

    # Workflow 1 — per task (review). Each submitted task, one safe step.
    for each task in {submitted}:
      acquire lease (claimed_by, claim_expires_at)   # resilience: reclaimable
      1. bin/dor-check --gate merge (gate-zero: metadata + tiers + FRESH full-suite/rubocop evidence) — fail ⇒ block(rework) + qa_feedback, release
      2. run pr_review_gate suite (base: unit/component) — fail ⇒ classify, block(rework), release
      3. Avi delegates: 2 seniors (domain fit + LOGGED tiebreak), 1 PRIMARY + 1 LIGHT — §1.2
         each: diff-vs-acceptance + standards/smell/scalability — changes ⇒ ONE complete qa_feedback + block
      4. on 2 approvals → reviewed ✅ — Discord: approved
    # Workflow 2 — the ONE active release (singleton).
    release.assembling:
      pick next reviewed member(s) honoring dependencies + lanes (§4.2)
      bin/release merge <task...>: overlap planner (warn) → gh pr merge each (base release) → adopt ALL in ONE heroku run (ensure) — conflict ⇒ block task (resolve on GitHub)
      member(s) → assembled; when desired members in + integration + e2e-smoke green on origin/release (Steffon):
        bin/release prepare → deploy origin/release → QA + Discord notes → release.assembled
      # full e2e + highest tier runs at ship, on the FROZEN ship SHA (Avi) — §1.2
    release.assembled:
      Avi: full e2e + highest tier on the FROZEN ship SHA   # §1.2 — closes "shipped ≠ tested"
      if operator_made_the_release: bin/release ship → PREFLIGHT (each app on clean main, else abort) → ff release → main, bin/deploy → production_smoke → notes → members shipped   # ONLY here
      else: no-op (HARD STOP — wait for the operator to Make the release)
    update last_heartbeat_at, current_command, blocked_reason; emit progress
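
The `acquire lease` step in the loop above can be sketched as a reclaimable claim. The field names come from the pseudocode; `LeasedTask`, `acquire_lease`, and the 15-minute `LEASE_TTL` are assumptions chosen only so the TTL outlasts one ~10-minute tick.

```ruby
# Sketch of a reclaimable lease using claimed_by / claim_expires_at.
LeasedTask = Struct.new(:slug, :claimed_by, :claim_expires_at, keyword_init: true)

LEASE_TTL = 15 * 60 # seconds; hypothetical value

# Returns true when `agent` now holds the lease. An expired lease is
# reclaimable, so a heartbeat that died mid-task never strands the task.
def acquire_lease(task, agent, now: Time.now)
  held_by_other = task.claimed_by && task.claimed_by != agent &&
                  task.claim_expires_at && task.claim_expires_at > now
  return false if held_by_other

  task.claimed_by       = agent
  task.claim_expires_at = now + LEASE_TTL
  true
end
```

This in-memory version is not atomic. A real implementation would need the claim to be a single conditional write (for example a guarded UPDATE) so two heartbeats cannot both win.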
Properties that give resilience + scale:
claimed_by, claim_expires_at, last_heartbeat_at) →
an interrupted task is reclaimable by the next heartbeat; this is exactly the
interruption-resilience you asked for, at the task level.
The heartbeat agent will not merge-race conflicting work:
bin/release merge a b c
prints, before merging, the files each PR shares with the others, a
suggested merge order (smallest-footprint first), and which PRs will need a
post-merge rebase — so siblings that all touched task.rb / a shared helper /
the docs don't conflict on release after passing review. Warning-only (it
never blocks); the conductor reads it to choose order / rebase the loser.
db/schema.rb or migrations → serialize
via the existing backend_migration advisory lock; second one holds with a
note.
release_train (existing lane); the agent promotes the
train in order, never a consumer ahead of its gem.
release_train deployed in order
(Squads program upgrade first, then app IDL re-pin via bin/deploy's
allow-list dance). The agent refuses to deploy two program upgrades
concurrently.
migration/solana/payment) still land in your one-click queue
with full context — the agent sequences, you approve.
Three message classes, deterministic templates with a small freeform
notes slot. Posted freely by the heartbeat agent.
| Class | Trigger | Shape (deterministic) |
|---|---|---|
| Heartbeat digest | every tick (or every N) | 🔄 DevOps tick HH:MM — N in review · M in QA · K awaiting approval. Blockers: … |
| Task event | stage advance / send-back | ✅ <title> merged → QA <url> · ⛔ <title> sent back: <reason> · 🟡 <title> QA-passed — approve to ship: <qa url> |
| Release notes | after prod deploy | existing POST /api/v1/release_notes (already standardized, grouped-by-app, task-linked) |
The 1000ft view: blockers + "awaiting approval" are the only two classes you
must read; the digest is ambient. Webhooks: reuse
DISCORD_RELEASE_NOTES_WEBHOOK_URL; add DISCORD_DEVOPS_PROGRESS_WEBHOOK_URL
for digests/events so release notes stay clean.
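
A deterministic template is just string interpolation over counts. A minimal sketch of the heartbeat digest, with the message text copied from the table above; the method name, keyword arguments, and the "none" placeholder for an empty blocker list are assumptions.

```ruby
# Sketch of the heartbeat-digest template. Only the wording comes from the
# message-class table; the signature is hypothetical.
def heartbeat_digest(at:, in_review:, in_qa:, awaiting_approval:, blockers: [])
  blocker_text = blockers.empty? ? "none" : blockers.join(", ")
  "🔄 DevOps tick #{at.strftime('%H:%M')} — #{in_review} in review · " \
    "#{in_qa} in QA · #{awaiting_approval} awaiting approval. Blockers: #{blocker_text}"
end
```

Posting the result is a separate concern: the caller would send it to the progress webhook, keeping the formatter pure and testable.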
Add a routing block to AGENTS.md so a fresh agent self-selects its SOP:
## DevOps Routing
Before implementing, identify your role and read the matching section of
docs/agents/system/devops-cycle-design.md:
- Handling a FEATURE → § Feature SOP. Classify the feature SHAPE and load its
test contract before writing code. Build: designed → building → submitted.
- Handling a BUG → § Bug SOP. Write the failing regression test first.
- Running the airgapped/QA cycle → § Heartbeat agent. One safe step per task;
review moves submitted → reviewed or blocked; never ship a release without the
operator OK.
Everything else the agent needs already loads via the existing Start Here
table. No per-session explanation from you.
Compartmentalize tokens: deterministic scripts carry the 80%; escalate to a
capable model only for genuine review judgment, and only to Opus for high-risk
surfaces.
| Step | Nature | Engine | Model |
|---|---|---|---|
| DoR gate, metadata presence | deterministic | bin/dor-check | none |
| Run test suites | deterministic | CI / bin/devops-tests | none |
| Conflict / lane check | deterministic | bin/ + advisory locks | none |
| Classify a check failure (real / flaky / stale) | light judgment | small model | Haiku |
| QA acceptance evaluation | suite + light judgment | suite + small model | Haiku |
| PR diff vs acceptance review | judgment | capable model | Sonnet, Opus if solana/payment/migration/auth |
| Merge decision | rules-gated judgment | rules + model | Sonnet |
| QA deploy / prod deploy | deterministic | bin/qa-server / bin/deploy | none |
| Release-notes formatting | deterministic | POST /api/v1/release_notes | none |
| Release-notes highlights prose | light judgment | small model | Haiku |
| Discord digest / event messages | deterministic templates | script | none |
| Production approval | human | — | Mr. McRitchie |
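
The PR-review row of this table reduces to a one-line rule over risk tags. A sketch, assuming risk tags arrive as an array of strings; `HIGH_RISK_TAGS` and `review_model` are hypothetical names, and the returned strings are labels rather than real model identifiers.

```ruby
# Sketch of the escalation rule for PR diff review: the capable default,
# escalating only when a high-risk surface is tagged.
HIGH_RISK_TAGS = %w[solana payment migration auth].freeze

def review_model(risk_tags)
  (risk_tags & HIGH_RISK_TAGS).empty? ? "sonnet" : "opus"
end
```

Keeping the rule in code rather than in a prompt means the escalation decision itself costs no model tokens.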
shape field vs inferring shape from risk_tags — add an explicit
devops.shape field, or derive it? (Recommend explicit; it's the contract key.)
hotfix severity that goes
straight to building and shortens review, still regression-tested +
ship-gated? (Recommend yes.)
DISCORD_DEVOPS_PROGRESS_WEBHOOK_URL, or
reuse the release channel? (Recommend separate.)
Done
bin/dor-check + the shape→contract matrix in config/feature_shapes.yml.
AGENTS.md / CLAUDE.md routing block.
bin/task / bin/dor-check / board, the data migration, blocked metadata
(blocked_from + block_kind), and the DoR-to-Build / DoR-to-Merge gates.
Release singleton model + release_slug / dependencies on Task + the
board's "current release" header.
release branch cutover: bin/release init|merge|prepare|ship
on the persistent per-repo release branch — membership flips at merge
(merge → gh pr merge + Release::Conductor.adopt!), prepare deploys
origin/release to QA, ship fast-forwards release → main (§1.1).
Next
bin/agent-worktree's automatic PR --base default from main to
release (branch-from + finish --pr base), so feature agents no longer pass
--base release by hand.
bin/devops-cycle (+ its snapshot fixture +
bin/devops-tests lane names) from the legacy stage names to the new ones.
config/devops_test_suites.yml.
DISCORD_DEVOPS_PROGRESS_WEBHOOK_URL.