
Multi-repo conductor (c): multi-repo ship + gem auto-repin + partial-ship policy

Archived
task-a386407ba0a4

Created

Jun 22, 04:41

Started

Jun 22, 06:27

Completed

Jun 22, 13:55

DevOps handoff

Type

Feature

Shape

backend

Worktree Slug

multi-repo-conductor-c

Repositories

mcritchie-studio

Release Train

Branch

feat/multi-repo-ship

Local URL

QA URL

Production URL

release-pipeline prod-deploy gem-publish multi-repo

Acceptance Criteria

  • bin/release ship dispatches per-repo prod adapters from repo_plan: git_push_heroku (hub: ff main->release, push origin, push remote branch, smoke smoke_url) + repo_script (satellite: ff main->release + push origin, then run command+args e.g. bin/deploy --yes cwd=repo; repo owns its smoke/rollback). Gems published first.
  • Gem auto-repin (D1): after each gem publish, for each app member consuming it (GemfileRepin.references_branch?), rewrite Gemfile -> ~> x.y + bundle lock + commit on that consumer's release branch BEFORE it deploys; idempotent; retry bundle on RubyGems propagation lag; one pass per consumer after all gem publishes.
  • Partial-ship policy (decision #3): producer-first gems->hub->satellites; ABORT on first failure (never deploy a consumer against an unpublished gem); idempotent re-run (published gems skip, ff no-op, repin idempotent); print a 'what is already live' report on abort.
  • BLOCKING (Avi): optional registry per-repo test_cmd run on the MERGED release branch before the irreversible prod deploy (hub defaults to bin/rails test; sh chdir=repo_path; scoped abort; skipped when unset/dry-run).
  • Fold phase (b) nits: drop dead QA_URL const; abort on git -C checkout failure; mirror PROJECTS_DIR env override; tighten dry-run --task abort copy.
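The adapter dispatch in the first criterion can be sketched as a pure function that turns one repo's prod_deploy entry into an ordered step list. This is a hedged illustration only: the module name AdapterPlan, the method steps_for, and the hash keys "remote" and "branch" are assumptions, not the actual bin/release internals; "strategy", "command", "args", and "smoke_url" follow the wording above.

```ruby
# Hypothetical sketch: AdapterPlan and steps_for are illustrative names,
# and the "remote"/"branch" keys are assumed, not taken from the real repo_plan.
module AdapterPlan
  # Returns the ordered, human-readable steps for one repo's prod deploy.
  # `prod_deploy` is assumed to be a Hash from that repo's repo_plan entry.
  def self.steps_for(prod_deploy)
    strategy = prod_deploy.fetch("strategy")
    case strategy
    when "git_push_heroku"
      [
        "ff main to release",
        "push origin main",
        "push #{prod_deploy.fetch('remote')} #{prod_deploy.fetch('branch')}",
        "smoke #{prod_deploy.fetch('smoke_url')}"
      ]
    when "repo_script"
      cmd = [prod_deploy.fetch("command"), *prod_deploy.fetch("args", [])].join(" ")
      [
        "ff main to release",
        "push origin main",
        "run #{cmd} (cwd=repo; repo owns smoke and rollback)"
      ]
    else
      # Unknown strategies fail closed rather than silently skipping a deploy.
      raise ArgumentError, "unknown prod_deploy strategy: #{strategy.inspect}"
    end
  end
end
```

Keeping the selection pure like this is what would let the dispatch be unit-tested without touching git or Heroku; the shell side would then only execute the returned plan.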

Expected Test Plan

  • [unit] prod-adapter dispatch selection + auto-repin orchestration (extract testable helpers); GemfileRepin already covered
  • [integration]/[manual] ship --dry-run multi-repo walkthrough: gems publish -> repin consumers -> hub -> satellites, abort-on-failure + idempotent re-run

Checks Run

  • [unit] ship_sequence_test (strategy/hub-order/gems_to_repin/publish_needed?/yanked?) + repos test_cmd — 110 runs, 0 failures; zeitwerk clean; ruby -c OK
  • [integration] release_cli ship --dry-run multi-repo: gem publish/skip/yank-guard, repin pass, hub-first, both adapters, ff-to-frozen, partial-ship line
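Two of the pure decisions exercised by these checks (hub order and the per-repo test_cmd) might look roughly like the following. This is a sketch under assumptions: the module name ShipSequenceSketch and the group shape (a Hash with :name, :role, and optional :test_cmd) are hypothetical; only the hub default of bin/rails test and the skip-when-unset/dry-run rule come from the task text.

```ruby
# Hypothetical stand-in for the pure helpers; not the real Release::ShipSequence.
module ShipSequenceSketch
  HUB_DEFAULT_TEST_CMD = "bin/rails test"

  # Hub groups deploy first; satellites keep their original relative order.
  def self.ordered_app_groups(groups)
    hubs, satellites = groups.partition { |g| g[:role] == :hub }
    hubs + satellites
  end

  # Returns the pre-deploy gate command, or nil when the gate is skipped
  # (dry-run, or no test_cmd set on a satellite).
  def self.test_cmd_for(group, dry_run: false)
    return nil if dry_run

    group[:test_cmd] || (group[:role] == :hub ? HUB_DEFAULT_TEST_CMD : nil)
  end
end

groups = [
  { name: "satellite-a", role: :satellite },
  { name: "hub", role: :hub },
  { name: "satellite-b", role: :satellite }
]
ordered = ShipSequenceSketch.ordered_app_groups(groups).map { |g| g[:name] }
# ordered is ["hub", "satellite-a", "satellite-b"]
```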

Stage Timeline

Who handled each stage, the time it took (measured), and the model / tokens / cost reported (best-effort), plus who's on it right now. A blank value means the agent didn't report that metric.

No stage changes recorded yet.

Conversation

QA review feedback, agent handoffs, and follow-up notes for this task.

Comment 5 days ago

DESIGN LOCKED (Plan pass); implementation deferred.

SCOPE: entirely bin/release ship + new PURE Ruby (app/models/release/ship_sequence.rb) for testable decisions; record side (Release/Conductor/Repos/Ordering/GemfileRepin) needs NO behavior change (repo_plan already emits the per-repo structure; GemfileRepin already does the pure rewrite). + optional test_cmd in release_repos.yml + Release::Repos.test_cmd reader.

SHIP SEQUENCE (producer-first, decision #3): (1) publish ALL gems (reuse publish_gem_members + an idempotency pre-check); (2) auto-repin pass over consumers (after all gems live, before any app deploy); (3) per-repo app deploy loop, HUB FIRST then satellites (hub=SSO/board provider; consumers tolerate newer provider; supported skew), each: (i) test_cmd gate on the MERGED branch, (ii) prod adapter dispatch; (4) Release::Conductor.ship! + post_release_notes LAST (atomicity boundary).

ADAPTERS (dispatch on prod_deploy.strategy): git_push_heroku (hub: git -C path ff main<-release, push origin, push remote branch, smoke smoke_url/up, abort no rollback) | repo_script (satellite: git -C path ff main<-release + push ORIGIN, then sh(command,*args,chdir:path) e.g. bin/deploy --yes; repo owns smoke+rollback, hub must NOT re-smoke; turf/bin/deploy needs CLEAN TREE so repin must commit BEFORE deploy; remote hardcoded heroku-mainnet; one human confirm + --yes covers the train).

AUTO-REPIN (D1, build-time branch-ref convention; ship does rewrite-back): per consumer app group, per published gem, if GemfileRepin.references_branch? -> rewrite -> bundle lock --update <gem> (chdir, BOUNDED RETRY for RubyGems propagation) -> git add+commit on the consumer release branch. One pass per consumer after ALL gems. Idempotent (references_branch? false once pinned).

PARTIAL-SHIP (NOT atomic across RubyGems+N heroku): abort-on-first-failure; ship! flips record LAST so a partial ship stays 'assembled' (recoverable); IDEMPOTENT re-run = gem_already_published?(RubyGems query) skips publish + ff/push are no-ops + repin idempotent; rescue SystemExit -> print 'what is already live' progress report + failed repo + recovery cmd, re-exit nonzero.

TEST_CMD GATE (Avi BLOCKING): optional registry test_cmd, run sh(chdir:repo_path) on MERGED branch BEFORE that repo's prod deploy (after repin so it tests the pinned lock); hub defaults bin/rails test; satellites unset (their bin/deploy tests); scoped abort; skip unset/dry. (Caveat: setting it on a satellite double-runs tests; document, leave unset.)

NITS folded: drop dead QA_URL const; abort on git -C checkout/fetch failure (dirty-sibling wrong-branch footgun); mirror PROJECTS_DIR env in projects_root; fix dry-run --task abort copy.

TESTS: extract pure to ship_sequence.rb (strategy_handler, gems_to_repin, publish_needed?/repin_needed?, ordered_app_groups hub-first, Repos.test_cmd): unit; shell verified via ship --dry-run in release_cli_test.rb; GemfileRepin already covered.
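The PARTIAL-SHIP behaviour (abort on first failure, rescue SystemExit, print what is already live, re-exit nonzero) could be sketched as below. The method name run_ship, the [label, callable] step shape, and the report wording are assumptions made for illustration; the real bin/release code is not shown in this task.

```ruby
# Hypothetical sketch of abort-on-first-failure with a progress report.
# `steps` is an ordered list of [label, callable]; a callable signals failure
# by calling abort/exit, which raises SystemExit.
def run_ship(steps, out: $stderr)
  live = []
  steps.each do |label, action|
    begin
      action.call
      live << label
    rescue SystemExit => e
      out.puts "ABORTED at: #{label}"
      out.puts "Already live: #{live.empty? ? '(nothing)' : live.join(', ')}"
      out.puts "Recovery: fix #{label}, then re-run ship (completed steps are idempotent)"
      # Re-exit nonzero even if the failing step exited with status 0.
      exit(e.status.zero? ? 1 : e.status)
    end
  end
  live
end
```

Because the record flip would be the last step in such a list, any abort leaves it unexecuted, which matches the "stays 'assembled'" recovery story above.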

Comment 5 days ago

OPEN DECISIONS for the operator (settle before/at implementation):
  (1) idempotency detection mechanism (gem list -r -e -a vs RubyGems versions API) + YANKED-version handling (abort vs re-push).
  (2) per-satellite confirm for high-risk prod (turf-monster handles real Stripe funds) vs one-confirm-covers-train.
  (3) GAP: prod deploys against the published-VERSION pin while QA tested the branch-REF (same code, different lock resolution): accept, or have prepare do a provisional repin so QA tests the pinned state?
  (4) bundle-lock retry count/backoff + abort-vs-warn on persistent failure.
  (5) strict all-or-nothing ship! flip confirmed (re-run is idempotent).
  (6) hub-new/satellite-self-rolled-back resting state acceptable?
  (7) release branches are local-only until ship -> ship MUST run from the same checkout that ran prepare.
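For decision (1), the versions-API option could reduce to a pure check over the response body, sketched here. Assumptions: the payload is the JSON array returned by the RubyGems versions endpoint for one gem, with a "number" string per entry; publish_needed? is named after the helper listed in the design comment, but this body is illustrative and makes no attempt at yanked-version detection.

```ruby
require "json"

# Hypothetical sketch: decide whether a gem version still needs publishing.
# `payload` is assumed to be the JSON body of the RubyGems versions endpoint
# for one gem: an array of objects that each carry a "number" string.
def publish_needed?(payload, target_version)
  live = JSON.parse(payload).map { |entry| entry["number"] }
  !live.include?(target_version)
end

sample = '[{"number":"1.4.2"},{"number":"1.4.1"}]'
publish_needed?(sample, "1.4.2") # false: already live, so publish is skipped
publish_needed?(sample, "1.5.0") # true: not published yet
```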

Comment 5 days ago

OPEN DECISIONS RESOLVED (Mr. McRitchie): (1) Yanked target version -> ABORT 'version yanked, bump + re-run' (RubyGems forbids re-pushing a yanked number); idempotent SKIP only for a LIVE published version. (2) HANDS-OFF: ONE confirm authorizes the whole train incl. turf mainnet — NO per-satellite re-prompt (turf's own bin/deploy keeps its smoke+rollback safety net at the repo level). (4) bundle-lock: bounded retry+backoff for RubyGems propagation, abort on persistent failure. Satellite self-rollback resting state (hub-new/satellite-old) ACCEPTED as the safe skew direction. NOTE: this phase-(c) plan is REVISED by the release-branch model decision on the epic — re-plan before implementing.
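Resolution (4), bounded retry with backoff that aborts on persistent failure, might be sketched like this. The helper name with_bounded_retry, the default attempt count, and the doubling delay are assumptions; the comments above do not fix specific numbers.

```ruby
# Hypothetical helper: retries a block a bounded number of times with
# exponential backoff, then re-raises the last error (fail closed).
# `sleeper` is injectable so tests do not actually sleep.
def with_bounded_retry(attempts: 4, base_delay: 2.0, sleeper: ->(s) { sleep(s) })
  tries = 0
  begin
    tries += 1
    yield tries
  rescue StandardError => e
    raise e if tries >= attempts

    sleeper.call(base_delay * (2**(tries - 1)))
    retry
  end
end

# Possible use around the lock step (gem_name and repo_path are placeholders):
#   with_bounded_retry do
#     system("bundle", "lock", "--update", gem_name, chdir: repo_path) or
#       raise "bundle lock failed"
#   end
```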

Comment 5 days ago

PR2 build START (stacked on PR1 release-branch-cutover, branch feat/multi-repo-ship). Adjusted for the release-branch model: ship ff's main up to the QA-FROZEN SHA (release.metadata qa_shas recorded at prepare) per repo, NOT release HEAD; gem repos also have release branches; consumers auto-repinned on their release branch then ff'd. Otherwise per the locked phase-c design (gems->hub->satellites, publish+auto-repin+partial-ship+test_cmd gate, idempotent re-run, ship! flips LAST).
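The consumer auto-repin mentioned here (branch reference rewritten to a ~> x.y pin, idempotently) can be illustrated with a small stand-in. RepinSketch is hypothetical and is not the project's GemfileRepin; the single-line `gem ... branch:` Gemfile shape and the gem name "studio-core" in the usage are assumptions.

```ruby
# Hypothetical stand-in for the pure Gemfile rewrite; not the real GemfileRepin.
module RepinSketch
  # Matches a single-line `gem "name", ... branch: ...` declaration.
  def self.branch_line(gem_name)
    /^([ \t]*)gem\s+["']#{Regexp.escape(gem_name)}["'].*\bbranch:.*$/
  end

  def self.references_branch?(gemfile, gem_name)
    gemfile.match?(branch_line(gem_name))
  end

  # Rewrites the branch reference to a pessimistic pin on major.minor.
  # A Gemfile that is already pinned is returned unchanged (idempotent).
  def self.repin(gemfile, gem_name, version)
    major, minor = version.split(".")
    gemfile.gsub(branch_line(gem_name)) do
      "#{Regexp.last_match(1)}gem \"#{gem_name}\", \"~> #{major}.#{minor}\""
    end
  end
end

gemfile = <<~GEMFILE
  source "https://rubygems.org"
  gem "rails"
  gem "studio-core", git: "https://example.test/studio-core.git", branch: "release"
GEMFILE

pinned = RepinSketch.repin(gemfile, "studio-core", "1.4.2")
```

In this sketch a second call on the pinned Gemfile changes nothing, which is the property an idempotent re-run of ship would rely on.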

Comment 5 days ago

QA (Avi): APPROVE-WITH-NITS — irreversible-ordering crux SOUND (gems->consumers, hub->satellites, frozen qa_sha not release HEAD, record flips LAST, every failure fails closed). Nits ADDRESSED: (1) yank-detection was dead vs the real RubyGems versions API (excludes yanked) — removed yanked?/synthetic tests; yank safety now honestly delegated to gem push failing closed (rejects re-push of a yanked number, before any app deploy); (3) re-pin drift guard now fetches origin + asserts LOCAL release HEAD == frozen qa_sha; (4) push_origin_main aborts on failure. Deferred follow-up: gem frozen-SHA (prepare records qa_shas for apps only; gems fall back to origin/release HEAD) — symmetric fix in a later prepare change. 106/0; zeitwerk clean. Stacked on PR1 #87; retarget #89 to main after #87 merges.
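The drift guard from nit (3) could be expressed as a pure comparison once the two SHAs are in hand. This is a sketch under assumptions: DriftError and assert_no_drift! are hypothetical names, the SHAs are passed in rather than read from git so the check stays testable, and it is assumed the check runs before any repin commit lands on the branch.

```ruby
# Hypothetical sketch of the re-pin drift guard as a pure check.
DriftError = Class.new(StandardError)

# `local_head` would come from the local release branch after fetching origin;
# `frozen_qa_sha` is the SHA recorded at prepare. Both are plain strings here.
def assert_no_drift!(repo_name, local_head:, frozen_qa_sha:)
  return true if local_head == frozen_qa_sha

  raise DriftError,
        "#{repo_name}: release HEAD #{local_head[0, 7]} != frozen qa_sha #{frozen_qa_sha[0, 7]}"
end
```

Raising instead of warning keeps the guard consistent with the fail-closed stance noted in the review.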

Sealed-bid sizing


Alex (PM)

Avi (PO)

Dev

Actual