Reviewer-select weight followups

Archived
reviewer-select-weight-followups

Created

Jun 23, 12:43

Started

Jun 23, 21:57

Completed

Jun 24, 00:16

DevOps handoff

Type

Bug

Shape

backend

Worktree Slug

reviewer-select-weight-followups

Repositories

mcritchie-studio

Release Train

Branch

feat/reviewer-select-weight-followups

Local URL

QA URL

Production URL

tooling

Acceptance Criteria

  • review_weight stored numeric so heavy seat is weight-driven
  • CLI reviewer-select pick matches recorded reviewers on ties
  • regression test covers string-weight and tie divergence

Expected Test Plan

  • unit

Checks Run

  • [unit] reviewer_selector_test: weight-label heavy seat + seeded reproducibility + builder-in-seed tie (CLI==recorded)
  • [unit] agents_seed_test: senior review_weight asserts numeric 2.0 (was 'heavy')
  • [integration] task_test: submitted->reviewed recorded pair == bin/reviewer-select preview
  • [suite] full bin/rails test 1308 runs 0 failures 0 errors 4 skips; bin/rubocop 464 files no offenses

Agent Context

Two non-blocking observations from Carl's heavy review of PR #123 (wire-2-senior-review).

(1) PRE-EXISTING: prod seeds db/seeds/02_agents.rb:44,77,93,109,125 store review_weight as the STRING 'heavy'/'light'. ReviewerSelector#reviewer_weight calls raw.to_f, which returns 0.0 for every one of them, so the seeded weight label never drives the heavy seat in prod; fit breaks the tie instead (it happens to put the right domain owner heavy, so impact is low). The unit test reviewer_selector_test.rb:140 uses numeric weights (9/1), which masks this. Reconcile label<->numeric: either map heavy->2.0/light->1.0 or store numeric.

(2) bin/reviewer-select calls .decision, while the avatars recorder task.rb:418 independently calls ReviewerSelector.select. The CLI never writes Current.task_event_reviewers, so on a genuine fit+weight tie Avi could spawn one pair while the avatars record another. Mitigated by the CLI's 'advisory until PR opened' note; the tidy fix is for the CLI to surface/persist the pair for Avi to curate.
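
The to_f behavior described in (1) can be reproduced in plain Ruby. The sketch below is illustrative only and is not the ReviewerSelector code: normalize_review_weight and its label table are hypothetical names, and only the heavy->2.0/light->1.0 values come from the note above. It assumes a strict parse via Float(..., exception: false) is acceptable, so strings that are not a full number fall back to 0.0.

```ruby
# Illustrative sketch only; not the actual ReviewerSelector implementation.
# String#to_f returns 0.0 when the string has no leading number,
# so both seeded labels collapse to the same weight.
raw_heavy = "heavy".to_f # 0.0
raw_light = "light".to_f # 0.0
puts raw_heavy == raw_light

# Hypothetical helper: accepts a numeric, a known label, or a numeric string.
def normalize_review_weight(raw)
  # Assumed mapping, using the heavy->2.0 / light->1.0 values from the note.
  labels = { "heavy" => 2.0, "light" => 1.0 }

  return raw.to_f if raw.is_a?(Numeric)

  label = raw.to_s.strip.downcase
  return labels[label] if labels.key?(label)

  # Strict parse: Float returns nil here for anything that is not a full number.
  Float(label, exception: false) || 0.0
end

puts normalize_review_weight("heavy")
puts normalize_review_weight("light")
puts normalize_review_weight(9)
```

With a normalizer like this, a test that seeds string labels and a test that seeds numeric weights would exercise the same comparison, which is the gap the numeric-only unit test left open.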

Stage Timeline

Who handled each stage, the time it took (measured), and the model / tokens / cost reported (best-effort), plus who's on it right now. A metric with no value listed means the agent didn't report it.

  1. Created -> Designed
    Geodude
    Completed: Jun 23, 12:43 · 4 days ago
    api
  2. Designed -> Building
    Geodude
    Model: claude-opus-4-8
    Duration: about 8 hours
    Started: Jun 23, 12:43
    Completed: Jun 23, 21:04 · 4 days ago
    cli
  3. Building -> Submitted
    Geodude
    Model: claude-opus-4-8
    Duration: 5 minutes
    Tokens: 3,397,366
    Cost: ~$2.33
    Started: Jun 23, 21:04
    Completed: Jun 23, 21:09 · 4 days ago
    cli
  4. Submitted -> Blocked
    Duration: 8 minutes
    Started: Jun 23, 21:09
    Completed: Jun 23, 21:17 · 4 days ago
    api
  5. Blocked -> Building
    Geodude
    Model: claude-opus-4-8
    Duration: 40 minutes
    Started: Jun 23, 21:17
    Completed: Jun 23, 21:57 · 3 days ago
    cli
  6. Building -> Submitted
    Geodude
    Model: claude-opus-4-8
    Duration: 4 minutes
    Tokens: 28,678,973
    Cost: ~$19.75
    Started: Jun 23, 21:57
    Completed: Jun 23, 22:02 · 3 days ago
    cli
  7. Submitted -> Reviewed
    Carl primary
    Jasper light
    Model: claude-opus-4-8
    Duration: 4 minutes
    Tokens: 6,047,002
    Cost: ~$3.74
    Started: Jun 23, 22:02
    Completed: Jun 23, 22:06 · 3 days ago
    cli
  8. Reviewed -> Assembled
    Steffon
    Duration: under a minute
    Started: Jun 23, 22:06
    Completed: Jun 23, 22:06 · 3 days ago
  9. Assembled -> Shipped
    Avi
    Duration: about 2 hours
    Started: Jun 23, 22:06
    Completed: Jun 24, 00:16 · 3 days ago
  10. Shipped -> Archived
    Duration: about 3 hours
    Started: Jun 24, 00:16
    Completed: Jun 24, 03:44 · 3 days ago

Conversation

QA review feedback, agent handoffs, and follow-up notes for this task.

QA Feedback avi 4 days ago

2-senior review: carl (HEAVY) REQUEST_CHANGES; alex-docs (LIGHT) REQUEST_CHANGES. Fix before resubmit:

  (1) BLOCKER test/models/agents_seed_test.rb:62: the seed change review_weight 'heavy'->2.0 is NOT reflected here. This guard loads the seed file and asserts =="heavy" for all 5 seniors, so the FULL suite goes RED post-merge (you only ran the 2 touched files). Update the assertion to 2.0 (+ rename off 'heavy'); RUN THE FULL bin/rails test, not just touched files.
  (2) MEDIUM docs same-pass (alex-docs): behavior changed random->per-task-seeded/reproducible but docs weren't updated. Fix devops-cycle-design.md §1.2 (~line 275) to say the tiebreak is seeded per-task (reproducible), logged for audit, + add the 'bin/reviewer-select preview matches the recorded submitted->reviewed pick' guarantee; sweep remaining 'random tiebreak' wording (devops-cycle-design 247/426, avi/role.md 10/37, mission.md 72-73, bin/reviewer-select 14/138).
  (3) MINOR scope the 'preview matches recorded pick' claim to the DEFAULT qa_owner (a custom --qa-owner diverges pool+seed).
  (4) MERGE COORDINATION: this PR will REBASE onto exclude-builder-from-reviewers (#136, merging first); fold the EXCLUDED BUILDER into seed_for's key alongside qa_owner, else two passes excluding different builders share a seed but differ in pool -> divergence returns.
  (5) NIT WEIGHT_LABELS fallback treats '12abc' as 12.0.

NOTE: rework dispatched AFTER #136 merges so you can rebase onto it.
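
Item (4) can be pictured with a small Ruby sketch. Everything here is an assumption made for illustration: the real seed_for signature and key format are not shown in this task, pick_pair is a hypothetical stand-in for the tiebreak, and the candidate pool is made up. The idea is that every input that changes the pool also goes into the seed key, so two callers with the same inputs draw the same pair.

```ruby
require "digest"

# Hypothetical sketch; the real seed_for signature and key format are assumptions.
# Every input that alters the candidate pool is folded into the key.
def seed_for(task_id:, qa_owner:, excluded_builder: nil)
  key = [task_id, qa_owner, excluded_builder].map(&:to_s).join("|")
  # Take 64 bits of the digest as an integer seed.
  Digest::SHA256.hexdigest(key)[0, 16].to_i(16)
end

# Hypothetical tiebreak: sort first so input order cannot matter,
# then shuffle with an RNG seeded from the task.
def pick_pair(tied_candidates, seed)
  tied_candidates.sort.shuffle(random: Random.new(seed)).first(2)
end

pool = %w[alex-docs carl jasper steffon] # made-up pool of tied reviewers
seed = seed_for(task_id: 42, qa_owner: "avi", excluded_builder: "geodude")

preview  = pick_pair(pool, seed)         # what a CLI preview would draw
recorded = pick_pair(pool.reverse, seed) # what a recorder would draw later
puts preview == recorded # true
```

Under these assumptions, two passes that exclude different builders get different keys, so a shared seed over differing pools no longer occurs.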

QA Feedback avi 3 days ago

Rework RE-REVIEW PASS:

  carl (HEAVY) approve: agents_seed_test asserts 2.0 (full suite green); seed-key coordination verified (seed_for folds excluded_builder so .decision/.select agree post-exclusion; tie-with-builder-excluded test proves CLI preview==recorded); rebase kept BOTH #136 exclusion + #135 seed/weight intact; CI 1308/0 + rubocop 0.
  alex-docs (LIGHT) approve: 'random tiebreak' wording swept across devops-cycle-design/avi-role/mission/bin, preview guarantee scoped to default qa_owner, #136 §247 preserved.

DEFERRED nits: mission.md guarantee lacks the default-qa_owner caveat (overview altitude, ok); one code comment says 'random' loosely.

Sealed-bid sizing

Alex (PM)

Avi (PO)

Dev

Actual