Agents Builders

Prune cached commit observations

Archived · prune-cached-commit-observations

Created: Jun 26, 04:41
Started: Jun 26, 05:32
Completed: Jun 26, 06:33

DevOps handoff

Type: Chore
Shape: backend
Worktree Slug: prune-cached-commit-observations
Repositories: mcritchie-studio
Release Train: (none)
Branch: feat/prune-cached-commit-observations
QA URL: (none)
Production URL: (none)

destructive-data · backend

Acceptance Criteria

  • Delete a builder's observations after its full window is cached
  • Prune only when the cache reports complete
  • Preserve trailing 90-day baseline correctness
  • Log pruned observation count per builder

Expected Test Plan

  • unit: batch runner prunes after complete cache
  • integration: aggregate then prune keeps metrics

Checks Run

  • [unit] builder_history_batch_runner_test: prune-when-complete, retain-when-incomplete, watermark advances before delete
  • [integration] builder_history_prune_integration_test: real aggregator baseline+metrics, then prune
  • [integration] ai_builder_multiple_controller_test: dashboard partial-week protection survives prune via watermark
  • [full-suite@396447e10a512fa2f3975358299e83a4824bc4ea] bin/rails test green
  • [rubocop@396447e10a512fa2f3975358299e83a4824bc4ea] bin/rubocop clean

Agent Context

Root-causes the 2026-06-25 prod outage: github_commit_observations grew unbounded (231K rows / 782MB, 639MB raw_payload TOAST) and tipped the essential-0 Postgres plan over its storage limit, which clamped DB connections.

Observations are a disposable staging area: BuilderWeeklyAggregator derives GithubBuilderCommitRangeCache (metrics + commit_shas) from them.

Fix: in BuilderHistoryBatchRunner#fetch_with_segment_cache, AFTER the final full-window aggregate! (batch_runner.rb:121), delete that builder's observations, gated on cache_summary[:complete].

This MUST NOT delete per segment: the aggregator needs a trailing 90-day baseline of raw observations (aggregator.rb:25,57) AND re-aggregates the whole window at the end, so per-segment deletion corrupts builder_multiple. Re-runs simply re-fetch from GitHub.
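A minimal sketch of the ordering this context describes, written as plain Ruby with an in-memory store instead of ActiveRecord so it runs on its own. `ObservationStore`, `aggregate`, `run_builder`, and the `complete:` flag are hypothetical stand-ins for GithubCommitObservation, BuilderWeeklyAggregator, BuilderHistoryBatchRunner#fetch_with_segment_cache, and cache_summary[:complete]. The metrics are reduced to two commit counts, an assumption made only to show which rows the trailing 90-day baseline and the final full-window pass still need.

```ruby
require "date"

# Hypothetical in-memory stand-in for github_commit_observations:
# builder name => list of commit dates (raw_payload is not modeled).
class ObservationStore
  def initialize
    @rows = Hash.new { |hash, builder| hash[builder] = [] }
  end

  def insert(builder, dates)
    @rows[builder].concat(dates)
  end

  def for_builder(builder)
    @rows[builder]
  end

  # Like a delete_all scoped to one builder; returns the number of rows removed.
  def delete_builder(builder)
    removed = @rows[builder].size
    @rows.delete(builder)
    removed
  end
end

BASELINE_DAYS = 90 # trailing baseline length assumed from the aggregator note

# Hypothetical aggregate: commits inside the window plus the trailing
# 90-day baseline before the window start. Both read raw observations.
def aggregate(store, builder, window)
  rows = store.for_builder(builder)
  baseline_start = window.first - BASELINE_DAYS
  {
    in_window: rows.count { |d| window.cover?(d) },
    baseline: rows.count { |d| d >= baseline_start && d < window.first }
  }
end

# Segments are [date_range, fetched_dates] pairs. Prunes once, after the
# final full-window aggregate, and only when `complete` is true.
# prune_per_segment: true reproduces the forbidden per-segment variant.
def run_builder(store, builder, segments, complete:, prune_per_segment: false, log: [])
  segment_metrics = segments.map do |range, dates|
    store.insert(builder, dates) # stands in for the GitHub fetch
    metrics = aggregate(store, builder, range)
    store.delete_builder(builder) if prune_per_segment
    metrics
  end

  full_window = segments.first[0].first..segments.last[0].last
  full = aggregate(store, builder, full_window) # final full-window aggregate

  pruned = nil
  if complete && !prune_per_segment
    pruned = store.delete_builder(builder)
    log << "builder=#{builder} pruned_observations=#{pruned}"
  end

  { segments: segment_metrics, full: full, pruned: pruned }
end

# Hypothetical sample data: two segments covering Jan 1 to Jun 25, 2026.
SEGMENTS = [
  [Date.new(2026, 1, 1)..Date.new(2026, 3, 31), [Date.new(2026, 1, 5), Date.new(2026, 2, 10)]],
  [Date.new(2026, 4, 1)..Date.new(2026, 6, 25), [Date.new(2026, 4, 2), Date.new(2026, 6, 1), Date.new(2026, 6, 20)]]
].freeze
```

With `complete: true`, segment two's baseline sees the two Q1 commits and the full window sees all five rows before they are pruned. Running the same segments with `prune_per_segment: true` shows the corruption the note warns about: segment two's baseline drops to 0 and the final full-window aggregate sees no rows. With `complete: false`, all five rows are retained.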

Stage Timeline

Who handled each stage, how long it took (measured), and the model, tokens, and cost reported (best-effort), plus who is on it right now. An empty or missing field means the agent didn't report that metric.

Sizing: Avi (PO) SMALL · Dev SMALL · Actual XL (≠ forecast)
  1. Created → Designed
    Marowak · via api
    Completed Jun 26, 04:41
  2. Designed → Building
    Marowak · via cli
    Model: claude-opus-4-8 · Duration: under a minute
    Started Jun 26, 04:41 · Completed Jun 26, 04:42
  3. Building → Submitted
    Marowak · via cli
    Model: claude-opus-4-8 · Duration: 13 minutes · Tokens: 10,366,418 · Cost: ~$8.09
    Started Jun 26, 04:42 · Completed Jun 26, 04:55
  4. Submitted → Blocked
    via api
    Duration: 19 minutes
    Started Jun 26, 04:55 · Completed Jun 26, 05:14
  5. Blocked → Building
    Marowak · via cli
    Model: claude-opus-4-8 · Duration: 18 minutes · Tokens: 1,098,432 · Cost: ~$1.63
    Started Jun 26, 05:14 · Completed Jun 26, 05:32
  6. Building → Submitted
    Marowak · via cli
    Model: claude-opus-4-8 · Duration: 14 minutes · Tokens: 19,676,610 · Cost: ~$13.15
    Started Jun 26, 05:32 · Completed Jun 26, 05:46
  7. Submitted → Reviewed
    Carl (primary), Jasper (light) · via cli
    Model: claude-opus-4-8 · Duration: 14 minutes · Tokens: 1,930,151 · Cost: ~$3.44
    Started Jun 26, 05:46 · Completed Jun 26, 06:00
  8. Reviewed → Assembled
    Steffon
    Model: claude-opus-4-8 · Duration: 1 minute · Tokens: 1,043,532 · Cost: ~$0.84
    Started Jun 26, 06:00 · Completed Jun 26, 06:01
  9. Assembled → Shipped
    Avi
    Model: claude-opus-4-8 · Duration: 32 minutes · Tokens: 11,407,316 · Cost: ~$11.13
    Started Jun 26, 06:01 · Completed Jun 26, 06:33
  10. Shipped → Archived
    Duration: about 12 hours
    Started Jun 26, 06:33 · Completed Jun 26, 18:43

Conversation

QA review feedback, agent handoffs, and follow-up notes for this task.

QA Feedback 1 day ago

Heavy review BLOCK (Carl): the premise 'observations read by nothing' is false. GithubCommitObservation rows ARE read: Admin::AiBuilderMultipleController derives @observed_through_date from GithubCommitObservation.maximum(:committed_at) (lines 12, 40), which drives boundary_week_partial? -> representative_metrics_week. Pruning nulls it, defeats the partial-week protection, and surfaces artificially low builder multiples as 'representative'. Diagnostics at :142/:148/:166/:172/:175 and backtest_csv_exporter.rb:155 also read observations.

FIX: re-source observed_through_date and the diagnostics from a non-pruned table (GithubBuilderCommitRangeCache / WeeklyMetric), OR make the boundary logic tolerate an empty observations table; either way, add a dashboard test against an empty observations table.

NOTE: this only stops future growth (skip_complete: true skips the 231k backlog), and delete_all needs a VACUUM to reclaim disk, so a one-time skip_complete: false sweep plus VACUUM is still required to get back under the storage cap.
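To illustrate the failure mode in this review, a hypothetical model of the partial-week check: `boundary_week_partial?` and `representative_metrics_week` below are simplified guesses at the controller's logic (Monday-start weeks, newest non-partial week wins), not its actual code. The key assumption is that a nil observed-through date, which is what `maximum(:committed_at)` returns on an empty table, makes the check report "not partial".

```ruby
require "date"

# Hypothetical model of the dashboard's partial-week check. A week is
# partial when observations end before its last day (weeks start Monday).
def boundary_week_partial?(week_start, observed_through)
  # Models the failure mode: with no observations, observed_through is nil
  # and the check reports "not partial", so the protection silently turns off.
  return false if observed_through.nil?

  observed_through < week_start + 6
end

# Hypothetical: the newest non-partial week becomes the representative week.
def representative_metrics_week(week_starts, observed_through)
  week_starts.sort.reverse.find { |w| !boundary_week_partial?(w, observed_through) }
end

WEEKS = [Date.new(2026, 6, 15), Date.new(2026, 6, 22)].freeze

# Observations through Thu Jun 25: the Jun 22 week is in progress and skipped.
representative_metrics_week(WEEKS, Date.new(2026, 6, 25)) # returns Date.new(2026, 6, 15)
# After pruning, maximum(:committed_at) is nil and the partial week wins.
representative_metrics_week(WEEKS, nil) # returns Date.new(2026, 6, 22)
```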

Handoff 1 day ago

Addressed Carl's heavy BLOCK: observation ROWS are read by the AI Builder dashboard (observed_through_date -> boundary_week_partial? -> representative week), not just the write-only raw_payload column.

A day-granular observed_through can't come from the week-granular caches (the in-progress week would look complete, which is the exact bug), so a durable GithubObservationWindow watermark was added, advanced by the batch runner BEFORE pruning. The dashboard prefers live observations and falls back to the watermark (identical while rows exist, preserved once they are pruned). Diagnostics and rollups are re-sourced from the caches; the backtest sample degrades to empty (documented).

New tests: watermark-advances-before-delete (unit) and dashboard-keeps-boundary-exclusion-after-prune (controller). Schema-only migration (github_observation_windows), post_deploy_cmd=none. Clean rebase onto origin/release; full suite and rubocop green at 396447e1. PR #223 force-pushed.
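A minimal sketch of the watermark hand-off, assuming a single global watermark and plain hashes in place of rows; `ObservationWindow`, `prune_builder!`, and `observed_through_date` are hypothetical names, not the PR's code. It shows the two properties this handoff relies on: the watermark is advanced from a builder's rows before those rows are deleted, and the dashboard reads live rows first and falls back to the watermark once they are gone.

```ruby
require "date"

# Hypothetical stand-in for the github_observation_windows table (one global row).
class ObservationWindow
  attr_reader :observed_through

  def initialize
    @observed_through = nil
  end

  # Moves forward only, so re-processing an older window never rewinds it.
  def advance!(date)
    @observed_through = [@observed_through, date].compact.max
  end
end

# Batch-runner side: record the builder's latest commit date, THEN delete its rows.
def prune_builder!(observations, window, builder)
  mine = observations.select { |row| row[:builder] == builder }
  latest = mine.map { |row| row[:committed_on] }.max
  window.advance!(latest) if latest
  observations.reject! { |row| row[:builder] == builder }
  mine.size
end

# Dashboard side: prefer live rows, fall back to the watermark once pruned.
def observed_through_date(observations, window)
  observations.map { |row| row[:committed_on] }.max || window.observed_through
end
```

Because `advance!` keeps the later of the stored and incoming dates, re-running an older window cannot rewind the watermark.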

Sealed-bid sizing


  • Alex (PM)
  • Avi (PO): SMALL
  • Dev: SMALL
  • Actual: XL