| 2026-05-03 | one-off | Data Health Dashboard v1 | 0 | 110 KB | Coverage grid (rank tier × source) with 4-state encoding + appendable scan_log + this wave history ledger. Unified inventory view across all pipelines. |
| 2026-04-27 → 2026-04-28 | one-off | April 27 discovery refresh | 181,055 | 1.8 GB | Bulk pass that doubled the roster from ~190K to 389K. Added cm_stats + profile metadata for 181,055 newly discovered artists. No daily history was pulled, so these artists are snapshot-only (the yellow band on the coverage grid). |
| 2026-04-23 → 2026-04-30 | one-off | Instagram daily history | 181,055 | 9.0 GB | Full IG history 2017–now for all 181,055 artists with IG presence per cmStats. Of these, 125,882 actually produced a parquet file (the rest had API gaps). Unlocks social-first vs streaming-first typology + IG-as-leading-indicator analysis. |
| 2026-04-15 → 2026-04-16 | incident | Disk-fill incident & recovery | 41 | — | macOS code_sign_clone bloat filled the root partition to 100%, causing 41 silent parquet corruptions and 13 errored entities. Documented in RESUME_AFTER_REBOOT.md; the affected entities were reset to pending and the run resumed. |
| 2026-04-11 → 2026-05-03 | one-off | Roster extension (5K–20K follower band) | 75,000 | 6.0 GB | Discovery + cmStats + metadata + full daily history for ~75K artists in the 5K–20K sp_followers band, filling the gap below Wave 1's 15.7K floor. This is the 'developing artists' negative-example cohort from the strategic risks doc. |
| 2026-04-01 → 2026-04-12 | one-off | Track full history (pop>=30/50) | 122,608 | 3.2 GB | Full 2017–now daily on 8 platforms for tracks with sp_popularity >= 30 (~91K initial, expanded to 122,608). Async client with platform pre-filter. |
| 2026-03-25 → 2026-04-15 | one-off | Wave 4 — Full artist history (priority sources) | 99,826 | 8.0 GB | Full daily 2017–now for ~100K artists × 5 priority sources. Interrupted by the disk-fill incident on Apr 15; recovered via the RESUME_AFTER_REBOOT.md playbook (41 silent corruptions, 13 errors reset). |
| 2026-03-16 → 2026-03-25 | one-off | Wave 3 — Track daily (730d lookback) | 188,194 | 2.2 GB | 188,194 tracks × 8 platforms × 730d. First use of async client (aiohttp + 4 req/sec sliding window) with track-level platform pre-filter (~47% call savings, avg 4.2/8 platforms per track). |
| 2026-03-14 → 2026-03-18 | one-off | Wave 3 — Playlist registry + daily snapshots | 116,859 | 493 MB | 100 editorial Spotify playlists; daily snapshots back to Jan 2023. 116,859 (playlist × date) snapshots. |
| 2026-03-09 → 2026-03-11 | one-off | DJ Universe — Chartmetric name search | 5,995 | 19 MB | SequenceMatcher + genre bonus matched 5,995 of 6,402 unmatched DJ names (93.6% hit rate). 4 parallel Sonnet agents vetted 199 suspect matches; 118 rejects applied to master. |
| 2026-03-03 → 2026-03-04 | one-off | Wave 2.7 — Artist profile metadata | 99,826 | 700 MB | 59-col extraction from /artist/:id (genres, career_stage, hometown, label, etc.). Filled the gap left by Wave 1, which collected cmStats but not profile metadata. |
| 2026-03-01 → 2026-03-03 | one-off | Wave 2.5 — Track universe + metadata | 188,194 | 650 MB | Track universe = union of chart-discovered + filter-discovered + per-artist catalog tracks. 188,194 tracks × 51-field cm_statistics snapshot. |
| 2026-03-01 → 2026-03-05 | one-off | Wave 2.6 — Artist 2yr daily lookback | 99,826 | 7.0 GB | 99,826 artists × 5 priority sources (Spotify, YouTube channel/artist, TikTok, Wikipedia) over Mar 2024–Mar 2026. cmStats source pre-filter saved ~30% of calls. |
| 2026-02-28 → 2026-03-02 | one-off | Wave 2.0 — Chart snapshots (7 series, 2017–2026) | 23,429 | 54 MB | Snapshot-based ingestion across 7 chart series back to 2017. ~270x fewer API calls than per-artist chart history. |
| 2026-02-27 → 2026-02-28 | one-off | Wave 1 — 100K artist roster + cmStats | 99,826 | 320 MB | Bulk pagination discovery yielded 99,826 artists. cmStats enrichment added 341 columns per artist (76 base + 130 weekly diff + 130 monthly diff + pct variants). |
| 2026-02-26 | one-off | Wave 0 — Seed artists | 20 | 10 MB | 20 hand-picked top artists (Bad Bunny, Drake, Taylor Swift…) ingested at full depth: metadata, daily metrics across all 12 sources, and track catalogs. Served as pipeline validation. |
| 2026-02-25 → 2026-03-08 | one-off | DJ Universe — Discovery + ID resolution | 7,627 | 30 MB | Editorial-seeded DJ/electronic artist discovery from 12 sources. 22K raw → 13.5K cleansed → 7,627 ID-matched (4,435 Chartmetric + 3,767 Resident Advisor profiles). |
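
The DJ Universe name-search entry describes matching with SequenceMatcher plus a genre bonus. As a rough illustration of that kind of scoring, here is a minimal sketch using Python's standard `difflib.SequenceMatcher`. The bonus size, the acceptance threshold, the genre set, and all function names are assumptions made for this example; the ledger does not state the values or code the pipeline actually used.

```python
from difflib import SequenceMatcher

# Hypothetical values: the ledger does not state the real bonus or threshold.
GENRE_BONUS = 0.05
MATCH_THRESHOLD = 0.90
ELECTRONIC_GENRES = {"house", "techno", "trance", "drum and bass", "electronic"}


def normalize(name):
    """Lowercase and collapse whitespace so trivial differences do not lower the score."""
    return " ".join(name.lower().split())


def match_score(query, candidate_name, candidate_genres):
    """Name similarity in [0, 1], plus a small bonus when the candidate has an electronic genre."""
    score = SequenceMatcher(None, normalize(query), normalize(candidate_name)).ratio()
    if ELECTRONIC_GENRES & {g.lower() for g in candidate_genres}:
        score += GENRE_BONUS
    return min(score, 1.0)  # cap so the bonus never pushes a score above 1.0


def best_match(query, candidates):
    """Return the best (name, score) at or above the threshold, or None.

    `candidates` is a list of (name, genres) pairs, e.g. rows from a search response.
    """
    scored = [(name, match_score(query, name, genres)) for name, genres in candidates]
    if not scored:
        return None
    name, score = max(scored, key=lambda pair: pair[1])
    return (name, score) if score >= MATCH_THRESHOLD else None
```

For example, `best_match("DJ Nova", [("dj  nova", ["techno"]), ("Nova Scotia Choir", ["choral"])])` selects the first candidate, while a query with no close candidate returns `None` and would stay in the unmatched pool. Matches that clear the threshold only narrowly are the natural candidates for a manual or agent-based vetting pass like the one the entry mentions.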