The full per-iteration journal and agent transcript for every rep are committed to the repository. No data is summarized away. Below is the index.
- bench/results.jsonl: one structured row per rep. Schema: model, rep, status, final_fitness, best_fitness, baseline_fitness, delta_pct, iterations, accepted, rejected, broken, broken_by_class, wall_clock_sec, total_cost_usd, total_tokens_in/out, best_lut4, best_ff, best_fmax_mhz, best_iterations, best_cycles, best_ipc_coremark.
- bench/leaderboard.csv: per-model aggregate (mean fitness, best fitness, broken counts).
- bench/LEADERBOARD.md: human-readable leaderboard with failure-mode breakdowns.

| Model | Rep | Status | Iters | Best fitness | Log | Transcript | Summary |
|---|---|---|---|---|---|---|---|
| gemini-3_1-pro | rep1 | done | 46 | 354.73 | log.jsonl | agent.log | summary.json |
| gemini-3_1-pro | rep2 | done | 46 | 339.62 | log.jsonl | agent.log | summary.json |
| gemini-3_1-pro | rep3 | done | 46 | 323.92 | log.jsonl | agent.log | summary.json |
| gpt-5_4_xhigh | rep1 | done | 46 | 496.11 | log.jsonl | agent.log | summary.json |
| gpt-5_4_xhigh | rep2 | done | 46 | 513.84 | log.jsonl | agent.log | summary.json |
| gpt-5_5_high | rep1 | done | 46 | 461.87 | log.jsonl | agent.log | summary.json |
| gpt-5_5_high | rep2 | done | 46 | 420.61 | log.jsonl | agent.log | summary.json |
| gpt-5_5_high | rep3 | done | 46 | 408.01 | log.jsonl | agent.log | summary.json |
| gpt-5_5_medium | rep1 | done | 46 | 431.58 | log.jsonl | agent.log | summary.json |
| gpt-5_5_medium | rep2 | done | 46 | 407.55 | log.jsonl | agent.log | summary.json |
| gpt-5_5_medium | rep3 | done | 46 | 431.24 | log.jsonl | agent.log | summary.json |
| gpt-5_5_xhigh | rep1 | done | 46 | 397.83 | log.jsonl | agent.log | summary.json |
| gpt-5_5_xhigh | rep2 | done | 46 | 525.04 | log.jsonl | agent.log | summary.json |
| gpt-5_5_xhigh | rep3 | done | 46 | 482.03 | log.jsonl | agent.log | summary.json |
| kimi-k2_6 | rep1 | done | 46 | 347.76 | log.jsonl | agent.log | summary.json |
| kimi-k2_6 | rep2 | done | 46 | 331.22 | log.jsonl | agent.log | summary.json |
| kimi-k2_6 | rep3 | failed | 31 | 396.13 | log.jsonl | agent.log | summary.json |
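
This section does not show the script that builds leaderboard.csv. As a rough illustration of how the per-rep rows in bench/results.jsonl could be rolled up per model, here is a minimal Python sketch. It assumes each row is a JSON object carrying the model, status, best_fitness and broken fields from the schema above, and that broken is an integer count. The function names load_results and aggregate are hypothetical, and averaging best_fitness may not match whichever quantity the leaderboard's "mean fitness" column actually uses.

```python
import json
from collections import defaultdict
from statistics import mean


def load_results(path):
    """Read a JSONL file: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def aggregate(rows):
    """Group per-rep rows by model and compute simple summary statistics.

    Assumption: each row has the `model`, `status`, `best_fitness` and
    `broken` fields from the results.jsonl schema, with `broken` an int count.
    """
    by_model = defaultdict(list)
    for row in rows:
        by_model[row["model"]].append(row)

    summary = {}
    for model, reps in sorted(by_model.items()):
        summary[model] = {
            "reps": len(reps),
            # Any status other than "done" (e.g. "failed") counts as a failure.
            "failed": sum(1 for r in reps if r["status"] != "done"),
            "mean_best_fitness": mean(r["best_fitness"] for r in reps),
            "max_best_fitness": max(r["best_fitness"] for r in reps),
            "broken_total": sum(r["broken"] for r in reps),
        }
    return summary
```

Calling aggregate(load_results("bench/results.jsonl")) returns one dictionary per model, keyed by model name. Failed reps (such as kimi-k2_6 rep3) are kept in the averages in this sketch; excluding them is a one-line filter on status.
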

Each log.jsonl has one row per iteration: hypothesis ID, title, outcome (improvement / regression / broken), fitness, delta vs. baseline, LUT4, FF, Fmax, IPC, cycles, error class (if broken), and timestamp.
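
A similar sketch for a single per-iteration log counts outcomes and records the best fitness seen so far after each iteration. The key names outcome and fitness, the assumption that higher fitness is better, and the assumption that broken iterations carry no usable fitness are all guesses made for illustration; summarize_log is a hypothetical helper, not part of the repository.

```python
import json
from collections import Counter


def summarize_log(lines):
    """Tally iteration outcomes and track best-so-far fitness.

    `lines` is any iterable of JSONL strings (e.g. an open log.jsonl file).
    Assumptions: keys are named "outcome" and "fitness", higher fitness is
    better, and broken iterations are skipped when updating the best value.
    """
    outcomes = Counter()
    best = None
    trajectory = []  # best-so-far fitness after each iteration
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        entry = json.loads(line)
        outcomes[entry["outcome"]] += 1
        fitness = entry.get("fitness")
        if entry["outcome"] != "broken" and fitness is not None:
            if best is None or fitness > best:
                best = fitness
        trajectory.append(best)
    return outcomes, trajectory
```

Passing an open log.jsonl file handle works directly, since iterating a text file yields one line at a time.
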
Each agent.log is the verbatim model transcript: every bash command, every file read, and every file write.