Per-model detail

What each model actually did.

Below are the per-rep outcomes for every model run on HWE Bench so far, followed by the accepted-improvement hypotheses each rep produced, with fitness, LUT4 count, and Fmax for each. The hypothesis titles are verbatim, exactly as the agent wrote them.

gpt-5_5_xhigh

Best: 525.04 (+85.6% vs baseline) · Mean: 468.3 (+65.6% mean Δ) · Reps: 3/3 (completed / total)

per-rep detail
Rep   Status    Best    Δ%      LUT4   Fmax (MHz)  acc  brk  Wall
rep1  done      397.83  +40.7%  6.1k   182         5    2    7.0h
rep2  done      525.04  +85.6%  5.5k   220         5    3    6.4h
rep3  done      482.03  +70.4%  3.2k   216         7    1    4.9h

Broken classes (all reps combined): cosim_failed×5, formal_failed×1
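
The summary figures for this model can be reproduced from the per-rep rows. The following is a minimal sketch, assuming Best is the highest per-rep fitness and Mean is the arithmetic mean over the reps; the variable names are illustrative and not taken from the benchmark's own code.

```python
from statistics import mean

# Per-rep best fitness and delta vs baseline (%) for gpt-5_5_xhigh,
# copied from the per-rep rows above.
reps = {
    "rep1": (397.83, 40.7),
    "rep2": (525.04, 85.6),
    "rep3": (482.03, 70.4),
}

# Assumption: Best is the max over reps, Mean is the plain average.
best_fitness = max(f for f, _ in reps.values())
mean_fitness = mean(f for f, _ in reps.values())
mean_delta = mean(d for _, d in reps.values())

print(f"Best {best_fitness:.2f}")
print(f"Mean {mean_fitness:.1f} ({mean_delta:+.1f}% mean delta)")
```

Under these assumptions the sketch prints 525.04, 468.3, and +65.6%, which agrees with the figures reported for this model.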

rep1 — winning hypotheses

R1 · Decouple slow RV32M ops from EX
fitness 368.83 (+30.4%) · LUT4 6.0k · Fmax 166 MHz
R3 · Retiming slow M finalization
fitness 380.70 (+3.2%) · LUT4 6.0k · Fmax 171 MHz
R4 · Register forwarding selects
fitness 386.71 (+1.6%) · LUT4 6.0k · Fmax 174 MHz
R8 · Registered low-half MUL path
fitness 397.83 (+2.9%) · LUT4 6.1k · Fmax 182 MHz
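
The percentage next to each fitness value appears to be the gain over the previously accepted design, with the first entry measured against the baseline. That reading is inferred from the numbers rather than stated by the source. A short sketch that checks it against rep1 of gpt-5_5_xhigh:

```python
# Accepted-hypothesis fitness values for gpt-5_5_xhigh rep1, in order.
fitness = [368.83, 380.70, 386.71, 397.83]
# Step percentages as listed next to each hypothesis.
listed_steps = [30.4, 3.2, 1.6, 2.9]

# Each later step should match the ratio of consecutive fitness values.
derived = [round((b / a - 1) * 100, 1) for a, b in zip(fitness, fitness[1:])]
print(derived)  # [3.2, 1.6, 2.9]

# Compounding all listed steps should reproduce the rep's overall delta.
total = 1.0
for pct in listed_steps:
    total *= 1 + pct / 100
print(f"{(total - 1) * 100:+.1f}%")  # +40.7%
```

The compounded result matches the +40.7% reported for rep1, so the step percentages multiply rather than add.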

rep2 — winning hypotheses

R1 · Iterative divider off ALU critical path
fitness 400.55 (+41.6%) · LUT4 5.5k · Fmax 180 MHz
R6 · Prune dead pipeline control bits
fitness 427.56 (+6.7%) · LUT4 5.4k · Fmax 192 MHz
R7 · Valid-only pipeline payload resets
fitness 432.21 (+1.1%) · LUT4 5.4k · Fmax 194 MHz
R10 · Tiny ifetch replay predictor
fitness 525.04 (+21.5%) · LUT4 5.5k · Fmax 220 MHz

rep3 — winning hypotheses

R1 · Move DIV/REM to multicycle EX unit
fitness 350.21 (+23.8%) · LUT4 5.6k · Fmax 157 MHz
R2 · Share MUL hardware in ALU
fitness 353.49 (+0.9%) · LUT4 5.6k · Fmax 159 MHz
R4 · Hazard-only source-use interlock
fitness 381.04 (+7.8%) · LUT4 5.6k · Fmax 171 MHz
R6 · Register writeback payload in MEM/WB
fitness 406.96 (+6.8%) · LUT4 5.7k · Fmax 183 MHz
R9 · Remove regfile reset fanout
fitness 413.23 (+1.5%) · LUT4 3.2k · Fmax 186 MHz
R10 · Stage-local control bundles
fitness 482.03 (+16.6%) · LUT4 3.2k · Fmax 216 MHz
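
Each hypothesis entry follows the same two-line shape, so it can be turned into a record for further analysis. This is a rough sketch only: the `Hypothesis` class, the field names, and `parse_entry` are hypothetical helpers written for this page's text layout, not part of HWE Bench.

```python
import re
from dataclasses import dataclass


@dataclass
class Hypothesis:
    round_no: int
    title: str
    fitness: float
    step_pct: float   # gain listed in parentheses, in percent
    lut4: int         # approximate: the page rounds to 0.1k
    fmax_mhz: int


TITLE_RE = re.compile(r"R(\d+) · (.+)")
METRICS_RE = re.compile(
    r"fitness ([\d.]+) \(([+-][\d.]+)%\) · LUT4 ([\d.]+)k · Fmax (\d+) MHz"
)


def parse_entry(title_line: str, metrics_line: str) -> Hypothesis:
    """Parse one title line plus its metrics line into a record."""
    t = TITLE_RE.fullmatch(title_line.strip())
    m = METRICS_RE.fullmatch(metrics_line.strip())
    if t is None or m is None:
        raise ValueError("unrecognized hypothesis entry")
    return Hypothesis(
        round_no=int(t.group(1)),
        title=t.group(2),
        fitness=float(m.group(1)),
        step_pct=float(m.group(2)),
        lut4=round(float(m.group(3)) * 1000),
        fmax_mhz=int(m.group(4)),
    )


entry = parse_entry(
    "R10 · Stage-local control bundles",
    "fitness 482.03 (+16.6%) · LUT4 3.2k · Fmax 216 MHz",
)
print(entry)
```

The parser raises on any line that does not match, which is safer than silently skipping entries when the page layout changes.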

gpt-5_4_xhigh

Best: 513.84 (+81.7% vs baseline) · Mean: 505.0 (+78.5% mean Δ) · Reps: 2/2 (completed / total)

per-rep detail
Rep   Status    Best    Δ%      LUT4   Fmax (MHz)  acc  brk  Wall
rep1  done      496.11  +75.4%  3.2k   221         5    7    7.2h
rep2  done      513.84  +81.7%  10.1k  203         7    11   8.1h

Broken classes (all reps combined): cosim_failed×11, formal_failed×7

rep1 — winning hypotheses

R1 · Move DIV/REM Off The ALU Critical Path
fitness 343.51 (+21.5%) · LUT4 5.6k · Fmax 154 MHz
R2 · One-Deep Stalled-Store Retirement Slot
fitness 385.04 (+12.1%) · LUT4 5.6k · Fmax 172 MHz
R6 · Non-Aliasing Load Bypass Around Store Slot
fitness 439.32 (+14.1%) · LUT4 5.6k · Fmax 196 MHz
R10 · Reset-Light Write-First Register File
fitness 496.11 (+12.9%) · LUT4 3.2k · Fmax 221 MHz

rep2 — winning hypotheses

R2 · Registered I-Fetch Replay Predictor
fitness 316.05 (+11.8%) · LUT4 10.1k · Fmax 134 MHz
R3 · MEM-to-EX load bypass
fitness 332.31 (+5.1%) · LUT4 9.9k · Fmax 134 MHz
R4 · Shared signedness-selectable multiplier
fitness 334.91 (+0.8%) · LUT4 10.4k · Fmax 135 MHz
R5 · One-entry posted store buffer
fitness 377.60 (+12.8%) · LUT4 10.1k · Fmax 151 MHz
R6 · EX fast-path add/address bypass
fitness 391.32 (+3.6%) · LUT4 10.0k · Fmax 157 MHz
R8 · Resolve direct JAL in ID
fitness 513.84 (+31.3%) · LUT4 10.1k · Fmax 203 MHz

gpt-5_5_high

Best: 461.87 (+63.3% vs baseline) · Mean: 430.2 (+52.1% mean Δ) · Reps: 3/3 (completed / total)

per-rep detail
Rep   Status    Best    Δ%      LUT4   Fmax (MHz)  acc  brk  Wall
rep1  done      461.87  +63.3%  9.8k   187         7    3    4.7h
rep2  done      420.61  +48.7%  12.0k  178         8    4    11.9h
rep3  done      408.01  +44.3%  5.6k   176         4    2    5.6h

Broken classes (all reps combined): cosim_failed×5, formal_failed×3, implementation_compile_failed×1
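
The combined broken-class counts can be cross-checked against the per-rep brk column. A small sketch, assuming brk counts every broken attempt in a rep; the parsing is tailored to the "name×count" text on this page.

```python
# Text after "Broken classes (all reps combined):" for gpt-5_5_high.
summary = "cosim_failed×5, formal_failed×3, implementation_compile_failed×1"

# Split "name×count" items into a dict of failure class -> count.
counts = {}
for item in summary.split(", "):
    name, n = item.rsplit("×", 1)
    counts[name] = int(n)

# brk column for rep1, rep2, rep3 of the same model.
brk_per_rep = [3, 4, 2]

print(counts)
print(sum(counts.values()) == sum(brk_per_rep))  # True
```

Both totals come to 9 for this model, which supports reading brk as the number of broken attempts in each rep.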

rep1 — winning hypotheses

R1 · Static backward branch predictor
fitness 338.66 (+19.7%) · LUT4 10.0k · Fmax 144 MHz
R3 · Add MEM-to-EX load forwarding
fitness 355.27 (+4.9%) · LUT4 10.2k · Fmax 144 MHz
R8 · Isolate M-extension ALU mux
fitness 366.02 (+3.0%) · LUT4 10.1k · Fmax 148 MHz
R9 · Gate static branch target formation
fitness 420.37 (+14.8%) · LUT4 9.9k · Fmax 170 MHz
R12 · Factor forwarding matches
fitness 437.68 (+4.1%) · LUT4 10.1k · Fmax 177 MHz
R14 · Register memory request metadata
fitness 461.87 (+5.5%) · LUT4 9.8k · Fmax 187 MHz

rep2 — winning hypotheses

R1 · Add small BTB branch predictor
fitness 285.54 (+1.0%) · LUT4 12.1k · Fmax 121 MHz
R2 · Gate false load-use stalls
fitness 290.93 (+1.9%) · LUT4 12.4k · Fmax 123 MHz
R3 · Optimize load byte-lane formatter
fitness 305.22 (+4.9%) · LUT4 12.4k · Fmax 129 MHz
R4 · Consolidate multiply datapath
fitness 313.60 (+2.8%) · LUT4 12.6k · Fmax 133 MHz
R6 · Trim BTB tag compare
fitness 326.00 (+4.0%) · LUT4 12.0k · Fmax 138 MHz
R7 · Prune dead pipeline payload
fitness 341.75 (+4.8%) · LUT4 12.2k · Fmax 145 MHz
R11 · Bypass ALU for LSU addresses
fitness 420.61 (+23.1%) · LUT4 12.0k · Fmax 178 MHz

rep3 — winning hypotheses

R1 · Move DIV/REM off the ALU critical path
fitness 352.98 (+24.8%) · LUT4 5.5k · Fmax 159 MHz
R4 · Case-based MEM byte-lane muxes
fitness 380.43 (+7.8%) · LUT4 5.5k · Fmax 171 MHz
R6 · Lookahead hot-branch predictor
fitness 408.01 (+7.2%) · LUT4 5.6k · Fmax 176 MHz

gpt-5_5_medium

Best: 431.58 (+52.6% vs baseline) · Mean: 423.5 (+49.7% mean Δ) · Reps: 3/3 (completed / total)

per-rep detail
Rep   Status    Best    Δ%      LUT4   Fmax (MHz)  acc  brk  Wall
rep1  done      431.58  +52.6%  7.8k   201         7    5    5.8h
rep2  done      407.55  +44.1%  7.4k   187         5    9    6.6h
rep3  done      431.24  +52.5%  10.0k  194         4    9    7.2h

Broken classes (all reps combined): cosim_failed×12, formal_failed×11

rep1 — winning hypotheses

R1 · Remove regfile reset fanout
fitness 316.22 (+11.8%) · LUT4 8.0k · Fmax 142 MHz
R2 · Move M extension to multicycle unit
fitness 356.85 (+12.8%) · LUT4 7.5k · Fmax 167 MHz
R3 · Prune dead pipeline metadata
fitness 397.64 (+11.4%) · LUT4 7.7k · Fmax 186 MHz
R5 · Add posted store buffer
fitness 405.49 (+2.0%) · LUT4 7.4k · Fmax 188 MHz
R9 · Register final writeback data
fitness 422.61 (+4.2%) · LUT4 7.0k · Fmax 196 MHz
R12 · Add narrow forwarding sideband
fitness 431.58 (+2.1%) · LUT4 7.8k · Fmax 201 MHz

rep2 — winning hypotheses

R1 · Multicycle RV32M arithmetic unit
fitness 375.73 (+32.9%) · LUT4 9.7k · Fmax 172 MHz
R5 · Drop regfile reset fanout
fitness 382.99 (+1.9%) · LUT4 7.6k · Fmax 176 MHz
R8 · Share M-unit multiplier hardware
fitness 393.66 (+2.8%) · LUT4 7.5k · Fmax 181 MHz
R9 · Split RVFI shadow metadata from datapath
fitness 407.55 (+3.5%) · LUT4 7.4k · Fmax 187 MHz

rep3 — winning hypotheses

R1 · Gate and share M-extension ALU hardware
fitness 398.35 (+40.9%) · LUT4 9.8k · Fmax 179 MHz
R5 · Precompute PC targets in decode
fitness 412.07 (+3.4%) · LUT4 10.2k · Fmax 185 MHz
R15 · Retire PC-next in MEM
fitness 431.24 (+4.7%) · LUT4 10.0k · Fmax 194 MHz

kimi-k2_6

Best: 396.13 (+40.1% vs baseline) · Mean: 339.5 (+20.0% mean Δ) · Reps: 2/3 (completed / total)

per-rep detail
Rep   Status    Best    Δ%      LUT4   Fmax (MHz)  acc  brk  Wall
rep1  done      347.76  +23.0%  10.3k  146         3    23   9.0h
rep2  done      331.22  +17.1%  10.0k  141         3    26   8.6h
rep3  failed ⚠  396.13  +40.1%  9.9k   166         4    14   8.9h

Broken classes (all reps combined): hypothesis_gen_failed×45, formal_failed×8, cosim_failed×6, implementation_compile_failed×2, schema_error×2
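
For this model the reported Mean is consistent with averaging only the completed reps, while Best still counts the failed rep3. This is inferred from the figures, not documented by the source. A sketch comparing the two candidate means:

```python
from statistics import mean

# (rep, status, best fitness) for kimi-k2_6, from the per-rep rows.
rows = [
    ("rep1", "done", 347.76),
    ("rep2", "done", 331.22),
    ("rep3", "failed", 396.13),
]

best = max(f for _, _, f in rows)
mean_done = mean(f for _, s, f in rows if s == "done")
mean_all = mean(f for _, _, f in rows)

print(f"best over all reps:   {best:.2f}")
print(f"mean, completed only: {mean_done:.1f}")
print(f"mean, all reps:       {mean_all:.1f}")
```

Only the completed-only mean rounds to the reported 339.5; averaging all three reps would give about 358.4.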

rep1 — winning hypotheses

R1 · Add static branch predictor (backward-taken, JAL-always-taken)
fitness 324.08 (+14.6%) · LUT4 10.2k · Fmax 138 MHz
R4 · Add 32-entry 2-bit BHT for forward branch direction prediction
fitness 347.76 (+7.3%) · LUT4 10.3k · Fmax 146 MHz

rep2 — winning hypotheses

R1 · IF-stage static predictor: backward branches and JAL always taken
fitness 316.95 (+12.1%) · LUT4 10.6k · Fmax 135 MHz
R3 · 4-entry Return Address Stack for JALR returns
fitness 331.22 (+4.5%) · LUT4 10.0k · Fmax 141 MHz

rep3 — winning hypotheses

R1 · Guard ALU multipliers off critical path for non-M ops
fitness 315.22 (+11.5%) · LUT4 10.2k · Fmax 142 MHz
R5 · 8-entry direct-mapped instruction cache in IF to absorb imem stalls
fitness 334.35 (+6.1%) · LUT4 10.0k · Fmax 140 MHz
R8 · Split ALU into fast and M-extension paths with final 2:1 mux
fitness 396.13 (+18.5%) · LUT4 9.9k · Fmax 166 MHz

gemini-3_1-pro

Best: 354.73 (+25.4% vs baseline) · Mean: 339.4 (+20.0% mean Δ) · Reps: 3/3 (completed / total)

per-rep detail
Rep   Status    Best    Δ%      LUT4   Fmax (MHz)  acc  brk  Wall
rep1  done      354.73  +25.4%  10.2k  150         2    31   5.6h
rep2  done      339.62  +20.1%  11.1k  142         3    29   7.3h
rep3  done      323.92  +14.5%  11.8k  136         4    31   6.0h

Broken classes (all reps combined): hypothesis_gen_failed×68, sandbox_violation×9, formal_failed×6, cosim_failed×5, implementation_compile_failed×3

rep1 — winning hypotheses

R5 · 1-Cycle 64-entry BTB in IF Stage with Fast Redirect
fitness 354.73 (+25.4%) · LUT4 10.2k · Fmax 150 MHz

rep2 — winning hypotheses

R4 · 16-entry BTB/BHT predictor in IF stage
fitness 323.95 (+14.5%) · LUT4 10.3k · Fmax 137 MHz
R5 · 128-entry BTB + 8-entry RAS
fitness 339.62 (+4.8%) · LUT4 11.1k · Fmax 142 MHz

rep3 — winning hypotheses

R5 · IF-stage Static BTFN and JAL Predictor
fitness 282.91 (+0.0%) · LUT4 10.0k · Fmax 121 MHz
R9 · BHT and RAS for frontend branch prediction
fitness 321.67 (+13.7%) · LUT4 11.0k · Fmax 135 MHz
R10 · GShare Predictor with 256-entry BHT and 8-bit GHR
fitness 323.92 (+0.7%) · LUT4 11.8k · Fmax 136 MHz
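
The per-model Best and Mean figures can be collected to compare models side by side. A brief sketch with the values copied from the sections above; the `summary` dictionary is an illustrative structure, not an export from the benchmark.

```python
# model -> (best fitness, mean fitness), as reported for each model.
summary = {
    "gpt-5_5_xhigh": (525.04, 468.3),
    "gpt-5_4_xhigh": (513.84, 505.0),
    "gpt-5_5_high": (461.87, 430.2),
    "gpt-5_5_medium": (431.58, 423.5),
    "kimi-k2_6": (396.13, 339.5),
    "gemini-3_1-pro": (354.73, 339.4),
}

# Rank once by best single rep and once by mean across reps.
by_best = sorted(summary, key=lambda m: summary[m][0], reverse=True)
by_mean = sorted(summary, key=lambda m: summary[m][1], reverse=True)

print("by best:", by_best)
print("by mean:", by_mean)
```

The two orderings differ at the top: gpt-5_5_xhigh has the highest single rep, while gpt-5_4_xhigh has the highest mean over its two reps.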