HWE Bench
Per-model detail
What each model actually did.
Below are the per-rep outcomes for every model run on HWE Bench so far, plus the
accepted-improvement hypotheses each rep produced, listed with fitness, LUT4 count,
and Fmax. Hypothesis titles are verbatim: exactly what the agent wrote.
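The hypothesis lines on this page follow a regular pattern (round, title, fitness, step gain, LUT4, Fmax), so they can be turned into records for further analysis. The following is a minimal sketch, not a tool that ships with the benchmark: the regular expression and the field names (`round`, `title`, `gain_pct`, `lut4_k`, `fmax_mhz`) are assumptions made for illustration. It tolerates both the fused "title fitness" form and a "·" placed between the title and the word "fitness".

```python
import re

# Hypothetical pattern for one hypothesis line; field names are illustrative.
LINE = re.compile(
    r"^R(?P<round>\d+) · (?P<title>.+?)(?: ·)? fitness (?P<fitness>\d+\.\d+) "
    r"\((?P<gain>[+-]\d+\.\d+)% ?\) · LUT4 (?P<lut4>\d+\.\d+)k · Fmax (?P<fmax>\d+) MHz$"
)

def parse_hypothesis(line):
    """Parse one hypothesis line into a dict, or return None if it does not match."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    return {
        "round": int(m.group("round")),
        "title": m.group("title"),          # kept verbatim
        "fitness": float(m.group("fitness")),
        "gain_pct": float(m.group("gain")),
        "lut4_k": float(m.group("lut4")),   # LUT4 count in thousands, as printed
        "fmax_mhz": int(m.group("fmax")),
    }

example = "R10 · Tiny ifetch replay predictor fitness 525.04 (+21.5% ) · LUT4 5.5k · Fmax 220 MHz"
record = parse_hypothesis(example)
print(record)
```

Because LUT4 is printed rounded to the nearest 0.1k, the parsed value is only as precise as the page itself.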
gpt-5_5_xhigh
Best: 525.04 (+85.6% vs baseline)
per-rep detail
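| Rep | Status | Best | Δ% | LUT4 | Fmax (MHz) | acc | brk | Wall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| rep1 | done | 397.83 | +40.7% | 6.1k | 182 | 5 | 2 | 7.0h |
| rep2 | done | 525.04 | +85.6% | 5.5k | 220 | 5 | 3 | 6.4h |
| rep3 | done | 482.03 | +70.4% | 3.2k | 216 | 7 | 1 | 4.9h |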
Broken classes (all reps combined): cosim_failed×5, formal_failed×1
rep1 — winning hypotheses
R1 · Decouple slow RV32M ops from EX · fitness 368.83 (+30.4%) · LUT4 6.0k · Fmax 166 MHz
R3 · Retiming slow M finalization · fitness 380.70 (+3.2%) · LUT4 6.0k · Fmax 171 MHz
R4 · Register forwarding selects · fitness 386.71 (+1.6%) · LUT4 6.0k · Fmax 174 MHz
R8 · Registered low-half MUL path · fitness 397.83 (+2.9%) · LUT4 6.1k · Fmax 182 MHz
rep2 — winning hypotheses
R1 · Iterative divider off ALU critical path · fitness 400.55 (+41.6%) · LUT4 5.5k · Fmax 180 MHz
R6 · Prune dead pipeline control bits · fitness 427.56 (+6.7%) · LUT4 5.4k · Fmax 192 MHz
R7 · Valid-only pipeline payload resets · fitness 432.21 (+1.1%) · LUT4 5.4k · Fmax 194 MHz
R10 · Tiny ifetch replay predictor · fitness 525.04 (+21.5%) · LUT4 5.5k · Fmax 220 MHz
rep3 — winning hypotheses
R1 · Move DIV/REM to multicycle EX unit · fitness 350.21 (+23.8%) · LUT4 5.6k · Fmax 157 MHz
R2 · Share MUL hardware in ALU · fitness 353.49 (+0.9%) · LUT4 5.6k · Fmax 159 MHz
R4 · Hazard-only source-use interlock · fitness 381.04 (+7.8%) · LUT4 5.6k · Fmax 171 MHz
R6 · Register writeback payload in MEM/WB · fitness 406.96 (+6.8%) · LUT4 5.7k · Fmax 183 MHz
R9 · Remove regfile reset fanout · fitness 413.23 (+1.5%) · LUT4 3.2k · Fmax 186 MHz
R10 · Stage-local control bundles · fitness 482.03 (+16.6%) · LUT4 3.2k · Fmax 216 MHz
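The percentage on each hypothesis line appears to be the gain over the previously accepted fitness, while the per-rep Δ% is measured against the baseline. A small sketch using the gpt-5_5_xhigh rep3 numbers above checks that reading: recomputing each step gain and the overall gain should reproduce the reported figures. The baseline fitness is not printed on this page, so the code back-computes it from the first step, which is an assumption; the function names are illustrative.

```python
# Accepted steps for gpt-5_5_xhigh rep3, copied from the list above:
# (round, fitness after the step, reported step gain in percent)
steps = [
    ("R1", 350.21, 23.8),
    ("R2", 353.49, 0.9),
    ("R4", 381.04, 7.8),
    ("R6", 406.96, 6.8),
    ("R9", 413.23, 1.5),
    ("R10", 482.03, 16.6),
]

# Assumption: baseline is inferred from the first step (350.21 reported as +23.8%).
baseline = steps[0][1] / (1 + steps[0][2] / 100)

def step_gains(steps, baseline):
    """Percent gain of each step over the previously accepted fitness."""
    gains = []
    prev = baseline
    for name, fitness, _reported in steps:
        gains.append((name, (fitness / prev - 1) * 100))
        prev = fitness
    return gains

def total_gain(steps, baseline):
    """Percent gain of the final accepted fitness over the baseline."""
    return (steps[-1][1] / baseline - 1) * 100

for (name, computed), (_, _, reported) in zip(step_gains(steps, baseline), steps):
    print(f"{name}: computed {computed:+.1f}%, reported {reported:+.1f}%")
print(f"inferred baseline: {baseline:.1f}")
print(f"rep total: {total_gain(steps, baseline):+.1f}% vs baseline")
```

Under this reading the small late steps matter more than they look: gains multiply, so a +16.6% step applied at fitness 413.23 adds more absolute fitness than the same percentage would at the baseline.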
gpt-5_4_xhigh
Best: 513.84 (+81.7% vs baseline)
per-rep detail
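| Rep | Status | Best | Δ% | LUT4 | Fmax (MHz) | acc | brk | Wall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| rep1 | done | 496.11 | +75.4% | 3.2k | 221 | 5 | 7 | 7.2h |
| rep2 | done | 513.84 | +81.7% | 10.1k | 203 | 7 | 11 | 8.1h |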
Broken classes (all reps combined): cosim_failed×11, formal_failed×7
rep1 — winning hypotheses
R1 · Move DIV/REM Off The ALU Critical Path · fitness 343.51 (+21.5%) · LUT4 5.6k · Fmax 154 MHz
R2 · One-Deep Stalled-Store Retirement Slot · fitness 385.04 (+12.1%) · LUT4 5.6k · Fmax 172 MHz
R6 · Non-Aliasing Load Bypass Around Store Slot · fitness 439.32 (+14.1%) · LUT4 5.6k · Fmax 196 MHz
R10 · Reset-Light Write-First Register File · fitness 496.11 (+12.9%) · LUT4 3.2k · Fmax 221 MHz
rep2 — winning hypotheses
R2 · Registered I-Fetch Replay Predictor · fitness 316.05 (+11.8%) · LUT4 10.1k · Fmax 134 MHz
R3 · MEM-to-EX load bypass · fitness 332.31 (+5.1%) · LUT4 9.9k · Fmax 134 MHz
R4 · Shared signedness-selectable multiplier · fitness 334.91 (+0.8%) · LUT4 10.4k · Fmax 135 MHz
R5 · One-entry posted store buffer · fitness 377.60 (+12.8%) · LUT4 10.1k · Fmax 151 MHz
R6 · EX fast-path add/address bypass · fitness 391.32 (+3.6%) · LUT4 10.0k · Fmax 157 MHz
R8 · Resolve direct JAL in ID · fitness 513.84 (+31.3%) · LUT4 10.1k · Fmax 203 MHz
gpt-5_5_high
Best: 461.87 (+63.3% vs baseline)
per-rep detail
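| Rep | Status | Best | Δ% | LUT4 | Fmax (MHz) | acc | brk | Wall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| rep1 | done | 461.87 | +63.3% | 9.8k | 187 | 7 | 3 | 4.7h |
| rep2 | done | 420.61 | +48.7% | 12.0k | 178 | 8 | 4 | 11.9h |
| rep3 | done | 408.01 | +44.3% | 5.6k | 176 | 4 | 2 | 5.6h |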
Broken classes (all reps combined): cosim_failed×5, formal_failed×3, implementation_compile_failed×1
rep1 — winning hypotheses
R1 · Static backward branch predictor · fitness 338.66 (+19.7%) · LUT4 10.0k · Fmax 144 MHz
R3 · Add MEM-to-EX load forwarding · fitness 355.27 (+4.9%) · LUT4 10.2k · Fmax 144 MHz
R8 · Isolate M-extension ALU mux · fitness 366.02 (+3.0%) · LUT4 10.1k · Fmax 148 MHz
R9 · Gate static branch target formation · fitness 420.37 (+14.8%) · LUT4 9.9k · Fmax 170 MHz
R12 · Factor forwarding matches · fitness 437.68 (+4.1%) · LUT4 10.1k · Fmax 177 MHz
R14 · Register memory request metadata · fitness 461.87 (+5.5%) · LUT4 9.8k · Fmax 187 MHz
rep2 — winning hypotheses
R1 · Add small BTB branch predictor · fitness 285.54 (+1.0%) · LUT4 12.1k · Fmax 121 MHz
R2 · Gate false load-use stalls · fitness 290.93 (+1.9%) · LUT4 12.4k · Fmax 123 MHz
R3 · Optimize load byte-lane formatter · fitness 305.22 (+4.9%) · LUT4 12.4k · Fmax 129 MHz
R4 · Consolidate multiply datapath · fitness 313.60 (+2.8%) · LUT4 12.6k · Fmax 133 MHz
R6 · Trim BTB tag compare · fitness 326.00 (+4.0%) · LUT4 12.0k · Fmax 138 MHz
R7 · Prune dead pipeline payload · fitness 341.75 (+4.8%) · LUT4 12.2k · Fmax 145 MHz
R11 · Bypass ALU for LSU addresses · fitness 420.61 (+23.1%) · LUT4 12.0k · Fmax 178 MHz
rep3 — winning hypotheses
R1 · Move DIV/REM off the ALU critical path · fitness 352.98 (+24.8%) · LUT4 5.5k · Fmax 159 MHz
R4 · Case-based MEM byte-lane muxes · fitness 380.43 (+7.8%) · LUT4 5.5k · Fmax 171 MHz
R6 · Lookahead hot-branch predictor · fitness 408.01 (+7.2%) · LUT4 5.6k · Fmax 176 MHz
gpt-5_5_medium
Best: 431.58 (+52.6% vs baseline)
per-rep detail
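| Rep | Status | Best | Δ% | LUT4 | Fmax (MHz) | acc | brk | Wall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| rep1 | done | 431.58 | +52.6% | 7.8k | 201 | 7 | 5 | 5.8h |
| rep2 | done | 407.55 | +44.1% | 7.4k | 187 | 5 | 9 | 6.6h |
| rep3 | done | 431.24 | +52.5% | 10.0k | 194 | 4 | 9 | 7.2h |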
Broken classes (all reps combined): cosim_failed×12, formal_failed×11
rep1 — winning hypotheses
R1 · Remove regfile reset fanout · fitness 316.22 (+11.8%) · LUT4 8.0k · Fmax 142 MHz
R2 · Move M extension to multicycle unit · fitness 356.85 (+12.8%) · LUT4 7.5k · Fmax 167 MHz
R3 · Prune dead pipeline metadata · fitness 397.64 (+11.4%) · LUT4 7.7k · Fmax 186 MHz
R5 · Add posted store buffer · fitness 405.49 (+2.0%) · LUT4 7.4k · Fmax 188 MHz
R9 · Register final writeback data · fitness 422.61 (+4.2%) · LUT4 7.0k · Fmax 196 MHz
R12 · Add narrow forwarding sideband · fitness 431.58 (+2.1%) · LUT4 7.8k · Fmax 201 MHz
rep2 — winning hypotheses
R1 · Multicycle RV32M arithmetic unit · fitness 375.73 (+32.9%) · LUT4 9.7k · Fmax 172 MHz
R5 · Drop regfile reset fanout · fitness 382.99 (+1.9%) · LUT4 7.6k · Fmax 176 MHz
R8 · Share M-unit multiplier hardware · fitness 393.66 (+2.8%) · LUT4 7.5k · Fmax 181 MHz
R9 · Split RVFI shadow metadata from datapath · fitness 407.55 (+3.5%) · LUT4 7.4k · Fmax 187 MHz
rep3 — winning hypotheses
R1 · Gate and share M-extension ALU hardware · fitness 398.35 (+40.9%) · LUT4 9.8k · Fmax 179 MHz
R5 · Precompute PC targets in decode · fitness 412.07 (+3.4%) · LUT4 10.2k · Fmax 185 MHz
R15 · Retire PC-next in MEM · fitness 431.24 (+4.7%) · LUT4 10.0k · Fmax 194 MHz
kimi-k2_6
Best: 396.13 (+40.1% vs baseline)
per-rep detail
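| Rep | Status | Best | Δ% | LUT4 | Fmax (MHz) | acc | brk | Wall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| rep1 | done | 347.76 | +23.0% | 10.3k | 146 | 3 | 23 | 9.0h |
| rep2 | done | 331.22 | +17.1% | 10.0k | 141 | 3 | 26 | 8.6h |
| rep3 | failed ⚠ | 396.13 | +40.1% | 9.9k | 166 | 4 | 14 | 8.9h |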
Broken classes (all reps combined): hypothesis_gen_failed×45, formal_failed×8, cosim_failed×6, implementation_compile_failed×2, schema_error×2
rep1 — winning hypotheses
R1 · Add static branch predictor (backward-taken, JAL-always-taken) · fitness 324.08 (+14.6%) · LUT4 10.2k · Fmax 138 MHz
R4 · Add 32-entry 2-bit BHT for forward branch direction prediction · fitness 347.76 (+7.3%) · LUT4 10.3k · Fmax 146 MHz
rep2 — winning hypotheses
R1 · IF-stage static predictor: backward branches and JAL always taken · fitness 316.95 (+12.1%) · LUT4 10.6k · Fmax 135 MHz
R3 · 4-entry Return Address Stack for JALR returns · fitness 331.22 (+4.5%) · LUT4 10.0k · Fmax 141 MHz
rep3 — winning hypotheses
R1 · Guard ALU multipliers off critical path for non-M ops · fitness 315.22 (+11.5%) · LUT4 10.2k · Fmax 142 MHz
R5 · 8-entry direct-mapped instruction cache in IF to absorb imem stalls · fitness 334.35 (+6.1%) · LUT4 10.0k · Fmax 140 MHz
R8 · Split ALU into fast and M-extension paths with final 2:1 mux · fitness 396.13 (+18.5%) · LUT4 9.9k · Fmax 166 MHz
gemini-3_1-pro
Best: 354.73 (+25.4% vs baseline)
per-rep detail
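| Rep | Status | Best | Δ% | LUT4 | Fmax (MHz) | acc | brk | Wall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| rep1 | done | 354.73 | +25.4% | 10.2k | 150 | 2 | 31 | 5.6h |
| rep2 | done | 339.62 | +20.1% | 11.1k | 142 | 3 | 29 | 7.3h |
| rep3 | done | 323.92 | +14.5% | 11.8k | 136 | 4 | 31 | 6.0h |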
Broken classes (all reps combined): hypothesis_gen_failed×68, sandbox_violation×9, formal_failed×6, cosim_failed×5, implementation_compile_failed×3
rep1 — winning hypotheses
R5 · 1-Cycle 64-entry BTB in IF Stage with Fast Redirect · fitness 354.73 (+25.4%) · LUT4 10.2k · Fmax 150 MHz
rep2 — winning hypotheses
R4 · 16-entry BTB/BHT predictor in IF stage · fitness 323.95 (+14.5%) · LUT4 10.3k · Fmax 137 MHz
R5 · 128-entry BTB + 8-entry RAS · fitness 339.62 (+4.8%) · LUT4 11.1k · Fmax 142 MHz
rep3 — winning hypotheses
R5 · IF-stage Static BTFN and JAL Predictor · fitness 282.91 (+0.0%) · LUT4 10.0k · Fmax 121 MHz
R9 · BHT and RAS for frontend branch prediction · fitness 321.67 (+13.7%) · LUT4 11.0k · Fmax 135 MHz
R10 · GShare Predictor with 256-entry BHT and 8-bit GHR · fitness 323.92 (+0.7%) · LUT4 11.8k · Fmax 136 MHz
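The "Broken classes" lines and the brk column of the per-rep tables can be cross-checked against each other. The sketch below copies both sets of counts from this page, verifies that each model's class counts sum to its per-rep brk values, and reports the most frequent failure class per model. The dictionary layout and the function name are assumptions made for illustration.

```python
# Counts copied from the "Broken classes" lines on this page.
broken_classes = {
    "gpt-5_5_xhigh": {"cosim_failed": 5, "formal_failed": 1},
    "gpt-5_4_xhigh": {"cosim_failed": 11, "formal_failed": 7},
    "gpt-5_5_high": {"cosim_failed": 5, "formal_failed": 3,
                     "implementation_compile_failed": 1},
    "gpt-5_5_medium": {"cosim_failed": 12, "formal_failed": 11},
    "kimi-k2_6": {"hypothesis_gen_failed": 45, "formal_failed": 8,
                  "cosim_failed": 6, "implementation_compile_failed": 2,
                  "schema_error": 2},
    "gemini-3_1-pro": {"hypothesis_gen_failed": 68, "sandbox_violation": 9,
                       "formal_failed": 6, "cosim_failed": 5,
                       "implementation_compile_failed": 3},
}

# brk column values per rep, copied from the per-rep tables.
brk_per_rep = {
    "gpt-5_5_xhigh": [2, 3, 1],
    "gpt-5_4_xhigh": [7, 11],
    "gpt-5_5_high": [3, 4, 2],
    "gpt-5_5_medium": [5, 9, 9],
    "kimi-k2_6": [23, 26, 14],
    "gemini-3_1-pro": [31, 29, 31],
}

def check_and_summarize(classes, per_rep):
    """Return {model: (total_broken, dominant_class)}; raise if the tallies disagree."""
    summary = {}
    for model, counts in classes.items():
        total = sum(counts.values())
        rep_total = sum(per_rep[model])
        if total != rep_total:
            raise ValueError(f"{model}: class total {total} != brk sum {rep_total}")
        summary[model] = (total, max(counts, key=counts.get))
    return summary

summary = check_and_summarize(broken_classes, brk_per_rep)
for model, (total, dominant) in summary.items():
    print(f"{model}: {total} broken, mostly {dominant}")
```

On the numbers above the two tallies agree for every model. The GPT runs break mostly at co-simulation, while kimi-k2_6 and gemini-3_1-pro lose most attempts at hypothesis generation, before any hardware change is evaluated.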