HWE Bench · Models

gpt-5_5_xhigh

gpt 5 5 xhigh

Best: 525.04 (+85.6% vs baseline)
Mean: 468.3 (+65.6% mean Δ)
Reps: 3/3 (completed / total)

per-rep detail
Rep	Status	Best	Δ%	Area (LUT4)	Fmax (MHz)	acc	brk	Wall
rep1	done	397.83	+40.7%	6.1k	182	5	2	7.0h
rep2	done	525.04	+85.6%	5.5k	220	5	3	6.4h
rep3	done	482.03	+70.4%	3.2k	216	7	1	4.9h

Broken classes (all reps combined): cosim_failed×5, formal_failed×1
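The summary figures for this model can be cross-checked against the per-rep table. The sketch below is only a consistency check, not the benchmark's own aggregation code: it assumes the mean Δ is the plain average of the per-rep Δ% column (the page may instead derive it from the mean fitness and the baseline), and the `summarize` helper and field names are made up for illustration.

```python
# Per-rep values copied from the gpt-5_5_xhigh per-rep table.
reps = [
    {"rep": "rep1", "best": 397.83, "delta_pct": 40.7},
    {"rep": "rep2", "best": 525.04, "delta_pct": 85.6},
    {"rep": "rep3", "best": 482.03, "delta_pct": 70.4},
]


def summarize(rows):
    """Return (best fitness, mean fitness, mean delta %) over the given reps.

    Assumption: the mean delta is the average of the per-rep delta column.
    """
    best = max(r["best"] for r in rows)
    mean = sum(r["best"] for r in rows) / len(rows)
    mean_delta = sum(r["delta_pct"] for r in rows) / len(rows)
    return best, mean, mean_delta


best, mean, mean_delta = summarize(reps)
print(f"best={best:.2f} mean={mean:.1f} mean_delta=+{mean_delta:.1f}%")
```

Under that assumption the result agrees with the reported Best 525.04, Mean 468.3 and +65.6% mean Δ.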

rep1: winning hypotheses

R1 · Decouple slow RV32M ops from EX: fitness 368.83 (+30.4%) · area 6.0k LUT4 · Fmax 166 MHz
R3 · Retiming slow M finalization: fitness 380.70 (+3.2%) · area 6.0k LUT4 · Fmax 171 MHz
R4 · Register forwarding selects: fitness 386.71 (+1.6%) · area 6.0k LUT4 · Fmax 174 MHz
R8 · Registered low-half MUL path: fitness 397.83 (+2.9%) · area 6.1k LUT4 · Fmax 182 MHz
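The percentage next to each winning hypothesis appears to be the gain over the previously accepted round rather than over the baseline. A small sketch that recomputes the step gains for rep1 from the fitness values listed above (the `step_gains` helper is hypothetical, and the reading of the percentages is inferred from the numbers, not stated by the page):

```python
# Accepted-round fitness values for gpt-5_5_xhigh rep1, copied from the list above.
chain = [("R1", 368.83), ("R3", 380.70), ("R4", 386.71), ("R8", 397.83)]


def step_gains(rounds):
    """Percent gain of each accepted round over the previously accepted one."""
    gains = []
    for (_, prev), (name, cur) in zip(rounds, rounds[1:]):
        gains.append((name, round((cur / prev - 1.0) * 100.0, 1)))
    return gains


gains = step_gains(chain)
for name, pct in gains:
    print(f"{name}: +{pct}%")
```

The recomputed gains for R3, R4 and R8 match the listed +3.2%, +1.6% and +2.9%. R1 is left out because its gain is measured against the baseline fitness, which this section does not state directly.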

rep2: winning hypotheses

R1 · Iterative divider off ALU critical path: fitness 400.55 (+41.6%) · area 5.5k LUT4 · Fmax 180 MHz
R6 · Prune dead pipeline control bits: fitness 427.56 (+6.7%) · area 5.4k LUT4 · Fmax 192 MHz
R7 · Valid-only pipeline payload resets: fitness 432.21 (+1.1%) · area 5.4k LUT4 · Fmax 194 MHz
R10 · Tiny ifetch replay predictor: fitness 525.04 (+21.5%) · area 5.5k LUT4 · Fmax 220 MHz

rep3: winning hypotheses

R1 · Move DIV/REM to multicycle EX unit: fitness 350.21 (+23.8%) · area 5.6k LUT4 · Fmax 157 MHz
R2 · Share MUL hardware in ALU: fitness 353.49 (+0.9%) · area 5.6k LUT4 · Fmax 159 MHz
R4 · Hazard-only source-use interlock: fitness 381.04 (+7.8%) · area 5.6k LUT4 · Fmax 171 MHz
R6 · Register writeback payload in MEM/WB: fitness 406.96 (+6.8%) · area 5.7k LUT4 · Fmax 183 MHz
R9 · Remove regfile reset fanout: fitness 413.23 (+1.5%) · area 3.2k LUT4 · Fmax 186 MHz
R10 · Stage-local control bundles: fitness 482.03 (+16.6%) · area 3.2k LUT4 · Fmax 216 MHz

gpt-5_4_xhigh

gpt 5 4 xhigh

Best: 513.84 (+81.7% vs baseline)
Mean: 505.0 (+78.5% mean Δ)
Reps: 2/2 (completed / total)

per-rep detail
Rep	Status	Best	Δ%	Area (LUT4)	Fmax (MHz)	acc	brk	Wall
rep1	done	496.11	+75.4%	3.2k	221	5	7	7.2h
rep2	done	513.84	+81.7%	10.1k	203	7	11	8.1h

Broken classes (all reps combined): cosim_failed×11, formal_failed×7

rep1: winning hypotheses

R1 · Move DIV/REM Off The ALU Critical Path: fitness 343.51 (+21.5%) · area 5.6k LUT4 · Fmax 154 MHz
R2 · One-Deep Stalled-Store Retirement Slot: fitness 385.04 (+12.1%) · area 5.6k LUT4 · Fmax 172 MHz
R6 · Non-Aliasing Load Bypass Around Store Slot: fitness 439.32 (+14.1%) · area 5.6k LUT4 · Fmax 196 MHz
R10 · Reset-Light Write-First Register File: fitness 496.11 (+12.9%) · area 3.2k LUT4 · Fmax 221 MHz

rep2: winning hypotheses

R2 · Registered I-Fetch Replay Predictor: fitness 316.05 (+11.8%) · area 10.1k LUT4 · Fmax 134 MHz
R3 · MEM-to-EX load bypass: fitness 332.31 (+5.1%) · area 9.9k LUT4 · Fmax 134 MHz
R4 · Shared signedness-selectable multiplier: fitness 334.91 (+0.8%) · area 10.4k LUT4 · Fmax 135 MHz
R5 · One-entry posted store buffer: fitness 377.60 (+12.8%) · area 10.1k LUT4 · Fmax 151 MHz
R6 · EX fast-path add/address bypass: fitness 391.32 (+3.6%) · area 10.0k LUT4 · Fmax 157 MHz
R8 · Resolve direct JAL in ID: fitness 513.84 (+31.3%) · area 10.1k LUT4 · Fmax 203 MHz

gpt-5_5_high

gpt 5 5 high

Best: 461.87 (+63.3% vs baseline)
Mean: 430.2 (+52.1% mean Δ)
Reps: 3/3 (completed / total)

per-rep detail
Rep	Status	Best	Δ%	Area (LUT4)	Fmax (MHz)	acc	brk	Wall
rep1	done	461.87	+63.3%	9.8k	187	7	3	4.7h
rep2	done	420.61	+48.7%	12.0k	178	8	4	11.9h
rep3	done	408.01	+44.3%	5.6k	176	4	2	5.6h

Broken classes (all reps combined): cosim_failed×5, formal_failed×3, implementation_compile_failed×1

rep1: winning hypotheses

R1 · Static backward branch predictor: fitness 338.66 (+19.7%) · area 10.0k LUT4 · Fmax 144 MHz
R3 · Add MEM-to-EX load forwarding: fitness 355.27 (+4.9%) · area 10.2k LUT4 · Fmax 144 MHz
R8 · Isolate M-extension ALU mux: fitness 366.02 (+3.0%) · area 10.1k LUT4 · Fmax 148 MHz
R9 · Gate static branch target formation: fitness 420.37 (+14.8%) · area 9.9k LUT4 · Fmax 170 MHz
R12 · Factor forwarding matches: fitness 437.68 (+4.1%) · area 10.1k LUT4 · Fmax 177 MHz
R14 · Register memory request metadata: fitness 461.87 (+5.5%) · area 9.8k LUT4 · Fmax 187 MHz

rep2: winning hypotheses

R1 · Add small BTB branch predictor: fitness 285.54 (+1.0%) · area 12.1k LUT4 · Fmax 121 MHz
R2 · Gate false load-use stalls: fitness 290.93 (+1.9%) · area 12.4k LUT4 · Fmax 123 MHz
R3 · Optimize load byte-lane formatter: fitness 305.22 (+4.9%) · area 12.4k LUT4 · Fmax 129 MHz
R4 · Consolidate multiply datapath: fitness 313.60 (+2.8%) · area 12.6k LUT4 · Fmax 133 MHz
R6 · Trim BTB tag compare: fitness 326.00 (+4.0%) · area 12.0k LUT4 · Fmax 138 MHz
R7 · Prune dead pipeline payload: fitness 341.75 (+4.8%) · area 12.2k LUT4 · Fmax 145 MHz
R11 · Bypass ALU for LSU addresses: fitness 420.61 (+23.1%) · area 12.0k LUT4 · Fmax 178 MHz

rep3: winning hypotheses

R1 · Move DIV/REM off the ALU critical path: fitness 352.98 (+24.8%) · area 5.5k LUT4 · Fmax 159 MHz
R4 · Case-based MEM byte-lane muxes: fitness 380.43 (+7.8%) · area 5.5k LUT4 · Fmax 171 MHz
R6 · Lookahead hot-branch predictor: fitness 408.01 (+7.2%) · area 5.6k LUT4 · Fmax 176 MHz

gpt-5_5_medium

gpt 5 5 medium

Best: 431.58 (+52.6% vs baseline)
Mean: 423.5 (+49.7% mean Δ)
Reps: 3/3 (completed / total)

per-rep detail
Rep	Status	Best	Δ%	Area (LUT4)	Fmax (MHz)	acc	brk	Wall
rep1	done	431.58	+52.6%	7.8k	201	7	5	5.8h
rep2	done	407.55	+44.1%	7.4k	187	5	9	6.6h
rep3	done	431.24	+52.5%	10.0k	194	4	9	7.2h

Broken classes (all reps combined): cosim_failed×12, formal_failed×11

rep1: winning hypotheses

R1 · Remove regfile reset fanout: fitness 316.22 (+11.8%) · area 8.0k LUT4 · Fmax 142 MHz
R2 · Move M extension to multicycle unit: fitness 356.85 (+12.8%) · area 7.5k LUT4 · Fmax 167 MHz
R3 · Prune dead pipeline metadata: fitness 397.64 (+11.4%) · area 7.7k LUT4 · Fmax 186 MHz
R5 · Add posted store buffer: fitness 405.49 (+2.0%) · area 7.4k LUT4 · Fmax 188 MHz
R9 · Register final writeback data: fitness 422.61 (+4.2%) · area 7.0k LUT4 · Fmax 196 MHz
R12 · Add narrow forwarding sideband: fitness 431.58 (+2.1%) · area 7.8k LUT4 · Fmax 201 MHz

rep2: winning hypotheses

R1 · Multicycle RV32M arithmetic unit: fitness 375.73 (+32.9%) · area 9.7k LUT4 · Fmax 172 MHz
R5 · Drop regfile reset fanout: fitness 382.99 (+1.9%) · area 7.6k LUT4 · Fmax 176 MHz
R8 · Share M-unit multiplier hardware: fitness 393.66 (+2.8%) · area 7.5k LUT4 · Fmax 181 MHz
R9 · Split RVFI shadow metadata from datapath: fitness 407.55 (+3.5%) · area 7.4k LUT4 · Fmax 187 MHz

rep3: winning hypotheses

R1 · Gate and share M-extension ALU hardware: fitness 398.35 (+40.9%) · area 9.8k LUT4 · Fmax 179 MHz
R5 · Precompute PC targets in decode: fitness 412.07 (+3.4%) · area 10.2k LUT4 · Fmax 185 MHz
R15 · Retire PC-next in MEM: fitness 431.24 (+4.7%) · area 10.0k LUT4 · Fmax 194 MHz

kimi-k2_6

kimi k2 6

Best: 396.13 (+40.1% vs baseline)
Mean: 339.5 (+20.0% mean Δ)
Reps: 2/3 (completed / total)

per-rep detail
Rep	Status	Best	Δ%	Area (LUT4)	Fmax (MHz)	acc	brk	Wall
rep1	done	347.76	+23.0%	10.3k	146	3	23	9.0h
rep2	done	331.22	+17.1%	10.0k	141	3	26	8.6h
rep3	failed ⚠	396.13	+40.1%	9.9k	166	4	14	8.9h

Broken classes (all reps combined): hypothesis_gen_failed×45, formal_failed×8, cosim_failed×6, implementation_compile_failed×2, schema_error×2
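For this model the reported Best (396.13) comes from rep3, which is marked failed, while the reported Mean (339.5) matches the average of the two completed reps only. The sketch below reproduces both numbers from the per-rep table. The rule it encodes (Best over all reps, Mean over completed reps) is inferred from the figures, not documented by the page, and the variable names are illustrative.

```python
# Per-rep rows copied from the kimi-k2_6 per-rep table.
rows = [
    {"rep": "rep1", "status": "done", "best": 347.76},
    {"rep": "rep2", "status": "done", "best": 331.22},
    {"rep": "rep3", "status": "failed", "best": 396.13},
]

# Assumption: the summary mean covers completed reps only.
completed = [r for r in rows if r["status"] == "done"]
mean_completed = sum(r["best"] for r in completed) / len(completed)

# Assumption: the summary best is taken over every rep, including failed ones.
best_any = max(r["best"] for r in rows)

# For contrast: the mean if the failed rep were included.
mean_all = sum(r["best"] for r in rows) / len(rows)

print(f"mean over completed reps: {mean_completed:.1f}")
print(f"mean over all reps:       {mean_all:.1f}")
print(f"best over all reps:       {best_any:.2f}")
```

Including the failed rep would give a mean near 358.4, not the reported 339.5, which is why the completed-only reading seems the more likely one.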

rep1: winning hypotheses

R1 · Add static branch predictor (backward-taken, JAL-always-taken): fitness 324.08 (+14.6%) · area 10.2k LUT4 · Fmax 138 MHz
R4 · Add 32-entry 2-bit BHT for forward branch direction prediction: fitness 347.76 (+7.3%) · area 10.3k LUT4 · Fmax 146 MHz

rep2: winning hypotheses

R1 · IF-stage static predictor: backward branches and JAL always taken: fitness 316.95 (+12.1%) · area 10.6k LUT4 · Fmax 135 MHz
R3 · 4-entry Return Address Stack for JALR returns: fitness 331.22 (+4.5%) · area 10.0k LUT4 · Fmax 141 MHz

rep3: winning hypotheses

R1 · Guard ALU multipliers off critical path for non-M ops: fitness 315.22 (+11.5%) · area 10.2k LUT4 · Fmax 142 MHz
R5 · 8-entry direct-mapped instruction cache in IF to absorb imem stalls: fitness 334.35 (+6.1%) · area 10.0k LUT4 · Fmax 140 MHz
R8 · Split ALU into fast and M-extension paths with final 2:1 mux: fitness 396.13 (+18.5%) · area 9.9k LUT4 · Fmax 166 MHz

gemini-3_1-pro

gemini 3 1 pro

Best: 354.73 (+25.4% vs baseline)
Mean: 339.4 (+20.0% mean Δ)
Reps: 3/3 (completed / total)

per-rep detail
Rep	Status	Best	Δ%	Area (LUT4)	Fmax (MHz)	acc	brk	Wall
rep1	done	354.73	+25.4%	10.2k	150	2	31	5.6h
rep2	done	339.62	+20.1%	11.1k	142	3	29	7.3h
rep3	done	323.92	+14.5%	11.8k	136	4	31	6.0h

Broken classes (all reps combined): hypothesis_gen_failed×68, sandbox_violation×9, formal_failed×6, cosim_failed×5, implementation_compile_failed×3
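The combined broken-class counts can be reconciled with the `brk` column of the per-rep table. The sketch below parses the line above and compares the two totals. It assumes `brk` is the number of broken hypotheses per rep, which the page does not define explicitly, and `parse_broken` is a hypothetical helper.

```python
# Broken-class line and per-rep brk counts copied from the gemini-3_1-pro section.
broken_line = (
    "hypothesis_gen_failed×68, sandbox_violation×9, formal_failed×6, "
    "cosim_failed×5, implementation_compile_failed×3"
)
brk_per_rep = {"rep1": 31, "rep2": 29, "rep3": 31}


def parse_broken(line):
    """Parse 'name×count, name×count, ...' into a dict of integer counts."""
    counts = {}
    for item in line.split(","):
        name, count = item.strip().split("×")
        counts[name] = int(count)
    return counts


classes = parse_broken(broken_line)
total_by_class = sum(classes.values())
total_by_rep = sum(brk_per_rep.values())
print(total_by_class, total_by_rep, total_by_class == total_by_rep)
```

Both totals come to 91 here, and the same check balances for the other models in this section.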

rep1: winning hypotheses

R5 · 1-Cycle 64-entry BTB in IF Stage with Fast Redirect: fitness 354.73 (+25.4%) · area 10.2k LUT4 · Fmax 150 MHz

rep2: winning hypotheses

R4 · 16-entry BTB/BHT predictor in IF stage: fitness 323.95 (+14.5%) · area 10.3k LUT4 · Fmax 137 MHz
R5 · 128-entry BTB + 8-entry RAS: fitness 339.62 (+4.8%) · area 11.1k LUT4 · Fmax 142 MHz

rep3: winning hypotheses

R5 · IF-stage Static BTFN and JAL Predictor: fitness 282.91 (+0.0%) · area 10.0k LUT4 · Fmax 121 MHz
R9 · BHT and RAS for frontend branch prediction: fitness 321.67 (+13.7%) · area 11.0k LUT4 · Fmax 135 MHz
R10 · GShare Predictor with 256-entry BHT and 8-bit GHR: fitness 323.92 (+0.7%) · area 11.8k LUT4 · Fmax 136 MHz
