a3e1996833
Opt 1: Eliminated the ST_BF_SHIFT state. A fixed arithmetic right shift is pure bit selection (zero logic levels), so it is now merged into the BF_WRITE combinational add/subtract. This saves one cycle per butterfly, i.e. LOG2N * N/2 = 10 * 512 = 5120 cycles per 1024-pt FFT.

Opt 2: Replaced the general multiply idx_val * tw_stride_reg with the barrel shift idx_val << (LOG2N-1-stage). tw_stride_reg is always a power of 2 (2^(LOG2N-1-stage)), so the result is mathematically identical and a multiplier is freed.

Regression: 18/18 pass on FPGA (bit-exact results).
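Both optimizations rely on exact equivalences, which a small bit-accurate model can check. The sketch below is a hypothetical Python model, not the RTL. It assumes a radix-2 FFT with signed two's-complement butterfly data scaled by 1/2 per stage. The function names (bf_two_state, bf_merged, tw_addr_multiply, tw_addr_shift) and the data widths tested are illustrative assumptions. Opt 1 is modelled as "register the full (width+1)-bit sum, then shift" versus "wire-select bits [width:1] of the adder output". Opt 2 compares the stride multiply against the shift for every stage.

```python
import random

LOG2N = 10
N = 1 << LOG2N  # 1024-point FFT


def to_signed(bits: int, width: int) -> int:
    """Interpret the low `width` bits of `bits` as a two's-complement value."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits


# --- Opt 1: shift merged into the add/subtract (hypothetical model) ----------

def bf_two_state(a: int, b: int, sub: bool = False) -> int:
    """Old flow: BF_WRITE holds the full-precision result, ST_BF_SHIFT shifts it."""
    s = a - b if sub else a + b   # width+1 bits, registered between states
    return s >> 1                 # Python's >> on ints is an arithmetic (floor) shift


def bf_merged(a: int, b: int, width: int, sub: bool = False) -> int:
    """New flow: select bits [width:1] of the (width+1)-bit adder output directly."""
    s = a - b if sub else a + b
    raw = s & ((1 << (width + 1)) - 1)  # raw bit pattern on the adder output wires
    return to_signed(raw >> 1, width)   # drop bit 0 and reinterpret as signed


def check_opt1_exhaustive(width: int) -> bool:
    """Compare both flows for every pair of signed `width`-bit inputs."""
    vals = range(-(1 << (width - 1)), 1 << (width - 1))
    return all(bf_two_state(a, b, sub) == bf_merged(a, b, width, sub)
               for a in vals for b in vals for sub in (False, True))


def check_opt1_random(width: int, trials: int, seed: int = 0) -> bool:
    """Compare both flows on random signed `width`-bit inputs."""
    rng = random.Random(seed)
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    for _ in range(trials):
        a, b = rng.randint(lo, hi), rng.randint(lo, hi)
        sub = rng.random() < 0.5
        if bf_two_state(a, b, sub) != bf_merged(a, b, width, sub):
            return False
    return True


# --- Opt 2: stride multiply replaced by a shift (hypothetical model) ---------

def tw_addr_multiply(idx: int, stage: int) -> int:
    """Old flow: general multiply by the per-stage twiddle stride."""
    tw_stride = N >> (stage + 1)  # always a power of 2: 2**(LOG2N-1-stage)
    return idx * tw_stride


def tw_addr_shift(idx: int, stage: int) -> int:
    """New flow: barrel shift by LOG2N-1-stage."""
    return idx << (LOG2N - 1 - stage)


def check_opt2() -> bool:
    """Compare both address computations for every stage and butterfly index."""
    return all(tw_addr_multiply(i, s) == tw_addr_shift(i, s)
               for s in range(LOG2N) for i in range(N // 2))


# One ST_BF_SHIFT cycle per butterfly: LOG2N stages x N/2 butterflies per stage.
CYCLES_SAVED = LOG2N * (N // 2)

if __name__ == "__main__":
    print("opt1 exhaustive (8-bit):", check_opt1_exhaustive(8))
    print("opt1 random (16-bit):", check_opt1_random(16, 100_000))
    print("opt2 all stages:", check_opt2())
    print("cycles saved per FFT:", CYCLES_SAVED)  # 5120
```

The exhaustive check uses an 8-bit width only to keep the run short. The argument is width-independent: a signed width-bit sum or difference always fits in width+1 bits, and its floor-halved value always fits back in width bits.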