Jason
a3e1996833
FFT engine: merge SHIFT into WRITE (5→4 cycle butterfly, 20% fewer cycles) + barrel-shift twiddle index
...
Opt 1: Eliminated the ST_BF_SHIFT state. The arithmetic right-shift is
pure bit selection (zero logic levels), so it is merged into the BF_WRITE
combinational add/subtract. Saves one cycle per butterfly:
LOG2N * N/2 = 10 * 512 = 5120 cycles per 1024-pt FFT.
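The two claims in Opt 1 can be checked with a minimal Python sketch (a behavioral model, not the RTL). It assumes a 1-bit shift and 16-bit signed operands, neither of which is stated in this message; `cycles_saved` and `merged_add_shift` are hypothetical names used only for illustration.

```python
def cycles_saved(n, old_cycles=5, new_cycles=4):
    """Cycles saved per FFT when each butterfly takes fewer cycles."""
    log2n = n.bit_length() - 1          # n is assumed to be a power of 2
    butterflies = log2n * (n // 2)      # LOG2N stages, N/2 butterflies each
    return butterflies * (old_cycles - new_cycles)


def merged_add_shift(a, b, width=16, shift=1):
    """Add two signed `width`-bit values, then shift right by bit selection.

    The sum needs width+1 bits. Dropping its low `shift` bits is only
    wiring in hardware, so no separate shift state is needed.
    """
    total = (a + b) & ((1 << (width + 1)) - 1)   # (width+1)-bit two's complement
    kept = total >> shift                        # select bits [width:shift]
    kept_width = width + 1 - shift
    if kept >= 1 << (kept_width - 1):            # reinterpret as signed
        kept -= 1 << kept_width
    return kept


print(cycles_saved(1024))          # 5120
print(merged_add_shift(-3, -4))    # -4, same as (-3 + -4) >> 1
```

The bit-selected result matches Python's arithmetic `>>` on the full-precision sum for every operand pair, which is the property that lets the shift live inside the add/subtract stage.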
Opt 2: Replaced the general multiply idx_val * tw_stride_reg with the
barrel shift idx_val << (LOG2N-1-stage). tw_stride_reg is always a
power of 2, so the two are mathematically identical and the shift
frees a multiplier.
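The equivalence in Opt 2 can be sketched in Python as below. This is a model under assumptions: stages are numbered 0..LOG2N-1, the stride at each stage is 2^(LOG2N-1-stage) (which is what the shift amount implies), and idx_val is the butterfly index within a group. The function names are hypothetical.

```python
LOG2N = 10  # 1024-point FFT, as in this commit message


def tw_index_multiply(idx_val, stage):
    """Old form: general multiply by the per-stage stride."""
    tw_stride = 1 << (LOG2N - 1 - stage)   # assumed: N/2, N/4, ..., 1
    return idx_val * tw_stride


def tw_index_shift(idx_val, stage):
    """New form: left shift by a stage-dependent amount."""
    return idx_val << (LOG2N - 1 - stage)


for stage in range(LOG2N):
    for idx_val in range(1 << stage):      # assumed butterfly index range
        shifted = tw_index_shift(idx_val, stage)
        assert shifted == tw_index_multiply(idx_val, stage)
        assert shifted < (1 << (LOG2N - 1))   # index stays below N/2
```

Under these assumptions the shift amount takes only LOG2N distinct values, so the hardware is a small barrel shifter rather than a multiplier.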
Regression: 18/18 FPGA pass (bit-exact results).
2026-03-20 00:20:59 +02:00
Jason
36ad15247c
Split fft_engine FSM: async reset for control, sync reset for DSP/BRAM datapath (Build 11)
...
Split monolithic always block into two:
- Block 1 (async reset): FSM state, counters, output interface
(dout_re/im, dout_valid, done) — deterministic startup
- Block 2 (sync reset): DSP/BRAM pipeline registers (rd_b_re/im,
rd_tw_cos/sin, bf_prod_re/im, rd_a_re/im, bf_t_re/im, rd_tw_idx,
rd_addr_even/odd, rd_inverse) — enables hard block absorption
Also convert output pipeline (out_pipe_valid/inverse) to sync reset.
Expected synthesis impact:
- DSP48E1 AREG/BREG absorption for butterfly multiply inputs
- DSP48E1 PREG absorption for multiply outputs (bf_prod_re/im)
- BRAM output register absorption for rd_a_re/im
- Eliminate ~300 DPIR-1 methodology warnings per FFT instance
- Resolve DPOP-2 (PREG=0), RBOR-1 (BRAM DOA), REQP-1839/1840
13/13 regression suites pass. Integration golden: 2048/2048 exact match.
2026-03-17 21:40:09 +02:00
Jason
00fbab6c9d
Achieve full timing closure on xc7a100tcsg324-1 at 400 MHz (0 violations)
...
Complete FPGA timing closure across all clock domains after 9 iterative
Vivado builds. WNS improved from -48.325ns to +0.018ns (107,886 endpoints).
RTL fixes for 400 MHz timing:
- NCO: 6-stage pipeline with DSP48E1 phase accumulator, registered LUT
index (Fix D splits address decode from ROM read), distributed RAM
- CIC: explicit DSP48E1 PCOUT->PCIN cascade for 5 integrator stages,
CREG=1 on integrator_0 to eliminate fabric->DSP setup violation
- DDC: 400 MHz reset synchronizer (async-assert/sync-deassert),
active-high reset register for DSP48E1 RST ports, posedge output stage
- FIR: 5-stage binary adder tree pipeline (7-cycle latency)
- FFT: 5-cycle butterfly pipeline with registered twiddle index,
XPM_MEMORY_TDPRAM for data storage
- XDC: CDC false paths, async reset false paths, CIC comb multicycle paths
Final Build 9 timing (all MET):
adc_dco_p (400 MHz): WNS = +0.278ns
clk_100m (100 MHz): WNS = +0.018ns
clk_120m_dac (120 MHz): WNS = +0.992ns
ft601_clk_in (100 MHz): WNS = +5.229ns
Cross-domain (adc_dco_p->clk_100m): WNS = +7.105ns
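As a rough reading of the per-clock numbers above, positive WNS can be converted into an implied maximum frequency. The sketch below assumes the worst path in each domain is a single-cycle path constrained to the full clock period, which this message does not state; `implied_fmax_mhz` is a hypothetical helper, and the cross-domain figure is left out because it is not tied to a single period.

```python
def implied_fmax_mhz(target_mhz, wns_ns):
    """Frequency at which the worst path would have exactly zero slack."""
    period_ns = 1000.0 / target_mhz
    return 1000.0 / (period_ns - wns_ns)


# (target frequency in MHz, WNS in ns) taken from the Build 9 figures above
build9 = {
    "adc_dco_p": (400.0, 0.278),
    "clk_100m": (100.0, 0.018),
    "clk_120m_dac": (120.0, 0.992),
    "ft601_clk_in": (100.0, 5.229),
}

for name, (target_mhz, wns_ns) in build9.items():
    print(f"{name}: {implied_fmax_mhz(target_mhz, wns_ns):.1f} MHz")
```

On this reading clk_100m, not the 400 MHz ADC clock, is the domain with the least headroom.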
2026-03-16 15:02:35 +02:00
Jason
692b6a3bfa
Replace FFT stubs with synthesizable radix-2 DIT engine, fix BRAM inference
...
Implement iterative single-butterfly FFT engine (fft_engine.v) supporting
1024-pt and 32-pt transforms with quarter-wave twiddle ROM, XPM_MEMORY_TDPRAM
for guaranteed BRAM mapping in Vivado, and behavioral model for simulation.
Add xfft_32.v AXI-Stream wrapper for doppler_processor integration and
dual-branch matched_filter_processing_chain.v (behavioral + synthesis paths).
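For orientation, the algorithm can be sketched in Python as an iterative radix-2 DIT FFT that processes one butterfly at a time and rebuilds twiddles from a quarter-wave cosine table. This is a floating-point model under assumptions, not fft_engine.v: the ROM layout (N/4+1 cosine entries), bit-reversed input ordering, and all function names are hypothetical, and fixed-point scaling, memory ports, and cycle timing are not modelled.

```python
import math


def quarter_wave_rom(n):
    """cos(2*pi*k/n) for k = 0..n/4: one quarter of a cosine period."""
    return [math.cos(2 * math.pi * k / n) for k in range(n // 4 + 1)]


def twiddle(rom, n, k):
    """W_n^k = cos - j*sin for 0 <= k < n/2, rebuilt from the quarter-wave ROM."""
    q = n // 4
    if k <= q:
        cos_v, sin_v = rom[k], rom[q - k]           # first quadrant
    else:
        cos_v, sin_v = -rom[2 * q - k], rom[k - q]  # second quadrant, mirrored
    return complex(cos_v, -sin_v)


def bit_reverse(i, bits):
    """Reverse the low `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_dit(x):
    """Forward FFT of x; len(x) must be a power of 2 and at least 4."""
    n = len(x)
    log2n = n.bit_length() - 1
    rom = quarter_wave_rom(n)
    # DIT consumes its input in bit-reversed order.
    mem = [complex(x[bit_reverse(i, log2n)]) for i in range(n)]
    for stage in range(log2n):
        half = 1 << stage
        for base in range(0, n, 2 * half):
            for j in range(half):                   # one butterfly per pass
                w = twiddle(rom, n, j << (log2n - 1 - stage))
                a, b = mem[base + j], mem[base + j + half]
                t = w * b
                mem[base + j], mem[base + j + half] = a + t, a - t
    return mem


# A unit impulse should transform to a flat spectrum.
print(fft_dit([1, 0, 0, 0, 0, 0, 0, 0]))
```

The same routine covers both transform sizes mentioned above by passing a 1024-element or 32-element input; only the ROM length and stage count change.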
Fix placement failure caused by 68K+ registers from dissolved memory arrays:
- doppler_processor.v: extract mem writes to sync-only always block for BRAM
- xfft_32.v: extract buffer writes to sync-only always block for LUTRAM
Post-implementation: 37K regs (29%), 23K LUTs (37%), 10 BRAM (7%), fully routed.
All testbenches pass: fft_engine 12/12, xfft_32 10/10, mf_chain 27/27.
2026-03-16 10:25:07 +02:00