Commit Graph

5 Commits

Author SHA1 Message Date
Jason e9705e40b7 feat: 2048-pt FFT upgrade with decimation=4, 512 output bins, 6m spacing
Complete cross-layer upgrade from 1024-pt/64-bin to 2048-pt/512-bin FFT:

FPGA RTL (14+ modules):
- radar_params.vh: FFT_SIZE=2048, RANGE_BINS=512, 9-bit range, 6-bit stream
- fft_engine.v: 2048-pt FFT with XPM BRAM
- chirp_memory_loader_param.v: 2 segments x 2048 (was 4 x 1024)
- matched_filter_multi_segment.v: BRAM inference for overlap_cache, explicit ov_waddr
- mti_canceller.v: BRAM inference for prev_i/q arrays (was fabric FFs)
- doppler_processor.v: 16384-deep memory, 14-bit addressing
- cfar_ca.v: 512 rows, indentation fix
- radar_receiver_final.v: rising-edge detector for frame_complete, 11-bit sample_addr
- range_bin_decimator.v: 512 output bins
- usb_data_interface_ft2232h.v: bulk per-frame with Manhattan magnitude
- radar_mode_controller.v: XOR edge detector for toggle signals
- rx_gain_control.v: updated for new bin count

Python GUI + Protocol (8 files):
- radar_protocol.py: 512-bin bulk frame parser, LSB-first bitmap
- GUI_V65_Tk.py, v7/*.py: updated for 512 bins, 6m range resolution

Golden data + tests:
- All .hex/.csv/.npy golden references regenerated for 2048/512
- fft_twiddle_2048.mem added
- Deleted stale seg2/seg3 chirp mem files
- 9 new bulk frame cross-layer tests, deleted 6 stale per-sample tests
- Deleted stale tb_cross_layer_ft2232h.v and dead contract_parser functions
- Updated validate_mem_files.py for 2048/2-segment config

MCU: RadarSettings.cpp max_distance/map_size 1536->3072

All 4 CI jobs pass: 285 tests, 0 failures, 0 skips
2026-04-16 17:27:55 +05:45
Jason a3e1996833 FFT engine: merge SHIFT into WRITE (5→4 cycle butterfly, 20% throughput) + barrel-shift twiddle index
Opt 1: Eliminated ST_BF_SHIFT state — arithmetic right-shift is pure
bit-selection (zero logic levels), merged into BF_WRITE combinational
add/subtract. Saves LOG2N * N/2 = 5120 cycles per 1024-pt FFT.

Opt 2: Replaced idx_val * tw_stride_reg general multiply with
idx_val << (LOG2N-1-stage) barrel shift. tw_stride_reg is always a
power of 2, so this is mathematically identical and frees a multiplier.

Regression: 18/18 FPGA pass (bit-exact results).
2026-03-20 00:20:59 +02:00
Jason 36ad15247c Split fft_engine FSM: async reset for control, sync reset for DSP/BRAM datapath (Build 11)
Split monolithic always block into two:
- Block 1 (async reset): FSM state, counters, output interface
  (dout_re/im, dout_valid, done) — deterministic startup
- Block 2 (sync reset): DSP/BRAM pipeline registers (rd_b_re/im,
  rd_tw_cos/sin, bf_prod_re/im, rd_a_re/im, bf_t_re/im, rd_tw_idx,
  rd_addr_even/odd, rd_inverse) — enables hard block absorption

Also convert output pipeline (out_pipe_valid/inverse) to sync reset.

Expected synthesis impact:
- DSP48E1 AREG/BREG absorption for butterfly multiply inputs
- DSP48E1 PREG absorption for multiply outputs (bf_prod_re/im)
- BRAM output register absorption for rd_a_re/im
- Eliminate ~300 DPIR-1 methodology warnings per FFT instance
- Resolve DPOP-2 (PREG=0), RBOR-1 (BRAM DOA), REQP-1839/1840

13/13 regression suites pass. Integration golden: 2048/2048 exact match.
2026-03-17 21:40:09 +02:00
Jason 00fbab6c9d Achieve full timing closure on xc7a100tcsg324-1 at 400 MHz (0 violations)
Complete FPGA timing closure across all clock domains after 9 iterative
Vivado builds. WNS improved from -48.325ns to +0.018ns (107,886 endpoints).

RTL fixes for 400 MHz timing:
- NCO: 6-stage pipeline with DSP48E1 phase accumulator, registered LUT
  index (Fix D splits address decode from ROM read), distributed RAM
- CIC: explicit DSP48E1 PCOUT->PCIN cascade for 5 integrator stages,
  CREG=1 on integrator_0 to eliminate fabric->DSP setup violation
- DDC: 400 MHz reset synchronizer (async-assert/sync-deassert),
  active-high reset register for DSP48E1 RST ports, posedge output stage
- FIR: 5-stage binary adder tree pipeline (7-cycle latency)
- FFT: 5-cycle butterfly pipeline with registered twiddle index,
  XPM_MEMORY_TDPRAM for data storage
- XDC: CDC false paths, async reset false paths, CIC comb multicycle paths

Final Build 9 timing (all MET):
  adc_dco_p (400 MHz): WNS = +0.278ns
  clk_100m  (100 MHz): WNS = +0.018ns
  clk_120m_dac (120 MHz): WNS = +0.992ns
  ft601_clk_in (100 MHz): WNS = +5.229ns
  Cross-domain (adc_dco_p->clk_100m): WNS = +7.105ns
2026-03-16 15:02:35 +02:00
Jason 692b6a3bfa Replace FFT stubs with synthesizable radix-2 DIT engine, fix BRAM inference
Implement iterative single-butterfly FFT engine (fft_engine.v) supporting
1024-pt and 32-pt transforms with quarter-wave twiddle ROM, XPM_MEMORY_TDPRAM
for guaranteed BRAM mapping in Vivado, and behavioral model for simulation.

Add xfft_32.v AXI-Stream wrapper for doppler_processor integration and
dual-branch matched_filter_processing_chain.v (behavioral + synthesis paths).

Fix placement failure caused by 68K+ registers from dissolved memory arrays:
- doppler_processor.v: extract mem writes to sync-only always block for BRAM
- xfft_32.v: extract buffer writes to sync-only always block for LUTRAM

Post-implementation: 37K regs (29%), 23K LUTs (37%), 10 BRAM (7%), fully routed.
All testbenches pass: fft_engine 12/12, xfft_32 10/10, mf_chain 27/27.
2026-03-16 10:25:07 +02:00