Achieve timing closure: DSP48E1 pipelines, 4-stage NCO, 28-bit CIC, ASYNC_REG

Phase 0+ timing optimization (attempts #13-22 + implementation):

NCO (nco_400m_enhanced.v):
- 4-stage pipeline: DSP48E1 accumulate -> LUT read -> negate -> quadrant MUX
- DSP48E1 phase accumulator in P=P+C mode (eliminates 8-stage CARRY4 chain)
- Registered phase_inc_dithered to break cascaded 32-bit add path

DDC (ddc_400m.v):
- Direct DSP48E1 instantiation for I/Q mixers (AREG=1, BREG=1, MREG=1, PREG=1)
- CEP=1, RSTP=!reset_n for proper pipeline control
- 3-stage dsp_valid_pipe for PREG=1 latency
- Behavioral sim model under ifdef SIMULATION for Icarus compatibility

CIC (cic_decimator_4x_enhanced.v):
- 28-bit accumulators (was 36) per CIC width formula: 18 + 5*log2(4) = 28
- Removed integrator/comb saturation (CIC uses wrapping arithmetic by design)
- Pipelined output saturation comparison

CDC/ASYNC_REG:
- ASYNC_REG attribute on all CDC synchronizer registers (cdc_modules.v,
  radar_system_top.v, usb_data_interface.v)
- Sync reset in generate blocks (cdc_modules.v)

Results: Vivado post-implementation WNS=+1.196ns, 0 failing endpoints,
850 LUTs (1.34%), 466 FFs (0.37%), 2 DSP48E1 (0.83%) on xc7a100t.
All testbenches pass: 241/244 (3 known stub failures).
This commit is contained in:
Jason
2026-03-16 01:02:07 +02:00
parent 1e51b739a7
commit c983a3c705
15 changed files with 5883 additions and 5466 deletions
+3 -3
View File
@@ -63,9 +63,9 @@ reg ft601_data_oe; // Output enable for bidirectional data bus
// Even though both are 100 MHz, they are asynchronous clocks and need synchronization.
// 2-stage synchronizers for valid signals
reg [1:0] range_valid_sync;
reg [1:0] doppler_valid_sync;
reg [1:0] cfar_valid_sync;
(* ASYNC_REG = "TRUE" *) reg [1:0] range_valid_sync;
(* ASYNC_REG = "TRUE" *) reg [1:0] doppler_valid_sync;
(* ASYNC_REG = "TRUE" *) reg [1:0] cfar_valid_sync;
// Synchronized data captures (registered in ft601_clk_in domain)
reg [31:0] range_profile_cap;