fix(rtl): force FIR adder tree to fabric to free 30 DSPs for FFT butterfly on 50T

Add (* USE_DSP = "no" *) attribute to FIR lowpass adder tree registers
(add_l1, add_l2, add_l3, accumulator_reg) to prevent Vivado from
inferring DSP48E1 slices for pure addition operations.

Each fir_lowpass_parallel_enhanced instance was using 47 DSPs (32 for
multiply + 15 for the adder tree). The 15 adder-tree DSPs per instance
(30 total for I/Q pair) performed only PCIN+A:B additions with no
multiplier usage. On the XC7A50T with only 120 DSP48E1 slices, this
caused 100% DSP utilization and forced FFT butterfly complex multipliers
to spill into 18-level fabric carry chains (WNS=-1.103ns).

Moving these 36-bit additions to fabric CARRY4 chains (~9 CARRY4 per
add, ~2ns propagation) is well within the 10ns clock period and frees
~30 DSPs for the FFT engine to use native DSP48E1 multipliers.

Regression: 23/23 FPGA tests PASS (attribute is synthesis-only).
This commit is contained in:
Jason
2026-04-07 14:45:47 +03:00
parent d1927f150a
commit 8d7b6e04a0
+7 -4
View File
@@ -57,13 +57,16 @@ wire signed [DATA_WIDTH+COEFF_WIDTH-1:0] mult_result [0:TAPS-1];
// Level 0: 16 pairwise sums of 32 products
reg signed [ACCUM_WIDTH-1:0] add_l0 [0:15];
// Level 1: 8 pairwise sums
reg signed [ACCUM_WIDTH-1:0] add_l1 [0:7];
// USE_DSP="no" forces pure additions to fabric CARRY4 chains, freeing DSP48E1
// slices for the FFT butterfly multipliers that otherwise spill to 18-level
// fabric carry chains causing timing violations on the XC7A50T (120 DSP budget).
(* USE_DSP = "no" *) reg signed [ACCUM_WIDTH-1:0] add_l1 [0:7];
// Level 2: 4 pairwise sums
reg signed [ACCUM_WIDTH-1:0] add_l2 [0:3];
(* USE_DSP = "no" *) reg signed [ACCUM_WIDTH-1:0] add_l2 [0:3];
// Level 3: 2 pairwise sums
reg signed [ACCUM_WIDTH-1:0] add_l3 [0:1];
(* USE_DSP = "no" *) reg signed [ACCUM_WIDTH-1:0] add_l3 [0:1];
// Level 4: final sum
reg signed [ACCUM_WIDTH-1:0] accumulator_reg;
(* USE_DSP = "no" *) reg signed [ACCUM_WIDTH-1:0] accumulator_reg;
// Valid pipeline: 9-stage shift register (was 7 before BREG+MREG addition)
// [0]=BREG done, [1]=MREG done, [2]=L0 done, [3]=L1 done, [4]=L2 done,