ed6f79c6d3
FIR: Add coeff_reg/mult_reg pipeline stages to fix 68 DPIP-1 and 35 DPOP-2 DRC warnings. The valid pipeline is widened from 7 to 9 bits (+2 cycles of latency).

Matched filter: Migrate input_buffer_i/q from register arrays to BRAM (~33K FF savings). Overlap-save now uses a register cache, captured during ST_PROCESSING, to avoid BRAM read/write conflicts during the overlap copy. A new ST_OVERLAP_COPY state writes the cached tail samples back sequentially.

Both changes pass 18/18 FPGA regression tests. Golden data regenerated for the +2-cycle FIR latency baseline.
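
The overlap handling described above can be pictured with a small behavioral model. This is only a sketch of one possible reading of the commit description, not the RTL: `BUF_LEN`, `OVERLAP`, and `process_block` are hypothetical names and sizes chosen for illustration, and the BRAM is modeled as a plain Python list.

```python
# Behavioral sketch only: sizes and names are assumptions, not taken from the RTL.
BUF_LEN = 16   # hypothetical input buffer depth (BRAM words)
OVERLAP = 4    # hypothetical number of tail samples reused by the next block


def process_block(bram, new_samples):
    """Model one overlap-save block; returns (samples read by the filter, bram)."""
    assert len(bram) == BUF_LEN
    assert len(new_samples) == BUF_LEN - OVERLAP

    # Collection: new samples fill the region after the overlap head.
    for i, sample in enumerate(new_samples):
        bram[OVERLAP + i] = sample

    # ST_PROCESSING: read the buffer sequentially, one address per step.
    # While the read address is in the tail region, also latch the sample
    # into a small register cache.
    block = []
    cache = [0] * OVERLAP
    tail_start = BUF_LEN - OVERLAP
    for addr in range(BUF_LEN):
        sample = bram[addr]
        block.append(sample)
        if addr >= tail_start:
            cache[addr - tail_start] = sample

    # ST_OVERLAP_COPY: write the cached tail to the head, one write per step.
    # Only writes are issued here, so no read competes for the memory port.
    for i in range(OVERLAP):
        bram[i] = cache[i]

    return block, bram


if __name__ == "__main__":
    buf = [0] * BUF_LEN
    first, buf = process_block(buf, list(range(1, 13)))
    second, buf = process_block(buf, list(range(13, 25)))
    print(first)
    print(second)
```

In this model the second block starts with the last `OVERLAP` samples of the first block, which is the property the overlap copy has to preserve. The real design may size the cache, order the writes, or schedule the capture differently.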