35539ea934
checkSystemHealth()'s internal watchdog (pre-fix step 9) had two linked
defects that, combined with the previous commit's escalation of
ERROR_WATCHDOG_TIMEOUT to Emergency_Stop(), would false-latch AERIS-10:
1. Cold-start false trip:
static uint32_t last_health_check = 0;
if (HAL_GetTick() - last_health_check > 60000) { trip; }
On the first call, last_health_check == 0, so the subtraction
against a seeded-zero sentinel exceeds 60 000 ms as soon as the MCU
has been up >60 s -- normal after the ADAR1000 / AD9523 / ADF4382
init sequence -- and the watchdog trips spuriously.
2. Stale timestamp after early returns:
last_health_check = HAL_GetTick(); // at END of function
Every earlier sub-check (IMU, BMP180, GPS, PA Idq, temperature) has
an `if (fault) return current_error;` path that skips the update.
After ~60 s of transient faults, the next clean call compares
against a long-stale last_health_check and trips.
With ERROR_WATCHDOG_TIMEOUT now escalating to Emergency_Stop(), either
failure mode would cut the RF rails on a perfectly healthy system.
Fix: move the watchdog check to function ENTRY. A dedicated cold-start
branch seeds the timestamp on the first call without checking. On every
subsequent call, the elapsed delta is captured first and
last_health_check is updated BEFORE any sub-check runs, so early returns
no longer leave a stale value. 32-bit tick-wrap semantics are preserved
because the subtraction remains on uint32_t.
Add test_gap3_health_watchdog_cold_start.c covering cold-start, paced
main-loop, stall detection, boundary (exactly 60 000 ms), recovery
after trip, and 32-bit HAL_GetTick() wrap -- wired into tests/Makefile
alongside the existing gap-3 safety tests.