That mutex fix was a giant leap forward in stability. My module has now been running for a whole week without a single WDT / event overflow. Well, there was one, but that was kind of expected: I had an unfiltered CAN log running in the web UI over a poor Wifi connection. I'll still try to prevent that, but that's a different issue.

There have still been some (few!) WDT and event queue starvations in the field. I've had some detailed reports from users and will try to find the cause.

But according to the latest comment on the PSRAM issue (https://github.com/espressif/esp-idf/issues/2892#issuecomment-667099130), the fix in the official release had a regression, so in some cases the bug will occur again. In the reported issue #5423, the effect was a frozen LwIP task, which would create the WDT / event issue for us as well.

I'd rather apply the coming toolchain fix release first. I'll keep you informed about the toolchain progress.

Regards,
Michael

On 24.07.20 at 15:27, Michael Balzer wrote:
As a side effect, this may also solve the strange event task starvations (hope so…). I was investigating this because I suspected some busy loop in the netmanager context. With the netman running at priority 22, that would effectively block almost all other processing, including the timer service. I've found and fixed one potential busy loop trigger in the netman, which would have been caused by the netman task still running after all interfaces had been lost. I'm not sure if that could happen, but it would explain the effects.
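To illustrate why a busy loop at that priority would have these effects, here is a minimal toy model of strict priority scheduling. It is only a sketch, not the actual netman code: the function `simulate`, the struct `Result`, and the "one unit of real work every 10 ticks" pattern are all made up for illustration.

```cpp
// Toy model (hypothetical, not the real netman code): at each tick the
// high-priority task runs if it is ready; only otherwise do the
// lower-priority tasks (event task, timer service, ...) get the tick.
struct Result {
    int high_ticks;  // ticks consumed by the high-priority task
    int low_ticks;   // ticks left for all lower-priority processing
};

// interfaces_up == 0 models "all interfaces lost".
// blocks_when_idle tells whether the high-priority task sleeps when it
// has nothing to do, or keeps spinning (the suspected busy loop).
Result simulate(int total_ticks, int interfaces_up, bool blocks_when_idle) {
    Result r{0, 0};
    for (int tick = 0; tick < total_ticks; ++tick) {
        // Assumed workload: with an interface up, real work every 10th tick.
        bool has_work = interfaces_up > 0 && (tick % 10 == 0);
        // Busy loop trigger: no interface left, but the task never blocks.
        bool spinning = interfaces_up == 0 && !blocks_when_idle;
        if (has_work || spinning) {
            ++r.high_ticks;  // high priority always wins the tick
        } else {
            ++r.low_ticks;   // lower priorities only run when it yields
        }
    }
    return r;
}
```

In this model, `simulate(1000, 0, false)` leaves no ticks at all for the lower priorities, while `simulate(1000, 0, true)` leaves all of them. That is the kind of starvation I'd expect from a spinning task at the top priority, assuming nothing else preempts it.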
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Phone 02333 / 833 5735 * Mobile 0176 / 206 989 26