<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
Everyone,<br>
<br>
TL;DR: update your local esp-idf clone from our esp-idf repository
before doing the next firmware build.<br>
<br>
While testing the Mongoose API lock, I found that the Mongoose task
priority would still occasionally get raised to 22, the wifi task
priority. Combined with the Mongoose task being essentially 100% busy
while outside the locked poll call, this led to Mongoose blocking
all other tasks, which caused at least one of the crash effects
observed (watchdog timeout from our events task). Adding a priority
fix to our netmanager eliminated quite a lot of these crashes.<br>
<br>
I then investigated further, as I thought the priority bug
originally came from the buggy POSIX mutex implementation we fixed
in July 2020 (→
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/pipermail/ovmsdev/2020-July/006971.html">http://lists.openvehicles.com/pipermail/ovmsdev/2020-July/006971.html</a>).<br>
<br>
It turned out I was wrong: the actual culprit is a bug in the
esp-idf spi_flash component. Each access to the SPI flash memory
needs to run at maximum priority. The spi_flash methods did this by
temporarily changing the current task priority and then reverting to
the previous priority, without taking into account that the task may
have had an inherited priority from an acquired mutex lock. Thus the
priority inherited from e.g. the Wifi task would stick.<br>
<br>
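To illustrate the mechanism, here is a minimal sketch in plain C. It
is a toy model, not the esp-idf or FreeRTOS code: task_model_t, the
helper functions, the base priority of 5 used in the note below and
MAX_PRIO are all hypothetical names and values; only the priority 22
comes from the observation above. The "fixed" variant shows one
possible way to avoid the problem and is not claimed to be identical
to the upstream patch.<br>
<pre>
```c
/* Toy model of a task's priorities (hypothetical, not the kernel TCB). */
typedef struct {
    unsigned base_prio; /* priority assigned to the task */
    unsigned cur_prio;  /* effective priority, may be raised by inheritance */
} task_model_t;

/* Assumption: stands in for the maximum priority used during flash access. */
#define MAX_PRIO 24u

/* Mutex inheritance: the holder is boosted to a higher waiter's priority. */
task_model_t inherit_from(task_model_t t, unsigned waiter_prio)
{
    if (waiter_prio > t.cur_prio) {
        t.cur_prio = waiter_prio;
    }
    return t;
}

/* Mutex release: the holder drops back to its base priority. */
task_model_t disinherit(task_model_t t)
{
    t.cur_prio = t.base_prio;
    return t;
}

/* Simplified explicit priority change: sets base and effective priority. */
task_model_t set_prio(task_model_t t, unsigned prio)
{
    t.base_prio = prio;
    t.cur_prio = prio;
    return t;
}

/* Buggy pattern: saves the effective priority and restores it as the base. */
task_model_t flash_op_buggy(task_model_t t)
{
    unsigned saved = t.cur_prio;   /* may be an inherited value */
    t = set_prio(t, MAX_PRIO);
    /* ... the flash access would happen here ... */
    return set_prio(t, saved);     /* inherited value becomes the new base */
}

/* Possible fix: save and restore base and effective priority separately. */
task_model_t flash_op_fixed(task_model_t t)
{
    unsigned saved_base = t.base_prio;
    unsigned saved_cur = t.cur_prio;
    t.cur_prio = MAX_PRIO;         /* raise only the effective priority */
    /* ... the flash access would happen here ... */
    t.base_prio = saved_base;
    t.cur_prio = saved_cur;
    return t;
}
```
</pre>
In this model, a task with base priority 5 that has inherited 22 and
then runs flash_op_buggy() keeps priority 22 even after disinherit(),
while with flash_op_fixed() it returns to 5 once the mutex is
released.<br>
<br>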
That was especially apparent and reproducible when opening the web
UI's Config→Firmware page, as that page handler reads the OTA
status, which in turn reads the current boot configuration from
flash. It also affected the AutoFlash task during firmware updates,
and there may be more paths: basically anything that runs an "ota"
command via a network channel.<br>
<br>
As config reads &amp; writes also use SPI flash, they could also
trigger the bug in any task locking a mutex that is also being
requested by a higher priority task.<br>
<br>
This SPI flash bug has been found by other esp-idf users, and has
finally been fixed, but only for esp-idf 4.3 &amp; higher:<br>
<ul>
<li><a class="moz-txt-link-freetext" href="https://github.com/espressif/esp-idf/issues/5116">https://github.com/espressif/esp-idf/issues/5116</a></li>
<li><a class="moz-txt-link-freetext" href="https://github.com/espressif/esp-idf/issues/7580">https://github.com/espressif/esp-idf/issues/7580</a></li>
</ul>
I have now backported the fix to our version, and haven't had a
single unplanned priority change since.<br>
<br>
Positive side effect: I now see no event queue overflows &amp; almost
no impact on overall performance during an OTA flash process.<br>
<br>
This is probably not the only cause of the remaining watchdog
timeouts, though -- crash reports will tell.<br>
<br>
Regards,<br>
Michael<br>
<br>
<pre class="moz-signature" cols="72">--
Michael Balzer * Am Rahmen 5 * D-58313 Herdecke
Fon 02330 9104094 * Handy 0176 20698926</pre>
</body>
</html>