<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
Followup to…<br>
<br>
<div class="moz-cite-prefix">Am 26.09.21 um 19:28 schrieb Michael
Balzer:<br>
</div>
<blockquote type="cite"
cite="mid:d2058325-7bdd-73a9-46dc-c8e9686d5602@expeedo.de">If so,
my conclusions so far would be:
<br>
<br>
a) we've got a real heap corruption issue that sometimes gets
triggered by the test. I've had it twice around the same place,
i.e. within the scheduled event processing. I'll check my code.
<br>
<br>
Am 26.09.21 um 18:46 schrieb Craig Leres:
<br>
<blockquote type="cite">Second crash on the production module,
CORRUPT HEAP after ~350 minutes.
<br>
<br>
OVMS# CORRUPT HEAP: Bad head at 0x3f8adc78. Expected 0xabba1234
got 0x3f8adcc0
<br>
abort() was called at PC 0x400844c3 on core 0
<br>
</blockquote>
</blockquote>
<br>
I've found the heap corruption source and a new bug class:<br>
<br>
The corruption was caused by a duplicate free() (here via delete),
which was basically impossible: it was the free() call for the event
message for a delayed event delivery. There is exactly one producer
(OvmsEvents::ScheduleEvent) and exactly one consumer
(OvmsEvents::SignalScheduledEvent), which is called exactly once --
when the single shot timer expires.<br>
<br>
In theory. <u>In reality, the timer callback occasionally gets
executed twice</u>. To exclude every possible race condition, I
enclosed both producer & consumer into a semaphore lock. I then
changed the code in order to clear the timer payload as soon as it's
read, and added a test for a NULL payload -- and voila, the timer
callback now gets occasionally called with a NULL payload, which is
also impossible as the allocation result is checked in the producer.<br>
<br>
I've had no luck reproducing this in a reduced test project, even
with multiple auto reload timers and distribution over both cores,
but I still see no other explanation than a bug in the FreeRTOS
timer service (or the Espressif FreeRTOS multi core adaption). Ah,
and yes, this occurs on the ESP32/r3 as well.<br>
<br>
You should be able to reproduce the effect using the same event test
loop as for the Duktape TypeError issue:
<a class="moz-txt-link-freetext" href="https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/474#issuecomment-1002687055">https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/474#issuecomment-1002687055</a><br>
<br>
My workaround prevents crashes and outputs a log entry when the NULL
payload is detected.<br>
<br>
Example log excerpt:<br>
<br>
<font face="monospace">script eval 'testcnt=0;
PubSub.subscribe("usr.testev", function(ev) { var
ms=Number(ev.substr(11))||10; if (++testcnt % (3*1000/ms) == 0)
print(ev + ": " + testcnt); OvmsEvents.Raise("usr.testev."+ms,
ms); })'<br>
script eval 'testcnt=0; OvmsEvents.Raise("usr.testev.10")'<br>
I (13493019) events: ScheduleEvent: creating new timer<br>
<b>W (13495919) events: SignalScheduledEvent: duplicate callback
invocation detected</b><br>
I (13497029) ovms-duk-util: [eval:1:] usr.testev.10: 300<br>
I (13501109) housekeeping: 2021-12-30 18:12:35 CET (RAM:
8b=64448-67004 32b=6472)<br>
--<br>
I (13521779) ovms-duk-util: [eval:1:] usr.testev.10: 2100<br>
I (13525809) ovms-duk-util: [eval:1:] usr.testev.10: 2400<br>
<b>W (13527579) events: SignalScheduledEvent: duplicate callback
invocation detected</b><b><br>
</b><b>W (13527629) events: SignalScheduledEvent: duplicate
callback invocation detected</b><br>
I (13529839) ovms-duk-util: [eval:1:] usr.testev.10: 2700<br>
I (13533329) ovms-server-v2: Incoming Msg: MP-0 AFA<br>
--<br>
I (13579149) ovms-duk-util: [eval:1:] usr.testev.10: 6300<br>
I (13583319) ovms-duk-util: [eval:1:] usr.testev.10: 6600<br>
<b>W (13584679) events: SignalScheduledEvent: duplicate callback
invocation detected</b><br>
I (13587439) ovms-duk-util: [eval:1:] usr.testev.10: 6900<br>
I (13591589) ovms-duk-util: [eval:1:] usr.testev.10: 7200<br>
--<br>
I (13714299) ovms-duk-util: [eval:1:] usr.testev.10: 16200<br>
I (13718339) ovms-duk-util: [eval:1:] usr.testev.10: 16500<br>
<b>W (13718719) events: SignalScheduledEvent: duplicate callback
invocation detected</b><br>
I (13722459) ovms-duk-util: [eval:1:] usr.testev.10: 16800<br>
I (13726509) ovms-duk-util: [eval:1:] usr.testev.10: 17100<br>
--<br>
I (13743149) ovms-duk-util: [eval:1:] usr.testev.10: 18300<br>
I (13747129) ovms-duk-util: [eval:1:] usr.testev.10: 18600<br>
<b>W (13748979) events: SignalScheduledEvent: duplicate callback
invocation detected</b><br>
I (13751299) ovms-duk-util: [eval:1:] usr.testev.10: 18900<br>
I (13755349) ovms-duk-util: [eval:1:] usr.testev.10: 19200<br>
--<br>
I (13784029) ovms-duk-util: [eval:1:] usr.testev.10: 21300<br>
I (13788059) ovms-duk-util: [eval:1:] usr.testev.10: 21600<br>
<b>W (13791409) events: SignalScheduledEvent: duplicate callback
invocation detected</b><br>
I (13792179) ovms-duk-util: [eval:1:] usr.testev.10: 21900<br>
I (13796239) ovms-duk-util: [eval:1:] usr.testev.10: 22200<br>
…<br>
</font><br>
<br>
The bug frequency differs from boot to boot, but as you can see can
be very high. I've had runs with ~ 1 occurrence every 300.000
callbacks, and runs like the above with ~ 1 every 3.000 callbacks.<br>
<br>
If this is a common effect with timer callbacks, that may cause some
of the remaining issues. It's possible this only happens with single
shot timers, haven't checked our periodic timers yet.<br>
<br>
Any additional input on this is welcome.<br>
<br>
Regards,<br>
Michael<br>
<br>
<pre class="moz-signature" cols="72">--
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
</body>
</html>