[Ovmsdev] Duplicate timer callbacks (was: Branch for-v3.3 network issues)

Michael Balzer dexter at expeedo.de
Fri Jan 14 00:03:37 HKT 2022


Everyone,

I've managed to reproduce the effect on a standard ESP32 development 
board with a simple test project only involving timers, some CPU/memory 
load & wifi networking.

I've tested both the standard esp-idf release 3.3 and the latest esp-idf 
release 4.4 (using gcc 8.4) for this, and the effect is still present.

→ Bug report: https://github.com/espressif/esp-idf/issues/8234

Attached are my test projects if you'd like to reproduce this or improve 
the test.

I haven't tested periodic timer callbacks yet for the effect. These are 
normally designed to run periodically, but if the timing is important 
(e.g. on some CAN transmission?), this could cause erroneous behaviour 
as well.

Regards,
Michael


Am 30.12.21 um 19:25 schrieb Michael Balzer:
> Followup to…
>
> Am 26.09.21 um 19:28 schrieb Michael Balzer:
>> If so, my conclusions so far would be:
>>
>> a) we've got a real heap corruption issue that sometimes gets 
>> triggered by the test. I've had it twice around the same place, i.e. 
>> within the scheduled event processing. I'll check my code.
>>
>> Am 26.09.21 um 18:46 schrieb Craig Leres:
>>> Second crash on the production module, CORRUPT HEAP after ~350 minutes.
>>>
>>> OVMS# CORRUPT HEAP: Bad head at 0x3f8adc78. Expected 0xabba1234 got 
>>> 0x3f8adcc0
>>> abort() was called at PC 0x400844c3 on core 0
>
> I've found the heap corruption source and a new bug class:
>
> The corruption was caused by a duplicate free() (here via delete), 
> which was basically impossible: it was the free() call for the event 
> message for a delayed event delivery. There is exactly one producer 
> (OvmsEvents::ScheduleEvent) and exactly one consumer 
> (OvmsEvents::SignalScheduledEvent), which is called exactly once -- 
> when the single shot timer expires.
>
> In theory. _In reality, the timer callback occasionally gets executed 
> twice_. To exclude every possible race condition, I enclosed both 
> producer & consumer into a semaphore lock. I then changed the code in 
> order to clear the timer payload as soon as it's read, and added a 
> test for a NULL payload -- and voila, the timer callback now gets 
> occasionally called with a NULL payload, which is also impossible as 
> the allocation result is checked in the producer.
>
> I've had no luck reproducing this in a reduced test project, even with 
> multiple auto reload timers and distribution over both cores, but I 
> still see no other explanation than a bug in the FreeRTOS timer 
> service (or the Espressif FreeRTOS multi core adaption). Ah, and yes, 
> this occurs on the ESP32/r3 as well.
>
> You should be able to reproduce the effect using the same event test 
> loop as for the Duktape TypeError issue: 
> https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/474#issuecomment-1002687055
>
> My workaround prevents crashes and outputs a log entry when the NULL 
> payload is detected.
>
> Example log excerpt:
>
> script eval 'testcnt=0; PubSub.subscribe("usr.testev", function(ev) { 
> var ms=Number(ev.substr(11))||10; if (++testcnt % (3*1000/ms) == 0) 
> print(ev + ": " + testcnt); OvmsEvents.Raise("usr.testev."+ms, ms); })'
> script eval 'testcnt=0; OvmsEvents.Raise("usr.testev.10")'
> I (13493019) events: ScheduleEvent: creating new timer
> *W (13495919) events: SignalScheduledEvent: duplicate callback 
> invocation detected*
> I (13497029) ovms-duk-util: [eval:1:] usr.testev.10: 300
> I (13501109) housekeeping: 2021-12-30 18:12:35 CET (RAM: 
> 8b=64448-67004 32b=6472)
> --
> I (13521779) ovms-duk-util: [eval:1:] usr.testev.10: 2100
> I (13525809) ovms-duk-util: [eval:1:] usr.testev.10: 2400
> *W (13527579) events: SignalScheduledEvent: duplicate callback 
> invocation detected**
> **W (13527629) events: SignalScheduledEvent: duplicate callback 
> invocation detected*
> I (13529839) ovms-duk-util: [eval:1:] usr.testev.10: 2700
> I (13533329) ovms-server-v2: Incoming Msg: MP-0 AFA
> --
> I (13579149) ovms-duk-util: [eval:1:] usr.testev.10: 6300
> I (13583319) ovms-duk-util: [eval:1:] usr.testev.10: 6600
> *W (13584679) events: SignalScheduledEvent: duplicate callback 
> invocation detected*
> I (13587439) ovms-duk-util: [eval:1:] usr.testev.10: 6900
> I (13591589) ovms-duk-util: [eval:1:] usr.testev.10: 7200
> --
> I (13714299) ovms-duk-util: [eval:1:] usr.testev.10: 16200
> I (13718339) ovms-duk-util: [eval:1:] usr.testev.10: 16500
> *W (13718719) events: SignalScheduledEvent: duplicate callback 
> invocation detected*
> I (13722459) ovms-duk-util: [eval:1:] usr.testev.10: 16800
> I (13726509) ovms-duk-util: [eval:1:] usr.testev.10: 17100
> --
> I (13743149) ovms-duk-util: [eval:1:] usr.testev.10: 18300
> I (13747129) ovms-duk-util: [eval:1:] usr.testev.10: 18600
> *W (13748979) events: SignalScheduledEvent: duplicate callback 
> invocation detected*
> I (13751299) ovms-duk-util: [eval:1:] usr.testev.10: 18900
> I (13755349) ovms-duk-util: [eval:1:] usr.testev.10: 19200
> --
> I (13784029) ovms-duk-util: [eval:1:] usr.testev.10: 21300
> I (13788059) ovms-duk-util: [eval:1:] usr.testev.10: 21600
> *W (13791409) events: SignalScheduledEvent: duplicate callback 
> invocation detected*
> I (13792179) ovms-duk-util: [eval:1:] usr.testev.10: 21900
> I (13796239) ovms-duk-util: [eval:1:] usr.testev.10: 22200
>>
>
> The bug frequency differs from boot to boot, but as you can see can be 
> very high. I've had runs with ~ 1 occurrence every 300.000 callbacks, 
> and runs like the above with ~ 1 every 3.000 callbacks.
>
> If this is a common effect with timer callbacks, that may cause some 
> of the remaining issues. It's possible this only happens with single 
> shot timers, haven't checked our periodic timers yet.
>
> Any additional input on this is welcome.
>
> Regards,
> Michael
>
> -- 
> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>
> _______________________________________________
> OvmsDev mailing list
> OvmsDev at lists.openvehicles.com
> http://lists.openvehicles.com/mailman/listinfo/ovmsdev

-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20220113/4b9b83d2/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test-timer-dupecall.zip
Type: application/zip
Size: 35758 bytes
Desc: not available
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20220113/4b9b83d2/attachment-0001.zip>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 203 bytes
Desc: OpenPGP digital signature
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20220113/4b9b83d2/attachment-0001.sig>


More information about the OvmsDev mailing list