[Ovmsdev] Heap corruption alerts & heap tracing
Michael Balzer
dexter at expeedo.de
Mon Dec 29 18:42:54 HKT 2025
The heap check + one-off alert function is now exposed for inclusion in
component code and can be called as a shell command to enable inclusion
in event scripts et al:
> #include "ovms_module.h"
>
> /**
>  * module_check_heap_alert: check for and send one-off alert notification on heap corruption
>  *
>  * To enable the check every 5 minutes, set config "module" "debug.heap" to "yes".
>  *
>  * To add custom checks, call from your code, or register event scripts as needed.
>  * Example: perform heap integrity check when the server V2 gets stopped:
>  *   vfs echo "module check alert" /store/events/server.v2.stopped/90-checkheap
>  *
>  * @param verbosity   -- optional: channel capacity (default 0)
>  * @param OvmsWriter  -- optional: channel (default NULL)
>  * @return heapok     -- false = heap corrupted/full
>  */
> extern bool module_check_heap_alert(int verbosity=0, OvmsWriter* writer=NULL);
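
A minimal sketch of calling the check from component code, e.g. at the end of a shutdown routine. Only the signature of module_check_heap_alert() is taken from the declaration above. The function body here is a stand-in so the example compiles outside the firmware, and MyComponentShutdown() and fake_heap_ok are hypothetical names:

```cpp
#include <cstdio>

class OvmsWriter;  // forward declaration only; the real class comes from the firmware

// STAND-IN for illustration: in the firmware this function is declared in
// ovms_module.h. The fake_heap_ok flag is hypothetical and only simulates
// the result of the real heap integrity check.
static bool fake_heap_ok = true;
bool module_check_heap_alert(int verbosity = 0, OvmsWriter* writer = nullptr) {
  (void)verbosity;
  (void)writer;
  return fake_heap_ok;
}

// Hypothetical component shutdown routine: run the heap check right after
// releasing resources, so a corruption can be attributed to this step.
bool MyComponentShutdown() {
  // ... stop tasks, free buffers, deregister handlers ...
  bool heapok = module_check_heap_alert();
  if (!heapok) {
    std::fprintf(stderr, "heap check failed after MyComponent shutdown\n");
  }
  return heapok;
}
```

Calling the check directly after a suspect step narrows the time window in which the corruption must have happened, compared to the 5 minute interval of the periodic check.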
Regards,
Michael
On 28.12.25 at 19:21, Michael Balzer via OvmsDev wrote:
> PS: when creating a debug build, you may also consider enabling the
> "comprehensive" corruption detection mode, as that will catch more cases.
>
> But be aware that this has a substantial impact on performance, so a
> user will probably not tolerate it for regular daily use.
>
> Regards,
> Michael
>
>
> On 28.12.25 at 18:26, Michael Balzer via OvmsDev wrote:
>> Everyone,
>>
>> from the crash reports
>> (https://ovms.dexters-web.de/firmware/developer/), most remaining
>> crashes seem to be caused by heap corruptions.
>>
>> Not all heap corruptions are easily detectable from the backtrace
>> analysis, and the component or action causing the corruption isn't
>> detectable at all that way. So I've added a debug option to enable a
>> regular heap integrity check every 5 minutes, with the module sending
>> an alert notification when a corruption has been detected. Example:
>>
>>> Heap corruption detected; reboot advised ASAP!
>>> Please forward including task records and system log:
>>>
>>> CORRUPT HEAP: Bad tail at 0x3f8e44f0 owner 0x3ffea9bc. Expected
>>> 0xbaad5678 got 0xbaad5600
>>
>> I also added a final heap integrity check to our crash handler, so
>> the crash debug records should now show exactly which crashes
>> occurred with a corrupted heap.
>>
>> In combination with the system log and the task log, that should give
>> us some more opportunities to narrow down the cause(s).
>>
>> I've also added task ownership to the heap corruption report. Note,
>> this needs my latest additions to our esp-idf fork, so take care to
>> pull these before building.
>>
>> Be aware, task ownership of a corrupted block doesn't necessarily
>> identify the task doing the corruption. If the tail canary is
>> compromised, and no other block located before that block is
>> compromised, it *may* be the owning task doing the out of bounds
>> write. But it may also be a use after free by some previous owner.
>> So take task ownership with a grain of salt.
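
To illustrate how a compromised tail canary shows up, here is a toy sketch. The two canary values are the ones used by the ESP-IDF heap poisoning (the tail value matches the alert example above). The block layout and the GuardedBlock class are simplified assumptions for illustration, not the real allocator structures:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Canary values as used by ESP-IDF heap poisoning.
constexpr uint32_t HEAD_CANARY = 0xABBA1234;
constexpr uint32_t TAIL_CANARY = 0xBAAD5678;

// Toy guarded block: [head canary][user data ...][tail canary].
// Hypothetical layout, much simpler than the real heap block header.
struct GuardedBlock {
  std::vector<uint8_t> raw;
  size_t size;

  explicit GuardedBlock(size_t n) : raw(n + 8), size(n) {
    std::memcpy(raw.data(), &HEAD_CANARY, 4);
    std::memcpy(raw.data() + 4 + n, &TAIL_CANARY, 4);
  }

  uint8_t* data() { return raw.data() + 4; }

  // Returns an empty string if both canaries are intact, else a report.
  std::string check() const {
    uint32_t head, tail;
    std::memcpy(&head, raw.data(), 4);
    std::memcpy(&tail, raw.data() + 4 + size, 4);
    char buf[96];
    if (head != HEAD_CANARY) {
      std::snprintf(buf, sizeof buf, "Bad head: expected 0x%08x got 0x%08x",
                    (unsigned)HEAD_CANARY, (unsigned)head);
      return buf;
    }
    if (tail != TAIL_CANARY) {
      std::snprintf(buf, sizeof buf, "Bad tail: expected 0x%08x got 0x%08x",
                    (unsigned)TAIL_CANARY, (unsigned)tail);
      return buf;
    }
    return "";
  }
};

// Simulate an off-by-one write: one zero byte stored directly behind the
// user data, as happens when a string terminator does not fit the buffer.
std::string simulate_overrun(size_t n) {
  GuardedBlock block(n);
  std::memset(block.data(), 'x', n);
  block.data()[n] = 0x00;  // lands on the first byte of the tail canary
  return block.check();
}
```

On a little-endian CPU, the single zero byte replaces the lowest byte of the tail canary, so the check reads 0xbaad5600 instead of 0xbaad5678. That is the pattern of the alert example above, which would be consistent with (but does not prove) a one byte overrun such as a missing slot for a string terminator.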
>>
>> The corruptions are most probably caused by some unclean shutdown of
>> a component or by undetected race conditions within a shutdown
>> procedure. The heap seems to be stable on modules with standard
>> configurations and components not being started & shut down on a
>> regular basis. The heap corruptions are especially present now with
>> Smart (SQ) vehicles -- as the Smart doesn't keep the 12V battery
>> charged from the main battery, most Smart users probably use the
>> power management to shut down Wifi and/or modem while parking.
>>
>> So our main focus should be on analysing what happens before the
>> corruption. Ask users reporting heap corruptions to provide their
>> system logs, and possibly also their task logs. To encourage enabling
>> these, I've added the config to the web UI (Config→Notifications).
>>
>>
>> Once you can reproduce (!) the corruption, heap tracing might provide
>> some more insight as to where exactly the corruption occurs.
>>
>> Heap tracing means recording all memory allocations and frees. This
>> adds a recording layer on top of the heap functions, so it comes with
>> some cost, even when inactive. CPU overhead is low, but stack
>> overhead may be an issue, so I think we should not enable heap
>> tracing by default for now, but rather use a debug build specifically
>> in cases where we think heap tracing might help.
>>
>> To enable heap tracing on some user device, I've reworked the ESP-IDF
>> heap tracing to enable remote execution and to also include the task
>> handles performing the allocations and deallocations.
>>
>> To enable heap tracing for a build:
>>
>>> * Under "make menuconfig", navigate to "Component settings" ->
>>> "Heap Memory Debugging" and set CONFIG_HEAP_TRACING
>>> <https://docs.espressif.com/projects/esp-idf/en/v3.3/api-reference/kconfig.html#config-heap-tracing>.
>>>
>>
>> There's also an option to set the number of stack backtrace frames.
>> Tracing with two frames is mostly useless in our (C++) context, as it
>> will normally only show some inner frames of the allocator. I've
>> tried raising that to 5, and got an immediate crash in the mdns
>> component. I assume raising the depth will need raising some stack
>> sizes. If you find a good compromise, please report.
>>
>> Heap tracing will work best in a reduced configuration. In normal
>> operation, my module fills a 500 records buffer within seconds. The
>> buffer is a FIFO, so will always contain the latest n allocations,
>> and the last entry in the dump is the newest one.
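
The buffer behaviour described above can be sketched as a simple ring buffer. This is not the actual ESP-IDF tracing code; TraceRecord and TraceRing are hypothetical names, and the record is reduced to two fields:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, reduced trace record (the real one also holds task handle,
// ccount and caller addresses).
struct TraceRecord {
  uint32_t address;
  size_t size;
};

// Fixed-capacity ring: once full, each new record overwrites the oldest
// one, so the buffer always holds the latest n records.
// Assumes capacity > 0.
class TraceRing {
 public:
  explicit TraceRing(size_t capacity) : buf_(capacity) {}

  void push(const TraceRecord& r) {
    buf_[next_] = r;
    next_ = (next_ + 1) % buf_.size();
    if (count_ < buf_.size()) {
      ++count_;
    } else {
      overflowed_ = true;  // an older record has just been overwritten
    }
  }

  // Returns the records ordered oldest to newest: the last entry is the
  // most recent one.
  std::vector<TraceRecord> dump() const {
    std::vector<TraceRecord> out;
    size_t start = (next_ + buf_.size() - count_) % buf_.size();
    for (size_t i = 0; i < count_; ++i) {
      out.push_back(buf_[(start + i) % buf_.size()]);
    }
    return out;
  }

  bool overflowed() const { return overflowed_; }

 private:
  std::vector<TraceRecord> buf_;
  size_t next_ = 0;
  size_t count_ = 0;
  bool overflowed_ = false;
};
```

With a busy system and a small buffer, the interesting allocation may already have been overwritten when the dump is taken, which is why a reduced configuration helps.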
>>
>> Reduced example dump:
>>
>>> OVMS# mod trace dump
>>> Heap tracing started/resumed at ccount 0x1c70b453 = logtime 19415
>>> 1000 allocations trace (1000 entry buffer)
>>> 258 bytes (@ 0x3fffb3a8) allocated CPU 1 task 0x3ffc92e8 ccount
>>> 0x89074e74 caller 0x4031d4f0:0x4031f553
>>> freed task 0x3ffc92e8 by 0x4031d514:0x4031f595
>>> 201 bytes (@ 0x3f8e92f4) allocated CPU 1 task 0x3fff2290 ccount
>>> 0x89218bbc caller 0x40131b12:0x4014633c
>>> freed task 0x3fff2290 by 0x402c4634:0x402cb14d
>>> 112 bytes (@ 0x3ffebc80) allocated CPU 1 task 0x3ffee918 ccount
>>> 0x8a25d47c caller 0x4029b455:0x4017a5dc
>>> freed task 0x3ffee918 by 0x4029b744:0x4017a5dc
>>> 112 bytes (@ 0x3ffebc80) allocated CPU 1 task 0x3ffee918 ccount
>>> 0x8db89c40 caller 0x4029b455:0x4017a5dc
>>> freed task 0x3ffee918 by 0x4029b744:0x4017a5dc
>>> 12 bytes (@ 0x3f85d0d4) allocated CPU 1 task 0x3ffdfcd0 ccount
>>> 0x8fccdfd8 caller 0x40131b12:0x4014633c
>>> freed task 0x3ffdfcd0 by 0x402c4634:0x401e0a04
>>> […]
>>> 12 bytes (@ 0x3f85d0d4) allocated CPU 1 task 0x3ffdfcd0 ccount
>>> 0x3cf429f4 caller 0x40131b12:0x4014633c
>>> freed task 0x3ffdfcd0 by 0x402c4634:0x401e0a04
>>> 112 bytes (@ 0x3ffebc80) allocated CPU 1 task 0x3ffee918 ccount
>>> 0x3fe1a0bc caller 0x4029b455:0x4017a5dc
>>> 229 bytes alive in trace (3/1000 allocations)
>>> total allocations 1207 total frees 3356
>>> (NB: Buffer has overflowed; so trace data is incomplete.)
>>
>> "ccount" is the ESP32 CCOUNT register (CPU cycle counter), so it
>> gives a rough indication of when the allocation occurred.
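
A small sketch for comparing two ccount values from a dump. The helper names are hypothetical, and the CPU clock is an assumption you need to fill in for the actual build (the ESP32 can run at 80, 160 or 240 MHz):

```cpp
#include <cstdint>

// Cycles elapsed between two CCOUNT samples. Unsigned 32 bit subtraction
// gives the correct result across a single counter wrap.
constexpr uint32_t ccount_delta(uint32_t earlier, uint32_t later) {
  return later - earlier;
}

// Convert a cycle count to microseconds for a given CPU clock in MHz.
// The clock value is an assumption supplied by the caller.
constexpr double cycles_to_us(uint32_t cycles, uint32_t cpu_mhz) {
  return static_cast<double>(cycles) / cpu_mhz;
}
```

Example using the first two records of the dump above: ccount_delta(0x89074e74, 0x89218bbc) is 1719624 cycles, i.e. roughly 7.2 ms at an assumed 240 MHz. Note that the 32 bit counter wraps about every 17.9 seconds at 240 MHz (about 26.8 seconds at 160 MHz), so only the distance between records that are close in time is unambiguous.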
>>
>>
>> Some more basics on heap debugging are also covered here:
>> https://docs.espressif.com/projects/esp-idf/en/v3.3/api-reference/system/heap_debug.html
>>
>> Regards,
>> Michael
>>
>> --
>> Michael Balzer * Am Rahmen 5 * D-58313 Herdecke
>> Phone 02330 9104094 * Mobile 0176 20698926
>>
>> _______________________________________________
>> OvmsDev mailing list
>> OvmsDev at lists.openvehicles.com
>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev