[Ovmsdev] Heap corruption alerts & heap tracing
Michael Balzer
dexter at expeedo.de
Mon Dec 29 02:21:38 HKT 2025
PS: when creating a debug build, you may also consider enabling the
"comprehensive" corruption detection mode, as that will catch more cases.
But be aware that this mode has a substantial impact on performance, so a
user will probably not tolerate it in regular daily use.
Regards,
Michael
On 28.12.25 at 18:26, Michael Balzer via OvmsDev wrote:
> Everyone,
>
> from the crash reports
> (https://ovms.dexters-web.de/firmware/developer/), most remaining
> crashes seem to be caused by heap corruptions.
>
> Not all heap corruptions are easily detectable from the backtrace
> analysis, and the component or action causing the corruption isn't
> detectable at all that way. So I've added a debug option to enable a
> regular heap integrity check every 5 minutes, with the module sending
> an alert notification when a corruption has been detected. Example:
>
>> Heap corruption detected; reboot advised ASAP!
>> Please forward including task records and system log:
>>
>> CORRUPT HEAP: Bad tail at 0x3f8e44f0 owner 0x3ffea9bc. Expected
>> 0xbaad5678 got 0xbaad5600
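
A note on reading the canary values in such a report: the ESP32 is little-endian, so the least significant byte of the tail canary is the byte at the lowest address, which (as far as I understand the ESP-IDF poisoning layout) sits directly behind the user data. Comparing the expected and the actual value in memory order gives a hint on how far an overrun reached. A minimal Python sketch for host-side analysis; the helper name clobbered_bytes is hypothetical and not part of OVMS or ESP-IDF:

```python
def clobbered_bytes(expected: int, got: int, width: int = 4) -> list:
    """Return the memory-order offsets (0 = lowest address) of differing canary bytes."""
    exp = expected.to_bytes(width, "little")  # little-endian: byte 0 is the lowest address
    act = got.to_bytes(width, "little")
    return [i for i in range(width) if exp[i] != act[i]]

# Values taken from the example report above
offsets = clobbered_bytes(0xBAAD5678, 0xBAAD5600)
print(offsets)  # [0]
```

Here only the first canary byte differs, and it was overwritten with 0x00. That would be consistent with a write one byte past the end of the block, for example a string terminator, but it is only a hint, not proof.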
>
> I also added a final heap integrity check to our crash handler, so the
> crash debug records should now show exactly which crashes occurred
> with a corrupted heap.
>
> In combination with the system log and the task log, that should give
> us some more opportunities to narrow down the cause(s).
>
> I've also added task ownership to the heap corruption report. Note,
> this needs my latest additions to our esp-idf fork, so take care to
> pull these before building.
>
> Be aware that task ownership of a corrupted block doesn't necessarily
> identify the task causing the corruption. If the tail canary is
> compromised, and no other block located before that block is
> compromised, it *may* be that task doing the out-of-bounds write. But
> it may also be a use-after-free by some previous owner. So take task
> ownership with a grain of salt.
>
> The corruptions are most probably caused by some unclean shutdown of a
> component or by undetected race conditions within a shutdown
> procedure. The heap seems to be stable on modules with standard
> configurations and components not being started & shut down on a
> regular basis. The heap corruptions are especially present now with
> Smart (SQ) vehicles -- as the Smart doesn't keep the 12V battery
> charged from the main battery, most Smart users probably use the power
> management to shut down Wifi and/or modem while parking.
>
> So our main focus should be on analysing what happens before the
> corruption. Ask users reporting heap corruptions to provide their
> system logs, and possibly also their task logs. To encourage enabling
> these, I've added the config to the web UI (Config→Notifications).
>
>
> Once you can reproduce (!) the corruption, heap tracing might provide
> some more insight as to where exactly the corruption occurs.
>
> Heap tracing means recording all memory allocations and frees. This
> adds a recording layer on top of the heap functions, so it comes with
> some cost, even when inactive. CPU overhead is low, but stack overhead
> may be an issue, so I think we should not enable heap tracing by
> default for now, but rather use a debug build specifically in cases
> where we think heap tracing might help.
>
> To enable heap tracing on some user device, I've reworked the ESP-IDF
> heap tracing to enable remote execution and to also include the task
> handles performing the allocations and deallocations.
>
> To enable heap tracing for a build:
>
>> * Under "make menuconfig", navigate to "Component settings" ->
>>   "Heap Memory Debugging" and set CONFIG_HEAP_TRACING
>>   <https://docs.espressif.com/projects/esp-idf/en/v3.3/api-reference/kconfig.html#config-heap-tracing>.
>>
>
> There's also an option to set the number of stack backtrace frames.
> Tracing with two frames is mostly useless in our (C++) context, as it
> will normally only show some inner frames of the allocator. I've tried
> raising that to 5, and got an immediate crash in the mdns component. I
> assume raising the depth will need raising some stack sizes. If you
> find a good compromise, please report.
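
To double-check that a given build really has tracing enabled, and with which backtrace depth, the generated sdkconfig can be inspected on the build host. A minimal Python sketch; CONFIG_HEAP_TRACING_STACK_DEPTH is the option name I know from the ESP-IDF v3.3 Kconfig and is an assumption here (our fork may differ), and the sample content is made up for illustration:

```python
import re

# Option names as in ESP-IDF v3.3 Kconfig; assumption: the OVMS fork uses the same names.
OPTIONS = ("CONFIG_HEAP_TRACING", "CONFIG_HEAP_TRACING_STACK_DEPTH")

def read_options(sdkconfig_text: str) -> dict:
    """Return the values of the options of interest; options that are not set map to None."""
    found = {name: None for name in OPTIONS}
    for line in sdkconfig_text.splitlines():
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if m and m.group(1) in found:
            found[m.group(1)] = m.group(2).strip('"')
    return found

# Illustrative sdkconfig excerpt, not copied from a real OVMS build
sample = """\
CONFIG_HEAP_TRACING=y
CONFIG_HEAP_TRACING_STACK_DEPTH=2
"""
opts = read_options(sample)
print(opts)
```

For a real build, pass the content of the sdkconfig file in the build directory instead of the sample string.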
>
> Heap tracing will work best in a reduced configuration. In normal
> operation, my module fills a 500-record buffer within seconds. The
> buffer is a FIFO, so it always contains the latest n allocations, and
> the last entry in the dump is the newest one.
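
To illustrate why a busy configuration is a problem: once the buffer is full, every new record evicts the oldest one, so the allocation of interest must be among the last n events before the dump. A minimal Python sketch of that FIFO behaviour; this is only a model using hypothetical numbers, not the ESP-IDF implementation:

```python
from collections import deque

def retained_window(total_records: int, capacity: int) -> tuple:
    """Simulate a FIFO trace buffer; return (oldest, newest) record index still retained."""
    buf = deque(maxlen=capacity)  # deque drops the oldest entry once capacity is reached
    for seq in range(total_records):
        buf.append(seq)
    return buf[0], buf[-1]

# 1200 records written into a 500-record buffer: the first 700 are gone
print(retained_window(1200, 500))  # (700, 1199)
```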
>
> Reduced example dump:
>
>> OVMS# mod trace dump
>> Heap tracing started/resumed at ccount 0x1c70b453 = logtime 19415
>> 1000 allocations trace (1000 entry buffer)
>> 258 bytes (@ 0x3fffb3a8) allocated CPU 1 task 0x3ffc92e8 ccount
>> 0x89074e74 caller 0x4031d4f0:0x4031f553
>> freed task 0x3ffc92e8 by 0x4031d514:0x4031f595
>> 201 bytes (@ 0x3f8e92f4) allocated CPU 1 task 0x3fff2290 ccount
>> 0x89218bbc caller 0x40131b12:0x4014633c
>> freed task 0x3fff2290 by 0x402c4634:0x402cb14d
>> 112 bytes (@ 0x3ffebc80) allocated CPU 1 task 0x3ffee918 ccount
>> 0x8a25d47c caller 0x4029b455:0x4017a5dc
>> freed task 0x3ffee918 by 0x4029b744:0x4017a5dc
>> 112 bytes (@ 0x3ffebc80) allocated CPU 1 task 0x3ffee918 ccount
>> 0x8db89c40 caller 0x4029b455:0x4017a5dc
>> freed task 0x3ffee918 by 0x4029b744:0x4017a5dc
>> 12 bytes (@ 0x3f85d0d4) allocated CPU 1 task 0x3ffdfcd0 ccount
>> 0x8fccdfd8 caller 0x40131b12:0x4014633c
>> freed task 0x3ffdfcd0 by 0x402c4634:0x401e0a04
>> […]
>> 12 bytes (@ 0x3f85d0d4) allocated CPU 1 task 0x3ffdfcd0 ccount
>> 0x3cf429f4 caller 0x40131b12:0x4014633c
>> freed task 0x3ffdfcd0 by 0x402c4634:0x401e0a04
>> 112 bytes (@ 0x3ffebc80) allocated CPU 1 task 0x3ffee918 ccount
>> 0x3fe1a0bc caller 0x4029b455:0x4017a5dc
>> 229 bytes alive in trace (3/1000 allocations)
>> total allocations 1207 total frees 3356
>> (NB: Buffer has overflowed; so trace data is incomplete.)
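
For longer dumps it helps to turn the text into records that can be filtered by task or address. A minimal Python sketch, assuming the record layout shown in the dump above; the function name parse_trace is hypothetical, and the regular expression is derived only from this example, so it may need adjusting for other backtrace depths or output variants. Mail quote markers and line wraps are removed before matching:

```python
import re
from collections import Counter

HEX = r"0x[0-9a-fA-F]+"
RECORD = re.compile(
    rf"(?P<size>\d+) bytes \(@ (?P<addr>{HEX})\) allocated CPU (?P<cpu>\d+) "
    rf"task (?P<task>{HEX}) ccount (?P<ccount>{HEX}) "
    rf"caller (?P<caller>{HEX}(?::{HEX})*)"
    rf"(?: freed task (?P<freed_task>{HEX}) by (?P<freed_by>{HEX}(?::{HEX})*))?"
)

def parse_trace(text: str) -> list:
    """Parse a 'mod trace dump' text into a list of dicts, one per allocation record."""
    # Strip quote markers, then collapse whitespace so wrapped records become contiguous.
    lines = [re.sub(r"^[>\s]+", "", ln) for ln in text.splitlines()]
    flat = " ".join(" ".join(lines).split())
    return [m.groupdict() for m in RECORD.finditer(flat)]

# Two records copied from the dump above, including the mail line wraps
sample = """\
>> 258 bytes (@ 0x3fffb3a8) allocated CPU 1 task 0x3ffc92e8 ccount
>> 0x89074e74 caller 0x4031d4f0:0x4031f553
>> freed task 0x3ffc92e8 by 0x4031d514:0x4031f595
>> 112 bytes (@ 0x3ffebc80) allocated CPU 1 task 0x3ffee918 ccount
>> 0x3fe1a0bc caller 0x4029b455:0x4017a5dc
>> 229 bytes alive in trace (3/1000 allocations)
"""
records = parse_trace(sample)
alive = [r for r in records if r["freed_task"] is None]  # not freed within the trace
per_task = Counter(r["task"] for r in records)           # allocations per task handle
print(len(records), [r["addr"] for r in alive], dict(per_task))
```

The task handles can then be matched against the task log, and the caller addresses resolved against the ELF file of the matching build.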
>
> "ccount" is the ESP32 CCOUNT register (CPU cycles), so it gives a
> rough indication of when the allocation occurred.
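
CCOUNT is a 32-bit counter, so it wraps around quickly (about every 27 seconds at 160 MHz), and the distance between two records is only meaningful if fewer than 2^32 cycles passed between them. A minimal Python sketch of the wraparound-safe difference; the function names are hypothetical and the 160 MHz default is an assumption, so use the actual CPU frequency of the module:

```python
CCOUNT_MASK = 0xFFFFFFFF  # CCOUNT is a 32-bit cycle counter

def cycles_between(earlier: int, later: int) -> int:
    """Elapsed cycles from earlier to later; valid only if fewer than 2**32 cycles passed."""
    return (later - earlier) & CCOUNT_MASK

def seconds_between(earlier: int, later: int, cpu_hz: int = 160_000_000) -> float:
    """Convert the cycle distance to seconds for an assumed CPU frequency."""
    return cycles_between(earlier, later) / cpu_hz

# First two records of the dump above
print(cycles_between(0x89074E74, 0x89218BBC))   # 1719624
print(seconds_between(0x89074E74, 0x89218BBC))  # roughly 0.0107 s at 160 MHz
```

Across the elided part of the dump the counter may have wrapped several times, so differences spanning larger gaps should be cross-checked against the system log timestamps.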
>
>
> Some more basics on heap debugging are also covered here:
> https://docs.espressif.com/projects/esp-idf/en/v3.3/api-reference/system/heap_debug.html
>
> Regards,
> Michael
>
> --
> Michael Balzer * Am Rahmen 5 * D-58313 Herdecke
> Fon 02330 9104094 * Handy 0176 20698926