[Ovmsdev] Memory leak
Stephen Casner
casner at acm.org
Thu Apr 26 14:16:46 HKT 2018
On Thu, 26 Apr 2018, Mark Webb-Johnson wrote:
> I’m seeing this also, when I stress-test by sending in a large
> canbus dump into retools:
>
> CORRUPT HEAP: Bad head at 0x3f80e3c0. Expected 0xabba1234 got 0x3f80eb00
> assertion "head != NULL" failed: file "/Users/mark/esp/esp-idf/components/heap/./multi_heap_poisoning.c", line 229, function: multi_heap_realloc
Indeed, this is the same error signature as what I was seeing, but the
reason I saw it was because of a mistake in my code. I had not
included the "if (size == 0)" test in ExternalRamRealloc(). Since
mongoose calls MG_REALLOC with size == 0, that caused
heap_caps_realloc() to do heap_caps_free() and return NULL, so my code
called realloc() on that same ptr that had just been freed.
But before my commit this morning I fixed that bug by adding my own test for
size==0 so I could do my own heap_caps_free() and not call calling
heap_caps_realloc().
So, as I stare at the code now, I don't see anything yet. Also, your
crash occurred when MG_REALLOC was called with nonzero size.
> That code path does a lot of realloc’s.
>
> If I switch back to use normal malloc/calloc/realloc, it still crashes.
>
> If I disable the malloc #defined in mongoose mg_locals.h, it still crashes.
>
> The only way I can avoid this is to consume ALL the data on the
> MG_EV_RECV event, which means realloc is presumably not called (so
> often?) within mongoose.
>
> I assumed the conclusion is that fundamentally realloc is broken in
> the ESP IDF version we use.
So, by switching back to the old code you've also verified that it was
not my new code. But why are you seeing this crash now when we have
not seen it before? Is there any way that my moving
ExternalRamMalloc() to a C namespace could cause trouble in some
allocation outside of mongoose leaving a bad heap for the mongoose
call to trip over?
> I tried updating our ESP-IDF to espressif/master, but get compile
> errors in build/heap:
>
> CC build/heap/multi_heap_poisoning.o
> In file included from /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:27:0:
> /Users/mark/esp/esp-idf/components/heap/multi_heap_platform.h:66:32: error: unknown type name 'TaskHandle_t'
> #define MULTI_HEAP_BLOCK_OWNER TaskHandle_t task;
> ^
> /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:50:5: note: in expansion of macro 'MULTI_HEAP_BLOCK_OWNER'
> MULTI_HEAP_BLOCK_OWNER
> ^
> /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c: In function 'poison_allocated_region':
> /Users/mark/esp/esp-idf/components/heap/multi_heap_platform.h:67:57: error: implicit declaration of function 'xTaskGetCurrentTaskHandle' [-Werror=implicit-function-declaration]
> #define MULTI_HEAP_SET_BLOCK_OWNER(HEAD) (HEAD)->task = xTaskGetCurrentTaskHandle()
> ^
> /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:71:5: note: in expansion of macro 'MULTI_HEAP_SET_BLOCK_OWNER'
> MULTI_HEAP_SET_BLOCK_OWNER(head);
> ^
> /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c: In function 'multi_heap_get_block_owner':
> /Users/mark/esp/esp-idf/components/heap/multi_heap_platform.h:68:42: warning: return makes pointer from integer without a cast [-Wint-conversion]
> #define MULTI_HEAP_GET_BLOCK_OWNER(HEAD) ((HEAD)->task)
> ^
> /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:284:12: note: in expansion of macro 'MULTI_HEAP_GET_BLOCK_OWNER'
> return MULTI_HEAP_GET_BLOCK_OWNER((poison_head_t*)multi_heap_get_block_address_impl(block));
> ^
> cc1: some warnings being treated as errors
> make[1]: *** [multi_heap_poisoning.o] Error 1
Those three macros are part of the code I provided in the pull
request, but in 4c532a59b2503dd8bc60d437bb65961676db9e55 we already
updated to a version of expressif/esp-idf master with the pull request
included. It seems that the references in the macros are to other
code in ESP-IDF that used to be valid but now may not be. Perhaps
they made a change but didn't test compilation with
CONFIG_HEAP_TASK_TRACKING=y?
> I then tried:
>
> +CONFIG_HEAP_POISONING_DISABLED=y
> +CONFIG_HEAP_POISONING_LIGHT=
> -CONFIG_HEAP_TASK_TRACKING=y
>
> But no difference. Still crashes.
Do you mean that it then compiles OK, but when you run the stress test
it crashes? If it compiles OK, then my hypothesis that they made a
change and didn't test with CONFIG_HEAP_TASK_TRACKING=y might be
right.
> Also tried:
>
> -CONFIG_FREERTOS_USE_TRACE_FACILITY=y
> -CONFIG_FREERTOS_USE_STATS_FORMATTING_FUNCTIONS=
> +CONFIG_FREERTOS_USE_TRACE_FACILITY=
>
> But no difference. Still crashes.
>
> Although not in a simply way:
>
> OVMS# module memory
> Free 8-bit 119432/281952, 32-bit 8020/24276, SPIRAM 4141392/4194252
>
> OVMS# test realloc
> First check heap integrity...
> Now allocate 4KB RAM...
> Check heap integrity...
> Now re-allocate bigger, 1,000 times...
> Check heap integrity...
> Now re-allocate smaller, 1,000 times...
> Check heap integrity...
> And free the buffer...
> Final check of heap integrity…
>
> OVMS# module memory
> Free 8-bit 119432/281952, 32-bit 8020/24276, SPIRAM 4140892/4194252
>
> Maybe something inside mongoose itself? As that is the only place we are seeing this.
You say it still crashes, but you don't show a crash here?
-- Steve
More information about the OvmsDev
mailing list