[Ovmsdev] Memory leak

Mark Webb-Johnson mark at webb-johnson.net
Thu Apr 26 17:03:34 HKT 2018


>  But why are you seeing this crash now when we have
> not seen it before? 

I haven’t really done much with that code path in the past. It involves a high volume of data being uploaded to the module, and the way my code was handling it (badly) resulted in a lot of realloc’s.

For the CRTD format, I read line-by-line. The issue is that the mongoose buffer presented to me may not contain a complete line.

Mongoose was sending me an MG_EV_RECV with about 1KB in the buffer.
I removed about 30 bytes (one line), and returned.
That meant mongoose realloc’d down to 1KB - 30 bytes, then called me again. Rinse and repeat.
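
In code terms, I was doing roughly this (a simplified sketch of the old handler, not the exact code; ProcessLine() is a stand-in name for my CRTD line parser):

#include <string.h>
#include "mongoose.h"

static void ProcessLine(const char* line, size_t len);  // stand-in for the CRTD parser

static void MongooseHandler(struct mg_connection *nc, int ev, void *p)
  {
  if (ev == MG_EV_RECV)
    {
    struct mbuf *io = &nc->recv_mbuf;
    char *eol = (char*)memchr(io->buf, '\n', io->len);
    if (eol != NULL)
      {
      ProcessLine(io->buf, (size_t)(eol - io->buf)); // consume just one ~30 byte line
      mbuf_remove(io, (eol - io->buf) + 1);          // shrink-realloc of the ~1KB buffer...
      }
    }                                                // ...then mongoose refills and fires again
  }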

I first noticed this because SPI RAM memory consumption was going through the roof during retools canbus dump simulations. I was handling the incoming messages so slowly that they were arriving faster than I could process them. Lesson learned: when mongoose tells you it has data, consume it all, otherwise mongoose will just keep receiving (a sensible framework would back off receiving when the consumer can’t keep up).

My change now is to consume all I can on MG_EV_RECV (in a while loop); but if half a line is left over, it stays in the mongoose buffer, and that still causes a realloc.

To get around this, I’ll have to do my own line buffering (so I can completely drain the mongoose buffer on MG_EV_RECV). A hassle, but I’ll try to get it done tonight. That will resolve my problem, but I don’t think it addresses the core mongoose heap corruption issue.
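
Something like this is what I have in mind (a sketch only; m_buffer is a hypothetical std::string kept per connection, and ProcessLine() is again a stand-in for the CRTD parser):

#include <string>
#include "mongoose.h"

static std::string m_buffer;  // hypothetical per-connection line buffer

static void ProcessLine(const std::string& line);  // stand-in for the CRTD parser

static void MongooseHandler(struct mg_connection *nc, int ev, void *p)
  {
  if (ev == MG_EV_RECV)
    {
    struct mbuf *io = &nc->recv_mbuf;
    m_buffer.append(io->buf, io->len); // take everything mongoose has...
    mbuf_remove(io, io->len);          // ...so its buffer drains completely (no shrink realloc)
    size_t eol;
    while ((eol = m_buffer.find('\n')) != std::string::npos)
      {
      ProcessLine(m_buffer.substr(0, eol)); // handle each complete line
      m_buffer.erase(0, eol + 1);
      }
    // any trailing partial line just waits in m_buffer for the next event
    }
  }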

My code is in retools.cpp, in the MG_EV_RECV part of re::MongooseHandler.

> You say it still crashes, but you don't show a crash here?


I can get it to crash in retools. Repeatedly. Easily, and every time:

1. Start up wifi.
2. Start a vehicle mode, on can1 (maybe not necessary, but I do it anyway).
3. OVMS# re start
4. OVMS# re serve mode simulate
5. $ cat my-canbus-dump.crtd | nc TESTBOX.local 3000
6. Sit back and watch the crash.

If you don’t have a CRTD file, just use this snippet:

593.138 1R11 7E8 00 00 00 00 00 00 00 00
593.138 1R11 202 4D 5F 07 9D C4 09 3F 23
593.138 1R11 20A 00 01 01 83 D1 3E 00 00
593.138 1R11 228 40 C0 30 F8
593.138 1R11 6F2 1F 18 08 F9 A1 80 08 1F
593.138 1R11 212 D8 09 12 71 80

Make 1,000 copies of it, appending them all to one .crtd file. That can be your test data.
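
For example (assuming the snippet above is saved as snippet.crtd; the name is just for illustration):

$ for i in $(seq 1000); do cat snippet.crtd; done > my-canbus-dump.crtd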

So, I then wrote a ‘test realloc’ command in test_framework.cpp to see if I could repeat it. I couldn’t, which leads me to suspect the issue is in mongoose (or in the way I am only consuming part of the mongoose buffer).
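
The test does roughly this (a simplified sketch of what went into test_framework.cpp, not the exact code; heap_caps_check_integrity_all() is the ESP-IDF heap check I use for the integrity steps):

#include <stdlib.h>
#include <assert.h>
#include "esp_heap_caps.h"

void test_realloc(void)
  {
  assert(heap_caps_check_integrity_all(true)); // first check heap integrity
  char* buf = (char*)malloc(4096);             // allocate 4KB RAM
  assert(heap_caps_check_integrity_all(true));
  for (int i = 1; i <= 1000; i++)              // re-allocate bigger, 1,000 times
    buf = (char*)realloc(buf, 4096 + i);
  assert(heap_caps_check_integrity_all(true));
  for (int i = 999; i >= 0; i--)               // re-allocate smaller, 1,000 times
    buf = (char*)realloc(buf, 4096 + i);
  assert(heap_caps_check_integrity_all(true));
  free(buf);                                   // and free the buffer
  assert(heap_caps_check_integrity_all(true)); // final check of heap integrity
  }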

Regards, Mark.

> On 26 Apr 2018, at 2:16 PM, Stephen Casner <casner at acm.org> wrote:
> 
> On Thu, 26 Apr 2018, Mark Webb-Johnson wrote:
> 
>> I’m seeing this also, when I stress-test by sending a large
>> canbus dump into retools:
>> 
>> CORRUPT HEAP: Bad head at 0x3f80e3c0. Expected 0xabba1234 got 0x3f80eb00
>> assertion "head != NULL" failed: file "/Users/mark/esp/esp-idf/components/heap/./multi_heap_poisoning.c", line 229, function: multi_heap_realloc
> 
> Indeed, this is the same error signature as what I was seeing, but the
> reason I saw it was a mistake in my code.  I had not included the
> "if (size == 0)" test in ExternalRamRealloc().  Since
> mongoose calls MG_REALLOC with size == 0, that caused
> heap_caps_realloc() to do heap_caps_free() and return NULL, so my code
> called realloc() on that same ptr that had just been freed.
> 
> But before my commit this morning I fixed that bug by adding my own
> test for size == 0, so I could do my own heap_caps_free() and not call
> heap_caps_realloc().
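
(For reference, the size == 0 guard you describe would look roughly like this; a sketch only, and the MALLOC_CAP_SPIRAM flag is my assumption about what the wrapper requests:)

#include <stddef.h>
#include "esp_heap_caps.h"

extern "C" void* ExternalRamRealloc(void* ptr, size_t size)
  {
  if (size == 0)
    {
    // heap_caps_realloc(ptr, 0, ...) would free ptr and return NULL,
    // after which ptr must never be touched again - so free it ourselves
    if (ptr) heap_caps_free(ptr);
    return NULL;
    }
  return heap_caps_realloc(ptr, size, MALLOC_CAP_SPIRAM);
  }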
> 
> So, as I stare at the code now, I don't see anything yet.  Also, your
> crash occurred when MG_REALLOC was called with nonzero size.
> 
>> That code path does a lot of realloc’s.
>> 
>> If I switch back to use normal malloc/calloc/realloc, it still crashes.
>> 
>> If I disable the malloc #defines in mongoose mg_locals.h, it still crashes.
>> 
>> The only way I can avoid this is to consume ALL the data on the
>> MG_EV_RECV event, which means realloc is presumably not called (so
>> often?) within mongoose.
>> 
>> I assume the conclusion is that realloc is fundamentally broken in
>> the ESP-IDF version we use.
> 
> So, by switching back to the old code you've also verified that it was
> not my new code.  But why are you seeing this crash now when we have
> not seen it before?  Is there any way that my moving
> ExternalRamMalloc() to a C namespace could cause trouble in some
> allocation outside of mongoose leaving a bad heap for the mongoose
> call to trip over?
> 
>> I tried updating our ESP-IDF to espressif/master, but get compile
>> errors in build/heap:
>> 
>> CC build/heap/multi_heap_poisoning.o
>> In file included from /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:27:0:
>> /Users/mark/esp/esp-idf/components/heap/multi_heap_platform.h:66:32: error: unknown type name 'TaskHandle_t'
>> #define MULTI_HEAP_BLOCK_OWNER TaskHandle_t task;
>>                                ^
>> /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:50:5: note: in expansion of macro 'MULTI_HEAP_BLOCK_OWNER'
>>     MULTI_HEAP_BLOCK_OWNER
>>     ^
>> /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c: In function 'poison_allocated_region':
>> /Users/mark/esp/esp-idf/components/heap/multi_heap_platform.h:67:57: error: implicit declaration of function 'xTaskGetCurrentTaskHandle' [-Werror=implicit-function-declaration]
>> #define MULTI_HEAP_SET_BLOCK_OWNER(HEAD) (HEAD)->task = xTaskGetCurrentTaskHandle()
>>                                                         ^
>> /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:71:5: note: in expansion of macro 'MULTI_HEAP_SET_BLOCK_OWNER'
>>     MULTI_HEAP_SET_BLOCK_OWNER(head);
>>     ^
>> /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c: In function 'multi_heap_get_block_owner':
>> /Users/mark/esp/esp-idf/components/heap/multi_heap_platform.h:68:42: warning: return makes pointer from integer without a cast [-Wint-conversion]
>> #define MULTI_HEAP_GET_BLOCK_OWNER(HEAD) ((HEAD)->task)
>>                                          ^
>> /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:284:12: note: in expansion of macro 'MULTI_HEAP_GET_BLOCK_OWNER'
>>     return MULTI_HEAP_GET_BLOCK_OWNER((poison_head_t*)multi_heap_get_block_address_impl(block));
>>            ^
>> cc1: some warnings being treated as errors
>> make[1]: *** [multi_heap_poisoning.o] Error 1
> 
> Those three macros are part of the code I provided in the pull
> request, but in 4c532a59b2503dd8bc60d437bb65961676db9e55 we already
> updated to a version of espressif/esp-idf master with the pull request
> included.  It seems that the references in the macros are to other
> code in ESP-IDF that used to be valid but now may not be.  Perhaps
> they made a change but didn't test compilation with
> CONFIG_HEAP_TASK_TRACKING=y?
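
(Both missing names come from the FreeRTOS task API, so my guess at a fix would be something like this near the top of multi_heap_platform.h; untested:)

#include "freertos/FreeRTOS.h"  // prerequisite for any FreeRTOS header
#include "freertos/task.h"      // declares TaskHandle_t and xTaskGetCurrentTaskHandle()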
> 
>> I then tried:
>> 
>> +CONFIG_HEAP_POISONING_DISABLED=y
>> +CONFIG_HEAP_POISONING_LIGHT=
>> -CONFIG_HEAP_TASK_TRACKING=y
>> 
>> But no difference. Still crashes.
> 
> Do you mean that it then compiles OK, but when you run the stress test
> it crashes?  If it compiles OK, then my hypothesis that they made a
> change and didn't test with CONFIG_HEAP_TASK_TRACKING=y might be
> right.
> 
>> Also tried:
>> 
>> -CONFIG_FREERTOS_USE_TRACE_FACILITY=y
>> -CONFIG_FREERTOS_USE_STATS_FORMATTING_FUNCTIONS=
>> +CONFIG_FREERTOS_USE_TRACE_FACILITY=
>> 
>> But no difference. Still crashes.
>> 
>> Although not in a simple way:
>> 
>> OVMS# module memory
>> Free 8-bit 119432/281952, 32-bit 8020/24276, SPIRAM 4141392/4194252
>> 
>> OVMS# test realloc
>> First check heap integrity...
>> Now allocate 4KB RAM...
>> Check heap integrity...
>> Now re-allocate bigger, 1,000 times...
>> Check heap integrity...
>> Now re-allocate smaller, 1,000 times...
>> Check heap integrity...
>> And free the buffer...
>> Final check of heap integrity…
>> 
>> OVMS# module memory
>> Free 8-bit 119432/281952, 32-bit 8020/24276, SPIRAM 4140892/4194252
>> 
>> Maybe something inside mongoose itself? That is the only place we are seeing this.
> 
> You say it still crashes, but you don't show a crash here?
> 
>                                                        -- Steve
