[Ovmsdev] Memory leak

Thu Apr 26 22:13:36 HKT 2018

The code I’ve just committed re-factors the can dump system, and retools, to always consume all the data from MG_EV_RECV. That reduces memory consumption, and works around the realloc issue.

It seems to work in my limited testing here, and performance is acceptable. I can transmit a 1MB CRTD file in about 10 seconds. About 25,000 CAN bus messages.

The CRTB format (binary) will require less encoding/decoding and should be faster still.

Regards, Mark.

> On 26 Apr 2018, at 5:03 PM, Mark Webb-Johnson <mark at webb-johnson.net> wrote:
> 
> 
>>  But why are you seeing this crash now when we have
>> not seen it before? 
> 
> I haven’t really done much with that code path in the past. It is a high volume of data being uploaded to the module, and due to the way my code was handling it (badly), it resulted in a lot of realloc’s.
> 
> For CRTD format, I read line-by-line. The issue is that the mongoose buffer presented may not have a complete line in it.
> 
> Mongoose was sending me a MG_EV_RECV with about 1KB in the buffer.
> I removed about 30 bytes (one line), and returned.
> That meant mongoose realloc’d down to 1KB - 30 bytes, then called me again. Rinse and repeat.
> 
> I first noticed this because SPI RAM memory consumption was going through the roof during retools canbus dump simulations. I was handling the incoming messages so slowly, that they were coming in faster than they were handled. Lesson learned: when mongoose tells you it has data, consume it otherwise mongoose will keep receiving (a sensible framework would back off the receiving if the consumer was not fast enough).
> 
> My change now is to consume all I can on the MG_EV_RECV (in a while loop), but if there is half a line left over that gets left in the mongoose buffer and that causes a realloc.
> 
> To get around this, I’ll have to do my own line buffering (so I can completely drain the mongoose buffer on MG_EV_RECV). A hassle, but I’ll try to get it done tonight. That will resolve my problem, but I don’t think it addresses the core mongoose heap corruption issue.
> 
> My code is in retools.cpp re::MongooseHandler MG_EV_RECV part.
> 
>> You say it still crashes, but you don't show a crash here?
> 
> 
> I can get it to crash in retools. Repeatedly. Easy and every time.
> 
> Start a wifi
> Start a vehicle mode, on can1 (maybe not necessary, but I do it anyway)
> OVMS# re start
> OVMS# re serve mode simulate
> $ cat my-canbus-dump.crtd | nc TESTBOX.local 3000
> Sit back and watch the crash
> 
> If you don’t have a CRTD file, just use this snippet:
> 
> 593.138 1R11 7E8 00 00 00 00 00 00 00 00
> 593.138 1R11 202 4D 5F 07 9D C4 09 3F 23
> 593.138 1R11 20A 00 01 01 83 D1 3E 00 00
> 593.138 1R11 228 40 C0 30 F8
> 593.138 1R11 6F2 1F 18 08 F9 A1 80 08 1F
> 593.138 1R11 212 D8 09 12 71 80
> 
> Make 1,000 copies of it, appending them all to one .crtd file. That can be your test data.
> 
> So, I then wrote a ‘test realloc’ in test_framework.cpp to see if I could repeat it. I couldn’t. Which leads me to suspect the issue is in mongoose (or with the way I am only consuming part of the mongoose buffer).
> 
> Regards, Mark.
> 
>> On 26 Apr 2018, at 2:16 PM, Stephen Casner <casner at acm.org <mailto:casner at acm.org>> wrote:
>> 
>> On Thu, 26 Apr 2018, Mark Webb-Johnson wrote:
>> 
>>> I’m seeing this also, when I stress-test by sending in a large
>>> canbus dump into retools:
>>> 
>>> CORRUPT HEAP: Bad head at 0x3f80e3c0. Expected 0xabba1234 got 0x3f80eb00
>>> assertion "head != NULL" failed: file "/Users/mark/esp/esp-idf/components/heap/./multi_heap_poisoning.c", line 229, function: multi_heap_realloc
>> 
>> Indeed, this is the same error signature as what I was seeing, but the
>> reason I saw it was because of a mistake in my code.  I had not
>> included the "if (size == 0)" test in ExternalRamRealloc().  Since
>> mongoose calls MG_REALLOC with size == 0, that caused
>> heap_caps_realloc() to do heap_caps_free() and return NULL, so my code
>> called realloc() on that same ptr that had just been freed.
>> 
>> But before my commit this morning I fixed that bug by adding my own test for
>> size==0 so I could do my own heap_caps_free() and not call calling
>> heap_caps_realloc().
>> 
>> So, as I stare at the code now, I don't see anything yet.  Also, your
>> crash occurred when MG_REALLOC was called with nonzero size.
>> 
>>> That code path does a lot of realloc’s.
>>> 
>>> If I switch back to use normal malloc/calloc/realloc, it still crashes.
>>> 
>>> If I disable the malloc #defined in mongoose mg_locals.h, it still crashes.
>>> 
>>> The only way I can avoid this is to consume ALL the data on the
>>> MG_EV_RECV event, which means realloc is presumably not called (so
>>> often?) within mongoose.
>>> 
>>> I assumed the conclusion is that fundamentally realloc is broken in
>>> the ESP IDF version we use.
>> 
>> So, by switching back to the old code you've also verified that it was
>> not my new code.  But why are you seeing this crash now when we have
>> not seen it before?  Is there any way that my moving
>> ExternalRamMalloc() to a C namespace could cause trouble in some
>> allocation outside of mongoose leaving a bad heap for the mongoose
>> call to trip over?
>> 
>>> I tried updating our ESP-IDF to espressif/master, but get compile
>>> errors in build/heap:
>>> 
>>> CC build/heap/multi_heap_poisoning.o
>>> In file included from /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:27:0:
>>> /Users/mark/esp/esp-idf/components/heap/multi_heap_platform.h:66:32: error: unknown type name 'TaskHandle_t'
>>> #define MULTI_HEAP_BLOCK_OWNER TaskHandle_t task;
>>>                                ^
>>> /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:50:5: note: in expansion of macro 'MULTI_HEAP_BLOCK_OWNER'
>>>     MULTI_HEAP_BLOCK_OWNER
>>>     ^
>>> /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c: In function 'poison_allocated_region':
>>> /Users/mark/esp/esp-idf/components/heap/multi_heap_platform.h:67:57: error: implicit declaration of function 'xTaskGetCurrentTaskHandle' [-Werror=implicit-function-declaration]
>>> #define MULTI_HEAP_SET_BLOCK_OWNER(HEAD) (HEAD)->task = xTaskGetCurrentTaskHandle()
>>>                                                         ^
>>> /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:71:5: note: in expansion of macro 'MULTI_HEAP_SET_BLOCK_OWNER'
>>>     MULTI_HEAP_SET_BLOCK_OWNER(head);
>>>     ^
>>> /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c: In function 'multi_heap_get_block_owner':
>>> /Users/mark/esp/esp-idf/components/heap/multi_heap_platform.h:68:42: warning: return makes pointer from integer without a cast [-Wint-conversion]
>>> #define MULTI_HEAP_GET_BLOCK_OWNER(HEAD) ((HEAD)->task)
>>>                                          ^
>>> /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:284:12: note: in expansion of macro 'MULTI_HEAP_GET_BLOCK_OWNER'
>>>     return MULTI_HEAP_GET_BLOCK_OWNER((poison_head_t*)multi_heap_get_block_address_impl(block));
>>>            ^
>>> cc1: some warnings being treated as errors
>>> make[1]: *** [multi_heap_poisoning.o] Error 1
>> 
>> Those three macros are part of the code I provided in the pull
>> request, but in 4c532a59b2503dd8bc60d437bb65961676db9e55 we already
>> updated to a version of expressif/esp-idf master with the pull request
>> included.  It seems that the references in the macros are to other
>> code in ESP-IDF that used to be valid but now may not be.  Perhaps
>> they made a change but didn't test compilation with
>> CONFIG_HEAP_TASK_TRACKING=y?
>> 
>>> I then tried:
>>> 
>>> +CONFIG_HEAP_POISONING_DISABLED=y
>>> +CONFIG_HEAP_POISONING_LIGHT=
>>> -CONFIG_HEAP_TASK_TRACKING=y
>>> 
>>> But no difference. Still crashes.
>> 
>> Do you mean that it then compiles OK, but when you run the stress test
>> it crashes?  If it compiles OK, then my hypothesis that they made a
>> change and didn't test with CONFIG_HEAP_TASK_TRACKING=y might be
>> right.
>> 
>>> Also tried:
>>> 
>>> -CONFIG_FREERTOS_USE_TRACE_FACILITY=y
>>> -CONFIG_FREERTOS_USE_STATS_FORMATTING_FUNCTIONS=
>>> +CONFIG_FREERTOS_USE_TRACE_FACILITY=
>>> 
>>> But no difference. Still crashes.
>>> 
>>> Although not in a simply way:
>>> 
>>> OVMS# module memory
>>> Free 8-bit 119432/281952, 32-bit 8020/24276, SPIRAM 4141392/4194252
>>> 
>>> OVMS# test realloc
>>> First check heap integrity...
>>> Now allocate 4KB RAM...
>>> Check heap integrity...
>>> Now re-allocate bigger, 1,000 times...
>>> Check heap integrity...
>>> Now re-allocate smaller, 1,000 times...
>>> Check heap integrity...
>>> And free the buffer...
>>> Final check of heap integrity…
>>> 
>>> OVMS# module memory
>>> Free 8-bit 119432/281952, 32-bit 8020/24276, SPIRAM 4140892/4194252
>>> 
>>> Maybe something inside mongoose itself? As that is the only place we are seeing this.
>> 
>> You say it still crashes, but you don't show a crash here?
>> 
>>                                                        -- Steve_______________________________________________
>> OvmsDev mailing list
>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20180426/2b6edc59/attachment.htm>