The code I’ve just committed refactors the CAN dump system and retools to always consume all the data on MG_EV_RECV. That reduces memory consumption and works around the realloc issue. It seems to work in my limited testing here, and performance is acceptable: I can transmit a 1MB CRTD file (about 25,000 CAN bus messages) in about 10 seconds. The CRTB format (binary) will require less encoding/decoding and should be faster still.

Regards, Mark.
On 26 Apr 2018, at 5:03 PM, Mark Webb-Johnson <mark@webb-johnson.net> wrote:
But why are you seeing this crash now when we have not seen it before?
I haven’t really done much with that code path in the past. It is a high volume of data being uploaded to the module, and the way my code was handling it (badly) resulted in a lot of reallocs.
For CRTD format, I read line-by-line. The issue is that the mongoose buffer presented may not have a complete line in it.
Mongoose was sending me a MG_EV_RECV with about 1KB in the buffer. I removed about 30 bytes (one line), and returned. That meant mongoose realloc’d down to 1KB - 30 bytes, then called me again. Rinse and repeat.
I first noticed this because SPI RAM memory consumption was going through the roof during retools canbus dump simulations. I was handling the incoming messages so slowly that they were coming in faster than they were handled. Lesson learned: when mongoose tells you it has data, consume it, otherwise mongoose will keep receiving (a sensible framework would back off the receiving if the consumer was not fast enough).
My change now is to consume all I can on MG_EV_RECV (in a while loop), but if half a line is left over, it stays in the mongoose buffer and that still causes a realloc.
To get around this, I’ll have to do my own line buffering (so I can completely drain the mongoose buffer on MG_EV_RECV). A hassle, but I’ll try to get it done tonight. That will resolve my problem, but I don’t think it addresses the core mongoose heap corruption issue.
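For illustration, a minimal sketch of that line-buffering approach (the class and method names here are hypothetical, not the actual retools code): keep a carry string, append whatever mongoose hands over, and emit only complete lines, so the mongoose receive buffer can always be drained completely on each MG_EV_RECV.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical carry buffer: accumulates partial lines across
// MG_EV_RECV events so the mongoose buffer can be fully drained.
class LineBuffer
  {
  public:
    // Append a raw chunk (as handed to us on MG_EV_RECV) and
    // return every complete newline-terminated line found so far.
    std::vector<std::string> Consume(const char* data, size_t len)
      {
      m_carry.append(data, len);
      std::vector<std::string> lines;
      size_t start = 0;
      size_t nl;
      while ((nl = m_carry.find('\n', start)) != std::string::npos)
        {
        lines.push_back(m_carry.substr(start, nl - start));
        start = nl + 1;
        }
      m_carry.erase(0, start);  // keep only the trailing partial line
      return lines;
      }
  private:
    std::string m_carry;
  };
```

In the MG_EV_RECV handler one would then feed the whole incoming buffer to Consume() and remove all of it from the mongoose mbuf in one go, so mongoose never has to realloc around a partially-consumed buffer.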
My code is in retools.cpp re::MongooseHandler MG_EV_RECV part.
You say it still crashes, but you don't show a crash here?
I can get it to crash in retools. Repeatedly. Easily, and every time.
Start up wifi.
Start a vehicle mode, on can1 (maybe not necessary, but I do it anyway).
OVMS# re start
OVMS# re serve mode simulate
$ cat my-canbus-dump.crtd | nc TESTBOX.local 3000
Sit back and watch the crash.
If you don’t have a CRTD file, just use this snippet:
593.138 1R11 7E8 00 00 00 00 00 00 00 00
593.138 1R11 202 4D 5F 07 9D C4 09 3F 23
593.138 1R11 20A 00 01 01 83 D1 3E 00 00
593.138 1R11 228 40 C0 30 F8
593.138 1R11 6F2 1F 18 08 F9 A1 80 08 1F
593.138 1R11 212 D8 09 12 71 80
Make 1,000 copies of it, appending them all to one .crtd file. That can be your test data.
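One way to build that test file, as a hypothetical little helper (any shell loop would do just as well; the function name and output filename are illustrative):

```cpp
#include <fstream>
#include <string>

// Write the six-line CRTD snippet above 1,000 times into one file,
// producing 6,000 messages of test data.
bool WriteTestData(const char* path)
  {
  const std::string snippet =
    "593.138 1R11 7E8 00 00 00 00 00 00 00 00\n"
    "593.138 1R11 202 4D 5F 07 9D C4 09 3F 23\n"
    "593.138 1R11 20A 00 01 01 83 D1 3E 00 00\n"
    "593.138 1R11 228 40 C0 30 F8\n"
    "593.138 1R11 6F2 1F 18 08 F9 A1 80 08 1F\n"
    "593.138 1R11 212 D8 09 12 71 80\n";
  std::ofstream out(path);
  if (!out) return false;
  for (int i = 0; i < 1000; i++) out << snippet;
  return out.good();
  }
```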
So, I then wrote a ‘test realloc’ in test_framework.cpp to see if I could repeat it. I couldn’t. Which leads me to suspect the issue is in mongoose (or with the way I am only consuming part of the mongoose buffer).
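The shape of that test, roughly (a stand-alone sketch, not the actual test_framework.cpp code; on the module the real test interleaves ESP-IDF heap integrity checks between the phases, and plain malloc/realloc stand in here):

```cpp
#include <cstdlib>
#include <cstring>

// Sketch of the 'test realloc' stress test: allocate 4KB, grow the
// buffer 1,000 times, shrink it 1,000 times, then free it. Returns
// false on any allocation failure.
bool TestRealloc()
  {
  char* buf = (char*)malloc(4096);            // "Now allocate 4KB RAM..."
  if (!buf) return false;
  memset(buf, 0xA5, 4096);
  for (int i = 1; i <= 1000; i++)             // "re-allocate bigger, 1,000 times"
    {
    char* n = (char*)realloc(buf, 4096 + i * 16);
    if (!n) { free(buf); return false; }
    buf = n;
    }
  for (int i = 1000; i >= 1; i--)             // "re-allocate smaller, 1,000 times"
    {
    char* n = (char*)realloc(buf, 4096 + (i - 1) * 16);
    if (!n) { free(buf); return false; }
    buf = n;
    }
  free(buf);                                  // "And free the buffer..."
  return true;
  }
```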
Regards, Mark.
On 26 Apr 2018, at 2:16 PM, Stephen Casner <casner@acm.org> wrote:
On Thu, 26 Apr 2018, Mark Webb-Johnson wrote:
I’m seeing this also, when I stress-test by sending in a large canbus dump into retools:
CORRUPT HEAP: Bad head at 0x3f80e3c0. Expected 0xabba1234 got 0x3f80eb00
assertion "head != NULL" failed: file "/Users/mark/esp/esp-idf/components/heap/./multi_heap_poisoning.c", line 229, function: multi_heap_realloc
Indeed, this is the same error signature as what I was seeing, but the reason I saw it was because of a mistake in my code. I had not included the "if (size == 0)" test in ExternalRamRealloc(). Since mongoose calls MG_REALLOC with size == 0, that caused heap_caps_realloc() to do heap_caps_free() and return NULL, so my code called realloc() on that same ptr that had just been freed.
But before my commit this morning I fixed that bug by adding my own test for size==0, so I could do my own heap_caps_free() and not call heap_caps_realloc() at all.
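A sketch of that guard (the real code wraps heap_caps_realloc() with the SPIRAM capability flags; plain realloc/free stand in for it in this illustration):

```cpp
#include <cstddef>
#include <cstdlib>

// Sketch of the size==0 guard described above. In the real code the
// underlying allocator is heap_caps_realloc(); plain realloc/free
// stand in for it here.
void* ExternalRamRealloc(void* ptr, size_t size)
  {
  if (size == 0)
    {
    // heap_caps_realloc(ptr, 0) frees ptr and returns NULL; doing the
    // free ourselves avoids any fallback path calling realloc() on a
    // pointer that has already been freed.
    free(ptr);
    return NULL;
    }
  return realloc(ptr, size);  // stand-in for heap_caps_realloc()
  }
```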
So, as I stare at the code now, I don't see anything yet. Also, your crash occurred when MG_REALLOC was called with nonzero size.
That code path does a lot of realloc’s.
If I switch back to use normal malloc/calloc/realloc, it still crashes.
If I disable the malloc #defines in mongoose’s mg_locals.h, it still crashes.
The only way I can avoid this is to consume ALL the data on the MG_EV_RECV event, which means realloc is presumably not called (so often?) within mongoose.
I assume the conclusion is that realloc is fundamentally broken in the ESP-IDF version we use.
So, by switching back to the old code you've also verified that it was not my new code. But why are you seeing this crash now when we have not seen it before? Is there any way that my moving ExternalRamMalloc() to a C namespace could cause trouble in some allocation outside of mongoose leaving a bad heap for the mongoose call to trip over?
I tried updating our ESP-IDF to espressif/master, but get compile errors in build/heap:
CC build/heap/multi_heap_poisoning.o
In file included from /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:27:0:
/Users/mark/esp/esp-idf/components/heap/multi_heap_platform.h:66:32: error: unknown type name 'TaskHandle_t'
 #define MULTI_HEAP_BLOCK_OWNER TaskHandle_t task;
                                ^
/Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:50:5: note: in expansion of macro 'MULTI_HEAP_BLOCK_OWNER'
     MULTI_HEAP_BLOCK_OWNER
     ^
/Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c: In function 'poison_allocated_region':
/Users/mark/esp/esp-idf/components/heap/multi_heap_platform.h:67:57: error: implicit declaration of function 'xTaskGetCurrentTaskHandle' [-Werror=implicit-function-declaration]
 #define MULTI_HEAP_SET_BLOCK_OWNER(HEAD) (HEAD)->task = xTaskGetCurrentTaskHandle()
                                                         ^
/Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:71:5: note: in expansion of macro 'MULTI_HEAP_SET_BLOCK_OWNER'
     MULTI_HEAP_SET_BLOCK_OWNER(head);
     ^
/Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c: In function 'multi_heap_get_block_owner':
/Users/mark/esp/esp-idf/components/heap/multi_heap_platform.h:68:42: warning: return makes pointer from integer without a cast [-Wint-conversion]
 #define MULTI_HEAP_GET_BLOCK_OWNER(HEAD) ((HEAD)->task)
                                          ^
/Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:284:12: note: in expansion of macro 'MULTI_HEAP_GET_BLOCK_OWNER'
     return MULTI_HEAP_GET_BLOCK_OWNER((poison_head_t*)multi_heap_get_block_address_impl(block));
            ^
cc1: some warnings being treated as errors
make[1]: *** [multi_heap_poisoning.o] Error 1
Those three macros are part of the code I provided in the pull request, but in 4c532a59b2503dd8bc60d437bb65961676db9e55 we already updated to a version of espressif/esp-idf master with the pull request included. It seems that the references in the macros are to other code in ESP-IDF that used to be valid but now may not be. Perhaps they made a change but didn't test compilation with CONFIG_HEAP_TASK_TRACKING=y?
I then tried:
+CONFIG_HEAP_POISONING_DISABLED=y
+CONFIG_HEAP_POISONING_LIGHT=
-CONFIG_HEAP_TASK_TRACKING=y
But no difference. Still crashes.
Do you mean that it then compiles OK, but when you run the stress test it crashes? If it compiles OK, then my hypothesis that they made a change and didn't test with CONFIG_HEAP_TASK_TRACKING=y might be right.
Also tried:
-CONFIG_FREERTOS_USE_TRACE_FACILITY=y
-CONFIG_FREERTOS_USE_STATS_FORMATTING_FUNCTIONS=
+CONFIG_FREERTOS_USE_TRACE_FACILITY=
But no difference. Still crashes.
Although not in a simple way:
OVMS# module memory
Free 8-bit 119432/281952, 32-bit 8020/24276, SPIRAM 4141392/4194252
OVMS# test realloc
First check heap integrity...
Now allocate 4KB RAM...
Check heap integrity...
Now re-allocate bigger, 1,000 times...
Check heap integrity...
Now re-allocate smaller, 1,000 times...
Check heap integrity...
And free the buffer...
Final check of heap integrity...
OVMS# module memory
Free 8-bit 119432/281952, 32-bit 8020/24276, SPIRAM 4140892/4194252
Maybe something inside mongoose itself? As that is the only place we are seeing this.
You say it still crashes, but you don't show a crash here?
-- Steve