But why are you seeing this crash now when we have
not seen it before? 

I haven’t really done much with that code path in the past. It is a high volume of data being uploaded to the module, and due to the way my code was handling it (badly), it resulted in a lot of realloc’s.

For CRTD format, I read line-by-line. The issue is that the mongoose buffer presented may not have a complete line in it.


I first noticed this because SPI RAM memory consumption was going through the roof during retools canbus dump simulations. I was handling the incoming messages so slowly, that they were coming in faster than they were handled. Lesson learned: when mongoose tells you it has data, consume it otherwise mongoose will keep receiving (a sensible framework would back off the receiving if the consumer was not fast enough).

My change now is to consume all I can on the MG_EV_RECV (in a while loop), but if there is half a line left over that gets left in the mongoose buffer and that causes a realloc.

To get around this, I’ll have to do my own line buffering (so I can completely drain the mongoose buffer on MG_EV_RECV). A hassle, but I’ll try to get it done tonight. That will resolve my problem, but I don’t think it addresses the core mongoose heap corruption issue.

My code is in retools.cpp re::MongooseHandler MG_EV_RECV part.

You say it still crashes, but you don't show a crash here?

I can get it to crash in retools. Repeatedly. Easy and every time.

  1. Start a wifi
  2. Start a vehicle mode, on can1 (maybe not necessary, but I do it anyway)
  3. OVMS# re start
  4. OVMS# re serve mode simulate
  5. $ cat my-canbus-dump.crtd | nc TESTBOX.local 3000
  6. Sit back and watch the crash

If you don’t have a CRTD file, just use this snippet:

593.138 1R11 7E8 00 00 00 00 00 00 00 00
593.138 1R11 202 4D 5F 07 9D C4 09 3F 23
593.138 1R11 20A 00 01 01 83 D1 3E 00 00
593.138 1R11 228 40 C0 30 F8
593.138 1R11 6F2 1F 18 08 F9 A1 80 08 1F
593.138 1R11 212 D8 09 12 71 80

Make 1,000 copies of it, appending them all to one .crtd file. That can be your test data.

So, I then wrote a ‘test realloc’ in test_framework.cpp to see if I could repeat it. I couldn’t. Which leads me to suspect the issue is in mongoose (or with the way I am only consuming part of the mongoose buffer).

Regards, Mark.

On 26 Apr 2018, at 2:16 PM, Stephen Casner <casner@acm.org> wrote:

On Thu, 26 Apr 2018, Mark Webb-Johnson wrote:

I’m seeing this also, when I stress-test by sending in a large
canbus dump into retools:

CORRUPT HEAP: Bad head at 0x3f80e3c0. Expected 0xabba1234 got 0x3f80eb00
assertion "head != NULL" failed: file "/Users/mark/esp/esp-idf/components/heap/./multi_heap_poisoning.c", line 229, function: multi_heap_realloc

Indeed, this is the same error signature as what I was seeing, but the
reason I saw it was because of a mistake in my code.  I had not
included the "if (size == 0)" test in ExternalRamRealloc().  Since
mongoose calls MG_REALLOC with size == 0, that caused
heap_caps_realloc() to do heap_caps_free() and return NULL, so my code
called realloc() on that same ptr that had just been freed.

But before my commit this morning I fixed that bug by adding my own test for
size==0 so I could do my own heap_caps_free() and not call calling
heap_caps_realloc().

So, as I stare at the code now, I don't see anything yet.  Also, your
crash occurred when MG_REALLOC was called with nonzero size.

That code path does a lot of realloc’s.

If I switch back to use normal malloc/calloc/realloc, it still crashes.

If I disable the malloc #defined in mongoose mg_locals.h, it still crashes.

The only way I can avoid this is to consume ALL the data on the
MG_EV_RECV event, which means realloc is presumably not called (so
often?) within mongoose.

I assumed the conclusion is that fundamentally realloc is broken in
the ESP IDF version we use.

So, by switching back to the old code you've also verified that it was
not my new code.  But why are you seeing this crash now when we have
not seen it before?  Is there any way that my moving
ExternalRamMalloc() to a C namespace could cause trouble in some
allocation outside of mongoose leaving a bad heap for the mongoose
call to trip over?

I tried updating our ESP-IDF to espressif/master, but get compile
errors in build/heap:

CC build/heap/multi_heap_poisoning.o
In file included from /Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:27:0:
/Users/mark/esp/esp-idf/components/heap/multi_heap_platform.h:66:32: error: unknown type name 'TaskHandle_t'
#define MULTI_HEAP_BLOCK_OWNER TaskHandle_t task;
                               ^
/Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:50:5: note: in expansion of macro 'MULTI_HEAP_BLOCK_OWNER'
    MULTI_HEAP_BLOCK_OWNER
    ^
/Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c: In function 'poison_allocated_region':
/Users/mark/esp/esp-idf/components/heap/multi_heap_platform.h:67:57: error: implicit declaration of function 'xTaskGetCurrentTaskHandle' [-Werror=implicit-function-declaration]
#define MULTI_HEAP_SET_BLOCK_OWNER(HEAD) (HEAD)->task = xTaskGetCurrentTaskHandle()
                                                        ^
/Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:71:5: note: in expansion of macro 'MULTI_HEAP_SET_BLOCK_OWNER'
    MULTI_HEAP_SET_BLOCK_OWNER(head);
    ^
/Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c: In function 'multi_heap_get_block_owner':
/Users/mark/esp/esp-idf/components/heap/multi_heap_platform.h:68:42: warning: return makes pointer from integer without a cast [-Wint-conversion]
#define MULTI_HEAP_GET_BLOCK_OWNER(HEAD) ((HEAD)->task)
                                         ^
/Users/mark/esp/esp-idf/components/heap/multi_heap_poisoning.c:284:12: note: in expansion of macro 'MULTI_HEAP_GET_BLOCK_OWNER'
    return MULTI_HEAP_GET_BLOCK_OWNER((poison_head_t*)multi_heap_get_block_address_impl(block));
           ^
cc1: some warnings being treated as errors
make[1]: *** [multi_heap_poisoning.o] Error 1

Those three macros are part of the code I provided in the pull
request, but in 4c532a59b2503dd8bc60d437bb65961676db9e55 we already
updated to a version of expressif/esp-idf master with the pull request
included.  It seems that the references in the macros are to other
code in ESP-IDF that used to be valid but now may not be.  Perhaps
they made a change but didn't test compilation with
CONFIG_HEAP_TASK_TRACKING=y?

I then tried:

+CONFIG_HEAP_POISONING_DISABLED=y
+CONFIG_HEAP_POISONING_LIGHT=
-CONFIG_HEAP_TASK_TRACKING=y

But no difference. Still crashes.

Do you mean that it then compiles OK, but when you run the stress test
it crashes?  If it compiles OK, then my hypothesis that they made a
change and didn't test with CONFIG_HEAP_TASK_TRACKING=y might be
right.

Also tried:

-CONFIG_FREERTOS_USE_TRACE_FACILITY=y
-CONFIG_FREERTOS_USE_STATS_FORMATTING_FUNCTIONS=
+CONFIG_FREERTOS_USE_TRACE_FACILITY=

But no difference. Still crashes.

Although not in a simply way:

OVMS# module memory
Free 8-bit 119432/281952, 32-bit 8020/24276, SPIRAM 4141392/4194252

OVMS# test realloc
First check heap integrity...
Now allocate 4KB RAM...
Check heap integrity...
Now re-allocate bigger, 1,000 times...
Check heap integrity...
Now re-allocate smaller, 1,000 times...
Check heap integrity...
And free the buffer...
Final check of heap integrity…

OVMS# module memory
Free 8-bit 119432/281952, 32-bit 8020/24276, SPIRAM 4140892/4194252

Maybe something inside mongoose itself? As that is the only place we are seeing this.

You say it still crashes, but you don't show a crash here?

                                                       -- Steve_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev