[Ovmsdev] std::string data corruption (issue #189)

Michael Balzer dexter at expeedo.de
Wed Jan 30 19:39:56 HKT 2019


Oh my…

Can it be the first 2 MB are taboo for core 1 and the second 2 MB for core 0?

Anyway, we won't need those 2 MB for now, and hopefully this will be fixed by Espressif soon.

So I'd say yes, commit.

Regards,
Michael


Am 30.01.19 um 12:26 schrieb Mark Webb-Johnson:
> I tried changing CONFIG_SPIRAM_SIZE in sdkconfig directly, but it just reset back to the default 4MB.
>
> So changed esp-idf itself, components/esp32/Kconfig, and added a prompt for that config option. I could then change it in menuconfig and the
> change persisted.
>
> I then tested:
>
>     I (0) cpu_start: App cpu up.
>     I (715) spiram: SPI SRAM memory test OK
>     I (716) heap_init: Initializing. RAM available for dynamic allocation:
>     I (716) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
>     I (723) heap_init: At 3FFBA920 len 000256E0 (149 KiB): DRAM
>     I (728) heap_init: At 3FFE0440 len 00003BC0 (14 KiB): D/IRAM
>     I (735) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
>     I (742) heap_init: At 400953CC len 0000AC34 (43 KiB): IRAM
>     I (747) cpu_start: Pro cpu start user code
>     I (752) spiram: Adding pool of 2048K of external SPI memory to heap allocator
>     I (158) cpu_start: Starting scheduler on PRO CPU.
>     I (0) cpu_start: Starting scheduler on APP CPU.
>     This is ESP32 chip with 2 CPU cores, WiFi/BT/BLE, silicon revision 1, 16MB external flash
>>     I (9566) test_task: testing mode 1 loopcnt 1000
>     E (9576) test_task: loop #1: corruption detected at pos=497 size=498
>     4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
>     v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23)4348864OVMS WIFI BLE BT cores=2
>     rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019
>     15:08:23)4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
>     v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23
>
>     #1: size=498, data @0x3f901ca0
>     4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
>     v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23)4348864OVMS WIFI BLE BT cores=2
>     rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019
>     15:08:23)4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
>     v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23
>
>     E (9806) test_task: loop #5: corruption detected at pos=497 size=498
>     4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
>     v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23)4348864OVMS WIFI BLE BT cores=2
>     rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019
>     15:08:23)4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
>     v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23
>
>     E (10036) test_task: loop #9: corruption detected at pos=829 size=830
>     4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
>     v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23)4348864OVMS WIFI BLE BT cores=2
>     rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019
>     15:08:23)4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
>     v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23)4348864OVMS WIFI BLE BT cores=2
>     rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019
>     15:08:23)4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
>     v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23
>
>
> which looks like a FAIL to me?
>
> So, then I tried something a little funky:
>
>  1. Go back to normal CONFIG_SPIRAM_SIZE 4MB
>
>  2. Add this to main/test_memory_corruption.cpp app_main():
>
>     void* big2mb = heap_caps_malloc(1024*1024*2, MALLOC_CAP_SPIRAM);
>     printf("Big 2MB spi memory allocation is at %p\n",big2mb);
>
>  3. Rebuild and run...
>
>
> Here is what I get:
>
>     I (0) cpu_start: App cpu up.
>     I (1151) spiram: SPI SRAM memory test OK
>     I (1152) heap_init: Initializing. RAM available for dynamic allocation:
>     I (1152) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
>     I (1159) heap_init: At 3FFBA920 len 000256E0 (149 KiB): DRAM
>     I (1164) heap_init: At 3FFE0440 len 00003BC0 (14 KiB): D/IRAM
>     I (1171) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
>     I (1179) heap_init: At 400953CC len 0000AC34 (43 KiB): IRAM
>     I (1184) cpu_start: Pro cpu start user code
>     I (1188) spiram: Adding pool of 4096K of external SPI memory to heap allocator
>     I (206) cpu_start: Starting scheduler on PRO CPU.
>     I (0) cpu_start: Starting scheduler on APP CPU.
>     ...
>     This is ESP32 chip with 2 CPU cores, WiFi/BT/BLE, silicon revision 1, 16MB external flash
>     Big 2MB spi memory allocation is at 0x3f800028
>     ...
>     I (291) wifi station: ESP_WIFI_MODE_APSTA
>     I (291) wifi: wifi driver task: 3ffc25c4, prio:23, stack:3584, core=0
>     ...
>     I (3281) wifi station: retry to connect to the AP
>     I (3281) wifi station: connect to the AP fail
>
>     I (6111) wifi station: retry to connect to the AP
>     I (6111) wifi station: connect to the AP fail
>
>     I (8461) mem_stresser: started on core 0 mem @0x3fa01b98
>     I (8461) mem_stresser: started on core 1 mem @0x3fa81bac
>     I (8951) wifi station: retry to connect to the AP
>     I (8951) wifi station: connect to the AP fail
>
>     I (9461) test_task: running on core 1
>
>     I (9561) test_task: testing mode 1 loopcnt 1000
>     #1: size=1039, data @0x3fb01ca4
>     4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
>     v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23)4348864OVMS WIFI BLE BT cores=2
>     rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019
>     15:08:23)4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
>     v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23)4348864OVMS WIFI BLE BT cores=2
>     rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019
>     15:08:23)4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
>     v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23)4348864OVMS WIFI BLE BT cores=2
>     rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019
>     15:08:23)4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1
>
>     I (11781) wifi station: retry to connect to the AP
>     I (11781) wifi station: connect to the AP fail
>
>     I (14621) wifi station: retry to connect to the AP
>     I (14621) wifi station: connect to the AP fail
>
>     I (17451) wifi station: connect to the AP fail
>
>
> I left it running for a few minutes, but no faults shown. Modes 1 through 4 all clean.
>
> So, to take the funkiness to the next level, I added the same to ovms_main, right at the start (actually in the log level setter, as that
> seemed to be the first thing that initialises). Somebody else gets in for 28bytes of SPIRAM (or maybe that is a structure header), but anyway...
>
>     I (1742) spiram: SPI SRAM memory test OK
>     I (1743) heap_init: Initializing. RAM available for dynamic allocation:
>     I (1743) heap_init: At 3FFAFF10 len 000000F0 (0 KiB): DRAM
>     I (1750) heap_init: At 3FFCEDE8 len 00011218 (68 KiB): DRAM
>     I (1755) heap_init: At 3FFE0440 len 00003BC0 (14 KiB): D/IRAM
>     I (1762) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
>     I (1768) heap_init: At 4009F530 len 00000AD0 (2 KiB): IRAM
>     I (1775) cpu_start: Pro cpu start user code
>     I (1779) spiram: Adding pool of 4096K of external SPI memory to heap allocator
>     I (125) ovms_main: Set default logging level for * to INFO
>     I (126) ovms_main: Pre-allocated 2MB of SPIRAM at 0x3f800028 (workaround ESP32 bug)
>     ...
>     I (17783) webserver: Launching Web Server
>     I (17823) ssh: Launching SSH Server
>     OVMS# test string 500 2
>     #1: stdlen = 1027
>     1985504OVMS WIFI BLE BT cores=2 rev=ESP32/168HIGHWAYS-29wifi163.2.001-22-g14d16854-dirty/factory/edge (build idf
>     v3.1-dev-2835-g151269458-dirty Jan 30 2019 19:17:20)012.61985504OVMS WIFI BLE BT cores=2
>     rev=ESP32/168HIGHWAYS-29wifi163.2.001-22-g14d16854-dirty/factory/edge (build idf v3.1-dev-2835-g151269458-dirty Jan 30 2019
>     19:17:20)012.61985504OVMS WIFI BLE BT cores=2 rev=ESP32/168HIGHWAYS-29wifi163.2.001-22-g14d16854-dirty/factory/edge (build idf
>     v3.1-dev-2835-g151269458-dirty Jan 30 2019 19:17:20)012.61985504OVMS WIFI BLE BT cores=2
>     rev=ESP32/168HIGHWAYS-29wifi163.2.001-22-g14d16854-dirty/factory/edge (build idf v3.1-dev-2835-g151269458-dirty Jan 30 2019
>     19:17:20)012.61985504OVMS WIFI BLE BT cores=2 rev=ESP32/168HIGHWAYS-29wifi163.2.001-22-g14d16854-dirty/factory/edge (build idf
>     v3.1-dev-2835-g151269458-dirty Jan 30 2019 19:17:20)012.61985504OVMS WIFI BLE BT cores=2
>     rev=ESP32/168HIGHWAYS-29wifi163.2.001-22-g14d16854-dirty/factory/edge (build idf v3.1-dev-2835-g151269458-dirty Jan 30 2019
>     19:17:20)012.61985504
>
>     finished
>     OVMS#
>
>     OVMS# module memory
>     Free 8-bit 76160/230212, 32-bit 812/2720, SPIRAM 1886108/4194252
>     --Task--     Total DRAM D/IRAM   IRAM SPIRAM   +/- DRAM D/IRAM   IRAM SPIRAM
>     main*              8368      0      0   8456      +8368     +0     +0  +8456
>     no task*           5376      0      0      0      +5376     +0     +0     +0
>     esp_timer         34808      0    6442123100     +34808     +0   +644+2123100
>     OVMS Events       36352  41040      0  35120     +36352 +41040     +0 +35120
>     OVMS DukTape        188      0      0  34668       +188     +0     +0 +34668
>     OVMS CanRx         3124      0      0  26168      +3124     +0     +0 +26168
>     ipc0               7776      0      0      0      +7776     +0     +0     +0
>     ipc1                 12      0      0      0        +12     +0     +0     +0
>     Tmr Svc               0     88      0      0         +0    +88     +0     +0
>     tiT                 232    468      0    664       +232   +468     +0   +664
>     wifi                600   8896      0   3324       +600  +8896     +0  +3324
>     OVMS Console          0    200   1180  26332         +0   +200  +1180 +26332
>     mdns                  0    104      0      4         +0   +104     +0     +4
>     OVMS NetMan           0    216      0     84         +0   +216     +0    +84
>
>
> I think that wins as the most ugly kludge to date.
>
> Shall I commit?
>
> Regards, Mark.
>
>> On 30 Jan 2019, at 5:00 PM, Mark Webb-Johnson <mark at webb-johnson.net <mailto:mark at webb-johnson.net>> wrote:
>>
>>
>>
>>> On 30 Jan 2019, at 4:41 PM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de>> wrote:
>>>
>>> Mark,
>>>
>>> issue #2892 mentions using only the lower 2 MB of SPI RAM as a workaround. Where are our allocations placed by the allocator? Does it fill
>>> from the middle or end? If not we don't use the upper half yet.
>>
>> I saw that, and had a look. I don’t see anything in makeconfig to limit to 2MB (not 4MB).
>>
>> I do see a CONFIG_SPIRAM_SIZE=4194304 in sdkconfig, but not sure how it gets there. Perhaps we can just change there to 2MB and try?
>>
>> I did see this in the documentation:
>>
>>     During ESP-IDF startup, external RAM is mapped into the data address space starting at at address 0x3F800000 (byte-accessible). The
>>     length of this region is the same as the SPIRAM size (up to the limit of 4MiB).
>>
>>
>> So, maybe we can look at the address our allocations come from to see their offset from 0x3F800000? If it is coming from the top 2MB, then
>> perhaps we start with a big 2MB allocation that we never use?
>>  
>>> A possible workaround for the most apparent issues with this (string assembly) could be to use char buffers instead of std::string. I was
>>> thinking about doing that at least for the websocket stream.
>>
>> It seems strange that we are only seeing it with std::string. Maybe that just stresses the system more, or does a lot of reallocations?
>>
>>> But if this really is a hardware issue it can affect all objects in SPI RAM, appending to std::string then only triggers this more often.
>>>
>>> Another workaround could be to run everything on core 0.
>>
>> Urgh.
>>
>>> Regards,
>>> Michael
>>>
>>>
>>> Am 30.01.19 um 03:33 schrieb Mark Webb-Johnson:
>>>> Michael,
>>>>
>>>> Espressif’s response (and linking to that other issue) sounds like others are seeing this.
>>>>
>>>> Lousy timing (with Chinese New Year next week), so don’t expect anything quick from Espressif. I guess we’ll just have to live with it
>>>> until they can find a workaround. It sounds like the issue is at the hardware level and a compiler patch will be needed.
>>>>
>>>> Regards, Mark.
>>>>
>>>>> On 30 Jan 2019, at 3:07 AM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de>> wrote:
>>>>>
>>>>>https://github.com/espressif/esp-idf/issues/3006
>>>>>
>>>>> Regards,
>>>>> Michael
>>>>>
>>>>>
>>>>> Am 28.01.19 um 20:46 schrieb Michael Balzer:
>>>>>> To clarify: the bug is most likely not restricted to the case of building a message in a buffer. It can possibly cause corruptions in any
>>>>>> RAM section, so can well be responsible for many/most of the unidentified crashes and stack/heap corruptions we're experiencing.
>>>>>>
>>>>>> Regards,
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> Am 28.01.19 um 20:43 schrieb Michael Balzer:
>>>>>>> For those not following the github discussion: I'm pretty sure I've nailed the bug down.
>>>>>>>
>>>>>>> I have reproduced the bug in a simple test project and intend to raise an issue with Espressif on this.
>>>>>>>
>>>>>>> https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/189#issuecomment-457965435
>>>>>>>
>>>>>>> If you'd like to test this on your module, configure the project to your wifi credentials, then use "make flash". As the test project is
>>>>>>> small, this normally will not erase your OVMS config partition, but a backup is always recommended.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Michael
>>>>>>>
>>>>>>>
>>>>>>> Am 24.01.19 um 21:19 schrieb Michael Balzer:
>>>>>>>> Everyone please have a look at…
>>>>>>>>
>>>>>>>> https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/189#issuecomment-457334248
>>>>>>>>
>>>>>>>> Please try to reproduce the bug on your modules.
>>>>>>>>
>>>>>>>> I'm open for explanations.
>>>>>>>>
>>>>>>>> I thought this might be some copy-on-write bug with std::string, but the gcc 5.x libstdc++ does no longer use that implementation
>>>>>>>> (wouldn't be C++11 compliant as well). I also tried moving all strings to temporary buffers, but modes 5 & 6 eliminated this
>>>>>>>> explanation as well.
>>>>>>>>
>>>>>>>> My remaining theories:
>>>>>>>>
>>>>>>>>   * A task writing out of bounds (but only 0-bytes?)
>>>>>>>>   * A hardware issue only affecting some modules
>>>>>>>>
>>>>>>>> A hardware issue only affecting some percentage of ESP32 could explain this as well as the strange heap corruptions that seem to affect
>>>>>>>> some modules especially often.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Michael
>>>>>>>>

-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20190130/0c3a073a/attachment.htm>


More information about the OvmsDev mailing list