[Ovmsdev] std::string data corruption (issue #189)
Michael Balzer
dexter at expeedo.de
Wed Jan 30 19:39:56 HKT 2019
Oh my…
Can it be the first 2 MB are taboo for core 1 and the second 2 MB for core 0?
Anyway, we won't need those 2 MB for now, and hopefully this will be fixed by Espressif soon.
So I'd say yes, commit.
Regards,
Michael
Am 30.01.19 um 12:26 schrieb Mark Webb-Johnson:
> I tried changing CONFIG_SPIRAM_SIZE in sdkconfig directly, but it just reset back to the default 4MB.
>
> So changed esp-idf itself, components/esp32/Kconfig, and added a prompt for that config option. I could then change it in menuconfig and the
> change persisted.
>
> I then tested:
>
> I (0) cpu_start: App cpu up.
> I (715) spiram: SPI SRAM memory test OK
> I (716) heap_init: Initializing. RAM available for dynamic allocation:
> I (716) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
> I (723) heap_init: At 3FFBA920 len 000256E0 (149 KiB): DRAM
> I (728) heap_init: At 3FFE0440 len 00003BC0 (14 KiB): D/IRAM
> I (735) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
> I (742) heap_init: At 400953CC len 0000AC34 (43 KiB): IRAM
> I (747) cpu_start: Pro cpu start user code
> I (752) spiram: Adding pool of 2048K of external SPI memory to heap allocator
> I (158) cpu_start: Starting scheduler on PRO CPU.
> I (0) cpu_start: Starting scheduler on APP CPU.
> This is ESP32 chip with 2 CPU cores, WiFi/BT/BLE, silicon revision 1, 16MB external flash
> …
> I (9566) test_task: testing mode 1 loopcnt 1000
> E (9576) test_task: loop #1: corruption detected at pos=497 size=498
> 4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
> v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23)4348864OVMS WIFI BLE BT cores=2
> rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019
> 15:08:23)4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
> v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23
>
> #1: size=498, data @0x3f901ca0
> 4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
> v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23)4348864OVMS WIFI BLE BT cores=2
> rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019
> 15:08:23)4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
> v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23
>
> E (9806) test_task: loop #5: corruption detected at pos=497 size=498
> 4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
> v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23)4348864OVMS WIFI BLE BT cores=2
> rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019
> 15:08:23)4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
> v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23
>
> E (10036) test_task: loop #9: corruption detected at pos=829 size=830
> 4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
> v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23)4348864OVMS WIFI BLE BT cores=2
> rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019
> 15:08:23)4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
> v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23)4348864OVMS WIFI BLE BT cores=2
> rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019
> 15:08:23)4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
> v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23
>
>
> which looks like a FAIL to me?
>
> So, then I tried something a little funky:
>
> 1. Go back to normal CONFIG_SPIRAM_SIZE 4MB
>
> 2. Add this to main/test_memory_corruption.cpp app_main():
>
> void* big2mb = heap_caps_malloc(1024*1024*2, MALLOC_CAP_SPIRAM);
> printf("Big 2MB spi memory allocation is at %p\n",big2mb);
>
> 3. Rebuild and run...
>
>
> Here is what I get:
>
> I (0) cpu_start: App cpu up.
> I (1151) spiram: SPI SRAM memory test OK
> I (1152) heap_init: Initializing. RAM available for dynamic allocation:
> I (1152) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
> I (1159) heap_init: At 3FFBA920 len 000256E0 (149 KiB): DRAM
> I (1164) heap_init: At 3FFE0440 len 00003BC0 (14 KiB): D/IRAM
> I (1171) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
> I (1179) heap_init: At 400953CC len 0000AC34 (43 KiB): IRAM
> I (1184) cpu_start: Pro cpu start user code
> I (1188) spiram: Adding pool of 4096K of external SPI memory to heap allocator
> I (206) cpu_start: Starting scheduler on PRO CPU.
> I (0) cpu_start: Starting scheduler on APP CPU.
> ...
> This is ESP32 chip with 2 CPU cores, WiFi/BT/BLE, silicon revision 1, 16MB external flash
> Big 2MB spi memory allocation is at 0x3f800028
> ...
> I (291) wifi station: ESP_WIFI_MODE_APSTA
> I (291) wifi: wifi driver task: 3ffc25c4, prio:23, stack:3584, core=0
> ...
> I (3281) wifi station: retry to connect to the AP
> I (3281) wifi station: connect to the AP fail
>
> I (6111) wifi station: retry to connect to the AP
> I (6111) wifi station: connect to the AP fail
>
> I (8461) mem_stresser: started on core 0 mem @0x3fa01b98
> I (8461) mem_stresser: started on core 1 mem @0x3fa81bac
> I (8951) wifi station: retry to connect to the AP
> I (8951) wifi station: connect to the AP fail
>
> I (9461) test_task: running on core 1
>
> I (9561) test_task: testing mode 1 loopcnt 1000
> #1: size=1039, data @0x3fb01ca4
> 4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
> v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23)4348864OVMS WIFI BLE BT cores=2
> rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019
> 15:08:23)4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
> v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23)4348864OVMS WIFI BLE BT cores=2
> rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019
> 15:08:23)4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf
> v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019 15:08:23)4348864OVMS WIFI BLE BT cores=2
> rev=ESP32/1WLAN-214677-52wifi103.2.001-22-g14d16854-dirty/factory/edge (build idf v3.2-dev-1419-g5afc1b0cb-dirty Jan 27 2019
> 15:08:23)4348864OVMS WIFI BLE BT cores=2 rev=ESP32/1
>
> I (11781) wifi station: retry to connect to the AP
> I (11781) wifi station: connect to the AP fail
>
> I (14621) wifi station: retry to connect to the AP
> I (14621) wifi station: connect to the AP fail
>
> I (17451) wifi station: connect to the AP fail
>
>
> I left it running for a few minutes, but no faults shown. Modes 1 through 4 all clean.
>
> So, to take the funkiness to the next level, I added the same to ovms_main, right at the start (actually in the log level setter, as that
> seemed to be the first thing that initialises). Somebody else gets in for 28bytes of SPIRAM (or maybe that is a structure header), but anyway...
>
> I (1742) spiram: SPI SRAM memory test OK
> I (1743) heap_init: Initializing. RAM available for dynamic allocation:
> I (1743) heap_init: At 3FFAFF10 len 000000F0 (0 KiB): DRAM
> I (1750) heap_init: At 3FFCEDE8 len 00011218 (68 KiB): DRAM
> I (1755) heap_init: At 3FFE0440 len 00003BC0 (14 KiB): D/IRAM
> I (1762) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
> I (1768) heap_init: At 4009F530 len 00000AD0 (2 KiB): IRAM
> I (1775) cpu_start: Pro cpu start user code
> I (1779) spiram: Adding pool of 4096K of external SPI memory to heap allocator
> I (125) ovms_main: Set default logging level for * to INFO
> I (126) ovms_main: Pre-allocated 2MB of SPIRAM at 0x3f800028 (workaround ESP32 bug)
> ...
> I (17783) webserver: Launching Web Server
> I (17823) ssh: Launching SSH Server
> OVMS# test string 500 2
> #1: stdlen = 1027
> 1985504OVMS WIFI BLE BT cores=2 rev=ESP32/168HIGHWAYS-29wifi163.2.001-22-g14d16854-dirty/factory/edge (build idf
> v3.1-dev-2835-g151269458-dirty Jan 30 2019 19:17:20)012.61985504OVMS WIFI BLE BT cores=2
> rev=ESP32/168HIGHWAYS-29wifi163.2.001-22-g14d16854-dirty/factory/edge (build idf v3.1-dev-2835-g151269458-dirty Jan 30 2019
> 19:17:20)012.61985504OVMS WIFI BLE BT cores=2 rev=ESP32/168HIGHWAYS-29wifi163.2.001-22-g14d16854-dirty/factory/edge (build idf
> v3.1-dev-2835-g151269458-dirty Jan 30 2019 19:17:20)012.61985504OVMS WIFI BLE BT cores=2
> rev=ESP32/168HIGHWAYS-29wifi163.2.001-22-g14d16854-dirty/factory/edge (build idf v3.1-dev-2835-g151269458-dirty Jan 30 2019
> 19:17:20)012.61985504OVMS WIFI BLE BT cores=2 rev=ESP32/168HIGHWAYS-29wifi163.2.001-22-g14d16854-dirty/factory/edge (build idf
> v3.1-dev-2835-g151269458-dirty Jan 30 2019 19:17:20)012.61985504OVMS WIFI BLE BT cores=2
> rev=ESP32/168HIGHWAYS-29wifi163.2.001-22-g14d16854-dirty/factory/edge (build idf v3.1-dev-2835-g151269458-dirty Jan 30 2019
> 19:17:20)012.61985504
>
> finished
> OVMS#
>
> OVMS# module memory
> Free 8-bit 76160/230212, 32-bit 812/2720, SPIRAM 1886108/4194252
> --Task-- Total DRAM D/IRAM IRAM SPIRAM +/- DRAM D/IRAM IRAM SPIRAM
> main* 8368 0 0 8456 +8368 +0 +0 +8456
> no task* 5376 0 0 0 +5376 +0 +0 +0
> esp_timer 34808 0 6442123100 +34808 +0 +644+2123100
> OVMS Events 36352 41040 0 35120 +36352 +41040 +0 +35120
> OVMS DukTape 188 0 0 34668 +188 +0 +0 +34668
> OVMS CanRx 3124 0 0 26168 +3124 +0 +0 +26168
> ipc0 7776 0 0 0 +7776 +0 +0 +0
> ipc1 12 0 0 0 +12 +0 +0 +0
> Tmr Svc 0 88 0 0 +0 +88 +0 +0
> tiT 232 468 0 664 +232 +468 +0 +664
> wifi 600 8896 0 3324 +600 +8896 +0 +3324
> OVMS Console 0 200 1180 26332 +0 +200 +1180 +26332
> mdns 0 104 0 4 +0 +104 +0 +4
> OVMS NetMan 0 216 0 84 +0 +216 +0 +84
>
>
> I think that wins as the most ugly kludge to date.
>
> Shall I commit?
>
> Regards, Mark.
>
>> On 30 Jan 2019, at 5:00 PM, Mark Webb-Johnson <mark at webb-johnson.net <mailto:mark at webb-johnson.net>> wrote:
>>
>>
>>
>>> On 30 Jan 2019, at 4:41 PM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de>> wrote:
>>>
>>> Mark,
>>>
>>> issue #2892 mentions using only the lower 2 MB of SPI RAM as a workaround. Where are our allocations placed by the allocator? Does it fill
>>> from the middle or end? If not we don't use the upper half yet.
>>
>> I saw that, and had a look. I don’t see anything in makeconfig to limit to 2MB (not 4MB).
>>
>> I do see a CONFIG_SPIRAM_SIZE=4194304 in sdkconfig, but not sure how it gets there. Perhaps we can just change there to 2MB and try?
>>
>> I did see this in the documentation:
>>
>> During ESP-IDF startup, external RAM is mapped into the data address space starting at at address 0x3F800000 (byte-accessible). The
>> length of this region is the same as the SPIRAM size (up to the limit of 4MiB).
>>
>>
>> So, maybe we can look at the address our allocations come from to see their offset from 0x3F800000? If it is coming from the top 2MB, then
>> perhaps we start with a big 2MB allocation that we never use?
>>
>>> A possible workaround for the most apparent issues with this (string assembly) could be to use char buffers instead of std::string. I was
>>> thinking about doing that at least for the websocket stream.
>>
>> It seems strange that we are only seeing it with std::string. Maybe that just stresses the system more, or does a lot of reallocations?
>>
>>> But if this really is a hardware issue it can affect all objects in SPI RAM, appending to std::string then only triggers this more often.
>>>
>>> Another workaround could be to run everything on core 0.
>>
>> Urgh.
>>
>>> Regards,
>>> Michael
>>>
>>>
>>> Am 30.01.19 um 03:33 schrieb Mark Webb-Johnson:
>>>> Michael,
>>>>
>>>> Espressif’s response (and linking to that other issue) sounds like others are seeing this.
>>>>
>>>> Lousy timing (with Chinese New Year next week), so don’t expect anything quick from Espressif. I guess we’ll just have to live with it
>>>> until they can find a workaround. It sounds like the issue is at the hardware level and a compiler patch will be needed.
>>>>
>>>> Regards, Mark.
>>>>
>>>>> On 30 Jan 2019, at 3:07 AM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de>> wrote:
>>>>>
>>>>> → https://github.com/espressif/esp-idf/issues/3006
>>>>>
>>>>> Regards,
>>>>> Michael
>>>>>
>>>>>
>>>>> Am 28.01.19 um 20:46 schrieb Michael Balzer:
>>>>>> To clarify: the bug is most likely not restricted to the case of building a message in a buffer. It can possibly cause corruptions in any
>>>>>> RAM section, so can well be responsible for many/most of the unidentified crashes and stack/heap corruptions we're experiencing.
>>>>>>
>>>>>> Regards,
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> Am 28.01.19 um 20:43 schrieb Michael Balzer:
>>>>>>> For those not following the github discussion: I'm pretty sure I've nailed the bug down.
>>>>>>>
>>>>>>> I have reproduced the bug in a simple test project and intend to raise an issue with Espressif on this.
>>>>>>>
>>>>>>> https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/189#issuecomment-457965435
>>>>>>>
>>>>>>> If you'd like to test this on your module, configure the project to your wifi credentials, then use "make flash". As the test project is
>>>>>>> small, this normally will not erase your OVMS config partition, but a backup is always recommended.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Michael
>>>>>>>
>>>>>>>
>>>>>>> Am 24.01.19 um 21:19 schrieb Michael Balzer:
>>>>>>>> Everyone please have a look at…
>>>>>>>>
>>>>>>>> https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/189#issuecomment-457334248
>>>>>>>>
>>>>>>>> Please try to reproduce the bug on your modules.
>>>>>>>>
>>>>>>>> I'm open for explanations.
>>>>>>>>
>>>>>>>> I thought this might be some copy-on-write bug with std::string, but the gcc 5.x libstdc++ does no longer use that implementation
>>>>>>>> (wouldn't be C++11 compliant as well). I also tried moving all strings to temporary buffers, but modes 5 & 6 eliminated this
>>>>>>>> explanation as well.
>>>>>>>>
>>>>>>>> My remaining theories:
>>>>>>>>
>>>>>>>> * A task writing out of bounds (but only 0-bytes?)
>>>>>>>> * A hardware issue only affecting some modules
>>>>>>>>
>>>>>>>> A hardware issue only affecting some percentage of ESP32 could explain this as well as the strange heap corruptions that seem to affect
>>>>>>>> some modules especially often.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Michael
>>>>>>>>
--
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20190130/0c3a073a/attachment.htm>
More information about the OvmsDev
mailing list