[Ovmsdev] std::string data corruption (issue #189)

Michael Balzer dexter at expeedo.de
Wed Jan 30 19:13:27 HKT 2019


My test log shows the buffer address 0x3f802044, so allocations are done from the beginning and we're not touching the upper 2 MB.
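
For reference, the arithmetic behind that statement can be sketched as below. This is only an illustration under assumptions: the base address 0x3F800000 is the one from the documentation quoted further down, the 2 MB boundary is taken from the workaround under discussion, and the helper names spiram_offset and in_lower_2mb are made up for this example and are not part of ESP-IDF.

```cpp
#include <cstdint>

// Start of the external RAM mapping, as stated in the ESP-IDF documentation
// quoted in this thread.
constexpr std::uintptr_t kSpiRamBase = 0x3F800000u;

// Size of the "lower half" referred to by the 2 MB workaround.
constexpr std::uintptr_t kTwoMiB = 2u * 1024u * 1024u;

// Hypothetical helper: byte offset of an address from the SPI RAM base.
constexpr std::uintptr_t spiram_offset(std::uintptr_t addr) {
  return addr - kSpiRamBase;
}

// Hypothetical helper: true if the address lies within the first 2 MB of the
// mapped SPI RAM region.
constexpr bool in_lower_2mb(std::uintptr_t addr) {
  return addr >= kSpiRamBase && spiram_offset(addr) < kTwoMiB;
}
```

For the logged address, spiram_offset(0x3F802044) is 0x2044 (8260 bytes), far below the 2 MB mark at offset 0x200000.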

Maybe the 2 MB workaround doesn't apply in our case, or has a correlation with the core assignment.

Reallocations as a source can be ruled out, as I reserve enough capacity on the std::string.
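
A minimal sketch of how that can be checked, assuming the usual libstdc++ behaviour that appending within the reserved capacity does not move the buffer. The function name append_without_realloc is hypothetical and this is not the actual test project code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends `chunks` copies of `chunk` to a string whose capacity was reserved
// up front. Returns true if the buffer address never changed during the
// appends (i.e. no reallocation happened) and the final size is as expected.
bool append_without_realloc(std::size_t chunks, const std::string& chunk) {
  const std::size_t total = chunks * chunk.size();

  std::string s;
  s.reserve(total);
  assert(s.capacity() >= total);

  // Take the address only after reserve(), which may itself have allocated.
  const char* const start = s.data();

  for (std::size_t i = 0; i < chunks; ++i) {
    s.append(chunk);
    if (s.data() != start) {
      return false;  // buffer moved: a reallocation took place
    }
  }
  return s.size() == total;
}
```

Calling append_without_realloc(100, "0123456789abcdef") exercises 1600 bytes of appends on a single reserved buffer. Logging s.data() in the same way is also what yields the buffer address mentioned above.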

> Urgh. 

Indeed.


On 30.01.19 at 10:00, Mark Webb-Johnson wrote:
>> On 30 Jan 2019, at 4:41 PM, Michael Balzer <dexter at expeedo.de> wrote:
>>
>> Mark,
>>
>> issue #2892 mentions using only the lower 2 MB of SPI RAM as a workaround. Where are our allocations placed by the allocator? Does it fill
>> from the middle or end? If not, we don't use the upper half yet.
>
> I saw that, and had a look. I don’t see anything in makeconfig to limit it to 2MB (instead of 4MB).
>
> I do see a CONFIG_SPIRAM_SIZE=4194304 in sdkconfig, but I'm not sure how it gets there. Perhaps we can just change it there to 2MB and try?
>
> I did see this in the documentation:
>
>     During ESP-IDF startup, external RAM is mapped into the data address space starting at address 0x3F800000 (byte-accessible). The length
>     of this region is the same as the SPIRAM size (up to the limit of 4MiB).
>
>
> So, maybe we can look at the address our allocations come from to see their offset from 0x3F800000? If it is coming from the top 2MB, then
> perhaps we start with a big 2MB allocation that we never use?
>  
>> A possible workaround for the most apparent issues with this (string assembly) could be to use char buffers instead of std::string. I was
>> thinking about doing that at least for the websocket stream.
>
> It seems strange that we are only seeing it with std::string. Maybe that just stresses the system more, or does a lot of reallocations?
>
>> But if this really is a hardware issue it can affect all objects in SPI RAM, appending to std::string then only triggers this more often.
>>
>> Another workaround could be to run everything on core 0.
>
> Urgh.
>
>> Regards,
>> Michael
>>
>>
>> On 30.01.19 at 03:33, Mark Webb-Johnson wrote:
>>> Michael,
>>>
>>> Espressif’s response (and linking to that other issue) sounds like others are seeing this.
>>>
>>> Lousy timing (with Chinese New Year next week), so don’t expect anything quick from Espressif. I guess we’ll just have to live with it until
>>> they can find a workaround. It sounds like the issue is at the hardware level and a compiler patch will be needed.
>>>
>>> Regards, Mark.
>>>
>>>> On 30 Jan 2019, at 3:07 AM, Michael Balzer <dexter at expeedo.de> wrote:
>>>>
>>>> https://github.com/espressif/esp-idf/issues/3006
>>>>
>>>> Regards,
>>>> Michael
>>>>
>>>>
>>>> On 28.01.19 at 20:46, Michael Balzer wrote:
>>>>> To clarify: the bug is most likely not restricted to the case of building a message in a buffer. It can potentially cause corruption in any
>>>>> RAM section, so it may well be responsible for many/most of the unidentified crashes and stack/heap corruptions we're experiencing.
>>>>>
>>>>> Regards,
>>>>> Michael
>>>>>
>>>>>
>>>>> On 28.01.19 at 20:43, Michael Balzer wrote:
>>>>>> For those not following the github discussion: I'm pretty sure I've nailed the bug down.
>>>>>>
>>>>>> I have reproduced the bug in a simple test project and intend to raise an issue with Espressif on this.
>>>>>>
>>>>>> https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/189#issuecomment-457965435
>>>>>>
>>>>>> If you'd like to test this on your module, configure the project with your wifi credentials, then use "make flash". As the test project is
>>>>>> small, this normally will not erase your OVMS config partition, but a backup is always recommended.
>>>>>>
>>>>>> Regards,
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> On 24.01.19 at 21:19, Michael Balzer wrote:
>>>>>>> Everyone please have a look at…
>>>>>>>
>>>>>>> https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/189#issuecomment-457334248
>>>>>>>
>>>>>>> Please try to reproduce the bug on your modules.
>>>>>>>
>>>>>>> I'm open to explanations.
>>>>>>>
>>>>>>> I thought this might be some copy-on-write bug with std::string, but the gcc 5.x libstdc++ no longer uses that implementation
>>>>>>> (it wouldn't be C++11 compliant anyway). I also tried moving all strings to temporary buffers, but modes 5 & 6 ruled out this explanation
>>>>>>> as well.
>>>>>>>
>>>>>>> My remaining theories:
>>>>>>>
>>>>>>>   * A task writing out of bounds (but only 0-bytes?)
>>>>>>>   * A hardware issue only affecting some modules
>>>>>>>
>>>>>>> A hardware issue only affecting some percentage of ESP32 could explain this as well as the strange heap corruptions that seem to affect
>>>>>>> some modules especially often.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Michael

-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
