[Ovmsdev] Attention: esp-idf upgrade still breaks configuration

Mark Webb-Johnson mark at webb-johnson.net
Tue Nov 27 09:02:47 HKT 2018


I think unmount should be on the system.shutdown event. That would be after system.shuttingdown and the MyBoot.RestartPending/RestartReady control steps.

That means if a component really wants to write to config (or elsewhere on /store), it should do it during system.shuttingdown event (and optionally do the MyBoot.RestartPending(TAG) and MyBoot.RestartReady(TAG) thing, like simcom).

As you say, that doesn’t address the crash-during-shuttingdown issue, but can be done in a very component manner. No changes to MyBoot - all done in ovms_config. Or just hard-code the /store unmount before esp_restart().

Regards, Mark.

> On 27 Nov 2018, at 12:47 AM, Michael Balzer <dexter at expeedo.de> wrote:
> 
> Dimitris question about unclean shutdowns let me check our reboot code, and we actually do not take care of unmounting the config store.
> 
> While my crash observations do not look like crash/clean makes a difference to this bug, I still think we should do that.
> 
> The simple solution would be to do this last, in Boot::boot_shutdown_done() just before the esp_restart(), but that does not take care of crashes during shutdown, which still are too frequent.
> 
> How about unmounting /store ASAP after the shuttingdown event (i.e. normal shutdown handling)?
> Writing to the config would then have no effect during shutdown, but do we actually need (want) that to be possible?
> 
> Regards,
> Michael
> 
> 
> Am 25.11.18 um 03:12 schrieb Mark Webb-Johnson:
>> I second this. A fantastic effort, Michael.
>> 
>> The log you provided on the GitHub issue looks really helpful, and I see Dimitri has replied:
>> 
>> dmitry1945 <https://github.com/dmitry1945> commented 6 hours ago <https://github.com/espressif/esp-idf#issuecomment-441391665>
>> Thank you @dexterbg <https://github.com/dexterbg>, the place is clear.
>> 
>> Hopefully Dimitri can find the issue from here. So many little bugs in wifi and bluetooth stacks causing us random issues - it would be helpful to be able to update to the latest IDF.
>> 
>> Thanks, and Regards,
>> Mark.
>> 
>>> On 25 Nov 2018, at 3:22 AM, Stephen Casner <casner at acm.org <mailto:casner at acm.org>> wrote:
>>> 
>>> Michael -- Kudos for this Herculean effort -- Steve
>>> 
>>> On Sat, 24 Nov 2018, Michael Balzer wrote:
>>> 
>>>> Narrowing this down is a real PITA. The effect sometimes stops to occur, I then need to leave the module powered down for a while to get the effect back.
>>>> Sometimes 1 hour is sufficient, currently it's been off for 2 hours and still works. I had a test window yesterday evening, one this morning and one in the
>>>> afternoon. Temperature is most probably irrelevant, as the last power downs were outside in ~4-5 °C to rule this out.
>>>> 
>>>> So as a "passed" can always be a false positive I need to validate every passed step by switching back to a failing version and see if that still fails.
>>>> 
>>>> It seems I now have to wait until tomorrow for the next test window, but I have bisected down to this range of just 8 commits:
>>>> 
>>>> balzer at leela:~/esp/esp-idf> git rev-list 9d609af54c63e7f949a4fbc43d4f1c13b57f49d8 ^9d2f7c60d9aef9860c61c2756318ada68c80fddf
>>>> 9d609af54c63e7f949a4fbc43d4f1c13b57f49d8
>>>> f392727abf7d56490c2f33127a59bfac42c937e0
>>>> e834d6fffc23a6fcfc0d2e871c9235417a7fb48f
>>>> 35842d02abb5f574aaab466d46081a232fdd20a6
>>>> f05f3fbde87a9ce45c6818f71b49cd13888fd457
>>>> a6d6c58ecadb9759a0bacf35cd7332ac641e598d
>>>> 321b1e02052de95db60ddce87eecce5f9e04e9b8
>>>> 40486c872345584d34949b3ce83f9e956a7eea13
>>>> 
>>>> ...with 9d609af54c63e7f949a4fbc43d4f1c13b57f49d8 being the last identified bad commit, and 9d2f7c60d9aef9860c61c2756318ada68c80fddf being the last good.
>>>> 
>>>> If I should guess now, it's probably one of Dmitry's commits on the wear leveling code.
>>>> 
>>>> Regards,
>>>> Michael
>>>> 
>>>> 
>>>> Am 23.11.18 um 17:20 schrieb Michael Balzer:
>>>>> It's not a timing issue, I've let it reboot about 30 times without any successful mount after the first failure.
>>>>> 
>>>>> Going into bisecting now...
>>>>> 
>>>>> 
>>>>> Am 23.11.18 um 15:54 schrieb Mark Webb-Johnson:
>>>>>>> It may actually not be a corruption of the filesystem but some timing issue on the mount procedure. To test that we could disable the auto formatting on
>>>>>>> mount failures.
>>>>>> 
>>>>>> True.
>>>>>> 
>>>>>> A couple of Espressif guys have jumped on the issue, and I have provided some more information for them. I think key will be reproducing it.
>>>>>> 
>>>>>>> The issue may also be dependant on the hardware version, i.e. it could be caused by the bug that caused the SD speed issue on the first 3.1 batch.
>>>>>> 
>>>>>> That was definitely a hardware issue with the CP2102 chip. I don't think related to ESP in any way.
>>>>>> 
>>>>>> Regards, Mark.
>>>>>> 
>>>>>>> On 23 Nov 2018, at 10:34 PM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de><mailto:dexter at expeedo.de <mailto:dexter at expeedo.de>>> wrote:
>>>>>>> 
>>>>>>> It may actually not be a corruption of the filesystem but some timing issue on the mount procedure. To test that we could disable the auto formatting on
>>>>>>> mount failures.
>>>>>>> 
>>>>>>> The issue may also be dependant on the hardware version, i.e. it could be caused by the bug that caused the SD speed issue on the first 3.1 batch.
>>>>>>> 
>>>>>>> I only have tried the idf update on my batch 1 module (my bench / development module). I think most of our edge testers also have that version.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Michael
>>>>>>> 
>>>>>>> 
>>>>>>> Am 23.11.18 um 02:32 schrieb Mark Webb-Johnson:
>>>>>>>> I have raised the following github issue to Espressif:
>>>>>>>> 
>>>>>>>>    https://github.com/espressif/esp-idf/issues/2730 <https://github.com/espressif/esp-idf/issues/2730>
>>>>>>>> 
>>>>>>>> 
>>>>>>>>        Environment
>>>>>>>> 
>>>>>>>>      * Development Kit: none
>>>>>>>>      * Kit version (for WroverKit/PicoKit/DevKitC): none
>>>>>>>>      * Module or chip used: ESP32-WROVER 16MB
>>>>>>>>      * IDF version (run |git describe --tags| to find it): v3.2-beta1-208-g0d7f2d77c
>>>>>>>>      * Build System: make
>>>>>>>>      * Compiler version (run |xtensa-esp32-elf-gcc --version| to find it): (crosstool-NG crosstool-ng-1.22.0-80-g6c4433a) 5.2.0
>>>>>>>>      * Operating System: macOS
>>>>>>>>      * Power Supply: USB
>>>>>>>> 
>>>>>>>> 
>>>>>>>>        Problem Description
>>>>>>>> 
>>>>>>>>    TLDR: Between May and July 2018 a change was made to esp idf master that is causing corruption on FAT filesystems mounted on SPI flash.
>>>>>>>> 
>>>>>>>>    Our project uses a partitions.csv as follows:
>>>>>>>> 
>>>>>>>>    |# Name, Type, SubType, Offset, Size nvs, data, nvs, 0x9000, 0x4000 otadata, data, ota, 0xd000, 0x2000 phy_init, data, phy, 0xf000, 0x1000 factory,
>>>>>>>>    app, factory, 0x10000, 4M ota_0, app, ota_0, , 4M ota_1, app, ota_1, , 4M store, data, fat, , 1M |
>>>>>>>> 
>>>>>>>>    The 'store' partition is formatted as FAT, as follows:
>>>>>>>> 
>>>>>>>>    esp_vfs_fat_mount_config_t m_store_fat;
>>>>>>>>    wl_handle_t m_store_wlh;
>>>>>>>>    memset(&m_store_fat,0,sizeof(esp_vfs_fat_sdmmc_mount_config_t));
>>>>>>>>    m_store_fat.format_if_mount_failed = true;
>>>>>>>>    m_store_fat.max_files = 5;
>>>>>>>>    esp_vfs_fat_spiflash_mount("/store", "store", &m_store_fat, &m_store_wlh);
>>>>>>>> 
>>>>>>>>    We have previously used a clone of esp idf master, dated around May 22 2018, without issues. The partition is very reliable.
>>>>>>>> 
>>>>>>>>    However, on Jul 6 2018, we updated our clone to use the latest esp idf master at that time. Shortly afterwards, users started to report that their
>>>>>>>>    'store' filesystem contents were corrupted. We rolled back.
>>>>>>>> 
>>>>>>>>    We have now tried again (updating on Oct 20 2018 to v3.2-beta1-208-g0d7f2d77c) and immediately had the same issue. Random corruption of FAT filesystem
>>>>>>>>    in SPI flash.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>          Expected Behavior
>>>>>>>> 
>>>>>>>>    No corruption of FAT filesystem.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>          Actual Behavior
>>>>>>>> 
>>>>>>>>    Corruption of FAT filesystem.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>          Steps to reproduce
>>>>>>>> 
>>>>>>>>     1. Create a partition in SPI flash, and mount FAT filesystem
>>>>>>>>     2. Read and write to files on FAT filesystem
>>>>>>>>     3. Reboot
>>>>>>>>     4. Observe random corruption and unmountable filesystem
>>>>>>>> 
>>>>>>>> 
>>>>>>>>          Code to reproduce this issue
>>>>>>>> 
>>>>>>>>    esp_vfs_fat_mount_config_t m_store_fat;
>>>>>>>>    wl_handle_t m_store_wlh;
>>>>>>>>    memset(&m_store_fat,0,sizeof(esp_vfs_fat_sdmmc_mount_config_t));
>>>>>>>>    m_store_fat.format_if_mount_failed = true;
>>>>>>>>    m_store_fat.max_files = 5;
>>>>>>>>    esp_vfs_fat_spiflash_mount("/store", "store", &m_store_fat, &m_store_wlh);
>>>>>>>> 
>>>>>>>> 
>>>>>>>>        Debug Logs
>>>>>>>> 
>>>>>>>>    n/a
>>>>>>>> 
>>>>>>>> 
>>>>>>>>        Other items if possible
>>>>>>>> 
>>>>>>>>    Please advise if you need anything further.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I think the timeline is correct (the issue is in esp idf master some time between May and July 2018), but please let me know if you know differently (or
>>>>>>>> update the github issue with your comments).
>>>>>>>> 
>>>>>>>> Regards, Mark
>>>>>>>> 
>>>>>>>>> On 23 Nov 2018, at 6:19 AM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de><mailto:dexter at expeedo.de <mailto:dexter at expeedo.de>>> wrote:
>>>>>>>>> 
>>>>>>>>> esp-idf and OVMS branches are back to the working version.
>>>>>>>>> 
>>>>>>>>> In case you also lost your config:  I also just fixed a bug on restoring into an empty /store partition.
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Michael
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Am 22.11.18 um 22:34 schrieb Michael Balzer:
>>>>>>>>>> See https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/pull/165 <https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/pull/165>
>>>>>>>>>> 
>>>>>>>>>> I'll reset both master branches now.
>>>>>>>>>> 
>>>>>>>>>> If you're about to pull, please wait until I've reverted the branches.
>>>>>>>>>> 
>>>>>>>>>> Regards,
>>>>>>>>>> Michael
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>>>> 
>>>>>>>>> _______________________________________________
>>>>>>>>> OvmsDev mailing list
>>>>>>>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com> <mailto:OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>>
>>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev <http://lists.openvehicles.com/mailman/listinfo/ovmsdev>
>>>>>>>> 
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> OvmsDev mailing list
>>>>>>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev <http://lists.openvehicles.com/mailman/listinfo/ovmsdev>
>>>>>>> 
>>>>>>> --
>>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>> _______________________________________________
>>>>>>> OvmsDev mailing list
>>>>>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com> <mailto:OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>>
>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev <http://lists.openvehicles.com/mailman/listinfo/ovmsdev>
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> OvmsDev mailing list
>>>>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev <http://lists.openvehicles.com/mailman/listinfo/ovmsdev>
>>>>> 
>>>>> --
>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>> 
>>>>> _______________________________________________
>>>>> OvmsDev mailing list
>>>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev <http://lists.openvehicles.com/mailman/listinfo/ovmsdev>
>>>> 
>>>> --
>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>> 
>>>> 
>>> 
>>>                                                        -- Steve_______________________________________________
>>> OvmsDev mailing list
>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev <http://lists.openvehicles.com/mailman/listinfo/ovmsdev>
>> 
>> 
>> 
>> _______________________________________________
>> OvmsDev mailing list
>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev <http://lists.openvehicles.com/mailman/listinfo/ovmsdev>
> 
> -- 
> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
> _______________________________________________
> OvmsDev mailing list
> OvmsDev at lists.openvehicles.com
> http://lists.openvehicles.com/mailman/listinfo/ovmsdev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20181127/21028f29/attachment-0001.html>


More information about the OvmsDev mailing list