[Ovmsdev] Attention: esp-idf upgrade still breaks configuration
Michael Balzer
dexter at expeedo.de
Wed Jan 23 04:17:49 HKT 2019
Wifi AP mode is broken. A fix should be available in some days:
https://github.com/espressif/esp-idf/issues/2915#issuecomment-452606571
I haven't been able to test the modem yet. It seems to work but my test SIM ran out of balance, so I need to wait for the payment process to finish.
Apart from that it works pretty well so far.
Am 22.01.19 um 17:43 schrieb Michael Balzer:
> Dmitry found out we had a bad timing with our merge. The bug was fixed by him just five days later:
>
> https://github.com/espressif/esp-idf/commit/82eca97300410dd743de7f116836879cf3cdc6ba
>
> I have merged the latest upstream HEAD locally now. It builds correctly, with just one little issue in the Kconfig for Smart ED, and seems to work correctly
> as well. I'll do some testing now.
>
> Regards,
> Michael
>
>
> Am 27.12.18 um 11:03 schrieb Michael Balzer:
>> Update: I managed to reproduce the bug, and Dmitry's assumption is correct.
>>
>> The wear leveling system attempts to do an update when mounting a partition formatted with the old version. That update fails and leads to the next reboot
>> failing to mount the partition.
>>
>> So the WL update process needs to be fixed.
>>
>> Regards,
>> Michael
>>
>>
>> Am 27.11.18 um 22:35 schrieb Michael Balzer:
>>> @all: we need some help on this.
>>>
>>> Dmitry has asked us to verify an assumption about the source of the problem. Unfortunately, my module has decided to not show the bug anymore.
>>>
>>> Can anyone a) reproduce the bug and b) check Dmitry's code change? The code change is very simple:
>>>
>>> https://github.com/espressif/esp-idf/issues/2730#issuecomment-441928779
>>>
>>> …that's in file esp-idf/components/wear_levelling/WL_Flash.cpp line 183.
>>>
>>> Chances are the bug is more frequently present on V3.1 first batch modules.
>>>
>>> _To test a build including the bug, do_:
>>>
>>> OVMS# config backup /sd/cfgbackup.zip
>>>
>>> ~/…/esp-idf> git checkout 5afc1b0cbbcd5903744cd2948ee235b7016eeede
>>> ~/…/esp-idf> git merge --no-commit 9d609af54c63e7f949a4fbc43d4f1c13b57f49d8
>>> ~/…/esp-idf> git submodule update --recursive
>>>
>>> ~/…/OVMS.V3> git checkout c3c908c00a09c4a841d1fec115be785c8f486270
>>> ~/…/OVMS.V3> make app-flash
>>>
>>> Then do some reboots of the module.
>>>
>>> If the bug shows up, usually the second reboot already will result in a lost configuration. You'll see some warnings in the boot process (mount failure),
>>> and the prompt will directly be "#" as no module password is in place.
>>>
>>> If that happens, rebuild after applying the change as described by Dmitry, restore your config (see below), check if the bug is gone. If it isn't, the
>>> change _definitely doesn't work_.
>>>
>>> If it's gone, as the bug can stop showing up, you need to revert Dmitry's change and check once again if the bug reappears. If it does, the change
>>> _definitely works_.
>>>
>>> If the bug doesn't reappear, you need to switch off the module for some hours, then redo the above steps.
>>>
>>>
>>> _Note on restoring the config backup_: as my later restore fix isn't included in that version, you may need to create the events directory manually:
>>>
>>> OVMS# vfs mkdir /store/events
>>> OVMS# config restore /sd/cfgbackup.zip <password>
>>>
>>> To return to normal build setup:
>>>
>>> ~/…/esp-idf> git checkout -f master
>>> ~/…/esp-idf> git submodule update --recursive
>>>
>>> ~/…/OVMS.V3> git checkout master
>>>
>>>
>>> Please report definitive results directly to Dmitry in the issue #2730.
>>>
>>> Thanks,
>>> Michael
>>>
>>>
>>> Am 27.11.18 um 02:02 schrieb Mark Webb-Johnson:
>>>> I think unmount should be on the system.shutdown event. That would be after system.shuttingdown and the MyBoot.RestartPending/RestartReady control steps.
>>>>
>>>> That means if a component really wants to write to config (or elsewhere on /store), it should do it during system.shuttingdown event (and optionally do
>>>> the MyBoot.RestartPending(TAG) and MyBoot.RestartReady(TAG) thing, like simcom).
>>>>
>>>> As you say, that doesn’t address the crash-during-shuttingdown issue, but can be done in a very component manner. No changes to MyBoot - all done in
>>>> ovms_config. Or just hard-code the /store unmount before esp_restart().
>>>>
>>>> Regards, Mark.
>>>>
>>>>> On 27 Nov 2018, at 12:47 AM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de>> wrote:
>>>>>
>>>>> Dimitris question about unclean shutdowns let me check our reboot code, and we actually do not take care of unmounting the config store.
>>>>>
>>>>> While my crash observations do not look like crash/clean makes a difference to this bug, I still think we should do that.
>>>>>
>>>>> The simple solution would be to do this last, in Boot::boot_shutdown_done() just before the esp_restart(), but that does not take care of crashes during
>>>>> shutdown, which still are too frequent.
>>>>>
>>>>> How about unmounting /store ASAP after the shuttingdown event (i.e. normal shutdown handling)?
>>>>> Writing to the config would then have no effect during shutdown, but do we actually need (want) that to be possible?
>>>>>
>>>>> Regards,
>>>>> Michael
>>>>>
>>>>>
>>>>> Am 25.11.18 um 03:12 schrieb Mark Webb-Johnson:
>>>>>> I second this. A fantastic effort, Michael.
>>>>>>
>>>>>> The log you provided on the GitHub issue looks really helpful, and I see Dimitri has replied:
>>>>>>
>>>>>>
>>>>>> dmitry1945 <https://github.com/dmitry1945> commented 6 hours ago <https://github.com/espressif/esp-idf#issuecomment-441391665>
>>>>>>
>>>>>> Thank you @dexterbg <https://github.com/dexterbg>, the place is clear.
>>>>>>
>>>>>>
>>>>>> Hopefully Dimitri can find the issue from here. So many little bugs in wifi and bluetooth stacks causing us random issues - it would be helpful to be
>>>>>> able to update to the latest IDF.
>>>>>>
>>>>>> Thanks, and Regards,
>>>>>> Mark.
>>>>>>
>>>>>>> On 25 Nov 2018, at 3:22 AM, Stephen Casner <casner at acm.org <mailto:casner at acm.org>> wrote:
>>>>>>>
>>>>>>> Michael -- Kudos for this Herculean effort -- Steve
>>>>>>>
>>>>>>> On Sat, 24 Nov 2018, Michael Balzer wrote:
>>>>>>>
>>>>>>>> Narrowing this down is a real PITA. The effect sometimes stops to occur, I then need to leave the module powered down for a while to get the effect back.
>>>>>>>> Sometimes 1 hour is sufficient, currently it's been off for 2 hours and still works. I had a test window yesterday evening, one this morning and one in the
>>>>>>>> afternoon. Temperature is most probably irrelevant, as the last power downs were outside in ~4-5 °C to rule this out.
>>>>>>>>
>>>>>>>> So as a "passed" can always be a false positive I need to validate every passed step by switching back to a failing version and see if that still fails.
>>>>>>>>
>>>>>>>> It seems I now have to wait until tomorrow for the next test window, but I have bisected down to this range of just 8 commits:
>>>>>>>>
>>>>>>>> balzer at leela:~/esp/esp-idf> git rev-list 9d609af54c63e7f949a4fbc43d4f1c13b57f49d8 ^9d2f7c60d9aef9860c61c2756318ada68c80fddf
>>>>>>>> 9d609af54c63e7f949a4fbc43d4f1c13b57f49d8
>>>>>>>> f392727abf7d56490c2f33127a59bfac42c937e0
>>>>>>>> e834d6fffc23a6fcfc0d2e871c9235417a7fb48f
>>>>>>>> 35842d02abb5f574aaab466d46081a232fdd20a6
>>>>>>>> f05f3fbde87a9ce45c6818f71b49cd13888fd457
>>>>>>>> a6d6c58ecadb9759a0bacf35cd7332ac641e598d
>>>>>>>> 321b1e02052de95db60ddce87eecce5f9e04e9b8
>>>>>>>> 40486c872345584d34949b3ce83f9e956a7eea13
>>>>>>>>
>>>>>>>> ...with 9d609af54c63e7f949a4fbc43d4f1c13b57f49d8 being the last identified bad commit, and 9d2f7c60d9aef9860c61c2756318ada68c80fddf being the last good.
>>>>>>>>
>>>>>>>> If I should guess now, it's probably one of Dmitry's commits on the wear leveling code.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Michael
>>>>>>>>
>>>>>>>>
>>>>>>>> Am 23.11.18 um 17:20 schrieb Michael Balzer:
>>>>>>>>> It's not a timing issue, I've let it reboot about 30 times without any successful mount after the first failure.
>>>>>>>>>
>>>>>>>>> Going into bisecting now...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Am 23.11.18 um 15:54 schrieb Mark Webb-Johnson:
>>>>>>>>>>> It may actually not be a corruption of the filesystem but some timing issue on the mount procedure. To test that we could disable the auto formatting on
>>>>>>>>>>> mount failures.
>>>>>>>>>>
>>>>>>>>>> True.
>>>>>>>>>>
>>>>>>>>>> A couple of Espressif guys have jumped on the issue, and I have provided some more information for them. I think key will be reproducing it.
>>>>>>>>>>
>>>>>>>>>>> The issue may also be dependant on the hardware version, i.e. it could be caused by the bug that caused the SD speed issue on the first 3.1 batch.
>>>>>>>>>>
>>>>>>>>>> That was definitely a hardware issue with the CP2102 chip. I don't think related to ESP in any way.
>>>>>>>>>>
>>>>>>>>>> Regards, Mark.
>>>>>>>>>>
>>>>>>>>>>> On 23 Nov 2018, at 10:34 PM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de> <mailto:dexter at expeedo.de>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> It may actually not be a corruption of the filesystem but some timing issue on the mount procedure. To test that we could disable the auto formatting on
>>>>>>>>>>> mount failures.
>>>>>>>>>>>
>>>>>>>>>>> The issue may also be dependant on the hardware version, i.e. it could be caused by the bug that caused the SD speed issue on the first 3.1 batch.
>>>>>>>>>>>
>>>>>>>>>>> I only have tried the idf update on my batch 1 module (my bench / development module). I think most of our edge testers also have that version.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Michael
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Am 23.11.18 um 02:32 schrieb Mark Webb-Johnson:
>>>>>>>>>>>> I have raised the following github issue to Espressif:
>>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/espressif/esp-idf/issues/2730
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Environment
>>>>>>>>>>>>
>>>>>>>>>>>> * Development Kit: none
>>>>>>>>>>>> * Kit version (for WroverKit/PicoKit/DevKitC): none
>>>>>>>>>>>> * Module or chip used: ESP32-WROVER 16MB
>>>>>>>>>>>> * IDF version (run |git describe --tags| to find it): v3.2-beta1-208-g0d7f2d77c
>>>>>>>>>>>> * Build System: make
>>>>>>>>>>>> * Compiler version (run |xtensa-esp32-elf-gcc --version| to find it): (crosstool-NG crosstool-ng-1.22.0-80-g6c4433a) 5.2.0
>>>>>>>>>>>> * Operating System: macOS
>>>>>>>>>>>> * Power Supply: USB
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Problem Description
>>>>>>>>>>>>
>>>>>>>>>>>> TLDR: Between May and July 2018 a change was made to esp idf master that is causing corruption on FAT filesystems mounted on SPI flash.
>>>>>>>>>>>>
>>>>>>>>>>>> Our project uses a partitions.csv as follows:
>>>>>>>>>>>>
>>>>>>>>>>>> |# Name, Type, SubType, Offset, Size nvs, data, nvs, 0x9000, 0x4000 otadata, data, ota, 0xd000, 0x2000 phy_init, data, phy, 0xf000, 0x1000 factory,
>>>>>>>>>>>> app, factory, 0x10000, 4M ota_0, app, ota_0, , 4M ota_1, app, ota_1, , 4M store, data, fat, , 1M |
>>>>>>>>>>>>
>>>>>>>>>>>> The 'store' partition is formatted as FAT, as follows:
>>>>>>>>>>>>
>>>>>>>>>>>> esp_vfs_fat_mount_config_t m_store_fat;
>>>>>>>>>>>> wl_handle_t m_store_wlh;
>>>>>>>>>>>> memset(&m_store_fat,0,sizeof(esp_vfs_fat_sdmmc_mount_config_t));
>>>>>>>>>>>> m_store_fat.format_if_mount_failed = true;
>>>>>>>>>>>> m_store_fat.max_files = 5;
>>>>>>>>>>>> esp_vfs_fat_spiflash_mount("/store", "store", &m_store_fat, &m_store_wlh);
>>>>>>>>>>>>
>>>>>>>>>>>> We have previously used a clone of esp idf master, dated around May 22 2018, without issues. The partition is very reliable.
>>>>>>>>>>>>
>>>>>>>>>>>> However, on Jul 6 2018, we updated our clone to use the latest esp idf master at that time. Shortly afterwards, users started to report that their
>>>>>>>>>>>> 'store' filesystem contents were corrupted. We rolled back.
>>>>>>>>>>>>
>>>>>>>>>>>> We have now tried again (updating on Oct 20 2018 to v3.2-beta1-208-g0d7f2d77c) and immediately had the same issue. Random corruption of FAT
>>>>>>>>>>>> filesystem
>>>>>>>>>>>> in SPI flash.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Expected Behavior
>>>>>>>>>>>>
>>>>>>>>>>>> No corruption of FAT filesystem.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Actual Behavior
>>>>>>>>>>>>
>>>>>>>>>>>> Corruption of FAT filesystem.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Steps to reproduce
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Create a partition in SPI flash, and mount FAT filesystem
>>>>>>>>>>>> 2. Read and write to files on FAT filesystem
>>>>>>>>>>>> 3. Reboot
>>>>>>>>>>>> 4. Observe random corruption and unmountable filesystem
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Code to reproduce this issue
>>>>>>>>>>>>
>>>>>>>>>>>> esp_vfs_fat_mount_config_t m_store_fat;
>>>>>>>>>>>> wl_handle_t m_store_wlh;
>>>>>>>>>>>> memset(&m_store_fat,0,sizeof(esp_vfs_fat_sdmmc_mount_config_t));
>>>>>>>>>>>> m_store_fat.format_if_mount_failed = true;
>>>>>>>>>>>> m_store_fat.max_files = 5;
>>>>>>>>>>>> esp_vfs_fat_spiflash_mount("/store", "store", &m_store_fat, &m_store_wlh);
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Debug Logs
>>>>>>>>>>>>
>>>>>>>>>>>> n/a
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Other items if possible
>>>>>>>>>>>>
>>>>>>>>>>>> Please advise if you need anything further.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I think the timeline is correct (the issue is in esp idf master some time between May and July 2018), but please let me know if you know
>>>>>>>>>>>> differently (or
>>>>>>>>>>>> update the github issue with your comments).
>>>>>>>>>>>>
>>>>>>>>>>>> Regards, Mark
>>>>>>>>>>>>
>>>>>>>>>>>>> On 23 Nov 2018, at 6:19 AM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de> <mailto:dexter at expeedo.de>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> esp-idf and OVMS branches are back to the working version.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In case you also lost your config: I also just fixed a bug on restoring into an empty /store partition.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Michael
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Am 22.11.18 um 22:34 schrieb Michael Balzer:
>>>>>>>>>>>>>> See https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/pull/165
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'll reset both master branches now.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you're about to pull, please wait until I've reverted the branches.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Michael
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>>>>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> OvmsDev mailing list
>>>>>>>>>>>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com> <mailto:OvmsDev at lists.openvehicles.com>
>>>>>>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> OvmsDev mailing list
>>>>>>>>>>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>>>>>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> OvmsDev mailing list
>>>>>>>>>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com> <mailto:OvmsDev at lists.openvehicles.com>
>>>>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> OvmsDev mailing list
>>>>>>>>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>>>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> OvmsDev mailing list
>>>>>>>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>>>
>>>>>>>> --
>>>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> -- Steve_______________________________________________
>>>>>>> OvmsDev mailing list
>>>>>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> OvmsDev mailing list
>>>>>> OvmsDev at lists.openvehicles.com
>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>
>>>>> --
>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>> _______________________________________________
>>>>> OvmsDev mailing list
>>>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>
>>>>
>>>> _______________________________________________
>>>> OvmsDev mailing list
>>>> OvmsDev at lists.openvehicles.com
>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>
>>> --
>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>
>>> _______________________________________________
>>> OvmsDev mailing list
>>> OvmsDev at lists.openvehicles.com
>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>
>> --
>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>
>> _______________________________________________
>> OvmsDev mailing list
>> OvmsDev at lists.openvehicles.com
>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>
> --
> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>
> _______________________________________________
> OvmsDev mailing list
> OvmsDev at lists.openvehicles.com
> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
--
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20190122/f17ab8a6/attachment.htm>
More information about the OvmsDev
mailing list