Dimitri's question about unclean shutdowns made me check our reboot
code, and we actually do not take care of unmounting the config
store.
While my crash observations do not suggest that crashed vs. clean
shutdowns make a difference to this bug, I still think we should do
that.
The simple solution would be to do this last, in
Boot::boot_shutdown_done() just before the esp_restart(), but that
does not cover crashes during shutdown, which are still too
frequent.
How about unmounting /store ASAP after the shuttingdown event (i.e.
normal shutdown handling)?
Writing to the config would then have no effect during shutdown, but
do we actually need (or want) that to be possible?
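A minimal sketch of the early-unmount idea, not the actual OVMS code:
UnmountStoreForShutdown() and s_store_wlh are hypothetical names, and the
handle is assumed to be the one returned by esp_vfs_fat_spiflash_mount() for
/store. esp_vfs_fat_spiflash_unmount() is the IDF 3.x call; the stand-ins in
the #else branch only exist so the sketch also builds on a host.

```cpp
#include <cassert>

#if __has_include("esp_vfs_fat.h")
#include "esp_vfs_fat.h"   // ESP-IDF build: real FAT VFS + wear levelling API
#else
// Host build only: minimal stand-ins so the sketch compiles off-target.
typedef int esp_err_t;
typedef int wl_handle_t;
#define ESP_OK 0
#define WL_INVALID_HANDLE -1
static esp_err_t esp_vfs_fat_spiflash_unmount(const char*, wl_handle_t) { return ESP_OK; }
#endif

// Assumption: the mount code stores the wear levelling handle here.
static wl_handle_t s_store_wlh = WL_INVALID_HANDLE;

// Hypothetical hook, meant to run once when the shuttingdown event fires.
// Returns true if /store is unmounted afterwards.
bool UnmountStoreForShutdown()
  {
  if (s_store_wlh == WL_INVALID_HANDLE)
    return true;                       // never mounted, or already unmounted
  if (esp_vfs_fat_spiflash_unmount("/store", s_store_wlh) != ESP_OK)
    return false;                      // leave the handle so a retry is possible
  s_store_wlh = WL_INVALID_HANDLE;     // later calls become no-ops
  return true;
  }
```

Called early in the shutdown handling, this leaves /store closed for the rest
of the sequence, so a crash later during shutdown would no longer hit a
mounted filesystem. Config writes after that point would fail.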
Regards,
Michael
On 25.11.18 at 03:12, Mark Webb-Johnson wrote:
I second this. A fantastic effort, Michael.
The log you provided on the GitHub issue looks
really helpful, and I see Dimitri has replied:
Hopefully Dimitri can find the issue from here. So many
little bugs in wifi and bluetooth stacks causing us random
issues - it would be helpful to be able to update to the
latest IDF.
Thanks, and Regards,
Mark.
Michael -- Kudos for this Herculean effort
-- Steve
On Sat, 24 Nov 2018, Michael Balzer wrote:
Narrowing this down is a real PITA. The effect sometimes stops
occurring, and I then need to leave the module powered down for a
while to get it back.
Sometimes 1 hour is sufficient; currently it has been off for 2 hours
and still works. I had a test window yesterday evening, one this
morning and one in the afternoon. Temperature is most probably
irrelevant, as the last power-downs were outside at ~4-5 °C to rule
this out.
Since a "passed" can always be a false positive, I need to validate
every passed step by switching back to a failing version and checking
that it still fails.
It seems I now have to wait until tomorrow for the next test window,
but I have bisected down to this range of just 8 commits:
balzer@leela:~/esp/esp-idf> git rev-list 9d609af54c63e7f949a4fbc43d4f1c13b57f49d8 ^9d2f7c60d9aef9860c61c2756318ada68c80fddf
9d609af54c63e7f949a4fbc43d4f1c13b57f49d8
f392727abf7d56490c2f33127a59bfac42c937e0
e834d6fffc23a6fcfc0d2e871c9235417a7fb48f
35842d02abb5f574aaab466d46081a232fdd20a6
f05f3fbde87a9ce45c6818f71b49cd13888fd457
a6d6c58ecadb9759a0bacf35cd7332ac641e598d
321b1e02052de95db60ddce87eecce5f9e04e9b8
40486c872345584d34949b3ce83f9e956a7eea13
...with 9d609af54c63e7f949a4fbc43d4f1c13b57f49d8 being the last
identified bad commit, and 9d2f7c60d9aef9860c61c2756318ada68c80fddf
being the last good.
If I had to guess now, it's probably one of Dmitry's commits on the
wear leveling code.
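The validation rule described above (never trust a pass until a known-bad
build has been seen to fail again) can be sketched as a bisect loop. This is
only an illustration under assumptions: BisectFirstBad, passes and
bug_is_active are hypothetical names, commits are indexed oldest to newest,
and index n-1 is the known-bad commit.

```cpp
#include <cassert>
#include <functional>

// Returns the index of the first bad commit in [0, n).
// passes(i):       build and test commit i, true if the mount succeeded.
// bug_is_active(): re-test a known-bad build, true if it still fails.
int BisectFirstBad(int n,
                   const std::function<bool(int)>& passes,
                   const std::function<bool()>& bug_is_active)
  {
  int good = -1;      // the last good commit sits just before the range
  int bad = n - 1;    // the newest commit in the range is known bad
  while (bad - good > 1)
    {
    int mid = good + (bad - good) / 2;
    if (!passes(mid))
      {
      bad = mid;      // a failure is always conclusive
      continue;
      }
    if (bug_is_active())
      good = mid;     // pass confirmed: the known-bad build still fails
    // Otherwise the effect has gone dormant: discard the pass and retry
    // the same commit (in practice after the next power-down window).
    }
  return bad;
  }
```

With 8 commits this needs at most 3 conclusive steps, but each dormant phase
adds a retry, which is where the waiting time goes.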
Regards,
Michael
On 23.11.18 at 17:20, Michael Balzer wrote:
It's not a timing issue: I've let it reboot about 30 times without
any successful mount after the first failure.
Going into bisecting now...
On 23.11.18 at 15:54, Mark Webb-Johnson wrote:
> It may actually not be a corruption of the filesystem but some
> timing issue on the mount procedure. To test that we could disable
> the auto formatting on mount failures.
True.
A couple of Espressif guys have jumped on the issue, and I have
provided some more information for them. I think the key will be
reproducing it.
> The issue may also be dependent on the hardware version, i.e. it
> could be caused by the bug that caused the SD speed issue on the
> first 3.1 batch.
That was definitely a hardware issue with the CP2102 chip. I don't
think it is related to the ESP in any way.
Regards, Mark.
On 23 Nov 2018, at 10:34 PM, Michael Balzer <dexter@expeedo.de> wrote:
It may actually not be a corruption of the filesystem but some
timing issue on the mount procedure. To test that we could disable
the auto formatting on mount failures.
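To make that test concrete, a minimal sketch assuming a hypothetical helper
StoreMountConfig(); the field names are those of the IDF
esp_vfs_fat_mount_config_t, and the struct in the #else branch is only a host
stand-in. With auto formatting off, a failed mount leaves the flash contents
untouched, so a later successful mount would point to timing rather than
corruption.

```cpp
#include <cassert>
#include <cstring>

#if __has_include("esp_vfs_fat.h")
#include "esp_vfs_fat.h"   // ESP-IDF build: real mount config type
#else
// Host build only: stand-in with the two fields used below.
struct esp_vfs_fat_mount_config_t
  {
  bool format_if_mount_failed;
  int max_files;
  };
#endif

// Hypothetical helper: build the /store mount config.
// diagnostic = true disables auto formatting, so a mount failure is only
// reported and the partition is left as it is.
esp_vfs_fat_mount_config_t StoreMountConfig(bool diagnostic)
  {
  esp_vfs_fat_mount_config_t cfg;
  memset(&cfg, 0, sizeof(cfg));
  cfg.format_if_mount_failed = !diagnostic;
  cfg.max_files = 5;       // same value as in the current mount code
  return cfg;
  }
```

The result would be passed to esp_vfs_fat_spiflash_mount() as before; in the
diagnostic case the caller has to check the return code itself, since the
partition will no longer be reformatted behind its back.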
The issue may also be dependent on the hardware version, i.e. it
could be caused by the bug that caused the SD speed issue on the
first 3.1 batch.
I have only tried the IDF update on my batch 1 module (my bench /
development module). I think most of our edge testers also have that
version.
Regards,
Michael
On 23.11.18 at 02:32, Mark Webb-Johnson wrote:
I have raised the following GitHub issue with Espressif:
https://github.com/espressif/esp-idf/issues/2730
Environment
* Development Kit: none
* Kit version (for WroverKit/PicoKit/DevKitC): none
* Module or chip used: ESP32-WROVER 16MB
* IDF version (run |git describe --tags| to find it): v3.2-beta1-208-g0d7f2d77c
* Build System: make
* Compiler version (run |xtensa-esp32-elf-gcc --version| to find it): (crosstool-NG crosstool-ng-1.22.0-80-g6c4433a) 5.2.0
* Operating System: macOS
* Power Supply: USB
Problem Description
TLDR: Between May and July 2018 a change
was made to esp idf master that is causing
corruption on FAT filesystems mounted on SPI
flash.
Our project uses a partitions.csv as follows:
# Name,   Type, SubType, Offset,  Size
nvs,      data, nvs,     0x9000,  0x4000
otadata,  data, ota,     0xd000,  0x2000
phy_init, data, phy,     0xf000,  0x1000
factory,  app,  factory, 0x10000, 4M
ota_0,    app,  ota_0,   ,        4M
ota_1,    app,  ota_1,   ,        4M
store,    data, fat,     ,        1M
The 'store' partition is formatted as FAT, as follows:
esp_vfs_fat_mount_config_t m_store_fat;
wl_handle_t m_store_wlh;
memset(&m_store_fat, 0, sizeof(m_store_fat));
m_store_fat.format_if_mount_failed = true;
m_store_fat.max_files = 5;
esp_vfs_fat_spiflash_mount("/store", "store", &m_store_fat, &m_store_wlh);
We previously used a clone of esp idf master, dated around May 22
2018, without issues. The partition was very reliable.
However, on Jul 6 2018, we updated our clone to the latest esp idf
master at that time. Shortly afterwards, users started to report that
their 'store' filesystem contents were corrupted. We rolled back.
We have now tried again (updating on Oct 20 2018 to
v3.2-beta1-208-g0d7f2d77c) and immediately had the same issue: random
corruption of the FAT filesystem in SPI flash.
Expected Behavior
No corruption of FAT filesystem.
Actual Behavior
Corruption of FAT filesystem.
Steps to reproduce
1. Create a partition in SPI flash, and mount FAT filesystem
2. Read and write to files on FAT filesystem
3. Reboot
4. Observe random corruption and unmountable filesystem
Code to reproduce this issue
esp_vfs_fat_mount_config_t m_store_fat;
wl_handle_t m_store_wlh;
memset(&m_store_fat, 0, sizeof(m_store_fat));
m_store_fat.format_if_mount_failed = true;
m_store_fat.max_files = 5;
esp_vfs_fat_spiflash_mount("/store", "store", &m_store_fat, &m_store_wlh);
Debug Logs
n/a
Other items if possible
Please advise if you need anything further.
I think the timeline is correct (the issue entered esp idf master
some time between May and July 2018), but please let me know if you
know otherwise (or update the GitHub issue with your comments).
Regards, Mark
On 23 Nov 2018, at 6:19 AM, Michael Balzer <dexter@expeedo.de> wrote:
esp-idf and OVMS branches are back to the
working version.
In case you also lost your config: I also
just fixed a bug on restoring into an empty
/store partition.
Regards,
Michael
On 22.11.18 at 22:34, Michael Balzer wrote:
See https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/pull/165
I'll reset both master branches now.
If you're about to pull, please wait until
I've reverted the branches.
Regards,
Michael
--
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
dmitry1945 commented 6 hours ago