Dmitry's question about unclean shutdowns made me
check our reboot code, and we actually do not
unmount the config store.
While my observations do not suggest that crash vs.
clean shutdown makes a difference to this bug, I
still think we should unmount it.
The simple solution would be to do this last, in
Boot::boot_shutdown_done() just before the
esp_restart(), but that does not cover
crashes during shutdown, which are still too
frequent.
How about unmounting /store ASAP after the
shuttingdown event (i.e. normal shutdown
handling)?
Writing to the config would then have no effect
during shutdown, but do we actually need (want)
that to be possible?
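A minimal sketch of the early-unmount idea, for discussion only. It
assumes the wear levelling handle from esp_vfs_fat_spiflash_mount() has
been kept (m_store_wlh in the issue code quoted further down).
store_state_t and store_unmount_for_shutdown() are hypothetical names,
not existing OVMS code; esp_vfs_fat_spiflash_unmount() is the IDF
counterpart of the mount call. The stubs in the #else branch are
assumptions that only exist so the logic can be compiled off-target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef ESP_PLATFORM
#include "esp_err.h"
#include "esp_vfs_fat.h"
#include "wear_levelling.h"
#else
/* Host-side stand-ins (assumption) so the sketch compiles without the IDF. */
typedef int esp_err_t;
typedef int wl_handle_t;
#define ESP_OK 0
#define WL_INVALID_HANDLE (-1)
static int stub_unmount_calls = 0;
static esp_err_t esp_vfs_fat_spiflash_unmount(const char *base_path,
                                              wl_handle_t wl_handle)
{
    (void)base_path;
    (void)wl_handle;
    stub_unmount_calls++;   /* count calls so the guard can be checked */
    return ESP_OK;
}
#endif

/* Hypothetical bookkeeping for the /store mount. */
typedef struct {
    wl_handle_t wlh;   /* handle returned by esp_vfs_fat_spiflash_mount() */
    bool mounted;      /* true while /store is mounted */
} store_state_t;

/* Hypothetical hook, meant to run as early as possible in the normal
 * shutdown handling. Safe to call more than once: later calls are no-ops. */
esp_err_t store_unmount_for_shutdown(store_state_t *st)
{
    assert(st != NULL);
    if (!st->mounted) {
        return ESP_OK;
    }
    esp_err_t err = esp_vfs_fat_spiflash_unmount("/store", st->wlh);
    if (err == ESP_OK) {
        st->mounted = false;
        st->wlh = WL_INVALID_HANDLE;
    }
    return err;
}
```

Because the call is guarded, it could be issued both right after the
shutdown event and again just before esp_restart() without unmounting
twice. Any config write attempted afterwards would presumably fail,
which is the trade-off asked about above.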
Regards,
Michael
On 25 Nov 2018, at 03:12, Mark Webb-Johnson wrote:
I second this. A fantastic effort, Michael.
The log you provided on the GitHub
issue looks really helpful, and I see Dmitry
has replied:
Hopefully Dmitry can find the
issue from here. So many little bugs in the wifi
and bluetooth stacks are causing us random
issues; it would be helpful to be able to
update to the latest IDF.
Thanks, and Regards,
Mark.
Michael -- Kudos for this
Herculean effort -- Steve
On Sat, 24 Nov 2018, Michael Balzer
wrote:
Narrowing this down is a real PITA. The effect
sometimes stops occurring; I then need to leave
the module powered down for a while to get it
back.
Sometimes 1 hour is sufficient; currently it's
been off for 2 hours and still works. I had a
test window yesterday evening, one this morning
and one in the afternoon. Temperature is most
probably irrelevant: to rule it out, the last
power downs were outside at ~4-5 °C.
So, as a "passed" can always be a false positive,
I need to validate every passed step by switching
back to a failing version and checking that it
still fails.
It seems I now have to wait until
tomorrow for the next test window,
but I have bisected down to this
range of just 8 commits:
balzer@leela:~/esp/esp-idf> git rev-list 9d609af54c63e7f949a4fbc43d4f1c13b57f49d8 ^9d2f7c60d9aef9860c61c2756318ada68c80fddf
9d609af54c63e7f949a4fbc43d4f1c13b57f49d8
f392727abf7d56490c2f33127a59bfac42c937e0
e834d6fffc23a6fcfc0d2e871c9235417a7fb48f
35842d02abb5f574aaab466d46081a232fdd20a6
f05f3fbde87a9ce45c6818f71b49cd13888fd457
a6d6c58ecadb9759a0bacf35cd7332ac641e598d
321b1e02052de95db60ddce87eecce5f9e04e9b8
40486c872345584d34949b3ce83f9e956a7eea13
...with
9d609af54c63e7f949a4fbc43d4f1c13b57f49d8
being the last identified bad
commit, and
9d2f7c60d9aef9860c61c2756318ada68c80fddf
being the last good.
If I had to guess now, it's probably
one of Dmitry's commits on the wear
leveling code.
Regards,
Michael
On 23 Nov 2018, at 17:20, Michael Balzer wrote:
It's not a timing issue: I've let it
reboot about 30 times without any
successful mount after the first
failure.
Going into bisecting now...
On 23 Nov 2018, at 15:54, Mark Webb-Johnson wrote:
> It may actually not be a corruption of the
> filesystem but some timing issue on the mount
> procedure. To test that we could disable the
> auto formatting on mount failures.
True.
A couple of Espressif guys have
jumped on the issue, and I have
provided some more information
for them. I think the key will be
reproducing it.
> The issue may also be dependent on the hardware
> version, i.e. it could be caused by the bug
> that caused the SD speed issue on the first
> 3.1 batch.
That was definitely a hardware
issue with the CP2102 chip. I
don't think it is related to the
ESP in any way.
Regards, Mark.
On 23 Nov 2018, at 10:34 PM, Michael Balzer
<dexter@expeedo.de> wrote:
It may actually not be a
corruption of the filesystem
but some timing issue on the
mount procedure. To test that
we could disable the auto
formatting on
mount failures.
The issue may also be
dependent on the hardware
version, i.e. it could be
caused by the bug that caused
the SD speed issue on the
first 3.1 batch.
I have only tried the IDF
update on my batch 1 module
(my bench / development
module). I think most of our
edge testers also have that
version.
Regards,
Michael
On 23 Nov 2018, at 02:32, Mark Webb-Johnson wrote:
I have raised the
following GitHub issue with
Espressif:
https://github.com/espressif/esp-idf/issues/2730
Environment
  * Development Kit: none
  * Kit version (for WroverKit/PicoKit/DevKitC): none
  * Module or chip used: ESP32-WROVER 16MB
  * IDF version (run |git describe --tags| to find it):
    v3.2-beta1-208-g0d7f2d77c
  * Build System: make
  * Compiler version (run |xtensa-esp32-elf-gcc --version| to find it):
    (crosstool-NG crosstool-ng-1.22.0-80-g6c4433a) 5.2.0
  * Operating System: macOS
  * Power Supply: USB
Problem Description
TLDR: Between May and
July 2018 a change was made
to esp idf master that is
causing corruption on FAT
filesystems mounted on SPI
flash.
Our project uses a partitions.csv as follows:
# Name,   Type, SubType, Offset,  Size
nvs,      data, nvs,     0x9000,  0x4000
otadata,  data, ota,     0xd000,  0x2000
phy_init, data, phy,     0xf000,  0x1000
factory,  app,  factory, 0x10000, 4M
ota_0,    app,  ota_0,   ,        4M
ota_1,    app,  ota_1,   ,        4M
store,    data, fat,     ,        1M
The 'store' partition is
formatted as FAT, as
follows:
esp_vfs_fat_mount_config_t m_store_fat;
wl_handle_t m_store_wlh;
memset(&m_store_fat,0,sizeof(esp_vfs_fat_sdmmc_mount_config_t));
m_store_fat.format_if_mount_failed = true;
m_store_fat.max_files = 5;
esp_vfs_fat_spiflash_mount("/store", "store", &m_store_fat, &m_store_wlh);
We have previously used a
clone of esp idf master,
dated around May 22 2018,
without issues. The
partition is very reliable.
However, on Jul 6 2018,
we updated our clone to use
the latest esp idf master at
that time. Shortly
afterwards, users started to
report that their
'store' filesystem
contents were corrupted. We
rolled back.
We have now tried again
(updating on Oct 20 2018 to
v3.2-beta1-208-g0d7f2d77c)
and immediately had the same
issue. Random corruption of
FAT filesystem
in SPI flash.
Expected Behavior
No corruption of FAT
filesystem.
Actual Behavior
Corruption of FAT
filesystem.
Steps to reproduce
 1. Create a partition in SPI flash, and mount FAT filesystem
 2. Read and write to files on FAT filesystem
 3. Reboot
 4. Observe random corruption and unmountable filesystem
Code to reproduce this issue
esp_vfs_fat_mount_config_t m_store_fat;
wl_handle_t m_store_wlh;
memset(&m_store_fat,0,sizeof(esp_vfs_fat_sdmmc_mount_config_t));
m_store_fat.format_if_mount_failed = true;
m_store_fat.max_files = 5;
esp_vfs_fat_spiflash_mount("/store", "store", &m_store_fat, &m_store_wlh);
Debug Logs
n/a
Other items if possible
Please advise if you need
anything further.
I think the timeline is
correct (the issue is in esp
idf master some time between
May and July 2018), but
please let me know if you
know differently (or
update the github issue with
your comments).
Regards, Mark
On 23 Nov 2018, at 6:19 AM, Michael Balzer
<dexter@expeedo.de> wrote:
esp-idf and OVMS branches
are back to the working
version.
In case you also lost your
config: I also just fixed
a bug in restoring into an
empty /store partition.
Regards,
Michael
On 22 Nov 2018, at 22:34, Michael Balzer wrote:
See https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/pull/165
I'll reset both master
branches now.
If you're about to pull,
please wait until I've
reverted the branches.
Regards,
Michael
--
Michael Balzer *
Helkenberger Weg 9 *
D-58256 Ennepetal
Fon 02333 / 833 5735 *
Handy 0176 / 206 989 26
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
<mailto:OvmsDev@lists.openvehicles.com>
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
--
Steve
dmitry1945 commented 6 hours ago