@all: we
need some help on this.
Dmitry has asked us to verify an assumption about the
source of the problem. Unfortunately, my module has
decided to not show the bug anymore.
Can anyone a) reproduce the bug and b) check Dmitry's
code change? The code change is very simple:
https://github.com/espressif/esp-idf/issues/2730#issuecomment-441928779
…that's in file
esp-idf/components/wear_levelling/WL_Flash.cpp
line 183.
Chances are the bug is more frequently present on V3.1
first batch modules.
To test a build including the bug, do:
OVMS# config backup
/sd/cfgbackup.zip
~/…/esp-idf> git checkout
5afc1b0cbbcd5903744cd2948ee235b7016eeede
~/…/esp-idf> git merge --no-commit
9d609af54c63e7f949a4fbc43d4f1c13b57f49d8
~/…/esp-idf> git submodule update
--recursive
~/…/OVMS.V3> git checkout
c3c908c00a09c4a841d1fec115be785c8f486270
~/…/OVMS.V3> make app-flash
Then do some reboots of the module.
If the bug shows up, usually the second reboot already
will result in a lost configuration. You'll see some
warnings in the boot process (mount failure), and the
prompt will directly be "#" as no module password is in
place.
If that happens, rebuild after applying the change as
described by Dmitry, restore your config (see below),
check if the bug is gone. If it isn't, the change
definitely doesn't work.
If it's gone, as the bug can stop showing up, you need
to revert Dmitry's change and check once again if the
bug reappears. If it does, the change
definitely
works.
If the bug doesn't reappear, you need to switch off the
module for some hours, then redo the above steps.
Note on restoring the config backup: as
my later restore fix isn't included in that version, you
may need to create the events directory manually:
OVMS# vfs mkdir
/store/events
OVMS# config restore
/sd/cfgbackup.zip <password>
To return to normal build setup:
~/…/esp-idf> git
checkout -f master
~/…/esp-idf> git submodule
update --recursive
~/…/OVMS.V3> git checkout master
Please report definitive results directly to Dmitry in
the issue #2730.
Thanks,
Michael
Am 27.11.18 um 02:02
schrieb Mark Webb-Johnson:
I think unmount should be on the system.shutdown
event. That would be after system.shuttingdown and the
MyBoot.RestartPending/RestartReady control steps.
That means if a component really wants
to write to config (or elsewhere on /store), it
should do it during system.shuttingdown event (and
optionally do the MyBoot.RestartPending(TAG)
and MyBoot.RestartReady(TAG) thing, like simcom).
As you say, that doesn’t address the
crash-during-shuttingdown issue, but can be done in
a very component manner. No changes to MyBoot - all
done in ovms_config. Or just hard-code the /store
unmount before esp_restart().
Regards, Mark.
Dimitris question about unclean shutdowns
let me check our reboot code, and we
actually do not take care of unmounting the
config store.
While my crash observations do not look like
crash/clean makes a difference to this bug,
I still think we should do that.
The simple solution would be to do this
last, in Boot::boot_shutdown_done() just
before the esp_restart(), but that does not
take care of crashes during shutdown, which
still are too frequent.
How about unmounting /store ASAP after the
shuttingdown event (i.e. normal shutdown
handling)?
Writing to the config would then have no
effect during shutdown, but do we actually
need (want) that to be possible?
Regards,
Michael
Am 25.11.18 um
03:12 schrieb Mark Webb-Johnson:
I second this. A fantastic effort,
Michael.
The log you provided on the
GitHub issue looks really helpful, and I
see Dimitri has replied:
Hopefully Dimitri can find
the issue from here. So many little
bugs in wifi and bluetooth stacks
causing us random issues - it would be
helpful to be able to update to the
latest IDF.
Thanks, and Regards,
Mark.
Michael -- Kudos for
this Herculean effort -- Steve
On Sat, 24 Nov 2018, Michael
Balzer wrote:
Narrowing
this down is a real PITA. The
effect sometimes stops to
occur, I then need to leave
the module powered down for a
while to get the effect back.
Sometimes 1 hour is
sufficient, currently it's
been off for 2 hours and still
works. I had a test window
yesterday evening, one this
morning and one in the
afternoon. Temperature is most
probably irrelevant, as the
last power downs were outside
in ~4-5 °C to rule this out.
So as a "passed" can always be
a false positive I need to
validate every passed step by
switching back to a failing
version and see if that still
fails.
It seems I now have to wait
until tomorrow for the next
test window, but I have
bisected down to this range of
just 8 commits:
balzer@leela:~/esp/esp-idf>
git rev-list
9d609af54c63e7f949a4fbc43d4f1c13b57f49d8
^9d2f7c60d9aef9860c61c2756318ada68c80fddf
9d609af54c63e7f949a4fbc43d4f1c13b57f49d8
f392727abf7d56490c2f33127a59bfac42c937e0
e834d6fffc23a6fcfc0d2e871c9235417a7fb48f
35842d02abb5f574aaab466d46081a232fdd20a6
f05f3fbde87a9ce45c6818f71b49cd13888fd457
a6d6c58ecadb9759a0bacf35cd7332ac641e598d
321b1e02052de95db60ddce87eecce5f9e04e9b8
40486c872345584d34949b3ce83f9e956a7eea13
...with
9d609af54c63e7f949a4fbc43d4f1c13b57f49d8
being the last identified bad
commit, and
9d2f7c60d9aef9860c61c2756318ada68c80fddf
being the last good.
If I should guess now, it's
probably one of Dmitry's
commits on the wear leveling
code.
Regards,
Michael
Am 23.11.18 um 17:20 schrieb
Michael Balzer:
It's not a timing
issue, I've let it reboot
about 30 times without any
successful mount after the
first failure.
Going into bisecting now...
Am 23.11.18 um 15:54 schrieb
Mark Webb-Johnson:
It may actually
not be a corruption of
the filesystem but some
timing issue on the
mount procedure. To test
that we could disable
the auto formatting on
mount failures.
True.
A couple of Espressif guys
have jumped on the issue,
and I have provided some
more information for them.
I think key will be
reproducing it.
The issue may
also be dependant on the
hardware version, i.e.
it could be caused by
the bug that caused the
SD speed issue on the
first 3.1 batch.
That was definitely a
hardware issue with the
CP2102 chip. I don't think
related to ESP in any way.
Regards, Mark.
On 23 Nov 2018,
at 10:34 PM, Michael
Balzer <dexter@expeedo.de
<mailto:dexter@expeedo.de>>
wrote:
It may actually not be a
corruption of the
filesystem but some
timing issue on the
mount procedure. To test
that we could disable
the auto formatting on
mount failures.
The issue may also be
dependant on the
hardware version, i.e.
it could be caused by
the bug that caused the
SD speed issue on the
first 3.1 batch.
I only have tried the
idf update on my batch 1
module (my bench /
development module). I
think most of our edge
testers also have that
version.
Regards,
Michael
Am 23.11.18 um 02:32
schrieb Mark
Webb-Johnson:
I have raised
the following github
issue to Espressif:
https://github.com/espressif/esp-idf/issues/2730
Environment
* Development
Kit: none
* Kit version
(for
WroverKit/PicoKit/DevKitC):
none
* Module or chip
used: ESP32-WROVER
16MB
* IDF version
(run |git describe
--tags| to find it):
v3.2-beta1-208-g0d7f2d77c
* Build System:
make
* Compiler
version (run
|xtensa-esp32-elf-gcc
--version| to find
it): (crosstool-NG
crosstool-ng-1.22.0-80-g6c4433a)
5.2.0
* Operating
System: macOS
* Power Supply:
USB
Problem
Description
TLDR: Between May
and July 2018 a change
was made to esp idf
master that is causing
corruption on FAT
filesystems mounted on
SPI flash.
Our project uses a
partitions.csv as
follows:
|# Name, Type,
SubType, Offset, Size
nvs, data, nvs,
0x9000, 0x4000
otadata, data, ota,
0xd000, 0x2000
phy_init, data, phy,
0xf000, 0x1000
factory,
app, factory,
0x10000, 4M ota_0,
app, ota_0, , 4M
ota_1, app, ota_1, ,
4M store, data, fat, ,
1M |
The 'store'
partition is formatted
as FAT, as follows:
esp_vfs_fat_mount_config_t m_store_fat;
wl_handle_t
m_store_wlh;
memset(&m_store_fat,0,sizeof(esp_vfs_fat_sdmmc_mount_config_t));
m_store_fat.format_if_mount_failed = true;
m_store_fat.max_files = 5;
esp_vfs_fat_spiflash_mount("/store", "store", &m_store_fat,
&m_store_wlh);
We have previously
used a clone of esp
idf master, dated
around May 22 2018,
without issues. The
partition is very
reliable.
However, on Jul 6
2018, we updated our
clone to use the
latest esp idf master
at that time. Shortly
afterwards, users
started to report that
their
'store' filesystem
contents were
corrupted. We rolled
back.
We have now tried
again (updating on Oct
20 2018 to
v3.2-beta1-208-g0d7f2d77c)
and immediately had
the same issue. Random
corruption of FAT
filesystem
in SPI flash.
Expected
Behavior
No corruption of
FAT filesystem.
Actual
Behavior
Corruption of FAT
filesystem.
Steps to
reproduce
1. Create a
partition in SPI
flash, and mount FAT
filesystem
2. Read and write
to files on FAT
filesystem
3. Reboot
4. Observe random
corruption and
unmountable filesystem
Code to
reproduce this issue
esp_vfs_fat_mount_config_t m_store_fat;
wl_handle_t
m_store_wlh;
memset(&m_store_fat,0,sizeof(esp_vfs_fat_sdmmc_mount_config_t));
m_store_fat.format_if_mount_failed = true;
m_store_fat.max_files = 5;
esp_vfs_fat_spiflash_mount("/store", "store", &m_store_fat,
&m_store_wlh);
Debug Logs
n/a
Other items if
possible
Please advise if
you need anything
further.
I think the timeline
is correct (the issue
is in esp idf master
some time between May
and July 2018), but
please let me know if
you know differently
(or
update the github
issue with your
comments).
Regards, Mark
On
23 Nov 2018, at 6:19
AM, Michael Balzer
<dexter@expeedo.de
<mailto:dexter@expeedo.de>>
wrote:
esp-idf and OVMS
branches are back to
the working version.
In case you also
lost your config: I
also just fixed a
bug on restoring
into an empty /store
partition.
Regards,
Michael
Am 22.11.18 um 22:34
schrieb Michael
Balzer:
See https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/pull/165
I'll reset both
master branches
now.
If you're about to
pull, please wait
until I've
reverted the
branches.
Regards,
Michael
--
Michael Balzer *
Helkenberger Weg 9 *
D-58256 Ennepetal
Fon 02333 / 833 5735
* Handy 0176 / 206
989 26
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
<mailto:OvmsDev@lists.openvehicles.com>
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
--
Michael Balzer *
Helkenberger Weg 9 *
D-58256 Ennepetal
Fon 02333 / 833 5735 *
Handy 0176 / 206 989 26
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
<mailto:OvmsDev@lists.openvehicles.com>
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
--
Michael Balzer *
Helkenberger Weg 9 * D-58256
Ennepetal
Fon 02333 / 833 5735 * Handy
0176 / 206 989 26
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
--
Michael Balzer * Helkenberger
Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy
0176 / 206 989 26
--
Steve_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
--
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
--
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
dmitry1945 commented 6 hours ago