<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Perfect, that's what I was hoping for. Verified I'm still on .006,
so whenever .008 gets posted, I will let it update on its own.<br>
<br>
Good work!<br>
<br>
Greg<br>
<br>
<br>
<div class="moz-cite-prefix">Mark Webb-Johnson wrote:<br>
</div>
<blockquote type="cite"
cite="mid:7CB6D28B-816F-4335-92CD-1616D7EFDCED@webb-johnson.net">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Greg,
<div class=""><br class="">
</div>
<div class="">The issue is upgrading from 3.1.007. Previous
versions were fine. The 3.1.007 was only ever released to
‘edge’, and I caught the problem before releasing it to ‘main’.</div>
<div class=""><br class="">
</div>
<div class="">Regards, Mark.<br class="">
<div><br class="">
<blockquote type="cite" class="">
<div class="">On 22 Jun 2018, at 1:57 AM, Greg D. &lt;<a
href="mailto:gregd2350@gmail.com" class=""
moz-do-not-send="true">gregd2350@gmail.com</a>&gt;
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<meta http-equiv="Content-Type" content="text/html;
charset=UTF-8" class="">
<div text="#000000" bgcolor="#FFFFFF" class=""> My module
is still on .006, and has not rebooted since late May
when .006 replaced .005. I'm trying to keep it as stock
and "not messed with like a developer would" as
possible.<br class="">
<br class="">
Is .007 not linked to "main"? Or, should I unplug it
until .007 is replaced?<br class="">
<br class="">
Greg<br class="">
<br class="">
<br class="">
<div class="moz-cite-prefix">Mark Webb-Johnson wrote:<br
class="">
</div>
<blockquote type="cite"
cite="mid:9BD6861D-C423-414B-9567-E1FBEB5C29F0@webb-johnson.net"
class="">
<meta http-equiv="Content-Type" content="text/html;
charset=UTF-8" class="">
I’ve raised a github issue for Espressif to look at:
<div class=""><br class="">
</div>
<blockquote style="margin: 0 0 0 40px; border: none;
padding: 0px;" class="">
<div class=""><a
href="https://github.com/espressif/esp-idf/issues/2083"
class="" moz-do-not-send="true">https://github.com/espressif/esp-idf/issues/2083</a></div>
<div class=""><br class="">
</div>
<div class="">
<p style="box-sizing: border-box; margin-bottom:
16px; caret-color: rgb(36, 41, 46); color:
rgb(36, 41, 46); font-family: -apple-system,
BlinkMacSystemFont, &quot;Segoe UI&quot;,
Helvetica, Arial, sans-serif, &quot;Apple Color
Emoji&quot;, &quot;Segoe UI Emoji&quot;,
&quot;Segoe UI Symbol&quot;; font-size: 14px;
margin-top: 0px !important;" class="">We use
16MB ESP32 modules with 4MB OTA partitions and
external SPI RAM. Currently our firmware image
is approximately 2.8MB in size. The flow of our
code is to make an http GET request for the
firmware image, read the headers (in particular
download size), call esp_ota_begin() with the
firmware download size, then download chunk by
chunk and call esp_ota_write for each chunk, and
a final esp_ota_end when done. Our networking
buffers are in INTERNAL RAM, not SPI RAM.</p>
<p style="box-sizing: border-box; margin-top: 0px;
margin-bottom: 16px; caret-color: rgb(36, 41,
46); color: rgb(36, 41, 46); font-family:
-apple-system, BlinkMacSystemFont, &quot;Segoe
UI&quot;, Helvetica, Arial, sans-serif,
&quot;Apple Color Emoji&quot;, &quot;Segoe UI
Emoji&quot;, &quot;Segoe UI Symbol&quot;;
font-size: 14px;" class="">After our firmware
exceeded about 2MB in size, and we started to
use SPIRAM more in our application, we started
to see random crashes during OTA firmware
updates over wifi.</p>
<blockquote style="box-sizing: border-box; margin:
0px 0px 16px; padding: 0px 1em; color: rgb(106,
115, 125); border-left-width: 0.25em;
border-left-style: solid; border-left-color:
rgb(223, 226, 229); font-family: -apple-system,
BlinkMacSystemFont, &quot;Segoe UI&quot;,
Helvetica, Arial, sans-serif, &quot;Apple Color
Emoji&quot;, &quot;Segoe UI Emoji&quot;,
&quot;Segoe UI Symbol&quot;; font-size: 14px;"
class="">
<p style="box-sizing: border-box; margin-top:
0px; margin-bottom: 16px;" class="">abort()
was called at PC 0x401b8e84 on core 0<br
style="box-sizing: border-box;" class="">
0x401b8e84: pm_on_beacon_rx at ??:?</p>
<p style="box-sizing: border-box; margin-top:
0px; margin-bottom: 16px;" class="">Backtrace:
0x40091e6b:0x3ffcc4a0 0x40091fc3:0x3ffcc4c0
0x401b8e84:0x3ffcc4e0 0x401b94ef:0x3ffcc520
0x401b9bd1:0x3ffcc550 0x40089e62:0x3ffcc5a0</p>
<div style="box-sizing: border-box; margin-top:
0px; margin-bottom: 0px;" class="">0x40091e6b:
invoke_abort at
/Users/mark/esp/esp-idf/components/esp32/panic.c:669<br
style="box-sizing: border-box;" class="">
0x40091fc3: abort at
/Users/mark/esp/esp-idf/components/esp32/panic.c:669<br
style="box-sizing: border-box;" class="">
0x401b8e84: pm_on_beacon_rx at ??:?<br
style="box-sizing: border-box;" class="">
0x401b94ef: ppRxProtoProc at ??:?<br
style="box-sizing: border-box;" class="">
0x401b9bd1: ppRxPkt at ??:?<br
style="box-sizing: border-box;" class="">
0x40089e62: ppTask at ??:?</div>
</blockquote>
<p style="box-sizing: border-box; margin-top: 0px;
margin-bottom: 16px; caret-color: rgb(36, 41,
46); color: rgb(36, 41, 46); font-family:
-apple-system, BlinkMacSystemFont, &quot;Segoe
UI&quot;, Helvetica, Arial, sans-serif,
&quot;Apple Color Emoji&quot;, &quot;Segoe UI
Emoji&quot;, &quot;Segoe UI Symbol&quot;;
font-size: 14px;" class="">We narrowed down the
issue to using networking functions (reading
from the TCP/IP socket) after calling
esp_ota_begin with large image sizes (over
approximately 2MB). The Espressif code calls a
single esp_partition_erase_range() which
disables the SPI RAM cache and blocks any task
trying to access that. If the system networking
task is blocked for too long, then it seems to
get messed up in handling wifi beacons, and
panics when it finally gets some CPU time?</p>
<p style="box-sizing: border-box; margin-top: 0px;
margin-bottom: 16px; caret-color: rgb(36, 41,
46); color: rgb(36, 41, 46); font-family:
-apple-system, BlinkMacSystemFont, &quot;Segoe
UI&quot;, Helvetica, Arial, sans-serif,
&quot;Apple Color Emoji&quot;, &quot;Segoe UI
Emoji&quot;, &quot;Segoe UI Symbol&quot;;
font-size: 14px;" class="">If we change
esp_ota_begin() to use a loop to erase the
partition in 256KB chunks (calling
esp_partition_erase_range() multiple times),
with a 1 tick vTaskDelay between each chunk, the
problem goes away and OTA works again. The
vTaskDelay is required as without it the panic
in pm_on_beacon_rx still happens.</p>
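<p class="">The chunked erase described above can be sketched on the host as follows. This is not the actual patch: <code>erase_range_fn</code> stands in for esp_partition_erase_range() and <code>pause_fn</code> for the 1 tick vTaskDelay, both hypothetical callback types, and the 4096-byte sector rounding is an assumption about the flash geometry.</p>
<pre class="" wrap="">
```c
/* Hypothetical callback types, so the loop can run off-device. */
typedef int (*erase_range_fn)(unsigned long offset, unsigned long length);
typedef void (*pause_fn)(void);

#define ERASE_CHUNK_BYTES  (256UL * 1024UL)  /* 256KB chunks, as described above */
#define FLASH_SECTOR_BYTES 4096UL            /* assumed sector size */

/* Erases enough whole sectors to hold image_size bytes, one chunk at a
 * time, pausing between chunks. Returns 0 on success or the first
 * non-zero error from the erase callback. */
int erase_in_chunks(unsigned long image_size, erase_range_fn erase,
                    pause_fn pause)
{
    unsigned long sectors =
        (image_size + FLASH_SECTOR_BYTES - 1) / FLASH_SECTOR_BYTES;
    unsigned long total = sectors * FLASH_SECTOR_BYTES;
    unsigned long offset = 0;

    while (offset < total) {
        unsigned long len = total - offset;
        int err;

        if (len > ERASE_CHUNK_BYTES) len = ERASE_CHUNK_BYTES;
        err = erase(offset, len);
        if (err != 0) return err;
        offset += len;

        /* Pause only between chunks, not after the last one. */
        if (offset < total) pause();
    }
    return 0;
}

/* Stubs that record what the loop asked for. */
static unsigned long stub_erased, stub_last_len;
static int stub_erase_calls, stub_pauses;

static int stub_erase(unsigned long offset, unsigned long length)
{
    (void)offset;
    stub_erase_calls++;
    stub_erased += length;
    stub_last_len = length;
    return 0;
}
static void stub_pause(void) { stub_pauses++; }
```
</pre>
<p class="">For a 2850336-byte image, and under the 4096-byte sector assumption, this loop issues 11 erase calls with 10 pauses between them.</p>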
<p style="box-sizing: border-box; margin-top: 0px;
margin-bottom: 16px; caret-color: rgb(36, 41,
46); color: rgb(36, 41, 46); font-family:
-apple-system, BlinkMacSystemFont, &quot;Segoe
UI&quot;, Helvetica, Arial, sans-serif,
&quot;Apple Color Emoji&quot;, &quot;Segoe UI
Emoji&quot;, &quot;Segoe UI Symbol&quot;;
font-size: 14px;" class="">I am not sure how to
address this issue. The core
spi_flash_erase_range, that this all depends on,
already has a loop to erase sector/block by
sector/block, and it uses
spi_flash_guard_start() and
spi_flash_guard_end() correctly for each
sector/block. It seems that the tasks are not
getting any/enough cpu time between calls to
spi_flash_guard_start()/spi_flash_guard_end() in
that sector/block erase loop.</p>
<p style="box-sizing: border-box; margin-top: 0px;
margin-bottom: 16px; caret-color: rgb(36, 41,
46); color: rgb(36, 41, 46); font-family:
-apple-system, BlinkMacSystemFont, &quot;Segoe
UI&quot;, Helvetica, Arial, sans-serif,
&quot;Apple Color Emoji&quot;, &quot;Segoe UI
Emoji&quot;, &quot;Segoe UI Symbol&quot;;
font-size: 14px;" class="">Adding a delay there
seems kludgy. Perhaps a freertos call to allow
other blocked tasks to run? I tried adding a
taskYIELD() after the spi_flash_guard_end() in
spi_flash_erase_range, but that didn't solve the
problem (presumably the task that called
esp_ota_begin was higher priority than the
networking task?). A vTaskDelay(1) in the same
place works and solves the problem, but just
seems horribly kludgy.</p>
<p style="box-sizing: border-box; margin-top: 0px;
margin-bottom: 16px; caret-color: rgb(36, 41,
46); color: rgb(36, 41, 46); font-family:
-apple-system, BlinkMacSystemFont, &quot;Segoe
UI&quot;, Helvetica, Arial, sans-serif,
&quot;Apple Color Emoji&quot;, &quot;Segoe UI
Emoji&quot;, &quot;Segoe UI Symbol&quot;;
font-size: 14px;" class="">Overall, I'm just
very uncomfortable with how invasive
esp_ota_begin() is. With a 2.8MB image size, it
blocks all other tasks that touch SPI RAM for
about 17 seconds. That is not good. We should be
able to OTA flash without starving other tasks
in the system so badly.</p>
<div style="box-sizing: border-box; margin-top:
0px; caret-color: rgb(36, 41, 46); color:
rgb(36, 41, 46); font-family: -apple-system,
BlinkMacSystemFont, &quot;Segoe UI&quot;,
Helvetica, Arial, sans-serif, &quot;Apple Color
Emoji&quot;, &quot;Segoe UI Emoji&quot;,
&quot;Segoe UI Symbol&quot;; font-size: 14px;
margin-bottom: 0px !important;" class="">There
is a separate bug in the pm_on_beacon_rx that is
triggered by this, but that seems more a symptom
than a solution.</div>
</div>
</blockquote>
<div class="">
<div class=""><br class="">
</div>
<div class="">As a temporary work-around, I’ve added
a one-liner to our Espressif esp-idf clone
repository, base function spi_flash_erase_range(),
to delay 1 tick after writing each sector. In my
testing it solves our problem and doesn’t seem to
have much of an impact on spi flash erase
performance. However, it is a horrible kludge and
we need a better solution - need to wait for
Espressif to give guidance. In the meantime,
please git pull our esp-idf repository so you get
this fix/workaround.</div>
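<div class=""><br class="">
</div>
<div class="">A rough host-side sketch of where that one-tick delay sits relative to the flash guard, for illustration only. The <code>flash_hooks_t</code> callbacks are hypothetical stand-ins for spi_flash_guard_start(), the per-sector erase, spi_flash_guard_end() and vTaskDelay(1); this is not the code in our esp-idf clone.</div>
<pre class="" wrap="">
```c
/* Hypothetical hooks; names are illustrative, not ESP-IDF signatures. */
typedef struct {
    void (*guard_start)(void);                 /* role of spi_flash_guard_start() */
    int  (*erase_sector)(unsigned long index); /* erases one sector, cache off    */
    void (*guard_end)(void);                   /* role of spi_flash_guard_end()   */
    void (*delay_one_tick)(void);              /* role of vTaskDelay(1)           */
} flash_hooks_t;

/* Erases count sectors starting at first. Returns 0 on success or the
 * first non-zero error from the erase hook. */
int erase_sectors_with_delay(unsigned long first, unsigned long count,
                             const flash_hooks_t *hooks)
{
    unsigned long i;

    for (i = 0; i < count; i++) {
        int err;

        hooks->guard_start();
        err = hooks->erase_sector(first + i);
        hooks->guard_end();
        if (err != 0) return err;

        /* Block the caller outside the guarded region so that other
         * tasks, including lower-priority ones, can be scheduled. */
        hooks->delay_one_tick();
    }
    return 0;
}

/* Mocks that check the delay never runs while the guard is held. */
static int mock_guard_held, mock_sectors, mock_delays, mock_delays_under_guard;

static void mock_guard_start(void) { mock_guard_held = 1; }
static int mock_erase_sector(unsigned long index)
{
    (void)index;
    mock_sectors++;
    return 0;
}
static void mock_guard_end(void) { mock_guard_held = 0; }
static void mock_delay(void)
{
    mock_delays++;
    if (mock_guard_held) mock_delays_under_guard++;
}
static const flash_hooks_t mock_hooks = {
    mock_guard_start, mock_erase_sector, mock_guard_end, mock_delay
};
```
</pre>
<div class="">On the taskYIELD() attempt in the issue text: in FreeRTOS a yield only hands the CPU to ready tasks of equal or higher priority, while a delay blocks the caller so lower-priority tasks can run too. That is consistent with the guess that the task calling esp_ota_begin was higher priority than the networking task.</div>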
<div class=""><br class="">
</div>
<div class="">I am concerned about how we can get
out of a loop with this. A module wakes up at 2am,
checks for an update, finds 3.1.008, downloads it,
crashes, reboots, and tries again, and again, and
again. With current code it looks like it will do
that five times each night.</div>
<div class=""><br class="">
</div>
<div class="">On my server I only have three cars
(other than my own) running 3.1.007. Maybe people
here? I guess I can write to those owners and ask
them to downgrade to 3.1.006. I can also make a
3.1.008 and put it there but don’t change the
ovms3.ver file so it is not automatically
downloaded. Any suggestions?</div>
<div class=""><br class="">
</div>
<div class="">Regards, Mark.</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On 21 Jun 2018, at 9:36 AM, Mark
Webb-Johnson &lt;<a
href="mailto:mark@webb-johnson.net" class=""
moz-do-not-send="true">mark@webb-johnson.net</a>&gt;
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<meta http-equiv="Content-Type"
content="text/html; charset=UTF-8" class="">
<div style="word-wrap: break-word;
-webkit-nbsp-mode: space; line-break:
after-white-space;" class="">
<div class=""><br class="">
</div>
<div class="">On my devices I’m seeing a
problem with doing OTA updates when
running 3.1.007. I’m seeing:</div>
<div class=""><br class="">
</div>
<blockquote style="margin: 0 0 0 40px;
border: none; padding: 0px;" class="">
<div class="">
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class="">OVMS#
ota flash http</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class="">Current
running partition is: factory</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class="">Target
partition is: ota_0</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class="">Download
firmware from <a
href="http://api.openvehicles.com/firmware/ota/v3.1/edge/ovms3.bin"
class="" moz-do-not-send="true">api.openvehicles.com/firmware/ota/v3.1/edge/ovms3.bin</a>
to ota_0</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class="">Expected
file size is 2850336</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class="">Preparing
flash partition...</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class="">abort()
was called at PC 0x401b8e84 on
core 0</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class=""><br
class="">
</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class="">0x401b8e84:
pm_on_beacon_rx at ??:?</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class=""><br
class="">
</span></font></div>
<div class=""><span style="font-size:
18px; font-family: &quot;Andale
Mono&quot;;" class="">Backtrace:
0x40091e6b:0x3ffcc4a0
0x40091fc3:0x3ffcc4c0
0x401b8e84:0x3ffcc4e0
0x401b94ef:0x3ffcc520
0x401b9bd1:0x3ffcc550
0x40089e62:0x3ffcc5a0</span></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class=""><br
class="">
</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class="">0x40091e6b:
invoke_abort at
/Users/mark/esp/esp-idf/components/esp32/panic.c:669</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class=""><br
class="">
</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class="">0x40091fc3:
abort at
/Users/mark/esp/esp-idf/components/esp32/panic.c:669</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class=""><br
class="">
</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class="">0x401b8e84:
pm_on_beacon_rx at ??:?</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class=""><br
class="">
</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class="">0x401b94ef:
ppRxProtoProc at ??:?</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class=""><br
class="">
</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class="">0x401b9bd1:
ppRxPkt at ??:?</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class=""><br
class="">
</span></font></div>
<div class=""><font class=""
face="Andale Mono"><span
style="font-size: 18px;" class="">0x40089e62:
ppTask at ??:?</span></font></div>
</div>
<div class=""><font class="" face="Andale
Mono"><span style="font-size: 18px;"
class=""><br class="">
</span></font></div>
<div class=""><font class="" face="Andale
Mono"><span style="font-size: 18px;"
class="">(this is a firmware rebuild
so addresses may vary from stock
3.1.007, but the functions are the
same)</span></font></div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">The pm_on_beacon_rx function
is in libpp (as are ppRxProtoProc,
ppRxPkt, and ppTask).</div>
<div class=""><br class="">
</div>
<div class="">If I comment out the esp_ota_*
functions, it does the download and works
fine. But if esp_ota_begin is called, then
the crash occurs within http.BodyRead
(although I think it is within the ppTask,
not in our code).</div>
<div class=""><br class="">
</div>
<div class="">If I reduce the size parameter
given to esp_ota_begin, I can get it to
work just fine. Around 1.5MB is fine, but
above that it starts to get flaky.
Espressif say that during flash writes the
cache must be disabled, so all access to
SPIRAM halted. Sometimes, I get to see a
message 'I (20642) wifi:
bcn_timout,ap_probe_send_start’ which
seems to confirm the wifi task is being
starved for cpu time during the
esp_ota_begin(). I guess that is
triggering a bug within the wifi stack
where it doesn’t handle it gracefully.</div>
<div class=""><br class="">
</div>
<div class="">I then tried hacking around
the esp_ota_begin() function. The
Espressif implementation simply calls
esp_partition_erase_range() to erase the
partition in one call. For us, that now
means blanking 2.8MB of flash and takes
about 15 seconds or so. I changed that to
call esp_partition_erase_range() multiple
times, in 256KB chunks, with a 10ms
vTaskDelay between each chunk. Problem
solved. OTA works again doing it that way.</div>
<div class=""><br class="">
</div>
<div class="">I’m not sure how to get this
fix applied. The issue is in Espressif’s
code, related to big partition sizes. I
guess they haven’t seen it because with
1MB OTA partitions the delay is only 3
seconds or so and probably not long enough
to trigger the issue. I’ll raise a bug
report with them, but suspect it will be a
while for them to address it. As a
temporary workaround for us, I think I’ll
bring esp_ota_begin() into our ovms_ota
code and customise it there specifically
for us. We can always switch back to the
normal esp_ota_begin when/if Espressif
come up with a permanent solution. I don’t
really want to change that core Espressif
code, without direction from Espressif as
to how to change it (been there, done
that, and wasted a lot of time doing it).</div>
<div class=""> </div>
<div class="">For safety, I’ve rolled back
edge to 3.1.006-15-g14b8eb6. That will
stop new deployments of this, while we
solve the problem. For existing
deployments in the field, I can only think
of two solutions: (a) switch ota to
factory, reboot, and flash to fixed code,
or (b) ota from sdcard, with wifi
unused/off.</div>
<div class=""><br class="">
</div>
<div class="">I should have the fix later
today.</div>
<div class=""><br class="">
</div>
<div class="">Regards, Mark.</div>
<div class=""><br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
<br class="">
<fieldset class="mimeAttachmentHeader"></fieldset>
<br class="">
<pre class="" wrap="">_______________________________________________
OvmsDev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:OvmsDev@lists.openvehicles.com" moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a>
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev" moz-do-not-send="true">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>
</pre>
</blockquote>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
</blockquote>
<br>
</body>
</html>