<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Perfect, that's what I was hoping for.  I've verified I'm still on .006,
    so whenever .008 gets posted, I will let it update on its own.<br>
    <br>
    Good work!<br>
    <br>
    Greg<br>
    <br>
    <br>
    <div class="moz-cite-prefix">Mark Webb-Johnson wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:7CB6D28B-816F-4335-92CD-1616D7EFDCED@webb-johnson.net">
      Greg,
      <div class=""><br class="">
      </div>
      <div class="">The issue only affects upgrades from 3.1.007; earlier
        versions are fine. 3.1.007 was only ever released to
        ‘edge’, and I caught the problem before releasing it to ‘main’.</div>
      <div class=""><br class="">
      </div>
      <div class="">Regards, Mark.<br class="">
        <div><br class="">
          <blockquote type="cite" class="">
            <div class="">On 22 Jun 2018, at 1:57 AM, Greg D. &lt;<a
                href="mailto:gregd2350@gmail.com" class=""
                moz-do-not-send="true">gregd2350@gmail.com</a>&gt;
              wrote:</div>
            <br class="Apple-interchange-newline">
            <div class="">
              <div text="#000000" bgcolor="#FFFFFF" class=""> My module
                is still on .006, and has not rebooted since late May
                when .006 replaced .005.  I'm trying to keep it as stock
                and "not messed with like a developer would" as
                possible.<br class="">
                <br class="">
                Is .007 not linked to "main"?  Or, should I unplug it
                until .007 is replaced?<br class="">
                <br class="">
                Greg<br class="">
                <br class="">
                <br class="">
                <div class="moz-cite-prefix">Mark Webb-Johnson wrote:<br
                    class="">
                </div>
                <blockquote type="cite"
                  cite="mid:9BD6861D-C423-414B-9567-E1FBEB5C29F0@webb-johnson.net"
                  class="">
                  I’ve raised a GitHub issue for Espressif to look at:
                  <div class=""><br class="">
                  </div>
                  <blockquote style="margin: 0 0 0 40px; border: none;
                    padding: 0px;" class="">
                    <div class=""><a
                        href="https://github.com/espressif/esp-idf/issues/2083"
                        class="" moz-do-not-send="true">https://github.com/espressif/esp-idf/issues/2083</a></div>
                    <div class=""><br class="">
                    </div>
                    <div class="">
                      <p style="box-sizing: border-box; margin-bottom:
                        16px; caret-color: rgb(36, 41, 46); color:
                        rgb(36, 41, 46); font-family: -apple-system,
                        BlinkMacSystemFont, 'Segoe UI',
                        Helvetica, Arial, sans-serif, 'Apple Color Emoji',
                        'Segoe UI Emoji', 'Segoe UI Symbol'; font-size: 14px;
                        margin-top: 0px !important;" class="">We use
                        16MB ESP32 modules with 4MB OTA partitions and
                        external SPI RAM. Currently our firmware image
                        is approximately 2.8MB in size. Our code makes an
                        HTTP GET request for the firmware image, reads the
                        headers (in particular the download size), calls
                        esp_ota_begin() with the firmware download size,
                        then downloads chunk by chunk, calling
                        esp_ota_write() for each chunk and finally
                        esp_ota_end() when done. Our networking
                        buffers are in INTERNAL RAM, not SPI RAM.</p>
                      <p style="box-sizing: border-box; margin-top: 0px;
                        margin-bottom: 16px; caret-color: rgb(36, 41,
                        46); color: rgb(36, 41, 46); font-family:
                        -apple-system, BlinkMacSystemFont, 'Segoe UI',
                        Helvetica, Arial, sans-serif,
                        'Apple Color Emoji', 'Segoe UI Emoji',
                        'Segoe UI Symbol';
                        font-size: 14px;" class="">After our firmware
                        grew beyond about 2MB in size, and our application
                        started to use SPI RAM more heavily, we began
                        to see random crashes during OTA firmware
                        updates over wifi.</p>
                      <blockquote style="box-sizing: border-box; margin:
                        0px 0px 16px; padding: 0px 1em; color: rgb(106,
                        115, 125); border-left-width: 0.25em;
                        border-left-style: solid; border-left-color:
                        rgb(223, 226, 229); font-family: -apple-system,
                        BlinkMacSystemFont, 'Segoe UI',
                        Helvetica, Arial, sans-serif, 'Apple Color Emoji',
                        'Segoe UI Emoji', 'Segoe UI Symbol'; font-size: 14px;"
                        class="">
                        <p style="box-sizing: border-box; margin-top:
                          0px; margin-bottom: 16px;" class="">abort()
                          was called at PC 0x401b8e84 on core 0<br
                            style="box-sizing: border-box;" class="">
                          0x401b8e84: pm_on_beacon_rx at ??:?</p>
                        <p style="box-sizing: border-box; margin-top:
                          0px; margin-bottom: 16px;" class="">Backtrace:
                          0x40091e6b:0x3ffcc4a0 0x40091fc3:0x3ffcc4c0
                          0x401b8e84:0x3ffcc4e0 0x401b94ef:0x3ffcc520
                          0x401b9bd1:0x3ffcc550 0x40089e62:0x3ffcc5a0</p>
                        <div style="box-sizing: border-box; margin-top:
                          0px; margin-bottom: 0px;" class="">0x40091e6b:
                          invoke_abort at
                          /Users/mark/esp/esp-idf/components/esp32/panic.c:669<br
                            style="box-sizing: border-box;" class="">
                          0x40091fc3: abort at
                          /Users/mark/esp/esp-idf/components/esp32/panic.c:669<br
                            style="box-sizing: border-box;" class="">
                          0x401b8e84: pm_on_beacon_rx at ??:?<br
                            style="box-sizing: border-box;" class="">
                          0x401b94ef: ppRxProtoProc at ??:?<br
                            style="box-sizing: border-box;" class="">
                          0x401b9bd1: ppRxPkt at ??:?<br
                            style="box-sizing: border-box;" class="">
                          0x40089e62: ppTask at ??:?</div>
                      </blockquote>
                      <p style="box-sizing: border-box; margin-top: 0px;
                        margin-bottom: 16px; caret-color: rgb(36, 41,
                        46); color: rgb(36, 41, 46); font-family:
                        -apple-system, BlinkMacSystemFont, 'Segoe UI',
                        Helvetica, Arial, sans-serif,
                        'Apple Color Emoji', 'Segoe UI Emoji',
                        'Segoe UI Symbol';
                        font-size: 14px;" class="">We narrowed the
                        issue down to using networking functions (reading
                        from the TCP/IP socket) after calling
                        esp_ota_begin() with large image sizes (over
                        approximately 2MB). The Espressif code makes a
                        single call to esp_partition_erase_range(), which
                        disables the SPI RAM cache and blocks any task
                        that tries to access SPI RAM. If the system
                        networking task is blocked for too long, it seems
                        to lose track of wifi beacon handling and
                        panics when it finally gets some CPU time.</p>
                      <p style="box-sizing: border-box; margin-top: 0px;
                        margin-bottom: 16px; caret-color: rgb(36, 41,
                        46); color: rgb(36, 41, 46); font-family:
                        -apple-system, BlinkMacSystemFont, 'Segoe UI',
                        Helvetica, Arial, sans-serif,
                        'Apple Color Emoji', 'Segoe UI Emoji',
                        'Segoe UI Symbol';
                        font-size: 14px;" class="">If we change
                        esp_ota_begin() to use a loop to erase the
                        partition in 256KB chunks (calling
                        esp_partition_erase_range() multiple times),
                        with a 1 tick vTaskDelay() after each chunk, the
                        problem goes away and OTA works again. The
                        vTaskDelay() is required: without it, the panic
                        in pm_on_beacon_rx still happens.</p>
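                      <p class="">The chunked erase described above can be
                        sketched roughly as follows. This is a minimal
                        host-side illustration in C, not the actual patch:
                        stub_erase_range() and stub_task_delay() are
                        hypothetical stand-ins for
                        esp_partition_erase_range() and vTaskDelay(),
                        erase_in_chunks() is an invented name, and rounding
                        the length up to a 4096-byte sector is an assumption
                        about what the flash layer expects.</p>
                      <pre class="" wrap="">
```c
/* Host-side sketch only: the two stub_* functions stand in for the real
 * esp_partition_erase_range() and vTaskDelay(); all names here are
 * hypothetical and chosen for illustration. */
#define ERASE_CHUNK_BYTES (256UL * 1024UL) /* chunk size from the text */
#define SECTOR_BYTES 4096UL                /* assumed flash sector size */

static unsigned long erase_calls;  /* number of erase requests issued */
static unsigned long bytes_erased; /* total bytes requested for erase */
static unsigned long delay_ticks;  /* total ticks spent yielding */

/* Stand-in for esp_partition_erase_range(); returns 0 on success. */
static int stub_erase_range(unsigned long offset, unsigned long size)
{
    (void)offset;
    erase_calls++;
    bytes_erased += size;
    return 0;
}

/* Stand-in for vTaskDelay(); here it only counts the ticks. */
static void stub_task_delay(unsigned long ticks)
{
    delay_ticks += ticks;
}

/* Erase the first image_size bytes (rounded up to a whole sector) in
 * 256KB chunks, delaying one tick after each chunk so that blocked
 * tasks get a chance to run. */
static int erase_in_chunks(unsigned long image_size)
{
    unsigned long total =
        (image_size + SECTOR_BYTES - 1) / SECTOR_BYTES * SECTOR_BYTES;
    unsigned long offset = 0;

    while (offset < total) {
        unsigned long n = total - offset;
        if (n > ERASE_CHUNK_BYTES) {
            n = ERASE_CHUNK_BYTES;
        }
        int err = stub_erase_range(offset, n);
        if (err != 0) {
            return err; /* give up on the first failed erase */
        }
        stub_task_delay(1);
        offset += n;
    }
    return 0;
}
```
</pre>
                      <p class="">For a 2850336-byte image (the size
                        reported in this thread), this sketch issues 11
                        erase requests and delays 11 ticks in total. In
                        real firmware the stubs would be replaced by the
                        actual ESP-IDF and FreeRTOS calls, which take
                        additional arguments not modelled here.</p>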
                      <p style="box-sizing: border-box; margin-top: 0px;
                        margin-bottom: 16px; caret-color: rgb(36, 41,
                        46); color: rgb(36, 41, 46); font-family:
                        -apple-system, BlinkMacSystemFont, 'Segoe UI',
                        Helvetica, Arial, sans-serif,
                        'Apple Color Emoji', 'Segoe UI Emoji',
                        'Segoe UI Symbol';
                        font-size: 14px;" class="">I am not sure how to
                        address this issue. The core
                        spi_flash_erase_range(), which all of this depends
                        on, already has a loop to erase sector/block by
                        sector/block, and it uses
                        spi_flash_guard_start() and
                        spi_flash_guard_end() correctly for each
                        sector/block. It seems that the tasks are not
                        getting any/enough CPU time between the
                        spi_flash_guard_end() of one sector/block and the
                        spi_flash_guard_start() of the next in that
                        erase loop.</p>
                      <p style="box-sizing: border-box; margin-top: 0px;
                        margin-bottom: 16px; caret-color: rgb(36, 41,
                        46); color: rgb(36, 41, 46); font-family:
                        -apple-system, BlinkMacSystemFont, 'Segoe UI',
                        Helvetica, Arial, sans-serif,
                        'Apple Color Emoji', 'Segoe UI Emoji',
                        'Segoe UI Symbol';
                        font-size: 14px;" class="">Adding a delay there
                        seems kludgy. Perhaps there is a FreeRTOS call that
                        allows other blocked tasks to run? I tried adding a
                        taskYIELD() after the spi_flash_guard_end() in
                        spi_flash_erase_range(), but that didn't solve the
                        problem (presumably because the task that called
                        esp_ota_begin() has a higher priority than the
                        networking task, so the yield never hands over to
                        it). A vTaskDelay(1) in the same place does solve
                        the problem, but seems horribly kludgy.</p>
                      <p style="box-sizing: border-box; margin-top: 0px;
                        margin-bottom: 16px; caret-color: rgb(36, 41,
                        46); color: rgb(36, 41, 46); font-family:
                        -apple-system, BlinkMacSystemFont, 'Segoe UI',
                        Helvetica, Arial, sans-serif,
                        'Apple Color Emoji', 'Segoe UI Emoji',
                        'Segoe UI Symbol';
                        font-size: 14px;" class="">Overall, I'm just
                        very uncomfortable with how invasive
                        esp_ota_begin() is. With a 2.8MB image size, it
                        blocks all other tasks that touch SPI RAM for
                        about 17 seconds. That is not good. We should be
                        able to OTA flash without starving other tasks
                        in the system so badly.</p>
                      <div style="box-sizing: border-box; margin-top:
                        0px; caret-color: rgb(36, 41, 46); color:
                        rgb(36, 41, 46); font-family: -apple-system,
                        BlinkMacSystemFont, 'Segoe UI',
                        Helvetica, Arial, sans-serif, 'Apple Color Emoji',
                        'Segoe UI Emoji', 'Segoe UI Symbol'; font-size: 14px;
                        margin-bottom: 0px !important;" class="">There
                        is a separate bug in pm_on_beacon_rx that is
                        triggered by this, but that seems more a symptom
                        than a solution.</div>
                    </div>
                  </blockquote>
                  <div class="">
                    <div class=""><br class="">
                    </div>
                    <div class="">As a temporary work-around, I’ve added
                      a one-liner to our Espressif esp-idf clone
                      repository, in the base function
                      spi_flash_erase_range(), to delay 1 tick after
                      erasing each sector. In my
                      testing it solves our problem and doesn’t seem to
                      have much of an impact on SPI flash erase
                      performance. However, it is a horrible kludge and
                      we need a better solution; we need to wait for
                      Espressif to give guidance. In the meantime,
                      please git pull our esp-idf repository so you get
                      this fix/workaround.</div>
                    <div class=""><br class="">
                    </div>
                    <div class="">I am concerned about how we can get
                      out of a loop with this. A module wakes up at 2am,
                      checks for an update, finds 3.1.008, downloads it,
                      crashes, reboots, and tries again, and again, and
                      again. With current code it looks like it will do
                      that five times each night.</div>
                    <div class=""><br class="">
                    </div>
                    <div class="">On my server I only have three cars
                      (other than my own) running 3.1.007. Perhaps they
                      belong to people on this list? I guess I can write
                      to those owners and ask
                      them to downgrade to 3.1.006. I can also build
                      3.1.008 and publish it without changing the
                      ovms3.ver file, so that it is not downloaded
                      automatically. Any suggestions?</div>
                    <div class=""><br class="">
                    </div>
                    <div class="">Regards, Mark.</div>
                    <div class=""><br class="">
                      <blockquote type="cite" class="">
                        <div class="">On 21 Jun 2018, at 9:36 AM, Mark
                          Webb-Johnson &lt;<a
                            href="mailto:mark@webb-johnson.net" class=""
                            moz-do-not-send="true">mark@webb-johnson.net</a>&gt;
                          wrote:</div>
                        <br class="Apple-interchange-newline">
                        <div class="">
                          <div style="word-wrap: break-word;
                            -webkit-nbsp-mode: space; line-break:
                            after-white-space;" class="">
                            <div class=""><br class="">
                            </div>
                            <div class="">On my devices I’m seeing a
                              problem with doing OTA updates when
                              running 3.1.007. I’m seeing:</div>
                            <div class=""><br class="">
                            </div>
                            <blockquote style="margin: 0 0 0 40px;
                              border: none; padding: 0px;" class="">
                              <div class="">
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class="">OVMS#
                                      ota flash http</span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class="">Current
                                      running partition is: factory</span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class="">Target
                                      partition is: ota_0</span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class="">Download
                                      firmware from <a
                                        href="http://api.openvehicles.com/firmware/ota/v3.1/edge/ovms3.bin"
                                        class="" moz-do-not-send="true">api.openvehicles.com/firmware/ota/v3.1/edge/ovms3.bin</a>
                                      to ota_0</span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class="">Expected
                                      file size is 2850336</span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class="">Preparing
                                      flash partition...</span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class="">abort()
                                      was called at PC 0x401b8e84 on
                                      core 0</span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class=""><br
                                        class="">
                                    </span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class="">0x401b8e84:
                                      pm_on_beacon_rx at ??:?</span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class=""><br
                                        class="">
                                    </span></font></div>
                                <div class=""><span style="font-size:
                                    18px; font-family: 'Andale Mono';"
                                    class="">Backtrace:
                                    0x40091e6b:0x3ffcc4a0
                                    0x40091fc3:0x3ffcc4c0
                                    0x401b8e84:0x3ffcc4e0
                                    0x401b94ef:0x3ffcc520
                                    0x401b9bd1:0x3ffcc550
                                    0x40089e62:0x3ffcc5a0</span></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class=""><br
                                        class="">
                                    </span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class="">0x40091e6b:
                                      invoke_abort at
                                      /Users/mark/esp/esp-idf/components/esp32/panic.c:669</span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class=""><br
                                        class="">
                                    </span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class="">0x40091fc3:
                                      abort at
                                      /Users/mark/esp/esp-idf/components/esp32/panic.c:669</span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class=""><br
                                        class="">
                                    </span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class="">0x401b8e84:
                                      pm_on_beacon_rx at ??:?</span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class=""><br
                                        class="">
                                    </span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class="">0x401b94ef:
                                      ppRxProtoProc at ??:?</span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class=""><br
                                        class="">
                                    </span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class="">0x401b9bd1:
                                      ppRxPkt at ??:?</span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class=""><br
                                        class="">
                                    </span></font></div>
                                <div class=""><font class=""
                                    face="Andale Mono"><span
                                      style="font-size: 18px;" class="">0x40089e62:
                                      ppTask at ??:?</span></font></div>
                              </div>
                              <div class=""><font class="" face="Andale
                                  Mono"><span style="font-size: 18px;"
                                    class=""><br class="">
                                  </span></font></div>
                              <div class=""><font class="" face="Andale
                                  Mono"><span style="font-size: 18px;"
                                    class="">(this is a firmware rebuild
                                    so addresses may vary from stock
                                    3.1.007, but the functions are the
                                    same)</span></font></div>
                            </blockquote>
                            <div class=""><br class="">
                            </div>
                            <div class="">The pm_on_beacon_rx function
                              is in libpp (as are ppRxProtoProc,
                              ppRxPkt, and ppTask).</div>
                            <div class=""><br class="">
                            </div>
                            <div class="">If I comment out the esp_ota_*
                              functions, it does the download and works
                              fine. But if esp_ota_begin() is called, then
                              the crash occurs within http.BodyRead
                              (although I think it is within the ppTask,
                              not in our code).</div>
                            <div class=""><br class="">
                            </div>
                            <div class="">If I reduce the size parameter
                              given to esp_ota_begin(), I can get it to
                              work just fine. Around 1.5MB is fine, but
                              above that it starts to get flaky.
                              Espressif say that during flash writes the
                              cache must be disabled, so all access to
                              SPIRAM is halted. Sometimes I see the
                              message ‘I (20642) wifi:
                              bcn_timout,ap_probe_send_start’, which
                              seems to confirm the wifi task is being
                              starved of CPU time during the
                              esp_ota_begin(). I guess that is
                              triggering a bug within the wifi stack
                              where it doesn’t handle it gracefully.</div>
                            <div class=""><br class="">
                            </div>
                            <div class="">I then tried hacking around
                              the esp_ota_begin() function. The
                              Espressif implementation simply calls
                              esp_partition_erase_range() to erase the
                              partition in one call. For us, that now
                              means blanking 2.8MB of flash and takes
                              about 15 seconds or so. I changed that to
                              call esp_partition_erase_range() multiple
                              times, in 256KB chunks, with a 10ms
                              vTaskDelay between each chunk. Problem
                              solved. OTA works again doing it that way.</div>
                            <div class=""><br class="">
                            </div>
                            <div class="">I’m not sure how to get this
                              fix applied. The issue is in Espressif’s
                              code, related to big partition sizes. I
                              guess they haven’t seen it because with
                              1MB OTA partitions the delay is only 3
                              seconds or so and probably not long enough
                              to trigger the issue. I’ll raise a bug
                              report with them, but suspect it will be a
                              while for them to address it. As a
                              temporary workaround for us, I think I’ll
                              bring esp_ota_begin() into our ovms_ota
                              code and customise it there specifically
                              for us. We can always switch back to the
                              normal esp_ota_begin when/if Espressif
                              come up with a permanent solution. I don’t
                              really want to change that core Espressif
                              code, without direction from Espressif as
                              to how to change it (been there, done
                              that, and wasted a lot of time doing it).</div>
                            <div class=""> </div>
                            <div class="">For safety, I’ve rolled back
                              edge to 3.1.006-15-g14b8eb6. That will
                              stop new deployments of this, while we
                              solve the problem. For existing
                              deployments in the field, I can only think
                              of two solutions: (a) switch ota to
                              factory, reboot, and flash to fixed code,
                              or (b) ota from sdcard, with wifi
                              unused/off.</div>
                            <div class=""><br class="">
                            </div>
                            <div class="">I should have the fix later
                              today.</div>
                            <div class=""><br class="">
                            </div>
                            <div class="">Regards, Mark.</div>
                            <div class=""><br class="">
                            </div>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                    <br class="">
                  </div>
                  <br class="">
                  <fieldset class="mimeAttachmentHeader"></fieldset>
                  <br class="">
                  <pre class="" wrap="">_______________________________________________
OvmsDev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:OvmsDev@lists.openvehicles.com" moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a>
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev" moz-do-not-send="true">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>
</pre>
                </blockquote>
                <br class="">
              </div>
            </div>
          </blockquote>
        </div>
        <br class="">
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
    </blockquote>
    <br>
  </body>
</html>