<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    The effect is also present with SPIRAM disabled (and 64K for
    cpuload_task to fit in internal RAM), and it's even more frequent
    when running in single core mode.<br>
    <br>
    I haven't reviewed all our timers yet, I'll try to test periodic
    timers first for the effect so we know if these need to be reviewed
    also.<br>
    <br>
    Regards,<br>
    Michael<br>
    <br>
    <br>
    <div class="moz-cite-prefix">Am 14.01.22 um 06:50 schrieb Mark
      Webb-Johnson:<br>
    </div>
    <blockquote type="cite"
      cite="mid:3E9B2190-87C8-4CAD-B09C-F7ADC0AF91D1@webb-johnson.net">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      Good to hear this has been confirmed in a test case.
      <div class=""><br class="">
      </div>
      <div class="">Do we need to review all our use of timers? Or have
        you already checked?</div>
      <div class=""><br class="">
      </div>
      <div class="">I wonder if it is reproducible single core and
        without SPI RAM? Whether this is another ESP32 hardware bug
        affecting something else...</div>
      <div class=""><br class="">
      </div>
      <div class="">Regards, Mark.<br class="">
        <div><br class="">
          <blockquote type="cite" class="">
            <div class="">On 14 Jan 2022, at 12:03 AM, Michael Balzer
              <<a href="mailto:dexter@expeedo.de"
                class="moz-txt-link-freetext" moz-do-not-send="true">dexter@expeedo.de</a>>
              wrote:</div>
            <br class="Apple-interchange-newline">
            <div class="">
              <div class="content-isolator__container">
                <div class="protected-part">
                  <div class="protected-title">Signed PGP part</div>
                  <div class="protected-content">
                    <meta http-equiv="Content-Type" content="text/html;
                      charset=UTF-8" class="">
                    <div class=""> Everyone,<br class="">
                      <br class="">
                      I've managed to reproduce the effect on a standard
                      ESP32 development board with a simple test project
                      only involving timers, some CPU/memory load &
                      wifi networking.<br class="">
                      <br class="">
                      I've tested both the standard esp-idf release 3.3
                      and the latest esp-idf release 4.4 (using gcc 8.4)
                      for this, and the effect is still present.<br
                        class="">
                      <br class="">
                      → Bug report: <a class="moz-txt-link-freetext"
                        href="https://github.com/espressif/esp-idf/issues/8234"
                        moz-do-not-send="true">https://github.com/espressif/esp-idf/issues/8234</a><br
                        class="">
                      <br class="">
                      Attached are my test projects if you'd like to
                      reproduce this or improve the test.<br class="">
                      <br class="">
                      I haven't tested periodic timer callbacks yet for
                      the effect. These are normally designed to run
                      periodically, but if the timing is important (e.g.
                      on some CAN transmission?), this could cause
                      erroneous behaviour as well.<br class="">
                      <br class="">
                      Regards,<br class="">
                      Michael<br class="">
                      <br class="">
                      <br class="">
                      <div class="moz-cite-prefix">Am 30.12.21 um 19:25
                        schrieb Michael Balzer:<br class="">
                      </div>
                      <blockquote type="cite"
                        cite="mid:b0b3d3cb-5b7a-6a1c-31e5-a719eaaddb45@expeedo.de"
                        class="">
                        <meta http-equiv="Content-Type"
                          content="text/html; charset=UTF-8" class="">
                        Followup to…<br class="">
                        <br class="">
                        <div class="moz-cite-prefix">Am 26.09.21 um
                          19:28 schrieb Michael Balzer:<br class="">
                        </div>
                        <blockquote type="cite"
                          cite="mid:d2058325-7bdd-73a9-46dc-c8e9686d5602@expeedo.de"
                          class="">If so, my conclusions so far would
                          be: <br class="">
                          <br class="">
                          a) we've got a real heap corruption issue that
                          sometimes gets triggered by the test. I've had
                          it twice around the same place, i.e. within
                          the scheduled event processing. I'll check my
                          code. <br class="">
                          <br class="">
                          Am 26.09.21 um 18:46 schrieb Craig Leres: <br
                            class="">
                          <blockquote type="cite" class="">Second crash
                            on the production module, CORRUPT HEAP after
                            ~350 minutes. <br class="">
                            <br class="">
                            OVMS# CORRUPT HEAP: Bad head at 0x3f8adc78.
                            Expected 0xabba1234 got 0x3f8adcc0 <br
                              class="">
                            abort() was called at PC 0x400844c3 on core
                            0 <br class="">
                          </blockquote>
                        </blockquote>
                        <br class="">
                        I've found the heap corruption source and a new
                        bug class:<br class="">
                        <br class="">
                        The corruption was caused by a duplicate free()
                        (here via delete), which was basically
                        impossible: it was the free() call for the event
                        message for a delayed event delivery. There is
                        exactly one producer (OvmsEvents::ScheduleEvent)
                        and exactly one consumer
                        (OvmsEvents::SignalScheduledEvent), which is
                        called exactly once -- when the single shot
                        timer expires.<br class="">
                        <br class="">
                        In theory. <u class="">In reality, the timer
                          callback occasionally gets executed twice</u>.
                        To exclude every possible race condition, I
                        enclosed both producer & consumer into a
                        semaphore lock. I then changed the code in order
                        to clear the timer payload as soon as it's read,
                        and added a test for a NULL payload -- and
                        voila, the timer callback now gets occasionally
                        called with a NULL payload, which is also
                        impossible as the allocation result is checked
                        in the producer.<br class="">
                        <br class="">
                        I've had no luck reproducing this in a reduced
                        test project, even with multiple auto reload
                        timers and distribution over both cores, but I
                        still see no other explanation than a bug in the
                        FreeRTOS timer service (or the Espressif
                        FreeRTOS multi core adaption). Ah, and yes, this
                        occurs on the ESP32/r3 as well.<br class="">
                        <br class="">
                        You should be able to reproduce the effect using
                        the same event test loop as for the Duktape
                        TypeError issue: <a
                          class="moz-txt-link-freetext"
href="https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/474#issuecomment-1002687055"
                          moz-do-not-send="true">https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/474#issuecomment-1002687055</a><br
                          class="">
                        <br class="">
                        My workaround prevents crashes and outputs a log
                        entry when the NULL payload is detected.<br
                          class="">
                        <br class="">
                        Example log excerpt:<br class="">
                        <br class="">
                        <font class="" face="monospace">script eval
                          'testcnt=0; PubSub.subscribe("usr.testev",
                          function(ev) { var
                          ms=Number(ev.substr(11))||10; if (++testcnt %
                          (3*1000/ms) == 0) print(ev + ": " + testcnt);
                          OvmsEvents.Raise("usr.testev."+ms, ms); })'<br
                            class="">
                          script eval 'testcnt=0;
                          OvmsEvents.Raise("usr.testev.10")'<br class="">
                          I (13493019) events: ScheduleEvent: creating
                          new timer<br class="">
                          <b class="">W (13495919) events:
                            SignalScheduledEvent: duplicate callback
                            invocation detected</b><br class="">
                          I (13497029) ovms-duk-util: [eval:1:]
                          usr.testev.10: 300<br class="">
                          I (13501109) housekeeping: 2021-12-30 18:12:35
                          CET (RAM: 8b=64448-67004 32b=6472)<br class="">
                          --<br class="">
                          I (13521779) ovms-duk-util: [eval:1:]
                          usr.testev.10: 2100<br class="">
                          I (13525809) ovms-duk-util: [eval:1:]
                          usr.testev.10: 2400<br class="">
                          <b class="">W (13527579) events:
                            SignalScheduledEvent: duplicate callback
                            invocation detected</b><b class=""><br
                              class="">
                          </b><b class="">W (13527629) events:
                            SignalScheduledEvent: duplicate callback
                            invocation detected</b><br class="">
                          I (13529839) ovms-duk-util: [eval:1:]
                          usr.testev.10: 2700<br class="">
                          I (13533329) ovms-server-v2: Incoming Msg:
                          MP-0 AFA<br class="">
                          --<br class="">
                          I (13579149) ovms-duk-util: [eval:1:]
                          usr.testev.10: 6300<br class="">
                          I (13583319) ovms-duk-util: [eval:1:]
                          usr.testev.10: 6600<br class="">
                          <b class="">W (13584679) events:
                            SignalScheduledEvent: duplicate callback
                            invocation detected</b><br class="">
                          I (13587439) ovms-duk-util: [eval:1:]
                          usr.testev.10: 6900<br class="">
                          I (13591589) ovms-duk-util: [eval:1:]
                          usr.testev.10: 7200<br class="">
                          --<br class="">
                          I (13714299) ovms-duk-util: [eval:1:]
                          usr.testev.10: 16200<br class="">
                          I (13718339) ovms-duk-util: [eval:1:]
                          usr.testev.10: 16500<br class="">
                          <b class="">W (13718719) events:
                            SignalScheduledEvent: duplicate callback
                            invocation detected</b><br class="">
                          I (13722459) ovms-duk-util: [eval:1:]
                          usr.testev.10: 16800<br class="">
                          I (13726509) ovms-duk-util: [eval:1:]
                          usr.testev.10: 17100<br class="">
                          --<br class="">
                          I (13743149) ovms-duk-util: [eval:1:]
                          usr.testev.10: 18300<br class="">
                          I (13747129) ovms-duk-util: [eval:1:]
                          usr.testev.10: 18600<br class="">
                          <b class="">W (13748979) events:
                            SignalScheduledEvent: duplicate callback
                            invocation detected</b><br class="">
                          I (13751299) ovms-duk-util: [eval:1:]
                          usr.testev.10: 18900<br class="">
                          I (13755349) ovms-duk-util: [eval:1:]
                          usr.testev.10: 19200<br class="">
                          --<br class="">
                          I (13784029) ovms-duk-util: [eval:1:]
                          usr.testev.10: 21300<br class="">
                          I (13788059) ovms-duk-util: [eval:1:]
                          usr.testev.10: 21600<br class="">
                          <b class="">W (13791409) events:
                            SignalScheduledEvent: duplicate callback
                            invocation detected</b><br class="">
                          I (13792179) ovms-duk-util: [eval:1:]
                          usr.testev.10: 21900<br class="">
                          I (13796239) ovms-duk-util: [eval:1:]
                          usr.testev.10: 22200<br class="">
                          …<br class="">
                        </font><br class="">
                        <br class="">
                        The bug frequency differs from boot to boot, but
                        as you can see can be very high. I've had runs
                        with ~ 1 occurrence every 300.000 callbacks, and
                        runs like the above with ~ 1 every 3.000
                        callbacks.<br class="">
                        <br class="">
                        If this is a common effect with timer callbacks,
                        that may cause some of the remaining issues.
                        It's possible this only happens with single shot
                        timers, haven't checked our periodic timers yet.<br
                          class="">
                        <br class="">
                        Any additional input on this is welcome.<br
                          class="">
                        <br class="">
                        Regards,<br class="">
                        Michael<br class="">
                        <br class="">
                        <pre class="moz-signature" cols="72">-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
                        <br class="">
                        <fieldset class="moz-mime-attachment-header"></fieldset>
                        <pre class="moz-quote-pre" wrap="">_______________________________________________
OvmsDev mailing list
<a class="moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:OvmsDev@lists.openvehicles.com" moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a>
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev" moz-do-not-send="true">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>
</pre>
                      </blockquote>
                      <br class="">
                      <pre class="moz-signature" cols="72">-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
                    </div>
                    <span id="cid:8BFE1EF7-790B-4FBE-9A22-85504702B44F"><test-timer-dupecall.zip></span></div>
                </div>
                <br class="">
                <iframe class="content-isolator__isolated-content"
                  sandbox="allow-scripts" scrolling="auto"
                  style="border:none;display:block;overflow:auto;"
data-src="data:text/html;charset=UTF-8;base64,PGlmcmFtZS1jb250ZW50IGRhdGEtaWZyYW1lLWhlaWdodD0idHJ1ZSI+X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX188QlI+T3Ztc0RldiBtYWlsaW5nIGxpc3Q8QlI+T3Ztc0RldkBsaXN0cy5vcGVudmVoaWNsZXMuY29tPEJSPmh0dHA6Ly9saXN0cy5vcGVudmVoaWNsZXMuY29tL21haWxtYW4vbGlzdGluZm8vb3Ztc2RldjxCUj48L2lmcmFtZS1jb250ZW50Pg=="
                  width="200" height="10"></iframe></div>
            </div>
          </blockquote>
        </div>
        <br class="">
      </div>
      <br>
      <fieldset class="moz-mime-attachment-header"></fieldset>
      <pre class="moz-quote-pre" wrap="">_______________________________________________
OvmsDev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:OvmsDev@lists.openvehicles.com">OvmsDev@lists.openvehicles.com</a>
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>
</pre>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
  </body>
</html>