<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    The effect also affects periodic timers, but with a very low
    frequency.<br>
    <br>
    Extended test & results:<br>
<a class="moz-txt-link-freetext" href="https://github.com/espressif/esp-idf/issues/8234#issuecomment-1013650548">https://github.com/espressif/esp-idf/issues/8234#issuecomment-1013650548</a><br>
    <br>
    The test run had 3 periodic callback dupes while counting 5912
    single shot callback dupes.<br>
    <br>
    Timer usages in the OVMS aside events are a) the housekeeping per
    second timer, from which the monotictime and all standard tickers
    are derived, and b) the update ticker for websocket connections.<br>
    <br>
    Vehicles using timers are: Nissan Leaf, Tesla Roadster, Renault
    Twizy & VW e-Up. I suggest the respective maintainers check
    their timer uses for issues that might result from a duplicate
    callback.<br>
    <br>
    The event timer already detects duplicate invocations and produces
    the log warning (see below) on this.<br>
    <br>
    Regarding the main ticker timer: a duplicate call would cause the
    monotonictime counter to get ouf of sync with the actual system
    time, but I don't think we would rely on that anywhere. More serious
    could be that the per second ticker can run twice on the same second
    in this case. But I also don't think we rely on a mininum execution
    interval as well -- after all, these are software timers and
    generally not guaranteed to run in exactly the interval they are
    defined to run. We use the monotonictime for some expiry checks, but
    none of these should be so low a single skipped second would matter.<br>
    <br>
    So I don't think the housekeeping ticker running occasionally twice
    would cause an issue. I do remember though I had an issue with the
    monotonictime behaving strangely once, but I don't recall the exact
    issue.<br>
    <br>
    The websocket timer is used to schedule data updates, metrics etc.,
    for the websocket clients. It does not rely on a minimum interval as
    well, and the occasional queue overflow events we see there are more
    probably caused by a slow network / client.<br>
    <br>
    The vehicle uses of timers may be more critical, as they seem to
    involve protocol timing, I recommend to check these ASAP.<br>
    <br>
    A simple strategy to detect duplicate calls on periodic timers can
    be to keep & compare the FreeRTOS tick count
    (xTaskGetTickCount()) of the last execution in some static variable,
    if it's the same, skip the run.<br>
    <br>
    For single shot timers, the strategy I've used in the events
    framework seems to work: use the timer ID to check the call
    validity, i.e. define some timer ID property to mark an invalid call
    and set that property in the callback.<br>
    <br>
    Regards,<br>
    Michael<br>
    <br>
    <br>
    <div class="moz-cite-prefix">Am 14.01.22 um 09:43 schrieb Michael
      Balzer:<br>
    </div>
    <blockquote type="cite"
      cite="mid:a3e4de67-6439-ec5f-18fb-6b8584f88360@expeedo.de">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      The effect is also present with SPIRAM disabled (and 64K for
      cpuload_task to fit in internal RAM), and it's even more frequent
      when running in single core mode.<br>
      <br>
      I haven't reviewed all our timers yet, I'll try to test periodic
      timers first for the effect so we know if these need to be
      reviewed also.<br>
      <br>
      Regards,<br>
      Michael<br>
      <br>
      <br>
      <div class="moz-cite-prefix">Am 14.01.22 um 06:50 schrieb Mark
        Webb-Johnson:<br>
      </div>
      <blockquote type="cite"
        cite="mid:3E9B2190-87C8-4CAD-B09C-F7ADC0AF91D1@webb-johnson.net">
        <meta http-equiv="Content-Type" content="text/html;
          charset=UTF-8">
        Good to hear this has been confirmed in a test case.
        <div class=""><br class="">
        </div>
        <div class="">Do we need to review all our use of timers? Or
          have you already checked?</div>
        <div class=""><br class="">
        </div>
        <div class="">I wonder if it is reproducible single core and
          without SPI RAM? Whether this is another ESP32 hardware bug
          affecting something else...</div>
        <div class=""><br class="">
        </div>
        <div class="">Regards, Mark.<br class="">
          <div><br class="">
            <blockquote type="cite" class="">
              <div class="">On 14 Jan 2022, at 12:03 AM, Michael Balzer
                <<a href="mailto:dexter@expeedo.de"
                  class="moz-txt-link-freetext" moz-do-not-send="true">dexter@expeedo.de</a>>
                wrote:</div>
              <br class="Apple-interchange-newline">
              <div class="">
                <div class="content-isolator__container">
                  <div class="protected-part">
                    <div class="protected-title">Signed PGP part</div>
                    <div class="protected-content">
                      <meta http-equiv="Content-Type"
                        content="text/html; charset=UTF-8" class="">
                      <div class=""> Everyone,<br class="">
                        <br class="">
                        I've managed to reproduce the effect on a
                        standard ESP32 development board with a simple
                        test project only involving timers, some
                        CPU/memory load & wifi networking.<br
                          class="">
                        <br class="">
                        I've tested both the standard esp-idf release
                        3.3 and the latest esp-idf release 4.4 (using
                        gcc 8.4) for this, and the effect is still
                        present.<br class="">
                        <br class="">
                        → Bug report: <a class="moz-txt-link-freetext"
href="https://github.com/espressif/esp-idf/issues/8234"
                          moz-do-not-send="true">https://github.com/espressif/esp-idf/issues/8234</a><br
                          class="">
                        <br class="">
                        Attached are my test projects if you'd like to
                        reproduce this or improve the test.<br class="">
                        <br class="">
                        I haven't tested periodic timer callbacks yet
                        for the effect. These are normally designed to
                        run periodically, but if the timing is important
                        (e.g. on some CAN transmission?), this could
                        cause erroneous behaviour as well.<br class="">
                        <br class="">
                        Regards,<br class="">
                        Michael<br class="">
                        <br class="">
                        <br class="">
                        <div class="moz-cite-prefix">Am 30.12.21 um
                          19:25 schrieb Michael Balzer:<br class="">
                        </div>
                        <blockquote type="cite"
                          cite="mid:b0b3d3cb-5b7a-6a1c-31e5-a719eaaddb45@expeedo.de"
                          class="">
                          <meta http-equiv="Content-Type"
                            content="text/html; charset=UTF-8" class="">
                          Followup to…<br class="">
                          <br class="">
                          <div class="moz-cite-prefix">Am 26.09.21 um
                            19:28 schrieb Michael Balzer:<br class="">
                          </div>
                          <blockquote type="cite"
                            cite="mid:d2058325-7bdd-73a9-46dc-c8e9686d5602@expeedo.de"
                            class="">If so, my conclusions so far would
                            be: <br class="">
                            <br class="">
                            a) we've got a real heap corruption issue
                            that sometimes gets triggered by the test.
                            I've had it twice around the same place,
                            i.e. within the scheduled event processing.
                            I'll check my code. <br class="">
                            <br class="">
                            Am 26.09.21 um 18:46 schrieb Craig Leres: <br
                              class="">
                            <blockquote type="cite" class="">Second
                              crash on the production module, CORRUPT
                              HEAP after ~350 minutes. <br class="">
                              <br class="">
                              OVMS# CORRUPT HEAP: Bad head at
                              0x3f8adc78. Expected 0xabba1234 got
                              0x3f8adcc0 <br class="">
                              abort() was called at PC 0x400844c3 on
                              core 0 <br class="">
                            </blockquote>
                          </blockquote>
                          <br class="">
                          I've found the heap corruption source and a
                          new bug class:<br class="">
                          <br class="">
                          The corruption was caused by a duplicate
                          free() (here via delete), which was basically
                          impossible: it was the free() call for the
                          event message for a delayed event delivery.
                          There is exactly one producer
                          (OvmsEvents::ScheduleEvent) and exactly one
                          consumer (OvmsEvents::SignalScheduledEvent),
                          which is called exactly once -- when the
                          single shot timer expires.<br class="">
                          <br class="">
                          In theory. <u class="">In reality, the timer
                            callback occasionally gets executed twice</u>.
                          To exclude every possible race condition, I
                          enclosed both producer & consumer into a
                          semaphore lock. I then changed the code in
                          order to clear the timer payload as soon as
                          it's read, and added a test for a NULL payload
                          -- and voila, the timer callback now gets
                          occasionally called with a NULL payload, which
                          is also impossible as the allocation result is
                          checked in the producer.<br class="">
                          <br class="">
                          I've had no luck reproducing this in a reduced
                          test project, even with multiple auto reload
                          timers and distribution over both cores, but I
                          still see no other explanation than a bug in
                          the FreeRTOS timer service (or the Espressif
                          FreeRTOS multi core adaption). Ah, and yes,
                          this occurs on the ESP32/r3 as well.<br
                            class="">
                          <br class="">
                          You should be able to reproduce the effect
                          using the same event test loop as for the
                          Duktape TypeError issue: <a
                            class="moz-txt-link-freetext"
href="https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/474#issuecomment-1002687055"
                            moz-do-not-send="true">https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/474#issuecomment-1002687055</a><br
                            class="">
                          <br class="">
                          My workaround prevents crashes and outputs a
                          log entry when the NULL payload is detected.<br
                            class="">
                          <br class="">
                          Example log excerpt:<br class="">
                          <br class="">
                          <font class="" face="monospace">script eval
                            'testcnt=0; PubSub.subscribe("usr.testev",
                            function(ev) { var
                            ms=Number(ev.substr(11))||10; if (++testcnt
                            % (3*1000/ms) == 0) print(ev + ": " +
                            testcnt); OvmsEvents.Raise("usr.testev."+ms,
                            ms); })'<br class="">
                            script eval 'testcnt=0;
                            OvmsEvents.Raise("usr.testev.10")'<br
                              class="">
                            I (13493019) events: ScheduleEvent: creating
                            new timer<br class="">
                            <b class="">W (13495919) events:
                              SignalScheduledEvent: duplicate callback
                              invocation detected</b><br class="">
                            I (13497029) ovms-duk-util: [eval:1:]
                            usr.testev.10: 300<br class="">
                            I (13501109) housekeeping: 2021-12-30
                            18:12:35 CET (RAM: 8b=64448-67004 32b=6472)<br
                              class="">
                            --<br class="">
                            I (13521779) ovms-duk-util: [eval:1:]
                            usr.testev.10: 2100<br class="">
                            I (13525809) ovms-duk-util: [eval:1:]
                            usr.testev.10: 2400<br class="">
                            <b class="">W (13527579) events:
                              SignalScheduledEvent: duplicate callback
                              invocation detected</b><b class=""><br
                                class="">
                            </b><b class="">W (13527629) events:
                              SignalScheduledEvent: duplicate callback
                              invocation detected</b><br class="">
                            I (13529839) ovms-duk-util: [eval:1:]
                            usr.testev.10: 2700<br class="">
                            I (13533329) ovms-server-v2: Incoming Msg:
                            MP-0 AFA<br class="">
                            --<br class="">
                            I (13579149) ovms-duk-util: [eval:1:]
                            usr.testev.10: 6300<br class="">
                            I (13583319) ovms-duk-util: [eval:1:]
                            usr.testev.10: 6600<br class="">
                            <b class="">W (13584679) events:
                              SignalScheduledEvent: duplicate callback
                              invocation detected</b><br class="">
                            I (13587439) ovms-duk-util: [eval:1:]
                            usr.testev.10: 6900<br class="">
                            I (13591589) ovms-duk-util: [eval:1:]
                            usr.testev.10: 7200<br class="">
                            --<br class="">
                            I (13714299) ovms-duk-util: [eval:1:]
                            usr.testev.10: 16200<br class="">
                            I (13718339) ovms-duk-util: [eval:1:]
                            usr.testev.10: 16500<br class="">
                            <b class="">W (13718719) events:
                              SignalScheduledEvent: duplicate callback
                              invocation detected</b><br class="">
                            I (13722459) ovms-duk-util: [eval:1:]
                            usr.testev.10: 16800<br class="">
                            I (13726509) ovms-duk-util: [eval:1:]
                            usr.testev.10: 17100<br class="">
                            --<br class="">
                            I (13743149) ovms-duk-util: [eval:1:]
                            usr.testev.10: 18300<br class="">
                            I (13747129) ovms-duk-util: [eval:1:]
                            usr.testev.10: 18600<br class="">
                            <b class="">W (13748979) events:
                              SignalScheduledEvent: duplicate callback
                              invocation detected</b><br class="">
                            I (13751299) ovms-duk-util: [eval:1:]
                            usr.testev.10: 18900<br class="">
                            I (13755349) ovms-duk-util: [eval:1:]
                            usr.testev.10: 19200<br class="">
                            --<br class="">
                            I (13784029) ovms-duk-util: [eval:1:]
                            usr.testev.10: 21300<br class="">
                            I (13788059) ovms-duk-util: [eval:1:]
                            usr.testev.10: 21600<br class="">
                            <b class="">W (13791409) events:
                              SignalScheduledEvent: duplicate callback
                              invocation detected</b><br class="">
                            I (13792179) ovms-duk-util: [eval:1:]
                            usr.testev.10: 21900<br class="">
                            I (13796239) ovms-duk-util: [eval:1:]
                            usr.testev.10: 22200<br class="">
                            …<br class="">
                          </font><br class="">
                          <br class="">
                          The bug frequency differs from boot to boot,
                          but as you can see can be very high. I've had
                          runs with ~ 1 occurrence every 300.000
                          callbacks, and runs like the above with ~ 1
                          every 3.000 callbacks.<br class="">
                          <br class="">
                          If this is a common effect with timer
                          callbacks, that may cause some of the
                          remaining issues. It's possible this only
                          happens with single shot timers, haven't
                          checked our periodic timers yet.<br class="">
                          <br class="">
                          Any additional input on this is welcome.<br
                            class="">
                          <br class="">
                          Regards,<br class="">
                          Michael<br class="">
                          <br class="">
                          <pre class="moz-signature" cols="72">-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
                          <br class="">
                          <fieldset class="moz-mime-attachment-header"></fieldset>
                          <pre class="moz-quote-pre" wrap="">_______________________________________________
OvmsDev mailing list
<a class="moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:OvmsDev@lists.openvehicles.com" moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a>
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev" moz-do-not-send="true">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>
</pre>
                        </blockquote>
                        <br class="">
                        <pre class="moz-signature" cols="72">-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
                      </div>
                      <span
                        id="cid:8BFE1EF7-790B-4FBE-9A22-85504702B44F"><test-timer-dupecall.zip></span></div>
                  </div>
                  <br class="">
                  <iframe class="content-isolator__isolated-content"
                    sandbox="allow-scripts" scrolling="auto"
                    style="border:none;display:block;overflow:auto;"
data-src="data:text/html;charset=UTF-8;base64,PGlmcmFtZS1jb250ZW50IGRhdGEtaWZyYW1lLWhlaWdodD0idHJ1ZSI+X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX188QlI+T3Ztc0RldiBtYWlsaW5nIGxpc3Q8QlI+T3Ztc0RldkBsaXN0cy5vcGVudmVoaWNsZXMuY29tPEJSPmh0dHA6Ly9saXN0cy5vcGVudmVoaWNsZXMuY29tL21haWxtYW4vbGlzdGluZm8vb3Ztc2RldjxCUj48L2lmcmFtZS1jb250ZW50Pg=="
                    width="200" height="10"></iframe></div>
              </div>
            </blockquote>
          </div>
          <br class="">
        </div>
        <br>
        <fieldset class="moz-mime-attachment-header"></fieldset>
        <pre class="moz-quote-pre" wrap="">_______________________________________________
OvmsDev mailing list
<a class="moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:OvmsDev@lists.openvehicles.com" moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a>
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev" moz-do-not-send="true">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>
</pre>
      </blockquote>
      <br>
      <pre class="moz-signature" cols="72">-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
      <br>
      <fieldset class="moz-mime-attachment-header"></fieldset>
      <pre class="moz-quote-pre" wrap="">_______________________________________________
OvmsDev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:OvmsDev@lists.openvehicles.com">OvmsDev@lists.openvehicles.com</a>
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>
</pre>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
  </body>
</html>