<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    The duplicate callbacks on the single shot timers actually were my
    fault.<br>
    <br>
    I missed reading the documentation thoroughly: xTimerChangePeriod()
    actually not only changes the period, it also starts the timer.<br>
    <br>
    So assigning the timer payload after changing it's period introduced
    a race condition. I've fixed that and no longer see duplicate
    callbacks in the test log.<br>
    <br>
    The case of additional callback executions for cyclic timers is
    still open.<br>
    <br>
    Regards,<br>
    Michael<br>
    <br>
    <br>
    <div class="moz-cite-prefix">Am 15.01.22 um 15:13 schrieb Michael
      Balzer:<br>
    </div>
    <blockquote type="cite"
      cite="mid:89bf2c09-c6d9-1348-fddf-2f659e68d00a@expeedo.de">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      I've added a tick count check to the core tickers and to the VW
      e-Up and Renault Twizy timer callbacks, which are all periodic.<br>
      <br>
      Code template: insert this at the top of your timer callback:<br>
      <br>
      <font face="monospace">  // Workaround for FreeRTOS duplicate
        timer callback bug<br>
          // (see <a class="moz-txt-link-freetext"
          href="https://github.com/espressif/esp-idf/issues/8234"
          moz-do-not-send="true">https://github.com/espressif/esp-idf/issues/8234</a>)<br>
          static TickType_t last_tick = 0;<br>
          TickType_t tick = xTaskGetTickCount();<br>
          if (tick < last_tick + xTimerGetPeriod(timer) - 3) return;<br>
          last_tick = tick;<br>
      </font><br>
      This should also work for a single shot timer, unless it can be
      reused with a shorter period (that's the case with the event timer
      pool).<br>
      <br>
      The "-3" accomodates for delayed timer callback execution. Note
      that this dupe detection therefore depends on the period being
      higher than 3 ticks (30 ms), which is the case for all current
      periodic timers (fastest is the 100 ms = 10 ticks timer for the
      Twizy kickdown).<br>
      <br>
      Regards,<br>
      Michael<br>
      <br>
      <br>
      <div class="moz-cite-prefix">Am 15.01.22 um 11:08 schrieb Michael
        Balzer:<br>
      </div>
      <blockquote type="cite"
        cite="mid:e3fd59b9-0181-b542-0bfb-525a129a9bdf@expeedo.de">
        <meta http-equiv="Content-Type" content="text/html;
          charset=UTF-8">
        The effect also affects periodic timers, but with a very low
        frequency.<br>
        <br>
        Extended test & results:<br>
        <a class="moz-txt-link-freetext"
href="https://github.com/espressif/esp-idf/issues/8234#issuecomment-1013650548"
          moz-do-not-send="true">https://github.com/espressif/esp-idf/issues/8234#issuecomment-1013650548</a><br>
        <br>
        The test run had 3 periodic callback dupes while counting 5912
        single shot callback dupes.<br>
        <br>
        Timer usages in the OVMS aside events are a) the housekeeping
        per second timer, from which the monotictime and all standard
        tickers are derived, and b) the update ticker for websocket
        connections.<br>
        <br>
        Vehicles using timers are: Nissan Leaf, Tesla Roadster, Renault
        Twizy & VW e-Up. I suggest the respective maintainers check
        their timer uses for issues that might result from a duplicate
        callback.<br>
        <br>
        The event timer already detects duplicate invocations and
        produces the log warning (see below) on this.<br>
        <br>
        Regarding the main ticker timer: a duplicate call would cause
        the monotonictime counter to get ouf of sync with the actual
        system time, but I don't think we would rely on that anywhere.
        More serious could be that the per second ticker can run twice
        on the same second in this case. But I also don't think we rely
        on a mininum execution interval as well -- after all, these are
        software timers and generally not guaranteed to run in exactly
        the interval they are defined to run. We use the monotonictime
        for some expiry checks, but none of these should be so low a
        single skipped second would matter.<br>
        <br>
        So I don't think the housekeeping ticker running occasionally
        twice would cause an issue. I do remember though I had an issue
        with the monotonictime behaving strangely once, but I don't
        recall the exact issue.<br>
        <br>
        The websocket timer is used to schedule data updates, metrics
        etc., for the websocket clients. It does not rely on a minimum
        interval as well, and the occasional queue overflow events we
        see there are more probably caused by a slow network / client.<br>
        <br>
        The vehicle uses of timers may be more critical, as they seem to
        involve protocol timing, I recommend to check these ASAP.<br>
        <br>
        A simple strategy to detect duplicate calls on periodic timers
        can be to keep & compare the FreeRTOS tick count
        (xTaskGetTickCount()) of the last execution in some static
        variable, if it's the same, skip the run.<br>
        <br>
        For single shot timers, the strategy I've used in the events
        framework seems to work: use the timer ID to check the call
        validity, i.e. define some timer ID property to mark an invalid
        call and set that property in the callback.<br>
        <br>
        Regards,<br>
        Michael<br>
        <br>
        <br>
        <div class="moz-cite-prefix">Am 14.01.22 um 09:43 schrieb
          Michael Balzer:<br>
        </div>
        <blockquote type="cite"
          cite="mid:a3e4de67-6439-ec5f-18fb-6b8584f88360@expeedo.de">
          <meta http-equiv="Content-Type" content="text/html;
            charset=UTF-8">
          The effect is also present with SPIRAM disabled (and 64K for
          cpuload_task to fit in internal RAM), and it's even more
          frequent when running in single core mode.<br>
          <br>
          I haven't reviewed all our timers yet, I'll try to test
          periodic timers first for the effect so we know if these need
          to be reviewed also.<br>
          <br>
          Regards,<br>
          Michael<br>
          <br>
          <br>
          <div class="moz-cite-prefix">Am 14.01.22 um 06:50 schrieb Mark
            Webb-Johnson:<br>
          </div>
          <blockquote type="cite"
            cite="mid:3E9B2190-87C8-4CAD-B09C-F7ADC0AF91D1@webb-johnson.net">
            <meta http-equiv="Content-Type" content="text/html;
              charset=UTF-8">
            Good to hear this has been confirmed in a test case.
            <div class=""><br class="">
            </div>
            <div class="">Do we need to review all our use of timers? Or
              have you already checked?</div>
            <div class=""><br class="">
            </div>
            <div class="">I wonder if it is reproducible single core and
              without SPI RAM? Whether this is another ESP32 hardware
              bug affecting something else...</div>
            <div class=""><br class="">
            </div>
            <div class="">Regards, Mark.<br class="">
              <div><br class="">
                <blockquote type="cite" class="">
                  <div class="">On 14 Jan 2022, at 12:03 AM, Michael
                    Balzer <<a href="mailto:dexter@expeedo.de"
                      class="moz-txt-link-freetext"
                      moz-do-not-send="true">dexter@expeedo.de</a>>
                    wrote:</div>
                  <br class="Apple-interchange-newline">
                  <div class="">
                    <div class="content-isolator__container">
                      <div class="protected-part">
                        <div class="protected-title">Signed PGP part</div>
                        <div class="protected-content">
                          <meta http-equiv="Content-Type"
                            content="text/html; charset=UTF-8" class="">
                          <div class=""> Everyone,<br class="">
                            <br class="">
                            I've managed to reproduce the effect on a
                            standard ESP32 development board with a
                            simple test project only involving timers,
                            some CPU/memory load & wifi networking.<br
                              class="">
                            <br class="">
                            I've tested both the standard esp-idf
                            release 3.3 and the latest esp-idf release
                            4.4 (using gcc 8.4) for this, and the effect
                            is still present.<br class="">
                            <br class="">
                            → Bug report: <a
                              class="moz-txt-link-freetext"
                              href="https://github.com/espressif/esp-idf/issues/8234"
                              moz-do-not-send="true">https://github.com/espressif/esp-idf/issues/8234</a><br
                              class="">
                            <br class="">
                            Attached are my test projects if you'd like
                            to reproduce this or improve the test.<br
                              class="">
                            <br class="">
                            I haven't tested periodic timer callbacks
                            yet for the effect. These are normally
                            designed to run periodically, but if the
                            timing is important (e.g. on some CAN
                            transmission?), this could cause erroneous
                            behaviour as well.<br class="">
                            <br class="">
                            Regards,<br class="">
                            Michael<br class="">
                            <br class="">
                            <br class="">
                            <div class="moz-cite-prefix">Am 30.12.21 um
                              19:25 schrieb Michael Balzer:<br class="">
                            </div>
                            <blockquote type="cite"
                              cite="mid:b0b3d3cb-5b7a-6a1c-31e5-a719eaaddb45@expeedo.de"
                              class="">
                              <meta http-equiv="Content-Type"
                                content="text/html; charset=UTF-8"
                                class="">
                              Followup to…<br class="">
                              <br class="">
                              <div class="moz-cite-prefix">Am 26.09.21
                                um 19:28 schrieb Michael Balzer:<br
                                  class="">
                              </div>
                              <blockquote type="cite"
                                cite="mid:d2058325-7bdd-73a9-46dc-c8e9686d5602@expeedo.de"
                                class="">If so, my conclusions so far
                                would be: <br class="">
                                <br class="">
                                a) we've got a real heap corruption
                                issue that sometimes gets triggered by
                                the test. I've had it twice around the
                                same place, i.e. within the scheduled
                                event processing. I'll check my code. <br
                                  class="">
                                <br class="">
                                Am 26.09.21 um 18:46 schrieb Craig
                                Leres: <br class="">
                                <blockquote type="cite" class="">Second
                                  crash on the production module,
                                  CORRUPT HEAP after ~350 minutes. <br
                                    class="">
                                  <br class="">
                                  OVMS# CORRUPT HEAP: Bad head at
                                  0x3f8adc78. Expected 0xabba1234 got
                                  0x3f8adcc0 <br class="">
                                  abort() was called at PC 0x400844c3 on
                                  core 0 <br class="">
                                </blockquote>
                              </blockquote>
                              <br class="">
                              I've found the heap corruption source and
                              a new bug class:<br class="">
                              <br class="">
                              The corruption was caused by a duplicate
                              free() (here via delete), which was
                              basically impossible: it was the free()
                              call for the event message for a delayed
                              event delivery. There is exactly one
                              producer (OvmsEvents::ScheduleEvent) and
                              exactly one consumer
                              (OvmsEvents::SignalScheduledEvent), which
                              is called exactly once -- when the single
                              shot timer expires.<br class="">
                              <br class="">
                              In theory. <u class="">In reality, the
                                timer callback occasionally gets
                                executed twice</u>. To exclude every
                              possible race condition, I enclosed both
                              producer & consumer into a semaphore
                              lock. I then changed the code in order to
                              clear the timer payload as soon as it's
                              read, and added a test for a NULL payload
                              -- and voila, the timer callback now gets
                              occasionally called with a NULL payload,
                              which is also impossible as the allocation
                              result is checked in the producer.<br
                                class="">
                              <br class="">
                              I've had no luck reproducing this in a
                              reduced test project, even with multiple
                              auto reload timers and distribution over
                              both cores, but I still see no other
                              explanation than a bug in the FreeRTOS
                              timer service (or the Espressif FreeRTOS
                              multi core adaption). Ah, and yes, this
                              occurs on the ESP32/r3 as well.<br
                                class="">
                              <br class="">
                              You should be able to reproduce the effect
                              using the same event test loop as for the
                              Duktape TypeError issue: <a
                                class="moz-txt-link-freetext"
href="https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/474#issuecomment-1002687055"
                                moz-do-not-send="true">https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/474#issuecomment-1002687055</a><br
                                class="">
                              <br class="">
                              My workaround prevents crashes and outputs
                              a log entry when the NULL payload is
                              detected.<br class="">
                              <br class="">
                              Example log excerpt:<br class="">
                              <br class="">
                              <font class="" face="monospace">script
                                eval 'testcnt=0;
                                PubSub.subscribe("usr.testev",
                                function(ev) { var
                                ms=Number(ev.substr(11))||10; if
                                (++testcnt % (3*1000/ms) == 0) print(ev
                                + ": " + testcnt);
                                OvmsEvents.Raise("usr.testev."+ms, ms);
                                })'<br class="">
                                script eval 'testcnt=0;
                                OvmsEvents.Raise("usr.testev.10")'<br
                                  class="">
                                I (13493019) events: ScheduleEvent:
                                creating new timer<br class="">
                                <b class="">W (13495919) events:
                                  SignalScheduledEvent: duplicate
                                  callback invocation detected</b><br
                                  class="">
                                I (13497029) ovms-duk-util: [eval:1:]
                                usr.testev.10: 300<br class="">
                                I (13501109) housekeeping: 2021-12-30
                                18:12:35 CET (RAM: 8b=64448-67004
                                32b=6472)<br class="">
                                --<br class="">
                                I (13521779) ovms-duk-util: [eval:1:]
                                usr.testev.10: 2100<br class="">
                                I (13525809) ovms-duk-util: [eval:1:]
                                usr.testev.10: 2400<br class="">
                                <b class="">W (13527579) events:
                                  SignalScheduledEvent: duplicate
                                  callback invocation detected</b><b
                                  class=""><br class="">
                                </b><b class="">W (13527629) events:
                                  SignalScheduledEvent: duplicate
                                  callback invocation detected</b><br
                                  class="">
                                I (13529839) ovms-duk-util: [eval:1:]
                                usr.testev.10: 2700<br class="">
                                I (13533329) ovms-server-v2: Incoming
                                Msg: MP-0 AFA<br class="">
                                --<br class="">
                                I (13579149) ovms-duk-util: [eval:1:]
                                usr.testev.10: 6300<br class="">
                                I (13583319) ovms-duk-util: [eval:1:]
                                usr.testev.10: 6600<br class="">
                                <b class="">W (13584679) events:
                                  SignalScheduledEvent: duplicate
                                  callback invocation detected</b><br
                                  class="">
                                I (13587439) ovms-duk-util: [eval:1:]
                                usr.testev.10: 6900<br class="">
                                I (13591589) ovms-duk-util: [eval:1:]
                                usr.testev.10: 7200<br class="">
                                --<br class="">
                                I (13714299) ovms-duk-util: [eval:1:]
                                usr.testev.10: 16200<br class="">
                                I (13718339) ovms-duk-util: [eval:1:]
                                usr.testev.10: 16500<br class="">
                                <b class="">W (13718719) events:
                                  SignalScheduledEvent: duplicate
                                  callback invocation detected</b><br
                                  class="">
                                I (13722459) ovms-duk-util: [eval:1:]
                                usr.testev.10: 16800<br class="">
                                I (13726509) ovms-duk-util: [eval:1:]
                                usr.testev.10: 17100<br class="">
                                --<br class="">
                                I (13743149) ovms-duk-util: [eval:1:]
                                usr.testev.10: 18300<br class="">
                                I (13747129) ovms-duk-util: [eval:1:]
                                usr.testev.10: 18600<br class="">
                                <b class="">W (13748979) events:
                                  SignalScheduledEvent: duplicate
                                  callback invocation detected</b><br
                                  class="">
                                I (13751299) ovms-duk-util: [eval:1:]
                                usr.testev.10: 18900<br class="">
                                I (13755349) ovms-duk-util: [eval:1:]
                                usr.testev.10: 19200<br class="">
                                --<br class="">
                                I (13784029) ovms-duk-util: [eval:1:]
                                usr.testev.10: 21300<br class="">
                                I (13788059) ovms-duk-util: [eval:1:]
                                usr.testev.10: 21600<br class="">
                                <b class="">W (13791409) events:
                                  SignalScheduledEvent: duplicate
                                  callback invocation detected</b><br
                                  class="">
                                I (13792179) ovms-duk-util: [eval:1:]
                                usr.testev.10: 21900<br class="">
                                I (13796239) ovms-duk-util: [eval:1:]
                                usr.testev.10: 22200<br class="">
                                …<br class="">
                              </font><br class="">
                              <br class="">
                              The bug frequency differs from boot to
                              boot, but as you can see can be very high.
                              I've had runs with ~ 1 occurrence every
                              300.000 callbacks, and runs like the above
                              with ~ 1 every 3.000 callbacks.<br
                                class="">
                              <br class="">
                              If this is a common effect with timer
                              callbacks, that may cause some of the
                              remaining issues. It's possible this only
                              happens with single shot timers, haven't
                              checked our periodic timers yet.<br
                                class="">
                              <br class="">
                              Any additional input on this is welcome.<br
                                class="">
                              <br class="">
                              Regards,<br class="">
                              Michael<br class="">
                              <br class="">
                              <pre class="moz-signature" cols="72">-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
                              <br class="">
                              <fieldset
                                class="moz-mime-attachment-header"></fieldset>
                              <pre class="moz-quote-pre" wrap="">_______________________________________________
OvmsDev mailing list
<a class="moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:OvmsDev@lists.openvehicles.com" moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a>
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev" moz-do-not-send="true">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>
</pre>
                            </blockquote>
                            <br class="">
                            <pre class="moz-signature" cols="72">-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
                          </div>
                          <span
                            id="cid:8BFE1EF7-790B-4FBE-9A22-85504702B44F"><test-timer-dupecall.zip></span></div>
                      </div>
                      <br class="">
                      <iframe class="content-isolator__isolated-content"
                        sandbox="allow-scripts" scrolling="auto"
                        style="border:none;display:block;overflow:auto;"
data-src="data:text/html;charset=UTF-8;base64,PGlmcmFtZS1jb250ZW50IGRhdGEtaWZyYW1lLWhlaWdodD0idHJ1ZSI+X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX188QlI+T3Ztc0RldiBtYWlsaW5nIGxpc3Q8QlI+T3Ztc0RldkBsaXN0cy5vcGVudmVoaWNsZXMuY29tPEJSPmh0dHA6Ly9saXN0cy5vcGVudmVoaWNsZXMuY29tL21haWxtYW4vbGlzdGluZm8vb3Ztc2RldjxCUj48L2lmcmFtZS1jb250ZW50Pg=="
                        width="200" height="10"></iframe></div>
                  </div>
                </blockquote>
              </div>
              <br class="">
            </div>
            <br>
            <fieldset class="moz-mime-attachment-header"></fieldset>
            <pre class="moz-quote-pre" wrap="">_______________________________________________
OvmsDev mailing list
<a class="moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:OvmsDev@lists.openvehicles.com" moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a>
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev" moz-do-not-send="true">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>
</pre>
          </blockquote>
          <br>
          <pre class="moz-signature" cols="72">-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
          <br>
          <fieldset class="moz-mime-attachment-header"></fieldset>
          <pre class="moz-quote-pre" wrap="">_______________________________________________
OvmsDev mailing list
<a class="moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:OvmsDev@lists.openvehicles.com" moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a>
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev" moz-do-not-send="true">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>
</pre>
        </blockquote>
        <br>
        <pre class="moz-signature" cols="72">-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
        <br>
        <fieldset class="moz-mime-attachment-header"></fieldset>
        <pre class="moz-quote-pre" wrap="">_______________________________________________
OvmsDev mailing list
<a class="moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:OvmsDev@lists.openvehicles.com" moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a>
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev" moz-do-not-send="true">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>
</pre>
      </blockquote>
      <br>
      <pre class="moz-signature" cols="72">-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
      <br>
      <fieldset class="moz-mime-attachment-header"></fieldset>
      <pre class="moz-quote-pre" wrap="">_______________________________________________
OvmsDev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:OvmsDev@lists.openvehicles.com">OvmsDev@lists.openvehicles.com</a>
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>
</pre>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
  </body>
</html>