<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    I've added a tick count check to the core tickers and to the VW e-Up
    and Renault Twizy timer callbacks, which are all periodic.<br>
    <br>
    Code template: insert this at the top of your timer callback:<br>
    <br>
    <font face="monospace">  // Workaround for FreeRTOS duplicate timer
      callback bug<br>
        // (see <a class="moz-txt-link-freetext" href="https://github.com/espressif/esp-idf/issues/8234">https://github.com/espressif/esp-idf/issues/8234</a>)<br>
        static TickType_t last_tick = 0;<br>
        TickType_t tick = xTaskGetTickCount();<br>
        if (tick < last_tick + xTimerGetPeriod(timer) - 3) return;<br>
        last_tick = tick;<br>
    </font><br>
    This should also work for a single shot timer, unless it can be
    reused with a shorter period (that's the case with the event timer
    pool).<br>
    <br>
    The "-3" accomodates for delayed timer callback execution. Note that
    this dupe detection therefore depends on the period being higher
    than 3 ticks (30 ms), which is the case for all current periodic
    timers (fastest is the 100 ms = 10 ticks timer for the Twizy
    kickdown).<br>
    <br>
    Regards,<br>
    Michael<br>
    <br>
    <br>
    <div class="moz-cite-prefix">Am 15.01.22 um 11:08 schrieb Michael
      Balzer:<br>
    </div>
    <blockquote type="cite"
      cite="mid:e3fd59b9-0181-b542-0bfb-525a129a9bdf@expeedo.de">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      The effect also affects periodic timers, but with a very low
      frequency.<br>
      <br>
      Extended test & results:<br>
      <a class="moz-txt-link-freetext"
href="https://github.com/espressif/esp-idf/issues/8234#issuecomment-1013650548"
        moz-do-not-send="true">https://github.com/espressif/esp-idf/issues/8234#issuecomment-1013650548</a><br>
      <br>
      The test run had 3 periodic callback dupes while counting 5912
      single shot callback dupes.<br>
      <br>
      Timer usages in the OVMS aside events are a) the housekeeping per
      second timer, from which the monotictime and all standard tickers
      are derived, and b) the update ticker for websocket connections.<br>
      <br>
      Vehicles using timers are: Nissan Leaf, Tesla Roadster, Renault
      Twizy & VW e-Up. I suggest the respective maintainers check
      their timer uses for issues that might result from a duplicate
      callback.<br>
      <br>
      The event timer already detects duplicate invocations and produces
      the log warning (see below) on this.<br>
      <br>
      Regarding the main ticker timer: a duplicate call would cause the
      monotonictime counter to get ouf of sync with the actual system
      time, but I don't think we would rely on that anywhere. More
      serious could be that the per second ticker can run twice on the
      same second in this case. But I also don't think we rely on a
      mininum execution interval as well -- after all, these are
      software timers and generally not guaranteed to run in exactly the
      interval they are defined to run. We use the monotonictime for
      some expiry checks, but none of these should be so low a single
      skipped second would matter.<br>
      <br>
      So I don't think the housekeeping ticker running occasionally
      twice would cause an issue. I do remember though I had an issue
      with the monotonictime behaving strangely once, but I don't recall
      the exact issue.<br>
      <br>
      The websocket timer is used to schedule data updates, metrics
      etc., for the websocket clients. It does not rely on a minimum
      interval as well, and the occasional queue overflow events we see
      there are more probably caused by a slow network / client.<br>
      <br>
      The vehicle uses of timers may be more critical, as they seem to
      involve protocol timing, I recommend to check these ASAP.<br>
      <br>
      A simple strategy to detect duplicate calls on periodic timers can
      be to keep & compare the FreeRTOS tick count
      (xTaskGetTickCount()) of the last execution in some static
      variable, if it's the same, skip the run.<br>
      <br>
      For single shot timers, the strategy I've used in the events
      framework seems to work: use the timer ID to check the call
      validity, i.e. define some timer ID property to mark an invalid
      call and set that property in the callback.<br>
      <br>
      Regards,<br>
      Michael<br>
      <br>
      <br>
      <div class="moz-cite-prefix">Am 14.01.22 um 09:43 schrieb Michael
        Balzer:<br>
      </div>
      <blockquote type="cite"
        cite="mid:a3e4de67-6439-ec5f-18fb-6b8584f88360@expeedo.de">
        <meta http-equiv="Content-Type" content="text/html;
          charset=UTF-8">
        The effect is also present with SPIRAM disabled (and 64K for
        cpuload_task to fit in internal RAM), and it's even more
        frequent when running in single core mode.<br>
        <br>
        I haven't reviewed all our timers yet, I'll try to test periodic
        timers first for the effect so we know if these need to be
        reviewed also.<br>
        <br>
        Regards,<br>
        Michael<br>
        <br>
        <br>
        <div class="moz-cite-prefix">Am 14.01.22 um 06:50 schrieb Mark
          Webb-Johnson:<br>
        </div>
        <blockquote type="cite"
          cite="mid:3E9B2190-87C8-4CAD-B09C-F7ADC0AF91D1@webb-johnson.net">
          <meta http-equiv="Content-Type" content="text/html;
            charset=UTF-8">
          Good to hear this has been confirmed in a test case.
          <div class=""><br class="">
          </div>
          <div class="">Do we need to review all our use of timers? Or
            have you already checked?</div>
          <div class=""><br class="">
          </div>
          <div class="">I wonder if it is reproducible single core and
            without SPI RAM? Whether this is another ESP32 hardware bug
            affecting something else...</div>
          <div class=""><br class="">
          </div>
          <div class="">Regards, Mark.<br class="">
            <div><br class="">
              <blockquote type="cite" class="">
                <div class="">On 14 Jan 2022, at 12:03 AM, Michael
                  Balzer <<a href="mailto:dexter@expeedo.de"
                    class="moz-txt-link-freetext" moz-do-not-send="true">dexter@expeedo.de</a>>
                  wrote:</div>
                <br class="Apple-interchange-newline">
                <div class="">
                  <div class="content-isolator__container">
                    <div class="protected-part">
                      <div class="protected-title">Signed PGP part</div>
                      <div class="protected-content">
                        <meta http-equiv="Content-Type"
                          content="text/html; charset=UTF-8" class="">
                        <div class=""> Everyone,<br class="">
                          <br class="">
                          I've managed to reproduce the effect on a
                          standard ESP32 development board with a simple
                          test project only involving timers, some
                          CPU/memory load & wifi networking.<br
                            class="">
                          <br class="">
                          I've tested both the standard esp-idf release
                          3.3 and the latest esp-idf release 4.4 (using
                          gcc 8.4) for this, and the effect is still
                          present.<br class="">
                          <br class="">
                          → Bug report: <a
                            class="moz-txt-link-freetext"
                            href="https://github.com/espressif/esp-idf/issues/8234"
                            moz-do-not-send="true">https://github.com/espressif/esp-idf/issues/8234</a><br
                            class="">
                          <br class="">
                          Attached are my test projects if you'd like to
                          reproduce this or improve the test.<br
                            class="">
                          <br class="">
                          I haven't tested periodic timer callbacks yet
                          for the effect. These are normally designed to
                          run periodically, but if the timing is
                          important (e.g. on some CAN transmission?),
                          this could cause erroneous behaviour as well.<br
                            class="">
                          <br class="">
                          Regards,<br class="">
                          Michael<br class="">
                          <br class="">
                          <br class="">
                          <div class="moz-cite-prefix">Am 30.12.21 um
                            19:25 schrieb Michael Balzer:<br class="">
                          </div>
                          <blockquote type="cite"
                            cite="mid:b0b3d3cb-5b7a-6a1c-31e5-a719eaaddb45@expeedo.de"
                            class="">
                            <meta http-equiv="Content-Type"
                              content="text/html; charset=UTF-8"
                              class="">
                            Followup to…<br class="">
                            <br class="">
                            <div class="moz-cite-prefix">Am 26.09.21 um
                              19:28 schrieb Michael Balzer:<br class="">
                            </div>
                            <blockquote type="cite"
                              cite="mid:d2058325-7bdd-73a9-46dc-c8e9686d5602@expeedo.de"
                              class="">If so, my conclusions so far
                              would be: <br class="">
                              <br class="">
                              a) we've got a real heap corruption issue
                              that sometimes gets triggered by the test.
                              I've had it twice around the same place,
                              i.e. within the scheduled event
                              processing. I'll check my code. <br
                                class="">
                              <br class="">
                              Am 26.09.21 um 18:46 schrieb Craig Leres:
                              <br class="">
                              <blockquote type="cite" class="">Second
                                crash on the production module, CORRUPT
                                HEAP after ~350 minutes. <br class="">
                                <br class="">
                                OVMS# CORRUPT HEAP: Bad head at
                                0x3f8adc78. Expected 0xabba1234 got
                                0x3f8adcc0 <br class="">
                                abort() was called at PC 0x400844c3 on
                                core 0 <br class="">
                              </blockquote>
                            </blockquote>
                            <br class="">
                            I've found the heap corruption source and a
                            new bug class:<br class="">
                            <br class="">
                            The corruption was caused by a duplicate
                            free() (here via delete), which was
                            basically impossible: it was the free() call
                            for the event message for a delayed event
                            delivery. There is exactly one producer
                            (OvmsEvents::ScheduleEvent) and exactly one
                            consumer (OvmsEvents::SignalScheduledEvent),
                            which is called exactly once -- when the
                            single shot timer expires.<br class="">
                            <br class="">
                            In theory. <u class="">In reality, the
                              timer callback occasionally gets executed
                              twice</u>. To exclude every possible race
                            condition, I enclosed both producer &
                            consumer into a semaphore lock. I then
                            changed the code in order to clear the timer
                            payload as soon as it's read, and added a
                            test for a NULL payload -- and voila, the
                            timer callback now gets occasionally called
                            with a NULL payload, which is also
                            impossible as the allocation result is
                            checked in the producer.<br class="">
                            <br class="">
                            I've had no luck reproducing this in a
                            reduced test project, even with multiple
                            auto reload timers and distribution over
                            both cores, but I still see no other
                            explanation than a bug in the FreeRTOS timer
                            service (or the Espressif FreeRTOS multi
                            core adaption). Ah, and yes, this occurs on
                            the ESP32/r3 as well.<br class="">
                            <br class="">
                            You should be able to reproduce the effect
                            using the same event test loop as for the
                            Duktape TypeError issue: <a
                              class="moz-txt-link-freetext"
href="https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/474#issuecomment-1002687055"
                              moz-do-not-send="true">https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/474#issuecomment-1002687055</a><br
                              class="">
                            <br class="">
                            My workaround prevents crashes and outputs a
                            log entry when the NULL payload is detected.<br
                              class="">
                            <br class="">
                            Example log excerpt:<br class="">
                            <br class="">
                            <font class="" face="monospace">script eval
                              'testcnt=0; PubSub.subscribe("usr.testev",
                              function(ev) { var
                              ms=Number(ev.substr(11))||10; if
                              (++testcnt % (3*1000/ms) == 0) print(ev +
                              ": " + testcnt);
                              OvmsEvents.Raise("usr.testev."+ms, ms);
                              })'<br class="">
                              script eval 'testcnt=0;
                              OvmsEvents.Raise("usr.testev.10")'<br
                                class="">
                              I (13493019) events: ScheduleEvent:
                              creating new timer<br class="">
                              <b class="">W (13495919) events:
                                SignalScheduledEvent: duplicate callback
                                invocation detected</b><br class="">
                              I (13497029) ovms-duk-util: [eval:1:]
                              usr.testev.10: 300<br class="">
                              I (13501109) housekeeping: 2021-12-30
                              18:12:35 CET (RAM: 8b=64448-67004
                              32b=6472)<br class="">
                              --<br class="">
                              I (13521779) ovms-duk-util: [eval:1:]
                              usr.testev.10: 2100<br class="">
                              I (13525809) ovms-duk-util: [eval:1:]
                              usr.testev.10: 2400<br class="">
                              <b class="">W (13527579) events:
                                SignalScheduledEvent: duplicate callback
                                invocation detected</b><b class=""><br
                                  class="">
                              </b><b class="">W (13527629) events:
                                SignalScheduledEvent: duplicate callback
                                invocation detected</b><br class="">
                              I (13529839) ovms-duk-util: [eval:1:]
                              usr.testev.10: 2700<br class="">
                              I (13533329) ovms-server-v2: Incoming Msg:
                              MP-0 AFA<br class="">
                              --<br class="">
                              I (13579149) ovms-duk-util: [eval:1:]
                              usr.testev.10: 6300<br class="">
                              I (13583319) ovms-duk-util: [eval:1:]
                              usr.testev.10: 6600<br class="">
                              <b class="">W (13584679) events:
                                SignalScheduledEvent: duplicate callback
                                invocation detected</b><br class="">
                              I (13587439) ovms-duk-util: [eval:1:]
                              usr.testev.10: 6900<br class="">
                              I (13591589) ovms-duk-util: [eval:1:]
                              usr.testev.10: 7200<br class="">
                              --<br class="">
                              I (13714299) ovms-duk-util: [eval:1:]
                              usr.testev.10: 16200<br class="">
                              I (13718339) ovms-duk-util: [eval:1:]
                              usr.testev.10: 16500<br class="">
                              <b class="">W (13718719) events:
                                SignalScheduledEvent: duplicate callback
                                invocation detected</b><br class="">
                              I (13722459) ovms-duk-util: [eval:1:]
                              usr.testev.10: 16800<br class="">
                              I (13726509) ovms-duk-util: [eval:1:]
                              usr.testev.10: 17100<br class="">
                              --<br class="">
                              I (13743149) ovms-duk-util: [eval:1:]
                              usr.testev.10: 18300<br class="">
                              I (13747129) ovms-duk-util: [eval:1:]
                              usr.testev.10: 18600<br class="">
                              <b class="">W (13748979) events:
                                SignalScheduledEvent: duplicate callback
                                invocation detected</b><br class="">
                              I (13751299) ovms-duk-util: [eval:1:]
                              usr.testev.10: 18900<br class="">
                              I (13755349) ovms-duk-util: [eval:1:]
                              usr.testev.10: 19200<br class="">
                              --<br class="">
                              I (13784029) ovms-duk-util: [eval:1:]
                              usr.testev.10: 21300<br class="">
                              I (13788059) ovms-duk-util: [eval:1:]
                              usr.testev.10: 21600<br class="">
                              <b class="">W (13791409) events:
                                SignalScheduledEvent: duplicate callback
                                invocation detected</b><br class="">
                              I (13792179) ovms-duk-util: [eval:1:]
                              usr.testev.10: 21900<br class="">
                              I (13796239) ovms-duk-util: [eval:1:]
                              usr.testev.10: 22200<br class="">
                              …<br class="">
                            </font><br class="">
                            <br class="">
                            The bug frequency differs from boot to boot,
                            but as you can see can be very high. I've
                            had runs with ~ 1 occurrence every 300.000
                            callbacks, and runs like the above with ~ 1
                            every 3.000 callbacks.<br class="">
                            <br class="">
                            If this is a common effect with timer
                            callbacks, that may cause some of the
                            remaining issues. It's possible this only
                            happens with single shot timers, haven't
                            checked our periodic timers yet.<br class="">
                            <br class="">
                            Any additional input on this is welcome.<br
                              class="">
                            <br class="">
                            Regards,<br class="">
                            Michael<br class="">
                            <br class="">
                            <pre class="moz-signature" cols="72">-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
                            <br class="">
                            <fieldset class="moz-mime-attachment-header"></fieldset>
                            <pre class="moz-quote-pre" wrap="">_______________________________________________
OvmsDev mailing list
<a class="moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:OvmsDev@lists.openvehicles.com" moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a>
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev" moz-do-not-send="true">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>
</pre>
                          </blockquote>
                          <br class="">
                          <pre class="moz-signature" cols="72">-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
                        </div>
                        <span
                          id="cid:8BFE1EF7-790B-4FBE-9A22-85504702B44F"><test-timer-dupecall.zip></span></div>
                    </div>
                    <br class="">
                    <iframe class="content-isolator__isolated-content"
                      sandbox="allow-scripts" scrolling="auto"
                      style="border:none;display:block;overflow:auto;"
data-src="data:text/html;charset=UTF-8;base64,PGlmcmFtZS1jb250ZW50IGRhdGEtaWZyYW1lLWhlaWdodD0idHJ1ZSI+X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX188QlI+T3Ztc0RldiBtYWlsaW5nIGxpc3Q8QlI+T3Ztc0RldkBsaXN0cy5vcGVudmVoaWNsZXMuY29tPEJSPmh0dHA6Ly9saXN0cy5vcGVudmVoaWNsZXMuY29tL21haWxtYW4vbGlzdGluZm8vb3Ztc2RldjxCUj48L2lmcmFtZS1jb250ZW50Pg=="
                      width="200" height="10"></iframe></div>
                </div>
              </blockquote>
            </div>
            <br class="">
          </div>
          <br>
          <fieldset class="moz-mime-attachment-header"></fieldset>
          <pre class="moz-quote-pre" wrap="">_______________________________________________
OvmsDev mailing list
<a class="moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:OvmsDev@lists.openvehicles.com" moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a>
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev" moz-do-not-send="true">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>
</pre>
        </blockquote>
        <br>
        <pre class="moz-signature" cols="72">-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
        <br>
        <fieldset class="moz-mime-attachment-header"></fieldset>
        <pre class="moz-quote-pre" wrap="">_______________________________________________
OvmsDev mailing list
<a class="moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:OvmsDev@lists.openvehicles.com" moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a>
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev" moz-do-not-send="true">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>
</pre>
      </blockquote>
      <br>
      <pre class="moz-signature" cols="72">-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
      <br>
      <fieldset class="moz-mime-attachment-header"></fieldset>
      <pre class="moz-quote-pre" wrap="">_______________________________________________
OvmsDev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:OvmsDev@lists.openvehicles.com">OvmsDev@lists.openvehicles.com</a>
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>
</pre>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26</pre>
  </body>
</html>