<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    I've pushed the new implementation based on the Espressif info and

    some more experimentation.<br>

    <br>

    I've also enabled the IRAM flag for the GPIO and SPI_noDMA handlers,

    as all handlers here already were in IRAM.<br>

    <br>

    I've been testing this over the weekend, had no issues and slightly

    reduced can1 frame drops.<br>

    <br>

    Mark, maybe you can do some additional high load CAN tests in your

    lab.<br>

    <br>

    Regards,<br>

    Michael<br>

    <br>

    <br>

    <div class="moz-cite-prefix">Am 01.11.19 um 12:35 schrieb Michael

      Balzer:<br>

    </div>

    <blockquote type="cite"

      cite="mid:fd9d5897-5237-7cb8-1e40-a40aee6ba740@expeedo.de">

      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

      An unusual fast reply from Espressif. They basically declared it

      to be a feature…<br>

      <br>

      <a

href="https://github.com/espressif/esp-idf/issues/4276#issuecomment-548753085"

        moz-do-not-send="true">https://github.com/espressif/esp-idf/issues/4276#issuecomment-548753085</a><br>

      <br>

      Regards,<br>

      Michael<br>

      <br>

      <br>

      <div class="moz-cite-prefix">Am 01.11.19 um 10:34 schrieb Michael

        Balzer:<br>

      </div>

      <blockquote type="cite"

        cite="mid:3fccacb5-bf45-a329-24eb-8d2cd62b6cda@expeedo.de">

        <meta http-equiv="Content-Type" content="text/html;

          charset=UTF-8">

        I've submitted an issue to Espressif on this:<br>

        <br>

        <a href="https://github.com/espressif/esp-idf/issues/4276"

          moz-do-not-send="true">https://github.com/espressif/esp-idf/issues/4276</a><br>

        <br>

        It would be nice if I could reproduce the effect using some

        standard hardware.<br>

        I've ordered some of those combined ESP32 + OLED units, but

        don't have any CAN transceivers I could hook up to them.<br>

        Maybe the info I've provided is sufficient for Espressif to

        investigate.<br>

        <br>

        Regards,<br>

        Michael<br>

        <br>

        <br>

        <div class="moz-cite-prefix">Am 31.10.19 um 22:24 schrieb

          Michael Balzer:<br>

        </div>

        <blockquote type="cite"

          cite="mid:fe537da7-12c1-e0ca-0ac5-7374415628b2@expeedo.de">

          <meta http-equiv="content-type" content="text/html;

            charset=UTF-8">

          Second finding & commit:<br>

          <br>

          TL;DR:<br>

          <ul>

            <li>ESP32 CAN controller is broken, delivers trashed &

              duplicated RX frames</li>

            <li>we need to optimize interrupt latency</li>

          </ul>

          <br>

          Users have been reporting occasional false BMS readings on the

          Twizy for more than a year.<br>

          The error always trashed the reading of voltage #10 and/or

          temperature #8, so I initially<br>

          thought that must be a variant of the SPIRAM/cache bug.<br>

          <br>

          But the SPIRAM fix didn't help, and the bug still occured with

          all BMS data in internal<br>

          RAM. The issue correlated with high load situations and the

          SIMCOM UART issue. On testing the<br>

          latest changes by Marko, my car module began showing that

          issue regularly, so I could take a<br>

          closer look. It turned out the wrong data came from the CAN

          bus. Checking with a CAN logger<br>

          didn't show any wrong data on the bus.<br>

          <br>

          So I scanned our CAN driver for potential issues, found &

          solved some (even a serious<br>

          race condition) but that didn't fix it. I then added debug

          dumps of all frames read, dug<br>

          into the SJA1000 spec sheet and found this:<br>

          <br>

          The SJA1000 RX FIFO has a capacity of 64 bytes. It's supposed

          to work as a ring buffer,<br>

          with frames wrapping around as necessary. If no space is left

          in the FIFO for a new frame<br>

          coming in, that frame should be discarded completely, not

          counted, just the overflow<br>

          indicator should be set & the according interrupt be

          generated.<br>

          <br>

          Reality differs. The frame causing the overflow is added to

          the FIFO partially. It's also<br>

          counted both in the RMC register and indicated by RBS as a

          valid frame when retrieving<br>

          the FIFO contents. The read buffer delivers the partial frame

          + some trashed bytes up<br>

          to the nominal frame length.<br>

          <br>

          In the BMS frame sequence it would normally trash bytes 7

          & 8 on the sixth frame of length<br>

          8 in a sequence. A standard frame needs a 3 byte header + the

          data bytes in the FIFO, so<br>

          6 frames would add up to 66 bytes => 64 + 2 trash. The

          first trash byte normally is "08",<br>

          the second "84" or "2a" or sometimes "ab", possibly some

          internal SJA1000 data.<br>

          <br>

          Better yet, if you rely on the indicators of the SJA1000 when

          reading the FIFO, the<br>

          SJA1000 will repeat the first frame within the FIFO after the

          invalid frame. Mostly<br>

          once or twice, and sometimes 7 times or more:<br>

          <br>

          <tt>inv_msg: framecnt=13, invindex=6</tt><tt><br>

          </tt><tt>inv_msg: 55 01 00 00 07 98 50 54 54 20 00 6e |

            4..?........U.....PTT .n</tt><tt><br>

          </tt><tt>inv_msg: 24 04 00 00 11 40 10 22 37 55 00 37 |

            4..?........$....@."7U.7</tt><tt><br>

          </tt><tt>inv_msg: 25 04 00 00 0a 1b 44 ff fe 4e 01 26 |

            4..?........%.....D..N.&</tt><tt><br>

          </tt><tt>inv_msg: 54 05 00 00 37 37 37 37 37 37 37 00 |

            4..?........T...7777777.</tt><tt><br>

          </tt><tt>inv_msg: 56 05 00 00 31 63 14 31 53 14 31 4a |

            4..?........V...1c.1S.1J</tt><tt><br>

          </tt><tt>inv_msg: 57 05 00 00 31 43 14 31 53 15 <font

              color="#ff0000"><b>08 2a</b></font> |

            4..?........W...1C.1S..*</tt><tt><br>

          </tt><tt>                                       ^^^^^ <font

              color="#ff0000">trashed bytes</font></tt><tt><br>

          </tt><tt>… followed by <font color="#ff0000">7 repetitions of

              the first frame</font>:</tt><tt><br>

          </tt><tt>inv_msg: 55 01 00 00 07 98 50 54 54 20 00 6e |

            4..?........U.....PTT .n</tt><tt><br>

          </tt><tt>inv_msg: 55 01 00 00 07 98 50 54 54 20 00 6e |

            4..?........U.....PTT .n</tt><tt><br>

          </tt><tt>inv_msg: 55 01 00 00 07 98 50 54 54 20 00 6e |

            4..?........U.....PTT .n</tt><tt><br>

          </tt><tt>inv_msg: 55 01 00 00 07 98 50 54 54 20 00 6e |

            4..?........U.....PTT .n</tt><tt><br>

          </tt><tt>inv_msg: 55 01 00 00 07 98 50 54 54 20 00 6e |

            4..?........U.....PTT .n</tt><tt><br>

          </tt><tt>inv_msg: 55 01 00 00 07 98 50 54 54 20 00 6e |

            4..?........U.....PTT .n</tt><tt><br>

          </tt><tt>inv_msg: 55 01 00 00 07 98 50 54 54 20 00 6e |

            4..?........U.....PTT .n</tt><br>

          <br>

          If you had occasional strange false readings leading to

          strange effects, this<br>

          may be the explanation. Note: the SJA1000 FIFO bug may need

          specific circumstance<br>

          to occur in addition to the overflow, maybe the overflow

          happening on a specific<br>

          byte position in the FIFO -- I haven't tried to determine

          that.<br>

          <br>

          I've tried clearing the overflow ASAP, looping on the IR

          indicator and looping<br>

          on the RMC, the fault is always the same. The workaround I've

          now done is adding<br>

          up the message lengths read during a FIFO run and discard all

          frames exceeding<br>

          the 64 byte border. Not pretty, but functional.<br>

          <br>

          That keeps the trashed frames from being forwarded to the OVMS

          application level.<br>

          <br>

          As with the SIMCOM UART, these overflows wouldn't happen if

          the interrupt handler<br>

          would react fast enough. It seems we've got a general

          interrupt latency issue.<br>

          <br>

          And just as with the SIMCOM I saw a reduction of overflows

          when allocating the<br>

          CAN IRQ on core #0, another one with raising the IRQ level to

          3, so that's<br>

          also part of my commit.<br>

          <br>

          The esp-idf CAN driver apparently has the same issue even in

          the latest release,<br>

          I'll contact the author about this or post another issue to

          Espressif.<br>

          <br>

          So our interrupt allocation is now:<br>

          <br>

          <tt>Task           Run#  Lvl  ISR#  Usage                   

            Source</tt><tt><br>

          </tt><tt>----------------------------------------------------------------------------------------</tt><tt><br>

          </tt><tt>-              0     1    0     RTC

            Core                 ETS_RTC_CORE_INTR_SOURCE</tt><tt><br>

          </tt><tt>esp_timer      0     1    0     RTOS

            esp_timer           ETS_TIMER2_INTR_SOURCE</tt><tt><br>

          </tt><tt>ipc0           0     1    0     RTOS Scheduler

            Core0     ETS_FROM_CPU_INTR0_SOURCE</tt><tt><br>

          </tt><tt>ipc1           1     1    1     RTOS Scheduler

            Core1     ETS_FROM_CPU_INTR1_SOURCE</tt><tt><br>

          </tt><tt>main           0     1    0     RTOS

            Watchdog            ETS_TG0_WDT_LEVEL_INTR_SOURCE</tt><tt><br>

          </tt><tt>OVMS Events    1     1    1     Peripherals (GPIO

            ISR)   ETS_GPIO_INTR_SOURCE</tt><tt><br>

          </tt><tt>OVMS Events    1     1    1     MAX7317

            (spi_nodma)      ETS_SPI3_INTR_SOURCE</tt><tt><br>

          </tt><b><tt>esp32can init  0     3    0     ESP32

              CAN                ETS_CAN_INTR_SOURCE</tt></b><b><tt><br>

            </tt></b><b><tt>OVMS SIMCOM    0     2    0     SIMCOM

              UART              ETS_UART1_INTR_SOURCE</tt></b><tt><br>

          </tt><tt>OVMS Events    1     1    1     USB console

            (UART)       ETS_UART0_INTR_SOURCE</tt><tt><br>

          </tt><tt>OVMS Events    1     1    1     SD

            card                  ETS_SDIO_HOST_INTR_SOURCE</tt><tt><br>

          </tt><tt>----------------------------------------------------------------------------------------</tt><br>

          <br>

          I think we need to find the source of the interrupt latency

          and/or find some<br>

          better scheme to distribute the tasks & interrupts over

          the cores.<br>

          <br>

          I still don't know why core #1 already got an issue with just

          our 6 interrupts.<br>

          This may be related to the SPIRAM fix adding overhead, but

          I've seen the FIFO<br>

          overflows before applying the fix, and the CAN issue also has

          been present for<br>

          over a year.<br>

          <br>

          My working hypothesis is some task/driver disabling interrupts

          for too often<br>

          / too long, or non-IRAM ISRs getting delayed for too often /

          too long. And it<br>

          seems this is an issue especially on core #1.<br>

          <br>

          Please verify & comment.<br>

          <br>

          Regards,<br>

          Michael<br>

          <br>

          PS: a little side mystery: the netmanager task starts with

          priority 5, then<br>

          gets raised to 22 (probably a mutex priority inheritance), but

          never gets<br>

          lowered back to 5 afterwards. A FreeRTOS bug?<br>

          <br>

          <pre class="moz-signature" cols="160">-- 

Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal

Fon 02333 / 833 5735 * Handy 0176 / 206 989 26

</pre>

          <br>

          <fieldset class="mimeAttachmentHeader"></fieldset>

          <pre class="moz-quote-pre" wrap="">_______________________________________________

OvmsDev mailing list

<a class="moz-txt-link-abbreviated" href="mailto:OvmsDev@lists.openvehicles.com" moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a>

<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev" moz-do-not-send="true">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>

</pre>

        </blockquote>

        <br>

        <pre class="moz-signature" cols="160">-- 

Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal

Fon 02333 / 833 5735 * Handy 0176 / 206 989 26

</pre>

        <br>

        <fieldset class="mimeAttachmentHeader"></fieldset>

        <pre class="moz-quote-pre" wrap="">_______________________________________________

OvmsDev mailing list

<a class="moz-txt-link-abbreviated" href="mailto:OvmsDev@lists.openvehicles.com" moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a>

<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev" moz-do-not-send="true">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>

</pre>

      </blockquote>

      <br>

      <pre class="moz-signature" cols="160">-- 

Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal

Fon 02333 / 833 5735 * Handy 0176 / 206 989 26

</pre>

      <br>

      <fieldset class="mimeAttachmentHeader"></fieldset>

      <pre class="moz-quote-pre" wrap="">_______________________________________________

OvmsDev mailing list

<a class="moz-txt-link-abbreviated" href="mailto:OvmsDev@lists.openvehicles.com">OvmsDev@lists.openvehicles.com</a>

<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>

</pre>

    </blockquote>

    <br>

    <pre class="moz-signature" cols="160">-- 

Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal

Fon 02333 / 833 5735 * Handy 0176 / 206 989 26

</pre>

  </body>

</html>