<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Hi Mark,<br>

    <br>

    Sorry for the radio silence on this end.  I have been poking at the

    code, but have yet to find out what is going on.  But I have some

    data.<br>

    <br>

    When the lockup occurs, it's not for lack of getting any

    interrupts.  Following my recipe for recreating the hang, the first

    callback gives us a status (from register 0x2c) of 0x80, which is a

    receive error, but none of the error bits (register 0x2d) are set. 

    As time goes on, the status sometimes changes to 0xA0 for an

    interrupt or two, then back to 0x80.  Meanwhile the error bits go

    from 0 to 3 to 0xB over a period of maybe half a dozen to a dozen

    interrupts, and stick there.  This is all within a blink: a big burst of

    interrupts (hundreds) that get processed, then all goes quiet with

    no further interrupt activity.  I suspect at that point the chip has

    gone into "passive mode", and basically curled up in a little ball,

    whimpering softly.  The length of the burst seems to depend on where

    in the HUD's startup cycle I enable the obd2ecu task, but I don't

    have a recipe for how to control it.<br>
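    <br>
    As a reading aid for the values above, here is a minimal Python sketch
    that decodes them.  It assumes the bit layout given in the MCP2515 data
    sheet for CANINTF (0x2C) and EFLG (0x2D), and the usual CAN thresholds
    of 96 (warning) and 128 (error-passive) for the receive error counter.
    The helper names (decode, eflg_for_rec) are made up for illustration and
    are not part of the OVMS code.<br>
    <br>
<pre>
```python
# Bit names (bit 0 first) as listed in the MCP2515 data sheet register maps.
# Helper names are illustrative only; they do not exist in the OVMS firmware.
CANINTF_BITS = ["RX0IF", "RX1IF", "TX0IF", "TX1IF",
                "TX2IF", "ERRIF", "WAKIF", "MERRF"]
EFLG_BITS = ["EWARN", "RXWAR", "TXWAR", "RXEP",
             "TXEP", "TXBO", "RX0OVR", "RX1OVR"]


def decode(value, names):
    """Return the names of the bits that are set in an 8-bit register value."""
    return [name for bit, name in enumerate(names) if (value >> bit) & 1]


def eflg_for_rec(rec):
    """Receive-side EFLG bits expected for a given receive error counter."""
    flags = 0
    if rec >= 96:    # error-warning threshold
        flags |= 0x01 | 0x02    # EWARN + RXWAR
    if rec >= 128:   # error-passive threshold
        flags |= 0x08           # RXEP
    return flags


if __name__ == "__main__":
    for status in (0x80, 0xA0):
        print(f"CANINTF 0x{status:02X}: {decode(status, CANINTF_BITS)}")
    for errors in (0x00, 0x03, 0x0B):
        print(f"EFLG    0x{errors:02X}: {decode(errors, EFLG_BITS)}")
```
</pre>
    Read this way, 3 is the receive error warning and 0xB adds the receive
    error-passive flag, which would be consistent with the receive error
    counter climbing steadily during the burst.<br>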

    <br>

    Significantly, through all this there are no (zero) received

    messages.  Also, the fact that the interrupts are essentially

    back-to-back means that we are not effectively dealing with the

    underlying cause of the interrupt.<br>

    <br>

    What seems to be happening is that there are messages being received

    at the CAN interface at the same time as the chip is being

    configured.  That the issue is 100% reproducible when traffic is

    received and 0% if I stop the HUD briefly while enabling the obd2ecu

    task, tells me that there is a window in the configuration code

    which is wider than the time between HUD frames (about 100ms). 

    There is a pair of 50ms delays in the CAN Start routine (neither of

    which appears to be documented in the chip data sheet), but removing

    them simply broke the CAN system.<br>
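    <br>
    The timing argument can be put as a rough back-of-the-envelope sketch
    in Python.  It assumes the vulnerable window is at least as long as the
    two 50ms delays combined and that it opens at a random point in the
    HUD's frame period; both are assumptions, not measurements, and
    hit_probability is a hypothetical helper.<br>
    <br>
<pre>
```python
def hit_probability(window_ms, period_ms):
    """Chance that at least one periodic frame starts inside the window,
    assuming the window opens at a random phase of the frame period."""
    return min(1.0, window_ms / period_ms)


HUD_PERIOD_MS = 100      # "about 100ms" between HUD frames
DELAYS_MS = (50, 50)     # the two delays in the CAN Start routine

# Assumption: the vulnerable window spans at least both delays.
window_ms = sum(DELAYS_MS)

if __name__ == "__main__":
    print(hit_probability(window_ms, HUD_PERIOD_MS))  # window covers a full period
    print(hit_probability(10, HUD_PERIOD_MS))         # a much shorter window, for contrast
```
</pre>
    Under those assumptions a window of 100ms or more always catches a
    frame, which would fit the 100% reproduction rate; a much shorter
    window would only fail occasionally.<br>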

    <br>

    Simply turning on the HUD by toggling ext12v (so it's clean, with no

    switch bounce at the HUD) causes a pair of status 0x80 / error 00

    interrupts, some short time apart, before normal poll / response

    operation begins.  I presume it's just a bit of noise on the CAN bus

    as the HUD turns on its chips, but conclude that the 0x80 / 00

    condition (which is also seen at the start of the hang scenario) is

    not specifically fatal.  What seems to be at issue with the hang is

    that we are lacking a way to clear the receiver after getting it set

    up.  This chip is odd, in that one clears the error flags in the

    interrupt status register, but not anything more internal to address

    what caused the interrupt in the first place.  <br>

    <br>

    At least, I couldn't find such a command.  Still looking...  I don't

    suppose there is a receive buffer between the CAN bus and the MCP2515

    that I could temporarily turn off to silence the bus while

    configuring the chip, is there?<br>

    <br>

    Greg<br>

    <br>

    <br>

    <div class="moz-cite-prefix">Mark Webb-Johnson wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:3AD114DC-589D-45CD-98CC-28FBF6CB153F@webb-johnson.net">


      The fault must be in mcp2515::RxCallback.

      <div class=""><br class="">

      </div>

      <div class="">Good news is that the way we marshal these

        interrupts, that callback is in a normal task and can be logged

        / debugged using normal tools. It is not running in interrupt

        context.</div>

      <div class=""><br class="">

      </div>

      <div class="">The usual cause of these sorts of things is the

        interrupt not being raised, so mcp2515::RxCallback is not

        called, and locks forever. Perhaps you can try adding a command

        to inject a spoofed interrupt message (see MCP2515_isr and just

        use xQueueSend not xQueueSendFromISR), and see if that command

        can ‘free’ a locked up CAN bus. If so, that is the cause.</div>

      <div class=""><br class="">

      </div>

      <div class="">Regards, Mark.<br class="">

        <div><br class="">

          <blockquote type="cite" class="">

            <div class="">On 24 May 2018, at 1:26 PM, Greg D. <<a

                href="mailto:gregd2350@gmail.com" class=""

                moz-do-not-send="true">gregd2350@gmail.com</a>>

              wrote:</div>

            <br class="Apple-interchange-newline">

            <div class="">


              <div text="#000000" bgcolor="#FFFFFF" class=""> Hi Mark,<br

                  class="">

                <br class="">

                Ok, did some more poking around, being careful to not

                wiggle too many things at once.  I can get a reliable

                lockup by doing the following:<br class="">

                <br class="">

                1.  power ext12v off<br class="">

                2.  obdii ecu stop<br class="">

                3.  power ext12v on<br class="">

                ...wait a few seconds <br class="">

                4.  obdii ecu start can3<br class="">

                <br class="">

                If I restart obdii too soon, all works.  Otherwise, I

                can repeatedly disable and re-enable 12v to cycle the

                HUD, and it will never connect.  The ordering of steps 1

                and 2 doesn't seem to matter.  Unfortunately, I don't see

                anything in the can status that's uniquely different

                between scenarios where it's working and not.  Will need

                some additional diagnostic logging...<br class="">

                <br class="">

                Now, for fun, I hook the OBDII Dongle to the module, and

                try the same steps, but instead of turning on the HUD, I

                try connecting a few times while the OBDII ECU task is

                not running (simulating the HUD's attempts to connect),

                then start the ecu, then try to connect.  It connects! 

                And this is with the Dongle doing its multi-speed scan

                each time.  So simply having frames come in while we're

                not watching, or frames coming in at the wrong speed

                does not cause the hang.  Rather, it might be that we've

                got a window in the code where incoming traffic <i

                  class="">colliding with</i> the opening of the CAN

                driver is nailing the chip in some critical region.  If

                I hit it just right, I can sometimes cause this

                collision with the Dongle by stopping the ecu, starting

                the connect, then restarting the ecu during the connect

                sequence.  Not always, but sometimes.<br class="">

                <br class="">

                Just a guess...  I need to dust off the chip document

                and see if there are any interesting bits to look at.<br

                  class="">

                <br class="">

                Greg<br class="">

                <br class="">

                <br class="">

                <div class="moz-cite-prefix">Mark Webb-Johnson wrote:<br

                    class="">

                </div>

                <blockquote type="cite"

                  cite="mid:206B7B88-2C5D-4FCF-8CF9-DC9013495F5A@webb-johnson.net"

                  class="">

                  <div class="">When the client (HUD, whatever) is

                    trying to connect to the ECU, it can try 500K, 250K.

                    Or it can try 250K, 500K. I suspect yours tries the

                    first descending sequence, and hence doesn’t have

                    any issues as it finds the match at 500K first.</div>

                  <div class=""><br class="">

                  </div>

                  <div class="">Anyway, this MCP2515 can bus lockup is

                    something we have to fix. The fact it is

                    reproducible on obd2ecu is good and helpful for

                    that.</div>

                  <div class=""><br class="">

                  </div>

                </blockquote>

                <br class="">

              </div>

              _______________________________________________<br

                class="">

              OvmsDev mailing list<br class="">

              <a href="mailto:OvmsDev@lists.openvehicles.com" class=""

                moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a><br

                class="">

              <a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a><br

                class="">

            </div>

          </blockquote>

        </div>

        <br class="">

      </div>

      <br>


    </blockquote>

    <br>

  </body>

</html>