<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Hi Mark,<br>
<br>
Sorry for the radio silence on this end. I have been poking at the
code, but have yet to find out what is going on. But I have some
data.<br>
<br>
When the lockup occurs, it's not for lack of interrupts. Following
my recipe for recreating the hang, the first callback gives us a
status (from register 0x2c) of 0x80, which is a receive error, but
none of the error bits (register 0x2d) are set. As time goes on, the
status sometimes changes to 0xA0 for an interrupt or two, then back
to 0x80. Meanwhile the error bits go from 0 to 3 to 0xB over a
period of maybe half a dozen to a dozen interrupts, and stick there.
This is all within a blink: a big burst of interrupts (hundreds)
gets processed, then all goes quiet with no further interrupt
activity. I suspect at that point the chip has gone into "passive
mode", and basically curled up in a little ball, whimpering softly.
The length of the burst seems to depend on where in the HUD's
startup cycle I enable the obd2ecu task, but I don't have a recipe
for how to control it.<br>
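<br>
For reference, here is a small sketch that decodes the values above into bit names. It assumes the standard MCP2515 register map, where 0x2c is CANINTF and 0x2d is EFLG; the bit names come from that map, not from anything logged in this thread, and decode() is just a helper made up for the illustration.<br>

```python
# Bit names, least significant bit first. These follow the standard
# MCP2515 register map (an assumption: the thread only gives raw values).
CANINTF_BITS = ["RX0IF", "RX1IF", "TX0IF", "TX1IF", "TX2IF", "ERRIF", "WAKIF", "MERRF"]
EFLG_BITS = ["EWARN", "RXWAR", "TXWAR", "RXEP", "TXEP", "TXBO", "RX0OVR", "RX1OVR"]


def decode(value, names):
    """Return the names of the bits that are set in an 8-bit register value."""
    return [name for bit, name in enumerate(names) if (value >> bit) % 2 == 1]


# The values reported in the hang scenario.
for label, value, names in [
    ("status", 0x80, CANINTF_BITS),
    ("status", 0xA0, CANINTF_BITS),
    ("errors", 0x03, EFLG_BITS),
    ("errors", 0x0B, EFLG_BITS),
]:
    print(label, hex(value), decode(value, names))
```

If that mapping holds, 0x3 is the two receive-side warning flags and 0xB adds the receive error-passive flag, which would fit the "passive mode" guess.<br>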
<br>
Significantly, through all this there are no (zero) received
messages. Also, the fact that the interrupts are essentially
back-to-back means that we are not effectively dealing with the
underlying cause of the interrupt.<br>
<br>
What seems to be happening is that there are messages being received
at the CAN interface at the same time as the chip is being
configured. That the issue is 100% reproducible when traffic is
received and 0% if I stop the HUD briefly while enabling the obd2ecu
task, tells me that there is a window in the configuration code
which is wider than the time between HUD frames (about 100ms).
There is a pair of 50ms delays in the CAN Start routine (neither of
which appears to be documented in the chip data sheet), but removing
them simply broke the CAN system.<br>
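<br>
A back-of-envelope sketch of the window argument, assuming the HUD frames are strictly periodic and the task is enabled at a random point in that period. The function name and the model are mine, just to show the arithmetic; whether the two 50ms delays actually sit inside the vulnerable window is not established.<br>

```python
def hit_probability(window_ms, frame_period_ms):
    """Chance that at least one periodic frame lands inside the window.

    Assumes frames arrive exactly every frame_period_ms and the window
    opens at a uniformly random phase of that period.
    """
    if window_ms >= frame_period_ms:
        return 1.0  # the window always spans at least one frame
    return window_ms / frame_period_ms


# HUD frames about 100ms apart, as observed.
print(hit_probability(50, 100))   # one 50ms delay alone
print(hit_probability(100, 100))  # both 50ms delays back to back
```

Under that model a failure rate of 100% needs a window at least as wide as the frame interval, which is the reason for suspecting the delays.<br>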
<br>
Simply turning on the HUD by toggling ext12v (so it's clean, with no
switch bounce at the HUD) causes a pair of status 0x80 / error 00
interrupts, some short time apart, before normal poll / response
operation begins. I presume it's just a bit of noise on the CAN bus
as the HUD turns on its chips, but conclude that the 0x80 / 00
condition (which is also seen at the start of the hang scenario) is
not specifically fatal. What seems to be at issue with the hang is
that we are lacking a way to clear the receiver after getting it set
up. This chip is odd, in that one clears the error flags in the
interrupt status register, but not anything more internal to address
what caused the interrupt in the first place. <br>
<br>
At least, I couldn't find such a command. Still looking... I don't
suppose there is a receive buffer between the CAN bus and the
MCP2515 that I could temporarily turn off to silence the bus while
configuring the chip, is there?<br>
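<br>
To make the "clearing the flags" part concrete, this is roughly what the clear looks like on the wire, assuming the chip's SPI BIT MODIFY instruction (0x05) and the standard register map, with the flag masks taken from that map. It only builds the bytes; the helper name is made up and nothing here touches the real driver.<br>

```python
BIT_MODIFY = 0x05  # MCP2515 SPI "BIT MODIFY" instruction
CANINTF = 0x2C     # interrupt flag register (the "status" above)
MERRF = 0x80       # message error interrupt flag (assumed from the register map)
ERRIF = 0x20       # error interrupt flag (assumed from the register map)


def bit_modify_frame(address, mask, data):
    """Build the 4-byte SPI frame: instruction, register address, mask, data."""
    return bytes([BIT_MODIFY, address, mask, data])


# Clear only the two flags seen in the hang (0x80 and 0xA0 status values).
frame = bit_modify_frame(CANINTF, MERRF | ERRIF, 0x00)
print(frame.hex())
```

That resets the flags in the interrupt register only, which is the point above: nothing in it addresses whatever condition raised them.<br>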
<br>
Greg<br>
<br>
<br>
<div class="moz-cite-prefix">Mark Webb-Johnson wrote:<br>
</div>
<blockquote type="cite"
cite="mid:3AD114DC-589D-45CD-98CC-28FBF6CB153F@webb-johnson.net">
The fault must be in mcp2515::RxCallback.
<div class=""><br class="">
</div>
<div class="">Good news is that the way we marshal these
interrupts, that callback is in a normal task and can be logged
/ debugged using normal tools. It is not running in interrupt
context.</div>
<div class=""><br class="">
</div>
<div class="">The usual cause of these sorts of things is the
interrupt not being raised, so mcp2515::RxCallback is not
called, and locks forever. Perhaps you can try adding a command
to inject a spoofed interrupt message (see MCP2515_isr and just
use xQueueSend not xQueueSendFromISR), and see if that command
can ‘free’ a locked up CAN bus. If so, that is the cause.</div>
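<div class="">A desktop model of that experiment, not the firmware code: a worker blocks on a queue the way the CAN task waits for messages from the ISR, and a command posts the same message from ordinary task context. The message names and helper functions are hypothetical; on the module the equivalent call would be xQueueSend on the real queue.</div>

```python
import queue
import threading

# Hypothetical message codes; the real queue message layout is not shown here.
MSG_INTERRUPT = "interrupt"
MSG_STOP = "stop"


def rx_task(events, handled):
    """Model of the task that sleeps until an interrupt message is queued."""
    while True:
        msg = events.get()       # blocks until something is posted
        if msg == MSG_STOP:
            break
        handled.append(msg)      # stand-in for running the RX callback


def inject_spoofed_interrupt(events):
    """Post the message the ISR would have posted, from task context."""
    events.put(MSG_INTERRUPT)


events = queue.Queue()
handled = []
worker = threading.Thread(target=rx_task, args=(events, handled))
worker.start()

inject_spoofed_interrupt(events)  # the "spoof" command
events.put(MSG_STOP)              # let the model task exit cleanly
worker.join()
print(handled)
```

<div class="">If the spoofed message wakes the task and traffic resumes, the missing interrupt was the problem; if the callback runs and the bus stays stuck, it was not.</div>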
<div class=""><br class="">
</div>
<div class="">Regards, Mark.<br class="">
<div><br class="">
<blockquote type="cite" class="">
<div class="">On 24 May 2018, at 1:26 PM, Greg D. &lt;<a
href="mailto:gregd2350@gmail.com" class=""
moz-do-not-send="true">gregd2350@gmail.com</a>&gt;
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div text="#000000" bgcolor="#FFFFFF" class=""> Hi Mark,<br
class="">
<br class="">
Ok, did some more poking around, being careful to not
wiggle too many things at once. I can get a reliable
lockup by doing the following:<br class="">
<br class="">
1. power ext12v off<br class="">
2. obdii ecu stop<br class="">
3. power ext12v on<br class="">
...wait a few seconds <br class="">
4. obdii ecu start can3<br class="">
<br class="">
If I restart obdii too soon, all works. Otherwise, I
can repeatedly disable and re-enable 12v to cycle the
HUD, and it will never connect. The ordering of steps 1
and 2 doesn't seem to matter. Unfortunately, I don't see
anything in the can status that's uniquely different
between scenarios where it's working and not. Will need
some additional diagnostic logging...<br class="">
<br class="">
Now, for fun, I hook the OBDII Dongle to the module, and
try the same steps, but instead of turning on the HUD, I
try connecting a few times while the OBDII ECU task is
not running (simulating the HUD's attempts to connect),
then start the ecu, then try to connect. It connects!
And this is with the Dongle doing its multi-speed scan
each time. So simply having frames come in while we're
not watching, or frames coming in at the wrong speed
does not cause the hang. Rather, it might be that we've
got a window in the code where incoming traffic <i
class="">colliding with</i> the opening of the CAN
driver is nailing the chip in some critical region. If
I hit it just right, I can sometimes cause this
collision with the Dongle by stopping the ecu, starting
the connect, then restarting the ecu during the connect
sequence. Not always, but sometimes.<br class="">
<br class="">
Just a guess... I need to dust off the chip document
and see if there are any interesting bits to look at.<br
class="">
<br class="">
Greg<br class="">
<br class="">
<br class="">
<div class="moz-cite-prefix">Mark Webb-Johnson wrote:<br
class="">
</div>
<blockquote type="cite"
cite="mid:206B7B88-2C5D-4FCF-8CF9-DC9013495F5A@webb-johnson.net"
class="">
<div class="">When the client (HUD, whatever) is
trying to connect to the ECU, it can try 500K, 250K.
Or it can try 250K, 500K. I suspect yours tries the
first descending sequence, and hence doesn’t have
any issues as it finds the match at 500K first.</div>
<div class=""><br class="">
</div>
<div class="">Anyway, this MCP2515 CAN bus lockup is
something we have to fix. The fact that it is
reproducible on obd2ecu is good and helpful for
that.</div>
<div class=""><br class="">
</div>
</blockquote>
<br class="">
</div>
_______________________________________________<br
class="">
OvmsDev mailing list<br class="">
<a href="mailto:OvmsDev@lists.openvehicles.com" class=""
moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a><br
class="">
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a><br
class="">
</div>
</blockquote>
</div>
<br class="">
</div>
<br>
</blockquote>
<br>
</body>
</html>