<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Hi Mark, Tom, et al,<br>
<br>
See my earlier posts on the progress here, or the lack thereof...<br>
<br>
I can reliably reproduce the issue by having a HUD connected to
CAN3 and then, after the HUD has started trying to connect,
starting the obdii ecu task. This fails 100% of the time. If I start
the HUD or use an OBDII dongle, let it make a mess of the bus through
whatever it's doing, and then stop it before starting the obdii
task, it never fails. So there seems to be a race condition
somewhere on the receive side, while the CAN device is being
opened with traffic actively arriving.
The obdii ecu task, however, is very reliable once it starts, and
I've not had any lockups once it is going. Note, though, that the
bus usage is almost entirely request/reply, so it is
self-limiting.<br>
<br>
I've done a lot of tracing and debug-printf'ing around this issue,
and have not yet found how to get the receiver going again once
it hangs. I do not believe, for example, that the SPI bus is hung,
because I continue to get various status interrupts while the
errors mount. I just don't get any receive frames; in fact, no frames
at all if I start the HUD first. I do get the status interrupts Mark
has flagged (0 -> 3 -> b), and when they arrived I tried clearing
them explicitly (clearing the interrupt status, that is). No change
in behavior, I suspect because I'm only clearing the status, not the
underlying cause. Unfortunately, I don't see any way to reset just
the receiver, and resetting the chip would likely drop it straight
back into the same state (assuming CAN traffic continues to be received).<br>
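<br>
<div>For reference, a minimal sketch of what "clearing the interrupt status" can look like at the SPI level. It assumes the MCP2515 datasheet's BIT MODIFY opcode (0x05) and the CANINTF register address (0x2C); the helper name is hypothetical and this is not the OVMS driver's code. It only builds the 4-byte frame to send.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MCP2515 SPI opcode and register address, per the Microchip datasheet. */
#define MCP2515_INSTR_BIT_MODIFY 0x05
#define MCP2515_REG_CANINTF      0x2C

/* CANINTF interrupt flag bits. */
#define CANINTF_MERRF 0x80  /* message error interrupt */
#define CANINTF_ERRIF 0x20  /* error interrupt (an EFLG condition changed) */

/* Hypothetical helper: build a BIT MODIFY frame that clears the given
 * CANINTF bits. Only bits set in 'mask' are modified; a data byte of 0
 * clears them. Returns the frame length (always 4). */
size_t mcp2515_build_clear_intf(uint8_t *buf, size_t buf_len, uint8_t mask)
{
    assert(buf_len >= 4);
    buf[0] = MCP2515_INSTR_BIT_MODIFY;
    buf[1] = MCP2515_REG_CANINTF;
    buf[2] = mask;  /* which bits to touch */
    buf[3] = 0x00;  /* new value for those bits: cleared */
    return 4;
}
```

<div>Clearing ERRIF this way only acknowledges the interrupt. It does not touch the receive error counter, so EFLG.RXEP stays set, which would fit the "clearing the status, not the cause" behavior described above.</div>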
<br>
Where I think I left things a month ago (before getting side-tracked
on other projects) was putting a delay in the obdii task so that
frames build up without being read, to try to force a lockup
through overflow. No luck. With a long enough delay, the HUD
decides the car has been turned off and goes to sleep; with anything
shorter, things recover. This was starting to make my head hurt,
so I let it rest for a bit and got side-tracked, sorry.<br>
<br>
Tom, a status dump from you showing what the chip thinks is going on
when you see the lockup would be interesting. I'm assuming that you
receive traffic for a while, but hit a race condition somewhere in
the receive processing that the obdii request/response sequencing
never hits. Do you ever transmit on your CAN bus? I wonder whether
transmitting a "NOP" frame of some sort would help...<br>
<br>
I've got commitments here until next week, but may be able to get
back to poking at this after that.<br>
<br>
Greg<br>
<br>
<br>
<div class="moz-cite-prefix">Mark Webb-Johnson wrote:<br>
</div>
<blockquote type="cite"
cite="mid:DA9B360C-3819-405E-93D8-D43DE0D78BA3@webb-johnson.net">
<div class="">I’ve spent some time on this, and finally managed to
reliably repeat it (at least in one case) by:</div>
<div class=""><br class="">
</div>
<div class="">
<ol class="MailOutline">
<li class="">Connect an external HUD and ‘obdii ecu start
can3’.</li>
<li class="">Once the HUD is connected and working, manually
change baud rate to incorrect ‘can can3 start active
250000’.</li>
<li class="">Watch errors start streaming in.</li>
<li class="">If I quickly switch back with ‘can can3 start
active 500000’, it recovers and everything is fine.</li>
<li class="">If I leave it running, it seems to count up to
128 errors, and then lock up. At this point even a ‘can can3
start active 500000’ doesn’t solve it.</li>
<li class="">A ‘power can3 off’ then ‘can can3 start active
500000’ recovers it.</li>
</ol>
</div>
<div class=""><br class="">
</div>
Here is what it looks like in the failed state:
<div class=""><br class="">
</div>
<blockquote style="margin: 0 0 0 40px; border: none; padding:
0px;" class="">
<div class="">
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">OVMS# can can3 status</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">CAN: can3</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">Mode: Active</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">Speed: 250000</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">Interrupts:
35901</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">Rx pkt:
0</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">Rx err:
128</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">Rx ovrflw:
0</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">Tx pkt:
0</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">Tx delays:
0</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">Tx err:
0</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">Tx ovrflw:
0</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">Err flags: 0x800b</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">D (697321) canlog:
Status can3 intr=35900 rxpkt=0 txpkt=0 errflags=0x800b
rxerr=128 txerr=0 rxovr=0 txovr=0 txdelay=0</span></font></div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">Can you check to see what yours looks like next time
it fails?</div>
<div class=""><br class="">
</div>
<div class="">Looking at the MCP2515 data sheet (page #45), it has
this to say:</div>
<div class=""><br class="">
</div>
<blockquote style="margin: 0 0 0 40px; border: none; padding:
0px;" class="">
<div class=""><i class=""><b class="">6.6 Error States</b><br
class="">
<br class="">
Detected errors are made known to all other nodes via error
frames. The transmission of the erroneous message is
aborted and the frame is repeated as soon as possible.
Furthermore, each CAN node is in one of the three
error states according to the value of the internal error
counters:<br class="">
<br class="">
1. Error-active.<br class="">
2. Error-passive.<br class="">
3. Bus-off (transmitter only).<br class="">
<br class="">
The error-active state is the usual state where the node can
transmit messages and active error frames (made of dominant
bits) without any restrictions.<br class="">
<br class="">
In the error-passive state, messages and passive
error frames (made of recessive bits) may be transmitted.<br
class="">
<br class="">
The bus-off state makes it temporarily impossible for the
station to participate in the bus communication. During this
state, messages can neither be received or transmitted. Only
transmitters can go bus-off.<br class="">
<br class="">
<b class="">6.7 Error Modes and Error Counters</b><br
class="">
<br class="">
The MCP2515 contains two error counters: the Receive Error
Counter (REC) (see Register 6-2) and the Transmit Error
Counter (TEC) (see Register 6-1). The values of both
counters can be read by the MCU. These counters
are incremented/decremented in accordance with the CAN bus
specification.<br class="">
<br class="">
The MCP2515 is error-active if both error counters are below
the error-passive limit of 128.<br class="">
<br class="">
It is error-passive if at least one of the error
counters equals or exceeds 128.<br class="">
<br class="">
It goes to bus-off if the TEC exceeds the bus-off limit
of 255. The device remains in this state until the
bus-off recovery sequence is received. The bus-off
recovery sequence consists of 128 occurrences of 11
consecutive recessive bits (see Figure 6-1).<br class="">
<br class="">
The Current Error mode of the MCP2515 can be read by the MCU
via the EFLG register (see Register 6-3).<br class="">
<br class="">
Additionally, there is an error state warning flag
bit (EFLG:EWARN) which is set if at least one of the
error counters equals or exceeds the error warning limit
of 96. EWARN is reset if both error counters are less
than the error warning limit.</i></div>
</blockquote>
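<div><br class="">
</div>
<div>To make the thresholds in 6.7 concrete, a small sketch (not OVMS code; the names are hypothetical) that classifies a node from its counter values. The counters are plain ints here so the "TEC exceeds 255" case can be expressed; on the chip itself TEC and REC are 8-bit registers and bus-off is reported through EFLG.TXBO instead.</div>

```c
#include <assert.h>

typedef enum {
    CAN_ERROR_ACTIVE,
    CAN_ERROR_PASSIVE,
    CAN_BUS_OFF
} can_error_state;

/* Classify the node per the datasheet text quoted above:
 * bus-off if TEC exceeds 255, error-passive if either counter is
 * at or above 128, otherwise error-active. */
can_error_state can_classify(int tec, int rec)
{
    assert(tec >= 0 && rec >= 0);
    if (tec > 255)
        return CAN_BUS_OFF;
    if (tec >= 128 || rec >= 128)
        return CAN_ERROR_PASSIVE;
    return CAN_ERROR_ACTIVE;
}

/* EWARN condition: either counter at or above the warning limit of 96. */
int can_error_warning(int tec, int rec)
{
    return tec >= 96 || rec >= 96;
}
```

<div>Fed the rxerr values from the logs (81, 113 and 128), and assuming rxerr tracks REC, this gives error-active, error-active with warning, and error-passive. A receiver alone can only ever reach error-passive; only a transmitter goes bus-off.</div>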
<div class="">
<div><br class="">
</div>
<div>I don’t think we access these TEC and REC registers, but
the 128 number cannot be a coincidence.</div>
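<div><br class="">
</div>
<div>If we want to confirm that rxerr really mirrors REC, the counters can be read directly. A minimal sketch of the SPI READ frame, assuming the datasheet's READ opcode (0x03) and the TEC/REC addresses (0x1C and 0x1D); the helper names are hypothetical, and the actual SPI transfer is left to the driver.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MCP2515_INSTR_READ 0x03
#define MCP2515_REG_TEC    0x1C  /* REC follows at 0x1D */

/* Hypothetical helper: build a READ frame for TEC and REC. The MCP2515
 * auto-increments the address during a READ, so two dummy bytes after
 * the address clock out TEC and then REC. Returns the frame length. */
size_t mcp2515_build_read_counters(uint8_t *buf, size_t buf_len)
{
    assert(buf_len >= 4);
    buf[0] = MCP2515_INSTR_READ;
    buf[1] = MCP2515_REG_TEC;
    buf[2] = 0x00;  /* dummy byte: TEC is shifted in here */
    buf[3] = 0x00;  /* dummy byte: REC is shifted in here */
    return 4;
}

/* Pick TEC and REC out of the bytes received during that transfer. */
void mcp2515_parse_counters(const uint8_t *rx, uint8_t *tec, uint8_t *rec)
{
    *tec = rx[2];
    *rec = rx[3];
}
```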
<div><br class="">
</div>
<div>We do access the EFLG register, in our ISR, and here is
what I see:</div>
<div><br class="">
</div>
</div>
<blockquote style="margin: 0 0 0 40px; border: none; padding:
0px;" class="">
<div class="">
<div>
<div><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">E (685091) canlog:
Error can3 intr=30 rxpkt=0 txpkt=0 errflags=0x8000
rxerr=56 txerr=0 rxovr=0 txovr=0 txdelay=0</span></font></div>
<div><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">E (685091) canlog:
Error can3 intr=31 rxpkt=0 txpkt=0 errflags=0x8000
rxerr=58 txerr=0 rxovr=0 txovr=0 txdelay=0</span></font></div>
<div><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">E (685091) canlog:
Error can3 intr=32 rxpkt=0 txpkt=0 errflags=0x8000
rxerr=60 txerr=0 rxovr=0 txovr=0 txdelay=0</span></font></div>
<div><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">E (685091) canlog:
Error can3 intr=43 rxpkt=0 txpkt=0 errflags=0x8000
rxerr=81 txerr=0 rxovr=0 txovr=0 txdelay=0</span></font></div>
<div><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">E (685101) canlog:
Error can3 intr=60 rxpkt=0 txpkt=0 errflags=0x8003
rxerr=113 txerr=0 rxovr=0 txovr=0 txdelay=0</span></font></div>
</div>
</div>
</blockquote>
<div class="">
<div><br class="">
</div>
<div>The lower 8 bits of that are the EFLG register, so 0x00 is
normal, 0x03 is what we see when the errors start, and 0x0b is
what we see later. The documentation for this register is:</div>
<div><br class="">
</div>
</div>
<blockquote style="margin: 0 0 0 40px; border: none; padding:
0px;" class="">
<div>bit 7 bit 6 bit 5 bit 4 bit 3 bit 2 bit 1 bit 0<br class="">
<br class="">
R/W-0 R-0 R-0 R-0 R-0 R-0<br class="">
<br class="">
bit#7: RX1OVR: Receive Buffer 1 Overflow Flag bit<br class="">
- Set when a valid message is received for RXB1 and
CANINTF.RX1IF = 1 - Must be reset by MCU<br class="">
<br class="">
bit#6: RX0OVR: Receive Buffer 0 Overflow Flag bit<br class="">
- Set when a valid message is received for RXB0 and
CANINTF.RX0IF = 1<br class="">
- Must be reset by MCU</div>
<div><br class="">
</div>
<div>bit#5: TXBO: Bus-Off Error Flag bit<br class="">
- Bit set when TEC reaches 255<br class="">
- Reset after a successful bus recovery sequence<br class="">
<br class="">
bit#4: TXEP: Transmit Error-Passive Flag bit<br class="">
- Set when TEC is equal to or greater than 128 - Reset when
TEC is less than 128<br class="">
<br class="">
bit#3: RXEP: Receive Error-Passive Flag bit<br class="">
- Set when REC is equal to or greater than 128<br class="">
- Reset when REC is less than 128</div>
<div><br class="">
</div>
bit#2: TXWAR: Transmit Error Warning Flag bit
<div>- Set when TEC is equal to or greater than 96 - Reset when
TEC is less than 96<br class="">
<br class="">
bit#1: RXWAR: Receive Error Warning Flag bit<br class="">
- Set when REC is equal to or greater than 96 - Reset when REC
is less than 96<br class="">
<br class="">
bit#0: EWARN: Error Warning Flag bit<br class="">
- Set when TEC or REC is equal to or greater than 96 (TXWAR or
RXWAR = 1)<br class="">
- Reset when both REC and TEC are less than 96</div>
</blockquote>
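<div><br class="">
</div>
<div>A small decoder for the logged errflags value, assuming (as above) that its low byte is EFLG. The masks follow the bit list above; the function names are hypothetical, not existing OVMS functions.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* EFLG bit masks, bit 7 down to bit 0. */
#define EFLG_RX1OVR 0x80
#define EFLG_RX0OVR 0x40
#define EFLG_TXBO   0x20
#define EFLG_TXEP   0x10
#define EFLG_RXEP   0x08
#define EFLG_TXWAR  0x04
#define EFLG_RXWAR  0x02
#define EFLG_EWARN  0x01

/* The low byte of the logged errflags value is EFLG. */
static inline uint8_t eflg_from_errflags(uint32_t errflags)
{
    return (uint8_t)(errflags & 0xFF);
}

/* Write a '+'-separated list of the set flag names (bit 7 first) into out.
 * Stops appending if the next name would not fit. */
void eflg_describe(uint8_t eflg, char *out, size_t out_len)
{
    static const struct { uint8_t mask; const char *name; } flags[] = {
        { EFLG_RX1OVR, "RX1OVR" }, { EFLG_RX0OVR, "RX0OVR" },
        { EFLG_TXBO,   "TXBO"   }, { EFLG_TXEP,   "TXEP"   },
        { EFLG_RXEP,   "RXEP"   }, { EFLG_TXWAR,  "TXWAR"  },
        { EFLG_RXWAR,  "RXWAR"  }, { EFLG_EWARN,  "EWARN"  },
    };
    assert(out != NULL && out_len > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (!(eflg & flags[i].mask))
            continue;
        size_t need = strlen(flags[i].name) + (out[0] ? 1 : 0);
        if (strlen(out) + need >= out_len)
            break;  /* no room left for this name */
        if (out[0])
            strcat(out, "+");
        strcat(out, flags[i].name);
    }
}
```

<div>For example, 0x800b decodes to RXEP+RXWAR+EWARN and 0x8003 to RXWAR+EWARN.</div>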
<div class="">
<div><br class="">
</div>
<div>So that is EWARN+RXWAR (REC at or above 96) while the errors
build up, and EWARN+RXWAR+RXEP (REC at or above 128, i.e.
error-passive) when everything is locked up. We have code to
clear the error condition (in the interrupt flags register),
but that doesn’t seem to get us out of this 128-error lock-up.</div>
<div><br class="">
</div>
<div>I am not sure of the best approach for this. Perhaps pick up
the condition in a timer every 10 seconds or so, and reset the
SPI bus?</div>
<div><br class="">
</div>
<div>I am not sure if this is your problem (a ‘can can2 status’
would tell us). In any case, the fix is to pick up this error
condition in the ISR and fix it (or perhaps in a separate
periodic timer).</div>
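<div><br class="">
</div>
<div>A minimal sketch of the decision step for the periodic-timer variant, assuming a hypothetical snapshot struct filled from the same counters ‘can can3 status’ prints. The reset action itself (for example the power-off/restart that recovers it by hand, or the MCP2515 SPI RESET instruction) is deliberately left out.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EFLG_RXEP 0x08  /* receive error-passive flag */

/* Hypothetical snapshot of the counters shown by 'can canN status'. */
typedef struct {
    uint32_t rx_packets;
    uint32_t errflags;  /* low byte is EFLG */
} can_status_snapshot;

/* Called once per timer tick (e.g. every 10 seconds): request a reset
 * only when the receiver is error-passive AND no frame has been
 * received since the previous tick. */
bool can_needs_reset(const can_status_snapshot *prev,
                     const can_status_snapshot *cur)
{
    assert(prev != NULL && cur != NULL);
    bool rx_passive = (cur->errflags & EFLG_RXEP) != 0;
    bool rx_stalled = cur->rx_packets == prev->rx_packets;
    return rx_passive && rx_stalled;
}
```

<div>Requiring both conditions avoids resetting a bus that is noisy but still receiving, which is the case where a quick switch back to the correct baud rate recovers on its own.</div>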
<div><br class="">
</div>
<div>Regards, Mark.</div>
<div><br class="">
<blockquote type="cite" class="">
<div class="">On 5 Jul 2018, at 3:55 PM, Tom Parker <<a
href="mailto:tom@carrott.org" class=""
moz-do-not-send="true">tom@carrott.org</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="">I haven't had a chance to try to work out
what is going on.<br class="">
<br class="">
I can say that the second CAN interface doesn't work for
very long before stopping. This shows up most obviously
on my Leaf as a stopped odometer in the OVMS app. If you
look at the metrics in the console, everything that
comes from the Car CAN bus (i.e. the second CAN bus) has
frozen.<br class="">
<br class="">
The first CAN interface seems much more reliable, with
SOC information from the EV bus being fairly reliably
reported.<br class="">
<br class="">
I haven't done the modification to make my 3.0 unit's
GPS work, so I haven't experienced the stolen detection.<br
class="">
<br class="">
<br class="">
On 05/07/18 18:34, Stein Arne Sordal wrote:<br class="">
<blockquote type="cite" class="">Did anyone figure out
what happens here?<br class="">
Now the OVMS thinks my car is stolen since it´s moving
(GPS) and CAN2 is dead.<br class="">
Reboot of module brings CAN2 back to life for a period
of time.<br class="">
<br class="">
-Stein Arne Sordal-<br class="">
<br class="">
<br class="">
<br class="">
<blockquote type="cite" class="">On 11 May 2018, at
12:29, Stein Arne Sordal <<a
href="mailto:ovms@topphemmelig.no" class=""
moz-do-not-send="true">ovms@topphemmelig.no</a>>
wrote:<br class="">
<br class="">
Hi Tom<br class="">
<br class="">
I have seen this with my Leaf.<br class="">
I’ve been on vacation, so I haven’t had time to test
much, but it looks like one of the CAN buses stops.
I started testing again today.<br class="">
<br class="">
-Stein Arne Sordal-<br class="">
<br class="">
<br class="">
<br class="">
<blockquote type="cite" class="">On 11 May 2018, at
12:22, Tom Parker <<a
href="mailto:tom@carrott.org" class=""
moz-do-not-send="true">tom@carrott.org</a>>
wrote:<br class="">
<br class="">
Hi all,<br class="">
<br class="">
I synced up with master about a week ago, and since
then I've seen both CAN buses stop working. I
still see the 12V battery metric changing, but
everything that comes from the car stops.
Rebooting the module with "module reset" does not
seem to fix it, while make app-flash monitor does
fix it. I haven't tried make monitor on its own.<br
class="">
<br class="">
Is anyone else seeing behavior like this?<br
class="">
<br class="">
Sorry for the vague bug report. I'll spend some
time later this weekend to try to gather more
information.<br class="">
_______________________________________________<br
class="">
OvmsDev mailing list<br class="">
<a href="mailto:OvmsDev@lists.openvehicles.com"
class="" moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a><br
class="">
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a><br class="">
</blockquote>
</blockquote>
</blockquote>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
OvmsDev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:OvmsDev@lists.openvehicles.com">OvmsDev@lists.openvehicles.com</a>
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>
</pre>
</blockquote>
<br>
</body>
</html>