<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Mark,<br>
<br>
Frank has observed another lockup on can1, RX was stuck, not sure
about TX. Status was:<br>
<br>
<pre style="padding: 8.5px; font-family: Monaco, Menlo, Consolas, "QuickType Mono", "Lucida Console", "Roboto Mono", "Ubuntu Mono", "DejaVu Sans Mono", "Droid Sans Mono", monospace; font-size: 12px; color: rgb(51, 51, 51); border-radius: 4px; display: block; margin: 0px 0px 9px; line-height: 18px; word-break: break-all; word-wrap: break-word; white-space: pre-wrap; background-color: rgb(245, 245, 245); border: 1px solid rgba(0, 0, 0, 0.15); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">CAN: can1
Mode: Active
Speed: 500000
Interrupts: 22425486
Rx pkt: 21080181
Rx err: 5
Rx ovrflw: 2054
Tx pkt: 1340517
Tx delays: 335
Tx err: 0
Tx ovrflw: 0
Err flags: 0x00800c5b
</pre>
<br class="Apple-interchange-newline">
That seems to be a bus error IRQ -- but strangely, the bus error
status flag is not set?<br>
<br>
ECC 5b = form error on TX in acknowledge delimiter (?)<br>
<br>
Can we use some of this info as an indicator that restarting the
interface is necessary?<br>
<br>
Regards,<br>
Michael<br>
<br>
<br>
<div class="moz-cite-prefix">Am 12.06.2018 um 15:44 schrieb Mark
Webb-Johnson:<br>
</div>
<blockquote type="cite"
cite="mid:5412B335-B70B-468C-9470-8078088BC0BA@webb-johnson.net">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
I wrote extensions to the ‘test canrx’ and ‘test cantx’ commands.
They now provide high resolution timings, which can be used to get
an idea of performance. CAN2/CAN3 tx is dog slow compared to CAN1.
<div class=""><br class="">
</div>
<blockquote style="margin: 0 0 0 40px; border: none; padding:
0px;" class="">
<div class="">
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">OVMS# test cantx can2
100</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">Testing 100 frames on
can2</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">Transmitted 100 frames
in 0.299612s = 2996us/frame</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class=""><br class="">
</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">OVMS# test cantx can1
100</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">Testing 100 frames on
can1</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 14px;" class="">Transmitted 100 frames
in 0.013691s = 136us/frame</span></font></div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">I tried adding the "while
(MODULE_ESP32CAN->SR.B.RBS)” into esp32can, and it seems to
work just fine. Not sure how to replicate the problem, so not
sure if it resolves anything - but it doesn’t seem to make
anything worse so I committed it. Feel free to revert that
commit if it causes any issues.</div>
<div class=""><br class="">
</div>
<div class="">I’ll continue looking into this tomorrow.</div>
<div class=""><br class="">
</div>
<div class="">Regards, Mark.<br class="">
<div><br class="">
<blockquote type="cite" class="">
<div class="">On 12 Jun 2018, at 2:37 PM, Mark Webb-Johnson
<<a href="mailto:mark@webb-johnson.net" class=""
moz-do-not-send="true">mark@webb-johnson.net</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8" class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode:
space; line-break: after-white-space;" class="">
<blockquote type="cite" class="">But how would you check
that a frame is available? I understand from the
documentation, the RX IRQ isn't raised unless the FIFO
has a valid message and (in<br class="">
PeliCAN mode) has been mapped to the buffer address. I
can't see how we could check the frame validity
additionally to that.</blockquote>
<div class=""><br class="">
</div>
I suggest to put a check for
MODULE_ESP32CAN->SR.B.RBS at the start
of ESP32CAN_rxframe. That status bit is documented as
indicating "one or more messages are available in
the RXFIFO”. Presumably we shouldn’t be reading from the
RX FIFO unless that bit is set.
<div class=""><br class="">
</div>
<div class="">I agree that the documentation doesn’t
talk about spurious interrupts, but adding that check
should not do any harm. Reading the receive buffer
when nothing is in it, by comparison, would produce
all sorts of weird incoming frame results (including
duplicates, and the partial frame corruptions you
mention, due to RX FIFO wrapping).</div>
<div class=""><br class="">
</div>
<div class="">Looking at page #28 of the datasheet, we
have:</div>
<div class=""><br class="">
</div>
<blockquote style="margin: 0 0 0 40px; border: none;
padding: 0px;" class="">
<div class="">
<div class=""><i class="">4. After reading the
contents of the receive buffer, the CPU can
release this memory space in the RXFIFO by
setting the release receive buffer bit to logic
1. This may result in another message becoming
immediately available within the receive buffer.
If there is no other message available, the
receive interrupt bit is reset.</i></div>
</div>
</blockquote>
<div class="">
<div class=""><br class="">
</div>
<div class="">Does that mean that we should be
checking the receive buffer in ESP32CAN_rxframe and
looping? I’m confused because if it did not then the
interrupt would presumably fire again, and cause
recursion (or is the new interrupt delayed until the
end of the current ISR execution)? Perhaps we can
simply do a:</div>
<div class=""><br class="">
</div>
</div>
<blockquote style="margin: 0 0 0 40px; border: none;
padding: 0px;" class="">
<div class="">
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 18px;" class="">while
(MODULE_ESP32CAN->SR.B.RBS)</span></font></div>
</div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 18px;" class=""> {</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 18px;" class=""> // Read and
process a message</span></font></div>
<div class=""><font class="" face="Andale Mono"><span
style="font-size: 18px;" class=""> }</span></font></div>
</blockquote>
<div class="">
<div class="">
<div class=""><br class="">
</div>
<div class="">in ESP32CAN_rxframe to process all
incoming messages? The mcp2515 does something
similar (relying on can.cpp CAN_rxstask to loop
until no more messages available).</div>
<div class=""><br class="">
</div>
<div class="">Regards, Mark.</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On 12 Jun 2018, at 5:22 AM,
Michael Balzer <<a
href="mailto:dexter@expeedo.de" class=""
moz-do-not-send="true">dexter@expeedo.de</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="">Am 11.06.2018 um 04:07 schrieb
Mark Webb-Johnson:<br class="">
<blockquote type="cite" class="">I think the
fault must be internal (controller, or our
driver code), as the can bus itself has
error checking.<br class="">
The only other thing I noticed on my
review is that in ESP32CAN_rxframe there
is no check that a frame is actually
available before reading and queueing it.
If the RX interrupt is spurious, that
could cause all sorts of issues. I think
it would be safer to have such a check,
and maybe that is the cause of this issue?<br
class="">
</blockquote>
<br class="">
You mean we could be reading the RX mailbox
a) without a new signal and b) before the
next message is fully rolled in.<br class="">
<br class="">
But how would you check that a frame is
available? I understand from the
documentation, the RX IRQ isn't raised
unless the FIFO has a valid message and (in<br
class="">
PeliCAN mode) has been mapped to the buffer
address. I can't see how we could check the
frame validity additionally to that.<br
class="">
<br class="">
Setting CMR.RRB then allows the chip to
fetch the next message from the FIFO, which
will trigger another IR.RI as soon as the
buffer is filled.<br class="">
<br class="">
IR.RI is a level interrupt, so if stuck will
trigger again. But reading IR clears it. So
even if the ISR would be called again due to
an IRQ handling bug, IR.RI<br class="">
would be 0, so rxframe() wouldn't be called.<br
class="">
<br class="">
…unless IR.RI was set due to a hardware bug…<br
class="">
<br class="">
…or IR.RI could also be set on a data
overrun (or other error) for an invalid
message. That would mean IR.RI must be
discarded in case of an error interrupt<br
class="">
coming in with IR.RI, but I've seen nothing
in the documentation about something like
this.<br class="">
<br class="">
<blockquote type="cite" class="">I’m working
on the MCP2515 driver at the moment, to
try to find out what is causing the OBDII
HUD lock-up (and others have seen).
Perhaps I’ll find out something when
reviewing the driver and shared CAN
library. Due to the huge volume of traffic
on my Tesla Model S, I get a large number
of overruns but don’t see this problem.
Maybe it is rare, or we are just not
noticing it.<br class="">
</blockquote>
<br class="">
I assume the problem can be triggered by
poor Wifi connectivity -- that's the
situation at Frank's home. I assume the Wifi
driver disables interrupts or context<br
class="">
switches in some situations, which causes
the CAN rx task execution pauses.<br
class="">
<br class="">
Just a wild guess though, I would check the
source for this pattern if it was open.<br
class="">
<br class="">
Regards,<br class="">
Michael<br class="">
<br class="">
<br class="">
<blockquote type="cite" class="">Regards,
Mark<br class="">
<br class="">
<blockquote type="cite" class="">On 10 Jun
2018, at 9:05 PM, Michael Balzer <<a
href="mailto:dexter@expeedo.de"
class="" moz-do-not-send="true">dexter@expeedo.de</a>>
wrote:<br class="">
<br class="">
Frank Demes reported a very strange CAN
behaviour. He managed to get a CRTD log
of one incident.<br class="">
<br class="">
The effect is triggered by an unknown
cause freezing the CAN rx task for short
periods of time, i.e. 70 - 120 ms. This
alone should be no problem, the RXFIFO<br
class="">
will probaly overflow, but that is
handled by our driver, so except some
messages lost this should not be an
issue.<br class="">
<br class="">
Unfortunately, something goes completely
wrong:<br class="">
<br class="">
a) on the overrun event, the last valid
message is delivered multiple times
(i.e. 5 times)<br class="">
<br class="">
b) after resolving the overrun, a
garbled message is received.<br class="">
<br class="">
The garbled message contains a valid
byte sequence in the wrong place, i.e.
bytes 5-8 would be valid if they were
bytes 1-4. Bytes 1-4 cannot have been
read<br class="">
from the bus, so something is going very
wrong here.<br class="">
<br class="">
I have walked through the esp32can
driver and checked the SJA1000
documentation but have no clue what is
going wrong.<br class="">
<br class="">
Any ideas? CRTD excerpt below. (Frame
155 is reflected by the module with byte
1 changed from 07 to 01 to limit the
charge current. The problem is, it also<br
class="">
reflects the garbled message, which
causes the charger to stop.)<br class="">
<br class="">
Regards,<br class="">
Michael<br class="">
<br class="">
<br class="">
----------------------------------------------------------------------------------------<br
class="">
…<br class="">
274.341 1R11 155 07 97 E4 54 4B A8 00 6C<br
class="">
274.341 1T11 155 01 97 E4 54 4B A8 00 6C<br
class="">
274.351 1R11 155 07 97 E5 54 4B A8 00 6C<br
class="">
274.351 1T11 155 01 97 E5 54 4B A8 00 6C<br
class="">
274.361 1R11 155 07 97 E5 54 4B A8 00 6C<br
class="">
274.361 1T11 155 01 97 E5 54 4B A8 00 6C<br
class="">
274.371 1R11 423 03 22 FF FF 00 E0 00 DB<br
class="">
274.371 1R11 426 00 38 01 00 4D 7E 00<br
class="">
274.371 1R11 597 20 A4 03 B1 2F 00 01 53<br
class="">
274.371 1R11 627 00 00 00<br class="">
274.371 1R11 155 07 97 E5 54 4B A8 00 6C<br
class="">
274.371 1T11 155 01 97 E5 54 4B A8 00 6C<br
class="">
274.381 1R11 155 07 97 E5 54 4B A8 00 6C<br
class="">
274.381 1T11 155 01 97 E5 54 4B A8 00 6C<br
class="">
274.391 1R11 5D7 FF FF 01 E4 53 80 00<br
class="">
<br class="">
- normal up to here<br class="">
<br class="">
!!! CAN RX is blocked here for ~70 ms
!!!<br class="">
… interrupts disabled? by whom?<br
class="">
<br class="">
274.461 1R11 599 00 00 4D 7E 1E 26 FF FF<br
class="">
274.461 1R11 436 00 0E 3C B2 00 00<br
class="">
274.461 1R11 155 07 97 E5 54 4B A8 00 6C<br
class="">
274.461 1R11 424 11 40 15 26 47 64 00 48<br
class="">
274.461 1R11 425 0A 1D 44 FF FE 40 01 1F<br
class="">
274.461 1R11 556 30 63 07 30 73 07 30 7A<br
class="">
274.461 1T11 155 01 97 E5 54 4B A8 00 6C
← reflection for 155 still works
here<br class="">
<br class="">
!!! next RX pause, ~90 ms this time !!!<br
class="">
<br class="">
!!! ID 599 has a period of 100 ms, but
we now get it 5 times:<br class="">
<br class="">
274.551 1R11 599 00 00 4D 7E 1E 26 FF FF<br
class="">
<br class="">
274.551 1CEV Error intr=58445
rxpkt=27132 txpkt=528128
errflags=0x15433 rxerr=0 txerr=0 rxovr=2
txovr=0 txdelay=17<br class="">
crtd fprintf bug compensation:<br
class="">
274.551 1CEV Error rxpkt=58445
txpkt=27132 errflags=528128 intr=0x15433
rxerr=0 txerr=0 rxovr=2 txovr=0
txdelay=17<br class="">
errflags=528128 = 08 0F 00<br
class="">
0x08 = IR: Data overrun<br class="">
0x0F = SR: Bus OK, error limit not
yet reached, RX & TX idle, RXFIFO
overrun, RX buffer full<br class="">
0x00 = ECC: -<br class="">
<br class="">
274.551 1R11 599 00 00 4D 7E 1E 26 FF FF<br
class="">
274.551 1R11 599 00 00 4D 7E 1E 26 FF FF<br
class="">
274.551 1R11 599 00 00 4D 7E 1E 26 FF FF<br
class="">
274.551 1R11 599 00 00 4D 7E 1E 26 FF FF<br
class="">
274.551 1R11 155 07 97 E6 54 4B A8 00 6C
RX "A"<br class="">
274.551 1R11 155 07 97 E6 54 4B A8 00 6C
RX "B"<br class="">
274.551 1R11 155 07 97 E7 54 4B A8 00 6C
RX "C"<br class="">
274.551 1R11 423 03 22 FF FF 00 E0 00 DB<br
class="">
274.551 1R11 426 00 38 01 00 4D 7E 00<br
class="">
274.551 1R11 597 20 A4 03 B1 2F 00 01 53<br
class="">
274.551 1R11 627 00 00 00<br class="">
274.551 1T11 155 01 97 E6 54 4B A8 00 6C
reflection "A"<br
class="">
274.551 1CEV TX_Queue T11 155 01 97 E6
54 4B A8 00 6C reflection "B"<br
class="">
274.551 1CEV TX_Queue T11 155 01 97 E7
54 4B A8 00 6C reflection "C"<br
class="">
274.551 1T11 155 01 97 E8 54 4B A8 00 6C
reflection "D"<br
class="">
<br class="">
!!! 120 ms pause !!!<br class="">
<br class="">
274.671 1R11 155 07 97 E8 54 4B A8 00 6C
RX "D"<br class="">
<br class="">
→ "D" was sent to the listeners &
reflected by vehicle::IncomingFrame()<br
class="">
then, before the LogFrame() call was
processed, the CAN task was paused for
120 ms<br class="">
<br class="">
and now it's getting really weird:<br
class="">
<br class="">
274.671 1R11 155 07 08 2A A0 07 97 E7 54
RX "E": invalid frame
/ garbled message!<br class="">
<br class="">
… this frame certainly was not on the
bus!<br class="">
bytes 2-4 (08 2A A0) = complete
nonsense, not matching any frame on the
bus<br class="">
bytes 5-8 (07 97 E7 54) = normally
bytes 1-4 (!) of ID 155<br class="">
- copy / memory error? … very
unlikely<br class="">
- CAN RX handling error? … doesn't
seem so<br class="">
- RXFIFO message window bug?<br
class="">
<br class="">
274.671 1CEV Error intr=58458
rxpkt=27134 txpkt=528128
errflags=0x15456 rxerr=0 txerr=0 rxovr=3
txovr=0 txdelay=19<br class="">
274.671 1R11 155 07 97 E7 54 4B A8 00 6C<br
class="">
274.671 1R11 155 07 97 E7 54 4B A8 00 6C<br
class="">
274.671 1R11 155 07 97 E7 54 4B A8 00 6C<br
class="">
274.671 1T11 155 01 97 E6 54 4B A8 00 6C<br
class="">
274.671 1CEV TX_Queue T11 155 01 97 E7
54 4B A8 00 6C<br class="">
274.671 1R11 155 07 97 F2 54 4B A8 00 6C<br
class="">
274.671 1R11 155 07 97 F3 54 4B A8 00 6C<br
class="">
274.671 1R11 423 03 22 FF FF 00 E0 00 DB<br
class="">
274.671 1R11 426 00 38 01 00 4D 7E 00<br
class="">
274.671 1R11 597 20 A4 03 B1 2F 00 01 53<br
class="">
274.671 1R11 627 00 00 00<br class="">
274.671 1R11 155 07 97 F5 54 4B A8 00 6C<br
class="">
274.671 1R11 155 07 97 F6 54 4B A8 00 6C<br
class="">
274.671 1R11 5D7 FF FF 01 E4 53 80 00<br
class="">
274.671 1R11 599 00 00 4D 7E 1E 26 FF FF<br
class="">
274.671 1R11 155 07 97 F7 54 4B A8 00 6C<br
class="">
274.671 1R11 436 00 0E 3C B2 00 00<br
class="">
274.671 1R11 041 A0 07 97 F5 54 4B A8 00<br
class="">
274.671 1T11 155 01 97 E7 54 4B A8 00 6C<br
class="">
274.671 1T11 155 01 08 2A A0 07 97 E7 54
reflection of invalid
msg "E" → charge terminates due to error<br
class="">
<br class="">
<br class="">
-- <br class="">
Michael Balzer * Helkenberger Weg 9 *
D-58256 Ennepetal<br class="">
Fon 02333 / 833 5735 * Handy 0176 / 206
989 26<br class="">
<br class="">
<br class="">
_______________________________________________<br class="">
OvmsDev mailing list<br class="">
<a
href="mailto:OvmsDev@lists.openvehicles.com"
class="" moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a><br
class="">
<a
href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev"
class="" moz-do-not-send="true">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a><br
class="">
</blockquote>
_______________________________________________<br class="">
OvmsDev mailing list<br class="">
<a
href="mailto:OvmsDev@lists.openvehicles.com"
class="" moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a><br
class="">
<a
href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev"
class="" moz-do-not-send="true">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a><br
class="">
</blockquote>
<br class="">
-- <br class="">
Michael Balzer * Helkenberger Weg 9 *
D-58256 Ennepetal<br class="">
Fon 02333 / 833 5735 * Handy 0176 / 206 989
26<br class="">
<br class="">
<br class="">
_______________________________________________<br class="">
OvmsDev mailing list<br class="">
<a
href="mailto:OvmsDev@lists.openvehicles.com"
class="" moz-do-not-send="true">OvmsDev@lists.openvehicles.com</a><br
class="">
<a
href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev"
class="" moz-do-not-send="true">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a><br
class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
OvmsDev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:OvmsDev@lists.openvehicles.com">OvmsDev@lists.openvehicles.com</a>
<a class="moz-txt-link-freetext" href="http://lists.openvehicles.com/mailman/listinfo/ovmsdev">http://lists.openvehicles.com/mailman/listinfo/ovmsdev</a>
</pre>
</blockquote>
<br>
<pre class="moz-signature" cols="160">--
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
</pre>
</body>
</html>