Am 11.06.2018 um 04:07 schrieb
Mark Webb-Johnson:
I think the
fault must be internal (controller, or our
driver code), as the can bus itself has
error checking.
The only other thing I noticed on my
review is that in ESP32CAN_rxframe there
is no check that a frame is actually
available before reading and queueing it.
If the RX interrupt is spurious, that
could cause all sorts of issues. I think
it would be safer to have such a check,
and maybe that is the cause of this issue?
You mean we could be reading the RX mailbox
a) without a new signal and b) before the
next message is fully rolled in.
But how would you check that a frame is
available? I understand from the
documentation, the RX IRQ isn't raised
unless the FIFO has a valid message and (in
PeliCAN mode) has been mapped to the buffer
address. I can't see how we could check the
frame validity additionally to that.
Setting CMR.RRB then allows the chip to
fetch the next message from the FIFO, which
will trigger another IR.RI as soon as the
buffer is filled.
IR.RI is a level interrupt, so if stuck will
trigger again. But reading IR clears it. So
even if the ISR would be called again due to
an IRQ handling bug, IR.RI
would be 0, so rxframe() wouldn't be called.
…unless IR.RI was set due to a hardware bug…
…or IR.RI could also be set on a data
overrun (or other error) for an invalid
message. That would mean IR.RI must be
discarded in case of an error interrupt
coming in with IR.RI, but I've seen nothing
in the documentation about something like
this.
I’m working
on the MCP2515 driver at the moment, to
try to find out what is causing the OBDII
HUD lock-up (and others have seen).
Perhaps I’ll find out something when
reviewing the driver and shared CAN
library. Due to the huge volume of traffic
on my Tesla Model S, I get a large number
of overruns but don’t see this problem.
Maybe it is rare, or we are just not
noticing it.
I assume the problem can be triggered by
poor Wifi connectivity -- that's the
situation at Frank's home. I assume the Wifi
driver disables interrupts or context
switches in some situations, which causes
the CAN rx task execution pauses.
Just a wild guess though, I would check the
source for this pattern if it was open.
Regards,
Michael
Regards,
Mark
On 10 Jun
2018, at 9:05 PM, Michael Balzer <dexter@expeedo.de>
wrote:
Frank Demes reported a very strange CAN
behaviour. He managed to get a CRTD log
of one incident.
The effect is triggered by an unknown
cause freezing the CAN rx task for short
periods of time, i.e. 70 - 120 ms. This
alone should be no problem, the RXFIFO
will probaly overflow, but that is
handled by our driver, so except some
messages lost this should not be an
issue.
Unfortunately, something goes completely
wrong:
a) on the overrun event, the last valid
message is delivered multiple times
(i.e. 5 times)
b) after resolving the overrun, a
garbled message is received.
The garbled message contains a valid
byte sequence in the wrong place, i.e.
bytes 5-8 would be valid if they were
bytes 1-4. Bytes 1-4 cannot have been
read
from the bus, so something is going very
wrong here.
I have walked through the esp32can
driver and checked the SJA1000
documentation but have no clue what is
going wrong.
Any ideas? CRTD excerpt below. (Frame
155 is reflected by the module with byte
1 changed from 07 to 01 to limit the
charge current. The problem is, it also
reflects the garbled message, which
causes the charger to stop.)
Regards,
Michael
----------------------------------------------------------------------------------------
…
274.341 1R11 155 07 97 E4 54 4B A8 00 6C
274.341 1T11 155 01 97 E4 54 4B A8 00 6C
274.351 1R11 155 07 97 E5 54 4B A8 00 6C
274.351 1T11 155 01 97 E5 54 4B A8 00 6C
274.361 1R11 155 07 97 E5 54 4B A8 00 6C
274.361 1T11 155 01 97 E5 54 4B A8 00 6C
274.371 1R11 423 03 22 FF FF 00 E0 00 DB
274.371 1R11 426 00 38 01 00 4D 7E 00
274.371 1R11 597 20 A4 03 B1 2F 00 01 53
274.371 1R11 627 00 00 00
274.371 1R11 155 07 97 E5 54 4B A8 00 6C
274.371 1T11 155 01 97 E5 54 4B A8 00 6C
274.381 1R11 155 07 97 E5 54 4B A8 00 6C
274.381 1T11 155 01 97 E5 54 4B A8 00 6C
274.391 1R11 5D7 FF FF 01 E4 53 80 00
- normal up to here
!!! CAN RX is blocked here for ~70 ms
!!!
… interrupts disabled? by whom?
274.461 1R11 599 00 00 4D 7E 1E 26 FF FF
274.461 1R11 436 00 0E 3C B2 00 00
274.461 1R11 155 07 97 E5 54 4B A8 00 6C
274.461 1R11 424 11 40 15 26 47 64 00 48
274.461 1R11 425 0A 1D 44 FF FE 40 01 1F
274.461 1R11 556 30 63 07 30 73 07 30 7A
274.461 1T11 155 01 97 E5 54 4B A8 00 6C
← reflection for 155 still works
here
!!! next RX pause, ~90 ms this time !!!
!!! ID 599 has a period of 100 ms, but
we now get it 5 times:
274.551 1R11 599 00 00 4D 7E 1E 26 FF FF
274.551 1CEV Error intr=58445
rxpkt=27132 txpkt=528128
errflags=0x15433 rxerr=0 txerr=0 rxovr=2
txovr=0 txdelay=17
crtd fprintf bug compensation:
274.551 1CEV Error rxpkt=58445
txpkt=27132 errflags=528128 intr=0x15433
rxerr=0 txerr=0 rxovr=2 txovr=0
txdelay=17
errflags=528128 = 08 0F 00
0x08 = IR: Data overrun
0x0F = SR: Bus OK, error limit not
yet reached, RX & TX idle, RXFIFO
overrun, RX buffer full
0x00 = ECC: -
274.551 1R11 599 00 00 4D 7E 1E 26 FF FF
274.551 1R11 599 00 00 4D 7E 1E 26 FF FF
274.551 1R11 599 00 00 4D 7E 1E 26 FF FF
274.551 1R11 599 00 00 4D 7E 1E 26 FF FF
274.551 1R11 155 07 97 E6 54 4B A8 00 6C
RX "A"
274.551 1R11 155 07 97 E6 54 4B A8 00 6C
RX "B"
274.551 1R11 155 07 97 E7 54 4B A8 00 6C
RX "C"
274.551 1R11 423 03 22 FF FF 00 E0 00 DB
274.551 1R11 426 00 38 01 00 4D 7E 00
274.551 1R11 597 20 A4 03 B1 2F 00 01 53
274.551 1R11 627 00 00 00
274.551 1T11 155 01 97 E6 54 4B A8 00 6C
reflection "A"
274.551 1CEV TX_Queue T11 155 01 97 E6
54 4B A8 00 6C reflection "B"
274.551 1CEV TX_Queue T11 155 01 97 E7
54 4B A8 00 6C reflection "C"
274.551 1T11 155 01 97 E8 54 4B A8 00 6C
reflection "D"
!!! 120 ms pause !!!
274.671 1R11 155 07 97 E8 54 4B A8 00 6C
RX "D"
→ "D" was sent to the listeners &
reflected by vehicle::IncomingFrame()
then, before the LogFrame() call was
processed, the CAN task was paused for
120 ms
and now it's getting really weird:
274.671 1R11 155 07 08 2A A0 07 97 E7 54
RX "E": invalid frame
/ garbled message!
… this frame certainly was not on the
bus!
bytes 2-4 (08 2A A0) = complete
nonsense, not matching any frame on the
bus
bytes 5-8 (07 97 E7 54) = normally
bytes 1-4 (!) of ID 155
- copy / memory error? … very
unlikely
- CAN RX handling error? … doesn't
seem so
- RXFIFO message window bug?
274.671 1CEV Error intr=58458
rxpkt=27134 txpkt=528128
errflags=0x15456 rxerr=0 txerr=0 rxovr=3
txovr=0 txdelay=19
274.671 1R11 155 07 97 E7 54 4B A8 00 6C
274.671 1R11 155 07 97 E7 54 4B A8 00 6C
274.671 1R11 155 07 97 E7 54 4B A8 00 6C
274.671 1T11 155 01 97 E6 54 4B A8 00 6C
274.671 1CEV TX_Queue T11 155 01 97 E7
54 4B A8 00 6C
274.671 1R11 155 07 97 F2 54 4B A8 00 6C
274.671 1R11 155 07 97 F3 54 4B A8 00 6C
274.671 1R11 423 03 22 FF FF 00 E0 00 DB
274.671 1R11 426 00 38 01 00 4D 7E 00
274.671 1R11 597 20 A4 03 B1 2F 00 01 53
274.671 1R11 627 00 00 00
274.671 1R11 155 07 97 F5 54 4B A8 00 6C
274.671 1R11 155 07 97 F6 54 4B A8 00 6C
274.671 1R11 5D7 FF FF 01 E4 53 80 00
274.671 1R11 599 00 00 4D 7E 1E 26 FF FF
274.671 1R11 155 07 97 F7 54 4B A8 00 6C
274.671 1R11 436 00 0E 3C B2 00 00
274.671 1R11 041 A0 07 97 F5 54 4B A8 00
274.671 1T11 155 01 97 E7 54 4B A8 00 6C
274.671 1T11 155 01 08 2A A0 07 97 E7 54
reflection of invalid
msg "E" → charge terminates due to error
--
Michael Balzer * Helkenberger Weg 9 *
D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206
989 26
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
--
Michael Balzer * Helkenberger Weg 9 *
D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989
26
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev