Nasty. I think the fault must be internal (controller, or our driver code), as the can bus itself has error checking. I reviewed the ESP32.COM thread, and master driver code from the original developers, but couldn’t find anything obvious. There was one late commit (early this year) commented as 'avoid dominant CAN bus level while initialisation’ and just affecting initialisation. I don’t think that is related, but I just applied it to our code base anyway. The only other thing I noticed on my review is that in ESP32CAN_rxframe there is no check that a frame is actually available before reading and queueing it. If the RX interrupt is spurious, that could cause all sorts of issues. I think it would be safer to have such a check, and maybe that is the cause of this issue? I’m working on the MCP2515 driver at the moment, to try to find out what is causing the OBDII HUD lock-up (and others have seen). Perhaps I’ll find out something when reviewing the driver and shared CAN library. Due to the huge volume of traffic on my Tesla Model S, I get a large number of overruns but don’t see this problem. Maybe it is rare, or we are just not noticing it. Regards, Mark
On 10 Jun 2018, at 9:05 PM, Michael Balzer <dexter@expeedo.de> wrote:
Frank Demes reported a very strange CAN behaviour. He managed to get a CRTD log of one incident.
The effect is triggered by an unknown cause freezing the CAN rx task for short periods of time, i.e. 70 - 120 ms. This alone should be no problem, the RXFIFO will probaly overflow, but that is handled by our driver, so except some messages lost this should not be an issue.
Unfortunately, something goes completely wrong:
a) on the overrun event, the last valid message is delivered multiple times (i.e. 5 times)
b) after resolving the overrun, a garbled message is received.
The garbled message contains a valid byte sequence in the wrong place, i.e. bytes 5-8 would be valid if they were bytes 1-4. Bytes 1-4 cannot have been read from the bus, so something is going very wrong here.
I have walked through the esp32can driver and checked the SJA1000 documentation but have no clue what is going wrong.
Any ideas? CRTD excerpt below. (Frame 155 is reflected by the module with byte 1 changed from 07 to 01 to limit the charge current. The problem is, it also reflects the garbled message, which causes the charger to stop.)
Regards, Michael
---------------------------------------------------------------------------------------- … 274.341 1R11 155 07 97 E4 54 4B A8 00 6C 274.341 1T11 155 01 97 E4 54 4B A8 00 6C 274.351 1R11 155 07 97 E5 54 4B A8 00 6C 274.351 1T11 155 01 97 E5 54 4B A8 00 6C 274.361 1R11 155 07 97 E5 54 4B A8 00 6C 274.361 1T11 155 01 97 E5 54 4B A8 00 6C 274.371 1R11 423 03 22 FF FF 00 E0 00 DB 274.371 1R11 426 00 38 01 00 4D 7E 00 274.371 1R11 597 20 A4 03 B1 2F 00 01 53 274.371 1R11 627 00 00 00 274.371 1R11 155 07 97 E5 54 4B A8 00 6C 274.371 1T11 155 01 97 E5 54 4B A8 00 6C 274.381 1R11 155 07 97 E5 54 4B A8 00 6C 274.381 1T11 155 01 97 E5 54 4B A8 00 6C 274.391 1R11 5D7 FF FF 01 E4 53 80 00
- normal up to here
!!! CAN RX is blocked here for ~70 ms !!! … interrupts disabled? by whom?
274.461 1R11 599 00 00 4D 7E 1E 26 FF FF 274.461 1R11 436 00 0E 3C B2 00 00 274.461 1R11 155 07 97 E5 54 4B A8 00 6C 274.461 1R11 424 11 40 15 26 47 64 00 48 274.461 1R11 425 0A 1D 44 FF FE 40 01 1F 274.461 1R11 556 30 63 07 30 73 07 30 7A 274.461 1T11 155 01 97 E5 54 4B A8 00 6C ← reflection for 155 still works here
!!! next RX pause, ~90 ms this time !!!
!!! ID 599 has a period of 100 ms, but we now get it 5 times:
274.551 1R11 599 00 00 4D 7E 1E 26 FF FF
274.551 1CEV Error intr=58445 rxpkt=27132 txpkt=528128 errflags=0x15433 rxerr=0 txerr=0 rxovr=2 txovr=0 txdelay=17 crtd fprintf bug compensation: 274.551 1CEV Error rxpkt=58445 txpkt=27132 errflags=528128 intr=0x15433 rxerr=0 txerr=0 rxovr=2 txovr=0 txdelay=17 errflags=528128 = 08 0F 00 0x08 = IR: Data overrun 0x0F = SR: Bus OK, error limit not yet reached, RX & TX idle, RXFIFO overrun, RX buffer full 0x00 = ECC: -
274.551 1R11 599 00 00 4D 7E 1E 26 FF FF 274.551 1R11 599 00 00 4D 7E 1E 26 FF FF 274.551 1R11 599 00 00 4D 7E 1E 26 FF FF 274.551 1R11 599 00 00 4D 7E 1E 26 FF FF 274.551 1R11 155 07 97 E6 54 4B A8 00 6C RX "A" 274.551 1R11 155 07 97 E6 54 4B A8 00 6C RX "B" 274.551 1R11 155 07 97 E7 54 4B A8 00 6C RX "C" 274.551 1R11 423 03 22 FF FF 00 E0 00 DB 274.551 1R11 426 00 38 01 00 4D 7E 00 274.551 1R11 597 20 A4 03 B1 2F 00 01 53 274.551 1R11 627 00 00 00 274.551 1T11 155 01 97 E6 54 4B A8 00 6C reflection "A" 274.551 1CEV TX_Queue T11 155 01 97 E6 54 4B A8 00 6C reflection "B" 274.551 1CEV TX_Queue T11 155 01 97 E7 54 4B A8 00 6C reflection "C" 274.551 1T11 155 01 97 E8 54 4B A8 00 6C reflection "D"
!!! 120 ms pause !!!
274.671 1R11 155 07 97 E8 54 4B A8 00 6C RX "D"
→ "D" was sent to the listeners & reflected by vehicle::IncomingFrame() then, before the LogFrame() call was processed, the CAN task was paused for 120 ms
and now it's getting really weird:
274.671 1R11 155 07 08 2A A0 07 97 E7 54 RX "E": invalid frame / garbled message!
… this frame certainly was not on the bus! bytes 2-4 (08 2A A0) = complete nonsense, not matching any frame on the bus bytes 5-8 (07 97 E7 54) = normally bytes 1-4 (!) of ID 155 - copy / memory error? … very unlikely - CAN RX handling error? … doesn't seem so - RXFIFO message window bug?
274.671 1CEV Error intr=58458 rxpkt=27134 txpkt=528128 errflags=0x15456 rxerr=0 txerr=0 rxovr=3 txovr=0 txdelay=19 274.671 1R11 155 07 97 E7 54 4B A8 00 6C 274.671 1R11 155 07 97 E7 54 4B A8 00 6C 274.671 1R11 155 07 97 E7 54 4B A8 00 6C 274.671 1T11 155 01 97 E6 54 4B A8 00 6C 274.671 1CEV TX_Queue T11 155 01 97 E7 54 4B A8 00 6C 274.671 1R11 155 07 97 F2 54 4B A8 00 6C 274.671 1R11 155 07 97 F3 54 4B A8 00 6C 274.671 1R11 423 03 22 FF FF 00 E0 00 DB 274.671 1R11 426 00 38 01 00 4D 7E 00 274.671 1R11 597 20 A4 03 B1 2F 00 01 53 274.671 1R11 627 00 00 00 274.671 1R11 155 07 97 F5 54 4B A8 00 6C 274.671 1R11 155 07 97 F6 54 4B A8 00 6C 274.671 1R11 5D7 FF FF 01 E4 53 80 00 274.671 1R11 599 00 00 4D 7E 1E 26 FF FF 274.671 1R11 155 07 97 F7 54 4B A8 00 6C 274.671 1R11 436 00 0E 3C B2 00 00 274.671 1R11 041 A0 07 97 F5 54 4B A8 00 274.671 1T11 155 01 97 E7 54 4B A8 00 6C 274.671 1T11 155 01 08 2A A0 07 97 E7 54 reflection of invalid msg "E" → charge terminates due to error
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
_______________________________________________ OvmsDev mailing list OvmsDev@lists.openvehicles.com http://lists.openvehicles.com/mailman/listinfo/ovmsdev