[Ovmsdev] esp32can issue

Mark Webb-Johnson mark at webb-johnson.net
Tue Jun 12 21:44:06 HKT 2018


I wrote extensions to the ‘test canrx’ and ‘test cantx’ commands. They now provide high resolution timings, which can be used to get an idea of performance. CAN2/CAN3 tx is dog slow compared to CAN1.

OVMS# test cantx can2 100
Testing 100 frames on can2
Transmitted 100 frames in 0.299612s = 2996us/frame

OVMS# test cantx can1 100
Testing 100 frames on can1
Transmitted 100 frames in 0.013691s = 136us/frame

I tried adding the "while (MODULE_ESP32CAN->SR.B.RBS)” into esp32can, and it seems to work just fine. Not sure how to replicate the problem, so not sure if it resolves anything - but it doesn’t seem to make anything worse so I committed it. Feel free to revert that commit if it causes any issues.

I’ll continue looking into this tomorrow.

Regards, Mark.

> On 12 Jun 2018, at 2:37 PM, Mark Webb-Johnson <mark at webb-johnson.net> wrote:
> 
>> But how would you check that a frame is available? I understand from the documentation, the RX IRQ isn't raised unless the FIFO has a valid message and (in
>> PeliCAN mode) has been mapped to the buffer address. I can't see how we could check the frame validity additionally to that.
> 
> I suggest to put a check for MODULE_ESP32CAN->SR.B.RBS at the start of ESP32CAN_rxframe. That status bit is documented as indicating "one or more messages are available in the RXFIFO”. Presumably we shouldn’t be reading from the RX FIFO unless that bit is set.
> 
> I agree that the documentation doesn’t talk about spurious interrupts, but adding that check should not do any harm. Reading the receive buffer when nothing is in it, by comparison, would produce all sorts of weird incoming frame results (including duplicates, and the partial frame corruptions you mention, due to RX FIFO wrapping).
> 
> Looking at page #28 of the datasheet, we have:
> 
> 4. After reading the contents of the receive buffer, the CPU can release this memory space in the RXFIFO by setting the release receive buffer bit to logic 1. This may result in another message becoming immediately available within the receive buffer. If there is no other message available, the receive interrupt bit is reset.
> 
> Does that mean that we should be checking the receive buffer in ESP32CAN_rxframe and looping? I’m confused because if it did not then the interrupt would presumably fire again, and cause recursion (or is the new interrupt delayed until the end of the current ISR execution)? Perhaps we can simply do a:
> 
> while (MODULE_ESP32CAN->SR.B.RBS)
>   {
>   // Read and process a message
>   }
> 
> in ESP32CAN_rxframe to process all incoming messages? The mcp2515 does something similar (relying on can.cpp CAN_rxstask to loop until no more messages available).
> 
> Regards, Mark.
> 
>> On 12 Jun 2018, at 5:22 AM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de>> wrote:
>> 
>> Am 11.06.2018 um 04:07 schrieb Mark Webb-Johnson:
>>> I think the fault must be internal (controller, or our driver code), as the can bus itself has error checking.
>>> The only other thing I noticed on my review is that in ESP32CAN_rxframe there is no check that a frame is actually available before reading and queueing it. If the RX interrupt is spurious, that could cause all sorts of issues. I think it would be safer to have such a check, and maybe that is the cause of this issue?
>> 
>> You mean we could be reading the RX mailbox a) without a new signal and b) before the next message is fully rolled in.
>> 
>> But how would you check that a frame is available? I understand from the documentation, the RX IRQ isn't raised unless the FIFO has a valid message and (in
>> PeliCAN mode) has been mapped to the buffer address. I can't see how we could check the frame validity additionally to that.
>> 
>> Setting CMR.RRB then allows the chip to fetch the next message from the FIFO, which will trigger another IR.RI as soon as the buffer is filled.
>> 
>> IR.RI is a level interrupt, so if stuck will trigger again. But reading IR clears it. So even if the ISR would be called again due to an IRQ handling bug, IR.RI
>> would be 0, so rxframe() wouldn't be called.
>> 
>> …unless IR.RI was set due to a hardware bug…
>> 
>> …or IR.RI could also be set on a data overrun (or other error) for an invalid message. That would mean IR.RI must be discarded in case of an error interrupt
>> coming in with IR.RI, but I've seen nothing in the documentation about something like this.
>> 
>>> I’m working on the MCP2515 driver at the moment, to try to find out what is causing the OBDII HUD lock-up (and others have seen). Perhaps I’ll find out something when reviewing the driver and shared CAN library. Due to the huge volume of traffic on my Tesla Model S, I get a large number of overruns but don’t see this problem. Maybe it is rare, or we are just not noticing it.
>> 
>> I assume the problem can be triggered by poor Wifi connectivity -- that's the situation at Frank's home. I assume the Wifi driver disables interrupts or context
>> switches in some situations, which causes the CAN rx task execution pauses.
>> 
>> Just a wild guess though, I would check the source for this pattern if it was open.
>> 
>> Regards,
>> Michael
>> 
>> 
>>> Regards, Mark
>>> 
>>>> On 10 Jun 2018, at 9:05 PM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de>> wrote:
>>>> 
>>>> Frank Demes reported a very strange CAN behaviour. He managed to get a CRTD log of one incident.
>>>> 
>>>> The effect is triggered by an unknown cause freezing the CAN rx task for short periods of time, i.e. 70 - 120 ms. This alone should be no problem, the RXFIFO
>>>> will probaly overflow, but that is handled by our driver, so except some messages lost this should not be an issue.
>>>> 
>>>> Unfortunately, something goes completely wrong:
>>>> 
>>>> a) on the overrun event, the last valid message is delivered multiple times (i.e. 5 times)
>>>> 
>>>> b) after resolving the overrun, a garbled message is received.
>>>> 
>>>> The garbled message contains a valid byte sequence in the wrong place, i.e. bytes 5-8 would be valid if they were bytes 1-4. Bytes 1-4 cannot have been read
>>>> from the bus, so something is going very wrong here.
>>>> 
>>>> I have walked through the esp32can driver and checked the SJA1000 documentation but have no clue what is going wrong.
>>>> 
>>>> Any ideas? CRTD excerpt below. (Frame 155 is reflected by the module with byte 1 changed from 07 to 01 to limit the charge current. The problem is, it also
>>>> reflects the garbled message, which causes the charger to stop.)
>>>> 
>>>> Regards,
>>>> Michael
>>>> 
>>>> 
>>>> ----------------------------------------------------------------------------------------
>>>>>>>> 274.341 1R11 155 07 97 E4 54 4B A8 00 6C
>>>> 274.341 1T11 155 01 97 E4 54 4B A8 00 6C
>>>> 274.351 1R11 155 07 97 E5 54 4B A8 00 6C
>>>> 274.351 1T11 155 01 97 E5 54 4B A8 00 6C
>>>> 274.361 1R11 155 07 97 E5 54 4B A8 00 6C
>>>> 274.361 1T11 155 01 97 E5 54 4B A8 00 6C
>>>> 274.371 1R11 423 03 22 FF FF 00 E0 00 DB
>>>> 274.371 1R11 426 00 38 01 00 4D 7E 00
>>>> 274.371 1R11 597 20 A4 03 B1 2F 00 01 53
>>>> 274.371 1R11 627 00 00 00
>>>> 274.371 1R11 155 07 97 E5 54 4B A8 00 6C
>>>> 274.371 1T11 155 01 97 E5 54 4B A8 00 6C
>>>> 274.381 1R11 155 07 97 E5 54 4B A8 00 6C
>>>> 274.381 1T11 155 01 97 E5 54 4B A8 00 6C
>>>> 274.391 1R11 5D7 FF FF 01 E4 53 80 00
>>>> 
>>>>  - normal up to here
>>>> 
>>>>  !!! CAN RX is blocked here for ~70 ms !!!
>>>>    … interrupts disabled? by whom?
>>>> 
>>>> 274.461 1R11 599 00 00 4D 7E 1E 26 FF FF
>>>> 274.461 1R11 436 00 0E 3C B2 00 00
>>>> 274.461 1R11 155 07 97 E5 54 4B A8 00 6C
>>>> 274.461 1R11 424 11 40 15 26 47 64 00 48
>>>> 274.461 1R11 425 0A 1D 44 FF FE 40 01 1F
>>>> 274.461 1R11 556 30 63 07 30 73 07 30 7A
>>>> 274.461 1T11 155 01 97 E5 54 4B A8 00 6C      ← reflection for 155 still works here
>>>> 
>>>>  !!! next RX pause, ~90 ms this time !!!
>>>> 
>>>>  !!! ID 599 has a period of 100 ms, but we now get it 5 times:
>>>> 
>>>> 274.551 1R11 599 00 00 4D 7E 1E 26 FF FF
>>>> 
>>>> 274.551 1CEV Error intr=58445 rxpkt=27132 txpkt=528128 errflags=0x15433 rxerr=0 txerr=0 rxovr=2 txovr=0 txdelay=17
>>>>  crtd fprintf bug compensation:
>>>>  274.551 1CEV Error rxpkt=58445 txpkt=27132 errflags=528128 intr=0x15433 rxerr=0 txerr=0 rxovr=2 txovr=0 txdelay=17
>>>>    errflags=528128 = 08 0F 00
>>>>    0x08 = IR: Data overrun
>>>>    0x0F = SR: Bus OK, error limit not yet reached, RX & TX idle, RXFIFO overrun, RX buffer full
>>>>    0x00 = ECC: -
>>>> 
>>>> 274.551 1R11 599 00 00 4D 7E 1E 26 FF FF
>>>> 274.551 1R11 599 00 00 4D 7E 1E 26 FF FF
>>>> 274.551 1R11 599 00 00 4D 7E 1E 26 FF FF
>>>> 274.551 1R11 599 00 00 4D 7E 1E 26 FF FF
>>>> 274.551 1R11 155 07 97 E6 54 4B A8 00 6C                    RX "A"
>>>> 274.551 1R11 155 07 97 E6 54 4B A8 00 6C                    RX "B"
>>>> 274.551 1R11 155 07 97 E7 54 4B A8 00 6C                    RX "C"
>>>> 274.551 1R11 423 03 22 FF FF 00 E0 00 DB
>>>> 274.551 1R11 426 00 38 01 00 4D 7E 00
>>>> 274.551 1R11 597 20 A4 03 B1 2F 00 01 53
>>>> 274.551 1R11 627 00 00 00
>>>> 274.551 1T11 155 01 97 E6 54 4B A8 00 6C                    reflection "A"
>>>> 274.551 1CEV TX_Queue T11 155 01 97 E6 54 4B A8 00 6C       reflection "B"
>>>> 274.551 1CEV TX_Queue T11 155 01 97 E7 54 4B A8 00 6C       reflection "C"
>>>> 274.551 1T11 155 01 97 E8 54 4B A8 00 6C                    reflection "D"
>>>> 
>>>>  !!! 120 ms pause !!!
>>>> 
>>>> 274.671 1R11 155 07 97 E8 54 4B A8 00 6C                    RX "D"
>>>> 
>>>>  → "D" was sent to the listeners & reflected by vehicle::IncomingFrame()
>>>>    then, before the LogFrame() call was processed, the CAN task was paused for 120 ms
>>>> 
>>>>  and now it's getting really weird:
>>>> 
>>>> 274.671 1R11 155 07 08 2A A0 07 97 E7 54                    RX "E": invalid frame / garbled message!
>>>> 
>>>>  … this frame certainly was not on the bus!
>>>>    bytes 2-4 (08 2A A0) = complete nonsense, not matching any frame on the bus
>>>>    bytes 5-8 (07 97 E7 54) = normally bytes 1-4 (!) of ID 155
>>>>    - copy / memory error?      … very unlikely
>>>>    - CAN RX handling error?    … doesn't seem so
>>>>    - RXFIFO message window bug?
>>>> 
>>>> 274.671 1CEV Error intr=58458 rxpkt=27134 txpkt=528128 errflags=0x15456 rxerr=0 txerr=0 rxovr=3 txovr=0 txdelay=19
>>>> 274.671 1R11 155 07 97 E7 54 4B A8 00 6C
>>>> 274.671 1R11 155 07 97 E7 54 4B A8 00 6C
>>>> 274.671 1R11 155 07 97 E7 54 4B A8 00 6C
>>>> 274.671 1T11 155 01 97 E6 54 4B A8 00 6C
>>>> 274.671 1CEV TX_Queue T11 155 01 97 E7 54 4B A8 00 6C
>>>> 274.671 1R11 155 07 97 F2 54 4B A8 00 6C
>>>> 274.671 1R11 155 07 97 F3 54 4B A8 00 6C
>>>> 274.671 1R11 423 03 22 FF FF 00 E0 00 DB
>>>> 274.671 1R11 426 00 38 01 00 4D 7E 00
>>>> 274.671 1R11 597 20 A4 03 B1 2F 00 01 53
>>>> 274.671 1R11 627 00 00 00
>>>> 274.671 1R11 155 07 97 F5 54 4B A8 00 6C
>>>> 274.671 1R11 155 07 97 F6 54 4B A8 00 6C
>>>> 274.671 1R11 5D7 FF FF 01 E4 53 80 00
>>>> 274.671 1R11 599 00 00 4D 7E 1E 26 FF FF
>>>> 274.671 1R11 155 07 97 F7 54 4B A8 00 6C
>>>> 274.671 1R11 436 00 0E 3C B2 00 00
>>>> 274.671 1R11 041 A0 07 97 F5 54 4B A8 00
>>>> 274.671 1T11 155 01 97 E7 54 4B A8 00 6C
>>>> 274.671 1T11 155 01 08 2A A0 07 97 E7 54                    reflection of invalid msg "E" → charge terminates due to error
>>>> 
>>>> 
>>>> -- 
>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>> 
>>>> 
>>>> _______________________________________________
>>>> OvmsDev mailing list
>>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>> _______________________________________________
>>> OvmsDev mailing list
>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>> 
>> -- 
>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>> 
>> 
>> _______________________________________________
>> OvmsDev mailing list
>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20180612/ed181b4e/attachment.htm>


More information about the OvmsDev mailing list