[Ovmsdev] Can buses stop after some time
mark at webb-johnson.net
Thu Jul 5 22:29:25 HKT 2018
I’ve spent some time on this, and finally managed to reliably repeat it (at least in one case) by:
1. Connect an external HUD and ‘obdii ecu start can3’.
2. Once the HUD is connected and working, manually change baud rate to incorrect ‘can can3 start active 250000’.
3. Watch errors start streaming in.
If I quickly switch back with ‘can can3 start active 500000’, it recovers and everything is fine.
If I leave it running, it seems to count up to 128 errors, and then lock up. At this point even a ‘can can3 start active 500000’ doesn’t solve it.
A ‘power can3 off’ then ‘can can3 start active 500000’ recovers it.
Here is what it looks like in the failed state:
OVMS# can can3 status
Rx pkt: 0
Rx err: 128
Rx ovrflw: 0
Tx pkt: 0
Tx delays: 0
Tx err: 0
Tx ovrflw: 0
Err flags: 0x800b
D (697321) canlog: Status can3 intr=35900 rxpkt=0 txpkt=0 errflags=0x800b rxerr=128 txerr=0 rxovr=0 txovr=0 txdelay=0
Can you check to see what yours looks like next time it fails?
Looking at the MCP2515 data sheet (page #45), it has this to say:
6.6 Error States
Detected errors are made known to all other nodes via error frames. The transmission of the erroneous message is aborted and the frame is repeated as soon as possible. Furthermore, each CAN node is in one of the three error states according to the value of the internal error counters:
1. Error-active.
2. Error-passive.
3. Bus-off (transmitter only).
The error-active state is the usual state where the node can transmit messages and active error frames (made of dominant bits) without any restrictions.
In the error-passive state, messages and passive error frames (made of recessive bits) may be transmitted.
The bus-off state makes it temporarily impossible for the station to participate in the bus communication. During this state, messages can neither be received or transmitted. Only transmitters can go bus-off.
6.7 Error Modes and Error Counters
The MCP2515 contains two error counters: the Receive Error Counter (REC) (see Register 6-2) and the Transmit Error Counter (TEC) (see Register 6-1). The values of both counters can be read by the MCU. These counters are incremented/decremented in accordance with the CAN bus specification.
The MCP2515 is error-active if both error counters are below the error-passive limit of 128.
It is error-passive if at least one of the error counters equals or exceeds 128.
It goes to bus-off if the TEC exceeds the bus-off limit of 255. The device remains in this state until the bus-off recovery sequence is received. The bus-off recovery sequence consists of 128 occurrences of 11 consecutive recessive bits (see Figure 6-1).
The Current Error mode of the MCP2515 can be read by the MCU via the EFLG register (see Register 6-3).
Additionally, there is an error state warning flag bit (EFLG:EWARN) which is set if at least one of the error counters equals or exceeds the error warning limit of 96. EWARN is reset if both error counters are less than the error warning limit.
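To make those thresholds concrete, here is a rough sketch in Python (purely illustrative, not OVMS or MCP2515 driver code; the function name `classify` is made up for this example). It only encodes the limits quoted above: warning at 96, error-passive at 128, bus-off when the TEC exceeds 255.

```python
ERROR_WARNING_LIMIT = 96    # EWARN threshold from section 6.7
ERROR_PASSIVE_LIMIT = 128   # error-passive threshold from section 6.7
BUS_OFF_LIMIT = 255         # bus-off when the TEC exceeds this


def classify(tec, rec):
    """Return (state, warning) for the given TEC and REC values.

    Follows the wording of section 6.7 quoted above. Note that the
    EFLG register description words the bus-off condition slightly
    differently ("TEC reaches 255").
    """
    if tec > BUS_OFF_LIMIT:
        state = "bus-off"
    elif tec >= ERROR_PASSIVE_LIMIT or rec >= ERROR_PASSIVE_LIMIT:
        state = "error-passive"
    else:
        state = "error-active"
    warning = tec >= ERROR_WARNING_LIMIT or rec >= ERROR_WARNING_LIMIT
    return state, warning


# Counter values taken from the logs in this mail (txerr is always 0).
for rec in (56, 113, 128):
    print(rec, classify(0, rec))
```

With the rxerr values from the logs, 56 is plain error-active, 113 is error-active with the warning set, and 128 is error-passive.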
I don’t think we access these TEC and REC registers, but the 128 number cannot be a coincidence.
We do access the EFLG register, in our ISR, and here is what I see:
E (685091) canlog: Error can3 intr=30 rxpkt=0 txpkt=0 errflags=0x8000 rxerr=56 txerr=0 rxovr=0 txovr=0 txdelay=0
E (685091) canlog: Error can3 intr=31 rxpkt=0 txpkt=0 errflags=0x8000 rxerr=58 txerr=0 rxovr=0 txovr=0 txdelay=0
E (685091) canlog: Error can3 intr=32 rxpkt=0 txpkt=0 errflags=0x8000 rxerr=60 txerr=0 rxovr=0 txovr=0 txdelay=0
E (685091) canlog: Error can3 intr=43 rxpkt=0 txpkt=0 errflags=0x8000 rxerr=81 txerr=0 rxovr=0 txovr=0 txdelay=0
E (685101) canlog: Error can3 intr=60 rxpkt=0 txpkt=0 errflags=0x8003 rxerr=113 txerr=0 rxovr=0 txovr=0 txdelay=0
The lower 8 bits of that value are the EFLG register, so 0x00 is normal, 0x03 is when the error is hit, and 0x0b is what we see later. The documentation for this register is:
bit 7    bit 6    bit 5  bit 4  bit 3  bit 2  bit 1  bit 0
RX1OVR   RX0OVR   TXBO   TXEP   RXEP   TXWAR  RXWAR  EWARN
R/W-0    R/W-0    R-0    R-0    R-0    R-0    R-0    R-0
bit#7: RX1OVR: Receive Buffer 1 Overflow Flag bit
- Set when a valid message is received for RXB1 and CANINTF.RX1IF = 1
- Must be reset by MCU
bit#6: RX0OVR: Receive Buffer 0 Overflow Flag bit
- Set when a valid message is received for RXB0 and CANINTF.RX0IF = 1
- Must be reset by MCU
bit#5: TXBO: Bus-Off Error Flag bit
- Bit set when TEC reaches 255
- Reset after a successful bus recovery sequence
bit#4: TXEP: Transmit Error-Passive Flag bit
- Set when TEC is equal to or greater than 128
- Reset when TEC is less than 128
bit#3: RXEP: Receive Error-Passive Flag bit
- Set when REC is equal to or greater than 128
- Reset when REC is less than 128
bit#2: TXWAR: Transmit Error Warning Flag bit
- Set when TEC is equal to or greater than 96
- Reset when TEC is less than 96
bit#1: RXWAR: Receive Error Warning Flag bit
- Set when REC is equal to or greater than 96
- Reset when REC is less than 96
bit#0: EWARN: Error Warning Flag bit
- Set when TEC or REC is equal to or greater than 96 (TXWAR or RXWAR = 1)
- Reset when both REC and TEC are less than 96
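As a quick way to read the errflags values in the logs, something like the following Python sketch decodes the low byte using the bit layout listed above. It is only an illustration (the names `EFLG_BITS` and `decode_eflg` are invented here), and it ignores the upper byte of errflags.

```python
# EFLG bit masks, highest bit first, as listed in the register description.
EFLG_BITS = [
    (0x80, "RX1OVR"),
    (0x40, "RX0OVR"),
    (0x20, "TXBO"),
    (0x10, "TXEP"),
    (0x08, "RXEP"),
    (0x04, "TXWAR"),
    (0x02, "RXWAR"),
    (0x01, "EWARN"),
]


def decode_eflg(errflags):
    """Return the names of the EFLG bits set in the low byte of errflags."""
    eflg = errflags & 0xFF  # only the lower 8 bits are the EFLG register
    return [name for mask, name in EFLG_BITS if eflg & mask]


for value in (0x8000, 0x8003, 0x800B):
    print(hex(value), decode_eflg(value))
```

For the three values seen in the logs this gives no flags for 0x8000, RXWAR and EWARN for 0x8003, and RXEP, RXWAR and EWARN for 0x800b.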
So that is EWARN+RXWAR when the 128 error issue occurs, and EWARN+RXWAR+RXEP when everything is locked up. We have code to clear the error condition (in the interrupt flags register), but that doesn’t seem to get out of this 128 error lock-up.
I am not sure of the best approach for this. Perhaps pick up the condition and reset the SPI bus, in a timer every 10 seconds or so?
I am not sure if this is your problem (a ‘can can2 status’ would tell us). In any case, the fix for this is to pick up this error condition in the ISR and fix it (or perhaps in a separate periodic timer).
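The periodic-timer idea might look roughly like this. It is a Python sketch of the logic only, not firmware code: `read_errflags` and `reset_controller` are hypothetical callables standing in for whatever the driver would actually use, and treating any error-passive or bus-off flag as "needs a reset" is an assumption that may well be too aggressive in practice.

```python
# EFLG masks from the register description (low byte of errflags).
RXEP = 0x08  # receive error-passive
TXEP = 0x10  # transmit error-passive
TXBO = 0x20  # bus-off


def needs_recovery(errflags):
    """True if the low byte shows error-passive or bus-off (assumed policy)."""
    return bool(errflags & 0xFF & (RXEP | TXEP | TXBO))


def watchdog_tick(read_errflags, reset_controller):
    """One periodic check: reset the controller if it looks locked up.

    Both arguments are hypothetical callables supplied by the caller.
    Returns True if a reset was triggered.
    """
    if needs_recovery(read_errflags()):
        reset_controller()
        return True
    return False


# Example with the values from the logs: 0x8003 is left alone,
# 0x800b (the locked-up state) triggers the reset callback.
resets = []
print(watchdog_tick(lambda: 0x8003, lambda: resets.append("reset")))
print(watchdog_tick(lambda: 0x800B, lambda: resets.append("reset")))
```

Calling `watchdog_tick` from a timer every 10 seconds or so would match the suggestion above; whether the reset should be a power cycle of the controller or something lighter is still open.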
> On 5 Jul 2018, at 3:55 PM, Tom Parker <tom at carrott.org> wrote:
> I haven't had a chance to try to work out what is going on.
> I can say that the second can interface doesn't work for very long before stopping. This manifests most obviously on my Leaf as a stopped odometer in the OVMS app. If you look at the metrics in the console then everything that comes from the Car CAN bus (ie the second CAN bus) has frozen.
> The first CAN interface seems much more reliable, with SOC information from the EV bus being fairly reliably reported.
> I haven't done the modification to make my 3.0 unit's GPS work so I haven't experienced the stolen detection.
> On 05/07/18 18:34, Stein Arne Sordal wrote:
>> Did anyone figure out what happens here?
>> Now the OVMS thinks my car is stolen since it´s moving (GPS) and CAN2 is dead.
>> Reboot of module brings CAN2 back to life for a period of time.
>> -Stein Arne Sordal-
>>> On 11 May 2018, at 12:29, Stein Arne Sordal <ovms at topphemmelig.no> wrote:
>>> Hi Tom
>>> I have seen this with my Leaf.
>>> I´ve been on vacation, so I haven´t got time to test a lot, but it looks like one of the can buses stops. Started testing again today.
>>> -Stein Arne Sordal-
>>>> On 11 May 2018, at 12:22, Tom Parker <tom at carrott.org> wrote:
>>>> Hi all,
>>>> I synced up with master about a week ago and since then I've seen both can busses stop working. I still see the 12v battery metric changing, but everything that comes from the car stops. Rebooting the module with "module reset" does not seem to fix it, while make app-flash monitor does fix it. I haven't tried make monitor on it's own.
>>>> Is anyone else seeing behavior like this?
>>>> Sorry for the vague bug report. I'll spend some time later this weekend to try to gather more information.
>>>> OvmsDev mailing list
>>>> OvmsDev at lists.openvehicles.com