[Ovmsdev] Can buses stop after some time

Mark Webb-Johnson mark at webb-johnson.net
Sat Jul 7 22:37:55 HKT 2018


I’m trying to recreate this with my DB9 plug that connects all three CAN buses together. A frame transmitted on CAN1 should then appear on CAN2 and CAN3.

Here is what I see:

OVMS# can can1 start active 1000000
Can bus can1 started in mode active at speed 1000000bps
OVMS# can can2 start active 1000000
Can bus can2 started in mode active at speed 1000000bps
OVMS# test cantx can1 25000
Testing 25000 frames on can1
Transmitted 25000 frames in 6.466209s = 258us/frame

OVMS# can can1 status
CAN:       can1
Mode:      Active
Speed:     1000000
Interrupts:               24771
Rx pkt:                       0
Rx err:                       0
Rx ovrflw:                    0
Tx pkt:                   24880
Tx delays:                24703
Tx err:                       0
Tx ovrflw:                  109
Err flags: 0

OVMS# can can2 status
CAN:       can2
Mode:      Active
Speed:     1000000
Interrupts:               19084
Rx pkt:                   24770
Rx err:                       0
Rx ovrflw:                    1
Tx pkt:                       0
Tx delays:                    0
Tx err:                       0
Tx ovrflw:                    0
Err flags: 0x2040

Note the err flags 0x2040 on CAN2; even so, the bus remains up and works fine.

Repeating the test gives us:

OVMS# test cantx can1 25000
Testing 25000 frames on can1
Transmitted 25000 frames in 6.479670s = 259us/frame

OVMS# can can1 status
CAN:       can1
Mode:      Active
Speed:     1000000
Interrupts:               49546
Rx pkt:                       0
Rx err:                       0
Rx ovrflw:                    0
Tx pkt:                   49771
Tx delays:                49417
Tx err:                       0
Tx ovrflw:                  207
Err flags: 0

OVMS# can can2 status
CAN:       can2
Mode:      Active
Speed:     1000000
Interrupts:               38288
Rx pkt:                   49545
Rx err:                       0
Rx ovrflw:                    3
Tx pkt:                       0
Tx delays:                    0
Tx err:                       0
Tx ovrflw:                    0
Err flags: 0x2040

Looking at the mcp2515 code, it seems that Err flags are only stored "if (intstat & 0b10100000)". That is, only when "ERRF 0x80 = message tx/rx error" or "ERRIF 0x20 = overflow / error state change" is set. The stored value is "(intstat & 0b10100000) << 8 | errflag", so it doesn’t show all the error statuses, and it is hard to rely on for diagnosing other errors or the status at lock-up. Given that error_flags is a uint32_t, I think we can store more in it to allow for better diagnosis.
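
As a minimal sketch of why the old value hides information, the expression above can be wrapped in a small helper and fed example register values. The name old_error_flags is hypothetical (it is not a function in the driver), and the inputs 0x21 and 0x40 are assumed for illustration, chosen so that the result matches the 0x2040 seen on can2.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the quoted expression
// "(intstat & 0b10100000) << 8 | errflag".
// intstat: interrupt status byte, errflag: error flag byte.
constexpr uint32_t old_error_flags(uint8_t intstat, uint8_t errflag)
{
  return (static_cast<uint32_t>(intstat & 0b10100000) << 8) | errflag;
}

// Assumed inputs: error interrupt (0x20) plus a receive interrupt (0x01)
// in intstat, and the RX0OVR bit (0x40) in errflag.
static_assert(old_error_flags(0x21, 0x40) == 0x2040,
              "only the 0x20 bit of intstat survives the mask");

// With neither 0x80 nor 0x20 set, nothing from intstat is kept at all.
static_assert(old_error_flags(0x01, 0x00) == 0x0000,
              "a plain receive interrupt leaves no trace");
```

The receive bit (0x01) in the first example is masked away, which is the sense in which the old value does not show all the statuses.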

I changed the mcp2515 driver to always set error_flags, on each interrupt handled (except spurious interrupts with no flags found), as follows:

SSSSSSSSFFFFFFFF***EB01TLLLLLLLL
SSSSSSSS = intstat
FFFFFFFF = errflag
E = Error/WakeUp flags were cleared
B = RXB0 or RXB1 overflow flags cleared
0 = RXB0 overflowed
1 = RXB1 overflowed
T = TX buffer has become available
LLLLLLLL = intflag
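
To read such a value back, the layout above can be unpacked as in the sketch below. It assumes the diagram is written most-significant bit first, so that SSSSSSSS occupies bits 31..24 and LLLLLLLL bits 7..0. The struct and function names are hypothetical and are not part of the driver.

```cpp
#include <cstdint>

// Hypothetical view of the 32-bit error_flags word, assuming the
// diagram SSSSSSSSFFFFFFFF***EB01TLLLLLLLL is MSB-first.
struct DecodedErrorFlags
{
  uint8_t intstat;          // SSSSSSSS, bits 31..24
  uint8_t errflag;          // FFFFFFFF, bits 23..16
  bool    errors_cleared;   // E, bit 12
  bool    overflow_cleared; // B, bit 11
  bool    rxb0_overflowed;  // 0, bit 10
  bool    rxb1_overflowed;  // 1, bit 9
  bool    tx_available;     // T, bit 8
  uint8_t intflag;          // LLLLLLLL, bits 7..0
};

// Split a packed value into its fields; bits 15..13 (***) are ignored.
constexpr DecodedErrorFlags decode_error_flags(uint32_t v)
{
  return DecodedErrorFlags{
    static_cast<uint8_t>(v >> 24),
    static_cast<uint8_t>(v >> 16),
    ((v >> 12) & 1u) != 0,
    ((v >> 11) & 1u) != 0,
    ((v >> 10) & 1u) != 0,
    ((v >> 9) & 1u) != 0,
    ((v >> 8) & 1u) != 0,
    static_cast<uint8_t>(v)
  };
}
```

Under that assumption, a value such as 0x01000001 decodes to intstat 0x01 and intflag 0x01, with errflag zero and none of the E, B, 0, 1 or T bits set.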

I did find a problem on line 300: if (intstat & 0b10100000). I think that should be 0b11100000 (to also pick up the RXB0 overflow). I also removed the m_status.rxbuf_overflow++ from the RXB0 overflow case, as it is not really an overflow: RXB1 got the data.

With those changes made, I get:

OVMS# test cantx can1 25000
Testing 25000 frames on can1
Transmitted 25000 frames in 6.389849s = 255us/frame

OVMS# can can1 status
CAN:       can1
Mode:      Active
Speed:     1000000
Interrupts:               24777
Rx pkt:                       0
Rx err:                       0
Rx ovrflw:                    0
Tx pkt:                   24884
Tx delays:                24739
Tx err:                       0
Tx ovrflw:                  116
Err flags: 0x00000000

OVMS# can can2 status
CAN:       can2
Mode:      Active
Speed:     1000000
Interrupts:               18935
Rx pkt:                   24777
Rx err:                       0
Rx ovrflw:                    0
Tx pkt:                       0
Tx delays:                    0
Tx err:                       0
Tx ovrflw:                    0
Err flags: 0x01000001

I don’t think I’ve fixed anything (apart from that minor issue with RXB0 overflow diagnostics), but the new error_flags display should help find out what is causing this lock-up. Hopefully I haven’t broken anything.

Regards, Mark.

> On 7 Jul 2018, at 10:42 AM, Tom Parker <tom at carrott.org> wrote:
> 
> On 07/07/18 00:05, Mark Webb-Johnson wrote:
>> Err flags 0x2040. The 0x20 part is the error interrupt. The 0x40 part is "RX0OVR: Receive Buffer 0 Overflow Flag bit".
>> 
>> Were the numbers on ‘can can2 status’ moving at all? Or completely stuck?
> 
> None of the can can2 status numbers change when the can bus is broken. After power cycling it, they move again.
> 
>> Seems different than the fault Greg and I are seeing. This one likely to be interrupt flag, or buffer overflow, not being cleared correctly. I’m guessing the overflow because that just doesn’t seem correct in mcp2515::RxCallback(). I’ll focus on that and have a look.
> 
> I just checked the car again and it stopped with Rx ovrflw number only 1281, half what it got to last time.
