[Ovmsdev] CAN buses stop after some time

Mark Webb-Johnson mark at webb-johnson.net
Mon Apr 1 20:55:40 HKT 2019


So this is with CAN2 not working?

Err flags are 0x80001080. For the MCP2515 driver, that value is packed as:

(intstat << 24) |
(errflag << 16) |
intflag

So intstat = 0x80, errflag = 0x00, intflag = 0x1080.
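The packing above can be checked with a short sketch. PackErrFlags and the three accessor functions are hypothetical helpers written for this illustration, not the driver's own code. The sketch also assumes intflag fits in 16 bits, as it does in 0x80001080:

```cpp
#include <cstdint>

// Packs the fields as (intstat << 24) | (errflag << 16) | intflag.
// Each byte is widened to uint32_t before shifting: a uint8_t operand is
// promoted to int, and 0x80 << 24 would overflow a 32-bit signed int.
constexpr uint32_t PackErrFlags(uint8_t intstat, uint8_t errflag, uint16_t intflag) {
  return (static_cast<uint32_t>(intstat) << 24) |
         (static_cast<uint32_t>(errflag) << 16) |
         intflag;
}

// Inverse: split a 32-bit Err flags value back into its three fields.
constexpr uint8_t ErrIntstat(uint32_t v) { return static_cast<uint8_t>(v >> 24); }
constexpr uint8_t ErrErrflag(uint32_t v) { return static_cast<uint8_t>(v >> 16); }
constexpr uint16_t ErrIntflag(uint32_t v) { return static_cast<uint16_t>(v); }
```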

The 0x1000 bit in intflag indicates that this code just ran:

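  // clear error & wakeup interrupts:
  if (intstat & 0b11100000)   // MERRF | WAKIF | ERRIF
    {
    m_status.error_flags |= 0x1000;   // note that this branch ran
    // BIT MODIFY on CANINTF (0x2c): clear only the flags seen in intstat
    m_spibus->spi_cmd(m_spi, buf, 0, 4, CMD_BITMODIFY, 0x2c, intstat & 0b11100000, 0x00);
    }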
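The clear step relies on the MCP2515 BIT MODIFY instruction. That instruction changes only the bits selected by the mask byte, and sets each of them to the matching bit of the data byte. A minimal sketch of the resulting register arithmetic follows. It assumes 0x2c is CANINTF, as the call implies, and that CANINTF has not changed since intstat was read. The function names are hypothetical:

```cpp
#include <cstdint>

// Register value after an MCP2515 BIT MODIFY: bits set in `mask` take the
// value of the matching bit in `data`; all other bits are left unchanged.
constexpr uint8_t BitModify(uint8_t reg, uint8_t mask, uint8_t data) {
  return static_cast<uint8_t>((reg & ~mask) | (data & mask));
}

// MERRF (0x80) | WAKIF (0x40) | ERRIF (0x20) in CANINTF
constexpr uint8_t kErrWakeMask = 0b11100000;

// Models the driver step above: mask = intstat & 0b11100000, data = 0x00,
// so only the error/wake-up flags that were actually seen get cleared.
constexpr uint8_t ClearErrWakeFlags(uint8_t canintf) {
  return BitModify(canintf, static_cast<uint8_t>(canintf & kErrWakeMask), 0x00);
}
```

The mask is intstat rather than a fixed 0b11100000. As a result, an error or wake-up flag raised after intstat was read is left set, so it can be picked up later instead of being silently discarded.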

The intstat=0x80 indicates "MERRF: Message Error Interrupt Flag bit". So that is a bus error. However, errflag=0x00, so the error counters have reached neither the warning threshold (96) nor the error-passive threshold (128). The documentation (MCP2515 datasheet) for this says:
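For reference, these are the CANINTF (intstat) and EFLG (errflag) bit assignments from the MCP2515 datasheet, with the reasoning above expressed as a check. MessageErrorOnly is a hypothetical helper. The thresholds in the comments refer to the transmit and receive error counters (TEC, REC):

```cpp
#include <cstdint>

// CANINTF bits (read into intstat by the driver)
enum CanIntf : uint8_t {
  RX0IF = 0x01,  // receive buffer 0 full
  RX1IF = 0x02,  // receive buffer 1 full
  TX0IF = 0x04,  // transmit buffer 0 empty
  TX1IF = 0x08,  // transmit buffer 1 empty
  TX2IF = 0x10,  // transmit buffer 2 empty
  ERRIF = 0x20,  // error interrupt (details in EFLG)
  WAKIF = 0x40,  // wake-up interrupt
  MERRF = 0x80,  // message error interrupt
};

// EFLG bits (read into errflag by the driver)
enum Eflg : uint8_t {
  EWARN  = 0x01,  // TEC or REC >= 96
  RXWAR  = 0x02,  // REC >= 96
  TXWAR  = 0x04,  // TEC >= 96
  RXEP   = 0x08,  // REC >= 128 (receive error-passive)
  TXEP   = 0x10,  // TEC >= 128 (transmit error-passive)
  TXBO   = 0x20,  // TEC reached 255 (bus-off)
  RX0OVR = 0x40,  // receive buffer 0 overflow
  RX1OVR = 0x80,  // receive buffer 1 overflow
};

// Error-counter related EFLG bits (everything except the overflow flags)
constexpr uint8_t kCounterFlags = EWARN | RXWAR | TXWAR | RXEP | TXEP | TXBO;

// True if a message error was flagged while the error counters are still
// below every warning, error-passive and bus-off threshold.
constexpr bool MessageErrorOnly(uint8_t intstat, uint8_t errflag) {
  return (intstat & MERRF) != 0 && (errflag & kCounterFlags) == 0;
}
```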

7.4 Message Error Interrupt

When an error occurs during the transmission or reception of a message, the message error flag (CANINTF.MERRF) will be set and, if the CANINTE.MERRE bit is set, an interrupt will be generated on the INT pin. This is intended to be used to facilitate baud rate determination when used in conjunction with Listen-only mode.

I can’t see anything obviously wrong with that. Not sure how to proceed. Perhaps we need a command to dump all the MCP2515 registers? We could at least then see the current state of the chip with all the gory details.
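The following sketches what such a dump could print. It assumes the register bytes for addresses 0x00 to 0x7F have already been read from the chip, for example with the MCP2515 SPI READ instruction. FormatRegisterDump is hypothetical, not an existing OVMS command or function:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a snapshot of MCP2515 registers as 16 bytes per line, each line
// prefixed with its start address. Reading the registers is not shown;
// `regs` is assumed to hold them in address order starting at 0x00.
std::string FormatRegisterDump(const uint8_t* regs, std::size_t len) {
  assert(len <= 0x80);  // the MCP2515 register map ends at 0x7F
  std::string out;
  char cell[8];
  for (std::size_t addr = 0; addr < len; ++addr) {
    if (addr % 16 == 0) {
      if (addr != 0) out += '\n';
      std::snprintf(cell, sizeof(cell), "%02X:", static_cast<unsigned>(addr));
      out += cell;
    }
    std::snprintf(cell, sizeof(cell), " %02X", static_cast<unsigned>(regs[addr]));
    out += cell;
  }
  return out;
}
```

The dump uses 16 registers per line. CANINTF (0x2C) therefore lands on the 0x20 line at position 12, so the flags discussed above would be easy to pick out.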

Regards, Mark.

> On 1 Apr 2019, at 3:49 PM, ovms <ovms at topphemmelig.no> wrote:
> 
> Looks the same, see attachment.
>  
> -Stein Arne-
>  
>  
>  
> On 2019-04-01 09:38, Mark Webb-Johnson <mark at webb-johnson.net> wrote:
> This is the thread from last summer talking about CAN bus lock-ups.
>  
> I’m guessing this is still happening for Leaf? Since the changes made below (back then), I haven’t seen it in my car.
>  
> Next time this happens to you guys (can2 stopped), can you get a ‘can can2 status’, wait 30 seconds, and repeat. Then send it here for us to look at.
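The two samples are meant to show whether the counters move at all, which can be expressed as a simple comparison. This is a minimal sketch with a hypothetical struct mirroring a few of the 'can can2 status' fields. It assumes the vehicle is awake so the bus is carrying traffic; a quiet bus would also look stuck:

```cpp
#include <cstdint>

// A few of the counters printed by 'can canX status' (hypothetical struct)
struct CanStatusSample {
  uint32_t interrupts;
  uint32_t rx_pkt;
  uint32_t rx_ovrflw;
  uint32_t err_flags;
};

// With traffic on the bus, a working controller keeps taking interrupts and
// receiving frames; if neither counter changed between the two samples,
// the controller appears to be stuck.
constexpr bool LooksStuck(CanStatusSample first, CanStatusSample second) {
  return second.interrupts == first.interrupts && second.rx_pkt == first.rx_pkt;
}
```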
>  
> Regards, Mark.
> 
> On 7 Jul 2018, at 10:37 PM, Mark Webb-Johnson <mark at webb-johnson.net> wrote:
> 
> I’m trying to recreate this with my DB9 connector (all three CAN buses wired together) plugged in. Transmitting on CAN1 should make the frames appear on CAN2 and CAN3.
>  
> Here is what I see:
>  
> OVMS# can can1 start active 1000000
> Can bus can1 started in mode active at speed 1000000bps
> OVMS# can can2 start active 1000000
> Can bus can2 started in mode active at speed 1000000bps
> OVMS# test cantx can1 25000
> Testing 25000 frames on can1
> Transmitted 25000 frames in 6.466209s = 258us/frame
>  
> OVMS# can can1 status
> CAN:       can1
> Mode:      Active
> Speed:     1000000
> Interrupts:               24771
> Rx pkt:                       0
> Rx err:                       0
> Rx ovrflw:                    0
> Tx pkt:                   24880
> Tx delays:                24703
> Tx err:                       0
> Tx ovrflw:                  109
> Err flags: 0
> OVMS# can can2 status
>  
> CAN:       can2
> Mode:      Active
> Speed:     1000000
> Interrupts:               19084
> Rx pkt:                   24770
> Rx err:                       0
> Rx ovrflw:                    1
> Tx pkt:                       0
> Tx delays:                    0
> Tx err:                       0
> Tx ovrflw:                    0
> Err flags: 0x2040
>  
> Note the err flags 0x2040 on CAN2, but the bus remains up and working fine.
>  
> Repeating the test gives us:
>  
> OVMS# test cantx can1 25000
> Testing 25000 frames on can1
> Transmitted 25000 frames in 6.479670s = 259us/frame
>  
> OVMS# can can1 status
> CAN:       can1
> Mode:      Active
> Speed:     1000000
> Interrupts:               49546
> Rx pkt:                       0
> Rx err:                       0
> Rx ovrflw:                    0
> Tx pkt:                   49771
> Tx delays:                49417
> Tx err:                       0
> Tx ovrflw:                  207
> Err flags: 0
>  
> OVMS# can can2 status
> CAN:       can2
> Mode:      Active
> Speed:     1000000
> Interrupts:               38288
> Rx pkt:                   49545
> Rx err:                       0
> Rx ovrflw:                    3
> Tx pkt:                       0
> Tx delays:                    0
> Tx err:                       0
> Tx ovrflw:                    0
> Err flags: 0x2040
>  
> Looking at the mcp2515 code, it seems that Err flags are only stored "if (intstat & 0b10100000)". That is "ERRF 0x80 = message tx/rx error" or "ERRIF 0x20 = overflow / error state change". The value is also set to "(intstat & 0b10100000) << 8 | errflag", so it doesn’t show all the error statuses. We can’t rely on it to diagnose other errors or the controller state at lock-up. Given that error_flags is a uint32_t, I think we can store more in it to allow for better diagnosis.
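The limitation can be seen by modelling the old behaviour. This sketch uses hypothetical function names. It assumes the stored value is overwritten (not OR-ed) whenever MERRF or ERRIF is set, and left unchanged otherwise:

```cpp
#include <cstdint>

// Old packing: only MERRF (0x80) and ERRIF (0x20) of intstat survive,
// shifted into bits 15..8; EFLG (errflag) occupies bits 7..0.
constexpr uint32_t OldErrorFlags(uint8_t intstat, uint8_t errflag) {
  return (static_cast<uint32_t>(intstat & 0b10100000) << 8) | errflag;
}

// Old update rule: the stored value only changes when MERRF or ERRIF is set.
constexpr uint32_t UpdateOldErrorFlags(uint32_t stored, uint8_t intstat, uint8_t errflag) {
  return (intstat & 0b10100000) ? OldErrorFlags(intstat, errflag) : stored;
}
```

In this model, an interrupt carrying only WAKIF or RX flags never changes the stored value. A 0x2040 (ERRIF plus RX0OVR) recorded once therefore keeps being displayed even though the bus continues to work.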
>  
> I changed the mcp2515 driver to always set error_flags on each interrupt handled (except spurious interrupts with no flags found), using this layout:
>  
> SSSSSSSSFFFFFFFF***EB01TLLLLLLLL
> SSSSSSSS = intstat (bits 31..24)
> FFFFFFFF = errflag (bits 23..16)
> E = Error/WakeUp flags were cleared (bit 12)
> B = RXB0 or RXB1 overflow flags cleared (bit 11)
> 0 = RXB0 overflowed (bit 10)
> 1 = RXB1 overflowed (bit 9)
> T = TX buffer has become available (bit 8)
> LLLLLLLL = intflag (bits 7..0)
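A decoder for this layout makes values such as the 0x01000001 reported for can2 easier to read. This is a minimal sketch with hypothetical names. The three * bits (15..13), which the legend does not describe, are ignored:

```cpp
#include <cstdint>

// Fields of the new error_flags layout:
//   SSSSSSSS FFFFFFFF ***E B01T LLLLLLLL   (bit 31 on the left)
struct DecodedErrorFlags {
  uint8_t intstat;      // S: bits 31..24
  uint8_t errflag;      // F: bits 23..16
  bool    err_cleared;  // E: bit 12, error/wake-up flags were cleared
  bool    ovf_cleared;  // B: bit 11, RXB0/RXB1 overflow flags cleared
  bool    rxb0_ovf;     // 0: bit 10, RXB0 overflowed
  bool    rxb1_ovf;     // 1: bit 9, RXB1 overflowed
  bool    tx_avail;     // T: bit 8, a TX buffer became available
  uint8_t intflag;      // L: bits 7..0
};

constexpr DecodedErrorFlags DecodeErrorFlags(uint32_t v) {
  return DecodedErrorFlags{
      static_cast<uint8_t>(v >> 24),
      static_cast<uint8_t>(v >> 16),
      (v & 0x1000) != 0,
      (v & 0x0800) != 0,
      (v & 0x0400) != 0,
      (v & 0x0200) != 0,
      (v & 0x0100) != 0,
      static_cast<uint8_t>(v)};
}
```

Decoded this way, 0x01000001 is intstat = 0x01 (RX0IF) with LLLLLLLL = 0x01. The 0x80001080 value is intstat = 0x80 (MERRF) with the E bit set and LLLLLLLL = 0x80.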
>  
> I did find a problem on line 300: if (intstat & 0b10100000). I think that should be 0b11100000 (to also pick up the RXB0 overflow). I also removed the m_status.rxbuf_overflow++ from the RXB0 overflow handling, as it is not really an overflow: RXB1 got the data.
>  
> With those changes made, I get:
>  
> OVMS# test cantx can1 25000
> Testing 25000 frames on can1
> Transmitted 25000 frames in 6.389849s = 255us/frame
> 
> OVMS# can can1 status
> CAN:       can1
> Mode:      Active
> Speed:     1000000
> Interrupts:               24777
> Rx pkt:                       0
> Rx err:                       0
> Rx ovrflw:                    0
> Tx pkt:                   24884
> Tx delays:                24739
> Tx err:                       0
> Tx ovrflw:                  116
> Err flags: 0x00000000
> 
> OVMS# can can2 status
> CAN:       can2
> Mode:      Active
> Speed:     1000000
> Interrupts:               18935
> Rx pkt:                   24777
> Rx err:                       0
> Rx ovrflw:                    0
> Tx pkt:                       0
> Tx delays:                    0
> Tx err:                       0
> Tx ovrflw:                    0
> Err flags: 0x01000001
>  
> I don’t think I’ve fixed anything (apart from that minor issue with the RXB0 overflow diagnostics), but the new error_flags display should help find out what is causing this lock-up. Hopefully I haven’t broken anything.
>  
> Regards, Mark.
>  
> On 7 Jul 2018, at 10:42 AM, Tom Parker <tom at carrott.org> wrote:
> 
> On 07/07/18 00:05, Mark Webb-Johnson wrote:
> Err flags 0x2040. The 0x20 part is the error interrupt. The 0x40 part is "RX0OVR: Receive Buffer 0 Overflow Flag bit".
> 
> Were the numbers on ‘can can2 status’ moving at all, or completely stuck?
> 
> None of the ‘can can2 status’ numbers change when the CAN bus is broken. After power cycling it, they move.
> 
> This seems different from the fault Greg and I are seeing. This one is likely to be an interrupt flag or buffer overflow not being cleared correctly. I’m guessing the overflow, because that handling just doesn’t seem correct in mcp2515::RxCallback(). I’ll focus on that and have a look.
> 
> I just checked the car again and it stopped with the Rx ovrflw count at only 1281, half of what it reached last time.
> 
> _______________________________________________
> OvmsDev mailing list
> OvmsDev at lists.openvehicles.com
> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
> <IMG_2835.PNG><IMG_2836.PNG>
