[Ovmsdev] Can buses stop after some time

Mark Webb-Johnson mark at webb-johnson.net
Thu May 30 08:49:17 HKT 2019


Is this repeatable? Does it happen every time, or often, when you do this? What firmware are you running?

Can you get another ‘can can2 status’ at the time of the issue (presumably flatbed alert, before restarting can2)?

Regards, Mark.

> On 29 May 2019, at 2:29 AM, ovms <ovms at topphemmelig.no> wrote:
> 
> I just discovered something. Since the temperature has been pretty stable the last month (not cold, and not warm) I have not used the climate control function in OVMS.
> With the latest firmware, CAN2 can receive data for many days, until I trigger the "start climate control" command. Then CAN2 stops receiving (and a "possible theft" message occurs). The car is a 2016 Leaf.
>  
> Kind regards,
> Stein Arne Sordal
>  
>  
> On 2019-04-01 14:56, Mark Webb-Johnson <mark at webb-johnson.net> wrote:
> So this is with CAN2 not working?
>  
> Err flags are 0x80001080. For MCP2515 that is:
>  
> (intstat << 24) |
> (errflag << 16) |
> intflag
>  
> So intstat = 0x80, errflag = 0x00, intflag = 0x1080.
>  
> The 0x10.. in intflag indicates that this just ran:
>  
>   // clear error & wakeup interrupts:
>   if (intstat & 0b11100000)
>     {
>     m_status.error_flags |= 0x1000;
>     m_spibus->spi_cmd(m_spi, buf, 0, 4, CMD_BITMODIFY, 0x2c, intstat & 0b11100000, 0x00);
>     }
>  
> The intstat=0x80 indicates "MERRF: Message Error Interrupt Flag bit". So that is a bus error, but errflag=0x00, so neither the 96 nor the 128 successive-error threshold is indicated as hit. The documentation (MCP2515 data sheet) for this says:
>  
> 7.4 Message Error Interrupt
> 
> When an error occurs during the transmission or reception of a message, the message error flag (CANINTF.MERRF) will be set and, if the CANINTE.MERRE bit is set, an interrupt will be generated on the INT pin. This is intended to be used to facilitate baud rate determination when used in conjunction with Listen-only mode.
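
For reference, the errflag value discussed above is the MCP2515 EFLG register, and the 96 / 128 thresholds correspond to its warning and error-passive bits. A minimal sketch of that reading follows. The bit masks are taken from the MCP2515 data sheet; the names Eflg, ErrorState and classify_eflg are hypothetical helpers for illustration and are not part of the OVMS driver.

```cpp
#include <cstdint>

// EFLG (error flag register) bit masks, per the MCP2515 data sheet.
enum Eflg : std::uint8_t {
  EFLG_EWARN  = 0x01,  // TEC or REC >= 96
  EFLG_RXWAR  = 0x02,  // REC >= 96
  EFLG_TXWAR  = 0x04,  // TEC >= 96
  EFLG_RXEP   = 0x08,  // REC >= 128 (receive error-passive)
  EFLG_TXEP   = 0x10,  // TEC >= 128 (transmit error-passive)
  EFLG_TXBO   = 0x20,  // bus-off
  EFLG_RX0OVR = 0x40,  // receive buffer 0 overflow
  EFLG_RX1OVR = 0x80   // receive buffer 1 overflow
};

// Hypothetical summary of the error state implied by an EFLG value.
enum class ErrorState { Active, Warning, Passive, BusOff };

// Picks the most severe state whose bit is set; overflow bits are ignored
// here because they do not change the CAN error state.
constexpr ErrorState classify_eflg(std::uint8_t errflag) {
  return (errflag & EFLG_TXBO)                               ? ErrorState::BusOff
       : (errflag & (EFLG_RXEP | EFLG_TXEP))                 ? ErrorState::Passive
       : (errflag & (EFLG_EWARN | EFLG_RXWAR | EFLG_TXWAR))  ? ErrorState::Warning
                                                             : ErrorState::Active;
}
```

With errflag=0x00, as in the status discussed here, classify_eflg returns ErrorState::Active, meaning no error threshold has been reached. An errflag of 0x40 (RX0OVR alone) also maps to Active, since a buffer overflow is not an error-state change.
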
>  
> I can’t see anything obviously wrong with that. Not sure how to proceed. Perhaps we need a command to dump all the MCP2515 registers? We could at least then see the current state of the chip with all the gory details.
>  
> Regards, Mark.
>  
> On 1 Apr 2019, at 3:49 PM, ovms <ovms at topphemmelig.no> wrote:
> 
> Looks the same, see attachment.
>  
> -Stein Arne-
>  
>  
>  
> On 2019-04-01 09:38, Mark Webb-Johnson <mark at webb-johnson.net> wrote:
> This is the thread from last summer talking about CAN bus lock-ups.
>  
> I’m guessing this is still happening for Leaf? Since the changes made below (back then), I haven’t seen it in my car.
>  
> Next time this happens to you guys (can2 stopped), can you get a ‘can can2 status’, wait 30 seconds, and repeat. Then send it here for us to look at.
>  
> Regards, Mark.
> 
> On 7 Jul 2018, at 10:37 PM, Mark Webb-Johnson <mark at webb-johnson.net> wrote:
> 
> I’m trying to recreate this with my three-can-buses-connected DB9 plugged in. Transmitting on CAN1 should make it appear on CAN2 and CAN3.
>  
> Here is what I see:
>  
> OVMS# can can1 start active 1000000
> Can bus can1 started in mode active at speed 1000000bps
> OVMS# can can2 start active 1000000
> Can bus can2 started in mode active at speed 1000000bps
> OVMS# test cantx can1 25000
> Testing 25000 frames on can1
> Transmitted 25000 frames in 6.466209s = 258us/frame
>  
> OVMS# can can1 status
> CAN:       can1
> Mode:      Active
> Speed:     1000000
> Interrupts:               24771
> Rx pkt:                       0
> Rx err:                       0
> Rx ovrflw:                    0
> Tx pkt:                   24880
> Tx delays:                24703
> Tx err:                       0
> Tx ovrflw:                  109
> Err flags: 0
> OVMS# can can2 status
>  
> CAN:       can2
> Mode:      Active
> Speed:     1000000
> Interrupts:               19084
> Rx pkt:                   24770
> Rx err:                       0
> Rx ovrflw:                    1
> Tx pkt:                       0
> Tx delays:                    0
> Tx err:                       0
> Tx ovrflw:                    0
> Err flags: 0x2040
>  
> Note the err flags 0x2040 on CAN2, but the bus remains up and working fine.
>  
> Repeating the test gives us:
>  
> OVMS# test cantx can1 25000
> Testing 25000 frames on can1
> Transmitted 25000 frames in 6.479670s = 259us/frame
>  
> OVMS# can can1 status
> CAN:       can1
> Mode:      Active
> Speed:     1000000
> Interrupts:               49546
> Rx pkt:                       0
> Rx err:                       0
> Rx ovrflw:                    0
> Tx pkt:                   49771
> Tx delays:                49417
> Tx err:                       0
> Tx ovrflw:                  207
> Err flags: 0
>  
> OVMS# can can2 status
> CAN:       can2
> Mode:      Active
> Speed:     1000000
> Interrupts:               38288
> Rx pkt:                   49545
> Rx err:                       0
> Rx ovrflw:                    3
> Tx pkt:                       0
> Tx delays:                    0
> Tx err:                       0
> Tx ovrflw:                    0
> Err flags: 0x2040
>  
> Looking at the mcp2515 code, it seems that Err flags are only stored "if (intstat & 0b10100000)". That is "MERRF 0x80 = message tx/rx error" or "ERRIF 0x20 = overflow / error state change". It is also set to "(intstat & 0b10100000) << 8 | errflag", so doesn’t show all the error statuses. It is hard to rely on that for other errors/status on lock-up. Given that error_flags is a uint32_t, I think we can store more in it to allow for better diagnosis.
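
To make the information loss concrete, here is a small sketch of that original packing. legacy_error_flags is a hypothetical stand-alone helper, not the driver's code; it only mirrors the quoted expression, and the CANINTF masks are from the MCP2515 data sheet.

```cpp
#include <cstdint>

// CANINTF bits that the original code kept (MCP2515 data sheet).
constexpr std::uint8_t CANINTF_MERRF = 0x80;  // message tx/rx error
constexpr std::uint8_t CANINTF_ERRIF = 0x20;  // overflow / error state change

// Hypothetical helper mirroring "(intstat & 0b10100000) << 8 | errflag".
// All other interrupt bits (RX, TX, wake-up) are masked away before storing.
constexpr std::uint32_t legacy_error_flags(std::uint8_t intstat,
                                           std::uint8_t errflag) {
  return (static_cast<std::uint32_t>(intstat & (CANINTF_MERRF | CANINTF_ERRIF)) << 8)
         | errflag;
}
```

For example, intstat=0x21 (ERRIF plus RX0IF) with errflag=0x40 (RX0OVR) packs to 0x2040, the value seen on can2 above, and intstat=0x20 with the same errflag packs to the identical value, so the RX0IF bit cannot be recovered afterwards.
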
>  
> I changed the mcp2515 driver to always set error_flags, on each interrupt handled (except spurious interrupts with no flags found), as follows:
>  
> SSSSSSSSFFFFFFFF***EB01TLLLLLLLL
> SSSSSSSS = intstat
> FFFFFFFF = errflag
> B = RXB0 or RXB1 overflow flags cleared
> 0 = RXB0 overflowed
> 1 = RXB1 overflowed
> T = TX buffer has become available
> E = Error/WakeUp flags were cleared
> LLLLLLLL = intflag
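
As a rough illustration of that layout, a decoder might look like the sketch below. The struct and function names are made up for this example. The 0x1000 position of E matches the driver snippet quoted earlier in this thread; the other bit positions are read off the layout string, so treat them as assumptions.

```cpp
#include <cstdint>

// Hypothetical decoded view of the 32-bit error_flags word:
//   SSSSSSSS FFFFFFFF ***EB01T LLLLLLLL
struct Mcp2515ErrorFlags {
  std::uint8_t intstat;         // S: interrupt status as read
  std::uint8_t errflag;         // F: error flag register value
  bool error_wakeup_cleared;    // E: error/wake-up flags were cleared
  bool overflow_cleared;        // B: RXB0/RXB1 overflow flags were cleared
  bool rxb0_overflowed;         // 0: RXB0 overflowed
  bool rxb1_overflowed;         // 1: RXB1 overflowed
  bool tx_buffer_available;     // T: a TX buffer has become available
  std::uint8_t intflag;         // L: low 8 bits
};

// Splits the word into its fields; bits 15..13 ('*') are ignored.
constexpr Mcp2515ErrorFlags decode_error_flags(std::uint32_t v) {
  return Mcp2515ErrorFlags{
    static_cast<std::uint8_t>(v >> 24),
    static_cast<std::uint8_t>(v >> 16),
    (v & 0x1000u) != 0,
    (v & 0x0800u) != 0,
    (v & 0x0400u) != 0,
    (v & 0x0200u) != 0,
    (v & 0x0100u) != 0,
    static_cast<std::uint8_t>(v)
  };
}
```

Applied to 0x80001080, this gives intstat=0x80, errflag=0x00, the E bit set and 0x80 in the low byte, which agrees with the manual decoding of that value elsewhere in this thread.
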
>  
> I did find a problem on line 300: if (intstat & 0b10100000). I think that should be 0b11100000 (to also pick up the RXB0 overflow). I also removed the m_status.rxbuf_overflow++ from RXB0 overflow (as it is not really an overflow, since RXB1 got the data).
>  
> With those changes made, I get:
>  
> OVMS# test cantx can1 25000
> Testing 25000 frames on can1
> Transmitted 25000 frames in 6.389849s = 255us/frame
> 
> OVMS# can can1 status
> CAN:       can1
> Mode:      Active
> Speed:     1000000
> Interrupts:               24777
> Rx pkt:                       0
> Rx err:                       0
> Rx ovrflw:                    0
> Tx pkt:                   24884
> Tx delays:                24739
> Tx err:                       0
> Tx ovrflw:                  116
> Err flags: 0x00000000
> 
> OVMS# can can2 status
> CAN:       can2
> Mode:      Active
> Speed:     1000000
> Interrupts:               18935
> Rx pkt:                   24777
> Rx err:                       0
> Rx ovrflw:                    0
> Tx pkt:                       0
> Tx delays:                    0
> Tx err:                       0
> Tx ovrflw:                    0
> Err flags: 0x01000001
>  
> I don’t think I’ve fixed anything (apart from that minor issue with RXB0 overflow diagnostics), but hopefully the new error_flags display should help finding out what is causing this lockup. Hopefully I haven’t broken anything.
>  
> Regards, Mark.
>  
> On 7 Jul 2018, at 10:42 AM, Tom Parker <tom at carrott.org> wrote:
> 
> On 07/07/18 00:05, Mark Webb-Johnson wrote:
> Err flags 0x2040. The 0x20 part is the error interrupt. The 0x40 part is "RX0OVR: Receive Buffer 0 Overflow Flag bit".
> 
> Were the numbers on ‘can can2 status’ moving at all? Or completely stuck?
> 
> None of the ‘can can2 status’ numbers change when the CAN bus is broken. After power cycling it, they move.
> 
> Seems different than the fault Greg and I are seeing. This one likely to be interrupt flag, or buffer overflow, not being cleared correctly. I’m guessing the overflow because that just doesn’t seem correct in mcp2515::RxCallback(). I’ll focus on that and have a look.
> 
> I just checked the car again and it stopped with Rx ovrflw number only 1281, half what it got to last time.
> 
> _______________________________________________
> OvmsDev mailing list
> OvmsDev at lists.openvehicles.com
> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
