[Ovmsdev] Can buses stop after some time

ovms ovms at topphemmelig.no
Fri May 31 06:04:02 HKT 2019


It looks like it happens every time after triggering climate control.
The firmware is 3.2.002-65-g02a07a2.
I will try again tomorrow.

Regards,
Stein Arne

On 2019-05-30 02:49, Mark Webb-Johnson <mark at webb-johnson.net> wrote:
> Is this repeatable? Does it happen every time, or often, when you do this? What firmware are you running?
>
> Can you get another ‘can can2 status’ at the time of the issue (presumably flatbed alert, before restarting can2)?
>
> Regards, Mark.
>
> On 29 May 2019, at 2:29 AM, ovms <ovms at topphemmelig.no> wrote:
> > I just discovered something. Since the temperature has been pretty stable the last month (not cold, and not warm) I have not used the climate control function in OVMS.
> >
> > With the latest firmware, CAN2 can receive data for many days, until I trigger the "start climate control" command. Then CAN2 stops receiving (and the "possible theft" message occurs). Car is 2016 Leaf.
> >
> > Kind regards,
> > Stein Arne Sordal
> >
> > On 2019-04-01 14:56, Mark Webb-Johnson <mark at webb-johnson.net> wrote:
> > > So this is with CAN2 not working?
> > >
> > > Err flags are 0x80001080. For MCP2515 that is:
> > >
> > >   (intstat << 24) |
> > >   (errflag << 16) |
> > >   intflag
> > >
> > > So intstat = 0x80, errflag = 0x00, intflag = 0x1080.
> > >
> > > The 0x10.. in intflag indicates that this just ran:
> > >
> > >   // clear error & wakeup interrupts:
> > >   if (intstat & 0b11100000)
> > >     {
> > >     m_status.error_flags |= 0x1000;
> > >     m_spibus->spi_cmd(m_spi, buf, 0, 4, CMD_BITMODIFY, 0x2c, intstat & 0b11100000, 0x00);
> > >     }
> > >
> > > The intstat=0x80 indicates "MERRF: Message Error Interrupt Flag bit". So that is a bus error, but errflag=0x00, so neither the 96 nor the 128 successive-error threshold has been hit. The documentation (mcp2515 spec sheet) for this says:
> > >
> > >   7.4 Message Error Interrupt
> > >   When an error occurs during the transmission or reception of a message, the message error flag (CANINTF.MERRF) will be set and, if the CANINTE.MERRE bit is set, an interrupt will be generated on the INT pin. This is intended to be used to facilitate baud rate determination when used in conjunction with Listen-only mode.
> > >
> > > I can’t see anything obviously wrong with that. Not sure how to proceed. Perhaps we need a command to dump all the MCP2515 registers? We could at least then see the current state of the chip with all the gory details.
> > >
> > > Regards, Mark.
> > >
> > > On 1 Apr 2019, at 3:49 PM, ovms <ovms at topphemmelig.no> wrote:
> > > > Looks the same, see attachment.
> > > >
> > > > -Stein Arne-
> > > >
> > > > On 2019-04-01 09:38, Mark Webb-Johnson <mark at webb-johnson.net> wrote:
> > > > > This is the thread from last summer talking about CAN bus lock-ups.
> > > > >
> > > > > I’m guessing this is still happening for Leaf? Since the changes made below (back then), I haven’t seen it in my car.
> > > > >
> > > > > Next time this happens to you guys (can2 stopped), can you get a ‘can can2 status’, wait 30 seconds, and repeat. Then send it here for us to look at.
> > > > >
> > > > > Regards, Mark.
> > > > >
> > > > > On 7 Jul 2018, at 10:37 PM, Mark Webb-Johnson <mark at webb-johnson.net> wrote:
> > > > > > I’m trying to recreate this with my three-can-buses-connected DB9 plugged in. Transmitting on CAN1 should make it appear on CAN2 and CAN3.
> > > > > >
> > > > > > Here is what I see:
> > > > > >
> > > > > >   OVMS# can can1 start active 1000000
> > > > > >   Can bus can1 started in mode active at speed 1000000bps
> > > > > >   OVMS# can can2 start active 1000000
> > > > > >   Can bus can2 started in mode active at speed 1000000bps
> > > > > >   OVMS# test cantx can1 25000
> > > > > >   Testing 25000 frames on can1
> > > > > >   Transmitted 25000 frames in 6.466209s = 258us/frame
> > > > > >
> > > > > >   OVMS# can can1 status
> > > > > >   CAN: can1
> > > > > >   Mode: Active
> > > > > >   Speed: 1000000
> > > > > >   Interrupts: 24771
> > > > > >   Rx pkt: 0
> > > > > >   Rx err: 0
> > > > > >   Rx ovrflw: 0
> > > > > >   Tx pkt: 24880
> > > > > >   Tx delays: 24703
> > > > > >   Tx err: 0
> > > > > >   Tx ovrflw: 109
> > > > > >   Err flags: 0
> > > > > >
> > > > > >   OVMS# can can2 status
> > > > > >   CAN: can2
> > > > > >   Mode: Active
> > > > > >   Speed: 1000000
> > > > > >   Interrupts: 19084
> > > > > >   Rx pkt: 24770
> > > > > >   Rx err: 0
> > > > > >   Rx ovrflw: 1
> > > > > >   Tx pkt: 0
> > > > > >   Tx delays: 0
> > > > > >   Tx err: 0
> > > > > >   Tx ovrflw: 0
> > > > > >   Err flags: 0x2040
> > > > > >
> > > > > > Note the err flags 0x2040 on CAN2, but the bus remains up and working fine.
> > > > > >
> > > > > > Repeating the test gives us:
> > > > > >
> > > > > >   OVMS# test cantx can1 25000
> > > > > >   Testing 25000 frames on can1
> > > > > >   Transmitted 25000 frames in 6.479670s = 259us/frame
> > > > > >
> > > > > >   OVMS# can can1 status
> > > > > >   CAN: can1
> > > > > >   Mode: Active
> > > > > >   Speed: 1000000
> > > > > >   Interrupts: 49546
> > > > > >   Rx pkt: 0
> > > > > >   Rx err: 0
> > > > > >   Rx ovrflw: 0
> > > > > >   Tx pkt: 49771
> > > > > >   Tx delays: 49417
> > > > > >   Tx err: 0
> > > > > >   Tx ovrflw: 207
> > > > > >   Err flags: 0
> > > > > >
> > > > > >   OVMS# can can2 status
> > > > > >   CAN: can2
> > > > > >   Mode: Active
> > > > > >   Speed: 1000000
> > > > > >   Interrupts: 38288
> > > > > >   Rx pkt: 49545
> > > > > >   Rx err: 0
> > > > > >   Rx ovrflw: 3
> > > > > >   Tx pkt: 0
> > > > > >   Tx delays: 0
> > > > > >   Tx err: 0
> > > > > >   Tx ovrflw: 0
> > > > > >   Err flags: 0x2040
> > > > > >
> > > > > > Looking at the mcp2515 code, it seems that Err flags are only stored "if (intstat & 0b10100000)". That is "MERRF 0x80 = message tx/rx error" or "ERRIF 0x20 = overflow / error state change". It is also set to "(intstat & 0b10100000) << 8 | errflag", so doesn’t show all the error statuses. It is hard to rely on that for other errors/status on lock-up. Given that error_flags is a uint32_t, I think we can store more in it to allow for better diagnosis.
> > > > > >
> > > > > > I changed the mcp2515 driver to always set error_flags, on each interrupt handled (except spurious interrupts with no flags found), as follows:
> > > > > >
> > > > > >   SSSSSSSSFFFFFFFF***EB01TLLLLLLLL
> > > > > >
> > > > > >   SSSSSSSS = intstat
> > > > > >   FFFFFFFF = errflag
> > > > > >   B = RXB0 or RXB1 overflow flags cleared
> > > > > >   0 = RXB0 overflowed
> > > > > >   1 = RXB1 overflowed
> > > > > >   T = TX buffer has become available
> > > > > >   E = Error/WakeUp flags were cleared
> > > > > >   LLLLLLLL = intflag
> > > > > >
> > > > > > I did find a problem on line 300: if (intstat & 0b10100000). I think that should be 0b11100000 (to also pick up the RXB0 overflow), and removed the m_status.rxbuf_overflow++ from RXB0 overflow (as it is not really an overflow - as RXB1 got the data).
> > > > > >
> > > > > > With those changes made, I get:
> > > > > >
> > > > > >   OVMS# test cantx can1 25000
> > > > > >   Testing 25000 frames on can1
> > > > > >   Transmitted 25000 frames in 6.389849s = 255us/frame
> > > > > >
> > > > > >   OVMS# can can1 status
> > > > > >   CAN: can1
> > > > > >   Mode: Active
> > > > > >   Speed: 1000000
> > > > > >   Interrupts: 24777
> > > > > >   Rx pkt: 0
> > > > > >   Rx err: 0
> > > > > >   Rx ovrflw: 0
> > > > > >   Tx pkt: 24884
> > > > > >   Tx delays: 24739
> > > > > >   Tx err: 0
> > > > > >   Tx ovrflw: 116
> > > > > >   Err flags: 0x00000000
> > > > > >
> > > > > >   OVMS# can can2 status
> > > > > >   CAN: can2
> > > > > >   Mode: Active
> > > > > >   Speed: 1000000
> > > > > >   Interrupts: 18935
> > > > > >   Rx pkt: 24777
> > > > > >   Rx err: 0
> > > > > >   Rx ovrflw: 0
> > > > > >   Tx pkt: 0
> > > > > >   Tx delays: 0
> > > > > >   Tx err: 0
> > > > > >   Tx ovrflw: 0
> > > > > >   Err flags: 0x01000001
> > > > > >
> > > > > > I don’t think I’ve fixed anything (apart from that minor issue with RXB0 overflow diagnostics), but hopefully the new error_flags display should help finding out what is causing this lockup. Hopefully I haven’t broken anything.
> > > > > >
> > > > > > Regards, Mark.
> > > > > >
> > > > > > On 7 Jul 2018, at 10:42 AM, Tom Parker <tom at carrott.org> wrote:
> > > > > > > On 07/07/18 00:05, Mark Webb-Johnson wrote:
> > > > > > > > Err flags 0x2040. The 0x20 part is the error interrupt. The 0x40 part is "RX0OVR: Receive Buffer 0 Overflow Flag bit".
> > > > > > > >
> > > > > > > > Were the numbers on ‘can can2 status’ moving at all? Or completely stuck?
> > > > > > >
> > > > > > > None of the can can2 status numbers change when the can bus is broken. After power cycling it, they move.
> > > > > > >
> > > > > > > > Seems different than the fault Greg and I are seeing. This one is likely to be interrupt flag, or buffer overflow, not being cleared correctly. I’m guessing the overflow because that just doesn’t seem correct in mcp2515::RxCallback(). I’ll focus on that and have a look.
> > > > > > >
> > > > > > > I just checked the car again and it stopped with Rx ovrflw number only 1281, half what it got to last time.
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > OvmsDev mailing list
> > > > > > > OvmsDev at lists.openvehicles.com
> > > > > > > http://lists.openvehicles.com/mailman/listinfo/ovmsdev
> > > >
> > > > <IMG_2835.PNG><IMG_2836.PNG>
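
For anyone reading a pasted "Err flags" value from this thread, the packing Mark describes (SSSSSSSSFFFFFFFF***EB01TLLLLLLLL) can be unpacked mechanically. The following is only a sketch under assumptions: the struct and function names are hypothetical and are not identifiers from the OVMS source. The E bit position (0x1000) comes from the quoted driver snippet; the B, 0, 1 and T positions are read off the layout string and have not been checked against the driver. The three interrupt masks are the CANINTF bit values from the MCP2515 datasheet.

```cpp
#include <cstdint>

// Hypothetical names; not taken from the OVMS source tree.
// Layout as described in the thread: SSSSSSSS FFFFFFFF ***EB01T LLLLLLLL
struct Mcp2515ErrorFlags
  {
  std::uint8_t intstat;      // SSSSSSSS, bits 31..24
  std::uint8_t errflag;      // FFFFFFFF, bits 23..16
  bool err_wake_cleared;     // E, bit 12 (0x1000 in the quoted snippet)
  bool rx_overflow_cleared;  // B, bit 11 (position assumed from the layout string)
  bool rxb0_overflowed;      // 0, bit 10 (assumed)
  bool rxb1_overflowed;      // 1, bit 9  (assumed)
  bool tx_buffer_available;  // T, bit 8  (assumed)
  std::uint8_t low_byte;     // LLLLLLLL, bits 7..0
  };

// CANINTF bit masks from the MCP2515 datasheet.
constexpr std::uint8_t kMERRF = 0x80;  // message error interrupt flag
constexpr std::uint8_t kWAKIF = 0x40;  // wake-up interrupt flag
constexpr std::uint8_t kERRIF = 0x20;  // error interrupt flag

// Split a 32-bit error_flags word into its fields.
constexpr Mcp2515ErrorFlags decode_error_flags(std::uint32_t w)
  {
  return Mcp2515ErrorFlags{
    static_cast<std::uint8_t>((w >> 24) & 0xFFu),
    static_cast<std::uint8_t>((w >> 16) & 0xFFu),
    (w & 0x1000u) != 0,
    (w & 0x0800u) != 0,
    (w & 0x0400u) != 0,
    (w & 0x0200u) != 0,
    (w & 0x0100u) != 0,
    static_cast<std::uint8_t>(w & 0xFFu)};
  }

// Worked example with the value reported in the thread (0x80001080):
// intstat has MERRF set, errflag is clear, and the E bit is set.
static_assert((decode_error_flags(0x80001080u).intstat & kMERRF) != 0, "MERRF set");
static_assert(decode_error_flags(0x80001080u).errflag == 0x00, "errflag clear");
static_assert(decode_error_flags(0x80001080u).err_wake_cleared, "E bit set");
```

The checks are compile-time only, so the snippet passes if it compiles. Feeding it a value copied from a ‘can can2 status’ output gives the same split Mark did by hand for 0x80001080.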

