[Ovmsdev] Can buses stop after some time

Mark Webb-Johnson mark at webb-johnson.net
Thu May 30 08:50:59 HKT 2019


In the case of the vehicle modules, the CAN ports are open at boot, work well, but fail some time later.

The issue seems to be caused by errors on the CAN bus that are not recovered from, and opening the port could trigger those. But in the case of the vehicle modules, I don’t think that is the cause.

Regards, Mark.

> On 30 May 2019, at 1:25 AM, Greg D. <gregd2350 at gmail.com> wrote:
> 
> Hi Tom,
> 
> My suspicion is that it's the opening of the CAN port where the problem is.  I tried putting delays in the OBD2ECU code to cause a receive overrun, and things recovered just fine when the delay cleared.  Writes to the CAN bus are generally once per read (request and response sequence), but there are a few places where the task can issue multiple writes to the bus back-to-back.  Some time ago, two writes were fine, but three (I think) would cause a transmit overflow.  Michael and I fixed that, so even if you're sending multiple frames it should be fine.  Basically, once the bus is open and running, it seems to work pretty well.  It's just that 'open' procedure that seems to have a problem when the bus is already active.
> 
> Note that the only way I have found to clear the hung bus condition is to stop the traffic, close the port, open the port, and resume the traffic.
> 
> I haven't looked at the Leaf code, but I suspect that the climate commands open the CAN bus, write the necessary frames, and then close it afterward.  If so, that would definitely risk the issue I'm seeing.  If there's a way to open the environmental bus at boot time and leave it open, that should work around the problem, assuming of course that you survive the initial open.  I can control the external device with ext12v power; you can't control the Leaf environmental bus...
> 
> Greg
> 
> 
> Tom Parker wrote:
>> On the 2016 Leaf the Climate Control command is sent on the 'CAR' CAN bus, which is connected to one of the external CAN buses. I'm not sure if there is a task, or if there are other usages of the CAN2 bus in the Leaf implementation (there weren't last time I looked closely, but I haven't been able to keep up with the improvements others have made). The Climate Control does do task or timer related stuff because it sends the CAN message several times over a period of a second or two.
>> 
>> This is distinct from the earlier Leafs where climate control is on the EV can bus connected to the internal CAN bus.
>> 
>> On 29/05/19 2:57 PM, Greg D. wrote:
>>> Hi Stein,
>>> 
>>> Interesting.  Does the "start climate control" command start a new process/task on OVMS, or otherwise cause the CAN2 bus to be "opened" at the time of the command?  If so, that would be consistent with what I see. 
>>> 
>>> With the OBD2ECU translator, if I have an OBDII device (e.g. head-up display) running first, and then start the OBD2ECU task, the bus will hang every time.  If I start the task first, then power up the OBDII device, all will be fine.  There seems to be a collision between the task starting and the data coming in that triggers the bus hang.
>>> 
>>> I have a pair of scripts to turn on and turn off the Ext12v supply to the OBDII device when the car is turned on or off, respectively.  That's just to manage the OBDII device as if it were in an ICE vehicle.  What I do now, as a workaround for the bus hang, is to stop and restart the OBD2ECU task before turning on the ext12v supply in the car turn-on script.  That seems to work.
>>> 
>>> Not sure how that might help your problem (since you can't turn off the car's data stream), but at least it would confirm that we're looking at the same bug.  
>>> 
>>> Greg
>>> 
>>> 
>>> ovms wrote:
>>>> I just discovered something. Since the temperature has been pretty stable the last month (not cold, and not warm) I have not used the climate control function in OVMS.
>>>> With the latest firmware, CAN2 can receive data for many days, until I trigger the "start climate control" command. Then CAN2 stops receiving (and the "possible theft" message occurs). The car is a 2016 Leaf.
>>>>  
>>>> Kind regards,
>>>> Stein Arne Sordal
>>>>  
>>>>  
>>>> On 2019-04-01 14:56, Mark Webb-Johnson <mark at webb-johnson.net> wrote:
>>>> So this is with CAN2 not working?
>>>>  
>>>> Err flags are 0x80001080. For MCP2515 that is:
>>>>  
>>>> (intstat << 24) |
>>>> (errflag << 16) |
>>>> intflag
>>>>  
>>>> So intstat = 0x80, errflag = 0x00, intflag = 0x1080.
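
For anyone who wants to repeat that split on their own status output, here is a minimal sketch. It only assumes the shift layout given in the formula above; the struct and function names are hypothetical and are not taken from the OVMS source.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the OVMS source): splits a value packed as
// (intstat << 24) | (errflag << 16) | intflag back into its three fields.
struct PackedErrFlags {
  std::uint8_t  intstat;  // bits 31..24
  std::uint8_t  errflag;  // bits 23..16
  std::uint16_t intflag;  // bits 15..0
};

constexpr PackedErrFlags unpack_err_flags(std::uint32_t v) {
  return PackedErrFlags{
      static_cast<std::uint8_t>((v >> 24) & 0xFFu),
      static_cast<std::uint8_t>((v >> 16) & 0xFFu),
      static_cast<std::uint16_t>(v & 0xFFFFu)};
}

// The value reported in this thread, checked at compile time.
static_assert(unpack_err_flags(0x80001080u).intstat == 0x80, "intstat");
static_assert(unpack_err_flags(0x80001080u).errflag == 0x00, "errflag");
static_assert(unpack_err_flags(0x80001080u).intflag == 0x1080, "intflag");
```

The checks at the bottom simply restate the three values worked out by hand above.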
>>>>  
>>>> The 0x10.. in intflag indicates that this just ran:
>>>>  
>>>>   // clear error & wakeup interrupts:
>>>>   if (intstat & 0b11100000)
>>>>     {
>>>>     m_status.error_flags |= 0x1000;
>>>>     m_spibus->spi_cmd(m_spi, buf, 0, 4, CMD_BITMODIFY, 0x2c, intstat & 0b11100000, 0x00);
>>>>     }
>>>>  
>>>> The intstat=0x80 indicates "MERRF: Message Error Interrupt Flag bit". So that is a bus error, but errflag=0x00, so there is no indication that either the 96 or the 128 successive error threshold has been hit. The documentation (MCP2515 data sheet) for this says:
>>>>  
>>>> 7.4 Message Error Interrupt
>>>> 
>>>> When an error occurs during the transmission or reception of a message, the message error flag (CANINTF.MERRF) will be set and, if the CANINTE.MERRE bit is set, an interrupt will be generated on the INT pin. This is intended to be used to facilitate baud rate determination when used in conjunction with Listen-only mode.
>>>>  
>>>> I can’t see anything obviously wrong with that. Not sure how to proceed. Perhaps we need a command to dump all the MCP2515 registers? We could at least then see the current state of the chip with all the gory details.
>>>>  
>>>> Regards, Mark.
>>>>  
>>>> On 1 Apr 2019, at 3:49 PM, ovms <ovms at topphemmelig.no> wrote:
>>>> 
>>>> Looks the same, see attachment.
>>>>  
>>>> -Stein Arne-
>>>>  
>>>>  
>>>>  
>>>> On 2019-04-01 09:38, Mark Webb-Johnson <mark at webb-johnson.net> wrote:
>>>> This is the thread from last summer talking about CAN bus lock-ups.
>>>>  
>>>> I’m guessing this is still happening for Leaf? Since the changes made below (back then), I haven’t seen it in my car.
>>>>  
>>>> Next time this happens to you guys (can2 stopped), can you get a ‘can can2 status’, wait 30 seconds, and repeat. Then send it here for us to look at.
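
The point of taking two readings 30 seconds apart is to see whether any counter still moves. A rough sketch of that comparison follows, assuming the numbers from the status output have been copied by hand into a struct. The struct, its field names, and the function are hypothetical and are not OVMS code; the sample values are the two can2 readings quoted further down this thread.

```cpp
#include <cstdint>

// Hypothetical container for the counters printed by 'can can2 status'.
struct CanCounters {
  std::uint32_t interrupts;
  std::uint32_t rx_pkt;
  std::uint32_t rx_err;
  std::uint32_t rx_ovrflw;
  std::uint32_t tx_pkt;
  std::uint32_t tx_delays;
  std::uint32_t tx_err;
  std::uint32_t tx_ovrflw;
};

// True when no counter changed between the two snapshots, which is the
// "completely stuck" case described for the hung bus.
constexpr bool counters_stalled(const CanCounters& a, const CanCounters& b) {
  return a.interrupts == b.interrupts && a.rx_pkt == b.rx_pkt &&
         a.rx_err == b.rx_err && a.rx_ovrflw == b.rx_ovrflw &&
         a.tx_pkt == b.tx_pkt && a.tx_delays == b.tx_delays &&
         a.tx_err == b.tx_err && a.tx_ovrflw == b.tx_ovrflw;
}

// The two can2 readings from the test runs quoted later in this thread.
constexpr CanCounters kFirstReading  = {19084, 24770, 0, 1, 0, 0, 0, 0};
constexpr CanCounters kSecondReading = {38288, 49545, 0, 3, 0, 0, 0, 0};

static_assert(!counters_stalled(kFirstReading, kSecondReading),
              "counters moved, so the bus was still alive");
```

If every counter is identical across the two readings, that matches the stuck state where none of the status numbers change until a power cycle.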
>>>>  
>>>> Regards, Mark.
>>>> 
>>>> On 7 Jul 2018, at 10:37 PM, Mark Webb-Johnson <mark at webb-johnson.net> wrote:
>>>> 
>>>> I’m trying to recreate this with my three-can-buses-connected DB9 plugged in. Transmitting on CAN1 should make it appear on CAN2 and CAN3.
>>>>  
>>>> Here is what I see:
>>>>  
>>>> OVMS# can can1 start active 1000000
>>>> Can bus can1 started in mode active at speed 1000000bps
>>>> OVMS# can can2 start active 1000000
>>>> Can bus can2 started in mode active at speed 1000000bps
>>>> OVMS# test cantx can1 25000
>>>> Testing 25000 frames on can1
>>>> Transmitted 25000 frames in 6.466209s = 258us/frame
>>>>  
>>>> OVMS# can can1 status
>>>> CAN:       can1
>>>> Mode:      Active
>>>> Speed:     1000000
>>>> Interrupts:               24771
>>>> Rx pkt:                       0
>>>> Rx err:                       0
>>>> Rx ovrflw:                    0
>>>> Tx pkt:                   24880
>>>> Tx delays:                24703
>>>> Tx err:                       0
>>>> Tx ovrflw:                  109
>>>> Err flags: 0
>>>> 
>>>> OVMS# can can2 status
>>>> CAN:       can2
>>>> Mode:      Active
>>>> Speed:     1000000
>>>> Interrupts:               19084
>>>> Rx pkt:                   24770
>>>> Rx err:                       0
>>>> Rx ovrflw:                    1
>>>> Tx pkt:                       0
>>>> Tx delays:                    0
>>>> Tx err:                       0
>>>> Tx ovrflw:                    0
>>>> Err flags: 0x2040
>>>>  
>>>> Note the err flags 0x2040 on CAN2, but the bus remains up and working fine.
>>>>  
>>>> Repeating the test gives us:
>>>>  
>>>> OVMS# test cantx can1 25000
>>>> Testing 25000 frames on can1
>>>> Transmitted 25000 frames in 6.479670s = 259us/frame
>>>>  
>>>> OVMS# can can1 status
>>>> CAN:       can1
>>>> Mode:      Active
>>>> Speed:     1000000
>>>> Interrupts:               49546
>>>> Rx pkt:                       0
>>>> Rx err:                       0
>>>> Rx ovrflw:                    0
>>>> Tx pkt:                   49771
>>>> Tx delays:                49417
>>>> Tx err:                       0
>>>> Tx ovrflw:                  207
>>>> Err flags: 0
>>>>  
>>>> OVMS# can can2 status
>>>> CAN:       can2
>>>> Mode:      Active
>>>> Speed:     1000000
>>>> Interrupts:               38288
>>>> Rx pkt:                   49545
>>>> Rx err:                       0
>>>> Rx ovrflw:                    3
>>>> Tx pkt:                       0
>>>> Tx delays:                    0
>>>> Tx err:                       0
>>>> Tx ovrflw:                    0
>>>> Err flags: 0x2040
>>>>  
>>>> Looking at the mcp2515 code, it seems that Err flags are only stored "if (intstat & 0b10100000)". That is "MERRF 0x80 = message tx/rx error" or "ERRIF 0x20 = overflow / error state change". The value stored is "(intstat & 0b10100000) << 8 | errflag", so it doesn’t show all the error statuses. It is hard to rely on that for other errors/status on lock-up. Given that error_flags is a uint32_t, I think we can store more in it to allow for better diagnosis.
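
As a worked example of that older format, the 0x2040 seen on CAN2 can be taken apart as below. This is a sketch that assumes only the expression quoted above; the struct and function names are hypothetical and do not come from the driver.

```cpp
#include <cstdint>

// Assumed pre-change layout, from the expression quoted above:
//   error_flags = (intstat & 0b10100000) << 8 | errflag
struct OldErrFlags {
  std::uint8_t intstat_bits;  // only bits 0x80 and 0x20 of intstat survive
  std::uint8_t errflag;       // raw error flag register value
};

constexpr OldErrFlags unpack_old_err_flags(std::uint32_t v) {
  return OldErrFlags{
      static_cast<std::uint8_t>((v >> 8) & 0xA0u),
      static_cast<std::uint8_t>(v & 0xFFu)};
}

// 0x2040 from the can2 status output: 0x20 is the error interrupt and
// 0x40 is the RX0OVR bit, as described elsewhere in this thread.
static_assert(unpack_old_err_flags(0x2040u).intstat_bits == 0x20, "ERRIF");
static_assert(unpack_old_err_flags(0x2040u).errflag == 0x40, "RX0OVR");
```

Because everything except two intstat bits is masked off before storing, any other interrupt state at the time of a lock-up is invisible in this format.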
>>>>  
>>>> I changed the mcp2515 driver to always set error_flags, on each interrupt handled (except spurious interrupts with no flags found), as follows:
>>>>  
>>>> SSSSSSSSFFFFFFFF***EB01TLLLLLLLL
>>>> SSSSSSSS = intstat
>>>> FFFFFFFF = errflag
>>>> B = RXB0 or RXB1 overflow flags cleared
>>>> 0 = RXB0 overflowed
>>>> 1 = RXB1 overflowed
>>>> T = TX buffer has become available
>>>> E = Error/WakeUp flags were cleared
>>>> LLLLLLLL = intflag
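
The layout above can be read back with a few masks. The sketch below assumes the pattern is written most significant bit first, which puts E at 0x1000, B at 0x0800, '0' at 0x0400, '1' at 0x0200 and T at 0x0100. The E position agrees with the 0x1000 used in the driver snippet quoted in this thread; the other positions are inferred from the pattern and should be checked against the source. All names here are hypothetical.

```cpp
#include <cstdint>

// Hypothetical decoded view of the 32-bit error_flags value, following
// SSSSSSSSFFFFFFFF***EB01TLLLLLLLL read from bit 31 down to bit 0.
struct ErrFlagsView {
  std::uint8_t intstat;      // SSSSSSSS, bits 31..24
  std::uint8_t errflag;      // FFFFFFFF, bits 23..16
  bool err_wake_cleared;     // E, bit 12: error/wakeup flags were cleared
  bool overflow_cleared;     // B, bit 11: RXB0/RXB1 overflow flags cleared
  bool rxb0_overflowed;      // 0, bit 10
  bool rxb1_overflowed;      // 1, bit 9
  bool tx_buffer_available;  // T, bit 8
  std::uint8_t intflag;      // LLLLLLLL, bits 7..0
};

constexpr ErrFlagsView decode_err_flags(std::uint32_t v) {
  return ErrFlagsView{
      static_cast<std::uint8_t>((v >> 24) & 0xFFu),
      static_cast<std::uint8_t>((v >> 16) & 0xFFu),
      (v & 0x1000u) != 0,
      (v & 0x0800u) != 0,
      (v & 0x0400u) != 0,
      (v & 0x0200u) != 0,
      (v & 0x0100u) != 0,
      static_cast<std::uint8_t>(v & 0xFFu)};
}

// 0x80001080 from the later report in this thread: the E bit is set.
static_assert(decode_err_flags(0x80001080u).intstat == 0x80, "intstat");
static_assert(decode_err_flags(0x80001080u).err_wake_cleared, "E bit");
static_assert(decode_err_flags(0x80001080u).intflag == 0x80, "intflag");
```

Keeping the five single-bit markers separate from the low byte makes it easier to tell what the handler did from what the chip reported.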
>>>>  
>>>> I did find a problem on line 300: if (intstat & 0b10100000). I think that should be 0b11100000 (to also pick up the RXB0 overflow), and I removed the m_status.rxbuf_overflow++ from RXB0 overflow (as it is not really an overflow - as RXB1 got the data).
>>>>  
>>>> With those changes made, I get:
>>>>  
>>>> OVMS# test cantx can1 25000
>>>> Testing 25000 frames on can1
>>>> Transmitted 25000 frames in 6.389849s = 255us/frame
>>>> 
>>>> OVMS# can can1 status
>>>> CAN:       can1
>>>> Mode:      Active
>>>> Speed:     1000000
>>>> Interrupts:               24777
>>>> Rx pkt:                       0
>>>> Rx err:                       0
>>>> Rx ovrflw:                    0
>>>> Tx pkt:                   24884
>>>> Tx delays:                24739
>>>> Tx err:                       0
>>>> Tx ovrflw:                  116
>>>> Err flags: 0x00000000
>>>> 
>>>> OVMS# can can2 status
>>>> CAN:       can2
>>>> Mode:      Active
>>>> Speed:     1000000
>>>> Interrupts:               18935
>>>> Rx pkt:                   24777
>>>> Rx err:                       0
>>>> Rx ovrflw:                    0
>>>> Tx pkt:                       0
>>>> Tx delays:                    0
>>>> Tx err:                       0
>>>> Tx ovrflw:                    0
>>>> Err flags: 0x01000001
>>>>  
>>>> I don’t think I’ve fixed anything (apart from that minor issue with RXB0 overflow diagnostics), but hopefully the new error_flags display should help finding out what is causing this lockup. Hopefully I haven’t broken anything.
>>>>  
>>>> Regards, Mark.
>>>>  
>>>> On 7 Jul 2018, at 10:42 AM, Tom Parker <tom at carrott.org> wrote:
>>>> 
>>>> On 07/07/18 00:05, Mark Webb-Johnson wrote:
>>>> Err flags 0x2040. The 0x20 part is the error interrupt. The 0x40 part is "RX0OVR: Receive Buffer 0 Overflow Flag bit".
>>>> 
>>>> Were the numbers on ‘can can2 status’ moving at all? Or completely stuck?
>>>> 
>>>> None of the can can2 status numbers change when the can bus is broken. After power cycling it they move.
>>>> 
>>>> Seems different than the fault Greg and I are seeing. This one likely to be interrupt flag, or buffer overflow, not being cleared correctly. I’m guessing the overflow because that just doesn’t seem correct in mcp2515::RxCallback(). I’ll focus on that and have a look.
>>>> 
>>>> I just checked the car again and it stopped with Rx ovrflw number only 1281, half what it got to last time.
>>>> 
>>>> _______________________________________________
>>>> OvmsDev mailing list
>>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
