I tried, and mostly failed, to fix the receive side of the MCP2515 chip driver. It can still get into a bad state at times, and the only way to recover seems to be to close the driver and restart it.

The condition (or at least one such condition) can be reproduced at will by having traffic on the bus at the time the driver is opened. In my case, it was for the OBD2ECU task. If I had the HUD device running (polling for data) at the time the task was started, it would nearly always hang: no frames were received from the bus, even though the status flags didn't indicate an overflow. So what I do when starting the OBD2ECU task is to first turn off the external 12V to the HUD to quiet the bus, start the task, then turn the 12V back on. I do this in a short event script tied to vehicle.on:
OVMS# vfs cat /store/events/vehicle.on/myevent
power ext12v off
obdii ecu stop
obdii ecu start can3
power ext12v on
OVMS#
Critically, the chip does not appear to have any way to command a reset of just the receive side state machine; if it did, that line #467 log entry would be one place to put it. But it appears that there are other ways to get the receive side to hang, separate from a simple overflow, and I don't know enough about the chip or its driver to determine an exact set of causes. As Michael notes, there is definitely a race condition in the system, and I would add that the chip itself is not necessarily blameless in this regard. It feels like we're an interrupt short of a full deck here. I even tried forcing a read of the chip, but it came up empty, so something in its receive state machine is locked up.

Good luck!

Greg


Michael Balzer wrote:
Derek,

all error conditions can be relevant for debugging. For example if you need to react to some frames as fast as possible, you may want to know if/when overflows occur, as they indicate processing delays.

Also be aware LogStatus() injects the error status into the CAN logging framework, while ESP_LOGW() only creates a system log entry.

Again for your error codes:
1970-01-01 13:01:41 NZDT E (89501) can: can2: intr=93510 rxpkt=127559 txpkt=14 errflags=0x23000001 rxerr=0 txerr=0 rxovr=0 txovr=0 txdelay=0 wdgreset=0 errreset=0
1970-01-01 13:01:41 NZDT E (89511) can: can2: intr=93511 rxpkt=127564 txpkt=14 errflags=0x23400401 rxerr=0 txerr=0 rxovr=0 txovr=0 txdelay=0 wdgreset=0 errreset=0
1970-01-01 13:01:41 NZDT E (89531) can: can2: intr=93525 rxpkt=127592 txpkt=14 errflags=0x22400402 rxerr=0 txerr=0 rxovr=0 txovr=0 txdelay=0 wdgreset=0 errreset=0

The strange combination 0x2300.... (error interrupt w/o error flags) may here also indicate a bug / race condition in the driver.
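
To make the three log lines above easier to read, here is a small decoding sketch. It assumes that the upper byte of errflags holds the MCP2515 CANINTF register and the next byte holds EFLG; that layout is inferred from this discussion (0x23 with ERRIF set, 0x40 being RX0OVR), not checked against the driver source. The low 16 bits are left undecoded, and all function names are made up for illustration.

```cpp
#include <cstdint>
#include <string>

// Bit names from the MCP2515 datasheet, listed LSB first.
static const char* const kCanintfNames[8] = {
    "RX0IF", "RX1IF", "TX0IF", "TX1IF", "TX2IF", "ERRIF", "WAKIF", "MERRF"};
static const char* const kEflgNames[8] = {
    "EWARN", "RXWAR", "TXWAR", "RXEP", "TXEP", "TXBO", "RX0OVR", "RX1OVR"};

// ASSUMED layout of the logged errflags word (not confirmed from mcp2515.cpp):
//   bits 31..24 = CANINTF, bits 23..16 = EFLG, bits 15..0 = not decoded here.
struct DecodedErrFlags {
    uint8_t canintf;
    uint8_t eflg;
    uint16_t low;
};

inline DecodedErrFlags split_errflags(uint32_t errflags) {
    return DecodedErrFlags{
        static_cast<uint8_t>(errflags >> 24),
        static_cast<uint8_t>(errflags >> 16),
        static_cast<uint16_t>(errflags & 0xFFFF)};
}

// Join the names of all set bits with '|', or return "none" if no bit is set.
inline std::string bit_names(uint8_t value, const char* const names[8]) {
    std::string out;
    for (int bit = 0; bit < 8; ++bit) {
        if (value & (1u << bit)) {
            if (!out.empty()) out += '|';
            out += names[bit];
        }
    }
    return out.empty() ? "none" : out;
}

// Human-readable summary of the two register bytes.
inline std::string describe_errflags(uint32_t errflags) {
    const DecodedErrFlags d = split_errflags(errflags);
    return "CANINTF=" + bit_names(d.canintf, kCanintfNames) +
           " EFLG=" + bit_names(d.eflg, kEflgNames);
}
```

Under that assumption, describe_errflags(0x23000001) reports ERRIF with no EFLG bits set, which is the odd combination mentioned above, while the other two lines decode to RX0OVR only.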

Looking at the current code now: the LogStatus() call should only be done after having collected all error info, but the handler sets m_status.error_flags after the call. So the picture you got might be wrong; I suggest verifying / debugging the handler first.
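
The ordering issue can be illustrated with a toy model. This is only a sketch: Status, StatusLog and both handler functions are hypothetical stand-ins, not the actual driver classes, and the real handler does considerably more. It only shows why logging before storing the flags records the previous value.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the driver's status struct.
struct Status {
    uint32_t error_flags = 0;
};

// Hypothetical stand-in for the logging hook: records what it sees at call time.
struct StatusLog {
    std::vector<uint32_t> entries;
    void log(const Status& s) { entries.push_back(s.error_flags); }
};

// Order as described in the thread: log first, store the new flags afterwards.
// The log entry then contains the flags from the previous interrupt.
inline void handle_log_then_update(Status& st, StatusLog& log, uint32_t new_flags) {
    log.log(st);
    st.error_flags = new_flags;
}

// Suggested order: collect all error info first, then log it.
inline void handle_update_then_log(Status& st, StatusLog& log, uint32_t new_flags) {
    st.error_flags = new_flags;
    log.log(st);
}
```

With the first ordering, each log entry lags one interrupt behind the actual chip state, which would be one possible explanation for seeing an error interrupt without matching error flags.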

Regards,
Michael


On 17.07.20 at 09:31, Derek Caudwell wrote:
Thanks Michael,

That explains why I was struggling to decode the error flag. Having been through the MCP2515 docs and decoded the logged error flags that can2 is reporting, I see they all relate to an overflow of RXB0 but not RXB1. Hence this does not really pose or indicate a problem for the CAN bus / MCP2515 chip, correct?

So, if I have understood the chip operation correctly, is there any reason for line 489 in mcp2515.cpp, which causes these to be logged?

489: LogStatus(CAN_LogStatus_Error);

After all, if RXB1 does overflow and cause a packet loss, then line 467 deals with that?

467: ESP_LOGW(TAG, "CAN Bus 2/3 receive overflow; Frame lost.");

Cheers Derek


_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev

-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26

