Re: [Ovmsdev] OVMS Poller module/singleton
Hi.

Tested with MCP_CAN lib settings, same thing as with SAE recommendation.

// cnf1=0x00; cnf2=0xf0; cnf3=0x86; original code
// SAE/CiA recommendation
// PROP=5, PS1=8, PS2=2, SJW=2, Sample 1x @87.5%
//cnf1=0x40; cnf2=0xbc; cnf3=0x81;
// Arduino MCP_CAN lib
// PROP=6, PS1=5, PS2=4, SJW=2, Sample 3x @75%
cnf1=0x40; cnf2=0xe5; cnf3=0x83;

E (445282) can: can2: intr=240803 rxpkt=242887 txpkt=6 errflags=0x22401c02 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
E (446202) can: can2: intr=242315 rxpkt=244423 txpkt=6 errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
E (448472) can: can2: intr=246089 rxpkt=248236 txpkt=6 errflags=0x22401c02 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
E (448472) can: can2: intr=246089 rxpkt=248237 txpkt=6 errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0

OVMS# can can1 status
CAN: can1
Mode: Active
Speed: 500000
DBC: none

Interrupts: 112548
Rx pkt: 112559
Rx ovrflw: 0
Tx pkt: 18
Tx delays: 0
Tx ovrflw: 0
Tx fails: 0

Err flags: 0x00000000
Rx err: 0
Tx err: 0
Rx invalid: 0
Wdg Resets: 0
Wdg Timer: 1 sec(s)
Err Resets: 0

OVMS# can can2 status
CAN: can2
Mode: Active
Speed: 500000
DBC: none

Interrupts: 291197
Rx pkt: 293642
Rx ovrflw: 0
Tx pkt: 6
Tx delays: 0
Tx ovrflw: 0
Tx fails: 0

Err flags: 0x01000001
Rx err: 0
Tx err: 0
Rx invalid: 0
Wdg Resets: 0
Wdg Timer: 9 sec(s)
Err Resets: 0

Values from can2 are not present in leaf metrics.
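For readers who want to check the PROP/PS1/PS2/SJW annotations against the register values, here is a minimal sketch of decoding the three MCP2515 configuration registers. It assumes the register layout from the MCP2515 datasheet (SJW and BRP in CNF1; BTLMODE, SAM, PHSEG1 and PRSEG in CNF2; PHSEG2 in CNF3) and that BTLMODE is set, as it is in all three value sets above. The names `BitTiming`, `decode_cnf`, `tq_per_bit` and `sample_point_permille` are made up for this illustration and are not part of the OVMS driver.

```cpp
#include <cstdint>

// Bit timing decoded from CNF1..CNF3, with segment lengths in time quanta (TQ).
struct BitTiming {
  unsigned sjw;    // synchronization jump width
  unsigned brp;    // baud rate prescaler register value (raw, not TQ)
  unsigned prop;   // propagation segment
  unsigned ps1;    // phase segment 1
  unsigned ps2;    // phase segment 2 (taken from CNF3, valid when BTLMODE=1)
  bool triple;     // SAM bit: true = bus sampled three times per bit
};

// Each length field is stored as (length - 1), hence the "+ 1u".
constexpr BitTiming decode_cnf(uint8_t cnf1, uint8_t cnf2, uint8_t cnf3)
{
  return BitTiming{
    ((cnf1 >> 6) & 0x03u) + 1u,   // SJW:    CNF1 bits 7..6
    cnf1 & 0x3Fu,                 // BRP:    CNF1 bits 5..0
    (cnf2 & 0x07u) + 1u,          // PRSEG:  CNF2 bits 2..0
    ((cnf2 >> 3) & 0x07u) + 1u,   // PHSEG1: CNF2 bits 5..3
    (cnf3 & 0x07u) + 1u,          // PHSEG2: CNF3 bits 2..0
    (cnf2 & 0x40u) != 0           // SAM:    CNF2 bit 6
  };
}

// One bit time = sync segment (always 1 TQ) + PROP + PS1 + PS2.
constexpr unsigned tq_per_bit(BitTiming t)
{
  return 1u + t.prop + t.ps1 + t.ps2;
}

// Sample point (end of PS1) in tenths of a percent of the bit time, rounded down.
constexpr unsigned sample_point_permille(BitTiming t)
{
  return (1u + t.prop + t.ps1) * 1000u / tq_per_bit(t);
}
```

Under these assumptions, `decode_cnf(0x40, 0xbc, 0x81)` yields PROP=5, PS1=8, PS2=2, SJW=2 with single sampling at 875 permille, and `decode_cnf(0x40, 0xe5, 0x83)` yields PROP=6, PS1=5, PS2=4, SJW=2 with triple sampling at 750 permille, matching the comments in the message.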
On 2025-01-23 12:11, Developer From Jokela via OvmsDev wrote:
E (445282) can: can2: intr=240803 rxpkt=242887 txpkt=6 errflags=0x22401c02 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
Ok, now putting Developer From Jokela's post together with Michael's I think I understand what this is telling us.

First, E means error.

The presence of this line in the log means there has been an error on can bus 2. The values given appear to be the current values of some counters.

The code is in can/src/can.cpp. This has basic deduplication by summing most of the counters and errflags, and only logging if this changes. Interestingly both of us saw can2 errors in pairs: one with errflags=0x23401c01 and one with 0x22401c02. In my case these were either 0 or 10 ms apart.

The flags appear to derive from mcp2515/src/mcp2515.cpp but it's not clear to me what they mean.
OVMS# can can2 status
CAN: can2
Mode: Active
Speed: 500000
DBC: none

Interrupts: 291197
Rx pkt: 293642
Rx ovrflw: 0
Tx pkt: 6
Tx delays: 0
Tx ovrflw: 0
Tx fails: 0

Err flags: 0x01000001
Rx err: 0
Tx err: 0
Rx invalid: 0
Wdg Resets: 0
Wdg Timer: 9 sec(s)
Err Resets: 0

Question: If the log tells us that can2 errors have occurred, why does "can can2 status" report zero values for errors?

Chris
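The deduplication described in this message (sum some fields, log only when the sum changes) can be sketched as follows. This is an illustration of the idea, not the code in can/src/can.cpp: the struct `StatusSnapshot`, the class `StatusLogFilter` and the choice of which fields enter the sum are all assumptions made for this example.

```cpp
#include <cstdint>

// Hypothetical subset of the values printed in a status log line.
struct StatusSnapshot {
  uint32_t error_flags;
  uint32_t rx_errors;
  uint32_t tx_errors;
  uint32_t rx_overflows;
  uint32_t tx_overflows;
  uint32_t tx_fails;
};

// Additive checksum over the error-related fields. Pure traffic counters
// (interrupts, received frames) are left out on purpose: including them
// would change the sum on every frame and defeat the deduplication.
constexpr uint32_t status_checksum(const StatusSnapshot& s)
{
  return s.error_flags + s.rx_errors + s.tx_errors
       + s.rx_overflows + s.tx_overflows + s.tx_fails;
}

// Remembers the checksum of the last logged snapshot.
class StatusLogFilter {
public:
  // Returns true if the snapshot differs from the last logged one
  // (by checksum) and should therefore be written to the log.
  bool should_log(const StatusSnapshot& s)
  {
    const uint32_t sum = status_checksum(s);
    if (sum == m_last_sum) return false;
    m_last_sum = sum;
    return true;
  }

private:
  uint32_t m_last_sum = 0;
};
```

With a filter like this, two snapshots that differ only in errflags (0x23401c01, then 0x22401c02) each produce a log line, which would be consistent with the pairs seen above. A plain sum can miss a change when two fields move in opposite directions by the same amount; that is a general limit of this kind of checksum.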
The CAN status variables are all explained in can.h:

// CAN status
typedef struct
  {
  uint32_t interrupts;        // interrupts
  uint32_t packets_rx;        // frames received
  uint32_t packets_tx;        // frames sent successfully
  uint32_t txbuf_delay;       // frames routed through TX queue
  uint16_t rxbuf_overflow;    // frames lost due to RX buffers full
  uint16_t txbuf_overflow;    // TX queue overflows
  uint32_t tx_fails;          // TX failures/aborts
  uint32_t error_flags;       // driver specific bitset
  uint16_t errors_rx;         // RX error counter
  uint16_t errors_tx;         // TX error counter
  uint16_t invalid_rx;        // RX invalid frame counter
  uint16_t watchdog_resets;   // Watchdog reset counter
  uint16_t error_resets;      // Error resolving reset counter
  uint32_t error_time;        // monotonictime of last error state detection
  } CAN_status_t;

The driver specific error flags need to be decoded in the context of the transceiver type: can1 = esp32can driver, loosely SJA1000 compatible, can2 & can3 = mcp2515 driver.

esp32can:
  error_flags = error_irqs << 16 | (status & 0b11001110) << 8 | (ecc & 0xff);
esp32can error & status bits are documented in "esp32can_regdef.h", see SJA1000 documentation for more details.

mcp2515:
  error_flags = (intstat << 24) | (errflag << 16) | intflag;
mcp2515 interrupt & error bits are documented in "mcp2515_regdef.h", see MCP2515 documentation for more details, plus some synthetic internal debugging flags as ORed in by the driver in `mcp2515::AsynchronousInterruptHandler()` beginning on line 500.
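As a minimal sketch, the esp32can packing formula can be reversed to split an errflags value into its three parts. Only the formula itself comes from the message; the struct `Esp32canFlags` and the function `split_esp32can` are names invented for this example, and no meaning is assigned to individual bits here.

```cpp
#include <cstdint>

// The three components packed into error_flags by the esp32can formula.
struct Esp32canFlags {
  uint16_t error_irqs;  // upper 16 bits
  uint8_t status;       // status register bits kept by the 0b11001110 mask
  uint8_t ecc;          // lowest 8 bits
};

// Reverses: error_irqs << 16 | (status & 0b11001110) << 8 | (ecc & 0xff)
constexpr Esp32canFlags split_esp32can(uint32_t error_flags)
{
  return Esp32canFlags{
    static_cast<uint16_t>(error_flags >> 16),
    static_cast<uint8_t>((error_flags >> 8) & 0xCEu),  // 0xCE == 0b11001110
    static_cast<uint8_t>(error_flags & 0xFFu)
  };
}
```

Applied to the can1 value 0x00040c00 reported later in this thread, the sketch gives error_irqs = 0x0004, status = 0x0c and ecc = 0x00; what the individual bits mean is documented in "esp32can_regdef.h".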
Interestingly both of us saw can2 errors in pairs: one with errflags=0x23401c01 and one with 0x22401c02. In my case these were either 0 or 10 ms apart.
* intstat 0x23 = 0b00100011 = Error state change (details in EFLG) | RX buffer 1 full | RX buffer 0 full
  o … 0x22 = same, but only RX buffer 1 full (RXB0 has just been cleared by the driver)
* errflag 0x40 = 0b01000000 = Receive Buffer 0 Overflow
* intflag 0x1c01 = all indications detected & processed at RX buffer 0
  o … 0x1c02 = same, but at RX buffer 1

So these are simply indications of RX buffer 0 overflows. These are not "real" overflows, as RX buffer 1 was still available, so the frames were not lost and the rxbuf_overflow counter was not incremented.

The driver could clear this condition to avoid signaling this as a CAN error. I think we kept it that way because an RXB0 overflow already indicates a lot of traffic on the bus. This could also result in a warning log, but the CAN framework doesn't know how to distinguish driver specific errors from warnings.
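The decoding above can be written down as a small sketch. It splits an mcp2515 errflags value according to the formula given earlier and tests the bits named in the list. The bit positions are those of the MCP2515 CANINTF and EFLG registers; the struct, function and constant names are invented for this example and are not taken from "mcp2515_regdef.h". The synthetic flags in the upper byte of intflag are not interpreted.

```cpp
#include <cstdint>

// The three components packed as (intstat << 24) | (errflag << 16) | intflag.
struct Mcp2515Flags {
  uint8_t intstat;   // interrupt flags (CANINTF)
  uint8_t errflag;   // error flags (EFLG)
  uint16_t intflag;  // driver-internal handling flags
};

constexpr Mcp2515Flags split_mcp2515(uint32_t error_flags)
{
  return Mcp2515Flags{
    static_cast<uint8_t>(error_flags >> 24),
    static_cast<uint8_t>((error_flags >> 16) & 0xFFu),
    static_cast<uint16_t>(error_flags & 0xFFFFu)
  };
}

// CANINTF bits used in the explanation above.
constexpr uint8_t kRx0Full = 0x01;      // RX buffer 0 full
constexpr uint8_t kRx1Full = 0x02;      // RX buffer 1 full
constexpr uint8_t kErrorChange = 0x20;  // error state change, details in EFLG

// EFLG bit used in the explanation above.
constexpr uint8_t kRx0Overflow = 0x40;  // receive buffer 0 overflow

constexpr bool rxb0_full(uint32_t error_flags)
{
  return (split_mcp2515(error_flags).intstat & kRx0Full) != 0;
}

constexpr bool rxb1_full(uint32_t error_flags)
{
  return (split_mcp2515(error_flags).intstat & kRx1Full) != 0;
}

// True if an error state change is signalled and the only EFLG bit set is
// the RXB0 overflow, i.e. the case described above as not a "real" overflow.
constexpr bool is_rxb0_overflow_only(uint32_t error_flags)
{
  return (split_mcp2515(error_flags).intstat & kErrorChange) != 0
      && split_mcp2515(error_flags).errflag == kRx0Overflow;
}
```

For the values in this thread, `is_rxb0_overflow_only` is true for 0x23401c01 and 0x22401c02 and false for 0x01000001, where only the RX buffer 0 bit is set and errflag is zero.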
Err flags: 0x01000001
* …simply means RXB0 has been received & read.

Regards,
Michael
_______________________________________________ OvmsDev mailing list OvmsDev@lists.openvehicles.com http://lists.openvehicles.com/mailman/listinfo/ovmsdev
-- Michael Balzer * Am Rahmen 5 * D-58313 Herdecke Fon 02330 9104094 * Handy 0176 20698926
On 2025-01-23 16:28, Michael Balzer via OvmsDev wrote:
The CAN status variables are all explained in can.h:
Thanks for your explanation, which is most helpful.
So these are simply indications of RX buffer 0 overflows. These are not "real" overflows, as RX buffer 1 was still available, so the frames were not lost and the rxbuf_overflow counter was not incremented.
So with the current code, I should interpret these pairs of errors as a single warning that can2 has a lot of activity, and if it gets worse could result in loss of a frame. It's possible that when I experienced the actual car issues there was something more significant logged. But we don't know because I pulled the plug and the log entries are missing.
The driver could clear this condition to avoid signaling this as a CAN error. I think we kept it that way because an RXB0 overflow already indicates a lot of traffic on the bus. This could also result in a warning log, but the CAN framework doesn't know how to distinguish driver specific errors from warnings.
Going back to your earlier advice, I think you're asking us to monitor the rate of can errors in the log, and observe if that changes with different bus timing. And also monitor can status.

With my current firmware (existing timing, single sampling, both buses in active mode) I see can1 is a bit busier than can2 (10 million vs 8.6 million). Can1 has had one genuine receive overflow.

OVMS# can can1 status
CAN: can1
Mode: Active
Speed: 500000
DBC: none

Interrupts: 10231357
Rx pkt: 10228979
Rx ovrflw: 1
Tx pkt: 1343
Tx delays: 50
Tx ovrflw: 0
Tx fails: 998

Err flags: 0x00000000
Rx err: 0
Tx err: 0
Rx invalid: 0
Wdg Resets: 0
Wdg Timer: 0 sec(s)
Err Resets: 1

OVMS# can can2 status
CAN: can2
Mode: Active
Speed: 500000
DBC: none

Interrupts: 8661395
Rx pkt: 8672342
Rx ovrflw: 0
Tx pkt: 40
Tx delays: 0
Tx ovrflw: 0
Tx fails: 0

Err flags: 0x01000001
Rx err: 0
Tx err: 0
Rx invalid: 0
Wdg Resets: 0
Wdg Timer: 0 sec(s)
Err Resets: 0

On the transmit side nearly all our activity is on can1. There have been a huge proportion of failures, but this is the only can1 error I can find in the log.

2025-01-23 16:37:06.464 GMT E (31051994) esp32can: can1 stuck bus-off error state (errflags=0x00040c00) detected - resetting bus

It's quite possible there are others from earlier in the day, when I had debug level on. I've now set the level to warn so it should be easy to spot everything significant. I'll update when I have something to report.

Chris
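For monitoring rates from the log, one option is to pull individual `key=value` fields out of the can status log lines and compare successive lines. The following is a minimal sketch of such an extractor; `log_field` is a name invented for this example, it only handles the unsigned decimal and `0x` hex values seen in the lines quoted in this thread, and it needs C++14 or later for the `constexpr` loops.

```cpp
#include <cstdint>

// Returns the unsigned value that follows "key=" in a status log line, or
// `fallback` if the key is not present. Values starting with "0x" are read
// as hexadecimal, everything else as decimal.
constexpr uint32_t log_field(const char* line, const char* key,
                             uint32_t fallback = 0)
{
  for (const char* p = line; *p != '\0'; ++p) {
    // A key must start at the beginning of the line or right after a space,
    // so that "pkt" does not match inside "rxpkt".
    if (p != line && p[-1] != ' ') continue;

    // Compare the key character by character, then require '='.
    const char* a = p;
    const char* k = key;
    while (*k != '\0' && *a == *k) { ++a; ++k; }
    if (*k != '\0' || *a != '=') continue;
    ++a;  // skip '='

    uint32_t base = 10;
    if (a[0] == '0' && a[1] == 'x') { base = 16; a += 2; }

    // Accumulate digits until the first character that is not a digit.
    uint32_t value = 0;
    for (;; ++a) {
      uint32_t digit = 0;
      if (*a >= '0' && *a <= '9') {
        digit = static_cast<uint32_t>(*a - '0');
      } else if (base == 16 && *a >= 'a' && *a <= 'f') {
        digit = static_cast<uint32_t>(*a - 'a') + 10u;
      } else if (base == 16 && *a >= 'A' && *a <= 'F') {
        digit = static_cast<uint32_t>(*a - 'A') + 10u;
      } else {
        break;
      }
      value = value * base + digit;
    }
    return value;
  }
  return fallback;
}
```

As a usage example, the first two can2 lines at the top of the thread give rxpkt=242887 and rxpkt=244423, a difference of 1536 frames between the stamps 445282 and 446202. If the number in parentheses is a millisecond uptime (the usual ESP-IDF log format, assumed here), that is roughly 1670 frames per second on can2.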
A bus-off condition is potentially more significant than an occasional RX overflow. Bus-off is encountered by the transceiver if it sees too many frame errors on the bus. It normally recovers by itself from that condition, with the exception of transmitting on a switched off bus. That's for example the case on the VW e-Up, where can1 has no process data and only connects to the diag gateway: when the gateway switches off and the OVMS probes the car status using a poll, it frequently has to reset the bus from this state.

The transceiver would see many false frame errors (potentially causing the vehicle issue observed) if the timing is off for the Leaf, so seeing actual bus errors logged just before the event would be a strong indication. But we shouldn't jump to conclusions here, let's first try to determine which of the buses actually causes the issue.

Regarding logs, please consider logging to SD card, to gather as many as possible. Also, setting the log level to "warn" will exclude all lower priority log entries. Rather set it to "debug" or even "verbose" for "can", "esp32can" & "mcp2515". If you only set that high level for the components of interest, you can probably keep the general level at "warn" or "info".

Btw, level "debug" for "events" will log all non-ticker events, useful to later identify the vehicle state transitions if you don't have any logs from the vehicle / poller.

Regards,
Michael

On 2025-01-23 18:50, Chris Box wrote:
On the transmit side nearly all our activity is on can1. There have been a huge proportion of failures, but this is the only can1 error I can find in the log.

2025-01-23 16:37:06.464 GMT E (31051994) esp32can: can1 stuck bus-off error state (errflags=0x00040c00) detected - resetting bus
It's quite possible there are others from earlier in the day, when I had debug level on. I've now set the level to warn so it should be easy to spot everything significant. I'll update when I have something to report.
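As background for the bus-off discussion: in the CAN protocol, a controller keeps a transmit error counter (TEC) and a receive error counter (REC) and changes its error state at fixed thresholds. The sketch below only illustrates these generic fault-confinement rules. The names are invented for this example, it is not code from the esp32can or mcp2515 drivers, and it makes no claim about how the "Rx err" and "Tx err" values in the OVMS status output relate to these counters.

```cpp
// Error states defined by CAN fault confinement.
enum class CanErrorState {
  ErrorActive,   // normal operation
  ErrorPassive,  // node still takes part, but may only send passive error flags
  BusOff         // node no longer takes part in bus traffic
};

// Classifies the error state from the two error counters:
//   TEC above 255         -> bus-off
//   TEC or REC above 127  -> error passive
//   otherwise             -> error active
constexpr CanErrorState classify_error_state(unsigned tec, unsigned rec)
{
  return tec > 255u ? CanErrorState::BusOff
       : (tec > 127u || rec > 127u) ? CanErrorState::ErrorPassive
       : CanErrorState::ErrorActive;
}
```

Only the transmit counter can take a node to bus-off, which fits the remark above that transmitting on a switched-off bus is the case that does not recover by itself.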
Sorry, I just realized I messed up the terminology: when writing "transceiver" I meant "controller". The transceiver only translates electrically between the bus and the controller logic levels; the controller implements the actual CAN logic.

On 2025-01-23 19:49, Michael Balzer via OvmsDev wrote:
Bus-off is encountered by the transceiver if it sees too many frame errors on the bus.
participants (3): Chris Box, Developer From Jokela, Michael Balzer