Chris, Michael,

by "regular status ping" I meant testing the car status while off by sending a request every few seconds. The "off" state for the Leaf is state 0, none of the polls has an interval in that slot. So while the Leaf is considered "off" by the module, there should be no new (!) requests.

If a final round of the BMS polls would have been kept in the queue, that would only contain at most one entry for each of the polls, i.e. four TX in total, with PIDs 0x01, 0x02, 0x06, 0x04 (with 0x04 only being polled every 5 minutes). The first TX_Fail would then remove the poll from the queue.

But your log shows repeated TX failures for 0x01, 0x02 & 0x06. So no, that's not normal.

That can only happen, if either the Leaf code has a bug with detecting the correct vehicle state (and with announcing that by some log / event, would need to be "on" or "charging" according to the polling scheme)…

…or the new poller has a bug, that leads to keeping or re-entering a wrong poll state. The latter could affect other vehicles as well.

1738275829.012407 CXX OVMS CRTD
1738275829.012407 CVR 3.1
1738275829.020355 1CXX Info Type:vfs Format:crtd(discard) Vehicle:NL Path:/sd/can4.crtd
1738275879.419663 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
1738275880.419598 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55
1738275881.419606 1CER TX_Fail T11 79B 02 21 06 55 55 55 55 55
1738275942.419724 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
1738275943.419618 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55
1738275944.419611 1CER TX_Fail T11 79B 02 21 06 55 55 55 55 55
1738276005.419663 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
1738276006.419606 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55

From the timestamps, that's the 60 seconds interval (63 due to the error/timeout), so that must have been poll state 1 (on) or 3 (charging).

You need to check the poll state transitions. These get logged by the poller (log tag "vehicle-poll"), which needs to be set to log level "debug" for this.

You should then see log entries "Pollers: Queue SetState" for the state change request & "Pollers: PollState(<state>[,<bus>])" for the actual state change.

A log entry "Pollers[SetState]: Task Queue Overflow" (btw, that's currently at "info" level, should be "error") indicates a state change request got lost. That should not happen.


@Michael: the poller currently allows state changes to get lost. I think that's a problem. Not properly transitioning the poll state could cause mayhem, as the wrong polls can cause damage, either by putting the vehicle in a fault mode, or simply by not letting the ECUs go to sleep, causing the 12V battery to drain.

I think state changes must not get lost, if that happens, that's rather a reason for `abort()` if it cannot be handled otherwise. The same applies to all poll list management commands. Maybe it's an option to make these queue requests blocking? Or to introduce a secondary queue for poller control commands?

Any additional ideas for Chris on how to investigate this further?

Regards,
Michael


Am 02.02.25 um 13:58 schrieb Chris Box:

Hi everyone,

On 2025-01-28 19:12, Michael Balzer wrote:

Reading the Leaf code, I don't see any regular status ping polling,
 
 
Actually there is some defined polling of BMS:
 
static const OvmsPoller::poll_pid_t obdii_polls[] =
  {
    // BUS 2
    { CHARGER_TXID, CHARGER_RXID, VEHICLE_POLL_TYPE_OBDIIGROUP, VIN_PID, {  0, 900, 0, 0 }, 2, ISOTP_STD },           // VIN [19]
    { CHARGER_TXID, CHARGER_RXID, VEHICLE_POLL_TYPE_OBDIIEXTENDED, QC_COUNT_PID, {  0, 900, 0, 0 }, 2, ISOTP_STD },   // QC [2]
    { CHARGER_TXID, CHARGER_RXID, VEHICLE_POLL_TYPE_OBDIIEXTENDED, L1L2_COUNT_PID, {  0, 900, 0, 0 }, 2, ISOTP_STD }, // L0/L1/L2 [2]
    // BUS 1
    { BMS_TXID, BMS_RXID, VEHICLE_POLL_TYPE_OBDIIGROUP, 0x01, {  0, 60, 0, 60 }, 1, ISOTP_STD },   // bat [39/41]
    { BMS_TXID, BMS_RXID, VEHICLE_POLL_TYPE_OBDIIGROUP, 0x02, {  0, 60, 0, 60 }, 1, ISOTP_STD },   // battery voltages [196]
    { BMS_TXID, BMS_RXID, VEHICLE_POLL_TYPE_OBDIIGROUP, 0x06, {  0, 60, 0, 60 }, 1, ISOTP_STD },   // battery shunts [96]
    { BMS_TXID, BMS_RXID, VEHICLE_POLL_TYPE_OBDIIGROUP, 0x04, {  0, 300, 0, 300 }, 1, ISOTP_STD }, // battery temperatures [14]
    POLL_LIST_END
  };
 
BMS_TXID is 0x79B, and this is mostly what I see appearing in CRTD overnight:
 
1738275829.012407 CXX OVMS CRTD
1738275829.012407 CVR 3.1
1738275829.020355 1CXX Info Type:vfs Format:crtd(discard) Vehicle:NL Path:/sd/can4.crtd
1738275879.419663 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
1738275880.419598 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55
1738275881.419606 1CER TX_Fail T11 79B 02 21 06 55 55 55 55 55
1738275942.419724 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
1738275943.419618 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55
1738275944.419611 1CER TX_Fail T11 79B 02 21 06 55 55 55 55 55
1738276005.419663 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
1738276006.419606 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55
 
The first night, that was everything I saw until a charge pilot event woke the car up at 05:50.
 
The second night, at 03:46 there were two records of apparently successful transmits, which coincides with errors being logged.
 
1738381606.559723 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
1738381607.560118 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55

1738381607.561672 1CER Error intr=32898937 rxpkt=32886195 txpkt=4872 errflags=0x204c00 rxerr=0 txerr=127 rxinval=0 rxovr=3 txovr=0 txdelay=213 txfail=5411 wdgreset=0 errreset=5 txqueue=0
1738381607.561766 1T11 79B 02 21 02 55 55 55 55 55
1738381607.561816 1CST Status intr=32898937 rxpkt=32886195 txpkt=4873 errflags=0x204c00 rxerr=0 txerr=127 rxinval=0 rxovr=3 txovr=0 txdelay=213 txfail=5411 wdgreset=0 errreset=5 txqueue=0

1738381608.559656 1CER TX_Fail T11 79B 02 21 06 55 55 55 55 55
1738381608.559819 1CST Status intr=32898940 rxpkt=32886195 txpkt=4873 errflags=0x8048d9 rxerr=0 txerr=135 rxinval=0 rxovr=3 txovr=0 txdelay=213 txfail=5412 wdgreset=0 errreset=5 txqueue=0
1738381608.561065 1CER Error intr=32898943 rxpkt=32886195 txpkt=4873 errflags=0x8048d9 rxerr=0 txerr=134 rxinval=0 rxovr=3 txovr=0 txdelay=213 txfail=5412 wdgreset=0 errreset=5 txqueue=0
1738381608.561163 1T11 79B 02 21 06 55 55 55 55 55

1738381669.559719 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
1738381670.559655 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55
 
The above activity is what creates records in the otherwise-quiet system log:
2025-02-01 03:46:48.272 GMT E (189607402) can: can1: intr=32898936 rxpkt=32886195 txpkt=4872 errflags=0x8048d9 rxerr=0 txerr=128 rxinval=0 rxovr=3 txovr=0 txdelay=213 txfail=5411 wdgreset=0 er
rreset=5 txqueue=0
2025-02-01 03:46:49.272 GMT E (189608402) can: can1: intr=32898940 rxpkt=32886195 txpkt=4873 errflags=0x8048d9 rxerr=0 txerr=135 rxinval=0 rxovr=3 txovr=0 txdelay=213 txfail=5412 wdgreset=0 er
rreset=5 txqueue=0
 
It fits well with your description below, although in these nights I didn't see any queuing. Just a small probability of thinking it can send a frame on a bus that's off.
 
On my UpMiiGo (which switches off the bus), the ESP32CAN most of the time recognizes an immediate failure for a TX with the vehicle being off. That means the TX buffer (in the controller) gets cleared immediately, with an immediate "TX_Fail" result for the frame. Ideally that would be the case always while the bus is off.

But the ESP32CAN controller sometimes tries to send the frame, waiting for an ack, repeating the transmission and then ending up in the frame being stuck in the TX buffer. Then the TX queue (software) gets filled with the status ping request frame from there, up to the 30 frames capacity of the TX queue, then counting TX overflows.

This continues until the ESP32CAN, due to unknown reasons, thinks it could send the frame. The TX queue then gets flushed immediately, with the frames appearing as being sent successfully (although there is no ECU active, the bus is still off).

After the queue has been cleared, this loop starts over.

So it may happen, that the TX queue gets filled, and the ESP32CAN thinks it can send frames, even while the bus is shut off.

 
On the third night the crtd log says it tried to send BMS polls at 23:37, 23:39, 00:00 and 00:02.
 
1738452975.674976 1CER TX_Fail T11 79B 02 21 06 55 55 55 55 55
1738453036.675613 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
1738453036.677195 1CER Error intr=41628672 rxpkt=41612603 txpkt=6073 errflags=0x8048d9 rxerr=0 txerr=128 rxinval=0 rxovr=3 txovr=0 txdelay=259 txfail=6847 wdgreset=0 errreset=7 txqueue=0
1738453036.686582 1T11 79B 02 21 01 55 55 55 55 55
1738453036.686690 1CST Status intr=41628708 rxpkt=41612603 txpkt=6074 errflags=0x204c00 rxerr=0 txerr=127 rxinval=0 rxovr=3 txovr=0 txdelay=259 txfail=6847 wdgreset=0 errreset=7 txqueue=0
1738453037.674988 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55
1738453037.675148 1CST Status intr=41628711 rxpkt=41612603 txpkt=6074 errflags=0x8048d9 rxerr=0 txerr=135 rxinval=0 rxovr=3 txovr=0 txdelay=259 txfail=6848 wdgreset=0 errreset=7 txqueue=0
1738453037.676306 1CER Error intr=41628714 rxpkt=41612603 txpkt=6074 errflags=0x8048d9 rxerr=0 txerr=134 rxinval=0 rxovr=3 txovr=0 txdelay=259 txfail=6848 wdgreset=0 errreset=7 txqueue=0
1738453037.676389 1T11 79B 02 21 02 55 55 55 55 55
1738453038.674985 1CER TX_Fail T11 79B 02 21 06 55 55 55 55 55
 
1738453164.675494 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55
1738453164.676968 1CER Error intr=41628729 rxpkt=41612603 txpkt=6075 errflags=0x8048d9 rxerr=0 txerr=133 rxinval=0 rxovr=3 txovr=0 txdelay=259 txfail=6855 wdgreset=0 errreset=7 txqueue=0
1738453164.677076 1T11 79B 02 21 02 55 55 55 55 55
1738453165.675001 1CER TX_Fail T11 79B 02 21 06 55 55 55 55 55
 
1738454366.674962 1CER TX_Fail T11 79B 02 21 04 55 55 55 55 55
1738454427.675572 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
1738454427.677047 1CER Error intr=41628861 rxpkt=41612603 txpkt=6076 errflags=0x8048d9 rxerr=0 txerr=133 rxinval=0 rxovr=3 txovr=0 txdelay=259 txfail=6918 wdgreset=0 errreset=7 txqueue=0
1738454427.686539 1T11 79B 02 21 01 55 55 55 55 55
1738454428.674962 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55
 
1738454555.679243 1CER TX_Fail T11 79B 02 21 06 55 55 55 55 55
1738454555.680668 1CER Error intr=41628916 rxpkt=41612603 txpkt=6077 errflags=0x8048d9 rxerr=0 txerr=131 rxinval=0 rxovr=3 txovr=0 txdelay=259 txfail=6926 wdgreset=0 errreset=7 txqueue=0
1738454555.680756 1T11 79B 02 21 06 55 55 55 55 55
1738454616.678795 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
 
So far I haven't seen any errors with a non-zero txqueue value, but I'll continue to keep a lookout for any issues. Until then, this seems to be its normal behaviour. I haven't explored the Leaf modules' BMS polling code, but I'm wondering if some changes there would help, e.g. not polling when the car is off.
 
Chris

-- 
Michael Balzer * Am Rahmen 5 * D-58313 Herdecke
Fon 02330 9104094 * Handy 0176 20698926