[Ovmsdev] OVMS Poller module/singleton

Sun Feb 2 21:44:46 HKT 2025

Chris, Michael,

by "regular status ping" I meant testing the car status while off by 
sending a request every few seconds. The "off" state for the Leaf is 
state 0, none of the polls has an interval in that slot. So while the 
Leaf is considered "off" by the module, there should be no new (!) requests.

If a final round of the BMS polls would have been kept in the queue, 
that would only contain at most one entry for each of the polls, i.e. 
four TX in total, with PIDs 0x01, 0x02, 0x06, 0x04 (with 0x04 only being 
polled every 5 minutes). The first TX_Fail would then remove the poll 
from the queue.

But your log shows repeated TX failures for 0x01, 0x02 & 0x06. So no, 
that's not normal.

That can only happen, if either the Leaf code has a bug with detecting 
the correct vehicle state (and with announcing that by some log / event, 
would need to be "on" or "charging" according to the polling scheme)…

…or the new poller has a bug, that leads to keeping or re-entering a 
wrong poll state. The latter could affect other vehicles as well.

> 1738275829.012407 CXX OVMS CRTD
> 1738275829.012407 CVR 3.1
> 1738275829.020355 1CXX Info Type:vfs Format:crtd(discard) Vehicle:NL 
> Path:/sd/can4.crtd
> *1738275879*.419663 1CER TX_Fail T11 79B 02 21 *01* 55 55 55 55 55
> 1738275880.419598 1CER TX_Fail T11 79B 02 21 *02* 55 55 55 55 55
> 1738275881.419606 1CER TX_Fail T11 79B 02 21 *06* 55 55 55 55 55
> *1738275942*.419724 1CER TX_Fail T11 79B 02 21 *01* 55 55 55 55 55
> 1738275943.419618 1CER TX_Fail T11 79B 02 21 *02* 55 55 55 55 55
> 1738275944.419611 1CER TX_Fail T11 79B 02 21 *06* 55 55 55 55 55
> *1738276005*.419663 1CER TX_Fail T11 79B 02 21 *01* 55 55 55 55 55
> 1738276006.419606 1CER TX_Fail T11 79B 02 21 *02* 55 55 55 55 55

 From the timestamps, that's the 60 seconds interval (63 due to the 
error/timeout), so that must have been poll state 1 (on) or 3 (charging).

You need to check the poll state transitions. These get logged by the 
poller (log tag "vehicle-poll"), which needs to be set to log level 
"debug" for this.

You should then see log entries "Pollers: Queue SetState" for the state 
change request & "Pollers: PollState(<state>[,<bus>])" for the actual 
state change.

A log entry "Pollers[SetState]: Task Queue Overflow" (btw, that's 
currently at "info" level, should be "error") indicates a state change 
request got lost. That should not happen.

@Michael: the poller currently allows state changes to get lost. I think 
that's a problem. Not properly transitioning the poll state could cause 
mayhem, as the wrong polls can cause damage, either by putting the 
vehicle in a fault mode, or simply by not letting the ECUs go to sleep, 
causing the 12V battery to drain.

I think state changes must not get lost, if that happens, that's rather 
a reason for `abort()` if it cannot be handled otherwise. The same 
applies to all poll list management commands. Maybe it's an option to 
make these queue requests blocking? Or to introduce a secondary queue 
for poller control commands?

Any additional ideas for Chris on how to investigate this further?

Regards,
Michael

Am 02.02.25 um 13:58 schrieb Chris Box:
>
> Hi everyone,
>
> On 2025-01-28 19:12, Michael Balzer wrote:
>
>> Reading the Leaf code, I don't see any regular status ping polling,
> Actually there is some defined polling of BMS:
> static const OvmsPoller::poll_pid_t obdii_polls[] =
> {
>   // BUS 2
>   { CHARGER_TXID, CHARGER_RXID, VEHICLE_POLL_TYPE_OBDIIGROUP, VIN_PID, 
> {  0, 900, 0, 0 }, 2, ISOTP_STD },           // VIN [19]
>   { CHARGER_TXID, CHARGER_RXID, VEHICLE_POLL_TYPE_OBDIIEXTENDED, 
> QC_COUNT_PID, {  0, 900, 0, 0 }, 2, ISOTP_STD },   // QC [2]
>   { CHARGER_TXID, CHARGER_RXID, VEHICLE_POLL_TYPE_OBDIIEXTENDED, 
> L1L2_COUNT_PID, {  0, 900, 0, 0 }, 2, ISOTP_STD }, // L0/L1/L2 [2]
>   // BUS 1
>   { BMS_TXID, BMS_RXID, VEHICLE_POLL_TYPE_OBDIIGROUP, 0x01, {  0, 60, 
> 0, 60 }, 1, ISOTP_STD },   // bat [39/41]
>   { BMS_TXID, BMS_RXID, VEHICLE_POLL_TYPE_OBDIIGROUP, 0x02, {  0, 60, 
> 0, 60 }, 1, ISOTP_STD },   // battery voltages [196]
>   { BMS_TXID, BMS_RXID, VEHICLE_POLL_TYPE_OBDIIGROUP, 0x06, {  0, 60, 
> 0, 60 }, 1, ISOTP_STD },   // battery shunts [96]
>   { BMS_TXID, BMS_RXID, VEHICLE_POLL_TYPE_OBDIIGROUP, 0x04, {  0, 300, 
> 0, 300 }, 1, ISOTP_STD }, // battery temperatures [14]
>   POLL_LIST_END
> };
> BMS_TXID is 0x79B, and this is mostly what I see appearing in CRTD 
> overnight:
> 1738275829.012407 CXX OVMS CRTD
> 1738275829.012407 CVR 3.1
> 1738275829.020355 1CXX Info Type:vfs Format:crtd(discard) Vehicle:NL 
> Path:/sd/can4.crtd
> 1738275879.419663 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
> 1738275880.419598 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55
> 1738275881.419606 1CER TX_Fail T11 79B 02 21 06 55 55 55 55 55
> 1738275942.419724 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
> 1738275943.419618 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55
> 1738275944.419611 1CER TX_Fail T11 79B 02 21 06 55 55 55 55 55
> 1738276005.419663 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
> 1738276006.419606 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55
> The first night, that was everything I saw until a charge pilot event 
> woke the car up at 05:50.
> The second night, at 03:46 there were two records of apparently 
> successful transmits, which coincides with errors being logged.
> 1738381606.559723 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
> 1738381607.560118 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55
>
> 1738381607.561672 1CER Error intr=32898937 rxpkt=32886195 txpkt=4872 
> errflags=0x204c00 rxerr=0 txerr=127 rxinval=0 rxovr=3 txovr=0 
> txdelay=213 txfail=5411 wdgreset=0 errreset=5 txqueue=0
> 1738381607.561766 *1T11 79B 02 21 02 55 55 55 55 55*
> 1738381607.561816 1CST Status intr=32898937 rxpkt=32886195 txpkt=4873 
> errflags=0x204c00 rxerr=0 txerr=127 rxinval=0 rxovr=3 txovr=0 
> txdelay=213 txfail=5411 wdgreset=0 errreset=5 txqueue=0
>
> 1738381608.559656 1CER TX_Fail T11 79B 02 21 06 55 55 55 55 55
> 1738381608.559819 1CST Status intr=32898940 rxpkt=32886195 txpkt=4873 
> errflags=0x8048d9 rxerr=0 txerr=135 rxinval=0 rxovr=3 txovr=0 
> txdelay=213 txfail=5412 wdgreset=0 errreset=5 txqueue=0
> 1738381608.561065 1CER Error intr=32898943 rxpkt=32886195 txpkt=4873 
> errflags=0x8048d9 rxerr=0 txerr=134 rxinval=0 rxovr=3 txovr=0 
> txdelay=213 txfail=5412 wdgreset=0 errreset=5 txqueue=0
> 1738381608.561163 *1T11 79B 02 21 06 55 55 55 55 55*
>
> 1738381669.559719 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
> 1738381670.559655 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55
> The above activity is what creates records in the otherwise-quiet 
> system log:
> 2025-02-01 03:46:48.272 GMT E (189607402) can: can1: intr=32898936 
> rxpkt=32886195 txpkt=4872 errflags=0x8048d9 rxerr=0 txerr=128 
> rxinval=0 rxovr=3 txovr=0 txdelay=213 txfail=5411 wdgreset=0 er
> rreset=5 txqueue=0
> 2025-02-01 03:46:49.272 GMT E (189608402) can: can1: intr=32898940 
> rxpkt=32886195 txpkt=4873 errflags=0x8048d9 rxerr=0 txerr=135 
> rxinval=0 rxovr=3 txovr=0 txdelay=213 txfail=5412 wdgreset=0 er
> rreset=5 txqueue=0
> It fits well with your description below, although in these nights I 
> didn't see any queuing. Just a small probability of thinking it can 
> send a frame on a bus that's off.
>> On my UpMiiGo (which switches off the bus), the ESP32CAN most of the 
>> time recognizes an immediate failure for a TX with the vehicle being 
>> off. That means the TX buffer (in the controller) gets cleared 
>> immediately, with an immediate "TX_Fail" result for the frame. 
>> Ideally that would be the case always while the bus is off.
>>
>> But the ESP32CAN controller sometimes tries to send the frame, 
>> waiting for an ack, repeating the transmission and then ending up in 
>> the frame being stuck in the TX buffer. Then the TX queue (software) 
>> gets filled with the status ping request frame from there, up to the 
>> 30 frames capacity of the TX queue, then counting TX overflows.
>>
>> This continues until the ESP32CAN, due to unknown reasons, thinks it 
>> could send the frame. The TX queue then gets flushed immediately, 
>> with the frames appearing as being sent successfully (although there 
>> is no ECU active, the bus is still off).
>>
>> After the queue has been cleared, this loop starts over.
>>
>> So it may happen, that the TX queue gets filled, and the ESP32CAN 
>> thinks it can send frames, even while the bus is shut off.
>>
> On the third night the crtd log says it tried to send BMS polls at 
> 23:37, 23:39, 00:00 and 00:02.
> 1738452975.674976 1CER TX_Fail T11 79B 02 21 06 55 55 55 55 55
> 1738453036.675613 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
> 1738453036.677195 1CER Error intr=41628672 rxpkt=41612603 txpkt=6073 
> errflags=0x8048d9 rxerr=0 txerr=128 rxinval=0 rxovr=3 txovr=0 
> txdelay=259 txfail=6847 wdgreset=0 errreset=7 txqueue=0
> 1738453036.686582 *1T11 79B 02 21 01 55 55 55 55 55*
> 1738453036.686690 1CST Status intr=41628708 rxpkt=41612603 txpkt=6074 
> errflags=0x204c00 rxerr=0 txerr=127 rxinval=0 rxovr=3 txovr=0 
> txdelay=259 txfail=6847 wdgreset=0 errreset=7 txqueue=0
> 1738453037.674988 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55
> 1738453037.675148 1CST Status intr=41628711 rxpkt=41612603 txpkt=6074 
> errflags=0x8048d9 rxerr=0 txerr=135 rxinval=0 rxovr=3 txovr=0 
> txdelay=259 txfail=6848 wdgreset=0 errreset=7 txqueue=0
> 1738453037.676306 1CER Error intr=41628714 rxpkt=41612603 txpkt=6074 
> errflags=0x8048d9 rxerr=0 txerr=134 rxinval=0 rxovr=3 txovr=0 
> txdelay=259 txfail=6848 wdgreset=0 errreset=7 txqueue=0
> 1738453037.676389 *1T11 79B 02 21 02 55 55 55 55 55*
> 1738453038.674985 1CER TX_Fail T11 79B 02 21 06 55 55 55 55 55
> 1738453164.675494 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55
> 1738453164.676968 1CER Error intr=41628729 rxpkt=41612603 txpkt=6075 
> errflags=0x8048d9 rxerr=0 txerr=133 rxinval=0 rxovr=3 txovr=0 
> txdelay=259 txfail=6855 wdgreset=0 errreset=7 txqueue=0
> 1738453164.677076 *1T11 79B 02 21 02 55 55 55 55 55*
> 1738453165.675001 1CER TX_Fail T11 79B 02 21 06 55 55 55 55 55
> 1738454366.674962 1CER TX_Fail T11 79B 02 21 04 55 55 55 55 55
> 1738454427.675572 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
> 1738454427.677047 1CER Error intr=41628861 rxpkt=41612603 txpkt=6076 
> errflags=0x8048d9 rxerr=0 txerr=133 rxinval=0 rxovr=3 txovr=0 
> txdelay=259 txfail=6918 wdgreset=0 errreset=7 txqueue=0
> 1738454427.686539 *1T11 79B 02 21 01 55 55 55 55 55*
> 1738454428.674962 1CER TX_Fail T11 79B 02 21 02 55 55 55 55 55
> 1738454555.679243 1CER TX_Fail T11 79B 02 21 06 55 55 55 55 55
> 1738454555.680668 1CER Error intr=41628916 rxpkt=41612603 txpkt=6077 
> errflags=0x8048d9 rxerr=0 txerr=131 rxinval=0 rxovr=3 txovr=0 
> txdelay=259 txfail=6926 wdgreset=0 errreset=7 txqueue=0
> 1738454555.680756 *1T11 79B 02 21 06 55 55 55 55 55*
> 1738454616.678795 1CER TX_Fail T11 79B 02 21 01 55 55 55 55 55
> So far I haven't seen any errors with a non-zero txqueue value, but 
> I'll continue to keep a lookout for any issues. Until then, this seems 
> to be its normal behaviour. I haven't explored the Leaf modules' BMS 
> polling code, but I'm wondering if some changes there would help, e.g. 
> not polling when the car is off.
> Chris

-- 
Michael Balzer * Am Rahmen 5 * D-58313 Herdecke
Fon 02330 9104094 * Handy 0176 20698926

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20250202/e7637776/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature.asc
Type: application/pgp-signature
Size: 203 bytes
Desc: OpenPGP digital signature
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20250202/e7637776/attachment-0001.sig>