Charging stops during a charging process, the LEDs on the typ 2
socket flash. OVMS cannot be reached via WLAN or WAN.
While driving, the crash of the bus means that the car switches to
limp mode. In the Focus Electric, this means that the powertrain
is immediately switched off completely and “stop safely
now” is displayed on the dashboard. OVMS cannot be reached via
WLAN or WAN.
Since the problem does not occur in listen mode, I have to assume
that the poller thread somewhere is no longer able to keep up with
the acknowledgement and is therefore causing the bus to get out of
step.
For the statistics alone, the poller seems to evaluate every frame
and therefore has a certain workload or do I have the option of
restricting this to certain modules before calling
IncomingFrameCan?
Cheers,
Simon
Am 17.01.2025 um 11:08 schrieb Michael Balzer via OvmsDev:
Simon,
AFAIK "Pollers[Send]" is the regular "next round" request for the
polling task, not a frame transmission. A TX attempt in listen
mode would produce a warn level log entry "Cannot write
<busname> when not in ACTIVE mode" for tag "esp32can" or
"mcp2515".
With the old poller, you wouldn't have had any indication of a
queue overflow, so the new poller may help you to identify an
issue in your code.
You need to understand the new poller not only does polls, it's
now the main CAN receiver for the vehicle module. That's why your
poller time stats include all process data frames as well, not
only protocol frames.
The overflow means either your CAN processing is too slow to keep
up with the packets received, or some other task is hogging the
CPU. That would need to be a task with an equal or higher priority
as the vehicle task, which has priority level 10 -- only system
& networking tasks are above, so that's not likely. (You can
check the CPU usage with the "module tasks" command.)
That means you should check your standard frame processing
(`IncomingFrameCanX` callbacks) for potential delays and
processing complexity issues.
The poller statistics can help you track this down, but you need
to have at least 10 seconds of statistics, the more the better.
Rule of thumb: PRI average at 0.0/second means you don't have
enough data yet. (-- //.ichael, correct me if I'm wrong)
At the same time, calling poller times
on, poller times status causes the bus to crash
In any case that still doesn't explain how the poller could
possibly cause a CAN bus (!) crash without actually doing any
transmissions.
How does the CAN bus crash manifest?
Regards,
Michael
Am 15.01.25 um 19:54 schrieb Simon
Ehlen via OvmsDev:
I put the two buses back into
listen mode and drove a short distance and charged the car. As
I said, there is no active polling in my code. As expected,
there is therefore no indication of a failed TX attempt in the
console log.
I only continue to see a lot of RX Task Queue Overflow
messages in the log, which I did not have before.
I also see a Pollers[Send]: Task Queue Overflow entry, but
since I don't see any indication of “cannot write” on that,
I'm not assuming that's a “real” send.
I (19760210) vehicle-poll:
Poller[Frame]: RX Task Queue Overflow Run 5
I (19764420) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
I (19764420) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 3
I (19764420) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
I (19764740) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
I (19764740) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
I (19764740) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
I (19764740) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
I (19764740) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
I (19764750) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 2
I (19764750) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
I (19764750) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 2
I (19764870) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
I (19764870) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 2
I (19765270) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 4
I (19765270) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 6
I (19765610) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 3
I (19765930) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
I (19765930) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 3
I (19765930) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
I (19766260) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 50
I (19766260) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 48
I (19766260) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 27
I (19766270) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 6
I (19766270) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 4
E (19766280) can: can2: intr=4754831 rxpkt=4757493 txpkt=0
errflags=0x22401c02 rxerr=0 txerr=0 rxinval=0 rxovr=0
txovr=0 txdelay=0 txfail=0 wdgreset=1 errreset=0
E (19766300) can: can2: intr=4754831 rxpkt=4757494 txpkt=0
errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0
txovr=0 txdelay=0 txfail=0 wdgreset=1 errreset=0
I (19766300) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 7
I (19766340) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 39
I (19766350) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 32
I (19766350) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 17
I (19766350) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 3
I (19766350) vehicle-poll: Pollers[Send]: Task Queue
Overflow
I (19766350) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
I (19766350) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
E (19766360) can: can2: intr=4754840 rxpkt=4757509 txpkt=0
errflags=0x22401c02 rxerr=0 txerr=0 rxinval=0 rxovr=0
txovr=0 txdelay=0 txfail=0 wdgreset=1 errreset=0
I (19766370) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 4
E (19766370) can: can2: intr=4754840 rxpkt=4757510 txpkt=0
errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0
txovr=0 txdelay=0 txfail=0 wdgreset=1 errreset=0
I (19766430) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 30
I (19766430) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 36
I (19766430) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 26
I (19766430) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 4
I (19766430) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 3
I (19766430) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 2
I (19766430) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
E (19766450) can: can2: intr=4754855 rxpkt=4757535 txpkt=0
errflags=0x23001001 rxerr=0 txerr=0 rxinval=0 rxovr=0
txovr=0 txdelay=0 txfail=0 wdgreset=1 errreset=0
E (19766460) can: can2: intr=4754855 rxpkt=4757536 txpkt=0
errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0
txovr=0 txdelay=0 txfail=0 wdgreset=1 errreset=0
E (19766520) can: can2: intr=4754855 rxpkt=4757539 txpkt=0
errflags=0x22001002 rxerr=0 txerr=0 rxinval=0 rxovr=0
txovr=0 txdelay=0 txfail=0 wdgreset=1 errreset=0
I (19766520) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 9
I (19766520) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 37
I (19766520) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 30
I (19766520) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 6
I (19766520) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 2
I (19766520) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
I (19766520) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
I (19766520) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 3
I (19766530) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
I (19766530) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
I (19766530) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 1
E (19766540) can: can2: intr=4754861 rxpkt=4757548 txpkt=0
errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0
txovr=0 txdelay=0 txfail=0 wdgreset=1 errreset=0
I (19766550) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 3
E (19766560) can: can2: intr=4754861 rxpkt=4757552 txpkt=0
errflags=0x23001001 rxerr=0 txerr=0 rxinval=0 rxovr=0
txovr=0 txdelay=0 txfail=0 wdgreset=1 errreset=0
E (19766610) can: can2: intr=4754861 rxpkt=4757553 txpkt=0
errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0
txovr=0 txdelay=0 txfail=0 wdgreset=1 errreset=0
I (19766610) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 4
I (19766610) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 5
I (19766610) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 33
I (19766610) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 34
I (19766610) vehicle-poll: Poller[Frame]: RX Task Queue
Overflow Run 12
W (19767070) websocket: WebSocketHandler[0x3f8d79d8]: job
queue overflow detected
W (19767140) websocket: WebSocketHandler[0x3f8d79d8]: job
queue overflow detected
W (19767180) websocket: WebSocketHandler[0x3f8d79d8]: job
queue overflow resolved, 1 drops
W (19767260) websocket: WebSocketHandler[0x3f8d79d8]: job
queue overflow detected
W (19767310) websocket: WebSocketHandler[0x3f8d79d8]: job
queue overflow resolved, 1 drops
Cheers
Simon
Am 15.01.2025 um 19:18 schrieb Derek Caudwell via OvmsDev:
Since the following email I have high
confidence the issue on the Leaf is related/caused by the
poller as there has been no further occurrence and Chris has
also experienced the car going to neutral on the new poller
firmware.
....
I haven't ruled out it being a fault with
my car yet. Shortly after it faulted the car was run
into so has been off the road for sometime, my first
step was to replace 12V battery. The ovms unit is now
unplugged and if it does not fault over the next month
while driving I'll be reasonably confident it's ovms
related.
Not sure which firmware version the poller
updates were included in but it was only after upgrading
to it that the errors occurred (which could be
coincidental however it has faulted twice more both on
version 3.3.004-141-gf729d82c). For periods where I
reverted to 3.3.003 it was fine.
It might be useful to have an extra option
on the enable can write to only enable it when the car
is parked/charging.
Derek, can you comment on this? Do you still have
the issue mentioned?
Speaking of the Leaf, the code there actually does
something fishy in `CommandWakeup()`:
unsigned char data = 0;
…
m_can1->WriteStandard(0x5C0, 8,
&data); //Wakes up the VCM (by spoofing empty
battery request heating)
And there are CAN transmissions from command execution
in the Leaf code, so to be sure there is no TX, listen
mode is needed there as well.
Regarding CAN trace, you can try with the monitor
channel and use the USB console to record the log
output. That way you should be able to at least rule out
there are regular transmissions -- assuming not every TX
will cause an immediate crash.
Another / probably better option: when in listen mode,
the log will also tell you about TX attempts, as the
driver will issue a warning on each.
The CAN monitor log may also tell you about bus errors
detected.
Regards,
Michael
Am 15.01.25 um 10:44 schrieb Simon Ehlen via
OvmsDev:
There is at least one more Leaf from Derek that
has also ended up in limp mode since the new poller.
There, too, no polling was actually carried out
while driving.
I'm not familiar with the framework at all, but does
this perhaps offer you enough of an approach to
recognize a commonality?
Unfortunately I do not have a tool to create a CAN
trace.
I can't use the OVMS for this, as it no longer
responds after a bus crash until I have disconnected
it from the OBD and plugged it back in.
Cheers,
Simon
Am 15.01.2025 um 10:18 schrieb Michael Balzer via
OvmsDev:
The frame acknowledging is
done automatically by the CAN transceiver when in
active mode, this is part of the CAN protocol to
indicate bit errors to the sender.
So normally a failure of the system to process
received frames fast enough cannot cause any issue
on the bus. But the ESP32 has a range of known
hardware issues, especially in the embedded CAN
transceiver, I wouldn't be surprised if there are
more, or if our driver lacks some workaround for
these.
Maybe the new poller has some bug that causes false
transmissions from process data frames received. But
many vehicles send process data frames, and we've
had no issue reports like this on any of them, and
ECUs also normally simply ignore out of sequence
protocol frames.
You could record a CAN trace to see if there are
transmissions, and what kind. If you don't poll and
don't send frames from your code, there should be
none. If the bus still crashes, that would be an
indicator for something going wrong in the
transceiver.
Regards,
Michael
Am 15.01.25 um 07:15 schrieb Simon Ehlen via
OvmsDev:
Thanks Mark for the explanation.
So does this mean that OVMS tries to acknowledge
all incoming messages in active mode?
This seems to me to clearly exceed the capacity
of OVMS with the mass of incoming messages.
In fact, I am currently opening the bus in
active mode, as I was hoping to get my code to
revive the BMS in OVMS to work.
However, unlike the other data, I only get the
cell voltages when I actively poll them.
To be on the safe side, I will now open the bus
in read mode again.
However, I am still wondering what in the poller
change has altered the OVMS in such a way that
the bus is now crashing completely.
Currently I have undone the changes since
february 2024, this only concerns code from the
public repository, there was no change from me
in that period.
Now the OVMS is running stable again and there
are neither queue overflows nor bus crashes.
I had also previously increased the following
queue sizes, but unfortunately this was not
successful:
CONFIG_OVMS_HW_EVENT_QUEUE_SIZE=120
CONFIG_OVMS_HW_CAN_RX_QUEUE_SIZE=80
CONFIG_OVMS_VEHICLE_CAN_RX_QUEUE_SIZE=80
Cheers,
Simon
Am 15.01.2025 um 02:08 schrieb Mark
Webb-Johnson:
Note sure if this helps, but some comments:
Remember than CAN protocol in normal mode
is an ‘active’ protocol. Nodes on the bus
are actively acknowledging messages (and
that includes OVMS), even if they never
write messages. So in normal mode there is
no absolute ‘read access’. For example,
opening the CAN port at the wrong baud rate,
even if not writing messages, will mess up
the bus.
However, if you open the CAN port in
‘listen’ mode, then it is truly read-only.
In that mode it will not acknowledge
messages, and cannot write on the bus at
all. I’ve never seen an OVMS mess up a bus
in listen mode, even if the baud rate is
wrong. I think the only way for that to
happen would be a hardware layer issue
(cabling, termination, etc).
But what is the reason that a
read access to the bus can cause
the bus to crash?
This is not critical during
charging, it just aborts the
charging process with an error.
While driving, this results in a
“stop safely now” error message on
the dashboard and the engine is
switched off immediately.
Cheers,
Simon
Am 14.01.2025 um 23:22 schrieb
Michael Geddes via OvmsDev:
You may need to increase
the queue size for the poll
task queue.
The poller still
handles the bus to vehicle
notifications even if it is
off.
Any poller
logging on such an intensive
load of can messages is likely
to be a problem. This is part
of the reason it is flagged
off.
I finally got around to
merging my code with the
current master (previous
merge february 2024).
I have rebuilt my code for
a Ford Focus Electric so
that it uses the new
OvmsPoller class.
However, I now see a lot
of entries like this in my
log:
I (246448) vehicle-poll:
Poller[Frame]: RX Task
Queue Overflow Run 8
I (246448) vehicle-poll:
Poller[Frame]: RX Task
Queue Overflow Run 3
I (246448) vehicle-poll:
Poller[Frame]: RX Task
Queue Overflow Run 1
I (246448) vehicle-poll:
Poller[Frame]: RX Task
Queue Overflow Run 2
I (246448) vehicle-poll:
Poller[Frame]: RX Task
Queue Overflow Run 1
I (246448) vehicle-poll:
Poller[Frame]: RX Task
Queue Overflow Run 2
I (246448) vehicle-poll:
Poller[Frame]: RX Task
Queue Overflow Run 1
I (246448) vehicle-poll:
Poller[Frame]: RX Task
Queue Overflow Run 1
I (254448) vehicle-poll:
Poller[Frame]: RX Task
Queue Overflow Run 24
I (254448) vehicle-poll:
Poller[Frame]: RX Task
Queue Overflow Run 1
I (254448) vehicle-poll:
Poller[Frame]: RX Task
Queue Overflow Run 1
I (254448) vehicle-poll:
Poller[Frame]: RX Task
Queue Overflow Run 1
I (254448) vehicle-poll:
Poller[Frame]: RX Task
Queue Overflow Run 1
I (254448) vehicle-poll:
Poller[Frame]: RX Task
Queue Overflow Run 1
I (254448) vehicle-poll:
Poller[Frame]: RX Task
Queue Overflow Run 1
Was this message just
hidden before or do I need
to make further
adjustments to my code?
My code currently does not
use active polling but
reads on the busses
(IncomingFrameCanX) on
certain modules.
When I look at poller
times status, it looks
very extensive to me...