I realise that I was only using the standard cable to test - which probably is not sufficient - I haven't looked closely at how the Leaf OBD to Db9 cable is different from standard. Ah, my bad out the queue length. We are definitely queueing more messages though. From my log of when the overflow happened, the poller was in state 0 which means OFF - ie nothing was being sent!! I'll look at the TX message thing - opt in sounds good - though it shouldn't be playing that much of a part here as the TXs are infrequent in this case (or zero when the leaf is off or driving) - On the ioniq 5 when I'm using the HUD - I'm polling quite frequently - multiple times per second and that seems to be fine!. I did find an issue with the throttling .. but it would still mostly apply the throttling where it matters, so again, it shouldn't be the problem (also, we aren't transmitting in the leaf case). The change I made to the logging of RX messages showed how many in a row were dropped... and it was mostly 1 only in a run - which means even if it is a short time between - that means that the drops are being interleaved by at least one success! Sooo.. I'm still wondering what is going on. Some things I'm going to try: * If the number of messages on the Can bus (coming in through RX) means that the queue is slowly getting longer and not quite catching up, then making the queue longer will help it last longer... but only pushes the problem down the road. - Add 'current queue length' to the poller status information to see if this is indeed the case? - Add some kind of alert when the queue reaches a % full? * Once you start overflowing and getting overflow log messages, I wonder if this is then contributing to the problem. - Push the overflow logging into Poller Task which can look at how many drops occurred since last received item. * Split up the flags for the poller messages into 2: - Messages that are/could be happening in the TX/RX tasks - Other noisy messages that always happen in the poller task. Thoughts on what else we might measure to figure out what is going on? //.ichael On Sun, 5 May 2024, 19:29 Michael Balzer via OvmsDev, < ovmsdev@lists.openvehicles.com> wrote:
Michael,
the queue size isn't in bytes, it's in messages:
* @param uxQueueLength The maximum number of items that the queue can contain. * * @param uxItemSize The number of bytes each item in the queue will require.
Also, from the time stamps in Dereks log excerpt, there were quite some dropped frames in that time window -- at least 23 frames in 40 ms, that's bad.
Queue sizes are currently:
CONFIG_OVMS_HW_CAN_RX_QUEUE_SIZE=60 CONFIG_OVMS_VEHICLE_CAN_RX_QUEUE_SIZE=60
The new poller now channels all TX callbacks through the task queue additionally to RX and commands. So setting the queue size to be larger than the CAN RX queue size seems appropriate.
Nevertheless, an overflow with more than 60 waiting messages still indicates some too long processing time in the vehicle task.
TX callbacks previously were done directly in the CAN context, and no current vehicle overrides the empty default handler, so this imposed almost no additional overhead. By requiring a queue entry for each TX callback, this feature now has a potentially high impact for all vehicles. If passing these to the task is actually necessary, it needs to become an opt-in feature, so only vehicles subscribing to the callback actually need to cope with that additional load & potential processing delays involved.
Regards, Michael
Am 05.05.24 um 11:59 schrieb Michael Geddes via OvmsDev:
OK.
I did some logging changes to confirm that it's only _just_ getting overwhelmed..ie dropping 1 (or occasionally 2) RX frames in a row but then picking the next ones up...
So increasing the queue size (I doubled it but maybe that was excessive) seems to do the trick. I think the queue size is in bytes and the queue packet size has increased at least a little bit.
Michael
On Sat, 4 May 2024, 09:24 Michael Geddes, <frog@bunyip.wheelycreek.net> wrote:
Hey Derek,
I do have a friend with a Leaf in his EV collection, so I can try grab some more info from his car and work out exactly what (tf) is going on.
//.
On Sat, 4 May 2024 at 08:13, <frog@bunyip.wheelycreek.net> wrote:
HI Derek,
For a start, I’m very sorry that the ovms put your car in limp mode. That’s quite bad.
Just FYI those overflows that you have copied are about receiving and not sending information, so maybe the poller thread is being overwhelmed or stalled somehow.
I’ve been looking at the code and wondered if you have the ‘can write’ flag set to true or not? (Though it shouldn’t make that much difference)
Even if it were somehow mistakenly polling (ie in the wrong pollstate), it should not be doing it that frequently – (at most once a minute). And the poll throttle would still be on 1 per second.. so that should put a limit on sends.. and so it is really worrying that somehow that is not happening :/
If you have more logs that you send me that don’t just include that particular overflow message that would be good. Did it crash/reboot? Maybe it crashed repeatedly?
//.ichael
*From:* OvmsDev <ovmsdev-bounces@lists.openvehicles.com> *On Behalf Of *Derek Caudwell via OvmsDev *Sent:* Friday, May 3, 2024 6:54 PM *To:* OVMS Developers <ovmsdev@lists.openvehicles.com> *Cc:* Derek Caudwell <d.caudwell@gmail.com> *Subject:* Re: [Ovmsdev] OVMS Poller module/singleton
Hi Michael,
When running firmware 3.3.004-74-gbd4e7196 on my Nissan Leaf I suspect (but can't be 100% sure as it's only been 24h without fault) the new poller caused the car to throw the attached faults from overloading the can bus whilst driving. The fault was sufficient to send the car into limp mode and could not be driven until cleared with LeafSpy. Ovms should not be polling in the driving state, so I'm not sure why the poller is overflowing.
The ovms log repeats this overflow in rapid succession right before the failure occurred. I have since reverted to an earlier version and it might be advisable for other Leaf owners to do so until the new poller is fully operational.
2024-05-02 12:22:28.316 NZST I (36847486) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.316 NZST I (36847486) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.316 NZST I (36847486) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.326 NZST I (36847496) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.326 NZST I (36847496) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.326 NZST I (36847496) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.326 NZST I (36847496) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.326 NZST I (36847496) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.326 NZST I (36847496) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.326 NZST I (36847496) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.326 NZST I (36847496) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.336 NZST I (36847506) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.336 NZST I (36847506) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.336 NZST I (36847506) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.336 NZST I (36847506) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.346 NZST I (36847516) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.346 NZST I (36847516) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.346 NZST I (36847516) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.346 NZST I (36847516) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.356 NZST I (36847526) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.356 NZST I (36847526) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.356 NZST I (36847526) vehicle-poll: Poller[Frame]: Task Queue Overflow
2024-05-02 12:22:28.356 NZST I (36847526) vehicle-poll: Poller[Frame]: Task Queue Overflow
On Fri, 3 May 2024 at 05:40, Michael Balzer via OvmsDev < ovmsdev@lists.openvehicles.com> wrote:
A task may delete itself by calling vTaskDelete(NULL) as the final statement, and that's normally a better way to shutdown a task, if you can signal the task it should terminate. Deleting a task from outside may always result in resources taken by the task not being freed, as the task gets terminated wherever it just is.
As the vehicle task also needs to access vehicle data memory, vehicle destruction should not begin until the task has finished, otherwise a task job already running while the vehicle destruction takes place may run into an illegal memory access (access after free). Keep in mind, destructors are executed bottom up, so waiting for the task shutdown within the destructor won't help.
Also, I think "MyCan.DeregisterCallback(TAG)" in ShuttingDown() is now obsolete.
Regards, Michael
Am 02.05.24 um 15:23 schrieb Michael Geddes:
Thanks.
On that race condition... do you think we are better to just delete the task or I saw one recommendation that
We can delete it outside the while loop in the task itself!
//.ichael
On Tue, 30 Apr 2024, 17:46 Michael Balzer via OvmsDev, < ovmsdev@lists.openvehicles.com> wrote:
OK, understood & merged, thanks for the quick fix/workaround.
I'll ask the Twizy driver to test this.
Regards, Michael
PS: side note: there's a shutdown race condition arising from `while (!m_is_shutdown)` in the new `OvmsVehicle::VehicleTask()`: if the task receives a message while m_is_shutdown already has been set, it will exit the loop and return from the task function, which isn't allowed. I think FreeRTOS will abort in that case.
Am 30.04.24 um 08:55 schrieb Michael Geddes:
No it will start the poll thread of the poller fine for those that don't use it. That's the last commit.... so that should be covered. When the callback is registered the thread is force started.
This is not to say that what you are suggesting isn't a better way forward... just that it should work.
Michael
On Tue, 30 Apr 2024, 14:51 Michael Balzer via OvmsDev, < ovmsdev@lists.openvehicles.com> wrote:
Michael,
(taking this back to the list, as other developers may have helpful ideas or suggestions)
from a first check, your new patch (PR #1008) won't help for vehicles that don't use the poller, like the generic DBC vehicle or the Fiat 500 (plus possibly non-public third party vehicles). As the poller is normally compiled in, these would need a run time switch to re-enable the vehicle task, or more suitably default into that configuration and switch to the poller task only when the poller gets initialized.
How about moving the task back into the vehicle class, but keeping your task message extensions while doing so, and adding a way for the poller to hook into the task? The dedicated vehicle task is an essential part of the vehicle framework, but I do remember some occasions where a general way to pass custom messages to the vehicle task would have been helpful to use the task for more than CAN processing.
If the poller shall become avaible as a separate service that can be used without any vehicle instance (currently not a defined use case), the construct of a CAN processor task that can be extended for custom messages (or a message processor task that can subscribe to CAN frames) could be factored out into a dedicated class or template. The poller could then create his own instance of that class, if it cannot hook into an existing one.
Btw, in case you're not aware of this, FreeRTOS also provides queue sets (https://www.freertos.org/RTOS-queue-sets.html). We haven't used them in the OVMS yet, but they could be useful, especially if message unions become large or tasks shall be able to dynamically subscribe to different message sources.
Regards, Michael
Am 30.04.24 um 01:20 schrieb Michael Geddes:
It might be worth reverting, but, I've got a patch suggestion that I'll push up which will let me know if I understand everything and which might provide a solution.
If this isn't going to work then revert. (P/R coming)
While the poller was still a part of the vehicle class it was probably still all ok. The poller had taken over what I assume you are talking about as the vehicle task. A call-back fromthe poller was calling IncomingPollRxFrame which was then coming from the (now) poller task (is that correct?)
While we have OVMS_COMP_POLLER config defined, we could just use the poller task to provide the IncomingPollRxFrame call-back from the poller (was vehicle?) task.
The problem is when OVMS_COMP_POLLER is undefined, we need an alternate (maybe we could use the 'Event' loop then) which is when I hacked that bad solution.
//.ichael
I was thinking though that because everything is being queued we could divert some calls into the car
On Mon, 29 Apr 2024, 18:58 Michael Balzer, <dexter@expeedo.de> wrote:
Michael,
I've found a severe design flaw with your poller change. Sorry, I should have seen that before, and shouldn't have merged this.
You have moved the standard CAN frame processing from the vehicle task into the CAN task by changing the vehicle from a CAN listener to a CAN callback processor. That will break all kind of things, vehicles like the Twizy, Smart and many others rely on frame processing being done in the dedicated vehicle task.
This especially induces issues regarding CAN processing capacity, as the vehicles rely on having a separate context and not needing to decouple complex processing of incoming frames. So this will degrade the CAN processing performance seriously in many cases, where additional steps involving file or network I/O need to be done.
So this needs to be changed back.
I assumed you introduced a new dedicated poller task but kept the vehicle task intact. From your naming (OvmsVehicle::IncomingPollRxFrame), it seems you misinterpreted the vehicle task's purpose as only being used for polling.
I assume this isn't a small change…? If so we should revert the poller merge, to avoid having a defunctional edge build state until the fix.
Secondly: logging is generally expensive regardless of the log level control and log channels enabled. Any log message needs to be queued to the log system, so involves a lock and a potential wait state (for the queue) & resulting context switch (= minimum delay of 10 ms). Therefore, logging must be avoided as far as possible in any time critical context, and is forbidden under some circumstances, e.g. in a timer callback (see FreeRTOS docs on this). The log system load multiplies with connected web clients or enabled file logging and enabled debug / verbose level logging -- that quickly becomes too much, usually triggering the task watchdog.
Your logging of nearly all bus events passing to/from the poller (especially the log entry in Queue_PollerFrame) becomes an issue on any vehicle that has high frequency running process data frames on the bus, like the Twizy or Smart. As a counter measure, I've just added a runtime control for all the poller's verbose logging (by default now off), and changed some debug level logs to verbose. Not sure if I've catched all that may need to be silenced by default. Please check all your log calls and place them under the new "trace" control flag wherever appropriate.
This won't help avoiding issues with process data frame buses though.
Shall I revert the poller merge for now?
Regards, Michael
Am 29.04.24 um 06:18 schrieb Michael Geddes:
Btw I included the log changes in the p/r which is a few small commits for the poller.
//.
On Mon, 29 Apr 2024, 07:20 Michael Geddes, <frog@bunyip.wheelycreek.net> wrote:
Hey Michael,
I've got to work now (it's Monday), but I suspect those 'giving up' are from unsolicited messages on the bus.
I can re-order things so that the message will likely be 'dropped (no poll entry)' rather than the time-out message.
And make it a verbose log.
//.ichael
On Mon, 29 Apr 2024 at 00:25, Michael Balzer <dexter@expeedo.de> wrote:
Michael,
forwarding the Twizy logs to you directly, as they contain the user location.
He has just verified it's the new version, he says the module stops responding as soon as he turns on the Twizy.
His description of his actions is a bit ambiguous, and it seems he didn't enable logging to the file persistently.
According to the server, these were the version boot times:
2024-04-28 15:30:56 0 3.3.004-32-g125e0841/ota_0/edge (build idf v3.3.4-849-g6e214dc335 Apr 26 2024 19:16:50) 2024-04-28 16:21:08 0 3.3.004-32-g125e0841/ota_0/edge (build idf v3.3.4-849-g6e214dc335 Apr 26 2024 19:16:50) 2024-04-28 16:38:33 0 3.3.004-63-gf3595561/ota_1/edge (build idf v3.3.4-849-g6e214dc335 Apr 27 2024 07:44:50) 2024-04-28 16:40:57 0 3.3.004-63-gf3595561/ota_1/edge (build idf v3.3.4-849-g6e214dc335 Apr 27 2024 07:44:50) 2024-04-28 16:43:14 0 3.3.004-63-gf3595561/ota_1/edge (build idf v3.3.4-849-g6e214dc335 Apr 27 2024 07:44:50) 2024-04-28 16:46:39 0 3.3.004-63-gf3595561/ota_1/edge (build idf v3.3.4-849-g6e214dc335 Apr 27 2024 07:44:50) 2024-04-28 16:54:44 0 3.3.004-32-g125e0841/ota_0/edge (build idf v3.3.4-849-g6e214dc335 Apr 26 2024 19:16:50)
Attached is also his crash debug log -- a2ll doesn't tell much about what happened, but maybe you get an idea from this.
After the boot at 16:46, there are immediately lots of these messages:
2024-04-28 16:46:33.792 CEST D (39042) vehicle-poll: Poller: Queue PollerFrame() 2024-04-28 16:46:33.792 CEST D (39042) vehicle-poll: [1]Poller: Incoming - giving up
Regards, Michael
Am 28.04.24 um 16:44 schrieb Michael Geddes:
Ah. OK.
I could try to fix the vin thing using the new way of doing it and get rid of a semaphore?
It would at least identify the problem possibly?
Michael
On Sun, 28 Apr 2024, 22:32 Michael Balzer via OvmsDev, < ovmsdev@lists.openvehicles.com> wrote:
Not sure if that's the problem, but I've found a different behaviour with the new PollSetState() implementation.
The old version only did anything if the new state actually was different from the previous one. The Twizy relies on this behaviour, calling PollSetState() from the per second ticker (see OvmsVehicleRenaultTwizy::ObdTicker1).
The new implementation apparently always sends the PollState command to the task, and that in turn always at least locks the poller mutex. Not sure if/how that could cause the observed issues, but it definitely adds quite some (unnecessary?) lock/unlock operations.
Am 28.04.24 um 16:05 schrieb Michael Balzer via OvmsDev:
The Twizy uses the poller to query the VIN (once) and DTCs (every 10 seconds while driving), see rt_obd2.cpp.
It also has its own version of the OBD single request (OvmsVehicleRenaultTwizy::ObdRequest), which was the precursor for the generalized version. This is used by custom/user Twizy plugins and scripts to access ECU internals.
The Twizy doesn't use IncomingPollRxFrame, but the Twizy's IncomingPollReply handler will log any poll responses it doesn't know about, so that could lead to a lot of log output if something goes wrong there.
Am 28.04.24 um 15:49 schrieb Michael Geddes via OvmsDev:
AFAICT the twizzy doesn't use the poller list at all. So is it missing a call-back or something??
I can see a potential problem with IncomingPollRxFrame being called twice as much as it should be but only when there is a poll list. Maybe commenting out this would do it. (I can find another away to get this called on the thread I want). This might be the problem with the smarted
void OvmsVehicle::OvmsVehicleSignal::IncomingPollRxFrame(canbus* bus, CAN_frame_t *frame, bool success) { //if (Ready()) // m_parent->IncomingPollRxFrame(frame, success); }
//.
On Sun, 28 Apr 2024 at 21:10, Michael Balzer via OvmsDev < ovmsdev@lists.openvehicles.com> wrote:
There may also be an issue with the Renault Twizy, I've received a report of a user who is using the edge builds, that the latest build wouldn't work.
He reports all kinds of errors and warnings signaled by the car during driving, and switching back to the previous build fixed the issues.
I've asked him to provide a debug log excerpt if possible.
Regards, Michael
Am 28.04.24 um 14:29 schrieb Michael Geddes via OvmsDev:
OK. That's bad.
Does the reading work in general?
Is it just the writing commands?
Raise a ticket on github and tag me in and we can address it that way.
Michael
On Sun, 28 Apr 2024, 19:49 Thomas Heuer via OvmsDev, < ovmsdev@lists.openvehicles.com> wrote:
Hi,
The new poller code doesn't seem to work properly with the smarted.
D (218831) vehicle-poll: [1]PollerNextTick(PRI): cycle complete for ticker=215
V (218831) vehicle-poll: Standard Poll Series: List reset
D (218831) vehicle-poll: PollSeriesList::NextPollEntry[!v.standard]: ReachedEnd
V (218831) vehicle-poll: [1]PollerSend: Poller Reached End
D (219691) vehicle-poll: Poller: Queue PollerFrame()
D (219691) vehicle-poll: Poller: Queue PollerFrame()
V (219691) vehicle-poll: Pollers: FrameRx(bus=2)
D (219691) vehicle-poll: Poller: Queue PollerFrame()
V (219691) vehicle-poll: Pollers: FrameRx(bus=2)
V (219691) vehicle-poll: Pollers: FrameRx(bus=2)
D (219691) vehicle-poll: Poller: Queue PollerFrame()
*OVMS#* unlock 22 Vehicle unlocked
V (219691) vehicle-poll: Pollers: FrameRx(bus=2)