I realise that I was only using the standard cable to test - which probably is not sufficient - I haven't looked closely at how the Leaf OBD to DB9 cable differs from the standard one.

Ah, my bad about the queue length. We are definitely queueing more messages though. From my log of when the overflow happened, the poller was in state 0, which means OFF - i.e. nothing was being sent!!

I'll look at the TX message thing - opt-in sounds good - though it shouldn't be playing that much of a part here, as the TXs are infrequent in this case (or zero when the Leaf is off or driving). On the Ioniq 5, when I'm using the HUD, I'm polling quite frequently - multiple times per second - and that seems to be fine.

I did find an issue with the throttling, but it would still mostly apply the throttling where it matters, so again, it shouldn't be the problem (also, we aren't transmitting in the Leaf case).

The change I made to the logging of RX messages showed how many in a row were dropped... and it was mostly only 1 per run - which means even if the time between drops is short, the drops are being interleaved by at least one success!

Sooo.. I'm still wondering what is going on. Some things I'm going to try:

* If the number of messages on the CAN bus (coming in through RX) means that the queue is slowly getting longer and not quite catching up, then making the queue longer will help it last longer... but that only pushes the problem down the road.
- Add 'current queue length' to the poller status information to see if this is indeed the case?
- Add some kind of alert when the queue reaches a % full?
* Once you start overflowing and getting overflow log messages, I wonder if this is then contributing to the problem.
- Push the overflow logging into the Poller Task, which can look at how many drops occurred since the last received item.
* Split the flags for the poller messages into 2:
- Messages that are/could be happening in the TX/RX tasks
- Other noisy messages that always happen in the poller task.

Thoughts on what else we might measure to figure out what is going on?

//.ichael

On Sun, 5 May 2024, 19:29 Michael Balzer via OvmsDev, <ovmsdev@lists.openvehicles.com> wrote:

Michael,
the queue size isn't in bytes, it's in messages:
* @param uxQueueLength The maximum number of items that the queue can contain.
*
* @param uxItemSize The number of bytes each item in the queue will require.
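To make the distinction concrete: FreeRTOS reserves uxQueueLength * uxItemSize bytes of storage for a queue. A small illustration - the 32-byte item size here is an assumption for the sake of the arithmetic, not the actual sizeof(CAN_frame_t):

```cpp
#include <cstddef>

// uxQueueLength counts items, not bytes; the queue's storage is
// uxQueueLength * uxItemSize bytes. The 32-byte item size below is
// an illustrative assumption, not the real CAN frame struct size.
constexpr std::size_t queueStorageBytes(std::size_t uxQueueLength,
                                        std::size_t uxItemSize)
{
    return uxQueueLength * uxItemSize;
}

// e.g. queueStorageBytes(60, 32) -> 1920 bytes for a 60-item queue
```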
Also, judging from the time stamps in Derek's log excerpt, there were quite a few dropped frames in that time window -- at least 23 frames in 40 ms, which is bad.
Queue sizes are currently:
CONFIG_OVMS_HW_CAN_RX_QUEUE_SIZE=60
CONFIG_OVMS_VEHICLE_CAN_RX_QUEUE_SIZE=60
The new poller now channels all TX callbacks through the task queue, in addition to RX and commands. So setting the vehicle queue size to be larger than the CAN RX queue size seems appropriate.
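For instance, a configuration along these lines could be tried (the value 80 is purely illustrative, not a tested recommendation):

```
# Illustrative sdkconfig fragment only - the vehicle queue is sized
# above the CAN RX queue to leave headroom for the TX callback entries.
CONFIG_OVMS_HW_CAN_RX_QUEUE_SIZE=60
CONFIG_OVMS_VEHICLE_CAN_RX_QUEUE_SIZE=80
```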
Nevertheless, an overflow with more than 60 waiting messages still indicates some too long processing time in the vehicle task.
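One way to quantify such overflows without flooding the log - along the lines of the drop-aggregation idea suggested earlier in the thread - is to count drops between successfully received items and report the burst size only when the next item gets through. A minimal sketch with hypothetical names, not the actual OVMS code:

```cpp
// Sketch (hypothetical names) of aggregated RX overflow reporting:
// rather than logging each dropped frame from the CAN context, count
// drops between successes and report the burst length when the next
// frame makes it through the queue.
struct DropTracker
{
    unsigned dropsSinceLastRx = 0;  // bumped on each queue overflow
    unsigned lastBurst = 0;         // drops that preceded the last success

    void onDrop() { ++dropsSinceLastRx; }

    // Called when a frame is successfully dequeued; returns how many
    // drops preceded it (0 means no overflow since the previous frame).
    unsigned onRx()
    {
        lastBurst = dropsSinceLastRx;
        dropsSinceLastRx = 0;
        return lastBurst;
    }
};
```

With Derek's observation of "mostly 1 drop per run", onRx() here would mostly return 1, while a sustained burst (like 23 frames in 40 ms) would show up as a single large number.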
TX callbacks previously were done directly in the CAN context, and no current vehicle overrides the empty default handler, so this imposed almost no additional overhead. By requiring a queue entry for each TX callback, this feature now has a potentially high impact for all vehicles. If passing these to the task is actually necessary, it needs to become an opt-in feature, so only vehicles subscribing to the callback actually need to cope with that additional load & potential processing delays involved.
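An opt-in scheme could be as simple as a subscription flag checked in the CAN context before a TX callback entry is queued. A sketch with hypothetical names (VehicleSketch, queueTxEvent), not the actual OVMS API - the real code would post to the vehicle task queue with xQueueSend rather than a vector:

```cpp
#include <vector>

// Sketch (hypothetical names) of opt-in TX callbacks: the CAN context
// only queues a TX event for vehicles that have subscribed, so vehicles
// using the empty default handler pay no queue entries and incur no
// extra processing delay.
struct TxEvent { unsigned frameId; };

class VehicleSketch
{
public:
    void subscribeTxCallback() { m_wantsTx = true; }
    bool wantsTxCallback() const { return m_wantsTx; }
private:
    bool m_wantsTx = false;         // default: opted out
};

// Stand-in for the vehicle task queue; returns true if an event was queued.
inline bool queueTxEvent(const VehicleSketch& veh,
                         std::vector<TxEvent>& queue,
                         unsigned frameId)
{
    if (!veh.wantsTxCallback())
        return false;               // not subscribed: no queue traffic
    queue.push_back(TxEvent{frameId});
    return true;
}
```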
Regards,
Michael