On Thu, 16 May 2024 at 10:29, Michael Geddes <frog@bunyip.wheelycreek.net> wrote:

Thanks Michael,

My calculations give me ((2^32)-1) / (1000*1000*3600) = only 1.2 hours of processing time in 32bit. The initial subtraction is 64bit anyway and I can't see a further 64bit addition being a problem. I have the calculations being performed in doubles at print-out where performance is not really an issue anyway. (Though apparently doing 64 bit division is worse than floating point).

In addition
* I currently have this being able to be turned on and off and reset manually (only do it when required).

* For the lower volume commands, the smoothed average is not going to be useful - the count is more interesting for different reasons.

* The total time is quite useful. Ie a high average time doesn't matter if the count is low. The things that are affecting performance are stuff with high total time. Stuff which is happening 100 times a second needs to be a much lower average than once a second.

* A measure like 'time per minute/second' and possibly count per minute/seconds as a smoothed average would potentially be more useful. (or in addition?)
I think we could do _that_ in a reasonably efficient manner using a 64 bit 'last measured time', a 32 bit accumulated value and the stored 32 bit rolling average.
It would boils down to some iterative (integer) sums and multiplications plus a divide by n ^ (time periods passed) - which is a shift - and which can be optimised to '0' if 'time-periods-passed' is more than 32/(bits-per-n) - effectively limiting the number of iterations.
The one issue I can see is that we need to calculate 'number of time-periods passed' which is a 64 bit subtraction followed by a 32 bit division (not optimisable to a simple shift).

* I'm also happy to keep a rolling (32bit) average time.
Even if you assume averages in the 100ms, 32bit is going to happily support an N of 64 or even 128.
Am I right in thinking that the choice of N is highly dependent on frequency. For things happening 100 times per second, you might want an N like 128.. where things happening once per
second, you might want an N of 4 or 8. The other things we keep track of in this manner we have a better idea of the frequency of the thing.

How about we have (per record type):
* total count (since last reset?) (32 bit)
* smoothed average of time per instance (32 bit)

* ?xx? total accumulated time since last reset (64bit) ?? <-- with the below stats this is much less useful

* last-measured-time (64 bit)
* accumulated count since last time-period (16bit - but maybe 32bit anyway for byte alignment?)
* smoothed average of count per time-period (32bit)
* accumulated time since last time-period (32bit)
* smoothed average of time per time-period (32bit)

It's possible to keep the

Is this going to be too much per record type? The number of 'records' we are keeping is quite low (so 10 to 20 maybe) - so it's not a huge memory burden.

Thoughts?

//.ichael
On Thu, 16 May 2024 at 03:09, Michael Balzer via OvmsDev <ovmsdev@lists.openvehicles.com> wrote:
esp_timer_get_time() is the right choice for precision timing.

I'd say uint32 is enough though, even if counting microseconds that can hold a total of more than 71 hours of actual processing time. uint64 has a significant performance penalty, although I don't recall the overhead for simple additions.

Also & more important, the average wouldn't be my main focus, but the maximum processing time seen per ID, which seems to be missing in your draft.

Second thought on the average… the exact overall average really has a minor meaning, I'd rather see the current average, adapting to the current mode of operation (drive/charge/…). I suggest feeding the measurements to a low pass filter to get the smoothed average of the last n measurements. Pattern:

runavg = ((N-1) * runavg + newval) / N

By using a low power of 2 for N (e.g. 8 or 16), you can replace the division by a simple bit shift, and have enough headroom to use 32 bit integers.

Regards,
Michael

Am 15.05.24 um 06:51 schrieb Michael Geddes via OvmsDev:
Formatting aside, I have implemented what I think Michael B was suggesting. This is a sample run on the Ioniq 5 (which doesn't have unsolicited RX events).

This uses the call esp_timer_get_time() got get a 64bit microseconds since started value - and works out the time to execute that way. It's looking at absolute time and not time in the Task - so other things going on at the same time in other tasks will have an effect. (The normal tick count doesn't have nearly enough resolution to be useful - any other ideas on measurement?) I've got total accumulated time displaying in seconds and the average in milliseconds currently - but I can change that easy enough.

The cumulative time is stored as uint64_t which will be plenty, as 32bit wouldn't be nearly enough.

OVMS# poller time on
Poller timing is now on

OVMS# poller time status
Poller timing is: on
Poll [PRI] : n=390 tot=0.2s ave=0.586ms
Poll [SRX] : n=316 tot=0.1s ave=0.196ms
CAN1 RX[0778] : n=382 tot=0.2s ave=0.615ms
CAN1 RX[07a8] : n=48 tot=0.0s ave=0.510ms
CAN1 RX[07bb] : n=162 tot=0.1s ave=0.519ms
CAN1 RX[07ce] : n=33 tot=0.0s ave=0.469ms
CAN1 RX[07ea] : n=408 tot=0.2s ave=0.467ms
CAN1 RX[07ec] : n=486 tot=0.2s ave=0.477ms
CAN3 RX[07df] : n=769 tot=0.2s ave=0.261ms
CAN1 TX[0770] : n=191 tot=0.0s ave=0.054ms
CAN1 TX[07a0] : n=16 tot=0.0s ave=0.047ms
CAN1 TX[07b3] : n=31 tot=0.0s ave=0.069ms
CAN1 TX[07c6] : n=11 tot=0.0s ave=0.044ms
CAN1 TX[07e2] : n=82 tot=0.0s ave=0.067ms
CAN1 TX[07e4] : n=54 tot=0.0s ave=0.044ms
Set State : n=7 tot=0.0s ave=0.104ms

This is probably going to be quite useful in general! The TX call-backs don't seem to be significant here. (oh, I should probably implement a reset of the values too).

//.ichael
On Sun, 12 May 2024 at 22:58, Michael Geddes <frog@bunyip.wheelycreek.net> wrote:
Yeah - I certainly wasn't going to put a hard limit. Just a log above a certain time, that being said, the idea of just collecting stats (being able to turn it on via a "poller timer" set of commands) would be much more useful. I'll look into that.

Average time is probably a good stat - and certainly what we care about.

I actually am hopeful that those couple of things I did might help reduce that average time quite a bit (that short-cutting the isotp protocol handling especially).

That p/r with logging changes might help reduce the unproductive log time further, but also makes it possible to turn on the poller logging without the RX task logs kicking in.

//.ichael
On Sun, 12 May 2024 at 22:29, Michael Balzer via OvmsDev <ovmsdev@lists.openvehicles.com> wrote:
Warning / gathering debug statistics about slow processing can be helpful, but there must not be a hard limit. Frame/poll response processing may need disk or network I/O, and the vehicle task may be starving from punctual high loads on higher priority tasks (e.g. networking) or by needing to wait for some semaphore -- that's outside the application's control, and must not lead to termination/recreation of the task (in case you're heading towards that direction).

I have no idea how much processing time the current vehicles actually need in their respective worst cases. Your draft is probably too lax, poll responses and frames normally need to be processed much faster. I'd say 10 ms is already too slow, but any wait for a queue/semaphore will already mean at least 10 ms (FreeRTOS tick). Probably best to begin with just collecting stats.

Btw, to help in narrowing down the actual problem case, the profiler could collect max times per RX message ID.

Regards,
Michael

Am 12.05.24 um 10:41 schrieb Michael Geddes:

I have a question for Michael B (or whoever) - I have a commit lined up that would add a bit of a time check to the poller loop. What do we expect the maximum time to execute a poller loop command should be?

This is a rough idea (in ms) I have.. based on nothing much really, so any ideas would be appreciated:

int TardyMaxTime_ms(OvmsPoller::OvmsPollEntryType entry_type)
{
switch (entry_type)
{
case OvmsPoller::OvmsPollEntryType::Poll: return 80;
case OvmsPoller::OvmsPollEntryType::FrameRx: return 30;
case OvmsPoller::OvmsPollEntryType::FrameTx: return 20;
case OvmsPoller::OvmsPollEntryType::Command: return 10;
case OvmsPoller::OvmsPollEntryType::PollState: return 15;
default: return 80;
}
}

//.ichael

On Mon, 6 May 2024 at 07:45, Michael Geddes <frog@bunyip.wheelycreek.net> wrote:

I realise that I was only using the standard cable to test - which probably is not sufficient - I haven't looked closely at how the Leaf OBD to Db9 cable is different from standard.

Ah, my bad out the queue length. We are definitely queueing more messages though. From my log of when the overflow happened, the poller was in state 0 which means OFF - ie nothing was being sent!!

I'll look at the TX message thing - opt in sounds good - though it shouldn't be playing that much of a part here as the TXs are infrequent in this case (or zero when the leaf is off or driving) - On the ioniq 5 when I'm using the HUD - I'm polling quite frequently - multiple times per second and that seems to be fine!.

I did find an issue with the throttling .. but it would still mostly apply the throttling where it matters, so again, it shouldn't be the problem (also, we aren't transmitting in the leaf case).

The change I made to the logging of RX messages showed how many in a row were dropped... and it was mostly 1 only in a run - which means even if it is a short time between - that means that the drops are being interleaved by at least one success!

Sooo.. I'm still wondering what is going on. Some things I'm going to try:

* If the number of messages on the Can bus (coming in through RX) means that the queue is slowly getting longer and not quite catching up, then making the queue longer will help it last longer... but only pushes the problem down the road.

- Add 'current queue length' to the poller status information to see if this is indeed the case?

- Add some kind of alert when the queue reaches a % full?

* Once you start overflowing and getting overflow log messages, I wonder if this is then contributing to the problem.

- Push the overflow logging into Poller Task which can look at how many drops occurred since last received item.

* Split up the flags for the poller messages into 2:

- Messages that are/could be happening in the TX/RX tasks

- Other noisy messages that always happen in the poller task.

Thoughts on what else we might measure to figure out what is going on?

//.ichael

On Sun, 5 May 2024, 19:29 Michael Balzer via OvmsDev, <ovmsdev@lists.openvehicles.com> wrote:

Michael,

the queue size isn't in bytes, it's in messages:

* @param uxQueueLength The maximum number of items that the queue can contain.
*
* @param uxItemSize The number of bytes each item in the queue will require.

Also, from the time stamps in Dereks log excerpt, there were quite some dropped frames in that time window -- at least 23 frames in 40 ms, that's bad.

Queue sizes are currently:

CONFIG_OVMS_HW_CAN_RX_QUEUE_SIZE=60
CONFIG_OVMS_VEHICLE_CAN_RX_QUEUE_SIZE=60

The new poller now channels all TX callbacks through the task queue additionally to RX and commands. So setting the queue size to be larger than the CAN RX queue size seems appropriate.

Nevertheless, an overflow with more than 60 waiting messages still indicates some too long processing time in the vehicle task.

TX callbacks previously were done directly in the CAN context, and no current vehicle overrides the empty default handler, so this imposed almost no additional overhead. By requiring a queue entry for each TX callback, this feature now has a potentially high impact for all vehicles. If passing these to the task is actually necessary, it needs to become an opt-in feature, so only vehicles subscribing to the callback actually need to cope with that additional load & potential processing delays involved.

Regards,
Michael
-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev