[Ovmsdev] OVMS Poller module/singleton
Michael Geddes
frog at bunyip.wheelycreek.net
Sun May 19 08:06:00 HKT 2024
I was so focused on how I calculated the value that I totally missed that ‰
would be a better description. I could also use the system 'Ratio'
unit... so % or ‰.
I'll make space to put 'Avg' on the row. I was trying to limit the width for
output on a mobile. I agree it would make it easier to understand.
Totals also makes sense.
Should I make this a configuration that can be set on the web page? I'd
probably use a configuration change notification so that the live bit
setting stays in sync with the 'configuration' value.
//.ichael
On Sat, 18 May 2024, 14:05 Michael Balzer, <dexter at expeedo.de> wrote:
> I'm not sure whether the 'max' should be the maximum of the smoothed
> value.. or the maximum of the raw value.
>
>
> It should normally be the maximum of the raw value, I think; the maximum of
> the smoothed value cannot tell you how bad the processing of an ID can
> become.
>
> The naming in the table is a bit confusing I think. (besides: I've never
> seen "ave" as the abbreviation for average)
>
> If I understand you correctly, "time ms per s" is the time share in per
> mille, so something in that direction would be more clear, and "length ms"
> would then be "time [ms]".
>
> The totals for all averages in the table foot would also be nice.
>
> Maybe "Ave" (or avg?) also should be placed on the left, as the "peak"
> label now suggests being the peak of the average.
>
> Btw, keep in mind not all "edge" users / testers are developers (e.g. the
> Twizy driver I'm in contact with); collecting stats feedback for vehicles
> from testers should be straightforward. Maybe add a data/history record,
> sent automatically on every drive/charge stop when the poller tracing is on?
>
> Regards,
> Michael
>
>
> Am 18.05.24 um 02:28 schrieb Michael Geddes:
>
> You did say max/peak value. I also halved the N for both.
> I'm not sure whether the 'max' should be the maximum of the smoothed
> value.. or the maximum of the raw value.
> This is currently the raw-value maximum.
> The problem is that the middle column is the maximum of {sum over
> 10 s} / (10 * 1,000,000).
> I could easily change the 'period' to 1 s and see how that goes... I was just
> trying to reduce the larger calculations.
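>
> (For reference, a minimal sketch of that middle-column conversion - an
> accumulation of microseconds over a 10 s period becomes the "ms per s"
> figure, which is numerically the time share in ‰. The name sum_us is
> hypothetical, just restating the arithmetic:)
>
> #include <cstdint>
>
> // sum_us: microseconds accumulated during one 10 s period
> double MsPerSecond(uint32_t sum_us)
>   {
>   double share = sum_us / (10.0 * 1000000.0);  // fraction of wall time in the period
>   return share * 1000.0;                       // "ms per s" column == time share in per mille
>   }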
>
>
> Usage: poller [pause|resume|status|times|trace]
>
> OVMS# poller time status
> Poller timing is: on
> Type | Count | Ave time | Ave length
> | per s | ms per s | ms
> -------------+----------+-----------+-----------
> Poll:PRI | 1.00| 0.559| 0.543
> peak | | 0.663| 1.528
> -------------+----------+-----------+-----------
> Poll:SRX | 0.08| 0.009| 0.038
> peak | | 0.068| 0.146
> -------------+----------+-----------+-----------
> CAN1 RX[778] | 0.11| 0.061| 0.280
> peak | | 0.458| 1.046
> -------------+----------+-----------+-----------
> CAN1 RX[7a8] | 0.04| 0.024| 0.124
> peak | | 0.160| 0.615
> -------------+----------+-----------+-----------
> CAN1 TX[770] | 0.05| 0.004| 0.016
> peak | | 0.022| 0.102
> -------------+----------+-----------+-----------
> CAN1 TX[7a0] | 0.02| 0.002| 0.011
> peak | | 0.010| 0.098
> -------------+----------+-----------+-----------
> CAN1 TX[7b3] | 0.01| 0.001| 0.006
> peak | | 0.000| 0.099
> -------------+----------+-----------+-----------
> CAN1 TX[7e2] | 0.02| 0.002| 0.011
> peak | | 0.010| 0.099
> -------------+----------+-----------+-----------
> CAN1 TX[7e4] | 0.08| 0.008| 0.048
> peak | | 0.049| 0.107
> -------------+----------+-----------+-----------
> Cmd:State | 0.00| 0.000| 0.005
> peak | | 0.000| 0.094
>
> On Fri, 17 May 2024 at 15:26, Michael Geddes <frog at bunyip.wheelycreek.net>
> wrote:
>
>> This is what I have now.
>> The one on the end is the one Michael B was after, using an N of 32 (up
>> for discussion).
>>
>> The middle is the time spent in that event per second. It accumulates
>> times (in microseconds), and then every 10 s it stores it as a smoothed
>> (N=16) value.
>> The Count is similar (except that we store a value of 100 per event, so it
>> can stay in integers and still have 2 decimal places).
>>
>> Every received poll does a 64-bit difference truncated to 32 bits (for the
>> elapsed time) and a 64-bit comparison (for end-of-period).
>> It also does 1x 32-bit smoothing and 2x 32-bit adds.
>>
>> Then at the end of a 10 s period, it does a 64-bit add to get the next
>> end-of-period value, as well as the 2x 32-bit smoothing calcs.
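>>
>> A minimal sketch of that bookkeeping (hypothetical member names, using the
>> average_util_t class below; the real member layout may differ):
>>
>> #include <cstdint>
>> #include "esp_timer.h"
>>
>> struct poll_time_entry_t
>>   {
>>   average_util_t<uint32_t, 16> ave_event_us;     // "Ave length" column
>>   average_util_t<uint32_t, 16> ave_period_us;    // "Ave time per s" column
>>   average_util_t<uint32_t, 16> ave_period_count; // "Count per s", stored x100
>>   uint32_t acc_us = 0, acc_count = 0;            // accumulators for this period
>>   int64_t  period_end_us = 0;                    // next 10 s boundary
>>
>>   void record(int64_t start_us)
>>     {
>>     int64_t now = esp_timer_get_time();
>>     uint32_t elapsed = (uint32_t)(now - start_us); // 64-bit difference to 32 bits
>>     ave_event_us.add(elapsed);                     // 1x 32-bit smoothing
>>     acc_us += elapsed;                             // 2x 32-bit adds
>>     acc_count += 100;                              // 100 per event = 2 decimal places
>>     if (now >= period_end_us)                      // 64-bit comparison
>>       {
>>       period_end_us = now + 10 * 1000000LL;        // 64-bit add for next end-of-period
>>       ave_period_us.add(acc_us);                   // 2x 32-bit smoothing calcs
>>       ave_period_count.add(acc_count);
>>       acc_us = acc_count = 0;
>>       }
>>     }
>>   };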
>>
>> This is from the Ioniq 5, so no big values yet. You can certainly
>> see how insignificant the TX callbacks are.
>> I'll leave it on for when the car is moving and gets some faster polling.
>>
>> OVMS# poll time status
>> Poller timing is: on
>> Type | Count | Ave time | Ave Length
>> | per s | ms per s | ms
>> -------------+----------+-----------+-----------
>> Poll:PRI | 1.00| 0.540| 0.539
>> Poll:SRX | 0.03| 0.004| 0.017
>> CAN1 RX[778] | 0.06| 0.042| 0.175
>> CAN1 TX[770] | 0.04| 0.002| 0.008
>> Cmd:State | 0.01| 0.001| 0.005
>>
>> ----------------------8<--------------------------------
>>
>> Nice smoothing class (forces N to be a power of 2):
>>
>> #include <cstdint>
>>
>> // floor(log2(x)), computed at compile time
>> constexpr unsigned floorlog2(unsigned x)
>>   {
>>   return x == 1 ? 0 : 1 + floorlog2(x >> 1);
>>   }
>>
>> /* Maintain a smoothed average using shifts for division.
>>  * T should be an integer type.
>>  * N needs to be a power of 2.
>>  */
>> template <typename T, unsigned N>
>> class average_util_t
>>   {
>>   private:
>>     T m_ave;
>>   public:
>>     average_util_t() : m_ave(0) {}
>>     static const uint8_t _BITS = floorlog2(N);
>>     void add(T val)
>>       {
>>       static_assert(N == (1 << _BITS), "N must be a power of 2");
>>       m_ave = (((N-1) * m_ave) + val) >> _BITS;
>>       }
>>     T get() { return m_ave; }
>>     operator T() { return m_ave; }
>>   };
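>>
>> Usage would be along these lines (a hypothetical caller, values in
>> microseconds; a non-power-of-2 N trips the static_assert on first use):
>>
>> average_util_t<uint32_t, 16> ave_us;   // N=16 -> shift by 4
>> ave_us.add(540);                       // feed each new measurement
>> ave_us.add(620);
>> uint32_t smoothed = ave_us.get();      // or implicitly: uint32_t x = ave_us;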
>>
>>
>>
>> On Thu, 16 May 2024 at 10:29, Michael Geddes <frog at bunyip.wheelycreek.net>
>> wrote:
>>
>>> Thanks Michael,
>>>
>>> My calculations give me (2^32 - 1) / (1000*1000*3600) = only about 1.2 hours
>>> of processing time in 32 bits. The initial subtraction is 64-bit anyway, and I
>>> can't see a further 64-bit addition being a problem. I have the calculations
>>> performed in doubles at print-out, where performance is not really an
>>> issue anyway. (Though apparently 64-bit division is worse than
>>> floating point.)
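>>>
>>> (The same headroom figure as a compile-time check - just restating the
>>> arithmetic above:)
>>>
>>> #include <cstdint>
>>>
>>> constexpr double kU32MicrosecondsInHours =
>>>     double(UINT32_MAX) / (1000.0 * 1000.0 * 3600.0);   // ~1.19 h
>>> static_assert(kU32MicrosecondsInHours < 1.3, "uint32 of us wraps after ~1.2 h");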
>>>
>>> In addition:
>>> * I currently have this able to be turned on and off and reset
>>> manually (only do it when required).
>>>
>>> * For the lower-volume commands, the smoothed average is not going to
>>> be useful - the count is more interesting, for different reasons.
>>>
>>> * The total time is quite useful, i.e. a high average time doesn't matter
>>> if the count is low. The things that affect performance are the ones with a
>>> high total time. Something happening 100 times a second needs a much lower
>>> average than something happening once a second.
>>>
>>> * A measure like 'time per minute/second', and possibly count per
>>> minute/second, as a smoothed average would potentially be more useful (or
>>> kept in addition?) - see the rough sketch after this list.
>>> I think we could do _that_ in a reasonably efficient manner using a 64-bit
>>> 'last measured time', a 32-bit accumulated value and the stored 32-bit
>>> rolling average.
>>> It boils down to some iterative (integer) sums and multiplications
>>> plus a divide by N ^ (time periods passed) - which is a shift - and which
>>> can be optimised to '0' if 'time-periods-passed' is more than
>>> 32/(bits-per-N) - effectively limiting the number of iterations.
>>> The one issue I can see is that we need to calculate the 'number of
>>> time-periods passed', which is a 64-bit subtraction followed by a 32-bit
>>> division (not optimisable to a simple shift).
>>>
>>> * I'm also happy to keep a rolling (32-bit) average time.
>>> Even if you assume averages in the 100 ms range, 32 bits will happily
>>> support an N of 64 or even 128.
>>> Am I right in thinking that the choice of N is highly dependent on
>>> frequency? For things happening 100 times per second, you might want an N
>>> like 128, whereas for things happening once per second, you might want an N
>>> of 4 or 8. For the other things we keep track of in this manner, we have a
>>> better idea of the frequency involved.
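>>>
>>> A rough sketch of that 'time per time-period' idea (hypothetical names;
>>> period fixed at 1 s, N = 16 so the divide is a 4-bit shift; idle periods
>>> decay the average by shifting, zeroed once all 32 bits would shift out):
>>>
>>> #include <cstdint>
>>> #include "esp_timer.h"
>>>
>>> struct period_average_t
>>>   {
>>>   static const uint32_t BITS = 4, N = 1 << BITS;   // N = 16
>>>   int64_t  period_start_us = 0;   // 64-bit 'last measured time'
>>>   uint32_t accum = 0;             // 32-bit value accumulated this period
>>>   uint32_t runavg = 0;            // 32-bit rolling average per period
>>>
>>>   void add(uint32_t val)
>>>     {
>>>     int64_t now = esp_timer_get_time();
>>>     uint32_t periods =
>>>       (uint32_t)((now - period_start_us) / 1000000); // 64-bit sub, 32-bit div
>>>     if (periods > 0)
>>>       {
>>>       runavg = ((N - 1) * runavg + accum) >> BITS;   // close the current period
>>>       accum = 0;
>>>       uint32_t idle = periods - 1;                   // periods with no samples
>>>       runavg = (idle * BITS >= 32) ? 0 : (runavg >> (idle * BITS));
>>>       period_start_us += (int64_t)periods * 1000000;
>>>       }
>>>     accum += val;
>>>     }
>>>   };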
>>>
>>> How about we have (per record type) - roughly the struct sketch after this list:
>>> * total count (since last reset?) (32 bit)
>>> * smoothed average of time per instance (32 bit)
>>>
>>> * ?xx? total accumulated time since last reset (64bit) ?? <-- with the
>>> below stats this is much less useful
>>>
>>> * last-measured-time (64 bit)
>>> * accumulated count since last time-period (16bit - but maybe 32bit
>>> anyway for byte alignment?)
>>> * smoothed average of count per time-period (32bit)
>>> * accumulated time since last time-period (32bit)
>>> * smoothed average of time per time-period (32bit)
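>>>
>>> As a struct, roughly (hypothetical names; the smoothing members use the
>>> average_util_t helper quoted elsewhere in this thread):
>>>
>>> #include <cstdint>
>>>
>>> struct poller_time_stats_t
>>>   {
>>>   uint32_t total_count;                              // total count since last reset
>>>   average_util_t<uint32_t, 16> ave_time_us;          // smoothed time per instance
>>>
>>>   int64_t  last_period_us;                           // last measured time (64 bit)
>>>   uint32_t period_count;                             // count since last time-period
>>>   average_util_t<uint32_t, 16> ave_count_per_period; // smoothed count per time-period
>>>   uint32_t period_time_us;                           // time since last time-period
>>>   average_util_t<uint32_t, 16> ave_time_per_period;  // smoothed time per time-period
>>>   };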
>>>
>>> It's possible to keep the
>>>
>>> Is this going to be too much per record type? The number of 'records'
>>> we are keeping is quite low (so 10 to 20 maybe) - so it's not a huge memory
>>> burden.
>>>
>>> Thoughts?
>>>
>>> //.ichael
>>> On Thu, 16 May 2024 at 03:09, Michael Balzer via OvmsDev <
>>> ovmsdev at lists.openvehicles.com> wrote:
>>>
>>>> esp_timer_get_time() is the right choice for precision timing.
>>>>
>>>> I'd say uint32 is enough though, even if counting microseconds that can
>>>> hold a total of about 71 minutes of actual processing time. uint64 has a
>>>> significant performance penalty, although I don't recall the overhead for
>>>> simple additions.
>>>>
>>>> Also & more important, the average wouldn't be my main focus, but the
>>>> maximum processing time seen per ID, which seems to be missing in your
>>>> draft.
>>>>
>>>> Second thought on the average… the exact overall average really has only a
>>>> minor meaning; I'd rather see the current average, adapting to the current
>>>> mode of operation (drive/charge/…). I suggest feeding the measurements into a
>>>> low-pass filter to get the smoothed average of the last n measurements.
>>>> Pattern:
>>>> Pattern:
>>>>
>>>> runavg = ((N-1) * runavg + newval) / N
>>>>
>>>> By using a low power of 2 for N (e.g. 8 or 16), you can replace the
>>>> division by a simple bit shift, and have enough headroom to use 32 bit
>>>> integers.
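>>>>
>>>> With N = 16, for example, the update becomes a shift (32-bit integer
>>>> sketch; runavg/newval are hypothetical names):
>>>>
>>>> uint32_t runavg = 0;
>>>>
>>>> void feed(uint32_t newval)
>>>>   {
>>>>   runavg = (15 * runavg + newval) >> 4;   // ((N-1)*runavg + newval) / N
>>>>   }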
>>>>
>>>> Regards,
>>>> Michael
>>>>
>>>>
>>>> Am 15.05.24 um 06:51 schrieb Michael Geddes via OvmsDev:
>>>>
>>>> Formatting aside, I have implemented what I think Michael B was
>>>> suggesting. This is a sample run on the Ioniq 5 (which doesn't have
>>>> unsolicited RX events).
>>>>
>>>> This uses the call esp_timer_get_time() to get a 64-bit *microseconds*
>>>> since start value - and works out the time to execute that way. It's
>>>> looking at absolute time and not time in the Task - so other things going
>>>> on at the same time in other tasks will have an effect. (The normal tick
>>>> count doesn't have nearly enough resolution to be useful - any other ideas
>>>> on measurement?) I've got total accumulated time displaying in seconds and
>>>> the average in milliseconds currently - but I can change that easily enough.
>>>> The cumulative time is stored as uint64_t, which will be plenty, as
>>>> 32 bits wouldn't be nearly enough.
>>>>
>>>> OVMS# *poller time on*
>>>> Poller timing is now on
>>>>
>>>> OVMS# *poller time status*
>>>> Poller timing is: on
>>>> Poll [PRI] : n=390 tot=0.2s ave=0.586ms
>>>> Poll [SRX] : n=316 tot=0.1s ave=0.196ms
>>>> CAN1 RX[0778] : n=382 tot=0.2s ave=0.615ms
>>>> CAN1 RX[07a8] : n=48 tot=0.0s ave=0.510ms
>>>> CAN1 RX[07bb] : n=162 tot=0.1s ave=0.519ms
>>>> CAN1 RX[07ce] : n=33 tot=0.0s ave=0.469ms
>>>> CAN1 RX[07ea] : n=408 tot=0.2s ave=0.467ms
>>>> CAN1 RX[07ec] : n=486 tot=0.2s ave=0.477ms
>>>> CAN3 RX[07df] : n=769 tot=0.2s ave=0.261ms
>>>> CAN1 TX[0770] : n=191 tot=0.0s ave=0.054ms
>>>> CAN1 TX[07a0] : n=16 tot=0.0s ave=0.047ms
>>>> CAN1 TX[07b3] : n=31 tot=0.0s ave=0.069ms
>>>> CAN1 TX[07c6] : n=11 tot=0.0s ave=0.044ms
>>>> CAN1 TX[07e2] : n=82 tot=0.0s ave=0.067ms
>>>> CAN1 TX[07e4] : n=54 tot=0.0s ave=0.044ms
>>>> Set State : n=7 tot=0.0s ave=0.104ms
>>>>
>>>> This is probably going to be quite useful in general! The TX
>>>> call-backs don't seem to be significant here. (oh, I should probably
>>>> implement a reset of the values too).
>>>>
>>>> //.ichael
>>>>
>>>> On Sun, 12 May 2024 at 22:58, Michael Geddes <
>>>> frog at bunyip.wheelycreek.net> wrote:
>>>>
>>>>> Yeah - I certainly wasn't going to put a hard limit, just a log above
>>>>> a certain time. That being said, the idea of just collecting stats (being
>>>>> able to turn it on via a "poller timer" set of commands) would be much
>>>>> more useful. I'll look into that.
>>>>>
>>>>> Average time is probably a good stat - and certainly what we care
>>>>> about.
>>>>>
>>>>> I'm actually hopeful that those couple of things I did might help
>>>>> reduce that average time quite a bit (the short-cutting of the ISO-TP
>>>>> protocol handling especially).
>>>>>
>>>>> That p/r with logging changes might help reduce the unproductive log
>>>>> time further, but also makes it possible to turn on the poller logging
>>>>> without the RX task logs kicking in.
>>>>>
>>>>> //.ichael
>>>>>
>>>>>
>>>>> On Sun, 12 May 2024 at 22:29, Michael Balzer via OvmsDev <
>>>>> ovmsdev at lists.openvehicles.com> wrote:
>>>>>
>>>>>> Warning about / gathering debug statistics on slow processing can be
>>>>>> helpful, but there must not be a hard limit. Frame/poll response processing
>>>>>> may need disk or network I/O, and the vehicle task may be starved by
>>>>>> momentary high loads on higher-priority tasks (e.g. networking) or by
>>>>>> needing to wait for some semaphore -- that's outside the application's
>>>>>> control, and must not lead to termination/recreation of the task (in case
>>>>>> you're heading in that direction).
>>>>>>
>>>>>> I have no idea how much processing time the current vehicles actually
>>>>>> need in their respective worst cases. Your draft is probably too lax; poll
>>>>>> responses and frames normally need to be processed much faster. I'd say 10
>>>>>> ms is already too slow, but any wait for a queue/semaphore will already
>>>>>> mean at least 10 ms (one FreeRTOS tick). Probably best to begin with just
>>>>>> collecting stats.
>>>>>>
>>>>>> Btw, to help in narrowing down the actual problem case, the profiler
>>>>>> could collect max times per RX message ID.
>>>>>>
>>>>>> Regards,
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> Am 12.05.24 um 10:41 schrieb Michael Geddes:
>>>>>>
>>>>>> I have a question for Michael B (or whoever) - I have a commit lined
>>>>>> up that would add a bit of a time check to the poller loop. What do we
>>>>>> expect the maximum time to execute a poller loop command to be?
>>>>>> This is a rough idea (in ms) I have... based on nothing much really,
>>>>>> so any ideas would be appreciated:
>>>>>> int TardyMaxTime_ms(OvmsPoller::OvmsPollEntryType entry_type)
>>>>>>   {
>>>>>>   switch (entry_type)
>>>>>>     {
>>>>>>     case OvmsPoller::OvmsPollEntryType::Poll:      return 80;
>>>>>>     case OvmsPoller::OvmsPollEntryType::FrameRx:   return 30;
>>>>>>     case OvmsPoller::OvmsPollEntryType::FrameTx:   return 20;
>>>>>>     case OvmsPoller::OvmsPollEntryType::Command:   return 10;
>>>>>>     case OvmsPoller::OvmsPollEntryType::PollState: return 15;
>>>>>>     default: return 80;
>>>>>>     }
>>>>>>   }
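>>>>>>
>>>>>> A hypothetical sketch of how such a check might wrap entry processing
>>>>>> (ProcessEntry, PollQueueEntry, TAG and the field names are placeholders,
>>>>>> not the actual poller code), timed with esp_timer_get_time():
>>>>>>
>>>>>> #include "esp_log.h"
>>>>>> #include "esp_timer.h"
>>>>>>
>>>>>> static const char *TAG = "vehicle-poll";         // placeholder log tag
>>>>>>
>>>>>> void TimedProcess(const PollQueueEntry &entry)   // hypothetical entry type
>>>>>>   {
>>>>>>   int64_t start_us = esp_timer_get_time();
>>>>>>   ProcessEntry(entry);                           // hypothetical handler
>>>>>>   int elapsed_ms = (int)((esp_timer_get_time() - start_us) / 1000);
>>>>>>   if (elapsed_ms > TardyMaxTime_ms(entry.entry_type))
>>>>>>     ESP_LOGW(TAG, "Tardy poller entry (type %d): %d ms",
>>>>>>       (int)entry.entry_type, elapsed_ms);
>>>>>>   }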
>>>>>>
>>>>>> //.ichael
>>>>>>
>>>>>> On Mon, 6 May 2024 at 07:45, Michael Geddes <
>>>>>> frog at bunyip.wheelycreek.net> wrote:
>>>>>>
>>>>>>> I realise that I was only using the standard cable to test - which
>>>>>>> probably is not sufficient - and I haven't looked closely at how the Leaf
>>>>>>> OBD-to-DB9 cable differs from the standard one.
>>>>>>>
>>>>>>> Ah, my bad about the queue length. We are definitely queueing more
>>>>>>> messages though. From my log of when the overflow happened, the poller
>>>>>>> was in state 0, which means OFF - i.e. nothing was being sent!!
>>>>>>>
>>>>>>> I'll look at the TX message thing - opt-in sounds good - though it
>>>>>>> shouldn't be playing that much of a part here, as the TXs are infrequent in
>>>>>>> this case (or zero when the Leaf is off or driving). On the Ioniq 5, when
>>>>>>> I'm using the HUD, I'm polling quite frequently - multiple times per
>>>>>>> second - and that seems to be fine!
>>>>>>>
>>>>>>> I did find an issue with the throttling... but it would still
>>>>>>> mostly apply the throttling where it matters, so again, it shouldn't be the
>>>>>>> problem (also, we aren't transmitting in the Leaf case).
>>>>>>>
>>>>>>> The change I made to the logging of RX messages showed how many were
>>>>>>> dropped in a row... and it was mostly only 1 in a run - which means that
>>>>>>> even if the time between drops is short, the drops are being interleaved
>>>>>>> with at least one success!
>>>>>>>
>>>>>>> Sooo.. I'm still wondering what is going on. Some things I'm going
>>>>>>> to try:
>>>>>>>
>>>>>>> * If the number of messages on the CAN bus (coming in through RX)
>>>>>>> means that the queue is slowly getting longer and not quite catching up,
>>>>>>> then making the queue longer will help it last longer... but that only
>>>>>>> pushes the problem down the road.
>>>>>>> - Add 'current queue length' to the poller status information to
>>>>>>> see if this is indeed the case?
>>>>>>> - Add some kind of alert when the queue reaches a % full?
>>>>>>> * Once you start overflowing and getting overflow log messages, I
>>>>>>> wonder if this is then contributing to the problem.
>>>>>>> - Push the overflow logging into the Poller Task, which can look at how
>>>>>>> many drops occurred since the last received item.
>>>>>>> * Split up the flags for the poller messages into 2:
>>>>>>> - Messages that are/could be happening in the TX/RX tasks
>>>>>>> - Other noisy messages that always happen in the poller task.
>>>>>>>
>>>>>>> Thoughts on what else we might measure to figure out what is going
>>>>>>> on?
>>>>>>>
>>>>>>> //.ichael
>>>>>>>
>>>>>>>
>>>>>>> On Sun, 5 May 2024, 19:29 Michael Balzer via OvmsDev, <
>>>>>>> ovmsdev at lists.openvehicles.com> wrote:
>>>>>>>
>>>>>>>> Michael,
>>>>>>>>
>>>>>>>> the queue size isn't in bytes, it's in messages:
>>>>>>>>
>>>>>>>> * @param uxQueueLength The maximum number of items that the queue
>>>>>>>> *                      can contain.
>>>>>>>> *
>>>>>>>> * @param uxItemSize The number of bytes each item in the queue
>>>>>>>> *                   will require.
>>>>>>>>
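>>>>>>>> For illustration (standard FreeRTOS usage, with a hypothetical item
>>>>>>>> type): the 60 is a number of messages; the byte size only enters via
>>>>>>>> uxItemSize.
>>>>>>>>
>>>>>>>> #include "freertos/FreeRTOS.h"
>>>>>>>> #include "freertos/queue.h"
>>>>>>>>
>>>>>>>> struct poll_queue_entry_t { /* stand-in for the poller's queue item */ };
>>>>>>>>
>>>>>>>> QueueHandle_t queue = xQueueCreate(60 /* items */, sizeof(poll_queue_entry_t));
>>>>>>>>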
>>>>>>>>
>>>>>>>> Also, from the time stamps in Derek's log excerpt, there were quite
>>>>>>>> a few dropped frames in that time window -- at least 23 frames in 40 ms;
>>>>>>>> that's bad.
>>>>>>>>
>>>>>>>> Queue sizes are currently:
>>>>>>>>
>>>>>>>> CONFIG_OVMS_HW_CAN_RX_QUEUE_SIZE=60
>>>>>>>> CONFIG_OVMS_VEHICLE_CAN_RX_QUEUE_SIZE=60
>>>>>>>>
>>>>>>>> The new poller now channels all TX callbacks through the task queue,
>>>>>>>> in addition to RX and commands. So setting the queue size to be larger
>>>>>>>> than the CAN RX queue size seems appropriate.
>>>>>>>>
>>>>>>>> Nevertheless, an overflow with more than 60 waiting messages still
>>>>>>>> indicates an overly long processing time in the vehicle task.
>>>>>>>>
>>>>>>>> TX callbacks previously were done directly in the CAN context, and
>>>>>>>> no current vehicle overrides the empty default handler, so this imposed
>>>>>>>> almost no additional overhead. By requiring a queue entry for each TX
>>>>>>>> callback, this feature now has a potentially high impact for all vehicles.
>>>>>>>> If passing these to the task is actually necessary, it needs to become an
>>>>>>>> opt-in feature, so that only vehicles subscribing to the callback need to
>>>>>>>> cope with the additional load & potential processing delays involved.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Michael
>>>>>>>>
>>>>>>>>
>>>>>
>>>
> --
> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>
>