[Ovmsdev] OVMS Poller module/singleton

Sun May 26 20:35:11 HKT 2024

It _should_ already be sending a report on charge stop.

MyEvents.RegisterEvent(TAG, "vehicle.charge.stop", std::bind(&OvmsPollers::VehicleChargeStop, this, _1, _2));

Reset on charge start/vehicle on is a good idea.

A question – would it be silly if the first value after a reset, rather than using 0 average to start with, if the average got set to the initial value?  I’m in 2 minds about it. It would make the average more useful quicker.

//.ichael

On Sun, 26 May 2024, 19:39 Michael Balzer via OvmsDev, <ovmsdev at lists.openvehicles.com <mailto:ovmsdev at lists.openvehicles.com> > wrote:

As the averages quickly decline when idle, an automatic report should probably also be sent on charge stop.

Also, I think you should automatically reset the timer statistics on drive & charge start.

First stats from charging my UpMiiGo:

Type           | count  | Utlztn | Time 
               | per s  | [‰]    | [ms]
---------------+--------+--------+---------
Poll:PRI    Avg|    1.00|   0.723|    0.716
           Peak|        |   1.282|    3.822
---------------+--------+--------+---------
Poll:SRX    Avg|    7.72|   1.246|    0.184
           Peak|        |   3.128|    1.058
---------------+--------+--------+---------
RxCan1[7ae] Avg|    2.48|   0.915|    0.362
           Peak|        |   1.217|    1.661
---------------+--------+--------+---------
RxCan1[7cf] Avg|    4.76|   1.928|    0.397
           Peak|        |   2.317|    2.687
---------------+--------+--------+---------
RxCan1[7ed] Avg|    3.38|   1.251|    0.327
           Peak|        |   8.154|   12.273
---------------+--------+--------+---------
RxCan1[7ee] Avg|    0.21|   0.066|    0.297
           Peak|        |   0.225|    1.690
---------------+--------+--------+---------
TxCan1[744] Avg|    1.49|   0.022|    0.011
           Peak|        |   0.032|    0.095
---------------+--------+--------+---------
TxCan1[765] Avg|    3.89|   0.134|    0.027
           Peak|        |   0.155|    0.113
---------------+--------+--------+---------
TxCan1[7e5] Avg|    2.32|   0.038|    0.013
           Peak|        |   0.295|    0.084
---------------+--------+--------+---------
TxCan1[7e6] Avg|     1  0.21|   0.002|    0.008
           Peak|        |   0.010|    0.041
---------------+--------+--------+---------
Cmd:State   Avg|    0.00|   0.000|    0.007
           Peak|        |   0.005|    0.072
===============+========+========+=========
      Total Avg|   27.46|   6.324|    2.349

Overall healthy I'd say, but let's see how it compares.

7ed is the BMS, the peak time is probably related to the extended cell data logging -- I've enabled log intervals for both cell voltages & temperatures.

Regards,
Michael

Am 26.05.24 um 08:42 schrieb Michael Balzer via OvmsDev:

The notification works on my devices, it only has a garbled per mille character -- see attached screenshot. The same applies to the mail version:

Poller timing is: on
Type           | count  | Utlztn | Time 
               | per s  | [â€°]    | [ms]
---------------+--------+--------+---------
Poll:PRI    Avg|    0.25|   0.119|    0.382
           Peak|        |   0.513|    0.678
---------------+--------+--------+---------
RxCan1[597] Avg|    0.01|   0.004|    0.021
           Peak|        |   0.000|    0.338
---------------+--------+--------+---------
RxCan1[59b] Avg|    0.01|   0.011|    0.053
           Peak|        |   0.000|    0.848
---------------+--------+--------+---------
Cmd:State   Avg|    0.01|   0.002|    0.012
           Peak|        |   0.000|    0.120
===============+========+========+=========
      Total Avg|    0.28|   0.135|    0.468

The encoding is a general issue. The character encoding for text messages via V2/MP is quite old & clumsy, it's got an issue with the degree celcius character as well. We previously tried to keep all text messages within the SMS safe character set (which e.g. lead to writing just "C" instead of "°C"). I'd say we should head towards UTF-8 now. If we ever refit SMS support, we can recode on the fly.

Regarding not seeing the notification on your phone:

a) Check your notification subtype/channel filters on the module. See https://docs.openvehicles.com/en/latest/userguide/notifications.html#suppress-notifications

b) Check your notification vehicle filters on the phone (menu on notification tab): if you enabled the vehicle filter, it will add the messages of not currently selected vehicles to the list only, but not raise a system notification. (Applies to the Android App, no idea about the iOS version)

Regards,
Michael

Am 26.05.24 um 06:32 schrieb Michael Geddes:

Hi,

I'm trying to finalise this now .. and one last thing is that I don't get the report coming to my mobile. I'm using the command:

  MyNotify.NotifyString("info", "poller.report", buf.c_str());

Where the buffer string is just the same as the report output.  Should I be using some other format or command?

I get "alert" types (like the ioniq5 door-open alert) fine to my mobile.

Michael.

On Sun, 19 May 2024, 12:51 Michael Balzer via OvmsDev, <ovmsdev at lists.openvehicles.com <mailto:ovmsdev at lists.openvehicles.com> > wrote:

A builtin web UI for this seems a bit over the top. Builtin web config pages should focus on user features, this is clearly a feature only needed during/for the development/extension of a vehicle adapter. Development features in the web UI are confusing for end users.

If persistent enabling/disabling is done by a simple config command (e.g. "config set can poller.trace on"), that's also doable by users.

Regards,
Michael

Am 19.05.24 um 02:06 schrieb Michael Geddes:

I was so focused on how I calculated the value that I totally missed that ‰ would be a better description.  I could also use the  system 'Ratio' unit... so % or ‰.  

I'll make space to put 'Avg' on the row.  Was trying to limit the width for output on a mobile.  I agree it would make it easier to understand.

Totals also makes sense.

Should I make this a configuration that can be set on the web-page?   I'd probably use a configuration change notification so that the very bit setting is sync'd with the 'configuration' value.

//.ichael

On Sat, 18 May 2024, 14:05 Michael Balzer, <dexter at expeedo.de <mailto:dexter at expeedo.de> > wrote:

I'm not sure whether the 'max' should be the maximum of the smoothed value.. or the maximum of the raw value.

It should normally be the maximum of the raw value I think, the maximum of the smoothed value cannot tell about how bad the processing of an ID can become.

The naming in the table is a bit confusing I think. (besides: I've never seen "ave" as the abbreviation for average)

If I understand you correctly, "time ms per s" is the time share in per mille, so something in that direction would be more clear, and "length ms" would then be "time [ms]".

The totals for all averages in the table foot would also be nice.

Maybe "Ave" (or avg?) also should be placed on the left, as the "peak" label now suggests being the peak of the average.

Btw, keep in mind, not all "edge" users / testers are developers (e.g. the Twizy driver I'm in contact with), collecting stats feedback for vehicles from testers should be straight forward. Maybe add a data/history record, sent automatically on every drive/charge stop when the poller tracing is on?

Regards,
Michael

Am 18.05.24 um 02:28 schrieb Michael Geddes:

You did say max/pead value.  I also halved the N for both.

I'm not sure whether the 'max' should be the maximum of the smoothed value.. or the maximum of the raw value.

This is currently the raw-value maximum.  

The problem is that the middle column is the maximum of the {{sum over 10s} / (10*1000,000)

I could easily change the 'period' to 1s and see how that goes.. was just trying to reduce the larger calculations. 

Usage: poller [pause|resume|status|times|trace] 

OVMS# poller time status
Poller timing is: on
Type         | Count    | Ave time  | Ave length
             | per s    | ms per s  | ms 
-------------+----------+-----------+-----------
Poll:PRI     |      1.00|      0.559|      0.543
 peak        |          |      0.663|      1.528
-------------+----------+-----------+-----------
Poll:SRX     |      0.08|      0.009|      0.038
 peak        |          |      0.068|      0.146
-------------+----------+-----------+-----------
CAN1 RX[778] |      0.11|      0.061|      0.280
 peak        |          |      0.458|      1.046
-------------+----------+-----------+-----------
CAN1 RX[7a8] |      0.04|      0.024|      0.124
 peak        |          |      0.160|      0.615
-------------+----------+-----------+-----------
CAN1 TX[770] |      0.05|      0.004|      0.016
 peak        |          |      0.022|      0.102
-------------+----------+-----------+-----------
CAN1 TX[7a0] |      0.02|      0.002|      0.011
 peak        |          |      0.010|      0.098
-------------+----------+-----------+-----------
CAN1 TX[7b3] |      0.01|      0.001|      0.006
 peak        |          |      0.000|      0.099
-------------+----------+-----------+-----------
CAN1 TX[7e2] |      0.02|      0.002|      0.011
 peak        |          |      0.010|      0.099
-------------+----------+-----------+-----------
CAN1 TX[7e4] |      0.08|      0.008|      0.048
 peak        |          |      0.049|      0.107
-------------+----------+-----------+-----------
Cmd:State    |      0.00|      0.000|      0.005
 peak        |          |      0.000|      0.094

On Fri, 17 May 2024 at 15:26, Michael Geddes <frog at bunyip.wheelycreek.net <mailto:frog at bunyip.wheelycreek.net> > wrote:

This is what I have now.

The one on the end is the one  MIchael B was after using an N of 32. (up for discussion).

The middle is the time  spent in that even t per second.  It accumulates times (in microseconds), and then every 10s it stores it as smoothed (N=16) value.

The Count is similar (except that we store a value of '100' as 1 event so it can be still integers and has 2 decimal places).

Every received poll  does a 64bit difference to 32bit (for the elapsed time) and 64bit comparison (for end-of-period).

It also does 1x 32bit smoothing and 2x 32bit adds.

Then at the end of a 10s period, it will do a 64bit add to get the next end-of-period value, as well as the 2x 32bit smoothing calcs.

This is from the Ioniq 5 so not any big values yet.  You can certainly see how insignificant the TX callbacks are.

I'll leave it on for when the car is moving and gets some faster polling.

OVMS# poll time status
Poller timing is: on
Type         | Count    | Ave time  | Ave Length
             | per s    | ms per s  | ms 
-------------+----------+-----------+-----------
Poll:PRI     |      1.00|      0.540|      0.539
Poll:SRX     |      0.03|      0.004|      0.017
CAN1 RX[778] |      0.06|      0.042|      0.175
CAN1 TX[770] |      0.04|      0.002|      0.008
Cmd:State    |      0.01|      0.001|      0.005

----------------------8<--------------------------------

Nice smoothing class (forces N as a power of 2):

  constexpr unsigned floorlog2(unsigned x)
    {
    return x == 1 ? 0 : 1+floorlog2(x >> 1);
    }
  /* Maintain a smoothed average using shifts for division.
   * T should be an integer type
   * N needs to be a power of 2
   */
  template <typename T, unsigned N>
  class average_util_t
    {
    private:
      T m_ave;
    public:
      average_util_t() : m_ave(0) {}
      static const uint8_t _BITS = floorlog2(N);
      void add( T val)
        {
        static_assert(N == (1 << _BITS), "N must be a power of 2");
        m_ave = (((N-1) * m_ave) + val) >> _BITS;
        }
      T get() { return m_ave; }
      operator T() { return m_ave; }
    };

On Thu, 16 May 2024 at 10:29, Michael Geddes <frog at bunyip.wheelycreek.net <mailto:frog at bunyip.wheelycreek.net> > wrote:

Thanks Michael,

My calculations give me ((2^32)-1) / (1000*1000*3600) = only 1.2 hours of processing time in 32bit.  The initial subtraction is 64bit anyway and I can't see a further 64bit addition being a problem. I have the calculations being performed in doubles at print-out where performance is not really an issue anyway. (Though apparently doing 64 bit division is worse than floating point). 

In addition

* I currently have this being able to be turned on and off and reset manually (only do it when required).

*  For the lower volume commands, the smoothed average is not going to be useful - the count is more interesting for different reasons.

* The total time is quite useful. Ie a high average time doesn't matter if the count is low.  The things that are affecting performance are stuff with high total time. Stuff which is happening 100 times a second needs to be a much lower average than once a second.

* A measure like 'time per minute/second' and possibly count per minute/seconds as a smoothed average would potentially be more useful. (or in addition?)

I think we could do _that_ in a reasonably efficient manner using a 64 bit 'last measured time', a 32 bit accumulated value and the stored 32 bit rolling average. 

It would boils down to some iterative (integer) sums and multiplications plus a divide by n ^ (time periods passed)  - which is a shift - and which can be optimised to '0' if 'time-periods-passed' is more than 32/(bits-per-n) - effectively limiting the number of iterations.

The one issue I can see is that we need to calculate 'number of time-periods passed' which is a 64 bit subtraction followed by a 32 bit division (not optimisable to a simple shift).

* I'm also happy to keep a rolling (32bit) average time.

   Even if you assume averages in the 100ms, 32bit is going to happily support an N of 64 or even 128.

   Am I right in thinking that the choice of N is highly dependent on frequency. For things happening 100 times per second, you might want an N like 128.. where things happening once per

   second, you might want an N of 4 or 8.  The other things we keep track of in this manner we have a better idea of the frequency of the thing.

How about we have (per record type):

  * total count (since last reset?) (32 bit)

  * smoothed average of time per instance (32 bit)

  * ?xx? total accumulated time since last reset (64bit) ?? <-- with the below stats this is much less useful

  * last-measured-time (64 bit) 

  * accumulated count since last time-period (16bit - but maybe 32bit anyway for byte alignment?)

  * smoothed average of count per time-period (32bit)

  * accumulated time since last time-period (32bit)

  * smoothed average of time per time-period (32bit)

It's possible to keep the 

Is this going to be too much per record type?  The number of 'records' we are keeping is quite low (so 10 to 20 maybe) - so it's not a huge memory burden.

Thoughts?

//.ichael

On Thu, 16 May 2024 at 03:09, Michael Balzer via OvmsDev <ovmsdev at lists.openvehicles.com <mailto:ovmsdev at lists.openvehicles.com> > wrote:

esp_timer_get_time() is the right choice for precision timing.

I'd say uint32 is enough though, even if counting microseconds that can hold a total of more than 71 hours of actual processing time. uint64 has a significant performance penalty, although I don't recall the overhead for simple additions.

Also & more important, the average wouldn't be my main focus, but the maximum processing time seen per ID, which seems to be missing in your draft.

Second thought on the average… the exact overall average really has a minor meaning, I'd rather see the current average, adapting to the current mode of operation (drive/charge/…). I suggest feeding the measurements to a low pass filter to get the smoothed average of the last n measurements. Pattern:

runavg = ((N-1) * runavg + newval) / N

By using a low power of 2 for N (e.g. 8 or 16), you can replace the division by a simple bit shift, and have enough headroom to use 32 bit integers.

Regards,
Michael

Am 15.05.24 um 06:51 schrieb Michael Geddes via OvmsDev:

Formatting aside, I have implemented what I think Michael B was suggesting.  This is a sample run on the Ioniq 5 (which doesn't have unsolicited RX events). 

This uses the call esp_timer_get_time() got get a 64bit microseconds since started value - and works out the time to execute that way. It's looking at absolute time and not time in the Task - so other things going on at the same time in other tasks will have an effect.  (The normal tick count doesn't have nearly enough resolution to be useful - any other ideas on measurement?) I've got total accumulated time displaying in seconds and the average in milliseconds currently - but I can change that easy enough.

The cumulative time is stored as uint64_t which will be plenty, as 32bit wouldn't be nearly enough.

OVMS# poller time on
Poller timing is now on

OVMS# poller time status
Poller timing is: on
Poll [PRI]               : n=390 tot=0.2s ave=0.586ms
Poll [SRX]               : n=316 tot=0.1s ave=0.196ms
CAN1 RX[0778]            : n=382 tot=0.2s ave=0.615ms
CAN1 RX[07a8]            : n=48 tot=0.0s ave=0.510ms
CAN1 RX[07bb]            : n=162 tot=0.1s ave=0.519ms
CAN1 RX[07ce]            : n=33 tot=0.0s ave=0.469ms
CAN1 RX[07ea]            : n=408 tot=0.2s ave=0.467ms
CAN1 RX[07ec]            : n=486 tot=0.2s ave=0.477ms
CAN3 RX[07df]            : n=769 tot=0.2s ave=0.261ms
CAN1 TX[0770]            : n=191 tot=0.0s ave=0.054ms
CAN1 TX[07a0]            : n=16 tot=0.0s ave=0.047ms
CAN1 TX[07b3]            : n=31 tot=0.0s ave=0.069ms
CAN1 TX[07c6]            : n=11 tot=0.0s ave=0.044ms
CAN1 TX[07e2]            : n=82 tot=0.0s ave=0.067ms
CAN1 TX[07e4]            : n=54 tot=0.0s ave=0.044ms
Set State                : n=7 tot=0.0s ave=0.104ms

This is probably going to be quite useful in general!  The TX call-backs don't seem to be significant here. (oh, I should probably implement a reset of the values too).

//.ichael

On Sun, 12 May 2024 at 22:58, Michael Geddes <frog at bunyip.wheelycreek.net <mailto:frog at bunyip.wheelycreek.net> > wrote:

Yeah - I certainly wasn't going to put a hard limit. Just a log above a certain time, that being said, the idea of just collecting stats (being able to turn it on via a  "poller timer" set of commands) would be much more useful.  I'll look into that.

Average time is probably a good stat - and certainly what we care about.

I actually am hopeful that those couple of things I did might help reduce that average time quite a bit (that short-cutting the isotp protocol handling especially). 

That p/r with logging changes might help reduce the unproductive log time further, but also makes it possible to turn on the poller logging without the RX task logs kicking in.

//.ichael

On Sun, 12 May 2024 at 22:29, Michael Balzer via OvmsDev <ovmsdev at lists.openvehicles.com <mailto:ovmsdev at lists.openvehicles.com> > wrote:

Warning / gathering debug statistics about slow processing can be helpful, but there must not be a hard limit. Frame/poll response processing may need disk or network I/O, and the vehicle task may be starving from punctual high loads on higher priority tasks (e.g. networking) or by needing to wait for some semaphore -- that's outside the application's control, and must not lead to termination/recreation of the task (in case you're heading towards that direction).

I have no idea how much processing time the current vehicles actually need in their respective worst cases. Your draft is probably too lax, poll responses and frames normally need to be processed much faster. I'd say 10 ms is already too slow, but any wait for a queue/semaphore will already mean at least 10 ms (FreeRTOS tick). Probably best to begin with just collecting stats.

Btw, to help in narrowing down the actual problem case, the profiler could collect max times per RX message ID.

Regards,
Michael

Am 12.05.24 um 10:41 schrieb Michael Geddes:

I have a question for Michael B  (or whoever) - I have a commit lined up that would add a bit of a time check to the poller loop. What do we expect the maximum time to execute a poller loop command should be? 

This is a rough idea (in ms) I have.. based on nothing much really, so any ideas would be appreciated:

    int TardyMaxTime_ms(OvmsPoller::OvmsPollEntryType entry_type)
      {
      switch (entry_type)
        {
        case OvmsPoller::OvmsPollEntryType::Poll: return 80;
        case OvmsPoller::OvmsPollEntryType::FrameRx: return 30;
        case OvmsPoller::OvmsPollEntryType::FrameTx: return 20;
        case OvmsPoller::OvmsPollEntryType::Command: return 10;
        case OvmsPoller::OvmsPollEntryType::PollState: return 15;
        default: return 80;
        }
      }

//.ichael

On Mon, 6 May 2024 at 07:45, Michael Geddes <frog at bunyip.wheelycreek.net <mailto:frog at bunyip.wheelycreek.net> > wrote:

I realise that I was only using the standard cable to test - which probably is not sufficient - I haven't looked closely at how the Leaf OBD to Db9 cable is different from standard.

Ah, my bad out the queue length.   We are definitely queueing more messages though.   From my log of when the overflow happened, the poller was in state 0 which means OFF - ie nothing was being sent!!

I'll look at the TX message thing - opt in sounds good - though it shouldn't be playing that much of a part here as the TXs are infrequent in this case (or zero when the leaf is off or driving) - On the ioniq 5 when I'm using the HUD - I'm polling quite frequently - multiple times per second and that seems to be fine!.

I did find an issue with the throttling .. but  it would still mostly apply the throttling where it matters, so again, it shouldn't be the problem (also, we aren't transmitting in the leaf case).

The change I made to the logging of RX messages showed how many in a row were dropped... and it was mostly 1 only in a run - which means even if it is a short time between - that means that the drops are being interleaved by at least one success!

Sooo.. I'm still wondering what is going on.  Some things I'm going to try:

* If the number of messages on the Can bus (coming in through RX) means that the queue is slowly getting longer and not quite catching up, then making the queue longer will help it last longer... but only pushes the problem down the road.

  - Add 'current queue length' to the poller status  information to see if this is indeed the case?  

  - Add some kind of alert when the queue reaches a % full?

* Once you start overflowing and getting overflow log messages, I wonder if this is then contributing to the problem.

  - Push the overflow logging into Poller Task which can look at how many drops occurred since last received item.

* Split up the flags for the poller messages into 2:

   - Messages that are/could be happening in the TX/RX tasks

   - Other noisy messages that always happen in the poller task.

Thoughts on what else we might measure to figure out what is going on?

//.ichael

On Sun, 5 May 2024, 19:29 Michael Balzer via OvmsDev, <ovmsdev at lists.openvehicles.com <mailto:ovmsdev at lists.openvehicles.com> > wrote:

Michael,

the queue size isn't in bytes, it's in messages:

 * @param uxQueueLength The maximum number of items that the queue can contain.
 *
 * @param uxItemSize The number of bytes each item in the queue will require.

Also, from the time stamps in Dereks log excerpt, there were quite some dropped frames in that time window -- at least 23 frames in 40 ms, that's bad.

Queue sizes are currently:

CONFIG_OVMS_HW_CAN_RX_QUEUE_SIZE=60
CONFIG_OVMS_VEHICLE_CAN_RX_QUEUE_SIZE=60

The new poller now channels all TX callbacks through the task queue additionally to RX and commands. So setting the queue size to be larger than the CAN RX queue size seems appropriate.

Nevertheless, an overflow with more than 60 waiting messages still indicates some too long processing time in the vehicle task.

TX callbacks previously were done directly in the CAN context, and no current vehicle overrides the empty default handler, so this imposed almost no additional overhead. By requiring a queue entry for each TX callback, this feature now has a potentially high impact for all vehicles. If passing these to the task is actually necessary, it needs to become an opt-in feature, so only vehicles subscribing to the callback actually need to cope with that additional load & potential processing delays involved.

Regards,
Michael

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20240526/d98ab9db/attachment-0001.htm>