[Ovmsdev] Reboot under some load

Tue Nov 1 04:20:32 HKT 2022

Ludovic,

On 31.10.22 at 10:59, Ludovic LANGE wrote:
>
> (I'm reposting because I had the impression that my message didn't get
> through. If it appears as a duplicate, please forgive me, and delete
> the double post if necessary. I'm still learning how to handle the delay
> between posting and list visibility (moderation?))
>
You're right, it didn't get through, but there is no moderation. Have you
checked your junk folder for an error message? Possibly Mark can see
something in the logs.

> Metrics are properly generated (from DBC), and properly displayed on
> the dashboard. However, the combination of the "intense" bus traffic
> and the number of generated metrics seems to be overflowing, in some
> way, the capacity of the WebSocketHandler, which results in a reboot
> from time to time:
>
>> W (5111095) websocket: WebSocketHandler[0x3f8d1654]: job queue overflow resolved, 14 drops
>> W (5111095) websocket: WebSocketHandler[0x3f8d1654]: job queue overflow detected
>> I (5111105) metrics: Modified metric v.g.current: 0A
>> I (5111105) metrics: Modified metric v.m.rpm: 763
>> I (5111115) metrics: Modified metric v.i.temp: 34.1°C
>> W (5111115) websocket: WebSocketHandler[0x3f8d1654]: job queue overflow detected
>> W (5111125) websocket: WebSocketHandler[0x3f8d1654]: job queue overflow detected

A WebSocket client channel can jam easily if it can't transmit the data
to the client fast enough. This doesn't depend on the actual Wifi
connection quality alone, but also on the processing speed of the client
device. My impression is that complex and fast chart updates can force
the JavaScript engine to do a lot of memory management work.
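As a rough back-of-the-envelope model (an assumption: steady rates, no
bursts, queue initially empty), if jobs are queued at rate lambda_in and
the client link drains them at a lower rate lambda_out, a queue of Q jobs
fills up after the time below. The example uses Q = 50 (the current queue
size) and purely hypothetical rates of 40 and 30 jobs/s:

```latex
t_{\text{overflow}} = \frac{Q}{\lambda_{\text{in}} - \lambda_{\text{out}}},
\qquad \lambda_{\text{in}} > \lambda_{\text{out}}

\text{e.g. } t_{\text{overflow}} = \frac{50\ \text{jobs}}{(40 - 30)\ \text{jobs/s}} = 5\ \text{s}
```

So a client only needs to fall slightly behind for the overflow to show up
within seconds, and a larger Q only postpones it as long as the drain rate
stays below the arrival rate.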

I haven't had the time to analyse this, but I'm pretty sure there are
options to reduce the load. The dashboard & chart data processing is
still my first implementation; I didn't invest much time in optimizing
it. For example, every new data series is a new allocation, so the
garbage collector has a lot of work to do.

Having said that, you should also try to reduce the data volume. From 
your logs it seems you've got metrics tracing enabled. That produces a 
log message on every metrics update, and all log messages are 
transmitted via the WebSocket channel.
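A minimal sketch of why tracing multiplies the load: the LogSink and
MetricStore classes below are hypothetical stand-ins, not the OVMS API.
With tracing on, every changed value produces one log line, and each log
line would be one more message for the WebSocket channel.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical log channel; in the real system log lines are also
// forwarded to connected WebSocket clients.
struct LogSink {
  std::function<void(const std::string&)> forward;
  void Log(const std::string& line) { if (forward) forward(line); }
};

// Hypothetical metric store with an optional trace flag.
class MetricStore {
 public:
  explicit MetricStore(LogSink& sink) : sink_(sink) {}
  void SetTrace(bool on) { trace_ = on; }
  void Set(const std::string& name, const std::string& value) {
    auto it = values_.find(name);
    if (it != values_.end() && it->second == value) return;  // unchanged: no event
    values_[name] = value;
    if (trace_) sink_.Log("Modified metric " + name + ": " + value);
  }
 private:
  LogSink& sink_;
  bool trace_ = false;
  std::map<std::string, std::string> values_;
};

// Counts how many log lines a burst of distinct updates would forward.
int CountForwardedLines(bool trace, int updates) {
  int forwarded = 0;
  LogSink sink{[&forwarded](const std::string&) { ++forwarded; }};
  MetricStore store(sink);
  store.SetTrace(trace);
  for (int i = 0; i < updates; ++i)
    store.Set("v.m.rpm", std::to_string(700 + i));  // every value differs
  return forwarded;
}
```

With 1000 distinct updates, tracing on forwards 1000 lines; tracing off
forwards none.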

>> E (5111845) task_wdt: Tasks currently running:
>> E (5111845) task_wdt: CPU 0: wifi
>> E (5111845) task_wdt: CPU 1: OVMS Console

If you didn't execute a command on the console at that moment, that's
probably also an indicator of a high log load.

> Please note that the Lab setup has:
>
>   * OVMS connected to the Lab network
>   * The computer (displaying the dashboard) also connected to the Lab
>     network
>
> (While, in the car, the computer / tablet would be directly connected 
> to OVMS' wifi)
>

That shouldn't make much of a difference. But you could try configuring
only Wifi client or AP mode, not both, depending on the setup. The AP
runs on the same channel, so it might take away some capacity.

> That's it for the context, now a few questions:
>
>   * As I don't know about the capabilities of the OVMS for CAN bus
>     traffic analysis, does it look like the number / frequency of
>     messages I'm injecting is unreasonable?
>
No.

>   * It seems like there is some buffering / consolidation of the
>     metrics before sending them to the WebSocket; is this tweakable
>     in some way?
>
Metrics updates are initiated by the web client update ticker every
250 ms. You can experiment with changing the interval, or make it
configurable if you like, but higher frequencies gave me bad results by
producing too much load on the smartphones I tested, and lower
frequencies are bad for a smooth UI experience.
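The consolidation can be pictured like this minimal sketch, assuming a
hypothetical MetricCoalescer class (not necessarily how OVMS tracks
modified metrics): changes between two ticks only overwrite a pending
entry, and each tick sends one batch with the latest value per metric.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical coalescer: metric changes are recorded between ticks, and
// the update tick (e.g. every 250 ms) collects one batch per tick.
class MetricCoalescer {
 public:
  // Called on every metric change; later values overwrite earlier ones.
  void OnModified(const std::string& name, const std::string& value) {
    pending_[name] = value;
  }
  // Called once per tick: one entry per changed metric, then reset.
  std::vector<std::pair<std::string, std::string>> Flush() {
    std::vector<std::pair<std::string, std::string>> batch(pending_.begin(),
                                                           pending_.end());
    pending_.clear();
    return batch;
  }
 private:
  std::map<std::string, std::string> pending_;  // ordered by metric name
};
```

Ten rpm changes within one tick therefore cost a single entry. A longer
interval collapses more updates per batch and lowers the load, at the cost
of UI smoothness.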

Regarding the queue overflow, you might experiment with raising the
queue size, which is currently 50 jobs. But if the queue fills up with
50 tx jobs, chances are you've got Wifi or client capacity issues.
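To illustrate what the "overflow detected" / "overflow resolved, N drops"
messages mean, here is a minimal single-threaded sketch of a bounded tx job
queue. TxJobQueue is a hypothetical class loosely modelled on those log
lines, not the actual WebSocketHandler code (which also has to deal with
locking between tasks).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <deque>
#include <string>

// Hypothetical bounded job queue: jobs beyond the capacity are dropped
// and counted until the consumer frees a slot again.
class TxJobQueue {
 public:
  explicit TxJobQueue(std::size_t capacity = 50) : capacity_(capacity) {}

  // Producer side: returns false if the job had to be dropped.
  bool Push(const std::string& job) {
    if (jobs_.size() >= capacity_) {
      if (!overflow_) {
        overflow_ = true;
        std::printf("job queue overflow detected\n");
      }
      ++drops_;
      return false;
    }
    if (overflow_) {
      // First successful enqueue after an overflow: report and reset.
      std::printf("job queue overflow resolved, %zu drops\n", drops_);
      overflow_ = false;
      drops_ = 0;
    }
    jobs_.push_back(job);
    return true;
  }

  // Consumer side (the transmitter): takes the oldest job, if any.
  bool Pop(std::string& job) {
    if (jobs_.empty()) return false;
    job = jobs_.front();
    jobs_.pop_front();
    return true;
  }

  bool overflow() const { return overflow_; }
  std::size_t drops() const { return drops_; }
  std::size_t size() const { return jobs_.size(); }

 private:
  std::size_t capacity_;
  std::deque<std::string> jobs_;
  bool overflow_ = false;
  std::size_t drops_ = 0;
};
```

A log showing "resolved" immediately followed by "detected" again, as in
the quoted output, fits a queue that the consumer drains only one slot at a
time while the producer keeps it at the limit; a larger capacity would then
only delay the next overflow.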

>   * Does the DBC processor add significant processing time (compared
>     to a dedicated vehicle module) when processing CAN data?
>
I don't know; I haven't used the DBC processor on real data.

>
>   * What would be the best way to diagnose / confirm the health of the
>     processes involved here ?
>
Use the task monitoring (command "module tasks") to check the CPU load
of your tasks.

Reduce any unnecessary load: for example, avoid excessive logging, user
event creation, file writes and especially SD card accesses, which can
be very slow; see my warning here:
https://docs.openvehicles.com/en/latest/userguide/scripting.html#vfs
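If a module really needs to record data to a file, one common way to cut
the number of slow SD card writes is batching. This is a swapped-in
technique, not something the OVMS framework is claimed to provide; the
BatchedFileWriter class and its threshold parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <utility>

// Hypothetical batching writer: collects lines in RAM and appends them to
// the file in one write once `threshold` bytes have accumulated, instead
// of opening and writing the file for every single record.
class BatchedFileWriter {
 public:
  BatchedFileWriter(std::string path, std::size_t threshold)
      : path_(std::move(path)), threshold_(threshold) {}
  ~BatchedFileWriter() { Flush(); }  // write any remainder on destruction

  void Append(const std::string& line) {
    buffer_ += line;
    buffer_ += '\n';
    if (buffer_.size() >= threshold_) Flush();
  }

  void Flush() {
    if (buffer_.empty()) return;
    std::ofstream out(path_, std::ios::app);  // one open + write per batch
    out << buffer_;
    buffer_.clear();
    ++flushes_;
  }

  std::size_t flushes() const { return flushes_; }

 private:
  std::string path_;
  std::size_t threshold_;
  std::string buffer_;
  std::size_t flushes_ = 0;
};
```

With 11-byte lines and a 64-byte threshold, 100 records cost 16 batched
writes plus one final flush instead of 100 separate writes. The trade-off
is that buffered records are lost if the module reboots before a flush.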

Use the browser developer tools to analyse client performance. By the
way, you can see the actual WebSocket packets if you open the network
monitor before opening the web UI.

>   * Any similar use case / feedback from you?
>
>
> Thanks for any feedback.
>

Regards,
Michael

-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Phone 02333 / 833 5735 * Mobile 0176 / 206 989 26
