<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Sorry, I didn't think about this being
an issue elsewhere -- german data plans typically start at minimum
100 MB/month flat (that's my current plan at 3 € / month).<br>
<br>
No need for a new release, it can be turned off OTA by issueing<br>
<br>
<tt>config set module debug.tasks no</tt><br>
<br>
Regards,<br>
Michael<br>
<br>
<br>
Am 19.09.19 um 09:08 schrieb Mark Webb-Johnson:<br>
</div>
<blockquote type="cite"
cite="mid:BFED773E-0CFE-4EBB-A3B4-9B44FE4A783D@webb-johnson.net">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Yep:
<div class=""><br class="">
</div>
<blockquote style="margin: 0 0 0 40px; border: none; padding:
0px;" class="">
<div class="">758 bytes * (86400 / 300) * 30 = 6.5MB/month</div>
</blockquote>
<div class="">
<div><br class="">
</div>
<div>That is going over data (not SD). Presumably cellular data
for a large portion of the time.</div>
<div><br class="">
</div>
<div>I think we need to default this to OFF, and make a 3.2.004
to avoid this becoming an issue.</div>
<div><br class="">
</div>
<div>Regards, Mark.</div>
<div><br class="">
<blockquote type="cite" class="">
<div class="">On 19 Sep 2019, at 2:04 PM, Stephen Casner
<<a href="mailto:casner@acm.org" class=""
moz-do-not-send="true">casner@acm.org</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="">That's 6.55MB/month, unless you have
unusually short months! :-)<br class="">
<br class="">
In what space is that data stored? A log written to SD?
That's not<br class="">
likely to fill up the SD card too fast, but what happens
if no SD card<br class="">
is installed?<br class="">
<br class="">
-- Steve<br
class="">
<br class="">
On Thu, 19 Sep 2019, Mark Webb-Johnson wrote:<br
class="">
<br class="">
<blockquote type="cite" class="">
<blockquote type="cite" class=""> To enable CPU
usage statistics, apply the changes to sdkconfig<br
class="">
included.<br class="">
New history record:<br class="">
- "*-OVM-DebugTasks" v1:
<taskcnt,totaltime> + per task:<br class="">
<tasknum,name,state,stack_now,stack_max,stack_total,<br
class="">
heap_total,heap_32bit,heap_spi,runtime><br
class="">
Note: CPU core use percentage = runtime /
totaltime<br class="">
</blockquote>
<br class="">
I’ve just noticed that this is enabled by default now
(my production build has the sdkconfig updated, as per
defaults).<br class="">
<br class="">
I am seeing 758 bytes of history record, every 5
minutes. About 218KB/day, or 654KB/month.<br class="">
<br class="">
Should this be opt-in?<br class="">
<br class="">
Regards, Mark.<br class="">
<br class="">
<blockquote type="cite" class="">On 8 Sep 2019, at
5:43 PM, Michael Balzer <<a
href="mailto:dexter@expeedo.de" class=""
moz-do-not-send="true">dexter@expeedo.de</a>>
wrote:<br class="">
<br class="">
I've pushed some modifications and improvements to
(hopefully) fix the timer issue or at least be able
to debug it.<br class="">
<br class="">
Some sdkconfig changes are necessary.<br class="">
<br class="">
The build including these updates is on my edge
release as 3.2.002-258-g20ae554b.<br class="">
<br class="">
Btw: the network restart strategy seems to mitigate
issue #241; I've seen a major drop on record
repetitions on my server since the rollout.<br
class="">
<br class="">
<br class="">
commit 99e4e48bdd40b7004c0976f51aba9e3da4ecab53<br
class="">
<br class="">
Module: add per task CPU usage statistics, add
task stats history records<br class="">
<br class="">
To enable CPU usage statistics, apply the changes
to sdkconfig<br class="">
included. The CPU usage shown by the commands is
calculated against<br class="">
the last task status retrieved (or system boot).<br
class="">
<br class="">
Command changes:<br class="">
- "module tasks" -- added CPU (core) usage in
percent per task<br class="">
<br class="">
New command:<br class="">
- "module tasks data" -- output task stats in
history record form<br class="">
<br class="">
New config:<br class="">
- [module] debug.tasks -- yes (default) = send
task stats every 5 minutes<br class="">
<br class="">
New history record:<br class="">
- "*-OVM-DebugTasks" v1:
<taskcnt,totaltime> + per task:<br class="">
<tasknum,name,state,stack_now,stack_max,stack_total,<br
class="">
heap_total,heap_32bit,heap_spi,runtime><br
class="">
Note: CPU core use percentage = runtime /
totaltime<br class="">
<br class="">
commit 950172c216a72beb4da0bc7a40a46995a6105955<br
class="">
<br class="">
Build config: default timer service task priority
raised to 20<br class="">
<br class="">
Background: the FreeRTOS timer service shall only
be used for very<br class="">
short and non-blocking jobs. We delegate event
processing to our<br class="">
events task, anything else in timers needs to
run with high<br class="">
priority.<br class="">
<br class="">
commit 31ac19d187480046c16356b80668de45cacbb83d<br
class="">
<br class="">
DukTape: add build config for task priority,
default lowered to 3<br class="">
<br class="">
Background: the DukTape garbage collector shall
run on lower<br class="">
priority than tasks like SIMCOM & events<br
class="">
<br class="">
commit e0a44791fbcfb5a4e4cad24c9d1163b76e637b4f<br
class="">
<br class="">
Server V2: use esp_log_timestamp for timeout
detection,<br class="">
add timeout config, limit data records &
size per second<br class="">
<br class="">
New config:<br class="">
- [server.v2] timeout.rx -- timeout in seconds,
default 960<br class="">
<br class="">
commit 684a4ce9525175a910040f0d1ca82ac212fbf5de<br
class="">
<br class="">
Notify: use esp_log_timestamp for creation time
instead of monotonictime<br class="">
to harden against timer service starvation /
ticker event drops<br class="">
<br class="">
<br class="">
Regards,<br class="">
Michael<br class="">
<br class="">
<br class="">
Am 07.09.19 um 10:55 schrieb Michael Balzer:<br
class="">
<blockquote type="cite" class="">I think the RTOS
timer service task starves. It's running on core 0
with priority 1.<br class="">
<br class="">
Taks on core 0 sorted by priority:<br class="">
<br class="">
Number of Tasks = 20 Stack: Now Max Total
Heap 32-bit SPIRAM C# PRI<br class="">
3FFC84A8 6 Blk ipc0 388 500 1024
7788 0 0 0 24<br class="">
3FFC77F0 5 Blk OVMS CanRx 428 428 2048
3052 0 31844 0 23<br class="">
3FFAFBF4 1 Blk esp_timer 400 656 4096
35928 644 25804 0 22<br class="">
3FFD3240 19 Blk wifi 460 2716 3584
43720 0 20 0 22<br class="">
3FFC03C4 2 Blk eventTask 448 1984 4608
104 0 0 0 20<br class="">
3FFC8F14 17 Blk tiT 500 2308 3072
6552 0 0 * 18<br class="">
3FFE14F0 26 Blk OVMS COrx 456 456 4096
0 0 0 0 7<br class="">
3FFE19D4 27 Blk OVMS COwrk 476 476 3072
0 0 0 0 7<br class="">
3FFCBC34 12 Blk Tmr Svc 352 928 3072
88 0 0 0 1<br class="">
3FFE7708 23 Blk mdns 468 1396 4096
108 0 0 0 1<br class="">
<br class="">
I don't think it's our CanRx, as that only fetches
and queues CAN frames, the actual work is done by
the listeners. The CO tasks only run for CANopen
jobs, which are few for normal operation.<br
class="">
<br class="">
That leaves the system tasks, with main suspect
-once again- the wifi blob.<br class="">
<br class="">
We need to know how much CPU time the tasks
actually use now. I think I saw some option for
this in the FreeRTOS config.<br class="">
<br class="">
Regards,<br class="">
Michael<br class="">
<br class="">
<br class="">
Am 06.09.19 um 23:15 schrieb Michael Balzer:<br
class="">
<blockquote type="cite" class="">The workaround is
based on the monotonictime being updated per
second, as do the history record offsets.<br
class="">
<br class="">
Apparently, that mechanism doesn't work
reliably. That may be an indicator for some
bigger underlying issue.<br class="">
<br class="">
Example log excerpt:<br class="">
<br class="">
2019-09-06 22:07:48.126919 +0200 info main:
#173 C MITPROHB rx msg h
964,0,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:09:03.089031 +0200 info main:
#173 C MITPROHB rx msg h
964,-10,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:09:05.041574 +0200 info main:
#173 C MITPROHB rx msg h
964,-20,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:09:05.052644 +0200 info main:
#173 C MITPROHB rx msg h
964,-30,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:09:05.063617 +0200 info main:
#173 C MITPROHB rx msg h
964,-49,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:09:05.077527 +0200 info main:
#173 C MITPROHB rx msg h
964,-59,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:09:05.193775 +0200 info main:
#173 C MITPROHB rx msg h
964,-70,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:09:13.190645 +0200 info main:
#173 C MITPROHB rx msg h
964,-80,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:09:22.077994 +0200 info main:
#173 C MITPROHB rx msg h
964,-90,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:09:54.590300 +0200 info main:
#173 C MITPROHB rx msg h
964,-109,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:11:10.127054 +0200 info main:
#173 C MITPROHB rx msg h
964,-119,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:11:16.794200 +0200 info main:
#173 C MITPROHB rx msg h
964,-130,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:11:22.455652 +0200 info main:
#173 C MITPROHB rx msg h
964,-140,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:12:49.423412 +0200 info main:
#173 C MITPROHB rx msg h
964,-150,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:12:49.442096 +0200 info main:
#173 C MITPROHB rx msg h
964,-169,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:12:49.461941 +0200 info main:
#173 C MITPROHB rx msg h
964,-179,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:14:39.828133 +0200 info main:
#173 C MITPROHB rx msg h
964,-190,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:14:39.858144 +0200 info main:
#173 C MITPROHB rx msg h
964,-200,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:14:52.020319 +0200 info main:
#173 C MITPROHB rx msg h
964,-210,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:14:54.452637 +0200 info main:
#173 C MITPROHB rx msg h
964,-229,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:15:12.613935 +0200 info main:
#173 C MITPROHB rx msg h
964,-239,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:15:35.223845 +0200 info main:
#173 C MITPROHB rx msg h
964,-250,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:16:09.255059 +0200 info main:
#173 C MITPROHB rx msg h
964,-260,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:17:31.919754 +0200 info main:
#173 C MITPROHB rx msg h
964,-270,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:19:23.366267 +0200 info main:
#173 C MITPROHB rx msg h
964,-289,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:21:57.344609 +0200 info main:
#173 C MITPROHB rx msg h
964,-299,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:23:40.082406 +0200 info main: #31
C MITPROHB rx msg h
964,-1027,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
2019-09-06 22:25:58.061883 +0200 info main: #31
C MITPROHB rx msg h
964,-1040,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0<br
class="">
<br class="">
<br class="">
This shows the ticker was only run 299 times
from 22:07:48 to 22:21:57.<br class="">
<br class="">
After 22:21:57 the workaround was triggered and
did a reconnect. Apparently during that network
reinitialization of 103 seconds, the per second
ticker was run 628 times.<br class="">
<br class="">
That can't be catching up on the event queue, as
that queue has only 20 slots. So something
strange is going on here.<br class="">
<br class="">
Any ideas?<br class="">
<br class="">
Regards,<br class="">
Michael<br class="">
<br class="">
<br class="">
Am 06.09.19 um 08:04 schrieb Michael Balzer:<br
class="">
<blockquote type="cite" class="">Mark &
anyone else running a V2 server,<br class="">
<br class="">
as most cars don't send history records, this
also needs the change to the server I just
pushed, i.e. server version 2.4.2.<br class="">
<br class="">
<a
href="https://github.com/openvehicles/Open-Vehicle-Monitoring-System/commits/master"
class="" moz-do-not-send="true">https://github.com/openvehicles/Open-Vehicle-Monitoring-System/commits/master</a>
<<a
href="https://github.com/openvehicles/Open-Vehicle-Monitoring-System/commits/master"
class="" moz-do-not-send="true">https://github.com/openvehicles/Open-Vehicle-Monitoring-System/commits/master</a>><br
class="">
<br class="">
Regards,<br class="">
Michael<br class="">
<br class="">
<br class="">
Am 05.09.19 um 19:55 schrieb Michael Balzer:<br
class="">
<blockquote type="cite" class="">I've pushed
the nasty workaround: the v2 server checks
for no RX over 15 minutes, then restarts the
network (wifi & modem) as configured for
autostart.<br class="">
<br class="">
Rolled out on my server in edge as
3.2.002-237-ge075f655.<br class="">
<br class="">
Please test.<br class="">
<br class="">
Regards,<br class="">
Michael<br class="">
<br class="">
<br class="">
Am 05.09.19 um 01:58 schrieb Mark
Webb-Johnson:<br class="">
<blockquote type="cite" class="">
<blockquote type="cite" class="">Mark, you
can check your server logs for history
messages with ridiculous time offsets:<br
class="">
[sddexter@ns27 server]$ cat log-20190903
| egrep "rx msg h [0-9]+,-[0-9]{4}" | wc
-l<br class="">
455283<br class="">
</blockquote>
<br class="">
I checked my logs and see 12 vehicles
showing this. But, 2 only show this for a
debugcrash log (which is expected, I
guess, if the time is not synced at report
time). I’ve got 4 cars with the offset
> 10,000.<br class="">
<br class="">
Regards, Mark.<br class="">
<br class="">
<blockquote type="cite" class="">On 4 Sep
2019, at 4:45 AM, Michael Balzer <<a
href="mailto:dexter@expeedo.de"
class="" moz-do-not-send="true">dexter@expeedo.de</a>
<<a href="mailto:dexter@expeedo.de"
class="" moz-do-not-send="true">mailto:dexter@expeedo.de</a>>>
wrote:<br class="">
<br class="">
Everyone,<br class="">
<br class="">
I've pushed a change that needs some
testing.<br class="">
<br class="">
I had the issue myself now parking at a
certain distance from my garage wifi AP,
i.e. on the edge of "in", after wifi had
been disconnected for some hours, and
with the module still connected via
modem. The wifi blob had been trying to
connect to the AP for about two hours.<br
class="">
<br class="">
As seen before, the module saw no error,
just the server responses and commands
stopped coming in. I noticed the default
interface was still "st1" despite wifi
having been disconnected and modem
connected. The DNS was also still
configured for my wifi network, and the
interface seemed to have an IP address
-- but wasn't pingable from the wifi
network.<br class="">
<br class="">
A power cycle of the modem solved the
issue without reboot. So the cause may
be in the modem/ppp subsystem, or it may
be related (in some weird way) to the
default interface / DNS setup.<br
class="">
<br class="">
More tests showed the default interface
again/still got set by the wifi blob
itself at some point, overriding our
modem prioritization. The events we
didn't handle up to now were
"sta.connected" and "sta.lostip", so I
added these, and the bug didn't show up
again since then. That doesn't mean
anything, so we need to test this.<br
class="">
<br class="">
The default interface really shouldn't
affect inbound packet routing of an
established connection, but there always
may be strange bugs lurking in those
libs.<br class="">
<br class="">
The change also reimplements the wifi
signal strength reading, as the tests
also showed that still wasn't working
well using the CSI callback. It now
seems to be much more reliable.<br
class="">
<br class="">
Please test & report. The single
module will be hard to test, as the bug
isn't reproducable easily, but you can
still try if wifi / modem transitions
work well.<br class="">
<br class="">
Mark, you can check your server logs for
history messages with ridiculous time
offsets:<br class="">
[sddexter@ns27 server]$ cat log-20190903
| egrep "rx msg h [0-9]+,-[0-9]{4}" | wc
-l<br class="">
455283<br class="">
The bug now severely affects the V2
server performance, as the server is
single threaded and doesn't scale very
well to this kind of bulk data bursts,
especially when coming from multiple
modules in parallel. So we really need
to solve this now. Slow reactions or
connection drops from my server lately
have been due to this bug. If this
change doesn't solve it, we'll need to
add some reboot trigger on "too many
server v2 notification retransmissions"
-- or maybe a modem power cycle will do,
that wouldn't discard the data.<br
class="">
<br class="">
Thanks,<br class="">
Michael<br class="">
<br class="">
<br class="">
Am 03.09.19 um 07:46 schrieb Mark
Webb-Johnson:<br class="">
<blockquote type="cite" class="">No
problem. We can hold. I won’t commit
anything for the next few days (and
agree to hold-off on Markos’s pull).
Let me know when you are ready.<br
class="">
<br class="">
Regards, Mark.<br class="">
<br class="">
<blockquote type="cite" class="">On 3
Sep 2019, at 1:58 AM, Michael Balzer
<<a
href="mailto:dexter@expeedo.de"
class="" moz-do-not-send="true">dexter@expeedo.de</a>>
<<a
href="mailto:dexter@expeedo.de"
class="" moz-do-not-send="true">mailto:dexter@expeedo.de</a>>
wrote:<br class="">
<br class="">
Mark, please wait.<br class="">
<br class="">
I may just have found the cause for
issue #241, or at least something I
need to investigate before
releasing.<br class="">
<br class="">
I need to dig into my logs first,
and try something.<br class="">
<br class="">
Regards,<br class="">
Michael<br class="">
<br class="">
<br class="">
Am 02.09.19 um 12:23 schrieb Michael
Balzer:<br class="">
<blockquote type="cite" class="">Nothing
open from my side at the moment.<br
class="">
<br class="">
I haven't had the time to look in
to Markos pull request, but from a
first check also think that's
going too deep to be included in
this release.<br class="">
<br class="">
Regards,<br class="">
Michael<br class="">
<br class="">
<br class="">
Am 02.09.19 um 04:15 schrieb Mark
Webb-Johnson:<br class="">
<blockquote type="cite" class="">I
think it is well past time for a
3.2.003 release. Things seems
table in edge (although some
things only partially
implemented).<br class="">
<br class="">
Anything people want to include
at the last minute, or can we go
ahead and build?<br class="">
<br class="">
Regards, Mark.<br class="">
<br>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
<br>
<pre class="moz-signature" cols="144">--
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
</pre>
</body>
</html>