No, the 32 bit limit indeed has an effect pretty soon. I had assumed they are "ticks", but they really are at the timer resolution (see the DebugTasks records). The timer resolution is currently 1 MHz, so the counters wrap around every 71.6 minutes. So if you don't enable the debug.tasks report, the first manual report is likely to look strange.

We can get a checkpoint every five minutes regardless of the report activation. Or we can add a measurement delay to the command.

Regards,
Michael

Am 20.09.19 um 16:16 schrieb Michael Balzer:
The percentages are calculated from the time slice counter differences between two command invocations (or boot), but their 32 bit limit shouldn't have an effect that soon.
Also strange: your esp_timer SPIRAM allocation…
Is that a build with toolkit -98?
Regards, Michael
Am 20.09.19 um 03:29 schrieb Mark Webb-Johnson:
Strange. Connected to my car, and from ovms_shell asked for ‘module tasks’:
ovms> module tasks
...
Vehicle Response: Number of Tasks = 17
         Stack:           Now  Max  Total  Heap  32-bit SPIRAM C# PRI CPU%
3FFAFBF4  1 Blk esp_timer     400  656  4096 35944 6442122956 0 22 2%
3FFBDBBC  2 Blk eventTask     440 1928  4608   140      0     0 0 20 0%
3FFBFF00  3 Blk OVMS Events   444 3404  8192 70816      0 21092 1  5 9%
3FFC3300  4 Blk OVMS DukTape  460 9548 12288   188      0 31020 1  3 12%
3FFC4FF4  5 Blk OVMS CanRx    432  832  2048  3144      0 32072 0 23 120%
3FFC5D3C  6 Blk ipc0          392  504  1024  7804      0     0 0 24 2%
3FFC6340  7 Blk ipc1          396  444  1024    12      0     0 1 24 0%
3FFC8194 10 Rdy IDLE0         416  512  1024     0      0     0 0  0 362%
3FFC872C 11 Rdy IDLE1         408  504  1024     0      0     0 1  0 387%
3FFC94C4 12 Blk Tmr Svc       352  912  3072    88      0     0 0 20 1%
3FFC6794 17 Blk tiT           504 2536  3072  7668      0     0 * 18 6%
3FFCF3E0 18 Blk OVMS SIMCOM   460 2412  4096    40      0  3892 1  5 1%
3FFD0C10 19 Blk wifi          460 2716  3584 36356      0  2912 0 22 23%
3FFDBD70 20 Blk OVMS Vehicle  460 1308  6144     0      0  2336 1 10 95%
3FFDF0FC 21 Rdy OVMS Console  488 1736  6144     0      0    20 1  5 0%
3FFDF64C 22 Blk mdns          472 1576  4096   108      0     0 0  1 1%
3FFE38F0 35 Rdy OVMS NetMan  1384 2984  8192   144  25180  5004 1  5 3%
ovms> module tasks
...
Vehicle Response: Number of Tasks = 17
         Stack:           Now  Max  Total  Heap  32-bit SPIRAM C# PRI CPU%
3FFAFBF4  1 Blk esp_timer     400  656  4096 35944 6442122956 0 22 0%
3FFBDBBC  2 Blk eventTask     440 1928  4608   140      0     0 0 20 0%
3FFBFF00  3 Blk OVMS Events   444 3404  8192 70816      0 21092 1  5 0%
3FFC3300  4 Blk OVMS DukTape  460 9548 12288   188      0 23712 1  3 1%
3FFC4FF4  5 Blk OVMS CanRx    432  832  2048  3144      0 32072 0 23 4%
3FFC5D3C  6 Blk ipc0          392  504  1024  7804      0     0 0 24 0%
3FFC6340  7 Blk ipc1          396  444  1024    12      0     0 1 24 0%
3FFC8194 10 Rdy IDLE0         416  512  1024     0      0     0 0  0 96%
3FFC872C 11 Rdy IDLE1         408  504  1024     0      0     0 1  0 95%
3FFC94C4 12 Blk Tmr Svc       352  912  3072    88      0     0 0 20 0%
3FFC6794 17 Blk tiT           504 2536  3072  7668      0     0 * 18 0%
3FFCF3E0 18 Blk OVMS SIMCOM   668 2412  4096    40      0  3892 1  5 0%
3FFD0C10 19 Blk wifi          460 2716  3584 36356      0  2912 0 22 0%
3FFDBD70 20 Blk OVMS Vehicle  460 1308  6144     0      0  2336 1 10 3%
3FFDF0FC 21 Blk OVMS Console  552 1736  6144     0      0    20 1  5 0%
3FFDF64C 22 Blk mdns          472 1576  4096   108      0     0 0  1 0%
3FFE38F0 35 Rdy OVMS NetMan  1896 3240  8192   400  25180  5392 1  5 0%
The second one is what I normally see, but the first is bizarre. Any idea what the time frame for the CPU% is?
Regards, Mark
> On 8 Sep 2019, at 5:43 PM, Michael Balzer <dexter@expeedo.de> wrote:
>
> I've pushed some modifications and improvements to (hopefully) fix the timer issue or at least be able to debug it.
>
> Some sdkconfig changes are necessary.
>
> The build including these updates is on my edge release as 3.2.002-258-g20ae554b.
>
> Btw: the network restart strategy seems to mitigate issue #241; I've seen a major drop in record repetitions on my server since the rollout.
>
> commit 99e4e48bdd40b7004c0976f51aba9e3da4ecab53
>
>     Module: add per task CPU usage statistics, add task stats history records
>
>     To enable CPU usage statistics, apply the changes to sdkconfig
>     included. The CPU usage shown by the commands is calculated against
>     the last task status retrieved (or system boot).
>
>     Command changes:
>     - "module tasks" -- added CPU (core) usage in percent per task
>
>     New command:
>     - "module tasks data" -- output task stats in history record form
>
>     New config:
>     - [module] debug.tasks -- yes (default) = send task stats every 5 minutes
>
>     New history record:
>     - "*-OVM-DebugTasks" v1: <taskcnt,totaltime> + per task:
>       <tasknum,name,state,stack_now,stack_max,stack_total,
>        heap_total,heap_32bit,heap_spi,runtime>
>       Note: CPU core use percentage = runtime / totaltime
>
> commit 950172c216a72beb4da0bc7a40a46995a6105955
>
>     Build config: default timer service task priority raised to 20
>
>     Background: the FreeRTOS timer service shall only be used for very
>     short and non-blocking jobs. We delegate event processing to our
>     events task; anything else in timers needs to run with high
>     priority.
> commit 31ac19d187480046c16356b80668de45cacbb83d
>
>     DukTape: add build config for task priority, default lowered to 3
>
>     Background: the DukTape garbage collector shall run on lower
>     priority than tasks like SIMCOM & events
>
> commit e0a44791fbcfb5a4e4cad24c9d1163b76e637b4f
>
>     Server V2: use esp_log_timestamp for timeout detection,
>     add timeout config, limit data records & size per second
>
>     New config:
>     - [server.v2] timeout.rx -- timeout in seconds, default 960
>
> commit 684a4ce9525175a910040f0d1ca82ac212fbf5de
>
>     Notify: use esp_log_timestamp for creation time instead of monotonictime
>     to harden against timer service starvation / ticker event drops
>
> Regards,
> Michael
>
> Am 07.09.19 um 10:55 schrieb Michael Balzer:
>> I think the RTOS timer service task starves. It's running on core 0 with priority 1.
>>
>> Tasks on core 0, sorted by priority:
>>
>> Number of Tasks = 20
>>          Stack:          Now  Max Total  Heap 32-bit SPIRAM C# PRI
>> 3FFC84A8  6 Blk ipc0         388  500 1024  7788     0     0 0 24
>> 3FFC77F0  5 Blk OVMS CanRx   428  428 2048  3052     0 31844 0 23
>> 3FFAFBF4  1 Blk esp_timer    400  656 4096 35928   644 25804 0 22
>> 3FFD3240 19 Blk wifi         460 2716 3584 43720     0    20 0 22
>> 3FFC03C4  2 Blk eventTask    448 1984 4608   104     0     0 0 20
>> 3FFC8F14 17 Blk tiT          500 2308 3072  6552     0     0 * 18
>> 3FFE14F0 26 Blk OVMS COrx    456  456 4096     0     0     0 0  7
>> 3FFE19D4 27 Blk OVMS COwrk   476  476 3072     0     0     0 0  7
>> 3FFCBC34 12 Blk Tmr Svc      352  928 3072    88     0     0 0  1
>> 3FFE7708 23 Blk mdns         468 1396 4096   108     0     0 0  1
>>
>> I don't think it's our CanRx, as that only fetches and queues CAN frames; the actual work is done by the listeners. The CO tasks only run for CANopen jobs, which are few in normal operation.
>>
>> That leaves the system tasks, with the main suspect -- once again -- the wifi blob.
>>
>> We need to know how much CPU time the tasks actually use now. I think I saw an option for this in the FreeRTOS config.
>>
>> Regards,
>> Michael
>>
>> Am 06.09.19 um 23:15 schrieb Michael Balzer:
>>> The workaround is based on the monotonictime being updated per second, as are the history record offsets.
>>>
>>> Apparently, that mechanism doesn't work reliably. That may be an indicator of some bigger underlying issue.
>>>
>>> Example log excerpt:
>>>
>>> 2019-09-06 22:07:48.126919 +0200 info main: #173 C MITPROHB rx msg h 964,0,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:09:03.089031 +0200 info main: #173 C MITPROHB rx msg h 964,-10,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:09:05.041574 +0200 info main: #173 C MITPROHB rx msg h 964,-20,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:09:05.052644 +0200 info main: #173 C MITPROHB rx msg h 964,-30,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:09:05.063617 +0200 info main: #173 C MITPROHB rx msg h 964,-49,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:09:05.077527 +0200 info main: #173 C MITPROHB rx msg h 964,-59,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:09:05.193775 +0200 info main: #173 C MITPROHB rx msg h 964,-70,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:09:13.190645 +0200 info main: #173 C MITPROHB rx msg h 964,-80,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:09:22.077994 +0200 info main: #173 C MITPROHB rx msg h 964,-90,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:09:54.590300 +0200 info main: #173 C MITPROHB rx msg h 964,-109,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:11:10.127054 +0200 info main: #173 C MITPROHB rx msg h 964,-119,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:11:16.794200 +0200 info main: #173 C MITPROHB rx msg h 964,-130,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:11:22.455652 +0200 info main: #173 C MITPROHB rx msg h 964,-140,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:12:49.423412 +0200 info main: #173 C MITPROHB rx msg h 964,-150,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:12:49.442096 +0200 info main: #173 C MITPROHB rx msg h 964,-169,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:12:49.461941 +0200 info main: #173 C MITPROHB rx msg h 964,-179,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:14:39.828133 +0200 info main: #173 C MITPROHB rx msg h 964,-190,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:14:39.858144 +0200 info main: #173 C MITPROHB rx msg h 964,-200,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:14:52.020319 +0200 info main: #173 C MITPROHB rx msg h 964,-210,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:14:54.452637 +0200 info main: #173 C MITPROHB rx msg h 964,-229,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:15:12.613935 +0200 info main: #173 C MITPROHB rx msg h 964,-239,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:15:35.223845 +0200 info main: #173 C MITPROHB rx msg h 964,-250,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:16:09.255059 +0200 info main: #173 C MITPROHB rx msg h 964,-260,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:17:31.919754 +0200 info main: #173 C MITPROHB rx msg h 964,-270,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:19:23.366267 +0200 info main: #173 C MITPROHB rx msg h 964,-289,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:21:57.344609 +0200 info main: #173 C MITPROHB rx msg h 964,-299,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:23:40.082406 +0200 info main: #31 C MITPROHB rx msg h 964,-1027,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>> 2019-09-06 22:25:58.061883 +0200 info main: #31 C MITPROHB rx msg h 964,-1040,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>
>>> This shows the ticker was only run 299 times from 22:07:48 to 22:21:57.
>>>
>>> After 22:21:57 the workaround was triggered and did a reconnect. Apparently during that network reinitialization of 103 seconds, the per second ticker was run 628 times.
>>>
>>> That can't be catching up on the event queue, as that queue has only 20 slots. So something strange is going on here.
>>>
>>> Any ideas?
>>>
>>> Regards,
>>> Michael
>>>
>>> Am 06.09.19 um 08:04 schrieb Michael Balzer:
>>>> Mark & anyone else running a V2 server,
>>>>
>>>> as most cars don't send history records, this also needs the change to the server I just pushed, i.e. server version 2.4.2.
>>>>
>>>> https://github.com/openvehicles/Open-Vehicle-Monitoring-System/commits/master
>>>>
>>>> Regards,
>>>> Michael
>>>>
>>>> Am 05.09.19 um 19:55 schrieb Michael Balzer:
>>>>> I've pushed the nasty workaround: the v2 server checks for no RX over 15 minutes, then restarts the network (wifi & modem) as configured for autostart.
>>>>>
>>>>> Rolled out on my server in edge as 3.2.002-237-ge075f655.
>>>>>
>>>>> Please test.
>>>>>
>>>>> Regards,
>>>>> Michael
>>>>>
>>>>> Am 05.09.19 um 01:58 schrieb Mark Webb-Johnson:
>>>>>>> Mark, you can check your server logs for history messages with ridiculous time offsets:
>>>>>>> [sddexter@ns27 server]$ cat log-20190903 | egrep "rx msg h [0-9]+,-[0-9]{4}" | wc -l
>>>>>>> 455283
>>>>>>
>>>>>> I checked my logs and see 12 vehicles showing this. But 2 only show this for a debugcrash log (which is expected, I guess, if the time is not synced at report time). I've got 4 cars with the offset > 10,000.
>>>>>>
>>>>>> Regards, Mark.
>>>>>>> On 4 Sep 2019, at 4:45 AM, Michael Balzer <dexter@expeedo.de> wrote:
>>>>>>>
>>>>>>> Everyone,
>>>>>>>
>>>>>>> I've pushed a change that needs some testing.
>>>>>>>
>>>>>>> I had the issue myself now, parking at a certain distance from my garage wifi AP, i.e. on the edge of "in", after wifi had been disconnected for some hours, and with the module still connected via modem. The wifi blob had been trying to connect to the AP for about two hours.
>>>>>>>
>>>>>>> As seen before, the module saw no error; just the server responses and commands stopped coming in. I noticed the default interface was still "st1" despite wifi having been disconnected and modem connected. The DNS was also still configured for my wifi network, and the interface seemed to have an IP address -- but wasn't pingable from the wifi network.
>>>>>>>
>>>>>>> A power cycle of the modem solved the issue without reboot. So the cause may be in the modem/ppp subsystem, or it may be related (in some weird way) to the default interface / DNS setup.
>>>>>>>
>>>>>>> More tests showed the default interface again/still got set by the wifi blob itself at some point, overriding our modem prioritization. The events we didn't handle up to now were "sta.connected" and "sta.lostip", so I added these, and the bug hasn't shown up again since then. That doesn't mean anything, so we need to test this.
>>>>>>>
>>>>>>> The default interface really shouldn't affect inbound packet routing of an established connection, but there may always be strange bugs lurking in those libs.
>>>>>>>
>>>>>>> The change also reimplements the wifi signal strength reading, as the tests showed that it still wasn't working well using the CSI callback. It now seems to be much more reliable.
>>>>>>>
>>>>>>> Please test & report. A single module will be hard to test with, as the bug isn't easily reproducible, but you can still try whether wifi / modem transitions work well.
>>>>>>>
>>>>>>> Mark, you can check your server logs for history messages with ridiculous time offsets:
>>>>>>> [sddexter@ns27 server]$ cat log-20190903 | egrep "rx msg h [0-9]+,-[0-9]{4}" | wc -l
>>>>>>> 455283
>>>>>>>
>>>>>>> The bug now severely affects the V2 server performance, as the server is single threaded and doesn't scale well to this kind of bulk data burst, especially when coming from multiple modules in parallel. So we really need to solve this now. Slow reactions or connection drops from my server lately have been due to this bug. If this change doesn't solve it, we'll need to add a reboot trigger on "too many server v2 notification retransmissions" -- or maybe a modem power cycle will do; that wouldn't discard the data.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Michael
>>>>>>>
>>>>>>> Am 03.09.19 um 07:46 schrieb Mark Webb-Johnson:
>>>>>>>> No problem. We can hold. I won't commit anything for the next few days (and agree to hold off on Markos's pull). Let me know when you are ready.
>>>>>>>>
>>>>>>>> Regards, Mark.
>>>>>>>>
>>>>>>>>> On 3 Sep 2019, at 1:58 AM, Michael Balzer <dexter@expeedo.de> wrote:
>>>>>>>>>
>>>>>>>>> Mark, please wait.
>>>>>>>>>
>>>>>>>>> I may just have found the cause of issue #241, or at least something I need to investigate before releasing.
>>>>>>>>>
>>>>>>>>> I need to dig into my logs first, and try something.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Michael
>>>>>>>>>
>>>>>>>>> Am 02.09.19 um 12:23 schrieb Michael Balzer:
>>>>>>>>>> Nothing open from my side at the moment.
>>>>>>>>>>
>>>>>>>>>> I haven't had the time to look into Markos's pull request, but from a first check I also think it's going too deep to be included in this release.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Michael
>>>>>>>>>>
>>>>>>>>>> Am 02.09.19 um 04:15 schrieb Mark Webb-Johnson:
>>>>>>>>>>> I think it is well past time for a 3.2.003 release. Things seem stable in edge (although some things are only partially implemented).
>>>>>>>>>>>
>>>>>>>>>>> Anything people want to include at the last minute, or can we go ahead and build?
>>>>>>>>>>>
>>>>>>>>>>> Regards, Mark.
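For anyone wanting to reproduce the per-task CPU statistics from the quoted commits: enabling FreeRTOS run time stats in ESP-IDF is typically done via sdkconfig options along these lines. This is a sketch under assumptions -- the exact option set shipped with the commit may differ, and option names can vary between IDF versions:

```
# Assumed typical ESP-IDF options for FreeRTOS run time statistics;
# verify against the sdkconfig changes included with the commit:
CONFIG_FREERTOS_USE_TRACE_FACILITY=y
CONFIG_FREERTOS_USE_STATS_FORMATTING_FUNCTIONS=y
CONFIG_FREERTOS_GENERATE_RUN_TIME_STATS=y
```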
--
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev