[Ovmsdev] Time for 3.2.003? / Issue #241

Michael Balzer dexter at expeedo.de
Thu Sep 26 00:54:04 HKT 2019


Same on my server.

Current version in all three branches is  3.2.005-42-g0e8f7306.

Commit offset 42 because my builds are from spiram-fix-test.

Regards,
Michael


Am 25.09.19 um 14:14 schrieb Mark Webb-Johnson:
> OK. I released v3.2.005 to general release. Both EAP and MAIN now have that version.
>
> Regards, Mark
>
>> On 25 Sep 2019, at 3:31 PM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de>> wrote:
>>
>> Mark,
>>
>> go ahead, I'll follow. I don't have any bad reports from edge or eap.
>>
>> Regards,
>> Michael
>>
>>
>> Am 25.09.19 um 03:47 schrieb Mark Webb-Johnson:
>>> This is what I have (api.openvehicles.com <http://api.openvehicles.com/>):
>>>
>>>     ==> eap/ovms3.ver <==
>>>     3.2.005
>>>     Tue Sep 19 08:00:00 UTC 2019 OTA release
>>>
>>>     ==> edge/ovms3.ver <==
>>>     3.2.005-1-g7f86e9c
>>>     Thu Sep 19 16:01:18 UTC 2019 Automated build (markhk8)
>>>
>>>     ==> main/ovms3.ver <==
>>>     3.2.002
>>>     Sun May 12 08:00:00 UTC 2019 OTA release
>>>
>>>
>>> The 3.2.005 seems stable, so I think it can now go eap->main.
>>>
>>> @Michael: Should we co-ordinate and do this later today, or have you already released 3.2.005 to main?
>>>
>>> Regards, Mark.
>>>
>>>> On 25 Sep 2019, at 2:21 AM, Bernd Geistert <b_ghosti at gmx.de <mailto:b_ghosti at gmx.de>> wrote:
>>>>
>>>> IMHO, currently:
>>>> - in MAIN is 3.2.002
>>>> - in EAP is  3.2.003
>>>> - in EDGE is 3.2.005-1-g7f86e9c
>>>>
>>>>
>>>> Am 19.09.2019 um 10:22 schrieb Mark Webb-Johnson:
>>>>> OK, I’ve built:
>>>>>
>>>>>     2019-09-19 MWJ  3.2.005  OTA release
>>>>>     - Default module/debug.tasks to FALSE
>>>>>       Users that volunteer to submit tasks debug historical data to the Open Vehicles
>>>>>       project, should (with appreciation) set:
>>>>>         config set module debug.tasks yes
>>>>>       This will be transmit approximately 700MB of data a month (over cellular/wifi).
>>>>>
>>>>>     2019-09-19 MWJ  3.2.004  OTA release
>>>>>     - Skipped for Chinese superstitous reasons
>>>>>
>>>>>
>>>>> In EAP now, and I will announce.
>>>>>
>>>>> Regards, Mark.
>>>>>
>>>>>> On 19 Sep 2019, at 3:34 PM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de>> wrote:
>>>>>>
>>>>>> Correct.
>>>>>>
>>>>>> Regards,
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> Am 19.09.19 um 09:29 schrieb Mark Webb-Johnson:
>>>>>>> I’m just worried about the users who don’t know about this new feature. When they deploy this version, they suddenly start sending 6MB of data a month
>>>>>>> up to us.
>>>>>>>
>>>>>>> I think the ‘fix’ is just to change ovms_module.c:
>>>>>>>
>>>>>>>     MyConfig.GetParamValueBool("module", "debug.tasks", true)
>>>>>>>
>>>>>>>     to
>>>>>>>
>>>>>>>     MyConfig.GetParamValueBool("module", "debug.tasks", false)
>>>>>>>
>>>>>>>
>>>>>>> That would then only submit these logs for those that explicitly turn it on?
>>>>>>>
>>>>>>> Regards, Mark.
>>>>>>>
>>>>>>>> On 19 Sep 2019, at 3:23 PM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de>> wrote:
>>>>>>>>
>>>>>>>> Sorry, I didn't think about this being an issue elsewhere -- german data plans typically start at minimum 100 MB/month flat (that's my current plan at
>>>>>>>> 3 € / month).
>>>>>>>>
>>>>>>>> No need for a new release, it can be turned off OTA by issueing
>>>>>>>>
>>>>>>>> config set module debug.tasks no
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Michael
>>>>>>>>
>>>>>>>>
>>>>>>>> Am 19.09.19 um 09:08 schrieb Mark Webb-Johnson:
>>>>>>>>> Yep:
>>>>>>>>>
>>>>>>>>>     758 bytes * (86400 / 300) * 30 = 6.5MB/month
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> That is going over data (not SD). Presumably cellular data for a large portion of the time.
>>>>>>>>>
>>>>>>>>> I think we need to default this to OFF, and make a 3.2.004 to avoid this becoming an issue.
>>>>>>>>>
>>>>>>>>> Regards, Mark.
>>>>>>>>>
>>>>>>>>>> On 19 Sep 2019, at 2:04 PM, Stephen Casner <casner at acm.org <mailto:casner at acm.org>> wrote:
>>>>>>>>>>
>>>>>>>>>> That's 6.55MB/month, unless you have unusually short months!  :-)
>>>>>>>>>>
>>>>>>>>>> In what space is that data stored?  A log written to SD?  That's not
>>>>>>>>>> likely to fill up the SD card too fast, but what happens if no SD card
>>>>>>>>>> is installed?
>>>>>>>>>>
>>>>>>>>>>                                                        -- Steve
>>>>>>>>>>
>>>>>>>>>> On Thu, 19 Sep 2019, Mark Webb-Johnson wrote:
>>>>>>>>>>
>>>>>>>>>>>>    To enable CPU usage statistics, apply the changes to sdkconfig
>>>>>>>>>>>>    included.
>>>>>>>>>>>>    New history record:
>>>>>>>>>>>>    - "*-OVM-DebugTasks" v1: <taskcnt,totaltime> + per task:
>>>>>>>>>>>>        <tasknum,name,state,stack_now,stack_max,stack_total,
>>>>>>>>>>>>         heap_total,heap_32bit,heap_spi,runtime>
>>>>>>>>>>>>      Note: CPU core use percentage = runtime / totaltime
>>>>>>>>>>>
>>>>>>>>>>> I’ve just noticed that this is enabled by default now (my production build has the sdkconfig updated, as per defaults).
>>>>>>>>>>>
>>>>>>>>>>> I am seeing 758 bytes of history record, every 5 minutes. About 218KB/day, or 654KB/month.
>>>>>>>>>>>
>>>>>>>>>>> Should this be opt-in?
>>>>>>>>>>>
>>>>>>>>>>> Regards, Mark.
>>>>>>>>>>>
>>>>>>>>>>>> On 8 Sep 2019, at 5:43 PM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I've pushed some modifications and improvements to (hopefully) fix the timer issue or at least be able to debug it.
>>>>>>>>>>>>
>>>>>>>>>>>> Some sdkconfig changes are necessary.
>>>>>>>>>>>>
>>>>>>>>>>>> The build including these updates is on my edge release as 3.2.002-258-g20ae554b.
>>>>>>>>>>>>
>>>>>>>>>>>> Btw: the network restart strategy seems to mitigate issue #241; I've seen a major drop on record repetitions on my server since the rollout.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> commit 99e4e48bdd40b7004c0976f51aba9e3da4ecab53
>>>>>>>>>>>>
>>>>>>>>>>>>    Module: add per task CPU usage statistics, add task stats history records
>>>>>>>>>>>>
>>>>>>>>>>>>    To enable CPU usage statistics, apply the changes to sdkconfig
>>>>>>>>>>>>    included. The CPU usage shown by the commands is calculated against
>>>>>>>>>>>>    the last task status retrieved (or system boot).
>>>>>>>>>>>>
>>>>>>>>>>>>    Command changes:
>>>>>>>>>>>>    - "module tasks" -- added CPU (core) usage in percent per task
>>>>>>>>>>>>
>>>>>>>>>>>>    New command:
>>>>>>>>>>>>    - "module tasks data" -- output task stats in history record form
>>>>>>>>>>>>
>>>>>>>>>>>>    New config:
>>>>>>>>>>>>    - [module] debug.tasks -- yes (default) = send task stats every 5 minutes
>>>>>>>>>>>>
>>>>>>>>>>>>    New history record:
>>>>>>>>>>>>    - "*-OVM-DebugTasks" v1: <taskcnt,totaltime> + per task:
>>>>>>>>>>>>        <tasknum,name,state,stack_now,stack_max,stack_total,
>>>>>>>>>>>>         heap_total,heap_32bit,heap_spi,runtime>
>>>>>>>>>>>>      Note: CPU core use percentage = runtime / totaltime
>>>>>>>>>>>>
>>>>>>>>>>>> commit 950172c216a72beb4da0bc7a40a46995a6105955
>>>>>>>>>>>>
>>>>>>>>>>>>    Build config: default timer service task priority raised to 20
>>>>>>>>>>>>
>>>>>>>>>>>>    Background: the FreeRTOS timer service shall only be used for very
>>>>>>>>>>>>      short and non-blocking jobs. We delegate event processing to our
>>>>>>>>>>>>      events task, anything else in timers needs to run with high
>>>>>>>>>>>>      priority.
>>>>>>>>>>>>
>>>>>>>>>>>> commit 31ac19d187480046c16356b80668de45cacbb83d
>>>>>>>>>>>>
>>>>>>>>>>>>    DukTape: add build config for task priority, default lowered to 3
>>>>>>>>>>>>
>>>>>>>>>>>>    Background: the DukTape garbage collector shall run on lower
>>>>>>>>>>>>      priority than tasks like SIMCOM & events
>>>>>>>>>>>>
>>>>>>>>>>>> commit e0a44791fbcfb5a4e4cad24c9d1163b76e637b4f
>>>>>>>>>>>>
>>>>>>>>>>>>    Server V2: use esp_log_timestamp for timeout detection,
>>>>>>>>>>>>      add timeout config, limit data records & size per second
>>>>>>>>>>>>
>>>>>>>>>>>>    New config:
>>>>>>>>>>>>    - [server.v2] timeout.rx -- timeout in seconds, default 960
>>>>>>>>>>>>
>>>>>>>>>>>> commit 684a4ce9525175a910040f0d1ca82ac212fbf5de
>>>>>>>>>>>>
>>>>>>>>>>>>    Notify: use esp_log_timestamp for creation time instead of monotonictime
>>>>>>>>>>>>      to harden against timer service starvation / ticker event drops
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Michael
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Am 07.09.19 um 10:55 schrieb Michael Balzer:
>>>>>>>>>>>>> I think the RTOS timer service task starves. It's running on core 0 with priority 1.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Taks on core 0 sorted by priority:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Number of Tasks = 20      Stack:  Now   Max Total    Heap 32-bit SPIRAM C# PRI
>>>>>>>>>>>>> 3FFC84A8  6 Blk ipc0              388   500  1024    7788      0      0  0  24
>>>>>>>>>>>>> 3FFC77F0  5 Blk OVMS CanRx        428   428  2048    3052      0  31844  0  23
>>>>>>>>>>>>> 3FFAFBF4  1 Blk esp_timer         400   656  4096   35928    644  25804  0  22
>>>>>>>>>>>>> 3FFD3240 19 Blk wifi              460  2716  3584   43720      0     20  0  22
>>>>>>>>>>>>> 3FFC03C4  2 Blk eventTask         448  1984  4608     104      0      0  0  20
>>>>>>>>>>>>> 3FFC8F14 17 Blk tiT               500  2308  3072    6552      0      0  *  18
>>>>>>>>>>>>> 3FFE14F0 26 Blk OVMS COrx         456   456  4096       0      0      0  0   7
>>>>>>>>>>>>> 3FFE19D4 27 Blk OVMS COwrk        476   476  3072       0      0      0  0   7
>>>>>>>>>>>>> 3FFCBC34 12 Blk Tmr Svc           352   928  3072      88      0      0  0   1
>>>>>>>>>>>>> 3FFE7708 23 Blk mdns              468  1396  4096     108      0      0  0   1
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think it's our CanRx, as that only fetches and queues CAN frames, the actual work is done by the listeners. The CO tasks only run for
>>>>>>>>>>>>> CANopen jobs, which are few for normal operation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That leaves the system tasks, with main suspect -once again- the wifi blob.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We need to know how much CPU time the tasks actually use now. I think I saw some option for this in the FreeRTOS config.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Michael
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Am 06.09.19 um 23:15 schrieb Michael Balzer:
>>>>>>>>>>>>>> The workaround is based on the monotonictime being updated per second, as do the history record offsets.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Apparently, that mechanism doesn't work reliably. That may be an indicator for some bigger underlying issue.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Example log excerpt:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2019-09-06 22:07:48.126919 +0200 info  main: #173 C MITPROHB rx msg h 964,0,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:09:03.089031 +0200 info  main: #173 C MITPROHB rx msg h 964,-10,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:09:05.041574 +0200 info  main: #173 C MITPROHB rx msg h 964,-20,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:09:05.052644 +0200 info  main: #173 C MITPROHB rx msg h 964,-30,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:09:05.063617 +0200 info  main: #173 C MITPROHB rx msg h 964,-49,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:09:05.077527 +0200 info  main: #173 C MITPROHB rx msg h 964,-59,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:09:05.193775 +0200 info  main: #173 C MITPROHB rx msg h 964,-70,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:09:13.190645 +0200 info  main: #173 C MITPROHB rx msg h 964,-80,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:09:22.077994 +0200 info  main: #173 C MITPROHB rx msg h 964,-90,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:09:54.590300 +0200 info  main: #173 C MITPROHB rx msg h 964,-109,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:11:10.127054 +0200 info  main: #173 C MITPROHB rx msg h 964,-119,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:11:16.794200 +0200 info  main: #173 C MITPROHB rx msg h 964,-130,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:11:22.455652 +0200 info  main: #173 C MITPROHB rx msg h 964,-140,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:12:49.423412 +0200 info  main: #173 C MITPROHB rx msg h 964,-150,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:12:49.442096 +0200 info  main: #173 C MITPROHB rx msg h 964,-169,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:12:49.461941 +0200 info  main: #173 C MITPROHB rx msg h 964,-179,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:14:39.828133 +0200 info  main: #173 C MITPROHB rx msg h 964,-190,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:14:39.858144 +0200 info  main: #173 C MITPROHB rx msg h 964,-200,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:14:52.020319 +0200 info  main: #173 C MITPROHB rx msg h 964,-210,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:14:54.452637 +0200 info  main: #173 C MITPROHB rx msg h 964,-229,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:15:12.613935 +0200 info  main: #173 C MITPROHB rx msg h 964,-239,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:15:35.223845 +0200 info  main: #173 C MITPROHB rx msg h 964,-250,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:16:09.255059 +0200 info  main: #173 C MITPROHB rx msg h 964,-260,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:17:31.919754 +0200 info  main: #173 C MITPROHB rx msg h 964,-270,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:19:23.366267 +0200 info  main: #173 C MITPROHB rx msg h 964,-289,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:21:57.344609 +0200 info  main: #173 C MITPROHB rx msg h 964,-299,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:23:40.082406 +0200 info  main: #31 C MITPROHB rx msg h 964,-1027,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>> 2019-09-06 22:25:58.061883 +0200 info  main: #31 C MITPROHB rx msg h 964,-1040,RT-BAT-C,5,86400,2,1,3830,3795,3830,-10,25,25,25,0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This shows the ticker was only run 299 times from 22:07:48 to 22:21:57.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> After 22:21:57 the workaround was triggered and did a reconnect. Apparently during that network reinitialization of 103 seconds, the per second
>>>>>>>>>>>>>> ticker was run 628 times.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That can't be catching up on the event queue, as that queue has only 20 slots. So something strange is going on here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Michael
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 06.09.19 um 08:04 schrieb Michael Balzer:
>>>>>>>>>>>>>>> Mark & anyone else running a V2 server,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> as most cars don't send history records, this also needs the change to the server I just pushed, i.e. server version 2.4.2.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://github.com/openvehicles/Open-Vehicle-Monitoring-System/commits/master
>>>>>>>>>>>>>>> <https://github.com/openvehicles/Open-Vehicle-Monitoring-System/commits/master>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Michael
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Am 05.09.19 um 19:55 schrieb Michael Balzer:
>>>>>>>>>>>>>>>> I've pushed the nasty workaround: the v2 server checks for no RX over 15 minutes, then restarts the network (wifi & modem) as configured for
>>>>>>>>>>>>>>>> autostart.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Rolled out on my server in edge as 3.2.002-237-ge075f655.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please test.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Michael
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Am 05.09.19 um 01:58 schrieb Mark Webb-Johnson:
>>>>>>>>>>>>>>>>>> Mark, you can check your server logs for history messages with ridiculous time offsets:
>>>>>>>>>>>>>>>>>> [sddexter at ns27 server]$ cat log-20190903 | egrep "rx msg h [0-9]+,-[0-9]{4}" | wc -l
>>>>>>>>>>>>>>>>>> 455283
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I checked my logs and see 12 vehicles showing this. But, 2 only show this for a debugcrash log (which is expected, I guess, if the time is not
>>>>>>>>>>>>>>>>> synced at report time). I’ve got 4 cars with the offset > 10,000.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards, Mark.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 4 Sep 2019, at 4:45 AM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de> <mailto:dexter at expeedo.de>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Everyone,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've pushed a change that needs some testing.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I had the issue myself now parking at a certain distance from my garage wifi AP, i.e. on the edge of "in", after wifi had been disconnected
>>>>>>>>>>>>>>>>>> for some hours, and with the module still connected via modem. The wifi blob had been trying to connect to the AP for about two hours.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As seen before, the module saw no error, just the server responses and commands stopped coming in. I noticed the default interface was still
>>>>>>>>>>>>>>>>>> "st1" despite wifi having been disconnected and modem connected. The DNS was also still configured for my wifi network, and the interface
>>>>>>>>>>>>>>>>>> seemed to have an IP address -- but wasn't pingable from the wifi network.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> A power cycle of the modem solved the issue without reboot. So the cause may be in the modem/ppp subsystem, or it may be related (in some
>>>>>>>>>>>>>>>>>> weird way) to the default interface / DNS setup.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> More tests showed the default interface again/still got set by the wifi blob itself at some point, overriding our modem prioritization. The
>>>>>>>>>>>>>>>>>> events we didn't handle up to now were "sta.connected" and "sta.lostip", so I added these, and the bug didn't show up again since then. That
>>>>>>>>>>>>>>>>>> doesn't mean anything, so we need to test this.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The default interface really shouldn't affect inbound packet routing of an established connection, but there always may be strange bugs
>>>>>>>>>>>>>>>>>> lurking in those libs.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The change also reimplements the wifi signal strength reading, as the tests also showed that still wasn't working well using the CSI
>>>>>>>>>>>>>>>>>> callback. It now seems to be much more reliable.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please test & report. The single module will be hard to test, as the bug isn't reproducable easily, but you can still try if wifi / modem
>>>>>>>>>>>>>>>>>> transitions work well.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Mark, you can check your server logs for history messages with ridiculous time offsets:
>>>>>>>>>>>>>>>>>> [sddexter at ns27 server]$ cat log-20190903 | egrep "rx msg h [0-9]+,-[0-9]{4}" | wc -l
>>>>>>>>>>>>>>>>>> 455283
>>>>>>>>>>>>>>>>>> The bug now severely affects the V2 server performance, as the server is single threaded and doesn't scale very well to this kind of bulk
>>>>>>>>>>>>>>>>>> data bursts, especially when coming from multiple modules in parallel. So we really need to solve this now. Slow reactions or connection
>>>>>>>>>>>>>>>>>> drops from my server lately have been due to this bug. If this change doesn't solve it, we'll need to add some reboot trigger on "too many
>>>>>>>>>>>>>>>>>> server v2 notification retransmissions" -- or maybe a modem power cycle will do, that wouldn't discard the data.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Michael
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am 03.09.19 um 07:46 schrieb Mark Webb-Johnson:
>>>>>>>>>>>>>>>>>>> No problem. We can hold. I won’t commit anything for the next few days (and agree to hold-off on Markos’s pull). Let me know when you are ready.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Regards, Mark.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 3 Sep 2019, at 1:58 AM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de>> <mailto:dexter at expeedo.de> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Mark, please wait.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I may just have found the cause for issue #241, or at least something I need to investigate before releasing.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I need to dig into my logs first, and try something.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>> Michael
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Am 02.09.19 um 12:23 schrieb Michael Balzer:
>>>>>>>>>>>>>>>>>>>>> Nothing open from my side at the moment.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I haven't had the time to look in to Markos pull request, but from a first check also think that's going too deep to be included in this
>>>>>>>>>>>>>>>>>>>>> release.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>> Michael
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Am 02.09.19 um 04:15 schrieb Mark Webb-Johnson:
>>>>>>>>>>>>>>>>>>>>>> I think it is well past time for a 3.2.003 release. Things seems table in edge (although some things only partially implemented).
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Anything people want to include at the last minute, or can we go ahead and build?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Regards, Mark.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>
>>
>> -- 
>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>> _______________________________________________
>> OvmsDev mailing list
>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>
>
> _______________________________________________
> OvmsDev mailing list
> OvmsDev at lists.openvehicles.com
> http://lists.openvehicles.com/mailman/listinfo/ovmsdev

-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20190925/7d950d6a/attachment.htm>


More information about the OvmsDev mailing list