[Ovmsdev] Branch for-v3.3 network issues

Michael Balzer dexter at expeedo.de
Mon Mar 22 18:06:58 HKT 2021


In master, running commands via ssh or server-v2 block, because these 
are running synchronously in the mongoose context.

Running commands via web doesn't block, as the webcommand class starts a 
separate task for each execution.

The firmware config page does a synchronous call to MyOTA.GetStatus(), 
so that call is executed in the mongoose context. It still works in 
master, just needs a second or two to fetch the version file.

Regards,
Michael


Am 22.03.21 um 10:38 schrieb Mark Webb-Johnson:
> In master branch, at the moment, if a command is run from the web 
> shell (or server v2), surely the mongoose task will block as the web 
> server / server v2 blocks waiting for the command to run to completion?
>
> Doesn’t necessarily need to be a networking command. Something long 
> running like the string speed tests.
>
> In v3.3 I can easily detect the task wait being requested in the http 
> library (by seeing if current task id == mongoose task), and fail 
> (which I should do anyway). But I am more concerned with the general 
> case now (which I think may be wrong in both master and for-v3.3).
>
> Regards, Mark
>
>> On 22 Mar 2021, at 5:22 PM, Michael Balzer <dexter at expeedo.de> wrote:
>>
>>  I think we must avoid blocking the Mongoose task, as that's the 
>> central network dispatcher.
>>
>> Chris had implemented a workaround in one of his PRs that could allow 
>> that to be done temporarily by running a local Mongoose main loop 
>> during a synchronous operation, but I still see potential issues from 
>> that, as it wasn't the standard handling as done by the task, and as 
>> it may need to recurse.
>>
>> Maybe the old OvmsHttpClient using socket I/O is the right way for 
>> synchronous network operations?
>>
>> Regards,
>> Michael
>>
>>
>> Am 22.03.21 um 07:15 schrieb Mark Webb-Johnson:
>>> Not sure how to resolve this.
>>>
>>> OvmsSyncHttpClient is currently used in commands from ovms_plugins 
>>> and ovms_ota.
>>>
>>> I could bring back the OvmsHttpClient blocking (non-mongoose) 
>>> implementation, but I don’t think that would address the core 
>>> problem here:
>>>
>>>     Inside a mongoose callback (inside the mongoose networking
>>>     task), we are making blocking calls (and in particular calls
>>>     that could block for several tens of seconds).
>>>
>>>
>>> But fundamentally is it ok to block the mongoose networking task for 
>>> extended periods during a mongoose event callback?
>>>
>>> Mark
>>>
>>>> On 21 Mar 2021, at 9:57 PM, Michael Balzer <dexter at expeedo.de 
>>>> <mailto:dexter at expeedo.de>> wrote:
>>>>
>>>> Signed PGP part
>>>> I've found opening the web UI firmware page or calling "ota status" 
>>>> via ssh to consistently deadlock the network on my module.
>>>>
>>>> I (130531) webserver: HTTP GET /cfg/firmware
>>>> D (130531) http: OvmsSyncHttpClient: Connect to 
>>>> ovms.dexters-web.de:80 <http://ovms.dexters-web.de:80>
>>>> D (130541) http: OvmsSyncHttpClient: waiting for completion
>>>>
>>>> After that log message, the network is dead, and the netmanager 
>>>> also doesn't respond:
>>>>
>>>> OVMS# network list
>>>> ERROR: job failed
>>>> D (183241) netmanager: send cmd 1 from 0x3ffe7054
>>>> W (193241) netmanager: ExecuteJob: cmd 1: timeout
>>>>
>>>> The interfaces seem to be registered and online, but nothing gets 
>>>> in or out:
>>>>
>>>> OVMS# network status
>>>> Interface#3: pp3 (ifup=1 linkup=1)
>>>>   IPv4: 10.170.195.13/255.255.255.255 gateway 10.64.64.64
>>>>
>>>> Interface#2: ap2 (ifup=1 linkup=1)
>>>>   IPv4: 192.168.4.1/255.255.255.0 gateway 192.168.4.1
>>>>
>>>> Interface#1: st1 (ifup=1 linkup=1)
>>>>   IPv4: 192.168.2.106/255.255.255.0 gateway 192.168.2.1
>>>>
>>>> DNS: 192.168.2.1
>>>>
>>>> Default Interface: st1 (192.168.2.106/255.255.255.0 gateway 
>>>> 192.168.2.1)
>>>>
>>>>
>>>> A couple of minutes later, server-v2 recognizes the stale 
>>>> connection and issues a network restart, which fails resulting in 
>>>> the same behaviour as shown below with finally forced reboot by 
>>>> loss of an important event.
>>>>
>>>> Doing "ota status" from USB works normally, so this looks like 
>>>> OvmsSyncHttpClient not being able to run from within a mongoose client.
>>>>
>>>> Regards,
>>>> Michael
>>>>
>>>>
>>>> Am 18.03.21 um 08:14 schrieb Mark Webb-Johnson:
>>>>> Tried to repeat this, but not having much success. Here is my car 
>>>>> module, with network still up:
>>>>>
>>>>>     OVMS# boot status
>>>>>     Last boot was 262355 second(s) ago
>>>>>
>>>>>
>>>>> I did manage to catch one network related crash after repeatedly 
>>>>> disconnecting and reconnecting the cellular antenna. This was:
>>>>>
>>>>>     I (3717989) cellular: PPP Connection disconnected
>>>>>     Guru Meditation Error: Core  1 panic'ed (LoadProhibited).
>>>>>     Exception was unhandled.
>>>>>
>>>>>     0x400fe082 is in OvmsNetManager::PrioritiseAndIndicate()
>>>>>     (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_netmanager.cpp:707).
>>>>>
>>>>>         707       if ((pri->name[0]==search[0])&&
>>>>>
>>>>>
>>>>>     0x400ed360 is in
>>>>>     OvmsMetricString::SetValue(std::__cxx11::basic_string<char,
>>>>>     std::char_traits<char>, std::allocator<char> >)
>>>>>     (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_metrics.cpp:1358).
>>>>>
>>>>>         1357  void OvmsMetricString::SetValue(std::string value)
>>>>>
>>>>>
>>>>>     0x4008bdad is at
>>>>>     ../../../../.././newlib/libc/machine/xtensa/strcmp.S:586.
>>>>>
>>>>>
>>>>>     0x4008bdd1 is at
>>>>>     ../../../../.././newlib/libc/machine/xtensa/strcmp.S:604.
>>>>>
>>>>>
>>>>>     0x400fe886 is in
>>>>>     OvmsNetManager::ModemDown(std::__cxx11::basic_string<char,
>>>>>     std::char_traits<char>, std::allocator<char> >, void*)
>>>>>     (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_netmanager.cpp:522).
>>>>>
>>>>>         522     PrioritiseAndIndicate();
>>>>>
>>>>>
>>>>>     0x400fd752 is in std::_Function_handler<void
>>>>>     (std::__cxx11::basic_string<char, std::char_traits<char>,
>>>>>     std::allocator<char> >, void*), std::_Bind<std::_Mem_fn<void
>>>>>     (OvmsNetManager::*)(std::__cxx11::basic_string<char,
>>>>>     std::char_traits<char>, std::allocator<char> >, void*)>
>>>>>     (OvmsNetManager*, std::_Placeholder<1>, std::_Placeholder<2>)>
>>>>>     >::_M_invoke(std::_Any_data const&,
>>>>>     std::__cxx11::basic_string<char, std::char_traits<char>,
>>>>>     std::allocator<char> >&&, void*&&)
>>>>>     (/home/openvehicles/build/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/functional:600).
>>>>>
>>>>>         600         { return
>>>>>         (__object->*_M_pmf)(std::forward<_Args>(__args)...); }
>>>>>
>>>>>
>>>>>     0x400f512e is in std::function<void
>>>>>     (std::__cxx11::basic_string<char, std::char_traits<char>,
>>>>>     std::allocator<char> >,
>>>>>     void*)>::operator()(std::__cxx11::basic_string<char,
>>>>>     std::char_traits<char>, std::allocator<char> >, void*) const
>>>>>     (/home/openvehicles/build/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/functional:2271).
>>>>>
>>>>>         2271        return _M_invoker(_M_functor,
>>>>>         std::forward<_ArgTypes>(__args)...);
>>>>>
>>>>>
>>>>>     0x400f52f1 is in
>>>>>     OvmsEvents::HandleQueueSignalEvent(event_queue_t*)
>>>>>     (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:283).
>>>>>
>>>>>         283 m_current_callback->m_callback(m_current_event,
>>>>>         msg->body.signal.data);
>>>>>
>>>>>
>>>>>     0x400f53d8 is in OvmsEvents::EventTask()
>>>>>     (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:237).
>>>>>
>>>>>         237 HandleQueueSignalEvent(&msg);
>>>>>
>>>>>
>>>>>     0x400f545d is in EventLaunchTask(void*)
>>>>>     (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:80).
>>>>>
>>>>>         80    me->EventTask();
>>>>>
>>>>>
>>>>> My for_v3.3 branch does include the preliminary changes to support 
>>>>> the wifi at 20MHz bandwidth, and perhaps those could be affecting 
>>>>> things. I do notice that if I ‘power wifi off’, then ‘wifi mode 
>>>>> client’, it can connect to the station, but not get an IP address. 
>>>>> I’ve just tried to merge in the latest fixes to that, and rebuilt 
>>>>> a release. I will continue to test with that.
>>>>>
>>>>> Regards, Mark.
>>>>>
>>>>>> On 12 Mar 2021, at 10:32 PM, Michael Balzer <dexter at expeedo.de 
>>>>>> <mailto:dexter at expeedo.de>> wrote:
>>>>>>
>>>>>> Signed PGP part
>>>>>> I just tried switching to for-v3.3 in my car module after tests 
>>>>>> on my desk module were OK, and I've run into the very same 
>>>>>> problem with for-v3.3. So the issue isn't related to esp-idf.
>>>>>>
>>>>>> The network only occasionally starts normally, but even then all 
>>>>>> connectivity is lost after a couple of minutes.
>>>>>>
>>>>>> The stale connection watchdog in server-v2 triggers a network 
>>>>>> restart, but that also doesn't seem to succeed:
>>>>>>
>>>>>> 2021-03-12 14:53:01.802 CET W (981652) ovms-server-v2: Detected 
>>>>>> stale connection (issue #241), restarting network
>>>>>> 2021-03-12 14:53:01.802 CET I (981652) esp32wifi: Restart
>>>>>> 2021-03-12 14:53:01.802 CET I (981652) esp32wifi: Stopping WIFI 
>>>>>> station
>>>>>> 2021-03-12 14:53:01.812 CET I (981662) wifi:state: run -> init (0)
>>>>>> 2021-03-12 14:53:01.812 CET I (981662) wifi:pm stop, total sleep 
>>>>>> time: 831205045 us / 975329961 us
>>>>>>
>>>>>> 2021-03-12 14:53:01.812 CET I (981662) wifi:new:<1,0>, old:<1,1>, 
>>>>>> ap:<1,1>, sta:<1,1>, prof:1
>>>>>> 2021-03-12 14:53:01.832 CET I (981682) wifi:flush txq
>>>>>> 2021-03-12 14:53:01.842 CET I (981692) wifi:stop sw txq
>>>>>> 2021-03-12 14:53:01.842 CET I (981692) wifi:lmac stop hw txq
>>>>>> 2021-03-12 14:53:01.852 CET I (981702) esp32wifi: Powering down 
>>>>>> WIFI driver
>>>>>> 2021-03-12 14:53:01.852 CET I (981702) wifi:Deinit lldesc rx 
>>>>>> mblock:16
>>>>>> 2021-03-12 14:53:01.862 CET I (981712) esp32wifi: Powering up 
>>>>>> WIFI driver
>>>>>> 2021-03-12 14:53:01.862 CET I (981712) wifi:nvs_log_init, erase 
>>>>>> log key successfully, reinit nvs log
>>>>>> 2021-03-12 14:53:01.882 CET I (981732) wifi:wifi driver task: 
>>>>>> 3ffd4d84, prio:23, stack:3584, core=0
>>>>>> 2021-03-12 14:53:01.882 CET I (981732) system_api: Base MAC 
>>>>>> address is not set, read default base MAC address from BLK0 of EFUSE
>>>>>> 2021-03-12 14:53:01.882 CET I (981732) system_api: Base MAC 
>>>>>> address is not set, read default base MAC address from BLK0 of EFUSE
>>>>>> 2021-03-12 14:53:01.902 CET I (981752) wifi:wifi firmware 
>>>>>> version: 30f9e79
>>>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:config NVS flash: enabled
>>>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:config nano 
>>>>>> formating: disabled
>>>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:Init data frame 
>>>>>> dynamic rx buffer num: 16
>>>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:Init management frame 
>>>>>> dynamic rx buffer num: 16
>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init management short 
>>>>>> buffer num: 32
>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init dynamic tx 
>>>>>> buffer num: 16
>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init static rx buffer 
>>>>>> size: 2212
>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init static rx buffer 
>>>>>> num: 16
>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init dynamic rx 
>>>>>> buffer num: 16
>>>>>> 2021-03-12 14:53:02.642 CET I (982492) wifi:mode : sta 
>>>>>> (30:ae:a4:5f:e7:ec) + softAP (30:ae:a4:5f:e7:ed)
>>>>>> 2021-03-12 14:53:02.652 CET I (982502) wifi:Total power save 
>>>>>> buffer number: 8
>>>>>> 2021-03-12 14:53:02.652 CET I (982502) cellular-modem-auto: Restart
>>>>>> 2021-03-12 14:53:02.662 CET I (982512) cellular: State: Enter 
>>>>>> PowerOffOn state
>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: Shutting down 
>>>>>> (hard)...
>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: StatusCallBack: 
>>>>>> User Interrupt
>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: PPP connection 
>>>>>> has been closed
>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: PPP is shutdown
>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: Shutting down 
>>>>>> (hard)...
>>>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: StatusCallBack: 
>>>>>> User Interrupt
>>>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: PPP connection 
>>>>>> has been closed
>>>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: PPP is shutdown
>>>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-nmea: Shutdown (direct)
>>>>>> 2021-03-12 14:53:02.672 CET I (982522) cellular-modem-auto: Power 
>>>>>> Cycle
>>>>>> 2021-03-12 14:53:04.682 CET D (984532) events: 
>>>>>> Signal(system.wifi.down)
>>>>>> 2021-03-12 14:53:04.682 CET I (984532) netmanager: WIFI client stop
>>>>>> 2021-03-12 14:53:04.682 CET E (984532) netmanager: Inconsistent 
>>>>>> state: no interface of type 'pp' found
>>>>>> 2021-03-12 14:53:04.682 CET I (984532) netmanager: WIFI client 
>>>>>> down (with MODEM up): reconfigured for MODEM priority
>>>>>> 2021-03-12 14:53:04.692 CET D (984542) events: Signal(system.event)
>>>>>> 2021-03-12 14:53:04.692 CET D (984542) events: 
>>>>>> Signal(system.wifi.sta.disconnected)
>>>>>> 2021-03-12 14:53:04.692 CET E (984542) netmanager: Inconsistent 
>>>>>> state: no interface of type 'pp' found
>>>>>> 2021-03-12 14:53:04.692 CET I (984542) esp32wifi: STA 
>>>>>> disconnected with reason 8 = ASSOC_LEAVE
>>>>>> 2021-03-12 14:53:04.702 CET D (984552) events: Signal(system.event)
>>>>>> 2021-03-12 14:53:04.702 CET D (984552) events: 
>>>>>> Signal(system.wifi.sta.stop)
>>>>>> 2021-03-12 14:53:04.702 CET E (984552) netmanager: Inconsistent 
>>>>>> state: no interface of type 'pp' found
>>>>>> 2021-03-12 14:53:04.712 CET D (984562) events: Signal(system.event)
>>>>>> 2021-03-12 14:53:04.712 CET D (984562) events: 
>>>>>> Signal(system.wifi.ap.stop)
>>>>>> 2021-03-12 14:53:04.712 CET E (984562) netmanager: Inconsistent 
>>>>>> state: no interface of type 'pp' found
>>>>>> 2021-03-12 14:53:04.712 CET I (984562) netmanager: WIFI access 
>>>>>> point is down
>>>>>> 2021-03-12 14:53:04.712 CET I (984562) esp32wifi: AP stopped
>>>>>> 2021-03-12 14:53:04.722 CET D (984572) events: 
>>>>>> Signal(network.wifi.sta.bad)
>>>>>> 2021-03-12 14:53:04.722 CET D (984572) events: Signal(system.event)
>>>>>> 2021-03-12 14:53:04.722 CET D (984572) events: 
>>>>>> Signal(system.wifi.sta.start)
>>>>>> 2021-03-12 14:53:04.732 CET D (984582) events: 
>>>>>> Signal(system.modem.down)
>>>>>> 2021-03-12 14:53:04.742 CET I (984592) netmanager: MODEM down 
>>>>>> (with WIFI client down): network connectivity has been lost
>>>>>> 2021-03-12 14:53:04.742 CET D (984592) events: 
>>>>>> Signal(system.modem.down)
>>>>>> 2021-03-12 14:53:04.752 CET D (984602) events: Signal(system.event)
>>>>>> 2021-03-12 14:53:04.752 CET D (984602) events: 
>>>>>> Signal(system.wifi.ap.start)
>>>>>> 2021-03-12 14:53:04.752 CET I (984602) netmanager: WIFI access 
>>>>>> point is up
>>>>>> 2021-03-12 14:53:26.802 CET E (1006652) events: SignalEvent: 
>>>>>> queue overflow (running system.wifi.ap.start->netmanager for 23 
>>>>>> sec), event 'ticker.1' dropped
>>>>>> 2021-03-12 14:53:27.802 CET E (1007652) events: SignalEvent: 
>>>>>> queue overflow (running system.wifi.ap.start->netmanager for 24 
>>>>>> sec), event 'ticker.1' dropped
>>>>>> 2021-03-12 14:53:28.802 CET E (1008652) events: SignalEvent: 
>>>>>> queue overflow (running system.wifi.ap.start->netmanager for 25 
>>>>>> sec), event 'ticker.1' dropped
>>>>>> …and so on until
>>>>>> 2021-03-12 14:54:01.802 CET E (1041652) events: SignalEvent: lost 
>>>>>> important event => aborting
>>>>>>
>>>>>>
>>>>>> I need my car now, so will switch back to master for now.
>>>>>>
>>>>>> Mark, if you've got specific debug logs I should fetch on the 
>>>>>> next try, tell me.
>>>>>>
>>>>>> Regards,
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> Am 12.03.21 um 05:47 schrieb Craig Leres:
>>>>>>> I just updated to 3.2.016-68-g8e10c6b7 and still get the network 
>>>>>>> hang immediately after booting and logging into the web gui.
>>>>>>>
>>>>>>> But I see now my problem is likely that I'm not using the right 
>>>>>>> esp-idf (duh). Is there a way I can have master build using 
>>>>>>> ~/esp/esp-idf and have for-v3.3 use a different path?)
>>>>>>>
>>>>>>>         Craig
>>>>>>> _______________________________________________
>>>>>>> OvmsDev mailing list
>>>>>>> OvmsDev at lists.openvehicles.com 
>>>>>>> <mailto:OvmsDev at lists.openvehicles.com>
>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev 
>>>>>>> <http://lists.openvehicles.com/mailman/listinfo/ovmsdev>
>>>>>>
>>>>>> -- 
>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> OvmsDev mailing list
>>>>> OvmsDev at lists.openvehicles.com
>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>
>>>> -- 
>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>
>>>
>>>
>>> _______________________________________________
>>> OvmsDev mailing list
>>> OvmsDev at lists.openvehicles.com
>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>
>> -- 
>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>> _______________________________________________
>> OvmsDev mailing list
>> OvmsDev at lists.openvehicles.com
>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>
> _______________________________________________
> OvmsDev mailing list
> OvmsDev at lists.openvehicles.com
> http://lists.openvehicles.com/mailman/listinfo/ovmsdev

-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20210322/e2d1723b/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 203 bytes
Desc: OpenPGP digital signature
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20210322/e2d1723b/attachment-0001.sig>


More information about the OvmsDev mailing list