[Ovmsdev] Branch for-v3.3 network issues
Michael Balzer
dexter at expeedo.de
Mon Mar 22 18:06:58 HKT 2021
In master, running commands via ssh or server-v2 block, because these
are running synchronously in the mongoose context.
Running commands via web doesn't block, as the webcommand class starts a
separate task for each execution.
The firmware config page does a synchronous call to MyOTA.GetStatus(),
so that call is executed in the mongoose context. It still works in
master, just needs a second or two to fetch the version file.
Regards,
Michael
Am 22.03.21 um 10:38 schrieb Mark Webb-Johnson:
> In master branch, at the moment, if a command is run from the web
> shell (or server v2), surely the mongoose task will block as the web
> server / server v2 blocks waiting for the command to run to completion?
>
> Doesn’t necessarily need to be a networking command. Something long
> running like the string speed tests.
>
> In v3.3 I can easily detect the task wait being requested in the http
> library (by seeing if current task id == mongoose task), and fail
> (which I should do anyway). But I am more concerned with the general
> case now (which I think may be wrong in both master and for-v3.3).
>
> Regards, Mark
>
>> On 22 Mar 2021, at 5:22 PM, Michael Balzer <dexter at expeedo.de> wrote:
>>
>> I think we must avoid blocking the Mongoose task, as that's the
>> central network dispatcher.
>>
>> Chris had implemented a workaround in one of his PRs that could allow
>> that to be done temporarily by running a local Mongoose main loop
>> during a synchronous operation, but I still see potential issues from
>> that, as it wasn't the standard handling as done by the task, and as
>> it may need to recurse.
>>
>> Maybe the old OvmsHttpClient using socket I/O is the right way for
>> synchronous network operations?
>>
>> Regards,
>> Michael
>>
>>
>> Am 22.03.21 um 07:15 schrieb Mark Webb-Johnson:
>>> Not sure how to resolve this.
>>>
>>> OvmsSyncHttpClient is currently used in commands from ovms_plugins
>>> and ovms_ota.
>>>
>>> I could bring back the OvmsHttpClient blocking (non-mongoose)
>>> implementation, but I don’t think that would address the core
>>> problem here:
>>>
>>> Inside a mongoose callback (inside the mongoose networking
>>> task), we are making blocking calls (and in particular calls
>>> that could block for several tens of seconds).
>>>
>>>
>>> But fundamentally is it ok to block the mongoose networking task for
>>> extended periods during a mongoose event callback?
>>>
>>> Mark
>>>
>>>> On 21 Mar 2021, at 9:57 PM, Michael Balzer <dexter at expeedo.de
>>>> <mailto:dexter at expeedo.de>> wrote:
>>>>
>>>> Signed PGP part
>>>> I've found opening the web UI firmware page or calling "ota status"
>>>> via ssh to consistently deadlock the network on my module.
>>>>
>>>> I (130531) webserver: HTTP GET /cfg/firmware
>>>> D (130531) http: OvmsSyncHttpClient: Connect to
>>>> ovms.dexters-web.de:80 <http://ovms.dexters-web.de:80>
>>>> D (130541) http: OvmsSyncHttpClient: waiting for completion
>>>>
>>>> After that log message, the network is dead, and the netmanager
>>>> also doesn't respond:
>>>>
>>>> OVMS# network list
>>>> ERROR: job failed
>>>> D (183241) netmanager: send cmd 1 from 0x3ffe7054
>>>> W (193241) netmanager: ExecuteJob: cmd 1: timeout
>>>>
>>>> The interfaces seem to be registered and online, but nothing gets
>>>> in or out:
>>>>
>>>> OVMS# network status
>>>> Interface#3: pp3 (ifup=1 linkup=1)
>>>> IPv4: 10.170.195.13/255.255.255.255 gateway 10.64.64.64
>>>>
>>>> Interface#2: ap2 (ifup=1 linkup=1)
>>>> IPv4: 192.168.4.1/255.255.255.0 gateway 192.168.4.1
>>>>
>>>> Interface#1: st1 (ifup=1 linkup=1)
>>>> IPv4: 192.168.2.106/255.255.255.0 gateway 192.168.2.1
>>>>
>>>> DNS: 192.168.2.1
>>>>
>>>> Default Interface: st1 (192.168.2.106/255.255.255.0 gateway
>>>> 192.168.2.1)
>>>>
>>>>
>>>> A couple of minutes later, server-v2 recognizes the stale
>>>> connection and issues a network restart, which fails resulting in
>>>> the same behaviour as shown below with finally forced reboot by
>>>> loss of an important event.
>>>>
>>>> Doing "ota status" from USB works normally, so this looks like
>>>> OvmsSyncHttpClient not being able to run from within a mongoose client.
>>>>
>>>> Regards,
>>>> Michael
>>>>
>>>>
>>>> Am 18.03.21 um 08:14 schrieb Mark Webb-Johnson:
>>>>> Tried to repeat this, but not having much success. Here is my car
>>>>> module, with network still up:
>>>>>
>>>>> OVMS# boot status
>>>>> Last boot was 262355 second(s) ago
>>>>>
>>>>>
>>>>> I did manage to catch one network related crash after repeatedly
>>>>> disconnecting and reconnecting the cellular antenna. This was:
>>>>>
>>>>> I (3717989) cellular: PPP Connection disconnected
>>>>> Guru Meditation Error: Core 1 panic'ed (LoadProhibited).
>>>>> Exception was unhandled.
>>>>>
>>>>> 0x400fe082 is in OvmsNetManager::PrioritiseAndIndicate()
>>>>> (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_netmanager.cpp:707).
>>>>>
>>>>> 707 if ((pri->name[0]==search[0])&&
>>>>>
>>>>>
>>>>> 0x400ed360 is in
>>>>> OvmsMetricString::SetValue(std::__cxx11::basic_string<char,
>>>>> std::char_traits<char>, std::allocator<char> >)
>>>>> (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_metrics.cpp:1358).
>>>>>
>>>>> 1357 void OvmsMetricString::SetValue(std::string value)
>>>>>
>>>>>
>>>>> 0x4008bdad is at
>>>>> ../../../../.././newlib/libc/machine/xtensa/strcmp.S:586.
>>>>>
>>>>>
>>>>> 0x4008bdd1 is at
>>>>> ../../../../.././newlib/libc/machine/xtensa/strcmp.S:604.
>>>>>
>>>>>
>>>>> 0x400fe886 is in
>>>>> OvmsNetManager::ModemDown(std::__cxx11::basic_string<char,
>>>>> std::char_traits<char>, std::allocator<char> >, void*)
>>>>> (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_netmanager.cpp:522).
>>>>>
>>>>> 522 PrioritiseAndIndicate();
>>>>>
>>>>>
>>>>> 0x400fd752 is in std::_Function_handler<void
>>>>> (std::__cxx11::basic_string<char, std::char_traits<char>,
>>>>> std::allocator<char> >, void*), std::_Bind<std::_Mem_fn<void
>>>>> (OvmsNetManager::*)(std::__cxx11::basic_string<char,
>>>>> std::char_traits<char>, std::allocator<char> >, void*)>
>>>>> (OvmsNetManager*, std::_Placeholder<1>, std::_Placeholder<2>)>
>>>>> >::_M_invoke(std::_Any_data const&,
>>>>> std::__cxx11::basic_string<char, std::char_traits<char>,
>>>>> std::allocator<char> >&&, void*&&)
>>>>> (/home/openvehicles/build/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/functional:600).
>>>>>
>>>>> 600 { return
>>>>> (__object->*_M_pmf)(std::forward<_Args>(__args)...); }
>>>>>
>>>>>
>>>>> 0x400f512e is in std::function<void
>>>>> (std::__cxx11::basic_string<char, std::char_traits<char>,
>>>>> std::allocator<char> >,
>>>>> void*)>::operator()(std::__cxx11::basic_string<char,
>>>>> std::char_traits<char>, std::allocator<char> >, void*) const
>>>>> (/home/openvehicles/build/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/functional:2271).
>>>>>
>>>>> 2271 return _M_invoker(_M_functor,
>>>>> std::forward<_ArgTypes>(__args)...);
>>>>>
>>>>>
>>>>> 0x400f52f1 is in
>>>>> OvmsEvents::HandleQueueSignalEvent(event_queue_t*)
>>>>> (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:283).
>>>>>
>>>>> 283 m_current_callback->m_callback(m_current_event,
>>>>> msg->body.signal.data);
>>>>>
>>>>>
>>>>> 0x400f53d8 is in OvmsEvents::EventTask()
>>>>> (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:237).
>>>>>
>>>>> 237 HandleQueueSignalEvent(&msg);
>>>>>
>>>>>
>>>>> 0x400f545d is in EventLaunchTask(void*)
>>>>> (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:80).
>>>>>
>>>>> 80 me->EventTask();
>>>>>
>>>>>
>>>>> My for_v3.3 branch does include the preliminary changes to support
>>>>> the wifi at 20MHz bandwidth, and perhaps those could be affecting
>>>>> things. I do notice that if I ‘power wifi off’, then ‘wifi mode
>>>>> client’, it can connect to the station, but not get an IP address.
>>>>> I’ve just tried to merge in the latest fixes to that, and rebuilt
>>>>> a release. I will continue to test with that.
>>>>>
>>>>> Regards, Mark.
>>>>>
>>>>>> On 12 Mar 2021, at 10:32 PM, Michael Balzer <dexter at expeedo.de
>>>>>> <mailto:dexter at expeedo.de>> wrote:
>>>>>>
>>>>>> Signed PGP part
>>>>>> I just tried switching to for-v3.3 in my car module after tests
>>>>>> on my desk module were OK, and I've run into the very same
>>>>>> problem with for-v3.3. So the issue isn't related to esp-idf.
>>>>>>
>>>>>> The network only occasionally starts normally, but even then all
>>>>>> connectivity is lost after a couple of minutes.
>>>>>>
>>>>>> The stale connection watchdog in server-v2 triggers a network
>>>>>> restart, but that also doesn't seem to succeed:
>>>>>>
>>>>>> 2021-03-12 14:53:01.802 CET W (981652) ovms-server-v2: Detected
>>>>>> stale connection (issue #241), restarting network
>>>>>> 2021-03-12 14:53:01.802 CET I (981652) esp32wifi: Restart
>>>>>> 2021-03-12 14:53:01.802 CET I (981652) esp32wifi: Stopping WIFI
>>>>>> station
>>>>>> 2021-03-12 14:53:01.812 CET I (981662) wifi:state: run -> init (0)
>>>>>> 2021-03-12 14:53:01.812 CET I (981662) wifi:pm stop, total sleep
>>>>>> time: 831205045 us / 975329961 us
>>>>>>
>>>>>> 2021-03-12 14:53:01.812 CET I (981662) wifi:new:<1,0>, old:<1,1>,
>>>>>> ap:<1,1>, sta:<1,1>, prof:1
>>>>>> 2021-03-12 14:53:01.832 CET I (981682) wifi:flush txq
>>>>>> 2021-03-12 14:53:01.842 CET I (981692) wifi:stop sw txq
>>>>>> 2021-03-12 14:53:01.842 CET I (981692) wifi:lmac stop hw txq
>>>>>> 2021-03-12 14:53:01.852 CET I (981702) esp32wifi: Powering down
>>>>>> WIFI driver
>>>>>> 2021-03-12 14:53:01.852 CET I (981702) wifi:Deinit lldesc rx
>>>>>> mblock:16
>>>>>> 2021-03-12 14:53:01.862 CET I (981712) esp32wifi: Powering up
>>>>>> WIFI driver
>>>>>> 2021-03-12 14:53:01.862 CET I (981712) wifi:nvs_log_init, erase
>>>>>> log key successfully, reinit nvs log
>>>>>> 2021-03-12 14:53:01.882 CET I (981732) wifi:wifi driver task:
>>>>>> 3ffd4d84, prio:23, stack:3584, core=0
>>>>>> 2021-03-12 14:53:01.882 CET I (981732) system_api: Base MAC
>>>>>> address is not set, read default base MAC address from BLK0 of EFUSE
>>>>>> 2021-03-12 14:53:01.882 CET I (981732) system_api: Base MAC
>>>>>> address is not set, read default base MAC address from BLK0 of EFUSE
>>>>>> 2021-03-12 14:53:01.902 CET I (981752) wifi:wifi firmware
>>>>>> version: 30f9e79
>>>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:config NVS flash: enabled
>>>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:config nano
>>>>>> formating: disabled
>>>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:Init data frame
>>>>>> dynamic rx buffer num: 16
>>>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:Init management frame
>>>>>> dynamic rx buffer num: 16
>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init management short
>>>>>> buffer num: 32
>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init dynamic tx
>>>>>> buffer num: 16
>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init static rx buffer
>>>>>> size: 2212
>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init static rx buffer
>>>>>> num: 16
>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init dynamic rx
>>>>>> buffer num: 16
>>>>>> 2021-03-12 14:53:02.642 CET I (982492) wifi:mode : sta
>>>>>> (30:ae:a4:5f:e7:ec) + softAP (30:ae:a4:5f:e7:ed)
>>>>>> 2021-03-12 14:53:02.652 CET I (982502) wifi:Total power save
>>>>>> buffer number: 8
>>>>>> 2021-03-12 14:53:02.652 CET I (982502) cellular-modem-auto: Restart
>>>>>> 2021-03-12 14:53:02.662 CET I (982512) cellular: State: Enter
>>>>>> PowerOffOn state
>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: Shutting down
>>>>>> (hard)...
>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: StatusCallBack:
>>>>>> User Interrupt
>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: PPP connection
>>>>>> has been closed
>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: PPP is shutdown
>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: Shutting down
>>>>>> (hard)...
>>>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: StatusCallBack:
>>>>>> User Interrupt
>>>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: PPP connection
>>>>>> has been closed
>>>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: PPP is shutdown
>>>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-nmea: Shutdown (direct)
>>>>>> 2021-03-12 14:53:02.672 CET I (982522) cellular-modem-auto: Power
>>>>>> Cycle
>>>>>> 2021-03-12 14:53:04.682 CET D (984532) events:
>>>>>> Signal(system.wifi.down)
>>>>>> 2021-03-12 14:53:04.682 CET I (984532) netmanager: WIFI client stop
>>>>>> 2021-03-12 14:53:04.682 CET E (984532) netmanager: Inconsistent
>>>>>> state: no interface of type 'pp' found
>>>>>> 2021-03-12 14:53:04.682 CET I (984532) netmanager: WIFI client
>>>>>> down (with MODEM up): reconfigured for MODEM priority
>>>>>> 2021-03-12 14:53:04.692 CET D (984542) events: Signal(system.event)
>>>>>> 2021-03-12 14:53:04.692 CET D (984542) events:
>>>>>> Signal(system.wifi.sta.disconnected)
>>>>>> 2021-03-12 14:53:04.692 CET E (984542) netmanager: Inconsistent
>>>>>> state: no interface of type 'pp' found
>>>>>> 2021-03-12 14:53:04.692 CET I (984542) esp32wifi: STA
>>>>>> disconnected with reason 8 = ASSOC_LEAVE
>>>>>> 2021-03-12 14:53:04.702 CET D (984552) events: Signal(system.event)
>>>>>> 2021-03-12 14:53:04.702 CET D (984552) events:
>>>>>> Signal(system.wifi.sta.stop)
>>>>>> 2021-03-12 14:53:04.702 CET E (984552) netmanager: Inconsistent
>>>>>> state: no interface of type 'pp' found
>>>>>> 2021-03-12 14:53:04.712 CET D (984562) events: Signal(system.event)
>>>>>> 2021-03-12 14:53:04.712 CET D (984562) events:
>>>>>> Signal(system.wifi.ap.stop)
>>>>>> 2021-03-12 14:53:04.712 CET E (984562) netmanager: Inconsistent
>>>>>> state: no interface of type 'pp' found
>>>>>> 2021-03-12 14:53:04.712 CET I (984562) netmanager: WIFI access
>>>>>> point is down
>>>>>> 2021-03-12 14:53:04.712 CET I (984562) esp32wifi: AP stopped
>>>>>> 2021-03-12 14:53:04.722 CET D (984572) events:
>>>>>> Signal(network.wifi.sta.bad)
>>>>>> 2021-03-12 14:53:04.722 CET D (984572) events: Signal(system.event)
>>>>>> 2021-03-12 14:53:04.722 CET D (984572) events:
>>>>>> Signal(system.wifi.sta.start)
>>>>>> 2021-03-12 14:53:04.732 CET D (984582) events:
>>>>>> Signal(system.modem.down)
>>>>>> 2021-03-12 14:53:04.742 CET I (984592) netmanager: MODEM down
>>>>>> (with WIFI client down): network connectivity has been lost
>>>>>> 2021-03-12 14:53:04.742 CET D (984592) events:
>>>>>> Signal(system.modem.down)
>>>>>> 2021-03-12 14:53:04.752 CET D (984602) events: Signal(system.event)
>>>>>> 2021-03-12 14:53:04.752 CET D (984602) events:
>>>>>> Signal(system.wifi.ap.start)
>>>>>> 2021-03-12 14:53:04.752 CET I (984602) netmanager: WIFI access
>>>>>> point is up
>>>>>> 2021-03-12 14:53:26.802 CET E (1006652) events: SignalEvent:
>>>>>> queue overflow (running system.wifi.ap.start->netmanager for 23
>>>>>> sec), event 'ticker.1' dropped
>>>>>> 2021-03-12 14:53:27.802 CET E (1007652) events: SignalEvent:
>>>>>> queue overflow (running system.wifi.ap.start->netmanager for 24
>>>>>> sec), event 'ticker.1' dropped
>>>>>> 2021-03-12 14:53:28.802 CET E (1008652) events: SignalEvent:
>>>>>> queue overflow (running system.wifi.ap.start->netmanager for 25
>>>>>> sec), event 'ticker.1' dropped
>>>>>> …and so on until
>>>>>> 2021-03-12 14:54:01.802 CET E (1041652) events: SignalEvent: lost
>>>>>> important event => aborting
>>>>>>
>>>>>>
>>>>>> I need my car now, so will switch back to master for now.
>>>>>>
>>>>>> Mark, if you've got specific debug logs I should fetch on the
>>>>>> next try, tell me.
>>>>>>
>>>>>> Regards,
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> Am 12.03.21 um 05:47 schrieb Craig Leres:
>>>>>>> I just updated to 3.2.016-68-g8e10c6b7 and still get the network
>>>>>>> hang immediately after booting and logging into the web gui.
>>>>>>>
>>>>>>> But I see now my problem is likely that I'm not using the right
>>>>>>> esp-idf (duh). Is there a way I can have master build using
>>>>>>> ~/esp/esp-idf and have for-v3.3 use a different path?)
>>>>>>>
>>>>>>> Craig
>>>>>>> _______________________________________________
>>>>>>> OvmsDev mailing list
>>>>>>> OvmsDev at lists.openvehicles.com
>>>>>>> <mailto:OvmsDev at lists.openvehicles.com>
>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>> <http://lists.openvehicles.com/mailman/listinfo/ovmsdev>
>>>>>>
>>>>>> --
>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> OvmsDev mailing list
>>>>> OvmsDev at lists.openvehicles.com
>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>
>>>> --
>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>
>>>
>>>
>>> _______________________________________________
>>> OvmsDev mailing list
>>> OvmsDev at lists.openvehicles.com
>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>
>> --
>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>> _______________________________________________
>> OvmsDev mailing list
>> OvmsDev at lists.openvehicles.com
>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>
> _______________________________________________
> OvmsDev mailing list
> OvmsDev at lists.openvehicles.com
> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
--
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20210322/e2d1723b/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 203 bytes
Desc: OpenPGP digital signature
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20210322/e2d1723b/attachment-0001.sig>
More information about the OvmsDev
mailing list