[Ovmsdev] Branch for-v3.3 network issues

Michael Balzer dexter at expeedo.de
Mon Mar 22 17:22:21 HKT 2021


I think we must avoid blocking the Mongoose task, as that's the central 
network dispatcher.

Chris had implemented a workaround in one of his PRs that could allow 
that to be done temporarily by running a local Mongoose main loop during 
a synchronous operation, but I still see potential issues from that, as 
it wasn't the standard handling as done by the task, and as it may need 
to recurse.

Maybe the old OvmsHttpClient using socket I/O is the right way for 
synchronous network operations?

Regards,
Michael


Am 22.03.21 um 07:15 schrieb Mark Webb-Johnson:
> Not sure how to resolve this.
>
> OvmsSyncHttpClient is currently used in commands from ovms_plugins and 
> ovms_ota.
>
> I could bring back the OvmsHttpClient blocking (non-mongoose) 
> implementation, but I don’t think that would address the core problem 
> here:
>
>     Inside a mongoose callback (inside the mongoose networking task),
>     we are making blocking calls (and in particular calls that could
>     block for several tens of seconds).
>
>
> But fundamentally is it ok to block the mongoose networking task for 
> extended periods during a mongoose event callback?
>
> Mark
>
>> On 21 Mar 2021, at 9:57 PM, Michael Balzer <dexter at expeedo.de 
>> <mailto:dexter at expeedo.de>> wrote:
>>
>> Signed PGP part
>> I've found opening the web UI firmware page or calling "ota status" 
>> via ssh to consistently deadlock the network on my module.
>>
>> I (130531) webserver: HTTP GET /cfg/firmware
>> D (130531) http: OvmsSyncHttpClient: Connect to 
>> ovms.dexters-web.de:80 <http://ovms.dexters-web.de:80>
>> D (130541) http: OvmsSyncHttpClient: waiting for completion
>>
>> After that log message, the network is dead, and the netmanager also 
>> doesn't respond:
>>
>> OVMS# network list
>> ERROR: job failed
>> D (183241) netmanager: send cmd 1 from 0x3ffe7054
>> W (193241) netmanager: ExecuteJob: cmd 1: timeout
>>
>> The interfaces seem to be registered and online, but nothing gets in 
>> or out:
>>
>> OVMS# network status
>> Interface#3: pp3 (ifup=1 linkup=1)
>>   IPv4: 10.170.195.13/255.255.255.255 gateway 10.64.64.64
>>
>> Interface#2: ap2 (ifup=1 linkup=1)
>>   IPv4: 192.168.4.1/255.255.255.0 gateway 192.168.4.1
>>
>> Interface#1: st1 (ifup=1 linkup=1)
>>   IPv4: 192.168.2.106/255.255.255.0 gateway 192.168.2.1
>>
>> DNS: 192.168.2.1
>>
>> Default Interface: st1 (192.168.2.106/255.255.255.0 gateway 192.168.2.1)
>>
>>
>> A couple of minutes later, server-v2 recognizes the stale connection 
>> and issues a network restart, which fails resulting in the same 
>> behaviour as shown below with finally forced reboot by loss of an 
>> important event.
>>
>> Doing "ota status" from USB works normally, so this looks like 
>> OvmsSyncHttpClient not being able to run from within a mongoose client.
>>
>> Regards,
>> Michael
>>
>>
>> Am 18.03.21 um 08:14 schrieb Mark Webb-Johnson:
>>> Tried to repeat this, but not having much success. Here is my car 
>>> module, with network still up:
>>>
>>>     OVMS# boot status
>>>     Last boot was 262355 second(s) ago
>>>
>>>
>>> I did manage to catch one network related crash after repeatedly 
>>> disconnecting and reconnecting the cellular antenna. This was:
>>>
>>>     I (3717989) cellular: PPP Connection disconnected
>>>     Guru Meditation Error: Core  1 panic'ed (LoadProhibited).
>>>     Exception was unhandled.
>>>
>>>     0x400fe082 is in OvmsNetManager::PrioritiseAndIndicate()
>>>     (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_netmanager.cpp:707).
>>>
>>>         707 if ((pri->name[0]==search[0])&&
>>>
>>>
>>>     0x400ed360 is in
>>>     OvmsMetricString::SetValue(std::__cxx11::basic_string<char,
>>>     std::char_traits<char>, std::allocator<char> >)
>>>     (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_metrics.cpp:1358).
>>>
>>>         1357    void OvmsMetricString::SetValue(std::string value)
>>>
>>>
>>>     0x4008bdad is at
>>>     ../../../../.././newlib/libc/machine/xtensa/strcmp.S:586.
>>>
>>>
>>>     0x4008bdd1 is at
>>>     ../../../../.././newlib/libc/machine/xtensa/strcmp.S:604.
>>>
>>>
>>>     0x400fe886 is in
>>>     OvmsNetManager::ModemDown(std::__cxx11::basic_string<char,
>>>     std::char_traits<char>, std::allocator<char> >, void*)
>>>     (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_netmanager.cpp:522).
>>>
>>>         522 PrioritiseAndIndicate();
>>>
>>>
>>>     0x400fd752 is in std::_Function_handler<void
>>>     (std::__cxx11::basic_string<char, std::char_traits<char>,
>>>     std::allocator<char> >, void*), std::_Bind<std::_Mem_fn<void
>>>     (OvmsNetManager::*)(std::__cxx11::basic_string<char,
>>>     std::char_traits<char>, std::allocator<char> >, void*)>
>>>     (OvmsNetManager*, std::_Placeholder<1>, std::_Placeholder<2>)>
>>>     >::_M_invoke(std::_Any_data const&,
>>>     std::__cxx11::basic_string<char, std::char_traits<char>,
>>>     std::allocator<char> >&&, void*&&)
>>>     (/home/openvehicles/build/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/functional:600).
>>>
>>>         600             { return
>>>         (__object->*_M_pmf)(std::forward<_Args>(__args)...); }
>>>
>>>
>>>     0x400f512e is in std::function<void
>>>     (std::__cxx11::basic_string<char, std::char_traits<char>,
>>>     std::allocator<char> >,
>>>     void*)>::operator()(std::__cxx11::basic_string<char,
>>>     std::char_traits<char>, std::allocator<char> >, void*) const
>>>     (/home/openvehicles/build/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/functional:2271).
>>>
>>>         2271          return _M_invoker(_M_functor,
>>>         std::forward<_ArgTypes>(__args)...);
>>>
>>>
>>>     0x400f52f1 is in
>>>     OvmsEvents::HandleQueueSignalEvent(event_queue_t*)
>>>     (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:283).
>>>
>>>         283 m_current_callback->m_callback(m_current_event,
>>>         msg->body.signal.data);
>>>
>>>
>>>     0x400f53d8 is in OvmsEvents::EventTask()
>>>     (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:237).
>>>
>>>         237 HandleQueueSignalEvent(&msg);
>>>
>>>
>>>     0x400f545d is in EventLaunchTask(void*)
>>>     (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:80).
>>>
>>>         80  me->EventTask();
>>>
>>>
>>> My for_v3.3 branch does include the preliminary changes to support 
>>> the wifi at 20MHz bandwidth, and perhaps those could be affecting 
>>> things. I do notice that if I ‘power wifi off’, then ‘wifi mode 
>>> client’, it can connect to the station, but not get an IP address. 
>>> I’ve just tried to merge in the latest fixes to that, and rebuilt a 
>>> release. I will continue to test with that.
>>>
>>> Regards, Mark.
>>>
>>>> On 12 Mar 2021, at 10:32 PM, Michael Balzer <dexter at expeedo.de 
>>>> <mailto:dexter at expeedo.de>> wrote:
>>>>
>>>> Signed PGP part
>>>> I just tried switching to for-v3.3 in my car module after tests on 
>>>> my desk module were OK, and I've run into the very same problem 
>>>> with for-v3.3. So the issue isn't related to esp-idf.
>>>>
>>>> The network only occasionally starts normally, but even then all 
>>>> connectivity is lost after a couple of minutes.
>>>>
>>>> The stale connection watchdog in server-v2 triggers a network 
>>>> restart, but that also doesn't seem to succeed:
>>>>
>>>> 2021-03-12 14:53:01.802 CET W (981652) ovms-server-v2: Detected 
>>>> stale connection (issue #241), restarting network
>>>> 2021-03-12 14:53:01.802 CET I (981652) esp32wifi: Restart
>>>> 2021-03-12 14:53:01.802 CET I (981652) esp32wifi: Stopping WIFI station
>>>> 2021-03-12 14:53:01.812 CET I (981662) wifi:state: run -> init (0)
>>>> 2021-03-12 14:53:01.812 CET I (981662) wifi:pm stop, total sleep 
>>>> time: 831205045 us / 975329961 us
>>>>
>>>> 2021-03-12 14:53:01.812 CET I (981662) wifi:new:<1,0>, old:<1,1>, 
>>>> ap:<1,1>, sta:<1,1>, prof:1
>>>> 2021-03-12 14:53:01.832 CET I (981682) wifi:flush txq
>>>> 2021-03-12 14:53:01.842 CET I (981692) wifi:stop sw txq
>>>> 2021-03-12 14:53:01.842 CET I (981692) wifi:lmac stop hw txq
>>>> 2021-03-12 14:53:01.852 CET I (981702) esp32wifi: Powering down 
>>>> WIFI driver
>>>> 2021-03-12 14:53:01.852 CET I (981702) wifi:Deinit lldesc rx mblock:16
>>>> 2021-03-12 14:53:01.862 CET I (981712) esp32wifi: Powering up WIFI 
>>>> driver
>>>> 2021-03-12 14:53:01.862 CET I (981712) wifi:nvs_log_init, erase log 
>>>> key successfully, reinit nvs log
>>>> 2021-03-12 14:53:01.882 CET I (981732) wifi:wifi driver task: 
>>>> 3ffd4d84, prio:23, stack:3584, core=0
>>>> 2021-03-12 14:53:01.882 CET I (981732) system_api: Base MAC address 
>>>> is not set, read default base MAC address from BLK0 of EFUSE
>>>> 2021-03-12 14:53:01.882 CET I (981732) system_api: Base MAC address 
>>>> is not set, read default base MAC address from BLK0 of EFUSE
>>>> 2021-03-12 14:53:01.902 CET I (981752) wifi:wifi firmware version: 
>>>> 30f9e79
>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:config NVS flash: enabled
>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:config nano formating: 
>>>> disabled
>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:Init data frame dynamic 
>>>> rx buffer num: 16
>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:Init management frame 
>>>> dynamic rx buffer num: 16
>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init management short 
>>>> buffer num: 32
>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init dynamic tx buffer 
>>>> num: 16
>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init static rx buffer 
>>>> size: 2212
>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init static rx buffer 
>>>> num: 16
>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init dynamic rx buffer 
>>>> num: 16
>>>> 2021-03-12 14:53:02.642 CET I (982492) wifi:mode : sta 
>>>> (30:ae:a4:5f:e7:ec) + softAP (30:ae:a4:5f:e7:ed)
>>>> 2021-03-12 14:53:02.652 CET I (982502) wifi:Total power save buffer 
>>>> number: 8
>>>> 2021-03-12 14:53:02.652 CET I (982502) cellular-modem-auto: Restart
>>>> 2021-03-12 14:53:02.662 CET I (982512) cellular: State: Enter 
>>>> PowerOffOn state
>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: Shutting down (hard)...
>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: StatusCallBack: 
>>>> User Interrupt
>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: PPP connection has 
>>>> been closed
>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: PPP is shutdown
>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: Shutting down (hard)...
>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: StatusCallBack: 
>>>> User Interrupt
>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: PPP connection has 
>>>> been closed
>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: PPP is shutdown
>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-nmea: Shutdown (direct)
>>>> 2021-03-12 14:53:02.672 CET I (982522) cellular-modem-auto: Power Cycle
>>>> 2021-03-12 14:53:04.682 CET D (984532) events: Signal(system.wifi.down)
>>>> 2021-03-12 14:53:04.682 CET I (984532) netmanager: WIFI client stop
>>>> 2021-03-12 14:53:04.682 CET E (984532) netmanager: Inconsistent 
>>>> state: no interface of type 'pp' found
>>>> 2021-03-12 14:53:04.682 CET I (984532) netmanager: WIFI client down 
>>>> (with MODEM up): reconfigured for MODEM priority
>>>> 2021-03-12 14:53:04.692 CET D (984542) events: Signal(system.event)
>>>> 2021-03-12 14:53:04.692 CET D (984542) events: 
>>>> Signal(system.wifi.sta.disconnected)
>>>> 2021-03-12 14:53:04.692 CET E (984542) netmanager: Inconsistent 
>>>> state: no interface of type 'pp' found
>>>> 2021-03-12 14:53:04.692 CET I (984542) esp32wifi: STA disconnected 
>>>> with reason 8 = ASSOC_LEAVE
>>>> 2021-03-12 14:53:04.702 CET D (984552) events: Signal(system.event)
>>>> 2021-03-12 14:53:04.702 CET D (984552) events: 
>>>> Signal(system.wifi.sta.stop)
>>>> 2021-03-12 14:53:04.702 CET E (984552) netmanager: Inconsistent 
>>>> state: no interface of type 'pp' found
>>>> 2021-03-12 14:53:04.712 CET D (984562) events: Signal(system.event)
>>>> 2021-03-12 14:53:04.712 CET D (984562) events: 
>>>> Signal(system.wifi.ap.stop)
>>>> 2021-03-12 14:53:04.712 CET E (984562) netmanager: Inconsistent 
>>>> state: no interface of type 'pp' found
>>>> 2021-03-12 14:53:04.712 CET I (984562) netmanager: WIFI access 
>>>> point is down
>>>> 2021-03-12 14:53:04.712 CET I (984562) esp32wifi: AP stopped
>>>> 2021-03-12 14:53:04.722 CET D (984572) events: 
>>>> Signal(network.wifi.sta.bad)
>>>> 2021-03-12 14:53:04.722 CET D (984572) events: Signal(system.event)
>>>> 2021-03-12 14:53:04.722 CET D (984572) events: 
>>>> Signal(system.wifi.sta.start)
>>>> 2021-03-12 14:53:04.732 CET D (984582) events: 
>>>> Signal(system.modem.down)
>>>> 2021-03-12 14:53:04.742 CET I (984592) netmanager: MODEM down (with 
>>>> WIFI client down): network connectivity has been lost
>>>> 2021-03-12 14:53:04.742 CET D (984592) events: 
>>>> Signal(system.modem.down)
>>>> 2021-03-12 14:53:04.752 CET D (984602) events: Signal(system.event)
>>>> 2021-03-12 14:53:04.752 CET D (984602) events: 
>>>> Signal(system.wifi.ap.start)
>>>> 2021-03-12 14:53:04.752 CET I (984602) netmanager: WIFI access 
>>>> point is up
>>>> 2021-03-12 14:53:26.802 CET E (1006652) events: SignalEvent: queue 
>>>> overflow (running system.wifi.ap.start->netmanager for 23 sec), 
>>>> event 'ticker.1' dropped
>>>> 2021-03-12 14:53:27.802 CET E (1007652) events: SignalEvent: queue 
>>>> overflow (running system.wifi.ap.start->netmanager for 24 sec), 
>>>> event 'ticker.1' dropped
>>>> 2021-03-12 14:53:28.802 CET E (1008652) events: SignalEvent: queue 
>>>> overflow (running system.wifi.ap.start->netmanager for 25 sec), 
>>>> event 'ticker.1' dropped
>>>> …and so on until
>>>> 2021-03-12 14:54:01.802 CET E (1041652) events: SignalEvent: lost 
>>>> important event => aborting
>>>>
>>>>
>>>> I need my car now, so will switch back to master for now.
>>>>
>>>> Mark, if you've got specific debug logs I should fetch on the next 
>>>> try, tell me.
>>>>
>>>> Regards,
>>>> Michael
>>>>
>>>>
>>>> Am 12.03.21 um 05:47 schrieb Craig Leres:
>>>>> I just updated to 3.2.016-68-g8e10c6b7 and still get the network 
>>>>> hang immediately after booting and logging into the web gui.
>>>>>
>>>>> But I see now my problem is likely that I'm not using the right 
>>>>> esp-idf (duh). Is there a way I can have master build using 
>>>>> ~/esp/esp-idf and have for-v3.3 use a different path?)
>>>>>
>>>>>         Craig
>>>>> _______________________________________________
>>>>> OvmsDev mailing list
>>>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev 
>>>>> <http://lists.openvehicles.com/mailman/listinfo/ovmsdev>
>>>>
>>>> -- 
>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>
>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> OvmsDev mailing list
>>> OvmsDev at lists.openvehicles.com
>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>
>> -- 
>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>
>
>
> _______________________________________________
> OvmsDev mailing list
> OvmsDev at lists.openvehicles.com
> http://lists.openvehicles.com/mailman/listinfo/ovmsdev

-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20210322/fd27bb10/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 203 bytes
Desc: OpenPGP digital signature
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20210322/fd27bb10/attachment-0001.sig>


More information about the OvmsDev mailing list