[Ovmsdev] Branch for-v3.3 network issues
Mark Webb-Johnson
mark at webb-johnson.net
Mon Mar 22 14:15:07 HKT 2021
Not sure how to resolve this.
OvmsSyncHttpClient is currently used in commands from ovms_plugins and ovms_ota.
I could bring back the OvmsHttpClient blocking (non-mongoose) implementation, but I don’t think that would address the core problem here:
Inside a mongoose callback (inside the mongoose networking task), we are making blocking calls (and in particular calls that could block for several tens of seconds).
But fundamentally is it ok to block the mongoose networking task for extended periods during a mongoose event callback?
Mark
> On 21 Mar 2021, at 9:57 PM, Michael Balzer <dexter at expeedo.de> wrote:
>
> Signed PGP part
> I've found opening the web UI firmware page or calling "ota status" via ssh to consistently deadlock the network on my module.
>
> I (130531) webserver: HTTP GET /cfg/firmware
> D (130531) http: OvmsSyncHttpClient: Connect to ovms.dexters-web.de:80
> D (130541) http: OvmsSyncHttpClient: waiting for completion
>
> After that log message, the network is dead, and the netmanager also doesn't respond:
>
> OVMS# network list
> ERROR: job failed
> D (183241) netmanager: send cmd 1 from 0x3ffe7054
> W (193241) netmanager: ExecuteJob: cmd 1: timeout
>
> The interfaces seem to be registered and online, but nothing gets in or out:
>
> OVMS# network status
> Interface#3: pp3 (ifup=1 linkup=1)
> IPv4: 10.170.195.13/255.255.255.255 gateway 10.64.64.64
>
> Interface#2: ap2 (ifup=1 linkup=1)
> IPv4: 192.168.4.1/255.255.255.0 gateway 192.168.4.1
>
> Interface#1: st1 (ifup=1 linkup=1)
> IPv4: 192.168.2.106/255.255.255.0 gateway 192.168.2.1
>
> DNS: 192.168.2.1
>
> Default Interface: st1 (192.168.2.106/255.255.255.0 gateway 192.168.2.1)
>
>
> A couple of minutes later, server-v2 recognizes the stale connection and issues a network restart, which fails resulting in the same behaviour as shown below with finally forced reboot by loss of an important event.
>
> Doing "ota status" from USB works normally, so this looks like OvmsSyncHttpClient not being able to run from within a mongoose client.
>
> Regards,
> Michael
>
>
> Am 18.03.21 um 08:14 schrieb Mark Webb-Johnson:
>> Tried to repeat this, but not having much success. Here is my car module, with network still up:
>>
>> OVMS# boot status
>> Last boot was 262355 second(s) ago
>>
>> I did manage to catch one network related crash after repeatedly disconnecting and reconnecting the cellular antenna. This was:
>>
>> I (3717989) cellular: PPP Connection disconnected
>> Guru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.
>>
>> 0x400fe082 is in OvmsNetManager::PrioritiseAndIndicate() (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_netmanager.cpp:707).
>> 707 if ((pri->name[0]==search[0])&&
>>
>> 0x400ed360 is in OvmsMetricString::SetValue(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_metrics.cpp:1358).
>> 1357 void OvmsMetricString::SetValue(std::string value)
>>
>> 0x4008bdad is at ../../../../.././newlib/libc/machine/xtensa/strcmp.S:586.
>>
>> 0x4008bdd1 is at ../../../../.././newlib/libc/machine/xtensa/strcmp.S:604.
>>
>> 0x400fe886 is in OvmsNetManager::ModemDown(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*) (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_netmanager.cpp:522).
>> 522 PrioritiseAndIndicate();
>>
>> 0x400fd752 is in std::_Function_handler<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*), std::_Bind<std::_Mem_fn<void (OvmsNetManager::*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*)> (OvmsNetManager*, std::_Placeholder<1>, std::_Placeholder<2>)> >::_M_invoke(std::_Any_data const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, void*&&) (/home/openvehicles/build/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/functional:600).
>> 600 { return (__object->*_M_pmf)(std::forward<_Args>(__args)...); }
>>
>> 0x400f512e is in std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*)>::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*) const (/home/openvehicles/build/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/functional:2271).
>> 2271 return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
>>
>> 0x400f52f1 is in OvmsEvents::HandleQueueSignalEvent(event_queue_t*) (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:283).
>> 283 m_current_callback->m_callback(m_current_event, msg->body.signal.data);
>>
>> 0x400f53d8 is in OvmsEvents::EventTask() (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:237).
>> 237 HandleQueueSignalEvent(&msg);
>>
>> 0x400f545d is in EventLaunchTask(void*) (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:80).
>> 80 me->EventTask();
>>
>> My for_v3.3 branch does include the preliminary changes to support the wifi at 20MHz bandwidth, and perhaps those could be affecting things. I do notice that if I ‘power wifi off’, then ‘wifi mode client’, it can connect to the station, but not get an IP address. I’ve just tried to merge in the latest fixes to that, and rebuilt a release. I will continue to test with that.
>>
>> Regards, Mark.
>>
>>> On 12 Mar 2021, at 10:32 PM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de>> wrote:
>>>
>>> Signed PGP part
>>> I just tried switching to for-v3.3 in my car module after tests on my desk module were OK, and I've run into the very same problem with for-v3.3. So the issue isn't related to esp-idf.
>>>
>>> The network only occasionally starts normally, but even then all connectivity is lost after a couple of minutes.
>>>
>>> The stale connection watchdog in server-v2 triggers a network restart, but that also doesn't seem to succeed:
>>>
>>> 2021-03-12 14:53:01.802 CET W (981652) ovms-server-v2: Detected stale connection (issue #241), restarting network
>>> 2021-03-12 14:53:01.802 CET I (981652) esp32wifi: Restart
>>> 2021-03-12 14:53:01.802 CET I (981652) esp32wifi: Stopping WIFI station
>>> 2021-03-12 14:53:01.812 CET I (981662) wifi:state: run -> init (0)
>>> 2021-03-12 14:53:01.812 CET I (981662) wifi:pm stop, total sleep time: 831205045 us / 975329961 us
>>>
>>> 2021-03-12 14:53:01.812 CET I (981662) wifi:new:<1,0>, old:<1,1>, ap:<1,1>, sta:<1,1>, prof:1
>>> 2021-03-12 14:53:01.832 CET I (981682) wifi:flush txq
>>> 2021-03-12 14:53:01.842 CET I (981692) wifi:stop sw txq
>>> 2021-03-12 14:53:01.842 CET I (981692) wifi:lmac stop hw txq
>>> 2021-03-12 14:53:01.852 CET I (981702) esp32wifi: Powering down WIFI driver
>>> 2021-03-12 14:53:01.852 CET I (981702) wifi:Deinit lldesc rx mblock:16
>>> 2021-03-12 14:53:01.862 CET I (981712) esp32wifi: Powering up WIFI driver
>>> 2021-03-12 14:53:01.862 CET I (981712) wifi:nvs_log_init, erase log key successfully, reinit nvs log
>>> 2021-03-12 14:53:01.882 CET I (981732) wifi:wifi driver task: 3ffd4d84, prio:23, stack:3584, core=0
>>> 2021-03-12 14:53:01.882 CET I (981732) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
>>> 2021-03-12 14:53:01.882 CET I (981732) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
>>> 2021-03-12 14:53:01.902 CET I (981752) wifi:wifi firmware version: 30f9e79
>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:config NVS flash: enabled
>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:config nano formating: disabled
>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:Init data frame dynamic rx buffer num: 16
>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:Init management frame dynamic rx buffer num: 16
>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init management short buffer num: 32
>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init dynamic tx buffer num: 16
>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init static rx buffer size: 2212
>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init static rx buffer num: 16
>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init dynamic rx buffer num: 16
>>> 2021-03-12 14:53:02.642 CET I (982492) wifi:mode : sta (30:ae:a4:5f:e7:ec) + softAP (30:ae:a4:5f:e7:ed)
>>> 2021-03-12 14:53:02.652 CET I (982502) wifi:Total power save buffer number: 8
>>> 2021-03-12 14:53:02.652 CET I (982502) cellular-modem-auto: Restart
>>> 2021-03-12 14:53:02.662 CET I (982512) cellular: State: Enter PowerOffOn state
>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: Shutting down (hard)...
>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: StatusCallBack: User Interrupt
>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: PPP connection has been closed
>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: PPP is shutdown
>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: Shutting down (hard)...
>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: StatusCallBack: User Interrupt
>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: PPP connection has been closed
>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: PPP is shutdown
>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-nmea: Shutdown (direct)
>>> 2021-03-12 14:53:02.672 CET I (982522) cellular-modem-auto: Power Cycle
>>> 2021-03-12 14:53:04.682 CET D (984532) events: Signal(system.wifi.down)
>>> 2021-03-12 14:53:04.682 CET I (984532) netmanager: WIFI client stop
>>> 2021-03-12 14:53:04.682 CET E (984532) netmanager: Inconsistent state: no interface of type 'pp' found
>>> 2021-03-12 14:53:04.682 CET I (984532) netmanager: WIFI client down (with MODEM up): reconfigured for MODEM priority
>>> 2021-03-12 14:53:04.692 CET D (984542) events: Signal(system.event)
>>> 2021-03-12 14:53:04.692 CET D (984542) events: Signal(system.wifi.sta.disconnected)
>>> 2021-03-12 14:53:04.692 CET E (984542) netmanager: Inconsistent state: no interface of type 'pp' found
>>> 2021-03-12 14:53:04.692 CET I (984542) esp32wifi: STA disconnected with reason 8 = ASSOC_LEAVE
>>> 2021-03-12 14:53:04.702 CET D (984552) events: Signal(system.event)
>>> 2021-03-12 14:53:04.702 CET D (984552) events: Signal(system.wifi.sta.stop)
>>> 2021-03-12 14:53:04.702 CET E (984552) netmanager: Inconsistent state: no interface of type 'pp' found
>>> 2021-03-12 14:53:04.712 CET D (984562) events: Signal(system.event)
>>> 2021-03-12 14:53:04.712 CET D (984562) events: Signal(system.wifi.ap.stop)
>>> 2021-03-12 14:53:04.712 CET E (984562) netmanager: Inconsistent state: no interface of type 'pp' found
>>> 2021-03-12 14:53:04.712 CET I (984562) netmanager: WIFI access point is down
>>> 2021-03-12 14:53:04.712 CET I (984562) esp32wifi: AP stopped
>>> 2021-03-12 14:53:04.722 CET D (984572) events: Signal(network.wifi.sta.bad)
>>> 2021-03-12 14:53:04.722 CET D (984572) events: Signal(system.event)
>>> 2021-03-12 14:53:04.722 CET D (984572) events: Signal(system.wifi.sta.start)
>>> 2021-03-12 14:53:04.732 CET D (984582) events: Signal(system.modem.down)
>>> 2021-03-12 14:53:04.742 CET I (984592) netmanager: MODEM down (with WIFI client down): network connectivity has been lost
>>> 2021-03-12 14:53:04.742 CET D (984592) events: Signal(system.modem.down)
>>> 2021-03-12 14:53:04.752 CET D (984602) events: Signal(system.event)
>>> 2021-03-12 14:53:04.752 CET D (984602) events: Signal(system.wifi.ap.start)
>>> 2021-03-12 14:53:04.752 CET I (984602) netmanager: WIFI access point is up
>>> 2021-03-12 14:53:26.802 CET E (1006652) events: SignalEvent: queue overflow (running system.wifi.ap.start->netmanager for 23 sec), event 'ticker.1' dropped
>>> 2021-03-12 14:53:27.802 CET E (1007652) events: SignalEvent: queue overflow (running system.wifi.ap.start->netmanager for 24 sec), event 'ticker.1' dropped
>>> 2021-03-12 14:53:28.802 CET E (1008652) events: SignalEvent: queue overflow (running system.wifi.ap.start->netmanager for 25 sec), event 'ticker.1' dropped
>>> …and so on until
>>> 2021-03-12 14:54:01.802 CET E (1041652) events: SignalEvent: lost important event => aborting
>>>
>>>
>>> I need my car now, so will switch back to master for now.
>>>
>>> Mark, if you've got specific debug logs I should fetch on the next try, tell me.
>>>
>>> Regards,
>>> Michael
>>>
>>>
>>> Am 12.03.21 um 05:47 schrieb Craig Leres:
>>>> I just updated to 3.2.016-68-g8e10c6b7 and still get the network hang immediately after booting and logging into the web gui.
>>>>
>>>> But I see now my problem is likely that I'm not using the right esp-idf (duh). Is there a way I can have master build using ~/esp/esp-idf and have for-v3.3 use a different path?)
>>>>
>>>> Craig
>>>> _______________________________________________
>>>> OvmsDev mailing list
>>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev <http://lists.openvehicles.com/mailman/listinfo/ovmsdev>
>>>
>>> --
>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>
>>>
>>>
>>>
>>
>>
>>
>> _______________________________________________
>> OvmsDev mailing list
>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev <http://lists.openvehicles.com/mailman/listinfo/ovmsdev>
>
> --
> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20210322/1e502137/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 195 bytes
Desc: Message signed with OpenPGP
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20210322/1e502137/attachment-0001.sig>
More information about the OvmsDev
mailing list