On 22 Mar 2021, at 6:06 PM, Michael Balzer <dexter@expeedo.de> wrote:Signed PGP partIn master, running commands via ssh or server-v2 block, because these are running synchronously in the mongoose context.
Running commands via web doesn't block, as the webcommand class starts a separate task for each execution.
The firmware config page does a synchronous call to MyOTA.GetStatus(), so that call is executed in the mongoose context. It still works in master, just needs a second or two to fetch the version file.
Regards,
Michael
Am 22.03.21 um 10:38 schrieb Mark Webb-Johnson:
In master branch, at the moment, if a command is run from the web shell (or server v2), surely the mongoose task will block as the web server / server v2 blocks waiting for the command to run to completion?
Doesn’t necessarily need to be a networking command. Something long running like the string speed tests.
In v3.3 I can easily detect the task wait being requested in the http library (by seeing if current task id == mongoose task), and fail (which I should do anyway). But I am more concerned with the general case now (which I think may be wrong in both master and for-v3.3).
Regards, Mark
On 22 Mar 2021, at 5:22 PM, Michael Balzer <dexter@expeedo.de> wrote:
I think we must avoid blocking the Mongoose task, as that's the central network dispatcher.
Chris had implemented a workaround in one of his PRs that could allow that to be done temporarily by running a local Mongoose main loop during a synchronous operation, but I still see potential issues from that, as it wasn't the standard handling as done by the task, and as it may need to recurse.
Maybe the old OvmsHttpClient using socket I/O is the right way for synchronous network operations?
Regards,
Michael
Am 22.03.21 um 07:15 schrieb Mark Webb-Johnson:
Not sure how to resolve this.
OvmsSyncHttpClient is currently used in commands from ovms_plugins and ovms_ota.
I could bring back the OvmsHttpClient blocking (non-mongoose) implementation, but I don’t think that would address the core problem here:
Inside a mongoose callback (inside the mongoose networking task), we are making blocking calls (and in particular calls that could block for several tens of seconds).
But fundamentally is it ok to block the mongoose networking task for extended periods during a mongoose event callback?
Mark
On 21 Mar 2021, at 9:57 PM, Michael Balzer <dexter@expeedo.de> wrote:
Signed PGP partI've found opening the web UI firmware page or calling "ota status" via ssh to consistently deadlock the network on my module.
I (130531) webserver: HTTP GET /cfg/firmware
D (130531) http: OvmsSyncHttpClient: Connect to ovms.dexters-web.de:80
D (130541) http: OvmsSyncHttpClient: waiting for completion
After that log message, the network is dead, and the netmanager also doesn't respond:
OVMS# network list
ERROR: job failed
D (183241) netmanager: send cmd 1 from 0x3ffe7054
W (193241) netmanager: ExecuteJob: cmd 1: timeout
The interfaces seem to be registered and online, but nothing gets in or out:
OVMS# network status
Interface#3: pp3 (ifup=1 linkup=1)
IPv4: 10.170.195.13/255.255.255.255 gateway 10.64.64.64
Interface#2: ap2 (ifup=1 linkup=1)
IPv4: 192.168.4.1/255.255.255.0 gateway 192.168.4.1
Interface#1: st1 (ifup=1 linkup=1)
IPv4: 192.168.2.106/255.255.255.0 gateway 192.168.2.1
DNS: 192.168.2.1
Default Interface: st1 (192.168.2.106/255.255.255.0 gateway 192.168.2.1)
A couple of minutes later, server-v2 recognizes the stale connection and issues a network restart, which fails resulting in the same behaviour as shown below with finally forced reboot by loss of an important event.
Doing "ota status" from USB works normally, so this looks like OvmsSyncHttpClient not being able to run from within a mongoose client.
Regards,
Michael
Am 18.03.21 um 08:14 schrieb Mark Webb-Johnson:
Tried to repeat this, but not having much success. Here is my car module, with network still up:
OVMS# boot statusLast boot was 262355 second(s) ago
I did manage to catch one network related crash after repeatedly disconnecting and reconnecting the cellular antenna. This was:
I (3717989) cellular: PPP Connection disconnectedGuru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.
0x400fe082 is in OvmsNetManager::PrioritiseAndIndicate() (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_netmanager.cpp:707).707 if ((pri->name[0]==search[0])&&
0x400ed360 is in OvmsMetricString::SetValue(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_metrics.cpp:1358).1357 void OvmsMetricString::SetValue(std::string value)
0x4008bdad is at ../../../../.././newlib/libc/machine/xtensa/strcmp.S:586.
0x4008bdd1 is at ../../../../.././newlib/libc/machine/xtensa/strcmp.S:604.
0x400fe886 is in OvmsNetManager::ModemDown(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*) (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_netmanager.cpp:522).522 PrioritiseAndIndicate();
0x400fd752 is in std::_Function_handler<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*), std::_Bind<std::_Mem_fn<void (OvmsNetManager::*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*)> (OvmsNetManager*, std::_Placeholder<1>, std::_Placeholder<2>)> >::_M_invoke(std::_Any_data const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, void*&&) (/home/openvehicles/build/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/functional:600).600 { return (__object->*_M_pmf)(std::forward<_Args>(__args)...); }
0x400f512e is in std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*)>::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*) const (/home/openvehicles/build/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/functional:2271).2271 return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
0x400f52f1 is in OvmsEvents::HandleQueueSignalEvent(event_queue_t*) (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:283).283 m_current_callback->m_callback(m_current_event, msg->body.signal.data);
0x400f53d8 is in OvmsEvents::EventTask() (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:237).237 HandleQueueSignalEvent(&msg);
0x400f545d is in EventLaunchTask(void*) (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:80).80 me->EventTask();
My for_v3.3 branch does include the preliminary changes to support the wifi at 20MHz bandwidth, and perhaps those could be affecting things. I do notice that if I ‘power wifi off’, then ‘wifi mode client’, it can connect to the station, but not get an IP address. I’ve just tried to merge in the latest fixes to that, and rebuilt a release. I will continue to test with that.
Regards, Mark.
On 12 Mar 2021, at 10:32 PM, Michael Balzer <dexter@expeedo.de> wrote:
Signed PGP partI just tried switching to for-v3.3 in my car module after tests on my desk module were OK, and I've run into the very same problem with for-v3.3. So the issue isn't related to esp-idf.
The network only occasionally starts normally, but even then all connectivity is lost after a couple of minutes.
The stale connection watchdog in server-v2 triggers a network restart, but that also doesn't seem to succeed:
2021-03-12 14:53:01.802 CET W (981652) ovms-server-v2: Detected stale connection (issue #241), restarting network
2021-03-12 14:53:01.802 CET I (981652) esp32wifi: Restart
2021-03-12 14:53:01.802 CET I (981652) esp32wifi: Stopping WIFI station
2021-03-12 14:53:01.812 CET I (981662) wifi:state: run -> init (0)
2021-03-12 14:53:01.812 CET I (981662) wifi:pm stop, total sleep time: 831205045 us / 975329961 us
2021-03-12 14:53:01.812 CET I (981662) wifi:new:<1,0>, old:<1,1>, ap:<1,1>, sta:<1,1>, prof:1
2021-03-12 14:53:01.832 CET I (981682) wifi:flush txq
2021-03-12 14:53:01.842 CET I (981692) wifi:stop sw txq
2021-03-12 14:53:01.842 CET I (981692) wifi:lmac stop hw txq
2021-03-12 14:53:01.852 CET I (981702) esp32wifi: Powering down WIFI driver
2021-03-12 14:53:01.852 CET I (981702) wifi:Deinit lldesc rx mblock:16
2021-03-12 14:53:01.862 CET I (981712) esp32wifi: Powering up WIFI driver
2021-03-12 14:53:01.862 CET I (981712) wifi:nvs_log_init, erase log key successfully, reinit nvs log
2021-03-12 14:53:01.882 CET I (981732) wifi:wifi driver task: 3ffd4d84, prio:23, stack:3584, core=0
2021-03-12 14:53:01.882 CET I (981732) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
2021-03-12 14:53:01.882 CET I (981732) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
2021-03-12 14:53:01.902 CET I (981752) wifi:wifi firmware version: 30f9e79
2021-03-12 14:53:01.912 CET I (981762) wifi:config NVS flash: enabled
2021-03-12 14:53:01.912 CET I (981762) wifi:config nano formating: disabled
2021-03-12 14:53:01.912 CET I (981762) wifi:Init data frame dynamic rx buffer num: 16
2021-03-12 14:53:01.912 CET I (981762) wifi:Init management frame dynamic rx buffer num: 16
2021-03-12 14:53:01.922 CET I (981772) wifi:Init management short buffer num: 32
2021-03-12 14:53:01.922 CET I (981772) wifi:Init dynamic tx buffer num: 16
2021-03-12 14:53:01.922 CET I (981772) wifi:Init static rx buffer size: 2212
2021-03-12 14:53:01.922 CET I (981772) wifi:Init static rx buffer num: 16
2021-03-12 14:53:01.922 CET I (981772) wifi:Init dynamic rx buffer num: 16
2021-03-12 14:53:02.642 CET I (982492) wifi:mode : sta (30:ae:a4:5f:e7:ec) + softAP (30:ae:a4:5f:e7:ed)
2021-03-12 14:53:02.652 CET I (982502) wifi:Total power save buffer number: 8
2021-03-12 14:53:02.652 CET I (982502) cellular-modem-auto: Restart
2021-03-12 14:53:02.662 CET I (982512) cellular: State: Enter PowerOffOn state
2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: Shutting down (hard)...
2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: StatusCallBack: User Interrupt
2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: PPP connection has been closed
2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: PPP is shutdown
2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: Shutting down (hard)...
2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: StatusCallBack: User Interrupt
2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: PPP connection has been closed
2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: PPP is shutdown
2021-03-12 14:53:02.672 CET I (982522) gsm-nmea: Shutdown (direct)
2021-03-12 14:53:02.672 CET I (982522) cellular-modem-auto: Power Cycle
2021-03-12 14:53:04.682 CET D (984532) events: Signal(system.wifi.down)
2021-03-12 14:53:04.682 CET I (984532) netmanager: WIFI client stop
2021-03-12 14:53:04.682 CET E (984532) netmanager: Inconsistent state: no interface of type 'pp' found
2021-03-12 14:53:04.682 CET I (984532) netmanager: WIFI client down (with MODEM up): reconfigured for MODEM priority
2021-03-12 14:53:04.692 CET D (984542) events: Signal(system.event)
2021-03-12 14:53:04.692 CET D (984542) events: Signal(system.wifi.sta.disconnected)
2021-03-12 14:53:04.692 CET E (984542) netmanager: Inconsistent state: no interface of type 'pp' found
2021-03-12 14:53:04.692 CET I (984542) esp32wifi: STA disconnected with reason 8 = ASSOC_LEAVE
2021-03-12 14:53:04.702 CET D (984552) events: Signal(system.event)
2021-03-12 14:53:04.702 CET D (984552) events: Signal(system.wifi.sta.stop)
2021-03-12 14:53:04.702 CET E (984552) netmanager: Inconsistent state: no interface of type 'pp' found
2021-03-12 14:53:04.712 CET D (984562) events: Signal(system.event)
2021-03-12 14:53:04.712 CET D (984562) events: Signal(system.wifi.ap.stop)
2021-03-12 14:53:04.712 CET E (984562) netmanager: Inconsistent state: no interface of type 'pp' found
2021-03-12 14:53:04.712 CET I (984562) netmanager: WIFI access point is down
2021-03-12 14:53:04.712 CET I (984562) esp32wifi: AP stopped
2021-03-12 14:53:04.722 CET D (984572) events: Signal(network.wifi.sta.bad)
2021-03-12 14:53:04.722 CET D (984572) events: Signal(system.event)
2021-03-12 14:53:04.722 CET D (984572) events: Signal(system.wifi.sta.start)
2021-03-12 14:53:04.732 CET D (984582) events: Signal(system.modem.down)
2021-03-12 14:53:04.742 CET I (984592) netmanager: MODEM down (with WIFI client down): network connectivity has been lost
2021-03-12 14:53:04.742 CET D (984592) events: Signal(system.modem.down)
2021-03-12 14:53:04.752 CET D (984602) events: Signal(system.event)
2021-03-12 14:53:04.752 CET D (984602) events: Signal(system.wifi.ap.start)
2021-03-12 14:53:04.752 CET I (984602) netmanager: WIFI access point is up
2021-03-12 14:53:26.802 CET E (1006652) events: SignalEvent: queue overflow (running system.wifi.ap.start->netmanager for 23 sec), event 'ticker.1' dropped
2021-03-12 14:53:27.802 CET E (1007652) events: SignalEvent: queue overflow (running system.wifi.ap.start->netmanager for 24 sec), event 'ticker.1' dropped
2021-03-12 14:53:28.802 CET E (1008652) events: SignalEvent: queue overflow (running system.wifi.ap.start->netmanager for 25 sec), event 'ticker.1' dropped
…and so on until
2021-03-12 14:54:01.802 CET E (1041652) events: SignalEvent: lost important event => aborting
I need my car now, so will switch back to master for now.
Mark, if you've got specific debug logs I should fetch on the next try, tell me.
Regards,
Michael
Am 12.03.21 um 05:47 schrieb Craig Leres:
I just updated to 3.2.016-68-g8e10c6b7 and still get the network hang immediately after booting and logging into the web gui.
But I see now my problem is likely that I'm not using the right esp-idf (duh). Is there a way I can have master build using ~/esp/esp-idf and have for-v3.3 use a different path?)
Craig
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
--
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
_______________________________________________ OvmsDev mailing list OvmsDev@lists.openvehicles.com http://lists.openvehicles.com/mailman/listinfo/ovmsdev
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
_______________________________________________ OvmsDev mailing list OvmsDev@lists.openvehicles.com http://lists.openvehicles.com/mailman/listinfo/ovmsdev
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
_______________________________________________ OvmsDev mailing list OvmsDev@lists.openvehicles.com http://lists.openvehicles.com/mailman/listinfo/ovmsdev
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26