[Ovmsdev] Branch for-v3.3 network issues

Michael Balzer dexter at expeedo.de
Fri Sep 17 15:24:11 HKT 2021


Mark,

I've been running the new for-v3.3 version this week on both of my 
modules without ppp issues.

Duktape still occasionally runs into the null/undefined issue with for…in:

https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/474#issuecomment-744005044

for…in normally doesn't throw an error even if you run over null or 
undefined.

I think both could still be the SPIRAM bug, now probably only occurring 
with very specific conditions. We build with LWIP using SPIRAM as well, 
so the PPP instance is allocated from SPIRAM also. Reallocating the 
instance on each new connect implies a higher chance of triggering the 
problem if it's address specific. The Duktape stack object addresses 
vary continuously with the running event handlers and user interactions, 
so that also has a high chance of occasionally triggering an address 
specific bug.

We need to test the revision 3 ESP32 on this.

Regards,
Michael


Am 09.09.21 um 02:31 schrieb Mark Webb-Johnson:
> Reviving this old topic still impacting us…
>
> Comparing the master vs for-v3.3 branches, the only thing that changed 
> related to ppp is that in 3.3 the ppp object is dynamically created 
> and destroyed, while in master it is statically allocated at boot. The 
> ppp code is so simple, and I can’t see how that could be a problem. 
> Perhaps related to position in memory, and some other memory corruption?
>
> Anyway, I changed it to not destroy the ppp object when the gsm 
> connection is lost, but merely shutdown the ppp (which is what master 
> branch does). While it is still dynamically allocated, it is no longer 
> as dynamic (being created just once at startup of the cellular 
> system). I’ve never managed to reliably repeat this problem in my 
> environment, but I think this should help.It has been running on my 
> desktop test unit for the past four days without issue.
>
> That code is committed now. I would appreciate it if others who saw 
> this problem could try again with this latest build of the for-v3.3 
> branch.
>
> Regards, Mark.
>
>> On 24 Mar 2021, at 3:53 PM, Mark Webb-Johnson <mark at webb-johnson.net 
>> <mailto:mark at webb-johnson.net>> wrote:
>>
>> Signed PGP part
>> Good grief, this is not so easy. Now we have:
>>
>>
>>     Guru Meditation Error: Core  1 panic'ed (LoadProhibited).
>>     Exception was unhandled.
>>
>>
>>     Core 1 register dump:
>>
>>         PC  : 0x40008044  PS      : 0x00060f30  A0    : 0x800fe2cc
>>          A1      : 0x3ffcaa90
>>
>>         A2  : 0x3f413acc  A3      : 0x00000046  A4    : 0x00e6807e
>>          A5      : 0x00000000
>>
>>         A6  : 0x00000000  A7      : 0x00000000  A8    : 0x00000010
>>          A9      : 0x00e6807e
>>
>>         A10 : 0x00000078  A11     : 0x00000009  A12   : 0x3ffcaa3f
>>          A13     : 0x00000032
>>
>>         A14 : 0x00000000  A15     : 0x3ffcaa48  SAR   : 0x00000004
>>          EXCCAUSE: 0x0000001c
>>
>>         EXCVADDR: 0x00e6807e  LBEG    : 0x4008bdad  LEND  :
>>         0x4008bdd1  LCOUNT  : 0x800f93f4
>>
>>
>>     ELF file SHA256: 74bb0a75eeb4578b
>>
>>     Backtrace: 0x40008044:0x3ffcaa90 0x400fe2c9:0x3ffcab20
>>     0x400fe412:0x3ffcabb0 0x402937b5:0x3ffcabd0
>>
>>
>>     0x400fe2c9 is in OvmsNetManager::DoSafePrioritiseAndIndicate()
>>     (/Users/hq.mark.johnson/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/ovms_netmanager.cpp:723).
>>     718   }
>>     719
>>     720 for (struct netif *pri = netif_list; pri != NULL; pri=pri->next)
>>     721   {
>>     722 ESP_EARLY_LOGI(TAG,"DoSafePrioritiseAndIndicate: interface
>>     %p",pri);
>>     723 ESP_EARLY_LOGI(TAG,"DoSafePrioritiseAndIndicate:   name  
>>      %s",pri->name);
>>     724   if ((pri->name[0]==search[0])&&
>>     725       (pri->name[1]==search[1]))
>>     726     {
>>     727     if (search[0] != m_previous_name[0] || search[1] !=
>>     m_previous_name[1])
>>     0x400fe412 is in SafePrioritiseAndIndicate(void*)
>>     (/Users/hq.mark.johnson/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/ovms_netmanager.cpp:676).
>>     671   }
>>     672 }
>>     673
>>     674void SafePrioritiseAndIndicate(void* ctx)
>>     675 {
>>     676 MyNetManager.DoSafePrioritiseAndIndicate();
>>     677 }
>>     678
>>     679void OvmsNetManager::PrioritiseAndIndicate()
>>     680 {
>>     0x402937b5 is in tcpip_thread
>>     (/Users/hq.mark.johnson/esp/esp-idf/components/lwip/lwip/src/api/tcpip.c:158).
>>     153     break;
>>     154#endif /* LWIP_TCPIP_TIMEOUT && LWIP_TIMERS */
>>     155
>>     156   case TCPIP_MSG_CALLBACK:
>>     157     LWIP_DEBUGF(TCPIP_DEBUG, ("tcpip_thread: CALLBACK %p\n",
>>     (void *)msg));
>>     158 msg->msg.cb.function(msg->msg.cb.ctx);
>>     159     memp_free(MEMP_TCPIP_MSG_API, msg);
>>     160     break;
>>     161
>>     162   case TCPIP_MSG_CALLBACK_STATIC:
>>
>>
>> So the issue is most likely corruption of the network interface 
>> structure, not thread safe traversal.
>>
>> I had added some ESP_EARLY_LOGI statements, so can see a little more 
>> of what is going on:
>>
>>     I (103202) gsm-ppp: Initialising...
>>     I (103212) events: Signal(system.modem.netmode)
>>     I (105902) netmanager: DoSafePrioritiseAndIndicate: start
>>     I (105902) netmanager: DoSafePrioritiseAndIndicate: connected wifi
>>     I (105912) netmanager: DoSafePrioritiseAndIndicate: interface
>>     0x3ffed6a0
>>     I (105912) netmanager: DoSafePrioritiseAndIndicate:   name  pp
>>     I (105922) netmanager: DoSafePrioritiseAndIndicate: interface
>>     0x3ffde854
>>     I (105932) netmanager: DoSafePrioritiseAndIndicate:   name  ap
>>     I (105932) netmanager: DoSafePrioritiseAndIndicate: interface
>>     0x3ffde640
>>     I (105942) netmanager: DoSafePrioritiseAndIndicate:   name  st
>>     I (105952) netmanager: DoSafePrioritiseAndIndicate: end
>>     I (105902) gsm-ppp: StatusCallBack: None
>>     I (105902) gsm-ppp: status_cb: Connected
>>     I (105902) gsm-ppp:    our_ipaddr  = 10.52.40.80
>>>>     I (3708442) cellular: PPP Connection disconnected
>>     I (3708442) cellular: PPP Connection disconnected
>>     I (3709212) netmanager: DoSafePrioritiseAndIndicate: start
>>     I (3709212) netmanager: DoSafePrioritiseAndIndicate: connected wifi
>>     I (3709212) netmanager: DoSafePrioritiseAndIndicate: interface
>>     0x3ffed6a0
>>     I (3709222) netmanager: DoSafePrioritiseAndIndicate:   name  pp
>>     I (3709222) netmanager: DoSafePrioritiseAndIndicate: interface
>>     0x30323930
>>     I (3709232) netmanager: DoSafePrioritiseAndIndicate:   name  f
>>     I (3709242) netmanager: DoSafePrioritiseAndIndicate: interface
>>     0x667fc000
>>     I (3709252) netmanager: DoSafePrioritiseAndIndicate:   name
>>     Guru Meditation Error: Core  1 panic'ed (Interrupt wdt timeout on
>>     CPU1)
>>
>>
>> Doesn’t help much, apart from confirm the corruption. Took about an 
>> hour to recreate the problem.
>>
>> I’ll keep looking.
>>
>> Regards, Mark.
>>
>>> On 23 Mar 2021, at 4:05 PM, Mark Webb-Johnson <mark at webb-johnson.net 
>>> <mailto:mark at webb-johnson.net>> wrote:
>>>
>>> Signed PGP part
>>> My attempt didn’t work (still crashes), so I’m now trying your 
>>> suggestion of wrapping PrioritiseAndIndicate() in 
>>> a tcpip_callback_with_block.
>>>
>>>  🤞🏻
>>>
>>> Regards, Mark.
>>>
>>>> On 23 Mar 2021, at 3:02 PM, Michael Balzer <dexter at expeedo.de 
>>>> <mailto:dexter at expeedo.de>> wrote:
>>>>
>>>> Signed PGP part
>>>> Mark,
>>>>
>>>> regarding point 2: I've had the same issue with jobs that need to 
>>>> iterate over the mongoose connection list, I introduced the 
>>>> netmanager job queue for this to delegate these to the mongoose 
>>>> context.
>>>>
>>>> I remember seeing LwIP has a similar API while browsing the source… 
>>>> yes, found it: the "tcpip_callback…" functions, e.g.:
>>>>
>>>> /**
>>>>  * Call a specific function in the thread context of
>>>>  * tcpip_thread for easy access synchronization.
>>>>  * A function called in that way may access lwIP core code
>>>>  * without fearing concurrent access.
>>>>  *
>>>>  * @param function the function to call
>>>>  * @param ctx parameter passed to f
>>>>  * @param block 1 to block until the request is posted, 0 to 
>>>> non-blocking mode
>>>>  * @return ERR_OK if the function was called, another err_t if not
>>>>  */
>>>> err_t
>>>> tcpip_callback_with_block(tcpip_callback_fn function, void *ctx, 
>>>> u8_t block)
>>>>
>>>> So we probably can use this to execute PrioritiseAndIndicate() in 
>>>> the LwIP context.
>>>>
>>>> Regards,
>>>> Michael
>>>>
>>>>
>>>> Am 23.03.21 um 06:47 schrieb Mark Webb-Johnson:
>>>>> OK, some progress…
>>>>>
>>>>>  1. I’ve added a check in the OvmsSyncHttpClient code to refuse to
>>>>>     block while running as the netman (mongoose) task. This will
>>>>>     now simply fail the http connection, and log an error. Not
>>>>>     perfect, and not a solution to the core problem, but at least
>>>>>     it avoids a known problem.
>>>>>
>>>>>     I’m not sure of the best permanent solution to this. It seems
>>>>>     that we need a callback interface to run commands
>>>>>     asynchronously, and use that in mongoose event handlers.
>>>>>     Adding another mongoose event loop, or using a separate
>>>>>     networking socket with select(), just minimise the problem -
>>>>>     they don’t solve it. The core issue here is blocking during a
>>>>>     mongoose event delivery. That is going to pause all high level
>>>>>     networking.
>>>>>
>>>>>  2. I found a race condition in ovms_netmanager that seems nasty.
>>>>>     The new cellular code could raise duplicate modem.down
>>>>>     signals, picked up and handled in ovms_netmanager. As part of
>>>>>     that it calls a PrioritiseAndIndicate() function that iterates
>>>>>     over the network interface list (maintained by LWIP). If that
>>>>>     network interface list is modified (eg; removing an interface)
>>>>>     while it is being traversed, nasty crashes can happen. The
>>>>>     ‘fix’ I’ve done is again just a workaround to try to reduce
>>>>>     the duplicate signals and hence reduce the likelyhood of the
>>>>>     problem happening, but it won’t fix the core problem (that is
>>>>>     in both master and for-v3.3).
>>>>>
>>>>>     There is a netif_find function in LWIP, but (a) that requires
>>>>>     an interface number that we don’t have, and (b) doesn’t seem
>>>>>     to lock the list either.
>>>>>
>>>>>     Can’t think of an elegant solution to this, other than
>>>>>     modifications to lwip. We could add our own mutex and use that
>>>>>     whenever we talk to lwip, but even that would miss out on some
>>>>>     modifications to the network interface list, I guess.
>>>>>
>>>>>
>>>>> These two changes are in ‘pre’ now, and I am trying them in my car.
>>>>>
>>>>> Regards, Mark.
>>>>>
>>>>>> On 22 Mar 2021, at 6:06 PM, Michael Balzer <dexter at expeedo.de 
>>>>>> <mailto:dexter at expeedo.de>> wrote:
>>>>>>
>>>>>> Signed PGP part
>>>>>> In master, running commands via ssh or server-v2 block, because 
>>>>>> these are running synchronously in the mongoose context.
>>>>>>
>>>>>> Running commands via web doesn't block, as the webcommand class 
>>>>>> starts a separate task for each execution.
>>>>>>
>>>>>> The firmware config page does a synchronous call to 
>>>>>> MyOTA.GetStatus(), so that call is executed in the mongoose 
>>>>>> context. It still works in master, just needs a second or two to 
>>>>>> fetch the version file.
>>>>>>
>>>>>> Regards,
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> Am 22.03.21 um 10:38 schrieb Mark Webb-Johnson:
>>>>>>> In master branch, at the moment, if a command is run from the 
>>>>>>> web shell (or server v2), surely the mongoose task will block as 
>>>>>>> the web server / server v2 blocks waiting for the command to run 
>>>>>>> to completion?
>>>>>>>
>>>>>>> Doesn’t necessarily need to be a networking command. Something 
>>>>>>> long running like the string speed tests.
>>>>>>>
>>>>>>> In v3.3 I can easily detect the task wait being requested in the 
>>>>>>> http library (by seeing if current task id == mongoose task), 
>>>>>>> and fail (which I should do anyway). But I am more concerned 
>>>>>>> with the general case now (which I think may be wrong in both 
>>>>>>> master and for-v3.3).
>>>>>>>
>>>>>>> Regards, Mark
>>>>>>>
>>>>>>>> On 22 Mar 2021, at 5:22 PM, Michael Balzer <dexter at expeedo.de> 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>  I think we must avoid blocking the Mongoose task, as that's 
>>>>>>>> the central network dispatcher.
>>>>>>>>
>>>>>>>> Chris had implemented a workaround in one of his PRs that could 
>>>>>>>> allow that to be done temporarily by running a local Mongoose 
>>>>>>>> main loop during a synchronous operation, but I still see 
>>>>>>>> potential issues from that, as it wasn't the standard handling 
>>>>>>>> as done by the task, and as it may need to recurse.
>>>>>>>>
>>>>>>>> Maybe the old OvmsHttpClient using socket I/O is the right way 
>>>>>>>> for synchronous network operations?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Michael
>>>>>>>>
>>>>>>>>
>>>>>>>> Am 22.03.21 um 07:15 schrieb Mark Webb-Johnson:
>>>>>>>>> Not sure how to resolve this.
>>>>>>>>>
>>>>>>>>> OvmsSyncHttpClient is currently used in commands from 
>>>>>>>>> ovms_plugins and ovms_ota.
>>>>>>>>>
>>>>>>>>> I could bring back the OvmsHttpClient blocking (non-mongoose) 
>>>>>>>>> implementation, but I don’t think that would address the core 
>>>>>>>>> problem here:
>>>>>>>>>
>>>>>>>>>     Inside a mongoose callback (inside the mongoose networking
>>>>>>>>>     task), we are making blocking calls (and in particular
>>>>>>>>>     calls that could block for several tens of seconds).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> But fundamentally is it ok to block the mongoose networking 
>>>>>>>>> task for extended periods during a mongoose event callback?
>>>>>>>>>
>>>>>>>>> Mark
>>>>>>>>>
>>>>>>>>>> On 21 Mar 2021, at 9:57 PM, Michael Balzer <dexter at expeedo.de 
>>>>>>>>>> <mailto:dexter at expeedo.de>> wrote:
>>>>>>>>>>
>>>>>>>>>> Signed PGP part
>>>>>>>>>> I've found opening the web UI firmware page or calling "ota 
>>>>>>>>>> status" via ssh to consistently deadlock the network on my 
>>>>>>>>>> module.
>>>>>>>>>>
>>>>>>>>>> I (130531) webserver: HTTP GET /cfg/firmware
>>>>>>>>>> D (130531) http: OvmsSyncHttpClient: Connect to 
>>>>>>>>>> ovms.dexters-web.de:80 <http://ovms.dexters-web.de/>
>>>>>>>>>> D (130541) http: OvmsSyncHttpClient: waiting for completion
>>>>>>>>>>
>>>>>>>>>> After that log message, the network is dead, and the 
>>>>>>>>>> netmanager also doesn't respond:
>>>>>>>>>>
>>>>>>>>>> OVMS# network list
>>>>>>>>>> ERROR: job failed
>>>>>>>>>> D (183241) netmanager: send cmd 1 from 0x3ffe7054
>>>>>>>>>> W (193241) netmanager: ExecuteJob: cmd 1: timeout
>>>>>>>>>>
>>>>>>>>>> The interfaces seem to be registered and online, but nothing 
>>>>>>>>>> gets in or out:
>>>>>>>>>>
>>>>>>>>>> OVMS# network status
>>>>>>>>>> Interface#3: pp3 (ifup=1 linkup=1)
>>>>>>>>>>   IPv4: 10.170.195.13/255.255.255.255 gateway 10.64.64.64
>>>>>>>>>>
>>>>>>>>>> Interface#2: ap2 (ifup=1 linkup=1)
>>>>>>>>>>   IPv4: 192.168.4.1/255.255.255.0 gateway 192.168.4.1
>>>>>>>>>>
>>>>>>>>>> Interface#1: st1 (ifup=1 linkup=1)
>>>>>>>>>>   IPv4: 192.168.2.106/255.255.255.0 gateway 192.168.2.1
>>>>>>>>>>
>>>>>>>>>> DNS: 192.168.2.1
>>>>>>>>>>
>>>>>>>>>> Default Interface: st1 (192.168.2.106/255.255.255.0 gateway 
>>>>>>>>>> 192.168.2.1)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> A couple of minutes later, server-v2 recognizes the stale 
>>>>>>>>>> connection and issues a network restart, which fails 
>>>>>>>>>> resulting in the same behaviour as shown below with finally 
>>>>>>>>>> forced reboot by loss of an important event.
>>>>>>>>>>
>>>>>>>>>> Doing "ota status" from USB works normally, so this looks 
>>>>>>>>>> like OvmsSyncHttpClient not being able to run from within a 
>>>>>>>>>> mongoose client.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Michael
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Am 18.03.21 um 08:14 schrieb Mark Webb-Johnson:
>>>>>>>>>>> Tried to repeat this, but not having much success. Here is 
>>>>>>>>>>> my car module, with network still up:
>>>>>>>>>>>
>>>>>>>>>>>     OVMS# boot status
>>>>>>>>>>>     Last boot was 262355 second(s) ago
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I did manage to catch one network related crash after 
>>>>>>>>>>> repeatedly disconnecting and reconnecting the cellular 
>>>>>>>>>>> antenna. This was:
>>>>>>>>>>>
>>>>>>>>>>>     I (3717989) cellular: PPP Connection disconnected
>>>>>>>>>>>     Guru Meditation Error: Core  1 panic'ed
>>>>>>>>>>>     (LoadProhibited). Exception was unhandled.
>>>>>>>>>>>
>>>>>>>>>>>     0x400fe082 is in OvmsNetManager::PrioritiseAndIndicate()
>>>>>>>>>>>     (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_netmanager.cpp:707).
>>>>>>>>>>>
>>>>>>>>>>>         707       if ((pri->name[0]==search[0])&&
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     0x400ed360 is in
>>>>>>>>>>>     OvmsMetricString::SetValue(std::__cxx11::basic_string<char,
>>>>>>>>>>>     std::char_traits<char>, std::allocator<char> >)
>>>>>>>>>>>     (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_metrics.cpp:1358).
>>>>>>>>>>>
>>>>>>>>>>>         1357    void OvmsMetricString::SetValue(std::string
>>>>>>>>>>>         value)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     0x4008bdad is at
>>>>>>>>>>>     ../../../../.././newlib/libc/machine/xtensa/strcmp.S:586.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     0x4008bdd1 is at
>>>>>>>>>>>     ../../../../.././newlib/libc/machine/xtensa/strcmp.S:604.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     0x400fe886 is in
>>>>>>>>>>>     OvmsNetManager::ModemDown(std::__cxx11::basic_string<char,
>>>>>>>>>>>     std::char_traits<char>, std::allocator<char> >, void*)
>>>>>>>>>>>     (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_netmanager.cpp:522).
>>>>>>>>>>>
>>>>>>>>>>>         522 PrioritiseAndIndicate();
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     0x400fd752 is in std::_Function_handler<void
>>>>>>>>>>>     (std::__cxx11::basic_string<char,
>>>>>>>>>>>     std::char_traits<char>, std::allocator<char> >, void*),
>>>>>>>>>>>     std::_Bind<std::_Mem_fn<void
>>>>>>>>>>>     (OvmsNetManager::*)(std::__cxx11::basic_string<char,
>>>>>>>>>>>     std::char_traits<char>, std::allocator<char> >, void*)>
>>>>>>>>>>>     (OvmsNetManager*, std::_Placeholder<1>,
>>>>>>>>>>>     std::_Placeholder<2>)> >::_M_invoke(std::_Any_data
>>>>>>>>>>>     const&, std::__cxx11::basic_string<char,
>>>>>>>>>>>     std::char_traits<char>, std::allocator<char> >&&,
>>>>>>>>>>>     void*&&)
>>>>>>>>>>>     (/home/openvehicles/build/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/functional:600).
>>>>>>>>>>>
>>>>>>>>>>>         600           { return
>>>>>>>>>>>         (__object->*_M_pmf)(std::forward<_Args>(__args)...); }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     0x400f512e is in std::function<void
>>>>>>>>>>>     (std::__cxx11::basic_string<char,
>>>>>>>>>>>     std::char_traits<char>, std::allocator<char> >,
>>>>>>>>>>>     void*)>::operator()(std::__cxx11::basic_string<char,
>>>>>>>>>>>     std::char_traits<char>, std::allocator<char> >, void*)
>>>>>>>>>>>     const
>>>>>>>>>>>     (/home/openvehicles/build/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/functional:2271).
>>>>>>>>>>>
>>>>>>>>>>>         2271  return _M_invoker(_M_functor,
>>>>>>>>>>>         std::forward<_ArgTypes>(__args)...);
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     0x400f52f1 is in
>>>>>>>>>>>     OvmsEvents::HandleQueueSignalEvent(event_queue_t*)
>>>>>>>>>>>     (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:283).
>>>>>>>>>>>
>>>>>>>>>>>         283 m_current_callback->m_callback(m_current_event,
>>>>>>>>>>>         msg->body.signal.data);
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     0x400f53d8 is in OvmsEvents::EventTask()
>>>>>>>>>>>     (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:237).
>>>>>>>>>>>
>>>>>>>>>>>         237 HandleQueueSignalEvent(&msg);
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     0x400f545d is in EventLaunchTask(void*)
>>>>>>>>>>>     (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:80).
>>>>>>>>>>>
>>>>>>>>>>>         80  me->EventTask();
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> My for_v3.3 branch does include the preliminary changes to 
>>>>>>>>>>> support the wifi at 20MHz bandwidth, and perhaps those could 
>>>>>>>>>>> be affecting things. I do notice that if I ‘power wifi off’, 
>>>>>>>>>>> then ‘wifi mode client’, it can connect to the station, but 
>>>>>>>>>>> not get an IP address. I’ve just tried to merge in the 
>>>>>>>>>>> latest fixes to that, and rebuilt a release. I will continue 
>>>>>>>>>>> to test with that.
>>>>>>>>>>>
>>>>>>>>>>> Regards, Mark.
>>>>>>>>>>>
>>>>>>>>>>>> On 12 Mar 2021, at 10:32 PM, Michael Balzer 
>>>>>>>>>>>> <dexter at expeedo.de <mailto:dexter at expeedo.de>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Signed PGP part
>>>>>>>>>>>> I just tried switching to for-v3.3 in my car module after 
>>>>>>>>>>>> tests on my desk module were OK, and I've run into the very 
>>>>>>>>>>>> same problem with for-v3.3. So the issue isn't related to 
>>>>>>>>>>>> esp-idf.
>>>>>>>>>>>>
>>>>>>>>>>>> The network only occasionally starts normally, but even 
>>>>>>>>>>>> then all connectivity is lost after a couple of minutes.
>>>>>>>>>>>>
>>>>>>>>>>>> The stale connection watchdog in server-v2 triggers a 
>>>>>>>>>>>> network restart, but that also doesn't seem to succeed:
>>>>>>>>>>>>
>>>>>>>>>>>> 2021-03-12 14:53:01.802 CET W (981652) ovms-server-v2: 
>>>>>>>>>>>> Detected stale connection (issue #241), restarting network
>>>>>>>>>>>> 2021-03-12 14:53:01.802 CET I (981652) esp32wifi: Restart
>>>>>>>>>>>> 2021-03-12 14:53:01.802 CET I (981652) esp32wifi: Stopping 
>>>>>>>>>>>> WIFI station
>>>>>>>>>>>> 2021-03-12 14:53:01.812 CET I (981662) wifi:state: run -> 
>>>>>>>>>>>> init (0)
>>>>>>>>>>>> 2021-03-12 14:53:01.812 CET I (981662) wifi:pm stop, total 
>>>>>>>>>>>> sleep time: 831205045 us / 975329961 us
>>>>>>>>>>>>
>>>>>>>>>>>> 2021-03-12 14:53:01.812 CET I (981662) wifi:new:<1,0>, 
>>>>>>>>>>>> old:<1,1>, ap:<1,1>, sta:<1,1>, prof:1
>>>>>>>>>>>> 2021-03-12 14:53:01.832 CET I (981682) wifi:flush txq
>>>>>>>>>>>> 2021-03-12 14:53:01.842 CET I (981692) wifi:stop sw txq
>>>>>>>>>>>> 2021-03-12 14:53:01.842 CET I (981692) wifi:lmac stop hw txq
>>>>>>>>>>>> 2021-03-12 14:53:01.852 CET I (981702) esp32wifi: Powering 
>>>>>>>>>>>> down WIFI driver
>>>>>>>>>>>> 2021-03-12 14:53:01.852 CET I (981702) wifi:Deinit lldesc 
>>>>>>>>>>>> rx mblock:16
>>>>>>>>>>>> 2021-03-12 14:53:01.862 CET I (981712) esp32wifi: Powering 
>>>>>>>>>>>> up WIFI driver
>>>>>>>>>>>> 2021-03-12 14:53:01.862 CET I (981712) wifi:nvs_log_init, 
>>>>>>>>>>>> erase log key successfully, reinit nvs log
>>>>>>>>>>>> 2021-03-12 14:53:01.882 CET I (981732) wifi:wifi driver 
>>>>>>>>>>>> task: 3ffd4d84, prio:23, stack:3584, core=0
>>>>>>>>>>>> 2021-03-12 14:53:01.882 CET I (981732) system_api: Base MAC 
>>>>>>>>>>>> address is not set, read default base MAC address from BLK0 
>>>>>>>>>>>> of EFUSE
>>>>>>>>>>>> 2021-03-12 14:53:01.882 CET I (981732) system_api: Base MAC 
>>>>>>>>>>>> address is not set, read default base MAC address from BLK0 
>>>>>>>>>>>> of EFUSE
>>>>>>>>>>>> 2021-03-12 14:53:01.902 CET I (981752) wifi:wifi firmware 
>>>>>>>>>>>> version: 30f9e79
>>>>>>>>>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:config NVS 
>>>>>>>>>>>> flash: enabled
>>>>>>>>>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:config nano 
>>>>>>>>>>>> formating: disabled
>>>>>>>>>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:Init data frame 
>>>>>>>>>>>> dynamic rx buffer num: 16
>>>>>>>>>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:Init management 
>>>>>>>>>>>> frame dynamic rx buffer num: 16
>>>>>>>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init management 
>>>>>>>>>>>> short buffer num: 32
>>>>>>>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init dynamic tx 
>>>>>>>>>>>> buffer num: 16
>>>>>>>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init static rx 
>>>>>>>>>>>> buffer size: 2212
>>>>>>>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init static rx 
>>>>>>>>>>>> buffer num: 16
>>>>>>>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init dynamic rx 
>>>>>>>>>>>> buffer num: 16
>>>>>>>>>>>> 2021-03-12 14:53:02.642 CET I (982492) wifi:mode : sta 
>>>>>>>>>>>> (30:ae:a4:5f:e7:ec) + softAP (30:ae:a4:5f:e7:ed)
>>>>>>>>>>>> 2021-03-12 14:53:02.652 CET I (982502) wifi:Total power 
>>>>>>>>>>>> save buffer number: 8
>>>>>>>>>>>> 2021-03-12 14:53:02.652 CET I (982502) cellular-modem-auto: 
>>>>>>>>>>>> Restart
>>>>>>>>>>>> 2021-03-12 14:53:02.662 CET I (982512) cellular: State: 
>>>>>>>>>>>> Enter PowerOffOn state
>>>>>>>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: Shutting 
>>>>>>>>>>>> down (hard)...
>>>>>>>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: 
>>>>>>>>>>>> StatusCallBack: User Interrupt
>>>>>>>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: PPP 
>>>>>>>>>>>> connection has been closed
>>>>>>>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: PPP is shutdown
>>>>>>>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: Shutting 
>>>>>>>>>>>> down (hard)...
>>>>>>>>>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: 
>>>>>>>>>>>> StatusCallBack: User Interrupt
>>>>>>>>>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: PPP 
>>>>>>>>>>>> connection has been closed
>>>>>>>>>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: PPP is shutdown
>>>>>>>>>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-nmea: Shutdown 
>>>>>>>>>>>> (direct)
>>>>>>>>>>>> 2021-03-12 14:53:02.672 CET I (982522) cellular-modem-auto: 
>>>>>>>>>>>> Power Cycle
>>>>>>>>>>>> 2021-03-12 14:53:04.682 CET D (984532) events: 
>>>>>>>>>>>> Signal(system.wifi.down)
>>>>>>>>>>>> 2021-03-12 14:53:04.682 CET I (984532) netmanager: WIFI 
>>>>>>>>>>>> client stop
>>>>>>>>>>>> 2021-03-12 14:53:04.682 CET E (984532) netmanager: 
>>>>>>>>>>>> Inconsistent state: no interface of type 'pp' found
>>>>>>>>>>>> 2021-03-12 14:53:04.682 CET I (984532) netmanager: WIFI 
>>>>>>>>>>>> client down (with MODEM up): reconfigured for MODEM priority
>>>>>>>>>>>> 2021-03-12 14:53:04.692 CET D (984542) events: 
>>>>>>>>>>>> Signal(system.event)
>>>>>>>>>>>> 2021-03-12 14:53:04.692 CET D (984542) events: 
>>>>>>>>>>>> Signal(system.wifi.sta.disconnected)
>>>>>>>>>>>> 2021-03-12 14:53:04.692 CET E (984542) netmanager: 
>>>>>>>>>>>> Inconsistent state: no interface of type 'pp' found
>>>>>>>>>>>> 2021-03-12 14:53:04.692 CET I (984542) esp32wifi: STA 
>>>>>>>>>>>> disconnected with reason 8 = ASSOC_LEAVE
>>>>>>>>>>>> 2021-03-12 14:53:04.702 CET D (984552) events: 
>>>>>>>>>>>> Signal(system.event)
>>>>>>>>>>>> 2021-03-12 14:53:04.702 CET D (984552) events: 
>>>>>>>>>>>> Signal(system.wifi.sta.stop)
>>>>>>>>>>>> 2021-03-12 14:53:04.702 CET E (984552) netmanager: 
>>>>>>>>>>>> Inconsistent state: no interface of type 'pp' found
>>>>>>>>>>>> 2021-03-12 14:53:04.712 CET D (984562) events: 
>>>>>>>>>>>> Signal(system.event)
>>>>>>>>>>>> 2021-03-12 14:53:04.712 CET D (984562) events: 
>>>>>>>>>>>> Signal(system.wifi.ap.stop)
>>>>>>>>>>>> 2021-03-12 14:53:04.712 CET E (984562) netmanager: 
>>>>>>>>>>>> Inconsistent state: no interface of type 'pp' found
>>>>>>>>>>>> 2021-03-12 14:53:04.712 CET I (984562) netmanager: WIFI 
>>>>>>>>>>>> access point is down
>>>>>>>>>>>> 2021-03-12 14:53:04.712 CET I (984562) esp32wifi: AP stopped
>>>>>>>>>>>> 2021-03-12 14:53:04.722 CET D (984572) events: 
>>>>>>>>>>>> Signal(network.wifi.sta.bad)
>>>>>>>>>>>> 2021-03-12 14:53:04.722 CET D (984572) events: 
>>>>>>>>>>>> Signal(system.event)
>>>>>>>>>>>> 2021-03-12 14:53:04.722 CET D (984572) events: 
>>>>>>>>>>>> Signal(system.wifi.sta.start)
>>>>>>>>>>>> 2021-03-12 14:53:04.732 CET D (984582) events: 
>>>>>>>>>>>> Signal(system.modem.down)
>>>>>>>>>>>> 2021-03-12 14:53:04.742 CET I (984592) netmanager: MODEM 
>>>>>>>>>>>> down (with WIFI client down): network connectivity has been 
>>>>>>>>>>>> lost
>>>>>>>>>>>> 2021-03-12 14:53:04.742 CET D (984592) events: 
>>>>>>>>>>>> Signal(system.modem.down)
>>>>>>>>>>>> 2021-03-12 14:53:04.752 CET D (984602) events: 
>>>>>>>>>>>> Signal(system.event)
>>>>>>>>>>>> 2021-03-12 14:53:04.752 CET D (984602) events: 
>>>>>>>>>>>> Signal(system.wifi.ap.start)
>>>>>>>>>>>> 2021-03-12 14:53:04.752 CET I (984602) netmanager: WIFI 
>>>>>>>>>>>> access point is up
>>>>>>>>>>>> 2021-03-12 14:53:26.802 CET E (1006652) events: 
>>>>>>>>>>>> SignalEvent: queue overflow (running 
>>>>>>>>>>>> system.wifi.ap.start->netmanager for 23 sec), event 
>>>>>>>>>>>> 'ticker.1' dropped
>>>>>>>>>>>> 2021-03-12 14:53:27.802 CET E (1007652) events: 
>>>>>>>>>>>> SignalEvent: queue overflow (running 
>>>>>>>>>>>> system.wifi.ap.start->netmanager for 24 sec), event 
>>>>>>>>>>>> 'ticker.1' dropped
>>>>>>>>>>>> 2021-03-12 14:53:28.802 CET E (1008652) events: 
>>>>>>>>>>>> SignalEvent: queue overflow (running 
>>>>>>>>>>>> system.wifi.ap.start->netmanager for 25 sec), event 
>>>>>>>>>>>> 'ticker.1' dropped
>>>>>>>>>>>> …and so on until
>>>>>>>>>>>> 2021-03-12 14:54:01.802 CET E (1041652) events: 
>>>>>>>>>>>> SignalEvent: lost important event => aborting
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I need my car now, so will switch back to master for now.
>>>>>>>>>>>>
>>>>>>>>>>>> Mark, if you've got specific debug logs I should fetch on 
>>>>>>>>>>>> the next try, tell me.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Michael
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Am 12.03.21 um 05:47 schrieb Craig Leres:
>>>>>>>>>>>>> I just updated to 3.2.016-68-g8e10c6b7 and still get the 
>>>>>>>>>>>>> network hang immediately after booting and logging into 
>>>>>>>>>>>>> the web gui.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But I see now my problem is likely that I'm not using the 
>>>>>>>>>>>>> right esp-idf (duh). Is there a way I can have master 
>>>>>>>>>>>>> build using ~/esp/esp-idf and have for-v3.3 use a 
>>>>>>>>>>>>> different path?)
>>>>>>>>>>>>>
>>>>>>>>>>>>>         Craig
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> OvmsDev mailing list
>>>>>>>>>>>>> OvmsDev at lists.openvehicles.com 
>>>>>>>>>>>>> <mailto:OvmsDev at lists.openvehicles.com>
>>>>>>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev 
>>>>>>>>>>>>> <http://lists.openvehicles.com/mailman/listinfo/ovmsdev>
>>>>>>>>>>>>
>>>>>>>>>>>> -- 
>>>>>>>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>>>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> OvmsDev mailing list
>>>>>>>>>>> OvmsDev at lists.openvehicles.com
>>>>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>>>>>
>>>>>>>>>> -- 
>>>>>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> OvmsDev mailing list
>>>>>>>>> OvmsDev at lists.openvehicles.com
>>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>>> _______________________________________________
>>>>>>>> OvmsDev mailing list
>>>>>>>> OvmsDev at lists.openvehicles.com
>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> OvmsDev mailing list
>>>>>>> OvmsDev at lists.openvehicles.com
>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>
>>>>>> -- 
>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> OvmsDev mailing list
>>>>> OvmsDev at lists.openvehicles.com
>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>
>>>> -- 
>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>
>>>
>>>
>>
>>
>
>
> _______________________________________________
> OvmsDev mailing list
> OvmsDev at lists.openvehicles.com
> http://lists.openvehicles.com/mailman/listinfo/ovmsdev

-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20210917/2cdcc03c/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 203 bytes
Desc: OpenPGP digital signature
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20210917/2cdcc03c/attachment-0001.sig>


More information about the OvmsDev mailing list