[Ovmsdev] Branch for-v3.3 network issues
Michael Balzer
dexter at expeedo.de
Fri Sep 17 16:56:16 HKT 2021
I just remembered…
https://github.com/espressif/esp-idf/issues/2892#issuecomment-459120605
…and as Duktape runs on core #1, I now test shifting the Duktape heap
into the upper 2 MB of SPIRAM.
The TypeError in PubSub occurred ~ 3 times per day in my car module, so
I should see a difference over the weekend.
Regards,
Michael
Am 17.09.21 um 10:02 schrieb Mark Webb-Johnson:
> Thanks for the feedback.
>
> I should have a final sample of v3.3 hardware with rev3 esp32 in my
> hands towards the end of this month (this is the same sample set that
> goes to be destroyed by the certification labs). Is there any way of
> triggering the bug earlier, for replication? Like a stress test or
> something? Or just have to wait.
>
> Regards, Mark.
>
>> On 17 Sep 2021, at 3:24 PM, Michael Balzer <dexter at expeedo.de
>> <mailto:dexter at expeedo.de>> wrote:
>>
>> Signed PGP part
>> Mark,
>>
>> I've been running the new for-v3.3 version this week on both of my
>> modules without ppp issues.
>>
>> Duktape still occasionally runs into the null/undefined issue with
>> for…in:
>>
>> https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/issues/474#issuecomment-744005044
>>
>> for…in normally doesn't throw an error even if you run over null or
>> undefined.
>>
>> I think both could still be the SPIRAM bug, now probably only
>> occurring with very specific conditions. We build with LWIP using
>> SPIRAM as well, so the PPP instance is allocated from SPIRAM also.
>> Reallocating the instance on each new connect implies a higher chance
>> of triggering the problem if it's address specific. The Duktape stack
>> object addresses vary continuously with the running event handlers
>> and user interactions, so that also has a high chance of occasionally
>> triggering an address specific bug.
>>
>> We need to test the revision 3 ESP32 on this.
>>
>> Regards,
>> Michael
>>
>>
>> Am 09.09.21 um 02:31 schrieb Mark Webb-Johnson:
>>> Reviving this old topic still impacting us…
>>>
>>> Comparing the master vs for-v3.3 branches, the only thing that
>>> changed related to ppp is that in 3.3 the ppp object is dynamically
>>> created and destroyed, while in master it is statically allocated at
>>> boot. The ppp code is so simple, and I can’t see how that could be a
>>> problem. Perhaps related to position in memory, and some other
>>> memory corruption?
>>>
>>> Anyway, I changed it to not destroy the ppp object when the gsm
>>> connection is lost, but merely shutdown the ppp (which is what
>>> master branch does). While it is still dynamically allocated, it is
>>> no longer as dynamic (being created just once at startup of the
>>> cellular system). I’ve never managed to reliably repeat this problem
>>> in my environment, but I think this should help.It has been running
>>> on my desktop test unit for the past four days without issue.
>>>
>>> That code is committed now. I would appreciate it if others who saw
>>> this problem could try again with this latest build of the for-v3.3
>>> branch.
>>>
>>> Regards, Mark.
>>>
>>>> On 24 Mar 2021, at 3:53 PM, Mark Webb-Johnson
>>>> <mark at webb-johnson.net <mailto:mark at webb-johnson.net>> wrote:
>>>>
>>>> Signed PGP part
>>>> Good grief, this is not so easy. Now we have:
>>>>
>>>>
>>>> Guru Meditation Error: Core 1 panic'ed (LoadProhibited).
>>>> Exception was unhandled.
>>>>
>>>>
>>>> Core 1 register dump:
>>>>
>>>> PC : 0x40008044 PS : 0x00060f30 A0 :
>>>> 0x800fe2cc A1 : 0x3ffcaa90
>>>>
>>>> A2 : 0x3f413acc A3 : 0x00000046 A4 :
>>>> 0x00e6807e A5 : 0x00000000
>>>>
>>>> A6 : 0x00000000 A7 : 0x00000000 A8 :
>>>> 0x00000010 A9 : 0x00e6807e
>>>>
>>>> A10 : 0x00000078 A11 : 0x00000009 A12 :
>>>> 0x3ffcaa3f A13 : 0x00000032
>>>>
>>>> A14 : 0x00000000 A15 : 0x3ffcaa48 SAR :
>>>> 0x00000004 EXCCAUSE: 0x0000001c
>>>>
>>>> EXCVADDR: 0x00e6807e LBEG : 0x4008bdad LEND :
>>>> 0x4008bdd1 LCOUNT : 0x800f93f4
>>>>
>>>>
>>>> ELF file SHA256: 74bb0a75eeb4578b
>>>>
>>>> Backtrace: 0x40008044:0x3ffcaa90 0x400fe2c9:0x3ffcab20
>>>> 0x400fe412:0x3ffcabb0 0x402937b5:0x3ffcabd0
>>>>
>>>>
>>>> 0x400fe2c9 is in OvmsNetManager::DoSafePrioritiseAndIndicate()
>>>> (/Users/hq.mark.johnson/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/ovms_netmanager.cpp:723).
>>>> 718 }
>>>> 719
>>>> 720 for (struct netif *pri = netif_list; pri != NULL;
>>>> pri=pri->next)
>>>> 721 {
>>>> 722 ESP_EARLY_LOGI(TAG,"DoSafePrioritiseAndIndicate: interface
>>>> %p",pri);
>>>> 723 ESP_EARLY_LOGI(TAG,"DoSafePrioritiseAndIndicate: name
>>>> %s",pri->name);
>>>> 724 if ((pri->name[0]==search[0])&&
>>>> 725 (pri->name[1]==search[1]))
>>>> 726 {
>>>> 727 if (search[0] != m_previous_name[0] || search[1] !=
>>>> m_previous_name[1])
>>>> 0x400fe412 is in SafePrioritiseAndIndicate(void*)
>>>> (/Users/hq.mark.johnson/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/ovms_netmanager.cpp:676).
>>>> 671 }
>>>> 672 }
>>>> 673
>>>> 674void SafePrioritiseAndIndicate(void* ctx)
>>>> 675 {
>>>> 676 MyNetManager.DoSafePrioritiseAndIndicate();
>>>> 677 }
>>>> 678
>>>> 679void OvmsNetManager::PrioritiseAndIndicate()
>>>> 680 {
>>>> 0x402937b5 is in tcpip_thread
>>>> (/Users/hq.mark.johnson/esp/esp-idf/components/lwip/lwip/src/api/tcpip.c:158).
>>>> 153 break;
>>>> 154#endif /* LWIP_TCPIP_TIMEOUT && LWIP_TIMERS */
>>>> 155
>>>> 156 case TCPIP_MSG_CALLBACK:
>>>> 157 LWIP_DEBUGF(TCPIP_DEBUG, ("tcpip_thread: CALLBACK %p\n",
>>>> (void *)msg));
>>>> 158 msg->msg.cb.function(msg->msg.cb.ctx);
>>>> 159 memp_free(MEMP_TCPIP_MSG_API, msg);
>>>> 160 break;
>>>> 161
>>>> 162 case TCPIP_MSG_CALLBACK_STATIC:
>>>>
>>>>
>>>> So the issue is most likely corruption of the network interface
>>>> structure, not thread safe traversal.
>>>>
>>>> I had added some ESP_EARLY_LOGI statements, so can see a little
>>>> more of what is going on:
>>>>
>>>> I (103202) gsm-ppp: Initialising...
>>>> I (103212) events: Signal(system.modem.netmode)
>>>> I (105902) netmanager: DoSafePrioritiseAndIndicate: start
>>>> I (105902) netmanager: DoSafePrioritiseAndIndicate: connected wifi
>>>> I (105912) netmanager: DoSafePrioritiseAndIndicate: interface
>>>> 0x3ffed6a0
>>>> I (105912) netmanager: DoSafePrioritiseAndIndicate: name pp
>>>> I (105922) netmanager: DoSafePrioritiseAndIndicate: interface
>>>> 0x3ffde854
>>>> I (105932) netmanager: DoSafePrioritiseAndIndicate: name ap
>>>> I (105932) netmanager: DoSafePrioritiseAndIndicate: interface
>>>> 0x3ffde640
>>>> I (105942) netmanager: DoSafePrioritiseAndIndicate: name st
>>>> I (105952) netmanager: DoSafePrioritiseAndIndicate: end
>>>> I (105902) gsm-ppp: StatusCallBack: None
>>>> I (105902) gsm-ppp: status_cb: Connected
>>>> I (105902) gsm-ppp: our_ipaddr = 10.52.40.80
>>>> …
>>>> I (3708442) cellular: PPP Connection disconnected
>>>> I (3708442) cellular: PPP Connection disconnected
>>>> I (3709212) netmanager: DoSafePrioritiseAndIndicate: start
>>>> I (3709212) netmanager: DoSafePrioritiseAndIndicate: connected wifi
>>>> I (3709212) netmanager: DoSafePrioritiseAndIndicate: interface
>>>> 0x3ffed6a0
>>>> I (3709222) netmanager: DoSafePrioritiseAndIndicate: name pp
>>>> I (3709222) netmanager: DoSafePrioritiseAndIndicate: interface
>>>> 0x30323930
>>>> I (3709232) netmanager: DoSafePrioritiseAndIndicate: name f
>>>> I (3709242) netmanager: DoSafePrioritiseAndIndicate: interface
>>>> 0x667fc000
>>>> I (3709252) netmanager: DoSafePrioritiseAndIndicate: name
>>>> Guru Meditation Error: Core 1 panic'ed (Interrupt wdt timeout
>>>> on CPU1)
>>>>
>>>>
>>>> Doesn’t help much, apart from confirm the corruption. Took about an
>>>> hour to recreate the problem.
>>>>
>>>> I’ll keep looking.
>>>>
>>>> Regards, Mark.
>>>>
>>>>> On 23 Mar 2021, at 4:05 PM, Mark Webb-Johnson
>>>>> <mark at webb-johnson.net <mailto:mark at webb-johnson.net>> wrote:
>>>>>
>>>>> Signed PGP part
>>>>> My attempt didn’t work (still crashes), so I’m now trying your
>>>>> suggestion of wrapping PrioritiseAndIndicate() in
>>>>> a tcpip_callback_with_block.
>>>>>
>>>>> 🤞🏻
>>>>>
>>>>> Regards, Mark.
>>>>>
>>>>>> On 23 Mar 2021, at 3:02 PM, Michael Balzer <dexter at expeedo.de
>>>>>> <mailto:dexter at expeedo.de>> wrote:
>>>>>>
>>>>>> Signed PGP part
>>>>>> Mark,
>>>>>>
>>>>>> regarding point 2: I've had the same issue with jobs that need to
>>>>>> iterate over the mongoose connection list, I introduced the
>>>>>> netmanager job queue for this to delegate these to the mongoose
>>>>>> context.
>>>>>>
>>>>>> I remember seeing LwIP has a similar API while browsing the
>>>>>> source… yes, found it: the "tcpip_callback…" functions, e.g.:
>>>>>>
>>>>>> /**
>>>>>> * Call a specific function in the thread context of
>>>>>> * tcpip_thread for easy access synchronization.
>>>>>> * A function called in that way may access lwIP core code
>>>>>> * without fearing concurrent access.
>>>>>> *
>>>>>> * @param function the function to call
>>>>>> * @param ctx parameter passed to f
>>>>>> * @param block 1 to block until the request is posted, 0 to
>>>>>> non-blocking mode
>>>>>> * @return ERR_OK if the function was called, another err_t if not
>>>>>> */
>>>>>> err_t
>>>>>> tcpip_callback_with_block(tcpip_callback_fn function, void *ctx,
>>>>>> u8_t block)
>>>>>>
>>>>>> So we probably can use this to execute PrioritiseAndIndicate() in
>>>>>> the LwIP context.
>>>>>>
>>>>>> Regards,
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> Am 23.03.21 um 06:47 schrieb Mark Webb-Johnson:
>>>>>>> OK, some progress…
>>>>>>>
>>>>>>> 1. I’ve added a check in the OvmsSyncHttpClient code to refuse
>>>>>>> to block while running as the netman (mongoose) task. This
>>>>>>> will now simply fail the http connection, and log an error.
>>>>>>> Not perfect, and not a solution to the core problem, but at
>>>>>>> least it avoids a known problem.
>>>>>>>
>>>>>>> I’m not sure of the best permanent solution to this. It
>>>>>>> seems that we need a callback interface to run commands
>>>>>>> asynchronously, and use that in mongoose event handlers.
>>>>>>> Adding another mongoose event loop, or using a separate
>>>>>>> networking socket with select(), just minimise the problem -
>>>>>>> they don’t solve it. The core issue here is blocking during
>>>>>>> a mongoose event delivery. That is going to pause all high
>>>>>>> level networking.
>>>>>>>
>>>>>>> 2. I found a race condition in ovms_netmanager that seems
>>>>>>> nasty. The new cellular code could raise duplicate
>>>>>>> modem.down signals, picked up and handled in
>>>>>>> ovms_netmanager. As part of that it calls
>>>>>>> a PrioritiseAndIndicate() function that iterates over the
>>>>>>> network interface list (maintained by LWIP). If that network
>>>>>>> interface list is modified (eg; removing an interface) while
>>>>>>> it is being traversed, nasty crashes can happen. The ‘fix’
>>>>>>> I’ve done is again just a workaround to try to reduce the
>>>>>>> duplicate signals and hence reduce the likelyhood of the
>>>>>>> problem happening, but it won’t fix the core problem (that
>>>>>>> is in both master and for-v3.3).
>>>>>>>
>>>>>>> There is a netif_find function in LWIP, but (a) that
>>>>>>> requires an interface number that we don’t have, and (b)
>>>>>>> doesn’t seem to lock the list either.
>>>>>>>
>>>>>>> Can’t think of an elegant solution to this, other than
>>>>>>> modifications to lwip. We could add our own mutex and use
>>>>>>> that whenever we talk to lwip, but even that would miss out
>>>>>>> on some modifications to the network interface list, I guess.
>>>>>>>
>>>>>>>
>>>>>>> These two changes are in ‘pre’ now, and I am trying them in my car.
>>>>>>>
>>>>>>> Regards, Mark.
>>>>>>>
>>>>>>>> On 22 Mar 2021, at 6:06 PM, Michael Balzer <dexter at expeedo.de
>>>>>>>> <mailto:dexter at expeedo.de>> wrote:
>>>>>>>>
>>>>>>>> Signed PGP part
>>>>>>>> In master, running commands via ssh or server-v2 block, because
>>>>>>>> these are running synchronously in the mongoose context.
>>>>>>>>
>>>>>>>> Running commands via web doesn't block, as the webcommand class
>>>>>>>> starts a separate task for each execution.
>>>>>>>>
>>>>>>>> The firmware config page does a synchronous call to
>>>>>>>> MyOTA.GetStatus(), so that call is executed in the mongoose
>>>>>>>> context. It still works in master, just needs a second or two
>>>>>>>> to fetch the version file.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Michael
>>>>>>>>
>>>>>>>>
>>>>>>>> Am 22.03.21 um 10:38 schrieb Mark Webb-Johnson:
>>>>>>>>> In master branch, at the moment, if a command is run from the
>>>>>>>>> web shell (or server v2), surely the mongoose task will block
>>>>>>>>> as the web server / server v2 blocks waiting for the command
>>>>>>>>> to run to completion?
>>>>>>>>>
>>>>>>>>> Doesn’t necessarily need to be a networking command. Something
>>>>>>>>> long running like the string speed tests.
>>>>>>>>>
>>>>>>>>> In v3.3 I can easily detect the task wait being requested in
>>>>>>>>> the http library (by seeing if current task id == mongoose
>>>>>>>>> task), and fail (which I should do anyway). But I am more
>>>>>>>>> concerned with the general case now (which I think may be
>>>>>>>>> wrong in both master and for-v3.3).
>>>>>>>>>
>>>>>>>>> Regards, Mark
>>>>>>>>>
>>>>>>>>>> On 22 Mar 2021, at 5:22 PM, Michael Balzer
>>>>>>>>>> <dexter at expeedo.de> wrote:
>>>>>>>>>>
>>>>>>>>>> I think we must avoid blocking the Mongoose task, as that's
>>>>>>>>>> the central network dispatcher.
>>>>>>>>>>
>>>>>>>>>> Chris had implemented a workaround in one of his PRs that
>>>>>>>>>> could allow that to be done temporarily by running a local
>>>>>>>>>> Mongoose main loop during a synchronous operation, but I
>>>>>>>>>> still see potential issues from that, as it wasn't the
>>>>>>>>>> standard handling as done by the task, and as it may need to
>>>>>>>>>> recurse.
>>>>>>>>>>
>>>>>>>>>> Maybe the old OvmsHttpClient using socket I/O is the right
>>>>>>>>>> way for synchronous network operations?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Michael
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Am 22.03.21 um 07:15 schrieb Mark Webb-Johnson:
>>>>>>>>>>> Not sure how to resolve this.
>>>>>>>>>>>
>>>>>>>>>>> OvmsSyncHttpClient is currently used in commands from
>>>>>>>>>>> ovms_plugins and ovms_ota.
>>>>>>>>>>>
>>>>>>>>>>> I could bring back the OvmsHttpClient blocking
>>>>>>>>>>> (non-mongoose) implementation, but I don’t think that would
>>>>>>>>>>> address the core problem here:
>>>>>>>>>>>
>>>>>>>>>>> Inside a mongoose callback (inside the mongoose
>>>>>>>>>>> networking task), we are making blocking calls (and in
>>>>>>>>>>> particular calls that could block for several tens of
>>>>>>>>>>> seconds).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> But fundamentally is it ok to block the mongoose networking
>>>>>>>>>>> task for extended periods during a mongoose event callback?
>>>>>>>>>>>
>>>>>>>>>>> Mark
>>>>>>>>>>>
>>>>>>>>>>>> On 21 Mar 2021, at 9:57 PM, Michael Balzer
>>>>>>>>>>>> <dexter at expeedo.de <mailto:dexter at expeedo.de>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Signed PGP part
>>>>>>>>>>>> I've found opening the web UI firmware page or calling "ota
>>>>>>>>>>>> status" via ssh to consistently deadlock the network on my
>>>>>>>>>>>> module.
>>>>>>>>>>>>
>>>>>>>>>>>> I (130531) webserver: HTTP GET /cfg/firmware
>>>>>>>>>>>> D (130531) http: OvmsSyncHttpClient: Connect to
>>>>>>>>>>>> ovms.dexters-web.de:80 <http://ovms.dexters-web.de/>
>>>>>>>>>>>> D (130541) http: OvmsSyncHttpClient: waiting for completion
>>>>>>>>>>>>
>>>>>>>>>>>> After that log message, the network is dead, and the
>>>>>>>>>>>> netmanager also doesn't respond:
>>>>>>>>>>>>
>>>>>>>>>>>> OVMS# network list
>>>>>>>>>>>> ERROR: job failed
>>>>>>>>>>>> D (183241) netmanager: send cmd 1 from 0x3ffe7054
>>>>>>>>>>>> W (193241) netmanager: ExecuteJob: cmd 1: timeout
>>>>>>>>>>>>
>>>>>>>>>>>> The interfaces seem to be registered and online, but
>>>>>>>>>>>> nothing gets in or out:
>>>>>>>>>>>>
>>>>>>>>>>>> OVMS# network status
>>>>>>>>>>>> Interface#3: pp3 (ifup=1 linkup=1)
>>>>>>>>>>>> IPv4: 10.170.195.13/255.255.255.255 gateway 10.64.64.64
>>>>>>>>>>>>
>>>>>>>>>>>> Interface#2: ap2 (ifup=1 linkup=1)
>>>>>>>>>>>> IPv4: 192.168.4.1/255.255.255.0 gateway 192.168.4.1
>>>>>>>>>>>>
>>>>>>>>>>>> Interface#1: st1 (ifup=1 linkup=1)
>>>>>>>>>>>> IPv4: 192.168.2.106/255.255.255.0 gateway 192.168.2.1
>>>>>>>>>>>>
>>>>>>>>>>>> DNS: 192.168.2.1
>>>>>>>>>>>>
>>>>>>>>>>>> Default Interface: st1 (192.168.2.106/255.255.255.0 gateway
>>>>>>>>>>>> 192.168.2.1)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> A couple of minutes later, server-v2 recognizes the stale
>>>>>>>>>>>> connection and issues a network restart, which fails
>>>>>>>>>>>> resulting in the same behaviour as shown below with finally
>>>>>>>>>>>> forced reboot by loss of an important event.
>>>>>>>>>>>>
>>>>>>>>>>>> Doing "ota status" from USB works normally, so this looks
>>>>>>>>>>>> like OvmsSyncHttpClient not being able to run from within a
>>>>>>>>>>>> mongoose client.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Michael
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Am 18.03.21 um 08:14 schrieb Mark Webb-Johnson:
>>>>>>>>>>>>> Tried to repeat this, but not having much success. Here is
>>>>>>>>>>>>> my car module, with network still up:
>>>>>>>>>>>>>
>>>>>>>>>>>>> OVMS# boot status
>>>>>>>>>>>>> Last boot was 262355 second(s) ago
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I did manage to catch one network related crash after
>>>>>>>>>>>>> repeatedly disconnecting and reconnecting the cellular
>>>>>>>>>>>>> antenna. This was:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I (3717989) cellular: PPP Connection disconnected
>>>>>>>>>>>>> Guru Meditation Error: Core 1 panic'ed
>>>>>>>>>>>>> (LoadProhibited). Exception was unhandled.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 0x400fe082 is in
>>>>>>>>>>>>> OvmsNetManager::PrioritiseAndIndicate()
>>>>>>>>>>>>> (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_netmanager.cpp:707).
>>>>>>>>>>>>>
>>>>>>>>>>>>> 707 if ((pri->name[0]==search[0])&&
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 0x400ed360 is in
>>>>>>>>>>>>> OvmsMetricString::SetValue(std::__cxx11::basic_string<char,
>>>>>>>>>>>>> std::char_traits<char>, std::allocator<char> >)
>>>>>>>>>>>>> (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_metrics.cpp:1358).
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1357 void
>>>>>>>>>>>>> OvmsMetricString::SetValue(std::string value)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 0x4008bdad is at
>>>>>>>>>>>>> ../../../../.././newlib/libc/machine/xtensa/strcmp.S:586.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 0x4008bdd1 is at
>>>>>>>>>>>>> ../../../../.././newlib/libc/machine/xtensa/strcmp.S:604.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 0x400fe886 is in
>>>>>>>>>>>>> OvmsNetManager::ModemDown(std::__cxx11::basic_string<char,
>>>>>>>>>>>>> std::char_traits<char>, std::allocator<char> >, void*)
>>>>>>>>>>>>> (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_netmanager.cpp:522).
>>>>>>>>>>>>>
>>>>>>>>>>>>> 522 PrioritiseAndIndicate();
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 0x400fd752 is in std::_Function_handler<void
>>>>>>>>>>>>> (std::__cxx11::basic_string<char,
>>>>>>>>>>>>> std::char_traits<char>, std::allocator<char> >,
>>>>>>>>>>>>> void*), std::_Bind<std::_Mem_fn<void
>>>>>>>>>>>>> (OvmsNetManager::*)(std::__cxx11::basic_string<char,
>>>>>>>>>>>>> std::char_traits<char>, std::allocator<char> >,
>>>>>>>>>>>>> void*)> (OvmsNetManager*, std::_Placeholder<1>,
>>>>>>>>>>>>> std::_Placeholder<2>)> >::_M_invoke(std::_Any_data
>>>>>>>>>>>>> const&, std::__cxx11::basic_string<char,
>>>>>>>>>>>>> std::char_traits<char>, std::allocator<char> >&&,
>>>>>>>>>>>>> void*&&)
>>>>>>>>>>>>> (/home/openvehicles/build/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/functional:600).
>>>>>>>>>>>>>
>>>>>>>>>>>>> 600 { return
>>>>>>>>>>>>> (__object->*_M_pmf)(std::forward<_Args>(__args)...); }
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 0x400f512e is in std::function<void
>>>>>>>>>>>>> (std::__cxx11::basic_string<char,
>>>>>>>>>>>>> std::char_traits<char>, std::allocator<char> >,
>>>>>>>>>>>>> void*)>::operator()(std::__cxx11::basic_string<char,
>>>>>>>>>>>>> std::char_traits<char>, std::allocator<char> >, void*)
>>>>>>>>>>>>> const
>>>>>>>>>>>>> (/home/openvehicles/build/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/functional:2271).
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2271 return _M_invoker(_M_functor,
>>>>>>>>>>>>> std::forward<_ArgTypes>(__args)...);
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 0x400f52f1 is in
>>>>>>>>>>>>> OvmsEvents::HandleQueueSignalEvent(event_queue_t*)
>>>>>>>>>>>>> (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:283).
>>>>>>>>>>>>>
>>>>>>>>>>>>> 283
>>>>>>>>>>>>> m_current_callback->m_callback(m_current_event,
>>>>>>>>>>>>> msg->body.signal.data);
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 0x400f53d8 is in OvmsEvents::EventTask()
>>>>>>>>>>>>> (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:237).
>>>>>>>>>>>>>
>>>>>>>>>>>>> 237 HandleQueueSignalEvent(&msg);
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 0x400f545d is in EventLaunchTask(void*)
>>>>>>>>>>>>> (/home/openvehicles/build/Open-Vehicle-Monitoring-System-pre/vehicle/OVMS.V3/main/ovms_events.cpp:80).
>>>>>>>>>>>>>
>>>>>>>>>>>>> 80 me->EventTask();
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> My for_v3.3 branch does include the preliminary changes to
>>>>>>>>>>>>> support the wifi at 20MHz bandwidth, and perhaps those
>>>>>>>>>>>>> could be affecting things. I do notice that if I ‘power
>>>>>>>>>>>>> wifi off’, then ‘wifi mode client’, it can connect to the
>>>>>>>>>>>>> station, but not get an IP address. I’ve just tried to
>>>>>>>>>>>>> merge in the latest fixes to that, and rebuilt a release.
>>>>>>>>>>>>> I will continue to test with that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards, Mark.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 12 Mar 2021, at 10:32 PM, Michael Balzer
>>>>>>>>>>>>>> <dexter at expeedo.de <mailto:dexter at expeedo.de>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed PGP part
>>>>>>>>>>>>>> I just tried switching to for-v3.3 in my car module after
>>>>>>>>>>>>>> tests on my desk module were OK, and I've run into the
>>>>>>>>>>>>>> very same problem with for-v3.3. So the issue isn't
>>>>>>>>>>>>>> related to esp-idf.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The network only occasionally starts normally, but even
>>>>>>>>>>>>>> then all connectivity is lost after a couple of minutes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The stale connection watchdog in server-v2 triggers a
>>>>>>>>>>>>>> network restart, but that also doesn't seem to succeed:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2021-03-12 14:53:01.802 CET W (981652) ovms-server-v2:
>>>>>>>>>>>>>> Detected stale connection (issue #241), restarting network
>>>>>>>>>>>>>> 2021-03-12 14:53:01.802 CET I (981652) esp32wifi: Restart
>>>>>>>>>>>>>> 2021-03-12 14:53:01.802 CET I (981652) esp32wifi:
>>>>>>>>>>>>>> Stopping WIFI station
>>>>>>>>>>>>>> 2021-03-12 14:53:01.812 CET I (981662) wifi:state: run ->
>>>>>>>>>>>>>> init (0)
>>>>>>>>>>>>>> 2021-03-12 14:53:01.812 CET I (981662) wifi:pm stop,
>>>>>>>>>>>>>> total sleep time: 831205045 us / 975329961 us
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2021-03-12 14:53:01.812 CET I (981662) wifi:new:<1,0>,
>>>>>>>>>>>>>> old:<1,1>, ap:<1,1>, sta:<1,1>, prof:1
>>>>>>>>>>>>>> 2021-03-12 14:53:01.832 CET I (981682) wifi:flush txq
>>>>>>>>>>>>>> 2021-03-12 14:53:01.842 CET I (981692) wifi:stop sw txq
>>>>>>>>>>>>>> 2021-03-12 14:53:01.842 CET I (981692) wifi:lmac stop hw txq
>>>>>>>>>>>>>> 2021-03-12 14:53:01.852 CET I (981702) esp32wifi:
>>>>>>>>>>>>>> Powering down WIFI driver
>>>>>>>>>>>>>> 2021-03-12 14:53:01.852 CET I (981702) wifi:Deinit lldesc
>>>>>>>>>>>>>> rx mblock:16
>>>>>>>>>>>>>> 2021-03-12 14:53:01.862 CET I (981712) esp32wifi:
>>>>>>>>>>>>>> Powering up WIFI driver
>>>>>>>>>>>>>> 2021-03-12 14:53:01.862 CET I (981712) wifi:nvs_log_init,
>>>>>>>>>>>>>> erase log key successfully, reinit nvs log
>>>>>>>>>>>>>> 2021-03-12 14:53:01.882 CET I (981732) wifi:wifi driver
>>>>>>>>>>>>>> task: 3ffd4d84, prio:23, stack:3584, core=0
>>>>>>>>>>>>>> 2021-03-12 14:53:01.882 CET I (981732) system_api: Base
>>>>>>>>>>>>>> MAC address is not set, read default base MAC address
>>>>>>>>>>>>>> from BLK0 of EFUSE
>>>>>>>>>>>>>> 2021-03-12 14:53:01.882 CET I (981732) system_api: Base
>>>>>>>>>>>>>> MAC address is not set, read default base MAC address
>>>>>>>>>>>>>> from BLK0 of EFUSE
>>>>>>>>>>>>>> 2021-03-12 14:53:01.902 CET I (981752) wifi:wifi firmware
>>>>>>>>>>>>>> version: 30f9e79
>>>>>>>>>>>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:config NVS
>>>>>>>>>>>>>> flash: enabled
>>>>>>>>>>>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:config nano
>>>>>>>>>>>>>> formating: disabled
>>>>>>>>>>>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:Init data
>>>>>>>>>>>>>> frame dynamic rx buffer num: 16
>>>>>>>>>>>>>> 2021-03-12 14:53:01.912 CET I (981762) wifi:Init
>>>>>>>>>>>>>> management frame dynamic rx buffer num: 16
>>>>>>>>>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init
>>>>>>>>>>>>>> management short buffer num: 32
>>>>>>>>>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init dynamic
>>>>>>>>>>>>>> tx buffer num: 16
>>>>>>>>>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init static
>>>>>>>>>>>>>> rx buffer size: 2212
>>>>>>>>>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init static
>>>>>>>>>>>>>> rx buffer num: 16
>>>>>>>>>>>>>> 2021-03-12 14:53:01.922 CET I (981772) wifi:Init dynamic
>>>>>>>>>>>>>> rx buffer num: 16
>>>>>>>>>>>>>> 2021-03-12 14:53:02.642 CET I (982492) wifi:mode : sta
>>>>>>>>>>>>>> (30:ae:a4:5f:e7:ec) + softAP (30:ae:a4:5f:e7:ed)
>>>>>>>>>>>>>> 2021-03-12 14:53:02.652 CET I (982502) wifi:Total power
>>>>>>>>>>>>>> save buffer number: 8
>>>>>>>>>>>>>> 2021-03-12 14:53:02.652 CET I (982502)
>>>>>>>>>>>>>> cellular-modem-auto: Restart
>>>>>>>>>>>>>> 2021-03-12 14:53:02.662 CET I (982512) cellular: State:
>>>>>>>>>>>>>> Enter PowerOffOn state
>>>>>>>>>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: Shutting
>>>>>>>>>>>>>> down (hard)...
>>>>>>>>>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp:
>>>>>>>>>>>>>> StatusCallBack: User Interrupt
>>>>>>>>>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: PPP
>>>>>>>>>>>>>> connection has been closed
>>>>>>>>>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: PPP is
>>>>>>>>>>>>>> shutdown
>>>>>>>>>>>>>> 2021-03-12 14:53:02.662 CET I (982512) gsm-ppp: Shutting
>>>>>>>>>>>>>> down (hard)...
>>>>>>>>>>>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-ppp:
>>>>>>>>>>>>>> StatusCallBack: User Interrupt
>>>>>>>>>>>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: PPP
>>>>>>>>>>>>>> connection has been closed
>>>>>>>>>>>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-ppp: PPP is
>>>>>>>>>>>>>> shutdown
>>>>>>>>>>>>>> 2021-03-12 14:53:02.672 CET I (982522) gsm-nmea: Shutdown
>>>>>>>>>>>>>> (direct)
>>>>>>>>>>>>>> 2021-03-12 14:53:02.672 CET I (982522)
>>>>>>>>>>>>>> cellular-modem-auto: Power Cycle
>>>>>>>>>>>>>> 2021-03-12 14:53:04.682 CET D (984532) events:
>>>>>>>>>>>>>> Signal(system.wifi.down)
>>>>>>>>>>>>>> 2021-03-12 14:53:04.682 CET I (984532) netmanager: WIFI
>>>>>>>>>>>>>> client stop
>>>>>>>>>>>>>> 2021-03-12 14:53:04.682 CET E (984532) netmanager:
>>>>>>>>>>>>>> Inconsistent state: no interface of type 'pp' found
>>>>>>>>>>>>>> 2021-03-12 14:53:04.682 CET I (984532) netmanager: WIFI
>>>>>>>>>>>>>> client down (with MODEM up): reconfigured for MODEM priority
>>>>>>>>>>>>>> 2021-03-12 14:53:04.692 CET D (984542) events:
>>>>>>>>>>>>>> Signal(system.event)
>>>>>>>>>>>>>> 2021-03-12 14:53:04.692 CET D (984542) events:
>>>>>>>>>>>>>> Signal(system.wifi.sta.disconnected)
>>>>>>>>>>>>>> 2021-03-12 14:53:04.692 CET E (984542) netmanager:
>>>>>>>>>>>>>> Inconsistent state: no interface of type 'pp' found
>>>>>>>>>>>>>> 2021-03-12 14:53:04.692 CET I (984542) esp32wifi: STA
>>>>>>>>>>>>>> disconnected with reason 8 = ASSOC_LEAVE
>>>>>>>>>>>>>> 2021-03-12 14:53:04.702 CET D (984552) events:
>>>>>>>>>>>>>> Signal(system.event)
>>>>>>>>>>>>>> 2021-03-12 14:53:04.702 CET D (984552) events:
>>>>>>>>>>>>>> Signal(system.wifi.sta.stop)
>>>>>>>>>>>>>> 2021-03-12 14:53:04.702 CET E (984552) netmanager:
>>>>>>>>>>>>>> Inconsistent state: no interface of type 'pp' found
>>>>>>>>>>>>>> 2021-03-12 14:53:04.712 CET D (984562) events:
>>>>>>>>>>>>>> Signal(system.event)
>>>>>>>>>>>>>> 2021-03-12 14:53:04.712 CET D (984562) events:
>>>>>>>>>>>>>> Signal(system.wifi.ap.stop)
>>>>>>>>>>>>>> 2021-03-12 14:53:04.712 CET E (984562) netmanager:
>>>>>>>>>>>>>> Inconsistent state: no interface of type 'pp' found
>>>>>>>>>>>>>> 2021-03-12 14:53:04.712 CET I (984562) netmanager: WIFI
>>>>>>>>>>>>>> access point is down
>>>>>>>>>>>>>> 2021-03-12 14:53:04.712 CET I (984562) esp32wifi: AP stopped
>>>>>>>>>>>>>> 2021-03-12 14:53:04.722 CET D (984572) events:
>>>>>>>>>>>>>> Signal(network.wifi.sta.bad)
>>>>>>>>>>>>>> 2021-03-12 14:53:04.722 CET D (984572) events:
>>>>>>>>>>>>>> Signal(system.event)
>>>>>>>>>>>>>> 2021-03-12 14:53:04.722 CET D (984572) events:
>>>>>>>>>>>>>> Signal(system.wifi.sta.start)
>>>>>>>>>>>>>> 2021-03-12 14:53:04.732 CET D (984582) events:
>>>>>>>>>>>>>> Signal(system.modem.down)
>>>>>>>>>>>>>> 2021-03-12 14:53:04.742 CET I (984592) netmanager: MODEM
>>>>>>>>>>>>>> down (with WIFI client down): network connectivity has
>>>>>>>>>>>>>> been lost
>>>>>>>>>>>>>> 2021-03-12 14:53:04.742 CET D (984592) events:
>>>>>>>>>>>>>> Signal(system.modem.down)
>>>>>>>>>>>>>> 2021-03-12 14:53:04.752 CET D (984602) events:
>>>>>>>>>>>>>> Signal(system.event)
>>>>>>>>>>>>>> 2021-03-12 14:53:04.752 CET D (984602) events:
>>>>>>>>>>>>>> Signal(system.wifi.ap.start)
>>>>>>>>>>>>>> 2021-03-12 14:53:04.752 CET I (984602) netmanager: WIFI
>>>>>>>>>>>>>> access point is up
>>>>>>>>>>>>>> 2021-03-12 14:53:26.802 CET E (1006652) events:
>>>>>>>>>>>>>> SignalEvent: queue overflow (running
>>>>>>>>>>>>>> system.wifi.ap.start->netmanager for 23 sec), event
>>>>>>>>>>>>>> 'ticker.1' dropped
>>>>>>>>>>>>>> 2021-03-12 14:53:27.802 CET E (1007652) events:
>>>>>>>>>>>>>> SignalEvent: queue overflow (running
>>>>>>>>>>>>>> system.wifi.ap.start->netmanager for 24 sec), event
>>>>>>>>>>>>>> 'ticker.1' dropped
>>>>>>>>>>>>>> 2021-03-12 14:53:28.802 CET E (1008652) events:
>>>>>>>>>>>>>> SignalEvent: queue overflow (running
>>>>>>>>>>>>>> system.wifi.ap.start->netmanager for 25 sec), event
>>>>>>>>>>>>>> 'ticker.1' dropped
>>>>>>>>>>>>>> …and so on until
>>>>>>>>>>>>>> 2021-03-12 14:54:01.802 CET E (1041652) events:
>>>>>>>>>>>>>> SignalEvent: lost important event => aborting
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I need my car now, so will switch back to master for now.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mark, if you've got specific debug logs I should fetch on
>>>>>>>>>>>>>> the next try, tell me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Michael
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 12.03.21 um 05:47 schrieb Craig Leres:
>>>>>>>>>>>>>>> I just updated to 3.2.016-68-g8e10c6b7 and still get the
>>>>>>>>>>>>>>> network hang immediately after booting and logging into
>>>>>>>>>>>>>>> the web gui.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But I see now my problem is likely that I'm not using
>>>>>>>>>>>>>>> the right esp-idf (duh). Is there a way I can have
>>>>>>>>>>>>>>> master build using ~/esp/esp-idf and have for-v3.3 use a
>>>>>>>>>>>>>>> different path?)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Craig
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> OvmsDev mailing list
>>>>>>>>>>>>>>> OvmsDev at lists.openvehicles.com
>>>>>>>>>>>>>>> <mailto:OvmsDev at lists.openvehicles.com>
>>>>>>>>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>>>>>>>>>> <http://lists.openvehicles.com/mailman/listinfo/ovmsdev>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>>>>>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> OvmsDev mailing list
>>>>>>>>>>>>> OvmsDev at lists.openvehicles.com
>>>>>>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>>>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> OvmsDev mailing list
>>>>>>>>>>> OvmsDev at lists.openvehicles.com
>>>>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>>>>> _______________________________________________
>>>>>>>>>> OvmsDev mailing list
>>>>>>>>>> OvmsDev at lists.openvehicles.com
>>>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> OvmsDev mailing list
>>>>>>>>> OvmsDev at lists.openvehicles.com
>>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>>>
>>>>>>>> --
>>>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> OvmsDev mailing list
>>>>>>> OvmsDev at lists.openvehicles.com
>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>
>>>>>> --
>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> OvmsDev mailing list
>>> OvmsDev at lists.openvehicles.com
>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>
>> --
>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>
>
>
> _______________________________________________
> OvmsDev mailing list
> OvmsDev at lists.openvehicles.com
> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
--
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20210917/88029ba6/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 203 bytes
Desc: OpenPGP digital signature
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20210917/88029ba6/attachment-0001.sig>
More information about the OvmsDev
mailing list