OK, but I'm in AP+Client mode and don't have anything attached to the AP side of things.

AP+Client, hotspot running.  Boot.  Client gets an IP address, server connects, metrics sent.
Some time later (minutes or so), the modem finally connects.  Wifi is still connected, and is still the primary interface, but the server is no longer getting updates from the module.  Server stop, server start, hang.  The last thing printed to the console this time, before the phone client stopped getting updates, was "starting SNTP client".

Not 100% repeatable, but close to it.  I think partly it depends on the modem's luck in getting a good cell connection; it's kind of iffy here.  Or, is the deadlock in the SNTP client instead of MDNS?

Greg


Mark Webb-Johnson wrote:
It is a task mutex lock, so for mdns I have used them in ‘wait forever’ mode.

This is for the code I just submitted a few minutes ago.

Looking at the esp-idf mdns code, I think using AP or CLIENT modes would have been ok. Only one interface is coming up/down then. But for APCLIENT mode, or when switching between AP and CLIENT modes, the fault will be triggered.

Regards, Mark.

On 19 Mar 2018, at 11:40 AM, Greg D. <gregd2350@gmail.com> wrote:

I wonder if those locks are the cause of the hangs I'm getting.  Do they have timers on them, or will they wait forever?

Greg


Mark Webb-Johnson wrote:
Messy. The issue is the restarting of mdns, coupled with AP/STA coming up/down in different threads. That could result in a mdns_free while another mdns_free or mdns_init is in progress. The mdns library is not thread safe. In particular, they mutex protect some internal stuff but not all of mdns_free or any of mdns_init.

I introduced an ovms_mutex library to main. This contains some helpful wrapper functions to make mutexes easy and safe to use:

class OvmsMutex
    Encapsulates a mutex.
    Provides Lock() and Unlock() functions to access it.

class OvmsMutexLock
    Encapsulates a mutex lock in a safer manner.
    Constructor locks it, and destructor unlocks it.

Using this is as simple as creating a mutex for the object to be protected, and then creating an OvmsMutexLock(&mutex) over that. When the OvmsMutexLock comes into scope, it will lock the mutex, and when it goes out of scope it will unlock it.

I’ve done this in other projects, with great success. Using OvmsMutexLock objects means you never forget to unlock the mutex (no matter how weird and wonderful your code paths are).
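As a rough illustration of the pattern described above (not the actual ovms_mutex source), here is a minimal sketch. The names SketchMutex, SketchMutexLock and SketchService are made up for the example, and std::mutex is assumed as a stand-in for whatever the real class wraps on the module, so that the sketch compiles on a desktop toolchain.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for OvmsMutex: wraps a mutex behind Lock()/Unlock().
// ASSUMPTION: std::mutex replaces the RTOS primitive used on the module.
class SketchMutex {
public:
  void Lock() { m_mutex.lock(); m_locked = true; }
  void Unlock() {
    assert(m_locked);            // unlocking a mutex that is not held is a bug
    m_locked = false;
    m_mutex.unlock();
  }
  bool IsLocked() const { return m_locked; }  // for demonstration only
private:
  std::mutex m_mutex;
  std::atomic<bool> m_locked{false};
};

// Hypothetical stand-in for OvmsMutexLock: locks in the constructor and
// unlocks in the destructor, so every exit path releases the mutex.
class SketchMutexLock {
public:
  explicit SketchMutexLock(SketchMutex* mutex) : m_mutex(mutex) { m_mutex->Lock(); }
  ~SketchMutexLock() { m_mutex->Unlock(); }
  SketchMutexLock(const SketchMutexLock&) = delete;
  SketchMutexLock& operator=(const SketchMutexLock&) = delete;
private:
  SketchMutex* m_mutex;
};

// Hypothetical service whose start and stop paths must never overlap.
class SketchService {
public:
  bool Start() {
    SketchMutexLock lock(&m_mutex);
    if (m_running) return false;  // early return: the destructor still unlocks
    m_running = true;
    return true;
  }
  bool Stop() {
    SketchMutexLock lock(&m_mutex);
    if (!m_running) return false; // nothing to stop, lock released on return
    m_running = false;
    return true;
  }
  bool MutexHeld() const { return m_mutex.IsLocked(); }
private:
  SketchMutex m_mutex;
  bool m_running = false;
};
```

The early returns in Start() and Stop() are the point of the exercise: there is no explicit Unlock() call to forget on any path.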

I then used OvmsMutex and OvmsMutexLock in ovms_mdns. The mdns stuff seems ok for me now.

Regards, Mark.

On 19 Mar 2018, at 9:51 AM, Mark Webb-Johnson <mark@webb-johnson.net> wrote:


It seems that mdns_free is being called, while mdns_free is running…

It is trivial to fix it in the timer deletion code, but that just delays the problem until a bit later (the hostname free).
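To illustrate what a fix in the timer deletion code could look like, here is a minimal sketch of the usual "clear the handle after freeing it" guard. SketchServer and its functions are hypothetical and are not the esp-idf mdns structures; malloc/free stand in for timer creation and deletion.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for an object that owns a timer.
// ASSUMPTION: this is not the esp-idf mdns server structure.
struct SketchServer {
  void* timer_handle = nullptr;
  int timer_deletes = 0;   // counts real deletions, for demonstration only
};

void SketchStartTimer(SketchServer& s) {
  if (s.timer_handle != nullptr) return;   // already running
  s.timer_handle = std::malloc(16);        // stands in for timer creation
  assert(s.timer_handle != nullptr);
}

void SketchStopTimer(SketchServer& s) {
  if (s.timer_handle == nullptr) return;   // already deleted: do nothing
  std::free(s.timer_handle);               // stands in for timer deletion
  s.timer_handle = nullptr;                // clear it so a repeat call is harmless
  ++s.timer_deletes;
}
```

This guard only covers repeated calls made one after another. Two tasks entering SketchStopTimer() at the same time can both pass the null check before either clears the handle, so the whole teardown still needs to be serialized.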

I am looking at this now.

Regards, Mark.

On 19 Mar 2018, at 9:40 AM, Stephen Casner <casner@acm.org> wrote:

On Sun, 18 Mar 2018, Michael Balzer wrote:

From a first look at the mdns code in your backtrace, I would guess
that's a double free() for the timer, it seems the timer_handle
never gets NULLed.

That may be a bug (deficiency) in the esp-idf mDNS code.

Perhaps I hit it because OvmsMDNS::StopMDNS was called without
everything coming up correctly first?

                                                      -- Steve
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.teslaclub.hk
http://lists.teslaclub.hk/mailman/listinfo/ovmsdev
