[Ovmsdev] MDNS crash
Stephen Casner
casner at acm.org
Mon Mar 19 12:14:30 HKT 2018
Greg, the async console is a separate task that should be able to look
at some things as long as they don't depend upon the networking tasks
or objects. Is it hung also?
-- Steve
On Sun, 18 Mar 2018, Greg D. wrote:
> Ok, so not *in* SNTP, but related to SNTP. After a minute's wait, I saw a pair of
> records get "sent" to the v2 server. They never arrived. After that, I tried the
> stop/start of the server, and that's where it hung. Sounds like a missing mutex
> unlock somewhere. What resources does starting the server make use of, that might
> have also been used by SNTP?
>
> Greg
>
>
> Greg D. wrote:
> Ok, but I'm in ap+client mode, but don't have anything attaching to the AP
> side of things.
>
> AP+Client, hotspot running. Boot. Client gets an IP address, server
> connects, metrics sent.
> Some time later (minutes or so), the modem finally connects. Wifi is
> still connected, and is still the primary interface, but the server is no
> longer getting updates from the module. server stop, server start, hang. The
> last thing on the console this last time, before the phone client stopped
> getting updates, was "starting SNTP client".
>
> Not 100% repeatable, but close to it. I think partly it depends on the
> modem's luck in getting a good cell connection; it's kind of iffy here.
> Or, is the deadlock in the SNTP client instead of MDNS?
>
> Greg
>
>
> Mark Webb-Johnson wrote:
> It is a task mutex lock, so for mdns I have used them in ‘wait
> forever’ mode.
> This is for the code I just submitted a few minutes ago.
>
> Looking at the esp-idf mdns code, I think using AP or CLIENT modes would
> have been ok. Only one interface comes up/down then. But in APCLIENT
> mode, or when switching between AP and CLIENT modes, the fault will be
> triggered.
>
> Regards, Mark.
>
> On 19 Mar 2018, at 11:40 AM, Greg D. <gregd2350 at gmail.com>
> wrote:
>
> I wonder if those locks are the cause of the hangs I'm getting. Do
> they have timers on them, or will they wait forever?
>
> Greg
>
>
> Mark Webb-Johnson wrote:
> Messy. The issue is the restarting of mdns, coupled with
> AP/STA coming up/down in different threads. That could
> result in an mdns_free while another mdns_free or
> mdns_init is in progress. The mdns library is not thread
> safe. In particular, it protects some internal state with a
> mutex, but not all of mdns_free or any of mdns_init.
> I introduced an ovms_mutex library to main. This contains some
> helpful wrapper functions to make mutexes easy and safe to
> use:
>
> class OvmsMutex
>
> Encapsulates a mutex.
> Provides Lock() and Unlock() functions to access it.
>
> class OvmsMutexLock
>
> Encapsulates a mutex lock in a safer manner.
> Constructor locks it, and destructor unlocks it.
>
> Using this is as simple as creating a mutex for the object to be
> protected, and then creating an OvmsMutexLock(&mutex) over that. When
> the OvmsMutexLock comes into scope, it will lock the mutex, and when it
> goes out of scope it will unlock it.
>
> I’ve done this in other projects, with great success. Using
> OvmsMutexLock objects means you never forget to unlock the mutex (no
> matter how weird and wonderful your code paths are).
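
The scope-bound locking described above can be sketched roughly as follows. This is a minimal illustration, not the actual ovms_mutex code, which is not quoted in this thread. The names SketchMutex, SketchMutexLock and RestartService are hypothetical, and std::mutex stands in for whatever primitive the firmware wraps, so that the sketch builds on a desktop compiler.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for OvmsMutex: wraps a mutex behind Lock()/Unlock().
class SketchMutex {
public:
  void Lock()   { m_mutex.lock(); m_locked = true; }
  void Unlock() { m_locked = false; m_mutex.unlock(); }
  // Demonstration helper only; meaningful in this single-threaded sketch.
  bool IsLocked() const { return m_locked; }
private:
  std::mutex m_mutex;
  bool m_locked = false;
};

// Hypothetical stand-in for OvmsMutexLock: the constructor locks the mutex
// and the destructor unlocks it, whatever path leaves the scope.
class SketchMutexLock {
public:
  explicit SketchMutexLock(SketchMutex* mutex) : m_mutex(mutex) { m_mutex->Lock(); }
  ~SketchMutexLock() { m_mutex->Unlock(); }
  SketchMutexLock(const SketchMutexLock&) = delete;
  SketchMutexLock& operator=(const SketchMutexLock&) = delete;
private:
  SketchMutex* m_mutex;
};

// Example critical section with an early return (hypothetical function).
bool RestartService(SketchMutex& mutex, bool fail_early) {
  SketchMutexLock lock(&mutex);   // locked from here to the end of the scope
  assert(mutex.IsLocked());
  if (fail_early) return false;   // destructor still unlocks on this path
  return true;
}
```

Because the destructor runs on every exit path, the early return in RestartService leaves the mutex unlocked just as the normal return does.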
>
>
> I then used OvmsMutex and OvmsMutexLock in ovms_mdns. The mdns
> stuff seems ok for me now.
>
> Regards, Mark.
>
> On 19 Mar 2018, at 9:51 AM, Mark Webb-Johnson
> <mark at webb-johnson.net> wrote:
>
>
> It seems that mdns_free is being called, while mdns_free
> is running…
>
> It is trivial to fix it in the timer deletion code, but
> that just delays the problem until a bit later (the
> hostname free).
>
> I am looking at this now.
>
> Regards, Mark.
>
> On 19 Mar 2018, at 9:40 AM, Stephen Casner
> <casner at acm.org> wrote:
>
> On Sun, 18 Mar 2018, Michael Balzer wrote:
>
> From a first look at the mdns code in your backtrace, I would guess
> that's a double free() for the timer, it seems the timer_handle
> never gets NULLed.
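
The free-then-clear idiom implied here can be sketched as below. This is an illustration only, not the esp-idf code. SketchServer, StartTimer and StopTimer are hypothetical names, and malloc/free stand in for the real timer create/delete calls.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for the state that owns the timer.
struct SketchServer {
  void* timer_handle = nullptr;
};

// Allocates the handle once; a second call is a no-op.
void StartTimer(SketchServer& s) {
  if (s.timer_handle == nullptr) {
    s.timer_handle = std::malloc(16);
  }
}

// Releases the handle and clears it. Returns true if something was freed,
// false if there was nothing left to free.
bool StopTimer(SketchServer& s) {
  if (s.timer_handle == nullptr) {
    return false;                 // already stopped: no second free()
  }
  std::free(s.timer_handle);
  s.timer_handle = nullptr;       // the step that appears to be missing
  return true;
}
```

Clearing the handle makes a second sequential StopTimer call harmless. It does not by itself protect against two tasks calling it at the same time.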
>
>
> That may be a bug (deficiency) in the
> esp-idf mDNS code.
>
> Perhaps I hit it because OvmsMDNS::StopMDNS was called without
> everything coming up correctly first?
>
> --
> Steve
> _______________________________________________
> OvmsDev mailing list
> OvmsDev at lists.teslaclub.hk
> http://lists.teslaclub.hk/mailman/listinfo/ovmsdev