[Ovmsdev] MDNS crash

Stephen Casner casner at acm.org
Mon Mar 19 12:14:30 HKT 2018


Greg, the async console is a separate task that should be able to look
at some things as long as they don't depend upon the networking tasks
or objects.  Is it hung also?

                                                        -- Steve

On Sun, 18 Mar 2018, Greg D. wrote:

> Ok, so not *in* SNTP, but related to SNTP.  After a minute's wait, I saw a pair of
> records get "sent" to the v2 server.  They never arrived.  After that, I tried the
> stop/start of the server, and that's where it hung.  Sounds like a missing mutex
> unlock somewhere.  What resources does starting the server use that SNTP might
> also have used?
>
> Greg
>
>
> Greg D. wrote:
>       OK, but I'm in AP+Client mode and don't have anything attached to the AP
>       side of things.
>
>       AP+Client, hotspot running.  Boot.  The client gets an IP address, the
>       server connects, and metrics are sent.
>       Some time later (a few minutes or so), the modem finally connects.  Wifi
>       is still connected and is still the primary interface, but the server no
>       longer gets updates from the module.  Then: server stop, server start,
>       hang.  This time, the last thing printed to the console before the phone
>       client stopped getting updates was "starting SNTP client".
>
>       Not 100% repeatable, but close to it.  I think partly it depends on the
>       modem's luck in getting a good cell connection; it's kind of iffy here.
>       Or, is the deadlock in the SNTP client instead of MDNS?
>
>       Greg
>
>
>       Mark Webb-Johnson wrote:
>
> It is a task mutex lock, so for mdns I have used it in ‘wait forever’
> mode. This is for the code I just submitted a few minutes ago.
>
> Looking at the esp-idf mdns code, I think using AP or CLIENT mode alone
> would have been OK, since only one interface comes up or goes down then.
> But in APCLIENT mode, or when switching between AP and CLIENT modes, the
> fault will be triggered.
>
> Regards, Mark.
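Greg's question is whether these locks wait forever or time out. In FreeRTOS, ‘wait forever’ on a mutex normally means passing portMAX_DELAY as the block time to xSemaphoreTake(). A minimal sketch of the alternative, a bounded wait that reports a probable deadlock instead of hanging, assuming standard C++ std::timed_mutex as a stand-in for the task mutex; LockWithTimeout and TryWhileHeldElsewhere are hypothetical names, not part of OVMS:

```cpp
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>

// Hypothetical helper (assumption: std::timed_mutex stands in for the task
// mutex). Waits at most `timeout` for the lock instead of forever.
// Returns true with the lock held, or false (and logs) if the wait timed out.
bool LockWithTimeout(std::timed_mutex& m, std::chrono::milliseconds timeout)
{
  if (m.try_lock_for(timeout))
    return true;
  std::fprintf(stderr, "lock not acquired within %lld ms: possible deadlock\n",
               static_cast<long long>(timeout.count()));
  return false;
}

// Simulates the "missing unlock" case: this thread holds the mutex and does
// not release it while a second thread tries to take it with a bounded wait.
bool TryWhileHeldElsewhere(std::chrono::milliseconds timeout)
{
  std::timed_mutex m;
  m.lock();                          // the holder that "forgot" to unlock
  bool acquired = false;
  std::thread waiter([&] {
    acquired = LockWithTimeout(m, timeout);
    if (acquired) m.unlock();        // not expected while the holder keeps it
  });
  waiter.join();
  m.unlock();
  return acquired;
}
```

A bounded wait like this is mainly a debugging aid: if it fires, some code path took the lock and never released it.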
>
>       On 19 Mar 2018, at 11:40 AM, Greg D. <gregd2350 at gmail.com>
>       wrote:
>
> I wonder if those locks are the cause of the hangs I'm getting.  Do
> they have timers on them, or will they wait forever?
>
> Greg
>
>
> Mark Webb-Johnson wrote:
> Messy. The issue is the restarting of mdns, coupled with AP/STA coming
> up/down in different threads. That could result in an mdns_free while
> another mdns_free or mdns_init is in progress. The mdns library is not
> thread safe. In particular, they mutex-protect some internal stuff, but
> not all of mdns_free or any of mdns_init.
>
> I introduced an ovms_mutex library to main. This contains some helpful
> wrapper functions to make mutexes easy and safe to use:
>
>       class OvmsMutex
>
>             Encapsulates a mutex. Provides Lock() and Unlock() functions
>             to access it.
>
>
>       class OvmsMutexLock
>
>             Encapsulates a mutex lock in a safer manner.
>
>             The constructor locks it, and the destructor unlocks it.
>
>             Using this is as simple as creating a mutex for the object to
>             be protected, and then creating an OvmsMutexLock(&mutex) over
>             that. When the OvmsMutexLock comes into scope, it will lock the
>             mutex, and when it goes out of scope it will unlock it.
>
>
>       I’ve done this in other projects, with great
>       success. Using OvmsMutexLock objects means you
>       never forget to unlock the mutex (no matter how
>       weird and wonderful your code paths are).
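A minimal sketch of the two classes as described, assuming they wrap std::mutex (the actual ovms_mutex implementation may wrap a FreeRTOS primitive instead and may differ in detail). IncrementUnlessNegative and g_counter are hypothetical, used only to show that an early return or an exception still releases the lock:

```cpp
#include <mutex>
#include <stdexcept>

// Sketch of OvmsMutex as described (assumption: built on std::mutex).
class OvmsMutex
{
public:
  void Lock()   { m_mutex.lock(); }
  void Unlock() { m_mutex.unlock(); }
private:
  std::mutex m_mutex;
};

// Sketch of OvmsMutexLock: locks in the constructor, unlocks in the
// destructor, so every exit path releases the mutex.
class OvmsMutexLock
{
public:
  explicit OvmsMutexLock(OvmsMutex* mutex) : m_mutex(mutex) { m_mutex->Lock(); }
  ~OvmsMutexLock() { m_mutex->Unlock(); }
  OvmsMutexLock(const OvmsMutexLock&) = delete;             // no copies: one
  OvmsMutexLock& operator=(const OvmsMutexLock&) = delete;  // unlock per lock
private:
  OvmsMutex* m_mutex;
};

// Hypothetical protected state, for illustration only.
OvmsMutex g_counter_mutex;
int g_counter = 0;

// The throw and the early return both leave the mutex unlocked,
// because ~OvmsMutexLock runs on every path out of the function.
int IncrementUnlessNegative(int delta)
{
  OvmsMutexLock lock(&g_counter_mutex);
  if (delta < 0)
    throw std::invalid_argument("negative delta");
  if (delta == 0)
    return g_counter;
  g_counter += delta;
  return g_counter;
}
```

Give the lock object a name, as in `OvmsMutexLock lock(&mutex);`. An unnamed temporary such as `OvmsMutexLock{&mutex};` would unlock again at the end of that statement.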
>
>
> I then used OvmsMutex and OvmsMutexLock in ovms_mdns. The mdns
> stuff seems ok for me now.
>
> Regards, Mark.
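A minimal sketch of the serialisation Mark describes, assuming one mutex guards both start and stop and a running flag makes repeated calls harmless. FakeMdnsInit and FakeMdnsFree are hypothetical stand-ins that only count calls; they are not the esp-idf mdns_init()/mdns_free(), and MdnsService is not the actual ovms_mdns code. std::lock_guard plays the same role as OvmsMutexLock:

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the non-thread-safe library calls; they only
// count how often they run.
static int g_init_calls = 0;
static int g_free_calls = 0;
static void FakeMdnsInit() { ++g_init_calls; }
static void FakeMdnsFree() { ++g_free_calls; }

// Hypothetical wrapper: one mutex serialises every start/stop, so an init
// can never overlap a free, and the flag makes repeats no-ops.
class MdnsService
{
public:
  void Start()
  {
    std::lock_guard<std::mutex> lock(m_mutex);
    if (m_running) return;          // already started: nothing to do
    FakeMdnsInit();
    m_running = true;
  }
  void Stop()
  {
    std::lock_guard<std::mutex> lock(m_mutex);
    if (!m_running) return;         // never started or already stopped
    FakeMdnsFree();
    m_running = false;
  }
private:
  std::mutex m_mutex;
  bool m_running = false;
};

// Simulates AP and STA events on different threads all stopping mdns at once.
void ConcurrentStops(MdnsService& svc, int threads)
{
  std::vector<std::thread> pool;
  for (int i = 0; i < threads; ++i)
    pool.emplace_back([&svc] { svc.Stop(); });
  for (auto& t : pool) t.join();
}
```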
>
>       On 19 Mar 2018, at 9:51 AM, Mark Webb-Johnson
>       <mark at webb-johnson.net> wrote:
>
>
> It seems that mdns_free is being called while another mdns_free is
> still running…
>
> It is trivial to fix it in the timer deletion code, but
> that just delays the problem until a bit later (the
> hostname free).
>
> I am looking at this now.
>
> Regards, Mark.
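Fixing only the timer deletion leaves every later step of the teardown (such as the hostname free) open to the same double call. A minimal sketch of guarding the whole teardown instead, assuming an atomic flag claimed with exchange() so that exactly one caller runs it. FakeMdnsState, FakeInit, FakeFree and ConcurrentFrees are hypothetical. This alone does not stop an init racing with a free, which is why a mutex around both, as with OvmsMutexLock, is the broader fix:

```cpp
#include <atomic>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical resources standing in for the timer and hostname that the
// mdns teardown releases; not the esp-idf internals.
struct FakeMdnsState
{
  int* timer = nullptr;
  char* hostname = nullptr;
  std::atomic<bool> freed{false};
  std::atomic<int> teardown_runs{0};
};

void FakeInit(FakeMdnsState& s, const char* name)
{
  s.timer = new int(0);
  s.hostname = new char[std::strlen(name) + 1];
  std::strcpy(s.hostname, name);
  s.freed = false;
}

// exchange(true) returns the previous value, so only the first caller sees
// false and runs the full teardown; later or concurrent callers return
// without touching resources that are already being freed.
void FakeFree(FakeMdnsState& s)
{
  if (s.freed.exchange(true))
    return;
  s.teardown_runs.fetch_add(1);
  delete s.timer;
  s.timer = nullptr;
  delete[] s.hostname;
  s.hostname = nullptr;
}

// Simulates several threads calling the teardown at the same time.
void ConcurrentFrees(FakeMdnsState& s, int threads)
{
  std::vector<std::thread> pool;
  for (int i = 0; i < threads; ++i)
    pool.emplace_back([&s] { FakeFree(s); });
  for (auto& t : pool) t.join();
}
```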
>
>       On 19 Mar 2018, at 9:40 AM, Stephen Casner
>       <casner at acm.org> wrote:
>
>       On Sun, 18 Mar 2018, Michael Balzer wrote:
>
>             From a first look at the mdns code in your backtrace, I would
>             guess that's a double free() for the timer; it seems the
>             timer_handle never gets NULLed.
>
>
>       That may be a bug (deficiency) in the esp-idf mDNS code.
>
>       Perhaps I hit it because OvmsMDNS::StopMDNS was called without
>       everything coming up correctly first?
>
>                                                             -- Steve
>       _______________________________________________
>       OvmsDev mailing list
>       OvmsDev at lists.teslaclub.hk
>       http://lists.teslaclub.hk/mailman/listinfo/ovmsdev
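Michael's guess points at a common C pattern: a handle that is freed but never reset to NULL gets freed again on the next stop. A minimal sketch of the free-then-NULL idiom, which also covers Steve's case of a stop arriving before start completed, because the handle starts out as nullptr. FakeTimer, FakeTimerDelete and FakeService are hypothetical stand-ins, not the esp-idf mdns internals; on its own this is not thread safe, so concurrent stops still need a lock:

```cpp
// Hypothetical timer type and delete function standing in for the real
// handle and its delete call; the delete only counts how often it runs.
struct FakeTimer { int id; };
static int g_timer_deletes = 0;
static void FakeTimerDelete(FakeTimer* t) { delete t; ++g_timer_deletes; }

// Hypothetical service state: the handle starts as nullptr, so a Stop()
// that runs before Start() ever created the timer has nothing to free.
struct FakeService
{
  FakeTimer* timer_handle = nullptr;

  void Start() { if (!timer_handle) timer_handle = new FakeTimer{1}; }

  // Free-then-NULL: a second Stop() sees nullptr and does nothing,
  // instead of freeing the same pointer twice.
  void Stop()
  {
    if (timer_handle) {
      FakeTimerDelete(timer_handle);
      timer_handle = nullptr;
    }
  }
};
```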
>
>

