[Ovmsdev] Netmanager priority issue solved
Michael Balzer
dexter at expeedo.de
Fri Jul 24 22:02:25 HKT 2020
It seems we've got another mutex issue in duktape:
OVMS# mo ta
Number of Tasks = 22          Stack:  Now   Max  Total    Heap 32-bit SPIRAM   C# PRI CPU% BPR/MH
3FFAFB88 1 Blk esp_timer             436   708   4096   40288    644  31232    0  22   0%  22/ 0
3FFC0E90 2 Blk eventTask             476  1884   4608     104      0      0    0  20   0%  20/ 0
3FFC3314 3 Blk OVMS Events           704  3360   8192   92364      0  35464    1   8   1%   8/ 0
3FFC6764 4 Rdy OVMS DukTape          496 10864  12288     580      0 189492    1   3  10%   3/ 42
…with the DukTape mutex hold count increasing once per minute. I'll have a look.
Regards,
Michael
Am 24.07.20 um 15:27 schrieb Michael Balzer:
> TL;DR: you need to pull my latest esp-idf changes.
>
> I've finally found & solved the strange priority changes of our
> netmanager task (i.e. its priority suddenly being raised from the
> original 5 to 18/22): the bug was in the esp-idf posix threads mutex
> implementation.
>
> I had suspected the mutex priority inheritance for a while, so I added
> a way to retrieve the internal mutex hold count for our task list.
> Using this I noticed the hold count would always & only rise by 2
> whenever any kind of mongoose connection was closed.
>
> That led me to check the thread concurrency protection for mongoose
> mbufs, because every mongoose connection has two mbufs associated (rx &
> tx), each protected by a posix mutex.
>
> The bug was: posix mutexes were deleted while still locked (taken).
> FreeRTOS mutexes must not be deleted while taken; that breaks the
> priority inheritance (more precisely the disinheritance), with the
> visible effect that the mutex hold count never returns to zero.
>
> As a side effect, this may also solve the strange event task starvations
> (hope so…). I was investigating this as I suspected some busy loop in
> the netmanager context. With the netman running at prio 22, that would
> effectively block almost all other processing including the timer
> service. I've found & fixed one potential busy loop trigger in the
> netman, which would have been caused by the netman task still running
> after all interfaces had been lost -- not sure if that could happen,
> but it would explain the effects.
>
> So please watch your crash debug info & report if the issue still turns up.
>
> Regards,
> Michael
>
--
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26