[Ovmsdev] Netmanager priority issue solved

Michael Balzer dexter at expeedo.de
Sat Jul 25 00:00:27 HKT 2020


Fixed. I fell for the same assumption in the OvmsMutex / OvmsRecMutex
implementation.

Regards,
Michael


Am 24.07.20 um 16:02 schrieb Michael Balzer:
> It seems we've got another mutex issue in duktape:
>
> OVMS# mo ta
> Number of Tasks = 22        Stack:  Now   Max Total    Heap 32-bit SPIRAM C# PRI CPU% BPR/MH
> 3FFAFB88    1 Blk esp_timer         436   708  4096   40288    644  31232  0  22   0%  22/ 0
> 3FFC0E90    2 Blk eventTask         476  1884  4608     104      0      0  0  20   0%  20/ 0
> 3FFC3314    3 Blk OVMS Events       704  3360  8192   92364      0  35464  1   8   1%   8/ 0
> 3FFC6764    4 Rdy OVMS DukTape      496 10864 12288     580      0 189492  1   3  10%   3/42
>
> …increasing once per minute. I'll have a look.
>
> Regards,
> Michael
>
>
> Am 24.07.20 um 15:27 schrieb Michael Balzer:
>> TL;DR: you need to pull my latest esp-idf changes.
>>
>> I've finally found & solved the strange priority changes for our
>> netmanager task (i.e. being raised suddenly from origin 5 to 18/22): the
>> bug was in the esp-idf posix threads mutex implementation.
>>
>> I had suspected the mutex priority inheritance for a while, so I added a
>> way to retrieve the internal mutex hold count for our task list. Using
>> this, I noticed the hold count would always, and only, rise by 2 whenever
>> any kind of mongoose connection was closed.
>>
>> That led me to check the thread concurrency protection for the mongoose
>> mbufs, because every mongoose connection has two mbufs associated (rx &
>> tx), each protected by a posix mutex.
>>
>> The bug was: posix mutexes were deleted after locking (taking) them.
>> FreeRTOS mutexes must not be deleted while being taken, as that breaks
>> the priority inheritance (more precisely, the disinheritance), with the
>> visible effect being the mutex hold count not returning to zero.
>>
>> As a side effect, this may also solve the strange event task starvations
>> (hope so…). I was investigating this as I suspected some busy loop in
>> the netmanager context. With the netman running at prio 22, that would
>> effectively block almost all other processing including the timer
>> service. I've found & fixed one potential busy loop trigger in the
>> netman that would have been caused by the netman task still running
>> while all interfaces had been lost -- not sure if that could happen, but
>> it would explain the effects.
>>
>> So please watch your crash debug info & report if the issue still turns up.
>>
>> Regards,
>> Michael
>>
>

-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26

