Michael,
I confirm the same lack of crashes, and I agree that the problem is not strictly with WolfSSL. It seems that a task WDT ought to be able to report where the task was executing at the time. That would certainly be helpful in narrowing down the type of problem. I'm not sure if the trap to gdb mode would provide that info since I observed no response on the USB console when the module went comatose.
-- Steve
On Sat, 24 Apr 2021, Michael Balzer wrote:
Steve,
in the two weeks since disabling TLS on the server V2 connection I haven't had a single crash. While that's not yet proof that the watchdog issue is TLS related, it's at least a strong indicator.
The watchdog triggers if the idle task on a core doesn't get a CPU share for 120 seconds. If the TLS functions block a CPU for more than a few seconds, that's already pretty bad, as that means TLS will cause delays in CAN processing (disrupting protocol transfers) and can possibly cause frame drops and queue overflows. Blocking the whole system for more than 120 seconds is totally unacceptable.
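To make the trigger condition concrete, here is a minimal sketch of a per-core idle watchdog. It is only a model of the rule described above, not the ESP-IDF task watchdog API; the class `IdleWatchdog` and its methods are hypothetical names, and time is passed in as plain seconds.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a per-core idle watchdog (not the ESP-IDF API).
class IdleWatchdog {
public:
    IdleWatchdog(std::size_t cores, std::uint32_t timeout_s)
        : timeout_s_(timeout_s), last_idle_s_(cores, 0) {}

    // Record that the idle task on `core` got a CPU share at time `now_s`.
    void idleRan(std::size_t core, std::uint32_t now_s) {
        last_idle_s_[core] = now_s;
    }

    // Return the index of the first core whose idle task has been starved
    // for at least the timeout, or -1 if no core is starved.
    int check(std::uint32_t now_s) const {
        for (std::size_t core = 0; core < last_idle_s_.size(); ++core) {
            if (now_s - last_idle_s_[core] >= timeout_s_) {
                return static_cast<int>(core);
            }
        }
        return -1;
    }

private:
    std::uint32_t timeout_s_;
    std::vector<std::uint32_t> last_idle_s_;
};
```

With a 120 second timeout, a core whose idle task last ran at t = 10 s is reported at t = 130 s, even if the other core has been idling normally the whole time.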
This doesn't feel like a calculation / math performance issue; it feels more like a bug, and that may imply a security issue as well.
But I don't think this is caused by WolfSSL, as the issue has been present with mbedTLS as well; it just didn't occur as frequently. Maybe some race condition with the LwIP task?
Regards, Michael
On 11.04.21 at 09:44, Michael Balzer wrote:
Steve,
I can confirm an increase of these events since we changed to WolfSSL, about once every three days currently for me. The frequency was much lower before, more like once or twice per month.
I've disabled TLS on my module now and will report if that helps.
Regards, Michael
On 10.04.21 at 21:20, Stephen Casner wrote:
Michael,
As you saw from my earlier emails, I was getting these crashes typically after less than 24 hours of operation. I changed my config to disable TLS on server v2 and rebooted 2021-04-05 23:36:04.648 PDT and there has not been a crash since. So it definitely appears to be correlated with the additional processing to support TLS.
-- Steve
On Sun, 4 Apr 2021, Michael Balzer wrote:
Steve,
that's the problem with this issue: it's totally unclear what causes this.
The signal dropping begins when the queue is full, which happens after the task has been blocked for roughly as many seconds as the queue has entries. So there is no logged activity that could cause this; your module basically went into this state from idling.
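The relation between queue size and the onset of dropping can be sketched as follows. This is a simplified model, assuming one signal is posted per second while the consumer task is blocked and removes nothing; the function name and the queue size used in the example are hypothetical, not values taken from the firmware.

```cpp
#include <cstddef>

// Outcome of posting signals into a bounded queue with a blocked consumer.
struct BlockedQueueResult {
    std::size_t queued;   // signals still sitting in the queue
    std::size_t dropped;  // signals lost because the queue was full
};

// Hypothetical model: one signal per second is posted for `blocked_seconds`
// while the consuming task is blocked (assumption: exactly one per second).
BlockedQueueResult simulateBlockedConsumer(std::size_t queue_size,
                                           std::size_t blocked_seconds) {
    BlockedQueueResult result{0, 0};
    for (std::size_t t = 0; t < blocked_seconds; ++t) {
        if (result.queued < queue_size) {
            ++result.queued;   // still room: the signal is queued
        } else {
            ++result.dropped;  // queue full: the signal is dropped
        }
    }
    return result;
}
```

For an assumed queue of 20 entries, nothing is dropped during the first 20 seconds of blocking; every second after that loses one signal, which is why the first drop message appears well after the task actually stalled.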
Regards, Michael
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26