[Ovmsdev] OVMS going comatose

Stephen Casner casner at acm.org
Sun Apr 25 00:47:22 HKT 2021


Michael,

I confirm the same lack of crashes, and I agree that the problem is
not strictly with WolfSSL.

It seems that a task WDT ought to be able to report where the task was
executing at the time.  That would certainly be helpful in narrowing
down the type of problem.  I'm not sure if the trap to gdb mode would
provide that info since I observed no response on the USB console when
the module went comatose.

                                                        -- Steve

On Sat, 24 Apr 2021, Michael Balzer wrote:

> Steve,
>
> In the two weeks since disabling TLS on the server V2 connection I haven't had
> a single crash. While that's not yet proof that the watchdog issue is TLS
> related, it's at least a strong indicator.
>
> The watchdog triggers if the idle task on a core doesn't get a CPU share for
> 120 seconds. If the TLS functions block a CPU for more than a few seconds,
> that's already pretty bad, as that means TLS will cause delays in CAN
> processing (disrupting protocol transfers) and can possibly cause frame drops
> and queue overflows. Blocking the whole system for more than 120 seconds is
> totally unacceptable.
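
As a rough illustration of the timeout rule described above, here is a
minimal model in plain C. It is not the ESP-IDF task watchdog API; the
names idle_wdt_t, idle_wdt_feed and idle_wdt_expired are hypothetical,
and only the 120 second figure comes from this thread. The assumption is
that the idle task records a timestamp each time it gets to run, and the
watchdog fires once that timestamp is 120 seconds old or more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Timeout quoted in the thread; everything else here is illustrative. */
#define IDLE_WDT_TIMEOUT_S 120u

/* Hypothetical per-core watchdog state: when the idle task last ran. */
typedef struct {
    uint32_t last_feed_s;
} idle_wdt_t;

/* Called (in this model) whenever the idle task gets a CPU share. */
static void idle_wdt_feed(idle_wdt_t *w, uint32_t now_s)
{
    assert(w != NULL);
    w->last_feed_s = now_s;
}

/* True once the idle task has been starved for the full timeout.
 * Unsigned subtraction keeps the comparison valid across counter wrap. */
static bool idle_wdt_expired(const idle_wdt_t *w, uint32_t now_s)
{
    assert(w != NULL);
    return (uint32_t)(now_s - w->last_feed_s) >= IDLE_WDT_TIMEOUT_S;
}
```

In this model, if the idle task last ran at t=10 s, the check stays false
through t=129 s and becomes true at t=130 s.
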
>
> This doesn't feel like a calculation / math performance issue; it feels more
> like a bug, and that may imply a security issue as well.
>
> But I don't think this is caused by WolfSSL, as the issue has been present
> with mbedTLS as well; it just didn't occur as frequently. Maybe some race
> condition with the LwIP task?
>
> Regards,
> Michael
>
>
> On 11 Apr 2021 at 09:44, Michael Balzer wrote:
> > Steve,
> >
> > I can confirm an increase of these events since we changed to WolfSSL, about
> > once every three days currently for me. The frequency was much lower before,
> > more like once or twice per month.
> >
> > I've disabled TLS on my module now and will report if that helps.
> >
> > Regards,
> > Michael
> >
> >
> > On 10 Apr 2021 at 21:20, Stephen Casner wrote:
> > > Michael,
> > >
> > > As you saw from my earlier emails, I was getting these crashes
> > > typically after less than 24 hours of operation.  I changed my config
> > > to disable TLS on server v2 and rebooted 2021-04-05 23:36:04.648 PDT
> > > and there has not been a crash since.  So it definitely appears to be
> > > correlated with the additional processing to support TLS.
> > >
> > >                                                          -- Steve
> > >
> > > On Sun, 4 Apr 2021, Michael Balzer wrote:
> > >
> > > > Steve,
> > > >
> > > > that's the problem with this issue, it's totally unclear what
> > > > causes this.
> > > >
> > > > The signal dropping begins when the queue is full, which happens
> > > > after the task has been blocked for ~as many seconds as the queue is
> > > > big. So there is no logged activity that could cause this, your module
> > > > basically went into this from idling.
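
The queue behaviour described above can be sketched with a small
simulation in plain C. This is only a model under stated assumptions: one
event is posted per second (which is what "as many seconds as the queue is
big" implies), the consumer task is completely blocked, and the names
bounded_queue_t, bq_push and first_drop_second, as well as the capacity of
20 used in the example, are hypothetical and not taken from the OVMS
source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size event queue; only the fill level is modelled. */
typedef struct {
    size_t capacity;  /* maximum number of pending events */
    size_t count;     /* events currently waiting for the consumer */
    size_t dropped;   /* events rejected because the queue was full */
} bounded_queue_t;

/* Try to post one event. Returns 1 if queued, 0 if it had to be dropped. */
static int bq_push(bounded_queue_t *q)
{
    assert(q != NULL);
    if (q->count == q->capacity) {
        q->dropped++;
        return 0;
    }
    q->count++;
    return 1;
}

/* Post one event per second while the consumer is blocked for blocked_s
 * seconds. Returns the second at which the first event is dropped, or 0
 * if the queue never overflows within that time. */
static unsigned first_drop_second(size_t capacity, unsigned blocked_s)
{
    bounded_queue_t q = { capacity, 0, 0 };
    for (unsigned t = 1; t <= blocked_s; t++) {
        if (!bq_push(&q)) {
            return t;
        }
    }
    return 0;
}
```

With the hypothetical capacity of 20, first_drop_second(20, 60) returns 21:
twenty events fill the queue and the twenty-first is the first one dropped.
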
> > > >
> > > > Regards,
> > > > Michael
> > >
> >
>
> --
> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26

