[Ovmsdev] v3 hardware disconnecting from v2 server

Mark Webb-Johnson mark at webb-johnson.net
Mon Mar 26 15:36:38 HKT 2018


> But this will only catch the problem if there is some task that is
> running away.  It could be some kind of synchronization block instead.

Yes, that is my guess (a sync block).

> Another
> possibility that could be used in scenarios where wifi is still
> working and doesn't need to be brought up/down would be to connect
> with telnet or ssh and then issue the simcom command in that
> console.

In my experience, the async console is fine up until the time we touch the network, at which point it locks up. I think the issue is in the TiT tcp/ip task, which is the only viable explanation for those tcp/ip timeouts I see on wifi, shortly after ppp goes down:

I (136619030) gsm-ppp: Shutting down (hard)...
I (136619040) events: Signal(system.modem.down)
I (136619040) netmanager: Interface priority is st1 (x.y.z.212/255.255.248.0 gateway x.y.z.64)

I (179290100) wifi: bcn_timout,ap_probe_send_start
I (179292610) wifi: ap_probe_send over, resett wifi status to disassoc
I (179292610) wifi: state: run -> init (1)
I (179292620) wifi: pm stop, total sleep time: 0/138543477
I (179292620) wifi: n:13 0, o:13 0, ap:255 255, sta:13 0, prof:1

Going by the timestamps, though, that is 179292 - 136619 = 42673 seconds, or nearly 12 hours!
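
For anyone wanting to repeat that subtraction on other log excerpts, here is a minimal sketch in Python. It assumes the number in parentheses is milliseconds since boot, which is how the figures above were read; the function names are made up for illustration and are not part of the OVMS tooling.

```python
import re

# Matches log lines such as "I (136619030) gsm-ppp: Shutting down (hard)..."
# Assumption: the bracketed number is milliseconds since boot.
LOG_LINE = re.compile(r"^[EWIDV] \((\d+)\) [^:]+: ")

def timestamp_ms(line):
    """Return the millisecond timestamp of a log line, or None if absent."""
    match = LOG_LINE.match(line)
    return int(match.group(1)) if match else None

def gap_hours(earlier, later):
    """Elapsed time between two log lines, in hours."""
    return (timestamp_ms(later) - timestamp_ms(earlier)) / 3_600_000

modem_down = "I (136619030) gsm-ppp: Shutting down (hard)..."
wifi_timeout = "I (179290100) wifi: bcn_timout,ap_probe_send_start"
print(f"{gap_hours(modem_down, wifi_timeout):.2f} hours")  # prints "11.85 hours"
```

Fed the modem shutdown line and the first wifi beacon timeout line from the excerpt, it reports a gap of a little under 12 hours.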

With Steve’s latest extension to ‘module tasks stack’, we should be able to narrow this down. I think I will first see whether the watchdog works around it.

Regards, Mark.

> On 26 Mar 2018, at 12:28 AM, Stephen Casner <casner at acm.org> wrote:
> 
> Enabling the watchdog timer is a good idea.  I've had it enabled in my
> config for some time.  When I was recently working on enhancements to
> the "module memory" command and had an infinite loop in my code, I got
> a timer trap:
> 
> Task watchdog got triggered. The following tasks did not reset the watchdog in time:
> - IDLE (CPU 1)
> Tasks currently running:
> CPU 0: IDLE
> CPU 1: AsyncConsole
> 
> But this will only catch the problem if there is some task that is
> running away.  It could be some kind of synchronization block instead.
> 
> If there is a particular command that seems to get stuck, we could
> consider (temporarily) invoking that command as a separate task so
> that the async console would still be usable to investigate.  Another
> possibility that could be used in scenarios where wifi is still
> working and doesn't need to be brought up/down would be to connect
> with telnet or ssh and then issue the simcom command in that
> console.
> 
> I have added printing of each task's state in the "module tasks"
> output, but I have yet to observe any task in the "Run" state.  (I
> would expect AsyncConsole to be in Run state while executing that
> command, but it is not.)
> 
> I might be able to add an option on the command to print each task's
> PC, which the python code would look up to translate to a source
> line number.
> 
>                                                        -- Steve
> 
> On Sun, 25 Mar 2018, Mark Webb-Johnson wrote:
> 
>> I think that the housekeeping task is locked up. With the latest code I have, the per-10-minute housekeeping message has stopped.
>> 
>> My guess is still the ppp code, during session teardown.
>> 
>> I sent a detailed message with my analysis of this an hour or so ago.
>> 
>> Regards, Mark.
>> 
>>> On 25 Mar 2018, at 5:52 PM, Tom Parker <tom at carrott.org> wrote:
>>> 
>>> On 25/03/18 02:41, Mark Webb-Johnson wrote:
>>>> The issue I was having was some task (I suspect lwip) gets messed up. Stopping the modem, trying to connect wifi, etc, all just locked up the async console. I think you had the same?
>>> 
>>> Rewinding this thread to the "'power simcom off' freezes the console" aspect, as distinct from the "disabling the SIM card" issue: I had the module disconnect again this evening, though without a datalogger attached. Some more information about this state:
>>> 
>>> The mux is up
>>> AT communication with the simcom works on channel 3 and logs the expected tx and rx
>>> The simcom is connected to the cellular network
>>> No log messages recording periodic communications with the simcom are being recorded
>>> The monotonic and park time counters are not advancing.
>>> 
>>> Could the cause of this problem be that the timers have stopped? This could explain why the periodic interrogation of the simcom has stopped, and perhaps simcom power off has a spin wait or other loop waiting forever for the timer to advance?
>>> 
>>> What is the best way to investigate the state of the timers and periodic execution?
>>> 
>>> See attached for a transcript of this debugging session.
>>> <ovms_2018-03-25T09_29_07+0000.log.bz2>
>>> _______________________________________________
>>> OvmsDev mailing list
>>> OvmsDev at lists.teslaclub.hk
>>> http://lists.teslaclub.hk/mailman/listinfo/ovmsdev
