But this will only catch the problem if there is some task that is
running away.  It could be some kind of synchronization block instead.

Yes, that is my guess (a sync block).

Another possibility that could be used in scenarios where wifi is still
working and doesn't need to be brought up/down would be to connect
with telnet or ssh and then issue the simcom command in that
console.

From my experience, the async console is fine up until the time we touch the network, at which point it locks up. I think the issue is in the TiT tcp/ip task (the only viable explanation for the tcp/ip timeouts I see on wifi shortly after ppp goes down):

I (136619030) gsm-ppp: Shutting down (hard)...
I (136619040) events: Signal(system.modem.down)
I (136619040) netmanager: Interface priority is st1 (x.y.z.212/255.255.248.0 gateway x.y.z.64)

I (179290100) wifi: bcn_timout,ap_probe_send_start
I (179292610) wifi: ap_probe_send over, resett wifi status to disassoc
I (179292610) wifi: state: run -> init (1)
I (179292620) wifi: pm stop, total sleep time: 0/138543477
I (179292620) wifi: n:13 0, o:13 0, ap:255 255, sta:13 0, prof:1

I think that is 179292 - 136619 = 42673 seconds, or nearly 12 hours, though!
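The subtraction above treats the bracketed log numbers as time since boot. Assuming they are ESP-IDF's default millisecond timestamps, a small helper makes the gap between the ppp shutdown and the wifi beacon timeout explicit. The regex and helper names are illustrative only:

```python
import re

# Assumption: lines look like "I (<ms since boot>) <tag>: <message>",
# i.e. ESP-IDF's default log prefix with millisecond timestamps.
LOG_PREFIX = re.compile(r"^[EWIDV] \((\d+)\) ([^:]+):")


def timestamp_ms(line: str) -> int:
    """Extract the milliseconds-since-boot value from one log line."""
    match = LOG_PREFIX.match(line)
    if match is None:
        raise ValueError(f"not an ESP-IDF style log line: {line!r}")
    return int(match.group(1))


def hours_between(earlier: str, later: str) -> float:
    """Elapsed time between two log lines, in hours."""
    return (timestamp_ms(later) - timestamp_ms(earlier)) / 3_600_000


ppp_down = "I (136619030) gsm-ppp: Shutting down (hard)..."
wifi_lost = "I (179290100) wifi: bcn_timout,ap_probe_send_start"
print(f"{hours_between(ppp_down, wifi_lost):.2f} hours")  # 11.85 hours
```

The result agrees with the rough estimate: the two events are roughly half a day apart.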

With Steve’s latest extension to ‘module tasks stack’, we should be able to narrow this down. I think I will first see if the watchdog works around it.

Regards, Mark.

On 26 Mar 2018, at 12:28 AM, Stephen Casner <casner@acm.org> wrote:

Enabling the watchdog timer is a good idea.  I've had it enabled in my
config for some time.  When I was recently working on enhancements to
the "module memory" command and had an infinite loop in my code, I got
a timer trap:

Task watchdog got triggered. The following tasks did not reset the watchdog in time:
- IDLE (CPU 1)
Tasks currently running:
CPU 0: IDLE
CPU 1: AsyncConsole

But this will only catch the problem if there is some task that is
running away.  It could be some kind of synchronization block instead.
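The distinction between a runaway task and a synchronization block can be illustrated with a toy model of the task watchdog (a simplification, not the ESP-IDF implementation): the IDLE task on each CPU resets the watchdog only when it gets to run, so a task spinning on that CPU starves it, while a task blocked on a semaphore or mutex simply leaves the CPU to IDLE. The class and function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpu: int
    spinning: bool  # True: runs without ever yielding; False: blocked/waiting


def watchdog_report(tasks, cpus=(0, 1), ticks=50, timeout=20):
    """Toy task-watchdog model. IDLE on a CPU only runs (and resets the
    watchdog) in ticks where no task pinned to that CPU is spinning."""
    last_reset = {cpu: 0 for cpu in cpus}
    starved = set()
    for now in range(1, ticks + 1):
        for cpu in cpus:
            hogged = any(t.cpu == cpu and t.spinning for t in tasks)
            if not hogged:
                last_reset[cpu] = now  # IDLE got CPU time and fed the watchdog
            elif now - last_reset[cpu] > timeout:
                starved.add(f"IDLE (CPU {cpu})")
    return sorted(starved)


runaway = [Task("AsyncConsole", cpu=1, spinning=True)]
deadlocked = [Task("AsyncConsole", cpu=1, spinning=False)]
print(watchdog_report(runaway))     # ['IDLE (CPU 1)']
print(watchdog_report(deadlocked))  # []
```

In the second case nothing trips, which is why a sync block could leave the console hung without any watchdog report.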

If there is a particular command that seems to get stuck, we could
consider (temporarily) invoking that command as a separate task so
that the async console would still be usable to investigate.  Another
possibility that could be used in scenarios where wifi is still
working and doesn't need to be brought up/down would be to connect
with telnet or ssh and then issue the simcom command in that
console.
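Running a suspect command as its own task could look roughly like the following, with Python threads standing in for FreeRTOS tasks. This is a sketch under that assumption; `run_detached` and `simcom_power_off` are hypothetical names, and the stand-in command just blocks forever on an event that is never set:

```python
import threading


def run_detached(command, *args, timeout=1.0):
    """Run `command` in its own thread and wait at most `timeout` seconds.

    Returns (finished, value). If the command is stuck, the caller (standing
    in for the async console) still gets control back, and the worker thread
    keeps running in the background where it can be inspected.
    """
    result = {}

    def worker():
        result["value"] = command(*args)

    thread = threading.Thread(target=worker, name=f"cmd-{command.__name__}", daemon=True)
    thread.start()
    thread.join(timeout)
    finished = not thread.is_alive()
    return finished, result.get("value")


never_set = threading.Event()


def simcom_power_off():
    """Hypothetical stand-in for a command that hangs on a sync block."""
    never_set.wait()  # blocks forever: nothing ever calls never_set.set()
    return "powered off"


print(run_detached(simcom_power_off, timeout=0.1))  # (False, None)
print(run_detached(lambda: "ok", timeout=1.0))      # (True, 'ok')
```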

I have added printing of each task's state in the "module tasks"
output, but I have yet to observe any task in the "Run" state.  (I
would expect AsyncConsole to be in Run state while executing that
command, but it is not.)

I might be able to add an option on the command to print each task's
PC, which the python code would look up to translate to a source
line number.
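The PC-to-source lookup amounts to finding the symbol whose address range contains each PC. A minimal sketch with a made-up symbol table follows; in practice the table (and exact source lines, which need the ELF's debug line information) would come from the firmware image, for example via the toolchain's addr2line (`xtensa-esp32-elf-addr2line`). All addresses and function names here are hypothetical:

```python
import bisect

# Hypothetical symbol table: (start address, size in bytes, function name),
# sorted by start address. Real values would be read from the firmware ELF.
SYMBOLS = [
    (0x400D0000, 0x0200, "app_main"),
    (0x400D1200, 0x0400, "console_task_loop"),
    (0x400D2400, 0x0180, "modem_shutdown"),
]
_STARTS = [start for start, _size, _name in SYMBOLS]


def lookup_pc(pc: int):
    """Return 'function+0xoffset' for a PC, or None if no symbol covers it."""
    i = bisect.bisect_right(_STARTS, pc) - 1
    if i < 0:
        return None  # below the first known function
    start, size, name = SYMBOLS[i]
    if pc >= start + size:
        return None  # falls in a gap between known functions
    return f"{name}+0x{pc - start:x}"


for pc in (0x400D1234, 0x400D2450, 0x400C0000):
    print(hex(pc), lookup_pc(pc))
```

Sorting by start address keeps each lookup a binary search, which matters little for a handful of tasks but scales to a full firmware symbol table.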

                                                       -- Steve

On Sun, 25 Mar 2018, Mark Webb-Johnson wrote:

I think that the housekeeping task is locked up. With the latest code I have, the per-10-minute housekeeping message has stopped.

My guess is still the ppp code, during session teardown.

I sent a detailed message on this an hour or so ago, with my analysis.

Regards, Mark.

On 25 Mar 2018, at 5:52 PM, Tom Parker <tom@carrott.org> wrote:

On 25/03/18 02:41, Mark Webb-Johnson wrote:
The issue I was having was that some task (I suspect lwip) gets messed up. Stopping the modem, trying to connect wifi, etc, all just locked up the async console. I think you had the same?

Rewinding this thread to the 'power simcom off freezes the console' aspect, as distinct from the 'disabling the simcard' issue: I had the module disconnect again this evening, though without a datalogger attached. Some more information about this state:

- The mux is up
- AT communication with the simcom works on channel 3 and logs the expected tx and rx
- The simcom is connected to the cellular network
- No log messages from the periodic communication with the simcom are being recorded
- The monotonic and park time counters are not advancing

Could the cause of this problem be that the timers have stopped? This could explain why the periodic interrogation of the simcom has stopped, and perhaps simcom power off has a spin wait or other loop waiting forever for the timer to advance?
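The spin-wait hypothesis can be made concrete: a loop that polls a tick counter for its deadline never exits if that counter is frozen and the awaited condition never becomes true. A sketch of such a wait, with a poll-count guard added, is shown below; it is illustrative only and not taken from the simcom power-off code, and `wait_until` is a hypothetical helper:

```python
import itertools


def wait_until(ready, now, timeout_ticks, max_polls=None):
    """Poll `ready()` until it is true or `timeout_ticks` elapse on `now()`.

    Without `max_polls`, a clock that never advances makes this loop spin
    forever whenever `ready()` stays false. The guard turns that hang
    into an explicit error.
    """
    deadline = now() + timeout_ticks
    polls = 0
    while not ready():
        if now() >= deadline:
            return False  # ordinary timeout: the clock did advance
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise RuntimeError("tick counter not advancing; wait would never time out")
    return True


ticking = itertools.count()
print(wait_until(lambda: False, lambda: next(ticking), timeout_ticks=10))  # False

try:
    wait_until(lambda: False, lambda: 1000, timeout_ticks=10, max_polls=10_000)
except RuntimeError as err:
    print("stuck:", err)
```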

What is the best way to investigate the state of the timers and periodic execution?
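One low-tech way to check whether periodic execution is still alive is to read the same set of counters twice, a few seconds apart, and report any that did not move. The sketch below assumes a `read_counters()` callable returning a name-to-value mapping; the counter names in the usage are hypothetical, not actual OVMS metric names:

```python
import time


def frozen_counters(read_counters, interval_s=2.0, sleep=time.sleep):
    """Return the names of counters whose value did not change across
    `interval_s` seconds. A stalled timer service would show up as every
    timer-driven counter being frozen at once."""
    before = read_counters()
    sleep(interval_s)
    after = read_counters()
    return sorted(name for name, value in before.items() if after.get(name) == value)


# Two canned snapshots stand in for reading the live counters.
snapshots = iter([
    {"monotonic": 5120, "park_time": 300, "rx_bytes": 1000},
    {"monotonic": 5120, "park_time": 300, "rx_bytes": 1032},
])
print(frozen_counters(lambda: next(snapshots), sleep=lambda _s: None))
```

Passing `sleep` in as a parameter keeps the check testable without actually waiting.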

See attached for a transcript of this debugging session.
<ovms_2018-03-25T09_29_07+0000.log.bz2>
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.teslaclub.hk
http://lists.teslaclub.hk/mailman/listinfo/ovmsdev