[Ovmsdev] LWIP lock-up

Mark Webb-Johnson mark at webb-johnson.net
Wed Mar 28 11:02:14 HKT 2018


Too damn reliable.

It has been three days since this last happened. Module has been in the car, rock solid, and no crashes/reboots for three days now. No watchdog timers fired.

Good grief.

Any ideas / suggestions as to how to repeat this issue, other than move to New Zealand (where Tom seems to get it fairly regularly)?

Regards, Mark.

> On 25 Mar 2018, at 3:54 PM, Mark Webb-Johnson <mark at webb-johnson.net> wrote:
> 
> 
> My car managed to repeat the issue with the task lockup I have seen occasionally. The symptom is that network connectivity fails and the async console becomes unresponsive when you issue a command to bring up/down wifi, simcom, etc.
> 
> The new housekeeping 10 minute ticker worked well for this, and stopped in the middle of the night:
> 
>> I (134401020) housekeeping: Local time: Sat Mar 24 20:15:45 2018
> I (135001020) housekeeping: Local time: Sat Mar 24 20:25:45 2018
> I (135601020) housekeeping: Local time: Sat Mar 24 20:35:45 2018
> I (136201020) housekeeping: Local time: Sat Mar 24 20:45:45 2018
> 
> Looking at the logs after Sat Mar 24 20:45:45, I see:
> 
> D (136543060) simcom: rx line ch=4 len=19  : +PPPD: DISCONNECTED
> I (136543070) simcom: PPP Connection disconnected
> I (136544020) simcom: Lost network connection (+PPP disconnect in NetMode)
> I (136544020) simcom: State: Enter NetLoss state
> ...
> I (136544030) gsm-ppp: Shutting down (hard)...
> I (136544040) events: Signal(system.modem.down)
> I (136544040) netmanager: Interface priority is st1 (x.y.z.212/255.255.248.0 gateway x.y.z.64)
> I (136544050) events: Signal(network.modem.down)
> I (136544050) netmanager: MODEM down (with WIFI client up): staying with WIFI client priority
> I (136544060) events: Signal(network.interface.change)
> 
> OK, so ppp went down (lost network connectivity), and WIFI client was up so we kept with that.
> 
> The GSM signal is still up (CREG), so we try to reconnect PPP:
> 
> I (136558160) simcom: PPP Connection is ready to start
> I (136559020) simcom: State: Enter NetMode state
> I (136559020) gsm-ppp: Initialising..
> ...
> D (136589180) simcom: rx line ch=4 len=8   : +CREG: 2
> I (136589190) simcom: CREG Network Registration: Searching
> ...
> D (136589250) simcom: rx line ch=4 len=8   : +CREG: 5^[[0m
> ^[[0;32mI (136589250) simcom: CREG Network Registration: RegisteredRoaming
> ...
> D (136589320) simcom: rx line ch=4 len=8   : +CREG: 2^[[0m
> ^[[0;32mI (136589320) simcom: CREG Network Registration: Searching
> ...
> D (136589380) simcom: rx line ch=4 len=8   : +CREG: 5^[[0m
> ^[[0;32mI (136589390) simcom: CREG Network Registration: RegisteredRoaming
> ...
> D (136601690) simcom: rx line ch=3 len=8   : +CREG: 2^[[0m
> ^[[0;32mI (136601690) simcom: CREG Network Registration: Searching
> ...
> D (136601780) simcom: rx line ch=3 len=8   : +CREG: 5^[[0m
> ^[[0;32mI (136601780) simcom: CREG Network Registration: RegisteredRoaming
> ...
> V (136617440) gsm-ppp: tx: 7e ff 7d 23 c0 21 7d 25 7d 2c 7d 20 20 4e 6f 20 | ~.}#.!}%},}  No 
> V (136617440) gsm-ppp: tx: 6e 65 74 77 6f 72 6b 20 70 72 6f 74 6f 63 6f 6c | network protocol
> V (136617450) gsm-ppp: tx: 73 20 72 75 6e 6e 69 6e 67 4b 8f 7e             | s runningK.~    
> V (136617450) simcom: tx: f9 09 ff 59 7e ff 7d 23 c0 21 7d 25 7d 2c 7d 20 | ...Y~.}#.!}%},} 
> V (136617450) simcom: tx: 20 4e 6f 20 6e 65 74 77 6f 72 6b 20 70 72 6f 74 |  No network prot
> V (136617460) simcom: tx: 6f 63 6f 6c 73 20 72 75 6e 6e 69 6e 67 4b 8f 7e | ocols runningK.~
> V (136617460) simcom: tx: a6 f9                                           | ..              
> V (136617490) simcom: rx: f9 09 ff 25 7e ff 7d 23 c0 21 7d 26 7d 2c 7d 20 | ...%~.}#.!}&},} 
> V (136617500) simcom: rx: 7d 24 8f 7d 3d 7e fb f9                         | }$.}=~..        
> V (136617510) gsm-mux: ProcessFrame(CHAN=2, ADDR=09, CTRL=ff, FCS=fb, LEN=24)
> V (136617520) gsm-mux: ChanProcessFrame(CHAN=2, ADDR=09, CTRL=ff, LEN=21, IFP=3)
> V (136617520) gsm-ppp: rx: 7e ff 7d 23 c0 21 7d 26 7d 2c 7d 20 7d 24 8f 7d | ~.}#.!}&},} }$.}
> V (136617530) gsm-ppp: rx: 3d 7e                                           | =~              
> I (136617540) gsm-ppp: StatusCallBack: Connection Lost
> E (136617540) gsm-ppp: status_cb: Connection lost
> I (136617550) gsm-ppp: Shutdown (via status callback)
> I (136617550) events: Signal(system.modem.down)
> I (136617560) netmanager: Interface priority is st1 (x.y.z.212/255.255.248.0 gateway z.y.z.64)
> I (136617560) gsm-ppp: Attempting PPP reconnecting in 30 seconds...
> 
> So, while initialising PPP we’ve been told that the cellular network is down (CREG: 2). Then, it flaps up and down like a yoyo, until the PPP connect finally fails.
> 
> At this point, things go screwy. The simcom tells us the PPP is down, and then we lose the ability to transmit to the modem (or at least log those transmits):
> 
> V (136618650) simcom: rx: 50 50 50 44 3a 20 44 49 53 43 4f 4e 4e 45 43 54 | PPPD: DISCONNECT
> V (136618660) simcom: rx: 45 44 0d 0a d4 f9 f9 11 ff 2f 0d 0a 2b 50 50 50 | ED......./..+PPP
> V (136618660) gsm-mux: ProcessFrame(CHAN=3, ADDR=0d, CTRL=ff, FCS=d4, LEN=29)
> V (136618670) gsm-mux: ChanProcessFrame(CHAN=3, ADDR=0d, CTRL=ff, LEN=26, IFP=3)
> D (136618680) simcom: rx line ch=3 len=19  : +PPPD: DISCONNECTED
> I (136618690) simcom: PPP Connection disconnected
> V (136618690) simcom: rx: 44 3a 20 44 49 53 43 4f 4e 4e 45 43 54 45 44 0d | D: DISCONNECTED.
> V (136618700) simcom: rx: 0a d9 f9                                        | ...             
> V (136618710) gsm-mux: ProcessFrame(CHAN=4, ADDR=11, CTRL=ff, FCS=d9, LEN=29)
> V (136618710) gsm-mux: ChanProcessFrame(CHAN=4, ADDR=11, CTRL=ff, LEN=26, IFP=3)
> D (136618720) simcom: rx line ch=4 len=19  : +PPPD: DISCONNECTED
> I (136618730) simcom: PPP Connection disconnected
> I (136619020) simcom: Lost network connection (+PPP disconnect in NetMode)
> I (136619020) simcom: State: Enter NetLoss state
> V (136619030) simcom: tx: f9 0d ff 19 41 54 2b 43 47 41 54 54 3d 30 0d 0a | ....AT+CGATT=0..
> V (136619030) simcom: tx: 14 f9                                           | ..              
> I (136619030) gsm-ppp: Shutting down (hard)...^[[0m
> I (136619040) events: Signal(system.modem.down)^[[0m
> I (136619040) netmanager: Interface priority is st1 (x.y.z.212/255.255.248.0 gateway x.y.z.64)
> V (136632920) simcom: rx: f9 0d ff 0d 0d 0a 4f 4b 0d 0a 0f f9             | ......OK....    
> V (153076210) simcom: rx: f9 09 ff 1d 0d 0a 2b 43 53 51 3a 20 31 2c 39 39 | ......+CSQ: 1,99
> V (153076220) simcom: rx: 0d 0a d1 f9 f9 0d ff 1d 0d 0a 2b 43 53 51 3a 20 | ..........+CSQ: 
> V (153076220) simcom: rx: 31 2c 39 39 0d 0a 13 f9 f9 11 ff 1d 0d 0a 2b 43 | 1,99..........+C
> V (153076230) simcom: rx: 53 51 3a 20 31 2c 39 39 0d 0a 1e f9             | SQ: 1,99....    
> V (153202470) simcom: rx: f9 09 ff 1d 0d 0a 2b 43 53 51 3a 20 34 2c 39 39 | ......+CSQ: 4,99
> V (153202480) simcom: rx: 0d 0a d1 f9 f9 0d ff 1d 0d 0a 2b 43 53 51 3a 20 | ..........+CSQ: 
> V (153202490) simcom: rx: 34 2c 39 39 0d 0a 13 f9 f9 11 ff 1d 0d 0a 2b 43 | 4,99..........+C
> V (153202500) simcom: rx: 53 51 3a 20 34 2c 39 39 0d 0a 1e f9             | SQ: 4,99....    
> V (179212720) simcom: rx: f9 09 ff 1d 0d 0a 2b 43 53 51 3a 20 30 2c 39 39 | ......+CSQ: 0,99
> V (179212730) simcom: rx: 0d 0a d1 f9 f9 0d ff 1d 0d 0a 2b 43 53 51 3a 20 | ..........+CSQ: 
> V (179212730) simcom: rx: 30 2c 39 39 0d 0a 13 f9 f9 11 ff 1d 0d 0a 2b 43 | 0,99..........+C
> V (179212740) simcom: rx: 53 51 3a 20 30 2c 39 39 0d 0a 1e f9             | SQ: 0,99....    
> V (179242750) simcom: rx: f9 09 ff 1d 0d 0a 2b 43 53 51 3a 20 30 2c 39 39 | ......+CSQ: 0,99
> V (179242760) simcom: rx: 0d 0a d1 f9 f9 0d ff 1d 0d 0a 2b 43 53 51 3a 20 | ..........+CSQ: 
> V (179242770) simcom: rx: 30 2c 39 39 0d 0a 13 f9 f9 11 ff 1d 0d 0a 2b 43 | 0,99..........+C
> V (179242770) simcom: rx: 53 51 3a 20 30 2c 39 39 0d 0a 1e f9             | SQ: 0,99....    
> V (179246100) simcom: rx: f9 09 ff 19 0d 0a 2b 43 52 45 47 3a 20 32 0d 0a | ......+CREG: 2..
> V (179246110) simcom: rx: d6 f9 f9 0d ff 19 0d 0a 2b 43 52 45 47 3a 20 32 | ........+CREG: 2
> V (179246110) simcom: rx: 0d 0a 14 f9 f9 11 ff 19 0d 0a 2b 43 52 45 47 3a | ..........+CREG:
> V (179246120) simcom: rx: 20 32 0d 0a 19 f9                               |  2....          
> V (179251580) simcom: rx: f9 09 ff 1d 0d 0a 2b 43 53 51 3a 20 34 2c 39 39 | ......+CSQ: 4,99
> V (179251590) simcom: rx: 0d 0a d1 f9 f9 0d ff 1d 0d 0a 2b 43 53 51 3a 20 | ..........+CSQ: 
> V (179251600) simcom: rx: 34 2c 39 39 0d 0a 13 f9 f9 11 ff 1d 0d 0a 2b 43 | 4,99..........+C
> V (179251600) simcom: rx: 53 51 3a 20 34 2c 39 39 0d 0a 1e f9             | SQ: 4,99....    
> I (179290100) wifi: bcn_timout,ap_probe_send_start
> I (179292610) wifi: ap_probe_send over, resett wifi status to disassoc
> I (179292610) wifi: state: run -> init (1)
> I (179292620) wifi: pm stop, total sleep time: 0/138543477
> I (179292620) wifi: n:13 0, o:13 0, ap:255 255, sta:13 0, prof:1
> 
> The simcom itself seems to be up, and is giving us unsolicited notifications. Finally, the wifi stack falls over (bcn_timeout.ap_probe_sent).
> 
> This is very nasty, and I suspect will take some time to resolve. We can probably make it more stable by introducing a check for network loss, and requiring the network to be up and stable for N seconds before even attempting a reconnection; but I don’t think that addresses the core issue (which is most likely in the lwip ppp stack).
> 
> Code-wise, the simcom driver entering ‘NetLoss’ state calls m_ppp.Shutdown(true) (which issues the shutting down (hard) message). Our pppos driver issues the system.modem.down event (which causes netmanager to do it’s thing), and then calls ppapi_close(m_ppp, nocarrier). I suspect that never returns (but don’t know for sure).
> 
> The other workaround is to enable watchdog timers. I’m 99% certain this sort of lockup would mess up the idle tasks and those would pickup and fire a watchdog timer. So, I’ve gone ahead and committed code to enable the watchdog timers (2 minutes), and make that available as an OTA update:
> 
> OVMS# ota status
> Firmware:          3.1.001/ota_0/main (build idf v3.1-dev-453-g0f978bcb Mar 25 2018 14:45:59)
> Server Available:  3.1.001
> 
> The module in my car is now running OTA, so I can verify the OTA update process is smooth (it seems so).
> 
> Regards, Mark.
> 
> 
> _______________________________________________
> OvmsDev mailing list
> OvmsDev at lists.teslaclub.hk
> http://lists.teslaclub.hk/mailman/listinfo/ovmsdev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20180328/36478ceb/attachment.htm>


More information about the OvmsDev mailing list