I've begun working on the webserver and noticed something that may be related to this: sockets don't get closed when the connection is lost. The effect is visible on both the web and telnet servers (ssh not tested). To reproduce, switch the Wifi network while a connection is open; the port will not be available until the timeout expires.
Regards, Michael
On 28.01.2018 at 03:32, Tom Parker wrote:
Hi,
I've been struggling for some time with intermittent disconnections from the v2 server. I've switched from hologram to a local cellular provider's sim card, to match my v2 hardware, which is very stable, and the problem persists. With the car stationary, the connection seems very solid, but on the move, it usually disconnects within half an hour.
I'm running 98169d8862db2bbc4bea14d013883f26ee6afb4e (from 2018-01-19) with idf 5bf85d06d8c402fe30ecb1bf9c09d5e69b923b2f and xtensa-esp32-elf-linux64-1.22.0-80-g6c4433a-5.2.0.tar.gz.
I tried logging to the sdcard. On my desk, powered by USB, that seemed to work well, but when powered from the OBD2 port it didn't (only a few lines were logged to the sdcard and then the whole system seemed to crash). I haven't tried to debug that.
With the sdcard logging turned off, and logging with minicom I can get good diagnostics. It would appear that the v3 hardware thinks it is still connected and continues to send data, but the v2 server thinks the client has disconnected.
I've got ovms logs which have regular lines quoting the local time from the modem, and server logs from the dexters-web server which I think are not UTC. Anyway, 05:47 in the server log corresponds to 17:47 in the ovms v3 log.
The relevant lines of the logs are below. On the server log the ovms v3 hardware client is #72 and then #94. We can see the client sent and the server received S messages at :47:15 and :47:32. The client sent more messages that were not received by the server, more than I've transcribed here (full log attached). The server recorded a disconnect at :48:29, but the client recorded the transmission of messages in its log for minutes after that. I did a simcom status command, which reported that everything was connected, and then I pressed the button to reboot the module. The server records a new connection at :54:07, and from then on the messages logged as transmitted on the client correspond again with the messages logged as received on the server.
Thoughts on how to proceed?
Another oddity not shown here but included in the full logs is that the client sent a lot of D messages shortly after reconnecting.
Server Log:
2018-01-26 05:47:15,36664,'#72 C rx msg S 90,K,0,0,stopped,standard,76,0,0,0,0,0,13,21,0,0,0,0,63.36,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,392.00,96'
2018-01-26 05:47:15,36665,'#72 C rx msg D 128,0,5,0,0,33,0,0,0,79663,32,1,1,1,12.9231,0,0,128,0,0'
2018-01-26 05:47:15,36666,'#72 C rx msg L 0,0,0,0,0,0,0,0,0,0,0'
2018-01-26 05:47:15,36667,'#72 C rx msg F 3.0.0/factory/main build (idf v3.1-dev-217-g5bf85d0) Jan 20 2018 00:49:15,,6,1,NL,2degrees'
2018-01-26 05:47:32,36717,'#72 C rx msg S 90,K,0,0,stopped,standard,76,0,0,0,0,0,13,21,0,0,0,0,63.36,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,387.00,96'
2018-01-26 05:48:29,36926,'#72 C got error: Connection reset by peer'
2018-01-26 05:50:08,37206,'#93 A got login'
2018-01-26 05:51:24,37485,'#36 A rx msg A '
2018-01-26 05:52:53,37742,'#93 A got error: Broken pipe'
2018-01-26 05:52:54,37745,'#72 A got login'
2018-01-26 05:54:07,37937,'#94 C got login'
2018-01-26 05:54:13,37957,'#94 C rx msg S 87,K,0,0,stopped,standard,73,0,0,0,0,0,13,21,0,0,0,0,0.00,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,385.50,0'
2018-01-26 05:54:13,37958,'#94 C rx msg D 128,0,5,0,0,33,0,0,104,59,28,1,0,0,13.011,0,0,128,0,0'
2018-01-26 05:54:13,37959,'#94 C rx msg L 0,0,0,0,0,0,104,0,0,0,0'
2018-01-26 05:54:13,37960,'#94 C rx msg W 0,0,0,0,0,0,0,0,0'
2018-01-26 05:54:13,37961,'#94 C rx msg F 3.0.0/factory/main build (idf v3.1-dev-217-g5bf85d0) Jan 20 2018 00:49:15,,1,1,NL,2degrees'
2018-01-26 05:54:15,37964,'#94 C rx msg D 128,0,5,0,0,33,0,0,104,60,28,1,0,0,13.011,0,0,128,0,0'
2018-01-26 05:54:15,37965,'#94 C rx msg D 128,0,5,0,0,33,0,0,104,61,28,1,0,0,13.011,0,0,128,0,0'
Serial Console Log:
OVMS > I (79663340) ovms-server-v2: Send MP-0 S90,K,0,0,stopped,standard,76,0,0,0,0,0,13,21,0,0,0,0,63.36,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,392.00,96
I (79663340) ovms-server-v2: Send MP-0 D128,0,5,0,0,33,0,0,0,79663,32,1,1,1,12.9231,0,0,128,0,0
I (79663340) ovms-server-v2: Send MP-0 L0,0,0,0,0,0,0,0,0,0,0
I (79663350) ovms-server-v2: Send MP-0 F3.0.0/factory/main build (idf v3.1-dev-217-g5bf85d0) Jan 20 2018 00:49:15,,6,1,NL,2degrees
OVMS > D (79665360) simcom: rx line ch=3 len=11 : +CSQ: 10,99
D (79665360) simcom: rx line ch=4 len=11 : +CSQ: 10,99
OVMS > D (79674380) simcom: rx line ch=3 len=11 : +CSQ: 13,99
D (79674390) simcom: rx line ch=4 len=11 : +CSQ: 13,99
OVMS > D (79678420) simcom: rx line ch=3 len=10 : +CREG: 1,1
D (79678420) simcom: rx line ch=3 len=29 : +CCLK: "18/01/26,17:47:26+52"
D (79678420) simcom: rx line ch=3 len=11 : +CSQ: 13,99
D (79678420) simcom: rx line ch=3 len=23 : +COPS: 0,0,"2degrees",2
D (79678420) simcom: rx line ch=3 len=2 : OK
OVMS > I (79681340) ovms-server-v2: Send MP-0 S90,K,0,0,stopped,standard,76,0,0,0,0,0,13,21,0,0,0,0,63.36,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,387.00,96
OVMS > D (79683370) simcom: rx line ch=3 len=10 : +CSQ: 9,99
D (79683370) simcom: rx line ch=4 len=10 : +CSQ: 9,99
OVMS > D (79692410) simcom: rx line ch=3 len=10 : +CSQ: 3,99
D (79692420) simcom: rx line ch=4 len=10 : +CSQ: 3,99
OVMS > D (79708420) simcom: rx line ch=3 len=10 : +CREG: 1,1
D (79708420) simcom: rx line ch=3 len=29 : +CCLK: "18/01/26,17:47:56+52"
D (79708420) simcom: rx line ch=3 len=10 : +CSQ: 3,99
D (79708420) simcom: rx line ch=3 len=23 : +COPS: 0,0,"2degrees",2
D (79708420) simcom: rx line ch=3 len=2 : OK
OVMS > D (79713470) simcom: rx line ch=3 len=10 : +CSQ: 7,99
D (79713470) simcom: rx line ch=4 len=10 : +CSQ: 7,99
OVMS > D (79716420) simcom: rx line ch=3 len=10 : +CSQ: 0,99
D (79716420) simcom: rx line ch=4 len=10 : +CSQ: 0,99
OVMS > I (79724340) ovms-server-v2: Send MP-0 S90,K,0,0,stopped,standard,76,0,0,0,0,0,13,21,0,0,0,0,63.36,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,392.00,96
I (79724340) ovms-server-v2: Send MP-0 D128,0,5,0,0,33,0,0,55,79724,30,1,1,1,13.022,0,0,128,0,0
I (79724350) ovms-server-v2: Send MP-0 L0,0,0,0,0,0,55,0,0,0,0
I (79724350) ovms-server-v2: Send MP-0 F3.0.0/factory/main build (idf v3.1-dev-217-g5bf85d0) Jan 20 2018 00:49:15,,0,1,NL,2degrees
OVMS > D (79738360) simcom: rx line ch=3 len=10 : +CREG: 1,1
D (79738360) simcom: rx line ch=3 len=29 : +CCLK: "18/01/26,17:48:26+52"
D (79738360) simcom: rx line ch=3 len=10 : +CSQ: 0,99
D (79738360) simcom: rx line ch=3 len=23 : +COPS: 0,0,"2degrees",2
D (79738360) simcom: rx line ch=3 len=2 : OK
OVMS > D (79740490) simcom: rx line ch=3 len=10 : +CSQ: 4,99
D (79740500) simcom: rx line ch=4 len=10 : +CSQ: 4,99
OVMS > I (79741340) ovms-server-v2: Send MP-0 S90,K,0,0,stopped,standard,76,0,0,0,0,0,13,21,0,0,0,0,63.36,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,388.50,96
OVMS > D (79764490) simcom: rx line ch=3 len=10 : +CSQ: 7,99
D (79764490) simcom: rx line ch=4 len=10 : +CSQ: 7,99
OVMS > D (79768360) simcom: rx line ch=3 len=10 : +CREG: 1,1
D (79768360) simcom: rx line ch=3 len=29 : +CCLK: "18/01/26,17:48:56+52"
D (79768360) simcom: rx line ch=3 len=10 : +CSQ: 7,99
D (79768360) simcom: rx line ch=3 len=23 : +COPS: 0,0,"2degrees",2
D (79768360) simcom: rx line ch=3 len=2 : OK
OVMS > D (79773490) simcom: rx line ch=3 len=10 : +CSQ: 3,99
D (79773500) simcom: rx line ch=4 len=10 : +CSQ: 3,99
OVMS > D (79779500) simcom: rx line ch=3 len=10 : +CSQ: 7,99
D (79779500) simcom: rx line ch=4 len=10 : +CSQ: 7,99
OVMS > I (79785340) ovms-server-v2: Send MP-0 S89,K,0,0,stopped,standard,75,0,0,0,0,0,13,21,0,0,0,0,63.36,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,378.00,96
...
OVMS > simcom status
SIMCOM Network Registration: RegisteredHome State: NetMode Ticker: 25111 User Data: 0 Mux Open Channels: 4 PPP Connected on channel: #2 PPP Last Error: None GPS: disabled GPS time: disabled NMEA (GPS/GLONASS) Not Connected
...
I rebooted the module and it did the normal connection dance
...
D (48359) simcom: rx line ch=3 len=29 : +CCLK: "18/01/26,17:53:57+52"
...
I (57609) ovms-server-v2: Status: Connecting...
OVMS > I (58219) ovms-server-v2: Connection successful
I (58219) ovms-server-v2: Status: Logging in...
I (58219) ovms-server-v2: Sending server login: MP-C 0 pwLSijW/3qAZ6z0LOBHbGS rnkkhqHUA6zeZqLx1q1Tow== NZLV3
OVMS > I (58889) ovms-server-v2: Got server response: MP-S 0 QTjRAp8ZQPJQV1UXnepldQ w1vWpIrRR9+eot9z3uslgw==
I (58889) ovms-server-v2: Server token is QTjRAp8ZQPJQV1UXnepldQ and digest is w1vWpIrRR9+eot9z3uslgw==
I (58899) ovms-server-v2: Status: Server auth ok. Now priming crypto.
I (58899) ovms-server-v2: Shared secret key is QTjRAp8ZQPJQV1UXnepldQpwLSijW/3qAZ6z0LOBHbGS (44 bytes)
I (58899) ovms-server-v2: Status: OVMS V2 login successful, and crypto channel established
OVMS > I (59189) ovms-server-v2: Incoming Msg: MP-0 Z4
I (59189) ovms-server-v2: One or more peers have connected
I (59339) ovms-server-v2: Send MP-0 S87,K,0,0,stopped,standard,73,0,0,0,0,0,13,21,0,0,0,0,0.00,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,385.50,0
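For lining up the two logs, the +CCLK lines can be converted to UTC mechanically. The sketch below assumes the standard AT+CCLK layout "yy/MM/dd,hh:mm:ss±zz", where zz is the offset from UTC in quarter hours, and assumes the two-digit year means 20yy. The function names CclkToUtc and DaysFromCivil are made up for this illustration and are not part of the OVMS code. If the modem clock and offset are right, 17:47:26+52 is 04:47:26 UTC, which would put the server log's 05:47 one hour ahead of UTC.

```cpp
#include <cstdio>
#include <string>

// Days between 1970-01-01 and the given proleptic Gregorian date.
long long DaysFromCivil(int y, int m, int d)
{
  y -= (m <= 2) ? 1 : 0;                       // treat Jan/Feb as months 11/12 of the previous year
  const long long era = (y >= 0 ? y : y - 399) / 400;
  const long long yoe = y - era * 400;         // year of era, 0..399
  const long long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  // day of year, March-based
  const long long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // day of era
  return era * 146097 + doe - 719468;
}

// Convert a +CCLK value (without the surrounding quotes) to seconds since
// the Unix epoch in UTC. Returns -1 if the string does not parse.
// Assumption: the two-digit year is 2000 + yy.
long long CclkToUtc(const std::string& cclk)
{
  int yy, mo, dd, hh, mi, ss, quarters;
  char sign;
  if (std::sscanf(cclk.c_str(), "%2d/%2d/%2d,%2d:%2d:%2d%c%2d",
                  &yy, &mo, &dd, &hh, &mi, &ss, &sign, &quarters) != 8)
    return -1;
  if (sign != '+' && sign != '-')
    return -1;

  const long long local = DaysFromCivil(2000 + yy, mo, dd) * 86400LL
                        + hh * 3600LL + mi * 60LL + ss;
  // The offset field counts quarter hours (900 seconds) east of UTC.
  const long long offset = quarters * 900LL * (sign == '-' ? -1 : 1);
  return local - offset;
}
```

Calling CclkToUtc("18/01/26,17:47:26+52") gives the UTC instant for the first +CCLK line in the console log; the same value can then be compared against server log times once the server's zone is known.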
_______________________________________________ OvmsDev mailing list OvmsDev@lists.teslaclub.hk http://lists.teslaclub.hk/mailman/listinfo/ovmsdev
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
I handled wifi shutdowns cleanly when I first implemented telnet and ssh as their own tasks. Now that they are under Mongoose, it is out of my control. The socket is owned by the Mongoose code.
-- Steve
On Sun, 28 Jan 2018, Michael Balzer wrote:
I've begun working on the webserver and noticed something that may be correlated to this: sockets don't get closed when losing the connection. The effect is visible on both web and telnet server (ssh not tested). To reproduce, switch the Wifi network with an open connection, the port will not be available until timeout.
Regards, Michael
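One generic way to notice the situation Tom describes, where the module keeps logging sends after the server has already recorded a reset, is to track when data was last received and treat the link as stale after a quiet period. The following is only a sketch of that idea, not how ovms_server_v2 or Mongoose handle it. The class name LinkWatchdog and its methods are hypothetical, and it assumes the server can be expected to send something (for example a reply to a periodic message) at least once per timeout, which the thread does not confirm.

```cpp
#include <cstdint>

// Hypothetical helper: declares a connection stale when nothing has been
// received from the peer for longer than a configured number of seconds.
class LinkWatchdog
{
public:
  explicit LinkWatchdog(uint32_t timeout_s) : timeout_s_(timeout_s) {}

  // Call when the connection is (re)established.
  void OnConnect(uint32_t now_s) { last_rx_s_ = now_s; }

  // Call whenever any data arrives from the peer.
  void OnReceive(uint32_t now_s) { last_rx_s_ = now_s; }

  // True when the quiet period exceeds the timeout. Unsigned subtraction
  // keeps the comparison correct if the seconds counter wraps around.
  bool IsStale(uint32_t now_s) const { return now_s - last_rx_s_ > timeout_s_; }

private:
  uint32_t timeout_s_;
  uint32_t last_rx_s_ = 0;
};
```

A periodic ticker could call IsStale() and, when it returns true, close and reopen the connection instead of continuing to send into a dead socket.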
Are the disconnects really the result of network drops? I recently "discovered" the Features and Parameters tabs in the V2 app (I'm still new at this) and notice that the server disconnects after some number of records have been sent. The server reconnects about 30 seconds later, but neither tab ever completes. Seems like there is an error along the way that kills the connection. 100% repeatable.
Greg
On January 28, 2018 10:13:40 AM PST, Stephen Casner <casner@acm.org> wrote:
I handled wifi shutdowns cleanly when I first implemented telnet and ssh as their own tasks. Now that they are under Mongoose, it is out of my control. The socket is owned by the Mongoose code.
-- Steve
Greg,
I think this is a different issue. I used to have this all the time, but since switching ovms_server_v2 to mongoose, I haven’t seen it. The features and parameters calls send a relatively large amount of data car->server. I think it is related to free RAM.
Regards, Mark.
On 29 Jan 2018, at 2:55 AM, Greg D <gregd2350@gmail.com> wrote:
Are the disconnects really the result of network drops? I recently "discovered" the Features and Parameters tabs in the V2 app (I'm still new at this) and notice that the server disconnects after some number of records have been sent. The server reconnects about 30 seconds later, but neither tab ever completes. Seems like there is an error along the way that kills the connection. 100% repeatable.
Greg
There is data corruption on the second line here:
I (353350698) ovms-server-v2: Send MP-0 c3,0,0,32,
I (353350698) ovms-server-v2: Send (��? c3,0,1,32,********
I (353350708) ovms-server-v2: Send MP-0 c3,0,2,32,
I (353350708) ovms-server-v2: Send MP-0 c3,0,3,32,
and on subsequent lines, more. When the server receives that, it will disconnect the client.
That diagnostic logging is output before encryption, so it is something about the setup of std::ostringstream* buffer in ovms_server_v2.cpp OvmsServerV2::ProcessCommand(). Look at command case 3 (request parameter list).
It seems that the std::ostringstream version of that code is only used in OvmsServerV2::ProcessCommand (the others use the string variant of it).
Can you try changing those three occurrences to Transmit(buffer->str().c_str()) instead of Transmit(*buffer)?
Regards, Mark.
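As a rough illustration of the suggested change, the sketch below builds one response line in a std::ostringstream and hands its contents to a string-taking Transmit(). The Transmit() here is a hypothetical stand-in that just records what it is given, and SendParameter() is an invented helper; neither is the real OvmsServerV2 code, and the "MP-0 " prefix seen in the log is not modelled. One detail that is certain: str() returns a temporary std::string, so the pointer from c_str() is only valid until the end of the full expression, which is fine for an immediate call like this but not for storing the pointer.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the string variant of Transmit():
// it copies the message and records it instead of sending it.
std::vector<std::string> g_sent;
void Transmit(const std::string& message) { g_sent.push_back(message); }

// Invented helper mimicking the "c3,0,<index>,<count>,<value>" lines in the log.
void SendParameter(int index, int count, const std::string& value)
{
  std::ostringstream buffer;
  buffer << "c3,0," << index << ',' << count << ',' << value;
  // str() yields a temporary std::string; c_str() on it stays valid until
  // this statement ends, by which time Transmit() has made its own copy.
  Transmit(buffer.str().c_str());
}
```

Passing buffer.str() directly would work just as well with this stand-in; the c_str() form is shown only because it is the form proposed in the message above.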
On 29 Jan 2018, at 3:28 PM, Greg D. <gregd2350@gmail.com> wrote:
Hi Mark,
Memory was my first thought too, but there's no overt indication of that. There's just an apparently spontaneous close, except it always happens at exactly the same place. This is with the module sitting on my desk (not in the vehicle). It acts the same in the car, but with real data. Tapping "Features" on the V2 client is similar. This is with the module connected to the local network via wifi.
I tried taking out SSH Server and SD support from the build config, to save RAM, but no change to the results. No change with a fresh fetch of Master from Github just now, either. Any idea what I can change (menuconfig, module config, logging), to reveal more info?
Greg
OVMS > log level verbose
Logging level for * set to verbose
(Tapped "Parameters" here)
V (353350698) ovms-server-v2: OvmsServerV2MongooseCallback(MG_EV_RECV)
I (353350698) ovms-server-v2: Incoming Msg: MP-0 C3
I (353350698) ovms-server-v2: Send MP-0 c3,0,0,32,
I (353350698) ovms-server-v2: Send (��? c3,0,1,32,********
I (353350708) ovms-server-v2: Send MP-0 c3,0,2,32,
I (353350708) ovms-server-v2: Send MP-0 c3,0,3,32,
I (353350708) ovms-server-v2: Send ���? c3,0,4,32,tmc.openvehicles.com
I (353350708) ovms-server-v2: Send ���? c3,0,5,32,hologram
I (353350708) ovms-server-v2: Send MP-0 c3,0,6,32,
I (353350708) ovms-server-v2: Send MP-0 c3,0,7,32,
I (353350708) ovms-server-v2: Send 4��? c3,0,8,32,ROADSTER_834
I (353350708) ovms-server-v2: Send @��? c3,0,9,32,*************
I (353350708) ovms-server-v2: Send ���? c3,0,10,32,
I (353350708) ovms-server-v2: Send ���? c3,0,11,32,
I (353350708) ovms-server-v2: Send `��? c3,0,12,32,
I (353350708) ovms-server-v2: Send `��? c3,0,13,32,
I (353350708) ovms-server-v2: Send `��? c3,0,14,32,
I (353350708) ovms-server-v2: Send `��? c3,0,15,32,
I (353350708) ovms-server-v2: Send `��? c3,0,16,32,
I (353350708) ovms-server-v2: Send ���? c3,0,17,32,
I (353350708) ovms-server-v2: Send ���? c3,0,18,32,
I (353350708) ovms-server-v2: Send ���? c3,0,19,32,
I (353350708) ovms-server-v2: Send ���? c3,0,20,32,
I (353350708) ovms-server-v2: Send ���? c3,0,21,32,
I (353350708) ovms-server-v2: Send ���? c3,0,22,32,
I (353350708) ovms-server-v2: Send ���? c3,0,23,32,
I (353350708) ovms-server-v2: Send ���? c3,0,24,32,
I (353350708) ovms-server-v2: Send ���? c3,0,25,32,
I (353350708) ovms-server-v2: Send ���? c3,0,26,32,
I (353350708) ovms-server-v2: Send ���? c3,0,27,32,
I (353350708) ovms-server-v2: Send ���? c3,0,28,32,
I (353350708) ovms-server-v2: Send ���? c3,0,29,32,
I (353350708) ovms-server-v2: Send ���? c3,0,30,32,
I (353350708) ovms-server-v2: Send ���? c3,0,31,32,
V (353350928) ovms-server-v2: OvmsServerV2MongooseCallback(MG_EV_CLOSE)
E (353350928) ovms-server-v2: Status: Error: Disconnected from OVMS Server V2
OVMS >
I think the issue is the last empty ‘send’. Really not sure why it is working for me, and not for you. Anyway, I’ve committed these changes to master, as well as a fix for that end case. Can you try now?
Regards, Mark.
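The actual fix committed to master is not shown in this thread. Purely as a sketch of what guarding against an empty final send could look like, the helper below skips empty messages instead of transmitting them. TransmitNonEmpty and the wire vector are hypothetical names used only for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: appends every non-empty message to `wire`
// (standing in for the real transmit path) and returns how many were sent.
std::size_t TransmitNonEmpty(const std::vector<std::string>& messages,
                             std::vector<std::string>& wire)
{
  std::size_t sent = 0;
  for (const std::string& message : messages)
  {
    if (message.empty())
      continue;               // nothing to send, so do not emit an empty record
    wire.push_back(message);
    ++sent;
  }
  return sent;
}
```

With the feature list from the log, a trailing empty entry after "c1,0,31,32,0" would simply be dropped rather than going out as a bare "Send".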
On 30 Jan 2018, at 2:10 AM, Greg D. <gregd2350@gmail.com> wrote:
Hi Mark,
Bingo! Well, mostly. Changes made fixed the corruption on both Parameter and Feature list fetching, both of which now complete back to the Android app. Changes pushed to my github fork.
But there's still a disconnect. This one looks like a buffer size limit being hit, given that the log ends mid-string. I tried setting the log level to 'warn', and the disconnect still occurs, so it's not due to the large text output. Or, is the disconnect intended after these fetches, and the message truncation just a consequence of our log buffering system? The server reconnects automatically, after a short delay.
Greg
OVMS > V (112065) ovms-server-v2: OvmsServerV2MongooseCallback(MG_EV_RECV)
I (112065) ovms-server-v2: Incoming Msg: MP-0 C1
I (112065) ovms-server-v2: Send MP-0 c1,0,0,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,1,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,2,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,3,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,4,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,5,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,6,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,7,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,8,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,9,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,10,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,11,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,12,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,13,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,14,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,15,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,16,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,17,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,18,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,19,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,20,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,21,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,22,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,23,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,24,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,25,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,26,32,0
I (112085) ovms-server-v2: Send MP-0 c1,0,27,32,0
I (112095) ovms-server-v2: Send MP-0 c1,0,28,32,0
I (112095) ovms-server-v2: Send MP-0 c1,0,29,32,0
I (112095) ovms-server-v2: Send MP-0 c1,0,30,32,0
I (112095) ovms-server-v2: Send MP-0 c1,0,31,32,0
I (112095) ovms-server-v2: Send
V (112365) ovms-server-v2: OvmsServerV2MongooseCallback(MG_EV_CLOSE)
E (112365) ovms-server-v2: Status: Error: Disconnected from OVMS Server V2
OVMS >
Mark Webb-Johnson wrote:
There is data corruption on the second line here:
I (353350698) ovms-server-v2: Send MP-0 c3,0,0,32, I (353350698) ovms-server-v2: Send (��? c3,0,1,32,******** I (353350708) ovms-server-v2: Send MP-0 c3,0,2,32, I (353350708) ovms-server-v2: Send MP-0 c3,0,3,32,
and on subsequent lines, more.
When the server receives that, it will disconnect the client.
That diagnostic logging is output before encryption, so it is something about the setup of std::ostringstream* buffer in ovms_server_v2.cpp OvmsServerV2::ProcessCommand(). Look at command case 3 (request parameter list).
It seems that the std::ostringstream version of that code is only used in OvmsServerV2::ProcessCommand (the others use the string variant of it).
Can you try changing those three occurrences to Transmit(buffer->str().c_str()) instead of Transmit(*buffer)
Regards, Mark.
On 29 Jan 2018, at 3:28 PM, Greg D. <gregd2350@gmail.com <mailto:gregd2350@gmail.com>> wrote:
Hi Mark,
Memory was my first thought too, but there's no overt indication of that. There's just an apparently spontaneous close, except it always happens at exactly the same place. This is with the module sitting on my desk (not in the vehicle). It acts the same in the car, but with real data. Tapping "Features" on the V2 client is similar. This is with the module connected to the local network via wifi.
I tried taking out SSH Server and SD support from the build config, to save RAM, but no change to the results. No change with a fresh fetch of Master from Github just now, either. Any idea what I can change (menuconfig, module config, logging), to reveal more info?
Greg
OVMS > log level verbose
Logging level for * set to verbose
(Tapped "Parameters" here)
V (353350698) ovms-server-v2: OvmsServerV2MongooseCallback(MG_EV_RECV)
I (353350698) ovms-server-v2: Incoming Msg: MP-0 C3
I (353350698) ovms-server-v2: Send MP-0 c3,0,0,32,
I (353350698) ovms-server-v2: Send (��? c3,0,1,32,********
I (353350708) ovms-server-v2: Send MP-0 c3,0,2,32,
I (353350708) ovms-server-v2: Send MP-0 c3,0,3,32,
I (353350708) ovms-server-v2: Send ���? c3,0,4,32,tmc.openvehicles.com
I (353350708) ovms-server-v2: Send ���? c3,0,5,32,hologram
I (353350708) ovms-server-v2: Send MP-0 c3,0,6,32,
I (353350708) ovms-server-v2: Send MP-0 c3,0,7,32,
I (353350708) ovms-server-v2: Send 4��? c3,0,8,32,ROADSTER_834
I (353350708) ovms-server-v2: Send @��? c3,0,9,32,*************
I (353350708) ovms-server-v2: Send ���? c3,0,10,32,
I (353350708) ovms-server-v2: Send ���? c3,0,11,32,
I (353350708) ovms-server-v2: Send `��? c3,0,12,32,
I (353350708) ovms-server-v2: Send `��? c3,0,13,32,
I (353350708) ovms-server-v2: Send `��? c3,0,14,32,
I (353350708) ovms-server-v2: Send `��? c3,0,15,32,
I (353350708) ovms-server-v2: Send `��? c3,0,16,32,
I (353350708) ovms-server-v2: Send ���? c3,0,17,32,
I (353350708) ovms-server-v2: Send ���? c3,0,18,32,
I (353350708) ovms-server-v2: Send ���? c3,0,19,32,
I (353350708) ovms-server-v2: Send ���? c3,0,20,32,
I (353350708) ovms-server-v2: Send ���? c3,0,21,32,
I (353350708) ovms-server-v2: Send ���? c3,0,22,32,
I (353350708) ovms-server-v2: Send ���? c3,0,23,32,
I (353350708) ovms-server-v2: Send ���? c3,0,24,32,
I (353350708) ovms-server-v2: Send ���? c3,0,25,32,
I (353350708) ovms-server-v2: Send ���? c3,0,26,32,
I (353350708) ovms-server-v2: Send ���? c3,0,27,32,
I (353350708) ovms-server-v2: Send ���? c3,0,28,32,
I (353350708) ovms-server-v2: Send ���? c3,0,29,32,
I (353350708) ovms-server-v2: Send ���? c3,0,30,32,
I (353350708) ovms-server-v2: Send ���? c3,0,31,32,
V (353350928) ovms-server-v2: OvmsServerV2MongooseCallback(MG_EV_CLOSE)
E (353350928) ovms-server-v2: Status: Error: Disconnected from OVMS Server V2
OVMS >
Mark Webb-Johnson wrote:
Greg,
I think this is a different issue. I used to have this all the time, but since switching ovms_server_v2 to mongoose, I haven’t seen it. The features and parameters calls send a relatively large amount of data car->server. I think it is related to free RAM.
Regards, Mark.
On 29 Jan 2018, at 2:55 AM, Greg D <gregd2350@gmail.com> wrote:
Are the disconnects really the result of network drops? I recently "discovered" the Features and Parameters tabs in the V2 app (I'm still new at this) and notice that the server disconnects after some number of records have been sent. The server reconnects about 30 seconds later, but neither tab ever completes. Seems like there is an error along the way that kills the connection. 100% repeatable.
Greg
On January 28, 2018 10:13:40 AM PST, Stephen Casner <casner@acm.org> wrote:
I handled wifi shutdowns cleanly when I first implemented telnet and ssh as their own tasks. Now that they are under Mongoose, it is out of my control. The socket is owned by the Mongoose code.
-- Steve
On Sun, 28 Jan 2018, Michael Balzer wrote:
I've begun working on the webserver and noticed something that may be correlated to this: sockets don't get closed when losing the connection. The effect is visible on both web and telnet server (ssh not tested). To reproduce, switch the Wifi network with an open connection, the port will not be available until timeout.
Regards, Michael
_______________________________________________ OvmsDev mailing list OvmsDev@lists.teslaclub.hk http://lists.teslaclub.hk/mailman/listinfo/ovmsdev
git diff, or git diff --staged, should tell you.
On 30 Jan 2018, at 8:34 AM, Greg D. <gregd2350@gmail.com> wrote:
Hi Mark,
Looks like that did it. No more disconnects. Yea! The verbose logging is good now, too.
Greg
p.s. Just curious, why does Git think I'm still one commit ahead of Master?
Mark Webb-Johnson wrote:
I think the issue is the last empty ‘send’.
Really not sure why it is working for me, and not for you. Anyway, I’ve committed these changes to master, as well as a fix for that end case. Can you try now?
Regards, Mark.
On 30 Jan 2018, at 2:10 AM, Greg D. <gregd2350@gmail.com> wrote:
Hi Mark,
Bingo! Well, mostly. Changes made fixed the corruption on both Parameter and Feature list fetching, both of which now complete back to the Android app. Changes pushed to my github fork.
But there's still a disconnect. This one looks like a buffer size limit being hit, given that the log ends mid-string. I tried setting the log level to 'warn', and the disconnect still occurs, so it's not due to the large text output. Or, is the disconnect intended after these fetches, and the message truncation just a consequence of our log buffering system? The server reconnects automatically, after a short delay.
Greg
OVMS > V (112065) ovms-server-v2: OvmsServerV2MongooseCallback(MG_EV_RECV)
I (112065) ovms-server-v2: Incoming Msg: MP-0 C1
I (112065) ovms-server-v2: Send MP-0 c1,0,0,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,1,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,2,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,3,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,4,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,5,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,6,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,7,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,8,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,9,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,10,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,11,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,12,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,13,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,14,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,15,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,16,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,17,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,18,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,19,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,20,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,21,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,22,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,23,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,24,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,25,32,0
I (112075) ovms-server-v2: Send MP-0 c1,0,26,32,0
I (112085) ovms-server-v2: Send MP-0 c1,0,27,32,0
I (112095) ovms-server-v2: Send MP-0 c1,0,28,32,0
I (112095) ovms-server-v2: Send MP-0 c1,0,29,32,0
I (112095) ovms-server-v2: Send MP-0 c1,0,30,32,0
I (112095) ovms-server-v2: Send MP-0 c1,0,31,32,0
I (112095) ovms-server-v2: Send
V (112365) ovms-server-v2: OvmsServerV2MongooseCallback(MG_EV_CLOSE)
E (112365) ovms-server-v2: Status: Error: Disconnected from OVMS Server V2
OVMS >
Mark Webb-Johnson wrote:
There is data corruption on the second line here:
I (353350698) ovms-server-v2: Send MP-0 c3,0,0,32,
I (353350698) ovms-server-v2: Send (��? c3,0,1,32,********
I (353350708) ovms-server-v2: Send MP-0 c3,0,2,32,
I (353350708) ovms-server-v2: Send MP-0 c3,0,3,32,
and on subsequent lines, more.
When the server receives that, it will disconnect the client.
That diagnostic logging is output before encryption, so it is something about the setup of std::ostringstream* buffer in ovms_server_v2.cpp OvmsServerV2::ProcessCommand(). Look at command case 3 (request parameter list).
It seems that the std::ostringstream version of that code is only used in OvmsServerV2::ProcessCommand (the others use the string variant of it).
Can you try changing those three occurrences to Transmit(buffer->str().c_str()) instead of Transmit(*buffer)?
Regards, Mark.
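The change Mark suggests can be tried in isolation. What follows is only a minimal sketch, not the firmware code: Transmit() here is a hypothetical stand-in for the string variant in ovms_server_v2.cpp, the "MP-0 " prefix is simply mirrored from the log output, and SendParameterRecord() and g_sent are invented for illustration. It shows the record being built in a std::ostringstream and handed over as a plain C string.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the string variant of Transmit(): it copies the
// payload straight away, so the caller's buffer may be destroyed afterwards.
static std::vector<std::string> g_sent;

void Transmit(const char* message) {
  g_sent.push_back(std::string("MP-0 ") + message);
}

// Builds one parameter-list record such as "c3,0,5,32,hologram" and sends it.
void SendParameterRecord(int index, int total, const std::string& value) {
  std::ostringstream stream;
  std::ostringstream* buffer = &stream;  // pointer, as in the thread
  *buffer << "c3,0," << index << "," << total << "," << value;

  // str() returns a temporary std::string. It stays alive until the end of
  // this full expression, which is long enough because Transmit() copies.
  Transmit(buffer->str().c_str());
}
```

Keeping the pointer returned by c_str() beyond that statement would not be safe, since the temporary string is gone by then.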
I’ve tried pulling manually. See if that fixes it.
Best way, in general, is for you to send a pull request to us. That way we see the changes you want to incorporate and can merge them in simply.
Regards, Mark.
On 30 Jan 2018, at 10:58 AM, Greg D. <gregd2350@gmail.com> wrote:
Not the PC - it's fine, and in sync with my fork on Github. It's my fork on Github that says that it's 2 commits ahead. One of those is an empty remote-tracking commit from the last fetch; the other is the c_str() change to the v2 server... Did you actually sync my fork to Master, or just make the same changes locally?
Same changes, I guess, but if my fork isn't formally sync'd, I seem to keep accumulating those remote tracking commits, which becomes rather annoying. Should I be pushing directly to Master instead? (It's admittedly safer to not change this...).
Greg
I suspect that this is timing related:
OVMS > wifi mode client XXX
Starting WIFI as a client to XXX…
…
I (337100) ssh: Launching SSH Server
V (338830) ssh: Event MG_EV_ACCEPT conn 0x3ffee8b4, data 0x3ffee8cc
V (338840) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 21
V (338840) ssh: Event MG_EV_SEND conn 0x3ffee8b4, data 0x3ffe5d90
V (338840) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 1024
V (338850) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 408
…
OVMS > wifi mode off
Stopping wifi station...
I (350580) ovms-mdns: Stopping MDNS service
I (350580) wifi: state: run -> init (0)
I (350590) wifi: pm stop, total sleep time: 0/11930449
I (350590) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1
I (350600) wifi: flush txq
I (350600) wifi: stop sw txq
I (350600) wifi: lmac stop hw txq
I (350600) wifi: Deinit lldesc rx mblock:4
I (351340) webserver: Stopping Web Server
I (351340) ssh: Stopping SSH Server
V (351540) ssh: Event MG_EV_CLOSE conn 0x3ffee8b4, data 0x0
V (351540) ssh: Event MG_EV_CLOSE conn 0x3ffeb468, data 0x0
I (357570) event: station ip lost
But ssh client connection is still up on my workstation. Looks like the MG_EV_CLOSE events came in after the SSH server was stopped.
I repeated the test, but with event logging on:
OVMS > wifi mode off
Stopping wifi station...
I (34171) events: Signal(system.wifi.down)
I (34171) events: Signal(network.wifi.down)
I (34171) ovms-mdns: Stopping MDNS service
I (34171) events: Signal(network.reconfigured)
I (34171) events: Signal(network.down)
I (34171) wifi: state: run -> init (0)
I (34181) wifi: pm stop, total sleep time: 0/9603569
I (34181) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1
I (34191) wifi: flush txq
I (34191) wifi: stop sw txq
I (34191) wifi: lmac stop hw txq
I (34191) wifi: Deinit lldesc rx mblock:4
I (34191) events: Signal(system.wifi.sta.disconnected)
I (34191) events: Signal(system.wifi.sta.stop)
I (35131) events: Signal(network.mgr.stop)
I (35131) webserver: Stopping Web Server
I (35131) ssh: Stopping SSH Server
V (35331) ssh: Event MG_EV_CLOSE conn 0x3ffec138, data 0x0
V (35331) ssh: Event MG_EV_CLOSE conn 0x3ffeb768, data 0x0
MDNS exits quicker because it responds to the network.wifi.down message, while ssh uses the network.mgr.stop event. In a clean shutdown there is a difference.
Perhaps we should issue network.mgr.stop earlier? However, even if we close the connections from our end, I’m not sure if the packet would make it through to the other end before the wifi disconnects.
Regards, Mark.
I am just suggesting we clean up that clean-shutdown case. If the network is being shut down (either on command, or via script), we can at least do it cleanly. We do get such an indication (shutting down event, followed by wifi shutdown, followed by shut down event). We should, in general, shut down our connection on the indication-to-shutdown event, rather than the already-shutdown event (as that at least allows some chance to inform the remote end of the issue). Perhaps the ‘wifi mode off’ command should sleep for a second or two, after issuing the indication-to-shutdown event, to increase the chance of a clean shutdown?
For the other case of connections being externally dropped, I’m trying to see how the ESP IDF v3 framework handles it. Espressif have changed things slightly, and there is now a ‘lost ip’ timer that signals the IP address being lost after a timeout; I think we can pick up on that event to shut things down. That is SYSTEM_EVENT_STA_LOST_IP, but wasn’t available in ESP IDF v2.1 (and is now available in ESP IDF v3 but we don’t use it). I am not sure how that differs from the SYSTEM_EVENT_STA_DISCONNECTED system event (which is the one we handle at the moment).
In general, I don’t think our handling of these events is optimal at the moment. We have a variety of low-level events from things like the wifi driver, and then high-level network manager events. But things like console_ssh should probably only bring up the server in the case of a wifi connection (not simcom modem ppp). I see mdns uses the low-level wifi events, but console_ssh uses the high-level network manager events (because it integrates to mongoose so can’t bring up the server until mongoose is ready to be initialised).
In the normal socket comms case, a write() on a disconnected socket will fail and indicate appropriately. But with the comms going through mongoose, I am not sure how exactly things are handled. When we mg_send() data, it is merely added to an output buffer and no socket write occurs at that time so we won’t be getting back an error indication. In fact, the prototype for mg_send is a void return:
  void mg_send(struct mg_connection *, const void *buf, int len);
I assume that if mongoose calls write() on the socket, and an error indication comes back, then it closes the connection. In mg_write_to_socket(), I see that kind of logic (but haven’t traced it myself):
  if (n > 0) {
    mg_if_sent_cb(nc, n);
  } else if (n < 0 && mg_is_error(n)) {
    /* Something went wrong, drop the connection. */
    nc->flags |= MG_F_CLOSE_IMMEDIATELY;
  }
That should deliver an MG_EV_CLOSE to each of the active connections.
I’ll have to look at it in more detail. Just trying to get the final testing of v3.1 hardware done first.
Regards, Mark.
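The buffering behaviour described here can be illustrated with a toy model. This is not Mongoose code and does not use its API: BufferedConnection, Send(), Poll() and the writer callback are hypothetical names, chosen only to show why the caller of a buffered send never sees the error directly and has to rely on a later close notification.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Toy model (not Mongoose): Send() only appends to an output buffer. The
// real write happens later in Poll(), the only place an error can surface.
class BufferedConnection {
 public:
  // writer returns the number of bytes written, or a negative value on error.
  explicit BufferedConnection(std::function<int(const std::string&)> writer)
      : writer_(std::move(writer)) {}

  // Returns void, like a buffered send: there is nothing to report yet.
  void Send(const std::string& data) { out_ += data; }

  // Called from the event loop: try to flush, mark closed on write error.
  void Poll() {
    if (closed_ || out_.empty()) return;
    int n = writer_(out_);
    if (n > 0) {
      out_.erase(0, static_cast<std::size_t>(n));
    } else if (n < 0) {
      closed_ = true;  // the owner would be told via a close event here
    }
  }

  bool closed() const { return closed_; }
  std::size_t pending() const { return out_.size(); }

 private:
  std::function<int(const std::string&)> writer_;
  std::string out_;
  bool closed_ = false;
};
```

In this model a client that keeps calling Send() on a dead link looks healthy until the next Poll(), which matches the symptom of the module logging transmissions that the server never receives.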
On 29 Jan 2018, at 9:07 AM, Tom Parker <tom@carrott.org> wrote:
On 29/01/18 13:29, Mark Webb-Johnson wrote:
Perhaps we should issue network.mgr.stop earlier? However, even if we close the connections from our end, I’m not sure if the packet would make it through to the other end before the wifi disconnects.
I don't think we need to address the situation where the server does not get informed of the disconnection -- the disconnection could happen before we know it has happened preventing us from signaling the server. This could happen simply by driving away from the wifi network.
What we need to address is informing the client inside the vehicle module that it is no longer connected. In the case where we know the network has gone down, we can have the OVMS process listen for the event and take appropriate actions. In the case where the network is up but the socket is broken somewhere in the network and we get a RST response or we never get an ACK, we should signal the client. It's been a long time since I dealt with raw sockets and tried to handle all the edge cases so I'll have to defer to others with more knowledge of the APIs to suggest how it should work. The v2 client is good in that it transmits periodically (which should eliminate the need for TCP level keepalive packets on all but the most aggressive networks) so the socket library has an opportunity to detect the broken socket and tell the client code that the socket isn't working.
I wrote an OVMS v2 python client which suffers from the same problem, every now and again it needs to be restarted because it thinks it is connected but it isn't. Unlike the vehicle module, this client never writes to the socket after login but I'm a little surprised the OS doesn't eventually say "this socket is broken" or something, I really should work out how to fix it (maybe by restarting if there has been no data for some number of minutes).
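The "restart if there has been no data for some number of minutes" idea could look roughly like the sketch below. It is written in C++ to match the module code rather than the Python client, and IdleWatchdog and its method names are hypothetical. The time points are passed in explicitly so the logic can be exercised without waiting.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: remembers when data last arrived and reports whether
// the connection has been silent for longer than the configured limit.
class IdleWatchdog {
 public:
  using Clock = std::chrono::steady_clock;

  explicit IdleWatchdog(std::chrono::seconds limit,
                        Clock::time_point now = Clock::now())
      : limit_(limit), last_rx_(now) {}

  // Call whenever any data is received from the server.
  void OnDataReceived(Clock::time_point now = Clock::now()) { last_rx_ = now; }

  // True once the silence exceeds the limit; the caller would then drop the
  // socket and reconnect.
  bool ShouldReconnect(Clock::time_point now = Clock::now()) const {
    return now - last_rx_ > limit_;
  }

 private:
  std::chrono::seconds limit_;
  Clock::time_point last_rx_;
};
```

This only helps if the server is expected to send something within the limit, so the value would have to be chosen with the server's behaviour in mind.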
Mark,
Back in November we discussed this problem of the client's half of the connection being left open. Then as now, you commented that mongoose sends a close event (MG_EV_CLOSE) for each open connection when the interface went down.
It does send a close event, but not until after the wifi is already shut down so closing the socket at that point does not send any packet to the client. I decided to punt on that issue, though, because manually shutting down wifi is not an important use case. The more likely case is that wifi connectivity is lost due to motion or other causes and in that case no close packet can be delivered anyway.
I interpreted Michael's message to be referring to a problem with ports on the server (OVMS) end. That is, to say that mongoose didn't clean up LWIP state properly. Michael, can you explain a bit more?
-- Steve
mg_bind() fails with "failed to open listener". I haven't traced it further yet, assumed it's the bind() that fails.
Mongoose (now?) uses the standard socket API, so does not need to handle LWIP details, or does it?
I can do some more debugging to get the exact point of failure.
Regards, Michael
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
The bind has to occur in the network.mgr.init handler. That is the point between mg_mgr_init and mg_mgr_poll.
Perhaps in the handler we can check m_connected_wifi and only bind if wifi is up? But we would also need to pick up network.reconfigured and check m_connected_wifi there as well.
Regards, Mark.
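One way to picture that suggestion is a small decision helper fed by the two events. This is only a sketch under assumptions: ListenerGate, Action and the handler names are hypothetical, wifi_up stands in for the m_connected_wifi flag, and the actual bind or unbind call is left to the caller.

```cpp
#include <cassert>

// What the caller should do with its listening socket after an event.
enum class Action { None, Bind, Unbind };

// Hypothetical helper: bind only while the network manager is initialised
// and wifi is up, re-evaluating on every reconfiguration.
class ListenerGate {
 public:
  // network.mgr.init: the manager is ready, bind if wifi is already up.
  Action OnMgrInit(bool wifi_up) {
    mgr_ready_ = true;
    return Evaluate(wifi_up);
  }

  // network.reconfigured: wifi may have come up or gone down.
  Action OnReconfigured(bool wifi_up) { return Evaluate(wifi_up); }

  // network.mgr.stop: the manager is going away, the listener cannot stay.
  Action OnMgrStop() {
    mgr_ready_ = false;
    return Evaluate(false);
  }

 private:
  Action Evaluate(bool wifi_up) {
    const bool want = mgr_ready_ && wifi_up;
    if (want && !bound_) { bound_ = true; return Action::Bind; }
    if (!want && bound_) { bound_ = false; return Action::Unbind; }
    return Action::None;
  }

  bool mgr_ready_ = false;
  bool bound_ = false;
};
```

Tracking the bound state in one place also avoids binding twice when network.reconfigured fires repeatedly while wifi stays up.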
On 29 Jan 2018, at 6:11 PM, Michael Balzer <dexter@expeedo.de> wrote:
mg_bind() fails with "failed to open listener". I haven't traced it further yet, assumed it's the bind() that fails.
Mongoose (now?) uses the standard socket API, so does not need to handle LWIP details, or does it?
I can do some more debugging to get the exact point of failure.
Regards, Michael
On 29.01.2018 at 02:28, Stephen Casner wrote:
Mark,
Back in November we discussed this problem of the client's half of the connection being left open. Then as now, you commented that mongoose sends a close event (MG_EV_CLOSE) for each open connection when the interface goes down.
It does send a close event, but not until after the wifi is already shut down so closing the socket at that point does not send any packet to the client. I decided to punt on that issue, though, because manually shutting down wifi is not an important use case. The more likely case is that wifi connectivity is lost due to motion or other causes and in that case no close packet can be delivered anyway.
I interpreted Michael's message to be referring to a problem with ports on the server (OVMS) end. That is, to say that mongoose didn't clean up LWIP state properly. Michael, can you explain a bit more?
-- Steve
On Mon, 29 Jan 2018, Mark Webb-Johnson wrote:
I suspect that this is timing related:
OVMS > wifi mode client XXX Starting WIFI as a client to XXX… … I (337100) ssh: Launching SSH Server V (338830) ssh: Event MG_EV_ACCEPT conn 0x3ffee8b4, data 0x3ffee8cc V (338840) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 21 V (338840) ssh: Event MG_EV_SEND conn 0x3ffee8b4, data 0x3ffe5d90 V (338840) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 1024 V (338850) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 408 … OVMS > wifi mode off Stopping wifi station... I (350580) ovms-mdns: Stopping MDNS service I (350580) wifi: state: run -> init (0) I (350590) wifi: pm stop, total sleep time: 0/11930449 I (350590) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1 I (350600) wifi: flush txq I (350600) wifi: stop sw txq I (350600) wifi: lmac stop hw txq I (350600) wifi: Deinit lldesc rx mblock:4 I (351340) webserver: Stopping Web Server I (351340) ssh: Stopping SSH Server V (351540) ssh: Event MG_EV_CLOSE conn 0x3ffee8b4, data 0x0 V (351540) ssh: Event MG_EV_CLOSE conn 0x3ffeb468, data 0x0 I (357570) event: station ip lost
But the ssh client connection is still up on my workstation. Looks like the MG_EV_CLOSE events came in after the SSH server was stopped.
I repeated the test, but with event logging on:
OVMS > wifi mode off Stopping wifi station... I (34171) events: Signal(system.wifi.down) I (34171) events: Signal(network.wifi.down) I (34171) ovms-mdns: Stopping MDNS service I (34171) events: Signal(network.reconfigured) I (34171) events: Signal(network.down) I (34171) wifi: state: run -> init (0) I (34181) wifi: pm stop, total sleep time: 0/9603569 I (34181) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1 I (34191) wifi: flush txq I (34191) wifi: stop sw txq I (34191) wifi: lmac stop hw txq I (34191) wifi: Deinit lldesc rx mblock:4 I (34191) events: Signal(system.wifi.sta.disconnected) I (34191) events: Signal(system.wifi.sta.stop) I (35131) events: Signal(network.mgr.stop) I (35131) webserver: Stopping Web Server I (35131) ssh: Stopping SSH Server V (35331) ssh: Event MG_EV_CLOSE conn 0x3ffec138, data 0x0 V (35331) ssh: Event MG_EV_CLOSE conn 0x3ffeb768, data 0x0
MDNS exits quicker because it responds to the network.wifi.down message, while ssh uses the network.mgr.stop event. In a clean shutdown there is a difference.
Perhaps we should issue network.mgr.stop earlier? However, even if we close the connections from our end, I’m not sure if the packet would make it through to the other end before the wifi disconnects.
Regards, Mark.
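The size of that gap can be read off the event log directly. A small Python sketch, assuming the number in parentheses is the ESP-IDF log timestamp in milliseconds; `entries` and `first_ms` are illustrative helpers, and LOG is a short excerpt of the capture above:

```python
import re

# One entry looks like: "I (34171) events: Signal(network.wifi.down)".
# Entries are separated by a space in the flattened capture.
ENTRY = re.compile(r"[EWIDV] \((\d+)\) ([\w-]+): (.*?)(?= [EWIDV] \(\d+\) |$)")

LOG = (
    "I (34171) events: Signal(network.wifi.down) "
    "I (34171) ovms-mdns: Stopping MDNS service "
    "I (35131) events: Signal(network.mgr.stop) "
    "I (35131) ssh: Stopping SSH Server "
    "V (35331) ssh: Event MG_EV_CLOSE conn 0x3ffec138, data 0x0"
)


def entries(log):
    """Split a flattened log into (timestamp_ms, tag, message) tuples."""
    return [(int(ts), tag, msg) for ts, tag, msg in ENTRY.findall(log)]


def first_ms(parsed, tag, needle):
    """Timestamp of the first entry with this tag whose message contains needle."""
    for ts, entry_tag, msg in parsed:
        if entry_tag == tag and needle in msg:
            return ts
    raise LookupError(f"no {tag!r} entry containing {needle!r}")


parsed = entries(LOG)
wifi_down = first_ms(parsed, "events", "network.wifi.down")
mgr_stop = first_ms(parsed, "events", "network.mgr.stop")
ev_close = first_ms(parsed, "ssh", "MG_EV_CLOSE")

print("wifi.down -> mgr.stop:   ", mgr_stop - wifi_down, "ms")   # 960 ms
print("wifi.down -> MG_EV_CLOSE:", ev_close - wifi_down, "ms")   # 1160 ms
```

So in this capture the ssh close events arrive more than a second after the wifi stack has already been torn down, which fits the client never seeing a close.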
I've added an option to send the mongoose debug log output to the ESP log framework and added a debug log to the if_destroy function.
There was some talk about not changing the original mongoose files / maybe doing an OVMS fork, so I won't check the patch in. Diff attached.
If you apply the patch, you'll need log level verbose for full debug details like these:
I (14915) wifi: connected with devolo-f4068d73a03e, channel 11 I (14915) events: Signal(system.wifi.sta.connected) I (14945) event: sta ip: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 I (14945) events: Signal(system.wifi.sta.gotip) I (14945) events: Signal(network.wifi.up) I (14945) ovms-mdns: Launching MDNS service I (14955) events: Signal(network.up) I (14955) esp32wifi: WiFi UP with SSID: devolo-f4068d73a03e, MAC: 30:ae:a4:37:25:88, IP: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 V (14965) mongoose: mg_socket_if_init 0x3ffb7348 using select() V (14965) mongoose: mg_mgr_init_opt ================================== V (14965) mongoose: mg_mgr_init_opt init mgr=0x3ffb7348 I (14965) events: Signal(network.mgr.init) I (14965) webserver: Launching Web Server V (14995) mongoose: mg_socket_if_sock_se 0x3ffe068c 8192 V (14995) mongoose: mg_add_conn 0x3ffb7348 0x3ffe068c I (14995) telnet: Launching Telnet Server V (15025) mongoose: mg_socket_if_sock_se 0x3ffe58f0 8193 V (15025) mongoose: mg_add_conn 0x3ffb7348 0x3ffe58f0 V (15495) simcom: tx: 41 54 0d 0a | AT.. D (15495) simcom: tx scmd ch=0 len=4 : AT|| I (15495) simcom: State timeout, transition to 13 I (15495) simcom: State: Enter PoweredOff state I (17905) wifi: pm start, type:0
Now doing an HTTP GET /home from the browser:
V (35485) mongoose: mg_mgr_handle_conn 0x3ffe068c fd=8192 fd_flags=1 nc_flags=1 rmbl=0 smbl=0 V (35485) mongoose: mg_add_conn 0x3ffb7348 0x3ffe97e0 V (35495) mongoose: mg_if_accept_new_con 0x3ffe068c 0x3ffe97e0 -1 0 V (35495) mongoose: mg_accept_conn 0x3ffe97e0 conn from 192.168.2.103:33574 V (35505) mongoose: mg_socket_if_sock_se 0x3ffe97e0 8194 V (35505) mongoose: mg_call 0x3ffe97e0 proto ev=1 ev_data=0x3ffe97f8 flags=0 rmbl=0 smbl=0 V (35505) mongoose: mg_call 0x3ffe97e0 user ev=1 ev_data=0x3ffe97f8 flags=0 rmbl=0 smbl=0 D (35505) webserver: EventHandler: conn=0x3ffe97e0 ev=1 p=0x3ffe97f8 V (35505) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0 V (35505) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0 V (35505) mongoose: mg_mgr_handle_conn 0x3ffe068c after fd=8192 nc_flags=1 rmbl=0 smbl=0 V (35505) mongoose: mg_mgr_handle_conn 0x3ffe97e0 fd=8194 fd_flags=1 nc_flags=0 rmbl=0 smbl=0 V (35515) mongoose: mg_handle_tcp_read 0x3ffe97e0 350 bytes (PLAIN) <- 8194 V (35515) mongoose: mg_recv_common 0x3ffe97e0 350 0 V (35515) mongoose: mg_call 0x3ffe97e0 proto ev=3 ev_data=0x3ffe95d0 flags=0 rmbl=350 smbl=0 V (35515) mongoose: mg_call 0x3ffe97e0 user ev=3 ev_data=0x3ffe95d0 flags=0 rmbl=350 smbl=0 D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=3 p=0x3ffe95d0 V (35515) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=0 V (35515) mongoose: mg_http_handler2 0x3ffe97e0 192.168.2.103:33574 GET /home V (35515) mongoose: mg_call 0x3ffe97e0 user ev=102 ev_data=0x3ffe9400 flags=0 rmbl=350 smbl=0 D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=102 p=0x3ffe9400 V (35515) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=0 V (35515) mongoose: mg_call 0x3ffe97e0 user ev=100 ev_data=0x3ffe9400 flags=0 rmbl=350 smbl=0 D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=100 p=0x3ffe9400 I (35515) webserver: HTTP GET /home V (35525) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=740 V (35525) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=740 V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 after fd=8194 nc_flags=0 rmbl=0 smbl=740 V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 fd=8194 fd_flags=2 nc_flags=0 rmbl=0 smbl=740 V (35525) mongoose: mg_write_to_socket 0x3ffe97e0 740 bytes -> 8194 V (35525) mongoose: mg_if_sent_cb 0x3ffe97e0 740 V (35525) mongoose: mg_call 0x3ffe97e0 proto ev=4 ev_data=0x3ffe95e0 flags=0 rmbl=0 smbl=0 V (35525) mongoose: mg_call 0x3ffe97e0 user ev=4 ev_data=0x3ffe95e0 flags=0 rmbl=0 smbl=0 D (35525) webserver: EventHandler: conn=0x3ffe97e0 ev=4 p=0x3ffe95e0 V (35525) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0 V (35525) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0 V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 after fd=8194 nc_flags=0 rmbl=0 smbl=0
OVMS > wifi mode off Stopping wifi station... I (41145) events: Signal(system.wifi.down) I (41145) events: Signal(network.wifi.down) I (41145) events: Signal(network.reconfigured) I (41145) events: Signal(network.down) I (41155) wifi: state: run -> init (0) I (41155) wifi: pm stop, total sleep time: 0/23262330 I (41155) wifi: n:11 0, o:11 2, ap:255 255, sta:11 2, prof:1 I (41165) wifi: flush txq I (41165) wifi: stop sw txq I (41165) wifi: lmac stop hw txq I (41165) wifi: Deinit lldesc rx mblock:4 I (41165) events: Signal(system.wifi.sta.disconnected) I (41175) events: Signal(system.wifi.sta.stop) I (41525) events: Signal(network.mgr.stop) I (41525) webserver: Stopping Web Server I (41525) telnet: Stopping Telnet Server V (41535) mongoose: mg_mgr_free 0x3ffb7348 V (41535) mongoose: mg_close_conn 0x3ffe97e0 0 8194 V (41535) mongoose: mg_socket_if_destroy nc=0x3ffe97e0 sock=8194 flags=0 V (41545) mongoose: mg_call 0x3ffe97e0 proto ev=5 ev_data=0x0 flags=0 rmbl=0 smbl=0 V (41545) mongoose: mg_call 0x3ffe97e0 user ev=5 ev_data=0x0 flags=0 rmbl=0 smbl=0 D (41545) webserver: EventHandler: conn=0x3ffe97e0 ev=5 p=0x0 V (41545) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0 V (41545) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0 V (41545) mongoose: mg_close_conn 0x3ffe58f0 1 8193 V (41545) mongoose: mg_socket_if_destroy nc=0x3ffe58f0 sock=8193 flags=1 V (41555) mongoose: mg_call 0x3ffe58f0 user ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0 V (41555) mongoose: mg_call 0x3ffe58f0 after user flags=1 rmbl=0 smbl=0 V (41555) mongoose: mg_close_conn 0x3ffe068c 1 8192 V (41555) mongoose: mg_socket_if_destroy nc=0x3ffe068c sock=8192 flags=1 V (41565) mongoose: mg_call 0x3ffe068c proto ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0 V (41565) mongoose: mg_call 0x3ffe068c user ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0 D (41565) webserver: EventHandler: conn=0x3ffe068c ev=5 p=0x0 V (41565) mongoose: mg_call 0x3ffe068c after user flags=1 rmbl=0 smbl=0 V (41565) mongoose: mg_call 0x3ffe068c after proto flags=1 rmbl=0 smbl=0
… doesn't look wrong so far, the sockets get closed correctly. mg_socket_if_destroy calls closesocket = lwip_close_r, so everything _should_ be ok… but…
OVMS > wifi mode client Starting WIFI as a client for any defined SSID I (47225) wifi: wifi firmware version: 403db1d I (47225) wifi: config NVS flash: enabled I (47225) wifi: config nano formating: disabled I (47225) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE I (47225) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE I (47245) wifi: Init dynamic tx buffer num: 16 I (47245) wifi: Init data frame dynamic rx buffer num: 16 I (47245) wifi: Init management frame dynamic rx buffer num: 16 I (47245) wifi: wifi driver task: 3ffe251c, prio:23, stack:4096 I (47245) wifi: Init static rx buffer num: 4 I (47245) wifi: Init dynamic rx buffer num: 16 I (47245) wifi: wifi power manager task: 0x3ffe5a9c prio: 21 stack: 2560 I (47255) wifi: mode : sta (30:ae:a4:37:25:88) I (47255) events: Signal(system.wifi.sta.start) W (50495) wifi: incorrect scan type: 1073541416 I (52905) events: Signal(system.wifi.scan.done) I (52905) esp32wifi: Found SSID devolo-f4068d73a03e - trying to connect I (54235) wifi: n:11 2, o:1 0, ap:255 255, sta:11 2, prof:1 I (54885) wifi: state: init -> auth (b0) I (54895) wifi: state: auth -> assoc (0) I (54905) wifi: state: assoc -> run (10) I (54915) wifi: connected with devolo-f4068d73a03e, channel 11 I (54915) events: Signal(system.wifi.sta.connected) I (54945) event: sta ip: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 I (54955) events: Signal(system.wifi.sta.gotip) I (54955) events: Signal(network.wifi.up) I (54955) ovms-mdns: Launching MDNS service I (54955) events: Signal(network.up) I (54965) esp32wifi: WiFi UP with SSID: devolo-f4068d73a03e, MAC: 30:ae:a4:37:25:88, IP: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 V (54965) mongoose: mg_socket_if_init 0x3ffb7348 using select() V (54965) mongoose: mg_mgr_init_opt ================================== V (54965) mongoose: mg_mgr_init_opt init mgr=0x3ffb7348 I (54965) events: Signal(network.mgr.init) I (54965) webserver: Launching Web Server V (54995) mongoose: mg_bind_opt Failed to open listener: 112 V (54995) mongoose: mg_socket_if_destroy nc=0x3ffe996c sock=-1 flags=1 E (54995) webserver: Cannot bind to port 80: failed to open listener I (54995) telnet: Launching Telnet Server V (55025) mongoose: mg_socket_if_sock_se 0x3ffe98c8 8196 V (55025) mongoose: mg_add_conn 0x3ffb7348 0x3ffe98c8 I (57905) wifi: pm start, type:0
… so it's indeed 112 = EADDRINUSE as assumed.
As mongoose does everything right, this seems to be an LWIP issue? Mark, Steve?
On 29.01.2018 at 14:17, Mark Webb-Johnson wrote:
The bind has to occur in the network.mgr.init handler. That is the point between mg_mgr_init and mg_mgr_poll.
Perhaps in the handler we can check m_connected_wifi and only bind if wifi is up? But would also need to pickup network.reconfigured and check m_connected_wifi there as well.
The webserver NetManInit function is already triggered on network.mgr.init and checks m_connected_wifi. Regards, Michael
Strange. I don’t have the debugging enabled, but this is what I get:
OVMS > wifi mode client Starting WIFI as a client for any defined SSID I (108251) wifi: wifi firmware version: 403db1d I (108251) wifi: config NVS flash: enabled … I (114861) wifi: connected with STUBBY, channel 11 I (116331) event: sta ip: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62 I (116331) ovms-mdns: Launching MDNS service I (116331) esp32wifi: WiFi UP with SSID: STUBBY, MAC: 30:ae:a4:80:8d:b0, IP: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62 I (116331) webserver: Launching Web Server I (116351) ssh: Launching SSH Server I (117841) wifi: pm start, type:0 I (118881) webserver: HTTP GET / I (120201) webserver: HTTP GET / I (120991) webserver: HTTP GET / OVMS > wifi mode off Stopping wifi station... I (123821) wifi: state: run -> init (0) I (123821) wifi: pm stop, total sleep time: 0/5979789 I (123821) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1 I (123841) wifi: flush txq I (123841) wifi: stop sw txq I (123841) wifi: lmac stop hw txq I (123841) wifi: Deinit lldesc rx mblock:4 I (124121) webserver: Stopping Web Server I (124121) ssh: Stopping SSH Server OVMS > wifi mode client Starting WIFI as a client for any defined SSID I (126991) wifi: wifi firmware version: 403db1d ... I (132851) esp32wifi: Found SSID STUBBY - trying to connect I (134181) wifi: n:11 0, o:1 0, ap:255 255, sta:11 0, prof:1 I (134831) wifi: state: init -> auth (b0) I (134831) wifi: state: auth -> assoc (0) I (134841) wifi: state: assoc -> run (10) I (134861) wifi: connected with STUBBY, channel 11 I (135571) event: sta ip: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62 I (135571) ovms-mdns: Launching MDNS service I (135571) esp32wifi: WiFi UP with SSID: STUBBY, MAC: 30:ae:a4:80:8d:b0, IP: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62 I (135571) webserver: Launching Web Server I (135591) ssh: Launching SSH Server I (137071) webserver: HTTP GET / I (137841) wifi: pm start, type:0 I (140001) webserver: HTTP GET /
Tried it 10 times to be certain.
Are you on the master branch of https://github.com/openvehicles/esp-idf.git and up to date?
There are a bunch of menuconfig options in Components/LWIP. This is what I have:
    [ ] Enable copy between Layer2 and Layer3 packets
    (10) Max number of open sockets
    [ ] Enable SO_REUSEADDR option
    [ ] Enable SO_RCVBUF option
    (1) Maximum number of NTP servers
    [ ] Enable fragment outgoing IP packets
    [ ] Enable reassembly incoming fragmented IP packets
    [ ] Enable LWIP statistics
    [*] Enable LWIP ARP trust
    (32) TCPIP task receive mail box size
    [ ] DHCP: Perform ARP check on any offered address
    DHCP server --->
    [ ] Enable IPV4 Link-Local Addressing (AUTOIP) ----
    [ ] Support per-interface loopback ----
    TCP --->
    UDP --->
    (2560) TCP/IP Task Stack Size
    [*] Enable PPP support (new/experimental) --->
    ICMP --->
    LWIP RAW API --->
The SO_REUSEADDR option is relevant, but not enabled in my config. It does seem like mongoose is trying to use it. I wonder what LWIP default behaviour is in that case.
Regards, Mark.
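To look at the failure mode in isolation, here is a minimal Python sketch for a desktop TCP stack, not LWIP and not mongoose. `try_listen` is a hypothetical helper. It shows a bind to a port that is still held failing with EADDRINUSE, and SO_REUSEADDR being set before bind(). The numeric errno is platform-specific (112 in the ESP32 log earlier in the thread; other systems use other values), so the sketch compares against errno.EADDRINUSE instead of a literal number.

```python
import errno
import socket


def try_listen(port, reuse_addr=False, host="127.0.0.1"):
    """Try to open a TCP listener; return (socket, 0) or (None, errno)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse_addr:
        # The option has to be set before bind() to have any effect.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind((host, port))
        sock.listen(1)
    except OSError as exc:
        sock.close()
        return None, exc.errno
    return sock, 0


if __name__ == "__main__":
    first, _ = try_listen(0)             # port 0: let the OS pick a free port
    port = first.getsockname()[1]
    second, err = try_listen(port)       # port is still held by `first`
    print("second bind:", errno.errorcode.get(err, err))
    first.close()                        # release the listener
    third, err = try_listen(port, reuse_addr=True)
    print("after close:", "ok" if third else errno.errorcode.get(err, err))
    if third:
        third.close()
```

On the module the corresponding knob would be the "Enable SO_REUSEADDR option" entry in the listing above; whether enabling it clears the bind failure there is not something this sketch can show.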
On 30 Jan 2018, at 5:51 AM, Michael Balzer <dexter@expeedo.de> wrote:
I've added an option to send the mongoose debug log output to the ESP log framework and add a debug log to the if_destroy function.
There was some talk about not changing the original mongoose files / maybe doing an OVMS fork, so I won't check the patch in. Diff attached.
If you apply the patch: you'll need log level verbose for full debug details like these:
I (14915) wifi: connected with devolo-f4068d73a03e, channel 11 I (14915) events: Signal(system.wifi.sta.connected) I (14945) event: sta ip: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 I (14945) events: Signal(system.wifi.sta.gotip) I (14945) events: Signal(network.wifi.up) I (14945) ovms-mdns: Launching MDNS service I (14955) events: Signal(network.up) I (14955) esp32wifi: WiFi UP with SSID: devolo-f4068d73a03e, MAC: 30:ae:a4:37:25:88, IP: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 V (14965) mongoose: mg_socket_if_init 0x3ffb7348 using select() V (14965) mongoose: mg_mgr_init_opt ================================== V (14965) mongoose: mg_mgr_init_opt init mgr=0x3ffb7348 I (14965) events: Signal(network.mgr.init) I (14965) webserver: Launching Web Server V (14995) mongoose: mg_socket_if_sock_se 0x3ffe068c 8192 V (14995) mongoose: mg_add_conn 0x3ffb7348 0x3ffe068c I (14995) telnet: Launching Telnet Server V (15025) mongoose: mg_socket_if_sock_se 0x3ffe58f0 8193 V (15025) mongoose: mg_add_conn 0x3ffb7348 0x3ffe58f0 V (15495) simcom: tx: 41 54 0d 0a | AT.. D (15495) simcom: tx scmd ch=0 len=4 : AT|| I (15495) simcom: State timeout, transition to 13 I (15495) simcom: State: Enter PoweredOff state I (17905) wifi: pm start, type:0
now doing a HTTP GET /home from the browser:
V (35485) mongoose: mg_mgr_handle_conn 0x3ffe068c fd=8192 fd_flags=1 nc_flags=1 rmbl=0 smbl=0 V (35485) mongoose: mg_add_conn 0x3ffb7348 0x3ffe97e0 V (35495) mongoose: mg_if_accept_new_con 0x3ffe068c 0x3ffe97e0 -1 0 V (35495) mongoose: mg_accept_conn 0x3ffe97e0 conn from 192.168.2.103:33574 V (35505) mongoose: mg_socket_if_sock_se 0x3ffe97e0 8194 V (35505) mongoose: mg_call 0x3ffe97e0 proto ev=1 ev_data=0x3ffe97f8 flags=0 rmbl=0 smbl=0 V (35505) mongoose: mg_call 0x3ffe97e0 user ev=1 ev_data=0x3ffe97f8 flags=0 rmbl=0 smbl=0 D (35505) webserver: EventHandler: conn=0x3ffe97e0 ev=1 p=0x3ffe97f8 V (35505) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0 V (35505) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0 V (35505) mongoose: mg_mgr_handle_conn 0x3ffe068c after fd=8192 nc_flags=1 rmbl=0 smbl=0 V (35505) mongoose: mg_mgr_handle_conn 0x3ffe97e0 fd=8194 fd_flags=1 nc_flags=0 rmbl=0 smbl=0 V (35515) mongoose: mg_handle_tcp_read 0x3ffe97e0 350 bytes (PLAIN) <- 8194 V (35515) mongoose: mg_recv_common 0x3ffe97e0 350 0 V (35515) mongoose: mg_call 0x3ffe97e0 proto ev=3 ev_data=0x3ffe95d0 flags=0 rmbl=350 smbl=0 V (35515) mongoose: mg_call 0x3ffe97e0 user ev=3 ev_data=0x3ffe95d0 flags=0 rmbl=350 smbl=0 D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=3 p=0x3ffe95d0 V (35515) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=0 V (35515) mongoose: mg_http_handler2 0x3ffe97e0 192.168.2.103:33574 GET /home V (35515) mongoose: mg_call 0x3ffe97e0 user ev=102 ev_data=0x3ffe9400 flags=0 rmbl=350 smbl=0 D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=102 p=0x3ffe9400 V (35515) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=0 V (35515) mongoose: mg_call 0x3ffe97e0 user ev=100 ev_data=0x3ffe9400 flags=0 rmbl=350 smbl=0 D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=100 p=0x3ffe9400 I (35515) webserver: HTTP GET /home V (35525) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=740 V (35525) mongoose: 
mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=740 V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 after fd=8194 nc_flags=0 rmbl=0 smbl=740 V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 fd=8194 fd_flags=2 nc_flags=0 rmbl=0 smbl=740 V (35525) mongoose: mg_write_to_socket 0x3ffe97e0 740 bytes -> 8194 V (35525) mongoose: mg_if_sent_cb 0x3ffe97e0 740 V (35525) mongoose: mg_call 0x3ffe97e0 proto ev=4 ev_data=0x3ffe95e0 flags=0 rmbl=0 smbl=0 V (35525) mongoose: mg_call 0x3ffe97e0 user ev=4 ev_data=0x3ffe95e0 flags=0 rmbl=0 smbl=0 D (35525) webserver: EventHandler: conn=0x3ffe97e0 ev=4 p=0x3ffe95e0 V (35525) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0 V (35525) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0 V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 after fd=8194 nc_flags=0 rmbl=0 smbl=0
OVMS > wifi mode off Stopping wifi station... I (41145) events: Signal(system.wifi.down) I (41145) events: Signal(network.wifi.down) I (41145) events: Signal(network.reconfigured) I (41145) events: Signal(network.down) I (41155) wifi: state: run -> init (0) I (41155) wifi: pm stop, total sleep time: 0/23262330 I (41155) wifi: n:11 0, o:11 2, ap:255 255, sta:11 2, prof:1 I (41165) wifi: flush txq I (41165) wifi: stop sw txq I (41165) wifi: lmac stop hw txq I (41165) wifi: Deinit lldesc rx mblock:4 I (41165) events: Signal(system.wifi.sta.disconnected) I (41175) events: Signal(system.wifi.sta.stop) I (41525) events: Signal(network.mgr.stop) I (41525) webserver: Stopping Web Server I (41525) telnet: Stopping Telnet Server V (41535) mongoose: mg_mgr_free 0x3ffb7348 V (41535) mongoose: mg_close_conn 0x3ffe97e0 0 8194 V (41535) mongoose: mg_socket_if_destroy nc=0x3ffe97e0 sock=8194 flags=0 V (41545) mongoose: mg_call 0x3ffe97e0 proto ev=5 ev_data=0x0 flags=0 rmbl=0 smbl=0 V (41545) mongoose: mg_call 0x3ffe97e0 user ev=5 ev_data=0x0 flags=0 rmbl=0 smbl=0 D (41545) webserver: EventHandler: conn=0x3ffe97e0 ev=5 p=0x0 V (41545) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0 V (41545) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0 V (41545) mongoose: mg_close_conn 0x3ffe58f0 1 8193 V (41545) mongoose: mg_socket_if_destroy nc=0x3ffe58f0 sock=8193 flags=1 V (41555) mongoose: mg_call 0x3ffe58f0 user ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0 V (41555) mongoose: mg_call 0x3ffe58f0 after user flags=1 rmbl=0 smbl=0 V (41555) mongoose: mg_close_conn 0x3ffe068c 1 8192 V (41555) mongoose: mg_socket_if_destroy nc=0x3ffe068c sock=8192 flags=1 V (41565) mongoose: mg_call 0x3ffe068c proto ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0 V (41565) mongoose: mg_call 0x3ffe068c user ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0 D (41565) webserver: EventHandler: conn=0x3ffe068c ev=5 p=0x0 V (41565) mongoose: mg_call 0x3ffe068c after user flags=1 rmbl=0 smbl=0 V (41565) mongoose: 
mg_call 0x3ffe068c after proto flags=1 rmbl=0 smbl=0
… doesn't look wrong so far, socket get closed correctly. mg_socket_if_destroy calls closesocket = lwip_close_r, so everything _should_ be ok… but…
OVMS > wifi mode client Starting WIFI as a client for any defined SSID I (47225) wifi: wifi firmware version: 403db1d I (47225) wifi: config NVS flash: enabled I (47225) wifi: config nano formating: disabled I (47225) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE I (47225) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE I (47245) wifi: Init dynamic tx buffer num: 16 I (47245) wifi: Init data frame dynamic rx buffer num: 16 I (47245) wifi: Init management frame dynamic rx buffer num: 16 I (47245) wifi: wifi driver task: 3ffe251c, prio:23, stack:4096 I (47245) wifi: Init static rx buffer num: 4 I (47245) wifi: Init dynamic rx buffer num: 16 I (47245) wifi: wifi power manager task: 0x3ffe5a9c prio: 21 stack: 2560 I (47255) wifi: mode : sta (30:ae:a4:37:25:88) I (47255) events: Signal(system.wifi.sta.start) W (50495) wifi: incorrect scan type: 1073541416 I (52905) events: Signal(system.wifi.scan.done) I (52905) esp32wifi: Found SSID devolo-f4068d73a03e - trying to connect I (54235) wifi: n:11 2, o:1 0, ap:255 255, sta:11 2, prof:1 I (54885) wifi: state: init -> auth (b0) I (54895) wifi: state: auth -> assoc (0) I (54905) wifi: state: assoc -> run (10) I (54915) wifi: connected with devolo-f4068d73a03e, channel 11 I (54915) events: Signal(system.wifi.sta.connected) I (54945) event: sta ip: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 I (54955) events: Signal(system.wifi.sta.gotip) I (54955) events: Signal(network.wifi.up) I (54955) ovms-mdns: Launching MDNS service I (54955) events: Signal(network.up) I (54965) esp32wifi: WiFi UP with SSID: devolo-f4068d73a03e, MAC: 30:ae:a4:37:25:88, IP: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 V (54965) mongoose: mg_socket_if_init 0x3ffb7348 using select() V (54965) mongoose: mg_mgr_init_opt ================================== V (54965) mongoose: mg_mgr_init_opt init mgr=0x3ffb7348 I (54965) events: Signal(network.mgr.init) I (54965) 
webserver: Launching Web Server V (54995) mongoose: mg_bind_opt Failed to open listener: 112 V (54995) mongoose: mg_socket_if_destroy nc=0x3ffe996c sock=-1 flags=1 E (54995) webserver: Cannot bind to port 80: failed to open listener I (54995) telnet: Launching Telnet Server V (55025) mongoose: mg_socket_if_sock_se 0x3ffe98c8 8196 V (55025) mongoose: mg_add_conn 0x3ffb7348 0x3ffe98c8 I (57905) wifi: pm start, type:0
… so it's indeed 112 = EADDRINUSE as assumed.
As mongoose does everything right, this seems to be an LWIP issue? Mark, Steve?
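A note for readers following along: the numeric errno value depends on the C library (112 is what this thread reports on the module; other platforms use a different number), so it is safer to compare against the symbolic EADDRINUSE. Below is a minimal sketch in C, assuming a POSIX host rather than the module itself. The helper names (bind_loopback, bound_port, second_bind_reports_addrinuse) are hypothetical and not part of OVMS, mongoose, or LWIP. It reproduces the same class of error by binding twice to one port:

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to 127.0.0.1:port (port 0 lets the OS pick one).
 * Returns the socket fd, or -1 with errno left as set by socket()/bind(). */
int bind_loopback(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        int saved = errno;      /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Port actually assigned to a bound socket, or 0 on failure. */
uint16_t bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    assert(fd >= 0);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}

/* Returns 1 if a second bind to an already bound port fails with
 * EADDRINUSE, 0 otherwise. */
int second_bind_reports_addrinuse(void)
{
    int first = bind_loopback(0);
    if (first < 0)
        return 0;

    uint16_t port = bound_port(first);
    errno = 0;
    int second = bind_loopback(port);
    int in_use = (second < 0 && errno == EADDRINUSE);

    if (second >= 0)
        close(second);
    close(first);
    return in_use;
}
```

This only shows how the error surfaces through the socket API. It does not explain why LWIP still considers port 80 in use after the old listener was closed, which is the open question here.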
On 29.01.2018 at 14:17, Mark Webb-Johnson wrote:
The bind has to occur in the network.mgr.init handler. That is the point between mg_mgr_init and mg_mgr_poll.
Perhaps in the handler we can check m_connected_wifi and only bind if wifi is up? But would also need to pickup network.reconfigured and check m_connected_wifi there as well.
The webserver NetManInit function is already triggered on network.mgr.init and checks m_connected_wifi.
Regards, Michael
Regards, Mark.
On 29 Jan 2018, at 6:11 PM, Michael Balzer <dexter@expeedo.de> wrote:
mg_bind() fails with "failed to open listener". I haven't traced it further yet; I assumed it's the bind() that fails.
Mongoose (now?) uses the standard socket API, so does not need to handle LWIP details, or does it?
I can do some more debugging to get the exact point of failure.
Regards, Michael
On 29.01.2018 at 02:28, Stephen Casner wrote:
Mark,
Back in November we discussed this problem of the client's half of the connection being left open. Then as now, you commented that mongoose sends a close event (MG_EV_CLOSE) for each open connection when the interface went down.
It does send a close event, but not until after the wifi is already shut down so closing the socket at that point does not send any packet to the client. I decided to punt on that issue, though, because manually shutting down wifi is not an important use case. The more likely case is that wifi connectivity is lost due to motion or other causes and in that case no close packet can be delivered anyway.
I interpreted Michael's message to be referring to a problem with ports on the server (OVMS) end. That is, to say that mongoose didn't clean up LWIP state properly. Michael, can you explain a bit more?
-- Steve
On Mon, 29 Jan 2018, Mark Webb-Johnson wrote:
I suspect that this is timing related:
OVMS > wifi mode client XXX
Starting WIFI as a client to XXX…
…
I (337100) ssh: Launching SSH Server
V (338830) ssh: Event MG_EV_ACCEPT conn 0x3ffee8b4, data 0x3ffee8cc
V (338840) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 21
V (338840) ssh: Event MG_EV_SEND conn 0x3ffee8b4, data 0x3ffe5d90
V (338840) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 1024
V (338850) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 408
…
OVMS > wifi mode off
Stopping wifi station...
I (350580) ovms-mdns: Stopping MDNS service
I (350580) wifi: state: run -> init (0)
I (350590) wifi: pm stop, total sleep time: 0/11930449
I (350590) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1
I (350600) wifi: flush txq
I (350600) wifi: stop sw txq
I (350600) wifi: lmac stop hw txq
I (350600) wifi: Deinit lldesc rx mblock:4
I (351340) webserver: Stopping Web Server
I (351340) ssh: Stopping SSH Server
V (351540) ssh: Event MG_EV_CLOSE conn 0x3ffee8b4, data 0x0
V (351540) ssh: Event MG_EV_CLOSE conn 0x3ffeb468, data 0x0
I (357570) event: station ip lost
But the ssh client connection is still up on my workstation. Looks like the MG_EV_CLOSE events came in after the SSH server was stopped.
I repeated the test, but with event logging on:
OVMS > wifi mode off
Stopping wifi station...
I (34171) events: Signal(system.wifi.down)
I (34171) events: Signal(network.wifi.down)
I (34171) ovms-mdns: Stopping MDNS service
I (34171) events: Signal(network.reconfigured)
I (34171) events: Signal(network.down)
I (34171) wifi: state: run -> init (0)
I (34181) wifi: pm stop, total sleep time: 0/9603569
I (34181) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1
I (34191) wifi: flush txq
I (34191) wifi: stop sw txq
I (34191) wifi: lmac stop hw txq
I (34191) wifi: Deinit lldesc rx mblock:4
I (34191) events: Signal(system.wifi.sta.disconnected)
I (34191) events: Signal(system.wifi.sta.stop)
I (35131) events: Signal(network.mgr.stop)
I (35131) webserver: Stopping Web Server
I (35131) ssh: Stopping SSH Server
V (35331) ssh: Event MG_EV_CLOSE conn 0x3ffec138, data 0x0
V (35331) ssh: Event MG_EV_CLOSE conn 0x3ffeb768, data 0x0
MDNS exits quicker because it responds to the network.wifi.down message, while ssh uses the network.mgr.stop event. In a clean shutdown there is a difference.
Perhaps we should issue network.mgr.stop earlier? However, even if we close the connections from our end, I’m not sure if the packet would make it through to the other end before the wifi disconnects.
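To make the ordering argument concrete, here is a small illustrative sketch in C. It models only the event sequence from the event log above and the two subscriptions named in the message (ovms-mdns on network.wifi.down, ssh on network.mgr.stop). The table layout and the function stop_position are hypothetical and are not the OVMS event API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Events in the order they were signalled during "wifi mode off",
 * copied from the event log above. */
static const char *const shutdown_events[] = {
    "system.wifi.down",
    "network.wifi.down",
    "network.reconfigured",
    "network.down",
    "system.wifi.sta.disconnected",
    "system.wifi.sta.stop",
    "network.mgr.stop",
};

/* Which event each service reacts to when shutting down (hypothetical table). */
struct subscription {
    const char *service;
    const char *event;
};

static const struct subscription subscriptions[] = {
    { "ovms-mdns", "network.wifi.down" },
    { "ssh",       "network.mgr.stop"  },
};

/* Index in the shutdown sequence at which the service gets stopped,
 * or -1 if the service has no subscription in the table. */
int stop_position(const char *service)
{
    assert(service != NULL);

    size_t nsubs = sizeof subscriptions / sizeof subscriptions[0];
    size_t nevents = sizeof shutdown_events / sizeof shutdown_events[0];

    for (size_t s = 0; s < nsubs; s++) {
        if (strcmp(subscriptions[s].service, service) != 0)
            continue;
        for (size_t e = 0; e < nevents; e++) {
            if (strcmp(shutdown_events[e], subscriptions[s].event) == 0)
                return (int)e;
        }
    }
    return -1;
}
```

In this model ovms-mdns stops at position 1 and ssh at position 6, the last event of the sequence, after the wifi driver has already torn down its queues. Issuing network.mgr.stop earlier would move ssh up the list, but as noted, that alone may not guarantee the close packet reaches the peer.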
Regards, Mark.
On 29 Jan 2018, at 2:13 AM, Stephen Casner <casner@acm.org> wrote:
I handled wifi shutdowns cleanly when I first implemented telnet and ssh as their own tasks. Now that they are under Mongoose, it is out of my control. The socket is owned by the Mongoose code.
-- Steve
On Sun, 28 Jan 2018, Michael Balzer wrote:
> I've begun working on the webserver and noticed something that may
> be correlated to this: sockets don't get closed when losing the
> connection. The effect is visible on both web and telnet server (ssh
> not tested). To reproduce, switch the Wifi network with an open
> connection, the port will not be available until timeout.
>
> Regards,
> Michael
>
> _______________________________________________
> OvmsDev mailing list
> OvmsDev@lists.teslaclub.hk
> http://lists.teslaclub.hk/mailman/listinfo/ovmsdev
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
My esp-idf is up to date.
Have you done the GET from a browser (keepalive)? Another way to get the error is to turn wifi off and on again after opening a telnet session.
I've compared my LWIP config; the only difference was that I had "Support per-interface loopback" (CONFIG_LWIP_NETIF_LOOPBACK) enabled, which is our default, by the way. Disabling it didn't change anything, though.
Attaching my sdkconfig.
Regards, Michael
On 30.01.2018 at 01:31, Mark Webb-Johnson wrote:
Strange. I don’t have the debugging enabled, but this is what I get:
OVMS > wifi mode client Starting WIFI as a client for any defined SSID I (108251) wifi: wifi firmware version: 403db1d I (108251) wifi: config NVS flash: enabled … I (114861) wifi: connected with STUBBY, channel 11 I (116331) event: sta ip: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62 I (116331) ovms-mdns: Launching MDNS service I (116331) esp32wifi: WiFi UP with SSID: STUBBY, MAC: 30:ae:a4:80:8d:b0, IP: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62 I (116331) webserver: Launching Web Server I (116351) ssh: Launching SSH Server I (117841) wifi: pm start, type:0 I (118881) webserver: HTTP GET / I (120201) webserver: HTTP GET / I (120991) webserver: HTTP GET /
OVMS > wifi mode off Stopping wifi station... I (123821) wifi: state: run -> init (0) I (123821) wifi: pm stop, total sleep time: 0/5979789 I (123821) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1 I (123841) wifi: flush txq I (123841) wifi: stop sw txq I (123841) wifi: lmac stop hw txq I (123841) wifi: Deinit lldesc rx mblock:4 I (124121) webserver: Stopping Web Server I (124121) ssh: Stopping SSH Server
OVMS > wifi mode client Starting WIFI as a client for any defined SSID I (126991) wifi: wifi firmware version: 403db1d ... I (132851) esp32wifi: Found SSID STUBBY - trying to connect I (134181) wifi: n:11 0, o:1 0, ap:255 255, sta:11 0, prof:1 I (134831) wifi: state: init -> auth (b0) I (134831) wifi: state: auth -> assoc (0) I (134841) wifi: state: assoc -> run (10) I (134861) wifi: connected with STUBBY, channel 11 I (135571) event: sta ip: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62 I (135571) ovms-mdns: Launching MDNS service I (135571) esp32wifi: WiFi UP with SSID: STUBBY, MAC: 30:ae:a4:80:8d:b0, IP: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62 I (135571) webserver: Launching Web Server I (135591) ssh: Launching SSH Server I (137071) webserver: HTTP GET / I (137841) wifi: pm start, type:0 I (140001) webserver: HTTP GET /
Tried it 10 times to be certain.
Are you on the master branch of https://github.com/openvehicles/esp-idf.git and up to date?
There are a bunch of menuconfig options in Components/LWIP. This is what I have:
[ ] Enable copy between Layer2 and Layer3 packets
(10) Max number of open sockets
[ ] Enable SO_REUSEADDR option
[ ] Enable SO_RCVBUF option
(1) Maximum number of NTP servers
[ ] Enable fragment outgoing IP packets
[ ] Enable reassembly incoming fragmented IP packets
[ ] Enable LWIP statistics
[*] Enable LWIP ARP trust
(32) TCPIP task receive mail box size
[ ] DHCP: Perform ARP check on any offered address
DHCP server --->
[ ] Enable IPV4 Link-Local Addressing (AUTOIP) ----
[ ] Support per-interface loopback ----
TCP --->
UDP --->
(2560) TCP/IP Task Stack Size
[*] Enable PPP support (new/experimental) --->
ICMP --->
LWIP RAW API --->
The SO_REUSEADDR option is relevant, but not enabled in my config. It does seem like mongoose is trying to use it. I wonder what LWIP default behaviour is in that case.
Regards, Mark.
On 30 Jan 2018, at 5:51 AM, Michael Balzer <dexter@expeedo.de> wrote:
I've added an option to send the mongoose debug log output to the ESP log framework and add a debug log to the if_destroy function.
There was some talk about not changing the original mongoose files / maybe doing an OVMS fork, so I won't check the patch in. Diff attached.
If you apply the patch: you'll need log level verbose for full debug details like these:
I (14915) wifi: connected with devolo-f4068d73a03e, channel 11 I (14915) events: Signal(system.wifi.sta.connected) I (14945) event: sta ip: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 I (14945) events: Signal(system.wifi.sta.gotip) I (14945) events: Signal(network.wifi.up) I (14945) ovms-mdns: Launching MDNS service I (14955) events: Signal(network.up) I (14955) esp32wifi: WiFi UP with SSID: devolo-f4068d73a03e, MAC: 30:ae:a4:37:25:88, IP: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 V (14965) mongoose: mg_socket_if_init 0x3ffb7348 using select() V (14965) mongoose: mg_mgr_init_opt ================================== V (14965) mongoose: mg_mgr_init_opt init mgr=0x3ffb7348 I (14965) events: Signal(network.mgr.init) I (14965) webserver: Launching Web Server V (14995) mongoose: mg_socket_if_sock_se 0x3ffe068c *8192* V (14995) mongoose: mg_add_conn 0x3ffb7348 0x3ffe068c I (14995) telnet: Launching Telnet Server V (15025) mongoose: mg_socket_if_sock_se 0x3ffe58f0 *8193* V (15025) mongoose: mg_add_conn 0x3ffb7348 0x3ffe58f0 V (15495) simcom: tx: 41 54 0d 0a | AT.. D (15495) simcom: tx scmd ch=0 len=4 : AT|| I (15495) simcom: State timeout, transition to 13 I (15495) simcom: State: Enter PoweredOff state I (17905) wifi: pm start, type:0
Now doing an HTTP GET /home from the browser:
V (35485) mongoose: mg_mgr_handle_conn 0x3ffe068c fd=8192 fd_flags=1 nc_flags=1 rmbl=0 smbl=0 V (35485) mongoose: mg_add_conn 0x3ffb7348 0x3ffe97e0 V (35495) mongoose: mg_if_accept_new_con 0x3ffe068c 0x3ffe97e0 -1 0 V (35495) mongoose: mg_accept_conn 0x3ffe97e0 conn from 192.168.2.103:33574 V (35505) mongoose: mg_socket_if_sock_se 0x3ffe97e0 *8194* V (35505) mongoose: mg_call 0x3ffe97e0 proto ev=1 ev_data=0x3ffe97f8 flags=0 rmbl=0 smbl=0 V (35505) mongoose: mg_call 0x3ffe97e0 user ev=1 ev_data=0x3ffe97f8 flags=0 rmbl=0 smbl=0 D (35505) webserver: EventHandler: conn=0x3ffe97e0 ev=1 p=0x3ffe97f8 V (35505) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0 V (35505) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0 V (35505) mongoose: mg_mgr_handle_conn 0x3ffe068c after fd=8192 nc_flags=1 rmbl=0 smbl=0 V (35505) mongoose: mg_mgr_handle_conn 0x3ffe97e0 fd=8194 fd_flags=1 nc_flags=0 rmbl=0 smbl=0 V (35515) mongoose: mg_handle_tcp_read 0x3ffe97e0 350 bytes (PLAIN) <- 8194 V (35515) mongoose: mg_recv_common 0x3ffe97e0 350 0 V (35515) mongoose: mg_call 0x3ffe97e0 proto ev=3 ev_data=0x3ffe95d0 flags=0 rmbl=350 smbl=0 V (35515) mongoose: mg_call 0x3ffe97e0 user ev=3 ev_data=0x3ffe95d0 flags=0 rmbl=350 smbl=0 D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=3 p=0x3ffe95d0 V (35515) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=0 V (35515) mongoose: mg_http_handler2 0x3ffe97e0 192.168.2.103:33574 *GET /home* V (35515) mongoose: mg_call 0x3ffe97e0 user ev=102 ev_data=0x3ffe9400 flags=0 rmbl=350 smbl=0 D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=102 p=0x3ffe9400 V (35515) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=0 V (35515) mongoose: mg_call 0x3ffe97e0 user ev=100 ev_data=0x3ffe9400 flags=0 rmbl=350 smbl=0 D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=100 p=0x3ffe9400 *I (35515) webserver: HTTP GET /home* V (35525) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=740 V (35525) 
mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=740 V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 after fd=8194 nc_flags=0 rmbl=0 smbl=740 V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 fd=8194 fd_flags=2 nc_flags=0 rmbl=0 smbl=740 V (35525) mongoose: mg_write_to_socket 0x3ffe97e0 740 bytes -> 8194 V (35525) mongoose: mg_if_sent_cb 0x3ffe97e0 740 V (35525) mongoose: mg_call 0x3ffe97e0 proto ev=4 ev_data=0x3ffe95e0 flags=0 rmbl=0 smbl=0 V (35525) mongoose: mg_call 0x3ffe97e0 user ev=4 ev_data=0x3ffe95e0 flags=0 rmbl=0 smbl=0 D (35525) webserver: EventHandler: conn=0x3ffe97e0 ev=4 p=0x3ffe95e0 V (35525) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0 V (35525) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0 V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 after fd=8194 nc_flags=0 rmbl=0 smbl=0
*OVMS > wifi mode off* Stopping wifi station... I (41145) events: Signal(system.wifi.down) I (41145) events: Signal(network.wifi.down) I (41145) events: Signal(network.reconfigured) I (41145) events: Signal(network.down) I (41155) wifi: state: run -> init (0) I (41155) wifi: pm stop, total sleep time: 0/23262330 I (41155) wifi: n:11 0, o:11 2, ap:255 255, sta:11 2, prof:1 I (41165) wifi: flush txq I (41165) wifi: stop sw txq I (41165) wifi: lmac stop hw txq I (41165) wifi: Deinit lldesc rx mblock:4 I (41165) events: Signal(system.wifi.sta.disconnected) I (41175) events: Signal(system.wifi.sta.stop) I (41525) events: Signal(network.mgr.stop) I (41525) webserver: Stopping Web Server I (41525) telnet: Stopping Telnet Server V (41535) mongoose: mg_mgr_free 0x3ffb7348 V (41535) mongoose: mg_close_conn 0x3ffe97e0 0 *8194* V (41535) mongoose: mg_socket_if_destroy nc=0x3ffe97e0 sock=8194 flags=0 V (41545) mongoose: mg_call 0x3ffe97e0 proto ev=5 ev_data=0x0 flags=0 rmbl=0 smbl=0 V (41545) mongoose: mg_call 0x3ffe97e0 user ev=5 ev_data=0x0 flags=0 rmbl=0 smbl=0 D (41545) webserver: EventHandler: conn=0x3ffe97e0 ev=5 p=0x0 V (41545) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0 V (41545) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0 V (41545) mongoose: mg_close_conn 0x3ffe58f0 1 *8193* V (41545) mongoose: mg_socket_if_destroy nc=0x3ffe58f0 sock=8193 flags=1 V (41555) mongoose: mg_call 0x3ffe58f0 user ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0 V (41555) mongoose: mg_call 0x3ffe58f0 after user flags=1 rmbl=0 smbl=0 V (41555) mongoose: mg_close_conn 0x3ffe068c 1 *8192* V (41555) mongoose: mg_socket_if_destroy nc=0x3ffe068c sock=8192 flags=1 V (41565) mongoose: mg_call 0x3ffe068c proto ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0 V (41565) mongoose: mg_call 0x3ffe068c user ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0 D (41565) webserver: EventHandler: conn=0x3ffe068c ev=5 p=0x0 V (41565) mongoose: mg_call 0x3ffe068c after user flags=1 rmbl=0 smbl=0 V (41565) 
mongoose: mg_call 0x3ffe068c after proto flags=1 rmbl=0 smbl=0
… this doesn't look wrong so far; the sockets get closed correctly. mg_socket_if_destroy calls closesocket = lwip_close_r, so everything _should_ be ok… but…
*OVMS > wifi mode client * Starting WIFI as a client for any defined SSID I (47225) wifi: wifi firmware version: 403db1d I (47225) wifi: config NVS flash: enabled I (47225) wifi: config nano formating: disabled I (47225) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE I (47225) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE I (47245) wifi: Init dynamic tx buffer num: 16 I (47245) wifi: Init data frame dynamic rx buffer num: 16 I (47245) wifi: Init management frame dynamic rx buffer num: 16 I (47245) wifi: wifi driver task: 3ffe251c, prio:23, stack:4096 I (47245) wifi: Init static rx buffer num: 4 I (47245) wifi: Init dynamic rx buffer num: 16 I (47245) wifi: wifi power manager task: 0x3ffe5a9c prio: 21 stack: 2560 I (47255) wifi: mode : sta (30:ae:a4:37:25:88) I (47255) events: Signal(system.wifi.sta.start) W (50495) wifi: incorrect scan type: 1073541416 I (52905) events: Signal(system.wifi.scan.done) I (52905) esp32wifi: Found SSID devolo-f4068d73a03e - trying to connect I (54235) wifi: n:11 2, o:1 0, ap:255 255, sta:11 2, prof:1 I (54885) wifi: state: init -> auth (b0) I (54895) wifi: state: auth -> assoc (0) I (54905) wifi: state: assoc -> run (10) I (54915) wifi: connected with devolo-f4068d73a03e, channel 11 I (54915) events: Signal(system.wifi.sta.connected) I (54945) event: sta ip: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 I (54955) events: Signal(system.wifi.sta.gotip) I (54955) events: Signal(network.wifi.up) I (54955) ovms-mdns: Launching MDNS service I (54955) events: Signal(network.up) I (54965) esp32wifi: WiFi UP with SSID: devolo-f4068d73a03e, MAC: 30:ae:a4:37:25:88, IP: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 V (54965) mongoose: mg_socket_if_init 0x3ffb7348 using select() V (54965) mongoose: mg_mgr_init_opt ================================== V (54965) mongoose: mg_mgr_init_opt init mgr=0x3ffb7348 I (54965) events: Signal(network.mgr.init) I 
(54965) webserver: Launching Web Server *V (54995) mongoose: mg_bind_opt Failed to open listener: 112* V (54995) mongoose: mg_socket_if_destroy nc=0x3ffe996c sock=-1 flags=1 E (54995) webserver: Cannot bind to port 80: failed to open listener I (54995) telnet: Launching Telnet Server V (55025) mongoose: mg_socket_if_sock_se 0x3ffe98c8 8196 V (55025) mongoose: mg_add_conn 0x3ffb7348 0x3ffe98c8 I (57905) wifi: pm start, type:0
… so it's indeed 112 = EADDRINUSE as assumed.
As mongoose does everything right, this seems to be an LWIP issue? Mark, Steve?
My v3 tree switched to 3.1 now, so not so easy ;-( I really need to work out a way to do automated builds (switching sdkconfig appropriately).
I think you may be correct - it is probably not the listen socket that has the issue, but the actual connection socket.
Can you try enabling the option SO_REUSEADDR in menuconfig? That should allow the listen socket to be opened, even if a connection socket is in TIME_WAIT state (assuming the mongoose library sets SO_REUSEADDR correctly).
Regards, Mark
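For reference, this is roughly what "setting SO_REUSEADDR correctly" means at the socket API level: the option has to be set on the listening socket before bind(). The following is a minimal sketch in C, assuming a POSIX host. The function names open_reusable_listener and reuseaddr_enabled are hypothetical, and this is not the mongoose code. Whether the option has any effect on the module depends on it being compiled into LWIP, which is the menuconfig setting discussed above:

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP listener on 127.0.0.1:port with SO_REUSEADDR set before bind().
 * Port 0 lets the OS choose a free port. Returns the fd, or -1 on error. */
int open_reusable_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Must happen before bind(); setting it afterwards does not help. */
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0)
        goto fail;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto fail;
    if (listen(fd, 4) < 0)
        goto fail;
    return fd;

fail:
    {
        int saved = errno;      /* keep the original error across close() */
        close(fd);
        errno = saved;
    }
    return -1;
}

/* Returns 1 if SO_REUSEADDR is set on the socket, 0 if not, -1 on error. */
int reuseaddr_enabled(int fd)
{
    int value = 0;
    socklen_t len = sizeof value;

    assert(fd >= 0);
    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

Checking the return value of setsockopt() is the useful part for this thread: if the stack rejects or ignores the option, the caller at least has a chance to log it instead of silently binding without it.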
On 30 Jan 2018, at 3:54 PM, Michael Balzer <dexter@expeedo.de> wrote:
My esp-idf is up to date.
Have you done the GET from a browser (keepalive)? Another way to get the error is doing the wifi off/on after opening a telnet session.
I've compared my LWIP config, only difference was I had "Support per-interface loopback" (CONFIG_LWIP_NETIF_LOOPBACK) enabled, which is our default btw. Disabling it didn't change anything though.
Attaching my sdkconfig.
Regards, Michael
On 30.01.2018 at 01:31, Mark Webb-Johnson wrote:
Strange. I don’t have the debugging enabled, but this is what I get:
OVMS > wifi mode client Starting WIFI as a client for any defined SSID I (108251) wifi: wifi firmware version: 403db1d I (108251) wifi: config NVS flash: enabled … I (114861) wifi: connected with STUBBY, channel 11 I (116331) event: sta ip: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62 I (116331) ovms-mdns: Launching MDNS service I (116331) esp32wifi: WiFi UP with SSID: STUBBY, MAC: 30:ae:a4:80:8d:b0, IP: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62 I (116331) webserver: Launching Web Server I (116351) ssh: Launching SSH Server I (117841) wifi: pm start, type:0 I (118881) webserver: HTTP GET / I (120201) webserver: HTTP GET / I (120991) webserver: HTTP GET /
OVMS > wifi mode off Stopping wifi station... I (123821) wifi: state: run -> init (0) I (123821) wifi: pm stop, total sleep time: 0/5979789 I (123821) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1 I (123841) wifi: flush txq I (123841) wifi: stop sw txq I (123841) wifi: lmac stop hw txq I (123841) wifi: Deinit lldesc rx mblock:4 I (124121) webserver: Stopping Web Server I (124121) ssh: Stopping SSH Server
OVMS > wifi mode client Starting WIFI as a client for any defined SSID I (126991) wifi: wifi firmware version: 403db1d ... I (132851) esp32wifi: Found SSID STUBBY - trying to connect I (134181) wifi: n:11 0, o:1 0, ap:255 255, sta:11 0, prof:1 I (134831) wifi: state: init -> auth (b0) I (134831) wifi: state: auth -> assoc (0) I (134841) wifi: state: assoc -> run (10) I (134861) wifi: connected with STUBBY, channel 11 I (135571) event: sta ip: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62 I (135571) ovms-mdns: Launching MDNS service I (135571) esp32wifi: WiFi UP with SSID: STUBBY, MAC: 30:ae:a4:80:8d:b0, IP: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62 I (135571) webserver: Launching Web Server I (135591) ssh: Launching SSH Server I (137071) webserver: HTTP GET / I (137841) wifi: pm start, type:0 I (140001) webserver: HTTP GET /
Tried it 10 times to be certain.
Are you on the master branch of https://github.com/openvehicles/esp-idf.git and up to date?
There are a bunch of menuconfig options in Components/LWIP. This is what I have:
[ ] Enable copy between Layer2 and Layer3 packets
(10) Max number of open sockets
[ ] Enable SO_REUSEADDR option
[ ] Enable SO_RCVBUF option
(1) Maximum number of NTP servers
[ ] Enable fragment outgoing IP packets
[ ] Enable reassembly incoming fragmented IP packets
[ ] Enable LWIP statistics
[*] Enable LWIP ARP trust
(32) TCPIP task receive mail box size
[ ] DHCP: Perform ARP check on any offered address
DHCP server --->
[ ] Enable IPV4 Link-Local Addressing (AUTOIP) ----
[ ] Support per-interface loopback ----
TCP --->
UDP --->
(2560) TCP/IP Task Stack Size
[*] Enable PPP support (new/experimental) --->
ICMP --->
LWIP RAW API --->
The SO_REUSEADDR option is relevant, but not enabled in my config. It does seem like mongoose is trying to use it. I wonder what LWIP default behaviour is in that case.
Regards, Mark.
On 30 Jan 2018, at 5:51 AM, Michael Balzer <dexter@expeedo.de> wrote:
I've added an option to send the mongoose debug log output to the ESP log framework and add a debug log to the if_destroy function.
There was some talk about not changing the original mongoose files / maybe doing an OVMS fork, so I won't check the patch in. Diff attached.
If you apply the patch: you'll need log level verbose for full debug details like these:
I (14915) wifi: connected with devolo-f4068d73a03e, channel 11 I (14915) events: Signal(system.wifi.sta.connected) I (14945) event: sta ip: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 I (14945) events: Signal(system.wifi.sta.gotip) I (14945) events: Signal(network.wifi.up) I (14945) ovms-mdns: Launching MDNS service I (14955) events: Signal(network.up) I (14955) esp32wifi: WiFi UP with SSID: devolo-f4068d73a03e, MAC: 30:ae:a4:37:25:88, IP: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 V (14965) mongoose: mg_socket_if_init 0x3ffb7348 using select() V (14965) mongoose: mg_mgr_init_opt ================================== V (14965) mongoose: mg_mgr_init_opt init mgr=0x3ffb7348 I (14965) events: Signal(network.mgr.init) I (14965) webserver: Launching Web Server V (14995) mongoose: mg_socket_if_sock_se 0x3ffe068c 8192 V (14995) mongoose: mg_add_conn 0x3ffb7348 0x3ffe068c I (14995) telnet: Launching Telnet Server V (15025) mongoose: mg_socket_if_sock_se 0x3ffe58f0 8193 V (15025) mongoose: mg_add_conn 0x3ffb7348 0x3ffe58f0 V (15495) simcom: tx: 41 54 0d 0a | AT.. D (15495) simcom: tx scmd ch=0 len=4 : AT|| I (15495) simcom: State timeout, transition to 13 I (15495) simcom: State: Enter PoweredOff state I (17905) wifi: pm start, type:0
Now doing an HTTP GET /home from the browser:
V (35485) mongoose: mg_mgr_handle_conn 0x3ffe068c fd=8192 fd_flags=1 nc_flags=1 rmbl=0 smbl=0
V (35485) mongoose: mg_add_conn 0x3ffb7348 0x3ffe97e0
V (35495) mongoose: mg_if_accept_new_con 0x3ffe068c 0x3ffe97e0 -1 0
V (35495) mongoose: mg_accept_conn 0x3ffe97e0 conn from 192.168.2.103:33574
V (35505) mongoose: mg_socket_if_sock_se 0x3ffe97e0 8194
V (35505) mongoose: mg_call 0x3ffe97e0 proto ev=1 ev_data=0x3ffe97f8 flags=0 rmbl=0 smbl=0
V (35505) mongoose: mg_call 0x3ffe97e0 user ev=1 ev_data=0x3ffe97f8 flags=0 rmbl=0 smbl=0
D (35505) webserver: EventHandler: conn=0x3ffe97e0 ev=1 p=0x3ffe97f8
V (35505) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0
V (35505) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0
V (35505) mongoose: mg_mgr_handle_conn 0x3ffe068c after fd=8192 nc_flags=1 rmbl=0 smbl=0
V (35505) mongoose: mg_mgr_handle_conn 0x3ffe97e0 fd=8194 fd_flags=1 nc_flags=0 rmbl=0 smbl=0
V (35515) mongoose: mg_handle_tcp_read 0x3ffe97e0 350 bytes (PLAIN) <- 8194
V (35515) mongoose: mg_recv_common 0x3ffe97e0 350 0
V (35515) mongoose: mg_call 0x3ffe97e0 proto ev=3 ev_data=0x3ffe95d0 flags=0 rmbl=350 smbl=0
V (35515) mongoose: mg_call 0x3ffe97e0 user ev=3 ev_data=0x3ffe95d0 flags=0 rmbl=350 smbl=0
D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=3 p=0x3ffe95d0
V (35515) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=0
V (35515) mongoose: mg_http_handler2 0x3ffe97e0 192.168.2.103:33574 GET /home
V (35515) mongoose: mg_call 0x3ffe97e0 user ev=102 ev_data=0x3ffe9400 flags=0 rmbl=350 smbl=0
D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=102 p=0x3ffe9400
V (35515) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=0
V (35515) mongoose: mg_call 0x3ffe97e0 user ev=100 ev_data=0x3ffe9400 flags=0 rmbl=350 smbl=0
D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=100 p=0x3ffe9400
I (35515) webserver: HTTP GET /home
V (35525) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=740
V (35525) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=740
V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 after fd=8194 nc_flags=0 rmbl=0 smbl=740
V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 fd=8194 fd_flags=2 nc_flags=0 rmbl=0 smbl=740
V (35525) mongoose: mg_write_to_socket 0x3ffe97e0 740 bytes -> 8194
V (35525) mongoose: mg_if_sent_cb 0x3ffe97e0 740
V (35525) mongoose: mg_call 0x3ffe97e0 proto ev=4 ev_data=0x3ffe95e0 flags=0 rmbl=0 smbl=0
V (35525) mongoose: mg_call 0x3ffe97e0 user ev=4 ev_data=0x3ffe95e0 flags=0 rmbl=0 smbl=0
D (35525) webserver: EventHandler: conn=0x3ffe97e0 ev=4 p=0x3ffe95e0
V (35525) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0
V (35525) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0
V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 after fd=8194 nc_flags=0 rmbl=0 smbl=0
OVMS > wifi mode off
Stopping wifi station...
I (41145) events: Signal(system.wifi.down)
I (41145) events: Signal(network.wifi.down)
I (41145) events: Signal(network.reconfigured)
I (41145) events: Signal(network.down)
I (41155) wifi: state: run -> init (0)
I (41155) wifi: pm stop, total sleep time: 0/23262330
I (41155) wifi: n:11 0, o:11 2, ap:255 255, sta:11 2, prof:1
I (41165) wifi: flush txq
I (41165) wifi: stop sw txq
I (41165) wifi: lmac stop hw txq
I (41165) wifi: Deinit lldesc rx mblock:4
I (41165) events: Signal(system.wifi.sta.disconnected)
I (41175) events: Signal(system.wifi.sta.stop)
I (41525) events: Signal(network.mgr.stop)
I (41525) webserver: Stopping Web Server
I (41525) telnet: Stopping Telnet Server
V (41535) mongoose: mg_mgr_free 0x3ffb7348
V (41535) mongoose: mg_close_conn 0x3ffe97e0 0 8194
V (41535) mongoose: mg_socket_if_destroy nc=0x3ffe97e0 sock=8194 flags=0
V (41545) mongoose: mg_call 0x3ffe97e0 proto ev=5 ev_data=0x0 flags=0 rmbl=0 smbl=0
V (41545) mongoose: mg_call 0x3ffe97e0 user ev=5 ev_data=0x0 flags=0 rmbl=0 smbl=0
D (41545) webserver: EventHandler: conn=0x3ffe97e0 ev=5 p=0x0
V (41545) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0
V (41545) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0
V (41545) mongoose: mg_close_conn 0x3ffe58f0 1 8193
V (41545) mongoose: mg_socket_if_destroy nc=0x3ffe58f0 sock=8193 flags=1
V (41555) mongoose: mg_call 0x3ffe58f0 user ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0
V (41555) mongoose: mg_call 0x3ffe58f0 after user flags=1 rmbl=0 smbl=0
V (41555) mongoose: mg_close_conn 0x3ffe068c 1 8192
V (41555) mongoose: mg_socket_if_destroy nc=0x3ffe068c sock=8192 flags=1
V (41565) mongoose: mg_call 0x3ffe068c proto ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0
V (41565) mongoose: mg_call 0x3ffe068c user ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0
D (41565) webserver: EventHandler: conn=0x3ffe068c ev=5 p=0x0
V (41565) mongoose: mg_call 0x3ffe068c after user flags=1 rmbl=0 smbl=0
V (41565) mongoose: mg_call 0x3ffe068c after proto flags=1 rmbl=0 smbl=0
… doesn't look wrong so far, the sockets get closed correctly. mg_socket_if_destroy calls closesocket = lwip_close_r, so everything _should_ be ok… but…
OVMS > wifi mode client
Starting WIFI as a client for any defined SSID
I (47225) wifi: wifi firmware version: 403db1d
I (47225) wifi: config NVS flash: enabled
I (47225) wifi: config nano formating: disabled
I (47225) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (47225) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (47245) wifi: Init dynamic tx buffer num: 16
I (47245) wifi: Init data frame dynamic rx buffer num: 16
I (47245) wifi: Init management frame dynamic rx buffer num: 16
I (47245) wifi: wifi driver task: 3ffe251c, prio:23, stack:4096
I (47245) wifi: Init static rx buffer num: 4
I (47245) wifi: Init dynamic rx buffer num: 16
I (47245) wifi: wifi power manager task: 0x3ffe5a9c prio: 21 stack: 2560
I (47255) wifi: mode : sta (30:ae:a4:37:25:88)
I (47255) events: Signal(system.wifi.sta.start)
W (50495) wifi: incorrect scan type: 1073541416
I (52905) events: Signal(system.wifi.scan.done)
I (52905) esp32wifi: Found SSID devolo-f4068d73a03e - trying to connect
I (54235) wifi: n:11 2, o:1 0, ap:255 255, sta:11 2, prof:1
I (54885) wifi: state: init -> auth (b0)
I (54895) wifi: state: auth -> assoc (0)
I (54905) wifi: state: assoc -> run (10)
I (54915) wifi: connected with devolo-f4068d73a03e, channel 11
I (54915) events: Signal(system.wifi.sta.connected)
I (54945) event: sta ip: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1
I (54955) events: Signal(system.wifi.sta.gotip)
I (54955) events: Signal(network.wifi.up)
I (54955) ovms-mdns: Launching MDNS service
I (54955) events: Signal(network.up)
I (54965) esp32wifi: WiFi UP with SSID: devolo-f4068d73a03e, MAC: 30:ae:a4:37:25:88, IP: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1
V (54965) mongoose: mg_socket_if_init 0x3ffb7348 using select()
V (54965) mongoose: mg_mgr_init_opt ==================================
V (54965) mongoose: mg_mgr_init_opt init mgr=0x3ffb7348
I (54965) events: Signal(network.mgr.init)
I (54965) webserver: Launching Web Server
V (54995) mongoose: mg_bind_opt Failed to open listener: 112
V (54995) mongoose: mg_socket_if_destroy nc=0x3ffe996c sock=-1 flags=1
E (54995) webserver: Cannot bind to port 80: failed to open listener
I (54995) telnet: Launching Telnet Server
V (55025) mongoose: mg_socket_if_sock_se 0x3ffe98c8 8196
V (55025) mongoose: mg_add_conn 0x3ffb7348 0x3ffe98c8
I (57905) wifi: pm start, type:0
… so it's indeed 112 = EADDRINUSE as assumed.
As mongoose does everything right, this seems to be an LWIP issue? Mark, Steve?
On 29.01.2018 at 14:17, Mark Webb-Johnson wrote:
The bind has to occur in the network.mgr.init handler. That is the point between mg_mgr_init and mg_mgr_poll.
Perhaps in the handler we can check m_connected_wifi and only bind if wifi is up? But we would also need to pick up network.reconfigured and check m_connected_wifi there as well.
The webserver NetManInit function is already triggered on network.mgr.init and checks m_connected_wifi.
Regards, Michael
Regards, Mark.
On 29 Jan 2018, at 6:11 PM, Michael Balzer <dexter@expeedo.de> wrote:
mg_bind() fails with "failed to open listener". I haven't traced it further yet, assumed it's the bind() that fails.
Mongoose (now?) uses the standard socket API, so does not need to handle LWIP details, or does it?
I can do some more debugging to get the exact point of failure.
Regards, Michael
On 29.01.2018 at 02:28, Stephen Casner wrote:
Mark,
Back in November we discussed this problem of the client's half of the connection being left open. Then as now, you commented that mongoose sends a close event (MG_EV_CLOSE) for each open connection when the interface went down.
It does send a close event, but not until after the wifi is already shut down so closing the socket at that point does not send any packet to the client. I decided to punt on that issue, though, because manually shutting down wifi is not an important use case. The more likely case is that wifi connectivity is lost due to motion or other causes and in that case no close packet can be delivered anyway.
I interpreted Michael's message to be referring to a problem with ports on the server (OVMS) end. That is, to say that mongoose didn't clean up LWIP state properly. Michael, can you explain a bit more?
-- Steve
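The ordering Steve describes can be read directly off the log timestamps. The following is only a minimal sketch in Python, not part of the OVMS code: the helper name first_timestamp is hypothetical, the sample lines are copied from the "wifi mode off" log quoted earlier in this thread, and the number in parentheses is assumed to be the default ESP-IDF log timestamp in milliseconds since boot.

```python
import re

# One ESP-IDF style log line: level letter, (timestamp), tag, message.
LOG_LINE = re.compile(r"^([EWIDV]) \((\d+)\) ([^:]+): (.*)$")

# Lines copied from the "wifi mode off" log quoted in this thread.
SAMPLE = """\
I (41145) events: Signal(network.wifi.down)
I (41165) wifi: lmac stop hw txq
I (41525) events: Signal(network.mgr.stop)
V (41535) mongoose: mg_close_conn 0x3ffe97e0 0 8194
"""


def first_timestamp(log_text, needle):
    """Return the timestamp of the first log line whose message contains needle.

    Hypothetical helper for illustration; returns None if nothing matches.
    """
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and needle in match.group(4):
            return int(match.group(2))
    return None


if __name__ == "__main__":
    wifi_down = first_timestamp(SAMPLE, "Signal(network.wifi.down)")
    mgr_stop = first_timestamp(SAMPLE, "Signal(network.mgr.stop)")
    sock_close = first_timestamp(SAMPLE, "mg_close_conn")
    # Delay from wifi going down to the manager stop event and the socket close.
    print(mgr_stop - wifi_down, sock_close - wifi_down)  # prints "380 390"
```

In that capture the sockets are closed roughly 390 ms after network.wifi.down was signalled, which fits the observation that no close packet can reach the client any more.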
On Mon, 29 Jan 2018, Mark Webb-Johnson wrote:
> I suspect that this is timing related:
>
> OVMS > wifi mode client XXX
> Starting WIFI as a client to XXX…
> …
> I (337100) ssh: Launching SSH Server
> V (338830) ssh: Event MG_EV_ACCEPT conn 0x3ffee8b4, data 0x3ffee8cc
> V (338840) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 21
> V (338840) ssh: Event MG_EV_SEND conn 0x3ffee8b4, data 0x3ffe5d90
> V (338840) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 1024
> V (338850) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 408
> …
> OVMS > wifi mode off
> Stopping wifi station...
> I (350580) ovms-mdns: Stopping MDNS service
> I (350580) wifi: state: run -> init (0)
> I (350590) wifi: pm stop, total sleep time: 0/11930449
> I (350590) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1
> I (350600) wifi: flush txq
> I (350600) wifi: stop sw txq
> I (350600) wifi: lmac stop hw txq
> I (350600) wifi: Deinit lldesc rx mblock:4
> I (351340) webserver: Stopping Web Server
> I (351340) ssh: Stopping SSH Server
> V (351540) ssh: Event MG_EV_CLOSE conn 0x3ffee8b4, data 0x0
> V (351540) ssh: Event MG_EV_CLOSE conn 0x3ffeb468, data 0x0
> I (357570) event: station ip lost
>
> But ssh client connection is still up on my workstation. Looks like
> the MG_EV_CLOSE events came in after the SSH server was stopped.
>
> I repeated the test, but with event logging on:
>
> OVMS > wifi mode off
> Stopping wifi station...
> I (34171) events: Signal(system.wifi.down)
> I (34171) events: Signal(network.wifi.down)
> I (34171) ovms-mdns: Stopping MDNS service
> I (34171) events: Signal(network.reconfigured)
> I (34171) events: Signal(network.down)
> I (34171) wifi: state: run -> init (0)
> I (34181) wifi: pm stop, total sleep time: 0/9603569
> I (34181) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1
> I (34191) wifi: flush txq
> I (34191) wifi: stop sw txq
> I (34191) wifi: lmac stop hw txq
> I (34191) wifi: Deinit lldesc rx mblock:4
> I (34191) events: Signal(system.wifi.sta.disconnected)
> I (34191) events: Signal(system.wifi.sta.stop)
> I (35131) events: Signal(network.mgr.stop)
> I (35131) webserver: Stopping Web Server
> I (35131) ssh: Stopping SSH Server
> V (35331) ssh: Event MG_EV_CLOSE conn 0x3ffec138, data 0x0
> V (35331) ssh: Event MG_EV_CLOSE conn 0x3ffeb768, data 0x0
>
> MDNS exits quicker because it responds to the network.wifi.down message, while ssh uses the network.mgr.stop event. In a clean shutdown there is a difference.
>
> Perhaps we should issue network.mgr.stop earlier? However, even if we close the connections from our end, I’m not sure if the packet would make it through to the other end before the wifi disconnects.
>
> Regards, Mark.
>
>> On 29 Jan 2018, at 2:13 AM, Stephen Casner <casner@acm.org> wrote:
>>
>> I handled wifi shutdowns cleanly when I first implemented telnet and
>> ssh as their own tasks. Now that they are under Mongoose, it is out
>> of my control. The socket is owned by the Mongoose code.
>>
>> -- Steve
>>
>> On Sun, 28 Jan 2018, Michael Balzer wrote:
>>
>>> I've begun working on the webserver and noticed something that may
>>> be correlated to this: sockets don't get closed when losing the
>>> connection. The effect is visible on both web and telnet server (ssh
>>> not tested). To reproduce, switch the Wifi network with an open
>>> connection, the port will not be available until timeout.
>>>
>>> Regards,
>>> Michael
>>>
>>>
>>> _______________________________________________
>>> OvmsDev mailing list
>>> OvmsDev@lists.teslaclub.hk
>>> http://lists.teslaclub.hk/mailman/listinfo/ovmsdev
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
On 30.01.2018 at 09:08, Mark Webb-Johnson wrote:
My v3 tree switched to 3.1 now, so not so easy ;-( I really need to work out a way to do automated builds (switching sdkconfig appropriately).
I think you may be correct - it is probably not the listen socket that has the issue, but the actual connection socket.
Can you try enabling the option SO_REUSEADDR in menuconfig? That should allow the listen socket to be opened, even if a connection socket is in TIME_WAIT state (assuming the mongoose library sets SO_REUSEADDR correctly).
It doesn't out of the box, as it still assumes LWIP does not support it, but after disabling the exclusion in mg_open_listening_socket() this works.
So, another patch for mongoose… maybe we really should fork mongoose. If you don't have an objection, I can do that later on.
Regards, Michael
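To illustrate what the option is for, here is a minimal desktop-side sketch using Python's standard socket module. It is not the mongoose patch and it does not run against LWIP; the function names open_listener and restart_with_lingering_connection are hypothetical. It reproduces the situation discussed above: a listener is closed while one of its accepted connections has just been shut down from the server side, and a new listener is then opened on the same port.

```python
import socket


def open_listener(port, reuse_addr):
    """Open a TCP listener on 127.0.0.1, optionally with SO_REUSEADDR set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse_addr:
            # Must be set before bind() to have any effect on the bind.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("127.0.0.1", port))
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


def restart_with_lingering_connection(reuse_addr):
    """Close a listener that had a connection, then reopen the same port.

    Returns True if the second bind succeeds, False if it fails (typically
    with errno.EADDRINUSE while the old connection lingers in TIME_WAIT).
    """
    listener = open_listener(0, reuse_addr)  # port 0: let the OS pick one
    port = listener.getsockname()[1]
    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = listener.accept()
    conn.close()      # server side closes first, so it is the side that lingers
    client.close()
    listener.close()
    try:
        open_listener(port, reuse_addr).close()
        return True
    except OSError:
        return False


if __name__ == "__main__":
    print("rebind with SO_REUSEADDR:", restart_with_lingering_connection(True))
    print("rebind without SO_REUSEADDR:", restart_with_lingering_connection(False))
```

With the option set on both listeners the second bind is expected to succeed; without it the result depends on the platform's TCP stack. When checking for the failure in code, compare against the symbolic errno.EADDRINUSE rather than the literal 112 from the ESP32 log, since errno numbers differ between platforms.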
Regards, Mark
On 30 Jan 2018, at 3:54 PM, Michael Balzer <dexter@expeedo.de> wrote:
My esp-idf is up to date.
Have you done the GET from a browser (keepalive)? Another way to get the error is doing the wifi off/on after opening a telnet session.
I've compared my LWIP config, only difference was I had "Support per-interface loopback" (CONFIG_LWIP_NETIF_LOOPBACK) enabled, which is our default btw. Disabling it didn't change anything though.
Attaching my sdkconfig.
Regards, Michael
On 30.01.2018 at 01:31, Mark Webb-Johnson wrote:
Strange. I don’t have the debugging enabled, but this is what I get:
OVMS > wifi mode client
Starting WIFI as a client for any defined SSID
I (108251) wifi: wifi firmware version: 403db1d
I (108251) wifi: config NVS flash: enabled
…
I (114861) wifi: connected with STUBBY, channel 11
I (116331) event: sta ip: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62
I (116331) ovms-mdns: Launching MDNS service
I (116331) esp32wifi: WiFi UP with SSID: STUBBY, MAC: 30:ae:a4:80:8d:b0, IP: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62
I (116331) webserver: Launching Web Server
I (116351) ssh: Launching SSH Server
I (117841) wifi: pm start, type:0
I (118881) webserver: HTTP GET /
I (120201) webserver: HTTP GET /
I (120991) webserver: HTTP GET /

OVMS > wifi mode off
Stopping wifi station...
I (123821) wifi: state: run -> init (0)
I (123821) wifi: pm stop, total sleep time: 0/5979789
I (123821) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1
I (123841) wifi: flush txq
I (123841) wifi: stop sw txq
I (123841) wifi: lmac stop hw txq
I (123841) wifi: Deinit lldesc rx mblock:4
I (124121) webserver: Stopping Web Server
I (124121) ssh: Stopping SSH Server

OVMS > wifi mode client
Starting WIFI as a client for any defined SSID
I (126991) wifi: wifi firmware version: 403db1d
...
I (132851) esp32wifi: Found SSID STUBBY - trying to connect
I (134181) wifi: n:11 0, o:1 0, ap:255 255, sta:11 0, prof:1
I (134831) wifi: state: init -> auth (b0)
I (134831) wifi: state: auth -> assoc (0)
I (134841) wifi: state: assoc -> run (10)
I (134861) wifi: connected with STUBBY, channel 11
I (135571) event: sta ip: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62
I (135571) ovms-mdns: Launching MDNS service
I (135571) esp32wifi: WiFi UP with SSID: STUBBY, MAC: 30:ae:a4:80:8d:b0, IP: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62
I (135571) webserver: Launching Web Server
I (135591) ssh: Launching SSH Server
I (137071) webserver: HTTP GET /
I (137841) wifi: pm start, type:0
I (140001) webserver: HTTP GET /
Tried it 10 times to be certain.
Are you on the master branch of https://github.com/openvehicles/esp-idf.git and up to date?
There are a bunch of menuconfig options in Components/LWIP. This is what I have:
[ ] Enable copy between Layer2 and Layer3 packets
(10) Max number of open sockets
[ ] Enable SO_REUSEADDR option
[ ] Enable SO_RCVBUF option
(1) Maximum number of NTP servers
[ ] Enable fragment outgoing IP packets
[ ] Enable reassembly incoming fragmented IP packets
[ ] Enable LWIP statistics
[*] Enable LWIP ARP trust
(32) TCPIP task receive mail box size
[ ] DHCP: Perform ARP check on any offered address
DHCP server --->
[ ] Enable IPV4 Link-Local Addressing (AUTOIP) ----
[ ] Support per-interface loopback ----
TCP --->
UDP --->
(2560) TCP/IP Task Stack Size
[*] Enable PPP support (new/experimental) --->
ICMP --->
LWIP RAW API --->
The SO_REUSEADDR option is relevant, but not enabled in my config. It does seem like mongoose is trying to use it. I wonder what LWIP default behaviour is in that case.
Regards, Mark.
On 30 Jan 2018, at 5:51 AM, Michael Balzer <dexter@expeedo.de> wrote:
I've added an option to send the mongoose debug log output to the ESP log framework and added a debug log to the if_destroy function.
There was some talk about not changing the original mongoose files / maybe doing an OVMS fork, so I won't check the patch in. Diff attached.
If you apply the patch, you'll need log level verbose for full debug details like these:
I (14915) wifi: connected with devolo-f4068d73a03e, channel 11
I (14915) events: Signal(system.wifi.sta.connected)
I (14945) event: sta ip: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1
I (14945) events: Signal(system.wifi.sta.gotip)
I (14945) events: Signal(network.wifi.up)
I (14945) ovms-mdns: Launching MDNS service
I (14955) events: Signal(network.up)
I (14955) esp32wifi: WiFi UP with SSID: devolo-f4068d73a03e, MAC: 30:ae:a4:37:25:88, IP: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1
V (14965) mongoose: mg_socket_if_init 0x3ffb7348 using select()
V (14965) mongoose: mg_mgr_init_opt ==================================
V (14965) mongoose: mg_mgr_init_opt init mgr=0x3ffb7348
I (14965) events: Signal(network.mgr.init)
I (14965) webserver: Launching Web Server
V (14995) mongoose: mg_socket_if_sock_se 0x3ffe068c *8192*
V (14995) mongoose: mg_add_conn 0x3ffb7348 0x3ffe068c
I (14995) telnet: Launching Telnet Server
V (15025) mongoose: mg_socket_if_sock_se 0x3ffe58f0 *8193*
V (15025) mongoose: mg_add_conn 0x3ffb7348 0x3ffe58f0
V (15495) simcom: tx: 41 54 0d 0a | AT..
D (15495) simcom: tx scmd ch=0 len=4 : AT||
I (15495) simcom: State timeout, transition to 13
I (15495) simcom: State: Enter PoweredOff state
I (17905) wifi: pm start, type:0
I’ve done the fork at the GitHub level, and set appropriate permissions. You and Steve should have access.

Can you set it up as a git subproject within ovms3? That way changes get pulled in automatically?

Regarding the changes to mongoose itself, let’s try to do these in a way that can be pushed back to cesanta upstream as valuable patches. For example, this change to support SO_REUSEADDR is an ESP32 feature and can be enabled for that platform.

Regards, Mark.
On 30 Jan 2018, at 5:14 PM, Michael Balzer <dexter@expeedo.de> wrote:
On 30.01.2018 at 09:08, Mark Webb-Johnson wrote:
My v3 tree switched to 3.1 now, so not so easy ;-( I really need to work out a way to do automated builds (switching sdkconfig appropriately).
I think you may be correct - it is probably not the listen socket that has the issue, but the actual connection socket.
Can you try enabling the option SO_REUSEADDR in menuconfig? That should allow the listen socket to be opened, even if a connection socket is in TIME_WAIT state (assuming the mongoose library sets SO_REUSEADDR correctly).
It doesn't out of the box, as it still assumes LWIP does not support it, but after disabling the exclusion in mg_open_listening_socket() this works.
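For reference, the effect of the option can be sketched with plain BSD sockets. This is only a minimal illustration, not the actual mongoose code: open_listener and its parameters are hypothetical names, and it assumes a POSIX-style socket API, which LWIP also offers when SO_REUSEADDR support is compiled in.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP listening socket on the given port
 * (host byte order, 0 = let the stack choose). If reuse_addr is non-zero,
 * SO_REUSEADDR is set before bind(), which is what allows the bind to
 * succeed while an old connection on that port is still in TIME_WAIT.
 * Returns the socket descriptor, or -1 on failure with errno set. */
int open_listener(unsigned short port, int reuse_addr)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (reuse_addr) {
        int on = 1;
        if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 4) < 0) {
        int saved = errno;   /* keep the bind/listen error for the caller */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

The option has to be set between socket() and bind(); setting it afterwards has no effect on the bind that already failed.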
So, another patch for mongoose… maybe we really should fork mongoose. If you don't have an objection, I can do that later on.
Regards, Michael
Regards, Mark
On 30 Jan 2018, at 3:54 PM, Michael Balzer <dexter@expeedo.de> wrote:
My esp-idf is up to date.
Have you done the GET from a browser (keepalive)? Another way to get the error is doing the wifi off/on after opening a telnet session.
I've compared my LWIP config, only difference was I had "Support per-interface loopback" (CONFIG_LWIP_NETIF_LOOPBACK) enabled, which is our default btw. Disabling it didn't change anything though.
Attaching my sdkconfig.
Regards, Michael
On 30.01.2018 at 01:31, Mark Webb-Johnson wrote:
Strange. I don’t have the debugging enabled, but this is what I get:
OVMS > wifi mode client
Starting WIFI as a client for any defined SSID
I (108251) wifi: wifi firmware version: 403db1d
I (108251) wifi: config NVS flash: enabled
…
I (114861) wifi: connected with STUBBY, channel 11
I (116331) event: sta ip: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62
I (116331) ovms-mdns: Launching MDNS service
I (116331) esp32wifi: WiFi UP with SSID: STUBBY, MAC: 30:ae:a4:80:8d:b0, IP: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62
I (116331) webserver: Launching Web Server
I (116351) ssh: Launching SSH Server
I (117841) wifi: pm start, type:0
I (118881) webserver: HTTP GET /
I (120201) webserver: HTTP GET /
I (120991) webserver: HTTP GET /

OVMS > wifi mode off
Stopping wifi station...
I (123821) wifi: state: run -> init (0)
I (123821) wifi: pm stop, total sleep time: 0/5979789
I (123821) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1
I (123841) wifi: flush txq
I (123841) wifi: stop sw txq
I (123841) wifi: lmac stop hw txq
I (123841) wifi: Deinit lldesc rx mblock:4
I (124121) webserver: Stopping Web Server
I (124121) ssh: Stopping SSH Server

OVMS > wifi mode client
Starting WIFI as a client for any defined SSID
I (126991) wifi: wifi firmware version: 403db1d
...
I (132851) esp32wifi: Found SSID STUBBY - trying to connect
I (134181) wifi: n:11 0, o:1 0, ap:255 255, sta:11 0, prof:1
I (134831) wifi: state: init -> auth (b0)
I (134831) wifi: state: auth -> assoc (0)
I (134841) wifi: state: assoc -> run (10)
I (134861) wifi: connected with STUBBY, channel 11
I (135571) event: sta ip: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62
I (135571) ovms-mdns: Launching MDNS service
I (135571) esp32wifi: WiFi UP with SSID: STUBBY, MAC: 30:ae:a4:80:8d:b0, IP: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62
I (135571) webserver: Launching Web Server
I (135591) ssh: Launching SSH Server
I (137071) webserver: HTTP GET /
I (137841) wifi: pm start, type:0
I (140001) webserver: HTTP GET /
Tried it 10 times to be certain.
Are you on the master branch of https://github.com/openvehicles/esp-idf.git and up to date?
There are a bunch of menuconfig options in Components/LWIP. This is what I have:
    [ ] Enable copy between Layer2 and Layer3 packets
    (10) Max number of open sockets
    [ ] Enable SO_REUSEADDR option
    [ ] Enable SO_RCVBUF option
    (1) Maximum number of NTP servers
    [ ] Enable fragment outgoing IP packets
    [ ] Enable reassembly incoming fragmented IP packets
    [ ] Enable LWIP statistics
    [*] Enable LWIP ARP trust
    (32) TCPIP task receive mail box size
    [ ] DHCP: Perform ARP check on any offered address
        DHCP server  --->
    [ ] Enable IPV4 Link-Local Addressing (AUTOIP)  ----
    [ ] Support per-interface loopback  ----
        TCP  --->
        UDP  --->
    (2560) TCP/IP Task Stack Size
    [*] Enable PPP support (new/experimental)  --->
        ICMP  --->
        LWIP RAW API  --->
The SO_REUSEADDR option is relevant, but not enabled in my config. It does seem like mongoose is trying to use it. I wonder what LWIP default behaviour is in that case.
Regards, Mark.
On 30 Jan 2018, at 5:51 AM, Michael Balzer <dexter@expeedo.de> wrote:
I've added an option to send the mongoose debug log output to the ESP log framework and add a debug log to the if_destroy function.
There was some talk about not changing the original mongoose files / maybe doing an OVMS fork, so I won't check the patch in. Diff attached.
If you apply the patch: you'll need log level verbose for full debug details like these:
I (14915) wifi: connected with devolo-f4068d73a03e, channel 11
I (14915) events: Signal(system.wifi.sta.connected)
I (14945) event: sta ip: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1
I (14945) events: Signal(system.wifi.sta.gotip)
I (14945) events: Signal(network.wifi.up)
I (14945) ovms-mdns: Launching MDNS service
I (14955) events: Signal(network.up)
I (14955) esp32wifi: WiFi UP with SSID: devolo-f4068d73a03e, MAC: 30:ae:a4:37:25:88, IP: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1
V (14965) mongoose: mg_socket_if_init 0x3ffb7348 using select()
V (14965) mongoose: mg_mgr_init_opt ==================================
V (14965) mongoose: mg_mgr_init_opt init mgr=0x3ffb7348
I (14965) events: Signal(network.mgr.init)
I (14965) webserver: Launching Web Server
V (14995) mongoose: mg_socket_if_sock_se 0x3ffe068c 8192
V (14995) mongoose: mg_add_conn 0x3ffb7348 0x3ffe068c
I (14995) telnet: Launching Telnet Server
V (15025) mongoose: mg_socket_if_sock_se 0x3ffe58f0 8193
V (15025) mongoose: mg_add_conn 0x3ffb7348 0x3ffe58f0
V (15495) simcom: tx: 41 54 0d 0a | AT..
D (15495) simcom: tx scmd ch=0 len=4 : AT||
I (15495) simcom: State timeout, transition to 13
I (15495) simcom: State: Enter PoweredOff state
I (17905) wifi: pm start, type:0
Now doing an HTTP GET /home from the browser:
V (35485) mongoose: mg_mgr_handle_conn 0x3ffe068c fd=8192 fd_flags=1 nc_flags=1 rmbl=0 smbl=0
V (35485) mongoose: mg_add_conn 0x3ffb7348 0x3ffe97e0
V (35495) mongoose: mg_if_accept_new_con 0x3ffe068c 0x3ffe97e0 -1 0
V (35495) mongoose: mg_accept_conn 0x3ffe97e0 conn from 192.168.2.103:33574
V (35505) mongoose: mg_socket_if_sock_se 0x3ffe97e0 8194
V (35505) mongoose: mg_call 0x3ffe97e0 proto ev=1 ev_data=0x3ffe97f8 flags=0 rmbl=0 smbl=0
V (35505) mongoose: mg_call 0x3ffe97e0 user ev=1 ev_data=0x3ffe97f8 flags=0 rmbl=0 smbl=0
D (35505) webserver: EventHandler: conn=0x3ffe97e0 ev=1 p=0x3ffe97f8
V (35505) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0
V (35505) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0
V (35505) mongoose: mg_mgr_handle_conn 0x3ffe068c after fd=8192 nc_flags=1 rmbl=0 smbl=0
V (35505) mongoose: mg_mgr_handle_conn 0x3ffe97e0 fd=8194 fd_flags=1 nc_flags=0 rmbl=0 smbl=0
V (35515) mongoose: mg_handle_tcp_read 0x3ffe97e0 350 bytes (PLAIN) <- 8194
V (35515) mongoose: mg_recv_common 0x3ffe97e0 350 0
V (35515) mongoose: mg_call 0x3ffe97e0 proto ev=3 ev_data=0x3ffe95d0 flags=0 rmbl=350 smbl=0
V (35515) mongoose: mg_call 0x3ffe97e0 user ev=3 ev_data=0x3ffe95d0 flags=0 rmbl=350 smbl=0
D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=3 p=0x3ffe95d0
V (35515) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=0
V (35515) mongoose: mg_http_handler2 0x3ffe97e0 192.168.2.103:33574 GET /home
V (35515) mongoose: mg_call 0x3ffe97e0 user ev=102 ev_data=0x3ffe9400 flags=0 rmbl=350 smbl=0
D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=102 p=0x3ffe9400
V (35515) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=0
V (35515) mongoose: mg_call 0x3ffe97e0 user ev=100 ev_data=0x3ffe9400 flags=0 rmbl=350 smbl=0
D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=100 p=0x3ffe9400
I (35515) webserver: HTTP GET /home
V (35525) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=740
V (35525) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=740
V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 after fd=8194 nc_flags=0 rmbl=0 smbl=740
V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 fd=8194 fd_flags=2 nc_flags=0 rmbl=0 smbl=740
V (35525) mongoose: mg_write_to_socket 0x3ffe97e0 740 bytes -> 8194
V (35525) mongoose: mg_if_sent_cb 0x3ffe97e0 740
V (35525) mongoose: mg_call 0x3ffe97e0 proto ev=4 ev_data=0x3ffe95e0 flags=0 rmbl=0 smbl=0
V (35525) mongoose: mg_call 0x3ffe97e0 user ev=4 ev_data=0x3ffe95e0 flags=0 rmbl=0 smbl=0
D (35525) webserver: EventHandler: conn=0x3ffe97e0 ev=4 p=0x3ffe95e0
V (35525) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0
V (35525) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0
V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 after fd=8194 nc_flags=0 rmbl=0 smbl=0
OVMS > wifi mode off
Stopping wifi station...
I (41145) events: Signal(system.wifi.down)
I (41145) events: Signal(network.wifi.down)
I (41145) events: Signal(network.reconfigured)
I (41145) events: Signal(network.down)
I (41155) wifi: state: run -> init (0)
I (41155) wifi: pm stop, total sleep time: 0/23262330
I (41155) wifi: n:11 0, o:11 2, ap:255 255, sta:11 2, prof:1
I (41165) wifi: flush txq
I (41165) wifi: stop sw txq
I (41165) wifi: lmac stop hw txq
I (41165) wifi: Deinit lldesc rx mblock:4
I (41165) events: Signal(system.wifi.sta.disconnected)
I (41175) events: Signal(system.wifi.sta.stop)
I (41525) events: Signal(network.mgr.stop)
I (41525) webserver: Stopping Web Server
I (41525) telnet: Stopping Telnet Server
V (41535) mongoose: mg_mgr_free 0x3ffb7348
V (41535) mongoose: mg_close_conn 0x3ffe97e0 0 8194
V (41535) mongoose: mg_socket_if_destroy nc=0x3ffe97e0 sock=8194 flags=0
V (41545) mongoose: mg_call 0x3ffe97e0 proto ev=5 ev_data=0x0 flags=0 rmbl=0 smbl=0
V (41545) mongoose: mg_call 0x3ffe97e0 user ev=5 ev_data=0x0 flags=0 rmbl=0 smbl=0
D (41545) webserver: EventHandler: conn=0x3ffe97e0 ev=5 p=0x0
V (41545) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0
V (41545) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0
V (41545) mongoose: mg_close_conn 0x3ffe58f0 1 8193
V (41545) mongoose: mg_socket_if_destroy nc=0x3ffe58f0 sock=8193 flags=1
V (41555) mongoose: mg_call 0x3ffe58f0 user ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0
V (41555) mongoose: mg_call 0x3ffe58f0 after user flags=1 rmbl=0 smbl=0
V (41555) mongoose: mg_close_conn 0x3ffe068c 1 8192
V (41555) mongoose: mg_socket_if_destroy nc=0x3ffe068c sock=8192 flags=1
V (41565) mongoose: mg_call 0x3ffe068c proto ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0
V (41565) mongoose: mg_call 0x3ffe068c user ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0
D (41565) webserver: EventHandler: conn=0x3ffe068c ev=5 p=0x0
V (41565) mongoose: mg_call 0x3ffe068c after user flags=1 rmbl=0 smbl=0
V (41565) mongoose: mg_call 0x3ffe068c after proto flags=1 rmbl=0 smbl=0
… doesn't look wrong so far, the sockets get closed correctly. mg_socket_if_destroy calls closesocket (= lwip_close_r), so everything _should_ be ok… but…
OVMS > wifi mode client
Starting WIFI as a client for any defined SSID
I (47225) wifi: wifi firmware version: 403db1d
I (47225) wifi: config NVS flash: enabled
I (47225) wifi: config nano formating: disabled
I (47225) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (47225) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (47245) wifi: Init dynamic tx buffer num: 16
I (47245) wifi: Init data frame dynamic rx buffer num: 16
I (47245) wifi: Init management frame dynamic rx buffer num: 16
I (47245) wifi: wifi driver task: 3ffe251c, prio:23, stack:4096
I (47245) wifi: Init static rx buffer num: 4
I (47245) wifi: Init dynamic rx buffer num: 16
I (47245) wifi: wifi power manager task: 0x3ffe5a9c prio: 21 stack: 2560
I (47255) wifi: mode : sta (30:ae:a4:37:25:88)
I (47255) events: Signal(system.wifi.sta.start)
W (50495) wifi: incorrect scan type: 1073541416
I (52905) events: Signal(system.wifi.scan.done)
I (52905) esp32wifi: Found SSID devolo-f4068d73a03e - trying to connect
I (54235) wifi: n:11 2, o:1 0, ap:255 255, sta:11 2, prof:1
I (54885) wifi: state: init -> auth (b0)
I (54895) wifi: state: auth -> assoc (0)
I (54905) wifi: state: assoc -> run (10)
I (54915) wifi: connected with devolo-f4068d73a03e, channel 11
I (54915) events: Signal(system.wifi.sta.connected)
I (54945) event: sta ip: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1
I (54955) events: Signal(system.wifi.sta.gotip)
I (54955) events: Signal(network.wifi.up)
I (54955) ovms-mdns: Launching MDNS service
I (54955) events: Signal(network.up)
I (54965) esp32wifi: WiFi UP with SSID: devolo-f4068d73a03e, MAC: 30:ae:a4:37:25:88, IP: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1
V (54965) mongoose: mg_socket_if_init 0x3ffb7348 using select()
V (54965) mongoose: mg_mgr_init_opt ==================================
V (54965) mongoose: mg_mgr_init_opt init mgr=0x3ffb7348
I (54965) events: Signal(network.mgr.init)
I (54965) webserver: Launching Web Server
V (54995) mongoose: mg_bind_opt Failed to open listener: 112
V (54995) mongoose: mg_socket_if_destroy nc=0x3ffe996c sock=-1 flags=1
E (54995) webserver: Cannot bind to port 80: failed to open listener
I (54995) telnet: Launching Telnet Server
V (55025) mongoose: mg_socket_if_sock_se 0x3ffe98c8 8196
V (55025) mongoose: mg_add_conn 0x3ffb7348 0x3ffe98c8
I (57905) wifi: pm start, type:0
… so it's indeed 112 = EADDRINUSE as assumed.
As mongoose does everything right, this seems to be an LWIP issue? Mark, Steve?
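A note on the number: 112 is the value of EADDRINUSE in newlib, the C library used by ESP-IDF. Other systems number it differently (Linux uses 98), so it is safer to compare against the symbolic name. The sketch below is an assumption-laden illustration on a desktop POSIX system, not OVMS or mongoose code. The function second_bind_errno is a hypothetical name, and the conflict here comes from a live listener rather than a lingering connection, but bind() reports it with the same errno.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: bind a listener to a free loopback port, then try to
 * bind a second socket to the same address without SO_REUSEADDR.
 * Returns the errno of the second bind(), or 0 if it unexpectedly
 * succeeded. Setup failures also return their errno. */
int second_bind_errno(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int result = 0;

    int first = socket(AF_INET, SOCK_STREAM, 0);
    if (first < 0)
        return errno;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  /* let the stack pick a free port */

    if (bind(first, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(first, 1) < 0 ||
        getsockname(first, (struct sockaddr *)&addr, &len) < 0) {
        result = errno;
        close(first);
        return result;
    }

    /* addr now holds the port the first socket was given */
    int second = socket(AF_INET, SOCK_STREAM, 0);
    if (second < 0) {
        result = errno;
        close(first);
        return result;
    }
    if (bind(second, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        result = errno;  /* expected: EADDRINUSE */

    close(second);
    close(first);
    return result;
}
```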
On 29.01.2018 at 14:17, Mark Webb-Johnson wrote:
The bind has to occur in the network.mgr.init handler. That is the point between mg_mgr_init and mg_mgr_poll.
Perhaps in the handler we can check m_connected_wifi and only bind if wifi is up? But would also need to pickup network.reconfigured and check m_connected_wifi there as well.
The webserver NetManInit function is already triggered on network.mgr.init and checks m_connected_wifi.
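The check discussed here could look roughly like the following. This is a hypothetical sketch in plain C, not the actual OVMS webserver code: struct web_state, do_bind and on_network_event are made-up names, connected_wifi stands in for m_connected_wifi, and do_bind only records that a bind would happen.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical state for one server component. */
struct web_state {
    bool connected_wifi;  /* stands in for m_connected_wifi */
    bool mgr_running;     /* true between network.mgr.init and network.mgr.stop */
    bool listening;       /* true once the listener has been bound */
    int  bind_calls;      /* counts how often a bind was attempted */
};

/* Stand-in for the real bind call; it only records the attempt. */
static void do_bind(struct web_state *s)
{
    s->listening = true;
    s->bind_calls++;
}

/* Bind on network.mgr.init if wifi is already up; otherwise catch up on a
 * later network.reconfigured. Never bind twice, and forget the listener
 * when the manager stops. Other events are ignored. */
void on_network_event(struct web_state *s, const char *event)
{
    if (strcmp(event, "network.mgr.init") == 0) {
        s->mgr_running = true;
    } else if (strcmp(event, "network.mgr.stop") == 0) {
        s->mgr_running = false;
        s->listening = false;
        return;
    } else if (strcmp(event, "network.reconfigured") != 0) {
        return;
    }

    if (s->mgr_running && s->connected_wifi && !s->listening)
        do_bind(s);
}
```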
Regards, Michael
Regards, Mark.
> On 29 Jan 2018, at 6:11 PM, Michael Balzer <dexter@expeedo.de <mailto:dexter@expeedo.de>> wrote: > > mg_bind() fails with "failed to open listener". I haven't traced it further yet, assumed it's the bind() that fails. > > Mongoose (now?) uses the standard socket API, so does not need to handle LWIP details, or does it? > > I can do some more debugging to get the exact point of failure. > > Regards, > Michael > > > Am 29.01.2018 um 02:28 schrieb Stephen Casner: >> Mark, >> >> Back in November we discussed this problem of the client's half of the >> connection being left open. Then as now, you commented that mongoose >> sends a close event (MG_EV_CLOSE) for each open connection when the >> interface went down. >> >> It does send a close event, but not until after the wifi is already >> shut down so closing the socket at that point does not send any packet >> to the client. I decided to punt on that issue, though, because >> manually shutting down wifi is not an important use case. The more >> likely case is that wifi connectivity is lost due to motion or other >> causes and in that case no close packet can be delivered anyway. >> >> I interpreted Michael's message to be referring to a problem with >> ports on the server (OVMS) end. That is, to say that mongoose didn't >> clean up LWIP state properly. Michael, can you explain a bit more? 
>> >> -- Steve >> >> On Mon, 29 Jan 2018, Mark Webb-Johnson wrote: >> >>> I suspect that this is timing related: >>> >>> OVMS > wifi mode client XXX >>> Starting WIFI as a client to XXX… >>> … >>> I (337100) ssh: Launching SSH Server >>> V (338830) ssh: Event MG_EV_ACCEPT conn 0x3ffee8b4, data 0x3ffee8cc >>> V (338840) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 21 >>> V (338840) ssh: Event MG_EV_SEND conn 0x3ffee8b4, data 0x3ffe5d90 >>> V (338840) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 1024 >>> V (338850) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 408 >>> … >>> OVMS > wifi mode off >>> Stopping wifi station... >>> I (350580) ovms-mdns: Stopping MDNS service >>> I (350580) wifi: state: run -> init (0) >>> I (350590) wifi: pm stop, total sleep time: 0/11930449 >>> I (350590) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1 >>> I (350600) wifi: flush txq >>> I (350600) wifi: stop sw txq >>> I (350600) wifi: lmac stop hw txq >>> I (350600) wifi: Deinit lldesc rx mblock:4 >>> I (351340) webserver: Stopping Web Server >>> I (351340) ssh: Stopping SSH Server >>> V (351540) ssh: Event MG_EV_CLOSE conn 0x3ffee8b4, data 0x0 >>> V (351540) ssh: Event MG_EV_CLOSE conn 0x3ffeb468, data 0x0 >>> I (357570) event: station ip lost >>> >>> But ssh client connection is still up on my workstation. Looks like >>> the MG_EV_CLOSE events came in after the SSH server was stopped. >>> >>> I repeated the test, but with event logging on: >>> >>> OVMS > wifi mode off >>> Stopping wifi station... 
>>> I (34171) events: Signal(system.wifi.down) >>> I (34171) events: Signal(network.wifi.down) >>> I (34171) ovms-mdns: Stopping MDNS service >>> I (34171) events: Signal(network.reconfigured) >>> I (34171) events: Signal(network.down) >>> I (34171) wifi: state: run -> init (0) >>> I (34181) wifi: pm stop, total sleep time: 0/9603569 >>> I (34181) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1 >>> I (34191) wifi: flush txq >>> I (34191) wifi: stop sw txq >>> I (34191) wifi: lmac stop hw txq >>> I (34191) wifi: Deinit lldesc rx mblock:4 >>> I (34191) events: Signal(system.wifi.sta.disconnected) >>> I (34191) events: Signal(system.wifi.sta.stop) >>> I (35131) events: Signal(network.mgr.stop) >>> I (35131) webserver: Stopping Web Server >>> I (35131) ssh: Stopping SSH Server >>> V (35331) ssh: Event MG_EV_CLOSE conn 0x3ffec138, data 0x0 >>> V (35331) ssh: Event MG_EV_CLOSE conn 0x3ffeb768, data 0x0 >>> >>> MDNS exits quicker because it responds to the network.wifi.down message, while ssh uses the network.mgr.stop event. In a clean shutdown there is a difference. >>> >>> Perhaps we should issue network.mgr.stop earlier? However, even if we close the connections from our end, I’m not sure if the packet would make it through to the other end before the wifi disconnects. >>> >>> Regards, Mark. >>> >>>> On 29 Jan 2018, at 2:13 AM, Stephen Casner <casner@acm.org> <mailto:casner@acm.org> wrote: >>>> >>>> I handled wifi shutdowns cleanly when I first implemented telnet and >>>> ssh as their own tasks. Now that they are under Mongoose, it is out >>>> of my control. The socket is owned by the Mongoose code. >>>> >>>> -- Steve >>>> >>>> On Sun, 28 Jan 2018, Michael Balzer wrote: >>>> >>>>> I've begun working on the webserver and noticed something that may >>>>> be correlated to this: sockets don't get closed when losing the >>>>> connection. The effect is visible on both web and telnet server (ssh >>>>> not tested). 
To reproduce, switch the Wifi network with an open >>>>> connection, the port will not be available until timeout. >>>>> >>>>> Regards, >>>>> Michael >>>>> >>>>> >>>>> _______________________________________________ >>>>> OvmsDev mailing list >>>>> OvmsDev@lists.teslaclub.hk <mailto:OvmsDev@lists.teslaclub.hk> >>>>> http://lists.teslaclub.hk/mailman/listinfo/ovmsdev <http://lists.teslaclub.hk/mailman/listinfo/ovmsdev> > > > -- > Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal > Fon 02333 / 833 5735 * Handy 0176 / 206 989 26 > _______________________________________________ > OvmsDev mailing list > OvmsDev@lists.teslaclub.hk <mailto:OvmsDev@lists.teslaclub.hk> > http://lists.teslaclub.hk/mailman/listinfo/ovmsdev <http://lists.teslaclub.hk/mailman/listinfo/ovmsdev>
Mongoose is now a submodule (making us a "superproject" :)).

Note: I think you'll need to manually do a "git submodule init" after the next pull, but that may depend on your git version (?).

I've pushed my changes regarding the ESP logging and the SO_REUSEADDR patch to our mongoose fork. I've also added a config option for the mongoose debugging (default disabled). Enabling it now also includes time deltas (µs resolution) for mongoose.

Mark / Steve, if you'd like to check my changes first, I'll create a pull request tomorrow.

Regards, Michael

On 31.01.2018 at 00:57, Mark Webb-Johnson wrote:
I’ve done the fork at the github level, and set appropriate permissions. You and Steve should have access.
Can you set it up as a git subproject within ovms3? That way changes get pulled in automatically?
Regarding the changes to mongoose itself, let’s try to do these in a way that can be pushed back to cesanta upstream as valuable patches. For example, this change to support SO_REUSEADDR is an ESP32 feature and can be enabled for that platform.
Regards, Mark.
On 30 Jan 2018, at 5:14 PM, Michael Balzer <dexter@expeedo.de> wrote:
On 30.01.2018 at 09:08, Mark Webb-Johnson wrote:
My v3 tree switched to 3.1 now, so not so easy ;-( I really need to work out a way to do automated builds (switching sdkconfig appropriately).
I think you may be correct - it is probably not the listen socket that has the issue, but the actual connection socket.
Can you try enabling the option SO_REUSEADDR in menuconfig? That should allow the listen socket to be opened, even if a connection socket is in TIME_WAIT state (assuming the mongoose library sets SO_REUSEADDR correctly).
It doesn't out of the box, as it still assumes LWIP does not support it, but after disabling the exclusion in mg_open_listening_socket() this works.
So, another patch for mongoose… maybe we really should fork mongoose. If you don't have an objection, I can do that later on.
Regards, Michael
Regards, Mark
On 30 Jan 2018, at 3:54 PM, Michael Balzer <dexter@expeedo.de> wrote:
My esp-idf is up to date.
Have you done the GET from a browser (keepalive)? Another way to get the error is doing the wifi off/on after opening a telnet session.
I've compared my LWIP config, only difference was I had "Support per-interface loopback" (CONFIG_LWIP_NETIF_LOOPBACK) enabled, which is our default btw. Disabling it didn't change anything though.
Attaching my sdkconfig.
Regards, Michael
On 30.01.2018 at 01:31, Mark Webb-Johnson wrote:
Strange. I don’t have the debugging enabled, but this is what I get:
OVMS > wifi mode client
Starting WIFI as a client for any defined SSID
I (108251) wifi: wifi firmware version: 403db1d
I (108251) wifi: config NVS flash: enabled
…
I (114861) wifi: connected with STUBBY, channel 11
I (116331) event: sta ip: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62
I (116331) ovms-mdns: Launching MDNS service
I (116331) esp32wifi: WiFi UP with SSID: STUBBY, MAC: 30:ae:a4:80:8d:b0, IP: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62
I (116331) webserver: Launching Web Server
I (116351) ssh: Launching SSH Server
I (117841) wifi: pm start, type:0
I (118881) webserver: HTTP GET /
I (120201) webserver: HTTP GET /
I (120991) webserver: HTTP GET /

OVMS > wifi mode off
Stopping wifi station...
I (123821) wifi: state: run -> init (0)
I (123821) wifi: pm stop, total sleep time: 0/5979789
I (123821) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1
I (123841) wifi: flush txq
I (123841) wifi: stop sw txq
I (123841) wifi: lmac stop hw txq
I (123841) wifi: Deinit lldesc rx mblock:4
I (124121) webserver: Stopping Web Server
I (124121) ssh: Stopping SSH Server

OVMS > wifi mode client
Starting WIFI as a client for any defined SSID
I (126991) wifi: wifi firmware version: 403db1d
...
I (132851) esp32wifi: Found SSID STUBBY - trying to connect
I (134181) wifi: n:11 0, o:1 0, ap:255 255, sta:11 0, prof:1
I (134831) wifi: state: init -> auth (b0)
I (134831) wifi: state: auth -> assoc (0)
I (134841) wifi: state: assoc -> run (10)
I (134861) wifi: connected with STUBBY, channel 11
I (135571) event: sta ip: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62
I (135571) ovms-mdns: Launching MDNS service
I (135571) esp32wifi: WiFi UP with SSID: STUBBY, MAC: 30:ae:a4:80:8d:b0, IP: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62
I (135571) webserver: Launching Web Server
I (135591) ssh: Launching SSH Server
I (137071) webserver: HTTP GET /
I (137841) wifi: pm start, type:0
I (140001) webserver: HTTP GET /
Tried it 10 times to be certain.
Are you on the master branch of https://github.com/openvehicles/esp-idf.git and up to date?
There are a bunch of menuconfig options in Components/LWIP. This is what I have:
    [ ] Enable copy between Layer2 and Layer3 packets
    (10) Max number of open sockets
    [ ] Enable SO_REUSEADDR option
    [ ] Enable SO_RCVBUF option
    (1) Maximum number of NTP servers
    [ ] Enable fragment outgoing IP packets
    [ ] Enable reassembly incoming fragmented IP packets
    [ ] Enable LWIP statistics
    [*] Enable LWIP ARP trust
    (32) TCPIP task receive mail box size
    [ ] DHCP: Perform ARP check on any offered address
        DHCP server  --->
    [ ] Enable IPV4 Link-Local Addressing (AUTOIP)  ----
    [ ] Support per-interface loopback  ----
        TCP  --->
        UDP  --->
    (2560) TCP/IP Task Stack Size
    [*] Enable PPP support (new/experimental)  --->
        ICMP  --->
        LWIP RAW API  --->
The SO_REUSEADDR option is relevant, but not enabled in my config. It does seem like mongoose is trying to use it. I wonder what LWIP default behaviour is in that case.
Regards, Mark.
On 30 Jan 2018, at 5:51 AM, Michael Balzer <dexter@expeedo.de> wrote:
I've added an option to send the mongoose debug log output to the ESP log framework and add a debug log to the if_destroy function.
There was some talk about not changing the original mongoose files / maybe doing an OVMS fork, so I won't check the patch in. Diff attached.
If you apply the patch: you'll need log level verbose for full debug details like these:
I (14915) wifi: connected with devolo-f4068d73a03e, channel 11 I (14915) events: Signal(system.wifi.sta.connected) I (14945) event: sta ip: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 I (14945) events: Signal(system.wifi.sta.gotip) I (14945) events: Signal(network.wifi.up) I (14945) ovms-mdns: Launching MDNS service I (14955) events: Signal(network.up) I (14955) esp32wifi: WiFi UP with SSID: devolo-f4068d73a03e, MAC: 30:ae:a4:37:25:88, IP: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 V (14965) mongoose: mg_socket_if_init 0x3ffb7348 using select() V (14965) mongoose: mg_mgr_init_opt ================================== V (14965) mongoose: mg_mgr_init_opt init mgr=0x3ffb7348 I (14965) events: Signal(network.mgr.init) I (14965) webserver: Launching Web Server V (14995) mongoose: mg_socket_if_sock_se 0x3ffe068c *8192* V (14995) mongoose: mg_add_conn 0x3ffb7348 0x3ffe068c I (14995) telnet: Launching Telnet Server V (15025) mongoose: mg_socket_if_sock_se 0x3ffe58f0 *8193* V (15025) mongoose: mg_add_conn 0x3ffb7348 0x3ffe58f0 V (15495) simcom: tx: 41 54 0d 0a | AT.. D (15495) simcom: tx scmd ch=0 len=4 : AT|| I (15495) simcom: State timeout, transition to 13 I (15495) simcom: State: Enter PoweredOff state I (17905) wifi: pm start, type:0
Now doing an HTTP GET /home from the browser:
V (35485) mongoose: mg_mgr_handle_conn 0x3ffe068c fd=8192 fd_flags=1 nc_flags=1 rmbl=0 smbl=0
V (35485) mongoose: mg_add_conn 0x3ffb7348 0x3ffe97e0
V (35495) mongoose: mg_if_accept_new_con 0x3ffe068c 0x3ffe97e0 -1 0
V (35495) mongoose: mg_accept_conn 0x3ffe97e0 conn from 192.168.2.103:33574
V (35505) mongoose: mg_socket_if_sock_se 0x3ffe97e0 *8194*
V (35505) mongoose: mg_call 0x3ffe97e0 proto ev=1 ev_data=0x3ffe97f8 flags=0 rmbl=0 smbl=0
V (35505) mongoose: mg_call 0x3ffe97e0 user ev=1 ev_data=0x3ffe97f8 flags=0 rmbl=0 smbl=0
D (35505) webserver: EventHandler: conn=0x3ffe97e0 ev=1 p=0x3ffe97f8
V (35505) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0
V (35505) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0
V (35505) mongoose: mg_mgr_handle_conn 0x3ffe068c after fd=8192 nc_flags=1 rmbl=0 smbl=0
V (35505) mongoose: mg_mgr_handle_conn 0x3ffe97e0 fd=8194 fd_flags=1 nc_flags=0 rmbl=0 smbl=0
V (35515) mongoose: mg_handle_tcp_read 0x3ffe97e0 350 bytes (PLAIN) <- 8194
V (35515) mongoose: mg_recv_common 0x3ffe97e0 350 0
V (35515) mongoose: mg_call 0x3ffe97e0 proto ev=3 ev_data=0x3ffe95d0 flags=0 rmbl=350 smbl=0
V (35515) mongoose: mg_call 0x3ffe97e0 user ev=3 ev_data=0x3ffe95d0 flags=0 rmbl=350 smbl=0
D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=3 p=0x3ffe95d0
V (35515) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=0
V (35515) mongoose: mg_http_handler2 0x3ffe97e0 192.168.2.103:33574 *GET /home*
V (35515) mongoose: mg_call 0x3ffe97e0 user ev=102 ev_data=0x3ffe9400 flags=0 rmbl=350 smbl=0
D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=102 p=0x3ffe9400
V (35515) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=0
V (35515) mongoose: mg_call 0x3ffe97e0 user ev=100 ev_data=0x3ffe9400 flags=0 rmbl=350 smbl=0
D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=100 p=0x3ffe9400
*I (35515) webserver: HTTP GET /home*
V (35525) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=740
V (35525) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=740
V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 after fd=8194 nc_flags=0 rmbl=0 smbl=740
V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 fd=8194 fd_flags=2 nc_flags=0 rmbl=0 smbl=740
V (35525) mongoose: mg_write_to_socket 0x3ffe97e0 740 bytes -> 8194
V (35525) mongoose: mg_if_sent_cb 0x3ffe97e0 740
V (35525) mongoose: mg_call 0x3ffe97e0 proto ev=4 ev_data=0x3ffe95e0 flags=0 rmbl=0 smbl=0
V (35525) mongoose: mg_call 0x3ffe97e0 user ev=4 ev_data=0x3ffe95e0 flags=0 rmbl=0 smbl=0
D (35525) webserver: EventHandler: conn=0x3ffe97e0 ev=4 p=0x3ffe95e0
V (35525) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0
V (35525) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0
V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 after fd=8194 nc_flags=0 rmbl=0 smbl=0
*OVMS > wifi mode off*
Stopping wifi station...
I (41145) events: Signal(system.wifi.down)
I (41145) events: Signal(network.wifi.down)
I (41145) events: Signal(network.reconfigured)
I (41145) events: Signal(network.down)
I (41155) wifi: state: run -> init (0)
I (41155) wifi: pm stop, total sleep time: 0/23262330
I (41155) wifi: n:11 0, o:11 2, ap:255 255, sta:11 2, prof:1
I (41165) wifi: flush txq
I (41165) wifi: stop sw txq
I (41165) wifi: lmac stop hw txq
I (41165) wifi: Deinit lldesc rx mblock:4
I (41165) events: Signal(system.wifi.sta.disconnected)
I (41175) events: Signal(system.wifi.sta.stop)
I (41525) events: Signal(network.mgr.stop)
I (41525) webserver: Stopping Web Server
I (41525) telnet: Stopping Telnet Server
V (41535) mongoose: mg_mgr_free 0x3ffb7348
V (41535) mongoose: mg_close_conn 0x3ffe97e0 0 *8194*
V (41535) mongoose: mg_socket_if_destroy nc=0x3ffe97e0 sock=8194 flags=0
V (41545) mongoose: mg_call 0x3ffe97e0 proto ev=5 ev_data=0x0 flags=0 rmbl=0 smbl=0
V (41545) mongoose: mg_call 0x3ffe97e0 user ev=5 ev_data=0x0 flags=0 rmbl=0 smbl=0
D (41545) webserver: EventHandler: conn=0x3ffe97e0 ev=5 p=0x0
V (41545) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0
V (41545) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0
V (41545) mongoose: mg_close_conn 0x3ffe58f0 1 *8193*
V (41545) mongoose: mg_socket_if_destroy nc=0x3ffe58f0 sock=8193 flags=1
V (41555) mongoose: mg_call 0x3ffe58f0 user ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0
V (41555) mongoose: mg_call 0x3ffe58f0 after user flags=1 rmbl=0 smbl=0
V (41555) mongoose: mg_close_conn 0x3ffe068c 1 *8192*
V (41555) mongoose: mg_socket_if_destroy nc=0x3ffe068c sock=8192 flags=1
V (41565) mongoose: mg_call 0x3ffe068c proto ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0
V (41565) mongoose: mg_call 0x3ffe068c user ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0
D (41565) webserver: EventHandler: conn=0x3ffe068c ev=5 p=0x0
V (41565) mongoose: mg_call 0x3ffe068c after user flags=1 rmbl=0 smbl=0
V (41565) mongoose: mg_call 0x3ffe068c after proto flags=1 rmbl=0 smbl=0
… doesn't look wrong so far, the sockets get closed correctly. mg_socket_if_destroy calls closesocket = lwip_close_r, so everything _should_ be ok… but…
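The by-eye check of the log above can be sketched as a small host-side script: collect the fd from every mg_socket_if_sock_se line, collect the sock= value from every mg_socket_if_destroy line, and report fds that were assigned but never destroyed. This is only an analysis aid under stated assumptions: the line formats are those of the patched debug output quoted in this thread, and the name unclosed_sockets is made up for illustration.

```python
import re

# Patterns follow the verbose mongoose log lines quoted in this thread.
# The optional "*" allows for the emphasis markers used in the mail.
OPEN_RE = re.compile(r"mg_socket_if_sock_se\w*\s+0x[0-9a-f]+\s+\*?(-?\d+)\*?")
CLOSE_RE = re.compile(r"mg_socket_if_destroy\s+nc=0x[0-9a-f]+\s+sock=(-?\d+)")


def unclosed_sockets(log_text):
    """Return the set of fds that were assigned but never destroyed."""
    opened = {int(m.group(1)) for m in OPEN_RE.finditer(log_text)}
    closed = {int(m.group(1)) for m in CLOSE_RE.finditer(log_text)}
    return opened - closed


# Lines copied from the logs in this thread.
SAMPLE = """\
V (14995) mongoose: mg_socket_if_sock_se 0x3ffe068c 8192
V (15025) mongoose: mg_socket_if_sock_se 0x3ffe58f0 8193
V (35505) mongoose: mg_socket_if_sock_se 0x3ffe97e0 8194
V (41535) mongoose: mg_socket_if_destroy nc=0x3ffe97e0 sock=8194 flags=0
V (41545) mongoose: mg_socket_if_destroy nc=0x3ffe58f0 sock=8193 flags=1
V (41555) mongoose: mg_socket_if_destroy nc=0x3ffe068c sock=8192 flags=1
"""

if __name__ == "__main__":
    print(unclosed_sockets(SAMPLE) or "all sockets destroyed")
```

An empty result matches the reading above: at the mongoose level nothing is leaked, so whatever keeps the port busy sits below it.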
*OVMS > wifi mode client*
Starting WIFI as a client for any defined SSID
I (47225) wifi: wifi firmware version: 403db1d
I (47225) wifi: config NVS flash: enabled
I (47225) wifi: config nano formating: disabled
I (47225) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (47225) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (47245) wifi: Init dynamic tx buffer num: 16
I (47245) wifi: Init data frame dynamic rx buffer num: 16
I (47245) wifi: Init management frame dynamic rx buffer num: 16
I (47245) wifi: wifi driver task: 3ffe251c, prio:23, stack:4096
I (47245) wifi: Init static rx buffer num: 4
I (47245) wifi: Init dynamic rx buffer num: 16
I (47245) wifi: wifi power manager task: 0x3ffe5a9c prio: 21 stack: 2560
I (47255) wifi: mode : sta (30:ae:a4:37:25:88)
I (47255) events: Signal(system.wifi.sta.start)
W (50495) wifi: incorrect scan type: 1073541416
I (52905) events: Signal(system.wifi.scan.done)
I (52905) esp32wifi: Found SSID devolo-f4068d73a03e - trying to connect
I (54235) wifi: n:11 2, o:1 0, ap:255 255, sta:11 2, prof:1
I (54885) wifi: state: init -> auth (b0)
I (54895) wifi: state: auth -> assoc (0)
I (54905) wifi: state: assoc -> run (10)
I (54915) wifi: connected with devolo-f4068d73a03e, channel 11
I (54915) events: Signal(system.wifi.sta.connected)
I (54945) event: sta ip: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1
I (54955) events: Signal(system.wifi.sta.gotip)
I (54955) events: Signal(network.wifi.up)
I (54955) ovms-mdns: Launching MDNS service
I (54955) events: Signal(network.up)
I (54965) esp32wifi: WiFi UP with SSID: devolo-f4068d73a03e, MAC: 30:ae:a4:37:25:88, IP: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1
V (54965) mongoose: mg_socket_if_init 0x3ffb7348 using select()
V (54965) mongoose: mg_mgr_init_opt ==================================
V (54965) mongoose: mg_mgr_init_opt init mgr=0x3ffb7348
I (54965) events: Signal(network.mgr.init)
I (54965) webserver: Launching Web Server
*V (54995) mongoose: mg_bind_opt Failed to open listener: 112*
V (54995) mongoose: mg_socket_if_destroy nc=0x3ffe996c sock=-1 flags=1
E (54995) webserver: Cannot bind to port 80: failed to open listener
I (54995) telnet: Launching Telnet Server
V (55025) mongoose: mg_socket_if_sock_se 0x3ffe98c8 8196
V (55025) mongoose: mg_add_conn 0x3ffb7348 0x3ffe98c8
I (57905) wifi: pm start, type:0
… so it's indeed 112 = EADDRINUSE as assumed.
As mongoose does everything right, this seems to be an LWIP issue? Mark, Steve?
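For reading such logs on a workstation, note that the number is an errno of the ESP32 toolchain's C library (newlib), not of the host OS: on Linux, for instance, EADDRINUSE is 98, so decoding 112 with the host's errno table gives the wrong name. A minimal sketch of a decoder follows; only the 112 entry is confirmed by this thread, the other table entries are assumptions taken from newlib's sys/errno.h and should be checked against the toolchain in use. The function name is hypothetical.

```python
import re

# errno values of the ESP32 toolchain's C library (newlib).
# 112 is confirmed by the thread; the remaining entries are
# assumptions from newlib's sys/errno.h, verify before relying on them.
NEWLIB_ERRNO = {
    104: "ECONNRESET",
    111: "ECONNREFUSED",
    112: "EADDRINUSE",
    116: "ETIMEDOUT",
}

BIND_FAIL_RE = re.compile(r"mg_bind_opt Failed to open listener: (\d+)")


def explain_bind_failure(line):
    """Return (code, name) for a mongoose bind failure line, else None."""
    match = BIND_FAIL_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, NEWLIB_ERRNO.get(code, "unknown")


if __name__ == "__main__":
    print(explain_bind_failure(
        "V (54995) mongoose: mg_bind_opt Failed to open listener: 112"))
```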
On 29.01.2018 at 14:17, Mark Webb-Johnson wrote:
> The bind has to occur in the network.mgr.init handler. That is the point between mg_mgr_init and mg_mgr_poll.
>
> Perhaps in the handler we can check m_connected_wifi and only bind if wifi is up? But we would also need to pick up network.reconfigured and check m_connected_wifi there as well.
The webserver NetManInit function is already triggered on network.mgr.init and checks m_connected_wifi.
Regards, Michael
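The ordering discussed in this exchange (bind only between manager init and polling, and only while wifi is up, re-checking on network.reconfigured) can be sketched as a tiny state machine. This is an illustration, not OVMS code: the class, its attributes and the bind callable are hypothetical; only the event names and the m_connected_wifi flag it mirrors come from the thread.

```python
class WebServerSketch:
    """Hypothetical stand-in for the webserver component, not OVMS code."""

    def __init__(self, bind):
        self._bind = bind            # callable that opens the listener, truthy on success
        self.connected_wifi = False  # mirrors the m_connected_wifi flag from the thread
        self.mgr_running = False     # True between network.mgr.init and network.mgr.stop
        self.listening = False

    def on_event(self, name):
        if name == "network.wifi.up":
            self.connected_wifi = True
        elif name == "network.wifi.down":
            self.connected_wifi = False
        elif name == "network.mgr.init":
            self.mgr_running = True
            self._try_bind()
        elif name == "network.reconfigured":
            self._try_bind()
        elif name == "network.mgr.stop":
            # The manager frees all connections, including the listener.
            self.mgr_running = False
            self.listening = False

    def _try_bind(self):
        # Bind at most once per manager lifetime, and only with wifi up.
        if self.mgr_running and self.connected_wifi and not self.listening:
            self.listening = bool(self._bind())


if __name__ == "__main__":
    server = WebServerSketch(bind=lambda: True)
    for event in ("network.wifi.up", "network.mgr.init"):
        server.on_event(event)
    print("listening:", server.listening)
```

Such a guard only controls when the bind is attempted. In the log above the bind is already attempted at the right moment and still fails, which is why the discussion moves on to LWIP.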
>
> Regards, Mark.
>
>> On 29 Jan 2018, at 6:11 PM, Michael Balzer <dexter@expeedo.de> wrote:
>>
>> mg_bind() fails with "failed to open listener". I haven't traced it further yet, assumed it's the bind() that fails.
>>
>> Mongoose (now?) uses the standard socket API, so does not need to handle LWIP details, or does it?
>>
>> I can do some more debugging to get the exact point of failure.
>>
>> Regards,
>> Michael
>>
>>
>> On 29.01.2018 at 02:28, Stephen Casner wrote:
>>> Mark,
>>>
>>> Back in November we discussed this problem of the client's half of the
>>> connection being left open. Then as now, you commented that mongoose
>>> sends a close event (MG_EV_CLOSE) for each open connection when the
>>> interface went down.
>>>
>>> It does send a close event, but not until after the wifi is already
>>> shut down so closing the socket at that point does not send any packet
>>> to the client. I decided to punt on that issue, though, because
>>> manually shutting down wifi is not an important use case. The more
>>> likely case is that wifi connectivity is lost due to motion or other
>>> causes and in that case no close packet can be delivered anyway.
>>>
>>> I interpreted Michael's message to be referring to a problem with
>>> ports on the server (OVMS) end. That is, to say that mongoose didn't
>>> clean up LWIP state properly. Michael, can you explain a bit more?
>>>
>>> -- Steve
>>>
>>> On Mon, 29 Jan 2018, Mark Webb-Johnson wrote:
>>>
>>>> I suspect that this is timing related:
>>>>
>>>> OVMS > wifi mode client XXX
>>>> Starting WIFI as a client to XXX…
>>>> …
>>>> I (337100) ssh: Launching SSH Server
>>>> V (338830) ssh: Event MG_EV_ACCEPT conn 0x3ffee8b4, data 0x3ffee8cc
>>>> V (338840) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 21
>>>> V (338840) ssh: Event MG_EV_SEND conn 0x3ffee8b4, data 0x3ffe5d90
>>>> V (338840) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 1024
>>>> V (338850) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 408
>>>> …
>>>> OVMS > wifi mode off
>>>> Stopping wifi station...
>>>> I (350580) ovms-mdns: Stopping MDNS service
>>>> I (350580) wifi: state: run -> init (0)
>>>> I (350590) wifi: pm stop, total sleep time: 0/11930449
>>>> I (350590) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1
>>>> I (350600) wifi: flush txq
>>>> I (350600) wifi: stop sw txq
>>>> I (350600) wifi: lmac stop hw txq
>>>> I (350600) wifi: Deinit lldesc rx mblock:4
>>>> I (351340) webserver: Stopping Web Server
>>>> I (351340) ssh: Stopping SSH Server
>>>> V (351540) ssh: Event MG_EV_CLOSE conn 0x3ffee8b4, data 0x0
>>>> V (351540) ssh: Event MG_EV_CLOSE conn 0x3ffeb468, data 0x0
>>>> I (357570) event: station ip lost
>>>>
>>>> But ssh client connection is still up on my workstation. Looks like
>>>> the MG_EV_CLOSE events came in after the SSH server was stopped.
>>>>
>>>> I repeated the test, but with event logging on:
>>>>
>>>> OVMS > wifi mode off
>>>> Stopping wifi station...
>>>> I (34171) events: Signal(system.wifi.down)
>>>> I (34171) events: Signal(network.wifi.down)
>>>> I (34171) ovms-mdns: Stopping MDNS service
>>>> I (34171) events: Signal(network.reconfigured)
>>>> I (34171) events: Signal(network.down)
>>>> I (34171) wifi: state: run -> init (0)
>>>> I (34181) wifi: pm stop, total sleep time: 0/9603569
>>>> I (34181) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1
>>>> I (34191) wifi: flush txq
>>>> I (34191) wifi: stop sw txq
>>>> I (34191) wifi: lmac stop hw txq
>>>> I (34191) wifi: Deinit lldesc rx mblock:4
>>>> I (34191) events: Signal(system.wifi.sta.disconnected)
>>>> I (34191) events: Signal(system.wifi.sta.stop)
>>>> I (35131) events: Signal(network.mgr.stop)
>>>> I (35131) webserver: Stopping Web Server
>>>> I (35131) ssh: Stopping SSH Server
>>>> V (35331) ssh: Event MG_EV_CLOSE conn 0x3ffec138, data 0x0
>>>> V (35331) ssh: Event MG_EV_CLOSE conn 0x3ffeb768, data 0x0
>>>>
>>>> MDNS exits quicker because it responds to the network.wifi.down message, while ssh uses the network.mgr.stop event. In a clean shutdown there is a difference.
>>>>
>>>> Perhaps we should issue network.mgr.stop earlier? However, even if we close the connections from our end, I’m not sure if the packet would make it through to the other end before the wifi disconnects.
>>>>
>>>> Regards, Mark.
>>>>
>>>>> On 29 Jan 2018, at 2:13 AM, Stephen Casner <casner@acm.org> wrote:
>>>>>
>>>>> I handled wifi shutdowns cleanly when I first implemented telnet and
>>>>> ssh as their own tasks. Now that they are under Mongoose, it is out
>>>>> of my control. The socket is owned by the Mongoose code.
>>>>>
>>>>> -- Steve
>>>>>
>>>>> On Sun, 28 Jan 2018, Michael Balzer wrote:
>>>>>
>>>>>> I've begun working on the webserver and noticed something that may
>>>>>> be correlated to this: sockets don't get closed when losing the
>>>>>> connection. The effect is visible on both web and telnet server (ssh
>>>>>> not tested). To reproduce, switch the Wifi network with an open
>>>>>> connection, the port will not be available until timeout.
>>>>>>
>>>>>> Regards,
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> OvmsDev mailing list
>>>>>> OvmsDev@lists.teslaclub.hk
>>>>>> http://lists.teslaclub.hk/mailman/listinfo/ovmsdev
>>
>>
>> --
>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
--
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
<mongoose-esp-log.diff>
I think we need a:

git submodule update --init

Builds for me after I did that.

Regards, Mark
On 1 Feb 2018, at 5:42 AM, Michael Balzer <dexter@expeedo.de> wrote:
Mongoose is now a submodule (making us a "superproject" :)).
Note: I think you'll need to manually do a "git submodule init" after the next pull, but that may depend on your git version (?).
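Whether a given checkout still needs that step can be checked before building. A minimal sketch, assuming the usual .gitmodules layout (one "path = ..." entry per submodule) and that an uninitialized submodule shows up as a missing or empty directory; the function name and the path used in the demo are made up and are not the real OVMS layout.

```python
import os
import re
import tempfile

# Matches the "path = ..." entries of a .gitmodules file.
PATH_RE = re.compile(r"^[ \t]*path[ \t]*=[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def uninitialized_submodules(repo_root):
    """List submodule paths whose working directory is missing or empty."""
    gitmodules = os.path.join(repo_root, ".gitmodules")
    if not os.path.isfile(gitmodules):
        return []
    with open(gitmodules, encoding="utf-8") as handle:
        paths = PATH_RE.findall(handle.read())
    missing = []
    for rel in paths:
        full = os.path.join(repo_root, rel)
        if not os.path.isdir(full) or not os.listdir(full):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Demo on a throwaway directory; the submodule path is hypothetical.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, ".gitmodules"), "w", encoding="utf-8") as fh:
            fh.write('[submodule "mongoose"]\n'
                     "\tpath = components/example/mongoose\n"
                     "\turl = <fork url>\n")
        os.makedirs(os.path.join(root, "components/example/mongoose"))
        print(uninitialized_submodules(root))
```

A non-empty list means the submodule has not been checked out yet and "git submodule update --init" (which covers the separate init step as well) is still pending.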
I've pushed my changes regarding the ESP logging and the SO_REUSEADDR patch to our mongoose fork.
I've also added a config option for the mongoose debugging (default disabled). Enabling it now also includes time deltas (µs resolution) for mongoose.
Mark / Steve, if you'd like to check my changes first, I'll create a pull request tomorrow.
Regards, Michael
On 31.01.2018 at 00:57, Mark Webb-Johnson wrote:
I’ve done the fork at the github level, and set appropriate permissions. You and Steve should have access.
Can you set it up as a git subproject within ovms3? That way changes get pulled in automatically?
Regarding the changes to mongoose itself, let’s try to do these in a way that can be pushed back to cesanta upstream as valuable patches. For example, this change to support SO_REUSEADDR is an ESP32 feature and can be enabled for that platform.
Regards, Mark.
On 30 Jan 2018, at 5:14 PM, Michael Balzer <dexter@expeedo.de> wrote:
On 30.01.2018 at 09:08, Mark Webb-Johnson wrote:
My v3 tree switched to 3.1 now, so not so easy ;-( I really need to work out a way to do automated builds (switching sdkconfig appropriately).
I think you may be correct - it is probably not the listen socket that has the issue, but the actual connection socket.
Can you try enabling the option SO_REUSEADDR in menuconfig? That should allow the listen socket to be opened, even if a connection socket is in TIME_WAIT state (assuming the mongoose library sets SO_REUSEADDR correctly).
It doesn't out of the box, as it still assumes LWIP does not support it, but after disabling the exclusion in mg_open_listening_socket() this works.
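The effect being discussed can be reproduced on a workstation with the plain BSD socket API. The sketch below is a host-side illustration only, not the mongoose or LWIP code: it serves one connection, closes the server side first so that the server's end of the connection lingers in TIME_WAIT, and then tries to reopen the listener on the same port, once with and once without SO_REUSEADDR set before bind(). Function names are made up, and the exact outcome depends on the TCP stack.

```python
import errno
import socket


def open_listener(port, reuse):
    """Open a TCP listener on localhost, optionally setting SO_REUSEADDR before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("127.0.0.1", port))
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


def rebind_after_server_close(reuse):
    """Serve one connection, close the server side first, then reopen the listener.

    Returns None if the second bind succeeds, else the errno name of the failure.
    """
    listener = open_listener(0, reuse)      # port 0: let the OS pick a free port
    port = listener.getsockname()[1]
    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = listener.accept()
    conn.close()                            # server closes first ...
    client.recv(1)                          # ... client sees end of stream ...
    client.close()                          # ... server side is left in TIME_WAIT
    listener.close()
    try:
        open_listener(port, reuse).close()
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


if __name__ == "__main__":
    for reuse in (False, True):
        print("SO_REUSEADDR", reuse, "->", rebind_after_server_close(reuse) or "bind ok")
```

On a Linux host the run without the option is expected to fail with EADDRINUSE and the run with it to succeed; other stacks may behave differently, so this only illustrates why enabling the option in menuconfig and letting mongoose actually set it go together.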
So, another patch for mongoose… maybe we really should fork mongoose. If you don't have an objection, I can do that later on.
Regards, Michael
Regards, Mark
On 30 Jan 2018, at 3:54 PM, Michael Balzer <dexter@expeedo.de> wrote:
My esp-idf is up to date.
Have you done the GET from a browser (keepalive)? Another way to get the error is doing the wifi off/on after opening a telnet session.
I've compared my LWIP config, only difference was I had "Support per-interface loopback" (CONFIG_LWIP_NETIF_LOOPBACK) enabled, which is our default btw. Disabling it didn't change anything though.
Attaching my sdkconfig.
Regards, Michael
On 30.01.2018 at 01:31, Mark Webb-Johnson wrote:
Strange. I don’t have the debugging enabled, but this is what I get:
OVMS > wifi mode client
Starting WIFI as a client for any defined SSID
I (108251) wifi: wifi firmware version: 403db1d
I (108251) wifi: config NVS flash: enabled
…
I (114861) wifi: connected with STUBBY, channel 11
I (116331) event: sta ip: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62
I (116331) ovms-mdns: Launching MDNS service
I (116331) esp32wifi: WiFi UP with SSID: STUBBY, MAC: 30:ae:a4:80:8d:b0, IP: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62
I (116331) webserver: Launching Web Server
I (116351) ssh: Launching SSH Server
I (117841) wifi: pm start, type:0
I (118881) webserver: HTTP GET /
I (120201) webserver: HTTP GET /
I (120991) webserver: HTTP GET /

OVMS > wifi mode off
Stopping wifi station...
I (123821) wifi: state: run -> init (0)
I (123821) wifi: pm stop, total sleep time: 0/5979789
I (123821) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1
I (123841) wifi: flush txq
I (123841) wifi: stop sw txq
I (123841) wifi: lmac stop hw txq
I (123841) wifi: Deinit lldesc rx mblock:4
I (124121) webserver: Stopping Web Server
I (124121) ssh: Stopping SSH Server

OVMS > wifi mode client
Starting WIFI as a client for any defined SSID
I (126991) wifi: wifi firmware version: 403db1d
...
I (132851) esp32wifi: Found SSID STUBBY - trying to connect
I (134181) wifi: n:11 0, o:1 0, ap:255 255, sta:11 0, prof:1
I (134831) wifi: state: init -> auth (b0)
I (134831) wifi: state: auth -> assoc (0)
I (134841) wifi: state: assoc -> run (10)
I (134861) wifi: connected with STUBBY, channel 11
I (135571) event: sta ip: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62
I (135571) ovms-mdns: Launching MDNS service
I (135571) esp32wifi: WiFi UP with SSID: STUBBY, MAC: 30:ae:a4:80:8d:b0, IP: 10.8.8.80, mask: 255.255.255.0, gw: 10.8.8.62
I (135571) webserver: Launching Web Server
I (135591) ssh: Launching SSH Server
I (137071) webserver: HTTP GET /
I (137841) wifi: pm start, type:0
I (140001) webserver: HTTP GET /
Tried it 10 times to be certain.
Are you on the master branch of https://github.com/openvehicles/esp-idf.git and up to date?
There are a bunch of menuconfig options in Components/LWIP. This is what I have:
[ ] Enable copy between Layer2 and Layer3 packets
(10) Max number of open sockets
[ ] Enable SO_REUSEADDR option
[ ] Enable SO_RCVBUF option
(1) Maximum number of NTP servers
[ ] Enable fragment outgoing IP packets
[ ] Enable reassembly incoming fragmented IP packets
[ ] Enable LWIP statistics
[*] Enable LWIP ARP trust
(32) TCPIP task receive mail box size
[ ] DHCP: Perform ARP check on any offered address
    DHCP server --->
[ ] Enable IPV4 Link-Local Addressing (AUTOIP) ----
[ ] Support per-interface loopback ----
    TCP --->
    UDP --->
(2560) TCP/IP Task Stack Size
[*] Enable PPP support (new/experimental) --->
    ICMP --->
    LWIP RAW API --->
The SO_REUSEADDR option is relevant, but not enabled in my config. It does seem like mongoose is trying to use it. I wonder what LWIP default behaviour is in that case.
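Comparing two configurations is easier on the sdkconfig files than on menuconfig screens. A minimal sketch of a reader for the usual Kconfig output format ("CONFIG_X=value" for set options, "# CONFIG_X is not set" for disabled ones) follows. CONFIG_LWIP_NETIF_LOOPBACK is the name given in this thread; CONFIG_LWIP_SO_REUSE and CONFIG_LWIP_MAX_SOCKETS are assumed names for the "Enable SO_REUSEADDR option" and "Max number of open sockets" entries and should be checked against the actual sdkconfig. The sample text is illustrative, modelled on the menuconfig values above.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_sdkconfig(text):
    """Map option name -> value string, or None for '# ... is not set'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = SET_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2).strip('"')
            continue
        match = UNSET_RE.match(line)
        if match:
            options[match.group(1)] = None
    return options


def is_enabled(options, name):
    """True only if the boolean option is present and set to 'y'."""
    return options.get(name) == "y"


# Illustrative sample; option names other than CONFIG_LWIP_NETIF_LOOPBACK
# are assumptions (see the text above).
SAMPLE = """\
CONFIG_LWIP_MAX_SOCKETS=10
# CONFIG_LWIP_SO_REUSE is not set
# CONFIG_LWIP_NETIF_LOOPBACK is not set
"""

if __name__ == "__main__":
    opts = parse_sdkconfig(SAMPLE)
    for name in ("CONFIG_LWIP_SO_REUSE", "CONFIG_LWIP_NETIF_LOOPBACK"):
        print(name, "enabled" if is_enabled(opts, name) else "disabled")
```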
Regards, Mark.
> On 30 Jan 2018, at 5:51 AM, Michael Balzer <dexter@expeedo.de <mailto:dexter@expeedo.de>> wrote: > > I've added an option to send the mongoose debug log output to the ESP log framework and add a debug log to the if_destroy function. > > There was some talk about not changing the original mongoose files / maybe doing an OVMS fork, so I won't check the patch in. Diff attached. > > If you apply the patch: you'll need log level verbose for full debug details like these: > > > I (14915) wifi: connected with devolo-f4068d73a03e, channel 11 > I (14915) events: Signal(system.wifi.sta.connected) > I (14945) event: sta ip: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 > I (14945) events: Signal(system.wifi.sta.gotip) > I (14945) events: Signal(network.wifi.up) > I (14945) ovms-mdns: Launching MDNS service > I (14955) events: Signal(network.up) > I (14955) esp32wifi: WiFi UP with SSID: devolo-f4068d73a03e, MAC: 30:ae:a4:37:25:88, IP: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 > V (14965) mongoose: mg_socket_if_init 0x3ffb7348 using select() > V (14965) mongoose: mg_mgr_init_opt ================================== > V (14965) mongoose: mg_mgr_init_opt init mgr=0x3ffb7348 > I (14965) events: Signal(network.mgr.init) > I (14965) webserver: Launching Web Server > V (14995) mongoose: mg_socket_if_sock_se 0x3ffe068c 8192 > V (14995) mongoose: mg_add_conn 0x3ffb7348 0x3ffe068c > I (14995) telnet: Launching Telnet Server > V (15025) mongoose: mg_socket_if_sock_se 0x3ffe58f0 8193 > V (15025) mongoose: mg_add_conn 0x3ffb7348 0x3ffe58f0 > V (15495) simcom: tx: 41 54 0d 0a | AT.. 
> D (15495) simcom: tx scmd ch=0 len=4 : AT|| > I (15495) simcom: State timeout, transition to 13 > I (15495) simcom: State: Enter PoweredOff state > I (17905) wifi: pm start, type:0 > > > now doing a HTTP GET /home from the browser: > > V (35485) mongoose: mg_mgr_handle_conn 0x3ffe068c fd=8192 fd_flags=1 nc_flags=1 rmbl=0 smbl=0 > V (35485) mongoose: mg_add_conn 0x3ffb7348 0x3ffe97e0 > V (35495) mongoose: mg_if_accept_new_con 0x3ffe068c 0x3ffe97e0 -1 0 > V (35495) mongoose: mg_accept_conn 0x3ffe97e0 conn from 192.168.2.103:33574 > V (35505) mongoose: mg_socket_if_sock_se 0x3ffe97e0 8194 > V (35505) mongoose: mg_call 0x3ffe97e0 proto ev=1 ev_data=0x3ffe97f8 flags=0 rmbl=0 smbl=0 > V (35505) mongoose: mg_call 0x3ffe97e0 user ev=1 ev_data=0x3ffe97f8 flags=0 rmbl=0 smbl=0 > D (35505) webserver: EventHandler: conn=0x3ffe97e0 ev=1 p=0x3ffe97f8 > V (35505) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0 > V (35505) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0 > V (35505) mongoose: mg_mgr_handle_conn 0x3ffe068c after fd=8192 nc_flags=1 rmbl=0 smbl=0 > V (35505) mongoose: mg_mgr_handle_conn 0x3ffe97e0 fd=8194 fd_flags=1 nc_flags=0 rmbl=0 smbl=0 > V (35515) mongoose: mg_handle_tcp_read 0x3ffe97e0 350 bytes (PLAIN) <- 8194 > V (35515) mongoose: mg_recv_common 0x3ffe97e0 350 0 > V (35515) mongoose: mg_call 0x3ffe97e0 proto ev=3 ev_data=0x3ffe95d0 flags=0 rmbl=350 smbl=0 > V (35515) mongoose: mg_call 0x3ffe97e0 user ev=3 ev_data=0x3ffe95d0 flags=0 rmbl=350 smbl=0 > D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=3 p=0x3ffe95d0 > V (35515) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=0 > V (35515) mongoose: mg_http_handler2 0x3ffe97e0 192.168.2.103:33574 GET /home > V (35515) mongoose: mg_call 0x3ffe97e0 user ev=102 ev_data=0x3ffe9400 flags=0 rmbl=350 smbl=0 > D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=102 p=0x3ffe9400 > V (35515) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=0 > V (35515) 
mongoose: mg_call 0x3ffe97e0 user ev=100 ev_data=0x3ffe9400 flags=0 rmbl=350 smbl=0 > D (35515) webserver: EventHandler: conn=0x3ffe97e0 ev=100 p=0x3ffe9400 > I (35515) webserver: HTTP GET /home > V (35525) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=350 smbl=740 > V (35525) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=740 > V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 after fd=8194 nc_flags=0 rmbl=0 smbl=740 > V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 fd=8194 fd_flags=2 nc_flags=0 rmbl=0 smbl=740 > V (35525) mongoose: mg_write_to_socket 0x3ffe97e0 740 bytes -> 8194 > V (35525) mongoose: mg_if_sent_cb 0x3ffe97e0 740 > V (35525) mongoose: mg_call 0x3ffe97e0 proto ev=4 ev_data=0x3ffe95e0 flags=0 rmbl=0 smbl=0 > V (35525) mongoose: mg_call 0x3ffe97e0 user ev=4 ev_data=0x3ffe95e0 flags=0 rmbl=0 smbl=0 > D (35525) webserver: EventHandler: conn=0x3ffe97e0 ev=4 p=0x3ffe95e0 > V (35525) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0 > V (35525) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0 > V (35525) mongoose: mg_mgr_handle_conn 0x3ffe97e0 after fd=8194 nc_flags=0 rmbl=0 smbl=0 > > > > OVMS > wifi mode off > Stopping wifi station... 
> I (41145) events: Signal(system.wifi.down) > I (41145) events: Signal(network.wifi.down) > I (41145) events: Signal(network.reconfigured) > I (41145) events: Signal(network.down) > I (41155) wifi: state: run -> init (0) > I (41155) wifi: pm stop, total sleep time: 0/23262330 > I (41155) wifi: n:11 0, o:11 2, ap:255 255, sta:11 2, prof:1 > I (41165) wifi: flush txq > I (41165) wifi: stop sw txq > I (41165) wifi: lmac stop hw txq > I (41165) wifi: Deinit lldesc rx mblock:4 > I (41165) events: Signal(system.wifi.sta.disconnected) > I (41175) events: Signal(system.wifi.sta.stop) > I (41525) events: Signal(network.mgr.stop) > I (41525) webserver: Stopping Web Server > I (41525) telnet: Stopping Telnet Server > V (41535) mongoose: mg_mgr_free 0x3ffb7348 > V (41535) mongoose: mg_close_conn 0x3ffe97e0 0 8194 > V (41535) mongoose: mg_socket_if_destroy nc=0x3ffe97e0 sock=8194 flags=0 > V (41545) mongoose: mg_call 0x3ffe97e0 proto ev=5 ev_data=0x0 flags=0 rmbl=0 smbl=0 > V (41545) mongoose: mg_call 0x3ffe97e0 user ev=5 ev_data=0x0 flags=0 rmbl=0 smbl=0 > D (41545) webserver: EventHandler: conn=0x3ffe97e0 ev=5 p=0x0 > V (41545) mongoose: mg_call 0x3ffe97e0 after user flags=0 rmbl=0 smbl=0 > V (41545) mongoose: mg_call 0x3ffe97e0 after proto flags=0 rmbl=0 smbl=0 > V (41545) mongoose: mg_close_conn 0x3ffe58f0 1 8193 > V (41545) mongoose: mg_socket_if_destroy nc=0x3ffe58f0 sock=8193 flags=1 > V (41555) mongoose: mg_call 0x3ffe58f0 user ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0 > V (41555) mongoose: mg_call 0x3ffe58f0 after user flags=1 rmbl=0 smbl=0 > V (41555) mongoose: mg_close_conn 0x3ffe068c 1 8192 > V (41555) mongoose: mg_socket_if_destroy nc=0x3ffe068c sock=8192 flags=1 > V (41565) mongoose: mg_call 0x3ffe068c proto ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0 > V (41565) mongoose: mg_call 0x3ffe068c user ev=5 ev_data=0x0 flags=1 rmbl=0 smbl=0 > D (41565) webserver: EventHandler: conn=0x3ffe068c ev=5 p=0x0 > V (41565) mongoose: mg_call 0x3ffe068c after user flags=1 rmbl=0 smbl=0 
> V (41565) mongoose: mg_call 0x3ffe068c after proto flags=1 rmbl=0 smbl=0 > > > … doesn't look wrong so far, socket get closed correctly. mg_socket_if_destroy calls closesocket = lwip_close_r, so everything _should_ be ok… but… > > > OVMS > wifi mode client > Starting WIFI as a client for any defined SSID > I (47225) wifi: wifi firmware version: 403db1d > I (47225) wifi: config NVS flash: enabled > I (47225) wifi: config nano formating: disabled > I (47225) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE > I (47225) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE > I (47245) wifi: Init dynamic tx buffer num: 16 > I (47245) wifi: Init data frame dynamic rx buffer num: 16 > I (47245) wifi: Init management frame dynamic rx buffer num: 16 > I (47245) wifi: wifi driver task: 3ffe251c, prio:23, stack:4096 > I (47245) wifi: Init static rx buffer num: 4 > I (47245) wifi: Init dynamic rx buffer num: 16 > I (47245) wifi: wifi power manager task: 0x3ffe5a9c prio: 21 stack: 2560 > I (47255) wifi: mode : sta (30:ae:a4:37:25:88) > I (47255) events: Signal(system.wifi.sta.start) > W (50495) wifi: incorrect scan type: 1073541416 > I (52905) events: Signal(system.wifi.scan.done) > I (52905) esp32wifi: Found SSID devolo-f4068d73a03e - trying to connect > I (54235) wifi: n:11 2, o:1 0, ap:255 255, sta:11 2, prof:1 > I (54885) wifi: state: init -> auth (b0) > I (54895) wifi: state: auth -> assoc (0) > I (54905) wifi: state: assoc -> run (10) > I (54915) wifi: connected with devolo-f4068d73a03e, channel 11 > I (54915) events: Signal(system.wifi.sta.connected) > I (54945) event: sta ip: 192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 > I (54955) events: Signal(system.wifi.sta.gotip) > I (54955) events: Signal(network.wifi.up) > I (54955) ovms-mdns: Launching MDNS service > I (54955) events: Signal(network.up) > I (54965) esp32wifi: WiFi UP with SSID: devolo-f4068d73a03e, MAC: 30:ae:a4:37:25:88, IP: 
192.168.2.101, mask: 255.255.255.0, gw: 192.168.2.1 > V (54965) mongoose: mg_socket_if_init 0x3ffb7348 using select() > V (54965) mongoose: mg_mgr_init_opt ================================== > V (54965) mongoose: mg_mgr_init_opt init mgr=0x3ffb7348 > I (54965) events: Signal(network.mgr.init) > I (54965) webserver: Launching Web Server > V (54995) mongoose: mg_bind_opt Failed to open listener: 112 > V (54995) mongoose: mg_socket_if_destroy nc=0x3ffe996c sock=-1 flags=1 > E (54995) webserver: Cannot bind to port 80: failed to open listener > I (54995) telnet: Launching Telnet Server > V (55025) mongoose: mg_socket_if_sock_se 0x3ffe98c8 8196 > V (55025) mongoose: mg_add_conn 0x3ffb7348 0x3ffe98c8 > I (57905) wifi: pm start, type:0 > > > … so it's indeed 112 = EADDRINUSE as assumed. > > As mongoose does everything right, this seems to be an LWIP issue? Mark, Steve? > > > Am 29.01.2018 um 14:17 schrieb Mark Webb-Johnson: >> The bind has to occur in the network.mgr.init handler. That is the point between mg_mgr_init and mg_mgr_poll. >> >> Perhaps in the handler we can check m_connected_wifi and only bind if wifi is up? But would also need to pickup network.reconfigured and check m_connected_wifi there as well. > > The webserver NetManInit function is already triggered on network.mgr.init and checks m_connected_wifi. > > Regards, > Michael > > >> >> Regards, Mark. >> >>> On 29 Jan 2018, at 6:11 PM, Michael Balzer <dexter@expeedo.de <mailto:dexter@expeedo.de>> wrote: >>> >>> mg_bind() fails with "failed to open listener". I haven't traced it further yet, assumed it's the bind() that fails. >>> >>> Mongoose (now?) uses the standard socket API, so does not need to handle LWIP details, or does it? >>> >>> I can do some more debugging to get the exact point of failure. 
>>>
>>> Regards,
>>> Michael
>>>
>>>
>>> On 29.01.2018 at 02:28, Stephen Casner wrote:
>>>> Mark,
>>>>
>>>> Back in November we discussed this problem of the client's half of the
>>>> connection being left open. Then as now, you commented that mongoose
>>>> sends a close event (MG_EV_CLOSE) for each open connection when the
>>>> interface went down.
>>>>
>>>> It does send a close event, but not until after the wifi is already
>>>> shut down so closing the socket at that point does not send any packet
>>>> to the client. I decided to punt on that issue, though, because
>>>> manually shutting down wifi is not an important use case. The more
>>>> likely case is that wifi connectivity is lost due to motion or other
>>>> causes and in that case no close packet can be delivered anyway.
>>>>
>>>> I interpreted Michael's message to be referring to a problem with
>>>> ports on the server (OVMS) end. That is, to say that mongoose didn't
>>>> clean up LWIP state properly. Michael, can you explain a bit more?
>>>>
>>>> -- Steve
>>>>
>>>> On Mon, 29 Jan 2018, Mark Webb-Johnson wrote:
>>>>
>>>>> I suspect that this is timing related:
>>>>>
>>>>> OVMS > wifi mode client XXX
>>>>> Starting WIFI as a client to XXX…
>>>>> …
>>>>> I (337100) ssh: Launching SSH Server
>>>>> V (338830) ssh: Event MG_EV_ACCEPT conn 0x3ffee8b4, data 0x3ffee8cc
>>>>> V (338840) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 21
>>>>> V (338840) ssh: Event MG_EV_SEND conn 0x3ffee8b4, data 0x3ffe5d90
>>>>> V (338840) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 1024
>>>>> V (338850) ssh: Event MG_EV_RECV conn 0x3ffee8b4, data received 408
>>>>> …
>>>>> OVMS > wifi mode off
>>>>> Stopping wifi station...
>>>>> I (350580) ovms-mdns: Stopping MDNS service
>>>>> I (350580) wifi: state: run -> init (0)
>>>>> I (350590) wifi: pm stop, total sleep time: 0/11930449
>>>>> I (350590) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1
>>>>> I (350600) wifi: flush txq
>>>>> I (350600) wifi: stop sw txq
>>>>> I (350600) wifi: lmac stop hw txq
>>>>> I (350600) wifi: Deinit lldesc rx mblock:4
>>>>> I (351340) webserver: Stopping Web Server
>>>>> I (351340) ssh: Stopping SSH Server
>>>>> V (351540) ssh: Event MG_EV_CLOSE conn 0x3ffee8b4, data 0x0
>>>>> V (351540) ssh: Event MG_EV_CLOSE conn 0x3ffeb468, data 0x0
>>>>> I (357570) event: station ip lost
>>>>>
>>>>> But ssh client connection is still up on my workstation. Looks like
>>>>> the MG_EV_CLOSE events came in after the SSH server was stopped.
>>>>>
>>>>> I repeated the test, but with event logging on:
>>>>>
>>>>> OVMS > wifi mode off
>>>>> Stopping wifi station...
>>>>> I (34171) events: Signal(system.wifi.down)
>>>>> I (34171) events: Signal(network.wifi.down)
>>>>> I (34171) ovms-mdns: Stopping MDNS service
>>>>> I (34171) events: Signal(network.reconfigured)
>>>>> I (34171) events: Signal(network.down)
>>>>> I (34171) wifi: state: run -> init (0)
>>>>> I (34181) wifi: pm stop, total sleep time: 0/9603569
>>>>> I (34181) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1
>>>>> I (34191) wifi: flush txq
>>>>> I (34191) wifi: stop sw txq
>>>>> I (34191) wifi: lmac stop hw txq
>>>>> I (34191) wifi: Deinit lldesc rx mblock:4
>>>>> I (34191) events: Signal(system.wifi.sta.disconnected)
>>>>> I (34191) events: Signal(system.wifi.sta.stop)
>>>>> I (35131) events: Signal(network.mgr.stop)
>>>>> I (35131) webserver: Stopping Web Server
>>>>> I (35131) ssh: Stopping SSH Server
>>>>> V (35331) ssh: Event MG_EV_CLOSE conn 0x3ffec138, data 0x0
>>>>> V (35331) ssh: Event MG_EV_CLOSE conn 0x3ffeb768, data 0x0
>>>>>
>>>>> MDNS exits quicker because it responds to the network.wifi.down message, while ssh uses the network.mgr.stop event. In a clean shutdown there is a difference.
>>>>>
>>>>> Perhaps we should issue network.mgr.stop earlier? However, even if we close the connections from our end, I’m not sure if the packet would make it through to the other end before the wifi disconnects.
>>>>>
>>>>> Regards, Mark.
>>>>>
>>>>>> On 29 Jan 2018, at 2:13 AM, Stephen Casner <casner@acm.org> wrote:
>>>>>>
>>>>>> I handled wifi shutdowns cleanly when I first implemented telnet and
>>>>>> ssh as their own tasks. Now that they are under Mongoose, it is out
>>>>>> of my control. The socket is owned by the Mongoose code.
>>>>>>
>>>>>> -- Steve
>>>>>>
>>>>>> On Sun, 28 Jan 2018, Michael Balzer wrote:
>>>>>>
>>>>>>> I've begun working on the webserver and noticed something that may
>>>>>>> be correlated to this: sockets don't get closed when losing the
>>>>>>> connection. The effect is visible on both web and telnet server (ssh
>>>>>>> not tested). To reproduce, switch the Wifi network with an open
>>>>>>> connection, the port will not be available until timeout.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Michael
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> OvmsDev mailing list
>>>>>>> OvmsDev@lists.teslaclub.hk
>>>>>>> http://lists.teslaclub.hk/mailman/listinfo/ovmsdev
>>>
>>> --
>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>
> <mongoose-esp-log.diff>
--
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
<sdkconfig.txt>
I don’t think we currently have any ping / timeout support in the v2 server module. We did that in OVMS v2, and we certainly need it in the new system. That should deal with things at the TCP/IP level.

It seems our LWIP code supports TCP keepalive, and the server code definitely enables that. Perhaps we need to enable it on the client side too?

The bigger issue is the PPP level, and I’m not sure how to reliably detect issues there. I'm also not sure whether the issue you are seeing is at the TCP server connection level or in the low-level PPP transport. Do you know whether restarting the v2 server module, while leaving the simcom connection as it is, resolves it?

Regards, Mark
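As an illustration of the client-side keepalive idea above, here is a minimal sketch using BSD-style socket options. `EnableKeepalive` and `KeepaliveEnabled` are hypothetical helpers, not OVMS or mongoose functions, and the sketch assumes the platform exposes `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` (Linux does; lwIP should when built with `LWIP_TCP_KEEPALIVE`). How to reach the raw socket inside mongoose is not shown and would need checking.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not part of OVMS): turn on TCP keepalive for a
// connected client socket.
//   idle_s     - seconds of silence before the first probe is sent
//   interval_s - seconds between unanswered probes
//   count      - unanswered probes before the stack reports the peer as gone
bool EnableKeepalive(int fd, int idle_s, int interval_s, int count)
{
  int on = 1;
  if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
    return false;
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) != 0)
    return false;
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) != 0)
    return false;
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) != 0)
    return false;
  return true;
}

// Hypothetical helper: read back whether SO_KEEPALIVE is set on the socket.
bool KeepaliveEnabled(int fd)
{
  int value = 0;
  socklen_t len = sizeof(value);
  if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) != 0)
    return false;
  return value != 0;
}
```

With values such as 30 s idle, 10 s interval and 3 probes, a dead path would be reported to the application after roughly a minute of silence. Keepalive only detects the loss; the v2 client would still have to react to the resulting error by reconnecting.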
On 28 Jan 2018, at 10:32 AM, Tom Parker <tom@carrott.org> wrote:
Hi,
I've been struggling for some time with intermittent disconnections from the v2 server. I've switched from hologram to a local cellular provider's sim card, to match my v2 hardware which is very stable and the problem persists. With the car stationary, the connection seems very solid, but on the move, it usually disconnects within half an hour.
I'm running 98169d8862db2bbc4bea14d013883f26ee6afb4e (from 2018-01-19) with idf 5bf85d06d8c402fe30ecb1bf9c09d5e69b923b2f and xtensa-esp32-elf-linux64-1.22.0-80-g6c4433a-5.2.0.tar.gz.
I tried logging to the sdcard. On my desk, powered by USB, that seemed to work well, but powered from the OBD2 port it did not: only a few lines were logged to the sdcard and then the whole system seemed to crash. I haven't tried to debug that.
With the sdcard logging turned off, and logging with minicom I can get good diagnostics. It would appear that the v3 hardware thinks it is still connected and continues to send data, but the v2 server thinks the client has disconnected.
I've got ovms logs which have regular lines quoting the local time from the modem, and server logs from the dexters-web server which I think are not UTC. Anyway, 05:47 in the server log corresponds to 17:47 in the ovms v3 log.
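For lining up the two logs, the `+CCLK` lines in the serial log carry their own time zone: the trailing field (`+52` here) counts quarter-hours from GMT, so `+52` is UTC+13:00. Below is a small sketch of that conversion; `CclkToUtc` and `DaysFromCivil` are hypothetical names introduced for illustration, not OVMS functions.

```cpp
#include <cstdint>
#include <cstdio>

// Days since 1970-01-01 for a Gregorian calendar date.
static int64_t DaysFromCivil(int y, unsigned m, unsigned d)
{
  y -= m <= 2;
  const int64_t era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned>(y - era * 400);
  const unsigned doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
  return era * 146097 + static_cast<int64_t>(doe) - 719468;
}

// Hypothetical helper: convert the quoted payload of a +CCLK reply,
// "yy/MM/dd,hh:mm:ss+zz" or "...-zz", to seconds since the Unix epoch (UTC).
// zz is the offset of local time from GMT in quarter-hours.
bool CclkToUtc(const char* cclk, int64_t* utc)
{
  int yy, mo, dd, hh, mi, ss, tz;
  char sign;
  if (std::sscanf(cclk, "%2d/%2d/%2d,%2d:%2d:%2d%c%2d",
                  &yy, &mo, &dd, &hh, &mi, &ss, &sign, &tz) != 8)
    return false;
  if (sign != '+' && sign != '-')
    return false;
  const int64_t local = DaysFromCivil(2000 + yy, mo, dd) * 86400
                      + hh * 3600 + mi * 60 + ss;
  const int64_t offset = (sign == '+' ? 1 : -1) * static_cast<int64_t>(tz) * 15 * 60;
  *utc = local - offset;   // local time minus its offset gives UTC
  return true;
}
```

Applied to `18/01/26,17:47:26+52` this yields 04:47:26 UTC. If both clocks are accurate, that would put the server log timestamps (05:47) one hour ahead of UTC, which fits the observation that they do not look like UTC.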
The relevant lines of the logs are below. On the server log the ovms v3 hardware client is #72 and then #94. We can see the client sent and the server received S messages at :47:15 and :47:32. The client sent more messages that were not received by the server, more than I've transcribed here (full log attached). The server recorded a disconnect at :48:29, but the client recorded the transmission of messages in its log for minutes after that. I did a simcom status command which reports that everything is connected, and then I pressed the button to reboot the module. The server records a new connection at :54:07, and from then on the messages logged as transmitted on the client again correspond with the messages logged as received on the server.
Thoughts on how to proceed?
Another oddity not shown here but included in the full logs is that the client sent a lot of D messages shortly after reconnecting.
Server Log:
2018-01-26 05:47:15,36664,'#72 C rx msg S 90,K,0,0,stopped,standard,76,0,0,0,0,0,13,21,0,0,0,0,63.36,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,392.00,96'
2018-01-26 05:47:15,36665,'#72 C rx msg D 128,0,5,0,0,33,0,0,0,79663,32,1,1,1,12.9231,0,0,128,0,0'
2018-01-26 05:47:15,36666,'#72 C rx msg L 0,0,0,0,0,0,0,0,0,0,0'
2018-01-26 05:47:15,36667,'#72 C rx msg F 3.0.0/factory/main build (idf v3.1-dev-217-g5bf85d0) Jan 20 2018 00:49:15,,6,1,NL,2degrees'
2018-01-26 05:47:32,36717,'#72 C rx msg S 90,K,0,0,stopped,standard,76,0,0,0,0,0,13,21,0,0,0,0,63.36,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,387.00,96'
2018-01-26 05:48:29,36926,'#72 C got error: Connection reset by peer'
2018-01-26 05:50:08,37206,'#93 A got login'
2018-01-26 05:51:24,37485,'#36 A rx msg A '
2018-01-26 05:52:53,37742,'#93 A got error: Broken pipe'
2018-01-26 05:52:54,37745,'#72 A got login'
2018-01-26 05:54:07,37937,'#94 C got login'
2018-01-26 05:54:13,37957,'#94 C rx msg S 87,K,0,0,stopped,standard,73,0,0,0,0,0,13,21,0,0,0,0,0.00,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,385.50,0'
2018-01-26 05:54:13,37958,'#94 C rx msg D 128,0,5,0,0,33,0,0,104,59,28,1,0,0,13.011,0,0,128,0,0'
2018-01-26 05:54:13,37959,'#94 C rx msg L 0,0,0,0,0,0,104,0,0,0,0'
2018-01-26 05:54:13,37960,'#94 C rx msg W 0,0,0,0,0,0,0,0,0'
2018-01-26 05:54:13,37961,'#94 C rx msg F 3.0.0/factory/main build (idf v3.1-dev-217-g5bf85d0) Jan 20 2018 00:49:15,,1,1,NL,2degrees'
2018-01-26 05:54:15,37964,'#94 C rx msg D 128,0,5,0,0,33,0,0,104,60,28,1,0,0,13.011,0,0,128,0,0'
2018-01-26 05:54:15,37965,'#94 C rx msg D 128,0,5,0,0,33,0,0,104,61,28,1,0,0,13.011,0,0,128,0,0'
Serial Console Log:
OVMS > I (79663340) ovms-server-v2: Send MP-0 S90,K,0,0,stopped,standard,76,0,0,0,0,0,13,21,0,0,0,0,63.36,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,392.00,96
I (79663340) ovms-server-v2: Send MP-0 D128,0,5,0,0,33,0,0,0,79663,32,1,1,1,12.9231,0,0,128,0,0
I (79663340) ovms-server-v2: Send MP-0 L0,0,0,0,0,0,0,0,0,0,0
I (79663350) ovms-server-v2: Send MP-0 F3.0.0/factory/main build (idf v3.1-dev-217-g5bf85d0) Jan 20 2018 00:49:15,,6,1,NL,2degrees
OVMS > D (79665360) simcom: rx line ch=3 len=11 : +CSQ: 10,99
D (79665360) simcom: rx line ch=4 len=11 : +CSQ: 10,99
OVMS > D (79674380) simcom: rx line ch=3 len=11 : +CSQ: 13,99
D (79674390) simcom: rx line ch=4 len=11 : +CSQ: 13,99
OVMS > D (79678420) simcom: rx line ch=3 len=10 : +CREG: 1,1
D (79678420) simcom: rx line ch=3 len=29 : +CCLK: "18/01/26,17:47:26+52"
D (79678420) simcom: rx line ch=3 len=11 : +CSQ: 13,99
D (79678420) simcom: rx line ch=3 len=23 : +COPS: 0,0,"2degrees",2
D (79678420) simcom: rx line ch=3 len=2 : OK
OVMS > I (79681340) ovms-server-v2: Send MP-0 S90,K,0,0,stopped,standard,76,0,0,0,0,0,13,21,0,0,0,0,63.36,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,387.00,96
OVMS > D (79683370) simcom: rx line ch=3 len=10 : +CSQ: 9,99
D (79683370) simcom: rx line ch=4 len=10 : +CSQ: 9,99
OVMS > D (79692410) simcom: rx line ch=3 len=10 : +CSQ: 3,99
D (79692420) simcom: rx line ch=4 len=10 : +CSQ: 3,99
OVMS > D (79708420) simcom: rx line ch=3 len=10 : +CREG: 1,1
D (79708420) simcom: rx line ch=3 len=29 : +CCLK: "18/01/26,17:47:56+52"
D (79708420) simcom: rx line ch=3 len=10 : +CSQ: 3,99
D (79708420) simcom: rx line ch=3 len=23 : +COPS: 0,0,"2degrees",2
D (79708420) simcom: rx line ch=3 len=2 : OK
OVMS > D (79713470) simcom: rx line ch=3 len=10 : +CSQ: 7,99
D (79713470) simcom: rx line ch=4 len=10 : +CSQ: 7,99
OVMS > D (79716420) simcom: rx line ch=3 len=10 : +CSQ: 0,99
D (79716420) simcom: rx line ch=4 len=10 : +CSQ: 0,99
OVMS > I (79724340) ovms-server-v2: Send MP-0 S90,K,0,0,stopped,standard,76,0,0,0,0,0,13,21,0,0,0,0,63.36,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,392.00,96
I (79724340) ovms-server-v2: Send MP-0 D128,0,5,0,0,33,0,0,55,79724,30,1,1,1,13.022,0,0,128,0,0
I (79724350) ovms-server-v2: Send MP-0 L0,0,0,0,0,0,55,0,0,0,0
I (79724350) ovms-server-v2: Send MP-0 F3.0.0/factory/main build (idf v3.1-dev-217-g5bf85d0) Jan 20 2018 00:49:15,,0,1,NL,2degrees
OVMS > D (79738360) simcom: rx line ch=3 len=10 : +CREG: 1,1
D (79738360) simcom: rx line ch=3 len=29 : +CCLK: "18/01/26,17:48:26+52"
D (79738360) simcom: rx line ch=3 len=10 : +CSQ: 0,99
D (79738360) simcom: rx line ch=3 len=23 : +COPS: 0,0,"2degrees",2
D (79738360) simcom: rx line ch=3 len=2 : OK
OVMS > D (79740490) simcom: rx line ch=3 len=10 : +CSQ: 4,99
D (79740500) simcom: rx line ch=4 len=10 : +CSQ: 4,99
OVMS > I (79741340) ovms-server-v2: Send MP-0 S90,K,0,0,stopped,standard,76,0,0,0,0,0,13,21,0,0,0,0,63.36,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,388.50,96
OVMS > D (79764490) simcom: rx line ch=3 len=10 : +CSQ: 7,99
D (79764490) simcom: rx line ch=4 len=10 : +CSQ: 7,99
OVMS > D (79768360) simcom: rx line ch=3 len=10 : +CREG: 1,1
D (79768360) simcom: rx line ch=3 len=29 : +CCLK: "18/01/26,17:48:56+52"
D (79768360) simcom: rx line ch=3 len=10 : +CSQ: 7,99
D (79768360) simcom: rx line ch=3 len=23 : +COPS: 0,0,"2degrees",2
D (79768360) simcom: rx line ch=3 len=2 : OK
OVMS > D (79773490) simcom: rx line ch=3 len=10 : +CSQ: 3,99
D (79773500) simcom: rx line ch=4 len=10 : +CSQ: 3,99
OVMS > D (79779500) simcom: rx line ch=3 len=10 : +CSQ: 7,99
D (79779500) simcom: rx line ch=4 len=10 : +CSQ: 7,99
OVMS > I (79785340) ovms-server-v2: Send MP-0 S89,K,0,0,stopped,standard,75,0,0,0,0,0,13,21,0,0,0,0,63.36,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,378.00,96
...
OVMS > simcom status
SIMCOM
Network Registration: RegisteredHome
State: NetMode
Ticker: 25111
User Data: 0
Mux Open Channels: 4
PPP Connected on channel: #2
PPP Last Error: None
GPS: disabled
GPS time: disabled
NMEA (GPS/GLONASS) Not Connected
...
I rebooted the module and it did the normal connection dance
...
D (48359) simcom: rx line ch=3 len=29 : +CCLK: "18/01/26,17:53:57+52"
...
I (57609) ovms-server-v2: Status: Connecting...
OVMS > I (58219) ovms-server-v2: Connection successful
I (58219) ovms-server-v2: Status: Logging in...
I (58219) ovms-server-v2: Sending server login: MP-C 0 pwLSijW/3qAZ6z0LOBHbGS rnkkhqHUA6zeZqLx1q1Tow== NZLV3
OVMS > I (58889) ovms-server-v2: Got server response: MP-S 0 QTjRAp8ZQPJQV1UXnepldQ w1vWpIrRR9+eot9z3uslgw==
I (58889) ovms-server-v2: Server token is QTjRAp8ZQPJQV1UXnepldQ and digest is w1vWpIrRR9+eot9z3uslgw==
I (58899) ovms-server-v2: Status: Server auth ok. Now priming crypto.
I (58899) ovms-server-v2: Shared secret key is QTjRAp8ZQPJQV1UXnepldQpwLSijW/3qAZ6z0LOBHbGS (44 bytes)
I (58899) ovms-server-v2: Status: OVMS V2 login successful, and crypto channel established
OVMS > I (59189) ovms-server-v2: Incoming Msg: MP-0 Z4
I (59189) ovms-server-v2: One or more peers have connected
I (59339) ovms-server-v2: Send MP-0 S87,K,0,0,stopped,standard,73,0,0,0,0,0,13,21,0,0,0,0,0.00,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,385.50,0
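One detail in the serial log that may be relevant: the first field of the `+CSQ` replies falls from 13 at 17:47:26 to 0 by 17:48:26, shortly before the server logs the reset at :48:29. The first `+CSQ` field is a signal-strength index and the second (99 throughout) is a bit error rate that is "not known". A sketch of the standard index-to-dBm mapping follows; `CsqToDbm` is a hypothetical helper, not an OVMS function.

```cpp
// Hypothetical helper: map the <rssi> field of a +CSQ reply to dBm.
//   0  means -113 dBm or less
//   31 means -51 dBm or greater
//   99 means "not known or not detectable"
// Returns false for 99 and for any value outside 0..31.
bool CsqToDbm(int rssi, int* dbm)
{
  if (rssi < 0 || rssi > 31)
    return false;
  *dbm = -113 + 2 * rssi;
  return true;
}
```

On this scale 13 is about -87 dBm and 0 is -113 dBm or worse, so the modem was at the bottom of its range around the time of the disconnect. That is consistent with the link dropping while the car is moving, though it does not by itself explain why the client never noticed.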
<_-ovm-serverlogs.csv.bz2> <ovms_2018-01-26T04:46:05+00:00.log.bz2>
On 29/01/18 13:07, Mark Webb-Johnson wrote:
> The bigger issue is the PPP level, and I’m not sure how to reliably detect issues there. Also not sure if the issue you are seeing is at the TCP server connection level, or low level PPP transport. Do you know if you restart the v2 server module, but leave the simcom connection as it is, does that resolve it?
Stopping the v2 server causes a panic even when it has connected right after boot. The below stack trace is from current master. Looking at the code where it panics, it seems like the MetricCallbackList is corrupted somehow.

I commented out the MyMetrics.RegisterListener call in ovms_server_v2.cpp and got basically the same crash in OvmsEvents::DeregisterEvent so something strange is going on. I read the documentation for std::list and it looks like it's doing the right thing. Hmm... could the ec->m_caller == caller explode if the caller passed to RegisterEvent has been freed? My naive understanding of C++ would suggest no, because a pointer to invalid memory does not equal a pointer to valid memory, but I wouldn't be surprised if == does something other than pointer comparison sometimes. The caller is only accessed during deregister, so things would operate normally until you try to deregister.

I suppose it would have been quicker to recompile with gdb enabled than to stare at the code, but that's for tomorrow.

I did notice what looks like a memory leak, EventCallbackEntry objects are new'ed and added to the list in RegisterEvent, but they're only removed from the list in OvmsEvents::DeregisterEvent(). I think they should be delete'ed too?

I (55374) ovms-server-v2: Send MP-0 W0,0,0,0,0,0,0,0,0
I (55374) ovms-server-v2: Send MP-0 F3.0.0/factory/main build (idf v3.1-dev-217-g5bf85d0) Jan 30 2018 10:07:33,,5,1,NL,2degrees
OVMS > server v2 stop
Stopping OVMS Server V2 connection (oscv2)
Guru Meditation Error: Core 1 panic'ed (LoadProhibited) . Exception was unhandled.
Register dump:
PC : 0x40121a5a PS : 0x00060330 A0 : 0x8014fb4f A1 : 0x3ffd9ce0
0x40121a5a: OvmsMetrics::DeregisterListener(char const*) at /vagrant/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_metrics.cpp:327
A2 : 0x3ffb6d90 A3 : 0x3f41749c A4 : 0x3ffdfa1d A5 : 0x3ffdb89c
A6 : 0x3ffdb8ac A7 : 0x3ffdb8ac A8 : 0x00000000 A9 : 0x3ffd9cc0
A10 : 0x3ffdd244 A11 : 0x3ffdb904 A12 : 0x3ffd9cf0 A13 : 0x3ffae8d8
A14 : 0x00000000 A15 : 0x00000000 SAR : 0x00000018 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000004 LBEG : 0x400014fd LEND : 0x4000150d LCOUNT : 0xfffffff5

Backtrace: 0x40121a5a:0x3ffd9ce0 0x4014fb4c:0x3ffd9d00 0x4014fbee:0x3ffd9d40 0x4014f117:0x3ffd9d60 0x4012ad9e:0x3ffd9d80 0x4012ae91:0x3ffd9db0 0x4012ae83:0x3ffd9de0 0x4012ae83:0x3ffd9e10 0x4012aeb9:0x3ffd9e40 0x401228b8:0x3ffd9e60 0x4012dbfd:0x3ffd9e80 0x4012dc64:0x3ffd9ee0 0x4012292f:0x3ffd9f20 0x40122946:0x3ffd9f40 0x40122e89:0x3ffd9f60 0x40125811:0x3ffd9f90 0x40125a09:0x3ffd9fc0 0x40122d2d:0x3ffd9fe0 0x40122d3c:0x3ffda000 0x40125481:0x3ffda020
0x40121a5a: OvmsMetrics::DeregisterListener(char const*) at /vagrant/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_metrics.cpp:327
0x4014fb4c: OvmsServerV2::~OvmsServerV2() at /vagrant/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/ovms_server_v2/src/ovms_server_v2.cpp:1616
0x4014fbee: OvmsServerV2::~OvmsServerV2() at /vagrant/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/ovms_server_v2/src/ovms_server_v2.cpp:1625
0x4014f117: ovmsv2_stop(int, OvmsWriter*, OvmsCommand*, int, char const* const*) at /vagrant/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/ovms_server_v2/src/ovms_server_v2.cpp:1659 (discriminator 1)
0x4012ad9e: OvmsCommand::Execute(int, OvmsWriter*, int, char const* const*) at /vagrant/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_command.cpp:83
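A reading aid for the register dump above: the panic is a LoadProhibited with EXCVADDR 0x00000004, a load from address 4. One common cause of that pattern is reading a field at a small offset through a null pointer, and A8, A14 and A15 are all zero in this dump. The sketch below lists which registers could have served as the base of such an access. `BaseCandidates` is a hypothetical helper written for this note, and the 64-byte window is an arbitrary assumption.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical diagnostic helper: return the names of registers whose value
// is at most max_offset bytes below the faulting address, i.e. registers that
// could have been the base pointer of a "load base+offset" instruction.
std::vector<std::string> BaseCandidates(
    const std::vector<std::pair<std::string, uint32_t>>& regs,
    uint32_t excvaddr,
    uint32_t max_offset = 64)
{
  std::vector<std::string> out;
  for (const auto& reg : regs)
  {
    // Unsigned subtraction wraps to a huge value when reg.second > excvaddr,
    // so registers above the faulting address are rejected automatically.
    const uint32_t offset = excvaddr - reg.second;
    if (offset <= max_offset)
      out.push_back(reg.first);
  }
  return out;
}
```

This only narrows down where to look. It does not show whether the bad pointer is a list node, a callback entry, or the list itself, which would need the disassembly around the faulting PC or a gdb session.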
> Stopping the v2 server causes a panic even when it has connected right after boot. The below stack trace is from current master. Looking at the code where it panics, it seems like the MetricCallbackList is corrupted somehow.
Trying to repeat this, but can’t. Here is what I get:

OVMS > server v2 start
Launching OVMS Server V2 connection (oscv2)
I (40161) ovms-server-v2: OVMS Server V2 registered metric modifier is #1
I (40161) ovms-server-v2: Status: Starting
I (40161) ovms-server-v2: OVMS Server v2 running
I (40171) ovms-server-v2: Connection is api.openvehicles.com:6867 X/Y
I (40171) ovms-server-v2: Status: Connecting...
I (40641) ovms-server-v2: Connection successful
I (40641) ovms-server-v2: Status: Logging in...
I (40641) ovms-server-v2: Sending server login: MP-C 0 ...
I (40661) ovms-server-v2: Got server response: MP-S 0 ...
I (40661) ovms-server-v2: Server token is ...
I (40671) ovms-server-v2: Status: Server auth ok. Now priming crypto.
I (40671) ovms-server-v2: Shared secret key is ...
I (40671) ovms-server-v2: Status: OVMS V2 login successful, and crypto channel established
I (40681) ovms-server-v2: Incoming Msg: MP-0 Z0
I (41441) ovms-server-v2: Send MP-0 S50,K,0,0,done,standard,200,160,0,0,0,0,13,4,0,0,0,0,160.00,0,0,0,0,-1,0,0,0,0,0,400,0,0.00,400.00,100
I (41441) ovms-server-v2: Send MP-0 D0,0,5,22,30,25,0,1000000,0,41,22,0,0,0,0,0,0,0,22,0
I (41451) ovms-server-v2: Send MP-0 L22.280869,114.160599,10,30,1,0,0,0,0.0,0,0
I (41451) ovms-server-v2: Send MP-0 W30,33,40,34,30,33,40,34,0
I (41451) ovms-server-v2: Send MP-0 F3.0.0/factory/main build (idf v3.1-dev-217-g5bf85d06) Jan 30 2018 07:43:15,DEMODEMODEMO,0,1,DEMO,STUBBY
OVMS > server v2 stop
Stopping OVMS Server V2 connection (oscv2)
E (45601) ovms-server-v2: Status: Error: Disconnected from OVMS Server V2

Tried a dozen times or so. Is there anything special you do to cause this? Can you try the above simple case and see if you get the crash? What does your ‘module memory’ look like with v2 server running?
> I did notice what looks like a memory leak, EventCallbackEntry objects are new'ed and added to the list in RegisterEvent, but they're only removed from the list in OvmsEvents::DeregisterEvent(). I think they should be delete'ed too?
Yep, that is not good. I found the same in metrics and notifications. Fixed all three. There are still memory leaks elsewhere (module memory is showing them), but that is a start. Regards, Mark
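For readers following the leak discussion, here is a minimal sketch of the pattern being described: entries allocated with `new` on registration, and both erased from the list and `delete`d on deregistration. `CallbackEntry` and `CallbackRegistry` are simplified, hypothetical stand-ins for the OVMS event, metric and notification registries, not the actual fix that was committed.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>

// Hypothetical stand-in for an entry such as EventCallbackEntry.
// The static counter exists only so the sketch can show that nothing leaks.
struct CallbackEntry
{
  static int live;
  std::string caller;               // copied, so it cannot dangle
  std::function<void()> callback;

  CallbackEntry(std::string c, std::function<void()> cb)
    : caller(std::move(c)), callback(std::move(cb)) { ++live; }
  ~CallbackEntry() { --live; }
};
int CallbackEntry::live = 0;

class CallbackRegistry
{
  public:
    ~CallbackRegistry()
    {
      for (CallbackEntry* entry : m_entries) delete entry;
    }

    void Register(const std::string& caller, std::function<void()> cb)
    {
      m_entries.push_back(new CallbackEntry(caller, std::move(cb)));
    }

    // Remove every entry owned by caller: free the object first, then
    // erase the list node and continue from the iterator erase() returns.
    void Deregister(const std::string& caller)
    {
      for (auto it = m_entries.begin(); it != m_entries.end(); )
      {
        if ((*it)->caller == caller)
        {
          delete *it;
          it = m_entries.erase(it);
        }
        else
        {
          ++it;
        }
      }
    }

    std::size_t Size() const { return m_entries.size(); }

  private:
    std::list<CallbackEntry*> m_entries;
};
```

Two design choices here are assumptions of the sketch rather than statements about the OVMS code: the caller name is copied into a `std::string` so the comparison never reads memory owned by the registrant, and the loop uses the iterator returned by `erase()` so that removing an element does not invalidate the traversal.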
On 30 Jan 2018, at 6:53 PM, Tom Parker <tom@carrott.org> wrote:
On 29/01/18 13:07, Mark Webb-Johnson wrote:
The bigger issue is the PPP level, and I’m not sure how to reliably detect issues there. Also not sure if the issue you are seeing is at the TCP server connection level, or low level PPP transport. Do you know if you restart the v2 server module, but leave the simcom connection as it is, does that resolve it?
Stopping the v2 server causes a panic even when it has connected right after boot. The below stack trace is from current master. Looking at the code where it panics, it seems like the MetricCallbackList is corrupted somehow.
I commented out the MyMetrics.RegisterListener call in ovms_server_v2.cpp and got basically the same crash in OvmsEvents::DeregisterEvent so something strange is going on. I read the documentation for std::list and it looks like it's doing the right thing. Hmm... could the ec->m_caller == caller explode if the caller passed to RegisterEvent has been freed? My naive understanding of C++ would suggest no, because a pointer to invalid memory does not equal a pointer to valid memory, but I wouldn't be surprised if == does something other than pointer comparison sometimes. The caller is only accessed during deregister, so things would operate normally until you try to deregister.
I suppose it would have been quicker to recompile with gdb enabled than to stare at the code, but that's for tomorrow.
I did notice what looks like a memory leak, EventCallbackEntry objects are new'ed and added to the list in RegisterEvent, but they're only removed from the list in OvmsEvents::DeregisterEvent(). I think they should be delete'ed too?
I (55374) ovms-server-v2: Send MP-0 W0,0,0,0,0,0,0,0,0
I (55374) ovms-server-v2: Send MP-0 F3.0.0/factory/main build (idf v3.1-dev-217-g5bf85d0) Jan 30 2018 10:07:33,,5,1,NL,2degrees
OVMS > server v2 stop
Stopping OVMS Server V2 connection (oscv2)
Guru Meditation Error: Core 1 panic'ed (LoadProhibited) . Exception was unhandled.
Register dump:
PC : 0x40121a5a PS : 0x00060330 A0 : 0x8014fb4f A1 : 0x3ffd9ce0
0x40121a5a: OvmsMetrics::DeregisterListener(char const*) at /vagrant/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_metrics.cpp:327
A2 : 0x3ffb6d90 A3 : 0x3f41749c A4 : 0x3ffdfa1d A5 : 0x3ffdb89c
A6 : 0x3ffdb8ac A7 : 0x3ffdb8ac A8 : 0x00000000 A9 : 0x3ffd9cc0
A10 : 0x3ffdd244 A11 : 0x3ffdb904 A12 : 0x3ffd9cf0 A13 : 0x3ffae8d8
A14 : 0x00000000 A15 : 0x00000000 SAR : 0x00000018 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000004 LBEG : 0x400014fd LEND : 0x4000150d LCOUNT : 0xfffffff5

Backtrace: 0x40121a5a:0x3ffd9ce0 0x4014fb4c:0x3ffd9d00 0x4014fbee:0x3ffd9d40 0x4014f117:0x3ffd9d60 0x4012ad9e:0x3ffd9d80 0x4012ae91:0x3ffd9db0 0x4012ae83:0x3ffd9de0 0x4012ae83:0x3ffd9e10 0x4012aeb9:0x3ffd9e40 0x401228b8:0x3ffd9e60 0x4012dbfd:0x3ffd9e80 0x4012dc64:0x3ffd9ee0 0x4012292f:0x3ffd9f20 0x40122946:0x3ffd9f40 0x40122e89:0x3ffd9f60 0x40125811:0x3ffd9f90 0x40125a09:0x3ffd9fc0 0x40122d2d:0x3ffd9fe0 0x40122d3c:0x3ffda000 0x40125481:0x3ffda020
0x40121a5a: OvmsMetrics::DeregisterListener(char const*) at /vagrant/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_metrics.cpp:327
0x4014fb4c: OvmsServerV2::~OvmsServerV2() at /vagrant/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/ovms_server_v2/src/ovms_server_v2.cpp:1616
0x4014fbee: OvmsServerV2::~OvmsServerV2() at /vagrant/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/ovms_server_v2/src/ovms_server_v2.cpp:1625
0x4014f117: ovmsv2_stop(int, OvmsWriter*, OvmsCommand*, int, char const* const*) at /vagrant/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/ovms_server_v2/src/ovms_server_v2.cpp:1659 (discriminator 1)
0x4012ad9e: OvmsCommand::Execute(int, OvmsWriter*, int, char const* const*) at /vagrant/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_command.cpp:83
Crashes here, without fail (2 out of 2 tries). Latest bits from Master. I had log level set to verbose.

Greg

OVMS > server v2 status
OVMS V2 login successful, and crypto channel established
OVMS > server v2 stop
Stopping OVMS Server V2 connection (oscv2)
Guru Meditation Error: Core 1 panic'ed (LoadProhibited) . Exception was unhandled.
Register dump:
PC : 0x401222ee PS : 0x00060830 A0 : 0x8014dac3 A1 : 0x3ffdf310
0x401222ee: OvmsMetrics::DeregisterListener(char const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_metrics.cpp:299
A2 : 0x3ffe9474 A3 : 0x3f417638 A4 : 0x3ffe3fe8 A5 : 0x951ac700
A6 : 0x3ffe88ec A7 : 0x3ffe88ec A8 : 0x80122305 A9 : 0x3ffdf2f0
A10 : 0x3ffe92bc A11 : 0x3ffe92e8 A12 : 0x3ffdf330 A13 : 0x3ffae8d8
A14 : 0x00000000 A15 : 0x00000000 SAR : 0x00000008 EXCCAUSE: 0x0000001c
EXCVADDR: 0x951ac704 LBEG : 0x400014fd LEND : 0x4000150d LCOUNT : 0xfffffff5

Backtrace: 0x401222ee:0x3ffdf310 0x4014dac0:0x3ffdf340 0x4014db62:0x3ffdf380 0x4014d0df:0x3ffdf3a0 0x401273d6:0x3ffdf3c0 0x401274c9:0x3ffdf3f0 0x401274bb:0x3ffdf420 0x401274bb:0x3ffdf450 0x401274f1:0x3ffdf480 0x4011e894:0x3ffdf4a0 0x40129af5:0x3ffdf4c0 0x40129b5c:0x3ffdf520 0x4011e90b:0x3ffdf560 0x4011e922:0x3ffdf580 0x40128855:0x3ffdf5a0 0x40124ac5:0x3ffdf5d0 0x40124cbd:0x3ffdf600 0x401286fd:0x3ffdf620 0x4012870c:0x3ffdf640 0x4011ddbd:0x3ffdf660
0x401222ee: OvmsMetrics::DeregisterListener(char const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_metrics.cpp:299
0x4014dac0: OvmsServerV2::~OvmsServerV2() at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/ovms_server_v2/src/ovms_server_v2.cpp:1616
0x4014db62: OvmsServerV2::~OvmsServerV2() at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/ovms_server_v2/src/ovms_server_v2.cpp:1625
0x4014d0df: ovmsv2_stop(int, OvmsWriter*, OvmsCommand*, int, char const* const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/ovms_server_v2/src/ovms_server_v2.cpp:1659 (discriminator 1)
0x401273d6: OvmsCommand::Execute(int, OvmsWriter*, int, char const* const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_command.cpp:83
0x401274c9: OvmsCommand::Execute(int, OvmsWriter*, int, char const* const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_command.cpp:83
0x401274bb: OvmsCommand::Execute(int, OvmsWriter*, int, char const* const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_command.cpp:83
0x401274bb: OvmsCommand::Execute(int, OvmsWriter*, int, char const* const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_command.cpp:83
0x401274f1: OvmsCommandApp::Execute(int, OvmsWriter*, int, char const* const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_command.cpp:83
0x4011e894: Execute(microrl*, int, char const* const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_shell.cpp:47
0x40129af5: new_line_handler at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/microrl/./microrl.c:620
0x40129b5c: microrl_insert_char at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/microrl/./microrl.c:668
0x4011e90b: OvmsShell::ProcessChar(char) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_shell.cpp:70
0x4011e922: OvmsShell::ProcessChars(char const*, int) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_shell.cpp:77 (discriminator 2)
0x40128855: ConsoleAsync::HandleDeviceEvent(void*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./console_async.cpp:169
0x40124ac5: OvmsConsole::Poll(unsigned int, void*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_console.cpp:150
0x40124cbd: OvmsConsole::Service() at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_console.cpp:130 (discriminator 1)
0x401286fd: ConsoleAsync::Service() at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./console_async.cpp:80
0x4012870c: non-virtual thunk to ConsoleAsync::Service() at ??:?
0x4011ddbd: TaskBase::Task(void*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./task_base.cpp:156
Rebooting...

Mark Webb-Johnson wrote:
>> Stopping the v2 server causes a panic even when it has connected right after boot. The below stack trace is from current master. Looking at the code where it panics, it seems like the MetricCallbackList is corrupted somehow.
>
> Trying to repeat this, but can’t. Here is what I get:
Something weird is going on here:

> 0x401222ee: OvmsMetrics::DeregisterListener(char const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_metrics.cpp:299

My line 299 of ovms_metrics.cpp is the closing curly brace of OvmsMetrics::InitFloat, a good 20+ lines above DeregisterListener (which starts at line 319). Can you send me your ovms_metrics.cpp and sdkconfig?

Regards, Mark.
On 31 Jan 2018, at 1:20 PM, Greg D. <gregd2350@gmail.com> wrote:
Crashes here, without fail (2 out of 2 tries). Latest bits from Master. I had log level set to verbose.
Greg
OVMS > server v2 status
OVMS V2 login successful, and crypto channel established
OVMS > server v2 stop
Stopping OVMS Server V2 connection (oscv2)
Guru Meditation Error: Core 1 panic'ed (LoadProhibited) . Exception was unhandled.
Register dump:
PC : 0x401222ee PS : 0x00060830 A0 : 0x8014dac3 A1 : 0x3ffdf310
0x401222ee: OvmsMetrics::DeregisterListener(char const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_metrics.cpp:299
A2 : 0x3ffe9474 A3 : 0x3f417638 A4 : 0x3ffe3fe8 A5 : 0x951ac700
A6 : 0x3ffe88ec A7 : 0x3ffe88ec A8 : 0x80122305 A9 : 0x3ffdf2f0
A10 : 0x3ffe92bc A11 : 0x3ffe92e8 A12 : 0x3ffdf330 A13 : 0x3ffae8d8
A14 : 0x00000000 A15 : 0x00000000 SAR : 0x00000008 EXCCAUSE: 0x0000001c
EXCVADDR: 0x951ac704 LBEG : 0x400014fd LEND : 0x4000150d LCOUNT : 0xfffffff5

Backtrace: 0x401222ee:0x3ffdf310 0x4014dac0:0x3ffdf340 0x4014db62:0x3ffdf380 0x4014d0df:0x3ffdf3a0 0x401273d6:0x3ffdf3c0 0x401274c9:0x3ffdf3f0 0x401274bb:0x3ffdf420 0x401274bb:0x3ffdf450 0x401274f1:0x3ffdf480 0x4011e894:0x3ffdf4a0 0x40129af5:0x3ffdf4c0 0x40129b5c:0x3ffdf520 0x4011e90b:0x3ffdf560 0x4011e922:0x3ffdf580 0x40128855:0x3ffdf5a0 0x40124ac5:0x3ffdf5d0 0x40124cbd:0x3ffdf600 0x401286fd:0x3ffdf620 0x4012870c:0x3ffdf640 0x4011ddbd:0x3ffdf660
0x401222ee: OvmsMetrics::DeregisterListener(char const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_metrics.cpp:299
0x4014dac0: OvmsServerV2::~OvmsServerV2() at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/ovms_server_v2/src/ovms_server_v2.cpp:1616
0x4014db62: OvmsServerV2::~OvmsServerV2() at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/ovms_server_v2/src/ovms_server_v2.cpp:1625
0x4014d0df: ovmsv2_stop(int, OvmsWriter*, OvmsCommand*, int, char const* const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/ovms_server_v2/src/ovms_server_v2.cpp:1659 (discriminator 1)
0x401273d6: OvmsCommand::Execute(int, OvmsWriter*, int, char const* const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_command.cpp:83
0x401274c9: OvmsCommand::Execute(int, OvmsWriter*, int, char const* const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_command.cpp:83
0x401274bb: OvmsCommand::Execute(int, OvmsWriter*, int, char const* const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_command.cpp:83
0x401274bb: OvmsCommand::Execute(int, OvmsWriter*, int, char const* const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_command.cpp:83
0x401274f1: OvmsCommandApp::Execute(int, OvmsWriter*, int, char const* const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_command.cpp:83
0x4011e894: Execute(microrl*, int, char const* const*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_shell.cpp:47
0x40129af5: new_line_handler at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/microrl/./microrl.c:620
0x40129b5c: microrl_insert_char at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/microrl/./microrl.c:668
0x4011e90b: OvmsShell::ProcessChar(char) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_shell.cpp:70
0x4011e922: OvmsShell::ProcessChars(char const*, int) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_shell.cpp:77 (discriminator 2)
0x40128855: ConsoleAsync::HandleDeviceEvent(void*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./console_async.cpp:169
0x40124ac5: OvmsConsole::Poll(unsigned int, void*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_console.cpp:150
0x40124cbd: OvmsConsole::Service() at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_console.cpp:130 (discriminator 1)
0x401286fd: ConsoleAsync::Service() at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./console_async.cpp:80
0x4012870c: non-virtual thunk to ConsoleAsync::Service() at ??:?
0x4011ddbd: TaskBase::Task(void*) at /home/greg/greg/ovms/Open-Vehicle-Monitoring-System-3-master/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./task_base.cpp:156
Rebooting...
Mark Webb-Johnson wrote:
Stopping the v2 server causes a panic even when it has connected right after boot. The below stack trace is from current master. Looking at the code where it panics, it seems like the MetricCallbackList is corrupted somehow.
Trying to repeat this, but can’t. Here is what I get:
_______________________________________________ OvmsDev mailing list OvmsDev@lists.teslaclub.hk http://lists.teslaclub.hk/mailman/listinfo/ovmsdev
Sent direct.
Greg
Hi Mark,
Cleaned the build stuff, leaving sdkconfig as it is, and there was no change in the crash or line #. Changed sdkconfig and re-cleaned and built, and now the crash is gone.
So, which of the sdkconfig changes did it? Which would you like reverted?
Greg
Greg’s config differed from mine, and I gave the following recommendations:
Clear CONFIG_FATFS_PER_FILE_CACHE (save memory)
CONFIG_FREERTOS_USE_TRACE_FACILITY=y (needed for module memory command)
CONFIG_HEAP_POISONING_LIGHT=y (needed for module memory command)
CONFIG_HEAP_TASK_TRACKING=y (needed for module task command)
His crashes stopped when he set those options. I just tried turning off CONFIG_HEAP_POISONING_LIGHT (and CONFIG_HEAP_TASK_TRACKING) and got a crash. Backtrace looks clean, and problem is related to deregistering the metrics listener:

Guru Meditation Error: Core 1 panic'ed (LoadProhibited) . Exception was unhandled.
Register dump:
PC : 0x4012892a PS : 0x00060e30 A0 : 0x80152a97 A1 : 0x3ffdb810
0x4012892a: OvmsMetrics::DeregisterListener(char const*) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_metrics.cpp:327

OvmsMetrics::DeregisterListener (this=0x3ffb7140 <MyMetrics>, caller=0x3f418acc "ovms-server-v2") at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_metrics.cpp:327
327 if (ec->m_caller == caller)
(gdb) bt
#0 OvmsMetrics::DeregisterListener (this=0x3ffb7140 <MyMetrics>, caller=0x3f418acc "ovms-server-v2") at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_metrics.cpp:327
#1 0x40152a97 in OvmsServerV2::~OvmsServerV2 (this=0x3ffe2974, __in_chrg=<optimized out>) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/ovms_server_v2/src/ovms_server_v2.cpp:1616
#2 0x40152b39 in OvmsServerV2::~OvmsServerV2 (this=0x3ffe2974, __in_chrg=<optimized out>) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/ovms_server_v2/src/ovms_server_v2.cpp:1625
#3 0x4015206e in ovmsv2_stop (verbosity=65535, writer=<optimized out>, cmd=0x3ffc5894, argc=0, argv=0x0) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/ovms_server_v2/src/ovms_server_v2.cpp:1659
#4 0x40122af1 in OvmsCommand::Execute (this=0x3ffc5894, verbosity=65535, writer=0x3ffd8638, argc=0, argv=0x0) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_command.cpp:268
#5 0x40122be4 in OvmsCommand::Execute (this=0x3ffc57ac, verbosity=65535,
writer=0x3ffd8638, argc=1, argv=0x3ffdb9c8) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_command.cpp:313
#6 0x40122bd6 in OvmsCommand::Execute (this=0x3ffc5738, verbosity=65535, writer=0x3ffd8638, argc=2, argv=0x3ffdb9c8) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_command.cpp:309
#7 0x40122bd6 in OvmsCommand::Execute (this=0x3ffb7068 <MyCommandApp+4>, verbosity=65535, writer=0x3ffd8638, argc=3, argv=0x3ffdb9c4) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_command.cpp:309
#8 0x40122c0c in OvmsCommandApp::Execute (this=0x3ffb7064 <MyCommandApp>, verbosity=65535, writer=0x3ffd8638, argc=3, argv=0x3ffdb9c0) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_command.cpp:641
#9 0x4012be0b in Execute (rl=0x3ffd864c, argc=3, argv=0x3ffdb9c0) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_shell.cpp:47
#10 0x4012ddec in new_line_handler (pThis=0x3ffd864c) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/microrl/./microrl.c:620
#11 0x4012de53 in microrl_insert_char (pThis=0x3ffd864c, ch=10) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/microrl/./microrl.c:668
#12 0x4012be82 in OvmsShell::ProcessChar (this=0x3ffd8638, c=<optimized out>) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_shell.cpp:70
#13 0x4012be99 in OvmsShell::ProcessChars (this=0x3ffd8638, buf=0x3ffd8df9
"pAt\300\363\266\330X\312Nz\273\350\210Z6\263\317\061uu)\av\251\312\336;\215\324\025\271\032y!Ņ\220\060\325A\255\317Lz\365\204\f2O[\263\336r\300\235\214V\365\352\341\235}\037\212\366\350i1_\034遽͎\326h<\246\025\301\b\367\313\\Y\005Ź\300r6\217\213G`\030\364Ax$\n\242\255\v\211\365\071~\350\315\324[/\316\367+\227\226=\332\f\343\020\001\221\325p\375\v\375\231)\353ְ\027yu\260\037W3x\020\060/V\374\355\004Y[\273\316SU\205\265\366/\001\323?\307\353\\ \237\211\260ğЃ84E \032\274\303q3F+\353\343\346\324\327\027\"Enk\002\206", <incomplete sequence \371>..., len=1) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_shell.cpp:77 #14 0x40120f54 in ConsoleAsync::HandleDeviceEvent (this=0x3ffd8638, pEvent=<optimized out>) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./console_async.cpp:169 #15 0x40125754 in OvmsConsole::Poll (this=0x3ffd8638, ticks=4294967295, queue=0x3ffd94b0) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_console.cpp:150 #16 0x4012594c in OvmsConsole::Service (this=0x3ffd8638) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./ovms_console.cpp:130 #17 0x40120df8 in ConsoleAsync::Service (this=0x3ffd8638) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./console_async.cpp:80 #18 0x40120e07 in non-virtual thunk to ConsoleAsync::Service() () #19 0x4012cc80 in TaskBase::Task (object=0x3ffd8de4) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/./task_base.cpp:156 (gdb) p ec $1 = (MetricCallbackEntry *) 0xe3ffe32 (gdb) p *ec $2 = {_vptr$MetricCallbackEntry = 0xffffffff, m_caller = 0xffffffff '\377' <repeats 200 times>..., m_callback = {<std::_Maybe_unary_or_binary_function<void, OvmsMetric*>> = {<std::unary_function<OvmsMetric*, void>> = {<No data fields>}, <No data fields>}, <std::_Function_base> = { static _M_max_size = 8, static _M_max_align 
= 4, _M_functor = {_M_unused = {_M_object = 0xffffffff, _M_const_object = 0xffffffff, _M_function_pointer = 0xffffffff, _M_member_pointer = &virtual table offset -2, this adjustment -1}, _M_pod_data = "\377\377\377\377\377\377\377\377"}, _M_manager = 0xffffffff}, _M_invoker = 0xffffffff}}
(gdb) p *ml
$3 = {<std::__cxx11::_List_base<MetricCallbackEntry*, std::allocator<MetricCallbackEntry*> >> = { _M_impl = {<std::allocator<std::_List_node<MetricCallbackEntry*> >> = {<__gnu_cxx::new_allocator<std::_List_node<MetricCallbackEntry*> >> = {<No data fields>}, <No data fields>}, _M_node = {<std::__detail::_List_node_base> = {_M_next = 0x3ffe3fd4, _M_prev = 0x3ffe3fd4}, _M_data = 0}}}, <No data fields>}

I’m rushing around at the moment, trying to get the hardware finalised, so don’t have a lot of time to work on it. As a workaround, @Tom please turn on CONFIG_FREERTOS_USE_TRACE_FACILITY, CONFIG_HEAP_POISONING_LIGHT and CONFIG_HEAP_TASK_TRACKING, and see if that works for you. Those settings will also give you the ‘module memory’ and ‘module tasks’ commands (which developers certainly need).

Regards, Mark.
On 31 Jan 2018, at 1:38 PM, Greg D. <gregd2350@gmail.com> wrote:
Sent direct.
Greg
On 31/01/18 20:06, Mark Webb-Johnson wrote:
As a workaround, @Tom please turn on CONFIG_FREERTOS_USE_TRACE_FACILITY, CONFIG_HEAP_POISONING_LIGHT and CONFIG_HEAP_TASK_TRACKING, and see if that works for you. Those settings will also give you the ‘module memory’ and ‘module tasks’ commands (which developers certainly need).
I wasn't able to find a combination of those options which avoided the crash. I did find the bug: when you erase an element from a list or map through an iterator, that iterator is invalidated. I'm guessing this is because the node the iterator points to has been released, so incrementing the iterator afterwards reads freed memory. I've sent a pull request which fixes the server v2 stop crash in my environment, but I haven't audited all the code to see if the faulty pattern exists anywhere else. I haven't yet had a chance to reproduce the original disconnection problem and see if stopping and starting the v2 server connection helps.
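For anyone auditing other loops for the same pattern, here is a minimal sketch of erasing from a std::list while iterating over it. Entry and RemoveByCaller are hypothetical names made up for illustration; this is not the OVMS code or the contents of the pull request, only the general technique of continuing from the iterator that erase() returns.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical stand-in for a registered callback entry (not the OVMS class).
struct Entry {
    std::string caller;
};

// Remove every entry registered by `caller` and return how many were removed.
// std::list::erase invalidates the erased iterator and returns the iterator
// to the following element, so the loop continues from that return value
// instead of incrementing the erased iterator.
std::size_t RemoveByCaller(std::list<Entry*>& entries, const std::string& caller) {
    std::size_t removed = 0;
    for (auto it = entries.begin(); it != entries.end(); ) {
        if ((*it)->caller == caller) {
            delete *it;              // free the entry the list node points to
            it = entries.erase(it);  // erase the node, keep a valid iterator
            ++removed;
        } else {
            ++it;                    // only advance when nothing was erased
        }
    }
    return removed;
}
```

The faulty variant would call entries.erase(it) inside a for loop that still does ++it in its header, which touches the node that was just freed.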
On 04/02/18 15:33, Tom Parker wrote:
I haven't yet had a chance to reproduce the original disconnection problem and see if stopping and starting the v2 server connection helps.
It does not help. I've seen a stack overflow during server v2 stop, but the backtrace didn't make any sense so I think I might have run addr2line on the wrong binary. I've also seen the server v2 stop work. Below we see the server v2 sending messages to the server which are not being received. I restarted the server v2 and then it noticed that it could not connect. Then I tried restarting the simcom and that didn't restore network connectivity. Only when I reset it with the button did it reconnect. I've tried this simcom power cycle to restore connectivity several times and it's never worked once it's in this bad state. It feels like the simcom state machine has a bug where it thinks that it is connected but actually it isn't, so it never reconnects. This bug appears to persist over simcom power cycles. I've now got verbose logging turned on and I've removed modemmanager on my data logging laptop so hopefully I'll have some more information about the cause.

I (3929403) ovms-server-v2: Send MP-0 F3.0.0/factory/main build (idf v3.1-dev-217-g5bf85d0) Feb 4 2018 02:36:19,,2,1,NL,2degrees
OVMS > I (3961393) ovms-server-v2: Send MP-0 S35,K,0,0,stopped,standard,29,0,0,0,0,0,13,21,0,0,0,0,62.23,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,373.00,94
OVMS > I (3990393) ovms-server-v2: Send MP-0 S35,K,0,0,stopped,standard,29,0,0,0,0,0,13,21,0,0,0,0,62.23,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,371.00,94
I (3990393) ovms-server-v2: Send MP-0 D128,0,5,0,0,27,0,0,49,3990,22,1,1,1,13.0385,0,0,128,0,0
I (3990403) ovms-server-v2: Send MP-0 L0,0,0,0,0,0,49,0,0,0,0
OVMS > I (4051393) ovms-server-v2: Send MP-0 S34,K,0,0,stopped,standard,29,0,0,0,0,0,13,21,0,0,0,0,62.23,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,370.00,94
I (4051393) ovms-server-v2: Send MP-0 D128,0,5,0,0,27,0,0,17,4051,22,1,1,1,12.9945,0,0,128,0,0
I (4051403) ovms-server-v2: Send MP-0 L0,0,0,0,0,0,17,0,0,0,0
OVMS > simcom status
SIMCOM
Network Registration: RegisteredHome
State: NetMode
Ticker: 618
User Data: 0
Mux Open Channels: 4
PPP Connected on channel: #2
PPP Last Error: None
GPS: disabled
GPS time: disabled
NMEA (GPS/GLONASS) Not Connected
OVMS > simcom server v2 stop
Stopping OVMS Server V2 connection (oscv2)
OVMS > E (4104393) ovms-server-v2: Status: Error: Disconnected from OVMS Server V2
OVMS > server v2 start
Launching OVMS Server V2 connection (oscv2)
OVMS > I (4108553) ovms-server-v2: Status: Starting
I (4108553) ovms-server-v2: OVMS Server v2 running
I (4108553) ovms-server-v2: Connection is ovms.dexters-web.de:6867 NZLV3/Iekei2ae
I (4108553) ovms-server-v2: Status: Connecting...
OVMS > W (4119433) ovms-server-v2: Connection failed
E (4119443) ovms-server-v2: Status: Error: Disconnected from OVMS Server V2
OVMS > simocom com power off
Unrecognised command
OVMS > power simcom off
Power mode of simcom is now off
OVMS > I (4139133) simcom: State: Enter PoweringOff state
I (4139133) gsm-ppp: Shutting down (soft)...
I (4139133) ovms-server-v2: Network is reconfigured, so disconnect network connection
I (4139153) gsm-nmea: Shutdown (direct)
I (4139153) simcom: Power Cycle
OVMS > I (4149393) simcom: State timeout, transition to 1
I (4149393) simcom: State: Enter CheckPowerOff state
OVMS > I (4151153) gsm-ppp: StatusCallBack: User Interrupt
I (4151153) gsm-ppp: PPP connection has been closed
OVMS > D (4160393) simcom: tx scmd ch=0 len=4 : AT||
OVMS > D (4161393) simcom: tx scmd ch=0 len=4 : AT||
OVMS > D (4162393) simcom: tx scmd ch=0 len=4 : AT||
OVMS > D (4163393) simcom: tx scmd ch=0 len=4 : AT||
OVMS > D (4164393) simcom: tx scmd ch=0 len=4 : AT||
I (4164393) simcom: State timeout, transition to 13
I (4164393) simcom: State: Enter PoweredOff state
OVMS > simnc cmpower simcom off n
Power mode of simcom is now on
OVMS > I (4178943) simcom: State: Enter PoweringOn state
I (4178943) simcom: Power Cycle
D (4179393) simcom: tx scmd ch=0 len=4 : AT||
OVMS > D (4180393) simcom: tx scmd ch=0 len=4 : AT||
OVMS > D (4181393) simcom: tx scmd ch=0 len=4 : AT||
OVMS > D (4182393) simcom: tx scmd ch=0 len=4 : AT||
OVMS > D (4183393) simcom: tx scmd ch=0 len=4 : AT||
OVMS > D (4184393) simcom: tx scmd ch=0 len=4 : AT||
OVMS > I (4184883) simcom: State: Enter PoweredOn state
OVMS > D (4188213) simcom: rx line ch=0 len=12 : +CPIN: READY
D (4188213) simcom: rx line ch=0 len=12 : OPL UPDATING
D (4188213) simcom: rx line ch=0 len=12 : PNN UPDATING
OVMS > D (4189783) simcom: rx line ch=0 len=8 : SMS DONE
OVMS > D (4192823) simcom: rx line ch=0 len=10 : CALL READY
D (4192823) simcom: rx line ch=0 len=7 : PB DONE
OVMS > D (4194393) simcom: tx scmd ch=0 len=103 : AT+CPIN?;+CREG=1;+CTZU=1;+CTZR=1;+CLIP=1;+CMGF=1;+CNMI=1,2,0,0,0;+CSDH=1;+CMEE=2;+CSQ;+AUTOCSQ=1,1;E0||
D (4194453) simcom: rx line ch=0 len=101 : AT+CPIN?;+CREG=1;+CTZU=1;+CTZR=1;+CLIP=1;+CMGF=1;+CNMI=1,2,0,0,0;+CSDH=1;+CMEE=2;+CSQ;+AUTOCSQ=1,1;E0
D (4194533) simcom: rx line ch=0 len=12 : +CPIN: READY
D (4194533) simcom: rx line ch=0 len=10 : +CSQ: 6,99
D (4194543) simcom: rx line ch=0 len=2 : OK
OVMS > D (4196393) simcom: tx scmd ch=0 len=16 : AT+CGMR;+ICCID||
D (4196463) simcom: rx line ch=0 len=23 : +CGMR: 35316B10SIM5360E
D (4196463) simcom: rx line ch=0 len=27 : +ICCID: 8964240002011263238
D (4196463) simcom: rx line ch=0 len=2 : OK
OVMS > D (4199393) simcom: tx scmd ch=0 len=8 : AT+COPS?
OVMS > D (4200393) simcom: tx scmd ch=0 len=20 : AT+CMUXSRVPORT=3,1||
D (4200393) simcom: rx line ch=0 len=5 : ERROR
OVMS > D (4201393) simcom: tx scmd ch=0 len=20 : AT+CMUXSRVPORT=2,1||
D (4201403) simcom: rx line ch=0 len=2 : OK
OVMS > D (4202393) simcom: tx scmd ch=0 len=20 : AT+CMUXSRVPORT=1,1||
D (4202403) simcom: rx line ch=0 len=2 : OK
OVMS > D (4203393) simcom: tx scmd ch=0 len=20 : AT+CMUXSRVPORT=0,5||
D (4203423) simcom: rx line ch=0 len=2 : OK
OVMS > D (4204393) simcom: tx scmd ch=0 len=11 : AT+CMUX=0||
D (4204403) simcom: rx line ch=0 len=2 : OK
I (4204403) simcom: State: Enter MuxStart state
I (4204403) gsm-mux: Start MUX
I (4204413) gsm-mux: Channel #0 is open
I (4204423) gsm-mux: Channel #1 is open
I (4204423) gsm-mux: Channel #2 is open
I (4204433) gsm-mux: Channel #3 is open
I (4204443) gsm-mux: Channel #4 is open
OVMS > D (4234413) simcom: rx line ch=3 len=10 : +CREG: 1,1
D (4234423) simcom: rx line ch=3 len=29 : +CCLK: "18/02/05,16:46:44+52"
D (4234423) simcom: rx line ch=3 len=10 : +CSQ: 6,99
D (4234423) simcom: rx line ch=3 len=23 : +COPS: 0,0,"2degrees",2
D (4234423) simcom: rx line ch=3 len=2 : OK
OVMS > D (4264413) simcom: rx line ch=3 len=10 : +CREG: 1,1
D (4264413) simcom: rx line ch=3 len=29 : +CCLK: "18/02/05,16:47:14+52"
D (4264413) simcom: rx line ch=3 len=10 : +CSQ: 6,99
D (4264413) simcom: rx line ch=3 len=23 : +COPS: 0,0,"2degrees",2
D (4264413) simcom: rx line ch=3 len=2 : OK
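As a reading aid for the +CSQ lines in the log above, here is a small sketch that turns the first field into an approximate dBm figure. It assumes the standard 3GPP TS 27.007 meaning of +CSQ: <rssi>,<ber>, where rssi 0 to 31 maps to -113 dBm up to -51 dBm in 2 dB steps, and 99 means not known or not detectable. ParseCsq and RssiToDbm are hypothetical helper names, not functions from the OVMS simcom code.

```cpp
#include <cassert>
#include <cstdio>

// Parse a "+CSQ: <rssi>,<ber>" response line.
// Returns true only if both integer fields were read.
bool ParseCsq(const char* line, int& rssi, int& ber) {
    return std::sscanf(line, "+CSQ: %d,%d", &rssi, &ber) == 2;
}

// Convert the rssi index to dBm using the TS 27.007 mapping
// (0 means -113 dBm or less, 31 means -51 dBm or greater).
// Returns false for 99 ("not known") and any other out-of-range value.
bool RssiToDbm(int rssi, int& dbm) {
    if (rssi < 0 || rssi > 31) {
        return false;
    }
    dbm = -113 + 2 * rssi;
    return true;
}
```

Under that mapping the 6 in "+CSQ: 6,99" is about -101 dBm, and the trailing 99 means the modem is not reporting a bit error rate.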
On 06/02/18 12:42, Tom Parker wrote:
It feels like the simcom state machine has a bug where it thinks that it is connected but actually it isn't, so it never reconnects. This bug appears to persist over simcom power cycles. I've now got verbose logging turned on and I've removed modemmanager on my data logging laptop so hopefully I'll have some more information about the cause.
Attached is another log file. Here the server v2 code and the simcom noticed that it was disconnected, which I haven't seen before, but the simcom never reconnected.
See the attached file for the earlier and later messages. The AT+CREG?... loop quoted below went on for about 15 minutes without reconnecting, and doing a power simcom off followed by on did not make it reconnect. Pressing the button made it connect straight away. I believe I was stationary when power cycling and then resetting the whole module.
Why is "simcom: rx line" not always printed? If you look further back and forward in the log you'll see it decoding most of the AT command responses. Is this the simcom response parser? It looks like the AT+CREG?... are being sent but sometimes the results aren't being interpreted, so the simcom code doesn't know it is connected?
+CSQ: 1,99 suggests there is very low signal? That might explain the disconnections. My antenna is poor (a wifi rubber ducky with an RP-SMA adapter, I could switch to a proper antenna too but then my problem might go away and we wouldn't be able to collect debug information). After the reboot the signal quality was only +CSQ: 2,99 and it reconnected fine.
OVMS > V (4235393) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC V (4235393) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?. V (4235393) simcom: tx: 0a cf f9 | ... V (4235403) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?; V (4235403) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | +CSQ;+COPS?. V (4235413) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 30 2c 31 0d 0a 0d 0a | ..+CREG: 0,1.... V (4235413) simcom: rx: 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 32 2f 30 39 | +CCLK: "18/02/09 V (4235413) simcom: rx: 2c 31 38 3a 31 35 3a 33 30 2b 35 32 22 0d 0a 0d | ,18:15:30+52"... 
V (4235413) simcom: rx: 0a 2b 43 53 51 3a 20 30 2c 39 39 0d 0a 0d 0a 2b | .+CSQ: 0,99....+ V (4235413) simcom: rx: 43 4f 50 53 3a 20 30 2c 30 2c 22 32 64 65 67 72 | COPS: 0,0,"2degr V (4235413) simcom: rx: 65 65 73 22 2c 32 0d 0a 0d 0a 4f 4b 0d 0a | ees",2....OK.. OVMS > simcom statusV (4245393) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC V (4245393) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?. V (4245393) simcom: tx: 0a cf f9 | ... V (4245423) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?; V (4245423) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | +CSQ;+COPS?. V (4245463) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 30 2c 31 0d 0a 0d 0a | ..+CREG: 0,1.... V (4245463) simcom: rx: 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 32 2f 30 39 | +CCLK: "18/02/09 V (4245463) simcom: rx: 2c 31 38 3a 31 35 3a 34 30 2b 35 32 22 0d 0a 0d | ,18:15:40+52"... V (4245473) simcom: rx: 0a 2b 43 53 51 3a 20 31 2c 39 39 0d 0a 0d 0a 2b | .+CSQ: 1,99....+ V (4245473) simcom: rx: 43 4f 50 53 3a 20 30 2c 30 2c 22 32 64 65 67 72 | COPS: 0,0,"2degr V (4245473) simcom: rx: 65 65 73 22 2c 32 0d 0a 0d 0a 4f 4b 0d 0a | ees",2....OK.. OVMS > simcom status SIMCOM Network Registration: Searching State: NetWait Ticker: 1570 User Data: 0 Mux Open Channels: 4 PPP Not Connected PPP Last Error: User Interrupt GPS: disabled GPS time: disabled NMEA (GPS/GLONASS) Not Connected OVMS > server v2 status Error: Disconnected from OVMS Server V2
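On the +CSQ and +CREG values in the log above: as far as I know, 3GPP TS 27.007 defines the first +CSQ field as an RSSI index (0 to 31, with 99 meaning unknown) that maps to dBm in 2 dB steps, and the last +CREG field as the registration status. The sketch below is only an illustration of that mapping; the function names are made up for this example and are not part of the OVMS firmware, and it does not handle the extended +CREG form that carries location fields.

```python
import re

# Registration status codes as listed in 3GPP TS 27.007 for +CREG.
CREG_STATUS = {
    0: "not registered, not searching",
    1: "registered, home network",
    2: "not registered, searching",
    3: "registration denied",
    4: "unknown",
    5: "registered, roaming",
}


def csq_to_dbm(rssi):
    """Convert the first +CSQ field to dBm; 99 means 'not known' and gives None."""
    if rssi == 99:
        return None
    if not 0 <= rssi <= 31:
        raise ValueError("rssi index out of range: %r" % rssi)
    return -113 + 2 * rssi


def parse_creg(line):
    """Decode '+CREG: <stat>' (unsolicited) or '+CREG: <n>,<stat>' (query reply)."""
    match = re.fullmatch(r"\+CREG: (?:\d+,)?(\d+)", line.strip())
    if match is None:
        return None
    return CREG_STATUS.get(int(match.group(1)), "unknown")


if __name__ == "__main__":
    print(csq_to_dbm(1), "dBm")       # the +CSQ: 1,99 case from the log
    print(parse_creg("+CREG: 0,1"))   # the query reply from the log
```

Read this way, +CSQ: 1,99 is about -111 dBm, which is close to the bottom of the scale, while +CREG: 0,1 reports a registered modem even though `simcom status` still shows "Searching".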
It seems that around 4045393 in the logs the simcom driver thinks it is in mux mode, but the modem is not. The modem is just transmitting raw AT commands, not framed in gsm-mux.
The gsm-mux framing starts with a f9 and ends with a checksum then f9 again. Like this:
V (119553) gsm-ppp: tx: 7e 21 45 00 00 7a 00 34 00 00 ff 06 08 58 64 66 | ~!E..z.4.....Xdf V (119553) gsm-ppp: tx: 46 1c bc 8a 4b e5 fa 3c 1a d3 00 00 2f 84 b9 bc | F...K..<..../... V (119553) gsm-ppp: tx: 8d 1c 50 18 16 2a 31 ce 00 00 31 78 4b 44 45 79 | ..P..*1...1xKDEy V (119553) gsm-ppp: tx: 35 5a 35 6e 4b 35 66 35 2b 62 70 56 6a 52 2b 34 | 5Z5nK5f5+bpVjR+4 V (119553) gsm-ppp: tx: 59 49 44 62 44 68 6c 63 66 72 47 54 50 44 45 56 | YIDbDhlcfrGTPDEV V (119553) gsm-ppp: tx: 2b 65 42 59 78 4d 49 41 31 4c 42 77 4d 52 45 7a | +eBYxMIA1LBwMREz V (119553) gsm-ppp: tx: 79 36 44 74 76 2b 58 56 41 36 61 51 58 51 36 46 | y6Dtv+XVA6aQXQ6F V (119553) gsm-ppp: tx: 73 49 42 4e 4c 59 48 41 3d 3d 0d 0a bd aa 7e | sIBNLYHA==....~ V (119553) simcom: tx: f9 09 ff ff 7e 21 45 00 00 7a 00 34 00 00 ff 06 | ....~!E..z.4.... V (119553) simcom: tx: 08 58 64 66 46 1c bc 8a 4b e5 fa 3c 1a d3 00 00 | .XdfF...K..<.... V (119553) simcom: tx: 2f 84 b9 bc 8d 1c 50 18 16 2a 31 ce 00 00 31 78 | /.....P..*1...1x V (119553) simcom: tx: 4b 44 45 79 35 5a 35 6e 4b 35 66 35 2b 62 70 56 | KDEy5Z5nK5f5+bpV V (119553) simcom: tx: 6a 52 2b 34 59 49 44 62 44 68 6c 63 66 72 47 54 | jR+4YIDbDhlcfrGT V (119553) simcom: tx: 50 44 45 56 2b 65 42 59 78 4d 49 41 31 4c 42 77 | PDEV+eBYxMIA1LBw V (119553) simcom: tx: 4d 52 45 7a 79 36 44 74 76 2b 58 56 41 36 61 51 | MREzy6Dtv+XVA6aQ V (119553) simcom: tx: 58 51 36 46 73 49 42 4e 4c 59 48 41 3d 3d 0d 0a | XQ6FsIBNLYHA==.. V (119553) simcom: tx: bd aa 7e 9a f9 | ..~..
There was a network disconnection around 552393, recovered well. Another at 2080493, also recovered well.
But then around 2666343 we lose gsm mux framing.
V (2666333) simcom: rx: 14 f9 f9 11 ff 19 0d 0a 2b 43 52 45 47 3a 20 32 | ........+CREG: 2 V (2666333) gsm-mux: ProcessFrame(CHAN=3, ADDR=0d, CTRL=ff, FCS=14, LEN=18) V (2666333) gsm-mux: ChanProcessFrame(CHAN=3, ADDR=0d, CTRL=ff, LEN=15, IFP=3) D (2666333) simcom: rx line ch=3 len=8 : +CREG: 2 I (2666343) simcom: CREG Network Registration: Searching V (2666343) simcom: rx: 0d 0a 19 f9 | .... V (2666343) gsm-mux: ProcessFrame(CHAN=4, ADDR=11, CTRL=ff, FCS=19, LEN=18) V (2666343) gsm-mux: ChanProcessFrame(CHAN=4, ADDR=11, CTRL=ff, LEN=15, IFP=3) D (2666343) simcom: rx line ch=4 len=8 : +CREG: 2 I (2666393) simcom: Lost network connection (NetworkRegistration in NetMode) I (2666393) simcom: State: Enter NetLoss state V (2666393) simcom: tx: f9 0d ff 19 41 54 2b 43 47 41 54 54 3d 30 0d 0a | ....AT+CGATT=0.. V (2666393) simcom: tx: 14 f9 | .. I (2666393) gsm-ppp: Shutting down (hard)... I (2666393) ovms-server-v2: Network is reconfigured, so disconnect network connection E (2666393) ovms-server-v2: Status: Error: Disconnected from OVMS Server V2 I (2666403) gsm-ppp: StatusCallBack: User Interrupt I (2666403) gsm-ppp: PPP connection has been closed V (2666493) simcom: rx: f9 0d ff 0d 0d 0a 4f 4b 0d 0a 0f f9 | ......OK.... V (2666503) simcom: rx: f9 0d ff 1f 0d 0a 2b 43 53 51 3a 20 33 31 2c 39 | ......+CSQ: 31,9 V (2666503) simcom: rx: 39 0d 0a f0 f9 f9 11 ff 1f 0d 0a 2b 43 53 51 3a | 9..........+CSQ: V (2666503) simcom: rx: 20 33 31 2c 39 39 0d 0a fd f9 f9 0d ff 19 0d 0a | 31,99.......... V (2666503) simcom: rx: 2b 43 52 45 47 3a 20 31 0d 0a 14 f9 f9 11 ff 19 | +CREG: 1........ V (2666503) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 31 0d 0a 19 f9 | ..+CREG: 1.... 
V (2666513) simcom: rx: f9 09 ff 23 7e ff 7d 23 c0 21 7d 25 7d 32 7d 20 | ...#~.}#.!}%}2} V (2666513) simcom: rx: 7d 24 cc ad 7e 1f f9 f9 0d ff 2f 0d 0a 2b 50 50 | }$..~...../..+PP V (2666513) simcom: rx: 50 44 3a 20 44 49 53 43 4f 4e 4e 45 43 54 45 44 | PD: DISCONNECTED V (2666513) simcom: rx: 0d 0a d4 f9 f9 11 ff 2f 0d 0a 2b 50 50 50 44 3a | ......./..+PPPD: V (2666513) simcom: rx: 20 44 49 53 43 4f 4e 4e 45 43 54 45 44 0d 0a d9 | DISCONNECTED... V (2666513) simcom: rx: f9 | . V (2666703) mongoose: mg_close_conn 44002356 0x3ffe8230 2048 8201 V (2666703) mongoose: mg_socket_if_destroy 1754 nc=0x3ffe8230 sock=8201 flags=800 V (2666703) mongoose: mg_call 3806 0x3ffe8230 user ev=5 ev_data=0x0 flags=2048 rmbl=0 smbl=0 V (2666703) ovms-server-v2: OvmsServerV2MongooseCallback(MG_EV_CLOSE) V (2666713) mongoose: mg_call 4587 0x3ffe8230 after user flags=2048 rmbl=0 smbl=0 V (2666723) mongoose: mg_mgr_free 12130 0x3ffb71f8 OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2667003) simcom: rx: f9 0d ff 1d 0d 0a 2b 43 53 51 3a 20 30 2c 39 39 | ......+CSQ: 0,99 V (2667003) simcom: rx: 0d 0a 13 f9 f9 11 ff 1d 0d 0a 2b 43 53 51 3a 20 | ..........+CSQ: V (2667013) simcom: rx: 30 2c 39 39 0d 0a 1e f9 | 0,99.... OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2669253) simcom: rx: f9 0d ff 1f 0d 0a 2b 43 53 51 3a 20 31 35 2c 39 | ......+CSQ: 15,9 V (2669263) simcom: rx: 39 0d 0a f0 f9 f9 11 ff 1f 0d 0a 2b 43 53 51 3a | 9..........+CSQ: V (2669263) simcom: rx: 20 31 35 2c 39 39 0d 0a fd f9 f9 0d ff 19 0d 0a | 15,99.......... V (2669263) simcom: rx: 2b 43 52 45 47 3a 20 32 0d 0a 14 f9 f9 11 ff 19 | +CREG: 2........ V (2669263) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 32 0d 0a 19 f9 | ..+CREG: 2.... OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2669503) simcom: rx: f9 09 ff 25 7e ff 7d 23 c0 21 7d 25 7d 33 7d 20 | ...%~.}#.!}%}3} V (2669503) simcom: rx: 7d 24 7d 30 f7 7e fb f9 | }$}0.~.. 
OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2670013) simcom: rx: f9 0d ff 1f 0d 0a 2b 43 53 51 3a 20 31 32 2c 39 | ......+CSQ: 12,9 V (2670013) simcom: rx: 39 0d 0a f0 f9 f9 11 ff 1f 0d 0a 2b 43 53 51 3a | 9..........+CSQ: V (2670013) simcom: rx: 20 31 32 2c 39 39 0d 0a fd f9 | 12,99....
The gsm-mux seems no longer able to decode the incoming frames. It seems to recover for a while, but then the mux fails completely around 2705413:
V (2695413) simcom: rx: f9 0d ff 9f 0d 0a 2b 43 52 45 47 3a 20 31 2c 32 | ......+CREG: 1,2 V (2695423) simcom: rx: 0d 0a 0d 0a 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 | ....+CCLK: "18/0 V (2695423) simcom: rx: 32 2f 30 39 2c 31 37 3a 34 39 3a 34 39 2b 35 32 | 2/09,17:49:49+52 V (2695423) simcom: rx: 22 0d 0a 0d 0a 2b 43 53 51 3a 20 30 2c 39 39 0d | "....+CSQ: 0,99. V (2695423) simcom: rx: 0a 0d 0a 2b 43 4f 50 53 3a 20 30 0d 0a 0d 0a 4f | ...+COPS: 0....O V (2695423) simcom: rx: 4b 0d 0a 10 f9 | K.... V (2695423) gsm-mux: ProcessFrame(CHAN=3, ADDR=0d, CTRL=ff, FCS=10, LEN=85) V (2695423) gsm-mux: ChanProcessFrame(CHAN=3, ADDR=0d, CTRL=ff, LEN=82, IFP=3) D (2695423) simcom: rx line ch=3 len=10 : +CREG: 1,2 D (2695423) simcom: rx line ch=3 len=29 : +CCLK: "18/02/09,17:49:49+52" D (2695423) simcom: rx line ch=3 len=10 : +CSQ: 0,99 D (2695423) simcom: rx line ch=3 len=8 : +COPS: 0 D (2695423) simcom: rx line ch=3 len=2 : OK OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2703403) simcom: rx: 0d 0a 53 54 41 52 54 0d 0a | ..START.. OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2705393) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC V (2705393) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?. V (2705393) simcom: tx: 0a cf f9 | ... V (2705403) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?; V (2705403) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | +CSQ;+COPS?. 
V (2705413) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 30 2c 32 0d 0a 0d 0a | ..+CREG: 0,2.... V (2705413) simcom: rx: 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 32 2f 30 39 | +CCLK: "18/02/09 V (2705413) simcom: rx: 2c 31 37 3a 34 39 3a 35 39 2b 35 32 22 0d 0a 0d | ,17:49:59+52"... V (2705413) simcom: rx: 0a 2b 43 53 51 3a 20 30 2c 39 39 0d 0a 0d 0a 45 | .+CSQ: 0,99....E V (2705413) simcom: rx: 52 52 4f 52 0d 0a | RROR.. OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2706283) simcom: rx: 0d 0a 2b 43 50 49 4e 3a 20 52 45 41 44 59 0d 0a | ..+CPIN: READY.. V (2706283) simcom: rx: 0d 0a 4f 50 4c 20 55 50 44 41 54 49 4e 47 0d 0a | ..OPL UPDATING.. V (2706293) simcom: rx: 0d 0a 50 4e 4e 20 55 50 44 41 54 49 4e 47 0d 0a | ..PNN UPDATING.. OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2707863) simcom: rx: 0d 0a 53 4d 53 20 44 4f 4e 45 0d 0a | ..SMS DONE.. OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2711073) simcom: rx: 0d 0a 43 41 4c 4c 20 52 45 41 44 59 0d 0a 0d 0a | ..CALL READY.... V (2711073) simcom: rx: 50 42 20 44 4f 4e 45 0d 0a | PB DONE.. OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2715393) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC V (2715393) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?. V (2715393) simcom: tx: 0a cf f9 | ... V (2715403) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?; V (2715403) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | +CSQ;+COPS?. V (2715413) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 30 2c 31 0d 0a 0d 0a | ..+CREG: 0,1.... V (2715423) simcom: rx: 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 32 2f 30 39 | +CCLK: "18/02/09 V (2715423) simcom: rx: 2c 31 37 3a 35 30 3a 31 30 2b 35 32 22 0d 0a 0d | ,17:50:10+52"... 
V (2715423) simcom: rx: 0a 2b 43 53 51 3a 20 35 2c 39 39 0d 0a 0d 0a 2b | .+CSQ: 5,99....+ V (2715423) simcom: rx: 43 4f 50 53 3a 20 30 2c 30 2c 22 32 64 65 67 72 | COPS: 0,0,"2degr V (2715423) simcom: rx: 65 65 73 22 2c 32 0d 0a 0d 0a 4f 4b 0d 0a | ees",2....OK.. OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2725393) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC V (2725393) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?. V (2725393) simcom: tx: 0a cf f9 | ... V (2725413) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?; V (2725413) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | +CSQ;+COPS?. V (2725463) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 30 2c 31 0d 0a 0d 0a | ..+CREG: 0,1.... V (2725473) simcom: rx: 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 32 2f 30 39 | +CCLK: "18/02/09 V (2725473) simcom: rx: 2c 31 37 3a 35 30 3a 32 30 2b 35 32 22 0d 0a 0d | ,17:50:20+52"... V (2725473) simcom: rx: 0a 2b 43 53 51 3a 20 35 2c 39 39 0d 0a 0d 0a 2b | .+CSQ: 5,99....+ V (2725473) simcom: rx: 43 4f 50 53 3a 20 30 2c 30 2c 22 32 64 65 67 72 | COPS: 0,0,"2degr V (2725473) simcom: rx: 65 65 73 22 2c 32 0d 0a 0d 0a 4f 4b 0d 0a | ees",2....OK..
That looks like a reboot of the modem (around the time ERROR is received from the modem).
From that point on, we think we are in mux mode, and the modem doesn’t. Not surprised that nothing works. A modem power on/off would probably fix it, but nothing in the current software detects that situation.
I think what I’ll do is add a timeout in mux mode. Or a ping style frame (I’ll have to see what the spec allows). Then we can detect that mux has failed, and go through a reset cycle if that happens.
Regards, Mark.
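To make the framing described in this message concrete, here is a minimal sketch of a parser for one mux frame: f9, address, control, length, payload, checksum, f9. It assumes the basic framing option with a one-byte length field, and a checksum that covers only the three header bytes (reflected CRC-8, polynomial 0xE0, complemented), which is consistent with the frames in the logs above. The function names are hypothetical and this is not the OVMS gsm-mux code.

```python
FLAG = 0xF9


def fcs(header):
    """Checksum over address, control and length: reflected CRC-8, then complemented."""
    crc = 0xFF
    for byte in header:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xE0 if crc & 1 else crc >> 1
    return 0xFF - crc


def parse_frame(frame):
    """Return (channel, payload) for one flag-delimited frame, or None if it is not one."""
    if len(frame) < 6 or frame[0] != FLAG or frame[-1] != FLAG:
        return None
    address, length = frame[1], frame[3]
    if not length & 1:
        return None  # two-byte length fields are not handled in this sketch
    payload = frame[4:-2]
    if len(payload) != length >> 1:
        return None
    if fcs(frame[1:4]) != frame[-2]:
        return None
    return address >> 2, payload


if __name__ == "__main__":
    framed = bytes.fromhex("f9 0d ff 19 41 54 2b 43 47 41 54 54 3d 30 0d 0a 14 f9")
    raw = b"\r\n+CREG: 0,1\r\n\r\n"
    print(parse_frame(framed))  # a frame taken from the log at 2666393
    print(parse_frame(raw))     # unframed modem output, as seen after 2705413
```

The unframed responses the modem sends after its reboot never start with f9, so a parser like this rejects every byte of them, which is why no "simcom: rx line" entries appear for those replies.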
On 12 Feb 2018, at 6:35 PM, Tom Parker <tom@carrott.org> wrote:
On 06/02/18 12:42, Tom Parker wrote:
It feels like the simcom state machine has a bug where it thinks that it is connected but actually it isn't, so it never reconnects. This bug appears to persist over simcom power cycles. I've now got verbose logging turned on and I've removed modemmanager on my data logging laptop so hopefully I'll have some more information about the cause.
Attached is another log file. Here the server v2 code and the simcom noticed that it was disconnected which I haven't seen before, but the simcom never reconnected.
See the attached file for the earlier and later messages, the AT+CREG?... loop quoted below went on for about 15 minutes without reconnecting, and doing a power simcom off followed by on did not make it reconnect. Pressing the button made it connect straight away. I believe I was stationary when power cycling and then resetting the whole module.
Why is "simcom: rx line" not always printed? If you look further back and forward in the log you'll see it decoding most of the AT command responses. Is this the simcom response parser? It looks like the AT+CREG?... are being sent but sometimes the results aren't being interpreted, so the simcom code doesn't know it is connected?
+CSQ: 1,99 suggests there is very low signal? That might explain the disconnections. My antenna is poor (a wifi rubber ducky with an RP-SMA adapter, I could switch to a proper antenna too but then my problem might go away and we wouldn't be able to collect debug information). After the reboot the signal quality was only +CSQ: 2,99 and it reconnected fine.
OVMS > V (4235393) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC V (4235393) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?. V (4235393) simcom: tx: 0a cf f9 | ... V (4235403) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?; V (4235403) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | +CSQ;+COPS?. V (4235413) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 30 2c 31 0d 0a 0d 0a | ..+CREG: 0,1.... V (4235413) simcom: rx: 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 32 2f 30 39 | +CCLK: "18/02/09 V (4235413) simcom: rx: 2c 31 38 3a 31 35 3a 33 30 2b 35 32 22 0d 0a 0d | ,18:15:30+52"... V (4235413) simcom: rx: 0a 2b 43 53 51 3a 20 30 2c 39 39 0d 0a 0d 0a 2b | .+CSQ: 0,99....+ V (4235413) simcom: rx: 43 4f 50 53 3a 20 30 2c 30 2c 22 32 64 65 67 72 | COPS: 0,0,"2degr V (4235413) simcom: rx: 65 65 73 22 2c 32 0d 0a 0d 0a 4f 4b 0d 0a | ees",2....OK.. OVMS > simcom statusV (4245393) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC V (4245393) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?. V (4245393) simcom: tx: 0a cf f9 | ... V (4245423) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?; V (4245423) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | +CSQ;+COPS?. V (4245463) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 30 2c 31 0d 0a 0d 0a | ..+CREG: 0,1.... V (4245463) simcom: rx: 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 32 2f 30 39 | +CCLK: "18/02/09 V (4245463) simcom: rx: 2c 31 38 3a 31 35 3a 34 30 2b 35 32 22 0d 0a 0d | ,18:15:40+52"... V (4245473) simcom: rx: 0a 2b 43 53 51 3a 20 31 2c 39 39 0d 0a 0d 0a 2b | .+CSQ: 1,99....+ V (4245473) simcom: rx: 43 4f 50 53 3a 20 30 2c 30 2c 22 32 64 65 67 72 | COPS: 0,0,"2degr V (4245473) simcom: rx: 65 65 73 22 2c 32 0d 0a 0d 0a 4f 4b 0d 0a | ees",2....OK.. 
OVMS > simcom status SIMCOM Network Registration: Searching State: NetWait Ticker: 1570 User Data: 0 Mux Open Channels: 4 PPP Not Connected PPP Last Error: User Interrupt GPS: disabled GPS time: disabled NMEA (GPS/GLONASS) Not Connected OVMS > server v2 status Error: Disconnected from OVMS Server V2
<ovms_2018-02-09T03_56_08+0000.log.bz2>
Tom,
I haven’t forgotten about this. Grateful for all the testing you are doing.
Gory details: http://www.qtc.jp/3GPP/GSM/SMG_24/tdocs/P-97-1031.pdf
I found that we can pick up GSM MUX framing errors, as this data coming back from the modem (when in mux mode) is not framed correctly, so it will be treated as inter-frame errors. We can count those in the mux and if they exceed a limit then shut down the mux.
There doesn’t seem to be a ping frame, but we can set a timer after the last frame was received, and if we don’t get another within a reasonable time, then shut down the mux.
Then, we add code in the higher layer to check the mux is up when it should be. If not, we’ll go through a power cycle of the simcom.
I will make these changes. Not too hard to do, but I need to do at least minimal testing. Mired in pre-production hell at the moment, but I’ll try to get it done this week.
Regards, Mark.
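As a rough illustration of the two checks proposed in this message (an inter-frame error count and a timer since the last good frame), the sketch below keeps both in one small object. The class name, method names and thresholds are all hypothetical, and resetting the error count on every good frame is an assumption of this sketch rather than something stated in the message.

```python
class MuxWatchdog:
    """Tracks mux health from two signals: framing errors and time since the last frame."""

    def __init__(self, max_errors=10, idle_timeout=30.0, now=0.0):
        self.max_errors = max_errors      # hypothetical limit on inter-frame errors
        self.idle_timeout = idle_timeout  # hypothetical seconds allowed without a frame
        self.errors = 0
        self.last_frame = now

    def on_frame(self, now):
        """Call when a correctly framed mux frame has been decoded."""
        self.errors = 0
        self.last_frame = now

    def on_framing_error(self):
        """Call for each byte or chunk received outside a valid frame."""
        self.errors += 1

    def failed(self, now):
        """True when the mux should be shut down and the modem power cycled."""
        too_many_errors = self.errors > self.max_errors
        too_quiet = now - self.last_frame > self.idle_timeout
        return too_many_errors or too_quiet


if __name__ == "__main__":
    wd = MuxWatchdog(max_errors=3, idle_timeout=30.0, now=0.0)
    wd.on_frame(now=5.0)
    print(wd.failed(now=20.0))  # still within the idle timeout
    print(wd.failed(now=36.0))  # no frame for more than 30 seconds
```

Since the status poll in the logs goes out every 10 seconds, an idle timeout of a few poll intervals would in principle be enough to notice a modem that has dropped out of mux mode; the higher layer would call `failed()` from its ticker and start the power cycle when it returns true.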
On 14 Feb 2018, at 1:57 PM, Mark Webb-Johnson <mark@webb-johnson.net> wrote:
It seems that around 4045393 in the logs the simcom driver thinks it is in mux mode, but the modem is not. The modem is just transmitting raw AT commands, not framed in gsm-mux.
The gsm-mux framing starts with a f9 and ends with a checksum then f9 again. Like this:
V (119553) gsm-ppp: tx: 7e 21 45 00 00 7a 00 34 00 00 ff 06 08 58 64 66 | ~!E..z.4.....Xdf V (119553) gsm-ppp: tx: 46 1c bc 8a 4b e5 fa 3c 1a d3 00 00 2f 84 b9 bc | F...K..<..../... V (119553) gsm-ppp: tx: 8d 1c 50 18 16 2a 31 ce 00 00 31 78 4b 44 45 79 | ..P..*1...1xKDEy V (119553) gsm-ppp: tx: 35 5a 35 6e 4b 35 66 35 2b 62 70 56 6a 52 2b 34 | 5Z5nK5f5+bpVjR+4 V (119553) gsm-ppp: tx: 59 49 44 62 44 68 6c 63 66 72 47 54 50 44 45 56 | YIDbDhlcfrGTPDEV V (119553) gsm-ppp: tx: 2b 65 42 59 78 4d 49 41 31 4c 42 77 4d 52 45 7a | +eBYxMIA1LBwMREz V (119553) gsm-ppp: tx: 79 36 44 74 76 2b 58 56 41 36 61 51 58 51 36 46 | y6Dtv+XVA6aQXQ6F V (119553) gsm-ppp: tx: 73 49 42 4e 4c 59 48 41 3d 3d 0d 0a bd aa 7e | sIBNLYHA==....~ V (119553) simcom: tx: f9 09 ff ff 7e 21 45 00 00 7a 00 34 00 00 ff 06 | ....~!E..z.4.... V (119553) simcom: tx: 08 58 64 66 46 1c bc 8a 4b e5 fa 3c 1a d3 00 00 | .XdfF...K..<.... V (119553) simcom: tx: 2f 84 b9 bc 8d 1c 50 18 16 2a 31 ce 00 00 31 78 | /.....P..*1...1x V (119553) simcom: tx: 4b 44 45 79 35 5a 35 6e 4b 35 66 35 2b 62 70 56 | KDEy5Z5nK5f5+bpV V (119553) simcom: tx: 6a 52 2b 34 59 49 44 62 44 68 6c 63 66 72 47 54 | jR+4YIDbDhlcfrGT V (119553) simcom: tx: 50 44 45 56 2b 65 42 59 78 4d 49 41 31 4c 42 77 | PDEV+eBYxMIA1LBw V (119553) simcom: tx: 4d 52 45 7a 79 36 44 74 76 2b 58 56 41 36 61 51 | MREzy6Dtv+XVA6aQ V (119553) simcom: tx: 58 51 36 46 73 49 42 4e 4c 59 48 41 3d 3d 0d 0a | XQ6FsIBNLYHA==.. V (119553) simcom: tx: bd aa 7e 9a f9 | ..~..
There was a network disconnection around 552393, recovered well. Another at 2080493, also recovered well.
But then around 2666343 we lose gsm mux framing.
V (2666333) simcom: rx: 14 f9 f9 11 ff 19 0d 0a 2b 43 52 45 47 3a 20 32 | ........+CREG: 2 V (2666333) gsm-mux: ProcessFrame(CHAN=3, ADDR=0d, CTRL=ff, FCS=14, LEN=18) V (2666333) gsm-mux: ChanProcessFrame(CHAN=3, ADDR=0d, CTRL=ff, LEN=15, IFP=3) D (2666333) simcom: rx line ch=3 len=8 : +CREG: 2 I (2666343) simcom: CREG Network Registration: Searching V (2666343) simcom: rx: 0d 0a 19 f9 | .... V (2666343) gsm-mux: ProcessFrame(CHAN=4, ADDR=11, CTRL=ff, FCS=19, LEN=18) V (2666343) gsm-mux: ChanProcessFrame(CHAN=4, ADDR=11, CTRL=ff, LEN=15, IFP=3) D (2666343) simcom: rx line ch=4 len=8 : +CREG: 2 I (2666393) simcom: Lost network connection (NetworkRegistration in NetMode) I (2666393) simcom: State: Enter NetLoss state V (2666393) simcom: tx: f9 0d ff 19 41 54 2b 43 47 41 54 54 3d 30 0d 0a | ....AT+CGATT=0.. V (2666393) simcom: tx: 14 f9 | .. I (2666393) gsm-ppp: Shutting down (hard)... I (2666393) ovms-server-v2: Network is reconfigured, so disconnect network connection E (2666393) ovms-server-v2: Status: Error: Disconnected from OVMS Server V2 I (2666403) gsm-ppp: StatusCallBack: User Interrupt I (2666403) gsm-ppp: PPP connection has been closed V (2666493) simcom: rx: f9 0d ff 0d 0d 0a 4f 4b 0d 0a 0f f9 | ......OK.... V (2666503) simcom: rx: f9 0d ff 1f 0d 0a 2b 43 53 51 3a 20 33 31 2c 39 | ......+CSQ: 31,9 V (2666503) simcom: rx: 39 0d 0a f0 f9 f9 11 ff 1f 0d 0a 2b 43 53 51 3a | 9..........+CSQ: V (2666503) simcom: rx: 20 33 31 2c 39 39 0d 0a fd f9 f9 0d ff 19 0d 0a | 31,99.......... V (2666503) simcom: rx: 2b 43 52 45 47 3a 20 31 0d 0a 14 f9 f9 11 ff 19 | +CREG: 1........ V (2666503) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 31 0d 0a 19 f9 | ..+CREG: 1.... 
V (2666513) simcom: rx: f9 09 ff 23 7e ff 7d 23 c0 21 7d 25 7d 32 7d 20 | ...#~.}#.!}%}2} V (2666513) simcom: rx: 7d 24 cc ad 7e 1f f9 f9 0d ff 2f 0d 0a 2b 50 50 | }$..~...../..+PP V (2666513) simcom: rx: 50 44 3a 20 44 49 53 43 4f 4e 4e 45 43 54 45 44 | PD: DISCONNECTED V (2666513) simcom: rx: 0d 0a d4 f9 f9 11 ff 2f 0d 0a 2b 50 50 50 44 3a | ......./..+PPPD: V (2666513) simcom: rx: 20 44 49 53 43 4f 4e 4e 45 43 54 45 44 0d 0a d9 | DISCONNECTED... V (2666513) simcom: rx: f9 | . V (2666703) mongoose: mg_close_conn 44002356 0x3ffe8230 2048 8201 V (2666703) mongoose: mg_socket_if_destroy 1754 nc=0x3ffe8230 sock=8201 flags=800 V (2666703) mongoose: mg_call 3806 0x3ffe8230 user ev=5 ev_data=0x0 flags=2048 rmbl=0 smbl=0 V (2666703) ovms-server-v2: OvmsServerV2MongooseCallback(MG_EV_CLOSE) V (2666713) mongoose: mg_call 4587 0x3ffe8230 after user flags=2048 rmbl=0 smbl=0 V (2666723) mongoose: mg_mgr_free 12130 0x3ffb71f8 OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2667003) simcom: rx: f9 0d ff 1d 0d 0a 2b 43 53 51 3a 20 30 2c 39 39 | ......+CSQ: 0,99 V (2667003) simcom: rx: 0d 0a 13 f9 f9 11 ff 1d 0d 0a 2b 43 53 51 3a 20 | ..........+CSQ: V (2667013) simcom: rx: 30 2c 39 39 0d 0a 1e f9 | 0,99.... OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2669253) simcom: rx: f9 0d ff 1f 0d 0a 2b 43 53 51 3a 20 31 35 2c 39 | ......+CSQ: 15,9 V (2669263) simcom: rx: 39 0d 0a f0 f9 f9 11 ff 1f 0d 0a 2b 43 53 51 3a | 9..........+CSQ: V (2669263) simcom: rx: 20 31 35 2c 39 39 0d 0a fd f9 f9 0d ff 19 0d 0a | 15,99.......... V (2669263) simcom: rx: 2b 43 52 45 47 3a 20 32 0d 0a 14 f9 f9 11 ff 19 | +CREG: 2........ V (2669263) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 32 0d 0a 19 f9 | ..+CREG: 2.... OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2669503) simcom: rx: f9 09 ff 25 7e ff 7d 23 c0 21 7d 25 7d 33 7d 20 | ...%~.}#.!}%}3} V (2669503) simcom: rx: 7d 24 7d 30 f7 7e fb f9 | }$}0.~.. 
OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2670013) simcom: rx: f9 0d ff 1f 0d 0a 2b 43 53 51 3a 20 31 32 2c 39 | ......+CSQ: 12,9 V (2670013) simcom: rx: 39 0d 0a f0 f9 f9 11 ff 1f 0d 0a 2b 43 53 51 3a | 9..........+CSQ: V (2670013) simcom: rx: 20 31 32 2c 39 39 0d 0a fd f9 | 12,99....
The gsm-mux seems no longer able to decode the incoming frames. It seems to recover for a while, but then the mux fails completely around 2705413:
V (2695413) simcom: rx: f9 0d ff 9f 0d 0a 2b 43 52 45 47 3a 20 31 2c 32 | ......+CREG: 1,2 V (2695423) simcom: rx: 0d 0a 0d 0a 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 | ....+CCLK: "18/0 V (2695423) simcom: rx: 32 2f 30 39 2c 31 37 3a 34 39 3a 34 39 2b 35 32 | 2/09,17:49:49+52 V (2695423) simcom: rx: 22 0d 0a 0d 0a 2b 43 53 51 3a 20 30 2c 39 39 0d | "....+CSQ: 0,99. V (2695423) simcom: rx: 0a 0d 0a 2b 43 4f 50 53 3a 20 30 0d 0a 0d 0a 4f | ...+COPS: 0....O V (2695423) simcom: rx: 4b 0d 0a 10 f9 | K.... V (2695423) gsm-mux: ProcessFrame(CHAN=3, ADDR=0d, CTRL=ff, FCS=10, LEN=85) V (2695423) gsm-mux: ChanProcessFrame(CHAN=3, ADDR=0d, CTRL=ff, LEN=82, IFP=3) D (2695423) simcom: rx line ch=3 len=10 : +CREG: 1,2 D (2695423) simcom: rx line ch=3 len=29 : +CCLK: "18/02/09,17:49:49+52" D (2695423) simcom: rx line ch=3 len=10 : +CSQ: 0,99 D (2695423) simcom: rx line ch=3 len=8 : +COPS: 0 D (2695423) simcom: rx line ch=3 len=2 : OK OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2703403) simcom: rx: 0d 0a 53 54 41 52 54 0d 0a | ..START.. OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2705393) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC V (2705393) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?. V (2705393) simcom: tx: 0a cf f9 | ... V (2705403) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?; V (2705403) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | +CSQ;+COPS?. V (2705413) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 30 2c 32 0d 0a 0d 0a | ..+CREG: 0,2.... V (2705413) simcom: rx: 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 32 2f 30 39 | +CCLK: "18/02/09 V (2705413) simcom: rx: 2c 31 37 3a 34 39 3a 35 39 2b 35 32 22 0d 0a 0d | ,17:49:59+52"... V (2705413) simcom: rx: 0a 2b 43 53 51 3a 20 30 2c 39 39 0d 0a 0d 0a 45 | .+CSQ: 0,99....E V (2705413) simcom: rx: 52 52 4f 52 0d 0a | RROR.. 
OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2706283) simcom: rx: 0d 0a 2b 43 50 49 4e 3a 20 52 45 41 44 59 0d 0a | ..+CPIN: READY.. V (2706283) simcom: rx: 0d 0a 4f 50 4c 20 55 50 44 41 54 49 4e 47 0d 0a | ..OPL UPDATING.. V (2706293) simcom: rx: 0d 0a 50 4e 4e 20 55 50 44 41 54 49 4e 47 0d 0a | ..PNN UPDATING.. OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2707863) simcom: rx: 0d 0a 53 4d 53 20 44 4f 4e 45 0d 0a | ..SMS DONE.. OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2711073) simcom: rx: 0d 0a 43 41 4c 4c 20 52 45 41 44 59 0d 0a 0d 0a | ..CALL READY.... V (2711073) simcom: rx: 50 42 20 44 4f 4e 45 0d 0a | PB DONE.. OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2715393) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC V (2715393) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?. V (2715393) simcom: tx: 0a cf f9 | ... V (2715403) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?; V (2715403) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | +CSQ;+COPS?. V (2715413) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 30 2c 31 0d 0a 0d 0a | ..+CREG: 0,1.... V (2715423) simcom: rx: 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 32 2f 30 39 | +CCLK: "18/02/09 V (2715423) simcom: rx: 2c 31 37 3a 35 30 3a 31 30 2b 35 32 22 0d 0a 0d | ,17:50:10+52"... V (2715423) simcom: rx: 0a 2b 43 53 51 3a 20 35 2c 39 39 0d 0a 0d 0a 2b | .+CSQ: 5,99....+ V (2715423) simcom: rx: 43 4f 50 53 3a 20 30 2c 30 2c 22 32 64 65 67 72 | COPS: 0,0,"2degr V (2715423) simcom: rx: 65 65 73 22 2c 32 0d 0a 0d 0a 4f 4b 0d 0a | ees",2....OK.. OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2725393) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC V (2725393) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?. V (2725393) simcom: tx: 0a cf f9 | ... 
V (2725413) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?; V (2725413) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | +CSQ;+COPS?. V (2725463) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 30 2c 31 0d 0a 0d 0a | ..+CREG: 0,1.... V (2725473) simcom: rx: 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 32 2f 30 39 | +CCLK: "18/02/09 V (2725473) simcom: rx: 2c 31 37 3a 35 30 3a 32 30 2b 35 32 22 0d 0a 0d | ,17:50:20+52"... V (2725473) simcom: rx: 0a 2b 43 53 51 3a 20 35 2c 39 39 0d 0a 0d 0a 2b | .+CSQ: 5,99....+ V (2725473) simcom: rx: 43 4f 50 53 3a 20 30 2c 30 2c 22 32 64 65 67 72 | COPS: 0,0,"2degr V (2725473) simcom: rx: 65 65 73 22 2c 32 0d 0a 0d 0a 4f 4b 0d 0a | ees",2....OK..
That looks like a reboot of the modem (around the time ERROR is received from the modem).
From that point on, we think we are in mux mode, and the modem doesn’t. Not surprised that nothing works. A modem power on/off would probably fix it, but nothing in the current software detects that situation.
I think what I’ll do is add a timeout in mux mode. Or a ping style frame (I’ll have to see what the spec allows). Then we can detect that mux has failed, and go through a reset cycle if that happens.
Regards, Mark.
On 12 Feb 2018, at 6:35 PM, Tom Parker <tom@carrott.org <mailto:tom@carrott.org>> wrote:
On 06/02/18 12:42, Tom Parker wrote:
It feels like the simcom state machine has a bug where it thinks that it is connected but actually it isn't, so it never reconnects. This bug appears to persist over simcom power cycles. I've now got verbose logging turned on and I've removed modemmanager on my data logging laptop so hopefully I'll have some more information about the cause.
Attached is another log file. Here the server v2 code and the simcom noticed that it was disconnected which I haven't seen before, but the simcom never reconnected.
See the attached file for the earlier and later messages, the AT+CREG?... loop quoted below went on for about 15 minutes without reconnecting, and doing a power simcom off followed by on did not make it reconnect. Pressing the button made it connect straight away. I believe I was stationary when power cycling and then resetting the whole module.
Why is "simcom: rx line" not always sometimes printed? If you look further back and forward in the log you'll see it decoding most of the AT command responses. Is this the simcom response parser? It looks like the AT+CREG?... are being sent but sometimes the results aren't be interpreted, so the simcom code doesn't know it is connected?
+CSQ: 1,99 suggests there is very low signal? That might explain the disconnections. My antenna is poor (a wifi rubber ducky with an RP-SMA adapter, I could switch to a proper antenna too but then my problem might go away and we wouldn't be able to collect debug information). After the reboot the signal quality was only +CSQ: 2,99 and it reconnected fine.
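On the signal question: in the standard AT command set, the first field of +CSQ is an RSSI index from 0 to 31 (99 means not known or not detectable), and the second is a bit error rate index where 99 again means not known. A minimal sketch of the usual conversion to dBm follows. The function name is made up for illustration, and this is not how the OVMS firmware parses the response.

```python
import re

def parse_csq(line):
    """Parse a '+CSQ: <rssi>,<ber>' response line.

    Returns (rssi_index, dbm, ber_index). dbm is None when the index is
    outside 0..31, which includes the 'not known' value 99.
    Index 0 means -113 dBm or weaker; index 31 means -51 dBm or stronger.
    """
    m = re.match(r"\+CSQ:\s*(\d+),(\d+)", line.strip())
    if not m:
        raise ValueError("not a +CSQ line: %r" % line)
    rssi, ber = int(m.group(1)), int(m.group(2))
    dbm = -113 + 2 * rssi if 0 <= rssi <= 31 else None
    return rssi, dbm, ber

# Values taken from the logs quoted in this thread.
for sample in ("+CSQ: 0,99", "+CSQ: 1,99", "+CSQ: 5,99", "+CSQ: 31,99"):
    print(sample, "->", parse_csq(sample))
```

By this scale, the readings of 0 to 5 seen in the logs sit at the very bottom of the range, which is consistent with the suspicion about the antenna.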
OVMS > V (4235393) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC V (4235393) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?. V (4235393) simcom: tx: 0a cf f9 | ... V (4235403) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?; V (4235403) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | +CSQ;+COPS?. V (4235413) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 30 2c 31 0d 0a 0d 0a | ..+CREG: 0,1.... V (4235413) simcom: rx: 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 32 2f 30 39 | +CCLK: "18/02/09 V (4235413) simcom: rx: 2c 31 38 3a 31 35 3a 33 30 2b 35 32 22 0d 0a 0d | ,18:15:30+52"... V (4235413) simcom: rx: 0a 2b 43 53 51 3a 20 30 2c 39 39 0d 0a 0d 0a 2b | .+CSQ: 0,99....+ V (4235413) simcom: rx: 43 4f 50 53 3a 20 30 2c 30 2c 22 32 64 65 67 72 | COPS: 0,0,"2degr V (4235413) simcom: rx: 65 65 73 22 2c 32 0d 0a 0d 0a 4f 4b 0d 0a | ees",2....OK.. OVMS > simcom statusV (4245393) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC V (4245393) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?. V (4245393) simcom: tx: 0a cf f9 | ... V (4245423) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?; V (4245423) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | +CSQ;+COPS?. V (4245463) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 30 2c 31 0d 0a 0d 0a | ..+CREG: 0,1.... V (4245463) simcom: rx: 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 32 2f 30 39 | +CCLK: "18/02/09 V (4245463) simcom: rx: 2c 31 38 3a 31 35 3a 34 30 2b 35 32 22 0d 0a 0d | ,18:15:40+52"... V (4245473) simcom: rx: 0a 2b 43 53 51 3a 20 31 2c 39 39 0d 0a 0d 0a 2b | .+CSQ: 1,99....+ V (4245473) simcom: rx: 43 4f 50 53 3a 20 30 2c 30 2c 22 32 64 65 67 72 | COPS: 0,0,"2degr V (4245473) simcom: rx: 65 65 73 22 2c 32 0d 0a 0d 0a 4f 4b 0d 0a | ees",2....OK.. 
OVMS > simcom status
SIMCOM Network Registration: Searching
State: NetWait
Ticker: 1570
User Data: 0
Mux Open Channels: 4
PPP Not Connected
PPP Last Error: User Interrupt
GPS: disabled
GPS time: disabled
NMEA (GPS/GLONASS) Not Connected
OVMS > server v2 status
Error: Disconnected from OVMS Server V2
<ovms_2018-02-09T03_56_08+0000.log.bz2>_______________________________________________ OvmsDev mailing list OvmsDev@lists.teslaclub.hk <mailto:OvmsDev@lists.teslaclub.hk> http://lists.teslaclub.hk/mailman/listinfo/ovmsdev
Fix for this has been committed. Version will show as v3.0.990.
Most of the work was in setting up the counters in gsmmux. The simcom implementation was pretty simple: the logic I have is that if we haven’t received a good mux frame in 3 minutes then we assume the mux is down and power cycle everything. I’ve also updated the ‘simcom status’ command to show some more information.
I’ve tested it as far as I can on the bench. It needs to go in a car now to see if things are more stable.
Regards, Mark
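A rough sketch of the timeout logic described in this message, written in Python rather than the firmware's own language. The class name, the method names, and the injected clock are all hypothetical; only the 3 minute limit comes from the message.

```python
import time

MUX_TIMEOUT_SECS = 180  # "3 minutes" as stated in the message


class MuxWatchdog:
    """Tracks the time since the last good mux frame (illustrative only)."""

    def __init__(self, timeout=MUX_TIMEOUT_SECS, clock=time.monotonic):
        self._clock = clock          # injectable so the logic can be tested
        self.timeout = timeout
        self.last_good_frame = clock()

    def good_frame(self):
        """Call whenever the mux layer accepts a correctly framed frame."""
        self.last_good_frame = self._clock()

    def seconds_since_frame(self):
        return self._clock() - self.last_good_frame

    def expired(self):
        """True once no good frame has arrived within the timeout."""
        return self.seconds_since_frame() >= self.timeout
```

A periodic ticker would call expired() and, when it returns True, treat the mux as down and power cycle the modem. How the power cycle is triggered is outside the scope of this sketch.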
On 20 Feb 2018, at 4:08 PM, Mark Webb-Johnson <mark@webb-johnson.net> wrote:
Tom,
I haven’t forgotten about this. Grateful for all the testing you are doing.
Gory details: http://www.qtc.jp/3GPP/GSM/SMG_24/tdocs/P-97-1031.pdf
I found that we can pick up GSM MUX framing errors: this data coming back from the modem (when we are in mux mode) is not framed correctly, so it will be treated as inter-frame errors. We can count those in the mux and, if they exceed a limit, shut down the mux. There doesn’t seem to be a ping frame, but we can set a timer after the last frame was received and, if we don’t get another within a reasonable time, shut down the mux.
Then, we add code in the higher layer to check the mux is up when it should be. If not, we’ll go through a power cycle of the simcom.
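The error counting idea can be sketched as below. This is a simplified illustration, not the gsmmux code: the class name and the error limit are invented, it assumes a one-byte length field and a separate opening flag on every frame (as in the logs in this thread), and it does not verify the checksum.

```python
FLAG = 0xF9


class MuxDeframer:
    """Splits a byte stream into mux frames and counts inter-frame errors."""

    def __init__(self, error_limit=50):   # limit is an arbitrary example value
        self.error_limit = error_limit
        self.framing_errors = 0
        self.good_frames = 0
        self._buf = None                   # None means "between frames"

    @property
    def failed(self):
        return self.framing_errors >= self.error_limit

    def feed(self, data):
        """Consume bytes; return the frames (without flags) completed by them."""
        frames = []
        for b in data:
            if self._buf is None:
                if b == FLAG:
                    self._buf = bytearray()        # opening flag
                else:
                    self.framing_errors += 1       # byte outside any frame
                continue
            if not self._buf and b == FLAG:
                continue                           # repeated flag, still at frame start
            # Header is address, control, length; then payload and one FCS byte.
            if len(self._buf) >= 3 and len(self._buf) == 3 + (self._buf[2] >> 1) + 1:
                if b == FLAG:
                    frames.append(bytes(self._buf))
                    self.good_frames += 1
                else:
                    self.framing_errors += 1       # closing flag missing
                self._buf = None
                continue
            self._buf.append(b)
        return frames
```

Fed a modem that has dropped back to plain AT mode, every echoed character lands between frames and is counted, so the limit is reached quickly.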
I will make these changes. Not too hard to do, but I need to do at least minimal testing. Mired in pre-production hell at the moment, but I’ll try to get it done this week.
Regards, Mark.
On 14 Feb 2018, at 1:57 PM, Mark Webb-Johnson <mark@webb-johnson.net> wrote:
It seems that around 4045393 in the logs the simcom driver thinks it is in mux mode, but the modem is not. The modem is just transmitting raw AT commands, not framed in gsm-mux.
The gsm-mux framing starts with a f9 and ends with a checksum then f9 again. Like this:
V (119553) gsm-ppp: tx: 7e 21 45 00 00 7a 00 34 00 00 ff 06 08 58 64 66 | ~!E..z.4.....Xdf V (119553) gsm-ppp: tx: 46 1c bc 8a 4b e5 fa 3c 1a d3 00 00 2f 84 b9 bc | F...K..<..../... V (119553) gsm-ppp: tx: 8d 1c 50 18 16 2a 31 ce 00 00 31 78 4b 44 45 79 | ..P..*1...1xKDEy V (119553) gsm-ppp: tx: 35 5a 35 6e 4b 35 66 35 2b 62 70 56 6a 52 2b 34 | 5Z5nK5f5+bpVjR+4 V (119553) gsm-ppp: tx: 59 49 44 62 44 68 6c 63 66 72 47 54 50 44 45 56 | YIDbDhlcfrGTPDEV V (119553) gsm-ppp: tx: 2b 65 42 59 78 4d 49 41 31 4c 42 77 4d 52 45 7a | +eBYxMIA1LBwMREz V (119553) gsm-ppp: tx: 79 36 44 74 76 2b 58 56 41 36 61 51 58 51 36 46 | y6Dtv+XVA6aQXQ6F V (119553) gsm-ppp: tx: 73 49 42 4e 4c 59 48 41 3d 3d 0d 0a bd aa 7e | sIBNLYHA==....~ V (119553) simcom: tx: f9 09 ff ff 7e 21 45 00 00 7a 00 34 00 00 ff 06 | ....~!E..z.4.... V (119553) simcom: tx: 08 58 64 66 46 1c bc 8a 4b e5 fa 3c 1a d3 00 00 | .XdfF...K..<.... V (119553) simcom: tx: 2f 84 b9 bc 8d 1c 50 18 16 2a 31 ce 00 00 31 78 | /.....P..*1...1x V (119553) simcom: tx: 4b 44 45 79 35 5a 35 6e 4b 35 66 35 2b 62 70 56 | KDEy5Z5nK5f5+bpV V (119553) simcom: tx: 6a 52 2b 34 59 49 44 62 44 68 6c 63 66 72 47 54 | jR+4YIDbDhlcfrGT V (119553) simcom: tx: 50 44 45 56 2b 65 42 59 78 4d 49 41 31 4c 42 77 | PDEV+eBYxMIA1LBw V (119553) simcom: tx: 4d 52 45 7a 79 36 44 74 76 2b 58 56 41 36 61 51 | MREzy6Dtv+XVA6aQ V (119553) simcom: tx: 58 51 36 46 73 49 42 4e 4c 59 48 41 3d 3d 0d 0a | XQ6FsIBNLYHA==.. V (119553) simcom: tx: bd aa 7e 9a f9 | ..~..
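To make the framing concrete, here is a small Python sketch that takes one frame apart, using the status-poll frame that appears repeatedly in the logs in this thread (f9 0d ff 3b ... cf f9). The function names are invented for illustration. It assumes the basic-mode layout with a one-byte length field, and a checksum computed over the address, control, and length bytes only, which is the case for the UIH frames (control ff) seen here.

```python
FLAG = 0xF9


def fcs(header):
    """Frame check sequence: reflected CRC-8 (poly 0xE0), init 0xFF, complemented."""
    crc = 0xFF
    for byte in header:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xE0 if crc & 1 else crc >> 1
    return 0xFF - crc


def parse_frame(raw):
    """Split one complete frame (including both f9 flags) into its fields."""
    if len(raw) < 6 or raw[0] != FLAG or raw[-1] != FLAG:
        raise ValueError("frame must start and end with f9")
    addr, ctrl, length_byte = raw[1], raw[2], raw[3]
    if not length_byte & 1:
        raise ValueError("two-byte length field not handled in this sketch")
    length = length_byte >> 1
    if len(raw) != 4 + length + 2:
        raise ValueError("length field does not match frame size")
    return {
        "chan": addr >> 2,                 # e.g. ADDR=0d is CHAN=3
        "addr": addr,
        "ctrl": ctrl,
        "payload": bytes(raw[4:4 + length]),
        "fcs_ok": raw[4 + length] == fcs(raw[1:4]),
    }


poll = (bytes.fromhex("f90dff3b")
        + b"AT+CREG?;+CCLK?;+CSQ;+COPS?\r\n"
        + bytes.fromhex("cff9"))
print(parse_frame(poll))
```

The same command echoed back without the f9 wrapper, as in the later part of the log, is rejected by parse_frame at the first check.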
There was a network disconnection around 552393, recovered well. Another at 2080493, also recovered well.
But then around 2666343 we lose gsm mux framing.
V (2666333) simcom: rx: 14 f9 f9 11 ff 19 0d 0a 2b 43 52 45 47 3a 20 32 | ........+CREG: 2 V (2666333) gsm-mux: ProcessFrame(CHAN=3, ADDR=0d, CTRL=ff, FCS=14, LEN=18) V (2666333) gsm-mux: ChanProcessFrame(CHAN=3, ADDR=0d, CTRL=ff, LEN=15, IFP=3) D (2666333) simcom: rx line ch=3 len=8 : +CREG: 2 I (2666343) simcom: CREG Network Registration: Searching V (2666343) simcom: rx: 0d 0a 19 f9 | .... V (2666343) gsm-mux: ProcessFrame(CHAN=4, ADDR=11, CTRL=ff, FCS=19, LEN=18) V (2666343) gsm-mux: ChanProcessFrame(CHAN=4, ADDR=11, CTRL=ff, LEN=15, IFP=3) D (2666343) simcom: rx line ch=4 len=8 : +CREG: 2 I (2666393) simcom: Lost network connection (NetworkRegistration in NetMode) I (2666393) simcom: State: Enter NetLoss state V (2666393) simcom: tx: f9 0d ff 19 41 54 2b 43 47 41 54 54 3d 30 0d 0a | ....AT+CGATT=0.. V (2666393) simcom: tx: 14 f9 | .. I (2666393) gsm-ppp: Shutting down (hard)... I (2666393) ovms-server-v2: Network is reconfigured, so disconnect network connection E (2666393) ovms-server-v2: Status: Error: Disconnected from OVMS Server V2 I (2666403) gsm-ppp: StatusCallBack: User Interrupt I (2666403) gsm-ppp: PPP connection has been closed V (2666493) simcom: rx: f9 0d ff 0d 0d 0a 4f 4b 0d 0a 0f f9 | ......OK.... V (2666503) simcom: rx: f9 0d ff 1f 0d 0a 2b 43 53 51 3a 20 33 31 2c 39 | ......+CSQ: 31,9 V (2666503) simcom: rx: 39 0d 0a f0 f9 f9 11 ff 1f 0d 0a 2b 43 53 51 3a | 9..........+CSQ: V (2666503) simcom: rx: 20 33 31 2c 39 39 0d 0a fd f9 f9 0d ff 19 0d 0a | 31,99.......... V (2666503) simcom: rx: 2b 43 52 45 47 3a 20 31 0d 0a 14 f9 f9 11 ff 19 | +CREG: 1........ V (2666503) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 31 0d 0a 19 f9 | ..+CREG: 1.... 
V (2666513) simcom: rx: f9 09 ff 23 7e ff 7d 23 c0 21 7d 25 7d 32 7d 20 | ...#~.}#.!}%}2} V (2666513) simcom: rx: 7d 24 cc ad 7e 1f f9 f9 0d ff 2f 0d 0a 2b 50 50 | }$..~...../..+PP V (2666513) simcom: rx: 50 44 3a 20 44 49 53 43 4f 4e 4e 45 43 54 45 44 | PD: DISCONNECTED V (2666513) simcom: rx: 0d 0a d4 f9 f9 11 ff 2f 0d 0a 2b 50 50 50 44 3a | ......./..+PPPD: V (2666513) simcom: rx: 20 44 49 53 43 4f 4e 4e 45 43 54 45 44 0d 0a d9 | DISCONNECTED... V (2666513) simcom: rx: f9 | . V (2666703) mongoose: mg_close_conn 44002356 0x3ffe8230 2048 8201 V (2666703) mongoose: mg_socket_if_destroy 1754 nc=0x3ffe8230 sock=8201 flags=800 V (2666703) mongoose: mg_call 3806 0x3ffe8230 user ev=5 ev_data=0x0 flags=2048 rmbl=0 smbl=0 V (2666703) ovms-server-v2: OvmsServerV2MongooseCallback(MG_EV_CLOSE) V (2666713) mongoose: mg_call 4587 0x3ffe8230 after user flags=2048 rmbl=0 smbl=0 V (2666723) mongoose: mg_mgr_free 12130 0x3ffb71f8 OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2667003) simcom: rx: f9 0d ff 1d 0d 0a 2b 43 53 51 3a 20 30 2c 39 39 | ......+CSQ: 0,99 V (2667003) simcom: rx: 0d 0a 13 f9 f9 11 ff 1d 0d 0a 2b 43 53 51 3a 20 | ..........+CSQ: V (2667013) simcom: rx: 30 2c 39 39 0d 0a 1e f9 | 0,99.... OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2669253) simcom: rx: f9 0d ff 1f 0d 0a 2b 43 53 51 3a 20 31 35 2c 39 | ......+CSQ: 15,9 V (2669263) simcom: rx: 39 0d 0a f0 f9 f9 11 ff 1f 0d 0a 2b 43 53 51 3a | 9..........+CSQ: V (2669263) simcom: rx: 20 31 35 2c 39 39 0d 0a fd f9 f9 0d ff 19 0d 0a | 15,99.......... V (2669263) simcom: rx: 2b 43 52 45 47 3a 20 32 0d 0a 14 f9 f9 11 ff 19 | +CREG: 2........ V (2669263) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 32 0d 0a 19 f9 | ..+CREG: 2.... OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2669503) simcom: rx: f9 09 ff 25 7e ff 7d 23 c0 21 7d 25 7d 33 7d 20 | ...%~.}#.!}%}3} V (2669503) simcom: rx: 7d 24 7d 30 f7 7e fb f9 | }$}0.~.. 
OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2670013) simcom: rx: f9 0d ff 1f 0d 0a 2b 43 53 51 3a 20 31 32 2c 39 | ......+CSQ: 12,9 V (2670013) simcom: rx: 39 0d 0a f0 f9 f9 11 ff 1f 0d 0a 2b 43 53 51 3a | 9..........+CSQ: V (2670013) simcom: rx: 20 31 32 2c 39 39 0d 0a fd f9 | 12,99....
The gsm-mux seems no longer able to decode the incoming frames. It seems to recover for a while, but then the mux fails completely around 2705413:
V (2695413) simcom: rx: f9 0d ff 9f 0d 0a 2b 43 52 45 47 3a 20 31 2c 32 | ......+CREG: 1,2 V (2695423) simcom: rx: 0d 0a 0d 0a 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 | ....+CCLK: "18/0 V (2695423) simcom: rx: 32 2f 30 39 2c 31 37 3a 34 39 3a 34 39 2b 35 32 | 2/09,17:49:49+52 V (2695423) simcom: rx: 22 0d 0a 0d 0a 2b 43 53 51 3a 20 30 2c 39 39 0d | "....+CSQ: 0,99. V (2695423) simcom: rx: 0a 0d 0a 2b 43 4f 50 53 3a 20 30 0d 0a 0d 0a 4f | ...+COPS: 0....O V (2695423) simcom: rx: 4b 0d 0a 10 f9 | K.... V (2695423) gsm-mux: ProcessFrame(CHAN=3, ADDR=0d, CTRL=ff, FCS=10, LEN=85) V (2695423) gsm-mux: ChanProcessFrame(CHAN=3, ADDR=0d, CTRL=ff, LEN=82, IFP=3) D (2695423) simcom: rx line ch=3 len=10 : +CREG: 1,2 D (2695423) simcom: rx line ch=3 len=29 : +CCLK: "18/02/09,17:49:49+52" D (2695423) simcom: rx line ch=3 len=10 : +CSQ: 0,99 D (2695423) simcom: rx line ch=3 len=8 : +COPS: 0 D (2695423) simcom: rx line ch=3 len=2 : OK OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2703403) simcom: rx: 0d 0a 53 54 41 52 54 0d 0a | ..START.. OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2705393) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC V (2705393) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?. V (2705393) simcom: tx: 0a cf f9 | ... V (2705403) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?; V (2705403) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | +CSQ;+COPS?. V (2705413) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 30 2c 32 0d 0a 0d 0a | ..+CREG: 0,2.... V (2705413) simcom: rx: 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 32 2f 30 39 | +CCLK: "18/02/09 V (2705413) simcom: rx: 2c 31 37 3a 34 39 3a 35 39 2b 35 32 22 0d 0a 0d | ,17:49:59+52"... V (2705413) simcom: rx: 0a 2b 43 53 51 3a 20 30 2c 39 39 0d 0a 0d 0a 45 | .+CSQ: 0,99....E V (2705413) simcom: rx: 52 52 4f 52 0d 0a | RROR.. 
OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2706283) simcom: rx: 0d 0a 2b 43 50 49 4e 3a 20 52 45 41 44 59 0d 0a | ..+CPIN: READY.. V (2706283) simcom: rx: 0d 0a 4f 50 4c 20 55 50 44 41 54 49 4e 47 0d 0a | ..OPL UPDATING.. V (2706293) simcom: rx: 0d 0a 50 4e 4e 20 55 50 44 41 54 49 4e 47 0d 0a | ..PNN UPDATING.. OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2707863) simcom: rx: 0d 0a 53 4d 53 20 44 4f 4e 45 0d 0a | ..SMS DONE.. OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2711073) simcom: rx: 0d 0a 43 41 4c 4c 20 52 45 41 44 59 0d 0a 0d 0a | ..CALL READY.... V (2711073) simcom: rx: 50 42 20 44 4f 4e 45 0d 0a | PB DONE.. OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2715393) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC V (2715393) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?. V (2715393) simcom: tx: 0a cf f9 | ... V (2715403) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?; V (2715403) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | +CSQ;+COPS?. V (2715413) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 30 2c 31 0d 0a 0d 0a | ..+CREG: 0,1.... V (2715423) simcom: rx: 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 32 2f 30 39 | +CCLK: "18/02/09 V (2715423) simcom: rx: 2c 31 37 3a 35 30 3a 31 30 2b 35 32 22 0d 0a 0d | ,17:50:10+52"... V (2715423) simcom: rx: 0a 2b 43 53 51 3a 20 35 2c 39 39 0d 0a 0d 0a 2b | .+CSQ: 5,99....+ V (2715423) simcom: rx: 43 4f 50 53 3a 20 30 2c 30 2c 22 32 64 65 67 72 | COPS: 0,0,"2degr V (2715423) simcom: rx: 65 65 73 22 2c 32 0d 0a 0d 0a 4f 4b 0d 0a | ees",2....OK.. OVMS > gytdercvghnjm=k',.=uigyryr6 ,kjbuhygb vg bV (2725393) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC V (2725393) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?. V (2725393) simcom: tx: 0a cf f9 | ... 
V (2725413) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?; V (2725413) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | +CSQ;+COPS?. V (2725463) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 30 2c 31 0d 0a 0d 0a | ..+CREG: 0,1.... V (2725473) simcom: rx: 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 32 2f 30 39 | +CCLK: "18/02/09 V (2725473) simcom: rx: 2c 31 37 3a 35 30 3a 32 30 2b 35 32 22 0d 0a 0d | ,17:50:20+52"... V (2725473) simcom: rx: 0a 2b 43 53 51 3a 20 35 2c 39 39 0d 0a 0d 0a 2b | .+CSQ: 5,99....+ V (2725473) simcom: rx: 43 4f 50 53 3a 20 30 2c 30 2c 22 32 64 65 67 72 | COPS: 0,0,"2degr V (2725473) simcom: rx: 65 65 73 22 2c 32 0d 0a 0d 0a 4f 4b 0d 0a | ees",2....OK..
That looks like a reboot of the modem (around the time ERROR is received from the modem).
From that point on, we think we are in mux mode, and the modem doesn’t. Not surprised that nothing works. A modem power on/off would probably fix it, but nothing in the current software detects that situation.
I think what I’ll do is add a timeout in mux mode. Or a ping style frame (I’ll have to see what the spec allows). Then we can detect that mux has failed, and go through a reset cycle if that happens.
Regards, Mark.
On 21/02/18 14:43, Mark Webb-Johnson wrote:
I’ve tested it as far as I can on the bench. It needs to go in a car now to see if things are more stable.
It didn't fix the permanent disconnects but it does seem to recover from problems in the mux layer. Since my signal strength is reported as being so low, I've installed a proper antenna now to eliminate this as a possible cause of not being able to reconnect, but I haven't had a chance to put the data logger on it and see what is going on while driving. It still disconnects about once per day. I'll post again when I have a log of it disconnecting and remaining disconnected. Unfortunately I synced with the latest code this afternoon and it crashes before it's connected, but that is for another post.
On 01/03/18 21:40, Tom Parker wrote:
I'll post again when I have a log of it disconnecting and remaining disconnected. Unfortunately I synced with the latest code this afternoon and it crashes before it's connected, but that is for another post.
While syncing up with the latest code I managed to end up with esp-idf from https://github.com/espressif/esp-idf.git master, not from https://github.com/openvehicles/esp-idf.git. I'll try again tomorrow with the right remote and see if that fixes it.
I had some time to put the data logger in my car and examine the results.
It turns out I spoke too soon when I said that the framing errors had gone away. I've seen this pattern of disconnection several times now with the large number of framing errors and the long time since the last RX frame in the Mux statistics. I'm not really sure what is salient in the log leading up to the disconnection, please see attached for a couple of examples.
The ovms_2018-03-04T19_43_10+0000.log is particularly interesting because the gsm-ppp and server-v2 processes are still working, but no data is being sent and the normal gsm-mux chatter is missing.
The server logs for this one indicate that
I (2085209) ovms-server-v2: Send MP-0 F3.0.990-30-g613bfd2/factory/main build (idf v3.1-dev-453-g0f978bc) Mar 2 2018 06:35:40,,5,1,NL, 2degrees
was the last message received. I stopped the data logger before rebooting the ovms, so there's no record of the reconnection from the ovms module's point of view.
Server Logs, note 8 minute gap (I was driving so waited to reboot it)
2018-03-04 21:16:05,61446,'#74 C rx msg F 3.0.990-30-g613bfd2/factory/main build (idf v3.1-dev-453-g0f978bc) Mar 2 2018 06:35:40,,5,1,NL,2degrees'
2018-03-04 21:17:29,61640,'#15 A rx msg A '
2018-03-04 21:22:33,62318,'#15 A rx msg A '
2018-03-04 21:23:37,62442,'#9 C got login'
2018-03-04 21:23:37,62443,'#74 C error - duplicate car login - clearing first connection'
2018-03-04 21:23:50,62467,'#9 C error - Unable to decode message - aborting connection'
2018-03-04 21:24:12,62523,'#66 C got login'
2018-03-04 21:24:40,62597,'#66 C rx msg S 74,K,0,0,stopped,standard,62,0,0,0,0,0,13,21,0,0,0,0,0.00,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,384.50,0'
This is an extract from ovms_2018-03-05T05_40_29+0000.log.bz2 where the disconnection seems like it was detected and we're left with the modem status queries in a loop. When I tried to power off the simcom modem, the ovms locked up and wouldn't respond to the serial console.
I rebooted the modem with the button and it immediately reconnected (not shown but it's in the attached log). OVMS > V (35194199) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC V (35194199) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?. V (35194199) simcom: tx: 0a cf f9 | ... V (35194219) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?; V (35194219) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | +CSQ;+COPS?. V (35194279) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 30 2c 31 0d 0a 0d 0a | ..+CREG: 0,1.... V (35194279) simcom: rx: 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 33 2f 30 35 | +CCLK: "18/03/05 V (35194279) simcom: rx: 2c 31 39 3a 30 39 3a 30 35 2b 35 32 22 0d 0a 0d | ,19:09:05+52"... V (35194279) simcom: rx: 0a 2b 43 53 51 3a 20 33 2c 39 39 0d 0a 0d 0a 2b | .+CSQ: 3,99....+ V (35194279) simcom: rx: 43 4f 50 53 3a 20 30 2c 30 2c 22 32 64 65 67 72 | COPS: 0,0,"2degr V (35194279) simcom: rx: 65 65 73 22 2c 32 0d 0a 0d 0a 4f 4b 0d 0a | ees",2....OK.. OVMS > V (35224199) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC V (35224199) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?. V (35224199) simcom: tx: 0a cf f9 | ... V (35224209) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?; V (35224219) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d 0d 0a 2b 43 | +CSQ;+COPS?...+C V (35224219) simcom: rx: 52 45 47 3a 20 30 2c 31 0d 0a 0d 0a 2b 43 43 4c | REG: 0,1....+CCL V (35224219) simcom: rx: 4b 3a 20 22 31 38 2f 30 33 2f 30 35 2c 31 39 3a | K: "18/03/05,19: V (35224219) simcom: rx: 30 39 3a 33 35 2b 35 32 22 0d 0a 0d 0a 2b 43 53 | 09:35+52"....+CS V (35224219) simcom: rx: 51 3a 20 33 2c 39 39 0d 0a 0d 0a 2b 43 4f 50 53 | Q: 3,99....+COPS V (35224219) simcom: rx: 3a 20 30 2c 30 2c 22 32 64 65 67 72 65 65 73 22 | : 0,0,"2degrees" V (35224219) simcom: rx: 2c 32 0d 0a 0d 0a 4f 4b 0d 0a | ,2....OK.. 
OVMS > a simcom status
SIMCOM Network Registration: RegisteredHome State: NetMode Ticker: 1235 User Data: 0
Mux Status: up Open Channels: 4 Framing Errors: 5103 Last RX frame: 1222 sec(s) ago RX frames: 2512 TX frames: 2500
PPP Not Connected Last Error: Connection Lost
GPS Status: disabled Time: disabled NMEA: GPS/GLONASS Not Connected
OVMS > simcom power simcom off
ets Jun 8 2016 00:22:57
rst:0x1 (POWERON_RESET),boot:0x1f (SPI_FAST_FLASH_BOOT) configsip: 156795334, SPIWP:0xee clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
Tom,
I reviewed, and found a couple of states where we were not handling timeout conditions. Specifically:
NetWait: If waiting for GSM signal, and the mux went down, the indication that GSM is back would never come. Blocked in that state until power cycle.
NetMode: If mux went down while in established network connection mode, it wouldn’t check. Blocked in that state until power cycle.
MuxStart: If the mux doesn’t come up properly (4 channels established), it would block in this state until power cycle.
I think your issue was the NetMode one. I’ve personally seen the NetWait one in my car.
I’ve fixed those three, and committed that. This has been running 24x7 in my car for the past few days, and I’m fixing these as and when I find them. I’ll try to make some time to look at the state checks with a critical eye and try to ensure we have sensible timeouts on all of them (pro-actively). I’m wondering if the mux checking can be taken out generically to always reset if mux fails (no matter the state).
Regards, Mark
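The three cases in this message can be summarised as a single check, sketched here in Python. The state names are the ones used in the message. Everything else (the ModemView type, its field names, and the 120 second MuxStart limit) is hypothetical and only there to make the example self-contained; the actual values and structure of the firmware are not given in the thread.

```python
from dataclasses import dataclass


@dataclass
class ModemView:
    """Hypothetical snapshot of what the state machine can see."""
    state: str            # "MuxStart", "NetWait", "NetMode", ...
    secs_in_state: int    # time since the state was entered
    mux_up: bool          # result of the mux health check
    open_channels: int    # mux channels established so far


def needs_power_cycle(m, muxstart_timeout=120):
    """Return True where the message says the old code would block forever."""
    if m.state == "MuxStart":
        # Mux never came up properly (4 channels expected).
        return m.open_channels < 4 and m.secs_in_state > muxstart_timeout
    if m.state in ("NetWait", "NetMode"):
        # Mux went down while waiting for, or using, the network.
        return not m.mux_up
    return False
```

With a check like this run on every tick, a modem stuck in NetMode with a dead mux (the case suspected for the log above) would be power cycled rather than left polling indefinitely.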
I reviewed all the states, and implemented some new timeout checks as appropriate. I also made the mux timeout work in ALL states.
This should address these issues that I and Tom have seen. There may be others, but the state ones should be ok now. I’ll try it in my car tomorrow.
Regards, Mark
On 13 Mar 2018, at 10:04 AM, Mark Webb-Johnson <mark@webb-johnson.net> wrote:
Tom,
I reviewed, and found a couple of states where we were not handling timeout conditions. Specifically:
NetWait: If waiting for GSM signal, and the mux went down, the indication that GSM is back would never come. Blocked in that state until power cycle.
NetMode: If mux went down while in established network connection mode, it wouldn’t check. Blocked in that state until power cycle.
MuxStart: If the mux doesn’t come up properly (4 channels established), it would block in this state until power cycle.
I think your issue was the NetMode one. I’ve personally seen the NetWait one in my car.
I’ve fixed those three, and committed that. This has been running 24x7 in my car for the past few days, and I’m fixing these as and when I find them. I’ll try to make some time to look at the state checks with a critical eye and try to ensure we have sensible timeouts on all of them (pro-actively). I’m wondering if the mux checking can be taken out generically to always reset if mux fails (no matter the state).
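The idea of checking the mux generically, no matter the state, can be pictured with a small sketch. This is not the OVMS firmware code (which is C++). The state names NetWait, NetMode and MuxStart come from the message above, while the `POWER_CYCLE` recovery state, both timeout values and the `tick()` signature are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    MUX_START = auto()    # waiting for the 4 mux channels to open
    NET_WAIT = auto()     # mux up, waiting for network registration
    NET_MODE = auto()     # registered, data connection in use
    POWER_CYCLE = auto()  # hypothetical recovery state: restart the modem


MUX_START_TIMEOUT_S = 60   # assumed value, not taken from the firmware
MUX_RX_TIMEOUT_S = 120     # assumed value, not taken from the firmware


@dataclass
class Modem:
    state: State = State.MUX_START
    ticker: int = 0   # seconds spent in the current state
    rx_age: int = 0   # seconds since the last mux frame was received

    def _enter(self, state):
        self.state = state
        self.ticker = 0

    def tick(self, rx_frame_seen, channels=0, registered=False):
        """Advance the state machine by one second."""
        self.ticker += 1
        self.rx_age = 0 if rx_frame_seen else self.rx_age + 1

        if self.state is State.POWER_CYCLE:
            self.rx_age = 0
            self._enter(State.MUX_START)
        elif self.state is State.MUX_START:
            if channels == 4:
                self.rx_age = 0
                self._enter(State.NET_WAIT)
            elif self.ticker > MUX_START_TIMEOUT_S:
                self._enter(State.POWER_CYCLE)  # mux never came up
        elif self.rx_age > MUX_RX_TIMEOUT_S:
            # Generic check: applies to every state that relies on the mux.
            self._enter(State.POWER_CYCLE)
        elif self.state is State.NET_WAIT and registered:
            self._enter(State.NET_MODE)
        elif self.state is State.NET_MODE and not registered:
            self._enter(State.NET_WAIT)
        return self.state
```

Because the mux age test sits ahead of the per-state transitions, a state added later would inherit the timeout without needing its own check.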
Regards, Mark
On 6 Mar 2018, at 5:51 PM, Tom Parker <tom@carrott.org <mailto:tom@carrott.org>> wrote:
I had some time to put the data logger in my car and examine the results.
It turns out I spoke too soon when I said that the framing errors had gone away. I've seen this pattern of disconnection several times now with the large number of framing errors and the long time since the last RX frame in the Mux statistics. I'm not really sure what is salient in the log leading up to the disconnection, please see attached for a couple of examples.
The ovms_2018-03-04T19_43_10+0000.log is particularly interesting because the gsm-ppp and server-v2 processes are still working, but no data is being sent and the normal gsm-mux chatter is missing.
The server logs for this one indicate that
I (2085209) ovms-server-v2: Send MP-0 F3.0.990-30-g613bfd2/factory/main build (idf v3.1-dev-453-g0f978bc) Mar 2 2018 06:35:40,,5,1,NL, 2degrees
was the last message received. I stopped the data logger before rebooting the ovms, so there's no record of the reconnection from the ovms module's point of view.
Server Logs, note 8 minute gap (I was driving so waited to reboot it)
2018-03-04 21:16:05,61446,'#74 C rx msg F 3.0.990-30-g613bfd2/factory/main build (idf v3.1-dev-453-g0f978bc) Mar 2 2018 06:35:40,,5,1,NL,2degrees'
2018-03-04 21:17:29,61640,'#15 A rx msg A '
2018-03-04 21:22:33,62318,'#15 A rx msg A '
2018-03-04 21:23:37,62442,'#9 C got login'
2018-03-04 21:23:37,62443,'#74 C error - duplicate car login - clearing first connection'
2018-03-04 21:23:50,62467,'#9 C error - Unable to decode message - aborting connection'
2018-03-04 21:24:12,62523,'#66 C got login'
2018-03-04 21:24:40,62597,'#66 C rx msg S 74,K,0,0,stopped,standard,62,0,0,0,0,0,13,21,0,0,0,0,0.00,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,384.50,0'
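For longer server logs, the silent period can be found mechanically rather than by eye. The sketch below parses records in the format shown above and reports the longest gap between consecutive car records. It assumes that `C` marks car connections and `A` marks app connections, and the helper names `parse` and `longest_car_gap` are hypothetical.

```python
import re
from datetime import datetime

# timestamp, sequence number, then a quoted "#<conn> <A|C> <message>" field
RECORD = re.compile(
    r"^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d),(\d+),'#(\d+) ([AC]) (.*)'$"
)


def parse(line):
    """Return (time, connection id, kind, message) or None if no match."""
    match = RECORD.match(line.strip())
    if not match:
        return None
    stamp, _seq, conn, kind, msg = match.groups()
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), int(conn), kind, msg


def longest_car_gap(lines):
    """Return (gap, start, end) for the longest pause between car records."""
    times = [rec[0] for rec in map(parse, lines) if rec and rec[2] == "C"]
    gaps = [(later - earlier, earlier, later)
            for earlier, later in zip(times, times[1:])]
    return max(gaps, default=None)
```

Applied to the records above, the longest gap runs from the last F message at 21:16:05 to the new login at 21:23:37.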
This is an extract from ovms_2018-03-05T05_40_29+0000.log.bz2 where the disconnection seems like it was detected and we're left with the modem status queries in a loop. When I tried to power off the simcom modem, the ovms locked up and wouldn't respond to the serial console. I rebooted the modem with the button and it immediately reconnected (not shown but it's in the attached log).
OVMS > V (35194199) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC
V (35194199) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?.
V (35194199) simcom: tx: 0a cf f9 | ...
V (35194219) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?;
V (35194219) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | +CSQ;+COPS?.
V (35194279) simcom: rx: 0d 0a 2b 43 52 45 47 3a 20 30 2c 31 0d 0a 0d 0a | ..+CREG: 0,1....
V (35194279) simcom: rx: 2b 43 43 4c 4b 3a 20 22 31 38 2f 30 33 2f 30 35 | +CCLK: "18/03/05
V (35194279) simcom: rx: 2c 31 39 3a 30 39 3a 30 35 2b 35 32 22 0d 0a 0d | ,19:09:05+52"...
V (35194279) simcom: rx: 0a 2b 43 53 51 3a 20 33 2c 39 39 0d 0a 0d 0a 2b | .+CSQ: 3,99....+
V (35194279) simcom: rx: 43 4f 50 53 3a 20 30 2c 30 2c 22 32 64 65 67 72 | COPS: 0,0,"2degr
V (35194279) simcom: rx: 65 65 73 22 2c 32 0d 0a 0d 0a 4f 4b 0d 0a | ees",2....OK..
OVMS > V (35224199) simcom: tx: f9 0d ff 3b 41 54 2b 43 52 45 47 3f 3b 2b 43 43 | ...;AT+CREG?;+CC
V (35224199) simcom: tx: 4c 4b 3f 3b 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d | LK?;+CSQ;+COPS?.
V (35224199) simcom: tx: 0a cf f9 | ...
V (35224209) simcom: rx: 41 54 2b 43 52 45 47 3f 3b 2b 43 43 4c 4b 3f 3b | AT+CREG?;+CCLK?;
V (35224219) simcom: rx: 2b 43 53 51 3b 2b 43 4f 50 53 3f 0d 0d 0a 2b 43 | +CSQ;+COPS?...+C
V (35224219) simcom: rx: 52 45 47 3a 20 30 2c 31 0d 0a 0d 0a 2b 43 43 4c | REG: 0,1....+CCL
V (35224219) simcom: rx: 4b 3a 20 22 31 38 2f 30 33 2f 30 35 2c 31 39 3a | K: "18/03/05,19:
V (35224219) simcom: rx: 30 39 3a 33 35 2b 35 32 22 0d 0a 0d 0a 2b 43 53 | 09:35+52"....+CS
V (35224219) simcom: rx: 51 3a 20 33 2c 39 39 0d 0a 0d 0a 2b 43 4f 50 53 | Q: 3,99....+COPS
V (35224219) simcom: rx: 3a 20 30 2c 30 2c 22 32 64 65 67 72 65 65 73 22 | : 0,0,"2degrees"
V (35224219) simcom: rx: 2c 32 0d 0a 0d 0a 4f 4b 0d 0a | ,2....OK..
OVMS > a simcom status
SIMCOM
  Network Registration: RegisteredHome
  State: NetMode
  Ticker: 1235
  User Data: 0
Mux
  Status: up
  Open Channels: 4
  Framing Errors: 5103
  Last RX frame: 1222 sec(s) ago
  RX frames: 2512
  TX frames: 2500
PPP
  Not Connected
  Last Error: Connection Lost
GPS
  Status: disabled
  Time: disabled
  NMEA: GPS/GLONASS
  Not Connected
OVMS > simcom power simcom off
ets Jun 8 2016 00:22:57
rst:0x1 (POWERON_RESET),boot:0x1f (SPI_FAST_FLASH_BOOT)
configsip: 156795334, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
<ovms_2018-03-05T05_40_29+0000.log.bz2><ovms_2018-03-04T19_43_10+0000.log.bz2>_______________________________________________ OvmsDev mailing list OvmsDev@lists.teslaclub.hk <mailto:OvmsDev@lists.teslaclub.hk> http://lists.teslaclub.hk/mailman/listinfo/ovmsdev
Tom, This (and the subsequent work to do this globally, irrespective of the state) seems to be working well in my car for the past couple of days. How is it working out for you? Are you still getting cases where the SIMCOM loses connectivity and it can’t recover? Regards, Mark.
On 15/03/18 13:28, Mark Webb-Johnson wrote:
This (and the subsequent work to do this globally, irrespective of the state) seems to be working well in my car for the past couple of days.
How is it working out for you? Are you still getting cases where the SIMCOM loses connectivity and it can’t recover?
It's much much better, but it's still broken :( Yesterday, with the first half of your changes, it had very poor connectivity but was able to reconnect several times. Today, with the second half of your changes, it didn't seem to reconnect by itself. Unfortunately I haven't had a data logger on it with your improvements so I don't have a lot of information about what is wrong. I went and investigated its current state this evening and while the CLI user interface was working well, nothing simcom related was being logged. I guess there's something that wedges that task.
I wasn't able to make it connect by rebooting with the button. Stuff happened, but it didn't talk to the v2 server. It's possible the signal strength is too weak at this location? I tried two OVMS antennas with no luck. I've attached the log (ovms_2018-03-15T09_21_02+0000.log.bz2). I've been testing a Spark m2m simcard, and have been using that for the last few days. I have access to their m2m monitoring facility, which reported a 3 minute session with 1457 bytes up and 5321 bytes down. After failing to reconnect after rebooting with the button, I put the 2degrees consumer simcard back in and it connected immediately. You can see the last thing I do in the log is change the apn. I haven't included the log of the working session.
Thank you for finding the mux bugs, I'm sorry I haven't had time to debug it myself. I'll try to get some minor leaf improvements in before the firmware is locked in but my weekend is looking busy again :(
In other New Zealand news, 2degrees turned off their 2g network early this morning. Vodafone are apparently going to keep theirs up until 2025. I've got a new simcard for my v2 module but I haven't switched it over yet.
Probably the most useful output, at the time when the modem couldn’t recover would be:
simcom status
network status
module memory
module tasks
I’m only running with the DEMO vehicle module, Server V2, Wifi AP, and SIMCOM. No GPS enabled. The network seems stable to me (although I am getting one general crash about once a day).
Regards, Mark.
On 16/03/18 14:48, Mark Webb-Johnson wrote:
Probably the most useful output, at the time when the modem couldn’t recover would be:
simcom status
network status
module memory
module tasks
I’m only running with the DEMO vehicle module, Server V2, Wifi AP, and SIMCOM. No GPS enabled. The network seems stable to me (although I am getting one general crash about once a day).
I'm running the Leaf module, server v2, no wifi, simcom and no GPS.
The first 3 of the attached logs record what happened yesterday and this morning with code from ee023a5 from a few days ago:
ovms_2018-03-17T04_28_21+0000.log.bz2: Driving 10km and into an underground carpark where the OVMS lost signal. Interestingly my cell phone (on a different carrier who still have a 2G network) maintained a connection but degraded to 2G and then stayed on 2G for an hour or so afterwards even though I went back to the surface.
ovms_2018-03-17T08_38_49+0000.log.bz2: 4 hours later, driving out of the carpark and returning to the starting point. Connection to the v2 server was not restored.
ovms_2018-03-17T23_01_38+0000.log.bz2: 12 hours later, running your recommended commands. I note that 13 hours before, the OVMS was still talking to the simcom, but in this session it was no longer talking. When I tried to power off the simcom, the OVMS hung and I had to reset it with the button. You will see in the logs that I switched back to the Spark network using an m2m simcard after resetting with the button.
I worked out why the signal strength is so poor here: Spark's network is on 850/2100MHz while I have a SIM5360E module which I think is a 900/2100MHz modem. According to internet sources, Spark only have 2100MHz infill with the main network running on 850MHz, so that probably explains why it works in some locations and not others.
With current master running in the OVMS and using the m2m console, I disabled the simcard, and pretty much immediately you can see the disconnection in the ovms logs. However when I activate the simcard, it does not reconnect. If I press the button to reboot the module it does reconnect. In this log file, 445227 is after I reactivated the simcard (I actually reactivated it some time before then). Doing a "power simcom off", followed by a "power simcom on" seems to have fixed it. I don't know if the power cycle is required to reactivate the simcard or if it merely speeds up the activation. ovms_2018-03-18T09:25:41+00:00.log.bz2 is a transcript of this session.
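The band reasoning above amounts to a set intersection. As a small sketch, using only the frequencies quoted in this thread (the SIM5360E figures are what I think the module supports, not a datasheet value, and the names below are made up for illustration):

```python
# Bands in MHz as quoted in this thread; treat them as assumptions.
MODEM_BANDS_MHZ = {900, 2100}  # SIM5360E, as believed above
NETWORK_BANDS_MHZ = {
    "Spark": {850, 2100},      # 2100 reportedly used only as infill
    "2degrees": {900, 2100},
}


def usable_bands(modem, network):
    """Return the bands the modem and the network have in common."""
    return sorted(modem & network)


if __name__ == "__main__":
    for name, bands in NETWORK_BANDS_MHZ.items():
        print(name, usable_bands(MODEM_BANDS_MHZ, bands))
```

On these figures the modem shares only 2100MHz with Spark, which would fit the patchy coverage, while both 2degrees bands are usable.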
Hi Mark,
I'm back using the 2degrees simcard (they run a 900MHz/2100MHz network that is compatible with my simcom module) and performance is very good compared to where we were a few weeks ago. There is a dead spot which I drive through often where it would usually disconnect and never reconnect. Now it still disconnects, and I can see in the app that the last contact lags by a couple of minutes, but it reconnects all by itself.
Unfortunately it dropped offline and never reconnected this evening. It disconnected shortly before I arrived at a meeting. I left the data logger connected for a few hours before trying "power simcom off" and on again, which wasn't enough to clear it; I had to reboot the module with the button. See attached log (ovms_2018-03-20T04_40_03+0000.log.bz2).
Which firmware version is this with? Regards, Mark
Sorry, I should have specified: it's with 15933a41ddca3a28d37ab788916b93480574447a, plus a few hopefully unrelated Leaf-specific changes I'm preparing to send through.
On 20/03/18 22:05, Mark Webb-Johnson wrote:
Which firmware version is this with?
Regards, Mark
Ah, that’s quite old (since when is two days old, he says :-) But it pre-dates the 2bcbc8e995977c904f5346c44221e07413e56a62 stuff that messed things up. Do you know at about what time (in the log file) the issue happened? Also, do you have a ‘simcom status’ from the issue time? Regards, Mark
It looks like around 38675227 is when it died. There are simcom status, module memory and module tasks outputs at 38714257, as well as a simcom power cycle that didn't bring it back.
On 20/03/18 23:56, Mark Webb-Johnson wrote:
Ah, that’s quite old (since when is two days old, he says :-) But it pre-dates the 2bcbc8e995977c904f5346c44221e07413e56a62 stuff that messed things up.
Do you know at about what time (in the log file) the issue happened? Also, do you have a ‘simcom status’ from the issue time?
Regards, Mark
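The numbers quoted from the log (38675227, 38714257) are the values in parentheses at the start of each log line. They look like milliseconds since boot: in the earlier extract, two status polls at 35194199 and 35224199 are 30000 apart while the modem clock moves from 19:09:05 to 19:09:35. Assuming that holds, a small helper (the name `uptime` is hypothetical) makes them easier to line up with wall-clock events:

```python
def uptime(ms):
    """Format a log tick, assumed to be milliseconds since boot, as H:MM:SS."""
    seconds = ms // 1000
    return f"{seconds // 3600}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


if __name__ == "__main__":
    for tick in (38675227, 38714257):
        print(tick, uptime(tick))
```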
Tom,
On one of my drives, I saw a similar thing. The symptom for me was that ‘power simcom off’ would lock up the console, and wifi could not in general connect to any stations any more. Everything else appeared to be running fine.
Since using the latest ‘wifirefactor’ code, I haven’t seen this. But, I haven’t used Access Point mode, or Web Server. I’m trying to keep things simple, to get a base system stable first, before adding more complexity, so I just have an OVMS server v2, a wifi scanning client (connecting to any available access point), and a simcom modem running. That seems stable for the past two days for me.
So, I’ve just merged in the ‘wifirefactor’ code to master. Can you try with that code, and see if you still get issues?
Regards, Mark.
On 23/03/18 21:52, Mark Webb-Johnson wrote:
On one of my drives, I saw a similar thing. The symptom for me was that ‘power simcom off’ would lock up the console, and wifi could not in general connect to any stations any more. Everything else appeared to be running fine.
Since using the latest ‘wifirefactor’ code, I haven’t seen this. But, I haven’t used Access Point mode, or Web Server. I’m trying to keep things simple, to get a base system stable first, before adding more complexity, so I just have an OVMS server v2, a wifi scanning client (connecting to any available access point), and a simcom modem running. That seems stable for the past two days for me.
I'm not running a wifi client at all, and I'm fairly sure I have everything else turned off, I'm only using the leaf, simcom and server v2.
So, I’ve just merged in the ‘wifirefactor’ code to master. Can you try with that code, and see if you still get issues?
I'm running it now, I'll let you know how it goes. I have a reproducible testcase: when using the spark m2m service, the ovms never reconnects if I disable and re-activate the simcard. This doesn't freeze the whole system when I power off the simcom so it may be we have two bugs. I don't have the right simcards here so I'll try it on Monday.
Tom,
What do you mean by ‘disable and reactivate the SIM card’? How are you doing that?
The issue I was having was some task (I suspect lwip) gets messed up. Stopping the modem, trying to connect wifi, etc, all just locked up the async console. I think you had the same?
Cellular connectivity should be a different issue. Maybe just can’t connect to the signal - but the logs should tell us more info. We can also directly issue AT commands when the modem is in that state to try to find out why.
Regards, Mark
On 25/03/18 02:41, Mark Webb-Johnson wrote:
What do you mean by ‘disable and reactivate the SIM card’? How are you doing that?
I have access to Spark's m2m service which allows customers to deactivate and reactivate simcards. The deactivation is immediate: I can see the simcom report that it is disconnected, and the ovms starts trying to reconnect. Last time I tried, when I reactivated, the simcom reconnected to the cell phone network but the ovms never re-established a ppp connection. I can send it sms messages and see them show up in the simcom logging so I know some things are working. I last tried this a couple of weeks ago, I'll try again tomorrow. I have a range of devices on this service and some of them reconnect while others don't, so the ovms isn't the only one having trouble.
The issue I was having was some task (I suspect lwip) gets messed up. Stopping the modem, trying to connect wifi, etc, all just locked up the async console. I think you had the same?
I've certainly had situations where doing simcom power off causes the async console to freeze, but I've only tried to do that after the cellular connection has stopped working.
Cellular connectivity should be a different issue. Maybe just can’t connect to the signal - but the logs should tell us more info. We can also directly issue AT commands when the modem is in that state to try to find out why.
Any advice on which commands to try when it's not connected?
I think this may be a different issue, and maybe cellular related. It depends on the network, but the cellular standards do have a mechanism where if the cellular network denies registration to a connecting client, the client is supposed to back off and not try again for a while. I know SIMCOM supports this, because in the early days we had this with Hologram. This is the “FPLMN” list. You can check the status of that list with:
    AT+CRSM=176,28539,0,0,12
The response is a list of cellular network codes that are currently blocked. The list can be manually cleared with:
    AT+CRSM=214,28539,0,0,12,"FFFFFFFFFFFFFFFFFFFFFFFF"
Then power cycle the modem.
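For reading the reply to the AT+CRSM=176 query by hand, here is a minimal Python sketch. It assumes the 12-byte payload is the standard forbidden-PLMN file (file ID 28539 is 0x6F7B), holding four 3-byte entries in the usual 3GPP nibble-swapped BCD layout, with FFFFFF marking an empty slot. The function names and the sample payload are made up for illustration; only the hex payload string from the reply is passed in, not the status words that precede it.

```python
from typing import List, Optional


def decode_plmn(entry: bytes) -> Optional[str]:
    """Decode one 3-byte PLMN entry into 'MCC-MNC', or None for an empty slot."""
    if len(entry) != 3:
        raise ValueError("a PLMN entry is exactly 3 bytes")
    if entry == b"\xff\xff\xff":
        return None  # unused slot
    # Digits are stored low nibble first: MCC1, MCC2, MCC3, MNC3, MNC1, MNC2.
    mcc = f"{entry[0] & 0x0F:X}{entry[0] >> 4:X}{entry[1] & 0x0F:X}"
    mnc = f"{entry[2] & 0x0F:X}{entry[2] >> 4:X}"
    mnc3 = entry[1] >> 4
    if mnc3 != 0xF:  # 0xF means the MNC has only two digits
        mnc += f"{mnc3:X}"
    return f"{mcc}-{mnc}"


def decode_fplmn(hex_payload: str) -> List[str]:
    """Return the blocked networks listed in a hex payload string."""
    raw = bytes.fromhex(hex_payload)
    entries = [decode_plmn(raw[i:i + 3]) for i in range(0, len(raw) - 2, 3)]
    return [e for e in entries if e is not None]


# Hypothetical payloads, not taken from any log in this thread.
print(decode_fplmn("35F042130014FFFFFFFFFFFF"))  # ['530-24', '310-410']
print(decode_fplmn("FFFFFFFFFFFFFFFFFFFFFFFF"))  # []
```

An all-F payload decodes to an empty list, which is the state the clearing command above is meant to restore.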
Any advice on which commands to try when it's not connected?
Assuming ‘simcom status’ shows the mux is up and running, then just try to talk to the modem and see what it is doing:
    simcom muxtx 3 AT
Just try to get AT command responses first. If that works, then we need to see why it is not connecting. Check network registration status, COPS status, etc.
Regards, Mark.
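As a rough aid for the registration check: the standard query is AT+CREG?, and a solicited reply has the form "+CREG: <n>,<stat>". Whether it can be sent exactly as "simcom muxtx 3 AT+CREG?" on this firmware is an assumption, not something confirmed in this thread. The Python sketch below only interprets the <stat> field of a reply copied from the log; the function name is hypothetical and the status table covers the basic values 0 to 5.

```python
import re

# Basic <stat> values of the standard +CREG reply.
CREG_STATUS = {
    0: "not registered, not searching",
    1: "registered, home network",
    2: "not registered, searching",
    3: "registration denied",
    4: "unknown",
    5: "registered, roaming",
}

# Matches the solicited form "+CREG: <n>,<stat>"; extra fields are ignored.
_CREG_RE = re.compile(r"\+CREG:\s*(\d+)\s*,\s*(\d+)")


def parse_creg(response: str) -> str:
    """Return a readable registration status from a +CREG reply."""
    match = _CREG_RE.search(response)
    if match is None:
        raise ValueError("no solicited +CREG reply found")
    stat = int(match.group(2))
    return CREG_STATUS.get(stat, f"unrecognised status {stat}")


print(parse_creg("+CREG: 0,1\r\nOK"))  # registered, home network
print(parse_creg("+CREG: 0,3"))        # registration denied
```

A "registration denied" result would be consistent with the FPLMN back-off behaviour described earlier, while "not registered, searching" points more towards signal or coverage.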
On 25/03/18 02:41, Mark Webb-Johnson wrote:
The issue I was having was some task (I suspect lwip) gets messed up. Stopping the modem, trying to connect wifi, etc, all just locked up the async console. I think you had the same?
Rewinding this thread to the "'power simcom off' freezes the console" aspect, as distinct from the issue of disabling the simcard. I had the module disconnect again this evening, though without a datalogger attached. Some more information about this state:
- The mux is up
- AT communication with the simcom works on channel 3 and logs the expected tx and rx
- The simcom is connected to the cellular network
- No log messages recording periodic communications with the simcom are being recorded
- The monotonic and park time counters are not advancing
Could the cause of this problem be that the timers have stopped? This could explain why the periodic interrogation of the simcom has stopped, and perhaps simcom power off has a spin wait or other loop waiting forever for the timer to advance?
What is the best way to investigate the state of the timers and periodic execution?
See attached for a transcript of this debugging session (ovms_2018-03-25T09_29_07+0000.log.bz2).
I think that the housekeeping task is locked up. With the latest code I have, the per-10-minute housekeeping message has stopped. My guess is still the ppp code, during session teardown. I sent a detailed message on this an hour or so ago, with my analysis.
Regards, Mark.
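One way to spot this kind of stall in a captured serial log is to compare the last timestamp seen for a periodic log tag against the latest timestamp in the log overall. The Python sketch below assumes the usual ESP-IDF line format, "I (79663340) ovms-server-v2: Send MP-0", with the number being milliseconds since boot. The tag name "housekeeping" and the threshold are assumptions for illustration and should be replaced with whatever tag the periodic message really uses.

```python
import re
from typing import List, Optional, Tuple

# Level letter, millisecond timestamp in parentheses, then "tag: ".
# search() is used so a leading console prompt such as "OVMS > " is tolerated.
_LOG_RE = re.compile(r"\b[EWIDV] \((\d+)\) ([\w.-]+): ")


def last_seen(lines: List[str], tag: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (last timestamp logged under tag, latest timestamp overall)."""
    last_tag = None
    latest = None
    for line in lines:
        match = _LOG_RE.search(line)
        if match is None:
            continue
        stamp = int(match.group(1))
        latest = stamp if latest is None else max(latest, stamp)
        if match.group(2) == tag:
            last_tag = stamp
    return last_tag, latest


def is_stalled(lines: List[str], tag: str, max_gap_ms: int) -> bool:
    """True if tag has been silent for longer than max_gap_ms."""
    last_tag, latest = last_seen(lines, tag)
    if latest is None:
        return False  # nothing parseable in the log
    if last_tag is None:
        return True   # the tag never appeared at all
    return latest - last_tag > max_gap_ms


# Made-up sample lines in the assumed format.
sample = [
    "I (1000) housekeeping: tick",
    "I (2000) ovms-server-v2: Send MP-0",
    "OVMS > I (700000) simcom: rx",
]
print(is_stalled(sample, "housekeeping", 600000))  # True
print(is_stalled(sample, "simcom", 600000))        # False
```

This only shows that a periodic message went quiet while other tasks kept logging; it says nothing about why the task stopped.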
On 25/03/18 22:58, Mark Webb-Johnson wrote:
I think that the housekeeping task is locked up. With the latest code I have, the per-10-minute housekeeping message has stopped.
It certainly has -- Housekeeping::Ticker1 increments monotonic and that isn't working. I had a look at the simcom power code and there is a vTaskDelay(1000 / portTICK_PERIOD_MS) in simcom::PowerCycle() but I'm not clear whether that is called when you power off the modem.
My guess is still the ppp code, during session teardown.
Would moar printfs to pin down where it dies help? I see the watchdog changes you've added, I'm not running them yet. I'll do that tomorrow.
I sent a detailed message on this an hour or so, with my analysis of this.
Read that, tried a few things from it. Is it possible to break into the system with gdb over the serial port and do a "thread apply all bt" or similar to get the state of all the threads? Or place breakpoints in the right places and resume the system?
Would moar printfs to pin down where it dies help?
I added a log output to ppp just after closing the connection, to see if we come back from the call to shutdown the ppp link.
Is it possible to break into the system with gdb over the serial port and do a "thread apply all bt" or similar to get the state of all the threads? Or place breakpoints in the right places and resume the system?
I haven’t managed to get gdb to recognise freertos tasks. I’ve seen references to doing this over JTAG, but not via gdbstub. Anybody else have any luck with this? Regards, Mark.
I have not dug deeply, but all the references I have seen to using gdb for debugging on the ESP32 require extra JTAG hardware. The ESP-WROVER-KIT includes the FTDI FT2232HL USB bridge that allows a JTAG connection over USB in addition to the UART connection. I don't know if it would have been feasible to include that debugging function into the OVMS v3 hardware design.
-- Steve
I did look at JTAG, but it would have meant a few changes:
- Needs some GPIOs that we don’t have spare
- Needs to change to FTDI dual async chip
Overall, just too risky to add at the late stage, so we didn’t attempt it.
Regards, Mark.
Enabling the watchdog timer is a good idea. I've had it enabled in my config for some time. When I was recently working on enhancements to the "module memory" command and had an infinite loop in my code, I got a timer trap:
    Task watchdog got triggered. The following tasks did not reset the watchdog in time:
     - IDLE (CPU 1)
    Tasks currently running:
    CPU 0: IDLE
    CPU 1: AsyncConsole
But this will only catch the problem if there is some task that is running away. It could be some kind of synchronization block instead.
If there is a particular command that seems to get stuck, we could consider (temporarily) invoking that command as a separate task so that the async console would still be usable to investigate. Another possibility that could be used in scenarios where wifi is still working and doesn't need to be brought up/down would be to connect with telnet or ssh and then issue the simcom command in that console.
I have added printing of each task's state in the "module tasks" output, but I have yet to observe any task in the "Run" state. (I would expect AsyncConsole to be in Run state while executing that command, but it is not.) I might be able to add an option on the command to print each task's PC, which the python code would look up to translate to a source line number.
-- Steve
On Sun, 25 Mar 2018, Stephen Casner wrote:
I might be able to add an option on the command to print each task's PC, which the python code would look up to translate to a source line number.
I have now done this. "module tasks" works as before, but "module tasks stack" shows a crude stack trace for each task. -- Steve
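To illustrate what "crude" means here: as Steve explains later in this thread, the trace is built by scanning the task's stack for 0x80xxxxxx values and subtracting 0x40000000 from each, because the high address bits carry extra information about the stack frame. The sketch below shows that filter in host-side Python on a made-up list of stack words. It is not the firmware code, and the function name is hypothetical.

```python
from typing import List


def crude_backtrace(stack_words: List[int]) -> List[int]:
    """Return candidate code addresses found among raw 32-bit stack words.

    Any word of the form 0x80xxxxxx is treated as a saved return address
    and mapped back to 0x40xxxxxx. Ordinary data that happens to match is
    reported too, so false entries are expected.
    """
    return [word - 0x40000000 for word in stack_words if (word >> 24) == 0x80]


# Hypothetical stack contents: two plausible return addresses, some data,
# and a bare 0x80000000 that turns into a bogus 0x40000000 entry.
words = [0x3FFC8BAC, 0x801FE551, 0x00000000, 0x80000000, 0x8008EBFD]
print([hex(addr) for addr in crude_backtrace(words)])
# ['0x401fe551', '0x40000000', '0x4008ebfd']
```

The bogus 0x40000000 entry in this example is the same kind of false hit that shows up as "?? ??:0" once the addresses are run through addr2line.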
That is pretty cool. The ‘make monitor’ auto-magic address discovery makes the output useful:
    3FFC8BAC 16 Blk tiT 484 692 6144 0 0 128 0x401fe551 0x4008ebfd 0x401f23f8 0x401ebd0c 0x401e978e 0x401e978e
    0x401fe551: sys_arch_mbox_fetch at /Users/mark/esp/esp-idf/components/lwip/port/freertos/sys_arch.c:548
    0x4008ebfd: xQueueGenericReceive at /Users/mark/esp/esp-idf/components/freertos/./queue.c:2037
    0x401f23f8: sys_timeouts_mbox_fetch at /Users/mark/esp/esp-idf/components/lwip/core/timers.c:551
    0x401ebd0c: netif_poll at /Users/mark/esp/esp-idf/components/lwip/core/netif.c:440
    0x401e978e: tcpip_thread at /Users/mark/esp/esp-idf/components/lwip/api/tcpip.c:474
    0x401e978e: tcpip_thread at /Users/mark/esp/esp-idf/components/lwip/api/tcpip.c:474
Now, the only issue is recreating the problem in a manner where I can get in with ‘make monitor’. Or, maybe not…
    $ ~/esp/xtensa-esp32-elf/bin/xtensa-esp32-elf-addr2line -pfiaC -e build/ovms3.elf 0x401fe551 0x4008ebfd 0x401f23f8 0x401ebd0c 0x401e978e 0x401e978e
    0x401fe551: sys_arch_mbox_fetch at /Users/mark/esp/esp-idf/components/lwip/port/freertos/sys_arch.c:548
    0x4008ebfd: xQueueGenericReceive at /Users/mark/esp/esp-idf/components/freertos/./queue.c:2037
    0x401f23f8: sys_timeouts_mbox_fetch at /Users/mark/esp/esp-idf/components/lwip/core/timers.c:551
    0x401ebd0c: netif_poll at /Users/mark/esp/esp-idf/components/lwip/core/netif.c:440
    0x401e978e: tcpip_thread at /Users/mark/esp/esp-idf/components/lwip/api/tcpip.c:474
    0x401e978e: tcpip_thread at /Users/mark/esp/esp-idf/components/lwip/api/tcpip.c:474
It seems that I only need the ‘.elf’ file that was used to build the .bin that went on the module.
This might be workable. I will try to keep the .elf and .bin files, and see if I can get something the next time it locks up (although I think my watchdog reboot may now avoid that).
I’m 99.9% certain the issue is in the tiT tcpip task. Both wifi and pppos go down, while other parts of the system still seem to be running ok.
Regards, Mark.
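For traces captured without ‘make monitor’ (for example from an ssh console or a saved log), the manual addr2line step can be scripted. The Python sketch below pulls the 0x4xxxxxxx addresses out of one "module tasks stack" line and builds the same command line as shown above. The function name is hypothetical, and the default tool name assumes the toolchain is on PATH; the default .elf path follows the build/ovms3.elf used above and has to be the file that produced the flashed .bin.

```python
import re
from typing import List

# Code addresses as printed in the trace: "0x4" followed by seven hex digits.
_ADDR_RE = re.compile(r"\b0x4[0-9a-fA-F]{7}\b")


def addr2line_command(task_line: str,
                      elf: str = "build/ovms3.elf",
                      tool: str = "xtensa-esp32-elf-addr2line") -> List[str]:
    """Build an addr2line argument list for the addresses in one task line."""
    addresses = _ADDR_RE.findall(task_line)
    return [tool, "-pfiaC", "-e", elf, *addresses]


line = "3FFC8BAC 16 Blk tiT 484 692 6144 0 0 128 0x401fe551 0x4008ebfd"
print(" ".join(addr2line_command(line)))
# xtensa-esp32-elf-addr2line -pfiaC -e build/ovms3.elf 0x401fe551 0x4008ebfd
```

The returned list can be passed straight to subprocess.run(). The task handle at the start of the line (3FFC8BAC) is skipped because it has no 0x prefix.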
Now that we can see this, it is quite scary:
    Number of Tasks = 17 Stack: Now Max Total Heap 32-bit SPIRAM
    3FFF2658 23 Rdy NetManTask 3116 5340 7168 12152 27488 0
    $ ~/esp/xtensa-esp32-elf/bin/xtensa-esp32-elf-addr2line -pfiaC -e build/ovms3.elf 0x40000000 0x40000000 0x400ffe7c 0x400f2466 0x400f45f6 0x400f4726 0x400d8335 0x40152cdd 0x40151b95 0x400d81ee 0x400d830e 0x400e9d20 0x400e2e7d 0x400e2f94 0x400e2f86 0x400e2f86 0x400e2fbc 0x400edbab 0x400f11e4 0x400f124b 0x400edbda 0x40152976 0x401529e9 0x402050ee 0x400d8ded 0x400e5b4c 0x400d8474 0x400f6d25 0x400f6f1d 0x4008c604 0x40081374 0x400da5bb 0x400da622 0x400f647e 0x401e58d8 0x401d12f6 0x400ffe7c 0x401d1d8c 0x400f2466 0x400f45f6 0x400f777a 0x400f778d 0x400f778d 0x400f7801 0x400f7a59 0x400f7ce8 0x400f4494 0x400f4494 0x400eaf4d 0x400eaf8c
    0x40000000: ?? ??:0
    0x40000000: ?? ??:0
    0x400ffe7c: gettimeofday at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/syscalls/../../../.././newlib/libc/syscalls/sysgettod.c:13
    0x400f2466: cs_time at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
    0x400f45f6: mg_time at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
    0x400f4726: mg_send at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
    0x400d8335: SendCallback(WOLFSSH*, void*, unsigned int, void*) at vehicle/OVMS.V3/components/console_ssh/src/console_ssh.cpp:976
    0x40152cdd: HighwaterCheck at vehicle/OVMS.V3/components/wolfssh/src/internal.c:4194
     (inlined by) SendBuffered at vehicle/OVMS.V3/components/wolfssh/src/internal.c:849
    0x40151b95: wolfSSH_stream_send at vehicle/OVMS.V3/components/wolfssh/src/ssh.c:635
    0x400d81ee: ConsoleSSH::write(void const*, unsigned int) at vehicle/OVMS.V3/components/console_ssh/src/console_ssh.cpp:926
    0x400d830e: ConsoleSSH::printf(char const*, ...) at vehicle/OVMS.V3/components/console_ssh/src/console_ssh.cpp:897
    0x400e9d20: module_tasks(int, OvmsWriter*, OvmsCommand*, int, char const* const*) at vehicle/OVMS.V3/main/./ovms_module.cpp:823
    0x400e2e7d: OvmsCommand::Execute(int, OvmsWriter*, int, char const* const*) at vehicle/OVMS.V3/main/./ovms_command.cpp:94
    0x400e2f94: OvmsCommand::Execute(int, OvmsWriter*, int, char const* const*) at vehicle/OVMS.V3/main/./ovms_command.cpp:94
    0x400e2f86: OvmsCommand::Execute(int, OvmsWriter*, int, char const* const*) at vehicle/OVMS.V3/main/./ovms_command.cpp:94
    0x400e2f86: OvmsCommand::Execute(int, OvmsWriter*, int, char const* const*) at vehicle/OVMS.V3/main/./ovms_command.cpp:94
    0x400e2fbc: OvmsCommandApp::Execute(int, OvmsWriter*, int, char const* const*) at vehicle/OVMS.V3/main/./ovms_command.cpp:94
    0x400edbab: Execute(microrl*, int, char const* const*) at vehicle/OVMS.V3/main/./ovms_shell.cpp:49
    0x400f11e4: print_prompt at vehicle/OVMS.V3/components/microrl/./microrl.c:281
     (inlined by) new_line_handler at vehicle/OVMS.V3/components/microrl/./microrl.c:621
    0x400f124b: microrl_insert_char at vehicle/OVMS.V3/components/microrl/./microrl.c:669
    0x400edbda: OvmsShell::ProcessChar(char) at vehicle/OVMS.V3/main/./ovms_shell.cpp:70
    0x40152976: GrowBuffer at vehicle/OVMS.V3/components/wolfssh/src/internal.c:4194
    0x401529e9: GetInputData at vehicle/OVMS.V3/components/wolfssh/src/internal.c:4194
    0x402050ee: OvmsShell::ProcessChars(char const*, int) at vehicle/OVMS.V3/main/./ovms_shell.cpp:75 (discriminator 2)
    0x400d8ded: ConsoleSSH::HandleDeviceEvent(void*) at vehicle/OVMS.V3/components/console_ssh/src/console_ssh.cpp:479
    0x400e5b4c: OvmsConsole::Poll(unsigned int, void*) at vehicle/OVMS.V3/main/./ovms_console.cpp:153
    0x400d8474: ConsoleSSH::Receive() at vehicle/OVMS.V3/components/console_ssh/src/console_ssh.cpp:308
    0x400f6d25: mg_http_handler2 at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
    0x400f6f1d: mg_http_handler at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
    0x4008c604: vTaskExitCritical at /Users/mark/esp/esp-idf/components/freertos/./tasks.c:4837
    0x40081374: esp_crosscore_int_send at /Users/mark/esp/esp-idf/components/esp32/./crosscore_int.c:103
    0x400da5bb: OvmsSSH::EventHandler(mg_connection*, int, void*) at vehicle/OVMS.V3/components/console_ssh/src/console_ssh.cpp:110
    0x400da622: MongooseHandler(mg_connection*, int, void*) at vehicle/OVMS.V3/components/console_ssh/src/console_ssh.cpp:61
    0x400f647e: mg_call at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
    0x401e58d8: netconn_recved at /Users/mark/esp/esp-idf/components/lwip/api/api_lib.c:830 (discriminator 4)
    0x401d12f6: lwip_recvfrom at /Users/mark/esp/esp-idf/components/lwip/api/sockets.c:3229
    0x400ffe7c: gettimeofday at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/syscalls/../../../.././newlib/libc/syscalls/sysgettod.c:13
    0x401d1d8c: lwip_recvfrom_r at /Users/mark/esp/esp-idf/components/lwip/api/sockets.c:3229
    0x400f2466: cs_time at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
    0x400f45f6: mg_time at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
    0x400f777a: mg_recv_common at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
    0x400f778d: mg_if_recv_tcp_cb at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
    0x400f778d: mg_if_recv_tcp_cb at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
    0x400f7801: mg_handle_tcp_read at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
    0x400f7a59: mg_mgr_handle_conn at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
    0x400f7ce8: mg_socket_if_poll at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
    0x400f4494: mg_mgr_poll at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
    0x400f4494: mg_mgr_poll at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
    0x400eaf4d: OvmsNetManager::MongooseTask() at vehicle/OVMS.V3/main/./ovms_netmanager.cpp:380
    0x400eaf8c: MongooseRawTask(void*) at vehicle/OVMS.V3/main/./ovms_netmanager.cpp:370
That is a huge stack for a little box.
Regards, Mark.
Please be warned that there may be false entries like 0x40000000 in this crude stack trace because my code is simply scanning through the stack looking for 0x80xxxxxx values and subtracting 0x40000000 from them to print. The reason for that translation is that the system uses the high bits of the address field to encode some information about the size of the register field in the stack frame. But to Mark's point about the deep stack, we chose to concentrate the work in the NetManTask with one large stack allocation rather than having multiple tasks with perhaps somewhat less allocation. -- Steve On Mon, 26 Mar 2018, Mark Webb-Johnson wrote:
Now that we can now see this, it is quite scary:
Number of Tasks = 17 Stack: Now Max Total Heap 32-bit SPIRAM 3FFF2658 23 Rdy NetManTask 3116 5340 7168 12152 27488 0
$ ~/esp/xtensa-esp32-elf/bin/xtensa-esp32-elf-addr2line -pfiaC -e build/ovms3.elf 0x40000000 0x40000000 0x400ffe7c 0x400f2466 0x400f45f6 0x400f4726 0x400d8335 0x40152cdd 0x40151b95 0x400d81ee 0x400d830e 0x400e9d20 0x400e2e7d 0x400e2f94 0x400e2f86 0x400e2f86 0x400e2fbc 0x400edbab 0x400f11e4 0x400f124b 0x400edbda 0x40152976 0x401529e9 0x402050ee 0x400d8ded 0x400e5b4c 0x400d8474 0x400f6d25 0x400f6f1d 0x4008c604 0x40081374 0x400da5bb 0x400da622 0x400f647e 0x401e58d8 0x401d12f6 0x400ffe7c 0x401d1d8c 0x400f2466 0x400f45f6 0x400f777a 0x400f778d 0x400f778d 0x400f7801 0x400f7a59 0x400f7ce8 0x400f4494 0x400f4494 0x400eaf4d 0x400eaf8c
0x40000000: ?? ??:0
0x40000000: ?? ??:0
0x400ffe7c: gettimeofday at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/syscalls/../../../.././newlib/libc/syscalls/sysgettod.c:13
0x400f2466: cs_time at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
0x400f45f6: mg_time at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
0x400f4726: mg_send at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
0x400d8335: SendCallback(WOLFSSH*, void*, unsigned int, void*) at vehicle/OVMS.V3/components/console_ssh/src/console_ssh.cpp:976
0x40152cdd: HighwaterCheck at vehicle/OVMS.V3/components/wolfssh/src/internal.c:4194
 (inlined by) SendBuffered at vehicle/OVMS.V3/components/wolfssh/src/internal.c:849
0x40151b95: wolfSSH_stream_send at vehicle/OVMS.V3/components/wolfssh/src/ssh.c:635
0x400d81ee: ConsoleSSH::write(void const*, unsigned int) at vehicle/OVMS.V3/components/console_ssh/src/console_ssh.cpp:926
0x400d830e: ConsoleSSH::printf(char const*, ...) at vehicle/OVMS.V3/components/console_ssh/src/console_ssh.cpp:897
0x400e9d20: module_tasks(int, OvmsWriter*, OvmsCommand*, int, char const* const*) at vehicle/OVMS.V3/main/./ovms_module.cpp:823
0x400e2e7d: OvmsCommand::Execute(int, OvmsWriter*, int, char const* const*) at vehicle/OVMS.V3/main/./ovms_command.cpp:94
0x400e2f94: OvmsCommand::Execute(int, OvmsWriter*, int, char const* const*) at vehicle/OVMS.V3/main/./ovms_command.cpp:94
0x400e2f86: OvmsCommand::Execute(int, OvmsWriter*, int, char const* const*) at vehicle/OVMS.V3/main/./ovms_command.cpp:94
0x400e2f86: OvmsCommand::Execute(int, OvmsWriter*, int, char const* const*) at vehicle/OVMS.V3/main/./ovms_command.cpp:94
0x400e2fbc: OvmsCommandApp::Execute(int, OvmsWriter*, int, char const* const*) at vehicle/OVMS.V3/main/./ovms_command.cpp:94
0x400edbab: Execute(microrl*, int, char const* const*) at vehicle/OVMS.V3/main/./ovms_shell.cpp:49
0x400f11e4: print_prompt at vehicle/OVMS.V3/components/microrl/./microrl.c:281
 (inlined by) new_line_handler at vehicle/OVMS.V3/components/microrl/./microrl.c:621
0x400f124b: microrl_insert_char at vehicle/OVMS.V3/components/microrl/./microrl.c:669
0x400edbda: OvmsShell::ProcessChar(char) at vehicle/OVMS.V3/main/./ovms_shell.cpp:70
0x40152976: GrowBuffer at vehicle/OVMS.V3/components/wolfssh/src/internal.c:4194
0x401529e9: GetInputData at vehicle/OVMS.V3/components/wolfssh/src/internal.c:4194
0x402050ee: OvmsShell::ProcessChars(char const*, int) at vehicle/OVMS.V3/main/./ovms_shell.cpp:75 (discriminator 2)
0x400d8ded: ConsoleSSH::HandleDeviceEvent(void*) at vehicle/OVMS.V3/components/console_ssh/src/console_ssh.cpp:479
0x400e5b4c: OvmsConsole::Poll(unsigned int, void*) at vehicle/OVMS.V3/main/./ovms_console.cpp:153
0x400d8474: ConsoleSSH::Receive() at vehicle/OVMS.V3/components/console_ssh/src/console_ssh.cpp:308
0x400f6d25: mg_http_handler2 at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
0x400f6f1d: mg_http_handler at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
0x4008c604: vTaskExitCritical at /Users/mark/esp/esp-idf/components/freertos/./tasks.c:4837
0x40081374: esp_crosscore_int_send at /Users/mark/esp/esp-idf/components/esp32/./crosscore_int.c:103
0x400da5bb: OvmsSSH::EventHandler(mg_connection*, int, void*) at vehicle/OVMS.V3/components/console_ssh/src/console_ssh.cpp:110
0x400da622: MongooseHandler(mg_connection*, int, void*) at vehicle/OVMS.V3/components/console_ssh/src/console_ssh.cpp:61
0x400f647e: mg_call at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
0x401e58d8: netconn_recved at /Users/mark/esp/esp-idf/components/lwip/api/api_lib.c:830 (discriminator 4)
0x401d12f6: lwip_recvfrom at /Users/mark/esp/esp-idf/components/lwip/api/sockets.c:3229
0x400ffe7c: gettimeofday at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/syscalls/../../../.././newlib/libc/syscalls/sysgettod.c:13
0x401d1d8c: lwip_recvfrom_r at /Users/mark/esp/esp-idf/components/lwip/api/sockets.c:3229
0x400f2466: cs_time at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
0x400f45f6: mg_time at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
0x400f777a: mg_recv_common at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
0x400f778d: mg_if_recv_tcp_cb at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
0x400f778d: mg_if_recv_tcp_cb at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
0x400f7801: mg_handle_tcp_read at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
0x400f7a59: mg_mgr_handle_conn at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
0x400f7ce8: mg_socket_if_poll at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
0x400f4494: mg_mgr_poll at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
0x400f4494: mg_mgr_poll at vehicle/OVMS.V3/components/mongoose/mongoose/mongoose.c:11373
0x400eaf4d: OvmsNetManager::MongooseTask() at vehicle/OVMS.V3/main/./ovms_netmanager.cpp:380
0x400eaf8c: MongooseRawTask(void*) at vehicle/OVMS.V3/main/./ovms_netmanager.cpp:370
That is a huge stack for a little box.
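To put a number on that, here is a minimal Python sketch. It assumes the three stack columns in the "module tasks" row are current usage, peak usage and total allocation in bytes, as the "Stack: Now Max Total" header suggests. The function name is made up for illustration.

```python
def stack_utilisation(now, peak, total):
    """Return (current, peak) stack usage as fractions of the allocation."""
    if total <= 0:
        raise ValueError("total stack size must be positive")
    return now / total, peak / total


# NetManTask row quoted above: Now=3116, Max=5340, Total=7168
current, peak = stack_utilisation(3116, 5340, 7168)
print(f"current {current:.1%}, peak {peak:.1%}, headroom {7168 - 5340} bytes")
```

Under that reading of the columns, NetManTask has peaked at roughly three quarters of its 7168-byte allocation.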
Regards, Mark.
On 26 Mar 2018, at 3:31 PM, Mark Webb-Johnson <mark@webb-johnson.net> wrote:
That is pretty cool. The ‘make monitor’ auto-magic address discovery makes the output useful:
3FFC8BAC 16 Blk tiT 484 692 6144 0 0 128
0x401fe551 0x4008ebfd 0x401f23f8 0x401ebd0c 0x401e978e 0x401e978e
0x401fe551: sys_arch_mbox_fetch at /Users/mark/esp/esp-idf/components/lwip/port/freertos/sys_arch.c:548
0x4008ebfd: xQueueGenericReceive at /Users/mark/esp/esp-idf/components/freertos/./queue.c:2037
0x401f23f8: sys_timeouts_mbox_fetch at /Users/mark/esp/esp-idf/components/lwip/core/timers.c:551
0x401ebd0c: netif_poll at /Users/mark/esp/esp-idf/components/lwip/core/netif.c:440
0x401e978e: tcpip_thread at /Users/mark/esp/esp-idf/components/lwip/api/tcpip.c:474
0x401e978e: tcpip_thread at /Users/mark/esp/esp-idf/components/lwip/api/tcpip.c:474
Now, the only issue is recreating the problem in a manner where I can get in with ‘make monitor’. Or, maybe not…
$ ~/esp/xtensa-esp32-elf/bin/xtensa-esp32-elf-addr2line -pfiaC -e build/ovms3.elf 0x401fe551 0x4008ebfd 0x401f23f8 0x401ebd0c 0x401e978e 0x401e978e
0x401fe551: sys_arch_mbox_fetch at /Users/mark/esp/esp-idf/components/lwip/port/freertos/sys_arch.c:548
0x4008ebfd: xQueueGenericReceive at /Users/mark/esp/esp-idf/components/freertos/./queue.c:2037
0x401f23f8: sys_timeouts_mbox_fetch at /Users/mark/esp/esp-idf/components/lwip/core/timers.c:551
0x401ebd0c: netif_poll at /Users/mark/esp/esp-idf/components/lwip/core/netif.c:440
0x401e978e: tcpip_thread at /Users/mark/esp/esp-idf/components/lwip/api/tcpip.c:474
0x401e978e: tcpip_thread at /Users/mark/esp/esp-idf/components/lwip/api/tcpip.c:474
It seems that I only need the ‘.elf’ file that was used to build the .bin that went on the module.
This might be workable. I will try to keep the .elf and .bin files, and see if I can get something the next time it locks up (although I think my watchdog reboot may now avoid that).
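For anyone wanting to script this, here is a small Python sketch that rebuilds the same addr2line invocation from a list of addresses and a saved .elf file. The helper name and the default tool name are assumptions; the flags are the ones used in the command above.

```python
import shlex


def addr2line_command(elf_path, addresses, tool="xtensa-esp32-elf-addr2line"):
    """Build the argv for decoding code addresses against a saved .elf file."""
    # -p pretty-print, -f function names, -i inlined frames,
    # -a show addresses, -C demangle C++ names
    argv = [tool, "-pfiaC", "-e", elf_path]
    argv += [f"0x{addr:08x}" for addr in addresses]
    return argv


# Addresses from the tiT task trace quoted above
tit_trace = [0x401FE551, 0x4008EBFD, 0x401F23F8, 0x401EBD0C, 0x401E978E, 0x401E978E]
print(shlex.join(addr2line_command("build/ovms3.elf", tit_trace)))
```

The returned list can be handed to subprocess.run() when the toolchain is on the PATH. The .elf must be the one that produced the .bin running on the module, otherwise the addresses resolve to the wrong lines.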
I’m 99.9% certain the issue is in the TiT tcpip task. Both wifi and pppos go down, while other parts of the system still seem to be running ok.
Regards, Mark.
On 26 Mar 2018, at 2:33 PM, Stephen Casner <casner@acm.org <mailto:casner@acm.org>> wrote:
On Sun, 25 Mar 2018, Stephen Casner wrote:
I might be able to add an option on the command to print each task's PC, which the python code would look up to translate to a source line number.
I have now done this. "module tasks" works as before, but "module tasks stack" shows a crude stack trace for each task.
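As a rough model of how such a crude trace can be derived, the Python sketch below follows the approach described elsewhere in this thread: scan the stack words for 0x80xxxxxx values and subtract 0x40000000 from each. It is only an illustration. The real code runs on the module, and the names and sample words here are invented.

```python
TAG_MASK = 0xFF000000     # top byte of each 32-bit stack word
TAG_VALUE = 0x80000000    # the "0x80xxxxxx" pattern being searched for
CODE_OFFSET = 0x40000000  # subtracted to get a printable code address


def crude_stack_trace(stack_words):
    """Return candidate code addresses found among 32-bit stack words."""
    trace = []
    for word in stack_words:
        if (word & TAG_MASK) == TAG_VALUE:
            trace.append(word - CODE_OFFSET)
    return trace


# Invented sample: two plausible return addresses, one data pointer,
# one small integer, and a bare 0x80000000 that yields a false entry.
sample = [0x3FFB1234, 0x800F2466, 0x80000000, 0x00000010, 0x800D8335]
print([hex(addr) for addr in crude_stack_trace(sample)])
```

Because the scan cannot tell a return address from ordinary data that happens to match the pattern, entries such as 0x40000000 can appear in the result.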
-- Steve
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.teslaclub.hk <mailto:OvmsDev@lists.teslaclub.hk>
http://lists.teslaclub.hk/mailman/listinfo/ovmsdev
But this will only catch the problem if there is some task that is running away. It could be some kind of synchronization block instead.
Yes, that is my guess (a sync block).
Another possibility that could be used in scenarios where wifi is still working and doesn't need to be brought up/down would be to connect with telnet or ssh and then issue the simcom command in that console.
From my experience, the async console is fine, up until the time we touch the network (at which point the async console locks up). I think the issue is in the TiT tcp/ip task (which is the only viable explanation for those tcp/ip timeouts I see on wifi, shortly after ppp goes down):
I (136619030) gsm-ppp: Shutting down (hard)...
I (136619040) events: Signal(system.modem.down)
I (136619040) netmanager: Interface priority is st1 (x.y.z.212/255.255.248.0 gateway x.y.z.64)
I (179290100) wifi: bcn_timout,ap_probe_send_start
I (179292610) wifi: ap_probe_send over, resett wifi status to disassoc
I (179292610) wifi: state: run -> init (1)
I (179292620) wifi: pm stop, total sleep time: 0/138543477
I (179292620) wifi: n:13 0, o:13 0, ap:255 255, sta:13 0, prof:1
I think that is 179292 - 136619 = 42673 seconds, nearly 12 hours, though!
With Steve’s latest extension to ‘module tasks stack’, we should be able to narrow this down. I think I will first see if the watchdog works around it.
Regards, Mark.
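The gap between two log lines can be checked mechanically. This Python sketch assumes the number in parentheses is milliseconds since boot, which is consistent with the arithmetic in the message above. The function names are hypothetical.

```python
import re

# Matches the level letter and the timestamp, e.g. "I (136619030) "
LOG_TS = re.compile(r"^[A-Z] \((\d+)\) ")


def log_millis(line):
    """Extract the timestamp (assumed to be milliseconds) from a log line."""
    match = LOG_TS.match(line)
    if match is None:
        raise ValueError(f"no timestamp in log line: {line!r}")
    return int(match.group(1))


def gap_hours(earlier, later):
    """Elapsed time between two log lines, in hours."""
    return (log_millis(later) - log_millis(earlier)) / 3_600_000


ppp_down = "I (136619030) gsm-ppp: Shutting down (hard)..."
wifi_timeout = "I (179290100) wifi: bcn_timout,ap_probe_send_start"
print(f"{gap_hours(ppp_down, wifi_timeout):.2f} hours")
```

For the two lines quoted above this gives a little under 12 hours between the modem going down and the first wifi beacon timeout.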
On 26 Mar 2018, at 12:28 AM, Stephen Casner <casner@acm.org> wrote:
Enabling the watchdog timer is a good idea. I've had it enabled in my config for some time. When I was recently working on enhancements to the "module memory" command and had an infinite loop in my code, I got a timer trap:
Task watchdog got triggered. The following tasks did not reset the watchdog in time:
 - IDLE (CPU 1)
Tasks currently running:
CPU 0: IDLE
CPU 1: AsyncConsole
But this will only catch the problem if there is some task that is running away. It could be some kind of synchronization block instead.
If there is a particular command that seems to get stuck, we could consider (temporarily) invoking that command as a separate task so that the async console would still be usable to investigate. Another possibility that could be used in scenarios where wifi is still working and doesn't need to be brought up/down would be to connect with telnet or ssh and then issue the simcom command in that console.
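The "separate task" idea can be illustrated with a desktop analogy. The Python sketch below uses a worker thread in place of a FreeRTOS task, so it is not OVMS code, and run_detached is a made-up name. The point is only that the caller regains control after a timeout even when the command never returns.

```python
import threading


def run_detached(command, timeout_s):
    """Run command() in a worker thread; report whether it finished in time."""
    result = {}

    def worker():
        result["value"] = command()

    # Daemon thread: a command that never returns cannot keep the process alive.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    finished = not thread.is_alive()
    return finished, result.get("value")


# Stand-in for a command blocked on something that never happens
never_set = threading.Event()
print(run_detached(never_set.wait, 0.1))   # console regains control
print(run_detached(lambda: "ok", 1.0))     # normal command completes
never_set.set()                            # release the stuck worker
```

On the module the equivalent would be a temporary task created just for the suspect command, leaving the console task free to run diagnostics such as "module tasks" while the command is stuck.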
I have added printing of each task's state in the "module tasks" output, but I have yet to observe any task in the "Run" state. (I would expect AsyncConsole to be in Run state while executing that command, but it is not.)
I might be able to add an option on the command to print each task's PC, which the python code would look up to translate to a source line number.
-- Steve
On Sun, 25 Mar 2018, Mark Webb-Johnson wrote:
I think that the housekeeping task is locked up. With the latest code I have, the per-10-minute housekeeping message has stopped.
My guess is still the ppp code, during session teardown.
I sent a detailed message with my analysis of this an hour or so ago.
Regards, Mark.
On 25 Mar 2018, at 5:52 PM, Tom Parker <tom@carrott.org> wrote:
On 25/03/18 02:41, Mark Webb-Johnson wrote:
The issue I was having was some task (I suspect lwip) gets messed up. Stopping the modem, trying to connect wifi, etc, all just locked up the async console. I think you had the same?
Rewinding this thread to the "'power simcom off' freezes the console" aspect, as distinct from the "disabling the SIM card" issue: I had the module disconnect again this evening, though without a datalogger attached. Some more information about this state:
- The mux is up
- AT communication with the simcom works on channel 3 and logs the expected tx and rx
- The simcom is connected to the cellular network
- No log messages recording periodic communications with the simcom are being recorded
- The monotonic and park time counters are not advancing.
Could the cause of this problem be that the timers have stopped? This could explain why the periodic interrogation of the simcom has stopped, and perhaps simcom power off has a spin wait or other loop waiting forever for the timer to advance?
What is the best way to investigate the state of the timers and periodic execution?
See attached for a transcript of this debugging session. <ovms_2018-03-25T09_29_07+0000.log.bz2>
participants (6)
- Greg D
- Greg D.
- Mark Webb-Johnson
- Michael Balzer
- Stephen Casner
- Tom Parker