Since updating my in-car OVMS with a build from the master branch after my WolfSSL update, I've experienced a couple of instances where connectivity from the app was lost. The first time I needed a quick resolution, so I just reset OVMS. It happened again during the night while charging, and this time I connected to the console to check conditions before resetting. What I found was that the async console was not responding at all. Since I have OVMS configured to drop into gdb on a crash, I should have seen something. I've turned on logging to file to see if that sheds any light.

-- Steve
I had another apparent crash today, now with file logging enabled. This occurred while on a drive, so away from the home wifi. The last log messages indicate that the crash happened when bringing up networking via the modem connection:

2021-03-27 12:48:39.688 PDT I (176007218) netmanager: MODEM up (with WIFI client down): starting network with MODEM
2021-03-27 12:48:39.698 PDT I (176007228) time: Starting SNTP client
2021-03-27 12:48:39.708 PDT I (176007238) webserver: Launching Web Server
2021-03-27 12:48:39.708 PDT I (176007238) webserver: Binding to port 80 (http)
2021-03-27 12:48:39.708 PDT I (176007238) webserver: Binding to port 443 (https)
2021-03-27 12:48:39.768 PDT I (176007298) telnet: Launching Telnet Server
2021-03-27 12:48:39.768 PDT I (176007298) ssh: Launching SSH Server
2021-03-27 12:48:39.788 PDT I (176007318) ovms-server-v2: Status: Network is up, so attempt network connection

The last logged activity is to start server v2, and I do have TLS enabled, so this may indicate a problem with the new WolfSSL integration. Does this sound similar to any of the other crashes seen recently?

-- Steve
Steve,

I haven't seen any lockups like yours recently. A single report on a forum about 4 weeks ago was due to a corrupted config store partition, and I'm not sure that was already with the new WolfSSL code.

I've just scanned the current crash reports on my server (updated hourly: https://dexters-web.de/f/firmware/developer/crashreport.csv) for anything WolfSSL related and found only one crash clearly originating within WolfSSL (decoded backtrace attached). That one seems to be unlocking a mutex from the wrong task.

What shows up more often now is the abort from the failed assertion in lock_acquire_generic, which seems to be caused by an ESP_LOG call from within a critical section. According to this info…

https://github.com/espressif/esp-idf/issues/4412#issuecomment-559208996

…that could be the source of our primary issue of unexplained WDTs. But I think that can only either trigger the dog or trigger an abort immediately, not freeze the module.

So… sorry, no clue here.

Regards,
Michael
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
Michael,

Thanks for that info. I think I should build again with the configuration changed to not drop into the gdb stub, since I'm not actively using that for debugging. Then, if the problem is a panic or other crash and I'm just not able to get any response because gdb gets stuck somehow, at least OVMS will recover. Maybe if the server gets a useful crash report we'll learn something.

BTW, I did also get a WDT error after resetting on the previous lockup (before file logging), and after the reset this time there was a hang that did not last long enough to trigger the WDT. Log segment below.

-- Steve

2021-03-27 15:39:20.750 PDT I (15000) ovms-server-v2: Send MP-0 S55.0,M,65535,0.00,done,standard,120,118,32,0,0,290,7,4,0,0,0,0,0.00,0,0,0,0,0,0,0,0,0,0,0,0,0.00,0.00,0,0.00,0.00
2021-03-27 15:39:20.750 PDT I (15000) ovms-server-v2: Send MP-0 D65,128,5,39,59,38,1341,615196,0,14,31,2,1,1,12.91,0,12.6,32,39,0,0
2021-03-27 15:39:20.770 PDT I (15020) ovms-server-v2: Send MP-0 L37.346294,-122.037178,77,71,1,1,0.0,1341,0,0.387,0,0,0,0,,0,0,0
2021-03-27 15:39:20.770 PDT I (15020) ovms-server-v2: Send MP-0 Y4,FL,FR,RL,RR,0,-1,0,-1,0,-1,0,-1
2021-03-27 15:39:20.770 PDT I (15020) ovms-server-v2: Send MP-0 W0,0,0,0,0,0,0,0,-1
2021-03-27 15:39:20.770 PDT I (15020) ovms-server-v2: Send MP-0 F3.2.016-49-gb10a3583/ota_1/main (build idf v3.3.4-845-gd59ed8bba-dirty Mar 21 2021 10:57:09),5YJRE11B181000033,28,1,TR1N,Revelstoke,-1,-1
2021-03-27 15:39:20.780 PDT I (15030) ovms-server-v2: Send MP-0 PINot charging|SOC: -
2021-03-27 15:39:37.740 PDT I (31990) simcom: State timeout, transition to 2
2021-03-27 15:39:37.740 PDT I (31990) simcom: State: Enter PoweringOn state
2021-03-27 15:39:37.740 PDT I (31990) simcom: Power Cycle
2021-03-27 15:39:38.410 PDT W (32660) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:39:38.420 PDT I (32670) webserver: HTTP GET /cfg/logging
2021-03-27 15:39:40.570 PDT W (34820) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:39:42.690 PDT W (36940) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:39:44.650 PDT I (38900) simcom: State: Enter PoweredOn state
2021-03-27 15:39:46.970 PDT W (41220) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:39:49.170 PDT W (43420) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:39:49.180 PDT W (43430) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 72 drops
2021-03-27 15:39:51.310 PDT W (45560) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:39:51.320 PDT W (45570) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 8 drops
2021-03-27 15:39:53.490 PDT W (47740) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:39:53.500 PDT W (47750) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 7 drops
2021-03-27 15:39:54.740 PDT W (48990) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:39:55.620 PDT W (49870) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:39:55.630 PDT W (49880) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 9 drops
2021-03-27 15:39:55.740 PDT W (49990) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:39:56.740 PDT W (50990) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:39:56.740 PDT W (50990) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:39:57.680 PDT W (51930) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:39:57.690 PDT W (51940) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 11 drops
2021-03-27 15:39:57.740 PDT W (51990) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:39:58.740 PDT W (52990) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:39:59.740 PDT W (53990) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:39:59.800 PDT W (54050) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 11 drops
2021-03-27 15:39:59.810 PDT W (54060) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:40:00.740 PDT W (54990) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:00.750 PDT W (55000) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:01.740 PDT W (55990) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:01.890 PDT W (56140) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:40:01.900 PDT W (56150) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 12 drops
2021-03-27 15:40:02.740 PDT W (56990) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:03.740 PDT W (57990) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:03.740 PDT I (57990) simcom: State: Enter MuxStart state
2021-03-27 15:40:03.750 PDT I (58000) gsm-mux: Start MUX
2021-03-27 15:40:03.750 PDT W (58000) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:03.750 PDT I (58000) gsm-mux: Channel #0 is open
2021-03-27 15:40:03.760 PDT I (58010) gsm-mux: Channel #1 is open
2021-03-27 15:40:03.770 PDT I (58020) gsm-mux: Channel #2 is open
2021-03-27 15:40:03.770 PDT I (58020) gsm-mux: Channel #3 is open
2021-03-27 15:40:03.780 PDT I (58030) gsm-mux: Channel #4 is open
2021-03-27 15:40:03.970 PDT W (58220) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:40:03.990 PDT W (58240) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 18 drops
2021-03-27 15:40:04.740 PDT I (58990) simcom: State: Enter NetWait state
2021-03-27 15:40:04.740 PDT W (58990) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:04.750 PDT W (59000) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:05.740 PDT W (59990) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:06.100 PDT W (60350) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:40:06.120 PDT W (60370) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 13 drops
2021-03-27 15:40:08.230 PDT I (62480) webserver: HTTP POST /cfg/logging
2021-03-27 15:40:08.230 PDT W (62480) webserver: HandleLogin: auth failure for username 'admin'
2021-03-27 15:40:08.240 PDT W (62490) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:40:08.260 PDT W (62510) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 7 drops
2021-03-27 15:40:10.300 PDT W (64550) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:40:12.420 PDT W (66670) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:40:12.440 PDT W (66690) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 17 drops
2021-03-27 15:40:14.510 PDT W (68760) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:40:14.520 PDT W (68770) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 7 drops
2021-03-27 15:40:16.630 PDT I (70880) webserver: HTTP POST /cfg/logging
2021-03-27 15:40:16.640 PDT I (70890) webserver: HandleLogin: 'admin' logged in, sid 847393266b07789
2021-03-27 15:40:16.650 PDT W (70900) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:40:18.810 PDT W (73060) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:40:20.820 PDT I (75070) webserver: HTTP GET /menu
2021-03-27 15:40:20.850 PDT W (75100) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
2021-03-27 15:40:20.860 PDT I (75110) webserver: HTTP GET /cfg/logging
2021-03-27 15:40:33.650 PDT W (87900) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.660 PDT W (87910) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.660 PDT W (87910) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.660 PDT W (87910) script: DuktapeDispatch: msg type 2 lost, queue full
2021-03-27 15:40:33.670 PDT W (87920) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.670 PDT W (87920) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.680 PDT W (87930) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.690 PDT W (87940) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.690 PDT W (87940) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.700 PDT W (87950) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.710 PDT W (87960) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.710 PDT W (87960) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.720 PDT W (87970) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.720 PDT W (87970) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.730 PDT W (87980) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.730 PDT I (87980) simcom: CREG Network Registration: RegisteredRoaming
2021-03-27 15:40:33.740 PDT W (87990) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.740 PDT I (87990) simcom: State: Enter NetStart state
2021-03-27 15:40:33.750 PDT W (88000) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.750 PDT W (88000) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.760 PDT W (88010) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.770 PDT W (88020) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.770 PDT W (88020) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.780 PDT W (88030) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.780 PDT W (88030) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.790 PDT W (88040) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.790 PDT W (88040) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.800 PDT W (88050) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.800 PDT W (88050) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.810 PDT W (88060) script: DuktapeDispatch: msg type 3 lost, queue full
2021-03-27 15:40:33.810 PDT I (88060) simcom: PPP Connection is ready to start
2021-03-27 15:40:33.810 PDT I (88060) simcom: State: Enter NetMode state
2021-03-27 15:40:33.810 PDT I (88060) gsm-ppp: Initialising...
2021-03-27 15:40:36.620 PDT I (90870) gsm-ppp: StatusCallBack: None
2021-03-27 15:40:36.620 PDT I (90870) gsm-ppp: status_cb: Connected
2021-03-27 15:40:36.620 PDT I (90870) gsm-ppp: our_ipaddr = 10.170.41.247
2021-03-27 15:40:36.620 PDT I (90870) gsm-ppp: his_ipaddr = 10.64.64.64
2021-03-27 15:40:36.620 PDT I (90870) gsm-ppp: netmask = 255.255.255.255
2021-03-27 15:40:36.620 PDT I (90870) gsm-ppp: DNS#0 = 212.9.0.135
2021-03-27 15:40:36.620 PDT I (90870) gsm-ppp: DNS#1 = 212.9.0.136
2021-03-27 15:40:36.620 PDT I (90870) gsm-ppp: our6_ipaddr = ::
2021-03-27 15:40:36.630 PDT I (90880) netmanager: MODEM up (with WIFI client up): staying with WIFI client priority
2021-03-27 15:41:15.890 PDT I (130140) webserver: HTTP GET /status
2021-03-27 15:41:41.690 PDT I (155940) ovms-server-v2: Incoming Msg: MP-0 Z1
2021-03-27 15:41:41.690 PDT I (155940) ovms-server-v2: One or more peers have connected
2021-03-27 15:41:41.740 PDT I (155990) ovms-server-v2: Send MP-0 S55.0,M,65535,0.00,done,standard,120,118,32,0,0,290,7,4,0,0,0,0,0.00,0,0,0,0,0,0,0,0,0,0,0,0,0.00,0.00,0,0.00,0.00
2021-03-27 15:41:41.750 PDT I (156000) ovms-server-v2: Send MP-0 D65,128,5,39,59,38,1341,615196,0,155,31,2,1,1,12.88,0,12.6,32,39,0,0
2021-03-27 15:41:41.760 PDT I (156010) ovms-server-v2: Send MP-0 L37.346313,-122.037216,77,87,1,1,0.0,1341,0,0.000,0,0,0,0,,0,0,0
2021-03-27 15:41:41.760 PDT I (156010) ovms-server-v2: Send MP-0 Y4,FL,FR,RL,RR,0,-1,0,-1,0,-1,0,-1
2021-03-27 15:41:41.760 PDT I (156010) ovms-server-v2: Send MP-0 W0,0,0,0,0,0,0,0,-1
2021-03-27 15:41:41.760 PDT I (156010) ovms-server-v2: Send MP-0 F3.2.016-49-gb10a3583/ota_1/main (build idf v3.3.4-845-gd59ed8bba-dirty Mar 21 2021 10:57:09),5YJRE11B181000033,28,1,TR1N,Revelstoke,-1,-1
2021-03-27 15:41:47.740 PDT I (161990) ovms-server-v2: Send MP-0 L37.346313,-122.037216,77,87,0,1,0.0,1341,0,0.000,0,0,0,0,,0,0,0
On Sun, 28 Mar 2021, Stephen Casner wrote:
BTW, I did also get a WDT error after resetting on the previous lockup (before file logging), and after the reset this time there was a hang that did not last long enough to trigger the WDT. Log segment below.
I just remembered that I still had the log output from the USB console for the previous lockup where the WDT error occurred. Both began after "simcom: Power Cycle". Log segment below.

-- Steve

I (31990) simcom: State timeout, transition to 2
I (31990) simcom: State: Enter PoweringOn state
I (31990) simcom: Power Cycle
W (32660) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
I (32670) webserver: HTTP GET /cfg/logging
W (34820) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (36940) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
I (38900) simcom: State: Enter PoweredOn state
W (41220) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (43420) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (43430) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 72 drops
W (45560) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (45570) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 8 drops
W (47740) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (47750) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 7 drops
W (48990) script: DuktapeDispatch: msg type 3 lost, queue full
W (49870) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (49880) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 9 drops
W (49990) script: DuktapeDispatch: msg type 3 lost, queue full
W (50990) script: DuktapeDispatch: msg type 3 lost, queue full
W (50990) script: DuktapeDispatch: msg type 3 lost, queue full
W (51930) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (51940) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 11 drops
W (51990) script: DuktapeDispatch: msg type 3 lost, queue full
W (52990) script: DuktapeDispatch: msg type 3 lost, queue full
W (53990) script: DuktapeDispatch: msg type 3 lost, queue full
W (54050) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 11 drops
W (54060) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (54990) script: DuktapeDispatch: msg type 3 lost, queue full
W (55000) script: DuktapeDispatch: msg type 3 lost, queue full
W (55990) script: DuktapeDispatch: msg type 3 lost, queue full
W (56140) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (56150) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 12 drops
W (56990) script: DuktapeDispatch: msg type 3 lost, queue full
W (57990) script: DuktapeDispatch: msg type 3 lost, queue full
I (57990) simcom: State: Enter MuxStart state
I (58000) gsm-mux: Start MUX
W (58000) script: DuktapeDispatch: msg type 3 lost, queue full
I (58000) gsm-mux: Channel #0 is open
I (58010) gsm-mux: Channel #1 is open
I (58020) gsm-mux: Channel #2 is open
I (58020) gsm-mux: Channel #3 is open
I (58030) gsm-mux: Channel #4 is open
W (58220) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (58240) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 18 drops
I (58990) simcom: State: Enter NetWait state
W (58990) script: DuktapeDispatch: msg type 3 lost, queue full
W (59000) script: DuktapeDispatch: msg type 3 lost, queue full
W (59990) script: DuktapeDispatch: msg type 3 lost, queue full
W (60350) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
E (75150) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (75150) task_wdt: - OVMS DukTape (CPU 1)
E (75150) task_wdt: Tasks currently running:
E (75150) task_wdt: CPU 0: IDLE0
E (75150) task_wdt: CPU 1: OVMS NetMan
W (60370) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 13 drops
I (62480) webserver: HTTP POST /cfg/logging
W (62480) webserver: HandleLogin: auth failure for username 'admin'
W (62490) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (62510) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 7 drops
W (64550) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (66670) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (66690) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 17 drops
W (68760) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (68770) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 7 drops
I (70880) webserver: HTTP POST /cfg/logging
I (70890) webserver: HandleLogin: 'admin' logged in, sid 847393266b07789
W (70900) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (73060) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
I (75070) webserver: HTTP GET /menu
W (75100) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
I (75110) webserver: HTTP GET /cfg/logging
OVMS# W (87900) script: DuktapeDispatch: msg type 3 lost, queue full
W (87910) script: DuktapeDispatch: msg type 3 lost, queue full
W (87910) script: DuktapeDispatch: msg type 3 lost, queue full
W (87910) script: DuktapeDispatch: msg type 2 lost, queue full
W (87920) script: DuktapeDispatch: msg type 3 lost, queue full
W (87920) script: DuktapeDispatch: msg type 3 lost, queue full
W (87930) script: DuktapeDispatch: msg type 3 lost, queue full
W (87940) script: DuktapeDispatch: msg type 3 lost, queue full
W (87940) script: DuktapeDispatch: msg type 3 lost, queue full
W (87950) script: DuktapeDispatch: msg type 3 lost, queue full
W (87960) script: DuktapeDispatch: msg type 3 lost, queue full
W (87960) script: DuktapeDispatch: msg type 3 lost, queue full
W (87970) script: DuktapeDispatch: msg type 3 lost, queue full
W (87970) script: DuktapeDispatch: msg type 3 lost, queue full
W (87980) script: DuktapeDispatch: msg type 3 lost, queue full
I (87980) simcom: CREG Network Registration: RegisteredRoaming
W (87990) script: DuktapeDispatch: msg type 3 lost, queue full
I (87990) simcom: State: Enter NetStart state
W (88000) script: DuktapeDispatch: msg type 3 lost, queue full
W (88000) script: DuktapeDispatch: msg type 3 lost, queue full
W (88010) script: DuktapeDispatch: msg type 3 lost, queue full
W (88020) script: DuktapeDispatch: msg type 3 lost, queue full
W (88020) script: DuktapeDispatch: msg type 3 lost, queue full
W (88030) script: DuktapeDispatch: msg type 3 lost, queue full
W (88030) script: DuktapeDispatch: msg type 3 lost, queue full
W (88040) script: DuktapeDispatch: msg type 3 lost, queue full
W (88040) script: DuktapeDispatch: msg type 3 lost, queue full
W (88050) script: DuktapeDispatch: msg type 3 lost, queue full
W (88050) script: DuktapeDispatch: msg type 3 lost, queue full
W (88060) script: DuktapeDispatch: msg type 3 lost, queue full
I (88060) simcom: PPP Connection is ready to start
I (88060) simcom: State: Enter NetMode state
I (88060) gsm-ppp: Initialising...
W (88070) script: DuktapeDispatch: msg type 3 lost, queue full
W (88080) script: DuktapeDispatch: msg type 3 lost, queue full
W (88080) script: DuktapeDispatch: msg type 3 lost, queue full
W (88090) script: DuktapeDispatch: msg type 3 lost, queue full
W (88100) script: DuktapeDispatch: msg type 3 lost, queue full
W (88110) script: DuktapeDispatch: msg type 3 lost, queue full
W (88120) script: DuktapeDispatch: msg type 3 lost, queue full
W (88130) script: DuktapeDispatch: msg type 3 lost, queue full
W (88130) script: DuktapeDispatch: msg type 3 lost, queue full
W (88140) script: DuktapeDispatch: msg type 3 lost, queue full
W (88150) websocket: WebSocketHandler[0x3f84b154]: job queue overflow detected
W (88150) websocket: WebSocketHandler[0x3f84b3ac]: job queue overflow detected
W (88170) websocket: WebSocketHandler[0x3f84b3ac]: job queue overflow resolved, 32 drops
W (88590) websocket: WebSocketHandler[0x3f84b154]: job queue overflow resolved, 35 drops
I (90870) gsm-ppp: StatusCallBack: None
I (90870) gsm-ppp: status_cb: Connected
I (90870) gsm-ppp: our_ipaddr = 10.170.41.247
I (90870) gsm-ppp: his_ipaddr = 10.64.64.64
I (90870) gsm-ppp: netmask = 255.255.255.255
I (90870) gsm-ppp: DNS#0 = 212.9.0.135
I (90870) gsm-ppp: DNS#1 = 212.9.0.136
I (90870) gsm-ppp: our6_ipaddr = ::
I (90880) netmanager: MODEM up (with WIFI client up): staying with WIFI client priority
Steve,

that might be the first real clue about the WDT issue, possibly in combination with Mark's finding of a corrupted netif list. Maybe the NetMan task was running in an endless loop due to an unclean modem (ppp) shutdown?

A modem reboot is currently necessary once in a while when the modem gets stuck. We should investigate the shutdown sequence when that happens with a connection in use.

Regards,
Michael
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
Michael,

It seems the NetMan loop is not endless, just long: the two events I've reported look very similar, but the WDT did not fire in the one captured in the file log, and in both cases normal operation resumed. Actually, almost normal: in one case I found that ssh connection attempts timed out although telnet worked, so I had to reboot again. I don't recall encountering that situation before.

-- Steve

On Mon, 29 Mar 2021, Michael Balzer wrote:
Steve,
that might be the first real clue about the WDT issue, possibly in combination with Mark's finding of a corrupted netif list.
Maybe the NetMan task was running some endless loop due to an unclean modem (ppp) shutdown?
A modem reboot is currently necessary once in a while when the modem gets stuck. We should investigate the shutdown sequence if that happens with a connection in use.
Regards, Michael
On 29.03.21 at 02:37, Stephen Casner wrote:
On Sun, 28 Mar 2021, Stephen Casner wrote:
BTW, I did also get a WDT error after resetting on the previous lockup (before file logging), and after the reset this time there was a hang that did not last long enough to trigger the WDT. Log segment below. I just remembered that I still had the log output from the USB console for the previous lockup where the WDT error occurred. Both began after "simcom: Power Cycle". Log segment below.
-- Steve
I (31990) simcom: State timeout, transition to 2
I (31990) simcom: State: Enter PoweringOn state
I (31990) simcom: Power Cycle
W (32660) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
I (32670) webserver: HTTP GET /cfg/logging
W (34820) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (36940) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
I (38900) simcom: State: Enter PoweredOn state
W (41220) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (43420) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (43430) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 72 drops
W (45560) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (45570) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 8 drops
W (47740) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (47750) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 7 drops
W (48990) script: DuktapeDispatch: msg type 3 lost, queue full
W (49870) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (49880) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 9 drops
W (49990) script: DuktapeDispatch: msg type 3 lost, queue full
W (50990) script: DuktapeDispatch: msg type 3 lost, queue full
W (50990) script: DuktapeDispatch: msg type 3 lost, queue full
W (51930) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (51940) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 11 drops
W (51990) script: DuktapeDispatch: msg type 3 lost, queue full
W (52990) script: DuktapeDispatch: msg type 3 lost, queue full
W (53990) script: DuktapeDispatch: msg type 3 lost, queue full
W (54050) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 11 drops
W (54060) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (54990) script: DuktapeDispatch: msg type 3 lost, queue full
W (55000) script: DuktapeDispatch: msg type 3 lost, queue full
W (55990) script: DuktapeDispatch: msg type 3 lost, queue full
W (56140) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (56150) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 12 drops
W (56990) script: DuktapeDispatch: msg type 3 lost, queue full
W (57990) script: DuktapeDispatch: msg type 3 lost, queue full
I (57990) simcom: State: Enter MuxStart state
I (58000) gsm-mux: Start MUX
W (58000) script: DuktapeDispatch: msg type 3 lost, queue full
I (58000) gsm-mux: Channel #0 is open
I (58010) gsm-mux: Channel #1 is open
I (58020) gsm-mux: Channel #2 is open
I (58020) gsm-mux: Channel #3 is open
I (58030) gsm-mux: Channel #4 is open
W (58220) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (58240) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 18 drops
I (58990) simcom: State: Enter NetWait state
W (58990) script: DuktapeDispatch: msg type 3 lost, queue full
W (59000) script: DuktapeDispatch: msg type 3 lost, queue full
W (59990) script: DuktapeDispatch: msg type 3 lost, queue full
W (60350) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
E (75150) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (75150) task_wdt: - OVMS DukTape (CPU 1)
E (75150) task_wdt: Tasks currently running:
E (75150) task_wdt: CPU 0: IDLE0
E (75150) task_wdt: CPU 1: OVMS NetMan
W (60370) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 13 drops
I (62480) webserver: HTTP POST /cfg/logging
W (62480) webserver: HandleLogin: auth failure for username 'admin'
W (62490) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (62510) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 7 drops
W (64550) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (66670) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (66690) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 17 drops
W (68760) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (68770) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow resolved, 7 drops
I (70880) webserver: HTTP POST /cfg/logging
I (70890) webserver: HandleLogin: 'admin' logged in, sid 847393266b07789
W (70900) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
W (73060) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
I (75070) webserver: HTTP GET /menu
W (75100) websocket: WebSocketHandler[0x3f84b22c]: job queue overflow detected
I (75110) webserver: HTTP GET /cfg/logging
OVMS#
W (87900) script: DuktapeDispatch: msg type 3 lost, queue full
W (87910) script: DuktapeDispatch: msg type 3 lost, queue full
W (87910) script: DuktapeDispatch: msg type 3 lost, queue full
W (87910) script: DuktapeDispatch: msg type 2 lost, queue full
W (87920) script: DuktapeDispatch: msg type 3 lost, queue full
W (87920) script: DuktapeDispatch: msg type 3 lost, queue full
W (87930) script: DuktapeDispatch: msg type 3 lost, queue full
W (87940) script: DuktapeDispatch: msg type 3 lost, queue full
W (87940) script: DuktapeDispatch: msg type 3 lost, queue full
W (87950) script: DuktapeDispatch: msg type 3 lost, queue full
W (87960) script: DuktapeDispatch: msg type 3 lost, queue full
W (87960) script: DuktapeDispatch: msg type 3 lost, queue full
W (87970) script: DuktapeDispatch: msg type 3 lost, queue full
W (87970) script: DuktapeDispatch: msg type 3 lost, queue full
W (87980) script: DuktapeDispatch: msg type 3 lost, queue full
I (87980) simcom: CREG Network Registration: RegisteredRoaming
W (87990) script: DuktapeDispatch: msg type 3 lost, queue full
I (87990) simcom: State: Enter NetStart state
W (88000) script: DuktapeDispatch: msg type 3 lost, queue full
W (88000) script: DuktapeDispatch: msg type 3 lost, queue full
W (88010) script: DuktapeDispatch: msg type 3 lost, queue full
W (88020) script: DuktapeDispatch: msg type 3 lost, queue full
W (88020) script: DuktapeDispatch: msg type 3 lost, queue full
W (88030) script: DuktapeDispatch: msg type 3 lost, queue full
W (88030) script: DuktapeDispatch: msg type 3 lost, queue full
W (88040) script: DuktapeDispatch: msg type 3 lost, queue full
W (88040) script: DuktapeDispatch: msg type 3 lost, queue full
W (88050) script: DuktapeDispatch: msg type 3 lost, queue full
W (88050) script: DuktapeDispatch: msg type 3 lost, queue full
W (88060) script: DuktapeDispatch: msg type 3 lost, queue full
I (88060) simcom: PPP Connection is ready to start
I (88060) simcom: State: Enter NetMode state
I (88060) gsm-ppp: Initialising...
W (88070) script: DuktapeDispatch: msg type 3 lost, queue full
W (88080) script: DuktapeDispatch: msg type 3 lost, queue full
W (88080) script: DuktapeDispatch: msg type 3 lost, queue full
W (88090) script: DuktapeDispatch: msg type 3 lost, queue full
W (88100) script: DuktapeDispatch: msg type 3 lost, queue full
W (88110) script: DuktapeDispatch: msg type 3 lost, queue full
W (88120) script: DuktapeDispatch: msg type 3 lost, queue full
W (88130) script: DuktapeDispatch: msg type 3 lost, queue full
W (88130) script: DuktapeDispatch: msg type 3 lost, queue full
W (88140) script: DuktapeDispatch: msg type 3 lost, queue full
W (88150) websocket: WebSocketHandler[0x3f84b154]: job queue overflow detected
W (88150) websocket: WebSocketHandler[0x3f84b3ac]: job queue overflow detected
W (88170) websocket: WebSocketHandler[0x3f84b3ac]: job queue overflow resolved, 32 drops
W (88590) websocket: WebSocketHandler[0x3f84b154]: job queue overflow resolved, 35 drops
I (90870) gsm-ppp: StatusCallBack: None
I (90870) gsm-ppp: status_cb: Connected
I (90870) gsm-ppp: our_ipaddr = 10.170.41.247
I (90870) gsm-ppp: his_ipaddr = 10.64.64.64
I (90870) gsm-ppp: netmask = 255.255.255.255
I (90870) gsm-ppp: DNS#0 = 212.9.0.135
I (90870) gsm-ppp: DNS#1 = 212.9.0.136
I (90870) gsm-ppp: our6_ipaddr = ::
I (90880) netmanager: MODEM up (with WIFI client up): staying with WIFI client priority
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
If it helps, here is a recent one from my test module:

I (44109193) ovms-server-v2: Send MP-0 S0,K,0,0.00,stopped,standard,0,0,0,0,0,0,0,21,0,0,0,0,0.00,0,0,0,0,-1,0,0,0,0,0,0,0,0.00,0.00,0,0.00,0.00
I (44109193) ovms-server-v2: Send MP-0 D0,0,5,0,0,0,0,0,0,310502,0,0,0,0,0,0,12.6,0,0,0,0
W (44202193) ovms-server-v2: Detected stale connection (issue #241), restarting network
I (44202193) esp32wifi: Restart
I (44202193) esp32wifi: Powering up WIFI driver
I (44202213) wifi:wifi driver task: 3ffde34c, prio:23, stack:3584, core=0
I (44202213) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (44202213) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (44202233) wifi:wifi firmware version: 30f9e79
I (44202233) wifi:config NVS flash:
I (44202243) gsm-ppp: GsmPPPOS_StatusCallback: User Interrupt
I (44202243) gsm-ppp: GsmPPPOS_StatusCallback: PPPERR_USER PPP connection has been closed
…
I (44202243) cellular-modem-auto: Restart
I (44202243) cellular: State: Enter PowerOffOn state
I (44202243) gsm-ppp: Shutting down (hard)...
I (44202243) gsm-ppp: StatusCallBack: User Interrupt
V (44202253) ovms-server-v2: OvmsServerV2MongooseCallback(MG_EV_CLOSE)
I (44202253) ovms-server-v2: Status: Disconnected
I (44202263) gsm-ppp: PPP is shutdown
I (44202263) gsm-mux: Stop MUX
I (44202263) cellular-modem-auto: Power Cycle
…
I (44204263) netmanager: DoSafePrioritiseAndIndicate: start
I (44204263) netmanager: DoSafePrioritiseAndIndicate: none
I (44204273) netmanager: DoSafePrioritiseAndIndicate: start
I (44204273) netmanager: DoSafePrioritiseAndIndicate: none
…
I (44267803) gsm-ppp: GsmPPPOS_StatusCallback: None
I (44267813) netmanager: DoSafePrioritiseAndIndicate: start
I (44267813) netmanager: DoSafePrioritiseAndIndicate: connected modem
I (44267813) netmanager: DoSafePrioritiseAndIndicate: interface 0x3ffdf3b0
I (44267823) netmanager: DoSafePrioritiseAndIndicate: name pp2
I (44267833) netmanager: DoSafePrioritiseAndIndicate: end
I (44267803) gsm-ppp: StatusCallBack: None
I (44267813) gsm-ppp: status_cb: Connected
…
I (44269193) ovms-server-v2: Connection is api.openvehicles.com:6870 (redacted)
I (44269193) ovms-server-v2: Status: Connecting...
I (44269573) events: Signal(server.v2.connecting)
V (44287823) ovms-server-v2: OvmsServerV2MongooseCallback(MG_EV_CONNECT=113)
W (44287823) ovms-server-v2: Connection failed
E (44287823) ovms-server-v2: Status: Error: Connection failed
…
(Repeat many times)
…
I (47714543) events: Signal(server.v2.waitreconnect)
V (47714553) ovms-server-v2: OvmsServerV2MongooseCallback(MG_EV_CLOSE)
I (47714563) ovms-server-v2: Status: Disconnected
I (47714563) events: Signal(server.v2.waitreconnect)
I (47774193) ovms-server-v2: Connection is api.openvehicles.com:6870 (redacted)
I (47774193) ovms-server-v2: Status: Connecting...
I (47774263) events: Signal(server.v2.connecting)
W (47814193) ovms-duktape: DuktapeDispatch: msg type 3 lost, queue full
W (47815193) ovms-duktape: DuktapeDispatch: msg type 3 lost, queue full
W (47816193) ovms-duktape: DuktapeDispatch: msg type 3 lost, queue full
W (47817193) ovms-duktape: DuktapeDispatch: msg type 3 lost, queue full
…
W (47835193) ovms-duktape: DuktapeDispatch: msg type 3 lost, queue full
W (47836193) ovms-duktape: DuktapeDispatch: msg type 3 lost, queue full
E (47837193) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (47837193) task_wdt: - IDLE1 (CPU 1)
E (47837193) task_wdt: Tasks currently running:
E (47837193) task_wdt: CPU 0: IDLE0
E (47837193) task_wdt: CPU 1: OVMS Console
E (47837193) task_wdt: Aborting.

abort() was called at PC 0x400e85a4 on core 0

ELF file SHA256: a810b0dbae83e330

Backtrace: 0x4008e58e:0x3ffb0690 0x4008e829:0x3ffb06b0 0x400e85a4:0x3ffb06d0 0x4008406a:0x3ffb06f0

The backtrace is in the watchdog system. I don't think this is the same problem I am having. But it does seem that the "Detected stale connection (issue #241), restarting network" system doesn't work properly.

Regards, Mark.
On 29 Mar 2021, at 11:26 PM, Stephen Casner <casner@acm.org> wrote:
I've had two more crashes where I had to recover by resetting the unit, but these show a different pattern of log messages, as attached below.

-- Steve

2021-03-27 23:31:29.355 PDT I (25393605) gsm-ppp: StatusCallBack: User Interrupt
2021-03-27 23:31:29.355 PDT I (25393605) gsm-ppp: PPP connection has been closed
2021-03-27 23:31:32.875 PDT I (25397125) simcom: State timeout, transition to 5
2021-03-27 23:31:32.875 PDT I (25397125) simcom: State: Enter NetWait state
2021-03-27 23:31:36.875 PDT I (25401125) simcom: State: Enter NetStart state
2021-03-27 23:31:37.975 PDT I (25402225) simcom: PPP Connection is ready to start
2021-03-27 23:31:38.875 PDT I (25403125) simcom: State: Enter NetMode state
2021-03-27 23:31:38.875 PDT I (25403125) gsm-ppp: Initialising...
2021-03-27 23:31:41.615 PDT I (25405865) gsm-ppp: StatusCallBack: None
2021-03-27 23:31:41.625 PDT I (25405875) gsm-ppp: status_cb: Connected
2021-03-27 23:31:41.625 PDT I (25405875) gsm-ppp: our_ipaddr = 10.170.41.247
2021-03-27 23:31:41.625 PDT I (25405875) gsm-ppp: his_ipaddr = 10.64.64.64
2021-03-27 23:31:41.625 PDT I (25405875) gsm-ppp: netmask = 255.255.255.255
2021-03-27 23:31:41.625 PDT I (25405875) gsm-ppp: DNS#0 = 212.9.0.135
2021-03-27 23:31:41.625 PDT I (25405875) gsm-ppp: DNS#1 = 212.9.0.136
2021-03-27 23:31:41.625 PDT I (25405875) gsm-ppp: our6_ipaddr = ::
2021-03-27 23:31:41.625 PDT I (25405875) netmanager: MODEM up (with WIFI client up): staying with WIFI client priority
2021-03-27 23:33:16.885 PDT I (25501135) housekeeping: 2021-03-27 23:33:16 PDT (RAM: 8b=66932-67484 32b=10036)
2021-03-27 23:38:23.805 PDT E (25808125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:24.805 PDT E (25809125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:25.805 PDT E (25810125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:26.805 PDT E (25811125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:26.805 PDT E (25811125) events: SignalEvent: queue overflow, event 'ticker.10' dropped
2021-03-27 23:38:27.805 PDT E (25812125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:28.805 PDT E (25813125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:29.805 PDT E (25814125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:30.805 PDT E (25815125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:31.805 PDT E (25816125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:32.805 PDT E (25817125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:33.805 PDT E (25818125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:34.805 PDT E (25819125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:35.805 PDT E (25820125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:36.805 PDT E (25821125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:36.805 PDT E (25821125) events: SignalEvent: queue overflow, event 'ticker.10' dropped
2021-03-27 23:38:37.805 PDT E (25822125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:38.805 PDT E (25823125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:39.805 PDT E (25824125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:40.805 PDT E (25825125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:41.805 PDT E (25826125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:42.805 PDT E (25827125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:43.805 PDT E (25828125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:44.805 PDT E (25829125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:45.805 PDT E (25830125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:46.805 PDT E (25831125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:46.805 PDT E (25831125) events: SignalEvent: queue overflow, event 'ticker.10' dropped
2021-03-27 23:38:47.805 PDT E (25832125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:48.805 PDT E (25833125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:48.815 PDT E (25833135) events: SignalEvent: queue overflow, event 'server.web.socket.closed' dropped
2021-03-27 23:38:48.815 PDT E (25833135) events: SignalEvent: lost important event => aborting
2021-03-31 00:10:52.161 PDT I (8451) command: OpenLogfile: now logging to file '/sd/crash.log'
2021-03-31 01:45:44.761 PDT I (5701001) housekeeping: 2021-03-31 01:45:44 PDT (RAM: 8b=68340-69352 32b=10036)
2021-03-31 01:47:12.751 PDT E (5788981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:13.751 PDT E (5789981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:14.751 PDT E (5790981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:14.751 PDT E (5790981) events: SignalEvent: queue overflow, event 'ticker.10' dropped
2021-03-31 01:47:15.751 PDT E (5791981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:16.751 PDT E (5792981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:17.751 PDT E (5793981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:18.751 PDT E (5794981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:19.751 PDT E (5795981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:20.751 PDT E (5796981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:21.751 PDT E (5797981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:22.751 PDT E (5798981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:23.751 PDT E (5799981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:24.751 PDT E (5800981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:24.751 PDT E (5800981) events: SignalEvent: queue overflow, event 'ticker.10' dropped
2021-03-31 01:47:25.751 PDT E (5801981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:26.751 PDT E (5802981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:27.751 PDT E (5803981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:28.751 PDT E (5804981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:29.751 PDT E (5805981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:29.761 PDT E (5805991) events: SignalEvent: queue overflow, event 'server.web.socket.closed' dropped
2021-03-31 01:47:29.761 PDT E (5805991) events: SignalEvent: lost important event => aborting
1969-12-31 16:00:08.620 PST I (8260) command: OpenLogfile: now logging to file '/sd/crash.log'
Steve,

that's the problem with this issue: it's totally unclear what causes it. The signal dropping begins when the queue is full, which happens after the task has been blocked for roughly as many seconds as the queue is long. So there is no logged activity that could cause this; your module basically went into this state from idling.

Regards,
Michael

On 04.04.2021 at 08:43, Stephen Casner wrote:
I've had two more crashes where I had to recover by resetting the unit, but these show a different pattern of log messages, as attached below.
-- Steve
2021-03-27 23:31:29.355 PDT I (25393605) gsm-ppp: StatusCallBack: User Interrupt
2021-03-27 23:31:29.355 PDT I (25393605) gsm-ppp: PPP connection has been closed
2021-03-27 23:31:32.875 PDT I (25397125) simcom: State timeout, transition to 5
2021-03-27 23:31:32.875 PDT I (25397125) simcom: State: Enter NetWait state
2021-03-27 23:31:36.875 PDT I (25401125) simcom: State: Enter NetStart state
2021-03-27 23:31:37.975 PDT I (25402225) simcom: PPP Connection is ready to start
2021-03-27 23:31:38.875 PDT I (25403125) simcom: State: Enter NetMode state
2021-03-27 23:31:38.875 PDT I (25403125) gsm-ppp: Initialising...
2021-03-27 23:31:41.615 PDT I (25405865) gsm-ppp: StatusCallBack: None
2021-03-27 23:31:41.625 PDT I (25405875) gsm-ppp: status_cb: Connected
2021-03-27 23:31:41.625 PDT I (25405875) gsm-ppp: our_ipaddr = 10.170.41.247
2021-03-27 23:31:41.625 PDT I (25405875) gsm-ppp: his_ipaddr = 10.64.64.64
2021-03-27 23:31:41.625 PDT I (25405875) gsm-ppp: netmask = 255.255.255.255
2021-03-27 23:31:41.625 PDT I (25405875) gsm-ppp: DNS#0 = 212.9.0.135
2021-03-27 23:31:41.625 PDT I (25405875) gsm-ppp: DNS#1 = 212.9.0.136
2021-03-27 23:31:41.625 PDT I (25405875) gsm-ppp: our6_ipaddr = ::
2021-03-27 23:31:41.625 PDT I (25405875) netmanager: MODEM up (with WIFI client up): staying with WIFI client priority
2021-03-27 23:33:16.885 PDT I (25501135) housekeeping: 2021-03-27 23:33:16 PDT (RAM: 8b=66932-67484 32b=10036)
2021-03-27 23:38:23.805 PDT E (25808125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:24.805 PDT E (25809125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:25.805 PDT E (25810125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:26.805 PDT E (25811125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:26.805 PDT E (25811125) events: SignalEvent: queue overflow, event 'ticker.10' dropped
2021-03-27 23:38:27.805 PDT E (25812125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:28.805 PDT E (25813125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:29.805 PDT E (25814125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:30.805 PDT E (25815125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:31.805 PDT E (25816125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:32.805 PDT E (25817125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:33.805 PDT E (25818125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:34.805 PDT E (25819125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:35.805 PDT E (25820125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:36.805 PDT E (25821125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:36.805 PDT E (25821125) events: SignalEvent: queue overflow, event 'ticker.10' dropped
2021-03-27 23:38:37.805 PDT E (25822125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:38.805 PDT E (25823125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:39.805 PDT E (25824125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:40.805 PDT E (25825125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:41.805 PDT E (25826125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:42.805 PDT E (25827125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:43.805 PDT E (25828125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:44.805 PDT E (25829125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:45.805 PDT E (25830125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:46.805 PDT E (25831125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:46.805 PDT E (25831125) events: SignalEvent: queue overflow, event 'ticker.10' dropped
2021-03-27 23:38:47.805 PDT E (25832125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:48.805 PDT E (25833125) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-27 23:38:48.815 PDT E (25833135) events: SignalEvent: queue overflow, event 'server.web.socket.closed' dropped
2021-03-27 23:38:48.815 PDT E (25833135) events: SignalEvent: lost important event => aborting

2021-03-31 00:10:52.161 PDT I (8451) command: OpenLogfile: now logging to file '/sd/crash.log'
2021-03-31 01:45:44.761 PDT I (5701001) housekeeping: 2021-03-31 01:45:44 PDT (RAM: 8b=68340-69352 32b=10036)
2021-03-31 01:47:12.751 PDT E (5788981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:13.751 PDT E (5789981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:14.751 PDT E (5790981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:14.751 PDT E (5790981) events: SignalEvent: queue overflow, event 'ticker.10' dropped
2021-03-31 01:47:15.751 PDT E (5791981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:16.751 PDT E (5792981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:17.751 PDT E (5793981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:18.751 PDT E (5794981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:19.751 PDT E (5795981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:20.751 PDT E (5796981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:21.751 PDT E (5797981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:22.751 PDT E (5798981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:23.751 PDT E (5799981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:24.751 PDT E (5800981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:24.751 PDT E (5800981) events: SignalEvent: queue overflow, event 'ticker.10' dropped
2021-03-31 01:47:25.751 PDT E (5801981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:26.751 PDT E (5802981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:27.751 PDT E (5803981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:28.751 PDT E (5804981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:29.751 PDT E (5805981) events: SignalEvent: queue overflow, event 'ticker.1' dropped
2021-03-31 01:47:29.761 PDT E (5805991) events: SignalEvent: queue overflow, event 'server.web.socket.closed' dropped
2021-03-31 01:47:29.761 PDT E (5805991) events: SignalEvent: lost important event => aborting
1969-12-31 16:00:08.620 PST I (8260) command: OpenLogfile: now logging to file '/sd/crash.log'

_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
--
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
Michael,

As you saw from my earlier emails, I was getting these crashes typically after less than 24 hours of operation. I changed my config to disable TLS on server v2 and rebooted at 2021-04-05 23:36:04.648 PDT, and there has not been a crash since. So it definitely appears to be correlated with the additional processing to support TLS.

-- Steve
Steve,

I can confirm an increase of these events since we changed to WolfSSL, currently about once every three days for me. The frequency was much lower before, more like once or twice per month.

I've disabled TLS on my module now and will report whether that helps.

Regards,
Michael
Michael,

Does TLS periodically require some potentially expensive cryptographic operation, such as generating a new key? Perhaps there is a difference in the period specification between mbedTLS and WolfSSL. There are relatively expensive operations done when the connection is established; those don't seem to cause WDT or queue overflow problems. So if there is a periodic expensive operation, it must be that doing it in the midst of some other operation causes the problem.

-- Steve
If anyone wants to switch back to MBEDTLS to run a comparison, I believe all you need to do is undo the change to mongoose/include/mg_locals.h from commit afd1d8c99ff883f6e485adcc32afb85307a34b74 (just lines 62-64).

-- Steve
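[For context, a hedged sketch of what the backend selection in mg_locals.h might look like. MG_ENABLE_SSL and MG_SSL_IF are standard mongoose configuration macros, but whether the OVMS change used exactly these lines is an assumption on the editor's part; see the commit itself for the real lines 62-64.]

```c
/* mongoose/include/mg_locals.h -- illustrative sketch only, not the
 * actual content of the commit referenced above. */
#define MG_ENABLE_SSL 1
#define MG_SSL_IF MG_SSL_IF_MBEDTLS   /* the pre-WolfSSL backend */
```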
Steve,

in the two weeks since disabling TLS on the server V2 connection I haven't had a single crash. While that's not proof yet that the watchdog issue is TLS related, it's at least a strong indicator.

The watchdog triggers if the idle task on a core doesn't get a CPU share for 120 seconds. If the TLS functions block a CPU for more than a few seconds, that's already pretty bad, as it means TLS will cause delays in CAN processing (disrupting protocol transfers) and can possibly cause frame drops and queue overflows. Blocking the whole system for more than 120 seconds is totally unacceptable.

This doesn't feel like a calculation / math performance issue; it rather feels like a bug, and that may imply a security issue as well.

But I don't think this is caused by WolfSSL, as the issue has been present with mbedTLS as well, it just didn't occur as frequently. Maybe some race condition with the LwIP task?

Regards,
Michael
Michael,

I confirm the same lack of crashes, and I agree that the problem is not strictly with WolfSSL.

It seems that a task WDT ought to be able to report where the task was executing at the time. That would certainly be helpful in narrowing down the type of problem. I'm not sure whether the trap to gdb mode would provide that info, since I observed no response on the USB console when the module went comatose.

-- Steve
Not sure if it is related, but I am hitting this with my work on canlog. I'm seeing:

0x400f7af1: CheckQueueOverflow(char const*, char*) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/ovms_events.cpp:381

ELF file SHA256: 57f584b0e3fe542a

Backtrace: 0x4008e6f3:0x3ffcce00 0x4008e95d:0x3ffcce20 0x400f7af1:0x3ffcce40 0x400f8051:0x3ffcce80 0x400f91d0:0x3ffcceb0 0x40092387:0x3ffccf00
0x4008e6f3: invoke_abort at /Users/mark/esp/esp-idf/components/esp32/panic.c:736
0x4008e95d: abort at /Users/mark/esp/esp-idf/components/esp32/panic.c:736
0x400f7af1: CheckQueueOverflow(char const*, char*) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/ovms_events.cpp:381
0x400f8051: OvmsEvents::SignalEvent(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*, void (*)(char const*, void*), unsigned int) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/ovms_events.cpp:459
0x400f91d0: HousekeepingTicker1(void*) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/ovms_housekeeping.cpp:122
0x40092387: prvProcessExpiredTimer at /Users/mark/esp/esp-idf/components/freertos/timers.c:485
(inlined by) prvProcessTimerOrBlockTask at /Users/mark/esp/esp-idf/components/freertos/timers.c:571
(inlined by) prvTimerTask at /Users/mark/esp/esp-idf/components/freertos/timers.c:544

So that is the CheckQueueOverflow() system within events, that handles when an event cannot be queued. In this case it is aborting the system (to cause a restart). Not a watchdog timer.

In my case, using gdb stubs gets into the debugger nicely. But I get the following. 'info threads' gives:

...
  3 Thread 2 (OVMS Events CPU1) 0x40081a28 in esp_crosscore_int_send_yield (core_id=1) at /Users/mark/esp/esp-idf/components/esp32/crosscore_int.c:112
  2 Thread 1 (OVMS NetMan CPU1) 0x7f400992 in ?? ()
* 1 Remote target 0x4008e6f3 in invoke_abort () at /Users/mark/esp/esp-idf/components/esp32/panic.c:156

(gdb) thread 1
[Switching to thread 1 (Remote target)]
#0 0x4008e6f3 in invoke_abort () at /Users/mark/esp/esp-idf/components/esp32/panic.c:156
156 *((int *) 0) = 0;
(gdb) bt
#0 0x4008e6f3 in invoke_abort () at /Users/mark/esp/esp-idf/components/esp32/panic.c:156
#1 0x4008e960 in abort () at /Users/mark/esp/esp-idf/components/esp32/panic.c:171
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) thread 2
[Switching to thread 2 (Thread 1)]
MWJ: This is NETMAN
#0 0x7f400992 in ?? ()
(gdb) bt
#0 0x7f400992 in ?? ()
#1 0x7ffe6583 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) thread 3
[Switching to thread 3 (Thread 2)]
MWJ: This is EVENTS
#0 0x40081a28 in esp_crosscore_int_send_yield (core_id=1) at /Users/mark/esp/esp-idf/components/esp32/crosscore_int.c:112
112 esp_crosscore_int_send(core_id, REASON_YIELD);
(gdb) bt
#0 0x40081a28 in esp_crosscore_int_send_yield (core_id=1) at /Users/mark/esp/esp-idf/components/esp32/crosscore_int.c:112
#1 0x4009030c in xQueueGenericReceive (xQueue=0x3ffc01cc, pvBuffer=0x3ffc23f0, xTicksToWait=500, xJustPeeking=0) at /Users/mark/esp/esp-idf/components/freertos/queue.c:1592
#2 0x400f85c2 in OvmsEvents::EventTask (this=0x3ffb5dec <MyEvents>) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/ovms_events.cpp:213
#3 0x400f8660 in EventLaunchTask (pvParameters=0x3ffb5dec <MyEvents>) at /Users/mark/Documents/ovms/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/main/ovms_events.cpp:80
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Seems the stack is corrupted somewhere. I'm still looking. But the GDB stub does seem helpful for this.

Mark
participants (3)
- Mark Webb-Johnson
- Michael Balzer
- Stephen Casner