While analysing issue #189, I just did a build using the latest upstream/master.

There are now options to enable wifi debug logging:
CONFIG_ESP32_WIFI_DEBUG_LOG_ENABLE=y
CONFIG_ESP32_WIFI_DEBUG_LOG_DEBUG=y
CONFIG_ESP32_WIFI_DEBUG_LOG_VERBOSE=
CONFIG_ESP32_WIFI_DEBUG_LOG_MODULE_ALL=
CONFIG_ESP32_WIFI_DEBUG_LOG_MODULE_WIFI=y
CONFIG_ESP32_WIFI_DEBUG_LOG_MODULE_COEX=
CONFIG_ESP32_WIFI_DEBUG_LOG_MODULE_MESH=
CONFIG_ESP32_WIFI_DEBUG_LOG_SUBMODULE=
That may help in debugging our wifi reliability issue if it's still unfixed.

Regards,
Michael


On 08.01.19 at 05:58, Mark Webb-Johnson wrote:
The core issue here (the wifi connection being lost, but the ESP stack not being able to report that to the application, i.e. us) is the real culprit. I, like Michael, hope that later versions of the ESP IDF can solve this. Once the wear levelling version upgrade bug is fixed, perhaps we can try? I do see this in my home (despite the wifi access point being about 2 metres from the car), so I suspect it is more related to a timeout / interference. For me, it only happens once every few weeks.

Implementing a facility for fallback to modem even if wifi is up (in the case of connections over wifi failing) is probably a sensible feature. I think we would need:

  1. A way for connection success / failure over a particular transport to be reported to the network layer. Or some way for the network layer to access historical statistics.

  2. Some logic in the network layer to determine wifi is unreliable (presumably based on sequential connection failures without success, over time).

  3. Some logic in the network layer to determine wifi is reliable again. Perhaps after some time (hours) on modem failover, it could switch back to wifi and try again.

  4. The switch is done in the network layer itself, and our current default route switching mechanism should support that just fine.

I guess we would also want a config option to enable/disable this feature.
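
To make points 1 to 4 concrete, here is a minimal sketch of the decision logic in Python. It is not OVMS code: the class name, method names, transport labels and all thresholds are hypothetical, and the real network layer would drive the default route rather than return a string. The clock is passed in explicitly so the logic can be tried out without waiting.

```python
class WifiFailoverPolicy:
    """Hypothetical failover policy; every name and threshold is illustrative."""

    def __init__(self, enabled=True, max_failures=3,
                 failure_window_s=600.0, retry_after_s=3600.0):
        self.enabled = enabled                    # config switch for the feature
        self.max_failures = max_failures          # sequential failures tolerated
        self.failure_window_s = failure_window_s  # failures must span this long
        self.retry_after_s = retry_after_s        # time on modem before retrying wifi
        self.failures = 0
        self.first_failure_at = None
        self.failed_over_at = None

    def report(self, transport, ok, now):
        """Point 1: a client reports the outcome of one connection attempt."""
        if transport != "wifi":
            return
        if ok:
            # Any success ends the failure streak.
            self.failures = 0
            self.first_failure_at = None
            return
        if self.failures == 0:
            self.first_failure_at = now
        self.failures += 1
        # Point 2: wifi is unreliable after enough sequential failures over time.
        if (self.enabled and self.failed_over_at is None
                and self.failures >= self.max_failures
                and now - self.first_failure_at >= self.failure_window_s):
            self.failed_over_at = now

    def preferred(self, now):
        """Points 3 and 4: which transport should carry the default route."""
        if (self.failed_over_at is not None
                and now - self.failed_over_at >= self.retry_after_s):
            # Hold time on modem has expired: give wifi another chance.
            self.failed_over_at = None
            self.failures = 0
            self.first_failure_at = None
        return "modem" if self.failed_over_at is not None else "wifi"
```

Requiring the failures to span a minimum time, and not only reach a count, is one way to avoid switching on a short burst of errors; whether that matches how the real failures look is an open question.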

But I would much rather ‘fix’ the wifi in the first place. It is certainly easier to see whether a fix is effective now, with the wifi unreliable, than it would be if we had some automatic failover to modem.

Regards, Mark.

On 5 Jan 2019, at 2:56 PM, Stephen Casner <casner@acm.org> wrote:

Michael,

You're right that we can't depend on the websocket job queue overflow
to detect loss of wifi connectivity.  If the improvements in the wifi
driver make it sufficiently robust to detect disassociation, then we
may not need to do anything else to work around that problem.

However, there may well be situations where wifi is able to associate
just fine, but there is no connectivity upstream from that point to
the server.  To handle such cases I think it would be a good idea to
have a signal that both the websocket and server-v[23] can send to
netmanager to trigger switching to another path.

                                                       -- Steve

On Fri, 4 Jan 2019, Michael Balzer wrote:

Steve,

you could additionally have enabled event logging, but there clearly is no event from the wifi driver on the disassociation, or the netmanager would have logged this as well.

You're probably right that the websocket job queue overflow indicates the loss, but that won't work as a general canary, as it's only active while at least one web client is connected.

Another thing you could monitor is the signal quality, or maybe check for a lack of update callbacks? That's CSIRxCallback in esp32wifi.
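
The "lack of update callbacks" check amounts to a watchdog on the time since the last callback. A minimal Python sketch of that idea follows; the class and method names are hypothetical, the timeout is a guess, and how often CSIRxCallback actually fires on a healthy link would need to be measured first.

```python
class CallbackWatchdog:
    """Hypothetical watchdog: flags a link whose callbacks have gone quiet."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s  # assumed maximum quiet period on a healthy link
        self.last_seen = None       # time of the most recent callback, if any

    def kick(self, now):
        # Would be called from the callback itself (e.g. on each CSI update).
        self.last_seen = now

    def is_stale(self, now):
        # Before the first callback there is nothing to compare against.
        if self.last_seen is None:
            return False
        return now - self.last_seen > self.timeout_s
```

A periodic ticker could poll `is_stale()` and treat a stale result as a hint that the association has silently dropped.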

But that's all working around the underlying wifi blob bug. We should first check whether the current IDF blob does a better job.

Regards,
Michael


On 04.01.19 at 06:15, Stephen Casner wrote:
Yesterday I found another instance where I could not ssh to OVMS nor
ping it.  This time I verified in my router status that the wifi
association with OVMS was "inactive" (down).  The iPhone app said that
server-v2 was not heard for 108 minutes.  This time I have a full log
file covering back to the previous day, which is attached.

I connected to OVMS with the serial console and the output from the
serial monitor is appended to the attached log file.  The "wifi stat"
and "net stat" commands both indicated that the wifi connection was up
when all the external indications were to the contrary.  Going back
108 minutes in the log file corresponds roughly to the first instance
of "job queue overflow detected".  Perhaps the web server can act as
the canary in the coal mine?  There is no indication that the wifi
driver signaled any problem at that time.

Diagnosing this problem is difficult because the loss of connectivity
occurs after a day or a few days of operation.  I have not tried the
current esp-idf yet; I may do that, but I'm not sure how long that
would need to operate without loss to determine success.

                                                       -- Steve
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev



-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26