A way for connection success / failure over a particular transport to be reported to the network layer. Or some way for the network layer to access historical statistics.
Some logic in the network layer to determine that wifi is unreliable (presumably based on consecutive connection failures, without any success, over a period of time).
Some logic in the network layer to determine that wifi is reliable again. Perhaps after some time (hours) on modem failover, it could switch back to wifi and try again.
The switch is done in the network layer itself, and our current default route switching mechanism should support that just fine.

I guess we'd also want a config option to enable/disable this feature.
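The requirements listed above (outcome reporting, an "unreliable" verdict from consecutive failures over time, a timed retry of wifi after hours on the modem, and an enable/disable switch) could be sketched roughly as follows. This is only an illustration under assumptions: `FailoverPolicy`, `FailoverConfig`, `Transport` and all thresholds are hypothetical names and values, not existing OVMS code, and the sketch only decides which transport to prefer; the actual default route switch would remain with the existing netmanager mechanism.

```cpp
#include <cstdint>

// Hypothetical sketch of the failover heuristic; none of these names exist
// in OVMS and the default thresholds are illustrative, not tested values.
enum class Transport { Wifi, Modem };

struct FailoverConfig {
  bool enabled = false;                          // proposed enable/disable option
  unsigned failure_threshold = 5;                // consecutive wifi failures needed
  std::uint32_t min_failure_span_s = 600;        // ...spread over at least this long
  std::uint32_t retry_wifi_after_s = 4 * 3600;   // time on modem before retrying wifi
};

class FailoverPolicy {
 public:
  explicit FailoverPolicy(const FailoverConfig& cfg) : cfg_(cfg) {}

  // Connection code reports the outcome of each attempt over a transport.
  void ReportResult(Transport t, bool success, std::uint32_t now_s) {
    if (t != Transport::Wifi) return;  // only wifi reliability is tracked here
    if (success) {
      consecutive_failures_ = 0;       // any success clears the failure streak
      return;
    }
    if (consecutive_failures_ == 0) first_failure_s_ = now_s;
    ++consecutive_failures_;
  }

  // Returns the preferred transport. When the feature is disabled it always
  // returns Wifi, i.e. it never overrides the (assumed) wifi-first default.
  Transport Select(std::uint32_t now_s) {
    if (!cfg_.enabled) return Transport::Wifi;
    if (current_ == Transport::Wifi) {
      if (consecutive_failures_ >= cfg_.failure_threshold &&
          now_s - first_failure_s_ >= cfg_.min_failure_span_s) {
        current_ = Transport::Modem;   // wifi judged unreliable
        modem_since_s_ = now_s;
      }
    } else if (now_s - modem_since_s_ >= cfg_.retry_wifi_after_s) {
      current_ = Transport::Wifi;      // give wifi another chance
      consecutive_failures_ = 0;       // require a fresh failure streak
    }
    return current_;
  }

 private:
  FailoverConfig cfg_;
  Transport current_ = Transport::Wifi;
  unsigned consecutive_failures_ = 0;
  std::uint32_t first_failure_s_ = 0;
  std::uint32_t modem_since_s_ = 0;
};
```

Requiring both a failure count and a minimum time span is one way to avoid failing over on a short burst of errors; either condition alone would also be a defensible choice.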

But I would much rather ‘fix’ the wifi in the first place. It is certainly easier to tell whether a fix is effective now, while the wifi is unreliable, than it would be with an automatic failover to the modem masking the problem.

Regards, Mark.

On 5 Jan 2019, at 2:56 PM, Stephen Casner <casner@acm.org> wrote:

Michael,

You're right that we can't depend on the websocket job queue overflow
to detect loss of wifi connectivity. If the improvements in the wifi
driver make it sufficiently robust to detect disassociation, then we
may not need to do anything else to work around that problem.

However, there may well be situations where wifi is able to associate
just fine, but there is no connectivity upstream from that point to
the server. To handle such cases I think it would be a good idea to
have a signal that both the websocket and server-v[23] can send to
netmanager to trigger switching to another path.
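A signal like the one proposed here might look roughly like this sketch. `PathHealthMonitor`, its methods, and the idea of requiring agreement from several sources are all hypothetical illustrations, not an existing OVMS API; in real firmware the reports would presumably travel over the existing event mechanism rather than direct calls.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical sketch: clients such as the websocket and server-v2/v3 report
// "no upstream connectivity", and the network manager gets a callback asking
// it to switch paths once enough distinct sources agree.
class PathHealthMonitor {
 public:
  using SwitchHandler = std::function<void(const std::string& reason)>;

  PathHealthMonitor(std::size_t sources_required, SwitchHandler on_switch)
      : sources_required_(sources_required), on_switch_(std::move(on_switch)) {}

  // A client reports that it cannot reach its server over the current path.
  void ReportNoUpstream(const std::string& source) {
    failing_.insert(source);  // repeated reports from one source count once
    if (failing_.size() < sources_required_) return;
    std::string reason;       // comma-separated list of failing sources
    for (const auto& s : failing_) {
      if (!reason.empty()) reason += ",";
      reason += s;
    }
    failing_.clear();         // start fresh on whatever path comes next
    on_switch_(reason);
  }

  // A client reports that it has reached its server again.
  void ReportUpstreamOk(const std::string& source) { failing_.erase(source); }

 private:
  std::size_t sources_required_;
  SwitchHandler on_switch_;
  std::set<std::string> failing_;
};
```

Requiring more than one source guards against switching paths just because a single server is down, but since the websocket is only active while a web client is connected, a threshold of 1 may be more practical on a typical installation.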

-- Steve

On Fri, 4 Jan 2019, Michael Balzer wrote:

Steve,

You could also have enabled event logging, but there is clearly no event from the wifi driver on the disassociation, or the netmanager would have logged
it as well.

You're probably right that the websocket job queue overflow indicates the loss, but that won't work as a general canary, as it's only active while at least one
web client is connected.

Another thing you could monitor is the signal quality, or maybe check for a lack of update callbacks (that's CSIRxCallback in esp32wifi).
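Checking for a lack of update callbacks amounts to a staleness watchdog. The sketch below is hypothetical: `CallbackWatchdog` is not OVMS code, and it assumes that a callback such as CSIRxCallback would call `Touch()` and that such callbacks arrive regularly on a healthy link, which has not been verified here.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical watchdog for "no update callbacks for too long".
class CallbackWatchdog {
 public:
  CallbackWatchdog(std::uint32_t timeout_s, std::uint32_t start_s)
      : timeout_s_(timeout_s), last_seen_s_(start_s) {}

  // Called from the callback; atomic because it may run in another task.
  void Touch(std::uint32_t now_s) { last_seen_s_.store(now_s); }

  // Called periodically; true if no callback arrived within the timeout.
  bool Stale(std::uint32_t now_s) const {
    std::uint32_t last = last_seen_s_.load();
    if (last > now_s) return false;  // touched after now_s was sampled
    return now_s - last > timeout_s_;
  }

 private:
  const std::uint32_t timeout_s_;
  std::atomic<std::uint32_t> last_seen_s_;
};
```

A periodic ticker would call `Stale()` and, if it returns true, log the condition or raise a signal to the netmanager; as noted, this only works around the underlying driver problem rather than fixing it.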

But that's all working around the underlying wifi blob bug. We should first check whether the current IDF blob does a better job.

Regards,
Michael

On 04.01.19 at 06:15, Stephen Casner wrote:

Yesterday I found another instance where I could neither ssh to OVMS
nor ping it. This time I verified in my router status that the wifi
association with OVMS was "inactive" (down). The iPhone app said that
server-v2 had not been heard from for 108 minutes. This time I have a
full log file covering back to the previous day, which is attached.

I connected to OVMS with the serial console and the output from the
serial monitor is appended to the attached log file. The "wifi stat"
and "net stat" commands both indicated that the wifi connection was up
when all the external indications were to the contrary. Going back
108 minutes in the log file corresponds roughly to the first instance
of "job queue overflow detected". Perhaps the web server can act as
the canary in the coal mine? There is no indication that the wifi
driver signaled any problem at that time.

Diagnosing this problem is difficult because the loss of connectivity
occurs after a day or a few days of operation. I have not tried the
current esp-idf yet; I may do that, but I'm not sure how long that
would need to operate without loss to determine success.

-- Steve

_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev