CAN drivers: fix & harmonize frame transmission failure handling

Michael Balzer

7 Jan 2021

8:02 a.m.

Everyone,

please pull & test the new "can-txfail-fix" branch. It's up to date and already includes the BMW i3 code.

I need feedback from users of both can1 (esp32can) and can2/3/4 (mcp2515), as changes had to be made to both drivers.

I'll quote from my commit: https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/commit/c94592a11ad2c989e65313d23a8876cf38787d70

Design goals:
- any TX either fails or succeeds; the result state is terminal
- the respective TX callback is called exactly once
- transmissions fail on reaching the error-passive bus state, and on message/bus errors while in the error-passive state
- a failed TX will be aborted (no retries after bus recovery), i.e. it is only retried in the error-active phase, until the TX error counter reaches 128 (with +8 per failed attempt, that is at most 16 attempts from an error-free state)
- reduce excessive CAN error logging
- reduce excessive interrupt load with switched-off buses

This allows the application to reliably detect a switched-off vehicle bus via the TX callback's success indicator. It also means frames are no longer held in the TX buffer or added to the TX queue while the bus is switched off. The application can now rely on a clean bus state on every reconnect, without any old queued frames being sent automatically.

A secondary benefit of aborting the transmission is that the module no longer has to handle the load of continuously triggered CAN error interrupts caused by retransmission attempts in the error-passive state.

The reasons for this were a) Steve's question on aborting transmissions / flushing the queue, and b) my new car now also switching off the bus, with the annoying effect of a frozen can1 every 2-3 days that required rebooting the module. I'm not sure yet whether the freeze issue is solved, but I haven't had it since running these changes on my module.

The other issue, transceivers resending frames queued long ago, may have caused all sorts of strange & unrepeatable problems. I remember the VW crew having problems that fell into this category.

I've verified the new MCP2515 implementation only on my workbench (with an Arduino as the CAN tester), so real-life tests are necessary.

Thanks,
Michael

-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
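The design goals above boil down to a small per-transmission state machine. The following C++ sketch models it under simplifying assumptions: the names `TxAttempt`, `TxResult` and `OnAttempt` are hypothetical and not taken from the OVMS drivers, and the controller is assumed to report the transmit error counter (TEC) after every attempt. It illustrates the terminal result, the exactly-once callback, and the abort on reaching the error-passive limit of 128.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical model of the TX failure policy; not the OVMS driver code.
enum class TxResult { Pending, Success, Failed };

class TxAttempt {
 public:
  using Callback = std::function<void(bool success)>;

  explicit TxAttempt(Callback cb) : cb_(std::move(cb)) {}

  // Assumption: the controller reports each attempt's outcome together with
  // the transmit error counter (TEC) value after that attempt.
  void OnAttempt(bool acked, uint16_t tec) {
    if (result_ != TxResult::Pending) return;  // terminal: ignore late events
    if (acked) {
      Finish(TxResult::Success);
    } else if (tec >= kErrorPassiveLimit) {
      Finish(TxResult::Failed);  // abort: no retries after bus recovery
    }
    // Otherwise the controller retries automatically (error-active phase).
  }

  TxResult result() const { return result_; }

 private:
  static constexpr uint16_t kErrorPassiveLimit = 128;

  void Finish(TxResult r) {
    assert(result_ == TxResult::Pending);  // callback must fire exactly once
    result_ = r;
    if (cb_) cb_(r == TxResult::Success);
  }

  TxResult result_ = TxResult::Pending;
  Callback cb_;
};
```

With TEC starting at 0 and rising by 8 per failed attempt, the 16th failure lifts it to 128, the attempt becomes `Failed` and the callback fires once with `false`; any later event for that attempt is ignored.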

Chris van der Meijden

7 Jan

8:57 a.m.

Hey Michael

I fetched can-txfail-fix. It compiled without any problem. Flash worked and it boots.

But it did not resolve the TX problems for me on CAN3 (VWUP T26A).

Still getting the Errors from my TX workaround:

    I (94931) v-vweup: RemoteCommandHandler
    E (94931) can: can3: intr=1 rxpkt=0 txpkt=0 errflags=0x80001080 rxerr=0 txerr=8 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
    E (94931) can: can3: intr=2 rxpkt=0 txpkt=0 errflags=0x80001080 rxerr=0 txerr=16 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
    E (94931) can: can3: intr=3 rxpkt=0 txpkt=0 errflags=0x80001080 rxerr=0 txerr=24 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
    E (94931) can: can3: intr=4 rxpkt=0 txpkt=0 errflags=0x80001080 rxerr=0 txerr=32 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
    E (94931) can: can3: intr=5 rxpkt=0 txpkt=0 errflags=0x80001080 rxerr=0 txerr=40 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
    E (94941) can: can3: intr=6 rxpkt=0 txpkt=0 errflags=0x80001080 rxerr=0 txerr=48 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
    E (94941) can: can3: intr=7 rxpkt=0 txpkt=0 errflags=0x80001080 rxerr=0 txerr=56 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
    E (94941) can: can3: intr=8 rxpkt=0 txpkt=0 errflags=0x80001080 rxerr=0 txerr=64 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
    E (94941) can: can3: intr=9 rxpkt=0 txpkt=0 errflags=0x80001080 rxerr=0 txerr=72 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
    E (94941) can: can3: intr=10 rxpkt=0 txpkt=0 errflags=0x80001080 rxerr=0 txerr=80 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
    E (94941) can: can3: intr=11 rxpkt=0 txpkt=0 errflags=0x80001080 rxerr=0 txerr=88 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
    W (94941) mcp2515: can3 EFLG: TX_Err_Warn EWARN
    E (94941) can: can3: intr=12 rxpkt=0 txpkt=0 errflags=0xa00510a0 rxerr=0 txerr=96 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
    E (94951) can: can3: intr=14 rxpkt=0 txpkt=1 errflags=0x80001080 rxerr=1 txerr=95 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
    I (96931) v-vweup: Sent Wakeup Command - stage 1

Sorry ...

Greetinx
Chris

Michael Balzer

10:53 a.m.

Chris,

thanks for testing & feedback.

You will still get multiple error log messages on a switched-off bus. I've kept it that way so that error log entries are also produced while the error counter stays below the warning (96) or passive (128) limit.

What your log shows is that after 12 failed attempts (the txerr counter rising by 8 per error up to 96, i.e. just reaching the error-warning threshold), the frame was finally transmitted successfully. You can see that from the txerr counter finally decreasing to 95. That's the way CAN works.

The rxerr=1 probably comes from some initial bit noise the transceiver sees on the bus. After that, the bus should be awake, and further transmissions should get through directly. The rxerr should decrease to 0, and the txerr should decrease by 1 for each successful transmission and eventually also reach 0. You can follow these counters with "can can3 status".

So the fix does not seem to have broken transmission in this case, which is a good result. If the fix has a benefit for your wakeup sequence, the sequence should now work more reliably, because the wakeup frame should now reliably be the first frame to be transmitted.

Regards,
Michael
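The counter values in Chris's log follow the standard CAN error counting rules. As a worked example, this C++ sketch replays them with a simplified transmit error counter (TEC) model: +8 per transmit error, -1 per successful transmission, as in ISO 11898-1. The standard's special cases (e.g. ACK errors while already error-passive) are deliberately left out, and `TecModel` and `Replay` are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified CAN transmit error counter; special cases of the standard omitted.
struct TecModel {
  uint16_t tec = 0;
  void TxError() { tec += 8; }              // each transmit error: +8
  void TxSuccess() { if (tec > 0) --tec; }  // each successful TX: -1
  bool ErrorWarning() const { return tec >= 96; }
  bool ErrorPassive() const { return tec >= 128; }
  bool BusOff() const { return tec >= 256; }
};

// Replays `failures` failed attempts followed by one success and returns
// the TEC value after each step.
std::vector<uint16_t> Replay(int failures) {
  assert(failures >= 0);
  TecModel m;
  std::vector<uint16_t> trace;
  for (int i = 0; i < failures; ++i) {
    m.TxError();
    trace.push_back(m.tec);
  }
  m.TxSuccess();
  trace.push_back(m.tec);
  return trace;
}
```

Replaying 12 failures and one success yields 8, 16, ..., 96 and then 95, matching intr=1 to intr=12 and the final txerr=95 in the log. The same arithmetic shows that the error-passive limit of 128 is reached after 16 consecutive failures from an error-free state.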

Steve Davies

10:22 p.m.

Hi Michael,

The change looks helpful, thanks. I'll try it during the course of the day.

It does prompt me to ask a question I had: on the i3, if you do something like send a lock from the key or the Connected Drive app, the OBD-II port comes alive but goes back to sleep in less than a minute.

If I have a PID that I poll infrequently, say every 120 seconds, what happens in this case? Would it be seen as "overdue" when the bus comes alive and be polled immediately, or is it a matter of luck whether the 120th tick arrives at a time when the bus is alive?

If the latter, I need to poll even things like the VIN every 10 seconds to make sure I get it before the bus goes to sleep again.

Thanks,
Steve

Steve Davies

8 Jan

12:33 a.m.

Hi Michael,

Here's the log from a test on my car with your branch. I started the car, left it for a while, then shut it down and waited until the OBD-II port first went to "not getting replies to my requests" and then to "not sending anything at all".

Hope it's helpful.

https://drive.google.com/file/d/1AavD41HCykYrn-BxQXNufu2dT_UVCXYU/view?usp=sharing

Steve

Michael Balzer

9:52 a.m.

Steve,

thanks, that's perfect. The failure handling works as designed in your case.

Regarding your question:

With the old handling, the queued frames would have been sent as soon as the bus woke up again. That's nasty, as the frames may have been meant for a specific task (e.g. some protocol step), and should not be sent to a car that has just woken up. That could cause all sorts of problems, up to queued OBD writes corrupting the car's memory. It was also nasty that the driver would then send the whole TX queue at once, flooding the bus. A vehicle could see that as malicious activity and block access.

The new handling aborts the transmission as soon as the CAN controller reaches the retransmission limit (the TX error counter reaching 128, formally the CAN error-passive state; with +8 per failed attempt that is 16 tries from an error-free state).

So you now need to "ping" the car with some simple read or session state request, and check whether a response comes in to determine if the bus is online. If you use the poller, you'll get the respective Incoming…() callback. If you don't use the poller, you can set the TX callback pointer on the frame you send. The TX callback is called with a success indicator, so you can tell whether a frame has been sent even if you don't get a response from the device.

Regards,
Michael
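One way to apply this to short wake-up windows like the i3's is to treat a successful TX callback as "bus is online" and to mark rarely polled items (such as the VIN) as due on every offline-to-online transition. The C++ sketch below is a hypothetical helper: `BusPresence` is not an OVMS class, and it assumes the caller forwards every TX callback's success indicator to it and polls its slow items whenever `TakeSlowItemsDue()` returns true. A TX success only means some node acknowledged the frame, so checking for actual responses via the poller's Incoming…() callbacks remains the more complete test.

```cpp
#include <cassert>  // used by the usage checks
#include <cstdint>

// Hypothetical helper, not part of OVMS: derives a bus-online flag from TX
// callback results and flags slow-polled items as due once per wake-up.
class BusPresence {
 public:
  // Feed the success indicator of every TX callback here.
  void OnTxResult(bool success) {
    if (success && !online_) {
      online_ = true;           // first acknowledged frame after a sleep
      ++wakeups_;
      slow_items_due_ = true;   // e.g. VIN or PIDs polled every 120 s
    } else if (!success) {
      online_ = false;          // TX aborted at error-passive: bus off/asleep
    }
  }

  // Returns true once per wake-up; the caller then polls its slow items.
  bool TakeSlowItemsDue() {
    const bool due = slow_items_due_;
    slow_items_due_ = false;
    return due;
  }

  bool online() const { return online_; }
  uint32_t wakeups() const { return wakeups_; }

 private:
  bool online_ = false;
  bool slow_items_due_ = false;
  uint32_t wakeups_ = 0;
};
```

The design choice here is edge-triggering: slow items are requested once per detected wake-up instead of on a fixed 120-second tick, so they are not missed if the bus sleeps again within a minute.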

Michael Balzer

9 Jan

5:18 a.m.

As all tests were positive and without issues, I've merged the rework into master.

I'm now considering extending the poller to allow hooking into transmission failures. Also, if a TX callback is present in a frame, I don't think we need the error log entry from the CAN framework. That would eliminate most CAN error log messages from regular poller "pings".

Regards,
Michael
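The proposed logging rule can be sketched as follows. `Frame`, `TxCallback` and `DeliverTxResult` are hypothetical stand-ins, not the OVMS CAN framework types; the sketch assumes one optional global callback and one optional per-frame callback, and covers only per-frame TX results (logging of bus error conditions is not modeled).

```cpp
#include <cassert>  // used by the usage checks
#include <functional>

// Hypothetical stand-ins, not the OVMS CAN framework types.
using TxCallback = std::function<void(bool success)>;

struct Frame {
  TxCallback callback;  // optional per-frame TX callback
};

// Delivers the TX result to any registered callbacks and returns true if
// the framework should still write its own error log entry.
bool DeliverTxResult(const Frame& frame, const TxCallback& global_cb,
                     bool success) {
  if (global_cb) global_cb(success);
  if (frame.callback) frame.callback(success);
  const bool has_callback =
      static_cast<bool>(global_cb) || static_cast<bool>(frame.callback);
  // Log only failures that no callback is able to report.
  return !success && !has_callback;
}
```

Keeping the decision in one place means a sender that registers a callback takes over responsibility for reporting its own failures, while frames sent without a callback keep the previous logging behaviour.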

Michael Balzer

9:02 a.m.

I've just pushed the two poller extensions described before:

1. If some TxCallback has been registered, either globally or on the frame, the CAN framework now won't add an extra error status log entry on TX failure. If you need to see/log these, activate a CAN logger capable of catching TX failures, e.g. "can log start monitor crtd", and look for "TX_Fail" entries. With a registered TxCallback, during normal operation you will only see error level log entries from TX failures when the CAN framework encounters a bus error condition. "Ping" frames/requests sent while the error state is active won't produce standard log entries.

2. The vehicle poller now registers a TxCallback for all requests sent, so it automatically fulfills the above condition. You can hook into that callback simply by overriding the following method:

/**
 * IncomingPollTxCallback: poller TX callback (stub, override with vehicle implementation)
 * This is called by PollerTxCallback() on TX success/failure for a poller request.
 * You can use this to detect CAN bus issues, e.g. if the car switches off the OBD port.
 *
 * ATT: this is executed in the main CAN task context. Keep it simple.
 * Complex processing here will affect overall CAN performance.
 *
 * @param bus
 *   CAN bus the current poll is done on
 * @param txid
 *   The module TX ID of the current poll
 * @param type
 *   OBD2 mode / UDS polling type, e.g. VEHICLE_POLL_TYPE_READDTC
 * @param pid
 *   PID addressed (depending on the request type, may be none / 8 bit / 16 bit)
 * @param success
 *   Frame transmission success
 */
void OvmsVehicle::IncomingPollTxCallback(canbus* bus, uint32_t txid, uint16_t type, uint16_t pid, bool success)
  {
  }

Regards, Michael

On 09.01.21 at 14:18, Michael Balzer wrote:

...

As all tests were positive and without issues, I've merged the rework into master.

I'm now considering extending the poller to allow hooking into transmission failures.
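The hook that resulted from this is the OvmsVehicle::IncomingPollTxCallback(canbus* bus, uint32_t txid, uint16_t type, uint16_t pid, bool success) override announced in the follow-up. As a sketch of how a vehicle module might use it to detect a switched-off OBD port: the canbus and OvmsVehicle types below are hypothetical minimal stand-ins so the example compiles on its own (the real ones come from the OVMS framework), MyVehicle is an illustrative name, and the limit of three consecutive failures is an arbitrary assumption. The work is kept to a counter update, since the callback runs in the main CAN task context.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the framework types, only mirroring the
// method signature quoted in this thread.
struct canbus { int id; };

class OvmsVehicle {
 public:
  virtual ~OvmsVehicle() = default;
  // Stub: called once per poller request with the TX result.
  virtual void IncomingPollTxCallback(canbus* /*bus*/, uint32_t /*txid*/,
                                      uint16_t /*type*/, uint16_t /*pid*/,
                                      bool /*success*/) {}
};

// Hypothetical vehicle module: treat several consecutive poll TX failures
// as "OBD port switched off", and any success as "port back online".
class MyVehicle : public OvmsVehicle {
 public:
  void IncomingPollTxCallback(canbus* /*bus*/, uint32_t /*txid*/,
                              uint16_t /*type*/, uint16_t /*pid*/,
                              bool success) override {
    if (success) {
      m_tx_failures = 0;
      m_obd_online = true;
      return;
    }
    if (m_tx_failures < kFailLimit) ++m_tx_failures;  // saturating counter
    if (m_tx_failures >= kFailLimit) m_obd_online = false;
  }

  bool ObdOnline() const { return m_obd_online; }

 private:
  static constexpr int kFailLimit = 3;  // assumption: tune per vehicle
  int m_tx_failures = 0;
  bool m_obd_online = true;
};
```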

Also, if a TX callback is present in a frame, I don't think we need the error log entry from the CAN framework. That would eliminate most CAN error log messages from regular poller "pings".
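The logging rule proposed here, as later described when the change was pushed, can be sketched as a small policy object. This is a hypothetical reading, not the framework code: TxFailureLogPolicy is an illustrative name, and it assumes "encounters a bus error condition" means the transition into the error state. A failure is logged if no TX callback is registered; with a callback, it is only logged when it coincides with entering the error state, so repeated "pings" during the error state stay silent.

```cpp
// Hypothetical sketch of the TX failure logging rule.
class TxFailureLogPolicy {
 public:
  // Returns true if this TX failure should produce an error-level log entry.
  // has_tx_callback: a callback is registered globally or on the frame.
  // bus_error_now:   the framework currently sees a bus error condition.
  bool OnTxFailure(bool has_tx_callback, bool bus_error_now) {
    const bool entering_error = bus_error_now && !in_error_;
    in_error_ = bus_error_now;
    if (!has_tx_callback) return true;  // nobody else will see the failure
    return entering_error;              // with a callback: log the transition only
  }

 private:
  bool in_error_ = false;  // last bus error state seen on a TX failure
};
```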

Regards, Michael

On 08.01.21 at 18:52, Michael Balzer wrote:

...
Steve,

thanks, that's perfect. The failure handling works as designed in your case.

Regarding your question:

...
It does prompt me to ask a question that I had: on the i3, if you do something like send a lock from the key or the Connected Drive app, the OBD-II comes alive but goes to sleep again in less than a minute.

If I have a PID that I poll infrequently, say every 120 seconds, what happens in this case? Would it be seen as "overdue" when the bus comes alive and be polled immediately, or is it a matter of luck whether the 120th tick arrives at a time when the bus is alive?

If the latter, I need to poll even things like the VIN every 10 seconds to make sure I get them before the bus goes to sleep again.

With the old handling, the queued frames would have been sent as soon as the bus woke up again. That's nasty, as the frames may have been meant for a specific task (e.g. some protocol step) and should not be sent to a car that has just woken up. That could cause all sorts of problems, up to queued OBD writes corrupting the car's memory. It was also nasty that the driver would then send the whole TX queue at once, flooding the bus. A vehicle could see that as malicious activity and block access.

The new handling aborts the transmission as soon as the CAN controller reaches the retransmission limit, formally the CAN error-passive state (transmit error counter at 128).
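For orientation on the numbers: under the ISO 11898-1 fault confinement rules, a transmitter increases its transmit error counter (TEC) by 8 for each error it detects while transmitting, decreases it by 1 for each successful transmission, and becomes error-passive once TEC reaches 128. The sketch below only shows that arithmetic, assuming both controllers follow the standard counting and start from the given TEC; the actual number of attempts before the abort depends on the driver and the error history. Once error-passive, an unacknowledged frame on an otherwise silent bus no longer raises TEC, so without an abort a lone node could keep retrying and raising error interrupts indefinitely.

```cpp
// Simplified ISO 11898-1 transmit error counter arithmetic (sketch).
constexpr int kTecErrorIncrement = 8;       // per transmit error, error-active
constexpr int kErrorPassiveThreshold = 128; // TEC >= 128 -> error-passive

// Consecutive failed attempts (e.g. no ACK because the bus is switched off)
// until an error-active node starting at `tec` becomes error-passive.
constexpr int FailedAttemptsUntilErrorPassive(int tec) {
  int attempts = 0;
  while (tec < kErrorPassiveThreshold) {
    tec += kTecErrorIncrement;
    ++attempts;
  }
  return attempts;
}

static_assert(FailedAttemptsUntilErrorPassive(0) == 16, "16 x 8 = 128");
```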

So you now need to "ping" the car with some simple read or session state request, and check if a response comes in to determine if the bus is online. If using the poller, you'll get a respective Incoming…() callback. If you don't use the poller, you can set the TX callback pointer on the frame you send. The TX callback is called with a success indicator, so you can know a frame has been sent even if you don't get a response from the device.
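The ping approach can be sketched as a small tracker that combines the TX callback result with a response timeout. PingTracker, PingOutcome and the timeout value are hypothetical; in a real module OnTxResult() would be fed from the frame's TX callback (or from IncomingPollTxCallback when using the poller), and OnResponse() from the matching Incoming…() handler. A successful TX only means some node acknowledged the frame, so "sent but no response" is kept apart from "TX failed".

```cpp
#include <chrono>

// Hypothetical helper, not an OVMS API.
enum class PingOutcome {
  Pending,     // waiting for the TX result or the response
  TxFailed,    // TX aborted: no node acknowledged, bus most likely off
  NoResponse,  // frame was acknowledged, but the ECU did not answer in time
  Online       // response received
};

class PingTracker {
 public:
  using Clock = std::chrono::steady_clock;

  explicit PingTracker(Clock::duration timeout) : timeout_(timeout) {}

  // Call when the ping request has been handed to the CAN driver.
  void Sent(Clock::time_point now) {
    sent_at_ = now;
    tx_done_ = tx_ok_ = rx_ = false;
  }
  // Call from the TX callback (called exactly once per frame).
  void OnTxResult(bool success) { tx_done_ = true; tx_ok_ = success; }
  // Call when the expected response has been received.
  void OnResponse() { rx_ = true; }

  PingOutcome Evaluate(Clock::time_point now) const {
    if (rx_) return PingOutcome::Online;
    if (!tx_done_) return PingOutcome::Pending;
    if (!tx_ok_) return PingOutcome::TxFailed;
    if (now - sent_at_ >= timeout_) return PingOutcome::NoResponse;
    return PingOutcome::Pending;
  }

 private:
  Clock::duration timeout_;
  Clock::time_point sent_at_{};
  bool tx_done_ = false;
  bool tx_ok_ = false;
  bool rx_ = false;
};
```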

Regards, Michael

On 08.01.21 at 09:33, Steve Davies wrote:

...
Hi Michael,

Here's the log from a test on my car with your branch.

I started the car, left it for a while, then shut it down and waited until the OBD-II first went to "not getting replies to my requests" and then to "not sending anything at all".

Hope it's helpful.

https://drive.google.com/file/d/1AavD41HCykYrn-BxQXNufu2dT_UVCXYU/view?usp=sharing

Steve

On Fri, 8 Jan 2021 at 08:22, Steve Davies <steve@telviva.co.za> wrote:

Hi Michael,

The change looks helpful, thanks. I'll try it during the course of the day.

It does prompt me to ask a question that I had: on the i3, if you do something like send a lock from the key or the Connected Drive app, the OBD-II comes alive but goes to sleep again in less than a minute.

If I have a PID that I poll infrequently, say every 120 seconds, what happens in this case? Would it be seen as "overdue" when the bus comes alive and be polled immediately, or is it a matter of luck whether the 120th tick arrives at a time when the bus is alive?

If the latter, I need to poll even things like the VIN every 10 seconds to make sure I get them before the bus goes to sleep again.

Thanks, Steve

On Thu, 7 Jan 2021 at 18:22, Michael Balzer <dexter@expeedo.de> wrote:

Everyone,

please pull & test the new "can-txfail-fix" branch. It's up to date and includes the BMW i3 code already.

I need to get feedback from users of both can1 (esp32can) & can2/3/4 (mcp2515), as changes had to be made to both drivers.

I'll quote from my commit: https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/commit/c94592a11ad2c989e65313d23a8876cf38787d70

Design goals:
- any TX can either fail or succeed; the result state is terminal
- the respective TX callback is called exactly once
- transmissions fail on reaching the error-passive bus state, and on message/bus errors while in error-passive state
- a failed TX will be aborted (no retries after bus recovery), i.e. it will only be retried during the error-active phase, until the transmit error counter reaches the error-passive limit of 128
- reduce excessive CAN error logging
- reduce excessive interrupt load with switched-off buses

As a result, the application can reliably detect a switched-off vehicle bus by the TX callback's success indicator. Frames are also no longer held in the TX buffer or added to the TX queue while the bus is switched off. The application can now rely on getting a clean bus state on every reconnect, without any old queued frames being sent automatically.

A secondary benefit of aborting the transmission is that the module no longer has to handle the load from CAN error interrupts continuously triggered by retransmission attempts in error-passive state.

The reasons for this were a) Steve's question on aborting transmissions / flushing the queue, and b) my new car now also switching off the bus, with the annoying effect of a frozen can1 every 2-3 days that required rebooting the module. I'm not sure yet whether the freeze issue is solved, but I haven't had it since running these changes on my module.

The other issue, transceivers resending frames queued long ago, may have caused all sorts of strange & unrepeatable problems. I remember the VW crew having problems that fell into this category.

I've verified the new MCP2515 implementation only on my workbench (with an Arduino as the CAN tester), so real-life tests are necessary.

Thanks, Michael

Stephen Casner

10:11 a.m.

Michael,

Do you know if anyone is tracking your changes for the Tesla Roadster to make any changes that are necessary or appropriate? I am not at all familiar with that part of the code.

-- Steve

On Sat, 9 Jan 2021, Michael Balzer wrote:

...

Michael Balzer

12:59 p.m.

Steve,

none of the changes should be critical to any existing vehicle module. I checked all CAN callback uses and found no conflicts with the changes. The new transmission failure handling is also more of a fix than a change, and the poller extensions are fully compatible & optional to use.

But I'm also not familiar with the Roadster module. Is there some special CAN or poller use in the Roadster code you're concerned about?

According to the git log, Mark authored the last changes to the Roadster module. Mark?

Regards, Michael

On 09.01.21 at 19:11, Stephen Casner wrote:

...

Stephen Casner

2:21 p.m.

On Sat, 9 Jan 2021, Michael Balzer wrote:

...

But I'm also not familiar with the Roadster module. Is there some special CAN or poller use in the Roadster code you're concerned about?

No, because I don't know the code. Just wondering whether I should be concerned.

-- Steve

Mark Webb-Johnson

4:11 p.m.

I’m looking after the Roadster module. I don’t have the car anymore, but I do have a VMS+VDS on the bench (which covers most of the functionality). No problems seen with the CAN bus changes. Roadster CAN should be on all the time.

The best thing Roadster users here can do is to put their module on EDGE, or at least EAP, so we can catch problems before they go out widely.

Regards, Mark

P.S. I’ve been quiet here lately due to crazy day job pressures: trying to get a new cloud services project launched with a distributed development team during COVID.

...

On 10 Jan 2021, at 2:12 AM, Stephen Casner <casner@acm.org> wrote:

Michael,

Do you know if anyone is tracking your changes for the Tesla Roadster to make any changes that are necessary or appropriate? I am not at all familiar with that part of the code.

-- Steve

...
On Sat, 9 Jan 2021, Michael Balzer wrote:

I've just pushed the two poller extensions described before:

1. If some TxCallback has been registered, either globally or on the frame, the CAN framework now won't add an extra error status log entry on TX failure. If you need to see/log these, activate a CAN logger capable of catching TX failures, e.g. "can log start monitor crtd", and look for "TX_Fail" entries. With a registered TxCallback, during normal operation you will only see error level log entries from TX failures when the CAN framework encounters a bus error condition. "Ping" frames/requests sent while the error state is active won't produce standard log entries.

2. The vehicle poller now registers a TxCallback for all requests sent, so automatically fulfills the above condition. You can hook into that callback simply by overriding the following method:

/** * IncomingPollTxCallback: poller TX callback (stub, override with vehicle implementation) * This is called by PollerTxCallback() on TX success/failure for a poller request. * You can use this to detect CAN bus issues, e.g. if the car switches off the OBD port. * * ATT: this is executed in the main CAN task context. Keep it simple. * Complex processing here will affect overall CAN performance. * * @param bus * CAN bus the current poll is done on * @param txid * The module TX ID of the current poll * @param type * OBD2 mode / UDS polling type, e.g. VEHICLE_POLL_TYPE_READDTC * @param pid * PID addressed (depending on the request type, may be none / 8 bit / 16 bit) * @param success * Frame transmission success */ void OvmsVehicle::IncomingPollTxCallback(canbus* bus, uint32_t txid, uint16_t type, uint16_t pid, bool success) { }

Regards, Michael

...
Am 09.01.21 um 14:18 schrieb Michael Balzer: As all tests were positive and without issues, I've merged the rework into master.

I now consider extending the poller to allow to hook into transmission failures.

Also, if a TX callback is present in a frame, I don't think we need the error log entry from the CAN framework. That would eliminate most CAN error log messages from regular poller "pings".

Regards, Michael

Am 08.01.21 um 18:52 schrieb Michael Balzer:

...
Steve,

thanks, that's perfect. The failure handling works as designed in your case.

Regarding your question:

...
It does prompt me to ask a question that I had - On the i3, if you do something like send a lock from the key or the Connected Drive APP then the OBD-II comes alive but goes asleep again in less than a minute.

if I have a PID that I poll infrequently - say every 120 seconds. What happens in this case? Would they be seen as "overdue" when the bus comes alive and polled immediately, or is it a matter of luck if the 120th tick arrives at a time when the bus is alive?

If the latter I need to poll even things like the VIN every 10 seconds to make sure I get it before the bus goes to sleep again.

With the old handling, the queued frames would have get sent as soon as the bus got awake again. That's nasty, as the frames may have been for a specific task (e.g. some protocol part), and should not be sent to a just woken up car. That could produce any sort of problem up to queued OBD writes corrupting the car memory. It was also nasty the driver would then send the whole TX queue at once, flooding the bus. A vehicle could see that as a malicious activity and block access.

The new handling will abort the transmission as soon as the CAN controller runs into the retransmission limit (128 tries, formally CAN error-passive mode).

So you now need to "ping" the car with some simple read or session state request, and check if a response comes in to determine if the bus is online. If using the poller, you'll get a respective Incoming...() callback. If you don't use the poller, you can set the TX callback pointer on the frame you send. The TX callback is called with a success indicator, so you can know a frame has been sent even if you don't get a response from the device.

Regards, Michael

Am 08.01.21 um 09:33 schrieb Steve Davies:

...
Hi Michael,

Here's the log from a test on my car with your branch

I started the car, left it for a while, then shut it down and waited until the OBD-II first went to "not getting replies to my requests" and then to "not sending anything at all".

Hope its helpful.

https://drive.google.com/file/d/1AavD41HCykYrn-BxQXNufu2dT_UVCXYU/view?usp=s... <https://drive.google.com/file/d/1AavD41HCykYrn-BxQXNufu2dT_UVCXYU/view?usp=sharing>

Steve

On Fri, 8 Jan 2021 at 08:22, Steve Davies <steve@telviva.co.za <mailto:steve@telviva.co.za>> wrote:

Hi Michael,

The change looks helpful, thanks. I'll try it during the course of the day.

It does prompt me to ask a question that I had - On the i3, if you do something like send a lock from the key or the Connected Drive APP then the OBD-II comes alive but goes asleep again in less than a minute.

if I have a PID that I poll infrequently - say every 120 seconds. What happens in this case? Would they be seen as "overdue" when the bus comes alive and polled immediately, or is it a matter of luck if the 120th tick arrives at a time when the bus is alive?

If the latter, I need to poll even things like the VIN every 10 seconds to make sure I get it before the bus goes to sleep again.

Thanks, Steve

On Thu, 7 Jan 2021 at 18:22, Michael Balzer <dexter@expeedo.de> wrote:

Everyone,

please pull & test the new "can-txfail-fix" branch. It's up to date and includes the BMW i3 code already.

I need to get feedback from users of both can1 (esp32can) & can2/3/4 (mcp2515), as changes had to be made to both drivers.

I'll quote from my commit: https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/commit/c94592a11ad2c989e65313d23a8876cf38787d70

Design goals:
- any TX can either fail or succeed, the result state is terminal
- the respective TX callback is called exactly once
- transmissions fail on reaching the error-passive bus state and on message/bus errors while in error-passive state
- a failed TX will be aborted (no retries after bus recovery), i.e. will be retried at most 128 times (in error-active phase)
- reduce excessive CAN error logging
- reduce excessive interrupt load with switched-off buses

This results in the application being able to reliably detect a switched-off vehicle bus by the TX callback's success indicator. It also results in frames no longer being held in the TX buffer or added to the TX queue when the bus is switched off. The application can now rely on getting a clean bus state on every reconnect, without any queued old frames to be sent automatically.
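The TX-queue side of this behaviour can be modelled as a single TX buffer plus a software queue: once the buffered frame fails, every queued frame is failed immediately, so nothing old is left to be sent when the bus wakes up. This is a minimal sketch, assuming one hardware TX buffer; `TxPath`, `Write()` and `OnTxDone()` are hypothetical names, not the esp32can or mcp2515 driver code.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical model of one TX buffer plus a software TX queue.
// Names are illustrative only and do not match the OVMS drivers.
using TxCallback = std::function<void(uint32_t id, bool success)>;

struct Frame {
  uint32_t id;
  TxCallback callback;
};

class TxPath {
 public:
  // Puts the frame into the TX buffer if it is free, otherwise queues it.
  // Returns true if transmission started immediately.
  bool Write(Frame f) {
    if (!buffer_) {
      buffer_ = std::move(f);
      return true;
    }
    queue_.push_back(std::move(f));
    return false;
  }

  // Called once with the terminal result of the frame in the TX buffer.
  void OnTxDone(bool success) {
    assert(buffer_ && "TX result reported without a buffered frame");
    if (!buffer_) return;  // defensive guard when asserts are disabled
    Frame done = std::move(*buffer_);
    buffer_.reset();
    std::deque<Frame> dropped;
    if (success) {
      if (!queue_.empty()) {  // feed the next queued frame to the buffer
        buffer_ = std::move(queue_.front());
        queue_.pop_front();
      }
    } else {
      dropped.swap(queue_);  // abort: nothing waits for the bus to come back
    }
    if (done.callback) done.callback(done.id, success);
    for (Frame& f : dropped)
      if (f.callback) f.callback(f.id, false);
  }

  bool Idle() const { return !buffer_ && queue_.empty(); }

 private:
  std::optional<Frame> buffer_;
  std::deque<Frame> queue_;
};
```

Swapping the queue out before invoking any callback means a callback that immediately writes a new frame (e.g. a fresh ping) is not flushed by the same abort.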

A secondary benefit of aborting the transmission is that the module doesn't need to handle the load from the CAN error interrupts continuously triggered by retransmission attempts in error-passive state.

The reasons for this were a) Steve's question on aborting transmissions / flushing the queue and b) my new car now also switching off the bus, with the annoying effect of a frozen can1 every 2-3 days that required rebooting the module. I'm not sure yet whether the freeze issue is solved, but I haven't had it since running these changes on my module.

The other issue, the transceivers resending frames queued long ago, may have caused all sorts of strange & unrepeatable issues. I remember the VW crew having problems that fell into this category.

I've verified the new MCP2515 implementation only on my workbench (with an Arduino as the CAN tester), so real life tests are necessary.

Thanks, Michael

-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Phone 02333 / 833 5735 * Mobile 0176 / 206 989 26

OvmsDev mailing list OvmsDev@lists.openvehicles.com http://lists.openvehicles.com/mailman/listinfo/ovmsdev


11 comments


participants (5)

Chris van der Meijden
Mark Webb-Johnson
Michael Balzer
Stephen Casner
Steve Davies