[Ovmsdev] CAN drivers: fix & harmonize frame transmission failure handling
Stephen Casner
casner at acm.org
Sun Jan 10 02:11:57 HKT 2021
Michael,
Do you know if anyone is tracking your changes on behalf of the Tesla
Roadster code, to make whatever updates are necessary or appropriate
there? I am not at all familiar with that part of the code.
-- Steve
On Sat, 9 Jan 2021, Michael Balzer wrote:
> I've just pushed the two poller extensions described before:
>
> 1. If some TxCallback has been registered, either globally or on the
> frame, the CAN framework now won't add an extra error status log
> entry on TX failure. If you need to see/log these, activate a CAN
> logger capable of catching TX failures, e.g. "can log start monitor
> crtd", and look for "TX_Fail" entries. With a registered TxCallback,
> during normal operation you will only see error level log entries
> from TX failures when the CAN framework encounters a bus error
> condition. "Ping" frames/requests sent while the error state is
> active won't produce standard log entries.
>
> 2. The vehicle poller now registers a TxCallback for all requests sent,
> so it automatically fulfills the above condition. You can hook into
> that callback simply by overriding the following method:
>
> /**
>  * IncomingPollTxCallback: poller TX callback (stub, override with
>  *   vehicle implementation)
>  * This is called by PollerTxCallback() on TX success/failure for
>  *   a poller request.
>  * You can use this to detect CAN bus issues, e.g. if the car
>  *   switches off the OBD port.
>  *
>  * ATT: this is executed in the main CAN task context. Keep it simple.
>  *   Complex processing here will affect overall CAN performance.
>  *
>  * @param bus
>  *   CAN bus the current poll is done on
>  * @param txid
>  *   The module TX ID of the current poll
>  * @param type
>  *   OBD2 mode / UDS polling type, e.g. VEHICLE_POLL_TYPE_READDTC
>  * @param pid
>  *   PID addressed (depending on the request type, may be none / 8
>  *   bit / 16 bit)
>  * @param success
>  *   Frame transmission success
>  */
> void OvmsVehicle::IncomingPollTxCallback(canbus* bus, uint32_t txid,
>     uint16_t type, uint16_t pid, bool success)
> {
> }
>
>
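For illustration, a vehicle module could use this hook roughly as follows. This is only a sketch: `canbus` and `OvmsVehicle` are reduced to hypothetical stand-ins so the example compiles outside the firmware, and `MyVehicle`, `bus_online` and `tx_fail_count` are made-up names. The override only records the result, in line with the advice to keep work in the CAN task context minimal.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins so the sketch compiles on its own;
// in the firmware these come from the OVMS CAN and vehicle headers.
struct canbus {};

class OvmsVehicle {
public:
  virtual ~OvmsVehicle() = default;
  // Same signature as the stub quoted above (default: do nothing).
  virtual void IncomingPollTxCallback(canbus* bus, uint32_t txid,
                                      uint16_t type, uint16_t pid, bool success) {
    (void)bus; (void)txid; (void)type; (void)pid; (void)success;
  }
};

// Hypothetical vehicle class: the callback only updates two members.
class MyVehicle : public OvmsVehicle {
public:
  bool bus_online = false;     // result of the most recent poll transmission
  uint32_t tx_fail_count = 0;  // consecutive failed poll transmissions

  void IncomingPollTxCallback(canbus* bus, uint32_t txid,
                              uint16_t type, uint16_t pid, bool success) override {
    (void)bus; (void)txid; (void)type; (void)pid;
    if (success) {
      bus_online = true;
      tx_fail_count = 0;
    } else {
      bus_online = false;
      ++tx_fail_count;
    }
  }
};

// Usage: simulate the car switching off the OBD port, then waking up again.
// The txid/type/pid values are arbitrary example numbers.
inline void example_usage() {
  MyVehicle car;
  canbus bus;
  car.IncomingPollTxCallback(&bus, 0x7e0, 0x22, 0xf190, false);
  assert(!car.bus_online && car.tx_fail_count == 1);
  car.IncomingPollTxCallback(&bus, 0x7e0, 0x22, 0xf190, true);
  assert(car.bus_online && car.tx_fail_count == 0);
}
```

Any heavier reaction to a bus going offline could then be done outside the callback, based on these members.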
> Regards,
> Michael
>
>
> Am 09.01.21 um 14:18 schrieb Michael Balzer:
> > As all tests were positive and without issues, I've merged the rework into
> > master.
> >
> > I'm now considering extending the poller so that vehicles can hook into
> > transmission failures.
> >
> > Also, if a TX callback is present in a frame, I don't think we need the
> > error log entry from the CAN framework. That would eliminate most CAN error
> > log messages from regular poller "pings".
> >
> > Regards,
> > Michael
> >
> >
> > Am 08.01.21 um 18:52 schrieb Michael Balzer:
> > > Steve,
> > >
> > > thanks, that's perfect. The failure handling works as designed in your
> > > case.
> > >
> > > Regarding your question:
> > > > It does prompt me to ask a question that I had - On the i3, if you do
> > > > something like send a lock from the key or the Connected Drive APP then
> > > > the OBD-II comes alive but goes asleep again in less than a minute.
> > > >
> > > > If I have a PID that I poll infrequently, say every 120 seconds, what
> > > > happens in this case? Would it be seen as "overdue" when the bus
> > > > comes alive and polled immediately, or is it a matter of luck whether the
> > > > 120th tick arrives at a time when the bus is alive?
> > > >
> > > > If the latter I need to poll even things like the VIN every 10 seconds
> > > > to make sure I get it before the bus goes to sleep again.
> > >
> > > With the old handling, the queued frames would have been sent as soon as
> > > the bus woke up again. That's nasty, as the frames may have been meant for a
> > > specific task (e.g. some protocol part), and should not be sent to a car that
> > > has just woken up. That could produce any sort of problem, up to queued OBD
> > > writes corrupting the car's memory. It was also nasty that the driver would
> > > then send the whole TX queue at once, flooding the bus. A vehicle could see
> > > that as malicious activity and block access.
> > >
> > > The new handling will abort the transmission as soon as the CAN controller
> > > runs into the retransmission limit (128 tries, formally CAN error-passive
> > > mode).
> > >
> > > So you now need to "ping" the car with some simple read or session state
> > > request, and check if a response comes in to determine if the bus is
> > > online. If using the poller, you'll get a respective Incoming...()
> > > callback. If you don't use the poller, you can set the TX callback pointer
> > > on the frame you send. The TX callback is called with a success indicator,
> > > so you can know a frame has been sent even if you don't get a response
> > > from the device.
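A rough sketch of that "ping" approach without the poller is shown here. All names are hypothetical stand-ins rather than the OVMS types: `Frame` and `TxCallback` only mimic a frame carrying an optional TX callback pointer, `FakeDriverSend` stands in for the driver reporting the terminal TX result, and `BusWatch` is a made-up helper. It distinguishes "frame sent" from "car answered" and marks a one-shot read (the VIN, as in Steve's question) as due when the bus comes alive.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-ins, not the OVMS types.
struct Frame;
using TxCallback = std::function<void(const Frame&, bool success)>;

struct Frame {
  uint32_t id = 0;
  uint8_t data[8] = {};
  TxCallback* callback = nullptr;  // optional per-frame TX callback
};

// Tracks whether the car answers pings and flags a one-shot read on wake-up.
class BusWatch {
public:
  enum class State { Offline, TxOk, Online };

  State state = State::Offline;
  bool vin_due = false;  // set when the bus has just come alive

  // Terminal TX result of a ping frame.
  void OnPingTx(bool success) {
    if (!success) state = State::Offline;
    else if (state == State::Offline) state = State::TxOk;
  }

  // A response to the ping has been received.
  void OnPingResponse() {
    if (state != State::Online) vin_due = true;
    state = State::Online;
  }
};

// Hypothetical driver stand-in: reports the TX result through the frame callback.
inline void FakeDriverSend(const Frame& frame, bool bus_powered) {
  if (frame.callback) (*frame.callback)(frame, bus_powered);
}

// Usage: bus off, then the frame is sent, then the car replies.
inline void example_usage() {
  BusWatch watch;
  TxCallback cb = [&watch](const Frame&, bool ok) { watch.OnPingTx(ok); };
  Frame ping;
  ping.id = 0x7df;  // arbitrary example ID
  ping.callback = &cb;

  FakeDriverSend(ping, false);
  assert(watch.state == BusWatch::State::Offline);
  FakeDriverSend(ping, true);
  assert(watch.state == BusWatch::State::TxOk);
  watch.OnPingResponse();
  assert(watch.state == BusWatch::State::Online && watch.vin_due);
}
```

With something like this, infrequent reads would not depend on their tick happening to fall into the short window in which the bus is awake.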
> > >
> > > Regards,
> > > Michael
> > >
> > >
> > > Am 08.01.21 um 09:33 schrieb Steve Davies:
> > > > Hi Michael,
> > > >
> > > > Here's the log from a test on my car with your branch.
> > > >
> > > > I started the car, left it for a while, then shut it down and waited
> > > > until the OBD-II first went to "not getting replies to my requests" and
> > > > then to "not sending anything at all".
> > > >
> > > > Hope it's helpful.
> > > >
> > > > https://drive.google.com/file/d/1AavD41HCykYrn-BxQXNufu2dT_UVCXYU/view?usp=sharing
> > > >
> > > > Steve
> > > >
> > > >
> > > > On Fri, 8 Jan 2021 at 08:22, Steve Davies <steve at telviva.co.za> wrote:
> > > >
> > > > Hi Michael,
> > > >
> > > > The change looks helpful, thanks. I'll try it during the course
> > > > of the day.
> > > >
> > > > It does prompt me to ask a question that I had - On the i3, if
> > > > you do something like send a lock from the key or the Connected
> > > > Drive APP then the OBD-II comes alive but goes asleep again in
> > > > less than a minute.
> > > >
> > > > If I have a PID that I poll infrequently, say every 120
> > > > seconds, what happens in this case? Would it be seen as
> > > > "overdue" when the bus comes alive and polled immediately, or is
> > > > it a matter of luck whether the 120th tick arrives at a time when
> > > > the bus is alive?
> > > >
> > > > If the latter I need to poll even things like the VIN every 10
> > > > seconds to make sure I get it before the bus goes to sleep again.
> > > >
> > > > Thanks,
> > > > Steve
> > > >
> > > >
> > > > On Thu, 7 Jan 2021 at 18:22, Michael Balzer <dexter at expeedo.de> wrote:
> > > >
> > > > Everyone,
> > > >
> > > > please pull & test the new "can-txfail-fix" branch. It's up
> > > > to date and includes the BMW i3 code already.
> > > >
> > > > I need to get feedback from users of both can1 (esp32can) &
> > > > can2/3/4 (mcp2515), as changes had to be made to both drivers.
> > > >
> > > > I'll quote from my commit:
> > > > https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/commit/c94592a11ad2c989e65313d23a8876cf38787d70
> > > >
> > > > Design goals:
> > > > - any TX can either fail or succeed, the result state is
> > > > terminal
> > > > - the respective TX callback is called exactly once
> > > > - transmissions fail on reaching the error-passive bus state
> > > > and on message/bus errors while in error-passive state
> > > > - a failed TX will be aborted (no retries after bus recovery),
> > > > i.e. will be retried at most 128 times (in error-active
> > > > phase)
> > > > - reduce excessive CAN error logging
> > > > - reduce excessive interrupt load with switched-off buses
> > > >
> > > > This results in the application being able to reliably detect a
> > > > switched-off vehicle bus by the TX callback's success indicator.
> > > > It also results in frames no longer being held in the TX buffer
> > > > or added to the TX queue when the bus is switched off. The
> > > > application can now rely on getting a clean bus state on every
> > > > reconnect, without any queued old frames to be sent
> > > > automatically.
> > > >
> > > > A secondary benefit of aborting the transmission is that the module
> > > > no longer needs to handle the load from the CAN error interrupts
> > > > continuously triggered by retransmission attempts in error-passive
> > > > state.
> > > >
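The first design goals (terminal result, callback called exactly once, abort instead of retry after recovery) can be modelled in a few lines. This is a simplified sketch and not the driver code: `TransmitWithAbort`, `try_once` and `on_done` are hypothetical names, and the limit of 128 is simply taken from the commit text, whereas in the real drivers the abort follows from the CAN controller reaching the error-passive state.

```cpp
#include <cassert>
#include <functional>

// Retry limit as stated in the commit text (assumption for this model).
constexpr int kMaxTries = 128;

// Simplified model of one transmission:
//   try_once: performs one attempt, returns true if the frame was sent
//   on_done:  TX callback, invoked exactly once with the terminal result
// Returns the number of attempts made.
inline int TransmitWithAbort(const std::function<bool()>& try_once,
                             const std::function<void(bool)>& on_done) {
  assert(try_once && on_done);
  for (int attempt = 1; attempt <= kMaxTries; ++attempt) {
    if (try_once()) {
      on_done(true);  // terminal state: success
      return attempt;
    }
  }
  // Terminal state: failure. The frame is dropped here, so nothing is left
  // in a buffer or queue to be sent when the bus comes back.
  on_done(false);
  return kMaxTries;
}
```

On a switched-off bus every attempt fails, so the callback reports failure once and the caller decides whether to send a fresh frame later.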
> > > >
> > > > Reason for this was a) Steve's question on aborting
> > > > transmissions / flushing the queue and b) my new car now
> > > > also switching off the bus, with the annoying effect of a
> > > > frozen can1 every 2-3 days, needing to reboot the module.
> > > > I'm not sure yet if the freeze issue is solved, but I
> > > > haven't had it since running these changes on my module.
> > > >
> > > > The other issue of the transceivers resending frames queued
> > > > long ago may have caused all sorts of strange & unrepeatable
> > > > issues. I remember the VW crew having problems that fell
> > > > into this category.
> > > >
> > > > I've verified the new MCP2515 implementation only on my
> > > > workbench (with an Arduino as the CAN tester), so real life
> > > > tests are necessary.
> > > >
> > > > Thanks,
> > > > Michael
> > > >
> > > > -- Michael Balzer * Helkenberger Weg 9 * D-58256
> > > > Ennepetal
> > > > Fon 02333 / 833 5735 * Handy 0176 / 206 989 26