[Ovmsdev] CAN drivers: fix & harmonize frame transmission failure handling

Mark Webb-Johnson mark at webb-johnson.net
Sun Jan 10 08:11:21 HKT 2021


I’m looking after the roadster module. I don’t have the car anymore, but do have a VMS+VDS on the bench (which covers most of the functionality).

No problems seen with the CAN bus changes. Roadster CAN should be on all the time.

The best thing roadster users here can do is to put their module on EDGE, or at least EAP, so we can catch problems before they spread widely.

Regards, Mark

P.S. I’ve been quiet here lately due to crazy day job pressures - trying to get a new cloud services project launched with a distributed development team during COVID.

> On 10 Jan 2021, at 2:12 AM, Stephen Casner <casner at acm.org> wrote:
> 
> Michael,
> 
> Do you know if anyone is tracking your changes for the Tesla Roadster
> to make any changes that are necessary or appropriate?  I am not at
> all familiar with that part of the code.
> 
>                                                        -- Steve
> 
>> On Sat, 9 Jan 2021, Michael Balzer wrote:
>> 
>> I've just pushed the two poller extensions described before:
>> 
>> 1. If some TxCallback has been registered, either globally or on the
>>   frame, the CAN framework now won't add an extra error status log
>>   entry on TX failure. If you need to see/log these, activate a CAN
>>   logger capable of catching TX failures, e.g. "can log start monitor
>>   crtd", and look for "TX_Fail" entries. With a registered TxCallback,
>>   during normal operation you will only see error level log entries
>>   from TX failures when the CAN framework encounters a bus error
>>   condition. "Ping" frames/requests sent while the error state is
>>   active won't produce standard log entries.
>> 
>> 2. The vehicle poller now registers a TxCallback for all requests sent,
>>   so automatically fulfills the above condition. You can hook into
>>   that callback simply by overriding the following method:
>> 
>>   /**
>>     * IncomingPollTxCallback: poller TX callback (stub, override with
>>     *  vehicle implementation)
>>     *  This is called by PollerTxCallback() on TX success/failure for
>>     *  a poller request.
>>     *  You can use this to detect CAN bus issues, e.g. if the car
>>     *  switches off the OBD port.
>>     *
>>     *  ATT: this is executed in the main CAN task context. Keep it simple.
>>     *    Complex processing here will affect overall CAN performance.
>>     *
>>     *  @param bus
>>     *    CAN bus the current poll is done on
>>     *  @param txid
>>     *    The module TX ID of the current poll
>>     *  @param type
>>     *    OBD2 mode / UDS polling type, e.g. VEHICLE_POLL_TYPE_READDTC
>>     *  @param pid
>>     *    PID addressed (depending on the request type, may be none / 8
>>     *    bit / 16 bit)
>>     *  @param success
>>     *    Frame transmission success
>>     */
>>   void OvmsVehicle::IncomingPollTxCallback(canbus* bus, uint32_t txid,
>>       uint16_t type, uint16_t pid, bool success)
>>      {
>>      }
>> 
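A minimal sketch of how a vehicle module might override the callback quoted above to track whether the bus is alive. The `canbus` and `OvmsVehicle` definitions here are reduced stand-ins so the example compiles on its own; the class `OvmsVehicleExample`, the threshold of three failures, and the `BusOnline()` accessor are hypothetical names for illustration, and the sketch assumes the real method can be overridden as a virtual member.

```cpp
#include <cstdint>

// Stand-in for the OVMS CAN bus type (assumption, reduced for this sketch).
struct canbus { const char* name; };

// Stand-in base class exposing only the callback stub from the message above.
class OvmsVehicle {
public:
  virtual ~OvmsVehicle() = default;
  virtual void IncomingPollTxCallback(canbus*, uint32_t, uint16_t,
                                      uint16_t, bool) {}
};

// Hypothetical vehicle class: considers the bus offline after a few
// consecutive failed poll transmissions, and online again on the first success.
class OvmsVehicleExample : public OvmsVehicle {
public:
  static constexpr int kOfflineThreshold = 3;  // assumption, tune per vehicle

  void IncomingPollTxCallback(canbus* /*bus*/, uint32_t /*txid*/,
                              uint16_t /*type*/, uint16_t /*pid*/,
                              bool success) override {
    // Executed in the CAN task context: only touch counters and flags here.
    if (success) {
      m_txfail = 0;
      m_online = true;
    } else if (++m_txfail >= kOfflineThreshold) {
      m_online = false;
    }
  }

  bool BusOnline() const { return m_online; }

private:
  int  m_txfail = 0;     // consecutive TX failures seen so far
  bool m_online = true;  // last known bus state
};
```

Keeping the callback down to a counter and a flag follows the "keep it simple" note in the comment; any heavier reaction (changing poll state, logging) would be better done from a ticker or another task that reads `BusOnline()`.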
>> 
>> Regards,
>> Michael
>> 
>> 
>>> On 9 Jan 2021 at 14:18, Michael Balzer wrote:
>>> As all tests were positive and without issues, I've merged the rework into
>>> master.
>>> 
>>> I'm now considering extending the poller to allow hooking into
>>> transmission failures.
>>> 
>>> Also, if a TX callback is present in a frame, I don't think we need the
>>> error log entry from the CAN framework. That would eliminate most CAN error
>>> log messages from regular poller "pings".
>>> 
>>> Regards,
>>> Michael
>>> 
>>> 
>>> On 8 Jan 2021 at 18:52, Michael Balzer wrote:
>>>> Steve,
>>>> 
>>>> thanks, that's perfect. The failure handling works as designed in your
>>>> case.
>>>> 
>>>> Regarding your question:
>>>>> It does prompt me to ask a question that I had - On the i3, if you do
>>>>> something like send a lock from the key or the Connected Drive APP then
>>>>> the OBD-II comes alive but goes asleep again in less than a minute.
>>>>> 
>>>>> if I have a PID that I poll infrequently - say every 120 seconds.  What
>>>>> happens in this case?  Would they be seen as "overdue" when the bus
>>>>> comes alive and polled immediately, or is it a matter of luck if the
>>>>> 120th tick arrives at a time when the bus is alive?
>>>>> 
>>>>> If the latter I need to poll even things like the VIN every 10 seconds
>>>>> to make sure I get it before the bus goes to sleep again.
>>>> 
>>>> With the old handling, the queued frames would have been sent as soon as
>>>> the bus woke up again. That's nasty, as the frames may have been meant for
>>>> a specific task (e.g. some protocol part) and should not be sent to a car
>>>> that has just woken up. That could cause all sorts of problems, up to
>>>> queued OBD writes corrupting the car's memory. It was also nasty that the
>>>> driver would then send the whole TX queue at once, flooding the bus. A
>>>> vehicle could see that as malicious activity and block access.
>>>> 
>>>> The new handling will abort the transmission as soon as the CAN controller
>>>> runs into the retransmission limit (128 tries, formally CAN error-passive
>>>> mode).
>>>> 
>>>> So you now need to "ping" the car with some simple read or session state
>>>> request, and check if a response comes in to determine if the bus is
>>>> online. If using the poller, you'll get a respective Incoming...()
>>>> callback. If you don't use the poller, you can set the TX callback pointer
>>>> on the frame you send. The TX callback is called with a success indicator,
>>>> so you can know a frame has been sent even if you don't get a response
>>>> from the device.
>>>> 
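A minimal sketch of the non-poller case described above: attach a TX callback to a "ping" frame and use its success indicator as the bus state. The `CAN_frame_t` and `CanFrameCallback` definitions are simplified stand-ins assumed for this example, not the framework's actual declarations. `ReportTxResult()` is a hypothetical placeholder for the driver delivering the terminal TX result, and UDS TesterPresent (service 0x3E) is only one possible choice of a harmless request.

```cpp
#include <cstdint>
#include <functional>

// Simplified stand-ins (assumptions) for the frame and callback types.
struct CAN_frame_t;
using CanFrameCallback = std::function<void(const CAN_frame_t*, bool)>;

struct CAN_frame_t {
  uint32_t MsgID = 0;
  uint8_t  length = 0;
  uint8_t  data[8] = {};
  CanFrameCallback* callback = nullptr;  // called once with the TX result
};

// Hypothetical placeholder for the driver: reports the terminal TX result
// (sent or aborted) to the callback attached to the frame, if any.
inline void ReportTxResult(const CAN_frame_t& frame, bool success) {
  if (frame.callback) (*frame.callback)(&frame, success);
}

// Hypothetical helper that builds ping frames and tracks the bus state.
struct BusMonitor {
  bool online = false;

  // The callback captures `this`, so the monitor must not be copied.
  CanFrameCallback on_tx = [this](const CAN_frame_t*, bool success) {
    online = success;
  };

  BusMonitor() = default;
  BusMonitor(const BusMonitor&) = delete;
  BusMonitor& operator=(const BusMonitor&) = delete;

  // Build a TesterPresent request as an ISO-TP single frame.
  CAN_frame_t MakePing(uint32_t txid) {
    CAN_frame_t f;
    f.MsgID = txid;
    f.length = 8;
    f.data[0] = 0x02;  // single frame, 2 payload bytes
    f.data[1] = 0x3E;  // UDS TesterPresent
    f.data[2] = 0x00;  // subfunction: response requested
    f.callback = &on_tx;
    return f;
  }
};
```

A successful TX only shows that some node acknowledged the frame; whether the addressed device actually answered still has to be checked on the RX side.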
>>>> Regards,
>>>> Michael
>>>> 
>>>> 
>>>> On 8 Jan 2021 at 09:33, Steve Davies wrote:
>>>>> Hi Michael,
>>>>> 
>>>>> Here's the log from a test on my car with your branch
>>>>> 
>>>>> I started the car, left it for a while, then shut it down and waited
>>>>> until the OBD-II first went to "not getting replies to my requests" and
>>>>> then to "not sending anything at all".
>>>>> 
>>>>> Hope it's helpful.
>>>>> 
>>>>> https://drive.google.com/file/d/1AavD41HCykYrn-BxQXNufu2dT_UVCXYU/view?usp=sharing
>>>>> 
>>>>> Steve
>>>>> 
>>>>> 
>>>>> On Fri, 8 Jan 2021 at 08:22, Steve Davies <steve at telviva.co.za> wrote:
>>>>> 
>>>>>    Hi Michael,
>>>>> 
>>>>>    The change looks helpful, thanks.  I'll try it during the course
>>>>>    of the day.
>>>>> 
>>>>>    It does prompt me to ask a question that I had - On the i3, if
>>>>>    you do something like send a lock from the key or the Connected
>>>>>    Drive APP then the OBD-II comes alive but goes asleep again in
>>>>>    less than a minute.
>>>>> 
>>>>>    if I have a PID that I poll infrequently - say every 120
>>>>>    seconds.  What happens in this case?  Would they be seen as
>>>>>    "overdue" when the bus comes alive and polled immediately, or is
>>>>>    it a matter of luck if the 120th tick arrives at a time when the
>>>>>    bus is alive?
>>>>> 
>>>>>    If the latter I need to poll even things like the VIN every 10
>>>>>    seconds to make sure I get it before the bus goes to sleep again.
>>>>> 
>>>>>    Thanks,
>>>>>    Steve
>>>>> 
>>>>> 
>>>>>    On Thu, 7 Jan 2021 at 18:22, Michael Balzer <dexter at expeedo.de> wrote:
>>>>> 
>>>>>        Everyone,
>>>>> 
>>>>>        please pull & test the new "can-txfail-fix" branch. It's up
>>>>>        to date and includes the BMW i3 code already.
>>>>> 
>>>>>        I need to get feedback from users of both can1 (esp32can) &
>>>>>        can2/3/4 (mcp2515), as changes had to be made to both drivers.
>>>>> 
>>>>>        I'll quote from my commit:
>>>>>        https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/commit/c94592a11ad2c989e65313d23a8876cf38787d70
>>>>> 
>>>>>        Design goals:
>>>>>        - any TX can either fail or succeed, the result state is terminal
>>>>>        - the respective TX callback is called exactly once
>>>>>        - transmissions fail on reaching the error-passive bus state
>>>>>           and on message/bus errors while in error-passive state
>>>>>        - a failed TX will be aborted (no retries after bus recovery),
>>>>>           i.e. will be retried at most 128 times (in error-active phase)
>>>>>        - reduce excessive CAN error logging
>>>>>        - reduce excessive interrupt load with switched-off buses
>>>>> 
>>>>>        This results in the application being able to reliably detect a
>>>>>        switched-off vehicle bus by the TX callback's success indicator.
>>>>>        It also results in frames no longer being held in the TX buffer
>>>>>        or added to the TX queue when the bus is switched off. The
>>>>>        application can now rely on getting a clean bus state on every
>>>>>        reconnect, without any queued old frames to be sent automatically.
>>>>> 
>>>>>        A secondary benefit of aborting the transmission is that the
>>>>>        module no longer needs to handle the load from the CAN error
>>>>>        interrupts continuously triggered by retransmission attempts in
>>>>>        error-passive state.
>>>>> 
>>>>> 
>>>>>        Reason for this was a) Steve's question on aborting
>>>>>        transmissions / flushing the queue and b) my new car now
>>>>>        also switching off the bus, with the annoying effect of a
>>>>>        frozen can1 every 2-3 days, needing to reboot the module.
>>>>>        I'm not sure yet if the freeze issue is solved, but I
>>>>>        haven't had it since running these changes on my module.
>>>>> 
>>>>>        The other issue of the transceivers resending frames queued
>>>>>        long ago may have caused all sorts of strange & unrepeatable
>>>>>        issues. I remember the VW crew having problems that fell
>>>>>        into this category.
>>>>> 
>>>>>        I've verified the new MCP2515 implementation only on my
>>>>>        workbench (with an Arduino as the CAN tester), so real life
>>>>>        tests are necessary.
>>>>> 
>>>>>        Thanks,
>>>>>        Michael
>>>>> 
>>>>>        --
>>>>>        Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>        Fon 02333 / 833 5735 * Handy 0176 / 206 989 26


