[Ovmsdev] CAN-3 broken again?

Greg D. gregd2350 at gmail.com
Mon Jan 8 14:31:25 HKT 2018


Quick update...  I can reliably get the CAN 3 bus to hang with an Rx
overflow by having the modem running, then telling WiFi to connect, but
only with the OBDWiz dongle.  Connecting an actual HUD display doesn't
seem to trigger the effect.  Surprisingly, I can do a full module reset
while the OBDWiz is running, without it disconnecting.  (OBDII ECU is
started in the system.start script, along with the v2 server and vehicle
module setting, though all this testing is done on the bench without the
car.)

Also, once the bus is hung, restarting the OBD2ECU process sometimes
only lets the OBDWiz dongle get part-way through its connect process
before it hangs again.  Consistently 7 frames received, 10 sent (due to
some of the responses taking multiple frames).  It may be significant
that the HUD does NOT use those multi-frame PIDs (ECU Name and VIN)...
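
For reference, a rough sketch of how a multi-frame response breaks down into CAN frames under ISO-TP (ISO 15765-2) with classic 8-byte frames and normal addressing. The payload lengths are assumptions on my part (a 3-byte mode 09 header plus 17 VIN characters, or plus a 20-byte ECU name), and the function name is made up for illustration. It is not meant to account for the exact 7/10 counts above. Keep in mind that the tester also sends a flow-control frame after each first frame, so a multi-frame reply adds to the received count too.

```python
import math

FIRST_FRAME_DATA = 6        # payload bytes carried by an ISO-TP First Frame
CONSECUTIVE_FRAME_DATA = 7  # payload bytes carried by each Consecutive Frame
SINGLE_FRAME_MAX = 7        # largest payload that fits in a Single Frame


def iso_tp_frame_count(payload_len: int) -> int:
    """Return how many CAN frames the responder sends for one payload."""
    if payload_len < 0:
        raise ValueError("payload length must be non-negative")
    if payload_len <= SINGLE_FRAME_MAX:
        return 1
    remaining = payload_len - FIRST_FRAME_DATA
    return 1 + math.ceil(remaining / CONSECUTIVE_FRAME_DATA)


# Assumed payload sizes (not taken from the thread):
VIN_RESPONSE_LEN = 3 + 17       # 0x49 0x02 0x01 header + 17 VIN characters
ECU_NAME_RESPONSE_LEN = 3 + 20  # 0x49 0x0A 0x01 header + 20-byte name

if __name__ == "__main__":
    for label, length in (("VIN", VIN_RESPONSE_LEN),
                          ("ECU Name", ECU_NAME_RESPONSE_LEN)):
        print(label, length, "bytes ->", iso_tp_frame_count(length), "frames")
```

A PID 0 style reply fits in a single frame, which is why the HUD's polling never exercises the first-frame / flow-control / consecutive-frame exchange.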

That said, another development is that the Rx overflow may not be fatal
after all, if I start things with the HUD, then swap dongles to the
OBDWiz.  Seems that having an external 12v power source keeps things
running even with the overflow status.  Since earlier (a few months ago)
testing didn't have the modem running, and the OBDWiz dongle doesn't
need the 12v power (it's a USB device, on a different PC), this test
combination is new.  Error flags on the can status are 0x2040, by the
way, when it hangs.
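
In case it helps anyone reading along, here is a small decoding sketch for that value. It ASSUMES the status word packs the MCP2515 CANINTF register in the high byte and EFLG in the low byte; I have not confirmed that against the driver, so treat the packing as a guess. The bit names themselves are from the MCP2515 datasheet, and the helper name is made up.

```python
# Bit names indexed by bit position (bit 0 first), per the MCP2515 datasheet.
CANINTF_BITS = ["RX0IF", "RX1IF", "TX0IF", "TX1IF",
                "TX2IF", "ERRIF", "WAKIF", "MERRF"]
EFLG_BITS = ["EWARN", "RXWAR", "TXWAR", "RXEP",
             "TXEP", "TXBO", "RX0OVR", "RX1OVR"]


def decode_error_flags(flags: int) -> dict:
    """Split a 16-bit status word into named CANINTF / EFLG bits.

    Assumption (not verified against the driver): high byte = CANINTF,
    low byte = EFLG.
    """
    intf = (flags >> 8) & 0xFF
    eflg = flags & 0xFF
    return {
        "CANINTF": [name for bit, name in enumerate(CANINTF_BITS)
                    if intf & (1 << bit)],
        "EFLG": [name for bit, name in enumerate(EFLG_BITS)
                 if eflg & (1 << bit)],
    }


if __name__ == "__main__":
    print(decode_error_flags(0x2040))
```

Under that assumption, 0x2040 comes out as ERRIF in CANINTF plus RX0OVR in EFLG, i.e. an error interrupt raised by an overflow of receive buffer 0, which would at least be consistent with the Rx overflow shown in the can status.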

But still, even with the 12v, I can reliably cause the bus to hang by
starting with the OBDWiz dongle running, get the modem connected, then
connect WiFi.  The partial connect and hang seem to be solved with the
12v power; the full hang is not.  But the full hang (with 12v attached)
can be reset by stopping and restarting the obdii ecu.  Interestingly,
the 0x2040 error status is not cleared when restarting the obdii
process, but the frame and frame error counters do get set back to zero.

Still looking for more clues...  Any ideas on how to narrow this down?

Greg


Greg D. wrote:
> I've turned off Canopen, SSH, and Telnet earlier, and that seemed to
> stop the crashes.  Just now added Bluetooth to that list, for good measure.
>
> Let's see how that holds...
>
> As for the issues with CAN-3, I seem to be able to hang it by simply
> starting WiFi while the OBDII ECU is running with an OBDII device
> attached (OBDWiz, in this case).  Trying to reconnect the OBDII device
> fails - no frames are received.  Stopping and restarting the OBDII ECU
> task lets me reconnect.  If I look at the can status when hung, I see
> that Rx Ovrflw is 1, and the Rx counter doesn't increment.  I'm guessing
> that starting WiFi is taking enough CPU time that the OBDII ECU task is
> falling behind, causing the overflow.  Apparently, that overflow is not
> being handled, leading to the hang.
>
> On an earlier run (before removing Bluetooth), I was able to get the
> OBDWiz dongle to connect for a few frames, after which it hung.  That
> behavior didn't repeat just now, but I'm not sure what else was running
> at the time (e.g. the modem / ppp).  The connect sequence from OBDWiz
> does a few frames rapidly (an initial PID 0, followed by requests for
> ECU Name and VIN), before a more relaxed polling starts.  So, if there's
> another task taking up CPU time, I can see where an Rx overflow could
> occur during that initial connect sequence.
>
> Driving a HUD is not a critical task, so I would be against a general
> raising of task priority.  Rather, we need to figure out how to handle
> the Rx Overflow, and keep the frames coming in.  OBDII devices generally
> are somewhat forgiving about lost frames, but apparently the OBDWiz has
> a short attention span and lets you know that something is wrong.
>
> I'll take a look at the 2515 code, but I'm not much of an expert on the
> chip's care and feeding under such circumstances.  If someone more in
> the know about it could take a look, that would be great.
>
> Thanks,
>
> Greg
>
>
> Stephen Casner wrote:
>> Greg,
>>
>> Yes, definitely running out of free RAM, but I don't know the meaning
>> of the WindowOverflow messages.
>>
>> The first time I built with release/v3.0 of esp-idf I was not able to
>> open an ssh connection; the error displayed was about a crypto
>> failure.  After quite a bit of digging to narrow down to where the
>> error was occurring, I finally found that the problem was running out
>> of free RAM.  My solution was to disable bluetooth entirely, which
>> made a big difference in the amount of free RAM.
>>
>>                                                         -- Steve
>>
>> On Sun, 7 Jan 2018, Greg D. wrote:
>>
>>> Hi Michael, Steve, Mark,
>>>
>>> Steve, the crash was an abort in new_op.cc, so perhaps being out of space is the
>>> issue.  Crash and reboot log attached (crash.txt).  One thing I've been wondering
>>> about are the several lines "_WindowOverflow4 at ??:?" during the boot process.  Is
>>> that indicative of a problem, later to manifest in the crash?
>>>
>>> My builds include pretty much everything, except for the Leaf, Twizy, and Soul.
>>>
>>> The update included some 20 lines changed to mcp2515.cpp, as well as a bunch of other
>>> stuff, including a lot of stuff updated in Canopen and Kia.  I have a script that does
>>> the git fetch master, merge, and push back to my github fork, the output of which is
>>> attached (update.txt).  As a test, I removed Canopen from the build config, and the
>>> crash has disappeared.  CAN-3 also appears to have come back to life (!), at least
>>> initially.  I can still get CAN-3 to fail if I turn on/off the modem and/or wifi in
>>> some sequence (still trying to pin that down), but that also leads to another crash
>>> (crash2.txt, attached).
>>>
>>> Mark:  Note also the issue with DNS failures getting to the v2 server.  I enabled the
>>> modem, got connected, then enabled WiFi (simulating arriving at home), and lost the V2
>>> server.  Disabling Wifi didn't bring it back, and powering off the modem (in
>>> preparation for turning it back on) caused the crash.
>>>
>>> So, two questions...  First, why the apparent conflict between Canopen or wifi/modem
>>> and obd2ecu over access to the 3rd CAN bus?  Why would the modem or wifi have any
>>> effect on a CAN bus?
>>>
>>> Second, overall memory usage seems to be at the limit.  What sort of budget do we have
>>> for what remains to be done, and how are we going to be packaging the build options
>>> for when non-developers want to get their hands on the product?  Will we be able to
>>> turn everything on, minus the developer / debug stuff, or will we have a separate SKU
>>> for each model car?
>>>
>>> Thanks,
>>>
>>> Greg
>>>
>>>
>>> Michael Balzer wrote:
>>>
>>> Greg,
>>>
>>> which commits / changes do you mean? The CAN drivers have not been changed since the
>>> TX performance fix, which Geir reported having solved his last issues.
>>>
>>> The current version is stable over here, but without the SSH component -- I can't use
>>> that due to memory getting too low together with the Twizy component.
>>>
>>> Regards,
>>> Michael
>>>
>>>
>>> Am 07.01.2018 um 08:04 schrieb Greg D.:
>>>
>>> Hi folks,
>>>
>>> I just resync'd with the main repository, and am not receiving frames on
>>> CAN-3 anymore.  I see there were changes to the chip driver...
>>>
>>> I'm also seeing crashes right after getting connected to WiFi,
>>> immediately after the system tries to start SSH.
>>>
>>> Seems like we just took a big step backward.  What happened?
>>>
>>> Greg
>>>
>>> _______________________________________________
>>> OvmsDev mailing list
>>> OvmsDev at lists.teslaclub.hk
>>> http://lists.teslaclub.hk/mailman/listinfo/ovmsdev



