[Ovmsdev] version 3.2.007: Crash on boot

Michael Balzer dexter at expeedo.de
Fri Dec 13 17:25:58 HKT 2019


Just saw the can shell commands actually do an explicit init of frame.callback, so that added zero init is superfluous now. I'll keep it because doing the init
is good style to allow future extensions without hassle.

The obd2ecu frame use was definitely broken.


Am 13.12.19 um 10:15 schrieb Michael Balzer:
> Greg, Mark,
>
> the new crash happened here:
>
> balzer at leela:~/esp/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3> a2l 0x3ffcfda0:0x3ffc87a0 0x400da762:0x3ffc87d0 0x400da95b:0x3ffc87f0 0x400da901:0x3ffc8810
> Using elf file: /home/balzer/esp/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/build/ovms3.elf
> 0x400da762 is in can::ExecuteCallbacks(CAN_frame_t const*, bool, bool)
> (/home/balzer/esp/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/can/src/can.cpp:866).
> 861      {
> 862      if (tx)
> 863        {
> 864        if (frame->callback)
> 865          {
> 866          (*(frame->callback))(frame, success); // invoke frame-specific callback function
> 867          }
> 868        for (auto entry : m_txcallbacks) {      // invoke generic tx callbacks
> 869          entry->m_callback(frame, success);
> 870          }
> 0x400da95b is in canbus::TxCallback(CAN_frame_t*, bool) (/home/balzer/esp/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/can/src/can.cpp:1018).
> 1013        }
> 1014      else
> 1015        {
> 1016        LogFrame(CAN_LogFrame_TX_Fail, p_frame);
> 1017        }
> 1018      MyCan.ExecuteCallbacks(p_frame, true, success);
> 1019      }
> 1020   
> 1021    /**
> 1022     * canbus::Write -- main TX API
> 0x400da901 is in CAN_rxtask(void*) (/home/balzer/esp/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/components/can/src/can.cpp:730).
> 725                  me->IncomingFrame(&msg.body.frame);
> 726                } while (loop);
> 727              break;
> 728              }
> 729            case CAN_txcallback:
> 730              msg.body.bus->TxCallback(&msg.body.frame, true);
> 731              break;
> 732            case CAN_txfailedcallback:
> 733              msg.body.bus->TxCallback(&msg.body.frame, false);
> 734              msg.body.bus->LogStatus(CAN_LogStatus_Error);
>
>
> As the frame callback invocation is secured against null, it must have been a missing initialization of frame.callback. So I checked our source tree for uses
> of CAN_frame_t without memset or zero initialization and found one in obd2ecu::IncomingFrame() -- causing this crash I assume -- and two more in the can rx/tx
> shell commands.
>
> Fix is in commit https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/commit/34302254567b15a8c0f9dfb730b5cf24703ac700
>
> …and build in https://ovms.dexters-web.de/firmware/ota/v3.1/edge/
>
> Greg, please test again.
>
> Regards,
> Michael
>
>
> Am 13.12.19 um 07:09 schrieb Greg D.:
>> Hi Michael,
>>
>> Ok, so I believe I loaded your Edge build, and the issue remains.  To verify, this is the string reported by the web Status page:
>>
>> 3.2.007-102-g07440970/ota_1/main (build idf v3.3-beta3-774-g6652269e1 Dec 12 2019 11:59:14)
>>
>> Following my earlier email, I left the configuration as-is, including launching the obd2ecu task, but with the attached OBDII module removed, and the crash
>> was resolved.  Rebooting the module had it come up running normally.  I then plugged in the OBDII module into the OVMSv3, and OVMS crashed the instant the
>> first it started polling.  The device is a SyncUp Drive, which is a telematics and WiFi hotspot.  Connecting the OBDII device with 12v power off, as
>> expected, did nothing.  Power it on, wait a few seconds to boot, and OVMS crashes immediately.
>>
>> New console log, attached.  It shows the module booting normally (OBDII device not present), followed by the device being attached (indicated in the log),
>> and the subsequent crash.  I removed the device right after the boot started, so as to let the OVMS boot normally.
>>
>> So, clearly the obd2ecu task it is receiving frames on CAN3 just fine, but transmits back out on that same bus aren't working.  Is this something I need to
>> address within obd2ecu, or is it in the core code?  I've not changed obd2ecu in quite some time.
>>
>> Greg
>>
>>
>> Michael Balzer wrote:
>>> Fix commit is https://github.com/openvehicles/Open-Vehicle-Monitoring-System-3/commit/8fef3f5fb6eec9fca26997e01d9c7d08080cdc06
>>>
>>> Greg, please test.
>>>
>>> New build with SPIRAM fix: https://ovms.dexters-web.de/firmware/ota/v3.1/edge/
>>>
>>> Regards,
>>> Michael
>>>
>>>
>>> Am 12.12.19 um 11:17 schrieb Michael Balzer:
>>>> Last line of a2l was missing:
>>>>
>>>> #!/bin/bash
>>>> elf=~/esp/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/build/ovms3.elf
>>>> for adr in $* ; do
>>>>   if [[ "$adr" =~ "elf" ]] ; then
>>>>     elf="$adr"
>>>>   else
>>>>     cmd+=" -ex 'l *${adr/:*/}'"
>>>>   fi
>>>> done
>>>> cmd+=" -ex 'q'"
>>>> echo "Using elf file: $elf"
>>>> eval xtensa-esp32-elf-gdb -batch $elf $cmd 2>/dev/null
>>>>
>>>>
>>>>
>>>> Am 12.12.19 um 11:12 schrieb Michael Balzer:
>>>>> Mark,
>>>>>
>>>>> 1) maybe I missed posting my later version of a2l, which automatically strips the stack pointers from the ":" address syntax so you can copy&paste the USB
>>>>> output. Here it is:
>>>>>
>>>>> #!/bin/bash
>>>>> elf=~/esp/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3/build/ovms3.elf
>>>>> for adr in $* ; do
>>>>>   if [[ "$adr" =~ "elf" ]] ; then
>>>>>     elf="$adr"
>>>>>   else
>>>>>     cmd+=" -ex 'l *${adr/:*/}'"
>>>>>   fi
>>>>> done
>>>>> cmd+=" -ex 'q'"
>>>>> echo "Using elf file: $elf"
>>>>>
>>>>>
>>>>> 2) You're right, the bug is tx_frame with null origin overwriting body.bus in the union. I didn't notice that when checking Marko's submission.
>>>>>
>>>>> tx_frame is a copy of the last frame given to the bus for transmission. The queue msg is gone when the TX done IRQ comes in, and Marko needed a copy of
>>>>> the frame the TX IRQ relates to. I asked him (see PR discussion), he checked and confirmed that all TX is done sequentially, so a single buffer is sufficient.
>>>>>
>>>>> Swapping the lines would work. The frame.origin also shouldn't be null, but the handler should tolerate that. …oops, tx_frame also doesn't get initialized
>>>>> in the canbus constructor, so there's also potentially garbage in tx_frame if due to some bug a TX IRQ is generated or processed without a previous tx.
>>>>>
>>>>> I'll do the fixes… and also rename tx_frame to m_tx_frame for consistency.
>>>>>
>>>>> Regards,
>>>>> Michael
>>>>>
>>>>>
>>>>> Am 12.12.19 um 01:20 schrieb Mark Webb-Johnson:
>>>>>> Two issues:
>>>>>>
>>>>>> _*1] A2L*_
>>>>>>
>>>>>> My a2l is this:
>>>>>>
>>>>>>     #!/bin/bash
>>>>>>     elf=3.2.007.ovms3.elf
>>>>>>     for adr in $* ; do
>>>>>>       if [[ "$adr" =~ "elf" ]] ; then
>>>>>>         elf="$adr"
>>>>>>       else
>>>>>>         cmd+=" -ex 'l *$adr'"
>>>>>>       fi
>>>>>>     done
>>>>>>     cmd+=" -ex 'q'"
>>>>>>     echo "Using elf file: $elf"
>>>>>>     echo "xtensa-esp32-elf-gdb -batch $elf $cmd"
>>>>>>     eval xtensa-esp32-elf-gdb -batch $elf $cmd 2>/dev/null #| grep "^0x.* is in "
>>>>>>
>>>>>>
>>>>>> When I run it, I get:
>>>>>>
>>>>>>     $ a2l 3.2.007.ovms3.elf 0x400d5e4c:0x3ffc5c40 0x7ffffffd:0x3ffc5c90
>>>>>>     Using elf file: 3.2.007.ovms3.elf
>>>>>>     xtensa-esp32-elf-gdb -batch 3.2.007.ovms3.elf  -ex 'l *0x400d5e4c:0x3ffc5c40' -ex 'l *0x7ffffffd:0x3ffc5c90' -ex ‘q'
>>>>>>
>>>>>>
>>>>>> And a manual run gives:
>>>>>>
>>>>>>     $ xtensa-esp32-elf-gdb -batch 3.2.007.ovms3.elf  -ex 'l *0x400d5e4c:0x3ffc5c40' -ex 'l *0x7ffffffd:0x3ffc5c90' -ex 'q'
>>>>>>     Junk at end of line specification.
>>>>>>
>>>>>>     $ xtensa-esp32-elf-gdb 3.2.007.ovms3.elf
>>>>>>     GNU gdb (crosstool-NG crosstool-ng-1.22.0-80-g6c4433a) 7.10
>>>>>>     (gdb) l *0x400d5e4c:0x3ffc5c40
>>>>>>     Junk at end of line specification.
>>>>>>     (gdb) l *0x7ffffffd:0x3ffc5c90
>>>>>>     (gdb) quit
>>>>>>
>>>>>>
>>>>>> _*2] Frame origin*_
>>>>>>
>>>>>> Regarding the CAN_txcallback, I think you are correct. And both generators of that message set the frame correctly. So my initial thought was that it is
>>>>>> either a memory corruption, or somewhere else sending a frame with garbage data.
>>>>>>
>>>>>> I do see this technique used in both the mcp2515 and esp32can drivers:
>>>>>>
>>>>>>     msg.body.bus = me;
>>>>>>     msg.body.frame = me->tx_frame;
>>>>>>
>>>>>>
>>>>>> I don’t normally just copy structures over like that. I memcpy() them, but I guess it must work. However, as our CAN_queue_msg_t is a union of
>>>>>> CAN_frame_t (frame, with first member of the structure a canbus*) and 'canbus* bus’, that is a little worrying. It seems that the msg.body.bus will get
>>>>>> overwritten with whatever is in msg.body.frame.origin. I can’t see anywhere that tx_frame.origin is set - which is scary because that would mean it is
>>>>>> always random junk.
>>>>>>
>>>>>> Maybe it works if tx_frame.origin is set to the bus before anything else, but not in Greg’s circumstances where something else arrives first. Perhaps
>>>>>> related to obd2ecu? I only see tx_frame set in can.cpp canbus::Write. I don’t really understand the tx_frame approach at all, and why the frame is just
>>>>>> not passed on the queue.
>>>>>>
>>>>>>     // CAN Frame
>>>>>>     // Note: Take care changing this structure, as it is a union with
>>>>>>     // CAN_log_message_t and position of 'origin' is fixed.
>>>>>>     struct CAN_frame_t
>>>>>>       {
>>>>>>       canbus*     origin;                   // Origin of the frame
>>>>>>       CanFrameCallback * callback;          // Frame-specific callback. Is called when this frame is successfully sent (or sending failed)
>>>>>>       CAN_FIR_t   FIR;                      // Frame information record
>>>>>>       uint32_t    MsgID;                    // Message ID
>>>>>>       union
>>>>>>         {
>>>>>>         uint8_t   u8[8];                    // Payload byte access
>>>>>>         uint32_t  u32[2];                   // Payload u32 access (Att: little endian!)
>>>>>>         uint64_t  u64;                      // Payload u64 access (Att: little endian!)
>>>>>>         } data;
>>>>>>
>>>>>>       esp_err_t Write(canbus* bus=NULL, TickType_t maxqueuewait=0);  // bus: NULL=origin
>>>>>>       };
>>>>>>
>>>>>>     // CAN message
>>>>>>     typedef struct
>>>>>>       {
>>>>>>       CAN_queue_type_t type;
>>>>>>       union
>>>>>>         {
>>>>>>         CAN_frame_t frame;  // CAN_frame
>>>>>>         canbus* bus;
>>>>>>         } body;
>>>>>>       } CAN_queue_msg_t;
>>>>>>
>>>>>>
>>>>>> This approach seems to date back to the swcan support merge (f94ae5a1b).
>>>>>>
>>>>>> Should we swap around, and set the msg.body.bus after msg.body.frame? Or am I missing something…
>>>>>>
>>>>>> Regards, Mark.
>>>>>>
>>>>>>> On 12 Dec 2019, at 2:29 AM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de>> wrote:
>>>>>>>
>>>>>>> Mark,
>>>>>>>
>>>>>>> good example why not to use addr2line: I think that result is wrong. a2l uses gdb which gives:
>>>>>>>
>>>>>>> balzer at leela:~/esp/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3> a2l tmp/3.2.007.ovms3.elf 0x400d5e4c:0x3ffc5c40 0x7ffffffd:0x3ffc5c90
>>>>>>> Using elf file: tmp/3.2.007.ovms3.elf
>>>>>>> 0x400d5e4c is in CAN_rxtask(void*) (/home/openvehicles/build/Open-Vehicle-Monitoring-System-3.1/vehicle/OVMS.V3/components/can/src/can.cpp:730).
>>>>>>> 725                  me->IncomingFrame(&msg.body.frame);
>>>>>>> 726                } while (loop);
>>>>>>> 727              break;
>>>>>>> 728              }
>>>>>>> 729            case CAN_txcallback:
>>>>>>> 730              msg.body.bus->TxCallback(&msg.body.frame, true);
>>>>>>> 731              break;
>>>>>>> 732            case CAN_txfailedcallback:
>>>>>>> 733              msg.body.bus->TxCallback(&msg.body.frame, false);
>>>>>>> 734              msg.body.bus->LogStatus(CAN_LogStatus_Error);
>>>>>>>
>>>>>>> …and that actually makes sense and matches the register dump.
>>>>>>>
>>>>>>> If I read the gdb disassembly correctly, A10 = msg.body.bus, so Greg's got a CAN_txcallback msg without a bus.
>>>>>>>
>>>>>>> Hardening the rxtask against null here would probably avoid the crash, but I don't see yet how that could be possible.
>>>>>>> Both esp32can and mcp2515 set the bus field to their object addresses, which cannot be null.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Michael
>>>>>>>
>>>>>>>
>>>>>>> Am 11.12.19 um 13:45 schrieb Mark Webb-Johnson:
>>>>>>>> Can’t get a2l working at the moment. The addr2line gives:
>>>>>>>>
>>>>>>>>     addr2line -e 3.2.007.ovms3.elf 0x400d5e4c:0x3ffc5c40 0x7ffffffd:0x3ffc5c90
>>>>>>>>     /home/openvehicles/build/Open-Vehicle-Monitoring-System-3.1/vehicle/OVMS.V3/components/can/src/can.cpp:551
>>>>>>>>
>>>>>>>>
>>>>>>>> That is:
>>>>>>>>
>>>>>>>>     void canbus::LogInfo(CAN_log_type_t type, const char* text)
>>>>>>>>       {
>>>>>>>>       MyCan.LogInfo(this, type, text);    <—— HERE
>>>>>>>>       }
>>>>>>>>
>>>>>>>>
>>>>>>>> ELF is at:
>>>>>>>>
>>>>>>>>     http://api.openvehicles.com/firmware/ota/v3.2/main/3.2.007.ovms3.elf
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards, Mark.
>>>>>>>>
>>>>>>>>> On 11 Dec 2019, at 5:20 PM, Michael Balzer <dexter at expeedo.de <mailto:dexter at expeedo.de>> wrote:
>>>>>>>>>
>>>>>>>>> Mark,
>>>>>>>>>
>>>>>>>>> Greg uses your build, the crash point seems to be consistent, can you post the a2l on this?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Michael
>>>>>>>>>
>>>>>>>>> PS: Greg, would you mind switching to EAP to beta test future releases?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Am 11.12.19 um 04:33 schrieb Greg D.:
>>>>>>>>>> Hi folks,
>>>>>>>>>>
>>>>>>>>>> Well, the module updated to 3.2.007 last night.  I just checked on it,
>>>>>>>>>> and it appears that it didn't exactly survive.  Crashed while running
>>>>>>>>>> the autoconfig script.  Log with two cycles attached.
>>>>>>>>>>
>>>>>>>>>> I tried renaming the /store/events directory to /store/was_events since
>>>>>>>>>> it seems like Duktape was getting in the way, but that didn't resolve
>>>>>>>>>> the crash.  I manually enabled wifi so I could manage the module, and
>>>>>>>>>> moved it back to 3.2.005 from the other partition.  Seems stable again.
>>>>>>>>>>
>>>>>>>>>> The car is going into the Service Center tomorrow (the perennial issue
>>>>>>>>>> with 1146 alerts), so I need to have the module stable so that I can
>>>>>>>>>> keep an eye on it.  Going to leave it on 3.2.005 for now, unless someone
>>>>>>>>>> has a quick fix in the next few hours...
>>>>>>>>>>
>>>>>>>>>> Otherwise, any ideas for troubleshooting after I get the car back back
>>>>>>>>>> (hopefully end of day)?
>>>>>>>>>>
>>>>>>>>>> Greg
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> OvmsDev mailing list
>>>>>>>>>> OvmsDev at lists.openvehicles.com
>>>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>>>> _______________________________________________
>>>>>>>>> OvmsDev mailing list
>>>>>>>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> OvmsDev mailing list
>>>>>>>> OvmsDev at lists.openvehicles.com
>>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>>
>>>>>>> -- 
>>>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>>> _______________________________________________
>>>>>>> OvmsDev mailing list
>>>>>>> OvmsDev at lists.openvehicles.com <mailto:OvmsDev at lists.openvehicles.com>
>>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> OvmsDev mailing list
>>>>>> OvmsDev at lists.openvehicles.com
>>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>>
>>>>> -- 
>>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>>
>>>>> _______________________________________________
>>>>> OvmsDev mailing list
>>>>> OvmsDev at lists.openvehicles.com
>>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>>
>>>> -- 
>>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>>
>>>> _______________________________________________
>>>> OvmsDev mailing list
>>>> OvmsDev at lists.openvehicles.com
>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>>
>>> -- 
>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>>>
>>>
>>> _______________________________________________
>>> OvmsDev mailing list
>>> OvmsDev at lists.openvehicles.com
>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>>
>>
>> _______________________________________________
>> OvmsDev mailing list
>> OvmsDev at lists.openvehicles.com
>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
>
> -- 
> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
>
> _______________________________________________
> OvmsDev mailing list
> OvmsDev at lists.openvehicles.com
> http://lists.openvehicles.com/mailman/listinfo/ovmsdev

-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvehicles.com/pipermail/ovmsdev/attachments/20191213/e1b1f2b1/attachment.htm>


More information about the OvmsDev mailing list