My concern is data corruption and crashes produced by the bug.

How about keeping it unmerged until the release but doing your factory and OTA builds with the fix until then?

Regards,
Michael


Am 15.12.19 um 13:34 schrieb Mark Webb-Johnson:
My concern for this is that once we do it ALL developers have to also run that version of toolchain. Can we wait for Espressif to formally release this (as that seems to be happening soon)?

Regards, Mark.

On 13 Dec 2019, at 8:32 PM, Michael Balzer <dexter@expeedo.de> wrote:

Mark,

I also recommend merging the SPIRAM fix into master now and building the factory release on that base as well (see my post Re: [Ovmsdev] Time for 3.2.006?).

I've been releasing all builds on my server including the fix since June, so there is no need for further tests.

Regards,
Michael


Am 13.12.19 um 09:35 schrieb Michael Balzer:
Mark,

having a look at Greg's new report now, which differs from the first. Please wait for my results.

Regards,
Michael


Am 13.12.19 um 06:05 schrieb Mark Webb-Johnson:
Michael,

It all looks good.

I have tagged this as 3.2.008, and sent it to EAP. Plan is to send to MAIN this weekend.

The rush is to avoid another of the new 8MB PSRAM users updating OTA and messed up their module. Any objections?

Regards, Mark.

On 12 Dec 2019, at 6:12 PM, Michael Balzer <dexter@expeedo.de> wrote:

2) You're right, the bug is tx_frame with null origin overwriting body.bus in the union. I didn't notice that when checking Marko's submission.

tx_frame is a copy of the last frame given to the bus for transmission. The queue msg is gone when the TX done IRQ comes in, and Marko needed a copy of the frame the TX IRQ relates to. I asked him (see PR discussion), he checked and confirmed that all TX is done sequentially, so a single buffer is sufficient.

Swapping the lines would work. The frame.origin also shouldn't be null, but the handler should tolerate that. …oops, tx_frame also doesn't get initialized in the canbus constructor, so there's also potentially garbage in tx_frame if due to some bug a TX IRQ is generated or processed without a previous tx.

I'll do the fixes… and also rename tx_frame to m_tx_frame for consistency.

Regards,
Michael


Am 12.12.19 um 01:20 schrieb Mark Webb-Johnson:
Two issues:

1] A2L

My a2l is this:

#!/bin/bash
elf=3.2.007.ovms3.elf
for adr in $* ; do
  if [[ "$adr" =~ "elf" ]] ; then
    elf="$adr"
  else
    cmd+=" -ex 'l *$adr'"
  fi
done
cmd+=" -ex 'q'"
echo "Using elf file: $elf"
echo "xtensa-esp32-elf-gdb -batch $elf $cmd"
eval xtensa-esp32-elf-gdb -batch $elf $cmd 2>/dev/null #| grep "^0x.* is in "

When I run it, I get:

$ a2l 3.2.007.ovms3.elf 0x400d5e4c:0x3ffc5c40 0x7ffffffd:0x3ffc5c90
Using elf file: 3.2.007.ovms3.elf
xtensa-esp32-elf-gdb -batch 3.2.007.ovms3.elf  -ex 'l *0x400d5e4c:0x3ffc5c40' -ex 'l *0x7ffffffd:0x3ffc5c90' -ex ‘q'

And a manual run gives:

$ xtensa-esp32-elf-gdb -batch 3.2.007.ovms3.elf  -ex 'l *0x400d5e4c:0x3ffc5c40' -ex 'l *0x7ffffffd:0x3ffc5c90' -ex 'q'
Junk at end of line specification.

xtensa-esp32-elf-gdb 3.2.007.ovms3.elf
GNU gdb (crosstool-NG crosstool-ng-1.22.0-80-g6c4433a) 7.10
(gdb) l *0x400d5e4c:0x3ffc5c40
Junk at end of line specification.
(gdb) l *0x7ffffffd:0x3ffc5c90
(gdb) quit

2] Frame origin

Regarding the CAN_txcallback, I think you are correct. And both generators of that message set the frame correctly. So my initial thought was that it is either a memory corruption, or somewhere else sending a frame with garbage data.

I do see this technique used in both the mcp2515 and esp32can drivers:

msg.body.bus = me;
msg.body.frame = me->tx_frame;

I don’t normally just copy structures over like that. I memcpy() them, but I guess it must work. However, as our CAN_queue_msg_t is a union of CAN_frame_t (frame, with first member of the structure a canbus*) and 'canbus* bus’, that is a little worrying. It seems that the msg.body.bus will get overwritten with whatever is in msg.body.frame.origin. I can’t see anywhere that tx_frame.origin is set - which is scary because that would mean it is always random junk.

Maybe it works if tx_frame.origin is set to the bus before anything else, but not in Greg’s circumstances where something else arrives first. Perhaps related to obd2ecu? I only see tx_frame set in can.cpp canbus::Write. I don’t really understand the tx_frame approach at all, and why the frame is just not passed on the queue.

// CAN Frame
// Note: Take care changing this structure, as it is a union with
// CAN_log_message_t and position of 'origin' is fixed.
struct CAN_frame_t
  {
  canbus*     origin;                   // Origin of the frame
  CanFrameCallback * callback;          // Frame-specific callback. Is called when this frame is successfully sent (or sending failed)
  CAN_FIR_t   FIR;                      // Frame information record
  uint32_t    MsgID;                    // Message ID
  union
    {
    uint8_t   u8[8];                    // Payload byte access
    uint32_t  u32[2];                   // Payload u32 access (Att: little endian!)
    uint64_t  u64;                      // Payload u64 access (Att: little endian!)
    } data;

  esp_err_t Write(canbus* bus=NULL, TickType_t maxqueuewait=0);  // bus: NULL=origin
  };

// CAN message
typedef struct
  {
  CAN_queue_type_t type;
  union
    {
    CAN_frame_t frame;  // CAN_frame
    canbus* bus;
    } body;
  } CAN_queue_msg_t;

This approach seems to date back to the swcan support merge (f94ae5a1b).

Should we swap around, and set the msg.body.bus after msg.body.frame? Or am I missing something…

Regards, Mark.

On 12 Dec 2019, at 2:29 AM, Michael Balzer <dexter@expeedo.de> wrote:

Mark,

good example why not to use addr2line: I think that result is wrong. a2l uses gdb which gives:

balzer@leela:~/esp/Open-Vehicle-Monitoring-System-3/vehicle/OVMS.V3> a2l tmp/3.2.007.ovms3.elf 0x400d5e4c:0x3ffc5c40 0x7ffffffd:0x3ffc5c90
Using elf file: tmp/3.2.007.ovms3.elf
0x400d5e4c is in CAN_rxtask(void*) (/home/openvehicles/build/Open-Vehicle-Monitoring-System-3.1/vehicle/OVMS.V3/components/can/src/can.cpp:730).
725                  me->IncomingFrame(&msg.body.frame);
726                } while (loop);
727              break;
728              }
729            case CAN_txcallback:
730              msg.body.bus->TxCallback(&msg.body.frame, true);
731              break;
732            case CAN_txfailedcallback:
733              msg.body.bus->TxCallback(&msg.body.frame, false);
734              msg.body.bus->LogStatus(CAN_LogStatus_Error);

…and that actually makes sense and matches the register dump.

If I read the gdb disassembly correctly, A10 = msg.body.bus, so Greg's got a CAN_txcallback msg without a bus.

Hardening the rxtask against null here would probably avoid the crash, but I don't see yet how that could be possible.
Both esp32can and mcp2515 set the bus field to their object addresses, which cannot be null.

Regards,
Michael


Am 11.12.19 um 13:45 schrieb Mark Webb-Johnson:
Can’t get a2l working at the moment. The addr2line gives:

addr2line -e 3.2.007.ovms3.elf 0x400d5e4c:0x3ffc5c40 0x7ffffffd:0x3ffc5c90
/home/openvehicles/build/Open-Vehicle-Monitoring-System-3.1/vehicle/OVMS.V3/components/can/src/can.cpp:551

That is:

void canbus::LogInfo(CAN_log_type_t type, const char* text)
  {
  MyCan.LogInfo(this, type, text);    <—— HERE
  }

ELF is at:


Regards, Mark.

On 11 Dec 2019, at 5:20 PM, Michael Balzer <dexter@expeedo.de> wrote:

Mark,

Greg uses your build, the crash point seems to be consistent, can you post the a2l on this?

Regards,
Michael

PS: Greg, would you mind switching to EAP to beta test future releases?


Am 11.12.19 um 04:33 schrieb Greg D.:
Hi folks,

Well, the module updated to 3.2.007 last night.  I just checked on it,
and it appears that it didn't exactly survive.  Crashed while running
the autoconfig script.  Log with two cycles attached.

I tried renaming the /store/events directory to /store/was_events since
it seems like Duktape was getting in the way, but that didn't resolve
the crash.  I manually enabled wifi so I could manage the module, and
moved it back to 3.2.005 from the other partition.  Seems stable again.

The car is going into the Service Center tomorrow (the perennial issue
with 1146 alerts), so I need to have the module stable so that I can
keep an eye on it.  Going to leave it on 3.2.005 for now, unless someone
has a quick fix in the next few hours...

Otherwise, any ideas for troubleshooting after I get the car back back
(hopefully end of day)?

Greg


_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev


-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev


_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev

-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev


_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev

-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26


_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev

-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26

_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev

-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev


_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev

-- 
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26