On 11.06.2018 at 04:07, Mark Webb-Johnson wrote:
I think the fault must be internal (controller, or our driver code), as the CAN bus itself has error checking.

The only other thing I noticed on my review is that in ESP32CAN_rxframe there is no check that a frame is actually available before reading and queueing it. If the RX interrupt is spurious, that could cause all sorts of issues. I think it would be safer to have such a check, and maybe that is the cause of this issue?
You mean we could be reading the RX mailbox a) without a new signal and b) before the next message is fully rolled in.

But how would you check that a frame is available? As I understand the documentation, the RX IRQ isn't raised unless the FIFO has a valid message and (in PeliCAN mode) it has been mapped to the buffer address. I can't see how we could check the frame validity beyond that.
Setting CMR.RRB then allows the chip to fetch the next message from the FIFO, which will trigger another IR.RI as soon as the buffer is filled.

IR.RI is a level interrupt, so if it is stuck it will trigger again. But reading IR clears it. So even if the ISR were called again due to an IRQ handling bug, IR.RI would be 0, so rxframe() wouldn't be called.

…unless IR.RI was set due to a hardware bug…

…or IR.RI could also be set on a data overrun (or other error) for an invalid message. That would mean IR.RI must be discarded if an error interrupt comes in together with IR.RI, but I've seen nothing in the documentation about something like this.
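
To make the two ideas above concrete (an "is a frame really available?" check, and discarding IR.RI when it arrives together with an overrun), here is a minimal C sketch. It is not the esp32can code: rx_decide() and rx_action_t are hypothetical names, and the function only works on register values that the ISR has already read. The bit positions are the SJA1000 PeliCAN ones (IR.0 = RI, IR.3 = DOI, SR.0 = RBS); using the RX message counter (RMC) as an extra check, and the discard-on-overrun policy, are assumptions that the documentation does not mandate.

```c
#include <stdint.h>

/* SJA1000 PeliCAN bit masks */
#define IR_RI   0x01u  /* IR.0: receive interrupt */
#define IR_DOI  0x08u  /* IR.3: data overrun interrupt */
#define SR_RBS  0x01u  /* SR.0: receive buffer status (message available) */

/* Hypothetical result type for this sketch */
typedef enum {
    RX_NONE,             /* do not call rxframe() */
    RX_FRAME,            /* read and queue the frame, then set CMR.RRB */
    RX_OVERRUN_DISCARD   /* assumed policy: RI came with DOI, drop it */
} rx_action_t;

/* ir:  snapshot of the interrupt register (read exactly once per ISR run,
 *      because reading IR clears it)
 * sr:  status register value
 * rmc: RX message counter value */
rx_action_t rx_decide(uint8_t ir, uint8_t sr, uint8_t rmc)
{
    if (!(ir & IR_RI))
        return RX_NONE;              /* no RX interrupt flagged */
    if (!(sr & SR_RBS) || rmc == 0)
        return RX_NONE;              /* spurious RI: nothing in the FIFO */
    if (ir & IR_DOI)
        return RX_OVERRUN_DISCARD;   /* assumption, see text above */
    return RX_FRAME;
}
```

With the values from the logged error event (IR = 0x08, SR = 0x0F), rx_decide() returns RX_NONE because IR.RI is not set, so this guard alone would not change what happened there. It would only catch a spurious IR.RI or an IR.RI arriving together with IR.DOI.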
I’m working on the MCP2515 driver at the moment, to try to find out what is causing the OBDII HUD lock-up (which others have seen as well). Perhaps I’ll find out something when reviewing the driver and shared CAN library. Due to the huge volume of traffic on my Tesla Model S, I get a large number of overruns but don’t see this problem. Maybe it is rare, or we are just not noticing it.
I assume the problem can be triggered by poor Wifi connectivity -- that's the situation at Frank's home. I assume the Wifi driver disables interrupts or context switches in some situations, which causes the pauses in CAN rx task execution.

Just a wild guess though; I would check the source for this pattern if it were open.
Regards,
Michael
Regards, Mark
On 10 Jun 2018, at 9:05 PM, Michael Balzer <dexter@expeedo.de> wrote:
Frank Demes reported a very strange CAN behaviour. He managed to get a CRTD log of one incident.

The effect is triggered by an unknown cause freezing the CAN rx task for short periods of time, i.e. 70 - 120 ms. This alone should be no problem: the RXFIFO will probably overflow, but that is handled by our driver, so apart from some lost messages this should not be an issue.
Unfortunately, something goes completely wrong:

a) on the overrun event, the last valid message is delivered multiple times (i.e. 5 times)
b) after resolving the overrun, a garbled message is received.

The garbled message contains a valid byte sequence in the wrong place, i.e. bytes 5-8 would be valid if they were bytes 1-4. Bytes 1-4 cannot have been read from the bus, so something is going very wrong here.
I have walked through the esp32can driver and checked the SJA1000 documentation but have no clue what is going wrong.

Any ideas? CRTD excerpt below.

(Frame 155 is reflected by the module with byte 1 changed from 07 to 01 to limit the charge current. The problem is, it also reflects the garbled message, which causes the charger to stop.)
Regards,
Michael
----------------------------------------------------------------------------------------
…
274.341 1R11 155 07 97 E4 54 4B A8 00 6C
274.341 1T11 155 01 97 E4 54 4B A8 00 6C
274.351 1R11 155 07 97 E5 54 4B A8 00 6C
274.351 1T11 155 01 97 E5 54 4B A8 00 6C
274.361 1R11 155 07 97 E5 54 4B A8 00 6C
274.361 1T11 155 01 97 E5 54 4B A8 00 6C
274.371 1R11 423 03 22 FF FF 00 E0 00 DB
274.371 1R11 426 00 38 01 00 4D 7E 00
274.371 1R11 597 20 A4 03 B1 2F 00 01 53
274.371 1R11 627 00 00 00
274.371 1R11 155 07 97 E5 54 4B A8 00 6C
274.371 1T11 155 01 97 E5 54 4B A8 00 6C
274.381 1R11 155 07 97 E5 54 4B A8 00 6C
274.381 1T11 155 01 97 E5 54 4B A8 00 6C
274.391 1R11 5D7 FF FF 01 E4 53 80 00
- normal up to here
!!! CAN RX is blocked here for ~70 ms !!!
… interrupts disabled? by whom?
274.461 1R11 599 00 00 4D 7E 1E 26 FF FF
274.461 1R11 436 00 0E 3C B2 00 00
274.461 1R11 155 07 97 E5 54 4B A8 00 6C
274.461 1R11 424 11 40 15 26 47 64 00 48
274.461 1R11 425 0A 1D 44 FF FE 40 01 1F
274.461 1R11 556 30 63 07 30 73 07 30 7A
274.461 1T11 155 01 97 E5 54 4B A8 00 6C  ← reflection for 155 still works here
!!! next RX pause, ~90 ms this time !!!
!!! ID 599 has a period of 100 ms, but we now get it 5 times:
274.551 1R11 599 00 00 4D 7E 1E 26 FF FF
274.551 1CEV Error intr=58445 rxpkt=27132 txpkt=528128 errflags=0x15433 rxerr=0 txerr=0 rxovr=2 txovr=0 txdelay=17
crtd fprintf bug compensation:
274.551 1CEV Error rxpkt=58445 txpkt=27132 errflags=528128 intr=0x15433 rxerr=0 txerr=0 rxovr=2 txovr=0 txdelay=17
errflags=528128 = 08 0F 00
  0x08 = IR: Data overrun
  0x0F = SR: Bus OK, error limit not yet reached, RX & TX idle, RXFIFO overrun, RX buffer full
  0x00 = ECC: -
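
The manual decoding of errflags above can be reproduced with a few lines of C. This is only a sketch: the byte layout (IR in bits 16-23, SR in bits 8-15, ECC in bits 0-7) is inferred from the "08 0F 00" breakdown, and can_errflags_t and decode_errflags() are hypothetical names, not part of the driver.

```c
#include <stdint.h>

/* Hypothetical container for the three packed register snapshots */
typedef struct {
    uint8_t ir;   /* interrupt register */
    uint8_t sr;   /* status register */
    uint8_t ecc;  /* error code capture register */
} can_errflags_t;

/* Split errflags into its bytes, assuming the layout IR:SR:ECC
 * (most significant used byte first), as in the breakdown above. */
can_errflags_t decode_errflags(uint32_t errflags)
{
    can_errflags_t f;
    f.ir  = (uint8_t)((errflags >> 16) & 0xFFu);
    f.sr  = (uint8_t)((errflags >> 8) & 0xFFu);
    f.ecc = (uint8_t)(errflags & 0xFFu);
    return f;
}
```

For errflags=528128 (0x080F00) this gives ir=0x08, sr=0x0F, ecc=0x00, matching the values decoded by hand.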
274.551 1R11 599 00 00 4D 7E 1E 26 FF FF
274.551 1R11 599 00 00 4D 7E 1E 26 FF FF
274.551 1R11 599 00 00 4D 7E 1E 26 FF FF
274.551 1R11 599 00 00 4D 7E 1E 26 FF FF
274.551 1R11 155 07 97 E6 54 4B A8 00 6C  RX "A"
274.551 1R11 155 07 97 E6 54 4B A8 00 6C  RX "B"
274.551 1R11 155 07 97 E7 54 4B A8 00 6C  RX "C"
274.551 1R11 423 03 22 FF FF 00 E0 00 DB
274.551 1R11 426 00 38 01 00 4D 7E 00
274.551 1R11 597 20 A4 03 B1 2F 00 01 53
274.551 1R11 627 00 00 00
274.551 1T11 155 01 97 E6 54 4B A8 00 6C  reflection "A"
274.551 1CEV TX_Queue T11 155 01 97 E6 54 4B A8 00 6C  reflection "B"
274.551 1CEV TX_Queue T11 155 01 97 E7 54 4B A8 00 6C  reflection "C"
274.551 1T11 155 01 97 E8 54 4B A8 00 6C  reflection "D"
!!! 120 ms pause !!!
274.671 1R11 155 07 97 E8 54 4B A8 00 6C  RX "D"
→ "D" was sent to the listeners & reflected by vehicle::IncomingFrame()
  then, before the LogFrame() call was processed, the CAN task was paused for 120 ms
and now it's getting really weird:
274.671 1R11 155 07 08 2A A0 07 97 E7 54  RX "E": invalid frame / garbled message!
… this frame certainly was not on the bus!
  bytes 2-4 (08 2A A0) = complete nonsense, not matching any frame on the bus
  bytes 5-8 (07 97 E7 54) = normally bytes 1-4 (!) of ID 155
  - copy / memory error? … very unlikely
  - CAN RX handling error? … doesn't seem so
  - RXFIFO message window bug?
274.671 1CEV Error intr=58458 rxpkt=27134 txpkt=528128 errflags=0x15456 rxerr=0 txerr=0 rxovr=3 txovr=0 txdelay=19
274.671 1R11 155 07 97 E7 54 4B A8 00 6C
274.671 1R11 155 07 97 E7 54 4B A8 00 6C
274.671 1R11 155 07 97 E7 54 4B A8 00 6C
274.671 1T11 155 01 97 E6 54 4B A8 00 6C
274.671 1CEV TX_Queue T11 155 01 97 E7 54 4B A8 00 6C
274.671 1R11 155 07 97 F2 54 4B A8 00 6C
274.671 1R11 155 07 97 F3 54 4B A8 00 6C
274.671 1R11 423 03 22 FF FF 00 E0 00 DB
274.671 1R11 426 00 38 01 00 4D 7E 00
274.671 1R11 597 20 A4 03 B1 2F 00 01 53
274.671 1R11 627 00 00 00
274.671 1R11 155 07 97 F5 54 4B A8 00 6C
274.671 1R11 155 07 97 F6 54 4B A8 00 6C
274.671 1R11 5D7 FF FF 01 E4 53 80 00
274.671 1R11 599 00 00 4D 7E 1E 26 FF FF
274.671 1R11 155 07 97 F7 54 4B A8 00 6C
274.671 1R11 436 00 0E 3C B2 00 00
274.671 1R11 041 A0 07 97 F5 54 4B A8 00
274.671 1T11 155 01 97 E7 54 4B A8 00 6C
274.671 1T11 155 01 08 2A A0 07 97 E7 54  reflection of invalid msg "E" → charge terminates due to error
--
Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal
Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev