I’m working on an implementation of an IP stack over CAN for the Tesla Roadster. IP frames are encoded as a length followed by a sequence of CAN frames, all on the same ID. This runs over a 1Mbps bus, so presumably the traffic volume could be high at times.

I was having problems with this running on CAN2, so tried CAN1 and it worked perfectly. Here are some simple dumps of a single PING packet (and single PING response packet):

ID #111 is used to transmit an IP packet, and ID #110 is used to receive an IP packet. The special empty data frame is an acknowledgment.

Using latest master branch code (3.2.016-196-g0aad1a9f/ota_1/edge (build idf v3.3.4-848-g1ff5e24b1 Jun 2 2021 09:28:58)).

So, first let’s test with traffic on CAN1 (active, 1Mbps), and listening on CAN2 (listen, 1Mbps):

TCPDUMP:
05:57:55.980291 IP (tos 0x0, ttl 64, id 43101, offset 0, flags [DF], proto ICMP (1), length 84)
10.10.99.3 > 10.10.99.2: ICMP echo request, id 23372, seq 1, length 64
0x0000: 4500 0054 a85d 4000 4001 b832 0a0a 6303 E..T.]@.@..2..c.
0x0010: 0a0a 6302 0800 7df8 5b4c 0001 5361 b860 ..c...}.[L..Sa.`
0x0020: 19f5 0e00 0809 0a0b 0c0d 0e0f 1011 1213 ................
0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
0x0050: 3435 3637 4567
05:57:56.436190 IP (tos 0x0, ttl 64, id 14937, offset 0, flags [none], proto ICMP (1), length 84)
10.10.99.2 > 10.10.99.3: ICMP echo reply, id 23372, seq 1, length 64
0x0000: 4500 0054 3a59 0000 4001 6637 0a0a 6302 E..T:Y..@.f7..c.
0x0010: 0a0a 6303 0000 85f8 5b4c 0001 5361 b860 ..c.....[L..Sa.`
0x0020: 19f5 0e00 0809 0a0b 0c0d 0e0f 1011 1213 ................
0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
0x0050: 3435 3637 4567

Traffic (as shown on PC the other end of the can log tcp connection):
tx: #111 54 00
tx: #111 45 00 00 54 a8 5d 40 00
tx: #111 40 01 b8 32 0a 0a 63 03
tx: #111 0a 0a 63 02 08 00 7d f8
tx: #111 5b 4c 00 01 53 61 b8 60
tx: #111 19 f5 0e 00 08 09 0a 0b
tx: #111 0c 0d 0e 0f 10 11 12 13
tx: #111 14 15 16 17 18 19 1a 1b
tx: #111 1c 1d 1e 1f 20 21 22 23
tx: #111 24 25 26 27 28 29 2a 2b
tx: #111 2c 2d 2e 2f 30 31 32 33
tx: #111 34 35 36 37
rx: #110
rx: #110 54 00
rx: #110 45 00 00 54 3a 59 00 00
rx: #110 40 01 66 37 0a 0a 63 02
rx: #110 0a 0a 63 03 00 00 85 f8
rx: #110 5b 4c 00 01 53 61 b8 60
rx: #110 19 f5 0e 00 08 09 0a 0b
rx: #110 0c 0d 0e 0f 10 11 12 13
rx: #110 14 15 16 17 18 19 1a 1b
rx: #110 1c 1d 1e 1f 20 21 22 23
rx: #110 24 25 26 27 28 29 2a 2b
rx: #110 2c 2d 2e 2f 30 31 32 33
rx: #110 34 35 36 37

CAN1 active:
1T11 111 54 00
1R11 110
1CER TX_Queue T11 111 40 01 b8 32 0a 0a 63 03
1T11 111 45 00 00 54 a8 5d 40 00
1T11 111 40 01 b8 32 0a 0a 63 03
1CER TX_Queue T11 111 5b 4c 00 01 53 61 b8 60
1T11 111 0a 0a 63 02 08 00 7d f8
1T11 111 5b 4c 00 01 53 61 b8 60
1CER TX_Queue T11 111 0c 0d 0e 0f 10 11 12 13
1T11 111 19 f5 0e 00 08 09 0a 0b
1T11 111 0c 0d 0e 0f 10 11 12 13
1CER TX_Queue T11 111 1c 1d 1e 1f 20 21 22 23
1T11 111 14 15 16 17 18 19 1a 1b
1T11 111 1c 1d 1e 1f 20 21 22 23
1CER TX_Queue T11 111 2c 2d 2e 2f 30 31 32 33
1T11 111 24 25 26 27 28 29 2a 2b
1T11 111 2c 2d 2e 2f 30 31 32 33
1T11 111 34 35 36 37
1R11 110 54 00
1R11 110 45 00 00 54 3a 59 00 00
1R11 110 40 01 66 37 0a 0a 63 02
1R11 110 0a 0a 63 03 00 00 85 f8
1R11 110 5b 4c 00 01 53 61 b8 60
1R11 110 19 f5 0e 00 08 09 0a 0b
1R11 110 0c 0d 0e 0f 10 11 12 13
1R11 110 14 15 16 17 18 19 1a 1b
1R11 110 1c 1d 1e 1f 20 21 22 23
1R11 110 24 25 26 27 28 29 2a 2b
1R11 110 2c 2d 2e 2f 30 31 32 33
1R11 110 34 35 36 37

CAN2 listen:
2R11 111 54 00
2R11 110
2R11 111 45 00 00 54 a8 5d 40 00
2R11 111 40 01 b8 32 0a 0a 63 03
2R11 111 0a 0a 63 02 08 00 7d f8
2R11 111 5b 4c 00 01 53 61 b8 60
2R11 111 19 f5 0e 00 08 09 0a 0b
2R11 111 0c 0d 0e 0f 10 11 12 13
2R11 111 14 15 16 17 18 19 1a 1b
2R11 111 1c 1d 1e 1f 20 21 22 23
2R11 111 24 25 26 27 28 29 2a 2b
2R11 111 2c 2d 2e 2f 30 31 32 33
2R11 111 34 35 36 37
2R11 110 54 00
2CER Error intr=10 rxpkt=14 txpkt=0 errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
2R11 110 40 01 66 37 0a 0a 63 02
2R11 110 19 f5 0e 00 08 09 0a 0b
2R11 110 34 35 36 37
2R11 110 45 00 00 54 3a 59 00 00

Conclusion is that the CAN1 traffic looks fine, and the PING packet gets a good reply. All successful. But the CAN2 listen is missing a few packets and the last packet is out of order.

Now, let’s test with traffic on CAN2 (active, 1Mbps), and listening on CAN1 (listen, 1Mbps):

TCPDUMP:
06:00:33.004060 IP (tos 0x0, ttl 64, id 58240, offset 0, flags [DF], proto ICMP (1), length 84)
10.10.99.3 > 10.10.99.2: ICMP echo request, id 23393, seq 1, length 64
0x0000: 4500 0054 e380 4000 4001 7d0f 0a0a 6303 E..T..@.@.}...c.
0x0010: 0a0a 6302 0800 7cc8 5b61 0001 f161 b860 ..c...|.[a...a.`
0x0020: 8b0f 0000 0809 0a0b 0c0d 0e0f 1011 1213 ................
0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
0x0050: 3435 3637 4567

Traffic (as shown on PC the other end of the can log tcp connection):
tx: #111 54 00
tx: #111 45 00 00 54 e3 80 40 00
tx: #111 40 01 7d 0f 0a 0a 63 03
tx: #111 0a 0a 63 02 08 00 7c c8
tx: #111 5b 61 00 01 f1 61 b8 60
tx: #111 8b 0f 00 00 08 09 0a 0b
tx: #111 0c 0d 0e 0f 10 11 12 13
tx: #111 14 15 16 17 18 19 1a 1b
tx: #111 1c 1d 1e 1f 20 21 22 23
tx: #111 24 25 26 27 28 29 2a 2b
tx: #111 2c 2d 2e 2f 30 31 32 33
tx: #111 34 35 36 37
rx: #110
rx: #110 54 00
rx: #110 45 00 00 54 3a 5a 00 00
rx: #110 0a 0a 63 03 00 00 84 c8
rx: #110 8b 0f 00 00 08 09 0a 0b
rx: #110 34 35 36 37
rx: #110 40 01 66 36 0a 0a 63 02

CAN2 active:
2T11 111 54 00
2R11 110
2CER TX_Queue T11 111 40 01 7d 0f 0a 0a 63 03
2T11 111 45 00 00 54 e3 80 40 00
2T11 111 40 01 7d 0f 0a 0a 63 03
2CER TX_Queue T11 111 5b 61 00 01 f1 61 b8 60
2T11 111 0a 0a 63 02 08 00 7c c8
2T11 111 5b 61 00 01 f1 61 b8 60
2CER TX_Queue T11 111 0c 0d 0e 0f 10 11 12 13
2T11 111 8b 0f 00 00 08 09 0a 0b
2T11 111 0c 0d 0e 0f 10 11 12 13
2CER TX_Queue T11 111 1c 1d 1e 1f 20 21 22 23
2T11 111 14 15 16 17 18 19 1a 1b
2T11 111 1c 1d 1e 1f 20 21 22 23
2CER TX_Queue T11 111 2c 2d 2e 2f 30 31 32 33
2T11 111 24 25 26 27 28 29 2a 2b
2T11 111 2c 2d 2e 2f 30 31 32 33
2T11 111 34 35 36 37
2R11 110 54 00
2R11 110 45 00 00 54 3a 5a 00 00
2CER Error intr=15 rxpkt=3 txpkt=12 errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=5 txfail=0 wdgreset=0 errreset=0
2R11 110 0a 0a 63 03 00 00 84 c8
2R11 110 8b 0f 00 00 08 09 0a 0b
2R11 110 34 35 36 37
2R11 110 40 01 66 36 0a 0a 63 02

CAN1 listen:
1R11 111 54 00
1R11 110
1R11 111 45 00 00 54 e3 80 40 00
1R11 111 40 01 7d 0f 0a 0a 63 03
1R11 111 0a 0a 63 02 08 00 7c c8
1R11 111 5b 61 00 01 f1 61 b8 60
1R11 111 8b 0f 00 00 08 09 0a 0b
1R11 111 0c 0d 0e 0f 10 11 12 13
1R11 111 14 15 16 17 18 19 1a 1b
1R11 111 1c 1d 1e 1f 20 21 22 23
1R11 111 24 25 26 27 28 29 2a 2b
1R11 111 2c 2d 2e 2f 30 31 32 33
1R11 111 34 35 36 37
1R11 110 54 00
1R11 110 45 00 00 54 3a 5a 00 00
1R11 110 40 01 66 36 0a 0a 63 02
1R11 110 0a 0a 63 03 00 00 84 c8
1R11 110 5b 61 00 01 f1 61 b8 60
1R11 110 8b 0f 00 00 08 09 0a 0b
1R11 110 0c 0d 0e 0f 10 11 12 13
1R11 110 14 15 16 17 18 19 1a 1b
1R11 110 1c 1d 1e 1f 20 21 22 23
1R11 110 24 25 26 27 28 29 2a 2b
1R11 110 2c 2d 2e 2f 30 31 32 33
1R11 110 34 35 36 37

Conclusion is that the CAN2 transmit traffic looks fine, but no PING reply is received via CAN. The CAN1 listen shows the reply just fine.

Here is that last CAN1 listen, with timestamps:
1622696433.080107 1R11 111 54 00
1622696433.081657 1R11 110
1622696433.227479 1R11 111 45 00 00 54 e3 80 40 00
1622696433.228318 1R11 111 40 01 7d 0f 0a 0a 63 03
1622696433.245727 1R11 111 0a 0a 63 02 08 00 7c c8
1622696433.246214 1R11 111 5b 61 00 01 f1 61 b8 60
1622696433.248219 1R11 111 8b 0f 00 00 08 09 0a 0b
1622696433.248772 1R11 111 0c 0d 0e 0f 10 11 12 13
1622696433.250774 1R11 111 14 15 16 17 18 19 1a 1b
1622696433.251338 1R11 111 1c 1d 1e 1f 20 21 22 23
1622696433.253380 1R11 111 24 25 26 27 28 29 2a 2b
1622696433.253944 1R11 111 2c 2d 2e 2f 30 31 32 33
1622696433.265937 1R11 111 34 35 36 37
1622696433.269221 1R11 110 54 00
1622696433.272095 1R11 110 45 00 00 54 3a 5a 00 00
1622696433.272125 1R11 110 40 01 66 36 0a 0a 63 02
1622696433.272156 1R11 110 0a 0a 63 03 00 00 84 c8
1622696433.272193 1R11 110 5b 61 00 01 f1 61 b8 60
1622696433.272245 1R11 110 8b 0f 00 00 08 09 0a 0b
1622696433.272277 1R11 110 0c 0d 0e 0f 10 11 12 13
1622696433.272314 1R11 110 14 15 16 17 18 19 1a 1b
1622696433.272354 1R11 110 1c 1d 1e 1f 20 21 22 23
1622696433.272387 1R11 110 24 25 26 27 28 29 2a 2b
1622696433.272420 1R11 110 2c 2d 2e 2f 30 31 32 33
1622696433.272452 1R11 110 34 35 36 37

It is 1Mbps, with 30us or so between each packet. This is the only traffic on the bus. Everything else is turned off. Roughly 12 packets each way. Surely even if we were hitting a performance limit, our buffers can handle 12 packets?

The good news is that I have a good environment to replicate this issue now, so any fix should be easy to test.

I haven’t worked on the MCP2515 driver in our code in a long time, but it certainly seems something is messed up, and that could be badly affecting vehicle modules using anything other than CAN1.

I will start to look at this over the weekend, but has anyone got any ideas/suggestions? Perhaps the bit timing registers are off by a small amount (so it works on CAN1 but not on CAN2)? Or something more serious in our driver?

Regards, Mark.
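
For readers following the dumps, the framing described at the top of this message (a length frame, then the packet split across CAN frames on one ID) can be sketched as below. This is only an illustration, not the OVMS or Roadster code: the function names are hypothetical, and the little-endian 16-bit length is an assumption inferred from the single `54 00` sample (84 bytes) in the dumps.

```python
def ip_to_can_payloads(packet: bytes) -> list[bytes]:
    """Split one IP packet into CAN payloads: a length frame, then 8-byte chunks."""
    # Assumption: length is 16-bit little-endian, so 84 bytes becomes 54 00.
    length_frame = len(packet).to_bytes(2, "little")
    chunks = [packet[i:i + 8] for i in range(0, len(packet), 8)]
    return [length_frame] + chunks


def can_payloads_to_ip(payloads: list[bytes]) -> bytes:
    """Reassemble the packet and check it against the announced length."""
    expected = int.from_bytes(payloads[0], "little")
    data = b"".join(payloads[1:])
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    return data


# The ICMP echo request from the first tcpdump above (84 bytes).
packet = bytes.fromhex(
    "45000054a85d40004001b8320a0a6303"
    "0a0a630208007df85b4c00015361b860"
    "19f50e00"
) + bytes(range(0x08, 0x38))

for payload in ip_to_can_payloads(packet):
    print("tx: #111", payload.hex(" "))
```

An 84-byte packet gives one length frame plus eleven data frames, which matches the "roughly 12 packets each way" seen on the bus. A single dropped or reordered frame makes reassembly fail, which is why the CAN2 receive problem loses the whole PING reply.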
Mark,

I'd give the bit timing a try first; the MCP2515 seems to be very sensitive to this. I even had some trouble finding a working configuration for the 50 kbit timing I added a couple of weeks ago.

We currently use 00 / D0 / 82 (CNF1 / CNF2 / CNF3), which is also the result of the old Intrepid timing calculator. That's a propagation segment of 1 Tq and 3 Tq per phase, resulting in sampling between 50% and 62.5%.

The Arduino MCP CAN lib by Cory Fowler also had this previously, but then changed in…

https://github.com/coryjfowler/MCP_CAN_lib/commit/ece730cf697fef1cbe8a90111694868168d41000

…to 00 / CA / 81, which is a propagation segment of 3 Tq and 2 Tq per phase, shifting the sampling window to 62.5% - 75%.

Our current configuration scheme for the internal SJA1000-compatible CAN seems to sample at 62.5% - 75% as well, so that would also match.

Regards, Michael
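
The segment lengths and sampling windows quoted above can be reproduced from the three register values. The sketch below decodes them using the MCP2515 register layout (BRP in CNF1, PRSEG/PHSEG1/SAM in CNF2, PHSEG2 in CNF3). Two things are assumptions for illustration: the 16 MHz oscillator default, and treating the triple-sampling window as starting one Tq before the sample point, which is what reproduces the percentages in the message. The names `BitTiming` and `decode_mcp2515_timing` are hypothetical, and the sketch assumes BTLMODE is set, as it is in both configurations.

```python
from dataclasses import dataclass


@dataclass
class BitTiming:
    prseg_tq: int        # propagation segment, in Tq
    phseg1_tq: int       # phase segment 1, in Tq
    phseg2_tq: int       # phase segment 2, in Tq
    tq_per_bit: int      # total Tq per bit, including the 1 Tq sync segment
    bitrate: float       # bits per second
    window_start: float  # start of sampling window, fraction of bit time
    sample_point: float  # end of sampling window, fraction of bit time


def decode_mcp2515_timing(cnf1: int, cnf2: int, cnf3: int,
                          fosc_hz: int = 16_000_000) -> BitTiming:
    """Decode CNF1/CNF2/CNF3; fosc_hz is an assumed oscillator frequency."""
    brp = cnf1 & 0x3F
    prseg = (cnf2 & 0x07) + 1
    phseg1 = ((cnf2 >> 3) & 0x07) + 1
    triple_sampling = bool(cnf2 & 0x40)  # SAM bit
    phseg2 = (cnf3 & 0x07) + 1           # valid because BTLMODE is set

    tq_per_bit = 1 + prseg + phseg1 + phseg2
    bitrate = fosc_hz / (2 * (brp + 1) * tq_per_bit)
    tq_before_sample = 1 + prseg + phseg1
    sample_point = tq_before_sample / tq_per_bit
    # Assumption: with SAM set, the window opens one Tq before the sample point.
    window_start = ((tq_before_sample - 1) / tq_per_bit
                    if triple_sampling else sample_point)
    return BitTiming(prseg, phseg1, phseg2, tq_per_bit,
                     bitrate, window_start, sample_point)


for name, regs in (("current", (0x00, 0xD0, 0x82)),
                   ("Arduino lib", (0x00, 0xCA, 0x81))):
    t = decode_mcp2515_timing(*regs)
    print(f"{name}: prop={t.prseg_tq} Tq, phases={t.phseg1_tq}/{t.phseg2_tq} Tq, "
          f"window={t.window_start:.1%}..{t.sample_point:.1%}, "
          f"bitrate={t.bitrate:.0f}")
```

Both register sets give 8 Tq per bit, so the bit rate is unchanged; only the position of the sampling window within the bit moves.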
_______________________________________________ OvmsDev mailing list OvmsDev@lists.openvehicles.com http://lists.openvehicles.com/mailman/listinfo/ovmsdev
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
Michael,

Good suggestion on the timing. I think it best to use the same timings as the Arduino library, and have committed that change. No vehicle modules currently use 1Mbps on MCP2515 anyway. Unfortunately, it didn’t resolve my problem.

Looking at the error flags I see:

Error flag: 0x23401c01

intstat 0x23
  ERRIF Error Interrupt pending
  RX0IF Rx buffer 0 full interrupt
  RX1IF Rx buffer 1 full interrupt
errflag 0x40
  RX0OVR Rx buffer 0 overflow
intflag 0x1c01
  0x01 Implied from Rx buffer 0 full
  0x1c = 0001 1100
    Means RXB0 overflow. No data lost in this case (it went into RXB1)
    Means (errflag & EFLG_RX01OVR), clear RX buffer overflow flags
    Means (intstat & (CANINTF_MERRF | CANINTF_WAKIF | CANINTF_ERRIF)), clear error & wakeup interrupts

So we have CAN frames in BOTH rx buffers #0 and #1. Looking at our driver code (mcp2515::AsynchronousInterruptHandler), it seems in that case we only read from buffer #0. From the flow I can see, we are going to lose that second frame.

We’re not really handling the issue of two frames being in the buffers when the interrupt handler is called. As the architecture of mcp2515::AsynchronousInterruptHandler can only receive one frame, it is not so simple to fix. We could simply read and return the frame in buffer #0, requesting to be called again (return true), but another frame may arrive (into buffer #0) before we get called again, and that is going to result in out-of-order frames.

I’ll work on improving the handling of this case.

Regards, Mark.
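
The breakdown of `errflags=0x23401c01` above can be reproduced mechanically. In the sketch below, the split into three fields (top byte intstat, next byte errflag, low 16 bits intflag) is an assumption taken from the breakdown in this message rather than from the driver source; the bit names are those of the MCP2515 CANINTF and EFLG registers. The driver-internal intflag bits are left undecoded, and `decode_errflags` is a hypothetical helper name.

```python
CANINTF_BITS = {
    0x01: "RX0IF", 0x02: "RX1IF", 0x04: "TX0IF", 0x08: "TX1IF",
    0x10: "TX2IF", 0x20: "ERRIF", 0x40: "WAKIF", 0x80: "MERRF",
}
EFLG_BITS = {
    0x01: "EWARN", 0x02: "RXWAR", 0x04: "TXWAR", 0x08: "RXEP",
    0x10: "TXEP", 0x20: "TXBO", 0x40: "RX0OVR", 0x80: "RX1OVR",
}


def bit_names(value: int, table: dict[int, str]) -> list[str]:
    """Names of the bits set in value, lowest bit first."""
    return [name for mask, name in sorted(table.items()) if value & mask]


def decode_errflags(errflags: int) -> dict:
    """Split the logged 32-bit value; the field layout is an assumption."""
    intstat = (errflags >> 24) & 0xFF
    errflag = (errflags >> 16) & 0xFF
    intflag = errflags & 0xFFFF  # driver-internal, not decoded here
    return {
        "intstat": intstat,
        "intstat_bits": bit_names(intstat, CANINTF_BITS),
        "errflag": errflag,
        "errflag_bits": bit_names(errflag, EFLG_BITS),
        "intflag": intflag,
    }


decoded = decode_errflags(0x23401C01)
print(f"intstat 0x{decoded['intstat']:02x}", decoded["intstat_bits"])
print(f"errflag 0x{decoded['errflag']:02x}", decoded["errflag_bits"])
print(f"intflag 0x{decoded['intflag']:04x}")
```

The same helper can be pointed at the `errflags=` value of any `CER Error` log line to see whether both receive buffers were full when the interrupt was serviced.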
On 3 Jun 2021, at 3:07 PM, Michael Balzer <dexter@expeedo.de> wrote:
Signed PGP part Mark,
I'd give the bit timing a try first, the MCP2515 seems to be very sensitive for this. I've even had some trouble finding a working configuration for the 50 kbit timing I've added a couple weeks ago.
We currently use 00 / D0 / 82 which is also the result of the old Intrepid timing calculator. That's a propagation segment of 1 Tq and 3 Tq per phase, resulting in samling between 50% - 62.5%.
The Arduino MCP CAN lib by Cory Fowler also had this previously, but then changed in…
https://github.com/coryjfowler/MCP_CAN_lib/commit/ece730cf697fef1cbe8a901116... <https://github.com/coryjfowler/MCP_CAN_lib/commit/ece730cf697fef1cbe8a90111694868168d41000>
…to 00 / CA / 81, which is a propagation segment of 3 Tq and 2 Tq per phase, shifting the sampling window to 62.5 - 75%.
Our current configuration scheme for the internal SJA1000 compatible CAN seems to sample at 62.5 - 75% as well, so that would also match.
Regards, Michael
Am 03.06.21 um 07:36 schrieb Mark Webb-Johnson:
I’m working on an implementation of IP stack over CAN for the Tesla Roadster. IP frames are encoded as a length followed by a sequence of CAN frames, all on the same ID. This runs over a 1MHz bus, so presumably the traffic volume could be high at times.
I was having problems with this running on CAN2, so tried CAN1 and it worked perfectly. Here are some simple dumps of a single PING packet (and single PING response packet):
ID #111 is used to transmit an IP packet, and ID #110 is used to receive an IP packet. The special empty data frame is an acknowledgment.
Using latest master branch code (3.2.016-196-g0aad1a9f/ota_1/edge (build idf v3.3.4-848-g1ff5e24b1 Jun 2 2021 09:28:58)).
So, first let’s test with traffic on CAN1 (active, 1Mbps), and listening on CAN2 (listen, 1Mbps):
TCPDUMP:
05:57:55.980291 IP (tos 0x0, ttl 64, id 43101, offset 0, flags [DF], proto ICMP (1), length 84) 10.10.99.3 > 10.10.99.2: ICMP echo request, id 23372, seq 1, length 64 0x0000: 4500 0054 a85d 4000 4001 b832 0a0a 6303 E..T.]@.@..2..c <mailto:E..T.]@.@..2..c>. 0x0010: 0a0a 6302 0800 7df8 5b4c 0001 5361 b860 ..c...}.[L..Sa.` 0x0020: 19f5 0e00 0809 0a0b 0c0d 0e0f 1011 1213 ................ 0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"# 0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123 0x0050: 3435 3637 4567
05:57:56.436190 IP (tos 0x0, ttl 64, id 14937, offset 0, flags [none], proto ICMP (1), length 84) 10.10.99.2 > 10.10.99.3: ICMP echo reply, id 23372, seq 1, length 64 0x0000: 4500 0054 3a59 0000 4001 6637 0a0a 6302 E..T:Y..@.f7..c <mailto:E..T:Y..@.f7..c>. 0x0010: 0a0a 6303 0000 85f8 5b4c 0001 5361 b860 ..c.....[L..Sa.` 0x0020: 19f5 0e00 0809 0a0b 0c0d 0e0f 1011 1213 ................ 0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"# 0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123 0x0050: 3435 3637 4567
Traffic (as shown on PC the other end of the can log tcp connection):
tx: #111 54 00 tx: #111 45 00 00 54 a8 5d 40 00 tx: #111 40 01 b8 32 0a 0a 63 03 tx: #111 0a 0a 63 02 08 00 7d f8 tx: #111 5b 4c 00 01 53 61 b8 60 tx: #111 19 f5 0e 00 08 09 0a 0b tx: #111 0c 0d 0e 0f 10 11 12 13 tx: #111 14 15 16 17 18 19 1a 1b tx: #111 1c 1d 1e 1f 20 21 22 23 tx: #111 24 25 26 27 28 29 2a 2b tx: #111 2c 2d 2e 2f 30 31 32 33 tx: #111 34 35 36 37
rx: #110 rx: #110 54 00 rx: #110 45 00 00 54 3a 59 00 00 rx: #110 40 01 66 37 0a 0a 63 02 rx: #110 0a 0a 63 03 00 00 85 f8 rx: #110 5b 4c 00 01 53 61 b8 60 rx: #110 19 f5 0e 00 08 09 0a 0b rx: #110 0c 0d 0e 0f 10 11 12 13 rx: #110 14 15 16 17 18 19 1a 1b rx: #110 1c 1d 1e 1f 20 21 22 23 rx: #110 24 25 26 27 28 29 2a 2b rx: #110 2c 2d 2e 2f 30 31 32 33 rx: #110 34 35 36 37
CAN1 active:
1T11 111 54 00 1R11 110
1CER TX_Queue T11 111 40 01 b8 32 0a 0a 63 03 1T11 111 45 00 00 54 a8 5d 40 00 1T11 111 40 01 b8 32 0a 0a 63 03 1CER TX_Queue T11 111 5b 4c 00 01 53 61 b8 60 1T11 111 0a 0a 63 02 08 00 7d f8 1T11 111 5b 4c 00 01 53 61 b8 60 1CER TX_Queue T11 111 0c 0d 0e 0f 10 11 12 13 1T11 111 19 f5 0e 00 08 09 0a 0b 1T11 111 0c 0d 0e 0f 10 11 12 13 1CER TX_Queue T11 111 1c 1d 1e 1f 20 21 22 23 1T11 111 14 15 16 17 18 19 1a 1b 1T11 111 1c 1d 1e 1f 20 21 22 23 1CER TX_Queue T11 111 2c 2d 2e 2f 30 31 32 33 1T11 111 24 25 26 27 28 29 2a 2b 1T11 111 2c 2d 2e 2f 30 31 32 33 1T11 111 34 35 36 37
1R11 110 54 00 1R11 110 45 00 00 54 3a 59 00 00 1R11 110 40 01 66 37 0a 0a 63 02 1R11 110 0a 0a 63 03 00 00 85 f8 1R11 110 5b 4c 00 01 53 61 b8 60 1R11 110 19 f5 0e 00 08 09 0a 0b 1R11 110 0c 0d 0e 0f 10 11 12 13 1R11 110 14 15 16 17 18 19 1a 1b 1R11 110 1c 1d 1e 1f 20 21 22 23 1R11 110 24 25 26 27 28 29 2a 2b 1R11 110 2c 2d 2e 2f 30 31 32 33 1R11 110 34 35 36 37
CAN2 listen:
2R11 111 54 00 2R11 110
2R11 111 45 00 00 54 a8 5d 40 00 2R11 111 40 01 b8 32 0a 0a 63 03 2R11 111 0a 0a 63 02 08 00 7d f8 2R11 111 5b 4c 00 01 53 61 b8 60 2R11 111 19 f5 0e 00 08 09 0a 0b 2R11 111 0c 0d 0e 0f 10 11 12 13 2R11 111 14 15 16 17 18 19 1a 1b 2R11 111 1c 1d 1e 1f 20 21 22 23 2R11 111 24 25 26 27 28 29 2a 2b 2R11 111 2c 2d 2e 2f 30 31 32 33 2R11 111 34 35 36 37
2R11 110 54 00 2CER Error intr=10 rxpkt=14 txpkt=0 errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0 2R11 110 40 01 66 37 0a 0a 63 02 2R11 110 19 f5 0e 00 08 09 0a 0b 2R11 110 34 35 36 37 2R11 110 45 00 00 54 3a 59 00 00
Conclusion is that the CAN1 traffic looks fine, and the PING packet gets a good reply. All successful. But the CAN2 listen is missing a few packets and the last packet is out of order.
Now, let’s test with traffic on CAN2 (active, 1Mbps), and listening on CAN1 (listen, 1Mbps):
TCPDUMP:
06:00:33.004060 IP (tos 0x0, ttl 64, id 58240, offset 0, flags [DF], proto ICMP (1), length 84) 10.10.99.3 > 10.10.99.2: ICMP echo request, id 23393, seq 1, length 64 0x0000: 4500 0054 e380 4000 4001 7d0f 0a0a 6303 E..T..@.@ <mailto:E..T..@.@>.}...c. 0x0010: 0a0a 6302 0800 7cc8 5b61 0001 f161 b860 ..c...|.[a...a.` 0x0020: 8b0f 0000 0809 0a0b 0c0d 0e0f 1011 1213 ................ 0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"# 0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123 0x0050: 3435 3637 4567
Traffic (as shown on PC the other end of the can log tcp connection):
tx: #111 54 00 tx: #111 45 00 00 54 e3 80 40 00 tx: #111 40 01 7d 0f 0a 0a 63 03 tx: #111 0a 0a 63 02 08 00 7c c8 tx: #111 5b 61 00 01 f1 61 b8 60 tx: #111 8b 0f 00 00 08 09 0a 0b tx: #111 0c 0d 0e 0f 10 11 12 13 tx: #111 14 15 16 17 18 19 1a 1b tx: #111 1c 1d 1e 1f 20 21 22 23 tx: #111 24 25 26 27 28 29 2a 2b tx: #111 2c 2d 2e 2f 30 31 32 33 tx: #111 34 35 36 37
rx: #110 rx: #110 54 00 rx: #110 45 00 00 54 3a 5a 00 00 rx: #110 0a 0a 63 03 00 00 84 c8 rx: #110 8b 0f 00 00 08 09 0a 0b rx: #110 34 35 36 37 rx: #110 40 01 66 36 0a 0a 63 02
CAN2 active:
2T11 111 54 00 2R11 110
2CER TX_Queue T11 111 40 01 7d 0f 0a 0a 63 03 2T11 111 45 00 00 54 e3 80 40 00 2T11 111 40 01 7d 0f 0a 0a 63 03 2CER TX_Queue T11 111 5b 61 00 01 f1 61 b8 60 2T11 111 0a 0a 63 02 08 00 7c c8 2T11 111 5b 61 00 01 f1 61 b8 60 2CER TX_Queue T11 111 0c 0d 0e 0f 10 11 12 13 2T11 111 8b 0f 00 00 08 09 0a 0b 2T11 111 0c 0d 0e 0f 10 11 12 13 2CER TX_Queue T11 111 1c 1d 1e 1f 20 21 22 23 2T11 111 14 15 16 17 18 19 1a 1b 2T11 111 1c 1d 1e 1f 20 21 22 23 2CER TX_Queue T11 111 2c 2d 2e 2f 30 31 32 33 2T11 111 24 25 26 27 28 29 2a 2b 2T11 111 2c 2d 2e 2f 30 31 32 33 2T11 111 34 35 36 37
2R11 110 54 00 2R11 110 45 00 00 54 3a 5a 00 00 2CER Error intr=15 rxpkt=3 txpkt=12 errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=5 txfail=0 wdgreset=0 errreset=0 2R11 110 0a 0a 63 03 00 00 84 c8 2R11 110 8b 0f 00 00 08 09 0a 0b 2R11 110 34 35 36 37 2R11 110 40 01 66 36 0a 0a 63 02
CAN1 listen:
1R11 111 54 00 1R11 110
1R11 111 45 00 00 54 e3 80 40 00 1R11 111 40 01 7d 0f 0a 0a 63 03 1R11 111 0a 0a 63 02 08 00 7c c8 1R11 111 5b 61 00 01 f1 61 b8 60 1R11 111 8b 0f 00 00 08 09 0a 0b 1R11 111 0c 0d 0e 0f 10 11 12 13 1R11 111 14 15 16 17 18 19 1a 1b 1R11 111 1c 1d 1e 1f 20 21 22 23 1R11 111 24 25 26 27 28 29 2a 2b 1R11 111 2c 2d 2e 2f 30 31 32 33 1R11 111 34 35 36 37
1R11 110 54 00 1R11 110 45 00 00 54 3a 5a 00 00 1R11 110 40 01 66 36 0a 0a 63 02 1R11 110 0a 0a 63 03 00 00 84 c8 1R11 110 5b 61 00 01 f1 61 b8 60 1R11 110 8b 0f 00 00 08 09 0a 0b 1R11 110 0c 0d 0e 0f 10 11 12 13 1R11 110 14 15 16 17 18 19 1a 1b 1R11 110 1c 1d 1e 1f 20 21 22 23 1R11 110 24 25 26 27 28 29 2a 2b 1R11 110 2c 2d 2e 2f 30 31 32 33 1R11 110 34 35 36 37
Conclusion is that the CAN2 transmit traffic looks fine, but no PING reply received via CAN. The CAN1 listen shows the reply just fine.
Here is that last CAN1 listen, with timestamps:
1622696433.080107 1R11 111 54 00 1622696433.081657 1R11 110
1622696433.227479 1R11 111 45 00 00 54 e3 80 40 00 1622696433.228318 1R11 111 40 01 7d 0f 0a 0a 63 03 1622696433.245727 1R11 111 0a 0a 63 02 08 00 7c c8 1622696433.246214 1R11 111 5b 61 00 01 f1 61 b8 60 1622696433.248219 1R11 111 8b 0f 00 00 08 09 0a 0b 1622696433.248772 1R11 111 0c 0d 0e 0f 10 11 12 13 1622696433.250774 1R11 111 14 15 16 17 18 19 1a 1b 1622696433.251338 1R11 111 1c 1d 1e 1f 20 21 22 23 1622696433.253380 1R11 111 24 25 26 27 28 29 2a 2b 1622696433.253944 1R11 111 2c 2d 2e 2f 30 31 32 33 1622696433.265937 1R11 111 34 35 36 37
1622696433.269221 1R11 110 54 00 1622696433.272095 1R11 110 45 00 00 54 3a 5a 00 00 1622696433.272125 1R11 110 40 01 66 36 0a 0a 63 02 1622696433.272156 1R11 110 0a 0a 63 03 00 00 84 c8 1622696433.272193 1R11 110 5b 61 00 01 f1 61 b8 60 1622696433.272245 1R11 110 8b 0f 00 00 08 09 0a 0b 1622696433.272277 1R11 110 0c 0d 0e 0f 10 11 12 13 1622696433.272314 1R11 110 14 15 16 17 18 19 1a 1b 1622696433.272354 1R11 110 1c 1d 1e 1f 20 21 22 23 1622696433.272387 1R11 110 24 25 26 27 28 29 2a 2b 1622696433.272420 1R11 110 2c 2d 2e 2f 30 31 32 33 1622696433.272452 1R11 110 34 35 36 37
It is 1Mbps, with 30us or so between each packet. This is the only traffic on the bus. Everything else is turned off. Roughly 12 packets each way. Surely even if we were hitting a performance limit, our buffers can handle 12 packets?
The good news is that I have a good environment to replicate this issue now, so any fix should be easy to test.
I haven’t worked on the MCP2515 driver in our code in a long time, but it certainly seems something is messed up and that could be badly affecting vehicle modules using anything other than CAN1.
I will start to look at this over the weekend, but has anyone got any ideas/suggestions? Perhaps the bit timing registers are off by a small amount (so it works on CAN1 but not on CAN2)? Or something more serious in our driver?
Regards, Mark.
_______________________________________________ OvmsDev mailing list OvmsDev@lists.openvehicles.com http://lists.openvehicles.com/mailman/listinfo/ovmsdev
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26 <MCP2515Calc-1000kbit.ods>
Mark,
the handler is meant to read both buffers sequentially, and on a quick glance I don't see why it wouldn't. But it can't hurt if you do an audit of the code.
I remember having had that out-of-order discussion when handling both RX buffers before here, but don't remember the outcome. Too bad the list archives cannot be searched.
I think it was the MCP not doing overflows from RX buffer 1 to 0. I.e. if buffer 1 still has a frame on arrival, the new frame will be lost. That means losing a frame if the handler cannot react fast enough, but receiving out of order would be worse.
Regards, Michael
On 04.06.21 at 04:16, Mark Webb-Johnson wrote:
Michael,
Good suggestion on the timing. I think it best to use the same timings as the Arduino library, and have committed that change. No vehicle modules currently use 1Mbps on MCP2515 anyway. Unfortunately, it didn’t resolve my problem.
Looking at the error flags I see:
Error flag: 0x23401c01
intstat 0x23
    ERRIF   Error Interrupt pending
    RX0IF   Rx buffer 0 full interrupt
    RX1IF   Rx buffer 1 full interrupt
errflag 0x40
    RX0OVR  Rx buffer 0 overflow
intflag 0x1c01
    0x01    Implied from Rx buffer 0 full
    0x1c = 0001 1100
        Means RXB0 overflow. No data lost in this case (it went into RXB1)
        Means (errflag & EFLG_RX01OVR), clear RX buffer overflow flags
        Means (intstat & (CANINTF_MERRF | CANINTF_WAKIF | CANINTF_ERRIF)), clear error & wakeup interrupts
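The breakdown above can be reproduced with a small sketch that splits the 32-bit error word into its three fields. The packing (intstat in the top byte, errflag in the next byte, intflag in the low 16 bits) is taken from the values listed above. The struct and function names are hypothetical helpers for illustration, not the driver's own, and the bit values are the CANINTF and EFLG bit positions from the MCP2515 datasheet.

```cpp
#include <cstdint>

// MCP2515 CANINTF register bits.
constexpr uint8_t kCanintfRx0If = 0x01;  // Rx buffer 0 full
constexpr uint8_t kCanintfRx1If = 0x02;  // Rx buffer 1 full
constexpr uint8_t kCanintfErrIf = 0x20;  // error interrupt pending
// MCP2515 EFLG register bits.
constexpr uint8_t kEflgRx0Ovr = 0x40;    // Rx buffer 0 overflow
constexpr uint8_t kEflgRx1Ovr = 0x80;    // Rx buffer 1 overflow

struct ErrorWord {   // hypothetical helper type, for illustration only
  uint8_t intstat;   // bits 31..24: CANINTF as read from the chip
  uint8_t errflag;   // bits 23..16: EFLG as read from the chip
  uint16_t intflag;  // bits 15..0: driver-internal flags
};

// Split the packed value shown in the log (errflags=0x...) into its fields.
constexpr ErrorWord unpackErrorWord(uint32_t w) {
  return ErrorWord{static_cast<uint8_t>(w >> 24),
                   static_cast<uint8_t>((w >> 16) & 0xff),
                   static_cast<uint16_t>(w & 0xffff)};
}

// True when the chip reports a pending frame in both receive buffers.
constexpr bool bothRxBuffersFull(const ErrorWord& e) {
  return (e.intstat & kCanintfRx0If) != 0 && (e.intstat & kCanintfRx1If) != 0;
}
```

For 0x23401c01 this yields intstat 0x23, errflag 0x40 and intflag 0x1c01, with both receive buffers flagged as full.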
So we have CAN frames in BOTH rx buffers #0 and #1. Looking at our driver code (mcp2515::AsynchronousInterruptHandler), it seems in that case we only read from buffer #0. From the flow I can see, we are going to lose that second frame. We’re not really handling the issue of two frames being in the buffers when the interrupt handler is called.
As the architecture of mcp2515::AsynchronousInterruptHandler can only receive one frame, it is not so simple to fix. We could simply read and return the frame in buffer #0, requesting to be called again (return true), but another frame may arrive (into buffer #0) before we get called again, and that is going to result in out-of-order frames.
I’ll work on improving the handling of this case.
Regards, Mark.
On 3 Jun 2021, at 3:07 PM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
I'd give the bit timing a try first, the MCP2515 seems to be very sensitive to this. I've even had some trouble finding a working configuration for the 50 kbit timing I added a couple of weeks ago.
We currently use 00 / D0 / 82 which is also the result of the old Intrepid timing calculator. That's a propagation segment of 1 Tq and 3 Tq per phase, resulting in sampling between 50% - 62.5%.
The Arduino MCP CAN lib by Cory Fowler also had this previously, but then changed in…
https://github.com/coryjfowler/MCP_CAN_lib/commit/ece730cf697fef1cbe8a90111694868168d41000
…to 00 / CA / 81, which is a propagation segment of 3 Tq and 2 Tq per phase, shifting the sampling window to 62.5 - 75%.
Our current configuration scheme for the internal SJA1000 compatible CAN seems to sample at 62.5 - 75% as well, so that would also match.
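As a cross-check of those numbers, here is a minimal sketch that decodes the segment lengths from CNF2 and CNF3 and computes the nominal sample point. It assumes the MCP2515 register layout (PHSEG1 in CNF2 bits 5..3, PRSEG in CNF2 bits 2..0, PHSEG2 in CNF3 bits 2..0, each stored as length minus one) and that BTLMODE is set so PHSEG2 is taken from CNF3, which is the case for both D0 and CA. The type and function names are invented for illustration. The result is the single nominal sample point, which corresponds to the upper end of each window quoted above.

```cpp
#include <cstdint>

struct BitTiming {     // hypothetical helper type, for illustration only
  unsigned propSeg;    // propagation segment, in Tq
  unsigned phaseSeg1;  // phase segment 1, in Tq
  unsigned phaseSeg2;  // phase segment 2, in Tq

  // The leading 1 is the fixed sync segment.
  constexpr unsigned totalTq() const {
    return 1 + propSeg + phaseSeg1 + phaseSeg2;
  }
  // The sample point sits at the end of phase segment 1.
  constexpr double samplePointPercent() const {
    return 100.0 * (1 + propSeg + phaseSeg1) / totalTq();
  }
};

// Each register field holds (segment length - 1).
constexpr BitTiming decodeTiming(uint8_t cnf2, uint8_t cnf3) {
  return BitTiming{(cnf2 & 0x07u) + 1u,
                   ((cnf2 >> 3) & 0x07u) + 1u,
                   (cnf3 & 0x07u) + 1u};
}
```

Both settings come out at 8 Tq per bit: decodeTiming(0xD0, 0x82) gives 1 / 3 / 3 Tq with a sample point of 62.5%, and decodeTiming(0xCA, 0x81) gives 3 / 2 / 2 Tq with a sample point of 75%.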
Regards, Michael
The handler can only return one frame. As it is, if both buffers #0 and #1 have a frame, it returns #0. I am not sure if it gets called again (seems to depend on the interrupt gpio status).
// Read the interrupt pin status and if it's still active (low), require another interrupt handling iteration
return !gpio_get_level((gpio_num_t)m_intpin);
Maybe a quick solution is to just return true, immediately after *frameReceived=true, if intflag=0x01 and (intstat & CANINTF_RX1IF)? That would dispatch the incoming frame, then call back for more (from the loop in the can object).
I am not sure in general why AsynchronousInterruptHandler uses a bool frameReceived flag, and doesn’t just simply dispatch the frame immediately to the can object? That would simplify things and allow the AsynchronousInterruptHandler to handle receiving both frames in the same call. Given that MCP2515 is the only driver using AsynchronousInterruptHandler, that would be an easy fix.
Regards, Mark.
I spent quite a bit of time on this. With my standard test packet of 11 CAN frames expected, and the standard driver, I get perhaps 4 or 5 of them (about half are lost, and some are out of order).
I made the suggested change to move the MyCan.IncomingFrame() call out of the ‘can’ object (when frameReceived is true) to within the mcp2515 AsynchronousInterruptHandler itself. That allows the handler to receive more than one frame per call and is a very simple change. Once that is done, we can at least now try to tune it.
So I then modified the code of mcp2515 AsynchronousInterruptHandler to loop so long as the interrupt flag says either buffer #0 or #1 has a frame. The result looks something like this:
D (63192) mcp2515: AsynchronousInterruptHandler instat=01 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=54)
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=40 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=40)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=45)
D (63192) mcp2515: AsynchronousInterruptHandler Clear RX buffer #0 overflow flag
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=24)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=34)
D (63192) mcp2515: AsynchronousInterruptHandler instat=20 errflag=00 txb0ctrl=00
The actual frames on the bus are (B1 values) 54, 45, 40, 0a, 7a, d5, 0c, 14, 1c, 24, 2c, and 34. Looking at the above debug output, we get:
Interrupt flags show buffer #0 has a frame. It is B1=54. Good.
Interrupt flags show buffers #0 and #1 both have frames. Buffer #0 has B1=40 and buffer #1 has B1=45.
Etc etc
That is not good.
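One sequence of events consistent with that debug output can be modelled in a few lines. This is only a sketch under stated assumptions: a new frame lands in buffer #0 when it is free and otherwise rolls over into buffer #1, and the handler always services #0 before #1. RxModel, Order and simulateRace are hypothetical names for illustration and do not exist in the driver. Frames are numbered 1, 2, 3 in bus order.

```cpp
struct RxModel {  // hypothetical model of the two receive buffers
  int buf0 = -1;  // -1 means empty
  int buf1 = -1;

  // A new frame goes to buffer #0 if free, else rolls over into #1,
  // else it is lost (returns false).
  constexpr bool receive(int frame) {
    if (buf0 < 0) { buf0 = frame; return true; }
    if (buf1 < 0) { buf1 = frame; return true; }
    return false;
  }
};

struct Order { int first, second, third; };  // order seen by the application

constexpr Order simulateRace() {
  RxModel rx{};
  Order out{-1, -1, -1};

  rx.receive(1);         // frame 1 lands in buffer #0, interrupt raised

  // Handler pass 1: flags show only buffer #0, so it starts the SPI read.
  out.first = rx.buf0;
  rx.receive(2);         // arrives mid-read: #0 still occupied, goes to #1
  rx.buf0 = -1;          // read finished, buffer #0 released
  rx.receive(3);         // buffer #0 is free again, so frame 3 lands there

  // Handler pass 2: both buffers flagged, #0 is serviced before #1.
  out.second = rx.buf0;
  rx.buf0 = -1;
  out.third = rx.buf1;
  rx.buf1 = -1;
  return out;
}
```

In this model the application sees the frames in the order 1, 3, 2, which matches the swapped B1=40 and B1=45 frames in the log.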
What must have happened is that the first B1=54 frame arrived, got put in buffer #0, and interrupt was raised. We checked the interrupt flags, found buffer #0 had something, and read the frame ok. All is good. But what is happening now is that between the time we checked the interrupt flags and the time we finished reading the 13 bytes from buffer #0, a second frame arrived and was put in buffer #1. Then a third frame arrives and is put in buffer #0. We loop back to check the interrupt flags and find both buffers have frames ready. So we read buffer #0 to get the third frame, then buffer #1 to get the second frame. We are out of sequence.
By removing the ESP_LOGD statements, I can improve performance enough to get 10 out of the 11 frames, but still sometimes frames are swapped in order. By over-clocking the MCP2515 SPI bus (supposed to be 10MHz, but I push it to 15MHz), I can get all 11 frames, but two are out of order.
I suppose I can minimise the chance of the out-of-order issue by repeating the call to read interrupt flags after processing buffer #0 but before checking for buffer #1. That would at least reduce the time window to as small as possible, but would be another SPI call and is too slow. Doing that brings us back to losing frames.
Another approach may relate to our current use of the READ command to read 5 status registers (interrupt flags, error flags, two skipped, then transmit buffer #0 flags). There are two specific commands ‘read status’ (which gets the rx and tx buffer status flags in one byte), and ‘rx status’ (which gets just the receive buffer status and some info on the frames received, again in one byte). I think those are more designed for what we are trying to do. I can try to optimise the read loop at the start of the AsynchronousInterruptHandler to use one of those - they are 2 SPI bytes vs 7 for what we are doing at the moment (so more than three times as fast).
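To make the two status commands concrete, here is a minimal sketch of how their single result byte could be decoded, based on the MCP2515 datasheet's SPI instruction set (READ STATUS is 0xA0, RX STATUS is 0xB0). The constant, enum and function names are hypothetical and are not taken from the driver. The actual SPI transfer is left out, since it depends on the driver's own SPI helpers.

```cpp
#include <cstdint>

// MCP2515 SPI instruction bytes.
constexpr uint8_t kCmdReadStatus = 0xA0;  // one status byte: rx and tx flags
constexpr uint8_t kCmdRxStatus = 0xB0;    // one status byte: rx buffers only

// RX STATUS, bits 7..6: which receive buffers hold a message.
enum class RxPending : uint8_t { None = 0, Buffer0 = 1, Buffer1 = 2, Both = 3 };

constexpr RxPending decodeRxStatus(uint8_t status) {
  return static_cast<RxPending>(status >> 6);
}

// READ STATUS: bit 0 mirrors CANINTF.RX0IF, bit 1 mirrors CANINTF.RX1IF.
constexpr bool readStatusRx0Full(uint8_t status) { return (status & 0x01) != 0; }
constexpr bool readStatusRx1Full(uint8_t status) { return (status & 0x02) != 0; }
```

Either command costs one instruction byte plus one result byte on the SPI bus, which is where the 2 bytes versus 7 comparison comes from. Neither byte carries the error flags, so an overflow would still need a separate register read.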
I think it will also be worthwhile having a look at some other open source mcp2515 drivers to see how other people are doing it.

Regards, Mark.
On 4 Jun 2021, at 3:02 PM, Mark Webb-Johnson <mark@webb-johnson.net> wrote:
The handler can only return one frame. As it is, if both buffers #0 and #1 have a frame, it returns #0. I am not sure if it gets called again (seems to depend on the interrupt gpio status).
// Read the interrupt pin status and if it's still active (low), require another interrupt handling iteration
return !gpio_get_level((gpio_num_t)m_intpin);
Maybe a quick solution is to just return true, immediately after *frameReceived=true, if intflag=0x01 and (intstat & CANINTF_RX1IF)? That would dispatch the incoming frame, then call back for more (from the loop in the can object).
I am not sure in general why AsynchronousInterruptHandler uses a bool frameReceived flag, and doesn’t just simply dispatch the frame immediately to the can object? That would simplify things and allow the AsynchronousInterruptHandler to handle receiving both frames in the same call. Given that MCP2515 is the only driver using AsynchronousInterruptHandler, that would be an easy fix.
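To make the idea concrete, here is a minimal sketch of a handler that drains every pending buffer in one call and hands each frame straight to a dispatcher, rather than returning a single frame plus a flag. `MockChip`, `Frame` and `DrainRx` are hypothetical stand-ins so the example is self-contained; the real driver talks to the chip over SPI and would dispatch via the can object.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

struct Frame {
  uint32_t id;
  uint8_t b1;  // first data byte, as printed in the debug logs
};

// Hypothetical stand-in for the chip: two RX buffers plus their "full" flags.
struct MockChip {
  bool full[2] = {false, false};
  Frame buf[2] = {};
  // Reading a buffer also frees it, like clearing RXnIF after the read.
  Frame Read(int n) {
    full[n] = false;
    return buf[n];
  }
};

// One handler call drains whatever is pending and dispatches each frame
// immediately. Returns the number of frames dispatched.
inline int DrainRx(MockChip& chip,
                   const std::function<void(const Frame&)>& dispatch) {
  int count = 0;
  while (chip.full[0] || chip.full[1]) {
    if (chip.full[0]) { dispatch(chip.Read(0)); ++count; }
    if (chip.full[1]) { dispatch(chip.Read(1)); ++count; }
  }
  return count;
}
```

Note that this only removes the one-frame-per-call limit. It still reads buffer #0 before buffer #1, so it does nothing by itself about frames that arrive while the loop is running.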
Regards, Mark.
On 4 Jun 2021, at 2:29 PM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
the handler is meant to read both buffers sequentially, and on a quick glance I don't see why it wouldn't. But it can't hurt if you do an audit of the code.
I remember having had that out-of-order discussion when handling both RX buffers before here, but don't remember the outcome. Too bad the list archives cannot be searched.
I think it was the MCP not doing overflows from RX buffer 1 to 0. I.e. if buffer 1 still has a frame on arrival, the new frame will be lost. That means losing a frame if the handler cannot react fast enough, but receiving out of order would be worse.
Regards, Michael
On 04.06.21 at 04:16, Mark Webb-Johnson wrote:
Michael,
Good suggestion on the timing. I think it best to use the same timings as the Arduino library, and have committed that change. No vehicle modules currently use 1Mbps on MCP2515 anyway. Unfortunately, it didn’t resolve my problem.
Looking at the error flags I see:
Error flag: 0x23401c01
intstat 0x23
  ERRIF Error Interrupt pending
  RX0IF Rx buffer 0 full interrupt
  RX1IF Rx buffer 1 full interrupt
errflag 0x40
  RX0OVR Rx buffer 0 overflow
intflag 0x1c01
  0x01 Implied from Rx buffer 0 full
  0x1c = 0001 1100
    Means RXB0 overflow. No data lost in this case (it went into RXB1)
    Means (errflag & EFLG_RX01OVR), clear RX buffer overflow flags
    Means (intstat & (CANINTF_MERRF | CANINTF_WAKIF | CANINTF_ERRIF)), clear error & wakeup interrupts
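The breakdown above can be reproduced mechanically. This is a small sketch that splits the 32-bit error word into the three fields shown and tests the relevant bits. The field layout is taken from the breakdown itself; the bit masks are the CANINTF and EFLG positions as I understand the MCP2515 datasheet, and `ErrorWord`, `SplitErrorWord` and `BothRxBuffersFull` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// CANINTF bits (interrupt flags) and EFLG bits (error flags).
constexpr uint8_t CANINTF_RX0IF = 0x01;
constexpr uint8_t CANINTF_RX1IF = 0x02;
constexpr uint8_t CANINTF_ERRIF = 0x20;
constexpr uint8_t CANINTF_WAKIF = 0x40;
constexpr uint8_t CANINTF_MERRF = 0x80;
constexpr uint8_t EFLG_RX0OVR   = 0x40;
constexpr uint8_t EFLG_RX1OVR   = 0x80;

// The logged error word, split as in the breakdown: top byte is intstat,
// next byte is errflag, low 16 bits are the driver's own intflag.
struct ErrorWord {
  uint8_t intstat;
  uint8_t errflag;
  uint16_t intflag;
};

inline ErrorWord SplitErrorWord(uint32_t w) {
  ErrorWord e;
  e.intstat = static_cast<uint8_t>(w >> 24);
  e.errflag = static_cast<uint8_t>(w >> 16);
  e.intflag = static_cast<uint16_t>(w & 0xFFFF);
  return e;
}

// True when both receive buffers are flagged as holding a frame.
inline bool BothRxBuffersFull(uint8_t intstat) {
  return (intstat & CANINTF_RX0IF) && (intstat & CANINTF_RX1IF);
}
```

For 0x23401c01 this gives intstat 0x23 (ERRIF, RX1IF and RX0IF set), errflag 0x40 (RX0OVR only) and intflag 0x1c01.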
So we have CAN frames in BOTH rx buffers #0 and #1. Looking at our driver code (mcp2515::AsynchronousInterruptHandler), it seems in that case we only read from buffer #0. From the flow I can see, we are going to lose that second frame. We’re not really handling the issue of two frames being in the buffers when the interrupt handler is called.
As the architecture of mcp2515::AsynchronousInterruptHandler can only receive one frame, it is not so simple to fix. We could simply read and return the frame in buffer #0, requesting to be called again (return true), but another frame may arrive (into buffer #0) before we get called again, and that is going to result in out-of-order frames.
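The reordering described here can be replayed with a toy model. The sketch below assumes a simplified chip that puts a new frame into buffer #0 when it is free and otherwise rolls over into buffer #1, and a handler that returns one frame per call, preferring buffer #0. `TwoBufferModel`, `HandleOnce` and `ReplayOneFramePerCall` are hypothetical names, and the B1 values are simply the first three frames of the test reply.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model: RXB0 is used when free, otherwise the frame rolls over
// into RXB1, and it is lost if both buffers are occupied.
struct TwoBufferModel {
  bool full[2] = {false, false};
  uint8_t data[2] = {0, 0};
  int lost = 0;

  void Arrive(uint8_t b1) {
    if (!full[0])      { data[0] = b1; full[0] = true; }
    else if (!full[1]) { data[1] = b1; full[1] = true; }
    else               { ++lost; }
  }
  uint8_t Read(int n) { full[n] = false; return data[n]; }
};

// One frame per call, buffer #0 preferred. Returns false when nothing is pending.
inline bool HandleOnce(TwoBufferModel& chip, std::vector<uint8_t>& out) {
  if (chip.full[0]) { out.push_back(chip.Read(0)); return true; }
  if (chip.full[1]) { out.push_back(chip.Read(1)); return true; }
  return false;
}

inline std::vector<uint8_t> ReplayOneFramePerCall() {
  TwoBufferModel chip;
  std::vector<uint8_t> out;
  chip.Arrive(0x54);      // first frame lands in RXB0
  chip.Arrive(0x45);      // second frame rolls over into RXB1
  HandleOnce(chip, out);  // handler returns RXB0 (0x54) and asks to be called again
  chip.Arrive(0x40);      // third frame takes the freed RXB0 before the next call
  HandleOnce(chip, out);  // RXB0 again: 0x40 is delivered ahead of 0x45
  HandleOnce(chip, out);  // RXB1 last: 0x45
  return out;
}
```

In this model the frames sent as 54, 45, 40 are delivered as 54, 40, 45, with nothing lost, which is the swap that any RXB0-first policy produces once RXB0 is refilled while RXB1 is still waiting.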
I’ll work on improving the handling of this case.
Regards, Mark.
On 3 Jun 2021, at 3:07 PM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
I'd give the bit timing a try first, the MCP2515 seems to be very sensitive to this. I've even had some trouble finding a working configuration for the 50 kbit timing I added a couple of weeks ago.
We currently use 00 / D0 / 82 which is also the result of the old Intrepid timing calculator. That's a propagation segment of 1 Tq and 3 Tq per phase, resulting in sampling between 50% - 62.5%.
The Arduino MCP CAN lib by Cory Fowler also had this previously, but then changed in…
https://github.com/coryjfowler/MCP_CAN_lib/commit/ece730cf697fef1cbe8a90111694868168d41000
…to 00 / CA / 81, which is a propagation segment of 3 Tq and 2 Tq per phase, shifting the sampling window to 62.5 - 75%.
Our current configuration scheme for the internal SJA1000 compatible CAN seems to sample at 62.5 - 75% as well, so that would also match.
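The segment lengths quoted for the two register sets can be checked with a short calculation. The sketch below decodes CNF2 and CNF3 using the field layout as I understand the MCP2515 datasheet (PRSEG in CNF2 bits 2..0, PHSEG1 in CNF2 bits 5..3, PHSEG2 in CNF3 bits 2..0, each stored as length minus one) and computes the nominal sample point. `BitTiming` and `DecodeTiming` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct BitTiming {
  int prop_tq;        // propagation segment
  int ps1_tq;         // phase segment 1
  int ps2_tq;         // phase segment 2
  int total_tq;       // whole bit time, including the 1 Tq sync segment
  double sample_pct;  // nominal sample point, at the end of phase segment 1
};

inline BitTiming DecodeTiming(uint8_t cnf2, uint8_t cnf3) {
  BitTiming t;
  t.prop_tq = (cnf2 & 0x07) + 1;
  t.ps1_tq = ((cnf2 >> 3) & 0x07) + 1;
  t.ps2_tq = (cnf3 & 0x07) + 1;
  t.total_tq = 1 + t.prop_tq + t.ps1_tq + t.ps2_tq;
  t.sample_pct = 100.0 * (1 + t.prop_tq + t.ps1_tq) / t.total_tq;
  return t;
}
```

With D0 / 82 this gives 1 + 1 + 3 + 3 = 8 Tq and a sample point of 62.5%; with CA / 81 it gives 1 + 3 + 2 + 2 = 8 Tq and 75%. Those are the upper ends of the two windows quoted above. The lower ends (one Tq earlier) presumably come from the triple-sampling bit, which is set in both CNF2 values, but that is my reading rather than something stated in the thread.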
Regards, Michael
On 03.06.21 at 07:36, Mark Webb-Johnson wrote:
I’m working on an implementation of IP stack over CAN for the Tesla Roadster. IP frames are encoded as a length followed by a sequence of CAN frames, all on the same ID. This runs over a 1MHz bus, so presumably the traffic volume could be high at times.
I was having problems with this running on CAN2, so tried CAN1 and it worked perfectly. Here are some simple dumps of a single PING packet (and single PING response packet):
ID #111 is used to transmit an IP packet, and ID #110 is used to receive an IP packet. The special empty data frame is an acknowledgment.
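As an illustration of the framing described above, here is a minimal sketch that splits an IP packet into CAN payloads: one length frame followed by the packet in chunks of up to 8 bytes. The two-byte little-endian length is an assumption inferred from the `54 00` frame that precedes the 84-byte PING packet in the dumps, and `EncodeIpPacket` is a hypothetical name, not the actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using CanPayload = std::vector<uint8_t>;

// Returns the payloads to send, in order, all on the same CAN ID.
inline std::vector<CanPayload> EncodeIpPacket(const std::vector<uint8_t>& packet) {
  assert(packet.size() <= 0xFFFF);  // the length must fit in two bytes
  std::vector<CanPayload> frames;

  // Length frame: low byte first (assumed from the "54 00" example).
  const uint16_t len = static_cast<uint16_t>(packet.size());
  frames.push_back(CanPayload{static_cast<uint8_t>(len & 0xFF),
                              static_cast<uint8_t>(len >> 8)});

  // Data frames: up to 8 bytes each, the last one may be shorter.
  for (std::size_t off = 0; off < packet.size(); off += 8) {
    const std::size_t n = std::min<std::size_t>(8, packet.size() - off);
    const auto first = packet.begin() + static_cast<std::ptrdiff_t>(off);
    frames.emplace_back(first, first + static_cast<std::ptrdiff_t>(n));
  }
  return frames;
}
```

For an 84-byte packet this yields 12 frames: the length frame, ten full 8-byte frames and a final 4-byte frame, which matches the shape of the dumps.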
Using latest master branch code (3.2.016-196-g0aad1a9f/ota_1/edge (build idf v3.3.4-848-g1ff5e24b1 Jun 2 2021 09:28:58)).
So, first let’s test with traffic on CAN1 (active, 1Mbps), and listening on CAN2 (listen, 1Mbps):
TCPDUMP:
05:57:55.980291 IP (tos 0x0, ttl 64, id 43101, offset 0, flags [DF], proto ICMP (1), length 84)
10.10.99.3 > 10.10.99.2: ICMP echo request, id 23372, seq 1, length 64
0x0000: 4500 0054 a85d 4000 4001 b832 0a0a 6303 E..T.]@.@..2..c.
0x0010: 0a0a 6302 0800 7df8 5b4c 0001 5361 b860 ..c...}.[L..Sa.`
0x0020: 19f5 0e00 0809 0a0b 0c0d 0e0f 1011 1213 ................
0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
0x0050: 3435 3637 4567

05:57:56.436190 IP (tos 0x0, ttl 64, id 14937, offset 0, flags [none], proto ICMP (1), length 84)
10.10.99.2 > 10.10.99.3: ICMP echo reply, id 23372, seq 1, length 64
0x0000: 4500 0054 3a59 0000 4001 6637 0a0a 6302 E..T:Y..@.f7..c.
0x0010: 0a0a 6303 0000 85f8 5b4c 0001 5361 b860 ..c.....[L..Sa.`
0x0020: 19f5 0e00 0809 0a0b 0c0d 0e0f 1011 1213 ................
0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
0x0050: 3435 3637 4567

Traffic (as shown on PC the other end of the can log tcp connection):

tx: #111 54 00
tx: #111 45 00 00 54 a8 5d 40 00
tx: #111 40 01 b8 32 0a 0a 63 03
tx: #111 0a 0a 63 02 08 00 7d f8
tx: #111 5b 4c 00 01 53 61 b8 60
tx: #111 19 f5 0e 00 08 09 0a 0b
tx: #111 0c 0d 0e 0f 10 11 12 13
tx: #111 14 15 16 17 18 19 1a 1b
tx: #111 1c 1d 1e 1f 20 21 22 23
tx: #111 24 25 26 27 28 29 2a 2b
tx: #111 2c 2d 2e 2f 30 31 32 33
tx: #111 34 35 36 37

rx: #110
rx: #110 54 00
rx: #110 45 00 00 54 3a 59 00 00
rx: #110 40 01 66 37 0a 0a 63 02
rx: #110 0a 0a 63 03 00 00 85 f8
rx: #110 5b 4c 00 01 53 61 b8 60
rx: #110 19 f5 0e 00 08 09 0a 0b
rx: #110 0c 0d 0e 0f 10 11 12 13
rx: #110 14 15 16 17 18 19 1a 1b
rx: #110 1c 1d 1e 1f 20 21 22 23
rx: #110 24 25 26 27 28 29 2a 2b
rx: #110 2c 2d 2e 2f 30 31 32 33
rx: #110 34 35 36 37

CAN1 active:

1T11 111 54 00
1R11 110

1CER TX_Queue T11 111 40 01 b8 32 0a 0a 63 03
1T11 111 45 00 00 54 a8 5d 40 00
1T11 111 40 01 b8 32 0a 0a 63 03
1CER TX_Queue T11 111 5b 4c 00 01 53 61 b8 60
1T11 111 0a 0a 63 02 08 00 7d f8
1T11 111 5b 4c 00 01 53 61 b8 60
1CER TX_Queue T11 111 0c 0d 0e 0f 10 11 12 13
1T11 111 19 f5 0e 00 08 09 0a 0b
1T11 111 0c 0d 0e 0f 10 11 12 13
1CER TX_Queue T11 111 1c 1d 1e 1f 20 21 22 23
1T11 111 14 15 16 17 18 19 1a 1b
1T11 111 1c 1d 1e 1f 20 21 22 23
1CER TX_Queue T11 111 2c 2d 2e 2f 30 31 32 33
1T11 111 24 25 26 27 28 29 2a 2b
1T11 111 2c 2d 2e 2f 30 31 32 33
1T11 111 34 35 36 37

1R11 110 54 00
1R11 110 45 00 00 54 3a 59 00 00
1R11 110 40 01 66 37 0a 0a 63 02
1R11 110 0a 0a 63 03 00 00 85 f8
1R11 110 5b 4c 00 01 53 61 b8 60
1R11 110 19 f5 0e 00 08 09 0a 0b
1R11 110 0c 0d 0e 0f 10 11 12 13
1R11 110 14 15 16 17 18 19 1a 1b
1R11 110 1c 1d 1e 1f 20 21 22 23
1R11 110 24 25 26 27 28 29 2a 2b
1R11 110 2c 2d 2e 2f 30 31 32 33
1R11 110 34 35 36 37

CAN2 listen:

2R11 111 54 00
2R11 110

2R11 111 45 00 00 54 a8 5d 40 00
2R11 111 40 01 b8 32 0a 0a 63 03
2R11 111 0a 0a 63 02 08 00 7d f8
2R11 111 5b 4c 00 01 53 61 b8 60
2R11 111 19 f5 0e 00 08 09 0a 0b
2R11 111 0c 0d 0e 0f 10 11 12 13
2R11 111 14 15 16 17 18 19 1a 1b
2R11 111 1c 1d 1e 1f 20 21 22 23
2R11 111 24 25 26 27 28 29 2a 2b
2R11 111 2c 2d 2e 2f 30 31 32 33
2R11 111 34 35 36 37

2R11 110 54 00
2CER Error intr=10 rxpkt=14 txpkt=0 errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
2R11 110 40 01 66 37 0a 0a 63 02
2R11 110 19 f5 0e 00 08 09 0a 0b
2R11 110 34 35 36 37
2R11 110 45 00 00 54 3a 59 00 00
Conclusion is that the CAN1 traffic looks fine, and the PING packet gets a good reply. All successful. But the CAN2 listen is missing a few packets and the last packet is out of order.
Now, let’s test with traffic on CAN2 (active, 1Mbps), and listening on CAN1 (listen, 1Mbps):
TCPDUMP:
06:00:33.004060 IP (tos 0x0, ttl 64, id 58240, offset 0, flags [DF], proto ICMP (1), length 84)
10.10.99.3 > 10.10.99.2: ICMP echo request, id 23393, seq 1, length 64
0x0000: 4500 0054 e380 4000 4001 7d0f 0a0a 6303 E..T..@.@.}...c.
0x0010: 0a0a 6302 0800 7cc8 5b61 0001 f161 b860 ..c...|.[a...a.`
0x0020: 8b0f 0000 0809 0a0b 0c0d 0e0f 1011 1213 ................
0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
0x0050: 3435 3637 4567

Traffic (as shown on PC the other end of the can log tcp connection):

tx: #111 54 00
tx: #111 45 00 00 54 e3 80 40 00
tx: #111 40 01 7d 0f 0a 0a 63 03
tx: #111 0a 0a 63 02 08 00 7c c8
tx: #111 5b 61 00 01 f1 61 b8 60
tx: #111 8b 0f 00 00 08 09 0a 0b
tx: #111 0c 0d 0e 0f 10 11 12 13
tx: #111 14 15 16 17 18 19 1a 1b
tx: #111 1c 1d 1e 1f 20 21 22 23
tx: #111 24 25 26 27 28 29 2a 2b
tx: #111 2c 2d 2e 2f 30 31 32 33
tx: #111 34 35 36 37

rx: #110
rx: #110 54 00
rx: #110 45 00 00 54 3a 5a 00 00
rx: #110 0a 0a 63 03 00 00 84 c8
rx: #110 8b 0f 00 00 08 09 0a 0b
rx: #110 34 35 36 37
rx: #110 40 01 66 36 0a 0a 63 02

CAN2 active:

2T11 111 54 00
2R11 110

2CER TX_Queue T11 111 40 01 7d 0f 0a 0a 63 03
2T11 111 45 00 00 54 e3 80 40 00
2T11 111 40 01 7d 0f 0a 0a 63 03
2CER TX_Queue T11 111 5b 61 00 01 f1 61 b8 60
2T11 111 0a 0a 63 02 08 00 7c c8
2T11 111 5b 61 00 01 f1 61 b8 60
2CER TX_Queue T11 111 0c 0d 0e 0f 10 11 12 13
2T11 111 8b 0f 00 00 08 09 0a 0b
2T11 111 0c 0d 0e 0f 10 11 12 13
2CER TX_Queue T11 111 1c 1d 1e 1f 20 21 22 23
2T11 111 14 15 16 17 18 19 1a 1b
2T11 111 1c 1d 1e 1f 20 21 22 23
2CER TX_Queue T11 111 2c 2d 2e 2f 30 31 32 33
2T11 111 24 25 26 27 28 29 2a 2b
2T11 111 2c 2d 2e 2f 30 31 32 33
2T11 111 34 35 36 37

2R11 110 54 00
2R11 110 45 00 00 54 3a 5a 00 00
2CER Error intr=15 rxpkt=3 txpkt=12 errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=5 txfail=0 wdgreset=0 errreset=0
2R11 110 0a 0a 63 03 00 00 84 c8
2R11 110 8b 0f 00 00 08 09 0a 0b
2R11 110 34 35 36 37
2R11 110 40 01 66 36 0a 0a 63 02

CAN1 listen:

1R11 111 54 00
1R11 110

1R11 111 45 00 00 54 e3 80 40 00
1R11 111 40 01 7d 0f 0a 0a 63 03
1R11 111 0a 0a 63 02 08 00 7c c8
1R11 111 5b 61 00 01 f1 61 b8 60
1R11 111 8b 0f 00 00 08 09 0a 0b
1R11 111 0c 0d 0e 0f 10 11 12 13
1R11 111 14 15 16 17 18 19 1a 1b
1R11 111 1c 1d 1e 1f 20 21 22 23
1R11 111 24 25 26 27 28 29 2a 2b
1R11 111 2c 2d 2e 2f 30 31 32 33
1R11 111 34 35 36 37

1R11 110 54 00
1R11 110 45 00 00 54 3a 5a 00 00
1R11 110 40 01 66 36 0a 0a 63 02
1R11 110 0a 0a 63 03 00 00 84 c8
1R11 110 5b 61 00 01 f1 61 b8 60
1R11 110 8b 0f 00 00 08 09 0a 0b
1R11 110 0c 0d 0e 0f 10 11 12 13
1R11 110 14 15 16 17 18 19 1a 1b
1R11 110 1c 1d 1e 1f 20 21 22 23
1R11 110 24 25 26 27 28 29 2a 2b
1R11 110 2c 2d 2e 2f 30 31 32 33
1R11 110 34 35 36 37
Conclusion is that the CAN2 transmit traffic looks fine, but no PING reply received via CAN. The CAN1 listen shows the reply just fine.
Here is that last CAN1 listen, with timestamps:
1622696433.080107 1R11 111 54 00
1622696433.081657 1R11 110

1622696433.227479 1R11 111 45 00 00 54 e3 80 40 00
1622696433.228318 1R11 111 40 01 7d 0f 0a 0a 63 03
1622696433.245727 1R11 111 0a 0a 63 02 08 00 7c c8
1622696433.246214 1R11 111 5b 61 00 01 f1 61 b8 60
1622696433.248219 1R11 111 8b 0f 00 00 08 09 0a 0b
1622696433.248772 1R11 111 0c 0d 0e 0f 10 11 12 13
1622696433.250774 1R11 111 14 15 16 17 18 19 1a 1b
1622696433.251338 1R11 111 1c 1d 1e 1f 20 21 22 23
1622696433.253380 1R11 111 24 25 26 27 28 29 2a 2b
1622696433.253944 1R11 111 2c 2d 2e 2f 30 31 32 33
1622696433.265937 1R11 111 34 35 36 37

1622696433.269221 1R11 110 54 00
1622696433.272095 1R11 110 45 00 00 54 3a 5a 00 00
1622696433.272125 1R11 110 40 01 66 36 0a 0a 63 02
1622696433.272156 1R11 110 0a 0a 63 03 00 00 84 c8
1622696433.272193 1R11 110 5b 61 00 01 f1 61 b8 60
1622696433.272245 1R11 110 8b 0f 00 00 08 09 0a 0b
1622696433.272277 1R11 110 0c 0d 0e 0f 10 11 12 13
1622696433.272314 1R11 110 14 15 16 17 18 19 1a 1b
1622696433.272354 1R11 110 1c 1d 1e 1f 20 21 22 23
1622696433.272387 1R11 110 24 25 26 27 28 29 2a 2b
1622696433.272420 1R11 110 2c 2d 2e 2f 30 31 32 33
1622696433.272452 1R11 110 34 35 36 37
It is 1Mbps, with 30us or so between each packet. This is the only traffic on the bus. Everything else is turned off. Roughly 12 packets each way. Surely even if we were hitting a performance limit, our buffers can handle 12 packets?
The good news is that I have a good environment to replicate this issue now, so any fix should be easy to test.
I haven’t worked on the MCP2515 driver in our code in a long time, but it certainly seems something is messed up and that could be badly affecting vehicle modules using anything other than CAN1.
I will start to look at this over the weekend, but has anyone got any ideas/suggestions? Perhaps the bit timing registers are off by a small amount (so it works on CAN1 but not on CAN2)? Or something more serious in our driver?
Regards, Mark.
_______________________________________________ OvmsDev mailing list OvmsDev@lists.openvehicles.com http://lists.openvehicles.com/mailman/listinfo/ovmsdev
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26 <MCP2515Calc-1000kbit.ods>
Mark, I've just found a spot-on post on this issue: https://www.microchip.com/forums/tm.aspx?m=620741 Tom suggests implementing a state machine to reproduce the receive order. His analysis & solution looks sound to me. Regards, Michael Am 06.06.21 um 14:50 schrieb Mark Webb-Johnson:
I spent quite a bit of time on this. With my standard test packet of 11 CAN frames expected, and the standard driver, I get perhaps 4 or 5 of them (about half are lost, and some are out of order).
I made the suggested change to move the MyCan.IncomingFrame() call
out
of the ‘can’ object (when frameReceived is true) to within the mcp2515 AsynchronousInterruptHandler itself. That allows the handler to receive more than one frame per call and is a very simple change. Once that is done, we can at least now try to tune it.
So I then modified the code of mcp2515 AsynchronousInterruptHandler to loop so long as the interrupt flag says either buffer #0 or #1 has a frame. The result looks something like this:
D (63192) mcp2515: AsynchronousInterruptHandler instat=01 errflag=00 txb0ctrl=00 D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=54) D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=40 txb0ctrl=00 D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=40) D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=45) D (63192) mcp2515: AsynchronousInterruptHandler Clear RX buffer #0 overflow flag D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=00 txb0ctrl=00 D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=24) D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=34) D (63192) mcp2515: AsynchronousInterruptHandler instat=20 errflag=00 txb0ctrl=00
The actual frames on the bus are (B1 values) 54, 45, 40, 0a, 7a, d5, 0c, 14, 1c, 24, 2c, and 34. Looking at the above debug output, we get:
* Interrupt flags show buffer #0 has a frame. It is B1=54. Good. * Interrupt flags show buffers #0 and #1 both have frames. Buffer #0 has B1=40 and buffer #1 has B1=45. * Etc etc
That is not good. What must have happened is that the first B1=54 frame arrived, got put in buffer #0, and interrupt was raised. We checked the interrupt flags, found buffer #0 had something, and read the frame ok. All is good. But what is happening now is that between the time we checked the interrupt flags and the time we finished reading the 13 bytes from buffer #0, a second frame arrived and was put in buffer #1. Then a third frame arrives and is put in buffer #0. We loop back to check the interrupt flags and find both buffers have frames ready. So we ready buffer #0 to get the third frame, then buffer #1 to get the second frame. We are out of sequence.
By removing the ESP_LOGD statements, I can improve performance enough to get 10 out of the 11 frames, but still sometimes frames are swapped in order.
By over-clocking the MCP2515 SPI bus (supposed to be 10MHz, but I push it to 15MHz), I can get all 11 frames, but two are out of order.
I suppose I can minimise the chance of the out-of-order issue by repeating the call to read interrupt flags after processing buffer #0 but before checking for buffer #1. That would at least reduce the time window to as small as possible, but would be another SPI call and is too slow. Doing that brings us back to losing frames.
Another approach may relate to our current use of the READ command to read 5 status registers (interrupt flags, error flags, two skipped, then transmit buffer #0 flags). There are two specific commands ‘read status’ (which gets the rx and tx buffer status flags in one byte), and ‘rx status’ (which gets just the receive buffer status and some info on the frames received, again in one byte). I think those are more designed for what we are trying to do. I can try to optimise the read loop at the start of the AsynchronousInterruptHandler to use one of those - they are 2 SPI bytes vs 7 for what we are doing at the moment (so more than three times as fast).
I think it will also be worthwhile having a look at some other open source mcp2515 drivers to see how other people are doing it.
Regards, Mark.
On 4 Jun 2021, at 3:02 PM, Mark Webb-Johnson <mark@webb-johnson.net <mailto:mark@webb-johnson.net>> wrote:
Signed PGP part
The handler can only return one frame. As it is, if both buffers #0 and #1 have a frame, it returns #0. I am not sure if it gets called again (seems to depend on the interrupt gpio status).
// Read the interrupt pin status and if it's still active (low), require another interrupt handling iteration return !gpio_get_level((gpio_num_t)m_intpin);
Maybe a quick solution is to just return true, immediately after *frameReceived=true, if intflag=0x01 and (intstat & CANINTF_RX1IF)? That would dispatch the incoming frame, then call back for more (from the loop in the can object).
I am not sure in general why AsynchronousInterruptHandler uses a bool frameReceived flag, and doesn’t just simply dispatch the frame
immediately to the can object? That would simplify things and allow the AsynchronousInterruptHandler to handle receiving both frames in the same call. Given that MCP2515 is the only driver using AsynchronousInterruptHandler, that would be an easy fix.
Regards, Mark.
On 4 Jun 2021, at 2:29 PM, Michael Balzer <dexter@expeedo.de <mailto:dexter@expeedo.de>> wrote:
Signed PGP part Mark,
the handler is meant to read both buffers sequentially, and on a quick glance I don't see why it wouldn't. But it can't hurt if you do an audit of the code.
I remember having had that out-of-order discussion when handling both RX buffers before here, but don't remember the outcome. Too bad the list archives cannot be searched.
I think it was the MCP not doing overflows from RX buffer 1 to 0. I.e. if buffer 1 still has a frame on arrival, the new frame will be lost. That means losing a frame if the handler cannot react fast enough, but receiving out of order would be worse.
Regards, Michael
Am 04.06.21 um 04:16 schrieb Mark Webb-Johnson:
Michael,
Good suggestion on the timing. I think it best to use the same timings as the Arduino library, and have committed that change. No vehicle modules currently use 1Mbps on MCP2515 anyway. Unfortunately, it didn’t resolve my problem.
Looking at the error flags I see:
Error flag: 0x23401c01
intstat 0x23 ERRIF Error Interrupt pending RX0IF Rx buffer 0 full interrupt RX1IF Rx buffer 1 full interrupt
errflag 0x40 RX0OVR Rx buffer 0 overflow
intflag 0x1c01 0x01 Implied from Rx buffer 0 full
0x1c = 0001 1100 Means RXB0 overflow. No data lost in this case (it went into RXB1) Means (errflag & EFLG_RX01OVR), clear RX buffer overflow flags Means (intstat & (CANINTF_MERRF | CANINTF_WAKIF | CANINTF_ERRIF)), clear error & wakeup interrupts
So we have CAN frames in BOTH rx buffers #0 and #1. Looking at our driver code (mcp2515::AsynchronousInterruptHandler), it seems in that case we only read from buffer #0. From the flow I can see, we are going to lose that second frame. We’re not really handling the issue of two frames being in the buffers when the interrupt handler is called.
As the architecture of mcp2515::AsynchronousInterruptHandler can only receive one frame, it is not so simple to fix. We could simply read and return the frame in buffer #0, requesting to be called again (return true), but another frame may arrive (into buffer #0) before we get called again, and that is going to result in out-of-order frames.
I’ll work on improving the handling of this case.
Regards, Mark.
On 3 Jun 2021, at 3:07 PM, Michael Balzer <dexter@expeedo.de <mailto:dexter@expeedo.de>> wrote:
Signed PGP part Mark,
I'd give the bit timing a try first, the MCP2515 seems to be very sensitive for this. I've even had some trouble finding a working configuration for the 50 kbit timing I've added a couple weeks ago.
We currently use 00 / D0 / 82 which is also the result of the old Intrepid timing calculator. That's a propagation segment of 1 Tq and 3 Tq per phase, resulting in samling between 50% - 62.5%.
The Arduino MCP CAN lib by Cory Fowler also had this previously, but then changed in…
https://github.com/coryjfowler/MCP_CAN_lib/commit/ece730cf697fef1cbe8a901116...
…to 00 / CA / 81, which is a propagation segment of 3 Tq and 2 Tq per phase, shifting the sampling window to 62.5 - 75%.
Our current configuration scheme for the internal SJA1000 compatible CAN seems to sample at 62.5 - 75% as well, so that would also match.
Regards, Michael
Am 03.06.21 um 07:36 schrieb Mark Webb-Johnson:
I’m working on an implementation of IP stack over CAN for the Tesla Roadster. IP frames are encoded as a length followed by a sequence of CAN frames, all on the same ID. This runs over a 1MHz bus, so presumably the traffic volume could be high at times.
I was having problems with this running on CAN2, so tried CAN1 and it worked perfectly. Here are some simple dumps of a single PING packet (and single PING response packet):
ID #111 is used to transmit an IP packet, and ID #110 is used to receive an IP packet. The special empty data frame is an acknowledgment.
Using latest master branch code (3.2.016-196-g0aad1a9f/ota_1/edge (build idf v3.3.4-848-g1ff5e24b1 Jun 2 2021 09:28:58)).
So, first let’s test with traffic on CAN1 (active, 1Mbps),
and
listening on CAN2 (listen, 1Mbps):
TCPDUMP:
05:57:55.980291 IP (tos 0x0, ttl 64, id 43101, offset 0, flags [DF], proto ICMP (1), length 84)
10.10.99.3 > 10.10.99.2: ICMP echo request, id 23372, seq 1, length 64
0x0000: 4500 0054 a85d 4000 4001 b832 0a0a 6303 E..T.]@.@..2..c.
0x0010: 0a0a 6302 0800 7df8 5b4c 0001 5361 b860 ..c...}.[L..Sa.`
0x0020: 19f5 0e00 0809 0a0b 0c0d 0e0f 1011 1213 ................
0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
0x0050: 3435 3637 4567
05:57:56.436190 IP (tos 0x0, ttl 64, id 14937, offset 0, flags [none], proto ICMP (1), length 84)
10.10.99.2 > 10.10.99.3: ICMP echo reply, id 23372, seq 1, length 64
0x0000: 4500 0054 3a59 0000 4001 6637 0a0a 6302 E..T:Y..@.f7..c.
0x0010: 0a0a 6303 0000 85f8 5b4c 0001 5361 b860 ..c.....[L..Sa.`
0x0020: 19f5 0e00 0809 0a0b 0c0d 0e0f 1011 1213 ................
0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
0x0050: 3435 3637 4567
Traffic (as shown on PC the other end of the can log tcp connection):
tx: #111 54 00
tx: #111 45 00 00 54 a8 5d 40 00
tx: #111 40 01 b8 32 0a 0a 63 03
tx: #111 0a 0a 63 02 08 00 7d f8
tx: #111 5b 4c 00 01 53 61 b8 60
tx: #111 19 f5 0e 00 08 09 0a 0b
tx: #111 0c 0d 0e 0f 10 11 12 13
tx: #111 14 15 16 17 18 19 1a 1b
tx: #111 1c 1d 1e 1f 20 21 22 23
tx: #111 24 25 26 27 28 29 2a 2b
tx: #111 2c 2d 2e 2f 30 31 32 33
tx: #111 34 35 36 37
rx: #110
rx: #110 54 00
rx: #110 45 00 00 54 3a 59 00 00
rx: #110 40 01 66 37 0a 0a 63 02
rx: #110 0a 0a 63 03 00 00 85 f8
rx: #110 5b 4c 00 01 53 61 b8 60
rx: #110 19 f5 0e 00 08 09 0a 0b
rx: #110 0c 0d 0e 0f 10 11 12 13
rx: #110 14 15 16 17 18 19 1a 1b
rx: #110 1c 1d 1e 1f 20 21 22 23
rx: #110 24 25 26 27 28 29 2a 2b
rx: #110 2c 2d 2e 2f 30 31 32 33
rx: #110 34 35 36 37
CAN1 active:
1T11 111 54 00
1R11 110
1CER TX_Queue T11 111 40 01 b8 32 0a 0a 63 03
1T11 111 45 00 00 54 a8 5d 40 00
1T11 111 40 01 b8 32 0a 0a 63 03
1CER TX_Queue T11 111 5b 4c 00 01 53 61 b8 60
1T11 111 0a 0a 63 02 08 00 7d f8
1T11 111 5b 4c 00 01 53 61 b8 60
1CER TX_Queue T11 111 0c 0d 0e 0f 10 11 12 13
1T11 111 19 f5 0e 00 08 09 0a 0b
1T11 111 0c 0d 0e 0f 10 11 12 13
1CER TX_Queue T11 111 1c 1d 1e 1f 20 21 22 23
1T11 111 14 15 16 17 18 19 1a 1b
1T11 111 1c 1d 1e 1f 20 21 22 23
1CER TX_Queue T11 111 2c 2d 2e 2f 30 31 32 33
1T11 111 24 25 26 27 28 29 2a 2b
1T11 111 2c 2d 2e 2f 30 31 32 33
1T11 111 34 35 36 37
1R11 110 54 00
1R11 110 45 00 00 54 3a 59 00 00
1R11 110 40 01 66 37 0a 0a 63 02
1R11 110 0a 0a 63 03 00 00 85 f8
1R11 110 5b 4c 00 01 53 61 b8 60
1R11 110 19 f5 0e 00 08 09 0a 0b
1R11 110 0c 0d 0e 0f 10 11 12 13
1R11 110 14 15 16 17 18 19 1a 1b
1R11 110 1c 1d 1e 1f 20 21 22 23
1R11 110 24 25 26 27 28 29 2a 2b
1R11 110 2c 2d 2e 2f 30 31 32 33
1R11 110 34 35 36 37
CAN2 listen:
2R11 111 54 00
2R11 110
2R11 111 45 00 00 54 a8 5d 40 00
2R11 111 40 01 b8 32 0a 0a 63 03
2R11 111 0a 0a 63 02 08 00 7d f8
2R11 111 5b 4c 00 01 53 61 b8 60
2R11 111 19 f5 0e 00 08 09 0a 0b
2R11 111 0c 0d 0e 0f 10 11 12 13
2R11 111 14 15 16 17 18 19 1a 1b
2R11 111 1c 1d 1e 1f 20 21 22 23
2R11 111 24 25 26 27 28 29 2a 2b
2R11 111 2c 2d 2e 2f 30 31 32 33
2R11 111 34 35 36 37
2R11 110 54 00
2CER Error intr=10 rxpkt=14 txpkt=0 errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
2R11 110 40 01 66 37 0a 0a 63 02
2R11 110 19 f5 0e 00 08 09 0a 0b
2R11 110 34 35 36 37
2R11 110 45 00 00 54 3a 59 00 00
Conclusion is that the CAN1 traffic looks fine, and the PING packet gets a good reply. All successful. But the CAN2 listen is missing a few packets and the last packet is out of order.
Now, let’s test with traffic on CAN2 (active, 1Mbps), and listening on CAN1 (listen, 1Mbps):
TCPDUMP:
06:00:33.004060 IP (tos 0x0, ttl 64, id 58240, offset 0, flags [DF], proto ICMP (1), length 84)
10.10.99.3 > 10.10.99.2: ICMP echo request, id 23393, seq 1, length 64
0x0000: 4500 0054 e380 4000 4001 7d0f 0a0a 6303 E..T..@.@.}...c.
0x0010: 0a0a 6302 0800 7cc8 5b61 0001 f161 b860 ..c...|.[a...a.`
0x0020: 8b0f 0000 0809 0a0b 0c0d 0e0f 1011 1213 ................
0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
0x0050: 3435 3637 4567
Traffic (as shown on PC the other end of the can log tcp connection):
tx: #111 54 00
tx: #111 45 00 00 54 e3 80 40 00
tx: #111 40 01 7d 0f 0a 0a 63 03
tx: #111 0a 0a 63 02 08 00 7c c8
tx: #111 5b 61 00 01 f1 61 b8 60
tx: #111 8b 0f 00 00 08 09 0a 0b
tx: #111 0c 0d 0e 0f 10 11 12 13
tx: #111 14 15 16 17 18 19 1a 1b
tx: #111 1c 1d 1e 1f 20 21 22 23
tx: #111 24 25 26 27 28 29 2a 2b
tx: #111 2c 2d 2e 2f 30 31 32 33
tx: #111 34 35 36 37
rx: #110
rx: #110 54 00
rx: #110 45 00 00 54 3a 5a 00 00
rx: #110 0a 0a 63 03 00 00 84 c8
rx: #110 8b 0f 00 00 08 09 0a 0b
rx: #110 34 35 36 37
rx: #110 40 01 66 36 0a 0a 63 02
CAN2 active:
2T11 111 54 00
2R11 110
2CER TX_Queue T11 111 40 01 7d 0f 0a 0a 63 03
2T11 111 45 00 00 54 e3 80 40 00
2T11 111 40 01 7d 0f 0a 0a 63 03
2CER TX_Queue T11 111 5b 61 00 01 f1 61 b8 60
2T11 111 0a 0a 63 02 08 00 7c c8
2T11 111 5b 61 00 01 f1 61 b8 60
2CER TX_Queue T11 111 0c 0d 0e 0f 10 11 12 13
2T11 111 8b 0f 00 00 08 09 0a 0b
2T11 111 0c 0d 0e 0f 10 11 12 13
2CER TX_Queue T11 111 1c 1d 1e 1f 20 21 22 23
2T11 111 14 15 16 17 18 19 1a 1b
2T11 111 1c 1d 1e 1f 20 21 22 23
2CER TX_Queue T11 111 2c 2d 2e 2f 30 31 32 33
2T11 111 24 25 26 27 28 29 2a 2b
2T11 111 2c 2d 2e 2f 30 31 32 33
2T11 111 34 35 36 37
2R11 110 54 00
2R11 110 45 00 00 54 3a 5a 00 00
2CER Error intr=15 rxpkt=3 txpkt=12 errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=5 txfail=0 wdgreset=0 errreset=0
2R11 110 0a 0a 63 03 00 00 84 c8
2R11 110 8b 0f 00 00 08 09 0a 0b
2R11 110 34 35 36 37
2R11 110 40 01 66 36 0a 0a 63 02
CAN1 listen:
1R11 111 54 00 1R11 110
1R11 111 45 00 00 54 e3 80 40 00 1R11 111 40 01 7d 0f 0a 0a 63 03 1R11 111 0a 0a 63 02 08 00 7c c8 1R11 111 5b 61 00 01 f1 61 b8 60 1R11 111 8b 0f 00 00 08 09 0a 0b 1R11 111 0c 0d 0e 0f 10 11 12 13 1R11 111 14 15 16 17 18 19 1a 1b 1R11 111 1c 1d 1e 1f 20 21 22 23 1R11 111 24 25 26 27 28 29 2a 2b 1R11 111 2c 2d 2e 2f 30 31 32 33 1R11 111 34 35 36 37
1R11 110 54 00 1R11 110 45 00 00 54 3a 5a 00 00 1R11 110 40 01 66 36 0a 0a 63 02 1R11 110 0a 0a 63 03 00 00 84 c8 1R11 110 5b 61 00 01 f1 61 b8 60 1R11 110 8b 0f 00 00 08 09 0a 0b 1R11 110 0c 0d 0e 0f 10 11 12 13 1R11 110 14 15 16 17 18 19 1a 1b 1R11 110 1c 1d 1e 1f 20 21 22 23 1R11 110 24 25 26 27 28 29 2a 2b 1R11 110 2c 2d 2e 2f 30 31 32 33 1R11 110 34 35 36 37
Conclusion is that the CAN2 transmit traffic looks fine, but no PING reply received via CAN. The CAN1 listen shows the reply just fine.
Here is that last CAN1 listen, with timestamps:
1622696433.080107 1R11 111 54 00 1622696433.081657 1R11 110
1622696433.227479 1R11 111 45 00 00 54 e3 80 40 00 1622696433.228318 1R11 111 40 01 7d 0f 0a 0a 63 03 1622696433.245727 1R11 111 0a 0a 63 02 08 00 7c c8 1622696433.246214 1R11 111 5b 61 00 01 f1 61 b8 60 1622696433.248219 1R11 111 8b 0f 00 00 08 09 0a 0b 1622696433.248772 1R11 111 0c 0d 0e 0f 10 11 12 13 1622696433.250774 1R11 111 14 15 16 17 18 19 1a 1b 1622696433.251338 1R11 111 1c 1d 1e 1f 20 21 22 23 1622696433.253380 1R11 111 24 25 26 27 28 29 2a 2b 1622696433.253944 1R11 111 2c 2d 2e 2f 30 31 32 33 1622696433.265937 1R11 111 34 35 36 37
1622696433.269221 1R11 110 54 00 1622696433.272095 1R11 110 45 00 00 54 3a 5a 00 00 1622696433.272125 1R11 110 40 01 66 36 0a 0a 63 02 1622696433.272156 1R11 110 0a 0a 63 03 00 00 84 c8 1622696433.272193 1R11 110 5b 61 00 01 f1 61 b8 60 1622696433.272245 1R11 110 8b 0f 00 00 08 09 0a 0b 1622696433.272277 1R11 110 0c 0d 0e 0f 10 11 12 13 1622696433.272314 1R11 110 14 15 16 17 18 19 1a 1b 1622696433.272354 1R11 110 1c 1d 1e 1f 20 21 22 23 1622696433.272387 1R11 110 24 25 26 27 28 29 2a 2b 1622696433.272420 1R11 110 2c 2d 2e 2f 30 31 32 33 1622696433.272452 1R11 110 34 35 36 37
It is 1Mbps, with 30us or so between each packet. This is the *only* traffic on the bus. Everything else is turned off. Roughly 12 packets each way. Surely even if we were hitting a performance limit, our buffers can handle 12 packets?
The good news is that I have a good environment to replicate this issue now, so any fix should be easy to test.
I haven’t worked on the MCP2515 driver in our code in a long time, but it certainly seems something is messed up and that could be badly affecting vehicle modules using anything other than CAN1.
I will start to look at this over the weekend, but has anyone got any ideas/suggestions? Perhaps the bit timing registers are off by a small amount (so it works on CAN1 but not on CAN2)? Or something more serious in our driver?
Regards, Mark.
_______________________________________________ OvmsDev mailing list OvmsDev@lists.openvehicles.com http://lists.openvehicles.com/mailman/listinfo/ovmsdev
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26 <MCP2515Calc-1000kbit.ods>
On 6/6/21 9:16 AM, Michael Balzer wrote:
I've just found a spot-on post on this issue:
https://www.microchip.com/forums/tm.aspx?m=620741
Tom suggests implementing a state machine to reproduce the receive order. His analysis & solution looks sound to me.
Leave it to chip designers to (a) identify issues with their implementations and (b) provide features to work around the issues.
Poking around GitHub, this rollover-enabled driver might be a good reference:
https://github.com/collin80/esp32_can/
Looks like it was added less than a year ago via this PR:
https://github.com/collin80/esp32_can/pull/22
I'm using an (obsolete) chip for the J1850/Class B network in my 6th generation Corvette. I am using it with an atmega (running at 20 MHz), but the bus is only 10.4 kb/s and the part (HIP710) is an SPI master, which precludes using SPI for anything else. I've been working on a J1850/CAN bridge so that OVMS can access the bus; I had to step up to the atmega644/atmega1284 to pick up a second UART which can be used in SPI master mode for use with an MCP2515. But I haven't quite debugged my SPI2 code...
Craig
Craig,
we've been using rollover all the time on the MCP2515. As we don't use filters, there is, in theory, no point in not using the second buffer. In reality though, the SPI slowness introduces a serious race issue, as new frames may arrive while the driver is still processing the previous interrupt.
That issue isn't addressed by that driver either; in fact their handling is much worse than ours, even without rollover enabled. As you can see in their interrupt handler…
https://github.com/collin80/esp32_can/blob/master/src/mcp2515.cpp#L892
…they read the interrupt flags, then check & read both buffers based on the flags read, then clear all (!) interrupt flags.
So if an interrupt occurs for buffer 0, and the next frame comes in for buffer 1 right after the driver has read the interrupt flags, their driver will only read buffer 0 and then clear all interrupts, thus losing the one for buffer 1.
SPI can be a SPITA…
Regards, Michael
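The race described above (snapshot the flags, service the buffers, then clear every flag) can be illustrated with a toy model. This is only a sketch and not code from either driver: `FakeMcp2515`, `service_clear_all` and `service_clear_handled` are hypothetical names, the model tracks nothing but the two RX interrupt flags, and the callback stands in for a frame arriving mid-handler.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Receive flag bits of the MCP2515 CANINTF register.
constexpr uint8_t RX0IF = 0x01;
constexpr uint8_t RX1IF = 0x02;

// Hypothetical toy model: only the interrupt flag register is simulated.
struct FakeMcp2515 {
  uint8_t canintf = 0;
};

// Handler style criticised above: snapshot the flags, service what the
// snapshot shows, then clear ALL flags. `during_service` simulates bus
// activity between the snapshot and the clear.
// Returns the number of frames the handler actually read.
inline int service_clear_all(FakeMcp2515& chip,
                             const std::function<void()>& during_service) {
  assert(during_service && "callback must be set");
  const uint8_t snapshot = chip.canintf;
  int frames_read = 0;
  if (snapshot & RX0IF) ++frames_read;
  if (snapshot & RX1IF) ++frames_read;
  during_service();
  chip.canintf = 0;  // also wipes flags that were set after the snapshot
  return frames_read;
}

// Variant that clears only the flags it has serviced, so a frame that
// arrived in the meantime still shows up on the next pass.
inline int service_clear_handled(FakeMcp2515& chip,
                                 const std::function<void()>& during_service) {
  assert(during_service && "callback must be set");
  const uint8_t snapshot = chip.canintf;
  int frames_read = 0;
  if (snapshot & RX0IF) ++frames_read;
  if (snapshot & RX1IF) ++frames_read;
  during_service();
  chip.canintf &= static_cast<uint8_t>(~snapshot);
  return frames_read;
}
```

With a frame landing in buffer 1 during servicing, the first variant reads one frame and leaves no flag set, so the second frame is never fetched; the second variant leaves RX1IF pending.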
I totally do not disagree with you. The handling is not very good at all - and I might have even written that part... Surely it could be handled better. However, nobody has fixed it yet. Perhaps I ought to create an issue on GitHub to remind myself that someone has to revisit that code at some point.
One thing I would like to inject into this discussion, though, is that counting on proper in-order reception and no dropped frames is not generally advisable. CAN is best treated like UDP. Some packets might arrive out of order, some not at all. Any multiframe transmission over CAN really needs to include some sort of means to identify lost or out of order frames. You simply cannot count on CAN traffic to reliably get from point A to point B 100% of the time. This is especially true once you get onto a real bus where other frames are coming in and the bus is loaded up. It's one thing to try this on an empty bus and get it to work but that work is not likely to transition over reliably to "the real thing." It's best to not even entertain the notion that it could be possible. Some form of higher level protocol that keeps track of the order and sequence is necessary.
Yes, I know technically no dropped frames and in order reception are possible if the drivers on both sides are properly coded. But, in practice it tends to be quite difficult to ensure this 100% of the time. Once again, this is JUST like UDP. It can be reliable... most of the time. But, it isn't guaranteed to be so and it's best to take proper precautions against the problem. This is why things like ISO-TP have sequence bytes.
Also, see here: https://datatracker.ietf.org/doc/html/draft-cafi-can-ip-00
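As an illustration of the "sequence bytes" idea, here is a minimal sketch of a checker modelled on ISO-TP consecutive-frame numbering, where the first payload byte is 0x2N and N counts 1, 2, ... 15, 0, 1, ... after a first frame. The class name `SequenceChecker` is hypothetical, and the sketch covers only the numbering, not flow control or timing.

```cpp
#include <cassert>
#include <cstdint>

// Tracks the 4-bit sequence number in the low nibble of the first byte of
// an ISO-TP style consecutive frame (high nibble 0x2).
class SequenceChecker {
 public:
  // Returns true if `pci` is a consecutive frame carrying the expected
  // sequence number; false signals a lost, duplicated or reordered frame.
  bool accept(uint8_t pci) {
    assert(expected_ <= 0x0F);               // invariant: 4-bit counter
    if ((pci & 0xF0) != 0x20) return false;  // not a consecutive frame
    const uint8_t sn = pci & 0x0F;
    if (sn != expected_) return false;       // gap or reordering detected
    expected_ = static_cast<uint8_t>((expected_ + 1) & 0x0F);  // 15 -> 0
    return true;
  }

  // Call when a new multi-frame message starts.
  void reset() { expected_ = 1; }

 private:
  uint8_t expected_ = 1;  // first consecutive frame after a first frame
};
```

A receiver built this way can at least detect that frame 3 went missing or arrived after frame 4, which a bare length-plus-data scheme cannot.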
Collin,
totally agreed. The CAN-IP draft you linked has a fragmentation protocol inspired by ISO-TP, including frame sequence numbering & flow control with frame timing. And if frames come in at an acceptable rate, the issue isn't triggered.
I assume the much simpler IP-over-CAN protocol Mark is trying to implement leaves all of this to the transport layer (TCP/UDP) and is optimized for speed on a bus with known participants (i.e. the Tesla Roadster devices).
I think the RX buffers of the MCP2515 were not meant to form a FIFO in the first place; they were meant to be used with separate acceptance filters. The whole rollover feature looks like a later addition to the chip design. IMO the spec sheet totally lacks an explanation regarding the proper handling of this, and the SPI command set lacks a specific command to read the buffers in sequence of reception or to retrieve that info.
The other issue for us is the bad interrupt performance of the ESP32 / esp-idf. The ESP32 CAN controller has a real 64 byte FIFO _and_ is mapped into the ESP32 address space, but we still sometimes cannot read the FIFO fast enough to avoid overflows, even with 500 kbit. I still think some framework components (Wifi / SPI / …) sometimes disable interrupts for longer periods of time.
Regards, Michael
Collin,
In the case of this specific protocol, I have no control over the protocol or the firmware at the other end. I guess they are relying on this being a point-to-point CAN bus connection on a private bus with known components at both ends, and the TCP/IP layer providing the error detection and correction. They have a dedicated bus used for this, at 1MHz, with only the VMS at one end and a diagnostic device at the other. They also seem to pace the communications with some sort of ‘go ahead’ signal.
I don’t have any documentation on the protocol itself, and am just trying to reverse-engineer it. So far it seems to be simply:
Client->Server: N bytes are coming
Server->Client: OK, go ahead
Client->Server: Send a sequence of CAN frames, totalling N bytes of data
The data itself is simply an IP datagram (which can be fed directly into the IP stack using a TUN device on Linux - I still need to work out how to feed that into LWIP on ESP32, but it doesn’t seem hard).
Strangely, I don’t seem to require the ‘OK, go ahead’ at the client side (for traffic coming the other way). I have no idea what it does if things get out of sequence. Perhaps that is the reason for the ‘OK, go ahead’ message (as, coupled with timeouts, it would allow for comms to be re-synced).
The only reason I am doing this at all is that it provides the holy grail of full access to the VMS. Access to logs, configuration, etc. That said, improving our driver may help us in general, and provide a framework for others to look at.
While UDP can deliver frames out of order (as it is multi-hop, with route potentially determined on a per-packet basis, and some routes may be quicker than others), can that happen with CAN? Missing frames on CAN is a definite possibility. That said, I do think the drivers should be written not to re-order frames arbitrarily. It is one thing for the communication channel to do that, but quite another for a driver.
Regards, Mark.
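The three-step exchange described above can be sketched as a small receive-side reassembler. This is a guess at the framing, not the real firmware: the class name `IpOverCanReassembler` is hypothetical, and the little-endian 2-byte length is inferred only from the dumps in this thread (`54 00` ahead of an 84-byte datagram). There is no recovery if a data frame is lost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reassembler for the reverse-engineered framing: a 2-byte
// length frame, then data frames of up to 8 bytes until the announced
// number of bytes has arrived.
class IpOverCanReassembler {
 public:
  // Feed the payload of one received CAN frame. Returns true and fills
  // `out` once a complete datagram has been collected.
  bool feed(const uint8_t* data, std::size_t len, std::vector<uint8_t>& out) {
    assert(len <= 8);                   // classic CAN payload limit
    if (len == 0) return false;         // empty frame: the 'go ahead' ack
    if (expected_ == 0) {               // idle: expect a length header
      if (len != 2) return false;       // unexpected frame, stay idle
      // Assumption: length is little-endian (0x54 0x00 -> 84).
      expected_ = static_cast<std::size_t>(data[0]) |
                  (static_cast<std::size_t>(data[1]) << 8);
      buffer_.clear();
      return false;
    }
    buffer_.insert(buffer_.end(), data, data + len);
    if (buffer_.size() < expected_) return false;
    out.assign(buffer_.begin(),
               buffer_.begin() + static_cast<std::ptrdiff_t>(expected_));
    expected_ = 0;                      // back to idle for the next datagram
    buffer_.clear();
    return true;
  }

 private:
  std::size_t expected_ = 0;            // 0 means "waiting for a length frame"
  std::vector<uint8_t> buffer_;
};
```

Because the framing carries no sequence numbers, a single dropped or swapped data frame silently corrupts the datagram here; detection is left to the IP and transport checksums, as discussed in the thread.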
Great find. I’m working on changing to use the 2 byte interrupt status call now, and will try to incorporate this approach.
On 7 Jun 2021, at 12:17 AM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
I've just found a spot-on post on this issue:
https://www.microchip.com/forums/tm.aspx?m=620741
Tom suggests implementing a state machine to reproduce the receive order. His analysis & solution looks sound to me.
Regards, Michael
On 6 Jun 2021 at 14:50, Mark Webb-Johnson wrote:
I spent quite a bit of time on this. With my standard test packet of 11 CAN frames expected, and the standard driver, I get perhaps 4 or 5 of them (about half are lost, and some are out of order).
I made the suggested change to move the MyCan.IncomingFrame() call out of the ‘can’ object (when frameReceived is true) to within the mcp2515 AsynchronousInterruptHandler itself. That allows the handler to receive more than one frame per call and is a very simple change. Once that is done, we can at least now try to tune it.
So I then modified the code of mcp2515 AsynchronousInterruptHandler to loop so long as the interrupt flag says either buffer #0 or #1 has a frame. The result looks something like this:
D (63192) mcp2515: AsynchronousInterruptHandler instat=01 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=54)
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=40 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=40)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=45)
D (63192) mcp2515: AsynchronousInterruptHandler Clear RX buffer #0 overflow flag
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=24)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=34)
D (63192) mcp2515: AsynchronousInterruptHandler instat=20 errflag=00 txb0ctrl=00
The actual frames on the bus are (B1 values) 54, 45, 40, 0a, 7a, d5, 0c, 14, 1c, 24, 2c, and 34. Looking at the above debug output, we get:
Interrupt flags show buffer #0 has a frame. It is B1=54. Good.
Interrupt flags show buffers #0 and #1 both have frames. Buffer #0 has B1=40 and buffer #1 has B1=45.
Etc etc
That is not good. What must have happened is that the first B1=54 frame arrived, got put in buffer #0, and the interrupt was raised. We checked the interrupt flags, found buffer #0 had something, and read the frame OK. All is good. But what is happening now is that between the time we checked the interrupt flags and the time we finished reading the 13 bytes from buffer #0, a second frame arrived and was put in buffer #1. Then a third frame arrived and was put in buffer #0. We loop back to check the interrupt flags and find both buffers have frames ready. So we read buffer #0 to get the third frame, then buffer #1 to get the second frame. We are out of sequence.
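One way to picture a fix for the sequence above is to remember which buffer was drained last and use that to choose the read order when both are full. The sketch below is a simplified heuristic under stated assumptions, not the state machine from the Microchip forum post and not the OVMS driver: `RxBuf` and `read_order` are hypothetical names, and the caller is assumed to pass `RxBuf::None` after any poll that found both buffers empty.

```cpp
#include <cassert>
#include <vector>

enum class RxBuf { None = -1, Rx0 = 0, Rx1 = 1 };

// Decide in which order to drain the receive buffers, given the two
// "buffer full" flags and the buffer that was read most recently.
// Assumes rollover is enabled: a new frame goes to RXB0 if it is free,
// otherwise to RXB1.
inline std::vector<RxBuf> read_order(bool rx0_full, bool rx1_full,
                                     RxBuf last_read) {
  std::vector<RxBuf> order;
  if (rx0_full && rx1_full) {
    if (last_read == RxBuf::Rx0) {
      // RXB0 was emptied a moment ago, so RXB1 was probably filled while
      // RXB0 was still busy and holds the older frame.
      order = {RxBuf::Rx1, RxBuf::Rx0};
    } else {
      // Fresh start (or RXB1 read last): RXB0 fills first under rollover.
      order = {RxBuf::Rx0, RxBuf::Rx1};
    }
  } else if (rx0_full) {
    order = {RxBuf::Rx0};
  } else if (rx1_full) {
    order = {RxBuf::Rx1};
  }
  assert(order.size() <= 2);
  return order;
}
```

Applied to the debug output above: the first pass reads buffer #0 (B1=54); on the second pass both flags are set and the last read was buffer #0, so the heuristic reads #1 (B1=45) before #0 (B1=40), matching the order on the bus. It can still guess wrong if two frames arrive after the last read finished but before the next flag check, so the ambiguity is reduced rather than removed.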
By removing the ESP_LOGD statements, I can improve performance enough to get 10 out of the 11 frames, but still sometimes frames are swapped in order.
By over-clocking the MCP2515 SPI bus (supposed to be 10MHz, but I push it to 15MHz), I can get all 11 frames, but two are out of order.
I suppose I can minimise the chance of the out-of-order issue by repeating the call to read interrupt flags after processing buffer #0 but before checking for buffer #1. That would at least reduce the time window to as small as possible, but would be another SPI call and is too slow. Doing that brings us back to losing frames.
Another approach may relate to our current use of the READ command to read 5 status registers (interrupt flags, error flags, two skipped, then transmit buffer #0 flags). There are two specific commands ‘read status’ (which gets the rx and tx buffer status flags in one byte), and ‘rx status’ (which gets just the receive buffer status and some info on the frames received, again in one byte). I think those are more designed for what we are trying to do. I can try to optimise the read loop at the start of the AsynchronousInterruptHandler to use one of those - they are 2 SPI bytes vs 7 for what we are doing at the moment (so more than three times as fast).
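To put rough numbers on the byte counts above, here is a back-of-envelope sketch. The helper names `spi_transfer_us` and `can_frame_us` are hypothetical; the SPI figure counts only clocked bits (no chip-select or driver overhead, which may well dominate in practice), and the CAN figure ignores stuff bits.

```cpp
#include <cassert>

// Time on the wire for an SPI transaction of `bytes` bytes, in microseconds.
inline double spi_transfer_us(unsigned bytes, double spi_hz) {
  assert(spi_hz > 0.0);
  return bytes * 8.0 / spi_hz * 1e6;
}

// Nominal duration of a classic CAN frame with an 11-bit ID, in
// microseconds: 44 framing bits + 8 bits per data byte + 3 bits of
// interframe space. Stuff bits are ignored, so real frames can be longer.
inline double can_frame_us(unsigned data_bytes, double bitrate) {
  assert(data_bytes <= 8 && bitrate > 0.0);
  return (44.0 + 8.0 * data_bytes + 3.0) / bitrate * 1e6;
}
```

At the 10MHz SPI clock mentioned above, the 7-byte status read takes about 5.6us of clock time against 1.6us for a 2-byte status command, and fetching a 13-byte buffer plus one command byte takes about 11.2us. A full 8-byte frame at 1Mbps occupies the bus for at least 111us, so the raw SPI clock time leaves some margin; whatever eats that margin is outside this simple model.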
I think it will also be worthwhile having a look at some other open source mcp2515 drivers to see how other people are doing it.
Regards, Mark.
On 4 Jun 2021, at 3:02 PM, Mark Webb-Johnson <mark@webb-johnson.net> wrote:
The handler can only return one frame. As it is, if both buffers #0 and #1 have a frame, it returns #0. I am not sure if it gets called again (seems to depend on the interrupt gpio status).
// Read the interrupt pin status and if it's still active (low), require another interrupt handling iteration
return !gpio_get_level((gpio_num_t)m_intpin);
Maybe a quick solution is to just return true, immediately after *frameReceived=true, if intflag=0x01 and (intstat & CANINTF_RX1IF)? That would dispatch the incoming frame, then call back for more (from the loop in the can object).
I am not sure in general why AsynchronousInterruptHandler uses a bool frameReceived flag, and doesn’t just simply dispatch the frame immediately to the can object? That would simplify things and allow the AsynchronousInterruptHandler to handle receiving both frames in the same call. Given that MCP2515 is the only driver using AsynchronousInterruptHandler, that would be an easy fix.
Regards, Mark.
On 4 Jun 2021, at 2:29 PM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
the handler is meant to read both buffers sequentially, and on a quick glance I don't see why it wouldn't. But it can't hurt if you do an audit of the code.
I remember having had that out-of-order discussion when handling both RX buffers before here, but don't remember the outcome. Too bad the list archives cannot be searched.
I think it was the MCP not doing overflows from RX buffer 1 to 0. I.e. if buffer 1 still has a frame on arrival, the new frame will be lost. That means losing a frame if the handler cannot react fast enough, but receiving out of order would be worse.
Regards, Michael
On 4 Jun 2021 at 04:16, Mark Webb-Johnson wrote:
Michael,
Good suggestion on the timing. I think it best to use the same timings as the Arduino library, and have committed that change. No vehicle modules currently use 1Mbps on MCP2515 anyway. Unfortunately, it didn’t resolve my problem.
Looking at the error flags I see:
Error flag: 0x23401c01
intstat 0x23
    ERRIF Error Interrupt pending
    RX0IF Rx buffer 0 full interrupt
    RX1IF Rx buffer 1 full interrupt
errflag 0x40
    RX0OVR Rx buffer 0 overflow
intflag 0x1c01
    0x01 Implied from Rx buffer 0 full
    0x1c = 0001 1100
        Means RXB0 overflow. No data lost in this case (it went into RXB1)
        Means (errflag & EFLG_RX01OVR), clear RX buffer overflow flags
        Means (intstat & (CANINTF_MERRF | CANINTF_WAKIF | CANINTF_ERRIF)), clear error & wakeup interrupts
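The breakdown above can be reproduced mechanically. In this sketch the packing of the 32-bit word (intstat in the top byte, errflag next, intflag in the low 16 bits) is inferred from the breakdown itself; the bit names follow the MCP2515 CANINTF and EFLG registers. `Mcp2515ErrorWord`, `split_error_word` and `bit_names` are hypothetical helpers, not OVMS functions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Fields of the logged errflags word (layout inferred from the thread).
struct Mcp2515ErrorWord {
  uint8_t intstat;   // snapshot of CANINTF
  uint8_t errflag;   // snapshot of EFLG
  uint16_t intflag;  // driver-internal handling flags
};

inline Mcp2515ErrorWord split_error_word(uint32_t w) {
  Mcp2515ErrorWord r;
  r.intstat = static_cast<uint8_t>(w >> 24);
  r.errflag = static_cast<uint8_t>((w >> 16) & 0xFF);
  r.intflag = static_cast<uint16_t>(w & 0xFFFF);
  return r;
}

// Bit names, least significant bit first.
static const char* const kCanintfNames[8] = {
    "RX0IF", "RX1IF", "TX0IF", "TX1IF", "TX2IF", "ERRIF", "WAKIF", "MERRF"};
static const char* const kEflgNames[8] = {
    "EWARN", "RXWAR", "TXWAR", "RXEP", "TXEP", "TXBO", "RX0OVR", "RX1OVR"};

// Return the names of all bits set in `value`, lowest bit first.
inline std::vector<std::string> bit_names(uint8_t value,
                                          const char* const* names) {
  assert(names != nullptr);
  std::vector<std::string> out;
  for (int bit = 0; bit < 8; ++bit) {
    if (value & (1u << bit)) out.push_back(names[bit]);
  }
  return out;
}
```

For `0x23401c01` this yields intstat 0x23 (RX0IF, RX1IF, ERRIF) and errflag 0x40 (RX0OVR), in line with the manual decode.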
So we have CAN frames in BOTH rx buffers #0 and #1. Looking at our driver code (mcp2515::AsynchronousInterruptHandler), it seems in that case we only read from buffer #0. From the flow I can see, we are going to lose that second frame. We’re not really handling the issue of two frames being in the buffers when the interrupt handler is called.
As the architecture of mcp2515::AsynchronousInterruptHandler can only receive one frame, it is not so simple to fix. We could simply read and return the frame in buffer #0, requesting to be called again (return true), but another frame may arrive (into buffer #0) before we get called again, and that is going to result in out-of-order frames.
I’ll work on improving the handling of this case.
Regards, Mark.
On 3 Jun 2021, at 3:07 PM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
I'd give the bit timing a try first, the MCP2515 seems to be very sensitive to this. I've even had some trouble finding a working configuration for the 50 kbit timing I added a couple of weeks ago.
We currently use 00 / D0 / 82, which is also the result of the old Intrepid timing calculator. That's a propagation segment of 1 Tq and 3 Tq per phase, resulting in sampling between 50% - 62.5%.
The Arduino MCP CAN lib by Cory Fowler also had this previously, but then changed in…
https://github.com/coryjfowler/MCP_CAN_lib/commit/ece730cf697fef1cbe8a901116...
…to 00 / CA / 81, which is a propagation segment of 3 Tq and 2 Tq per phase, shifting the sampling window to 62.5 - 75%.
Our current configuration scheme for the internal SJA1000 compatible CAN seems to sample at 62.5 - 75% as well, so that would also match.
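The two register triples quoted above can be checked with a small decoder for the MCP2515 CNF1 / CNF2 / CNF3 fields. This is a sketch: `BitTiming` and `decode_cnf` are hypothetical names, the 16 MHz oscillator is an assumption (it is what makes 8 Tq at BRP=0 come out at 1 Mbit/s), and the reported sample point is the end of phase segment 1, i.e. the upper end of the sampling windows quoted in the text.

```cpp
#include <cassert>
#include <cstdint>

struct BitTiming {
  unsigned prop_seg;    // propagation segment, in Tq
  unsigned phase_seg1;  // phase segment 1, in Tq
  unsigned phase_seg2;  // phase segment 2, in Tq
  unsigned total_tq;    // whole bit time, including the 1 Tq sync segment
  double sample_point;  // end of phase segment 1 as a fraction of the bit
  double bitrate;       // bits per second
};

// Decode MCP2515 bit timing registers. PHSEG2 is taken from CNF3, which
// is only valid when BTLMODE (CNF2 bit 7) is set, as in both triples above.
inline BitTiming decode_cnf(uint8_t cnf1, uint8_t cnf2, uint8_t cnf3,
                            double fosc_hz) {
  assert(fosc_hz > 0.0);
  BitTiming t;
  const unsigned brp = cnf1 & 0x3F;
  t.prop_seg = (cnf2 & 0x07u) + 1;
  t.phase_seg1 = ((cnf2 >> 3) & 0x07u) + 1;
  t.phase_seg2 = (cnf3 & 0x07u) + 1;
  t.total_tq = 1 + t.prop_seg + t.phase_seg1 + t.phase_seg2;
  t.sample_point =
      static_cast<double>(1 + t.prop_seg + t.phase_seg1) / t.total_tq;
  const double tq_seconds = 2.0 * (brp + 1) / fosc_hz;
  t.bitrate = 1.0 / (tq_seconds * t.total_tq);
  return t;
}
```

Under these assumptions, 00 / D0 / 82 decodes to 1 + 1 + 3 + 3 = 8 Tq with the sample point at 62.5%, and 00 / CA / 81 decodes to 1 + 3 + 2 + 2 = 8 Tq with the sample point at 75%, consistent with the figures given above.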
Regards, Michael
Am 03.06.21 um 07:36 schrieb Mark Webb-Johnson: > > I’m working on an implementation of IP stack over CAN for the Tesla Roadster. IP frames are encoded as a length followed by a sequence of CAN frames, all on the same ID. This runs over a 1MHz bus, so presumably the traffic volume could be high at times. > > I was having problems with this running on CAN2, so tried CAN1 and it worked perfectly. Here are some simple dumps of a single PING packet (and single PING response packet): > > ID #111 is used to transmit an IP packet, and ID #110 is used to receive an IP packet. The special empty data frame is an acknowledgment. > > Using latest master branch code (3.2.016-196-g0aad1a9f/ota_1/edge (build idf v3.3.4-848-g1ff5e24b1 Jun 2 2021 09:28:58)). > > So, first let’s test with traffic on CAN1 (active, 1Mbps), and listening on CAN2 (listen, 1Mbps): > > TCPDUMP: > > 05:57:55.980291 IP (tos 0x0, ttl 64, id 43101, offset 0, flags [DF], proto ICMP (1), length 84) > 10.10.99.3 > 10.10.99.2: ICMP echo request, id 23372, seq 1, length 64 > 0x0000: 4500 0054 a85d 4000 4001 b832 0a0a 6303 E..T.]@.@..2..c. > 0x0010: 0a0a 6302 0800 7df8 5b4c 0001 5361 b860 ..c...}.[L..Sa.` > 0x0020: 19f5 0e00 0809 0a0b 0c0d 0e0f 1011 1213 ................ > 0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"# > 0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123 > 0x0050: 3435 3637 4567 > > 05:57:56.436190 IP (tos 0x0, ttl 64, id 14937, offset 0, flags [none], proto ICMP (1), length 84) > 10.10.99.2 > 10.10.99.3: ICMP echo reply, id 23372, seq 1, length 64 > 0x0000: 4500 0054 3a59 0000 4001 6637 0a0a 6302 E..T:Y..@.f7..c. > 0x0010: 0a0a 6303 0000 85f8 5b4c 0001 5361 b860 ..c.....[L..Sa.` > 0x0020: 19f5 0e00 0809 0a0b 0c0d 0e0f 1011 1213 ................ 
> 0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"# > 0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123 > 0x0050: 3435 3637 4567 > > Traffic (as shown on PC the other end of the can log tcp connection): > > tx: #111 54 00 > tx: #111 45 00 00 54 a8 5d 40 00 > tx: #111 40 01 b8 32 0a 0a 63 03 > tx: #111 0a 0a 63 02 08 00 7d f8 > tx: #111 5b 4c 00 01 53 61 b8 60 > tx: #111 19 f5 0e 00 08 09 0a 0b > tx: #111 0c 0d 0e 0f 10 11 12 13 > tx: #111 14 15 16 17 18 19 1a 1b > tx: #111 1c 1d 1e 1f 20 21 22 23 > tx: #111 24 25 26 27 28 29 2a 2b > tx: #111 2c 2d 2e 2f 30 31 32 33 > tx: #111 34 35 36 37 > > rx: #110 > rx: #110 54 00 > rx: #110 45 00 00 54 3a 59 00 00 > rx: #110 40 01 66 37 0a 0a 63 02 > rx: #110 0a 0a 63 03 00 00 85 f8 > rx: #110 5b 4c 00 01 53 61 b8 60 > rx: #110 19 f5 0e 00 08 09 0a 0b > rx: #110 0c 0d 0e 0f 10 11 12 13 > rx: #110 14 15 16 17 18 19 1a 1b > rx: #110 1c 1d 1e 1f 20 21 22 23 > rx: #110 24 25 26 27 28 29 2a 2b > rx: #110 2c 2d 2e 2f 30 31 32 33 > rx: #110 34 35 36 37 > > CAN1 active: > > 1T11 111 54 00 > 1R11 110 > > 1CER TX_Queue T11 111 40 01 b8 32 0a 0a 63 03 > 1T11 111 45 00 00 54 a8 5d 40 00 > 1T11 111 40 01 b8 32 0a 0a 63 03 > 1CER TX_Queue T11 111 5b 4c 00 01 53 61 b8 60 > 1T11 111 0a 0a 63 02 08 00 7d f8 > 1T11 111 5b 4c 00 01 53 61 b8 60 > 1CER TX_Queue T11 111 0c 0d 0e 0f 10 11 12 13 > 1T11 111 19 f5 0e 00 08 09 0a 0b > 1T11 111 0c 0d 0e 0f 10 11 12 13 > 1CER TX_Queue T11 111 1c 1d 1e 1f 20 21 22 23 > 1T11 111 14 15 16 17 18 19 1a 1b > 1T11 111 1c 1d 1e 1f 20 21 22 23 > 1CER TX_Queue T11 111 2c 2d 2e 2f 30 31 32 33 > 1T11 111 24 25 26 27 28 29 2a 2b > 1T11 111 2c 2d 2e 2f 30 31 32 33 > 1T11 111 34 35 36 37 > > 1R11 110 54 00 > 1R11 110 45 00 00 54 3a 59 00 00 > 1R11 110 40 01 66 37 0a 0a 63 02 > 1R11 110 0a 0a 63 03 00 00 85 f8 > 1R11 110 5b 4c 00 01 53 61 b8 60 > 1R11 110 19 f5 0e 00 08 09 0a 0b > 1R11 110 0c 0d 0e 0f 10 11 12 13 > 1R11 110 14 15 16 17 18 19 1a 1b > 1R11 110 1c 1d 1e 1f 20 21 22 23 > 1R11 110 
24 25 26 27 28 29 2a 2b > 1R11 110 2c 2d 2e 2f 30 31 32 33 > 1R11 110 34 35 36 37 > > CAN2 listen: > > 2R11 111 54 00 > 2R11 110 > > 2R11 111 45 00 00 54 a8 5d 40 00 > 2R11 111 40 01 b8 32 0a 0a 63 03 > 2R11 111 0a 0a 63 02 08 00 7d f8 > 2R11 111 5b 4c 00 01 53 61 b8 60 > 2R11 111 19 f5 0e 00 08 09 0a 0b > 2R11 111 0c 0d 0e 0f 10 11 12 13 > 2R11 111 14 15 16 17 18 19 1a 1b > 2R11 111 1c 1d 1e 1f 20 21 22 23 > 2R11 111 24 25 26 27 28 29 2a 2b > 2R11 111 2c 2d 2e 2f 30 31 32 33 > 2R11 111 34 35 36 37 > > 2R11 110 54 00 > 2CER Error intr=10 rxpkt=14 txpkt=0 errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0 > 2R11 110 40 01 66 37 0a 0a 63 02 > 2R11 110 19 f5 0e 00 08 09 0a 0b > 2R11 110 34 35 36 37 > 2R11 110 45 00 00 54 3a 59 00 00 > > Conclusion is that the CAN1 traffic looks fine, and the PING packet gets a good reply. All successful. But the CAN2 listen is missing a few packets and the last packet is out of order. > > Now, let’s test with traffic on CAN2 (active, 1Mbps), and listening on CAN1 (listen, 1Mbps): > > TCPDUMP: > > 06:00:33.004060 IP (tos 0x0, ttl 64, id 58240, offset 0, flags [DF], proto ICMP (1), length 84) > 10.10.99.3 > 10.10.99.2: ICMP echo request, id 23393, seq 1, length 64 > 0x0000: 4500 0054 e380 4000 4001 7d0f 0a0a 6303 E..T..@.@.}...c. > 0x0010: 0a0a 6302 0800 7cc8 5b61 0001 f161 b860 ..c...|.[a...a.` > 0x0020: 8b0f 0000 0809 0a0b 0c0d 0e0f 1011 1213 ................ 
> 0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
> 0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
> 0x0050: 3435 3637 4567
>
> Traffic (as shown on PC the other end of the can log tcp connection):
>
> tx: #111 54 00
> tx: #111 45 00 00 54 e3 80 40 00
> tx: #111 40 01 7d 0f 0a 0a 63 03
> tx: #111 0a 0a 63 02 08 00 7c c8
> tx: #111 5b 61 00 01 f1 61 b8 60
> tx: #111 8b 0f 00 00 08 09 0a 0b
> tx: #111 0c 0d 0e 0f 10 11 12 13
> tx: #111 14 15 16 17 18 19 1a 1b
> tx: #111 1c 1d 1e 1f 20 21 22 23
> tx: #111 24 25 26 27 28 29 2a 2b
> tx: #111 2c 2d 2e 2f 30 31 32 33
> tx: #111 34 35 36 37
>
> rx: #110
> rx: #110 54 00
> rx: #110 45 00 00 54 3a 5a 00 00
> rx: #110 0a 0a 63 03 00 00 84 c8
> rx: #110 8b 0f 00 00 08 09 0a 0b
> rx: #110 34 35 36 37
> rx: #110 40 01 66 36 0a 0a 63 02
>
> CAN2 active:
>
> 2T11 111 54 00
> 2R11 110
>
> 2CER TX_Queue T11 111 40 01 7d 0f 0a 0a 63 03
> 2T11 111 45 00 00 54 e3 80 40 00
> 2T11 111 40 01 7d 0f 0a 0a 63 03
> 2CER TX_Queue T11 111 5b 61 00 01 f1 61 b8 60
> 2T11 111 0a 0a 63 02 08 00 7c c8
> 2T11 111 5b 61 00 01 f1 61 b8 60
> 2CER TX_Queue T11 111 0c 0d 0e 0f 10 11 12 13
> 2T11 111 8b 0f 00 00 08 09 0a 0b
> 2T11 111 0c 0d 0e 0f 10 11 12 13
> 2CER TX_Queue T11 111 1c 1d 1e 1f 20 21 22 23
> 2T11 111 14 15 16 17 18 19 1a 1b
> 2T11 111 1c 1d 1e 1f 20 21 22 23
> 2CER TX_Queue T11 111 2c 2d 2e 2f 30 31 32 33
> 2T11 111 24 25 26 27 28 29 2a 2b
> 2T11 111 2c 2d 2e 2f 30 31 32 33
> 2T11 111 34 35 36 37
>
> 2R11 110 54 00
> 2R11 110 45 00 00 54 3a 5a 00 00
> 2CER Error intr=15 rxpkt=3 txpkt=12 errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=5 txfail=0 wdgreset=0 errreset=0
> 2R11 110 0a 0a 63 03 00 00 84 c8
> 2R11 110 8b 0f 00 00 08 09 0a 0b
> 2R11 110 34 35 36 37
> 2R11 110 40 01 66 36 0a 0a 63 02
>
> CAN1 listen:
>
> 1R11 111 54 00
> 1R11 110
>
> 1R11 111 45 00 00 54 e3 80 40 00
> 1R11 111 40 01 7d 0f 0a 0a 63 03
> 1R11 111 0a 0a 63 02 08 00 7c c8
> 1R11 111 5b 61 00 01 f1 61 b8 60
> 1R11 111 8b 0f 00 00 08 09 0a 0b
> 1R11 111 0c 0d 0e 0f 10 11 12 13
> 1R11 111 14 15 16 17 18 19 1a 1b
> 1R11 111 1c 1d 1e 1f 20 21 22 23
> 1R11 111 24 25 26 27 28 29 2a 2b
> 1R11 111 2c 2d 2e 2f 30 31 32 33
> 1R11 111 34 35 36 37
>
> 1R11 110 54 00
> 1R11 110 45 00 00 54 3a 5a 00 00
> 1R11 110 40 01 66 36 0a 0a 63 02
> 1R11 110 0a 0a 63 03 00 00 84 c8
> 1R11 110 5b 61 00 01 f1 61 b8 60
> 1R11 110 8b 0f 00 00 08 09 0a 0b
> 1R11 110 0c 0d 0e 0f 10 11 12 13
> 1R11 110 14 15 16 17 18 19 1a 1b
> 1R11 110 1c 1d 1e 1f 20 21 22 23
> 1R11 110 24 25 26 27 28 29 2a 2b
> 1R11 110 2c 2d 2e 2f 30 31 32 33
> 1R11 110 34 35 36 37
>
> Conclusion is that the CAN2 transmit traffic looks fine, but no PING reply received via CAN. The CAN1 listen shows the reply just fine.
>
> Here is that last CAN1 listen, with timestamps:
>
> 1622696433.080107 1R11 111 54 00
> 1622696433.081657 1R11 110
>
> 1622696433.227479 1R11 111 45 00 00 54 e3 80 40 00
> 1622696433.228318 1R11 111 40 01 7d 0f 0a 0a 63 03
> 1622696433.245727 1R11 111 0a 0a 63 02 08 00 7c c8
> 1622696433.246214 1R11 111 5b 61 00 01 f1 61 b8 60
> 1622696433.248219 1R11 111 8b 0f 00 00 08 09 0a 0b
> 1622696433.248772 1R11 111 0c 0d 0e 0f 10 11 12 13
> 1622696433.250774 1R11 111 14 15 16 17 18 19 1a 1b
> 1622696433.251338 1R11 111 1c 1d 1e 1f 20 21 22 23
> 1622696433.253380 1R11 111 24 25 26 27 28 29 2a 2b
> 1622696433.253944 1R11 111 2c 2d 2e 2f 30 31 32 33
> 1622696433.265937 1R11 111 34 35 36 37
>
> 1622696433.269221 1R11 110 54 00
> 1622696433.272095 1R11 110 45 00 00 54 3a 5a 00 00
> 1622696433.272125 1R11 110 40 01 66 36 0a 0a 63 02
> 1622696433.272156 1R11 110 0a 0a 63 03 00 00 84 c8
> 1622696433.272193 1R11 110 5b 61 00 01 f1 61 b8 60
> 1622696433.272245 1R11 110 8b 0f 00 00 08 09 0a 0b
> 1622696433.272277 1R11 110 0c 0d 0e 0f 10 11 12 13
> 1622696433.272314 1R11 110 14 15 16 17 18 19 1a 1b
> 1622696433.272354 1R11 110 1c 1d 1e 1f 20 21 22 23
> 1622696433.272387 1R11 110 24 25 26 27 28 29 2a 2b
> 1622696433.272420 1R11 110 2c 2d 2e 2f 30 31 32 33
> 1622696433.272452 1R11 110 34 35 36 37
>
> It is 1Mbps, with 30us or so between each packet. This is the only traffic on the bus. Everything else is turned off. Roughly 12 packets each way. Surely even if we were hitting a performance limit, our buffers can handle 12 packets?
>
> The good news is that I have a good environment to replicate this issue now, so any fix should be easy to test.
>
> I haven’t worked on the MCP2515 driver in our code in a long time, but it certainly seems something is messed up and that could be badly affecting vehicle modules using anything other than CAN1.
>
> I will start to look at this over the weekend, but has anyone got any ideas/suggestions? Perhaps the bit timing registers are off by a small amount (so it works on CAN1 but not on CAN2)? Or something more serious in our driver?
>
> Regards, Mark.
>
> _______________________________________________
> OvmsDev mailing list
> OvmsDev@lists.openvehicles.com
> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26 <MCP2515Calc-1000kbit.ods>
_______________________________________________ OvmsDev mailing list OvmsDev@lists.openvehicles.com http://lists.openvehicles.com/mailman/listinfo/ovmsdev
An update on this.

Working with another developer, we have made some changes in a ‘spimaster’ branch:

1. Stop using spi_nodma fork of ESP’s standard spi code, and switch back to use the standard ESP IDF spi master.
2. To support >3 devices (which ESP IDF spi master doesn’t due to hardware limitations of CS line and 3x DMA channels), change to use software CS line for the MAX7317 driver (the MCP2515 continues to use hardware CS).
3. Confirm the changes to our MCP2515 driver related to keeping track of the last buffer read, to solve the out-of-order issue.
4. Confirm the fix for another related issue where we don’t block (delay) if the can tx queue is full.

These seem better now, and I am able to establish a CAN IP connection over MCP2515. Frames come in order, and we are seeing performance around 700 frames/second - which should be adequate for our needs.

I’ll do some more testing over the next few days, and if no issues found merge back to master.

Regards, Mark.
On 7 Jun 2021, at 12:16 AM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
I've just found a spot-on post on this issue:
https://www.microchip.com/forums/tm.aspx?m=620741
Tom suggests implementing a state machine to reproduce the receive order. His analysis & solution looks sound to me.
Regards, Michael
On 06.06.21 at 14:50, Mark Webb-Johnson wrote:
I spent quite a bit of time on this. With my standard test packet of 11 CAN frames expected, and the standard driver, I get perhaps 4 or 5 of them (about half are lost, and some are out of order).
I made the suggested change to move the MyCan.IncomingFrame() call out of the ‘can’ object (when frameReceived is true) to within the mcp2515 AsynchronousInterruptHandler itself. That allows the handler to receive more than one frame per call and is a very simple change. Once that is done, we can at least now try to tune it.
So I then modified the code of mcp2515 AsynchronousInterruptHandler to loop so long as the interrupt flag says either buffer #0 or #1 has a frame. The result looks something like this:
D (63192) mcp2515: AsynchronousInterruptHandler instat=01 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=54)
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=40 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=40)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=45)
D (63192) mcp2515: AsynchronousInterruptHandler Clear RX buffer #0 overflow flag
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=24)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=34)
D (63192) mcp2515: AsynchronousInterruptHandler instat=20 errflag=00 txb0ctrl=00
The actual frames on the bus are (B1 values) 54, 45, 40, 0a, 7a, d5, 0c, 14, 1c, 24, 2c, and 34. Looking at the above debug output, we get:
- Interrupt flags show buffer #0 has a frame. It is B1=54. Good.
- Interrupt flags show buffers #0 and #1 both have frames. Buffer #0 has B1=40 and buffer #1 has B1=45.
- Etc etc
That is not good. What must have happened is that the first B1=54 frame arrived, got put in buffer #0, and the interrupt was raised. We checked the interrupt flags, found buffer #0 had something, and read the frame ok. All is good. But what is happening now is that between the time we checked the interrupt flags and the time we finished reading the 13 bytes from buffer #0, a second frame arrived and was put in buffer #1. Then a third frame arrived and was put in buffer #0. We loop back to check the interrupt flags and find both buffers have frames ready. So we read buffer #0 to get the third frame, then buffer #1 to get the second frame. We are out of sequence.
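The sequence described above can be replayed with a toy model. This is only a sketch: `RxBuffers` and `replay_naive_handler` are hypothetical names, not part of the OVMS driver, and the model assumes rollover is enabled so that a new frame lands in buffer #0 if it is free, otherwise in buffer #1, otherwise it is lost. Frames are represented by their B1 value.

```cpp
#include <cassert>
#include <vector>

// Toy model (assumption, not driver code) of the two MCP2515 receive
// buffers with rollover: buffer #0 is filled first, then buffer #1.
struct RxBuffers {
    bool full[2] = {false, false};
    int  data[2] = {0, 0};

    // A frame arrives from the bus. Returns false if it had to be dropped.
    bool deliver(int frame) {
        for (int i = 0; i < 2; ++i) {
            if (!full[i]) { full[i] = true; data[i] = frame; return true; }
        }
        return false;  // both buffers occupied: frame lost (overflow)
    }

    // The handler reads one buffer over SPI and releases it.
    int read_and_release(int index) {
        assert(full[index]);
        full[index] = false;
        return data[index];
    }
};

// Replays the first three frames with a handler that always services
// buffer #0 before buffer #1 once it sees both flags set.
std::vector<int> replay_naive_handler() {
    RxBuffers chip;
    std::vector<int> received;

    chip.deliver(0x54);                            // frame 1 -> buffer #0, interrupt raised
    // The handler reads the flags here and sees only buffer #0 ...
    chip.deliver(0x45);                            // ... frame 2 arrives during the read -> buffer #1
    received.push_back(chip.read_and_release(0));  // handler finishes reading frame 1
    chip.deliver(0x40);                            // frame 3 -> buffer #0, which was just freed

    // Next loop iteration: both flags are set, buffer #0 is read first.
    received.push_back(chip.read_and_release(0));  // frame 3
    received.push_back(chip.read_and_release(1));  // frame 2
    return received;
}
```

The returned order is 54, 40, 45, which matches the debug output: nothing is lost, but frames 2 and 3 are swapped because the flags alone do not say which buffer was filled first.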
By removing the ESP_LOGD statements, I can improve performance enough to get 10 out of the 11 frames, but still sometimes frames are swapped in order.
By over-clocking the MCP2515 SPI bus (supposed to be 10MHz, but I push it to 15MHz), I can get all 11 frames, but two are out of order.
I suppose I can minimise the chance of the out-of-order issue by repeating the call to read interrupt flags after processing buffer #0 but before checking for buffer #1. That would at least reduce the time window to as small as possible, but would be another SPI call and is too slow. Doing that brings us back to losing frames.
Another approach may relate to our current use of the READ command to read 5 status registers (interrupt flags, error flags, two skipped, then transmit buffer #0 flags). There are two specific commands ‘read status’ (which gets the rx and tx buffer status flags in one byte), and ‘rx status’ (which gets just the receive buffer status and some info on the frames received, again in one byte). I think those are more designed for what we are trying to do. I can try to optimise the read loop at the start of the AsynchronousInterruptHandler to use one of those - they are 2 SPI bytes vs 7 for what we are doing at the moment (so more than three times as fast).
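As a rough check of the "2 SPI bytes vs 7" arithmetic, the sketch below counts the bytes clocked for each kind of transfer. The instruction values are the ones listed in the MCP2515 datasheet; the helper names are made up for illustration, and the timing figure only covers time on the wire at an assumed SPI clock, ignoring chip-select handling and driver overhead (which may well dominate).

```cpp
#include <cassert>

// MCP2515 SPI instruction bytes (per the datasheet).
constexpr unsigned char MCP_READ        = 0x03;  // followed by an address byte, then data bytes
constexpr unsigned char MCP_READ_STATUS = 0xA0;  // followed by one status byte
constexpr unsigned char MCP_RX_STATUS   = 0xB0;  // followed by one status byte

// Bytes clocked for a READ of `registers` consecutive registers:
// instruction + address + one byte per register.
constexpr int read_transfer_bytes(int registers) { return 2 + registers; }

// Bytes clocked for READ STATUS or RX STATUS: instruction + one result byte.
constexpr int status_transfer_bytes() { return 2; }

// Time on the wire in nanoseconds for `bytes` at `spi_hz`.
inline double transfer_ns(int bytes, double spi_hz) {
    assert(spi_hz > 0);
    return bytes * 8 * 1e9 / spi_hz;
}
```

With five registers read in one go, `read_transfer_bytes(5)` gives 7 bytes against 2 for a status command, so the status poll is 3.5 times shorter on the wire. At 10MHz that is about 5.6us versus 1.6us.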
I think it will also be worthwhile having a look at some other open source mcp2515 drivers to see how other people are doing it.
Regards, Mark.
On 4 Jun 2021, at 3:02 PM, Mark Webb-Johnson <mark@webb-johnson.net> wrote:
The handler can only return one frame. As it is, if both buffers #0 and #1 have a frame, it returns #0. I am not sure if it gets called again (seems to depend on the interrupt gpio status).
// Read the interrupt pin status and if it's still active (low), require another interrupt handling iteration
return !gpio_get_level((gpio_num_t)m_intpin);
Maybe a quick solution is to just return true, immediately after *frameReceived=true, if intflag=0x01 and (intstat & CANINTF_RX1IF)? That would dispatch the incoming frame, then call back for more (from the loop in the can object).
I am not sure in general why AsynchronousInterruptHandler uses a bool frameReceived flag, and doesn’t just simply dispatch the frame immediately to the can object? That would simplify things and allow the AsynchronousInterruptHandler to handle receiving both frames in the same call. Given that MCP2515 is the only driver using AsynchronousInterruptHandler, that would be an easy fix.
Regards, Mark.
On 4 Jun 2021, at 2:29 PM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
the handler is meant to read both buffers sequentially, and on a quick glance I don't see why it wouldn't. But it can't hurt if you do an audit of the code.
I remember us having that out-of-order discussion here before, when we were handling both RX buffers, but I don't remember the outcome. Too bad the list archives cannot be searched.
I think it was the MCP not doing overflows from RX buffer 1 to 0. I.e. if buffer 1 still has a frame on arrival, the new frame will be lost. That means losing a frame if the handler cannot react fast enough, but receiving out of order would be worse.
Regards, Michael
On 04.06.21 at 04:16, Mark Webb-Johnson wrote:
Michael,
Good suggestion on the timing. I think it best to use the same timings as the Arduino library, and have committed that change. No vehicle modules currently use 1Mbps on MCP2515 anyway. Unfortunately, it didn’t resolve my problem.
Looking at the error flags I see:
Error flag: 0x23401c01
intstat 0x23
    ERRIF Error Interrupt pending
    RX0IF Rx buffer 0 full interrupt
    RX1IF Rx buffer 1 full interrupt

errflag 0x40
    RX0OVR Rx buffer 0 overflow

intflag 0x1c01
    0x01 Implied from Rx buffer 0 full
    0x1c = 0001 1100
        Means RXB0 overflow. No data lost in this case (it went into RXB1)
        Means (errflag & EFLG_RX01OVR), clear RX buffer overflow flags
        Means (intstat & (CANINTF_MERRF | CANINTF_WAKIF | CANINTF_ERRIF)), clear error & wakeup interrupts
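The breakdown above can be sketched as a small decoder. The field layout (intstat in the top byte, errflag in the next, intflag in the low 16 bits) is taken from the breakdown itself; the bit values mirror the MCP2515 datasheet, and `ErrFlags`, `unpack_errflags` and `both_rx_buffers_full` are hypothetical names rather than anything in our driver.

```cpp
#include <cstdint>

// CANINTF bits (MCP2515 datasheet)
constexpr uint8_t CANINTF_RX0IF = 0x01;  // Rx buffer 0 full
constexpr uint8_t CANINTF_RX1IF = 0x02;  // Rx buffer 1 full
constexpr uint8_t CANINTF_ERRIF = 0x20;  // error interrupt pending

// EFLG bits (MCP2515 datasheet)
constexpr uint8_t EFLG_RX0OVR = 0x40;    // Rx buffer 0 overflow
constexpr uint8_t EFLG_RX1OVR = 0x80;    // Rx buffer 1 overflow

// The three fields packed into the errflags word of the CER log line.
struct ErrFlags {
    uint8_t  intstat;  // CANINTF as read by the handler
    uint8_t  errflag;  // EFLG as read by the handler
    uint16_t intflag;  // driver-internal flags
};

ErrFlags unpack_errflags(uint32_t word) {
    return ErrFlags{ uint8_t(word >> 24), uint8_t(word >> 16), uint16_t(word & 0xffff) };
}

// True when a frame is waiting in each receive buffer.
bool both_rx_buffers_full(const ErrFlags &f) {
    const uint8_t both = CANINTF_RX0IF | CANINTF_RX1IF;
    return (f.intstat & both) == both;
}
```

For 0x23401c01 this yields intstat 0x23, errflag 0x40 and intflag 0x1c01, with both RX flags set and RX0OVR raised.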
So we have CAN frames in BOTH rx buffers #0 and #1. Looking at our driver code (mcp2515::AsynchronousInterruptHandler), it seems in that case we only read from buffer #0. From the flow I can see, we are going to lose that second frame. We’re not really handling the issue of two frames being in the buffers when the interrupt handler is called.
As the architecture of mcp2515::AsynchronousInterruptHandler can only receive one frame, it is not so simple to fix. We could simply read and return the frame in buffer #0, requesting to be called again (return true), but another frame may arrive (into buffer #0) before we get called again, and that is going to result in out-of-order frames.
I’ll work on improving the handling of this case.
Regards, Mark.
On 3 Jun 2021, at 3:07 PM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
I'd try the bit timing first; the MCP2515 seems to be very sensitive to this. I even had some trouble finding a working configuration for the 50 kbit timing I added a couple of weeks ago.
We currently use 00 / D0 / 82 which is also the result of the old Intrepid timing calculator. That's a propagation segment of 1 Tq and 3 Tq per phase, resulting in sampling between 50% - 62.5%.
The Arduino MCP CAN lib by Cory Fowler also had this previously, but then changed in…
https://github.com/coryjfowler/MCP_CAN_lib/commit/ece730cf697fef1cbe8a90111694868168d41000
…to 00 / CA / 81, which is a propagation segment of 3 Tq and 2 Tq per phase, shifting the sampling window to 62.5 - 75%.
Our current configuration scheme for the internal SJA1000 compatible CAN seems to sample at 62.5 - 75% as well, so that would also match.
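The two register sets can be compared with a short calculation. This is a sketch, not driver code: the CNF2/CNF3 field positions follow the MCP2515 datasheet (each field stores length minus one), `BitTiming`, `decode_cnf` and `bit_rate` are illustrative names, and the 16MHz oscillator used in the example is an assumption. The value computed is the nominal sample point, i.e. the upper end of the windows quoted above.

```cpp
#include <cstdint>

// Segment lengths of one CAN bit, in time quanta (Tq).
struct BitTiming {
    int prop_seg;
    int phase_seg1;
    int phase_seg2;

    // The sync segment is always 1 Tq.
    int total_tq() const { return 1 + prop_seg + phase_seg1 + phase_seg2; }

    // Nominal sample point: end of phase segment 1, as a fraction of the bit.
    double sample_point() const {
        return double(1 + prop_seg + phase_seg1) / total_tq();
    }
};

// CNF2: PRSEG in bits 2..0, PHSEG1 in bits 5..3. CNF3: PHSEG2 in bits 2..0.
BitTiming decode_cnf(uint8_t cnf2, uint8_t cnf3) {
    BitTiming t;
    t.prop_seg   = (cnf2 & 0x07) + 1;
    t.phase_seg1 = ((cnf2 >> 3) & 0x07) + 1;
    t.phase_seg2 = (cnf3 & 0x07) + 1;
    return t;
}

// CNF1 holds BRP in bits 5..0; Tq = 2 * (BRP + 1) / Fosc.
double bit_rate(double fosc_hz, uint8_t cnf1, const BitTiming &t) {
    double tq = 2.0 * ((cnf1 & 0x3f) + 1) / fosc_hz;
    return 1.0 / (tq * t.total_tq());
}
```

`decode_cnf(0xD0, 0x82)` gives 1 + 1 + 3 + 3 = 8 Tq with the sample point at 62.5%, and `decode_cnf(0xCA, 0x81)` gives 1 + 3 + 2 + 2 = 8 Tq with the sample point at 75%. With CNF1 = 00 and the assumed 16MHz oscillator, both come out at 1Mbps.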
Regards, Michael
On 03.06.21 at 07:36, Mark Webb-Johnson wrote:
>
> I’m working on an implementation of IP stack over CAN for the Tesla Roadster. IP frames are encoded as a length followed by a sequence of CAN frames, all on the same ID. This runs over a 1Mbps bus, so presumably the traffic volume could be high at times.
>
> I was having problems with this running on CAN2, so tried CAN1 and it worked perfectly. Here are some simple dumps of a single PING packet (and single PING response packet):
>
> ID #111 is used to transmit an IP packet, and ID #110 is used to receive an IP packet. The special empty data frame is an acknowledgment.
>
> Using latest master branch code (3.2.016-196-g0aad1a9f/ota_1/edge (build idf v3.3.4-848-g1ff5e24b1 Jun 2 2021 09:28:58)).
>
> So, first let’s test with traffic on CAN1 (active, 1Mbps), and listening on CAN2 (listen, 1Mbps):
>
> TCPDUMP:
>
> 05:57:55.980291 IP (tos 0x0, ttl 64, id 43101, offset 0, flags [DF], proto ICMP (1), length 84)
> 10.10.99.3 > 10.10.99.2: ICMP echo request, id 23372, seq 1, length 64
> 0x0000: 4500 0054 a85d 4000 4001 b832 0a0a 6303 E..T.]@.@..2..c.
> 0x0010: 0a0a 6302 0800 7df8 5b4c 0001 5361 b860 ..c...}.[L..Sa.`
> 0x0020: 19f5 0e00 0809 0a0b 0c0d 0e0f 1011 1213 ................
> 0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
> 0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
> 0x0050: 3435 3637 4567
>
> 05:57:56.436190 IP (tos 0x0, ttl 64, id 14937, offset 0, flags [none], proto ICMP (1), length 84)
> 10.10.99.2 > 10.10.99.3: ICMP echo reply, id 23372, seq 1, length 64
> 0x0000: 4500 0054 3a59 0000 4001 6637 0a0a 6302 E..T:Y..@.f7..c.
> 0x0010: 0a0a 6303 0000 85f8 5b4c 0001 5361 b860 ..c.....[L..Sa.`
> 0x0020: 19f5 0e00 0809 0a0b 0c0d 0e0f 1011 1213 ................
> 0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
> 0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
> 0x0050: 3435 3637 4567
>
> Traffic (as shown on PC the other end of the can log tcp connection):
>
> tx: #111 54 00
> tx: #111 45 00 00 54 a8 5d 40 00
> tx: #111 40 01 b8 32 0a 0a 63 03
> tx: #111 0a 0a 63 02 08 00 7d f8
> tx: #111 5b 4c 00 01 53 61 b8 60
> tx: #111 19 f5 0e 00 08 09 0a 0b
> tx: #111 0c 0d 0e 0f 10 11 12 13
> tx: #111 14 15 16 17 18 19 1a 1b
> tx: #111 1c 1d 1e 1f 20 21 22 23
> tx: #111 24 25 26 27 28 29 2a 2b
> tx: #111 2c 2d 2e 2f 30 31 32 33
> tx: #111 34 35 36 37
>
> rx: #110
> rx: #110 54 00
> rx: #110 45 00 00 54 3a 59 00 00
> rx: #110 40 01 66 37 0a 0a 63 02
> rx: #110 0a 0a 63 03 00 00 85 f8
> rx: #110 5b 4c 00 01 53 61 b8 60
> rx: #110 19 f5 0e 00 08 09 0a 0b
> rx: #110 0c 0d 0e 0f 10 11 12 13
> rx: #110 14 15 16 17 18 19 1a 1b
> rx: #110 1c 1d 1e 1f 20 21 22 23
> rx: #110 24 25 26 27 28 29 2a 2b
> rx: #110 2c 2d 2e 2f 30 31 32 33
> rx: #110 34 35 36 37
>
> CAN1 active:
>
> 1T11 111 54 00
> 1R11 110
>
> 1CER TX_Queue T11 111 40 01 b8 32 0a 0a 63 03
> 1T11 111 45 00 00 54 a8 5d 40 00
> 1T11 111 40 01 b8 32 0a 0a 63 03
> 1CER TX_Queue T11 111 5b 4c 00 01 53 61 b8 60
> 1T11 111 0a 0a 63 02 08 00 7d f8
> 1T11 111 5b 4c 00 01 53 61 b8 60
> 1CER TX_Queue T11 111 0c 0d 0e 0f 10 11 12 13
> 1T11 111 19 f5 0e 00 08 09 0a 0b
> 1T11 111 0c 0d 0e 0f 10 11 12 13
> 1CER TX_Queue T11 111 1c 1d 1e 1f 20 21 22 23
> 1T11 111 14 15 16 17 18 19 1a 1b
> 1T11 111 1c 1d 1e 1f 20 21 22 23
> 1CER TX_Queue T11 111 2c 2d 2e 2f 30 31 32 33
> 1T11 111 24 25 26 27 28 29 2a 2b
> 1T11 111 2c 2d 2e 2f 30 31 32 33
> 1T11 111 34 35 36 37
>
> 1R11 110 54 00
> 1R11 110 45 00 00 54 3a 59 00 00
> 1R11 110 40 01 66 37 0a 0a 63 02
> 1R11 110 0a 0a 63 03 00 00 85 f8
> 1R11 110 5b 4c 00 01 53 61 b8 60
> 1R11 110 19 f5 0e 00 08 09 0a 0b
> 1R11 110 0c 0d 0e 0f 10 11 12 13
> 1R11 110 14 15 16 17 18 19 1a 1b
> 1R11 110 1c 1d 1e 1f 20 21 22 23
> 1R11 110 24 25 26 27 28 29 2a 2b
> 1R11 110 2c 2d 2e 2f 30 31 32 33
> 1R11 110 34 35 36 37
>
> CAN2 listen:
>
> 2R11 111 54 00
> 2R11 110
>
> 2R11 111 45 00 00 54 a8 5d 40 00
> 2R11 111 40 01 b8 32 0a 0a 63 03
> 2R11 111 0a 0a 63 02 08 00 7d f8
> 2R11 111 5b 4c 00 01 53 61 b8 60
> 2R11 111 19 f5 0e 00 08 09 0a 0b
> 2R11 111 0c 0d 0e 0f 10 11 12 13
> 2R11 111 14 15 16 17 18 19 1a 1b
> 2R11 111 1c 1d 1e 1f 20 21 22 23
> 2R11 111 24 25 26 27 28 29 2a 2b
> 2R11 111 2c 2d 2e 2f 30 31 32 33
> 2R11 111 34 35 36 37
>
> 2R11 110 54 00
> 2CER Error intr=10 rxpkt=14 txpkt=0 errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
> 2R11 110 40 01 66 37 0a 0a 63 02
> 2R11 110 19 f5 0e 00 08 09 0a 0b
> 2R11 110 34 35 36 37
> 2R11 110 45 00 00 54 3a 59 00 00
>
> Conclusion is that the CAN1 traffic looks fine, and the PING packet gets a good reply. All successful. But the CAN2 listen is missing a few packets and the last packet is out of order.
>
> Now, let’s test with traffic on CAN2 (active, 1Mbps), and listening on CAN1 (listen, 1Mbps):
>
> TCPDUMP:
>
> 06:00:33.004060 IP (tos 0x0, ttl 64, id 58240, offset 0, flags [DF], proto ICMP (1), length 84)
> 10.10.99.3 > 10.10.99.2: ICMP echo request, id 23393, seq 1, length 64
> 0x0000: 4500 0054 e380 4000 4001 7d0f 0a0a 6303 E..T..@.@.}...c.
> 0x0010: 0a0a 6302 0800 7cc8 5b61 0001 f161 b860 ..c...|.[a...a.`
> 0x0020: 8b0f 0000 0809 0a0b 0c0d 0e0f 1011 1213 ................
> 0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
> 0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
> 0x0050: 3435 3637 4567
>
> Traffic (as shown on PC the other end of the can log tcp connection):
>
> tx: #111 54 00
> tx: #111 45 00 00 54 e3 80 40 00
> tx: #111 40 01 7d 0f 0a 0a 63 03
> tx: #111 0a 0a 63 02 08 00 7c c8
> tx: #111 5b 61 00 01 f1 61 b8 60
> tx: #111 8b 0f 00 00 08 09 0a 0b
> tx: #111 0c 0d 0e 0f 10 11 12 13
> tx: #111 14 15 16 17 18 19 1a 1b
> tx: #111 1c 1d 1e 1f 20 21 22 23
> tx: #111 24 25 26 27 28 29 2a 2b
> tx: #111 2c 2d 2e 2f 30 31 32 33
> tx: #111 34 35 36 37
>
> rx: #110
> rx: #110 54 00
> rx: #110 45 00 00 54 3a 5a 00 00
> rx: #110 0a 0a 63 03 00 00 84 c8
> rx: #110 8b 0f 00 00 08 09 0a 0b
> rx: #110 34 35 36 37
> rx: #110 40 01 66 36 0a 0a 63 02
>
> CAN2 active:
>
> 2T11 111 54 00
> 2R11 110
>
> 2CER TX_Queue T11 111 40 01 7d 0f 0a 0a 63 03
> 2T11 111 45 00 00 54 e3 80 40 00
> 2T11 111 40 01 7d 0f 0a 0a 63 03
> 2CER TX_Queue T11 111 5b 61 00 01 f1 61 b8 60
> 2T11 111 0a 0a 63 02 08 00 7c c8
> 2T11 111 5b 61 00 01 f1 61 b8 60
> 2CER TX_Queue T11 111 0c 0d 0e 0f 10 11 12 13
> 2T11 111 8b 0f 00 00 08 09 0a 0b
> 2T11 111 0c 0d 0e 0f 10 11 12 13
> 2CER TX_Queue T11 111 1c 1d 1e 1f 20 21 22 23
> 2T11 111 14 15 16 17 18 19 1a 1b
> 2T11 111 1c 1d 1e 1f 20 21 22 23
> 2CER TX_Queue T11 111 2c 2d 2e 2f 30 31 32 33
> 2T11 111 24 25 26 27 28 29 2a 2b
> 2T11 111 2c 2d 2e 2f 30 31 32 33
> 2T11 111 34 35 36 37
>
> 2R11 110 54 00
> 2R11 110 45 00 00 54 3a 5a 00 00
> 2CER Error intr=15 rxpkt=3 txpkt=12 errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=5 txfail=0 wdgreset=0 errreset=0
> 2R11 110 0a 0a 63 03 00 00 84 c8
> 2R11 110 8b 0f 00 00 08 09 0a 0b
> 2R11 110 34 35 36 37
> 2R11 110 40 01 66 36 0a 0a 63 02
>
> CAN1 listen:
>
> 1R11 111 54 00
> 1R11 110
>
> 1R11 111 45 00 00 54 e3 80 40 00
> 1R11 111 40 01 7d 0f 0a 0a 63 03
> 1R11 111 0a 0a 63 02 08 00 7c c8
> 1R11 111 5b 61 00 01 f1 61 b8 60
> 1R11 111 8b 0f 00 00 08 09 0a 0b
> 1R11 111 0c 0d 0e 0f 10 11 12 13
> 1R11 111 14 15 16 17 18 19 1a 1b
> 1R11 111 1c 1d 1e 1f 20 21 22 23
> 1R11 111 24 25 26 27 28 29 2a 2b
> 1R11 111 2c 2d 2e 2f 30 31 32 33
> 1R11 111 34 35 36 37
>
> 1R11 110 54 00
> 1R11 110 45 00 00 54 3a 5a 00 00
> 1R11 110 40 01 66 36 0a 0a 63 02
> 1R11 110 0a 0a 63 03 00 00 84 c8
> 1R11 110 5b 61 00 01 f1 61 b8 60
> 1R11 110 8b 0f 00 00 08 09 0a 0b
> 1R11 110 0c 0d 0e 0f 10 11 12 13
> 1R11 110 14 15 16 17 18 19 1a 1b
> 1R11 110 1c 1d 1e 1f 20 21 22 23
> 1R11 110 24 25 26 27 28 29 2a 2b
> 1R11 110 2c 2d 2e 2f 30 31 32 33
> 1R11 110 34 35 36 37
>
> Conclusion is that the CAN2 transmit traffic looks fine, but no PING reply received via CAN. The CAN1 listen shows the reply just fine.
>
> Here is that last CAN1 listen, with timestamps:
>
> 1622696433.080107 1R11 111 54 00
> 1622696433.081657 1R11 110
>
> 1622696433.227479 1R11 111 45 00 00 54 e3 80 40 00
> 1622696433.228318 1R11 111 40 01 7d 0f 0a 0a 63 03
> 1622696433.245727 1R11 111 0a 0a 63 02 08 00 7c c8
> 1622696433.246214 1R11 111 5b 61 00 01 f1 61 b8 60
> 1622696433.248219 1R11 111 8b 0f 00 00 08 09 0a 0b
> 1622696433.248772 1R11 111 0c 0d 0e 0f 10 11 12 13
> 1622696433.250774 1R11 111 14 15 16 17 18 19 1a 1b
> 1622696433.251338 1R11 111 1c 1d 1e 1f 20 21 22 23
> 1622696433.253380 1R11 111 24 25 26 27 28 29 2a 2b
> 1622696433.253944 1R11 111 2c 2d 2e 2f 30 31 32 33
> 1622696433.265937 1R11 111 34 35 36 37
>
> 1622696433.269221 1R11 110 54 00
> 1622696433.272095 1R11 110 45 00 00 54 3a 5a 00 00
> 1622696433.272125 1R11 110 40 01 66 36 0a 0a 63 02
> 1622696433.272156 1R11 110 0a 0a 63 03 00 00 84 c8
> 1622696433.272193 1R11 110 5b 61 00 01 f1 61 b8 60
> 1622696433.272245 1R11 110 8b 0f 00 00 08 09 0a 0b
> 1622696433.272277 1R11 110 0c 0d 0e 0f 10 11 12 13
> 1622696433.272314 1R11 110 14 15 16 17 18 19 1a 1b
> 1622696433.272354 1R11 110 1c 1d 1e 1f 20 21 22 23
> 1622696433.272387 1R11 110 24 25 26 27 28 29 2a 2b
> 1622696433.272420 1R11 110 2c 2d 2e 2f 30 31 32 33
> 1622696433.272452 1R11 110 34 35 36 37
>
> It is 1Mbps, with 30us or so between each packet. This is the only traffic on the bus. Everything else is turned off. Roughly 12 packets each way. Surely even if we were hitting a performance limit, our buffers can handle 12 packets?
>
> The good news is that I have a good environment to replicate this issue now, so any fix should be easy to test.
>
> I haven’t worked on the MCP2515 driver in our code in a long time, but it certainly seems something is messed up and that could be badly affecting vehicle modules using anything other than CAN1.
>
> I will start to look at this over the weekend, but has anyone got any ideas/suggestions? Perhaps the bit timing registers are off by a small amount (so it works on CAN1 but not on CAN2)? Or something more serious in our driver?
>
> Regards, Mark.
>
> _______________________________________________
> OvmsDev mailing list
> OvmsDev@lists.openvehicles.com
> http://lists.openvehicles.com/mailman/listinfo/ovmsdev
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26 <MCP2515Calc-1000kbit.ods>
Hi Mark,

Trying this again... (got a 554 blocked reply the first time). Apologies if it's a duplicate.

Will this fix the bus hangs we have with the HUD devices? Ref: "Can buses stop after some time" email thread from 5/28/2019...

As I wrote at the time, a 100% way to reproduce the hang was to have a HUD device running, *then* start the OBD2ECU task.

I haven't used my HUD in some time, so I don't know if this was fixed since then. If not, that was a very quick and easy way to reproduce the issue.

Happy New Year!

Greg

On Mon, Jan 3, 2022 at 5:10 PM Mark Webb-Johnson <mark@webb-johnson.net> wrote:
An update on this.
Working with another developer, we have made some changes in a ‘spimaster’ branch:
1. Stop using spi_nodma fork of ESP’s standard spi code, and switch back to use the standard ESP IDF spi master.
2. To support >3 devices (which ESP IDF spi master doesn’t due to hardware limitations of CS line and 3x DMA channels), change to use software CS line for the MAX7317 driver (the MCP2515 continues to use hardware CS).
3. Confirm the changes to our MCP2515 driver related to keeping track of the last buffer read, to solve the out-of-order issue.
4. Confirm the fix for another related issue where we don’t block (delay) if the can tx queue is full.
These seem better now, and I am able to establish a CAN IP connection over MCP2515. Frames come in order, and we are seeing performance around 700 frames/second - which should be adequate for our needs.
I’ll do some more testing over the next few days, and if no issues found merge back to master.
Regards, Mark.
On 7 Jun 2021, at 12:16 AM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
I've just found a spot-on post on this issue:
https://www.microchip.com/forums/tm.aspx?m=620741
Tom suggests implementing a state machine to reproduce the receive order. His analysis & solution looks sound to me.
Regards, Michael
On 06.06.21 at 14:50, Mark Webb-Johnson wrote:
I spent quite a bit of time on this. With my standard test packet of 11 CAN frames expected, and the standard driver, I get perhaps 4 or 5 of them (about half are lost, and some are out of order).
I made the suggested change to move the MyCan.IncomingFrame() call out of the ‘can’ object (when frameReceived is true) to within the mcp2515 AsynchronousInterruptHandler itself. That allows the handler to receive more than one frame per call and is a very simple change. Once that is done, we can at least now try to tune it.
So I then modified the code of mcp2515 AsynchronousInterruptHandler to loop so long as the interrupt flag says either buffer #0 or #1 has a frame. The result looks something like this:
D (63192) mcp2515: AsynchronousInterruptHandler instat=01 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=54)
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=40 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=40)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=45)
D (63192) mcp2515: AsynchronousInterruptHandler Clear RX buffer #0 overflow flag
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=24)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=34)
D (63192) mcp2515: AsynchronousInterruptHandler instat=20 errflag=00 txb0ctrl=00
The actual frames on the bus are (B1 values) 54, 45, 40, 0a, 7a, d5, 0c, 14, 1c, 24, 2c, and 34. Looking at the above debug output, we get:
- Interrupt flags show buffer #0 has a frame. It is B1=54. Good.
- Interrupt flags show buffers #0 and #1 both have frames. Buffer #0 has B1=40 and buffer #1 has B1=45.
- Etc etc
That is not good. What must have happened is that the first B1=54 frame arrived, got put in buffer #0, and the interrupt was raised. We checked the interrupt flags, found buffer #0 had something, and read the frame ok. All is good. But what is happening now is that between the time we checked the interrupt flags and the time we finished reading the 13 bytes from buffer #0, a second frame arrived and was put in buffer #1. Then a third frame arrived and was put in buffer #0. We loop back to check the interrupt flags and find both buffers have frames ready. So we read buffer #0 to get the third frame, then buffer #1 to get the second frame. We are out of sequence.
By removing the ESP_LOGD statements, I can improve performance enough to get 10 out of the 11 frames, but still sometimes frames are swapped in order.
By over-clocking the MCP2515 SPI bus (supposed to be 10MHz, but I push it to 15MHz), I can get all 11 frames, but two are out of order.
I suppose I can minimise the chance of the out-of-order issue by repeating the call to read interrupt flags after processing buffer #0 but before checking for buffer #1. That would at least reduce the time window to as small as possible, but would be another SPI call and is too slow. Doing that brings us back to losing frames.
Another approach may relate to our current use of the READ command to read 5 status registers (interrupt flags, error flags, two skipped, then transmit buffer #0 flags). There are two specific commands ‘read status’ (which gets the rx and tx buffer status flags in one byte), and ‘rx status’ (which gets just the receive buffer status and some info on the frames received, again in one byte). I think those are more designed for what we are trying to do. I can try to optimise the read loop at the start of the AsynchronousInterruptHandler to use one of those - they are 2 SPI bytes vs 7 for what we are doing at the moment (so more than three times as fast).
I think it will also be worthwhile having a look at some other open source mcp2515 drivers to see how other people are doing it.
Regards, Mark.
On 4 Jun 2021, at 3:02 PM, Mark Webb-Johnson <mark@webb-johnson.net> wrote:
The handler can only return one frame. As it is, if both buffers #0 and #1 have a frame, it returns #0. I am not sure if it gets called again (seems to depend on the interrupt gpio status).
// Read the interrupt pin status and if it's still active (low), require another interrupt handling iteration
return !gpio_get_level((gpio_num_t)m_intpin);
Maybe a quick solution is to just return true, immediately after *frameReceived=true, if intflag=0x01 and (intstat & CANINTF_RX1IF)? That would dispatch the incoming frame, then call back for more (from the loop in the can object).
I am not sure in general why AsynchronousInterruptHandler uses a bool frameReceived flag, and doesn’t just simply dispatch the frame immediately to the can object? That would simplify things and allow the AsynchronousInterruptHandler to handle receiving both frames in the same call. Given that MCP2515 is the only driver using AsynchronousInterruptHandler, that would be an easy fix.
Regards, Mark.
On 4 Jun 2021, at 2:29 PM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
the handler is meant to read both buffers sequentially, and on a quick glance I don't see why it wouldn't. But it can't hurt if you do an audit of the code.
I remember us having that out-of-order discussion here before, when we were handling both RX buffers, but I don't remember the outcome. Too bad the list archives cannot be searched.
I think it was the MCP not doing overflows from RX buffer 1 to 0. I.e. if buffer 1 still has a frame on arrival, the new frame will be lost. That means losing a frame if the handler cannot react fast enough, but receiving out of order would be worse.
Regards, Michael
On 04.06.21 at 04:16, Mark Webb-Johnson wrote:
Michael,
Good suggestion on the timing. I think it best to use the same timings as the Arduino library, and have committed that change. No vehicle modules currently use 1Mbps on MCP2515 anyway. Unfortunately, it didn’t resolve my problem.
Looking at the error flags I see:
Error flag: 0x23401c01
intstat 0x23
    ERRIF Error Interrupt pending
    RX0IF Rx buffer 0 full interrupt
    RX1IF Rx buffer 1 full interrupt

errflag 0x40
    RX0OVR Rx buffer 0 overflow

intflag 0x1c01
    0x01 Implied from Rx buffer 0 full
    0x1c = 0001 1100
        Means RXB0 overflow. No data lost in this case (it went into RXB1)
        Means (errflag & EFLG_RX01OVR), clear RX buffer overflow flags
        Means (intstat & (CANINTF_MERRF | CANINTF_WAKIF | CANINTF_ERRIF)), clear error & wakeup interrupts
So we have CAN frames in BOTH rx buffers #0 and #1. Looking at our driver code (mcp2515::AsynchronousInterruptHandler), it seems in that case we only read from buffer #0. From the flow I can see, we are going to lose that second frame. We’re not really handling the issue of two frames being in the buffers when the interrupt handler is called.
As the architecture of mcp2515::AsynchronousInterruptHandler can only receive one frame, it is not so simple to fix. We could simply read and return the frame in buffer #0, requesting to be called again (return true), but another frame may arrive (into buffer #0) before we get called again, and that is going to result in out-of-order frames.
I’ll work on improving the handling of this case.
Regards, Mark.
On 3 Jun 2021, at 3:07 PM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
I'd give the bit timing a try first; the MCP2515 seems to be very sensitive to this. I've even had some trouble finding a working configuration for the 50 kbit timing I added a couple of weeks ago.
We currently use 00 / D0 / 82, which is also the result of the old Intrepid timing calculator. That's a propagation segment of 1 Tq and 3 Tq per phase, resulting in sampling between 50% and 62.5%.
The Arduino MCP CAN lib by Cory Fowler also had this previously, but then changed in…
https://github.com/coryjfowler/MCP_CAN_lib/commit/ece730cf697fef1cbe8a901116...
…to 00 / CA / 81, which is a propagation segment of 3 Tq and 2 Tq per phase, shifting the sampling window to 62.5 - 75%.
Our current configuration scheme for the internal SJA1000 compatible CAN seems to sample at 62.5 - 75% as well, so that would also match.
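The segment lengths quoted above can be checked with a small sketch that decodes CNF2 and CNF3 using the MCP2515 datasheet layout (PRSEG in CNF2 bits 0-2, PHSEG1 in CNF2 bits 3-5, PHSEG2 in CNF3 bits 0-2, each stored as length minus one). The names are hypothetical, and the 16 MHz oscillator in the usage note is an assumption, not something stated in this thread. Both CNF2 values have the SAM bit set (triple sampling), so the computed sample point is the upper end of the window quoted above.

```cpp
#include <cstdint>

// Segment lengths of one CAN bit, in time quanta (Tq).
struct BitTiming {
  int sync;  // synchronisation segment, always 1 Tq
  int prop;  // propagation segment
  int ps1;   // phase segment 1
  int ps2;   // phase segment 2

  int TotalTq() const { return sync + prop + ps1 + ps2; }

  // Sample point sits at the end of phase segment 1.
  double SamplePointPercent() const {
    return 100.0 * (sync + prop + ps1) / TotalTq();
  }
};

// Decode CNF2/CNF3. Assumes BTLMODE (CNF2 bit 7) is set, so PHSEG2 comes
// from CNF3; that holds for both D0 and CA.
BitTiming DecodeCnf(uint8_t cnf2, uint8_t cnf3) {
  BitTiming t;
  t.sync = 1;
  t.prop = (cnf2 & 0x07) + 1;
  t.ps1 = ((cnf2 >> 3) & 0x07) + 1;
  t.ps2 = (cnf3 & 0x07) + 1;
  return t;
}

// Bit rate for a given oscillator: Tq = 2 * (BRP + 1) / Fosc.
double BitRateHz(uint8_t cnf1, const BitTiming& t, double foscHz) {
  const int brp = cnf1 & 0x3f;
  const double tq = 2.0 * (brp + 1) / foscHz;
  return 1.0 / (tq * t.TotalTq());
}
```

DecodeCnf(0xD0, 0x82) gives 1 + 1 + 3 + 3 = 8 Tq with the sample point at 62.5%, and DecodeCnf(0xCA, 0x81) gives 1 + 3 + 2 + 2 = 8 Tq with the sample point at 75%. With CNF1 = 00 and an assumed 16 MHz oscillator, 8 Tq per bit works out to 1 Mbps.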
Regards, Michael
On 03.06.21 at 07:36, Mark Webb-Johnson wrote:
I’m working on an implementation of IP stack over CAN for the Tesla Roadster. IP frames are encoded as a length followed by a sequence of CAN frames, all on the same ID. This runs over a 1MHz bus, so presumably the traffic volume could be high at times.
I was having problems with this running on CAN2, so tried CAN1 and it worked perfectly. Here are some simple dumps of a single PING packet (and single PING response packet):
ID #111 is used to transmit an IP packet, and ID #110 is used to receive an IP packet. The special empty data frame is an acknowledgment.
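As a rough illustration of the framing described above, the sketch below splits an IP packet into CAN payloads: one length frame followed by chunks of up to 8 bytes. The 2-byte little-endian length is an assumption inferred from the "54 00" frame that precedes the 84-byte packets in the dumps; the function and type names are hypothetical, and the acknowledgment frame is not modelled.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using CanPayload = std::vector<uint8_t>;  // 0..8 data bytes of one CAN frame

// Split an IP packet into CAN payloads, all intended for the same CAN ID.
// Assumes packets shorter than 65536 bytes.
std::vector<CanPayload> SegmentIpPacket(const std::vector<uint8_t>& packet) {
  std::vector<CanPayload> frames;

  // Length frame: assumed 16-bit little-endian (84 bytes -> 54 00).
  const uint16_t len = static_cast<uint16_t>(packet.size());
  frames.push_back(CanPayload{static_cast<uint8_t>(len & 0xff),
                              static_cast<uint8_t>(len >> 8)});

  // Data frames: 8 bytes each, with a shorter final frame if needed.
  for (std::size_t off = 0; off < packet.size(); off += 8) {
    const std::size_t n = std::min<std::size_t>(8, packet.size() - off);
    const auto first = packet.begin() + static_cast<std::ptrdiff_t>(off);
    frames.emplace_back(first, first + static_cast<std::ptrdiff_t>(n));
  }
  return frames;
}
```

For an 84-byte ping packet this yields 12 frames: the length frame, ten full 8-byte frames, and a final 4-byte frame, which is the shape seen in the traffic dumps.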
Using latest master branch code (3.2.016-196-g0aad1a9f/ota_1/edge (build idf v3.3.4-848-g1ff5e24b1 Jun 2 2021 09:28:58)).
So, first let’s test with traffic on CAN1 (active, 1Mbps), and listening on CAN2 (listen, 1Mbps):
TCPDUMP:
05:57:55.980291 IP (tos 0x0, ttl 64, id 43101, offset 0, flags [DF], proto ICMP (1), length 84)
10.10.99.3 > 10.10.99.2: ICMP echo request, id 23372, seq 1, length 64
0x0000: 4500 0054 a85d 4000 4001 b832 0a0a 6303 E..T.]@.@..2..c.
0x0010: 0a0a 6302 0800 7df8 5b4c 0001 5361 b860 ..c...}.[L..Sa.`
0x0020: 19f5 0e00 0809 0a0b 0c0d 0e0f 1011 1213 ................
0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
0x0050: 3435 3637 4567
05:57:56.436190 IP (tos 0x0, ttl 64, id 14937, offset 0, flags [none], proto ICMP (1), length 84)
10.10.99.2 > 10.10.99.3: ICMP echo reply, id 23372, seq 1, length 64
0x0000: 4500 0054 3a59 0000 4001 6637 0a0a 6302 E..T:Y..@.f7..c.
0x0010: 0a0a 6303 0000 85f8 5b4c 0001 5361 b860 ..c.....[L..Sa.`
0x0020: 19f5 0e00 0809 0a0b 0c0d 0e0f 1011 1213 ................
0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
0x0050: 3435 3637 4567
Traffic (as shown on PC the other end of the can log tcp connection):
tx: #111 54 00
tx: #111 45 00 00 54 a8 5d 40 00
tx: #111 40 01 b8 32 0a 0a 63 03
tx: #111 0a 0a 63 02 08 00 7d f8
tx: #111 5b 4c 00 01 53 61 b8 60
tx: #111 19 f5 0e 00 08 09 0a 0b
tx: #111 0c 0d 0e 0f 10 11 12 13
tx: #111 14 15 16 17 18 19 1a 1b
tx: #111 1c 1d 1e 1f 20 21 22 23
tx: #111 24 25 26 27 28 29 2a 2b
tx: #111 2c 2d 2e 2f 30 31 32 33
tx: #111 34 35 36 37
rx: #110
rx: #110 54 00
rx: #110 45 00 00 54 3a 59 00 00
rx: #110 40 01 66 37 0a 0a 63 02
rx: #110 0a 0a 63 03 00 00 85 f8
rx: #110 5b 4c 00 01 53 61 b8 60
rx: #110 19 f5 0e 00 08 09 0a 0b
rx: #110 0c 0d 0e 0f 10 11 12 13
rx: #110 14 15 16 17 18 19 1a 1b
rx: #110 1c 1d 1e 1f 20 21 22 23
rx: #110 24 25 26 27 28 29 2a 2b
rx: #110 2c 2d 2e 2f 30 31 32 33
rx: #110 34 35 36 37
CAN1 active:
1T11 111 54 00
1R11 110
1CER TX_Queue T11 111 40 01 b8 32 0a 0a 63 03
1T11 111 45 00 00 54 a8 5d 40 00
1T11 111 40 01 b8 32 0a 0a 63 03
1CER TX_Queue T11 111 5b 4c 00 01 53 61 b8 60
1T11 111 0a 0a 63 02 08 00 7d f8
1T11 111 5b 4c 00 01 53 61 b8 60
1CER TX_Queue T11 111 0c 0d 0e 0f 10 11 12 13
1T11 111 19 f5 0e 00 08 09 0a 0b
1T11 111 0c 0d 0e 0f 10 11 12 13
1CER TX_Queue T11 111 1c 1d 1e 1f 20 21 22 23
1T11 111 14 15 16 17 18 19 1a 1b
1T11 111 1c 1d 1e 1f 20 21 22 23
1CER TX_Queue T11 111 2c 2d 2e 2f 30 31 32 33
1T11 111 24 25 26 27 28 29 2a 2b
1T11 111 2c 2d 2e 2f 30 31 32 33
1T11 111 34 35 36 37
1R11 110 54 00
1R11 110 45 00 00 54 3a 59 00 00
1R11 110 40 01 66 37 0a 0a 63 02
1R11 110 0a 0a 63 03 00 00 85 f8
1R11 110 5b 4c 00 01 53 61 b8 60
1R11 110 19 f5 0e 00 08 09 0a 0b
1R11 110 0c 0d 0e 0f 10 11 12 13
1R11 110 14 15 16 17 18 19 1a 1b
1R11 110 1c 1d 1e 1f 20 21 22 23
1R11 110 24 25 26 27 28 29 2a 2b
1R11 110 2c 2d 2e 2f 30 31 32 33
1R11 110 34 35 36 37
CAN2 listen:
2R11 111 54 00
2R11 110
2R11 111 45 00 00 54 a8 5d 40 00
2R11 111 40 01 b8 32 0a 0a 63 03
2R11 111 0a 0a 63 02 08 00 7d f8
2R11 111 5b 4c 00 01 53 61 b8 60
2R11 111 19 f5 0e 00 08 09 0a 0b
2R11 111 0c 0d 0e 0f 10 11 12 13
2R11 111 14 15 16 17 18 19 1a 1b
2R11 111 1c 1d 1e 1f 20 21 22 23
2R11 111 24 25 26 27 28 29 2a 2b
2R11 111 2c 2d 2e 2f 30 31 32 33
2R11 111 34 35 36 37
2R11 110 54 00
2CER Error intr=10 rxpkt=14 txpkt=0 errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0
2R11 110 40 01 66 37 0a 0a 63 02
2R11 110 19 f5 0e 00 08 09 0a 0b
2R11 110 34 35 36 37
2R11 110 45 00 00 54 3a 59 00 00
Conclusion is that the CAN1 traffic looks fine, and the PING packet gets a good reply. All successful. But the CAN2 listen is missing a few packets and the last packet is out of order.
Now, let’s test with traffic on CAN2 (active, 1Mbps), and listening on CAN1 (listen, 1Mbps):
TCPDUMP:
06:00:33.004060 IP (tos 0x0, ttl 64, id 58240, offset 0, flags [DF], proto ICMP (1), length 84)
10.10.99.3 > 10.10.99.2: ICMP echo request, id 23393, seq 1, length 64
0x0000: 4500 0054 e380 4000 4001 7d0f 0a0a 6303 E..T..@.@.}...c.
0x0010: 0a0a 6302 0800 7cc8 5b61 0001 f161 b860 ..c...|.[a...a.`
0x0020: 8b0f 0000 0809 0a0b 0c0d 0e0f 1011 1213 ................
0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
0x0050: 3435 3637 4567
Traffic (as shown on PC the other end of the can log tcp connection):
tx: #111 54 00
tx: #111 45 00 00 54 e3 80 40 00
tx: #111 40 01 7d 0f 0a 0a 63 03
tx: #111 0a 0a 63 02 08 00 7c c8
tx: #111 5b 61 00 01 f1 61 b8 60
tx: #111 8b 0f 00 00 08 09 0a 0b
tx: #111 0c 0d 0e 0f 10 11 12 13
tx: #111 14 15 16 17 18 19 1a 1b
tx: #111 1c 1d 1e 1f 20 21 22 23
tx: #111 24 25 26 27 28 29 2a 2b
tx: #111 2c 2d 2e 2f 30 31 32 33
tx: #111 34 35 36 37
rx: #110
rx: #110 54 00
rx: #110 45 00 00 54 3a 5a 00 00
rx: #110 0a 0a 63 03 00 00 84 c8
rx: #110 8b 0f 00 00 08 09 0a 0b
rx: #110 34 35 36 37
rx: #110 40 01 66 36 0a 0a 63 02
CAN2 active:
2T11 111 54 00
2R11 110
2CER TX_Queue T11 111 40 01 7d 0f 0a 0a 63 03
2T11 111 45 00 00 54 e3 80 40 00
2T11 111 40 01 7d 0f 0a 0a 63 03
2CER TX_Queue T11 111 5b 61 00 01 f1 61 b8 60
2T11 111 0a 0a 63 02 08 00 7c c8
2T11 111 5b 61 00 01 f1 61 b8 60
2CER TX_Queue T11 111 0c 0d 0e 0f 10 11 12 13
2T11 111 8b 0f 00 00 08 09 0a 0b
2T11 111 0c 0d 0e 0f 10 11 12 13
2CER TX_Queue T11 111 1c 1d 1e 1f 20 21 22 23
2T11 111 14 15 16 17 18 19 1a 1b
2T11 111 1c 1d 1e 1f 20 21 22 23
2CER TX_Queue T11 111 2c 2d 2e 2f 30 31 32 33
2T11 111 24 25 26 27 28 29 2a 2b
2T11 111 2c 2d 2e 2f 30 31 32 33
2T11 111 34 35 36 37
2R11 110 54 00
2R11 110 45 00 00 54 3a 5a 00 00
2CER Error intr=15 rxpkt=3 txpkt=12 errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=5 txfail=0 wdgreset=0 errreset=0
2R11 110 0a 0a 63 03 00 00 84 c8
2R11 110 8b 0f 00 00 08 09 0a 0b
2R11 110 34 35 36 37
2R11 110 40 01 66 36 0a 0a 63 02
CAN1 listen:
1R11 111 54 00
1R11 110
1R11 111 45 00 00 54 e3 80 40 00
1R11 111 40 01 7d 0f 0a 0a 63 03
1R11 111 0a 0a 63 02 08 00 7c c8
1R11 111 5b 61 00 01 f1 61 b8 60
1R11 111 8b 0f 00 00 08 09 0a 0b
1R11 111 0c 0d 0e 0f 10 11 12 13
1R11 111 14 15 16 17 18 19 1a 1b
1R11 111 1c 1d 1e 1f 20 21 22 23
1R11 111 24 25 26 27 28 29 2a 2b
1R11 111 2c 2d 2e 2f 30 31 32 33
1R11 111 34 35 36 37
1R11 110 54 00
1R11 110 45 00 00 54 3a 5a 00 00
1R11 110 40 01 66 36 0a 0a 63 02
1R11 110 0a 0a 63 03 00 00 84 c8
1R11 110 5b 61 00 01 f1 61 b8 60
1R11 110 8b 0f 00 00 08 09 0a 0b
1R11 110 0c 0d 0e 0f 10 11 12 13
1R11 110 14 15 16 17 18 19 1a 1b
1R11 110 1c 1d 1e 1f 20 21 22 23
1R11 110 24 25 26 27 28 29 2a 2b
1R11 110 2c 2d 2e 2f 30 31 32 33
1R11 110 34 35 36 37
Conclusion is that the CAN2 transmit traffic looks fine, but no PING reply received via CAN. The CAN1 listen shows the reply just fine.
Here is that last CAN1 listen, with timestamps:
1622696433.080107 1R11 111 54 00
1622696433.081657 1R11 110
1622696433.227479 1R11 111 45 00 00 54 e3 80 40 00
1622696433.228318 1R11 111 40 01 7d 0f 0a 0a 63 03
1622696433.245727 1R11 111 0a 0a 63 02 08 00 7c c8
1622696433.246214 1R11 111 5b 61 00 01 f1 61 b8 60
1622696433.248219 1R11 111 8b 0f 00 00 08 09 0a 0b
1622696433.248772 1R11 111 0c 0d 0e 0f 10 11 12 13
1622696433.250774 1R11 111 14 15 16 17 18 19 1a 1b
1622696433.251338 1R11 111 1c 1d 1e 1f 20 21 22 23
1622696433.253380 1R11 111 24 25 26 27 28 29 2a 2b
1622696433.253944 1R11 111 2c 2d 2e 2f 30 31 32 33
1622696433.265937 1R11 111 34 35 36 37
1622696433.269221 1R11 110 54 00
1622696433.272095 1R11 110 45 00 00 54 3a 5a 00 00
1622696433.272125 1R11 110 40 01 66 36 0a 0a 63 02
1622696433.272156 1R11 110 0a 0a 63 03 00 00 84 c8
1622696433.272193 1R11 110 5b 61 00 01 f1 61 b8 60
1622696433.272245 1R11 110 8b 0f 00 00 08 09 0a 0b
1622696433.272277 1R11 110 0c 0d 0e 0f 10 11 12 13
1622696433.272314 1R11 110 14 15 16 17 18 19 1a 1b
1622696433.272354 1R11 110 1c 1d 1e 1f 20 21 22 23
1622696433.272387 1R11 110 24 25 26 27 28 29 2a 2b
1622696433.272420 1R11 110 2c 2d 2e 2f 30 31 32 33
1622696433.272452 1R11 110 34 35 36 37
It is 1Mbps, with 30us or so between each packet. This is the *only* traffic on the bus. Everything else is turned off. Roughly 12 packets each way. Surely even if we were hitting a performance limit, our buffers can handle 12 packets?
The good news is that I have a good environment to replicate this issue now, so any fix should be easy to test.
I haven’t worked on the MCP2515 driver in our code in a long time, but it certainly seems something is messed up and that could be badly affecting vehicle modules using anything other than CAN1.
I will start to look at this over the weekend, but has anyone got any ideas/suggestions? Perhaps the bit timing registers are off by a small amount (so it works on CAN1 but not on CAN2)? Or something more serious in our driver?
Regards, Mark.
_______________________________________________
OvmsDev mailing list
OvmsDev@lists.openvehicles.com
http://lists.openvehicles.com/mailman/listinfo/ovmsdev
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
<MCP2515Calc-1000kbit.ods>
I don’t know the cause of the HUD bus hang, so really not sure if it is resolved or not. The SPI driver is the standard ESP IDF one now, so should be stable. But what was the state of the bus when it hung? Perhaps some error condition set in the MCP2515?
Regards, Mark
On 4 Jan 2022, at 10:46 AM, Greg D <gregd2350@gmail.com> wrote:
Hi Mark,
Trying this again... (got a 554 blocked reply the first time). Apologies if it's a duplicate.
Will this fix the bus hangs we have with the HUD devices? Ref: "Can buses stop after some time" email thread from 5/28/2019... As I wrote at the time, a 100% way to reproduce the hang was to have a HUD device running, then start the OBD2ECU task. I haven't used my HUD in some time, so I don't know if this was fixed since then. If not, that was a very quick and easy way to reproduce the issue.
Happy New Year!
Greg
On Mon, Jan 3, 2022 at 5:10 PM Mark Webb-Johnson <mark@webb-johnson.net> wrote:
An update on this.
Working with another developer, we have made some changes in a ’spimaster’ branch:
Stop using spi_nodma fork of ESP’s standard spi code, and switch back to use the standard ESP IDF spi master.
To support >3 devices (which the ESP IDF spi master doesn’t, due to hardware limitations of the CS line and 3x DMA channels), change to use a software CS line for the MAX7317 driver (the MCP2515 continues to use hardware CS).
Confirm the changes to our MCP2515 driver related to keeping track of the last buffer read, to solve the out-of-order issue.
Confirm the fix for another related issue where we don’t block (delay) if the can tx queue is full.
These seem better now, and I am able to establish a CAN IP connection over MCP2515. Frames come in order, and we are seeing performance around 700 frames/second - which should be adequate for our needs.
I’ll do some more testing over the next few days, and if no issues found merge back to master.
Regards, Mark.
On 7 Jun 2021, at 12:16 AM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
I've just found a spot-on post on this issue:
https://www.microchip.com/forums/tm.aspx?m=620741
Tom suggests implementing a state machine to reproduce the receive order. His analysis & solution looks sound to me.
Regards, Michael
On 06.06.21 at 14:50, Mark Webb-Johnson wrote:
I spent quite a bit of time on this. With my standard test packet of 11 CAN frames expected, and the standard driver, I get perhaps 4 or 5 of them (about half are lost, and some are out of order).
I made the suggested change to move the MyCan.IncomingFrame() call out of the ‘can’ object (when frameReceived is true) to within the mcp2515 AsynchronousInterruptHandler itself. That allows the handler to receive more than one frame per call and is a very simple change. Once that is done, we can at least now try to tune it.
So I then modified the code of mcp2515 AsynchronousInterruptHandler to loop so long as the interrupt flag says either buffer #0 or #1 has a frame. The result looks something like this:
D (63192) mcp2515: AsynchronousInterruptHandler instat=01 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=54)
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=40 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=40)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=45)
D (63192) mcp2515: AsynchronousInterruptHandler Clear RX buffer #0 overflow flag
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=24)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=34)
D (63192) mcp2515: AsynchronousInterruptHandler instat=20 errflag=00 txb0ctrl=00
The actual frames on the bus are (B1 values) 54, 45, 40, 0a, 7a, d5, 0c, 14, 1c, 24, 2c, and 34. Looking at the above debug output, we get:
Interrupt flags show buffer #0 has a frame. It is B1=54. Good.
Interrupt flags show buffers #0 and #1 both have frames. Buffer #0 has B1=40 and buffer #1 has B1=45.
Etc etc
That is not good. What must have happened is that the first B1=54 frame arrived, got put in buffer #0, and the interrupt was raised. We checked the interrupt flags, found buffer #0 had something, and read the frame ok. All is good. But between the time we checked the interrupt flags and the time we finished reading the 13 bytes from buffer #0, a second frame arrived and was put in buffer #1. Then a third frame arrived and was put in buffer #0. We loop back to check the interrupt flags and find both buffers have frames ready. So we read buffer #0 to get the third frame, then buffer #1 to get the second frame. We are out of sequence.
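The sequence just described can be replayed with a toy model. This is only a sketch of the scenario above, not driver code: the two-slot struct, the rollover rule (a frame goes to buffer #0 if it is free, otherwise to buffer #1) and all names are simplifications made up for illustration, and frames are identified by their B1 value.

```cpp
#include <vector>

// One receive buffer: either empty or holding a frame (identified by B1).
struct RxSlot {
  bool full = false;
  int frame = 0;
};

// Toy model of the two MCP2515 receive buffers with rollover enabled.
struct RxBuffers {
  RxSlot rxb0, rxb1;
  int lost = 0;

  void Arrive(int frame) {
    if (!rxb0.full) { rxb0.full = true; rxb0.frame = frame; }
    else if (!rxb1.full) { rxb1.full = true; rxb1.frame = frame; }  // rollover
    else { ++lost; }  // both buffers occupied, frame is dropped
  }
};

// Replays the scenario above with a handler that always reads #0 before #1.
std::vector<int> ReplayRace() {
  RxBuffers mcp;
  std::vector<int> delivered;

  mcp.Arrive(0x54);                    // frame 1 lands in buffer #0, interrupt raised
  const int inFlight = mcp.rxb0.frame; // handler starts its SPI read of buffer #0
  mcp.Arrive(0x45);                    // frame 2 arrives mid-read, rolls over to #1
  mcp.rxb0.full = false;               // read completes, buffer #0 is released
  delivered.push_back(inFlight);
  mcp.Arrive(0x40);                    // frame 3 lands in the now-free buffer #0

  // Next loop iteration: both flags are set, and #0 is read first.
  if (mcp.rxb0.full) { delivered.push_back(mcp.rxb0.frame); mcp.rxb0.full = false; }
  if (mcp.rxb1.full) { delivered.push_back(mcp.rxb1.frame); mcp.rxb1.full = false; }
  return delivered;
}
```

ReplayRace() delivers 54, 40, 45 although the bus order was 54, 45, 40, which is the same swap seen in the debug output. No frame is lost in this replay; only the order is wrong.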
By removing the ESP_LOGD statements, I can improve performance enough to get 10 out of the 11 frames, but still sometimes frames are swapped in order.
By over-clocking the MCP2515 SPI bus (supposed to be 10MHz, but I push it to 15MHz), I can get all 11 frames, but two are out of order.
I suppose I can minimise the chance of the out-of-order issue by repeating the call to read interrupt flags after processing buffer #0 but before checking for buffer #1. That would at least reduce the time window to as small as possible, but would be another SPI call and is too slow. Doing that brings us back to losing frames.
Another approach may relate to our current use of the READ command to read 5 status registers (interrupt flags, error flags, two skipped, then transmit buffer #0 flags). There are two specific commands ‘read status’ (which gets the rx and tx buffer status flags in one byte), and ‘rx status’ (which gets just the receive buffer status and some info on the frames received, again in one byte). I think those are more designed for what we are trying to do. I can try to optimise the read loop at the start of the AsynchronousInterruptHandler to use one of those - they are 2 SPI bytes vs 7 for what we are doing at the moment (so more than three times as fast).
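To put rough numbers on the byte counts above, here is a back-of-envelope sketch. It only counts clocked bits at the nominal 10MHz SPI rate and ignores chip-select setup, driver overhead and task scheduling, so real figures will be worse. The names are hypothetical.

```cpp
// Time on the wire for an SPI transfer of the given size, in microseconds.
// Ignores chip-select handling and any software overhead.
constexpr double SpiTransferMicros(int bytes, double clockHz) {
  return bytes * 8.0 * 1e6 / clockHz;
}

constexpr double kSpiClockHz = 10e6;  // nominal MCP2515 SPI clock from this thread

// Current approach: READ command covering 5 status registers (7 SPI bytes).
constexpr double kRegisterReadUs = SpiTransferMicros(7, kSpiClockHz);

// Alternative: single 'read status' or 'rx status' command (2 SPI bytes).
constexpr double kStatusCommandUs = SpiTransferMicros(2, kSpiClockHz);

// Reading the 13 bytes of one receive buffer.
constexpr double kRxBufferReadUs = SpiTransferMicros(13, kSpiClockHz);
```

Under these assumptions the register read costs about 5.6us and the status command about 1.6us, a factor of 3.5, while the 13 buffer bytes take about 10.4us.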
I think it will also be worthwhile having a look at some other open source mcp2515 drivers to see how other people are doing it.
Regards, Mark.
On 4 Jun 2021, at 3:02 PM, Mark Webb-Johnson <mark@webb-johnson.net> wrote:
The handler can only return one frame. As it is, if both buffers #0 and #1 have a frame, it returns #0. I am not sure if it gets called again (seems to depend on the interrupt gpio status).
// Read the interrupt pin status and if it's still active (low), require another interrupt handling iteration
return !gpio_get_level((gpio_num_t)m_intpin);
Maybe a quick solution is to just return true, immediately after *frameReceived=true, if intflag=0x01 and (intstat & CANINTF_RX1IF)? That would dispatch the incoming frame, then call back for more (from the loop in the can object).
I am not sure in general why AsynchronousInterruptHandler uses a bool frameReceived flag, and doesn’t just simply dispatch the frame immediately to the can object? That would simplify things and allow the AsynchronousInterruptHandler to handle receiving both frames in the same call. Given that MCP2515 is the only driver using AsynchronousInterruptHandler, that would be an easy fix.
Regards, Mark.
On 4 Jun 2021, at 2:29 PM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
the handler is meant to read both buffers sequentially, and on a quick glance I don't see why it wouldn't. But it can't hurt if you do an audit of the code.
I remember having had that out-of-order discussion when handling both RX buffers before here, but don't remember the outcome. Too bad the list archives cannot be searched.
I think it was the MCP not doing overflows from RX buffer 1 to 0. I.e. if buffer 1 still has a frame on arrival, the new frame will be lost. That means losing a frame if the handler cannot react fast enough, but receiving out of order would be worse.
Regards, Michael
On 04.06.21 at 04:16, Mark Webb-Johnson wrote:
Michael,
Good suggestion on the timing. I think it best to use the same timings as the Arduino library, and have committed that change. No vehicle modules currently use 1Mbps on MCP2515 anyway. Unfortunately, it didn’t resolve my problem.
Looking at the error flags I see:
Error flag: 0x23401c01
intstat 0x23
  ERRIF Error Interrupt pending
  RX0IF Rx buffer 0 full interrupt
  RX1IF Rx buffer 1 full interrupt
errflag 0x40
  RX0OVR Rx buffer 0 overflow
intflag 0x1c01
  0x01 Implied from Rx buffer 0 full
  0x1c = 0001 1100
    Means RXB0 overflow. No data lost in this case (it went into RXB1)
    Means (errflag & EFLG_RX01OVR), clear RX buffer overflow flags
    Means (intstat & (CANINTF_MERRF | CANINTF_WAKIF | CANINTF_ERRIF)), clear error & wakeup interrupts
So we have CAN frames in BOTH rx buffers #0 and #1. Looking at our driver code (mcp2515::AsynchronousInterruptHandler), it seems in that case we only read from buffer #0. From the flow I can see, we are going to lose that second frame. We’re not really handling the issue of two frames being in the buffers when the interrupt handler is called.
As the architecture of mcp2515::AsynchronousInterruptHandler can only receive one frame, it is not so simple to fix. We could simply read and return the frame in buffer #0, requesting to be called again (return true), but another frame may arrive (into buffer #0) before we get called again, and that is going to result in out-of-order frames.
I’ll work on improving the handling of this case.
Regards, Mark.
No error flags as I recall. It just stopped receiving frames. I tried to figure out how to clear it in the driver, but never succeeded. The workaround was to cycle the ext12v, restarting obd2ecu between off and on, triggered by a Car-on event.
Greg
On January 3, 2022 6:54:22 PM PST, Mark Webb-Johnson <mark@webb-johnson.net> wrote:
I don’t know the cause of the HUD bus hang, so really not sure if it is resolved or not. The SPI driver is the standard ESP IDF one now, so should be stable. But what was the state of the bus when it hung? Perhaps some error condition set in the MCP2515?
Regards, Mark
On 4 Jan 2022, at 10:46 AM, Greg D <gregd2350@gmail.com> wrote:
Hi Mark,
Trying this again... (got a 554 blocked reply the first time). Apologies if it's a duplicate.
Will this fix the bus hangs we have with the HUD devices? Ref: "Can buses stop after some time" email thread from 5/28/2019... As I wrote at the time, a 100% way to reproduce the hang was to have a HUD device running, then start the OBD2ECU task. I haven't used my HUD in some time, so I don't know if this was fixed since then. If not, that was a very quick and easy way to reproduce the issue.
Happy New Year!
Greg
On Mon, Jan 3, 2022 at 5:10 PM Mark Webb-Johnson <mark@webb-johnson.net <mailto:mark@webb-johnson.net>> wrote: An update on this.
Working with another developer, we have made some changes in a ‘spimaster’ branch:
Stop using the spi_nodma fork of ESP’s standard SPI code, and switch back to the standard ESP IDF SPI master.
To support more than 3 devices (which the ESP IDF SPI master doesn’t, due to hardware limitations of the CS lines and 3x DMA channels), change the MAX7317 driver to use a software CS line (the MCP2515s continue to use hardware CS).
Confirm the changes to our MCP2515 driver related to keeping track of the last buffer read, to solve the out-of-order issue.
Confirm the fix for another related issue where we don’t block (delay) if the CAN TX queue is full.
These seem better now, and I am able to establish a CAN IP connection over MCP2515. Frames come in order, and we are seeing performance around 700 frames/second - which should be adequate for our needs.
I’ll do some more testing over the next few days, and if no issues found merge back to master.
Regards, Mark.
On 7 Jun 2021, at 12:16 AM, Michael Balzer <dexter@expeedo.de <mailto:dexter@expeedo.de>> wrote:
Mark,
I've just found a spot-on post on this issue:
https://www.microchip.com/forums/tm.aspx?m=620741 <https://www.microchip.com/forums/tm.aspx?m=620741>
Tom suggests implementing a state machine to reproduce the receive order. His analysis & solution look sound to me.
Regards, Michael
On 06.06.21 at 14:50, Mark Webb-Johnson wrote:
I spent quite a bit of time on this. With my standard test packet of 11 CAN frames expected, and the standard driver, I get perhaps 4 or 5 of them (about half are lost, and some are out of order).
I made the suggested change to move the MyCan.IncomingFrame() call out of the ‘can’ object (when frameReceived is true) to within the mcp2515 AsynchronousInterruptHandler itself. That allows the handler to receive more than one frame per call and is a very simple change. Once that is done, we can at least now try to tune it.
So I then modified the code of mcp2515 AsynchronousInterruptHandler to loop so long as the interrupt flag says either buffer #0 or #1 has a frame. The result looks something like this:
D (63192) mcp2515: AsynchronousInterruptHandler instat=01 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=54)
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=40 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=40)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=45)
D (63192) mcp2515: AsynchronousInterruptHandler Clear RX buffer #0 overflow flag
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=24)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=34)
D (63192) mcp2515: AsynchronousInterruptHandler instat=20 errflag=00 txb0ctrl=00
The actual frames on the bus are (B1 values) 54, 45, 40, 0a, 7a, d5, 0c, 14, 1c, 24, 2c, and 34. Looking at the above debug output, we get:
Interrupt flags show buffer #0 has a frame. It is B1=54. Good.
Interrupt flags show buffers #0 and #1 both have frames. Buffer #0 has B1=40 and buffer #1 has B1=45.
Etc.
That is not good. What must have happened is that the first B1=54 frame arrived, was put in buffer #0, and the interrupt was raised. We checked the interrupt flags, found buffer #0 had something, and read the frame OK. All is good. But between the time we checked the interrupt flags and the time we finished reading the 13 bytes from buffer #0, a second frame arrived and was put in buffer #1. Then a third frame arrived and was put in buffer #0. We loop back to check the interrupt flags and find both buffers have frames ready. So we read buffer #0 to get the third frame, then buffer #1 to get the second frame. We are out of sequence.
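The sequence described above (54 read from buffer #0, then 45 landing in buffer #1 and 40 in buffer #0) can be untangled if the handler remembers which buffer it drained last. The following is only a minimal sketch of that idea, not the actual OVMS driver code: `ReadPlan`, `PlanReads` and `kNoBufferRead` are hypothetical names made up for illustration, and the rule is a heuristic, since the flags alone cannot always prove which frame is older.

```cpp
#include <array>

// Hypothetical helper (not from the OVMS source): decide which MCP2515 receive
// buffers to read, and in which order, from the RX-full flags and the buffer
// that was drained most recently.
struct ReadPlan {
  std::array<int, 2> order;  // buffer indices to read, in sequence
  int count;                 // how many entries of `order` are valid
};

constexpr int kNoBufferRead = -1;  // marker: nothing read yet / unused slot

ReadPlan PlanReads(bool rx0Full, bool rx1Full, int lastRead) {
  if (rx0Full && rx1Full) {
    // Both buffers are full. If buffer #0 was the last one drained, the frame
    // now in buffer #1 most likely rolled over while #0 was still occupied,
    // so it is assumed to be the older one. This is a heuristic only.
    if (lastRead == 0) return {{1, 0}, 2};
    return {{0, 1}, 2};
  }
  if (rx0Full) return {{0, kNoBufferRead}, 1};
  if (rx1Full) return {{1, kNoBufferRead}, 1};
  return {{kNoBufferRead, kNoBufferRead}, 0};
}
```

Applied to the log above: the first pass sees only buffer #0 full and reads B1=54; the second pass sees both buffers full with buffer #0 read last, so it would read buffer #1 (B1=45) before buffer #0 (B1=40), restoring the bus order.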
By removing the ESP_LOGD statements, I can improve performance enough to get 10 out of the 11 frames, but still sometimes frames are swapped in order.
By over-clocking the MCP2515 SPI bus (supposed to be 10MHz, but I push it to 15MHz), I can get all 11 frames, but two are out of order.
I suppose I can minimise the chance of the out-of-order issue by repeating the call to read interrupt flags after processing buffer #0 but before checking for buffer #1. That would at least reduce the time window to as small as possible, but would be another SPI call and is too slow. Doing that brings us back to losing frames.
Another approach may relate to our current use of the READ command to read 5 status registers (interrupt flags, error flags, two skipped, then transmit buffer #0 flags). There are two specific commands ‘read status’ (which gets the rx and tx buffer status flags in one byte), and ‘rx status’ (which gets just the receive buffer status and some info on the frames received, again in one byte). I think those are more designed for what we are trying to do. I can try to optimise the read loop at the start of the AsynchronousInterruptHandler to use one of those - they are 2 SPI bytes vs 7 for what we are doing at the moment (so more than three times as fast).
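As a rough illustration of the two one-byte status commands mentioned above, here is a sketch of how their result bytes could be decoded. The instruction values and bit positions are as I read them in the MCP2515 datasheet and should be checked against it before use; `RxPending`, `DecodeRxStatus` and `DecodeReadStatus` are hypothetical names, not functions from our driver.

```cpp
#include <cstdint>

// SPI instruction bytes (per the MCP2515 datasheet, to be verified).
constexpr uint8_t kCmdRead       = 0x03;  // READ: followed by a register address
constexpr uint8_t kCmdReadStatus = 0xA0;  // READ STATUS: one status byte back
constexpr uint8_t kCmdRxStatus   = 0xB0;  // RX STATUS: one status byte back

// Hypothetical result type for this sketch.
struct RxPending {
  bool rx0;  // a frame is waiting in receive buffer #0
  bool rx1;  // a frame is waiting in receive buffer #1
};

// RX STATUS: bits 7:6 say which receive buffers hold a message
// (01 = RXB0, 10 = RXB1, 11 = both).
RxPending DecodeRxStatus(uint8_t status) {
  return {(status & 0x40) != 0, (status & 0x80) != 0};
}

// READ STATUS: bit 0 mirrors CANINTF.RX0IF, bit 1 mirrors CANINTF.RX1IF.
RxPending DecodeReadStatus(uint8_t status) {
  return {(status & 0x01) != 0, (status & 0x02) != 0};
}

// Bytes clocked over SPI per poll: the current READ of 5 registers needs
// instruction + address + 5 data bytes; a status command needs
// instruction + 1 data byte.
constexpr int kBytesRegisterRead = 1 + 1 + 5;
constexpr int kBytesStatusRead   = 1 + 1;
```

That gives 7 bytes versus 2 per poll, i.e. a factor of 3.5, which is where "more than three times as fast" comes from. Note that neither status byte carries the error flags, so overflow handling would still need a separate register read.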
I think it will also be worthwhile having a look at some other open source mcp2515 drivers to see how other people are doing it.
Regards, Mark.
On 4 Jun 2021, at 3:02 PM, Mark Webb-Johnson <mark@webb-johnson.net <mailto:mark@webb-johnson.net>> wrote:
The handler can only return one frame. As it is, if both buffers #0 and #1 have a frame, it returns #0. I am not sure if it gets called again (seems to depend on the interrupt gpio status).
// Read the interrupt pin status and if it's still active (low), require another interrupt handling iteration
return !gpio_get_level((gpio_num_t)m_intpin);
Maybe a quick solution is to just return true, immediately after *frameReceived=true, if intflag=0x01 and (intstat & CANINTF_RX1IF)? That would dispatch the incoming frame, then call back for more (from the loop in the can object).
I am not sure in general why AsynchronousInterruptHandler uses a bool frameReceived flag, and doesn’t just simply dispatch the frame immediately to the can object? That would simplify things and allow the AsynchronousInterruptHandler to handle receiving both frames in the same call. Given that MCP2515 is the only driver using AsynchronousInterruptHandler, that would be an easy fix.
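To make the "dispatch immediately" idea concrete, here is a small self-contained sketch. It does not use the real driver classes: `Frame`, `FakeMcp`, `Dispatch` and `HandleInterrupt` are all hypothetical stand-ins, with a fake chip object in place of the SPI calls, so the only point shown is that a handler which dispatches through a callback can drain both buffers in a single call.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical frame type for this sketch (the real driver has its own).
struct Frame {
  uint32_t id;
  uint8_t b1;  // first data byte, as printed in the debug log
};

// Hypothetical stand-in for the chip: two receive buffers with "full" flags.
struct FakeMcp {
  bool full[2] = {false, false};
  Frame buf[2] = {};

  bool Pending(int n) const { return full[n]; }
  Frame ReadAndRelease(int n) {
    full[n] = false;  // reading a buffer frees it for the next frame
    return buf[n];
  }
};

using Dispatch = std::function<void(const Frame&)>;

// Keep looping while either buffer reports a frame, handing each frame
// straight to `dispatch` instead of returning a single frame to the caller.
// Returns the number of frames dispatched.
int HandleInterrupt(FakeMcp& chip, const Dispatch& dispatch) {
  int handled = 0;
  while (chip.Pending(0) || chip.Pending(1)) {
    for (int n = 0; n < 2; ++n) {
      if (chip.Pending(n)) {
        dispatch(chip.ReadAndRelease(n));
        ++handled;
      }
    }
  }
  return handled;
}
```

This always reads buffer #0 before buffer #1, so by itself it does nothing about ordering when frames arrive during the reads; it only removes the one-frame-per-call limit.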
Regards, Mark.
On 4 Jun 2021, at 2:29 PM, Michael Balzer <dexter@expeedo.de <mailto:dexter@expeedo.de>> wrote:
Mark,
The handler is meant to read both buffers sequentially, and at a quick glance I don't see why it wouldn't. But it can't hurt if you do an audit of the code.
I remember we had that out-of-order discussion here before, when handling both RX buffers, but I don't remember the outcome. Too bad the list archives cannot be searched.
I think it was the MCP not doing overflows from RX buffer 1 to 0. I.e. if buffer 1 still has a frame on arrival, the new frame will be lost. That means losing a frame if the handler cannot react fast enough, but receiving out of order would be worse.
Regards, Michael
On 04.06.21 at 04:16, Mark Webb-Johnson wrote:
> Michael,
>
> Good suggestion on the timing. I think it best to use the same timings as the Arduino library, and have committed that change. No vehicle modules currently use 1Mbps on MCP2515 anyway. Unfortunately, it didn’t resolve my problem.
>
> Looking at the error flags I see:
>
> Error flag: 0x23401c01
>
> intstat 0x23
> ERRIF Error Interrupt pending
> RX0IF Rx buffer 0 full interrupt
> RX1IF Rx buffer 1 full interrupt
>
> errflag 0x40
> RX0OVR Rx buffer 0 overflow
>
> intflag 0x1c01
> 0x01 Implied from Rx buffer 0 full
>
> 0x1c = 0001 1100
> Means RXB0 overflow. No data lost in this case (it went into RXB1)
> Means (errflag & EFLG_RX01OVR), clear RX buffer overflow flags
> Means (intstat & (CANINTF_MERRF | CANINTF_WAKIF | CANINTF_ERRIF)), clear error & wakeup interrupts
>
> So we have CAN frames in BOTH rx buffers #0 and #1. Looking at our driver code (mcp2515::AsynchronousInterruptHandler), it seems in that case we only read from buffer #0. From the flow I can see, we are going to lose that second frame. We’re not really handling the issue of two frames being in the buffers when the interrupt handler is called.
>
> As the architecture of mcp2515::AsynchronousInterruptHandler can only receive one frame, it is not so simple to fix. We could simply read and return the frame in buffer #0, requesting to be called again (return true), but another frame may arrive (into buffer #0) before we get called again, and that is going to result in out-of-order frames.
>
> I’ll work on improving the handling of this case.
>
> Regards, Mark.
>
>> On 3 Jun 2021, at 3:07 PM, Michael Balzer <dexter@expeedo.de <mailto:dexter@expeedo.de>> wrote:
>>
>> Mark,
>>
>> I'd give the bit timing a try first, the MCP2515 seems to be very sensitive to this. I've even had some trouble finding a working configuration for the 50 kbit timing I added a couple of weeks ago.
>>
>> We currently use 00 / D0 / 82 which is also the result of the old Intrepid timing calculator. That's a propagation segment of 1 Tq and 3 Tq per phase, resulting in sampling between 50% - 62.5%.
>>
>> The Arduino MCP CAN lib by Cory Fowler also had this previously, but then changed in…
>>
>> https://github.com/coryjfowler/MCP_CAN_lib/commit/ece730cf697fef1cbe8a90111694868168d41000
>>
>> …to 00 / CA / 81, which is a propagation segment of 3 Tq and 2 Tq per phase, shifting the sampling window to 62.5 - 75%.
>>
>> Our current configuration scheme for the internal SJA1000 compatible CAN seems to sample at 62.5 - 75% as well, so that would also match.
>>
>> Regards,
>> Michael
>>
>>
>> On 03.06.21 at 07:36, Mark Webb-Johnson wrote:
>>>
>>> I’m working on an implementation of IP stack over CAN for the Tesla Roadster. IP frames are encoded as a length followed by a sequence of CAN frames, all on the same ID. This runs over a 1MHz bus, so presumably the traffic volume could be high at times.
>>>
>>> I was having problems with this running on CAN2, so tried CAN1 and it worked perfectly. Here are some simple dumps of a single PING packet (and single PING response packet):
>>>
>>> ID #111 is used to transmit an IP packet, and ID #110 is used to receive an IP packet. The special empty data frame is an acknowledgment.
>>>
>>> Using latest master branch code (3.2.016-196-g0aad1a9f/ota_1/edge (build idf v3.3.4-848-g1ff5e24b1 Jun 2 2021 09:28:58)).
>>>
>>> So, first let’s test with traffic on CAN1 (active, 1Mbps), and listening on CAN2 (listen, 1Mbps):
>>>
>>> TCPDUMP:
>>>
>>> 05:57:55.980291 IP (tos 0x0, ttl 64, id 43101, offset 0, flags [DF], proto ICMP (1), length 84)
>>> 10.10.99.3 > 10.10.99.2: ICMP echo request, id 23372, seq 1, length 64
>>> 0x0000: 4500 0054 a85d 4000 4001 b832 0a0a 6303 E..T.]@.@..2..c.
>>> 0x0010: 0a0a 6302 0800 7df8 5b4c 0001 5361 b860 ..c...}.[L..Sa.` >>> 0x0020: 19f5 0e00 0809 0a0b 0c0d 0e0f 1011 1213 ................ >>> 0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"# >>> 0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123 >>> 0x0050: 3435 3637 4567 >>> >>> 05:57:56.436190 IP (tos 0x0, ttl 64, id 14937, offset 0, flags [none], proto ICMP (1), length 84) >>> 10.10.99.2 > 10.10.99.3 <http://10.10.99.3/>: ICMP echo reply, id 23372, seq 1, length 64 >>> 0x0000: 4500 0054 3a59 0000 4001 6637 0a0a 6302 E..T:Y..@.f7..c <mailto:E..T:Y..@.f7..c>. >>> 0x0010: 0a0a 6303 0000 85f8 5b4c 0001 5361 b860 ..c.....[L..Sa.` >>> 0x0020: 19f5 0e00 0809 0a0b 0c0d 0e0f 1011 1213 ................ >>> 0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"# >>> 0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123 >>> 0x0050: 3435 3637 4567 >>> >>> Traffic (as shown on PC the other end of the can log tcp connection): >>> >>> tx: #111 54 00 >>> tx: #111 45 00 00 54 a8 5d 40 00 >>> tx: #111 40 01 b8 32 0a 0a 63 03 >>> tx: #111 0a 0a 63 02 08 00 7d f8 >>> tx: #111 5b 4c 00 01 53 61 b8 60 >>> tx: #111 19 f5 0e 00 08 09 0a 0b >>> tx: #111 0c 0d 0e 0f 10 11 12 13 >>> tx: #111 14 15 16 17 18 19 1a 1b >>> tx: #111 1c 1d 1e 1f 20 21 22 23 >>> tx: #111 24 25 26 27 28 29 2a 2b >>> tx: #111 2c 2d 2e 2f 30 31 32 33 >>> tx: #111 34 35 36 37 >>> >>> rx: #110 >>> rx: #110 54 00 >>> rx: #110 45 00 00 54 3a 59 00 00 >>> rx: #110 40 01 66 37 0a 0a 63 02 >>> rx: #110 0a 0a 63 03 00 00 85 f8 >>> rx: #110 5b 4c 00 01 53 61 b8 60 >>> rx: #110 19 f5 0e 00 08 09 0a 0b >>> rx: #110 0c 0d 0e 0f 10 11 12 13 >>> rx: #110 14 15 16 17 18 19 1a 1b >>> rx: #110 1c 1d 1e 1f 20 21 22 23 >>> rx: #110 24 25 26 27 28 29 2a 2b >>> rx: #110 2c 2d 2e 2f 30 31 32 33 >>> rx: #110 34 35 36 37 >>> >>> CAN1 active: >>> >>> 1T11 111 54 00 >>> 1R11 110 >>> >>> 1CER TX_Queue T11 111 40 01 b8 32 0a 0a 63 03 >>> 1T11 111 45 00 00 54 a8 5d 40 00 >>> 1T11 111 40 01 b8 32 
0a 0a 63 03 >>> 1CER TX_Queue T11 111 5b 4c 00 01 53 61 b8 60 >>> 1T11 111 0a 0a 63 02 08 00 7d f8 >>> 1T11 111 5b 4c 00 01 53 61 b8 60 >>> 1CER TX_Queue T11 111 0c 0d 0e 0f 10 11 12 13 >>> 1T11 111 19 f5 0e 00 08 09 0a 0b >>> 1T11 111 0c 0d 0e 0f 10 11 12 13 >>> 1CER TX_Queue T11 111 1c 1d 1e 1f 20 21 22 23 >>> 1T11 111 14 15 16 17 18 19 1a 1b >>> 1T11 111 1c 1d 1e 1f 20 21 22 23 >>> 1CER TX_Queue T11 111 2c 2d 2e 2f 30 31 32 33 >>> 1T11 111 24 25 26 27 28 29 2a 2b >>> 1T11 111 2c 2d 2e 2f 30 31 32 33 >>> 1T11 111 34 35 36 37 >>> >>> 1R11 110 54 00 >>> 1R11 110 45 00 00 54 3a 59 00 00 >>> 1R11 110 40 01 66 37 0a 0a 63 02 >>> 1R11 110 0a 0a 63 03 00 00 85 f8 >>> 1R11 110 5b 4c 00 01 53 61 b8 60 >>> 1R11 110 19 f5 0e 00 08 09 0a 0b >>> 1R11 110 0c 0d 0e 0f 10 11 12 13 >>> 1R11 110 14 15 16 17 18 19 1a 1b >>> 1R11 110 1c 1d 1e 1f 20 21 22 23 >>> 1R11 110 24 25 26 27 28 29 2a 2b >>> 1R11 110 2c 2d 2e 2f 30 31 32 33 >>> 1R11 110 34 35 36 37 >>> >>> CAN2 listen: >>> >>> 2R11 111 54 00 >>> 2R11 110 >>> >>> 2R11 111 45 00 00 54 a8 5d 40 00 >>> 2R11 111 40 01 b8 32 0a 0a 63 03 >>> 2R11 111 0a 0a 63 02 08 00 7d f8 >>> 2R11 111 5b 4c 00 01 53 61 b8 60 >>> 2R11 111 19 f5 0e 00 08 09 0a 0b >>> 2R11 111 0c 0d 0e 0f 10 11 12 13 >>> 2R11 111 14 15 16 17 18 19 1a 1b >>> 2R11 111 1c 1d 1e 1f 20 21 22 23 >>> 2R11 111 24 25 26 27 28 29 2a 2b >>> 2R11 111 2c 2d 2e 2f 30 31 32 33 >>> 2R11 111 34 35 36 37 >>> >>> 2R11 110 54 00 >>> 2CER Error intr=10 rxpkt=14 txpkt=0 errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=0 txfail=0 wdgreset=0 errreset=0 >>> 2R11 110 40 01 66 37 0a 0a 63 02 >>> 2R11 110 19 f5 0e 00 08 09 0a 0b >>> 2R11 110 34 35 36 37 >>> 2R11 110 45 00 00 54 3a 59 00 00 >>> >>> Conclusion is that the CAN1 traffic looks fine, and the PING packet gets a good reply. All successful. But the CAN2 listen is missing a few packets and the last packet is out of order. 
>>> >>> Now, let’s test with traffic on CAN2 (active, 1Mbps), and listening on CAN1 (listen, 1Mbps): >>> >>> TCPDUMP: >>> >>> 06:00:33.004060 IP (tos 0x0, ttl 64, id 58240, offset 0, flags [DF], proto ICMP (1), length 84) >>> 10.10.99.3 > 10.10.99.2 <http://10.10.99.2/>: ICMP echo request, id 23393, seq 1, length 64 >>> 0x0000: 4500 0054 e380 4000 4001 7d0f 0a0a 6303 E..T..@.@ <mailto:E..T..@.@>.}...c. >>> 0x0010: 0a0a 6302 0800 7cc8 5b61 0001 f161 b860 ..c...|.[a...a.` >>> 0x0020: 8b0f 0000 0809 0a0b 0c0d 0e0f 1011 1213 ................ >>> 0x0030: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"# >>> 0x0040: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123 >>> 0x0050: 3435 3637 4567 >>> >>> Traffic (as shown on PC the other end of the can log tcp connection): >>> >>> tx: #111 54 00 >>> tx: #111 45 00 00 54 e3 80 40 00 >>> tx: #111 40 01 7d 0f 0a 0a 63 03 >>> tx: #111 0a 0a 63 02 08 00 7c c8 >>> tx: #111 5b 61 00 01 f1 61 b8 60 >>> tx: #111 8b 0f 00 00 08 09 0a 0b >>> tx: #111 0c 0d 0e 0f 10 11 12 13 >>> tx: #111 14 15 16 17 18 19 1a 1b >>> tx: #111 1c 1d 1e 1f 20 21 22 23 >>> tx: #111 24 25 26 27 28 29 2a 2b >>> tx: #111 2c 2d 2e 2f 30 31 32 33 >>> tx: #111 34 35 36 37 >>> >>> rx: #110 >>> rx: #110 54 00 >>> rx: #110 45 00 00 54 3a 5a 00 00 >>> rx: #110 0a 0a 63 03 00 00 84 c8 >>> rx: #110 8b 0f 00 00 08 09 0a 0b >>> rx: #110 34 35 36 37 >>> rx: #110 40 01 66 36 0a 0a 63 02 >>> >>> CAN2 active: >>> >>> 2T11 111 54 00 >>> 2R11 110 >>> >>> 2CER TX_Queue T11 111 40 01 7d 0f 0a 0a 63 03 >>> 2T11 111 45 00 00 54 e3 80 40 00 >>> 2T11 111 40 01 7d 0f 0a 0a 63 03 >>> 2CER TX_Queue T11 111 5b 61 00 01 f1 61 b8 60 >>> 2T11 111 0a 0a 63 02 08 00 7c c8 >>> 2T11 111 5b 61 00 01 f1 61 b8 60 >>> 2CER TX_Queue T11 111 0c 0d 0e 0f 10 11 12 13 >>> 2T11 111 8b 0f 00 00 08 09 0a 0b >>> 2T11 111 0c 0d 0e 0f 10 11 12 13 >>> 2CER TX_Queue T11 111 1c 1d 1e 1f 20 21 22 23 >>> 2T11 111 14 15 16 17 18 19 1a 1b >>> 2T11 111 1c 1d 1e 1f 20 21 22 23 >>> 2CER TX_Queue T11 111 2c 
2d 2e 2f 30 31 32 33 >>> 2T11 111 24 25 26 27 28 29 2a 2b >>> 2T11 111 2c 2d 2e 2f 30 31 32 33 >>> 2T11 111 34 35 36 37 >>> >>> 2R11 110 54 00 >>> 2R11 110 45 00 00 54 3a 5a 00 00 >>> 2CER Error intr=15 rxpkt=3 txpkt=12 errflags=0x23401c01 rxerr=0 txerr=0 rxinval=0 rxovr=0 txovr=0 txdelay=5 txfail=0 wdgreset=0 errreset=0 >>> 2R11 110 0a 0a 63 03 00 00 84 c8 >>> 2R11 110 8b 0f 00 00 08 09 0a 0b >>> 2R11 110 34 35 36 37 >>> 2R11 110 40 01 66 36 0a 0a 63 02 >>> >>> CAN1 listen: >>> >>> 1R11 111 54 00 >>> 1R11 110 >>> >>> 1R11 111 45 00 00 54 e3 80 40 00 >>> 1R11 111 40 01 7d 0f 0a 0a 63 03 >>> 1R11 111 0a 0a 63 02 08 00 7c c8 >>> 1R11 111 5b 61 00 01 f1 61 b8 60 >>> 1R11 111 8b 0f 00 00 08 09 0a 0b >>> 1R11 111 0c 0d 0e 0f 10 11 12 13 >>> 1R11 111 14 15 16 17 18 19 1a 1b >>> 1R11 111 1c 1d 1e 1f 20 21 22 23 >>> 1R11 111 24 25 26 27 28 29 2a 2b >>> 1R11 111 2c 2d 2e 2f 30 31 32 33 >>> 1R11 111 34 35 36 37 >>> >>> 1R11 110 54 00 >>> 1R11 110 45 00 00 54 3a 5a 00 00 >>> 1R11 110 40 01 66 36 0a 0a 63 02 >>> 1R11 110 0a 0a 63 03 00 00 84 c8 >>> 1R11 110 5b 61 00 01 f1 61 b8 60 >>> 1R11 110 8b 0f 00 00 08 09 0a 0b >>> 1R11 110 0c 0d 0e 0f 10 11 12 13 >>> 1R11 110 14 15 16 17 18 19 1a 1b >>> 1R11 110 1c 1d 1e 1f 20 21 22 23 >>> 1R11 110 24 25 26 27 28 29 2a 2b >>> 1R11 110 2c 2d 2e 2f 30 31 32 33 >>> 1R11 110 34 35 36 37 >>> >>> Conclusion is that the CAN2 transmit traffic looks fine, but no PING reply received via CAN. The CAN1 listen shows the reply just fine. 
>>> >>> Here is that last CAN1 listen, with timestamps: >>> >>> 1622696433.080107 1R11 111 54 00 >>> 1622696433.081657 1R11 110 >>> >>> 1622696433.227479 1R11 111 45 00 00 54 e3 80 40 00 >>> 1622696433.228318 1R11 111 40 01 7d 0f 0a 0a 63 03 >>> 1622696433.245727 1R11 111 0a 0a 63 02 08 00 7c c8 >>> 1622696433.246214 1R11 111 5b 61 00 01 f1 61 b8 60 >>> 1622696433.248219 1R11 111 8b 0f 00 00 08 09 0a 0b >>> 1622696433.248772 1R11 111 0c 0d 0e 0f 10 11 12 13 >>> 1622696433.250774 1R11 111 14 15 16 17 18 19 1a 1b >>> 1622696433.251338 1R11 111 1c 1d 1e 1f 20 21 22 23 >>> 1622696433.253380 1R11 111 24 25 26 27 28 29 2a 2b >>> 1622696433.253944 1R11 111 2c 2d 2e 2f 30 31 32 33 >>> 1622696433.265937 1R11 111 34 35 36 37 >>> >>> 1622696433.269221 1R11 110 54 00 >>> 1622696433.272095 1R11 110 45 00 00 54 3a 5a 00 00 >>> 1622696433.272125 1R11 110 40 01 66 36 0a 0a 63 02 >>> 1622696433.272156 1R11 110 0a 0a 63 03 00 00 84 c8 >>> 1622696433.272193 1R11 110 5b 61 00 01 f1 61 b8 60 >>> 1622696433.272245 1R11 110 8b 0f 00 00 08 09 0a 0b >>> 1622696433.272277 1R11 110 0c 0d 0e 0f 10 11 12 13 >>> 1622696433.272314 1R11 110 14 15 16 17 18 19 1a 1b >>> 1622696433.272354 1R11 110 1c 1d 1e 1f 20 21 22 23 >>> 1622696433.272387 1R11 110 24 25 26 27 28 29 2a 2b >>> 1622696433.272420 1R11 110 2c 2d 2e 2f 30 31 32 33 >>> 1622696433.272452 1R11 110 34 35 36 37 >>> >>> It is 1Mbps, with 30us or so between each packet. This is the only traffic on the bus. Everything else is turned off. Roughly 12 packets each way. Surely even if we were hitting a performance limit, our buffers can handle 12 packets? >>> >>> The good news is that I have a good environment to replicate this issue now, so any fix should be easy to test. >>> >>> I haven’t worked on the MCP2515 driver in our code in a long time, but it certainly seems something is messed up and that could be badly affecting vehicle modules using anything other than CAN1. 
>>> >>> I will start to look at this over the weekend, but has anyone got any ideas/suggestions? Perhaps the bit timing registers are off by a small amount (so it works on CAN1 but not on CAN2)? Or something more serious in our driver? >>> >>> Regards, Mark. >>> >>> >>> >>> _______________________________________________ >>> OvmsDev mailing list >>> OvmsDev@lists.openvehicles.com <mailto:OvmsDev@lists.openvehicles.com> >>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev <http://lists.openvehicles.com/mailman/listinfo/ovmsdev> >> >> -- >> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal >> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26 >> <MCP2515Calc-1000kbit.ods> >> > > > > _______________________________________________ > OvmsDev mailing list > OvmsDev@lists.openvehicles.com <mailto:OvmsDev@lists.openvehicles.com> > http://lists.openvehicles.com/mailman/listinfo/ovmsdev <http://lists.openvehicles.com/mailman/listinfo/ovmsdev>
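For reference, the two register sets discussed in the quoted bit-timing exchange (00 / D0 / 82 and 00 / CA / 81) can be decoded with a few lines of arithmetic. This is a sketch based on my reading of the MCP2515 CNF2/CNF3 register layout, assuming BTLMODE is set (it is in both values) so that phase segment 2 comes from CNF3; `BitTiming` and `DecodeTiming` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical result type: segment lengths in time quanta (Tq).
struct BitTiming {
  int propSeg;                // propagation segment
  int phaseSeg1;              // phase segment 1
  int phaseSeg2;              // phase segment 2
  int totalTq;                // Tq per bit, including the 1 Tq sync segment
  double samplePointPercent;  // position of the (last) sample within the bit
  bool tripleSample;          // CNF2.SAM: bus sampled three times
};

BitTiming DecodeTiming(uint8_t cnf2, uint8_t cnf3) {
  BitTiming t{};
  t.propSeg      = (cnf2 & 0x07) + 1;         // CNF2 bits 2:0, value + 1
  t.phaseSeg1    = ((cnf2 >> 3) & 0x07) + 1;  // CNF2 bits 5:3, value + 1
  t.phaseSeg2    = (cnf3 & 0x07) + 1;         // CNF3 bits 2:0, value + 1
  t.tripleSample = (cnf2 & 0x40) != 0;        // CNF2 bit 6
  t.totalTq      = 1 + t.propSeg + t.phaseSeg1 + t.phaseSeg2;
  // The sample point sits at the end of phase segment 1.
  t.samplePointPercent = 100.0 * (1 + t.propSeg + t.phaseSeg1) / t.totalTq;
  return t;
}
```

With D0 / 82 this gives 1 + 1 + 3 + 3 = 8 Tq and a sample point at 5/8 = 62.5%; with CA / 81 it gives 1 + 3 + 2 + 2 = 8 Tq and a sample point at 6/8 = 75%. Those are the upper ends of the two sampling windows quoted above; both values have SAM set, so the chip samples three times leading up to that point.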
-- This space for rent...
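Since the errflags=0x23401c01 value from the quoted CAN2 dumps comes up repeatedly in this thread, here is a small sketch of how that word breaks down. The byte layout (intstat in the top byte, errflag next, intflag in the low 16 bits) is taken from Mark's breakdown in the quoted message; the bit names are the CANINTF and EFLG names as I read them in the MCP2515 datasheet. `Mcp2515ErrorWord`, `SplitErrorWord` and `NamesOf` are hypothetical helpers, not part of the driver.

```cpp
#include <cstdint>
#include <string>

// Hypothetical view of the 32-bit error word shown in the CAN log.
struct Mcp2515ErrorWord {
  uint8_t intstat;   // copy of CANINTF
  uint8_t errflag;   // copy of EFLG
  uint16_t intflag;  // driver-internal flags, left undecoded here
};

Mcp2515ErrorWord SplitErrorWord(uint32_t word) {
  return {static_cast<uint8_t>(word >> 24),
          static_cast<uint8_t>(word >> 16),
          static_cast<uint16_t>(word & 0xFFFF)};
}

// Bit names, indexed by bit number (0 = least significant).
const char* const kCanintfNames[8] = {"RX0IF", "RX1IF", "TX0IF", "TX1IF",
                                      "TX2IF", "ERRIF", "WAKIF", "MERRF"};
const char* const kEflgNames[8] = {"EWARN", "RXWAR", "TXWAR", "RXEP",
                                   "TXEP",  "TXBO",  "RX0OVR", "RX1OVR"};

// List the names of all set bits, most significant first, space separated.
std::string NamesOf(uint8_t value, const char* const names[8]) {
  std::string out;
  for (int bit = 7; bit >= 0; --bit) {
    if (value & (1u << bit)) {
      if (!out.empty()) out += ' ';
      out += names[bit];
    }
  }
  return out;
}
```

For 0x23401c01 this yields intstat 0x23 (ERRIF, RX1IF, RX0IF) and errflag 0x40 (RX0OVR), matching the hand decode in the quoted message: both receive buffers full, with an overflow recorded on buffer #0.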
If a power cycle of the HUD fixed the problem, then it sounds like the issue is more likely on the HUD side. Conversely, if a stop+start of the CAN on OVMS fixed it, then I would suspect our driver.
Regards, Mark.
On 4 Jan 2022, at 11:05 AM, Greg D <gregd2350@gmail.com> wrote:
No error flags as I recall. It just stopped receiving frames. I tried to figure out how to clear it in the driver, but never succeeded. The workaround was to cycle the ext12v, restarting obd2ecu between off and on, triggered by a Car-on event.
Greg
On January 3, 2022 6:54:22 PM PST, Mark Webb-Johnson <mark@webb-johnson.net> wrote:
I don’t know the cause of the HUD bus hang, so really not sure if it is resolved or not. The SPI driver is the standard ESP IDF one now, so should be stable. But what was the state of the bus when it hung? Perhaps some error condition set in the MCP2515?
Regards, Mark
On 4 Jan 2022, at 10:46 AM, Greg D <gregd2350@gmail.com <mailto:gregd2350@gmail.com>> wrote:
Hi Mark,
Trying this again... (got a 554 blocked reply the first time). Apologies if it's a duplicate.
Will this fix the bus hangs we have with the HUD devices? Ref: "Can buses stop after some time" email thread from 5/28/2019... As I wrote at the time, a 100% way to reproduce the hang was to have a HUD device running, then start the OBD2ECU task. I haven't used my HUD in some time, so I don't know if this was fixed since then. If not, that was a very quick and easy way to reproduce the issue.
Happy New Year!
Greg
On Mon, Jan 3, 2022 at 5:10 PM Mark Webb-Johnson <mark@webb-johnson.net <mailto:mark@webb-johnson.net>> wrote: An update on this.
Working with another developer, we have made some changes in a ‘spimaster’ branch:
Stop using the spi_nodma fork of ESP’s standard SPI code, and switch back to the standard ESP IDF SPI master.
To support more than 3 devices (which the ESP IDF SPI master doesn’t, due to hardware limitations of the CS lines and 3x DMA channels), change the MAX7317 driver to use a software CS line (the MCP2515s continue to use hardware CS).
Confirm the changes to our MCP2515 driver related to keeping track of the last buffer read, to solve the out-of-order issue.
Confirm the fix for another related issue where we don’t block (delay) if the CAN TX queue is full.
These seem better now, and I am able to establish a CAN IP connection over MCP2515. Frames come in order, and we are seeing performance around 700 frames/second - which should be adequate for our needs.
I’ll do some more testing over the next few days, and if no issues found merge back to master.
Regards, Mark.
On 7 Jun 2021, at 12:16 AM, Michael Balzer <dexter@expeedo.de <mailto:dexter@expeedo.de>> wrote:
Mark,
I've just found a spot-on post on this issue:
https://www.microchip.com/forums/tm.aspx?m=620741 <https://www.microchip.com/forums/tm.aspx?m=620741>
Tom suggests implementing a state machine to reproduce the receive order. His analysis & solution look sound to me.
Regards, Michael
On 06.06.21 at 14:50, Mark Webb-Johnson wrote:
I spent quite a bit of time on this. With my standard test packet of 11 CAN frames expected, and the standard driver, I get perhaps 4 or 5 of them (about half are lost, and some are out of order).
I made the suggested change to move the MyCan.IncomingFrame() call out of the ‘can’ object (when frameReceived is true) to within the mcp2515 AsynchronousInterruptHandler itself. That allows the handler to receive more than one frame per call and is a very simple change. Once that is done, we can at least now try to tune it.
So I then modified the code of mcp2515 AsynchronousInterruptHandler to loop so long as the interrupt flag says either buffer #0 or #1 has a frame. The result looks something like this:
D (63192) mcp2515: AsynchronousInterruptHandler instat=01 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=54)
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=40 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=40)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=45)
D (63192) mcp2515: AsynchronousInterruptHandler Clear RX buffer #0 overflow flag
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=24)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=34)
D (63192) mcp2515: AsynchronousInterruptHandler instat=20 errflag=00 txb0ctrl=00
The actual frames on the bus are (B1 values) 54, 45, 40, 0a, 7a, d5, 0c, 14, 1c, 24, 2c, and 34. Looking at the above debug output, we get:
Interrupt flags show buffer #0 has a frame. It is B1=54. Good.
Interrupt flags show buffers #0 and #1 both have frames. Buffer #0 has B1=40 and buffer #1 has B1=45.
Etc.
That is not good. What must have happened is that the first B1=54 frame arrived, was put in buffer #0, and the interrupt was raised. We checked the interrupt flags, found buffer #0 had something, and read the frame OK. All is good. But between the time we checked the interrupt flags and the time we finished reading the 13 bytes from buffer #0, a second frame arrived and was put in buffer #1. Then a third frame arrived and was put in buffer #0. We loop back to check the interrupt flags and find both buffers have frames ready. So we read buffer #0 to get the third frame, then buffer #1 to get the second frame. We are out of sequence.
By removing the ESP_LOGD statements, I can improve performance enough to get 10 out of the 11 frames, but still sometimes frames are swapped in order.
By over-clocking the MCP2515 SPI bus (supposed to be 10MHz, but I push it to 15MHz), I can get all 11 frames, but two are out of order.
I suppose I can minimise the chance of the out-of-order issue by repeating the call to read interrupt flags after processing buffer #0 but before checking for buffer #1. That would at least reduce the time window to as small as possible, but would be another SPI call and is too slow. Doing that brings us back to losing frames.
Another approach may relate to our current use of the READ command to read 5 status registers (interrupt flags, error flags, two skipped, then transmit buffer #0 flags). There are two specific commands ‘read status’ (which gets the rx and tx buffer status flags in one byte), and ‘rx status’ (which gets just the receive buffer status and some info on the frames received, again in one byte). I think those are more designed for what we are trying to do. I can try to optimise the read loop at the start of the AsynchronousInterruptHandler to use one of those - they are 2 SPI bytes vs 7 for what we are doing at the moment (so more than three times as fast).
I think it will also be worthwhile having a look at some other open source mcp2515 drivers to see how other people are doing it.
Regards, Mark.
On 4 Jun 2021, at 3:02 PM, Mark Webb-Johnson <mark@webb-johnson.net <mailto:mark@webb-johnson.net>> wrote:
The handler can only return one frame. As it is, if both buffers #0 and #1 have a frame, it returns #0. I am not sure if it gets called again (seems to depend on the interrupt gpio status).
// Read the interrupt pin status and if it's still active (low), require another interrupt handling iteration
return !gpio_get_level((gpio_num_t)m_intpin);
Maybe a quick solution is to just return true, immediately after *frameReceived=true, if intflag=0x01 and (intstat & CANINTF_RX1IF)? That would dispatch the incoming frame, then call back for more (from the loop in the can object).
I am not sure in general why AsynchronousInterruptHandler uses a bool frameReceived flag, and doesn’t just simply dispatch the frame immediately to the can object? That would simplify things and allow the AsynchronousInterruptHandler to handle receiving both frames in the same call. Given that MCP2515 is the only driver using AsynchronousInterruptHandler, that would be an easy fix.
Regards, Mark.
On 4 Jun 2021, at 2:29 PM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
the handler is meant to read both buffers sequentially, and on a quick glance I don't see why it wouldn't. But it can't hurt if you do an audit of the code.
I remember having had that out-of-order discussion when handling both RX buffers before here, but don't remember the outcome. Too bad the list archives cannot be searched.
I think it was the MCP not doing overflows from RX buffer 1 to 0. I.e. if buffer 1 still has a frame on arrival, the new frame will be lost. That means losing a frame if the handler cannot react fast enough, but receiving out of order would be worse.
Regards, Michael
On 04.06.21 at 04:16, Mark Webb-Johnson wrote:
> Michael,
>
> Good suggestion on the timing. I think it best to use the same timings as the Arduino library, and have committed that change. No vehicle modules currently use 1Mbps on MCP2515 anyway. Unfortunately, it didn’t resolve my problem.
>
> Looking at the error flags I see:
>
> Error flag: 0x23401c01
>
> intstat 0x23
>   ERRIF Error Interrupt pending
>   RX0IF Rx buffer 0 full interrupt
>   RX1IF Rx buffer 1 full interrupt
>
> errflag 0x40
>   RX0OVR Rx buffer 0 overflow
>
> intflag 0x1c01
>   0x01 Implied from Rx buffer 0 full
>   0x1c = 0001 1100
>     Means RXB0 overflow. No data lost in this case (it went into RXB1)
>     Means (errflag & EFLG_RX01OVR), clear RX buffer overflow flags
>     Means (intstat & (CANINTF_MERRF | CANINTF_WAKIF | CANINTF_ERRIF)), clear error & wakeup interrupts
>
> So we have CAN frames in BOTH rx buffers #0 and #1. Looking at our driver code (mcp2515::AsynchronousInterruptHandler), it seems in that case we only read from buffer #0. From the flow I can see, we are going to lose that second frame. We’re not really handling the issue of two frames being in the buffers when the interrupt handler is called.
>
> As the architecture of mcp2515::AsynchronousInterruptHandler can only receive one frame, it is not so simple to fix. We could simply read and return the frame in buffer #0, requesting to be called again (return true), but another frame may arrive (into buffer #0) before we get called again, and that is going to result in out-of-order frames.
>
> I’ll work on improving the handling of this case.
>
> Regards, Mark.
>
>> On 3 Jun 2021, at 3:07 PM, Michael Balzer <dexter@expeedo.de> wrote:
>>
>> Mark,
>>
>> I'd give the bit timing a try first, the MCP2515 seems to be very sensitive to this. I've even had some trouble finding a working configuration for the 50 kbit timing I've added a couple weeks ago.
>>
>> We currently use 00 / D0 / 82 which is also the result of the old Intrepid timing calculator. That's a propagation segment of 1 Tq and 3 Tq per phase, resulting in sampling between 50% - 62.5%.
>>
>> The Arduino MCP CAN lib by Cory Fowler also had this previously, but then changed in…
>>
>> https://github.com/coryjfowler/MCP_CAN_lib/commit/ece730cf697fef1cbe8a90111694868168d41000
>>
>> …to 00 / CA / 81, which is a propagation segment of 3 Tq and 2 Tq per phase, shifting the sampling window to 62.5 - 75%.
>>
>> Our current configuration scheme for the internal SJA1000 compatible CAN seems to sample at 62.5 - 75% as well, so that would also match.
>>
>> Regards,
>> Michael
>>
>> <MCP2515Calc-1000kbit.ods>
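The two register sets quoted above (00 / D0 / 82 and 00 / CA / 81) can be decoded into segment lengths as a cross-check. This is a sketch that assumes the usual MCP2515 layout (PRSEG in CNF2 bits 2..0, PHSEG1 in CNF2 bits 5..3, PHSEG2 in CNF3 bits 2..0, each stored as length minus one) and that BTLMODE is set, which holds for both D0 and CA. The `BitTiming` struct and `decode_cnf` function are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Segment lengths of one CAN bit, in time quanta (hypothetical helper type).
struct BitTiming {
    int sync_tq;  // synchronisation segment, always 1 Tq
    int prop_tq;  // propagation segment
    int ps1_tq;   // phase segment 1
    int ps2_tq;   // phase segment 2

    constexpr int total_tq() const { return sync_tq + prop_tq + ps1_tq + ps2_tq; }

    // Nominal sample point: end of phase segment 1, as a percentage of the bit.
    constexpr double sample_point_percent() const {
        return 100.0 * (sync_tq + prop_tq + ps1_tq) / total_tq();
    }
};

// Decode CNF2/CNF3 into segment lengths (register fields hold length - 1).
constexpr BitTiming decode_cnf(uint8_t cnf2, uint8_t cnf3) {
    return BitTiming{
        1,
        (cnf2 & 0x07) + 1,         // PRSEG
        ((cnf2 >> 3) & 0x07) + 1,  // PHSEG1
        (cnf3 & 0x07) + 1          // PHSEG2 (valid because BTLMODE is set)
    };
}
```

With these assumptions, D0 / 82 decodes to 1 + 1 + 3 + 3 = 8 Tq with a nominal sample point of 62.5%, and CA / 81 decodes to 1 + 3 + 2 + 2 = 8 Tq with a nominal sample point of 75%. Both agree with the upper ends of the sampling windows given in the message.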
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
_______________________________________________ OvmsDev mailing list OvmsDev@lists.openvehicles.com <mailto:OvmsDev@lists.openvehicles.com> http://lists.openvehicles.com/mailman/listinfo/ovmsdev <http://lists.openvehicles.com/mailman/listinfo/ovmsdev>
-- This space for rent... _______________________________________________ OvmsDev mailing list OvmsDev@lists.openvehicles.com http://lists.openvehicles.com/mailman/listinfo/ovmsdev
I think we would need to see the status of the can bus and a trace of the traffic to know for sure what is going on. I find it strange that the HUD would send requests (I thought it purely responded to requests from the OVMS master?).
Apart from the SPI changes, the only change to the MCP2515 driver made here was to improve in-order delivery and throughput. I doubt that would improve the situation with a hung HUD.
Regards, Mark
On 6 Jan 2022, at 2:18 AM, Greg D. <gregd2350@gmail.com> wrote:
Hi Mark,
Simply power cycling the HUD didn't work. The only way I could clear the issue was to restart the application while the HUD was turned off (ext12v off). As I recall, watching the CAN bus with a monitor showed the HUD was sending requests, but they were not being received by the application. So I concluded something was odd with the driver, but never pinned it down.
Greg
Mark Webb-Johnson wrote:
If a power cycle of the HUD fixed the problem, then it sounds like the issue is more likely on the HUD side. Conversely if a stop+start of the CAN on ovms fixed it, then I would suspect our driver.
Regards, Mark.
On 4 Jan 2022, at 11:05 AM, Greg D <gregd2350@gmail.com> wrote:
No error flags as I recall. It just stopped receiving frames. I tried to figure out how to clear it in the driver, but never succeeded. The workaround was to cycle the ext12v, restarting obd2ecu between off and on, triggered by a Car-on event.
Greg
On January 3, 2022 6:54:22 PM PST, Mark Webb-Johnson <mark@webb-johnson.net> wrote:
I don’t know the cause of the HUD bus hang, so really not sure if it is resolved or not. The SPI driver is the standard ESP IDF one now, so should be stable. But what was the state of the bus when it hung? Perhaps some error condition set in the MCP2515?
Regards, Mark
On 4 Jan 2022, at 10:46 AM, Greg D <gregd2350@gmail.com> wrote:
Hi Mark,
Trying this again... (got a 554 blocked reply the first time). Apologies if it's a duplicate.
Will this fix the bus hangs we have with the HUD devices? Ref: "Can buses stop after some time" email thread from 5/28/2019... As I wrote at the time, a 100% way to reproduce the hang was to have a HUD device running, then start the OBD2ECU task. I haven't used my HUD in some time, so I don't know if this was fixed since then. If not, that was a very quick and easy way to reproduce the issue.
Happy New Year!
Greg
On Mon, Jan 3, 2022 at 5:10 PM Mark Webb-Johnson <mark@webb-johnson.net> wrote:
An update on this.
Working with another developer, we have made some changes in a ’spimaster’ branch:
Stop using spi_nodma fork of ESP’s standard spi code, and switch back to use the standard ESP IDF spi master.
To support >3 devices (which ESP IDF spi master doesn’t due to hardware limitations of CS line and 3x DMA channels), change to use software CS line for the MAX7317 driver (the MCP2515 continues to use hardware CS).
Confirm the changes to our MCP2515 driver related to keeping track of the last buffer read, to solve the out-of-order issue.
Confirm the fix for another related issue where we don’t block (delay) if the can tx queue is full.
These seem better now, and I am able to establish a CAN IP connection over MCP2515. Frames come in order, and we are seeing performance around 700 frames/second - which should be adequate for our needs.
I’ll do some more testing over the next few days, and if no issues found merge back to master.
Regards, Mark.
On 7 Jun 2021, at 12:16 AM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
I've just found a spot-on post on this issue:
https://www.microchip.com/forums/tm.aspx?m=620741
Tom suggests implementing a state machine to reproduce the receive order. His analysis & solution looks sound to me.
Regards, Michael
On 06.06.21 at 14:50, Mark Webb-Johnson wrote:
I spent quite a bit of time on this. With my standard test packet of 11 CAN frames expected, and the standard driver, I get perhaps 4 or 5 of them (about half are lost, and some are out of order).
I made the suggested change to move the MyCan.IncomingFrame() call out of the ‘can’ object (when frameReceived is true) to within the mcp2515 AsynchronousInterruptHandler itself. That allows the handler to receive more than one frame per call and is a very simple change. Once that is done, we can at least now try to tune it.
So I then modified the code of mcp2515 AsynchronousInterruptHandler to loop so long as the interrupt flag says either buffer #0 or #1 has a frame. The result looks something like this:
D (63192) mcp2515: AsynchronousInterruptHandler instat=01 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=54)
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=40 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=40)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=45)
D (63192) mcp2515: AsynchronousInterruptHandler Clear RX buffer #0 overflow flag
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=24)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=34)
D (63192) mcp2515: AsynchronousInterruptHandler instat=20 errflag=00 txb0ctrl=00
The actual frames on the bus are (B1 values) 54, 45, 40, 0a, 7a, d5, 0c, 14, 1c, 24, 2c, and 34. Looking at the above debug output, we get:
Interrupt flags show buffer #0 has a frame. It is B1=54. Good.
Interrupt flags show buffers #0 and #1 both have frames. Buffer #0 has B1=40 and buffer #1 has B1=45.
Etc etc
That is not good. What must have happened is that the first B1=54 frame arrived, got put in buffer #0, and an interrupt was raised. We checked the interrupt flags, found buffer #0 had something, and read the frame ok. All is good. But what is happening now is that between the time we checked the interrupt flags and the time we finished reading the 13 bytes from buffer #0, a second frame arrived and was put in buffer #1. Then a third frame arrives and is put in buffer #0. We loop back to check the interrupt flags and find both buffers have frames ready. So we read buffer #0 to get the third frame, then buffer #1 to get the second frame. We are out of sequence.
By removing the ESP_LOGD statements, I can improve performance enough to get 10 out of the 11 frames, but still sometimes frames are swapped in order.
By over-clocking the MCP2515 SPI bus (supposed to be 10MHz, but I push it to 15MHz), I can get all 11 frames, but two are out of order.
I suppose I can minimise the chance of the out-of-order issue by repeating the call to read interrupt flags after processing buffer #0 but before checking for buffer #1. That would at least reduce the time window to as small as possible, but would be another SPI call and is too slow. Doing that brings us back to losing frames.
Another approach may relate to our current use of the READ command to read 5 status registers (interrupt flags, error flags, two skipped, then transmit buffer #0 flags). There are two specific commands ‘read status’ (which gets the rx and tx buffer status flags in one byte), and ‘rx status’ (which gets just the receive buffer status and some info on the frames received, again in one byte). I think those are more designed for what we are trying to do. I can try to optimise the read loop at the start of the AsynchronousInterruptHandler to use one of those - they are 2 SPI bytes vs 7 for what we are doing at the moment (so more than three times as fast).
I think it will also be worthwhile having a look at some other open source mcp2515 drivers to see how other people are doing it.
Regards, Mark.
110 40 01 66 36 0a 0a 63 02 >>>>> >>>>> CAN1 listen: >>>>> >>>>> 1R11 111 54 00 >>>>> 1R11 110 >>>>> >>>>> 1R11 111 45 00 00 54 e3 80 40 00 >>>>> 1R11 111 40 01 7d 0f 0a 0a 63 03 >>>>> 1R11 111 0a 0a 63 02 08 00 7c c8 >>>>> 1R11 111 5b 61 00 01 f1 61 b8 60 >>>>> 1R11 111 8b 0f 00 00 08 09 0a 0b >>>>> 1R11 111 0c 0d 0e 0f 10 11 12 13 >>>>> 1R11 111 14 15 16 17 18 19 1a 1b >>>>> 1R11 111 1c 1d 1e 1f 20 21 22 23 >>>>> 1R11 111 24 25 26 27 28 29 2a 2b >>>>> 1R11 111 2c 2d 2e 2f 30 31 32 33 >>>>> 1R11 111 34 35 36 37 >>>>> >>>>> 1R11 110 54 00 >>>>> 1R11 110 45 00 00 54 3a 5a 00 00 >>>>> 1R11 110 40 01 66 36 0a 0a 63 02 >>>>> 1R11 110 0a 0a 63 03 00 00 84 c8 >>>>> 1R11 110 5b 61 00 01 f1 61 b8 60 >>>>> 1R11 110 8b 0f 00 00 08 09 0a 0b >>>>> 1R11 110 0c 0d 0e 0f 10 11 12 13 >>>>> 1R11 110 14 15 16 17 18 19 1a 1b >>>>> 1R11 110 1c 1d 1e 1f 20 21 22 23 >>>>> 1R11 110 24 25 26 27 28 29 2a 2b >>>>> 1R11 110 2c 2d 2e 2f 30 31 32 33 >>>>> 1R11 110 34 35 36 37 >>>>> >>>>> Conclusion is that the CAN2 transmit traffic looks fine, but no PING reply received via CAN. The CAN1 listen shows the reply just fine. 
>>>>> >>>>> Here is that last CAN1 listen, with timestamps: >>>>> >>>>> 1622696433.080107 1R11 111 54 00 >>>>> 1622696433.081657 1R11 110 >>>>> >>>>> 1622696433.227479 1R11 111 45 00 00 54 e3 80 40 00 >>>>> 1622696433.228318 1R11 111 40 01 7d 0f 0a 0a 63 03 >>>>> 1622696433.245727 1R11 111 0a 0a 63 02 08 00 7c c8 >>>>> 1622696433.246214 1R11 111 5b 61 00 01 f1 61 b8 60 >>>>> 1622696433.248219 1R11 111 8b 0f 00 00 08 09 0a 0b >>>>> 1622696433.248772 1R11 111 0c 0d 0e 0f 10 11 12 13 >>>>> 1622696433.250774 1R11 111 14 15 16 17 18 19 1a 1b >>>>> 1622696433.251338 1R11 111 1c 1d 1e 1f 20 21 22 23 >>>>> 1622696433.253380 1R11 111 24 25 26 27 28 29 2a 2b >>>>> 1622696433.253944 1R11 111 2c 2d 2e 2f 30 31 32 33 >>>>> 1622696433.265937 1R11 111 34 35 36 37 >>>>> >>>>> 1622696433.269221 1R11 110 54 00 >>>>> 1622696433.272095 1R11 110 45 00 00 54 3a 5a 00 00 >>>>> 1622696433.272125 1R11 110 40 01 66 36 0a 0a 63 02 >>>>> 1622696433.272156 1R11 110 0a 0a 63 03 00 00 84 c8 >>>>> 1622696433.272193 1R11 110 5b 61 00 01 f1 61 b8 60 >>>>> 1622696433.272245 1R11 110 8b 0f 00 00 08 09 0a 0b >>>>> 1622696433.272277 1R11 110 0c 0d 0e 0f 10 11 12 13 >>>>> 1622696433.272314 1R11 110 14 15 16 17 18 19 1a 1b >>>>> 1622696433.272354 1R11 110 1c 1d 1e 1f 20 21 22 23 >>>>> 1622696433.272387 1R11 110 24 25 26 27 28 29 2a 2b >>>>> 1622696433.272420 1R11 110 2c 2d 2e 2f 30 31 32 33 >>>>> 1622696433.272452 1R11 110 34 35 36 37 >>>>> >>>>> It is 1Mbps, with 30us or so between each packet. This is the only traffic on the bus. Everything else is turned off. Roughly 12 packets each way. Surely even if we were hitting a performance limit, our buffers can handle 12 packets? >>>>> >>>>> The good news is that I have a good environment to replicate this issue now, so any fix should be easy to test. >>>>> >>>>> I haven’t worked on the MCP2515 driver in our code in a long time, but it certainly seems something is messed up and that could be badly affecting vehicle modules using anything other than CAN1. 
>>>>> >>>>> I will start to look at this over the weekend, but has anyone got any ideas/suggestions? Perhaps the bit timing registers are off by a small amount (so it works on CAN1 but not on CAN2)? Or something more serious in our driver? >>>>> >>>>> Regards, Mark. >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> OvmsDev mailing list >>>>> OvmsDev@lists.openvehicles.com <mailto:OvmsDev@lists.openvehicles.com> >>>>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev <http://lists.openvehicles.com/mailman/listinfo/ovmsdev> >>>> >>>> -- >>>> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal >>>> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26 >>>> <MCP2515Calc-1000kbit.ods> >>>> >>> >>> >>> >>> _______________________________________________ >>> OvmsDev mailing list >>> OvmsDev@lists.openvehicles.com <mailto:OvmsDev@lists.openvehicles.com> >>> http://lists.openvehicles.com/mailman/listinfo/ovmsdev <http://lists.openvehicles.com/mailman/listinfo/ovmsdev> >> >> -- >> Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal >> Fon 02333 / 833 5735 * Handy 0176 / 206 989 26 >> > >
I have recently begun using CAN3 on my car to log the original OCU CAN communication. I'm using the 'spimaster' branch for this now (with latest master merged in) and everything looks fine. I cannot test high-volume performance in my setup though, as the bus is only running at 100 kbit.
Regards, Michael
On 4 Jan 2022, at 02:09, Mark Webb-Johnson wrote:
An update on this.
Working with another developer, we have made some changes in a ‘spimaster’ branch:
1. Stop using the spi_nodma fork of ESP’s standard SPI code, and switch back to the standard ESP IDF SPI master.
2. To support >3 devices (which the ESP IDF SPI master doesn’t, due to hardware limitations of the CS lines and 3x DMA channels), change to a software CS line for the MAX7317 driver (the MCP2515 continues to use hardware CS).
3. Confirm the changes to our MCP2515 driver related to keeping track of the last buffer read, to solve the out-of-order issue.
4. Confirm the fix for another related issue where we don’t block (delay) if the CAN TX queue is full.
These seem better now, and I am able to establish a CAN IP connection over MCP2515. Frames come in order, and we are seeing performance around 700 frames/second - which should be adequate for our needs.
I’ll do some more testing over the next few days, and if no issues found merge back to master.
Regards, Mark.
On 7 Jun 2021, at 12:16 AM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
I've just found a spot-on post on this issue:
https://www.microchip.com/forums/tm.aspx?m=620741
Tom suggests implementing a state machine to reproduce the receive order. His analysis & solution looks sound to me.
Regards, Michael
On 6 Jun 2021, at 14:50, Mark Webb-Johnson wrote:
I spent quite a bit of time on this. With my standard test packet of 11 CAN frames expected, and the standard driver, I get perhaps 4 or 5 of them (about half are lost, and some are out of order).
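For reference, the framing described at the start of this thread (a length frame followed by the IP packet split over CAN frames on one ID) can be sketched roughly as below. This is only an illustration: the function name `frameIpPacket` is hypothetical, and the 16-bit low-byte-first length is an assumption inferred from the `54 00` frame that precedes the 84-byte (0x54) ping packet.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using CanPayload = std::vector<uint8_t>;   // 0..8 data bytes of one CAN frame

// Split an IP packet into CAN payloads: first a 2-byte length frame
// (low byte first, an assumption based on the "54 00" frame), then the
// packet itself in chunks of up to 8 bytes.
std::vector<CanPayload> frameIpPacket(const std::vector<uint8_t>& packet) {
    std::vector<CanPayload> frames;
    const auto len = static_cast<uint16_t>(packet.size());
    frames.push_back({static_cast<uint8_t>(len & 0xFF),
                      static_cast<uint8_t>(len >> 8)});
    for (std::size_t off = 0; off < packet.size(); off += 8) {
        const std::size_t n = std::min<std::size_t>(8, packet.size() - off);
        const auto first = packet.begin() + static_cast<std::ptrdiff_t>(off);
        frames.emplace_back(first, first + static_cast<std::ptrdiff_t>(n));
    }
    return frames;
}
```

Under these assumptions an 84-byte ping becomes one length frame plus 11 data frames, the last carrying only 4 bytes, which matches the `34 35 36 37` frame at the end of each dump.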
I made the suggested change to move the MyCan.IncomingFrame() call out of the ‘can’ object (when frameReceived is true) to within the mcp2515 AsynchronousInterruptHandler itself. That allows the handler to receive more than one frame per call and is a very simple change. Once that is done, we can at least now try to tune it.
So I then modified the code of mcp2515 AsynchronousInterruptHandler to loop so long as the interrupt flag says either buffer #0 or #1 has a frame. The result looks something like this:
D (63192) mcp2515: AsynchronousInterruptHandler instat=01 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=54)
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=40 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=40)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=45)
D (63192) mcp2515: AsynchronousInterruptHandler Clear RX buffer #0 overflow flag
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=24)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=34)
D (63192) mcp2515: AsynchronousInterruptHandler instat=20 errflag=00 txb0ctrl=00
The actual frames on the bus are (B1 values) 54, 45, 40, 0a, 7a, d5, 0c, 14, 1c, 24, 2c, and 34. Looking at the above debug output, we get:
* Interrupt flags show buffer #0 has a frame. It is B1=54. Good.
* Interrupt flags show buffers #0 and #1 both have frames. Buffer #0 has B1=40 and buffer #1 has B1=45.
* Etc etc
That is not good. What must have happened is that the first B1=54 frame arrived, got put in buffer #0, and the interrupt was raised. We checked the interrupt flags, found buffer #0 had something, and read the frame ok. All is good. But between the time we checked the interrupt flags and the time we finished reading the 13 bytes from buffer #0, a second frame arrived and was put in buffer #1. Then a third frame arrived and was put in buffer #0. We loop back to check the interrupt flags and find both buffers have frames ready. So we read buffer #0 to get the third frame, then buffer #1 to get the second frame. We are out of sequence.
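The race described above can be reproduced with a toy model. The sketch below is not the OVMS driver: `FakeMcp2515` and `naiveDrain` are hypothetical names, only the first data byte (B1) of each frame is modelled, and the arrival timing (one frame during each SPI read, one more right after the buffer is released) is an assumption chosen to match the log.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Toy model of the two receive buffers with rollover enabled: a new frame
// goes to RXB0 if it is free, otherwise to RXB1, otherwise it is lost.
struct FakeMcp2515 {
    bool full[2] = {false, false};
    uint8_t data[2] = {0, 0};   // B1 of the frame held in each buffer
    int lost = 0;

    void arrive(uint8_t b1) {
        for (int n = 0; n < 2; ++n) {
            if (!full[n]) { full[n] = true; data[n] = b1; return; }
        }
        ++lost;
    }
    uint8_t flags() const {
        return static_cast<uint8_t>((full[0] ? 0x01 : 0) | (full[1] ? 0x02 : 0));
    }
};

// Naive handler: sample the flags once per pass, then read RXB0 before RXB1.
// `bus` holds the frames still to arrive, in bus order.
std::vector<uint8_t> naiveDrain(FakeMcp2515& chip, std::deque<uint8_t> bus) {
    std::vector<uint8_t> out;
    auto next = [&] {
        if (!bus.empty()) { chip.arrive(bus.front()); bus.pop_front(); }
    };
    next();                                  // first frame raises the interrupt
    for (uint8_t f = chip.flags(); f != 0; f = chip.flags()) {
        for (int n = 0; n < 2; ++n) {
            if (!(f & (1 << n))) continue;   // buffer was empty when flags were read
            next();                          // a frame lands during the SPI read
            out.push_back(chip.data[n]);
            chip.full[n] = false;            // release the buffer
            next();                          // another lands before the next check
        }
    }
    return out;
}
```

Feeding it the bus order 54, 45, 40 yields 54, 40, 45, the same swap as in the debug output: the second frame rolls over into buffer #1 while buffer #0 is still being read, and the third frame then takes the freshly released buffer #0.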
By removing the ESP_LOGD statements, I can improve performance enough to get 10 out of the 11 frames, but still sometimes frames are swapped in order.
By over-clocking the MCP2515 SPI bus (supposed to be 10MHz, but I push it to 15MHz), I can get all 11 frames, but two are out of order.
I suppose I can minimise the chance of the out-of-order issue by repeating the call to read interrupt flags after processing buffer #0 but before checking for buffer #1. That would at least reduce the time window to as small as possible, but would be another SPI call and is too slow. Doing that brings us back to losing frames.
Another approach may relate to our current use of the READ command to read 5 status registers (interrupt flags, error flags, two skipped, then transmit buffer #0 flags). There are two specific commands ‘read status’ (which gets the rx and tx buffer status flags in one byte), and ‘rx status’ (which gets just the receive buffer status and some info on the frames received, again in one byte). I think those are more designed for what we are trying to do. I can try to optimise the read loop at the start of the AsynchronousInterruptHandler to use one of those - they are 2 SPI bytes vs 7 for what we are doing at the moment (so more than three times as fast).
I think it will also be worthwhile having a look at some other open source mcp2515 drivers to see how other people are doing it.
Regards, Mark.
On 4 Jun 2021, at 3:02 PM, Mark Webb-Johnson <mark@webb-johnson.net> wrote:
The handler can only return one frame. As it is, if both buffers #0 and #1 have a frame, it returns #0. I am not sure if it gets called again (seems to depend on the interrupt gpio status).
// Read the interrupt pin status and if it's still active (low), require another interrupt handling iteration
return !gpio_get_level((gpio_num_t)m_intpin);
Maybe a quick solution is to just return true, immediately after *frameReceived=true, if intflag=0x01 and (intstat & CANINTF_RX1IF)? That would dispatch the incoming frame, then call back for more (from the loop in the can object).
I am not sure in general why AsynchronousInterruptHandler uses a bool frameReceived flag, and doesn’t just simply dispatch the frame immediately to the can object? That would simplify things and allow the AsynchronousInterruptHandler to handle receiving both frames in the same call. Given that MCP2515 is the only driver using AsynchronousInterruptHandler, that would be an easy fix.
Regards, Mark.
On 4 Jun 2021, at 2:29 PM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
the handler is meant to read both buffers sequentially, and on a quick glance I don't see why it wouldn't. But it can't hurt if you do an audit of the code.
I remember having had that out-of-order discussion when handling both RX buffers before here, but don't remember the outcome. Too bad the list archives cannot be searched.
I think it was the MCP not doing overflows from RX buffer 1 to 0. I.e. if buffer 1 still has a frame on arrival, the new frame will be lost. That means losing a frame if the handler cannot react fast enough, but receiving out of order would be worse.
Regards, Michael
On 4 Jun 2021, at 04:16, Mark Webb-Johnson wrote:
Michael,
Good suggestion on the timing. I think it best to use the same timings as the Arduino library, and have committed that change. No vehicle modules currently use 1Mbps on MCP2515 anyway. Unfortunately, it didn’t resolve my problem.
Looking at the error flags I see:
Error flag: 0x23401c01
intstat 0x23
  ERRIF   Error interrupt pending
  RX0IF   Rx buffer 0 full interrupt
  RX1IF   Rx buffer 1 full interrupt
errflag 0x40
  RX0OVR  Rx buffer 0 overflow
intflag 0x1c01
  0x01    Implied from Rx buffer 0 full
  0x1c = 0001 1100
    Means RXB0 overflow. No data lost in this case (it went into RXB1)
    Means (errflag & EFLG_RX01OVR), clear RX buffer overflow flags
    Means (intstat & (CANINTF_MERRF | CANINTF_WAKIF | CANINTF_ERRIF)), clear error & wakeup interrupts
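The breakdown above can be reproduced mechanically. A minimal sketch, assuming the 32-bit errflags word is packed as intstat (top byte), errflag (next byte) and the driver's own intflag (low 16 bits), as the decode above suggests; the bit names come from the MCP2515 CANINTF and EFLG registers, and the helper names are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct ErrorWord {
    uint8_t  intstat;   // copy of the CANINTF register
    uint8_t  errflag;   // copy of the EFLG register
    uint16_t intflag;   // driver-internal flags, not decoded here
};

// Assumed packing: 0xIIEEFFFF (intstat, errflag, intflag).
ErrorWord splitErrorWord(uint32_t w) {
    return { static_cast<uint8_t>(w >> 24),
             static_cast<uint8_t>(w >> 16),
             static_cast<uint16_t>(w & 0xFFFF) };
}

// List the names of the bits set in `v`, lowest bit first.
static std::vector<std::string> namesOf(uint8_t v, const char* const (&names)[8]) {
    std::vector<std::string> out;
    for (int bit = 0; bit < 8; ++bit) {
        if (v & (1u << bit)) out.emplace_back(names[bit]);
    }
    return out;
}

std::vector<std::string> canintfNames(uint8_t v) {
    static const char* const names[8] = {
        "RX0IF", "RX1IF", "TX0IF", "TX1IF", "TX2IF", "ERRIF", "WAKIF", "MERRF"};
    return namesOf(v, names);
}

std::vector<std::string> eflgNames(uint8_t v) {
    static const char* const names[8] = {
        "EWARN", "RXWAR", "TXWAR", "RXEP", "TXEP", "TXBO", "RX0OVR", "RX1OVR"};
    return namesOf(v, names);
}
```

For 0x23401c01 this gives intstat 0x23 (RX0IF, RX1IF, ERRIF) and errflag 0x40 (RX0OVR), which agrees with the manual decode.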
So we have CAN frames in BOTH rx buffers #0 and #1. Looking at our driver code (mcp2515::AsynchronousInterruptHandler), it seems in that case we only read from buffer #0. From the flow I can see, we are going to lose that second frame. We’re not really handling the issue of two frames being in the buffers when the interrupt handler is called.
As the architecture of mcp2515::AsynchronousInterruptHandler can only receive one frame, it is not so simple to fix. We could simply read and return the frame in buffer #0, requesting to be called again (return true), but another frame may arrive (into buffer #0) before we get called again, and that is going to result in out-of-order frames.
I’ll work on improving the handling of this case.
Regards, Mark.
On 3 Jun 2021, at 3:07 PM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
I'd give the bit timing a try first, the MCP2515 seems to be very sensitive to this. I've even had some trouble finding a working configuration for the 50 kbit timing I added a couple of weeks ago.
We currently use 00 / D0 / 82, which is also the result of the old Intrepid timing calculator. That's a propagation segment of 1 Tq and 3 Tq per phase, resulting in sampling between 50% and 62.5%.
The Arduino MCP CAN lib by Cory Fowler also had this previously, but then changed in…
https://github.com/coryjfowler/MCP_CAN_lib/commit/ece730cf697fef1cbe8a901116...
…to 00 / CA / 81, which is a propagation segment of 3 Tq and 2 Tq per phase, shifting the sampling window to 62.5 - 75%.
Our current configuration scheme for the internal SJA1000 compatible CAN seems to sample at 62.5 - 75% as well, so that would also match.
Regards, Michael
<MCP2515Calc-1000kbit.ods>
I haven’t seen any further issues, so I have merged the spimaster branch into master.
Regards, Mark.
On 19 Jan 2022, at 6:28 PM, Michael Balzer <dexter@expeedo.de> wrote:
Signed PGP part I recently have begun using CAN3 on my car to log the original OCU CAN communication.
I'm using the 'spimaster' branch for this now (with latest master merged in) and everything looks fine.
I cannot test high volume performance in my setup though, as the bus is only running at 100 kbit.
Regards, Michael
On 04.01.22 at 02:09, Mark Webb-Johnson wrote:
An update on this.
Working with another developer, we have made some changes in a ’spimaster’ branch:
Stop using spi_nodma fork of ESP’s standard spi code, and switch back to use the standard ESP IDF spi master.
To support >3 devices (which the ESP IDF spi master doesn’t, due to hardware limitations of the CS lines and the 3x DMA channels), change to use a software CS line for the MAX7317 driver (the MCP2515s continue to use hardware CS).
Confirm the changes to our MCP2515 driver related to keeping track of the last buffer read, to solve the out-of-order issue.
Confirm the fix for another related issue where we don’t block (delay) if the can tx queue is full.
These seem better now, and I am able to establish a CAN IP connection over MCP2515. Frames come in order, and we are seeing performance around 700 frames/second - which should be adequate for our needs.
I’ll do some more testing over the next few days, and if no issues found merge back to master.
Regards, Mark.
On 7 Jun 2021, at 12:16 AM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
I've just found a spot-on post on this issue:
https://www.microchip.com/forums/tm.aspx?m=620741
Tom suggests implementing a state machine to reproduce the receive order. His analysis & solution looks sound to me.
Regards, Michael
On 06.06.21 at 14:50, Mark Webb-Johnson wrote:
I spent quite a bit of time on this. With my standard test packet of 11 CAN frames expected, and the standard driver, I get perhaps 4 or 5 of them (about half are lost, and some are out of order).
I made the suggested change to move the MyCan.IncomingFrame() call out of the ‘can’ object (when frameReceived is true) to within the mcp2515 AsynchronousInterruptHandler itself. That allows the handler to receive more than one frame per call and is a very simple change. Once that is done, we can at least now try to tune it.
So I then modified the code of mcp2515 AsynchronousInterruptHandler to loop so long as the interrupt flag says either buffer #0 or #1 has a frame. The result looks something like this:
D (63192) mcp2515: AsynchronousInterruptHandler instat=01 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=54)
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=40 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=40)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=45)
D (63192) mcp2515: AsynchronousInterruptHandler Clear RX buffer #0 overflow flag
D (63192) mcp2515: AsynchronousInterruptHandler instat=23 errflag=00 txb0ctrl=00
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #0 (ID 0x110 B1=24)
D (63192) mcp2515: AsynchronousInterruptHandler rx frame from buffer #1 (ID 0x110 B1=34)
D (63192) mcp2515: AsynchronousInterruptHandler instat=20 errflag=00 txb0ctrl=00
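The instat and errflag bytes in this debug output can be decoded bit by bit. The following is a minimal sketch, not code from the driver: the helper name decode_bits and the two name tables are made up for illustration, and the bit positions are taken from the MCP2515 CANINTF and EFLG register layouts.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bit names, index = bit position (MCP2515 CANINTF and EFLG registers).
static const char* const CANINTF_BITS[8] = {
  "RX0IF", "RX1IF", "TX0IF", "TX1IF", "TX2IF", "ERRIF", "WAKIF", "MERRF"};
static const char* const EFLG_BITS[8] = {
  "EWARN", "RXWAR", "TXWAR", "RXEP", "TXEP", "TXBO", "RX0OVR", "RX1OVR"};

// Hypothetical helper: returns the names of all set bits joined by '|',
// or "-" when no bit is set.
std::string decode_bits(uint8_t value, const char* const names[8]) {
  std::string out;
  for (int bit = 0; bit < 8; ++bit) {
    if (value & (1u << bit)) {
      if (!out.empty()) out += "|";
      out += names[bit];
    }
  }
  return out.empty() ? "-" : out;
}
```

For the log line with instat=23 errflag=40, decode_bits(0x23, CANINTF_BITS) gives RX0IF|RX1IF|ERRIF and decode_bits(0x40, EFLG_BITS) gives RX0OVR, which agrees with the breakdown of 0x23401c01 given elsewhere in this thread.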
The actual frames on the bus are (B1 values) 54, 45, 40, 0a, 7a, d5, 0c, 14, 1c, 24, 2c, and 34. Looking at the above debug output, we get:
Interrupt flags show buffer #0 has a frame. It is B1=54. Good.
Interrupt flags show buffers #0 and #1 both have frames. Buffer #0 has B1=40 and buffer #1 has B1=45.
Etc etc
That is not good. What must have happened is that the first B1=54 frame arrived, got put in buffer #0, and the interrupt was raised. We checked the interrupt flags, found buffer #0 had something, and read the frame ok. All is good. But between the time we checked the interrupt flags and the time we finished reading the 13 bytes from buffer #0, a second frame arrived and was put in buffer #1. Then a third frame arrived and was put in buffer #0. We loop back to check the interrupt flags and find both buffers have frames ready. So we read buffer #0 to get the third frame, then buffer #1 to get the second frame. We are out of sequence.
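The interleaving described here can be replayed with a toy model. This is only a sketch under stated assumptions: RxBuffers and replay are hypothetical names, the model knows nothing about SPI timing, and the "read RXB1 first if RXB0 was the last buffer read" rule is one guess at what tracking the last buffer read could look like, not the driver's actual code. That rule is also not a complete ordering solution: if both buffers fill from empty after RXB0 was read last, it would pick the wrong buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of the two receive buffers with rollover: a new frame goes to
// RXB0 if it is free, otherwise to RXB1 if that is free, otherwise it is lost.
struct RxBuffers {
  bool full[2] = {false, false};
  uint8_t b1[2] = {0, 0};  // first data byte of the stored frame

  void arrive(uint8_t v) {
    for (int n = 0; n < 2; ++n) {
      if (!full[n]) { full[n] = true; b1[n] = v; return; }
    }
    // both buffers full: the frame is dropped
  }
  uint8_t read(int n) { full[n] = false; return b1[n]; }
};

// Replays the sequence from the message above and returns the order in
// which the handler sees the frames.
std::vector<uint8_t> replay(bool track_last_buffer) {
  RxBuffers rx;
  std::vector<uint8_t> seen;

  rx.arrive(0x54);              // frame 1 lands in RXB0, interrupt raised
  // Handler polls the flags (only RXB0 full) and starts reading RXB0.
  rx.arrive(0x45);              // frame 2 arrives mid-read, rolls over to RXB1
  seen.push_back(rx.read(0));   // read completes, RXB0 is released
  int last_read = 0;
  rx.arrive(0x40);              // frame 3 lands in the now-free RXB0

  // Second poll: both buffers are full.
  bool both = rx.full[0] && rx.full[1];
  int first = (track_last_buffer && both && last_read == 0) ? 1 : 0;
  seen.push_back(rx.read(first));
  seen.push_back(rx.read(1 - first));
  return seen;
}
```

With replay(false) the handler sees 54, 40, 45, the same swap as in the debug output; with replay(true) it sees 54, 45, 40, which is the bus order for this particular interleaving.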
By removing the ESP_LOGD statements, I can improve performance enough to get 10 out of the 11 frames, but still sometimes frames are swapped in order.
By over-clocking the MCP2515 SPI bus (supposed to be 10MHz, but I push it to 15MHz), I can get all 11 frames, but two are out of order.
I suppose I can minimise the chance of the out-of-order issue by repeating the call to read interrupt flags after processing buffer #0 but before checking for buffer #1. That would at least reduce the time window to as small as possible, but would be another SPI call and is too slow. Doing that brings us back to losing frames.
Another approach may relate to our current use of the READ command to read 5 status registers (interrupt flags, error flags, two skipped, then transmit buffer #0 flags). There are two specific commands ‘read status’ (which gets the rx and tx buffer status flags in one byte), and ‘rx status’ (which gets just the receive buffer status and some info on the frames received, again in one byte). I think those are more designed for what we are trying to do. I can try to optimise the read loop at the start of the AsynchronousInterruptHandler to use one of those - they are 2 SPI bytes vs 7 for what we are doing at the moment (so more than three times as fast).
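To make the byte counts and the one-byte status formats concrete, here is a small sketch. The struct and function names are invented for illustration; the opcodes and bit layouts are those of the MCP2515 SPI instruction set (READ, READ STATUS, RX STATUS), and nothing here is taken from the driver source.

```cpp
#include <cassert>
#include <cstdint>

// MCP2515 SPI instruction opcodes.
constexpr uint8_t MCP_READ        = 0x03;
constexpr uint8_t MCP_READ_STATUS = 0xA0;
constexpr uint8_t MCP_RX_STATUS   = 0xB0;

// SPI bytes clocked per transaction.
constexpr int read_cmd_bytes(int nregs) { return 2 + nregs; }  // opcode + address + data
constexpr int status_cmd_bytes() { return 2; }                 // opcode + one status byte

// Decoded RX STATUS byte (hypothetical struct name).
struct RxStatus {
  bool rxb0;       // bit 6: message in RXB0
  bool rxb1;       // bit 7: message in RXB1
  bool extended;   // bit 4: extended frame
  bool remote;     // bit 3: remote frame
  uint8_t filter;  // bits 2..0: filter match code
};

constexpr RxStatus decode_rx_status(uint8_t s) {
  return RxStatus{(s & 0x40) != 0, (s & 0x80) != 0, (s & 0x10) != 0,
                  (s & 0x08) != 0, static_cast<uint8_t>(s & 0x07)};
}

// READ STATUS byte: bits 0 and 1 mirror CANINTF.RX0IF and CANINTF.RX1IF.
constexpr bool status_rx0if(uint8_t s) { return (s & 0x01) != 0; }
constexpr bool status_rx1if(uint8_t s) { return (s & 0x02) != 0; }
```

Reading five registers with READ costs read_cmd_bytes(5) = 7 bytes against status_cmd_bytes() = 2 for either status command, which is the 7 vs 2 comparison made above.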
I think it will also be worthwhile having a look at some other open source mcp2515 drivers to see how other people are doing it.
Regards, Mark.
On 4 Jun 2021, at 3:02 PM, Mark Webb-Johnson <mark@webb-johnson.net> wrote:
The handler can only return one frame. As it is, if both buffers #0 and #1 have a frame, it returns #0. I am not sure if it gets called again (seems to depend on the interrupt gpio status).
// Read the interrupt pin status and if it's still active (low), require another interrupt handling iteration
return !gpio_get_level((gpio_num_t)m_intpin);
Maybe a quick solution is to just return true, immediately after *frameReceived=true, if intflag=0x01 and (intstat & CANINTF_RX1IF)? That would dispatch the incoming frame, then call back for more (from the loop in the can object).
I am not sure in general why AsynchronousInterruptHandler uses a bool frameReceived flag, and doesn’t just simply dispatch the frame immediately to the can object? That would simplify things and allow the AsynchronousInterruptHandler to handle receiving both frames in the same call. Given that MCP2515 is the only driver using AsynchronousInterruptHandler, that would be an easy fix.
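The idea of dispatching frames directly from the handler could look roughly like this. It is a sketch only: Frame, FakeRxBuffers and drain_and_dispatch are hypothetical stand-ins rather than the real can/mcp2515 classes, and the dispatch callback takes the place of whatever the can object would do with an incoming frame.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

struct Frame { uint32_t id; uint8_t b1; };

// Hypothetical stand-in for the controller's two receive buffers.
struct FakeRxBuffers {
  bool full[2] = {false, false};
  Frame frame[2] = {};
};

// Drains every buffer that holds a frame and hands each one straight to
// `dispatch`, instead of reporting a single frame through a bool flag.
// Returns the number of frames dispatched in this call.
int drain_and_dispatch(FakeRxBuffers& rx,
                       const std::function<void(const Frame&)>& dispatch) {
  int handled = 0;
  for (int n = 0; n < 2; ++n) {
    if (rx.full[n]) {
      dispatch(rx.frame[n]);
      rx.full[n] = false;
      ++handled;
    }
  }
  return handled;
}
```

With both buffers full, a single call dispatches two frames, so the caller no longer has to loop on a frameReceived flag to pick up the second one.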
Regards, Mark.
On 4 Jun 2021, at 2:29 PM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,
the handler is meant to read both buffers sequentially, and on a quick glance I don't see why it wouldn't. But it can't hurt if you do an audit of the code.
I remember we had this out-of-order discussion about handling both RX buffers here before, but I don't remember the outcome. Too bad the list archives cannot be searched.
I think it was the MCP not doing overflows from RX buffer 1 to 0. I.e. if buffer 1 still has a frame on arrival, the new frame will be lost. That means losing a frame if the handler cannot react fast enough, but receiving out of order would be worse.
Regards, Michael
On 04.06.21 at 04:16, Mark Webb-Johnson wrote:
Michael,

Good suggestion on the timing. I think it best to use the same timings as the Arduino library, and have committed that change. No vehicle modules currently use 1Mbps on MCP2515 anyway. Unfortunately, it didn’t resolve my problem.

Looking at the error flags I see:

Error flag: 0x23401c01

intstat 0x23
  ERRIF Error Interrupt pending
  RX0IF Rx buffer 0 full interrupt
  RX1IF Rx buffer 1 full interrupt

errflag 0x40
  RX0OVR Rx buffer 0 overflow

intflag 0x1c01
  0x01 Implied from Rx buffer 0 full

  0x1c = 0001 1100
    Means RXB0 overflow. No data lost in this case (it went into RXB1)
    Means (errflag & EFLG_RX01OVR), clear RX buffer overflow flags
    Means (intstat & (CANINTF_MERRF | CANINTF_WAKIF | CANINTF_ERRIF)), clear error & wakeup interrupts

So we have CAN frames in BOTH rx buffers #0 and #1. Looking at our driver code (mcp2515::AsynchronousInterruptHandler), it seems in that case we only read from buffer #0. From the flow I can see, we are going to lose that second frame. We’re not really handling the issue of two frames being in the buffers when the interrupt handler is called.

As the architecture of mcp2515::AsynchronousInterruptHandler can only receive one frame, it is not so simple to fix. We could simply read and return the frame in buffer #0, requesting to be called again (return true), but another frame may arrive (into buffer #0) before we get called again, and that is going to result in out-of-order frames.

I’ll work on improving the handling of this case.

Regards, Mark.

On 3 Jun 2021, at 3:07 PM, Michael Balzer <dexter@expeedo.de> wrote:
Mark,

I'd give the bit timing a try first, the MCP2515 seems to be very sensitive to this. I even had some trouble finding a working configuration for the 50 kbit timing I added a couple of weeks ago.

We currently use 00 / D0 / 82, which is also the result of the old Intrepid timing calculator. That's a propagation segment of 1 Tq and 3 Tq per phase, resulting in sampling between 50% - 62.5%.

The Arduino MCP CAN lib by Cory Fowler also had this previously, but then changed in…

https://github.com/coryjfowler/MCP_CAN_lib/commit/ece730cf697fef1cbe8a90111694868168d41000

…to 00 / CA / 81, which is a propagation segment of 3 Tq and 2 Tq per phase, shifting the sampling window to 62.5 - 75%.

Our current configuration scheme for the internal SJA1000 compatible CAN seems to sample at 62.5 - 75% as well, so that would also match.

Regards,
Michael
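The two register sets compared in Michael's message of 3 Jun 2021 (00 / D0 / 82 versus 00 / CA / 81) can be checked with a little arithmetic. The sketch below assumes the MCP2515 CNF2/CNF3 field layout (PRSEG in CNF2 bits 2..0, PHSEG1 in bits 5..3, SAM in bit 6, PHSEG2 in CNF3 bits 2..0, each segment stored as length minus one); the struct and function names are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Segment lengths in time quanta (Tq); the sync segment is always 1 Tq.
struct BitTiming {
  int prop_seg;
  int phase1;
  int phase2;
  bool triple_sample;  // SAM bit: bus sampled three times
};

// Decode the CNF2 and CNF3 register values into segment lengths.
constexpr BitTiming decode_cnf(uint8_t cnf2, uint8_t cnf3) {
  return BitTiming{(cnf2 & 0x07) + 1, ((cnf2 >> 3) & 0x07) + 1,
                   (cnf3 & 0x07) + 1, (cnf2 & 0x40) != 0};
}

// Total bit time in Tq: sync + propagation + phase 1 + phase 2.
constexpr int total_tq(const BitTiming& t) {
  return 1 + t.prop_seg + t.phase1 + t.phase2;
}

// Sample point (end of phase 1) as a percentage of the bit time.
constexpr double sample_point_pct(const BitTiming& t) {
  return 100.0 * (1 + t.prop_seg + t.phase1) / total_tq(t);
}
```

decode_cnf(0xD0, 0x82) gives 1 + 3 + 3 Tq plus the sync segment, 8 Tq in total with the sample point at 62.5%; decode_cnf(0xCA, 0x81) gives 3 + 2 + 2 Tq, also 8 Tq, with the sample point at 75%. These are the upper ends of the two sampling windows quoted in the message, and both settings have the triple-sampling bit set.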
participants (6)
- Collin Kidder
- Craig Leres
- Greg D
- Greg D.
- Mark Webb-Johnson
- Michael Balzer