[Ovmsdev] sprintf / crashes
mark at webb-johnson.net
Thu Dec 20 09:29:40 HKT 2012
I think our revisions overlapped. I'd just merged in all your previous changes, plus documentation and server code updates.
I did make some minor layout fixes (where the indentation was wrong; I think you are using 4 spaces, or perhaps tabs, whereas the rest of the project uses 2). I also added your pseudo-command #6 to the protocol document.
Can you merge in my changes, then re-push yours? I'd like to get this v2 branch complete today and merge back into master tonight (my time).
Going forward, do you still want to maintain and work off your clone, or would it be easier if I just gave you write access to the main project? Your contributions are so helpful and good that there is little I have to do other than accept them :-)
For the watchdog reboots and occasional RAM trashing, I too suspect the NET, NET_MSG or SMS code. That is the only place where externally supplied input produces variable-length strings. I did review it a while ago, but didn't see anything obvious. The other possibility is sprintf() elsewhere in the code (such as STAT).
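As a sketch of the defensive pattern this suggests, externally supplied text (an SMS or NET_MSG parameter, say) can be copied into a fixed global buffer with an explicit size limit, so over-long input is truncated rather than spilling into neighbouring globals. The buffer size and the names net_param and copy_bounded() are hypothetical, not taken from the OVMS source:

```c
#include <stddef.h>

/* Hypothetical fixed-size global buffer, in the spirit of the project's
 * preference for globals over large stack locals. */
#define PARAM_SIZE 16
char net_param[PARAM_SIZE];

/* Copy a NUL-terminated, externally supplied string into dst, where cap is
 * the total size of dst including the terminator. Always NUL-terminates
 * when cap > 0. Returns 1 if the input did not fit (or if there is no room
 * even for the terminator), 0 otherwise. */
int copy_bounded(char *dst, size_t cap, const char *src)
{
    size_t i = 0;

    if (cap == 0)
        return 1;
    while (src[i] != '\0' && i + 1 < cap) {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return src[i] != '\0';   /* characters left over means truncation */
}
```

The return value lets the caller reject an over-long parameter instead of acting on a truncated one; which behaviour is right depends on the command.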
About a month ago, running v2.1.1 in my car, I saw the firmware version go bizarre:
2012-11-21 07:32:06.388834 -0500 info main: #74 C EV915 rx msg F 2.1.1/V2,SFZRE8B15B3000569,1,1,TR2N,3(2G)
2012-11-21 07:34:58.845568 -0500 info main: #61 C EV915 rx msg F 49.51.51/V2,SFZRE8B15B3000569,1,1,TR2N,3(2G)
That string is generated by a sprintf().
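The numbers in the second log line are suggestive: 49 and 51 are the ASCII codes of '1' and '3'. As a host-side illustration only, assuming (hypothetically) that the version is held as three raw bytes and printed with a "%d.%d.%d" template, text landing on those bytes, for example from an overrunning write into a neighbouring buffer, reproduces the logged value exactly. The names fw_version, format_version() and trash_version() are invented for this sketch, and the text "133" is arbitrary:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical storage: major, minor and build as three raw bytes. */
unsigned char fw_version[3] = { 2, 1, 1 };

/* Format the version the way a "%d.%d.%d" sprintf() template would. */
void format_version(char *out, size_t cap)
{
    snprintf(out, cap, "%d.%d.%d", fw_version[0], fw_version[1], fw_version[2]);
}

/* Simulate a stray string write landing on the version bytes. */
void trash_version(void)
{
    memcpy(fw_version, "133", 3);   /* writes '1', '3', '3' = 49, 51, 51 */
}
```

Calling format_version() before trash_version() gives "2.1.1"; calling it afterwards gives "49.51.51".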
On 20 Dec, 2012, at 9:13 AM, Michael Balzer <dexter at expeedo.de> wrote:
> I've rewritten all my sprintf() calls now. I introduced a new family of general string utility functions to make avoiding sprintf() easier; see my utils module addition.
> I've had no garbled strings since and can now fetch all my history rows from the server, so it had some positive effect.
> But the watchdog timeout reboots still occur, they still occasionally trash variables, and once I still got the STKUNF (stack underflow) flag after the reboot. That feels like an uninitialised pointer or a write beyond array/string bounds. I'm about to review the basic net code; if you have an idea where to look first, tell me.
> An example of the RAM trashing can still be seen on the server: MP-0 W17.4,8,17.4,8,17.4,8,17.4,8,-1
> When that occurred, a lot of other data was displayed wrong as well.
> The TPMS vars never get written to by the twizy module. The values 17.4 & 8 mean both car_tpms arrays had been filled completely with 0x30, i.e. '0'. Does that ring a bell?
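A quick check of the 0x30 reading, as a sketch: the scaling below (raw value × 2.5 kPa, shown in PSI, and raw value minus 40 for °C) is an assumption chosen because it reproduces the server's 17.4 and 8 from a 0x30 byte; it is not taken from the twizy or OVMS source. Under the same assumption, a fill byte of '8' (0x38) would have produced 20.3 and 16 instead.

```c
#include <stdio.h>

#define KPA_PER_PSI 6.894757   /* 1 psi = 6.894757 kPa */

/* ASSUMED scaling, for illustration only: one raw pressure unit = 2.5 kPa. */
double tpms_psi(unsigned char raw)
{
    return raw * 2.5 / KPA_PER_PSI;
}

/* ASSUMED scaling, for illustration only: temperature offset of 40 C. */
int tpms_temp_c(unsigned char raw)
{
    return (int)raw - 40;
}

/* Render one "psi,temp" pair the way the server list shows it. */
void tpms_format(char *out, size_t cap, unsigned char p_raw, unsigned char t_raw)
{
    snprintf(out, cap, "%.1f,%d", tpms_psi(p_raw), tpms_temp_c(t_raw));
}
```

With both raw bytes set to 0x30 this yields "17.4,8", matching every pair in the W record above.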
> On 18.12.2012 02:39, Mark Webb-Johnson wrote:
>> In general, I try to minimise stack and ram usage on the small PICs.
>> Early on in OVMS, we had a bunch of local variables, and some were quite large. We were getting all sorts of random weird behavior (reboots, corrupt messages, etc.). Since we changed to global variables and very limited use of stacked function calls and local variables, things have been much better.
>> I agree that a large sprintf may be the cause of your problems. Can you try to change to itoa() and strcat(), to see if it makes an impact?
>> Regards, Mark.
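A minimal sketch of the itoa()/strcat() approach, assuming a fixed global output buffer. Since itoa() is not standard C and its signature differs between compilers, the integer conversion here is a hand-written, hypothetical int_to_dec() rather than the C18 library call, and msg_append() checks the remaining space before copying, which plain strcat() does not. All names and the buffer size are illustrative:

```c
#include <string.h>

/* Hypothetical global message buffer (globals instead of large locals). */
#define MSG_SIZE 32
char msg_buf[MSG_SIZE];

/* Convert a signed int to decimal text in out; 12 bytes suffice for a
 * 32-bit int ("-2147483648" plus the terminator). Returns out. */
char *int_to_dec(int value, char *out)
{
    char tmp[12];
    /* Unsigned negation is well defined, so INT_MIN is handled too. */
    unsigned int u = (value < 0) ? 0u - (unsigned int)value : (unsigned int)value;
    int i = 0, j = 0;

    do {                          /* emit digits least significant first */
        tmp[i++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);
    if (value < 0)
        out[j++] = '-';
    while (i > 0)                 /* copy digits back in the right order */
        out[j++] = tmp[--i];
    out[j] = '\0';
    return out;
}

/* strcat() with a size check: appends s to msg_buf only if it fits.
 * Returns 1 on success, 0 (buffer unchanged) if s would overflow. */
int msg_append(const char *s)
{
    size_t used = strlen(msg_buf);
    size_t need = strlen(s);

    if (used + need + 1 > MSG_SIZE)
        return 0;
    memcpy(msg_buf + used, s, need + 1);   /* includes the terminator */
    return 1;
}
```

Building a record then becomes a chain of msg_append() calls, e.g. msg_append("MP-0 W") followed by msg_append(int_to_dec(17, num)), with no large formatter frame on the 256-byte stack.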
>> On 18 Dec, 2012, at 8:05 AM, Michael Balzer wrote:
>>> The client_app.pl hint was good; I had not recognized that as a server query utility yet.
>>> I removed the comma (I had misread the draft) and can now see my H entries. However, that led me back to my assumed connectivity issue:
>>> MP-0 c31,0,2,6,RTPWR-BattCell,1,1,76,2012-12-17 23:18:43,2012-12-17 23:18:43
>>> MP-0 c31,0,3,6,RT-PWR-BattCell,14,16,1215,2012-12-17 23:13:11,2012-12-17 23:18:43
>>> MP-0 c31,0,4,6,RT-PWR-BattC�ll,1,1,65,2012-12-17 23:18:43,2012-12-17 23:18:43
>>> MP-0 c31,0,5,6,RT-PWR-BattPack,1,2,202,2012-12-17 23:13:11,2012-12-17 23:18:43
>>> MP-0 c31,0,6,6,RT-PWR-Usag,1,1,45,2012-12-17 23:13:11,2012-12-17 23:13:11
>>> This C31 result shows all kinds of garbled characters in my module's messages, and even a truncation of "RT-PWR-UsageStats" (with parts of the data blob missing on that one as well).
>>> Now that's a bit odd, and it most probably cannot be attributed to a GPRS link failure, as that would not garble single bytes in a TCP connection.
>>> More than once I could fix similar output problems in DIAG mode by simplifying complex sprintf() calls, so I searched for C18 sprintf() stack usage. I found nothing concrete, but many warnings about the very high stack usage of the whole printf family, plus advice not to use it at all on small embedded systems. One source said sprintf() needs 70+ bytes of stack even for a simple integer template.
>>> I have also read a bit about C18 software stack management and found my earlier assumption to be correct: the stack is currently fixed to bank 12 (0xC00), so it provides 256 bytes for all parameters and local variables combined. I think sprintf() on a 256-byte stack could well be a source of problems, and stack overruns can produce weird effects like those above. I am thinking about rewriting all my sprintf calls to use itoa/ltoa/ultoa, but I find it strange that they did no harm until now, even with complex templates such as in net_msgp_environment(). Or maybe they did, unrecognized?
>>> Do you have some other info on C18 sprintf()? I'd rather avoid recoding every output without sprintf(), but that's my best bet currently...
> Michael Balzer * Paradestr. 8 * D-42107 Wuppertal
> Fon 0202 / 272 2201 * Handy 0176 / 206 989 26
> OvmsDev mailing list
> OvmsDev at lists.teslaclub.hk