Wow. Just wow.

$ ssh jack@1.2.3.217
The authenticity of host '1.2.3.217 (1.2.3.217)' can't be established.
RSA key fingerprint is SHA256:Fv1cgvPKmGoojR2nGa+/rnCvu7N0Wv/4pr8NOwHpgn0.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '1.2.3.217' (RSA) to the list of known hosts.
WolfSSH Server
jack@1.2.3.217's password:

Welcome to the Open Vehicle Monitoring System (OVMS) - SSH Console
OVMS >

Stephen Casner, you are without a doubt ‘da man.

Very very cool.
Mark,

Thanks for the compliments. This integration of WolfSSH was a significant struggle for various reasons that I will describe here for their possible entertainment value :-) as well as to note some additional debugging approaches.

Integration of the WolfSSL and WolfSSH libraries themselves into the OVMS build did not take too long. The build process for those libraries is implemented with autoconf and automake and the resulting configure script, which doesn't fit with the ESP-IDF build structure. My first step was to just build those two libraries and the example SSH echo server program on my Mac OSX laptop using the configure script as designed. Then I was able to run configure again with the option --enable-cryptonly to limit the build to just that portion needed for SSH, plus the --disable-silent-rules option (or equivalently run "make V=1") to show the full command lines during the make. That let me see which object files were needed out of the big WolfSSL tree and what -D defines were used during the make. I also looked at the defines in the config.h file that was produced to see which were relevant. At first I added those defines to CFLAGS in the component.mk files, but later moved them to the user_settings.h file that the Wolf sources are set up to include if WOLFSSL_USER_SETTINGS is defined (in the component.mk), so they would also be in effect when the library headers were included in the console_ssh.cpp compilation.

The way the WolfSSH code is written, there are many instances where the compiler detects potential references to uninitialized variables even though the code structure ensures that won't happen. At first I tried to add initializations, but then gave up and added the CFLAGS option -Wno-maybe-uninitialized to the component.mk file.
I also needed to define the COMPONENT_EXTRA_INCLUDES variable in component.mk to reference the directory where the FreeRTOS includes are located in ESP-IDF because, although WolfSSL does support FreeRTOS, their #include statements expect the files one level up. I also needed to list the object files explicitly to avoid a warning about direct compilation of misc.c, because it is intended to be compiled only as part of other .c files that #include it. With those additions to the component.mk files I was able to compile wolfssl and wolfssh without any changes to the files in the distributions.

I created console_ssh from console_telnet plus code snippets from the WolfSSH example echoserver program, then I realized that I needed to add a define NO_DEV_RANDOM since there is no /dev/random facility. I'm temporarily using the non-random test generator, but need to implement access to the ESP32 random facility. I worked through several variations in the transition from the design for telnet, where the library does not do any of the I/O itself, to ssh, where it does.

At that point the project became more difficult. When connecting from the client on the Mac it would sometimes progress to the point of getting a command prompt, but often there were complaints like "incorrect signature" or "Bad packet length 2263058189". There were also some crashes suggesting corrupted memory. I turned on the debug logging in wolfssh and in the ssh client, but that did not give enough details about the raw packets. I installed PuTTY (which also did not go smoothly) since it does provide logging of the raw packets.
I came to the conclusion that the complaints about packets were the result of memory corruption. To find out where in the wolfssh code this was occurring, I added calls in several places to a heap debug routine that checks the integrity of all free and allocated blocks, records a tag number from the call into a ring buffer to track which instance of the call it was, and aborts when a memory error is found. That way I could see which was the last call before the problem and the first one after. Over a bunch of runs the results varied. Then I began to see some cases where the sequence of tags should not be possible according to the code structure.

I expanded the wolfssh debug logging to add a task name to the log line. That exposed the root problem: I had put some of the initialization code in the constructor and initialization parts of the SSHConsole task that are executed by the parent process before the SSHConsole task is created. Then the SSHConsole task could be created and start sending the command line introduction before the initialization was finished, causing two tasks to be accessing the same thread context in the library. That's not safe. This sounds straightforward to deduce now, but it took a while to run test cases and dig through the details before I realized the problem.

Progress was also impeded by a bug in gdb that causes it to get stuck in an infinite loop. I kept hitting this when trying to look at the memory problems, so I had to digress into debugging gdb using the Mac's lldb. Naturally the gdb is built without debugging symbols, so I had to debug using disassembly. It turns out that the bug is one that was fixed in 2009 in the libiconv library. I'm not sure if this is true on all platforms, but the xtensa-esp32-elf-gdb binary supplied for the Mac references /usr/lib/libiconv.2.dylib, which is version 7.0.0.
MacPorts supplied a newer one, 9.0.0, but the /usr/lib directory is locked down on the Mac so you can't change it even with sudo, and I could not find a tool to edit the gdb binary. So I tried following the ESP32 documentation for building the toolchain from sources. After several minutes that got most of the way through but failed on the gdb step with a complaint from the configure script that python was not available or not usable. I knew that should not be true, so I digressed into debugging the configure script (which, if you've tried this, you know is almost impossible to read). Again by adding V=1 to the make command that invokes configure, I figured out that the problem was that the configure script was including a -u option to gcc when compiling the test program for that configure step, but on the Mac gcc is symlinked to clang, which does not understand the -u option. I figured out how to hack the configure script to omit that (unnecessary) option and then manually ran configure with an option to get libiconv from where MacPorts puts it in /opt/local/lib, followed by make to build gdb. Now I have a gdb that does not get stuck in the infinite loop.

If any of you want to use gdb and hit this problem, I can assist with fixing it.

-- Steve

On Wed, 15 Nov 2017, Mark Webb-Johnson wrote:
Wow. Just wow.
$ ssh jack@1.2.3.217
The authenticity of host '1.2.3.217 (1.2.3.217)' can't be established.
RSA key fingerprint is SHA256:Fv1cgvPKmGoojR2nGa+/rnCvu7N0Wv/4pr8NOwHpgn0.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '1.2.3.217' (RSA) to the list of known hosts.
WolfSSH Server
jack@1.2.3.217's password:

Welcome to the Open Vehicle Monitoring System (OVMS) - SSH Console
OVMS >
Stephen Casner, you are without a doubt ‘da man.
Very very cool.