GNU Toolchain Cross-Compile Challenges

For the last several days I've been working to compile the latest free GNU Toolchain for ARM published by CodeSourcery (now owned by Mentor Graphics).

This process has not been easy.  In this lengthy blog post, I'll share all the patches I've written, with detailed explanations of exactly what errors I encountered, what I've learned about each problem, and how to work around it.

Edit: full source is on GitHub.

"Read mode" for all the gory details......

Why CodeSourcery's GNU Toolchain?

You might be wondering why I chose CodeSourcery's version of the toolchain rather than the official GNU sources.  The simple answer is that CodeSourcery is the primary contributor of ARM support for microcontrollers.  Their code contains the best support for the latest ARM microcontrollers.  They publish only 2 binaries, for Windows and 32 bit Linux.  If you want 64 bit Linux and Mac versions (as I do), you need to compile your own.

 

Mentor's Build Script

As required by the open source licenses, they publish the full source.  They also publish a huge script with the exact commands they used to build the toolchain.  The script comes with this comment:

# This file contains the complete sequence of commands
# Mentor Graphics used to build this version of Sourcery CodeBench.
#
# For each free or open-source component of Sourcery CodeBench,
# the source code provided includes all of the configuration
# scripts and makefiles for that component, including any and
# all modifications made by Mentor Graphics.  From this list of
# commands, you can see every configuration option used by
# Mentor Graphics during the build process.
#
# This file is provided as a guideline for users who wish to
# modify and rebuild a free or open-source component of
# Sourcery CodeBench from source. For a number of reasons,
# though, you may not be able to successfully run this script
# directly on your system. Certain aspects of the Mentor Graphics
# build environment (such as directory names) are included in
# these commands. Mentor Graphics uses Canadian cross compilers so
# you may need to modify various configuration options and paths
# if you are building natively. This sequence of commands
# includes those used to build proprietary components of
# Sourcery CodeBench for which source code is not provided.
#
# Please note that Sourcery CodeBench support covers only your
# use of the original, validated binaries provided as part of
# Sourcery CodeBench -- and specifically does not cover either
# the process of rebuilding a component or the use of any
# binaries you may build.  In addition, if you rebuild any
# component, you must not use the --with-pkgversion and
# --with-bugurl configuration options that embed Mentor Graphics
# trademarks in the resulting binary; see the "Mentor Graphics
# Trademarks" section in the Sourcery CodeBench Software
# License Agreement.

Indeed, the script is very useful as a guideline, but using it directly is pretty much impossible.  The script is obviously generated by another script, which isn't provided.  Everywhere, very long and specific full path names are embedded.  Custom versions of gcc, which aren't provided, are used by the script.  Those Mentor trademarks are also embedded in many places.  There are several sections compiling the proprietary components, which need to be removed.  You can read the script, but it's easier to write a new one than to try running the one Mentor provides.

It took me an entire day's work to deconstruct this giant script.  The most time consuming part was learning the purpose of each pathname.  All the tools are configured with prefix set to "/opt/codesourcery".  However, the script never actually uses that directory.  The actual toolchain is installed to "/scratch/jbrown/arm-eabi/install", using an incredible amount of tedious work on every "make install" to override all the defaults based on prefix.  For quite some time, the word "scratch" confused me about the real purpose of that directory.  Lots of other stuff that actually is scratch (or temporary stuff) goes into "/scratch/jbrown/arm-eabi/obj" and similar directories.  For example, static libraries are built into "/scratch/jbrown/arm-eabi/obj/pkg-2012.09-63-arm-none-eabi/arm-2012.09-63-arm-none-eabi.extras/host-libs-i686-pc-linux-gnu/usr".  Making sense of all these directories is the key.  Then it's merely an extremely long build script....

After a couple days, I managed to recreate the entire script in a format similar to the widely used "summon arm toolchain" script.  For compiling the Linux version on Ubuntu 12.04, I was able to resolve pretty much every problem in my script by just looking at exactly what options CodeSourcery used.

 

GMP Check Problems

Of course, the first person I shared the binary with could not use it.  His system was an older version of Fedora, which had an older version of the libc.  So I used a virtual machine with Ubuntu 10.04 to compile the code.  That's where the problems began!

The first error was in GMP, the GNU Multiple Precision Arithmetic library.  All recent versions of gcc use this library.  After you compile the library, it prints this helpful message.

+-------------------------------------------------------------+
| CAUTION:                                                    |
|                                                             |
| If you have not already run "make check", then we strongly  |
| recommend you do so.                                        |
|                                                             |
| GMP has been carefully tested by its authors, but compilers |
| are all too often released with serious bugs.  GMP tends to |
| explore interesting corners in compilers and has hit bugs   |
| on quite a few occasions.                                   |
|                                                             |
+-------------------------------------------------------------+

Of course, I included a "make check" in my script.  It ran without any error in Ubuntu 12.04, but failed with these errors on 10.04 in the virtual machine:

mpz_inp_str nread wrong
  inp         "0"
  base        10
  pre         0
  post        1
  got_nread   0
  want_nread  1
/bin/bash: line 5: 10113 Aborted                 ${dir}$tst
FAIL: t-inp_str
.........................
t-io_raw.c:102: GNU MP assertion failed: ! ferror(fp)
/bin/bash: line 5: 10223 Aborted                 ${dir}$tst
FAIL: t-io_raw
.........................
======================================================================================
2 of 59 tests failed
Please report to gmp-bugs@gmplib.org, see http://gmplib.org/manual/Reporting-Bugs.html
======================================================================================

Searching for the text of these errors turns up lots of web pages and email list conversations, with no useful answers.  That's why I'm writing this blog post, with the actual messages and an explanation of what causes them and how to work around them.... so hopefully anyone hitting these errors will find this info.

At first, not finding any info about bugs in GMP or workarounds for these issues, I concluded it must be some bug in the compiler that was present in Ubuntu 10.04 but fixed by 12.04.  I installed another virtual machine with 10.10, with the same result.  Then 11.04, and then 11.10.  It seemed the problem must have been fixed between 11.10 and 12.04, since 12.04 running on my machine produced no errors.  For the sake of completeness (and also because I'd been up all night installing virtual machines, to the point Robin was about to wake up soon), I kept going and tried 12.04 in a virtual machine.

Ubuntu 12.04 failed GMP's "make check" inside VirtualBox, but passed the same check running natively!  Then I tried running those 2 specific test programs manually.  The copy compiled inside VirtualBox passes when run natively, but fails when run within the virtual machine.

Again, I searched on Google, trying dozens of different queries with fragments of the error messages, VirtualBox, Ubuntu 12.04 and many, many other seemingly relevant terms.  It's a recurring theme of GNU toolchain compile errors.... searching for pretty much any error always turns up at least a few people asking for help with that exact error message, but rarely is there ever any useful reply (in fact, most have no reply at all).

So I started digging into the t-io_raw.c and t-inp_str.c code, adding lots of printf() statements and looking at the test files they write with a hex editor.  It turned out the file was indeed longer than it should have been when running inside VirtualBox compared with running natively.  Adding "truncate" to more Google searches finally led to VirtualBox bug #9485.

https://www.virtualbox.org/ticket/9485
https://forums.virtualbox.org/viewtopic.php?f=3&t=44056

Indeed, the GMP tests use fopen(filename, "w+") to truncate the file back to zero bytes, before writing new test data.  Knowing this was the problem, I added remove(filename) in front of each fopen(filename, "w+").  There were several more places this occurs in the GMP tests.  See my patches attached at the end of this message for the full solution.
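If you suspect the same problem, a quick check along these lines might save a night of installing virtual machines.  This is only a sketch: check_truncate is a hypothetical helper name, and it relies on the shell's ">" redirection truncating an existing file, which may or may not exercise exactly the same code path as fopen(filename, "w+") on an affected VirtualBox setup.

```shell
# Hypothetical helper: report whether overwriting an existing file in DIR
# really truncates it.  Assumes a POSIX shell with printf, wc and tr.
check_truncate() {
    dir=${1:-.}
    f="$dir/truncate_test.$$"
    printf 'a much longer first line of test data\n' > "$f"  # create a long file
    printf 'x\n' > "$f"                                      # ">" should truncate it first
    size=$(wc -c < "$f" | tr -d ' ')
    rm -f "$f"
    if [ "$size" -eq 2 ]; then
        echo "ok"
    else
        echo "broken: expected 2 bytes, found $size"
    fi
}
```

Run it as check_truncate /path/to/build/dir from inside the virtual machine, pointing at the directory where the build runs.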

 

More VirtualBox Trouble

Unfortunately, another ugly VirtualBox bug stopped the build process.  At least this one was easy to figure out.  Here's the error:

The directory that should contain system headers does not exist:
  /home/paul/teensy/arm_compile/arm-none-eabi/arm-none-eabi/usr/include

The previous steps did indeed build and install the libraries.  But the "usr" part of that pathname is apparently required by something inside GCC's build process.  Mentor's script (and mine, by way of copying Mentor's steps) creates a symbolic link before the final gcc build, so it will find the library headers.

When I looked at the files, the symbolic link simply wasn't there!

It turns out, this is another VirtualBox bug, number 10085:

https://www.virtualbox.org/ticket/10085#comment:14

Apparently when using shared folders, when you create a symbolic link, the link can point to locations on the host's filesystem which the guest in the virtual machine should not be able to access, because they may be outside the shared folder.  Obviously the VirtualBox people never thought carefully about how to implement symbolic links.  When it was reported as a security vulnerability, they just disabled it.

Fortunately, you can reenable symbolic links in shared folders using VBoxManage.

VBoxManage setextradata VM_NAME VBoxInternal2/SharedFoldersEnableSymlinksCreate/SHARE_NAME 1

Obviously, substitute the machine name and share name for VM_NAME and SHARE_NAME.
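To confirm whether symbolic links actually work in the build directory before starting a long build, something like this sketch could be run inside the guest.  The helper name check_symlink is made up for illustration, and linking to "." is just an assumption chosen to resemble a "usr" link that points back at its own directory.

```shell
# Hypothetical helper: report whether a symbolic link can be created in DIR.
check_symlink() {
    dir=${1:-.}
    link="$dir/symlink_test.$$"
    if ln -s . "$link" 2>/dev/null && [ -L "$link" ]; then
        rm -f "$link"
        echo "ok"
    else
        rm -f "$link"
        echo "broken: could not create a symbolic link in $dir"
    fi
}
```

If it reports "broken" on a shared folder, the VBoxManage setting above is the first thing to check.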

This episode has really shaken my confidence in VirtualBox.  Many years ago, before VirtualBox, I purchased VMware Workstation (I believe it was version 2 or 3).  Sadly, VMware doesn't update their kernel driver unless you pay for expensive upgrades.  Other people updated the driver, but it became quite a chore to constantly search for unofficial kernel sources.  When VirtualBox came along, I dumped my ancient VMware... even though that very old version greatly outperformed VirtualBox.

If anyone from VMware reads this, I'm seriously considering buying Workstation again.  Really, it's not the cost that concerns me, but rather it breaking every time Ubuntu publishes a kernel update.  I had such a painful experience, long-term, with that old version of Workstation constantly breaking with each kernel update.  I depend upon being able to compile and test code in virtual machines... and usually it's an urgent need like a customer has a problem I need to reproduce.  I depend on the shared folders to quickly access code and files.

I really need to find a more reliable and dependable virtual machine....

 

CLOOG Library on Mac OS-X

With Linux working (and my script implementing everything Mentor did), I turned to compiling for Macintosh.  I was relieved when GMP passed, but this problem compiling CLOOG quickly turned up:

ld: library not found for -lgcc_s

As usual, searching for the error and combinations of related terms finds pages with the same problem (or at least similar errors), but no solutions.

However, I did find this page, which references a makefile written by James Snyder.

http://gnuarmeclipse.livius.net/wiki/Toolchain_installation_on_OS_X

James obviously deconstructed Mentor's huge build script and created a nice makefile.  However, it's for an older version of the tools, before the CLOOG library was used.  James did use a small patch to run bash instead of sh.  Apparently something in gcc's multilib build depends upon bash.  I used James's patch, and it's included in the set attached to this post.

Thanks James!  :-)

For CLOOG, it didn't take much digging to find that libgcc_s is named slightly differently by Apple.

It turns out, CLOOG's configure script has a check for compiling on OS-X, to use the correct name, but it doesn't work because it checks for a specific (and old) version.  Here's the code:

        LIBS="$LIBS -lppl_c -lppl -lgmpxx -lstdc++"
        if test x$host_alias = xi686-darwin8; then
            LIBS="$LIBS -lgcc_s.10.4"
        elif test x$host_alias != xi686-mingw32; then
            LIBS="$LIBS -lgcc_s"
        fi

On my late-2011 MacBook running Lion (10.7) and Xcode command line tools 4.5.2, host is "x86_64-apple-darwin11.4.2", not "i686-darwin8".  I actually created a much more general patch, which is included in the file attached to the blog post.
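As a rough illustration of what a more general check can look like (not the actual patch from the archive), the exact string comparisons can be replaced with glob patterns in a case statement.  The function name gcc_s_flag is hypothetical, and the default library name for Darwin is only an assumption carried over from the original check; look in /usr/lib for the libgcc_s name your system really has and pass it as the second argument.

```shell
# Hypothetical helper mirroring the configure logic above, but matching
# host strings with glob patterns instead of exact comparisons.
gcc_s_flag() {
    host=$1
    darwin_lib=${2:-gcc_s.10.4}   # assumption: verify the real name in /usr/lib
    case "$host" in
        *-darwin*)  echo "-l$darwin_lib" ;;  # any Darwin version, any CPU
        *-mingw32*) echo "" ;;               # MinGW: no libgcc_s added
        *)          echo "-lgcc_s" ;;        # everything else
    esac
}
```

With this shape, gcc_s_flag x86_64-apple-darwin11.4.2 takes the Darwin branch, where the original test fell through to plain -lgcc_s.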

 

Another GMP Check Issue - Windows

I've always felt a little intimidated by Canadian cross compiling, which means compiling a compiler that will run on another system, producing code for yet a 3rd system.  The GNU toolchain calls "build" the system you're using to build the tools, "host" the system where those tools will run, and "target" the system that will execute the compiler's output.  But standing on the CodeSourcery giant's shoulder (or at least their giant example script), I decided to give Canadian cross compiling a try.
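The build/host/target terminology is easier to keep straight with a tiny example.  This sketch just names the kind of build for a given combination; classify_build is a hypothetical helper, and the x86_64-linux-gnu build string is an assumed example rather than the exact string from my machine.

```shell
# Hypothetical helper: name the kind of build for a build/host/target triple.
classify_build() {
    build=$1 host=$2 target=$3
    if [ "$build" = "$host" ] && [ "$host" = "$target" ]; then
        echo "native"            # compiler runs here, makes code for here
    elif [ "$build" = "$host" ]; then
        echo "cross"             # compiler runs here, makes code for elsewhere
    elif [ "$host" = "$target" ]; then
        echo "crossed native"    # built here, a native compiler for elsewhere
    else
        echo "canadian cross"    # three different systems
    fi
}
```

For example, classify_build x86_64-linux-gnu i586-mingw32msvc arm-none-eabi describes building on Linux a Windows-hosted compiler for ARM, and reports "canadian cross".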

Right away, I ran into another GMP check failure.

t-scanf.c:1497: GNU MP assertion failed: ret == (-1)

abnormal program termination
FAIL: t-scanf.exe

Having learned from the VirtualBox problems, I copied the executable to an actual Windows machine.  Sure enough, it runs without error on real Windows, but fails in Wine.

For this one, spending time trying google searches actually paid off!  I found this message from Rick Jones at RedHat, which explains the problem and provides a patch.

http://gmplib.org/list-archives/gmp-devel/2009-January/000817.html

This patch is included in the archive attached to this blog post.

 

MAKE for Windows

I decided to try building GNU make for Windows and Mac, since Microsoft and Apple don't provide it with their base system.

glob.c:76:18: error: pwd.h: No such file or directory
glob.c: In function ‘glob’:
glob.c:681: warning: assignment makes pointer from integer without a cast
glob.c:684: error: dereferencing pointer to incomplete type
glob.c:765: warning: assignment makes pointer from integer without a cast
glob.c:768: error: dereferencing pointer to incomplete type
make[2]: *** [glob.o] Error 1

I spent quite a lot of time searching for this error with related keywords.  I found a few threads, and one gave a solution for the first error (adding a #ifdef check for __MINGW32__).

http://lists.gnu.org/archive/html/make-w32/2003-04/msg00006.html

I started digging through the code, finding all the places with WINDOWS32 and adding the check for __MINGW32__.  Then I grepped the files in other directories and found many more WINDOWS32 checks.  I knew this wasn't a good path, so I started looking at why WINDOWS32 wasn't being defined for my cross compile.

Inside make's configure script, mingw is checked like this:

case "$host" in
  *-*-mingw32)

The mingw cross compiler I'm using is the package from Ubuntu, which has host string "i586-mingw32msvc".  That doesn't match this check, because it has only 1 dash character and extra text after "mingw32".  A simple patch to allow make's configure to recognize Ubuntu's mingw magically made everything work!
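The mismatch is easy to reproduce in a plain shell.  In this sketch, matches is a hypothetical helper, and the looser pattern *-mingw32* is only one possible choice, not necessarily the one used in my patch.

```shell
# Hypothetical helper: test a host string against a case-style glob pattern.
# The pattern is deliberately left unquoted so the shell treats it as a glob.
matches() {
    case "$1" in
        $2) echo yes ;;
        *)  echo no ;;
    esac
}

matches i586-mingw32msvc '*-*-mingw32'   # the pattern from make's configure
matches i586-mingw32msvc '*-mingw32*'    # a looser pattern (an assumption)
matches i686-pc-mingw32  '*-*-mingw32'   # a three-part host string
```

The first call prints "no" and the other two print "yes", which is why a build with Ubuntu's mingw package never gets WINDOWS32 defined.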

 

GCC, MinGW and caddr_t

The next problem was in compiling gcc.  There's something disheartening about gcc compile errors.  Any error is bad enough.  Even a relatively small package like GMP has hundreds of source files, but gcc.....

In fact, I had to scroll up the window and sift through many lines just to see the error.  At this point, gcc's build appears to be nested at least 4 or 5 makefiles deep!  Here's the actual error:

In file included from /home/paul/arm/native/bin/../lib/gcc/arm-none-eabi/4.7.2/../../../../arm-none-eabi/include/stdio.h:46:0,
                 from /home/paul/arm/workdir/gcc-4.7-2012.09/libgcc/../gcc/tsystem.h:88,
                 from /home/paul/arm/workdir/gcc-4.7-2012.09/libgcc/crtstuff.c:62:
/home/paul/arm/native/bin/../lib/gcc/arm-none-eabi/4.7.2/../../../../arm-none-eabi/include/sys/types.h:126:16: error: expected identifier or '(' before 'char'

The actual code at line 126 in types.h was this:

typedef char *  caddr_t;

I spent quite a bit of time trying to understand how something so simple could fail with "expected identifier or '(' before 'char'".  How could a plain typedef be a syntax error?

Searching on google, using lots of different combinations of keywords, turned up numerous people encountering similar problems, but few useful leads.  Ultimately I ended up grepping all the gcc files for "caddr_t" until I found this suspicious code in gcc/configure:

ac_fn_c_check_type "$LINENO" "caddr_t" "ac_cv_type_caddr_t" "$ac_includes_default"
if test "x$ac_cv_type_caddr_t" = x""yes; then :

else

cat >>confdefs.h <<_ACEOF
#define caddr_t char *
_ACEOF

fi

 

Indeed, that was turning the perfectly reasonable "typedef char * caddr_t" into "typedef char * char *".  Not good.

Also not good is why.  Some investigating shows the check runs the mingw compiler to learn if its headers define caddr_t.  If the test code doesn't compile, then the #define is added to confdefs.h.  That's great for compiling gcc itself.  But clearly confdefs.h is being intermingled with newlib's headers.  I don't know if this is a bug in gcc, or if I somehow haven't properly specified build, host and target (I did pass them all and much more to the top-level configure).

Anyway, the solution (or ugly hack) was just to comment out that #define line.  It's in my patch collection attached to this post.
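For illustration only, here is one way such a line could be neutralized with sed.  The helper name disable_caddr_define is hypothetical, and this sketch filters confdefs.h-style text on standard input; the real change in my patch collection is made by hand in gcc/configure.

```shell
# Hypothetical helper: wrap any "#define caddr_t char *" line in a C comment,
# leaving every other line untouched.  Reads stdin, writes stdout.
disable_caddr_define() {
    sed 's|^#define caddr_t char \*$|/* & */|'
}
```

Feeding it the line "#define caddr_t char *" produces "/* #define caddr_t char * */", so the typedef in newlib's sys/types.h is no longer rewritten by the macro.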
 

Multilib Problems

With all these tweaks, the toolchain compiles on every platform.  But it produces non-working programs!

However, the same LED blink test program did work with CodeSourcery's published binary copy.  So I used arm-none-eabi-objdump -d to disassemble each one, and then I manually compared their generated assembly language code.

It turned out my build was including 32 bit ARM instructions for certain library functions, not the 16/32 bit Thumb2 instructions that CodeSourcery was properly including.  They had configured multilib differently.

This command prints a summary of the compiler's multilib setup.

arm-none-eabi-gcc -dumpspecs | grep -A1 multilib:

If you get what looks like a correct compiler, but non-working programs, maybe this will help?
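Another view of the same information comes from gcc's -print-multi-lib option, which prints one line per multilib in the form "directory;@flag@flag".  The sketch below makes that output easier to read; decode_multilib is a hypothetical helper name, and the sample directory and flags in the note after it are made up for illustration rather than copied from either toolchain.

```shell
# Hypothetical helper: turn "dir;@flag1@flag2" lines, as printed by
# "gcc -print-multi-lib", into "dir: -flag1 -flag2".
decode_multilib() {
    while IFS=';' read -r dir flags; do
        printf '%s:%s\n' "$dir" "$(printf '%s' "$flags" | sed 's/@/ -/g')"
    done
}
```

Typical use would be arm-none-eabi-gcc -print-multi-lib | decode_multilib.  A line such as "thumb2;@mthumb@march=armv7" becomes "thumb2: -mthumb -march=armv7", which makes it simpler to compare two toolchains side by side.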

After much digging, I tracked the problem down to the file "t-cs-eabi-lite", which is used due to the configure option "--enable-extra-sgxxlite-multilibs".  I'm not 100% sure of the cause of the error, but it seems like perhaps the file provided might not actually be the same as what they used when compiling their toolchain?

Anyway, since I'm building a toolchain targeting only 1 board, I just hacked this file up horribly to target only the hardware I need.  My t-cs-eabi-lite file is in the archive, but I warn you not to use it unless you intend to build a limited toolchain.  Perhaps I'll revisit this file (or maybe someone will leave a comment with improvements).  At least this message might help you get to the cause of wrong architecture code being linked, rather than going through so much trouble to successfully compile, only to have the resulting output not work.

 

Contacting Me... "Thanks", not "Tech Support"

I took several hours to write this lengthy blog post, recreating each problem in each environment to get the exact errors, so you, dear reader, might have better luck with Google, or whatever search you use, to find these solutions.  Throughout this project, finding others with the same errors has been easy, but solutions seem rare.  Hopefully this blog post will help a bit?

Also, it's meant to give fellow Dorkbotters (in the unlikely case they have time to read this) a glimpse into this process and what I've been up to.  Perhaps it might help me too, if I later need to figure out why I created these patches.

I'm pretty easy to find if you search for Paul Stoffregen.  Please, if you do contact me, say "thanks".  DO NOT ask me for technical support with the GNU toolchain, particularly compiling it.  If it doesn't compile or doesn't work, and you can't find the answer online (pretty much my experience for most errors), you're going to have to dig into the code and figure it out yourself.  Hopefully this message might give you some insight.

The several hours I've spent composing this message are as much as I can help you with GNU toolchain compile issues.

 

Attachment: pauls_gnu_toolchain_patches.zip (4.66 KB)

Comments

YAGARTO?

Have you tried it? Works well for me, and they include a build script if you want to DIY:

http://www.yagarto.de

        - Jared

And the official ARM release?

Feurig pointed out this, which looks like a better option than even YAGARTO. They provide builds for all three major OSes, and it's straight from ARM.

https://launchpad.net/gcc-arm-embedded

        - Jared