#2830 closed defect (fixed)

Throwing std::runtime_error on PC BSP fails.

Reported by: Chris Johns Owned by: Needs Funding
Priority: normal Milestone: 5.2
Component: tool/gcc Version: 5
Severity: normal Keywords:
Cc: Blocked By:
Blocking:

Description (last modified by Joel Sherrill)

Throwing a std::runtime_error exception locks up.

The lockup is in the exception cleanup handler where the exception object is destructed. The destructor loops while destructing the std::string object. The path ends up in libstdc++-v3/include/ext/atomicity.h line 48 or exchange_and_add.

At a guess, it would seem that the C++ atomics on i386 are broken or fragile.

UPDATE: This was broken when gcc i386 eliminated -mcpu in favor of -march/-mtune. The multilibs were built with -mtune and not -march.
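For illustration, the failing pattern can be reduced to a throw and catch of std::runtime_error. This is a minimal sketch, not the attached cdtest patch; the function names `throw_and_catch` and `check_throw_and_catch` are hypothetical. On the affected i386 multilibs the hang would be expected when the catch block exits and the exception object is released.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Throws and catches a std::runtime_error and returns its message.
// The exception object is released when the catch block exits; that release
// decrements a reference count atomically, which is where the ticket
// reports the lockup.
std::string throw_and_catch(const std::string& message) {
    try {
        throw std::runtime_error(message);
    } catch (const std::runtime_error& e) {
        // Copy the message before the exception object is destroyed.
        return e.what();
    }
}

// Hypothetical self-check: on a working toolchain this returns normally.
void check_throw_and_catch() {
    assert(throw_and_catch("thrown std::runtime_error") ==
           "thrown std::runtime_error");
}
```

On a working toolchain the call returns the message; on the broken configuration it would presumably never return.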

Attachments (3)

cdtest-throw-std_runtime.diff (2.7 KB) - added by Chris Johns on Dec 2, 2016 at 5:10:43 AM.
Patch to the cdtest sample that shows the problem
gcc-7.5.0-i386march-1.diff (759 bytes) - added by Michael Davidsaver on Sep 19, 2020 at 2:46:42 PM.
Change gcc multiarch config to use -march
0001-patch-gcc-i386-multiarch.patch (790 bytes) - added by Michael Davidsaver on Sep 19, 2020 at 2:53:28 PM.
Patch RSB to patch GCC


Change History (24)

Changed on Dec 2, 2016 at 5:10:43 AM by Chris Johns

Patch to the cdtest sample that shows the problem

comment:1 Changed on Dec 2, 2016 at 5:12:22 AM by Chris Johns

Description: modified (diff)

comment:2 Changed on Dec 9, 2016 at 7:02:30 AM by Sebastian Huber

Works at least on SPARC and ARM. On which BSP does this fail?

comment:3 in reply to:  2 Changed on Dec 9, 2016 at 11:24:02 AM by Chris Johns

Replying to sebastian.huber:

Works at least on SPARC and ARM. On which BSP does this fail?

i386/pc686 tested on qemu with a core2duo cpu.

comment:4 Changed on Feb 15, 2017 at 2:20:42 PM by Sebastian Huber

Milestone: 4.12 → Indefinite
Owner: set to Needs Funding
Status: new → assigned

comment:5 Changed on Sep 10, 2020 at 8:24:40 PM by Michael Davidsaver

I'm observing a hang with RTEMS 5.1 with i386/pc686 which may be this issue, though it does not look to me to be in an exception class dtor. This is a test case I'm running in QEMU w/ 1 virtual CPU.

It's on exit from a catch(...) block. The actual hang appears to be a tight loop in the __atomic_fetch_add_4 builtin. Specifically, these three instructions.

=> 0x3f1780 <libat_fetch_add_4>:        mov    $0x5,%eax
   0x3f1785 <libat_fetch_add_4+5>:      mov    %eax,0xc(%esp)
   0x3f1789 <libat_fetch_add_4+9>:      jmp    0x3f1780 <libat_fetch_add_4>

The stack trace is:

(gdb) bt
#0  libat_fetch_add_4 (mptr=0x75d7bc, opval=4294967295, smodel=5) at ../../../../gcc-7.5.0/libatomic/fop_n.c:44
#1  0x003b07fc in __gnu_cxx::__exchange_and_add (__val=-1, __mem=0x75d7bc)
    at /home/mdavidsaver/source/rtems/rtems-source-builder-5.1/rtems/build/i386-rtems5-gcc-7.5.0-newlib-7947581-x86_64-linux-gnu-1/build/i386-rtems5/mpentiumpro/libstdc++-v3/include/ext/atomicity.h:49
#2  __gnu_cxx::__exchange_and_add_dispatch (__val=-1, __mem=0x75d7bc)
    at /home/mdavidsaver/source/rtems/rtems-source-builder-5.1/rtems/build/i386-rtems5-gcc-7.5.0-newlib-7947581-x86_64-linux-gnu-1/build/i386-rtems5/mpentiumpro/libstdc++-v3/include/ext/atomicity.h:82
#3  __gnu_cxx::__eh_atomic_dec (__count=0x75d7bc) at ../../../../../gcc-7.5.0/libstdc++-v3/libsupc++/eh_atomics.h:72
#4  __gxx_exception_cleanup (code=_URC_FOREIGN_EXCEPTION_CAUGHT, exc=0x75d7fc)
    at ../../../../../gcc-7.5.0/libstdc++-v3/libsupc++/eh_throw.cc:46
#5  0x003ad9cb in _Unwind_DeleteException (exc=0x75d7fc) at ../../../../gcc-7.5.0/libgcc/unwind.inc:271
#6  0x003af8d0 in __cxxabiv1::__cxa_end_catch () at ../../../../../gcc-7.5.0/libstdc++-v3/libsupc++/eh_catch.cc:125
#7  0x001012fd in epicsTimeTest () at ../epicsTimeTest.cpp:116

comment:6 Changed on Sep 11, 2020 at 4:51:16 AM by Sebastian Huber

If this BSP uses libatomic to load a 32-bit value, then it uses an obsolete instruction set.

comment:7 Changed on Sep 11, 2020 at 4:56:14 PM by Michael Davidsaver

I've not come across 'libatomic' before. I guess this is some compatibility glue for older x86?

From some playing around, it looks like the toolchain gcc is defaulting to '-march=i386' if no other option is provided. Maybe not surprising given that the toolchain name is 'i386-rtems5'. '<prefix>/make/custom/pc686.cfg' has '-mtune=pentiumpro -march=pentium'. Passing this to a short test seems to result in the intrinsic actually being used. So I guess the RTEMS kernel config/build is ok?

Starting from the linker map of epicsTimeTest, I see that the symbol __atomic_fetch_add_4 (aka. 'libat_fetch_add_4') is undefined in <prefix>/lib/gcc/i386-rtems5/7.5.0/mpentiumpro/libstdc++.a. So I guess this means that the (multilib?) build of libstdc++ is not being done correctly?

My knowledge of GCC internals doesn't extend beyond ./configure arguments. So I don't know where to look next.

Last edited on Sep 11, 2020 at 4:59:13 PM by Michael Davidsaver

comment:8 Changed on Sep 11, 2020 at 10:07:20 PM by Joel Sherrill

Could someone please try this test case with -march=i486 as the compiler selection?

I think we need to move the base uniprocessor x86 CPU model up from vanilla i386 w/FPU but what the new floor needs to be is TBD. If I understand things correctly, i486 is the minimum with any atomic instructions but you have to get into the pentium II era to get the earliest SMP support.

I'm not sure if going beyond i486 is needed for uniprocessor but that may be sufficient. For SMP, you probably need to go to at least pentium II. On Qemu, I used core2duo long ago to test SMP.

If we move the floor to greater than or equal to Pentium, there is more opportunity to remove a small bit of code. But I'd like to know the bare technical minimums first.

So what is the lowest architecture (-march=XXX) that appears to solve this for you?

comment:9 Changed on Sep 11, 2020 at 10:58:24 PM by Michael Davidsaver

Could someone please try this test case with -march=i486 as the compiler selection?

This seems to have the desired effect (emits a 'lock' instruction instead of calling __atomic_fetch_add_4).

I'm still perplexed that the 'pentiumpro' version of libstdc++.a appears to be built with '-march=i386'. All 6 versions seem to be. Is this how gcc's multilib is meant to work?
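One way to check the observation above without reading assembly is a compile-time probe. The sketch below assumes GCC's predefined macro `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4`, which GCC documents as defined when the target supports a 4-byte atomic compare-and-swap; as far as I understand, it would not be defined for -march=i386 but would be for -march=i486 and later. The function names are hypothetical.

```cpp
#include <cassert>

// True when the selected -march lets GCC expand 4-byte atomic
// read-modify-write operations inline. When false, GCC is expected to emit
// a library call such as __atomic_fetch_add_4 instead.
constexpr bool native_4byte_atomics() {
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
    return true;
#else
    return false;
#endif
}

// The kind of operation seen in the backtrace: atomically add -1 to a
// reference count and return the value it held before the decrement.
int release_reference(int* count) {
    return __atomic_fetch_add(count, -1, __ATOMIC_ACQ_REL);
}

// Hypothetical self-check of the decrement semantics.
void check_release_reference() {
    int count = 2;
    assert(release_reference(&count) == 2);
    assert(count == 1);
}
```

Compiling `release_reference` with `-O2 -S` once with `-march=i386` and once with `-march=i486` and comparing the output should show the call to `__atomic_fetch_add_4` in the first case and a `lock`-prefixed instruction in the second.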

comment:10 Changed on Sep 11, 2020 at 11:18:34 PM by Michael Davidsaver

I may have answered part of my question with a lucky grep of the gcc source. I found 'gcc/config/i386/t-rtems' which seems to show that the different multilib versions are built with '-mtune=...', and presumably defaulting to '-march=i386'. RTEMS itself in '<prefix>/make/custom/pc686.cfg' has '-march=pentium'. What is the logic here?

MULTILIB_OPTIONS = mtune=i486/mtune=pentium/mtune=pentiumpro msoft-float
MULTILIB_DIRNAMES= m486 mpentium mpentiumpro soft-float
MULTILIB_MATCHES = msoft-float=mno-80387
MULTILIB_MATCHES += mtune?pentium=mtune?k6 mtune?pentiumpro=mtune?athlon
MULTILIB_EXCEPTIONS = \
mtune=pentium/*msoft-float* \
mtune=pentiumpro/*msoft-float*

comment:11 Changed on Sep 12, 2020 at 7:47:29 AM by Sebastian Huber

We have two different issues:

  1. The i386 BSPs probably use obsolete instruction sets which lead to the use of libatomic.
  2. If libatomic is used, then there is some broken behaviour.

The libatomic seems to work fine on other architectures, e.g. it is also used by the sparc/erc32 BSP. Here the cdtest.exe test program runs successfully and executes a similar context:

Breakpoint 2, libat_fetch_add_4 (mptr=0x203d798, opval=4294967295, smodel=4) at ../../../gnu-mirror-gcc-c72a1b6/libatomic/fop_n.c:164
164     ../../../gnu-mirror-gcc-c72a1b6/libatomic/fop_n.c: No such file or directory.
(gdb) bt
#0  libat_fetch_add_4 (mptr=0x203d798, opval=4294967295, smodel=4) at ../../../gnu-mirror-gcc-c72a1b6/libatomic/fop_n.c:164
#1  0x0200eda0 in __gnu_cxx::__exchange_and_add (__val=-1, __mem=0x203d798) at /home/EB/sebastian_h/src/rtems-source-builder/rtems/build/sparc-rtems6-gcc-c72a1b6-newlib-ece49e4-x86_64-linux-gnu-1/build/sparc-rtems6/libstdc++-v3/include/ext/atomicity.h:84
#2  __gnu_cxx::__exchange_and_add_dispatch (__val=-1, __mem=0x203d798) at /home/EB/sebastian_h/src/rtems-source-builder/rtems/build/sparc-rtems6-gcc-c72a1b6-newlib-ece49e4-x86_64-linux-gnu-1/build/sparc-rtems6/libstdc++-v3/include/ext/atomicity.h:84
#3  __gnu_cxx::__eh_atomic_dec (__count=0x203d798) at ../../../../gnu-mirror-gcc-c72a1b6/libstdc++-v3/libsupc++/eh_atomics.h:72
#4  __gxx_exception_cleanup (code=_URC_FOREIGN_EXCEPTION_CAUGHT, exc=0x203d7d0) at ../../../../gnu-mirror-gcc-c72a1b6/libstdc++-v3/libsupc++/eh_throw.cc:46
#5  0x0201e584 in _Unwind_DeleteException (exc=0x203d7d0) at ../../../gnu-mirror-gcc-c72a1b6/libgcc/unwind.inc:283
#6  0x02001a44 in foo_function () at ../../../testsuites/samples/cdtest/main.cc:198
#7  main_task () at ../../../testsuites/samples/cdtest/main.cc:212
#8  0x02006b8c in _Thread_Entry_adaptor_numeric (executing=0x2032ae8 <_RTEMS_tasks_Objects>) at ../../../cpukit/score/src/threadentryadaptornumeric.c:25
#9  0x02006c64 in _Thread_Handler () at ../../../cpukit/score/src/threadhandler.c:143
#10 0x02006c04 in _Thread_Handler () at ../../../cpukit/score/src/threadhandler.c:87

comment:12 in reply to:  5 Changed on Sep 12, 2020 at 10:29:03 AM by Sebastian Huber

Replying to Michael Davidsaver:

It's on exit from a catch(...) block. The actual hang appears to be a tight loop in the __atomic_fetch_add_4 builtin. Specifically, these three instructions.

=> 0x3f1780 <libat_fetch_add_4>:        mov    $0x5,%eax
   0x3f1785 <libat_fetch_add_4+5>:      mov    %eax,0xc(%esp)
   0x3f1789 <libat_fetch_add_4+9>:      jmp    0x3f1780 <libat_fetch_add_4>

From this code it is clear that this is a libatomic configuration issue on i386. We have a recursive call here. We are probably using instruction sets which are not really tested these days by anyone else.

comment:13 Changed on Sep 12, 2020 at 2:54:58 PM by Joel Sherrill

The bug goes back to when gcc replaced -mcpu= with -march= and -mtune=. We used to generate code specifically compatible with and optimized for a CPU model. -march is now the compatibility level flag and -mtune is an optimization indication. We are generating i386 compatible code which is tuned based on, say, an i686 instruction weighting.

The multilib -mtune needs to change to -march.

Since -march is the first x86 option described, it is at the top of this page in the GCC manual:

https://gcc.gnu.org/onlinedocs/gcc-7.5.0/gcc/x86-Options.html#x86-Options

A quick search browsing gcc/config/i386/t-* shows RTEMS seems to be the only i386 target building multilibs which are cpu model based. Others use m32/m64 or other things.
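The difference between the two flags can be observed from inside a translation unit. The sketch below relies on my understanding of GCC's x86 predefined macros: `__i486__`, `__i586__`, and `__i686__` follow -march, while `__tune_i486__`, `__tune_i586__`, and `__tune_i686__` follow -mtune. The function names are hypothetical and only the CPU models named in the multilib configuration are covered.

```cpp
#include <cassert>
#include <string>

// Instruction-set floor selected by -march: what the generated code is
// allowed to execute.
std::string arch_level() {
#if defined(__i686__)
    return "i686";
#elif defined(__i586__)
    return "i586";
#elif defined(__i486__)
    return "i486";
#elif defined(__i386__)
    return "i386 or unlisted";
#else
    return "not 32-bit x86";
#endif
}

// Scheduling target selected by -mtune: how the code is optimized, with no
// effect on which instructions may be used.
std::string tune_level() {
#if defined(__tune_i686__)
    return "i686";
#elif defined(__tune_i586__)
    return "i586";
#elif defined(__tune_i486__)
    return "i486";
#else
    return "unlisted";
#endif
}

// Hypothetical self-check: both probes always return a label.
void check_levels() {
    assert(!arch_level().empty());
    assert(!tune_level().empty());
}
```

Built with only `-mtune=pentiumpro`, as the multilibs were, `tune_level()` would be expected to report i686 while `arch_level()` stays at the i386 floor, which matches the behaviour described in this comment.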

comment:14 Changed on Sep 18, 2020 at 9:36:49 PM by Michael Davidsaver

Is there agreement on a path forward for this issue? Is it as simple as replacing 'mtune' with 'march' in gcc/config/i386/t-rtems? If so, who will do/test this? This change seems simple enough that I would just try it myself, but I'm not sure where/how to add a patch in the RSB recipes.

comment:15 Changed on Sep 18, 2020 at 9:53:00 PM by Joel Sherrill

I think that's the extent of the source changes which we think will resolve the issue. Someone may point to documentation but it is Friday and I am going to point to an example of adding a patch and talk you through it. This assumes the 5 branch because we won't get a patch merged into gcc 7. We can address RTEMS master after 5 is fixed. The patch for gcc should be the same. We just have more latitude to merge it to gcc master and newer release branches.

How to generate a patch: https://devel.rtems.org/wiki/Developer/Coding/GenerateAPatch which should also be in the Software Engineering Guide.

The easiest example I saw was in rtems/config/5/rtems-lm32.bset which adds a gdb patch only for lm32 builds. You would be adding a gcc patch to rtems/config/5/rtems-i386.bset.

For merging, a patch needs an Internet home (attaching to a ticket gives you a URL) but for testing, you can just drop the diff into the patches subdirectory config/rtems/patches. The RSB will see it there and not try to fetch it. But you need a SHA checksum.

After that, it is built as normal for testing purposes. If this gives you a toolset that works, we can address changing the URL. But attaching it to this ticket and getting the URL for the "raw" attachment should work just fine.

Changed on Sep 19, 2020 at 2:46:42 PM by Michael Davidsaver

Attachment: gcc-7.5.0-i386march-1.diff added

Change gcc multiarch config to use -march

Changed on Sep 19, 2020 at 2:53:28 PM by Michael Davidsaver

Patch RSB to patch GCC

comment:16 Changed on Sep 19, 2020 at 4:47:42 PM by Michael Davidsaver

I've attached patches for GCC and RSB.

A successful test build: https://github.com/mdavidsaver/rsb/runs/1138090148?check_suite_focus=true#step:5:563

comment:17 Changed on Sep 19, 2020 at 5:49:16 PM by Joel Sherrill

Congratulations! Can you confirm you tested code compiled with the resulting toolchain and it solved the problem?

comment:18 Changed on Sep 19, 2020 at 7:44:55 PM by Michael Davidsaver

Ha. That would be a good thing to mention would it not? Yes, the epicsTimeTest now passes, and the cdtest completes as well (with qemu*). I am having a problem with another test, but this is almost certainly a separate issue.

comment:19 Changed on Sep 21, 2020 at 7:43:58 PM by Joel Sherrill

Component: unspecified → tool/gcc
Description: modified (diff)
Milestone: Indefinite → 5.2

comment:20 Changed on Sep 21, 2020 at 8:37:37 PM by Michael Davidsaver <mdavidsaver@…>

Resolution: fixed
Status: assigned → closed

In ebc3abe/rtems-source-builder:

patch gcc i386 multiarch

Add patch to change from mtune to march when building multilibs.
The mtune argument tunes or optimizes for a specific CPU model but
does not ensure the generated code is appropriate for the
CPU model. Prior to this patch, i386 compatible code was always
generated but tuned for later models.

Closes #2830.

comment:21 Changed on Sep 21, 2020 at 9:14:32 PM by Michael Davidsaver <mdavidsaver@…>

In 1ea1c9c/rtems-source-builder:

patch gcc i386 multiarch

Add patch to change from mtune to march when building multilibs.
The mtune argument tunes or optimizes for a specific CPU model but
does not ensure the generated code is appropriate for the
CPU model. Prior to this patch, i386 compatible code was always
generated but tuned for later models.

This is the same fix as #2830 but applying to gcc 10.

Updates #4084.
