#2274 closed enhancement (fixed)
Enable libgomp build in GCC
Reported by: | Sebastian Huber | Owned by: | Sebastian Huber |
---|---|---|---|
Priority: | normal | Milestone: | 4.11.1 |
Component: | tool/gcc | Version: | 4.11 |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | | |
Description
libgomp is the support library for OpenMP code emitted by GCC. Adding support for RTEMS needs roughly the following steps:
- Move the <semaphore.h> header file from RTEMS to Newlib. Due to license issues, use the one provided by FreeBSD and modify it accordingly.
- Add Autoconf code to detect the presence of the Newlib <semaphore.h>.
- Add RTEMS tweaks to the libgomp configure script.
- Add RTEMS-specific link-time configuration to select a special memory allocator for libgomp.
- Add the ability to control the thread scheduler, priority, stack size, etc. via application configuration options/handler.
- Add standard OpenMP tests to RTEMS testsuite.
- Add documentation to user manual.
- Do performance tests.
- Add dedicated low-overhead barriers.
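For context, the kind of code that depends on libgomp can be sketched as follows. This is a minimal illustration, not taken from the ticket or the RTEMS testsuite; the function name `sum_array` is hypothetical. When built with `-fopenmp`, GCC turns the pragma into calls into libgomp; without that flag the pragma is ignored and the loop runs sequentially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sum an array with an OpenMP parallel-for reduction.
 * sum_array is a hypothetical name used only for this illustration.
 * With -fopenmp, the loop iterations are shared among the threads of a
 * team managed by libgomp; each thread accumulates a private partial sum
 * and the partial sums are combined when the loop ends.
 */
long sum_array(const int *values, size_t count)
{
    long total = 0;
    long i;

    assert(values != NULL || count == 0);

    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < (long)count; ++i) {
        total += values[i];
    }

    return total;
}
```

The loop index is a signed integer on purpose, since older OpenMP versions require a signed loop variable.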
Attachments (5)
Change History (14)
comment:1 Changed on 02/18/15 at 14:47:43 by Sebastian Huber
Status: | new → accepted |
---|
comment:2 Changed on 07/07/15 at 07:38:02 by Sebastian Huber
Changed on 07/07/15 at 07:55:59 by Sebastian Huber
Attachment: | libgomp-parallel-bench-posix-malloc.png added |
---|
comment:3 Changed on 07/07/15 at 07:59:33 by Sebastian Huber
The microbenchmark posted here
https://gcc.gnu.org/ml/gcc-patches/2008-03/msg00930.html
shows a significant overhead due to malloc/free in the team create/destroy path.
The next step is to get this fixed in upstream libgomp.
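The team create/destroy path is exercised whenever a parallel region is entered and left. The sketch below is a hypothetical stand-in for that kind of measurement, not the linked microbenchmark; the names `now_seconds` and `parallel_region_bench` are assumptions made for this illustration. It times repeated entry into a nearly empty parallel region, so per-region overhead dominates the result.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds with OpenMP, CPU seconds as a fallback without it. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/*
 * Enter and leave a parallel region `iterations` times and return the
 * elapsed time in seconds.  Only the master thread touches the counter,
 * so there is no data race; the counter lets the caller check that every
 * region was actually executed.
 */
double parallel_region_bench(int iterations, long *regions_entered)
{
    long entered = 0;
    double start;
    int i;

    assert(iterations >= 0);
    assert(regions_entered != NULL);

    start = now_seconds();
    for (i = 0; i < iterations; ++i) {
        #pragma omp parallel
        {
            #pragma omp master
            entered += 1;
        }
    }
    *regions_entered = entered;

    return now_seconds() - start;
}
```

Absolute numbers from such a loop depend on the thread count, the target, and the libgomp configuration, so they are only meaningful when compared across builds on the same system.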
comment:4 Changed on 07/08/15 at 08:30:40 by Sebastian Huber
The results of the microbenchmark obtained on a T4240 using only two processors (unmodified GCC 4.9.3, RTEMS 8406d94283cc704df2c0d8aa017310e3e4ad0919):
barrier bench 20.6147 seconds
parallel bench 16.8791 seconds
static bench 0.852061 seconds
dynamic bench 0.292199 seconds
Changed on 07/16/15 at 08:14:43 by Sebastian Huber
Attachment: | libgomp-parallel-bench-posix-no-malloc.png added |
---|
comment:5 Changed on 07/16/15 at 08:32:03 by Sebastian Huber
The malloc() problem is solved in GCC 6.0:
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=225811
Microbench (2 processor T4240):
barrier bench 23.3409 seconds
parallel bench 9.60804 seconds
static bench 0.472419 seconds
dynamic bench 0.223881 seconds
guided bench 0.00999273 seconds
runtime bench 0.229282 seconds
single bench 2.18316 seconds
Microbench (24 processor T4240):
barrier bench 783.888 seconds
parallel bench 115.901 seconds
static bench 5.7876 seconds
dynamic bench 0.262251 seconds
guided bench 0.0133215 seconds
runtime bench 0.261378 seconds
single bench 57.3227 seconds
There is a significant overhead due to the creation/destruction of POSIX mutexes and semaphores. In particular, there is high contention on the allocator lock. The next step is to provide self-contained objects defined in Newlib <sys/lock.h> which can be used to implement the libgomp primitives and avoid the creation/destruction overhead. In addition, a spin-based barrier implementation based on the Linux futex barrier will be provided.
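To illustrate the idea of a self-contained, spin-based barrier, here is a minimal sketch using C11 atomics. It is neither the libgomp futex barrier nor the RTEMS implementation; the type `spin_barrier` and its functions are hypothetical names. The object lives entirely in caller-provided storage, so setting it up needs no allocator and no operating system object.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical self-contained barrier: all state lives in the struct. */
typedef struct {
    unsigned total;          /* number of participating threads */
    atomic_uint arrived;     /* threads that have reached the barrier */
    atomic_uint generation;  /* incremented each time the barrier opens */
} spin_barrier;

void spin_barrier_init(spin_barrier *b, unsigned total)
{
    assert(total > 0);
    b->total = total;
    atomic_init(&b->arrived, 0);
    atomic_init(&b->generation, 0);
}

/* Returns 1 for the last thread to arrive, 0 for all other threads. */
int spin_barrier_wait(spin_barrier *b)
{
    unsigned gen = atomic_load_explicit(&b->generation,
                                        memory_order_acquire);
    unsigned prev = atomic_fetch_add_explicit(&b->arrived, 1,
                                              memory_order_acq_rel);

    if (prev + 1 == b->total) {
        /* Last arrival: reset the count, then open the barrier. */
        atomic_store_explicit(&b->arrived, 0, memory_order_relaxed);
        atomic_fetch_add_explicit(&b->generation, 1,
                                  memory_order_release);
        return 1;
    }

    /* Busy-wait until the generation changes.  A futex-style barrier
     * would block in the operating system instead of spinning forever. */
    while (atomic_load_explicit(&b->generation,
                                memory_order_acquire) == gen) {
        /* spin */
    }

    return 0;
}
```

The generation counter makes the barrier reusable: a thread that races ahead into the next round waits on the new generation value and cannot be released by the previous one.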
Changed on 07/23/15 at 07:44:54 by Sebastian Huber
Attachment: | libgomp-parallel-bench-sys-lock.png added |
---|
comment:6 Changed on 07/23/15 at 07:59:54 by Sebastian Huber
Performance with self-contained objects defined in Newlib <sys/lock.h>. The barrier implementation is virtually identical to the Linux futex barrier present in libgomp.
Microbench (2 processor T4240):
barrier bench 0.387543 seconds
parallel bench 0.258221 seconds
static bench 0.0215772 seconds
dynamic bench 0.224599 seconds
guided bench 0.00639818 seconds
runtime bench 0.229863 seconds
single bench 0.0711802 seconds
Microbench (24 processor T4240):
barrier bench 5.74687 seconds
parallel bench 2.38893 seconds
static bench 0.118236 seconds
dynamic bench 0.2516 seconds
guided bench 0.00146854 seconds
runtime bench 0.250789 seconds
single bench 0.543456 seconds
This is a major improvement compared to the previous versions. In the parallel bench profile, the only operating system function is _Futex_Wake() with 13% processor utilization. This is all right, since barrier operations are heavily used in this test case.
Changed on 07/23/15 at 08:01:05 by Sebastian Huber
comment:7 Changed on 07/31/15 at 05:28:42 by Sebastian Huber
The relevant Newlib patches are checked in:
The relevant RTEMS patches are checked in:
https://git.rtems.org/rtems/commit/?id=9e9e61d27d146e2ca83d5b0f590683a3f605c3f1
The relevant GCC patches are sent for review:
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01837.html
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02347.html
comment:8 Changed on 09/04/15 at 11:45:56 by Sebastian Huber <sebastian.huber@…>
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
comment:9 Changed on 10/10/17 at 05:58:26 by Sebastian Huber
Component: | GCC → tool/gcc |
---|
OpenMP using the POSIX configuration is available in GCC 4.9.3 and later. It is enabled in the RSB for RTEMS 4.11 on ARM, PowerPC and SPARC.