#3741 closed defect (fixed)

libdl loading ELF objects from libbsd NFS file system ends in a deadlock

Reported by: dufault Owned by: Chris Johns
Priority: normal Milestone: 5.1
Component: lib/dl Version: 5
Severity: normal Keywords: run-time-loader libdl
Cc: Blocked By:
Blocking:

Description

For ELF files the run-time loader calls this chain:

  • rtems_rtl_elf_file_load()
  • rtems_rtl_alloc_lock()
  • rtems_rtl_alloc_heap()
  • _RTEMS_Lock_allocator()

_RTEMS_Lock_allocator() locks all heap operations. RTL then calls read(), and on NFS file systems the NFS threads try to use the heap while it is locked, which deadlocks the system.

Change History (7)

comment:1 Changed on Apr 28, 2019 at 1:44:02 AM by Chris Johns

Milestone: 5.1
Owner: set to Chris Johns
Status: new → accepted

comment:2 Changed on Apr 30, 2019 at 7:25:22 AM by Chris Johns

The default RTL allocator (the heap) accesses the file system while holding the heap allocator lock, so if the file system itself uses the heap we end up in a deadlock. LibBSD's NFS implementation uses the heap.

The allocator lock/unlock logic was added with the trampoline changes. Trampolines provide large memory support and are documented in the User Manual (https://docs.rtems.org/branches/master/user/exe/loader.html#large-memory). The trampoline table needs to be close to the code using it, otherwise it may be out of range. If something allocates a large piece of memory between the allocation of the text memory and the allocation of the trampoline table, the relocation to the trampoline will be out of range.

There are two unknowns until the memory is allocated and the code is loaded. The first is the location in memory and the second is the instruction the relocation record points to. The location lets us determine the distance from the instruction to the target address and whether it is in range, and the instruction lets the arch back-end determine the range the instruction has.
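As an illustration of the range test described above, the sketch below checks whether a PC-relative branch can reach its target. It assumes a PowerPC `b`/`bl` style encoding, where the 24-bit LI field is shifted left by two to give a signed 26-bit byte displacement (about ±32 MiB). The function name and its `disp_bits` parameter are hypothetical and are not part of the libdl arch back-end interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if a PC-relative branch at `pc` can reach
 * `target` when the instruction encodes a signed, word-aligned byte
 * displacement of `disp_bits` bits (including the two implied zero bits).
 * PowerPC b/bl uses disp_bits = 26: reach is -32 MiB .. +32 MiB - 4.
 */
static bool branch_in_range(uint64_t pc, uint64_t target, unsigned disp_bits)
{
  assert(disp_bits >= 3 && disp_bits <= 62);
  int64_t offset = (int64_t)(target - pc);
  int64_t max = ((int64_t)1 << (disp_bits - 1)) - 4; /* furthest forward */
  int64_t min = -((int64_t)1 << (disp_bits - 1));    /* furthest backward */
  /* The displacement must be word aligned and fit the signed field. */
  return (offset & 3) == 0 && offset >= min && offset <= max;
}
```

A relocation whose branch fails this test is the case that needs a trampoline slot placed within reach of the instruction.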

The ELF loader can be split into a finer set of stages to do as much processing as possible before allocating memory; however, as stated above, the two unknowns still need to be resolved.

The most robust solution is to add code that builds a table of the relocation records that could require trampolines. The allocator lock is then held only for the following steps:

  1. Lock the allocator
  2. Allocate the sections
  3. Locate the symbols
  4. Determine the number of trampolines using the cached relocation records
  5. Allocate the trampoline table
  6. Unlock the allocator
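The ordering above can be modelled in a few lines of C. This is a minimal sketch under stated assumptions: `read_from_file()`, `lock_allocator()`, `count_tramps()` and `load_object()` are hypothetical stand-ins, section allocation is reduced to a known text base address, and the branch reach is a fixed ±32 MiB. The `assert()` in `read_from_file()` encodes the rule that the file system is never touched while the allocator lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of the load ordering; none of these are libdl names. */
static bool allocator_locked;
static int  file_reads;

static void lock_allocator(void)   { allocator_locked = true; }
static void unlock_allocator(void) { allocator_locked = false; }

/* File access (read() on NFS) may use the heap, so never under the lock. */
static void read_from_file(void)
{
  assert(!allocator_locked);
  ++file_reads;
}

/* A reloc record cached before allocation: its offset in .text and the
 * address of the symbol it refers to. */
struct cached_reloc { uintptr_t text_offset; uintptr_t target; };

#define MAX_REACH ((intptr_t)1 << 25) /* approx. +/-32 MiB PowerPC branch */

/* Step 4: count out-of-range relocs using only the in-memory cache. */
static size_t count_tramps(uintptr_t text_base,
                           const struct cached_reloc *relocs, size_t n)
{
  size_t needed = 0;
  for (size_t i = 0; i < n; ++i) {
    intptr_t dist = (intptr_t)(relocs[i].target -
                               (text_base + relocs[i].text_offset));
    if (dist < -MAX_REACH || dist >= MAX_REACH)
      ++needed;
  }
  return needed;
}

/* Returns the number of trampoline slots allocated. */
static size_t load_object(uintptr_t text_base,
                          const struct cached_reloc *relocs, size_t n)
{
  read_from_file();                                   /* symbols, relocs */
  lock_allocator();                                   /* 1 */
  /* 2-3: allocate sections, locate symbols (modelled by text_base) */
  size_t slots = count_tramps(text_base, relocs, n);  /* 4 */
  void *tramp_table = slots ? malloc(slots * 16) : NULL; /* 5 */
  unlock_allocator();                                 /* 6 */
  read_from_file();                 /* load section data, apply relocs */
  free(tramp_table);
  return slots;
}
```

With a text base of 0x10000000, a record targeting 0x10001000 is in range while one targeting 0x90000000 is not, so `load_object()` allocates a single slot and performs both file reads outside the locked region.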

The number of trampolines can be determined using the relocation table data held in memory. The relocation table could be implemented in a similar way to the unresolved externals table, with a common pool of blocks that grows and shrinks based on demand. Allocating in blocks should help reduce heap fragmentation.
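A block pool of the kind described above could look like the following sketch. It is hypothetical and is not the libdl unresolved-table code: records are appended to the newest block, a new block is linked in when it fills, and an emptied block is freed as long as another block remains, so one block is always held. The block size of 256 simply mirrors the value chosen in comment:3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical block-pool table; not the libdl unresolved-table code. */
#define RECS_PER_BLOCK 256

struct rec { unsigned long symbol; unsigned long offset; };

struct block {
  struct block *next;
  size_t        used;
  struct rec    recs[RECS_PER_BLOCK];
};

struct rec_table { struct block *head; size_t blocks; };

/* Opening the table allocates the one block that is always held. */
static int table_open(struct rec_table *t)
{
  t->head = calloc(1, sizeof(struct block));
  t->blocks = t->head ? 1 : 0;
  return t->head ? 0 : -1;
}

/* Append a record, growing the pool by one block when the head is full. */
static struct rec *table_add(struct rec_table *t, struct rec r)
{
  assert(t->head != NULL);
  if (t->head->used == RECS_PER_BLOCK) {
    struct block *b = calloc(1, sizeof(*b));
    if (b == NULL)
      return NULL;
    b->next = t->head;
    t->head = b;
    t->blocks++;
  }
  struct rec *slot = &t->head->recs[t->head->used++];
  *slot = r;
  return slot;
}

/* Remove the newest record; free its block if it empties (keep one). */
static void table_pop(struct rec_table *t)
{
  assert(t->head != NULL && t->head->used > 0);
  if (--t->head->used == 0 && t->head->next != NULL) {
    struct block *empty = t->head;
    t->head = empty->next;
    t->blocks--;
    free(empty);
  }
}

static void table_close(struct rec_table *t)
{
  while (t->head != NULL) {
    struct block *next = t->head->next;
    free(t->head);
    t->head = next;
  }
  t->blocks = 0;
}
```

Growing and shrinking one fixed-size block at a time keeps heap requests uniform, which is the fragmentation argument made above.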

comment:3 Changed on May 2, 2019 at 1:45:27 AM by Chris Johns

Summary: libdl load of ELF objects on NFS file system lock up → libdl loadinf ELF objects from libbsd NFS file system ends in a deadlock

I have:

  1. Implemented caching of any reloc record that could result in a trampoline, using the unresolved table, which is suitable for the purpose. I have not yet refactored the unresolved code so it is common to unresolved symbols and trampolines; I have simply added the trampoline code to the unresolved code, with a separate RTL internal header to isolate the interface.
  2. Increased the block size for the unresolved table from 64 to 256. A block is allocated when the unresolved table is opened and one block is always held. It is not clear to me how many tramp reloc records a large object file or an incrementally linked object file could have.
  3. Added per object file trampoline stats to help track what is happening.

The output for the PowerPC psim BSP for test dl09 with the test hacked to show the trampoline stats is:

RTL List:
 /dl09-o1.o
  trampolines:
      slots     : 3
      size      : 48
      slot size : 16
      used      : 1
      relocs    : 0
      unresolved: 3
      yield     : 33%
 /dl09-o2.o
  trampolines:
      slots     : 7
      size      : 112
      slot size : 16
      used      : 7
      relocs    : 6
      unresolved: 1
      yield     : 100%
 /dl09-o5.o
  trampolines:
      slots     : 6
      size      : 96
      slot size : 16
      used      : 6
      relocs    : 6
      unresolved: 0
      yield     : 100%
 /dl09-o3.o
  trampolines:
      slots     : 37
      size      : 592
      slot size : 16
      used      : 13
      relocs    : 12
      unresolved: 25
      yield     : 35%
 /dl09-o4.o
  trampolines:
      slots     : 8
      size      : 128
      slot size : 16
      used      : 8
      relocs    : 7
      unresolved: 1
      yield     : 100%

The dl09 test is a bit special because it does a large heap allocation between loading each object module, so the yields are close to 100%. The yield indicates how successful the slot usage is, that is, the percentage of the allocated slots that were used. There is an element of estimation in the allocation size of the table.
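The derived columns in the listings can be recomputed from the others. The sketch below is inferred from the numbers shown, not taken from libdl: `size` is slots times slot size, `yield` is used divided by slots as a percentage, and in every listing in this ticket `slots` equals `relocs + unresolved`. Whether libdl truncates or rounds the percentage is an assumption; the values shown here agree with either.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of one "trampolines:" entry in the RTL list. */
struct tramp_stats {
  size_t slots, slot_size, used, relocs, unresolved;
};

static size_t tramp_size(const struct tramp_stats *s)
{
  return s->slots * s->slot_size;
}

/* Yield as a truncated integer percentage, 0 when no slots exist. */
static unsigned tramp_yield(const struct tramp_stats *s)
{
  assert(s->used <= s->slots);
  return s->slots ? (unsigned)(s->used * 100 / s->slots) : 0;
}

/* Relationship observed in the listings: one slot per cached record. */
static bool tramp_consistent(const struct tramp_stats *s)
{
  return s->slots == s->relocs + s->unresolved;
}
```

For /dl09-o3.o above this gives a size of 592 and a yield of 35%, matching the reported output.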

Note, /dl09-o3.o exposes a new issue related to the way the unresolved external trampoline slots are managed. This file has a yield of 35% because there are 25 unresolved relocation records. Most of these turn out to be small data (SDATA) relocations, which do not need trampolines: they refer to the small data section, which is limited in size, and they reference data, which cannot be jumped to.
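The filtering described above amounts to classifying relocations by type before reserving a slot. The sketch below assumes PowerPC relocation numbers as defined in the System V PowerPC ABI (and in glibc's `<elf.h>`) and a hypothetical policy in which only 24-bit relative branch relocations can need a trampoline; the real libdl back-ends may use a different set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PowerPC ELF relocation types (values as in <elf.h>). */
enum {
  R_PPC_ADDR32    = 1,
  R_PPC_REL24     = 10,
  R_PPC_PLTREL24  = 18,
  R_PPC_SDAREL16  = 32,
  R_PPC_EMB_SDA21 = 109
};

/*
 * Hypothetical filter: only relative branch relocations can ever be out
 * of range of a code target. Small-data and absolute data relocations
 * address data, which is never a branch target, so they get no slot.
 */
static bool reloc_may_need_tramp(uint32_t type)
{
  switch (type) {
  case R_PPC_REL24:
  case R_PPC_PLTREL24:
    return true;
  default:
    return false;
  }
}
```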

If we look at test dl08 we get:

RTL List:
 /dl08-o1.o
  trampolines:
      slots     : 1
      size      : 16
      slot size : 16
      used      : 0
      relocs    : 0
      unresolved: 1
      yield     : 0%
 /libdl08_1.a:dl08-o2.o
  trampolines:
      slots     : 1
      size      : 16
      slot size : 16
      used      : 0
      relocs    : 0
      unresolved: 1
      yield     : 0%
 /libdl08_2.a:dl08-o3.o
  trampolines:
      slots     : 25
      size      : 400
      slot size : 16
      used      : 0
      relocs    : 0
      unresolved: 25
      yield     : 0%
 /libdl08_1.a:dl08-o4.o
  trampolines:
      slots     : 1
      size      : 16
      slot size : 16
      used      : 0
      relocs    : 0
      unresolved: 1
      yield     : 0%
 /libdl08_2.a:dl08-o6-123456789-123456789.o
  trampolines: no slots allocated
 /libdl08_2.a:dl08-o5.o
  trampolines:
      slots     : 1
      size      : 16
      slot size : 16
      used      : 0
      relocs    : 0
      unresolved: 1
      yield     : 0%

Notice we have a yield of 0% because no trampolines are needed and everything is within range. Again we have the same issue noted above with /libdl08_2.a:dl08-o3.o.

comment:4 Changed on May 2, 2019 at 1:47:14 AM by Chris Johns

Summary: libdl loadinf ELF objects from libbsd NFS file system ends in a deadlock → libdl loading ELF objects from libbsd NFS file system ends in a deadlock

comment:5 Changed on May 3, 2019 at 12:12:02 AM by Chris Johns

I have worked on the way the relocs are parsed and now have dl09 on the psim and xilinx_zynq_a9_qemu BSPs reporting trampoline usage of:

RTL List:
 /dl09-o1.o
  trampolines:
      slots     : 1
      size      : 16
      slot size : 16
      used      : 1
      relocs    : 1
      unresolved: 0
      yield     : 100%
 /dl09-o2.o
  trampolines:
      slots     : 7
      size      : 112
      slot size : 16
      used      : 7
      relocs    : 7
      unresolved: 0
      yield     : 100%
 /dl09-o5.o
  trampolines:
      slots     : 6
      size      : 96
      slot size : 16
      used      : 6
      relocs    : 6
      unresolved: 0
      yield     : 100%
 /dl09-o3.o
  trampolines:
      slots     : 13
      size      : 208
      slot size : 16
      used      : 13
      relocs    : 13
      unresolved: 0
      yield     : 100%
 /dl09-o4.o
  trampolines:
      slots     : 8
      size      : 128
      slot size : 16
      used      : 8
      relocs    : 8
      unresolved: 0
      yield     : 100%

The dl08 results still have a 0% yield but the number of slots is lower:

RTL List:
 /dl08-o1.o
  trampolines:
      slots     : 1
      size      : 16
      slot size : 16
      used      : 0
      relocs    : 1
      unresolved: 0
      yield     : 0%
 /libdl08_1.a:dl08-o2.o
  trampolines:
      slots     : 1
      size      : 16
      slot size : 16
      used      : 0
      relocs    : 1
      unresolved: 0
      yield     : 0%
 /libdl08_2.a:dl08-o3.o
  trampolines:
      slots     : 1
      size      : 16
      slot size : 16
      used      : 0
      relocs    : 1
      unresolved: 0
      yield     : 0%
 /libdl08_1.a:dl08-o4.o
  trampolines:
      slots     : 1
      size      : 16
      slot size : 16
      used      : 0
      relocs    : 1
      unresolved: 0
      yield     : 0%
 /libdl08_2.a:dl08-o6-123456789-123456789.o
  trampolines:
      slots     : 0
      size      : 0
      slot size : 16
      used      : 0
      relocs    : 0
      unresolved: 0
      yield     : 0%
 /libdl08_2.a:dl08-o5.o
  trampolines:
      slots     : 1
      size      : 16
      slot size : 16
      used      : 0
      relocs    : 1
      unresolved: 0
      yield     : 0%
Last edited on May 3, 2019 at 12:12:45 AM by Chris Johns

comment:6 Changed on May 3, 2019 at 12:46:02 AM by Chris Johns

I have posted a reasonable solution to the bug but I think there is a better one. The current trampoline processing is reloc record based and it should be symbol based. A trampoline to a target symbol is the same set of instructions for every relocation that uses it, so it could be shared by those relocations. This would reduce the trampoline slot count.

A change would move away from a trampoline cache of each record to a cache of the target symbols referenced by reloc records. The tramp check would see which symbols are in range and which are out of range to determine the number of slots to allocate. A relocation record whose instruction cannot reach the full address range would still need a slot if the symbol is unresolved. The symbol cache entry would reference count the reloc records referencing it and be deleted once there are no more references. The symbol cache entry would also hold the trampoline slot number once allocated, for use by other relocation records.
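A symbol-keyed cache along the lines proposed above might look like this. It is a hypothetical sketch of the proposal, not the code that was committed: entries are looked up by name with a linear search, a fixed-size array stands in for the block pool, and an entry whose reference count drops to zero is treated as deleted and may be reused.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol-keyed trampoline cache (the proposal, not libdl). */
#define MAX_SYMS 32
#define NO_SLOT  ((size_t)-1)

struct tramp_sym {
  const char *name;  /* target symbol */
  size_t      refs;  /* reloc records referencing it; 0 means deleted */
  size_t      slot;  /* trampoline slot, or NO_SLOT */
};

struct tramp_cache {
  struct tramp_sym syms[MAX_SYMS];
  size_t count;      /* entries ever used in syms[] */
  size_t slots;      /* trampoline slots handed out */
};

/* Record that a reloc references `name`; NULL if the cache is full. */
static struct tramp_sym *cache_ref(struct tramp_cache *c, const char *name)
{
  struct tramp_sym *free_entry = NULL;
  for (size_t i = 0; i < c->count; ++i) {
    struct tramp_sym *s = &c->syms[i];
    if (s->refs == 0) {
      if (free_entry == NULL)
        free_entry = s;                /* deleted entry, reusable */
    } else if (strcmp(s->name, name) == 0) {
      s->refs++;
      return s;
    }
  }
  if (free_entry == NULL) {
    if (c->count == MAX_SYMS)
      return NULL;
    free_entry = &c->syms[c->count++];
  }
  free_entry->name = name;
  free_entry->refs = 1;
  free_entry->slot = NO_SLOT;
  return free_entry;
}

/* An out-of-range reloc asks for a slot; relocs to one symbol share it. */
static size_t cache_slot(struct tramp_cache *c, struct tramp_sym *s)
{
  if (s->slot == NO_SLOT)
    s->slot = c->slots++;
  return s->slot;
}

/* Drop one reference; at zero the entry counts as deleted. */
static void cache_unref(struct tramp_sym *s)
{
  assert(s->refs > 0);
  s->refs--;
}
```

Two out-of-range relocation records against `printf` end up sharing slot 0, so the table needs one slot per distinct out-of-range symbol rather than one per record.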

Note, the trampoline code is transparent to the execution of the object code and only has the target symbol address in it therefore it can be shared by more than one relocation record.
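To show why a trampoline can be shared, here is one possible 32-bit PowerPC slot. The instruction sequence is an illustrative assumption, not necessarily what libdl emits, but it holds nothing except the target address and, at four instructions, happens to match the 16-byte slot size reported above. Because `bctr` does not modify the link register, a `bl` into the slot still returns directly to the original caller.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative 32-bit PowerPC trampoline (assumed, not libdl's code).
 * Only `target` varies, so every branch to `target` can share it:
 *   lis   r12, target@h
 *   ori   r12, r12, target@l
 *   mtctr r12
 *   bctr
 */
static void ppc_tramp_encode(uint32_t slot[4], uint32_t target)
{
  slot[0] = 0x3D800000u | (target >> 16);     /* lis   r12, hi       */
  slot[1] = 0x618C0000u | (target & 0xFFFFu); /* ori   r12, r12, lo  */
  slot[2] = 0x7D8903A6u;                      /* mtctr r12           */
  slot[3] = 0x4E800420u;                      /* bctr                */
}
```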

The unresolved table code may need to be split, as the symbol record may grow the size of the record union and affect the memory footprint for unresolved symbols.

comment:7 Changed on May 4, 2019 at 2:22:01 AM by Chris Johns <chrisj@…>

Resolution: fixed
Status: accepted → closed

In b36c5209/rtems:

libdl: Do not access the ELF file while the allocator is locked.

  • Load symbols before allocation.
  • Parse reloc records and place any reloc recs in a cache to use while the allocator is locked.
  • Relocate symbols after section allocation.
  • Split section loading into allocation/locating and loading.
  • Update all arch back-ends with a new reloc interface to control tramp handling.
  • Add -a and -t to the object list shell command.

Closes #3741
