Opened on 04/27/19 at 12:51:51
Closed on 05/04/19 at 02:22:01
#3741 closed defect (fixed)
libdl loading ELF objects from libbsd NFS file system ends in a deadlock
Reported by: | dufault | Owned by: | Chris Johns |
---|---|---|---|
Priority: | normal | Milestone: | 5.1 |
Component: | lib/dl | Version: | 5 |
Severity: | normal | Keywords: | run-time-loader libdl |
Cc: | Blocked By: | ||
Blocking: |
Description
For ELF files the run-time loader calls this chain:
- rtems_rtl_elf_file_load()
- rtems_rtl_alloc_lock()
- rtems_rtl_alloc_heap()
- _RTEMS_Lock_allocator()
_RTEMS_Lock_allocator() locks all heap operations. RTL then calls read() and for NFS file systems the NFS threads try to use the heap, locking up the system.
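The lock ordering problem can be modelled in a few lines of C. This is only a sketch: `model_lock`, `model_nfs_read()` and the two load functions are hypothetical names invented for illustration and are not part of libdl, libbsd or the RTEMS score. The failed try-lock stands in for the NFS thread blocking forever on the heap lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the heap allocator lock. */
typedef struct {
  bool held;
} model_lock;

static bool model_try_lock(model_lock *l)
{
  if (l->held)
    return false;
  l->held = true;
  return true;
}

static void model_unlock(model_lock *l)
{
  assert(l->held);
  l->held = false;
}

/*
 * Models a read() serviced by a file system that needs the heap.
 * Returns the number of bytes read, or -1 where the real system
 * would block forever waiting for the heap lock.
 */
static int model_nfs_read(model_lock *heap_lock, size_t len)
{
  if (!model_try_lock(heap_lock))
    return -1;
  model_unlock(heap_lock);
  return (int) len;
}

/* Failing order: take the heap lock, then read from the file system. */
static int model_load_locked_read(model_lock *heap_lock)
{
  int r;
  if (!model_try_lock(heap_lock))
    return -1;
  r = model_nfs_read(heap_lock, 64);
  model_unlock(heap_lock);
  return r;
}

/* Safe order: finish all file system reads, then take the heap lock. */
static int model_load_read_then_lock(model_lock *heap_lock)
{
  int r = model_nfs_read(heap_lock, 64);
  if (r < 0)
    return r;
  if (!model_try_lock(heap_lock))
    return -1;
  model_unlock(heap_lock);
  return r;
}
```

In this model the first ordering always reports the would-be deadlock, while the second completes because no file system call is made with the lock held.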
Change History (7)
comment:1 Changed on 04/28/19 at 01:44:02 by Chris Johns
Milestone: | → 5.1 |
---|---|
Owner: | set to Chris Johns |
Status: | new → accepted |
comment:2 Changed on 04/30/19 at 07:25:22 by Chris Johns
comment:3 Changed on 05/02/19 at 01:45:27 by Chris Johns
Summary: | libdl load of ELF objects on NFS file system lock up → libdl loadinf ELF objects from libbsd NFS file system ends in a deadlock |
---|
I have:
- Implemented caching of any reloc record that could result in a trampoline, using the unresolved table. This table is suitable for the purpose. I have not yet refactored the unresolved code to relabel it as common to unresolved symbols and trampolines; I have simply added the trampoline code to the unresolved code, with a separate RTL internal header to isolate the interface.
- Increased the block size for the unresolved table from 64 to 256. A block is allocated when the unresolved table is opened and one block is always held. It is not clear to me how many tramp reloc records a large object file or an incrementally linked object file could have.
- Added per object file trampoline stats to help track what is happening.
The output for the PowerPC psim BSP for test dl09 with the test hacked to show the trampoline stats is:
RTL List:
  /dl09-o1.o
    trampolines: slots : 3 size : 48 slot size : 16 used : 1 relocs : 0 unresolved: 3 yield : 33%
  /dl09-o2.o
    trampolines: slots : 7 size : 112 slot size : 16 used : 7 relocs : 6 unresolved: 1 yield : 100%
  /dl09-o5.o
    trampolines: slots : 6 size : 96 slot size : 16 used : 6 relocs : 6 unresolved: 0 yield : 100%
  /dl09-o3.o
    trampolines: slots : 37 size : 592 slot size : 16 used : 13 relocs : 12 unresolved: 25 yield : 35%
  /dl09-o4.o
    trampolines: slots : 8 size : 128 slot size : 16 used : 8 relocs : 7 unresolved: 1 yield : 100%
The dl09 test is a bit special because it does a large heap allocation between the loading of each object module, so the yields are close to 100%. The yield indicates how successful the slot usage is. There is an element of estimation in the allocation size of the table.
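The figures in the output above are consistent with size being slots times slot size and yield being used slots over allocated slots, truncated to a whole percent. A minimal sketch of that arithmetic follows; the `tramp_stats` type and function names are hypothetical, and the truncating division is an assumption inferred from the printed values rather than taken from the libdl source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per object file trampoline statistics. */
typedef struct {
  unsigned slots;     /* trampoline slots allocated */
  unsigned slot_size; /* bytes per slot */
  unsigned used;      /* slots actually used */
} tramp_stats;

/* Size of the trampoline table in bytes. */
static unsigned tramp_table_size(const tramp_stats *s)
{
  assert(s != NULL);
  return s->slots * s->slot_size;
}

/* Yield as a whole percentage; 0 when no slots were allocated. */
static unsigned tramp_yield_percent(const tramp_stats *s)
{
  assert(s != NULL);
  if (s->slots == 0)
    return 0;
  return (s->used * 100u) / s->slots;
}
```

For /dl09-o3.o this gives 37 * 16 = 592 bytes and 13 * 100 / 37 = 35%, matching the listing.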
Note, /dl09-o3.o exposes a new issue related to the way the unresolved external trampoline slots are managed. This file has a yield of 35% because there are 25 unresolved relocation records. Most of these turn out to be small data (SDATA) relocations and do not need trampolines: they are in the small data section, which is limited in size, and they are for data, which is not jumped to.
If we look at test dl08 we get:
RTL List:
  /dl08-o1.o
    trampolines: slots : 1 size : 16 slot size : 16 used : 0 relocs : 0 unresolved: 1 yield : 0%
  /libdl08_1.a:dl08-o2.o
    trampolines: slots : 1 size : 16 slot size : 16 used : 0 relocs : 0 unresolved: 1 yield : 0%
  /libdl08_2.a:dl08-o3.o
    trampolines: slots : 25 size : 400 slot size : 16 used : 0 relocs : 0 unresolved: 25 yield : 0%
  /libdl08_1.a:dl08-o4.o
    trampolines: slots : 1 size : 16 slot size : 16 used : 0 relocs : 0 unresolved: 1 yield : 0%
  /libdl08_2.a:dl08-o6-123456789-123456789.o
    trampolines: no slots allocated
  /libdl08_2.a:dl08-o5.o
    trampolines: slots : 1 size : 16 slot size : 16 used : 0 relocs : 0 unresolved: 1 yield : 0%
Notice we have a yield of 0% because no trampolines are needed and everything is within range. Again we have the same issue noted above with /libdl08_2.a:dl08-o3.o.
comment:4 Changed on 05/02/19 at 01:47:14 by Chris Johns
Summary: | libdl loadinf ELF objects from libbsd NFS file system ends in a deadlock → libdl loading ELF objects from libbsd NFS file system ends in a deadlock |
---|
comment:5 Changed on 05/03/19 at 00:12:02 by Chris Johns
I have worked on the way the relocs are parsed and now have dl09 on the psim and xilinx_zynq_a9_qemu BSPs reporting trampoline usage of:
RTL List:
  /dl09-o1.o
    trampolines: slots : 1 size : 16 slot size : 16 used : 1 relocs : 1 unresolved: 0 yield : 100%
  /dl09-o2.o
    trampolines: slots : 7 size : 112 slot size : 16 used : 7 relocs : 7 unresolved: 0 yield : 100%
  /dl09-o5.o
    trampolines: slots : 6 size : 96 slot size : 16 used : 6 relocs : 6 unresolved: 0 yield : 100%
  /dl09-o3.o
    trampolines: slots : 13 size : 208 slot size : 16 used : 13 relocs : 13 unresolved: 0 yield : 100%
  /dl09-o4.o
    trampolines: slots : 8 size : 128 slot size : 16 used : 8 relocs : 8 unresolved: 0 yield : 100%
The dl08 results still have a 0% yield but the number of slots is lower:
RTL List:
  /dl08-o1.o
    trampolines: slots : 1 size : 16 slot size : 16 used : 0 relocs : 1 unresolved: 0 yield : 0%
  /libdl08_1.a:dl08-o2.o
    trampolines: slots : 1 size : 16 slot size : 16 used : 0 relocs : 1 unresolved: 0 yield : 0%
  /libdl08_2.a:dl08-o3.o
    trampolines: slots : 1 size : 16 slot size : 16 used : 0 relocs : 1 unresolved: 0 yield : 0%
  /libdl08_1.a:dl08-o4.o
    trampolines: slots : 1 size : 16 slot size : 16 used : 0 relocs : 1 unresolved: 0 yield : 0%
  /libdl08_2.a:dl08-o6-123456789-123456789.o
    trampolines: slots : 0 size : 0 slot size : 16 used : 0 relocs : 0 unresolved: 0 yield : 0%
  /libdl08_2.a:dl08-o5.o
    trampolines: slots : 1 size : 16 slot size : 16 used : 0 relocs : 1 unresolved: 0 yield : 0%
comment:6 Changed on 05/03/19 at 00:46:02 by Chris Johns
I have posted a reasonable solution to the bug but I think there is a better solution. The current trampoline processing is reloc record based and it should be target symbol based. A trampoline to a target symbol will always be the same set of instructions, so it could be shared by every relocation that references that symbol. This would drop the trampoline slot count.
The change would move away from a trampoline cache of each record to a cache of target symbols referenced by reloc records. The tramp check would see which symbols are in range and which are out of range to determine the number of slots to allocate. A relocation record whose instruction cannot reach the full address range would still need a slot if the symbol is unresolved. The symbol cache entry would reference count the reloc records referencing it and be deleted once there are no more references. The symbol cache entry would hold the trampoline slot number once allocated, for use by other relocation records.
Note, the trampoline code is transparent to the execution of the object code and only has the target symbol address in it therefore it can be shared by more than one relocation record.
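To make the idea concrete, here is a rough sketch of a reference counted target symbol cache. Everything in it is hypothetical: the `sym_cache` and `sym_entry` types, the fixed capacity and the function names are invented for illustration and do not exist in libdl. Names are compared with `strcmp()` and the caller is assumed to keep the name strings alive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SYM_CACHE_MAX 16   /* hypothetical fixed capacity for the sketch */
#define SYM_NO_SLOT   (-1) /* no trampoline slot allocated yet */

typedef struct {
  const char *name; /* target symbol name, NULL when the entry is free */
  unsigned    refs; /* reloc records referencing this symbol */
  int         slot; /* shared trampoline slot, or SYM_NO_SLOT */
} sym_entry;

typedef struct {
  sym_entry entries[SYM_CACHE_MAX];
  int       next_slot;
} sym_cache;

static void sym_cache_init(sym_cache *c)
{
  size_t i;
  for (i = 0; i < SYM_CACHE_MAX; ++i) {
    c->entries[i].name = NULL;
    c->entries[i].refs = 0;
    c->entries[i].slot = SYM_NO_SLOT;
  }
  c->next_slot = 0;
}

/* Add a reference for a reloc record; returns NULL if the cache is full. */
static sym_entry *sym_cache_ref(sym_cache *c, const char *name)
{
  sym_entry *free_entry = NULL;
  size_t i;
  for (i = 0; i < SYM_CACHE_MAX; ++i) {
    sym_entry *e = &c->entries[i];
    if (e->name == NULL) {
      if (free_entry == NULL)
        free_entry = e;
    } else if (strcmp(e->name, name) == 0) {
      ++e->refs;
      return e;
    }
  }
  if (free_entry != NULL) {
    free_entry->name = name;
    free_entry->refs = 1;
    free_entry->slot = SYM_NO_SLOT;
  }
  return free_entry;
}

/* Return the symbol's trampoline slot, allocating it on first use. */
static int sym_cache_slot(sym_cache *c, sym_entry *e)
{
  if (e->slot == SYM_NO_SLOT)
    e->slot = c->next_slot++;
  return e->slot;
}

/* Drop a reference; the entry is deleted when no reloc records remain. */
static void sym_cache_unref(sym_entry *e)
{
  assert(e->refs > 0);
  if (--e->refs == 0) {
    e->name = NULL;
    e->slot = SYM_NO_SLOT;
  }
}

/* Number of live symbols in the cache. */
static size_t sym_cache_count(const sym_cache *c)
{
  size_t i, n = 0;
  for (i = 0; i < SYM_CACHE_MAX; ++i)
    if (c->entries[i].name != NULL)
      ++n;
  return n;
}
```

With this shape, two reloc records that both target the same symbol get the same slot number, so the slot count follows the number of distinct out-of-range symbols rather than the number of records.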
The unresolved table code may need to be split, as the symbol record may grow the size of the record union and affect the memory footprint for unresolved symbols.
comment:7 Changed on 05/04/19 at 02:22:01 by Chris Johns <chrisj@…>
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
In b36c5209/rtems:
The default RTL allocator (heap) uses the file system while holding the heap allocator lock and if the file system uses the heap we end up in a deadlock. LibBSD's NFS implementation uses the heap.
The allocator lock/unlock logic was added when the trampoline changes were added. Trampolines provide large memory support and are documented in the User Manual (https://docs.rtems.org/branches/master/user/exe/loader.html#large-memory). The trampoline table needs to be close to the code using it or it may be out of range. If something allocates a large piece of memory between allocating the text memory and the trampoline table, the relocation to the trampoline will be out of range.
There are two unknowns until the memory is allocated and the code is loaded. The first is the location in memory and the second is the instruction the relocation record is pointing to. The location lets us determine the distance from the instruction to the target address and whether it is in range, and the instruction lets the arch back-end determine the range the instruction has.
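The range test described here can be sketched as below, assuming 32-bit addresses and an instruction whose branch displacement is a signed field of a given width in bits. The function names are hypothetical and the arch back-ends in libdl do not necessarily express the check this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the signed displacement from the instruction to the target
 * fits in a signed field of `bits` bits (byte granularity assumed).
 */
static bool reloc_in_range(uint32_t insn, uint32_t target, unsigned bits)
{
  int64_t disp;
  int64_t max;
  int64_t min;
  assert(bits >= 1 && bits <= 32);
  disp = (int64_t) target - (int64_t) insn;
  max = ((int64_t) 1 << (bits - 1)) - 1;
  min = -((int64_t) 1 << (bits - 1));
  return disp >= min && disp <= max;
}

/*
 * True if a trampoline slot should be reserved for this relocation.
 * Instructions that reach the full address range never need one; an
 * unresolved symbol has no known address yet, so a slot is reserved.
 */
static bool reloc_needs_slot(uint32_t insn,
                             uint32_t target,
                             unsigned bits,
                             bool     resolved)
{
  if (bits >= 32)
    return false;
  if (!resolved)
    return true;
  return !reloc_in_range(insn, target, bits);
}
```

As an illustrative value only, a 26-bit signed displacement reaches roughly 32 MiB either side of the instruction, so a target 0x02000000 bytes ahead is just out of range and would need a slot.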
The ELF loader can be split into a finer set of stages to do as much processing as possible before allocating memory; however, as stated above, the two unknowns still need to be resolved.
The most robust solution is to add code to build a table of relocation records that could require trampolines. While the allocator lock is being held:
The number of trampolines can be determined using the relocation table data held in memory. The relocation table could be implemented in a similar way to the unresolved externals table, with a common pool of blocks that grows and shrinks based on demand. Allocating in blocks should help limit heap fragmentation.