#4069 reopened defect

dl06 does not link on RISCV

Reported by: Chris Johns
Owned by: Hesham Almatary <Hesham.Almatary@…>
Priority: normal
Milestone: 6.1
Component: tool
Version: 6
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:

Description

rtems-ld -r /home/jiri/src/rtems/riscvmp/riscv-rtems6/c/griscv \
  -C riscv-rtems6-gcc -c "-march=rv32imafd -mabi=ilp32d" \
  -O rap -b dl06.pre -e rtems_main -s \
  -o dl06.rap dl06-o1.o dl06-o2.o -lm
error: rap::object: Section index '0' not found: dl06-o1.o
Makefile:8528: recipe for target 'dl06.rap' failed
make[5]: *** [dl06.rap] Error 10

Attachments (5)

rld-rap.patch (463 bytes) - added by Jiri Gaisler on Sep 6, 2020 at 6:44:04 PM.
disable-dl-for-riscv.patch (597 bytes) - added by Jiri Gaisler on Sep 8, 2020 at 7:33:22 AM.
Disable building dl tests for RISC-V
0001-libdl-riscv-Support-misaligned-read-and-write-access.patch (8.6 KB) - added by Chris Johns on Sep 10, 2020 at 5:07:34 AM.
libdl/risc: Support misaligned accesses for updates to the loaded code
riscv-rtl-empty-symbols.diff (4.3 KB) - added by Chris Johns on Sep 10, 2020 at 11:28:09 PM.
Apply on top of the misalignment patch. Teach the RTL to handle empty symbol names.
v2-0001-libdl-riscv-Fix-RISCV-issues-with-libdl-tests.patch (12.9 KB) - added by Chris Johns on Sep 11, 2020 at 8:24:12 AM.
Tests dl01-dl04,dl06-dl09 pass with this patch on the SIS. Hesham, please test, review and push if you think it is OK.


Change History (25)

Changed on Sep 6, 2020 at 6:44:04 PM by Jiri Gaisler

Attachment: rld-rap.patch added

comment:1 Changed on Sep 6, 2020 at 6:46:49 PM by Jiri Gaisler

By re-applying Hesham's earlier patch (above), dl06 builds again. However, the program fails on the griscv BSP:

* BEGIN OF TEST libdl (RTL) 6 *
* TEST VERSION: 6.0.0.aedd92d1477df0025821b77c06b2f2b2dc7aaf67
* TEST STATE: EXPECTED_PASS
* TEST BUILD: RTEMS_NETWORKING RTEMS_POSIX_API RTEMS_SMP
* TEST TOOLS: 10.2.1 20200904 (RTEMS 6, RSB 47f32b8b1a597b5ed3475722bdc155249ef51115, Newlib a0d7982)

load: /dl06.rap
dlopen failed: offset past end of file: offset=11716 size=11716

* FATAL *
fatal source: 5 (RTEMS_FATAL_SOURCE_EXIT)
fatal code: 0 (0x00000000)
RTEMS version: 6.0.0.aedd92d1477df0025821b77c06b2f2b2dc7aaf67
RTEMS tools: 10.2.1 20200904 (RTEMS 6, RSB 47f32b8b1a597b5ed3475722bdc155249ef51115, Newlib a0d7982)
executing thread ID: 0x08a010001
executing thread name: UI1

I guess this is not the intended behavior ...?

comment:2 Changed on Sep 6, 2020 at 9:12:01 PM by Hesham Almatary <Hesham.Almatary@…>

Owner: set to Hesham Almatary <Hesham.Almatary@…>
Resolution: fixed
Status: new → closed

In 764ea578/rtems:

htif_console_handler is defined in htif.c

closes #4069.

comment:3 Changed on Sep 7, 2020 at 2:00:52 PM by Hesham Almatary

Resolution: fixed
Status: closed → reopened

The HTIF fix doesn't fix this error. The new run-time error (after Jiri applied my patch) is probably due to one of the following:

1) Some relocations are not supported in the libdl/RISC-V implementation (e.g., RELAX).
2) I did not test or support the RAP format when I ported libdl to RISC-V.

comment:4 Changed on Sep 8, 2020 at 7:26:07 AM by Chris Johns

If Jiri's patch builds dl06 then it can be pushed. The patch looks reasonable.

comment:5 Changed on Sep 8, 2020 at 7:32:41 AM by Jiri Gaisler

The patch fixes the build of dl06, but several of the dlxx tests will then fail during execution. If we push this patch, the dl tests must be marked as expected-fail for RISC-V. I would think that disabling building of the dl tests is cleaner, see the new suggested patch.

Changed on Sep 8, 2020 at 7:33:22 AM by Jiri Gaisler

Attachment: disable-dl-for-riscv.patch added

Disable building dl tests for RISC-V

comment:6 in reply to:  5 Changed on Sep 9, 2020 at 2:11:01 AM by Chris Johns

Replying to Jiri Gaisler:

> The patch fixes the build of dl06, but several of the dlxx tests will then fail during execution. If we push this patch, the dl tests must be marked as expected-fail for RISC-V.

Tagging the tests with expected-fail is my preferred approach.

> I would think that disabling building of dl tests is cleaner, see new suggested patch.

I do not support doing this. Disabling or excluding tests because they fail hides the failures and does not encourage fixes. Why should RISCV be treated as special? I feel disabling tests sends the wrong message.

I assume they passed once, so regressions like this do happen. I suspect that if you wind the tool sets back to a previous version the tests might pass. Should tool upgrades be linked to regressions? I do not think that would be helpful.

comment:7 Changed on Sep 9, 2020 at 4:06:29 AM by Chris Johns

I built the tests and dl01 to dl04 pass and dl05 to dl09 fail. I tried to have a look at dl05 however gdb reported:

(gdb) target remote :1234
Remote debugging using :1234
Truncated register 52 in remote 'g' packet

then disconnected me from sis.

comment:8 in reply to:  7 ; Changed on Sep 9, 2020 at 7:35:50 AM by Jiri Gaisler

Replying to Chris Johns:

> I built the tests and dl01 to dl04 pass and dl05 to dl09 fail. I tried to have a look at dl05 however gdb reported:
>
> (gdb) target remote :1234
> Remote debugging using :1234
> Truncated register 52 in remote 'g' packet
>
> then disconnected me from sis.

Make sure you use the extended-remote target, and riscv-rtems5-sis:

$ riscv-rtems5-sis -gdb
.
.

$ riscv-rtems6-gdb ./riscv-rtems6/c/griscv/testsuites/psxtests/psxfenv01.exe

$ (gdb) tar extended-remote localhost:1234
Remote debugging using localhost:1234
0x00000000 in ?? ()
(gdb) load
Loading section .start, size 0xa8 lma 0x40000000
Loading section .text, size 0x15138 lma 0x400000a8
Loading section .rodata, size 0x1c1d lma 0x400151e0
Loading section .eh_frame, size 0x68 lma 0x40016e00
Loading section .init_array, size 0x4 lma 0x40016e68
Loading section .fini_array, size 0x4 lma 0x40016e6c
Loading section .rtemsroset, size 0x74 lma 0x40016e70
Loading section .data, size 0x490 lma 0x40016ee8
Loading section .sdata, size 0xac lma 0x40017380
Start address 0x40000000, load size 95261
Transfer rate: 4429 KB/sec, 237 bytes/write.
(gdb) run

comment:9 in reply to:  8 ; Changed on Sep 10, 2020 at 1:50:07 AM by Chris Johns

Replying to Jiri Gaisler:

> Replying to Chris Johns:
>
> > I built the tests and dl01 to dl04 pass and dl05 to dl09 fail. I tried to have a look at dl05 however gdb reported:
> >
> > (gdb) target remote :1234
> > Remote debugging using :1234
> > Truncated register 52 in remote 'g' packet
> >
> > then disconnected me from sis.
>
> Make sure you use the extended-remote target, and riscv-rtems5-sis:
>
> $ riscv-rtems5-sis -gdb
> .

I built it using the devel/sis RSB package and I only got an sis executable installed.

$ ../source-builder/sb-set-builder --list-bsets | grep sis
    devel/sis.bset
$ ls /opt/work/rtems/6/bin/ | grep sis
sis
sparc-rtems6-sis

I checked the build log and there is no riscv-rtems[56]-sis installed. I see a sparc-rtems-sis installed. Where do those come from?

I will push a change to the RSB to build 2.22.

> $ riscv-rtems6-gdb ./riscv-rtems6/c/griscv/testsuites/psxtests/psxfenv01.exe
>
> $ (gdb) tar extended-remote localhost:1234

I added -riscv to the sis command line and it works. It must have been defaulting to the sparc simulation.

Note, the -riscv option is spelt wrong in the usage help.

Last edited on Sep 10, 2020 at 1:52:07 AM by Chris Johns

comment:10 Changed on Sep 10, 2020 at 4:17:07 AM by Chris Johns

I have looked into the crash in dl05 and it is a misaligned access because the rela record has an offset of 7. You can see this here:

$ riscv-rtems6-readelf -r riscv-rtems6/c/griscv/testsuites/libtests/dl05-o5.o
  [snip]

Relocation section '.rela.gcc_except_table.exception_dl' at offset 0x8b98 contains 58 entries:
 Offset     Info    Type            Sym.Value  Sym. Name + Addend
00000007  00006423 R_RISCV_ADD32     0000002c   .LEHB0 + 0

  [snip]

This is a relocation record for the exception table.

I have added reloc tracing to the RISCV backend, and the reloc that crashes is:

rtl: reloc base_rel(/dl05-o5.o): ADD32: where=0x4006cc67, *where=0xdedeeded, addend=0x0, base 0x4006cc60

The *where=0xdedeeded can be ignored; the value is invalid because the address was detected as odd.

I do not know RISCV well, so does the write need to be smarter, or is the problem elsewhere?

I am wondering if a number of other crashes are the same or a similar issue.

Last edited on Sep 10, 2020 at 4:18:17 AM by Chris Johns

Changed on Sep 10, 2020 at 5:07:34 AM by Chris Johns

libdl/risc: Support misaligned accesses for updates to the loaded code

comment:11 Changed on Sep 10, 2020 at 5:08:43 AM by Chris Johns

The patch lets things get further along; however, there are other issues with libdl tests 5 through 9. I have no more time available to look into this.

comment:12 in reply to:  9 ; Changed on Sep 10, 2020 at 7:21:02 AM by Jiri Gaisler

Replying to Chris Johns:

> Replying to Jiri Gaisler:
>
> > Replying to Chris Johns:
> >
> > > I built the tests and dl01 to dl04 pass and dl05 to dl09 fail. I tried to have a look at dl05 however gdb reported:
> > >
> > > (gdb) target remote :1234
> > > Remote debugging using :1234
> > > Truncated register 52 in remote 'g' packet
> > >
> > > then disconnected me from sis.
> >
> > Make sure you use the extended-remote target, and riscv-rtems5-sis:
> >
> > $ riscv-rtems5-sis -gdb
> > .
>
> I built it using the devel/sis RSB package and I only got an sis executable installed.
>
> $ ../source-builder/sb-set-builder --list-bsets | grep sis
>     devel/sis.bset
> $ ls /opt/work/rtems/6/bin/ | grep sis
> sis
> sparc-rtems6-sis
>
> I checked the build log and there is no riscv-rtems[56]-sis installed. I see a sparc-rtems-sis installed. Where do those come from?
>
> I will push a change to the RSB to build 2.22.
>
> > $ riscv-rtems6-gdb ./riscv-rtems6/c/griscv/testsuites/psxtests/psxfenv01.exe
> >
> > $ (gdb) tar extended-remote localhost:1234
>
> I added -riscv to the sis command line and it works. It must have been defaulting to the sparc simulation.
>
> Note, the -riscv option is spelt wrong in the usage help.

I have updated RSB to also build sis for RISC-V. This was previously done only for rtems5, not rtems6.

comment:13 in reply to:  12 ; Changed on Sep 10, 2020 at 11:25:33 PM by Chris Johns

Replying to Jiri Gaisler:

> I have updated RSB to also build sis for RISC-V. This was previously done only for rtems5, not rtems6.

Thanks. Does this install as riscv-rtems6-sis?

Changed on Sep 10, 2020 at 11:28:09 PM by Chris Johns

Apply on top of the misalignment patch. Teach the RTL to handle empty symbol names.

comment:14 in reply to:  13 ; Changed on Sep 11, 2020 at 6:49:07 AM by Jiri Gaisler

Replying to Chris Johns:

> Replying to Jiri Gaisler:
>
> > I have updated RSB to also build sis for RISC-V. This was previously done only for rtems5, not rtems6.
>
> Thanks. Does this install as riscv-rtems6-sis?

Yes, that is the whole idea with this patch... :-)

comment:15 in reply to:  14 Changed on Sep 11, 2020 at 7:06:23 AM by Chris Johns

Replying to Jiri Gaisler:

> Yes, that is the whole idea with this patch... :-)

Being able to have the SIS available like this when a RISCV tool chain is built is fantastic. Thank you for your support and the tool.

comment:16 Changed on Sep 11, 2020 at 7:08:33 AM by Jiri Gaisler

The riscv-rtl-empty-symbols.diff patch gives me the following failures:

Failures:

dl01.exe
dl02.exe
dl05.exe
dl08.exe

User Input:

dl10.exe
monitor.exe
termios.exe
top.exe

dl01, dl02 and dl08 now time out rather than fail.

comment:17 Changed on Sep 11, 2020 at 7:36:26 AM by Chris Johns

Yeah, something is wrong with the patch. I was seeing an unresolved symbol with a symbol name of "" being added and as you can imagine it is never resolved.

comment:18 Changed on Sep 11, 2020 at 8:07:21 AM by Chris Johns

I am starting to suspect the issue is the R_RISCV_RELAX reloc records and the call sites. The simplest approach is to ignore those relocs and just use 32-bit destinations or implement the 21-bit relative offset if the target is in range.

The RISCV relax approach is nicer to libdl than the ARM or PowerPC trampoline support.

Changed on Sep 11, 2020 at 8:24:12 AM by Chris Johns

Tests dl01-dl04,dl06-dl09 pass with this patch on the SIS. Hesham, please test, review and push if you think it is OK.

comment:19 Changed on Sep 11, 2020 at 8:26:15 AM by Chris Johns

dl05 has an issue in the unwind tables somewhere. An object's tables are added to the base tables so you can throw in a module and catch it elsewhere.
dl06 is a RAP thing which requires some work to sort out.

comment:20 Changed on Sep 15, 2020 at 7:17:20 AM by Hesham Almatary

Thanks, Chris, for your fixes! I applied the last patch and tried to test rv64imafdc_medany on QEMU, but dl01 fails at run time, triggering an exception while loading the object. I'll debug that issue.

Last edited on Sep 15, 2020 at 7:17:42 AM by Hesham Almatary