#2343 assigned defect

[Patch] Fix Zynq SMP boot

Reported by: Jonathan Brandmeyer Owned by: Needs Funding
Priority: high Milestone: Indefinite
Component: arch/arm Version: 4.10
Severity: normal Keywords: zynq, xilinx_zynq_zedboard
Cc: Blocked By:
Blocking:

Description

Booting the second processor core on the Xilinx Zynq requires either support from the bootloader or the RTEMS BSP. It would be nice if the standard U-Boot and Xilinx FSBL could be used without customization by supporting SMP out of the box in RTEMS.

This patch does two things: It communicates with CPU1 to direct it to execute RTEMS, entering at _start, and it allows access to the top-most segment of on-chip memory which contains CPU1's running code by modifying the MMU setup. It should also avoid interference with any current users' custom bootloaders.

See also Xilinx UG 585, section 6.1.10 for details.

Attachments (2)

bsp_smp_startup.patch (1.5 KB) - added by Jonathan Brandmeyer on May 8, 2015 at 10:33:32 PM.
Patch
smp.c (2.8 KB) - added by Chris Johns on May 11, 2015 at 7:34:04 AM.
Example of how a bootloader can start cpu1.

Download all attachments as: .zip

Change History (20)

Changed on May 8, 2015 at 10:33:32 PM by Jonathan Brandmeyer

Attachment: bsp_smp_startup.patch added

Patch

comment:1 Changed on May 8, 2015 at 10:35:06 PM by Jonathan Brandmeyer

Also of note: This has been tested on the Microzed 7010 hardware.

comment:2 Changed on May 9, 2015 at 10:45:31 AM by Sebastian Huber

Thanks for the patch. Can you please split it up into two parts and send them via git send-email to the devel@… list. Please add "Update #2342" to the commit message. Please don't change the coding style of the files and avoid / / comments. I think its better to use normal memory for the on-chip memory. Why do you want to make it a device memory?

comment:3 Changed on May 9, 2015 at 5:02:10 PM by Jonathan Brandmeyer

What it really needs is for the write to be observable by the other CPU. In order to make that happen, it needs to be device memory because of a confluence of issues:

  • RTEMS initializes the MMU and caches before starting up other CPUs.
  • The other CPU does not have its MMU enabled in its waiting-to-startup state.
  • Normal memory is cachable.
  • RTEMS does not support cache flush functions in ARMv7-A, only ARMv5TEJ (see c/src/lib/libcpu/arm/shared/include/cache_.h)

So, marking the memory as 'device' memory seemed like the safest way around those issues.

comment:4 Changed on May 9, 2015 at 5:09:15 PM by Jonathan Brandmeyer

As a compromise, I can craft a patch that

  • enables only the last 4k page as device memory
  • enables the remainder of the OCM devices as normal memory, sensitive to the BSP's definitions provided by ZYNQ_RAM_INT_{0,1}_ORIGIN and ZYNQ_RAM_INT_{0,1}_LENGTH.

comment:5 in reply to:  2 Changed on May 9, 2015 at 5:11:48 PM by Jonathan Brandmeyer

Replying to sebastian.huber:

Can you please split it up into two parts

The two changes are intimately related. Without the change to the memory map, the changes to CPU_SMP_start_processor will cause memory access violations on otherwise previously working boards. IMO, they should be applied as a single unit.

comment:6 in reply to:  4 Changed on May 9, 2015 at 6:53:37 PM by Jonathan Brandmeyer

Replying to jbrandmeyer:

As a compromise, I can craft a patch that

  • enables only the last 4k page as device memory
  • enables the remainder of the OCM devices as normal memory, sensitive to the BSP's definitions provided by ZYNQ_RAM_INT_{0,1}_ORIGIN and ZYNQ_RAM_INT_{0,1}_LENGTH.

Never mind. RTEMS's CP15 startup code only supports 1M pages right now. Using second-level page table entries would be a substantial change.

comment:7 Changed on May 9, 2015 at 8:32:46 PM by Sebastian Huber

Normal memory allows speculative reads, burst writes etc. The cache usage is a separate option.

comment:8 Changed on May 11, 2015 at 4:31:37 AM by Chris Johns

I am not sure about this change.

I do not understand the logic in _CPU_SMP_Start_processor where the CPU index is checked for 1 (cpu1 ?) and if cpu1 has not started it is kicked. Why would you do this for cpu1 or is the index numbered from 1 ( so cpu0 ?) ?

The change on partial handles a full SMP initialisation for example the SMP bit in the CP15 ACTLR register.

comment:9 in reply to:  3 Changed on May 11, 2015 at 4:39:05 AM by Chris Johns

Replying to jbrandmeyer:

What it really needs is for the write to be observable by the other CPU. In order to make that happen, it needs to be device memory because of a confluence of issues:

  • RTEMS initializes the MMU and caches before starting up other CPUs.
  • The other CPU does not have its MMU enabled in its waiting-to-startup state.
  • Normal memory is cachable.
  • RTEMS does not support cache flush functions in ARMv7-A, only ARMv5TEJ (see c/src/lib/libcpu/arm/shared/include/cache_.h)

So, marking the memory as 'device' memory seemed like the safest way around those issues.

From the TRM "Configuring OCM memory as device memory in the MMU or using narrow, non-modifiable accesses through the ACP port is not recommended.". I assume "the memory" being referred to is OCM memory.

Also the changes in the MMU should not be in an area that is overloaded by the MMU's weak symbol if it effects the operation of other code in the BSP.

comment:10 in reply to:  8 ; Changed on May 11, 2015 at 5:40:43 AM by Jonathan Brandmeyer

Replying to chrisj:

I am not sure about this change.

I do not understand the logic in _CPU_SMP_Start_processor where the CPU index is checked for 1 (cpu1 ?) and if cpu1 has not started it is kicked. Why would you do this for cpu1 or is the index numbered from 1 ( so cpu0 ?) ?

The change on partial handles a full SMP initialization for example the SMP bit in the CP15 ACTLR register.

Yes, the CPU's are numbered from zero. The boot sequence is as follows:
The Zynq ROM bootloader loads and executes a single file, itself a fragment of a proprietary-formatted boot image, typically the Xilinx First Stage Bootloader (FSBL).
The FSBL performs low-level hardware and FPGA fabric initialization and also loads an arbitrary ELF file (in this case, an RTEMS application) and starts CPU0 at the ELF's entry point. However, CPU1 sits waiting.
So when RTEMS starts its initialization, it is already running on CPU0. And the only thing that it can possibly do with regards to SMP initialization is to start CPU1.

As this BSP is specific to the Zynq, there can be only one more CPU. The change follows the logic from the Altera Cyclone V BSP, which is a direct competitor to the Zynq with a nearly identical hard processor subsystem.

With regards to the MMU change, the user who overrides the weak symbol is already responsible for correctness of the entire memory map. IMO, announcing the change on the mailing list is easier to manage than splitting the MMU initialization into two stages.

I have confirmed that simply marking the memory with the READ_WRITE permissions instead of as device memory is also sufficient to work around the cache problems.

Yet another alternative to this patch would be to tell users that in order to boot RTEMS in SMP mode, that they need to customize the Xilinx FSBL in a certain way. IMO, that is less satisfactory than doing The Right Thing with the default tools as provided by Xilinx.

comment:11 in reply to:  10 ; Changed on May 11, 2015 at 7:32:08 AM by Chris Johns

Replying to jbrandmeyer:

Replying to chrisj:

I am not sure about this change.

I do not understand the logic in _CPU_SMP_Start_processor where the CPU index is checked for 1 (cpu1 ?) and if cpu1 has not started it is kicked. Why would you do this for cpu1 or is the index numbered from 1 ( so cpu0 ?) ?

The change on partial handles a full SMP initialization for example the SMP bit in the CP15 ACTLR register.

Yes, the CPU's are numbered from zero. The boot sequence is as follows:
The Zynq ROM bootloader loads and executes a single file, itself a fragment of a proprietary-formatted boot image, typically the Xilinx First Stage Bootloader (FSBL).
The FSBL performs low-level hardware and FPGA fabric initialization and also loads an arbitrary ELF file (in this case, an RTEMS application) and starts CPU0 at the ELF's entry point. However, CPU1 sits waiting.
So when RTEMS starts its initialization, it is already running on CPU0. And the only thing that it can possibly do with regards to SMP initialization is to start CPU1.

I understand the sequence and all I am saying is the logic seems wrong. If the cpu is 0, ie ...

 if ( cpu_index == 0 ) {

... then starting cpu1 makes sense. Having cpu1 start itself does not make sense to me. Maybe I am missing something.

FYI my boot loader and uboot (not the I have looked at the Zynq specific parts) do this before entering RTEMS. I had discussed this with Sebastian a while ago.

As this BSP is specific to the Zynq, there can be only one more CPU. The change follows the logic from the Altera Cyclone V BSP, which is a direct competitor to the Zynq with a nearly identical hard processor subsystem.

Sure. I like the idea of RTEMS doing this. I avoids configuration issues in the bootloader ending up with a locked system.

With regards to the MMU change, the user who overrides the weak symbol is already responsible for correctness of the entire memory map.

There is a define with values that defines the base memory map and it is used by the weak symbol handlers.

IMO, announcing the change on the mailing list is easier to manage than splitting the MMU initialization into two stages.

I do not agree. A post to a devel list is not a suitable way to resolve the issue and I would rather something better be found.

I have confirmed that simply marking the memory with the READ_WRITE permissions instead of as device memory is also sufficient to work around the cache problems.

Excellent.

Yet another alternative to this patch would be to tell users that in order to boot RTEMS in SMP mode, that they need to customize the Xilinx FSBL in a certain way. IMO, that is less satisfactory than doing The Right Thing with the default tools as provided by Xilinx.

I use a customised FSBL to so I can boot RTEMS from a JFFS2 file system. I do not depend on Xilinx code or their SDK and I am happy working this way. My hope longer term is have umon be a better licensed replacement to the FSBL.

Changed on May 11, 2015 at 7:34:04 AM by Chris Johns

Attachment: smp.c added

Example of how a bootloader can start cpu1.

comment:12 in reply to:  11 ; Changed on May 11, 2015 at 7:36:35 AM by Sebastian Huber

Replying to chrisj:

Replying to jbrandmeyer:

Replying to chrisj:

I am not sure about this change.

I do not understand the logic in _CPU_SMP_Start_processor where the CPU index is checked for 1 (cpu1 ?) and if cpu1 has not started it is kicked. Why would you do this for cpu1 or is the index numbered from 1 ( so cpu0 ?) ?

The change on partial handles a full SMP initialization for example the SMP bit in the CP15 ACTLR register.

Yes, the CPU's are numbered from zero. The boot sequence is as follows:
The Zynq ROM bootloader loads and executes a single file, itself a fragment of a proprietary-formatted boot image, typically the Xilinx First Stage Bootloader (FSBL).
The FSBL performs low-level hardware and FPGA fabric initialization and also loads an arbitrary ELF file (in this case, an RTEMS application) and starts CPU0 at the ELF's entry point. However, CPU1 sits waiting.
So when RTEMS starts its initialization, it is already running on CPU0. And the only thing that it can possibly do with regards to SMP initialization is to start CPU1.

I understand the sequence and all I am saying is the logic seems wrong. If the cpu is 0, ie ...

 if ( cpu_index == 0 ) {

... then starting cpu1 makes sense. Having cpu1 start itself does not make sense to me. Maybe I am missing something.

From the documentation:

  /**
   * @brief Starts a processor specified by its index.
   *
   * This function is invoked on the boot processor during system
   * initialization.
   *
   * This function will be called after _CPU_SMP_Initialize().
   *
   * @param[in] cpu_index The processor index.
   *
   * @retval true Successful operation.
   * @retval false Unable to start this processor.
   */
  bool _CPU_SMP_Start_processor( uint32_t cpu_index );

comment:13 in reply to:  12 ; Changed on May 11, 2015 at 10:06:40 PM by Chris Johns

Replying to sebastian.huber:

Replying to chrisj:

Replying to jbrandmeyer:

Replying to chrisj:

I am not sure about this change.

I do not understand the logic in _CPU_SMP_Start_processor where the CPU index is checked for 1 (cpu1 ?) and if cpu1 has not started it is kicked. Why would you do this for cpu1 or is the index numbered from 1 ( so cpu0 ?) ?

The change on partial handles a full SMP initialization for example the SMP bit in the CP15 ACTLR register.

Yes, the CPU's are numbered from zero. The boot sequence is as follows:
The Zynq ROM bootloader loads and executes a single file, itself a fragment of a proprietary-formatted boot image, typically the Xilinx First Stage Bootloader (FSBL).
The FSBL performs low-level hardware and FPGA fabric initialization and also loads an arbitrary ELF file (in this case, an RTEMS application) and starts CPU0 at the ELF's entry point. However, CPU1 sits waiting.
So when RTEMS starts its initialization, it is already running on CPU0. And the only thing that it can possibly do with regards to SMP initialization is to start CPU1.

I understand the sequence and all I am saying is the logic seems wrong. If the cpu is 0, ie ...

 if ( cpu_index == 0 ) {

... then starting cpu1 makes sense. Having cpu1 start itself does not make sense to me. Maybe I am missing something.

From the documentation:

  /**
   * @brief Starts a processor specified by its index.
   *
   * This function is invoked on the boot processor during system
   * initialization.
   *
   * This function will be called after _CPU_SMP_Initialize().
   *
   * @param[in] cpu_index The processor index.
   *
   * @retval true Successful operation.
   * @retval false Unable to start this processor.
   */
  bool _CPU_SMP_Start_processor( uint32_t cpu_index );

Ah thanks. I had not checked and assumed this was the cpu starting. This make sense.

I think checking for '_start' means a bootloader that kicks the second CPU will fail as it will set a different value. I think the reset value is 0 so would it be better to check for that and and then set _start ? I am not sure if it will make any difference but it might be cleaner.

comment:14 in reply to:  13 ; Changed on May 12, 2015 at 2:36:57 PM by Jonathan Brandmeyer

Replying to chrisj:

I think checking for '_start' means a bootloader that kicks the second CPU will fail as it will set a different value. I think the reset value is 0 so would it be better to check for that and and then set _start ? I am not sure if it will make any difference but it might be cleaner.

The reset value is not zero. It is an address that points the ROM bootloader back to itself, so that if it gets an event before the address has been updated then it will just resume wfe again. Worse, the exact value at the kick address is not guaranteed (see below).

I also reviewed the FSBL support code you posted above and I have to say that it doesn't make much sense to me. The wfe loop that your code injects is the same as the loop that the ROM booloader is already running. Why would you inject a second one? The code would work just fine if you simply kicked off the second processor directly into RTEMS.

Apparently there is no rigorous way to check if RTEMS has already been kicked off in the presence of a FSBL that can alter anything it pleases. However, at this stage in the boot the OCM is "owned" by RTEMS, and I think that it can safely make the write to the kick-off address unconditionally.

For reference, the ROM bootloader is executing (gdb disassembly report):

0xffffff20: dsb sy
0xffffff24: wfe
0xffffff28: b 0xffffff20
0xffffff2c: dsb sy
0xffffff30: wfe

=> 0xffffff34: mvn r0, #15

0xffffff38: ldr lr, [r0]
0xffffff3c: cmn lr, #212 ; 0xd4
0xffffff40: beq 0xffffff2c
0xffffff44: mcr 15, 0, r0, cr7, cr5, {0}
0xffffff48: mcr 15, 0, r0, cr7, cr5, {6}
0xffffff4c: mcr 15, 0, r0, cr8, cr7, {0}
0xffffff50: mov r4, #0
0xffffff54: mcr 15, 0, r4, cr1, cr0, {0}
0xffffff58: bx lr

0xfffffff0 contains value 0xffffff2c

However, Xilinx does not guarantee that the kick address points exactly to 0xffff ff2c - that is not part of their API. In principle they could change it with a silicon revision or something.

comment:15 in reply to:  14 Changed on May 12, 2015 at 11:04:07 PM by Chris Johns

Replying to jbrandmeyer:

Replying to chrisj:

I think checking for '_start' means a bootloader that kicks the second CPU will fail as it will set a different value. I think the reset value is 0 so would it be better to check for that and and then set _start ? I am not sure if it will make any difference but it might be cleaner.

The reset value is not zero. It is an address that points the ROM bootloader back to itself, so that if it gets an event before the address has been updated then it will just resume wfe again. Worse, the exact value at the kick address is not guaranteed (see below).

What about kicking the core if the address is inside the BootROM address space ?

I also reviewed the FSBL support code you posted above and I have to say that it doesn't make much sense to me. The wfe loop that your code injects is the same as the loop that the ROM booloader is already running. Why would you inject a second one? The code would work just fine if you simply kicked off the second processor directly into RTEMS.

The processor is brought into the bootloader when instructed to and waits. It can support POST testing before running RTEMS.

Apparently there is no rigorous way to check if RTEMS has already been kicked off in the presence of a FSBL that can alter anything it pleases.

I thought RTEMS waited for both cores to be present before finally finishing the initialisation so if RTEMS does not touch any specific resources it should be ok.

However, at this stage in the boot the OCM is "owned" by RTEMS, and I think that it can safely make the write to the kick-off address unconditionally.

The OCM needs to have the first 3 64K blocks at the lower address range and the last one at the upper address range. The FSBL asm code assumes this with the various stacks set to the higher address range. This means you cannot change the OCM settings until RTEMS has moved the stacks, vector table and MMU table to the DDR memory.

Note, bit 4 of the OCM config register is set to 1 by the BootROM and it must remain set. The BootROM sets the OCM cfg register to 0x18.

For reference, the ROM bootloader is executing (gdb disassembly report):

0xffffff20: dsb sy
0xffffff24: wfe
0xffffff28: b 0xffffff20
0xffffff2c: dsb sy
0xffffff30: wfe

=> 0xffffff34: mvn r0, #15

0xffffff38: ldr lr, [r0]
0xffffff3c: cmn lr, #212 ; 0xd4
0xffffff40: beq 0xffffff2c
0xffffff44: mcr 15, 0, r0, cr7, cr5, {0}
0xffffff48: mcr 15, 0, r0, cr7, cr5, {6}
0xffffff4c: mcr 15, 0, r0, cr8, cr7, {0}
0xffffff50: mov r4, #0
0xffffff54: mcr 15, 0, r4, cr1, cr0, {0}
0xffffff58: bx lr

0xfffffff0 contains value 0xffffff2c

However, Xilinx does not guarantee that the kick address points exactly to 0xffff ff2c - that is not part of their API. In principle they could change it with a silicon revision or something.

Yes and I suppose we would have to manage that when the time comes using version numbers.

comment:16 Changed on Jan 26, 2017 at 7:16:00 AM by Sebastian Huber

Milestone: 4.11.14.11.2

comment:17 Changed on Feb 15, 2017 at 1:37:51 PM by Sebastian Huber

Milestone: 4.11.2Indefinite
Owner: set to Needs Funding
Status: newassigned

comment:18 Changed on Oct 10, 2017 at 6:22:17 AM by Sebastian Huber

Component: SMParch/arm
Note: See TracTickets for help on using tickets.