
Version 28 (modified by Phipse, on May 4, 2013 at 4:15:48 PM)


GSOC 2013 - Paravirtualization of RTEMS

The goal is to run RTEMS as a virtualized guest on POK, inside a software partition.

The proposal will be open to everyone after the application deadline (May 3, 2013).

Partitioned OS Kernel - POK

This paper explains POK in detail. <ref>J. Delange and Laurent Lec. POK, an ARINC653-compliant operating system released under the BSD license. In: 13th Real-Time Linux Workshop. http://julien.gunnm.org/data/publications/article-dl11-osadl11.pdf</ref>

Architecture Analysis and Design Language

AADL is used in POK to configure and specify the system's architecture. The model must specify the memory size, the time slice, and the communication ports of each partition. If the application tries to access a communication port that is not defined in the model, an exception is raised at run time. If a fault occurs, the kernel calls a handler function inside the partition that caused the fault.

As explained in the OSADL11 paper, sections 5.2 and 5.3, there are several keywords for AADL. They fall into two categories:

Kernel and partition specification

  • processor
  • virtual processor
  • process
    • feature
  • memory

Behavior code

  • thread
  • data
  • subprogram

Services

  • Time Management -> provides time related functions to partitions
  • Fault Handling -> catches errors and calls the handler of the faulting partition
  • Inter-partition communication -> explicitly defined during configuration, kernel supervised

RTEMS

Target Architectures

  • x86 (proof of concept)
  • Sparc
  • ARM

Paravirtualization layer

The interface between RTEMS and the hypervisor / host OS is provided by a library. Central to the library is a header file declaring all necessary functions, e.g. to connect to an IRQ source. The host has to implement the functions specified in the header file and compile them into a library, which is passed to RTEMS. At RTEMS link time the library is included and all remaining undefined references are resolved.

Function list

  • requestIRQ
  • detachIRQ
  • enableInterrupts
  • disableInterrupts
  • flashInterrupts
  • getInterruptLevel
  • faultHandler -> is called by POK when a fault occurs
  • tbc
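
The split between header and host library can be illustrated with a minimal sketch. It reuses the names from the function list above, but every signature, the IRQ table size, the level encoding (0 = enabled, 1 = disabled) and the helpers hostRaiseIRQ and count_irq are assumptions made for illustration. flashInterrupts and faultHandler are left out. The "host part" is a plain in-memory stand-in, not POK code.

```c
/* Sketch only: every signature below is an assumption, not the project's API. */
#include <stddef.h>
#include <stdint.h>

#define VIRT_MAX_IRQ 16u   /* assumed number of virtual IRQ lines */

typedef void (*virt_irq_handler)(void *arg);

/* Interface part: what the shared header might declare. */
int      requestIRQ(uint32_t irq, virt_irq_handler handler, void *arg);
int      detachIRQ(uint32_t irq);
void     enableInterrupts(void);
void     disableInterrupts(void);
uint32_t getInterruptLevel(void);

/* Host part: a stand-in for the library the host would compile. */
static struct {
    virt_irq_handler handler;
    void            *arg;
} irq_table[VIRT_MAX_IRQ];

/* Assumed encoding: 0 = interrupts enabled, 1 = disabled. Guest starts disabled. */
static uint32_t interrupt_level = 1;

int requestIRQ(uint32_t irq, virt_irq_handler handler, void *arg)
{
    if (irq >= VIRT_MAX_IRQ || handler == NULL || irq_table[irq].handler != NULL)
        return -1;                 /* bad line, missing handler, or line taken */
    irq_table[irq].handler = handler;
    irq_table[irq].arg     = arg;
    return 0;
}

int detachIRQ(uint32_t irq)
{
    if (irq >= VIRT_MAX_IRQ || irq_table[irq].handler == NULL)
        return -1;                 /* nothing attached to this line */
    irq_table[irq].handler = NULL;
    irq_table[irq].arg     = NULL;
    return 0;
}

void     enableInterrupts(void)  { interrupt_level = 0; }
void     disableInterrupts(void) { interrupt_level = 1; }
uint32_t getInterruptLevel(void) { return interrupt_level; }

/* Hypothetical host-side helper: forward an IRQ to the guest if it is unmasked. */
int hostRaiseIRQ(uint32_t irq)
{
    if (irq >= VIRT_MAX_IRQ || irq_table[irq].handler == NULL || interrupt_level != 0)
        return -1;
    irq_table[irq].handler(irq_table[irq].arg);
    return 0;
}

/* Example guest handler: counts how often it was called. */
static void count_irq(void *arg)
{
    ++*(unsigned *)arg;
}
```

On the guest side, RTEMS would only see the declarations. The definitions would come from whatever library the host supplies at link time.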

RTEMS startup as a guest

In L4RTEMS the wrapper task loads the RTEMS ELF binary, stores the initial IP and SP in the vCPU registers and starts the vCPU. To make this possible, the binary is passed to the wrapper task as a command line argument.

We need to keep in mind that the startup issue should NOT lead to hacks in the virtualization layer of RTEMS. It is a hypervisor/host issue and needs to be resolved in that scope only.

How can we start RTEMS on POK? The closest abstraction to the L4 vCPU I have found in POK is the software partition itself. However, even if this can be made to work, we would need to change the partition's SP, IP and EAX registers while running on it. This looks like a hack and not like a careful design.

But there are other options. One I can think of is:

  • compile the library in POK.
  • provide the library to RTEMS and compile the application.
  • pass the binary back to POK and combine kernel and binary.

This approach was partly used in the GSoC 2012 project.

Another approach would be:

  • compile RTEMS and do a first linking run
  • compile POK and pass the partly linked RTEMS file along
    • The POK starter function would do the env setup and then call the RTEMS start() (0x10000c)

Both approaches might need to consider this:

Thanks to DrJoel: "If it is "ld -r", then it may still need a real linking with linkcmds to be at a known address range to insert into the Pok link."
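
The starter function mentioned in the second approach could look roughly like the sketch below. All names (pok_start_guest, setup_guest_env, fake_rtems_start) are hypothetical, and the environment setup is reduced to a flag. On the real target the entry would be the RTEMS start() symbol from the partially linked file, or a pointer built from the known address (0x10000c above).

```c
/* Sketch only: names and the "environment" are placeholders, not POK code. */
typedef void (*guest_entry_t)(void);

static int env_ready;      /* stands in for the real partition environment */
static int guest_started;  /* records what the fake guest observed */

/* Hypothetical environment setup that POK would do before entering RTEMS. */
static void setup_guest_env(void)
{
    env_ready = 1;
}

/* Stand-in for the RTEMS start() entry point, so the sketch can run on a host.
 * The real start() would not be expected to return. */
static void fake_rtems_start(void)
{
    guest_started = env_ready;  /* 1 only if the environment was set up first */
}

/* POK-side starter: prepare the environment, then jump to the guest entry.
 * With a fixed load address the caller might instead pass
 * (guest_entry_t)(uintptr_t)0x10000c, which is only meaningful on the target. */
void pok_start_guest(guest_entry_t entry)
{
    setup_guest_env();
    entry();
}
```

Passing the entry as a parameter keeps the address out of the starter itself, so the same function works whether the entry comes from a resolved symbol or from a fixed address.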

Compile POK partition in RTEMS

POK expects ELF binaries to be included in the final linking stage. If we can provide RTEMS with enough information (read: include files) to build a valid partition binary, we might be able to set the entry point into the RTEMS binary and get POK to execute it as it would start any other partition. As far as I can see, the ELF file compiled in the partX directories is taken and merged with the kernel binary. At run time the kernel loads the partition_size table and loads the ELF binaries into memory. I haven't come across any checks of whether the binary is a POK one.

I replaced the part1/part1.elf with the RTEMS hello.exe in the generated-code/cpu/Makefile and introduced a new Makefile target just invoking $(TARGET):

 export ARCH=x86
 export BSP=x86-qemu

 TARGET=$(shell pwd)/pok.elf
 #PARTITIONS= part1/part1.elf
 PARTITIONS= part1/hello.exe

 KERNEL=kernel/kernel.lo
 all: build-kernel partitions $(TARGET)

 last: $(TARGET)


Invoking ''make last'' produces the expected result: the size.c file contains the size of hello.exe, and nm partitions.bin shows the RTEMS symbols.

However, POK fails at startup. I have not yet figured out exactly where, but it looks like an issue while loading the binary.
The pok_boot() function runs to completion, and execution then crashes with SIGQUIT at the first asm instruction in pok_dispatch_space.
=  Virtual CPU Issue  =

==  libcpu/score split  ==

===  Structure  ===

The CPU-dependent code is split into virtualization-sensitive and virtualization-insensitive parts.
The insensitive parts go into ''cpukit/score/cpu/${arch}/'', the sensitive parts into ''c/src/lib/libcpu/${arch}/${arch}virt/''.

The CPU is selected through the BSP, hence additional virtual BSPs of the form ''${bsp_name}virt'' are introduced.

Therefore no changes to the configuration scripts besides the additional BSP names are necessary.
The target names stay the same.

In the end there is one virtual CPU model and one BSP per virtualized architecture.
===  Configuration  ===

The only change to the RTEMS configuration scripts will be additional names for the ''--enable-rtemsbsp='' option.
===  Questionable parts  ===

All files are below /cpukit/score/cpu/i386/.
The table lists each name, the file it is in, and the privileged instruction(s) involved:
{| class="wikitable" border="1"
|-
! Name
! File
! Instruction
! Description
|-
| _CPU_ISR_Set_level
| rtems/score/cpu.h
| cli, sti
|
|-
| _CPU_Fatal_halt
| rtems/score/cpu.h
| hlt
|
|-
| _CPU_Thread_Idle_body
| cpu.c
| hlt
|
|-
| CPU_EFLAGS_INTERRUPTS_ON/_OFF 
| rtems/score/cpu.h 
| 
| 
|-
| interrupt.h
| rtems/score/interrupt.h
| 
| Critical.
|}
==  Collective directory ''virt''  ==

===  Structure  ===

To prevent cluttering the BSP and CPU directories with additional virtual CPU models, a collective directory is added.

 *  ''c/src/lib/libbsp/virt/<arch>/<bsp_name>''
 *  ''cpukit/score/cpu/virt/<arch>''

The behaviour inside these directories is the same as without virtualization.
The names for CPU and BSP stay the same.

The code necessary for the virtualization is shared among the BSPs and CPUs and goes into:
 *  ''c/src/lib/libbsp/virt/shared''
 *  ''cpukit/score/cpu/virt/shared''

The Makefiles have to cover these directories.
===  Configuration  ===

To configure RTEMS for virtual execution of the binary, a new flag is introduced. 

 *  ''--enable-virt:''

It tells autoconf to assume a different directory structure.
The other configuration parameters, which are deduced from ''--target'' and ''--enable-rtemsbsp'', are not touched.

==  Introduce new target  ==


I used this approach to bring RTEMS to L4Re.
I will explain it with the aid of this implementation.
The architecture in use is x86 and I used the i386 CPU and BSP directory as a starting point.

[https://github.com/phipse/L4RTEMS L4RTEMS source code]
===  Structure  ===


A new target called ''l4vcpu'' was introduced and the corresponding directories:
 *  c/src/lib/libbsp/l4vcpu/
 *  cpukit/score/cpu/l4vcpu/
were added.

These directories are copies of the i386 directories; only code that produced visible faults was changed.
To provide a point where data can be shared, a so-called ''sharedVariableStruct'' was defined, which holds e.g. a pointer to the vcpu structure and a pointer to the l4re_env (L4Re environment).
This is passed to RTEMS at startup in a register, e.g. like the multiboot information, and is saved before anything else is executed.
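
As a rough illustration of this hand-over, the sketch below shows a struct with the two pointers named above and a function that saves the pointer received at startup. The field names, the opaque placeholder types, and the functions virt_save_shared and virt_get_shared are assumptions. The actual layout is in the L4RTEMS source.

```c
/* Sketch only: field names and helper functions are assumptions. */
#include <stddef.h>

/* Opaque placeholders for the L4Re types; the real definitions live in L4Re. */
struct vcpu_state_placeholder;
struct l4re_env_placeholder;

typedef struct {
    struct vcpu_state_placeholder *vcpu;      /* pointer to the vcpu structure */
    struct l4re_env_placeholder   *l4re_env;  /* pointer to the L4Re environment */
} sharedVariableStruct;

/* Saved copy of the pointer that arrived in a register at startup. */
static sharedVariableStruct *shared_vars;

/* Would be called first thing in the startup code, before the register
 * holding the pointer can be overwritten. */
void virt_save_shared(sharedVariableStruct *from_register)
{
    shared_vars = from_register;
}

/* Accessor for the rest of the BSP. */
sharedVariableStruct *virt_get_shared(void)
{
    return shared_vars;
}
```

Keeping the pointer behind an accessor means later code does not depend on which register carried it at startup.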

The BSP startup was boiled down, as hardware initialization isn't necessary.
Also some privileged instructions are skipped. 
It's still work in progress.
===  Configuration  ===

Also some configuration files were adapted, see the doc file in the source code. 

To configure RTEMS, ''l4vcpu-rtems4.11'' must be used as the target and ''pc386'' as the BSP.
===  Compilation & Start up  ===


RTEMS compiles and links without errors.
The resulting ELF binary, e.g. hello.exe, is passed on to L4Re as a command line argument.
It is loaded into the application's address space and the vcpu is supplied with EIP and ESP.
=  ARINC 653  =


The ARINC 653 standard defines "a software specification for space and time partitioning in Safety-critical avionics Real-time operating systems".<ref>https://en.wikipedia.org/wiki/ARINC_653</ref> 
These specifications are enforced by an additional layer called APEX (APplication EXecutive).

As POK is ARINC compliant and RTEMS is not, a paravirtualized RTEMS on top of POK would be a way to achieve compliance.
To make use of this compliance, RTEMS needs to be able to communicate with other partitions on POK by using ''inter-partition communication''.


=  GSoC 2012 Project  =

Source code: [https://github.com/jolkaczad/rtems_hyper by Wiktor Langowski ]

The project used syscalls to access POK resources out of RTEMS.
To get the code together, the RTEMS binary is compiled, which fails.
The generated .ralf file is then added to POK by rewriting the partition.bin file and by fixing the size section in the POK binary.

The code uses a hack:
Because a function named ''bsp_start'' exists both in POK and in RTEMS, the function is somehow executed twice; at least that is what the output suggests.
From my point of view, this is not an approach that should be reused.

=  RTEMS_ENABLE_HYPERVISOR  =


All CPU-dependent code is wrapped with this guard.
In case it's false, the normal score code is used.
In case it's true, no CPU code of score is passed to the preprocessor.
In this case the POK BSP will provide the functionality.
At the moment, the CPU code in the BSP is partly real i386 code, partly just copied stubs from no_cpu.
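
A minimal sketch of how such a guard might wrap one operation follows. It assumes the guard is a 0/1 macro set by the build system (the text does not say whether it is tested with #if or #ifdef), and the names bsp_virt_isr_level and bsp_virt_isr_set_level are hypothetical. The native branch mirrors the cli/sti behaviour listed for _CPU_ISR_Set_level in the table of questionable parts.

```c
/* Sketch only: guard semantics and bsp_virt_* names are assumptions. */
#include <stdint.h>

#ifndef RTEMS_ENABLE_HYPERVISOR
#define RTEMS_ENABLE_HYPERVISOR 1   /* normally chosen by the build system */
#endif

#if RTEMS_ENABLE_HYPERVISOR

/* Guest build: the BSP provides the operation, here as a software flag. */
static uint32_t bsp_virt_isr_level;

static inline void bsp_virt_isr_set_level(uint32_t level)
{
    bsp_virt_isr_level = level ? 1u : 0u;   /* no privileged instruction used */
}

#define _CPU_ISR_Set_level(level) bsp_virt_isr_set_level(level)

#else

/* Native build: the normal score code with privileged instructions. */
#define _CPU_ISR_Set_level(level) \
    do { if (level) __asm__ volatile ("cli"); else __asm__ volatile ("sti"); } while (0)

#endif
```

Callers use _CPU_ISR_Set_level in both builds, so only the guarded definition changes between native and guest configurations.
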
=  References  =

<references/>

=  Misc  =

''Under construction''

''' Virtual CPU state/model '''

In a virtual environment the CPU is shared among several virtual machines.
There are privileged instructions (e.g. CLI/STI) that would allow a VM to prevent the hypervisor from switching execution to another VM.
Besides that, there are instructions that alter the CPU state without the hypervisor or other VMs noticing.
To prevent these disturbances, it is common practice to provide each VM with its own virtual CPU implemented in software.
A VM can then, for example, disable interrupts only on its virtual CPU.
This state change is persistent, but only on the virtual CPU model and isn't written out to the hardware CPU. 
This ensures the separation of all VMs in the system. 
Additionally, the hypervisor can inspect the virtual CPU state and alter it in case of errors.  
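
The idea can be made concrete with a small sketch of a software vCPU that keeps its own interrupt flag and defers injected interrupts while the flag is cleared. The type vcpu_t, its fields, and the functions below are illustrative assumptions and correspond to neither the L4 vCPU interface nor anything in POK. The delivered field exists only so the effect can be observed.

```c
/* Sketch only: a toy virtual CPU model, not the L4 vCPU or POK interface. */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     irq_enabled;  /* virtual interrupt flag, the guest's view of IF */
    uint32_t pending;      /* bit n set: virtual IRQ n waits for delivery */
    uint32_t delivered;    /* bit n set: IRQ n was handed to the guest */
} vcpu_t;

/* Hand all waiting interrupts to the guest. */
static void vcpu_deliver_pending(vcpu_t *v)
{
    v->delivered |= v->pending;
    v->pending = 0;
}

/* Guest replacement for CLI: only the virtual flag changes. */
void vcpu_cli(vcpu_t *v)
{
    v->irq_enabled = false;
}

/* Guest replacement for STI: re-enable and catch up on deferred interrupts. */
void vcpu_sti(vcpu_t *v)
{
    v->irq_enabled = true;
    vcpu_deliver_pending(v);
}

/* Hypervisor side: record an interrupt, deliver it now or defer it. */
void vcpu_inject_irq(vcpu_t *v, unsigned irq)
{
    if (irq >= 32u)
        return;                        /* outside the modelled range */
    v->pending |= UINT32_C(1) << irq;
    if (v->irq_enabled)
        vcpu_deliver_pending(v);
}
```

Because the hardware interrupt flag is never touched, the hypervisor can still preempt the guest while the guest believes interrupts are disabled.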

'''How is it in POK?'''

I haven't found a pleasing alternative for the virtual CPU yet. 

'''Paravirtualization'''

Paravirtualization is the method of running a guest system on a host system while modifying the guest source code to access host system functionality directly.

'''Paravirtualization layer'''

A set of function headers providing a defined interface to RTEMS as well as to the host system.
RTEMS will call these functions instead of accessing the hardware.
The host needs to provide enough source code to the guest to implement these functions, even if they only emit a call to a host function.
Hopefully, the compiler optimizes away this additional function call.

Alternatively, the host provides a library to be included at link-time, to resolve all missing references.