= GSOC 2013 - Paravirtualization of RTEMS =

The goal is to run RTEMS virtualized on POK, inside a software partition.

The [https://docs.google.com/document/d/10SWiyYg6WEeMdAeysyahbS6XK5QwyIMlUvAr4aFOyX4/edit Proposal] will be open to everyone after the application deadline (May 3, 2013).

Up-to-date source code can be found in the branches of my fork of the RTEMS/rtems repo on [https://github.com/phipse/rtems GitHub].

= Partitioned OS Kernel - POK =

This paper explains POK in detail:

J. Delange and Laurent Lec. POK, an ARINC653-compliant operating system released under the BSD license. In 13th Real-Time Linux Workshop. http://julien.gunnm.org/data/publications/article-dl11-osadl11.pdf

= Architecture Analysis and Design Language =

AADL is used in POK to configure and specify the system's architecture. The model must specify the memory size, the time slice, and the communication ports of the partition. If a communication port is not defined in the model, any attempt by the application to access it causes an exception at run time. If a fault occurs, the kernel calls a handler function inside the partition that caused the fault.

As explained in the OSADL11 paper, sections 5.2 and 5.3, there are several keywords for AADL. They fall into two categories:

'''Kernel and partition specification'''
* processor
* virtual processor
* process
* feature
* memory

'''Behavior code'''
* thread
* data
* subprogram

= Services =

* Time Management -> provides time-related functions to partitions
* Fault Handling -> catches errors and calls the handler of the faulting partition
* Inter-partition communication -> explicitly defined during configuration, kernel supervised

= RTEMS =

= Target Architectures =

* x86 (proof of concept)
* SPARC
* ARM

= Paravirtualization layer =

The interface between RTEMS and the hypervisor / host OS is provided by a library. Central to the library is a header file defining all necessary functions, e.g. to connect to an IRQ source.
The host has to implement the functions specified in the header file and compile a library, which is passed to RTEMS. At RTEMS link time the library is included and all remaining undefined references are resolved.

= Function list =

The listed functions describe the interface provided by the host system. The CPU-BSP separation only serves to give a good overview.

== CPU functions (i386) ==

* requestIRQ
* detachIRQ
* enableInterrupts
* disableInterrupts
* flashInterrupts
* getInterruptLevel
* idleThread
* tbc

== BSP functions (i386) ==

* '''faultHandler''' -> is called by POK when a fault occurs
* console_init
* console_write
* console_read
* clock_init
* clock_read
* timer_init
* timer_set
* timer_read
* tbc

= RTEMS startup as a guest =

In L4RTEMS the wrapper task loads the RTEMS ELF binary, stores the initial IP and SP in the vCPU registers and starts the vCPU. For this purpose the binary is passed as a command line argument.

We need to keep in mind that this issue should NOT lead to hacks in the virtualization layer of RTEMS. It is a hypervisor/host issue and needs to be resolved in that scope, and that scope only.

How can we start RTEMS on POK? The closest abstraction to the L4 vCPU I have found in POK is the software partition itself. However, even if we can make it work, we need to change the partition's SP, IP and EAX registers while running on it. This looks like a hack and not like a careful design.

But there are other options. One I can think of is:
* compile the library in POK.
* provide the lib to RTEMS and compile the application.
* pass the binary back to POK and combine kernel and binary.
This approach was partly used in the GSoC 2012 project.

Another approach would be:
* compile RTEMS and do a first linking run
* compile POK and pass the partly linked RTEMS file along
* the POK starter function does the environment setup and then calls the RTEMS start() (0x10000c)

Both approaches might need to consider this (thanks to DrJoel):
"If it is "ld -r", then it may still need a real linking with linkcmds to be at a known address range to insert into the Pok link."

== Compile POK partitions ==

POK expects ELF binaries to be included in the final linking stage. If we can provide RTEMS with enough information (read: include files) to build a valid partition binary, we might be able to set the entry point into the RTEMS binary and get POK to execute it, as it would start any other partition.

As far as I can see, the ELF file compiled in the partX directories is taken and merged with the kernel binary. At run time the kernel loads the partition_size table and loads the ELF binaries into memory. I haven't come across any checks whether the binary is a POK one.

I replaced part1/part1.elf with the RTEMS hello.exe in the ''generated-code/cpu/Makefile'' and introduced a new Makefile target that just invokes $(TARGET):

{{{
export ARCH=x86
export BSP=x86-qemu
TARGET=$(shell pwd)/pok.elf
# Original POK partition, replaced by the RTEMS binary below
#PARTITIONS= part1/part1.elf
PARTITIONS= part1/hello.exe
KERNEL=kernel/kernel.lo
all: build-kernel partitions $(TARGET)

# Only relink the final image, without rebuilding kernel and partitions
last: $(TARGET)
}}}

Invoking ''make last'' produces the expected result: the size.c file contains the size of hello.exe and nm partitions.bin shows the RTEMS symbols. However, POK fails at startup. I haven't figured out where exactly yet, but it looks like an issue while loading the binary. The pok_boot() function runs through and the execution crashes with SIGQUIT at the first asm instruction in pok_dispatch_space.

== Build process ==

The build process will be as follows:
* Design the POK system via an AADL model.
* Keep the size of the '''final''' binary, including RTEMS, in mind.
* Build the POK container for the RTEMS code --> library
* Take the library and pass it to RTEMS at compile time.
* Use last year's pok_rtems_combine script to add the final binary as a partition.

This is a clean approach on both sides. POK is configured with the AADL model and the partition binary implements the POK side of the communication interface. As POK starts partitions by loading the ELF binary and jumping to the entry_ip specified in the ELF header, RTEMS should start fine. On the RTEMS side the virtualization layer functions can be used without issues, as the function implementations are passed in via the library.

= Virtual CPU Issue =

== libcpu/score split ==

=== Structure ===

The CPU-dependent code is split into virtualization-sensitive and insensitive parts. The insensitive parts go into ''cpukit/score/cpu/${arch}/'', the sensitive parts go into ''c/src/lib/libcpu/${arch}/${arch}virt/''. The CPU is selected through the BSP, hence additional virtual BSPs of the form ''${bsp_name}virt'' are introduced. Therefore no changes to the configuration scripts are necessary besides the additional BSP names. The target names stay the same. In the end there is one virtual CPU model and one BSP per virtualized architecture.

=== Configuration ===

The only change to the RTEMS configuration scripts will be additional names for the ''--enable-rtemsbsp='' option.

=== Questionable parts ===

All files are below /cpukit/score/cpu/i386/ or c/src/lib/libcpu/i386/. The table lists the name, the file it is in, and the instruction(s):

{| class="wikitable" border="1"
|-
! Name
! File
! Instruction
! Description
|-
| _CPU_ISR_Set_level
| rtems/score/cpu.h
| cli, sti
|
|-
| _CPU_Fatal_halt
| rtems/score/cpu.h
| hlt
|
|-
| _CPU_Thread_Idle_body
| cpu.c
| hlt
|
|-
| CPU_EFLAGS_INTERRUPTS_ON/_OFF
| rtems/score/cpu.h
|
|
|-
| interrupt.h
| rtems/score/interrupt.h
|
| Critical.
|-
| rdtsc
| libcpu: cpuModel.h
|
| No direct access possible.
|}

== Collective directory ''virt'' ==

=== Structure ===

To prevent cluttering the BSP and CPU directories with additional virtual CPU models, a collective directory is added:
* ''c/src/lib/libbsp/virt//''
* ''cpukit/score/cpu/virt/''
The behaviour inside these directories is the same as without virtualization. The names for CPU and BSP stay the same. The code necessary for the virtualization is shared among the BSPs and CPUs and goes into:
* ''c/src/lib/libbsp/virt/shared''
* ''cpukit/score/cpu/virt/shared''
The Makefiles have to cover these directories.

=== Configuration ===

To configure RTEMS for virtual execution of the binary, a new flag is introduced:
* ''--enable-virt'': tells autoconf to assume a different directory structure.
The other configuration parameters, which are deduced from ''--target'' and ''--enable-rtemsbsp'', are not touched.

== Introduce new target ==

I used this approach to bring RTEMS to L4Re and will explain it using that implementation. The architecture in use is x86, and I used the i386 CPU and BSP directories as a starting point. [https://github.com/phipse/L4RTEMS L4RTEMS source code]

=== Structure ===

A new target called ''l4vcpu'' was introduced and the corresponding directories were added:
* c/src/lib/libbsp/l4vcpu/
* cpukit/score/cpu/l4vcpu/
These directories are copies of the i386 directories; only code that produced visible faults was changed. To provide a point where data can be shared, a so-called ''sharedVariableStruct'' was defined, which holds e.g. a pointer to the vcpu structure and a pointer to the l4re_env (L4Re environment). It is passed to RTEMS at startup in a register, like the multiboot information, and is saved before anything else is executed. The BSP startup was boiled down, as hardware initialization isn't necessary. Some privileged instructions are also skipped. It's still a work in progress.
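
The hand-over of the ''sharedVariableStruct'' described above could look roughly like the following sketch. The field names, the two helper functions and the validity check are assumptions made for illustration; the actual layout in the L4RTEMS sources may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: the text only states that the struct holds a
 * pointer to the vcpu structure and a pointer to the l4re_env. */
typedef struct {
  void *vcpu;      /* vCPU state structure owned by the wrapper task */
  void *l4re_env;  /* L4Re environment */
} sharedVariableStruct;

/* Copy of the pointer that arrived in a register at startup. */
static sharedVariableStruct *shared_vars = NULL;

/* Assumed to run before any other startup code, with the register value
 * as argument (comparable to saving the multiboot information pointer).
 * Returns 0 on success and -1 if nothing usable was passed. */
int save_shared_variables(sharedVariableStruct *from_register)
{
  if (from_register == NULL || from_register->vcpu == NULL) {
    return -1;
  }
  shared_vars = from_register;
  return 0;
}

/* Later accessor; calling it before the pointer was saved is a bug. */
void *shared_vcpu(void)
{
  assert(shared_vars != NULL);
  return shared_vars->vcpu;
}
```

Saving the pointer first matters because the register that carries it is likely to be overwritten by the very next piece of startup code.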

=== Configuration ===

Some configuration files were also adapted; see the doc file in the source code. To configure RTEMS, ''l4vcpu-rtems4.11'' must be used as the target and ''pc386'' as the BSP.

=== Compilation & Start up ===

RTEMS compiles and links without errors. The resulting ELF binary, e.g. hello.exe, is passed on to L4Re as a command line argument. It is loaded into the application's address space and the vcpu is supplied with EIP and ESP.

= ARINC 653 =

The ARINC 653 standard defines "a software specification for space and time partitioning in Safety-critical avionics Real-time operating systems" (https://en.wikipedia.org/wiki/ARINC_653). These specifications are enforced by an additional layer called APEX (APplication EXecutive). As POK is ARINC compliant and RTEMS is not, a paravirtualized RTEMS on top of POK would be a way to achieve compliance. To make use of this compliance, RTEMS needs to be able to communicate with other partitions on POK by using ''inter-partition communication''.

= GSoC 2012 Project =

Source code: [https://github.com/jolkaczad/rtems_hyper by Wiktor Langowski]

The project used syscalls to access POK resources from within RTEMS. To get the code together, the RTEMS binary is compiled, which fails. The generated .ralf file is then added to POK by rewriting the partition.bin file and by fixing the size section in the POK binary. The code uses a hack: by naming a function ''bsp_start'' in both POK and RTEMS, the function is somehow executed twice; at least it looks like it from the output. From my point of view that's not an approach destined to be reused.

= RTEMS_ENABLE_HYPERVISOR =

All CPU-dependent code is wrapped with this guard. If it is false, the normal score code is used. If it is true, no CPU code of score is passed to the preprocessor; in this case the POK BSP provides the functionality. At the moment, the CPU code in the BSP is partly real i386 code, partly just stubs copied from no_cpu.
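
A minimal sketch of how such a guard could select between the native code and a BSP-provided replacement is shown below. Only the guard name comes from the project. The macro EXAMPLE_CPU_IDLE, the functions bsp_idle_once and example_idle_body and the counter are hypothetical, both variants are shown in one file for brevity, and the loop is bounded for demonstration, whereas a real idle body runs forever.

```c
#include <assert.h>

/* Assumption: the build system defines the guard; it is set by hand
 * here so that the sketch builds in the virtualized configuration. */
#ifndef RTEMS_ENABLE_HYPERVISOR
#define RTEMS_ENABLE_HYPERVISOR 1
#endif

#if RTEMS_ENABLE_HYPERVISOR
/* Stand-in for code the BSP would provide: no privileged instruction,
 * just a counter that records each idle step. */
static unsigned idle_yields = 0;
static void bsp_idle_once(void) { ++idle_yields; }
#define EXAMPLE_CPU_IDLE() bsp_idle_once()
#else
/* Native i386 variant: halt until the next interrupt arrives. */
#define EXAMPLE_CPU_IDLE() __asm__ volatile ("hlt")
#endif

/* Bounded idle loop for demonstration purposes only. */
void example_idle_body(unsigned rounds)
{
  assert(rounds <= 1000u);  /* keep the demo loop short */
  while (rounds-- > 0) {
    EXAMPLE_CPU_IDLE();
  }
}
```

With the guard set to 1 the privileged ''hlt'' never reaches the compiler, which is the property the guard is meant to ensure.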

= Interrupt handling =

The whole interrupt control chain of RTEMS was abandoned and a separate registration system was established. It has its own array to register the handlers and the POK communication code.

= References =

= Misc =

''Under construction''

'''Virtual CPU state/model'''

In a virtual environment the CPU is shared among several virtual machines. There are privileged instructions (e.g. CLI/STI) which would allow a VM to prevent the hypervisor from switching execution to another VM. Besides that, there are instructions that alter the CPU state without the hypervisor or another VM noticing. To prevent these disturbances it is common practice to provide each VM with its own virtual CPU implemented in software. The VM can then, for example, only disable interrupts on its virtual CPU. This state change is persistent, but only in the virtual CPU model; it isn't written out to the hardware CPU. This ensures the separation of all VMs in the system. Additionally, the hypervisor can inspect the virtual CPU state and alter it in case of errors.

'''How is it in POK?'''

I haven't found a satisfying alternative for the virtual CPU yet.

'''Paravirtualization'''

The method of running the guest system on the host system while modifying the guest source code to access host system functionality directly.

'''Paravirtualization layer'''

A set of function headers providing a defined interface to RTEMS as well as to the host system. RTEMS calls these functions instead of accessing the hardware. The host needs to provide enough source code to the guest to implement these functions, even if they only issue a call to a host function. Hopefully, the compiler optimizes away this additional function call. Alternatively, the host provides a library to be included at link time to resolve all missing references.
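
To make the idea of the layer and the virtual CPU more concrete, here is a small sketch in C. The function names enableInterrupts, disableInterrupts and getInterruptLevel are taken from the function list on this page; their signatures, the virtual_cpu struct and guest_critical_section are assumptions for illustration and not the actual interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Interface part: would live in the header shared by guest and host.
 * The signatures are assumed. */
void disableInterrupts(void);
void enableInterrupts(void);
int  getInterruptLevel(void);

/* Host part: would live in the library provided by the host. The state
 * is kept in a software model and never written to the hardware CPU. */
typedef struct {
  bool interrupts_enabled;
} virtual_cpu;

static virtual_cpu vcpu = { true };

void disableInterrupts(void) { vcpu.interrupts_enabled = false; }
void enableInterrupts(void)  { vcpu.interrupts_enabled = true; }
int  getInterruptLevel(void) { return vcpu.interrupts_enabled ? 0 : 1; }

/* Guest part: code that would otherwise execute cli/sti directly.
 * Returns the interrupt level observed inside the critical section. */
int guest_critical_section(int *counter)
{
  int level_inside;

  assert(counter != NULL);
  disableInterrupts();
  level_inside = getInterruptLevel();
  ++*counter;
  enableInterrupts();
  return level_inside;
}
```

The guest only ever changes the flag of its own virtual CPU, so the host stays free to schedule other partitions regardless of what the guest does.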