
Version 58 (modified by GlennHumphrey, on 09/18/09 at 01:16:25)

Added data to show code coverage progress

RTEMS Coverage Analysis

RTEMS is used in many critical systems. It is important that the RTEMS Project ensure that the RTEMS product is tested as thoroughly as possible. With this goal in mind, we have set out to expand the RTEMS test suite so that 100% of the RTEMS executive is tested. There are numerous industry- and country-specific safety standards, including FAA DO-178B for flight software in the United States. There are similar aviation standards in other countries, as well as standards in domains such as medical devices, trains, and military applications. As a free software project, the RTEMS Project will never have a complete set of certification paperwork available for download. But we would like to ensure that RTEMS meets the technical requirements that are shared across these safety- and quality-oriented standards.

We encourage members of the community to help out. If you are in a domain where a safety or certification standard applies, work with us to understand that standard and guide us toward providing a polished RTEMS product that helps meet its criteria. Provide funding to augment tests, test procedures or documentation that would aid you in using RTEMS in your domain. Once an artifact is merged into the project, it becomes a community asset that will be easier to maintain. In addition, the increased level of testing helps ensure that submissions to RTEMS do not negatively impact you.

Be active and help us meet your application domain requirements while improving the product for all!

Applying Coverage Analysis to RTEMS

In order to achieve the 100% tested goal, it is important to define what constitutes 100% tested. A lot of information exists about how to completely test a software application. In general, the term Code Coverage is used to refer to the analysis that is performed to determine what portions of the software are tested by the test suite and what portions are not tested. It should be noted that Code Coverage does not prove correctness, only that all code has been exercised by the tests. For some background information on Code Coverage Analysis, see Coverage Analysis Theory.

Traditionally, Code Coverage Analysis has been performed by instrumenting the source code or object code or by using special hardware to monitor the instructions executed. The guidelines for the RTEMS code coverage effort were to use existing tools and to avoid altering the code to be analyzed. This was accomplished by using a processor simulator that provides coverage analysis information. The information was processed to determine which instructions are executed. We called this Object Code Coverage and we defined 100% tested to be 100% Object Code Coverage.

In addition to defining the method for determining 100% tested, it is also important to define what is actually being tested. We accomplished this by defining a set of Coverage Profiles that allowed us to specify the feature set, configuration options and compiler options used when performing the analysis. This was important for two reasons. First, it allowed us to simplify the problem space (uncovered code) so that the goal was attainable. Second, we wanted to recognize that not all RTEMS users configure RTEMS in the same manner and we wanted 100% tested to be applicable to as many user configurations as possible. The first profile that we defined encompassed the RTEMS executive and was called the POSIX Enabled Profile. The RTEMS executive is a large body of code that is generally defined to contain the score, sapi, rtems, and posix directories in the cpukit directory. This represents the full tasking and synchronization feature set. More details about Coverage Profiles are discussed below. Initially, we set out to achieve 100% Object Code Coverage of the POSIX Enabled Profile.

How it was Done


The RTEMS Code Coverage Analysis process is designed to be as automated as possible. The coverage testing is performed using a processor simulator in conjunction with a set of RTEMS specific support scripts. The code to be analyzed is linked together as a single relocatable with special start (start_coverage) and end (end_coverage) symbols. The relocatable is then linked to the same address in every test from the test suite. Each test is then executed on a processor simulator that gathers information about which instructions were executed and produces a coverage map for the test. After all tests have finished, the support script covmerge is used to merge all coverage maps into a unified coverage map for the entire test suite and to produce reports that identify the uncovered code. The picture shown provides the general flow of the process.
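The merge step can be illustrated with a minimal sketch. It assumes each coverage map is reduced to one boolean per instruction in the analyzed region; because the relocatable is linked to the same address in every test, the maps line up entry by entry and merging is an element-wise OR. The function name and data layout are hypothetical and are not taken from covmerge.

```python
from typing import Iterable, List


def merge_coverage_maps(maps: Iterable[List[bool]]) -> List[bool]:
    """OR together per-test coverage maps that span the same address range."""
    merged: List[bool] = []
    for index, test_map in enumerate(maps):
        if index == 0:
            merged = list(test_map)
            continue
        if len(test_map) != len(merged):
            raise ValueError("coverage maps must span the same address range")
        # An instruction is covered if any test executed it.
        merged = [a or b for a, b in zip(merged, test_map)]
    return merged


# Three hypothetical tests, each executing part of a five-instruction region.
test_maps = [
    [True, True, False, False, False],
    [False, True, True, False, False],
    [True, False, False, False, True],
]
unified = merge_coverage_maps(test_maps)
```

In this example the fourth instruction is the only one that no test executed, so it is the only entry left uncovered in the unified map.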

One issue that had to be addressed was the different coverage map formats. Each source of a coverage map (e.g. simulator, hardware debugger, etc.) may produce a coverage map in a different format. The covmerge tool is implemented using C++ classes, and support for a new coverage map format is added by deriving new Coverage Reader and Writer classes. This allows different formats to be converted to the internal representation used by covmerge. The covmerge tool currently supports the formats produced by the TSIM and SkyEye simulators.
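The reader idea can be sketched as follows. This is only an illustration in Python of the class structure described above; covmerge itself is written in C++, and the class names and both file formats shown here are hypothetical. They are not the TSIM or SkyEye formats.

```python
from abc import ABC, abstractmethod
from typing import List


class CoverageReader(ABC):
    """Converts one external coverage map format to the internal form."""

    @abstractmethod
    def read(self, data: bytes) -> List[bool]:
        """Return one boolean per instruction (True means executed)."""


class TextMapReader(CoverageReader):
    """Hypothetical format: one ASCII '1' or '0' per instruction."""

    def read(self, data: bytes) -> List[bool]:
        return [ch == "1" for ch in data.decode("ascii").strip()]


class PackedMapReader(CoverageReader):
    """Hypothetical format: one bit per instruction, least significant bit first."""

    def __init__(self, instruction_count: int) -> None:
        self.instruction_count = instruction_count

    def read(self, data: bytes) -> List[bool]:
        bits = [bool((byte >> bit) & 1) for byte in data for bit in range(8)]
        return bits[: self.instruction_count]


# Both readers produce the same internal representation for the same coverage.
from_text = TextMapReader().read(b"11010\n")
from_bits = PackedMapReader(5).read(bytes([0b01011]))
```

Once every format is converted to the same internal representation, the rest of the tool does not need to know where a map came from.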

The output produced by covmerge is actually a set of simple ASCII files that give a developer the necessary information to quickly determine the current status of the Code Coverage and enough information to determine the location of the uncovered code. The following set of files is produced by covmerge.

{| border="1" style="margin: 1em auto 1em auto;text-align: left;"
! File Name !! Purpose of File
|-
| configuration.txt || Details the settings for the coverage run
|-
| summary.txt || Provides a summary of the results of the coverage run
|-
| sizes.txt || Provides a list identifying the file name and source line number of each uncovered range along with its size in bytes
|-
| report.txt || Provides the details of each uncovered range
|-
| Explanations.txt.NotFound || Contains the Explanations that were not found for this coverage run (see RTEMS Code Coverage How To for more information about how and why to use Explanations)
|-
| annotated.dmp || Provides the disassembled listing of hello.exe with indications of the object code that was not executed
|-
| hello.num || The symbol table of hello.exe
|}

You may wonder why the annotated disassembly (annotated.dmp) and symbol table (hello.num) are from hello.exe. Because the set of object code to analyze is the same in all tests and is linked to the same address range, the disassembly and symbol table for the analyzable portion of all executables are the same.

What was Discovered

When we began the RTEMS Code Coverage effort, we performed coverage analysis on the development head of RTEMS 4.8 using the POSIX Enabled Profile. Some of our initial observations were interesting. First, we were a little surprised at the incompleteness of the test suite. We knew that there were some areas of the RTEMS code that were not tested at all, but we also found that areas we thought were tested were only partially tested. We also observed some interesting things about the code we were analyzing. We noticed that the use of inlining sometimes caused significant branch explosion. This generated a lot of uncovered ranges that really mapped back to the same source code. We also found that some defensive coding habits and coding style idioms could generate unreachable object code. In addition, using a case statement that includes all values of an enumerated type instead of an if statement sometimes led to unreachable code.

Other observations were related to the performance of the covmerge tool. Of particular interest was the handling of NOP instructions. Compilers can use NOP instructions to force alignment between functions or to fill delay slots required by the processor. Of course, the alignment NOPs are never executed and thus had a negative impact on the coverage. The first attempt at dealing with NOP instructions was to mark them all as EXECUTED. This was correct for the NOPs used for function alignment, but not for NOPs used for delay slots. Marking delay-slot NOPs as EXECUTED produced an unwanted side effect of occasionally splitting an uncovered range into two ranges. We finally settled on an improved method for dealing with NOPs where NOPs were marked as EXECUTED unless they were between two NOT EXECUTED instructions. An example is shown below:

2003ee8:  80 a6 20 00 	cmp  %i0, 0                            <== NOT EXECUTED
2003eec:  02 80 00 06 	be  2003f04 <IMFS_fifo_write+0x60>     <== NOT EXECUTED
2003ef0:  01 00 00 00 	nop                                    <== NOT EXECUTED
2003ef4:  40 00 78 fb 	call  20222e0 <__errno>                <== NOT EXECUTED
2003ef8:  b0 20 00 18 	neg  %i0                               <== NOT EXECUTED

The NOP issue was important because it falsely increased the number of uncovered ranges. This created an unnecessary explosion of the reports and increased the uncovered ranges to examine.
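The final NOP rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the covmerge implementation: instructions are modelled as (mnemonic, executed) pairs, and a run of consecutive NOPs is judged by its nearest non-NOP neighbours on each side, which is our reading of "between two NOT EXECUTED instructions". All names are hypothetical.

```python
from typing import List, Optional, Tuple

Instruction = Tuple[str, bool]  # (mnemonic, executed flag from the simulator)


def nearest_non_nop(instructions: List[Instruction], start: int, step: int) -> Optional[bool]:
    """Executed flag of the closest non-NOP instruction in one direction, or None."""
    i = start + step
    while 0 <= i < len(instructions):
        mnemonic, executed = instructions[i]
        if mnemonic != "nop":
            return executed
        i += step
    return None


def mark_nops(instructions: List[Instruction]) -> List[bool]:
    """Mark unexecuted NOPs as executed unless both neighbours are unexecuted."""
    result = [executed for _, executed in instructions]
    for i, (mnemonic, executed) in enumerate(instructions):
        if mnemonic == "nop" and not executed:
            before = nearest_non_nop(instructions, i, -1)
            after = nearest_non_nop(instructions, i, +1)
            inside_uncovered_range = before is False and after is False
            result[i] = not inside_uncovered_range
    return result


# The delay-slot NOP from the listing above stays NOT EXECUTED.
delay_slot = [("cmp", False), ("be", False), ("nop", False), ("call", False), ("neg", False)]
# A hypothetical alignment NOP after an executed function is marked EXECUTED.
alignment = [("ret", True), ("restore", True), ("nop", False), ("save", False)]
```

With this rule the delay-slot case remains a single uncovered range of five instructions, while the alignment NOP no longer extends the uncovered range that starts at the following function.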

Resolving Uncovered Code

The output files produced by covmerge are intended to provide both a quick look at the status of a coverage run and the details needed to resolve the uncovered ranges. As we worked through the resolution of the uncovered ranges, we noticed that they usually fit into one of the following categories:

  • A new test case is needed.
  • Code unreachable in the selected RTEMS configuration. For example, the SuperCore could have a feature only exercised by a POSIX API object. It should be disabled when POSIX is not configured.
  • Debug or sanity checking code which should be placed inside an RTEMS_DEBUG conditional.
  • Unreachable paths generated for switch statements. If the switch is based upon an enumerated type and the switch includes cases for all values, then it must be possible to actually generate all values at this point in the code. You can restructure the switch to only include possible values and thus avoid unreachable object code. This is sometimes best done by rewriting the switch into a series of if/else statements.
  • Critical sections which are synchronizing actions with ISRs. Most of these are very hard to hit and may require very specific support from a simulator environment. OAR has used tsim to exercise these paths but this is not reproducible in a BSP independent manner. Worse, there is often no external way to know that the case in question has been hit and no way to do it in a one shot test. The spintrcriticalXX and psxintrcriticalXX tests attempt to reproduce these cases.

In general, it is interesting to note that the resolution of uncovered code does not simply translate into additions to the test suite. Often the resolution points to improvements or changes to the analyzed code. This can lead to more intelligent factoring of the code or a code re-design that produces a simpler solution. There is also the notion that just because the analyzed code is "good" the way it is does not mean that it should not be rewritten to improve its testability. Code that is completely tested is always better.

Measuring Progress

As mentioned above, the covmerge tool produces reports that contain several metrics that can be used to measure progress. The first is the number of uncovered object code ranges. The second is the amount of untested object code expressed as a percentage of the total object code size under analysis. Together the metrics provide useful information about the status or progress of the Object Code Coverage.
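Both metrics can be derived directly from a unified coverage map. The following is a minimal sketch, assuming one boolean per instruction and fixed four-byte instructions as on SPARC; the function names are hypothetical and not part of covmerge.

```python
from typing import List, Tuple


def uncovered_ranges(coverage: List[bool], bytes_per_entry: int = 4) -> List[Tuple[int, int]]:
    """Return (byte offset, size in bytes) for each maximal run of unexecuted entries."""
    ranges: List[Tuple[int, int]] = []
    run_start = None
    for index, executed in enumerate(coverage):
        if not executed and run_start is None:
            run_start = index  # a new uncovered range begins here
        elif executed and run_start is not None:
            ranges.append((run_start * bytes_per_entry, (index - run_start) * bytes_per_entry))
            run_start = None
    if run_start is not None:  # range runs to the end of the region
        ranges.append((run_start * bytes_per_entry, (len(coverage) - run_start) * bytes_per_entry))
    return ranges


def percent_uncovered(coverage: List[bool], bytes_per_entry: int = 4) -> float:
    """Uncovered bytes as a percentage of all bytes under analysis."""
    total = len(coverage) * bytes_per_entry
    uncovered = sum(size for _, size in uncovered_ranges(coverage, bytes_per_entry))
    return 100.0 * uncovered / total if total else 0.0


coverage = [True, False, False, True, True, False, True, True]
ranges = uncovered_ranges(coverage)
```

For this eight-instruction example there are two uncovered ranges totalling 12 of 32 bytes.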

When we started the RTEMS Code Coverage effort, we did not immediately capture results to measure progress. This actually ended up being the correct thing to do since the covmerge tool was in development and often produced results that were not directly comparable. Now that the development of covmerge has largely settled, we can perform coverage runs on several RTEMS release points and see the progress of the coverage effort. The results shown below were of the POSIX Enabled Profile run on the SPARC/ERC32.

{| border="1" style="margin: 1em auto 1em auto;text-align: left;"
! Release !! Covered % !! Uncovered Ranges !! Uncovered Bytes !! Total Bytes
|-
| 4.7 || 77.51 || 454 || 17508 || 77840
|-
| 4.8 || 76.37 || 538 || 21772 || 92140
|-
| 4.9 || 96.41 || 167 || 2532 || 70564
|-
| 4.10 (head 09/09/2009) || 100 || 0 || 0 || 70480
|}

Several interesting facts can be seen from the data in the table. There was no organized effort to perform coverage analysis prior to the 4.8 release. This is evident in that there was no measurable improvement in coverage between 4.7 and 4.8. The unassisted developer is just not going to recognize the need for more test cases in the test suite. The coverage analysis began prior to the 4.9 release. Not surprisingly, the progress was significant between 4.8 and 4.9. At that time we addressed large uncovered ranges by doing simple things like adding test cases to the test suite and disabling code that was not used by the chosen configuration.
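As a quick check, the Covered % column can be recomputed from the Uncovered Bytes and Total Bytes columns of the table. The sketch below uses only the figures from the table; the function name is hypothetical.

```python
# (release, uncovered bytes, total bytes) copied from the table above.
results = [
    ("4.7", 17508, 77840),
    ("4.8", 21772, 92140),
    ("4.9", 2532, 70564),
    ("4.10", 0, 70480),
]


def covered_percent(uncovered_bytes: int, total_bytes: int) -> float:
    """Covered bytes as a percentage of the total, rounded to two decimals."""
    return round(100.0 * (total_bytes - uncovered_bytes) / total_bytes, 2)


recomputed = {release: covered_percent(u, t) for release, u, t in results}
```

The recomputed values match the published column for all four releases.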

Coverage Profiles

RTEMS includes a lot of source code and the coverage analysis should focus on improving the test coverage of well-defined code subsets with a trend over time of increasing both the level of coverage (e.g. object to statement to decision to MC/DC) and the amount of source code covered.

As other support libraries in cpukit are covered, they will be moved from the Developmental Profile and added to the POSIX Enabled and Classic API Only profiles.

There are four code subsets analysed, each with the option of using the -O2 or -Os optimization level.

  • Baseline with POSIX Enabled
  • Baseline with POSIX Disabled
  • Developmental with POSIX Enabled
  • Developmental with POSIX Disabled

Define baseline and developmental

Compilation and Configuration Options

Discuss impact of -O2 versus -Os with example from code.

Inlining _Thread_Dispatch_enable, etc.

POSIX Enabled

This is the first profile we tested. This initially focused on the score, sapi, rtems, and posix directories in the cpukit directory. This profile represents a full tasking and synchronization feature set.

Classic API Only (POSIX Disabled)

In this profile, we disable POSIX and focus on the contents of the score, sapi, and rtems directories in the cpukit directory. The POSIX API and tests are disabled. In this profile, we expect to identify:

  • features in score only exercised by POSIX
  • features in score available via Classic API but only tested via POSIX
  • POSIX features like sleep() which are enabled when POSIX threads are disabled.

The first case will allow us to disable score features in this configuration and reduce the code size.

The second case allows us to approach 100% coverage in every RTEMS configuration.

The third case is similar to the second and indicates the need for tests in this configuration for features that are technically part of the POSIX API support.

Developmental

This is an experimental/developmental coverage configuration and adds almost all of the CPUKit contents that are non-networked. It nearly doubles the size of the code being covered. We are aiming for the entire contents of libcsupport, libmisc, and various filesystems. This is a large body of code and components like Termios and the file systems will require creativity to get automated coverage near 100%.

We have done initial tests on this profile. There is work to be done improving the test coverage. As components are covered 100%, they will be moved from experimental/developmental status to be included in the official coverage run.

We welcome your contributions.

Beyond Object Code Coverage

Statement Coverage

This requires knowing which source files are involved (which we do) and which lines in those files can produce assembly code (which I don't think we do 100%). We can easily know which lines are comments or blank, but anything beyond that will require some thought.

The current object coverage utility covmerge can be modified to generate a report of which source lines were covered. It could generate a bitmap per source file where the bit index indicates if a source line in that file was executed or not. If we can generate a similar bit map from the source code which marks comments and other non-executable source lines as covered, then the union of the two bitmaps can be used to generate a report showing which source lines are not covered or represented in the object code. This may indicate dead code or weaknesses in the tests.
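The bitmap idea can be sketched as follows. This is a minimal illustration, assuming Python integers are used as bitmaps with bit n standing for line n + 1 of the file. The classifier only recognizes blank lines and lines starting with //, which is the easy part noted above; block comments, declarations and other non-executable lines are deliberately not handled. All names and the sample source are hypothetical.

```python
from typing import List


def non_executable_lines(source: str) -> int:
    """Bitmap with bit n set if line n (0-based) is blank or a // comment."""
    bitmap = 0
    for n, line in enumerate(source.splitlines()):
        stripped = line.strip()
        if not stripped or stripped.startswith("//"):
            bitmap |= 1 << n
    return bitmap


def uncovered_lines(executed: int, non_executable: int, line_count: int) -> List[int]:
    """1-based numbers of lines set in neither bitmap."""
    combined = executed | non_executable  # union of the two bitmaps
    return [n + 1 for n in range(line_count) if not (combined >> n) & 1]


source = """\
// hypothetical source file
int f(int x) {

  if (x < 0)
    return -1;
  return x;
}
"""
line_count = len(source.splitlines())
skipped = non_executable_lines(source)
# Hypothetical result from the object coverage run: lines 2, 4, 6 and 7 executed.
executed = (1 << 1) | (1 << 3) | (1 << 5) | (1 << 6)
report = uncovered_lines(executed, skipped, line_count)
```

Here line 5 (the `return -1;` path) is the only line absent from both bitmaps, so it is the one reported as not covered.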

This is definitely an open project at this point.

Condition/Decision Coverage

TBD

MC/DC

From the RTEMS testing perspective, this is to verify that every branch instruction in the generated object has been both taken and not taken. We cannot determine this without help from a simulator or hardware debugger which gathers this information.
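If a simulator or hardware debugger could report each executed branch together with its outcome, the check described above is straightforward. The sketch below assumes a hypothetical trace of (address, taken) records and a list of branch addresses extracted from the disassembly; neither format comes from an existing tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def partially_covered_branches(
    trace: Iterable[Tuple[int, bool]], branch_addresses: Iterable[int]
) -> Dict[int, str]:
    """Report branches that were not observed both taken and not taken."""
    outcomes: Dict[int, Set[bool]] = defaultdict(set)
    for address, taken in trace:
        outcomes[address].add(taken)

    report: Dict[int, str] = {}
    for address in branch_addresses:
        seen = outcomes.get(address, set())
        if seen == {True, False}:
            continue  # fully covered
        if not seen:
            report[address] = "never executed"
        elif seen == {True}:
            report[address] = "always taken"
        else:
            report[address] = "never taken"
    return report


# Hypothetical data: three branches, one of them exercised in both directions.
branches = [0x2003EEC, 0x2003F10, 0x2003F20]
trace = [(0x2003F10, True), (0x2003F10, False), (0x2003F20, True)]
report = partially_covered_branches(trace, branches)
```

Only the branch seen with both outcomes is omitted from the report; the other two still need test cases.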

QEMU -- project to do MC/DC .. update here

BSPs Analyzed

If you know of a simulator that includes coverage analysis, please let us know.

ARM

The SkyEye project has added coverage analysis capabilities per our specifications. We are currently using it on the following ARM targets to generate coverage reports:

Blackfin

Since SkyEye supports this target architecture, we hope to one day get coverage results on the following BSPs:

  • eZKit533

Coldfire

SkyEye supports the Coldfire but is currently unable to run any RTEMS Coldfire BSP. Work to improve SkyEye's Coldfire support is welcomed. We look forward to being able to use it to perform coverage testing on the following BSPs:

  • mcf5206elite

i386

We have identified Qemu as a possible source of coverage information. This project (http://libre.adacore.com/libre/tools/coverage/) aims to add the necessary capabilities to that simulator. The source code for this project is available from http://forge.open-do.org/scm/?group_id=8. Now it is up to you.

We anticipate that someday we will be able to do coverage testing using Qemu on the following BSPs:

  • pc386

SPARC

We are using TSIM from Gaisler Research on the following BSPs:

  • ERC32
  • LEON2
  • LEON3

References

General Coverage Testing

Standards and Certifications

  • FAA DO-178B - United States Aviation Standard
