= RTEMS Coverage Analysis =

[[TOC(TBR/UserManual/RTEMS_Coverage_Analysis, depth=2)]]

RTEMS is used in many critical systems. It is important that the RTEMS Project ensure that the RTEMS product is tested as thoroughly as possible. With this goal in mind, we have set out to expand the RTEMS test suite so that 100% of the RTEMS executive is tested.

There are numerous industry and country specific standards for safety, including [http://en.wikipedia.org/wiki/DO-178B FAA DO-178B] for flight software in the United States. There are similar aviation standards in other countries as well as in domains such as medical devices, trains, and military applications. As a free software project, the RTEMS Project will never have a complete set of certification paperwork available for download. But we would like to ensure that RTEMS meets the technical requirements that are shared across these safety and quality oriented standards.

We encourage members of the community to help out. If you are in a domain where a safety or certification standard applies, work with us to understand that standard and guide us toward providing a polished RTEMS product that helps meet its criteria. Provide funding to augment tests, test procedures, or documentation that would aid you in using RTEMS in your domain. Once an artifact is merged into the project, it becomes a community asset that will be easier to maintain. In addition, the increased level of testing helps ensure that submissions to RTEMS do not negatively impact you. Be active and help us meet your application domain requirements while improving the product for all!

= Applying Coverage Analysis to RTEMS =

In order to achieve the 100% tested goal, it is important to define what constitutes 100% tested. A lot of information exists about how to completely test a software application.
In general, the term [http://en.wikipedia.org/wiki/Code_coverage Code Coverage] is used to refer to the analysis that is performed to determine what portions of the software are tested by the test suite and what portions are not tested. It should be noted that Code Coverage does not prove correctness, only that all code has been exercised by a test. For some background information on Code Coverage Analysis, see [wiki:Developer/Coverage/CoverageAnalysisTheory Coverage Analysis Theory].

Traditionally, Code Coverage Analysis has been performed by instrumenting the source code or object code, or by using special hardware to monitor the instructions executed. The guidelines for the RTEMS code coverage effort were to use existing tools and to avoid altering the code to be analyzed. This was accomplished by using a processor simulator that provides coverage analysis information. The information was processed to determine which instructions were executed. We called this Object Code Coverage and we defined 100% tested to be 100% Object Code Coverage.

In addition to defining the method for determining 100% tested, it is also important to define what is actually being tested. We accomplished this by defining a set of [wiki:RTEMS_Coverage_Analysis#Coverage_Profiles Coverage Profiles] that allowed us to specify the feature set, configuration options and compiler options used when performing the analysis. This was important for two reasons. First, it allowed us to simplify the problem space (uncovered code) so that the goal was attainable. Second, we wanted to recognize that not all RTEMS users configure RTEMS in the same manner, and we wanted 100% tested to be applicable to as many user configurations as possible.

The first profile that we defined encompassed the RTEMS executive and was called the POSIX Enabled Profile. The RTEMS executive is a large body of code that is generally defined to contain the score, sapi, rtems, and posix directories in the cpukit directory.
This represents the full tasking and synchronization feature set. More details about [wiki:RTEMS_Coverage_Analysis#Coverage_Profiles Coverage Profiles] are discussed below. Initially, we set out to achieve 100% Object Code Coverage of the POSIX Enabled Profile.

= How it was Done =

[[Image(CoverageFlow.png)]]

The RTEMS Code Coverage Analysis process is designed to be as automated as possible. The coverage testing is performed using a processor simulator in conjunction with a set of RTEMS specific support scripts. The code to be analyzed is linked together as a single relocatable with special start (''start_coverage'') and end (''end_coverage'') symbols. The relocatable is then linked to the same address in every test from the test suite. Each test is then executed on a processor simulator that gathers information about which instructions were executed and produces a coverage map for the test. After all tests have finished, the support script ''covmerge'' is used to merge all coverage maps into a unified coverage map for the entire test suite and to produce reports that identify the uncovered code. The picture shown provides the general flow of the process.

One issue that had to be addressed was the different coverage map formats. Each source of a coverage map (e.g. simulator, hardware debugger, etc.) may produce a coverage map in a different format. The ''covmerge'' tool is implemented using C++ classes and allows new Coverage Reader and Writer classes to be derived for specific coverage map formats. This allows different formats to be converted to the internal representation used by ''covmerge''. The ''covmerge'' script currently supports the formats produced by the TSIM and SkyEye simulators.
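
The merge step and the per-format reader classes described above can be sketched as follows. This is a minimal Python illustration, not the C++ implementation of ''covmerge'': the class names (`CoverageReader`, `HexListReader`, `BitStringReader`), both input formats, and the fixed 4-byte instruction size are assumptions made up for the example and do not correspond to the real TSIM or SkyEye formats.

```python
from abc import ABC, abstractmethod


class CoverageReader(ABC):
    """Hypothetical base class: converts one map format into a set of
    executed instruction addresses (the internal representation here)."""

    @abstractmethod
    def read(self, text):
        """Return the set of executed addresses described by text."""


class HexListReader(CoverageReader):
    """Made-up format: one executed address per line, in hexadecimal."""

    def read(self, text):
        return {int(token, 16) for token in text.split()}


class BitStringReader(CoverageReader):
    """Made-up format: one '0'/'1' flag per instruction, starting at base."""

    def __init__(self, base, step=4):
        self.base = base
        self.step = step

    def read(self, text):
        return {
            self.base + index * self.step
            for index, flag in enumerate(text.strip())
            if flag == "1"
        }


def merge_maps(maps):
    """Union the per-test maps: an instruction is covered if any test ran it."""
    unified = set()
    for executed in maps:
        unified |= executed
    return unified


def uncovered_ranges(start, end, executed, step=4):
    """Return (first, last) address pairs for each run of consecutive
    instructions in [start, end) that no test executed."""
    ranges = []
    run = None
    for address in range(start, end, step):
        if address in executed:
            if run is not None:
                ranges.append(tuple(run))
                run = None
        elif run is None:
            run = [address, address]
        else:
            run[1] = address
    if run is not None:
        ranges.append(tuple(run))
    return ranges
```

Because every test links the analyzed code at the same address, the maps can be merged with a plain set union; supporting another map format only requires another reader subclass.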
The output produced by ''covmerge'' is a set of simple ASCII files that give a developer the information needed to quickly determine the current status of the Code Coverage and to locate the uncovered code. The following set of files is produced by ''covmerge''.

{| border="1" style="margin: 1em auto 1em auto;text-align: left;"
|+
|-
|'''File Name''' || '''Purpose of File'''
|-
| configuration.txt || Details the settings for the coverage run
|-
| summary.txt || Provides a summary of the results of the coverage run
|-
| sizes.txt || Provides a list identifying the file name and source line number of each uncovered range along with its size in bytes
|-
| report.txt || Provides the details of each uncovered range
|-
| Explanations.txt.NotFound || Contains the Explanations that were not found for this coverage run (see [wiki:Developer/Coverage/HowTo RTEMS Code Coverage How To] for more information about how and why to use Explanations)
|-
| annotated.dmp || Provides the disassembled listing of hello.exe with indications of the object code that was not executed
|-
| hello.num || The symbol table of hello.exe
|}

You may wonder why the annotated disassembly (''annotated.dmp'') and symbol table (''hello.num'') are from hello.exe. Because the set of object code to analyze is the same in all tests and linked to the same address range, the disassembly and symbol table for the analyzable portion of all executables is the same.

= What was Discovered =

When we began the RTEMS Code Coverage effort, we performed coverage analysis on the development head of RTEMS 4.8 using the POSIX Enabled Profile. Some of our initial observations were interesting. First, we were a little surprised at the incompleteness of the test suite. We knew that there were some areas of the RTEMS code that were not tested at all, but we also found that areas we thought were tested were only partially tested.
We also observed some interesting things about the code we were analyzing. We noticed that the use of inlining sometimes caused significant branch explosion. This generated a lot of uncovered ranges that really mapped back to the same source code. We also found that some defensive coding habits and coding style idioms could generate unreachable object code. Also, the use of a case statement that includes all values of an enumerated type instead of an if statement sometimes led to unreachable code.

Other observations were related to the performance of the ''covmerge'' tool. Of particular interest was the handling of NOP instructions. Compilers can use NOP instructions to force alignment between functions or to fill delay-slots required by the processor. The NOP instructions used for alignment are never executed and thus had a negative impact on the coverage. The first attempt at dealing with NOP instructions was to mark them all as EXECUTED. This was correct for the NOPs used for function alignment, but not for NOPs used for delay-slots. Marking delay-slot NOPs as EXECUTED produced an unwanted side effect of occasionally splitting an uncovered range into two ranges. We finally settled on an improved method for dealing with NOPs where NOPs were marked as EXECUTED unless they were between two NOT EXECUTED instructions. An example is shown below:

{{{
2003ee8:   80 a6 20 00   cmp  %i0, 0                 <== NOT EXECUTED
2003eec:   02 80 00 06   be  2003f04                 <== NOT EXECUTED
2003ef0:   01 00 00 00   nop                         <== NOT EXECUTED
2003ef4:   40 00 78 fb   call  20222e0 <__errno>     <== NOT EXECUTED
2003ef8:   b0 20 00 18   neg  %i0                    <== NOT EXECUTED
}}}

This solution to the NOP problem was important because NOPs were falsely increasing the number of uncovered ranges. This created an unnecessary explosion of the reports and increased the number of uncovered ranges to examine.
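
The final NOP rule can be sketched in a few lines. This is a hedged Python illustration rather than the actual ''covmerge'' code: the function names and the `(mnemonic, executed)` tuple layout are hypothetical, and the sketch assumes that "between two NOT EXECUTED instructions" refers to the nearest non-NOP instruction on each side, so that a run of several padding NOPs is treated as a unit.

```python
def nearest_non_nop(instructions, index, step):
    """Return the closest non-NOP instruction before (step=-1) or after
    (step=+1) the given index, or None if there is none."""
    j = index + step
    while 0 <= j < len(instructions):
        if instructions[j][0] != "nop":
            return instructions[j]
        j += step
    return None


def mark_nops(instructions):
    """instructions: list of (mnemonic, executed) tuples in address order.

    Returns the adjusted executed flags: an unexecuted NOP is reported as
    executed unless it lies between two unexecuted non-NOP instructions.
    """
    result = []
    for i, (mnemonic, executed) in enumerate(instructions):
        if mnemonic == "nop" and not executed:
            before = nearest_non_nop(instructions, i, -1)
            after = nearest_non_nop(instructions, i, +1)
            inside_uncovered = (
                before is not None
                and after is not None
                and not before[1]
                and not after[1]
            )
            executed = not inside_uncovered
        result.append(executed)
    return result
```

Applied to the disassembly shown above, the delay-slot NOP stays NOT EXECUTED because both neighbours are unexecuted, so the uncovered range is reported as one range instead of two. Alignment NOPs that follow an executed instruction are marked as executed and no longer appear in the reports.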
= Resolving Uncovered Code =

The output files produced by ''covmerge'' are intended to provide both a quick look at the status of a coverage run and the details needed to resolve the uncovered ranges. As we worked through the resolution of the uncovered ranges, we noticed that they usually fit into one of the following categories:

 * A new test case is needed.
 * Code unreachable in the selected RTEMS configuration. For example, the SuperCore could have a feature only exercised by a POSIX API object. It should be disabled when POSIX is not configured.
 * Debug or sanity checking code which should be placed inside an RTEMS_DEBUG conditional.
 * Unreachable paths generated for switch statements. If the switch is based upon an enumerated type and the switch includes cases for all values, then it must be possible to actually generate all values at this point in the code. You can restructure the switch to only include possible values and thus avoid unreachable object code. This is sometimes best done by rewriting the switch into a series of if/else statements.
 * Critical sections which are synchronizing actions with ISRs. Most of these are very hard to hit and may require very specific support from a simulator environment. OAR has used TSIM to exercise these paths but this is not reproducible in a BSP independent manner. Worse, there is often no external way to know that the case in question has been hit, and no way to hit it in a one-shot test. The spintrcriticalXX and psxintrcriticalXX tests attempt to reproduce these cases.

In general, it is interesting to note that the resolution of uncovered code does not simply translate into additions to the test suite. Often the resolution points to improvements or changes to the analyzed code. This can lead to more intelligent factoring of the code or a code re-design that produces a simpler solution.
There is also the notion that just because the analyzed code is "good" the way it is does not mean that it should not be rewritten to improve its testability. Code that is completely tested is '''always''' better.

= Measuring Progress =

As mentioned above, the ''covmerge'' script produces reports that contain several metrics that can be used to measure progress. The first is the number of uncovered object code ranges. The second is the amount of untested object code as a percentage of the total object code size under analysis. Together these metrics provide useful information about the status or progress of the Object Code Coverage.

When we started the RTEMS Code Coverage effort, we did not immediately capture results to measure progress. This actually ended up being the correct thing to do since the ''covmerge'' tool was in development and often produced results that were not directly comparable. Now that the development of ''covmerge'' has largely settled, we can perform coverage runs on several RTEMS release points and see the progress of the coverage effort. The results shown below are from the POSIX Enabled Profile run on the SPARC/ERC32.

{| border="1" style="margin: 1em auto 1em auto;text-align: left;"
|+
|-
|'''Release''' || '''Covered %''' || '''Uncovered Ranges''' || '''Uncovered Bytes''' || '''Total Bytes'''
|-
| 4.7 || 77.51 || 454 || 17508 || 77840
|-
| 4.8 || 76.37 || 538 || 21772 || 92140
|-
| 4.9 || 96.41 || 167 || 2532 || 70564
|-
| 4.10 (head 09/09/2009) || 100 || 0 || 0 || 70480
|}

Several interesting facts can be seen from the data in the table. There was no organized effort to perform coverage analysis prior to the 4.8 release. This is evident in that there was no measurable improvement in coverage between 4.7 and 4.8. The unassisted developer is just not going to recognize the need for more test cases in the test suite. The coverage analysis began prior to the 4.9 release. Not surprisingly, the progress was significant between 4.8 and 4.9.
At that time we addressed large uncovered ranges by doing simple things like adding test cases to the test suite and disabling code that was not used by the chosen configuration. The last 3.5% of uncovered code was much harder to address, but the development head has now achieved 100% coverage.

Now that we have achieved 100% Code Coverage using the POSIX Enabled Profile, we would like to keep it 100% covered. We have set up a periodic run of the coverage analysis against the development head. The results are captured (http://www.rtems.org/ftp/pub/rtems/people/joel/coverage/) and can be monitored to ensure that future modifications to the analyzed code base do not produce uncovered code.

= Coverage Profiles =

RTEMS contains a lot of source code and although the primary focus of coverage analysis is to achieve 100% coverage of well-defined code subsets, we would also like to increase the amount of source code analyzed. In order to manage the increase in a systematic manner, we defined two basic groups of source code. The first group is called Baseline and the second group is called Developmental. The Baseline group contains the source code that has achieved (or nearly achieved) 100% Object Code Coverage. The Developmental group contains the source code for which there are very few test cases and therefore very poor coverage.

Initially, the Baseline group included source code from the cpukit. Specifically, the following cpukit directories were included: score, sapi, rtems and posix. This group represents a full tasking and synchronization feature set. What was not in the Baseline group was placed in the Developmental group. The Developmental group included: libcsupport, libfs/imfs, libmisc/stackchk, libmisc/cpuuse, libmisc/bspcmdline, libmisc/dmpbuf and libmisc/devnull.

From the two groups, we recognized the need to analyze each group with POSIX enabled and POSIX disabled. This produced four sub-groups that we called profiles.
The four profiles are:

 * Baseline (POSIX Enabled)
 * Baseline (POSIX Disabled)
 * Developmental (POSIX Enabled)
 * Developmental (POSIX Disabled)

As other support libraries in cpukit are covered, they will be moved from the Developmental Profile and added to the '''POSIX Enabled''' and '''Classic API Only''' profiles.

Each of the four code subsets can be analysed at either the -O2 or the -Os optimization level.

= Compilation and Configuration Options =

Discuss impact of -O2 versus -Os with example from code. Inlining _Thread_Dispatch_enable, etc.

= Classic API Only (POSIX Disabled) =

In this profile, we disable POSIX and focus on the contents of the score, sapi, and rtems directories in the cpukit directory. The POSIX API and tests are disabled. In this profile, we expect to identify:

 * features in score only exercised by POSIX
 * features in score available via the Classic API but only tested via POSIX
 * POSIX features like sleep() which are enabled even when POSIX threads are disabled.

The first case will allow us to disable score features in this configuration and reduce the code size. The second case allows us to approach 100% coverage in every RTEMS configuration. The third case is similar to the second and indicates the need for tests in this configuration for features that are technically part of the POSIX API support.

= Developmental =

This is an experimental/developmental coverage configuration and adds almost all of the CPUKit contents that are non-networked. It nearly doubles the size of the code being covered. We are aiming for the entire contents of libcsupport, libmisc, and various filesystems. This is a large body of code, and components like Termios and the file systems will require creativity to get automated coverage near 100%.

We have done initial tests on this profile. There is work to be done improving the test coverage. As components are covered 100%, they will be moved from experimental/developmental status to be included in the official coverage run.
We welcome your contributions.

= Beyond Object Code Coverage =

= Statement Coverage =

This requires knowing which source files are involved (which we do) and which lines in those files can produce assembly code (which we do not yet fully know). We can easily know which lines are comments and blank, but beyond that will require some thought.

The current object coverage utility ''covmerge'' can be modified to generate a report of which source lines were covered. It could generate a bitmap per source file where each bit indicates whether the corresponding source line in that file was executed. If we can generate a similar bitmap from the source code which marks comments and other non-executable source lines as covered, then the union of the two bitmaps can be used to generate a report: any line left unmarked in the union is either not covered or not represented in the object code. This may indicate dead code or weaknesses in the tests.

This is definitely an open project at this point.

= Condition/Decision Coverage =

QEMU -- project to do MC/DC .. update here TBD

= MC/DC =

From the RTEMS testing perspective, this is to verify that every branch instruction in the generated object code has been both taken and not taken. We cannot determine this without help from a simulator or hardware debugger which gathers this information.

QEMU -- project to do MC/DC .. update here

= BSPs Analysed =

If you know of a simulator that includes coverage analysis, please let us know.

= Currently Analysed =

Results may be found at http://www.rtems.org/ftp/pub/rtems/people/joel/coverage/

== ARM ==

The [wiki:Developer/Simulators/SkyEye SkyEye] project has added coverage analysis capabilities per our specifications. We are currently using it on the following ARM targets to generate coverage reports:

 * EDB7312
 * GumStix
 * SMDK2410

== i386 ==

We have identified Qemu as a source for the coverage information. This project (http://libre.adacore.com/libre/tools/coverage/) aims to add the necessary capabilities to that simulator.
The source code for this project is available from http://forge.open-do.org/scm/?group_id=8. The following BSP is included in the coverage reports.

 * pc386

== SPARC ==

We are using TSIM from Gaisler Research on the following BSPs:

 * ERC32
 * LEON2
 * LEON3

= Not Currently Analysed =

== Blackfin ==

Since [wiki:Developer/Simulators/SkyEye SkyEye] supports this target architecture, we hope to one day get coverage results on the following BSPs:

 * eZKit553

== Coldfire ==

[wiki:Developer/Simulators/SkyEye SkyEye] supports the Coldfire but is currently unable to run any RTEMS Coldfire BSP. Work to improve SkyEye's Coldfire support is welcomed. We look forward to being able to use it to perform coverage testing on the following BSPs.

 * mcf5206elite

In addition, Qemu has support for the Coldfire and we can run the uC5282 BSP on it. Unfortunately, the Qemu ColdFire support does not yet include tracing.

== PowerPC ==

Qemu includes support for various PowerPC boards. We have not yet matched a BSP with one of these that works. The likely solution is to do a minimal BSP based upon the powerpc-elf support being used by the Qemu developers. This should result in a minimal but functional BSP and be easy to cobble together.

= References =

=== General Coverage Testing ===

 * [http://en.wikipedia.org/wiki/Code_coverage Code Coverage Definition]
 * [http://en.wikipedia.org/wiki/Modified_Condition/Decision_Coverage Modified Condition/Decision Coverage Definition]
 * [http://googletesting.blogspot.com/2008/03/tott-understanding-your-coverage-data.html TotT: Understanding Your Coverage Data]

=== Standards and Certifications ===

 * FAA DO-178B - United States Aviation Standard