wiki:GCI/Documentation/CoverageAnalysis/Coverage

Version 15 (modified by shashvat jain, on Dec 8, 2018 at 8:21:10 PM) (diff)

--

What was Discovered

When we began the RTEMS Code Coverage effort, we performed coverage analysis on the development head of RTEMS 4.8 using the Baseline/-Os/POSIX Enabled Profile. Some of our initial observations were interesting. First, we were a little surprised at the incompleteness of the test suite. We knew that there were some areas of the RTEMS code that were not tested at all, but we also found that areas we thought were tested were only partially tested.

We also observed some interesting things about the code we were analyzing. We noticed that the use of inlining sometimes caused significant branch explosion. This generated a lot of uncovered ranges that really mapped back to the same source code. We also found that some defensive coding habits and coding style idioms could generate unreachable object code. Also, the use of a case statement that includes all values of an enumerated type instead of an if statement sometimes led to unreachable code.

Other observations were related to the behavior of the covmerge tool. Of particular interest was the handling of NOP instructions. Compilers can use NOP instructions to force alignment between functions or to fill delay slots required by the processor. Of course, the NOP instructions are not executed and thus had a negative impact on the coverage results. The first attempt at dealing with NOP instructions was to mark them all as EXECUTED. This was correct for the NOPs used for function alignment, but not for NOPs used for delay slots. Marking delay-slot NOPs as EXECUTED produced an unwanted side effect of occasionally splitting an uncovered range into two ranges. We finally settled on an improved method where NOPs are marked as EXECUTED unless they sit between two NOT EXECUTED instructions. An example is shown below:

2003ee8:  80 a6 20 00 	cmp  %i0, 0                            <== NOT EXECUTED
2003eec:  02 80 00 06 	be  2003f04 <IMFS_fifo_write+0x60>     <== NOT EXECUTED
2003ef0:  01 00 00 00 	nop                                    <== NOT EXECUTED
2003ef4:  40 00 78 fb 	call  20222e0 <__errno>                <== NOT EXECUTED
2003ef8:  b0 20 00 18 	neg  %i0                               <== NOT EXECUTED

This solution to the NOP problem was important because NOPs were falsely increasing the number of uncovered ranges, creating an unnecessary explosion in the reports and in the number of uncovered ranges to examine.
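
The NOP rule can be sketched as follows. This is a minimal illustration, not the actual covmerge implementation; the insn_t representation is invented for this example:

```c
#include <stdbool.h>
#include <stddef.h>

/* One disassembled instruction: whether it is a NOP and whether the
   simulator reported it as executed. */
typedef struct {
    bool is_nop;
    bool executed;
} insn_t;

/* Mark each NOP as EXECUTED unless it sits between two instructions
   that were themselves NOT EXECUTED (e.g. a delay-slot NOP inside an
   uncovered range, as in the listing above). */
void mark_nops(insn_t *insns, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (!insns[i].is_nop || insns[i].executed)
            continue;
        bool prev_not_executed = (i > 0) && !insns[i - 1].executed;
        bool next_not_executed = (i + 1 < count) && !insns[i + 1].executed;
        if (!(prev_not_executed && next_not_executed))
            insns[i].executed = true;  /* alignment NOP: treat as covered */
    }
}
```

Under this rule, a delay-slot NOP such as the one at 2003ef0 above stays NOT EXECUTED because both of its neighbors are NOT EXECUTED, so the uncovered range is not split in two.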

Resolving Uncovered Code

The output files produced by covmerge are intended to provide both a quick-look at the status of a coverage run and the details needed to resolve the uncovered ranges. As we worked through the resolution of the uncovered ranges, we noticed that the uncovered ranges usually fit into one of the following categories:

  • A new test case is needed.
  • Code is unreachable in the selected RTEMS configuration. For example, the SuperCore could have a feature only exercised by a POSIX API object; such code should be disabled when POSIX is not configured.
  • Debug or sanity-checking code which should be placed inside an RTEMS_DEBUG conditional.
  • Unreachable paths generated for switch statements. If the switch is based upon an enumerated type and includes cases for all values, ask whether it is actually possible to generate all of those values at this point in the code. You can restructure the switch to include only the possible values and thus avoid unreachable object code. This is sometimes best done by rewriting the switch into a series of if/else statements.
  • Critical sections which synchronize actions with ISRs. Most of these are very hard to hit and may require very specific support from a simulator environment. OAR has used tsim to exercise these paths, but this is not reproducible in a BSP-independent manner. Worse, there is often no external way to know the case in question has been hit and no way to do it in a one-shot test. The spintrcriticalXX and psxintrcriticalXX tests attempt to reproduce these cases.
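
The switch-restructuring point can be sketched as follows. The enum and functions here are hypothetical, not actual RTEMS code; the idea is that when only a subset of enumerator values can occur at a call site, an if/else chain over that subset avoids the unreachable object code an all-values switch can generate:

```c
typedef enum { STATE_READY, STATE_BLOCKED, STATE_DORMANT } state_t;

/* Original form: cases for every enum value, even though at this call
   site only READY and BLOCKED can actually occur.  The DORMANT case
   and the fall-through path become unreachable object code. */
int priority_with_switch(state_t s)
{
    switch (s) {
        case STATE_READY:   return 1;
        case STATE_BLOCKED: return 2;
        case STATE_DORMANT: return 3;   /* unreachable at this call site */
    }
    return -1;                          /* unreachable fall-through */
}

/* Restructured form: only the values that can really occur are tested,
   so no unreachable object code is generated for impossible values. */
int priority_with_if(state_t s)
{
    if (s == STATE_READY)
        return 1;
    else
        return 2;   /* s can only be STATE_BLOCKED here */
}
```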

In general, it is interesting to note that resolving uncovered code does not simply translate into additions to the test suite. Often the resolution points to improvements or changes in the analyzed code. This can lead to more intelligent factoring of the code or a redesign that produces a simpler solution. Even when the analyzed code is "good" as written, it may still be worth rewriting to improve its testability. Code that is completely tested is always better.

Measuring Progress

As mentioned above, the covmerge program produces reports that contain several metrics that can be used to measure progress. The first is the number of uncovered object code ranges. The second is the untested object code as a percentage of the total object code size under analysis. Together these metrics provide useful information about the status and progress of the Object Code Coverage effort.
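
As a trivial sketch, the covered percentage follows directly from the uncovered and total byte counts reported by covmerge:

```c
/* Covered % = 100 * (total_bytes - uncovered_bytes) / total_bytes.
   For example, the 4.9 release row in the table below gives
   100 * (70564 - 2532) / 70564, roughly 96.41%. */
double covered_percent(unsigned long uncovered_bytes, unsigned long total_bytes)
{
    return 100.0 * (double)(total_bytes - uncovered_bytes)
                 / (double)total_bytes;
}
```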

When we started the RTEMS Code Coverage effort, we did not immediately capture results to measure progress. This actually ended up being the correct thing to do since the covmerge tool was in development and often produced results that were not directly comparable. Now that the development of covmerge has largely settled, we can perform coverage runs on several RTEMS release points and see the progress of the coverage effort. The results shown below were of the Baseline/-Os/POSIX Enabled Profile run on the SPARC/ERC32.

Release                  Covered %  Uncovered Ranges  Uncovered Bytes  Total Bytes
4.7                      77.51      454               17508            77840
4.8                      76.37      538               21772            92140
4.9                      96.41      167               2532             70564
4.10 (head 09/09/2009)   100        0                 0                70480

Several interesting facts can be seen in the data in the table. There was no organized effort to perform coverage analysis prior to the 4.8 release. This is evident from the lack of measurable improvement in coverage between 4.7 and 4.8; the unassisted developer is just not going to recognize the need for more test cases in the test suite. The coverage analysis effort began prior to the 4.9 release. Not surprisingly, the progress was significant between 4.8 and 4.9. At that time we addressed large uncovered ranges by doing simple things like adding test cases to the test suite and disabling code that was not used by the chosen configuration. The last 3.5% of uncovered code was much harder to address, but the development head has now achieved 100% coverage.

Now that we have achieved 100% Code Coverage using the Baseline/-Os/POSIX Enabled Profile, we would like to keep it at 100%. We have set up a periodic run of the coverage analysis against the development head. The results are captured (http://rtems/ftp/pub/rtems/people/joel/coverage/) and can be monitored to ensure that future modifications to the analyzed code base do not introduce uncovered code.

Coverage Profiles

RTEMS contains a lot of source code and although the primary focus of coverage analysis is to achieve 100% coverage of well-defined code subsets, we would also like to increase the amount of source code analyzed. In order to manage the increase in a systematic manner, we defined two basic groups of source code. The first group is called Baseline and the second group is called Developmental. The Baseline group contains the source code that has achieved (or nearly achieved) 100% Object Code Coverage. The Developmental group contains the source code for which there are very few test cases and therefore very poor coverage.

Initially, the Baseline group included source code from the cpukit. Specifically, the following cpukit directories were included: score, sapi, rtems, and posix. This group represents a full tasking and synchronization feature set. Everything not in the Baseline group was placed in the Developmental group, which included: libcsupport, libfs/imfs, libmisc/stackchk, libmisc/cpuuse, libmisc/bspcmdline, libmisc/dmpbuf and libmisc/devnull.

Within the two groups, we recognized the need to use different compiler optimization levels and to analyze each group with POSIX threads enabled and POSIX threads disabled. Applying these options produced eight sub-groups that we called profiles. The eight profiles are:

  • Baseline/-Os/POSIX Enabled
  • Baseline/-O2/POSIX Enabled
  • Baseline/-Os/POSIX Disabled
  • Baseline/-O2/POSIX Disabled
  • Developmental/-Os/POSIX Enabled
  • Developmental/-O2/POSIX Enabled
  • Developmental/-Os/POSIX Disabled
  • Developmental/-O2/POSIX Disabled

Over time it is desirable to migrate code from the Developmental group to the Baseline group. As support libraries in cpukit become nearly 100% covered, they will be moved from the Developmental group to the Baseline group. Eventually, the Baseline group should contain all of the RTEMS code and the Developmental group should contain nothing.

Compilation and Configuration Options

The compilation level and POSIX configuration options are passed as command line arguments to the RTEMS Coverage Scripts. RTEMS Code Coverage How To provides details concerning how to run the RTEMS Coverage Scripts. When we started the RTEMS Code Coverage effort, the code analyzed was compiled with optimization level -Os. This optimizes for size without making the object code too difficult to follow. Following the object code is important when trying to determine how to resolve the uncovered code. Once the analyzed code approaches 100% coverage, it is desirable to change the optimization level to -O2, the most commonly used optimization level.

Enabling or disabling POSIX allows us to analyze the RTEMS code in its two most commonly used threading configurations. When POSIX is enabled, RTEMS is configured to use POSIX threads and the POSIX tests are built and executed as part of the test suite. When POSIX is disabled, RTEMS is configured to use Classic RTEMS threads and the POSIX tests are not included in the test suite.

Internal Compilation and Configuration Options

There are several compilation and configuration options that are built into the RTEMS Coverage Scripts and are not selectable from the command line. These options affect the RTEMS build and are used to simplify the code to aid analysis. Ideally, we would like the coverage build to match the default build for RTEMS. Over time, we will work to eliminate the need for the internal options. The current options being used are:

  • NDEBUG=1 - Disables asserts. We will probably keep this option.
  • RTEMS_DO_NOT_INLINE_THREAD_ENABLE_DISPATCH=1 - Inlining resulted in branch explosion. Over 200 new test cases will be needed to eliminate this option.
  • RTEMS_DO_NOT_INLINE_CORE_MUTEX_SEIZE=1 - Inlining resulted in code that was very difficult to analyze. It should be possible to eliminate this option.
  • RTEMS_DO_NOT_UNROLL_THREADQ_ENQUEUE_PRIORITY=1 - Unrolling the loop resulted in multiple interrupt critical sections. It should be possible to eliminate this option.

Beyond Object Code Coverage

At this point, the RTEMS Code Coverage effort has been focused on Object Code Coverage. But we would like to go beyond Object Code Coverage and address other traditional coverage criteria (see Coverage Analysis Theory). We would also like to remain true to our original guidelines of using existing tools and performing the analysis without modifying the code to analyze.

Achieving Statement Coverage

Achieving Statement Coverage requires knowing which source files are involved (which covoar does) and which lines in those files can produce assembly code (which I don't think covoar can). If the combination of gcc and Coverity Scan detects no dead source code in RTEMS, then we can assume that all source code in RTEMS is represented in the generated executables.

The current object coverage utility covoar reports on which source lines were covered. It could easily be modified to generate a report indicating which source lines were covered, not covered, or only partially covered.

covoar could also generate a bitmap per source file where the bit index indicates if a source line in that file was executed or not. If we can generate a similar bit map from the source code which marks comments and other non-executable source lines as covered, then the union of the two bitmaps can be used to generate a report showing which source lines are not covered or represented in the object code. This may indicate dead code or weaknesses in the tests.
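
The bitmap union described above might look like this. This is a sketch with invented types and helpers; covoar does not currently implement it:

```c
#include <stddef.h>
#include <stdint.h>

/* One bit per source line.  In the object bitmap a set bit means the
   line was executed; in the source bitmap a set bit means the line is
   a comment or other non-executable line. */
#define WORDS_FOR(lines) (((lines) + 31) / 32)

/* Union the object-coverage bitmap with the non-executable-line
   bitmap.  Any bit still clear afterwards is a source line that was
   neither executed nor non-executable: possible dead code or a
   weakness in the tests. */
void union_bitmaps(uint32_t *result, const uint32_t *object_map,
                   const uint32_t *source_map, size_t lines)
{
    for (size_t w = 0; w < WORDS_FOR(lines); w++)
        result[w] = object_map[w] | source_map[w];
}

int line_is_accounted_for(const uint32_t *result, size_t line)
{
    return (result[line / 32] >> (line % 32)) & 1u;
}
```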

Adding a statement coverage report to covoar is an open project.

Achieving Condition/Decision Coverage

Achieving Condition/Decision Coverage requires knowing whether each branch has been both taken and not taken. Currently QEMU and tsim can be used to gather this information.

tsim produces bitmaps indicating instruction executed, branch taken, and branch not taken.

All versions of QEMU produce a debug log of the instructions executed when an executable is run. The trace information is analyzed to identify branch instructions and to determine whether the branch was taken and/or not taken. Some versions of QEMU may also be able to produce a trace log which is denser but contains the same information.
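
The core of such trace analysis can be sketched as follows. The record format here is hypothetical; the real QEMU log format and covoar internals differ:

```c
#include <stdbool.h>
#include <stddef.h>

/* One executed branch instruction from a trace: its address and
   whether it was taken on this execution. */
typedef struct {
    unsigned long address;
    bool taken;
} branch_event_t;

/* Accumulated state for one branch instruction. */
typedef struct {
    unsigned long address;
    bool seen_taken;
    bool seen_not_taken;
} branch_state_t;

/* Fold trace events into per-branch taken/not-taken flags.  A branch
   is fully covered once both flags are set. */
void record_events(branch_state_t *state, size_t nbranches,
                   const branch_event_t *events, size_t nevents)
{
    for (size_t e = 0; e < nevents; e++) {
        for (size_t b = 0; b < nbranches; b++) {
            if (state[b].address == events[e].address) {
                if (events[e].taken)
                    state[b].seen_taken = true;
                else
                    state[b].seen_not_taken = true;
            }
        }
    }
}
```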

skyeye does not produce branch taken/not taken information.

covoar produces reports on which branch instructions are taken and not taken. Our goal is to ensure that each branch instruction is taken and not taken.

GCC does not include debug information which indicates that a sequence of compare and branch instructions are part of a single logical condition. This hinders our ability to augment covoar to make direct claims regarding Decision Coverage (DC) and Modified condition/decision coverage (MC/DC).

We believe that for single-condition if statements such as if (cond) action or if (cond) action1 else action2, we are achieving full DC and MC/DC coverage because all logical paths are exercised.

Similarly, given a dual OR condition if statement (in C) such as one of the following:

Case OR1: if (cond1 or cond2)
            action
Case OR2: if (cond1 or cond2)
            action1
          else
            action2

We aim for the following cases given our branch coverage requirements:

  • cond1 branch taken, cond2 short-circuited
  • cond1 branch not taken, cond2 taken
  • cond1 branch not taken, cond2 not taken

As the above set of test cases represents the entire set of possible execution paths, we have achieved DC and MC/DC level coverage.

Case AND1: if (cond1 and cond2)
             action
Case AND2: if (cond1 and cond2)
             action1
           else
             action2

We aim for the following cases given our branch coverage requirements:

  • cond1 branch taken, cond2 taken
  • cond1 branch taken, cond2 not taken
  • cond1 branch not taken, cond2 short-circuited

Again, since the above set of test cases represents the entire set of possible execution paths, we have achieved DC and MC/DC level coverage.
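
The short-circuit behavior underlying these cases can be demonstrated directly. This is a minimal sketch; the instrumented cond2 helper is invented for illustration:

```c
#include <stdbool.h>

/* Instrumented second condition: records whether it was evaluated,
   so short-circuiting can be observed. */
bool cond2_evaluated;

static bool cond2(bool value)
{
    cond2_evaluated = true;
    return value;
}

/* Cases OR1/OR2 decision: if (cond1 || cond2) */
bool or_decision(bool cond1, bool cond2_value)
{
    cond2_evaluated = false;
    return cond1 || cond2(cond2_value);
}

/* Cases AND1/AND2 decision: if (cond1 && cond2) */
bool and_decision(bool cond1, bool cond2_value)
{
    cond2_evaluated = false;
    return cond1 && cond2(cond2_value);
}
```

When cond1 is true, or_decision never evaluates cond2; when cond1 is false, and_decision never evaluates cond2. These are exactly the short-circuited rows in the two case lists above.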

Open projects in this area include:

  • proving our branch coverage testing policy meets decision coverage (DC) requirements in a more general sense.
  • extending GCC to provide the debug information required to let covoar evaluate DC and MC/DC in C programs.
    • IDEA: If GCC reliably reports that all conditions within a single if condition have the same line number, then we can use that information as the basis for the analysis: did we execute the proper set of cases for all branch instructions associated with a single debug line number?

Current Status

The Code Coverage Status section lists the RTEMS BSPs on which we are performing (or would like to perform) Object Code Coverage. We would like to continue to grow this list. If you know of a simulator that includes coverage analysis, please let us know.

With the instruction-level coverage of the core of RTEMS (e.g. the score, rtems, posix, and sapi directories) near 100%, we have expanded our attention to include other non-networking portions of the cpukit. The best way to find out which portions of the cpukit are not currently included in coverage analysis is to look at the commented-out lines calling filter_nm() in the method generate_symbols() in rtems-testing/rtems-coverage/do_coverage.

If you are interested in writing some simple parameter check error cases, take a look at the branch taken/not taken coverage reports for the "core configuration". Some of these are a simple matter of adding missing test cases for bad parameter paths. Other cases are more difficult, so if you run into trouble with the analysis, ask or skip it. A common pattern is this:

if (arg1 bad)
  return EINVAL;
if (arg2 bad)
  return EINVAL;

GCC is smart enough to optimize the returns into one block of code. Thus we could have a test for arg1 or arg2 bad and obtain 100% instruction coverage, but we would not get 100% branch coverage.
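
A concrete version of this pattern, with hypothetical argument checks and EINVAL from errno.h:

```c
#include <errno.h>

/* A typical parameter-checking prologue.  GCC may merge the two
   "return EINVAL" statements into one block of object code, so a test
   exercising only one bad argument can reach 100% instruction
   coverage here while leaving branch outcomes unexercised. */
int example_service(int arg1, int arg2)
{
    if (arg1 < 0)       /* "arg1 bad" */
        return EINVAL;
    if (arg2 < 0)       /* "arg2 bad" */
        return EINVAL;
    return 0;
}
```

Full branch coverage needs at least three calls: arg1 bad (first check taken), arg1 good with arg2 bad (first not taken, second taken), and both good (second not taken).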

Initial analysis has been done at -Os, which instructs gcc to generate smaller object code. At -O2, which optimizes for speed, more code is generated, and it is often clear from the -O2 reports that there are test cases needed which are not required at -Os.

Coverage Analysis Theory

The subject of Code Coverage Analysis is broad and has been written about many times over. This background material is not intended to summarise or rehash what can be read elsewhere. Instead, the focus here will be on the aspects of Code Coverage Analysis as they pertain to the RTEMS Coverage Analysis effort.

The ultimate goal of Code Coverage Analysis is to ensure that a test suite adequately tests a particular body of code. In order to achieve this goal, several different coverage criteria may have to be examined. Let's consider the following criteria:

  • Statement Coverage - Has each line of the source code been executed?
  • Decision Coverage (also known as Branch coverage) - Has each control structure (such as an if statement) evaluated both to true and false?
  • Condition Coverage - Has each boolean sub-expression evaluated both to true and false (this does not necessarily imply decision coverage)?
  • Object Coverage - Has each line of generated assembly been executed?

Statement Coverage

Statement Coverage requires that each line of source code be executed. This is often considered the simplest criterion. The problem is that it only identifies the lines that were executed and does not consider the logic flow of the code. It can be useful for identifying "chunks" of code (i.e. new functionality) that are not covered by the test suite, but not much else.

Decision Coverage

Decision Coverage requires that each control structure evaluate to both TRUE and FALSE. This is a pretty good criterion because it generally ensures that both the TRUE and FALSE paths of an expression are covered. However, short-circuit operators will prevent some portions of a complex expression from being evaluated.

Condition Coverage

Condition Coverage requires that each boolean sub-expression evaluate to both TRUE and FALSE. This criterion goes a little further than Decision Coverage by ensuring that the component parts of a compound expression each evaluate to TRUE and FALSE. It should be noted, however, that Condition Coverage by itself does not necessarily imply Decision Coverage. Because of this, it is best to apply Decision Coverage and Condition Coverage together.

Object Coverage

Object Coverage requires that each line of generated assembly be executed. This can be a very good general criterion because it covers most of the cases that the other criteria ensure.

Criteria Relationships

Each of these criteria can be used independently to analyze the code in question. Application of any one criterion will likely improve the test suite to some degree, albeit at the cost of increasing the complexity of the test suite. Examining the criteria collectively shows that there are clear relationships between them, as shown in the picture. The completeness and complexity of the test suite increase as it satisfies first Statement Coverage, then Decision Coverage, and finally Condition/Decision Coverage. If the test suite satisfies Statement Coverage, it will partially satisfy Decision Coverage and Condition/Decision Coverage. If the test suite satisfies Decision Coverage, it will completely satisfy Statement Coverage and partially satisfy Condition/Decision Coverage. Note that Object Coverage satisfies part of all of the other criteria. There is also a complexity relationship: Statement Coverage is the least complex to satisfy and Condition/Decision Coverage is the most complex.

An Example

In order to illustrate what is covered by each of the different criteria, consider the following example showing the source code for a simple if statement along with its generated pseudo-code instructions.

Block  Source Code     Block  Object Pseudo-code
A      if (x OR y)     A1     cmp x, 0; branch if FALSE to do something
                       A2     cmp y, 0; branch if TRUE around do something
B      do something    B      do something instructions

Statement Coverage

A single test case that allows the if statement to evaluate to TRUE will execute blocks A and B. This will achieve 100% Statement Coverage.

Decision Coverage

A minimum of two test cases are required to achieve 100% Decision Coverage. One case must force the if statement to evaluate to TRUE and the other case must force the if statement to evaluate to FALSE. A test case that forces a TRUE outcome will either execute blocks A1 and B or A1, A2 and B. A test case that forces a FALSE outcome will execute blocks A1 and A2.

Condition/Decision Coverage

A minimum of two test cases are required to achieve 100% Condition/Decision Coverage. In the first case, x and y must be TRUE. In the second case, x and y must be FALSE. The test case that forces a TRUE outcome will execute blocks A1 and B. The test case that forces a FALSE outcome will execute blocks A1 and A2.

Object Coverage

One carefully chosen test case where x is FALSE and y is TRUE will achieve 100% Object Coverage. The test case will execute blocks A1, A2 and B.

Running covoar to Generate Coverage Reports

covoar must be run from the RTEMS kernel build directory.

NOTE: The .cov trace files are needed to generate coverage reports.

covoar -S /home/lunatic/development/rtems/test/rtems-tools/tester/rtems/testing/coverage/leon3-qemu-symbols.ini \
-O coverage/score -E/home/lunatic/development/rtems/test/rtems-tools/tester/rtems/testing/coverage/Explanations.txt \
-p RTEMS-5 sparc-rtems5/c/leon3/testsuites/samples/hello.exe

covoar usage:

Usage: covoar [-v] -T TARGET -f FORMAT [-E EXPLANATIONS] -e EXE_EXTENSION -c COVERAGEFILE_EXTENSION EXECUTABLE1 ... EXECUTABLE2

  -v                        - verbose at initialization
  -T TARGET                 - target name
  -f FORMAT                 - coverage file format (RTEMS, QEMU, TSIM or Skyeye)
  -E EXPLANATIONS           - name of file with explanations
  -s SYMBOL_SET_FILE        - path to the INI format symbol sets
  -1 EXECUTABLE             - name of executable to get symbols from
  -e EXE_EXTENSION          - extension of the executables to analyze
  -c COVERAGEFILE_EXTENSION - extension of the coverage files to analyze
  -g GCNOS_LIST             - name of file with list of *.gcno files
  -p PROJECT_NAME           - name of the project
  -C ConfigurationFileName  - name of configuration file
  -O Output_Directory       - name of output directory (default=.)
  -d debug                  - disable cleaning of tempfile

Running RTEMS-TESTER for Coverage analysis

RTEMS-TESTER, when run with the --coverage option, generates an HTML coverage analysis report (report.html).

$HOME/development/rtems/test/rtems-tools/tester/rtems-test \
--rtems-tools=$HOME/development/rtems/5 --log=coverage_analysis.log \
--no-clean --coverage=score --rtems-bsp=leon3-qemu-cov \
/home/lunatic/development/rtems/kernel/leon3/sparc-rtems5/c/leon3/testsuites/samples/hello.exe

NOTE: The --no-clean option tells the script not to delete the .cov trace files generated during the coverage run. These trace files can be used to run covoar directly.

When the --coverage option is given a specific symbol set name, coverage analysis runs only for the named sets. In the example above, the coverage analysis will run for score only.

To run coverage for all the sets, pass --coverage without an argument; by default, it runs coverage analysis for all the sets.

Please visit my development blog to see examples of coverage reports.

Coverage Analysis Report

Running coverage on samples/ for score produces the main report.html with the analysis report:

The index of the reports:

The detailed coverage report:

The branch coverage report:

The size report:

References

Standards and Certifications

  • FAA DO-178B - United States Aviation Standard
