Changes between Version 14 and Version 15 of GCI/Documentation/CoverageAnalysis/Coverage

Dec 8, 2018, 8:21:10 PM (7 months ago)
shashvat jain



= Coverage =
[[TOC(GCI/Documentation/CoverageAnalysis/Coverage , depth=2)]]
= What was Discovered =
When we began the RTEMS Code Coverage effort, we performed coverage analysis on the development head of RTEMS 4.8 using the Baseline/-Os/POSIX Enabled Profile. Some of our initial observations were interesting. First, we were a little surprised at the incompleteness of the test suite. We knew that some areas of the RTEMS code were not tested at all, but we also found that areas we thought were tested were only partially tested. We also observed some interesting things about the code we were analyzing. We noticed that inlining sometimes caused significant branch explosion. This generated many uncovered ranges that all mapped back to the same source code. We also found that some defensive coding habits and coding style idioms could generate unreachable object code. In addition, using a switch statement that includes a case for every value of an enumerated type, instead of an if statement, sometimes led to unreachable code.

Other observations were related to the performance of the ''covmerge'' tool. Of particular interest was the handling of NOP instructions. Compilers can use NOP instructions to force alignment between functions or to fill delay slots required by the processor. NOP instructions inserted for alignment are never executed, which had a negative impact on the coverage. The first attempt at dealing with NOP instructions was to mark them all as EXECUTED. This was correct for the NOPs used for function alignment, but not for NOPs used for delay slots. Marking delay-slot NOPs as EXECUTED produced the unwanted side effect of occasionally splitting an uncovered range into two ranges. We finally settled on an improved method in which NOPs are marked as EXECUTED unless they lie between two NOT EXECUTED instructions. An example is shown below:

2003ee8:  80 a6 20 00   cmp  %i0, 0                            <== NOT EXECUTED
2003eec:  02 80 00 06   be  2003f04 <IMFS_fifo_write+0x60>     <== NOT EXECUTED
2003ef0:  01 00 00 00   nop                                    <== NOT EXECUTED
2003ef4:  40 00 78 fb   call  20222e0 <__errno>                <== NOT EXECUTED
2003ef8:  b0 20 00 18   neg  %i0                               <== NOT EXECUTED

Here the NOP in the delay slot of the ''be'' instruction lies between two NOT EXECUTED instructions, so it stays NOT EXECUTED and the uncovered range remains in one piece.

This solution to the NOP problem was important because NOPs were falsely increasing the number of uncovered ranges. This needlessly inflated the reports and increased the number of uncovered ranges to examine.
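The NOP rule described above can be sketched in a few lines of Python. This is only an illustration, not the ''covmerge'' implementation: the `Instruction` record, the `mark_nops` function, and the decision to look past runs of consecutive NOPs to the nearest real instruction are all assumptions made for the example. A NOP with no neighbor on one side (for example, padding at the end of a listing) is treated as alignment and marked EXECUTED.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    """Hypothetical record for one disassembled instruction."""
    address: int
    mnemonic: str
    executed: bool


def _nearest_non_nop(instructions, start, step):
    """Return the executed flag of the nearest non-NOP neighbor, or None."""
    i = start + step
    while 0 <= i < len(instructions):
        if instructions[i].mnemonic != "nop":
            return instructions[i].executed
        i += step
    return None


def mark_nops(instructions):
    """Return executed flags with unexecuted NOPs promoted to EXECUTED,
    except NOPs that lie between two NOT EXECUTED instructions."""
    flags = [ins.executed for ins in instructions]
    for i, ins in enumerate(instructions):
        if ins.mnemonic != "nop" or ins.executed:
            continue
        before = _nearest_non_nop(instructions, i, -1)
        after = _nearest_non_nop(instructions, i, +1)
        if before is False and after is False:
            continue  # keep the uncovered range in one piece
        flags[i] = True
    return flags
```

Applied to the five-instruction listing above, every flag stays NOT EXECUTED, so the report shows a single uncovered range rather than two.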
= Resolving Uncovered Code =
The output files produced by ''covmerge'' are intended to provide both a quick look at the status of a coverage run and the details needed to resolve the uncovered ranges. As we worked through the resolution of the uncovered ranges, we noticed that they usually fit into one of the following categories:

 *  A new test case is needed.
 *  Code is unreachable in the selected RTEMS configuration. For example, the !SuperCore could have a feature only exercised by a POSIX API object. It should be disabled when POSIX is not configured.
 *  Debug or sanity checking code that should be placed inside an RTEMS_DEBUG conditional.
 *  Unreachable paths generated for switch statements. If the switch is based upon an enumerated type and includes cases for all values, then every case is reachable only if all of those values can actually occur at this point in the code. You can restructure the switch to include only the possible values and thus avoid unreachable object code. This is sometimes best done by rewriting the switch as a series of if/else statements.
 *  Critical sections that synchronize actions with ISRs. Most of these are very hard to hit and may require very specific support from a simulator environment. OAR has used tsim to exercise these paths, but this is not reproducible in a BSP-independent manner. Worse, there is often no external way to know that the case in question has been hit and no way to hit it in a one-shot test. The spintrcriticalXX and psxintrcriticalXX tests attempt to reproduce these cases.

In general, it is interesting to note that the resolution of uncovered code does not simply translate into additions to the test suite. Often the resolution points to improvements or changes to the analyzed code. This can lead to more intelligent factoring of the code or a code redesign that produces a simpler solution. There is also the notion that just because the analyzed code is "good" the way it is does not mean that it should not be rewritten to improve its testability. Code that is completely tested is '''always''' better.
= Measuring Progress =
As mentioned above, the ''covmerge'' program produces reports that contain several metrics that can be used to measure progress. The first is the number of uncovered object code ranges. The second is the amount of untested object code as a percentage of the total object code size under analysis. Together these metrics provide useful information about the status or progress of the Object Code Coverage.

When we started the RTEMS Code Coverage effort, we did not immediately capture results to measure progress. This ended up being the correct thing to do, since the ''covmerge'' tool was in development and often produced results that were not directly comparable. Now that the development of ''covmerge'' has largely settled, we can perform coverage runs on several RTEMS release points and see the progress of the coverage effort. The results shown below are for the Baseline/-Os/POSIX Enabled Profile run on the SPARC/ERC32.

||= Release =||= Covered % =||= Uncovered Ranges =||= Uncovered Bytes =||= Total Bytes =||
|| 4.7 || 77.51 || 454 || 17508 || 77840 ||
|| 4.8 || 76.37 || 538 || 21772 || 92140 ||
|| 4.9 || 96.41 || 167 || 2532 || 70564 ||
|| 4.10 (head 09/09/2009) || 100 || 0 || 0 || 70480 ||

Several interesting facts can be seen from the data in the table. There was no organized effort to perform coverage analysis prior to the 4.8 release. This is evident in that there was no measurable improvement in coverage between 4.7 and 4.8. The unassisted developer is just not going to recognize the need for more test cases in the test suite. The coverage analysis began prior to the 4.9 release. Not surprisingly, the progress was significant between 4.8 and 4.9. At that time we addressed large uncovered ranges by doing simple things like adding test cases to the test suite and disabling code that was not used by the chosen configuration. The last 3.5% of uncovered code was much harder to address, but the development head has now achieved 100% coverage.
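The Covered % column can be reproduced from the Uncovered Bytes and Total Bytes columns. The short Python sketch below does this for the four releases in the table; the function name `coverage_metrics` and the `RELEASES` dictionary are made up for the illustration and are not part of ''covmerge''.

```python
def coverage_metrics(uncovered_bytes, total_bytes):
    """Return (covered %, uncovered %) rounded to two decimal places."""
    uncovered_pct = 100.0 * uncovered_bytes / total_bytes
    return round(100.0 - uncovered_pct, 2), round(uncovered_pct, 2)


# (uncovered bytes, total bytes) copied from the table above
RELEASES = {
    "4.7": (17508, 77840),
    "4.8": (21772, 92140),
    "4.9": (2532, 70564),
    "4.10": (0, 70480),
}

for release, (uncovered, total) in RELEASES.items():
    covered_pct, uncovered_pct = coverage_metrics(uncovered, total)
    print(f"{release}: {covered_pct:.2f}% covered, {uncovered_pct:.2f}% uncovered")
```

Because the total size under analysis changes between releases, the percentage is best read together with the number of uncovered ranges, as the text suggests.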

Now that we have achieved 100% Code Coverage using the Baseline/-Os/POSIX Enabled Profile, we would like to keep it 100% covered. We have set up a periodic run of the coverage analysis against the development head. The results are captured (http://rtems/ftp/pub/rtems/people/joel/coverage/) and can be monitored to ensure that future modifications to the analyzed code base do not produce uncovered code.
= Coverage Profiles =
RTEMS contains a lot of source code, and although the primary focus of coverage analysis is to achieve 100% coverage of well-defined code subsets, we would also like to increase the amount of source code analyzed. In order to manage the increase in a systematic manner, we defined two basic groups of source code. The first group is called Baseline and the second group is called Developmental. The Baseline group contains the source code that has achieved (or nearly achieved) 100% Object Code Coverage. The Developmental group contains the source code for which there are very few test cases and therefore very poor coverage.

Initially, the Baseline group included source code from the cpukit. Specifically, the following cpukit directories were included: score, sapi, rtems and posix. This group represents a full tasking and synchronization feature set. What was not in the Baseline group was placed in the Developmental group. The Developmental group included: libcsupport, libfs/imfs, libmisc/stackchk, libmisc/cpuuse, libmisc/bspcmdline, libmisc/dmpbuf and libmisc/devnull.

Within the two groups, we recognized the need to use different compiler optimization levels and to analyze each group with POSIX threads enabled and POSIX threads disabled. Applying these options produced eight sub-groups that we called profiles. The eight profiles are:

 *  Baseline/-Os/POSIX Enabled
 *  Baseline/-O2/POSIX Enabled
 *  Baseline/-Os/POSIX Disabled
 *  Baseline/-O2/POSIX Disabled
 *  Developmental/-Os/POSIX Enabled
 *  Developmental/-O2/POSIX Enabled
 *  Developmental/-Os/POSIX Disabled
 *  Developmental/-O2/POSIX Disabled
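
The eight profiles are simply every combination of group, optimization level, and POSIX setting. As a small illustration (the names `GROUPS`, `POSIX`, `OPT_LEVELS`, and `coverage_profiles` are invented here and do not come from the RTEMS Coverage Scripts), they can be generated like this:

```python
from itertools import product

GROUPS = ("Baseline", "Developmental")
POSIX = ("Enabled", "Disabled")
OPT_LEVELS = ("-Os", "-O2")


def coverage_profiles():
    """Return the profile names in the same order as the list above."""
    return [
        f"{group}/{opt}/POSIX {posix}"
        for group, posix, opt in product(GROUPS, POSIX, OPT_LEVELS)
    ]


for name in coverage_profiles():
    print(name)
```

Adding a third optimization level or another group to the tuples would extend the profile list without any other change.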

Over time it is desirable to migrate code from the Developmental group to the Baseline group. As support libraries in cpukit become nearly 100% covered, they will be moved from the Developmental group to the Baseline group. Eventually, the Baseline group should contain all of the RTEMS code and the Developmental group should contain nothing.
= Compilation and Configuration Options =
The compilation level and POSIX configuration options are passed as command line arguments to the RTEMS Coverage Scripts. [wiki:Developer/Coverage/HowTo RTEMS Code Coverage How To] provides details concerning how to run the RTEMS Coverage Scripts. When we started the RTEMS Code Coverage effort, the code analyzed was compiled with optimization level -Os. This optimizes for size without making the object code too difficult to follow. Following the object code is important when trying to determine how to resolve the uncovered code. Once the analyzed code approaches 100% covered, it is desirable to change the optimization level to -O2. This is the most often used optimization level.

Enabling or disabling POSIX allows us to analyze the RTEMS code in its two most commonly used threading configurations. When POSIX is enabled, RTEMS is configured to use POSIX threads and the POSIX tests are built and executed as part of the test suite. When POSIX is disabled, RTEMS is configured to use Classic RTEMS threads and the POSIX tests are not included in the test suite.
= Internal Compilation and Configuration Options =
There are several compilation and configuration options that are built into the RTEMS Coverage Scripts and are not selectable from the command line. These options affect the RTEMS build and are used to simplify the code to aid analysis. Ideally, we would like the coverage build to match the default build for RTEMS. Over time, we will work to eliminate the need for the internal options. The current options being used are:

 *  NDEBUG=1 - Disables asserts. We will probably keep this option.
 *  RTEMS_DO_NOT_INLINE_THREAD_ENABLE_DISPATCH=1 - Inlining resulted in branch explosion. Over 200 new test cases will be needed to eliminate this option.
 *  RTEMS_DO_NOT_INLINE_CORE_MUTEX_SEIZE=1 - Inlining resulted in code that was very difficult to analyze. It should be possible to eliminate this option.
 *  RTEMS_DO_NOT_UNROLL_THREADQ_ENQUEUE_PRIORITY=1 - Unrolling the loop resulted in multiple interrupt critical sections. It should be possible to eliminate this option.
= Beyond Object Code Coverage =
Up to this point, the RTEMS Code Coverage effort has been focused on Object Code Coverage. But we would like to go beyond Object Code Coverage and address other traditional coverage criteria (see [wiki:Developer/Coverage/Theory Coverage Analysis Theory]). We would also like to remain true to our original guidelines of using existing tools and performing the analysis without modifying the code being analyzed.
= Achieving Statement Coverage =
Achieving Statement Coverage requires knowing which source files are involved (which ''covoar'' does) and which lines in those files can produce assembly code (which I don't think ''covoar'' can). If we assume that any dead source code in RTEMS is detected by the combination of gcc and Coverity Scan, then all remaining source code in RTEMS should be represented in the generated executables.

The current object coverage utility ''covoar'' reports on which source lines were covered. It could easily be modified to generate a report indicating which source lines were covered, not covered, or only partially covered.

''covoar'' could also generate a bitmap per source file where the bit index indicates whether a source line in that file was executed. If we can generate a similar bitmap from the source code which marks comments and other non-executable source lines as covered, then the union of the two bitmaps can be used to generate a report showing which source lines are not covered or not represented in the object code. This may indicate dead code or weaknesses in the tests.
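
The bitmap union described above might look like the following Python sketch. It is hypothetical: ''covoar'' does not currently produce these bitmaps, and the function name and the list-of-booleans representation (index 0 is source line 1) are assumptions chosen to keep the example short.

```python
def uncovered_source_lines(executed_bits, non_executable_bits):
    """Return 1-based line numbers that are neither executed nor marked
    as non-executable (comments, blank lines, declarations)."""
    if len(executed_bits) != len(non_executable_bits):
        raise ValueError("bitmaps must describe the same source file")
    union = [e or n for e, n in zip(executed_bits, non_executable_bits)]
    return [line for line, bit in enumerate(union, start=1) if not bit]


# Five-line file: lines 1 and 5 are comments, lines 2 and 3 were executed.
executed = [False, True, True, False, False]
non_executable = [True, False, False, False, True]
print(uncovered_source_lines(executed, non_executable))
```

Any line number reported this way is a candidate for either a missing test case or dead code, and would still need manual review.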

Adding a statement coverage report to ''covoar'' is an open project.
= Achieving !Condition/Decision Coverage =
Achieving !Condition/Decision Coverage requires knowing whether each branch has been both taken and not taken. Currently QEMU and tsim can be used to gather this information.

tsim produces bitmaps indicating instruction executed, branch taken, and branch not taken.

All versions of QEMU produce a debug log of the instructions executed when an executable is run. The trace information is analyzed to identify branch instructions and to determine whether each branch was taken, not taken, or both. Some versions of QEMU may also be able to produce a trace log which is denser but contains the same information.
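
To illustrate how an execution trace can be turned into branch taken/not taken information, here is a deliberately simplified Python sketch. It assumes a trace that is just an ordered list of executed addresses, a known set of branch instruction addresses, a fixed 4-byte instruction size, and no delay slots. None of this reflects the actual QEMU log format or the way ''covoar'' parses it, and a real SPARC analysis would have to account for delay slots.

```python
from collections import defaultdict

INSTRUCTION_SIZE = 4  # assumption: fixed-width instructions


def classify_branches(trace, branch_addresses):
    """Map each executed branch address to the set of outcomes seen."""
    outcomes = defaultdict(set)
    for pc, next_pc in zip(trace, trace[1:]):
        if pc in branch_addresses:
            fell_through = next_pc == pc + INSTRUCTION_SIZE
            outcomes[pc].add("not taken" if fell_through else "taken")
    return dict(outcomes)


def partially_covered(outcomes, branch_addresses):
    """Return branches that were not seen both taken and not taken."""
    both = {"taken", "not taken"}
    return sorted(a for a in branch_addresses if outcomes.get(a, set()) != both)
```

A branch that appears in the result of `partially_covered` needs at least one more test case that drives it in the missing direction.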

skyeye does not produce branch taken/not taken information.

''covoar'' produces reports on which branch instructions are taken and not taken. Our goal is to ensure that each branch instruction is both taken and not taken.

GCC does not include debug information which indicates that a sequence of compare and branch instructions is part of a single logical condition. This hinders our ability to augment ''covoar'' to make direct claims regarding Decision Coverage (DC) and Modified Condition/Decision Coverage (MC/DC).

We believe that for single condition ''if'' statements such as ''if (cond) action'' or ''if (cond) action1 else action2'', we are achieving full DC and MC/DC coverage because all logical paths are exercised.

Similarly, consider a dual OR condition ''if'' statement (in C) such as one of the following:

Case OR1: if (cond1 or cond2)
            action
Case OR2: if (cond1 or cond2)
            action1
          else
            action2

We aim for the following cases given our branch coverage requirements:

 *  cond1 branch taken, cond2 short-circuited
 *  cond1 branch not taken, cond2 taken
 *  cond1 branch not taken, cond2 not taken

As the above set of test cases represents the entire set of possible execution paths, we have achieved DC and MC/DC level coverage.

A dual AND condition ''if'' statement is handled in the same way:

Case AND1: if (cond1 and cond2)
             action
Case AND2: if (cond1 and cond2)
             action1
           else
             action2

We aim for the following cases given our branch coverage requirements:

 *  cond1 branch taken, cond2 taken
 *  cond1 branch taken, cond2 not taken
 *  cond1 branch not taken, cond2 short-circuited

Again, since the above set of test cases represents the entire set of possible execution paths, we have achieved DC and MC/DC level coverage.
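
The claim that three cases cover every execution path of a two-condition short-circuit expression can be checked by brute force. The Python sketch below enumerates all four input combinations and records which conditions were actually evaluated. It tracks condition values rather than machine-level branch directions (which depend on how the compiler lays out the code), and the function names are invented for the example. It demonstrates the path count only and is not a proof of MC/DC.

```python
from itertools import product


def evaluate(op, cond1, cond2):
    """Evaluate `cond1 op cond2` with short-circuiting.

    Returns (decision, path) where path lists the conditions that were
    actually evaluated together with their values."""
    path = [("cond1", cond1)]
    if op == "or":
        if cond1:
            return True, tuple(path)  # cond2 short-circuited
    elif op == "and":
        if not cond1:
            return False, tuple(path)  # cond2 short-circuited
    else:
        raise ValueError(f"unsupported operator: {op}")
    path.append(("cond2", cond2))
    return cond2, tuple(path)


def distinct_paths(op):
    """Map each distinct execution path to the decision it produces."""
    paths = {}
    for cond1, cond2 in product((True, False), repeat=2):
        decision, path = evaluate(op, cond1, cond2)
        paths[path] = decision
    return paths


for op in ("or", "and"):
    print(op, len(distinct_paths(op)), "distinct paths")
```

For both operators the four input combinations collapse to three paths, and both decision outcomes appear among them, which matches the three bullet points given for each case.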

Open projects in this area include:

 *  proving our branch coverage testing policy meets decision coverage (DC) requirements in a more general sense.
 *  extending GCC to provide the debug information required to let ''covoar'' evaluate DC and MC/DC in C programs.
  *  IDEA: If GCC reliably reports that all conditions within a single ''if condition'' have the same line number, then we can use that information as the basis for the analysis. The question then becomes whether we executed the proper set of cases for all branch instructions associated with a single debug line number.
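
The IDEA above could be prototyped along the following lines. This Python sketch is speculative: it assumes we already have, for each branch instruction, its source line number from the debug information plus taken and not-taken flags from a coverage run. The tuple layout and the function name are hypothetical and not part of ''covoar''.

```python
from collections import defaultdict


def incomplete_lines(branch_records):
    """Group branches by source line and report, per line, the addresses of
    branches that were not seen both taken and not taken.

    branch_records: iterable of (line, address, taken, not_taken)."""
    by_line = defaultdict(list)
    for line, address, taken, not_taken in branch_records:
        by_line[line].append((address, taken, not_taken))

    report = {}
    for line, branches in by_line.items():
        missing = [addr for addr, taken, not_taken in branches
                   if not (taken and not_taken)]
        if missing:
            report[line] = missing
    return report
```

A source line that is absent from the report has every associated branch exercised in both directions, which is the per-line property the IDEA proposes to check.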
= Current Status =
The [wiki:Developer/Coverage/Status Code Coverage Status] section lists the RTEMS BSPs on which we are performing (or would like to perform) Object Code Coverage. We would like to continue to grow this list. If you know of a simulator that includes coverage analysis, please let us know.

With the instruction level coverage of the core of RTEMS (e.g. score, rtems, posix, and sapi directories) near 100%, we have expanded our attention to include other non-networking portions of the cpukit. The best way to find out which portions of the cpukit are not currently being included in coverage analysis is to look at the commented out lines calling ''filter_nm()'' in the method ''generate_symbols()'' in rtems-testing/rtems-coverage/do_coverage.

If you are interested in writing some simple parameter check error cases, then take a look at the branch taken/not taken coverage reports for the "core configuration". Some of these are a simple matter of adding missing test cases for bad parameter paths. Other cases are more difficult. If you run into trouble with the analysis, ask for help or skip that case. A common pattern is this:

if (arg1 bad)
  return EINVAL;
if (arg2 bad)
  return EINVAL;

GCC is smart enough to optimize the returns into one block of code. Thus we could have a test for only arg1 bad or only arg2 bad and obtain 100% instruction coverage, but we would not get 100% branch coverage.
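
The gap between instruction coverage and branch coverage in this pattern can be shown with a toy model. The Python sketch below invents a six-instruction layout in which both checks branch to one shared ''return EINVAL'' block. The layout, the addresses, and the function names are assumptions for illustration and do not correspond to real GCC output.

```python
# Hypothetical object code layout after the two returns are merged:
#   0: cmp arg1    1: branch-if-bad -> 5
#   2: cmp arg2    3: branch-if-bad -> 5
#   4: return 0    5: return EINVAL   (shared block)
INSTRUCTION_COUNT = 6
BRANCHES = (1, 3)


def run(arg1_bad, arg2_bad):
    """Simulate one test case; return (executed addresses, branch outcomes)."""
    executed = {0, 1}
    outcomes = {1: arg1_bad}
    if arg1_bad:
        executed.add(5)
        return executed, outcomes
    executed |= {2, 3}
    outcomes[3] = arg2_bad
    executed.add(5 if arg2_bad else 4)
    return executed, outcomes


def summarize(test_cases):
    """Return (instruction coverage %, number of fully covered branches)."""
    executed = set()
    seen = {address: set() for address in BRANCHES}
    for arg1_bad, arg2_bad in test_cases:
        addresses, outcomes = run(arg1_bad, arg2_bad)
        executed |= addresses
        for address, taken in outcomes.items():
            seen[address].add(taken)
    instruction_pct = 100.0 * len(executed) / INSTRUCTION_COUNT
    full = sum(1 for results in seen.values() if results == {True, False})
    return instruction_pct, full
```

In this model, a suite with one "arg1 bad" test and one "all good" test executes every instruction, yet the branch at address 3 is never taken. Adding an "arg2 bad" test closes that gap.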

Initial analysis has been done at -Os, which instructs gcc to generate smaller object code. At -O2, which optimizes for speed, more code is generated, and the -O2 reports often make it clear that test cases are needed that are not required at -Os.
== Coverage Analysis Theory ==
= References =
=== General Coverage Testing ===
 *  Code Coverage Definition
 *  Modified Condition/Decision Coverage Definition
 *  TotT: Understanding Your Coverage Data
=== Standards and Certifications ===
 *  FAA DO-178B - United States Aviation Standard