#3977 new enhancement

Add unexplained-failure as option for Test Configuration Files

Reported by: Joel Sherrill Owned by: joel@…
Priority: normal Milestone: 6.1
Component: test Version:
Severity: normal Keywords:
Cc: Blocked By:
Blocking:

Description

RTEMS has many BSPs and it is a large amount of work to investigate each test execution that has an unexpected result. The tcfg files were initially created just to avoid trying to build tests whose executables did not fit into target memory. That has grown to incorporate knowledge about which tests require human input and are not automated. This ticket is to add a new test state for "unexplained failure".

If a test has been investigated and has an explanation, it is a known failure. This could be associated with a ticket (e.g. dynamic loading or fenv not supported on an architecture) or just a comment explaining it (e.g. mips jmr3904 simulator trapping invalid memory accesses before the trap handler is tripped).

But many tests are currently just marked as failures because they do fail but no one has investigated and explained them. The addition of "unexplained failure" is to capture those. This will make it clear to anyone reviewing the results that these tests are known to fail but still need investigation. Marking these as "known failure" permanently puts them in a category where they will not get looked at again.
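To illustrate the proposal, the fragment below sketches what a BSP's tcfg file might look like with the new state alongside the existing ones. The line format (`state test-name` plus `include` of shared files) and the test names shown are illustrative only; the actual states and syntax are documented in README.testdata in rtems.git.

```
# Hypothetical tcfg sketch -- test names and exact syntax are illustrative.
include testdata/small-memory-testsuite.tcfg

# Does not fit in target memory, so do not build it.
exclude dl06

# Investigated: fenv is not supported on this architecture, see ticket.
expected-fail fenv01

# Proposed new state: fails, but no one has investigated why yet.
unexplained-failure sp37
```

A build or test report could then count unexplained failures separately, so the total has no "unexpected" bucket left: every result is a pass, an explained failure, or a failure flagged for investigation.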

This has two side-effects. The first is that we can have test results with no "unexpected failures." They are either known and explained or known and need investigation. This helps anyone looking at test results since there should be no unexpected failures. The second side-effect is that it hopefully captures the set of repeatedly failing tests and encourages us to investigate and file tickets or explain.

The milestone for this is NOT 5.x because this is moving the goalposts for test results. Release 5.x is the first with the body of test result data we have. The build@ mailing list has the results, and any interested user can compare them. We also do not yet have a way to machine-check results, which would ease checking the test results for deviations.

Note: As an odd case, there are multiple BSPs where the executables can run on hardware or on one or more simulators. Sometimes the results vary across the target environment even for the same executable. This is at least something to consider, since we want a trustworthy and checkable set of test results available.

Change History (2)

comment:1 Changed on May 13, 2020 at 1:32:55 AM by Chris Johns

For the record, the TCFG files were created to manage the states for the tester. That change included the ability to exclude a test. The tester command had the state checks and handling added in April 2017. The test states were also added to rtems.git at that time, so all states have been present and documented together in README.testdata. I also raised #2962 at the time to indicate we need to manage this before we release.

From my point of view and rtems-test nothing has changed. What was added did not set a policy for managing failures or the relationship to releasing and any failure, known, expected or otherwise is just a failure. The tester either expects to see a failure or it does not for accounting purposes only.

I have gone away and spent time looking at adding this state, and the change feels wrong because this state, and any like it, is basically a logical OR with expected-fail; the tester does not care about the policy that manages failures or that arrives at the expected-fail state. The end goal is to have ./waf test run all the tests on a configured test harness, with waf returning a non-zero exit code for any regression from the build's baseline. I doubt we have a single BSP that does this on any test harness.
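The regression check described above amounts to comparing a build's results against a recorded baseline. A minimal sketch, in Python for illustration: the test names, state strings, and the `find_regressions` helper are hypothetical and do not reflect the actual rtems-test data model.

```python
# Hypothetical sketch of baseline regression detection.
# States are illustrative: "pass", "expected-fail", or "fail"
# (an unexpected failure not accounted for by the baseline).

def find_regressions(baseline, current):
    """Return tests whose state worsened relative to the baseline.

    baseline and current map test name -> state string.
    """
    regressions = []
    for test, state in current.items():
        expected = baseline.get(test, "pass")
        # An unexpected failure the baseline did not account for is a
        # regression; an expected failure is already accounted for.
        if state == "fail" and expected != "fail":
            regressions.append(test)
    return sorted(regressions)

baseline = {"ticker": "pass", "dl01": "expected-fail", "sp01": "pass"}
current = {"ticker": "pass", "dl01": "expected-fail", "sp01": "fail"}
print(find_regressions(baseline, current))  # ['sp01']
```

A ./waf test wrapper or CI job could exit non-zero exactly when this list is non-empty, which keeps the check mechanical and independent of any failure-management policy.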

We should actively discourage specific test analysis from test results, as they are simply a tool to account for a build's baseline. The result statistics should be bounded and consistent, and provide a simple way for a machine to determine the state of a build.

I have posted a detailed alternative. For these to work we will need a workable set of policies for the management of test failures.

I think it is unrealistic to expect RTEMS to have a 100% pass rate on all tests on all BSPs on all test platforms a BSP supports. There will be failures. Little or no analysis of the existing failing tests has been captured and accounted for, so I dispute the assertion that moving them to expected failures will stop any analysis. I believe the solution to this issue lies elsewhere.

I am fine with tests that have no analysis being left as unexpected failures but we need a way for buildbot or any other CI tool to run the test suite on selected profiles to know if a change has caused a regression. Any test failure in a clean clone from rtems.git breaks the ability to do this. This is the first thing we need to fix.

Transitioning from the current state to one where the tester can detect regressions requires some effort. It is unfair to instantly say "all commits to rtems.git must have no unexpected results for BSP XXX on YYY" if this is the policy. I do think we need to select a few key BSPs and test harnesses that are required to have no unexpected results before a release can be made.

I would like to close this ticket and open a new one for the tester state changes. I suggest a ticket to document the unexpected results policy be created.

comment:2 Changed on Sep 25, 2020 at 1:20:22 AM by Chris Johns

This ticket should be closed. Please close.
