#4749 assigned enhancement

Clock Driver Validation

Reported by: Matt Joyce
Owned by: Sebastian Huber
Priority: normal
Milestone:
Component: test
Version: 6
Severity: normal
Keywords: clock driver, testsuite
Cc:
Blocked By:
Blocking:

Description

The current testsuite requires additional work to properly validate clock drivers. See, for example, this off-by-one bug patch, sent by a user on 26 Oct 2022:

--- a/bsps/arm/shared/clock/clock-armv7m.c
+++ b/bsps/arm/shared/clock/clock-armv7m.c
@@ -90,7 +90,7 @@ static void _ARMV7M_Clock_initialize_early(void)

   interval = (uint32_t) ((freq * us_per_tick) / 1000000);
-  systick->rvr = interval;
+  systick->rvr = interval - 1;
   systick->cvr = 0;
   systick->csr = ARMV7M_SYSTICK_CSR_ENABLE | ARMV7M_SYSTICK_CSR_CLKSOURCE;
 }

Currently, the tests do not use an external clock source to verify the internal one. We propose to address this by adding tests using the Pulse-Per-Second (PPS) functionality from a GPS signal to validate the clock drivers over a set time interval. This will help to catch not only errors such as the one above, but also potential bugs related to time interval jitter.
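For context on why the patch above matters: an ARMv7-M SysTick counter reloads with the RVR value and counts down to zero, so one tick period lasts RVR + 1 clock cycles. The following is a minimal host-side sketch of that arithmetic. The helper names are hypothetical and not part of the driver.

```c
#include <stdint.h>

/* Clock cycles wanted per tick, same formula as the driver excerpt above. */
static uint32_t cycles_per_tick(uint64_t freq_hz, uint64_t us_per_tick)
{
  return (uint32_t) ((freq_hz * us_per_tick) / 1000000);
}

/*
 * Actual length of one SysTick period for a given reload value.
 * The counter runs reload, reload - 1, ..., 1, 0 and then reloads,
 * which is reload + 1 cycles in total.
 */
static uint32_t systick_period_cycles(uint32_t reload)
{
  return reload + 1;
}

/* Signed error of the tick period, in cycles, for a given reload value. */
static int32_t period_error_cycles(uint32_t wanted, uint32_t reload)
{
  return (int32_t) (systick_period_cycles(reload) - wanted);
}
```

As an assumed example, a 168 MHz clock with a 1000 us tick gives 168000 wanted cycles. Writing that value to RVR yields a period of 168001 cycles (one too many), while writing 168000 - 1 yields exactly 168000.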

Change History (4)

comment:1 Changed on 11/07/22 at 00:58:01 by Chris Johns

Is this something that will validate the clock drivers on all BSPs?

I suggest it might be a good idea to provide some details on how this will be implemented, the hardware requirements, the external testsuite dependencies, and the effects on the rtems-test system.

comment:2 Changed on 11/07/22 at 09:00:48 by Matt Joyce

Hi Chris,

Yes, the intent is to validate the clock drivers for all BSPs. Each BSP would need to have a driver that implements an API that we will define for the test. We will first implement it on the stm32f4 BSP as a proof of concept and will then document it in the BSP’s How-to.

The only additional hardware requirements are an external GPS receiver with PPS functionality and an antenna. There should not be any effects on the rtems-test system.

comment:3 Changed on 11/09/22 at 03:47:54 by Chris Johns

What about suitable hardware IO on the board to receive the PPS signal and generate an interrupt? I assume it is a TTL level signal we are discussing?

How will the test be selected for each BSP in the build to avoid erroneous error reports for BSPs that do not have support?

What is the tolerance of the detection per BSP given the variability of the PPS latency, PPM of clock sources, temperature and timer hardware each BSP may have?

If I have a high clock frequency as an input signal to the timer and a large divider, what resolution and error range will be used to pass or fail the test?

I think this is an interesting and challenging problem to solve. I am interested to see how it can detect the off-by-one timer reload error you highlighted. For an A9 with a CPU x1 clock of 750 MHz and a peripheral clock of 350 MHz feeding the timer, the internal count for a 1 ms interrupt is 350,000 (I think). An error of one in the count is a small amount of time.
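The numbers in the comment above can be worked through directly. The sketch below takes the 350 MHz timer input and 1 ms tick from that comment and compares the relative error of a one-count reload mistake with an oscillator tolerance. The function names are hypothetical, and the 25 ppm tolerance used in the note afterwards is an assumed example value.

```c
#include <stdint.h>

/*
 * Relative tick-period error in parts per million when the timer counts
 * error_counts input cycles too many (or too few) per tick.
 */
static double reload_error_ppm(uint32_t counts_per_tick, uint32_t error_counts)
{
  return 1e6 * (double) error_counts / (double) counts_per_tick;
}

/* Time error in nanoseconds accumulated over one PPS interval (1 s). */
static double drift_ns_per_second(double error_ppm)
{
  return error_ppm * 1000.0; /* 1 ppm of one second is 1000 ns */
}

/*
 * A frequency error can only be blamed on the driver if it is larger
 * than what the oscillator tolerance alone could explain.
 */
static int exceeds_tolerance(double error_ppm, double osc_tolerance_ppm)
{
  return error_ppm > osc_tolerance_ppm;
}
```

With 350,000 counts per tick, a one-count error is about 2.86 ppm, or roughly 2.86 us per PPS interval. That is below an assumed 25 ppm oscillator tolerance, so a plain elapsed-time threshold would not by itself separate the driver bug from ordinary oscillator error on such a board.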

comment:4 Changed on 11/24/22 at 07:15:06 by blackbird

Adding on to Chris Johns's suggestions. Given the variety of hardware peripherals and systems affected by clock jitter, here are a few suggestions, if I may, on PPS/GPS input clocking, from prior experience implementing similar testing:

  1. Split the test suite into external/external HIL (Hardware in the Loop) and maybe SIL (Software in the Loop) pass/fail. Consider other devices on the clock tree if interested. For example, PCIe jitter testing, IEEE 1588, etc., can further aid system verification from an external timing perspective. This may help if SYSTICK is derived from a shared clock distribution source that could have its own sources of configuration/firmware/logic-ware/implementation errors on a single BSP. This may be vital to avoid misdiagnosing an off-by-one error that could be located in, say, a clock distribution configuration register unrelated to the driver being validated.
  2. GPS quality checks. Given the potential geographic distribution of users, test site limits, band configuration, and ephemeris, general quality-of-fix checks should be a condition for tests passing or failing. An atomic clock or high-end OCXO could in general avoid GPS dependence if Stratum 1 is sufficient for the test at hand, but may be out of reach from a cost and availability perspective.
  3. Consider a spec on minimum GPS holdover clocking performance for the external hardware versus target frequency and clock tree configuration. In an adverse environment, etc., a GPS receiver going into loss of fix can add variability through the packaged DCTCXOs and VCTCXOs freewheeling.
  4. Clock sync and distribution performance. Many of the clock distribution/jitter cleaner/sync ICs are rated worse than Stratum 3 and may not provide adequately jitter-free performance to test higher-speed clocks, as Chris alluded to. Though this issue is related to SYSTICK performance, my concern here is the variability of clocks and potential phase relationships across clock domains. This is especially applicable to FPGA metastability issues, clock domain crossing issues, soft-processor synthesis variability, clock distribution pathways for PPS input/output, and the derivation of SYSTICK.
  5. No-GPS solution: A < 100 PPB clock can provide adequate short-term timing performance, especially against the roughly 25 PPM clocks on many PCBs. The ability to frequency-tune an oscillator (say, a VCTCXO or a digitally tuned one) and PLL-lock it can also be helpful. Some testers lock both the input and output clock signals with known dividers derived from the clock tree config. This gets into spectrum analysis territory and can be a good way to validate the test suite itself. Granted, if someone is looking for 1000 SYSTICKs versus PPS, and they are off by 1 on a fast clock, none of these considerations matter since the missing tick is clearly observable. However, this is not a deterministic solution compared with a phase lock plus known-good hardware comparing the output with the BSP clock tree config. That approach is also flexible, as intermediate clocks can be easily validated as well, and many BSPs internally contain the reference hardware to implement it.
  6. Consider the accuracy and capability of onboard thermal sensors and SoC utilization/junction temperatures. A set of constraints/monitored variables in the test setup will help for hardware in the loop. Thermal variation by itself can produce a false positive for off-by-one testing or throw off jitter measurements. Granted, if SYSTICK frequency dividers are large, this is less of a problem as the error is more observable.
  7. Consider AVAR/ADEV statistics as an output for in-situ clock performance analysis, given the availability of higher-performance external reference clocks and adequate measurement hardware. This may help identify outliers and quantify performance of the on-board and off-board clocks for validation and outlier rejection. This data can then be used to choose good test times from time constants where oscillator performance is optimal after thermal settling. Other concerns on test threshold parameters that may vary between BSPs are oscillator models, aging characteristics, part changes, manufacturing batches, etc., some of which could be datasheet values depending on the pass/fail criterion.
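The AVAR statistic mentioned in the last suggestion above can be sketched as follows. This is only an illustration under stated assumptions: it supposes the test records how many timer counts elapse in each PPS interval, and it computes the simple non-overlapping Allan variance at the basic averaging time (one PPS interval). The function names and the measurement scheme are hypothetical, not part of any proposed test API.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fractional frequency offset of one measurement: how far the observed
 * count per PPS interval deviates from the nominal count, relative to
 * the nominal count.
 */
static double fractional_frequency(uint32_t measured_counts,
                                   uint32_t nominal_counts)
{
  return ((double) measured_counts - (double) nominal_counts)
         / (double) nominal_counts;
}

/*
 * Allan variance at the basic averaging time: half the mean of the
 * squared first differences of the fractional frequency samples y[].
 * Returns 0.0 when fewer than two samples are available.
 */
static double allan_variance(const double *y, size_t n)
{
  double sum = 0.0;

  if (n < 2) {
    return 0.0;
  }

  for (size_t i = 0; i + 1 < n; ++i) {
    double d = y[i + 1] - y[i];
    sum += d * d;
  }

  return sum / (2.0 * (double) (n - 1));
}
```

ADEV is the square root of this value. Note that a constant frequency offset, such as the one a fixed off-by-one reload error would cause, gives identical samples and therefore an Allan variance of zero. The statistic characterizes stability and jitter, so it would complement a separate accuracy check rather than replace it.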

These suggestions mostly concern jitter-related issues that I can see arising from the test suite. A free measurement of clock-related performance parameters off a GPSDO doesn't seem half bad, especially for Stratum 3E or better, which is natively supported by most clock distribution circuits on modern 5G, PCIe 5.0, IEEE 1588, and precision-timing-capable boards.

Last edited on 11/24/22 at 07:17:41 by blackbird