#2230 closed defect (fixed)

PowerPC with mpc6xx clock: Nanoseconds extension doesn't compensate for pending clock tick, time may go backwards

Reported by: Nick Withers Owned by: Nick Withers <nick.withers@…>
Priority: normal Milestone: 4.11
Component: bsps Version: 4.11
Severity: normal Keywords: clock, time, backwards, race, nanoseconds, monotonic
Cc: Blocked By:
Blocking:

Description

The PowerPC mpc6xx clock driver nanoseconds extension (Clock_driver_nanoseconds_since_last_tick()) is called with interrupts disabled and does not compensate for a pending clock tick interrupt when called.

As a result, the times returned by (e.g.) clock_gettime(CLOCK_MONOTONIC, ...) may not be monotonic - "time may go backwards".

This problem has been discussed on the users and devel mailing lists: http://lists.rtems.org/pipermail/users/2014-November/028354.html http://lists.rtems.org/pipermail/devel/2014-December/009446.html

I thought I'd move the discussion into a ticket as I'm so lacking in confidence with this change...

Attachments (1)

e500_decrementer.patch (3.4 KB) - added by Nick Withers on Jan 28, 2015 at 4:06:56 AM.
Current proposed patch

Download all attachments as: .zip

Change History (5)

comment:1 Changed on Jan 5, 2015 at 4:52:24 AM by Nick Withers

In attempting to address this in a clean way, I've had trouble trying to make sure I haven't introduced integer overflows / wraps on other BSPs, where values used in computations (Clock_Decrementer_value and BSP_bus_frequency/BSP_time_base_divisor) are potentially set at run-time, e.g., from c/src/lib/libbsp/powerpc/shared/startup/bspstart.c:

  BSP_bus_frequency       = residualCopy.VitalProductData.ProcessorBusHz;
  BSP_processor_frequency = residualCopy.VitalProductData.ProcessorHz;
  BSP_time_base_divisor   = (residualCopy.VitalProductData.TimeBaseDivisor?
                    residualCopy.VitalProductData.TimeBaseDivisor : 4000);

These changes potentially add precision (without ever reducing it? I'm struggling with my modulo maths...) and, I think, will not cause any not-already-present overflow / wrapping problems so long as BSP_bus_frequency/BSP_time_base_divisor >= 2 [1]. If that assumption's not safe, I could scale the newly-added Clock_Decrementer_reference value by half and the inverse operation by 2, though that feels a bit messy.

diff --git a/c/src/lib/libcpu/powerpc/mpc6xx/clock/c_clock.c b/c/src/lib/libcpu/powerpc/mpc6xx/clock/c_clock.c
index f14acab..4116894 100644
--- a/c/src/lib/libcpu/powerpc/mpc6xx/clock/c_clock.c
+++ b/c/src/lib/libcpu/powerpc/mpc6xx/clock/c_clock.c
@@ -48,6 +48,12 @@ volatile uint32_t   Clock_driver_ticks;
  */
 uint32_t   Clock_Decrementer_value;
 
+/*
+ *  This is the value by which elapsed count down timer ticks are multiplied to
+ *  give an elapsed duration in microseconds, left-shifted by 32 bits
+ */
+uint64_t   Clock_Decrementer_reference;
+
 void clockOff(void* unused)
 {
   rtems_interrupt_level l;
@@ -195,18 +201,36 @@ void Clock_exit( void )
 
 static uint32_t Clock_driver_nanoseconds_since_last_tick(void)
 {
-  uint32_t clicks, tmp;
+  uint32_t clicks;
+  uint64_t tmp;
 
   PPC_Get_decrementer( clicks );
 
   /*
-   * Multiply by 1000 here separately from below so we do not overflow
-   * and get a negative value.
+   * Check whether a clock tick interrupt is pending and the
+   * decrementer wrapped.
+   *
+   * If so, we'll compensate by returning a time one tick period longer.
+   *
+   * We have to check (Book E) interrupt status after reading the
+   * decrementer. If we don't, we may miss an interrupt and read a
+   * wrapped decrementer value without compensating for it
    */
-  tmp = (Clock_Decrementer_value - clicks) * 1000;
-  tmp /= (BSP_bus_frequency/BSP_time_base_divisor);
+  if ( ppc_cpu_is_bookE() && (_read_BOOKE_TSR() & BOOKE_TSR_DIS) )
+  {
+    /*
+     * Re-read the decrementer: The tick interrupt may have been
+     * generated and the decrementer wrapped during the time since we
+     * last read it and the time we checked the interrupt status
+     */
+    PPC_Get_decrementer( clicks );
+
+    tmp = Clock_Decrementer_value + (Clock_Decrementer_value - clicks);
+  }
+  else
+    tmp = Clock_Decrementer_value - clicks;
 
-  return tmp * 1000;
+  return (uint32_t)(((tmp * Clock_Decrementer_reference) >> 32) * 1000U);
 }
 
 /*
@@ -225,6 +249,9 @@ rtems_device_driver Clock_initialize(
   Clock_Decrementer_value = (BSP_bus_frequency/BSP_time_base_divisor)*
             (rtems_configuration_get_microseconds_per_tick()/1000);
 
+  Clock_Decrementer_reference = ((uint64_t)1000U<<32)/
+            (BSP_bus_frequency/BSP_time_base_divisor);
+
   /* set the decrementer now, prior to installing the handler
    * so no interrupts will happen in a while.
    */

If there're stronger assumptions that can be made, I could potentially increase the precision of the result further.

An alternate fix would add rtems_configuration_get_nanoseconds_per_tick() to the final result if a tick was pending, which seems a lot safer, but possibly less clean?

P.S.:

  • With either fix suggested here, time might still "go backwards" if a tick interrupt goes unserviced across another hardware decrementer wrap
  • One's complement, 1-is-zero values that can be returned by the non-Book-E decrementer don't appear to be properly handled in Clock_driver_nanoseconds_since_last_tick()
  • There's potential for improving the accuracy of the calculated Clock_Decrementer_value, but I haven't touched that (yet?); e.g., Clock_Decrementer_value on my MVME3100s is calculated to be 416660; 416666 would be more accurate and there'd therefore be less cumulative error in uptime

[1] Assumes that Clock_Decrementer_value * 1000 < 2*32 - 1 and ((Clock_Decrementer_value * 1000) / (BSP_bus_frequency/BSP_time_base_divisor)) * 1000 < 2^32 - 1, which must already hold unless tmp / the return in Clock_driver_nanoseconds_since_last_tick() can already overflow / wrap with the existing code

comment:2 Changed on Jan 20, 2015 at 6:29:45 AM by Sebastian Huber

I would drop the multiplication by 1000 and use something like this:

static uint32_t lpc_clock_nanoseconds_since_last_tick(void)
{

uint64_t k = (1000000000ULL << 32) / time_base_frequency;
uint64_t c = ticks_since_last_exception();

if ((TSR & DIS) != 0) {

c = ticks_since_last_exception() + period;

}

return (uint32_t) ((c * k) >> 32);

}

I would also do the Book E check only once and install different extensions instead. If you use the network stack, then this function is performance critical.

comment:3 in reply to:  2 Changed on Jan 28, 2015 at 4:01:08 AM by Nick Withers

Replying to sebastian.huber:

I would drop the multiplication by 1000 and use something like this:

Can you guarantee that 2 * Clock_Decrementer_value * ((uint64_t)1000000U<<32)/ (BSP_bus_frequency/BSP_time_base_divisor) won't overflow / wrap on any supported platform, though?

It happens not to on the MVME3100, where Clock_Decrementer_value is 416660 (assuming a 100 Hz tick), so *personally* I'm happy with that, as it increases precision.

static uint32_t lpc_clock_nanoseconds_since_last_tick(void)
{

uint64_t k = (1000000000ULL << 32) / time_base_frequency;

I've broken this out into Clock_Decrementer_reference (better name?), calculated in Clock_initialize(), because otherwise it'd be done at run-time every time the nanoseconds extension function were called.

uint64_t c = ticks_since_last_exception();

if ((TSR & DIS) != 0) {

c = ticks_since_last_exception() + period;

}

return (uint32_t) ((c * k) >> 32);

}

I would also do the Book E check only once and install different extensions instead. If you use the network stack, then this function is performance critical.

Righto, understood.

Changed on Jan 28, 2015 at 4:06:56 AM by Nick Withers

Attachment: e500_decrementer.patch added

Current proposed patch

comment:4 Changed on Jan 30, 2015 at 6:01:47 AM by Nick Withers <nick.withers@…>

Owner: set to Nick Withers <nick.withers@…>
Resolution: fixed
Status: newclosed

In d11b711b3ed6b4a053a298cf4e9f1209748fae5e/rtems:

bsps/powerpc: Fix a clock driver

PowerPC Book E: Account for an extra tick period if a tick increment's
pending.

Close #2230.

Note: See TracTickets for help on using tickets.