#2554 closed enhancement (fixed)

New watchdog handler implementation

Reported by: Sebastian Huber Owned by: Sebastian Huber
Priority: high Milestone: 4.12.0
Component: cpukit Version: 4.12
Severity: normal Keywords:



The watchdog handler uses delta chains. The insert operation has a O(n) worst-case time complexity with n being the count of watchdogs in the delta chain. In each step of the insert operation, the SMP lock of the corresponding watchdog header is acquired and released. The profiling data obtain by test program smptests/smpwakeafter01 showed that the current implementation leads to unacceptable latencies, thus it should be replaced by something else.

The use cases for the watchdog handler fall roughly into two categories.

  • Timeouts - used to detect if some operations needs more time than expected. Since the unexpected happens hopefully rarely, timeout timers are usually removed before they expire. The critical operations are insert and removal. They are important for the performance of a network stack.
  • Timers - used to carry out some work in the future. They usually expire and need a high resolution. An example user is a time driven scheduler, e.g. rate-monotonic or EDF.

One approach is to use a red-black tree with the expiration time as the key. This leads to O(log(n)) worst-case insert and removal operations. For each operation it is sufficient to acquire and release the lock only once. The drawback is that a 64-bit integer type must be used for the intervals to avoid a potential overflow of the key values. With a system tick interval of 1ns the system could run more than 500 years before an overflow happens. The EDF scheduler would also profit from a 64-bit interval representation, see #2173.

An alternative is the use of a timer wheel based algorithm which is used in Linux and FreeBSD for example. A timer wheel based algorithm offers O(1) worst-case time complexity for insert and removal operations. The drawback is that the run-time of the watchdog tick procedure is somewhat unpredictable due to the use of a hash table or cascading.

Which approach should we choose? Since the watchdog serves the timeout and timer services in RTEMS we have to make some trade-offs. We recommend to use the red-black tree approach which offers a more predictable run-time behaviour and sacrifice the constant insert and removal operations offered by the timer wheel algorithms, see also https://www.kernel.org/doc/ols/2006/ols2006v1-pages-333-346.pdf. We can reuse the red-black tree support already used for the thread priority queues.

The new watchdog handler implementation is a prerequisite to eliminate the Giant lock in the Classic Timer manager.


Change the _Watchdog_Ticks_since_boot to a 64-bit integer type. Keep the Watchdog_Interval at 32-bit for backward compatibility. Replace the delta chains with a red-black tree. Use the ticks for timers with a relative expiration time. Use struct timespec or struct bintime for timers with an absolute expiration time. This has the benefit that we do not have to adjust the data structures in case the absolute time changes, e.g. due to NTP. It simplifies the POSIX timer services, since no conversion to ticks is necessary.

Attachments (2)

tmtimer01-T4240-f831eff7387d9a4ae8460d639da660d5ae2ce4fa.png (32.5 KB) - added by Sebastian Huber on Mar 2, 2016 at 8:02:23 AM.
tmtimer01-T4240-next.png (59.0 KB) - added by Sebastian Huber on Mar 2, 2016 at 8:02:37 AM.

Download all attachments as: .zip

Change History (18)

comment:1 Changed on Jan 27, 2016 at 9:39:52 AM by Sebastian Huber

Profiling data (reduced to show only relevant information) obtained on the 24-processor T4240 system using the smptests/smpwakeafter01 test.

<ProfilingReport name="SMPWAKEAFTER 1">
  <PerCPUProfilingReport processorIndex="0">
    <MaxThreadDispatchDisabledTime unit="ns">3128351</MaxThreadDispatchDisabledTime>
    <MeanThreadDispatchDisabledTime unit="ns">97821</MeanThreadDispatchDisabledTime>
    <TotalThreadDispatchDisabledTime unit="ns">1435132321</TotalThreadDispatchDisabledTime>
    <MaxInterruptDelay unit="ns">0</MaxInterruptDelay>
    <MaxInterruptTime unit="ns">21766</MaxInterruptTime>
    <MeanInterruptTime unit="ns">10040</MeanInterruptTime>
    <TotalInterruptTime unit="ns">124466623</TotalInterruptTime>
  <PerCPUProfilingReport processorIndex="1">
    <MaxThreadDispatchDisabledTime unit="ns">1548547</MaxThreadDispatchDisabledTime>
    <MeanThreadDispatchDisabledTime unit="ns">79339</MeanThreadDispatchDisabledTime>
    <TotalThreadDispatchDisabledTime unit="ns">1621381982</TotalThreadDispatchDisabledTime>
    <MaxInterruptDelay unit="ns">0</MaxInterruptDelay>
    <MaxInterruptTime unit="ns">37</MaxInterruptTime>
    <MeanInterruptTime unit="ns">11</MeanInterruptTime>
    <TotalInterruptTime unit="ns">32946</TotalInterruptTime>
  <SMPLockProfilingReport name="Watchdog">
    <MaxAcquireTime unit="ns">14893</MaxAcquireTime>
    <MaxSectionTime unit="ns">2978</MaxSectionTime>
    <MeanAcquireTime unit="ns">32</MeanAcquireTime>
    <MeanSectionTime unit="ns">55</MeanSectionTime>
    <TotalAcquireTime unit="ns">1158320573</TotalAcquireTime>
    <TotalSectionTime unit="ns">2009011082</TotalSectionTime>
    <ContentionCount initialQueueLength="0">6874</ContentionCount>
    <ContentionCount initialQueueLength="1">6187</ContentionCount>
    <ContentionCount initialQueueLength="2">8266</ContentionCount>
    <ContentionCount initialQueueLength="3">35964717</ContentionCount>
  <SMPLockProfilingReport name="Giant">
    <MaxAcquireTime unit="ns">1803</MaxAcquireTime>
    <MaxSectionTime unit="ns">29041</MaxSectionTime>
    <MeanAcquireTime unit="ns">105</MeanAcquireTime>
    <MeanSectionTime unit="ns">5514</MeanSectionTime>
    <TotalAcquireTime unit="ns">87844</TotalAcquireTime>
    <TotalSectionTime unit="ns">4582492</TotalSectionTime>
    <ContentionCount initialQueueLength="0">831</ContentionCount>
    <ContentionCount initialQueueLength="1">0</ContentionCount>
    <ContentionCount initialQueueLength="2">0</ContentionCount>
    <ContentionCount initialQueueLength="3">0</ContentionCount>

The clock tick interval is 1ms (1000 ticks per second). We observe a maximum thread dispatch disabled time of 3.1ms on the first processor which serves the clock interrupt and 1.5ms on the other processors. There is no contention on the Giant lock. However, the watchdog lock is highly contended. The maximum acquire and section times are small compared to the maximum thread dispatch disabled time. The maximum interrupt time on the first processor is 0.012ms, so serving the clock interrupt is no a big deal This clearly indicates that the loop based updates of the delta chains are the main issue.

comment:2 Changed on Feb 4, 2016 at 11:59:40 AM by Sebastian Huber

Status: newaccepted

comment:3 Changed on Feb 17, 2016 at 10:54:37 AM by Sebastian Huber <sebastian.huber@…>

In 10f28914f8bac5b8676ce1b9cc7efeccb2b33bb3/rtems:

smptests/smpwakeafter01: Add scheduler config

Update #2554.

comment:4 Changed on Feb 18, 2016 at 7:58:15 PM by Joel Sherrill

I am not opposed to the use of RB Trees or the use of 64 bit time for absolute/calendar time.

I would like to see us move away from ticks on the relative timer and move to something more like CLOCK_MONOTONIC based on uptime.

I assume the new ticks key value will be the "watchdog ticks since boot" value to trigger the watchdogs on. I think this is really still OK based on the math but limits the durations one can delay and the granularity.

The extra granularity may allow us to move to a tickless or variable clock tick in the future. Although I am still concerned that reprogramming the timer on the fly is potentially more dangerous and error prone than a fixed tick.

comment:5 Changed on Mar 2, 2016 at 8:01:39 AM by Sebastian Huber <sebastian.huber@…>

In f831eff7387d9a4ae8460d639da660d5ae2ce4fa/rtems:

tmtests/tmtimer01: New test

Test run performed on T4240 running at 1667MHz in uni-processor

Update #2554.

Changed on Mar 2, 2016 at 8:02:37 AM by Sebastian Huber

Attachment: tmtimer01-T4240-next.png added

comment:6 Changed on Mar 2, 2016 at 8:19:43 AM by Sebastian Huber

Test results (some data omitted the middle) for new test tmtimer01 executed on a T4240 running at 1667MHz in uni-processor configuration using [f831eff7387d9a4ae8460d639da660d5ae2ce4fa/rtems]:

<TMTimer01 timerCount="65504">
    <ActiveTimers>0</ActiveTimers><First unit="ns">6397</First><Middle unit="ns">3290</Middle><Last unit="ns">1527</Last>
    <ActiveTimers>2</ActiveTimers><First unit="ns">1986</First><Middle unit="ns">1550</Middle><Last unit="ns">1314</Last>
    <ActiveTimers>4</ActiveTimers><First unit="ns">1551</First><Middle unit="ns">2134</Middle><Last unit="ns">3072</Last>
    <ActiveTimers>7</ActiveTimers><First unit="ns">2545</First><Middle unit="ns">3511</Middle><Last unit="ns">3404</Last>
    <ActiveTimers>10</ActiveTimers><First unit="ns">1595</First><Middle unit="ns">2363</Middle><Last unit="ns">4287</Last>
    <ActiveTimers>14</ActiveTimers><First unit="ns">1473</First><Middle unit="ns">2740</Middle><Last unit="ns">6069</Last>
    <ActiveTimers>55772</ActiveTimers><First unit="ns">1979</First><Middle unit="ns">11815557</Middle><Last unit="ns">21907509</Last>
    <ActiveTimers>65503</ActiveTimers><First unit="ns">2404</First><Middle unit="ns">13694591</Middle><Last unit="ns">26215885</Last>

Last edited on Mar 2, 2016 at 8:20:54 AM by Sebastian Huber (previous) (diff)

comment:7 Changed on Mar 2, 2016 at 8:22:54 AM by Sebastian Huber

Test results (some data omitted the middle) for new test tmtimer01 executed on a T4240 running at 1667MHz in uni-processor configuration using the new red-black tree based implementation:

<TMTimer01 timerCount="65504">
    <ActiveTimers>0</ActiveTimers><First unit="ns">8812</First><Middle unit="ns">1412</Middle><Last unit="ns">917</Last>
    <ActiveTimers>2</ActiveTimers><First unit="ns">1602</First><Middle unit="ns">1010</Middle><Last unit="ns">1367</Last>
    <ActiveTimers>4</ActiveTimers><First unit="ns">1524</First><Middle unit="ns">1089</Middle><Last unit="ns">1086</Last>
    <ActiveTimers>7</ActiveTimers><First unit="ns">1791</First><Middle unit="ns">1121</Middle><Last unit="ns">1838</Last>
    <ActiveTimers>10</ActiveTimers><First unit="ns">1488</First><Middle unit="ns">1016</Middle><Last unit="ns">2134</Last>
    <ActiveTimers>14</ActiveTimers><First unit="ns">1527</First><Middle unit="ns">1698</Middle><Last unit="ns">3186</Last>
    <ActiveTimers>55772</ActiveTimers><First unit="ns">8949</First><Middle unit="ns">10031</Middle><Last unit="ns">16262</Last>
    <ActiveTimers>65503</ActiveTimers><First unit="ns">9199</First><Middle unit="ns">10309</Middle><Last unit="ns">19090</Last>

comment:8 Changed on Mar 2, 2016 at 9:26:19 AM by Sebastian Huber

Profiling data (reduced to show only relevant information) of smpwakeafter01 executed on a T4240 using 24 processors and the new red-black tree based implementation.

<ProfilingReport name="SMPWAKEAFTER 1">
  <PerCPUProfilingReport processorIndex="0">
    <MaxThreadDispatchDisabledTime unit="ns">85615</MaxThreadDispatchDisabledTime>
    <MeanThreadDispatchDisabledTime unit="ns">15508</MeanThreadDispatchDisabledTime>
    <TotalThreadDispatchDisabledTime unit="ns">563813241</TotalThreadDispatchDisabledTime>
    <MaxInterruptDelay unit="ns">0</MaxInterruptDelay>
    <MaxInterruptTime unit="ns">4771</MaxInterruptTime>
    <MeanInterruptTime unit="ns">482</MeanInterruptTime>
    <TotalInterruptTime unit="ns">11144389</TotalInterruptTime>
  <PerCPUProfilingReport processorIndex="1">
    <MaxThreadDispatchDisabledTime unit="ns">53595</MaxThreadDispatchDisabledTime>
    <MeanThreadDispatchDisabledTime unit="ns">14437</MeanThreadDispatchDisabledTime>
    <TotalThreadDispatchDisabledTime unit="ns">493386525</TotalThreadDispatchDisabledTime>
    <MaxInterruptDelay unit="ns">0</MaxInterruptDelay>
    <MaxInterruptTime unit="ns">3094</MaxInterruptTime>
    <MeanInterruptTime unit="ns">844</MeanInterruptTime>
    <TotalInterruptTime unit="ns">8783271</TotalInterruptTime>
  <PerCPUProfilingReport processorIndex="2">
    <MaxThreadDispatchDisabledTime unit="ns">87633</MaxThreadDispatchDisabledTime>
    <MeanThreadDispatchDisabledTime unit="ns">15055</MeanThreadDispatchDisabledTime>
    <TotalThreadDispatchDisabledTime unit="ns">561768649</TotalThreadDispatchDisabledTime>
    <MaxInterruptDelay unit="ns">0</MaxInterruptDelay>
    <MaxInterruptTime unit="ns">4884</MaxInterruptTime>
    <MeanInterruptTime unit="ns">447</MeanInterruptTime>
    <TotalInterruptTime unit="ns">10309678</TotalInterruptTime>
  <SMPLockProfilingReport name="Giant">
    <MaxAcquireTime unit="ns">1924</MaxAcquireTime>
    <MaxSectionTime unit="ns">89496</MaxSectionTime>
    <MeanAcquireTime unit="ns">116</MeanAcquireTime>
    <MeanSectionTime unit="ns">4644</MeanSectionTime>
    <TotalAcquireTime unit="ns">119374</TotalAcquireTime>
    <TotalSectionTime unit="ns">4751695</TotalSectionTime>
    <ContentionCount initialQueueLength="0">1023</ContentionCount>
    <ContentionCount initialQueueLength="1">0</ContentionCount>
    <ContentionCount initialQueueLength="2">0</ContentionCount>
    <ContentionCount initialQueueLength="3">0</ContentionCount>
  <SMPLockProfilingReport name="Scheduler">
    <MaxAcquireTime unit="ns">54115</MaxAcquireTime>
    <MaxSectionTime unit="ns">9609</MaxSectionTime>
    <MeanAcquireTime unit="ns">1154</MeanAcquireTime>
    <MeanSectionTime unit="ns">881</MeanSectionTime>
    <TotalAcquireTime unit="ns">1357899955</TotalAcquireTime>
    <TotalSectionTime unit="ns">1035917001</TotalSectionTime>
    <ContentionCount initialQueueLength="0">16188</ContentionCount>
    <ContentionCount initialQueueLength="1">14403</ContentionCount>
    <ContentionCount initialQueueLength="2">14350</ContentionCount>
    <ContentionCount initialQueueLength="3">1130801</ContentionCount>
  <SMPLockProfilingReport name="Watchdog">
    <MaxAcquireTime unit="ns">686</MaxAcquireTime>
    <MaxSectionTime unit="ns">2220</MaxSectionTime>
    <MeanAcquireTime unit="ns">61</MeanAcquireTime>
    <MeanSectionTime unit="ns">78</MeanSectionTime>
    <TotalAcquireTime unit="ns">4271819</TotalAcquireTime>
    <TotalSectionTime unit="ns">5394270</TotalSectionTime>
    <ContentionCount initialQueueLength="0">68964</ContentionCount>
    <ContentionCount initialQueueLength="1">0</ContentionCount>
    <ContentionCount initialQueueLength="2">0</ContentionCount>
    <ContentionCount initialQueueLength="3">0</ContentionCount>

The maximum thread dispatch disabled time drops from 3ms to about 100us. There is now one watchdog header per-CPU. Profiling data is only shown for one of the watchdog locks. No contention is visible here. The new bottle neck is the scheduler lock, but this is a different issue. In addition the interrupt count on scheduler A is about two times the clock tick interrupt count. This means that each clock tick service results in an inter-processor interrupt. This highlights one of the known problems of a global priority based scheduler: massive thread migrations.

comment:10 Changed on Mar 18, 2016 at 6:47:28 AM by Sebastian Huber <sebastian.huber@…>

In 90de3001089a8de300d9339b2d8d49a51ebd9674/rtems:

score: Destroy thread timer lock

Update #2554.

comment:11 Changed on May 12, 2016 at 11:35:43 AM by Sebastian Huber <sebastian.huber@…>

In 1379d840a483ccdce109fa8e45ef63d51a6e8e00/rtems:

smptests/smpcapture02: Adjust for clock changes

Fix overall clock tick count. Change introduced by

Update #2554.

comment:12 Changed on Dec 23, 2016 at 2:10:09 PM by Sebastian Huber

Priority: normalhigh

comment:15 Changed on Jan 30, 2017 at 1:23:07 PM by Sebastian Huber

Resolution: fixed
Status: acceptedclosed

comment:16 Changed on May 11, 2017 at 7:31:02 AM by Sebastian Huber

Note: See TracTickets for help on using tickets.