source: rtems-docs/c-user/symmetric_multiprocessing_services.rst @ 6c56401

Last change on this file since 6c56401 was 6c56401, checked in by Chris Johns <chrisj@…>, on 11/12/17 at 03:34:48

c-user: Fix index locations.

Update #3229.

.. comment SPDX-License-Identifier: CC-BY-SA-4.0

.. COMMENT: COPYRIGHT (c) 2014.
.. COMMENT: On-Line Applications Research Corporation (OAR).
.. COMMENT: Copyright (c) 2017 embedded brains GmbH.
.. COMMENT: All rights reserved.

.. index:: Symmetric Multiprocessing
.. index:: SMP

Symmetric Multiprocessing (SMP)
*******************************

Introduction
============

The Symmetric Multiprocessing (SMP) support of RTEMS is available on

- ARMv7-A,

- PowerPC, and

- SPARC.

.. warning::

   The SMP support must be explicitly enabled via the ``--enable-smp``
   configure command line option for the :term:`BSP` build.

RTEMS is supposed to be a real-time operating system.  What does this mean in
the context of SMP?  The RTEMS interpretation of real-time on SMP is the
support for :ref:`ClusteredScheduling` with priority based schedulers and
adequate locking protocols.  One aim is to enable a schedulability analysis
under the sporadic task model :cite:`Brandenburg:2011:SL`
:cite:`Burns:2013:MrsP`.

The directives provided by the SMP support are:

- rtems_get_processor_count_ - Get processor count

- rtems_get_current_processor_ - Get current processor index

Background
==========

Application Configuration
-------------------------

By default, the maximum processor count is set to one in the application
configuration.  To enable SMP, the application configuration option
:ref:`CONFIGURE_MAXIMUM_PROCESSORS <CONFIGURE_MAXIMUM_PROCESSORS>` must be
defined to a value greater than one.  It is recommended to use the smallest
value suitable for the application in order to save memory.  For example, each
processor needs its own idle thread and interrupt stack.

The default scheduler for SMP applications supports up to 32 processors and is
a global fixed priority scheduler; see also :ref:`Configuring Clustered
Schedulers`.

The following compile-time test can be used to check whether SMP support is
available.

.. code-block:: c

    #include <rtems.h>

    #ifdef RTEMS_SMP
    #warning "SMP support is enabled"
    #else
    #warning "SMP support is disabled"
    #endif

Examples
--------

For example applications, see `testsuites/smptests
<https://git.rtems.org/rtems/tree/testsuites/smptests>`_.

Uniprocessor versus SMP Parallelism
-----------------------------------

Uniprocessor systems have long been used in embedded systems.  In this hardware
model, there are some system execution characteristics which have long been
taken for granted:

- one task executes at a time

- hardware events result in interrupts

There is no true parallelism.  Even when interrupts appear to occur at the same
time, they are processed in a largely serial fashion.  This is true even when
the interrupt service routines are allowed to nest.  From a tasking viewpoint,
it is the responsibility of the real-time operating system to simulate
parallelism by switching between tasks.  These task switches occur in response
to hardware interrupt events and explicit application events such as blocking
for a resource or delaying.

With symmetric multiprocessing, the presence of multiple processors allows for
true concurrency and provides for cost-effective performance improvements.
Uniprocessors tend to increase performance by increasing clock speed and
complexity.  This tends to lead to hot, power-hungry microprocessors which are
poorly suited for many embedded applications.

The true concurrency is in sharp contrast to the single task and interrupt
model of uniprocessor systems.  This results in a fundamental change to the
uniprocessor system characteristics listed above.  Developers are faced with a
different set of characteristics which, in turn, break some existing
assumptions and result in new challenges.  In an SMP system with N processors,
these are the new execution characteristics:

- N tasks execute in parallel

- hardware events result in interrupts

There is true parallelism with a task executing on each processor and the
possibility of interrupts occurring on each processor.  Thus, in contrast to
there being one task and one interrupt to consider on a uniprocessor, there are
N tasks and potentially N simultaneous interrupts to consider on an SMP system.

This increase in hardware complexity and presence of true parallelism results
in the application developer needing to be even more cautious about mutual
exclusion and shared data access than in a uniprocessor embedded system.  Race
conditions that never or rarely happened when an application executed on a
uniprocessor system become much more likely due to multiple threads executing
in parallel.  On a uniprocessor system, these race conditions would only happen
when a task switch occurred at just the wrong moment.  Now N-1 other tasks
execute in parallel all the time, which results in many more opportunities for
small windows in critical sections to be hit.
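
The following sketch illustrates such a critical section.  It uses plain POSIX
threads on a hosted system rather than RTEMS directives.  The names
``run_incrementers()``, ``increment_shared_counter()``, ``ITERATIONS`` and
``MAX_THREADS`` are hypothetical.  Each thread performs a read-modify-write of a
shared counter.  The mutex turns this into a critical section, so no increment
can be lost while the threads execute in parallel.

.. code-block:: c

    #include <assert.h>
    #include <pthread.h>
    #include <stddef.h>

    #define ITERATIONS 100000
    #define MAX_THREADS 16

    /* Hypothetical shared data, accessed by all threads */
    static unsigned long shared_counter;
    static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;

    static void *increment_shared_counter( void *arg )
    {
      int i;

      (void) arg;

      for ( i = 0; i < ITERATIONS; ++i ) {
        int error;

        /*
         * The read-modify-write of the counter is a critical section.  Without
         * the mutex, two threads executing in parallel could read the same old
         * value and one of the two increments would be lost.
         */
        error = pthread_mutex_lock( &counter_mutex );
        assert( error == 0 );
        ++shared_counter;
        error = pthread_mutex_unlock( &counter_mutex );
        assert( error == 0 );
      }

      return NULL;
    }

    /* Runs thread_count threads in parallel and returns the final count */
    unsigned long run_incrementers( int thread_count )
    {
      pthread_t threads[ MAX_THREADS ];
      int i;

      assert( thread_count > 0 && thread_count <= MAX_THREADS );
      shared_counter = 0;

      for ( i = 0; i < thread_count; ++i ) {
        int error = pthread_create(
          &threads[ i ],
          NULL,
          increment_shared_counter,
          NULL
        );
        assert( error == 0 );
      }

      for ( i = 0; i < thread_count; ++i ) {
        int error = pthread_join( threads[ i ], NULL );
        assert( error == 0 );
      }

      return shared_counter;
    }

With the mutex, ``run_incrementers( 4 )`` returns ``4 * ITERATIONS``.  If the
lock and unlock calls are removed, a multiprocessor will typically produce a
smaller result that varies from run to run.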

.. index:: task affinity
.. index:: thread affinity

Task Affinity
-------------

RTEMS provides services to manipulate the affinity of a task.  Affinity is used
to specify the subset of processors in an SMP system on which a particular task
can execute.

By default, tasks have an affinity which allows them to execute on any
available processor.
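
An affinity is a set of processor indices.  The following host-side sketch
builds such sets with the ``cpu_set_t`` macros from ``<sched.h>``.  It assumes
glibc with ``_GNU_SOURCE``.  On RTEMS, a set like this would be passed together
with its size to :ref:`rtems_task_set_affinity() <rtems_task_set_affinity>`.
That call is not made here.  ``make_affinity()`` and ``may_execute_on()`` are
hypothetical helpers.

.. code-block:: c

    #define _GNU_SOURCE /* for the cpu_set_t macros of glibc */
    #include <assert.h>
    #include <sched.h>
    #include <stdbool.h>

    /* Hypothetical helper: affinity with the processors first up to last */
    cpu_set_t make_affinity( int first, int last )
    {
      cpu_set_t set;
      int cpu;

      assert( 0 <= first && first <= last && last < CPU_SETSIZE );
      CPU_ZERO( &set );

      for ( cpu = first; cpu <= last; ++cpu ) {
        CPU_SET( cpu, &set );
      }

      return set;
    }

    /* A task may only execute on processors contained in its affinity */
    bool may_execute_on( const cpu_set_t *affinity, int cpu )
    {
      return CPU_ISSET( cpu, affinity ) != 0;
    }

The default affinity described above corresponds to a set containing all
processors.  In this sketch, that is ``make_affinity( 0, count - 1 )`` for a
processor count ``count``.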

Task affinity is an optional feature of SMP-aware schedulers, and only a subset
of the available schedulers support it.  Although the behavior is scheduler
specific, a scheduler that does not support affinity is likely to ignore all
attempts to set it.

The scheduler with support for arbitrary processor affinities uses a proof of
concept implementation.  See https://devel.rtems.org/ticket/2510.

.. index:: task migration
.. index:: thread migration

Task Migration
--------------

With more than one processor in the system, tasks can migrate from one
processor to another.  There are four reasons why tasks migrate in RTEMS:

- The scheduler changes explicitly via
  :ref:`rtems_task_set_scheduler() <rtems_task_set_scheduler>` or similar
  directives.

- The task processor affinity changes explicitly via
  :ref:`rtems_task_set_affinity() <rtems_task_set_affinity>` or similar
  directives.

- The task resumes execution after a blocking operation.  On a priority based
  scheduler, it will evict the lowest priority task currently assigned to a
  processor in the processor set managed by the scheduler instance.

- The task moves temporarily to another scheduler instance due to locking
  protocols like the :ref:`MrsP` or the :ref:`OMIP`.

Task migration should be avoided so that the working set of a task can stay on
the most local cache level.

.. _ClusteredScheduling:

Clustered Scheduling
--------------------

The scheduler is responsible for assigning processors to some of the threads
which are ready to execute.  Trouble starts if more ready threads than
processors exist at the same time.  There are various rules for how the
processor assignment can be performed, attempting to fulfill additional
constraints or yield some overall system properties.  As a matter of fact, it
is impossible to meet all requirements at the same time.  The way a scheduler
works distinguishes real-time operating systems from general purpose operating
systems.

We speak of clustered scheduling if the set of processors of a system is
partitioned into non-empty pairwise-disjoint subsets of processors.  These
subsets are called clusters.  Clusters with a cardinality of one are
partitions.  Each cluster is owned by exactly one scheduler instance.  If the
cluster size equals the processor count, it is called global scheduling.
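
The definition above can be checked mechanically.  This sketch makes a few
assumptions for brevity:

- each cluster is a bit mask of processor indices,
- the check is therefore limited to 32 processors, and
- ``classify_clusters()`` is a hypothetical helper, not an RTEMS API.

The helper verifies that the clusters are non-empty, pairwise disjoint and cover
all processors.  It then distinguishes global, partitioned and general clustered
scheduling.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef enum {
      CLUSTERS_INVALID,     /* empty, overlapping, or not covering all */
      CLUSTERS_GLOBAL,      /* one cluster contains all processors */
      CLUSTERS_PARTITIONED, /* every cluster contains exactly one processor */
      CLUSTERS_CLUSTERED    /* any other valid partition */
    } cluster_kind;

    static int popcount32( uint32_t x )
    {
      int n = 0;

      while ( x != 0 ) {
        x &= x - 1; /* clear the lowest set bit */
        ++n;
      }

      return n;
    }

    cluster_kind classify_clusters(
      const uint32_t *clusters,
      size_t cluster_count,
      int processor_count
    )
    {
      uint32_t all;
      uint32_t seen = 0;
      size_t partitions = 0;
      size_t i;

      assert( processor_count > 0 && processor_count <= 32 );

      if ( processor_count == 32 ) {
        all = UINT32_MAX;
      } else {
        all = ( UINT32_C( 1 ) << processor_count ) - 1;
      }

      for ( i = 0; i < cluster_count; ++i ) {
        if ( clusters[ i ] == 0 || ( clusters[ i ] & seen ) != 0 ) {
          return CLUSTERS_INVALID; /* empty or not pairwise disjoint */
        }

        seen |= clusters[ i ];

        if ( popcount32( clusters[ i ] ) == 1 ) {
          ++partitions;
        }
      }

      if ( seen != all ) {
        return CLUSTERS_INVALID; /* processors missing or out of range */
      }

      if ( cluster_count == 1 ) {
        return CLUSTERS_GLOBAL;
      }

      return partitions == cluster_count ? CLUSTERS_PARTITIONED
                                         : CLUSTERS_CLUSTERED;
    }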

Modern SMP systems have multi-layer caches.  An operating system which neglects
cache constraints in the scheduler will not yield good performance.  Real-time
operating systems usually provide priority (fixed or job-level) based
schedulers so that each of the highest priority threads is assigned to a
processor.  Priority based schedulers have difficulties in providing cache
locality for threads and may suffer from excessive thread migrations
:cite:`Brandenburg:2011:SL` :cite:`Compagnin:2014:RUN`.  Schedulers that use
local run queues and some sort of load-balancing to improve the cache
utilization may not fulfill global constraints :cite:`Gujarati:2013:LPP` and
are more difficult to implement than one would normally expect
:cite:`Lozi:2016:LSDWC`.

Clustered scheduling was implemented for RTEMS SMP to best use the cache
topology of a system and to keep the worst-case latencies under control.  The
low-level SMP locks use FIFO ordering.  Consequently, the worst-case run-time of
operations increases with each processor involved.  The scheduler configuration
is quite flexible and done at link-time; see :ref:`Configuring Clustered
Schedulers`.  It is possible to re-assign processors to schedulers during
run-time via :ref:`rtems_scheduler_add_processor()
<rtems_scheduler_add_processor>` and :ref:`rtems_scheduler_remove_processor()
<rtems_scheduler_remove_processor>`.  The schedulers are implemented in an
object-oriented fashion.

The remaining problem is to provide synchronization primitives for
inter-cluster synchronization, where more than one cluster is involved in the
synchronization process.  In RTEMS, the following means are currently
available:

- events,

- message queues,

- mutexes using the :ref:`OMIP`,

- mutexes using the :ref:`MrsP`, and

- binary and counting semaphores.

The clustered scheduling approach enables separation of functions with
real-time requirements and functions that profit from fairness and high
throughput, provided the scheduler instances are fully decoupled and adequate
inter-cluster synchronization primitives are used.

To set the scheduler of a task, see :ref:`rtems_scheduler_ident()
<rtems_scheduler_ident>` and :ref:`rtems_task_set_scheduler()
<rtems_task_set_scheduler>`.

OpenMP
------

OpenMP support for RTEMS is available via the GCC provided libgomp.  There is
libgomp support for RTEMS in the POSIX configuration of libgomp since GCC 4.9
(requires a Newlib snapshot after 2015-03-12).  In GCC 6.1 or later (requires a
Newlib snapshot after 2015-07-30 for <sys/lock.h> provided self-contained
synchronization objects) there is a specialized libgomp configuration for RTEMS
which offers a significantly better performance compared to the POSIX
configuration of libgomp.  In addition, application configurable thread pools
for each scheduler instance are available in GCC 6.1 or later.

The run-time configuration of libgomp is done via environment variables
documented in the `libgomp manual <https://gcc.gnu.org/onlinedocs/libgomp/>`_.
The environment variables are evaluated in a constructor function which
executes in the context of the first initialization task before the actual
initialization task function is called (just like a global C++ constructor).
To set application specific values, a higher priority constructor function must
be used to set up the environment variables.

.. code-block:: c

    #include <stdlib.h>

    /*
     * Higher priority constructor, so that the environment variables are set
     * before the libgomp constructor evaluates them.
     */
    void __attribute__((constructor(1000))) config_libgomp( void )
    {
        /* Display the OpenMP environment during initialization */
        setenv( "OMP_DISPLAY_ENV", "VERBOSE", 1 );
        /* Wait actively for this long before waiting passively */
        setenv( "GOMP_SPINCOUNT", "30000", 1 );
        /* One thread pool with worker priority 2 for scheduler "SCHD" */
        setenv( "GOMP_RTEMS_THREAD_POOLS", "1$2@SCHD", 1 );
    }

The environment variable ``GOMP_RTEMS_THREAD_POOLS`` is RTEMS-specific.  It
determines the thread pools for each scheduler instance.  The format for
``GOMP_RTEMS_THREAD_POOLS`` is a list of optional
``<thread-pool-count>[$<priority>]@<scheduler-name>`` configurations separated
by ``:`` where:

- ``<thread-pool-count>`` is the thread pool count for this scheduler instance.

- ``$<priority>`` is an optional priority for the worker threads of a thread
  pool according to ``pthread_setschedparam``.  If a priority value is omitted,
  a worker thread inherits the priority of the OpenMP master thread that
  created it.  The priority of the worker thread is not changed by libgomp
  after creation, even if a new OpenMP master thread using the worker has a
  different priority.

- ``@<scheduler-name>`` is the scheduler instance name according to the RTEMS
  application configuration.

If no thread pool configuration is specified for a scheduler instance, each
OpenMP master thread of this scheduler instance will use its own dynamically
allocated thread pool.  To limit the worker thread count of the thread pools,
each OpenMP master thread must call ``omp_set_num_threads``.

Let's suppose we have three scheduler instances ``IO``, ``WRK0``, and ``WRK1``
with ``GOMP_RTEMS_THREAD_POOLS`` set to ``"1@WRK0:3$4@WRK1"``.  Then there are
no thread pool restrictions for scheduler instance ``IO``.  In the scheduler
instance ``WRK0`` there is one thread pool available.  Since no priority is
specified for this scheduler instance, the worker thread inherits the priority
of the OpenMP master thread that created it.  In the scheduler instance
``WRK1`` there are three thread pools available and their worker threads run at
priority four.
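
To make the format concrete, the following sketch parses such a string into
per-scheduler entries.  It illustrates the format described above.  It is not
the parser used by libgomp.  The names ``parse_thread_pools()`` and
``thread_pool_config`` are hypothetical, and so is the fixed-size name buffer.
Error handling only rejects malformed entries.  Applied to
``"1@WRK0:3$4@WRK1"``, the sketch yields the two configurations of the example.

.. code-block:: c

    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
      unsigned long pool_count;
      long priority;             /* -1 if omitted: inherit master priority */
      char scheduler_name[ 16 ]; /* hypothetical fixed-size name buffer */
    } thread_pool_config;

    /*
     * Parses "<count>[$<priority>]@<name>" entries separated by ':'.  Returns
     * the number of entries stored in configs, or -1 on a malformed entry.
     */
    int parse_thread_pools( const char *s, thread_pool_config *configs, int max )
    {
      int n = 0;

      while ( *s != '\0' ) {
        thread_pool_config *c;
        char *end;
        size_t len;

        if ( n == max ) {
          return -1;
        }

        c = &configs[ n ];
        c->pool_count = strtoul( s, &end, 10 );

        if ( end == s ) {
          return -1; /* the thread pool count is mandatory */
        }

        c->priority = -1;

        if ( *end == '$' ) {
          s = end + 1;
          c->priority = strtol( s, &end, 10 );

          if ( end == s ) {
            return -1;
          }
        }

        if ( *end != '@' ) {
          return -1;
        }

        s = end + 1;
        len = strcspn( s, ":" );

        if ( len == 0 || len >= sizeof( c->scheduler_name ) ) {
          return -1;
        }

        memcpy( c->scheduler_name, s, len );
        c->scheduler_name[ len ] = '\0';
        s += len;
        ++n;

        if ( *s == ':' ) {
          ++s;
        }
      }

      return n;
    }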

Application Issues
==================

Most operating system services provided by the uni-processor RTEMS are
available in SMP configurations as well.  However, applications designed for a
uni-processor environment may need some changes to correctly run in an SMP
configuration.

As discussed earlier, SMP systems have opportunities for true parallelism which
were not possible on uni-processor systems.  Consequently, multiple techniques
that provided adequate critical sections on uni-processor systems are unsafe on
SMP systems.  In this section, some of these unsafe techniques will be
discussed.

In general, applications must use proper operating system provided mutual
exclusion mechanisms to ensure correct behavior.

Task variables
--------------

Task variables are ordinary global variables with a dedicated value for each
thread.  During a context switch from the executing thread to the heir thread,
the value of each task variable is saved to the thread control block of the
executing thread and restored from the thread control block of the heir thread.
This is inherently broken if more than one executing thread exists, since all
executing threads would then use the single storage location of the global
variable at the same time.  Alternatives to task variables are POSIX keys and
:ref:`TLS <TLS>`.  All use cases of task variables in the RTEMS code base were
replaced with alternatives.  The task variable API has been removed in RTEMS
5.1.
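
The following sketch contrasts the two alternatives on a hosted POSIX system:

- a C11 ``_Thread_local`` variable provides thread-local storage, and
- ``pthread_key_create()`` and ``pthread_setspecific()`` provide a POSIX key.

A barrier makes both threads store their values before either thread reads them
back.  Shared storage, as with a task variable, would therefore show up as a
wrong value.  The function names are hypothetical.

.. code-block:: c

    #define _POSIX_C_SOURCE 200809L /* for pthread_barrier_t */
    #include <assert.h>
    #include <pthread.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Thread-local storage: every thread has its own instance */
    static _Thread_local int tls_value;

    /* POSIX key: every thread has its own value slot for this key */
    static pthread_key_t value_key;
    static pthread_barrier_t values_stored;

    static void *use_thread_specific_values( void *arg )
    {
      int value = (int) (intptr_t) arg;
      int error;

      tls_value = value;
      error = pthread_setspecific( value_key, arg );
      assert( error == 0 );

      /* Wait until both threads stored their values, then read them back */
      error = pthread_barrier_wait( &values_stored );
      assert( error == 0 || error == PTHREAD_BARRIER_SERIAL_THREAD );

      if ( tls_value == value && pthread_getspecific( value_key ) == arg ) {
        return arg; /* non-NULL: the thread saw only its own values */
      }

      return NULL;
    }

    /* Returns 1 if two parallel threads each observed only their own values */
    int thread_specific_values_are_isolated( void )
    {
      pthread_t threads[ 2 ];
      void *results[ 2 ];
      int ok = 1;
      int error;
      int i;

      error = pthread_key_create( &value_key, NULL );
      assert( error == 0 );
      error = pthread_barrier_init( &values_stored, NULL, 2 );
      assert( error == 0 );

      for ( i = 0; i < 2; ++i ) {
        error = pthread_create(
          &threads[ i ],
          NULL,
          use_thread_specific_values,
          (void *) (intptr_t) ( i + 1 )
        );
        assert( error == 0 );
      }

      for ( i = 0; i < 2; ++i ) {
        error = pthread_join( threads[ i ], &results[ i ] );
        assert( error == 0 );
        ok = ok && results[ i ] != NULL;
      }

      pthread_barrier_destroy( &values_stored );
      pthread_key_delete( value_key );
      return ok;
    }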

Highest Priority Thread Never Walks Alone
-----------------------------------------

On a uni-processor system, it is safe to assume that when the highest priority
task in an application executes, it will execute without being preempted until
it voluntarily blocks.  Interrupts may occur while it is executing, but there
will be no context switch to another task unless the highest priority task
voluntarily initiates it.

Given the assumption that no other tasks will have their execution interleaved
with the highest priority task, it is possible for this task to be constructed
such that it does not need to acquire a mutex for protected access to shared
data.

In an SMP system, it cannot be assumed that only a single task executes at a
time.  It should be assumed that every other processor is executing another
application task.  Further, those tasks will be ones which would not have been
executed in a uni-processor configuration and should be assumed to have data
synchronization conflicts with what was formerly the highest priority task
which executed without conflict.

Disabling of Thread Preemption
------------------------------

A thread which disables preemption prevents a higher priority thread from
taking over its processor involuntarily.  In uni-processor configurations, this
can be used to ensure mutual exclusion at thread level.  In SMP configurations,
however, more than one executing thread may exist.  Thus, it is impossible to
ensure mutual exclusion using this mechanism.  To prevent applications which
use preemption for this purpose from showing inappropriate behaviour, this
feature is disabled in SMP configurations and its use causes run-time errors.

Disabling of Interrupts
-----------------------

A low-overhead means to ensure mutual exclusion in uni-processor configurations
is to disable interrupts around a critical section.  This is commonly used in
device driver code.  In SMP configurations, however, disabling the interrupts
on one processor has no effect on other processors.  So, this is insufficient
to ensure system-wide mutual exclusion.  The macros

* :ref:`rtems_interrupt_disable() <rtems_interrupt_disable>`,

* :ref:`rtems_interrupt_enable() <rtems_interrupt_enable>`, and

* :ref:`rtems_interrupt_flash() <rtems_interrupt_flash>`

are disabled in SMP configurations and their use will cause compile-time
warnings and link-time errors.  In the unlikely case that interrupts must be
disabled on the current processor, the

* :ref:`rtems_interrupt_local_disable() <rtems_interrupt_local_disable>`, and

* :ref:`rtems_interrupt_local_enable() <rtems_interrupt_local_enable>`

macros are now available in all configurations.

Since disabling of interrupts is insufficient to ensure system-wide mutual
exclusion on SMP, a new low-level synchronization primitive was added: the
interrupt lock.  The interrupt locks are a simple API layer on top of the SMP
locks used for low-level synchronization in the operating system core.
Currently, they are implemented as a ticket lock.  In uni-processor
configurations, they degenerate to simple interrupt disable/enable sequences by
means of the C pre-processor.  It is not allowed to acquire the same interrupt
lock in a nested way.  This will result in an infinite loop with interrupts
disabled.  While converting legacy code to interrupt locks, care must be taken
to avoid this situation.

.. code-block:: c
    :linenos:

    #include <rtems.h>

    /* Uni-processor only: these macros are disabled in SMP configurations */
    void legacy_code_with_interrupt_disable_enable( void )
    {
      rtems_interrupt_level level;

      rtems_interrupt_disable( level );
      /* Critical section */
      rtems_interrupt_enable( level );
    }

    /* Defines a static interrupt lock named "Name" */
    RTEMS_INTERRUPT_LOCK_DEFINE( static, lock, "Name" )

    void smp_ready_code_with_interrupt_lock( void )
    {
      rtems_interrupt_lock_context lock_context;

      rtems_interrupt_lock_acquire( &lock, &lock_context );
      /* Critical section */
      rtems_interrupt_lock_release( &lock, &lock_context );
    }

POSIX spinlocks are an alternative to the RTEMS-specific interrupt locks.  The
:c:type:`pthread_spinlock_t` is defined as a self-contained object, i.e. the
user must provide the storage for this synchronization object.

.. code-block:: c
    :linenos:

    #include <assert.h>
    #include <pthread.h>

    pthread_spinlock_t lock;

    void smp_ready_code_with_posix_spinlock( void )
    {
      int error;

      error = pthread_spin_lock( &lock );
      assert( error == 0 );
      /* Critical section */
      error = pthread_spin_unlock( &lock );
      assert( error == 0 );
    }

In contrast to the POSIX spinlock implementations on Linux or FreeBSD, it is not
allowed to call blocking operating system services inside the critical section.
A recursive lock attempt is a severe usage error resulting in an infinite loop
with interrupts disabled.  Nesting of different locks is allowed.  The user
must ensure that no deadlock can occur.  As a non-portable feature,
zero-initialized locks are ready to use.  For example, statically initialized
global locks reside in the ``.bss`` section and there is no need to call
:c:func:`pthread_spin_init`.
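
A global lock order is one common way to rule out deadlocks when nesting
different locks.  The sketch below uses the POSIX spinlock API shown above.  It
always acquires two locks in ascending address order.  Two processors locking
the same pair therefore cannot each hold one lock while spinning on the other.
The ``account`` type and ``transfer()`` are hypothetical, and ordering by
address is only one possible convention.  The explicit ``pthread_spin_init()``
call keeps the sketch portable, so it does not rely on zero-initialization.

.. code-block:: c

    #define _POSIX_C_SOURCE 200809L /* for pthread_spinlock_t */
    #include <assert.h>
    #include <pthread.h>
    #include <stdint.h>

    typedef struct {
      pthread_spinlock_t lock;
      long balance;
    } account;

    void account_init( account *a, long balance )
    {
      int error = pthread_spin_init( &a->lock, PTHREAD_PROCESS_PRIVATE );
      assert( error == 0 );
      a->balance = balance;
    }

    /* Moves amount between two accounts, nesting two different locks */
    void transfer( account *from, account *to, long amount )
    {
      account *first;
      account *second;
      int error;

      /* Locking the same account twice would be a recursive lock attempt */
      assert( from != to );

      /* Acquire the locks in a fixed (address) order to avoid deadlocks */
      if ( (uintptr_t) from < (uintptr_t) to ) {
        first = from;
        second = to;
      } else {
        first = to;
        second = from;
      }

      error = pthread_spin_lock( &first->lock );
      assert( error == 0 );
      error = pthread_spin_lock( &second->lock );
      assert( error == 0 );

      /* Critical section: no blocking operating system services here */
      from->balance -= amount;
      to->balance += amount;

      error = pthread_spin_unlock( &second->lock );
      assert( error == 0 );
      error = pthread_spin_unlock( &first->lock );
      assert( error == 0 );
    }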

Interrupt Service Routines Execute in Parallel With Threads
-----------------------------------------------------------

On a machine with more than one processor, interrupt service routines (this
includes timer service routines installed via :ref:`rtems_timer_fire_after()
<rtems_timer_fire_after>`) and threads can execute in parallel.  Interrupt
service routines must take this into account and use proper locking mechanisms
to protect critical sections from interference by threads (interrupt locks or
POSIX spinlocks).  This likely requires code modifications in legacy device
drivers.

Timers Do Not Stop Immediately
------------------------------

Timer service routines run in the context of the clock interrupt.  On
uni-processor configurations, it is sufficient to disable interrupts and remove
a timer from the set of active timers to stop it.  In SMP configurations,
however, the timer service routine may already be running and waiting on an
SMP lock owned by the thread which is about to stop the timer.  This opens the
door to subtle synchronization issues.  During destruction of objects, special
care must be taken to ensure that timer service routines cannot access (partly
or fully) destroyed objects.
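
One way to meet this requirement is sketched here with POSIX threads instead of
RTEMS timers.  The timer routine reaches the object only through a pointer.
That pointer is protected by a lock which lives outside the object.  The
destroying thread clears the pointer under this lock before it frees the object.
A timer routine that was already waiting on the lock then finds ``NULL``
instead of a freed object.  All names are hypothetical.  Timer service routines
run in interrupt context, so an RTEMS implementation would use an interrupt lock
instead of the mutex.

.. code-block:: c

    #include <assert.h>
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdlib.h>

    typedef struct {
      unsigned long samples;
    } sensor;

    /* The lock is not part of the object, so it stays valid after free() */
    static pthread_mutex_t sensor_lock = PTHREAD_MUTEX_INITIALIZER;
    static sensor *active_sensor;

    sensor *sensor_create( void )
    {
      sensor *s = calloc( 1, sizeof( *s ) );
      assert( s != NULL );

      pthread_mutex_lock( &sensor_lock );
      active_sensor = s;
      pthread_mutex_unlock( &sensor_lock );

      return s;
    }

    /* Called by the timer; returns false if the object is already gone */
    bool sensor_timer_routine( void )
    {
      bool done = false;

      pthread_mutex_lock( &sensor_lock );

      if ( active_sensor != NULL ) {
        ++active_sensor->samples; /* safe: destruction waits for the lock */
        done = true;
      }

      pthread_mutex_unlock( &sensor_lock );
      return done;
    }

    void sensor_destroy( sensor *s )
    {
      pthread_mutex_lock( &sensor_lock );
      assert( active_sensor == s );
      active_sensor = NULL; /* no timer routine can reach s after this */
      pthread_mutex_unlock( &sensor_lock );

      free( s );
    }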

False Sharing of Cache Lines Due to Objects Table
-------------------------------------------------

The Classic API and most POSIX API objects are indirectly accessed via an
object identifier.  The user-level functions validate the object identifier and
map it to the actual object structure which resides in a global objects table
for each object class.  So, unrelated objects are packed together in a table.
This may result in false sharing of cache lines.  Unrelated objects used by
different processors may reside in the same cache line, so a write by one
processor invalidates that cache line in the caches of the other processors.
The effect of false sharing of cache lines can be observed with the `TMFINE 1
<https://git.rtems.org/rtems/tree/testsuites/tmtests/tmfine01>`_ test program
on a suitable platform, e.g. QorIQ T4240.  High-performance SMP applications
need full control of the object storage :cite:`Drepper:2007:Memory`.
Therefore, self-contained synchronization objects are now available for RTEMS.
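
Controlling object storage can be as simple as placing data written by
different processors into separate cache lines.  The sketch below assumes a
cache line size of 64 bytes, which is platform specific.  It uses C11
``alignas`` so that each per-processor counter occupies its own cache line.
``per_cpu_counter``, ``count_event()`` and ``MAX_CPUS`` are hypothetical names.

.. code-block:: c

    #include <assert.h>
    #include <stdalign.h>
    #include <stdint.h>

    #define CACHE_LINE_SIZE 64 /* assumption, platform specific */
    #define MAX_CPUS 4

    /*
     * Each element starts on its own cache line, so a write by one processor
     * does not invalidate the cache line holding another processor's counter.
     */
    typedef struct {
      alignas( CACHE_LINE_SIZE ) unsigned long value;
    } per_cpu_counter;

    static per_cpu_counter counters[ MAX_CPUS ];

    void count_event( unsigned int cpu_index )
    {
      assert( cpu_index < MAX_CPUS );
      ++counters[ cpu_index ].value;
    }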

Directives
==========

This section details the symmetric multiprocessing services.  A subsection is
dedicated to each of these services and describes the calling sequence, related
constants, usage, and status codes.

.. raw:: latex

   \clearpage

.. _rtems_get_processor_count:

GET_PROCESSOR_COUNT - Get processor count
-----------------------------------------

CALLING SEQUENCE:
    .. code-block:: c

        uint32_t rtems_get_processor_count(void);

DIRECTIVE STATUS CODES:
    The count of processors in the system that can be used.  The value returned
    is the highest numbered processor index of all processors available to the
    application (if a scheduler is assigned) plus one.

DESCRIPTION:
    In uni-processor configurations, a value of one will be returned.

    In SMP configurations, this returns the value of a global variable set
    during system initialization to indicate the count of utilized processors.
    The processor count depends on the physically or virtually available
    processors and application configuration.  The value will always be less
    than or equal to the maximum count of application configured processors.

NOTES:
    None.

.. raw:: latex

   \clearpage

.. _rtems_get_current_processor:

GET_CURRENT_PROCESSOR - Get current processor index
---------------------------------------------------

CALLING SEQUENCE:
    .. code-block:: c

        uint32_t rtems_get_current_processor(void);

DIRECTIVE STATUS CODES:
    The index of the current processor.

DESCRIPTION:
    In uni-processor configurations, a value of zero will be returned.

    In SMP configurations, an architecture specific method is used to obtain the
    index of the current processor in the system.  The set of processor indices
    is the range of integers starting with zero up to the processor count minus
    one.

    Outside of sections with disabled thread dispatching, the current processor
    index may change after every instruction since the thread may migrate from
    one processor to another.  Sections with disabled interrupts are also
    sections with disabled thread dispatching.

NOTES:
    None.

Implementation Details
======================

This section covers some implementation details of the RTEMS SMP support.

Low-Level Synchronization
-------------------------

All low-level synchronization primitives are implemented using :term:`C11`
atomic operations, so no target-specific hand-written assembler code is
necessary.  Four synchronization primitives are currently available:

* ticket locks (mutual exclusion),

* :term:`MCS` locks (mutual exclusion),

* barriers, implemented as a sense barrier, and

* sequence locks :cite:`Boehm:2012:Seqlock`.
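
The last item can be illustrated by a minimal single-writer sequence lock
written with C11 atomics.  The sketch follows the fence-based reader pattern
discussed in :cite:`Boehm:2012:Seqlock`.  It is not the RTEMS implementation.
The protected data ``a`` and ``b`` and all names are hypothetical.  Concurrent
writers would additionally need mutual exclusion among themselves.  Readers never
block the writer.  They retry if the sequence number was odd, meaning a write
was in progress, or if it changed during the read.

.. code-block:: c

    #include <assert.h>
    #include <stdatomic.h>

    typedef struct {
      atomic_uint sequence; /* odd while a write is in progress */
      atomic_int a;         /* protected data, accessed with relaxed atomics */
      atomic_int b;
    } seqlock_pair;

    /* Single writer: make the sequence odd, update the data, make it even */
    void seqlock_write( seqlock_pair *p, int a, int b )
    {
      unsigned int s = atomic_load_explicit( &p->sequence, memory_order_relaxed );

      atomic_store_explicit( &p->sequence, s + 1, memory_order_relaxed );
      /* Orders the odd sequence store before the data stores */
      atomic_thread_fence( memory_order_release );
      atomic_store_explicit( &p->a, a, memory_order_relaxed );
      atomic_store_explicit( &p->b, b, memory_order_relaxed );
      /* Publishes the data together with the new even sequence number */
      atomic_store_explicit( &p->sequence, s + 2, memory_order_release );
    }

    /* Readers retry until they observe a consistent snapshot */
    void seqlock_read( seqlock_pair *p, int *a, int *b )
    {
      unsigned int s1;
      unsigned int s2;

      do {
        s1 = atomic_load_explicit( &p->sequence, memory_order_acquire );
        *a = atomic_load_explicit( &p->a, memory_order_relaxed );
        *b = atomic_load_explicit( &p->b, memory_order_relaxed );
        /* Orders the data loads before the second sequence load */
        atomic_thread_fence( memory_order_acquire );
        s2 = atomic_load_explicit( &p->sequence, memory_order_relaxed );
      } while ( ( s1 & 1U ) != 0 || s1 != s2 );
    }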

A vital requirement for low-level mutual exclusion is :term:`FIFO` fairness
since we are interested in a predictable system and not maximum throughput.
With this requirement, there are only a few options to resolve this problem.
For reasons of simplicity, the ticket lock algorithm was chosen to implement
the SMP locks.  However, the API is capable of supporting MCS locks, which may
be interesting in the future for systems with a processor count in the range of
32 or more, e.g. :term:`NUMA`, many-core systems.
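
The core of the ticket lock algorithm fits in a few lines of C11 atomics.  The
following is a simplified sketch, not the RTEMS SMP lock itself.  It omits
interrupt disabling and profiling, and its names are hypothetical.  Each
acquiring processor draws the next ticket with an atomic fetch-and-add.  It then
spins until ``now_serving`` equals that ticket, so waiting processors enter in
the order in which they drew their tickets.  The sketch also shows why a nested
acquire of the same lock never terminates.  The second ticket can only be served
after the first one is released.

.. code-block:: c

    #include <assert.h>
    #include <stdatomic.h>

    typedef struct {
      atomic_uint next_ticket; /* ticket handed out to the next acquirer */
      atomic_uint now_serving; /* ticket currently allowed to enter */
    } ticket_lock;

    #define TICKET_LOCK_INITIALIZER { 0, 0 }

    void ticket_lock_acquire( ticket_lock *lock )
    {
      /* Drawing a ticket only needs atomicity, not ordering */
      unsigned int my_ticket =
        atomic_fetch_add_explicit( &lock->next_ticket, 1U, memory_order_relaxed );

      /* Spin until it is our turn; acquire orders the critical section */
      while (
        atomic_load_explicit( &lock->now_serving, memory_order_acquire )
          != my_ticket
      ) {
        /* Busy wait */
      }
    }

    void ticket_lock_release( ticket_lock *lock )
    {
      /* Only the lock owner writes now_serving, so a relaxed load suffices */
      unsigned int current =
        atomic_load_explicit( &lock->now_serving, memory_order_relaxed );

      /* Serve the next ticket; release publishes the critical section */
      atomic_store_explicit(
        &lock->now_serving,
        current + 1U,
        memory_order_release
      );
    }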

The test program `SMPLOCK 1
<https://git.rtems.org/rtems/tree/testsuites/smptests/smplock01>`_ can be used
to gather performance and fairness data for several scenarios.  The SMP lock
performance and fairness measured on the QorIQ T4240 are shown below as an
example.  This chip contains three L2 caches.  Each L2 cache is shared by eight
processors.

.. image:: ../images/c_user/smplock01perf-t4240.*
   :width: 400
   :align: center

.. image:: ../images/c_user/smplock01fair-t4240.*
   :width: 400
   :align: center

Internal Locking
----------------

In SMP configurations, the operating system uses non-recursive SMP locks for
low-level mutual exclusion.  The locking domains are roughly

* a particular data structure,
* the thread queue operations,
* the thread state changes, and
* the scheduler operations.

For good average-case performance, it is vital that every high-level
synchronization object, e.g. a mutex, has its own SMP lock.  In the average
case, only this SMP lock should be involved in carrying out a specific
operation, e.g. obtaining or releasing a mutex.  In general, the high-level
synchronization objects have a thread queue embedded and use its SMP lock.

If a thread must block on a thread queue, things get complicated.  The
executing thread first acquires the SMP lock of the thread queue and then
figures out that it needs to block.  The procedure to block the thread on this
particular thread queue involves state changes of the thread itself, and for
this, thread-specific SMP locks must be used.

In order to determine whether a thread is blocked on a thread queue or not,
thread-specific SMP locks must be used.  A thread priority change must be
propagated to the thread queue (possibly recursively).  Care must be taken to
avoid a lock order reversal between thread queue and thread-specific SMP locks.

Each scheduler instance has its own SMP lock.  For the scheduler helping
protocol, multiple scheduler instances may be in charge of a thread.  It is not
possible to acquire two scheduler instance SMP locks at the same time, since
otherwise deadlocks could occur.  A thread-specific SMP lock is used to
synchronize the thread data shared by different scheduler instances.

The thread state SMP lock protects various things, e.g. the thread state, join
operations, signals, post-switch actions, the home scheduler instance, etc.

Profiling
---------

To identify the bottlenecks in the system, support for profiling of low-level
synchronization is optionally available.  The profiling support is a BSP build
time configuration option (``--enable-profiling``) and is implemented with an
acceptable overhead, even for production systems.  A low-overhead counter for
short time intervals must be provided by the hardware.

Profiling reports are generated in XML for most test programs of the RTEMS
testsuite (more than 500 test programs).  This gives a good sample set for
statistics.  For example, the maximum thread dispatch disable time, the maximum
interrupt latency, or lock contention can be determined.

.. code-block:: xml

   <ProfilingReport name="SMPMIGRATION 1">
     <PerCPUProfilingReport processorIndex="0">
       <MaxThreadDispatchDisabledTime unit="ns">36636</MaxThreadDispatchDisabledTime>
       <MeanThreadDispatchDisabledTime unit="ns">5065</MeanThreadDispatchDisabledTime>
       <TotalThreadDispatchDisabledTime unit="ns">3846635988
         </TotalThreadDispatchDisabledTime>
       <ThreadDispatchDisabledCount>759395</ThreadDispatchDisabledCount>
       <MaxInterruptDelay unit="ns">8772</MaxInterruptDelay>
       <MaxInterruptTime unit="ns">13668</MaxInterruptTime>
       <MeanInterruptTime unit="ns">6221</MeanInterruptTime>
       <TotalInterruptTime unit="ns">6757072</TotalInterruptTime>
       <InterruptCount>1086</InterruptCount>
     </PerCPUProfilingReport>
     <PerCPUProfilingReport processorIndex="1">
       <MaxThreadDispatchDisabledTime unit="ns">39408</MaxThreadDispatchDisabledTime>
       <MeanThreadDispatchDisabledTime unit="ns">5060</MeanThreadDispatchDisabledTime>
       <TotalThreadDispatchDisabledTime unit="ns">3842749508
         </TotalThreadDispatchDisabledTime>
       <ThreadDispatchDisabledCount>759391</ThreadDispatchDisabledCount>
       <MaxInterruptDelay unit="ns">8412</MaxInterruptDelay>
       <MaxInterruptTime unit="ns">15868</MaxInterruptTime>
       <MeanInterruptTime unit="ns">3525</MeanInterruptTime>
       <TotalInterruptTime unit="ns">3814476</TotalInterruptTime>
       <InterruptCount>1082</InterruptCount>
     </PerCPUProfilingReport>
     <!-- more reports omitted -->
     <SMPLockProfilingReport name="Scheduler">
       <MaxAcquireTime unit="ns">7092</MaxAcquireTime>
       <MaxSectionTime unit="ns">10984</MaxSectionTime>
       <MeanAcquireTime unit="ns">2320</MeanAcquireTime>
       <MeanSectionTime unit="ns">199</MeanSectionTime>
       <TotalAcquireTime unit="ns">3523939244</TotalAcquireTime>
       <TotalSectionTime unit="ns">302545596</TotalSectionTime>
       <UsageCount>1518758</UsageCount>
       <ContentionCount initialQueueLength="0">759399</ContentionCount>
       <ContentionCount initialQueueLength="1">759359</ContentionCount>
       <ContentionCount initialQueueLength="2">0</ContentionCount>
       <ContentionCount initialQueueLength="3">0</ContentionCount>
     </SMPLockProfilingReport>
   </ProfilingReport>
705
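The figures in the sample report appear to obey two simple relations: each
mean equals the corresponding total divided by the count, truncated to an
integer, and the lock usage count equals the sum of the contention counts.  The
C sketch below encodes these relations.  They were inferred from the sample
values only, and the helper names ``profiling_mean()`` and ``profiling_sum()``
are hypothetical, not part of the RTEMS API.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /*
     * Hypothetical helpers.  The relations they encode were inferred from
     * the sample report, not from the RTEMS profiling implementation.
     */
    static uint64_t profiling_mean(uint64_t total, uint64_t count)
    {
      /* Integer division truncates, which matches the reported means. */
      return count != 0 ? total / count : 0;
    }

    /* Sums counters, e.g. the ContentionCount values of one lock. */
    static uint64_t profiling_sum(const uint64_t *values, size_t n)
    {
      uint64_t sum = 0;

      for (size_t i = 0; i < n; ++i) {
        sum += values[i];
      }

      return sum;
    }

For processor 0, for example, 3846635988 / 759395 gives the reported 5065 ns,
and 6757072 / 1086 gives 6221 ns rather than the rounded value 6222 ns, which
points to truncation.  For the ``Scheduler`` lock, 759399 + 759359 + 0 + 0 =
1518758, which matches ``UsageCount``.
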
Scheduler Helping Protocol
--------------------------

The scheduler provides a helping protocol to support locking protocols like the
:ref:`OMIP` or the :ref:`MrsP`.  Each thread has one scheduler node for each
scheduler instance in the system; these nodes are located in its :term:`TCB`.
A thread has exactly one home scheduler instance, which is set during thread
creation.  The home scheduler instance can be changed with
:ref:`rtems_task_set_scheduler() <rtems_task_set_scheduler>`.  Through the
locking protocols, a thread may gain access to the scheduler nodes of other
scheduler instances.  This allows the thread to temporarily migrate to another
scheduler instance in case of preemption.

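The relationship between a thread, its per-instance scheduler nodes, and its
home scheduler instance can be pictured with the simplified C model below.  All
names, the fixed instance count, and the bit mask of usable nodes are
assumptions made for illustration; the actual :term:`TCB` layout in RTEMS
differs.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical number of scheduler instances in the system. */
    #define SCHEDULER_COUNT 3u

    typedef struct {
      size_t   scheduler_index; /* Scheduler instance owning this node */
      uint32_t priority;        /* Priority of the thread in that instance */
    } sched_node;

    typedef struct {
      /* One scheduler node per scheduler instance, stored in the thread. */
      sched_node nodes[SCHEDULER_COUNT];
      /* Index of the home scheduler instance, set at thread creation. */
      size_t     home;
      /* Bit i is set if the thread may currently use nodes[i]. */
      uint32_t   usable;
    } thread_model;

    static void thread_model_init(thread_model *t, size_t home, uint32_t prio)
    {
      assert(home < SCHEDULER_COUNT);

      for (size_t i = 0; i < SCHEDULER_COUNT; ++i) {
        t->nodes[i].scheduler_index = i;
        t->nodes[i].priority = prio; /* Same priority everywhere (simplified) */
      }

      t->home = home;
      /* Initially only the node of the home instance is usable. */
      t->usable = UINT32_C(1) << home;
    }

    /* A locking protocol may grant access to the node of another instance. */
    static void thread_model_grant(thread_model *t, size_t index)
    {
      assert(index < SCHEDULER_COUNT);
      t->usable |= UINT32_C(1) << index;
    }

    static bool thread_model_may_use(const thread_model *t, size_t index)
    {
      return index < SCHEDULER_COUNT
        && (t->usable & (UINT32_C(1) << index)) != 0;
    }

Initially only the home node is usable; a locking protocol such as the
:ref:`MrsP` would grant further nodes, which the hypothetical
``thread_model_grant()`` stands in for.
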
The scheduler infrastructure is based on an object-oriented design.  The
scheduler operations for a thread are defined as virtual functions.  For the
scheduler helping protocol, the following operations must be implemented by an
SMP-aware scheduler:

* ask a scheduler node for help,
* reconsider the help request of a scheduler node, and
* withdraw a scheduler node.

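In C, virtual functions are commonly expressed as a table of function pointers.
The sketch below declares the three helping protocol operations in that style
and provides a trivial example scheduler that can help only while it has an
idle processor.  The structure, member names, and signatures are hypothetical
simplifications and do not reproduce the actual RTEMS scheduler operations
table.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct scheduler_context scheduler_context;

    typedef struct {
      int  priority;
      bool scheduled; /* Node currently has a processor */
    } scheduler_node;

    /* Helping protocol operations an SMP-aware scheduler provides. */
    typedef struct {
      /* Returns true if the node could be scheduled, i.e. it could help. */
      bool (*ask_for_help)(scheduler_context *ctx, scheduler_node *node);
      /* Drops a stale help request of the node, if any. */
      void (*reconsider_help_request)(scheduler_context *ctx,
                                      scheduler_node *node);
      /* Takes the node away from the thread. */
      void (*withdraw_node)(scheduler_context *ctx, scheduler_node *node);
    } scheduler_operations;

    struct scheduler_context {
      const scheduler_operations *ops;
      size_t idle_processors; /* Processors without a thread to run */
    };

    /* Example scheduler: help is possible only if a processor is idle. */
    static bool simple_ask_for_help(scheduler_context *ctx, scheduler_node *node)
    {
      if (ctx->idle_processors == 0) {
        return false;
      }

      --ctx->idle_processors;
      node->scheduled = true;
      return true;
    }

    static void simple_reconsider(scheduler_context *ctx, scheduler_node *node)
    {
      /* This trivial scheduler never registers help requests. */
      (void) ctx;
      (void) node;
    }

    static void simple_withdraw(scheduler_context *ctx, scheduler_node *node)
    {
      if (node->scheduled) {
        node->scheduled = false;
        ++ctx->idle_processors;
      }
    }

    static const scheduler_operations simple_ops = {
      .ask_for_help = simple_ask_for_help,
      .reconsider_help_request = simple_reconsider,
      .withdraw_node = simple_withdraw
    };

Callers only use ``ctx->ops->ask_for_help(ctx, node)`` and its siblings, so a
different scheduler variant can be plugged in by supplying another table.
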
All currently available SMP-aware schedulers use a framework which is
customized via inline functions.  This eases the implementation of scheduler
variants.  Up to now, only priority-based schedulers are implemented.

A thread that is allowed to use more than one scheduler node asks these nodes
for help

* in case of preemption,
* in case an unblock did not schedule the thread, or
* in case a yield was successful.

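The conditions listed above can be expressed as a predicate.  The event names
below are hypothetical and only mirror the three cases from the text; a thread
restricted to its home node has no other node to ask and therefore never asks
for help.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical outcomes of scheduler operations on a thread. */
    typedef enum {
      EVENT_PREEMPTED,
      EVENT_UNBLOCK_SCHEDULED,
      EVENT_UNBLOCK_NOT_SCHEDULED,
      EVENT_YIELD_SUCCESSFUL,
      EVENT_YIELD_UNSUCCESSFUL,
      EVENT_BLOCKED
    } sched_event;

    /* Returns true if the thread should ask its scheduler nodes for help. */
    static bool should_ask_for_help(size_t usable_nodes, sched_event event)
    {
      /* Only threads that may use more than one node ask for help. */
      if (usable_nodes < 2) {
        return false;
      }

      switch (event) {
        case EVENT_PREEMPTED:
        case EVENT_UNBLOCK_NOT_SCHEDULED:
        case EVENT_YIELD_SUCCESSFUL:
          return true;
        default:
          /* Other events do not trigger a request in this model. */
          return false;
      }
    }
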
The actual ask-for-help scheduler operations are carried out as a side effect
of the thread dispatch procedure.  Once a need for help is recognized, a help
request is registered on one of the processors related to the thread and a
thread dispatch is issued.  This indirection leads to a better decoupling of
scheduler instances: unrelated processors are not burdened with extra work for
threads which participate in resource sharing.  Each ask-for-help operation
indicates whether it could help.  The procedure stops after the first
successful ask-for-help operation.  Unsuccessful ask-for-help operations
register this need in the scheduler context.

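The ask-for-help loop can be sketched as follows.  The sketch assumes that the
usable nodes of a thread form an array visited in order and that a
per-scheduler counter stands in for the registered help needs.  All names are
hypothetical, and the deferral of the work to a thread dispatch on a related
processor is not modelled.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct {
      size_t idle_processors;  /* Processors that could take a thread */
      size_t pending_requests; /* Registered, not yet satisfied help needs */
    } sched_ctx;

    typedef struct {
      sched_ctx *ctx;          /* Scheduler instance owning this node */
      bool       scheduled;
    } help_node;

    /* Per-instance ask for help operation: reports whether it could help. */
    static bool ask_for_help(help_node *node)
    {
      if (node->ctx->idle_processors > 0) {
        --node->ctx->idle_processors;
        node->scheduled = true;
        return true;
      }

      /* Could not help: remember the need in the scheduler context. */
      ++node->ctx->pending_requests;
      return false;
    }

    /*
     * Asks the usable nodes of a thread for help in order and stops after
     * the first successful request.  Returns the index of the helping node
     * or count if no node could help.
     */
    static size_t ask_nodes_for_help(help_node *nodes, size_t count)
    {
      for (size_t i = 0; i < count; ++i) {
        if (ask_for_help(&nodes[i])) {
          return i;
        }
      }

      return count;
    }

Because the loop returns at the first success, nodes later in the array are not
visited, so their scheduler instances do no extra work.
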
After a thread dispatch, the reconsider help request operation is used to clean
up stale help registrations in the scheduler contexts.

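A possible shape of this clean-up step is sketched below, assuming that each
scheduler context keeps a small array of registered help requests.  In this
model, a registration is stale once the requesting node is scheduled or no
longer usable by its thread.  The data layout and all names are hypothetical.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define MAX_REQUESTS 8u

    typedef struct {
      bool scheduled; /* Node currently has a processor */
      bool usable;    /* Thread may still use this node */
    } req_node;

    typedef struct {
      req_node *requests[MAX_REQUESTS]; /* Registered help requests */
      size_t    count;
    } req_ctx;

    static bool register_help_request(req_ctx *ctx, req_node *node)
    {
      if (ctx->count == MAX_REQUESTS) {
        return false;
      }

      ctx->requests[ctx->count++] = node;
      return true;
    }

    /* Removes stale registrations after a thread dispatch, keeping order. */
    static void reconsider_help_requests(req_ctx *ctx)
    {
      size_t kept = 0;

      for (size_t i = 0; i < ctx->count; ++i) {
        req_node *node = ctx->requests[i];

        if (!node->scheduled && node->usable) {
          ctx->requests[kept++] = node; /* Still needs help */
        }
      }

      ctx->count = kept;
    }
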
The withdraw operation takes away scheduler nodes once the thread is no longer
allowed to use them, e.g. because it released a mutex.  The availability of
scheduler nodes for a thread is controlled by the thread queues.

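Withdrawing can be pictured as recomputing the set of usable nodes when the
thread releases a resource.  The sketch assumes that each owned resource, for
example a MrsP mutex, contributes a hypothetical bit mask of granted nodes and
that the home node always stays usable.  RTEMS tracks availability through the
thread queues instead, which this model does not reproduce.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MAX_RESOURCES 4u

    typedef struct {
      uint32_t home_mask;              /* Node of the home instance */
      uint32_t granted[MAX_RESOURCES]; /* Nodes granted per owned resource */
      uint32_t usable;                 /* Nodes the thread may use now */
    } wd_thread;

    /*
     * Releases a resource (e.g. a mutex), recomputes the usable nodes, and
     * returns the mask of nodes that must be withdrawn.
     */
    static uint32_t release_resource(wd_thread *t, size_t resource)
    {
      uint32_t before = t->usable;
      uint32_t after = t->home_mask;

      assert(resource < MAX_RESOURCES);
      t->granted[resource] = 0;

      for (size_t i = 0; i < MAX_RESOURCES; ++i) {
        after |= t->granted[i];
      }

      t->usable = after;
      /* Each node in this mask would be passed to its withdraw operation. */
      return before & ~after;
    }

Releasing one of two mutexes withdraws only the nodes that no other owned mutex
still grants.
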
Thread Dispatch Details
-----------------------

This section gives background information to developers interested in the
interrupt latencies introduced by thread dispatching.  A thread dispatch
consists of all work which must be done to stop the currently executing thread
on a processor and hand over this processor to an heir thread.

In SMP systems, scheduling decisions on one processor must be propagated
to other processors through inter-processor interrupts.  A thread dispatch
which must be carried out on another processor does not happen instantaneously.
Thus, several thread dispatch requests might be in flight, and some of them may
be out of date before the corresponding processor has time to deal with them.
The thread dispatch mechanism uses three per-processor variables:

- the executing thread,

- the heir thread, and

- a boolean flag indicating whether a thread dispatch is necessary.

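The three per-processor variables and the lock-free change notification can be
modelled with C11 atomics.  In this hypothetical sketch, the inter-processor
interrupt is reduced to a direct call of its handler, a relaxed store of the
heir plays the role of the normal store operation, and the memory orderings are
one plausible choice.  The real RTEMS per-processor control contains much more
state.

.. code-block:: c

    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct {
      int id;
    } thread;

    typedef struct {
      thread *_Atomic executing;          /* Thread running on the processor */
      thread *_Atomic heir;               /* Thread selected to run next */
      atomic_bool     dispatch_necessary; /* Thread dispatch indicator */
    } per_cpu;

    static void per_cpu_init(per_cpu *cpu, thread *initial)
    {
      atomic_init(&cpu->executing, initial);
      atomic_init(&cpu->heir, initial);
      atomic_init(&cpu->dispatch_necessary, false);
    }

    /* Runs on the target processor when the (simulated) IPI arrives. */
    static void ipi_handler(per_cpu *cpu)
    {
      atomic_store_explicit(&cpu->dispatch_necessary, true,
                            memory_order_release);
    }

    /* Scheduler side: plain store of the new heir, then notify. */
    static void set_heir(per_cpu *cpu, thread *heir)
    {
      atomic_store_explicit(&cpu->heir, heir, memory_order_relaxed);
      ipi_handler(cpu); /* Stands in for sending an inter-processor interrupt */
    }

    /* Returns true if the heavy-weight dispatch sequence was entered. */
    static bool thread_dispatch(per_cpu *cpu)
    {
      /* Fast path: nothing to do unless the indicator is set. */
      if (!atomic_exchange_explicit(&cpu->dispatch_necessary, false,
                                    memory_order_acquire)) {
        return false;
      }

      /* Use the latest heir; older requests may already be out of date. */
      thread *heir = atomic_load_explicit(&cpu->heir, memory_order_relaxed);
      atomic_store_explicit(&cpu->executing, heir, memory_order_relaxed);
      /* A real implementation would switch the thread contexts here. */
      return true;
    }

Since the dispatch always reads the latest heir, two ``set_heir()`` calls
before a single dispatch collapse into one hand-over in this model, which
illustrates how out-of-date requests can be absorbed.
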
Updates of the heir thread are done via a normal store operation.  The thread
dispatch necessary indicator of another processor is set as a side effect of an
inter-processor interrupt.  Thus, this change notification works without the
use of locks.  The thread context is protected by a :term:`TTAS` lock embedded
in the context to ensure that it is used on at most one processor at a time.
Normally, only thread-specific or per-processor locks are used during a thread
dispatch.  This implementation turned out to be quite efficient, and no lock
contention was observed in the testsuite.  The heavy-weight thread dispatch
sequence is only entered if the thread dispatch indicator is set.

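A test and test-and-set (TTAS) lock first checks the lock with a plain load and
only attempts the atomic test-and-set once the lock looks free.  The minimal
C11 sketch below illustrates the technique only; it is not the RTEMS thread
context lock implementation, and all names are hypothetical.

.. code-block:: c

    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct {
      atomic_bool locked;
    } ttas_lock;

    static void ttas_init(ttas_lock *lock)
    {
      atomic_init(&lock->locked, false);
    }

    static bool ttas_try_acquire(ttas_lock *lock)
    {
      /* Test: cheap read-only check first. */
      if (atomic_load_explicit(&lock->locked, memory_order_relaxed)) {
        return false;
      }

      /* Test-and-set: the expensive atomic read-modify-write. */
      return !atomic_exchange_explicit(&lock->locked, true,
                                       memory_order_acquire);
    }

    static void ttas_acquire(ttas_lock *lock)
    {
      while (!ttas_try_acquire(lock)) {
        /* Spin on loads until the lock appears free. */
        while (atomic_load_explicit(&lock->locked, memory_order_relaxed)) {
          /* busy wait */
        }
      }
    }

    static void ttas_release(ttas_lock *lock)
    {
      atomic_store_explicit(&lock->locked, false, memory_order_release);
    }

The inner loop only reads the flag, so waiting processors spin in their local
caches; the atomic exchange, which needs exclusive ownership of the cache line,
is attempted only after the lock was observed to be free.
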
The context switch is performed with interrupts enabled.  During the transition
from the executing to the heir thread, neither the stack of the executing thread
nor that of the heir thread may be used during interrupt processing.  For this
purpose, a temporary per-processor stack is set up, which may be used by the
interrupt prologue before the stack is switched to the interrupt stack.