source: rtems-docs/c-user/symmetric_multiprocessing_services.rst @ e2d2d4a

Last change on this file since e2d2d4a was e2d2d4a, checked in by Kinsey Moore <kinsey.moore@…>, on 03/27/23 at 19:32:54

c-user/smp: Fix item rendering

.. SPDX-License-Identifier: CC-BY-SA-4.0

.. Copyright (C) 2014.
.. COMMENT: On-Line Applications Research Corporation (OAR).
.. Copyright (C) 2017 embedded brains GmbH.

.. index:: Symmetric Multiprocessing
.. index:: SMP

Symmetric Multiprocessing (SMP)
*******************************

Introduction
============

RTEMS Symmetric Multiprocessing (SMP) support is available on a subset of the
target architectures supported by RTEMS.  Furthermore, on some target
architectures it is only available on a subset of BSPs.  The user is advised
to check the BSP-specific documentation and the RTEMS source code to verify
the status of SMP support for a specific BSP.  The following architectures
have support for SMP:

- AArch64,

- ARMv7-A,

- i386,

- PowerPC,

- RISC-V, and

- SPARC.

.. warning::

    SMP support is only available if RTEMS was built with the
    SMP build configuration option enabled.

RTEMS is supposed to be a real-time operating system.  What does this mean in
the context of SMP?  The RTEMS interpretation of real-time on SMP is the
support for :ref:`ClusteredScheduling` with priority based schedulers and
adequate locking protocols.  One aim is to enable a schedulability analysis
under the sporadic task model :cite:`Brandenburg:2011:SL`
:cite:`Burns:2013:MrsP`.

Background
==========

Application Configuration
-------------------------

By default, the maximum processor count is set to one in the application
configuration.  To enable SMP, the application configuration option
:ref:`CONFIGURE_MAXIMUM_PROCESSORS <CONFIGURE_MAXIMUM_PROCESSORS>` must be
defined to a value greater than one.  It is recommended to use the smallest
value suitable for the application in order to save memory.  For example, each
processor needs an idle thread and an interrupt stack.
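
A minimal sketch of an SMP application configuration could look as follows.
The task count, driver options, and processor count are assumptions chosen
for illustration only and must be adapted to the application.

.. code-block:: c

    #define CONFIGURE_APPLICATION_NEEDS_CLOCK_DRIVER
    #define CONFIGURE_APPLICATION_NEEDS_CONSOLE_DRIVER

    /* Enable SMP: allow the application to use up to four processors */
    #define CONFIGURE_MAXIMUM_PROCESSORS 4

    /* Hypothetical application-specific values */
    #define CONFIGURE_MAXIMUM_TASKS 8
    #define CONFIGURE_RTEMS_INIT_TASKS_TABLE

    #define CONFIGURE_INIT
    #include <rtems/confdefs.h>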

The default scheduler for SMP applications supports up to 32 processors and
is a global fixed priority scheduler, see also
:ref:`ConfigurationSchedulersClustered`.

The following compile-time test can be used to check if the SMP support is
available or not.
.. code-block:: c

    #include <rtems.h>

    #ifdef RTEMS_SMP
    #warning "SMP support is enabled"
    #else
    #warning "SMP support is disabled"
    #endif

Examples
--------

For example applications see `testsuites/smptests
<https://git.rtems.org/rtems/tree/testsuites/smptests>`_.

Uniprocessor versus SMP Parallelism
-----------------------------------

Uniprocessor systems have long been used in embedded systems.  In this
hardware model, there are some system execution characteristics which have
long been taken for granted:

- one task executes at a time

- hardware events result in interrupts

There is no true parallelism.  Even when interrupts appear to occur at the
same time, they are processed in a largely serial fashion.  This is true even
when the interrupt service routines are allowed to nest.  From a tasking
viewpoint, it is the responsibility of the real-time operating system to
simulate parallelism by switching between tasks.  These task switches occur
in response to hardware interrupt events and explicit application events such
as blocking for a resource or delaying.

With symmetric multiprocessing, the presence of multiple processors allows
for true concurrency and provides for cost-effective performance
improvements.  Uniprocessors tend to increase performance by increasing clock
speed and complexity.  This tends to lead to hot, power-hungry
microprocessors which are poorly suited for many embedded applications.

The true concurrency is in sharp contrast to the single task and interrupt
model of uniprocessor systems.  This results in a fundamental change to the
uniprocessor system characteristics listed above.  Developers are faced with
a different set of characteristics which, in turn, break some existing
assumptions and result in new challenges.  In an SMP system with N
processors, these are the new execution characteristics:

- N tasks execute in parallel

- hardware events result in interrupts

There is true parallelism with a task executing on each processor and the
possibility of interrupts occurring on each processor.  Thus, in contrast to
there being one task and one interrupt to consider on a uniprocessor, there
are N tasks and potentially N simultaneous interrupts to consider on an SMP
system.

This increase in hardware complexity and the presence of true parallelism
require the application developer to be even more cautious about mutual
exclusion and shared data access than in a uniprocessor embedded system.
Race conditions that never or rarely happened when an application executed on
a uniprocessor system become much more likely due to multiple threads
executing in parallel.  On a uniprocessor system, these race conditions would
only happen when a task switch occurred at just the wrong moment.  Now there
are N - 1 other tasks executing in parallel all the time, which results in
many more opportunities for small windows in critical sections to be hit.

.. index:: task affinity
.. index:: thread affinity

Task Affinity
-------------

RTEMS provides services to manipulate the affinity of a task.  Affinity is
used to specify the subset of processors in an SMP system on which a
particular task can execute.

By default, tasks have an affinity which allows them to execute on any
available processor.

Task affinity is a possible feature to be supported by SMP-aware schedulers.
However, only a subset of the available schedulers support affinity.
Although the behavior is scheduler-specific, if the scheduler does not
support affinity, it is likely to ignore all attempts to set affinity.

The scheduler with support for arbitrary processor affinities uses a proof of
concept implementation.  See https://devel.rtems.org/ticket/2510.
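
As an illustrative sketch, the affinity of the executing task could be
restricted to processor 0 as follows.  The function name is hypothetical and
error handling is minimal; the call only has an effect if the responsible
scheduler supports processor affinities.

.. code-block:: c

    #include <assert.h>
    #include <rtems.h>

    /* Sketch: pin the executing task to processor 0 */
    void pin_self_to_processor_zero( void )
    {
      cpu_set_t cpuset;
      rtems_status_code sc;

      CPU_ZERO( &cpuset );
      CPU_SET( 0, &cpuset );

      sc = rtems_task_set_affinity( RTEMS_SELF, sizeof( cpuset ), &cpuset );
      assert( sc == RTEMS_SUCCESSFUL );
    }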

.. index:: task migration
.. index:: thread migration

Task Migration
--------------

With more than one processor in the system, tasks can migrate from one
processor to another.  There are four reasons why tasks migrate in RTEMS.

- The scheduler changes explicitly via
  :ref:`rtems_task_set_scheduler() <rtems_task_set_scheduler>` or similar
  directives.

- The task processor affinity changes explicitly via
  :ref:`rtems_task_set_affinity() <rtems_task_set_affinity>` or similar
  directives.

- The task resumes execution after a blocking operation.  On a priority based
  scheduler it will evict the lowest priority task currently assigned to a
  processor in the processor set managed by the scheduler instance.

- The task moves temporarily to another scheduler instance due to locking
  protocols like the :ref:`MrsP` or the :ref:`OMIP`.

Task migration should be avoided so that the working set of a task can stay
on the most local cache level.

.. _ClusteredScheduling:

Clustered Scheduling
--------------------

The scheduler is responsible for assigning processors to some of the threads
which are ready to execute.  Trouble starts if more ready threads than
processors exist at the same time.  There are various rules how the processor
assignment can be performed attempting to fulfill additional constraints or
yield some overall system properties.  As a matter of fact, it is impossible
to meet all requirements at the same time.  The way a scheduler works
distinguishes real-time operating systems from general purpose operating
systems.

We have clustered scheduling in case the set of processors of a system is
partitioned into non-empty pairwise-disjoint subsets of processors.  These
subsets are called clusters.  Clusters with a cardinality of one are
partitions.  Each cluster is owned by exactly one scheduler instance.  In
case the cluster size equals the processor count, it is called global
scheduling.

Modern SMP systems have multi-layer caches.  An operating system which
neglects cache constraints in the scheduler will not yield good performance.
Real-time operating systems usually provide priority (fixed or job-level)
based schedulers so that each of the highest priority threads is assigned to
a processor.  Priority based schedulers have difficulties in providing cache
locality for threads and may suffer from excessive thread migrations
:cite:`Brandenburg:2011:SL` :cite:`Compagnin:2014:RUN`.  Schedulers that use
local run queues and some sort of load-balancing to improve the cache
utilization may not fulfill global constraints :cite:`Gujarati:2013:LPP` and
are more difficult to implement than one would normally expect
:cite:`Lozi:2016:LSDWC`.

Clustered scheduling was implemented for RTEMS SMP to make the best use of
the cache topology of a system and to keep the worst-case latencies under
control.  The low-level SMP locks use FIFO ordering, so the worst-case
run-time of operations increases with each processor involved.  The scheduler
configuration is quite flexible and done at link-time, see
:ref:`ConfigurationSchedulersClustered`.  It is possible to re-assign
processors to schedulers during run-time via
:ref:`rtems_scheduler_add_processor() <rtems_scheduler_add_processor>` and
:ref:`rtems_scheduler_remove_processor() <rtems_scheduler_remove_processor>`.
The schedulers are implemented in an object-oriented fashion.

The problem is to provide synchronization primitives for inter-cluster
synchronization (more than one cluster is involved in the synchronization
process).  In RTEMS, the following means are currently available:

- events,

- message queues,

- mutexes using the :ref:`OMIP`,

- mutexes using the :ref:`MrsP`, and

- binary and counting semaphores.

The clustered scheduling approach enables the separation of functions with
real-time requirements from functions that profit from fairness and high
throughput, provided the scheduler instances are fully decoupled and adequate
inter-cluster synchronization primitives are used.

To set the scheduler of a task see :ref:`rtems_scheduler_ident()
<rtems_scheduler_ident>` and :ref:`rtems_task_set_scheduler()
<rtems_task_set_scheduler>`.
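
For illustration, a task could be moved to another scheduler instance as
follows.  The scheduler name ``WRK0``, the function name, and the priority
value are assumptions from a hypothetical application configuration.

.. code-block:: c

    #include <assert.h>
    #include <rtems.h>

    /* Sketch: move a task to the scheduler instance named "WRK0" */
    void move_task_to_worker_scheduler( rtems_id task_id )
    {
      rtems_id scheduler_id;
      rtems_status_code sc;

      sc = rtems_scheduler_ident(
        rtems_build_name( 'W', 'R', 'K', '0' ),
        &scheduler_id
      );
      assert( sc == RTEMS_SUCCESSFUL );

      /* The priority (here 1) is valid in the new scheduler instance */
      sc = rtems_task_set_scheduler( task_id, scheduler_id, 1 );
      assert( sc == RTEMS_SUCCESSFUL );
    }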

OpenMP
------

OpenMP support for RTEMS is available via the GCC-provided libgomp.  There is
libgomp support for RTEMS in the POSIX configuration of libgomp since GCC 4.9
(requires a Newlib snapshot after 2015-03-12).  In GCC 6.1 or later (requires
a Newlib snapshot after 2015-07-30 for the self-contained synchronization
objects provided by <sys/lock.h>) there is a specialized libgomp
configuration for RTEMS which offers significantly better performance
compared to the POSIX configuration of libgomp.  In addition,
application-configurable thread pools for each scheduler instance are
available in GCC 6.1 or later.

The run-time configuration of libgomp is done via environment variables
documented in the `libgomp manual <https://gcc.gnu.org/onlinedocs/libgomp/>`_.
The environment variables are evaluated in a constructor function which
executes in the context of the first initialization task before the actual
initialization task function is called (just like a global C++ constructor).
To set application specific values, a higher priority constructor function
must be used to set up the environment variables.

.. code-block:: c

    #include <stdlib.h>

    void __attribute__((constructor(1000))) config_libgomp( void )
    {
        setenv( "OMP_DISPLAY_ENV", "VERBOSE", 1 );
        setenv( "GOMP_SPINCOUNT", "30000", 1 );
        setenv( "GOMP_RTEMS_THREAD_POOLS", "1$2@SCHD", 1 );
    }

The environment variable ``GOMP_RTEMS_THREAD_POOLS`` is RTEMS-specific.  It
determines the thread pools for each scheduler instance.  The format for
``GOMP_RTEMS_THREAD_POOLS`` is a list of optional
``<thread-pool-count>[$<priority>]@<scheduler-name>`` configurations
separated by ``:`` where:

- ``<thread-pool-count>`` is the thread pool count for this scheduler
  instance.

- ``$<priority>`` is an optional priority for the worker threads of a thread
  pool according to ``pthread_setschedparam``.  In case a priority value is
  omitted, then a worker thread will inherit the priority of the OpenMP
  master thread that created it.  The priority of the worker thread is not
  changed by libgomp after creation, even if a new OpenMP master thread using
  the worker has a different priority.

- ``@<scheduler-name>`` is the scheduler instance name according to the RTEMS
  application configuration.

In case no thread pool configuration is specified for a scheduler instance,
then each OpenMP master thread of this scheduler instance will use its own
dynamically allocated thread pool.  To limit the worker thread count of the
thread pools, each OpenMP master thread must call ``omp_set_num_threads``.

Let us suppose we have three scheduler instances ``IO``, ``WRK0``, and
``WRK1`` with ``GOMP_RTEMS_THREAD_POOLS`` set to ``"1@WRK0:3$4@WRK1"``.  Then
there are no thread pool restrictions for scheduler instance ``IO``.  In the
scheduler instance ``WRK0`` there is one thread pool available.  Since no
priority is specified for this scheduler instance, the worker thread inherits
the priority of the OpenMP master thread that created it.  In the scheduler
instance ``WRK1`` there are three thread pools available and their worker
threads run at priority four.

Atomic Operations
-----------------

There is no public RTEMS API for atomic operations.  It is recommended to use
the standard C `<stdatomic.h> <https://en.cppreference.com/w/c/atomic>`_ or
C++ `<atomic> <https://en.cppreference.com/w/cpp/atomic/atomic>`_ APIs in
applications.
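
For example, a counter shared between threads or with interrupt service
routines can be maintained lock-free with ``<stdatomic.h>``.  This is a
portable C11 sketch, not RTEMS-specific code.

.. code-block:: c

    #include <stdatomic.h>

    static atomic_uint counter;

    /* Atomic read-modify-write: safe without any lock, even if several
     * processors increment concurrently */
    unsigned int increment_and_get( void )
    {
      return atomic_fetch_add_explicit( &counter, 1, memory_order_relaxed )
        + 1;
    }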

Application Issues
==================

Most operating system services provided by the uniprocessor RTEMS are
available in SMP configurations as well.  However, applications designed for
a uniprocessor environment may need some changes to run correctly in an SMP
configuration.

As discussed earlier, SMP systems have opportunities for true parallelism
which was not possible on uniprocessor systems.  Consequently, multiple
techniques that provided adequate critical sections on uniprocessor systems
are unsafe on SMP systems.  In this section, some of these unsafe techniques
will be discussed.

In general, applications must use proper operating system provided mutual
exclusion mechanisms to ensure correct behavior.

Task variables
--------------

Task variables are ordinary global variables with a dedicated value for each
thread.  During a context switch from the executing thread to the heir
thread, the value of each task variable is saved to the thread control block
of the executing thread and restored from the thread control block of the
heir thread.  This is inherently broken if more than one executing thread
exists.  Alternatives to task variables are POSIX keys and :term:`TLS`.  All
use cases of task variables in the RTEMS code base were replaced with
alternatives.  The task variable API has been removed in RTEMS 5.1.
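
A TLS-based replacement for a task variable can be sketched as follows; each
thread transparently operates on its own copy, so no save and restore during
context switches is necessary.  The variable and function names are
hypothetical.

.. code-block:: c

    /* One instance of this variable exists per thread */
    static _Thread_local int task_specific_error;

    void set_task_error( int error )
    {
      task_specific_error = error; /* only the executing thread's copy */
    }

    int get_task_error( void )
    {
      return task_specific_error;
    }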

Highest Priority Thread Never Walks Alone
-----------------------------------------

On a uniprocessor system, it is safe to assume that when the highest priority
task in an application executes, it will execute without being preempted
until it voluntarily blocks.  Interrupts may occur while it is executing, but
there will be no context switch to another task unless the highest priority
task voluntarily initiates it.

Given the assumption that no other tasks will have their execution
interleaved with the highest priority task, it is possible for this task to
be constructed such that it does not need to acquire a mutex for protected
access to shared data.

In an SMP system, it cannot be assumed that only a single task is executing.
It should be assumed that every processor is executing another application
task.  Further, those tasks will be ones which would not have been executed
in a uniprocessor configuration and should be assumed to have data
synchronization conflicts with what was formerly the highest priority task
which executed without conflict.

Disabling of Thread Preemption
------------------------------

A thread which disables preemption prevents a higher priority thread from
involuntarily getting hold of its processor.  In uniprocessor configurations,
this can be used to ensure mutual exclusion at thread level.  In SMP
configurations, however, more than one executing thread may exist.  Thus, it
is impossible to ensure mutual exclusion using this mechanism.  In order to
prevent applications which use preemption for this purpose from showing
inappropriate behaviour, this feature is disabled in SMP configurations and
its use causes run-time errors.

Disabling of Interrupts
-----------------------

A low overhead means to ensure mutual exclusion in uniprocessor
configurations is the disabling of interrupts around a critical section.
This is commonly used in device driver code.  In SMP configurations, however,
disabling the interrupts on one processor has no effect on other processors.
So, this is insufficient to ensure system-wide mutual exclusion.  The macros

* :ref:`rtems_interrupt_disable() <rtems_interrupt_disable>`,

* :ref:`rtems_interrupt_enable() <rtems_interrupt_enable>`, and

* :ref:`rtems_interrupt_flash() <rtems_interrupt_flash>`

are disabled in SMP configurations and their use will cause compile-time
warnings and link-time errors.  In the unlikely case that interrupts must be
disabled on the current processor, the

* :ref:`rtems_interrupt_local_disable() <rtems_interrupt_local_disable>`, and

* :ref:`rtems_interrupt_local_enable() <rtems_interrupt_local_enable>`

macros are available in all configurations.

Since disabling of interrupts is insufficient to ensure system-wide mutual
exclusion on SMP, a new low-level synchronization primitive was added: the
interrupt locks.  The interrupt locks are a simple API layer on top of the
SMP locks used for low-level synchronization in the operating system core.
Currently, they are implemented as a ticket lock.  In uniprocessor
configurations, they degenerate to simple interrupt disable/enable sequences
by means of the C pre-processor.  Acquiring a single interrupt lock in a
nested way is not allowed and will result in an infinite loop with interrupts
disabled.  While converting legacy code to interrupt locks, care must be
taken to avoid this situation.

.. code-block:: c
    :linenos:

    #include <rtems.h>

    void legacy_code_with_interrupt_disable_enable( void )
    {
      rtems_interrupt_level level;

      rtems_interrupt_disable( level );
      /* Critical section */
      rtems_interrupt_enable( level );
    }

    RTEMS_INTERRUPT_LOCK_DEFINE( static, lock, "Name" )

    void smp_ready_code_with_interrupt_lock( void )
    {
      rtems_interrupt_lock_context lock_context;

      rtems_interrupt_lock_acquire( &lock, &lock_context );
      /* Critical section */
      rtems_interrupt_lock_release( &lock, &lock_context );
    }

An alternative to the RTEMS-specific interrupt locks are POSIX spinlocks.
The :c:type:`pthread_spinlock_t` is defined as a self-contained object, i.e.
the user must provide the storage for this synchronization object.

.. code-block:: c
    :linenos:

    #include <assert.h>
    #include <pthread.h>

    pthread_spinlock_t lock;

    void smp_ready_code_with_posix_spinlock( void )
    {
      int error;

      error = pthread_spin_lock( &lock );
      assert( error == 0 );
      /* Critical section */
      error = pthread_spin_unlock( &lock );
      assert( error == 0 );
    }

In contrast to the POSIX spinlock implementations on Linux or FreeBSD, it is
not allowed to call blocking operating system services inside the critical
section.  A recursive lock attempt is a severe usage error resulting in an
infinite loop with interrupts disabled.  Nesting of different locks is
allowed.  The user must ensure that no deadlock can occur.  As a non-portable
feature the locks are zero-initialized, i.e. statically initialized global
locks reside in the ``.bss`` section and there is no need to call
:c:func:`pthread_spin_init`.

Interrupt Service Routines Execute in Parallel With Threads
-----------------------------------------------------------

On a machine with more than one processor, interrupt service routines (this
includes timer service routines installed via :ref:`rtems_timer_fire_after()
<rtems_timer_fire_after>`) and threads can execute in parallel.  Interrupt
service routines must take this into account and use proper locking
mechanisms to protect critical sections from interference by threads
(interrupt locks or POSIX spinlocks).  This likely requires code
modifications in legacy device drivers.

Timers Do Not Stop Immediately
------------------------------

Timer service routines run in the context of the clock interrupt.  On
uniprocessor configurations, it is sufficient to disable interrupts and
remove a timer from the set of active timers to stop it.  In SMP
configurations, however, the timer service routine may already run and wait
on an SMP lock owned by the thread which is about to stop the timer.  This
opens the door to subtle synchronization issues.  During destruction of
objects, special care must be taken to ensure that timer service routines
cannot access (partly or fully) destroyed objects.

False Sharing of Cache Lines Due to Objects Table
-------------------------------------------------

The Classic API and most POSIX API objects are indirectly accessed via an
object identifier.  The user-level functions validate the object identifier
and map it to the actual object structure which resides in a global objects
table for each object class.  So, unrelated objects are packed together in a
table.  This may result in false sharing of cache lines.  The effect of false
sharing of cache lines can be observed with the `TMFINE 1
<https://git.rtems.org/rtems/tree/testsuites/tmtests/tmfine01>`_ test program
on a suitable platform, e.g. QorIQ T4240.  High-performance SMP applications
need full control of the object storage :cite:`Drepper:2007:Memory`.
Therefore, self-contained synchronization objects are now available for
RTEMS.
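
With full control of the object storage, false sharing can be avoided by
padding each object to a cache-line boundary.  The following is a C11 sketch;
the cache-line size of 64 bytes is an assumption that must be adjusted to the
target.

.. code-block:: c

    #include <stdalign.h>

    /* Hypothetical cache-line size; target-specific in practice */
    #define CACHE_LINE_SIZE 64

    /* Each counter occupies its own cache line, so updates by different
     * processors do not invalidate each other's cache lines */
    typedef struct {
      alignas( CACHE_LINE_SIZE ) unsigned long value;
    } per_processor_counter;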

Implementation Details
======================

This section covers some implementation details of the RTEMS SMP support.

Low-Level Synchronization
-------------------------

All low-level synchronization primitives are implemented using :term:`C11`
atomic operations, so no target-specific hand-written assembler code is
necessary.  Four synchronization primitives are currently available:

* ticket locks (mutual exclusion),

* :term:`MCS` locks (mutual exclusion),

* barriers, implemented as a sense barrier, and

* sequence locks :cite:`Boehm:2012:Seqlock`.

A vital requirement for low-level mutual exclusion is :term:`FIFO` fairness
since we are interested in a predictable system and not maximum throughput.
With this requirement, there are only a few options to resolve this problem.
For reasons of simplicity, the ticket lock algorithm was chosen to implement
the SMP locks.  However, the API is capable of supporting MCS locks, which
may be interesting in the future for systems with a processor count in the
range of 32 or more, e.g. :term:`NUMA`, many-core systems.
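
The ticket lock algorithm can be sketched with C11 atomics as follows.  This
is an illustrative version, not the RTEMS implementation; it omits the
profiling hooks and API layers.

.. code-block:: c

    #include <stdatomic.h>

    typedef struct {
      atomic_uint next_ticket; /* ticket dispenser */
      atomic_uint now_serving; /* ticket currently allowed to proceed */
    } ticket_lock;

    void ticket_lock_acquire( ticket_lock *lock )
    {
      unsigned int my_ticket = atomic_fetch_add_explicit(
        &lock->next_ticket, 1, memory_order_relaxed );

      /* FIFO fairness: processors enter in the order they drew tickets */
      while ( atomic_load_explicit( &lock->now_serving, memory_order_acquire )
          != my_ticket ) {
        /* busy wait */
      }
    }

    void ticket_lock_release( ticket_lock *lock )
    {
      atomic_store_explicit( &lock->now_serving,
        atomic_load_explicit( &lock->now_serving, memory_order_relaxed ) + 1,
        memory_order_release );
    }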

The test program `SMPLOCK 1
<https://git.rtems.org/rtems/tree/testsuites/smptests/smplock01>`_ can be
used to gather performance and fairness data for several scenarios.  The SMP
lock performance and fairness measured on the QorIQ T4240 follows as an
example.  This chip contains three L2 caches.  Each L2 cache is shared by
eight processors.

.. image:: ../images/c_user/smplock01perf-t4240.*
   :width: 400
   :align: center

.. image:: ../images/c_user/smplock01fair-t4240.*
   :width: 400
   :align: center

Internal Locking
----------------

In SMP configurations, the operating system uses non-recursive SMP locks for
low-level mutual exclusion.  The locking domains are roughly

* a particular data structure,
* the thread queue operations,
* the thread state changes, and
* the scheduler operations.

For a good average-case performance it is vital that every high-level
synchronization object, e.g. mutex, has its own SMP lock.  In the
average-case, only this SMP lock should be involved to carry out a specific
operation, e.g. obtain/release a mutex.  In general, the high-level
synchronization objects have a thread queue embedded and use its SMP lock.

In case a thread must block on a thread queue, then things get complicated.
The executing thread first acquires the SMP lock of the thread queue and then
figures out that it needs to block.  The procedure to block the thread on
this particular thread queue involves state changes of the thread itself,
and for this, thread-specific SMP locks must be used.

In order to determine if a thread is blocked on a thread queue or not,
thread-specific SMP locks must be used.  A thread priority change must
propagate this to the thread queue (possibly recursively).  Care must be
taken to not have a lock order reversal between thread queue and
thread-specific SMP locks.

Each scheduler instance has its own SMP lock.  For the scheduler helping
protocol, multiple scheduler instances may be in charge of a thread.  It is
not possible to acquire two scheduler instance SMP locks at the same time,
otherwise deadlocks would happen.  A thread-specific SMP lock is used to
synchronize the thread data shared by different scheduler instances.

The thread state SMP lock protects various things, e.g. the thread state,
join operations, signals, post-switch actions, the home scheduler instance,
etc.

Profiling
---------

To identify the bottlenecks in the system, support for profiling of low-level
synchronization is optionally available.  The profiling support is an RTEMS
build time configuration option and is implemented with an acceptable
overhead, even for production systems.  A low-overhead counter for short time
intervals must be provided by the hardware.

Profiling reports are generated in XML for most test programs of the RTEMS
testsuite (more than 500 test programs).  This gives a good sample set for
statistics.  For example, the maximum thread dispatch disable time, the
maximum interrupt latency or lock contention can be determined.

.. code-block:: xml

   <ProfilingReport name="SMPMIGRATION 1">
     <PerCPUProfilingReport processorIndex="0">
       <MaxThreadDispatchDisabledTime unit="ns">36636</MaxThreadDispatchDisabledTime>
       <MeanThreadDispatchDisabledTime unit="ns">5065</MeanThreadDispatchDisabledTime>
       <TotalThreadDispatchDisabledTime unit="ns">3846635988
         </TotalThreadDispatchDisabledTime>
       <ThreadDispatchDisabledCount>759395</ThreadDispatchDisabledCount>
       <MaxInterruptDelay unit="ns">8772</MaxInterruptDelay>
       <MaxInterruptTime unit="ns">13668</MaxInterruptTime>
       <MeanInterruptTime unit="ns">6221</MeanInterruptTime>
       <TotalInterruptTime unit="ns">6757072</TotalInterruptTime>
       <InterruptCount>1086</InterruptCount>
     </PerCPUProfilingReport>
     <PerCPUProfilingReport processorIndex="1">
       <MaxThreadDispatchDisabledTime unit="ns">39408</MaxThreadDispatchDisabledTime>
       <MeanThreadDispatchDisabledTime unit="ns">5060</MeanThreadDispatchDisabledTime>
       <TotalThreadDispatchDisabledTime unit="ns">3842749508
         </TotalThreadDispatchDisabledTime>
       <ThreadDispatchDisabledCount>759391</ThreadDispatchDisabledCount>
       <MaxInterruptDelay unit="ns">8412</MaxInterruptDelay>
       <MaxInterruptTime unit="ns">15868</MaxInterruptTime>
       <MeanInterruptTime unit="ns">3525</MeanInterruptTime>
       <TotalInterruptTime unit="ns">3814476</TotalInterruptTime>
       <InterruptCount>1082</InterruptCount>
     </PerCPUProfilingReport>
     <!-- more reports omitted -->
     <SMPLockProfilingReport name="Scheduler">
       <MaxAcquireTime unit="ns">7092</MaxAcquireTime>
       <MaxSectionTime unit="ns">10984</MaxSectionTime>
       <MeanAcquireTime unit="ns">2320</MeanAcquireTime>
       <MeanSectionTime unit="ns">199</MeanSectionTime>
       <TotalAcquireTime unit="ns">3523939244</TotalAcquireTime>
       <TotalSectionTime unit="ns">302545596</TotalSectionTime>
       <UsageCount>1518758</UsageCount>
       <ContentionCount initialQueueLength="0">759399</ContentionCount>
       <ContentionCount initialQueueLength="1">759359</ContentionCount>
       <ContentionCount initialQueueLength="2">0</ContentionCount>
       <ContentionCount initialQueueLength="3">0</ContentionCount>
     </SMPLockProfilingReport>
   </ProfilingReport>
644
645Scheduler Helping Protocol
646--------------------------
647
648The scheduler provides a helping protocol to support locking protocols like the
649:ref:`OMIP` or the :ref:`MrsP`.  Each thread has a scheduler node for each
650scheduler instance in the system which are located in its :term:`TCB`.  A
651thread has exactly one home scheduler instance which is set during thread
652creation.  The home scheduler instance can be changed with
653:ref:`rtems_task_set_scheduler() <rtems_task_set_scheduler>`.  Due to the
654locking protocols a thread may gain access to scheduler nodes of other
655scheduler instances.  This allows the thread to temporarily migrate to another
656scheduler instance in case of preemption.
657
658The scheduler infrastructure is based on an object-oriented design.  The
659scheduler operations for a thread are defined as virtual functions.  For the
660scheduler helping protocol the following operations must be implemented by an
661SMP-aware scheduler
662
663* ask a scheduler node for help,
664* reconsider the help request of a scheduler node,
665* withdraw a schedule node.
666
667All currently available SMP-aware schedulers use a framework which is
668customized via inline functions.  This eases the implementation of scheduler
669variants.  Up to now, only priority-based schedulers are implemented.
670
671In case a thread is allowed to use more than one scheduler node it will ask
672these nodes for help
673
674* in case of preemption, or
675* an unblock did not schedule the thread, or
676* a yield  was successful.
677
The actual ask for help scheduler operations are carried out as a side-effect
of the thread dispatch procedure.  Once a need for help is recognized, a help
request is registered in one of the processors related to the thread and a
thread dispatch is issued.  This indirection leads to a better decoupling of
scheduler instances.  Unrelated processors are not burdened with extra work for
threads which participate in resource sharing.  Each ask for help operation
indicates whether it could help.  The procedure stops after the first
successful ask for help.  Unsuccessful ask for help operations register this
need in the scheduler context.

After a thread dispatch, the reconsider help request operation is used to clean
up stale help registrations in the scheduler contexts.

The withdraw operation takes away scheduler nodes once the thread is no longer
allowed to use them, e.g. it released a mutex.  The availability of scheduler
nodes for a thread is controlled by the thread queues.

.. _SMPThreadDispatchDetails:

Thread Dispatch Details
-----------------------

This section gives background information to developers interested in the
interrupt latencies introduced by thread dispatching.  A thread dispatch
consists of all work which must be done to stop the currently executing thread
on a processor and hand over this processor to an heir thread.

In SMP systems, scheduling decisions on one processor must be propagated
to other processors through inter-processor interrupts.  A thread dispatch
which must be carried out on another processor does not happen instantaneously.
Thus, several thread dispatch requests might be in flight and it is possible
that some of them may be out of date before the corresponding processor has
time to deal with them.  The thread dispatch mechanism uses three per-processor
variables,

- the executing thread,

- the heir thread, and

- a boolean flag indicating if a thread dispatch is necessary or not.

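These three variables can be sketched as follows.  This is a heavily reduced
model with made-up names; in RTEMS they are members of the much larger
``Per_CPU_Control`` structure, and the dispatch necessary indicator is set by
the inter-processor interrupt handler rather than by a direct assignment.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
  int id;
} Thread_Model; /* stand-in for Thread_Control */

/* Reduced model of the three per-processor thread dispatch variables. */
typedef struct {
  Thread_Model *executing;          /* thread currently running on this CPU */
  Thread_Model *heir;               /* thread that should run next          */
  volatile bool dispatch_necessary; /* set via inter-processor interrupt    */
} Per_CPU_Model;

/* A scheduler updates the heir with a normal store; the inter-processor
 * interrupt then sets the dispatch necessary indicator as a side-effect
 * (modeled here as a direct assignment). */
static void model_update_heir(Per_CPU_Model *cpu, Thread_Model *new_heir)
{
  cpu->heir = new_heir;           /* normal store operation     */
  cpu->dispatch_necessary = true; /* side-effect of the IPI     */
}

/* The heavy-weight thread dispatch sequence is only entered in case the
 * indicator is set; stale requests cost just this cheap check. */
static bool model_thread_dispatch(Per_CPU_Model *cpu)
{
  if (!cpu->dispatch_necessary) {
    return false;
  }
  cpu->dispatch_necessary = false;
  cpu->executing = cpu->heir;     /* the context switch happens here */
  return true;
}
```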
Updates of the heir thread are done via a normal store operation.  The thread
dispatch necessary indicator of another processor is set as a side-effect of an
inter-processor interrupt.  So, this change notification works without the use
of locks.  The thread context is protected by a :term:`TTAS` lock embedded in
the context to ensure that it is used on at most one processor at a time.
Normally, only thread-specific or per-processor locks are used during a thread
dispatch.  This implementation turned out to be quite efficient and no lock
contention was observed in the testsuite.  The heavy-weight thread dispatch
sequence is only entered in case the thread dispatch indicator is set.

The context-switch is performed with interrupts enabled.  During the transition
from the executing to the heir thread, neither the stack of the executing
thread nor the stack of the heir thread may be used during interrupt
processing.  For this purpose, a temporary per-processor stack is set up which
may be used by the interrupt prologue before the stack is switched to the
interrupt stack.

Per-Processor Data
------------------

RTEMS provides two means for per-processor data:

1. Per-processor data which is used by RTEMS itself is contained in the
   ``Per_CPU_Control`` structure.  The application configuration via
   ``<rtems/confdefs.h>`` creates a table of these structures
   (``_Per_CPU_Information[]``).  The table is dimensioned according to the
   count of configured processors
   (:ref:`CONFIGURE_MAXIMUM_PROCESSORS <CONFIGURE_MAXIMUM_PROCESSORS>`).

2. For low-level support libraries, an API for statically allocated
   per-processor data is available via
   `<rtems/score/percpudata.h> <https://git.rtems.org/rtems/tree/cpukit/include/rtems/score/percpudata.h>`_.
   This API is not intended for general application use.  Please ask on the
   development mailing list in case you want to use it.

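To illustrate the general idea (not the ``percpudata.h`` API itself, which is
internal), an application-level analogue of per-processor data is an aligned
array with one slot per configured processor, indexed by the current processor
number.  All names below are invented for the example; on RTEMS the index
would typically come from ``rtems_scheduler_get_processor()``.

```c
#include <assert.h>
#include <stdint.h>

/* Stands in for CONFIGURE_MAXIMUM_PROCESSORS in this sketch. */
#define MODEL_MAXIMUM_PROCESSORS 4

typedef struct {
  uint32_t event_count;
} My_Per_CPU_Data;

/* One slot per configured processor; cache-line alignment avoids false
 * sharing between processors writing to adjacent slots. */
static My_Per_CPU_Data my_data[MODEL_MAXIMUM_PROCESSORS]
  __attribute__((aligned(64)));

/* Return the slot of the given processor. */
static My_Per_CPU_Data *my_per_cpu_get(uint32_t cpu_index)
{
  assert(cpu_index < MODEL_MAXIMUM_PROCESSORS);
  return &my_data[cpu_index];
}
```

Note that accessing the slot of the *current* processor is only safe while the
thread cannot migrate, which is where thread pinning (next section) comes in.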
.. _ThreadPinning:

Thread Pinning
--------------

Thread pinning ensures that a thread is only dispatched to the processor on
which it is pinned.  It may be used to access per-processor data structures in
critical sections with enabled thread dispatching, e.g. a pinned thread is
allowed to block.  The ``_Thread_Pin()`` operation will pin the executing
thread to its current processor.  A thread may be pinned recursively; the last
unpin request via ``_Thread_Unpin()`` revokes the pinning.

Thread pinning should be used only for short critical sections and not all
the time.  Thread pinning is a very low overhead operation in case the
thread is not preempted during the pinning.  A preemption will result in
scheduler operations to ensure that the thread executes only on its pinned
processor.  Thread pinning must be used with care, since it prevents help
through the locking protocols.  This makes the :ref:`OMIP <OMIP>` and
:ref:`MrsP <MrsP>` locking protocols ineffective if pinned threads are
involved.

Thread pinning is not intended for general application use.  Please ask on
the development mailing list in case you want to use it.
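The recursive pin/unpin semantics can be modeled with a simple counter.  This
is an illustrative model only: ``_Thread_Pin()`` and ``_Thread_Unpin()`` are
internal superscore operations acting on the thread control block, and the
names below are local to the example.

```c
#include <assert.h>

/* Model of recursive thread pinning: the pin level counts nested pin
 * requests and only the last unpin revokes the pinning. */
typedef struct {
  int pin_level; /* 0 means the thread is not pinned */
} Pin_Model;

static void model_pin(Pin_Model *thread)
{
  ++thread->pin_level; /* models _Thread_Pin() on the executing thread */
}

static void model_unpin(Pin_Model *thread)
{
  assert(thread->pin_level > 0); /* unbalanced unpin is a usage error */
  --thread->pin_level;           /* models _Thread_Unpin()            */
}

static int model_is_pinned(const Pin_Model *thread)
{
  return thread->pin_level > 0;
}
```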