.. comment SPDX-License-Identifier: CC-BY-SA-4.0

.. COMMENT: COPYRIGHT (c) 2014.
.. COMMENT: On-Line Applications Research Corporation (OAR).
.. COMMENT: Copyright (c) 2017 embedded brains GmbH.
.. COMMENT: All rights reserved.

.. index:: Symmetric Multiprocessing
.. index:: SMP

Symmetric Multiprocessing (SMP)
*******************************

Introduction
============

The Symmetric Multiprocessing (SMP) support of RTEMS is available on

- ARMv7-A,

- PowerPC, and

- SPARC.

.. warning::

    The SMP support must be explicitly enabled via the ``--enable-smp``
    configure command line option for the :term:`BSP` build.

RTEMS is supposed to be a real-time operating system.  What does this mean in
the context of SMP?  The RTEMS interpretation of real-time on SMP is the
support for :ref:`ClusteredScheduling` with priority based schedulers and
adequate locking protocols.  One aim is to enable a schedulability analysis
under the sporadic task model :cite:`Brandenburg:2011:SL`
:cite:`Burns:2013:MrsP`.

The directives provided by the SMP support are:

- rtems_get_processor_count_ - Get processor count

- rtems_get_current_processor_ - Get current processor index

Background
==========

Application Configuration
-------------------------

By default, the maximum processor count is set to one in the application
configuration.  To enable SMP, the application configuration option
:ref:`CONFIGURE_MAXIMUM_PROCESSORS <CONFIGURE_MAXIMUM_PROCESSORS>` must be
defined to a value greater than one.  It is recommended to use the smallest
value suitable for the application in order to save memory.  Each processor
needs an idle thread and an interrupt stack, for example.
---|

The default scheduler for SMP applications supports up to 32 processors and
is a global fixed priority scheduler, see also
:ref:`ConfigurationSchedulersClustered`.

The following compile-time test can be used to check if the SMP support is
available or not.

.. code-block:: c

    #include <rtems.h>

    #ifdef RTEMS_SMP
    #warning "SMP support is enabled"
    #else
    #warning "SMP support is disabled"
    #endif

Examples
--------

For example applications see `testsuites/smptests
<https://git.rtems.org/rtems/tree/testsuites/smptests>`_.

Uniprocessor versus SMP Parallelism
-----------------------------------

Uniprocessor systems have long been used in embedded systems.  In this
hardware model, there are some system execution characteristics which have
long been taken for granted:

- one task executes at a time

- hardware events result in interrupts

There is no true parallelism.  Even when interrupts appear to occur at the
same time, they are processed in largely a serial fashion.  This is true even
when the interrupt service routines are allowed to nest.  From a tasking
viewpoint, it is the responsibility of the real-time operating system to
simulate parallelism by switching between tasks.  These task switches occur
in response to hardware interrupt events and explicit application events such
as blocking for a resource or delaying.

With symmetric multiprocessing, the presence of multiple processors allows
for true concurrency and provides for cost-effective performance
improvements.  Uniprocessors tend to increase performance by increasing clock
speed and complexity.  This tends to lead to hot, power-hungry
microprocessors which are poorly suited for many embedded applications.

The true concurrency is in sharp contrast to the single task and interrupt
model of uniprocessor systems.  This results in a fundamental change to the
uniprocessor system characteristics listed above.  Developers are faced with
a different set of characteristics which, in turn, break some existing
assumptions and result in new challenges.  In an SMP system with N
processors, these are the new execution characteristics:

- N tasks execute in parallel

- hardware events result in interrupts

There is true parallelism with a task executing on each processor and the
possibility of interrupts occurring on each processor.  Thus, in contrast to
there being one task and one interrupt to consider on a uniprocessor, there
are N tasks and potentially N simultaneous interrupts to consider on an SMP
system.

This increase in hardware complexity and presence of true parallelism results
in the application developer needing to be even more cautious about mutual
exclusion and shared data access than in a uniprocessor embedded system.
Race conditions that never or rarely happened when an application executed on
a uniprocessor system become much more likely due to multiple threads
executing in parallel.  On a uniprocessor system, these race conditions would
only happen when a task switch occurred at just the wrong moment.  Now there
are N-1 tasks executing in parallel all the time and this results in many
more opportunities for small windows in critical sections to be hit.

.. index:: task affinity
.. index:: thread affinity

Task Affinity
-------------

RTEMS provides services to manipulate the affinity of a task.  Affinity is
used to specify the subset of processors in an SMP system on which a
particular task can execute.

By default, tasks have an affinity which allows them to execute on any
available processor.

Task affinity is a possible feature to be supported by SMP-aware schedulers.
However, only a subset of the available schedulers support affinity.
Although the behavior is scheduler-specific, if the scheduler does not
support affinity, it is likely to ignore all attempts to set affinity.

The scheduler with support for arbitrary processor affinities uses a proof of
concept implementation.  See https://devel.rtems.org/ticket/2510.
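
As a sketch of the affinity API, the following function pins the calling task
to processor zero.  This assumes a scheduler with affinity support; with
other schedulers the request may be ignored or rejected, as described above.
It is RTEMS-specific code and does not build on a host system.

```c
#include <rtems.h>
#include <assert.h>

/* Sketch: restrict the calling task to processor 0.  Requires a
 * scheduler with affinity support; otherwise the request may be
 * ignored or rejected. */
void pin_self_to_processor_zero( void )
{
  rtems_status_code sc;
  cpu_set_t         cpuset;

  CPU_ZERO( &cpuset );
  CPU_SET( 0, &cpuset );

  sc = rtems_task_set_affinity( RTEMS_SELF, sizeof( cpuset ), &cpuset );
  assert( sc == RTEMS_SUCCESSFUL );
}
```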
---|

.. index:: task migration
.. index:: thread migration

Task Migration
--------------

With more than one processor in the system, tasks can migrate from one
processor to another.  There are four reasons why tasks migrate in RTEMS.

- The scheduler changes explicitly via
  :ref:`rtems_task_set_scheduler() <rtems_task_set_scheduler>` or similar
  directives.

- The task processor affinity changes explicitly via
  :ref:`rtems_task_set_affinity() <rtems_task_set_affinity>` or similar
  directives.

- The task resumes execution after a blocking operation.  On a priority based
  scheduler, it will evict the lowest priority task currently assigned to a
  processor in the processor set managed by the scheduler instance.

- The task moves temporarily to another scheduler instance due to locking
  protocols like the :ref:`MrsP` or the :ref:`OMIP`.

Task migration should be avoided so that the working set of a task can stay
on the most local cache level.

.. _ClusteredScheduling:

Clustered Scheduling
--------------------

The scheduler is responsible for assigning processors to some of the threads
which are ready to execute.  Trouble starts if more ready threads than
processors exist at the same time.  There are various rules for how the
processor assignment can be performed, attempting to fulfill additional
constraints or yield some overall system properties.  As a matter of fact, it
is impossible to meet all requirements at the same time.  The way a scheduler
works distinguishes real-time operating systems from general purpose
operating systems.

We have clustered scheduling in case the set of processors of a system is
partitioned into non-empty pairwise-disjoint subsets of processors.  These
subsets are called clusters.  Clusters with a cardinality of one are
partitions.  Each cluster is owned by exactly one scheduler instance.  In
case the cluster size equals the processor count, it is called global
scheduling.

Modern SMP systems have multi-layer caches.  An operating system which
neglects cache constraints in the scheduler will not yield good performance.
Real-time operating systems usually provide priority (fixed or job-level)
based schedulers so that each of the highest priority threads is assigned to
a processor.  Priority based schedulers have difficulties in providing cache
locality for threads and may suffer from excessive thread migrations
:cite:`Brandenburg:2011:SL` :cite:`Compagnin:2014:RUN`.  Schedulers that use
local run queues and some sort of load-balancing to improve the cache
utilization may not fulfill global constraints :cite:`Gujarati:2013:LPP` and
are more difficult to implement than one would normally expect
:cite:`Lozi:2016:LSDWC`.

Clustered scheduling was implemented for RTEMS SMP to best use the cache
topology of a system and to keep the worst-case latencies under control.  The
low-level SMP locks use FIFO ordering, so the worst-case run-time of
operations increases with each processor involved.  The scheduler
configuration is quite flexible and done at link-time, see :ref:`Configuring
Clustered Schedulers`.  It is possible to re-assign processors to schedulers
during run-time via :ref:`rtems_scheduler_add_processor()
<rtems_scheduler_add_processor>` and :ref:`rtems_scheduler_remove_processor()
<rtems_scheduler_remove_processor>`.  The schedulers are implemented in an
object-oriented fashion.

The problem is to provide synchronization primitives for inter-cluster
synchronization (more than one cluster is involved in the synchronization
process).  In RTEMS, the following means are currently available:

- events,

- message queues,

- mutexes using the :ref:`OMIP`,

- mutexes using the :ref:`MrsP`, and

- binary and counting semaphores.

The clustered scheduling approach enables the separation of functions with
real-time requirements from functions that profit from fairness and high
throughput, provided the scheduler instances are fully decoupled and adequate
inter-cluster synchronization primitives are used.

To set the scheduler of a task, see :ref:`rtems_scheduler_ident()
<rtems_scheduler_ident>` and :ref:`rtems_task_set_scheduler()
<rtems_task_set_scheduler>`.
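
The two directives are typically combined as in the following sketch, which
moves a task to a scheduler instance.  The scheduler name ``WRK0`` and the
priority value are assumptions taken for illustration; the three-argument
form of ``rtems_task_set_scheduler()`` (with a priority for the task on the
new scheduler) matches newer RTEMS versions.

```c
#include <rtems.h>
#include <assert.h>

/* Sketch: move a task to the scheduler instance named "WRK0".  The
 * scheduler name is a placeholder; use a name from your application
 * configuration. */
void move_task_to_worker_scheduler( rtems_id task_id )
{
  rtems_status_code sc;
  rtems_id          scheduler_id;

  sc = rtems_scheduler_ident(
    rtems_build_name( 'W', 'R', 'K', '0' ),
    &scheduler_id
  );
  assert( sc == RTEMS_SUCCESSFUL );

  sc = rtems_task_set_scheduler( task_id, scheduler_id, 1 );
  assert( sc == RTEMS_SUCCESSFUL );
}
```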
---|

OpenMP
------

OpenMP support for RTEMS is available via the GCC-provided libgomp.  There is
libgomp support for RTEMS in the POSIX configuration of libgomp since GCC 4.9
(requires a Newlib snapshot after 2015-03-12).  In GCC 6.1 or later (requires
a Newlib snapshot after 2015-07-30 for <sys/lock.h> provided self-contained
synchronization objects) there is a specialized libgomp configuration for
RTEMS which offers a significantly better performance compared to the POSIX
configuration of libgomp.  In addition, application-configurable thread pools
for each scheduler instance are available in GCC 6.1 or later.

The run-time configuration of libgomp is done via environment variables
documented in the `libgomp manual <https://gcc.gnu.org/onlinedocs/libgomp/>`_.
The environment variables are evaluated in a constructor function which
executes in the context of the first initialization task before the actual
initialization task function is called (just like a global C++ constructor).
To set application-specific values, a higher priority constructor function
must be used to set up the environment variables.

.. code-block:: c

    #include <stdlib.h>
    void __attribute__((constructor(1000))) config_libgomp( void )
    {
      setenv( "OMP_DISPLAY_ENV", "VERBOSE", 1 );
      setenv( "GOMP_SPINCOUNT", "30000", 1 );
      setenv( "GOMP_RTEMS_THREAD_POOLS", "1$2@SCHD", 1 );
    }

The environment variable ``GOMP_RTEMS_THREAD_POOLS`` is RTEMS-specific.  It
determines the thread pools for each scheduler instance.  The format for
``GOMP_RTEMS_THREAD_POOLS`` is a list of optional
``<thread-pool-count>[$<priority>]@<scheduler-name>`` configurations
separated by ``:`` where:

- ``<thread-pool-count>`` is the thread pool count for this scheduler
  instance.

- ``$<priority>`` is an optional priority for the worker threads of a thread
  pool according to ``pthread_setschedparam``.  In case a priority value is
  omitted, then a worker thread will inherit the priority of the OpenMP
  master thread that created it.  The priority of the worker thread is not
  changed by libgomp after creation, even if a new OpenMP master thread using
  the worker has a different priority.

- ``@<scheduler-name>`` is the scheduler instance name according to the RTEMS
  application configuration.

In case no thread pool configuration is specified for a scheduler instance,
then each OpenMP master thread of this scheduler instance will use its own
dynamically allocated thread pool.  To limit the worker thread count of the
thread pools, each OpenMP master thread must call ``omp_set_num_threads``.

Let us suppose we have three scheduler instances ``IO``, ``WRK0``, and
``WRK1`` with ``GOMP_RTEMS_THREAD_POOLS`` set to ``"1@WRK0:3$4@WRK1"``.  Then
there are no thread pool restrictions for scheduler instance ``IO``.  In the
scheduler instance ``WRK0`` there is one thread pool available.  Since no
priority is specified for this scheduler instance, the worker thread inherits
the priority of the OpenMP master thread that created it.  In the scheduler
instance ``WRK1`` there are three thread pools available and their worker
threads run at priority four.
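
A minimal OpenMP work-sharing kernel that could run under such a
configuration is sketched below.  It is plain C that also compiles without
OpenMP support, in which case the pragma is ignored and the loop runs
serially with the same result.

```c
/* Sum an array, splitting the iterations across OpenMP worker threads
 * when OpenMP is enabled (e.g. with -fopenmp); the reduction clause
 * makes the result independent of the thread count. */
int parallel_sum( const int *values, int n )
{
  int sum = 0;
  int i;

  #pragma omp parallel for reduction( +: sum )
  for ( i = 0; i < n; ++i ) {
    sum += values[ i ];
  }

  return sum;
}
```

For the input ``{ 1, 2, 3, 4, 5 }`` the result is 15 regardless of how many
worker threads the configured thread pools provide.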
---|

Application Issues
==================

Most operating system services provided by the uni-processor RTEMS are
available in SMP configurations as well.  However, applications designed for
a uni-processor environment may need some changes to correctly run in an SMP
configuration.

As discussed earlier, SMP systems have opportunities for true parallelism
which was not possible on uni-processor systems.  Consequently, multiple
techniques that provided adequate critical sections on uni-processor systems
are unsafe on SMP systems.  In this section, some of these unsafe techniques
will be discussed.

In general, applications must use proper operating system provided mutual
exclusion mechanisms to ensure correct behavior.

Task variables
--------------

Task variables are ordinary global variables with a dedicated value for each
thread.  During a context switch from the executing thread to the heir
thread, the value of each task variable is saved to the thread control block
of the executing thread and restored from the thread control block of the
heir thread.  This is inherently broken if more than one executing thread
exists.  Alternatives to task variables are POSIX keys and :term:`TLS`.  All
use cases of task variables in the RTEMS code base were replaced with
alternatives.  The task variable API has been removed in RTEMS 5.1.
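
As a sketch of the TLS alternative, the C11 ``_Thread_local`` keyword (the
GCC ``__thread`` extension behaves similarly) gives every thread its own
instance of a variable without any save/restore work at context-switch time:

```c
/* Each thread sees its own instance of this counter.  Unlike the
 * removed task variable API, no copying into or out of the thread
 * control block happens on a context switch.  _Thread_local is a
 * C11 keyword, so no header is required. */
static _Thread_local int per_thread_counter;

int bump_counter( void )
{
  return ++per_thread_counter;
}
```

Two threads calling ``bump_counter()`` concurrently each count from one in
their own copy; no mutual exclusion is needed for the counter itself.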
---|

Highest Priority Thread Never Walks Alone
-----------------------------------------

On a uni-processor system, it is safe to assume that when the highest
priority task in an application executes, it will execute without being
preempted until it voluntarily blocks.  Interrupts may occur while it is
executing, but there will be no context switch to another task unless the
highest priority task voluntarily initiates it.

Given the assumption that no other tasks will have their execution
interleaved with the highest priority task, it is possible for this task to
be constructed such that it does not need to acquire a mutex for protected
access to shared data.

In an SMP system, it cannot be assumed that only a single task is executing
at any point in time.  It should be assumed that every processor is executing
another application task.  Further, those tasks will be ones which would not
have been executed in a uni-processor configuration and should be assumed to
have data synchronization conflicts with what was formerly the highest
priority task which executed without conflict.

Disabling of Thread Preemption
------------------------------

A thread which disables preemption prevents a higher priority thread from
involuntarily taking over its processor.  In uni-processor configurations,
this can be used to ensure mutual exclusion at thread level.  In SMP
configurations, however, more than one executing thread may exist.  Thus, it
is impossible to ensure mutual exclusion using this mechanism.  In order to
prevent applications which use preemption for this purpose from showing
inappropriate behaviour, this feature is disabled in SMP configurations and
its use would cause run-time errors.
---|

Disabling of Interrupts
-----------------------

A low-overhead means of ensuring mutual exclusion in uni-processor
configurations is the disabling of interrupts around a critical section.
This is commonly used in device driver code.  In SMP configurations, however,
disabling the interrupts on one processor has no effect on other processors.
So, this is insufficient to ensure system-wide mutual exclusion.  The macros

* :ref:`rtems_interrupt_disable() <rtems_interrupt_disable>`,

* :ref:`rtems_interrupt_enable() <rtems_interrupt_enable>`, and

* :ref:`rtems_interrupt_flash() <rtems_interrupt_flash>`

are disabled in SMP configurations and their use will cause compile-time
warnings and link-time errors.  In the unlikely case that interrupts must be
disabled on the current processor, the

* :ref:`rtems_interrupt_local_disable() <rtems_interrupt_local_disable>` and

* :ref:`rtems_interrupt_local_enable() <rtems_interrupt_local_enable>`

macros are now available in all configurations.

Since the disabling of interrupts is insufficient to ensure system-wide
mutual exclusion on SMP, a new low-level synchronization primitive was added
-- interrupt locks.  The interrupt locks are a simple API layer on top of the
SMP locks used for low-level synchronization in the operating system core.
Currently, they are implemented as a ticket lock.  In uni-processor
configurations, they degenerate to simple interrupt disable/enable sequences
by means of the C pre-processor.  It is disallowed to acquire a single
interrupt lock in a nested way.  This will result in an infinite loop with
interrupts disabled.  While converting legacy code to interrupt locks, care
must be taken to avoid this situation.

.. code-block:: c
    :linenos:

    #include <rtems.h>

    void legacy_code_with_interrupt_disable_enable( void )
    {
      rtems_interrupt_level level;

      rtems_interrupt_disable( level );
      /* Critical section */
      rtems_interrupt_enable( level );
    }

    RTEMS_INTERRUPT_LOCK_DEFINE( static, lock, "Name" )

    void smp_ready_code_with_interrupt_lock( void )
    {
      rtems_interrupt_lock_context lock_context;

      rtems_interrupt_lock_acquire( &lock, &lock_context );
      /* Critical section */
      rtems_interrupt_lock_release( &lock, &lock_context );
    }

An alternative to the RTEMS-specific interrupt locks are POSIX spinlocks.
The :c:type:`pthread_spinlock_t` is defined as a self-contained object,
i.e. the user must provide the storage for this synchronization object.

.. code-block:: c
    :linenos:

    #include <assert.h>
    #include <pthread.h>

    pthread_spinlock_t lock;

    void smp_ready_code_with_posix_spinlock( void )
    {
      int error;

      error = pthread_spin_lock( &lock );
      assert( error == 0 );
      /* Critical section */
      error = pthread_spin_unlock( &lock );
      assert( error == 0 );
    }

In contrast to the POSIX spinlock implementations on Linux or FreeBSD, it is
not allowed to call blocking operating system services inside the critical
section.  A recursive lock attempt is a severe usage error resulting in an
infinite loop with interrupts disabled.  Nesting of different locks is
allowed.  The user must ensure that no deadlock can occur.  As a non-portable
feature, the locks are zero-initialized, i.e. statically initialized global
locks reside in the ``.bss`` section and there is no need to call
:c:func:`pthread_spin_init`.
---|

Interrupt Service Routines Execute in Parallel With Threads
-----------------------------------------------------------

On a machine with more than one processor, interrupt service routines (this
includes timer service routines installed via :ref:`rtems_timer_fire_after()
<rtems_timer_fire_after>`) and threads can execute in parallel.  Interrupt
service routines must take this into account and use proper locking
mechanisms to protect critical sections from interference by threads
(interrupt locks or POSIX spinlocks).  This likely requires code
modifications in legacy device drivers.

Timers Do Not Stop Immediately
------------------------------

Timer service routines run in the context of the clock interrupt.  On
uni-processor configurations, it is sufficient to disable interrupts and
remove a timer from the set of active timers to stop it.  In SMP
configurations, however, the timer service routine may already run and wait
on an SMP lock owned by the thread which is about to stop the timer.  This
opens the door to subtle synchronization issues.  During the destruction of
objects, special care must be taken to ensure that timer service routines
cannot access (partly or fully) destroyed objects.

False Sharing of Cache Lines Due to Objects Table
-------------------------------------------------

The Classic API and most POSIX API objects are indirectly accessed via an
object identifier.  The user-level functions validate the object identifier
and map it to the actual object structure which resides in a global objects
table for each object class.  So, unrelated objects are packed together in a
table.  This may result in false sharing of cache lines.  The effect of false
sharing of cache lines can be observed with the `TMFINE 1
<https://git.rtems.org/rtems/tree/testsuites/tmtests/tmfine01>`_ test program
on a suitable platform, e.g. QorIQ T4240.  High-performance SMP applications
need full control of the object storage :cite:`Drepper:2007:Memory`.
Therefore, self-contained synchronization objects are now available for
RTEMS.
---|

Directives
==========

This section details the symmetric multiprocessing services.  A subsection is
dedicated to each of these services and describes the calling sequence,
related constants, usage, and status codes.

.. raw:: latex

    \clearpage

.. _rtems_get_processor_count:

GET_PROCESSOR_COUNT - Get processor count
-----------------------------------------

CALLING SEQUENCE:
    .. code-block:: c

        uint32_t rtems_get_processor_count(void);

DIRECTIVE STATUS CODES:
    The count of processors in the system that can be run.  The value
    returned is the highest numbered processor index of all processors
    available to the application (if a scheduler is assigned) plus one.

DESCRIPTION:
    In uni-processor configurations, a value of one will be returned.

    In SMP configurations, this returns the value of a global variable set
    during system initialization to indicate the count of utilized
    processors.  The processor count depends on the physically or virtually
    available processors and the application configuration.  The value will
    always be less than or equal to the maximum count of application
    configured processors.

NOTES:
    None.

.. raw:: latex

    \clearpage

.. _rtems_get_current_processor:

GET_CURRENT_PROCESSOR - Get current processor index
---------------------------------------------------

CALLING SEQUENCE:
    .. code-block:: c

        uint32_t rtems_get_current_processor(void);

DIRECTIVE STATUS CODES:
    The index of the current processor.

DESCRIPTION:
    In uni-processor configurations, a value of zero will be returned.

    In SMP configurations, an architecture-specific method is used to obtain
    the index of the current processor in the system.  The set of processor
    indices is the range of integers starting with zero up to the processor
    count minus one.

    Outside of sections with disabled thread dispatching, the current
    processor index may change after every instruction since the thread may
    migrate from one processor to another.  Sections with disabled interrupts
    are sections with thread dispatching disabled.

NOTES:
    None.

Implementation Details
======================

This section covers some implementation details of the RTEMS SMP support.

Low-Level Synchronization
-------------------------

All low-level synchronization primitives are implemented using :term:`C11`
atomic operations, so no target-specific hand-written assembler code is
necessary.  Four synchronization primitives are currently available:

* ticket locks (mutual exclusion),

* :term:`MCS` locks (mutual exclusion),

* barriers, implemented as a sense barrier, and

* sequence locks :cite:`Boehm:2012:Seqlock`.
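
The sequence lock idea can be sketched with C11 atomics.  The following is a
simplified, hypothetical illustration and not the RTEMS implementation: a
writer makes the sequence counter odd before updating the data and even
again afterwards, while a reader retries whenever it observes an odd or
changed counter.

.. code-block:: c

    #include <stdatomic.h>

    /* Hypothetical sequence lock protecting two values updated together.
     * Simplified sketch: it assumes a single writer (or an external
     * writer lock) and glosses over formal data-race rules. */
    typedef struct {
      atomic_uint sequence; /* odd while a write is in progress */
      unsigned    a;
      unsigned    b;
    } seqlock;

    static void seqlock_write(seqlock *lock, unsigned a, unsigned b)
    {
      unsigned seq =
        atomic_load_explicit(&lock->sequence, memory_order_relaxed);

      /* Make the sequence odd to signal a write in progress. */
      atomic_store_explicit(&lock->sequence, seq + 1, memory_order_relaxed);
      atomic_thread_fence(memory_order_release);
      lock->a = a;
      lock->b = b;
      atomic_thread_fence(memory_order_release);
      /* Make the sequence even again to publish the update. */
      atomic_store_explicit(&lock->sequence, seq + 2, memory_order_relaxed);
    }

    static void seqlock_read(const seqlock *lock, unsigned *a, unsigned *b)
    {
      unsigned seq_before;
      unsigned seq_after;

      do {
        seq_before =
          atomic_load_explicit(&lock->sequence, memory_order_acquire);
        *a = lock->a;
        *b = lock->b;
        atomic_thread_fence(memory_order_acquire);
        seq_after =
          atomic_load_explicit(&lock->sequence, memory_order_relaxed);
        /* Retry if a write was in progress (odd sequence) or completed
         * while the values were being read (changed sequence). */
      } while (seq_before % 2 != 0 || seq_before != seq_after);
    }

Readers never block writers with this scheme, which is why sequence locks
suit data that is read often and written rarely.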

A vital requirement for low-level mutual exclusion is :term:`FIFO` fairness
since we are interested in a predictable system and not maximum throughput.
With this requirement, only a few options remain to solve this problem.  For
reasons of simplicity, the ticket lock algorithm was chosen to implement the
SMP locks.  However, the API is capable of supporting MCS locks, which may
become interesting in the future for systems with a processor count in the
range of 32 or more, e.g. :term:`NUMA` and many-core systems.
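
For illustration, a ticket lock can be sketched with C11 atomics.  This is a
simplified, hypothetical version and not the actual RTEMS code: each
acquirer draws a ticket with an atomic fetch-and-add and spins until the
now-serving counter reaches its ticket, which yields the desired FIFO
fairness.

.. code-block:: c

    #include <stdatomic.h>

    /* Hypothetical ticket lock providing FIFO-fair mutual exclusion. */
    typedef struct {
      atomic_uint next_ticket; /* next ticket to hand out */
      atomic_uint now_serving; /* ticket currently allowed to enter */
    } ticket_lock;

    static void ticket_lock_acquire(ticket_lock *lock)
    {
      /* Atomically draw a ticket; this orders the acquirers. */
      unsigned ticket = atomic_fetch_add_explicit(
        &lock->next_ticket, 1, memory_order_relaxed);

      /* Spin until it is this ticket's turn. */
      while (atomic_load_explicit(&lock->now_serving, memory_order_acquire)
          != ticket) {
        /* busy wait */
      }
    }

    static void ticket_lock_release(ticket_lock *lock)
    {
      /* Only the lock owner updates now_serving, so a non-atomic
       * read-modify-write of it is safe here. */
      unsigned next =
        atomic_load_explicit(&lock->now_serving, memory_order_relaxed) + 1;

      /* Hand the lock to the next waiter in FIFO order. */
      atomic_store_explicit(&lock->now_serving, next, memory_order_release);
    }

Note that every waiter spins on the same ``now_serving`` variable; MCS locks
avoid this by letting each waiter spin on its own cache line, which is why
they scale better at high processor counts.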

The test program `SMPLOCK 1
<https://git.rtems.org/rtems/tree/testsuites/smptests/smplock01>`_ can be used
to gather performance and fairness data for several scenarios.  The SMP lock
performance and fairness measured on the QorIQ T4240 follows as an example.
This chip contains three L2 caches.  Each L2 cache is shared by eight
processors.

.. image:: ../images/c_user/smplock01perf-t4240.*
    :width: 400
    :align: center

.. image:: ../images/c_user/smplock01fair-t4240.*
    :width: 400
    :align: center

Internal Locking
----------------

In SMP configurations, the operating system uses non-recursive SMP locks for
low-level mutual exclusion.  The locking domains are roughly

* a particular data structure,
* the thread queue operations,
* the thread state changes, and
* the scheduler operations.

For good average-case performance it is vital that every high-level
synchronization object, e.g. a mutex, has its own SMP lock.  In the average
case, only this SMP lock should be involved to carry out a specific
operation, e.g. obtain/release a mutex.  In general, the high-level
synchronization objects have a thread queue embedded and use its SMP lock.

In case a thread must block on a thread queue, things get more complicated.
The executing thread first acquires the SMP lock of the thread queue and
then figures out that it needs to block.  The procedure to block the thread
on this particular thread queue involves state changes of the thread itself,
and for this, thread-specific SMP locks must be used.

In order to determine whether a thread is blocked on a thread queue or not,
thread-specific SMP locks must be used.  A thread priority change must be
propagated to the thread queue (possibly recursively).  Care must be taken
not to have a lock order reversal between thread queue and thread-specific
SMP locks.

Each scheduler instance has its own SMP lock.  For the scheduler helping
protocol, multiple scheduler instances may be in charge of a thread.  It is
not possible to acquire two scheduler instance SMP locks at the same time,
otherwise deadlocks would happen.  A thread-specific SMP lock is used to
synchronize the thread data shared by different scheduler instances.

The thread state SMP lock protects various things, e.g. the thread state,
join operations, signals, post-switch actions, the home scheduler instance,
etc.

Profiling
---------

To identify the bottlenecks in the system, support for profiling of
low-level synchronization is optionally available.  The profiling support is
a BSP build-time configuration option (``--enable-profiling``) and is
implemented with an acceptable overhead, even for production systems.  A
low-overhead counter for short time intervals must be provided by the
hardware.

Profiling reports are generated in XML for most test programs of the RTEMS
testsuite (more than 500 test programs).  This gives a good sample set for
statistics.  For example, the maximum thread dispatch disabled time, the
maximum interrupt latency, or the lock contention can be determined.

.. code-block:: xml

    <ProfilingReport name="SMPMIGRATION 1">
      <PerCPUProfilingReport processorIndex="0">
        <MaxThreadDispatchDisabledTime unit="ns">36636</MaxThreadDispatchDisabledTime>
        <MeanThreadDispatchDisabledTime unit="ns">5065</MeanThreadDispatchDisabledTime>
        <TotalThreadDispatchDisabledTime unit="ns">3846635988</TotalThreadDispatchDisabledTime>
        <ThreadDispatchDisabledCount>759395</ThreadDispatchDisabledCount>
        <MaxInterruptDelay unit="ns">8772</MaxInterruptDelay>
        <MaxInterruptTime unit="ns">13668</MaxInterruptTime>
        <MeanInterruptTime unit="ns">6221</MeanInterruptTime>
        <TotalInterruptTime unit="ns">6757072</TotalInterruptTime>
        <InterruptCount>1086</InterruptCount>
      </PerCPUProfilingReport>
      <PerCPUProfilingReport processorIndex="1">
        <MaxThreadDispatchDisabledTime unit="ns">39408</MaxThreadDispatchDisabledTime>
        <MeanThreadDispatchDisabledTime unit="ns">5060</MeanThreadDispatchDisabledTime>
        <TotalThreadDispatchDisabledTime unit="ns">3842749508</TotalThreadDispatchDisabledTime>
        <ThreadDispatchDisabledCount>759391</ThreadDispatchDisabledCount>
        <MaxInterruptDelay unit="ns">8412</MaxInterruptDelay>
        <MaxInterruptTime unit="ns">15868</MaxInterruptTime>
        <MeanInterruptTime unit="ns">3525</MeanInterruptTime>
        <TotalInterruptTime unit="ns">3814476</TotalInterruptTime>
        <InterruptCount>1082</InterruptCount>
      </PerCPUProfilingReport>
      <!-- more reports omitted -->
      <SMPLockProfilingReport name="Scheduler">
        <MaxAcquireTime unit="ns">7092</MaxAcquireTime>
        <MaxSectionTime unit="ns">10984</MaxSectionTime>
        <MeanAcquireTime unit="ns">2320</MeanAcquireTime>
        <MeanSectionTime unit="ns">199</MeanSectionTime>
        <TotalAcquireTime unit="ns">3523939244</TotalAcquireTime>
        <TotalSectionTime unit="ns">302545596</TotalSectionTime>
        <UsageCount>1518758</UsageCount>
        <ContentionCount initialQueueLength="0">759399</ContentionCount>
        <ContentionCount initialQueueLength="1">759359</ContentionCount>
        <ContentionCount initialQueueLength="2">0</ContentionCount>
        <ContentionCount initialQueueLength="3">0</ContentionCount>
      </SMPLockProfilingReport>
    </ProfilingReport>

Scheduler Helping Protocol
--------------------------

The scheduler provides a helping protocol to support locking protocols like
the :ref:`OMIP` or the :ref:`MrsP`.  Each thread has a scheduler node for
each scheduler instance in the system; these nodes are located in its
:term:`TCB`.  A thread has exactly one home scheduler instance, which is set
during thread creation.  The home scheduler instance can be changed with
:ref:`rtems_task_set_scheduler() <rtems_task_set_scheduler>`.  Due to the
locking protocols, a thread may gain access to scheduler nodes of other
scheduler instances.  This allows the thread to temporarily migrate to
another scheduler instance in case of preemption.

The scheduler infrastructure is based on an object-oriented design.  The
scheduler operations for a thread are defined as virtual functions.  For the
scheduler helping protocol, the following operations must be implemented by
an SMP-aware scheduler:

* ask a scheduler node for help,
* reconsider the help request of a scheduler node, and
* withdraw a scheduler node.

All currently available SMP-aware schedulers use a framework which is
customized via inline functions.  This eases the implementation of scheduler
variants.  Up to now, only priority-based schedulers are implemented.

In case a thread is allowed to use more than one scheduler node, it will ask
these nodes for help

* in case of preemption,
* in case an unblock did not schedule the thread, or
* in case a yield was successful.

The actual ask-for-help scheduler operations are carried out as a
side-effect of the thread dispatch procedure.  Once a need for help is
recognized, a help request is registered in one of the processors related to
the thread and a thread dispatch is issued.  This indirection leads to a
better decoupling of scheduler instances.  Unrelated processors are not
burdened with extra work for threads which participate in resource sharing.
Each ask-for-help operation indicates whether it could help or not.  The
procedure stops after the first successful ask for help.  Unsuccessful
ask-for-help operations register this need in the scheduler context.

After a thread dispatch, the reconsider help request operation is used to
clean up stale help registrations in the scheduler contexts.

The withdraw operation takes away scheduler nodes once the thread is no
longer allowed to use them, e.g. after it released a mutex.  The
availability of scheduler nodes for a thread is controlled by the thread
queues.

Thread Dispatch Details
-----------------------

This section gives background information to developers interested in the
interrupt latencies introduced by thread dispatching.  A thread dispatch
consists of all work which must be done to stop the currently executing
thread on a processor and hand over this processor to an heir thread.

In SMP systems, scheduling decisions on one processor must be propagated to
other processors through inter-processor interrupts.  A thread dispatch
which must be carried out on another processor does not happen
instantaneously.  Thus, several thread dispatch requests might be in flight,
and it is possible that some of them may be out of date before the
corresponding processor has time to deal with them.  The thread dispatch
mechanism uses three per-processor variables:

- the executing thread,

- the heir thread, and

- a boolean flag indicating whether a thread dispatch is necessary or not.

Updates of the heir thread are done via a normal store operation.  The
thread dispatch necessary indicator of another processor is set as a
side-effect of an inter-processor interrupt, so this change notification
works without the use of locks.  The thread context is protected by a
:term:`TTAS` lock embedded in the context to ensure that it is used on at
most one processor at a time.  Normally, only thread-specific or
per-processor locks are used during a thread dispatch.  This implementation
turned out to be quite efficient, and no lock contention was observed in the
testsuite.  The heavy-weight thread dispatch sequence is only entered in
case the thread dispatch indicator is set.
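
The interplay of the three per-processor variables can be pictured as
follows.  This is a hypothetical, heavily simplified sketch of the idea;
the names and types do not match the RTEMS internals, and the actual
inter-processor interrupt and context switch are only hinted at in comments.

.. code-block:: c

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Hypothetical thread control block. */
    typedef struct thread_control {
      int id;
    } thread_control;

    /* Hypothetical per-processor state used for thread dispatching. */
    typedef struct {
      thread_control *executing;          /* thread running on this CPU */
      thread_control *heir;               /* thread that should run next */
      atomic_bool     dispatch_necessary; /* set remotely before an IPI */
    } per_cpu_control;

    /* Executed by the scheduler, possibly on another processor: install a
     * new heir with a plain store and request a dispatch.  A real system
     * would now send an inter-processor interrupt to the target CPU. */
    static void request_dispatch(per_cpu_control *cpu, thread_control *heir)
    {
      cpu->heir = heir; /* may be overwritten by a newer decision */
      atomic_store_explicit(&cpu->dispatch_necessary, true,
                            memory_order_release);
    }

    /* Executed on the owning processor, e.g. on interrupt return: enter
     * the heavy-weight dispatch sequence only if a dispatch was requested,
     * picking up the latest heir. */
    static bool dispatch_if_necessary(per_cpu_control *cpu)
    {
      if (!atomic_exchange_explicit(&cpu->dispatch_necessary, false,
                                    memory_order_acquire)) {
        return false; /* nothing to do, stale requests cost little */
      }

      cpu->executing = cpu->heir; /* the context switch would happen here */
      return true;
    }

A stale request is harmless with this scheme: the processor merely observes
that the heir already equals the executing thread and the extra dispatch is
cheap, which matches the out-of-date requests mentioned above.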

The context switch is performed with interrupts enabled.  During the
transition from the executing thread to the heir thread, neither the stack
of the executing thread nor that of the heir thread may be used for
interrupt processing.  For this purpose, a temporary per-processor stack is
set up, which may be used by the interrupt prologue before the stack is
switched to the interrupt stack.