#2556 closed enhancement (fixed)
Implement the O(m) Independence-Preserving Protocol (OMIP)
Reported by: Sebastian Huber
Owned by: Sebastian Huber
Priority: high
Milestone: 5.1
Component: score
Severity: normal
Description (last modified by Sebastian Huber)
Background
The O(m) Independence-Preserving Protocol (OMIP) is a generalization of the priority inheritance protocol to clustered scheduling which avoids the non-preemptive sections present with priority boosting. The m denotes the number of processors in the system. Its implementation requires an extension of the scheduler helping protocol already used for the MrsP semaphores. However, the current implementation of the scheduler helping protocol has two major issues, see Catellani, Sebastiano, Luca Bonato, Sebastian Huber, and Enrico Mezzetti: Challenges in the Implementation of MrsP. In Reliable Software Technologies - Ada-Europe 2015, pages 179–195, 2015. Firstly, the run-time of some scheduler operations depends on the size of the resource dependency tree. Secondly, the scheduler operations of threads which do not use shared resources must deal with the scheduler helping protocol in case an owner of a shared resource is somehow involved.
To illustrate the second issue, let us look at the following example. We have a system with eight processors and two L2 caches. We assign processor 0 to a partition P for latency-sensitive real-time tasks (e.g. sensor and actuator handling), processors 1, 2 and 3 are assigned to a cluster CA, and the remaining processors are assigned to a cluster CB for soft real-time worker tasks. The worker tasks use a shared resource, e.g. a file system for data storage. Let us suppose a task R of partition P sends a message to the workers. This may make a waiting worker ready, which in turn pre-empts the owner of a shared resource. In this case the scheduler helping protocol takes action and is carried out by the task R. This contradicts the intended isolation of the scheduler instances.
The reason for this unfortunate coupling is a design issue of the scheduler helping protocol implementation. Some scheduler operations may return a thread in need of help. For example, if a thread is unblocked which pre-empts an owner of a shared resource, then the pre-empted thread is returned. Once a thread in need of help is returned, the ask for help operation of the scheduler is executed. An alternative to this return value based approach is the introduction of a pre-emption intervention during thread dispatching. Threads taking part in the scheduler helping protocol indicate this with a positive resource count value. In case a thread dispatch occurs and pre-empts an owner of a shared resource, the scheduler ask for help operation is invoked. So, the work is carried out on behalf of the thread which takes part in the scheduler helping protocol.
To overcome the first issue, an improved resource dependency tracking is required. One approach is to use a recursive red-black tree based data structure, see #2412.
Implementation
There are several steps necessary to implement OMIP.
- Introduce per-scheduler locks.
- Enable context switches with interrupts enabled.
- Add a pre-emption intervention to the thread dispatch.
- Add a table for priority nodes to the thread control block. For each scheduler instance there is one priority node.
- Update the table in case the thread blocks on a resource, a timeout while waiting for a resource occurs, or ownership of a resource is transferred to the thread.
- Use this table in the pre-emption intervention.
- Update the MrsP implementation to the new infrastructure.
Currently, only one scheduler lock for all scheduler instances is used. This simplified the MrsP implementation, and due to the presence of the Giant lock it was not an issue. With the elimination of the Giant lock, however, we need one scheduler lock per scheduler instance to really profit from the decoupling offered by clustered scheduling.
The current implementation of thread dispatching has some implications with respect to interrupt latency. It is crucial to preserve the system invariant that a thread can execute on at most one processor in the system at a time. This is accomplished with a boolean indicator in the thread context. The processor architecture specific context switch code marks that a thread context is no longer executing and waits until the heir context has stopped execution before it restores the heir context and resumes execution of the heir thread (the boolean indicator is basically a TTAS lock). So, there is one point in time in which a processor is without a thread. This is essential to avoid cyclic dependencies in case multiple threads migrate at once; otherwise some supervising entity would be necessary to prevent deadlocks. Such a global supervisor would lead to scalability problems, so this approach is not used. Currently the context switch is performed with interrupts disabled. Thus, in case the heir thread is currently executing on another processor, the time of disabled interrupts is prolonged, since one processor has to wait for another processor to make progress.
If we add a pre-emption intervention to the thread dispatch sequence, then there is an even greater need to avoid this issue with the interrupt latency. Interrupts normally store the context of the interrupted thread on its stack. In case a thread is marked as not executing, we must not use its thread stack to store such an interrupt context. We cannot use the heir stack before the heir has stopped execution on another processor. If we enable interrupts during this transition, then we have to provide an alternative thread-independent stack for interrupts in this time frame.
The pre-emption intervention should be added to _Thread_Do_dispatch() before the heir is read and should perform the following pseudo-code actions.
    pre_emption_intervention(executing):
      if executing.resource_count > 0:
        executing.lock()
        if executing.is_ready():
          for scheduler in executing.schedulers:
            scheduler.lock()
          if !executing.is_scheduled():
            for scheduler in executing.schedulers:
              scheduler.ask_for_help(executing)
          for scheduler in executing.schedulers:
            scheduler.unlock()
        else if executing.active_help_level > 0:
          idle.use(executing.scheduler_node)
        executing.unlock()
The scheduler help operation affects multiple scheduler instances. In terms of locking we have only two options:
- use a global scheduler lock, or
- obtain multiple per-scheduler locks at once.
A global scheduler lock is not an option, since it would reintroduce the very coupling between scheduler instances that the per-scheduler locks are meant to remove. To avoid deadlocks, obtain the per-scheduler locks in a fixed order. However, in this case the per-scheduler locks will observe different worst-case and average-case acquire times (depending on the order).
Use a recursive data structure to determine the highest priority available to a thread for each scheduler instance, e.g.
    typedef struct Thread_Priority_node {
      Priority_Control current_priority;
      Priority_Control real_priority;
      struct Thread_Priority_node *owner;
      RBTree_Node Node;
      RBTree_Control Inherited_priorities;
    } Thread_Priority_node;

    typedef struct {
      ...
      Thread_Priority_node *priority_nodes; /* One per scheduler instance */
      ...
    } Thread_Control;
Initially a thread has a priority node reflecting its real priority. The Thread_Priority_node::owner is NULL. The Thread_Priority_node::current_priority is set to the real priority. The Thread_Priority_node::Inherited_priorities is empty.
In case the thread must wait for ownership of a mutex, then it enqueues its priority node in Thread_Priority_node::Inherited_priorities of the mutex owner. In case the thread is dequeued from the wait queue of a mutex, then it dequeues its priority node from Thread_Priority_node::Inherited_priorities of the previous mutex owner (ownership transfer) or the current mutex owner (acquire timeout).
In case the minimum of the Thread_Priority_node::real_priority and the Thread_Priority_node::Inherited_priorities changes, then Thread_Priority_node::current_priority is updated. In case the Thread_Priority_node::owner is not NULL, the priority change propagates to the owner, and so on. In case Thread_Priority_node::current_priority changes, the corresponding scheduler is notified.
Use the thread lock to protect the priority nodes.
Attachments (2)
Change History (59)
comment:1 Changed on 01/27/16 at 08:46:27 by Sebastian Huber
Description: modified (diff)
Changed on 01/27/16 at 08:49:03 by Sebastian Huber
Attachment: resource.png added
Changed on 01/27/16 at 08:49:16 by Sebastian Huber
comment:2 Changed on 01/27/16 at 08:52:02 by Sebastian Huber
comment:3 Changed on 01/27/16 at 08:54:45 by Sebastian Huber
This is an example of a table of priority nodes with sixteen threads t0 up to t15 and three scheduler instances s0 up to s2 corresponding to the previous example. The overall resource owner is t0. The colour of the nodes indicates the scheduler instance. Several threads of different scheduler instances depend on thread t10. So, the thread t10 contributes, for example, the highest priority node of scheduler instance s2 to thread t0 even though it uses scheduler instance s0.
comment:4 Changed on 02/04/16 at 12:00:29 by Sebastian Huber
Status: new → accepted
comment:52 Changed on 12/23/16 at 14:10:09 by Sebastian Huber
Priority: normal → high
comment:53 Changed on 02/01/17 at 06:59:55 by Sebastian Huber
Resolution: → fixed
Status: accepted → closed
Version: 4.12
comment:55 Changed on 03/07/17 at 12:21:24 by Sebastian Huber <sebastian.huber@…>
In 088acbb0/rtems:
comment:56 Changed on 05/11/17 at 07:31:02 by Sebastian Huber
Milestone: 4.12 → 4.12.0
comment:57 Changed on 11/09/17 at 06:27:14 by Sebastian Huber
Milestone: 4.12.0 → 5.1
Milestone renamed
This is an example resource dependency tree with sixteen threads t0 up to t15 and sixteen resources r0 up to r15. The root of this tree is t0. The thread t0 owns the resources r0, r1, r2, r3, r6, r11 and r12 and is in the ready state. The threads t1 up to t15 wait directly or indirectly for resources owned by t0 and are in a blocked state. The colour of the thread nodes indicates the scheduler instance.