#2795 closed enhancement (fixed)

Overrun Handling for general real-time models

Reported by: Kuan
Owned by: Gedare Bloom <gedare@…>
Priority: high
Milestone: 5.1
Component: score
Version: 4.11
Severity: blocker
Keywords: Overrun, RMS, SP, Scheduler, Periodicity
Cc: Gedare Bloom, Joel Sherrill, Sebastian Huber
Blocked By:
Blocking:

Description

In the current implementation, if a task period times out, the next call of rtems_rate_monotonic_period() will only release one following job and set the task period to the calling moment plus the next period length. Under the assumption of implicit/constrained deadlines and a hard real-time model, the above mechanism is fine.

However, it may not be applicable to more general task models, e.g., soft real-time tasks, arbitrary deadlines, or mixed-criticality systems [1-4]. It is usually assumed that the jobs of a task are executed in a first-come-first-served manner, so it is sufficient to release the next job at the moment the previous one finishes, following a strictly periodic release pattern. The current design in fact shifts the release pattern of periodic/sporadic tasks: with a period of 2000 ticks, for example, a job finishing late at tick 2500 moves the next deadline to 2500 + 2000 = 4500 instead of the strictly periodic 4000. Moreover, since there may be more than one postponed job due to preemption, postponed jobs that should be released are never released to the system.

Although there is no standard requirement for how deadline misses must be handled, with this enhancement the postponed jobs are released in the correct number and without shifting the periodic release pattern. This way of handling overruns has been widely considered in academia since the 1990s [2] up to the present [3], and on multicores as well [4].

I refined the following four files to handle this requirement; the overhead seems negligible to me:
cpukit/rtems/include/rtems/rtems/ratemon.h
cpukit/rtems/include/rtems/rtems/ratemonimpl.h
cpukit/rtems/src/ratemontimeout.c
cpukit/rtems/src/ratemonperiod.c
I have tested the enhancement on QEMU and on a Raspberry Pi Model B+ with the corresponding BSPs.

I believe this patch is required as a basis for further work on more general real-time task models.
This enhancement only affects the timeout cases, without changing any behaviour in the normal cases.
A paper on this enhancement was accepted at the Workshop on Mixed Criticality (WMC 2016), co-located with RTSS 2016 [5].

To demonstrate the differences, a heuristic example is provided in testsuites/sptests/sprmsched01 to show the benefit of the enhancement:
Given are two tasks with implicit deadlines, i.e., each task's deadline equals its period.
Task 1's period is 10000 ticks, whereas task 2's is 2000 ticks.
Task 1 has an execution time of 6000 ticks, task 2 of 1000 ticks.
Assume task 1 has a higher priority than task 2, and task 1 only executes 2 times.
In the expected result, we can observe that the postponed jobs are continuously released until no postponed job is left, while the task period is kept as it is.
(Jobs 3-7 of task 2 are postponed jobs.)
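To make the setup concrete, here is a minimal sketch of such a periodic task pair using the Rate Monotonic Manager. It is illustrative only, not the actual sprmsched01 source; consume_ticks and the single shared task body are assumed helpers.

#include <rtems.h>

/* Busy-wait helper modelling the execution time of one job. */
static void consume_ticks( rtems_interval ticks )
{
  rtems_interval start = rtems_clock_get_ticks_since_boot();

  while ( rtems_clock_get_ticks_since_boot() - start < ticks ) {
    /* consume processor time */
  }
}

static rtems_task periodic_task( rtems_task_argument arg )
{
  /* arg == 0: task 1 (period 10000 ticks, execution 6000 ticks)
     arg == 1: task 2 (period  2000 ticks, execution 1000 ticks) */
  rtems_interval    period = ( arg == 0 ) ? 10000 : 2000;
  rtems_interval    wcet   = ( arg == 0 ) ?  6000 : 1000;
  rtems_id          period_id;
  rtems_status_code sc;

  sc = rtems_rate_monotonic_create(
    rtems_build_name( 'P', 'E', 'R', '0' + (char) arg ),
    &period_id
  );
  if ( sc != RTEMS_SUCCESSFUL ) {
    (void) rtems_task_delete( RTEMS_SELF );
  }

  for ( ;; ) {
    /* RTEMS_TIMEOUT here means the previous period overran; with the
       enhancement the postponed jobs are still released according to
       the original periodic pattern. */
    sc = rtems_rate_monotonic_period( period_id, period );
    consume_ticks( wcet );
  }
}

With task 1 given the higher priority, its 6000-tick jobs block task 2 and cause exactly the postponed releases described above.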

[1] Buttazzo et al., Soft Real-Time Systems: Predictability vs. Efficiency, Springer, 2005. http://www.springer.com/gp/book/9780387237015
[2] Lehoczky et al., Fixed priority scheduling of periodic task sets with arbitrary deadlines, RTSS 1990. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=128748
[3] Georg von der Brüggen et al., Systems with Dynamic Real-Time Guarantees in Uncertain and Faulty Execution Environments, RTSS 2016, accepted.
[4] Huang et al., Response time bounds for sporadic arbitrary-deadline tasks under global fixed-priority scheduling on multiprocessors, RTNS 2015. http://dl.acm.org/citation.cfm?doid=2597457.2597459
[5] Chen et al., Overrun Handling for Mixed-Criticality Support in RTEMS, WMC 2016, accepted.

Attachments (2)

patch.diff (27.1 KB) - added by Kuan on 10/08/16 at 11:56:50.
Diff patch
patch.2.diff (27.2 KB) - added by Kuan-Hsun Chen on 10/18/16 at 12:27:56.
Hotfix some typos and some redundant code.


Change History (30)

Changed on 10/08/16 at 11:56:50 by Kuan

Attachment: patch.diff added

Diff patch

comment:1 Changed on 10/08/16 at 12:03:59 by Kuan

Keywords: Overrun RMS SP Scheduler Periodicity added; overrun removed

comment:2 Changed on 10/12/16 at 06:01:30 by Sebastian Huber

In RTEMS, the behaviour of the rate-monotonic periods depends on the scheduler. We have a fixed-priority scheduler (the default scheduler) and a job-level fixed-priority scheduler (EDF).

In your test case you have 6000/10000 + 1000/2000 = 110% processor utilization, so your task set is not schedulable.

In case of the default scheduler, the priority assignment does not follow the rules:

https://docs.rtems.org/doc-current/share/rtems/html/c_user/Rate-Monotonic-Manager-Rate-Monotonic-Scheduling-Algorithm.html#Rate-Monotonic-Manager-Rate-Monotonic-Scheduling-Algorithm

Task 2 has a shorter period compared to task 1, so it must have a higher priority.

What does your patch change? Do you automatically restart the watchdog in case of a timeout and count the number of timeouts?

comment:3 Changed on 10/12/16 at 08:04:20 by Kuan-Hsun Chen

Yes, since the utilization is over 100%, my test case definitely has deadline misses.
However, the test case is designed on purpose to present what proper overrun handling looks like within 30000 ticks.
(Otherwise, a domino effect would hide the enhancement in the worst case.)
Please note that task 1 only executes 2 times in my test case.
Let's focus only on the periodic behaviour of task 2.

Please refer to my paper [5]:
http://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2016-wmc.pdf
In the following examples, the time unit is 1000 ticks. (1 = 1000 ticks)

With the original design of overrun handling, there is only one postponed job and the release pattern is not periodic. As shown in Fig. 2 (a) of [5], the arrival pattern from time 16 onward is changed due to the lateness of the red job, by which the orange job and all following jobs are released earlier.

With this enhancement, the behaviour of task 2 is shown in Fig. 2 (b) of [5]. The jobs postponed due to the execution of task 1 are marked red. The jobs postponed due to the execution of previous jobs of task 2 are marked orange. The yellow job is postponed due to the orange job in the same period but can still finish its execution on time.

In fact, the example can be scheduled if the system provides dynamic real-time guarantees [3] for mixed-criticality systems. Suppose that task 1 requires a full timing guarantee with the abnormal execution time C_{1,A} = 6 and the normal execution time C_{1,N} = 1, and that task 2 is a timing-tolerable task with C_{2,A} = C_{2,N} = 1, as shown in Fig. 2 (c) of [5]. The second job of task 1 needs 6 time units for its execution due to fault detection and recovery. We can see that after all the postponed jobs of task 2 have finished at time 24, the releases of task 2 follow the original periodic pattern again. Since this example is a bit complicated for normal users, I didn't implement it in the example case.
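For reference, a quick utilization check (my arithmetic, not taken from [5]) contrasts the two modes:

\[
U_{\text{normal}} = \frac{C_{1,N}}{T_1} + \frac{C_{2,N}}{T_2} = \frac{1}{10} + \frac{1}{2} = 0.6,
\qquad
U_{\text{abnormal}} = \frac{C_{1,A}}{T_1} + \frac{C_{2,A}}{T_2} = \frac{6}{10} + \frac{1}{2} = 1.1,
\]

so the task set is only overloaded while task 1 exhibits its abnormal execution time.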

In my patch, I automatically restart the watchdog in case of a timeout and keep incrementing the number of postponed jobs. If the number of postponed jobs is not 0, the default scheduler decreases the number of postponed jobs and releases the postponed jobs accordingly. Please note that the deadline of the watchdog is not updated by the job release unless the targeted task has no more postponed jobs.
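As a simplified model of this mechanism (the structure and function names below are assumed for illustration, not the actual patch code):

#include <stdint.h>

typedef struct {
  uint32_t postponed_jobs;  /* jobs whose deadline passed but which are
                               still owed to the task */
  uint64_t next_length;     /* period length in ticks */
  uint64_t latest_deadline; /* absolute tick of the latest deadline */
} period_state;

/* Called from the watchdog routine when the period expires before the
   owner calls rtems_rate_monotonic_period() again. */
static void on_period_timeout( period_state *p )
{
  ++p->postponed_jobs;                  /* one more job is owed */
  p->latest_deadline += p->next_length; /* the deadline advances by exactly
                                           one period, not to "now" plus
                                           one period */
}

/* Called from rtems_rate_monotonic_period(); returns nonzero if a
   postponed job was released immediately. */
static int release_postponed_job( period_state *p )
{
  if ( p->postponed_jobs > 0 ) {
    --p->postponed_jobs; /* release one owed job without renewing the
                            watchdog deadline */
    return 1;
  }

  return 0; /* normal case: block until the next period boundary */
}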

We all know that schedulability research always talks about the WCET and that the worst case rarely happens in reality. However, overrun handling is still useful for isolating the effect of an abrupt overrun without disturbing the other, normal tasks.

comment:4 Changed on 10/12/16 at 08:12:43 by Sebastian Huber

Ok, I will have a closer look at your paper and the patch. However, I am on a business trip next week, so this will take at least two weeks.

Your patch changes the existing behaviour. We must decide if this is acceptable or if we need a new option, e.g. explicitly enable the watchdog restart plus counter.

comment:5 in reply to:  4 Changed on 10/12/16 at 08:25:49 by Kuan-Hsun Chen

Replying to sebastian.huber:

> Ok, I will have a closer look at your paper and the patch. However, I am on a business trip next week, so this will take at least two weeks.

Thanks a lot.

> Your patch changes the existing behaviour. We must decide if this is acceptable or if we need a new option, e.g. explicitly enable the watchdog restart plus counter.

Yes, I know. We have discussed this on the users mailing list before. Please let me know if there is anything I can do.

Changed on 10/18/16 at 12:27:56 by Kuan-Hsun Chen

Attachment: patch.2.diff added

Hotfix some typos and some redundant code.

comment:6 Changed on 01/13/17 at 21:09:49 by Gedare Bloom <gedare@…>

Owner: set to Gedare Bloom <gedare@…>
Resolution: fixed
Status: new → closed

In cb2cbecefdc124b010aa9d8714856332e3e3a759/rtems:

classic: adjust names of RM postponed job functions

closes #2795

comment:7 Changed on 01/23/17 at 09:11:05 by Sebastian Huber

Several tests fail now:

WARNING - log/sp20 did not appear to complete execution
WARNING - log/sp60 did not appear to complete execution
WARNING - log/sp69 did not appear to complete execution
WARNING - log/spcbssched03 did not appear to complete execution
WARNING - log/spedfsched02 did not appear to complete execution
WARNING - log/spedfsched04 did not appear to complete execution
WARNING - log/spintrcritical08 did not appear to complete execution
WARNING - log/spratemon_err01 did not appear to complete execution
WARNING - log/sprmsched01 did not appear to complete execution

comment:8 Changed on 01/24/17 at 12:42:27 by Sebastian Huber

Milestone: 4.11.1 → 4.12
Resolution: fixed
Severity: critical → blocker
Status: closed → reopened

comment:9 Changed on 01/24/17 at 12:43:11 by Sebastian Huber

There is a possible deadlock at SMP lock level:

https://lists.rtems.org/pipermail/devel/2017-January/016754.html

comment:10 Changed on 01/24/17 at 13:45:15 by Sebastian Huber <sebastian.huber@…>

In 1240aade5a35c4e8c43d5409e2329eeb6a173299/rtems:

rtems: Fix _Rate_monotonic_Renew_deadline()

Make _Rate_monotonic_Renew_deadline() static and use proper locking in SMP
configurations.

Update #2795.

comment:11 Changed on 01/24/17 at 14:04:57 by Sebastian Huber <sebastian.huber@…>

In 625bd6aca47268bc21cfa38662ebc17413475e82/rtems:

rtems: Fix _Rate_monotonic_Release_postponed_job()

Use proper locking in SMP configurations.

Update #2795.

comment:12 Changed on 01/24/17 at 14:05:50 by Sebastian Huber <sebastian.huber@…>

In b8d6eb719ad016b8e0a7752619a5004960b9711d/rtems:

rtems: rtems_rate_monotonic_postponed_job_count()

Use proper locking in SMP configurations.

Update #2795.

comment:13 Changed on 01/24/17 at 14:37:26 by Sebastian Huber <sebastian.huber@…>

In 29e08d41f42d68fdafba982061ea7a3d57f75731/rtems:

sptests/sprmsched01: Merge and fix

Merge into one file and fix obvious problems (e.g. out of bounds array
access).

Update #2795.

comment:14 Changed on 01/24/17 at 14:40:46 by Sebastian Huber

The output of sprmsched01 depends on the target timing:

*** BEGIN OF TEST SPRMSCHED 1 ***

Ticks per second in your system: 1000
Task 0 starts at tick 10.
                                        Job 1 Task 0 ends at tick 6141.
Job 1 Task 1 starts at tick 6144.
                                        Job 1 Task 1 ends at tick 7169.
Job 2 Task 1 starts at tick 8144.
                                        Job 2 Task 1 ends at tick 9168.
Task 0 starts at tick 10009.
                                        Job 2 Task 0 ends at tick 16140.
Job 3 Task 1 starts at tick 16145.
                                        Job 3 Task 1 ends at tick 17169.
RTEMS_TIMEOUT
Job 4 Task 1 starts at tick 17172.
                                        Job 4 Task 1 ends at tick 18196.
Job 5 Task 1 starts at tick 18199.
                                        Job 5 Task 1 ends at tick 19223.
Job 6 Task 1 starts at tick 19226.
                                        Job 6 Task 1 ends at tick 20250.
Job 7 Task 1 starts at tick 20253.
                                        Job 7 Task 1 ends at tick 21277.
Job 8 Task 1 starts at tick 21280.
                                        Job 8 Task 1 ends at tick 22304.
Job 9 Task 1 starts at tick 22307.
                                        Job 9 Task 1 ends at tick 23331.
RTEMS_SUCCESSFUL
Job 10 Task 1 starts at tick 24145.
                                        Job 10 Task 1 ends at tick 25169.
Job 11 Task 1 starts at tick 26144.
                                        Job 11 Task 1 ends at tick 27168.
Job 12 Task 1 starts at tick 28144.
                                        Job 12 Task 1 ends at tick 29168.
*** END OF TEST SPRMSCHED 1 ***

It's not clear whether the test objectives are checked. Why is rtems_rate_monotonic_postponed_jobs_count() not used at all?

comment:15 in reply to:  14 Changed on 01/24/17 at 14:46:59 by Kuan-Hsun Chen

Replying to sebastian.huber:

> The output of sprmsched01 depends on the target timing: (test output quoted above omitted)
>
> It's not clear whether the test objectives are checked. Why is rtems_rate_monotonic_postponed_jobs_count() not used at all?

The function I prepared is meant for requirements from the application layer;
it is not used directly within the overrun handling.
Since I previously focused only on the timing behaviour, I prepared this test without exercising that function.

Now I can add it to the test and print out the number of postponed jobs.

comment:16 Changed on 01/24/17 at 14:53:07 by Sebastian Huber

rtems_rate_monotonic_postponed_jobs_count() is not tested at all, see #2885.

There is a potential integer overflow in _Rate_monotonic_Renew_deadline(). It should be documented what happens in this case.

comment:17 Changed on 01/24/17 at 15:07:46 by Kuan-Hsun Chen

What about a potential overflow of the uint32_t variables "missed_count" and "count" in rtems_rate_monotonic_period_statistics?
I didn't find any handling of it in the current version.
stats->missed_count++ in cpukit/rtems/src/ratemonperiod.c could also incur this problem, right?

The trivial solution is to add an assertion to avoid this situation, or to shut down the system...

Last edited on 01/24/17 at 15:27:28 by Kuan-Hsun Chen

comment:18 Changed on 01/25/17 at 06:33:58 by Sebastian Huber

The missed count is just a statistic.

For the integer overflow in

static void _Rate_monotonic_Renew_deadline(
  Rate_monotonic_Control *the_period,
  Thread_Control         *owner,
  ISR_lock_Context       *lock_context
)
{
  uint64_t deadline;

  /* Count the job whose deadline has just passed; it will be released
     later by rtems_rate_monotonic_period(). */
  ++the_period->postponed_jobs;
  the_period->state = RATE_MONOTONIC_EXPIRED;

  /* Restart the watchdog one full period length ahead, preserving the
     strictly periodic release pattern. */
  deadline = _Watchdog_Per_CPU_insert_relative(
    &the_period->Timer,
    _Per_CPU_Get(),
    the_period->next_length
  );
  the_period->latest_deadline = deadline;

  _Rate_monotonic_Release( the_period, lock_context );
}

we have some options.

  1. Ignore it; the postponed jobs count then wraps around to zero again.
  2. Issue a new fatal error.
  3. Saturate, e.g. stay at 0xffffffff.

I am in favour of 3. It should be documented in any case.
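A minimal sketch of option 3, assuming the counter quoted above (this is the suggested precondition, not the committed fix):

#include <stdint.h>

static inline void _Increment_postponed_jobs_saturating( uint32_t *postponed_jobs )
{
  if ( *postponed_jobs != UINT32_MAX ) {
    ++( *postponed_jobs );
  }
  /* else: stay at 0xffffffff instead of wrapping around to zero */
}

In _Rate_monotonic_Renew_deadline() this would replace the plain ++the_period->postponed_jobs.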

comment:19 Changed on 01/25/17 at 07:33:32 by Sebastian Huber

The references mentioned in the ticket should be added to the documentation:

https://git.rtems.org/rtems-docs/tree/common/refs.bib

They should show up in the documentation somewhere:

https://git.rtems.org/rtems-docs/tree/c-user/rate_monotonic_manager.rst

The "Further Reading" should be changed to references.

comment:20 Changed on 01/25/17 at 08:30:40 by Kuan-Hsun Chen

I will set a precondition for saturating before ++the_period->postponed_jobs, and update the documentation.

comment:21 Changed on 01/25/17 at 10:17:14 by Sebastian Huber <sebastian.huber@…>

In 23a11b68cdba71ae34fe1b2271606b167eb6a5b4/rtems:

sptests/spedfsched04: Merge and fix

Merge into one file and fix obvious problems (e.g. out of bounds array
access).

Update #2795.

comment:22 Changed on 01/26/17 at 09:01:03 by Kuan-Hsun Chen <c0066c@…>

In d7feb8677d48162bf8db34406c232e0179d43dc6/rtems:

Remove rtems_rate_monotonic_postponed_job_count()

Add a variable named "count" in rtems_rate_monotonic_period_status
structure. Revise rtems_rate_monotonic_get_status() for the postponed
job count.

sptests/sp69: Add in the verification of the postponed job count for
rtems_rate_monotonic_get_status().

Update #2795.

comment:24 Changed on 01/30/17 at 07:15:11 by Sebastian Huber

The tests sprmsched01 and spedfsched01 should use rtems_rate_monotonic_get_status() to check that the postponed_jobs_count has the expected value.

Ideally, the test output should be independent of the target timing and console driver.

comment:25 in reply to:  24 Changed on 01/30/17 at 10:03:29 by Kuan-Hsun Chen

Replying to sebastian.huber:

> The tests sprmsched01 and spedfsched01 should use rtems_rate_monotonic_get_status() to check that the postponed_jobs_count has the expected value.
>
> Ideally, the test output should be independent of the target timing and console driver.

Okay, I will fix them. The examples were previously prepared to produce the diagrams in the paper.
I will use only assertions and rtems_rate_monotonic_get_status() to check the expected number of postponed jobs, instead of printing the whole simulated execution.
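A sketch of such a check, assuming the postponed job count ends up in rtems_rate_monotonic_period_status as a member named postponed_jobs_count:

#include <assert.h>
#include <stdint.h>
#include <rtems.h>

static void check_postponed_jobs( rtems_id period_id, uint32_t expected )
{
  rtems_rate_monotonic_period_status status;
  rtems_status_code                  sc;

  sc = rtems_rate_monotonic_get_status( period_id, &status );
  assert( sc == RTEMS_SUCCESSFUL );
  assert( status.postponed_jobs_count == expected );
}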

comment:26 Changed on 01/31/17 at 06:20:41 by Kuan-Hsun Chen <c0066c@…>

Resolution: fixed
Status: reopened → closed

In 166a9f67cde9085a74d6d5d962160b3c92b3e3d7/rtems:

sprmsched01/spedfsched04: Revise

Instead of using the target time and console driver, both tests now use
assertions and rtems_rate_monotonic_get_status() to verify the count of
postponed jobs. The setting of spedfsched04 is slightly changed.

Close #2795.

comment:27 Changed on 05/11/17 at 07:31:02 by Sebastian Huber

Milestone: 4.12 → 4.12.0

comment:28 Changed on 11/09/17 at 06:27:14 by Sebastian Huber

Milestone: 4.12.0 → 5.1

Milestone renamed
