#2795 closed enhancement (fixed)
Overrun Handling for general real-time models
Reported by: | Kuan | Owned by: | Gedare Bloom <gedare@…> |
---|---|---|---|
Priority: | high | Milestone: | 5.1 |
Component: | score | Version: | 4.11 |
Severity: | blocker | Keywords: | Overrun, RMS, SP, Scheduler, Periodicity |
Cc: | Gedare Bloom, Joel Sherrill, Sebastian Huber | Blocked By: | |
Blocking: |
Description
In the current implementation, if a task period times out, the next call of rtems_rate_monotonic_period() releases only one following job and re-anchors the task period at the calling moment plus the next period length. Under the assumption of implicit/constrained deadlines and a hard real-time model, this mechanism is fine.
However, it may not be applicable to more general task models, e.g., soft real-time tasks, arbitrary deadlines, or mixed-criticality systems [1-4]. It is usually assumed that multiple jobs of a task are executed in first-come-first-served order, so it is sufficient to release the next job the moment the previous one finishes, following a strictly periodic release pattern. The current design in fact shifts the release pattern of periodic/sporadic tasks, and since more than one job may be postponed due to preemption, postponed jobs that should be released are never released to the system.
Although there is no standard requirement for handling deadline misses, with this enhancement the postponed jobs are released in the correct number and without shifting the periodic release pattern. This way of handling has been widely considered in academia from the 1990s [2] until now [3], and on multicores as well [4].
I refined the following four files to handle this requirement; the overhead seems negligible to me.
cpukit/rtems/include/rtems/rtems/ratemon.h
cpukit/rtems/include/rtems/rtems/ratemonimpl.h
cpukit/rtems/src/ratemontimeout.c
cpukit/rtems/src/ratemonperiod.c
I have tested the enhancement on Qemu and Raspberry Pi Model B+ with corresponding BSPs.
I believe this patch is required as a basis for further support of more general real-time task models.
This enhancement only affects the timeout cases and does not change any behaviour in normal cases.
This enhancement was accepted at the Workshop on Mixed Criticality (WMC 2016), co-located with RTSS'16 this year [5].
To demonstrate the differences, an illustrative example is provided in testsuites/sptests/sprmsched01 to show the benefit of the enhancement:
Given two tasks with implicit deadlines (each task's deadline equals its period):
Task 1's period is 10000 ticks, whereas task 2's is 2000 ticks.
Task 1's execution time is 6000 ticks, and task 2's is 1000 ticks.
Assume task 1 has a higher priority than task 2. Task 1 executes only 2 times.
In the expected result, we can observe that the postponed jobs are released back-to-back until no postponed job is left, while the task period stays as it is.
(Jobs 3-7 of task 2 are postponed jobs.)
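In idealized terms (ignoring the small tick overheads seen in the real test output), the number of task 2 releases that are directly blocked while task 1 occupies the processor can be sketched as below. The cascade of further postponements (jobs delayed by task 2 working off its own backlog) comes on top of this count. This is a hypothetical helper for illustration, not part of the test suite.

```c
#include <stdint.h>

/* Number of period boundaries of a lower-priority task that fall inside a
 * blocking window of 'blocked' ticks: each missed boundary postpones one
 * job.  Idealized model assuming the window starts on a period boundary. */
static uint32_t directly_postponed_jobs( uint32_t blocked, uint32_t period )
{
  return blocked / period;
}
```

With the ticket's numbers (task 1 blocks for 6000 ticks, task 2's period is 2000 ticks), three releases of task 2 are directly postponed per job of task 1.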
[1] Buttazzo et al., Soft Real-Time Systems: Predictability vs. Efficiency, Springer 2005, http://www.springer.com/gp/book/9780387237015
[2] Lehoczky et al., Fixed priority scheduling of periodic task sets with arbitrary deadlines, RTSS 1990, http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=128748
[3] Georg von der Brüggen et al., Systems with Dynamic Real-Time Guarantees in Uncertain and Faulty Execution Environments, RTSS'16, accepted.
[4] Huang et al., Response time bounds for sporadic arbitrary-deadline tasks under global fixed-priority scheduling on multiprocessors, RTNS 2015, http://dl.acm.org/citation.cfm?doid=2597457.2597459
[5] Chen et al., Overrun Handling for Mixed-Criticality Support in RTEMS, WMC 2016, accepted.
Attachments (2)
Change History (30)
Changed on 10/08/16 at 11:56:50 by Kuan
Attachment: | patch.diff added |
---|
comment:1 Changed on 10/08/16 at 12:03:59 by Kuan
Keywords: | Overrun RMS SP Scheduler Periodicity added; overrun removed |
---|
comment:2 Changed on 10/12/16 at 06:01:30 by Sebastian Huber
In RTEMS the behaviour of the rate-monotonic periods depends on the scheduler. We have a fixed-priority scheduler (the default scheduler) and a job-level fixed-priority scheduler (EDF).
In your test case you have 6000/10000 + 1000/2000 = 110% processor utilization, so your task set is not schedulable.
In case of the default scheduler, the priority assignment is not according to the rules:
Task 2 has a shorter period compared to task 1, so it must have a higher priority.
What does your patch change? Do you automatically restart the watchdog in case of a timeout and count the number of timeouts?
comment:3 Changed on 10/12/16 at 08:04:20 by Kuan-Hsun Chen
Yes, since the utilization is over 100%, my test case definitely has deadline misses.
However, the test case is designed on purpose to show what proper overrun handling looks like within 30000 ticks.
(Otherwise, a domino effect would hide the enhancement in the worst case.)
Please note that task 1 executes only 2 times in my test case.
Let's focus only on the periodic behaviour of task 2.
Please refer to my paper [5]:
http://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2016-wmc.pdf
In the following examples, the time unit is 1000 ticks. (1 = 1000 ticks)
With the original design of overrun handling, only one postponed job is released and the release pattern is no longer periodic. As shown in Fig. 2 (a) of [5], the arrival pattern from time 16 onward is changed by the lateness of the red job, so the orange job and all following jobs are released earlier.
With this enhancement, the behaviour of task 2 is shown in Fig. 2 (b) of [5]. Jobs postponed by the execution of task 1 are marked red; jobs postponed by the execution of previous jobs of task 2 are marked orange. The yellow job is postponed by the orange job in the same period but can still finish its execution on time.
In fact the example can be scheduled if the system has dynamic real-time guarantees [3] for mixed-criticality systems. Suppose that task 1 requires a full timing guarantee with abnormal execution time C_{1,A} = 6 and normal execution time C_{1,N} = 1, while task 2 is a timing-tolerable task with C_{2,A} = C_{2,N} = 1, as shown in Fig. 2 (c) of [5]. The second job of task 1 needs 6 time units for its execution due to fault detection and recovery. We can see that after all the postponed jobs of task 2 finish at 24, the releases of task 2 return to the original periodic pattern. Since this example is a bit complicated for normal users, I didn't implement it in the test case.
In my patch, I automatically restart the watchdog in case of a timeout and increase the count of postponed jobs. If the number of postponed jobs is not 0, the next period call decreases the count and releases one postponed job accordingly. Please note that the deadline of the watchdog is not updated by the job release unless the targeted task has no more postponed jobs.
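The two policies described above can be sketched as a small host-runnable model. This is not the RTEMS source; the names (period_model, model_timeout, and the two model_period_call variants) are made up for illustration, and the model tracks only the absolute deadline and the postponed-jobs counter.

```c
#include <stdint.h>

/* Hypothetical model of a rate-monotonic period object.  'deadline' is
 * the absolute tick at which the current period expires; 'postponed'
 * counts jobs whose release boundary passed while the task was busy. */
typedef struct {
  uint64_t deadline;   /* end of the current period (absolute tick) */
  uint32_t postponed;  /* postponed, not-yet-released jobs */
  uint64_t period;     /* period length in ticks */
} period_model;

/* Timeout: the watchdog fires at 'deadline' while the task is still
 * busy.  With the enhancement the watchdog is restarted immediately, so
 * the release grid never shifts; the miss is only counted. */
static void model_timeout( period_model *p )
{
  p->postponed += 1;
  p->deadline += p->period;   /* next boundary stays on the grid */
}

/* Enhanced behaviour of a period call at tick 'now': returns the tick
 * at which the released job starts. */
static uint64_t model_period_call_enhanced( period_model *p, uint64_t now )
{
  if ( p->postponed > 0 ) {
    /* Release one postponed job immediately.  The deadline is NOT
     * renewed here, so later boundaries keep the original pattern. */
    p->postponed -= 1;
    return now;
  }
  /* Normal case: block until the next boundary. */
  uint64_t release = p->deadline;
  p->deadline += p->period;
  return release;
}

/* Original behaviour: an overrun re-anchors the whole pattern at
 * now + period, and only one late job is ever released. */
static uint64_t model_period_call_original( period_model *p, uint64_t now )
{
  if ( now > p->deadline ) {
    p->deadline = now + p->period;   /* grid shifts */
    return now;
  }
  uint64_t release = p->deadline;
  p->deadline += p->period;
  return release;
}
```

Running both models over the same miss scenario shows the enhanced variant releasing every postponed job while keeping deadlines on the original multiple-of-period grid, whereas the original variant re-anchors the grid at the late call.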
We all know that schedulability research always talks about the WCET, and the worst case rarely happens in reality. However, overrun handling is still useful to isolate the effect of an abrupt overrun without disturbing the other normal tasks.
comment:4 follow-up: 5 Changed on 10/12/16 at 08:12:43 by Sebastian Huber
Ok, I will have a closer look at your paper and the patch. However, I am on a business trip next week, so this will take at least two weeks.
Your patch changes the existing behaviour. We must decide if this is acceptable or if we need a new option, e.g. explicitly enable the watchdog restart plus counter.
comment:5 Changed on 10/12/16 at 08:25:49 by Kuan-Hsun Chen
Replying to sebastian.huber:
Ok, I will have a closer look at your paper and the patch. However, I am on a business trip next week, so this will take at least two weeks.
Thanks a lot.
Your patch changes the existing behaviour. We must decide if this is acceptable or if we need a new option, e.g. explicitly enable the watchdog restart plus counter.
Yes, I know. We have discussed this in user mailing list before. Please let me know if there is anything I can do.
Changed on 10/18/16 at 12:27:56 by Kuan-Hsun Chen
Attachment: | patch.2.diff added |
---|
Hotfix for some typos and some redundant code.
comment:6 Changed on 01/13/17 at 21:09:49 by Gedare Bloom <gedare@…>
Owner: | set to Gedare Bloom <gedare@…> |
---|---|
Resolution: | → fixed |
Status: | new → closed |
comment:7 Changed on 01/23/17 at 09:11:05 by Sebastian Huber
Several tests fail now:
WARNING - log/sp20 did not appear to complete execution
WARNING - log/sp60 did not appear to complete execution
WARNING - log/sp69 did not appear to complete execution
WARNING - log/spcbssched03 did not appear to complete execution
WARNING - log/spedfsched02 did not appear to complete execution
WARNING - log/spedfsched04 did not appear to complete execution
WARNING - log/spintrcritical08 did not appear to complete execution
WARNING - log/spratemon_err01 did not appear to complete execution
WARNING - log/sprmsched01 did not appear to complete execution
comment:8 Changed on 01/24/17 at 12:42:27 by Sebastian Huber
Milestone: | 4.11.1 → 4.12 |
---|---|
Resolution: | fixed |
Severity: | critical → blocker |
Status: | closed → reopened |
comment:9 Changed on 01/24/17 at 12:43:11 by Sebastian Huber
There is a possible deadlock at SMP lock level:
https://lists.rtems.org/pipermail/devel/2017-January/016754.html
comment:10 Changed on 01/24/17 at 13:45:15 by Sebastian Huber <sebastian.huber@…>
comment:11 Changed on 01/24/17 at 14:04:57 by Sebastian Huber <sebastian.huber@…>
comment:12 Changed on 01/24/17 at 14:05:50 by Sebastian Huber <sebastian.huber@…>
comment:13 Changed on 01/24/17 at 14:37:26 by Sebastian Huber <sebastian.huber@…>
comment:14 follow-up: 15 Changed on 01/24/17 at 14:40:46 by Sebastian Huber
The output of sprmsched01 depends on the target timing:
*** BEGIN OF TEST SPRMSCHED 1 ***
Ticks per second in your system: 1000
Task 0 starts at tick 10.
Job 1 Task 0 ends at tick 6141.
Job 1 Task 1 starts at tick 6144.
Job 1 Task 1 ends at tick 7169.
Job 2 Task 1 starts at tick 8144.
Job 2 Task 1 ends at tick 9168.
Task 0 starts at tick 10009.
Job 2 Task 0 ends at tick 16140.
Job 3 Task 1 starts at tick 16145.
Job 3 Task 1 ends at tick 17169.
RTEMS_TIMEOUT
Job 4 Task 1 starts at tick 17172.
Job 4 Task 1 ends at tick 18196.
Job 5 Task 1 starts at tick 18199.
Job 5 Task 1 ends at tick 19223.
Job 6 Task 1 starts at tick 19226.
Job 6 Task 1 ends at tick 20250.
Job 7 Task 1 starts at tick 20253.
Job 7 Task 1 ends at tick 21277.
Job 8 Task 1 starts at tick 21280.
Job 8 Task 1 ends at tick 22304.
Job 9 Task 1 starts at tick 22307.
Job 9 Task 1 ends at tick 23331.
RTEMS_SUCCESSFUL
Job 10 Task 1 starts at tick 24145.
Job 10 Task 1 ends at tick 25169.
Job 11 Task 1 starts at tick 26144.
Job 11 Task 1 ends at tick 27168.
Job 12 Task 1 starts at tick 28144.
Job 12 Task 1 ends at tick 29168.
*** END OF TEST SPRMSCHED 1 ***
It's not clear whether the test objectives are checked. Why is rtems_rate_monotonic_postponed_jobs_count() not used at all?
comment:15 Changed on 01/24/17 at 14:46:59 by Kuan-Hsun Chen
Replying to sebastian.huber:
The output of sprmsched01 depends on the target timing:
[test output quoted in comment 14]
It's not clear whether the test objectives are checked. Why is rtems_rate_monotonic_postponed_jobs_count() not used at all?
That function was prepared for requirements from the application layer.
It is not used directly within the overrun handling.
Since I previously focused only on the timing behaviour, I prepared this test without testing it.
Now I can add it to the test and print out the number of postponed jobs.
comment:16 Changed on 01/24/17 at 14:53:07 by Sebastian Huber
The rtems_rate_monotonic_postponed_jobs_count() is not tested at all, see #2885.
There is a potential integer overflow in _Rate_monotonic_Renew_deadline(). It should be documented what happens in this case.
comment:17 Changed on 01/24/17 at 15:07:46 by Kuan-Hsun Chen
What about a potential overflow of the uint32_t variables "missed_count" and "count" in rtems_rate_monotonic_period_statistics?
I didn't find any related handling in the current version.
stats->missed_count++ in cpukit/rtems/src/ratemonperiod.c could also incur this problem, right?
The trivial solution is to add an assertion to avoid this situation, or to shut down the system...
comment:18 Changed on 01/25/17 at 06:33:58 by Sebastian Huber
The missed count is just a statistic.
For the integer overflow in
static void _Rate_monotonic_Renew_deadline(
  Rate_monotonic_Control *the_period,
  Thread_Control         *owner,
  ISR_lock_Context       *lock_context
)
{
  uint64_t deadline;

  ++the_period->postponed_jobs;
  the_period->state = RATE_MONOTONIC_EXPIRED;
  deadline = _Watchdog_Per_CPU_insert_relative(
    &the_period->Timer,
    _Per_CPU_Get(),
    the_period->next_length
  );
  the_period->latest_deadline = deadline;
  _Rate_monotonic_Release( the_period, lock_context );
}
we have some options.
1. Ignore it; the postponed jobs count will then wrap to zero again.
2. Issue a new fatal error.
3. Saturate, e.g. stay at 0xffffffff.
I am in favour of 3. It should be documented in any case.
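Option 3 can be sketched as a saturating increment. This is a hypothetical helper illustrating the idea, not the committed RTEMS change:

```c
#include <stdint.h>

/* Saturating increment for the postponed-jobs counter: once the counter
 * reaches UINT32_MAX it sticks there instead of wrapping to zero, so an
 * extreme overrun can never make the count look like "no postponed jobs". */
static uint32_t saturating_increment( uint32_t count )
{
  if ( count == UINT32_MAX ) {
    return count;   /* stay at the maximum, do not wrap */
  }
  return count + 1;
}
```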
comment:19 Changed on 01/25/17 at 07:33:32 by Sebastian Huber
The references mentioned in the ticket should be added to the documentation:
https://git.rtems.org/rtems-docs/tree/common/refs.bib
They should show up in the documentation somewhere:
https://git.rtems.org/rtems-docs/tree/c-user/rate_monotonic_manager.rst
The "Further Reading" should be changed to references.
comment:20 Changed on 01/25/17 at 08:30:40 by Kuan-Hsun Chen
I will add a precondition to saturate before ++the_period->postponed_jobs, and update the documentation.
comment:21 Changed on 01/25/17 at 10:17:14 by Sebastian Huber <sebastian.huber@…>
comment:22 Changed on 01/26/17 at 09:01:03 by Kuan-Hsun Chen <c0066c@…>
comment:23 Changed on 01/30/17 at 07:12:14 by Sebastian Huber
comment:24 follow-up: 25 Changed on 01/30/17 at 07:15:11 by Sebastian Huber
The tests sprmsched01 and spedfsched01 should use rtems_rate_monotonic_get_status() to check that the postponed_jobs_count has the expected value.
Ideally, the test output should be independent of the target timing and console driver.
comment:25 Changed on 01/30/17 at 10:03:29 by Kuan-Hsun Chen
Replying to sebastian.huber:
The tests sprmsched01 and spedfsched01 should use rtems_rate_monotonic_get_status() to check that the postponed_jobs_count has the expected value.
Ideally, the test output should be independent of the target timing and console driver.
Okay, I will fix them. The examples were previously prepared for producing the diagrams in the paper.
I will use only assertions and rtems_rate_monotonic_get_status() to check the expected number of postponed jobs, instead of printing the whole simulated execution.
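The intended check can be sketched on the host with a stub standing in for rtems_rate_monotonic_get_status(). Only the postponed_jobs_count field of the real rtems_rate_monotonic_period_status is mirrored here; get_status_stub and check_postponed are made-up stand-ins showing the assertion pattern, not RTEMS API.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors only the field of rtems_rate_monotonic_period_status that the
 * tests need; the rest of the real status structure is omitted. */
typedef struct {
  uint32_t postponed_jobs_count;
} period_status_stub;

/* Stand-in for rtems_rate_monotonic_get_status(): pretend the kernel
 * reports 'jobs' postponed jobs for the queried period object. */
static void get_status_stub( period_status_stub *status, uint32_t jobs )
{
  status->postponed_jobs_count = jobs;
}

/* The test pattern: fetch the status after a known overrun scenario and
 * assert on the counter instead of printing the whole timeline. */
static uint32_t check_postponed( uint32_t expected )
{
  period_status_stub status;

  get_status_stub( &status, expected );
  assert( status.postponed_jobs_count == expected );
  return status.postponed_jobs_count;
}
```

On target, the same pattern would query the real period id and compare against the count the scenario is designed to produce, making the test independent of console timing.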
comment:26 Changed on 01/31/17 at 06:20:41 by Kuan-Hsun Chen <c0066c@…>
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
comment:27 Changed on 05/11/17 at 07:31:02 by Sebastian Huber
Milestone: | 4.12 → 4.12.0 |
---|
comment:28 Changed on 11/09/17 at 06:27:14 by Sebastian Huber
Milestone: | 4.12.0 → 5.1 |
---|
Milestone renamed