Opened on 06/30/20 at 05:08:08
Last modified on 10/27/21 at 07:27:18
#4019 assigned defect
Potential issue with SMP EDF scheduler and priority inheritance
Reported by: | Sebastian Huber | Owned by: | Needs Funding |
---|---|---|---|
Priority: | normal | Milestone: | Indefinite |
Component: | score | Version: | |
Severity: | normal | Keywords: | |
Cc: | Jonathan Walsh | Blocked By: | |
Blocking: |
Description
With EDF enabled in SMP mode, we see a hang under load. We have isolated the test stimulus to a very simple loop:
for (uint32_t j = 0; j < test_iterations; j++)
{
  /* POSIX mutex calls return 0 on success (not an rtems_status_code) */
  int sc = pthread_mutex_lock(&mtx);
  rtems_test_assert(sc == 0);
  NOP; /* trivial critical section */
  sc = pthread_mutex_unlock(&mtx);
  rtems_test_assert(sc == 0);
}
When we run this basic unit test on its own, everything is OK. As we run more and more threads in parallel, RTEMS locks up. With the simple round-robin scheduler we can run these loops forever with as many threads as we want and never see a lock-up.
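To reproduce the stimulus outside the original test suite, a minimal sketch on a POSIX host could look like this. It assumes plain pthreads with default attributes; `worker`, `run_mutex_stress`, and the shared `counter` (which replaces `NOP` so the result can be checked) are hypothetical names, and `abort()` stands in for `rtems_test_assert()`. On an affected system the reported symptom is a hang, so a lock-up would show up as the harness never returning rather than as a wrong count.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Shared state; counter is only modified while mtx is held. */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static uint32_t test_iterations;
static uint64_t counter;

/* Each thread runs the lock/unlock loop from the report. */
static void *worker(void *arg)
{
  (void) arg;
  for (uint32_t j = 0; j < test_iterations; j++) {
    if (pthread_mutex_lock(&mtx) != 0) {
      abort(); /* host stand-in for rtems_test_assert() */
    }
    counter++; /* replaces NOP so the result can be checked */
    if (pthread_mutex_unlock(&mtx) != 0) {
      abort();
    }
  }
  return NULL;
}

/* Hypothetical harness: run thread_count workers in parallel and return the
 * number of completed critical sections, or 0 if not all threads started. */
static uint64_t run_mutex_stress(unsigned thread_count, uint32_t iterations)
{
  enum { MAX_THREADS = 64 };
  pthread_t threads[MAX_THREADS];
  unsigned created = 0;

  if (thread_count > MAX_THREADS) {
    return 0;
  }
  test_iterations = iterations;
  counter = 0;
  while (created < thread_count &&
         pthread_create(&threads[created], NULL, worker, NULL) == 0) {
    created++;
  }
  /* Join every thread that did start, even if a later creation failed. */
  for (unsigned i = 0; i < created; i++) {
    pthread_join(threads[i], NULL);
  }
  return created == thread_count ? counter : 0;
}
```

For example, `run_mutex_stress(4, 10000)` runs four threads of 10000 iterations each and returns 40000 when every critical section completes.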
Change History (10)
comment:1 Changed on 06/30/20 at 05:12:54 by Sebastian Huber
comment:2 Changed on 06/30/20 at 05:13:37 by Sebastian Huber
Cc: | Jonathan Walsh added |
---|
comment:3 Changed on 06/30/20 at 09:46:15 by Stavros Passas
Hi Sebastian,
I can try to fill in the information for you. The SMP test suite runs clean. We run RTEMS from https://github.com/RTEMS/rtems/commit/714cb06f18ce2c0df265ada06363d71560f14a3e
We run with 2 cores and 4 or more threads (the failure is not seen with fewer than 4 threads). All threads are pthreads and are started with PTHREAD_EXPLICIT_SCHED. If the threads are not pinned to any processor, we see no issue. If each thread is pinned to processor (thread_number % 2), then after a while (usually 20k-100k iterations) EDF stops scheduling any thread: both cores are running the idle loop. If I change the scheduler to the priority affinity SMP scheduler, context switching works as expected.
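The thread setup described above can be sketched as follows, assuming a POSIX host with the GNU `pthread_attr_setaffinity_np()` extension (RTEMS SMP configurations offer the same non-portable call, but the reporter's actual code is not shown). The helper name `create_pinned_thread` is hypothetical, and `SCHED_OTHER` with priority 0 is an assumption chosen so that it runs unprivileged on Linux; the report does not state the policy or priorities used. Passing `cpus = {0, 1}` gives the `thread_number % 2` pinning from the report.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: create a thread with explicit scheduling attributes
 * and pin it to cpus[thread_number % cpu_count]. Returns 0 on success or
 * the error number of the first failing call. */
static int create_pinned_thread(pthread_t *thread, unsigned thread_number,
                                const int *cpus, unsigned cpu_count,
                                void *(*entry)(void *), void *arg)
{
  pthread_attr_t attr;
  struct sched_param param = { .sched_priority = 0 }; /* assumed priority */
  cpu_set_t cpuset;
  int eno;

  if (cpu_count == 0) {
    return EINVAL;
  }
  eno = pthread_attr_init(&attr);
  if (eno != 0) {
    return eno;
  }

  /* Use these attributes instead of inheriting the creator's. */
  eno = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
  if (eno == 0) {
    eno = pthread_attr_setschedpolicy(&attr, SCHED_OTHER); /* assumption */
  }
  if (eno == 0) {
    eno = pthread_attr_setschedparam(&attr, &param);
  }

  /* Restrict the new thread to exactly one processor. */
  CPU_ZERO(&cpuset);
  CPU_SET(cpus[thread_number % cpu_count], &cpuset);
  if (eno == 0) {
    eno = pthread_attr_setaffinity_np(&attr, sizeof(cpuset), &cpuset);
  }
  if (eno == 0) {
    eno = pthread_create(thread, &attr, entry, arg);
  }

  pthread_attr_destroy(&attr);
  return eno;
}
```

Setting the affinity through the creation attributes, rather than calling `pthread_setaffinity_np()` from inside the thread, means the thread never runs on a processor outside its set.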
I also tried replacing pthread_mutex_lock with sem_wait and with rtems_semaphore_obtain, and all primitives behave the same with EDF.
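The loop with `sem_wait` in place of `pthread_mutex_lock` can be sketched as follows, assuming an unnamed POSIX semaphore initialized to 1 acts as the lock on a POSIX host. `sem_worker`, `run_sem_stress`, and `count` are hypothetical names, and `abort()` stands in for `rtems_test_assert()`; the `rtems_semaphore_obtain` variant needs an RTEMS target and is not shown.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>
#include <stdlib.h>

static sem_t sem;      /* binary semaphore used as a lock (count 0 or 1) */
static uint64_t count; /* only modified while sem is held */

/* The description's loop with sem_wait()/sem_post() replacing
 * pthread_mutex_lock()/pthread_mutex_unlock(). */
static void *sem_worker(void *arg)
{
  uint32_t iterations = *(const uint32_t *) arg;

  for (uint32_t j = 0; j < iterations; j++) {
    /* sem_wait() may be interrupted by a signal; retry in that case. */
    while (sem_wait(&sem) != 0) {
      if (errno != EINTR) {
        abort(); /* host stand-in for rtems_test_assert() */
      }
    }
    count++;
    if (sem_post(&sem) != 0) {
      abort();
    }
  }
  return NULL;
}

/* Hypothetical harness: returns the number of completed critical sections,
 * or 0 if semaphore setup or thread creation failed. */
static uint64_t run_sem_stress(unsigned thread_count, uint32_t iterations)
{
  enum { MAX_THREADS = 64 };
  pthread_t threads[MAX_THREADS];
  unsigned created = 0;

  if (thread_count > MAX_THREADS || sem_init(&sem, 0, 1) != 0) {
    return 0;
  }
  count = 0;
  while (created < thread_count &&
         pthread_create(&threads[created], NULL, sem_worker, &iterations) == 0) {
    created++;
  }
  for (unsigned i = 0; i < created; i++) {
    pthread_join(threads[i], NULL);
  }
  sem_destroy(&sem);
  return created == thread_count ? count : 0;
}
```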
comment:4 Changed on 06/30/20 at 09:50:24 by Sebastian Huber
The referenced RTEMS commit is from 2017. Can you please try to reproduce this issue with the current master. I will not investigate SMP issues in RTEMS versions prior to the RTEMS 5.1 release as part of my open source project activities.
comment:5 Changed on 06/30/20 at 17:05:13 by Stavros Passas
Hi Sebastian,
Thanks for the suggestion. I ported the failing test to a branch closer to head (I picked https://github.com/RTEMS/rtems/commit/234d155e879d7ca72fa5dd9d29a8347fa294676e as a reference point) and I can no longer make the above test case fail: scheduling works as expected.
Could you point me to changes that might have affected the scheduler's behavior? I went through the commits that directly touch EDF and didn't find any fix that could explain this.
Thanks a lot,
Stavros
comment:6 Changed on 06/30/20 at 17:17:27 by Sebastian Huber
There are a lot of changes between 2017 and 2019. You could try a git bisect to find a relevant commit.
comment:7 Changed on 07/01/20 at 05:14:24 by Sebastian Huber
It would be helpful if you could add the test program which reproduces the issue as smpmutex03. If you want to contribute, then please use the BSD-2-Clause license:
https://docs.rtems.org/branches/master/eng/coding-file-hdr.html#c-c-assembler-source-file-template
comment:8 Changed on 07/14/20 at 07:45:32 by Sebastian Huber
Milestone: | 6.1 → Indefinite |
---|---|
Owner: | changed from Sebastian Huber to Needs Funding |
comment:9 Changed on 06/18/21 at 09:24:45 by Sebastian Huber
Keywords: | qualification added |
---|
comment:10 Changed on 10/27/21 at 07:27:18 by Sebastian Huber
Keywords: | qualification removed |
---|
Remove the qualification keyword since this issue is not relevant to RTEMS 6.
How many processors do you use to run this test case? Do all threads have the same priority? Does the smpmutex02 test case run on your system? Which RTEMS commit do you use?