#4019 assigned defect

Potential issue with SMP EDF scheduler and priority inheritance

Reported by: Sebastian Huber Owned by: Sebastian Huber
Priority: normal Milestone: 6.1
Component: score Version:
Severity: normal Keywords:
Cc: Jonathan Walsh Blocked By:
Blocking:

Description

With the EDF scheduler enabled in SMP mode, we see a hang under load. We have isolated the test stimulus to a very simple loop:

for (uint32_t j = 0; j < test_iterations; j++)
{
  sc = pthread_mutex_lock(&mtx);
  rtems_test_assert(sc == RTEMS_SUCCESSFUL);
  NOP;
  sc = pthread_mutex_unlock(&mtx);
  rtems_test_assert(sc == RTEMS_SUCCESSFUL);
}

When we run this basic unit test on its own, all is OK. When we run more and more threads in parallel, we see RTEMS lock up. With the simple round-robin scheduler, we can run these loops forever with as many threads as we want and do not see a lock-up.

Change History (7)

comment:1 Changed on Jun 30, 2020 at 5:12:54 AM by Sebastian Huber

With how many processors do you execute this test case? Do the threads have all the same priority? Does the smpmutex02 test case run on your system? Which RTEMS commit do you use?

comment:2 Changed on Jun 30, 2020 at 5:13:37 AM by Sebastian Huber

Cc: Jonathan Walsh added

comment:3 Changed on Jun 30, 2020 at 9:46:15 AM by Stavros Passas

Hi Sebastian,
I can try to fill in the information for you. The SMP test suite runs clean. We run RTEMS from https://github.com/RTEMS/rtems/commit/714cb06f18ce2c0df265ada06363d71560f14a3e

We run with 2 cores and >= 4 threads (the failure is not seen with fewer than 4 threads). All threads are pthreads and are started with PTHREAD_EXPLICIT_SCHED. If the threads are not pinned to any processor, we see no issue. If the threads are pinned to (thread_number % 2), after a while (usually 20k-100k iterations) EDF does not schedule any thread -- both cores are running the idle loop. If I change the scheduler to the priority affinity SMP scheduler, context switching works as expected.

I also tried replacing pthread_mutex_lock with sem_wait and rtems_semaphore_obtain, and all primitives behave the same with EDF.
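
For reference, the setup described in this comment could be sketched roughly as below. This is not the original test program: run_contention(), worker_ctx, MAX_WORKERS and the shared counter (which stands in for the NOP so the result can be checked) are hypothetical names made up for illustration. The mutex uses default attributes because the report does not state the protocol; to exercise priority inheritance one would presumably set PTHREAD_PRIO_INHERIT through pthread_mutexattr_setprotocol(). A failed pin is recorded rather than treated as fatal so the sketch also runs on hosts with fewer processors.

```c
#define _GNU_SOURCE /* for cpu_set_t and pthread_setaffinity_np() */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

#define MAX_WORKERS 32 /* hypothetical upper bound, not from the report */

/* Hypothetical per-thread context; not taken from the report. */
typedef struct {
  pthread_mutex_t *mtx;
  uint32_t *counter;   /* shared, protected by mtx */
  uint32_t iterations;
  int cpu;             /* processor to pin to: thread_number % cpu_count */
  int pin_status;      /* result of the pin attempt, 0 on success */
} worker_ctx;

static void *worker(void *arg)
{
  worker_ctx *ctx = arg;
  cpu_set_t set;

  /* Pin the calling thread to a single processor (best effort). */
  CPU_ZERO(&set);
  CPU_SET(ctx->cpu, &set);
  ctx->pin_status = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

  /* Same lock/unlock loop as in the description. */
  for (uint32_t j = 0; j < ctx->iterations; j++) {
    int eno = pthread_mutex_lock(ctx->mtx);
    assert(eno == 0);
    ++*ctx->counter;
    eno = pthread_mutex_unlock(ctx->mtx);
    assert(eno == 0);
  }
  return NULL;
}

/* Starts thread_count workers and returns the total number of lock cycles. */
uint32_t run_contention(uint32_t thread_count, uint32_t iterations,
                        uint32_t cpu_count)
{
  pthread_mutex_t mtx;
  pthread_attr_t attr;
  pthread_t threads[MAX_WORKERS];
  worker_ctx ctx[MAX_WORKERS];
  uint32_t counter = 0;
  int eno;

  assert(thread_count <= MAX_WORKERS && cpu_count > 0);

  eno = pthread_mutex_init(&mtx, NULL); /* default attributes: assumption */
  assert(eno == 0);

  eno = pthread_attr_init(&attr);
  assert(eno == 0);
  /* Take scheduling parameters from attr, not from the creating thread. */
  eno = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
  assert(eno == 0);

  for (uint32_t i = 0; i < thread_count; i++) {
    ctx[i] = (worker_ctx){ &mtx, &counter, iterations,
                           (int)(i % cpu_count), 0 };
    eno = pthread_create(&threads[i], &attr, worker, &ctx[i]);
    assert(eno == 0);
  }

  for (uint32_t i = 0; i < thread_count; i++) {
    eno = pthread_join(threads[i], NULL);
    assert(eno == 0);
  }

  pthread_attr_destroy(&attr);
  pthread_mutex_destroy(&mtx);
  return counter;
}
```

With the numbers from this comment, a call would look like run_contention(4, 100000, 2) from the POSIX initialization thread. If the scheduler stops dispatching the workers, the joins never complete and the call does not return.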

comment:4 Changed on Jun 30, 2020 at 9:50:24 AM by Sebastian Huber

The referenced RTEMS commit is from 2017. Can you please try to reproduce this issue with the current master? I will not investigate SMP issues in RTEMS versions prior to the RTEMS 5.1 release as part of my open source project activities.

comment:5 Changed on Jun 30, 2020 at 5:05:13 PM by Stavros Passas

Hi Sebastian,
Thanks for the suggestion; I ported the failing test to a branch closer to head (I picked https://github.com/RTEMS/rtems/commit/234d155e879d7ca72fa5dd9d29a8347fa294676e as a reference point) and I cannot make the above test case fail anymore -- scheduling works as expected.

Could you point to some changes that might have changed the scheduler's behavior? I went through the commits directly touching EDF and did not find any fix that could account for this.

Thanks a lot,

Stavros

comment:6 Changed on Jun 30, 2020 at 5:17:27 PM by Sebastian Huber

There are a lot of changes between 2017 and 2019. You could try a git bisect to find a relevant commit.

comment:7 Changed on Jul 1, 2020 at 5:14:24 AM by Sebastian Huber

It would be helpful if you could add the test program which reproduces the issue as smpmutex03. If you want to contribute, then please use the BSD-2-Clause license:

https://docs.rtems.org/branches/master/eng/coding-file-hdr.html#c-c-assembler-source-file-template
