#2273 closed enhancement (fixed)

One Step Towards Fine Grained Locking

Reported by: Sebastian Huber Owned by: Sebastian Huber
Priority: normal Milestone: 4.11
Component: score Version: 4.11
Severity: normal Keywords:
Cc: Blocked By:
Blocking:

Description (last modified by Sebastian Huber)

Benefit

Improved average-case performance. Uni-processor configurations will also benefit from some changes since calls to the thread dispatch function are reduced.

Problem Description

The Giant lock is a major performance bottleneck. SMP aware applications tend to use fine grained locking at the application level. So it is important that the semaphore obtain/release sequence for the uncontested case is as fast as possible, as an example see also performance tests with new network stack.

Problem Solution

Remove the Giant lock. This is a very complex task. As a first step a redesign of the semaphore obtain/release sequence should be done. The semaphore objects should use an object specific SMP lock for the non-blocking case and only resort to the Giant lock if a thread must block/unblock or in case of priority inheritance.

The analysis of the current locking protocol shows that it is not compatible with fine grained locking since it uses the executing thread on several places. On SMP systems more than one executing thread exists in contrast to uni-processor systems. The semaphore manager provides the following semaphore object flavours in RTEMS

  • counting semaphores (no owner),
  • simple binary semaphores (no owner, no protocols),
  • binary semaphores,
  • binary semaphores with priority ceiling protocol,
  • binary semaphores with priority inheritance protocol, and
  • binary semaphores with multiprocessor resource sharing protocol (MrsP).

For fine grained locking each flavour must be looked at individually.

Example: Binary Semaphore with Priority Inheritance Protocol

A binary semaphore with priority inheritance protocol has at most one owner at a time. In case an owner is present, then other threads (the rivals) must block and wait for ownership. The highest priority rival will be the new owner once the current owner releases the semaphore. Rivals may use a watchdog to limit the time waiting for ownership. The owner will inherit the priority of the highest priority rival. In the next two images, coloured boxes represent the critical section of the corresponding protection mechanism.

The following image depicts the structure of a binary semaphore of the current implementation with a Giant lock only. The complete operation is protected by the Giant lock with thread dispatching disabled. The interrupts are enabled in general. Some critical sections use disabled interrupts. So interrupts on a processor owning the Giant lock will delay other processors waiting for the Giant lock.

Structure of a binary semaphore with priority inheritance protocol and Giant lock only:

The following image depicts the structure of a binary semaphore with a semaphore specific SMP lock. The state of the semaphore object is protected by this lock. Other state changes like a thread priority change or a thread blocking operation is protected by the Giant lock. The watchdog uses a dedicated lock here, see #2271.

Structure of a binary semaphore with priority inheritance protocol and semaphore specific lock:

In case no owner is present, then the semaphore obtain operation uses only the semaphore object specific SMP lock and there is no interference with the rest of the system.

Attachments (10)

fine.png (29.0 KB) - added by Sebastian Huber on Feb 17, 2015 at 6:32:18 PM.
giant.png (25.3 KB) - added by Sebastian Huber on Feb 17, 2015 at 6:32:32 PM.
Test_with_fgl.png (35.4 KB) - added by Sebastian Huber on Mar 16, 2015 at 2:36:17 PM.
Test event send/receive with fine grained locking.
test_with_giant_lock.png (47.1 KB) - added by Sebastian Huber on Mar 16, 2015 at 2:36:38 PM.
Test event send/receive with Giant lock.
ngmp-events-old.png (35.3 KB) - added by Sebastian Huber on May 15, 2015 at 9:12:27 AM.
ngmp-events-new.png (34.5 KB) - added by Sebastian Huber on May 15, 2015 at 9:12:36 AM.
ngmp-mutex-old.png (34.7 KB) - added by Sebastian Huber on May 15, 2015 at 9:12:46 AM.
ngmp-mutex-new.png (34.0 KB) - added by Sebastian Huber on May 15, 2015 at 9:12:56 AM.
ngmp-msg-old.png (35.8 KB) - added by Sebastian Huber on May 15, 2015 at 9:13:25 AM.
ngmp-msg-new.png (32.2 KB) - added by Sebastian Huber on May 15, 2015 at 9:13:33 AM.

Download all attachments as: .zip

Change History (24)

Changed on Feb 17, 2015 at 6:32:18 PM by Sebastian Huber

Attachment: fine.png added

Changed on Feb 17, 2015 at 6:32:32 PM by Sebastian Huber

Attachment: giant.png added

comment:1 Changed on Feb 17, 2015 at 6:34:15 PM by Sebastian Huber

Description: modified (diff)

comment:2 Changed on Feb 18, 2015 at 2:36:24 PM by Sebastian Huber

Status: newaccepted

comment:4 Changed on Mar 4, 2015 at 11:05:31 AM by Sebastian Huber <sebastian.huber@…>

In e50297e36caf2c989a03825cc8589cd730c03389/rtems:

score: ISR lock C/C++ compatiblity issue

Empty structures are implementation-defined in C. GCC gives them a size
of zero. In C++ empty structures have a non-zero size.

Add ISR_LOCK_DEFINE() to define ISR locks for structures used by C and
C++.

Update #2273.

Changed on Mar 16, 2015 at 2:36:17 PM by Sebastian Huber

Attachment: Test_with_fgl.png added

Test event send/receive with fine grained locking.

Changed on Mar 16, 2015 at 2:36:38 PM by Sebastian Huber

Attachment: test_with_giant_lock.png added

Test event send/receive with Giant lock.

comment:7 Changed on Mar 16, 2015 at 2:50:06 PM by Sebastian Huber

The new test [dc5e5f44485b8f9709da767f79595d2fa8aff74d/rtems] shows that the prototype implementation for the RTEMS events yields a significant improvement for SMP systems in case no blocking operations are involved. The test data is gathered on a 24 processor Freescale T4240 platform with the RTEMS version [6db5e650fb80824c4e9339dfe73976662245d449/rtems].

The first plot shows the count of event send/receive operations for task sets of size 1 to 24 with the patch [7d6e94b12ab94f13d3e769702d50f91be7d5884b/rtems] reverted. The Giant lock limits the total operation count and no benefit from more processors is visible.

Test event send/receive with Giant lock.

The second plot shows the count of event send/receive operations for task sets of size 1 to 24. The sum of the requests per second increases in this test scenario with fine grained locking linearly with the task set cardinality up to the count of processors.

Test event send/receive with fine grained locking.

In addition the new implementation (fine grained locking) is more efficient on a task set size of one with 2439242 vs. 1367189 operations per second.

comment:8 Changed on Apr 21, 2015 at 6:26:28 AM by Sebastian Huber <sebastian.huber@…>

In d3802bb5d708d01d03774b6758ed1e87484b8fac/rtems:

score: Delete object control block ISR lock

The Objects_Control::Lock was a software layer violation. It worked
only for the threads since they are somewhat special.

Update #2273.

Changed on May 15, 2015 at 9:12:27 AM by Sebastian Huber

Attachment: ngmp-events-old.png added

Changed on May 15, 2015 at 9:12:36 AM by Sebastian Huber

Attachment: ngmp-events-new.png added

Changed on May 15, 2015 at 9:12:46 AM by Sebastian Huber

Attachment: ngmp-mutex-old.png added

Changed on May 15, 2015 at 9:12:56 AM by Sebastian Huber

Attachment: ngmp-mutex-new.png added

Changed on May 15, 2015 at 9:13:25 AM by Sebastian Huber

Attachment: ngmp-msg-old.png added

Changed on May 15, 2015 at 9:13:33 AM by Sebastian Huber

Attachment: ngmp-msg-new.png added

comment:10 Changed on May 15, 2015 at 9:47:18 AM by Sebastian Huber

The following test data is gathered on a 4 processor ESA/Gaisler NGMP platform running at 50MHz. Three test cases of the tmtests/tmfine01 test program are presented using RTEMS version [dc5e5f44485b8f9709da767f79595d2fa8aff74d/rtems] with the patch [7d6e94b12ab94f13d3e769702d50f91be7d5884b/rtems] reverted (= old) and a RTEMS version with a fine grained locking implementation for events, semaphores and message queues (= new). The data for the old version is presented before the data for the new version. There is one task per processor.

The first test case sends events to self.



The second test case obtains and releases a mutex private to the task.



The third test case sends and receives a message to self.



The new implementation gets rid of the Giant lock bottleneck and scales well with the processor count. In addition it yields more operations per second on one processor.

Operations on one Processor (Old) Operations on one Processor (New) Change
Events to Self 44262 82150 86%
One Mutex per Task 42520 67998 60%
Message to Self 32591 58596 80%
Last edited on May 15, 2015 at 9:48:22 AM by Sebastian Huber (previous) (diff)

comment:11 Changed on May 16, 2015 at 9:31:43 AM by Sebastian Huber

Milestone: 4.11.14.11

comment:13 Changed on Oct 10, 2017 at 6:27:10 AM by Sebastian Huber

Component: SMPscore

comment:14 Changed on Oct 10, 2017 at 6:29:01 AM by Sebastian Huber

Component: scorecpukit
Note: See TracTickets for help on using tickets.