#1961 closed defect (fixed)

Race condition when using rtems_message_queue_send in ISR

Reported by: seb
Owned by: Joel Sherrill
Priority: normal
Milestone: 4.11
Component: score
Version: 4.11
Severity: critical
Keywords:
Cc: plattro@…, gedare@…
Blocked By:
Blocking:

Description

(from Werner Almesberger, see also: http://lists.milkymist.org/pipermail/devel-milkymist.org/2011-November/002100.html)

If it's permissible to call rtems_message_queue_send from an
interrupt, then there is at least one race condition in the core
message subsystem.

This created the MIDI/mouse hang we love so much on M1.

The problem is as follows: RTEMS queues use pre-allocated message
buffers that are kept on an "inactive" (free) list. When enqueuing
a message, a buffer is first removed from the inactive list, data
is copied to it, and it is then added to the pending list.

The reverse happens when dequeuing. Besides these two lists, there
is also a counter called number_of_pending_messages keeping track,
as the name suggests, of the number of pending messages. It is
updated atomically together with changes to the pending buffers
list.

From the above it is clear that the counter will be out of sync with
the inactive list between the beginning and the end of an enqueue or
dequeue operation.
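
The bookkeeping described above can be sketched as a small single-threaded model. This is only an illustration, not RTEMS source: every `model_` name, the fixed sizes, and the singly linked lists are assumptions made for the example; only number_of_pending_messages is taken from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_MESSAGES 2   /* assumed queue capacity */
#define MODEL_MESSAGE_SIZE 16  /* assumed maximum message size */

/* Hypothetical model types; these are not the RTEMS definitions. */
typedef struct model_buffer {
  struct model_buffer *next;
  size_t               size;
  char                 data[MODEL_MESSAGE_SIZE];
} model_buffer;

typedef struct {
  model_buffer  storage[MODEL_MAX_MESSAGES];
  model_buffer *inactive;      /* free buffers */
  model_buffer *pending_head;  /* queued messages, oldest first */
  model_buffer *pending_tail;
  unsigned      number_of_pending_messages;
} model_queue;

/* Put every pre-allocated buffer on the inactive list. */
void model_init(model_queue *q)
{
  size_t i;

  memset(q, 0, sizeof(*q));
  for (i = 0; i < MODEL_MAX_MESSAGES; ++i) {
    q->storage[i].next = q->inactive;
    q->inactive = &q->storage[i];
  }
}

/* Walk the inactive list and count the free buffers. */
unsigned model_count_inactive(const model_queue *q)
{
  unsigned n = 0;
  const model_buffer *b;

  for (b = q->inactive; b != NULL; b = b->next)
    ++n;
  return n;
}

/* Enqueue, step 1: remove a buffer from the inactive list (NULL if empty). */
model_buffer *model_allocate(model_queue *q)
{
  model_buffer *b = q->inactive;

  if (b != NULL)
    q->inactive = b->next;
  return b;
}

/* Enqueue, steps 2 and 3: copy the data, then append the buffer to the
 * pending list and update the counter together. */
void model_append(model_queue *q, model_buffer *b, const void *data, size_t size)
{
  assert(b != NULL && size <= MODEL_MESSAGE_SIZE);
  memcpy(b->data, data, size);
  b->size = size;
  b->next = NULL;
  if (q->pending_tail != NULL)
    q->pending_tail->next = b;
  else
    q->pending_head = b;
  q->pending_tail = b;
  ++q->number_of_pending_messages;
}
```

In this model, between model_allocate and model_append the number of free buffers plus number_of_pending_messages is one less than the capacity. That gap is the window in which the counter and the inactive list disagree.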

In order to minimize interrupt latency, RTEMS disables interrupts
only when adding and removing buffers from lists, but not throughout
the whole enqueuing/dequeuing operation. Instead, it disables the
scheduler during these operations, but this doesn't prevent
interrupts.

This means that the inconsistency between number_of_pending_messages
and the inactive list can be observed from an interrupt handler if
enqueuing or dequeuing is in progress.

_CORE_message_queue_Submit checks whether there is still room in the
queue by reading number_of_pending_messages. If there is room, it
then calls _CORE_message_queue_Allocate_message_buffer to obtain a
free buffer.

Given that number_of_pending_messages and the list of inactive
buffers can disagree, e.g., if _CORE_message_queue_Seize or another
_CORE_message_queue_Submit is executing concurrently,
_CORE_message_queue_Allocate_message_buffer may fail to obtain a
free buffer despite the prior test.
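
The interleaving can be replayed deterministically in plain C. The sketch below is a hypothetical stand-in, not the RTEMS code: the `sim_` names, the one-buffer capacity, and the array-based free list are assumptions, and the "interrupt" is simply the point in sim_replay_race where the second sender runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_MESSAGES 1  /* assumed capacity: a single buffer */

/* Hypothetical stand-in for the queue state; not the RTEMS structure. */
typedef struct {
  int      buffers[SIM_MAX_MESSAGES];  /* pre-allocated buffers */
  bool     is_free[SIM_MAX_MESSAGES];  /* true while on the "inactive list" */
  unsigned number_of_pending_messages;
} sim_queue;

void sim_init(sim_queue *q)
{
  size_t i;

  assert(q != NULL);
  for (i = 0; i < SIM_MAX_MESSAGES; ++i) {
    q->buffers[i] = 0;
    q->is_free[i] = true;
  }
  q->number_of_pending_messages = 0;
}

/* Stand-in for the allocation step: returns NULL when no buffer is free. */
int *sim_allocate(sim_queue *q)
{
  size_t i;

  for (i = 0; i < SIM_MAX_MESSAGES; ++i) {
    if (q->is_free[i]) {
      q->is_free[i] = false;
      return &q->buffers[i];
    }
  }
  return NULL;
}

/* Stand-in for the "is there room?" test that reads only the counter. */
bool sim_counter_says_room(const sim_queue *q)
{
  return q->number_of_pending_messages < SIM_MAX_MESSAGES;
}

/* Replays the race. Returns true if the interrupt-level sender passed the
 * counter test and still received NULL from the allocator. */
bool sim_replay_race(void)
{
  sim_queue q;
  int *task_buffer;
  int *isr_buffer = NULL;
  bool isr_saw_room;

  sim_init(&q);

  /* Task-level send: the only buffer leaves the free list ... */
  task_buffer = sim_allocate(&q);

  /* ... and an interrupt arrives before the task has appended the buffer
   * to the pending list and incremented the counter. */
  isr_saw_room = sim_counter_says_room(&q);
  if (isr_saw_room)
    isr_buffer = sim_allocate(&q);

  return task_buffer != NULL && isr_saw_room && isr_buffer == NULL;
}
```

sim_replay_race returns true: the counter still reads zero, so the test passes, but the free list is already empty.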

_CORE_message_queue_Allocate_message_buffer can detect a lack of
free buffers and indicates it by returning a NULL pointer. Checking
whether NULL has been returned instead of a buffer is optional and
depends on RTEMS_DEBUG.

If no check is performed, _CORE_message_queue_Submit will then try
to use the buffer. In the absence of hardware detecting the
de-referencing of NULL pointers, the wounded system will limp on a
little further until, at least in the case of M1, it finally hangs
somewhere.

The patch below avoids the problem in the scenario described above
by not using number_of_pending_messages as an indicator of whether
free buffers are available, but by simply trying to get a buffer,
and handling the result of failure.

This is similar to how _CORE_message_queue_Seize works.
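
The shape of that approach, as opposed to the attached patch itself, might look like the following sketch. The `fix_` names, the status codes, and the array-based free list are all hypothetical; the point is only that the sender asks for a buffer first and treats NULL as "queue full" instead of consulting the counter beforehand.

```c
#include <assert.h>
#include <stddef.h>

#define FIX_MAX_MESSAGES 1  /* assumed capacity */

/* Hypothetical status codes and queue state; not the RTEMS definitions. */
typedef enum { FIX_SUCCESSFUL, FIX_TOO_MANY } fix_status;

typedef struct {
  int      buffers[FIX_MAX_MESSAGES];
  int     *free_list[FIX_MAX_MESSAGES];  /* stack of free buffers */
  size_t   free_count;
  unsigned number_of_pending_messages;
} fix_queue;

void fix_init(fix_queue *q)
{
  size_t i;

  assert(q != NULL);
  for (i = 0; i < FIX_MAX_MESSAGES; ++i)
    q->free_list[i] = &q->buffers[i];
  q->free_count = FIX_MAX_MESSAGES;
  q->number_of_pending_messages = 0;
}

/* Returns a free buffer, or NULL if none is left. */
int *fix_allocate(fix_queue *q)
{
  if (q->free_count == 0)
    return NULL;
  return q->free_list[--q->free_count];
}

/* Allocate first and handle failure; the counter is never used to decide
 * whether a buffer is available. */
fix_status fix_submit(fix_queue *q, int value)
{
  int *buffer = fix_allocate(q);

  if (buffer == NULL)
    return FIX_TOO_MANY;

  *buffer = value;
  /* Stands in for appending to the pending list and updating the counter. */
  ++q->number_of_pending_messages;
  return FIX_SUCCESSFUL;
}
```

If another sender already holds the last buffer, fix_submit reports FIX_TOO_MANY and never touches a NULL pointer, whatever the counter says at that moment.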

Another possibility would be to make testing of the_message no
longer optional. But then, there would basically be two tests for
the same condition, which is ugly.

- Werner

Attachments (1)

coremsgsubmit.diff (1.1 KB) - added by seb on Nov 10, 2011 at 12:12:09 PM.
patch


Change History (5)

Changed on Nov 10, 2011 at 12:12:09 PM by seb

Attachment: coremsgsubmit.diff added

patch

comment:1 Changed on Nov 26, 2011 at 5:01:20 PM by Gedare

Cc: Gedare added

comment:2 Changed on Nov 28, 2011 at 6:18:09 AM by Rob Platt

Cc: Rob Platt added

comment:3 Changed on Nov 28, 2011 at 6:42:59 PM by Joel Sherrill

Resolution: fixed
Status: new → closed

Patch applied to all open branches and head. Thanks.

comment:4 Changed on Nov 24, 2014 at 6:58:28 PM by Gedare

Version: HEAD → 4.11

Replace Version=HEAD with Version=4.11 for the tickets with Milestone >= 4.11
