Opened on 04/16/12 at 07:45:58
Last modified on 08/14/17 at 00:36:12
#2054 new defect
Race condition in FIFO/pipe implementation
Reported by: | Sebastian Huber | Owned by: | Joel Sherrill |
---|---|---|---|
Priority: | low | Milestone: | Indefinite |
Component: | score | Version: | |
Severity: | critical | Keywords: | |
Cc: | Blocked By: | ||
Blocking: |
Description (last modified by Sebastian Huber)
Does anyone know why the FIFO/pipe implementation uses barriers for synchronization? For me the usage of the barrier looks like a hand crafted condition variable.
I think there is also a race condition in its use. The rtems_barrier_release() releases only the threads currently waiting at the barrier. Now we have several code sections like this:
PIPE_UNLOCK(pipe);
if (! PIPE_WRITEWAIT(pipe))
ret = -EINTR;
Thus the release may have no effect in case it happens after the PIPE_UNLOCK and before the PIPE_WRITEWAIT (which is a rtems_barrier_wait() with no time out).
The bdbuf code uses a similar approach with semaphores. It disables the preemption to avoid this race condition:
static void
rtems_bdbuf_anonymous_wait (rtems_bdbuf_waiters *waiters)
{
rtems_status_code sc;
rtems_mode prev_mode;
/*
- Indicate we are waiting. */
++waiters->count;
/*
- Disable preemption then unlock the cache and block. There is no POSIX
- condition variable in the core API so this is a work around. *
- The issue is a task could preempt after the cache is unlocked because it is
- blocking or just hits that window, and before this task has blocked on the
- semaphore. If the preempting task flushes the queue this task will not see
- the flush and may block for ever or until another transaction flushes this
- semaphore. */
prev_mode = rtems_bdbuf_disable_preemption ();
/*
- Unlock the cache, wait, and lock the cache when we return. */
rtems_bdbuf_unlock_cache ();
sc = rtems_semaphore_obtain (waiters->sema, RTEMS_WAIT, RTEMS_BDBUF_WAIT_TIMEOUT);
if (sc == RTEMS_TIMEOUT)
rtems_fatal_error_occurred (RTEMS_BLKDEV_FATAL_BDBUF_CACHE_WAIT_TO);
if (sc != RTEMS_UNSATISFIED)
rtems_fatal_error_occurred (RTEMS_BLKDEV_FATAL_BDBUF_CACHE_WAIT_2);
rtems_bdbuf_lock_cache ();
rtems_bdbuf_restore_preemption (prev_mode);
--waiters->count;
}
Please note that this approach is not SMP proof. I think we need condition variables in the RTEMS Classic API.
Change History (4)
comment:1 Changed on 04/16/12 at 07:45:58 by Sebastian Huber
comment:2 Changed on 11/24/14 at 18:58:28 by Gedare Bloom
Version: | HEAD → 4.11 |
---|
Replace Version=HEAD with Version=4.11 for the tickets with Milestone >= 4.11
comment:3 Changed on 12/18/14 at 11:13:05 by Sebastian Huber
Description: | modified (diff) |
---|---|
Milestone: | 4.11 → 5.0 |
Priority: | normal → low |
Severity: | normal → critical |
comment:4 Changed on 08/14/17 at 00:36:12 by Chris Johns
Milestone: | 5.0 → Indefinite |
---|---|
Version: | 4.11 |
Does anyone know why the FIFO/pipe implementation uses barriers for synchronization? For me the usage of the barrier looks like a hand crafted condition variable.
I think there is also a race condition in its use. The rtems_barrier_release() releases only the threads currently waiting at the barrier. Now we have several code sections like this:
Thus the release may have no effect in case it happens after the PIPE_UNLOCK and before the PIPE_WRITEWAIT (which is a rtems_barrier_wait() with no time out).
The bdbuf code uses a similar approach with semaphores. It disables the preemption to avoid this race condition:
static void
rtems_bdbuf_anonymous_wait (rtems_bdbuf_waiters *waiters)
{
}
Please note that this approach is not SMP proof. I think we need condition variables in the RTEMS Classic API.