Version 1 (modified by Chris Johns, on Jun 25, 2016 at 4:04:48 AM) (diff)

Add watch point notes.

Debugging a Heap Protection Error

The RTEMS Heap Allocator has a heap protector to help track find corruptions. The most common type of heap corruption errors are using a pointer that has been returned to the heap or writing to memory outside the block allocated. These errors are difficult to track down and you need hardware watchpoint support to find them quickly.

Hardware watchpoint support is something GDB can provide if your hardware is capable of providing it. Processors such as ARM, i386, and Coldfire have such support. A hardware watchpoint configures special hardware support in the processor to monitor a location for a write access and if it happens a breakpoint it is triggered and executable it stopped at the specific line of code.

To find a heap corruption you need to locate the address of the heap block header and the block marker being checked and then set a hardware watchpoint.

This example is using the PC BSP (pc686) and the build-in GDB stub which has hardware watchpoint support. I have a bug in the LibBSD rc.conf support I need to track down. The rc.conf uses Newlib's regex support and code that runs before the Newlib code is called has corrupted the heap. The Newlib call just happens to make a heap allocation request that detects the corruption.

I always have a breakpoint set to bsp_fatal_extension. If something like a heap corruption is detected execution stops when that function is called. A backtrace is:

(gdb) bt
#0  bsp_fatal_extension (source=INTERNAL_ERROR_CORE, is_internal=false, code=3735928559) at /opt/work/chris/rtems/kernel/rtems.master/c/src/lib/libbsp/i386/pc386/../../shared/bspclean.c:19
#1  0x001b4100 in _User_extensions_Iterate (arg=arg@entry=0x2a4c94, visitor=0x1b4050 <_User_extensions_Fatal_visitor>, direction=<optimized out>, direction@entry=CHAIN_ITERATOR_BACKWARD) at /opt/work/chris/rtems/kernel/rtems.master/c/src/../../cpukit/score/src/userextiterate.c:171
#2  0x001ae842 in _User_extensions_Fatal (error=3735928559, is_internal=false, source=INTERNAL_ERROR_CORE) at ../../cpukit/../../../pc686/lib/include/rtems/score/userextimpl.h:298
#3  _Terminate (the_source=the_source@entry=INTERNAL_ERROR_CORE, is_internal=is_internal@entry=false, the_error=the_error@entry=3735928559) at /opt/work/chris/rtems/kernel/rtems.master/c/src/../../cpukit/score/src/interr.c:44
#4  0x001ad99e in _Heap_Protection_block_error_default (heap=0x2154a0 <_Workspace_Area>, block=0x2a2510) at /opt/work/chris/rtems/kernel/rtems.master/c/src/../../cpukit/score/src/heap.c:160
#5  0x001ae3cd in _Heap_Protection_block_error (block=0x2a2510, heap=0x2154a0 <_Workspace_Area>) at ../../cpukit/../../../pc686/lib/include/rtems/score/heapimpl.h:336
#6  _Heap_Protection_determine_block_free (block=0x2a2510, heap=0x2154a0 <_Workspace_Area>) at /opt/work/chris/rtems/kernel/rtems.master/c/src/../../cpukit/score/src/heapfree.c:92
#7  _Heap_Free (heap=heap@entry=0x2154a0 <_Workspace_Area>, alloc_begin_ptr=0x2a2534) at /opt/work/chris/rtems/kernel/rtems.master/c/src/../../cpukit/score/src/heapfree.c:141
#8  0x001ae02a in _Heap_Protection_free_delayed_blocks (alloc_begin=0, heap=0x2154a0 <_Workspace_Area>) at /opt/work/chris/rtems/kernel/rtems.master/c/src/../../cpukit/score/src/heapallocate.c:52
#9  _Heap_Allocate_aligned_with_boundary (heap=<optimized out>, heap@entry=0x2154a0 <_Workspace_Area>, alloc_size=<optimized out>, alloc_size@entry=4294967255, alignment=alignment@entry=0, boundary=boundary@entry=0) at /opt/work/chris/rtems/kernel/rtems.master/c/src/../../cpukit/score/src/heapallocate.c:253
#10 0x001ae6da in _Heap_Allocate (size=4294967255, heap=0x2154a0 <_Workspace_Area>) at ../../cpukit/../../../pc686/lib/include/rtems/score/heapimpl.h:137
#11 _Heap_Protection_free_all_delayed_blocks (heap=0x2154a0 <_Workspace_Area>) at ../../cpukit/../../../pc686/lib/include/rtems/score/heapimpl.h:345
#12 _Heap_Resize_block (heap=heap@entry=0x2154a0 <_Workspace_Area>, alloc_begin_ptr=alloc_begin_ptr@entry=0x2a576c, new_alloc_size=new_alloc_size@entry=72, old_size=old_size@entry=0x2a4dd8, new_size=new_size@entry=0x2a4ddc) at /opt/work/chris/rtems/kernel/rtems.master/c/src/../../cpukit/score/src/heapresizeblock.c:108
#13 0x001a5b52 in realloc (ptr=0x2a576c, size=72) at /opt/work/chris/rtems/kernel/rtems.master/c/src/../../cpukit/libcsupport/src/realloc.c:71
#14 0x001bb30e in stripsnug (g=0x2a55e8, g=0x2a55e8, p=0x2a4e30) at ../../../../../../gcc-6-20160609/newlib/libc/posix/regcomp.c:1668
#15 regcomp (preg=preg@entry=0x2a4ee0, pattern=0x2a5554 "test_regex_.*", cflags=cflags@entry=1) at ../../../../../../gcc-6-20160609/newlib/libc/posix/regcomp.c:271
#16 0x00139d9b in rtems_bsd_rc_conf_find_next (rc_conf=0x22e354, argc_argv=0x2a5520) at ../../rtemsbsd/rtems/rtems-bsd-rc-conf.c:395
#17 0x0010026f in test_service (rc_conf=0x22e354) at ../../testsuite/rcconf01/test_main.c:71
#18 0x00139ae9 in rc_conf_worker (task_argument=2286420) at ../../rtemsbsd/rtems/rtems-bsd-rc-conf.c:720
#19 0x001b6249 in _Thread_Handler () at /opt/work/chris/rtems/kernel/rtems.master/c/src/../../cpukit/score/src/threadhandler.c:93
#20 0xa5a5a5a5 in ?? ()

It is easy to see from the backtrace the heap has detected a corruption. Move up the stack frame to find the corruption check using the GDB up command. In this case it is frame #6:

(gdb) up
#6  _Heap_Protection_determine_block_free (block=0x2a2510, heap=0x2154a0 <_Workspace_Area>) at /opt/work/chris/rtems/kernel/rtems.master/c/src/../../cpukit/score/src/heapfree.c:92
92            _Heap_Protection_block_error( heap, block );

The code for _Heap_Protection_determine_block_free is:

(gdb) l _Heap_Protection_determine_block_free
79          Heap_Control *heap,
80          Heap_Block *block
81        )
82        {
83          bool do_free = true;
84          Heap_Block *const next = block->Protection_begin.next_delayed_free_block;
86          if ( next == NULL ) {
87            _Heap_Protection_delay_block_free( heap, block );
88            do_free = false;
(gdb) l
89          } else if ( next == HEAP_PROTECTION_OBOLUS ) {
90            _Heap_Protection_check_free_block( heap, block );
91          } else {
92            _Heap_Protection_block_error( heap, block );
93          }
95          return do_free;
96        }
97      #endif

The heap is made up of blocks of type Heap_Block and a corruption has been detected. We need to look carefully because the compiler can optimize paths. Lets have a look at the value for next:

(gdb) p next
$4 = <optimized out>
(gdb) p /x block->Protection_begin.next_delayed_free_block
$5 = 0x1

We cannot see it directly because it is not available however we can see it's value in the block. It is 1. A quick search of the RTEMS source code find the value of HEAP_PROTECTION_OBOLUS as being 1:

$ grep -r HEAP_PROTECTION_OBOLUS cpukit/score/
cpukit/score/include/rtems/score/heap.h:  #define HEAP_PROTECTION_OBOLUS ((Heap_Block *) 1)
cpukit/score/src/heapallocate.c:          HEAP_PROTECTION_OBOLUS;
cpukit/score/src/heapfree.c:    } else if ( next == HEAP_PROTECTION_OBOLUS ) {

This means the execution path is _Heap_Protection_check_free_block and a check of that code shows it to contain a call to `_Heap_Protection_block_error1:

  static void _Heap_Protection_check_free_block(
    Heap_Control *heap,
    Heap_Block *block
    uintptr_t *const pattern_begin = (uintptr_t *)
      _Heap_Alloc_area_of_block( block );
    uintptr_t *const pattern_end = (uintptr_t *)
      ((uintptr_t) block + _Heap_Block_size( block ) + HEAP_ALLOC_BONUS);
    uintptr_t *current = NULL;

    for ( current = pattern_begin; current != pattern_end; ++current ) {
      if ( *current != HEAP_FREE_PATTERN ) {
        _Heap_Protection_block_error( heap, block );

on the i386 things become complicated because the compiler has inlined all the calls in _Heap_Free and this makes it difficult to isolate the specific line of code. Setting a break on the call to _Heap_Protection_block_error in _Heap_Protection_check_free_block does not end up at the correct instruction. using objdump I inspected the code in _Heap_Free and found the specific instruction:

$ /opt/work/rtems/4.12/bin/i386-rtems4.12-objdump -d --source build/i386-rtems4.12-pc686/rcconf01.exe | less

Using less I searched for <_Heap_Free>. To search hit / then enter <_Heap_Free> and repeat the search with just / until you see:

001ae320 <_Heap_Free>:

The instruction location will be different. Scroll down looking for the source code for if ( *current != HEAP_FREE_PATTERN ):

      if ( *current != HEAP_FREE_PATTERN ) {
  1ae4ba:       81 3b df 3c 09 e7       cmpl   $0xe7093cdf,(%ebx)
  1ae4c0:       0f 85 f6 fe ff ff       jne    1ae3bc <_Heap_Free+0x9c>
  1ae4c6:       89 4c 24 1c             mov    %ecx,0x1c(%esp)
  1ae4ca:       89 da                   mov    %ebx,%edx
  1ae4cc:       8b 4c 24 18             mov    0x18(%esp),%ecx
  1ae4d0:       eb 1a                   jmp    1ae4ec <_Heap_Free+0x1cc>
  1ae4d2:       8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
  1ae4d9:       8d bc 27 00 00 00 00    lea    0x0(%edi,%eiz,1),%edi
  1ae4e0:       81 3a df 3c 09 e7       cmpl   $0xe7093cdf,(%edx)
  1ae4e6:       0f 85 d0 fe ff ff       jne    1ae3bc <_Heap_Free+0x9c>

The place to set the breakpoint is 1ae4e6 which happens to the location the call frame ends up at. I set a breakpoint on that instruction:

(gdb) b *0x1ae4e6
(gdb) c

The debugger continues then hits the breakpoint and I was able to see the current pointer:

(gdb) p current
$3 = (uintptr_t *) 0x216788

I now have the location that is detected as corrupt I can set the watchpoint:

(gdb) awatch *0x216788

It pays to set to set the watchpoint as close to the access that corrupts it. This may take a few runs to find. The watchpoint can be triggered by valid accesses to the location. You need to skip these.