Changeset ae05a27 in rtems-docs


Timestamp: Nov 18, 2018, 10:11:04 AM (5 months ago)
Author: Marçal Comajoan Cara <mcomajoancara@…>
Branches: master
Children: 0a9dd48
Parents: 0bb0b8d
git-author: Marçal Comajoan Cara <mcomajoancara@…> (11/18/18 10:11:04)
git-committer: Joel Sherrill <joel@…> (11/19/18 19:11:55)
Message:

Improve SPARC Calling Overview Webpage conversion

Fixed tables, typos, redrawn images and converted ASCII art to ditaa
and PNG, and improved the overall format.

This work was part of GCI 2018.

Closes #3567.

Files: 5 added, 1 edited

  • cpu-supplement/sparc_v8_stacks_regwin.rst

    r0bb0b8d rae05a27  
    55.. comment This content is no longer online and only accessible at
    66.. comment https://web.archive.org/web/20120205014832/https://www.sics.se/~psm/sparcstack.html
    7 
    8 .. comment XXX Format Tables
    9 .. comment XXX Format Figures (could be code, ascii art, etc.)
    10 .. comment XXX double check against web page
    11 .. comment XXX Fix Figure references in text
    12 .. comment XXX instruction names probably should be marked as code font
    137
    148Understanding stacks and registers in the SPARC architecture(s)
     
    2418-----------------
    2519SPARC has 32 general purpose integer registers visible to the program
    26 at any given time. Of these, 8 registers are global registers and 24
     20at any given time. Of these, 8 registers are ``global`` registers and 24
    2721registers are in a register window. A window consists of three groups
    28 of 8 registers, the out, local, and in registers. See table 1. A SPARC
    29 implementation can have from 2 to 32 windows, thus varying the number
    30 of registers from 40 to 520. Most implentations have 7 or 8 windows. The
     22of 8 registers, the ``out``, ``local``, and ``in`` registers. See table 1. A
     23SPARC implementation can have from 2 to 32 windows, thus varying the number
     24of registers from 40 to 520. Most implementations have 7 or 8 windows. The
    3125variable number of registers is the principal reason for the SPARC being
    3226"scalable".
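
Reviewer note: the register counts quoted in this hunk (40 registers for 2 windows, 520 for 32) follow from 8 shared global registers plus 16 unshared registers per window, since each window's ``in`` registers are a neighbor's ``out`` registers. The function below is only an illustrative sketch of that arithmetic; its name and constants are hypothetical and do not appear in the changed file.

```python
GLOBAL_REGISTERS = 8       # %g0-%g7, shared by every window
UNIQUE_PER_WINDOW = 16     # 8 local + 8 out; the 8 in registers overlap a neighbor's out registers


def total_integer_registers(windows: int) -> int:
    """Return the number of integer registers for a given window count."""
    if not 2 <= windows <= 32:
        raise ValueError("the text states a range of 2 to 32 windows")
    return GLOBAL_REGISTERS + UNIQUE_PER_WINDOW * windows
```

With the typical 8 windows mentioned in the text, this gives 8 + 16 * 8 = 136 registers.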
     
    3731incremented by the SAVE and RESTORE instructions, respectively. These
    3832instructions are generally executed on procedure call and return
    39 (respectively). The idea is that the in registers contain incoming
    40 parameters, the local register constitute scratch registers, the out
    41 registers contain outgoing parameters, and the global registers contain
     33(respectively). The idea is that the ``in`` registers contain incoming
     34parameters, the ``local`` register constitutes scratch registers, the ``out``
     35registers contain outgoing parameters, and the ``global`` registers contain
    4236values that vary little between executions. The register windows overlap
    43 partially, thus the out registers become renamed by SAVE to become the in
    44 registers of the called procedure. Thus, the memory traffic is reduced
     37partially, thus the ``out`` registers become renamed by SAVE to become the
     38``in`` registers of the called procedure. Thus, the memory traffic is reduced
    4539when going up and down the procedure call. Since this is a frequent
    4640operation, performance is improved.
     
    5549high-end SPARC processors such as the SuperSPARC, although more recent
    5650implementations have dealt effectively with the obstacles. Register
    57 windows is now part of the compatibility legacy and not easily removed
     51windows are now part of the compatibility legacy and not easily removed
    5852from the architecture.)
    5953
    60 .. comment XXX FIX FORMATTING
    61 
    62 +------------+------------+---------------+
    63 |  Register  |  Mnemonic  |    Register   |
    64 |  Group     |            |    Address    |
    65 +============+============+===============+
    66 +   global   +  %g0-%g7   + r[0] - r[7]   +
    67 +------------+------------+---------------+
    68 +    out     +  %o0-%o7   + r[8] - r[15]  +
    69 +------------+------------+---------------+
    70 +   local    +  %l0-%l7   + r[16] - r[23] +
    71 +------------+------------+---------------+
    72 +    in      +  %i0-%i7   + r[24] - r[31] +
    73 +------------+------------+---------------+
    74 
    75 Table 1 - Visible Registers
     54.. table:: Table 1 - Visible Registers
     55
     56    +----------------+------------+---------------+
     57    |   Register     |  Mnemonic  |   Register    |
     58    |   Group        |            |   Address     |
     59    +================+============+===============+
     60    +   ``global``   +  %g0-%g7   + r[0] - r[7]   +
     61    +----------------+------------+---------------+
     62    +    ``out``     +  %o0-%o7   + r[8] - r[15]  +
     63    +----------------+------------+---------------+
     64    +   ``local``    +  %l0-%l7   + r[16] - r[23] +
     65    +----------------+------------+---------------+
     66    +    ``in``      +  %i0-%i7   + r[24] - r[31] +
     67    +----------------+------------+---------------+
     68
    7669
    7770The overlap of the registers is illustrated in figure 1. The figure
    7871shows an implementation with 8 windows, numbered 0 to 7 (labeled w0 to
    79 w7 in the figure).. Each window corresponds to 24 registers, 16 of which
     72w7 in the figure). Each window corresponds to 24 registers, 16 of which
    8073are shared with "neighboring" windows. The windows are arranged in a
    8174wrap-around manner, thus window number 0 borders window number 7. The
    8275common cause of changing the current window, as pointed to by CWP, is
    83 the RESTORE and SAVE instuctions, shown in the middle. Less common is
     76the RESTORE and SAVE instructions, shown in the middle. Less common is
    8477the supervisor RETT instruction (return from trap) and the trap event
    8578(interrupt, exception, or TRAP instruction).
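
Reviewer note: the wrap-around overlap described in this hunk can be modeled with modular arithmetic. The sketch below assumes the 8-window implementation of figure 1 and uses a hypothetical slot numbering for the windowed register file (real hardware numbering may differ). It follows the direction given in the text: SAVE decrements CWP and RESTORE increments it.

```python
NWINDOWS = 8                 # assumption: the 8-window implementation shown in figure 1
FILE_SIZE = 16 * NWINDOWS    # windowed registers only, globals excluded


def save(cwp: int) -> int:
    """CWP after a SAVE: decrement, wrapping from window 0 to window 7."""
    return (cwp - 1) % NWINDOWS


def restore(cwp: int) -> int:
    """CWP after a RESTORE: increment, wrapping from window 7 to window 0."""
    return (cwp + 1) % NWINDOWS


def physical_slot(cwp: int, r: int) -> int:
    """Map windowed register r[8]..r[31] of window cwp to a slot in a circular file.

    The slot numbering is illustrative only: out registers occupy the first
    8 slots of a window, locals the next 8, and ins the 8 after that, which
    are also the out slots of window cwp + 1.
    """
    if not 8 <= r <= 31:
        raise ValueError("r[0]..r[7] are global and not windowed")
    return (16 * cwp + (r - 8)) % FILE_SIZE
```

In this model the caller's ``%o0`` (r[8]) and the callee's ``%i0`` (r[24]) resolve to the same slot after a SAVE, which is the renaming the text describes, and the ``in`` registers of window 7 share slots with the ``out`` registers of window 0.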
    8679
    87 .. comment XXX insert graphic from website (redraw if needed)
    88 
    89 Figure 1 - Windowed Registers
     80.. figure:: ../images/cpu_supplement/sparcwin.png
     81
     82    Figure 1 - Windowed Registers
    9083
    9184The "WIM" register is also indicated in the top left of Figure 1. The
     
    112105Figure 2 shows a summary of register contents at any given time.
    113106
    114 .. comment XXX FIX FORMATTING
    115 
    116107.. code-block:: c
    117108
     
    152143                 %i7  (r31)  [3]  return address - 8
    153144
    154 Notes:
    155 
    156 [1] assumed by caller to be destroyed (volatile) across a procedure call
    157 [2] should not be used by SPARC ABI library code
    158 [3] assumed by caller to be preserved across a procedure call
    159 
    160 Figure 2 - SPARC register semantics
     145.. topic:: Items
     146
     147    [1] assumed by caller to be destroyed (volatile) across a procedure call
     148
     149    [2] should not be used by SPARC ABI library code
     150
     151    [3] assumed by caller to be preserved across a procedure call
     152
     153*Figure 2 - SPARC register semantics*
    161154
    162155Particular compilers are likely to vary slightly.
     
    168161(in the future) bottom-up, i.e. %g7 is currently the "safest" to use.
    169162
    170 Optimizing linkers and interpreters are exmples that use global registers.
     163Optimizing linkers and interpreters are examples that use global registers.
    171164
    172165Register Windows and the Stack
     
    176169stack. In particular, the stack pointer (%sp or %o6) must always point
    177170to a free block of 64 bytes. This area is used by the operating system
    178 (Solaris, SunOS, and Linux at least) to save the current local and in
    179 registers upon a system interupt, exception, or trap instruction. (Note
    180 that this can occur at any time.)
     171(Solaris, SunOS, and Linux at least) to save the current ``local`` and
     172``in`` registers upon a system interrupt, exception, or trap instruction.
     173(Note that this can occur at any time.)
    181174
    182175Other aspects of register relations with memory are programming
    183 convention. The typical, and recommended, layout of the stack is shown
     176convention. The typical and recommended layout of the stack is shown
    184177in figure 3. The figure shows a stack frame.
    185178
    186 .. comment XXX FIX FORMATTING
    187 
    188 .. code-block:: c
    189 
    190                     low addresses
    191 
    192                +-------------------------+
    193      %sp  -->  | 16 words for storing    |
    194                | LOCAL and IN registers  |
    195                +-------------------------+
    196                |  one-word pointer to    |
    197                | aggregate return value  |
    198                +-------------------------+
    199                |   6 words for callee    |
    200                |   to store register     |
    201                |       arguments         |
    202                +-------------------------+
    203                |  outgoing parameters    |
    204                |  past the 6th, if any   |
    205                +-------------------------+
    206                |  space, if needed, for  |
    207                |  compiler temporaries   |
    208                |   and saved floating-   |
    209                |    point registers      |
    210                +-------------------------+
    211 
    212                +-------------------------+
    213                |    space dynamically    |
    214                |    allocated via the    |
    215                |  alloca() library call  |
    216                +-------------------------+
    217                |  space, if needed, for  |
    218                |    automatic arrays,    |
    219                |    aggregates, and      |
    220                |   addressable scalar    |
    221                |       automatics        |
    222                +-------------------------+
    223     %fp  -->
    224                      high addresses
    225 
    226 Figure 3 - Stack frame contents
     179.. figure:: ../images/cpu_supplement/stack_frame_contents.png
     180
     181    Figure 3 - Stack frame contents
    227182
    228183Note that the top boxes of figure 3 are addressed via the stack pointer
     
    232187the separation of information known at compile time (number and size
    233188of local parameters, etc) from run-time information (size of blocks
    234 allocated by alloca()).
     189allocated by ``alloca()``).
    235190
    236191"addressable scalar automatics" is a fancy name for local variables.
    237192
    238 The clever nature of the stack and frame pointers are that they are always
     193The clever nature of the stack and frame pointers is that they are always
    23919416 registers apart in the register windows. Thus, a SAVE instruction will
    240195make the current stack pointer into the frame pointer and, since the SAVE
     
    243198listing is from the "pwin" command in the SimICS simulator.)
    244199
    245 .. comment XXX FIX FORMATTING
    246 
    247 .. code-block:: c
    248 
    249                   REGISTER WINDOWS
    250 
    251                  +--+---+----------+
    252                  |g0|r00|0x00000000| global
    253                  |g1|r01|0x00000006| registers
    254                  |g2|r02|0x00091278|
    255       g0-g7      |g3|r03|0x0008ebd0|
    256                  |g4|r04|0x00000000|                     (note: 'save' and 'trap' decrements CWP,
    257                  |g5|r05|0x00000000|                      i.e. moves it up on this diagram. 'restore'
    258                  |g6|r06|0x00000000|                      and 'rett' increments CWP, i.e. down)
    259                  |g7|r07|0x00000000|
    260                  +--+---+----------+
    261  CWP (2)         |o0|r08|0x00000002|
    262                  |o1|r09|0x00000000|                            MEMORY
    263                  |o2|r10|0x00000001|
    264       o0-o7      |o3|r11|0x00000001|             stack growth
    265                  |o4|r12|0x000943d0|
    266                  |o5|r13|0x0008b400|                  ^
    267                  |sp|r14|0xdffff9a0| ----\           /|\
    268                  |o7|r15|0x00062abc|     |            |                     addresses
    269                  +--+---+----------+     |     +--+----------+         virtual     physical
    270                  |l0|r16|0x00087c00|     \---> |l0|0x00000000|        0xdffff9a0  0x000039a0  top of frame 0
    271                  |l1|r17|0x00027fd4|           |l1|0x00000000|        0xdffff9a4  0x000039a4
    272                  |l2|r18|0x00000000|           |l2|0x0009df80|        0xdffff9a8  0x000039a8
    273       l0-l7      |l3|r19|0x00000000|           |l3|0x00097660|        0xdffff9ac  0x000039ac
    274                  |l4|r20|0x00000000|           |l4|0x00000014|        0xdffff9b0  0x000039b0
    275                  |l5|r21|0x00097678|           |l5|0x00000001|        0xdffff9b4  0x000039b4
    276                  |l6|r22|0x0008b400|           |l6|0x00000004|        0xdffff9b8  0x000039b8
    277                  |l7|r23|0x0008b800|           |l7|0x0008dd60|        0xdffff9bc  0x000039bc
    278               +--+--+---+----------+           +--+----------+
    279  CWP+1 (3)    |o0|i0|r24|0x00000002|           |i0|0x00091048|        0xdffff9c0  0x000039c0
    280               |o1|i1|r25|0x00000000|           |i1|0x00000011|        0xdffff9c4  0x000039c4
    281               |o2|i2|r26|0x0008b7c0|           |i2|0x00091158|        0xdffff9c8  0x000039c8
    282       i0-i7   |o3|i3|r27|0x00000019|           |i3|0x0008d370|        0xdffff9cc  0x000039cc
    283               |o4|i4|r28|0x0000006c|           |i4|0x0008eac4|        0xdffff9d0  0x000039d0
    284               |o5|i5|r29|0x00000000|           |i5|0x00000000|        0xdffff9d4  0x000039d4
    285               |o6|fp|r30|0xdffffa00| ----\     |fp|0x00097660|        0xdffff9d8  0x000039d8
    286               |o7|i7|r31|0x00040468|     |     |i7|0x00000000|        0xdffff9dc  0x000039dc
    287               +--+--+---+----------+     |     +--+----------+
    288                                          |        |0x00000001|        0xdffff9e0  0x000039e0  parameters
    289                                          |        |0x00000002|        0xdffff9e4  0x000039e4
    290                                          |        |0x00000040|        0xdffff9e8  0x000039e8
    291                                          |        |0x00097671|        0xdffff9ec  0x000039ec
    292                                          |        |0xdffffa68|        0xdffff9f0  0x000039f0
    293                                          |        |0x00024078|        0xdffff9f4  0x000039f4
    294                                          |        |0x00000004|        0xdffff9f8  0x000039f8
    295                                          |        |0x0008dd60|        0xdffff9fc  0x000039fc
    296               +--+------+----------+     |     +--+----------+
    297               |l0|      |0x00087c00|     \---> |l0|0x00091048|        0xdffffa00  0x00003a00  top of frame 1
    298               |l1|      |0x000c8d48|           |l1|0x0000000b|        0xdffffa04  0x00003a04
    299               |l2|      |0x000007ff|           |l2|0x00091158|        0xdffffa08  0x00003a08
    300               |l3|      |0x00000400|           |l3|0x000c6f10|        0xdffffa0c  0x00003a0c
    301               |l4|      |0x00000000|           |l4|0x0008eac4|        0xdffffa10  0x00003a10
    302               |l5|      |0x00088000|           |l5|0x00000000|        0xdffffa14  0x00003a14
    303               |l6|      |0x0008d5e0|           |l6|0x000c6f10|        0xdffffa18  0x00003a18
    304               |l7|      |0x00088000|           |l7|0x0008cd00|        0xdffffa1c  0x00003a1c
    305               +--+--+---+----------+           +--+----------+
    306  CWP+2 (4)    |i0|o0|   |0x00000002|           |i0|0x0008cb00|        0xdffffa20  0x00003a20
    307               |i1|o1|   |0x00000011|           |i1|0x00000003|        0xdffffa24  0x00003a24
    308               |i2|o2|   |0xffffffff|           |i2|0x00000040|        0xdffffa28  0x00003a28
    309               |i3|o3|   |0x00000000|           |i3|0x0009766b|        0xdffffa2c  0x00003a2c
    310               |i4|o4|   |0x00000000|           |i4|0xdffffa68|        0xdffffa30  0x00003a30
    311               |i5|o5|   |0x00064c00|           |i5|0x000253d8|        0xdffffa34  0x00003a34
    312               |i6|o6|   |0xdffffa70| ----\     |i6|0xffffffff|        0xdffffa38  0x00003a38
    313               |i7|o7|   |0x000340e8|     |     |i7|0x00000000|        0xdffffa3c  0x00003a3c
    314               +--+--+---+----------+     |     +--+----------+
    315                                          |        |0x00000001|        0xdffffa40  0x00003a40  parameters
    316                                          |        |0x00000000|        0xdffffa44  0x00003a44
    317                                          |        |0x00000000|        0xdffffa48  0x00003a48
    318                                          |        |0x00000000|        0xdffffa4c  0x00003a4c
    319                                          |        |0x00000000|        0xdffffa50  0x00003a50
    320                                          |        |0x00000000|        0xdffffa54  0x00003a54
    321                                          |        |0x00000002|        0xdffffa58  0x00003a58
    322                                          |        |0x00000002|        0xdffffa5c  0x00003a5c
    323                                          |        |    .     |
    324                                          |        |    .     |        .. etc (another 16 bytes)
    325                                          |        |    .     |
    326 
    327 Figure 4 - Sample stack contents
     200.. figure:: ../images/cpu_supplement/sample_stack_contents.png
     201
     202    Figure 4 - Sample stack contents
    328203
    329204Note how the stack contents are not necessarily synchronized with the
     
    334209
    335210Writing a library for multithreaded execution is an example that requires
    336 explicit flushing, as is longjmp().
     211explicit flushing, as is ``longjmp()``.
    337212
    338213Procedure epilogue and prologue
     
    342217entry/exit mechanisms listed in figure 5.
    343218
    344 .. comment XXX FIX FORMATTING
    345 
    346219.. code-block:: c
    347220
     
    355228    restore    ; restore %g0,%g0,%g0
    356229
    357 Figure 5 - Epilogue/prologue in procedures
     230*Figure 5 - Epilogue/prologue in procedures*
    358231
    359232The SAVE instruction decrements the CWP, as discussed earlier, and also
     
    366239(the first two parameters) are read from the old register window, and
    367240the destination operand (the rightmost parameter) is written to the new
    368 window. Thus, allthough "%sp" is indicated as both source and destination,
     241window. Thus, although "%sp" is indicated as both source and destination,
    369242the result is actually written into the stack pointer of the new window
    370243(the source stack pointer becomes renamed and is now the frame pointer).
    371244
    372 The return instructions are also a bit particular. ret is a synthetic
    373 instruction, corresponding to jmpl (jump linked). This instruction
     245The return instructions are also a bit particular. ``ret`` is a synthetic
     246instruction, corresponding to ``jmpl`` (jump linked). This instruction
    374247jumps to the address resulting from adding 8 to the %i7 register. The
    375 source instruction address (the address of the ret instruction itself)
     248source instruction address (the address of the ``ret`` instruction itself)
    376249is written to the %g0 register, i.e. it is discarded.
    377250
    378 The restore instruction is similarly a synthetic instruction, and is
    379 just a short form for a restore that choses not to perform an addition.
     251The ``restore`` instruction is similarly a synthetic instruction and is
     252just a short form for a restore that chooses not to perform an addition.
    380253
    381254The calling instruction, in turn, typically looks as follows:
    382 
    383 .. comment XXX FIX FORMATTING
    384255
    385256.. code-block:: c
     
    388259    mov 0, %o0
    389260
    390 Again, the call instruction is synthetic, and is actually the same
     261Again, the ``call`` instruction is synthetic, and is actually the same
    391262instruction that performs the return. This time, however, it is interested
    392263in saving the return address, into register %o7. Note that the delay
     
    398269Leaf procedures are different. A leaf procedure is an optimization that
    399270reduces unnecessary work by taking advantage of the knowledge that no
    400 call instructions exist in many procedures. Thus, the save/restore couple
    401 can be eliminated. The downside is that such a procedure may only use
    402 the out registers (since the in and local registers actually belong to
    403 the caller). See Figure 6.
     271``call`` instructions exist in many procedures. Thus, the
     272``save``/``restore`` couple can be eliminated. The downside is that such a
     273procedure may only use the ``out`` registers (since the ``in`` and ``local``
     274registers actually belong to the caller). See Figure 6.
    404275
    405276.. comment XXX FIX FORMATTING
     
    416287    nop        ; the delay slot can be used for something else
    417288
    418 Figure 6 - Epilogue/prologue in leaf procedures
     289*Figure 6 - Epilogue/prologue in leaf procedures*
    419290
    420291Note in the figure that there is only one instruction overhead, namely the
    421 retl instruction. retl is also synthetic (return from leaf subroutine), is
    422 again a variant of the jmpl instruction, this time with %o7+8 as target.
     292``retl`` instruction. ``retl`` is also synthetic (return from leaf subroutine),
     293is again a variant of the ``jmpl`` instruction, this time with %o7+8 as target.
    423294
    424295Yet another variation of epilogue is caused by tail call elimination,
     
    427298to the calling function, it can replace its place on the stack with the
    428299called function. Figure 7 contains an example.
    429 
    430 .. comment XXX FIX FORMATTING
    431300
    432301.. code-block:: c
     
    449318        or      %g0,%g1,%o7
    450319
    451 Figure 7 - Example of tail call elimination
    452 
    453 Note that the call instruction overwrites register %o7 with the program
    454 counter. Therefore the above code saves the old value of %o7, and restores
    455 it in the delay slot of the call instruction. If the function call is
    456 register indirect, this twiddling with %o7 can be avoided, but of course
     320*Figure 7 - Example of tail call elimination*
     321
     322Note that the call instruction overwrites register ``%o7`` with the program
     323counter. Therefore the above code saves the old value of ``%o7``, and restores
     324it in the delay slot of the call instruction. If the function ``call`` is
     325register indirect, this twiddling with ``%o7`` can be avoided, but of course
    457326that form of call is slower on modern processors.
    458327
    459328The benefit of tail call elimination is to remove an indirection upon
    460329return. It is also needed to reduce register window usage, since otherwise
    461 the foo() function in Figure 7 would need to allocate a stack frame to
     330the ``foo()`` function in Figure 7 would need to allocate a stack frame to
    462331save the program counter.
    463332
     
    465334which detects functions calling themselves, and replaces it with a simple
    466335branch. Figure 8 contains an example.
    467 
    468 .. comment XXX FIX FORMATTING
    469336
    470337.. code-block:: c
     
    488355        or      %g0,1,%o0
    489356
    490 Figure 8 - Example of tail recursion elimination
     357*Figure 8 - Example of tail recursion elimination*
    491358
    492359Needless to say, these optimizations produce code that is difficult
     
    505372situations during execution. For example, GCC/GDB makes sure original
    506373parameter values are kept intact somewhere for future parsing of
    507 the procedure call stack. The live in registers other than %i0 are
    508 not touched. %i0 itself is copied into a free local register, and its
     374the procedure call stack. The live ``in`` registers other than %i0 are
     375not touched. %i0 itself is copied into a free ``local`` register, and its
    509376location is noted in the symbol file. (You can find out where variables
    510377reside by using the "info address" command in GDB.)
     
    512379Given that much of the semantics relating to stack handling and procedure
    513380call entry/exit code is only recommended, debuggers will sometimes
    514 be fooled. For example, the decision as to wether or not the current
     381be fooled. For example, the decision as to whether or not the current
    515382procedure is a leaf one or not can be incorrect. In this case a spurious
    516383procedure will be inserted between the current procedure and it's "real"
     
    539406instructions. It's pretty tricky to understand the code, but figure 1
    540407should be of help.
    541 
    542 .. comment XXX FIX FORMATTING
    543408
    544409.. code-block:: c
     
    581446        rett %l2
    582447
    583 Figure 9 - window_underflow trap handler
     448*Figure 9 - window_underflow trap handler*
    584449
    585450
     
    627492        rett %l2
    628493
    629 Figure 10 - window_underflow trap handler
     494*Figure 10 - window_underflow trap handler*
     495