source: rtems-docs/cpu-supplement/sparc_v8_stacks_regwin.rst

Last change on this file was e52906b, checked in by Sebastian Huber <sebastian.huber@…>, on 01/09/19 at 15:14:06

Simplify SPDX-License-Identifier comment

.. SPDX-License-Identifier: CC-BY-SA-4.0

.. comment Permission granted by the original author (Peter S. Magnusson) to
.. comment convert this page to Rest and include in the RTEMS Documentation.
.. comment This content is no longer online and only accessible at
.. comment https://web.archive.org/web/20120205014832/https://www.sics.se/~psm/sparcstack.html

Stacks and Register Windows
===========================

.. sidebar:: *Credit*

   The contents of this section were originally written by Peter S. Magnusson
   and available through a website which is no longer online. Peter graciously
   granted permission for this information to be included in the RTEMS
   Documentation.

The SPARC architecture from Sun Microsystems has some "interesting"
characteristics. After having to deal with compiler, interpreter, OS
emulator, and OS porting issues for the SPARC, I decided to gather notes
and documentation in one place. If there are any issues you don't find
addressed by this page, or if you know of any similar Net resources, let
me know. This document is limited to the V8 version of the architecture.

General Structure
-----------------

SPARC has 32 general purpose integer registers visible to the program
at any given time. Of these, 8 registers are ``global`` registers and 24
registers are in a register window. A window consists of three groups
of 8 registers, the ``out``, ``local``, and ``in`` registers. See Table 1. A
SPARC implementation can have from 2 to 32 windows, thus varying the number
of registers from 40 to 520. Most implementations have 7 or 8 windows. The
variable number of registers is the principal reason for the SPARC being
"scalable".
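
The register counts quoted above follow from the window overlap: each window
contributes 16 registers of its own (8 ``local`` and 8 ``out``), because its
``in`` registers are the ``out`` registers of a neighboring window. The
following C sketch illustrates the arithmetic; the function name
``sparc_total_registers`` is made up for this example and is not part of any
SPARC or RTEMS API.

.. code-block:: c

    #include <assert.h>

    /* Illustrative helper (hypothetical name): total number of integer
     * registers in an implementation with nwindows register windows. */
    unsigned sparc_total_registers(unsigned nwindows)
    {
        /* SPARC V8 allows between 2 and 32 windows. */
        assert(nwindows >= 2 && nwindows <= 32);

        /* 8 globals, plus 16 non-shared registers per window. */
        return 8 + 16 * nwindows;
    }

With 2 windows this gives 40 registers, with 8 windows 136, and with 32
windows 520.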

At any given time, only one window is visible, as determined by the
current window pointer (CWP) which is part of the processor status
register (PSR). This is a five bit value that can be decremented or
incremented by the ``save`` and ``restore`` instructions, respectively. These
instructions are generally executed on procedure call and return
(respectively). The idea is that the ``in`` registers contain incoming
parameters, the ``local`` registers serve as scratch registers, the ``out``
registers contain outgoing parameters, and the ``global`` registers contain
values that vary little between executions. The register windows overlap
partially, thus the ``out`` registers become renamed by ``save`` to become the
``in`` registers of the called procedure. Thus, the memory traffic is reduced
when going up and down the procedure call chain. Since this is a frequent
operation, performance is improved.
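
One way to picture the overlap is to number the windowed registers as a
single circular file. The C sketch below is only a model under that
assumption: real hardware is free to organize the register file differently,
and the name ``windowed_reg_index`` is invented for this example.

.. code-block:: c

    #include <assert.h>

    /* Model (an assumption, not a hardware description): the windowed
     * registers form one circular file of 16 * nwindows entries, and
     * window w starts at entry 16 * w.  Map visible register r[8]..r[31]
     * of the window selected by cwp to its entry in that file. */
    unsigned windowed_reg_index(unsigned cwp, unsigned r, unsigned nwindows)
    {
        assert(nwindows >= 2 && nwindows <= 32);
        assert(cwp < nwindows);
        assert(r >= 8 && r <= 31);   /* globals are not windowed */

        /* out = entries 0-7, local = 8-15, in = 16-23 of the window;
         * the in entries wrap into the out entries of window cwp + 1. */
        return (16 * cwp + (r - 8)) % (16 * nwindows);
    }

In this model the caller's ``%o0`` (``r[8]``) in window 3 and the callee's
``%i0`` (``r[24]``) in window 2, reached by one ``save``, map to the same
entry, and with 8 windows the ``in`` registers of window 7 wrap around onto
the ``out`` registers of window 0.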

(That was the idea, anyway. The drawback is that upon interactions
with the system the registers need to be flushed to the stack,
necessitating a long sequence of writes to memory of data that is
often mostly garbage. Register windows were a bad idea that was caused
by simulation studies that considered only programs in isolation, as
opposed to multitasking workloads, and by considering compilers with
poor optimization. They also caused considerable problems in implementing
high-end SPARC processors such as the SuperSPARC, although more recent
implementations have dealt effectively with the obstacles. Register
windows are now part of the compatibility legacy and not easily removed
from the architecture.)

.. table:: Table 1 - Visible Registers

    +----------------+-------------------+------------------------+
    |   Register     |      Mnemonic     |        Register        |
    |   Group        |                   |        Address         |
    +================+===================+========================+
    |   ``global``   |  ``%g0``-``%g7``  |  ``r[0]`` - ``r[7]``   |
    +----------------+-------------------+------------------------+
    |    ``out``     |  ``%o0``-``%o7``  |  ``r[8]`` - ``r[15]``  |
    +----------------+-------------------+------------------------+
    |   ``local``    |  ``%l0``-``%l7``  |  ``r[16]`` - ``r[23]`` |
    +----------------+-------------------+------------------------+
    |    ``in``      |  ``%i0``-``%i7``  |  ``r[24]`` - ``r[31]`` |
    +----------------+-------------------+------------------------+


The overlap of the registers is illustrated in figure 1. The figure
shows an implementation with 8 windows, numbered 0 to 7 (labeled w0 to
w7 in the figure). Each window corresponds to 24 registers, 16 of which
are shared with "neighboring" windows. The windows are arranged in a
wrap-around manner, thus window number 0 borders window number 7. The
common cause of changing the current window, as pointed to by CWP, is
the ``restore`` and ``save`` instructions, shown in the middle. Less common is
the supervisor ``rett`` instruction (return from trap) and the trap event
(interrupt, exception, or ``trap`` instruction).

.. figure:: ../images/cpu_supplement/sparcwin.png

    Figure 1 - Windowed Registers

The "WIM" register is also indicated in the top left of Figure 1. The
window invalid mask is a bit map that marks windows as invalid. It is
generally used as a pointer, i.e. exactly one bit is set in the WIM register,
indicating which window is invalid (in the figure it's window 7). Register
windows are generally used to support procedure calls, so they can be viewed
as a cache of the stack contents. The WIM "pointer" indicates how
many procedure calls in a row can be taken without writing out data to
memory. In the figure, the capacity of the register windows is fully
utilized. An additional call will thus exceed capacity, triggering a
window overflow trap. At the other end, a window underflow trap occurs
when the register window "cache" is empty and more data needs to be
fetched from memory.
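
The relationship between CWP and WIM can be sketched in C. This is a
simplified model of the behavior described above, not code from any
operating system; the function names are invented for the example, and
``nwindows`` stands for the number of implemented windows.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* A save moves to window (cwp - 1) modulo nwindows; it traps with a
     * window overflow if that window is marked invalid in WIM. */
    int save_would_overflow(unsigned cwp, uint32_t wim, unsigned nwindows)
    {
        assert(nwindows >= 2 && nwindows <= 32 && cwp < nwindows);
        unsigned target = (cwp + nwindows - 1) % nwindows;
        return (int)((wim >> target) & 1u);
    }

    /* Number of nested calls (save instructions) that can execute
     * before the next one would raise a window overflow trap. */
    unsigned saves_before_overflow(unsigned cwp, uint32_t wim,
                                   unsigned nwindows)
    {
        unsigned count = 0;
        /* The bound guards against an all-zero WIM, which never traps. */
        while (count < nwindows &&
               !save_would_overflow(cwp, wim, nwindows)) {
            cwp = (cwp + nwindows - 1) % nwindows;
            count++;
        }
        return count;
    }

For example, with 8 windows and only window 7 marked invalid, a program
running in window 6 can nest six more calls before overflowing, while a
program running in window 0 overflows on its very next ``save``.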

Register Semantics
------------------

The SPARC Architecture includes recommended software semantics. These are
described in the architecture manual, the SPARC ABI (application binary
interface) standard, and, unfortunately, in various other locations as
well (including header files and compiler documentation).

Figure 2 shows a summary of register contents at any given time.

.. code-block:: c

                 %g0  (r00)       always zero
                 %g1  (r01)  [1]  temporary value
                 %g2  (r02)  [2]  global 2
     global      %g3  (r03)  [2]  global 3
                 %g4  (r04)  [2]  global 4
                 %g5  (r05)       reserved for SPARC ABI
                 %g6  (r06)       reserved for SPARC ABI
                 %g7  (r07)       reserved for SPARC ABI

                 %o0  (r08)  [1]  outgoing parameter 0 / return value from callee
                 %o1  (r09)  [1]  outgoing parameter 1
                 %o2  (r10)  [1]  outgoing parameter 2
     out         %o3  (r11)  [1]  outgoing parameter 3
                 %o4  (r12)  [1]  outgoing parameter 4
                 %o5  (r13)  [1]  outgoing parameter 5
            %sp, %o6  (r14)  [3]  stack pointer
                 %o7  (r15)  [1]  temporary value / address of CALL instruction

                 %l0  (r16)  [3]  local 0
                 %l1  (r17)  [3]  local 1
                 %l2  (r18)  [3]  local 2
     local       %l3  (r19)  [3]  local 3
                 %l4  (r20)  [3]  local 4
                 %l5  (r21)  [3]  local 5
                 %l6  (r22)  [3]  local 6
                 %l7  (r23)  [3]  local 7

                 %i0  (r24)  [3]  incoming parameter 0 / return value to caller
                 %i1  (r25)  [3]  incoming parameter 1
                 %i2  (r26)  [3]  incoming parameter 2
     in          %i3  (r27)  [3]  incoming parameter 3
                 %i4  (r28)  [3]  incoming parameter 4
                 %i5  (r29)  [3]  incoming parameter 5
            %fp, %i6  (r30)  [3]  frame pointer
                 %i7  (r31)  [3]  return address - 8

.. topic:: Items

    [1] assumed by caller to be destroyed (volatile) across a procedure call

    [2] should not be used by SPARC ABI library code

    [3] assumed by caller to be preserved across a procedure call

*Figure 2 - SPARC register semantics*

Particular compilers are likely to vary slightly.

Note that globals ``%g2``-``%g4`` are reserved for the "application", which
includes libraries and compiler. Thus, for example, libraries may
overwrite these registers unless they've been compiled with suitable
flags. Also, the "reserved" registers are presumed to be allocated
(in the future) bottom-up, i.e. ``%g7`` is currently the "safest" to use.

Optimizing linkers and interpreters are examples of programs that use
global registers.

Register Windows and the Stack
------------------------------

The SPARC register windows are, naturally, intimately related to the
stack. In particular, the stack pointer (``%sp`` or ``%o6``) must always point
to a free block of 64 bytes. This area is used by the operating system
(Solaris, SunOS, and Linux at least) to save the current ``local`` and
``in`` registers upon a system interrupt, exception, or ``trap`` instruction.
(Note that this can occur at any time.)
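
The 64 bytes are simply sixteen 32-bit words: eight ``local`` and eight
``in`` registers. A C view of this area might look as follows. The type name
``sparc_window_save_area`` is made up for illustration; the order of the
fields matches the offsets used by the bare-bones trap handlers at the end
of this section.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical layout of the block that %sp must always point to. */
    struct sparc_window_save_area {
        uint32_t local[8];   /* %l0-%l7, byte offsets  0 to 31 */
        uint32_t in[8];      /* %i0-%i7, byte offsets 32 to 63 */
    };

    /* Compile-time checks of the layout assumed above (C11). */
    static_assert(sizeof(struct sparc_window_save_area) == 64,
                  "window save area must be 64 bytes");
    static_assert(offsetof(struct sparc_window_save_area, in) == 32,
                  "in registers start at offset 32");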

Other aspects of register relations with memory are programming
convention. The typical and recommended layout of the stack is shown
in figure 3. The figure shows a stack frame.

.. figure:: ../images/cpu_supplement/stack_frame_contents.png

    Figure 3 - Stack frame contents

Note that the top boxes of figure 3 are addressed via the stack pointer
(``%sp``), as positive offsets (including zero), and the bottom boxes are
accessed over the frame pointer using negative offsets (excluding zero),
and that the frame pointer is the old stack pointer. This scheme allows
the separation of information known at compile time (number and size
of local parameters, etc) from run-time information (size of blocks
allocated by ``alloca()``).

"addressable scalar automatics" is a fancy name for local variables.

The clever nature of the stack and frame pointers is that they are always
16 registers apart in the register windows. Thus, a ``save`` instruction will
make the current stack pointer into the frame pointer and, since the ``save``
instruction also doubles as an ``add``, create a new stack pointer. Figure 4
illustrates what the top of a stack might look like during execution. (The
listing is from the ``pwin`` command in the SimICS simulator.)

.. figure:: ../images/cpu_supplement/sample_stack_contents.png

    Figure 4 - Sample stack contents

Note how the stack contents are not necessarily synchronized with the
registers. Various events can cause the register windows to be "flushed"
to memory, including most system calls. A programmer can force this
update by using the ``ST_FLUSH_WINDOWS`` trap, which also reduces the number
of valid windows to the minimum of 1.

Writing a library for multithreaded execution is an example that requires
explicit flushing, as is ``longjmp()``.

Procedure epilogue and prologue
-------------------------------

The stack frame described in the previous section leads to the standard
entry/exit mechanisms listed in figure 5.

.. code-block:: c

  function:
    save  %sp, -C, %sp

               ; perform function, leave return value,
               ; if any, in register %i0 upon exit

    ret        ; jmpl %i7+8, %g0
    restore    ; restore %g0,%g0,%g0

*Figure 5 - Epilogue/prologue in procedures*

The ``save`` instruction decrements the CWP, as discussed earlier, and also
performs an addition. The constant ``C`` used in the figure indicates the
amount of space to reserve on the stack, and thus corresponds to the frame
contents in Figure 3. The minimum is therefore the 16 words for the
``local`` and ``in`` registers, i.e. (hex) 0x40 bytes.

A confusing element of the ``save`` instruction is that the source operands
(the first two parameters) are read from the old register window, and
the destination operand (the rightmost parameter) is written to the new
window. Thus, although ``%sp`` is indicated as both source and destination,
the result is actually written into the stack pointer of the new window
(the source stack pointer becomes renamed and is now the frame pointer).
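
The two-window behavior of ``save`` can be mimicked in C. The sketch below
is a deliberately tiny model, with invented names, that tracks only the
stack pointer, frame pointer, and CWP; it ignores traps and all other
registers.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical model state: what the current window sees. */
    struct window_view {
        uint32_t sp;     /* %sp (%o6) of the current window */
        uint32_t fp;     /* %fp (%i6) of the current window */
        unsigned cwp;    /* current window pointer */
    };

    /* Model of "save %sp, -C, %sp" with nwindows register windows. */
    void model_save(struct window_view *v, uint32_t c, unsigned nwindows)
    {
        assert(v != NULL && nwindows >= 2 && v->cwp < nwindows);

        uint32_t old_sp = v->sp;   /* source is read in the old window */

        v->cwp = (v->cwp + nwindows - 1) % nwindows;   /* CWP decrements */
        v->fp = old_sp;            /* old %o6 is now visible as %i6 */
        v->sp = old_sp - c;        /* sum is written in the new window */
    }

Starting from a stack pointer of ``0x1000``, a ``save`` with ``C`` equal to
``0x40`` leaves the frame pointer at ``0x1000`` and the new stack pointer at
``0xfc0``.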

The return instructions are also a bit particular. ``ret`` is a synthetic
instruction, corresponding to ``jmpl`` (jump and link). This instruction
jumps to the address resulting from adding 8 to the ``%i7`` register. The
source instruction address (the address of the ``ret`` instruction itself)
is written to the ``%g0`` register, i.e. it is discarded.

The ``restore`` instruction without operands is similarly a synthetic
instruction and is just a short form for a ``restore`` whose addition has no
effect, since its result is written to ``%g0``.

The calling instruction, in turn, typically looks as follows:

.. code-block:: c

    call <function>    ; jmpl <address>, %o7
    mov 0, %o0

Again, when the target is given as a register address, ``call`` is a
synthetic form of the same ``jmpl`` instruction that performs the return;
when the target is a label, the assembler emits the dedicated ``call``
instruction, which has the same effect on ``%o7``. This time, however, the
address of the instruction itself is of interest: it is saved in register
``%o7``, and the callee later returns to that address plus 8. Note that the
delay slot is often filled with an instruction related to the parameters;
in this example it sets the first parameter to zero.

Note also that the return value is generally passed in ``%o0``.
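
The offset of 8 in ``ret`` and ``retl`` follows from the layout of the call
site: SPARC V8 instructions are 4 bytes long, so the ``call`` and its delay
slot together occupy 8 bytes. A trivial C sketch, with an invented function
name:

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* Address at which execution resumes after a call, given the address
     * of the call instruction as saved in %o7 (the callee sees it in %i7
     * after a save). */
    uint32_t return_target(uint32_t call_address)
    {
        assert(call_address % 4 == 0);   /* instructions are word aligned */

        /* Skip the call itself (4 bytes) and its delay slot (4 bytes). */
        return call_address + 8;
    }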

Leaf procedures are different. A leaf procedure is an optimization that
reduces unnecessary work by taking advantage of the knowledge that no
``call`` instructions exist in many procedures. Thus, the
``save``/``restore`` couple can be eliminated. The downside is that such a
procedure may only use the ``out`` registers (since the ``in`` and ``local``
registers actually belong to the caller). See Figure 6.

.. code-block:: c

  function:
               ; no save instruction needed upon entry

               ; perform function, leave return value,
               ; if any, in register %o0 upon exit

    retl       ; jmpl %o7+8, %g0
    nop        ; the delay slot can be used for something else

*Figure 6 - Epilogue/prologue in leaf procedures*

Note in the figure that the only overhead is a single instruction, namely the
``retl`` instruction. ``retl`` is also synthetic (return from leaf subroutine)
and is again a variant of the ``jmpl`` instruction, this time with ``%o7+8``
as target.

Yet another variation of epilogue is caused by tail call elimination,
an optimization supported by some compilers (at the time of writing,
including Sun's C compiler but not GCC). If the compiler detects that the
calling function returns the result of a call without doing any further
work, it can let the called function take over the caller's place on the
stack and return directly to the caller's caller. Figure 7 contains an
example.

.. code-block:: c

      int
        foo(int n)
      {
        if (n == 0)
          return 0;
        else
          return bar(n);
      }

        cmp     %o0,0
        bne     .L1
        or      %g0,%o7,%g1
        retl
        or      %g0,0,%o0
  .L1:  call    bar
        or      %g0,%g1,%o7

*Figure 7 - Example of tail call elimination*

Note that the ``call`` instruction overwrites register ``%o7`` with the program
counter. Therefore the above code saves the old value of ``%o7``, and restores
it in the delay slot of the ``call`` instruction. If the function ``call`` is
register indirect, this twiddling with ``%o7`` can be avoided, but of course
that form of ``call`` is slower on modern processors.

The benefit of tail call elimination is to remove an indirection upon
return. It is also needed to reduce register window usage, since otherwise
the ``foo()`` function in Figure 7 would need to allocate a stack frame to
save the program counter.

A special form of tail call elimination is tail recursion elimination,
which detects functions calling themselves and replaces the call with a
simple branch. Figure 8 contains an example.

.. code-block:: c

        int
          foo(int n)
        {
          if (n == 0)
            return 1;
          else
            return (foo(n - 1));
        }

        cmp     %o0,0
        be      .L1
        or      %g0,%o0,%g1
        subcc   %g1,1,%g1
  .L2:  bne     .L2
        subcc   %g1,1,%g1
  .L1:  retl
        or      %g0,1,%o0

*Figure 8 - Example of tail recursion elimination*
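
In C terms, the transformation in Figure 8 amounts to turning the recursion
into a loop. The sketch below shows both forms side by side; the names
``foo_recursive`` and ``foo_loop`` are invented here, and the loop is only
an approximation of what a compiler might produce.

.. code-block:: c

    #include <assert.h>

    /* The function from Figure 8, written with explicit recursion.
     * Without the optimization, each level uses a new stack frame. */
    int foo_recursive(int n)
    {
        assert(n >= 0);   /* negative n is outside the intended domain */
        if (n == 0)
            return 1;
        else
            return foo_recursive(n - 1);
    }

    /* The same function after tail recursion elimination: the
     * self-call becomes a branch back to the top, reusing one frame. */
    int foo_loop(int n)
    {
        assert(n >= 0);
        while (n != 0)
            n = n - 1;
        return 1;
    }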

Needless to say, these optimizations produce code that is difficult
to debug.

Procedures, stacks, and debuggers
---------------------------------

When debugging an application, your debugger will be parsing the binary
and consulting the symbol table to determine procedure entry points. It
will also travel the stack frames "upward" to determine the current
call chain.
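
Walking the stack "upward" can be sketched in C, assuming that all register
windows have already been flushed to memory and that every frame follows the
recommended layout, with the saved ``%i6`` (frame pointer) at offset 56 and
the saved ``%i7`` at offset 60 from the stack pointer. The ``stack_image``
type and the function names are hypothetical; a real debugger reads target
memory through its own interfaces.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical snapshot of stack memory starting at address base. */
    struct stack_image {
        uint32_t base;
        const uint32_t *words;
        size_t nwords;
    };

    /* Read one aligned 32-bit word; returns 0 if addr is out of range. */
    static int read_word(const struct stack_image *m, uint32_t addr,
                         uint32_t *out)
    {
        if (addr < m->base || (addr - m->base) % 4 != 0)
            return 0;
        size_t idx = (addr - m->base) / 4;
        if (idx >= m->nwords)
            return 0;
        *out = m->words[idx];
        return 1;
    }

    /* Follow saved frame pointers starting at sp and record the saved
     * %i7 (address of the calling instruction) of each frame. */
    size_t walk_frames(const struct stack_image *m, uint32_t sp,
                       uint32_t *callers, size_t max)
    {
        assert(m != NULL && callers != NULL);
        size_t n = 0;
        while (n < max) {
            uint32_t fp, i7;
            if (!read_word(m, sp + 56, &fp) || !read_word(m, sp + 60, &i7))
                break;
            callers[n++] = i7;
            if (fp <= sp)   /* older frames lie at higher addresses */
                break;
            sp = fp;
        }
        return n;
    }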

When compiling for debugging, compilers will generate additional code
as well as avoid some optimizations in order to allow reconstructing
situations during execution. For example, GCC/GDB makes sure original
parameter values are kept intact somewhere for future parsing of
the procedure call stack. The live ``in`` registers other than ``%i0`` are
not touched. ``%i0`` itself is copied into a free ``local`` register, and its
location is noted in the symbol file. (You can find out where variables
reside by using the ``info address`` command in GDB.)

Given that much of the semantics relating to stack handling and procedure
call entry/exit code is only recommended, debuggers will sometimes
be fooled. For example, the decision as to whether or not the current
procedure is a leaf procedure can be incorrect. In this case a spurious
procedure will be inserted between the current procedure and its "real"
parent. Another example is when the application maintains its own implicit
call hierarchy, such as jumping to function pointers. In this case the
debugger can easily become totally confused.

The window overflow and underflow traps
---------------------------------------

When the ``save`` instruction decrements the current window pointer (CWP)
so that it coincides with the invalid window in the window invalid mask
(WIM), a window overflow trap occurs. Conversely, when the ``restore`` or
``rett`` instructions increment the CWP to coincide with the invalid window,
a window underflow trap occurs.

Either trap is handled by the operating system. Generally, data is
written out to memory and/or read from memory, and the WIM register
suitably altered.

Figure 9 and Figure 10 below show bare-bones handlers for
the two traps. The text is directly from the source code, and sort of
works. (As far as I know, these are minimalistic handlers for SPARC
V8.) Note that there is no way to directly access window registers
other than the current one, hence the code does additional ``save``/``restore``
instructions. It's pretty tricky to understand the code, but figure 1
should be of help.

.. code-block:: c

        /* a SAVE instruction caused a trap */
  window_overflow:
        /* rotate WIM one bit right, we have 8 windows */
        mov %wim,%l3
        sll %l3,7,%l4
        srl %l3,1,%l3
        or  %l3,%l4,%l3
        and %l3,0xff,%l3

        /* disable WIM traps */
        mov %g0,%wim
        nop; nop; nop

        /* point to correct window */
        save

        /* dump registers to stack */
        std %l0, [%sp +  0]
        std %l2, [%sp +  8]
        std %l4, [%sp + 16]
        std %l6, [%sp + 24]
        std %i0, [%sp + 32]
        std %i2, [%sp + 40]
        std %i4, [%sp + 48]
        std %i6, [%sp + 56]

        /* back to where we should be */
        restore

        /* set new value of window */
        mov %l3,%wim
        nop; nop; nop

        /* go home */
        jmp %l1
        rett %l2

*Figure 9 - window_overflow trap handler*


.. code-block:: c


        /* a RESTORE instruction caused a trap */
  window_underflow:

        /* rotate WIM one bit LEFT, we have 8 windows */
        mov %wim,%l3
        srl %l3,7,%l4
        sll %l3,1,%l3
        or  %l3,%l4,%l3
        and %l3,0xff,%l3

        /* disable WIM traps */
        mov %g0,%wim
        nop; nop; nop

        /* point to correct window */
        restore
        restore

        /* load registers from stack */
        ldd [%sp +  0], %l0
        ldd [%sp +  8], %l2
        ldd [%sp + 16], %l4
        ldd [%sp + 24], %l6
        ldd [%sp + 32], %i0
        ldd [%sp + 40], %i2
        ldd [%sp + 48], %i4
        ldd [%sp + 56], %i6

        /* back to where we should be */
        save
        save

        /* set new value of window */
        mov %l3,%wim
        nop; nop; nop

        /* go home */
        jmp %l1
        rett %l2

*Figure 10 - window_underflow trap handler*
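
The WIM rotation at the top of both handlers can be written in C for an
arbitrary number of windows. This is a sketch with invented function names;
the handlers above hard-code 8 windows, which corresponds to
``nwindows == 8`` here.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* Mask covering the WIM bits of the implemented windows. */
    static uint32_t wim_mask(unsigned nwindows)
    {
        assert(nwindows >= 2 && nwindows <= 32);
        return nwindows == 32 ? 0xffffffffu : (1u << nwindows) - 1u;
    }

    /* Overflow handler: rotate WIM one bit to the right. */
    uint32_t wim_rotate_right(uint32_t wim, unsigned nwindows)
    {
        uint32_t mask = wim_mask(nwindows);
        wim &= mask;
        return ((wim >> 1) | (wim << (nwindows - 1))) & mask;
    }

    /* Underflow handler: rotate WIM one bit to the left. */
    uint32_t wim_rotate_left(uint32_t wim, unsigned nwindows)
    {
        uint32_t mask = wim_mask(nwindows);
        wim &= mask;
        return ((wim << 1) | (wim >> (nwindows - 1))) & mask;
    }

With 8 windows, rotating ``0x80`` right yields ``0x40``, rotating ``0x01``
right wraps to ``0x80``, and rotating ``0x80`` left wraps to ``0x01``.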