source: rtems-docs/cpu-supplement/sparc_v8_stacks_regwin.rst @ ae05a27

Last change on this file since ae05a27 was ae05a27, checked in by Marçal Comajoan Cara <mcomajoancara@…>, on 11/18/18 at 10:11:04

Improve SPARC Calling Overview Webpage conversion

Fixed tables, typos, redrawn images and converted ASCII art to ditaa
and PNG, and improved the overall format.

This work was part of GCI 2018.

Closes #3567.

.. comment SPDX-License-Identifier: CC-BY-SA-4.0

.. comment Permission granted by the original author (Peter Magnusson) to
.. comment convert this page to Rest and include in the RTEMS Documentation.
.. comment This content is no longer online and only accessible at
.. comment https://web.archive.org/web/20120205014832/https://www.sics.se/~psm/sparcstack.html

Understanding stacks and registers in the SPARC architecture(s)
===============================================================
The SPARC architecture from Sun Microsystems has some "interesting"
characteristics. After having to deal with compiler, interpreter, OS
emulator, and OS porting issues for the SPARC, I decided to gather notes
and documentation in one place. If there are any issues you don't find
addressed by this page, or if you know of any similar Net resources, let
me know. This document is limited to the V8 version of the architecture.

General Structure
-----------------
SPARC has 32 general purpose integer registers visible to the program
at any given time. Of these, 8 registers are ``global`` registers and 24
registers are in a register window. A window consists of three groups
of 8 registers, the ``out``, ``local``, and ``in`` registers. See table 1. A
SPARC implementation can have from 2 to 32 windows, thus varying the number
of registers from 40 to 520. Most implementations have 7 or 8 windows. The
variable number of registers is the principal reason for the SPARC being
"scalable".
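
The register counts quoted above follow from the overlap: every window
contributes 16 registers of its own (8 ``local`` and 8 ``out``), while its
``in`` registers are borrowed from a neighbor. A minimal sketch of that
arithmetic, with ``sparc_total_registers`` being a name made up for this
example:

.. code-block:: c

    #include <assert.h>

    /* Total integer registers for an implementation with nwindows windows:
     * 8 globals plus 16 unshared registers (8 local + 8 out) per window.
     * sparc_total_registers is a hypothetical helper, not an RTEMS API. */
    static unsigned sparc_total_registers(unsigned nwindows)
    {
        assert(nwindows >= 2 && nwindows <= 32); /* range stated in the text */
        return 8u + 16u * nwindows;
    }

For the two extremes this gives 40 and 520 registers, matching the figures
in the paragraph above.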

At any given time, only one window is visible, as determined by the
current window pointer (CWP) which is part of the processor status
register (PSR). This is a five bit value that can be decremented or
incremented by the SAVE and RESTORE instructions, respectively. These
instructions are generally executed on procedure call and return
(respectively). The idea is that the ``in`` registers contain incoming
parameters, the ``local`` registers constitute scratch registers, the ``out``
registers contain outgoing parameters, and the ``global`` registers contain
values that vary little between executions. The register windows overlap
partially, thus the ``out`` registers become renamed by SAVE to become the
``in`` registers of the called procedure. Thus, the memory traffic is reduced
when going up and down the procedure call chain. Since this is a frequent
operation, performance is improved.
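
As a rough illustration of the CWP arithmetic, the sketch below extracts
the five-bit field from a PSR value and computes the window selected by
SAVE and RESTORE. The helper names are invented for this example, and the
field position (the low five bits of the PSR) is taken from the SPARC V8
manual rather than from this page.

.. code-block:: c

    #include <stdint.h>

    #define PSR_CWP_MASK 0x1fu /* CWP: low five bits of the PSR (SPARC V8) */

    /* Hypothetical helper: current window pointer held in a PSR value. */
    static unsigned psr_cwp(uint32_t psr)
    {
        return psr & PSR_CWP_MASK;
    }

    /* SAVE selects window CWP - 1, wrapping around at window 0. */
    static unsigned cwp_after_save(unsigned cwp, unsigned nwindows)
    {
        return (cwp + nwindows - 1u) % nwindows;
    }

    /* RESTORE selects window CWP + 1, wrapping around at the last window. */
    static unsigned cwp_after_restore(unsigned cwp, unsigned nwindows)
    {
        return (cwp + 1u) % nwindows;
    }

The modulo is what makes the windows behave as a ring: with 8 windows, a
SAVE in window 0 lands in window 7.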

(That was the idea, anyway. The drawback is that upon interactions
with the system the registers need to be flushed to the stack,
necessitating a long sequence of writes to memory of data that is
often mostly garbage. Register windows were a bad idea that was caused
by simulation studies that considered only programs in isolation, as
opposed to multitasking workloads, and by considering compilers with
poor optimization. It also caused considerable problems in implementing
high-end SPARC processors such as the SuperSPARC, although more recent
implementations have dealt effectively with the obstacles. Register
windows are now part of the compatibility legacy and not easily removed
from the architecture.)

.. table:: Table 1 - Visible Registers

    +----------------+------------+---------------+
    |   Register     |  Mnemonic  |   Register    |
    |   Group        |            |   Address     |
    +================+============+===============+
    |   ``global``   |  %g0-%g7   | r[0] - r[7]   |
    +----------------+------------+---------------+
    |    ``out``     |  %o0-%o7   | r[8] - r[15]  |
    +----------------+------------+---------------+
    |   ``local``    |  %l0-%l7   | r[16] - r[23] |
    +----------------+------------+---------------+
    |    ``in``      |  %i0-%i7   | r[24] - r[31] |
    +----------------+------------+---------------+


The overlap of the registers is illustrated in figure 1. The figure
shows an implementation with 8 windows, numbered 0 to 7 (labeled w0 to
w7 in the figure). Each window corresponds to 24 registers, 16 of which
are shared with "neighboring" windows. The windows are arranged in a
wrap-around manner, thus window number 0 borders window number 7. The
most common causes of changing the current window, as pointed to by CWP,
are the RESTORE and SAVE instructions, shown in the middle. Less common are
the supervisor RETT instruction (return from trap) and the trap event
(interrupt, exception, or TRAP instruction).
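
One way to picture the overlap is to map each visible register number
r[0] to r[31] onto a flat register file. The layout below is purely
illustrative (real implementations are free to organize the file
differently), and ``phys_reg`` is a hypothetical name. It assumes that the
``in`` registers of window w are the ``out`` registers of window w + 1,
which is what makes a caller's ``out`` registers appear as the callee's
``in`` registers after SAVE decrements CWP.

.. code-block:: c

    #include <assert.h>

    /* Assumed flat layout: 8 globals, then 16 registers (out, local) per
     * window. Returns the index of visible register r in window cwp. */
    static unsigned phys_reg(unsigned r, unsigned cwp, unsigned nwindows)
    {
        assert(r < 32 && cwp < nwindows);
        if (r < 8)   /* %g0-%g7 are shared by all windows */
            return r;
        if (r < 24)  /* %o0-%o7 and %l0-%l7 belong to this window */
            return 8u + cwp * 16u + (r - 8u);
        /* %i0-%i7 are the out registers of window cwp + 1 (wrapping) */
        return 8u + ((cwp + 1u) % nwindows) * 16u + (r - 24u);
    }

With 8 windows, ``phys_reg(8, 3, 8)`` (the caller's %o0 in window 3) and
``phys_reg(24, 2, 8)`` (the callee's %i0 in window 2) name the same
physical register.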

.. figure:: ../images/cpu_supplement/sparcwin.png

    Figure 1 - Windowed Registers

The "WIM" register is also indicated in the top left of Figure 1. The
window invalid mask is a bit map marking invalid windows. It is generally
used as a pointer, i.e. exactly one bit is set in the WIM register indicating
which window is invalid (in the figure it's window 7). Register windows
are generally used to support procedure calls, so they can be viewed
as a cache of the stack contents. The WIM "pointer" indicates how
many procedure calls in a row can be taken without writing out data to
memory. In the figure, the capacity of the register windows is fully
utilized. An additional call will thus exceed capacity, triggering a
window overflow trap. At the other end, a window underflow trap occurs
when the register window "cache" is empty and more data needs to be
fetched from memory.
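
The trap conditions can be sketched as a test of the WIM bit belonging to
the window that SAVE or RESTORE is about to enter. This is a simplified
model with hypothetical function names; it ignores everything else the
processor does when taking a trap.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* True if a SAVE executed in window cwp would enter an invalid window. */
    static bool save_overflows(unsigned cwp, uint32_t wim, unsigned nwindows)
    {
        unsigned next = (cwp + nwindows - 1u) % nwindows;
        return ((wim >> next) & 1u) != 0;
    }

    /* True if a RESTORE executed in window cwp would enter an invalid window. */
    static bool restore_underflows(unsigned cwp, uint32_t wim, unsigned nwindows)
    {
        unsigned next = (cwp + 1u) % nwindows;
        return ((wim >> next) & 1u) != 0;
    }

With window 7 marked invalid (``wim == 0x80``) on an 8-window
implementation, a SAVE in window 0 overflows and a RESTORE in window 6
underflows.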

Register Semantics
------------------

The SPARC Architecture includes recommended software semantics. These are
described in the architecture manual, the SPARC ABI (application binary
interface) standard, and, unfortunately, in various other locations as
well (including header files and compiler documentation).

Figure 2 shows a summary of register contents at any given time.

.. code-block:: c

                 %g0  (r00)       always zero
                 %g1  (r01)  [1]  temporary value
                 %g2  (r02)  [2]  global 2
     global      %g3  (r03)  [2]  global 3
                 %g4  (r04)  [2]  global 4
                 %g5  (r05)       reserved for SPARC ABI
                 %g6  (r06)       reserved for SPARC ABI
                 %g7  (r07)       reserved for SPARC ABI

                 %o0  (r08)  [1]  outgoing parameter 0 / return value from callee
                 %o1  (r09)  [1]  outgoing parameter 1
                 %o2  (r10)  [1]  outgoing parameter 2
     out         %o3  (r11)  [1]  outgoing parameter 3
                 %o4  (r12)  [1]  outgoing parameter 4
                 %o5  (r13)  [1]  outgoing parameter 5
            %sp, %o6  (r14)  [1]  stack pointer
                 %o7  (r15)  [1]  temporary value / address of CALL instruction

                 %l0  (r16)  [3]  local 0
                 %l1  (r17)  [3]  local 1
                 %l2  (r18)  [3]  local 2
     local       %l3  (r19)  [3]  local 3
                 %l4  (r20)  [3]  local 4
                 %l5  (r21)  [3]  local 5
                 %l6  (r22)  [3]  local 6
                 %l7  (r23)  [3]  local 7

                 %i0  (r24)  [3]  incoming parameter 0 / return value to caller
                 %i1  (r25)  [3]  incoming parameter 1
                 %i2  (r26)  [3]  incoming parameter 2
     in          %i3  (r27)  [3]  incoming parameter 3
                 %i4  (r28)  [3]  incoming parameter 4
                 %i5  (r29)  [3]  incoming parameter 5
            %fp, %i6  (r30)  [3]  frame pointer
                 %i7  (r31)  [3]  return address - 8

.. topic:: Items

    [1] assumed by caller to be destroyed (volatile) across a procedure call

    [2] should not be used by SPARC ABI library code

    [3] assumed by caller to be preserved across a procedure call

*Figure 2 - SPARC register semantics*

Particular compilers are likely to vary slightly.

Note that globals %g2-%g4 are reserved for the "application", which
includes libraries and compiler. Thus, for example, libraries may
overwrite these registers unless they've been compiled with suitable
flags. Also, the "reserved" registers are presumed to be allocated
(in the future) bottom-up, i.e. %g7 is currently the "safest" to use.

Optimizing linkers and interpreters are examples that use global registers.

Register Windows and the Stack
------------------------------

The SPARC register windows are, naturally, intimately related to the
stack. In particular, the stack pointer (%sp or %o6) must always point
to a free block of 64 bytes. This area is used by the operating system
(Solaris, SunOS, and Linux at least) to save the current ``local`` and
``in`` registers upon a system interrupt, exception, or trap instruction.
(Note that this can occur at any time.)

Other aspects of register relations with memory are programming
convention. The typical and recommended layout of the stack is shown
in figure 3. The figure shows a stack frame.

.. figure:: ../images/cpu_supplement/stack_frame_contents.png

    Figure 3 - Stack frame contents

Note that the top boxes of figure 3 are addressed via the stack pointer
(%sp), as positive offsets (including zero), and the bottom boxes are
accessed over the frame pointer using negative offsets (excluding zero),
and that the frame pointer is the old stack pointer. This scheme allows
the separation of information known at compile time (number and size
of local parameters, etc.) from run-time information (size of blocks
allocated by ``alloca()``).

"addressable scalar automatics" is a fancy name for local variables.

The clever nature of the stack and frame pointers is that they are always
16 registers apart in the register windows. Thus, a SAVE instruction will
make the current stack pointer into the frame pointer and, since the SAVE
instruction also doubles as an ADD, create a new stack pointer. Figure 4
illustrates what the top of a stack might look like during execution. (The
listing is from the "pwin" command in the SimICS simulator.)

.. figure:: ../images/cpu_supplement/sample_stack_contents.png

    Figure 4 - Sample stack contents

Note how the stack contents are not necessarily synchronized with the
registers. Various events can cause the register windows to be "flushed"
to memory, including most system calls. A programmer can force this
update by using the ST_FLUSH_WINDOWS trap, which also reduces the number of
valid windows to the minimum of 1.

Writing a library for multithreaded execution is an example that requires
explicit flushing, as is ``longjmp()``.

Procedure epilogue and prologue
-------------------------------

The stack frame described in the previous section leads to the standard
entry/exit mechanisms listed in figure 5.

.. code-block:: c

  function:
    save  %sp, -C, %sp

               ! perform function, leave return value,
               ! if any, in register %i0 upon exit

    ret        ! jmpl %i7+8, %g0
    restore    ! restore %g0,%g0,%g0

*Figure 5 - Epilogue/prologue in procedures*

The SAVE instruction decrements the CWP, as discussed earlier, and also
performs an addition. The constant "C" is used in the figure to
indicate the amount of space to make on the stack, and thus corresponds
to the frame contents in Figure 3. The minimum is therefore the 16 words
for the ``local`` and ``in`` registers, i.e. (hex) 0x40 bytes.
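
In practice compilers reserve more than the 64-byte window save area. The
sketch below computes a value for "C" under assumptions that are not
spelled out on this page: the usual SPARC V8 ABI frame with one word for a
structure return pointer, six words in which the callee may store its
register arguments, and a stack kept aligned to 8 bytes. The helper name
``frame_size`` is hypothetical.

.. code-block:: c

    #include <stdint.h>

    enum {
        WINDOW_SAVE_BYTES = 16 * 4, /* 8 local + 8 in registers */
        STRUCT_RET_BYTES  = 4,      /* assumed: hidden struct return pointer */
        ARG_DUMP_BYTES    = 6 * 4,  /* assumed: slots for register arguments */
        FRAME_ALIGN       = 8       /* assumed: double-word stack alignment */
    };

    /* Bytes to subtract from %sp in "save %sp, -C, %sp" for a frame with
     * extra_arg_bytes of outgoing stack arguments and local_bytes of locals. */
    static uint32_t frame_size(uint32_t extra_arg_bytes, uint32_t local_bytes)
    {
        uint32_t size = WINDOW_SAVE_BYTES + STRUCT_RET_BYTES + ARG_DUMP_BYTES
                        + extra_arg_bytes + local_bytes;

        /* round up to the next multiple of FRAME_ALIGN */
        return (size + FRAME_ALIGN - 1u) & ~(uint32_t)(FRAME_ALIGN - 1u);
    }

Under these assumptions a function with no locals and no stack arguments
ends up with ``save %sp, -96, %sp``.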

A confusing element of the SAVE instruction is that the source operands
(the first two parameters) are read from the old register window, and
the destination operand (the rightmost parameter) is written to the new
window. Thus, although "%sp" is indicated as both source and destination,
the result is actually written into the stack pointer of the new window
(the source stack pointer becomes renamed and is now the frame pointer).
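
The operand handling of ``save %sp, -C, %sp`` can be modeled in a few
lines of C. The structure and function names are made up for this sketch,
and only the two registers of interest are tracked.

.. code-block:: c

    #include <stdint.h>

    /* The stack and frame pointer as seen from one register window. */
    struct window_view {
        uint32_t sp; /* %o6 */
        uint32_t fp; /* %i6 */
    };

    /* Model of "save %sp, imm, %sp": the sum is formed from the caller's
     * %sp (old window) and written to %sp of the callee (new window). */
    static struct window_view model_save(struct window_view caller, int32_t imm)
    {
        struct window_view callee;

        callee.fp = caller.sp;                 /* old %o6 is renamed to %i6 */
        callee.sp = caller.sp + (uint32_t)imm; /* result lands in new %o6 */
        return callee;
    }

Starting from a caller with ``sp == 0x1000`` and ``imm == -96``, the callee
sees ``fp == 0x1000`` and ``sp == 0x0fa0``.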

The return instructions are also a bit particular. ``ret`` is a synthetic
instruction, corresponding to ``jmpl`` (jump and link). This instruction
jumps to the address resulting from adding 8 to the %i7 register. The
source instruction address (the address of the ``ret`` instruction itself)
is written to the %g0 register, i.e. it is discarded.

The ``restore`` instruction is similarly a synthetic instruction and is
just a short form for a restore that chooses not to perform an addition.

The calling instruction, in turn, typically looks as follows:

.. code-block:: c

    call <function>    ! jmpl <address>, %o7
    mov 0, %o0

Again, ``call`` can be read as the same ``jmpl`` instruction that performs
the return. (The register-indirect form of ``call`` is a synthetic
instruction for exactly that; a call to a label uses the dedicated CALL
instruction, which has the same effect on %o7.) This time, however, it is
interested in saving the address of the ``call`` instruction itself, into
register %o7. Note that the delay slot is often filled with an instruction
related to the parameters; in this example it sets the first parameter to
zero.

Note also that the return value is generally passed back in %o0, as seen
from the caller.

Leaf procedures are different. A leaf procedure is an optimization that
reduces unnecessary work by taking advantage of the knowledge that no
``call`` instructions exist in many procedures. Thus, the
``save``/``restore`` couple can be eliminated. The downside is that such a
procedure may only use the ``out`` registers (since the ``in`` and ``local``
registers actually belong to the caller). See Figure 6.

.. comment XXX FIX FORMATTING

.. code-block:: c

  function:
               ! no save instruction needed upon entry

               ! perform function, leave return value,
               ! if any, in register %o0 upon exit

    retl       ! jmpl %o7+8, %g0
    nop        ! the delay slot can be used for something else

*Figure 6 - Epilogue/prologue in leaf procedures*

Note in the figure that there is only one instruction overhead, namely the
``retl`` instruction. ``retl`` is also synthetic (return from leaf subroutine),
and is again a variant of the ``jmpl`` instruction, this time with %o7+8 as
target.

Yet another variation of epilogue is caused by tail call elimination,
an optimization supported by some compilers (including Sun's C compiler
but, at the time of writing, not GCC). If the compiler detects that the
value of a called function is returned unchanged by the calling function,
the called function can take over the caller's place on the stack.
Figure 7 contains an example.

.. code-block:: c

      int
        foo(int n)
      {
        if (n == 0)
          return 0;
        else
          return bar(n);
      }

        cmp     %o0,0
        bne     .L1
        or      %g0,%o7,%g1
        retl
        or      %g0,0,%o0
  .L1:  call    bar
        or      %g0,%g1,%o7

*Figure 7 - Example of tail call elimination*

Note that the call instruction overwrites register ``%o7`` with the program
counter. Therefore the above code saves the old value of ``%o7``, and restores
it in the delay slot of the call instruction. If the function call is
register indirect, this twiddling with ``%o7`` can be avoided, but of course
that form of call is slower on modern processors.

The benefit of tail call elimination is to remove an indirection upon
return. It is also needed to reduce register window usage, since otherwise
the ``foo()`` function in Figure 7 would need to allocate a stack frame to
save the program counter.

A special form of tail call elimination is tail recursion elimination,
which detects functions calling themselves, and replaces the call with a
simple branch. Figure 8 contains an example.

.. code-block:: c

        int
          foo(int n)
        {
          if (n == 0)
            return 1;
          else
            return (foo(n - 1));
        }

        cmp     %o0,0
        be      .L1
        or      %g0,%o0,%g1
        subcc   %g1,1,%g1
  .L2:  bne     .L2
        subcc   %g1,1,%g1
  .L1:  retl
        or      %g0,1,%o0

*Figure 8 - Example of tail recursion elimination*
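
Read back into C, the assembly in Figure 8 behaves like the loop below for
non-negative arguments. This is only a hand-written equivalent to make the
transformation visible, and ``foo_loop`` is a name chosen for this sketch.

.. code-block:: c

    /* Loop form of foo() from Figure 8: the self-call has become a
     * backward branch that counts the argument down to zero. */
    static int foo_loop(int n)
    {
        while (n != 0)
            n = n - 1; /* argument of the eliminated call foo(n - 1) */
        return 1;
    }

No stack frame or register window is consumed per iteration, which is the
point of the optimization.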

Needless to say, these optimizations produce code that is difficult
to debug.

Procedures, stacks, and debuggers
---------------------------------

When debugging an application, your debugger will be parsing the binary
and consulting the symbol table to determine procedure entry points. It
will also travel the stack frames "upward" to determine the current
call chain.

When compiling for debugging, compilers will generate additional code
as well as avoid some optimizations in order to allow reconstructing
situations during execution. For example, GCC/GDB makes sure original
parameter values are kept intact somewhere for future parsing of
the procedure call stack. The live ``in`` registers other than %i0 are
not touched. %i0 itself is copied into a free ``local`` register, and its
location is noted in the symbol file. (You can find out where variables
reside by using the "info address" command in GDB.)

Given that much of the semantics relating to stack handling and procedure
call entry/exit code is only recommended, debuggers will sometimes
be fooled. For example, the decision as to whether the current
procedure is a leaf procedure can be incorrect. In this case a spurious
procedure will be inserted between the current procedure and its "real"
parent. Another example is when the application maintains its own implicit
call hierarchy, such as jumping to function pointers. In this case the
debugger can easily become totally confused.

The window overflow and underflow traps
---------------------------------------

When the SAVE instruction decrements the current window pointer (CWP)
so that it coincides with the invalid window in the window invalid mask
(WIM), a window overflow trap occurs. Conversely, when the RESTORE or
RETT instructions increment the CWP to coincide with the invalid window,
a window underflow trap occurs.

Either trap is handled by the operating system. Generally, data is
written out to memory and/or read from memory, and the WIM register
suitably altered.

The code in Figure 9 and Figure 10 below shows bare-bones handlers for
the two traps. The text is directly from the source code, and sort of
works. (As far as I know, these are minimalistic handlers for SPARC
V8.) Note that there is no way to directly access window registers
other than the current one, hence the code does additional save/restore
instructions. It's pretty tricky to understand the code, but Figure 1
should be of help.

.. code-block:: c

        /* a SAVE instruction caused a trap */
  window_overflow:
        /* rotate WIM one bit right, we have 8 windows */
        mov %wim,%l3
        sll %l3,7,%l4
        srl %l3,1,%l3
        or  %l3,%l4,%l3
        and %l3,0xff,%l3

        /* disable WIM traps */
        mov %g0,%wim
        nop; nop; nop

        /* point to correct window */
        save

        /* dump registers to stack */
        std %l0, [%sp +  0]
        std %l2, [%sp +  8]
        std %l4, [%sp + 16]
        std %l6, [%sp + 24]
        std %i0, [%sp + 32]
        std %i2, [%sp + 40]
        std %i4, [%sp + 48]
        std %i6, [%sp + 56]

        /* back to where we should be */
        restore

        /* set new value of window */
        mov %l3,%wim
        nop; nop; nop

        /* go home */
        jmp %l1
        rett %l2

*Figure 9 - window_overflow trap handler*


.. code-block:: c


        /* a RESTORE instruction caused a trap */
  window_underflow:

        /* rotate WIM one bit left, we have 8 windows */
        mov %wim,%l3
        srl %l3,7,%l4
        sll %l3,1,%l3
        or  %l3,%l4,%l3
        and %l3,0xff,%l3

        /* disable WIM traps */
        mov %g0,%wim
        nop; nop; nop

        /* point to correct window */
        restore
        restore

        /* reload registers from stack */
        ldd [%sp +  0], %l0
        ldd [%sp +  8], %l2
        ldd [%sp + 16], %l4
        ldd [%sp + 24], %l6
        ldd [%sp + 32], %i0
        ldd [%sp + 40], %i2
        ldd [%sp + 48], %i4
        ldd [%sp + 56], %i6

        /* back to where we should be */
        save
        save

        /* set new value of window */
        mov %l3,%wim
        nop; nop; nop

        /* go home */
        jmp %l1
        rett %l2

*Figure 10 - window_underflow trap handler*
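
The WIM rotation done by the first five instructions of each handler can
be written in C as follows. This is an illustrative equivalent only, with
the window count fixed at 8 as in the handlers; the macro and function
names are made up for this sketch.

.. code-block:: c

    #include <stdint.h>

    #define NWINDOWS 8u /* assumed, matching the 0xff mask in the handlers */
    #define WIM_MASK ((1u << NWINDOWS) - 1u)

    /* Overflow handler: the invalid window moves one bit to the right. */
    static uint32_t wim_rotate_right(uint32_t wim)
    {
        return ((wim >> 1) | (wim << (NWINDOWS - 1u))) & WIM_MASK;
    }

    /* Underflow handler: the invalid window moves one bit to the left. */
    static uint32_t wim_rotate_left(uint32_t wim)
    {
        return ((wim << 1) | (wim >> (NWINDOWS - 1u))) & WIM_MASK;
    }

The two rotations undo each other, so an overflow followed by the matching
underflow leaves the WIM where it started.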