source: rtems-docs/cpu-supplement/sparc_v8_stacks_regwin.rst @ f067ba3

Last change on this file since f067ba3 was f067ba3, checked in by Joel Sherrill <joel@…>, on 11/09/18 at 22:04:06

Initial addition of SPARC V8 Stack/Register Windowing from Peter Magnusson

This content was originally at https://www.sics.se/~psm/sparcstack.html
and is currently only available via the Wayback Machine. Peter kindly gave
permission to include it with the RTEMS Documentation. This initial commit
is a cut and paste with minimal edits. Edits are needed to ensure proper
formatting and the figures need to be made to look correct in the Rest
formatted output.

.. comment SPDX-License-Identifier: CC-BY-SA-4.0

.. COMMENT: Permission granted by the original author (Peter Magnusson) to
.. COMMENT: convert this page to Rest and include in the RTEMS Documentation.
.. COMMENT: This content is no longer online and only accessible at
.. COMMENT: https://web.archive.org/web/20120205014832/https://www.sics.se/~psm/sparcstack.html

Understanding stacks and registers in the Sparc architecture(s)
===============================================================
The Sparc architecture from Sun Microsystems has some "interesting" characteristics. After having to deal with compiler, interpreter, OS emulator, and OS porting issues for the Sparc, I decided to gather notes and documentation in one place. If there are any issues you don't find addressed by this page, or if you know of any similar Net resources, let me know. This document is limited to the V8 version of the architecture.

General Structure
-----------------
Sparc has 32 general purpose integer registers visible to the program at any given time. Of these, 8 registers are global registers and 24 registers are in a register window. A window consists of three groups of 8 registers, the out, local, and in registers. See table 1. A Sparc implementation can have from 2 to 32 windows, thus varying the number of registers from 40 to 520. Most implementations have 7 or 8 windows. The variable number of registers is the principal reason for the Sparc being "scalable".
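
The register counts quoted above follow from 8 globals plus 16 unique registers per window, since each window shares its in and out groups with its neighbors. A minimal sketch of that arithmetic in C (the function name is made up for illustration):

```c
#include <assert.h>

/* Total general purpose integer registers for a given window count:
 * 8 globals, plus 8 locals and 8 shared in/out registers per window. */
int sparc_total_registers(int nwindows)
{
    /* The architecture allows between 2 and 32 windows. */
    assert(nwindows >= 2 && nwindows <= 32);
    return 8 + 16 * nwindows;
}
```

With the two extremes this gives 40 and 520 registers. A typical 8-window implementation has 136.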

At any given time, only one window is visible, as determined by the current window pointer (CWP) which is part of the processor status register (PSR). This is a five bit value that can be decremented or incremented by the SAVE and RESTORE instructions, respectively. These instructions are generally executed on procedure call and return (respectively). The idea is that the in registers contain incoming parameters, the local registers constitute scratch registers, the out registers contain outgoing parameters, and the global registers contain values that vary little between executions. The register windows overlap partially, thus the out registers become renamed by SAVE to become the in registers of the called procedure. Thus, the memory traffic is reduced when going up and down the procedure call chain. Since this is a frequent operation, performance is improved.

(That was the idea, anyway. The drawback is that upon interactions with the system the registers need to be flushed to the stack, necessitating a long sequence of writes to memory of data that is often mostly garbage. Register windows were a bad idea, the result of simulation studies that considered only programs in isolation, as opposed to multitasking workloads, and that assumed compilers with poor optimization. They also caused considerable problems in implementing high-end Sparc processors such as the SuperSparc, although more recent implementations have dealt effectively with the obstacles. Register windows are now part of the compatibility legacy and not easily removed from the architecture.)

==============  =========  ================
Register Group  Mnemonic   Register Address
==============  =========  ================
global          %g0-%g7    r[0]-r[7]
out             %o0-%o7    r[8]-r[15]
local           %l0-%l7    r[16]-r[23]
in              %i0-%i7    r[24]-r[31]
==============  =========  ================

Table 1 - Visible Registers

The overlap of the registers is illustrated in figure 1. The figure shows an implementation with 8 windows, numbered 0 to 7 (labeled w0 to w7 in the figure). Each window corresponds to 24 registers, 16 of which are shared with "neighboring" windows. The windows are arranged in a wrap-around manner, thus window number 0 borders window number 7. The common cause of changing the current window, as pointed to by CWP, is the RESTORE and SAVE instructions, shown in the middle. Less common is the supervisor RETT instruction (return from trap) and the trap event (interrupt, exception, or TRAP instruction).
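
One way to picture the overlap is to map each (window, register) pair onto a flat array of 16 registers per window. The sketch below assumes 8 windows, as in the figure. The flat layout and the name ``windowed_index`` are illustrative assumptions, not a description of any particular hardware implementation.

```c
#include <assert.h>

#define NWINDOWS 8   /* assumed window count, as in the figure */

/* Map register r[8]..r[31] of a given window to a slot in a flat array
 * of NWINDOWS * 16 windowed registers. The out registers of window w
 * land on the same slots as the in registers of window w - 1, which is
 * the window that SAVE switches to. Globals r[0]..r[7] are not windowed. */
unsigned windowed_index(unsigned window, unsigned r)
{
    assert(window < NWINDOWS && r >= 8 && r <= 31);
    return (window * 16 + (r - 8)) % (NWINDOWS * 16);
}
```

For example, ``windowed_index(3, 8)`` (``%o0`` in window 3) and ``windowed_index(2, 24)`` (``%i0`` in window 2) give the same slot, and the modulo makes window 7 border window 0.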

Figure 1 - Windowed Registers

The "WIM" register is also indicated in the top left of figure 1. The window invalid mask is a bit map marking which windows are invalid. It is generally used as a pointer, i.e. exactly one bit is set in the WIM register indicating which window is invalid (in the figure it's window 7). Register windows are generally used to support procedure calls, so they can be viewed as a cache of the stack contents. The WIM "pointer" indicates how many procedure calls in a row can be taken without writing out data to memory. In the figure, the capacity of the register windows is fully utilized. An additional call will thus exceed capacity, triggering a window overflow trap. At the other end, a window underflow trap occurs when the register window "cache" is empty and more data needs to be fetched from memory.
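
The "cache capacity" reading of WIM can be sketched as a count of how many SAVE instructions can run in a row before one would enter an invalid window. The function below is only an illustration with a made-up name. It models SAVE as a decrement of CWP modulo the number of windows and ignores everything else.

```c
#include <assert.h>
#include <stdint.h>

/* Count the consecutive SAVEs that can execute from window `cwp` before
 * the next one would enter a window whose bit is set in `wim`.
 * Returns `nwindows` if no window is marked invalid. */
unsigned saves_before_overflow(unsigned cwp, uint32_t wim, unsigned nwindows)
{
    unsigned count = 0;

    assert(nwindows >= 2 && nwindows <= 32 && cwp < nwindows);
    while (count < nwindows) {
        cwp = (cwp + nwindows - 1) % nwindows;   /* SAVE decrements CWP */
        if ((wim >> cwp) & 1u)
            break;                               /* would trap here */
        count++;
    }
    return count;
}
```

With window 7 invalid (``wim = 0x80``) and 8 windows, a program running in window 0 has no free windows left, which is the fully utilized situation described above. A program in window 6 has six left.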

Register Semantics
------------------
The Sparc Architecture includes recommended software semantics. These are described in the architecture manual, the Sparc ABI (application binary interface) standard, and, unfortunately, in various other locations as well (including header files and compiler documentation).

Figure 2 shows a summary of register contents at any given time.

                 %g0  (r00)       always zero
                 %g1  (r01)  [1]  temporary value
                 %g2  (r02)  [2]  global 2
     global      %g3  (r03)  [2]  global 3
                 %g4  (r04)  [2]  global 4
                 %g5  (r05)       reserved for SPARC ABI
                 %g6  (r06)       reserved for SPARC ABI
                 %g7  (r07)       reserved for SPARC ABI

                 %o0  (r08)  [1]  outgoing parameter 0 / return value from callee
                 %o1  (r09)  [1]  outgoing parameter 1
                 %o2  (r10)  [1]  outgoing parameter 2
     out         %o3  (r11)  [1]  outgoing parameter 3
                 %o4  (r12)  [1]  outgoing parameter 4
                 %o5  (r13)  [1]  outgoing parameter 5
            %sp, %o6  (r14)  [1]  stack pointer
                 %o7  (r15)  [1]  temporary value / address of CALL instruction

                 %l0  (r16)  [3]  local 0
                 %l1  (r17)  [3]  local 1
                 %l2  (r18)  [3]  local 2
     local       %l3  (r19)  [3]  local 3
                 %l4  (r20)  [3]  local 4
                 %l5  (r21)  [3]  local 5
                 %l6  (r22)  [3]  local 6
                 %l7  (r23)  [3]  local 7

                 %i0  (r24)  [3]  incoming parameter 0 / return value to caller
                 %i1  (r25)  [3]  incoming parameter 1
                 %i2  (r26)  [3]  incoming parameter 2
     in          %i3  (r27)  [3]  incoming parameter 3
                 %i4  (r28)  [3]  incoming parameter 4
                 %i5  (r29)  [3]  incoming parameter 5
            %fp, %i6  (r30)  [3]  frame pointer
                 %i7  (r31)  [3]  return address - 8

Notes:

[1] assumed by caller to be destroyed (volatile) across a procedure call
[2] should not be used by SPARC ABI library code
[3] assumed by caller to be preserved across a procedure call

Figure 2 - Sparc register semantics

Particular compilers are likely to vary slightly.

Note that globals %g2-%g4 are reserved for the "application", which includes libraries and compiler. Thus, for example, libraries may overwrite these registers unless they've been compiled with suitable flags. Also, the "reserved" registers are presumed to be allocated (in the future) bottom-up, i.e. %g7 is currently the "safest" to use.

Optimizing linkers and interpreters are examples of programs that use global registers.

Register Windows and the Stack
------------------------------
The Sparc register windows are, naturally, intimately related to the stack. In particular, the stack pointer (%sp or %o6) must always point to a free block of 64 bytes. This area is used by the operating system (Solaris, SunOS, and Linux at least) to save the current local and in registers upon a system interrupt, exception, or trap instruction. (Note that this can occur at any time.)

Other aspects of register relations with memory are programming convention. The typical, and recommended, layout of the stack is shown in figure 3. The figure shows a stack frame.

                    low addresses

               +-------------------------+
     %sp  -->  | 16 words for storing    |
               | LOCAL and IN registers  |
               +-------------------------+
               |  one-word pointer to    |
               | aggregate return value  |
               +-------------------------+
               |   6 words for callee    |
               |   to store register     |
               |       arguments         |
               +-------------------------+
               |  outgoing parameters    |
               |  past the 6th, if any   |
               +-------------------------+
               |  space, if needed, for  |
               |  compiler temporaries   |
               |   and saved floating-   |
               |    point registers      |
               +-------------------------+

               +-------------------------+
               |    space dynamically    |
               |    allocated via the    |
               |  alloca() library call  |
               +-------------------------+
               |  space, if needed, for  |
               |    automatic arrays,    |
               |    aggregates, and      |
               |   addressable scalar    |
               |       automatics        |
               +-------------------------+
    %fp  -->
                     high addresses

Figure 3 - Stack frame contents

Note that the top boxes of figure 3 are addressed via the stack pointer (%sp), as positive offsets (including zero), and the bottom boxes are accessed via the frame pointer using negative offsets (excluding zero), and that the frame pointer is the old stack pointer. This scheme allows the separation of information known at compile time (number and size of local variables, etc) from run-time information (size of blocks allocated by alloca()).
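
The fixed boxes at the top of figure 3 can be written down as a C structure to make the %sp-relative offsets explicit. This is a sketch only: the structure and field names are invented here, and the rounding helper assumes the ABI's requirement that %sp stays 8-byte aligned, which is not stated in the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed part of a stack frame, at positive offsets from %sp.
 * Names are hypothetical; sizes follow figure 3 (32-bit words). */
struct sparc_v8_frame {
    uint32_t locals[8];    /* %sp +  0: save area for %l0-%l7 */
    uint32_t ins[8];       /* %sp + 32: save area for %i0-%i7 */
    uint32_t struct_ret;   /* %sp + 64: pointer to aggregate return value */
    uint32_t arg_home[6];  /* %sp + 68: callee may store register arguments */
    /* outgoing parameters past the 6th, if any, start at %sp + 92 */
};

/* Round a frame size up to a multiple of 8 bytes
 * (assumed doubleword alignment of %sp). */
unsigned frame_size_aligned(unsigned bytes)
{
    assert(bytes <= 0xfffffff8u);   /* avoid wrap-around in the rounding */
    return (bytes + 7u) & ~7u;
}
```

Under these assumptions the fixed part is 92 bytes, which rounds up to 96 (0x60) bytes for a frame with no extra parameters or locals.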

"addressable scalar automatics" is a fancy name for local variables.

The clever nature of the stack and frame pointers is that they are always 16 registers apart in the register windows. Thus, a SAVE instruction will make the current stack pointer into the frame pointer and, since the SAVE instruction also doubles as an ADD, create a new stack pointer. Figure 4 illustrates what the top of a stack might look like during execution. (The listing is from the "pwin" command in the SimICS simulator.)

                  REGISTER WINDOWS

                 +--+---+----------+
                 |g0|r00|0x00000000| global
                 |g1|r01|0x00000006| registers
                 |g2|r02|0x00091278|
      g0-g7      |g3|r03|0x0008ebd0|
                 |g4|r04|0x00000000|                     (note: 'save' and 'trap' decrements CWP,
                 |g5|r05|0x00000000|                      i.e. moves it up on this diagram. 'restore'
                 |g6|r06|0x00000000|                      and 'rett' increments CWP, i.e. down)
                 |g7|r07|0x00000000|
                 +--+---+----------+
 CWP (2)         |o0|r08|0x00000002|
                 |o1|r09|0x00000000|                            MEMORY
                 |o2|r10|0x00000001|
      o0-o7      |o3|r11|0x00000001|             stack growth
                 |o4|r12|0x000943d0|
                 |o5|r13|0x0008b400|                  ^
                 |sp|r14|0xdffff9a0| ----\           /|\
                 |o7|r15|0x00062abc|     |            |                     addresses
                 +--+---+----------+     |     +--+----------+         virtual     physical
                 |l0|r16|0x00087c00|     \---> |l0|0x00000000|        0xdffff9a0  0x000039a0  top of frame 0
                 |l1|r17|0x00027fd4|           |l1|0x00000000|        0xdffff9a4  0x000039a4
                 |l2|r18|0x00000000|           |l2|0x0009df80|        0xdffff9a8  0x000039a8
      l0-l7      |l3|r19|0x00000000|           |l3|0x00097660|        0xdffff9ac  0x000039ac
                 |l4|r20|0x00000000|           |l4|0x00000014|        0xdffff9b0  0x000039b0
                 |l5|r21|0x00097678|           |l5|0x00000001|        0xdffff9b4  0x000039b4
                 |l6|r22|0x0008b400|           |l6|0x00000004|        0xdffff9b8  0x000039b8
                 |l7|r23|0x0008b800|           |l7|0x0008dd60|        0xdffff9bc  0x000039bc
              +--+--+---+----------+           +--+----------+
 CWP+1 (3)    |o0|i0|r24|0x00000002|           |i0|0x00091048|        0xdffff9c0  0x000039c0
              |o1|i1|r25|0x00000000|           |i1|0x00000011|        0xdffff9c4  0x000039c4
              |o2|i2|r26|0x0008b7c0|           |i2|0x00091158|        0xdffff9c8  0x000039c8
      i0-i7   |o3|i3|r27|0x00000019|           |i3|0x0008d370|        0xdffff9cc  0x000039cc
              |o4|i4|r28|0x0000006c|           |i4|0x0008eac4|        0xdffff9d0  0x000039d0
              |o5|i5|r29|0x00000000|           |i5|0x00000000|        0xdffff9d4  0x000039d4
              |o6|fp|r30|0xdffffa00| ----\     |fp|0x00097660|        0xdffff9d8  0x000039d8
              |o7|i7|r31|0x00040468|     |     |i7|0x00000000|        0xdffff9dc  0x000039dc
              +--+--+---+----------+     |     +--+----------+
                                         |        |0x00000001|        0xdffff9e0  0x000039e0  parameters
                                         |        |0x00000002|        0xdffff9e4  0x000039e4
                                         |        |0x00000040|        0xdffff9e8  0x000039e8
                                         |        |0x00097671|        0xdffff9ec  0x000039ec
                                         |        |0xdffffa68|        0xdffff9f0  0x000039f0
                                         |        |0x00024078|        0xdffff9f4  0x000039f4
                                         |        |0x00000004|        0xdffff9f8  0x000039f8
                                         |        |0x0008dd60|        0xdffff9fc  0x000039fc
              +--+------+----------+     |     +--+----------+
              |l0|      |0x00087c00|     \---> |l0|0x00091048|        0xdffffa00  0x00003a00  top of frame 1
              |l1|      |0x000c8d48|           |l1|0x0000000b|        0xdffffa04  0x00003a04
              |l2|      |0x000007ff|           |l2|0x00091158|        0xdffffa08  0x00003a08
              |l3|      |0x00000400|           |l3|0x000c6f10|        0xdffffa0c  0x00003a0c
              |l4|      |0x00000000|           |l4|0x0008eac4|        0xdffffa10  0x00003a10
              |l5|      |0x00088000|           |l5|0x00000000|        0xdffffa14  0x00003a14
              |l6|      |0x0008d5e0|           |l6|0x000c6f10|        0xdffffa18  0x00003a18
              |l7|      |0x00088000|           |l7|0x0008cd00|        0xdffffa1c  0x00003a1c
              +--+--+---+----------+           +--+----------+
 CWP+2 (4)    |i0|o0|   |0x00000002|           |i0|0x0008cb00|        0xdffffa20  0x00003a20
              |i1|o1|   |0x00000011|           |i1|0x00000003|        0xdffffa24  0x00003a24
              |i2|o2|   |0xffffffff|           |i2|0x00000040|        0xdffffa28  0x00003a28
              |i3|o3|   |0x00000000|           |i3|0x0009766b|        0xdffffa2c  0x00003a2c
              |i4|o4|   |0x00000000|           |i4|0xdffffa68|        0xdffffa30  0x00003a30
              |i5|o5|   |0x00064c00|           |i5|0x000253d8|        0xdffffa34  0x00003a34
              |i6|o6|   |0xdffffa70| ----\     |i6|0xffffffff|        0xdffffa38  0x00003a38
              |i7|o7|   |0x000340e8|     |     |i7|0x00000000|        0xdffffa3c  0x00003a3c
              +--+--+---+----------+     |     +--+----------+
                                         |        |0x00000001|        0xdffffa40  0x00003a40  parameters
                                         |        |0x00000000|        0xdffffa44  0x00003a44
                                         |        |0x00000000|        0xdffffa48  0x00003a48
                                         |        |0x00000000|        0xdffffa4c  0x00003a4c
                                         |        |0x00000000|        0xdffffa50  0x00003a50
                                         |        |0x00000000|        0xdffffa54  0x00003a54
                                         |        |0x00000002|        0xdffffa58  0x00003a58
                                         |        |0x00000002|        0xdffffa5c  0x00003a5c
                                         |        |    .     |
                                         |        |    .     |        .. etc (another 16 bytes)
                                         |        |    .     |

Figure 4 - Sample stack contents

Note how the stack contents are not necessarily synchronized with the registers. Various events can cause the register windows to be "flushed" to memory, including most system calls. A programmer can force this update by using the ST_FLUSH_WINDOWS trap, which also reduces the number of valid windows to the minimum of 1.

Writing a library for multithreaded execution is an example that requires explicit flushing, as is longjmp().

Procedure epilogue and prologue
-------------------------------
The stack frame described in the previous section leads to the standard entry/exit mechanisms listed in figure 5.

  function:
    save  %sp, -C, %sp

               ; perform function, leave return value,
               ; if any, in register %i0 upon exit

    ret        ; jmpl %i7+8, %g0
    restore    ; restore %g0,%g0,%g0

Figure 5 - Epilogue/prologue in procedures

The SAVE instruction decrements the CWP, as discussed earlier, and also performs an addition. The constant "C" in the figure indicates the amount of space to make on the stack, and thus corresponds to the frame contents in Figure 3. The minimum is therefore the 16 words for the LOCAL and IN registers, i.e. (hex) 0x40 bytes.

A confusing element of the SAVE instruction is that the source operands (the first two parameters) are read from the old register window, and the destination operand (the rightmost parameter) is written to the new window. Thus, although "%sp" is indicated as both source and destination, the result is actually written into the stack pointer of the new window (the source stack pointer becomes renamed and is now the frame pointer).
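
The old-window/new-window behavior of SAVE can be illustrated with a toy model of the windowed register file. Everything here is an assumption made for illustration: 8 windows, a flat array of 16 registers per window, no globals, no WIM check, and the names ``window_file``, ``sim_save``, ``sim_read`` and ``sim_write``.

```c
#include <assert.h>
#include <stdint.h>

#define NWINDOWS 8   /* assumed window count */

/* Toy windowed register file: 16 unique registers per window. */
struct window_file {
    uint32_t windowed[NWINDOWS * 16];
    unsigned cwp;
};

/* Slot for r[8]..r[31] as seen from the current window. */
static uint32_t *slot(struct window_file *f, unsigned r)
{
    assert(r >= 8 && r <= 31 && f->cwp < NWINDOWS);
    return &f->windowed[(f->cwp * 16 + (r - 8)) % (NWINDOWS * 16)];
}

uint32_t sim_read(struct window_file *f, unsigned r)
{
    return *slot(f, r);
}

void sim_write(struct window_file *f, unsigned r, uint32_t value)
{
    *slot(f, r) = value;
}

/* Model of "save rs1, simm, rd": the source is read in the old window,
 * CWP is decremented, and the sum is written in the new window. */
void sim_save(struct window_file *f, unsigned rs1, int32_t simm, unsigned rd)
{
    uint32_t sum = sim_read(f, rs1) + (uint32_t)simm;  /* old window */

    f->cwp = (f->cwp + NWINDOWS - 1) % NWINDOWS;       /* switch window */
    sim_write(f, rd, sum);                             /* new window */
}
```

Starting in window 3 with ``%sp`` (r14) set to 0xdffffa00, ``sim_save(&f, 14, -96, 14)`` moves to window 2, where r30 (``%fp``) reads 0xdffffa00 and r14 (``%sp``) reads 0xdffff9a0. These are the values shown for CWP (2) in Figure 4.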

The return instructions are also a bit particular. ret is a synthetic instruction, corresponding to jmpl (jump and link). This instruction jumps to the address resulting from adding 8 to the %i7 register. The source instruction address (the address of the ret instruction itself) is written to the %g0 register, i.e. it is discarded.

The restore instruction is similarly a synthetic instruction, and is just a short form for a restore that chooses not to perform an addition.

The calling instruction, in turn, typically looks as follows:

    call <function>    ; like jmpl <address>, %o7
    mov 0, %o0

The call instruction behaves much like the jmpl that performs the return. A call to a label is a separate hardware instruction with a PC-relative target, and only the register-indirect form of call is a synthetic jmpl. This time, however, the instruction saves its own address, into register %o7, which is why the return adds 8 to skip the call and its delay slot. Note that the delay slot is often filled with an instruction related to the parameters, in this example it sets the first parameter to zero.
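
The "+8" in ret and retl follows from the instruction sizes: the saved value is the address of the call itself, and both the call and its delay slot instruction are 4 bytes long. A tiny sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Address that ret/retl jumps to, given the address of the call
 * instruction as saved in %o7 (seen as %i7 after a save). */
uint32_t return_target(uint32_t call_address)
{
    /* SPARC instructions are word aligned. */
    assert((call_address & 3u) == 0);
    /* Skip the 4-byte call and the 4-byte delay slot instruction. */
    return call_address + 8u;
}
```

For the ``%i7`` value 0x00040468 shown in Figure 4, execution would resume at 0x00040470.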

Note also that, as seen by the caller, the return value is generally passed back in %o0.

Leaf procedures are different. A leaf procedure is an optimization that reduces unnecessary work by taking advantage of the knowledge that no call instructions exist in many procedures. Thus, the save/restore couple can be eliminated. The downside is that such a procedure may only use the out registers (since the in and local registers actually belong to the caller). See Figure 6.

  function:
               ; no save instruction needed upon entry

               ; perform function, leave return value,
               ; if any, in register %o0 upon exit

    retl       ; jmpl %o7+8, %g0
    nop        ; the delay slot can be used for something else

Figure 6 - Epilogue/prologue in leaf procedures

Note in the figure that there is only one instruction overhead, namely the retl instruction. retl (return from leaf subroutine) is also synthetic, and is again a variant of the jmpl instruction, this time with %o7+8 as target.

Yet another variation of epilogue is caused by tail call elimination, an optimization supported by some compilers (including Sun's C compiler but not GCC). If the compiler detects that a call is the last action of the calling function, so that the called function's result is returned unchanged, it can let the called function take over the caller's place on the stack. Figure 7 contains an example.

      int
        foo(int n)
      {
        if (n == 0)
          return 0;
        else
          return bar(n);
      }

        cmp     %o0,0
        bne     .L1
        or      %g0,%o7,%g1
        retl
        or      %g0,0,%o0
  .L1:  call    bar
        or      %g0,%g1,%o7

Figure 7 - Example of tail call elimination

Note that the call instruction overwrites register %o7 with the program counter. Therefore the above code saves the old value of %o7, and restores it in the delay slot of the call instruction. If the function call is register indirect, this twiddling with %o7 can be avoided, but of course that form of call is slower on modern processors.

The benefit of tail call elimination is to remove an indirection upon return. It is also needed to reduce register window usage, since otherwise the foo() function in Figure 7 would need to allocate a stack frame to save the program counter.

A special form of tail call elimination is tail recursion elimination, which detects functions calling themselves, and replaces the call with a simple branch. Figure 8 contains an example.

        int
          foo(int n)
        {
          if (n == 0)
            return 1;
          else
            return (foo(n - 1));
        }

        cmp     %o0,0
        be      .L1
        or      %g0,%o0,%g1
        subcc   %g1,1,%g1
  .L2:  bne     .L2
        subcc   %g1,1,%g1
  .L1:  retl
        or      %g0,1,%o0

Figure 8 - Example of tail recursion elimination
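
Read back into C, the assembly in Figure 8 is a plain counting loop. The sketch below is one way to write that equivalent by hand. The name ``foo_iterative`` and the non-negative precondition are assumptions for illustration, not something the compiler produces.

```c
#include <assert.h>

/* Hand-written loop equivalent of the tail-recursive foo() in Figure 8:
 * the recursive call becomes a branch back to the top of the loop. */
int foo_iterative(int n)
{
    /* Assumed precondition: the recursion only terminates cleanly
     * for non-negative n. */
    assert(n >= 0);
    while (n != 0)
        n = n - 1;   /* corresponds to the subcc/bne loop */
    return 1;
}
```

No stack frame or register window is consumed per iteration, whereas the recursive form would use one window for each level of recursion.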

Needless to say, these optimizations produce code that is difficult to debug.

Procedures, stacks, and debuggers
---------------------------------
When debugging an application, your debugger will be parsing the binary and consulting the symbol table to determine procedure entry points. It will also travel the stack frames "upward" to determine the current call chain.

When compiling for debugging, compilers will generate additional code as well as avoid some optimizations in order to allow reconstructing situations during execution. For example, GCC/GDB makes sure original parameter values are kept intact somewhere for future parsing of the procedure call stack. The live in registers other than %i0 are not touched. %i0 itself is copied into a free local register, and its location is noted in the symbol file. (You can find out where variables reside by using the "info address" command in GDB.)

Given that much of the semantics relating to stack handling and procedure call entry/exit code is only recommended, debuggers will sometimes be fooled. For example, the decision as to whether or not the current procedure is a leaf procedure can be incorrect. In this case a spurious procedure will be inserted between the current procedure and its "real" parent. Another example is when the application maintains its own implicit call hierarchy, such as jumping to function pointers. In this case the debugger can easily become totally confused.

The window overflow and underflow traps
---------------------------------------
When the SAVE instruction decrements the current window pointer (CWP) so that it coincides with the invalid window in the window invalid mask (WIM), a window overflow trap occurs. Conversely, when the RESTORE or RETT instructions increment the CWP to coincide with the invalid window, a window underflow trap occurs.
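
The two trap conditions can be expressed as simple predicates on CWP and WIM. This is a sketch under the assumptions stated in the text only (SAVE decrements, RESTORE and RETT increment, both modulo the number of windows). The function names are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a SAVE executed in window `cwp` would hit an invalid window. */
bool save_overflows(unsigned cwp, uint32_t wim, unsigned nwindows)
{
    unsigned next;

    assert(nwindows >= 2 && nwindows <= 32 && cwp < nwindows);
    next = (cwp + nwindows - 1) % nwindows;
    return ((wim >> next) & 1u) != 0;
}

/* True if a RESTORE (or RETT) in window `cwp` would hit an invalid window. */
bool restore_underflows(unsigned cwp, uint32_t wim, unsigned nwindows)
{
    unsigned next;

    assert(nwindows >= 2 && nwindows <= 32 && cwp < nwindows);
    next = (cwp + 1) % nwindows;
    return ((wim >> next) & 1u) != 0;
}
```

With 8 windows and window 7 invalid (``wim = 0x80``), a SAVE in window 0 overflows and a RESTORE in window 6 underflows.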

Either trap is handled by the operating system. Generally, data is written out to memory and/or read from memory, and the WIM register suitably altered.

The code in Figure 9 and Figure 10 below shows bare-bones handlers for the two traps. The text is directly from the source code, and sort of works. (As far as I know, these are minimalistic handlers for Sparc V8). Note that there is no way to directly access window registers other than the current one, hence the code does additional save/restore instructions. It's pretty tricky to understand the code, but figure 1 should be of help.

        /* a SAVE instruction caused a trap */
window_overflow:
        /* rotate WIM one bit right, we have 8 windows */
        mov %wim,%l3
        sll %l3,7,%l4
        srl %l3,1,%l3
        or  %l3,%l4,%l3
        and %l3,0xff,%l3

        /* disable WIM traps */
        mov %g0,%wim
        nop; nop; nop

        /* point to correct window */
        save

        /* dump registers to stack */
        std %l0, [%sp +  0]
        std %l2, [%sp +  8]
        std %l4, [%sp + 16]
        std %l6, [%sp + 24]
        std %i0, [%sp + 32]
        std %i2, [%sp + 40]
        std %i4, [%sp + 48]
        std %i6, [%sp + 56]

        /* back to where we should be */
        restore

        /* set new value of window */
        mov %l3,%wim
        nop; nop; nop

        /* go home */
        jmp %l1
        rett %l2

Figure 9 - window_overflow trap handler

        /* a RESTORE instruction caused a trap */
window_underflow:

        /* rotate WIM one bit LEFT, we have 8 windows */
        mov %wim,%l3
        srl %l3,7,%l4
        sll %l3,1,%l3
        or  %l3,%l4,%l3
        and %l3,0xff,%l3

        /* disable WIM traps */
        mov %g0,%wim
        nop; nop; nop

        /* point to correct window */
        restore
        restore

        /* reload registers from stack */
        ldd [%sp +  0], %l0
        ldd [%sp +  8], %l2
        ldd [%sp + 16], %l4
        ldd [%sp + 24], %l6
        ldd [%sp + 32], %i0
        ldd [%sp + 40], %i2
        ldd [%sp + 48], %i4
        ldd [%sp + 56], %i6

        /* back to where we should be */
        save
        save

        /* set new value of window */
        mov %l3,%wim
        nop; nop; nop

        /* go home */
        jmp %l1
        rett %l2

Figure 10 - window_underflow trap handler
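
The WIM rotations at the top of both handlers can be checked in isolation. The C sketch below mirrors the sll/srl/or/and sequences for the 8-window case that the handlers assume. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NWINDOWS 8                          /* the handlers assume 8 windows */
#define WIM_MASK ((1u << NWINDOWS) - 1u)    /* 0xff */

/* Overflow handler: rotate WIM one bit right within NWINDOWS bits. */
uint32_t wim_rotate_right(uint32_t wim)
{
    assert((wim & ~WIM_MASK) == 0);
    return ((wim >> 1) | (wim << (NWINDOWS - 1))) & WIM_MASK;
}

/* Underflow handler: rotate WIM one bit left within NWINDOWS bits. */
uint32_t wim_rotate_left(uint32_t wim)
{
    assert((wim & ~WIM_MASK) == 0);
    return ((wim << 1) | (wim >> (NWINDOWS - 1))) & WIM_MASK;
}
```

Starting from window 7 invalid (0x80), an overflow moves the invalid window to 6 (0x40), and a later underflow moves it back. The wrap-around cases are 0x01 to 0x80 going right and 0x80 to 0x01 going left.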