.. COMMENT: source: rtems-docs/cpu-supplement/sparc.rst @ ad2ca17
.. COMMENT: Last change ad2ca17, checked in by Joel Sherrill <joel@…> on 08/16/18 at 23:11:53
.. COMMENT: Commit message: cpu-supplement/sparc.rst: Fix me

.. comment SPDX-License-Identifier: CC-BY-SA-4.0

.. COMMENT: COPYRIGHT (c) 1988-2002.
.. COMMENT: On-Line Applications Research Corporation (OAR).
.. COMMENT: All rights reserved.

SPARC Specific Information
**************************

The Real Time Executive for Multiprocessor Systems (RTEMS) is designed to be
portable across multiple processor architectures.  However, the nature of
real-time systems makes it essential that the application designer understand
certain processor dependent implementation details.  These processor
dependencies include calling convention, board support package issues,
interrupt processing, exact RTEMS memory requirements, performance data, header
files, and the assembly language interface to the executive.

This document discusses the SPARC architecture dependencies in this port of
RTEMS.  This architectural port is for SPARC Versions 7 and 8.  Implementations
for SPARC V9 are in the sparc64 target.

It is highly recommended that the SPARC RTEMS application developer obtain and
become familiar with the documentation for the processor being used as well as
the specification for the revision of the SPARC architecture which corresponds
to that processor.

**SPARC Architecture Documents**

For information on the SPARC architecture, refer to the following documents
available from SPARC International, Inc.  (http://www.sparc.com):

- SPARC Standard Version 7.

- SPARC Standard Version 8.

**ERC32 Specific Information**

The European Space Agency's ERC32 is a microprocessor implementing a
SPARC V7 processor and associated support circuitry for embedded space
applications. The integer and floating-point units (90C601E & 90C602E) are
based on the Cypress 7C601 and 7C602, with additional error-detection and
recovery functions. The memory controller (MEC) implements system support
functions such as address decoding, memory interface, DMA interface, UARTs,
timers, interrupt control, write-protection, memory reconfiguration and
error-detection.  The core is designed to work at 25 MHz, but using
space-qualified memories limits the system frequency to around 15 MHz, resulting
in a performance of 10 MIPS and 2 MFLOPS.

The ERC32 is available from Atmel as the TSC695F.

The RTEMS configuration of GDB enables the SPARC Instruction Simulator (SIS)
which can simulate the ERC32 as well as the follow-up LEON2 and LEON3
microprocessors.

CPU Model Dependent Features
============================

Microprocessors are generally classified into families with a variety of CPU
models or implementations within that family.  Within a processor family, there
is a high level of binary compatibility.  This family may be based on either an
architectural specification or on maintaining compatibility with a popular
processor.  Recent microprocessor families such as the SPARC or PowerPC are
based on an architectural specification which is independent of any particular
CPU model or implementation.  Older families such as the M68xxx and the iX86
evolved as the manufacturer strived to produce higher performance processor
models which maintained binary compatibility with older models.

RTEMS takes advantage of the similarity of the various models within a CPU
family.  Although the models do vary in significant ways, the high level of
compatibility makes it possible to share the bulk of the CPU dependent
executive code across the entire family.

CPU Model Feature Flags
-----------------------

Each processor family supported by RTEMS has a list of features which vary
between CPU models within a family.  For example, the most common model
dependent feature regardless of CPU family is the presence or absence of a
floating point unit or coprocessor.  When defining the list of features present
on a particular CPU model, one simply notes that floating point hardware is or
is not present and defines a single constant appropriately.  Conditional
compilation is utilized to include the appropriate source code for this CPU
model's feature set.  It is important to note that this means that RTEMS is
compiled using the feature set and compilation flags that are optimal for the
CPU model in use.  The alternative would be to generate a binary which would
execute on all family members using only the features which were always
present.

This section presents the set of features which vary across SPARC
implementations and are of importance to RTEMS.  The CPU model feature macros
are defined in the file cpukit/score/cpu/sparc/sparc.h based upon the
particular CPU model defined on the compilation command line.

CPU Model Name
~~~~~~~~~~~~~~

The macro CPU_MODEL_NAME is a string which designates the name of this CPU
model.  For example, for the European Space Agency's ERC32 SPARC model, this
macro is set to the string "erc32".

Floating Point Unit
~~~~~~~~~~~~~~~~~~~

The macro SPARC_HAS_FPU is set to 1 to indicate that this CPU model has a
hardware floating point unit and 0 otherwise.

Bitscan Instruction
~~~~~~~~~~~~~~~~~~~

The macro SPARC_HAS_BITSCAN is set to 1 to indicate that this CPU model has the
bitscan instruction.  For example, this instruction is supported by the Fujitsu
SPARClite family.
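
As an illustration of how such a feature macro drives conditional compilation,
the following sketch selects between a hardware-assisted path and a portable
software loop for finding the most significant set bit.  It is a minimal sketch,
not RTEMS code: the default value of SPARC_HAS_BITSCAN is assumed, the function
name highest_set_bit is hypothetical, and the GCC builtin __builtin_clz merely
stands in for the SPARClite scan instruction.

.. code-block:: c

    #include <stdint.h>

    /* Assumed default for illustration; in RTEMS the real value comes from
     * cpukit/score/cpu/sparc/sparc.h for the selected CPU model. */
    #ifndef SPARC_HAS_BITSCAN
    #define SPARC_HAS_BITSCAN 0
    #endif

    /* Hypothetical helper: index of the most significant set bit, or -1. */
    static int highest_set_bit(uint32_t value)
    {
      if (value == 0)
        return -1;
    #if SPARC_HAS_BITSCAN
      /* Stand-in for the model's bitscan instruction (GCC builtin). */
      return 31 - __builtin_clz(value);
    #else
      /* Portable fallback for models without a bitscan instruction. */
      int bit = 0;
      while (value >>= 1)
        ++bit;
      return bit;
    #endif
    }

Both branches compute the same result; the feature flag only decides at compile
time which one is built, so no run-time test of the CPU model is needed.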

Number of Register Windows
~~~~~~~~~~~~~~~~~~~~~~~~~~

The macro SPARC_NUMBER_OF_REGISTER_WINDOWS is set to indicate the number of
register window sets implemented by this CPU model.  The SPARC architecture
allows for a maximum of thirty-two register window sets although most
implementations only include eight.

Low Power Mode
~~~~~~~~~~~~~~

The macro SPARC_HAS_LOW_POWER_MODE is set to one to indicate that this CPU
model has a low power mode.  If low power is enabled, then there must be a CPU
model specific implementation of the IDLE task in cpukit/score/cpu/sparc/cpu.c.
The low power mode IDLE task should be of the form:

.. code-block:: c

    while ( true ) {
        /* enter the CPU model specific low power mode */
    }

The code required to enter low power mode is CPU model specific.

CPU Model Implementation Notes
------------------------------

The ERC32 is a custom SPARC V7 implementation based on the Cypress 601/602
chipset.  This CPU has a number of on-board peripherals and was developed by
the European Space Agency to target space applications.  RTEMS currently
provides support for the following peripherals:

- UART Channels A and B

- General Purpose Timer

- Real Time Clock

- Watchdog Timer (so it can be disabled)

- Control Register (so powerdown mode can be enabled)

- Memory Control Register

- Interrupt Control

The General Purpose Timer and Real Time Clock Timer provided with the ERC32
share the Timer Control Register.  Because the Timer Control Register is
write-only, we must mirror it in software and ensure that writes to one timer do
not alter the current settings and status of the other timer.  Routines are
provided in erc32.h which promote the view that the two timers are completely
independent.  By exclusively using these routines to access the Timer Control
Register, the application can view the system as having a General Purpose Timer
Control Register and a Real Time Clock Timer Control Register rather than the
single shared value.
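
The shadow-register technique described above can be sketched as follows.  This
is a minimal illustration rather than the erc32.h code: the field masks, the
function names, and the use of an ordinary variable in place of the
memory-mapped Timer Control Register are all hypothetical, chosen so the
example is self-contained.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical field layout: the real ERC32 MEC bit assignments differ. */
    #define GPT_FIELD_MASK 0x000000FFu   /* General Purpose Timer bits */
    #define RTC_FIELD_MASK 0x0000FF00u   /* Real Time Clock Timer bits */

    /* Stand-in for the write-only memory-mapped register. */
    static volatile uint32_t timer_control_register;

    /* Software mirror: the only record of what was last written. */
    static uint32_t timer_control_mirror;

    static void write_timer_control(uint32_t value)
    {
      timer_control_mirror = value;      /* remember the new value */
      timer_control_register = value;    /* then write the hardware */
    }

    /* Change only the GPT field; the RTC field is taken from the mirror. */
    static void set_gpt_control(uint32_t gpt_bits)
    {
      uint32_t value = (timer_control_mirror & ~GPT_FIELD_MASK)
                     | (gpt_bits & GPT_FIELD_MASK);
      write_timer_control(value);
    }

    /* Change only the RTC field; the GPT field is taken from the mirror. */
    static void set_rtc_control(uint32_t rtc_bits)
    {
      uint32_t value = (timer_control_mirror & ~RTC_FIELD_MASK)
                     | ((rtc_bits << 8) & RTC_FIELD_MASK);
      write_timer_control(value);
    }

If both timers can be reprogrammed from different execution contexts, the
read-modify-write of the mirror would itself need protection, for example by
disabling interrupts around it.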

The RTEMS Idle thread takes advantage of the low power mode provided by the
ERC32.  Low power mode is entered during idle loops and is enabled at
initialization time.

Calling Conventions
===================

Each high-level language compiler generates subroutine entry and exit code
based upon a set of rules known as the application binary interface (ABI)
calling convention.  These rules address the following issues:

- register preservation and usage

- parameter passing

- call and return mechanism

An ABI calling convention is of importance when interfacing to subroutines
written in another language, either assembly or high-level.  It also determines
the set of registers to be saved or restored during a context switch and
interrupt processing.

The ABI relevant for RTEMS on SPARC is defined by SYSTEM V APPLICATION BINARY
INTERFACE, SPARC Processor Supplement, Third Edition.

Programming Model
-----------------

This section discusses the programming model for the SPARC architecture.

Non-Floating Point Registers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The SPARC architecture defines thirty-two non-floating point registers directly
visible to the programmer.  These are divided into four sets:

- input registers

- local registers

- output registers

- global registers

Each register is referred to by either two or three names in the SPARC
reference manuals.  First, the registers are referred to as r0 through r31 or
with the alternate notation r[0] through r[31].  Second, each register is a
member of one of the four sets listed above.  Finally, some registers have an
architecturally defined role in the programming model which provides an
alternate name.  The following table describes the mapping between the 32
registers and the register sets:

================ ================ ===================
Register Number  Register Names   Description
================ ================ ===================
0 - 7            g0 - g7          Global Registers
8 - 15           o0 - o7          Output Registers
16 - 23          l0 - l7          Local Registers
24 - 31          i0 - i7          Input Registers
================ ================ ===================
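
The mapping in the table is regular: each set holds eight consecutive
registers, so the set is selected by the register number divided by eight and
the position within the set is the remainder.  The following sketch, with
hypothetical helper names, shows that arithmetic.

.. code-block:: c

    /* Hypothetical helpers mapping r0..r31 to the g/o/l/i register names. */
    static char register_set(int r)
    {
      static const char sets[4] = { 'g', 'o', 'l', 'i' };
      return (r >= 0 && r < 32) ? sets[r / 8] : '?';
    }

    static int register_index(int r)
    {
      return r % 8;      /* position within the set of eight */
    }

For example, r14 maps to o6 (the stack pointer) and r30 maps to i6 (the frame
pointer).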

As mentioned above, some of the registers serve defined roles in the
programming model.  The following table describes the role of each of these
registers:

============== ================ ==================================
Register Name  Alternate Name   Description
============== ================ ==================================
g0             na               reads return 0, writes are ignored
o6             sp               stack pointer
i6             fp               frame pointer
i7             na               return address
============== ================ ==================================

The registers g2 through g4 are reserved for applications.  GCC uses them as
volatile registers by default, so they are treated as volatile registers in
RTEMS as well.

The register g6 is reserved for the operating system and contains the address
of the per-CPU control block of the current processor.  This register is
initialized during system start and then remains unchanged.  It is not
saved/restored by the context switch or interrupt processing code.

The register g7 is reserved for the operating system and contains the thread
pointer used for thread-local storage (TLS) as mandated by the SPARC ABI.

Floating Point Registers
~~~~~~~~~~~~~~~~~~~~~~~~

The SPARC V7 architecture includes thirty-two 32-bit floating point registers.
These registers may be viewed as follows:

- 32 single precision floating point or integer registers (f0, f1, ... f31)

- 16 double precision floating point registers (f0, f2, f4, ... f30)

- 8 extended precision floating point registers (f0, f4, f8, ... f28)
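
The three views constrain which register numbers may name an operand: a double
precision value must start at an even register and an extended precision value
at a multiple of four.  A small sketch of that rule, using a hypothetical
helper name:

.. code-block:: c

    #include <stdbool.h>

    /* Hypothetical check: can f<reg> be the first register of an operand that
     * occupies 'words' 32-bit registers (1 = single, 2 = double,
     * 4 = extended)? */
    static bool fp_register_valid(int reg, int words)
    {
      if (words != 1 && words != 2 && words != 4)
        return false;
      if (reg < 0 || reg + words > 32)
        return false;
      return reg % words == 0;   /* even for doubles, multiple of 4 for extended */
    }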

The floating point status register (FSR) specifies the behavior of the floating
point unit for rounding, contains its condition codes, version specification,
and trap information.

According to the ABI, all floating point registers and the floating point
status register (FSR) are volatile.  Thus the floating point context of a
thread is the empty set.  The rounding direction is a system global state and
must not be modified by threads.

A queue of the floating point instructions which have started execution but not
yet completed is maintained.  This queue is needed to support the multi-cycle
nature of floating point operations and to aid floating point exception trap
handlers.  Once a floating point exception has been encountered, the queue is
frozen until it is emptied by the trap handler.  The floating point queue is
loaded by launching instructions.  It is emptied normally when the floating
point unit completes all outstanding instructions and by floating point
exception handlers with the store double floating point queue (stdfq)
instruction.

Special Registers
~~~~~~~~~~~~~~~~~

The SPARC architecture includes two special registers which are critical to the
programming model: the Processor State Register (psr) and the Window Invalid
Mask (wim).  The psr contains the condition codes, processor interrupt level,
trap enable bit, supervisor mode and previous supervisor mode bits, version
information, floating point unit and coprocessor enable bits, and the current
window pointer (cwp).  The cwp field of the psr and the wim register are used
to manage the register windows in the SPARC architecture.  The register windows
are discussed in more detail below.
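
The psr fields listed above sit at fixed bit positions defined by the SPARC
architecture manuals: impl in bits 31:28, ver in 27:24, icc in 23:20, EC in
bit 13, EF in bit 12, PIL in 11:8, S in bit 7, PS in bit 6, ET in bit 5, and cwp
in 4:0.  The accessor macros below are a sketch with hypothetical names, not the
definitions RTEMS uses.

.. code-block:: c

    /* Hypothetical accessor macros for the SPARC psr fields. */
    #define PSR_IMPL(psr) (((psr) >> 28) & 0xFu)  /* implementation       */
    #define PSR_VER(psr)  (((psr) >> 24) & 0xFu)  /* version              */
    #define PSR_ICC(psr)  (((psr) >> 20) & 0xFu)  /* condition codes NZVC */
    #define PSR_EC(psr)   (((psr) >> 13) & 0x1u)  /* coprocessor enable   */
    #define PSR_EF(psr)   (((psr) >> 12) & 0x1u)  /* FPU enable           */
    #define PSR_PIL(psr)  (((psr) >> 8) & 0xFu)   /* interrupt level      */
    #define PSR_S(psr)    (((psr) >> 7) & 0x1u)   /* supervisor mode      */
    #define PSR_PS(psr)   (((psr) >> 6) & 0x1u)   /* previous supervisor  */
    #define PSR_ET(psr)   (((psr) >> 5) & 0x1u)   /* traps enabled        */
    #define PSR_CWP(psr)  ((psr) & 0x1Fu)         /* current window ptr   */

For example, a psr value of 0x1FE3 decodes to cwp 3, traps enabled, supervisor
and previous supervisor mode set, PIL 15, and the FPU enabled.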

Register Windows
----------------

The SPARC architecture includes the concept of register windows.  An overly
simplistic way to think of these windows is to imagine them as being an
infinite supply of "fresh" register sets available for each subroutine to use.
In reality, they are much more complicated.

The save instruction is used to obtain a new register window.  This instruction
decrements the current window pointer, thus providing a new set of registers
for use.  This register set includes eight fresh local registers for use
exclusively by this subroutine.  When done with a register set, the restore
instruction increments the current window pointer and the previous register set
is once again available.

The two primary issues complicating the use of register windows are that (1)
the set of register windows is finite, and (2) some registers are shared
between adjacent register windows.

Because the set of register windows is finite, it is possible to execute enough
save instructions without corresponding restores to consume all of the
register windows.  This is easily accomplished in a high level language because
each subroutine typically performs a save instruction upon entry.  Thus having
a subroutine call depth greater than the number of register windows will result
in a window overflow condition.  The window overflow condition generates a trap
which must be handled in software.  The window overflow trap handler is
responsible for saving the contents of the oldest register window on the
program stack.

Similarly, the subroutines will eventually complete and begin to perform
restores.  If the restore results in the need for a register window which has
previously been written to memory as part of an overflow, then a window
underflow condition results.  Just like the window overflow, the window
underflow condition must be handled in software by a trap handler.  The window
underflow trap handler is responsible for reloading the contents of the
register window requested by the restore instruction from the program stack.

The Window Invalid Mask (wim) and the Current Window Pointer (cwp) field in the
psr are used in conjunction to manage the finite set of register windows and
detect the window overflow and underflow conditions.  The cwp contains the
index of the register window currently in use.  The save instruction decrements
the cwp modulo the number of register windows.  Similarly, the restore
instruction increments the cwp modulo the number of register windows.  Each bit
in the wim represents whether a register window contains valid information.  A
value of 0 indicates the register window is valid and 1 indicates it is
invalid.  When a save instruction causes the cwp to point to a register window
which is marked as invalid, a window overflow condition results.  Conversely,
when a restore instruction causes the cwp to point to an invalid register
window, a window underflow condition results.

Other than the assumption that a register window is always available for trap
(i.e. interrupt) handlers, the SPARC architecture places no limits on the
number of register windows simultaneously marked as invalid (i.e. number of
bits set in the wim).  However, RTEMS assumes that only one register window is
marked invalid at a time (i.e. only one bit set in the wim).  This makes the
maximum possible number of register windows available to the user while still
meeting the requirement that window overflow and underflow conditions can be
detected.
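
The interaction of cwp and wim can be modeled in a few lines of C.  The sketch
below is a simulation, not trap handler code: it assumes the single invalid
window convention described above, and all names are hypothetical.  On
overflow it models the handler spilling the oldest window and moving the
invalid bit one position toward lower window numbers; on underflow it models
the reverse.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical model of the cwp/wim bookkeeping (one invalid window). */
    struct window_state {
      unsigned nwindows;   /* number of register windows, e.g. 8 */
      unsigned cwp;        /* current window pointer */
      uint32_t wim;        /* window invalid mask, exactly one bit set */
      unsigned overflows;  /* overflow traps taken */
      unsigned underflows; /* underflow traps taken */
    };

    static uint32_t wim_mask(const struct window_state *s)
    {
      return (s->nwindows == 32) ? 0xFFFFFFFFu : ((1u << s->nwindows) - 1u);
    }

    /* save: cwp decrements; entering the invalid window means overflow. */
    static void model_save(struct window_state *s)
    {
      unsigned next = (s->cwp + s->nwindows - 1) % s->nwindows;
      if (s->wim & (1u << next)) {
        /* Handler spills the oldest window (next - 1) to the stack and
         * marks it invalid: rotate the wim right by one. */
        s->wim = ((s->wim >> 1) | (s->wim << (s->nwindows - 1))) & wim_mask(s);
        s->overflows++;
      }
      s->cwp = next;
    }

    /* restore: cwp increments; entering the invalid window means underflow. */
    static void model_restore(struct window_state *s)
    {
      unsigned next = (s->cwp + 1) % s->nwindows;
      if (s->wim & (1u << next)) {
        /* Handler reloads window next from the stack and marks next + 1
         * invalid: rotate the wim left by one. */
        s->wim = ((s->wim << 1) | (s->wim >> (s->nwindows - 1))) & wim_mask(s);
        s->underflows++;
      }
      s->cwp = next;
    }

Starting from cwp 0 with window 1 marked invalid in an eight window
configuration, six saves succeed and the seventh overflows, so at most seven
windows hold live frames at any time.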

The window overflow and window underflow trap handlers are a critical part of
the run-time environment for a SPARC application.  The SPARC architectural
specification allows the number of register windows to be any number from 2 to
32.  The most common choice for SPARC implementations appears to be 8 register
windows.  This results in the cwp ranging in value from 0 to 7 on most
implementations.

The second complicating factor is the sharing of registers between adjacent
register windows.  While each register window has its own set of local
registers, the input and output registers are shared between adjacent windows.
The output registers for register window N are the same as the input registers
for register window ((N - 1) modulo RW) where RW is the number of register
windows.  An alternative way to think of this is to remember how parameters are
passed to a subroutine on the SPARC.  The caller loads values into what are its
output registers.  Then after the callee executes a save instruction, those
parameters are available in its input registers.  This is a very efficient way
to pass parameters as no data is actually moved by the save or restore
instructions.
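
One way to see the overlap is to map each windowed register onto a circular
physical register file of 16 times RW registers.  The layout below is a
hypothetical one chosen for illustration (real implementations may number
their register files differently), but it reproduces the rule that the outputs
of window N are the inputs of window ((N - 1) modulo RW).

.. code-block:: c

    /* Hypothetical physical layout: window w owns 16 registers starting at
     * 16 * w (outputs first, then locals); its inputs are the outputs of the
     * next higher window, modulo the number of windows. */
    enum reg_kind { OUT_REG, LOCAL_REG, IN_REG };

    static unsigned physical_register(unsigned nwindows, unsigned window,
                                      enum reg_kind kind, unsigned k)
    {
      switch (kind) {
      case OUT_REG:
        return 16 * window + k;
      case LOCAL_REG:
        return 16 * window + 8 + k;
      default: /* IN_REG */
        return 16 * ((window + 1) % nwindows) + k;
      }
    }

Because save only changes which slice of the register file is named, the callee
sees the caller's o0 as its own i0 without any copying, while each window keeps
private locals.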

Call and Return Mechanism
-------------------------

The SPARC architecture supports a simple yet effective call and return
mechanism.  A subroutine is invoked via the call instruction.  This instruction
places the return address (the address of the call instruction itself) in the
caller's output register 7 (o7).  After the callee executes a save instruction,
this value is available in input register 7 (i7) until the corresponding
restore instruction is executed.

The callee returns to the caller via a jmpl to the return address plus eight,
which skips the call instruction and its delay slot.  There is a delay slot
following this instruction which is commonly used to execute a restore
instruction, if a register window was allocated by this subroutine.

It is important to note that the SPARC subroutine call and return mechanism
does not automatically save and restore any registers.  This is accomplished
via the save and restore instructions which manage the set of register
windows.

If a floating-point unit is supported, then floating-point return values
appear in the floating-point registers.  Single-precision values occupy %f0;
double-precision values occupy %f0 and %f1.  Otherwise, these are scratch
registers.  Consequently, the hardware and software floating-point ABIs are
incompatible.

Calling Mechanism
-----------------

All RTEMS directives are invoked using the regular SPARC calling convention via
the call instruction.

Register Usage
--------------

As discussed above, the call instruction does not automatically save any
registers.  The save and restore instructions are used to allocate and
deallocate register windows.  When a register window is allocated, the new set
of local registers is available for the exclusive use of the subroutine which
allocated this register set.

Parameter Passing
-----------------

RTEMS assumes that arguments are placed in the caller's output registers with
the first argument in output register 0 (o0), the second argument in output
register 1 (o1), and so forth.  Until the callee executes a save instruction,
the parameters are still visible in the output registers.  After the callee
executes a save instruction, the parameters are visible in the corresponding
input registers.  The following SPARC assembly fragment illustrates the typical
sequence used to call an RTEMS directive with three (3) arguments:

.. code-block:: none

    mov     %l2, %o2        ! third argument (assumed to be in l2) into o2
    mov     %l1, %o1        ! second argument (assumed to be in l1) into o1
    mov     %l0, %o0        ! first argument (assumed to be in l0) into o0
    call    directive       ! "directive" is a placeholder for the RTEMS directive
     nop                    ! branch delay slot

User-Provided Routines
----------------------

All user-provided routines invoked by RTEMS, such as user extensions, device
drivers, and MPCI routines, must also adhere to these calling conventions.

Memory Model
============

A processor may support any combination of memory models ranging from pure
physical addressing to complex demand-paged virtual memory systems.  RTEMS
supports a flat memory model which ranges contiguously over the processor's
allowable address space.  RTEMS does not support segmentation or virtual memory
of any kind.  The appropriate memory model for RTEMS provided by the targeted
processor and related characteristics of that model are described in this
chapter.

Flat Memory Model
-----------------

The SPARC architecture supports a flat 32-bit address space with addresses
ranging from 0x00000000 to 0xFFFFFFFF (4 gigabytes).  Each address is
represented by a 32-bit value and is byte addressable.  The address may be used
to reference a single byte, half-word (2 bytes), word (4 bytes), or doubleword
(8 bytes).  Memory accesses within this address space are performed in big
endian fashion by the SPARC.  Memory accesses which are not properly aligned
generate a "memory address not aligned" trap (type number 7).  The following
table lists the alignment requirements for a variety of data accesses:

==============  ======================
Data Type       Alignment (bytes)
==============  ======================
byte            1
half-word       2
word            4
doubleword      8
==============  ======================
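
Because every alignment requirement in the table equals the access size, and
every size is a power of two, an access is aligned exactly when the low-order
bits of its address are zero.  A minimal sketch of that check (the function
name is hypothetical):

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical helper: would an access of 'size' bytes (1, 2, 4 or 8) at
     * 'address' satisfy the SPARC alignment rules, i.e. avoid trap type 7? */
    static bool sparc_access_aligned(uint32_t address, uint32_t size)
    {
      return (address & (size - 1u)) == 0;
    }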

Doubleword load and store operations must use a pair of registers as their
source or destination.  This pair of registers must be an adjacent pair of
registers with the first of the pair being even-numbered.  For example, a valid
destination for a doubleword load might be input registers 0 and 1 (i0 and i1).
The pair i1 and i2 would be invalid.  \[NOTE: Some assemblers for the SPARC do
not generate an error if an odd-numbered register is specified as the beginning
register of the pair.  In this case, the assembler assumes that what the
programmer meant was to use the even-odd pair which ends at the specified
register.  This may or may not have been a correct assumption.]
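
The even-register rule, and the silent correction some assemblers apply, can be
expressed as follows.  The helper names are hypothetical; the second function
models the assembler behavior described in the note by clearing the low bit of
the register number.

.. code-block:: c

    #include <stdbool.h>

    /* Hypothetical: is r<reg> a legal first register of a doubleword pair? */
    static bool ldd_pair_valid(unsigned reg)
    {
      return reg < 32 && (reg % 2) == 0;
    }

    /* Models the lenient assembler behavior: an odd register is treated as
     * the second half of the even-odd pair that ends at it. */
    static unsigned ldd_pair_first_register(unsigned reg)
    {
      return reg & ~1u;
    }

With i0 and i1 being r24 and r25, naming i1 as the destination quietly becomes
the pair i0/i1, not the i1/i2 pair the programmer may have intended.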

RTEMS does not support any SPARC Memory Management Units; therefore, virtual
memory or segmentation systems involving the SPARC are not supported.

Interrupt Processing
====================

Different types of processors respond to the occurrence of an interrupt in
their own unique fashion.  In addition, each processor type provides a control
mechanism to allow for the proper handling of an interrupt.  The processor
dependent response to the interrupt modifies the current execution state and
results in a change in the execution stream.  Most processors require that an
interrupt handler utilize some special control mechanisms to return to the
normal processing stream.  Although RTEMS hides many of the processor dependent
details of interrupt processing, it is important to understand how the RTEMS
interrupt manager is mapped onto the processor's unique architecture.  Discussed
in this chapter are the SPARC's interrupt response and control mechanisms as
they pertain to RTEMS.

RTEMS and associated documentation use the terms interrupt and vector.  In the
SPARC architecture, these terms correspond to traps and trap type,
respectively.  The terms will be used interchangeably in this manual.

Synchronous Versus Asynchronous Traps
-------------------------------------

The SPARC architecture includes two classes of traps: synchronous and
asynchronous.  Asynchronous traps occur when an external event interrupts the
processor.  These traps are not associated with any instruction executed by the
processor and logically occur between instructions.  The instruction currently
in the execute stage of the processor is allowed to complete although
subsequent instructions are annulled.  The return address reported by the
processor for asynchronous traps is the pair of instructions following the
current instruction.

Synchronous traps are caused by the actions of an instruction.  The trap
stimulus in this case either occurs internally to the processor or is from an
external signal that was provoked by the instruction.  These traps are taken
immediately and the instruction that caused the trap is aborted before any
state changes occur in the processor itself.  The return address reported by
the processor for synchronous traps is the instruction which caused the trap
and the following instruction.

Vectoring of Interrupt Handler
------------------------------

Upon receipt of an interrupt the SPARC automatically performs the following
actions:

- disables traps (sets the ET bit of the psr to 0),

- the S bit of the psr is copied into the Previous Supervisor Mode (PS) bit of
  the psr, and the S bit is then set to 1 (supervisor mode),

- the cwp is decremented by one (modulo the number of register windows) to
  activate a trap window,

- the PC and nPC are loaded into local registers 1 and 2 (l1 and l2) of the
  trap window,

- the trap type (tt) field of the Trap Base Register (TBR) is set to the
  appropriate value, and

- if the trap is not a reset, then the PC is written with the contents of the
  TBR and the nPC is written with TBR + 4.  If the trap is a reset, then the PC
  is set to zero and the nPC is set to 4.
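
The entry sequence above can be modeled in C to make the register effects
concrete.  The sketch below is a simplified simulation with a hypothetical
struct and function name; it ignores the reset case, stores the saved PC and
nPC in two fields standing for l1 and l2 of the trap window, and uses the
architectural TBR layout in which the trap type occupies bits 11:4, so each
trap table entry is 16 bytes (four instructions) long.

.. code-block:: c

    #include <stdint.h>

    #define PSR_CWP_MASK 0x1Fu
    #define PSR_ET_BIT   (1u << 5)
    #define PSR_PS_BIT   (1u << 6)
    #define PSR_S_BIT    (1u << 7)

    /* Hypothetical processor model; not an RTEMS or simulator type. */
    struct sparc_cpu {
      uint32_t psr, tbr, pc, npc;
      uint32_t trap_l1, trap_l2;   /* l1 and l2 of the trap window */
      unsigned nwindows;
    };

    /* Models the automatic actions for a non-reset trap of type tt. */
    static void model_trap_entry(struct sparc_cpu *c, uint32_t tt)
    {
      uint32_t cwp = c->psr & PSR_CWP_MASK;
      uint32_t psr = c->psr & ~(PSR_ET_BIT | PSR_PS_BIT | PSR_CWP_MASK);

      if (c->psr & PSR_S_BIT)
        psr |= PSR_PS_BIT;                            /* PS <- S */
      psr |= PSR_S_BIT;                               /* supervisor mode */
      psr |= (cwp + c->nwindows - 1) % c->nwindows;   /* trap window */
      c->psr = psr;                                   /* ET is now 0 */

      c->trap_l1 = c->pc;                             /* save PC and nPC */
      c->trap_l2 = c->npc;

      c->tbr = (c->tbr & 0xFFFFF000u) | ((tt & 0xFFu) << 4);  /* set tt */
      c->pc = c->tbr;
      c->npc = c->tbr + 4;
    }

Trap type 0x11 is the asynchronous interrupt for level 1; the interrupt trap
types run from 0x11 to 0x1F for levels 1 to 15.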

Trap processing on the SPARC has two features which are noticeably different
from interrupt processing on other architectures.  First, the value of the psr
in effect immediately before the trap occurred is not explicitly saved.
Instead only reversible alterations are made to it.  Second, the Processor
Interrupt Level (pil) is not set to correspond to that of the interrupt being
processed.  When a trap occurs, ALL subsequent traps are disabled.  In order to
safely invoke a subroutine during trap handling, traps must be enabled to allow
for the possibility of register window overflow and underflow traps.

If the interrupt handler was installed as an RTEMS interrupt handler, then upon
receipt of the interrupt, the processor passes control to the RTEMS interrupt
handler which performs the following actions:

- saves the state of the interrupted task on its stack,

- ensures that a register window is available for subsequent traps,

- if this is the outermost (i.e. non-nested) interrupt, then the RTEMS
  interrupt handler switches from the current stack to the interrupt stack,

- enables traps,

- vectors to the user interrupt service routine (ISR).

Asynchronous interrupts are ignored while traps are disabled.  Synchronous
traps which occur while traps are disabled result in the CPU being forced into
an error mode.

A nested interrupt is processed similarly with the exception that the current
stack need not be switched to the interrupt stack.

Traps and Register Windows
--------------------------

One of the register windows must be reserved at all times for trap processing.
This is critical to the proper operation of the trap mechanism in the SPARC
architecture.  It is the responsibility of the trap handler to ensure that
there is a register window available for a subsequent trap before re-enabling
traps.  It is likely that any high level language routines invoked by the trap
handler (such as a user-provided RTEMS interrupt handler) will allocate a new
register window.  The save operation could result in a window overflow trap.
This trap cannot be correctly processed unless (1) traps are enabled and (2) a
register window is reserved for traps.  Thus, the RTEMS interrupt handler
ensures that a register window is available for subsequent traps before
enabling traps and invoking the user's interrupt handler.

Interrupt Levels
----------------

Sixteen levels (0-15) of interrupt priorities are supported by the SPARC
architecture with level fifteen (15) being the highest priority.  Level
zero (0) indicates that interrupts are fully enabled.  Interrupt requests for
interrupts with priorities less than or equal to the current interrupt mask
level are ignored, with the exception of level fifteen (15), which is a
non-maskable interrupt (NMI).  This makes level fifteen unsuitable for standard
usage since it can affect the real-time behaviour by interrupting critical
sections and spinlocks.  Disabling traps also prevents the NMI from being
taken.  It can however be used for power-down or other critical events.

Although RTEMS supports 256 interrupt levels, the SPARC only supports sixteen.
RTEMS interrupt levels 0 through 15 directly correspond to SPARC processor
interrupt levels.  All other RTEMS interrupt levels are undefined and their
behavior is unpredictable.
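
The acceptance rule described above (traps enabled, and either a level above
the current interrupt mask level or level fifteen) can be written as a small
predicate.  This is a model of the documented behavior with a hypothetical
function name, not code taken from RTEMS.

.. code-block:: c

    #include <stdbool.h>

    /* Hypothetical model: is an interrupt request at 'level' (1..15) taken,
     * given the psr PIL field and the ET (trap enable) bit? */
    static bool interrupt_taken(unsigned level, unsigned pil, bool traps_enabled)
    {
      if (!traps_enabled)
        return false;            /* ET = 0 blocks every interrupt, even NMI */
      if (level == 15)
        return true;             /* level 15 is non-maskable */
      return level > pil;        /* levels <= PIL are ignored */
    }

Raising the PIL to 15 therefore masks levels 1 through 14 but not level 15.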

Many LEON SPARC V7/V8 systems feature an extended interrupt controller which
adds an extra step of interrupt decoding to allow handling of interrupts 16-31.
When such an extended interrupt is generated, the CPU traps into a specific
interrupt trap level 1-14 and software reads from the interrupt controller which
extended interrupt source actually caused the interrupt.
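
The two-step decoding can be sketched as a function that maps the trap level
and the value read back from the interrupt controller to an interrupt number.
Everything here is hypothetical: the function name, the assumption that one
trap level is configured for extended interrupts, and the assumption that the
controller reports the extended source number (16-31) directly, or 0 when none
is pending.

.. code-block:: c

    /* Hypothetical decoding of an extended interrupt.
     *   trap_level:     interrupt level the CPU trapped on (1-15)
     *   extended_level: level the controller uses for extended sources (1-14)
     *   eirq_reported:  source number read from the controller (assumed to be
     *                   16-31, or 0 if no extended interrupt is pending) */
    static unsigned resolve_interrupt(unsigned trap_level, unsigned extended_level,
                                      unsigned eirq_reported)
    {
      if (trap_level == extended_level &&
          eirq_reported >= 16 && eirq_reported <= 31)
        return eirq_reported;    /* the real source is an extended interrupt */
      return trap_level;         /* an ordinary interrupt 1-15 */
    }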

Disabling of Interrupts by RTEMS
--------------------------------

During the execution of directive calls, critical sections of code may be
executed.  When these sections are encountered, RTEMS disables interrupts to
level fifteen (15) before the execution of the section and restores them to the
previous level upon completion of the section.  RTEMS has been optimized to
ensure that interrupts are disabled for less than RTEMS_MAXIMUM_DISABLE_PERIOD
microseconds on a RTEMS_MAXIMUM_DISABLE_PERIOD_MHZ MHz ERC32 with zero wait
states.  These numbers will vary based on the number of wait states and
processor speed present on the target board.  [NOTE: The maximum period with
interrupts disabled is hand calculated.  This calculation was last performed
for Release RTEMS_RELEASE_FOR_MAXIMUM_DISABLE_PERIOD.]

[NOTE: It is thought that the length of time at which the processor interrupt
level is elevated to fifteen by RTEMS is not anywhere near as long as the
length of time ALL traps are disabled as part of the "flush all register
windows" operation.]

Non-maskable interrupts (NMI) cannot be disabled through the interrupt level
(only by disabling traps), and ISRs which execute at this level MUST NEVER
issue RTEMS system calls.  If a directive is invoked, unpredictable results
may occur due to the inability of RTEMS to protect its critical sections.
However, ISRs that make no system calls may safely execute as non-maskable
interrupts.

Interrupts are disabled or enabled by performing a system call to the Operating
System reserved software traps 9 (SPARC_SWTRAP_IRQDIS) or 10
(SPARC_SWTRAP_IRQEN).  The trap is generated by the software trap (Ticc)
instruction or indirectly by calling the sparc_disable_interrupts() or
sparc_enable_interrupts() functions.  Disabling interrupts returns the previous
interrupt level (on trap entry) in register G1 and sets PSR.PIL to 15 to
disable all maskable interrupts.  The interrupt level can be restored by
trapping into the enable interrupt handler with G1 containing the new interrupt
level.
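
The PIL bookkeeping performed by the two software traps can be modeled on a
host as shown below.  This is a sketch, not the RTEMS trap handler: ``psr``
is an ordinary variable standing in for the processor state register, the
``model_*`` names are hypothetical, and the returned value follows the
description above of what is placed in G1.  PSR.PIL occupies bits 11-8 of
the SPARC V8 PSR.

.. code-block:: c

    #include <stdint.h>

    #define PSR_PIL_SHIFT 8u
    #define PSR_PIL_MASK  (0xfu << PSR_PIL_SHIFT)   /* PSR bits 11..8 */

    /* Model of SPARC_SWTRAP_IRQDIS: return the old level, set PIL to 15. */
    static uint32_t model_irq_disable(uint32_t *psr)
    {
        uint32_t previous = (*psr & PSR_PIL_MASK) >> PSR_PIL_SHIFT;
        *psr |= PSR_PIL_MASK;
        return previous;
    }

    /* Model of SPARC_SWTRAP_IRQEN: restore the level passed in (G1). */
    static void model_irq_enable(uint32_t *psr, uint32_t level)
    {
        *psr = (*psr & ~PSR_PIL_MASK) | ((level & 0xfu) << PSR_PIL_SHIFT);
    }

Because the previous level is returned and later restored, disable/enable
pairs nest correctly.  Application code would normally go through the RTEMS
Interrupt Manager (for example ``rtems_interrupt_local_disable()``) rather
than issuing the traps directly.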

Interrupt Stack
---------------

The SPARC architecture does not provide for a dedicated interrupt stack.  Thus
by default, trap handlers would execute on the stack of the RTEMS task which
they interrupted.  This artificially inflates the stack requirements for each
task since EVERY task stack would have to include enough space to account for
the worst case interrupt stack requirements in addition to its own worst case
usage.  RTEMS addresses this problem on the SPARC by providing a dedicated
interrupt stack managed by software.

During system initialization, RTEMS allocates the interrupt stack from the
Workspace Area.  The amount of memory allocated for the interrupt stack is
determined by the interrupt_stack_size field in the CPU Configuration Table.
As part of processing a non-nested interrupt, RTEMS will switch to the
interrupt stack before invoking the installed handler.
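
The stack selection rule can be sketched with a nesting counter.  The
structure and function names are hypothetical and only model the decision
described above; the real switch is done in the RTEMS SPARC interrupt entry
code.

.. code-block:: c

    #include <stdint.h>

    struct isr_state {
        unsigned  nest_level;    /* interrupts currently being processed */
        uintptr_t interrupt_sp;  /* top of the dedicated interrupt stack */
    };

    /* Returns the stack pointer the handler runs on: only the outermost
     * (non-nested) interrupt switches away from the task stack. */
    static uintptr_t isr_enter(struct isr_state *s, uintptr_t current_sp)
    {
        uintptr_t sp = (s->nest_level == 0) ? s->interrupt_sp : current_sp;
        s->nest_level++;
        return sp;
    }

    static void isr_exit(struct isr_state *s)
    {
        s->nest_level--;
    }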

Default Fatal Error Processing
==============================

Upon detection of a fatal error by either the application or RTEMS, the fatal
error manager is invoked.  The fatal error manager will invoke the
user-supplied fatal error handlers.  If no user-supplied handlers are
configured, the RTEMS provided default fatal error handler is invoked.  If the
user-supplied fatal error handlers return to the executive, the default fatal
error handler is then invoked.  This chapter describes the precise operations
of the default fatal error handler.

Default Fatal Error Handler Operations
--------------------------------------

The default fatal error handler is invoked by the fatal_error_occurred
directive when there is no user handler configured or the user handler returns
control to RTEMS.

If the BSP has been configured with ``BSP_POWER_DOWN_AT_FATAL_HALT`` set to
true, the default handler will disable interrupts and enter power down mode. If
power down mode is not available, it goes into an infinite loop to simulate a
halt processor instruction.

If ``BSP_POWER_DOWN_AT_FATAL_HALT`` is set to false, the default handler will
place the value ``1`` in register ``g1``, the error source in register ``g2``,
and the error code in register ``g3``. It will then generate a system error
which will hand over control to the debugger, simulator, etc.

Symmetric Multiprocessing
=========================

SMP is supported.  Available platforms are the Cobham Gaisler GR712RC and
GR740.

Thread-Local Storage
====================

Thread-local storage is supported.

Board Support Packages
======================

An RTEMS Board Support Package (BSP) must be designed to support a particular
processor and target board combination.  This chapter presents a discussion of
SPARC specific BSP issues.  For more information on developing a BSP, refer to
the chapter titled Board Support Packages in the RTEMS Applications User's
Guide.

System Reset
------------

An RTEMS based application is initiated or re-initiated when the SPARC
processor is reset.  When the SPARC is reset, the processor performs the
following actions:

- the enable traps (ET) bit of the psr is set to 0 to disable traps,

- the supervisor bit (S) of the psr is set to 1 to enter supervisor mode, and

- the PC is set to 0 and the nPC is set to 4.

The processor then begins to execute the code at location 0.  It is important
to note that not all fields in the psr are explicitly set by the above steps
and all other registers retain their value from the previous execution mode.
This is true even of the Trap Base Register (TBR) whose contents reflect the
last trap which occurred before the reset.
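
The effect of a reset on the psr can be written as a one-line model.  This
is a host-side sketch with a hypothetical function name; it uses the SPARC
V8 bit positions of S (bit 7) and ET (bit 5) and, following the text above,
leaves every other field unchanged.

.. code-block:: c

    #include <stdint.h>

    #define PSR_S  (1u << 7)   /* supervisor mode */
    #define PSR_ET (1u << 5)   /* enable traps */

    static uint32_t psr_after_reset(uint32_t psr_before)
    {
        /* ET cleared, S set, all other fields keep their old values */
        return (psr_before & ~PSR_ET) | PSR_S;
    }

For instance, a psr of 0x00000f20 (PIL 15, traps enabled) becomes
0x00000f80 after the reset: traps are off, supervisor mode is on, and the
stale PIL value is still there.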

Processor Initialization
------------------------

It is the responsibility of the application's initialization code to initialize
the TBR and install trap handlers for at least the register window overflow and
register window underflow conditions.  Traps should be enabled before invoking
any subroutines to allow for register window management.  However, interrupts
should be disabled by setting the Processor Interrupt Level (pil) field of the
psr to 15.  As part of initialization, RTEMS installs its own Trap Table, which
is initialized with the contents of the Trap Table in place when the
``rtems_initialize_executive`` directive was invoked.  Upon completion of
executive initialization, interrupts are enabled.

If this SPARC implementation supports on-chip caching and this is to be
utilized, then it should be enabled during the reset application initialization
code.

In addition to the requirements described in the Board Support Packages chapter
of the C Applications Users Manual for the reset code which is executed before
the call to ``rtems_initialize_executive``, the SPARC version has the following
specific requirements:

- Must leave the S bit of the status register set so that the SPARC remains in
  the supervisor state.

- Must set stack pointer (sp) such that a minimum stack size of
  MINIMUM_STACK_SIZE bytes is provided for the ``rtems_initialize_executive``
  directive.

- Must disable all external interrupts (i.e. set the pil to 15).

- Must enable traps so window overflow and underflow conditions can be properly
  handled.

- Must initialize the SPARC's initial trap table with at least trap handlers
  for register window overflow and register window underflow.
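
The psr-related items in this list can be checked mechanically.  The helper
below is a hypothetical host-side sketch using the SPARC V8 field positions
(PIL in bits 11-8, S in bit 7, ET in bit 5); it does not cover the stack
pointer or trap table requirements.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    #define PSR_PIL_MASK (0xfu << 8)
    #define PSR_S        (1u << 7)
    #define PSR_ET       (1u << 5)

    /* True if S is set, PIL is 15, and traps are enabled. */
    static bool psr_ready_for_rtems(uint32_t psr)
    {
        return (psr & PSR_S) != 0 &&
               (psr & PSR_PIL_MASK) == PSR_PIL_MASK &&
               (psr & PSR_ET) != 0;
    }

    /* The bits reset code would set in the psr to meet those requirements. */
    static uint32_t psr_prepare_for_rtems(uint32_t psr)
    {
        return psr | PSR_S | PSR_PIL_MASK | PSR_ET;
    }

Real reset code would raise PIL to 15 before, or together with, setting ET
so that no interrupt can be taken in between.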

Understanding stacks and registers in the SPARC architecture(s)
===============================================================

The content in this section originally appeared at
https://www.sics.se/~psm/sparcstack.html. It appears here with the
gracious permission of the author Peter Magnusson.

The SPARC architecture from Sun Microsystems has some "interesting"
characteristics. After having to deal with compiler, interpreter, OS
emulator, and OS porting issues for the SPARC, I decided to gather notes
and documentation in one place. If there are any issues you don't find
addressed by this page, or if you know of any similar Net resources, let
me know. This document is limited to the V8 version of the architecture.

General Structure
-----------------

SPARC has 32 general purpose integer registers visible to the program
at any given time. Of these, 8 registers are global registers and 24
registers are in a register window. A window consists of three groups
of 8 registers, the out, local, and in registers. See table 1. A SPARC
implementation can have from 2 to 32 windows, thus varying the number
of registers from 40 to 520. Most implementations have 7 or 8 windows. The
variable number of registers is the principal reason for the SPARC being
"scalable".
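
The register counts quoted above follow from the overlap: each additional
window adds only 16 registers (its 8 locals plus 8 registers that serve as
its outs and as the ins of its neighbor), on top of the 8 globals.  A small
sketch, with a hypothetical function name:

.. code-block:: c

    /* Total integer registers in an implementation with nwindows windows
     * (2 <= nwindows <= 32): 8 globals + 16 per window. */
    static unsigned sparc_integer_registers(unsigned nwindows)
    {
        return 8u + 16u * nwindows;
    }

This gives 40 registers for 2 windows, 136 for 8 windows, and 520 for 32.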

At any given time, only one window is visible, as determined by the
current window pointer (CWP) which is part of the processor status
register (PSR). This is a five bit value that can be decremented or
incremented by the SAVE and RESTORE instructions, respectively. These
instructions are generally executed on procedure call and return
(respectively). The idea is that the in registers contain incoming
parameters, the local registers constitute scratch registers, the out
registers contain outgoing parameters, and the global registers contain
values that vary little between executions. The register windows overlap
partially, thus the out registers become renamed by SAVE to become the in
registers of the called procedure. Thus, the memory traffic is reduced
when going up and down the procedure call chain. Since this is a frequent
operation, performance is improved.
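
Since the windows wrap around, the CWP arithmetic performed by SAVE and
RESTORE is modulo the number of windows.  The sketch below models only that
arithmetic (window traps are ignored); the function names are hypothetical.

.. code-block:: c

    /* SAVE decrements CWP, wrapping from 0 to nwindows - 1. */
    static unsigned cwp_after_save(unsigned cwp, unsigned nwindows)
    {
        return (cwp + nwindows - 1u) % nwindows;
    }

    /* RESTORE increments CWP, wrapping from nwindows - 1 to 0. */
    static unsigned cwp_after_restore(unsigned cwp, unsigned nwindows)
    {
        return (cwp + 1u) % nwindows;
    }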

(That was the idea, anyway. The drawback is that upon interactions
with the system the registers need to be flushed to the stack,
necessitating a long sequence of writes to memory of data that is
often mostly garbage. Register windows were a bad idea that was caused
by simulation studies that considered only programs in isolation, as
opposed to multitasking workloads, and by considering compilers with
poor optimization. They also caused considerable problems in implementing
high-end SPARC processors such as the SuperSPARC, although more recent
implementations have dealt effectively with the obstacles. Register
windows are now part of the compatibility legacy and not easily removed
from the architecture.)

================ ======== ================
Register Group   Mnemonic Register Address
================ ======== ================
global           %g0-%g7  r[0]-r[7]
out              %o0-%o7  r[8]-r[15]
local            %l0-%l7  r[16]-r[23]
in               %i0-%i7  r[24]-r[31]
================ ======== ================

.. Table 1 - Visible Registers
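
The mapping in Table 1 from register number to mnemonic is regular: the
group is ``r / 8`` and the index within the group is ``r % 8``.  A
hypothetical helper that formats the mnemonic:

.. code-block:: c

    /* Write the mnemonic of visible register r[n] (0 <= n <= 31), e.g. "%o6"
     * for r[14], into buf, which must hold at least 4 characters. */
    static void sparc_reg_name(unsigned n, char buf[4])
    {
        static const char group[4] = { 'g', 'o', 'l', 'i' };

        buf[0] = '%';
        buf[1] = group[(n / 8u) % 4u];
        buf[2] = (char)('0' + n % 8u);
        buf[3] = '\0';
    }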

The overlap of the registers is illustrated in figure 1. The figure
shows an implementation with 8 windows, numbered 0 to 7 (labeled w0 to
w7 in the figure). Each window corresponds to 24 registers, 16 of which
are shared with "neighboring" windows. The windows are arranged in a
wrap-around manner, thus window number 0 borders window number 7. The
most common causes of changing the current window, as pointed to by CWP,
are the RESTORE and SAVE instructions, shown in the middle. Less common
are the supervisor RETT instruction (return from trap) and the trap event
(interrupt, exception, or TRAP instruction).

.. image:: sparcwin.gif

Figure 1 - Windowed Registers

The "WIM" register is also indicated in the top left of figure 1. The
window invalid mask is a bit map of invalid windows. It is generally used
as a pointer, i.e. exactly one bit is set in the WIM register indicating
which window is invalid (in the figure it's window 7). Register windows
are generally used to support procedure calls, so they can be viewed
as a cache of the stack contents. The WIM "pointer" indicates how
many procedure calls in a row can be taken without writing out data to
memory. In the figure, the capacity of the register windows is fully
utilized. An additional call will thus exceed capacity, triggering a
window overflow trap. At the other end, a window underflow trap occurs
when the register window "cache" is empty and more data needs to be
fetched from memory.
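
The check that decides between a normal SAVE/RESTORE and a window trap can
be modeled as a test of the WIM bit for the window that CWP would move to.
This is a host-side sketch with hypothetical names; bit ``w`` of ``wim`` set
means window ``w`` is invalid.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* SAVE moves to window CWP - 1 (mod nwindows); overflow if it is invalid. */
    static bool save_overflows(unsigned cwp, uint32_t wim, unsigned nwindows)
    {
        unsigned next = (cwp + nwindows - 1u) % nwindows;
        return ((wim >> next) & 1u) != 0;
    }

    /* RESTORE moves to window CWP + 1 (mod nwindows); underflow if invalid. */
    static bool restore_underflows(unsigned cwp, uint32_t wim, unsigned nwindows)
    {
        unsigned next = (cwp + 1u) % nwindows;
        return ((wim >> next) & 1u) != 0;
    }

With 8 windows and window 7 marked invalid (``wim == 0x80``), a SAVE
executed in window 0 overflows, while a SAVE in window 1 does not.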

Register Semantics
------------------

The SPARC Architecture includes recommended software semantics. These are
described in the architecture manual, the SPARC ABI (application binary
interface) standard, and, unfortunately, in various other locations as
well (including header files and compiler documentation).

Figure 2 shows a summary of register contents at any given time.

.. code-block:: asm

                 %g0  (r00)       always zero
                 %g1  (r01)  [1]  temporary value
                 %g2  (r02)  [2]  global 2
     global      %g3  (r03)  [2]  global 3
                 %g4  (r04)  [2]  global 4
                 %g5  (r05)       reserved for SPARC ABI
                 %g6  (r06)       reserved for SPARC ABI
                 %g7  (r07)       reserved for SPARC ABI

                 %o0  (r08)  [3]  outgoing parameter 0 / return value from callee
                 %o1  (r09)  [1]  outgoing parameter 1
                 %o2  (r10)  [1]  outgoing parameter 2
     out         %o3  (r11)  [1]  outgoing parameter 3
                 %o4  (r12)  [1]  outgoing parameter 4
                 %o5  (r13)  [1]  outgoing parameter 5
            %sp, %o6  (r14)  [1]  stack pointer
                 %o7  (r15)  [1]  temporary value / address of CALL instruction

                 %l0  (r16)  [3]  local 0
                 %l1  (r17)  [3]  local 1
                 %l2  (r18)  [3]  local 2
     local       %l3  (r19)  [3]  local 3
                 %l4  (r20)  [3]  local 4
                 %l5  (r21)  [3]  local 5
                 %l6  (r22)  [3]  local 6
                 %l7  (r23)  [3]  local 7

                 %i0  (r24)  [3]  incoming parameter 0 / return value to caller
                 %i1  (r25)  [3]  incoming parameter 1
                 %i2  (r26)  [3]  incoming parameter 2
     in          %i3  (r27)  [3]  incoming parameter 3
                 %i4  (r28)  [3]  incoming parameter 4
                 %i5  (r29)  [3]  incoming parameter 5
            %fp, %i6  (r30)  [3]  frame pointer
                 %i7  (r31)  [3]  return address - 8

Notes:

- [1] assumed by caller to be destroyed (volatile) across a procedure call

- [2] should not be used by SPARC ABI library code

- [3] assumed by caller to be preserved across a procedure call

.. Above was Figure 2 - SPARC register semantics

Particular compilers are likely to vary slightly.

Note that globals %g2-%g4 are reserved for the "application", which
includes libraries and compiler. Thus, for example, libraries may
overwrite these registers unless they've been compiled with suitable
flags. Also, the "reserved" registers are presumed to be allocated
(in the future) bottom-up, i.e. %g7 is currently the "safest" to use.

Optimizing linkers and interpreters are examples that use global registers.

Register Windows and the Stack
------------------------------

The SPARC register windows are, naturally, intimately related to the
stack. In particular, the stack pointer (%sp or %o6) must always point
to a free block of 64 bytes. This area is used by the operating system
(Solaris, SunOS, and Linux at least) to save the current local and in
registers upon a system interrupt, exception, or trap instruction. (Note
that this can occur at any time.)

Other aspects of the relationship between registers and memory are a
matter of programming convention. The typical, and recommended, layout of
the stack is shown in figure 3. The figure shows a stack frame.

.. code-block:: asm

                    low addresses
               +-------------------------+
     %sp  -->  | 16 words for storing    |
               | LOCAL and IN registers  |
               +-------------------------+
               |  one-word pointer to    |
               | aggregate return value  |
               +-------------------------+
               |   6 words for callee    |
               |   to store register     |
               |       arguments         |
               +-------------------------+
               |  outgoing parameters    |
               |  past the 6th, if any   |
               +-------------------------+
               |  space, if needed, for  |
               |  compiler temporaries   |
               |   and saved floating-   |
               |    point registers      |
               +-------------------------+
                    .................
               +-------------------------+
               |    space dynamically    |
               |    allocated via the    |
               |  alloca() library call  |
               +-------------------------+
               |  space, if needed, for  |
               |    automatic arrays,    |
               |    aggregates, and      |
               |   addressable scalar    |
               |       automatics        |
               +-------------------------+
     %fp  -->
                     high addresses

.. Figure 3 - Stack frame contents

Note that the top boxes of figure 3 are addressed via the stack pointer
(%sp), as positive offsets (including zero), and the bottom boxes are
accessed via the frame pointer using negative offsets (excluding zero),
and that the frame pointer is the old stack pointer. This scheme allows
the separation of information known at compile time (number and size
of local parameters, etc.) from run-time information (size of blocks
allocated by alloca()).
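
The fixed part of Figure 3 translates into constant offsets from %sp.  The
enumeration below is an illustrative sketch assuming 4-byte words (SPARC V8);
the names are hypothetical, not taken from a system header.

.. code-block:: c

    /* Byte offsets from %sp of the fixed areas at the top of Figure 3. */
    enum sparc_frame_offset {
        FRAME_LOCALS     = 0,       /* %l0-%l7 save area, 8 words */
        FRAME_INS        = 8 * 4,   /* %i0-%i7 save area, 8 words */
        FRAME_STRUCT_PTR = 16 * 4,  /* pointer to aggregate return value */
        FRAME_ARG_SAVE   = 17 * 4,  /* 6 words for callee to store args */
        FRAME_EXTRA_ARGS = 23 * 4   /* outgoing parameters past the 6th */
    };

    /* Offset from %sp of outgoing parameter k (0-based).  Parameters 0-5
     * are passed in %o0-%o5 but have home slots here; from 6 on they are
     * passed only in memory. */
    static int outgoing_param_offset(int k)
    {
        return FRAME_ARG_SAVE + 4 * k;
    }

So the first parameter's home slot is at %sp+68 and the seventh parameter
is found at %sp+92.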

"addressable scalar automatics" is a fancy name for local variables.

The clever nature of the stack and frame pointers is that they are always
16 registers apart in the register windows. Thus, a SAVE instruction will
make the current stack pointer into the frame pointer and, since the SAVE
instruction also doubles as an ADD, create a new stack pointer. Figure 4
illustrates what the top of a stack might look like during execution. (The
listing is from the "pwin" command in the SimICS simulator.)
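
The "16 registers apart" property can be checked with a model of the
circular register file.  The mapping below is a hypothetical sketch, not how
any particular implementation numbers its physical registers: windowed
register r (8-31) of window cwp lives at a position in a ring of
16 * nwindows registers, so the ins of window w coincide with the outs of
window w + 1.

.. code-block:: c

    /* Physical index of visible register r (0-31) in window cwp. */
    static unsigned phys_reg(unsigned cwp, unsigned r, unsigned nwindows)
    {
        if (r < 8u) {
            return r;                       /* globals are not windowed */
        }
        return 8u + (cwp * 16u + (r - 8u)) % (16u * nwindows);
    }

After a SAVE from window 3 to window 2, the caller's %sp (%o6, r14) is the
callee's %fp (%i6, r30): ``phys_reg(3, 14, 8) == phys_reg(2, 30, 8)``.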

.. code-block:: asm

                  REGISTER WINDOWS
                 +--+---+----------+
                 |g0|r00|0x00000000| global
                 |g1|r01|0x00000006| registers
                 |g2|r02|0x00091278|
      g0-g7      |g3|r03|0x0008ebd0|
                 |g4|r04|0x00000000|        (note: 'save' and 'trap' decrement CWP,
                 |g5|r05|0x00000000|        i.e. move it up on this diagram. 'restore'
                 |g6|r06|0x00000000|        and 'rett' increment CWP, i.e. move it down)
                 |g7|r07|0x00000000|
                 +--+---+----------+
 CWP (2)         |o0|r08|0x00000002|
                 |o1|r09|0x00000000|                            MEMORY
                 |o2|r10|0x00000001|
      o0-o7      |o3|r11|0x00000001|             stack growth
                 |o4|r12|0x000943d0|
                 |o5|r13|0x0008b400|                  ^
                 |sp|r14|0xdffff9a0| ----\           /|\
                 |o7|r15|0x00062abc|     |            |                     addresses
                 +--+---+----------+     |     +--+----------+         virtual     physical
                 |l0|r16|0x00087c00|     \---> |l0|0x00000000|        0xdffff9a0  0x000039a0  top of frame 0
                 |l1|r17|0x00027fd4|           |l1|0x00000000|        0xdffff9a4  0x000039a4
                 |l2|r18|0x00000000|           |l2|0x0009df80|        0xdffff9a8  0x000039a8
      l0-l7      |l3|r19|0x00000000|           |l3|0x00097660|        0xdffff9ac  0x000039ac
                 |l4|r20|0x00000000|           |l4|0x00000014|        0xdffff9b0  0x000039b0
                 |l5|r21|0x00097678|           |l5|0x00000001|        0xdffff9b4  0x000039b4
                 |l6|r22|0x0008b400|           |l6|0x00000004|        0xdffff9b8  0x000039b8
                 |l7|r23|0x0008b800|           |l7|0x0008dd60|        0xdffff9bc  0x000039bc
              +--+--+---+----------+           +--+----------+
 CWP+1 (3)    |o0|i0|r24|0x00000002|           |i0|0x00091048|        0xdffff9c0  0x000039c0
              |o1|i1|r25|0x00000000|           |i1|0x00000011|        0xdffff9c4  0x000039c4
              |o2|i2|r26|0x0008b7c0|           |i2|0x00091158|        0xdffff9c8  0x000039c8
      i0-i7   |o3|i3|r27|0x00000019|           |i3|0x0008d370|        0xdffff9cc  0x000039cc
              |o4|i4|r28|0x0000006c|           |i4|0x0008eac4|        0xdffff9d0  0x000039d0
              |o5|i5|r29|0x00000000|           |i5|0x00000000|        0xdffff9d4  0x000039d4
              |o6|fp|r30|0xdffffa00| ----\     |fp|0x00097660|        0xdffff9d8  0x000039d8
              |o7|i7|r31|0x00040468|     |     |i7|0x00000000|        0xdffff9dc  0x000039dc
              +--+--+---+----------+     |     +--+----------+
                                         |        |0x00000001|        0xdffff9e0  0x000039e0  parameters
                                         |        |0x00000002|        0xdffff9e4  0x000039e4
                                         |        |0x00000040|        0xdffff9e8  0x000039e8
                                         |        |0x00097671|        0xdffff9ec  0x000039ec
                                         |        |0xdffffa68|        0xdffff9f0  0x000039f0
                                         |        |0x00024078|        0xdffff9f4  0x000039f4
                                         |        |0x00000004|        0xdffff9f8  0x000039f8
                                         |        |0x0008dd60|        0xdffff9fc  0x000039fc
              +--+------+----------+     |     +--+----------+
              |l0|      |0x00087c00|     \---> |l0|0x00091048|        0xdffffa00  0x00003a00  top of frame 1
              |l1|      |0x000c8d48|           |l1|0x0000000b|        0xdffffa04  0x00003a04
              |l2|      |0x000007ff|           |l2|0x00091158|        0xdffffa08  0x00003a08
              |l3|      |0x00000400|           |l3|0x000c6f10|        0xdffffa0c  0x00003a0c
              |l4|      |0x00000000|           |l4|0x0008eac4|        0xdffffa10  0x00003a10
              |l5|      |0x00088000|           |l5|0x00000000|        0xdffffa14  0x00003a14
              |l6|      |0x0008d5e0|           |l6|0x000c6f10|        0xdffffa18  0x00003a18
              |l7|      |0x00088000|           |l7|0x0008cd00|        0xdffffa1c  0x00003a1c
              +--+--+---+----------+           +--+----------+
 CWP+2 (4)    |i0|o0|   |0x00000002|           |i0|0x0008cb00|        0xdffffa20  0x00003a20
              |i1|o1|   |0x00000011|           |i1|0x00000003|        0xdffffa24  0x00003a24
              |i2|o2|   |0xffffffff|           |i2|0x00000040|        0xdffffa28  0x00003a28
              |i3|o3|   |0x00000000|           |i3|0x0009766b|        0xdffffa2c  0x00003a2c
              |i4|o4|   |0x00000000|           |i4|0xdffffa68|        0xdffffa30  0x00003a30
              |i5|o5|   |0x00064c00|           |i5|0x000253d8|        0xdffffa34  0x00003a34
              |i6|o6|   |0xdffffa70| ----\     |i6|0xffffffff|        0xdffffa38  0x00003a38
              |i7|o7|   |0x000340e8|     |     |i7|0x00000000|        0xdffffa3c  0x00003a3c
              +--+--+---+----------+     |     +--+----------+
                                         |        |0x00000001|        0xdffffa40  0x00003a40  parameters
                                         |        |0x00000000|        0xdffffa44  0x00003a44
                                         |        |0x00000000|        0xdffffa48  0x00003a48
                                         |        |0x00000000|        0xdffffa4c  0x00003a4c
                                         |        |0x00000000|        0xdffffa50  0x00003a50
                                         |        |0x00000000|        0xdffffa54  0x00003a54
                                         |        |0x00000002|        0xdffffa58  0x00003a58
                                         |        |0x00000002|        0xdffffa5c  0x00003a5c
                                         |        |    .     |
                                         |        |    .     |        .. etc (another 16 bytes)
                                         |        |    .     |

.. Figure 4 - Sample stack contents

Note how the stack contents are not necessarily synchronized with the
registers. Various events can cause the register windows to be "flushed"
to memory, including most system calls. A programmer can force this
update by using the ST_FLUSH_WINDOWS trap, which also reduces the number
of valid windows to the minimum of 1.

Writing a library for multithreaded execution is an example that requires
explicit flushing, as is longjmp().
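
A minimal sketch of forcing the flush from C, for example before a
longjmp()-style context switch.  It assumes the SPARC ABI numbering in which
ST_FLUSH_WINDOWS is software trap 3 (check the trap table of the target
system or BSP before relying on it); the function name is hypothetical, and
on non-SPARC hosts the function does nothing so that the example still
builds.

.. code-block:: c

    static inline void flush_register_windows(void)
    {
    #if defined(__sparc__)
        /* ST_FLUSH_WINDOWS: write all windows except the current one to
         * their stack frames; "memory" keeps the compiler from caching
         * stack contents across the trap. */
        __asm__ volatile ("ta 3" : : : "memory");
    #endif
    }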

Procedure epilogue and prologue
-------------------------------

The stack frame described in the previous section leads to the standard
entry/exit mechanisms listed in figure 5.

.. code-block:: asm

  function:
    save  %sp, -C, %sp

               ! perform function, leave return value,
               ! if any, in register %i0 upon exit

    ret        ! jmpl %i7+8, %g0
    restore    ! restore %g0,%g0,%g0

.. Figure 5 - Epilogue/prologue in procedures

The SAVE instruction decrements the CWP, as discussed earlier, and also
performs an addition. The constant "C" used in the figure indicates the
amount of space to allocate on the stack, and thus corresponds to the
frame contents in Figure 3. The minimum is therefore the 16 words for the
LOCAL and IN registers, i.e. (hex) 0x40 bytes.
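
Choosing C is simple arithmetic.  The sketch below (hypothetical function
name) adds the fixed area to the space needed for locals and rounds up to a
multiple of 8, since the stack pointer is kept doubleword aligned.  The
0x40-byte register save area is the minimum stated above; code that also
reserves the aggregate-return word and the six argument words of Figure 3
starts from 92 bytes, which rounds up to 96, the value commonly seen in
compiler output as ``save %sp, -96, %sp``.

.. code-block:: c

    /* Frame size C for "save %sp, -C, %sp", rounded up to 8 bytes. */
    static unsigned save_frame_size(unsigned fixed_bytes, unsigned locals_bytes)
    {
        unsigned size = fixed_bytes + locals_bytes;
        return (size + 7u) & ~7u;
    }

For example, ``save_frame_size(64, 0)`` is 64 (0x40), while
``save_frame_size(92, 12)`` is 104.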

A confusing element of the SAVE instruction is that the source operands
(the first two parameters) are read from the old register window, and
the destination operand (the rightmost parameter) is written to the new
window. Thus, although "%sp" is indicated as both source and destination,
the result is actually written into the stack pointer of the new window
(the source stack pointer becomes renamed and is now the frame pointer).

The return instructions are also a bit particular. ret is a synthetic
instruction, corresponding to jmpl (jump and link). This instruction
jumps to the address resulting from adding 8 to the %i7 register. The
source instruction address (the address of the ret instruction itself)
is written to the %g0 register, i.e. it is discarded.

The bare restore is similarly a synthetic instruction: it is short for
restore %g0, %g0, %g0, i.e. a restore whose addition result is discarded.

The calling instruction, in turn, typically looks as follows:

.. code-block:: asm

    call <function>    ! jmpl <address>, %o7
    mov 0, %o0         ! delay slot: first parameter = 0

Again, the call instruction can be viewed as a form of jmpl, the same
instruction that performs the return. (Strictly, a call to a label is
encoded as the native CALL instruction with a PC-relative displacement;
only a register-indirect call is the synthetic form of jmpl.) This time,
however, the linked address is kept: the address of the call instruction
is saved in register %o7. Note that the delay slot is often filled with
an instruction related to the parameters; in this example it sets the
first parameter to zero. Note also that the return value is generally
passed back in %o0.
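
The "+ 8" used by ret and retl follows from the call sequence: the linked
register holds the address of the call itself, and the return must skip both
the call and its delay slot (two 4-byte instructions).  A trivial sketch with
a hypothetical function name:

.. code-block:: c

    #include <stdint.h>

    /* Address at which execution resumes after returning to a call made
     * at call_address (skip the call and its delay slot). */
    static uint32_t return_address(uint32_t call_address)
    {
        return call_address + 8u;
    }

In Figure 4, the %i7 register of the current window holds 0x00040468, so a
ret from that procedure continues at 0x00040470.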

Leaf procedures are different. The leaf procedure optimization reduces
unnecessary work by taking advantage of the fact that many procedures
contain no call instructions. For such procedures the save/restore pair
can be eliminated. The downside is that a leaf procedure may only use
the out registers (since the in and local registers actually belong to
the caller). See Figure 6.

.. code-block:: asm

  function:
               ! no save instruction needed upon entry

               ! perform function, leave return value,
               ! if any, in register %o0 upon exit

    retl       ! jmpl %o7+8, %g0
    nop        ! the delay slot can be used for something else

.. Figure 6 - Epilogue/prologue in leaf procedures

Note in the figure that there is only one instruction of overhead, namely
the retl instruction. retl is also synthetic (return from leaf subroutine)
and is again a variant of the jmpl instruction, this time with %o7+8 as
target.

Yet another variation of the epilogue is caused by tail call elimination,
an optimization supported by some compilers (including Sun's C compiler;
GCC performs it as "sibling call" optimization). If the compiler detects
that a function returns the result of a call unchanged, it can let the
called function take over the caller's place on the stack, so that the
called function returns directly to the caller's caller. Figure 7 contains
an example.

.. code-block:: c

    int bar(int n);

    int
    foo(int n)
    {
      if (n == 0)
        return 0;
      else
        return bar(n);
    }

.. code-block:: asm

            cmp     %o0,0
            bne     .L1
            or      %g0,%o7,%g1     ! delay slot: save return address in %g1
            retl
            or      %g0,0,%o0       ! delay slot: return value 0
    .L1:    call    bar             ! overwrites %o7
            or      %g0,%g1,%o7     ! delay slot: restore %o7 before bar runs

.. Figure 7 - Example of tail call elimination

Note that the call instruction overwrites register %o7 with the program
counter. Therefore the above code saves the old value of %o7, and restores
it in the delay slot of the call instruction. If the function call is
register indirect, this twiddling with %o7 can be avoided, but of course
that form of call is slower on modern processors.

The benefit of tail call elimination is to remove an indirection upon
return. It is also needed to reduce register window usage, since otherwise
the foo() function in Figure 7 would need to allocate a stack frame to
save the program counter.

A special form of tail call elimination is tail recursion elimination,
which detects functions calling themselves in tail position and replaces
the call with a simple branch. Figure 8 contains an example.

.. code-block:: c

    int
    foo(int n)
    {
      if (n == 0)
        return 1;
      else
        return (foo(n - 1));
    }

.. code-block:: asm

            cmp     %o0,0
            be      .L1
            or      %g0,%o0,%g1     ! delay slot: %g1 = n
            subcc   %g1,1,%g1
    .L2:    bne     .L2
            subcc   %g1,1,%g1       ! delay slot: decrement, set condition codes
    .L1:    retl
            or      %g0,1,%o0       ! delay slot: return value 1

.. comment Figure 8 - Example of tail recursion elimination
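
At the C level, tail recursion elimination amounts to turning the recursive
call into a loop.  The sketch below shows the function of Figure 8 next to a
loop with the same result for non-negative arguments; the names
``foo_recursive`` and ``foo_iterative`` are hypothetical.

.. code-block:: c

    static int foo_recursive(int n)
    {
        if (n == 0)
            return 1;
        else
            return foo_recursive(n - 1);   /* call in tail position */
    }

    static int foo_iterative(int n)
    {
        while (n != 0)                      /* the call became a branch back */
            n = n - 1;
        return 1;
    }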
1234
1235Needless to say, these optimizations produce code that is difficult to debug.
1236
1237Procedures, stacks, and debuggers
1238----------------------------------
1239
When debugging an application, your debugger parses the binary and
consults the symbol table to determine procedure entry points. It also
walks the stack frames "upward" to determine the current call chain.

When compiling for debugging, compilers generate additional code and
avoid some optimizations so that the state of the program during
execution can be reconstructed. For example, GCC/GDB makes sure original
parameter values are kept intact somewhere for future parsing of the
procedure call stack. The live input registers other than %i0 are not
touched. %i0 itself, which is also used to return a value, is copied into
a free local register, and its location is noted in the symbol file. (You
can find out where variables reside by using the "info address" command in
GDB.)

Given that much of the semantics relating to stack handling and procedure
call entry/exit code is only a recommended convention and is not enforced
by the hardware, debuggers will sometimes be fooled. For example, the
decision as to whether the current procedure is a leaf procedure can be
incorrect. In this case a spurious procedure appears in the call chain
between the current procedure and its "real" parent. Another example is
when the application maintains its own implicit call hierarchy, such as
jumping to function pointers. In this case the debugger can easily become
totally confused.

The window overflow and underflow traps
---------------------------------------

When a SAVE instruction would decrement the current window pointer (CWP)
so that it coincides with the invalid window in the window invalid mask
(WIM), a window overflow trap occurs instead. Conversely, when a RESTORE or
RETT instruction would increment the CWP to coincide with the invalid
window, a window underflow trap occurs.

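The two trap conditions can be written out in C. This is a minimal model, not
RTEMS code. It assumes an implementation with 8 register windows (as the
handlers in Figure 9 and Figure 10 do) and represents the WIM as an 8-bit mask
in which a set bit i marks window i as invalid. SAVE moves to window
(CWP - 1) modulo 8, RESTORE and RETT move to window (CWP + 1) modulo 8, and the
trap is taken when the WIM bit of that target window is set. The helper names
are hypothetical.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define NWINDOWS 8u  /* assumption: 8 windows, as in Figures 9 and 10 */

    /* Window a SAVE would move to: CWP is decremented modulo NWINDOWS. */
    unsigned save_target(unsigned cwp)
    {
      assert(cwp < NWINDOWS);
      return (cwp + NWINDOWS - 1u) % NWINDOWS;
    }

    /* Window a RESTORE or RETT would move to: CWP is incremented. */
    unsigned restore_target(unsigned cwp)
    {
      assert(cwp < NWINDOWS);
      return (cwp + 1u) % NWINDOWS;
    }

    /* True if a SAVE executed in window cwp raises window_overflow. */
    bool save_overflows(unsigned cwp, uint8_t wim)
    {
      return ((wim >> save_target(cwp)) & 1u) != 0;
    }

    /* True if a RESTORE executed in window cwp raises window_underflow. */
    bool restore_underflows(unsigned cwp, uint8_t wim)
    {
      return ((wim >> restore_target(cwp)) & 1u) != 0;
    }

For example, with only window 1 marked invalid (WIM = 0x02), a SAVE executed in
window 2 traps and one executed in window 3 does not. A SAVE executed in
window 0 wraps around to window 7.
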
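Either trap is handled by the operating system. Generally, on a window
overflow the registers of the oldest window are written out to memory (its
stack frame), on a window underflow they are read back from memory, and in
both cases the WIM register is updated to reflect the new set of valid
windows.
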
The code in Figure 9 and Figure 10 below shows bare-bones handlers for
the two traps. The text is taken directly from the source code. These are
minimal handlers for SPARC V8: they assume exactly 8 register windows and
do not validate the stack pointer before storing or loading registers.
Note that there is no way to directly access window registers other than
the current one, hence the code executes additional SAVE and RESTORE
instructions. The code is tricky to understand, but Figure 1 should help.

.. code-block:: asm

            /* a SAVE instruction caused a trap;
             * %l1/%l2 hold the PC/nPC of the trapping SAVE */
    window_overflow:
            /* rotate WIM one bit right, we have 8 windows */
            mov %wim,%l3
            sll %l3,7,%l4
            srl %l3,1,%l3
            or  %l3,%l4,%l3
            and %l3,0xff,%l3

            /* disable WIM traps so the SAVE below cannot trap;
             * the nops cover the delayed write to WIM */
            mov %g0,%wim
            nop; nop; nop

            /* point to correct window: the oldest one, to be spilled */
            save

            /* dump its locals and ins to its stack frame */
            std %l0, [%sp +  0]
            std %l2, [%sp +  8]
            std %l4, [%sp + 16]
            std %l6, [%sp + 24]
            std %i0, [%sp + 32]
            std %i2, [%sp + 40]
            std %i4, [%sp + 48]
            std %i6, [%sp + 56]

            /* back to the trap window */
            restore

            /* set new value of WIM */
            mov %l3,%wim
            nop; nop; nop

            /* go home: re-execute the trapping SAVE */
            jmp %l1
            rett %l2

.. comment Figure 9 - window_overflow trap handler

.. code-block:: asm

            /* a RESTORE instruction caused a trap;
             * %l1/%l2 hold the PC/nPC of the trapping RESTORE */
    window_underflow:
            /* rotate WIM one bit LEFT, we have 8 windows */
            mov %wim,%l3
            srl %l3,7,%l4
            sll %l3,1,%l3
            or  %l3,%l4,%l3
            and %l3,0xff,%l3

            /* disable WIM traps so the RESTOREs below cannot trap;
             * the nops cover the delayed write to WIM */
            mov %g0,%wim
            nop; nop; nop

            /* point to correct window: two windows up from the
             * trap window, i.e. the window being restored */
            restore
            restore

            /* restore its locals and ins from its stack frame */
            ldd [%sp +  0], %l0
            ldd [%sp +  8], %l2
            ldd [%sp + 16], %l4
            ldd [%sp + 24], %l6
            ldd [%sp + 32], %i0
            ldd [%sp + 40], %i2
            ldd [%sp + 48], %i4
            ldd [%sp + 56], %i6

            /* back to the trap window */
            save
            save

            /* set new value of WIM */
            mov %l3,%wim
            nop; nop; nop

            /* go home: re-execute the trapping RESTORE */
            jmp %l1
            rett %l2

.. comment Figure 10 - window_underflow trap handler

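
The WIM updates in the two handlers are plain 8-bit rotations. The sketch
below transcribes them into C instruction by instruction and drives them from
a simplified model of nested calls and returns. The model is an illustration
only. It tracks just CWP, WIM and the number of traps, and does not model
register contents or the actual stores and loads. It also assumes 8 windows,
and its starting state (executing in window 0 with window 1 marked invalid) is
an arbitrary choice for the example. All names are hypothetical.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    #define NWINDOWS 8u  /* assumption: 8 windows, as hard-coded in the handlers */

    /* window_overflow: rotate WIM one bit right, step by step as in Figure 9. */
    uint8_t wim_after_overflow(uint8_t wim)
    {
      uint32_t l3 = wim;            /* mov %wim,%l3     */
      uint32_t l4 = l3 << 7;        /* sll %l3,7,%l4    */
      l3 = l3 >> 1;                 /* srl %l3,1,%l3    */
      l3 = l3 | l4;                 /* or  %l3,%l4,%l3  */
      return (uint8_t)(l3 & 0xffu); /* and %l3,0xff,%l3 */
    }

    /* window_underflow: rotate WIM one bit left, step by step as in Figure 10. */
    uint8_t wim_after_underflow(uint8_t wim)
    {
      uint32_t l3 = wim;            /* mov %wim,%l3     */
      uint32_t l4 = l3 >> 7;        /* srl %l3,7,%l4    */
      l3 = l3 << 1;                 /* sll %l3,1,%l3    */
      l3 = l3 | l4;                 /* or  %l3,%l4,%l3  */
      return (uint8_t)(l3 & 0xffu); /* and %l3,0xff,%l3 */
    }

    /* Simplified machine state: only CWP, WIM and trap counters. */
    struct window_state {
      unsigned cwp;
      uint8_t wim;
      unsigned overflows;
      unsigned underflows;
    };

    /* SAVE: on a trap, the handler spills the oldest resident window and
     * rotates WIM right; the SAVE is then re-executed and succeeds. */
    static void model_save(struct window_state *s)
    {
      unsigned next = (s->cwp + NWINDOWS - 1u) % NWINDOWS;
      if (s->wim & (1u << next)) {
        s->overflows++;
        s->wim = wim_after_overflow(s->wim);
      }
      s->cwp = next;
    }

    /* RESTORE: on a trap, the handler refills the target window from the
     * stack and rotates WIM left; the RESTORE is then re-executed. */
    static void model_restore(struct window_state *s)
    {
      unsigned next = (s->cwp + 1u) % NWINDOWS;
      if (s->wim & (1u << next)) {
        s->underflows++;
        s->wim = wim_after_underflow(s->wim);
      }
      s->cwp = next;
    }

    /* Run `depth` nested calls (SAVEs) followed by the matching returns
     * (RESTOREs), starting in window 0 with window 1 marked invalid. */
    struct window_state run_call_chain(unsigned depth)
    {
      struct window_state s = { 0u, 0x02u, 0u, 0u };
      for (unsigned i = 0; i < depth; i++)
        model_save(&s);
      for (unsigned i = 0; i < depth; i++)
        model_restore(&s);
      assert(s.cwp == 0u && s.wim == 0x02u);  /* back to the initial state */
      return s;
    }

Starting from that state, six nested calls fit in the register windows without
any trap. For run_call_chain(10), the 7th to 10th SAVEs each cause one
window_overflow trap and the last four RESTOREs each cause one window_underflow
trap. After that, CWP and WIM are back where they started.
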