#2270 closed enhancement (fixed)

SPARC: Optimized floating-point context handling

Reported by: Sebastian Huber
Owned by: Sebastian Huber
Priority: normal
Milestone: 4.11
Component: score
Version: 4.11
Severity: normal
Keywords: SPARC
Cc: Alexander Krutwig
Blocked By:
Blocking:

Description

Benefit

Improved average-case performance. Uni-processor configurations will equally benefit in case the deferred floating-point switch is retained.

Problem Description

The floating-point context switch implementation is suboptimal on RTEMS/SPARC.

SPARC ABI Considerations

The set of registers of a processor is divided into three groups by the application binary interface (ABI):

  • the volatile registers (also known as caller saved),
  • the non-volatile registers (also known as callee saved), and
  • other registers.

On RTEMS a context switch is performed via a function call (_CPU_Context_switch()), thus the context switch must save/restore the non-volatile registers. The interrupt entry/exit must save/restore the volatile registers to/from the stack of the interrupted thread. The SYSTEM V APPLICATION BINARY INTERFACE, SPARC Processor Supplement, Third Edition declares all floating point registers as volatile.

Currently the floating point context is saved/restored during the context switch on SPARC. The interrupt entry/exit doesn't deal with the floating point unit at all. This makes no sense with respect to the ABI. It seems to be an optimization for uni-processor systems and applications with only a few floating point threads.

No Deferred Floating Point Context Switch on SMP

RTEMS supports the so-called deferred floating point context switch. This is an anachronism dating back to times with primitive compilers and processors. Modern compilers on modern processors which provide direct register moves to/from floating point registers will use floating point and vector units for all sorts of purposes, thus the differentiation between floating point and integer context is mostly obsolete. On SPARC, floating point registers must be loaded from memory, so it makes no sense to move integer variables into floating point registers to reduce register pressure. SPARC GCC therefore seems to use the floating point unit only for floating point operations, so here this model is still valid. The deferred floating point context switch is an optimization for applications in which the floating point threads are the minority. The floating point unit is owned by at most one thread and a floating point context switch is deferred until a context switch happens to a floating point thread other than the owner of the floating point unit. On RTEMS/SPARC, interrupts are not allowed to use the floating point unit at all.

WARNING: Currently the floating point unit is not disabled on interrupt entry (PSR[EF]). Thus interrupt handlers using the floating point unit may destroy the floating point context of the interrupted thread silently.

This seemed to work well on RTEMS in the last couple of years, although from time to time questions pop up on the RTEMS mailing list with respect to floating point support on SPARC. With SMP the situation changed though. In the early days of RTEMS SMP support, the deferred floating point context switch was disabled.

Let's suppose we want to use deferred floating point context switches on SMP. Consider the following scenario. An executing thread T owning the floating point unit is interrupted and a context switch to a non-floating point thread happens. In this case the floating point context is not saved. Now the thread T resumes execution on another processor (thread migration). How do we get access to its floating point context on the other processor? We would need inter-processor interrupts, synchronization, etc., making this way too expensive.
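The migration scenario can be replayed with a small C model. This is only a sketch under simplifying assumptions: a single `double` stands in for the whole floating point register file, and the names `model_thread`, `model_cpu`, `deferred_switch_to()` and `migration_preserves_fp()` are hypothetical and not part of RTEMS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not RTEMS code. */
typedef struct {
  double fp_saved; /* FP context as stored in the thread control block */
  bool   is_fp;    /* thread has the floating point attribute */
} model_thread;

typedef struct {
  double        fpu;      /* live FPU state of this processor */
  model_thread *fp_owner; /* thread whose context currently lives in fpu */
} model_cpu;

/* Deferred switch: touch the FPU only if an FP thread other than the
   current owner becomes the heir. */
static void deferred_switch_to(model_cpu *cpu, model_thread *heir)
{
  if (!heir->is_fp || cpu->fp_owner == heir) {
    return; /* the save of the owner's context stays deferred */
  }
  if (cpu->fp_owner != NULL) {
    cpu->fp_owner->fp_saved = cpu->fpu; /* late save of the old owner */
  }
  cpu->fpu = heir->fp_saved;
  cpu->fp_owner = heir;
}

/* Returns true if T sees its own FP value after migrating from CPU 0 to
   CPU 1 while a non-FP thread took over CPU 0. */
static bool migration_preserves_fp(void)
{
  model_thread t = { .fp_saved = 0.0, .is_fp = true };
  model_thread integer_only = { .fp_saved = 0.0, .is_fp = false };
  model_cpu cpu0 = { .fpu = 0.0, .fp_owner = NULL };
  model_cpu cpu1 = { .fpu = 0.0, .fp_owner = NULL };

  deferred_switch_to(&cpu0, &t);
  cpu0.fpu = 3.5;                           /* T computes on CPU 0 */
  deferred_switch_to(&cpu0, &integer_only); /* T is preempted, no FP save */
  deferred_switch_to(&cpu1, &t);            /* T resumes on CPU 1 */

  assert(cpu0.fp_owner == &t); /* the context is still stuck in CPU 0 */
  return cpu1.fpu == 3.5;
}
```

In this model `migration_preserves_fp()` returns false: CPU 1 loads the stale copy from the thread control block, while the up-to-date value is still held by the FPU of CPU 0.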

Problem Solution

To meet the ABI requirements, the floating point context switch must move from the context switch to the interrupt entry/exit sequence. It is only necessary to save/restore the floating point context in case a context switch is triggered due to interrupt processing.

On uni-processor configurations it is possible to implement a deferred floating point context save/restore here as well.

To get rid of potentially silent corruptions of the floating point context by interrupt handlers, there are three alternative solutions:

  • clear the PSR[EF] bit at interrupt entry and save/restore the floating point context only in case a thread dispatch is necessary,
  • save/restore the floating point context of the interrupted thread at interrupt entry/exit, or
  • use a lazy floating point context save, e.g. clear PSR[EF] and reserve space for the floating point context on interrupt entry, then save/restore the floating point context on demand.

The second solution would enable interrupt handlers to use floating point operations, but increases the interrupt latency.

The third solution would enable interrupt handlers to use floating point operations and may have less impact on the interrupt latency compared to the second alternative. The implementation is a bit more complex.
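For comparison, the first solution can be sketched as a small C model. It is an illustration under stated assumptions, not the RTEMS implementation: `irq_cpu_model` and `model_interrupt()` are hypothetical names, the counters replace the real register save/restore code, and only the position of the EF bit (bit 12 of the PSR) is taken from the SPARC V8 architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PSR_EF (UINT32_C(1) << 12) /* enable floating-point bit of the PSR */

/* Hypothetical model of one processor. */
typedef struct {
  uint32_t psr;         /* model of the processor state register */
  unsigned fp_saves;    /* number of FP context saves performed */
  unsigned fp_restores; /* number of FP context restores performed */
} irq_cpu_model;

/* Interrupt entry/exit following the first solution: the handlers run with
   the FPU disabled and the FP context of the interrupted thread is only
   saved/restored if a thread dispatch is necessary. */
static void model_interrupt(irq_cpu_model *cpu, bool dispatch_necessary)
{
  uint32_t interrupted_psr = cpu->psr;

  cpu->psr &= ~PSR_EF; /* entry: an FP instruction in a handler now traps */
  /* ... the interrupt handlers would run here ... */
  assert((cpu->psr & PSR_EF) == 0);

  if (dispatch_necessary && (interrupted_psr & PSR_EF) != 0) {
    ++cpu->fp_saves;    /* save the volatile FP registers */
    /* ... thread dispatch, other threads may use the FPU ... */
    ++cpu->fp_restores; /* restore them before returning to the thread */
  }

  cpu->psr = interrupted_psr; /* exit: the thread gets its EF state back */
}
```

An interrupt without a thread dispatch leaves both counters untouched in this model, which is where the average-case benefit comes from.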

In order to test the low-level code, the _CPU_Context_volatile_clobber() and _CPU_Context_validate() functions should be implemented on SPARC. The test program SPCONTEXT 1 uses these functions to test the context switch and interrupt entry/exit code.

Change History (21)

comment:1 Changed on 02/17/15 at 15:38:31 by Gedare Bloom

I'd lean toward the first solution and a big red warning about using fp in isr context. This only works if gcc does not emit fp code though. If a user needs fp in isr context, they can do the save/restore manually?

comment:2 in reply to:  1 Changed on 02/17/15 at 21:43:54 by Chris Johns

Replying to gedare:

This only works if gcc does not emit fp code though. If a user needs fp in isr context, they can do the save/restore manually?

If the PSR[EF] is disabled in interrupts the presence of fp code should trigger an exception so the user is notified. A user should have suitable coverage analysis to know they have tested their code.

What happens when fp code is present? What if the compiler has changed, or code they need now contains fp instructions?

Would it help to provide an option in the OS that supports the save/restore instructions in a tested and controlled manner?

comment:3 Changed on 02/18/15 at 14:23:53 by Sebastian Huber

Cc: Alexander Krutwig added
Status: new → accepted

comment:4 Changed on 02/18/15 at 14:29:31 by Sebastian Huber

I think nobody needs FP code in ISRs, but sometimes a compiler/library plagues you with some FP code in unexpected situations. Option (2) is the only one I would not choose. On SPARC I tend to implement (1). On PowerPC for the FPU and AltiVec I tend to use (3). The problem with (3) is that it works silently in the background and you may only notice changes if you get strange timing problems (hard to test in a test suite) after a GCC update for example.

comment:5 in reply to:  4 Changed on 02/18/15 at 23:07:16 by Chris Johns

Replying to sebastian.huber:

The problem with (3) is that it works silently in the background and you may only notice changes if you get
strange timing problems (hard to test in a test suite) after a GCC update for example.

Does the kernel know? Would adding stats help?

comment:6 Changed on 02/19/15 at 14:15:35 by Cláudio Silva

I guess 2) is overkill. In 1), if you really need to have FP, you can try to use soft-fp. It will make your WCET "worse", but it should be better than having to context switch FP registers in every interrupt. Anyway, you are always at risk of GCC emitting FP instructions within RTEMS' code, but since you have PSR[EF] disabled you can easily trace the fault/trap.
Between 1) and 3), 1) is simpler. Besides, I think the worst case behaviour will be similar between these two options.

comment:7 Changed on 02/20/15 at 08:48:05 by Sebastian Huber

The hard and soft FP ABIs are incompatible on SPARC, so you must be very careful if you mix modules.

One difference between (1) and (3) is that in (1) the FP save is done with interrupts disabled and in (3) interrupts are enabled.

comment:8 Changed on 02/26/15 at 10:50:07 by Daniel Hellstrom

As I recall, GCC-4.0/4.1 and possibly also 4.2 had a known bug generating FP instructions in integer code. In the Gaisler toolchain we made a fix for it, but the mainline does not have this issue any more. As you pointed out, it is not efficient on a SPARC to use FP registers in integer code anyway.

OAR and Gaisler analysis during the early SMP work came to the same conclusion. If I remember correctly, Linux has also disabled lazy switching on SMP.

I agree with you about the ABI and that FP registers do not have to be saved in the normal case, unless a task is interrupted and context switched. I think you could make some significant improvements here.

Personally I think it is not that big a deal to analyse the ISR to verify that FP instructions are not used. If they are used, the user shall save the FP context itself by calling _CPU_Context_save/restore_fp(). As I understand it, most LEON OSes have this design. The RTEMS documentation about this should probably be improved. This approach is in practice a bit troublesome since in many cases it would require an extra function stack frame. If an ISR uses float types but calls _CPU_Context_save() one cannot know in which order GCC performs the float variable initialization and the function call. So in practice one has to jump to the ISR, call the FP save, then call the real ISR implementation doing the FP instructions. Perhaps this can be avoided? It would have been better if the trap handler saved the FP context before calling the ISR handler. Could we introduce a RTEMS_FLOATING_POINT option to rtems_interrupt_handler_install()?

RTEMS invites the usage of mixed ABIs by having the RTEMS_FLOATING_POINT option when creating tasks. Of course one has to be very careful when mixing ABIs; I would recommend at least scanning the binaries for float instructions. When verifying code you get a trap at the first occurrence, so normally code coverage is enough to be sure. So the float problem is not limited to the ISRs.

(3)
I think lazy context switching becomes less important when you implement a proper context switch taking the ABI into account as you describe. The FPU context is only one register, the FSR, when the FPU is turned on. So one might just save that one register, right? The case where lazy would be beneficial is when interrupting a task?

(2)
This is costly in the average case? My guess is that 99% of ISRs don't use floats and most tasks do not use an FP context. For the sake of average performance, wouldn't it be better to let the ISR handle the FP context itself or add a RTEMS_FLOATING_POINT option to the interrupt install routine?

(1)
I think it is problematic to disable PSR.EF on entry. We should keep in mind that it is CPU implementation specific what happens when PSR.EF is cleared. Turning off the FPU could cause a power-down mode or cause FPU operations to be paused. In a modern FPU the FPU performs operations in parallel with the integer pipeline, so turning it off could actually have a negative impact on the interrupted task. Therefore it is important to store the FSR register to memory to wait for all FP operations to complete before turning the FPU off; this would introduce a performance loss and potentially a worst-case nightmare? Otherwise I like the idea of saving the FP context only when an FP task is interrupted and context switched.

Storing all the FPU registers to the stack or storing the FSR register waits until FPU operations are completed. Storing the FSR last is best since you could potentially store an FP register that does not depend on an ongoing operation.

What about this for single-core and SMP (4):

  • FP TASK context switch: only save FSR and disable PSR.EF on normal context switch. Then we would wait for ongoing operations and the ABI ensures FP registers need not be saved.
  • interrupted FP TASK with PSR.EF=1 context switch: save FP registers, then FSR and clear PSR.EF.
  • ISRs can be marked with RTEMS_FLOATING_POINT on registration.
  • interrupts that have one or more ISRs marked FP: save FP context if PSR.EF=1, but leave PSR.EF=0 on interrupt exit to fall into FP_disabled trap. We must take extra care of nested interrupts here, not to overwrite the TCB FP context?

To avoid the problem with clobbering the FP context, you can have RTEMS_FLOATING_POINT enabled in the default interrupt handler options. Personally I still think that the user should be responsible for saving the FP context. That could perhaps also be possible in this configuration by marking the ISR as non-FP context and then handling it itself as we do today?

What do you think?

comment:9 Changed on 02/27/15 at 01:18:08 by Chris Johns

Thanks Daniel for the detailed update. A few observations:

I think any required user interaction needs a formal API that is supported from release to release. We need to be certain of the path we take to ensure it is maintained and does not break user code from release to release. I am not sure what effect, if any, this may have across RTEMS and other architectures. I suppose this means calls to _CPU_* etc. should not be recommended and should be wrapped somehow.

I am not entirely comfortable with the user being responsible for the audit process to determine the use of FP instructions. This fragments the effort across all LEON users, possibly creating varying methods with, I suspect, varying levels of success. Users would like an operating system to manage the tricky detail, meaning it should provide some support for users in this area if users are required to perform this task. It is complex to carefully audit code and get it correct. If this approach is taken, should RTEMS look at how to perform the audit?

Last edited on 02/27/15 at 01:19:08 by Chris Johns

comment:10 Changed on 03/02/15 at 08:50:32 by Sebastian Huber

From the SPARC Architecture Manual Version 8 we have:

"PSR_enable_floating-point (EF)

Bit 12 determines whether the FPU is enabled. If disabled, a floating-point
instruction will trap. 1 = enabled, 0 = disabled. If an implementation does not
support a hardware FPU, PSR.EF should always read as 0 and writes to it should
be ignored.

Programming Note

Software can use the EF and EC bits to determine whether a particular process uses the FPU or CP.
If a process does not use the FPU/CP, its registers do not need to be saved across a context switch."

Where is it documented that clearing PSR[EF] may have side-effects like a power-down mode or a pause of FP instructions? Do the LEON processors do such things? Is an interrupt not context synchronizing on SPARC (like on PowerPC for example)?

The PSR[EF] is only set in case SPARC_HAS_FPU == 1 and the thread has the FP attribute. So we already touch this bit during a context switch. Why are the interrupts different?

comment:11 Changed on 03/02/15 at 09:22:47 by Daniel Hellstrom

I guess the STFSR instruction is in a way what PPC calls context synchronizing. I don't think interrupts are different from the original context switch, but we want to reach the ISR as fast as possible and not wait for FPU operations that the user has queued. That would have an impact on the worst-case scenario. I can't see a strict definition of what happens when the floating point unit is turned off, just that following FP instructions shall result in a trap.

page 97 - STFSR:
...
The store floating-point state register instruction (STFSR) waits for any
concurrently executing FPop instructions that have not completed to complete,
and then writes the FSR into memory. STFSR may zero FSR.ftt after
writing the FSR to memory.
...

comment:12 Changed on 03/02/15 at 09:26:12 by Daniel Hellstrom

The LEON4-N2X and GR740 put part of the FPU into power-down when PSR.EF is cleared.

comment:13 Changed on 03/02/15 at 12:13:37 by Sebastian Huber

Are the FPU instructions really that slow? I would disable the PSR.EF in the _ISR_Handler at

dont_fix_pil:

or %g5, SPARC_PSR_PIL_MASK, %g5

pil_fixed:

wr %g5, SPARC_PSR_ET_MASK, %psr ! ENABLE TRAPS

So why not enable traps and disable EF in one step?
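One detail matters for doing both in one step: on SPARC V8 the `wr` instruction writes the exclusive-or of its two source operands to the destination state register. The following C sketch models this. The bit positions (ET is bit 5, PIL is bits 8 to 11, EF is bit 12) are taken from the SPARC V8 PSR layout, while `wr_psr()` and `enable_traps_disable_fpu()` are hypothetical helper names, and clearing EF in the source value before the write is only one possible way to do it, not necessarily what was committed.

```c
#include <assert.h>
#include <stdint.h>

#define PSR_ET  (UINT32_C(1) << 5)   /* enable traps */
#define PSR_PIL (UINT32_C(0xf) << 8) /* processor interrupt level */
#define PSR_EF  (UINT32_C(1) << 12)  /* enable floating-point */

/* Model of "wr rs1, operand, %psr": the PSR receives rs1 XOR operand. */
static uint32_t wr_psr(uint32_t rs1, uint32_t operand)
{
  return rs1 ^ operand;
}

/* Enable traps and disable the FPU with a single PSR write. */
static uint32_t enable_traps_disable_fpu(uint32_t psr_in_trap)
{
  /* Traps are disabled (ET == 0) while the trap handler starts up, so the
     XOR with the ET mask sets ET. */
  assert((psr_in_trap & PSR_ET) == 0);

  uint32_t g5 = psr_in_trap & ~PSR_EF; /* clear EF before the write */

  return wr_psr(g5, PSR_ET);
}
```

In this model the resulting PSR value has ET set, EF cleared and the PIL field unchanged.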

comment:14 Changed on 03/03/15 at 06:50:25 by Daniel Hellstrom

Haven't calculated what the worst case is, perhaps it is not that much. You can probably get that from the GRFPU manual.

It seems like a good idea to do it at the same time. I still think one should store FSR before disabling FPU, but of course if the execution time of the trap handler is longer than what can be queued in the FPU pipeline it shouldn't be a problem...

comment:15 Changed on 05/21/15 at 08:26:22 by Alexander Krutwig <alexander.krutwig@…>

In 1c59cad4aa40238c1aa3085adf3d079114c077b8/rtems:

sparc: Add support for sptests/spcontext01

Implement _CPU_Context_validate() and _CPU_Context_volatile_clobber().

Update #2270.

comment:16 Changed on 05/29/15 at 13:37:13 by Alexander Krutwig <alexander.krutwig@…>

In 4a5a45045a2068b87cc6e7801b0147c853d165b3/rtems:

sparc: Improve _CPU_Context_validate()

Write the pattern only once to the entry register window and the
floating point registers.

Update #2270.

comment:17 Changed on 05/30/15 at 14:50:51 by Alexander Krutwig <alexander.krutwig@…>

In 2764bd43d0398be14db6930736a314a01904a072/rtems:

sparc: Disable FPU in interrupt context

Update #2270.

comment:19 Changed on 06/09/15 at 07:35:10 by Sebastian Huber

Resolution: fixed
Status: accepted → closed

[a51b3526eac244db59ccdf582e1921ddcd969e5c/rtems]

PSR.EF is cleared for the interrupt handlers. On SMP configurations the FP context is saved/restored around the _Thread_Dispatch() call in the interrupt exit code.

On uni-processor configurations post-switch actions (e.g. signal handlers) and context switch extensions may silently corrupt the floating point context.

comment:20 Changed on 06/09/15 at 07:35:19 by Sebastian Huber

Milestone: 4.11.1 → 4.11

comment:21 Changed on 02/03/17 at 09:57:55 by Sebastian Huber <sebastian.huber@…>

In e2b1c47d8c48c351a3f91c5379a269f8942da34a/rtems:

sparc: Fix volatile clobber

Do not adjust the stack pointer, since this is already done by the
restore instruction.

Update #2270.
