#4923 accepted defect

FPU context init/switch not working well on more than 2 tasks on Cortex-Mx/ARMv7-M platform

Reported by: kgardas Owned by: Cedric Berger
Priority: normal Milestone: 6.1
Component: arch/arm Version: 6
Severity: major Keywords:
Cc: cedric@… Blocked By:
Blocking:

Description

While working on stm32h7, I've noticed strange results from FPU calculations when several tasks are involved. I've minimized the problem to the attached testcase. The testcase runs correctly on Xilinx/A9/QEMU and BeagleBone White platforms (ARMv7-A), but produces wrong output on ARMv7-M (STM32H757i-eval).

The core of the testcase compares powers of two computed on integers by shifting with the same values computed by calling the C pow() function on doubles. When the two disagree, an error is signaled and counted. The output then prints the number of test runs/number of failures for Init and for each of the 4 other tasks created. The first comparison error in each task is also signaled with a clear 'ERROR' message. The output from H7 looks like this:

initializing...
main task infinite loop.
ERROR, computation failed in fpu_task_pow2
ERROR, computation failed in fpu_task_pow3
ERROR, computation failed in fpu_task_pow4
ERROR, computation failed in Init
INIT: 1000/1000, T1: 38985/0, T2: 39802/39802, T3: 39735/39735, T4: 39804/39804
INIT: 2000/1000, T1: 78030/0, T2: 79663/79663, T3: 79508/79508, T4: 79650/79650
INIT: 3000/1000, T1: 117100/0, T2: 119517/119517, T3: 119281/119281, T4: 119513/119513
INIT: 4000/2000, T1: 156138/0, T2: 159385/159385, T3: 159054/159054, T4: 159355/159355
INIT: 5000/2000, T1: 195198/0, T2: 199242/199242, T3: 198827/198827, T4: 199195/199195
INIT: 6000/2000, T1: 234238/0, T2: 239111/239111, T3: 238599/238599, T4: 239054/239054
INIT: 7000/2000, T1: 273278/0, T2: 278961/278961, T3: 278372/278372, T4: 278903/278903
INIT: 8000/2000, T1: 312315/0, T2: 318807/318807, T3: 318145/318145, T4: 318740/318740
INIT: 9000/2000, T1: 351353/0, T2: 358628/358628, T3: 357918/357918, T4: 358600/358600
INIT: 10000/2000, T1: 390393/0, T2: 398463/398463, T3: 397691/397691, T4: 398444/398444
INIT: 11000/2000, T1: 429431/0, T2: 438302/438302, T3: 437464/437464, T4: 438279/438279
INIT: 12000/2000, T1: 468470/0, T2: 478152/478152, T3: 477236/477236, T4: 478123/478123
INIT: 13000/2000, T1: 507506/0, T2: 518006/518006, T3: 517009/517009, T4: 517961/517961 
...

As you can see, computation is always good (0 errors) on task T1, while it is always bad on tasks T2, T3 and T4. Sometimes it is even bad on the Init task.

The expected output, from BeagleBone White, looks like this:

initializing...
main task infinite loop.
INIT: 1000/0, T1: 8028/0, T2: 8042/0, T3: 8043/0, T4: 8043/0
INIT: 2000/0, T1: 16053/0, T2: 16084/0, T3: 16084/0, T4: 16085/0
INIT: 3000/0, T1: 24076/0, T2: 24125/0, T3: 24125/0, T4: 24126/0
INIT: 4000/0, T1: 32097/0, T2: 32163/0, T3: 32165/0, T4: 32167/0
INIT: 5000/0, T1: 40120/0, T2: 40204/0, T3: 40207/0, T4: 40208/0
...

My suspicion is that task context initialization or context switching is not working correctly on the ARMv7-M platform.

Attachments (5)

init.c (5.7 KB) - added by kgardas on 07/07/23 at 09:48:03.
testcase code
init.2.c (5.7 KB) - added by kgardas on 07/07/23 at 10:19:26.
test case code with reference output
bug-1-make-paranoia-test-fail.patch (1.3 KB) - added by Cedric Berger on 02/06/24 at 15:04:10.
VFP: make paranoia test fail
fix-1-save-and-restore-fpscr.patch (1.8 KB) - added by Cedric Berger on 02/06/24 at 15:04:50.
VFP: save and restore FPSCR register during context switch
fix-2-clean-initial-thread-frame.patch (802 bytes) - added by Cedric Berger on 02/06/24 at 15:05:30.
VFP: starts thread frames in a clean state


Change History (18)

Changed on 07/07/23 at 09:48:03 by kgardas

Attachment: init.c added

testcase code

Changed on 07/07/23 at 10:19:26 by kgardas

Attachment: init.2.c added

test case code with reference output

comment:1 Changed on 07/07/23 at 10:23:37 by kgardas

Testing on nucleo-h743zi (the closest board I have here to the original stm32h743-eval platform) reveals the same results as on stm32h757i-eval, although it takes longer to get a failure on Init:

initializing...
main task infinite loop.
ERROR, computation failed in fpu_task_pow2
ERROR, computation failed in fpu_task_pow3
ERROR, computation failed in fpu_task_pow4
INIT: 1000/0, T1: 15085/0, T2: 15336/15336, T3: 15351/15351, T4: 15356/15356
INIT: 2000/0, T1: 30162/0, T2: 30685/30685, T3: 30716/30716, T4: 30727/30727
INIT: 3000/0, T1: 45239/0, T2: 46035/46035, T3: 46080/46080, T4: 46098/46098
INIT: 4000/0, T1: 60318/0, T2: 61384/61384, T3: 61429/61429, T4: 61468/61468
INIT: 5000/0, T1: 75419/0, T2: 76733/76733, T3: 76794/76794, T4: 76839/76839
INIT: 6000/0, T1: 90496/0, T2: 92082/92082, T3: 92159/92159, T4: 92210/92210
INIT: 7000/0, T1: 105575/0, T2: 107432/107432, T3: 107524/107524, T4: 107581/107581
INIT: 8000/0, T1: 120650/0, T2: 122799/122799, T3: 122888/122888, T4: 122952/122952
INIT: 9000/0, T1: 135725/0, T2: 138167/138167, T3: 138252/138252, T4: 138322/138322
INIT: 10000/0, T1: 150799/0, T2: 153535/153535, T3: 153617/153617, T4: 153693/153693
INIT: 11000/0, T1: 165874/0, T2: 168903/168903, T3: 168981/168981, T4: 169064/169064
INIT: 12000/0, T1: 180947/0, T2: 184253/184253, T3: 184346/184346, T4: 184435/184435
INIT: 13000/0, T1: 196021/0, T2: 199621/199621, T3: 199711/199711, T4: 199805/199805
INIT: 14000/0, T1: 211114/0, T2: 214989/214989, T3: 215075/215075, T4: 215176/215176
INIT: 15000/0, T1: 226187/0, T2: 230357/230357, T3: 230440/230440, T4: 230547/230547
INIT: 16000/0, T1: 241261/0, T2: 245726/245726, T3: 245805/245805, T4: 245918/245918
INIT: 17000/0, T1: 256335/0, T2: 261094/261094, T3: 261169/261169, T4: 261289/261289
INIT: 18000/0, T1: 271409/0, T2: 276463/276463, T3: 276534/276534, T4: 276659/276659
ERROR, computation failed in Init
INIT: 19000/1000, T1: 286484/0, T2: 291832/291832, T3: 291899/291899, T4: 292030/292030
INIT: 20000/1000, T1: 301565/0, T2: 307199/307199, T3: 307264/307264, T4: 307398/307398
INIT: 21000/1000, T1: 316637/0, T2: 322550/322550, T3: 322628/322628, T4: 322765/322765

comment:2 Changed on 07/07/23 at 13:15:38 by Sebastian Huber

Does spcontext01 check the floating-point registers on this BSP, and what is the outcome of this test?

comment:3 Changed on 07/07/23 at 13:25:38 by kgardas

Here is the output of spcontext01 on nucleo-h743zi:

*** BEGIN OF TEST SPCONTEXT 1 ***
*** TEST VERSION: 6.0.0.6264b14804bbe21f13d4691160b45e208286abaa
*** TEST STATE: EXPECTED_PASS
*** TEST BUILD:
*** TEST TOOLS: 12.2.1 20230224 (RTEMS 6, RSB 7153c2f1dcfb83b154b976298699c26e793a33dd, Newlib 17ac400)
Test configuration N N N... done
Test configuration N N F... done
Test configuration N F N... done
Test configuration N F F... done
Test configuration F N N... done
Test configuration F N F... done
Test configuration F F N... done
Test configuration F F F... done

*** END OF TEST SPCONTEXT 1 ***


[ RTEMS shutdown ]
RTEMS version: 6.0.0.6264b14804bbe21f13d4691160b45e208286abaa
RTEMS tools: 12.2.1 20230224 (RTEMS 6, RSB 7153c2f1dcfb83b154b976298699c26e793a33dd, Newlib 17ac400)
executing thread ID: 0x0a010001
executing thread name: UI1 

Let me know if there is anything I should test here. I have some spare time at my disposal, but I'd need a push in the right direction. I'm currently studying Cortex-M lazy stacking and context switching, and I have ordered the RTEMS book as a backup; it will probably arrive next week. I'll also provide a few register dumps for your information...

comment:4 Changed on 07/07/23 at 14:05:26 by Sebastian Huber

Maybe we don't save a status register related to the FPU.
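To illustrate why a missing FPSCR save would matter, here is a host-side model, not RTEMS code: `vfp_context`, `hw_fpscr`, `vfp_switch()` and `fpscr_seen_by_b()` are hypothetical. FPSCR holds per-thread control bits such as flush-to-zero, so if the switch code saves only the data registers, a mode set by one thread leaks into the next one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FPSCR_FZ (UINT32_C(1) << 24) /* flush-to-zero control bit */

/* Hypothetical saved VFP state of one thread (not the RTEMS layout). */
typedef struct {
  uint32_t s16_s31[16]; /* callee-saved single-precision registers */
  uint32_t fpscr;       /* status and control register */
} vfp_context;

/* Simulated live FPU registers shared by all threads. */
static uint32_t hw_s16_s31[16];
static uint32_t hw_fpscr;

/* Save the live state into 'from', then load 'to'. When with_fpscr is 0,
   FPSCR is left untouched, modelling the suspected missing save/restore. */
static void vfp_switch(vfp_context *from, const vfp_context *to, int with_fpscr)
{
  assert(from != NULL && to != NULL);
  memcpy(from->s16_s31, hw_s16_s31, sizeof hw_s16_s31);
  memcpy(hw_s16_s31, to->s16_s31, sizeof hw_s16_s31);
  if (with_fpscr) {
    from->fpscr = hw_fpscr;
    hw_fpscr = to->fpscr;
  }
}

/* Thread A enables flush-to-zero, then the scheduler switches to thread B,
   whose saved FPSCR is 0. Returns the FPSCR value that thread B runs with. */
static uint32_t fpscr_seen_by_b(int with_fpscr)
{
  vfp_context a = {0};
  vfp_context b = {0};
  hw_fpscr = FPSCR_FZ; /* thread A sets its own FP mode */
  vfp_switch(&a, &b, with_fpscr);
  return hw_fpscr;
}
```

With FPSCR included in the switch, thread B starts with its own (zero) FPSCR; without it, B silently inherits A's flush-to-zero setting.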

comment:5 Changed on 07/07/23 at 14:19:47 by Joel Sherrill

Maybe the FPU context isn't properly initialized for this CPU variant. The first context switch restore is handled specially in _Thread_Handler. It is easy enough to break there and see what is actually being restored. If it's wrong, then that's the culprit.
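A sketch of what a clean initial FPU state for a new thread could look like, assuming the saved FPSCR is simply zeroed (round-to-nearest, flush-to-zero and default-NaN disabled). `vfp_context`, `vfp_context_init()` and `vfp_context_is_ieee_default()` are hypothetical; the real RTEMS initial frame is laid out differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FPSCR_RMODE_MASK (UINT32_C(3) << 22) /* rounding mode, 00 = to nearest */
#define FPSCR_FZ         (UINT32_C(1) << 24) /* flush-to-zero */
#define FPSCR_DN         (UINT32_C(1) << 25) /* default NaN */

/* Hypothetical initial VFP state of a new thread (not the RTEMS layout). */
typedef struct {
  uint32_t s16_s31[16];
  uint32_t fpscr;
} vfp_context;

/* Give a new thread a known FP state instead of whatever the memory held.
   An all-zero FPSCR selects round-to-nearest with FZ and DN disabled. */
static void vfp_context_init(vfp_context *ctx)
{
  assert(ctx != NULL);
  memset(ctx, 0, sizeof *ctx);
}

/* Returns 1 if the saved FPSCR selects IEEE 754 rounding and subnormal handling. */
static int vfp_context_is_ieee_default(const vfp_context *ctx)
{
  return (ctx->fpscr & (FPSCR_RMODE_MASK | FPSCR_FZ | FPSCR_DN)) == 0;
}
```

A breakpoint in _Thread_Handler would then show whether the restored values match such a known state or look like stale stack contents.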

comment:6 Changed on 07/07/23 at 14:48:04 by kgardas

Regarding the _Thread_Handler breakpoint:

(gdb) b _Thread_Handler
Breakpoint 1 at 0x8003748: file ../../../cpukit/include/rtems/score/percpu.h, line 719.
Note: automatically using hardware breakpoints for read-only addresses.
(gdb) c
Continuing.

Breakpoint 1, _Thread_Handler ()
    at ../../../cpukit/include/rtems/score/percpu.h:719
719       return cpu->executing;
(gdb) where
#0  _Thread_Handler () at ../../../cpukit/include/rtems/score/percpu.h:719
#1  0x08003748 in _Thread_Get (id=<optimized out>, lock_context=0x0)
    at ../../../cpukit/score/src/threadget.c:63
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) c
Continuing.

Breakpoint 1, _Thread_Handler ()
    at ../../../cpukit/include/rtems/score/percpu.h:719
719       return cpu->executing;
(gdb) c
Continuing.

Breakpoint 1, _Thread_Handler ()
    at ../../../cpukit/include/rtems/score/percpu.h:719
719       return cpu->executing;
(gdb) c
Continuing.

Breakpoint 1, _Thread_Handler ()
    at ../../../cpukit/include/rtems/score/percpu.h:719
719       return cpu->executing;

This looks like it is executing an inlined helper from percpu.h rather than a macro:

static inline struct _Thread_Control *_Per_CPU_Get_executing(
  const Per_CPU_Control *cpu
)
{
  return cpu->executing;
}

comment:7 Changed on 07/13/23 at 12:32:33 by Sebastian Huber

We were able to reproduce the issue. It is a problem in the FPU context save/restore during interrupt processing. Under certain conditions, the FPU context is restored from an uninitialized frame, which could enable the flush-to-zero configuration. I am currently quite busy. I will add it to my TODO list, but it could be September or October.
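A value loaded from an uninitialized frame can set any FPSCR control bit, not only flush-to-zero. A sketch that decodes those fields, for example from a register dump; the bit positions follow the ARMv7-M Architecture Reference Manual, while `fpscr_controls`, `fpscr_decode()` and `fpscr_rmode_name()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* FPSCR RMode encodings (bits 23:22). */
typedef enum { RM_NEAREST = 0, RM_PLUS_INF = 1, RM_MINUS_INF = 2, RM_ZERO = 3 } fpscr_rmode;

typedef struct {
  int ahp;            /* bit 26: alternative half-precision format */
  int dn;             /* bit 25: default NaN mode */
  int fz;             /* bit 24: flush-to-zero mode */
  fpscr_rmode rmode;  /* bits 23:22: rounding mode */
} fpscr_controls;

/* Extract the control fields from a raw FPSCR value. */
static fpscr_controls fpscr_decode(uint32_t fpscr)
{
  fpscr_controls c;
  c.ahp = (int)((fpscr >> 26) & 1u);
  c.dn = (int)((fpscr >> 25) & 1u);
  c.fz = (int)((fpscr >> 24) & 1u);
  c.rmode = (fpscr_rmode)((fpscr >> 22) & 3u);
  return c;
}

/* Human-readable rounding mode name for printing a dump. */
static const char *fpscr_rmode_name(fpscr_rmode rmode)
{
  static const char *const names[] = {"nearest", "+inf", "-inf", "zero"};
  assert((unsigned)rmode < 4u);
  return names[rmode];
}
```

A correctly initialized thread would normally decode to `fz == 0` and rounding mode "nearest".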

comment:8 Changed on 07/13/23 at 12:43:11 by kgardas

Sebastian, thanks a lot for the information. Could you give me as much detail as possible about the issue? There is a good chance I will find some time to work on it again before your available timeframe. For example, I'm curious whether it is related to FPU lazy stack saving, which is used and enforced by the RTEMS code. Knowing everything you know about it will certainly help us fix the issue on our side. Thanks a lot!
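For the lazy stacking question, the relevant configuration is held in the FPCCR register (at 0xE000EF34 on ARMv7-M). A sketch that classifies a raw FPCCR value, for example one read with a debugger; the enum and function names are hypothetical, the bit positions are the architectural ones. Out of reset FPCCR reads 0xC0000000, i.e. automatic and lazy preservation both enabled.

```c
#include <stdint.h>

#define FPCCR_ASPEN  (UINT32_C(1) << 31) /* automatic FP state preservation on exception entry */
#define FPCCR_LSPEN  (UINT32_C(1) << 30) /* lazy preservation: reserve frame space, defer the save */
#define FPCCR_LSPACT (UINT32_C(1) << 0)  /* a deferred (lazy) save is currently pending */

typedef enum { FP_STACKING_OFF, FP_STACKING_EAGER, FP_STACKING_LAZY } fp_stacking_mode;

/* Classify how the core preserves FP state on exception entry. */
static fp_stacking_mode fpccr_stacking_mode(uint32_t fpccr)
{
  if ((fpccr & FPCCR_ASPEN) == 0) {
    return FP_STACKING_OFF;  /* no automatic preservation */
  }
  return (fpccr & FPCCR_LSPEN) ? FP_STACKING_LAZY : FP_STACKING_EAGER;
}

/* Returns 1 if a lazy save is still outstanding for the active frame. */
static int fpccr_lazy_save_pending(uint32_t fpccr)
{
  return (fpccr & FPCCR_LSPACT) != 0;
}
```

With lazy stacking, space for s0-s15 and FPSCR is reserved in the frame but only written when an FP instruction runs in the handler, so a context switch that reads that area before the save happens would see stale data.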

comment:9 Changed on 07/13/23 at 13:23:33 by Sebastian Huber

The FPU context corruption happens in _ARMV7M_Pendable_service_call(). Somehow the FPU context is not restored from the right frame position in certain cases. It is not clear why this error doesn't show up in the spcontext01 test.
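One way to end up at the wrong frame position is to miscompute the exception frame size, which varies on ARMv7-M. A sketch of that arithmetic, assuming the architectural layout: a basic frame of 8 words, an extended frame adding s0-s15, FPSCR and a reserved word, EXC_RETURN bit 4 selecting the frame type, and stacked xPSR bit 9 flagging an alignment padding word. The names are hypothetical and this is not the code of _ARMV7M_Pendable_service_call().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ARMv7-M exception stack frame sizes and offsets, in bytes. */
enum {
  BASIC_FRAME_SIZE       = 8 * 4,             /* r0-r3, r12, lr, return address, xPSR */
  EXTENDED_FRAME_SIZE    = (8 + 16 + 2) * 4,  /* + s0-s15, FPSCR, reserved word */
  EXT_FRAME_FPSCR_OFFSET = (8 + 16) * 4       /* FPSCR follows s15 */
};

/* EXC_RETURN bit 4 (FType): 0 means the hardware pushed an extended frame. */
static int exc_return_has_fp_frame(uint32_t exc_return)
{
  return (exc_return & (UINT32_C(1) << 4)) == 0;
}

/* Total stack space used by the frame. Stacked xPSR bit 9 set means one
   padding word was added above the frame to keep 8-byte alignment. */
static size_t exception_frame_size(uint32_t exc_return, uint32_t stacked_xpsr)
{
  size_t size = exc_return_has_fp_frame(exc_return) ? EXTENDED_FRAME_SIZE
                                                    : BASIC_FRAME_SIZE;
  if (stacked_xpsr & (UINT32_C(1) << 9)) {
    size += 4;
  }
  return size;
}

/* Address of the stacked FPSCR, given the stack pointer after stacking. */
static uintptr_t stacked_fpscr_addr(uintptr_t frame_sp, uint32_t exc_return)
{
  assert(exc_return_has_fp_frame(exc_return));
  return frame_sp + EXT_FRAME_FPSCR_OFFSET;
}
```

For example, EXC_RETURN 0xFFFFFFED (thread mode, PSP, extended frame) gives a 104-byte frame with FPSCR at offset 0x60, whereas 0xFFFFFFFD gives a 32-byte basic frame with no FP state at all.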

comment:10 Changed on 07/27/23 at 18:45:22 by kgardas

I consider this issue to be release critical, as it affects one of the major platforms (ARMv7-M) and makes FPU computation unreliable/unpredictable there. Hence I am switching the milestone to 6.1. Please discuss if you disagree.

comment:11 Changed on 07/27/23 at 18:45:55 by kgardas

Milestone: 6.1

comment:12 Changed on 01/31/24 at 18:28:00 by Cedric Berger

Cc: cedric@… added

comment:13 Changed on 01/31/24 at 18:30:49 by Cedric Berger

Owner: set to Cedric Berger
Status: new → accepted
Version: 6

I will try to work on that in February.

Changed on 02/06/24 at 15:04:10 by Cedric Berger

VFP: make paranoia test fail

Changed on 02/06/24 at 15:04:50 by Cedric Berger

VFP: save and restore FPSCR register during context switch

Changed on 02/06/24 at 15:05:30 by Cedric Berger

VFP: starts thread frames in a clean state
