Freezing OP-TEE fails during suspend #4581

cgellner · 2021-04-28T15:33:28Z

When trying to suspend to RAM the OP-TEE TA user-space tasks using OP-TEE might fail to freeze

PM: suspend entry (s2idle)
Filesystems sync: 0.000 seconds
Freezing user space processes ... 
Freezing of tasks failed after 20.005 seconds (1 tasks refusing to freeze, wq_busy=0):
task:optee_example_s state:R  running task     stack:    0 pid:  128 ppid:     1 flags:0x00000001
[<807d3e24>] (__schedule) from [<c01bae20>] (0xc01bae20)
OOM killer enabled.
Restarting tasks ... done.
PM: suspend exit

According to my understanding this happens if a TA, triggered by the task to freeze, is trying to communicate to tee-supplicant, but tee-supplicant has been frozen first.

In order to reproduce OP-TEE QEMU environment can be used.
In order to increase the probability of the TA trying to communicate to tee-supplicant during suspend, the below patch should be applied to secure_storage_ta.c in optee_examples.

index edaf6a3..39f5ed2 100644
--- a/secure_storage/ta/secure_storage_ta.c
+++ b/secure_storage/ta/secure_storage_ta.c
@@ -256,11 +256,19 @@ TEE_Result TA_InvokeCommandEntryPoint(void __unused *session,
 				      uint32_t param_types,
 				      TEE_Param params[4])
 {
+	TEE_Result res;
+	size_t i;
+
 	switch (command) {
 	case TA_SECURE_STORAGE_CMD_WRITE_RAW:
 		return create_raw_object(param_types, params);
 	case TA_SECURE_STORAGE_CMD_READ_RAW:
-		return read_raw_object(param_types, params);
+		for (i=0; i<400; i++) {
+			res = read_raw_object(param_types, params);
+			if (res != TEE_SUCCESS)
+				break;
+		}
+		return res;
 	case TA_SECURE_STORAGE_CMD_DELETE:
 		return delete_object(param_types, params);
 	default:
--

Afterwards you can reproduce by executing
(optee_example_secure_storage &); sleep 1; echo -n mem > /sys/power/state
from non-secure world.

During debugging I could see, that while kernel trying to freeze the task is looping in
drivers/tee/supp.c - optee_supp_thrd_req
while (wait_for_completion_interruptible(&req->c))
Note: In my understanding other cases might be possible as well e.g.

optee_cq_wait_for_completion waiting for available OP-TEE Thread
one TA waiting in OPTEE_MSG_RPC_WAIT_QUEUE_SLEEP (wq_sleep) for other TA calling OPTEE_MSG_RPC_WAIT_QUEUE_WAKEUP (wq_wakeup). But second TA is blocked due to one of the other cases.

When I modified kernel/power/process.c - try_to_freeze_tasks - not to freeze tee-supplicant when freezing user tasks, the issue is not reproducible and suspend will happen after the TA did complete its work.
I wonder if there is a generic kernel mechanism to avoid freezing of user-space helper tasks like tee-supplicant.
How is this solved for fuse-FS? Isn't it using some user-space tasks for processing as well?

When I modified drivers/tee/supp.c - optee_supp_thrd_req - to call try_to_freeze in the wait_for_completion_interruptible loop mentioned above, the issue is not reproducible and suspend will happen although TA did not yet complete its work. TA will continue after the resume.
I'm not clear, if it is legal from OP-TEE perspective to suspend while TA is still executing.

The text was updated successfully, but these errors were encountered:

etienne-lms · 2021-04-28T15:43:06Z

Thanks a lot for the description. Pretty clear and detailed.
This all makes sense. I think the last proposal is the right way.

cgellner · 2021-05-11T07:52:56Z

In order to reproduce the other two situations where wait_for_completion can get stuck, you need multiple parallel requests.

To allow multiple requests you can apply the following patch to secure_storage in optee_examples (in addition to the reproduction patch mentioned above)

diff --git a/secure_storage/ta/user_ta_header_defines.h b/secure_storage/ta/user_ta_header_defines.h
index cc69e72..83e762d 100644
--- a/secure_storage/ta/user_ta_header_defines.h
+++ b/secure_storage/ta/user_ta_header_defines.h
@@ -36,7 +36,7 @@
 #define TA_UUID                                TA_SECURE_STORAGE_UUID
-#define TA_FLAGS                       (TA_FLAG_EXEC_DDR | TA_FLAG_SINGLE_INSTANCE)
+#define TA_FLAGS                       (TA_FLAG_EXEC_DDR | TA_FLAG_SINGLE_INSTANCE | TA_FLAG_MULTI_SESSION)
 #define TA_STACK_SIZE                  (2 * 1024)
 #define TA_DATA_SIZE                   (32 * 1024)
--

Afterwards you can reproduce the issue in QEMU v7 using the following command

for I in $(seq 1 3); do (optee_example_secure_storage &); done; sleep 3; echo -n mem > /sys/power/state

In QEMU v7 you could additionally schedule resuming e.g. after 30sec by prepending

echo "+30" > /sys/devices/platform/9010000.pl031/rtc/rtc0/wakealarm

Afterwards you should get a output similar to below having 3 stuck tasks

- Read back the object
PM: suspend entry (s2idle)
Filesystems sync: 0.000 seconds
Freezing user space processes ...
Freezing of tasks failed after 20.008 seconds (3 tasks refusing to freeze, wq_busy=0):
task:optee_example_s state:R  running task     stack:    0 pid:  124 ppid:     1 flags:0x00000001
[<807d3e24>] (__schedule) from [<841c4000>] (0x841c4000)
task:optee_example_s state:D stack:    0 pid:  126 ppid:     1 flags:0x00000001
[<807d3e24>] (__schedule) from [<807d41d0>] (schedule+0x60/0x120)
[<807d41d0>] (schedule) from [<807d7ffc>] (schedule_timeout+0x1f4/0x340)
[<807d7ffc>] (schedule_timeout) from [<807d56a0>] (wait_for_completion+0x94/0xfc)
[<807d56a0>] (wait_for_completion) from [<80692134>] (optee_cq_wait_for_completion+0x14/0x60)
[<80692134>] (optee_cq_wait_for_completion) from [<806924dc>] (optee_do_call_with_arg+0x14c/0x154)
[<806924dc>] (optee_do_call_with_arg) from [<80692edc>] (optee_shm_unregister+0x78/0xcc)
[<80692edc>] (optee_shm_unregister) from [<80690a9c>] (tee_shm_release+0x88/0x174)
[<80690a9c>] (tee_shm_release) from [<8057f89c>] (dma_buf_release+0x44/0xb0)
[<8057f89c>] (dma_buf_release) from [<8028e4e8>] (__dentry_kill+0x110/0x17c)
[<8028e4e8>] (__dentry_kill) from [<80276cfc>] (__fput+0xc0/0x234)
[<80276cfc>] (__fput) from [<80140b1c>] (task_work_run+0x90/0xbc)
[<80140b1c>] (task_work_run) from [<8010b1c8>] (do_work_pending+0x4a0/0x5a0)
[<8010b1c8>] (do_work_pending) from [<801000cc>] (slow_work_pending+0xc/0x20)
Exception stack(0x843f5fb0 to 0x843f5ff8)
5fa0:                                     00000000 7ef63448 fffffffe 00000000
5fc0: 7ef63448 76f163b0 7ef63448 00000006 7ef63448 7ef634e0 7ef63438 00000000
5fe0: 00000006 7ef63400 76e74833 76dff856 800e0130 00000004
task:optee_example_s state:D stack:    0 pid:  128 ppid:     1 flags:0x00000001
[<807d3e24>] (__schedule) from [<807d41d0>] (schedule+0x60/0x120)
[<807d41d0>] (schedule) from [<807d7ffc>] (schedule_timeout+0x1f4/0x340)
[<807d7ffc>] (schedule_timeout) from [<807d56a0>] (wait_for_completion+0x94/0xfc)
[<807d56a0>] (wait_for_completion) from [<8069359c>] (optee_handle_rpc+0x554/0x710)
[<8069359c>] (optee_handle_rpc) from [<806924cc>] (optee_do_call_with_arg+0x13c/0x154)
[<806924cc>] (optee_do_call_with_arg) from [<80692910>] (optee_invoke_func+0x110/0x190)
[<80692910>] (optee_invoke_func) from [<8068fe3c>] (tee_ioctl+0x113c/0x1244)
[<8068fe3c>] (tee_ioctl) from [<802892ec>] (sys_ioctl+0xe0/0xa24)
[<802892ec>] (sys_ioctl) from [<80100060>] (ret_fast_syscall+0x0/0x54)
Exception stack(0x8424ffa8 to 0x8424fff0)
ffa0:                   00000000 7eb67584 00000003 8010a403 7eb67438 7eb675fc
ffc0: 00000000 7eb67584 7eb67604 00000036 7eb67448 7eb674e0 7eb67438 00000000
ffe0: 76ef7030 7eb6742c 76ee6469 76e83178
OOM killer enabled.
Restarting tasks ... done.
PM: suspend exit
sh: write error: Device or resource busy

The first back-trace is stuck while waiting for tee-supplicant (see initial message)

The second back-trace shows optee_cq_wait_for_completion waiting for available OP-TEE Thread

The last back-trace seems to be at OPTEE_MSG_RPC_WAIT_QUEUE_SLEEP as this is the only other occurrence of wait_for_completion below optee_handle_rpc. It seems to be waiting for the single instance to become available.

cgellner · 2021-05-11T07:54:45Z

Although it isn't directly related and it won't get stuck I wanted to mention one additional scenario influenced by freezing the tasks.

The msleep_interruptible in handle_rpc_func_cmd_wait should be affected by the fake signals of the suspend.
As it is interruptible it should not get stuck, but from my understanding the wait time might be significantly shorter than expected.

github-actions · 2021-06-11T01:30:26Z

This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.

cgellner · 2021-06-11T05:23:56Z

I proposed a set of patches via mailing lists
[RFC PATCH 0/3] tee: optee: Allow to freeze when tee-supplicant is frozen

Currently waiting for some more feedback

github-actions · 2021-07-12T01:20:58Z

This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.

cgellner · 2021-07-16T15:49:52Z

Beside the response of @jenswi-linaro I didn't see any feedback on RFC tagged patches

Therefore I send patches again - this time without RFC tag
[PATCH 0/3] tee: optee: Allow to freeze when tee-supplicant is frozen

github-actions · 2021-08-17T01:17:25Z

This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.

etienne-lms · 2021-08-22T17:46:39Z

Can this thread be re-open and tagged enhancement?

cgellner · 2022-03-04T13:27:50Z

Hi,
I have resend the request to op-tee@lists.trustedfirmware.org; linux-kernel@vger.kernel.org last August.

https://lkml.iu.edu/hypermail/linux/kernel/2108.1/00496.html
https://lkml.iu.edu/hypermail/linux/kernel/2108.1/00498.html
https://lkml.iu.edu/hypermail/linux/kernel/2108.1/00499.html
I didn't see any reaction at all.
Not 100% sure if I missed anything or if I did anything incorrectly.
Hints how I can improve are welcome.

Do you know if the mentioned "suspend" problem has been solved differently ?
I didn't test with most recent OP-TEE or kernel.

I have seen that latest kernel has some small changes (787c80cc7b22804aa370f04a19f9fe0fa98b1e49)
So I believe patch 3/3 can't be applied, but I assume the issue is still there in the new code (still using wait_for_completion)

In case there is some interest in this patch I could

reproduce issue again in latest version
rework the patch set so it can be applied on latest kernel version
resend the patch request

But before starting I would like to understand if there is some general interest in this patch at all or if failing to freeze during suspend is no problem from your perspective.

jforissier · 2022-03-07T14:15:06Z

Hi @cgellner,

Technically speaking I can't say if your patches are correct because I don't know the Linux suspend mechanisms well enough. However it seems you are addressing a real issue with these patches, so I think they deserve more attention.

In case there is some interest in this patch I could

* reproduce issue again in latest version

* rework the patch set so it can be applied on latest kernel version

* resend the patch request

@jenswi-linaro I think it is the way to go, isn't it? Maybe CC some experts of the "freeze" interface, too?

jenswi-linaro · 2022-03-07T16:48:56Z

I agree and the same problem here that I don't know that area well enough.

victorchenpan · 2022-05-26T12:28:29Z

I have meet the problem too. When tee-supplicant has been frozen, and then other task request tee-supplicant, it can't enter freezing flow. So kernel can't finish freeze flow, and will not enter suspend flow.

Your patch can solve the problem.

jforissier · 2022-05-26T15:23:00Z

The last trace of @cgellner 's patchset I could see on the Linux Kernel Mailing list is https://lore.kernel.org/all/20210809071452.28508-1-cgellner@de.adit-jv.com/ (August 2021) and it has never gathered any Acked-by, Reviewed-by or Tested-by apparently (at least there is no such tags in the patches themselves). @victorchenpan are you subscribed to the LKML? Would you be able to provide feedback if the series was to be re-posted? Or perhaps even reply to the already sent emails by following the "Reply instructions" on the ML page?

baiqiang728 · 2023-10-28T15:03:34Z

Hi, I have resend the request to op-tee@lists.trustedfirmware.org; linux-kernel@vger.kernel.org last August.

https://lkml.iu.edu/hypermail/linux/kernel/2108.1/00496.html

https://lkml.iu.edu/hypermail/linux/kernel/2108.1/00498.html

https://lkml.iu.edu/hypermail/linux/kernel/2108.1/00499.html
I didn't see any reaction at all.
Not 100% sure if I missed anything or if I did anything incorrectly.
Hints how I can improve are welcome.

Do you know if the mentioned "suspend" problem has been solved differently ? I didn't test with most recent OP-TEE or kernel.

I have seen that latest kernel has some small changes (787c80cc7b22804aa370f04a19f9fe0fa98b1e49) So I believe patch 3/3 can't be applied, but I assume the issue is still there in the new code (still using wait_for_completion)

In case there is some interest in this patch I could

reproduce issue again in latest version

rework the patch set so it can be applied on latest kernel version

resend the patch request

But before starting I would like to understand if there is some general interest in this patch at all or if failing to freeze during suspend is no problem from your perspective.

Thanks for your patch @cgellner ， I met the same issue. After I update your patch, it seems that it works. However, if I do more test , such as stress test for STR when xtest is running. It may cause CPU rcu stall issue. Do you have similar phenomena ？

Thanks

github-actions bot added the Stale label Jun 11, 2021

jenswi-linaro removed the Stale label Jun 11, 2021

github-actions bot added the Stale label Jul 12, 2021

github-actions bot removed the Stale label Jul 17, 2021

github-actions bot added the Stale label Aug 17, 2021

github-actions bot closed this as completed Aug 22, 2021

jforissier reopened this Aug 22, 2021

jforissier added enhancement and removed Stale labels Aug 22, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Freezing OP-TEE fails during suspend #4581

Freezing OP-TEE fails during suspend #4581

cgellner commented Apr 28, 2021

etienne-lms commented Apr 28, 2021

cgellner commented May 11, 2021

cgellner commented May 11, 2021

github-actions bot commented Jun 11, 2021

cgellner commented Jun 11, 2021

github-actions bot commented Jul 12, 2021

cgellner commented Jul 16, 2021

github-actions bot commented Aug 17, 2021

etienne-lms commented Aug 22, 2021

cgellner commented Mar 4, 2022

jforissier commented Mar 7, 2022

jenswi-linaro commented Mar 7, 2022

victorchenpan commented May 26, 2022

jforissier commented May 26, 2022

baiqiang728 commented Oct 28, 2023

Freezing OP-TEE fails during suspend #4581

Freezing OP-TEE fails during suspend #4581

Comments

cgellner commented Apr 28, 2021

etienne-lms commented Apr 28, 2021

cgellner commented May 11, 2021

cgellner commented May 11, 2021

github-actions bot commented Jun 11, 2021

cgellner commented Jun 11, 2021

github-actions bot commented Jul 12, 2021

cgellner commented Jul 16, 2021

github-actions bot commented Aug 17, 2021

etienne-lms commented Aug 22, 2021

cgellner commented Mar 4, 2022

jforissier commented Mar 7, 2022

jenswi-linaro commented Mar 7, 2022

victorchenpan commented May 26, 2022

jforissier commented May 26, 2022

baiqiang728 commented Oct 28, 2023