generating core dump file's in gcp bucket mounted directory #1363

Open
vadirajks opened this issue Sep 8, 2023 · 15 comments
Assignees
Labels
bug Error or flaw in the code with unintended result p2 P2 question Customer Issue: question about how to use tool

Comments

@vadirajks

vadirajks commented Sep 8, 2023

Describe the issue
An empty core dump file is created in a GCP bucket that is mounted using gcsfuse.

To Collect more Debug logs
Steps to reproduce the behavior:

I am able to get core dump file created using local filesystem:

[root@test-vm-01 ~]# mkdir /opt/coredumps
[root@test-vm-01 ~]# sysctl -w kernel.core_pattern="/opt/coredumps/%e-%s-%u-%g-%p-%t-%P-%h.core"
kernel.core_pattern = /opt/coredumps/%e-%s-%u-%g-%p-%t-%P-%h.core
[root@test-vm-01 ~]# sleep 100000 &
[1] 17351
[root@test-vm-01 ~]# kill -11 17351
[root@test-vm-01 ~]# ls -l /opt/coredumps/
total 160
-rw------- 1 root root 385024 Sep  8 16:12 sleep-11-0-0-17351-1694189566-17351-test-vm-01.core
[1]+  Segmentation fault      (core dumped) sleep 100000
[root@test-vm-01 ~]# file /opt/coredumps/sleep-11-0-0-17351-1694189566-17351-test-vm-01.core
/opt/coredumps/sleep-11-0-0-17351-1694189566-17351-test-vm-01.core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'sleep 100000', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/bin/sleep', platform: 'x86_64'
[root@test-vm-01 ~]# 

An empty file is created when the core pattern points at the gcsfuse-mounted bucket directory:

[root@test-vm-01 ~]# ls /dump/coredumps/
[root@test-vm-01 ~]# sysctl -w kernel.core_pattern="/dump/coredumps/%e-%s-%u-%g-%p-%t-%P-%h.core"
kernel.core_pattern = /dump/coredumps/%e-%s-%u-%g-%p-%t-%P-%h.core
[root@test-vm-01 ~]# ls /dump/coredumps/ -l
total 0
[root@test-vm-01 ~]# sleep 100000 &
[1] 18071
[root@test-vm-01 ~]# kill -11 18071
[root@test-vm-01 ~]# ls /dump/coredumps/ -l
total 0
-rwxrwx--x 1 atom atom 0 Sep  8 16:15 sleep-11-0-0-18071-1694189756-18071-test-vm-01.core
[1]+  Segmentation fault      sleep 100000
[root@test-vm-01 ~]# file /dump/coredumps/sleep-11-0-0-18071-1694189756-18071-test-vm-01.core
/dump/coredumps/sleep-11-0-0-18071-1694189756-18071-test-vm-01.core: empty

Please help if you can. I need this for autoscaling VMs, so that the dump files end up in a GCS bucket in case the VMs get terminated.

Thanks,

@vadirajks vadirajks added p1 P1 question Customer Issue: question about how to use tool labels Sep 8, 2023
@gargnitingoogle
Collaborator

@vadirajks thanks for reporting this issue. Please share the following information

  1. information about how you mounted GCP bucket using gcsfuse, the command, options etc.
  2. the machine configuration, OS version etc.

I need these to try to recreate the issue.

@gargnitingoogle gargnitingoogle self-assigned this Sep 12, 2023
@gargnitingoogle
Collaborator

In the meantime, I tried the commands @vadirajks shared on a GCP VM with ubuntu22.04 and I didn't get any core dump files even without any gcsfuse mounts being present on it.

@gargnitingoogle
Collaborator

gargnitingoogle commented Sep 12, 2023

In the meantime, I tried the commands @vadirajks shared on a GCP VM with ubuntu22.04 and I didn't get any core dump files even without any gcsfuse mounts being present on it.

Never mind. Without the gcsfuse mount, the core dump was generated successfully after running the following command:

ulimit -c unlimited
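
A minimal sketch of a pre-flight check, assuming a Linux host: it reports the core size limit of the current shell and the active core pattern, the two settings used in this thread. The helper name `core_limit_ok` is made up for illustration.

```shell
#!/bin/sh
# Succeeds if the given `ulimit -c` value allows core files to be written.
# $1: "unlimited" or a block count, as printed by `ulimit -c`
core_limit_ok() {
    [ "$1" = "unlimited" ] || [ "$1" -gt 0 ] 2>/dev/null
}

limit=$(ulimit -c)
if core_limit_ok "$limit"; then
    echo "core size limit: $limit (core files allowed)"
else
    echo "core size limit: $limit (core files suppressed; try: ulimit -c unlimited)"
fi

# Show where the kernel will write the dump, if the setting is readable.
if [ -r /proc/sys/kernel/core_pattern ]; then
    echo "core_pattern: $(cat /proc/sys/kernel/core_pattern)"
fi
```

The limit is per process and inherited by children, so it has to be raised in the shell that starts the program to be dumped.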

@gargnitingoogle
Collaborator

With the above change, I get a somewhat different, permission-related issue when generating a core dump on a gcsfuse mount at /opt/coredumps.

{"name":"root","levelname":"ERROR","severity":"ERROR","message":"CreateFile: permission denied, CreateChildFile: error in closing writer : googleapi: Error 403: Access denied., forbidden\n","timestampSeconds":1694516749,"timestampNanos":370800990}
{"name":"root","levelname":"ERROR","severity":"ERROR","message":"fuse: *fuseops.CreateFileOp error: permission denied\n","timestampSeconds":1694516749,"timestampNanos":370909530}

Here, both the directory /opt/coredumps and the gcsfuse mount (gcsfuse --log-file=gcsfuse_run_log.txt <bucket> /opt/coredumps) were created as the root user.
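
To pull only the error messages out of a log like this, something along the following lines may help. It is a rough sketch that assumes one JSON object per line with "severity" appearing before "message", as in the two entries quoted in this comment; the function name `gcsfuse_errors` is hypothetical, and a JSON-aware tool would be more robust.

```shell
#!/bin/sh
# Print the "message" field of every ERROR entry in a gcsfuse JSON log.
# $1: path to the log file (for example gcsfuse_run_log.txt)
# Limitation: a message containing an escaped double quote is cut short.
gcsfuse_errors() {
    sed -n 's/.*"severity":"ERROR".*"message":"\([^"]*\)".*/\1/p' "$1"
}
```

Usage: `gcsfuse_errors gcsfuse_run_log.txt`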

@gargnitingoogle
Collaborator

@vadirajks thanks for reporting this issue. Please share the following information

  1. information about how you mounted GCP bucket using gcsfuse, the command, options etc.
  2. the machine configuration, OS version etc.

I need these to try to recreate the issue.

@vadirajks apart from the above, could you enable logs in your gcsfuse mount call (if not already done, option --log-file) and upload the logs generated after the kill call?

@gargnitingoogle
Collaborator

gargnitingoogle commented Sep 14, 2023

I was able to reproduce the failure as @vadirajks described.

Updating the status so far.

Expected operations:

  • touch file: LookUpInode (Error) -> CreateFile (OK) -> FlushFile (OK) -> SetInodeAttributes (OK) -> FlushFile (OK) -> ReleaseFileHandle (OK) -> GetInodeAttributes (OK) -> n x LookUpInode (OK)
  • echo contents > file: LookUpInode (Error) -> CreateFile (OK) -> FlushFile (OK) -> GetInodeAttributes (OK) -> WriteFile (OK) -> SetInodeAttributes (OK) -> FlushFile (OK) -> ReleaseFileHandle (OK)

Actual operations in this case:

  • kill -11 pid: LookUpInode (Error) -> LookUpInode (Error) -> CreateFile (OK) -> FlushFile (OK) -> ReleaseFileHandle (OK)

The file handle is closed without any intervening write, so the write is most likely being aborted or skipped by the kernel itself. Investigation continues.
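
The difference between the traces can be checked mechanically. The sketch below takes the `echo` trace and the `kill -11` trace from this comment and tests each for a WriteFile step; `trace_has_write` is an illustrative name, not part of gcsfuse.

```shell
#!/bin/sh
# Succeeds if an operation trace contains a WriteFile step.
# $1: a trace in the "Op (status) -> Op (status)" notation used in this comment
trace_has_write() {
    case "$1" in
        *WriteFile*) return 0 ;;
        *) return 1 ;;
    esac
}

echo_trace='LookUpInode (Error) -> CreateFile (OK) -> FlushFile (OK) -> GetInodeAttributes (OK) -> WriteFile (OK) -> SetInodeAttributes (OK) -> FlushFile (OK) -> ReleaseFileHandle (OK)'
core_trace='LookUpInode (Error) -> LookUpInode (Error) -> CreateFile (OK) -> FlushFile (OK) -> ReleaseFileHandle (OK)'

for t in "$echo_trace" "$core_trace"; do
    if trace_has_write "$t"; then
        echo "data written"
    else
        echo "no WriteFile: file stays at 0 bytes"
    fi
done
```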

@vadirajks
Author

Sorry, I didn't notice any of these messages, and thank you very much for taking this up. Please let me know if you need anything from me.

@gargnitingoogle
Collaborator

No problem, @vadirajks.
I tried to figure out the issue, but as you can see from my comments above, gcsfuse receives no write calls during a core dump, only a CreateFile call, hence the 0-byte file.

I will need to work out how core dump generation actually works and why it fails on a gcsfuse-mounted directory. So far I have tried to trace the processes involved, but had no luck.

At the moment, all I can say is that it will take time.

@jasmit-s jasmit-s added the bug Error or flaw in the code with unintended result label Sep 27, 2023
@gargnitingoogle
Collaborator

I tested the core dump issue against older versions of gcsfuse. The issue exists in v1.0.0 through v1.2.0. With v0.42.5 I got a current user error when running gcsfuse after installation (even with Go 1.20, the version this release was packaged with). I need to look into that; it may be incompatible with the Ubuntu 22.04 VM I tried it on.

@gargnitingoogle gargnitingoogle added p2 P2 and removed p1 P1 labels Oct 11, 2023
@gargnitingoogle
Collaborator

Lowered priority to P2.

@dentiny

dentiny commented Nov 14, 2023

@gargnitingoogle I hit the same issue. Is there any progress?

@gargnitingoogle
Collaborator

@dentiny there has been no progress on this issue since #1363 (comment) .

At the outset, it looks like this issue has always existed in gcsfuse, and thus is not a new issue. Currently, we have categorized this as a known limitation.

During my last debug, I was not able to trace where (from which process) the write to the core dump output file was supposed to originate, to see why it wasn't reaching gcsfuse.
I'll give it another try this week.

@vadirajks
Author

vadirajks commented Nov 14, 2023

@dentiny, as a workaround you can use incron to move files from the local folder to the GCP bucket, triggered by filesystem events.
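
A rough sketch of the move step in that workaround, assuming cores are written to a local directory such as /opt/coredumps and the bucket is mounted at /dump/coredumps as earlier in this thread. The function name `sweep_cores` is hypothetical, and how it gets triggered (incron, cron, or a loop) is left open.

```shell
#!/bin/sh
# Move finished core files from a local directory into the mounted bucket.
# $1: local directory named in kernel.core_pattern (e.g. /opt/coredumps)
# $2: gcsfuse-mounted directory (e.g. /dump/coredumps)
sweep_cores() {
    src=$1
    dst=$2
    for f in "$src"/*.core; do
        # Skip empty files and the literal pattern left by an unmatched glob.
        [ -s "$f" ] || continue
        # Copy first, then delete, so a failed upload never loses the dump.
        cp "$f" "$dst"/ && rm -f "$f"
    done
}
```

Usage: `sweep_cores /opt/coredumps /dump/coredumps`. Running it only after the writer has closed the file (for example on a close-after-write event) avoids copying a partially written dump.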

@dentiny

dentiny commented Nov 17, 2023

But that defeats the purpose of using a local filesystem. As a workaround, I mount NFS instead.

@bjornleffler
Member

@vadirajks, this is easy to reproduce, but as @gargnitingoogle points out, it appears that the Linux kernel isn't issuing any Write calls, resulting in an empty file.

This looks like a problem outside of GCSFuse. I'm wondering if some environment variable or kernel setting could help.
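
One kernel setting that might be worth trying, untested here and not confirmed anywhere in this thread: when kernel.core_pattern starts with `|`, Linux pipes the core image to a user-space program instead of writing the file itself, so the bytes would reach the mount through ordinary write calls from that program. A minimal sketch of such a handler follows; the function name and the install path in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical pipe handler body: copy the core image arriving on stdin
# into a file, using normal user-space writes.
# $1: destination directory, $2: file name
save_core_from_stdin() {
    cat > "$1/$2"
}

# When installed as a script, it would be called with the values the
# kernel substitutes into the pattern, for example:
#   save_core_from_stdin /dump/coredumps "$1-$2-$3.core"
```

Usage, assuming the script is installed at the made-up path /usr/local/bin/save-core.sh: `sysctl -w kernel.core_pattern="|/usr/local/bin/save-core.sh %e %p %t"`. The kernel requires an absolute path after the `|` and runs the handler as root. Whether this sidesteps the empty-file behavior on gcsfuse is an open question.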
