Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update cuda version for doppio to use cuda 10.0 #141

Open
wants to merge 4 commits into
base: gpu
Choose a base branch
from

Conversation

xinshengqin
Copy link
Contributor

No description provided.

@rjleveque
Copy link
Member

Rather than hardwiring in different CUDA versions for different computer names, would it be better to have a variable CUDA_VERSION or some such thing that a user could set as an environment variable? Most potential users won't have access to doppio or other machines listed in the Makefile.

@xinshengqin
Copy link
Contributor Author

Updated. How does this look like?

@rjleveque
Copy link
Member

Setting

CUDA_VER ?= $(CUDA_VERSION)

means to set CUDA_VER to the value of the environment variable CUDA_VERSION only if CUDA_VER is itself not set as an environment variable. This doesn't seem like what you want since the user wouldn't have two such environment variables.

Maybe you want to first set CUDA_VERSION to some particular version if you are on certain machines and then the above command would over-ride this if the user set an environment variable (which hopefully they did if they are on a different machine than any listed).

And then always do

ALL_FFLAGS   += -Mcuda=$(CUDA_VER),maxregcount:80 -Mcuda=lineinfo -Mcuda=rdc

so that the value set as CUDA_VER is always used regardless of what machine.

@mandli
Copy link
Member

mandli commented Feb 8, 2020

Is there not a canonical way to do this? I looked a bit and it looks like you can run nvcc --version as mentioned here.

@xinshengqin
Copy link
Contributor Author

Is there not a canonical way to do this? I looked a bit and it looks like you can run nvcc --version as mentioned here.

One can do nvcc --version to get something like

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

But then you have to parse these strings to get what you want. In this case it's "10.0".

Also the cuda compiler does not define an environment variable like CUDA_VERSION for us.

What @rjleveque suggested looks like a simple approach with the caveat that user will need to set this in their bashrc file.

What option do you guys think we should use, 1) parse the string return by "nvcc --version"; 2) use environment variable set by the user?

@rjleveque
Copy link
Member

I don't think we should try to parse the version string ourselves. Setting a variable with CUDA_VERSION ?= seems like the best approach to me, setting it to some default value (maybe dependent on machine although that's not useful to most users). And then the user can either set an environment variable to over-ride this, or just set CUDA_VERSION = ... in their own application Makefile which would also over-ride the default set in Makefile.common.gpu, I believe.

One way or another I think the user is responsible for setting this and we should make it easy for them to do so in their own Makefile and/or as an environment variable without having to modify Makefile.common.gpu.

@rjleveque
Copy link
Member

Also it looks like there are some other things on the same ALL_FLAGS line as cuda10.1 that are machine-dependent, e.g. cc35 vs cc60?  What is this and does it also need to be an environment variable?

@xinshengqin
Copy link
Contributor Author

Sounds good. I will go ahead to use this approach. cc35 and cc60 are "compute capabilities". Those can be interpreted as version numbers that indicate what features are supported. Different generations of GPU has different compute capabilities.

More details can be seen at: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities

…nd CUDA_CC.

CUDA_VERSION should be set to the version "nvcc --version" returns.
CUDA_CC should be set to the compute capability supported by the machine
you compile this code on.

The users are expected to set the two environment variables. They will
be passed to the compiler as compilation flags:
ALL_FFLAGS   += -Mcuda=cuda$(CUDA_VERSION),$(CUDA_CC)
@xinshengqin
Copy link
Contributor Author

I updated the makefile to use CUDA_VERSION and CUDA_CC.

CUDA_VERSION should be set to the version "nvcc --version" returns.
CUDA_CC should be set to the compute capability supported by the machine
you compile this code on.

The users are expected to set the two environment variables. They will
be passed to the compiler as compilation flags:
ALL_FFLAGS += -Mcuda=cuda$(CUDA_VERSION),$(CUDA_CC)

@mandli
Copy link
Member

mandli commented Feb 11, 2020

I would be a bit cautious about adding this to something like Makefile.common even thought it is in a “GPU” version. Why not add those variables to the local makefile?

@rjleveque
Copy link
Member

Maybe we should have the application point to a new common Makefile for the GPU, e.g.

CLAWMAKE = $(CLAW)/clawutil/src/Makefile.common.gpu

so there could be a common one for gpu applications without messing with the standard one.

Or maybe better to just have in the application Makefile as @mandli suggests, which might make it easier for the user to see exactly what's being set and modify it appropriately.

@xinshengqin
Copy link
Contributor Author

xinshengqin commented Feb 12, 2020

You mentioned two options for this case:

  1. have the application point to a new common Makefile for the GPU. Users need to set variables like CUDA_PATH in their bash environment.
  2. have user set variables like CUDA_PATH in their local Makefile.

For option 1, users don't have to set it multiple times for every example. Option 2 is easier for the user to see what's being set or what they need to set.

I can't come up with other reasonings here to compare the two. But this is an important design decision to make.

@xinshengqin
Copy link
Contributor Author

Any thoughts on which of the two option we should choose?

@mandli
Copy link
Member

mandli commented Feb 25, 2020

I tend to use the CUDA_PATH ?= default in the local Makefiles for times like this where you can provide a default value but also override it in the environment.

@xinshengqin
Copy link
Contributor Author

Just updated it to use variables set in local Makefiles. Relevant changes are also made in:
clawpack/amrclaw#259
clawpack/geoclaw#438

else
CLAW_FC ?= pgfortran
endif
CLAW_FC ?= pgfortran
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually FC is set to the fortran compiler command. Here you seem to be setting it to pgi and then using that to set CLAW_FC to the compiler command. Why not just set FC ?= pgfortran in the application makefile, which would be more consistent with other usage?

I see from the old version that on some machines the pgi compiler name is ftn but when running on such a machine the user should just set FC to ftn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was just following what's done in regular Makefile.common where we have FC and then CLAW_FC ?= FC. Then in order to stress that you need a special GPU-enable pgi compiler, I set it to pgi to distinguish it from regular pgfortran. "ftn" is just an alias to pgfortran on one of the machine I have used.

I can change this if that's better. I don't fully understand why do we have both FC and CLAW_FC though. Why not just FC?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants