
How to do inference using multiple GPUs for Styleformer #10

pratikchhapolika opened this issue Apr 25, 2022 · 6 comments

pratikchhapolika commented Apr 25, 2022

I am using this model to run inference on 1 million data points using 4 A100 GPUs. I am launching the inference.py code in a Google Vertex AI container.

How can I make the inference code utilise all 4 GPUs so that inference is as fast as possible?

Here is the code I use in inference.py:

import warnings
import torch
from styleformer import Styleformer

warnings.filterwarnings("ignore")

# style = [0=Casual to Formal, 1=Formal to Casual, 2=Active to Passive, 3=Passive to Active, etc.]
sf = Styleformer(style=1)

def set_seed(seed):
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(1212)

source_sentences = [
    "I would love to meet attractive men in town",
    "Please leave the room now",
    "It is a delicious icecream",
    "I am not paying this kind of money for that nonsense",
    "He is on cocaine and he cannot be trusted with this",
    "He is a very nice man and has a charming personality",
    "Let us go out for dinner",
    "We went to Barcelona for the weekend. We have a lot of things to tell you.",
]

for source_sentence in source_sentences:
    # inference_on = [0=Regular model on CPU, 1=Regular model on GPU, 2=Quantized model on CPU]
    target_sentence = sf.transfer(source_sentence, inference_on=1, quality_filter=0.95, max_candidates=5)
    print("[Formal] ", source_sentence)
    if target_sentence is not None:
        print("[Casual] ", target_sentence)
    else:
        print("No good quality transfers available !")
    print("-" * 100)
@PrithivirajDamodaran (Owner) commented

Leverage the "inference_on" parameter; I have updated it to make multi-GPU usage more intuitive. -1 is reserved for CPU and 0 through 998 are reserved for GPUs. The following snippet will get you the number of visible CUDA devices you have.

import torch

num_of_gpus = torch.cuda.device_count()
print(num_of_gpus)

You just have to pass a value in range(num_of_gpus), i.e. 0 to <your_max_devices>, to inference_on. Behind the scenes I will be using the CUDA devices as cuda:0, cuda:1, etc., up to num_of_gpus - 1.

Just write a function that wraps Styleformer inference with the device index as one of the params and invoke it using simple Python multiprocessing, as sketched below. The number of processes can be equal to the number of devices. Each process will run Styleformer inference with its respective device index, say P0 on cuda:0, P1 on cuda:1, and so on. Handle how you want to store the inference results yourself.
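
For reference, a minimal sketch of that approach (the run_on_device helper, the chunking, and the example sentences here are only illustrative, not part of Styleformer):

# A minimal sketch: one worker process per GPU, each pinned to its own device
# via the inference_on index (0 -> cuda:0, 1 -> cuda:1, ...).
import multiprocessing as mp
import torch
from styleformer import Styleformer

def run_on_device(device_index, sentences):
    # Each worker builds its own Styleformer instance and runs its shard.
    sf = Styleformer(style=1)
    results = []
    for sentence in sentences:
        target = sf.transfer(sentence, inference_on=device_index,
                             quality_filter=0.95, max_candidates=5)
        results.append((sentence, target))
    return results

if __name__ == "__main__":
    source_sentences = ["Please leave the room now", "Let us go out for dinner"]
    num_of_gpus = torch.cuda.device_count()
    # One chunk of the workload per visible GPU.
    chunks = [source_sentences[i::num_of_gpus] for i in range(num_of_gpus)]
    # CUDA requires the "spawn" start method in child processes.
    with mp.get_context("spawn").Pool(processes=num_of_gpus) as pool:
        all_results = pool.starmap(run_on_device, enumerate(chunks))
    print(all_results)

Each worker loads its own copy of the model, so expect the usual per-process model memory and startup cost.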

@pratikchhapolika (Author) commented

So for 4 GPUs, it would be:

target_sentence = sf.transfer(source_sentence, inference_on=4, quality_filter=0.95, max_candidates=5)


mhillebrand commented Apr 25, 2022

@pratikchhapolika It sounds like you'll need to fire up a separate process for each GPU and pass in inference_on=0, inference_on=1, inference_on=2, and inference_on=3, respectively, using multiprocessing.

@PrithivirajDamodaran What I would like to know is how one can batchify Styleformer inference tasks to make efficient use of GPUs that have 48GB or 80GB each.

@pratikchhapolika (Author) commented

So for 4 GPUs, it would be:

target_sentence = sf.transfer(source_sentence, inference_on=4, quality_filter=0.95, max_candidates=5)

@PrithivirajDamodaran can you please confirm this?

Also, will a distributed launch work here? Like this:

python -m torch.distributed.launch --nproc_per_node 4 inference.py
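
i.e., each launched process would read its local rank and map it to inference_on, roughly like this (the --local_rank / LOCAL_RANK handling below follows the standard launcher conventions and is not Styleformer-specific):

# Rough sketch: map the launcher-provided local rank to inference_on.
# Older launchers pass --local_rank as an argument; newer ones set the
# LOCAL_RANK environment variable instead.
import os
import argparse
from styleformer import Styleformer

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int,
                    default=int(os.environ.get("LOCAL_RANK", 0)))
args = parser.parse_args()

sf = Styleformer(style=1)
# Each launched process would handle its own shard of the data on its own GPU.
target = sf.transfer("Please leave the room now",
                     inference_on=args.local_rank,
                     quality_filter=0.95, max_candidates=5)
print(f"[cuda:{args.local_rank}]", target)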

@PrithivirajDamodaran (Owner) commented

Yes, it can be batched. Will add that patch now.

@mhillebrand commented


@PrithivirajDamodaran How's the batch patch coming along?
