
How to do inference using multiple GPUs for Styleformer #10

pratikchhapolika opened this issue Apr 25, 2022 · 6 comments

pratikchhapolika commented Apr 25, 2022

I am using this model to run inference on 1 million data points using 4 A100 GPUs. I am launching the inference.py code in a Google Vertex AI container.

How can I make the inference code utilise all 4 GPUs so that inference is as fast as possible?

Here is the code I use in inference.py:

import warnings
import torch
from styleformer import Styleformer

warnings.filterwarnings("ignore")

# style = [0=Casual to Formal, 1=Formal to Casual, 2=Active to Passive, 3=Passive to Active, etc.]
sf = Styleformer(style=1)

def set_seed(seed):
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(1212)

source_sentences = [
    "I would love to meet attractive men in town",
    "Please leave the room now",
    "It is a delicious icecream",
    "I am not paying this kind of money for that nonsense",
    "He is on cocaine and he cannot be trusted with this",
    "He is a very nice man and has a charming personality",
    "Let us go out for dinner",
    "We went to Barcelona for the weekend. We have a lot of things to tell you.",
]

for source_sentence in source_sentences:
    # inference_on = [0=Regular model on CPU, 1=Regular model on GPU, 2=Quantized model on CPU]
    target_sentence = sf.transfer(source_sentence, inference_on=1, quality_filter=0.95, max_candidates=5)
    print("[Formal] ", source_sentence)
    if target_sentence is not None:
        print("[Casual] ", target_sentence)
    else:
        print("No good quality transfers available !")
    print("-" * 100)
@PrithivirajDamodaran (Owner) commented

Leverage the "inference_on" parameter; I have updated it to make multi-GPU usage more intuitive. -1 is reserved for CPU and 0 through 998 are reserved for GPUs. The following snippet will get you the number of visible CUDA devices you have.

import torch

num_of_gpus = torch.cuda.device_count()
print(num_of_gpus)

You just have to pass a value in range(num_of_gpus), i.e. 0 to <your_max_devices>, to inference_on. Behind the scenes I will be using the CUDA devices as cuda:0, cuda:1, etc., up to num_of_gpus - 1.

Just write a function that wraps Styleformer inference with the device index as one of the params and invoke it using simple Python multiprocessing, as sketched below. The number of processes can be equal to the number of devices. Each process will run Styleformer inference with its respective device index, say P0 on cuda:0, P1 on cuda:1, and so on. Handle how you want to store the inference results yourself.
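
For reference, a minimal sketch of that approach (the run_on_device helper, the chunking, and the example sentences here are only illustrative, not part of Styleformer):

# A minimal sketch: one worker process per GPU, each pinned to its own device
# via the inference_on index (0 -> cuda:0, 1 -> cuda:1, ...).
import multiprocessing as mp
import torch
from styleformer import Styleformer

def run_on_device(device_index, sentences):
    # Each worker builds its own Styleformer instance and runs its shard.
    sf = Styleformer(style=1)
    results = []
    for sentence in sentences:
        target = sf.transfer(sentence, inference_on=device_index,
                             quality_filter=0.95, max_candidates=5)
        results.append((sentence, target))
    return results

if __name__ == "__main__":
    source_sentences = ["Please leave the room now", "Let us go out for dinner"]
    num_of_gpus = torch.cuda.device_count()
    # One chunk of the workload per visible GPU.
    chunks = [source_sentences[i::num_of_gpus] for i in range(num_of_gpus)]
    # CUDA requires the "spawn" start method in child processes.
    with mp.get_context("spawn").Pool(processes=num_of_gpus) as pool:
        all_results = pool.starmap(run_on_device, enumerate(chunks))
    print(all_results)

Each worker loads its own copy of the model, so expect the usual per-process model memory and startup cost.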

@pratikchhapolika (Author) commented

So for 4 GPUs, it would be:

target_sentence = sf.transfer(source_sentence, inference_on=4, quality_filter=0.95, max_candidates=5)


mhillebrand commented Apr 25, 2022

@pratikchhapolika It sounds like you'll need to fire up a separate process for each GPU and pass in inference_on=0, inference_on=1, inference_on=2, and inference_on=3, respectively, using multiprocessing.

@PrithivirajDamodaran What I would like to know is how one can batchify Styleformer inference tasks to make efficient use of GPUs that have 48GB or 80GB each.

@pratikchhapolika (Author) commented

So for 4 GPUs, it would be:

target_sentence = sf.transfer(source_sentence, inference_on=4, quality_filter=0.95, max_candidates=5)

@PrithivirajDamodaran can you please confirm this?

Also, will a distributed launch work here? Like this:

python -m torch.distributed.launch --nproc_per_node 4 inference.py
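
i.e., each launched process would read its local rank and map it to inference_on, roughly like this (the --local_rank / LOCAL_RANK handling below follows the standard launcher conventions and is not Styleformer-specific):

# Rough sketch: map the launcher-provided local rank to inference_on.
# Older launchers pass --local_rank as an argument; newer ones set the
# LOCAL_RANK environment variable instead.
import os
import argparse
from styleformer import Styleformer

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int,
                    default=int(os.environ.get("LOCAL_RANK", 0)))
args = parser.parse_args()

sf = Styleformer(style=1)
# Each launched process would handle its own shard of the data on its own GPU.
target = sf.transfer("Please leave the room now",
                     inference_on=args.local_rank,
                     quality_filter=0.95, max_candidates=5)
print(f"[cuda:{args.local_rank}]", target)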

@PrithivirajDamodaran (Owner) commented

Yes, it can be batched. Will add that patch now.

@mhillebrand commented


@PrithivirajDamodaran How's the batch patch coming along?
