pix2pix architecture differs from original paper #1614

Open
xu-minghao317 opened this issue Nov 11, 2023 · 0 comments
@xu-minghao317
According to torchsummary, the default generator used in pix2pix has the following architecture:

        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 128, 128]           3,072
         LeakyReLU-2         [-1, 64, 128, 128]               0
            Conv2d-3          [-1, 128, 64, 64]         131,072
       BatchNorm2d-4          [-1, 128, 64, 64]             256
         LeakyReLU-5          [-1, 128, 64, 64]               0
            Conv2d-6          [-1, 256, 32, 32]         524,288
       BatchNorm2d-7          [-1, 256, 32, 32]             512
         LeakyReLU-8          [-1, 256, 32, 32]               0
            Conv2d-9          [-1, 512, 16, 16]       2,097,152
      BatchNorm2d-10          [-1, 512, 16, 16]           1,024
        LeakyReLU-11          [-1, 512, 16, 16]               0
           Conv2d-12            [-1, 512, 8, 8]       4,194,304
      BatchNorm2d-13            [-1, 512, 8, 8]           1,024
        LeakyReLU-14            [-1, 512, 8, 8]               0
           Conv2d-15            [-1, 512, 4, 4]       4,194,304
      BatchNorm2d-16            [-1, 512, 4, 4]           1,024
        LeakyReLU-17            [-1, 512, 4, 4]               0
           Conv2d-18            [-1, 512, 2, 2]       4,194,304
      BatchNorm2d-19            [-1, 512, 2, 2]           1,024
        LeakyReLU-20            [-1, 512, 2, 2]               0
           Conv2d-21            [-1, 512, 1, 1]       4,194,304
             ReLU-22            [-1, 512, 1, 1]               0
  ConvTranspose2d-23            [-1, 512, 2, 2]       4,194,304
      BatchNorm2d-24            [-1, 512, 2, 2]           1,024
UnetSkipConnectionBlock-25           [-1, 1024, 2, 2]               0
             ReLU-26           [-1, 1024, 2, 2]               0
  ConvTranspose2d-27            [-1, 512, 4, 4]       8,388,608
      BatchNorm2d-28            [-1, 512, 4, 4]           1,024
          Dropout-29            [-1, 512, 4, 4]               0
UnetSkipConnectionBlock-30           [-1, 1024, 4, 4]               0
             ReLU-31           [-1, 1024, 4, 4]               0
  ConvTranspose2d-32            [-1, 512, 8, 8]       8,388,608
      BatchNorm2d-33            [-1, 512, 8, 8]           1,024
          Dropout-34            [-1, 512, 8, 8]               0
UnetSkipConnectionBlock-35           [-1, 1024, 8, 8]               0
             ReLU-36           [-1, 1024, 8, 8]               0
  ConvTranspose2d-37          [-1, 512, 16, 16]       8,388,608
      BatchNorm2d-38          [-1, 512, 16, 16]           1,024
          Dropout-39          [-1, 512, 16, 16]               0
UnetSkipConnectionBlock-40         [-1, 1024, 16, 16]               0
             ReLU-41         [-1, 1024, 16, 16]               0
  ConvTranspose2d-42          [-1, 256, 32, 32]       4,194,304
      BatchNorm2d-43          [-1, 256, 32, 32]             512
UnetSkipConnectionBlock-44          [-1, 512, 32, 32]               0
             ReLU-45          [-1, 512, 32, 32]               0
  ConvTranspose2d-46          [-1, 128, 64, 64]       1,048,576
      BatchNorm2d-47          [-1, 128, 64, 64]             256
UnetSkipConnectionBlock-48          [-1, 256, 64, 64]               0
             ReLU-49          [-1, 256, 64, 64]               0
  ConvTranspose2d-50         [-1, 64, 128, 128]         262,144
      BatchNorm2d-51         [-1, 64, 128, 128]             128
UnetSkipConnectionBlock-52        [-1, 128, 128, 128]               0
             ReLU-53        [-1, 128, 128, 128]               0
  ConvTranspose2d-54          [-1, 3, 256, 256]           6,147
             Tanh-55          [-1, 3, 256, 256]               0
UnetSkipConnectionBlock-56          [-1, 3, 256, 256]               0
    UnetGenerator-57          [-1, 3, 256, 256]               0
     DataParallel-58          [-1, 3, 256, 256]               0
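
For reference, here is a minimal sketch of how such a summary can be produced (the call below follows `define_G` in this repo's `models/networks.py`, but the exact arguments are my assumptions):

```python
import torch
from torchsummary import summary
from models.networks import define_G

# 'unet_256': the pix2pix default U-Net generator for 256x256 images,
# with 8 downsampling levels; gpu_ids=[0] wraps the net in DataParallel,
# which is why that module appears as the last row of the summary
netG = define_G(input_nc=3, output_nc=3, ngf=64, netG='unet_256',
                norm='batch', use_dropout=True, gpu_ids=[0])
summary(netG, input_size=(3, 256, 256))
```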

However, the original paper, "Image-to-Image Translation with Conditional Adversarial Networks", says:

Let Ck denote a Convolution-BatchNorm-ReLU layer with k filters. CDk denotes a Convolution-BatchNorm-Dropout-ReLU layer with a dropout rate of 50%. All convolutions are 4 × 4 spatial filters applied with stride 2. Convolutions in the encoder, and in the discriminator, downsample by a factor of 2, whereas in the decoder they upsample by a factor of 2. The encoder-decoder architecture consists of:
encoder: C64-C128-C256-C512-C512-C512-C512-C512
U-Net decoder: CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128
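
Read literally, the paper's Ck / CDk blocks would look like the sketch below (my interpretation of the notation, not code from this repo; `padding=1` is an assumption, since the paper only fixes 4 × 4 filters with stride 2):

```python
import torch.nn as nn

def C(in_ch, k, down=True):
    # Ck: Convolution-BatchNorm-ReLU with k 4x4 filters, stride 2;
    # encoder blocks downsample (Conv2d), decoder blocks upsample (ConvTranspose2d)
    conv = nn.Conv2d if down else nn.ConvTranspose2d
    return nn.Sequential(
        conv(in_ch, k, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(k),
        nn.ReLU(inplace=True),
    )

def CD(in_ch, k, down=True):
    # CDk: same as Ck plus 50% dropout
    conv = nn.Conv2d if down else nn.ConvTranspose2d
    return nn.Sequential(
        conv(in_ch, k, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(k),
        nn.Dropout(0.5),
        nn.ReLU(inplace=True),
    )
```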

Why, in this repo, is it:
encoder: C64(no batchnorm)-C128-C256-C512-C512-C512-C512-C512
U-Net decoder: C512(no batchnorm)-C1024-CD1024-CD1024-CD1024-CD512-C256-C128

The decoder in particular is quite different. Does this repo provide a tweaked version?
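
For context, this repo builds the U-Net recursively out of UnetSkipConnectionBlock modules, roughly as sketched below (condensed from `models/networks.py`; `norm_layer` arguments omitted for brevity, so this is not a verbatim copy). Each non-outermost block returns `torch.cat([x, submodule(x)], 1)`, which is where the 1024-channel rows in the summary above come from:

```python
from models.networks import UnetSkipConnectionBlock

ngf, input_nc, output_nc = 64, 3, 3

# innermost block: its downsampling conv has no norm (cf. Conv2d-21 above)
block = UnetSkipConnectionBlock(ngf * 8, ngf * 8, innermost=True)
# three intermediate ngf*8 blocks; these carry the dropout
for _ in range(3):
    block = UnetSkipConnectionBlock(ngf * 8, ngf * 8, submodule=block,
                                    use_dropout=True)
# progressively wider outer blocks, without dropout
block = UnetSkipConnectionBlock(ngf * 4, ngf * 8, submodule=block)
block = UnetSkipConnectionBlock(ngf * 2, ngf * 4, submodule=block)
block = UnetSkipConnectionBlock(ngf, ngf * 2, submodule=block)
# outermost block: no norm on the first conv, Tanh on the output
model = UnetSkipConnectionBlock(output_nc, ngf, input_nc=input_nc,
                                submodule=block, outermost=True)
```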
