Can’t build GPU docker for Nvidia Jetson devices #1685

Open
Dennis-Johnson opened this issue Aug 9, 2023 · 2 comments

@Dennis-Johnson

How did you install ODM? (Docker, installer, natively, ...)?

Docker. I tried both the original gpu.Dockerfile and a modified Dockerfile (included below).

What is the problem?

I can't build ODM on a Nvidia Jetson (Tegra SoC) device using the default gpu.Dockerfile.

What should be the expected behavior? If this is a feature request, please describe in detail the changes you think should be made to the code, citing files and lines where changes should be made, if possible.

I want to test ODM on Nvidia Jetson devices. These devices have an integrated GPU on the SoC, unlike the PCIe-connected GPUs on x86 desktops. As a result, nvidia-smi cannot be used to detect the presence of a GPU; the tegrastats utility is typically used instead to find GPU and CUDA info.

My use case is to accelerate SIFT feature detection on the GPU. Is the OpenCV dependency needed for this in the first place? Can someone suggest parts of the code-base to modify to test whether the SIFT bits have been built with CUDA support properly?
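As a starting point for testing, here is a minimal sketch of a GPU presence check that does not depend on nvidia-smi alone. It assumes the Jetson host exposes /etc/nv_tegra_release (a file shipped with L4T; inside a container it may not be visible unless it is mounted). The function name and return labels are hypothetical and are not part of ODM.

```python
import os
import shutil


def detect_nvidia_gpu(tegra_release_file="/etc/nv_tegra_release", which=shutil.which):
    """Return "nvidia-smi", "tegra", or None.

    Hypothetical helper, not ODM code. `which` is injectable so the check
    can be exercised on machines without any NVIDIA tooling.
    """
    # Desktop/server case: the driver ships the nvidia-smi binary.
    if which("nvidia-smi") is not None:
        return "nvidia-smi"
    # Jetson case (assumption): L4T writes a release file on the host.
    if os.path.isfile(tegra_release_file):
        return "tegra"
    return None


if __name__ == "__main__":
    print("Detected:", detect_nvidia_gpu())
```

This only tells you that a GPU platform is likely present; it does not confirm that the SIFT code was compiled with CUDA support.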

How can we reproduce this? What steps did you do to trigger the problem?

On trying to build the original gpu.Dockerfile, this error is shown.

# Original dockerfile with GPU options.
docker build -t test_odm -f gpu.Dockerfile .

...
Setting up libboost-program-options1.71.0:arm64 (1.71.0-6ubuntu6) ...
Processing triggers for libc-bin (2.31-0ubuntu9.9) ...
Traceback (most recent call last):
  File "/code/run.py", line 15, in <module>
    from opendm.utils import get_processing_results_paths, rm_r
  File "/code/opendm/utils.py", line 5, in <module>
    from opendm.photo import find_largest_photo_dims
  File "/code/opendm/photo.py", line 17, in <module>
    from opendm import get_image_size
  File "/code/opendm/get_image_size.py", line 2, in <module>
    import cv2
  File "/code/SuperBuild/install/lib/python3.8/dist-packages/cv2/__init__.py", line 96, in <module>
    bootstrap()
  File "/code/SuperBuild/install/lib/python3.8/dist-packages/cv2/__init__.py", line 86, in bootstrap
    import cv2
ImportError: libavcodec.so.58: cannot open shared object file: No such file or directory
The command '/bin/sh -c bash configure.sh installruntimedepsonly   && apt-get clean   && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*   && bash run.sh --help   && bash -c "eval $(python3 /code/opendm/context.py) && python3 -c 'from opensfm import io, pymap'"' returned a non-zero code: 1

I tried apt install libavutil-dev before this step in the dockerfile, but it's not found.
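To narrow down an ImportError like the one above, it can help to check which shared libraries the dynamic loader can actually resolve inside the image, without importing cv2 at all. This is a rough sketch using ctypes; the list of sonames is my guess based on the libavcodec58, libavformat58 and libswscale5 package names, and `can_load` is a made-up helper.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the library (or one of its own dependencies) is missing.
        return False


if __name__ == "__main__":
    # Assumed sonames; adjust to whatever the traceback complains about.
    for name in ("libavcodec.so.58", "libavformat.so.58", "libswscale.so.5"):
        print(name, "OK" if can_load(name) else "MISSING")
```

Running it in the runtime stage of the image, right before the `bash run.sh --help` step, should show whether the FFmpeg libraries were installed there.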

Since the CUDA version is coupled with the OS on Tegra devices, I switched to what I assumed is a similar base image, one that has the required CUDA runtime and libraries.

# Modified gpu.Dockerfile
FROM nvcr.io/nvidia/l4t-cuda:11.4.19-runtime

# Env variables
ENV DEBIAN_FRONTEND=noninteractive \
    PYTHONPATH="$PYTHONPATH:/code/SuperBuild/install/lib/python3.9/dist-packages:/code/SuperBuild/install/lib/python3.8/dist-packages:/code/SuperBuild/install/bin/opensfm" \
    LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/code/SuperBuild/install/lib"

# Prepare directories
WORKDIR /code

# Copy everything
COPY . ./

RUN PORTABLE_INSTALL=YES GPU_INSTALL=YES bash configure.sh install

# Install shared libraries that we depend on via APT, but *not*
# the -dev packages to save space!
# Also run a smoke test on ODM and OpenSfM
RUN bash configure.sh installruntimedepsonly \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
RUN bash run.sh --help
RUN bash -c "eval $(python3 /code/opendm/context.py) && python3 -c 'from opensfm import io, pymap'"

# Entry point
ENTRYPOINT ["python3", "/code/run.py"]

Still no luck: with this modified Dockerfile I got a different error during the OpenCV build step.

....
Error: 
-- Installing: /code/SuperBuild/install/share/opencv4/lbpcascades/lbpcascade_frontalface_improved.xml
-- Installing: /code/SuperBuild/install/share/opencv4/lbpcascades/lbpcascade_profileface.xml
-- Installing: /code/SuperBuild/install/share/opencv4/lbpcascades/lbpcascade_silverware.xml
[ 68%] Completed 'opencv'
[ 68%] Built target opencv
make: *** [Makefile:84: all] Error 2
The command '/bin/sh -c PORTABLE_INSTALL=YES GPU_INSTALL=YES bash configure.sh install' returned a non-zero code: 2

System Info

$ cat /etc/lsb-release

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.5 LTS"

# Info from jtop utility
Machine: aarch64 Jetson Xavier AGX
Release: 5.10.104-tegra
System Python version: 3.8.10

Linux4Tegra : v35.1.0
Jetpack: 5.0.2

@pierotofy
Member

pierotofy commented Aug 25, 2023

Strange that the build fails. In any case, to test things you shouldn't need to rebuild the image: use start-dev-env.sh and pass the IMAGE=opendronemap/odm:gpu and GPU=YES parameters to the script.

Edit: ah, never mind, I see that you're running on ARM, not Intel (and we don't publish gpu images for ARM).

I'm not really sure, but I'm guessing the build might have issues with the CPU architecture rather than the GPU. I'd start by figuring out why libavcodec is missing or why it can't be installed.

@zfb132
Contributor

zfb132 commented Oct 12, 2023

Hi @pierotofy, I got the same error on an amd64 machine.

66.83     from opendm.photo import find_largest_photo_dims
66.83   File "/code/opendm/photo.py", line 17, in <module>
66.83     from opendm import get_image_size
66.83   File "/code/opendm/get_image_size.py", line 2, in <module>
66.83     import cv2
66.83   File "/code/SuperBuild/install/lib/python3.8/dist-packages/cv2/__init__.py", line 96, in <module>
66.83     bootstrap()
66.83   File "/code/SuperBuild/install/lib/python3.8/dist-packages/cv2/__init__.py", line 86, in bootstrap
66.83     import cv2
66.83 ImportError: libavcodec.so.58: cannot open shared object file: No such file or directory

I finally found the reason (at least in my case). I am new to the automated building of ODM, so this may be wrong. I am just sharing the steps I took to find and fix the bug.

  1. I searched the log and could find the step that installs libavcodec, which is strange.
  2. After testing again, I found that libavcodec is installed only in nvidia/cuda:11.2.2-devel-ubuntu20.04, not in
    nvidia/cuda:11.2.2-runtime-ubuntu20.04. This means the first stage (build) has no problem, but the second stage (runtime) is broken.
  3. Going back to the log, I found the error E: Unable to locate package libproj19. libproj19 is installed together with libavcodec58 here:

    ODM/snap/snapcraft.yaml

    Lines 93 to 107 in 38af615

    stage-packages:
    - libavcodec58
    - libavformat58
    - libflann1.9
    - libgtk2.0-0
    - libjpeg-turbo8
    - libopenjpip7
    - liblapack3
    - libpng16-16
    - libproj19
    - libswscale5
    - libtbb2
    - libtiff5
    - libwebpdemux2
    - libxext6

We can also see that the first stage (build) uses libproj-dev (no specific version). That's why the error only appears when running configure.sh installruntimedepsonly.
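The cross-check in step 3 can be sketched roughly as follows: collect the stage-packages entries from the snapcraft.yaml text and report which of them apt said it could not locate in the build log. This is a naive line parser rather than a real YAML parser, and all function names here are hypothetical.

```python
import re


def stage_packages(snapcraft_text):
    """Collect the '- name' entries that follow a 'stage-packages:' line."""
    packages, in_block = [], False
    for line in snapcraft_text.splitlines():
        stripped = line.strip()
        if stripped == "stage-packages:":
            in_block = True
        elif in_block and stripped.startswith("- "):
            packages.append(stripped[2:].strip())
        elif in_block and stripped:
            # Any other non-empty line ends the list.
            in_block = False
    return packages


def missing_from_log(log_text):
    """Package names that apt reported as 'Unable to locate'."""
    return set(re.findall(r"E: Unable to locate package (\S+)", log_text))


def unavailable_runtime_deps(snapcraft_text, log_text):
    """Stage packages that the build log says apt could not find."""
    missing = missing_from_log(log_text)
    return [pkg for pkg in stage_packages(snapcraft_text) if pkg in missing]
```

Because all of these packages are requested together, one package that cannot be located is enough to leave libavcodec58 uninstalled as well.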

On Ubuntu 20.04 the default version of libproj is libproj15, so the following is used to make sure libproj19 can be installed:

ODM/configure.sh

Lines 58 to 63 in 38af615

if [[ "$UBUNTU_VERSION" == *"20.04"* ]]; then
    echo "Enabling PPA for Ubuntu GIS"
    sudo $APT_GET install -y -qq --no-install-recommends software-properties-common
    sudo add-apt-repository -y ppa:ubuntugis/ubuntugis-unstable
    sudo $APT_GET update
fi

Going back to the log again, I found this output:

#10 29.87 Cannot add PPA: 'ppa:~ubuntugis/ubuntu/ubuntugis-unstable'.
#10 29.87 The team named '~ubuntugis' has no PPA named 'ubuntu/ubuntugis-unstable'
#10 29.87 Please choose from the following available PPAs:
#10 29.87  * 'ppa':  ubuntugis-stable
#10 29.87  * 'ubuntugis-experimental':  ubuntugis-experimental
#10 29.87  * 'ubuntugis-testing':  ubuntugis-testing
#10 29.87  * 'ubuntugis-unstable':  ubuntugis-unstable

I am not sure why this happens; it is odd. I then created a Docker container from nvidia/cuda:11.2.2-runtime-ubuntu20.04 on my local machine and found that the problem can be fixed with the following script:

    if [[ "$UBUNTU_VERSION" == *"20.04"* ]]; then
        echo "Enabling PPA for Ubuntu GIS"
        sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-key 089EBE08314DF160
        sudo $APT_GET install -y -qq --no-install-recommends software-properties-common ca-certificates apt-transport-https gnupg
        sudo add-apt-repository -y ppa:ubuntugis/ubuntugis-unstable
        sudo $APT_GET update
    fi

@Dennis-Johnson Can you try this to see if it works for you?
If it works, I will send a PR to solve this issue.
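One way to confirm that the PPA was really enabled after a change like this is to ask apt whether libproj19 now has an install candidate. The sketch below parses the output of `apt-cache policy <package>`; the helper names are hypothetical, and it assumes the usual "Candidate:" line in that output.

```python
import re
import subprocess


def candidate_version(policy_output):
    """Return the Candidate version from `apt-cache policy` output, or None."""
    match = re.search(r"^\s*Candidate:\s*(\S+)", policy_output, re.MULTILINE)
    if match is None or match.group(1) == "(none)":
        return None
    return match.group(1)


def has_install_candidate(package):
    """True if apt knows a version of `package` that it could install."""
    result = subprocess.run(
        ["apt-cache", "policy", package],
        capture_output=True, text=True, check=False,
    )
    return candidate_version(result.stdout) is not None


if __name__ == "__main__":
    print("libproj19 installable:", has_install_candidate("libproj19"))
```

Run it inside the runtime container after the apt update step; a False result would suggest the PPA still was not added.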
