Docker build #122

Open
meonkeys opened this issue Nov 5, 2023 · 4 comments
@meonkeys

meonkeys commented Nov 5, 2023

Just thought it would be handy to have a Docker image for this tool. I've been unable to get it working so far but I'll keep trying. If anyone else has it running in Docker, please share.

@meonkeys
Author

meonkeys commented Nov 7, 2023

I got an image built. It's not clean enough for a pull request but I'll share what I've got anyway. Maybe someone else can pick this up and contribute it (assuming the maintainers want it).

I'm just creating a Dockerfile in a working copy (local clone) of this repository (HEAD at 2bdffc6) and building with Docker. Here's the Dockerfile:

# FIXME: Makes a huge image.
# TODO: Optimize with a multi-stage build, perhaps also using venv.

# Pin to 3.10-bookworm to get Python 3.10
# because https://github.com/MahmoudAshraf97/whisper-diarization/issues/90
FROM python:3.10-bookworm

ARG WD_USER=joe
ARG WD_UID=1000
ARG WD_GROUP=joe
ARG WD_GID=1000

# We rarely see a full upgrade in a Dockerfile. Why?
# && apt-get --assume-yes dist-upgrade \
RUN apt-get update \
  && apt-get --assume-yes --no-install-recommends install \
  cython3 \
  ffmpeg \
  unzip \
  wget \
  && rm -rf /var/lib/apt/lists/*

WORKDIR /usr/src/app

COPY . .

RUN addgroup --gid $WD_GID $WD_GROUP \
  && adduser --uid $WD_UID --gid $WD_GID --shell /bin/bash --no-create-home $WD_USER \
  && chown -R $WD_USER:$WD_GROUP /usr/src/app

USER $WD_USER:$WD_GROUP

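# python -m venv creates the directory itself, so no separate mkdir is needed.
# --no-cache-dir on both pip calls keeps pip's download cache out of the image.
RUN python -m venv venv \
  && . venv/bin/activate \
  && pip install --no-cache-dir Cython \
  && pip install --no-cache-dir --requirement requirements.txt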

Build from the root of the working copy with: docker build --tag whisper-diarization . (the trailing dot is the build context, not punctuation). The rest assumes a Bash shell on Linux, or something compatible.
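If your host UID or GID is not 1000, files written to the mounted directories may end up owned by the wrong user. A minimal sketch of passing your own IDs at build time follows. It assumes only the WD_UID and WD_GID build arguments declared in the Dockerfile above; the build_args helper is a hypothetical name used for illustration.

```shell
# Sketch: emit --build-arg flags so the container user gets the host user's UID/GID.
# WD_UID and WD_GID are the ARG names from the Dockerfile; build_args is a made-up helper.
build_args() {
  printf '%s %s' "--build-arg WD_UID=$(id -u)" "--build-arg WD_GID=$(id -g)"
}

# Print the full command for review; run it from the repository root.
echo "docker build $(build_args) --tag whisper-diarization ."
```

Note that addgroup will likely fail if the chosen GID already exists in the base image, so this only helps when your host GID is free inside python:3.10-bookworm.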

As user joe with UID 1000 and GID 1000, run with, for example:

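BASE="$HOME/whisper-diarization"
APP=/usr/src/app
# Host directories for the audio/transcripts and for the container user's cache and config.
mkdir -p "$BASE/data" "$BASE/HOME_CACHE" "$BASE/HOME_CONFIG"
# Put the recording where the container will see it as /data/recording.mp3.
mv /tmp/recording.mp3 "$BASE/data/"
docker run --rm -it \
  -v "$BASE/data:/data" \
  -v "$BASE/HOME_CONFIG:$APP/.config" \
  -v "$BASE/HOME_CACHE:$APP/.cache" \
  --user joe:joe \
  whisper-diarization \
  bash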

Now you're in the container at a non-root shell prompt, presumably. Run:

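# The user was created with --no-create-home, so point HOME at the app directory;
# that way .cache and .config resolve to the mounted host directories.
export HOME=/usr/src/app
source venv/bin/activate
python diarize_parallel.py -a /data/recording.mp3
exit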

Now, inspect and manually clean up $BASE/data/recording.txt on the host.
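The interactive steps above can also be collapsed into one non-interactive command. This is a sketch, not something tested in the thread: it reuses the same mounts, user, and in-container commands, passes HOME with docker run's -e flag, and wraps everything in a hypothetical diarize function. The DOCKER variable is likewise an invented override, used here only so the command can be previewed with echo.

```shell
#!/usr/bin/env bash
# Sketch: run the diarization without opening an interactive shell in the container.
BASE="${BASE:-$HOME/whisper-diarization}"
APP=/usr/src/app

# diarize FILE: FILE is an audio file name inside $BASE/data on the host.
diarize() {
  local audio="$1"
  # DOCKER is a hypothetical override; set DOCKER=echo to print instead of run.
  "${DOCKER:-docker}" run --rm \
    -e HOME="$APP" \
    -v "$BASE/data:/data" \
    -v "$BASE/HOME_CONFIG:$APP/.config" \
    -v "$BASE/HOME_CACHE:$APP/.cache" \
    --user joe:joe \
    whisper-diarization \
    bash -c 'source venv/bin/activate && python diarize_parallel.py -a "$1"' _ "/data/$audio"
}
```

Calling diarize recording.mp3 should behave like the manual session; DOCKER=echo diarize recording.mp3 only prints the arguments that would be passed to docker.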

@cvette

cvette commented Nov 9, 2023

Don't forget to pass --gpus all to docker run if you want to use your GPU.
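As a rough sketch of how that flag might be added only when a GPU is likely available: the gpu_flags helper below is a hypothetical name, and using the presence of nvidia-smi on the host as the check is an assumption, not a guarantee. The --gpus option also needs the NVIDIA Container Toolkit installed on the host.

```shell
# Sketch: emit "--gpus all" only if nvidia-smi is on the host's PATH.
# gpu_flags is a made-up helper; nvidia-smi presence is just a heuristic.
gpu_flags() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    printf '%s' '--gpus all'
  fi
}

# Show which flags would be added on this machine (empty means CPU only).
echo "GPU flags: $(gpu_flags)"
```

The result can be spliced into the earlier command, for example docker run --rm -it $(gpu_flags) followed by the same -v, --user, and image arguments.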

@transcriptionstream
Contributor

Just released "transcription stream" on GitHub today, which includes a docker image that runs diarize.py. Takes me about 15 minutes to build, but works great and is fast/automated. Would love to get your thoughts: https://github.com/transcriptionstream/transcriptionstream

@occult

occult commented Apr 25, 2024

It took me 30 minutes to build and the image is 7.5 GB, but it works. Thanks for sharing :)
