WhisperDock - Speech-to-Text Service 🎤

This repository hosts a Dockerized speech-to-text service that combines Whisper C++ with Python to expose an HTTP API for transcribing audio files.

Background and Motivation

Access to efficient and robust tools for everyday applications is essential in the rapidly advancing field of machine learning. Speech-to-text transcription is one of the areas that has seen significant improvements, but deploying these models quickly and efficiently remains a challenge. Whisper C++, a high-performance transcription tool, has emerged as a powerful option, yet it still requires a streamlined pathway to deployment.

This repository was created out of a necessity to bridge the gap between the development of speech-to-text models and their deployment in real-world applications. Many existing solutions require extensive setup and intricate knowledge of systems and can be time-consuming to deploy, creating a barrier for developers, researchers, and businesses who want to integrate transcription capabilities into their services.

The Speech-to-Text Transcription Service aims to provide a fast, reliable, and easy-to-use solution for deploying Whisper C++ models. By containerizing the service with Docker, we significantly reduce the complexity of deployment and make it possible to launch a transcription service that is both scalable and accessible.

Here are some of the key motivations behind this project:

Speed of Deployment: By providing a Dockerized solution, we enable rapid deployment of the transcription service, allowing users to go from zero to a fully functioning service in minutes.
Ease of Use: The provided APIs and Docker setup are designed to be as simple as possible, requiring minimal configuration and allowing for easy integration into existing workflows.
Accessibility: Making Whisper C++ easily deployable opens up more opportunities for developers and organizations of all sizes to utilize state-of-the-art transcription technology.
Continuous Integration and Delivery: With GitHub Actions, updates and improvements are integrated automatically, keeping the service up to date with the latest changes in the Whisper C++ repository.

In contributing to this repository, I hope to empower individuals and organizations to harness the capabilities of Whisper C++ without the overhead of complex deployment processes, thus fostering innovation and development in the field of speech recognition.

Getting Started

Using Docker Image

For quick deployment, use the Docker images provided in the Docker registry.

For the latest stable version:

docker pull dublok/whisperdock:latest
docker run -p 5000:5000 dublok/whisperdock:latest

For the nightly build (unstable but with early access to new features):

docker pull dublok/whisperdock:main
docker run -p 5000:5000 dublok/whisperdock:main

The service should now be accessible at http://localhost:5000.
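
If you start the container from a script, it can help to wait until the port accepts connections before sending requests. The following is a minimal sketch using only the Python standard library; the function name wait_for_port and its timeout and interval parameters are illustrative and not part of this repository. It only checks that the TCP port is open, which does not guarantee that the transcription model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 5000) returns True once the port published by the docker run command above accepts connections.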

Building from Source

Clone the repository:

git clone https://github.com/ErcinDedeoglu/WhisperDock

Build the Docker image:

docker build -t whisperdock .

Run the container:

docker run -p 5000:5000 whisperdock

API Usage

To transcribe audio, make a POST request to the /transcribe endpoint with the audio file:

curl -X POST -F 'file=@/path/to/your/audio.wav' http://localhost:5000/transcribe

Ensure your audio file is in WAV format with a sample rate of 16 kHz.
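
The same request can be made from Python. The sketch below uses only the standard library, checks the sample rate before uploading, and sends the file under the form field name file, as in the curl example. The function names, the filename audio.wav placed in the form, and the audio/wav content type are assumptions made for illustration, not something this repository prescribes.

```python
import json
import urllib.request
import uuid
import wave
from typing import Tuple


def wav_sample_rate(path: str) -> int:
    """Read the sample rate from a WAV header using the standard library."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def build_multipart(field: str, filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"  # assumed content type
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, url: str = "http://localhost:5000/transcribe") -> dict:
    """POST a 16 kHz WAV file to the /transcribe endpoint and return the parsed JSON."""
    rate = wav_sample_rate(path)
    if rate != 16000:
        raise ValueError(f"expected a 16 kHz WAV file, got {rate} Hz")
    with open(path, "rb") as fh:
        body, content_type = build_multipart("file", "audio.wav", fh.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx status codes.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling transcribe("/path/to/your/audio.wav") against a running container returns the decoded JSON body. Files that are not valid WAV make wave.open raise wave.Error before any request is sent.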

Example Response

Upon successful transcription, the service will return a JSON response containing the transcription along with the timestamps for each transcribed segment. An example response might look like this:

{
  "transcription": [
    {
      "start_time": "00:00:00.000",
      "end_time": "00:00:03.000",
      "text": "Welcome to our speech-to-text service."
    },
    {
      "start_time": "00:00:03.500",
      "end_time": "00:00:05.000",
      "text": "This is a sample transcription."
    }
  ]
}

If there is an error in transcription, the service might return an error response like:

{
  "error": "Error in transcription"
}

Make sure to handle both success and error responses appropriately in your application.
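
One way to handle both cases in Python is sketched below. It assumes the response shapes shown in the examples above (a transcription list with start_time, end_time, and text, or a single error key); the function names are hypothetical, and the field names should be checked against what your deployment actually returns.

```python
from typing import Dict, List


def timestamp_to_seconds(stamp: str) -> float:
    """Convert an "HH:MM:SS.mmm" timestamp to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def read_segments(payload: Dict) -> List[Dict]:
    """Return the segments with numeric times, or raise if the service reported an error."""
    if "error" in payload:
        raise RuntimeError(payload["error"])
    segments = []
    for item in payload.get("transcription", []):
        segments.append({
            "start": timestamp_to_seconds(item["start_time"]),
            "end": timestamp_to_seconds(item["end_time"]),
            "text": item["text"],
        })
    return segments
```

Converting the timestamps to seconds makes it straightforward to compute segment durations or to align the text with other media.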

The responses shown above are illustrative. The exact field names and error messages depend on the version of the service you run, so check the actual output of your deployment before relying on them.

Development

Prerequisites

Docker
Python 3.8
C++ build tools (cmake, make, g++)
ffmpeg

Setup and Build

The Dockerfile in this repository details the steps to set up the environment and install dependencies necessary for running the transcription service.

Contributing

Contributions are welcome! If you wish to contribute, please create a pull request with your proposed changes or fixes.

Continuous Integration

This project uses GitHub Actions for continuous integration, which automates the following:

sync-whisper.yml: Synchronizes with the latest tag or commit of whisper.cpp.
publish-docker.yml: Automatically builds and pushes Docker images to the registry upon changes.

License

This Speech-to-Text Transcription Service is available under the CC0 1.0 Universal public domain dedication.
