In this notebook, I implemented a script to transcribe YouTube videos (and audio files in general) using Google's speech-to-text API.

labrijisaad/Youtube-video-transcriptor

🎥 Youtube-video-transcriptor in Python 🐍

In this project, I developed a script in Python that uses Google's speech-to-text technology to transcribe audio from YouTube videos.

  • ⚠️ Please note the following before using the script:
  • 1️⃣ The script is intended to be run on Google Colaboratory.
  • 2️⃣ The transcription may not always be accurate: background noise or the speaker's delivery (e.g. speaking too fast or too slow) can degrade the result.
  • 3️⃣ The summary model used in the script is a community model available on Hugging Face that only supports English text. It may not always capture the general idea of the transcription, especially when the transcription is short.

❓❓❓ HOW TO USE ❓❓❓

>>> 1️⃣ Run the notebook in Colab (make sure you are logged into Colab with your Google account).
>>> 2️⃣ Paste the URL of the YouTube video you want to transcribe into the `url` variable.
>>> 3️⃣ Set the `lang` variable to the language spoken in the video (all instructions are provided in the notebook).
>>> 4️⃣ Run all cells (shortcut: `CTRL + F9`)
>>> 5️⃣ Download the generated TXT files (there will be two in total: one for the transcription and one for the translated transcription).
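
As a rough illustration of steps 2 and 3, the two variables might be filled in as shown here. The URL is a placeholder, the `en-US` value is an assumed example of a language code (the notebook lists the accepted values), and `extract_video_id` is a hypothetical helper that is not part of the notebook. It only shows one way to sanity-check the URL before running all cells.

```python
from urllib.parse import urlparse, parse_qs

# Placeholder values: replace with your own video URL and language code.
url = "https://www.youtube.com/watch?v=VIDEO_ID"
lang = "en-US"  # assumed example; see the notebook for accepted values


def extract_video_id(video_url):
    """Return the video ID from a YouTube URL, or None if it is not one.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(video_url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        # Regular links carry the ID in the "v" query parameter.
        ids = parse_qs(parsed.query).get("v")
        return ids[0] if ids else None
    return None


if extract_video_id(url) is None:
    print("The `url` variable does not look like a YouTube video link.")
```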

⚠️⚠️⚠️ UPDATE ⚠️⚠️⚠️

>>> To reduce transcription time, I have updated the script to use Python threads, which makes better use of the CPU resources provided by Colab.
>>> As a result, performance has improved significantly: a 30-minute video can now be transcribed in approximately 35 seconds, compared to 2 minutes and 30 seconds before (roughly four times faster).
>>> You can find the updated script with threads in the accompanying notebook. 😁
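
The threaded approach described in this update can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: it assumes the audio is cut into fixed-length chunks, and `transcribe_chunk` is a hypothetical stand-in for the real per-chunk speech-to-text request. Threads tend to help in this kind of workload because each request spends most of its time waiting on the network.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(total_seconds, chunk_seconds=30):
    """Return (start, end) second offsets that cover the whole audio."""
    return [
        (start, min(start + chunk_seconds, total_seconds))
        for start in range(0, total_seconds, chunk_seconds)
    ]


def transcribe_chunk(chunk):
    """Hypothetical stand-in for the real speech-to-text call on one chunk."""
    start, end = chunk
    return f"[text for {start}-{end}s]"


def transcribe_parallel(total_seconds, chunk_seconds=30, max_workers=8):
    """Transcribe all chunks concurrently and join them in original order."""
    chunks = split_into_chunks(total_seconds, chunk_seconds)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, so the transcript
        # stays in sequence even when chunks finish out of order.
        parts = list(pool.map(transcribe_chunk, chunks))
    return " ".join(parts)


print(transcribe_parallel(70, chunk_seconds=30))
```

The chunk length and `max_workers` value are arbitrary choices for the sketch; suitable values depend on the audio and on the limits of the speech-to-text service.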
  • 📫 Feel free to contact me if anything is wrong or if anything needs to be changed 😎! labrijisaad@gmail.com
