[Feature Request] Youtube Compatible Transcript #151
Comments
The issue is that ChatGPT cannot convert your output into YouTube subtitles for a long video, such as a 20-minute one. It says: "I apologize for any misunderstanding. Generating a large amount of text, such as full subtitles for a 19-minute video, is beyond the capabilities of this platform. However, I can help you generate a summary or key points from the video if you provide me with the specific details or time stamps of the sections you need assistance with. Please let me know how I can assist you further."
Hey @rajeshkumaryadavdotcom! Thanks for your interest in Whisper JAX, and glad to hear it's a useful resource! The demo is intended to showcase the Whisper model for speech transcription, rather than to be a fully-fledged meeting transcription tool. If you'd like to build these features on top of the demo, feel free to fork the Space and add them! However, they're more of a product feature than part of the ML demo this is meant to be.
Hello, @sanchit-gandhi. I appreciate your generosity in providing the HF Space to the public. It's a great resource for quick general transcription tasks, and also for its API, although that is hidden in the UI. I'm replying to this issue since it's on the topic of YouTube transcriptions.
I'm working on a userscript (mod) for YouTube that can transcribe any video and display the subtitles natively in the player. I've attached a demo video. I've been able to transcribe videos up to 50 minutes long.
I wanted to ask whether this is acceptable to you. I understand that running TPUs like that must be costly, but I read that it's supported by Google's TRC programme, so I just wanted to confirm that it's okay. I might publish my project to a userscript directory in the future, which would bring it to more people, although I am not sure exactly how many. Alternatively, I can keep it for personal use, depending on how okay you are with it. Thank you in advance.
Screen.Recording.2024-01-14.at.14.50.422.mp4
@rajeshkumaryadavdotcom If you are familiar with Node, I wrote a rough, simple parser for the timestamped output of whisper-jax. You can modify it to suit your desired format:
const fs = require('fs');
function customFormatToJson(subtitleContent) {
  // Each subtitle is expected on its own line: "[start -> end] text"
  const subtitleBlocks = subtitleContent.split(/\r?\n/);
  const jsonSubtitles = { events: [] };
  subtitleBlocks.forEach(rawBlock => {
    const block = rawBlock.trim();
    const separatorIndex = block.indexOf('] ');
    // Skip blank or malformed lines instead of emitting empty events
    if (!block.startsWith('[') || separatorIndex === -1) return;
    const timeRange = block.slice(1, separatorIndex).split(' -> ');
    const startTime = customTimeToMs(timeRange[0]);
    const endTime = customTimeToMs(timeRange[1]);
    // Take everything after the first "] " so text containing "] " stays intact
    const text = block.slice(separatorIndex + 2).trim();
    jsonSubtitles.events.push({
      tStartMs: startTime,
      dDurationMs: endTime - startTime,
      segs: [{ utf8: text }]
    });
  });
  return jsonSubtitles;
}
function customTimeToMs(timeStr) {
  if (!timeStr || !timeStr.includes(':')) return 0;
  const [hoursMinSec, milli] = timeStr.trim().split('.');
  // Examples: "15:22.570", or "01:15:22.570" when hours are present
  const parts = hoursMinSec.split(':').map(part => parseInt(part, 10));
  const seconds = parts.pop();
  const minutes = parts.pop();
  // Hours are only present when the timestamp has three colon-separated fields
  const hours = parts.length > 0 ? parts.pop() : 0;
  const milliseconds = milli ? parseInt(milli, 10) : 0;
  return hours * 3600000 + minutes * 60000 + seconds * 1000 + milliseconds;
}
const srtContent = fs.readFileSync('jax-output-timestamps.txt', 'utf8');
const jsonSubtitles = customFormatToJson(srtContent);
console.log(JSON.stringify(jsonSubtitles, null, 2));
Currently, it takes in
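For the original request (a file that can be uploaded to YouTube directly), the same parsing idea can emit SubRip (.srt) text instead of JSON. The following is only a minimal sketch, not part of whisper-jax: it assumes each transcript line looks like `[15:22.570 -> 15:25.000] text`, as the parser in this thread does, and the function names `timestampToMs`, `msToSrtTime`, and `timestampedTextToSrt` are hypothetical.

```javascript
// Sketch: convert whisper-jax style timestamped lines to SRT text.
// Assumption: each input line looks like "[15:22.570 -> 15:25.000] some text".

// Parse "MM:SS.mmm" or "HH:MM:SS.mmm" into milliseconds.
function timestampToMs(timeStr) {
  const [clock, milli = '0'] = timeStr.trim().split('.');
  const parts = clock.split(':').map(part => parseInt(part, 10));
  const seconds = parts.pop() || 0;
  const minutes = parts.pop() || 0;
  const hours = parts.pop() || 0; // absent for "MM:SS.mmm" inputs
  return hours * 3600000 + minutes * 60000 + seconds * 1000 + parseInt(milli, 10);
}

// Format milliseconds as the SRT timestamp "HH:MM:SS,mmm".
function msToSrtTime(ms) {
  const pad = (value, width) => String(value).padStart(width, '0');
  const hours = Math.floor(ms / 3600000);
  const minutes = Math.floor((ms % 3600000) / 60000);
  const seconds = Math.floor((ms % 60000) / 1000);
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(ms % 1000, 3)}`;
}

// Hypothetical helper: turn the whole timestamped transcript into SRT cues.
function timestampedTextToSrt(content) {
  const linePattern = /^\[(.+?) -> (.+?)\]\s*(.*)$/;
  const cues = [];
  for (const line of content.split(/\r?\n/)) {
    const match = line.trim().match(linePattern);
    if (!match) continue; // skip blank or malformed lines
    const [, start, end, text] = match;
    const index = cues.length + 1; // SRT cues are numbered from 1
    const range = `${msToSrtTime(timestampToMs(start))} --> ${msToSrtTime(timestampToMs(end))}`;
    cues.push(`${index}\n${range}\n${text}`);
  }
  return cues.join('\n\n') + '\n';
}
```

Writing the returned string to a file with `fs.writeFileSync('subtitles.srt', timestampedTextToSrt(text))` should give a file that YouTube's subtitle upload accepts, since .srt is one of its supported caption formats. Because the conversion runs locally, the length of the video is not a constraint.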
Hi,
Thank you very much for whisper-jax, it is very useful.
I would like to request a feature for https://huggingface.co/spaces/sanchit-gandhi/whisper-jax. Currently, after the transcript is generated, I need to go to ChatGPT and ask it to convert the transcript into a format that YouTube accepts.
Could you please add one more radio option alongside transcribe and translate, such as YouTube subtitle? You could also add an option to write a YouTube video description based on the transcript.
Regards,
Raj