privacy-friendly transcription

A few years ago, I transcribed audio files locally and offline on my computer. That is privacy-friendly because the files never have to be uploaded anywhere; everything stays on your local disk. I no longer remember which tool I used, but I think it was either ‘Vosk’ or ‘Whisper’. The error rate was noticeable, and I had to correct some parts by hand.

Recently I had to transcribe something again, and a lot has changed: in 100 pages of text, only one word was misspelled. That is really impressive, especially since the speaker did not always speak clearly and was sometimes quiet. The language was also detected automatically.

After installing ‘whisper’ (github.com/openai/whisper) via Python, along with some dependencies, I only needed two commands. The first converts the file to WAV format, because in my setup it somehow only worked with WAV. The second runs the transcription.
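For anyone who wants to script those two steps, here is a rough sketch of how they could be wrapped in Python. It is not the exact pair of commands I typed. It assumes that ffmpeg does the conversion and that the whisper command-line tool is on your PATH. The 16 kHz mono settings, the "medium" model, the "_16k" file suffix and all function names are my own illustrative choices.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(source: Path, wav_path: Path) -> list[str]:
    """Build the ffmpeg call that converts an audio file to WAV."""
    return [
        "ffmpeg",
        "-y",               # overwrite the output file if it already exists
        "-i", str(source),  # input file in any format ffmpeg can read
        "-ar", "16000",     # resample to 16 kHz (assumed setting)
        "-ac", "1",         # downmix to mono (assumed setting)
        str(wav_path),
    ]


def build_whisper_command(wav_path: Path, model: str = "medium") -> list[str]:
    """Build the whisper call; no --language flag, so the language is detected."""
    return [
        "whisper",
        str(wav_path),
        "--model", model,          # model size is an assumption, pick what fits your machine
        "--output_format", "txt",  # plain-text transcript only
        "--output_dir", str(wav_path.parent),
    ]


def transcribe(source: Path, model: str = "medium") -> Path:
    """Convert `source` to WAV, transcribe it, and return the transcript path."""
    # Use a separate file name so a .wav input is never overwritten in place.
    wav_path = source.with_name(source.stem + "_16k.wav")
    subprocess.run(build_ffmpeg_command(source, wav_path), check=True)
    subprocess.run(build_whisper_command(wav_path, model), check=True)
    # whisper names the transcript after the audio file it was given.
    return wav_path.with_suffix(".txt")
```

Calling transcribe(Path("interview.mp3")) would then run both steps and leave the transcript next to the audio file. Everything happens on the local machine, apart from the one-time download of the model file the first time a model is used.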

Interestingly, I read that VLC Media Player has introduced a new feature built on Whisper, the same tool I used. My guess is that they are using Ollama (ollama.com) to generate the subtitles with AI.

I hope there have also been advances in freely usable text-to-speech technology, since the free options used to be somewhat inferior to commercial services. That would be really beneficial for open source development and accessibility.

