To coincide with the rollout of the ChatGPT API, OpenAI today launched the Whisper API, a hosted version of the open source Whisper speech-to-text model that the company released in September. Priced at $0.006 per minute, Whisper is an automatic speech recognition system that OpenAI claims enables "robust" transcription in multiple languages as well as translation from those languages into English. It takes files in a variety of formats, including M4A, MP3, MP4, MPEG, MPGA, WAV and WEBM.

Countless organizations have developed highly capable speech recognition systems, which sit at the core of software and services from tech giants like Google, Amazon and Meta. But what makes Whisper different is that it was trained on 680,000 hours of multilingual and "multitask" data collected from the web, according to OpenAI president and chairman Greg Brockman, which led to improved recognition of unique accents, background noise and technical jargon.

"We released a model, but that actually was not enough to cause the whole developer ecosystem to build around it," Brockman said in a video call with TechCrunch yesterday afternoon. "The Whisper API is the same large model that you can get open source, but we've optimized to the extreme. It's much, much faster and extremely convenient."

To Brockman's point, there are plenty of barriers when it comes to enterprises adopting voice transcription technology. According to a 2020 Statista survey, companies cite accuracy, accent- or dialect-related recognition issues and cost as the top reasons they haven't embraced tech like text-to-speech.

Whisper has its limitations, though - particularly in the area of "next-word" prediction. Because the system was trained on a large amount of noisy data, OpenAI cautions that Whisper might include words in its transcriptions that weren't actually spoken - possibly because it's both trying to predict the next word in audio and transcribe the audio recording itself. Moreover, Whisper doesn't perform equally well across languages, suffering from a higher error rate with speakers of languages that aren't well represented in the training data.

That last bit is nothing new to the world of speech recognition, unfortunately. Biases have long plagued even the best systems, with a 2020 Stanford study finding that systems from Amazon, Apple, Google, IBM and Microsoft made far fewer errors - about 19% - with users who are white than with users who are Black.

The best graphics cards aren't just for gaming, especially not when AI-based algorithms are all the rage. Besides ChatGPT, Bard, and Bing Chat (aka Sydney), which all run on data center hardware, you can run your own local version of Stable Diffusion, text generation, and various other tools, including Whisper. The last of those is our subject today, and it can provide substantially faster-than-real-time transcription of audio via your GPU, with the entire process running locally for free. You can also run it on your CPU, though the speed drops precipitously.

Note also that Whisper can be used in real time for speech recognition, similar to what you can get through Windows or Dragon NaturallySpeaking. We did not attempt to use it in that fashion, as we were more interested in checking performance. Real-time speech recognition only needs to keep up with maybe 100–150 words per minute (maybe a bit more if someone is a fast talker). We wanted to let the various GPUs stretch their legs a bit and show just how fast they can go.

There are a few options for running Whisper, on Windows or otherwise. Of course there's the OpenAI GitHub (instructions and details below). There's also the Const-Me project, WhisperDesktop, a Windows executable written in C++. It uses DirectCompute rather than PyTorch, which means it will run on any DirectX 11 compatible GPU - yes, including things like Intel integrated graphics. It also means it isn't using special hardware like Nvidia's Tensor cores or Intel's XMX cores.

Getting WhisperDesktop running proved very easy, assuming you're willing to download and run someone's unsigned executable. (I was, though you can also try to compile the code yourself if you want.) Just grab WhisperDesktop.zip and extract it somewhere.
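Given the per-minute pricing and the fixed list of accepted container formats, it's worth a quick pre-flight check before uploading anything to the API. Here's a minimal sketch in Python - the helper names are ours, not part of any SDK; only the format list and the $0.006/minute price come from OpenAI's announcement:

```python
# Pre-flight helpers for the Whisper API: check the container format
# and estimate the transcription cost at the published $0.006/minute.
# These helper names are illustrative, not part of the openai package.
from pathlib import Path

SUPPORTED_FORMATS = {"m4a", "mp3", "mp4", "mpeg", "mpga", "wav", "webm"}
PRICE_PER_MINUTE_USD = 0.006

def is_supported(filename: str) -> bool:
    """True if the file extension is one the Whisper API accepts."""
    return Path(filename).suffix.lstrip(".").lower() in SUPPORTED_FORMATS

def estimated_cost(duration_seconds: float) -> float:
    """Estimated transcription cost in USD at $0.006 per minute."""
    return round(duration_seconds / 60 * PRICE_PER_MINUTE_USD, 4)

print(is_supported("interview.m4a"))   # True
print(is_supported("interview.flac"))  # False - FLAC isn't in the list
print(estimated_cost(3600))            # an hour of audio -> 0.36
```

The upload itself is a single call in the official `openai` Python package (in the SDK current at launch, something like `openai.Audio.transcribe("whisper-1", open("interview.m4a", "rb"))`), but since that needs an API key and network access it's left out of the sketch.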