30 sats \ 9 replies \ @optimism 13h \ on: OpenAI Charges by the Minute, So Make the Minutes Shorter devs
lol! awesome hack
it reminded me of blind people who can listen to their screen readers at like 5x speed and still understand it
reply
this makes me wonder if there's some theoretical maximal compression you can put on audio data to still make it transcribable
reply
He tried this in the article:
"Why Not 4x? When I pushed it to 4x the results became comically unusable." https://gist.github.com/georgemandis/1ec4ef084789f92ee06ac6283338a194
reply
Locally tested because Whisper is one of the few open source-ish things from OpenAI.
Test subject: 00:01:00 to 00:01:30 from #1014758, at 1x, 2x, 3x and 4x.
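For anyone wanting to reproduce the test clips: the comment doesn't say how they were cut, so this is only a minimal sketch of one way to do it, assuming ffmpeg and its `atempo` audio filter. The input name `episode.mp3` and the helper names are hypothetical; the output names just mirror the `small-2x.mp3` pattern used below. As far as I know, `atempo` values above 2 skip samples rather than blend them, so the sketch chains steps of at most 2x.

```python
import shlex


def atempo_chain(speed: float, max_step: float = 2.0) -> str:
    """Build an ffmpeg audio filter string for a speed-up factor.

    The factor is split into chained atempo steps no larger than
    max_step, e.g. 4x becomes two 2x steps.
    """
    if speed < 1.0:
        raise ValueError("this sketch only handles speed-ups (speed >= 1.0)")
    steps = []
    remaining = speed
    while remaining > max_step:
        steps.append(max_step)
        remaining /= max_step
    steps.append(remaining)
    return ",".join(f"atempo={s:g}" for s in steps)


def ffmpeg_command(src: str, dst: str, speed: float,
                   start: str = "00:01:00", duration_s: int = 30) -> list:
    """Cut duration_s seconds starting at `start`, then speed the cut up."""
    return [
        "ffmpeg",
        "-ss", start,           # seek in the input to the start of the clip
        "-t", str(duration_s),  # read only this many seconds of input
        "-i", src,
        "-filter:a", atempo_chain(speed),
        dst,
    ]


if __name__ == "__main__":
    # Hypothetical file names; prints the commands instead of running them.
    for speed, dst in [(2.0, "small-2x.mp3"), (3.0, "small-3x.mp3"), (4.0, "small-4x.mp3")]:
        print(shlex.join(ffmpeg_command("episode.mp3", dst, speed)))
```

Because `-ss` and `-t` come before `-i`, they apply to the source audio, so the 3x file should come out around 10 seconds long and the 4x file around 7.5 seconds.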
Because the default Python library doesn't support Apple Silicon out of the box and I'm too lazy to spend time figuring out the CoreML conversion right now, I just used WhisperKit instead, which basically provides pre-converted models and downloads them for ya (first run takes forever because you'll be leeching a few GB off HF without being informed about that).
% whisperkit-cli transcribe --audio-path ./small.mp3
So how's everything going at the hackerspace? Are you all still pushing out code over
there in Italy? Yeah, yeah, still going strong, I would say. So it goes a bit up and down
because I'm obviously super busy with the company. So there are times when I have
more free time and I try to dedicate that to the hackerspace. So sometimes we
organize events, stuff like that. Sometimes we're less active when we're particularly
busy with the company, we tend to be less
% whisperkit-cli transcribe --audio-path ./small-2x.mp3
So how's everything going at the hackerspace? Are you all still pushing out code over
there? Anybody? Yeah, yeah, still going strong, I would say. So it goes a bit up and
down because I'm obviously super busy with the company. So there are times when I
have more free time. I try to dedicate that to the hackerspace. So sometimes we
organize events, stuff like that. Sometimes we're less active when we're particularly
busy with the company. We tend to be less active.
% whisperkit-cli transcribe --audio-path ./small-3x.mp3
So how's everything going into the hacker space? Are you still pushing up code over
there? Yeah, it's going strong, I would say. So we have to get up and down, because
we do the components at the other times when I have more free time, I have to get up
and down, so we have to get up and down.
% whisperkit-cli transcribe --audio-path ./small-4x.mp3
So how's it going? Are you still? Yeah, that's going strong. Okay. So we're going to
have a couple of a couple of the comments that we've got to go to the first one. I'm
going to have a couple of the comments. I'm going to have a couple of the
comments.
So if I look at what the FOSS implementations do, it's the same as what the image analyzers do: shifting a smaller window to reduce context. So perhaps the issue is that there is too much information within the window if the speed is too high (because the windowing is probably static?)
I.e., how SonicVerse operates on large audio files: (note the 10s chunks)
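To make the static-window guess concrete, here's a rough sketch of fixed-size chunking. This is not SonicVerse's or Whisper's actual code; the function names are made up for illustration and the 10 second default is just taken from the note above. The point is that the window length stays constant in sped-up time, so each window has to carry `speed` times as much of the original speech.

```python
def chunk_bounds(duration_s: float, window_s: float = 10.0) -> list:
    """Split [0, duration_s) into consecutive fixed-size windows.

    Returns (start, end) pairs in seconds; the last window may be shorter.
    """
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + window_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


def speech_per_window(window_s: float, speed: float) -> float:
    """Seconds of original (1x) speech squeezed into one full window."""
    return window_s * speed


if __name__ == "__main__":
    original_s = 30.0  # length of the 1x test clip
    for speed in (1.0, 2.0, 3.0, 4.0):
        windows = chunk_bounds(original_s / speed)
        print(speed, len(windows), speech_per_window(10.0, speed))
```

Under these assumptions the 1x clip spans three windows, while the 4x clip (7.5 seconds) fits entirely inside one, which would match the idea that there's too much information per window at high speed.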
reply