reply on: OpenAI Charges by the Minute, So Make the Minutes Shorter \ stacker news ~devs


30 sats \ 9 replies \ @optimism 26 Jun \ on: OpenAI Charges by the Minute, So Make the Minutes Shorter devs

lol! awesome hack

100 sats \ 8 replies \ @carter OP 26 Jun

it reminded me of blind people who can listen to their screen readers at like 5x speed and still understand it

100 sats \ 7 replies \ @SimpleStacker 26 Jun

this makes me wonder if there's some theoretical maximal compression you can put on audio data to still make it transcribable

100 sats \ 5 replies \ @carter OP 26 Jun

He tried this in the article

"Why Not 4x? When I pushed it to 4x the results became comically unusable." https://gist.github.com/georgemandis/1ec4ef084789f92ee06ac6283338a194

100 sats \ 0 replies \ @optimism 26 Jun

So if I look at what the FOSS implementations do, it's the same as what the image analyzers do: shifting a smaller window to reduce context. So perhaps the issue is that there is too much information within the window if the speed is too high (because the windowing is probably static?)

I.e. this is how SonicVerse operates on large audio files (note the 10s chunks):
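To make the static-window guess concrete, here is a rough sketch (not SonicVerse's or Whisper's actual code) of cutting audio into fixed 10s windows and estimating how many words land in each window as the playback speed goes up. The helper names and the 150 words-per-minute speaking rate are assumptions for illustration only.

```python
def fixed_windows(duration_s, window_s=10.0):
    """Split [0, duration_s) into consecutive fixed-length (start, end) windows.

    The last window is shorter if the duration is not a multiple of window_s.
    """
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + window_s, duration_s)
        windows.append((start, end))
        start = end
    return windows


def words_per_window(words_per_minute, speed, window_s=10.0):
    """Approximate number of spoken words inside one window at a given speed-up.

    Assumes the window length stays fixed while the speech is compressed in time.
    """
    return words_per_minute * speed * window_s / 60.0


if __name__ == "__main__":
    # A 30 s clip sped up 4x is only 7.5 s long, so it fits in one window.
    for speed in (1, 2, 3, 4):
        clip_s = 30.0 / speed
        print(speed, fixed_windows(clip_s), words_per_window(150, speed))
```

If the window really is static, the information per window grows linearly with the speed-up, which would fit the idea that there is "too much information within the window" at 3x and 4x.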

330 sats \ 3 replies \ @optimism 26 Jun

Locally tested because Whisper is one of the few open source-ish things from OpenAI.

Test subject: 00:01:00 to 00:01:30 from #1014758, at 1x, 2x, 3x and 4x.
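For anyone wanting to reproduce the sped-up files: the sketch below assumes ffmpeg and its `atempo` audio filter, which is only one way to do it (the post doesn't say which tool was used here). Older ffmpeg builds cap a single `atempo` stage at 2.0, so the helper chains stages. The function names and the `small-Nx.mp3` naming are just mirrored from the commands below for illustration.

```python
from pathlib import Path


def atempo_chain(speed):
    """Build an ffmpeg filter string that speeds audio up by `speed`.

    Uses chained atempo stages of at most 2.0 each, e.g. 3x -> 2.0 then 1.5.
    """
    if speed < 0.5:
        raise ValueError("this sketch only handles speed >= 0.5")
    stages = []
    remaining = float(speed)
    while remaining > 2.0:
        stages.append(2.0)
        remaining /= 2.0
    stages.append(remaining)
    return ",".join(f"atempo={s}" for s in stages)


def speedup_command(src, speed):
    """Return the ffmpeg argument list that writes e.g. small-2x.mp3 next to src."""
    src_path = Path(src)
    dst = src_path.with_name(f"{src_path.stem}-{speed:g}x{src_path.suffix}")
    return ["ffmpeg", "-i", str(src_path), "-filter:a", atempo_chain(speed), str(dst)]


if __name__ == "__main__":
    for speed in (2, 3, 4):
        # Pass the list to subprocess.run(...) to actually invoke ffmpeg.
        print(" ".join(speedup_command("small.mp3", speed)))
```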

Because the default Python library doesn't support Apple Silicon out of the box and I'm too lazy to spend time figuring out converting to CoreML right now, I just used WhisperKit instead, which basically provides pre-converted models and downloads them for ya (first run takes forever because you'll be leeching a few GB off HF without being informed about that)

% whisperkit-cli transcribe --audio-path ./small.mp3 
So how's everything going at the hackerspace? Are you all still pushing out code over
there in Italy? Yeah, yeah, still going strong, I would say. So it goes a bit up and down
because I'm obviously super busy with the company. So there are times when I have
more free time and I try to dedicate that to the hackerspace. So sometimes we
organize events, stuff like that. Sometimes we're less active when we're particularly
busy with the company, we tend to be less

% whisperkit-cli transcribe --audio-path ./small-2x.mp3
So how's everything going at the hackerspace? Are you all still pushing out code over
there? Anybody? Yeah, yeah, still going strong, I would say. So it goes a bit up and
down because I'm obviously super busy with the company. So there are times when I
have more free time. I try to dedicate that to the hackerspace. So sometimes we
organize events, stuff like that. Sometimes we're less active when we're particularly
busy with the company. We tend to be less active.

% whisperkit-cli transcribe --audio-path ./small-3x.mp3
So how's everything going into the hacker space? Are you still pushing up code over
there? Yeah, it's going strong, I would say. So we have to get up and down, because
we do the components at the other times when I have more free time, I have to get up
and down, so we have to get up and down.

% whisperkit-cli transcribe --audio-path ./small-4x.mp3
So how's it going? Are you still? Yeah, that's going strong. Okay. So we're going to
have a couple of a couple of the comments that we've got to go to the first one. I'm
going to have a couple of the comments. I'm going to have a couple of the
comments.
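To put a number on how far each sped-up transcript drifts from the 1x one, something like word error rate (word-level edit distance divided by the reference length) could be used. This is a minimal sketch with made-up helper names; it treats the 1x output as the reference, which is itself an assumption since 1x isn't guaranteed to be perfect either.

```python
import re


def words(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    one_x = "Are you all still pushing out code over there in Italy?"
    two_x = "Are you all still pushing out code over there? Anybody?"
    print(word_error_rate(one_x, two_x))
```

Feeding the full 1x, 2x, 3x and 4x outputs above through this would give one score per speed instead of eyeballing the text.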