Cut it into 1-minute file chunks so I didn't need 50 TB of RAM
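For the curious, the chunking step looks roughly like this (a minimal sketch; the file name talks.mp3 and the chunk naming scheme are made up, ffmpeg's segment muxer does the actual cutting):

```python
# Minimal sketch: split one long audio file into 1-minute chunks with ffmpeg.
# "talks.mp3" and the chunk_%04d.mp3 naming are assumptions, not the exact setup.
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-i", "talks.mp3",     # the ~9 hours of recorded talks
        "-f", "segment",       # use ffmpeg's segment muxer
        "-segment_time", "60", # 60-second chunks
        "-c", "copy",          # no re-encoding, just cutting
        "chunk_%04d.mp3",      # chunk_0000.mp3, chunk_0001.mp3, ...
    ],
    check=True,
)
```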
Ran the chunks through Whisper, which spit out the text of what was said
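The transcription loop, more or less (the "base" Whisper model size and the glob pattern are assumptions):

```python
# Minimal sketch: transcribe each audio chunk with Whisper (openai-whisper package)
# and save one .txt per chunk. Model size "base" is an assumption.
import glob
import pathlib
import whisper

model = whisper.load_model("base")
for path in sorted(glob.glob("chunk_*.mp3")):
    result = model.transcribe(path)                 # returns a dict with the full "text"
    out = pathlib.Path(path).with_suffix(".txt")
    out.write_text(result["text"].strip(), encoding="utf-8")
```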
Combined the chunks. It's now 450 KB (!!!)
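Stitching the per-chunk transcripts back together is just file concatenation:

```python
# Join the per-chunk transcripts into one text file.
import glob
import pathlib

parts = [
    pathlib.Path(p).read_text(encoding="utf-8")
    for p in sorted(glob.glob("chunk_*.txt"))
]
pathlib.Path("transcript.txt").write_text("\n".join(parts), encoding="utf-8")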
Ran the full text through spaCy for normalization
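The normalization step, roughly (the en_core_web_sm model and the exact filters, lowercase lemmas with stop words and punctuation dropped, are assumptions):

```python
# Minimal sketch of the spaCy normalization/lemmatization step.
# en_core_web_sm and these exact filters are assumptions.
import spacy

nlp = spacy.load("en_core_web_sm")
text = open("transcript.txt", encoding="utf-8").read()
doc = nlp(text)
normalized = " ".join(
    tok.lemma_.lower()
    for tok in doc
    if not tok.is_stop and not tok.is_punct and not tok.is_space
)
open("normalized.txt", "w", encoding="utf-8").write(normalized)
```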
Ran the normalized, lemmatized text through a bert-base-uncased transformer
Asked it: "what should policy support?"
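A sketch of that Q&A step, assuming a Hugging Face extractive question-answering pipeline. Note that plain bert-base-uncased ships without a trained QA head, so the sketch swaps in a SQuAD-fine-tuned BERT checkpoint; the actual model/head used may differ:

```python
# Minimal sketch: ask the transcript a question via extractive QA.
# The SQuAD-fine-tuned BERT checkpoint is an assumption; plain bert-base-uncased
# has no trained question-answering head.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",
)
context = open("normalized.txt", encoding="utf-8").read()
answers = qa(
    question="what should policy support?",
    context=context,
    top_k=3,  # return a few candidate spans instead of just the best one
)
for a in answers:
    print(f'{a["score"]:.3f}  {a["answer"]}')
```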
It answered: modernization, ideology, geopolitics. That's nice and concise, and now I don't have to watch nearly 9 hours of talks. (kidding, but this is still useful)
Total time spent coding: 1h (next time: at most 10 minutes)