0 sats \ 3 replies \ @k00b OP 10 Oct 2023 \ parent \ on: Big Tech Struggles to Turn AI Hype Into Profits tech
Oh for sure. My knowledge is dated but once upon a time it was thought you could ship trained models to clients and run them there without specialized hardware.
If this is all there is to it then some customers are performing 4x more inference requests than others which tracks.
Maybe what I'm not accounting for is the size of these models. If they are enormous with many many weights, then scaling inference could be super-linear.
Based on what I know of these model architectures, compute costs should scale linearly with the number of requests (or more precisely, the number of batches since TPUs will process requests in parallel)
There could be other issues regarding concurrency, latency, congestion, etc. Or maybe there are other physical limitations regarding hardware. But just on the model itself I don't see why it should super-linear in the number of requests. If I'm wrong I'd be happy to know it though.
reply
This is a quote from the blog I was thinking of:
In a widely-read 2020 paper, OpenAI reported that the accuracy of its language models scaled “as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude.”
reply
Thanks. This still seems to be mostly talking about fixed training costs though.
I can't figure out why it's so expensive to run the models once they're created unless they're massive and irreducible ... which they probably are, but I haven't found a written account of that.
reply