Training is definitely costlier than inference, but inference is not costless either, especially if you are fielding millions of requests
Moreover, to maintain a competitive edge I would assume that the models are constantly being fine-tuned, not to mention the fixed costs of maintaining highly specialized and in-demand engineers on staff... I can easily see how costs add up
> Training is definitely costlier than inference, but inference is not costless either, especially if you are fielding millions of requests
Oh for sure. My knowledge is dated but once upon a time it was thought you could ship trained models to clients and run them there without specialized hardware.
> Moreover, to maintain a competitive edge I would assume that the models are constantly being fine-tuned, not to mention the fixed costs of maintaining highly specialized and in-demand engineers on staff... I can easily see how costs add up
If this is all there is to it, then some customers are performing 4x more inference requests than others, which tracks.
Maybe what I'm not accounting for is the size of these models. If they are enormous, with many, many weights, then scaling inference could be super-linear.
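For a sense of scale: a common back-of-the-envelope rule is that a transformer spends roughly 2 * n_params FLOPs per generated token, so model size makes each request more expensive, but the total still grows linearly in request count. A toy sketch (all numbers made up for illustration):

```python
# Rough rule of thumb: transformer inference costs about 2 * n_params
# FLOPs per generated token (ignoring attention-cache details).
# These numbers are illustrative, not any vendor's real figures.

def inference_flops(n_params, tokens_per_request, n_requests):
    """Total inference FLOPs: linear in model size AND in request count."""
    flops_per_token = 2 * n_params
    return flops_per_token * tokens_per_request * n_requests

small = inference_flops(n_params=1e9,   tokens_per_request=500, n_requests=1_000_000)
large = inference_flops(n_params=100e9, tokens_per_request=500, n_requests=1_000_000)
print(f"{large / small:.0f}x")  # 100x: per-request cost scales with model size
```

So a 100x bigger model costs ~100x more per request, which may be the "massive and irreducible" effect, but requests themselves still scale linearly.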
reply
Based on what I know of these model architectures, compute costs should scale linearly with the number of requests (or more precisely, the number of batches, since TPUs will process requests in parallel).
There could be other issues around concurrency, latency, congestion, etc., or other physical limitations in the hardware. But just on the model itself, I don't see why it should be super-linear in the number of requests. If I'm wrong, I'd be happy to know it, though.
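To make the linearity concrete, here's a toy serving-cost model; the batch size and per-batch time are made-up numbers, not anyone's real serving figures:

```python
import math

# Toy cost model for the "linear in batches" claim: total accelerator
# time is (number of batches) x (time per batch), and batch count grows
# linearly in requests once the batch size is fixed.

def serving_cost(n_requests, batch_size=32, seconds_per_batch=0.05):
    """Accelerator-seconds needed to serve n_requests, assuming full batching."""
    n_batches = math.ceil(n_requests / batch_size)
    return n_batches * seconds_per_batch

# Doubling traffic roughly doubles cost: linear, not super-linear.
print(serving_cost(1_000_000))
print(serving_cost(2_000_000))
```

The ceil introduces a tiny step at partial batches, but at millions of requests that rounding is negligible.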
reply
This is a quote from the blog I was thinking of:
In a widely-read 2020 paper, OpenAI reported that the accuracy of its language models scaled “as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude.”
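For reference, the power law being quoted has the shape L(C) = (C_c / C)^alpha, i.e. loss falls as a power of training compute. The constants below are illustrative stand-ins, not the paper's fitted values:

```python
# Shape of the scaling law quoted above: L(C) = (C_c / C) ** alpha.
# C_c and alpha here are placeholders, not the paper's fitted constants.

def scaling_loss(compute, c_c=1.0, alpha=0.05):
    """Test loss as a power law in training compute."""
    return (c_c / compute) ** alpha

# Each 10x of compute cuts loss by the same *fraction*, which is why the
# trend can hold across "more than seven orders of magnitude".
for c in [1e3, 1e6, 1e9]:
    print(f"{c:.0e}: {scaling_loss(c):.3f}")
```

Note this describes how loss improves with training compute, not what it costs to serve the trained model.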
reply
Thanks. This still seems to be mostly talking about fixed training costs, though.
I can't figure out why it's so expensive to run the models once they're created unless they're massive and irreducible ... which they probably are, but I haven't found a written account of that.
reply