while digging around I happened to find this thread, which has some benchmarks for a different model
it's apples to square fenceposts, of course, since one llm is not another. but it gives us something to extrapolate from. if g4dn.2xl gave them 214 tok/s, and if we make the extremely generous assumption that tok == word (which, well, no; cf. strawberry), then any Use Deserving Of o3 (let's say 5~15k words) would need a tok-rate of 1000~3000 tok/s to hit a "reasonable" response latency ("5-ish seconds")
so you'd need something like 5x g4dn.2xl just to shit out 5000 words with dolphin-llama3 in "quick" time. which, again, isn't even whatever the fuck people are doing with openai's garbage.
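(back-of-envelope in python, if anyone wants to poke at it. the 214 tok/s and the word counts are from above; everything else is the same tok == word handwave:)

```python
# napkin math: how many g4dn.2xl does a "quick" long response imply?
BENCH_TOK_PER_S = 214             # dolphin-llama3 on one g4dn.2xl, per the thread
TARGET_LATENCY_S = 5              # "5-ish seconds" of acceptable wait
RESPONSE_WORDS = (5_000, 15_000)  # a Use Deserving Of o3, allegedly

for words in RESPONSE_WORDS:
    required = words / TARGET_LATENCY_S     # tok/s needed, if tok == word
    instances = required / BENCH_TOK_PER_S  # how many g4dn.2xl that implies
    print(f"{words} words -> {required:.0f} tok/s -> ~{instances:.1f}x g4dn.2xl")
```

(which spits out ~4.7x for 5k words and ~14x for 15k, so call it 5x on the charitable end.)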
utter, complete, comprehensive clownery. era-redefining clownery.
but some dumb motherfucker in a bar will keep telling me it's the future. and I get to not boop 'em on the nose. le sigh.
that list undercounts far more than I expected it to