I was browsing Reddit (yetch) while waiting for some stuff to finish when I came across this post:
https://old.reddit.com/r/LocalLLM/comments/1tek00h/why_is_llm_is_so_expensive/
The author makes a (very) interesting claim: if table stakes are $6K (they’re not…but go with it for now), then most folks are cooked from the get-go.
Personally, I have been figuring out how to get more from less. For example, people have found ways to run Qwen3.6 35B on a GTX 1060 with 6GB of VRAM at ~20 tok/s (64K context IIRC, but go check the vids yourself).
I think there’s a lot of juice to squeeze by turning LLMs from “all-seeing sages” into basically mouthpieces for shit that actually runs fast on regular silicon - but that’s just me and my crazy brain. YMMV.


I hear you; I’m not wildly enamored with Reddit either…but that convo is a good springboard.
I see almost everyone chasing bigger GPUs, more parameters, more more more. I figure when 9 people say “go right”, there should be at least one person who can make a plausible case for “actually, here’s why going left works”.
E.g., I think there should be some discussion about watts per token vs tokens per second.
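To make that concrete, here’s a rough sketch of the arithmetic. Strictly speaking, “watts per token” works out to energy per token: average power draw (watts) divided by throughput (tokens per second) gives joules per token. The function name and both rigs below are made up purely for illustration; the wattage and tok/s figures are placeholders, not measurements of any real hardware.

```python
def joules_per_token(avg_power_watts: float, tokens_per_second: float) -> float:
    """Energy cost of one generated token: watts / (tokens/s) = joules/token."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return avg_power_watts / tokens_per_second


# Hypothetical rigs: (average power draw in watts, throughput in tok/s).
# These numbers are invented placeholders, not benchmarks.
rigs = {
    "big_gpu": (450.0, 60.0),
    "old_card": (120.0, 20.0),
}

for name, (watts, tps) in rigs.items():
    jpt = joules_per_token(watts, tps)
    print(f"{name}: {tps:.0f} tok/s, {jpt:.1f} J/token")
```

With these invented numbers the slower card comes out ahead on energy per token (6.0 J vs 7.5 J) even though it loses badly on speed. Whether that holds for real hardware is exactly the kind of thing worth measuring at the wall.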
I’m still rewriting the FAQ for my project - when it’s done (and if there’s interest) I will post it here.