I was browsing Reddit (yetch) while waiting for some stuff to finish when I came across this post:
https://old.reddit.com/r/LocalLLM/comments/1tek00h/why_is_llm_is_so_expensive/
The author makes a (very) interesting claim: if table stakes are $6K (they’re not…but go with it for now), then most folks are cooked from the get-go.
Personally, I have been figuring out how to get more from less. For example, people have found ways to run Qwen3.6 35B on a 6GB VRAM GTX 1060 at ~20 tok/s (--ctx 64K IIRC, but go check the vids yourself).
I think there’s a lot of juice to squeeze by turning LLMs from “all-seeing sages” into basically mouthpieces for shit that actually runs fast on regular silicon, but that’s just me and my crazy brain. YMMV.


I think it’s a bit early to be locking down your local LLM setup. The history of computing suggests there is some hardware cost reduction still to come. In the meantime, there is time to find the sweet spot for performance with the small and medium models. There are plenty of cloud hosts that can run the open models and let you experiment while the models mature. Hopefully you’re not burning $6K worth of tokens anytime soon.