I was browsing Reddit (yetch) while waiting for some stuff to finish when I came across this post:
https://old.reddit.com/r/LocalLLM/comments/1tek00h/why_is_llm_is_so_expensive/
The author makes a (very) interesting claim: if table stakes are $6K (they’re not…but go with it for now), then most folks are cooked from the get-go.
Personally, I have been figuring out how to get more from less. For example, people have found ways to run Qwen3.6 35B on a 6GB VRAM GTX 1060 at ~20 tok/s (--ctx 64K IIRC, but go check the vids yourself).
I think there’s a lot of juice to squeeze by turning LLMs from “all-seeing sages” into basically mouthpieces for shit that actually runs fast on regular silicon - but that’s just me and my crazy brain. YMMV.
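
To make the "mouthpiece" idea a bit more concrete, here's a rough toy sketch of how I picture it. The names `brain`, `mouth`, and `ask` are all made up for illustration, and `mouth` is just a string template standing in for where a small local model would go. The point is only that the actual answer comes from plain deterministic code, and the model's job shrinks to phrasing it.

```python
import ast
import operator

# Deterministic "brain": a tiny arithmetic evaluator. No model involved,
# runs instantly on any CPU.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def brain(expr: str):
    """Evaluate +, -, *, / over plain numbers by walking the parsed AST."""
    def ev(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)


def mouth(question: str, answer) -> str:
    """Hypothetical stand-in for a small LLM: it only phrases the result.

    Assumption: in a real setup this would be a call to a local model,
    which gets handed the already-computed answer and never does the math.
    """
    return f"{question} comes out to {answer}."


def ask(expr: str) -> str:
    # The brain computes, the mouth talks.
    return mouth(expr, brain(expr))


if __name__ == "__main__":
    print(ask("2 + 3 * 4"))
```

Swap the evaluator for a search index, a solver, a database query, whatever. The shape stays the same: the heavy lifting happens in the cheap fast part, and the model only has to be good enough to talk.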


On the broader topic of “the LLM is the mouth, not the brain”, I just stumbled across this:
https://www.atomelm.com/index.html#what
https://www.atomelm.com/index.html#prototypes
Might turn out to be something yet, dunno. Web demo is a bit meh.