I was browsing Reddit (yetch) while waiting for some stuff to finish when I came across this post

https://old.reddit.com/r/LocalLLM/comments/1tek00h/why_is_llm_is_so_expensive/

The author makes a (very) interesting claim: if table stakes are $6K (they’re not…but go with it for now), then most folks are cooked from the get-go.

Personally, I have been figuring out how to get more from less. For example, people have found ways to run Qwen3.6 35B on a 6GB VRAM GTX 1060 at ~20 tok/s (--ctx 64K IIRC, but go check the vids yourself).

https://youtu.be/8F_5pdcD3HY
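
For a sense of what that kind of setup looks like, here is a rough sketch using llama-cpp-python. The model file, context size, and offload numbers are my guesses for a 6GB card, not the exact settings from the video:

```python
# Rough sketch: partial GPU offload of a quantized GGUF model on a 6GB card.
# The model filename, context size, and layer count are placeholders,
# not the video's exact settings.
from llama_cpp import Llama

llm = Llama(
    model_path="models/qwen3-q4_k_m.gguf",  # hypothetical local quantized model
    n_ctx=65536,        # the ~64K context mentioned above (KV spills into system RAM)
    n_gpu_layers=20,    # offload only as many layers as fit in 6GB of VRAM
    n_threads=8,        # CPU threads handle the layers that stay in RAM
)

out = llm("Why might a Docker container keep restarting?", max_tokens=200)
print(out["choices"][0]["text"])
```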

I think there’s a lot of juice to squeeze by turning LLMs from “all-seeing sages” into basically mouthpieces for shit that actually runs fast on regular silicon - but that’s just me and my crazy brain. YMMV.

  • randomaside@lemmy.dbzer0.com · 7 hours ago

    I’ve been saying this for a while to people. I think the long-term use case for LLMs is the semantic human interface device.

    Siri, Alexa, even Google Home (whatever they call it now), they all swung and missed at this. However, even being able to give a computer unclear commands and get the intended result would be a huge win.

    I know inference with the big LLMs can do a lot more, but the cost is high for systems with that ability to reason. However, small, lightweight LLMs are actually very good for command and control.

    This is where my current homelab projects are focused.

    • SuspiciousCarrot78@aussie.zone (OP) · 7 hours ago

      Hey, me too :) As my school teachers used to tell me, “Great minds think alike (but fools seldom differ :)”

      For me, I’m thinking of having an LLM as one layer / one container in a homelab that does some specific stuff (rough sketch after the list):

      • queries against local docs / notes / manuals / PDFs / wiki material as the trusted knowledge layer
      • uses tools for search, file lookup, shell, git, Docker, Home Assistant, calendar, etc.
      • a local “Codex” / wiki layer that turns my own source material into an inspectable knowledge base
      • provenance and audit trails
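
      Very roughly, the tool layer is the part I’d write as plain code, with the LLM only choosing which tool to call. A minimal sketch, where every path, function name, and tool is made up:

      ```python
      # Sketch of the tool layer: plain functions that do the work, which the LLM
      # only selects between. All names and paths here are hypothetical.
      import subprocess
      from pathlib import Path

      def search_notes(query: str) -> str:
          """Grep a local notes/wiki directory -- the trusted knowledge layer."""
          hits = []
          for p in Path("~/notes").expanduser().rglob("*.md"):
              text = p.read_text(errors="ignore")
              if query.lower() in text.lower():
                  hits.append(f"{p.name}: {text[:200]}")
          return "\n".join(hits) or "no matches"

      def docker_status() -> str:
          """Ask Docker what is running; output goes back to the model as evidence."""
          return subprocess.run(
              ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
              capture_output=True, text=True,
          ).stdout

      TOOLS = {
          "search_notes": search_notes,
          "docker_status": docker_status,
          # Home Assistant, git, shell, calendar, etc. would slot in the same way.
      }
      ```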

      I want to take a screenshot of something, drop it into Syncthing from my phone, then later ask “did I fuck the pins on this?” … and have it look up the schematics, eyeball the pins, and tell me. Or I say “hey, can you grab a copy of X for me, usual params” and have the LLM instruct Sonarr/Radarr/SABnzbd to do that. (That is, make your OWN “Alexa” with an Arduino ESP32, stick it in a room and then call it when you need it).

      So instead of asking a 70B model to “know” why your media server is down, the system checks service status, logs, last config changes, prior notes, Docker state, network state, etc., then the LLM explains the result in human language. You can probably do that with a 4B (I’m testing that assumption now).
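
      In code-ish terms, that pattern is something like the sketch below. The container name, endpoint, and model name are placeholders for whatever you run locally (any OpenAI-compatible local server works the same way):

      ```python
      # Sketch of "the system checks, the small model explains".
      # Container name, endpoint URL, and model name are placeholders.
      import subprocess
      import requests

      def gather_evidence() -> str:
          ps = subprocess.run(
              ["docker", "ps", "-a", "--format", "{{.Names}}: {{.Status}}"],
              capture_output=True, text=True,
          ).stdout
          logs = subprocess.run(
              ["docker", "logs", "--tail", "30", "jellyfin"],  # hypothetical media container
              capture_output=True, text=True,
          ).stdout
          return f"Container states:\n{ps}\nLast log lines:\n{logs}"

      def explain(evidence: str) -> str:
          r = requests.post("http://localhost:8080/v1/chat/completions", json={
              "model": "local-4b",  # the small model under test
              "messages": [
                  {"role": "system",
                   "content": "Explain the evidence plainly. Do not guess beyond it."},
                  {"role": "user",
                   "content": f"Why is my media server down?\n\n{evidence}"},
              ],
          })
          return r.json()["choices"][0]["message"]["content"]

      print(explain(gather_evidence()))
      ```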

      Same for “find that motherboard note,” “summarize this email thread,” “turn this into a task,” “compare this eBay listing to my saved hardware notes,” “what did I do last time this broke,” or “run the smoke test and tell me the first real failure.”

      I think small models are the shit for this because if the model only has to classify intent, route the request, render structured evidence, and talk like a normal human…then it doesn’t need to be a giant oracle. The expensive (time-wise) part becomes less “make the model smarter” and more “build a better control plane around it.”
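
      That “classify intent, route, render” loop is small enough to sketch. Again, the endpoint, model name, and intent list are placeholders, and the dispatch into the tool layer is stubbed out:

      ```python
      # Sketch: the small model only classifies intent and phrases the answer;
      # routing and evidence-gathering are plain code. Every name here is a placeholder.
      import requests

      def chat(prompt: str) -> str:
          r = requests.post("http://localhost:8080/v1/chat/completions", json={
              "model": "local-4b",
              "messages": [{"role": "user", "content": prompt}],
          })
          return r.json()["choices"][0]["message"]["content"]

      INTENTS = ["find_note", "service_status", "summarize", "make_task"]

      def run_tool_for(intent: str, query: str) -> str:
          # Stub -- the real version dispatches into the tool layer sketched earlier.
          return f"(output of the '{intent}' tool for: {query})"

      def route(user_text: str) -> str:
          # 1. Classification only: one word out of a fixed list.
          intent = chat(
              f"Classify this request as exactly one of {INTENTS}. "
              f"Reply with the intent name only.\n\n{user_text}"
          ).strip()
          if intent not in INTENTS:
              intent = "summarize"  # fall back if the model rambles
          # 2. Plain code gathers the evidence for that intent.
          evidence = run_tool_for(intent, user_text)
          # 3. The model turns structured evidence into a human answer.
          return chat(f"User asked: {user_text}\nEvidence:\n{evidence}\nExplain plainly.")
      ```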

      Basically: local LLM as semantic HID; expert system/tool router underneath; user owns the data and the machine.

      As always, ICBW…but fuck it, I’m gonna try.

      PS: I have an idea of how to apply that to coding too…but that’s a project for much later. I’ve been cooking this shit for far too long. The next thing I wanna do is a fun project for myself (that is: ROM hack a parachute and grappling gun into Super Mario Sunshine, so I can basically play “What if Super Mario Sunshine but actually Just Cause 2” on my Wii with the kids).