I was browsing Reddit (yech) while waiting for some stuff to finish when I came across this post:
https://old.reddit.com/r/LocalLLM/comments/1tek00h/why_is_llm_is_so_expensive/
The author makes a (very) interesting claim: if table stakes are $6K (they’re not… but go with it for now), then most folks are cooked from the get-go.
Personally, I have been figuring out how to get more from less. For example, people have found ways to run Qwen3.6 35B on a GTX 1060 with 6 GB of VRAM at ~20 tok/s (--ctx 64K IIRC, but go check the vids yourself).
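For the curious, here is roughly what that kind of setup looks like; a minimal sketch using llama-cpp-python, assuming you already have a GGUF quant on disk (the filename and layer count below are illustrative placeholders, not the actual settings from those videos):

```python
# Minimal sketch: partial GPU offload on a small-VRAM card via llama-cpp-python.
# The model path and n_gpu_layers value are placeholders, not verified settings.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3-35b-q4_k_m.gguf",  # hypothetical local GGUF quant
    n_ctx=65536,                         # the 64K context mentioned above
    n_gpu_layers=20,                     # offload whatever fits in ~6 GB VRAM
)

out = llm("Summarize KV-cache offloading in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

The trick is the split: layers that fit go to the GPU, the rest run on CPU, and you trade some speed for being able to load the model at all.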
I think there’s a lot of juice to squeeze by turning LLMs from “all-seeing sages” into basically mouthpieces for shit that actually runs fast on regular silicon, but that’s just me and my crazy brain. YMMV.
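To make that concrete, the pattern I’m picturing is the model only parsing the request and phrasing the answer, while ordinary code does the actual work. A toy sketch (the dispatch table and the hardcoded “model output” here are stand-ins, not any real tool-calling API):

```python
# Sketch of the "mouthpiece" pattern: the LLM only decides which tool to call;
# plain, fast code does the computation. The model output below is hardcoded
# for illustration; in a real setup it would come from a small local model.
import json
import statistics

TOOLS = {
    "mean": statistics.fmean,
    "median": statistics.median,
}

# Pretend the model translated "what's the average of 3, 1, 4, 1, 5, 9?"
# into this structured tool call.
model_output = json.dumps({"tool": "mean", "args": [3, 1, 4, 1, 5, 9]})

call = json.loads(model_output)
result = TOOLS[call["tool"]](call["args"])
print(f"{call['tool']}({call['args']}) = {result}")  # cheap silicon did the math
```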


I’ve been running Qwen3.6 35B very easily, but that’s because I’ve got an ASUS Z13, which is one of the newish laptops with the AMD Ryzen AI MAX+ 395 in it. You can get that chip with 128 GB of unified memory, 96 GB of which I have dedicated as VRAM. I can also run Qwen3 Coder Next 80B. I’m not sure how many tokens per second I’m getting with Coder, but it’s fast.
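If you want an actual number, here’s a quick-and-dirty way to measure it; a sketch assuming a llama-cpp-python setup (the model path is a placeholder for whatever GGUF you’re actually running):

```python
# Rough tok/s measurement: stream tokens and time them.
# The model path is a hypothetical placeholder.
import time
from llama_cpp import Llama

llm = Llama(model_path="qwen3-coder-80b-q4.gguf", n_ctx=8192, n_gpu_layers=-1)

start, n_tokens = time.perf_counter(), 0
for chunk in llm("Write a quicksort in Python.", max_tokens=256, stream=True):
    n_tokens += 1  # each streamed chunk is roughly one generated token
print(f"~{n_tokens / (time.perf_counter() - start):.1f} tok/s")
```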
Honestly, I think this unified memory might be the future of mobile chips, because the things I can do with it are pretty crazy. It’s not just useful for AI either; it’s in a few gaming laptops because it works really well for gaming too. But what you can do with LLMs or diffusion models is amazing. I donate compute to AI Horde, and I’m finishing image generation jobs for people in like 4 seconds.
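For anyone wondering what those Horde jobs look like from the requester’s side, here’s a rough sketch against the AI Horde v2 API as I remember it; the endpoints and payload fields are from memory and may have drifted, so verify against the official docs before relying on this:

```python
# Rough sketch of submitting an image job to the AI Horde as a client.
# Endpoints and field names are from memory; double-check the current API docs.
import time
import requests

BASE = "https://aihorde.net/api/v2"
HEADERS = {"apikey": "0000000000"}  # anonymous key; registered keys queue faster

# Submit an async image generation request.
r = requests.post(f"{BASE}/generate/async", headers=HEADERS, json={
    "prompt": "a cozy reading nook, watercolor",
    "params": {"width": 512, "height": 512, "steps": 20},
})
job_id = r.json()["id"]

# Poll until some volunteer worker (maybe that Z13!) finishes the job.
while True:
    check = requests.get(f"{BASE}/generate/check/{job_id}").json()
    if check.get("done"):
        status = requests.get(f"{BASE}/generate/status/{job_id}").json()
        print(status["generations"][0]["img"])  # link to the finished image
        break
    time.sleep(2)
```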
Good man/woman. Nerd Valhalla awaits you :)