

Awesome! I have llama.cpp running in a desktop in another room and OpenCode on my laptop, so I have to keep going back and forth to see what’s going on. I’ll definitely be having a look at this!


I just commented about the llama.cpp TurboQuant fork in this post, which would help. Also, the single-card recipe at https://github.com/noonghunna/club-3090 will get you 200k context. If you have a 4090, there’s no reason you can’t run Qwen 3.6 27B with 200k context at decent speeds.


I’m running the TurboQuant fork of llama.cpp with K:Q4, V:Q3 KV cache quantization and 200k context on the Qwen 3.6 MoE variant. On an RTX 3070 with CPU offloading, I’m getting 280 t/s prefill at over 100k context and about 20-30 t/s decode. It’s usable, though the MoE makes a lot of coding mistakes and I have to make it fix things constantly.
I just tried the 27B variant yesterday and it was too slow to be usable at 50 t/s prefill and 0.27 t/s decode, though it fixed a bug the MoE model was struggling with in one shot (took hours 😭). I bet with 16GB of VRAM and TurboQuant for KV cache, you’d get decent speeds and decent context. I have an RTX 4060 in my server and I’m considering moving it so the 3070 and 4060 are in the same machine to see how 27B does across 2 cards with 16GB.
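For a rough sense of why the KV cache bit widths matter at 200k context, here’s a back-of-the-envelope sketch. The layer count, KV head count, and head dimension below are made-up placeholders, not the real Qwen config, and the math assumes every layer keeps a full attention KV cache and ignores the per-block scale overhead that quantized formats add.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   k_bits: int, v_bits: int) -> int:
    """Estimate KV cache size in bytes for a given context length.

    Each token stores one K vector and one V vector per layer, each with
    kv_heads * head_dim elements. Quantization overhead is ignored.
    """
    elements_per_tensor = tokens * layers * kv_heads * head_dim
    return elements_per_tensor * (k_bits + v_bits) // 8


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    shape = dict(tokens=200_000, layers=48, kv_heads=8, head_dim=128)

    f16 = kv_cache_bytes(**shape, k_bits=16, v_bits=16)
    q4_q3 = kv_cache_bytes(**shape, k_bits=4, v_bits=3)

    gib = 1024 ** 3
    print(f"f16 KV cache:    {f16 / gib:.1f} GiB")
    print(f"K:Q4, V:Q3 cache: {q4_q3 / gib:.1f} GiB")
```

Swap in the real numbers from the model’s config to get a usable estimate. The point is just that the cache size scales linearly with the combined K and V bit width.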
A peaceful world where the Greater Israel Project has been completed and there are no pesky anti-Semitic Semites left in the Middle East to create conflict.


Ah, ok, hope it helps someone. I’ll probably try the model this weekend sometime.


This Dockerfile worked for me to build the llama-cpp-turboquant fork: https://huggingface.co/spaces/ai-engineering-at/llama-cpp-turboquant-guide/blob/main/Dockerfile. Should work for upstream too. The Dockerfile I made myself crashed 2 different machines, but then I found this one and can confirm it works well.
Carbon is the 4th most abundant element in the galaxy. Silicon is twice as rare, so maybe spend 6 months’ salary on a quartz ring instead? Either that, or save up for a down payment on a house. Nah, who needs a place to live when you can have a hunk of mineral, right?


Ollama is not open source? It seems to be MIT licensed: https://github.com/ollama/ollama/blob/main/LICENSE. Am I overlooking something?


Not necessarily the best example in this post, but in general, I find I get downvoted a lot when I make a good faith comment in .ml posts where I think I’m agreeing and am trying to understand the topic further. Getting 20 downvotes kind of kills my motivation to engage any further, though, so I usually just delete my comment, shrug, and move on. I’ve gotten to the point where I try to avoid wasting my time and energy commenting on .ml posts in the first place.
Maybe it’s just a different perspective on what a downvote means? I’m generous with upvotes, and withholding one means I don’t find the comment interesting or I disagree. I use downvotes sparingly for spam, trolling, comments made in bad faith, etc.
A downvote in my mind roughly translates to “fuck off”, so if a group gangs up and tells everyone with a slightly different worldview to fuck off, then they eventually will and said group will be all by themselves in their own echo chamber. If that’s the goal, then fair enough.
Thinking about the decapitated heads in her freezer makes her smile.
Addiction drives quarterly profits, though.


I run k3s on a single node and it’s not really that much more overhead than Docker Compose if you understand k8s. I mostly have a deployment.yaml, service.yaml, ingress.yaml, and network-policy.yaml for each service that I’ve copy/pasted and updated. Here are some of the benefits over Docker Compose for my setup:
Has a built-in Traefik reverse proxy / ingress controller so I can access my services by domain name instead of by port, like http://jellyfin.lan/ and http://forgejo.lan/ (using local DNS on my OpenWrt router).
I use the Calico CNI so I can have network policies for each service to allow them to access only what they need. If a service doesn’t need internet access, it doesn’t get it.
I use Bitnami Sealed Secrets to store my secrets in YAML files that can be safely stored in git.
ConfigMaps make it easy to manage configuration files.
Easier to have separate YAML files for each service while sharing a network between them. Services connect to each other like http://forgejo.forgejo.svc.cluster.local/
Of course, if you’re looking to load balance across multiple machines, k3s makes even more sense.
Edit:
k8s is the clear industry standard for container orchestration at this point, so if you want something beyond Compose, a lightweight k8s distribution like k3s is an obvious choice.
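To make the network policy point concrete, here’s a minimal sketch of a “no internet access” policy. It’s not my actual network-policy.yaml; the policy name and namespace are placeholders. It’s written as a small Python script that prints the manifest as JSON, which kubectl accepts the same way it accepts YAML. The idea is to allow egress only to pods inside the cluster, so anything outside it is blocked.

```python
import json


def in_cluster_only_egress(name: str, namespace: str) -> dict:
    """Build a NetworkPolicy that limits egress to in-cluster pods.

    An empty podSelector applies the policy to every pod in the namespace.
    An empty namespaceSelector in the egress rule matches pods in all
    namespaces, so cluster DNS and other services stay reachable while
    traffic to external addresses is not allowed.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [{"to": [{"namespaceSelector": {}}]}],
        },
    }


if __name__ == "__main__":
    # "in-cluster-only" and "jellyfin" are illustrative names.
    policy = in_cluster_only_egress("in-cluster-only", "jellyfin")
    print(json.dumps(policy, indent=2))
```

Something like `python policy.py | kubectl apply -f -` would apply it, assuming the script is saved as policy.py. It only has an effect if the CNI enforces network policies, which is why I use Calico.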
Never really understood why people drink aspartame water. Soda was originally intended to be a sugary treat and an experience to be enjoyed every once in a while, not a daily source of hydration.


Maybe try old.reddit.com?


I got an espresso machine a few years ago and learned to make a proper latte with it. At this point, a $9 cup of charry sugar water made by a teenager in a fast food restaurant doesn’t really appeal to me.
This is the bee’s knees.


Right? I’ve got the original and the 90s version in Jellyfin on my home lab server. 🧟‍♂️
Using copyleft licenses for closed models is clearly against the spirit of the licenses if the users don’t have access to the source code that includes the original copyleft works. Even open weight models aren’t really the source code, and are more akin to a compiled binary. The source code is all the training data and code used to train the model such that anyone can build on it and train new models.
I’m not a lawyer and am not sure how well existing copyleft licenses like GPL or CC BY-SA would stand up in court to enforce this, but if they don’t, then stronger licenses that explicitly cover works being used as training data need to become more common.
I’ve seen the argument that the models are just learning from the data in the same way a human would. That’s nonsense. It’s not like they’re creating a sentient being with its own agency that can tell them to fuck off if it wants. These companies are running a software pipeline against copyrighted IP to convert it into a derivative work that is now supposedly wholly owned by said company, but the reality is that it’s collectively owned by everyone who contributed to the copyleft training data.