I’d add that memory bandwidth is still a relevant factor, so the faster the RAM, the faster the inference will be. I think this model would be a perfect fit for the Strix Halo or a >= 64GB Apple Silicon machine when aiming for CPU-only inference. But mind that llama.cpp does not yet support the qwen3-next architecture.
Can confirm that from my setup. Increasing parallelization beyond 3-4 concurrent threads no longer yields any significant gain in inference speed.
This is a telltale sign that some of the cores are starving because data no longer arrives fast enough…
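To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch (the parameter count, quantization, and bandwidth figures are assumptions, not measurements): each generated token has to stream the model’s active weights from RAM at least once, so memory bandwidth sets a hard ceiling on tokens per second regardless of core count. Once that ceiling is hit, adding threads only adds contention, which matches the 3-4 thread plateau described above.

```python
# Back-of-the-envelope ceiling on token generation speed for
# memory-bandwidth-bound CPU inference. All numbers are assumptions.

def max_tokens_per_second(bandwidth_gb_s: float,
                          active_params_billion: float,
                          bytes_per_weight: float) -> float:
    """Each token streams roughly the active weights from RAM once,
    so throughput is capped at bandwidth / bytes-per-token."""
    bytes_per_token = active_params_billion * 1e9 * bytes_per_weight
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Assumed figures: ~3B active parameters (MoE), 4-bit quantization
# (~0.5 bytes/weight), and rough peak bandwidths for each platform.
for platform, bw_gb_s in [
    ("Dual-channel DDR5 desktop", 80),
    ("Strix Halo (LPDDR5X)", 250),
    ("Apple Silicon (Max-class)", 400),
]:
    ceiling = max_tokens_per_second(bw_gb_s, 3.0, 0.5)
    print(f"{platform}: ~{ceiling:.0f} tok/s ceiling")
```

The point of the sketch is just the ratio: a few extra cores help until the memory controller is saturated; after that, only more bandwidth moves the ceiling.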