Context length is disappointing, but the fact that it trades blows with R1 despite being a 30B MoE is insane.
LocalLLaMA
Context length is disappointing, but the fact that it trades blows with R1 despite being a 30B MoE is insane. I'll wait and see whether real-world performance matches the benchmarks, but it sounds like a big deal.
-
Some kind of presentation talks about longer context: https://www.reddit.com/media?url=https%3A%2F%2Fi.redd.it%2F1nos591czhxe1.jpeg
Maybe it's a work in progress, with Qwen 2.5 14B 1M (really 256K in that case) being the first test?