Models Compression - Search News

18h

Context compression finally works in production: new research cuts LLM input 16x without the accuracy hit

LCLMs compress LLM context before decode — 8.8x faster at 16x compression, beating every KV cache method tested. Open-sourced by NYU and Columbia.

The Manila Times

Nota AI Has Two MoE Quantization Papers Accepted at ICML 2026 Workshop, Demonstrating Global Competitiveness in Large-Scale AI Optimization

Two papers on MoE-specific quantization algorithms accepted at a workshop held in conjunction with ICML 2026. Recognition ...

TechCrunch

Buzzy AI startup Multiverse creates two of the smallest high-performing models ever

One of Europe’s most prominent AI startups has released two AI models that are so tiny, they have named them after a chicken’s brain and a fly’s brain. Multiverse Computing claims these are the ...

10d

Tether Brings AI Memory Compression To Consumer Devices

Tether’s TurboQuant enables useful and powerful local AI applications on consumer devices at much lower costs and without ...

TechNewsWorld

Small Changes in AI Models Can Produce Big Energy Savings

Small changes in the large language models (LLMs) at the heart of AI applications can result in substantial energy savings, according to a report released by the United Nations Educational, Scientific ...
