AI
Shrinking LLMs Without Losing Intelligence: GSQ Quantization
GSQ uses Gumbel-Softmax sampling to compress large language models to 2-3 bits while maintaining the accuracy that older methods lose at high compression levels.
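The teaser above names the Gumbel-Softmax trick as GSQ's core mechanism. The paper's exact formulation is not given here, so the following is only a generic sketch of Gumbel-Softmax sampling applied to a hypothetical 2-bit codebook: a differentiable, approximately one-hot choice among quantization levels (the codebook values, logits, and temperature below are illustrative assumptions, not GSQ's actual parameters).

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Draw a differentiable, approximately one-hot sample over
    candidate quantization levels via the Gumbel-Softmax trick."""
    rng = rng or np.random.default_rng()
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1)
    g = -np.log(-np.log(rng.uniform(1e-10, 1.0, size=logits.shape)))
    y = (logits + g) / tau          # lower tau -> closer to hard one-hot
    y = np.exp(y - y.max())         # numerically stable softmax
    return y / y.sum()

# Toy example: choose among 4 levels of a hypothetical 2-bit codebook
# for a single weight. The soft sample keeps gradients flowing to the
# assignment logits during training.
levels = np.array([-1.0, -0.33, 0.33, 1.0])  # illustrative codebook
logits = np.array([0.1, 2.0, 0.3, 0.2])      # learned assignment scores
probs = gumbel_softmax(logits, tau=0.5)
soft_weight = probs @ levels                 # differentiable quantized value
```

At inference time a method like this would typically replace the soft sample with a hard `argmax` over the logits, storing only the 2-bit index per weight.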
Researchers introduce FUSE, a method to ensemble multiple imperfect LLM judges into a high-accuracy verifier without requiring expensive human-labeled datasets.