SparseLLM Global Pruning Framework
SparseLLM is a global pruning framework proposed in 2024 by researchers from Emory University and Argonne National Laboratory. The accompanying paper, "SparseLLM: Towards Global Pruning of Pre-trained Language Models," was accepted at NeurIPS, and the framework is designed to make the pruning of pre-trained large language models (LLMs) more efficient.
The SparseLLM framework achieves efficient optimization and strong performance even at high sparsity by decomposing the global pruning problem into more manageable subproblems, and its key advantage is that it performs global pruning with low memory consumption. Based on the observation that an LLM can be expressed as a composite function, the authors reformulate the global pruning objective into an equivalent form using auxiliary variables, which splits it into multiple coupled subproblems. They then develop an efficient algorithm that alternately optimizes each subproblem to reach the global optimal solution; a rough sketch of this alternating scheme is given below.
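To make the decomposition concrete, here is a minimal, self-contained sketch of the alternating idea on a toy stack of linear layers: auxiliary variables stand in for intermediate activations, each layer is pruned against its local input-output pair, and the activations are then nudged to stay consistent with the pruned layers. The helper names (e.g., `prune_to_sparsity`), the magnitude-pruning rule, the least-squares weight update, and the 50/50 activation blend are illustrative assumptions for this sketch, not the authors' implementation, which operates on the transformer sublayers of real OPT and LLaMA models.

```python
# Sketch of alternating optimization with auxiliary activation variables.
# Assumed setup: a toy stack of dense linear layers standing in for LLM sublayers.
import torch

torch.manual_seed(0)

dims = [64, 64, 64, 64]
weights = [torch.randn(dims[i + 1], dims[i]) for i in range(len(dims) - 1)]
X = torch.randn(dims[0], 256)            # calibration inputs (columns are samples)
sparsity = 0.7                           # fraction of weights to zero out

def prune_to_sparsity(W, s):
    """Keep the largest-magnitude entries of W; zero the rest (magnitude pruning)."""
    k = int(W.numel() * s)
    thresh = W.abs().flatten().kthvalue(k).values
    return W * (W.abs() > thresh)

# Dense forward pass gives target activations; auxiliary variables z_l start there.
dense_acts = [X]
for W in weights:
    dense_acts.append(W @ dense_acts[-1])
z = [a.clone() for a in dense_acts]

pruned = [W.clone() for W in weights]
for it in range(5):                       # a few alternating rounds
    for l in range(len(pruned)):
        # (1) Weight subproblem: fit the layer to its local (input, output) pair
        #     (z[l], z[l+1]) by least squares, then re-impose the sparsity pattern.
        sol = torch.linalg.lstsq(z[l].T, z[l + 1].T).solution.T
        pruned[l] = prune_to_sparsity(sol, sparsity)
        # (2) Activation subproblem: pull the auxiliary output toward what the
        #     pruned layer actually produces, while staying anchored to the
        #     dense model's activation (the final target output is kept fixed).
        if l + 1 < len(z) - 1:
            z[l + 1] = 0.5 * (pruned[l] @ z[l]) + 0.5 * dense_acts[l + 1]

err = torch.norm(dense_acts[-1] - pruned[-1] @ z[-2]) / torch.norm(dense_acts[-1])
print(f"relative reconstruction error after pruning: {err:.3f}")
```

The point of the sketch is that each subproblem only ever touches one layer's weights and its neighboring auxiliary activations, which is what keeps the memory footprint low while still coordinating the layers through a shared global objective.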
Experimental results show that SparseLLM achieves efficient global pruning on pre-trained language models of different sizes while maintaining good model quality. It performs well on both the smaller OPT models and the larger LLaMA models, especially under high sparsity. In addition, its fast convergence and the versatility of the pruned models further support its efficiency and applicability in practice.