Persian Gulf University | Research Info

Title	Green Speculative Decoding for Energy-Efficient Small Language Models
Type	Presentation
Keywords	Green AI, Large Language Models, Energy Efficiency, Speculative Decoding, Sustainable Computing, Model Optimization
Abstract	The rapid expansion of large language models (LLMs) has intensified computational and energy demands, making efficiency a key requirement for sustainable AI deployment. While most prior work targets the optimization of very large models, the efficiency of widely deployed small-scale LLMs remains underexplored. This paper introduces a Green Speculative Decoding framework aimed at reducing the energy footprint of compact language models. The proposed approach employs a synchronized speculative pipeline in which a lightweight draft model generates candidate tokens that are selectively verified by a larger, more accurate target model. Results from a simulation-based evaluation show notable improvements in inference efficiency, with higher throughput and significant energy savings and no degradation in output quality. These findings highlight the potential of speculative decoding as a practical, eco-friendly approach to energy-efficient LLM inference in real-world deployments.
Researchers	Nima Razmjoei (First researcher), Rezvan MohammadiBaghmolaei (Second researcher)
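The abstract describes a draft model proposing candidate tokens that a larger target model verifies, but it does not state the verification rule. As a hedged illustration only, the Python sketch below shows the generic greedy variant of speculative decoding: the draft proposes up to `k` tokens, the target checks them, and the first mismatch is replaced by the target's own token. The names `speculative_decode`, `draft`, `target`, and `k` are hypothetical, the "models" are toy functions, and this is not the authors' framework. It measures no energy; it only counts verification rounds as a stand-in for target-model forward passes.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model: maps the current token sequence
# to its greedy (argmax) next token.
Model = Callable[[List[int]], int]


def speculative_decode(
    draft: Model,
    target: Model,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[int], int]:
    """Greedy speculative decoding sketch (assumed acceptance rule).

    Returns the final sequence and the number of verification rounds;
    each round stands in for one batched forward pass of the target.
    """
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < max_new_tokens:
        remaining = max_new_tokens - (len(seq) - len(prompt))
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, remaining)):
            proposal.append(draft(seq + proposal))
        # 2. The target verifies every proposed position. A real system does
        #    this in one batched pass; here it is simulated with a loop.
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # 3. On the first mismatch, keep the target's token and stop.
                accepted.append(expected)
                break
        seq.extend(accepted)
    return seq, rounds


if __name__ == "__main__":
    # Toy target: next token is (last + 1) mod 10.
    target = lambda s: (s[-1] + 1) % 10
    # Toy draft: agrees with the target except after token 7.
    draft = lambda s: 0 if s[-1] == 7 else (s[-1] + 1) % 10
    seq, rounds = speculative_decode(draft, target, [0], 10, k=4)
    print(seq, rounds)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0] 3
```

Because every appended token equals the target's greedy choice, the output matches plain greedy decoding with the target model, which would need 10 target passes here instead of 3. This simplified sketch omits the extra "bonus" token that many speculative decoding implementations take from the target when all drafts are accepted, and it does not cover the sampling-based acceptance rule.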