Keywords
Green AI, Large Language Models, Energy Efficiency, Speculative Decoding, Sustainable Computing, Model Optimization
Abstract
The rapid expansion of large language models (LLMs) has intensified computational and energy demands, making efficiency a key requirement for sustainable AI deployment. While most prior work targets the optimization of very large models, the efficiency of widely deployed small-scale LLMs remains underexplored. This paper introduces a Green Speculative Decoding framework that aims to reduce the energy footprint of compact language models. The approach employs a synchronized speculative pipeline in which a lightweight draft model proposes candidate tokens that are selectively verified by a larger, more accurate target model. Results from a simulation-based evaluation show improved inference efficiency, with higher throughput and substantial energy savings and no degradation in output quality. These findings suggest that speculative decoding is a practical, environmentally friendly approach to energy-efficient LLM inference in real-world deployments.
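To make the draft-then-verify loop concrete, the following is a minimal sketch of standard greedy speculative decoding. It is not the paper's synchronized pipeline. It assumes hypothetical toy models (`target`, `draft`) that map a token sequence to a single greedy next token, and it counts verification rounds as a rough stand-in for target-model forward passes; it does not measure energy. The acceptance rule (keep drafted tokens up to the first disagreement, then take the target's own token) is the simple greedy variant, not a sampling-based acceptance rule.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a language model: maps the current token
# sequence to its greedy next token. Real LLMs would return logits.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens[len(prompt):]


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding sketch.

    Returns the generated tokens and the number of verification rounds
    (each round models one batched forward pass of the target).
    """
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. The target checks all drafted positions; in a real system this
        #    is a single parallel pass, so it counts as one round here.
        rounds += 1
        accepted, ctx = [], list(tokens)
        for t in proposal:
            expected = target(ctx)
            if t != expected:
                # First disagreement: keep the target's token and stop.
                accepted.append(expected)
                break
            accepted.append(t)
            ctx.append(t)
        else:
            # All k drafts accepted: the same pass yields one bonus token.
            accepted.append(target(ctx))
        tokens.extend(accepted)
    return tokens[len(prompt):len(prompt) + max_new_tokens], rounds


# Hypothetical toy models for illustration only.
def target(seq: List[int]) -> int:
    # "Large" model: next token is the sum of the last two tokens mod 10.
    prev = seq[-2] if len(seq) > 1 else 0
    return (seq[-1] + prev) % 10


def draft(seq: List[int]) -> int:
    # "Small" model: agrees with the target except after a 5.
    guess = target(seq)
    return (guess + 1) % 10 if seq[-1] == 5 else guess


out, rounds = speculative_decode(target, draft, [1, 1], max_new_tokens=20, k=4)
print(out, "in", rounds, "target rounds versus 20 for plain greedy decoding")
```

Because every emitted token equals the target's greedy choice for its context, the output matches plain greedy decoding by the target. Any saving comes from the target checking several drafted positions per round instead of producing one token per pass, which is why the draft model's agreement rate largely determines the benefit.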