While environmentally beneficial, reducing the sulphur content of diesel fuel has compromised its inherent lubricity,
increasing wear in fuel injection systems. Conventional lubricity testing methods are often inconsistent and
inefficient. This study introduces a data-driven framework that leverages machine learning to estimate diesel
lubricity from standard, easily measurable fuel properties. A substantial industrial dataset of over 400 diesel
samples from multiple refineries was analyzed using a dual-strategy approach: a high-performance Random
Forest (RF) model and an interpretable Python symbolic regression (PySR) model, complemented by Principal
Component Analysis (PCA) for dimensionality reduction. The RF model demonstrated high predictive accuracy
(R² > 0.96). In contrast, the PySR model generated a transparent, empirically derived equation, identifying
distillation-related parameters as the most critical predictors within the analyzed dataset. Although these regression
models capture statistical patterns in the data, they function primarily as “black-box” estimators
that do not account for the specific chemical additives or surface-active polar compounds that fundamentally
govern boundary lubrication. SHapley Additive exPlanations (SHAP) analysis revealed that, although parameters such as density and flash point
show statistical importance within this specific model, they are not necessarily physically related to Wear Scar
Diameter (WSD) in practice. The methods used in the current work offer a refined statistical approach to estimating
lubricity, providing a screening tool that complements traditional testing while acknowledging the
inherent complexity of diesel fuel chemistry.
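
As a rough illustration of the workflow summarized above, the following minimal sketch fits a Random Forest regressor to a table of fuel properties and runs PCA on the same features. It is not the study's implementation: the data are synthetic, the six feature columns and the WSD-like target are hypothetical stand-ins, and the hyperparameters are arbitrary choices made for the example. The symbolic regression and SHAP steps are omitted.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the fuel-property table (NOT the study's data).
# Each column plays the role of one routinely measured property,
# e.g. density, flash point, or a distillation temperature.
n_samples, n_features = 400, 6
X = rng.normal(size=(n_samples, n_features))

# Hypothetical WSD-like target (micrometres) driven by two of the columns.
y = 450.0 + 30.0 * X[:, 0] - 20.0 * X[:, 1] + rng.normal(scale=5.0, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Random Forest on the raw features; tree ensembles do not require scaling.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)
r2 = r2_score(y_test, rf.predict(X_test))

# PCA on standardized training features, used here only to inspect how much
# variance a reduced set of components retains.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=3).fit(scaler.transform(X_train))
explained = pca.explained_variance_ratio_

print(f"Held-out R^2: {r2:.3f}")
print("Explained variance ratio per component:", np.round(explained, 3))
```

Scoring on a held-out split, rather than on the training data, is what makes a reported R² meaningful as a screening metric. On real measurements, the feature importances from such a model would still describe statistical association only, in line with the caveat above.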