Abstract words form a significant component of the vocabulary of any language. Their separation from perception, action, and emotion systems makes these words challenging for EFL learners to understand and learn. The literature highlights the importance of collocating abstract adjectives with concrete nouns to enhance learning. However, the type of context and the modality of instruction most effective for teaching abstract words remain underexplored. This study investigated the effects of instructional modality (visual imagery through pictures versus performance through videos) and of the compatibility of modality with linguistic context (collocation with an action or non-action noun), as well as their interaction, on the learning of abstract adjectives by EFL learners. To that end, 60 adjective-noun collocations were chosen as target items and taught to two groups of young intermediate EFL learners, each in a different modality. The frequency, length, and degree of abstractness of the adjectives were controlled, while the collocated nouns were all highly frequent, known to the participants, and categorized as action or non-action. The participants (N=30) were randomly assigned to two equal groups, each undergoing a different instructional treatment. Statistical analysis revealed the superiority of the video modality over the picture modality. Additionally, vocabulary learning improved in the compatible condition when compatibility was considered in isolation. However, the interaction of compatibility and modality produced a counterintuitive effect: learning declined when abstract adjectives were taught in collocation with action nouns through videos. This finding might be accounted for by the cognitive load experienced by the participants in this condition. These findings hold significant pedagogical implications for educators and curriculum designers who aim to improve the effectiveness of vocabulary instruction in EFL contexts.