6 Ordibehesht 1403 (25 April 2024)
Seyed Mohammad Bidoki

Academic rank: Assistant Professor
Address: Jam Faculty of Engineering, Department of Computer Engineering (Jam)
Education: Ph.D. in Computer Engineering (Software Systems)
Phone: 07734567889
Faculty: Jam Faculty of Engineering

Research details

Title: A semantic approach to extractive multi-document summarization: Applying sentence expansion for tuning of conceptual densities
Research type: Journal article
Keywords: Multi-document Extractive Summarization, Sentence Expansion, Conceptual Density Tuning, Word Embedding, Text Clustering, Language-independent Approach
Journal: Information Processing & Management
DOI: https://doi.org/10.1016/j.ipm.2020.102341
Researchers: Seyed Mohammad Bidoki (first author), Seyed Mohammad Reza Moosavi (second author), Seyed Mostafa Fakhrahmad (third author)

Abstract

Given today's vast amount of textual data, automatic extractive text summarization is one of the most common and practical techniques for organizing information. Extractive summarization selects the most appropriate sentences from a text and combines them into a representative summary. Individual sentences, however, are usually too short for most text processing techniques to perform well. Hence, it seems vital to bridge the gap between short text units and conventional text processing methods. In this study, we propose a semantic method for building an extractive multi-document summarizer that combines statistical, machine-learning-based, and graph-based methods. The system is language-independent and unsupervised. The proposed framework learns semantic representations of words from a given set of documents using word2vec. It then expands each sentence, through a novel method, with the most informative and least redundant words related to the sentence's main topic. This sentence expansion implicitly performs word sense disambiguation and tunes the conceptual densities toward the central topic of each sentence. Next, the framework estimates the importance of sentences from a graph representation of the documents. To identify the most important topics of the documents, we propose a novel clustering approach that automatically determines the number of clusters and their initial centroids and then clusters the sentences accordingly. The system selects the best sentences from appropriate clusters for the final summary with respect to information salience, minimum redundancy, and adequate coverage. Extensive experiments on the DUC2002 and DUC2006 datasets were conducted to evaluate the proposed scheme. The results showed that the proposed sentence expansion algorithm and clustering approach considerably enhanced the performance of the summarization system. Comparative experiments also demonstrated …
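The abstract does not spell out the sentence expansion algorithm itself. As a rough illustration only, the Python sketch below assumes every word already has a word2vec-style vector, treats the mean of a sentence's word vectors as its central topic, and appends the `k` vocabulary words closest to that topic that the sentence does not already contain. The toy vectors, `expand_sentence`, `centroid`, and `k` are hypothetical and are not the authors' method.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(words, emb):
    """Mean vector of the in-vocabulary words (a crude stand-in for the sentence topic)."""
    vecs = [emb[w] for w in words if w in emb]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def expand_sentence(words, emb, k=2):
    """Hypothetical expansion: append the k vocabulary words nearest the sentence
    centroid, skipping words the sentence already contains (to limit redundancy)."""
    topic = centroid(words, emb)
    if topic is None:
        return list(words)  # nothing to expand from
    present = set(words)
    candidates = sorted(
        (w for w in emb if w not in present),
        key=lambda w: cosine(emb[w], topic),
        reverse=True,
    )
    return list(words) + candidates[:k]

# Toy 3-dimensional vectors (hypothetical); a real system would instead
# train word2vec on the input documents.
emb = {
    "bank": [0.9, 0.1, 0.0],
    "river": [0.1, 0.9, 0.0],
    "money": [0.95, 0.05, 0.1],
    "loan": [0.85, 0.0, 0.2],
    "water": [0.0, 1.0, 0.1],
}

print(expand_sentence(["bank", "loan"], emb, k=1))  # ['bank', 'loan', 'money']
```

In this toy case the financial context contributed by `loan` pulls the expansion toward `money` rather than `river`, a very small analogue of the word sense disambiguation effect the abstract attributes to sentence expansion.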
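The abstract states that sentence importance is estimated from a graph representation of the documents but does not give the scoring formula. One well-known way to do this is TextRank-style PageRank over a sentence-similarity graph, sketched below in Python as a generic illustration, not necessarily the scoring used in the paper. `graph_scores`, `damping`, `iters`, and the example similarity matrix are hypothetical.

```python
def graph_scores(sim, damping=0.85, iters=50):
    """TextRank-style sentence scores via PageRank power iteration.

    sim[i][j] is a similarity between sentences i and j (hypothetical input);
    the diagonal is ignored and negative values are clipped to zero.
    """
    n = len(sim)
    # Edge weights: no self-loops, no negative weights.
    w = [[max(sim[i][j], 0.0) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing weight of each sentence
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Each sentence receives a baseline (1 - damping) / n plus a share of its
        # neighbours' scores proportional to edge weight; sentences with no edges
        # only receive the baseline.
        scores = [
            (1 - damping) / n
            + damping * sum(w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores

# Hypothetical similarities: sentence 1 resembles both of the others.
sim = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]
print(graph_scores(sim))
```

Here sentence 1, which is similar to both others, ends up with the highest score and the weakly connected sentence 2 with the lowest. The similarity matrix would in practice come from sentence vectors such as the expanded ones discussed in the abstract.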
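The clustering step is described as determining the number of clusters and their initial centroids automatically, without giving the procedure. A generic technique with that property is single-pass (leader) clustering, sketched below in Python purely as a stand-in for illustration; it is not the paper's approach, and `leader_clustering` and its `threshold` parameter are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def leader_clustering(vectors, threshold=0.8):
    """Single-pass (leader) clustering; the number of clusters is not fixed in advance.

    Each vector joins the most similar existing cluster if that similarity reaches
    `threshold` (hypothetical parameter); otherwise it starts a new cluster and
    serves as that cluster's initial centroid. Returns lists of member indices.
    """
    centroids, members = [], []
    for idx, v in enumerate(vectors):
        best, best_sim = None, -1.0
        for c, cen in enumerate(centroids):
            s = cosine(v, cen)
            if s > best_sim:
                best, best_sim = c, s
        if best is not None and best_sim >= threshold:
            members[best].append(idx)
            # Recompute the centroid as the mean of the cluster's members.
            group = [vectors[i] for i in members[best]]
            centroids[best] = [sum(col) / len(group) for col in zip(*group)]
        else:
            centroids.append(list(v))
            members.append([idx])
    return members

# Hypothetical 2-dimensional sentence vectors covering two topics.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(leader_clustering(vectors))  # [[0, 1], [2, 3]]
```

The threshold replaces a fixed cluster count, which is the property the abstract highlights; the trade-off is that results depend on input order and on the chosen threshold.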
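The final selection trades information salience against redundancy. A standard generic way to express that trade-off is Maximal Marginal Relevance (MMR), named here as an illustrative substitute rather than the paper's selection procedure. The Python sketch below greedily adds the sentence with the best weighted difference between its salience score and its maximum similarity to sentences already chosen; `select_mmr`, `lam`, `budget`, and the example numbers are hypothetical.

```python
def select_mmr(scores, sim, budget, lam=0.7):
    """Greedy MMR selection (hypothetical helper).

    scores[i] is the salience of sentence i, sim[i][j] the similarity between
    sentences i and j; lam weights salience against redundancy.
    """
    selected = []
    remaining = list(range(len(scores)))
    while remaining and len(selected) < budget:
        # Redundancy = highest similarity to any sentence already selected.
        best = max(
            remaining,
            key=lambda i: lam * scores[i]
            - (1 - lam) * max((sim[i][j] for j in selected), default=0.0),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical scores: sentences 0 and 1 are near-duplicates.
scores = [0.5, 0.45, 0.3]
sim = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
print(select_mmr(scores, sim, budget=2))  # [0, 2]
```

Although sentence 1 has the second-highest salience, it is skipped in favour of sentence 2 because it nearly duplicates sentence 0, which is the minimum-redundancy behaviour the abstract describes.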