Automatic Summarization
Automatic summarization is the process of using software to shorten a text document into a summary that retains the main points of the original. It is studied within machine learning and data mining, where the general goal is to find a subset of the data that carries the relevant information.
There are two general approaches to automatic summarization: extraction and abstraction. Extractive methods build the summary from a subset of the words, phrases, or sentences in the original text. Abstractive methods first build an internal semantic representation and then use natural language generation to produce a summary closer to what a human would write.
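To make the extractive approach concrete, the following is a minimal sketch under stated assumptions, not a reference implementation. It assumes a naive regular-expression sentence splitter and scores each sentence by the average document-wide frequency of its words. The function names (`split_sentences`, `tokenize`, `extractive_summary`) and the scoring rule are illustrative choices; practical systems typically use stop-word handling and more robust scoring.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def extractive_summary(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order.

    Assumption: a sentence is important if its words are frequent in the
    whole document (score = average word frequency).
    """
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the top ones, restore document order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]
```

Because the output consists only of sentences copied from the input, this is extraction in the sense described above; an abstractive system would generate new wording instead.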
Extractive summarization tasks fall broadly into two types, depending on what the summarizer focuses on. The first is generic summarization, which aims at a general summary of a single document or a collection of documents. The second is query-dependent summarization, which summarizes only the content relevant to a given query.
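The difference between the two task types can be illustrated with a small sketch of the query-dependent case. It assumes the input is already split into sentences and ranks them by the number of distinct words they share with the query. The name `query_focused_summary` and the overlap-count scoring are hypothetical, chosen only for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def query_focused_summary(sentences, query, max_sentences=1):
    """Select the sentences that share the most distinct words with the query.

    Assumption: relevance is approximated by plain word overlap.
    Sentences with no overlap are never selected.
    """
    query_terms = tokenize(query)
    scores = [len(tokenize(s) & query_terms) for s in sentences]

    # Rank by overlap (ties keep document order), then restore document order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: scores[i], reverse=True)
    chosen = sorted(i for i in ranked[:max_sentences] if scores[i] > 0)
    return [sentences[i] for i in chosen]
```

A generic summarizer would ignore the `query` argument and score sentences against the document as a whole.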
A common way to evaluate automatic summaries is to compare them with human-written summaries. Evaluation methods can be divided along two axes: internal (intrinsic) versus external (extrinsic), and inter-textual versus intra-textual.
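One simple way to compare a system summary with a human summary is to count the words they share. The sketch below computes unigram recall and precision, in the spirit of the ROUGE-1 metric; it is a simplified illustration, not the official ROUGE implementation, and the function names are assumptions made for this example.

```python
import re
from collections import Counter


def unigram_counts(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def unigram_overlap(candidate, reference):
    """Return (recall, precision) of candidate unigrams against a reference.

    Overlap is clipped: a word counts at most as many times as it
    appears in both texts.
    """
    cand = unigram_counts(candidate)
    ref = unigram_counts(reference)
    overlap = sum((cand & ref).values())  # multiset intersection
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    return recall, precision


# Example: a short system summary against a longer human summary.
recall, precision = unigram_overlap("the cat sat", "the cat sat on the mat")
print(recall, precision)  # 0.5 1.0
```

Recall rewards covering the content of the human summary, while precision penalizes extra material; a word-overlap score of this kind says nothing about coherence.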
Internal and external evaluation
Internal evaluation tests the summarization system itself, mainly by assessing the coherence and informativeness of its summaries. External evaluation measures how the summary affects the completion of other tasks, such as relevance assessment and reading comprehension.
Inter-textual and intra-textual evaluation
Intra-textual methods evaluate the output of a single summarization system; inter-textual methods compare the outputs of several summarization systems.