Please note: this post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.
The authors propose a method called Dependency-Tree Automatic Title Generator (DTATG), a strategy for building appropriate titles for news articles based on heuristics and syntax parsing.
The motivation is that other similar methods are either ineffective (they generate only unordered sets of words) or work only on limited domains.
DTATG is thought of as a series of steps:
- Extract keywords from the document.
- Extract sentences using a text segmentation technique and rank them by how well they summarize the content of the document. These are the candidate sentences.
- Parse the candidate sentences with a dependency parser and trim out the unimportant parts.
- Filter out candidate sentences that fail empirical rules (title tests) describing what makes a good title.
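To make the four steps above concrete for myself, here is a minimal sketch of the pipeline shape. It is not the authors' implementation: keyword extraction is plain term frequency, sentence segmentation is a regex split, trimming only drops parenthetical asides instead of using a real dependency parser, and the title tests (a word-count range plus "contains at least one keyword") are rules I made up. All function names, the stopword list, and the thresholds are assumptions for illustration.

```python
import re
from collections import Counter

# Tiny stopword list, assumed for illustration only.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "are",
             "for", "was", "it", "that", "with", "by", "at", "as"}


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(document, top_k=5):
    """Step 1 (stand-in): the most frequent non-stopword tokens."""
    counts = Counter(t for t in tokenize(document) if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]


def split_sentences(document):
    """Step 2a (stand-in): naive split after sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def rank_sentences(sentences, keywords):
    """Step 2b (stand-in): rank by the number of distinct keywords covered."""
    keyword_set = set(keywords)
    return sorted(sentences,
                  key=lambda s: len(keyword_set & set(tokenize(s))),
                  reverse=True)


def trim(sentence):
    """Step 3 (stand-in for dependency-based trimming): drop parenthetical
    asides and trailing punctuation. A real version would prune subtrees
    of a dependency parse instead."""
    without_asides = re.sub(r"\s*\([^)]*\)", "", sentence)
    return without_asides.rstrip(".!? ").strip()


def passes_title_tests(title, keywords, min_words=3, max_words=12):
    """Step 4 (hypothetical rules): reasonable length, mentions a keyword."""
    words = tokenize(title)
    return (min_words <= len(words) <= max_words
            and any(k in words for k in keywords))


def generate_title(document):
    """Run the four steps; return the first surviving candidate or None."""
    keywords = extract_keywords(document)
    for sentence in rank_sentences(split_sentences(document), keywords):
        candidate = trim(sentence)
        if passes_title_tests(candidate, keywords):
            return candidate
    return None
```

Calling `generate_title(text)` on a short news-like text returns one of its own sentences, trimmed, or `None` when no sentence passes the tests. Every title it can produce comes from a sentence already present in the document.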
The authors claim their method generates titles that are comparable to the original document titles. The titles are judged subjectively along three dimensions:
- Topic relevance (how relevant is the generated title wrt. the document content?)
- Conciseness (how succinct and clear is the generated title?)
- Fluency (how grammatically correct is the generated title?)
- DTATG only works if you can extract sentences from the document, and some of those sentences must summarize the contents of the text.
- I think this would more accurately be described as a candidate title ranker than as a title generator, because it requires that you have some candidate sentences beforehand.