This extension summarizes input text using the TextRank algorithm.
TextRank is based on Google's early PageRank algorithm, which changed how web pages are ranked by assigning importance to each page according to the links between pages. We use the same basic technique to rank sentences as more or less important and return the most important ones.

The user can set four parameters: the number of sentences to return, the number of iterations, the damping factor, and delta. The number of iterations is how many times the TextRank update is repeated. More iterations should produce a better answer, at the cost of time; the default is 40. The damping factor makes the change in weights between iterations less dramatic, and delta defines a threshold below which the result is considered satisfactory. None of these needs to be changed, but the user can adjust them. In the paper describing the algorithm, Mihalcea and Tarau set the damping factor to 0.85, which is the value we use.

Pre-processing of the input sentences is limited but sufficient (a trim function). The pre-processed text is turned into a graph built from the frequency of each word in each sentence over all the words in the document. We then run the PageRank algorithm on this graph and get back the sentences with their scores. We sort by score and return the number of top-scoring sentences requested by the user, which summarizes the text by importance and relevance.
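The pipeline described above can be sketched in Python. This is only a minimal illustration, not the extension's actual code. The function names (`split_sentences`, `cosine`, `summarize`), the use of cosine similarity over word counts as the edge weight, the default of three sentences, the default `delta` of `1e-4`, and the reading of delta as the largest score change between two iterations are all assumptions made for the example. Only the 40-iteration default and the 0.85 damping factor come from the description.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace, then trim.
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def cosine(a, b):
    # Cosine similarity between two word-count vectors stored as Counters.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(text, num_sentences=3, iterations=40, damping=0.85, delta=1e-4):
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []

    # Word frequencies per sentence; these define the edge weights of the graph.
    counts = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    weights = [
        [cosine(counts[i], counts[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Weighted PageRank update: each neighbour j passes on a share of
            # its score proportional to the weight of the edge j -> i.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) + damping * rank)
        change = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if change < delta:
            # Assumed meaning of delta: stop once scores barely move.
            break

    # Sort by score, highest first, and keep the requested number of sentences.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [(sentences[i], scores[i]) for i in ranked[:num_sentences]]
```

Calling `summarize(text, num_sentences=2)` returns a list of `(sentence, score)` pairs ordered by score. A sentence that shares no words with the rest of the text keeps only the baseline score of `1 - damping`, so it drops to the bottom of the ranking.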