A method for authorship attribution based on function
word adjacency networks (WANs) is introduced. Function
words are parts of speech that express grammatical relationships
between other words but do not carry lexical meaning on their
own. In the WANs considered here, nodes are function words and
directed edges represent the likelihood of finding the sink word
in the ordered vicinity of the source word. WANs of different
authors can be interpreted as transition probabilities of a Markov
chain and are therefore compared in terms of their relative
entropies. Optimal selection of WAN parameters is studied and
attribution accuracy is benchmarked across a diverse pool of
authors and varying text lengths. This analysis shows that, since
function words are independent of content, their use tends to be
specific to an author and that the relational data captured by
function WANs is a good summary of stylometric fingerprints.
Attribution accuracy is observed to exceed that achieved by
methods that rely on word frequencies alone. Combining WANs
with such frequency-based methods increases attribution accuracy
further, indicating that the two sources of information encode
different aspects of authorial style.
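
To make the pipeline described above concrete, the following is a minimal
sketch, not the paper's implementation. It builds a row-normalized WAN from a
token list and attributes a text to the author profile with the smallest
relative entropy. The function-word list, the window length, the decay factor,
the helper names (build_wan, relative_entropy, attribute), the use of
source-word edge mass in place of the chain's stationary distribution, and the
choice to skip edges missing from either network are all assumptions made for
illustration.

```python
import math

# Hypothetical, very short function-word list (assumption; real lists are longer).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "with"]


def build_wan(tokens, window=5, decay=0.75):
    """Return (wan, weights) for a list of lowercase tokens.

    wan[s][t] estimates how likely function word t is to appear within
    `window` positions after function word s; closer words count more,
    via decay ** (distance - 1). Rows with any mass sum to 1.
    weights[s] is the share of total edge mass that leaves s.
    """
    idx = {w: i for i, w in enumerate(FUNCTION_WORDS)}
    n = len(FUNCTION_WORDS)
    counts = [[0.0] * n for _ in range(n)]
    for i, source in enumerate(tokens):
        if source not in idx:
            continue
        for d in range(1, window + 1):
            if i + d >= len(tokens):
                break
            sink = tokens[i + d]
            if sink in idx:
                counts[idx[source]][idx[sink]] += decay ** (d - 1)
    row_mass = [sum(row) for row in counts]
    total = sum(row_mass)
    wan = [[c / m if m > 0 else 0.0 for c in row]
           for row, m in zip(counts, row_mass)]
    weights = [m / total if total > 0 else 0.0 for m in row_mass]
    return wan, weights


def relative_entropy(wan_p, weights_p, wan_q):
    """Weighted relative entropy between two WANs read as Markov chains.

    Simplification (assumption): terms where either network has a zero
    edge are skipped, so this is only a dissimilarity score.
    """
    h = 0.0
    for i, row in enumerate(wan_p):
        for j, p in enumerate(row):
            q = wan_q[i][j]
            if p > 0 and q > 0:
                h += weights_p[i] * p * math.log(p / q)
    return h


def attribute(tokens, profiles, **wan_params):
    """Pick the author whose profile WAN is closest to the text's WAN."""
    wan, weights = build_wan(tokens, **wan_params)
    return min(profiles, key=lambda a: relative_entropy(wan, weights, profiles[a]))
```

Here profiles is assumed to map each author name to a WAN built with
build_wan from that author's known texts, using the same window and decay as
the text being attributed.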