Authorship Attribution
Authorship attribution, the science of inferring characteristics of the author from the characteristics of documents written by that author, is a problem with a long history and a wide range of applications. It is an important problem not only in information retrieval but in many other disciplines as well, from technology to teaching and from finance to forensics. The idea that authors have a statistical "fingerprint" that can be detected by computers is a compelling one that has received a lot of research attention. Authorship Attribution surveys the history and present state of the discipline, presenting some comparative results where available. It also provides a theoretical and empirically tested basis for further work. Many modern techniques are described and evaluated, along with some insights on application for novices and experts alike. Authorship Attribution will be of particular interest to information retrieval researchers and students who want to keep up with the latest techniques and their applications. It is also a useful resource for people in other disciplines, be it the teacher interested in plagiarism detection or the historian interested in who wrote a particular document.
Common terms and phrases
AAAC accuracy accurate Ad-hoc Authorship Attribution algorithms analysis method analyzed applied approach Argamon Authorship Attribution Competition Baayen Beale cipher Burrows Chaski common corpora corpus Corpus Linguistics course cross-entropy Cusum Danilo Daubert defined Delta dimensions discussed distance distribution English entropy essays evaluation event evidence example expert feature set Federalist papers forensic Forensic Linguistics framework function words gender genre given Halteren identify infer Juola Kolmogorov complexity Koppel language letters lexical linear discriminant analysis Linguistic Computing Literary and Linguistic machine learning markedness mathematics measure Mosteller and Wallace n-grams Naive Bayes classifiers noun particular performance person practical probably problem question researchers Rudman sample Schler separate sequence similar simple specific standard statistics style stylistic stylometry support vector machines syntactic techniques test document token unsupervised analysis variation vector space vector space model vocabulary word frequencies word length writing written