Earlier this year, a new AI application was released to the public that can read text and characterize an author’s style. Called Emma Identity, the software combines natural language processing (NLP) and machine learning with the techniques of stylometry (the study of linguistic style). Given enough text (in this case, at least 5,000 words), Emma extracts patterns from an author’s writing, some of which are not easily detected by the human eye. These patterns range from word frequencies, complexity of language, and sentence structure all the way down to the author’s use of commas. Once it has learned the patterns from the sample text, Emma can apply them to text of ‘unknown origin’ and determine, reportedly with high accuracy, whether that text was written by the studied author.
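To make the idea of measurable style concrete, here is a minimal Python sketch of the kind of surface patterns described above. It is not how Emma works internally (that is not public in this article); the function name `style_features`, the four measurements, and the sample text are assumptions chosen purely for illustration.

```python
import re


def style_features(text):
    """Return a few simple style measurements for a text sample (illustrative only)."""
    # Naive sentence split on ., ! and ? (an assumption; real tools are more careful).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        return {}
    return {
        "avg_sentence_length": len(words) / len(sentences),   # words per sentence
        "commas_per_sentence": text.count(",") / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "vocabulary_richness": len(set(words)) / len(words),  # distinct words / all words
    }


# Hypothetical sample text, far shorter than the 5,000 words a real analysis would need.
sample = "She paused, listened, and waited. Nothing moved. The house was quiet, as always."
for name, value in style_features(sample).items():
    print(f"{name}: {value:.2f}")
```

Measurements like these only become meaningful over long samples, which is presumably why a minimum word count is required.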
Many are now familiar with how computer analysis helped reveal Robert Galbraith’s The Cuckoo’s Calling as a work by Harry Potter author J. K. Rowling. In his recent article 5 Ways Big Data Analytics Caught J.K. Rowling in the Act: Pseudonyms Can’t Hide, author Ryan Cox outlines how the computer uncovers an author’s style:
- Comparing all of the word pairings, or sets of adjacent words, in each book.
- Searching for “character n-grams”, or sequences of adjacent characters.
- Tallying the 100 most common words in each book and comparing the small differences in frequency.
- Separating words completely from their meaning by sorting them simply by length.
- Running a Principal Component Analysis that compares all of the books on six features: word length, sentence length, paragraph length, letter frequency, punctuation frequency, and word usage.
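The first four tests in this list can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the Rowling analysis: the function names, the simple tokenizer, the distance measure, and the two sample sentences are all invented for this example, and the Principal Component Analysis step is left out because it needs a numerical library.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def word_bigrams(words):
    """Count pairs of adjacent words."""
    return Counter(zip(words, words[1:]))


def char_ngrams(text, n=4):
    """Count sequences of n adjacent characters."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def top_word_frequencies(words, k=100):
    """Relative frequency of the k most common words."""
    total = len(words)
    return {w: c / total for w, c in Counter(words).most_common(k)}


def word_length_distribution(words):
    """Share of words at each length, ignoring what the words mean."""
    total = len(words)
    counts = Counter(len(w) for w in words)
    return {length: c / total for length, c in sorted(counts.items())}


def profile_distance(p, q):
    """Sum of absolute differences between two frequency profiles (lower = more alike)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical one-sentence "books"; a real comparison would use whole novels.
known = "The detective walked slowly to the door, and the door was locked."
unknown = "The detective looked at the door, and then walked to the window."

known_words, unknown_words = tokenize(known), tokenize(unknown)
print(word_bigrams(known_words).most_common(3))
print(char_ngrams(known, n=4).most_common(3))
print(profile_distance(top_word_frequencies(known_words),
                       top_word_frequencies(unknown_words)))
print(profile_distance(word_length_distribution(known_words),
                       word_length_distribution(unknown_words)))
```

Each function turns a text into a profile of counts or frequencies; comparing the profile of an unknown text against those of several candidate authors is what lets an analyst say which candidate is the closest match.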
AI continues getting smarter and more sophisticated.
In an effort to improve its AI’s understanding of natural language flow, the research team at Google (Google Brain) used novels to improve the communication skills of the company’s other apps. The system ingested 11,038 books to learn the nuances of language so that those apps can think and ‘talk’ like people. The experiment proved successful, even though it was conducted without the authors’ permission.
Is this a good technology for authors?
For the plagiarist, this is bad news. Emma’s purpose is not to detect plagiarism but to determine authorship: it learns the patterns of an author and aims to guess whether another piece was written by the same person. But close on Emma’s heels will be AI applications that, given enough data, can instantly detect whether a writer’s work is their own or someone else’s. Once college professors get hold of this technology, lifting large chunks of text from other sources for that thesis will be a thing of the past.
But for the average author and for publishers, this can be very good news.
As AI examines more content, it will be able to make very accurate comparisons of one author’s style to another’s. For an unknown author, being compared to a Margaret Atwood or a Michael Crichton can only help in gaining attention and support from publishers and readers. As a publisher debates which manuscript to acquire, or how to position a newly released book in the marketplace, stylistic recommendations from AI analysis can assist in the creative decision-making process.