
Similarity Of Authors

This visualization depicts how similar the works of authors in various fields are to one another. The proximity of two authors represents the similarity of their works: the closer they are, the more similar their works (with some approximations).

A directed red arrow from author A to author B means that, for author A, the most similar author was found to be author B.
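Note that the arrows need not be mutual: A's closest author can be B while B's closest author is someone else. Here is a minimal sketch of how such arrows could be derived from pairwise distances. The author names and distance values are made up for illustration, and `nearest_author` is a hypothetical helper, not the code behind the visualization.

```python
# Hypothetical pairwise distances between three authors (made-up numbers).
distances = {
    ("A", "B"): 0.20,
    ("A", "C"): 0.50,
    ("B", "C"): 0.10,
}

def distance(x, y):
    """Look up the symmetric distance between two authors."""
    return distances.get((x, y), distances.get((y, x)))

def nearest_author(author, authors):
    """Return the author whose works are closest to `author`'s."""
    others = [a for a in authors if a != author]
    return min(others, key=lambda other: distance(author, other))

authors = ["A", "B", "C"]
# One arrow per author, pointing at that author's nearest neighbour.
arrows = {a: nearest_author(a, authors) for a in authors}
print(arrows)
```

With these numbers the arrow from A points to B, but the arrow from B points to C, because C is even closer to B than A is.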





More technically, the works of each author are represented as an n-dimensional vector in which each dimension corresponds to a word. The coordinate along each dimension is the TF (term frequency) * IDF (inverse document frequency, a measure of relative rarity) of that word.
The distance between two authors is the cosine distance (1 - cosine similarity).
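To make this concrete, here is a small sketch in plain Python. It is an illustration under assumptions, not the actual implementation: the toy corpus and the author names are invented, tokenization is a naive whitespace split, and IDF is taken as log(N / document frequency), which is only one of several common variants.

```python
import math
from collections import Counter

# Hypothetical toy corpus: one string of text per author.
corpus = {
    "author_a": "the sea the ship the storm",
    "author_b": "the ship the sailor the sea",
    "author_c": "the proof the theorem the logic",
}

def tf_idf_vectors(corpus):
    """Map each author to a sparse {word: tf * idf} vector."""
    counts = {name: Counter(text.split()) for name, text in corpus.items()}
    n_docs = len(counts)
    # Document frequency: in how many authors' works each word appears.
    doc_freq = Counter(word for c in counts.values() for word in c)
    vectors = {}
    for name, c in counts.items():
        total = sum(c.values())
        vectors[name] = {
            # Assumed variant: tf = count / total, idf = log(N / df).
            word: (n / total) * math.log(n_docs / doc_freq[word])
            for word, n in c.items()
        }
    return vectors

def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # convention chosen here for all-zero vectors
    return 1.0 - dot / (norm_u * norm_v)

vectors = tf_idf_vectors(corpus)
d_ab = cosine_distance(vectors["author_a"], vectors["author_b"])
d_ac = cosine_distance(vectors["author_a"], vectors["author_c"])
print(d_ab, d_ac)
```

In this toy example "the" appears in every author's text, so its IDF is zero and it contributes nothing. The two seafaring authors share "sea" and "ship" and end up closer to each other than either is to the third author, whose only overlap with them is "the".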

For further exploration of author similarity, here is a live demo.
