Similarity Of Authors

This visualization depicts the similarity of the works of authors in various fields. The proximity of two authors represents the similarity of their works: the closer they are, the more similar their works (with some approximations).

A directed red arrow from author A to author B means that author B was found to be the most similar author to author A.

More technically, the works of each author are represented as an n-dimensional vector in which each dimension represents a word. The coordinate for each dimension is the TF (term frequency) * IDF (inverse document frequency, a measure of relative rarity) of that word.
The distance measure used is the cosine distance (1 - cosine similarity).
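
To make this concrete, here is a minimal sketch of the idea in plain Python. It is not the code behind the visualization. The author names, the toy texts, and the exact TF and IDF formulas (count divided by document length, and log(N / document frequency)) are assumptions chosen for illustration, and other variants are common.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map each author to a sparse {word: TF * IDF} vector.

    docs: dict of author -> list of tokens (all of that author's works).
    Assumed weighting: TF = count / total tokens, IDF = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many authors' works does each word occur?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    vectors = {}
    for author, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        vectors[author] = {
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
    return vectors


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two sparse vectors."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def most_similar(vectors):
    """For each author, find the other author at the smallest distance."""
    return {
        a: min(
            (b for b in vectors if b != a),
            key=lambda b: cosine_distance(vectors[a], vectors[b]),
        )
        for a in vectors
    }


# Hypothetical toy corpus, not real data.
docs = {
    "author_a": "whale sea ship captain sea".split(),
    "author_b": "sea ship storm sailor".split(),
    "author_c": "love letter garden ball".split(),
}
vectors = tf_idf_vectors(docs)
nearest = most_similar(vectors)
```

In this sketch, `nearest["author_a"]` corresponds to the target of the red arrow leaving author A. Note that the relation is not necessarily symmetric: B can be the closest author to A while some third author is closest to B.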

For further exploration of author similarity, here is a live demo.

The Expression of Emotions in 20th Century Books

This is an interesting analysis of how the relative frequency of words associated with particular moods has changed over the years. This is the research paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0059030

At a very high level, this is what they did:

1. Sourced the number of occurrences of the words across the years from the Google Ngram project.
2. Got the mood scores associated with the words from WordNet.
3. Computed the relative frequency of words associated with particular moods across the years.

Here is a demo of the Bag of Words model and a sentiment analysis model from Stanford.

Reference links:
https://books.google.com/ngrams
http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
https://aws.amazon.com/datasets/google-books-ngrams/

WordNet:
http://wordnet.princeton.edu/
http://sentiwordnet.isti.cnr.it/
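
The three steps described in this post can be sketched in a few lines of Python. This is only a rough illustration, not the paper's actual procedure: the yearly counts, the totals, and the tiny mood word lists below are made up, and dividing by the total word count per year is an assumed normalization.

```python
def mood_frequency_by_year(counts, totals, mood_words):
    """Relative frequency of a mood's words for each year.

    counts: dict of year -> {word: occurrences in that year}
    totals: dict of year -> total number of words counted in that year
    mood_words: set of words associated with one mood
    """
    result = {}
    for year, word_counts in counts.items():
        mood_total = sum(word_counts.get(word, 0) for word in mood_words)
        result[year] = mood_total / totals[year]
    return result


# Hypothetical n-gram style counts (step 1); not real data.
counts = {
    1900: {"happy": 30, "glad": 10, "afraid": 20},
    1950: {"happy": 20, "glad": 5, "afraid": 50},
}
totals = {1900: 1000, 1950: 1000}

# Hypothetical mood word lists standing in for a lexicon (step 2).
joy_words = {"happy", "glad"}
fear_words = {"afraid"}

# Relative frequency per mood per year (step 3).
joy_by_year = mood_frequency_by_year(counts, totals, joy_words)
fear_by_year = mood_frequency_by_year(counts, totals, fear_words)
```

Plotting `joy_by_year` and `fear_by_year` against the year would give the kind of trend lines the analysis is about.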

A Gentle Introduction to Sentiment Analysis

I wrote an introduction to sentiment analysis for the layperson, which was published on my company blog: http://blog.gale.com/do-computers-understand-our-emotions/

The patterns in words
