Playful Technology Limited

The Entropy of "Alice in Wonderland"

Demonstration of Montemurro and Zanette's information-theory-based keyword algorithm

Several years ago, I read in New Scientist about an information-theory-based technique for identifying the most significant words in a document according to the role they play in its structure. After looking up the paper, "Towards the quantification of semantic information in written language" by Marcello Montemurro and Damian Zanette, I implemented the algorithm and contributed it to Gensim. Unfortunately, it is no longer included in the latest release, but I have created a fork of Gensim to allow further development of features that have been dropped.
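To make the idea concrete, here is a minimal Python sketch of the measure as I understand it from the paper; it is not the Gensim code. The text is cut into P equal blocks of M words. For each word occurring n times in the N-word text, the sketch computes the entropy H(w) of the word's distribution over the blocks, then the entropy expected if the text were randomly shuffled (the count in each block then follows a hypergeometric distribution), and scores the word as (n/N) times the difference. Words clustered in a few parts of the text score highly; words spread evenly score at or below zero. The function name `mz_scores`, the regex tokenisation, the default `block_size` and the discarding of the final partial block are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def log_comb(a: int, b: int) -> float:
    """Natural log of the binomial coefficient C(a, b), computed via lgamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def mz_scores(text: str, block_size: int = 1000) -> dict[str, float]:
    """Hypothetical helper: score each word by how far its spread over blocks
    departs from what a randomly shuffled text would give.

    Assumptions (not from the original post): lowercase regex tokenisation,
    and any trailing partial block is dropped so every block has block_size words.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    n_blocks = len(tokens) // block_size  # P
    if n_blocks < 2:
        raise ValueError("text too short for the chosen block_size")
    tokens = tokens[: n_blocks * block_size]
    total = len(tokens)  # N = P * M

    # Occurrences of each word in each block, and in the whole text.
    block_counts = [Counter(tokens[i : i + block_size]) for i in range(0, total, block_size)]
    totals = Counter(tokens)

    log_c_total = log_comb(total, block_size)
    scores = {}
    for word, n in totals.items():
        # Observed entropy H(w), in bits, of the word's distribution over blocks.
        observed = -sum(
            (c[word] / n) * math.log2(c[word] / n) for c in block_counts if c[word]
        )
        # Expected entropy after a random shuffle: each block holds m occurrences
        # with hypergeometric probability C(n, m) C(N - n, M - m) / C(N, M).
        expected = 0.0
        for m in range(1, min(n, block_size) + 1):
            if block_size - m > total - n:
                continue  # impossible configuration, probability zero
            prob = math.exp(
                log_comb(n, m) + log_comb(total - n, block_size - m) - log_c_total
            )
            expected -= n_blocks * prob * (m / n) * math.log2(m / n)
        # Weight the entropy difference by the word's relative frequency n / N.
        scores[word] = (n / total) * (expected - observed)
    return scores


if __name__ == "__main__":
    # Toy text of four 4-word blocks: "cat" sits in one block, "the" in every block.
    sample = "cat cat cat the a b c the d e f the g h i the"
    ranked = sorted(mz_scores(sample, block_size=4).items(), key=lambda kv: kv[1], reverse=True)
    print(ranked[0][0])  # 'cat' is clustered in a single block, so it ranks first
```

Sorting the returned dictionary by score gives a keyword ranking. The block size is a tunable assumption in this sketch, and changing it will change the ranking.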

When I found the text of Alice's Adventures in Wonderland as a Kaggle dataset, it provided an opportunity to create a demonstration of the algorithm.

I also created a video explaining it.

If you are interested in document analysis, please contact me.

By Dr Peter J Bleackley
Tags: #Natural Language Processing, #Information Theory, #entropy, #summarisation, #keyword extraction, #kaggle, #video