All Street Research

Finding the most relevant paragraphs from corporate documents for given themes

The Client

The Problem

All Street Research wanted to find the paragraphs of corporate documents that were most relevant to a given set of themes.

The Approach

A set of keywords and key phrases was obtained for each theme of interest. Then, across a corpus of corporate documents, words that correlated with those keywords at the paragraph level were identified. These correlations were used to derive a scoring function for each theme, which was then applied to score paragraphs and identify the most relevant ones.
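The approach above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation. It assumes single-word keywords, binary word occurrence per paragraph, the phi coefficient (Pearson correlation of two binary indicators) as the correlation measure, and a paragraph score equal to the sum of the positive correlations of its distinct words. The case study does not state which correlation measure or scoring function was actually used. The project itself used NLTK, Gensim, NumPy and Pandas, but this sketch uses only the Python standard library. The function names (`theme_weights`, `score`, `rank`) and the sample corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens; a crude stand-in for NLTK tokenization."""
    return re.findall(r"[a-z]+", text.lower())


def phi(a, b, ab, n):
    """Pearson correlation of two binary indicators (the phi coefficient).

    a, b: number of paragraphs in which each indicator is 1
    ab:   number of paragraphs in which both are 1
    n:    total number of paragraphs
    """
    denom = math.sqrt(a * (n - a) * b * (n - b))
    if denom == 0:
        return 0.0  # a word (or the theme) present in every paragraph or none
    return (n * ab - a * b) / denom


def theme_weights(paragraphs, keywords):
    """Correlate every word with the theme's keywords at the paragraph level.

    Assumption: a paragraph "mentions the theme" if it contains any keyword.
    Only positively correlated words are kept as weights.
    """
    docs = [set(tokenize(p)) for p in paragraphs]
    n = len(docs)
    keywords = {k.lower() for k in keywords}
    hits = [bool(d & keywords) for d in docs]  # theme indicator per paragraph
    k = sum(hits)
    df = Counter(w for d in docs for w in d)  # paragraphs containing each word
    co = Counter(w for d, h in zip(docs, hits) if h for w in d)  # ...that also mention the theme
    weights = {}
    for w, a in df.items():
        r = phi(a, k, co[w], n)
        if r > 0:
            weights[w] = r
    return weights


def score(paragraph, weights):
    """Hypothetical scoring function: sum of weights of the distinct words."""
    return sum(weights.get(w, 0.0) for w in set(tokenize(paragraph)))


def rank(paragraphs, keywords, top=3):
    """Return the `top` paragraphs with the highest theme score."""
    weights = theme_weights(paragraphs, keywords)
    return sorted(paragraphs, key=lambda p: score(p, weights), reverse=True)[:top]


# Hypothetical sample corpus, one paragraph per string.
corpus = [
    "The company reduced carbon emissions and expanded renewable energy.",
    "Quarterly revenue grew on strong demand.",
    "Renewable energy projects lowered emissions across plants.",
    "The board approved a new dividend policy.",
    "Solar capacity and wind farms support the energy transition.",
]
keywords = ["emissions", "renewable"]

print([corpus.index(p) for p in rank(corpus, keywords)])  # → [0, 2, 4]
```

Note that the last paragraph in the sample ranks third even though it contains neither keyword: it scores through "energy", which correlates with the keywords across the corpus. That is the point of learning correlated words rather than matching keywords alone. On a real corpus one would likely also remove stopwords and require a minimum document frequency before trusting a correlation.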

Technology Used

  • NLTK
  • Gensim
  • NumPy
  • Pandas
  • Jupyter Notebooks