Natural Language Processing (NLP) Resources

This is a set of materials for learning and practicing NLP. It can also serve as a general reference to return to for a refresher.

Courses and Course Materials (Start Here)

  1. Recurrent Neural Networks by Andrew Ng Course YouTube Material - highly recommended as a starting point if you've never done NLP
  2. Stanford Deep Learning for NLP (cs224n) Course Material

Tutorials

| Topic | Title/Description | Link |
|---|---|---|
| Topic Modeling | Topic modeling from Gensim official Docs | Tutorial |
| Topic Modeling and Clustering | A topic identification and document clustering algorithm tutorial with Gensim/NLTK from PyCon | Video |
| Intent and Entity Recognition | Language Understanding with Recurrent Networks from CNTK official Docs | Tutorial |
| Word2Vec | Vector Representations of Words from TensorFlow official Docs | Tutorial |
| Text categorization | Analysing a collection of text documents from Scikit-Learn official Docs | Tutorial |
| Sequence to Sequence | A tutorial on how to summarize text and generate features using deep learning with Keras and TensorFlow | Tutorial |

Examples - Try Me!

  1. Document clustering with k-means official scikit-learn Example
  2. Featurize free-form text data with a single mmlspark transformer built on SparkML primitives, in this official mmlspark Notebook
  3. Sequence Classification with CNTK Example
  4. Sequence2Sequence with CNTK Example
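
For a quick taste of the document clustering idea from item 1 before opening the full example, here is a minimal sketch, assuming scikit-learn is installed. The toy corpus and variable names are made up for illustration and are not taken from the official example.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (made up for illustration): two pet documents, two finance documents.
docs = [
    "cat dog pet animal",
    "dog cat animal pet vet",
    "stock market trading shares",
    "market shares stock investor",
]

# Turn each document into a TF-IDF weighted bag-of-words vector.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# Group the vectors into two clusters; random_state makes the run repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Print the cluster id next to each document.
for doc, label in zip(docs, labels):
    print(label, doc)
```

The cluster ids themselves are arbitrary; what matters is which documents end up sharing an id.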

NLP-Specific Packages

  1. gensim: topic modelling Docs - good for word2vec, semantic similarity, LDA, LSA, etc.
  2. nltk: Natural Language Toolkit Docs - good for tokenization, stemming, tagging, parsing, corpora, etc.
  3. spaCy: efficient, neural-network-backed NLP toolkit Docs - good for parsing, tagging, entity recognition, text categorization, phrase matching, etc.
  4. allennlp: deep learning for NLP from the Allen Institute for AI, built on PyTorch Ref - good for conditional random fields, encoders/decoders, reading comprehension, semantic role labeling, etc.

Blog Articles

| Topic | Title/Description | Link |
|---|---|---|
| Basics | 7 types of Artificial Neural Networks for Natural Language Processing | Link |
| TF/IDF | Calculating TF/IDF on How I Met Your Mother transcripts (with scikit-learn) | Link |
| General/Sentiment Analysis | Breakthrough Research Papers and Models for Sentiment Analysis | Link, Link |
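
To make the TF/IDF entry concrete, here is a minimal pure-Python sketch of one common weighting variant: term frequency as count divided by document length, and inverse document frequency as log(N / document frequency). The function name `tf_idf` and the sample sentences are hypothetical, chosen for illustration only. scikit-learn's `TfidfVectorizer` uses a smoothed, normalized variant by default, so its numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumes tf = count / document length and idf = log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample sentences, not real transcript lines.
sentences = ["the cat sat", "the dog barked", "the cat purred"]
scores = tf_idf(sentences)
for sentence, weights in zip(sentences, scores):
    print(sentence, weights)
```

A term that appears in every document, such as "the" here, gets a weight of zero, while terms confined to a single document score highest.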

Papers

| Topic | Title/Description | Author(s) | Link |
|---|---|---|---|
| Text Classification | Fine-tuned Language Models for Text Classification (with Transfer Learning) | Jeremy Howard, Sebastian Ruder | Link |

NLP at Scale

  1. Document classification with PySpark on Azure HDInsight Doc

Kaggle

  1. Toxic Comment Classification Challenge Competition

Books

TBD

Exercises - Try Me!

| Topic | Title/Description | Link |
|---|---|---|
| Sentiment Analysis | Build a sentiment analysis / polarity model (scikit-learn) | Exercise and Code to start |
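
If you want a rough idea of the shape of a polarity model before attempting the exercise, here is a minimal sketch, assuming scikit-learn is installed. The tiny training set is made up for illustration, and the choice of TF-IDF features with logistic regression is one reasonable baseline, not necessarily the approach the linked exercise expects.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up toy reviews; a real exercise would use a labeled review corpus.
train_texts = [
    "great movie loved it",
    "wonderful acting great plot",
    "loved the wonderful story",
    "awful movie hated it",
    "boring acting awful plot",
    "hated the boring story",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# Chain feature extraction and the classifier so raw text goes in directly.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# Classify two unseen snippets.
predictions = model.predict(["great wonderful story", "boring awful plot"])
print(list(predictions))
```

Wrapping both steps in one pipeline keeps the vectorizer's vocabulary tied to the training data, which avoids leaking test text into the features when you later add cross-validation.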

List updated 2017-01-26