Resources
Tutorials
- Gensim official Docs Tutorial
- Natural Language Processing with NLTK and Gensim Video
- A Word2Vec Keras tutorial Tutorial
- Language Understanding with Recurrent Networks and CNTK Tutorial
- Vector Representations of Words with TensorFlow Tutorial
- Word2Vec word embedding tutorial in Python and TensorFlow Tutorial
Courses and Course Materials
- Stanford Deep Learning for NLP (cs224n) Course Material
Examples
- Document clustering with k-means official scikit-learn Example
- Featurize free-form text data using mmlspark on top of primitives in SparkML via a single transformer in this official mmlspark Notebook
- Sequence Classification with CNTK Example
- Sequence2Sequence with CNTK Example
NLP-Specific Packages
- allennlp: Deep Learning for NLP from AllenNLP built on PyTorch Ref - good for conditional random fields, encoders/decoders, reading comprehension, semantic role labeling, etc.
- gensim: topic modelling Docs - good for word2vec, semantic similarity, LDA, LSA, etc.
- nltk: Natural Language Toolkit Docs - good for tokenization, stemming, tagging, parsing, corpora, etc.
…
NLP at Scale
- Document classification with pyspark with HDInsight on Azure Doc
Blog Articles
- Calculating TF/IDF on How I met your mother transcripts Blog Post for TF/IDF with scikit-learn
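
For orientation, here is a minimal sketch of the TF/IDF weighting that the post above computes with scikit-learn. It is not the post's code: it uses only the standard library, and the whitespace tokenizer, the unsmoothed `log(N / df)` formula, the function name `tf_idf`, and the toy corpus are all assumptions made for illustration. scikit-learn's `TfidfVectorizer` applies smoothing and normalization by default, so its values will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumptions (illustrative only): lowercase whitespace tokenization,
    term frequency = count / document length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, not taken from the blog post.
docs = ["the cat sat", "the dog sat", "the cat ran"]
scores = tf_idf(docs)
```

A term that appears in every document (here "the") gets weight zero, while a term unique to one document (here "ran") gets the highest weight in that document, which is the behaviour that makes TF/IDF useful for picking out distinctive words.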
Kaggle
- Toxic Comment Classification Challenge Competition
Books