5 Heroic Tools for Natural Language Processing

  1. CoreNLP from the Stanford NLP Group
  2. NLTK, the most widely-mentioned NLP library for Python
  3. TextBlob, a user-friendly and intuitive NLTK interface
  4. Gensim, a library for document similarity analysis
  5. SpaCy, an industrial-strength NLP library built for performance

CoreNLP, the Java library well-known for its speed

CoreNLP is a production-ready solution built and maintained by the Stanford NLP Group. The library is optimized for speed and offers Part-of-Speech (PoS) tagging, pattern learning, parsing, named entity recognition, and much, much more. Because it is written in Java, it is highly regarded for its speed, and specialized wrappers make it usable from other programming languages, including Python. CoreNLP is widely used in production environments today, as it is polished, fast, and provides precise results.
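One common way to reach CoreNLP from Python is its built-in HTTP server. The sketch below assumes a CoreNLP server is already running locally on its default port 9000. It uses only the standard library rather than one of the official wrappers, and `annotate` and `extract_tags` are hypothetical helper names chosen for illustration.

```python
import json
import urllib.parse
import urllib.request


def annotate(text, url="http://localhost:9000",
             annotators="tokenize,ssplit,pos,lemma,ner"):
    """Send raw text to a running CoreNLP server and return its JSON output."""
    # The server reads annotation settings from a JSON-encoded "properties"
    # query parameter and the text to annotate from the POST body.
    props = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    request = urllib.request.Request(
        f"{url}/?{query}", data=text.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_tags(doc):
    """Flatten a CoreNLP JSON document into (word, PoS tag, NER label) tuples."""
    return [
        (token["word"], token.get("pos"), token.get("ner", "O"))
        for sentence in doc.get("sentences", [])
        for token in sentence.get("tokens", [])
    ]
```

With a server running, `extract_tags(annotate("Stanford is in California."))` should yield one tuple per token. Keeping the network call and the parsing separate makes the parsing easy to test without a server.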

NLTK, the most widely-mentioned NLP library

NLTK stands for Natural Language Toolkit, and it is the best solution for learning the ropes of NLP. Its modular structure helps you understand the dependencies between components and gain firsthand experience composing appropriate models for specific tasks. Since its release, NLTK has helped solve many problems across various areas of Natural Language Processing.
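To illustrate the idea of composing small components, here is a plain-Python sketch of a tokenizer feeding a lookup tagger that falls back to a default tag for unknown words. It loosely mirrors how NLTK's `UnigramTagger` can be chained to a `DefaultTagger` through its `backoff` argument. The classes `LookupTagger` and `FallbackTagger` are hypothetical and are not NLTK's API.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


class FallbackTagger:
    """Assigns the same tag to every word; used as the last resort."""

    def __init__(self, default_tag):
        self.default_tag = default_tag

    def tag_word(self, word):
        return self.default_tag


class LookupTagger:
    """Tags a word with its most frequent training tag, else asks the backoff."""

    def __init__(self, tagged_sentences, backoff=None):
        counts = defaultdict(Counter)
        for sentence in tagged_sentences:
            for word, tag in sentence:
                counts[word.lower()][tag] += 1
        # Keep only the most common tag seen for each (lowercased) word.
        self.table = {w: c.most_common(1)[0][0] for w, c in counts.items()}
        self.backoff = backoff

    def tag_word(self, word):
        tag = self.table.get(word.lower())
        if tag is None and self.backoff is not None:
            return self.backoff.tag_word(word)
        return tag

    def tag(self, tokens):
        return [(token, self.tag_word(token)) for token in tokens]
```

Because each stage only depends on the previous one's output, the tokenizer or either tagger can be swapped out without touching the others.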

TextBlob, the best way NLTK should be used

TextBlob is an interface for NLTK that turns text processing into a simple and quite enjoyable process: it has rich functionality and a smooth learning curve thanks to detailed, understandable documentation. Standing on the shoulders of a giant, TextBlob makes it simple to add various components, such as sentiment analyzers and other convenient tools. It can be used for rapid prototyping of various NLP models and can easily grow into full-scale projects.
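TextBlob reports sentiment through `TextBlob(text).sentiment`, whose polarity ranges from -1.0 (negative) to 1.0 (positive). The toy scorer below only illustrates what a lexicon-based sentiment analyzer does. It is not TextBlob's algorithm, and the tiny `LEXICON` with its scores is invented for this example.

```python
import re

# Hypothetical mini-lexicon; real analyzers ship thousands of scored entries.
LEXICON = {"good": 0.7, "great": 0.8, "enjoyable": 0.5, "bad": -0.7, "awful": -1.0}
NEGATORS = {"not", "never", "no"}


def polarity(text):
    """Average the scores of known words, flipping a word right after a negator."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = []
    negate = False
    for word in words:
        if word in NEGATORS:
            negate = True
            continue
        if word in LEXICON:
            score = LEXICON[word]
            scores.append(-score if negate else score)
        negate = False  # negation only affects the word immediately after it
    if not scores:
        return 0.0
    # Clamp to [-1.0, 1.0], the same range TextBlob uses for polarity.
    return max(-1.0, min(1.0, sum(scores) / len(scores)))
```

For instance, `polarity("The docs are great")` gives 0.8 and `polarity("not good")` gives -0.7.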

Gensim, a library for document similarity analysis

While Gensim may not be as ubiquitous and all-around capable as the previous tools, there is definitely an area where it shines: topic modeling and document similarity comparison, where the highly specialized Gensim library has no equal. Offering scalable and robust tools like LDA (Latent Dirichlet Allocation), Gensim is a production-ready library you can trust with several crucial components of your NLP projects, not to mention that topic modeling is one of the most engaging and promising fields of modern NLP.
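In Gensim itself, a typical workflow builds a `corpora.Dictionary`, converts each document with `doc2bow`, and queries an index from `gensim.similarities`. The plain-Python sketch below shows only the core idea behind document similarity: raw bag-of-words counts compared with cosine similarity. It deliberately omits TF-IDF weighting and topic models such as LDA, and the helper names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercased word tokens in a document."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, docs):
    """Return (index, score) pairs for docs, most similar to the query first."""
    q = bag_of_words(query)
    scored = [(i, cosine(q, bag_of_words(d))) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, against the documents "topic modeling with LDA", "cooking pasta at home", and "LDA topic models for documents", the query "LDA topic modeling" ranks them in the order 0, 2, 1.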

SpaCy, an industrial-strength library boasting high performance

Written in Cython, SpaCy does not offer over 50 variants of a solution for every task, as NLTK does. Instead, SpaCy provides only one solution per task (and, frankly, the best one), which removes the burden of choosing the optimal route yourself and ensures that the models you build are lean, mean, and efficient. In addition, the tool’s functionality is already robust, and new features are added regularly.

Conclusions

After you get a firm grip on these 5 heroic tools for Natural Language Processing, you will be able to learn any other library in quite a short time. We are sure, however, that there will be no need for that, as NLTK with TextBlob, SpaCy, Gensim, and CoreNLP can cover almost all the needs of any NLP project. Do you think otherwise?
