
Experimenting with Sentiment Analysis in Python (Part 1)

Feroz Kazi
2 min read · Jun 28, 2020


Whatever code you write, it is important to know which libraries will give you the best support. So I began by surveying the common and popular libraries I might use in my sentiment analysis experiment with Python (NLP).

When I first started learning NLP with Python, I was curious to see how much code can do with natural language, and whether code can replace human instinct in identifying sentiment and patterns in natural-language communication.

“Cowards die many times before their deaths; The valiant never taste of death but once.” (Julius Caesar)

Can a program written by a human be intelligent enough to understand this? Well, this is where I stumble. Is there any way to find the sentiment of this statement using NLP, and if so, what would be the confidence level of the results the code generates?

Let’s learn and dig into this:

Steps:

1. Importing the required libraries

2. Preprocessing

3. Using relevant code

4. Post-processing

5. Results Analysis

I would say the steps above apply to almost any program of this kind. The best way to understand the whole cycle is to start with the libraries relevant to our requirement.
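To make the five steps concrete before looking at any library, here is a minimal sketch in plain Python, applied to the Julius Caesar quote. Everything in it is an assumption for illustration: the TOY_LEXICON scores are hand-picked, and the function names (preprocess, score_tokens, postprocess) are hypothetical. It is not how NLTK or spaCy score sentiment.

```python
# Step 1: import the required library (only the standard library here)
import re

# Hypothetical toy lexicon: word -> polarity. These values are hand-picked
# for illustration; a real experiment would use a curated resource.
TOY_LEXICON = {
    "valiant": 1.0, "brave": 1.0, "good": 1.0,
    "cowards": -1.0, "die": -1.0, "death": -1.0, "deaths": -1.0,
}


def preprocess(text):
    """Step 2: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_tokens(tokens):
    """Step 3: look up each token and keep the scores of the hits."""
    return [TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON]


def postprocess(scores):
    """Step 4: average the hits into one score and attach a label."""
    if not scores:
        return 0.0, "neutral"
    mean = sum(scores) / len(scores)
    label = "positive" if mean > 0 else "negative" if mean < 0 else "neutral"
    return mean, label


quote = ("Cowards die many times before their deaths; "
         "The valiant never taste of death but once.")
tokens = preprocess(quote)
score, label = postprocess(score_tokens(tokens))

# Step 5: inspect the result
print(tokens)
print(score, label)
```

A word-counting approach like this only reacts to words such as "die" and "death", so it has no way to see that the line praises courage. That gap is exactly what makes the quote an interesting test case.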

A few quick introductions to our libraries:

NLTK:

NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.

NLTK is a framework with components ranging from basic to advanced, and it is very popular because it is simple, consistent, scalable and modular.
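As a small taste of the tokenization and stemming mentioned above, here is a sketch that assumes NLTK is installed (pip install nltk). I picked wordpunct_tokenize and PorterStemmer because they work without downloading any extra NLTK data; other tokenizers such as word_tokenize need a data download first.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

text = "The valiant never taste of death but once."

# Regex-based tokenizer: splits runs of word characters from punctuation
tokens = wordpunct_tokenize(text)

# Porter stemmer: strips common English suffixes (and lowercases by default)
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

print(tokens)
print(stems)
```

Note that stems are not always dictionary words, since the stemmer only applies suffix rules.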

NLTK Modules in Brief:

spaCy

“Computers don’t understand text. This is unfortunate, because that’s what the web is mostly made of.” Matthew Honnibal (Author: spaCy)

spaCy is an open-source software library for advanced natural language processing, written in Python and Cython (Wikipedia). spaCy supports tokenization for more than 49 languages and is well suited to industrial use. Moreover, spaCy makes good use of object-oriented design and supports word vectors. Almost all methods return objects rather than plain strings or arrays, which lets developers work directly with the objects instead of digging for the content.
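To see what "objects rather than strings" looks like in practice, here is a minimal sketch that assumes spaCy is installed (pip install spacy). It uses spacy.blank("en"), which gives a tokenizer-only English pipeline, so no trained model has to be downloaded. A blank pipeline has no tagger, parser, or vectors, and it does not score sentiment.

```python
import spacy

# Tokenizer-only English pipeline: no trained model is required
nlp = spacy.blank("en")

doc = nlp("Cowards die many times before their deaths.")

# Each item in the Doc is a Token object with attributes, not a bare string
for token in doc:
    print(token.text, token.is_alpha, token.is_punct)

# Working with the objects directly: keep only the non-punctuation tokens
words = [token.text for token in doc if not token.is_punct]
print(words)
```

Because each token carries its own attributes, filtering out punctuation is a one-line check on the object rather than a string comparison.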

spaCy classes in brief:

In Part 2, we will discuss the following packages:

  • TextBlob
  • Gensim
  • Polyglot


Feroz Kazi

AI & Machine Learning, Principal Data Scientist, Functional Analytics, Insights, Metrics, Dashboards, Researching in Forecasting Models Optimization and PCA