NLTK – Natural Language Processing in Python

In our day-to-day lives we generate a lot of data, such as tweets, Facebook posts, comments, blog posts, and articles. This data is generally written in natural language and falls into the category of semi-structured or unstructured data. Processing this kind of unstructured, plain-text data is called Natural Language Processing (NLP).

The Natural Language Toolkit (NLTK) is a Python library for NLP that works with natural language in the form of plain text, words, and sentences.

Building blocks of NLTK

  • Tokenizers – separate text into words and sentences
      • word tokenizer – separates text by word
      • sentence tokenizer – separates text by sentence
  • Corpora – bodies of text, such as a written speech or a news article
  • Lexicon – a dictionary of words and their meanings, which can differ depending on the context in which the words are used

Let's understand how NLTK works. Consider a sample_text such as:

sample_text = "Hey jimmy, How are you? Today is my birthday. let's go for lunch today. I'm throwing a party at the Hard rock cafe"

To separate this text into sentences, we could write ordinary code with a rule such as "start a new sentence after every full stop". In some scenarios that rule fails: if the text contains an abbreviation like Ms., everything after Ms. is treated as a new sentence.

sample_text = "Ms. Margaret, How are you? Today is my birthday. let's go for lunch today. I'm throwing a party at the Hard rock cafe"
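The failure described above can be reproduced with a few lines of plain Python. This is only a minimal sketch: the helper name naive_sentence_split and the exact rule (split on whitespace that follows a full stop) are assumptions made for illustration, and a closing full stop has been added to the sample text.

```python
import re

sample_text = (
    "Ms. Margaret, How are you? Today is my birthday. "
    "let's go for lunch today. "
    "I'm throwing a party at the Hard rock cafe."
)


def naive_sentence_split(text):
    """Hypothetical helper: start a new sentence after every full stop."""
    # Split on the whitespace that follows a full stop, keeping the stop itself.
    parts = re.split(r"(?<=\.)\s+", text)
    return [part for part in parts if part]


sentences = naive_sentence_split(sample_text)
for sentence in sentences:
    print(sentence)
```

The first "sentence" printed is just Ms., because the rule cannot tell an abbreviation from the end of a sentence. The same rule also ignores the question mark, so "How are you?" is not separated from the sentence that follows it.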

 

This is where NLTK comes to the rescue: it separates a body of text (a corpus) into sentences and words.

 
