
NLTK – Natural Language Processing in Python

In our day-to-day lives we generate a lot of data, such as tweets, Facebook posts, comments, blog posts, and articles. Most of it is written in natural language and falls into the category of semi-structured or unstructured data. Processing this kind of unstructured, plain-text data is called Natural Language Processing (NLP).

The Natural Language Toolkit (NLTK) is a Python library for NLP that works with natural language data such as plain text, words, and sentences.

Building blocks of NLTK

Tokenizers – separate text into words and sentences

word tokenizer – splits text into words

sentence tokenizer – splits text into sentences

Corpora – bodies of text, such as written speeches or news articles (singular: corpus).
Lexicon – a dictionary of words and their meanings, which can differ depending on the context in which the words are used.

Let’s understand how NLTK works. Consider a sample_text such as:

sample_text = "Hey jimmy, How are you? Today is my birthday. let's go for lunch today. I'm throwing a party at the Hard rock cafe"

If we wanted to separate this text into sentences, we could do it with ordinary programming, using a rule such as "start a new sentence after every full stop". In some cases that rule fails. For example, if the text contains "Ms.", everything after "Ms." would be treated as a new sentence:

sample_text = "Ms. Margaret, How are you? Today is my birthday. let's go for lunch today. I'm throwing a party at the Hard rock cafe"
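To make the problem concrete, here is a minimal sketch of the "new sentence after every full stop" rule in plain Python. The helper name naive_sentence_split is made up for this illustration and is not part of NLTK.

```python
# Sample text from above, written with plain ASCII quotes so it is valid Python.
sample_text = (
    "Ms. Margaret, How are you? Today is my birthday. "
    "let's go for lunch today. I'm throwing a party at the Hard rock cafe"
)


def naive_sentence_split(text):
    """Hypothetical helper: treat every full stop as the end of a sentence."""
    pieces = text.split(".")
    # Drop surrounding whitespace and any empty pieces.
    return [piece.strip() for piece in pieces if piece.strip()]


naive_sentences = naive_sentence_split(sample_text)
for sentence in naive_sentences:
    print(sentence)
```

The first piece printed is just "Ms", because the full stop of the abbreviation is mistaken for a sentence end. The rule also misses the question mark, so "How are you?" and "Today is my birthday" end up in the same piece.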

This is where NLTK comes to the rescue: its tokenizers can separate a body of text (a corpus) into sentences and words.
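As a rough sketch of what this can look like, the example below uses NLTK's Punkt sentence tokenizer and the regex-based wordpunct_tokenize function. The usual shortcuts are nltk.sent_tokenize and nltk.word_tokenize, but they rely on a pretrained Punkt model that has to be fetched with nltk.download first. To stay self-contained, this sketch instead gives the tokenizer a small, hand-picked abbreviation list, which is an assumption made for the illustration and not something the original post specifies.

```python
from nltk.tokenize import wordpunct_tokenize
from nltk.tokenize.punkt import PunktParameters, PunktSentenceTokenizer

# Sample text with plain ASCII quotes so it is valid Python.
sample_text = (
    "Ms. Margaret, How are you? Today is my birthday. "
    "let's go for lunch today. I'm throwing a party at the Hard rock cafe"
)

# Assumption: a tiny hand-written abbreviation list (lowercase, without the
# trailing period). The pretrained English model ships its own, larger list.
punkt_params = PunktParameters()
punkt_params.abbrev_types.update({"ms", "mr", "mrs", "dr"})
sentence_tokenizer = PunktSentenceTokenizer(punkt_params)

# Sentence tokenizer: split the text into sentences.
sentences = sentence_tokenizer.tokenize(sample_text)
for sentence in sentences:
    print(sentence)

# Word tokenizer: split the first sentence into words and punctuation marks.
words = wordpunct_tokenize(sentences[0])
print(words)
```

With this setup, "Ms." is recognised as an abbreviation, so "Ms. Margaret, How are you?" should stay together as one sentence, and the question mark is treated as a sentence end as well.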

November 4, 2016