In our day-to-day lives we generate a lot of data, such as tweets, Facebook posts, comments, blog posts, and articles. Most of it is written in natural language and falls into the category of semi-structured or unstructured data. Processing this natural language data (unstructured plain text) is called Natural Language Processing (NLP).
The Natural Language Toolkit (NLTK) is a Python library for NLP that works with natural language data such as plain text, words, and sentences.
Building blocks of NLTK
- Tokenizers – separate the text into words and sentences
word tokenizer – separates text by word
sentence tokenizer – separates text by sentence
- Corpora – bodies of text, such as a written speech or a news article (the singular is "corpus").
- Lexicon – a dictionary of words and their meanings, which can differ depending on the context in which the words are used.
Let's understand how NLTK works. Consider a piece of sample_text.
NLTK comes to the rescue and separates this body of text (the corpus) into sentences and words.
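The sentence and word tokenizers can be sketched roughly as follows. This is a minimal illustration, not necessarily the setup used in this post: the contents of sample_text are a made-up example, and the sketch uses an untrained `PunktSentenceTokenizer` together with `wordpunct_tokenize` because neither needs any extra NLTK data to be downloaded.

```python
from nltk.tokenize import wordpunct_tokenize
from nltk.tokenize.punkt import PunktSentenceTokenizer

# Hypothetical sample text, chosen for illustration only.
sample_text = "NLTK is a Python library. It splits text into sentences and words."

# Sentence tokenizer: an untrained Punkt tokenizer. It has learned no
# abbreviations, so it treats a period followed by whitespace as a
# sentence boundary.
sentences = PunktSentenceTokenizer().tokenize(sample_text)

# Word tokenizer: splits each sentence into runs of word characters
# and runs of punctuation.
words = [wordpunct_tokenize(sentence) for sentence in sentences]

print(sentences)
print(words)
```

In everyday use the convenience functions `sent_tokenize` and `word_tokenize` from `nltk.tokenize` are more common. They rely on a pretrained Punkt model that has to be fetched once with `nltk.download`, and that model handles abbreviations such as "Mr." better than the untrained tokenizer above.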