Big Data & NLP

Natural Language Processing (NLP) is the field of developing systems that allow computers to communicate with people using everyday language. NLP is considered a sub-field of AI and has significant overlap with computational linguistics.

Why NLP?

  • Classify text into categories and index and search large text collections: classify documents by topic, language, or author; spam filtering; information retrieval (relevant vs. not relevant); sentiment classification (positive vs. negative).
  • Speech processing, artificial voice.
  • Plagiarism detection.
  • Automatic translation.
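Several of the tasks above (spam filtering, sentiment classification) are text-classification problems. One classical approach is a naive Bayes classifier; the sketch below builds one from scratch on invented toy data, so the training examples and function names are illustrative, not from any particular product:

```python
# Minimal sketch of text classification (here: spam filtering) with a
# naive Bayes model; the training data is invented for illustration.
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split a document into word tokens."""
    return text.lower().split()

def train(docs):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label with the highest log-probability (Laplace smoothing)."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Prior probability of the label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        # Likelihood of each word given the label, smoothed by +1.
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) /
                              (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [
    ("win money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
]
wc, lc = train(training)
print(classify("claim your free money", wc, lc))  # spam
```

The same machinery, trained on positive/negative reviews instead of spam/ham mail, gives a basic sentiment classifier.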

For computer systems the task is tough. When people see text, they understand its meaning, even when the surface form is damaged. For example (intentionally scrambled): According to research, it deosn’t mttaer in what oredr the ltteers in a wrod are, the olny iprmoetnt tihng is that the frist and lsat ltteer are in the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by islelf but the wrod as a wlohe. When computers see text, they get only character strings.

NLP is difficult because:

  • Language is flexible: there are constantly new words, new meanings, and different meanings in different contexts.
  • Language is subtle and complex.
  • There are many hidden variables: knowledge of the world, knowledge of the context, knowledge of the techniques of human communication (example: "Can you tell me the time?" is a request, not a yes/no question).
  • The problem of scale: infinitely many possible words, meanings, and contexts.
  • The problem of sparsity: statistical analysis is very difficult because most words, phrases, and concepts have never been seen before.
  • Long-range correlations, and more.
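The point that computers receive only character strings can be made concrete with a short sketch (using the "tell me the time" sentence mentioned above):

```python
import string

# A computer receives text only as a character sequence; any structure
# (words, meaning, context) must be recovered by explicit processing.
text = "Can you tell me the time?"

# At the lowest level the machine sees code points, not words:
print([ord(c) for c in text[:7]])  # [67, 97, 110, 32, 121, 111, 117]

# A first, naive processing step: whitespace tokenization with
# punctuation stripped -- already lossy for real language.
tokens = [w.strip(string.punctuation).lower() for w in text.split()]
print(tokens)  # ['can', 'you', 'tell', 'me', 'the', 'time']
```

Even after tokenization, nothing here tells the machine that the sentence is a request rather than a yes/no question; that requires the hidden knowledge listed above.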

In this area, Teradata offers Aster analytic solutions involving Attensity, which make it easy to handle large volumes of textual data, analyze it, and give it meaning. Specifically, they facilitate the application of linguistic principles to extract entities and relationships in context, similar to what a human would do.
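Entity and relationship extraction of this kind can be sketched, very crudely, with hand-written patterns; the sentence, patterns, and relation name below are hypothetical illustrations, not the actual Attensity method, which relies on full linguistic analysis:

```python
# Very crude sketch of rule-based entity/relationship extraction;
# the text, patterns, and relation label are invented for illustration.
import re

text = "Teradata acquired the assets. John Smith works at Acme Corp."

# Hypothetical entity pattern: pairs of capitalized words.
entities = re.findall(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", text)
print(entities)  # ['John Smith', 'Acme Corp']

# Hypothetical relation pattern: "<Person> works at <Organization>".
relation = re.search(
    r"([A-Z][a-z]+ [A-Z][a-z]+) works at ([A-Z][a-z]+ [A-Z][a-z]+)", text)
if relation:
    print((relation.group(1), "works_at", relation.group(2)))
```

Real systems replace these regexes with part-of-speech tagging, parsing, and semantic analysis, which is what makes the problem hard at scale.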