Hello Computer, I am Human…

How computers understand human language through Natural Language Processing

L.A. Randrup
4 min read · Mar 8, 2021
Natural Language Processing helps computers understand human language

Natural language refers to the way we humans communicate with each other through speech and text. We don’t regularly think about the complications of our own languages; we use them to convey information and meaning with words and signs almost without thinking. What comes naturally to us can be difficult for computers to comprehend, because the data is unstructured, context is often missing, and the way we use words is continuously evolving.

Natural Language Processing (NLP) is a branch of artificial intelligence that helps computers understand, interpret, and manipulate human language as we speak and write it. Nowadays, NLP surrounds us in everyday technology such as:

  • Email filters, such as spam filters
  • Smart Assistants
  • Search Results
  • Predictive Text like Search Autocorrect and Autocomplete
  • Language Translation
  • Data Analytics such as Surveys and Text Analysis
  • Digital Phone Calls
  • Chatbots
  • and so much more…

Linguistics is the scientific study of language, including grammar, semantics, and phonetics. It also involves devising and evaluating rules of language. In machine learning, text data, linguistics, mathematics, and methods from Natural Language Processing are combined to bridge the gap between human communication and computer understanding, always with the end goal in mind.

The NLP process starts with Exploratory Data Analysis (EDA). EDA helps us explore and see relationships in the data, detect outliers and errors, understand the context of the data, and examine our assumptions about it. During EDA we can discover insights that a computer, without a set of rules, would have difficulty extracting from unstructured data. From EDA, we then build a model that acts as a set of rules for the computer to understand the data.
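
To make the EDA step a little more concrete, here is a minimal sketch using only the Python standard library. The `documents` list is a made-up toy corpus, not data from a real project, and the checks shown (document lengths, most frequent words, empty records) are just a few of the things one might look at.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy corpus; a real project would load its own text data.
documents = [
    "Win a free prize now",
    "Meeting moved to Monday morning",
    "Free entry to win cash now",
    "Can we move the meeting to Friday",
    "",  # an empty record, the kind of error EDA should surface
]

# Length of each document in whitespace-separated words.
lengths = [len(doc.split()) for doc in documents]

# Word frequencies across the whole corpus (case-insensitive).
word_counts = Counter(
    word.lower() for doc in documents for word in doc.split()
)

# Indexes of empty documents, which usually need cleaning or removal.
empty_docs = [i for i, n in enumerate(lengths) if n == 0]

print("lengths:", lengths)
print("mean length:", mean(lengths))
print("most common words:", word_counts.most_common(3))
print("empty documents at indexes:", empty_docs)
```

Even on a corpus this small, the counts hint at which words separate the two kinds of messages and which records are unusable.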

Here’s a breakdown of the data science process for Natural Language Processing:

  1. Define the problem
  2. Gather Data
  3. Do Exploratory Data Analysis
  4. Preprocess the data
  5. Model with data
  6. Evaluate the model
  7. Answer the Problem
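
As a rough illustration of how these steps can fit together, the sketch below builds a tiny spam filter with a hand-rolled Naive Bayes classifier. Everything in it is an assumption made for the example: the toy `train` and `test` data, the function names, and the choice of Naive Bayes itself. EDA (step 3) is skipped for brevity, and a real project would likely reach for an established library instead.

```python
import math
from collections import Counter

# Steps 1-2: the problem ("is this message spam?") and a hypothetical dataset.
train = [
    ("win a free prize now", "spam"),
    ("free cash win now", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch with the team on friday", "ham"),
]
test = [("free prize now", "spam"), ("team meeting on monday", "ham")]

def preprocess(text):
    """Step 4: lowercase the text and split it into word tokens."""
    return text.lower().split()

def fit(examples):
    """Step 5: count words per label, the core of a Naive Bayes model."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(preprocess(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Pick the label with the highest log probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for word in preprocess(text):
            count = word_counts[label][word]  # 0 for unseen words
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = fit(train)

# Step 6: evaluate on held-out examples.
accuracy = sum(predict(model, t) == label for t, label in test) / len(test)
print("accuracy:", accuracy)

# Step 7: answer the problem for a new message.
print(predict(model, "win cash now"))
```

With only four training messages the accuracy figure means very little; the point is the shape of the workflow, not the result.
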
Machine Learning on Human Language for Artificial Intelligence

For Natural Language Processing, exploratory data analysis, data cleaning, and preprocessing are crucial. As a data scientist, you have to understand the context of the unstructured data in order to apply the right methods in your model so the computer can make sense of it. Preprocessing consists of:

  • Removal of special characters: special characters like - (hyphen) or / (slash) often don’t add any value, so we generally remove them. Which characters we remove depends on the use case.
  • Tokenizing: simply put, text data is unworkable if it is not tokenized first. Tokenization is a way of separating a piece of text into smaller units called tokens. Tokens can be words, characters, or subwords (such as character n-grams).
  • Lemmatizing or Stemming: both reduce an inflected word to a base form. The difference is that lemmatizing considers how the word is used and returns a meaningful root word (the lemma), while stemming simply strips the ending off the inflected word, so the result may not be a real word.
  • Stop Word Removal: removal of words that are not required for tasks such as sentiment analysis or text classification. Words such as “be”, “the”, and “of” add little to the meaning of a string.
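
The four preprocessing steps above can be sketched in a few lines of standard-library Python. This is a toy version under stated assumptions: the `STOP_WORDS` set, the `LEMMAS` lookup table, and the `naive_stem` function are all made up for illustration and are far cruder than what dedicated NLP libraries provide.

```python
import re

# Small illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "be", "is", "of", "the", "to", "were"}

# Tiny hand-written lookup standing in for a dictionary-based lemmatizer.
LEMMAS = {"studies": "study", "running": "run", "better": "good"}

def remove_special_characters(text):
    """Replace anything that is not an ASCII letter, digit, or whitespace."""
    return re.sub(r"[^A-Za-z0-9\s]", " ", text)

def tokenize(text):
    """Split lowercased text into word tokens on whitespace."""
    return text.lower().split()

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token):
    """Crude suffix stripping; the result need not be a real word."""
    for suffix in ("ing", "ies", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def lemmatize(token):
    """Look the token up in the toy lemma table, falling back to itself."""
    return LEMMAS.get(token, token)

text = "The studies were running well/better -- of course!"
tokens = remove_stop_words(tokenize(remove_special_characters(text)))
stems = [naive_stem(t) for t in tokens]
lemmas = [lemmatize(t) for t in tokens]

print("tokens:", tokens)
print("stems: ", stems)
print("lemmas:", lemmas)
```

Comparing the last two lists shows the difference described above: stemming turns "studies" into "stud", which is not a word, while the lemma lookup returns "study".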

By breaking the data down and giving it structure, we turn human words into something the computer can work with. Why is this cool? Not only is NLP already part of most of the newer technology we use, but we’ll also see more innovations that narrow the gap between human input and machine understanding, making things automated, if not seamless, for us humans.

Recent NLP innovations such as chatbots, virtual assistants like Siri, and Google Translate are giving us more tools than we could have imagined 10 years ago. Industry experts predict that the demand for NLP will grow exponentially as next-generation apps are expected to use AI. As more and more unstructured data becomes available, the future of NLP is thrilling: its advances and innovations will allow humans to shift focus from the questions to the answers. In the exciting days yet to arrive, NLP will be combined with other technologies such as gesture and facial recognition to help enterprises grow revenue, become more efficient, and increase productivity.

Just a few years ago, the idea of computers understanding human language appeared unimaginable. Now, Artificial Intelligence with Machine Learning through NLP has become one of the most prominent and fastest-growing fields, opening exciting opportunities for more innovation in technology.

L.A. Randrup

Data Scientist who is passionate about improving machine learning algorithms, creating a positive impact, and solving real-world problems.