Natural language processing is a term that you may not be familiar with, yet you probably use the technology behind it every day. Natural language processing (NLP) is simply how computers attempt to process and understand human language [1].
Can’t think of any times you found yourself speaking to a computer? Well, it’s not just Alexa and Google Home that use this technology and serve as the most obvious examples of NLP. For example, if you use email, your email server spends time deciding whether or not to filter an incoming email to spam.
By evaluating the content of the email, the subject line, and the email domain, the server either sends the email to your inbox or files it into the black pit of your junk folder. Another example you’ve probably used is Google’s autocompleting feature. NLP is what powers the auto-complete suggestions for commonly queried terms or phrases.
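To make the spam example concrete, here is a minimal sketch of how a filter might score an email from its subject, body, and sender domain. The word list, the trusted domain, and the threshold are all made-up assumptions for illustration; real email providers use far more sophisticated, trained models.

```python
# Toy spam scorer. SPAM_WORDS, TRUSTED_DOMAINS, and the threshold are
# hypothetical values invented for this example.
SPAM_WORDS = {"winner", "free", "prize", "urgent", "lottery"}
TRUSTED_DOMAINS = {"example.com"}


def spam_score(subject: str, body: str, sender: str) -> int:
    """Return a rough spam score; higher means more suspicious."""
    words = (subject + " " + body).lower().split()
    # Strip simple punctuation so "WINNER!" still matches "winner".
    cleaned = [w.strip(".,!?:;") for w in words]
    score = sum(1 for w in cleaned if w in SPAM_WORDS)
    # An unfamiliar sender domain adds one point of suspicion.
    domain = sender.rsplit("@", 1)[-1].lower()
    if domain not in TRUSTED_DOMAINS:
        score += 1
    return score


def is_spam(subject: str, body: str, sender: str, threshold: int = 3) -> bool:
    """Send the email to the junk folder once the score reaches the threshold."""
    return spam_score(subject, body, sender) >= threshold
```

With these assumed lists, an email titled "You are a WINNER!" from an unknown domain would land in the junk folder, while a lunch invitation from a trusted address would reach the inbox.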
Humans have been trying to perfect natural language processing since the 50s, but it’s proven to be a complicated technology that involves much more than breaking down a sentence word by word.
What is natural language processing?
Let’s start at the beginning: natural language processing (NLP) is a subfield of artificial intelligence (AI) that centers around helping computers better process, interpret, and understand human languages and speech patterns [2].
In 1950, Alan Turing asked the question, “Can machines think?” He made a prediction that by 2000, computers would be capable of tricking 30% of human judges after five minutes of questioning [3]. The Turing Test became a controversial measure of whether or not a computer is intelligent.
How does the Turing Test work? What does it have to do with NLP?
- A human questioner goes into a room and uses a computer to communicate with participants “A” and “B” in a different room
- One participant is a computer, the other is a human
- The human questioner doesn’t know whether A or B is the computer
- After talking to both participants for five minutes, the questioner has to choose whether A or B is the computer
- The test is repeated with many different human questioners
- If the computer can trick at least thirty percent of humans, it’s considered intelligent
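The pass condition in the steps above boils down to simple arithmetic. As a small sketch (the function name and its parameters are our own, not part of any official test protocol):

```python
def passes_turing_test(fooled: int, judges: int, threshold: float = 0.30) -> bool:
    """Return True if the share of fooled judges meets the threshold.

    `fooled` is how many questioners mistook the computer for the human;
    `judges` is how many questioners took part in total.
    """
    if judges <= 0:
        raise ValueError("at least one judge is required")
    return fooled / judges >= threshold
```

For example, fooling 10 out of 30 questioners (about 33%) would clear the bar, while fooling 2 out of 10 (20%) would not.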
Although computer scientists and engineers have come extremely close to building a computer that passes the Turing Test, also known as “The Imitation Game,” no machine has definitively and convincingly passed it as of this writing [4].
How does natural language processing work?
NLP uses algorithms to identify and interpret natural language rules so unstructured language data can be processed in a way the computer can actually understand. Computers are normally instructed through programming languages like Java and C++ [5].
Humans, of course, speak English, Spanish, Mandarin, and well, a whole host of other natural human languages. NLP attempts to bridge this computer-human speech gap. Unfortunately for computers, language can’t be neatly tidied away into Excel spreadsheets so NLP relies on algorithms to do the heavy lifting of understanding.
How does a computer interpret language in text form?
Once text is given to the computer, the computer will use the algorithms to extract meanings associated with each sentence and word. Then it collects data from them. Using this data, the computer will determine an accurate response - or at least, hazard a guess as to what an appropriate answer or response would be.
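As a rough illustration of that flow (take in text, pull out the words, then pick the most fitting response or admit confusion), here is a toy sketch. The intents, keywords, and canned responses are invented for this example, and real systems rely on much richer models than keyword matching.

```python
# Hypothetical intents and keywords, invented for illustration only.
INTENTS = {
    "weather": {"weather", "rain", "sunny", "forecast"},
    "time": {"time", "clock", "hour"},
}
RESPONSES = {
    "weather": "Here is today's forecast.",
    "time": "Here is the current time.",
}


def respond(text: str) -> str:
    """Guess the most likely intent from the words in the text."""
    # Step 1: break the text into lowercase words without punctuation.
    words = {w.strip(".,!?").lower() for w in text.split()}
    # Step 2: collect data, here the keyword overlap with each intent.
    scores = {intent: len(words & keywords) for intent, keywords in INTENTS.items()}
    # Step 3: choose the best match, or fall back when nothing matches.
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return "Sorry, I didn't understand that."
    return RESPONSES[best]
```

Asking "Will it rain today?" matches the weather intent, while "Tell me a joke" matches nothing and triggers the fallback.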
Occasionally, of course, the computer may not understand the meaning of a sentence or words, resulting in fairly muddled, sometimes funny, results. One such incident occurred during the early testing phases of the technology in the 1950s [6].
A computer was tasked with translating words and sentences between English and Russian. One sentence it tried to translate was: “The spirit is willing, but the flesh is weak.” The computer worked to translate the phrase to Russian and then back to English. It came up with this: “The vodka is good, but the meat is rotten.”
As you can see, language is tough for computers because of the inherent nuances of words in the context of a sentence. These days, the technology has advanced, and computers’ NLP capabilities have much more robust tech behind them.
What are the techniques used in NLP?
Becoming Human AI provides a helpful guide to explain how computers break down and grapple with human language using syntax and semantics [7].
1. Syntax techniques
What is syntax? It is how words are arranged in a sentence so they make grammatical sense [8]. In natural language processing, analysis of syntax is critical for computers. They rely on algorithms to apply grammatical rules to words and, from there, extract meaning.
Here are some ways computers may use syntax:
- Lemmatization: flattens out inflected forms of a word to its base form or “lemma” [9]
- Morphological segmentation: breaks down words into smaller units called morphemes [10]
- Word segmentation: dividing a large piece of continuous text into distinct units, usually individual words [11]
- Part of speech tagging: identifying the part of speech for every word [12]
- Stemming: involves cutting down a word to its word stem [13]
- Parsing: analysis of grammar for a given phrase or sentence [14]
- Sentence breaking: figuring out where sentences begin and end in a long piece of text [15]
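To give a feel for three of the techniques listed above (sentence breaking, word segmentation, and stemming), here is a deliberately naive sketch using only Python's standard library. The rules are simplified assumptions; production tools handle abbreviations, irregular words, and many other edge cases that this sketch ignores.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence breaking: split after ., !, or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Naive word segmentation: runs of letters, digits, or apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", sentence.lower())


def stem(word: str) -> str:
    """Crude stemming: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words like "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word
```

Run on "The dogs barked. They were jumping!", this splits the text into two sentences, breaks the first into "the", "dogs", and "barked", and reduces "dogs", "barked", and "jumping" to "dog", "bark", and "jump".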
2. Semantics techniques
Semantics is simply the study of the meaning of texts and language. It also covers the fact that words can have multiple meanings (as well as connotations and denotations) [16]. Semantic analysis is one of the more difficult aspects of NLP and is still being worked on. Why?
Here are some examples of why computers sometimes struggle with the semantics of our complicated languages:
A child can be referred to as:
- daughter, son, niece, nephew, kid, girl, or boy
A “bass” can refer to:
- A fish
- A low-pitched instrument
- Lowest adult male singing voice
There are plenty of these instances in English and other languages where humans naturally understand the meaning of a word based on our instinctual understanding of inflection, tone, and context. Computers, on the other hand, need hard data to extract meaning from sentences and words.
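One simple way to give a computer that kind of hard data is to compare the words around “bass” with clue words for each meaning. The sketch below is loosely inspired by overlap-based approaches to word sense disambiguation; the clue words are invented for this example and are far too small for real use.

```python
# Hypothetical mini sense inventory for "bass"; the clue words are invented.
SENSES = {
    "fish": {"fish", "lake", "river", "caught", "fishing", "water"},
    "instrument": {"guitar", "band", "play", "played", "amp", "strings"},
    "voice": {"sing", "sings", "singer", "choir", "voice", "tenor"},
}


def disambiguate_bass(sentence: str) -> str:
    """Pick the sense of "bass" whose clue words overlap most with the sentence."""
    context = {w.strip(".,!?").lower() for w in sentence.split()}
    overlaps = {sense: len(context & clues) for sense, clues in SENSES.items()}
    best = max(overlaps, key=overlaps.get)
    # With no overlapping clue words at all, the sketch refuses to guess.
    return best if overlaps[best] > 0 else "unknown"
```

With these assumed clue words, "He caught a bass in the lake." resolves to the fish, "She played bass in a band." resolves to the instrument, and "I like bass." stays unknown because the sentence offers no context.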
Semantic analysis techniques computers rely on in NLP:
- Named entity recognition (NER): algorithms identify, categorize, and pair entities in the text with other entities in knowledge databases [17]
- Word sense disambiguation (WSD): involves giving meaning to a word based on context [18]
- Natural language generation (NLG): involves creating natural language text using AI and computational linguistics [19]
For example, Gmail suggests automatic answers when you’re responding to an email. That’s a case of NLG at work.
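Of the semantic techniques listed above, named entity recognition is perhaps the easiest to sketch. The toy version below simply looks up names in a small dictionary (sometimes called a gazetteer). The entries and labels are assumptions made up for this example; real systems use trained models and much larger knowledge bases.

```python
# Hypothetical gazetteer; a real system would use a far larger knowledge base.
GAZETTEER = {
    "alan turing": "PERSON",
    "google": "ORGANIZATION",
    "russia": "LOCATION",
}


def find_entities(text: str) -> list[tuple[str, str]]:
    """Return (entity, label) pairs in the order they first appear in the text."""
    lowered = text.lower()
    found = []
    for name, label in GAZETTEER.items():
        position = lowered.find(name)
        if position != -1:
            # Recover the original casing from the input text.
            found.append((position, text[position:position + len(name)], label))
    return [(name, label) for _, name, label in sorted(found)]
```

Because this sketch matches plain substrings, it would also flag "Google" inside "Googled". That kind of false match is one reason real entity recognizers look at the surrounding context as well.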
What are some examples of natural language processing?
Natural language processing technology is something you come into contact with almost every day [20].
- Autocomplete and predictive typing
- Chatbots [21]
- Machine translation: language translation applications like Google Translate
- Word processors and writing tools like Microsoft Word and Grammarly, which check for and correct misspellings, typos, and grammatical errors [22]
- Interactive Voice Response (IVR) technology used in call centers to respond to certain users’ requests without needing human intervention [23]. This is what you “talk” to when you need to pay a bill through an automated phone system, for example
- Virtual personal assistants such as Google Assistant, Siri, Cortana, and Alexa
- Customer reviews: NLP can parse through data left in customer reviews so companies can better track customer satisfaction and suggest other services and relevant products [24]
- Keyword extractor: finding key phrases in an article and listing them (you may see this on blog platforms, for example)
- Handwriting recognition
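The keyword extractor from the list above is one of the simpler ideas to try yourself. A minimal sketch, assuming that the most frequent words (after ignoring filler words) make reasonable keywords; the stop-word list here is a tiny made-up sample, and real extractors use more careful scoring:

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real lists contain hundreds of words.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "on", "with"}


def extract_keywords(text: str, top_n: int = 3) -> list[str]:
    """Return up to top_n of the most frequent non-stop-words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]
```

For a sentence like "The printer is jammed. The printer needs paper and the paper tray is empty.", asking for the top two keywords surfaces "printer" and "paper", since each appears twice.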
Why is natural language processing difficult?
After seeing all the different uses of natural language processing, you might be surprised to learn that it’s an incredibly difficult problem for computer science to solve to this day [25]. Why? It’s not for a lack of technology. It’s actually due to the nature of human language itself.
The rules and cadence of our speech make it a challenge for a computer to interpret, understand, and respond to a given text or voice instruction. For example, computers aren’t great at reading tone, so sarcasm goes over a computer’s virtual head.
Complicating matters, there are thousands of natural languages, each with its own grammatical rules. That’s a lot of different data sets for a computer to know and understand.
Computers traditionally require people to communicate with them in a programming language that is exact and unambiguous, or through a limited set of clearly spoken commands. Human speech, as you know, is far from exact, and Shakespeare wasn’t known for speaking in JavaScript.
Here are just a few things that can trip up natural language processing AI:
- Slang: y’all, OMG, ya, just to name a few
- Regional dialects: US southern or midwestern dialects, for example, or Welsh and Scottish
- Accents: for example, English is spoken by the British, Australians, Canadians, and Americans, not to mention the many countries where it is taught as a second language
- Social context
- Body language: a computer can’t see your gesticulations which can add meaning or context to a particular word
When you type into a machine or send a text to a computer, the machine in question isn’t getting perfectly clear English. Instead, it’s probably getting phrases with shorthand terms, typos, and fuzzy intention.
When you add the complication of voice to this process, determining what a user is actually saying or asking becomes even murkier. For example, if you say, “I scream,” are you saying, “I scream,” or “ice cream?” How does the machine determine between the two?
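One common idea for settling the “I scream” versus “ice cream” question is to check which wording fits better with the words around it, based on text the machine has seen before. Here is a toy sketch that counts word pairs in a tiny made-up corpus; real speech systems learn from vastly larger collections and use more advanced models.

```python
from collections import Counter

# Tiny made-up corpus standing in for the large text collections
# that real systems learn from.
CORPUS = "i want ice cream . we ate ice cream . i scream when scared . i want cake ."

tokens = CORPUS.split()
# Count how often each pair of neighboring words appears in the corpus.
bigrams = Counter(zip(tokens, tokens[1:]))


def score(phrase: str) -> int:
    """Sum how often each adjacent word pair in the phrase was seen in the corpus."""
    words = phrase.lower().split()
    return sum(bigrams[pair] for pair in zip(words, words[1:]))


def pick_transcript(candidates: list[str]) -> str:
    """Choose the candidate wording that best matches the corpus."""
    return max(candidates, key=score)
```

Given the two candidate transcripts "I want I scream" and "I want ice cream", the sketch prefers the second because "want ice" and "ice cream" appear in its corpus while "want I" never does.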
This is why users sometimes have difficulty with their virtual personal assistants as the computer tries to recognize particular commands or instructions. Natural language generation (NLG) is a subset of AI that deals with creating realistic responses to text or voice input [26]. If you’re not speaking unambiguous, perfect English, it can be a recipe for humorous or frustrating results.
Despite all of these potential issues, natural language processing has made huge strides in recent years. For example, with the advent of deep learning techniques, NLP tasks and abilities have improved [27].
Computers now have very sophisticated techniques to understand what humans are saying. Using a huge database, AI can now match words and phrases to their likely meaning with more accuracy than ever before.
NewtonX predicts that AI will outperform humans at translating languages and editing high school essays within the next ten years [28]. Instead of relying on a human editor, you might be able to run an essay through highly sophisticated software that detects grammatical errors, typos, and misspellings with an incredibly high degree of accuracy.
The future of natural language processing
Natural language processing isn’t just a convenient, cool technology. It also has far-reaching implications in a range of industries, such as healthcare. One day, a visit to a doctor could be enhanced by NLP AI that dives deep into your health data to extract information to better assist your physician with diagnosis and treatment.
NLP is also expected to play an important role in the development of data science. There’s a huge demand for ways to parse through and analyze large amounts of data. With advanced techniques like sentiment analysis, where machines can determine positive, negative, or neutral opinions, companies will be better able to analyze customer preferences and attitudes [29].
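To show what sorting opinions into positive, negative, or neutral can look like at its very simplest, here is a word-counting sketch. The two word lists are invented for this example; real sentiment tools use trained models that cope with negation, sarcasm, and context far better than this.

```python
# Hypothetical opinion word lists, invented for illustration only.
POSITIVE = {"great", "love", "excellent", "fast", "happy"}
NEGATIVE = {"bad", "slow", "broken", "hate", "terrible"}


def sentiment(review: str) -> str:
    """Label a review by comparing counts of positive and negative words."""
    words = [w.strip(".,!?").lower() for w in review.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A review like "I love this laptop, it is fast!" comes out positive, "The screen arrived broken." comes out negative, and "It is a laptop." is neutral. A sarcastic review would fool this sketch completely, which echoes the earlier point about computers struggling with tone.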
NLP, a sign of the evolution of language and computers
Natural language processing has come a long way since the 50s when scientists were first testing out the implications of artificial intelligence and a machine’s ability to understand language. With its broad applications and convenient technology, NLP is proving to be a valuable addition to businesses, schools, and health organizations.
If you use predictive text or if you’ve ever had software catch an embarrassing typo before you hit send or print, you know just how useful this technology can be. And who knows? Maybe we’ll finally see a computer pass the Turing Test once and for all.
About the Author
Michelle Wilson is a contributing writer for HP® Tech Takes. Michelle is a content creation specialist writing for a variety of industries, including tech trends and media news.