What in the “Hello World” is Natural Language Processing (NLP)?
At Finlabs we are always keen on learning new technologies from a practical point of view. NLP is interesting from both a technological and a methodological perspective, and it is one of the most thrilling and rewarding, yet highly demanding, subfields of Artificial Intelligence.
Artificial Intelligence (AI) and its various technologies and methods have been hyped for quite some time in recent years. The basics of AI have been under development since the 1940s, and AI was founded as an academic discipline in 1956. Nowadays AI has returned to the public limelight thanks to processing power and storage that were not previously available. Advances in Deep Learning (DL) technologies have also produced impressive results in Medical Imaging and Computer Vision.
One of the most interesting AI subfields is Natural Language Processing (NLP), a multidisciplinary field dealing with the interaction between computers and humans using natural language. NLP solutions range from speech synthesizers, Interactive Voice Response (IVR) systems, and automatic spelling tools to chatbots and automatic document scanning and reporting tools. For example, Grammarly is a company utilizing AI and NLP techniques to automatically correct written language. Google Duplex, introduced in 2018, tries to conduct natural conversations to carry out real-world tasks over the phone. You have probably also heard of Shazam, which recognizes song titles from snippets of audio data; it is even possible to recognize musical genres. Moreover, Woebot is an innovative piece of software for conducting Cognitive Behavioural Therapy. NLP as a field is thus highly comprehensive and can provide valuable solutions to many industries and application areas. Automating repetitive, tiring manual work that demands precision is invaluable and lifts a great burden.
NLP applications are already in full effect in some areas. For example, some companies have introduced chatbots and automated voice answering systems to provide better service for their customers. These are mainly used to answer the Frequently Asked Questions (FAQs) that make up the majority of customer inquiries. If a question cannot be answered automatically, the customer is directed to a human operator.
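To make the FAQ-chatbot pattern concrete, here is a minimal pure-Python sketch: match an incoming question against known FAQs by word overlap, and fall back to a human operator when no FAQ is close enough. The FAQ entries, the similarity measure, and the threshold are all illustrative assumptions, not how any particular product works.

```python
# Toy FAQ chatbot: answer if a known FAQ is similar enough,
# otherwise escalate to a human operator.

FAQS = {
    "What are your opening hours?": "We are open Mon-Fri, 9:00-17:00.",
    "How do I reset my password?": "Use the 'Forgot password' link on the login page.",
    "Where is my invoice?": "Invoices are emailed at the start of each month.",
}

def tokenize(text: str) -> set:
    """Lowercase a sentence and split it into a set of words."""
    return set(text.lower().replace("?", "").split())

def answer(question: str, threshold: float = 0.5) -> str:
    """Return the best-matching FAQ answer, or escalate to a human."""
    q_words = tokenize(question)
    best_faq, best_score = None, 0.0
    for faq in FAQS:
        f_words = tokenize(faq)
        # Jaccard similarity: shared words / all distinct words.
        score = len(q_words & f_words) / len(q_words | f_words)
        if score > best_score:
            best_faq, best_score = faq, score
    if best_faq is not None and best_score >= threshold:
        return FAQS[best_faq]
    return "Let me connect you to a human operator."

print(answer("How can I reset my password?"))
print(answer("Do you sell rubber ducks?"))
```

Real chatbots use far richer matching (word embeddings, intent classifiers), but the overall shape, score candidates and fall back to a human below a confidence threshold, is the same.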
Imagine an NLP solution providing a detailed written report of a doctor’s dictation. Or a solution that handles millions of documents and provides a brief synopsis of each of them? As I am writing this blog post, I am receiving many handy recommendations and corrections to it instantly. That is the power of NLP.
Alright. You have delivered the fancy hype and some nuts and bolts of NLP. But what about the essential technology that powers NLP?
In NLP, the data is normally audio or text. Sometimes it may involve pictures or video that need to be transformed into text. The technology that powers an NLP solution depends heavily on the application area and how the solution is used. For example, if an NLP solution should work in a mobile application, it requires the basic components of a traditional mobile application, with frontend and backend logic. If high processing power is needed, the NLP components can reside in a cloud environment such as AWS, MS Azure, or Google Cloud. Some of these platforms even include ready-made components for NLP; in fact, it is quite straightforward to build a prototype NLP solution using the cloud providers' tools. Amazon Transcribe is a ready-made tool for adding speech-to-text capability to applications, and Amazon Transcribe Medical can transform medical speech into text. MS Azure offers similar tools under Cognitive Services.
On the other hand, if the NLP solution is deployed in a resource-constrained environment, such as a mine or a forest with only rudimentary network connectivity, the NLP logic should reside on the device that reads the data, such as a cell phone or some other audio or Internet of Things (IoT) device. In such a case, where Machine Learning (ML) or DL is required, TensorFlow Lite is a promising solution: it can be used to deploy machine learning models on mobile and IoT devices.
Python is the de facto tool for NLP. It is widely used as a general programming language and has numerous NLP libraries. The most popular NLP library for Python is the Natural Language Toolkit (NLTK), which describes itself as the leading platform for building Python programs to work with human language data. Outside the Python ecosystem, Apache OpenNLP is a Java-based machine learning toolkit for NLP with support for the most common NLP tasks.
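As a flavour of the kind of task these libraries streamline, here is a minimal pure-Python sketch of two basic NLP steps, tokenization and word-frequency counting. NLTK provides far more robust, language-aware versions of both (for example `word_tokenize` and `FreqDist`); the regular expression below is a deliberately simple stand-in.

```python
# Two basic NLP steps without any external library:
# tokenization and word-frequency counting.
import re
from collections import Counter

text = ("Natural Language Processing deals with the interaction of "
        "computers and humans using a natural language.")

# Tokenize: extract lowercase word tokens with a simple regular expression
# (the character class includes Finnish umlaut letters).
tokens = re.findall(r"[a-zäö]+", text.lower())

# Count how often each token occurs.
freq = Counter(tokens)

print(tokens[:5])           # first few tokens
print(freq.most_common(2))  # most frequent words
```

Even this toy version shows why dedicated libraries matter: a regex tokenizer stumbles on contractions, hyphens, and punctuation that NLTK's tokenizers handle for you.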
However, other programming languages besides Python can be used for NLP, two other popular choices being Java and Julia. Java has been around for many years, while Julia is slowly gaining momentum and popularity among data scientists and NLP practitioners, although it is still in its infancy. The choice of programming language for NLP depends on numerous factors: the processing power requirements, the NLP tools needed, the community support for the language, and how the solution will be deployed and run in a production environment.
Another league of its own is using NLP for the Finnish language. Yes, our very own unique language with its double consonants, umlaut letters, and peculiar sentence structures. At the moment there are not many NLP libraries aimed specifically at Finnish, so this is an area with a lot of future potential. The blog post here details the steps required to build a Finnish Part of Speech tagger.
But what about audio data?
Audio NLP deals with recognizing well-known patterns in speech. The first step is always extracting the necessary features from the audio signal and then analyzing them; which features to analyze again depends on the use case. Writing a synopsis of a spoken narrative may involve first transforming the audio into text and then summarizing the text. In terms of technologies, neural networks have proven effective here. Popular Python libraries include Keras, scikit-learn, and TensorFlow, while pandas, NumPy, and matplotlib can be used for data extraction, analysis, and visualization.
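To illustrate that first feature-extraction step, here is a small NumPy sketch that finds the dominant frequency of a signal. A synthetic 440 Hz sine wave stands in for recorded speech; real speech pipelines extract much richer features, such as mel spectrograms or MFCCs, but the idea of moving from raw samples to the frequency domain is the same.

```python
# Extract a simple frequency feature from a raw audio signal
# using the Fourier transform.
import numpy as np

sample_rate = 16_000          # samples per second, typical for speech
duration = 1.0                # seconds
t = np.linspace(0.0, duration, int(sample_rate * duration), endpoint=False)

# Synthetic "audio": a pure 440 Hz tone (concert pitch A).
signal = np.sin(2 * np.pi * 440.0 * t)

# Magnitude spectrum via the FFT for real-valued input.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

# The dominant frequency is where the spectrum peaks.
dominant = freqs[np.argmax(spectrum)]
print(f"Dominant frequency: {dominant:.0f} Hz")
```

In a real system this spectrum (computed over short, overlapping windows) would become the input features for a neural network built with Keras or TensorFlow.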
An automatic speech recognition (ASR) system takes into account the phonemes of a language.
For example, Finnish has 21 phonemes, consisting of 13 consonants and 8 vowels, although the exact number may depend on the spoken dialect. After gathering the phonemes from a sound wave, the ASR software uses statistical probability analysis to deduce whole words and, from there, form complete sentences. Check out this awesome infographic for more information. Using phonemes to form speech is also common in concatenation-based speech synthesizers.
So what’s the conclusion?
I have delivered a no-frills introduction to NLP and its various applications and technologies. NLP has a lot of potential to revolutionize industries and application areas. However, NLP practitioners are a scarce resource, and more training and development with practical, real-world applications is needed.
As a statistical and linguistic data nerd, I am looking forward to exciting things to come. I will be digging more deeply into NLP methods in the upcoming posts, so stay tuned.
Olli Mämmelä is a Senior AI Engineer and our master of data science and machine learning at Finlabs.
He is an integral part of our computer vision team, where his passion for learning and development takes our products and services to another level. His specialties include algorithm design and development, machine learning, data analysis, and programming.
Olli has a Master’s Degree in Information Engineering and a Ph.D. in Telecommunications Engineering. He has more than 10 years of experience in both international and national research projects at the University of Oulu and VTT.