Incorrect Pronunciation Detection in eLearning using Deep Learning

February 13, 2020


A system with a variety of test paragraphs is specially designed to test end users' pronunciation skills and detect incorrectly pronounced words.

Pronouncing words correctly is one of the most difficult language skills to acquire, and researchers around the globe are working on detecting pronunciation errors with machine learning and deep learning techniques. The aim of incorrect pronunciation detection in eLearning is to identify pronunciation errors or deficiencies with high precision and to provide instructive feedback that helps learners improve their pronunciation.

Importance of Correct Pronunciation

Communication is a very important aspect of our lives, and communicating effectively is crucial in negotiations if you want to achieve your goals and convey your thoughts and messages. Correct pronunciation is one of the primary attributes of effective communication.

Teaching students to pronounce words correctly is a daunting task. Teachers often don’t have proper guidelines for teaching pronunciation. What is needed is a well-established method for deciding what to teach and how to teach it so that words are pronounced correctly. We will look at some of the important issues in pronunciation instruction and see how technology can help improve the teaching and learning of correct pronunciation.

When we speak, we push air from our lungs up through the throat and vocal cords, through the mouth, past the tongue, and out between the teeth and lips. To pronounce different words, we use our mouth muscles, tongue, and lips to control the flow of air. We need to control the shape of the mouth and direct air through it correctly to pronounce words clearly, so that people can understand what we are trying to convey. Let’s see how we can improve our pronunciation by eliminating common errors using machine learning.

Automatic Pronunciation Detection using Machine Learning

A system with a variety of test paragraphs is specially designed to test end users' pronunciation skills and detect incorrectly pronounced words. It lists these words, with their phonetic transcription, to help users improve their pronunciation, and it assigns more relevant test paragraphs as their next assignments. It also analyses the errors to find patterns by age group, local region, gender, and so on, so that the data can be used in the future to draft test paragraphs.
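
As a rough illustration of the detection step, the sketch below compares the expected phonetic transcription of each word with the phonemes a speech recognizer reports and flags words that differ too much. It is only a minimal sketch under stated assumptions: the `REFERENCE` lexicon, the ARPAbet-style phoneme symbols, the `observed` recognizer output, and the 0.8 threshold are all hypothetical, and a real system would obtain the observed phonemes from an acoustic model rather than hard-coded lists.

```python
from difflib import SequenceMatcher

# Hypothetical reference lexicon: word -> expected phonemes (ARPAbet-style symbols).
REFERENCE = {
    "think": ["TH", "IH", "NG", "K"],
    "very": ["V", "EH", "R", "IY"],
    "water": ["W", "AO", "T", "ER"],
}


def phoneme_similarity(expected, observed):
    """Return a similarity ratio between 0 and 1 for two phoneme sequences."""
    return SequenceMatcher(None, expected, observed).ratio()


def find_mispronounced(observed_by_word, threshold=0.8):
    """Return (word, score) pairs whose similarity falls below the threshold."""
    flagged = []
    for word, observed in observed_by_word.items():
        expected = REFERENCE.get(word)
        if expected is None:
            continue  # no reference transcription available for this word
        score = phoneme_similarity(expected, observed)
        if score < threshold:
            flagged.append((word, round(score, 2)))
    return flagged


# Assumed recognizer output for one reading attempt ("think" read as "tink").
observed = {
    "think": ["T", "IH", "NG", "K"],
    "very": ["V", "EH", "R", "IY"],
    "water": ["W", "AO", "T", "ER"],
}

print(find_mispronounced(observed))  # [('think', 0.75)]
```

The threshold controls how strict the check is; a lower value tolerates more deviation before a word is listed for practice.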

High-Level Basic Block Diagram

Some of the possible use cases:

  1. In schools, languages and chapters are loaded into the system. Each chapter can have one or more test paragraphs designed to cover words from the syllabus of the respective curriculum or class. Students read each paragraph aloud while the system monitors their reading, detects incorrect pronunciation, and shares a result summary with the teacher, student, and parents so that they can work on the mispronounced words. The system also captures data and analyses it in depth so that it can help draft or design the language syllabus and its content in the future.
  2. In BPOs/call centres, the system monitors the conversations and calls between customers and agents and detects incorrect pronunciation to help the agents improve.
  3. In music academies, the desired instrument is loaded for each end user, along with various themes or tunes to play. The system monitors the performance and detects incorrect notes or tunes to help the end user improve.
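
The data capture mentioned in the use cases above could look something like the following sketch, which records the words flagged in each attempt per learner group and then picks the most frequently mispronounced words as candidates for the next test paragraph. The function names, the group labels, and the sample words are assumptions made for illustration only; they do not describe an actual implementation.

```python
from collections import Counter, defaultdict


def record_attempt(log, learner_group, mispronounced_words):
    """Add the words flagged in one reading attempt to the group's error log."""
    log[learner_group].update(mispronounced_words)


def next_practice_words(log, learner_group, top_n=3):
    """Return the words a group mispronounces most often, most frequent first."""
    return [word for word, _ in log[learner_group].most_common(top_n)]


# Hypothetical error log: learner group -> Counter of mispronounced words.
error_log = defaultdict(Counter)
record_attempt(error_log, "age 8-10", ["think", "water"])
record_attempt(error_log, "age 8-10", ["think", "water"])
record_attempt(error_log, "age 8-10", ["think", "three"])
record_attempt(error_log, "age 11-13", ["schedule"])

print(next_practice_words(error_log, "age 8-10", top_n=2))  # ['think', 'water']
```

Grouping by age band is just one choice here; the same log could be keyed by region or any other attribute the analysis considers.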


As the world goes digital, eLearning is becoming more important, and automatic pronunciation detection is becoming necessary to improve students' communication skills. At a high level, the idea is to monitor the end user's pronunciation, analyse it, and feed the analysis back so that the user can work on improving, while also recording the data for deep learning. This can be achieved with an add-on or a separate system that helps students and other users improve their pronunciation skills; it is not limited to a single language and can extend to music. Artificial Intelligence & Machine Learning offerings help organizations build highly customized solutions running on advanced machine learning algorithms.

We also help companies integrate these algorithms with image and video analytics, as well as with emerging technologies such as augmented reality and virtual reality, to deliver the utmost customer satisfaction and gain a competitive edge.


About the authors

Rajesh Khamitkar is a Senior Tech Lead at eInfochips. He has more than 11 years of industry experience, including more than 7 years working on safety-critical avionics systems. His areas of interest include communication and flight display systems.

Rahul Badnakhe is a Senior Content Marketer at eInfochips with around five years of experience in developing customer-centric collateral on digital technologies such as IoT, Machine Learning, Artificial Intelligence, Cloud Computing, and Big Data Analytics, among others.