A Step by Step Guide to Voice Enabled Device Testing

By Keyur Shah

Senior Technical Lead


June 24, 2020


It has been said that devices cannot do everything that humans can do. However, the devices that we use in our daily lives have been evolving over the last couple of decades. We have seen significant changes in them in terms of functionality, connectivity, and size. One of the biggest challenges, however, has been the size of the device, as more effort is put into achieving a smaller form factor.

A few years ago, a new challenge emerged: devices could not communicate the way humans do. This led to standalone devices being transformed into connected devices with added voice enabled operations.

How are Voice Enabled Devices Helping Humans?

Initially, human touch was required to perform any action on a device. Now, with voice enabled devices and IoT technology, humans can operate a device by giving it a voice command. These devices convert the human voice into a device action, send a command to another device over the internet, and get the desired action performed. They can interact not only with humans, but also with other devices over the internet.

The Major Challenge of Voice Enabled Device Testing

Usage of voice enabled devices around the world is increasing rapidly. Users are spread across nearly 200 countries and speak thousands of languages, each with different accents, genders, and voice modulations based on age group, which makes it challenging to verify voice enabled devices. It is almost impossible to test these devices manually against so many combinations and permutations in a short span of time. So, let us see how we can automate the testing of voice enabled devices.

Automating Voice Enabled Device Testing

To avoid manual testing effort, we need to design an automation solution that can test these devices using different languages. The easiest option is to work with frameworks that help develop automation scripts for such voice integrated devices.

As of now, there are no open-source frameworks available in the market that provide all of the features required to test integration with voice enabled devices. The challenges here are how to give a command to the device in different languages, how to read the response from the device, and how to test that response against the expected output.

  • To give a command to a device without manual effort, first define the command in text format, then convert the text to audio.
  • Play the audio so that the voice enabled device can listen to it and process it.
  • Wait for a response from the device and record it in an audio file. As the last step, convert this audio to text and match it against the expected result.

Each device testing procedure will have custom requirements; hence, the framework has to be modular. We therefore need a modular and scalable framework in which each step of this solution can be implemented by open-source or paid libraries available in the market.

We have designed the following four modules in the framework:

  • Multi Language Text Module: To convert text from one language to another
  • Text-Audio Module: To convert text to mp3
  • Audio-Text Module: To convert wav to text
  • Audio Module:
    • To play an mp3 file using an audio output device
    • To read audio data using a mic
    • To save audio data to a wav file
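As a rough illustration of how the four modules could fit together, the sketch below treats each module as a plain callable and chains them into one test step. The names `VoiceTestModules` and `run_voice_command`, along with every signature shown, are hypothetical and are not taken from the framework itself; real implementations would wrap whichever translation, text-to-speech, audio, and speech recognition libraries are chosen.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTestModules:
    """Hypothetical bundle of the four modules, each exposed as a callable."""
    translate: Callable[[str, str, str], str]  # (text, source_lang, target_lang) -> text
    text_to_audio: Callable[[str, str], str]   # (text, lang) -> path of generated mp3
    play_audio: Callable[[str], None]          # (mp3 path) -> None
    record_audio: Callable[[float], str]       # (duration in seconds) -> path of wav
    audio_to_text: Callable[[str, str], str]   # (wav path, lang) -> recognized text


def run_voice_command(modules, command_en, device_lang, listen_seconds=5.0):
    """Speak one command to the device and return its reply as English text."""
    # Multi Language Text module: English command -> device language
    command = modules.translate(command_en, "en", device_lang)
    # Text-Audio module: text -> mp3 file
    mp3_path = modules.text_to_audio(command, device_lang)
    # Audio module: play the command, then record the device's response
    modules.play_audio(mp3_path)
    wav_path = modules.record_audio(listen_seconds)
    # Audio-Text module: recorded wav -> text in the device language
    reply = modules.audio_to_text(wav_path, device_lang)
    # Multi Language Text module: reply -> English for verification
    return modules.translate(reply, device_lang, "en")
```

Keeping each module behind a simple callable means a paid service can replace an open-source library, or a fake can stand in during framework tests, without touching the test flow.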

Detailed Solution

1. Prepare the device command in the English language

a. Use the Multi Language Text module to convert the device command into a language that the device understands. The module uses translation services provided by Google, which can translate text from a source language into the desired target language.

2. Create an audio file for the translated text

a. Use the Text-Audio module to convert the text to audio. The generated audio can be played on the audio output device. This module uses the Google text-to-speech service in the backend.

3. Play audio

a. Use the Audio module to play the mp3 file on an audio output device.

b. This step requires the audio output device and the voice enabled device to be in proximity so that when the audio is played, the device can capture it and process the command.

4. Record audio

a. This step is required to capture the response from the voice enabled device.

b. Use the Audio module to capture the recording data from the mic. You need to pass a duration parameter to specify how long to record, and the module returns the audio sample data.

c. Once the sample data is available, it needs to be saved as a wav (audio) file. To achieve this, the save_audio_to_file method can be used. This method takes the sample audio data and writes it to a wav file, which can later be played using an audio device or converted to text.

5. Convert captured audio to text

a. Use the Audio-Text module to convert the wav file to text content. This is achieved using a speech recognizer. You should specify the input wav file and the language of the audio content.

b. To convert audio to text, third party libraries provided by various vendors can be used.

6. Translate the above text to the English language and verify it against the expected result in English
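Step 4c could look roughly like the sketch below, which uses only Python's standard `wave` module. The article names the `save_audio_to_file` method but does not show its signature, so the parameters here are assumptions, as are the 16-bit PCM sample format and the 16 kHz mono defaults. Capturing the samples from the mic is not shown.

```python
import struct
import wave


def save_audio_to_file(samples, path, sample_rate=16000, channels=1):
    """Write signed 16-bit PCM samples to a wav file and return its path.

    `samples` is assumed to be an iterable of integers; values outside the
    16-bit range are clamped rather than allowed to overflow.
    """
    clamped = [max(-32768, min(32767, int(s))) for s in samples]
    # Pack as little-endian 16-bit integers, the layout wav files expect
    frames = struct.pack(f"<{len(clamped)}h", *clamped)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
    return path
```

The sample rate written here should match the rate used while recording, otherwise the saved response will play back, and be transcribed, at the wrong speed.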

Using these four modules, one can implement voice automation for voice enabled devices.
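The final verification step usually cannot be an exact string comparison, because speech recognition output varies in case, punctuation, and occasionally in individual words. The sketch below is one possible approach, not the framework's actual check: it normalizes both sides and accepts a reply that is close enough to any expected phrase. The function names and the 0.8 similarity threshold are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def matches_expected(actual, expected_phrases, threshold=0.8):
    """Return True if `actual` is similar enough to any expected phrase."""
    actual_norm = normalize(actual)
    for phrase in expected_phrases:
        ratio = SequenceMatcher(None, actual_norm, normalize(phrase)).ratio()
        if ratio >= threshold:
            return True
    return False


if __name__ == "__main__":
    expected = ["Ok. Turning on the light", "light is turned on"]
    print(matches_expected("OK, turning on the light!", expected))
```

A lower threshold tolerates more recognition errors but raises the risk of accepting an error reply from the device, so the value is worth tuning per product.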

Real Life Scenario for E2E Testing of Home Automation Product

A home automation system consists of various devices that can be operated through the web using REST services. Security cameras, lights, thermostats, sensors, and doorbells are a few examples of home automation devices. For example, the end user, who is the homeowner, can turn the light on or off remotely using internet portals. Some systems provide integration with third party partners like Alexa, Google, etc.

Companies provide devices that can listen to the human voice and perform the action requested by the user. So, considering a light as the home automation product and Alexa as a third-party partner of the home automation system provider, we want to test whether the light can be turned on and off through Alexa.

To automate the E2E scenario, we need to perform the following steps using the automation framework discussed above.

1. Prepare the Alexa command to turn on the light in the English language.

a. “Alexa, turn on light”

2. Convert the above command to an mp3 file.

3. Play the mp3 file near the Alexa device using the speaker attached to the automation machine.

4. Record the response from Alexa in a wav file.

5. Convert the audio file to text, which could be “Ok. Turning on the light” or “light is turned on”.

6. Verify the converted text against the expected result set.

7. Going one step further, verify the actual IoT light status using one of the following:

a. A REST API can be used to fetch the light status from the home security system.

b. Web automation of the web security portal can be used to verify the light status.

c. If the light status is stored in a cloud DB, the data can be fetched from the DB to verify the status.
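Option 7a could be sketched as below. The endpoint URL, the JSON response with a `state` field, and the function names are all hypothetical, since the article does not describe the home automation system's REST API. Polling is also an assumption: the device may take a moment to act on the voice command, so a single immediate check could fail spuriously.

```python
import json
import time
import urllib.request


def fetch_light_status(url, timeout=5.0):
    """GET a JSON document from a hypothetical status endpoint and
    return its 'state' field, for example "on" or "off"."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload["state"]


def wait_for_status(fetch, expected, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch() until it returns `expected` or the attempts run out."""
    for attempt in range(attempts):
        if fetch() == expected:
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # give the device time to apply the command
    return False
```

A test would then call something like `wait_for_status(lambda: fetch_light_status(status_url), "on")`, where `status_url` is the placeholder for the product's real endpoint. Passing the fetch function in as a parameter also lets the same loop drive the portal or DB checks in options 7b and 7c.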



Using the above steps, one can not only perform system integration or end-to-end testing with a single voice enabled device, but can also test the system by combining multiple clients or devices. Users can perform an action on one of the devices or products using Alexa and verify its status using Google or the portal, or vice versa. For example, the user asks Google to turn on the light and then fetches the light status using Alexa or the customer portal.

eInfochips is a preferred partner for product companies requiring comprehensive test coverage from devices to applications. eInfochips offers extensive cost and effort savings through test automation, SDET (Software development engineer in Test), Shift-left testing, and DevOps.

About the Authors

Dhaval Patel

Dhaval Patel has more than 15 years of IT experience in software design, development, and automation testing to create products for IoT enabled platforms and the home security and automation domains. He is also involved in the design and development of a unified Test Automation Framework, which enables automation on IoT devices, voice assistant enabled devices, REST APIs, web applications, and mobile applications.

Keyur Shah

Keyur Shah is a Sr. Tech Lead with more than 11 years of experience at the eInfochips Product Engineering Service business unit. He has worked on the full product/application lifecycle, from product requirements, design, development, manual testing, and automation to product release into the market, using technologies like Java, QT C++, JavaScript, Python, Git, Jenkins, Jira, etc. His experience spans different verticals, including home security and automation, collaborative, and semiconductor.

Experienced Senior Technical Lead with a demonstrated history of working in the information technology and services industry. Strong engineering professional skilled in Core Java, J2EE and advanced frameworks, Agile Methodologies, Python, and Android.
