NVIDIA Enables Era of Interactive Conversational AI with New Inference Software

By Tiera Oliver

Associate Editor

Embedded Computing Design

January 02, 2020


NVIDIA TensorRT 7's Compiler Delivers Real-Time Inference for Smarter Human-to-AI Interactions.

NVIDIA, a technology company that designs graphics processing units for the gaming and professional markets, and system-on-a-chip units for the mobile computing and automotive markets, introduced inference software that developers can use to deliver conversational AI applications with low inference latency and interactive engagement.

NVIDIA TensorRT 7, according to the company, opens the door to smarter human-to-AI interactions, enabling real-time engagement with applications such as voice agents, chatbots, and recommendation engines.

According to Juniper Research, an estimated 3.25 billion digital voice assistants are in use in devices around the world. By 2023, that number is expected to reach 8 billion, more than the world's total population.

TensorRT 7 features a new deep learning compiler designed to optimize and accelerate the recurrent and transformer-based neural networks needed for AI speech applications. According to the company, this speeds up the components of conversational AI by more than 10x compared with running them on CPUs, driving latency below the 300-millisecond threshold considered necessary for real-time interactions.

Some companies are already taking advantage of NVIDIA’s conversational AI acceleration capabilities. Among these is Sogou, which provides search services to WeChat, a frequently used application on mobile phones.

Rising Importance of Recurrent Neural Networks
With TensorRT's new deep learning compiler, developers everywhere can now automatically optimize these networks, such as bespoke automatic speech recognition networks, as well as WaveRNN and Tacotron 2 for text-to-speech, to deliver high performance and low latency.

The new compiler also optimizes transformer-based models like BERT for natural language processing.

Accelerating Inference from Edge to Cloud
According to NVIDIA, TensorRT 7 can optimize, validate, and deploy a trained neural network for inference on hyperscale data center, embedded, or automotive GPU platforms.

NVIDIA’s inference platform, which includes TensorRT, as well as several NVIDIA CUDA-X AI libraries and NVIDIA GPUs, delivers low-latency, high-throughput inference for applications beyond conversational AI, including image classification, fraud detection, segmentation, object detection and recommendation engines. Its capabilities are used by some of the world’s leading enterprise and consumer technology companies, including Alibaba, American Express, Baidu, PayPal, Pinterest, Snap, Tencent and Twitter.

TensorRT 7 will be available in the coming days from the TensorRT webpage for development and deployment, free of charge to members of the NVIDIA Developer program. The latest versions of plug-ins, parsers, and samples are also available as open source from the TensorRT GitHub repository.

For more information, please visit: https://www.nvidia.com/en-us/#source=pr

Tiera Oliver, Associate Editor for Embedded Computing Design, is responsible for web content edits, product news, and constructing stories. She also assists with newsletter updates as well as contributing and editing content for ECD podcasts and the ECD YouTube channel. Before working at ECD, Tiera graduated from Northern Arizona University, where she received her B.S. in journalism and political science and worked as a news reporter for the university's student-led newspaper, The Lumberjack.