IoT Intelligence Moves Toward the Edge

By Vineet Ganju

Vice President Marketing

Synaptics Incorporated

September 16, 2019



Moving IoT intelligence toward the edge: how enhanced options for local processing lessen dependence on cloud connections.

While the Internet of Things (IoT) by definition implies connectivity, consumer IoT is seeing increasing demand for more local, or edge-based, processing in devices to complement cloud-based functions. Certainly, access to a cloud server is necessary for real-time information such as news, stock quotes and other dynamic data. But for cost, performance, privacy and security reasons, moving more processing to the edge has become a growing priority.

For consumers, much of what people want easy access to – personal calendars, memos, even email, for example – can be cached locally, making access to useful information both faster and more secure. Localized information like weather can also be cached on the device, giving a speedier response that does not depend on internet latency. Another motivation is bandwidth: do we really want to upgrade our data plan when we install a new video doorbell, just to support continuous livestreaming to the cloud? Finally, and perhaps most important for user adoption, there is the human experience: smart devices need to behave and react in more natural ways, including understanding intent rather than just commands, recognizing personal preferences, and responding in near real time. Thanks to developments in neural networks and AI, such capabilities can now be implemented on the edge. This is ushering in a shift from the centralized intelligence architecture (AI in the cloud) that has been the cornerstone of IoT devices in the past few years to an era of distributed intelligence (AI on the edge and in the cloud).
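The local caching mentioned above (weather data, for instance) can be illustrated with a minimal sketch. The class name `LocalCache`, the `fetch` callback standing in for a cloud request, and the time-to-live value are all hypothetical choices made for illustration; the article does not describe a specific implementation.

```python
import time
from typing import Any, Callable, Dict, Tuple


class LocalCache:
    """Keep recently fetched values on the device for a limited time."""

    def __init__(self, ttl_seconds: float, fetch: Callable[[str], Any],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._fetch = fetch      # hypothetical stand-in for a cloud request
        self._clock = clock      # injectable so the cache can be tested without waiting
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # fresh enough: answer locally, no network round trip
        value = self._fetch(key)  # missing or stale: go to the cloud once
        self._store[key] = (now, value)
        return value
```

With a time-to-live of, say, ten minutes, repeated requests for the same forecast within that window are answered from the device, and only the first request (or one made after the entry expires) pays the cost of a cloud call.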

From the supplier perspective, on top of the raw processing power needed to perform these types of functions, smart devices significantly increase the cost and bandwidth required for cloud operations. Google is a prime example. When analyzing the impact of voice recognition in Google search on Android phones, Google engineers concluded that if each user of Google voice search used it for just three minutes a day, the company would need twice as many data centers. More efficient cloud processing solutions, such as neural network accelerators and machine learning, help address this in the data center. Longer term, however, lessening the dependence on the cloud by shifting the burden to the edge (and to end users) has become a priority strategy across the consumer IoT ecosystem.

HMI Moves to the Edge

Human machine interface (HMI) is a critical element in enhancing the user experience in this new era of connected devices. Machines that can understand and predictively respond to what we do, say, or touch without a constant dependence on the cloud promise to revolutionize how IoT can deliver unprecedented levels of privacy, convenience and productivity in our lives. Smart devices enabled with more responsive and sophisticated voice or visual interfaces can better control lighting or temperature, or physical access, for example, and allow more contextual awareness, such as user personalization, parental controls, motion detection and security monitoring.  

Thanks to advancements in edge-based technology from companies like Synaptics, the processing burden for many of these functions can be handled securely on a local device. This has direct benefits both in terms of the user experience and cost, as well as in terms of data security. From a developer standpoint, several metrics are critical to understanding the tradeoffs between processing at the edge and in the cloud:

· Latency

· Security

· Privacy

· Power

· Reliability

· Cost

· Content Rights

Today’s edge-processing platforms can measure up favorably on all of those criteria, and both device makers and consumers agree that whatever can be done locally should be enabled ‘on device’ in today’s smart home, commercial and industrial environments. Local caching, sensor fusion, and secure inferencing from machine learning algorithms are all enabling greater use of local processing, improving the overall customer experience in many ways.

Neural Networks are Key 

Access to the internet will always be necessary in many IoT scenarios, for functions like streaming movies and music, real-time updates, or requests for random information. But this new era of hybrid cloud/edge IoT will be facilitated by more ‘local intelligence’ that lessens the need for (and the cost and risk of) always sending the user input (which could be voice or vision) to the cloud and waiting for a response. AI-driven neural networks, processed at the edge, hold the key to addressing challenges in performance and robustness, as well as privacy concerns.

Until now, smart edge processing has been reserved for expensive devices like smartphones, as it requires a considerable amount of computation that has been out of reach for low-cost devices or appliances. New-generation SoCs offer secure neural network acceleration at price points targeting mainstream consumer devices.

Now, cost-effective AI-based solutions can improve performance and create a more human-like experience. Truly smart devices will take advantage of multi-sensor, always-listening features to learn behavioral patterns and associate them with device interactions. This will enable a device to make use of implicit communication instead of relying only on the explicit communication that today’s devices depend on. Most notably, local intelligence will allow devices to respond with near human-like speed: without a round trip to the cloud, the response feels almost immediate to the user.

IoT Human Interface Becomes Multimodal 

Voice is the fastest-growing user interface and continues to be redefined by ongoing technological innovation. Performance and feature breakthroughs in far-field voice interfaces bring more natural convenience and usefulness to voice-enabled devices. Another exciting area is integrated computer vision on the device, which offers face, emotion and content identification. Computer vision can now be enabled locally in a cost-effective manner, and the next step is for the IoT interface to become multimodal: voice, gestures, gaze and touch will all play a role, further personalized through secure biometric identification.

Biometric identification can be achieved on the device without registering a profile: the device simply differentiates one voice from another and, through machine learning, can deliver specialized content or personalization. The range of additional modalities, combined with AI, will then enable the HMI to better learn and adapt to individual users’ behaviors and become context aware. To enhance the user experience further, on top of the voice and vision interfaces, the device should also be able to use machine learning to analyze, locally, the content the user is watching. This will allow the device to personalize its interface to better match the user’s preferences.

Secure Inferencing at the Edge Addresses Consumer Concerns 

Secure inferencing at the edge addresses many of the challenges to more widespread adoption of IoT devices and opens up new possibilities for the human interface, whether through video, voice, touch or visual input. At the same time, edge processing increases privacy, security, and users’ control over their own data, because personal data is processed and consumed on the device and only anonymized information is sent to the cloud. For a device to become contextually aware and enable more seamless interaction, it must listen for more than just a trigger word.

For example, the device should be able to perform sophisticated levels of speech-to-text and natural language understanding, and to detect and match users’ faces against an on-device database, without needing cloud resources. Enabling such contextually aware features using a cloud-centric architecture introduces privacy and security concerns for consumers. New distributed architectures address these concerns with high-performance processing in the edge device that can enable features such as a 100,000+ word vocabulary for voice recognition without a cloud connection. This is important because it can be implemented in a very cost-effective way in devices at consumer-friendly price points.

The Privacy Issue

User privacy has become a major issue for connected devices. Using voice as an example, we can see where the privacy risks are: with a typical current-generation architecture, the audio signal goes either directly to the CPU or to a DSP, where some processing is done, and the actual audio signal is then transmitted to the cloud. Until now, all the AI processing has taken place in the cloud. In these implementations, all sensor information is sent to the cloud (much of it needlessly in terms of the intended function), and there are multiple points of vulnerability in the system where a remote attack can occur, whether by tapping into the sensor data or by gaining direct access to personal and sensitive information. Even with software-level encryption, a hacker would need only an understanding of the software to circumvent it.

What is required is enterprise-grade encryption at the local (edge) hardware level. To implement a truly secure approach, an integrated solution that has the right sensor interfaces, a powerful application processor with a Trusted Execution Environment, a firewalled security processor with hardware root-of-trust, and a neural network accelerator is required. That is the approach underlying the SyNAP framework developed by Synaptics for its new line of Smart Edge AI SoC solutions. For the sensor interface, a minimum requirement is a microphone interface, while an RGB sensor interface is beneficial to enable additional contextual awareness. The security processor and the neural network engine are the two key ingredients to enable a voice/vision UI that is safe, reliable, and has the robustness to deliver a sophisticated user experience. 

The goal of the security processor is to firewall sensor information and user data from malicious attacks. Even if an attack succeeds in letting a hacker run software on the device, that software must not be able to access the sensor interface or the user data, which are stored securely on the device in a Trusted Execution Environment, or TEE. With secure inferencing, the sensor information and other user data can be processed by the application processors and neural network accelerators while remaining firewalled from the software frameworks, and from any malicious code, running on the application processor.

By having the capability to efficiently run neural network processing on the edge SoC, a lot of the AI processing currently done in the cloud can now be done on the local consumer edge device. This reduces the need to transmit all the sensor information to the cloud while at the same time enabling a secure ‘always listening’ HMI that is context aware. 

Bringing it all Together

The ability to firewall all sensor information within a device, combined with the capability to run complex machine learning algorithms, opens a range of new applications that were not practical before. With the sensor data secure, device makers and consumers alike can be more confident that both audio and image sensors can be in always-on mode while providing an acceptable guarantee of privacy. The device can then use its machine learning capabilities to become even more contextually aware using audio and/or video data.

As an example, the device can be running voice biometrics, a large vocabulary, and natural language understanding in always-listening mode. This allows the device to constantly break down who is talking around it and determine from the content of the speech if its participation is needed. Using this information, the device could determine if it is being addressed without the use of trigger words like ‘Hey Alexa,’ making the interaction more seamless.  The device could even in some cases decide to initiate a conversation. The device can then over time build up a knowledge of preferences that are linked to contextual events, allowing the device to determine users’ intent with minimal interaction. None of this requires a connection to the cloud. 

Given the lack of universal standards for device-to-device IoT communication, controlling one device from another still requires a cloud-to-cloud connection. Because automatic speech recognition (ASR) and natural language understanding (NLU) have been happening in the cloud, there has been no real impetus for true LAN-based device control triggered by voice. That will change as neural network accelerators are enabled in IoT edge devices: once ASR and NLU take place on the device, there is no real reason to go to the cloud other than the lack of communication standards. Control within the LAN offers the lowest latency along with strong security and privacy benefits, so this will spur a new trend of devices in a LAN controlling each other directly.
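The local-first control described above can be sketched as a simple routing decision. This is an illustrative assumption rather than any vendor's design: the `Intent` type, the `route_command` function, the table of LAN handlers, and the `cloud_send` callback are all hypothetical names, and the intent is assumed to have already been produced by on-device speech recognition and language understanding.

```python
from typing import Callable, Dict, NamedTuple


class Intent(NamedTuple):
    """Result of on-device speech and language processing (hypothetical shape)."""
    device: str
    action: str


def route_command(intent: Intent,
                  lan_devices: Dict[str, Callable[[str], str]],
                  cloud_send: Callable[[Intent], str]) -> str:
    """Dispatch over the LAN when the target is reachable locally, else use the cloud."""
    handler = lan_devices.get(intent.device)
    if handler is not None:
        # Target device is on the local network: no cloud round trip needed.
        return "lan:" + handler(intent.action)
    # Unknown or remote target: fall back to the cloud-to-cloud path.
    return "cloud:" + cloud_send(intent)
```

In this sketch the cloud remains a fallback for devices that cannot be reached locally, which mirrors the hybrid edge/cloud model the article describes.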

Digital rights management is another challenge. When analyzing content (a video or movie the user is streaming from an OTT source), the content rights associated with this premium content must be considered. These rights prohibit any part of the video or audio track from being sent to a centralized server for analysis, so the analysis must be done on the device in a secure and trusted environment. New SoCs can enable machine learning-based analysis of such content while keeping the video protected in a secure and trusted execution environment.

Better Efficiency Through the Edge 

Edge-based processing is also a great way to reduce the amount of data that needs to be sent to the cloud. For example, security cameras with the ability to run object and event detection locally will save large amounts of internet bandwidth. A security camera recording in 1080p can transmit up to 4 Mbps. It’s not unusual for security cameras installed with cloud recording services to cause data usage to explode, so that users quickly exceed the limits of their data plans, at considerable cost. When the camera runs object and event detection locally, it can be configured to send video to the cloud only when something meaningful happens. This leads to significant savings in data transmission, and a direct cost savings for the consumer.
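As a rough back-of-the-envelope check, the sketch below compares continuous streaming at the 4 Mbps figure cited above with event-triggered uploads. The 30 minutes of meaningful activity per day and the 30-day month are illustrative assumptions, not figures from this article, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_upload_gb(bitrate_mbps: float, active_seconds_per_day: float,
                      days: int = 30) -> float:
    """Data uploaded per month in decimal gigabytes (1 GB = 8,000 megabits)."""
    megabits = bitrate_mbps * active_seconds_per_day * days
    return megabits / 8 / 1000


# Continuous livestreaming at the article's 4 Mbps figure.
continuous_gb = monthly_upload_gb(4.0, SECONDS_PER_DAY)

# Assumed: local detection uploads only 30 minutes of video per day.
event_gb = monthly_upload_gb(4.0, 30 * 60)

savings = 1 - event_gb / continuous_gb
print(f"continuous: {continuous_gb:.0f} GB/month")
print(f"event-triggered: {event_gb:.0f} GB/month")
print(f"reduction: {savings:.1%}")
```

Under these assumptions, continuous streaming comes to about 1,296 GB per month, against about 27 GB for event-triggered uploads, a reduction of roughly 98 percent. The actual savings depend entirely on how often something meaningful happens in front of the camera.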

There is a wealth of other applications that can be enabled on security cameras. For example, the camera could be configured not to send video to the cloud if the activity it sees stems only from family members. Notifications could be made more accurate, both by reducing annoying false triggers that produce meaningless notifications and by providing more meaningful descriptions. The camera could also use the detection of acoustic events to initiate cloud transmission and notifications.
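The camera behaviors listed above amount to an upload policy evaluated on the device. The sketch below is one possible reading, with every name (`ClipAnalysis`, `should_upload`, the `"glass_break"` label, the person identifiers) invented for illustration; it assumes on-device detection has already produced a list of recognized people and acoustic events for each clip.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ClipAnalysis:
    """Hypothetical output of on-device detection for one short clip."""
    person_ids: List[str] = field(default_factory=list)       # "unknown" for strangers
    acoustic_events: List[str] = field(default_factory=list)  # e.g. "glass_break"


def should_upload(clip: ClipAnalysis, family_ids: Set[str],
                  alert_sounds: Set[str]) -> bool:
    """Decide locally whether a clip is worth sending to the cloud."""
    # An acoustic event of interest always triggers transmission.
    if any(sound in alert_sounds for sound in clip.acoustic_events):
        return True
    # No people detected: treat as a false trigger and stay silent.
    if not clip.person_ids:
        return False
    # Upload only if at least one person is not a known family member.
    return any(person not in family_ids for person in clip.person_ids)
```

Because the decision is made before anything leaves the device, clips showing only family members are never transmitted, which serves both the bandwidth and the privacy goals discussed in this article.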

Similar to the Google voice example cited earlier, not having to store all the data that cameras generate reduces the complexity and size of data centers, which then will reduce the operating expenses for the device OEMs. This is one key reason behind the big push to enable machine learning and AI on the Edge by these device OEMs. 

Edge Processing with AI will Expand the use of Consumer IoT 

IoT device makers know the benefits of edge-based processing, but until now, challenges in cost, performance and security have made it impractical to implement in consumer products and systems. The shift toward more use of edge processing in conjunction with cloud connectivity has begun in earnest, as evidenced by the adoption of Synaptics solutions in this space by leading device makers and platform suppliers. By using advanced AI-based neural networks to enable edge-based IoT, chip suppliers are able to offer a broad and integrated solution to address the challenges of traditional cloud-only architectures. This type of advanced human interface functionality can be cost-effectively implemented in a wide range of devices that improve and secure our lives.

Vineet Ganju, Vice President IOT AI, Voice and Audio at Synaptics, the leading developer of human interface solutions, has over 15 years’ experience in technology. Vineet has an M.S. from Stanford University in Electrical Engineering, an MBA from the University of Texas at Austin and a B.S. in Math and Electrical Engineering from Queen’s University.