CEVA Announces NeuPro-M Heterogeneous and Secure Processor Architecture

By Tiera Oliver

Assistant Managing Editor

Embedded Computing Design

January 06, 2022

CEVA, Inc. announced NeuPro-M, its latest generation processor architecture for artificial intelligence and machine learning (AI/ML) inference workloads.

Targeting the broad Edge AI and Edge Compute markets, NeuPro-M is a self-contained heterogeneous architecture composed of multiple specialized co-processors and configurable hardware accelerators that process diverse deep neural network workloads simultaneously, boosting performance by 5-15X compared to its predecessor.

An industry first, NeuPro-M supports both system-on-chip (SoC) and Heterogeneous SoC (HSoC) scalability to achieve up to 1,200 TOPS, and it offers optional robust secure boot and end-to-end data privacy.

NeuPro-M-compliant processors initially include the following pre-configured cores:

NPM11 – single NeuPro-M engine, up to 20 TOPS at 1.25GHz
NPM18 – eight NeuPro-M engines, up to 160 TOPS at 1.25GHz
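
The two core figures are consistent with simple linear scaling across engines. As a rough arithmetic check, the sketch below assumes that peak throughput adds up linearly with engine count. The announcement does not state this, and the function names are illustrative only.

```python
import math

NPM11_TOPS = 20.0  # single-engine peak quoted in the announcement


def peak_tops(engines, tops_per_engine=NPM11_TOPS):
    """Peak TOPS under the ASSUMPTION of perfectly linear scaling."""
    return engines * tops_per_engine


def engines_for(target_tops, tops_per_engine=NPM11_TOPS):
    """Smallest engine count reaching target_tops under the same assumption."""
    return math.ceil(target_tops / tops_per_engine)


print(peak_tops(8))       # NPM18: eight engines
print(engines_for(1200))  # engine count implied by the 1,200 TOPS ceiling
```

Under that assumption, eight engines give the quoted 160 TOPS, and the 1,200 TOPS ceiling would correspond to 60 engines. Real multi-engine throughput depends on memory bandwidth and interconnect, so this is an upper bound, not a prediction.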

According to the company, a single NPM11 core processing a ResNet50 convolutional neural network achieves a 5X performance increase and a 6X memory bandwidth reduction versus its predecessor, which results in power efficiency of up to 24 TOPS per watt.

NeuPro-M can process all known neural network architectures and integrates native support for next-generation networks such as transformers, 3D convolution, self-attention, and all types of recurrent neural networks. It has been optimized to process more than 250 neural networks, more than 450 AI kernels, and more than 50 algorithms. The embedded vector processing unit (VPU) provides future-proof, software-based support for new neural network topologies and new advances in AI workloads. Furthermore, the CDNN offline compression tool can increase the FPS/Watt of NeuPro-M by a factor of 5-10X for common benchmarks, with minimal impact on accuracy.

The NeuPro-M heterogeneous architecture is composed of function-specific co-processors and load-balancing mechanisms, which are the main contributors to the leap in performance and efficiency over its predecessor. By distributing control functions to local controllers and implementing local memory resources hierarchically, NeuPro-M achieves data flow flexibility that results in more than 90% utilization and protects the co-processors and accelerators against data starvation at any given time. The CDNN framework obtains optimal load balancing by applying data flow schemes adapted to the specific network, the desired bandwidth, the available memory, and the target performance.

NeuPro-M architecture highlights include:

Main grid array consisting of 4K MACs (Multiply And Accumulates), with mixed precision of 2-16 bits.
Winograd transform engine for weights and activations, reducing convolution time by 2X and allowing 8-bit convolution processing with <0.5% precision degradation.
Sparsity engine to avoid operations with zero-value weights or activations per layer, for up to 4X performance gain, while reducing memory bandwidth and power consumption.
Fully programmable Vector Processing Unit, for handling new unsupported neural network architectures with all data types, from 32-bit Floating Point down to 2-bit Binary Neural Networks (BNN).
Configurable weight and data compression down to 2 bits while storing to memory, and real-time decompression upon reading, for reduced memory bandwidth.
Dynamically configured two-level memory architecture to minimize power consumption attributed to data transfers to and from an external SDRAM.
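
To give a sense of how a Winograd transform trades multiplications for additions, here is a minimal sketch of the textbook one-dimensional F(2,3) case, which produces two outputs of a 3-tap filter with four multiplications instead of six. It is not CEVA's implementation, and the function names are illustrative. Convolution layers use 2D tiles, where the textbook F(2x2, 3x3) form needs 16 multiplications instead of 36.

```python
def direct_conv_2x3(d, g):
    """Two outputs of a 3-tap filter computed directly: 6 multiplications."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def winograd_f23(d, g):
    """Same two outputs via Winograd F(2,3): 4 multiplications."""
    # Filter transform: can be precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform: additions and subtractions only.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]
    # The only four multiplications that depend on the input data.
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3
    # Output transform: additions and subtractions only.
    return [m0 + m1 + m2, m1 - m2 - m3]


d = [2, 4, 6, 8]   # four input samples
g = [1, 2, 3]      # three filter taps
print(direct_conv_2x3(d, g), winograd_f23(d, g))
```

Both functions return the same two values. The saving comes from moving work into the filter transform, which is done once per filter rather than once per input tile.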

To illustrate the benefit of these features, concurrent use of the orthogonal mechanisms of the Winograd transform, the sparsity engine, and low-resolution 4x4-bit activations delivers more than a 3X reduction in cycle count for networks such as ResNet50 and YOLOv3.
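
Of these mechanisms, zero-skipping is the simplest to illustrate in software. The sketch below counts how many multiply-accumulate operations a dot product actually needs when any pair with a zero weight or zero activation is skipped. It only illustrates the idea: the function name is hypothetical, and it says nothing about how the NeuPro-M sparsity engine is built.

```python
def sparse_dot(weights, activations):
    """Dot product that skips any pair containing a zero operand.

    Returns (result, macs_performed).
    """
    acc = 0
    macs = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            continue  # the product is zero, so this MAC can be skipped
        acc += w * a
        macs += 1
    return acc, macs


weights = [3, 0, 0, -1, 0, 2, 0, 0]
activations = [1, 5, 0, 4, 7, 0, 2, 0]
result, macs = sparse_dot(weights, activations)
print(f"result={result}, MACs performed={macs} of {len(weights)}")
```

In this toy vector only two of the eight pairs have both operands non-zero, so a dense engine would do four times the work for the same result. The achievable gain on a real network depends on how sparse its weights and activations are.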

As neural network weights and biases, data sets, and network topologies become key intellectual property of their owners, they need to be protected from unauthorized use. The NeuPro-M architecture supports secure access in the form of an optional root of trust, authentication, and cryptographic accelerators.

For the automotive market, NeuPro-M cores and the CEVA Deep Neural Network (CDNN) deep learning compiler and software toolkit comply with the automotive ISO 26262 ASIL-B functional safety standard and meet the stringent quality assurance standards IATF 16949 and A-SPICE.

Together with CDNN, CEVA’s neural network compiler, and its robust software development environment, NeuPro-M provides a fully programmable hardware/software AI development environment for customers to maximize their AI performance. CDNN includes software that can fully utilize the customer’s customized NeuPro-M hardware to optimize power, performance, and bandwidth. The CDNN software also includes a memory manager for memory reduction, optimal load-balancing algorithms, and wide support for network formats including ONNX, Caffe, TensorFlow, TensorFlow Lite, PyTorch, and more. CDNN is compatible with common open-source frameworks, including Glow, TVM, Halide, and TensorFlow, and includes model optimization features such as ‘layer fusion’ and ‘post training quantization’, all while using precision conservation methods.
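
As a rough illustration of what post-training quantization means in general, the sketch below maps floating-point weights to signed 8-bit integers with a single symmetric scale and then reconstructs them. This is a generic textbook scheme, not CDNN's algorithm or API, and the function names are made up for the example.

```python
def quantize_symmetric(values, bits=8):
    """Quantize floats to signed integers using one per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate float values."""
    return [x * scale for x in q]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)
worst = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, worst)
```

The worst-case reconstruction error is bounded by half of one quantization step, which is why small values such as 0.003 collapse to zero here. Production tools typically add calibration data and per-channel scales to limit that loss.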

NeuPro-M is available for licensing to lead customers today and for general licensing in Q2 2022. NeuPro-M customers can also benefit from Heterogeneous SoC design services from CEVA to help integrate and support system design and chiplet development.

For more information, visit https://www.ceva-dsp.com/product/ceva-neupro-m/.

Read more of Embedded Computing Design’s CES 2022 coverage at https://www.embeddedcomputing.com/ces-2022 or stay up to date by following the @embedded_comp Twitter handle.

Tiera Oliver is the assistant managing editor at Embedded Computing Design. She is responsible for web content editing, product news, and story development. She also manages, edits, and develops content for ECD podcasts, including Embedded Insiders.

She utilizes her expertise in journalism and content management to oversee editorial content, coordinate with editors, and ensure high-quality output across web, print, and multimedia platforms. She manages diverse projects, assists in the production of digital magazines, and hosts company podcasts by conducting in-depth interviews with industry leaders to deliver engaging and insightful discussions.

Tiera attended Northern Arizona University, where she received her bachelor's in journalism and political science. She was also a news reporter for the student-led newspaper, The Lumberjack.
