Should You Choose the Google Coral Toolkit for Your Low-Power Edge AI Applications?
March 14, 2022
After the success of Google’s first-generation scalable distributed training and inference system, DistBelief, the Google Brain team, in collaboration with Alphabet, built TensorFlow, the second-generation system for the implementation and deployment of large-scale machine learning models.
Compared to DistBelief, TensorFlow’s programming model is much more flexible, while maintaining high performance and supporting training and inference for a wide range of machine learning models on heterogeneous hardware platforms. Guided by the idea that, as Google put it, “great software shines brightest with great hardware underneath,” several teams at Google explored designs for a custom accelerator for machine learning applications. These efforts led to the birth of the Tensor Processing Unit (TPU), a custom application-specific integrated circuit (ASIC) for machine learning, tailored for TensorFlow.
For over a year, Google verified the performance and efficiency of TPUs in its data centers, where they delivered an order of magnitude better performance per watt. The TPU is more tolerant of reduced computational precision, which means it requires fewer transistors per operation, resulting in more operations per second from the same amount of silicon.
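To make "reduced computational precision" concrete, here is a minimal sketch of affine quantization, which maps floating-point values to small integers. It assumes 8-bit unsigned integers (the article does not state a bit width), and the function names are hypothetical rather than taken from any TPU or TensorFlow API.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers with a scale and zero-point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The represented range must include 0.0 so that zero maps exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    quantized = [
        max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the integer representation."""
    return [scale * (q - zero_point) for q in quantized]


# Illustrative weights only; real models have millions of them.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
quantized, scale, zero_point = quantize(weights)
restored = dequantize(quantized, scale, zero_point)
print(quantized)
print(restored)
```

Each restored value differs from the original by at most about one quantization step (`scale`), which is the kind of small error that neural network inference generally tolerates in exchange for cheaper integer arithmetic.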
In quantitative terms, the TPU delivered 15-30x higher performance and 30-80x higher performance per watt than contemporary CPUs and GPUs. This lets Google deploy neural network models at scale and at lower cost. Built on a 28nm process, the Google TPU runs at 700MHz, consumes 40W when running, and connects to its host platform over a PCIe Gen3 x16 bus that provides 12.5GB/s of bandwidth.
The Road to Google Coral
Google introduced Coral, a complete toolkit for building AI applications with on-device inference that is efficient, private, fast, and able to run offline. It all started with the announcement of Google’s Edge TPU, a small application-specific integrated circuit that provides high-performance ML inference for low-power devices.
A single Edge TPU can perform 4 trillion operations per second (4 TOPS) while drawing 2 watts of power (2 TOPS per watt). Cloud TPUs are very different from Edge TPUs: they are suited to training large, complex machine learning models, which can take weeks on other hardware. Edge TPUs, as mentioned earlier, are designed for small, low-power devices and are suited to on-device ML inference.
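The efficiency figure follows directly from the two numbers quoted above. The short sketch below works through the arithmetic and also derives the energy per operation; the function names are hypothetical, and the inputs are the peak figures from the text, not measurements of any particular workload.

```python
def tops_per_watt(tops, watts):
    """Throughput per unit of power, in TOPS per watt."""
    return tops / watts


def picojoules_per_op(tops, watts):
    """Energy spent on one operation, in picojoules."""
    ops_per_second = tops * 1e12            # 1 TOPS = 10^12 operations per second
    joules_per_op = watts / ops_per_second  # 1 watt = 1 joule per second
    return joules_per_op * 1e12             # 1 joule = 10^12 picojoules


print(tops_per_watt(4, 2))      # the 2 TOPS per watt quoted above
print(picojoules_per_op(4, 2))  # roughly half a picojoule per operation
```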
Image Source: Google Coral AI
The Edge TPU supports only TensorFlow Lite models, and the first-generation Edge TPU can execute deep feed-forward neural networks such as convolutional neural networks (CNNs), making it a good choice for vision-based ML applications. The Edge TPU can also accelerate machine learning training, but this is restricted to retraining the final layer. Within that limit, the Coral APIs support accelerated transfer learning through backpropagation and weight imprinting.
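The idea behind weight imprinting can be sketched in a few lines. The example assumes the pretrained network produces an embedding vector for each image and that the final layer scores classes by cosine similarity; the weights for a new class are then set to the normalized average of that class's embeddings, with no gradient steps. This is plain Python with hypothetical function names and made-up 3-dimensional embeddings, not the Coral API.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def imprint_class_weights(embeddings):
    """Average the normalized embeddings of one class, then renormalize."""
    normalized = [l2_normalize(e) for e in embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def classify(embedding, class_weights):
    """Return the class whose imprinted weights best match the embedding."""
    query = l2_normalize(embedding)
    scores = {
        label: sum(q * w for q, w in zip(query, weights))
        for label, weights in class_weights.items()
    }
    return max(scores, key=scores.get)


# Hypothetical embeddings: two labeled examples per class.
examples = {
    "cat": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "dog": [[0.1, 0.9, 0.0], [0.0, 0.8, 0.2]],
}
class_weights = {
    label: imprint_class_weights(embs) for label, embs in examples.items()
}
print(classify([0.7, 0.3, 0.0], class_weights))  # prints "cat"
```

Because only the final layer's weights change, a new class can be added from a handful of examples, which is why this approach suits on-device learning.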
To bring the Edge TPU to developers, Google designed several hardware products under the Coral brand that integrate it. Popular ones include the Dev Board and the USB Accelerator, which appear in many experimental AI-focused applications.
A team of researchers from the Commonwealth Scientific and Industrial Research Organisation (CSIRO), the University of Queensland, and the Queensland University of Technology in Australia reported findings on the energy efficiency of the Edge TPU compared with the widely adopted Arm Cortex-A53 embedded processor. The results indicated that for models with fewer than 5400 input nodes and a model size below 0.15MB, the Cortex-A53 is more efficient than the Edge TPU. However, as the model size increases, the Edge TPU outperforms the Cortex-A53 and maintains that advantage until the model size exceeds 8MB. Once the model size reaches around 13.5MB, the Cortex-A53 overtakes the Edge TPU, and with more than 5400 input nodes the Cortex-A53 is highly efficient.
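As a rough rule of thumb, the model-size thresholds reported above can be encoded as a simple lookup. This sketch considers model size only and ignores the input-node count. The function name, the handling of exact boundary values, and the "unclear" label for the 8MB to 13.5MB range (where the summary above does not name a winner) are all assumptions made for illustration.

```python
def more_efficient_platform(model_size_mb):
    """Suggest the more energy-efficient platform for a given model size."""
    if model_size_mb < 0.15:
        return "cortex-a53"  # very small models favor the CPU
    if model_size_mb <= 8.0:
        return "edge-tpu"    # mid-sized models favor the Edge TPU
    if model_size_mb < 13.5:
        return "unclear"     # no winner stated for this range
    return "cortex-a53"      # large models favor the CPU again


for size in (0.1, 4.0, 10.0, 20.0):
    print(size, more_efficient_platform(size))
```

Anyone choosing hardware should treat this as a starting point and measure their own model, since the thresholds come from one study's set of networks.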
Taking a Step Forward
Recently, without any official press release or announcement, Google launched a landing page for the new Coral Dev Board Micro, which has an onboard camera, microphone, and Edge TPU. Slightly larger than the popular Feather form factor at 65x30 mm, the Dev Board Micro integrates the NXP i.MX RT1176 microcontroller, featuring Cortex-M7 and Cortex-M4 cores, along with the Coral Edge TPU co-processor, which delivers 4 TOPS. The combination of two Arm cores on a single chip provides considerable computing power and multiple media capabilities.
Image Credit: Google Coral AI
The built-in camera and microphone indicate that the board is designed for prototyping and deploying low-power embedded applications such as object detection and image classification. Deep neural networks have improved vision-based applications of this kind, and the Edge TPU supports them with on-device machine learning inference. Beyond the other input/output connectivity, the 12-pin GPIO header lets developers interface I/O devices with the Coral Dev Board Micro.
Recognizing the need for extremely low-power edge devices with fast data processing and low latency, Google released the Coral Dev Board Micro to focus on microcontroller-powered tinyML projects. Google hasn’t revealed many details about pricing and availability, so interested developers will have to wait indefinitely for clearer information. As of writing, no information is available beyond the product listing page.
M. Abadi et al., “TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems,” White Paper, 2015. [Online]. Available: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf
S. Hosseininoorbin, S. Layeghy, B. Kusy, R. Jurdak, and M. Portmann, “Exploring Deep Neural Networks on Edge TPU,” arXiv:2110.08826 [cs.LG], 2021.