Is the Google Coral Toolkit Right for Your Low-Power Edge AI Applications?
March 29, 2022
After the success of Google’s first-generation scalable distributed training and inference system, DistBelief, the Google Brain team, in collaboration with Alphabet, built TensorFlow, the second-generation system for implementing and deploying large-scale machine learning models.
Compared to DistBelief, TensorFlow’s programming model is more flexible while maintaining high performance, and it supports training and using a wide range of machine learning models on various heterogeneous hardware platforms. Several teams at Google also explored designs for custom machine learning accelerators. These efforts led to the Tensor Processing Unit (TPU), a custom application-specific integrated circuit (ASIC) for machine learning tailored for TensorFlow.
For over a year, Google verified the performance and efficiency of TPUs in its data centers, where they delivered an order of magnitude better performance per watt for machine learning. The TPU is more tolerant of reduced computational precision, which means it requires fewer transistors per operation, resulting in more operations per second on the same amount of silicon.
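To make "reduced computational precision" concrete, here is a minimal sketch of affine 8-bit quantization, the general technique of storing values as small integers plus a scale and zero point. It is not taken from Google's TPU implementation; the function names and the example weights are assumptions chosen for illustration.

```python
# Illustrative affine (asymmetric) 8-bit quantization; not TPU-specific code.

def quant_params(values, qmin=-128, qmax=127):
    """Derive a scale and zero point that map the float range onto int8."""
    # The representable range must include 0.0 so that zero stays exact.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integers in [qmin, qmax], clamping out-of-range values."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(qvalues, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(q - zero_point) * scale for q in qvalues]

weights = [-1.2, -0.3, 0.0, 0.45, 2.0]  # hypothetical example weights
scale, zp = quant_params(weights)
q = quantize(weights, scale, zp)
restored = dequantize(q, scale, zp)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight now fits in one byte instead of four, and the round-trip error stays within one quantization step (`scale`). That small, bounded loss is the kind of imprecision many neural networks tolerate well.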
In quantitative terms, the TPU delivered 15-30x higher performance and 30-80x higher performance per watt than contemporary CPUs and GPUs. This lets Google design and deploy neural network models at scale and at a lower cost. Built on 28nm process technology, the Google TPU runs at 700MHz, consumes 40W when running, and connects to the host platform over a PCIe Gen3 x16 bus that provides 12.5GB/s of bandwidth.
The Road to Google Coral
Google introduced Coral, a complete toolkit for building AI applications that use on-device inference, which is efficient, private, fast, and works offline. It all started with the announcement of Google’s Edge TPU, a small application-specific integrated circuit (ASIC) that provides high-performance ML inference for low-power devices. A single Edge TPU can perform 4 trillion operations per second (4 TOPS) while drawing 2 watts of power (2 TOPS per watt). Cloud TPUs are suited to training large, complex machine learning models that can take weeks to train on other hardware. Edge TPUs are designed for small, low-power devices and are ideal for on-device ML inference.
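The throughput and power figures above can be turned into rough per-inference estimates. The sketch below is back-of-the-envelope arithmetic only: the 600 million operations per inference is an assumed workload, not a measured model, and real latency will be higher because no accelerator sustains peak throughput on every layer.

```python
# Figures quoted above for a single Edge TPU.
EDGE_TPU_TOPS = 4.0   # trillion operations per second (peak)
EDGE_TPU_WATTS = 2.0  # power draw at that throughput

def tops_per_watt(tops, watts):
    """Efficiency as trillions of operations per second per watt."""
    return tops / watts

def ideal_latency_ms(ops_per_inference, tops):
    """Lower bound on latency if every operation ran at peak throughput."""
    return ops_per_inference / (tops * 1e12) * 1e3

def energy_per_inference_mj(ops_per_inference, tops, watts):
    """Energy in millijoules at peak throughput (watts x milliseconds)."""
    return watts * ideal_latency_ms(ops_per_inference, tops)

# Hypothetical workload: 600 million operations per inference (assumed).
OPS = 600e6
efficiency = tops_per_watt(EDGE_TPU_TOPS, EDGE_TPU_WATTS)
latency = ideal_latency_ms(OPS, EDGE_TPU_TOPS)
energy = energy_per_inference_mj(OPS, EDGE_TPU_TOPS, EDGE_TPU_WATTS)
print(f"{efficiency} TOPS/W, >= {latency:.2f} ms, >= {energy:.2f} mJ per inference")
```

Under these assumptions the best case works out to about 0.15 ms and 0.3 mJ per inference, which explains why such a chip suits battery-powered devices.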
[Image Source: Google Coral AI]
The Edge TPU supports only TensorFlow Lite, and the first-generation Edge TPU executes deep feed-forward neural networks such as convolutional neural networks (CNNs), making it a good choice for vision-based ML applications. The Edge TPU can also accelerate on-device training, but only for retraining the final layer of a model. Within that limit, the APIs support accelerated transfer learning through either backpropagation or weight imprinting.
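Weight imprinting is easier to follow with a small example. The sketch below shows the general idea in plain Python and is not the Coral API: embeddings from the frozen base model are L2-normalized, and their renormalized average becomes the final-layer weight vector for a new class. The function names and the 3-dimensional embeddings are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def imprint_class(embeddings):
    """Build final-layer weights for a new class from a few example embeddings."""
    normalized = [l2_normalize(e) for e in embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)

def classify(embedding, class_weights):
    """Return the label whose imprinted weights are most similar (cosine)."""
    emb = l2_normalize(embedding)
    scores = {
        label: sum(a * b for a, b in zip(emb, w))
        for label, w in class_weights.items()
    }
    return max(scores, key=scores.get)

# Hypothetical embeddings, standing in for the output of a frozen base model.
class_weights = {
    "cat": imprint_class([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]),
    "dog": imprint_class([[0.1, 0.9, 0.1], [0.0, 0.8, 0.2]]),
}
prediction = classify([0.7, 0.3, 0.0], class_weights)
```

Because a new class needs only a handful of forward passes and an average, no gradient computation is required, which is why this approach suits retraining the last layer on a constrained device.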
To put the Edge TPU in developers' hands, Google designed several Coral hardware products that integrate it. Popular examples include the Dev Board and the USB Accelerator, which appear in many experimental AI-focused applications. A team of researchers from the Commonwealth Scientific and Industrial Research Organisation (CSIRO), the University of Queensland, and the Queensland University of Technology in Australia compared the energy efficiency of the Edge TPU with that of the widely adopted Arm Cortex-A53 embedded processor.
The results indicated that for models with fewer than 5400 input nodes and a model size below 0.15MB, the Cortex-A53 is more efficient than the Edge TPU. As the model size increases, the Edge TPU outperforms the Cortex-A53 and holds that advantage until the model size exceeds 8MB. Once the model size reaches around 13.5MB, the Cortex-A53 overtakes the Edge TPU; for models with more than 5400 input nodes, the Cortex-A53 is highly efficient.
Taking a Step Forward
Recently, and without an official press release or announcement, Google launched a landing page for the Coral Dev Board Micro, which has an onboard camera, microphone, and Edge TPU. Slightly larger than the popular Feather form factor, the 65x30 mm Dev Board Micro integrates the NXP i.MX RT1176 microcontroller, featuring Arm Cortex-M7 and Cortex-M4 cores, along with the Coral Edge TPU coprocessor, which delivers 4 TOPS. Combining two Arm cores on a single chip provides substantial computing power and multimedia capabilities.
[Image Source: Google Coral AI]
The built-in camera and microphone indicate that the board is designed for prototyping and deploying low-power embedded applications such as object detection and image classification. These vision-based applications rely on deep neural networks, which the Edge TPU accelerates through on-device machine learning inference. Beyond the built-in input/output connectivity, the 12-pin GPIO header lets developers interface I/O devices with the Coral Dev Board Micro.
After the Raspberry Pi-like Coral Dev Board, Google pursued a smaller, low-power, high-performance design in the Coral Dev Board Mini. Having recognized the need for extremely low-power edge devices that process data faster and with lower latency, Google now offers the Coral Dev Board Micro, which focuses on microcontroller-powered tinyML projects.
Google hasn’t revealed many details on the pricing and availability of the product, so interested developers will have to wait indefinitely for clearer information. As of writing, no information is available beyond the product listing page.
M. Abadi et al., “TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems,” White Paper, 2015. [Online]. Available: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf
S. Hosseininoorbin, S. Layeghy, B. Kusy, R. Jurdak, and M. Portmann, “Exploring Deep Neural Networks on Edge TPU,” arXiv:2110.08826 [cs.LG].