Growing MCU capabilities shift more AI from the cloud to the edge

By Mikko Saarnivala, Alif Semiconductor

Battery life (the time between charges) is a crucial figure of merit for many types of portable, wearable, mobile and remotely located devices, from smart glasses and smart watches to agricultural sensors for measuring soil conditions or animal health. We know from research cited in an earlier blog that energy consumption can be at least five times lower when AI inference is performed locally on an ESP32-S3 system-on-chip (SoC) than when raw data is transferred wirelessly to the cloud for inference. Replacing the ESP32-S3 with a power-efficient Ensemble E3 microcontroller extends the saving much further, to as much as a 55x reduction in energy consumption.
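To make those ratios concrete, here is a back-of-envelope sketch of what they could mean for time between charges. The 5x and 55x factors come from the comparison above; the battery capacity, the energy per cloud-based inference and the daily inference count are purely hypothetical assumptions chosen to illustrate the arithmetic.

```python
# Back-of-envelope battery-life estimate. The 5x and 55x ratios come from the
# research cited above; every other figure is a hypothetical assumption.

BATTERY_WH = 0.5                # assumed: small wearable battery, 0.5 Wh
CLOUD_MJ_PER_INFERENCE = 10.0   # assumed: energy (mJ) to ship raw data to the
                                # cloud and receive one inference result
INFERENCES_PER_DAY = 50_000     # assumed: always-on sensing workload

def days_between_charges(mj_per_inference: float) -> float:
    """Days of battery if inference traffic were the only energy drain."""
    joules_per_day = mj_per_inference / 1000 * INFERENCES_PER_DAY
    return (BATTERY_WH * 3600) / joules_per_day

for label, ratio in [("Cloud inference", 1),
                     ("Local, ESP32-S3 (5x)", 5),
                     ("Local, Ensemble E3 (55x)", 55)]:
    days = days_between_charges(CLOUD_MJ_PER_INFERENCE / ratio)
    print(f"{label:24s} {days:6.1f} days")
```

On these assumed numbers, the same battery lasts a few days under cloud inference and several months with local inference on the E3; the absolute figures are invented, but the results scale exactly with the ratios the research reports.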

So the case for on-device edge AI/ML seems clear-cut.

Except that real-world applications do not always present system architects with simple, binary choices. Often an edge device such as a wireless sensor node is a small part of a large intelligent system that takes inputs from widely distributed sources: the information needed for critical business decisions might reside not in any one node, but in the central aggregation and processing of data from hundreds or thousands of nodes. In such cases there would be no value without the compute resources of the cloud.

At the same time, the growing capability of edge AI MCUs such as the Ensemble and Balletto families, which can locally perform a wide range of AI/ML inference functions including keyword spotting and object recognition, has given developers the opportunity to shift more of the inference workload to the edge. And even in systems that require a large amount of data aggregation in the cloud, local inference reduces the volume of data transferred wirelessly to the cloud, saving power and reducing network usage costs.
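The data-reduction effect is easy to quantify. The sketch below compares the daily wireless payload of streaming raw microphone audio to the cloud against sending only detected keyword events; the sample rate, event size and event count are assumptions for illustration, not measured figures.

```python
# Rough comparison of daily wireless payloads for an always-on keyword
# spotter. All figures are hypothetical assumptions for illustration.

SECONDS_PER_DAY = 24 * 3600
RAW_AUDIO_BYTES_PER_S = 16_000 * 2   # assumed: 16 kHz, 16-bit mono PCM
EVENT_BYTES = 64                     # assumed: one compact event message
EVENTS_PER_DAY = 200                 # assumed: keyword detections per day

raw_mb_per_day = RAW_AUDIO_BYTES_PER_S * SECONDS_PER_DAY / 1e6
event_kb_per_day = EVENT_BYTES * EVENTS_PER_DAY / 1e3

print(f"Streaming raw audio to the cloud: {raw_mb_per_day:,.0f} MB/day")
print(f"Sending only detected events:     {event_kb_per_day:.1f} kB/day")
```

Under these assumptions, local keyword spotting cuts the wireless payload by around five orders of magnitude, which is where the power and network-cost savings come from.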

So local, edge-based processing should, and in many cases will, co-exist with central, cloud-based processing.

Brilliant Labs’ Halo smart glasses use a hybrid local/cloud processing architecture enabled by the Balletto B1 wireless MCU

This adds a new dimension to the developer’s decision making: the allocation of processing workloads between the cloud and the edge is becoming a much more important part of architectural design than it was before.
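One common way this allocation plays out in practice is confidence-gated escalation: a small on-device model handles the easy, frequent cases, and only low-confidence samples are sent on to a larger cloud model. The sketch below illustrates the pattern; every name in it (run_local_model, send_to_cloud, the threshold value) is a hypothetical placeholder, not an Alif or cloud-provider API.

```python
# Confidence-gated hybrid inference: handle the common case on the MCU and
# escalate only uncertain samples to the cloud. All names and values here
# are hypothetical placeholders, not a specific vendor API.

CONFIDENCE_THRESHOLD = 0.85  # assumed: tuned per application

def run_local_model(sample: bytes) -> tuple[str, float]:
    """Stub for on-device inference; returns (label, confidence)."""
    return "keyword_detected", 0.92   # placeholder result

def send_to_cloud(sample: bytes) -> str:
    """Stub for escalating hard cases to a larger cloud model."""
    return "cloud_label"              # placeholder result

def classify(sample: bytes) -> str:
    label, confidence = run_local_model(sample)
    if confidence >= CONFIDENCE_THRESHOLD:
        return label                  # common case: no radio traffic at all
    return send_to_cloud(sample)      # rare case: pay the wireless cost

print(classify(b"\x00" * 320))        # prints "keyword_detected"
```

The design appeal is that the radio, usually the biggest power consumer in a wireless node, stays off for the majority of samples, while the cloud still backstops the cases the small model cannot resolve.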

And all the time, the ground is shifting beneath developers’ feet: the capabilities of embedded AI/ML devices are getting ever closer to those of the cloud. For instance, the latest Ensemble E4, E6 and E8 systems-on-chip from Alif Semiconductor can handle transformer operations, enabling them to use language models and perform generative AI on-device.

The decision about where inference happens is not dictated solely by battery-life concerns: local, on-device inference can also bring important privacy and latency benefits. By locating inference on an Ensemble MCU, for instance, the developer can guarantee that potentially sensitive raw data (such as personal health monitoring data from a smart watch) never leaves the device. Isolating the data from wireless networks and the cloud greatly reduces the opportunity for hackers to gain access to it.

Local inference also reduces the latency experienced by users, who no longer have to wait out the round trip in which raw data is sent to the cloud, processed there, and the inference result returned to the device.
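A simple latency budget shows why. The figures below are assumptions for the sake of the arithmetic, not measurements, but the structure of the comparison holds generally: the cloud path pays the network twice.

```python
# Illustrative latency budget; every figure is an assumed value, not a
# measured result.

local_inference_ms = 30                      # assumed: on-MCU inference time
uplink_ms, cloud_compute_ms, downlink_ms = 80, 20, 80  # assumed network/cloud

print(f"Local inference:  {local_inference_ms} ms")
print(f"Cloud round trip: {uplink_ms + cloud_compute_ms + downlink_ms} ms")
```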

So embedded system developers are entering a period in which system architecture becomes more complex, driven by the growing capabilities of edge MCUs such as the Ensemble and Balletto families and by the new power-saving opportunities offered by local inference of language models and other transformer-based networks. Hybrid system architectures will become more prevalent, but over time more and more AI data processing looks set to shift away from the cloud and towards the edge.
