By Mikko Saarnivala, Alif Semiconductor
It is widely assumed that running local AI inference on-device in an edge microcontroller uses less power than transferring the raw data to the cloud, and then performing the inference on a cloud server. Intuitively, it seems obvious: an edge device such as an Ensemble or Balletto MCU has a much smaller power footprint than even a purpose-built AI inference server in the cloud. And when inference is performed locally, the system saves the power which would be used by an RF circuit to transmit data to the cloud.
But just because a common assumption feels intuitive does not mean that it is true. As engineers, we prefer a data-driven decision to one based on a hunch. So what does the evidence tell us about the relative power usage of performing the AI/ML inference at the edge versus in the cloud?

Fortunately, we do not need to set up our own experiment in the lab to find the answer to this general question: the work has already been done for us by a team at the University of Münster in Germany. In a pre-print paper [1] published in October 2025, the researchers examine whether a typical wireless IoT system performing remote environmental sensing can save a significant amount of power by processing machine learning data locally on an ESP32-S3 system-on-chip (SoC).
They compare energy consumption across four operating scenarios:
- Transmitting raw 224×224-pixel images for inference in the cloud
- Transmitting pre-processed 32×32-pixel images to the cloud for final processing and inference
- Performing inference on the device and transmitting every result to the cloud, as a 1-byte message, for use in a cloud-based application
- Performing inference on the device and transmitting only the positive results (10% of the total) to the cloud, again as 1-byte messages
The research considers the impact of different radio technologies and application protocols for transmitting the data wirelessly, including LoRa, LTE-M and NB-IoT. The paper explains and quantifies the different components of the total power consumed, including image acquisition, image pre-processing, inference, establishing a connection, and data transmission.
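To make these components concrete, the minimal sketch below models the energy spent at the sensing node per reading for each of the four scenarios. Every per-component figure is a placeholder assumption chosen purely for illustration; these are not measurements from the paper, and real values depend heavily on the radio technology and protocol used.

```python
# Back-of-envelope per-reading energy model for the sensing node.
# All numbers are illustrative placeholders, NOT measurements from the paper.

# (payload in bytes per transmission, fraction of readings actually transmitted)
SCENARIOS = {
    "raw 224x224 image to cloud":          (224 * 224 * 3, 1.0),  # ~147 kB uncompressed RGB
    "pre-processed 32x32 image to cloud":  (32 * 32 * 3,   1.0),  # ~3 kB
    "on-device inference, all results":    (1,             1.0),  # 1-byte class indicator
    "on-device inference, positives only": (1,             0.1),  # ~10% of readings sent
}

# Placeholder per-cycle energy costs in millijoules (radio- and MCU-dependent)
E_ACQUIRE    = 5.0    # image capture
E_PREPROCESS = 2.0    # resize / normalise on the MCU
E_INFER      = 8.0    # on-device CNN inference
E_CONNECT    = 20.0   # establishing the radio connection
E_TX_PER_KB  = 1.5    # transmission cost per kilobyte of payload

def energy_mj(name: str, payload_bytes: float, tx_fraction: float) -> float:
    """Average energy per sensing cycle: compute locally, power the radio only when sending."""
    e = E_ACQUIRE
    if "32x32" in name or "on-device" in name:
        e += E_PREPROCESS                      # node shrinks or classifies the image itself
    if "on-device" in name:
        e += E_INFER                           # node runs the CNN itself
    # Radio energy is only spent for the readings that are actually transmitted
    e += tx_fraction * (E_CONNECT + E_TX_PER_KB * payload_bytes / 1024)
    return e

for name, (payload, frac) in SCENARIOS.items():
    print(f"{name:38s} {energy_mj(name, payload, frac):8.1f} mJ per reading")
```

Even with crude placeholder numbers, the structure of the model shows the pattern: the raw-image scenario is dominated by transmission cost, while in the on-device scenarios the connection overhead and inference dominate. The paper quantifies the real split for each radio technology and protocol.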
It shows that in scenarios involving wireless data transmission, local on-device AI/ML inference uses substantially less power. The paper’s authors conclude:
‘Performing on-device inference with embedded CNNs on microcontrollers…can significantly reduce energy consumption in IoT-based environmental monitoring applications. By reducing the data of interest from a full 224×224 pixel image to just a single 8-bit class indicator through on-board inference, we achieve an energy savings factor of up to five regarding transmission energy at the sensing node.’ [Alif’s emphasis]
Even taking into account the power the ESP32-S3 consumes while performing the computationally intensive inference operations, total consumption is much lower, because the RF system is active for far less time transmitting the 1-byte inference result than it would be transmitting raw image data.
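A rough airtime calculation illustrates the point. The 50 kbit/s link rate below is an assumption picked for illustration, broadly in the range of low-power wide-area links, and not a figure from the paper:

```python
# Rough radio-on time for the payload alone (protocol overhead ignored).
# The 50 kbit/s link rate is an illustrative assumption, not taken from the paper.
LINK_RATE_BPS = 50_000

raw_image_bits = 224 * 224 * 3 * 8      # ~1.2 Mbit of uncompressed RGB image data
result_bits    = 1 * 8                  # 1-byte inference result

print(f"raw image airtime : {raw_image_bits / LINK_RATE_BPS:8.3f} s")       # ~24 s
print(f"1-byte result     : {result_bits / LINK_RATE_BPS * 1000:8.3f} ms")  # ~0.16 ms
```

At that assumed rate the uncompressed image keeps the radio transmitting for tens of seconds, while the 1-byte result needs a fraction of a millisecond of payload airtime before the radio can return to sleep.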
An SoC which saves even more power
Imagine how much lower total power consumption could be if the ESP32-S3 used in the research project were replaced by an Ensemble MCU.
The ESP32-S3 features dual Xtensa® LX7 general-purpose microprocessor cores. The Ensemble MCUs, on the other hand, have dedicated AI/ML processors – the Arm® Ethos™-U55 or Ethos-U85 neural processing units (NPUs) – alongside their Arm Cortex®-M55 controller or Cortex-A32 microprocessor cores.
This Ensemble architecture offers strong power-efficiency advantages for manufacturers of battery-powered and wireless devices. In part, this is because the NPU performs AI/ML operations very fast, enabling the device to spend more time in low-power sleep states between inference events. This is borne out in testing: the Münster University paper reports that an inference with the MobileNetV2 model on the ESP32-S3 consumes 21.73 µAh; the same operation performed on an Ensemble E3 MCU consumes just 0.072 µAh, roughly 300 times less.
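Taking those two figures at face value, a quick calculation shows what the difference means for a duty-cycled sensing node. The once-per-minute inference rate below is an assumed workload for illustration, not something measured in the paper:

```python
# Per-inference charge figures quoted above, in microamp-hours
ESP32_S3_UAH    = 21.73
ENSEMBLE_E3_UAH = 0.072

print(f"ratio: {ESP32_S3_UAH / ENSEMBLE_E3_UAH:.0f}x")   # ~302x

# Assumed duty cycle for illustration: one inference per minute, around the clock
INFERENCES_PER_DAY = 24 * 60
for name, uah in [("ESP32-S3", ESP32_S3_UAH), ("Ensemble E3", ENSEMBLE_E3_UAH)]:
    daily_mah = uah * INFERENCES_PER_DAY / 1000
    print(f"{name:12s} inference budget: {daily_mah:7.2f} mAh per day")
# ESP32-S3:    ~31.3 mAh/day spent on inference alone
# Ensemble E3: ~0.10 mAh/day, leaving far more of the battery for the radio and sleep
```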
This suggests that for common model types such as convolutional neural networks, the power advantage of edge AI processing will be even greater with an Ensemble MCU or Balletto wireless MCU than the Münster University research found.
So now engineers can rest easy. The intuitive assumption is correct: in many wirelessly connected edge AI/ML applications, local AI inference beats cloud inference on the crucial parameter of power consumption.
[1] ‘Send Less, Save More: Energy-Efficiency Benchmark of Embedded CNN Inference vs. Data Transmission in IoT’, Benjamin Karic, Nina Herrmann, Jan Stenkamp, Fabian Gieseke and Angela Schwering (University of Münster), and Paula Scharf (re:edu GmbH), pre-print, October 2025