AI Hardware Acceleration: Breaking Beyond Traditional 32-bit MCUs

The 32-bit MCU has until now been restricted to basic AI applications such as simple keyword detection. Now Alif's Ensemble architecture promises to bring vision and advanced voice use cases within the scope of the MCU.


AI at the edge promises to have a transformative effect on the value and usefulness of microcontroller-based applications. But much of the potential has been stifled by the resource constraints imposed by MCUs. The limited horsepower offered by the 32-bit CPU, combined with the requirement to restrict both power consumption and bill-of-materials cost, hems design engineers inside guardrails that tightly restrict what they can accomplish with AI on an MCU.

How Far Can TinyML Go for Artificial Intelligence Applications?

In fact, some have assumed that if an AI application cannot be performed by a TinyML neural network, then it is not compatible with an MCU. This is not so: dedicated AI hardware acceleration provides a way to open up a much wider array of AI applications to users of 32-bit MCUs.

This is not to say that TinyML should be overlooked – it is a particularly good fit for conventional MCUs, since it is intended to perform ‘on-device sensor data analytics at extremely low power, typically in the mW range and below, and hence [to enable] a variety of always-on use-cases, and…battery operated devices’ (source:

But in practice, the range of applications that can be implemented on a conventional MCU with TinyML is narrow. And more complex deep learning models need to be simplified or compressed for deployment on a typical MCU. For real-time inferencing, the MCU’s computational capacity is so tightly rationed that designers find they have to sacrifice accuracy to stay within their latency budget.
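To make the compression step concrete, one widely used technique is post-training quantization: mapping 32-bit floating-point weights onto 8-bit integers so the model fits an MCU's limited memory and integer datapath. The sketch below is a minimal, self-contained illustration of affine int8 quantization; the function names and sample values are hypothetical, not taken from any particular framework.

```python
def quantize_int8(values):
    """Affine (asymmetric) int8 quantization of a list of floats.

    Maps the observed [min, max] range (extended to include zero)
    onto the int8 code range [-128, 127].
    """
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0       # guard against all-zero input
    zero_point = round(-128 - lo / scale)  # int8 code that represents 0.0
    quantized = [max(-128, min(127, round(v / scale) + zero_point))
                 for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats; error is bounded by about one scale step."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-0.42, 0.0, 0.9, 1.3, -1.1]  # hypothetical layer weights
q, scale, zp = quantize_int8(weights)
approx = dequantize(q, scale, zp)
```

Each weight now occupies one byte instead of four, at the cost of a small, bounded rounding error; production toolchains such as TensorFlow Lite's post-training quantization apply the same idea per tensor or per channel.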

When considering which applications to deploy on MCU hardware, the industry has concentrated on the ‘three Vs’: vibration, voice and video.

Fast, low-power vibration monitoring using neural networks is comfortably within the capabilities of today’s 32-bit MCUs. Voice processing beyond basic keyword detection is a stretch. And real-time video analytics is beyond the reach of conventional MCUs.
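One way to see why video is the hardest of the three Vs is to compare each workload's inference time against its real-time deadline. The inference times below are illustrative assumptions, not measurements; only the 30 fps frame budget is a standard figure.

```python
# Hypothetical per-inference times on a conventional 32-bit MCU,
# checked against each workload's real-time deadline.
workloads = {
    # name: (assumed inference time in ms, real-time budget in ms)
    "vibration monitoring": (20.0, 100.0),  # one analysis per 100 ms window
    "voice beyond keyword spotting": (300.0, 100.0),
    "video analytics at 30 fps": (2000.0, 1000.0 / 30.0),
}

for name, (inference_ms, budget_ms) in workloads.items():
    verdict = "fits" if inference_ms <= budget_ms else "misses"
    print(f"{name}: {inference_ms:.0f} ms {verdict} the {budget_ms:.1f} ms budget")
```

A 30 fps stream leaves only about 33 ms per frame, far tighter than what a CPU-only MCU can deliver for detection-class networks.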

The Capabilities of an MCU with Tightly Integrated AI Hardware Acceleration

The answer to the limited AI capability of an MCU’s CPU is to free the CPU to perform the system management and control functions for which it is optimized, and to transfer AI operations to a dedicated accelerator that is optimized for machine learning tasks.

This is what the Ensemble family of MCUs offers with its new architecture, which pairs one or more Arm® Cortex®-M55 CPUs with one or more tightly coupled Arm Ethos™-U55 neural processing units (NPUs). Machine learning performance on an Ensemble MCU can be as much as 100x faster than a conventional MCU's, while power consumption for AI workloads such as image classification can be more than 50x lower.
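The combined effect of those two figures is easiest to see in energy per inference, since energy is power multiplied by time. The baseline latency and power numbers below are hypothetical; only the 100x and 50x ratios come from the claims above.

```python
# Hypothetical baseline for one image-classification inference on a
# conventional 32-bit MCU (illustrative numbers, not measurements).
cpu_latency_ms = 500.0
cpu_power_mw = 80.0

speedup = 100.0      # up to 100x faster with the NPU
power_ratio = 50.0   # more than 50x lower power for AI workloads

npu_latency_ms = cpu_latency_ms / speedup  # 5 ms
npu_power_mw = cpu_power_mw / power_ratio  # 1.6 mW

# mW x ms = microjoules, so energy per inference is:
cpu_energy_uj = cpu_power_mw * cpu_latency_ms  # 40,000 uJ
npu_energy_uj = npu_power_mw * npu_latency_ms  # 8 uJ

# The two ratios multiply: energy drops by speedup * power_ratio = 5000x.
energy_improvement = cpu_energy_uj / npu_energy_uj
```

Whatever the absolute baseline, the latency and power ratios compound, which is why per-inference energy improves by orders of magnitude rather than merely doubling or tripling.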

The transformation is reminiscent of the switch in agriculture from the horse-drawn plough to the tractor: now, edge AI can be put to a far greater range of uses, from smart surveillance systems that need real-time object detection and tracking, to autonomous drones performing obstacle avoidance and navigation, to voice assistants and intelligent health monitoring devices.


The difference is stark: for any application involving edge AI or machine learning, the scope for adding value is far greater when using an MCU with an integrated NPU than with a conventional 32-bit MCU relying on its CPU alone.

