The Alif Ensemble family marries a dedicated neural processing unit with a Cortex-M CPU to deliver high AI performance in a familiar MCU environment.
Introduction
Like a living organism in nature, the 32-bit microcontroller has evolved over a long time to adapt to the application conditions to which it is exposed. The result is a device that performs well in traditional embedded control applications, while staying on the right side of the typical MCU restrictions on silicon area and power consumption.
The Problem With Traditional CPU-Based Solutions for AI Acceleration
Plunged into the completely new application environment presented by artificial intelligence, however, the conventional CPU-based architecture encounters considerable difficulties. Many embedded system developers have experienced this when trying to implement inferencing for anything beyond the simplest AI applications, such as vibration monitoring or basic keyword detection. Tasked with running more complex models for functions such as natural language processing or video analysis, the traditional MCU is typically forced to drive its CPU at maximum frequency for extended periods. This denies processor time to other parts of the application and increases system-wide latency, while raising power consumption and draining the battery in portable or wireless applications.
And even so, performance in anything other than basic AI applications often turns out to be disappointing.
The reason that the CPU-based MCU struggles with machine learning is that its CPU architecture is well adapted to the sequential environment of deterministic embedded control instructions – but poorly adapted to the parallel world of machine learning operations.
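To make the sequential-versus-parallel point concrete, the following minimal sketch counts the multiply-accumulate (MAC) operations in a single convolution layer and estimates a best-case cycle count for different degrees of parallelism. The layer dimensions and the MACs-per-cycle values are illustrative assumptions, not figures for any particular device, and the estimate ignores memory traffic and control overhead.

```python
def conv2d_macs(out_h, out_w, out_ch, k_h, k_w, in_ch):
    """Multiply-accumulate operations for one standard 2D convolution layer."""
    return out_h * out_w * out_ch * k_h * k_w * in_ch


def ideal_cycles(macs, macs_per_cycle):
    """Best-case cycle count if every cycle retires `macs_per_cycle` MACs.

    Idealized: ignores memory bandwidth, control overhead and utilization losses.
    """
    return -(-macs // macs_per_cycle)  # ceiling division


# Hypothetical layer: 3x3 kernel, 16 input channels, 32 output channels,
# 48x48 output feature map.
layer_macs = conv2d_macs(out_h=48, out_w=48, out_ch=32, k_h=3, k_w=3, in_ch=16)

for width in (1, 128, 256):  # illustrative numbers of MACs issued per cycle
    cycles = ideal_cycles(layer_macs, width)
    print(f"{width:>3} MAC/cycle -> {cycles:>10,} cycles")
```

Even this one small layer needs more than ten million MACs, which is why hardware that issues many MACs in parallel has such an advantage over a core that works through them largely in sequence.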
This does not mean that advanced machine learning applications cannot be performed on an MCU, but they do call for a different type of MCU – one with a hardware accelerator that is configured for the requirements of machine learning.
Alif Ensemble Uses Arm Cortex-M55 and Ethos-U55 For Hardware AI Acceleration
This combination of a conventional MCU with AI hardware acceleration is now available for use in embedded design prototypes. The Ensemble family MCUs combine a CPU, the Arm® Cortex®-M55, with an AI accelerator, the Arm Ethos™-U55 micro neural processing unit (microNPU).
The low-power Ethos-U55 microNPU provides native support for machine learning workloads based on convolutional neural network (CNN) or recurrent neural network (RNN) models. Configured for seamless interaction with the Cortex-M55 embedded core, the Ethos microNPU contains a configurable multiply-accumulate (MAC) engine that can be set up to perform 128 or 256 MAC operations per cycle. Because it also supports 8-bit (int8) and 16-bit (int16) quantization, the microNPU enables models to be shrunk by up to 75% compared with their 32-bit floating-point originals.
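The 75% figure follows from simple arithmetic if the original weights are assumed to be stored as 32-bit floats: an int8 weight occupies one byte instead of four. The sketch below illustrates this with a basic symmetric per-tensor quantization scheme in plain Python. The weight values and the helper names `quantize_int8` and `dequantize` are hypothetical, and production toolchains typically use more elaborate schemes (per-channel scales, zero points, calibration data).

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q,
    with q an integer in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [scale * v for v in q]


# Hypothetical float32 weights (array type "f" stores 4 bytes per element).
weights_f32 = array("f", [0.82, -1.27, 0.05, 0.4, -0.66, 1.1, -0.01, 0.3])
weights_i8, scale = quantize_int8(weights_f32)

bytes_f32 = len(weights_f32) * weights_f32.itemsize
bytes_i8 = len(weights_i8) * weights_i8.itemsize
saving = 1 - bytes_i8 / bytes_f32

# Worst-case rounding error introduced by quantization.
max_err = max(
    abs(a - b) for a, b in zip(weights_f32, dequantize(weights_i8, scale))
)

print(f"float32: {bytes_f32} bytes, int8: {bytes_i8} bytes, saving: {saving:.0%}")
print(f"scale: {scale:.6f}, max reconstruction error: {max_err:.6f}")
```

The storage saving is fixed by the data types, while the reconstruction error depends on the weight distribution; with this scheme it is bounded by half the scale factor.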
Working with the Cortex-M55 core, the Ethos-U55 can perform inferencing operations up to 480 times faster than a previous-generation Cortex-M CPU working alone, according to Arm.
The provision of dedicated AI acceleration hardware solves the problems that conventional CPU-only MCUs suffer from. With machine learning operations allocated to the low-power microNPU, the CPU can provide fast, deterministic responses to the control part of the application, reaching maximum frequency only for brief bursts when the workload peaks and operating in low-power mode for most of the time. Machine learning performance is typically better by more than an order of magnitude, while power consumption for AI workloads such as image classification can be lower by a factor of more than 50.
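The effect of short bursts of activity on overall power draw can be estimated with a time-weighted average. The sketch below is a minimal illustration; the function name `average_power_mw` is hypothetical, and every power and timing value is a placeholder chosen for the example, not a measurement of any Ensemble device.

```python
def average_power_mw(active_mw, sleep_mw, active_ms, period_ms):
    """Time-weighted average power for a periodic active/sleep cycle."""
    if not 0 <= active_ms <= period_ms:
        raise ValueError("active_ms must lie between 0 and period_ms")
    duty = active_ms / period_ms
    return duty * active_mw + (1 - duty) * sleep_mw


# Placeholder scenario: one inference per second.
# CPU-only case: the core stays at full speed for most of each period.
cpu_only = average_power_mw(active_mw=60.0, sleep_mw=0.5, active_ms=800, period_ms=1000)

# Accelerated case: the work finishes in a short burst, then the system idles.
accelerated = average_power_mw(active_mw=60.0, sleep_mw=0.5, active_ms=25, period_ms=1000)

print(f"CPU-only average:    {cpu_only:.2f} mW")
print(f"Accelerated average: {accelerated:.2f} mW")
```

Because the average is dominated by the time spent active, shortening the burst matters more than trimming the sleep current until the duty cycle becomes very small.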
Conclusion
Now that AI and machine learning are becoming important elements of most new embedded system designs, developers have to face the shortcomings of the conventional MCU. A hybrid architecture combining a CPU with dedicated AI acceleration hardware is the answer.