
Understanding the Ensemble Difference: Intelligent partitioning extends battery run-time in edge AI devices

Traditionally, microcontroller architectures have been built for conventional system-control workloads. The model for the MCU's operation was the execution of a sequence of tasks: the MCU would fire up its CPU to process the instructions required to complete the tasks as fast as possible, and then retire the CPU to a low-power quiescent state until the next set of tasks arrived.

The design engineer would choose a CPU sized to the tasks it had to run: perhaps an Arm® Cortex®-M0+ CPU if the workload was simple, or a Cortex-M7 for demanding, graphics- and connectivity-rich applications.

Traditional MCU manufacturers still design their products for this all-or-nothing mode of operation, but that architecture is a bad fit for the new workloads that edge AI devices perform. The typical edge AI system continually monitors its environment in a low-intensity scanning mode, looking for exceptions to the normal pattern or for events that match a known template. When an exception or a recognized event occurs, the system switches into a high-intensity mode to perform a fast, accurate inference and make the appropriate response.

Let’s illustrate this type of AI operation with an example. AI is ideal for preventive maintenance of industrial equipment. In a machine monitoring system, the AI in low-intensity mode might ‘listen’ for a change in the sound caused by the machine’s vibration, which could give an early warning of a potential fault. When the system detects an exceptional vibration, it could fire up into high-intensity mode to analyze in detail the sonic and kinetic signature of the unusual vibration, to make an inference about its cause and severity, and then to trigger the appropriate action, such as shutting down the machine and summoning a repair technician.
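To make this duty-cycled pattern concrete, here is a minimal, self-contained C sketch of the idea. The sensor read, the cheap anomaly check, and the detailed-analysis stage are stand-in stubs written for this illustration only; they are not Alif or Arm APIs, and a real monitoring system would replace them with its actual sensor driver and neural-network inference calls.

```c
/* Duty-cycled edge-AI pattern: a cheap always-on check runs on every
 * sample, and the expensive analysis path runs only when the cheap
 * check flags an anomaly. All functions below are illustrative stubs. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <stdbool.h>

/* Stand-in for reading one vibration sample from a sensor. */
static float read_vibration_sample(void)
{
    return (float)rand() / (float)RAND_MAX;   /* synthetic data */
}

/* Low-intensity check: cheap enough to run continuously. */
static bool looks_anomalous(float sample, float baseline)
{
    return fabsf(sample - baseline) > 0.4f;   /* simple threshold */
}

/* High-intensity stage: stands in for a full, detailed inference pass. */
static void classify_and_act(float sample)
{
    printf("Anomaly %.2f: run detailed inference and alert maintenance\n",
           sample);
}

int main(void)
{
    const float baseline = 0.5f;

    for (int i = 0; i < 1000; ++i) {
        float s = read_vibration_sample();

        if (looks_anomalous(s, baseline)) {
            classify_and_act(s);              /* rare, expensive path */
        }
        /* otherwise stay in the low-power monitoring loop */
    }
    return 0;
}
```

The point is that the expensive path runs only on the rare samples the cheap check flags, so the heavy compute can stay idle for most of the device's life.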

This example shows why the architecture that traditional MCU manufacturers continue to use is a bad fit for edge AI applications. In a traditional MCU, even one that has added a neural processing unit (NPU) alongside the CPU, the processing system has to be sized for the highest-intensity workload, which makes it over-powered for the low-intensity monitoring mode. And in most edge AI applications, the system spends most of its time in monitoring mode!

As a result, the traditional MCU burns too much power, limiting the ability of embedded device manufacturers to use AI to build the kind of products that they want to build: smarter, more responsive to the user and their environment, and powered by a small battery that gives long run-time between charges.

Alif’s two-part processor architecture fits the typical AI workload

In Ensemble multi-core MCUs and fusion processors from Alif Semiconductor, products built from the ground up for edge AI applications, a new architecture reflects this typical AI workload. This helps to explain why Ensemble products offer such an attractive combination of high edge AI performance and very low power consumption, suitable even for products such as wearable medical devices that run from a tiny battery.

Alif has partitioned the system so that a low-power portion of the chip can be always on while still offering robust compute capability, enabling it to selectively wake a much higher-performance portion of the chip to execute heavy workloads and then return it to sleep.

To facilitate this division of functions, many Ensemble MCUs have two pairs of Cortex-M55 CPU + Arm Ethos™-U55 NPU cores:

  • One pair sits in the High-Efficiency region of the chip, built on low-leakage transistors; it can remain always on and operates at up to 160 MHz
  • The other pair sits in the High-Performance region and operates much faster, at up to 400 MHz


To show the benefits of this partitioned architecture in a different example, imagine a smart occupancy camera that continuously scans a room at a low frame rate, using the High-Efficiency pair of cores to watch for a valid event (such as a person falling to the floor, or a specific gesture). When the High-Efficiency pair detects a valid event, it wakes the High-Performance pair to identify the person or persons, check for blocked exits, dial for help, and so on.
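Here is a similarly hedged sketch of how the High-Efficiency pair's side of such a camera could be structured. All of the functions and the frame type are hypothetical placeholders invented for this illustration; they are not Alif SDK, CMSIS, or Ethos-U driver calls, and on real silicon the hand-off would go through the chip's inter-core wake-up mechanism rather than a direct function call.

```c
/* Sketch of the High-Efficiency (HE) side of the occupancy-camera example.
 * Every function below is a hypothetical stub for illustration only. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Small, low-resolution frame used in the always-on monitoring mode. */
typedef struct { uint8_t pixels[96 * 96]; } frame_t;

/* Hypothetical low-frame-rate capture running in the High-Efficiency region. */
static frame_t capture_low_rate_frame(void)
{
    frame_t f = {0};
    f.pixels[0] = (uint8_t)(rand() & 0xFF);   /* synthetic frame content */
    return f;
}

/* Hypothetical small event classifier standing in for inference on the
 * High-Efficiency NPU (a fall, a specific gesture, and so on). */
static bool he_event_detected(const frame_t *f)
{
    return f->pixels[0] > 250;                /* crude stand-in for a detection */
}

/* Hypothetical hand-off: on real silicon this would wake the High-Performance
 * pair, which then runs the heavier person-identification and scene models. */
static void wake_hp_pair_and_analyze(const frame_t *f)
{
    (void)f;
    printf("HP pair woken: identify persons, check exits, call for help\n");
}

int main(void)
{
    for (int frame = 0; frame < 100; ++frame) {   /* stands in for an endless loop */
        frame_t f = capture_low_rate_frame();

        if (he_event_detected(&f)) {
            wake_hp_pair_and_analyze(&f);         /* rare, high-intensity path */
        }
        /* otherwise the High-Efficiency pair stays in its low-power scan loop */
    }
    return 0;
}
```

Because the heavier models run only on the rare frames the small classifier flags, the High-Performance region can stay asleep for the vast majority of the time.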

In this case the camera can be intelligently vigilant, produce fewer false positives, and extend battery life. The same division of work between the two pairs of CPU+NPU cores applies just as well to the classification of sounds, voices, words, text, vibrations, and sensor data in a wide variety of applications.

In fact, whatever type of edge AI system you are building, you will benefit from power savings when you take advantage of the Ensemble family’s intelligent partitioning.

To find out how efficiently your application could run on an Ensemble MCU or fusion processor, try it for yourself with the Ensemble E7 Dev Kit.
