Comparing MCUs for generative AI: it’s not just about the GOPS!

For manufacturers of endpoint devices – often small, battery-powered, and with much more limited compute capacity than a cloud system has – generative AI offers both promise and a problem.

The promise is in the extra value that generative AI can bring to applications, beyond what standard AI can do. At the endpoint, generative AI can enable machines which ‘remember’, put high-level instructions in context, handle long sequences of commands, and offer a new dimension of autonomous operation. Generative AI enables a machine to adapt and learn from the behavior and preferences of the user.

The problem is in the limited capacity of endpoint devices to run generative AI software. The workhorse at the endpoint is the microcontroller, the only type of device to offer the combination of feature integration, low-power operation, small size, low cost and broad ecosystem support which embedded developers need. Yet developers are already struggling to implement standard AI on most MCUs: the first-generation Ensemble and wireless Balletto made-for-AI MCUs are popular precisely because they efficiently run standard AI functions, such as keyword detection and object recognition, built on RNN and CNN networks.

The models which implement generative AI are far larger than standard AI models, in both parameter count and memory footprint. This means that the specification of an MCU which runs a generative AI algorithm needs to be substantially uprated beyond the capabilities of even the best of today’s AI MCUs.

So how will developers know whether an MCU is capable of supporting generative AI?

There is no single, straightforward answer to the question – although one hard-and-fast rule is that the MCU’s neural processing unit (NPU) must be able to accelerate transformer operations: the Arm® Ethos™-U85 NPU in the second-generation Ensemble MCUs has this capability.

Beyond this, it will be tempting to compare MCUs’ raw throughput in terms of GOPS or TOPS: where today’s best AI MCUs typically offer up to 250 GOPS, MCUs for generative AI will provide at least 2x more performance.

But in generative AI, raw throughput is a poor indicator of actual system performance. That is because generative AI applications depend on transformer operators, which shift large amounts of data inside the system: between memory, the NPU, the CPU, and peripheral functions such as an image signal processor.

So a system with high raw throughput might in theory be able to process large amounts of data fast, but if the system cannot serve the data up to the NPU at a fast rate, real-world performance will be sluggish and disappointing. In other words, there is a difference between theoretical throughput, expressed by a headline GOPS or TOPS value, and actual inferencing performance, which can be measured for instance with standard benchmark tests.
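The gap between headline GOPS and delivered performance can be sketched with a simple roofline-style calculation. The figures below are illustrative assumptions, not measured values for any specific MCU: the point is that a workload with little data reuse, such as transformer token generation, is capped by memory bandwidth long before it reaches the NPU's compute peak.

```python
# Roofline-style sketch: why raw GOPS overstates real inference speed.
# All numbers here are illustrative assumptions, not vendor data.

def attainable_gops(peak_gops: float, mem_bw_gbps: float,
                    ops_per_byte: float) -> float:
    """Attainable throughput is capped by whichever is lower:
    the compute peak, or memory bandwidth x arithmetic intensity
    (operations performed per byte fetched from memory)."""
    return min(peak_gops, mem_bw_gbps * ops_per_byte)

peak = 250.0       # headline NPU throughput in GOPS (assumed)
bandwidth = 3.2    # sustained memory bandwidth in GB/s (assumed)

# A CNN layer reuses each fetched weight many times: high ops/byte,
# so the NPU's compute peak is the limiting factor.
cnn = attainable_gops(peak, bandwidth, ops_per_byte=100.0)

# Transformer decoding streams large weight matrices with little
# reuse: low ops/byte, so memory bandwidth dominates instead.
llm = attainable_gops(peak, bandwidth, ops_per_byte=2.0)

print(f"CNN-like workload:        {cnn:.1f} GOPS attainable")
print(f"Transformer decode phase: {llm:.1f} GOPS attainable")
```

Under these assumed figures, the CNN-like workload reaches the full 250 GOPS while the transformer workload is held to a few GOPS by the memory system, which is why bus bandwidth and memory provision matter as much as the headline throughput number.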

This is why the design of the second-generation Ensemble MCUs places such an emphasis on its high-bandwidth system bus and large, fast, tightly-coupled memory provision.

With samples available in the second half of 2025, developers of endpoint devices can see for themselves what high actual performance looks like for generative AI applications running at the endpoint.
