0.45 V and 18 μA/MHz MCU SOC with Advanced Adaptive Dynamic Voltage Control (ADVC)

Uzi Zangi 1,*, Neil Feldman 1, Tzach Hadas 1, Noga Dayag 1, Joseph Shor 2 and Alexander Fish 2

1 PLSense, Yokneam 2066724, Israel; neil.d.feldman@gmail.com (N.F.); tzach@plsense.com (T.H.); noga.dayag@plsense.com (N.D.)
2 Faculty of Engineering, Bar Ilan University, Ramat Gan 52900, Israel; Joseph.Shor@biu.ac.il (J.S.); alexander.fish@biu.ac.il (A.F.)
* Correspondence: uzi@plsense.com; Tel.: +972-52-639-0007

Received: 9 April 2018; Accepted: 2 May 2018; Published: 9 May 2018

Abstract: An ultra-low-power MicroController Unit System-on-Chip (MCU SOC) is described with integrated DC to DC power management and an Adaptive Dynamic Voltage Control (ADVC) mechanism. The SOC, designed and fabricated in a 40 nm ULP standard CMOS technology, includes the complete Synopsys ARC EM5D core MCU, featuring a full set of DSP instructions and minimizing energy consumption over a wide range of frequencies: 312 kHz–80 MHz. A number of unique low voltage digital libraries, each comprising approximately 300 logic cells and sequential elements, were used for the MCU SOC design. On-die silicon sensors were utilized to continuously adjust the operating voltage to optimize power/performance for a given frequency and environmental conditions, and also to resolve yield and lifetime problems while operating at low voltages. A First Fail (FFail) mechanism, which can be digitally and linearly controlled with up to 8 bits, detects the failing SOC voltage at a given frequency. The core operates between 0.45 and 1.1 V with a direct battery connection for an input voltage of 1.6–3.6 V. Measurement results show that the peak energy efficiency is 18 μW/MHz. A comparison to state-of-the-art commercial SOCs is presented, showing a 3–5× improved current/DMIPS (Dhrystone Million Instructions per Second) compared to the next best chip.

Keywords: subthreshold logic; low voltage MCU SOC; adaptive dynamic voltage control (ADVC); first fail circuit

1. Introduction

With advances in Internet of Things (IoT) applications and the expansion of mobile devices, energy consumption has become a primary focus of attention in integrated circuit design [1–18]. While IoT applications cover a broad range of products, from wearable devices, smart houses, automotive devices and smart meters to inspection tools and many others, more than 50% of the market is dominated by battery-operated devices. This includes wearable sensors, hearing aids, smart phones, smart meters, etc. Unlike devices that operate with an unlimited power supply, these mobile battery-operated units need to operate for extended periods without recharging and therefore require ultra-low energy consumption. The demand for low energy dissipation is usually associated with a significant drop in performance, which can be acceptable in some applications. However, most portable IoT devices must operate in a wide range of frequencies that are dynamically defined by the application. In many cases, the IoT chips have two modes of operation: the low performance mode, where the device is mostly in deep sleep mode and simply senses the target, and the high performance mode, where the data from the environment are collected, computed and transmitted. Several publications presented state-of-the-art IC solutions with integrated dynamic voltage and frequency scaling (DVFS)\cite{19-3,21-1,23-1}, allowing adaptation of the voltage to the frequency requirements. The design of complex (and cost efficient) Systems on a Chip (SoC) that can both support minimum energy operation and reliably adapt their operating voltage to different environmental conditions is very challenging.

Unfortunately, conventional SoCs, using supply voltages in the range of 0.9–1.8 V, are not suitable for IoT applications. In these SoCs, the “on” transistors operate in the “super-threshold” region, far above the switching threshold of a transistor. In this region, the “Ion” current of a transistor is very strong, which results in a ratio of many orders of magnitude between the “Ion” and the “Ioff” (parasitic leakage) currents. This results in very fast and reliable, albeit power-hungry, operation.

Low voltage operation in the “sub-threshold” or “near-threshold” regions has been found to be advantageous in dramatically reducing energy dissipation\cite{16,18,19}. However, the power supply reduction is accompanied by a number of problems and significant challenges, especially in mass production designs\cite{22-1,23-1,24-1,25-1,26-1}. The low voltage associated with frequency reduction is not suitable for all modes of operation, and an adaptive voltage control mechanism is required. Lower supply voltages also mean lower noise margins, reduced yield, and increased vulnerability to process variations and temperature fluctuations\cite{27-1,28-1,29-1,30-1,31-1,32-1}. The characteristics of semiconductor behavior in sub/near-threshold are not well represented by standard transistor models and are different from those in the super-threshold region, resulting in different sizing and ratio optimizations.

The subthreshold current is exponentially dependent on the transistor’s threshold voltage. Hence, process variations that substantially affect the threshold voltages can cause considerable variance (up to a few orders of magnitude between slow/fast process corners to a typical corner) in the behavior of low voltage circuits. Another challenge is the effect of temperature variations on circuit behavior. During the super-threshold operation, a rise in temperature generally slows down circuits due to the loss of mobility. However, a rise in temperature causes a drop in the threshold voltage, which exponentially increases the subthreshold current. At a certain temperature, this increase overtakes the decline in mobility and sub/near-threshold circuits get faster with a significant increase in leakage current. In extreme cases, the combination of process and temperature variations can even cause certain circuits to malfunction. Therefore, specific measures need to be taken to deal with this drawback.
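To give a feel for this exponential sensitivity, the following minimal sketch evaluates the standard first-order subthreshold model, in which the current scales as exp((Vgs − Vth)/(n·kT/q)). The slope factor n = 1.5 and the threshold shifts used below are illustrative assumptions, not parameters of the fabricated chip, and the function names are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def thermal_voltage(temp_kelvin):
    """Thermal voltage kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def subthreshold_current_ratio(delta_vth, temp_kelvin=300.0, slope_factor=1.5):
    """Factor by which subthreshold current grows when Vth drops by delta_vth volts.

    First-order model: I ~ exp((Vgs - Vth) / (n * kT/q)).
    slope_factor (n) is an assumed, process-dependent value.
    """
    return math.exp(delta_vth / (slope_factor * thermal_voltage(temp_kelvin)))


if __name__ == "__main__":
    # Illustrative threshold shifts, e.g. between process corners.
    for shift_mv in (50, 100, 150):
        ratio = subthreshold_current_ratio(shift_mv / 1000.0)
        print(f"Vth lower by {shift_mv} mV -> current x{ratio:.1f}")
```

Under these assumptions, a threshold shift of 100 mV already changes the subthreshold current by about an order of magnitude, which is why corner-to-corner spreads of several orders of magnitude are plausible at low supply voltages.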

Sub-threshold and near-threshold circuits have received a lot of attention in the research community during the last decade\cite{6-1,8-1,10-1,12-1,13-1,27-1,28-1,33-1,35-1} with successful ultra-low voltage chip implementations: microprocessors\cite{6-1,8-1,10-1,12-1,13-1,27-1,28-1,33-1,35-1,36-1}, as well as dedicated ASICs for biomedical applications\cite{7-1,13-1,33-1,35-1}, for wireless sensor nodes\cite{10-1,15-1,17-1,37-1,39-1}, communication\cite{40-1,41-1}, image processing\cite{42-1,43-1}, RFID\cite{44-1} and many others. While previous research indicated significant power reduction and successful minimum energy operation, it did not present complete MCU SoC solutions that are suitable for mass production.

This paper presents a 40 nm MCU System-on-a-Chip (SoC), which includes the complete Synopsys ARC EM5D core MCU and features a full set of DSP instructions\cite{29-1}. The SoC demonstrates minimum energy per operation and reliably adapts its operation to different environmental conditions across a wide operating voltage range (0.45 V to 1.1 V). The proposed solution differs from a conventional DVFS approach, which only allows adaptation of the voltage to the frequency requirements and does not take the environmental conditions into account. A holistic cross abstraction layer approach was utilized to design the proposed SoC. According to this approach, low voltage optimization has been done at all abstraction layers, starting from dedicated digital library development, a dedicated SRAM and a special SRAM bit-cell, through analog and sensor optimization, continuing with unique power management and the Adaptive Dynamic Voltage Control (ADVC) algorithm, and finishing with software, which allows real time dynamic interaction with all the abstraction levels. Measurement results show that the peak energy efficiency is 18 µW/MHz. A comparison to state-of-the-art commercial SoCs is presented, showing a 3–5× improved current/DMIPS (Dhrystone Million Instructions per second) compared to the next best chip. These unique properties of the presented SoC device make it suitable for many IoT applications.

This paper is an extended version of [29], where we briefly presented the proposed MCU SOC for the first time. In this paper, we extend the content by adding detailed descriptions of the concepts of library characterization, first fail circuit operation, and the DC2DC and ADC circuits, and by providing schematics of the circuits used in the PMU. The paper is organized as follows. In Section 2, we discuss the architecture, ADVC mechanism, FirstFail (FFail) and DC2DC circuits of the proposed MCU SoC. Measurement results from the fabricated test chips are shown in Section 3. Section 4 concludes the paper.

2. The Proposed MCU SOC

Design of a reliable low voltage SoC, which can support minimum energy dissipation in a wide range of operation requirements, defined by the application is not a trivial task. This task becomes much more difficult when these requirements are defined dynamically by the users’ needs and different environmental conditions. Targeting mass production brings a completely different level of complexity and can only be achieved by using a holistic approach. As mentioned in Section 1, a cross abstraction layer holistic approach was utilized to design the proposed SoC. In this Section, we present a number of basic concepts, starting from a basic library characterization, through to the MCU SoC architecture description, and finishing with Adaptive Dynamic Voltage Control, First Fail mechanism and DC2DC circuits.

2.1. Low Voltage Libraries Characterization

Low voltage library characterization was one of the first and most important steps in the holistic approach. Utilization of a conventional digital library for low voltage operation is not efficient for many reasons, including high sensitivity to process and temperature variations, incompatibility of conventional flip-flops and latches with low voltage operation, the lack of mechanisms that allow dynamic adaptation of the gates and sequential elements to real time requirements and environmental conditions, and many others. Figure 1 depicts an example of voltage transfer characteristics (VTC) of a conventional standard cell operating from a 450 mV supply voltage under global process variations. As observed in Figure 1, moving from the FS to the SF corner results in a VTC shift of 120 mV. This shift becomes even worse if local (on-die) variations are considered, causing a reduced noise margin and reduced reliability. To solve the aforementioned challenges, multiple low voltage libraries were characterized in the frame of this work. Each of the libraries, comprising approximately 300 digital cells and sequential elements, presents different tradeoffs in terms of Vt, size, energy dissipation, minimum supply voltage, and performance. The libraries were used in different modules according to the architectural requirements. The libraries present the following features to address the challenges: (a) mixed threshold (mixed Vt) transistors are utilized. Using transistors with different threshold voltages in different cells and libraries is a known methodology; in our case, however, transistors with different Vts are used inside the same digital cells and sequential elements. The use of mixed Vt transistors (HVT and SVT) in the same cell aims to overcome the relatively slow speed of p-channel transistors and to improve the noise margin (more symmetric VTC) and speed of the cell. To meet a given performance target, one can either use a significantly larger transistor or use a transistor with a lower Vt (i.e., standard Vt vs. high Vt). In conventional chips, utilization of mixed Vt transistors in a single cell can lead to increased sensitivity to process variations, especially under low voltage operation. However, the unique Adaptive Dynamic Voltage Control (ADVC) approach, utilized in the proposed MCU SOC, allows reduction of this sensitivity by real time adaptation of the operating voltage, bulk biasing, frequency and other parameters of the system, according to the process variations and environmental conditions. Mixing transistors with different Vt levels in the same cell requires meeting additional design rules, which in some cases require wider spacing. However, when comparing this to the option of enlarging the transistors, we found that using mixed Vt provides a better optimization of performance vs. area; (b) most of the cells allow selective bulk biasing for the PMOS transistors; (c) high fan-ins are avoided; (d) the flip-flop architecture avoids any ratioed logic; and (e) careful sizing assures the area-energy-performance-reliability tradeoff.

Figure 1b demonstrates the advantages of the characterized low voltage library over a conventional solution, showing 50% variance reduction. This also results in improved static noise margins, as shown in Figure 2.

**Figure 1.** Voltage transfer characteristics of (a) Regular (reference) standard cell and (b) The characterized standard cell with 50% variance reduction.

**Figure 2.** Comparing the proposed and conventional (reference) solutions to noise margins.

2.2. MCU SoC Architecture

The SoC includes the complete Synopsys ARC EM5D MCU core, featuring a full set of DSP instructions and minimizing energy consumption over a wide range of frequencies: 312 kHz–80 MHz. It also contains all external interfaces to support the SOC operation, including an “Always-On” function at the system level. The SoC integrates all the analog blocks required for a complete “Sub/Near-threshold” operation, including programmable DC to DC converters (DC2DC), LDO, ADC, level shifters and other needed components, as shown in Figure 3.

**Figure 3.** SOC Block Diagram including full MCU sub-system, clock generation unit and Power Management Unit (PMU).

The chip can be connected directly to a battery source and can accept input voltages from 1.6 V up to 3.6 V. The chip also supports an advanced Adaptive Dynamic Voltage Control (ADVC) approach, in which the internal core and bias voltages are changed to achieve the best power per performance for the desired speed and given environmental conditions. Furthermore, ADVC improves the production yield and lifetime of the chip. The SoC chip incorporates a set of embedded memories and an extensive range of on-chip enhanced I/Os.

2.3. Adaptive Dynamic Voltage Control (ADVC)

A diagram describing the ADVC feature is shown in Figure 4. The Power Management Unit (PMU) contains switching and linear power supply generators, all of which are connected to the battery. A series of sensors continuously measures the relevant silicon operating parameters of the IC and feeds this information into a look-up-table (LUT), which sets the voltage identification code (VID) for the PMU. The VID determines the VCC level of the core and of a programmable first-fail (FFail) circuit, whose VCC characteristics track those of the core. The user sets the target operating speed, and a built-in SW algorithm controls the core and the bias voltages based on the monitored sensors and the data in the LUT. This Adaptive Dynamic Voltage Control (ADVC) continuously adjusts the core voltage to minimize power consumption for any user-selected target frequency and different environmental conditions, and also improves the chip yield and reliability. The core voltage can be changed between 0.45 V and 1.1 V in small steps of 10 mV. The sensors include process monitors, a temperature sensor, an aging sensor, a threshold voltage (Vt) sensor, and other sensors.
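The sensor-to-LUT-to-VID flow described above can be sketched as follows. This is only an illustration under stated assumptions: the linear VID encoding, the two temperature bins, the function names and every LUT entry are hypothetical placeholders, since the actual VID format and LUT contents are not given here. Only the 0.45–1.1 V range and the 10 mV step are taken from the text.

```python
VCC_MIN_MV = 450    # lowest core voltage (from the text)
VCC_MAX_MV = 1100   # highest core voltage (from the text)
VID_STEP_MV = 10    # voltage step (from the text)


def voltage_to_vid(vcc_mv):
    """Map a core voltage in mV to a VID step index (assumed linear encoding)."""
    if not VCC_MIN_MV <= vcc_mv <= VCC_MAX_MV:
        raise ValueError("voltage outside the 0.45-1.1 V core range")
    return (vcc_mv - VCC_MIN_MV) // VID_STEP_MV


def vid_to_voltage(vid):
    """Inverse of voltage_to_vid: VID step index back to mV."""
    return VCC_MIN_MV + vid * VID_STEP_MV


# Placeholder LUT: (target MHz, temperature bin) -> starting core voltage in mV.
# These numbers are illustrative only and are NOT measured values. The "hot"
# entries are lower because near-threshold logic tends to speed up with
# temperature, as discussed in the Introduction.
EXAMPLE_LUT = {
    (2, "cold"): 480, (2, "hot"): 450,
    (40, "cold"): 720, (40, "hot"): 700,
}


def temperature_bin(temp_c, threshold_c=25.0):
    """Coarse two-bin temperature classification (assumed granularity)."""
    return "hot" if temp_c >= threshold_c else "cold"


def select_vid(target_mhz, temp_c, lut=EXAMPLE_LUT):
    """Look up a starting voltage for the requested speed and sensed temperature."""
    vcc_mv = lut[(target_mhz, temperature_bin(temp_c))]
    return voltage_to_vid(vcc_mv)


if __name__ == "__main__":
    vid = select_vid(target_mhz=40, temp_c=30.0)
    print(f"VID {vid} -> {vid_to_voltage(vid)} mV")
```

With a 10 mV step over 0.45–1.1 V there are 66 distinct voltage levels, so a linear code of this kind would fit in 7 bits. A real LUT would also be indexed by the process, aging and Vt sensor readings.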

The PMU contains the power supply circuits required for energy efficient operation. There are two inductor-based DC to DC converters (DC2DC), which supply separate VCC voltages to the Core at an efficiency above 85%, even at low VCC voltages and low load currents. The supplies can operate from near-Vt levels (0.45 V) up to 1.1 V. Figure 5 shows a fixed voltage LDO which provides a 0.9–1.1 V supply to analog and always-on circuits, while the sensors are supplied by a separate dynamically programmable LDO. A current mirror output stage is used in the LDO in order to improve the power supply rejection. The current mirror links the gate of the PMOS driver to VCC, which enables a more constant current even when VCC has AC ripple. The accuracy of all of these circuits is enabled by an on-die Bandgap reference circuit (BGREF), also shown in Figure 5. The PMU has the option of controlling the body bias of the MCU N-wells for Vt variance compensation.

Figure 5 shows several examples of the on-die sensors used to measure the Silicon parameters. These sensors are distributed in several locations on the chip to compensate for on die variations and supply the necessary feedback to the software ADVC algorithm. Temperature is measured by monitoring the Base-Emitter (Vbe) voltage of a parasitic PNP transistor and comparing it to a reference voltage, similar to [45]. The accuracy of this thermal sensor is ±5 °C across the operating range, which is sufficient for this application.

Figure 4. The ADVC features an interaction between the PMU, LUT and the on-die sensors.

All of the analog parameters are sampled by Successive Approximation Register Analog-to-Digital Converters (SAR ADC). A more detailed diagram of the SAR ADC is shown in Figure 6; it is similar to the architecture shown in [46]. It has a coarse and fine split capacitor array, divided by an attenuation capacitor to save capacitor area while maintaining high resolution. It is a pseudo-differential 12-bit SAR ADC that can be configured for varying sample rates and two resolution options, allowing high resolution, high speed operation that consumes more power as well as low resolution, low speed operation that consumes minimal power. The accuracy of the ADC can also be increased by oversampling, with the decimation performed by the digital section inside the PLS15, to reach up to 16-bit accuracy. In the high-performance mode, it is capable of a 12 bit, 1 MSPS (1 Mega Sample per second) measurement consuming 410 µW at 0.35 pJ/step. Vbat is used as the reference voltage. An additional 6 bit SAR is available for the low power modes (12 µW @ 1 MSPS). The ADC is capable of single shot or continuous sampling and has built-in averaging of multiple samples.
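The cost of going from 12-bit to 16-bit accuracy by oversampling can be illustrated with the common accumulate-and-shift scheme, in which each extra bit requires four times as many samples. This is a minimal sketch assuming roughly white noise of about one LSB at the ADC input. The actual decimation filter used in the PLS15 is not described here, and the function names are hypothetical.

```python
def oversampling_ratio(native_bits, target_bits):
    """Samples to accumulate per output word: 4x per extra bit (white-noise assumption)."""
    extra_bits = target_bits - native_bits
    if extra_bits < 0:
        raise ValueError("target resolution must not be below native resolution")
    return 4 ** extra_bits


def decimate(samples, native_bits=12, target_bits=16):
    """Accumulate 4**n native samples and shift right by n to get one wider result."""
    extra_bits = target_bits - native_bits
    ratio = oversampling_ratio(native_bits, target_bits)
    if len(samples) != ratio:
        raise ValueError(f"expected {ratio} samples, got {len(samples)}")
    # The sum of 4**n samples carries 2n extra bits; dropping n of them
    # leaves n additional bits of resolution.
    return sum(samples) >> extra_bits


if __name__ == "__main__":
    ratio = oversampling_ratio(12, 16)
    mid_scale = [2048] * ratio          # idealized constant mid-scale input
    print(ratio, decimate(mid_scale))
```

Under this scheme a 16-bit result needs 256 conversions, so at a 1 MSPS conversion rate the effective output rate would drop to roughly 3.9 kSPS.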

A process monitoring (PM) circuit is utilized to obtain an accurate reading of the SoC’s frequency capability. The PM comprises a ring oscillator which uses a variety of standard cells. The first-fail (FFail) circuit is a programmable critical path monitor (CPM) [47] designed to show the lowest acceptable SoC core voltage. The FFail gives a fail signal at a voltage which is slightly higher than the failing SoC voltage at a given frequency. It is built from elements representing critical paths of the chip and can be digitally and linearly controlled with up to 8 bits. A detailed description of the FFail circuit is presented in the next sub-section.

Figure 5. Simplified schematics of some of the circuits used in the PMU.

2.3.1. First Fail Circuit

The FFail circuit is one of the key components of the proposed system. It is used to determine the optimal operating voltage per device for a required frequency and environmental conditions, such as the chip silicon corner or temperature. The FFail employs the same programmable DLL that is used to represent the delay of the critical path, i.e., the worst-case timing path of the system. Using the FFail circuit, it is possible to mimic the exact delay of the critical path and, with this emulation, to obtain an accurate optimal operating voltage. The proposed FFail is unique since the measurement of the critical path delay is done in-line with the critical path logic, and an accurate representation of this path is obtained using a programmable DLL connected to the critical path logic. In existing solutions, the critical path delay is either measured externally to the critical path logic or is not represented using a programmable DLL.

A general architecture of the proposed FFail concept is shown in Figure 7. The structure of the in-line DLL is depicted in Figure 8. A programmable DLL is connected to the output and input of the combinational cells (which represent the critical path) in a way that creates an oscillation loop, as seen in Figure 8. The output of the oscillation loop is driven to an external pad to be sampled by an external frequency measurement tool. The critical path combinational cells can be controlled through existing flip-flops in the device: valid data are driven into each flip-flop so that the logic of the selected critical path enables signal propagation.

Figure 8. The scheme of the in-line DLL. The yellow elements represent modules that were added to the original design.

To ensure the desired operation, three multiplexers are added within the oscillation path. The first multiplexer is used to bypass the programmable DLL when required. The goal of the second multiplexer is to close the oscillation loop. The third multiplexer is used to add an additional inversion stage if needed to ensure oscillation. By using these three multiplexers with different settings, it is possible to create an oscillation loop through the critical path logic or the critical path logic and the programmable DLL.

The DLL is programmed to represent the critical path delay in the following way. In the first stage, the critical path is configured to ensure that data flows through it. In the second stage, the multiplexer controls are set so that only the critical path combinational cells are closed in an oscillation loop, and the frequency is measured off-chip. In the third stage, the multiplexer controls are set so that the programmable DLL is also connected within the oscillation loop, with all DLL control bits at zero, and the frequency is again measured off-chip. In the fourth stage, the programmable DLL code is modified in a binary search until the output frequency equals half the frequency measured in the second stage (via the combinational cells only). At half the frequency, the loop delay has doubled, i.e., the DLL delay equals the critical path delay. In the fifth stage, the DLL code is recorded in a memory element within the device. This code represents the delay of the critical path and is later used to program the FFail circuit.
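The binary search of the fourth stage can be sketched as below. The sketch assumes that the loop frequency decreases monotonically with the DLL code and that an inverting loop oscillates with a period of twice its delay. The measurement callback, the simulated delays and all function names are hypothetical stand-ins for the off-chip frequency measurement, and the 9-bit default follows the dll_code_in [8:0] bus described in the text.

```python
def loop_frequency_hz(path_delay_s, dll_delay_s=0.0):
    """Inverting loop: one oscillation period is two trips around the loop."""
    return 1.0 / (2.0 * (path_delay_s + dll_delay_s))


def calibrate_dll_code(measure_hz, code_bits=9):
    """Find the smallest DLL code whose loop frequency is <= half the path-only frequency.

    measure_hz(None) -> frequency with only the critical path in the loop (stage 2)
    measure_hz(code) -> frequency with the DLL, set to `code`, also in the loop
    """
    target_hz = measure_hz(None) / 2.0
    low, high = 0, (1 << code_bits) - 1
    if measure_hz(high) > target_hz:
        raise ValueError("DLL range is too short to match the critical path delay")
    while low < high:
        mid = (low + high) // 2
        if measure_hz(mid) <= target_hz:
            high = mid          # enough delay: try a smaller code
        else:
            low = mid + 1       # too fast: more DLL delay is needed
    return low


def make_simulated_measurement(path_delay_s, unit_delay_s, base_delay_s):
    """Stand-in for the off-chip measurement; all delays are made-up values."""
    def measure_hz(code):
        if code is None:
            return loop_frequency_hz(path_delay_s)
        return loop_frequency_hz(path_delay_s, base_delay_s + code * unit_delay_s)
    return measure_hz


if __name__ == "__main__":
    measure = make_simulated_measurement(20.02e-9, 50e-12, 1e-9)
    print("calibrated DLL code:", calibrate_dll_code(measure))
```

Choosing the smallest code that reaches the half-frequency target rounds the DLL delay up rather than down, which errs on the pessimistic side. Whether the actual calibration rounds this way is an assumption of the sketch.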

The structure of the FFail circuit is shown in Figure 9. The circuit is constructed from the same programmable DLL that is used in-line with the critical path. The same DLL control code, which was generated using the in-line DLL and the critical path oscillation loop, is also used for the FFail programmable DLL. Two flip-flops are used within the FFail circuit. The first flip-flop has an input that is always connected to “1” and drives the input of the programmable DLL; the second flip-flop is connected to the output of the programmable DLL. Both flip-flops can be reset before the test is performed. Subsequently, exactly two clock cycles are applied to both flip-flops, after which the output of the second flip-flop is sampled by the CPU. If the result equals ‘1’, the DLL delay is shorter than the clock period (“too short”), and if the result equals ‘0’, the DLL delay is longer than the clock period (“too long”).

In order to determine the optimal voltage, the operating voltage of the FFail logic, which is supplied by a voltage source separate from the rest of the device, is first set to the minimal voltage. For a given frequency, applied to the clock of the FFail flip-flops, a test is performed to check whether the DLL delay is too short or too long at the given voltage. The voltage is then increased until a pass result is obtained from the FFail circuit. A pass condition indicates that the FFail delay fits within the given clock period. Once this condition is met, the test can be stopped, and the last passing voltage can be used by the rest of the device as the operating voltage. Since the FFail circuit has the same path delay as the worst-case critical path within the device, the device is expected to function correctly at the same operating voltage as the one used by the FFail circuit.
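The voltage search described above can be sketched as a simple upward sweep. The sketch models the two-flip-flop check as a comparison between the DLL delay and the clock period, ignoring setup time and margins. The delay-versus-voltage model and all function names are hypothetical, and only the 0.45–1.1 V range and 10 mV step come from the text.

```python
def ffail_sample(dll_delay_s, clock_period_s):
    """Model of the two-flip-flop test: 1 if the launched '1' crosses the DLL
    before the next clock edge, else 0 (setup time is ignored here)."""
    return 1 if dll_delay_s <= clock_period_s else 0


def find_operating_voltage_mv(delay_at_mv, clock_hz,
                              v_min_mv=450, v_max_mv=1100, step_mv=10):
    """Raise the FFail supply from the minimum until the test passes.

    delay_at_mv(vcc_mv) returns the FFail (DLL) delay in seconds at that supply.
    Returns the first passing voltage in mV, or None if none passes.
    """
    period_s = 1.0 / clock_hz
    for vcc_mv in range(v_min_mv, v_max_mv + 1, step_mv):
        if ffail_sample(delay_at_mv(vcc_mv), period_s) == 1:
            return vcc_mv
    return None


def toy_delay_s(vcc_mv):
    """Made-up delay model: the delay shrinks quickly as the supply rises."""
    return 20e-9 * (700.0 / vcc_mv) ** 4


if __name__ == "__main__":
    for mhz in (2, 20, 40):
        print(mhz, "MHz ->", find_operating_voltage_mv(toy_delay_s, mhz * 1e6), "mV")
```

A production flow would presumably add a guard band on top of the first passing voltage. The text notes that the FFail signals a failure slightly above the true failing voltage, which provides such a margin.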

A more detailed drawing of the DLL circuit is shown in Figure 10. The dll_code_in [8:0] bus determines how many delay elements are enabled. The delay elements are binary weighted, giving a total of 511 different delay combinations, which are used to tune the FFail function.

2.3.2. DC2DC

The DC2DC implemented in this work is optimized for low power applications and maintains high efficiency even at low load currents and low output voltages. In general, switched DC-to-DC conversion can be accomplished in three modes of operation: Pulse-Width Modulation (PWM), Pulse-Frequency Modulation (PFM) and Pulse-Skipping Modulation (PSM, also known as discontinuous mode) (Figure 11). Among these, the PWM mode is most appropriate for high power modes, where the switching and conduction losses are relatively small and maximum load current is required. The PFM mode is used when the transmitted power is comparable to the switching losses. Adjusting the frequency limits these switching losses, such that this mode is useful at medium loads. At very low loads, the PSM mode “skips” many of the pulses and generates a pulse only when the voltage drops below a pre-determined threshold, which increases the light-load efficiency. Consequently, the PSM mode is appropriate for low-current loads and was adopted in this design.

Figure 12 shows the schematic implementation of the DC2DC. The configuration of the DC2DC is input through the decoder, a block which operates at 1.1 V. The output voltage, VDD, is determined by the attenuator, which is a resistive voltage divider for the sensed output signal. The divided voltage is compared with the reference voltage, Vref, by the comparator. The comparator features a 4 mV-wide hysteresis loop. The soft start block ensures smooth charging of the output capacitor at power-up. If the signal EN = 0, then the signal power_good = 0. The soft start circuit blocks the PMOS driver and applies a fixed voltage to the gate of the PMOS driver, such that it acts as a current source. When the voltage VDD reaches a predetermined level, the high-level output of the comparator disables the soft start circuit. The power_good signal is then asserted, which allows the normal function of the PMOS driver. The turn-on time of the PMOS driver during normal operation is determined by the On-Time Oscillator according to the following expression:

$$ T_{on} = \frac{L \cdot I_p}{V_{BAT} - V_{DD}} \qquad (1) $$

where L is inductance, Ip is peak current (15 mA), VBAT is battery voltage and VDD is output voltage.

The off-time of the PMOS is determined by the off-time oscillator according to:

$$ T_{off} = \frac{L \cdot I_p}{V_{DD}} \qquad (2) $$

A zero-cross detector senses when the NMOS current changes direction, at which point the NMOS driver is disabled. Additionally, a second-order L-C filter is connected at the output of the DC2DC to smoothen its output voltage.
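As a worked example of Equations (1) and (2), the sketch below computes the on- and off-times and the resulting pulse rate in PSM mode for the operating point of Figure 13 (VBAT = 3.0 V, VDD = 0.8 V) with the 15 mA peak current. The 10 µH inductance and the 100 µA load are assumed values, since neither is given here. The charge-per-pulse and pulse-rate helpers model an ideal, lossless triangular inductor current, and all function names are hypothetical.

```python
def on_time_s(inductance_h, peak_current_a, vbat_v, vdd_v):
    """Equation (1): time for the inductor current to ramp from 0 to Ip."""
    return inductance_h * peak_current_a / (vbat_v - vdd_v)


def off_time_s(inductance_h, peak_current_a, vdd_v):
    """Equation (2): time for the inductor current to ramp from Ip back to 0."""
    return inductance_h * peak_current_a / vdd_v


def charge_per_pulse_c(peak_current_a, t_on_s, t_off_s):
    """Area of the triangular inductor-current pulse delivered to the output."""
    return 0.5 * peak_current_a * (t_on_s + t_off_s)


def pulse_rate_hz(load_current_a, charge_c):
    """Average pulse rate at which the delivered charge matches the load (ideal case)."""
    return load_current_a / charge_c


if __name__ == "__main__":
    inductance = 10e-6   # assumed value, not stated in the text
    i_peak = 15e-3       # peak current from the text
    t_on = on_time_s(inductance, i_peak, vbat_v=3.0, vdd_v=0.8)
    t_off = off_time_s(inductance, i_peak, vdd_v=0.8)
    charge = charge_per_pulse_c(i_peak, t_on, t_off)
    print(f"Ton = {t_on * 1e9:.1f} ns, Toff = {t_off * 1e9:.1f} ns")
    print(f"pulse rate at 100 uA load: {pulse_rate_hz(100e-6, charge) / 1e3:.1f} kHz")
```

Because each pulse delivers a fixed charge, the pulse rate scales linearly with the load current. This is why switching losses shrink along with the load in PSM mode.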

Figure 13 shows the typical measured efficiency for an input voltage of 3.0 V and an output of 0.8 V. The efficiency is high across 3 orders of magnitude of load current. The DC2DC is capable of driving currents of 1 μA–10 mA for output voltages of 0.4–1.1 V and input voltages of 1–3.8 V.

Figure 11. The three modes of the DC2DC operation.
Figure 13. Measured efficiency vs. load current for Vin = 3 V and Vout = 0.8 V.

3. Measurement Results

A test chip, dubbed PLS10, containing the proposed MCU SoC has been designed and fabricated in TSMC’s 40 nm ULP process (the die photo is shown in Figure 14a). The measurement setup is shown in Figure 14b. It includes conventional tools, such as a dedicated testing board, a current meter and a heating system. Many units have been characterized at different temperatures, process corners, and voltages.

Figure 14. (a) Die Photo, (b) Measurements setup.

The measured functionality of the FFail circuit is shown in Figure 15. In the left graph of Figure 15, the Core and PM circuit frequencies are plotted against the supply voltage VCC. The PM circuit is designed such that its frequency is similar to the Core’s. The FFail code (at 40 MHz) is shown on the right axis of the figure. The FFail code tracks the Core frequency very accurately, proving its usefulness as a canary circuit for determining the minimum operating voltage. In the right graph of Figure 15, the FFail code is plotted against VCC for 5 frequencies. At lower VCC levels, the FFail code shows a nearly linear response to the VCC level up to code saturation. Figure 15 indicates good MCU and FFail functionality down to VCC = 0.45 V at 2 MHz.

The power and performance of the MCU are shown in Figures 16 and 17. The measurements show a power range of 135–6800 µW at frequencies of 2–80 MHz for VCC ranging from 0.45 to 0.85 V, including the current of the always-on circuits. As expected, the leakage power is a strong function of temperature, as shown in Figure 16, but is reduced dramatically with the VCC reduction.

Figure 15. (a) Measured correlation between the FFail circuit code (at 40 MHz) and the MCU and PM frequencies at 25 °C; (b) measured FFail code vs. voltage for frequencies between 2–40 MHz.

Figure 16. Leakage power of the SOC vs. VCC for different temperatures.

Figure 17. Power and performance of the SOC vs. Core VCC at room temperature.

A comparison to state-of-the-art commercial SOCs is shown in Table 1. In order to make an “apples-to-apples” comparison, we listed only Cortex M4F MCUs. Our IC (PLS10) has a 3× improved current/DMIPS (Dhrystone Million Instructions per second) compared to the next best chip. We also show a 2× improvement in current/MHz, which is a measure of energy efficiency. The table values were measured and compared at a nominal 20 MHz; at lower frequencies (< 2 MHz) and VCC = 0.45–0.5 V, the power difference is 3–4×. We also list the next generation of our SOC (PLS15). PLS15 is an improved version of the PLS10, which also includes an improved FFail circuit (the concept was presented in Section 2.3.1), embedded flash memory, a 12 bit ADC, an LCD controller, and a real time clock (RTC) operating in the deep subthreshold regime and consuming less than 100 nA.

Table 1. Power/performance metrics compared to other commercial Cortex cores (running Dhrystone 2.1 at 20 MHz). The performance numbers were taken from commercial data-sheets.

| Chip | MCU | DMIPS/MHz | Current/DMIPS [μA/DMIPS] | Optimal Current per MHz on VBAT at 3 V [μA/MHz] |
|---|---|---|---|---|
| TI MSP432P401x [48] | 32 bit Cortex M4F | 1.2 | 76.4 | 91.6 |
| STM32L43xxx [49] | 32 bit Cortex M4F | 1.2 | 79.2 | 95 |
| ATMEL SAM4L [50] | 32 bit Cortex M4F | 1.2 | 147.5 | 177 |
| Ambiq Apollo [51] | 32 bit Cortex M4F | 1.2 | 29.2 | 35 |
| This Work (PLS10) | 32 bit EM5D (equivalent to Cortex M4F) | 1.8 | 10.33 | 18.6 |
| This Work (PLS15) | 32 bit EM5D | 1.8 | 5.55 | 10 |
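The relation between the columns of Table 1 can be checked with a few lines of arithmetic: the current/DMIPS figure is the current per MHz divided by DMIPS/MHz, and the improvement factors are ratios against the best competing chip. The following sketch copies the table values; the short vendor keys and function names are introduced here only for illustration.

```python
# (DMIPS/MHz, listed current/DMIPS, current per MHz in uA/MHz), copied from Table 1
TABLE_1 = {
    "TI": (1.2, 76.4, 91.6),
    "ST": (1.2, 79.2, 95.0),
    "Atmel": (1.2, 147.5, 177.0),
    "Ambiq": (1.2, 29.2, 35.0),
    "PLS10": (1.8, 10.33, 18.6),
    "PLS15": (1.8, 5.55, 10.0),
}
OURS = ("PLS10", "PLS15")


def derived_current_per_dmips(dmips_per_mhz, ua_per_mhz):
    """Current per DMIPS = (uA/MHz) / (DMIPS/MHz)."""
    return ua_per_mhz / dmips_per_mhz


def best_competitor(column):
    """Lowest (best) value in the given column among the commercial chips."""
    return min(row[column] for chip, row in TABLE_1.items() if chip not in OURS)


def improvement_over_best(chip, column):
    """Ratio of the best competitor's value to the given chip's value."""
    return best_competitor(column) / TABLE_1[chip][column]


if __name__ == "__main__":
    for chip, (dmips, listed, ua) in TABLE_1.items():
        print(f"{chip}: listed {listed}, derived {derived_current_per_dmips(dmips, ua):.2f}")
    for chip in OURS:
        print(f"{chip}: current/DMIPS x{improvement_over_best(chip, 1):.2f}, "
              f"current/MHz x{improvement_over_best(chip, 2):.2f}")
```

With the table values, PLS10 comes out at roughly 2.8× in current/DMIPS and 1.9× in current/MHz relative to the best commercial chip, in line with the 3× and 2× quoted in the text. For PLS15, the current/DMIPS ratio is about 5.3×.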

4. Conclusions

In this paper, an ultra-low power MCU SoC was presented. The SoC includes the complete Synopsys ARC EM5D core MCU, featuring a full set of DSP instructions and minimizing energy consumption over a wide range of frequencies: 312 kHz–80 MHz. Detailed descriptions of the design approach, including library characterization, SoC architecture, ADVC mechanism, FFail and DC2DC circuits, were shown. The proposed core allows for operation between 0.45 and 1.1 V, with a direct battery connection for an input voltage of 1.6–3.6 V. Measurement results from a PLS10 test chip, designed and fabricated in 40 nm ULP technology, showed that the peak energy efficiency is 18 μW/MHz. A comparison to state-of-the-art commercial SoCs was presented, showing a 3–5× improved current/DMIPS (Dhrystone Million Instructions per second) in comparison to the next best chip. It was shown that the next generation of the SoC is expected to improve these numbers by almost 2×.

Author Contributions: All authors contributed equally to this work.

Funding: We would like to acknowledge Taiwan Semiconductor Manufacturing Company, Ltd. for technical assistance and test chip manufacturing support.

Conflicts of Interest: The authors declare no conflict of interest.

References


© 2018 by the authors. Submitted for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).