Continuously Energy Consumption Measure Approach Using a DMA Double-Buffering Technique

Measuring the consumption of electronic devices is a difficult and sensitive task. Data AcQuisition (DAQ) systems are often used to determine such consumption. In theory, measuring energy consumption is straightforward: by acquiring the current and voltage signals we can determine the consumption. However, a number of issues arise when a fine analysis is required. The main problem is that sampling frequencies have to be high enough to detect variations in the assessed signals over time. In that regard, some popular DAQ systems are based on RISC ARM processors for microcontrollers combined with Analog-Digital Converters (ADC) to meet the frequency acquisition requirements. The efficient use of the Direct Memory Access (DMA) modules combined with pipelined processing in the microcontroller improves the sample rate, overcoming the processing time and the internal communication protocol limitations. This paper presents a novel approach for high-frequency energy measurement composed of a DMA rate improvement (data acquisition logic), a data processing logic and low-cost hardware. The contribution of the paper is the combination of a double-buffered signal acquisition mechanism and an algorithm that computes the device's energy consumption using parallel data processing. The combination of these elements enables a high-frequency (continuous) energy consumption measurement of an electronic device, improving the accuracy and reducing the cost of existing systems. We have validated our approach by measuring the energy consumed by basic circuits and Wireless Sensor Network (WSN) motes. The results indicate that the energy measurement error is less than 5%, and that the proposed method is suitable for measuring WSN motes even during sleep cycles, enabling a better characterization of their consumption profile.


Introduction
Energy consumption is a critical aspect in the development of electronic circuits. One of the priority tasks when designing electronic devices is to know their consumption in order to select the components that supply the energy and to define the compatibility with other devices. Energy consumption is an important characteristic in data sheets and a source of information used to compare devices.
Measuring energy consumption is not an easy task for various reasons. First, energy consumption signals often do not have a defined frequency; they are very random and often vary at high frequency, requiring costly measurement devices. Second, these signals contain noise that needs to be filtered. Third, the signals usually have low power, so errors such as the burden voltage are often introduced when commercial devices are used to measure them.
Notwithstanding, the process of measuring these signals is not complicated when the right tools are used, such as Digital Signal Processors (DSPs). However, these tools are not always available due to their large size and high cost. Therefore, other techniques need to be used.
When size and cost must be minimized, energy consumption can be measured with small Data AcQuisition (DAQ) systems instead of DSPs. DAQ systems gather the energy consumption signal, process the sampled data and deliver the result to the user by means of data acquisition boards and programmable measuring instruments. DAQ systems are also embedded in the consumer electronics that we use every day, e.g., mobile phones, tablets and smart watches, reporting their consumption in real time.
Looking deeper into DAQ systems, the literature [10] shows that two processes are required to know the value of a physical phenomenon:
1. The data acquisition process. This process obtains discrete samples periodically, since the original signal is analog and the subsequent processing usually needs discrete data in a selected range.
2. The data handling process. This process uses the discrete data to obtain values or characteristics of the original signal.
Besides, depending on whether DAQ systems acquire data once or cyclically, they can be divided into the following types:
-"One-shot" DAQ system. It analyzes only one set of samples at a time, when the user requests it. These systems are useful if the sample to acquire is fixed, such as samples from biochemical labs [37].
-"Continuously" DAQ system. It takes some samples and later analyzes them. After that, a new set of samples is obtained and processed on an ongoing basis without requiring user interaction. These systems are useful if the signal to characterize varies over time, such as when acquiring biometeorological variables [6].
These two DAQ systems show different types of behavior. The latency problems of the "one-shot" type have no influence on the measurements, whereas the latency problems of the "continuously" type do. The "continuously" type must pay particular attention to overlapping times in order to obtain real-time measurements. This is because the "continuously" method analyzes the acquired samples before taking new ones. While the samples are being analyzed, the data acquisition process stops and the signal value is missed until the data handling process finishes. This creates uncertainty due to the latency of the data handling process, and a discontinuity in the measured signal.
To solve this problem, as pointed out in the literature [4] [19], the data acquisition process can be deployed in parallel with the data handling process. The gathered data is then analyzed in sets while new data is gathered in parallel, i.e., the acquisition process is continuous. With this parallelism there are no latencies due to the processing of previous sample sets.
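The effect of this parallelism can be quantified with a simple timing model. The sketch below assumes two hypothetical parameters: t_acq, the time needed to fill one acquisition window, and t_proc, the time needed to process one full window. Without parallelism, the signal is observed only during t_acq out of every t_acq + t_proc. With double buffering, the signal is observed continuously as long as processing keeps up with acquisition.

```c
/* Fraction of the input signal that is actually observed, under a simple
 * timing model. Parameters (hypothetical, same time unit for both):
 *   t_acq  - time needed to fill one acquisition window (buffer)
 *   t_proc - time needed to process one full window */

/* Sequential "continuously" DAQ: acquisition halts while the previous
 * window is processed, so each cycle lasts t_acq + t_proc but only t_acq
 * of it is sampled. */
double sequential_coverage(double t_acq, double t_proc)
{
    return t_acq / (t_acq + t_proc);
}

/* Double-buffered DAQ: one buffer fills while the other is processed.
 * Coverage is complete when processing keeps up (t_proc <= t_acq);
 * otherwise only one window per t_proc can be analyzed and the others
 * must be discarded. */
double double_buffered_coverage(double t_acq, double t_proc)
{
    return (t_proc <= t_acq) ? 1.0 : t_acq / t_proc;
}
```

Consider a hypothetical 1000 µs window with 250 µs of processing. The sequential scheme observes only 80% of the signal, whereas the double-buffered scheme observes all of it.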
Using the advantages of the parallel deployment of the two processes, the approach shown in this paper measures the energy consumption of a device by acquiring its input current signal continuously. This approach was developed to solve the problems found during the testing of a WSN algorithm [35]. During the tests, we needed to measure the energy consumption of a ZigBee mote that uses an algorithm for reducing energy consumption. Characterizing such consumption requires a high temporal resolution of the samples (as the mote activation period is very short), and therefore we could not measure the energy consumption with standard laboratory tools. We would have required an expensive DSP and a lot of memory to solve the problem. Instead, we deployed the approach presented in this paper, which allowed us to measure and characterize the energy consumption of our motes.
Our approach uses the "continuously" measurement method with a double-buffering technique to enable parallel data processing of the signal. The novelty of the approach is the combination of a double-buffering DAQ system with a new data processing method deployed on low-cost hardware. The results presented in this paper show how to obtain a high-sample-rate DAQ system using a processing pipeline composed of an ADC, two Direct Memory Access (DMA) channels and a state machine with two timers, minimizing data loss (maximizing the sample rate) when the data acquisition process discretizes the input current signal.
Although modern ADCs reach high sample rates (up to gigasamples per second), the ADCs integrated in microcontrollers are subject to limitations. The nominal ADC sample rate is limited by the processing time and by the communication time due to memory access. Consequently, the effective ADC sample rate is lower than the nominal one, and problems such as data loss will almost certainly arise from this limitation. To overcome this problem we can use two elements, available in most microcontrollers, that improve data transfer times and allow parallel sample processing:
-The cache memory. It is a Random Access Memory (RAM) faster than a regular RAM. In cache memory systems, the Central Processing Unit (CPU) looks first in the cache memory; if it finds the data there, it does not have to perform a more time-consuming read from a slower memory or other data storage, since the cache memory stores copies of frequently used data. Using this memory, microcontrollers can access data that will be used several times more quickly. Microcontrollers use data caches to transfer data between slower and faster memories, improving the access time.
-The Direct Memory Access (DMA). It is an electronic device that allows accessing the memory independently of the CPU. With the DMA, the CPU is not interrupted when there is a memory transfer managed by the DMA.
A DMA transfer typically consists of copying a block of memory from one device to another, notifying the CPU when the transfer is completed.
These two elements improve memory access times and enable parallel processing. The hardware we use has no internal cache memory, but it has a DMA unit. Therefore, our approach uses the DMA to perform data transfer and processing in parallel, as Lewis [23] suggests. Note that our approach implements a double buffer, allowing predictable access times using the DMA while the CPU processes the data without interruption. In this sense, the background of this paper is the development of high-performance routines for the data acquisition process and the data handling process running in parallel.
The contribution of this paper is an approach to calculate the energy consumption of a device supplied from a fixed DC voltage source, in a "continuously" way and with low data loss, using low-cost DAQ hardware. Specifically, our approach is composed of three parts: (i) low-cost DAQ hardware, (ii) a double-buffer data acquisition logic and (iii) a data processing algorithm. In this regard, the paper presents a novel algorithm and two distributed mathematical calculation blocks. On the one hand, the algorithm implements the data acquisition process and the data processing in parallel. On the other hand, the mathematical calculation blocks minimize the operations of the data processing in order to reduce the processing time.
This approach allows the use of less powerful microcontrollers and/or a lower clock frequency when analyzing energy consumption, thereby lowering costs. The results show that the energy measurement error of our approach is less than 5% with respect to the theoretical value. They also show that the on-chip ADC sampling rate limitations can be overcome, enabling energy consumption measurements of WSN motes during sleep cycles.
This article is organized as follows. Section 2 describes the setting for the energy measurement. Section 3 introduces the state of the art of DAQ systems for energy consumption measurement. Section 4 defines the problem of measuring energy for WSN motes. Section 5 provides an overview of the proposed energy measurement approach. Section 6 presents the hardware components used in our approach. Section 7 illustrates the process to acquire samples, improving the sample rate through a double-buffered mechanism by means of the algorithm introduced in Subsection 7.2. Section 8 shows how to analyze the samples in order to obtain the energy consumption. Section 9 states the limitations and temporal properties of the proposed approach. Section 10 presents the experiments performed to validate the proposed approach. Finally, a discussion, the conclusions and the future work are outlined.

Method
Our proposal for measuring the energy of WSN motes requires the following elements:
1. Hardware components. The hardware components support signal sampling and data processing. They contain the necessary elements to convert the input current into a voltage, to filter the signal, to discretize it, to store the samples, and to implement mathematical processes as well as data interchange protocols.
2. Data acquisition logic. The data acquisition logic is a software process that takes samples without stopping. It uses DMA channels, timers and ADC routines to avoid interrupting the CPU, saving processing time and internal memory communication time.
3. Data processing. The data processing is a mathematical method that analyzes the latest samples taken by the data acquisition logic and calculates the energy consumption. It stops when the calculations are finished and waits until new calculations are required. Its equations are arranged to reduce the computational cost, minimizing the required microcontroller power and the processing time.

State of the Art
The literature presents different hardware solutions, coupled with data acquisition and data handling processes, for measuring the energy consumption of an electronic device. In this section we review the hardware required to acquire signals and the two processes mentioned above.

Hardware Solutions
DAQ systems need hardware to take samples periodically. This hardware can be divided into transducers, ADCs and microcontrollers, in conjunction with amplifiers and filters. Using these elements, the literature offers different hardware solutions for energy measurement:
-By charging and discharging capacitors. The capacitors are placed between the power supply and the device to be measured. The energy consumption can then be obtained from the charge and discharge cycles of these capacitors, as Andersen et al. [1] suggested. We do not follow this method because it is an empirical solution that depends on the leakage currents of the capacitors.
-By a programmable ammeter. Another possibility is to measure the input current periodically to determine the energy consumption. However, a programmable ammeter is not enough for our purpose, since the time between measurements is too long and the burden voltage of common ammeters is very high, with the subsequent loss of accuracy.
-By sensors based on Faraday's law. These sensors are the Rogowski coil [31] and the current transformer [26]. They provide electrical isolation, enabling the measurement of currents circulating in high-voltage devices. These sensors are outside the scope of our approach, since they work at medium or low frequencies due to the hysteresis cycle of their components.
-By magnetic field sensors. The following sensors can be seen as current sensors that use magnetic fields in the A-mA range: current sensors based on the Hall effect [30], on tunnel magnetoresistance [21], on galvanomagnetic technology [9], on anisotropic magnetoresistance [3], on giant magnetoresistance [2] and on the fluxgate principle [22]. We do not use these sensors, since we need more precision than the sensors on the market that can be embedded in a Printed Circuit Board (PCB) provide.
-By recent current sensors based on tunnel magnetoresistive effects. García et al. [36] present a new sensor based on a tunnel magnetoresistive effect that improves on the older sensors based on magnetoresistive effects, since it provides electrical isolation and has self-heating properties due to the power dissipated in it. However, these sensors are currently not available on the market.
-By the switching cycles of a switching regulator. If switching regulators are used instead of linear regulators, there is a nearly linear relationship between the switching frequency and the load current over a wide dynamic range. This relationship implies that a fixed amount of energy is delivered per cycle. Therefore, the energy consumption can be calculated by counting the rising edges of the voltage across the connected inductor, as Dutta et al. [13] show with the iCount energy meter design. However, our approach uses a linear regulator instead of a switching regulator.
-By debugging and trace probes. The energy consumption of microcontrollers can be measured with tools such as I-Jet [16] for ARM Cortex microcontrollers. These devices provide power debugging via their Joint Test Action Group and Serial Wire Debug (JTAG/SWD) connection. They can also measure external power signals by adding other complementary devices such as I-scope. These tools are based on an energy analysis of the microcontroller code, like the EnergyTrace tool for MSP430 microcontrollers [34]. Nevertheless, the resolution of these devices is too coarse to measure WSNs (160 µA in the I-Jet device) and we want to improve on their sample rate (200 ksps in the I-Jet device [17]).
-By current sensors based on Ohm's law. This option measures the voltage drop across a shunt resistor with a low-noise chopper amplifier. The voltage drop across the shunt resistor is used as a proportional measure of the current flow, as illustrated by Ziegler et al. [38]. The accuracy of the shunt resistor is sometimes poor, but with a high-precision resistor the results are very good. We have chosen this solution because the error can be below 0.1%, the thermal drift can be less than 25 ppm/K and the range can be A-mA. Moreover, it supports the highest frequencies thanks to the direct Ohm's-law conversion, using small and low-cost components.
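The Ohm's-law option involves a trade-off between signal level and burden voltage. The minimal sketch below illustrates it, assuming a shunt resistor followed by an amplifier of gain G. The resistor and gain values are hypothetical and the function names are illustrative only.

```c
/* Ohm's-law current sensing with a shunt resistor followed by an
 * amplifier of gain G. The shunt drop is also the burden voltage that
 * the device under test sees. All component values are hypothetical. */

/* Voltage drop across the shunt (= burden voltage), in volts. */
double shunt_drop_v(double current_a, double r_shunt_ohm)
{
    return current_a * r_shunt_ohm;
}

/* Amplified voltage delivered to the ADC, in volts. */
double amplified_v(double current_a, double r_shunt_ohm, double gain)
{
    return gain * shunt_drop_v(current_a, r_shunt_ohm);
}

/* Inverse step applied to each sample: recover the current, in amperes. */
double current_from_v(double v_out, double r_shunt_ohm, double gain)
{
    return v_out / (gain * r_shunt_ohm);
}
```

Consider a hypothetical 0.1 Ω shunt with a gain of 100. A 10 mA active current produces a burden voltage of only 1 mV and 100 mV at the ADC. A 5 µA sleep current, however, yields just 50 µV at the ADC. This is one reason for using several switchable shunt resistors when both µA and mA currents must be measured.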

Data Acquisition Processes
The double-buffering technique is the main method used to sample data in real-time DAQ systems. It is not a new technique [20]; for instance, Gay et al. [14] presented an early DAQ system with a multiprocessor double-buffering technique. However, our approach uses only one processor.
Some DAQ systems with double-buffering techniques run an ARM processor in parallel with a Field Programmable Gate Array (FPGA) to take samples and process them, as in the work of Li et al. [24]. In contrast, we use the same device to take samples and process them. Other ARM DAQ systems with double-buffering techniques incorporate external ADC devices in order to take samples more quickly and store them in specific hardware [32]. Our approach, however, improves the sample rate without any external ADC device.
In addition, applications in other disciplines use double-buffering techniques to take samples (or transfer them between memories) while the data are processed in the same processor, as Tan et al. [33] suggest. This double-buffering technique has been used in image processing [39], parallel processing [25] and ultrasound imaging methods [4]. However, our approach adds to this double-buffering technique a processing pipeline composed of timers, interrupts and a state machine. This processing pipeline allows us to set the sample rate, giving complete control over the DAQ.

Data Handling Processes
Regarding the data handling processes, we have considered that the supply current signal is not constant. Therefore, we must rely on the classical definition of electrical energy, taking into account the power consumption at each discrete time interval. The newest electrical smart meters [8] carry out this same process.
Smart meters [28] work with alternating current, whereas we work with direct-current signals. Even so, both their signals and ours are random, depending on the energy consumption at every moment. The difference is that we optimize the mathematical calculation in order to minimize the operations carried out by the microcontroller, improving the processing time.
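With a constant supply voltage V and samples taken every Δt = 1/fs, the classical definition reduces to E ≈ Σ V·I_k·Δt = V·Δt·ΣI_k. The minimal sketch below shows one way to exploit this factorization to reduce the per-sample operations. It assumes a linear ADC-code-to-current factor, and the parameter names are hypothetical. It is not necessarily the exact formulation of Section 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Energy of one window of current samples when the supply voltage V is
 * constant and samples are taken every dt = 1 / sample_rate_hz:
 *     E = sum_k V * I_k * dt = V * dt * sum_k I_k
 * Factoring V and dt out of the sum leaves only integer additions inside
 * the loop; the floating-point scaling is done once per window. */
double window_energy_j(const uint16_t *codes, size_t n,
                       double amps_per_code,  /* ADC code -> current (A) */
                       double supply_v,       /* constant supply (V) */
                       double sample_rate_hz) /* fixed sample rate (Hz) */
{
    uint32_t sum = 0;  /* 12-bit codes: no overflow below ~1,048,000 samples */
    for (size_t k = 0; k < n; ++k)
        sum += codes[k];
    return supply_v * amps_per_code * (double)sum / sample_rate_hz;
}
```

This design needs n integer additions plus a constant number of floating-point operations per window. Computing V·I_k·Δt for each sample would instead cost two multiplications per sample.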
The difference between these works and our approach is that our work combines them to implement a method for energy measurement that is as efficient as possible on a continuous basis, with only one processor and without external off-chip ADC components, when the supply voltage does not change. Our approach uses transducers and a data acquisition logic together with a data processing method, providing a practical solution to energy measurement when the input current signal changes very quickly. Our work can perform other tasks in parallel while measuring energy, helping to solve the problem of real-time energy metering with minimal resources.

The Addressed Problem
This section defines the problem addressed by our energy consumption measurement approach. We also define the magnitude of the measured signals and the required sample rate.
Our initial objective was a method to measure the energy consumption of ZigBee motes (ZigBee being a WSN protocol). We observed that the input current signal of the ZigBee motes varied very quickly in short periods of time. This is because the motes are in an idle state when no data is sent and in a very short active state when data is sent. In the idle state the motes consume a negligible amount of energy, but in the active state they consume a lot of energy in short periods of time. Therefore, the energy consumption signal of the motes is dominated by short energy peaks.
There are tools, such as AVAKIS [11], that estimate the energy consumption of a sensor mote by analyzing the software and the energy consumption of each component of the mote. Notwithstanding, we wanted to measure the energy consumption of the motes physically and continuously with a real-time application. One possible approach was to assess only the energy peaks and ignore the energy consumption in the idle state. However, we wanted as much information as possible about the energy consumption of every state, that is, we also wanted to take into account the energy consumption of the idle state.
The most important consumers of a mote are the microcontroller and the radio transceiver. Therefore, we can estimate the total current drain as the sum of the microcontroller current drain and the radio transceiver current drain. We obtained the current drain values for industrial and academic motes from previous works [35], as shown in Table 1. The order of magnitude of the current drain for the sleep state and the active state is µA and mA, respectively. In order to sample the current drain signal we also need to consider its spectrum. In this regard, Casilari et al. [5] study the characterization of the energy consumption signal. The spectrum of the signal depends on the duty-cycling algorithm of the mote and the size of the sensed data. On the one hand, the duty-cycling algorithm controls the transitions between the active state and the idle state and, therefore, the frequency of the energy peaks. On the other hand, the size of the sensed data determines the width of the energy peaks. Casilari et al. analyze the current drain signal of commercial ZigBee motes using an oscilloscope at 50 KSPS (kilosamples per second). They divide the analysis according to the ZigBee communication sequence: start-up, scan, association, transmission, and sleep state (idle state). Their analysis shows the current drain signal of the start-up, scan, association and transmission phases at 50 KSPS, but that sample rate still does not allow them to analyze the sleep state over time.
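The sketch below illustrates why the idle state cannot simply be ignored. It models the average current of a duty-cycled mote and the share drawn while sleeping. The currents used in the example (20 mA active, 5 µA asleep) are hypothetical round numbers, not measured values, and the function names are illustrative.

```c
/* Average current of a duty-cycled mote: active for a fraction `duty` of
 * the time, asleep otherwise. Currents in microamperes; the values used
 * in the example are hypothetical, not measured. */
double average_current_ua(double duty, double active_ua, double sleep_ua)
{
    return duty * active_ua + (1.0 - duty) * sleep_ua;
}

/* Share of the average current (and, at constant supply voltage, of the
 * energy) that is drawn while the mote sleeps. */
double sleep_share(double duty, double active_ua, double sleep_ua)
{
    return (1.0 - duty) * sleep_ua
         / average_current_ua(duty, active_ua, sleep_ua);
}
```

With these values, the sleep state accounts for about 2.4% of the energy at a 1% duty cycle and for about 20% at a 0.1% duty cycle. The lower the duty cycle, the more the sleep state matters.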
Focusing on the family of Berkeley motes (WeC, Rene, Rene2, Dot, Mica, Mica2Dot, Mica2 and Telos [29]), we can observe that the wake-up times of these devices are very short (from 1000 µs down to 6 µs). Therefore, we have to maximize the sample rate if we want to detect when the mote wakes up for a short period of time, for instance, when it wakes up so as not to be ejected from the network due to a downtime timeout.
Therefore, maximizing the signal sampling frequency is a priority to measure the energy consumption of the sleep state.In this regard, we estimate that doubling the sample rate over 95 KSPS is sufficient to measure the energy consumption of the sleep state.In this paper we present an energy consumption measure approach that covers these specifications.
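The sample-rate argument can be made concrete. The sketch below computes how many samples are guaranteed to fall inside a current pulse of a given width, whatever the pulse's phase relative to the sampling clock. The example pulse of 6 µs corresponds to the shortest wake-up time in the Berkeley mote family (Telos). The function names are hypothetical.

```c
#include <stdint.h>

#define NS_PER_S 1000000000ULL

/* Samples taken every 1/rate_hz seconds hit any pulse of width pulse_ns
 * at least floor(pulse_ns * rate_hz / 1e9) times, whatever the pulse
 * phase. Integer arithmetic avoids floating-point rounding. */
uint64_t min_samples_in_pulse(uint64_t pulse_ns, uint64_t rate_hz)
{
    return pulse_ns * rate_hz / NS_PER_S;
}

/* Lowest sample rate (Hz) guaranteeing at least one sample inside every
 * pulse of width pulse_ns: ceil(1e9 / pulse_ns). */
uint64_t min_rate_for_one_sample(uint64_t pulse_ns)
{
    return (NS_PER_S + pulse_ns - 1) / pulse_ns;
}
```

At 50 KSPS, a 6 µs pulse may fall entirely between two samples, since min_samples_in_pulse(6000, 50000) is 0. A 1000 µs pulse is always hit at least 50 times. Guaranteeing at least one sample in every 6 µs pulse requires about 167 KSPS.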

The General Approach
This section provides an overview of the developed approach and the parts in which it is divided.
As explained before, traditional approaches do not measure the energy consumption of the motes at an adequate sampling rate, losing relevant information. Therefore, we want a system that improves the sampling rate in order to properly characterize the energy consumption of electronic devices. To this end, we propose an approach for analyzing the energy consumption of electronic devices that maximizes the sample rate of commercial microcontrollers for adequate data gathering.
The most important problem we had to overcome was improving the real sample rate of our DAQ system. Commercial microcontrollers with an ADC have a nominal sample rate that is never reached in practice, due to the limitations of the CPU architecture and of the internal communication protocol of the microcontroller. The CPU processing time and the time needed to transfer information between memories introduce latencies before the data in the ADC registers can be used. The useful sample rate of the ADC was therefore impaired, and the temporal resolution was not enough for our energy consumption measurement application.
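The gap between the nominal and the useful sample rate can be illustrated with a deliberately simplified per-sample timing model. The model assumes an ADC conversion time t_conv (about 833 ns for the 1.2 MSPS nominal rate of the LPC824) and a hypothetical time t_move to move one result from the ADC register to memory. It ignores bus contention and interrupt jitter.

```c
/* Simplified per-sample timing model (all values hypothetical, in ns):
 *   t_conv - ADC conversion time (1.2 MSPS nominal is about 833 ns)
 *   t_move - time to move one result from the ADC register to memory
 * When the CPU moves each result before starting the next conversion,
 * the two times add up. When a DMA channel moves the result while the
 * next conversion is already running, the two stages overlap and the
 * slower one sets the period. */
double sequential_rate_sps(double t_conv_ns, double t_move_ns)
{
    return 1e9 / (t_conv_ns + t_move_ns);
}

double overlapped_rate_sps(double t_conv_ns, double t_move_ns)
{
    double period_ns = (t_conv_ns > t_move_ns) ? t_conv_ns : t_move_ns;
    return 1e9 / period_ns;
}
```

Suppose the CPU spends a hypothetical 2 µs handling each sample. The sequential scheme then drops to roughly 353 KSPS. Overlapping a short (hypothetical 200 ns) DMA transfer with the next conversion instead keeps the rate at the nominal 1.2 MSPS.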
As a result, we designed a new approach in which the real sample rate is improved. The following sections explain step by step how we did it, from the general approach to the implementation details. Our system contributes to solving the problem of calculating the energy consumption when the input current signal is not band-limited and the supply voltage is constant. The idea is to periodically sample the input current of the electronic device and to calculate the energy consumption in parallel when the supply voltage is constant, i.e., when the electronic device is supplied by a voltage regulator.
To cope with the problem, we decided to create a new and fast energy measurement approach. In addition to those set out above, the new approach had to meet the following requirements:
-To use low-cost hardware. A current challenge in consumer electronics is to reduce costs. Consequently, the technique should be implemented on low-cost hardware.
-Low power consumption. Low energy consumption is needed to guarantee long battery life.
-Minimizing data loss and maximizing the sample rate. The sample rate should be as high as possible, finding a compromise solution.
-Continuity over time. The approach measures the energy consumption without storing large amounts of data and without delays in signal sampling over time.
In order to meet all the requirements, the approach consists of the three elements described in Section 2.
The two software processes run in a cyclic scheme and are performed in parallel. To deploy them, we have designed an algorithm that programs the DAQ system taking into account the hardware components and their characteristics. This algorithm forms the core of the approach and implements the sample rate improvement.
The following three sections delve into the hardware components and the two processes, describing the algorithm that implements the double-buffered mechanism in Sec. 7 and the equations that compute the energy consumption in Sec. 8.

Hardware Components
This section presents the steps the hardware performs to take samples of the target input signal. Each step is carried out by a hardware component, so each component is explained together with its associated step.
The following steps acquire discrete samples of the input current to feed the data processing of Section 8:
1. Convert the input current signal into a voltage signal.
2. Filter the noise with capacitors.
3. Amplify the voltage signal.
4. Convert the analog signal into a digital signal.
5. Store the discrete values in memory.
For the first step, we need to work with voltage values. For this purpose, a shunt resistor converts the input current signal into a directly proportional voltage through Ohm's law. The resistor has a low error (ppm), so it barely distorts the original signal. In the second step, capacitors filter the high-frequency noise, since this noise distorts the current measurement. In the third step, the signal is amplified by a chopper amplifier, which eliminates the offset voltage and the typical 1/f noise of the operational amplifier. In the fourth step, the analog signal is converted into a digital signal using an ADC. Finally, the digital values are periodically saved in memory.
To address the above steps, we have used the commercial µCurrent device, as Geng et al. [15] suggest. The µCurrent is an electronic device that overcomes the burden voltage problems of commercial ammeters to allow current measurements. Technically, the µCurrent already incorporates all the electronic components needed for steps 1, 2 and 3. Indeed, it incorporates three shunt resistors that provide different conversion ranges (i.e., different Ohm's-law gains). In this regard, the accuracy is better than 0.2% on the µA and nA ranges, and better than 0.5% on the mA range. The µCurrent also incorporates a MAX4239 chopper amplifier and filter capacitors. We decided to use the µCurrent to expedite the deployment of the data acquisition logic and the data processing. Its most important characteristics are:
-Current ranges: 0-1250 mA/µA/nA. The range changes depending on the selected shunt resistor.
-Burden voltage: 20 µV/mA for the mA range, 10 µV/µA for the µA range and 10 µV/nA for the nA range.
-Output: 1 mV/mA for the mA range, 1 mV/µA for the µA range, 1 mV/nA for the nA range.
-Total Harmonic Distortion (THD) below -60 dB.
Furthermore, we have chosen a Cortex-M0+ microcontroller (LPC824) to support steps 4 and 5, using the on-chip ADC and the on-chip memory of the LPC824. The LPC824 is a 32-bit ARM Cortex-M0+ microcontroller from NXP Semiconductors. We chose it because it has a 12-bit ADC whose offset and full scale can be configured through its pins, an ADC conversion rate of up to 1.2 MSamples/s (nominal), and an on-chip ROM API to control the ADC.
To store the sampled data, the LPC824 has 8 KB of Static Random Access Memory (SRAM), plus 32 KB of flash memory for the program instructions. It also has a DMA controller with multiple channels and triggers to implement the algorithm, as well as timers to drive the triggers. At the time of writing, the Cortex-M0+ was presented as the most energy-efficient processor available (9.4 µW/MHz). Fig. 1 illustrates the hardware connections used for the previous steps. To convert the input current signal into a voltage signal (amplified and filtered), the µCurrent is wired in series with the device under assessment. To obtain the digital signal and store the samples in memory, the LPC824 is connected to the µCurrent output. Finally, the LPC824 is connected to a PC through a serial interface in order to log the energy consumption over time. The µCurrent output is ground-referenced, and therefore the LPC824 and the µCurrent share the same common ground.
Note that the µCurrent range selection determines which shunt resistor fixes the current range of the signal. Also, the analog pins Vrefp and Vrefn set the full scale and Vdd sets the offset, in order to achieve the highest ADC resolution for the voltage amplitude at the µCurrent output. Regarding the PC connection, a serial interface is used, wiring the RxD and TxD pins of the microcontroller to the USB port of the PC. In this way, the hardware is configured depending on the range of the input current signal and its temporal resolution. This configuration is explained in Sections 7 and 8.
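The chain from ADC code to current can be summarized in a few lines. The minimal sketch below assumes an ideal linear 12-bit transfer in which code 0 maps to Vrefn and code 4095 maps to Vrefp. It uses the front end's output scale in mV per range unit (1 mV/mA, 1 mV/µA or 1 mV/nA, as listed above). The reference voltages in the example and the function names are hypothetical.

```c
#include <stdint.h>

#define ADC_FULL_CODE 4095.0  /* 12-bit ADC: codes 0..4095 */

/* ADC code -> input voltage (mV), assuming an ideal linear transfer in
 * which code 0 maps to VREFN and code 4095 maps to VREFP. */
double code_to_mv(uint16_t code, double vrefn_mv, double vrefp_mv)
{
    return vrefn_mv + (vrefp_mv - vrefn_mv) * (double)code / ADC_FULL_CODE;
}

/* ADC code -> current in the selected range unit (mA, uA or nA), given
 * the front-end scale in mV per unit (1.0 for a 1 mV/unit output). */
double code_to_current(uint16_t code, double vrefn_mv, double vrefp_mv,
                       double mv_per_unit)
{
    return code_to_mv(code, vrefn_mv, vrefp_mv) / mv_per_unit;
}
```

Take hypothetical references of 0 mV and 3300 mV. One code step is then about 0.81 mV, i.e. about 0.81 µA on the µA range. Narrowing the span between Vrefn and Vrefp to the actual signal swing reduces this step, which is the purpose of the full-scale configuration.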

Data Acquisition Logic
The aim of this section is to describe the logic used for continuous sampling, allowing parallel data processing as presented in Sec. 8.
The data acquisition logic is a pipeline process consisting of a chain of LPC824 peripherals that are programmed individually. Accordingly, this section is divided into two subsections, 7.1 and 7.2. The first deals with the LPC824 peripherals required to implement the data acquisition logic, clarifying their characteristics and justifying their use. The second describes in detail the algorithm used to program the peripherals. To this end, it contains a graph of the processing pipeline, a graph of the state machine deployed by the peripherals and the pseudocode of the algorithm.
Note that the data processing is executed within this algorithm. To keep the presentation of the algorithm short, this section only indicates where the data processing takes place; the data processing itself is described in Sec. 8.

Required Peripherals
The characteristics of the peripherals used by the data acquisition logic are described below. They are first listed and then explained.
-SCTimer.
-ADC.
-Two DMA channels.
-Two buffers of SRAM.
-UART.
In terms of synchronization, the sample rate must be known, because the mathematical method presented in Sec. 8 needs it as an input. This value is fixed by the data acquisition logic, which ensures the correct discretization of the original input current signal. To this end, the starting point of the data acquisition logic is the LPC824's SCTimer, as shown in Fig. 2. The SCTimer is a peripheral designed to implement state machines, and we use it to fix the sample rate of the ADC. Its configuration includes 16- or 32-bit down-counting and up-counting timers, events and states. The events program the transitions between states depending on whether they are activated or not. Events can be activated by an external peripheral trigger or by the internal timers; in addition, an activated event can trigger external peripherals.
An ADC is essential to discretize the input signal, since the technique operates digitally. The ADC also stores the discrete value of the signal in a register for further processing. As shown in Fig. 2, the ADC takes a sample each time it is triggered by the SCTimer.
Besides the ADC, the LPC824 has a DMA controller with 18 channels that can be programmed independently. Each channel performs memory data transfers and is programmed through its own control registers to exchange data between memories and LPC824 peripherals such as the ADC. A data transfer can be initiated by 9 trigger sources, grouped into internal events, pin interrupts, peripheral requests and DMA outputs. Our approach uses only two channels, channel 0 and channel 1, which are triggered by the ADC when it finishes taking a sample.
Furthermore, the LPC824 includes 8 kB of on-chip static RAM organized in two separate 4 kB blocks. Our approach uses both 4 kB blocks to store the sampled data, deploying two buffers to manage the data. Finally, the LPC824 has three UARTs for exchanging information; the technique uses one UART to send the measured information to a PC, which displays the measured energy consumption.

The Algorithm to program the DAQ system
In this subsection, we present the algorithm that programs the peripherals of the data acquisition logic as a pipeline. Algorithm 1 also marks where the data processing (Sec. 8) takes place: lines 39 and 42.
The algorithm uses two DMA channels that transfer data between the ADC data register and two buffers in SRAM. Each DMA channel has an associated SRAM buffer: BUFFER0 and BUFFER1. As shown in Fig. 2, the idea is to fill the SRAM buffers with the sampled values of the input current signal: while one buffer is being filled, the other is used to calculate the energy consumption, and vice versa. This scheme increases the effective ADC sample rate while allowing continuous double buffering.
As stated previously, the algorithm uses the SCTimer to fix the sample rate. The SCTimer implements a state machine that controls the memory transfer, as shown in Fig. 3, by triggering the ADC; the ADC takes one sample each time it is triggered by the SCTimer.
The state machine is composed of the following elements, which trigger the ADC at a given frequency:
-One state: State1.
-Two events, one associated with each timer: !timer 0 and !timer 1.
-An action to trigger the ADC: SCT trigger set.
-An action to clear the ADC activation input: SCT trigger clean.
As Fig. 3 illustrates, the SCTimer implements a Mealy state machine composed of one state, two inputs and two outputs. The inputs are the values of two auto-reload down-counting timers (timer 0 and timer 1). The outputs are the two events, one associated with each timer (!timer 0 and !timer 1); each event is activated when the value of its associated timer reaches 0. The frequency of timer 1 is twice that of timer 0, and the priority of the !timer 0 event is higher than that of the !timer 1 event.
To trigger the ADC, the state machine uses a signal of the SCTimer peripheral (SCTimer trigger), which is connected to the ADC activation input through the ADC configuration register. The ADC takes a sample when it detects a rising edge on its activation input, that is, a rising edge of the SCTimer trigger.
Therefore, when the !timer 0 event is activated, the SCT trigger set action drives the SCTimer trigger high and the ADC is activated. When the !timer 1 event is activated, the SCT trigger clean action drives the SCTimer trigger low and the ADC is deactivated.
The state machine thus activates the SCTimer trigger at a given frequency, and the SCTimer trigger activates ADC sampling at that same frequency. On the one hand, giving the SCT trigger set action priority over the SCT trigger clean action guarantees that the SCTimer trigger is activated once per cycle, even when both events occur at the same instant. On the other hand, a falling edge of the SCTimer trigger between consecutive ADC samples is guaranteed by making the SCT trigger clean action occur at twice the frequency of the SCT trigger set action, that is, by setting the timer 1 frequency to twice the timer 0 frequency.
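To see why the priority and the 2:1 frequency ratio matter, the following Python model steps through the one-state machine tick by tick. It is a host-side sketch, not LPC824 register code: the 30 MHz counter clock, the 100 kHz sample rate and the helper names `sct_period_ticks` and `simulate_sct` are hypothetical, and both timers are assumed to start synchronized.

```python
def sct_period_ticks(f_clk_hz: float, f_sample_hz: float) -> int:
    """Timer 0 period in counter ticks for the requested sample rate (hypothetical helper)."""
    ticks = round(f_clk_hz / f_sample_hz)
    if ticks < 2 or ticks % 2:
        raise ValueError("timer 0 period must be an even number of ticks >= 2")
    return ticks


def simulate_sct(period0: int, n_ticks: int) -> list[int]:
    """Model of the one-state SCTimer machine; returns the ticks where the trigger rises."""
    period1 = period0 // 2          # timer 1 runs at twice the frequency of timer 0
    count0, count1 = period0, period1
    trigger = 0                     # level of the SCTimer trigger seen by the ADC
    rising_edges = []
    for tick in range(n_ticks):
        count0 -= 1
        count1 -= 1
        ev_timer0 = count0 == 0     # !timer 0 event
        ev_timer1 = count1 == 0     # !timer 1 event
        if ev_timer0:
            count0 = period0        # auto-reload
        if ev_timer1:
            count1 = period1
        if ev_timer0:               # higher priority: SCT trigger set
            new_level = 1
        elif ev_timer1:             # lower priority: SCT trigger clean
            new_level = 0
        else:
            new_level = trigger
        if new_level and not trigger:
            rising_edges.append(tick)  # the ADC would take one sample here
        trigger = new_level
    return rising_edges


period0 = sct_period_ticks(30e6, 100e3)      # hypothetical 30 MHz clock, 100 kHz sampling
edges = simulate_sct(period0, 10 * period0)
print(period0, len(edges), {b - a for a, b in zip(edges, edges[1:])})  # → 300 10 {300}
```

In this model every !timer 0 event coincides with a !timer 1 event, so if the clean action won those conflicts the trigger would never go high; giving the set action priority yields exactly one rising edge, and hence one ADC sample, per timer 0 period.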
Concerning the ADC, it is programmed for maximum resolution and the highest nominal sampling rate, and to take one sample each time it is triggered by the SCTimer. It has a trigger signal (ADC trigger) that activates other peripherals when a sample is complete; this ADC trigger is connected to the DMA activation input. When the ADC finishes taking a sample, it generates a trigger on the two DMA channels to transfer data from the ADC register (where the sampled value is stored) to the associated SRAM buffers.
Regarding the DMA, the two DMA channels are configured on a "one-shot" basis, which means that each DMA channel exchanges information once per request. A request occurs when the ADC finishes taking a sample and triggers the DMA. At any time, one DMA channel is enabled and the other is disabled, so the trigger produced by the ADC is served only by the enabled channel. When the DMA is triggered, the enabled channel copies the value stored in the ADC register into its associated SRAM buffer. When this buffer is completely filled, the channel stops until it is reset, while the other DMA channel becomes active once it has been reset.
The DMA channels fill their associated SRAM buffers cyclically and alternately: first the buffer associated with DMA channel 0 (BUFFER0) and then the one associated with DMA channel 1 (BUFFER1). Only one DMA channel exchanges data until its associated buffer is full, while the other channel is disabled. When a DMA channel fills its associated buffer, an interrupt raises a program exception in the CPU. An Interrupt ReQuest (IRQ) handler then determines which channel raised it and clears the associated flag. After that, the channel that did not cause the interrupt is reset to be filled again, while the other one stays disabled. Meanwhile, the data in the idle buffer are used by the data processing method (detailed in Sec. 8) to calculate the energy consumption.
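The alternation between the two one-shot channels can be checked with a small host-side model. This is a sketch under stated assumptions, not firmware: the class `PingPongDaq`, its methods and the tiny buffer size are hypothetical, and processing runs sequentially here, whereas on the LPC824 it overlaps with the DMA filling the other buffer.

```python
from typing import Optional


class PingPongDaq:
    """Host-side model (not LPC824 code) of two one-shot DMA channels
    that alternately copy ADC results into BUFFER0 and BUFFER1."""

    def __init__(self, buffer_size: int) -> None:
        self.buffer_size = buffer_size
        self.buffers: list[list[int]] = [[], []]  # BUFFER0, BUFFER1
        self.enabled = [True, False]              # channel 0 makes the first exchange
        self.done_flag = [False, False]           # "transfer finished" flags

    def adc_trigger(self, result: int) -> Optional[int]:
        """One ADC conversion; returns the full buffer's index if a DMA IRQ fired."""
        ch = self.enabled.index(True)   # only the enabled channel serves the trigger
        self.buffers[ch].append(result)
        if len(self.buffers[ch]) == self.buffer_size:
            self.enabled[ch] = False    # one-shot: the channel stops when its buffer is full
            self.done_flag[ch] = True
            return self.handle_irq_dma()
        return None

    def handle_irq_dma(self) -> int:
        """IRQ handler: find which channel finished and clear its flag."""
        finished = self.done_flag.index(True)
        self.done_flag[finished] = False
        return finished

    def restart(self, ch: int) -> None:
        """Reset a channel so that it fills its buffer again from the start."""
        self.buffers[ch] = []
        self.enabled[ch] = True


def run(results: list[int], buffer_size: int) -> list[tuple[int, list[int]]]:
    """Feed ADC results through the model; return (buffer index, data) in processing order."""
    daq = PingPongDaq(buffer_size)
    processed = []
    for r in results:
        finished = daq.adc_trigger(r)
        if finished is not None:        # the main loop wakes up from WFI()
            daq.restart(1 - finished)   # lines 38/41: enable the other channel
            # lines 39/42: on the hardware this overlaps with the other buffer filling
            processed.append((finished, list(daq.buffers[finished])))
    return processed


print(run(list(range(12)), 4))
# → [(0, [0, 1, 2, 3]), (1, [4, 5, 6, 7]), (0, [8, 9, 10, 11])]
```

The model makes the invariant explicit: exactly one channel listens to the ADC at a time, and every sample ends up in exactly one processed buffer, in order.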
If a "one-shot" DAQ system were needed instead, it would suffice to remove the double-buffer part of the data acquisition logic: use a single DMA channel and its associated SRAM buffer, and configure the SCTimer with a single down-counting timer without auto-reload. This would allow measuring the energy consumption over one specific, bounded time window, which is not the goal of this paper.
Algorithm 1 describes the program flow embedded in the LPC824 microcontroller. Execution starts at line 29 by configuring the DMA, the ADC and the SCTimer (functions on lines 1, 8 and 13). CONFIGURE_DMA() configures the information transfer: it selects the source (line 2) and the destinations (lines 3 and 4), connects the DMA activation input to the ADC trigger (line 5) and finally enables one of the two channels for the first information exchange (line 6). CONFIGURE_ADC() selects the fastest ADC configuration for taking samples (line 9) and connects the ADC activation input to the SCTimer trigger signal (line 10). CONFIGURE_SCT() configures the ADC sample rate (lines 14 and 15) by synchronizing the timer 0 and timer 1 frequencies, taking into account that the timer 1 frequency is twice the timer 0 frequency. It also sets the auto-reload mode of each timer (line 16), configures the priority of the events (lines 18 and 19) and associates the events with the actions that trigger the ADC (lines 21 and 22). In other words, this function implements the Mealy state machine.
Once the DMA, the ADC and the SCTimer are configured, a buffer is filled while the program thread waits in the WFI() function (line 35). When the buffer is full, an interrupt raises a program exception that is captured by the handler defined on line 24. This handler identifies which DMA channel generated the interrupt and, therefore, which buffer is full. The program then continues on line 38 or 41 to reset the channel that did not generate the last exception, while the other channel remains disabled.
Then the processing of the buffer that generated the last exception begins, in parallel with the data acquisition: the program processes the last filled buffer (line 39 or 42) while the other buffer is being filled, until the next DMA interrupt. Note that the program has only one thread; the parallelism comes from the DMA filling one buffer in hardware while the CPU processes the other.
The variable n_buffers (line 34) is the maximum number of filled buffers that the data processing accumulates before sending the measurements to the PC via the UART. This number depends on the SRAM buffer size and the ADC sample rate, and it is calculated in Sec. 9. After each transmission, n_buffers is reset and the energy consumption measurement continues.

Data Processing

In the previous section, the data acquisition logic takes samples that must be processed to compute the energy consumed. This section explains the method used to process these data, i.e., to compute the energy.
To obtain the energy consumption, we use Eq. 1, where E is the supply voltage of the measured device, I_j is the input current at a given instant, T is the sampling period of the input current signal and k is the number of samples taken.
Then, Eq. 2 gives the voltage measured by the ADC (V_ADC), where ADCbits is the number of bits of the ADC (12 bits in our case), result is the output value of the ADC, V_refp is the maximum input voltage of the ADC and V_refn is the minimum input voltage of the ADC. V_ADC corresponds to the current measured by the µCurrent device (I_µCurrent), since the µCurrent outputs a voltage proportional to that current.
Since I_j = I_µCurrent for all j ∈ [1, k], i.e., each I_j corresponds to one sample taken through the µCurrent, Eqs. 1 and 2 can be combined into Eq. 3. All variables in Eq. 3 are constant except result, which varies with the value of I_j. Therefore, Eq. 3 can be split into the system presented in Eq. 4, where the independent variable x is the accumulation of k result values and C_1 and C_2 are constants. This system can be handled as two computational blocks: 4.a and 4.b.
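Equations 1 to 5 are not reproduced in this text. As a hedged sketch, the derivation below shows one way they fit together, under assumptions that are ours rather than the authors': Eq. 1 is the sum of E · I_j · T over the k samples, Eq. 2 is a linear ADC transfer function with full scale 2^ADCbits − 1, and the µCurrent output relates to the current through a gain G (a symbol introduced here; G = 1 in the selected unit matches the statement that V_ADC coincides with I_µCurrent).

```latex
% Assumed Eq. 1: energy as a sum over k samples of period T
W = \sum_{j=1}^{k} E\, I_j\, T

% Assumed Eq. 2: linear ADC transfer, with I_j = V_{ADC,j} / G
V_{ADC,j} = V_{refn} + \frac{result_j\,(V_{refp} - V_{refn})}{2^{ADCbits} - 1}

% Substituting Eq. 2 into Eq. 1 (the role played by Eq. 3)
W = \frac{E\,T}{G} \sum_{j=1}^{k} \left( V_{refn} + \frac{result_j\,(V_{refp} - V_{refn})}{2^{ADCbits} - 1} \right)
  = \underbrace{\frac{k\,E\,T\,V_{refn}}{G}}_{C_1}
  + \underbrace{\frac{E\,T\,(V_{refp} - V_{refn})}{G\,(2^{ADCbits} - 1)}}_{C_2}\; x

% Splitting into the two computational blocks (the role played by Eq. 4)
x = \sum_{j=1}^{k} result_j \quad \text{(4.a, LPC824)} \qquad
W = C_1 + C_2\, x \quad \text{(4.b, PC)}

% Accumulating over n_sends transmissions (the role played by Eq. 5)
W_{total} = \sum_{i=1}^{n_{sends}} \left( C_1 + C_2\, x_i \right)
```

Under these assumptions C_1 depends on k, so it is constant only because every transmission carries the same number of samples; with V_refn = 0 V it vanishes and the energy is proportional to x alone.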
Computational block 4.a is processed in the LPC824 (lines 39 and 42 of Algorithm 1), and computational block 4.b is processed in the PC. Splitting the data processing into two blocks reduces the computational effort in the LPC824, while the PC displays the energy consumption. We chose this separation because minimizing the LPC824 processing time maximizes the sample rate; this is possible because sampling and data processing run in parallel.
Regarding computational block 4.a, the result values are stored in BUFFER0 and BUFFER1 by the data acquisition logic. The LPC824 then calculates x by accumulating the result values and sends x to the PC through the UART. These operations are repeated cyclically.
Regarding computational block 4.b, each time the PC receives a new value of x, it multiplies x by C_2 and adds C_1. These operations are also cyclical. Therefore, the PC calculates the energy consumption over time using Eq. 5, where n_sends is the number of transmissions made to the PC.
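The split between the two computational blocks can be sketched in Python. This assumes Eq. 1 is the sample-by-sample sum of E · I_j · T and Eq. 2 is a linear ADC transfer with full scale 2^ADCbits − 1, with the current recovered through a µCurrent gain G in V/A; the gain, the sample values and all function names are hypothetical. The final check confirms that C_1 + C_2 · x reproduces the sample-by-sample sum exactly.

```python
import math

ADC_BITS = 12  # ADC resolution stated in the text


def accumulate_x(buffers: list[list[int]]) -> int:
    """Block 4.a (LPC824 side): sum the raw ADC results of the buffers filled between sends."""
    return sum(sum(buf) for buf in buffers)


def constants(E: float, T: float, k: int, v_refp: float, v_refn: float,
              gain: float, adc_bits: int = ADC_BITS) -> tuple[float, float]:
    """C1 and C2 under the assumed forms of Eqs. 1 and 2 (gain G in V/A)."""
    full_scale = 2 ** adc_bits - 1
    c1 = k * E * T * v_refn / gain
    c2 = E * T * (v_refp - v_refn) / (gain * full_scale)
    return c1, c2


def energy_per_send(x: int, c1: float, c2: float) -> float:
    """Block 4.b (PC side): energy represented by one received x."""
    return c1 + c2 * x


def direct_energy(results: list[int], E: float, T: float, v_refp: float,
                  v_refn: float, gain: float, adc_bits: int = ADC_BITS) -> float:
    """Eq. 1 evaluated sample by sample, used only to cross-check the split."""
    full_scale = 2 ** adc_bits - 1
    return sum(E * (v_refn + r * (v_refp - v_refn) / full_scale) / gain * T
               for r in results)


# Hypothetical run: 2 V supply, 100 kHz sampling (T = 10 us), G = 1000 V/A,
# two transmissions, each carrying two 4-sample buffers (so k = 8).
E, T, V_REFP, V_REFN, GAIN = 2.0, 1e-5, 3.3, 0.0, 1000.0
sends = [[[100, 120, 110, 90], [95, 105, 100, 100]],
         [[2000, 2100, 1900, 2000], [50, 60, 70, 80]]]
k = sum(len(buf) for buf in sends[0])
c1, c2 = constants(E, T, k, V_REFP, V_REFN, GAIN)
w_total = sum(energy_per_send(accumulate_x(bufs), c1, c2) for bufs in sends)  # Eq. 5
w_direct = sum(direct_energy([r for buf in bufs for r in buf], E, T, V_REFP, V_REFN, GAIN)
               for bufs in sends)
print(f"W_total = {w_total:.3e} J, matches Eq. 1: {math.isclose(w_total, w_direct)}")
```

Only integer additions remain on the microcontroller side; every floating-point multiplication moves to the PC, which is the point of the split.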
This data processing has limitations and requires experimental tuning. The limitations are described in Sec. 9 and the experimental tuning in Subsection 10.1. First, k is limited by possible overflow of the variable that accumulates x. Second, k must be chosen to minimize the number of transmissions to the PC, avoiding unnecessary transmissions and maximizing the data processing speed. Finally, the maximum sample rate can be calculated from the timing requirements of Algorithm 1.

Setting The Parameters Of The Approach
This section presents the limitations and temporal properties of the proposed approach that are inherent to the hardware on which it is implemented.
First, note that the data processing relies on summing k result values (Eq. 4.a), so the size of the variable that stores this summation must be taken into account to avoid overflow. For instance, if this variable is an n-bit unsigned integer and the ADC measures over its full range, the maximum number of samples before the accumulator overflows (k_max) is given by Eq. 6.
The variable that stores the summation of the result values must keep its data until the next transmission to the PC; it is then sent to the PC and cleared so that new data can be stored. Therefore, k_max is the maximum number of samples between two data transmissions.
The maximum number of filled buffers (BUFFER0 and BUFFER1) before transmitting data to the PC, n_buffers, is limited by k_max and by the size of the buffers (buffer_size). Thus, n_buffers is the largest integer such that n_buffers · buffer_size does not exceed k_max, as obtained in Eq. 7. n_buffers is the variable specified on line 34 of Algorithm 1.
Once n_buffers is determined, the final number of samples between transmissions to the PC (k_maxreal) is given by Eq. 8.
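Equations 6 to 8 are not reproduced here, so the sketch below assumes their most natural forms: k_max = ⌊(2^n − 1)/(2^ADCbits − 1)⌋, n_buffers = ⌊k_max / buffer_size⌋ and k_maxreal = n_buffers · buffer_size. The buffer size of 1024 samples (one 4 kB block holding 32-bit words) and the function names are hypothetical.

```python
def k_max(acc_bits: int, adc_bits: int = 12) -> int:
    """Assumed Eq. 6: full-scale samples an unsigned accumulator can hold."""
    return (2 ** acc_bits - 1) // (2 ** adc_bits - 1)


def n_buffers(kmax: int, buffer_size: int) -> int:
    """Assumed Eq. 7: whole buffers whose samples fit below k_max."""
    return kmax // buffer_size


def k_max_real(nbuf: int, buffer_size: int) -> int:
    """Assumed Eq. 8: samples actually accumulated between transmissions."""
    return nbuf * buffer_size


BUFFER_SIZE = 1024  # hypothetical: one 4 kB SRAM block holding 32-bit ADC words
km = k_max(32)
nb = n_buffers(km, BUFFER_SIZE)
print(km, nb, k_max_real(nb, BUFFER_SIZE))  # → 1048832 1024 1048576
```

With a 16-bit accumulator, `k_max(16)` is only 16 samples, which illustrates why a 32-bit accumulator is the practical choice for a 12-bit ADC.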
Besides that, there are two important time constraints: the time to fill a buffer (t_fill) and the time to process a buffer (t_process). Each corresponds to one of the two parallel tasks, and both must be respected for the technique to work correctly.
Buffer filling never stops, since we are interested in sampling continuously. Eq. 9 expresses the resulting timing limitation: when the data processing finishes a buffer, it waits until the other buffer is filled, so t_process is bounded by t_fill. In other words, t_fill must be the longer of the two times to guarantee continuous sampling. In addition, the approach periodically sends information to the PC, so the transmission time (t_send) must also be taken into account.
t_process can be measured experimentally, whereas t_send can be calculated, since the transmission protocol (UART) is well known. The maximum sample rate is then obtained with Eq. 10, where f_max is the maximum sample rate at which a buffer can be filled.
Finally, the time between PC transmissions (t_betweensends) is given by Eq. 11, where t_process is the processing time of the last processed buffer and t_send is the transmission time.
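The timing check of Eqs. 9 and 10 can be sketched as follows. Since Eq. 10 is not reproduced here, we assume that the worst-case buffer is the one that is processed and then sent, so t_process + t_send ≤ t_fill = buffer_size / f_s, giving f_max = buffer_size / (t_process + t_send). The 8N1 framing (10 bits per byte) is standard UART; the baud rate, the 4-byte payload, the measured t_process and all names are hypothetical.

```python
def uart_send_time(n_bytes: int, baud: int, bits_per_byte: int = 10) -> float:
    """t_send in seconds; 10 bits per byte assumes 8N1 framing (start + 8 data + stop)."""
    return n_bytes * bits_per_byte / baud


def t_fill(buffer_size: int, f_sample: float) -> float:
    """Time for the DMA to fill one buffer at sample rate f_sample."""
    return buffer_size / f_sample


def f_max(buffer_size: int, t_process: float, t_send: float) -> float:
    """Assumed form of Eq. 10: the worst-case buffer (processed, then sent)
    must be finished before the other buffer fills, t_process + t_send <= t_fill."""
    return buffer_size / (t_process + t_send)


BUFFER_SIZE = 1024                  # hypothetical samples per buffer
t_send = uart_send_time(4, 115200)  # hypothetical: a 32-bit x sent at 115200 baud
t_proc = 2e-3                       # hypothetical measured t_process (seconds)
fmax = f_max(BUFFER_SIZE, t_proc, t_send)
f_s = 100e3                         # candidate sample rate (Hz)
print(f"t_send = {t_send * 1e6:.1f} us, f_max = {fmax / 1e3:.1f} kHz")
print("f_s feasible:", t_proc + t_send <= t_fill(BUFFER_SIZE, f_s))
```

In practice the usable rate is the smaller of this bound and the ADC's own maximum conversion rate, which the sketch does not model.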
Experimental

The aim of this section is to validate our approach by running the data acquisition logic and the data processing on the selected hardware. First, Subsection 10.1 obtains the time settings and the values of the variables using the equations of Sec. 9. Then, Subsection 10.2 measures the energy consumption of some basic circuits to assess the reliability of the technique. Finally, in Subsection 10.3 we assess the energy consumption measurement of a WSN mote.

… at 2 V and it is discharged over time. At the same time, the prototype board is connected in series with each basic circuit, measuring the energy consumption over 1.26 hours. Table 2 summarizes the energy consumption results. The first row shows the theoretical energy consumption in joules (J) (Wtheoretical), the second row the energy consumption in J measured in our experiments (Wmeasured), and the third row the error between Wtheoretical and Wmeasured.
The error between Wtheoretical and Wmeasured is calculated using Eq. 12. The error is 4.76% for the R=1k circuit and 3.27% for the R=1k8 circuit, whereas the error for the C circuit cannot be meaningfully determined because, once C is discharged, our method measures only leakage currents and noise. From these results we conclude that the measurement error is below 5%.
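Eq. 12 is not reproduced here; the sketch assumes it is the usual relative error, |Wtheoretical − Wmeasured| / Wtheoretical × 100. The input values are hypothetical, not those of Table 2. The second call illustrates the C-circuit case: when the theoretical energy is close to zero, a small absolute error from leakage and noise produces a very large relative error.

```python
def relative_error_pct(w_theoretical: float, w_measured: float) -> float:
    """Assumed form of Eq. 12: |W_theoretical - W_measured| / W_theoretical * 100."""
    if w_theoretical == 0:
        raise ValueError("relative error is undefined when the theoretical energy is 0")
    return abs(w_theoretical - w_measured) / w_theoretical * 100.0


# Hypothetical values, not the ones reported in Table 2.
print(round(relative_error_pct(10.0, 9.524), 2))   # a resistive-circuit-like case
print(relative_error_pct(1e-6, 5e-6))              # near-zero theoretical energy
```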

Wireless Sensor Network Consumption
Our technique was used to assess the energy consumption of a ZigBee WSN mote. This mote is a prototype composed of an XB24-Z7WIT-004 radio transceiver and an LPC824 microcontroller. We compared the energy consumption of the mote with and without a duty-cycling algorithm that reduces energy consumption (LOKA) [35]. The results are shown in Fig. 4. In these experiments, the mote was active for a given period of time and periodically sent information to other motes. The collected measurements are divided into two classes of experiments: without LOKA and with LOKA. In the first, all electronic components are in active mode. In the second, the integrated algorithm switches the radio module on and off and changes the power mode of the electronic components to reduce the energy used.
In the experiments, the mote periodically sends 64 bytes of data, which allows us to analyze the energy consumption over a specified operation time. On the one hand, the mote without LOKA was tested in 360 experiments divided into blocks of 30 measurements. On the other hand, the prototype mote with LOKA was tested in 60 experiments divided into blocks of 5 measurements. The standard deviation of the resulting energy consumption profile is small, so we considered the number of samples sufficient to support our conclusions.
For both experiments, we considered different values of the transmit interval and the operation time, as shown in Fig. 4:
-Discrete values of the transmit data interval: [10, 30, 60] (seconds). These values correspond to the legend in the upper-left corner and to each plotted line.
-Discrete values of the operation time: [60, 300, 600, 900] (seconds). These values correspond to the abscissa axis.
Fig. 4 shows that the LOKA algorithm reduces the energy consumed over time. The energy consumption results obtained were similar to those reported in the state of the art [7], [29], [12].
The work of Jiang et al. [18] is quite similar to our approach. They designed a method to measure energy consumption in WSN motes using a comparator and a counter, and verified their design by measuring the energy consumption of a TelosB WSN mote. They analyzed the mote running a duty-cycling algorithm at slightly more than 2 Hz and assessed it with an oscilloscope.
Comparing the work of Jiang et al. with our approach, Fig. 5 shows that they obtain an approximately linear progression between 0 and 400 seconds composed of small energy consumption steps. Their approach cannot detect the energy consumption in sleep mode, so each step stays constant while the WSN mote is asleep. In contrast, our approach measures the energy consumption of the sleep state and adds it to the linear progression.
Moreover, we can estimate the energy consumption they would measure with a 0.1 Hz algorithm that sends a message every 10 seconds. Since each step in Fig. 5 corresponds to 0.4 mJ, only the number of steps in a given period changes: one energy step (of 0.4 mJ) every 10 seconds instead of one every 0.5 seconds.
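The extrapolation above is simple step counting, sketched below. The model deliberately reproduces the limitation of the step-based method: it counts one 0.4 mJ step per transmission and nothing in between, so sleep-mode consumption is invisible. The function name and the evaluation times (taken from the operation times of Fig. 4) are illustrative choices, not the rows of Table 3.

```python
def step_meter_energy_mj(t_s: float, tx_period_s: float, step_mj: float = 0.4) -> float:
    """Energy seen by a step-only meter: one step per transmission and
    nothing in between (sleep-mode consumption is not captured)."""
    n_steps = int(t_s // tx_period_s)
    return n_steps * step_mj


for period in (0.5, 10.0):  # about 2 Hz as in Fig. 5, and the 0.1 Hz case
    print(period, [round(step_meter_energy_mj(t, period), 1) for t in (60, 300, 600, 900)])
```

For example, over 400 s the 0.1 Hz case yields 40 steps, i.e., 16 mJ, versus 800 steps (320 mJ) at 2 Hz.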
Table 3 shows the energy consumption for both approaches. Its rows show the time the mote has been running with a 0.1 Hz duty-cycling algorithm, the consumption of the TelosB mote, and the consumption of our prototype mote with and without the LOKA algorithm.
The TelosB mote consumes more energy than our prototype, although the values are of the same order of magnitude, and the consumption per interval is constant for each mote, as in the original work of Jiang et al. The TelosB mote consumes more because its microcontroller consumes more energy than the microcontroller of our prototype mote. In addition, the energy consumption of sensor motes depends on the data frame sent in each transmission, the electronic devices connected to the mote, electromagnetic noise, the transmission channel, collisions with other radio signals, etc.
In this paper we presented an energy measurement technique that greatly improves the sampling rate while processing the gathered data in parallel. The technique measures energy consumption continuously by combining a double buffer on low-cost hardware, a data acquisition logic and a data processing algorithm that maximizes the sample rate while reducing the computational load of the sampling hardware. This profiling technique is particularly relevant to low-consumption devices such as WSN motes: our approach can analyze the energy consumption of the motes in all states, including sleep mode.
In future work we will test the newest magnetoresistive sensors to understand their impact on the proposed technique. Data generated or analysed during the current study will be made publicly available on the group's GitHub (https://github.com/ISG-UAH).

Competing Interest
The authors declare that they have no conflict of interest.

Funding
This work is funded by the JCLM project SBPLY/19/180501/000024 and the Spanish Ministry of Science and Innovation project PID2019-109891RB-I00, both under the European Regional Development Fund (FEDER).

Figures Title and Legend Section
Figure 1. Hardware components of the proposed approach for energy measurement.
Figure 2. Pipeline processing of the data acquisition logic and the data processing, implemented with a double buffer in SRAM.
Figure 3. SCTimer state machine. It has only one state, with two auto-reload down-counting timers and two events, one associated with each timer. Timer 1 runs twice as fast as timer 0. The priority of the !timer 0 event is higher than that of the !timer 1 event. Each event is activated when its associated timer value is 0. If the !timer 0 event is activated, the SCT trigger set action is activated; if the !timer 1 event is activated, the SCT trigger clean action is activated.
Figure 4. Energy consumption measure of a ZigBee mote using the LOKA algorithm [35] and without it.
Figure 5. Energy consumption measure of a TelosB mote using the Jiang et al. method (SPOT) with a 2 Hz duty-cycling algorithm [18].

Algorithm 1
1: function CONFIGURE_DMA()   ▷ Configuration of the DMA device
2:    DMA channel 0 & 1 source ← ADC   ▷ ADC origin for all transfers
3:    DMA channel 0 destination ← BUFFER0
4:    DMA channel 1 destination ← BUFFER1
5:    DMA activation input ← ADC trigger
6:    enable channel 0 ← TRUE   ▷ Channel 0 performs the first information exchange
7:    return
8: function CONFIGURE_ADC()   ▷ Configuration of the ADC device. ADC works as fast as possible & SCTimer selects the ADC sample rate
9:    maximum clock rate & minimum clock divider ← TRUE
10:   ADC activation input ← SCT trigger   ▷ 1 ADC sample = 1 SCT event 0
11:   ADC trigger ← TRUE when an ADC sample finishes
12:   return
13: function CONFIGURE_SCT()   ▷ Configuration of the SCT device. SCT state machine has 2 timers, 1 state and 2 events
14:   decrem count timer 0 ← FREQUENCY   ▷ Frequency = sample rate
15:   decrem count timer 1 ← 2 * FREQUENCY
16:   autoreload timer 0 & 1 ← TRUE   ▷ 2 timers reset when count ends
17:   state 1 ← event 0 & 1   ▷ State machine has 1 state composed of 2 events
18:   event 0 = !timer 0 event ← priority HIGH   ▷ !timer 0 event priority > !timer 1 event priority
19:   event 1 = !timer 1 event ← priority LOW
20:   event 0 & 1 ← match mode   ▷ Events occur when the timers finish: each event is associated with a timer counter and occurs when its count is 0
21:   SCT trigger set action ← event 0   ▷ Event 0 sets SCTimer trigger
22:   SCT trigger clear action ← event 1   ▷ Event 1 clears SCTimer trigger
23:   return
24: function HANDLE_IRQ_DMA()   ▷ IRQ when an SRAM block transfer finishes
25:   ▷ Get the DMA channel associated with the IRQ & clear flags
26:   Enable DMA channel = READ_DMA_CHANNEL_FINISH_TRANSFER_FLAG()
27:   CLEAN_DMA_CHANNEL_FINISH_TRANSFER_FLAG()
…
29: …   ▷ Program starts: configure the DMA, the ADC and the SCTimer (lines 1, 8 and 13)
…
34: while n_buffers > 0 do   ▷ n_buffers = number of buffers filled before sending data to the UART
35:   WFI()
…
37:   if Enable DMA channel == DMA channel 0 then
38:      enable channel 1 ← TRUE   ▷ Channel 1 starts the information transfer
39:      DATA_PROCESSING(BUFFER0)   ▷ Data processing of BUFFER0
40:   else   ▷ Enable DMA channel == DMA channel 1
41:      enable channel 0 ← TRUE   ▷ Channel 0 starts the information transfer
42:      DATA_PROCESSING(BUFFER1)   ▷ Data processing of BUFFER1
43: Send_UART(result)   ▷ Send energy measure to PC through UART
44: n_buffers ← RESET   ▷ Reset n_buffers

Table 1
WSN platform hardware characteristics (from the manufacturers' data sheets) at 3.3 V.

Table 2
Energy consumption experiments using our approach and theoretical methods over 1.26 hours with a 2 V supply voltage.