Scheduling and energy savings for small scale embedded FreeRTOS-based real-time systems

Evaluating the effectiveness of system scheduling and energy savings in embedded real-time systems with low-computing resources is the problem addressed in this paper. In such systems, the characteristics of the implemented scheduling policy play a relevant role in both schedulability and energy consumption. Ideally, the scheduling policy should provide high schedulability bounds and low runtime overheads, allowing for better usage of the available slack in the schedule for energy saving purposes. Due to its low overhead and simple implementation, the usual scheduling policy employed in real-time embedded systems is fixed priority scheduling (FPS). Under this scheme, as the priorities of all system tasks are assigned at design time, a simple priority vector suffices to indicate the current ready task to run. System schedulability, however, is usually lower than that provided by dynamic priority scheduling (DPS), according to which task priorities are assigned at runtime. Managing dynamic priority queues incurs higher overheads, though. Deciding whether DPS is a viable choice for such embedded systems requires careful evaluation. We evaluate two implementations of Earliest Deadline First (EDF), a classical DPS policy, in FreeRTOS running on an ARM-M4 architecture. EDF is compared against an optimal FPS policy, namely Rate-Monotonic (RM). Further, two mechanisms for energy savings are described. They differ in the manner they compute the slack available in an EDF schedule, statically (SS-EDF) or dynamically (DS-EDF). These two approaches are experimentally evaluated. Results indicate that EDF can be effectively used for energy savings.


Introduction
Scheduling is key for ensuring real-time correctness. Classical scheduling policies are based on selecting, at each scheduling instant, the highest priority ready task to run. Depending on how priorities are assigned, different scheduling overheads are observed. When priorities are assigned offline, a simple vector can be used for ordering ready tasks. In this case, low overheads are expected since each priority is represented by a fixed position in the priority vector. This scenario differs from the one in which task priorities are assigned online and may change over time, which requires reordering the ready task queue and incurs extra overhead. This is one of the reasons fixed priority scheduling (FPS) is the usual choice when implementing real-time embedded systems, which are often systems with low-computing resources. However, dynamic priority scheduling (DPS) offers higher system schedulability than FPS. This means that there are task sets that cannot be feasibly scheduled under FPS but can be under DPS [8,26].
When designing a real-time embedded system, designers must choose the type of execution infrastructure needed, based either on DPS or on FPS. Depending on the architecture, there may be gains in choosing DPS over FPS since the former may compensate for its higher overheads with better resource usage. Examples of Earliest Deadline First (EDF), a DPS solution, in existing operating systems (OS) can easily be found, mainly in Linux-based OS. This is the case of the SCHED EDF patch [20], which includes EDF in the kernel of standard Linux since version 3.14. Its ready task queue implementation is based on a red-black tree, which keeps insertion and deletion for n tasks in O(log n). EDF was also considered at user-space level in the first version of Mapping Real-Time to POSIX (MARTOP) [36]. An implementation of EDF in Xenomai, another Linux-flavored real-time OS (RTOS), can also be found [34]. Broader discussions about EDF and FPS are available [13]. In small scale embedded systems, however, the use of EDF is less often seen, although there are efforts towards using EDF in this domain showing that its scheduling overhead can be affordable. For example, using a specific kernel called ERIKA Enterprise, mechanisms for time representation and EDF scheduling were shown to be effective even when small microprocessors are considered [12,15]. Still, the low overhead associated with FPS seems dominant in the field of embedded systems. As energy saving mechanisms are mostly based on taking advantage of slack left in the schedule, the amount of slack that can be safely used is key for energy saving purposes.
When energy is taken into consideration in real-time systems, one wishes to reduce its consumption to a minimum without compromising system deadlines [5]. Thus, the higher schedulability levels provided by DPS are appealing, as the system processing capacity can be better exploited for energy savings. For example, take an EDF-scheduled system composed of preemptible periodic tasks with deadlines equal to their respective periods. If the tasks jointly utilize 80% of the processor, they leave 20% of the time (system slack) that can be exploited for energy savings. In the case of RM, up to about 31% of the processing resources may have to be left unused for the sake of schedulability [26], decreasing the system's energy saving potential. In practice, however, the overheads due to priority queue management may reduce the capacity of an EDF-scheduled system for saving energy. The level of this reduction is implementation-dependent and is also associated with the underlying system infrastructure.
In this paper we evaluate the effectiveness of two implementations of EDF in FreeRTOS running on the ARM-M4 STM32F407 hardware [37], an architecture offering relatively low-computing resources but one that is very popular for implementing embedded systems [38]. We compare them against the native RM available in FreeRTOS. Based on the results found, which show the effectiveness of EDF for the target system, we then investigate its energy saving capabilities. To do so, we describe two mechanisms for implementing Dynamic Power Management (DPM).
DPM mechanisms provide energy savings by selectively turning off system components during idle or underutilized time intervals. For example, if it is known at some time t that the system can stop executing tasks during the time interval [t, t + Δt), Δt > 0, without making them miss their deadlines, the processor can be put in a state of low energy consumption during this interval. DPM has proved to be effective at maximizing energy savings while keeping system degradation under control [9]. This effectiveness is related to the reduction of static power dissipation. Indeed, with increasing CMOS miniaturization, static power dissipation has grown exponentially. For this reason, DPM may be more effective than techniques based on Dynamic Voltage and Frequency Scaling (DVFS) [4], which adjust the frequency/voltage of the processor as a function of system needs, e.g., [10,18]. That is, DVFS focuses on reducing dynamic energy dissipation, which is associated with the switching current of the processor's transistors. As dynamic power is proportional to the processor frequency and to the square of the supply voltage, reducing the processor operating voltage and frequency decreases energy consumption but also increases the application execution time [24]. Another possible approach for reducing energy consumption is monitoring processor temperature, as thermal dissipation reflects leakage energy consumption. Thermal-aware scheduling is better suited to processors that operate at high frequencies, usually composed of multiple cores, e.g., [21,35]. In this paper we focus on DPM-based approaches.
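For context, the dynamic power relation mentioned above corresponds to the standard first-order CMOS model (stated here for completeness; the symbols are the usual textbook ones, not taken from the paper):

```latex
% alpha: switching activity factor, C_L: switched load capacitance,
% V_dd: supply voltage, f: clock frequency
P_{\text{dyn}} \approx \alpha \, C_L \, V_{dd}^{2} \, f
```

Since lowering f also permits lowering V_dd, DVFS reduces power more than linearly; but because execution time grows as 1/f, the net energy saved per job is dominated by the quadratic dependence on V_dd, while static (leakage) power is largely unaffected, which is what motivates DPM.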
Two DPM mechanisms in line with EDF scheduling are described in this work. We first consider introducing an extra (virtual) high priority task into the system whose execution time corresponds to the time during which the system is put in a low-power state. As the parameters of this task are determined offline, no extra online overhead is introduced. We name this approach Static Slack EDF (SS-EDF). Second, we adapted a recent solution designed for multiprocessor real-time systems [11]. This solution is based on dynamically computing the maximum idle intervals that can be inserted into an EDF schedule. This computation is carried out online and so the approach is named Dynamic Slack EDF (DS-EDF); it is effective in determining idle intervals but increases overheads due to the online computation.
In summary, in this paper we study the effectiveness of having EDF as the underlying scheduler in small scale systems and then describe and evaluate EDF-based DPM mechanisms for energy savings. The former contribution appeared in a previous conference publication [27], which is now extended with the latter. More specifically, this paper conveys the following.
- We describe two implementations of EDF for FreeRTOS at kernel-space level for supporting real-time systems composed of periodic real-time tasks. The considered implementations were inspired by previous work: one implements priority queues as multiple linked lists [29] (EDF-L); the other uses the classical min-heap data structure [16] (EDF-H), similarly to other studies [16,40].
- Although both implementations of EDF have been considered before, to the best of our knowledge EDF-L has not been evaluated in an actual OS, its assessment having been carried out only via simulation. Moreover, EDF-H, the classical implementation of dynamic priority queues, is the one usually taken for reporting EDF performance, with very few studies based on FreeRTOS. Experimental results on low-computing capacity platforms are rarely seen.
- We describe and implement two DPM mechanisms, SS-EDF and DS-EDF, which exhibit different online overheads. These mechanisms are experimentally evaluated.
- Our assessments of both the schedulers and the DPM mechanisms take several possible scenarios into consideration, with varying processor utilization loads and task set configurations. Results from our experiments serve to characterize EDF against FPS in low-processing capacity systems and then show its potential benefits for providing energy savings. The provided information can be useful for guiding designers of embedded systems.
As can be noticed, our focus is on systems composed of periodic tasks, namely tasks whose jobs are released by periodically triggered timers. This kind of system is very common in industry (e.g., sensor polling and control loops). Taking more generic task models into account is left as future work. The schedulability equations and algorithms we present here, however, can be employed (or easily adapted) when tasks are sporadically released. Another aspect of our work is that the results are strictly linked to the chosen OS (FreeRTOS) and ARM-M4 hardware board. Thus, the results we present describe the performance of our implementations for this environment and cannot be extrapolated to others, although they may serve as a good indicator for similar platforms.
The remainder of this paper is structured as follows. Section 2 gives background information on scheduling and FreeRTOS. It also presents previous work on implementing EDF in FreeRTOS and briefly reviews some principles used in energy savings. Our implementations of EDF are detailed in Sect. 3. This section also assesses them using native RM as comparison baseline. Section 4 describes two approaches for DPM and evaluates them under different scenarios. Conclusions are drawn in Sect. 5.

Task model and motivation
The type of application we aim at is composed of periodic tasks to be scheduled on a single processor. These are tasks that release jobs separated by a known time interval. A slightly more generic model would convey tasks for which only the minimum inter-release times are known, called sporadic tasks. Although the considered implementations of EDF can deal with the sporadic task model, our evaluation focuses on periodic implicit-deadline tasks (i.e., task deadlines equal their periods) as this kind is typically found in a large spectrum of embedded real-time systems (e.g., sensor polling, control loops). We denote this type of application as a set τ with n independent real-time tasks τ_i, i = 1, 2, ..., n. Each task τ_i ∈ τ is characterized by its worst-case execution time and its period, denoted respectively by C_i and T_i. Each task τ_i : (C_i, T_i) is assumed to periodically release a (possibly infinite) sequence of jobs, each of which executes for no more than C_i time units and must finish execution by T_i time units after its release time. That is, a job of task τ_i released at instant r must finish by its (absolute) deadline r + T_i. The utilization of each task τ_i is defined as U_i = C_i / T_i and system utilization is given by U_τ = Σ_{i=1}^{n} U_i. When we consider a task with deadline less than its period, it will be represented by a triple (C_i, T_i, D_i) and in this case τ_i must finish within D_i time units from its release time. Although all tasks in τ have deadlines equal to their periods, as will be seen later, one of our approaches for energy savings will consider introducing an extra task into the system with deadline different from its period.
We consider two classical priority-based scheduling policies, RM and EDF. Under both schemes, the scheduler selects the highest priority ready job to execute at any scheduling instant. No constraint on preemption is assumed, which means that at any time the scheduler may preempt an executing job in favor of another. For RM, task priorities are given by the inverse of their periods: the lower the task period, the higher the priority of its jobs. This means that job priorities are known (and fixed) at design time, contrasting with EDF, for which priorities are defined by the jobs' deadlines: the earlier the deadline of a ready job, the higher its priority. These characteristics imply that the runtime overheads observed in RM schedulers tend to be smaller than those found in EDF, since the latter must re-order the ready task queue as priorities depend on the jobs' deadlines. Priority queues under RM can be represented by a bit-vector, where the existence of a ready job at a given priority level is indicated by the corresponding position in the vector.
For the implicit-deadline task model, composed of either periodic or sporadic tasks, it is well known that under EDF there are no deadline misses if and only if [26]

U_τ ≤ 1.    (1)

For RM, a necessary and sufficient schedulability condition can be given via response time analysis [1]. This type of analysis computes the worst-case response time R_i for each task τ_i. The system is considered schedulable under RM (or any other FPS policy) if and only if

R_i ≤ T_i for all τ_i ∈ τ.    (2)

For the considered task model, the value of R_i can be computed as

r_i^{k+1} = C_i + Σ_{τ_j ∈ hp(i)} ⌈ r_i^k / T_j ⌉ C_j    (3)

with hp(i) representing the set of tasks with higher priority than that of τ_i. Recurrence (3) converges for any system that is feasibly scheduled via FPS. The recurrence starts with r_i^0 = C_i and stops whenever r_i^k = r_i^{k−1} or r_i^k > T_i. The latter case implies that τ_i may miss its deadline and so the system is considered unschedulable by RM. In the former case, no task misses any of its deadlines, with R_i corresponding to the fixed-point solution of (3).
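For concreteness, recurrence (3) can be implemented in a few lines of C. The sketch below is illustrative and not part of the FreeRTOS implementation; it assumes tasks are indexed by decreasing RM priority (shortest period first) and that C and T share a time unit (e.g., ticks):

```c
#include <stddef.h>

/* A task described only by its WCET (C) and period (T). */
typedef struct { long C, T; } task_t;

/* Worst-case response time of tasks[i] via recurrence (3):
 *   r^{k+1} = C_i + sum_{j in hp(i)} ceil(r^k / T_j) * C_j
 * starting from r^0 = C_i. Returns -1 when r exceeds T_i,
 * i.e. tasks[i] may miss its deadline under RM. */
long response_time(const task_t *tasks, size_t i) {
    long r = tasks[i].C;
    for (;;) {
        long prev = r;
        r = tasks[i].C;
        for (size_t j = 0; j < i; j++)      /* hp(i): tasks 0..i-1 */
            r += ((prev + tasks[j].T - 1) / tasks[j].T) * tasks[j].C;
        if (r > tasks[i].T) return -1;      /* condition (2) violated */
        if (r == prev) return r;            /* fixed point R_i reached */
    }
}
```

Running this over all tasks and checking that no call returns -1 is exactly the test of condition (2).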
Note that no system is feasibly schedulable by EDF or RM if the respective conditions established by (1) and (2) fail. As no system can be feasibly scheduled if it demands more than 100% of the processing capacity, (1) implies that under EDF the system can be fully utilized without compromising deadlines. This is not true for RM. For illustration, Fig. 1 plots the fraction of randomly generated task sets that are considered schedulable by (3). Each dot in the graph corresponds to 1000 synthetic task sets, each with 10 tasks, generated by the UUnifast algorithm [19]. It can be seen that for systems with high processor utilization, schedulability under RM rapidly drops, whereas all generated task sets would be schedulable by EDF. Figure 1, however, gives only a theoretical perspective, as schedulability tests do not account for OS-related overheads, which include the costs associated with preemption, interrupt handlers, and managing priority queues. That is, less than what is established by these equations is actually available for running the system tasks. When looking at distinct scheduling policies, some natural questions arise: does the higher processor utilization allowed by (1) compensate for the overheads associated with managing dynamic priority queues in systems with low-computing resources? If so, can energy saving mechanisms be effective in EDF-scheduled systems? The focus of this paper is on answering these questions taking RM and distinct implementations of EDF over FreeRTOS running on ARM-M4 STM32F407 hardware [37]. As this hardware offers relatively low-computing resources but is very popular for implementing embedded systems [38], the type of evaluation presented in this paper provides useful information for designers. When considering taking advantage of available slacks for energy savings, for example, the behavior observed in Fig. 1 and the scheduler overhead are relevant aspects to be taken into account.

[Fig. 1: Success ratio (fraction of schedulable task sets) versus processor utilization]

FreeRTOS and previous EDF implementation
FreeRTOS [6] is an open source real-time OS distributed under the MIT license [7], ideally suited to deeply embedded real-time applications that use microcontrollers. Its main features include: a small footprint with low memory and processing requirements; widespread availability and support, as it runs on more than 40 processor architectures [38]; and an implementation mostly written in C, with small sections in assembler (the PORT layer). Applications can be implemented in FreeRTOS as a collection of independent threads.
The FreeRTOS native scheduler is based on FPS. The number of priority levels is configured at compilation time. Each level is implemented as an independent queue, and tasks assigned to the same queue are scheduled in a round-robin fashion. Hence, FreeRTOS can be easily configured to implement any FPS policy by assigning a single task per priority queue. In our experiments, we configured this native scheduler as RM.
Although many implementations of EDF can be found in general, only a few of them target FreeRTOS. An implementation of EDF at user-space level has been described [23,28]; carrying out scheduling decisions at user-space level incurs extra overheads. To our knowledge, the only existing kernel-level implementation of EDF over FreeRTOS targets the Raspberry Pi [40], a more powerful hardware platform. It is based on implementing priority queues as a classical min-heap [16] and deals with multi-mode real-time applications. One of our implementations is also based on a min-heap, but our hardware platform provides far fewer computing resources. Further, this previous work found that FPS would be more effective for systems with high utilization. This conclusion is, to some extent, validated by our experiments. However, we found that it does not hold for an alternative implementation of EDF we considered in this work, based on linked lists [29,30], a solution that was previously evaluated only via simulation. We provide here experimental results that sustain its good performance in an execution environment offering a low level of processing resources. More details of this implementation are given in Sect. 3.

Energy savings in EDF-scheduled systems
Leakage Control EDF (LC-EDF) is one of the first EDF-based energy saving approaches aiming at reducing static power [25]. Its goal is to procrastinate the execution of tasks as much as possible so as to make idle intervals in the schedule longer. Tasks are assumed to be periodic with deadlines equal to their periods, but the system model relies on DVFS. LC-EDF computes, for each job release, the maximum time that the execution of the system tasks can be delayed without compromising their deadlines. LC-EDF inspired a DVFS-oriented solution, named CS-DVS-P [22], which computes task execution procrastination for safely setting the lowest processor speed by adjusting its frequency and voltage.
Computing the maximum idle interval at a given instant t in an EDF schedule requires taking into consideration the system load after t, which may be computationally expensive. For example, an exact schedulability test, based on the demand bound function (dbf) [8], can be used for that purpose, but its complexity depends on the number of jobs to be executed during the time interval considered. The Procrastination Scheme DBF (PDBF) [2] is based on computing the system demand, similarly to CS-DVS-P, but all computation is carried out offline.
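For reference, the per-task dbf used by such tests admits a very small implementation. The sketch below is illustrative (the function name dbf_task is ours): it computes the maximum execution demand of one task with parameters (C, T, D) whose jobs are both released and due within a window of length t:

```c
/* Demand bound function of a periodic/sporadic task with WCET C,
 * period (or minimum inter-release time) T and relative deadline D:
 * the maximum total execution demand of jobs that are released and
 * have their deadlines inside any window of length t. */
long dbf_task(long C, long T, long D, long t) {
    if (t < D)
        return 0;                    /* no job fits entirely in the window */
    return ((t - D) / T + 1) * C;    /* floor((t - D) / T) + 1 jobs */
}
```

An exact EDF test then checks that the sum of dbf_task over all tasks does not exceed t at every absolute deadline t up to a bounded horizon (e.g., the hyperperiod), which is where the dependence of the complexity on the number of jobs comes from.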
As for FPS-based solutions, it is worth mentioning the Rate Harmonized Scheduler (RHS) and the Energy-Saving RHS (ES-RHS) [32]. RHS introduces the concept of a harmonization period, which constrains the instants at which the scheduler is notified of new job releases: the scheduler is only notified at instants that are integer multiples of the harmonization period. Also, an energy saver task, always scheduled at the highest priority level, is used for energy savings. Whenever this task executes, the processor state is changed to a low-energy mode. In general, however, the schedulability bound is reduced to 50% utilization [5].
The solutions we describe in this paper have similarities with some of the aforementioned approaches. Like RHS, one of our solutions, called SS-EDF (Sect. 4.1), is also based on introducing a high priority task into the system so as to represent periods of time during which the processor can be put in a low-energy mode. The parameters of this task are computed offline using the dbf. Further, like LC-EDF, our second approach, named DS-EDF (Sect. 4.2), computes idle intervals online by delaying the execution of tasks as much as possible. This approach is actually an adaptation of an algorithm originally designed for multiprocessor systems [11] scheduled according to RUN [31]. Both our solutions are DPM-oriented.

DPM requirements
DPM mechanisms rely on processors equipped with low-power states. That is, during idle periods in a schedule, the processor state can be set so as to save energy, turning off processor components accordingly. Clearly, a fully utilized system cannot take advantage of DPM mechanisms since its processor cannot be put in low-power states. The more slack the system has, the higher its capability for energy savings [9].
When implementing DPM one must consider the time needed for switching to and from a low-power state. Indeed, turning off processor components and then turning them on again takes time. Usually, each operation mode specifies a break-even time (BET), which represents the minimum time necessary for switching to and from a given low-power state [3]. Hence, before changing to a low-power state, it is necessary to check whether the period the processor can stay in the new state is longer than the associated BET. If this is not the case, changing the processor operation mode is not worth it. More specifically, a transition from the fully operational mode to a low-power state should be carried out at an instant t only if the system can safely refrain from running tasks during [t, t + Δt) and Δt > BET. In DPM mechanisms, such verification is implemented at the scheduler level. For reference, Table 1 gives the BET values associated with moving from the active state (fully operational mode) to a low-power state (sleep or stop) for the ARM-M4 STM32F407: when the processor is fully operational, the system takes 2 ms and 3 ms to carry out the switching associated with the sleep and stop modes, respectively. These values were obtained by measurement. The table also shows the current consumed per state. As can be noted, the values reported in the datasheet [37] differ from those observed in our measurements. Energy consumption values are highly dependent on the conditions under which the system operates; the measured data in Table 1 take into consideration possible overheads due to OS and scheduling events. Besides, as can be seen in the table, not all information can be extracted from the datasheet. For example, for the sleep mode, the datasheet indicates a current of 11 mA (when all peripherals are disabled) or 59 mA (when peripherals are enabled), whereas we measured 12.1 mA with GPIO, timer and UART enabled. Due to these aspects, we preferred to rely on our measurements instead of the values reported in the datasheet.
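The BET check described above can be sketched as follows. This is an illustrative fragment, not the paper's scheduler code; the identifiers are ours, and the example BET values mirror the measured STM32F407 figures (sleep: 2 ms, stop: 3 ms):

```c
/* A low-power state and its break-even time in milliseconds. */
typedef struct { const char *name; long bet_ms; } power_state_t;

/* Given the predicted slack (the interval [t, t + dt) during which
 * no task needs to run), return the index of the deepest state whose
 * BET fits, or -1 when staying active is the only safe option.
 * States are assumed ordered from shallowest to deepest. */
int pick_state(const power_state_t *states, int n, long slack_ms) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (slack_ms > states[i].bet_ms)   /* must satisfy dt > BET */
            best = i;
    return best;
}
```

In a scheduler, this check would run at each idle instant; any state returned is guaranteed to amortize its own switching cost within the available slack.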
For measuring energy consumption, the DC1812AB board equipped with the LTC2944 chip was used [17]. It is designed for evaluating battery capacity, being capable of measuring energy consumption, temperature, power and current. In our setup, an output port of the STM32F407 was used to trigger the energy monitoring board. The collected data is then visualized with the STM32CubeIDE software [39]. The values in Table 1 correspond to the average of 10 measured values for each power state.
In our experimental evaluation (Sect. 4.3) we did not consider the stop mode for DPM purposes since the execution context is lost in this mode, which would make it infeasible to reestablish the context after coming back to the active state. This means that we used a single BET of 2 ms for configuring our DPM mechanisms.

EDF implementation in FreeRTOS
This section summarizes our implementations. We start by giving some information on the FreeRTOS infrastructure and describing how the periodic task model was implemented. Then we present the considered EDF implementations.

Basic infrastructure in FreeRTOS
In order to customize the FreeRTOS kernel, as is the case when implementing additional scheduling policies, some data structures and functions need to be considered:
- TCB. Tasks execute within their own context with no dependency on other tasks or the scheduler. Upon creation, each task is associated with a Task Control Block (TCB), which contains many task attributes. The information in the TCB describes the task and its state, including the task name, function pointer, priority and stack address, among other attributes.
- pxCurrentTCB. FreeRTOS does not have a "running" list or state. The kernel maintains the variable pxCurrentTCB to keep track of the task that is currently running.
- xTaskCreate(). This function is responsible for both allocating a pointer and inserting the new task into the internal priority queue.
- vTaskDelayUntil(). It delays a task for a specific amount of time starting from a specified time instant.
- vTaskStartScheduler(). It starts the scheduler. The kernel manages all tasks of the system via this function.
- xTaskIncrementTick(). This is an interrupt service routine triggered by an internal timer, configured with a 1 ms period in this work. It keeps the time base by incrementing the variable xTickCount indefinitely. Moreover, this function calls the scheduler after checking whether the priority of a new job is higher than that of the job currently running.
- vTaskSwitchContext(). This is an interrupt handler in charge of performing context switches.
The aforementioned structures and functions were modified to some extent in our implementations of EDF.

Periodic tasks
In order to implement periodic tasks in FreeRTOS, some extensions of the TCB structure were needed to accommodate typical fields used in periodic real-time systems [14]. In addition to the original TCB attributes in FreeRTOS, the following ones, with uint64_t data type, were added:
- MsPeriod. It represents the task period.
- MsWCET. This corresponds to the task worst-case execution time.
- MsRelDeadline. For the considered task model this should be equal to MsPeriod and represents the task relative deadline.
- MsAbsDeadline. This is the current job deadline. It is updated upon the release of a new job at time t as MsAbsDeadline = t + MsRelDeadline.
- MsNextWakeTime. The next waking time of a task, set to MsPeriod time units after the previous task release time.
- MsNumberExecJob. This variable was added for statistics purposes. It counts the number of released jobs from the time the system starts.

Furthermore, specific functions were modified and/or created so as to provide the required functionality related to periodic task activation and execution. Function xTaskCreate() was modified to deal with the new TCB parameters; the manner in which it enqueues the created tasks is explained in the next subsections. Function vTaskStartScheduler() had to be modified since the manner in which priorities are dealt with in EDF differs from the native FreeRTOS scheduler: it must get the highest priority ready job before triggering the scheduler. The value of pxCurrentTCB is then updated accordingly to represent this job.
Another modified function was xTaskIncrementTick(), which was adapted to deal with relprmt and relnoprmt events. These events take place depending on whether preempting the currently running job is or is not necessary, respectively. As this routine deals with the priority queue, its code depends on how the queue is implemented (Sects. 3.3 and 3.4). In a nutshell, if there is no newly released job, the tick counter is simply incremented. Otherwise, the priority of the new job is checked: if it has an earlier deadline than that of the currently running job, the running job is preempted; otherwise, the new job is enqueued and no preemption is generated.
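The relprmt/relnoprmt decision thus reduces to a deadline comparison. A minimal sketch (the identifiers are ours, not FreeRTOS's):

```c
/* Classification of a job release: preempt the running job only if
 * the newly released job has a strictly earlier absolute deadline. */
typedef enum { RELNOPRMT = 0, RELPRMT = 1 } release_event_t;

release_event_t classify_release(unsigned long new_abs_deadline,
                                 unsigned long running_abs_deadline) {
    return new_abs_deadline < running_abs_deadline ? RELPRMT : RELNOPRMT;
}
```

Note that ties favor the running job, which avoids a context switch that would bring no schedulability benefit.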
A new function, namely xEndJob(), was introduced for notifying the scheduler upon the completion of a job. Once the highest priority task is selected by the scheduler, a new task is scheduled to run. If there is no ready task, the processor runs idle.
In general, tasks in our implementation have their code structured as a loop that executes the job body, signals job completion via xEndJob(), and then waits for the next release instant. It is worth stating that if tasks were sporadic, their job releases could be triggered by sporadically occurring events instead of timers set based on task periods. This would require extending the described implementation by introducing proper interrupt handlers to capture such events.
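A minimal sketch of this task structure is shown below. The two kernel calls are stubbed so the fragment is self-contained; in the real system they are the (modified) FreeRTOS functions described above, and the task loop would run indefinitely rather than for a fixed number of jobs:

```c
typedef unsigned long TickType_t;

static unsigned long jobs_done = 0;

/* --- stubs standing in for the modified kernel --- */
static void xEndJob(void) { jobs_done++; }          /* notify job completion */
static void vTaskDelayUntil(TickType_t *wake, TickType_t period) {
    *wake += period;                                /* advance release time */
}

#define NUM_JOBS 3   /* bounded here only so the sketch terminates */

void periodic_task(TickType_t period_ms) {
    TickType_t next_wake = 0;                       /* first release at t = 0 */
    for (int job = 0; job < NUM_JOBS; job++) {
        /* ... job body: e.g. sample a sensor, run a control step ... */
        xEndJob();                                  /* tell scheduler we finished */
        vTaskDelayUntil(&next_wake, period_ms);     /* block until next release */
    }
}
```

The call to vTaskDelayUntil() with an absolute reference point (rather than a relative delay) is what keeps releases periodic regardless of the job's own execution time.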

EDF queue based on min-heap (EDF-H)
A min-heap is a data structure usually employed in priority queue implementations. Our EDF queue implementation follows algorithms found in classical textbooks [16]. The heap is made of a TCB vector, with one entry per ready task, ordered by absolute deadline. An auxiliary variable (count) represents the number of currently enqueued jobs. Three basic functions were implemented to manage the heap. In a nutshell: get_min() returns the highest priority (i.e., earliest deadline) job among those currently enqueued, without removing it; extract_min() removes and returns this job; and insert() enqueues a newly released job. As the first function does not update the heap, it runs in O(1), whereas the other two have run-time complexity O(log n) for n jobs.

[Fig. 2: Basic structure for implementing an EDF queue using multiple linked lists. Illustration adapted from [29]]
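A textbook-style sketch of such a heap is given below. It is illustrative rather than the paper's exact code, and assumes a simplified TCB carrying only the absolute deadline (the actual implementation stores full TCB pointers):

```c
#include <stddef.h>

#define MAX_TASKS 32

/* Simplified TCB: only the EDF key (absolute deadline) matters here. */
typedef struct { unsigned long abs_deadline; } tcb_t;

typedef struct {
    tcb_t  slot[MAX_TASKS];
    size_t count;                       /* number of enqueued jobs */
} edf_heap_t;

static void swap(tcb_t *a, tcb_t *b) { tcb_t t = *a; *a = *b; *b = t; }

/* O(1): peek at the earliest-deadline job, NULL if the queue is empty. */
tcb_t *get_min(edf_heap_t *h) { return h->count ? &h->slot[0] : NULL; }

/* O(log n): enqueue a released job, sifting it up (count < MAX_TASKS). */
void insert(edf_heap_t *h, tcb_t job) {
    size_t i = h->count++;
    h->slot[i] = job;
    while (i > 0 &&
           h->slot[(i - 1) / 2].abs_deadline > h->slot[i].abs_deadline) {
        swap(&h->slot[(i - 1) / 2], &h->slot[i]);
        i = (i - 1) / 2;
    }
}

/* O(log n): dequeue the earliest-deadline job, sifting the root down
 * (the queue is assumed non-empty). */
tcb_t extract_min(edf_heap_t *h) {
    tcb_t min = h->slot[0];
    h->slot[0] = h->slot[--h->count];
    size_t i = 0;
    for (;;) {
        size_t l = 2 * i + 1, r = 2 * i + 2, s = i;
        if (l < h->count && h->slot[l].abs_deadline < h->slot[s].abs_deadline) s = l;
        if (r < h->count && h->slot[r].abs_deadline < h->slot[s].abs_deadline) s = r;
        if (s == i) break;
        swap(&h->slot[i], &h->slot[s]);
        i = s;
    }
    return min;
}
```

The fixed-size array avoids dynamic allocation, which matches FreeRTOS practice on memory-constrained targets.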

EDF queue based on multiple linked lists (EDF-L)
An EDF queue can be implemented using a single linked list with jobs enqueued in EDF order. However, for n tasks this would require O(n) steps for insertion or search. Aiming at reducing this cost, an implementation based on multiple linked lists was described [29]. This design is particularly interesting for small embedded systems, as is the case considered in this work, and so we implemented this alternative approach. Figure 2 gives an overview of the considered implementation. A set of n linked lists stores the enqueued jobs of the n tasks, with the following characteristics: (a) a list can hold more than one job and there may be empty lists; (b) the lists' nodes keep TCB pointers for the corresponding tasks; (c) jobs in a list l have deadlines greater than or equal to those in a list k, for any l > k; (d) no job in list k at time t has a deadline greater than t + T_k. Note that if jobs had fixed priorities, properties (c)-(d) would clearly hold since there would be one job per list. Nonetheless, even under the EDF priority assignment, the expected number of jobs per list is small.
The rationale behind EDF-L is that dealing with several short lists can be better than dealing with a single long list, provided the search space is efficiently reduced. The bitmap vector, seen in the figure, plays a role in this regard. It indicates whether there is some enqueued job in the corresponding list, a check that can be done in O(1) [29,30]. This allows the scheduler to skip empty lists efficiently. Another way of speeding up the search is via the tail pointer vector, through which the greatest-deadline job of a given list can be accessed directly. Once the list to enqueue a job is found, a newly released job can be inserted into the list (at its first position) in O(1) in most cases.
Upon a job release at instant t, there are basically two scenarios for insertion into the EDF-L queue, depending on the deadline of the newly released job J. If the priority of J is higher than that of the job J' executing at t, then J preempts J'. Assume that the preempted job J' is an instance of task τ k . As the deadline of J' is guaranteed to be the second shortest deadline in the system at time t (due to EDF ordering), it can be inserted into the queue in O(1) as follows. First, the list l on which J' will be inserted is found, which can be done in O(1) using the bitmap vector. The value of l corresponds to the least index indicating a non-empty list among the lists that can hold J', that is, those lists such that l ≥ k.
The other insertion scenario is more costly. If the deadline of a new job J of a task τ k released at time t is greater than that of the currently executing job, J must be enqueued. In this case, however, inserting J into list k is not enough, since there might be previously enqueued jobs in lists l > k with deadlines earlier than t + T k . To comply with properties (c)-(d) stated above, this operation requires removing such jobs from lists l and inserting them into list k, which takes linear time. Again, as the number of jobs per list is expected to be low, this operation is likely to be fast.
Finally, when a job finishes executing, the corresponding list is updated in O(1). Likewise, a new job can be selected to execute in constant time, as jobs are kept in EDF order across the lists. If the processor becomes idle upon a job completion, the bitmap vector is updated accordingly, also in O(1).
The basic structure for implementing each of the n lists is a linked list of TCB pointers with head and tail references, managed by three basic functions that carry out the operations described above. In brief: list_add_head() and list_rm_head() insert and remove a TCB at the head of a list, respectively; and next_list() returns the id l of the non-empty list that immediately follows a list k of interest, that is, the least id l > k for which there is some enqueued job.
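A minimal sketch of these structures and functions is shown below. The node layout, the fixed N_TASKS bound, and the use of a compiler count-trailing-zeros intrinsic (__builtin_ctz, available in GCC/Clang; Cortex-M4 cores expose an equivalent via CLZ) are our assumptions, and a plain int stands in for the TCB pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define N_TASKS 8            /* assumed maximum; one list per task */

typedef struct node {
    int tcb;                 /* stand-in for a TCB pointer */
    struct node *next;
} node_t;

typedef struct {
    node_t *head;            /* earliest-deadline job in this list  */
    node_t *tail;            /* tail pointer: greatest deadline here */
} list_t;

static list_t lists[N_TASKS];
static uint32_t bitmap;      /* bit k set iff list k is non-empty */

/* O(1): insert a node at the head of list k and mark the list non-empty. */
static void list_add_head(int k, node_t *n) {
    n->next = lists[k].head;
    lists[k].head = n;
    if (lists[k].tail == NULL) lists[k].tail = n;
    bitmap |= (1u << k);
}

/* O(1): remove and return the head of list k, clearing its bit if empty. */
static node_t *list_rm_head(int k) {
    node_t *n = lists[k].head;
    if (n == NULL) return NULL;
    lists[k].head = n->next;
    if (lists[k].head == NULL) { lists[k].tail = NULL; bitmap &= ~(1u << k); }
    return n;
}

/* O(1) with a count-trailing-zeros instruction: least id l > k of a
 * non-empty list, or -1 if every list beyond k is empty. */
static int next_list(int k) {
    uint32_t masked = bitmap & ~((2u << k) - 1u);  /* clear bits 0..k */
    return masked ? (int)__builtin_ctz(masked) : -1;
}
```

The bitmap is what makes skipping empty lists cheap: one mask and one hardware bit-scan replace a walk over up to n list heads.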

Experimental evaluation
For our evaluation, task sets were synthetically generated using the UUniFast algorithm [19]. According to this procedure, the utilization values of the tasks follow a uniform distribution and sum up to the desired task set utilization. Once the utilization of a task was generated, its period was randomly chosen within the desired interval and its execution time was then computed. Task execution was implemented as a busy loop calibrated to the generated execution time value. The hardware platform for the experiments was an ARM Cortex-M4 microcontroller with a 168 MHz CPU, 512 KB of flash memory and 128 KB of RAM. As this platform is mostly used for small embedded systems, our evaluation considered systems with no more than 60 tasks.
Several task set configurations were considered. We highlight here results obtained for two types of task sets, hereafter called easy and hard. The former corresponds to task sets containing tasks with periods varying from 2 to 1 000 clock ticks, whereas tasks in the latter type have periods between 2 and 50 clock ticks. The system was configured such that each clock tick takes 1 ms. Note that hard task sets cause a high number of scheduling events, possibly leading to higher overheads when compared to easy task sets. During the experiments, time was measured in clock cycles by means of the clock cycle counter (CYCCNT), a feature available in the Cortex-M4 as part of the DWT (data watchpoint and trace) unit.
The following sections compare EDF-H and EDF-L against RM. Recall that FreeRTOS offers native support for RM. Observed effects due to context switching, scheduling overhead and schedulability are reported. Experimental setups were defined for easy and hard task sets, varying the number of tasks in the sets and the task set utilization. For each setup, reported results correspond to the average of 30 different task sets, each of which executed during 10 seconds.

Context switches
As RM and EDF behave differently in terms of scheduling decisions, it is interesting to observe whether there are significant effects on the number of generated context switches. If there were, this could imply (or partially explain) differences in observed scheduling overheads. Figure 3 shows the results for task sets with a utilization of 85%. As expected, the number of context switches varies with task set size and is much greater for hard task sets (about 50% more). However, differences between RM and EDF are marginal. Similar trends were observed in other experiments with different task set utilization setups. In conclusion, differences in context switches cannot play a central role in explaining possible differences in scheduling overhead between RM and EDF.

Scheduling overhead
Scheduling overhead is measured in this section as the overall time the system spent running scheduling-related functions, relative to the total of 10 seconds each experiment ran. Results are given in Fig. 4. As expected, RM presents the lowest overhead when compared to both EDF implementations, and EDF-L performs better than EDF-H. As the number of tasks grows, overhead increases proportionally for all approaches. Further, the overall overhead for scheduling hard task sets is up to 50% higher than what was observed for easy task sets. This difference follows the same trend observed for context switches, as shown in the previous section. This means that the overhead associated with managing priority queues is what causes most of the differences between the considered scheduling implementations.

Schedulability
As usual, we measure schedulability by verifying whether some job misses its deadline during execution. That is, schedulability is measured by

Success ratio = (# task sets with no missed deadlines) / (# generated task sets)

Figure 5 shows the results per task set size for task sets with 85% utilization. For easy task sets, all three scheduling approaches can successfully manage task sets with up to 30 tasks. Beyond that, the performance of RM drops. Interestingly, RM performs better than EDF-H for sets with more than 50 tasks; that is, the scheduling overhead for managing the min-heap becomes a bottleneck for EDF-H (recall Fig. 4). In this case, the theoretical advantage of EDF over RM (illustrated in Fig. 1) does not hold for heap-based priority queues. EDF-L, on the other hand, is capable of scheduling all easy task sets.
The relatively good performance of all three implementations observed for easy task sets does not hold when hard task sets are considered. As can be seen in Fig. 5, deadline misses under RM start to appear in task sets with as few as 10 tasks. Once again, EDF-H performs worse than EDF-L. While EDF-L was able to successfully schedule half of the generated task sets with 60 tasks, the success ratio of RM and EDF-H drops to zero at 40 and 50 tasks, respectively. In conclusion, the theoretical schedulability of EDF partially compensates for the higher overheads of managing dynamic priority queues, although the heap-based implementation consistently performs worse than the one based on multiple linked lists.
Similar conclusions can be drawn from Fig. 6, which shows the schedulability trend observed for easy and hard task sets per utilization. Two extreme scenarios were considered, namely easy task sets with 10 tasks and hard ones with 60. Note that in the former case, EDF-H performs well w.r.t. RM. For stressed systems (i.e., hard and large task sets), the opposite trend is observed. Nonetheless, EDF-L consistently proves to be the best implementation choice in all considered scenarios.

EDF-oriented DPM mechanisms
In this section we describe two DPM mechanisms that were implemented for providing energy savings under EDF scheduling, namely SS-EDF and DS-EDF. Based on the results presented in the previous section, EDF-L was chosen as the scheduler.
The described DPM mechanisms are designed to take advantage of available slack in the schedule so as to put the processor in a low-energy consumption mode. SS-EDF (Sect. 4.1) is based on introducing an extra (energy saving) task in the system so that whenever it runs, the processor is put in a low-power state. No extra online overhead is incurred, as all the complexity of determining the parameters of this task is handled offline. However, as this strategy does not consider information available online, it may not be very effective at taking advantage of the slack made available. The DS-EDF mechanism addresses this problem (Sect. 4.2) by computing the available slack in the schedule at runtime. Although more effective in determining slack, it incurs extra online overhead. Hence, it is necessary to experimentally characterize both DPM strategies (Sect. 4.3). We note that SS-EDF works correctly under sporadic task releases and that DS-EDF can easily be adapted to cope with such scenarios. Our experiments, however, focused on the target task model, restricted to periodic tasks.

Static Slack EDF (SS-EDF)
Let τ es : (C es , T es , D es ) be an extra (virtual) task introduced in the system for energy saving purposes. Task τ es is jointly scheduled according to EDF with the system tasks in τ . Its deadline D es equals its computation time C es . This means that τ es is bound to run continuously from its release instant, otherwise it misses its deadline. That is, whenever τ es starts running at time r , the processor is put in a low-power state during [r, r + C es ). To ensure that no task in τ ∪ {τ es } misses a deadline, the parameters C es and T es must be computed accordingly. This can be done via an exact schedulability test for EDF based on time demand analysis. For that purpose, we rely on the function dbf(t) [8], which gives the demand bound due to jobs that have both their release times and deadlines within a time interval of size t. It is known that a set of tasks is EDF-schedulable (i.e., no deadline is missed in an EDF-generated schedule) on a uniprocessor if and only if dbf(t) ≤ t for all time instants t. The value of dbf(t) can be computed as

dbf(t) = Σ_{τ i ∈ τ ∪ {τ es }} max(0, ⌊(t − D i )/T i ⌋ + 1) C i    (4)

The values of t that must be considered are obtained as a function of the task deadlines, their periods and the system utilization. For example, consider τ = {(1, 5), (3, 10)} and τ es = (4, 10, 4). The values of t to be tested are 4, 5 and 10. As dbf(4) = 4, dbf(5) = 5 and dbf(10) = 9, no task can miss any of its deadlines. However, in general, computing (4) for all necessary values of t in larger systems can be computationally expensive, since the procedure has exponential complexity.
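The dbf-based test can be sketched as follows. This is a brute-force illustration that checks every absolute deadline up to a caller-supplied horizon (here the hyper-period); QPA, used in the paper, reaches the same verdict while skipping most of these points. The names task_t, dbf and edf_schedulable are ours.

```c
#include <stdint.h>

typedef struct { uint32_t C, T, D; } task_t;

/* Demand bound within any interval of length t (Eq. (4)):
 * dbf(t) = sum over tasks of max(0, floor((t - D_i)/T_i) + 1) * C_i. */
static uint32_t dbf(const task_t *tau, int n, uint32_t t) {
    uint32_t demand = 0;
    for (int i = 0; i < n; i++)
        if (t >= tau[i].D)  /* skip tasks contributing max(0, ...) = 0 */
            demand += ((t - tau[i].D) / tau[i].T + 1) * tau[i].C;
    return demand;
}

/* Exact EDF test: dbf(t) <= t must hold at every absolute deadline
 * up to the horizon. Returns 1 if schedulable, 0 otherwise. */
static int edf_schedulable(const task_t *tau, int n, uint32_t horizon) {
    for (int i = 0; i < n; i++)
        for (uint32_t d = tau[i].D; d <= horizon; d += tau[i].T)
            if (dbf(tau, n, d) > d)
                return 0;
    return 1;
}
```

Running it on the example above, the deadlines tested are 4, 5 and 10, and the system including τ es = (4, 10, 4) passes, while raising C es to 5 fails at t = 5.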
To date, the fastest known method for checking (4) is the Quick convergence Processor-demand Analysis (QPA) [41], which is able to skip a large number of values of t during the schedulability test. We use QPA for computing the parameters of τ es . However, we also use an alternative schedulability test to speed up the search for the parameters even further. As previously mentioned, task τ es is defined with no slack, as its deadline equals its computation time. This means that any feasible EDF schedule for τ ∪ {τ es } would be the same as a schedule produced by a hierarchical scheduler in which τ es is assigned the highest priority and all tasks in τ are scheduled in the background via EDF. This scheduling model has been considered in previous research, for which several efficient sufficient (but not exact) schedulability tests were derived [33]. Any task set passing a sufficient test is deemed schedulable, but some schedulable task sets may not be identified as such by the test. In particular, to accelerate the search for the parameters of task τ es , we use the following result, which requires that R i ≤ T i for all tasks τ i in τ , with R i given by the fixed-point solution of recurrence (5) [33].
If the fixed-point solution R i of recurrence (5) satisfies R i ≤ T i for all tasks τ i in τ , the test identifies the system as schedulable. The reader interested in the details of the tests provided by QPA and by recurrence (5) can refer to the original sources [33,41]. Hereafter, we call these schedulability tests Q(.) and P(.), respectively. Due to the characteristics of the tests these functions implement, Q(.) returns true if and only if the task set is schedulable, whereas P(.) returning true implies that the task set is schedulable but it may return false for some schedulable task sets. For example, for τ and τ es as above, unlike the exact dbf-based test, the test based on (5) fails to identify the system as schedulable if C es ≥ 3.
The search for the parameters of τ es is carried out by Algorithm 1. The search strategy is rather simple. The idea is to set the largest possible value for C es assuming that τ es has a period equal to the hyper-period of the tasks in τ . That is, T es is initially set to the least common multiple of the periods of the tasks in τ (line 1). Once a valid value for C es is found, the algorithm tries to reduce the period (lines 14-22). Clearly, this simple strategy does not aim at finding the optimal parameters for τ es . Instead, it aims at maximizing the duration of each execution of τ es . As the systems considered in this work comprise relatively small task sets, we believe the strategy implemented in Algorithm 1 suffices. The algorithm performs a binary search in the interval [a = BET, b = (1 − U τ ) × HP(τ )] (lines 5-12) to seek a valid value of C es , where BET represents the break-even time of the system (recall Sect. 2.4). The reasoning for this initial interval is the following. First, switching to a low-power state for less than BET is not worth it. Also, as the maximum value allowed for T es is HP(τ ), any value of C es greater than (1 − U τ )T es would make the system utilize more than 100% of the processor.
When it is not possible to find a suitable value for C es during the search in lines 5-12, the algorithm stops in line 13. Note that the last feasible value of C es found during the search is saved in line 11. If a value of C es is obtained, there is an attempt to decrease the initial value of T es (lines 14-22) by searching for the least divisor of HP(τ ) that can be used as T es . Only values that do not make the system over-utilized are considered (line 14).
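The overall strategy can be sketched as follows, under two simplifying assumptions: parameters are integer ticks, and the brute-force dbf test stands in for the Q(.)/P(.) combination used by Algorithm 1 (so the line numbering differs). The function name find_es_task and the slot-n convention for the virtual task are ours.

```c
#include <stdint.h>

typedef struct { uint32_t C, T, D; } task_t;

/* Eq. (4): demand bound within any interval of length t. */
static uint32_t dbf(const task_t *tau, int n, uint32_t t) {
    uint32_t dem = 0;
    for (int i = 0; i < n; i++)
        if (t >= tau[i].D)
            dem += ((t - tau[i].D) / tau[i].T + 1) * tau[i].C;
    return dem;
}

/* Exact EDF test over all deadlines up to the hyper-period hp. */
static int feasible(const task_t *tau, int n, uint32_t hp) {
    for (int i = 0; i < n; i++)
        for (uint32_t d = tau[i].D; d <= hp; d += tau[i].T)
            if (dbf(tau, n, d) > d) return 0;
    return 1;
}

/* Sketch of Algorithm 1: binary-search the largest C_es in
 * [bet, idle] (idle = (1-U)*HP in ticks) with T_es = HP, then try
 * smaller divisors of HP as T_es, skipping over-utilizing choices.
 * tau must have a free slot at index n for the virtual task;
 * bet >= 1 is assumed. */
static void find_es_task(task_t *tau, int n, uint32_t hp, uint32_t bet,
                         uint32_t idle, uint32_t *C_es, uint32_t *T_es) {
    uint32_t lo = bet, hi = idle, best = 0;
    task_t *es = &tau[n];
    while (lo <= hi) {                        /* binary search for C_es */
        uint32_t c = (lo + hi) / 2;
        *es = (task_t){ c, hp, c };           /* D_es = C_es: no slack */
        if (feasible(tau, n + 1, hp)) { best = c; lo = c + 1; }
        else                          { hi = c - 1; }
    }
    *C_es = best; *T_es = hp;
    if (best == 0) return;                    /* no valid C_es found */
    /* For implicit-deadline tasks, dbf(tau, n, hp) = U * hp. */
    for (uint32_t t = best; t < hp; t++) {    /* try to shrink T_es */
        if (hp % t != 0) continue;            /* only divisors of HP */
        if (dbf(tau, n, hp) + (hp / t) * best > hp) continue; /* > 100% */
        *es = (task_t){ best, t, best };
        if (feasible(tau, n + 1, hp)) { *T_es = t; break; }
    }
}
```

On the running example (τ 1 : (1, 5), τ 2 : (3, 10), BET = 2), the search interval is [2, 5], the binary search settles on C es = 4, and T es = 5 is rejected for over-utilization, so τ es = (4, 10, 4) as in the text.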
For illustration, consider two tasks τ 1 : (1, 5) and τ 2 : (3, 10) and let BET be 2 time units. The initial search interval is set to a = 2 (BET) and b = 5 (0.5 × 10). The algorithm then starts the search in this interval, initially for C = 3. As the set of tasks τ ∪ {(3, 10, 3)} is deemed schedulable by test Q(.) (but not by test P(.)), the value C = 4 is tried. In this case, Q(.) again identifies the system as schedulable. As this is the maximum allowed value for C, C es is set to 4. Further, the value of T es is not reduced during lines 14-22, as both tests return not schedulable for τ ∪ {(4, 5, 4)}. As a result, task τ es is configured by Algorithm 1 with C es = 4 and T es = 10. Figure 7 shows the schedules produced by EDF with (on the top) and without task τ es . Note that when τ es is not considered, there are three idle periods during [0, 15), but only one is large enough to be used for energy savings, namely time interval [6, 10). The effect of introducing task τ es is to create larger continuous idle time intervals during which energy savings can be implemented. In this case, two slices of size 4 can be used for that purpose.
It is worth stressing that Algorithm 1 implements a simple strategy to search for a task τ es , as it keeps the same value of C es during the computation of T es . For more complex and larger systems, it may be convenient to jointly search for the task parameters (C es , T es ), aiming at maximizing the system utilization. However, as the systems considered in this work are in the domain of small embedded systems, more elaborate search strategies were not taken into consideration. Instead, we turned our attention to dynamically finding slack in the schedule, which demands some online computation, as will be seen next.

Fig. 7 All tasks are assumed to be initially released at time 0. The execution of the energy saving task is represented by gray boxes, which indicate when the system is put in a low-power state

Dynamic Slack EDF (DS-EDF)
In this section we adapt an approach originally developed for energy savings in multiprocessor real-time systems [11] scheduled by the RUN algorithm [31]. RUN first (a) transforms a multiprocessor scheduling problem into one or more uniprocessor scheduling problems in an offline manner, each of which is scheduled via EDF; and then (b) maps the EDF scheduling decisions back onto the multiprocessor system. The energy saving approach described in [11] is based on dynamically computing slack in the EDF schedule for implementing DPM. Our adaptation uses the same principles for computing slack in the EDF schedule, but simplifies the overall approach. First, phase (b) is absent and thus not considered. Second, as we are dealing with uniprocessor embedded systems with limited computing resources, we also reduce the types of events upon which slack computation is triggered. To make the description self-contained, the main concepts behind slack computation are described next.

DS-EDF dynamically computes slack intervals [t, t + Δ t ) at a given instant t so that the processor can be put in a low-power state for Δ t time units without compromising system schedulability. This computation takes place at instants t at which a job completes its execution and there is no other ready job to schedule. As an idle interval begins at t, the idea is to extend this interval as much as possible (by Δ t ), taking future jobs into consideration.
Let [t, t + Δ j,t ) be the idle time interval that can be inserted into an EDF schedule without making a job of task τ j miss its deadline at instant d j,t . During [t, d j,t ) the system has to execute the jobs with deadlines less than or equal to d j,t , if any. Denoting the demand due to these jobs (including that of τ j ) as c j,t , Δ j,t can be given by

Δ j,t = (d j,t − t) − c j,t    (6)

As no jobs of any task in τ may miss their deadlines and the task system is necessarily schedulable by EDF (recall Sect. 2), Δ j,t ≥ 0 and so Δ t can be computed as

Δ t = min_{τ j ∈ τ } Δ j,t    (7)

In general, the demand c j,t in (6) must be computed taking into consideration (a) the jobs released at or before t but not yet finished by t; and (b) the jobs that will be released after t with deadlines not greater than d j,t . However, as DS-EDF is called only upon the start of an idle period (i.e., a job completion time with no other ready job to execute), the demand due to (a) is null at t. Let r i,j,t be the latest release time of a job from τ i within [t, d j,t ], if any, and let d i,t be the latest deadline of a job of τ i that finished its execution by t. The demand due to (b) can then be expressed as

c j,t = Σ_{τ i ∈ τ } ((r i,j,t − d i,t )/T i ) C i    (8)

Note that d i,t ≥ t, since the processor is idle at t and τ i , being an implicit-deadline periodic task, releases a new job at each of its deadlines. If sporadic tasks were to be considered, the inner term in (8) should be modified to account for the maximum possible demand due to jobs of τ i within interval [t, d j,t ), which cannot be greater than ⌈(d j,t − t)/T i ⌉ C i . Based on the knowledge of BET and of Δ t , computed according to (7), the scheduler may switch to a low-power state at t with duration Δ t whenever Δ t > BET.

Fig. 8 Both tasks are assumed to be initially released at time 0. The second job of τ 1 , which would be executed within time interval [5, 6), has its execution postponed to interval [9, 10) as Δ 4 = 5. Thus, interval [4, 9) can be safely used for energy savings
For illustration, consider again the 2-task example, with τ 1 : (1, 5) and τ 2 : (3, 10), whose schedule is plotted in Fig. 8. At time 4, τ 2 finishes and the system becomes idle. The corresponding value of Δ t is 5, so the execution of the second job of τ 1 can be postponed until time 9.
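The computation of Δ t at an idle instant can be sketched as follows, assuming the scheduler keeps, per task, the latest deadline d i,t of a finished job (the field d_last and the names task_state_t and ds_edf_slack are ours). Since releases coincide with deadlines for implicit-deadline periodic tasks, the per-task term of (8) reduces to counting releases between d i,t and d j,t .

```c
#include <stdint.h>

typedef struct {
    uint32_t C, T;     /* implicit deadlines: D_i = T_i            */
    uint32_t d_last;   /* d_{i,t}: latest deadline of a job done by t */
} task_state_t;

/* Slack Delta_t at an idle instant t (Eqs. (6)-(7)): for each task
 * tau_j, the demand c_{j,t} of future jobs with deadlines at most
 * d_{j,t} is subtracted from the window length d_{j,t} - t; the
 * minimum over all tasks bounds the safe low-power interval.
 * Each Delta_{j,t} is nonnegative because the set is EDF-schedulable. */
static uint32_t ds_edf_slack(const task_state_t *tau, int n, uint32_t t) {
    uint32_t slack = UINT32_MAX;
    for (int j = 0; j < n; j++) {
        uint32_t d_j = tau[j].d_last + tau[j].T;  /* next deadline of tau_j */
        uint32_t c = 0;
        for (int i = 0; i < n; i++)
            if (d_j > tau[i].d_last)              /* jobs released at d_last,
                                                     d_last+T_i, ... with
                                                     deadlines <= d_j */
                c += ((d_j - tau[i].d_last) / tau[i].T) * tau[i].C;
        uint32_t delta = (d_j - t) - c;           /* Eq. (6) */
        if (delta < slack) slack = delta;         /* Eq. (7) */
    }
    return slack;
}
```

For the example of Fig. 8 (t = 4, d 1,t = 5, d 2,t = 10), the τ 1 window [4, 10) holds one job of demand 1, giving Δ 1,4 = 5, while Δ 2,4 = 10; the minimum reproduces Δ 4 = 5.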
In theory, DS-EDF tends to be more effective than SS-EDF, since it takes advantage of the slack dynamically available in the system, using slack periods (Δ t ) of different sizes accordingly. However, computing Δ t introduces extra overhead at the scheduler level: its complexity is O(n²) for n tasks. Experimental evaluation can better characterize both approaches.

Experimental evaluation
Our evaluation in this section is based on the execution of synthetic task sets (easy and hard), generated similarly to what was presented in Sect. 3.5. Due to its lower overhead (recall Sect. 3.5), EDF-L was chosen as the scheduler implementation. For energy saving characterization, we took the system without any DPM mechanism (under EDF-L) and compared it to three other approaches: SS-EDF, DS-EDF and EDF-LP. This latter approach is obtained by taking the EDF-L schedule and using its naturally available slacks for DPM purposes (i.e., whenever the schedule generates a slack greater than BET). This allows us to comparatively verify the effects of procrastinating task execution via SS-EDF or DS-EDF. Energy consumption was measured relative to EDF-L, i.e., as the percentage E X /E EDF-L × 100% (9), where E X is the energy consumed under approach X; hence, lower values indicate larger savings.
For measuring energy consumption, the same setup described in Sect. 2.4 for obtaining the values in Table 1 was used. Each run lasted 10 seconds, after which measurement stopped. Each measurement was carried out 10 times, and the results presented below correspond to the average of the measured values.

Energy consumption vs. system utilization
As can be observed in Fig. 9, EDF-LP presents the worst results when compared to SS-EDF and DS-EDF, for both easy and hard task sets. Note that, according to (9), the lower the value, the better. For hard task sets this difference becomes more evident. Systems with high utilization do not usually contain enough slack, which explains why all approaches behave similarly to each other for utilizations above 75%. Note also that DS-EDF shows slightly better performance than SS-EDF. Figure 10 shows the same results but exhibits the dispersion present in the collected data: the same system configuration may exhibit considerable variation in energy consumption. As two task sets with the same utilization contain tasks with different parameters, and slack sizes depend on task deadlines, this variation is somewhat expected.

Energy consumption vs. task set cardinality
The plots in Fig. 11 show the average energy consumption as a function of the number of system tasks, with system utilization fixed at 50%. Note that all approaches tend to consume more energy for larger systems. This is due to the fact that more scheduling events are generated (for SS-EDF and DS-EDF). Even EDF-LP exhibits this behavior, as at least the comparison with BET must be performed online. However, it can be seen that DS-EDF suffers the most as the number of tasks increases, which is explained by its quadratic online cost (in the number of tasks). For hard task sets with more than 50 tasks, for instance, DS-EDF behaves even worse than EDF-L (i.e., with no DPM mechanism in place). This was not observed for the easy task sets: as the number of scheduling events is lower for easy task sets, the DPM mechanism is called less frequently, which explains the observed behavior.

Conclusion
We experimentally compared EDF against RM, aiming at characterizing these policies for small embedded real-time systems based on FreeRTOS. As this application domain usually employs hardware with low computing resources, the experimental characterization we provided serves to support system designers. Two implementations of the EDF queue were considered, based on the classical min-heap data structure (EDF-H) and on multiple linked lists (EDF-L). EDF-L was shown to consistently perform better than EDF-H in all experimental setups. Further, although RM was shown to induce the lowest scheduling overhead, this was shown not to compensate for its poorer schedulability performance when compared to EDF-H. Although the debate on the use of dynamic priority versus fixed priority scheduling is not new, it is usually based on theoretical or simulated results. To the best of our knowledge, this is the first work to consider an actual implementation of EDF-L in FreeRTOS. As EDF makes better use of the processor, spare processing capacity can be used for energy saving purposes. In this context, we implemented two DPM mechanisms, based on determining available slack statically (SS-EDF) and dynamically (DS-EDF). These approaches were experimentally evaluated. The obtained results show that energy savings can be effectively achieved in embedded systems with low computing resources under EDF scheduling. DS-EDF was shown to be more effective than SS-EDF for systems with up to 20 tasks, but its higher runtime overhead affects its performance for larger and more complex task sets. In those cases, SS-EDF was shown to be the better choice.
Scheduling based on dynamic priority assignment, such as EDF, is not a usual option in small-scale real-time embedded systems. The results we presented help diminish possible reservations about adopting EDF as the underlying scheduler and show its potential benefits for energy savings. Evaluations considering other platforms and sporadically released tasks are left for future work.