eheap Part 1: Configuration

By Ralph Moore

President

Micro Digital

March 04, 2016

eheap Part 1: Configuration

Many embedded systems use no heap. Many of those that do use a heap use a serial heap that must be searched sequentially, and thus can be very slow. M...

Many embedded systems use no heap. Many of those that do use a heap use a serial heap that must be searched sequentially, and thus can be very slow. Most other embedded systems probably use the widely available dlmalloc, which provides fast operation at the expense of a fixed structure that cannot be altered.
A new embedded heap called eheap provides a middle ground between these extremes. It provides high performance, adaptability, and safety for embedded systems running on RTOSs or stand-alone. eheap is designed to be easily customizable to specific embedded systems. It conforms to the ANSI C Standard for malloc(), free(), realloc(), and calloc(), and offers additional functions for heap control and safety.
This is the first in a series of three posts on eheap:
• eheap Part 1: Configuration
• eheap Part 2: Enhanced debugging
• eheap Part 3: Self-healing
eheap is a bin-type heap, like dlmalloc. It is assumed that the reader is familiar with this type of heap structure or is not interested in the details of how these heaps operate. Hence, this and subsequent posts will discuss what is unique about eheap and not how it works. For those desiring more detailed information, see eheap vs. dlmalloc.
Embedded system characteristics
eheap is designed for the following:
• Wide range of RAM sizes from very small to large.
• Expected to run forever.
• High performance and deterministic operation are required.
• High-priority tasks must be able to preempt and run quickly.
• Small code size is often necessary.
• Each embedded system has a relatively narrow range of heap requirements.
• Embedded systems have significant idle time.
• There is a growing need for self-healing in embedded systems.
Bin configuration
Like other bin-type heaps, eheap has a small bin array (SBA) for small chunks and an upper bin array (UBA) for large chunks. The SBA is like an array of block pools, but not as fast. Unlike dlmalloc, for which the bin sizes in the UBA are successive powers-of-2 bytes (256, 512, and so on), eheap permits bin sizes to be chosen to fit the application. eheap bin sizes are determined by a constant bin size array, specified by the programmer. For small bins, the bin size is the chunk size of the bin.[1] For large bins, the bin size is the smallest chunk size in the bin. The maximum chunk size of a bin is the size of the next bin – 8.[2]
The bin size array can be located in ROM, for safety, or in local RAM, for higher performance. It is used by the heap initialization function to create the bins and all related variables needed by eheap. The heap can be quickly reconfigured by changing values in the bin size array, recompiling, and restarting the application. Such rapid reconfiguration supports a heuristic approach to tuning eheap for best performance for a specific system.
Small system example
A small system might have a bin size array as shown in Figure 1.

[Figure 1]
For this system, the SBA consists of bins 0, 1, and 2 holding chunk sizes 24, 32, and 40, respectively; bin3 contains 10 sizes from 48 to 120; bin4 is a small bin containing only the size 128; bin 5 contains 14 sizes from 136 to 248; and bin6, the top bin, contains all sizes from 264 on up. The -1 marks the end of the size array. In this particular embedded system there is a limited need for small chunks, some need for medium chunks, and little need for large chunks above 264 bytes. Note that a small bin has been added for 128-byte chunks, which are frequently used in this system.
Upper bin searches
eheap and dlmalloc are similar with regard to their SBAs. Both are optimized for very fast accesses, as typically needed by object-oriented programs. A major difference is in the UBAs, where dlmalloc builds trees during free() operations to guide best-fit allocations during malloc() operations. Tree node links are cleverly stored in the chunks themselves.
For eheap, each bin heads a free list of chunks that fit in the bin. During a free() operation, if the chunk is larger than the bin’s first chunk, it is put at the end of the bin free list; otherwise, it is put at the start of the bin free list. During idle time, large bin free lists are sorted by increasing chunk size. During a malloc() operation the large bin free list is searched for the first large enough chunk, which will also be the best fit.
eheap best fits systems that have favored chunk sizes and do not use a wide range of chunk sizes. As shown in the previous example, a small bin can be put into the UBA to provide a frequently used size. Another way to accomplish this result is to make a large bin size equal to a frequently used chunk size. Then the best fit will always be the first chunk in the bin, and larger sizes will follow after it.
To some degree, large bins operate like mini serial heaps that have been sorted. Excessive search times can be bounded by judicious choice of bin sizes vs. system needs and judicious use of merging, as described later. A tenet of heap design is that larger chunks take longer to process, so longer malloc() times do not degrade average performance. Of course, a malloc() cannot take so long that higher-priority tasks miss their deadlines.
Figure 2 shows a one bin heap example.

[Figure 2]
In this case, there is no SBA and no UBA, only the top bin. This heap approaches the serial type heap, but still offers advantages over it.
Figure 3 shows a no SBA heap example.

[Figure 3]
Embedded systems written in C may have little use for small chunks, or alternatively, block pools may be preferable for small chunks in specific systems. In this case bin 0 starts with 24 because all sizes from 24 bytes and up must be covered, but bin 0 might be used only for chunks 64 bytes and larger.[3]
Figure 4 shows a no UBA heap example.

[Figure 4]
In this heap, the SBA covers chunk sizes 24 to 64, and all other chunks go into bin 6, which covers 72 bytes and up.
Nearly limitless heap configurations are possible. Currently, eheap is limited to 32 bins for best performance. This limit can be increased if more bins prove to be beneficial for some em-bedded systems.
Donor chunk
eheap has an optional donor chunk (dc), which is similar to dlmalloc’s designated victim chunk (dvc), but operates differently and serves different purposes:
• Initial source of fast small chunk allocations.
• Segregation of small chunks into the lower heap and large chunks into the upper heap.
• Localization of small chunks for cache-based systems.
If enabled, space for dc is allocated below the top chunk (tc) by the programmer. Normally, it is a small fraction of total initial free space, which is the combination of dc and tc. If the selected SBA bin for allocation of a small chunk is empty, the chunk is calved from dc, rather than taking it from the next larger occupied bin. This is not only faster, but also avoids depopulating larger bins. Since dc is initially the lower part of the heap, small chunks will come from the lower heap. This improves localization for small chunk allocations and also tends to segregate the heap by chunk size to help reduce in use small chunks from blocking merging of larger chunks.
After the heap has been in use for a while, small chunk allocations will primarily come from SBA bins and seldom from dc. At this point using dc can be turned off and what is left of it freed to a bin. Alternatively, dc can continue to be used. As freed chunks below and adjacent to it are merged into it, it will grow and as chunks are allocated from it, it will shrink. This will improve performance for empty SBA bins.
chunk merge and leaky bins
An important difference between dlmalloc and eheap is that dlmalloc always merges adjacent free chunks, whereas eheap permits deferred merging. eheap merging is controlled by its cmerge mode, which an application can turn ON of OFF. It is OFF following heap initialization.
The reason for deferred merging is that chunk merging produces leaky bins. For example, suppose a 24-byte chunk is freed to bin 0 and a physically adjacent 48-byte free chunk resides in bin 3. If merging is enabled, these chunks will be merged into a 72-byte chunk, which will be put into bin 6. This is not conducive to best performance if the application needs 24- and 48-byte chunks and not 72-byte chunks. Additionally, de-queuing the 48-byte chunk and merging it requires additional time during the free() operation. This is wasted time if a subsequent malloc() gets the 72-byte chunk, splits it into 24- and 48-byte chunks to get either, then re-queues the remnant.
Unlike general processing, which may use a wide variety of chunk sizes, embedded systems tend to use a limited repertoire of chunk sizes. Hence, leaky bins are not optimum for embedded systems.
cmerge may be directly controlled or automatically controlled. If the automatic mode, amerge, is ON, cmerge will be turned ON when free space drops below a lower limit and it will be turned OFF when free space rises above an upper limit. The limits are determined by configuration constants. Alternatively, the number of free chunks, the average free chunk size, or the total space in bins can be used, depending upon what works best in a specific system.
Summary
eheap is a new embedded heap that can be customized to a specific embedded system by adjusting the bin size array, deciding whether to use dc and how large it should be, and deciding whether or not to use deferred merging and, if used, how to control it. These options allow tuning eheap to a specific system in order to achieve optimum system performance without fragmentation causing allocation failures. eheap is freely available for non-commercial use. See www.smxrtos.com/eheap.
References
[1] Heaps work with chunks, whereas applications work with data blocks, which are contained in chunks. Chunks are larger because they also contain metadata to manage the heap.
[2] All chunks have sizes that are multiples of 8 bytes and are located on 8-byte boundaries. This is fairly standard among heaps.
[3] Splitting of larger chunks could result in chunks smaller than 64 bytes being put into bin 0. However, eheap allows specifying a MIN_FRAG constant to prevent this.

Ralph Moore, President and Founder of Micro Digital, graduated with a degree in Physics from Caltech. He spent his early career in computer research, then moved into mainframe design and consulting.
Micro Digital
www.smxrtos.com

[email protected]

Ralph Moore, Micro Digital

I am no longer running the daily business at Micro Digital. Instead, I have been involved for the past four years in improving the smx RTOS kernel. smx is a hard-real-time multitasking kernel, which is intended for embedded systems that require high efficiency and high performance.

More from Ralph