Evaluating the Performance of Different Embedded Algorithms in Python
May 16, 2022
Mohammed Billoo, founder of MAB Labs, looks at how Percepio Tracealyzer can be used to evaluate the performance of an algorithm written in different ways into Python.
Python is becoming more common in embedded application development, particularly for machine learning frameworks that run at the edge of the network. However, this high-level general-purpose programming language abstracts away many details in the code that can impact the performance of an implementation in ways that developers may not be aware of.
Let’s take an obvious example: computing the Fibonacci sequence. There are at least two different ways to perform this, a recursive algorithm and a standard iterative algorithm, with greatly varying levels of performance.
The performance of different implementations or algorithms can be assessed with a tool called Tracealyzer. This is a visual trace diagnostics tool from Percepio that gives embedded software developers insight into the code at runtime for easier debugging of system-level issues and helps them improve the design and performance of their software.
Tracealyzer can be used side-by-side with traditional debuggers such as the open-source Eclipse tools and complements the detailed debugger view with several additional views on system level. This helps understand real-time issues where a classic debugger is not sufficient.
Combined with the LTTng open-source tracing package in a Linux operating system distribution, Tracealyzer can show the different levels of performance. This is independent of the processor, and a result of the chosen algorithm.
For the evaluation, each implementation of the Fibonacci sequence is performed in a single module:
if n <=1 n:
return(recur_fibo(n-1) + recur_fibo(n-2))
result = 
a,b = 0,1
while a < n:
a,b = b, a+b
There are separate Python source files that call the two functions above:
logger = logging.getLogger(‘my-logger’)
if __name__ == ‘__main__’:
The following commands capture a trace in LTTng that can then be examined in Tracealyzer:
$> lttng create
$> lttng enable-event --kernel sched_switch
$> lttng enable-event --python my-logger
$> lttng start
$> lttng stop
$> lttng destroy
Replacing the standard Python logger with one called “my-logger” allows Tracealyzer to show events in the tool’s Trace View. As Tracealyzer is not capturing any application data in this particular example, the software does not need to be configured to read data values. Instead, all that is needed is a Custom Interval that marks the entry and exit of both functions.
While a substantial performance difference can be seen in the Trace View above, Tracealyzer can also provide more tangible performance metrics. This can be done by going to Views and clicking on Intervals and State Machines and create a custom interval using the “Start” and “Stop” strings that were inserted with the logger.info() calls in the code and which mark the entry and exit of the candidate functions.
The Interval Plot shows that there is a 20x difference between the recursive algorithm (which was executed first) and the iterative algorithm (which was executed second).
In this example we only compute 10 Fibonacci numbers with each algorithm. Without Tracealyzer, many more iterations would probably be needed to gain some meaningful insight, but this is problematic for two reasons. First, when running the recursive Fibonacci algorithm up to 1000 (or even 100), Python will simply sit there. This would be concerning as it would not be clear whether this non-response is due to a bug in the implementation or something else. In this case, we can probably guess why this happens, but for more complex problems, substantial logging would be required to understand where the bottleneck is.
Second, if there are multiple applications running on an embedded system, these other applications could be disrupting the target application which would also increase the time for the algorithm or function to complete execution. Without a trace, there is no easy way to find out if this is the case.
Instead, the combination of LTTng in Python and Tracealyzer highlights that it is the fundamental characteristic of the chosen algorithm that is the issue. This is invaluable when developing more complex algorithms. This example implementation serves as a reference on how the performance of future algorithm implementations can be evaluated. Implementing the core functions in separate Python modules is good programming practice in general, and this also simplifies the tracing of specific functions.
As the trace overhead is almost negligible, tracepoints can be kept in an application as it is tested on the target embedded system and even in production, allowing the Tracealyzer tool to generate performance metrics in the production codebase as well. This is tremendously useful for regular system testing and allows the same codebase to be used to ensure that the application is both functionally correct and performant, with only minimal changes.
Using Tracealyzer and LTTng to capture performance metrics in a Python application provides invaluable analysis of the implementation of an algorithm.
The minimal overhead of this approach means the instrumentation of the code can be retained for use on a target embedded system. This enables more monitoring of the target application and boosts the analysis of interactions with other applications and the operating system as well. For example, there may be another process or thread that pre-empts the target application and impacts performance. The combination of Tracealyzer and LTTng can identify the causes of such anomalies, and this allows developers to refine the implementation to guard against further problems.
While the example implementation of the Fibonacci sequence is relatively innocuous, it highlights a key characteristic of the Python language that can inform the development of more complex implementations.
This example also shows the value of using separate modules in the design. Using trace, developers can measure and validate the performance of the key core functions in these modules without significant overhead, before expanding into a full system implementation. This can help to demonstrate that an application is functionally correct and performant with minimal changes in the target environment.
For details on how to install the necessary software package, visit LTTng’s website here.