Implementing AI in Fuzz Testing: A Q & A with Paul Butcher, Senior Software Engineer at AdaCore Ltd
October 28, 2022
Story
There are many challenges that come with AI testing in safety-critical applications, as well as a variety of processes that can be used to find and fix security vulnerabilities using AI. We interviewed Paul Butcher, Senior Software Engineer at AdaCore Ltd, to discuss the most common security vulnerabilities impacting today's developers when integrating AI and ML in testing processes like fuzz testing, and what solutions AdaCore provides to help combat these concerns.
Embedded Computing Design:
So what is the process like for finding and fixing security vulnerabilities using AI?
Paul Butcher, AdaCore:
There's lots of different ways of doing it, two that immediately spring to mind and it's areas that we've been researching at AdaCore. And I may be setting myself up here, and you may find that some of your listeners might argue that they're not strictly machine learning or AI techniques. However, in my opinion, machine learning is all about if you can get an algorithm to use an existing data set or even experiences that it's learned over time, which could be executions of an application, and it can optimize the performance of itself, or it's the application that it's controlling based on that data set, or the experiences, or the state that's been changed by certain executions, then it's a machine learning algorithm arguably, in my opinion. Two key things that I can think of that fall into this category, and one is symbolic execution, and the second one is input-to-state correspondence.
Symbolic execution is quite a complicated topic, but instead of executing an application in the normal sense, you execute the application symbolically, where you think of the data that's being manipulated to state through as the program is executing through the control flow, you think of it as symbols. And the idea behind using symbolic execution as a solution for increasing your coverage within a fuzz testing campaign is that you can execute test cases that the fuzzer is determined is interesting, and then you can symbolically execute those test cases. And wherever there's a path constraint, you can solve it using the satisfiability modulo theories (SMT) solvers, one, to determine whether you can actually reach the diverging branch, and two, if you can, then you can work your way back through the execution and determine what the input would've needed to have been to take the diverging branch. So symbolic execution is a nice way of generating additional inputs to increase the coverage of a fuzzing campaign.
The other technique, input state correspondence, is quite an interesting approach where you theorize that there is a strong correlation between constant values within comparison instructions within your code base and the input data into those algorithms. If you had a statement in your source code that said something like, If X equals 10, then we can theorize that if we fed the value 10 into that algorithm, there's a reasonably high likelihood that we would solve that branch. It's not guaranteed, there could be other calculations performed on X before that comparison instruction, but that doesn't really matter. Because we're executing at a very high rate, we can have false positives with this approach and throw them away.
So, in fact, there's multiple ways of achieving it. One common approach, the approach that was taken in the REDQUEEN paper, and common fuzzing engine that I've spoke about in the past, which is AFL, implements it using a technique called COMPlog. And what they do is they add additional instructions into the generated assembly code around these comparison statements, and they will identify that there's a comparison going on and there's a magic number being compared against some state, and then it will feed the state value and the magic number back into the fuzzer. So, in the example of if X equals 10, we would give the fuzzer the value that's being held in the variable X, and we would give it the number 10. And then all the fuzzer needs to do is look through the test case, find the value that is held in the variable X and swap it for the value 10.
Now, there could be multiple places where there's a value that matches, but that doesn't really matter. We can just keep trying and swapping those values, and executing those tests until we get to a point where we've taken that diverging branch. So input a state correspondence is a really nice way to solve the problem of how do we mutate seeds intelligently. And it's not a particularly difficult problem to solve, but there are complexities around it. If you take the approach that you instrument through an additional compiler pass, then you have to accept that compilers are complicated beasts, and something as simple as if X equals 10 could be actually implemented in lots of different instruction calls. It may not just be a basic comparison instruction, but again, this approach isn't really affected by false positives, because we can keep trying lots of techniques until we get to a point where we can get through that divergence.
Embedded Computing Design:
What is fuzz testing and how can it be blended with AI?
Paul Butcher, AdaCore:
Fuzz testing in its simplest form is an automated software testing technique. It is all about the generation of test case input data, and it differs to unit testing in that we're not just firing an input into an algorithm and then checking that the output matches an expected output. What we're actually doing is we are firing the tool generated input, the auto generated input into the algorithm, and then we're going to observe the behavior of that algorithm. And more specifically, we're going to see if it falls over.
Falling over could mean that the run time of that application has just determined that a constraint check has failed, and it could raise a runtime exception, or it could be something more severe like the application's just written into an area of memory that it shouldn't, and It's got a segmentation fault and completely fallen over. The way that machine learning and AI and the sort of techniques that I've just talked about can help fuzz testing is all based around the fact that fuzz testing typically does this approach of mutation-based test case generation, which is based on randomly generating those values. So what we want to try and do is add some intelligence around that. So things like symbolic execution and input-to-state correspondence can achieve that.
Embedded Computing Design:
How are today's developers integrating AI with fuzz testing?
Paul Butcher, AdaCore:
Where I've talked about symbolic execution and input-to-state correspondence or REDQUEEN as it's more commonly known, one interesting point to make about these two techniques is that individually, they add a lot of value based on the ways that I've described. But as I've also stated, they've both got their own limitations. When you combine those techniques together and you integrate them into a fuzz testing solution, which is what we are aiming to do at AdaCore with our fuzz solution, you bring a really powerful solution to the table. And the reason being is that the limitations of these capabilities are actually, in many ways, they're canceled out by the benefits of the other.
So an example would be where symbolic execution can suffer from things like path explosion, it wouldn't make sense to use symbolic execution to try and solve a type branch with a case as simple as the example we used, if X equals 10. That's just going to be a waste of processing power. So we would just use, in our case, the AFL COMPlog approach to solve that particular branch. And where we're only looking at the test cases that increase coverage, the ones that we add onto the queue, we don't actually need to feed all of these into the symbolic executor. We can do it at a much slower rate and accept that symbolic execution takes longer to achieve.
There's an example of where input-to-state correspondence has a limitation where symbolic execution doesn't; it would be something like encodings. So if we go back to the original, the early example of a test case file that is in a JSON format, if we've added an instrumentation around our assembly code, we've executed our binary with our input-to-state correspondence instrumentation, and it's fed that value back into the fuzzer. But actually, the representation of that value within the test case is some encoding, which it could be an ASCII based in coding, it's not going to help to feed that value back in.
And there are examples of input-to-state correspondence based fuzzing solutions where they can make some predictions on what that encoding is and then encode the value. But if it's some unusual obscure encoding, it's not going to be able to achieve that. Symbolic execution doesn't suffer from that at all. It doesn't care about the encoding, it just uses the theorem solvers to work back through the control flow to work out what the input is. What we are finding is, if you use these two tools on their own, they will increase performance. You will get to a point where your fuzzing campaign will reach saturation in that it's explored your code base and found as many vulnerabilities as it can in as quick a time as possible. You'll get to that state quicker. If you bring both of those solutions together and integrate them into a fuzzing tool, then it's like putting the whole thing on steroids. It goes much, much quicker and increases the performance tenfold.
Embedded Computing Design:
So Paul, do you have any recommendations for our listeners to go and find more information about benchmarks, application examples, or anything else?
Paul Butcher, AdaCore:
Yeah, absolutely. If you're interested in symbolic execution, then I highly recommend the two tools I mentioned earlier, KLEE and SYMCC. They're actively being developed and researched, but they take quite a different approach to implementing symbolic execution solutions. There's pros and cons for both of those tools, and I highly recommend you go in and check them out.
If you are interested in the input state to correspondence theory, then REDQUEEN would be your first stopping point. You can just Google REDQUEEN fuzzing, and there's a really nice paper that's been put together by the team that came up with the theory. If you're using AFL as your fuzzing solution, then there's already a input state to correspondence solution that's been fed into AFL and there's an LLVM based solution, and more recently, AdaCore has upstreamed a GCC based solution for that capability.
And I suppose finally, if you would like to learn more about what we are doing at AdaCore, please go and have a look at our GNAT DAS webpage. It's our Dynamic Analysis Suite and from there, you'll find all the information that you need to know about our GNAT fuzz solution. And there's some technical papers that we put together. You can even look at the User Guide and find out much more information about it there.
Additonal Resources:
1. https://blog.adacore.com/fuzzing-out-bugs-in-safety-critical-embedded-software
2. Embedded Computing Design's Embedded Toolbox with Paul Butcher.