Fuzzing is an automated methodology for testing software. In this blog post, I’ll present some fundamental concepts of fuzzing, including the advantages and disadvantages of various fuzzers and strategies for addressing particular issues.

Let’s examine some scholarly definitions to clarify what fuzzing entails:

  • Fuzzing is a highly automated testing technique that covers numerous boundary cases using invalid data (from files, network protocols, API calls, and so on) as application input to better ensure the absence of exploitable vulnerabilities. The name derives from modern applications’ tendency to fail due to random input caused by line noise on “fuzzy” telephone lines - as defined by Oehlert.

  • Fuzzing has a single goal: to crash the system. It stimulates the software with a multitude of inputs intended to expose any reliability or robustness flaws - as stated in Fuzzing for Software Security Testing and Quality Assurance.

In essence, fuzzing is a form of negative testing. Its main objective is to flood the program’s input with a stream of data (randomly generated, intelligently crafted, or somewhere in between) in order to identify failures or potentially exploitable issues within the program’s code. The key lies in delivering this data through the program’s different communication interfaces.

Fuzzing should not be viewed as a substitute for code auditing, reverse engineering, or other software quality assurance practices: no single technique provides a fail-safe solution for bug detection. However, studies [1] report that fuzzing triggers failures in roughly 80% of the code tested, paving the way for subsequent fixes.

Brief History

Fuzzing was born out of the cheap, and curious, world of randomness.

  • 1983: The Monkey
  • 1988: The Internet Worm
  • 1989-1991:
    • Boris Beizer explains Syntax Testing (similar to robustness testing).
    • “Fuzz: An Empirical Study of Reliability . . .” by Miller et al. (Univ. of Wisconsin)
  • 1995-1996:
    • Fuzz revisited by Miller et al. (Univ. of Wisconsin).
    • Fault Injection of Solaris by OUSPG (Oulu University, Finland).
  • 1998: ISIC fuzzer for IPv4
  • 1999-2001:
    • PROTOS tests for: SNMP, HTTP, SIP, H.323, LDAP, WAP, . . .
    • Peach fuzzer from Michael Eddington, the most popular fuzzing framework still in use
    • Spike from Dave Aitel
  • 2002:
    • Codenomicon launches with GTP, SIP, and TLS robustness testers.
    • Click-to-Secure (now Cenzic) Hailstorm web application tester.
    • IWL and SimpleSoft SNMP fuzzers (and various other protocol specific tools).
    • SSHredder from Rapid7
  • 2003:
    • Open source fuzzers: dfuz, Flayer, Scapy
  • 2005–2006:
    • Open source fuzzers: antiparser, autodafe, AxMan, GPF, JBroFuzz, WSFuzzer
    • Commercial fuzzers: beStorm from Beyond Security, Flinder from SEARCHLAB, Mu-4000 from MuSecurity (now Spirent)
    • Exploratory fuzzing, EFS from Jared DeMott, using feedback loop from code execution to craft new test sequences
  • 2007:
    • Open source fuzzers: ProxyFuzz
    • Commercial: FuzzGuru from Microsoft, Achilles from Wurldtech (now GE), BPS-1000 from BreakingPoint (now Ixia)
    • SAGE from Microsoft Research and CSE, using constraint solvers and coverage data to generate new tests
    • KIF fuzzer explores state diagrams, by Humberto Abdelnur, Olivier Festor, and Radu State
    • In-Memory Fuzz POC by Adam Greene, Michael Sutton and Pedram Amini, applying mutations inside the process
  • 2008:
    • Sulley from Aaron Portnoy and Pedram Amini
    • Defensics 3.0 from Codenomicon
  • 2009:
    • Traffic Capture Fuzzer from Codenomicon uses protocol dissectors to model protocols
  • 2010:
    • Radamsa from OUSPG using genetic algorithms to dissect protocols and build protocol models
  • 2014:
    • AFL by Michal Zalewski, using compile-time instrumentation and genetic algorithms to discover new paths in code
  • 2015:
    • LLVM libFuzzer, in-process coverage guided fuzzer using Sanitizer Coverage instrumentation
  • 2016:
    • Microsoft announces SAGE based fuzzing service, Project Springfield
    • Google announces OSS-Fuzz, its open source software fuzzing project

Classification

Classifying fuzzers is somewhat difficult because no two groups agree completely on the definitions related to fuzzing. Here are the three main categories of fuzzers.

  • A black box fuzzer tends to focus on final requirements, system stability, and exposed interfaces. The interfaces to a system can include, for example: user interfaces (GUI, command line), network protocols, data structures such as files, and system APIs such as system calls and device drivers (how to detect the input interface is discussed later). These kinds of fuzzers execute the program as a functional abstraction. On the plus side, they run the real code in a real environment, but the search is hard to carry out because the input space is effectively infinite (a minimal sketch of this approach follows this list).

  • With a white box fuzzer, full access to the source code is granted. Usually these fuzzers are coverage based: the program is instrumented with specific coverage instructions, and a third-party component listens for feedback, as in AFL (take a look at Kelinci). This is the only way to approach 100% coverage, even though some black-box techniques achieve very good accuracy.

  • A gray box fuzzer combines elements of the two previous categories. One approach is to implement a testing interface in the final application binary, to be used by the fuzzer together with feedback built into the code. It uses the internals of the software to assist in designing tests for the external interfaces. In some cases, grammars are used to generate well-formed inputs.
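To make the black box idea concrete, here is a minimal sketch in Python that drives an external target through one exposed interface (a file argument) with purely random inputs and watches only the process exit status. The target path is hypothetical, and real black box fuzzers generate inputs far more intelligently.

import os
import random
import subprocess
import tempfile

TARGET = "./target_app"  # hypothetical binary that parses the file passed as argv[1]

def random_blob(max_len=1024):
    # A pure black box fuzzer knows nothing about the expected input format.
    return bytes(random.randrange(256) for _ in range(random.randrange(1, max_len)))

for _ in range(10000):
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(random_blob())
        path = f.name
    try:
        proc = subprocess.run([TARGET, path], capture_output=True, timeout=5)
        if proc.returncode < 0:  # on POSIX, killed by a signal (e.g. -11 == SIGSEGV)
            print(f"crash (signal {-proc.returncode}), input kept at {path}")
        else:
            os.unlink(path)
    except subprocess.TimeoutExpired:
        print(f"hang, input kept at {path}")

Note that the only feedback available is the exit status: the fuzzer has no idea which code it exercised, which is exactly the limitation the white and gray box approaches address.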

Fuzzer input methods

The heart and soul of a fuzzer is its ability to create good inputs.

  • Generation
    • Generation-based fuzzers do not require any valid test cases. Instead, this type of fuzzer already understands the underlying protocol or input format and can generate inputs based purely on this knowledge. A drawback of this approach is that if you are interested in testing an obscure or proprietary format, the fuzzer may not be preprogrammed to understand it.
  • Mutation
    • This method consists of first gathering valid inputs to the system and then adding anomalies to them. These valid inputs may consist of a network packet capture, valid files, or command line arguments, to name a few. There are a variety of ways to add anomalies: they may be added randomly, ignoring any structure available in the inputs. These fuzzers work on a general principle: start from valid inputs and add a number of anomalies to generate fuzzed inputs (see the sketch after this list).
  • Pure random stream generators
    • This is a low-speed option. For instance, the then branch of the conditional statement “if (x == 10) then” has only a 1 in \(2^{32}\) chance of being exercised if x is a randomly chosen 32-bit input value. In a security context, this limitation means that potentially serious bugs, such as buffer overflows, may be missed because the code containing them is never even exercised. In practice, time constraints limit the effectiveness of this approach, and due to its simplicity it is unlikely to yield significant results.
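As a concrete illustration of the mutation method, here is a minimal sketch of a mutation engine in Python. It starts from a captured valid input and injects a few classic anomalies (bit flips, boundary integers, block duplication); the operators, rates, and the seed file name are illustrative assumptions, not a reference implementation.

import random
import struct

# Boundary values that historically trigger integer and buffer handling bugs.
INTERESTING_32 = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]

def mutate(seed: bytes, n_anomalies: int = 4) -> bytes:
    data = bytearray(seed)
    for _ in range(n_anomalies):
        op = random.choice(("bitflip", "interesting", "duplicate"))
        if op == "bitflip":
            data[random.randrange(len(data))] ^= 1 << random.randrange(8)
        elif op == "interesting" and len(data) >= 4:
            pos = random.randrange(len(data) - 3)
            data[pos:pos + 4] = struct.pack("<I", random.choice(INTERESTING_32))
        elif op == "duplicate":
            pos = random.randrange(len(data))
            data[pos:pos] = data[pos:pos + 16]  # re-insert a chunk, breaking length assumptions
    return bytes(data)

# Usage: derive fuzzed variants from a valid captured input (hypothetical file).
seed = open("valid_packet.bin", "rb").read()
for _ in range(5):
    print(mutate(seed).hex()[:64])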

How to create a fuzzer?

We can follow the pseudo-code below to construct a fuzzer:

Input: Seed Corpus S
  Sx = {} // Crashing inputs found so far
  repeat
    s = ChooseNext(S) // Search strategy
    p = AssignEnergy(s) // Power schedule (how often should the chosen seed be fuzzed?)

    for i = 1 to p
      s1 = MutateInput(s)
      if s1 crashes then
        add s1 to Sx
      else if isInteresting(s1)
        add s1 to S
  until timeout reached or abort-signal
Output: Crashing Inputs Sx

The algorithm has four important pieces:

  • How do we choose a good seed?
  • How many times should the seed be mutated?
  • How do we mutate a seed candidate?
  • Which of the newly generated/mutated inputs is a good candidate for a new seed?

This is the same structure as a basic, textbook genetic algorithm. [12] and [13] show very encouraging results from making changes to the cornerstones listed above. A runnable sketch of this loop is shown below.
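Here is a minimal, runnable Python sketch of that loop. The instrumented target is simulated by a toy function that reports which “branches” it hit (real greybox fuzzers obtain this signal from compile-time or binary instrumentation), and the search strategy and power schedule are deliberately the simplest possible choices.

import random

def target(data: bytes) -> frozenset:
    # Stand-in for an instrumented program: returns the set of branches executed.
    hits = set()
    if data.startswith(b"FUZZ"):
        hits.add("magic")
        if len(data) >= 8 and data[4] == 0xFF:
            hits.add("deep")
            raise RuntimeError("simulated crash")
    return frozenset(hits)

def mutate_input(data: bytes) -> bytes:
    buf = bytearray(data or b"\x00")
    pos = random.randrange(len(buf))
    if random.random() < 0.5:
        buf[pos] ^= 1 << random.randrange(8)  # flip one bit
    else:
        buf[pos] = random.randrange(256)      # overwrite with a random byte
    return bytes(buf)

def fuzz(seeds, iterations=50_000):
    S = list(seeds)                  # seed corpus
    Sx, seen_coverage = [], set()    # crashing inputs, branches seen so far
    for _ in range(iterations):
        s = random.choice(S)         # ChooseNext: uniform search strategy
        for _ in range(10):          # AssignEnergy: constant power schedule
            s1 = mutate_input(s)
            try:
                cov = target(s1)
            except Exception:
                Sx.append(s1)        # s1 crashes: keep it
                continue
            if not cov <= seen_coverage:  # isInteresting: s1 reached a new branch
                seen_coverage |= cov
                S.append(s1)
    return Sx

crashes = fuzz([b"FUZZFUZZ"])
print(f"{len(crashes)} crashing inputs found")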

Some available fuzzers

Modern fuzzers do not focus solely on test generation; they contain different functionalities and features that help with both test automation and failure identification.

* Some of the tools listed below aren’t available for free.

  • [GPF] GPF is an open source, mutation-based network fuzzer that can fuzz server or client applications. GPF parses the packets and attempts to add anomalies in as intelligent a way as possible. It does this through the use of tokAids (implemented by writing C parsing code and compiling it directly into GPF). GPF does not have any monitoring or analysis features.

  • [TAOF] The Art of Fuzzing is an open-source, mutation-based network fuzzer written in Python. It works from an initial packet capture: TAOF captures packets between the client and the server by acting as a man-in-the-middle. It does not know any of the protocols; instead, it presents the packets to the user in a GUI, and the user must dissect the packets and inform TAOF of the packet structure, including length fields. One drawback of TAOF is that, in its current implementation, it cannot handle length fields within another length field, which rules out fuzzing many protocols. TAOF does not have any monitoring or analysis features.

  • [ProxyFuzz] ProxyFuzz is a man-in-the-middle, non-deterministic network fuzzer written in Python. In other words, it is exactly what it claims to be: a proxy server that fuzzes traffic. ProxyFuzz randomly changes (fuzzes) the contents of the network traffic. It supports the TCP and UDP protocols and can also be configured to fuzz only one side of the communication. ProxyFuzz is protocol agnostic, so it can randomly fuzz any network communication. It is a good tool for quickly testing network protocols and producing basic proofs of concept. (Unchanged since 2017.)

  • *[Mu-4000] The Mu-4000 is an appliance-based fuzzer from Mu Dynamics (Spirent Communications Inc). It is a generation-based fuzzer that currently understands 55 different protocols. It is placed on the same network as the target system and is configured and controlled via a Web browser. Within the protocols it understands, it is extremely easy to use and highly configurable; options such as which test cases are sent at which timing periods can be precisely controlled. Furthermore, the Mu-4000 can be used as a pure proxy to send test cases from other fuzzers in order to use its monitoring functions, but it cannot learn or be taught new or proprietary protocols: it can only be used against protocols it understands. Another drawback is that the Mu-4000 can only be used to fuzz servers and cannot fuzz client-side applications.

  • [FuzzGuru] Designed for Windows, FuzzGuru has a GUI and scriptable automation. C++, C#, and Java code can be fuzzed if it processes data from an untrusted source. As a drawback, if a bug only manifests when two independent fields are malformed at the same time, FuzzGuru will not find it.

  • *[Achilles] The Achilles project was a success, and Wurldtech Security Technologies emerged as the leading provider of security solutions for SCADA, process control, and mission-critical industries: the first company to offer a comprehensive suite of products and services designed specifically to protect the systems and networks that operate the foundation of the world’s critical infrastructure.

  • [Defensics 3.0] Defensics is a generation-based fuzzer from Codenomicon Ltd. It has support for over 130 protocols. As with the Mu-4000, it has no ability to fuzz protocols for which it does not already have support. It can be used to fuzz servers, clients, and even applications that process files, and it is executed and controlled through a graphical Java application. Defensics can be configured to send a valid input between fuzzed inputs and compare the responses to those seen in the past; in this way it can detect some critical behavioral faults, such as the Heartbleed bug, where the SUT replied with memory contents when fuzzed. It can also run custom external monitoring scripts. However, at the time of this analysis it had no built-in monitoring or analysis features.

  • [beSTORM] beSTORM from Beyond Security is another commercial fuzzer that can handle network or file fuzzing. It supports almost 50 protocols. However, unlike the other commercial offerings, it can be used to fuzz proprietary and unsupported protocols. A network packet capture, or in the case of file fuzzing a file, is loaded into beSTORM, and this valid input can then be manually dissected. Alternatively, beSTORM can automatically analyze the valid input and detect significant features such as length fields, ASCII text, and delimiters. Once beSTORM understands the unknown protocol, it fuzzes it using a large library of heuristics. beSTORM also supports describing a protocol specification completely in XML.

  • [AFL]American fuzzy lop is a security-oriented fuzzer that employs a novel type of compile-time instrumentation and genetic algorithms to automatically discover clean, interesting test cases that trigger new internal states in the targeted binary. This substantially improves the functional coverage for the fuzzed code. The compact synthesized corpora produced by the tool are also useful for seeding other, more labor- or resource-intensive testing regimes down the road. Compared to other instrumented fuzzers, afl-fuzz is designed to be practical: it has modest performance overhead, uses a variety of highly effective fuzzing strategies and effort minimization tricks, requires essentially no configuration, and seamlessly handles complex, real-world use cases - say, common image parsing or file compression libraries.

  • [Kelinci] Kelinci is one of the first AFL-for-Java implementations and is very promising, although the approach of having two processes per fuzzing instance is a little clumsy and can get confusing. One process is the native C side, which takes mutated inputs produced by AFL and sends them to the second process via a TCP socket. The second process is the Java process that feeds the input to the target program and sends back the code paths taken with this input. There are certain error messages in the Java part of this fuzzer that are not always exactly clear (at least to me), but they seem to indicate that the fuzzer is no longer running in a healthy state. However, so far Kelinci has worked very well for me and has come up with a lot of results.

  • [WinAFL] Instead of instrumenting the code at compile time, WinAFL relies on dynamic instrumentation using DynamoRIO (http://dynamorio.org/) to measure and extract target coverage. This approach has been found to introduce an overhead of about 2x compared to native execution speed, which is comparable to the original AFL in binary instrumentation mode. WinAFL has been successfully used to identify bugs in Windows software.

  • [LibFuzzer] LibFuzzer (LLVM) is linked with the library under test and feeds fuzzed inputs to the library via a specific fuzzing entrypoint (the “target function”); the fuzzer then tracks which areas of the code are reached and generates mutations on the corpus of input data in order to maximize the code coverage. The code coverage information for libFuzzer is provided by LLVM’s SanitizerCoverage instrumentation. (A sketch of this harness model, using libFuzzer’s Python binding, follows this list.)

  • [PROTOS] The PROTOS project researches different approaches to testing protocol implementations using black-box (i.e. functional) testing methods. The goal is to support proactive elimination of faults with information-security implications: the project promotes awareness of these issues, develops methods for customer-driven evaluation and acceptance testing of implementations, and attempts to improve the security robustness of products by supporting the development process.
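To illustrate the harness model that libFuzzer popularized, here is a minimal sketch using Atheris, Google’s libFuzzer-based coverage-guided engine for Python (in C/C++ the equivalent entrypoint is LLVMFuzzerTestOneInput). The parse_header function is a hypothetical library function under test.

import sys

import atheris

def parse_header(data: bytes) -> None:
    # Hypothetical function under test: a tiny length-prefixed header parser.
    if len(data) >= 4 and data[:2] == b"HD":
        length = int.from_bytes(data[2:4], "big")
        data[4:4 + length].decode("utf-8")  # may raise on malformed payloads

def TestOneInput(data: bytes) -> None:
    # The fuzzing entrypoint ("target function"): the engine calls it repeatedly
    # with coverage-guided mutated inputs.
    try:
        parse_header(data)
    except UnicodeDecodeError:
        pass  # a known, uninteresting failure mode; anything else is a finding

atheris.instrument_all()              # add coverage instrumentation to loaded code
atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()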

AFL is the most successful vulnerability detection tool to date, and many researchers have forked its source code to improve the tool’s performance. Some examples are listed below:

  • [AFLFast] An AFL fork whose authors change how much fuzz is generated per seed (the power schedule) and the order in which mutated seeds are picked from the queue. In other words, the tool uses the same number of inputs with a better distribution to maximize test coverage. The preliminary results published with the tool/paper, “Coverage-based Greybox Fuzzing as Markov Chain”, show that this modification finds bugs 19x faster than AFL on average; tested on GNU Binutils, it found more crashes than AFL.
  • [AFLGo] Given a set of target locations (e.g., folder/file.c:582), AFLGo generates inputs specifically with the objective to exercise these target locations.
  • [AFLSmart] AFLSmart is a smart (input-structure aware) greybox fuzzer which leverages a high-level structural representation of the seed files to generate new files. It uses higher-order mutation operators that work on the virtual file structure rather than on the bit level, which allows AFLSmart to explore completely new input domains while maintaining file validity. It uses a novel validity-based power schedule that enables AFLSmart to spend more time generating files that are more likely to pass the parsing stage of the program, which can expose vulnerabilities much deeper in the processing logic. [13]

What about languages?

The choice of programming language also has an impact on the likelihood of vulnerabilities. Newer programming languages often make a considerable effort to prevent developers from accidentally introducing vulnerabilities. For example, compiled C programs do not automatically perform bounds checking at runtime, whereas Python, Ruby, C#, and Java programs all do.
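A tiny illustration of the difference (assuming standard CPython behavior): the same out-of-bounds read that is undefined behavior in C raises a catchable exception in Python.

buf = [0, 1, 2, 3]
try:
    print(buf[10])           # bounds-checked at runtime
except IndexError as e:
    print("caught:", e)      # prints: caught: list index out of range

# The equivalent C silently reads past the array (undefined behavior):
#   int buf[4] = {0, 1, 2, 3};
#   printf("%d\n", buf[10]);  /* no runtime check; may crash or leak memory */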

Why are fuzzers mostly focused on C/C++ programs?

C/C++ is often chosen when speed and efficiency are important, which is why so many key applications, operating system internals, and web browsers are written in those languages. Bearing the previous paragraph in mind, implementing bounds checking in a language’s compiler puts extra instructions into the generated low-level code.

That, I think, is the main reason fuzzers focus on C/C++ programs: more “core” programs are written in those languages.

Java, for instance…

What are the principal exploits against Java programs?

Many projects in the past focused on guarding against problems caused by the unsafe nature of C, such as buffer overruns and format string vulnerabilities. However, in recent years, Java has emerged as the language of choice for building large complex Web-based systems, in part because of language safety features that disallow direct memory access and eliminate problems such as buffer overruns. Platforms such as J2EE (Java 2 Enterprise Edition) also promoted the adoption of Java as a language for implementing e-commerce applications such as Web stores, banking sites, etc.

Java is immune to many things a C programmer would face (buffer overflows are a big example), but it is slower. Java may be more secure locally than C++, but it still has many security weaknesses given its network capabilities. There are many kinds of exploits, as this database shows.

While the Java platform includes numerous features designed to improve the security of Java applications, it’s critical for developers to ensure that their Java code is vulnerability-free at the earliest stages of the software development life cycle. Avoiding Java security mistakes such as not restricting access to classes and variables, not finalizing classes, and relying on package scope is the best place to start when securing Java code. It’s also important for developers to familiarize themselves with the common security threats facing Java code, as well as Java frameworks.

High-Risk Java Security Vulnerabilities: With over 95% of all enterprise desktops in the world running Java, there are serious consequences when vulnerabilities in Java code make it to production and are exploited by malicious parties. The following is a list of some of the high-risk threats facing applications written in Java:

  • Code Injections
  • Command Injections
  • Connection String Injection
  • LDAP Injection
  • Reflected XSS
  • Resource Injection
  • Second Order SQL Injection
  • SQL Injection
  • Stored XSS
  • XPath Injection

Can these exploits/issues be detected with fuzzing?

Many Java exploits (SQL injection, for example) can be tackled with fuzzing, and even with static analysis tools.

Fuzzing was originally applied to programs that are not memory safe, in the hope of finding memory corruption issues. Out-of-bounds reads or writes in Java code simply do not result in memory corruption but in more or less harmless exceptions such as IndexOutOfBoundsException. While finding such (code robustness) issues might be desirable and might reveal denial-of-service vectors, the severity of these issues is usually low.

The question is what kind of behavior and fuzzing results we are looking for. There are different scenarios that might be of interest, but the attack vector (how does the attacker exploit the issue in the real world?) matters a lot when looking at them.

I think the search for Java exploits must start from these basic ideas, given the JVM’s low-level security coverage:

  • Finding issues such as Denial of Service (DoS), OutOfMemoryExceptions, high CPU load, high disk space usage, or functions that never return.

  • Finding low-severity or non-security issues such as RuntimeExceptions.

  • Finding well-known security issues for Java code, such as Java deserialization vulnerabilities, Server-Side Request Forgery (SSRF), and XML External Entity injection (XXE).

Differential fuzzing may be another good starting point for injection issues, because there are many proven libraries that probably do the same thing as the application we want to test. A minimal sketch of the idea follows.
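This sketch cross-checks a hypothetical hand-rolled implementation (my_unquote, deliberately sloppy, invented here for illustration) against Python’s standard library: any input on which the two disagree, or that crashes only one of them, is a candidate bug.

import random
import string
import urllib.parse

def my_unquote(s: str) -> str:
    # Hypothetical hand-rolled percent-decoder under test (no bounds or hex validation).
    out, i = [], 0
    while i < len(s):
        if s[i] == "%":
            out.append(chr(int(s[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

reference = urllib.parse.unquote  # trusted implementation of the same behavior

for _ in range(100_000):
    candidate = "".join(random.choices(string.printable, k=random.randrange(1, 12)))
    try:
        mine, ref = my_unquote(candidate), reference(candidate)
    except Exception as e:
        print(f"crash on {candidate!r}: {e}")  # e.g. a trailing '%' raises ValueError
        break
    if mine != ref:
        print(f"divergence on {candidate!r}: {mine!r} != {ref!r}")
        break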

In interpreted languages… Must the interpreter be fuzzed too?

Arbitrary Java code as input for the JVM… This could be helpful in more exotic scenarios, for example when you need to escape from a sandboxed JVM. In most other scenarios this attack vector is probably unrealistic, as an attacker capable of supplying it would already be executing Java code.

What about injecting data in built-in classes (such as strings)? Maybe there is a deep hidden deserialization issue.

The analysis is not the same if we target JavaScript: fuzzing JavaScript interpreters using JavaScript programs as inputs could bring better results, given how quickly browsers evolve and how often new language features appear.

Bibliography