Ghidra 11.0 Rust Support

Improved function demangling and Rust String analysis

Jan 02, 2024

Ghidra 11.0 marks a significant advancement in reverse engineering tools with its initial support for Rust, a language gaining traction in systems-level programming due to its robust security features. This comes with two new features:

Improved function and variable name demangling, now with RustPath awareness for enhanced parsing of Rust-specific constructs like traits.
A new Rust String Analysis pass that enhances the readability of String usage in the decompiler and simplifies data-flow analysis.

These additions not only streamline the analysis of Rust binaries but also pave the way for more sophisticated features in future updates. The rest of this article provides more detail on these changes in the source code of Ghidra, and provides some example Rust snippets to demonstrate the benefits that they bring.

Introduction

The latest Ghidra release notes describe changes that add initial support for Rust, a language increasingly favored in systems-level programming for its strong security features. The goal of this article is to look at the new analysis passes and see how they work.

Let's start by importing a standard “hello-world” binary to examine the new analyses:

fn main() {
    println!("Hello, world!");
}

There are two new analysis options that are Rust specific:

Demangler Rust
Rust String Analyzer

You can find the source files containing these additions here. We are not going to perform a full code-review, but we will pull out some parts that are interesting and say something about the current state of Rust support in Ghidra.

Generic Rust Support

Inside of RustUtilities we can find an isRust() method for tagging parts of memory that contain Rust signatures. The Ghidra developers consider the following strings (bytes) as part of their heuristics for Rust code in memory:

“RUST_BACKTRACE” - the environment variable that controls whether a backtrace is printed when a Rust program panics
“/rustc/” - paths related to the Rust Compiler (rustc)

These kinds of utility methods are used by Object Loaders to detect whether they may be working with a Rust binary, and therefore whether to present these options to analysts.
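The heuristic described above amounts to a byte-pattern search over memory. The following is a minimal sketch of that idea in Rust, not Ghidra's actual Java implementation: the names `RUST_SIGNATURES`, `contains_bytes` and `looks_like_rust` are hypothetical, and the sample buffers are made up for illustration.

```rust
/// The two byte signatures the article lists as Ghidra's Rust heuristics.
const RUST_SIGNATURES: [&[u8]; 2] = [b"RUST_BACKTRACE", b"/rustc/"];

/// Returns true if `needle` occurs anywhere in `haystack`.
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    // `windows` panics on a zero size, so reject empty needles first.
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Hypothetical stand-in for an isRust-style check over one memory block.
fn looks_like_rust(block: &[u8]) -> bool {
    RUST_SIGNATURES.iter().any(|sig| contains_bytes(block, sig))
}

fn main() {
    // Made-up sample data: one buffer with a rustc path, one without.
    let rust_like = b"\x00\x01/rustc/abcdef/library/std/src/panicking.rs\x00";
    let plain = b"\x7fELF plain C binary, no signatures here";

    println!("{}", looks_like_rust(rust_like)); // prints "true"
    println!("{}", looks_like_rust(plain)); // prints "false"
}
```

A check like this is cheap but can misfire in both directions: a stripped binary may contain neither string, and a non-Rust binary could embed one of them by coincidence.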

Demangler Rust

Revisiting the release notes from before, a core addition was Rust Demangler Support:

Initial support for Rust compiled binaries, mainly demangling of Rust method names and Rust in DWARF information, has been added

For those unfamiliar with the compilation process, a core requirement is that every symbol name the compiler emits is unique. To ensure this, the compiler takes our function name and mangles it with seemingly random combinations[1] of characters. For example, if we take our hello-world binary and analyze it in Ghidra without the demangler pass, our main function label may look like __ZN11hello_world4main17h035b09a6adaa7cbdE. If we then run a demangler pass, it is changed to hello_world::main[2]. Demangling is the inverse of mangling: it takes this (seemingly) random string and converts it back into something we would expect to see in source code.
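To make the mangled form less mysterious, here is a minimal sketch of how a legacy-style symbol such as the one above can be taken apart. It assumes the layout visible in the example: a `_ZN` prefix (with one extra underscore on Mach-O), a run of length-prefixed segments, a trailing `E`, and a final hash segment made of `h` plus 16 hex digits. The function names are hypothetical, escape sequences are ignored, and this is not how Ghidra implements it.

```rust
/// True if `segment` looks like the trailing hash: 'h' plus 16 hex digits.
fn is_hash_segment(segment: &str) -> bool {
    segment.len() == 17
        && segment.starts_with('h')
        && segment[1..].chars().all(|c| c.is_ascii_hexdigit())
}

/// Demangle a legacy-style Rust symbol into a `::`-separated path.
/// Returns None if the input does not follow the expected layout.
fn demangle_legacy(symbol: &str) -> Option<String> {
    // Mach-O prepends an extra underscore, so accept both spellings.
    let body = symbol
        .strip_prefix("__ZN")
        .or_else(|| symbol.strip_prefix("_ZN"))?
        .strip_suffix('E')?;

    let mut segments: Vec<&str> = Vec::new();
    let mut rest = body;
    while !rest.is_empty() {
        // Each segment is a decimal length followed by that many bytes.
        let digits = rest.chars().take_while(|c| c.is_ascii_digit()).count();
        if digits == 0 {
            return None;
        }
        let len: usize = rest[..digits].parse().ok()?;
        let end = digits.checked_add(len)?;
        segments.push(rest.get(digits..end)?);
        rest = &rest[end..];
    }

    // Drop the hash so only the source-level path remains.
    let has_hash = segments.last().map_or(false, |s| is_hash_segment(s));
    if has_hash {
        segments.pop();
    }
    if segments.is_empty() {
        None
    } else {
        Some(segments.join("::"))
    }
}

fn main() {
    let mangled = "__ZN11hello_world4main17h035b09a6adaa7cbdE";
    println!("{:?}", demangle_legacy(mangled)); // prints Some("hello_world::main")
}
```

Reading the example by hand gives the same result: `11hello_world` is an 11-byte segment, `4main` a 4-byte one, and `17h035b09a6adaa7cbd` is the hash that keeps otherwise identical names unique.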

The core of this analysis pass is found in RustDemanglerAnalyzer, and is structured as we would expect for an analysis pass. As in C++, Rust demangling aims to resolve the symbol to a namespace[3]. It appears that the Rust demangling logic is derived from that of a C++ project.

The implementation of the demangling logic can be found in the RustDemangler class. There are two variations of the demangler format defined:

Legacy, which uses simple string and pattern matching across the whole symbol to demangle it
v0, which introduces a new Symbol object that captures semantics specific to Rust symbols, namely the RustPath[4]
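The two formats can be told apart from the first few characters of a symbol: the v0 specification gives every symbol an `_R` prefix, while legacy symbols reuse the `_ZN` prefix familiar from C++. The sketch below illustrates that check; the `Mangling` enum and `detect_mangling` function are hypothetical names, and the handling of the extra Mach-O underscore is an assumption based on the symbols shown in this article.

```rust
/// Which mangling scheme a symbol appears to use.
#[derive(Debug, PartialEq)]
enum Mangling {
    Legacy,
    V0,
    Unknown,
}

/// Classify a symbol by its prefix alone.
fn detect_mangling(symbol: &str) -> Mangling {
    // Mach-O symbols carry one extra leading underscore; drop it if present.
    let s = if symbol.starts_with("__") { &symbol[1..] } else { symbol };

    if s.starts_with("_ZN") {
        // Note: C++ nested names share this prefix, so this is only a hint.
        Mangling::Legacy
    } else if s.starts_with("_R") {
        Mangling::V0
    } else {
        Mangling::Unknown
    }
}

fn main() {
    let legacy = "__ZN11hello_world4main17h035b09a6adaa7cbdE";
    let v0 = "_RNvMsr_NtCs3ssYzQotkvD_3std4pathNtB5_7PathBuf3newCs15kBYyAo9fc_7mycrate";

    println!("{:?}", detect_mangling(legacy)); // prints Legacy
    println!("{:?}", detect_mangling(v0)); // prints V0
    println!("{:?}", detect_mangling("main")); // prints Unknown
}
```

Because the legacy prefix is shared with C++, a prefix check cannot prove a symbol came from rustc; the trailing hash segment is a stronger signal.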

The demangler also has the option to apply the rustcall calling convention when trying to demangle function definitions. The calling convention tells the demangler in what order parameters are passed to functions, and how return values need to be treated. However, parameter mapping does not appear to be fully implemented yet in this demangler.

Testing Demangler based on v0 spec

As the legacy Demangler is based more on simple pattern matching, we are skipping it to focus on the changes introduced by the v0 format of the Demangler.

Simple PathBuf

When mangled by rustc std::path::PathBuf::new(); becomes _RNvMsr_NtCs3ssYzQotkvD_3std4pathNtB5_7PathBuf3newCs15kBYyAo9fc_7mycrate with a recommended demangling of <std::path::PathBuf>::new.

Let’s test this out with a simple Rust program:

use std::path::PathBuf;

fn main() {
    // Create a new empty PathBuf
    let mut path = PathBuf::new();

    // Add a path to the PathBuf
    path.push("my_directory");
    path.push("my_file.txt");

    // Display the resulting path
    println!("Final path: {:?}", path);
}

Running the Rust demangler without first running the GNU Demangler that is included with Ghidra results in a large number of errors, whereas the GNU Demangler does a pretty good job of demangling this binary on its own. My current assumption is that the GNU Demangler pass already has some basic support for Rust, based on the existence of rust-demangle in the demangler_gnu_v2_41 source. Regardless, it does produce a semantically correct pseudo-C++ representation of the RustPath: std::path::PathBuf::new.

I did get a few null pointer errors of the following form for this binary, but not enough to impact the analysis of the wider binary:

Demangler Rust> Unable to demangle symbol: __ZN3std3sys4unix17thread_local_dtor13register_dtor5DTORS17hade0f3a53b6f17eeE$tlv$init at 1000480c8.  Message: Cannot invoke "ghidra.app.plugin.core.analysis.rust.demangler.RustPath.toString()" because the return value of "ghidra.app.plugin.core.analysis.rust.demangler.RustPath.parse(ghidra.app.plugin.core.analysis.rust.demangler.Symbol)" is null

Extended PathBuf example using Traits

We extended the previous example to implement a trait, which you can see below:

use std::path::PathBuf;

// Define a custom trait for PathBuf extensions
trait PathBufExtensions {
    fn add_directory(&mut self, dir: &str);
    fn add_file(&mut self, file: &str);
}

// Implement the custom trait for PathBuf
impl PathBufExtensions for PathBuf {
    fn add_directory(&mut self, dir: &str) {
        self.push(dir);
    }

    fn add_file(&mut self, file: &str) {
        self.push(file);
    }
}

fn main() {
    // Create a new empty PathBuf
    let mut path = PathBuf::new();

    // Use the custom trait methods to add components
    path.add_directory("my_directory");
    path.add_file("my_file.txt");

    // Display the resulting path
    println!("Final path: {:?}", path);
}

Focusing on the add_directory() function inside the trait, when compiled its symbol looks like __ZN79_$LT$std..path..PathBuf$u20$as$u20$pathbuf_extended_test..PathBufExtensions$GT$13add_directory17h1c151ad7eb08d3d4E. After running the GNU Demangler pass it was resolved to <std::path::PathBuf as pathbuf_extended_test::PathBufExtensions>::add_directory, exactly as the spec recommends. Note that this is only visible as a function comment, and the function label in the decompiler view is <>::add_directory. There is still room for improvement on the decompiler front, but this will likely require storing some local variable information while the analysis is running to identify what the calling module might be.
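The 79-byte segment in that symbol is not stored as plain text: characters that are not valid in a symbol name are written as escape sequences. The sketch below decodes only the escapes that appear in this example (`$LT$`, `$GT$`, `$u20$` and `..`); the legacy scheme defines more, and the function name `unescape_legacy_segment` is hypothetical.

```rust
/// Decode the escape sequences used in the trait example above.
/// Only the subset needed for this symbol is handled.
fn unescape_legacy_segment(segment: &str) -> String {
    // A segment that would start with '$' is emitted with a leading '_'.
    let s = if segment.starts_with("_$") { &segment[1..] } else { segment };

    s.replace("$LT$", "<") // less-than
        .replace("$GT$", ">") // greater-than
        .replace("$u20$", " ") // U+0020, a space
        .replace("..", "::") // path separator
}

fn main() {
    let segment =
        "_$LT$std..path..PathBuf$u20$as$u20$pathbuf_extended_test..PathBufExtensions$GT$";
    println!("{}", unescape_legacy_segment(segment));
    // prints "<std::path::PathBuf as pathbuf_extended_test::PathBufExtensions>"
}
```

Appending `::add_directory` to the decoded segment gives the same text that ends up in the function comment.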

What this shows me is that the demangler support for Rust is certainly an improvement over what existed before, when any demangling could only treat the binary as C++ code. The introduction of RustPath awareness to the demangler also makes labelling of traits much more user-friendly.

Rust String Analyzer

We can see the second addition again in the release notes:

In addition, Rust strings are marked up so that the decompiler will display Rust strings correctly.

These updates can be found in the RustStringAnalyzer class. This pass gathers all the strings defined in the program's data section, and iterates over each of them in an attempt to (recursively) separate each string into its own unique datatype.

In order to explore why we need this, let's consider the below example:

fn main() {
    // 1. Static string (string literal)
    let static_string: &'static str = "Hello, World!"; // Static string

    // 2. Owned string (String)
    let mut owned_string = String::from("Hello, "); // Owned string
    owned_string.push_str("Rust!");

    // 3. Mutable string slice (&mut str)
    let mut mutable_string_slice = String::from("Hello, Mutable World!");
    let slice = &mut mutable_string_slice[..6]; // Mutable string slice

    // 4. Immutable string slice (&str)
    let immutable_string_slice: &str = &static_string[0..5]; // Immutable string slice

    // Print all the strings
    println!("Static String: {}", static_string);
    println!("Owned String: {}", owned_string);
    println!("Mutable String Slice: {}", slice);
    println!("Immutable String Slice: {}", immutable_string_slice);
}

This is a simple program that demonstrates each of the ways we can use and define strings in Rust. Upon inspection of the disassembly, we can see that the compiler has compiled all of these strings into one big object in memory:

"Hello, World!Hello, Rust!Hello, Mutable World!src/main.rsStatic String: \nOwned String: Mutable String Slice: Immutable String Slice: "

Each of the local variables and function calls then takes a subslice of this to get the part of the string it needs, for example:

// in source
let mut owned_string = String::from("Hello, ");

// decompiled - subslice "Hello, " from the first 7 bytes
_<>::from("Hello, Rust!Hello, Mutable World!src/main.rsStatic String: \nOwned String: Mutable String Slice: Immutable String Slice: ",7);

Having one big string with subslices would make certain kinds of analysis difficult, as they would all point to the same data object, just at different offsets. Instead, this pass appears to split the larger string into substrings, and create new datatypes that are then used in the decompiler. Comparing the previous example:

// in source
let mut owned_string = String::from("Hello, ");

// decompiled (with updates)
_<>::from("Hello, ",7);

This is much more readable from an analysis perspective, and updates the decompilation view with each of the substrings, rather than having one large string in the local variable.
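The reason the blob can be split at all is that a Rust `&str` is a pointer plus a length, with no null terminator, so every use site carries the exact byte count it needs. The sketch below models that with hypothetical (offset, length) pairs into a shortened copy of the blob above; `slice_at` and `BLOB` are illustrative names, and the real analyzer works on Ghidra's data model rather than on a Rust string.

```rust
/// Shortened copy of the merged string data from the example binary.
const BLOB: &str = "Hello, World!Hello, Rust!Hello, Mutable World!";

/// Recover one logical string from the blob, given where it starts and
/// how long it is. Returns None if the range does not fit.
fn slice_at(blob: &str, offset: usize, len: usize) -> Option<&str> {
    let end = offset.checked_add(len)?;
    blob.get(offset..end)
}

fn main() {
    // Hypothetical (offset, length) pairs, standing in for the
    // pointer/length pairs that each use site passes around.
    let refs = [(0, 13), (13, 7), (20, 5), (25, 21)];

    for (offset, len) in refs {
        println!("{:?}", slice_at(BLOB, offset, len));
    }
    // prints:
    // Some("Hello, World!")
    // Some("Hello, ")
    // Some("Rust!")
    // Some("Hello, Mutable World!")
}
```

The second pair matches the decompiled call above: it starts where "Hello, Rust!" begins and takes 7 bytes, which yields "Hello, ".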

Conclusions

I really liked the structure of the v0 demangler, and how it now supports demangling as specified in the v0 spec. This will make Rust binaries a little easier to analyze in the short term. In the long term, it will be interesting to see how the Ghidra team can make the decompiler view read more like Rust source for concepts like traits. Pre-comments are good, but limited in how we can use them in automated analysis without implementing string parsers. The great thing about mangled function names is that we also get access to information about the number of function parameters and their types. Future releases (I imagine) will be able to leverage this to pass parameter details to the decompiler and improve the output further.

The RustStringAnalyzer provides a much-needed quality-of-life improvement for reverse engineering Rust binaries, removing the need to figure out what substring the binary is reading, and also improving the quality of data-flow analysis as each substring is now its own type.

Overall, these are promising first steps in integrating Rust support into Ghidra. I encourage you to download the latest version of Ghidra and explore these features for yourself. If you create any cool changes or enhancements, feel free to submit a pull request on the project's GitHub page. This way, everyone can benefit from your contributions :)

[1] These are not actually random, and instead also provide information about the number of parameters, return types, and the namespace of a function.

[2] If you are already familiar with C++ compiler mangling, then this format will seem familiar to you, and it allows some Rust support out of the box.

[3] If you are a Java/Python programmer, namespaces can be thought of as the equivalent of a package. If you are coming as a Rust programmer, namespaces are the equivalent of modules.

[4] See v0 Symbol Format for a full description of the RustPath mangling.

Nathan’s Substack