PyGhidra Backend

Use the ofrak_pyghidra module to disassemble and decompile binaries using Ghidra via the PyGhidra Python bindings. Unlike the ofrak_ghidra module, ofrak_pyghidra does not require a Ghidra server. Instead, it runs Ghidra in headless mode to analyze files.

Install

Native

Create a virtual environment in which to install the code:
```
% python3 -m venv venv
% source venv/bin/activate
```
Install ofrak and its dependencies.
Set the GHIDRA_INSTALL_DIR environment variable with export GHIDRA_INSTALL_DIR=/install/ghidra_11.3.2_PUBLIC/, substituting in your actual Ghidra install path.
Install PyGhidra with: cd ${GHIDRA_INSTALL_DIR}/Ghidra/Features/PyGhidra/pypkg/ && python3 -m pip install -e .
Run make install or make develop inside of the ofrak_cached_disassembly/ directory.
Run make install or make develop inside of the ofrak_pyghidra/ directory.

Note: If you are using an ARM processor, you might need to compile the native binaries for decompilation to work.
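
Before running the pip and make steps, it can help to confirm that GHIDRA_INSTALL_DIR points at a usable Ghidra tree. The following is a minimal sketch, not part of ofrak_pyghidra: the helper name check_ghidra_install is hypothetical, and the only layout it assumes is the Ghidra/Features/PyGhidra/pypkg path used in the install step above.

```python
import os
from pathlib import Path

# Relative location of the PyGhidra package, taken from the install step above.
PYGHIDRA_PKG = Path("Ghidra") / "Features" / "PyGhidra" / "pypkg"


def check_ghidra_install(environ=os.environ):
    """Return a list of problems with GHIDRA_INSTALL_DIR (empty if none were found).

    Hypothetical helper for illustration; it only checks that paths exist.
    """
    value = environ.get("GHIDRA_INSTALL_DIR")
    if not value:
        return ["GHIDRA_INSTALL_DIR is not set"]

    install_dir = Path(value)
    if not install_dir.is_dir():
        return [f"{install_dir} is not a directory"]

    problems = []
    if not (install_dir / PYGHIDRA_PKG).is_dir():
        problems.append(f"{install_dir / PYGHIDRA_PKG} is missing")
    return problems


# Report anything that looks wrong in the current environment.
for problem in check_ghidra_install():
    print("problem:", problem)
```

An empty result only means the directories exist; it does not prove that PyGhidra itself imports or that Ghidra's native binaries work on your platform.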

Docker

Follow the instructions in the OFRAK environment setup guide to build a Docker container with PyGhidra. Ghidra and PyGhidra will be automatically installed if the disassemblers/ofrak_pyghidra package is included in the YAML configuration file. An example configuration is provided in ofrak-ghidra.yml.

Usage

Once installed, you can import ofrak_pyghidra into any script, as you would with the other analysis backends.

import logging

import ofrak_pyghidra
from ofrak import OFRAK

ofrak = OFRAK(logging.INFO)
ofrak.discover(ofrak_pyghidra)

You can also open the GUI with ofrak gui --backend pyghidra to unpack and analyze a binary.

If the ofrak_pyghidra module has been discovered, a resource that is correctly tagged as a Program or Ihex should automatically be tagged as a PyGhidraProject when it is identified.

PyGhidra Analysis

The first time you run the analysis, it will disassemble and decompile the entire program. The results are kept in a cached analysis store, so later requests to disassemble (unpack) or decompile (analyze) return immediately. To save the analysis to disk for faster loading in later sessions, see the Cached analysis section below.

PyGhidra auto-analysis

ofrak_pyghidra will automatically analyze program attributes for Elf, Ihex, and Pe file formats.

resource = await ofrak_context.create_root_resource_from_file("my_file.elf")

await resource.unpack_recursively()
await resource.analyze_recursively()

PyGhidra manual analysis

If your file is not in one of the formats that OFRAK can analyze automatically, you will need to manually tag the resource as a Program and add ProgramAttributes.

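from ofrak.core.architecture import ProgramAttributes
from ofrak.core.program import Program
from ofrak_type.architecture import InstructionSet
from ofrak_type.bit_width import BitWidth
from ofrak_type.endianness import Endianness

resource = await ofrak_context.create_root_resource_from_file(file_path)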
resource.add_tag(Program)
program_attributes = ProgramAttributes(
    InstructionSet.ARM,
    bit_width=BitWidth.BIT_32,
    endianness=Endianness.LITTLE_ENDIAN,
    sub_isa=None,
    processor=None,
)

resource.add_attributes(program_attributes)
await resource.save()

await resource.identify()

You will need to add the CodeRegion view manually so that OFRAK knows where to unpack code in the binary.

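from ofrak.core.code_region import CodeRegion
from ofrak_type.range import Range

# Treat the whole file as a single code region starting at address 0.
new_length = await resource.get_data_length()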
await resource.create_child_from_view(
    CodeRegion(
        virtual_address=0,
        size=new_length,
    ),
    Range.from_size(0, new_length)
)
await resource.save()

Cached analysis

PyGhidra can store the results of any disassembly and decompilation for later use.

Saving cached analysis

To save a cache to a JSON file:

From the command line, using the ofrak_pyghidra module:

python3 -m ofrak_pyghidra analyze --infile my_file.elf --outfile cache_file.json --language ARM:LE:32:v7 --decompile

See python3 -m ofrak_pyghidra analyze -h for more details on usage.
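
The --language value is a Ghidra language ID, which has four colon-separated fields: processor, endianness, size, and variant. A typo in this string is easy to make, so a script that builds the command may want to check its shape first. The sketch below is illustrative only: parse_language_id and LanguageID are hypothetical names, not part of ofrak_pyghidra or Ghidra, and the check does not confirm that your Ghidra installation actually ships the language.

```python
from typing import NamedTuple


class LanguageID(NamedTuple):
    """The four fields of a Ghidra language ID such as ARM:LE:32:v7."""

    processor: str
    endianness: str  # "LE" or "BE"
    size: int  # address size in bits
    variant: str


def parse_language_id(language: str) -> LanguageID:
    """Split a language ID into its fields, raising ValueError if it is malformed."""
    parts = language.split(":")
    if len(parts) != 4:
        raise ValueError(
            f"expected processor:endianness:size:variant, got {language!r}"
        )
    processor, endianness, size, variant = parts
    if endianness not in ("LE", "BE"):
        raise ValueError(f"endianness must be LE or BE, got {endianness!r}")
    # int() raises ValueError itself if the size field is not a number.
    return LanguageID(processor, endianness, int(size), variant)


print(parse_language_id("ARM:LE:32:v7"))
```

Failing early on a malformed ID is cheaper than waiting for headless Ghidra to start and reject it.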

In a script, using the unpack function:

import json
from ofrak_pyghidra.standalone.pyghidra_analysis import unpack

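resource_file = "my_file.elf"  # path to the binary to analyze
decompile = True  # decompile in addition to disassembling
language = "..."  # Ghidra language ID, in the same form as the --language option above
res = unpack(resource_file, decompile, language)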
with open("cache_file.json", "w") as fh:
    json.dump(res, fh, indent=4)

In a script, after running the analysis manually:

import json

root_resource = await ofrak_context.create_root_resource_from_file(
    "my_file.elf"
)

# Run some analysis here

injector = ofrak_context.injector
cached_store = await injector.get_instance(CachedAnalysisStore)
analysis = cached_store.get_analysis(root_resource.get_id())

with open("cache_file.json", "w") as fh:
    json.dump(analysis, fh, indent=4)
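
Whichever method wrote the file, a quick check that it parses as JSON and is not empty can catch an interrupted or failed analysis before the cache is reused. The sketch below deliberately assumes nothing about the cache schema, which is not described here; summarize_cache is a hypothetical helper name, and it only reports the top-level shape of the file.

```python
import json
from pathlib import Path


def summarize_cache(path):
    """Load a JSON cache file and describe its top-level shape.

    No particular schema is assumed; this only confirms the file is
    well-formed JSON and reports how many top-level entries it has.
    """
    with open(path) as fh:
        data = json.load(fh)  # raises json.JSONDecodeError if the file is corrupt
    return {
        "size_bytes": Path(path).stat().st_size,
        "top_level_type": type(data).__name__,
        "entries": len(data) if isinstance(data, (dict, list)) else None,
    }
```

For example, print(summarize_cache("cache_file.json")) after any of the three methods above gives a rough indication of whether the analysis produced output.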

Loading cached analysis

To load an analysis JSON file, see the Cached Disassembly Backend.

Troubleshooting

In some cases (the root cause is unknown), the PyGhidra backend hangs whenever pyghidra.open_program(...) is called. When this happens, moving the /home/<username>/.config/ghidra directory to a backup location, rather than deleting it in case it contains anything important, has restored the PyGhidra backend to a working state.
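
The move described above can be scripted. This is a minimal sketch under stated assumptions: backup_ghidra_config is a hypothetical helper, the default location assumes that Path.home() / ".config" / "ghidra" is the same directory as /home/<username>/.config/ghidra on your system, and the timestamped backup name is an arbitrary choice.

```python
import shutil
import time
from pathlib import Path


def backup_ghidra_config(config_dir=None):
    """Move the Ghidra config directory aside.

    Returns the backup path, or None if there was no directory to move.
    """
    if config_dir is None:
        # Assumed default; adjust if your configuration lives elsewhere.
        config_dir = Path.home() / ".config" / "ghidra"
    config_dir = Path(config_dir)
    if not config_dir.is_dir():
        return None

    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = config_dir.with_name(f"{config_dir.name}.bak-{stamp}")
    if backup.exists():
        # Refuse to merge into an existing backup from the same second.
        raise FileExistsError(backup)
    shutil.move(str(config_dir), str(backup))
    return backup
```

Calling backup_ghidra_config() with no arguments acts on the default location. Nothing is deleted, so the original settings can be restored by moving the backup directory back.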