Adding Identifiers
OFRAK Contributors and Users can extend the tool's identification capability in one of two ways:
- Extend the MagicIdentifier by registering a new magic pattern match
- Implement a new Identifier
Extend the MagicIdentifier
First, consider extending the magic identifier by registering a new magic pattern match. The MagicIdentifier uses three pattern matchers:
- MagicMimePattern allows users to register matches to magic's mime output
- MagicDescriptionPattern allows users to create matching functions that run on the magic description output
- RawMagicPattern allows users to create custom raw byte matching patterns against a resource's binary data
Combining these pattern matching strategies can provide expanded identification coverage, particularly when libmagic's output contains false negatives.
For example, all three patterns are used to identify DeviceTreeBlob
MagicMimePattern.register(DeviceTreeBlob, "Device Tree Blob")
MagicDescriptionPattern.register(DeviceTreeBlob, lambda s: "device tree blob" in s.lower())
def match_dtb_magic(data: bytes):
if len(data) < 4:
return False
return data[:4] == DTB_MAGIC_BYTES
RawMagicPattern.register(DeviceTreeBlob, match_dtb_magic)
These patterns (along with all other identifier patterns) will get run when the MagicIdentifier runs, adding a DeviceTreeBlob
tag to matching resources.
See the docstrings for each pattern for implementation details.
Generally speaking, it makes sense to start with a magic mime or magic description pattern, implementing a raw magic pattern only when necessary.
Implement a New Identifier
Additionally, it is possible to implement a new Identifier. Doing so should be reserved for situations where extending the MagicIdentifier is impractical. The ApkIdentifier is an example of a custom identifier implementation.
Warning
Adding new identifiers should be done with care to minimize overall performance impact to OFRAK workflows.
Try to carefully select the resource tags the identifier targets to minimize the frequency with which
it is run: generally speaking, targeting GenericBinary
will result in this identifier getting run on the largest
number of possible resources. ApkIdentifier
targets JavaArchive
and ZipArchive
only for this reason.
Handling External Dependencies
If the Identifier makes use of tools that are not packaged in modules installable via pip
from
PyPI (commonly command-line tools), these dependencies must be explicitly declared as part of the
identifier's class declaration. See the Components Using External Tools doc
for information on how to do that.