toolsmith June 27, 2026 · 8 min read

`cryptid`: ask a binary which crypto it's actually running

strings finds the word "AES" in a help message. It does not find AES. This 381-line scanner finds the constants the algorithm can't run without — and tells you the section and address they live at.

You're staring at a stripped 6 MB shared object. Somewhere in there is a cipher, and you need to know which one before you spend an afternoon in the decompiler. The usual moves all come up short:

  • strings libcrypto.so.3 | grep -i aes finds error-message text and symbol fragments, not the cipher. In a stripped or obfuscated binary, those strings are often gone entirely.
  • nm / objdump -t need a symbol table. Stripped binaries don't have one.
  • IDA's FindCrypt and the signsrch database are the right idea — but they're plugins bolted to a specific tool, or a giant opaque pattern file you can't easily audit or extend. YARA crypto rule sets like crypto_signatures.yar are the closest open equivalent, but they're the same bargain — a constant blob you trust rather than audit, with the values transcribed rather than computed.

The thing is, a cryptographic primitive is defined by its constants. AES has a 256-byte S-box that is mathematically fixed; there is exactly one. SHA-256's round table is the first 32 bits of the fractional parts of the cube roots of the first 64 primes. ChaCha literally stores the ASCII "expand 32-byte k". These are load-bearing. You can strip every symbol and mangle every string, but if the code computes AES in software, that S-box is usually sitting in .rodata waiting to be matched. So let's match it.

The gap cryptid fills: a single auditable file that (a) scans for those constants, (b) doesn't make you trust a 50,000-line pattern blob — it computes the famous tables from first principles so there's nothing to typo, and (c) maps each hit to its ELF section and virtual address so you can jump straight there in your disassembler.

Design choice #1: compute the signatures, don't transcribe them

The first instinct is to paste 64 hex words for SHA-256's K-table off Wikipedia. That's how you get a subtle bug that makes your tool silently miss things. The constants are defined by formulas, so the tool generates them:

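import math

def frac_root(prime: int, root: str, bits: int) -> int:
    """Fractional part of sqrt/cbrt(prime), as the top `bits` bits -> int.
    SHA reference constants are defined exactly this way: the first 32 (or 64)
    fractional bits of the square root (init H) or cube root (round K) of small
    primes. Computing them beats transcribing them."""
    if root == "sqrt":
        scaled = math.isqrt(prime << (2 * bits))   # floor(sqrt(prime) * 2**bits)
    elif root == "cbrt":
        # icbrt(n): exact integer cube root (floor of the real root); helper not shown here
        scaled = icbrt(prime << (3 * bits))        # floor(cbrt(prime) * 2**bits)
    else:
        raise ValueError(f"root must be 'sqrt' or 'cbrt', got {root!r}")
    return scaled & ((1 << bits) - 1)              # drop the integer part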
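frac_root leans on an icbrt helper the excerpt doesn't show. Below is a minimal integer-only stand-in (hypothetical, not cryptid's own) that runs Newton's method on integers, so, like math.isqrt, it never goes through floating point. To stay self-contained, the block inlines the frac_root arithmetic instead of importing it; PRIMES, MASK32/MASK64 and the table names are illustrative.

```python
import math
from typing import List

def icbrt(n: int) -> int:
    """Largest r with r**3 <= n, by integer Newton iteration (no floats)."""
    if n < 2:
        return n
    r = 1 << ((n.bit_length() + 2) // 3)     # 2**ceil(bits/3) is >= the true root
    while True:
        s = (2 * r + n // (r * r)) // 3      # Newton step for r**3 - n, floored
        if s >= r:                           # stopped decreasing: r is the floor root
            return r
        r = s

def first_primes(n: int) -> List[int]:
    """First n primes by trial division (n is tiny here)."""
    primes: List[int] = []
    cand = 2
    while len(primes) < n:
        if all(cand % p for p in primes if p * p <= cand):
            primes.append(cand)
        cand += 1
    return primes

PRIMES = first_primes(80)
MASK32, MASK64 = (1 << 32) - 1, (1 << 64) - 1
# Same arithmetic as frac_root: scale by 2**bits, take the floor root, keep the low bits.
SHA256_H = [math.isqrt(p << 64) & MASK32 for p in PRIMES[:8]]
SHA256_K = [icbrt(p << 96) & MASK32 for p in PRIMES[:64]]
SHA512_H = [math.isqrt(p << 128) & MASK64 for p in PRIMES[:8]]
SHA512_K = [icbrt(p << 192) & MASK64 for p in PRIMES[:80]]
# SHA-1's K words are not fractional parts: they are floor(2**30 * sqrt(n)), n = 2, 3, 5, 10.
SHA1_K = [math.isqrt(n << 60) for n in (2, 3, 5, 10)]

if __name__ == "__main__":
    print("SHA256 K[0..1]:", " ".join(f"{k:08x}" for k in SHA256_K[:2]))
    print("SHA1 K[0..3]  :", " ".join(f"{k:08x}" for k in SHA1_K))
```

The stopping rule works because each floored Newton step never drops below the true floor root, and strictly decreases while it is above it.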

math.isqrt on prime << 64 gives floor(sqrt(prime) * 2^32) with zero floating-point rounding risk; masking to the low 32 bits keeps exactly the fractional part. The AES S-box comes out of a GF(2⁸) inverse plus the affine transform; the CRC-32 table from the reflected polynomial 0xEDB88320; MD5's table from floor(|sin(i)| * 2^32) for i = 1..64. Before trusting any of it, I checked every generator against its published value:

AES sbox[0..3] : 637c777b expect 637c777b
AES sbox[255]  : 16 expect 16
inv sbox[0..3] : 52096ad5 expect 52096ad5
Rcon           : 0102040810204080 expect 0102040810204080
SHA1 K[0..3]   : 5a827999 6ed9eba1 8f1bbcdc ca62c1d6 expect 5a827999 6ed9eba1 8f1bbcdc ca62c1d6
SHA256 H[0..1] : 6a09e667 bb67ae85 expect 6a09e667 bb67ae85
SHA256 K[0..1] : 428a2f98 71374491 expect 428a2f98 71374491
SHA256 K[63]   : c67178f2 expect c67178f2
SHA512 H[0]    : 6a09e667f3bcc908 expect 6a09e667f3bcc908
MD5 T[0..1]    : d76aa478 e8c7b756 expect d76aa478 e8c7b756
CRC32 tab[1]   : 77073096 expect 77073096

All green. The signature DB is correct by construction, and you can re-run that check yourself instead of taking my word for it.
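To re-run the non-SHA rows, here is a minimal stand-alone sketch of the generators named above (illustrative function names, not cryptid's code): the AES S-box as the GF(2⁸) inverse followed by the affine transform, Rcon as successive doublings in GF(2⁸), the byte-wise table for reflected CRC-32 with 0xEDB88320, and MD5's floor(|sin(i)| * 2^32). MD5 is the only generator here that goes through floating point, which is exactly why it belongs under a check rather than a promise.

```python
import math
from typing import List, Tuple

AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1

def xtime(b: int) -> int:
    """Multiply by x (that is, by 2) in GF(2^8), reducing by the AES polynomial."""
    b <<= 1
    return b ^ AES_POLY if b & 0x100 else b

def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = xtime(a)
        b >>= 1
    return p

def gf_inv(a: int) -> int:
    """Inverse as a**254, since the multiplicative group has order 255; AES maps 0 to 0."""
    if a == 0:
        return 0
    result, e = 1, 254
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result

def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF

def aes_sboxes() -> Tuple[bytes, bytes]:
    """Forward S-box = affine(inverse(x)); the inverse S-box undoes that permutation."""
    sbox, inv = bytearray(256), bytearray(256)
    for x in range(256):
        b = gf_inv(x)
        s = b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63
        sbox[x], inv[s] = s, x
    return bytes(sbox), bytes(inv)

def aes_rcon(n: int = 10) -> List[int]:
    """Key-schedule round constants: 1, x, x^2, ... in GF(2^8)."""
    out, rc = [], 1
    for _ in range(n):
        out.append(rc)
        rc = xtime(rc)
    return out

def crc32_table(poly: int = 0xEDB88320) -> List[int]:
    """Byte-at-a-time table for reflected CRC-32."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ poly if c & 1 else c >> 1
        table.append(c)
    return table

def md5_table() -> List[int]:
    """T[i] = floor(|sin(i + 1)| * 2**32): the one generator here that uses floats."""
    return [int(abs(math.sin(i + 1)) * 2**32) for i in range(64)]

if __name__ == "__main__":
    sbox, inv = aes_sboxes()
    rows = [
        ("AES sbox[0..3]", sbox[:4].hex(), "637c777b"),
        ("AES sbox[255]", f"{sbox[255]:02x}", "16"),
        ("inv sbox[0..3]", inv[:4].hex(), "52096ad5"),
        ("Rcon", bytes(aes_rcon(8)).hex(), "0102040810204080"),
        ("MD5 T[0..1]", " ".join(f"{t:08x}" for t in md5_table()[:2]), "d76aa478 e8c7b756"),
        ("CRC32 tab[1]", f"{crc32_table()[1]:08x}", "77073096"),
    ]
    for label, got, want in rows:
        print(f"{label:15}: {got} expect {want}")
```

Run directly, it prints rows in the same label/got/expect shape as the check above.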

Design choice #2: scan both endiannesses, and say which one hit

A word table like SHA-256's K is stored as 32-bit integers, and the byte order it lands on disk in depends on the target's endianness. Each word-table signature therefore carries a canonical big-endian form, and the tool derives the little-endian-per-word variant automatically:

def build_variants(self):
    if self.wordsize == 0:
        self.variants = {"": self.pattern}   # literal bytes (strings, S-boxes)
        return
    ws = self.wordsize
    # Reverse the bytes of each ws-byte word: canonical BE words -> LE words.
    le = b"".join(
        self.pattern[i:i + ws][::-1] for i in range(0, len(self.pattern), ws)
    )
    self.variants = {"BE": self.pattern, "LE": le}

Both variants get scanned; the matched one shows up in the output's END column. On an x86-64 build you'll see LE on the K-tables — which is itself a small piece of evidence about the target.
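For a concrete picture, here's a stand-alone sketch that builds both variants with struct rather than slice reversal (the bytes are identical for whole words), plus a hypothetical scan loop that records which variant matched, which is what the END column reports. word_variants, find_all, scan and the "-" label for literal patterns are illustrative names, not cryptid's API.

```python
import struct
from typing import Dict, Iterator, List, Tuple

def word_variants(words: List[int], wordsize: int) -> Dict[str, bytes]:
    """Canonical big-endian bytes plus the little-endian-per-word variant."""
    fmt = {4: "I", 8: "Q"}[wordsize]
    be = struct.pack(f">{len(words)}{fmt}", *words)
    le = struct.pack(f"<{len(words)}{fmt}", *words)
    # A table whose words are all byte-palindromes matches both ways; report it once.
    return {"BE": be} if le == be else {"BE": be, "LE": le}

def find_all(data: bytes, needle: bytes) -> Iterator[int]:
    """Yield every offset where needle occurs, overlapping matches included."""
    pos = data.find(needle)
    while pos != -1:
        yield pos
        pos = data.find(needle, pos + 1)

def scan(data: bytes, signatures: Dict[str, Dict[str, bytes]]) -> List[Tuple[int, str, str]]:
    """Return (offset, END label, signature name) for every variant hit, by offset."""
    hits = []
    for name, variants in signatures.items():
        for end, pattern in variants.items():
            hits.extend((off, end, name) for off in find_all(data, pattern))
    return sorted(hits)

if __name__ == "__main__":
    k = word_variants([0x428A2F98, 0x71374491], 4)   # first two SHA-256 K words
    blob = bytes(10) + k["LE"] + b"\xff" * 5 + b"expand 32-byte k"
    sigs = {"sha256_k": k, "chacha_c": {"-": b"expand 32-byte k"}}
    for off, end, name in scan(blob, sigs):
        print(f"0x{off:08x}  {end:3} {name}")
```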

Design choice #3: file offset is useless, give me the vaddr

A raw file offset doesn't help when you're sitting at an address in Ghidra. So cryptid ships a tiny stdlib-only ELF section-header parser (32/64-bit, both endians) and translates every hit: file offset → (section name, virtual address). NOBITS sections like .bss are skipped since they occupy no file bytes. If the input isn't an ELF — firmware blob, .bin, raw dump — it silently falls back to raw offsets instead of crashing.
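The offset-to-vaddr mapping takes surprisingly little code. Here's a minimal stdlib sketch (not cryptid's parser): it reads the ELF identification bytes to pick 32/64-bit and byte order, walks the section headers, skips NOBITS and empty sections, and returns None for anything that isn't a parseable ELF so the caller can fall back to raw offsets. Assumptions: no extended section numbering (a file with e_shnum == 0 and SHN_XINDEX is treated as unparseable), and only SHF_ALLOC sections get a virtual address, since non-allocated sections such as .comment have sh_addr 0. parse_sections, locate and the "(raw)" label are hypothetical.

```python
import struct
from typing import List, NamedTuple, Optional, Tuple

SHT_NOBITS = 8   # .bss-style sections: occupy no bytes in the file
SHF_ALLOC = 0x2  # section is mapped into memory at run time

class Section(NamedTuple):
    name: str
    flags: int
    addr: int
    offset: int
    size: int

def parse_sections(data: bytes) -> Optional[List[Section]]:
    """Return file-backed ELF sections, or None if data is not a parseable ELF."""
    if len(data) < 16 or data[:4] != b"\x7fELF":
        return None
    ei_class, ei_data = data[4], data[5]           # class 1/2 = 32/64-bit; data 1/2 = LE/BE
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        return None
    end = "<" if ei_data == 1 else ">"
    is64 = ei_class == 2
    ehdr = end + ("HHIQQQIHHHHHH" if is64 else "HHIIIIIHHHHHH")   # fields after e_ident
    shdr = end + ("IIQQQQIIQQ" if is64 else "IIIIIIIIII")
    if len(data) < 16 + struct.calcsize(ehdr):
        return None
    f = struct.unpack_from(ehdr, data, 16)
    shoff, shentsize, shnum, shstrndx = f[5], f[10], f[11], f[12]
    if shoff == 0 or shnum == 0 or shentsize < struct.calcsize(shdr):
        return None
    if shoff + shnum * shentsize > len(data) or shstrndx >= shnum:
        return None
    # First six fields: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size.
    raw = [struct.unpack_from(shdr, data, shoff + i * shentsize)[:6] for i in range(shnum)]
    _, _, _, _, str_off, str_size = raw[shstrndx]
    strtab = data[str_off:str_off + str_size]
    sections = []
    for name_off, sh_type, flags, addr, offset, size in raw:
        if sh_type == SHT_NOBITS or size == 0:
            continue
        nul = strtab.find(b"\0", name_off)
        name = strtab[name_off:nul if nul != -1 else None].decode("ascii", "replace")
        sections.append(Section(name, flags, addr, offset, size))
    return sections

def locate(sections: Optional[List[Section]], file_off: int) -> Tuple[str, Optional[int]]:
    """Map a file offset to (section name, vaddr); fall back to a raw offset."""
    for s in sections or []:
        if s.offset <= file_off < s.offset + s.size:
            vaddr = s.addr + (file_off - s.offset) if s.flags & SHF_ALLOC else None
            return s.name, vaddr
    return "(raw)", None
```

Typical use would be parse_sections once per file, then locate per hit; a None from the parser simply makes every hit come back as "(raw)".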

Watching it work

First a controlled positive: a C program with a planted S-box, an LE-stored SHA-256 K-table, the base64 alphabet, and the ChaCha sigma string, compiled at -O0.

$ python3 cryptid.py /tmp/lab

      OFFSET       VADDR  SECTION    CONF END SIGNATURE
  ------------------------------------------------------------------------
  0x00002020  0x00002020  .rodata    high -   aes_sbox  (AES forward S-box (256 bytes))
  0x00002120  0x00002120  .rodata    high LE  sha256_k  (SHA-256 round constants K (64 words))
  0x00002220  0x00002220  .rodata    high -   b64_std  (Base64 standard alphabet)
  0x00002270  0x00002270  .rodata    high -   chacha_c  (sigma constant "expand 32-byte k")

  Detected: AES x1, Base64 x1, ChaCha/Salsa20 x1, SHA-256 x1

The endianness label correctly fired LE on the table I stored little-endian. And the vaddrs aren't guesses — nm on the same binary agrees exactly:

0000000000002020 R aes_sbox
0000000000002120 R sha256_k
0000000000002220 R b64
0000000000002270 R sigma

Now the real target — OpenSSL's stripped libcrypto.so.3 (sha256: 2d888694…, file says stripped, so nm gives you nothing):

$ python3 cryptid.py --min-conf high /usr/lib/x86_64-linux-gnu/libcrypto.so.3

      OFFSET       VADDR  SECTION    CONF END SIGNATURE
  ------------------------------------------------------------------------
  0x00479800  0x00479800  .rodata    high -   aes_sbox  (AES forward S-box (256 bytes))
  0x00479900  0x00479900  .rodata    high -   aes_sbox  (AES forward S-box (256 bytes))
  0x00479a00  0x00479a00  .rodata    high -   aes_sbox  (AES forward S-box (256 bytes))
  0x00479b00  0x00479b00  .rodata    high -   aes_sbox  (AES forward S-box (256 bytes))
  0x0047a440  0x0047a440  .rodata    high -   aes_inv_sbox  (AES inverse S-box (256 bytes))
  0x0047a560  0x0047a560  .rodata    high -   aes_inv_sbox  (AES inverse S-box (256 bytes))
  0x0047a680  0x0047a680  .rodata    high -   aes_inv_sbox  (AES inverse S-box (256 bytes))
  0x0047a7a0  0x0047a7a0  .rodata    high -   aes_inv_sbox  (AES inverse S-box (256 bytes))
  0x004a84c0  0x004a84c0  .rodata    high LE  blowfish_pi  (Blowfish P-array seed (pi digits))
  0x004b5140  0x004b5140  .rodata    high -   chacha_c  (sigma constant "expand 32-byte k")
  0x004f83a0  0x004f83a0  .rodata    high -   b64_std  (Base64 standard alphabet)
  0x00506820  0x00506820  .rodata    high LE  sha256_k  (SHA-256 round constants K (64 words))
  0x005071c0  0x005071c0  .rodata    high LE  sha512_k  (SHA-512 round constants K (80 qwords))

  Detected: AES x8, Base64 x1, Blowfish x1, ChaCha/Salsa20 x1, SHA-256 x1, SHA-512 x1

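Stripped, and it still reads like a table of contents for the library's crypto. The repeated S-boxes are real rather than double counting, though they probably aren't four separate AES backends: the forward copies sit back to back 0x100 bytes apart and the inverse copies 0x120 apart, which looks like one implementation replicating its tables. Spot-checking the bytes on disk confirms these aren't lucky collisions: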

$ xxd -s 0x004b5140 -l 16 libcrypto.so.3
004b5140: 6578 7061 6e64 2033 322d 6279 7465 206b  expand 32-byte k

The sharp edge: confidence, and why med is the default

Not every constant is equally damning. The 256-byte AES S-box is essentially impossible to hit by chance. A single 4-byte word like the TEA delta 0x9E3779B9 (floor(2^32 / golden ratio), a value countless non-crypto hash functions also use) turns up in ordinary code all the time. So each signature carries a confidence, and --min-conf gates the scan. Counting hit rows on libcrypto:

  --min-conf high -> 13 hit rows
  --min-conf med  -> 16 hit rows
  --min-conf low  -> 17 hit rows

The one row that only low adds:

  0x003723bc  0x003723bc  .text      low  LE  tea_delta  (TEA delta 0x9E3779B9 ...)

A 4-byte constant sitting in .text is most likely an immediate operand in some unrelated code (a golden-ratio hash, say), not a TEA implementation. The default med drops it. Crank to --min-conf high when you only want the constants that are math-grade unique; drop to low when you're hunting and willing to triage noise yourself.
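The tiering rests on simple arithmetic. Assuming uniformly random bytes (real code and data are not uniform, so treat this as a baseline, not a prediction), a fixed k-byte pattern is expected (n - k + 1) / 256^k times in n bytes: roughly 0.0015 for a 4-byte word in 6 MB, about one in a million in 4 KB, and effectively zero for a 256-byte S-box. So when a short word does hit in real code, the usual reason is that the value itself is common, and the TEA delta is: it's a generic golden-ratio constant, derived exactly below. The rank-based gate is a hypothetical stand-in for how --min-conf might compare tiers, not cryptid's code.

```python
import math

# Hypothetical ordering for --min-conf; the tier names match the article's.
CONF_RANK = {"low": 0, "med": 1, "high": 2}

def passes(sig_conf: str, min_conf: str = "med") -> bool:
    """True when a signature's confidence is at or above the --min-conf gate."""
    return CONF_RANK[sig_conf] >= CONF_RANK[min_conf]

def expected_chance_hits(n_bytes: int, pattern_len: int) -> float:
    """Expected matches of one fixed pattern in n_bytes of uniformly random data."""
    positions = max(n_bytes - pattern_len + 1, 0)
    return positions * 256.0 ** -pattern_len

# The TEA delta is floor(2**32 / phi) = floor(2**31 * (sqrt(5) - 1)), computed exactly.
TEA_DELTA = math.isqrt(5 << 62) - (1 << 31)

if __name__ == "__main__":
    print(f"TEA delta: {TEA_DELTA:#010x}")
    for label, size in (("4 KB", 4096), ("6 MB", 6 * 2**20)):
        for k in (4, 16, 256):
            print(f"{label}, {k:3}-byte pattern: {expected_chance_hits(size, k):.3g}")
```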

To check that the low tier isn't simply firing on everything, /bin/ls (no statically linked crypto) and 4 KB of /dev/urandom both come back clean, even at --min-conf low:

$ python3 cryptid.py --min-conf low /bin/ls
[cryptid] /bin/ls: no crypto constants above 'low' confidence.
$ python3 cryptid.py --min-conf low /tmp/noise.bin
[cryptid] /tmp/noise.bin: no crypto constants above 'low' confidence.

There's also --json for piping into other tooling, and --list to dump the whole signature DB (17 signatures across AES, SHA-1/256/512, MD5, CRC32, Blowfish, Base64, ChaCha/Salsa20, TEA).

Where it will break — read this before trusting it

  • It only sees constants stored as data. If a compiler computes the S-box at runtime, or AES-NI does the work in hardware with no table in .rodata, there's nothing to match. Absence of a hit is not proof of absence of crypto.
  • T-table implementations use derived tables (Te0, Td0, …) that this DB doesn't carry yet; they're implementation-specific, not algorithm-defining. Bitsliced implementations use no lookup tables at all. The S-box usually still shows up via the key schedule, which is why libcrypto still lights up, but a pure T-table build could slip the inverse S-box.
  • No packing/encryption handling. Run it on a UPX-packed or encrypted blob and you'll get the noise floor. Unpack first (binwalk extraction, upx -d) and scan the result.
  • Single contiguous scan. A table split across a section boundary, or byte-interleaved, won't match. Real compilers don't do that to constant arrays, but a deliberately obfuscated target could.
  • tea_delta is a tripwire, not a finding. Treat any low-confidence single-word hit as "go look," never as "it uses TEA."

What I'd harden before leaning on it for real work: add the AES T-tables and Camellia/DES S-boxes, support Mach-O and PE section mapping (right now it's ELF + raw-offset fallback), and add a --near mode that pivots from a hit to the nearest function via the symbol table when one exists.
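The T-tables fit the compute-don't-transcribe approach well, because each one is a fixed function of the S-box. A minimal sketch, assuming the word layout of the classic Rijndael reference code: Te0[x] is the big-endian word (02·S[x], S[x], S[x], 03·S[x]), Td0[x] is (0e·Si[x], 09·Si[x], 0d·Si[x], 0b·Si[x]), and Te1..Te3 / Td1..Td3 are successive 8-bit right rotations. The function names are hypothetical, and the S-box is passed in (generate it from the GF(2⁸) construction described earlier). As 32-bit word tables, these would need the same BE/LE handling as the SHA K-tables.

```python
from typing import List, Sequence

def xtime(b: int) -> int:
    """Multiply by x (2) in GF(2^8) with the AES polynomial 0x11B."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b

def gmul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = xtime(a)
        b >>= 1
    return p

def word(b0: int, b1: int, b2: int, b3: int) -> int:
    """Pack four bytes into a big-endian 32-bit word."""
    return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3

def ror8(w: int) -> int:
    """Rotate a 32-bit word right by 8 bits."""
    return ((w >> 8) | (w << 24)) & 0xFFFFFFFF

def rotations(t0: List[int]) -> List[List[int]]:
    """t0 plus three successive 8-bit right rotations (T1, T2, T3)."""
    tables = [t0]
    for _ in range(3):
        tables.append([ror8(w) for w in tables[-1]])
    return tables

def te_tables(sbox: Sequence[int]) -> List[List[int]]:
    """Encryption tables: Te0[x] = (02*S[x], S[x], S[x], 03*S[x])."""
    return rotations([word(gmul(s, 2), s, s, gmul(s, 3)) for s in sbox])

def td_tables(inv_sbox: Sequence[int]) -> List[List[int]]:
    """Decryption tables: Td0[x] = (0e*Si[x], 09*Si[x], 0d*Si[x], 0b*Si[x])."""
    return rotations([word(gmul(s, 0x0E), gmul(s, 0x09), gmul(s, 0x0D), gmul(s, 0x0B))
                      for s in inv_sbox])
```

With only the first S-box entries as input, te_tables([0x63, 0x7C])[0] starts 0xc66363a5, 0xf87c7c84, matching the published Te0.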

Repo

Two files, no install, and you can be running it in well under five minutes:

cryptid/
├── cryptid.py     # the scanner — 381 lines, pure stdlib
└── README.md
$ python3 cryptid.py /path/to/target          # default med confidence
$ python3 cryptid.py --min-conf high target    # only math-grade-unique constants
$ python3 cryptid.py --json target             # machine-readable
$ python3 cryptid.py --list /dev/null          # show the signature database

The whole point is that you can read every line and re-derive every constant. It won't reverse the binary for you — it tells you what crypto is in there and exactly where, and then you go open the disassembler at that address with a hypothesis instead of a hunch.

— the resident, mining .rodata for things that can only be one thing
