Reverse OpenSSL libcrypto HMAC key by dynamic analysis

You are given an executable a.out that takes a single command-line argument, a string, computes a 32-byte HMAC over it, and prints the result to stdout in base64. Can you retrieve the secret key used to compute the HMAC?

For example:

$ ./a.out foo
A283gk/gcX/JA5yo6zNznSNHumIn91RxfCtyfR2rcXQ
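
A quick sanity check on the sample output: the string above is 43 characters long, which is what an unpadded base64 rendering of 32 bytes looks like. The arithmetic can be sketched as follows; the helper names are illustrative and not part of any library.

```c
#include <stddef.h>

/* Number of base64 characters needed for n raw bytes, including '=' padding. */
size_t base64_padded_len(size_t n) {
    return 4 * ((n + 2) / 3);
}

/* Number of '=' characters at the end of a padded encoding of n raw bytes. */
size_t base64_pad_chars(size_t n) {
    return (3 - n % 3) % 3;
}

/* Number of raw bytes carried by `chars` base64 characters (padding excluded). */
size_t base64_decoded_len(size_t chars) {
    return chars * 6 / 8;
}
```

For a 32-byte digest, base64_padded_len(32) is 44 and base64_pad_chars(32) is 1, so a rendering without the padding character has 43 characters, and base64_decoded_len(43) gives back 32.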

You start inspecting the executable and see that it is a 64-bit ELF compiled for aarch64 and stripped of debug symbols.

$ file a.out
a.out: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-aarch64.so.1, stripped

The binary dynamically links against a handful of common shared libraries. Most of them are unimportant, but one stands out: OpenSSL's libcrypto, presumably used to compute the HMAC.

$ ldd a.out 
linux-vdso.so.1 (0x0000f06f34466000)
libcrypto.so.1.1 => /lib/aarch64-linux-gnu/libcrypto.so.1.1 (0x0000f06f34193000)
libc.musl-aarch64.so.1 => not found
libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000f06f3417f000)
libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000f06f3414e000)
libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000f06f33fdb000)
/lib/ld-musl-aarch64.so.1 => /lib/ld-linux-aarch64.so.1 (0x0000f06f34436000)

Had the crypto library been statically linked (common in hardened executables) rather than loaded from the system, the first step would be to work out which cryptography library is in use, a significant reverse engineering challenge involving library fingerprinting and code pattern matching.

Fortunately, this binary dynamically links OpenSSL libcrypto. Static analysis in Ghidra confirms that the executable calls libcrypto's one-shot HMAC function, whose arguments are, in order, HMAC(hash_function, key, key_len, src, src_len, out, &out_len), where key is a pointer to the key bytes. Because the function is imported from a shared library, its name is still present in the binary, which makes identification straightforward.

When an executable is stripped, the resulting binary no longer contains local function names, variable names, line numbers, data types, and so on. The HMAC name survives because it is an imported symbol: the dynamic linker needs it in the dynamic symbol table to resolve the call at load time, so stripping cannot remove it. If libcrypto were statically linked into a stripped binary, that name would be gone too, and we would first have to recognize the HMAC routine from its code and note the placeholder name the disassembler assigned to it.

Figure: Ghidra disassembly showing the call to the HMAC function.

The function takes the secret key by pointer, together with its length, plus source and output buffers. The key is generated just before the call in a heavily obfuscated routine, likely the product of LLVM-based obfuscation meant to frustrate static analysis.

Rather than reversing the obfuscated key generation (possible but time-consuming), dynamic analysis provides a faster path. Since libcrypto is dynamically linked and the HMAC symbol can be resolved by name, we can simply inspect the function's arguments at runtime. Load the binary in gdb, set a breakpoint on HMAC, and run with a test input:

$ gdb ./a.out
(gdb) b HMAC
(gdb) run foo

Unfortunately, gdb cannot list the arguments of HMAC by name, because no debug information is available for that frame.

(gdb) info args
No symbol table info available.

However, the arguments can still be read directly from the CPU registers at the function breakpoint, keeping in mind the 64-bit ARM (AArch64) calling convention. In gdb, run:

(gdb) info registers
x0             0xffffa69fd738      281473477236536
x1             0xffffd886e430      281474314462256
x2             0xa                 10
x3             0xffffd886ef5d      281474314465117
x4             0x3                 3
x5             0xffffd886e4f8      281474314462456
x6             0xffffd886e42c      281474314462252
x7             0x5                 5
x8             0x20                32
x9             0xf5                245
...

Recall how the 64-bit ARM (AArch64) calling convention allocates its registers:
  • x31 (SP): Stack pointer or a zero register, depending on context.
  • x30 (LR): Procedure link register, used to return from subroutines.
  • ...
  • x9 to x15: Local variables, caller saved.
  • x8 (XR): Indirect return value address.
  • x0 to x7: Argument values passed to and results returned from a subroutine.
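
The argument-passing part of this convention can be sketched as a small lookup. The helper names below are illustrative assumptions, and the parameter labels simply mirror the HMAC argument order described earlier. The mapping covers integer and pointer arguments only, since floating-point arguments travel in separate registers.

```c
#include <stddef.h>
#include <string.h>

/* The eight AArch64 registers used for integer and pointer arguments. */
static const char *const ARG_REGS[8] = {
    "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7"
};

/* Labels for the seven HMAC arguments, in call order (illustrative names). */
static const char *const HMAC_PARAMS[7] = {
    "hash_function", "key", "key_len", "src", "src_len", "out", "out_len"
};

/* Where the zero-based integer/pointer argument `index` is passed. */
const char *arg_location(size_t index) {
    return index < 8 ? ARG_REGS[index] : "stack";
}

/* Register holding the named HMAC argument, or NULL if the name is unknown. */
const char *hmac_param_location(const char *name) {
    for (size_t i = 0; i < 7; i++) {
        if (strcmp(HMAC_PARAMS[i], name) == 0) {
            return arg_location(i);
        }
    }
    return NULL;
}
```

With this mapping, hmac_param_location("key") yields x1 and hmac_param_location("key_len") yields x2, which are the two registers of interest at the breakpoint.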

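In short, x0 holds the first integer or pointer argument, x1 the second, and so on up to x7; when a function takes more than eight such arguments, the remainder are passed on the stack. HMAC takes seven arguments, so all of them sit in x0 to x6, and the register dump above now reads naturally: x0 is the pointer to the hash function descriptor, x1 points to the key, x2 = 10 is the key length, x3 points to the input string, x4 = 3 is the length of "foo", x5 is the output buffer, and x6 points to the output length. Dumping the 10 bytes at x1 (for example with x/10cb $x1 in gdb) reveals the key, which turns out to be the ASCII string 1234567890.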
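
Because the key is passed as a pointer plus a length, it is not guaranteed to be a NUL-terminated string; a binary key may contain zero bytes or unprintable values. The sketch below shows one way to render exactly key_len bytes safely. The function name render_key and its escaping format are assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Renders exactly key_len bytes starting at key into out. Printable ASCII is
 * copied as is; a backslash or any other byte becomes \xNN. Returns the number
 * of characters written (excluding the terminating NUL), or -1 if out is too
 * small or key_len is negative.
 */
int render_key(const void *key, int key_len, char *out, size_t out_size) {
    const unsigned char *p = key;
    size_t used = 0;

    if (key_len < 0 || out_size == 0) {
        return -1;
    }
    for (int i = 0; i < key_len; i++) {
        int printable = isprint(p[i]) && p[i] != '\\';
        size_t need = printable ? 1 : 4;

        if (used + need + 1 > out_size) {   /* +1 leaves room for the NUL */
            return -1;
        }
        if (printable) {
            out[used++] = (char)p[i];
        } else {
            snprintf(out + used, out_size - used, "\\x%02x", p[i]);
            used += 4;
        }
    }
    out[used] = '\0';
    return (int)used;
}
```

Called on a 10-byte ASCII key, this simply copies the 10 characters; called on the bytes 0x00, 'A', 0xff, it produces \x00A\xff instead of stopping at the first zero byte.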

Everything is consistent. To verify that we extracted the correct key, let's write an equivalent program and confirm that its output matches. The first argument (register x0) points to the hash function descriptor. We can examine it in gdb, or simply observe that the output is 32 bytes (256 bits), which makes SHA-256 the most likely candidate.

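#include <openssl/bio.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <message>\n", argv[0]);
        return 1;
    }

    // Key recovered from x1 (pointer) and x2 (length) at the HMAC breakpoint
    const char key[] = "1234567890";
    unsigned char result[EVP_MAX_MD_SIZE];
    unsigned int len = 0;

    if (HMAC(EVP_sha256(), key, (int)strlen(key),
             (const unsigned char *)argv[1], strlen(argv[1]),
             result, &len) == NULL) {
        fprintf(stderr, "HMAC failed\n");
        return 1;
    }

    // Chain a base64 filter in front of stdout, with no line breaks
    BIO *b64 = BIO_new(BIO_f_base64());
    BIO *out = BIO_new_fp(stdout, BIO_NOCLOSE);
    BIO_set_flags(b64, BIO_FLAGS_BASE64_NO_NL);
    b64 = BIO_push(b64, out);

    // Write the digest through the chain, then free the chain from its head
    BIO_write(b64, result, (int)len);
    BIO_flush(b64);
    BIO_free_all(b64);

    return 0;
}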

Sure enough, we get the same result as the mysterious binary, apart from the trailing = base64 padding character:

$ sudo apt install -y build-essential libssl-dev
$ gcc hmac.c -o hmac -lcrypto
$ ./hmac foo
A283gk/gcX/JA5yo6zNznSNHumIn91RxfCtyfR2rcXQ=

In conclusion, dynamic analysis let us sidestep the obfuscated key generation entirely. By breaking on the HMAC function in gdb and reading the argument registers defined by the 64-bit ARM calling convention, we identified the registers holding the pointer to the key and its length, and from them retrieved the key.

This exercise highlights the importance of understanding the underlying CPU architecture and function calling convention, as well as the usefulness of dynamic analysis tools like gdb for reverse engineering and debugging. With the right techniques and tools, even a stripped binary can reveal its secrets.