Dynamic Binary Visualization

Note that this article is based on this cool video!

Binary Visualization explores a simple idea: turn raw binary files into images, so humans can spot patterns with their eyes instead of reading endless hex dumps.

A small Rust script (compiled to WASM) converts bytes into 256×256 visual fingerprints. Potential applications include data recovery, malware analysis, document clustering, and traffic analysis.

Early results look promising, but also show a hard limit: large binaries are difficult to compress into a tiny image without losing structure.

Binviz Console

$ hexdump file.bin
0000000 5089 474e 0a0d 0a1a 0000 0d00 4849 5244
0000010 5089 474e 0a0d 0a1a 0000 0d00 4849 5244
0000020 0031 4c00 49f4 4144 7854 ed01 01e0 2490
0000030 9249 4924 8b12 99aa 47bb 4444 6666 5666
0000090 cccc cccc cccc cccc cccc cccc cccc 74cc
00000a0 7777 7777 5757 5555 5555 6666 4646 8444
00000b0 9bbb 0a99 4ccf 5766 7577 7777 cf4f cccc
00000c0 cccc 624c 70c5 1a0e 0404 b00c fb09 6150
...

Generate a binary visualization

The Binary Visualization Console lets you upload a binary file and visualize it as an image. I recommend trying it with several executable files and comparing them visually. Executables are a good way to quickly spot your first patterns 😎

Why visualize binaries?

Reverse engineering tools usually display binaries as hex dumps or disassembly, which is accurate but cognitively expensive. Chris Domas describes the core problem well: there’s a gap between raw bytes and high-level reasoning tools—and humans are left to bridge it with intuition and experience.

Dynamic binary visualization tries to bridge that gap with a representation the brain is good at: spatial patterns. Instead of parsing structure first, you get a quick “shape” of the binary: regions, repetitions, anomalies, and transitions.

Visualizations of six distinct ARM executables generated using the Binary Visualization Dataset script

A naive (but effective) generator

The dataset generator is intentionally simple:

Read the binary file
Slide a two-byte window over the data
Interpret each pair as (x, y) coordinates, where each byte lies in [0, 255]
Plot points into a 256×256 image
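The steps above can be sketched in a few lines of Rust. This is a minimal illustration, not the project's actual script: the names `digraph_counts` and `to_grayscale` are hypothetical, the window is assumed to advance one byte at a time (overlapping pairs), and the log-scaled brightness is my own choice so that rare pairs stay visible. Encoding the pixels to a PNG is left out, since that would need an external crate.

```rust
/// Side length of the image: one axis per possible byte value.
const SIDE: usize = 256;

/// Counts how often each (x, y) byte pair occurs in `bytes`.
/// Returns a row-major SIDE x SIDE grid, indexed as `y * SIDE + x`.
fn digraph_counts(bytes: &[u8]) -> Vec<u32> {
    let mut grid = vec![0u32; SIDE * SIDE];
    // Assumption: the two-byte window moves forward one byte at a time,
    // so consecutive pairs overlap.
    for pair in bytes.windows(2) {
        let (x, y) = (pair[0] as usize, pair[1] as usize);
        let cell = &mut grid[y * SIDE + x];
        *cell = cell.saturating_add(1);
    }
    grid
}

/// Maps pair counts to 8-bit grayscale pixels.
/// Log scaling (an illustrative choice) keeps rare pairs visible
/// next to very frequent ones.
fn to_grayscale(grid: &[u32]) -> Vec<u8> {
    let max = grid.iter().copied().max().unwrap_or(0);
    if max == 0 {
        return vec![0; grid.len()];
    }
    let scale = 255.0 / ((max as f64) + 1.0).ln();
    grid.iter()
        .map(|&count| (((count as f64) + 1.0).ln() * scale).round() as u8)
        .collect()
}
```

Calling `to_grayscale(&digraph_counts(&bytes))` on a file's contents yields 65,536 pixel values, ready to be written out as a 256×256 grayscale image.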

This approach is fast and produces recognizable patterns, but it is also naive:

it doesn’t preserve locality in a principled way
it doesn’t explicitly surface compression/encryption zones
it compresses huge files into a fixed image size, which can blur structure
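One common way to surface compressed or encrypted zones, which the generator described here does not do, is to measure Shannon entropy per block: values close to 8 bits per byte usually point to compressed or encrypted data. The sketch below is only an illustration under that assumption; `shannon_entropy`, `entropy_profile`, and the block size are hypothetical names and parameters.

```rust
/// Shannon entropy of `block` in bits per byte, between 0.0 and 8.0.
fn shannon_entropy(block: &[u8]) -> f64 {
    if block.is_empty() {
        return 0.0;
    }
    // Histogram of byte values in the block.
    let mut counts = [0usize; 256];
    for &byte in block {
        counts[byte as usize] += 1;
    }
    let len = block.len() as f64;
    counts
        .iter()
        .filter(|&&count| count > 0)
        .map(|&count| {
            let p = count as f64 / len;
            -p * p.log2()
        })
        .sum()
}

/// Entropy of each consecutive `block_size`-byte block of `bytes`.
/// The last block may be shorter than `block_size`.
fn entropy_profile(bytes: &[u8], block_size: usize) -> Vec<f64> {
    assert!(block_size > 0, "block_size must be non-zero");
    bytes.chunks(block_size).map(shannon_entropy).collect()
}
```

A run of identical bytes scores 0, while a block in which every byte value appears equally often scores 8.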

Scaling up: chunking

Large executables don’t fit nicely into a single 256×256 summary. A practical fix is chunking: split the file into smaller ranges and visualize each chunk. This enables exploration at multiple scales and makes “where is the weird region?” questions much easier.
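As a rough sketch of chunking, assuming fixed-size chunks (the article does not say how the ranges are chosen), each chunk can carry its offset so that a suspicious image can be traced back to a position in the file. `Chunk` and `split_into_chunks` are hypothetical names.

```rust
/// One contiguous range of the original file.
#[derive(Debug, PartialEq)]
struct Chunk<'a> {
    /// Position of the first byte of this chunk in the file.
    offset: usize,
    /// The bytes of this chunk, borrowed from the file buffer.
    data: &'a [u8],
}

/// Splits `bytes` into consecutive chunks of at most `chunk_size` bytes.
/// The last chunk may be shorter.
fn split_into_chunks(bytes: &[u8], chunk_size: usize) -> Vec<Chunk<'_>> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    bytes
        .chunks(chunk_size)
        .enumerate()
        .map(|(index, data)| Chunk {
            offset: index * chunk_size,
            data,
        })
        .collect()
}
```

Each `Chunk::data` slice can then be visualized on its own, and rerunning the split with a smaller `chunk_size` on an interesting range gives a simple form of zooming.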

Better locality with space-filling curves

Aldo Cortesi explored a more advanced approach using space-filling curves (zig-zag, Z-order, Hilbert). The idea is to map a 1D byte stream into 2D while preserving locality as much as possible.

Key takeaways from Cortesi’s experiments:

Three space-filling curves

Hilbert curves preserve locality best (but are more complex to generate)
Z-order is simpler and faster, but weaker on locality
zig-zag tends to wash out small-scale features
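To make the difference concrete, here is a sketch of how a byte's position `d` in the file could be mapped to a pixel `(x, y)` with the Z-order and Hilbert curves. It is not Cortesi's code nor part of this project: the function names are hypothetical, and the Hilbert mapping follows the widely used iterative index-to-coordinate conversion, which assumes a square grid whose side `n` is a power of two.

```rust
/// Z-order (Morton) curve: de-interleaves the bits of `d`.
/// Even bits form x, odd bits form y.
fn z_order_d2xy(d: u32) -> (u32, u32) {
    let (mut x, mut y) = (0u32, 0u32);
    for bit in 0..16u32 {
        x |= ((d >> (2 * bit)) & 1) << bit;
        y |= ((d >> (2 * bit + 1)) & 1) << bit;
    }
    (x, y)
}

/// Hilbert curve: maps index `d` to a cell of an `n` x `n` grid.
/// Assumes `n` is a power of two and `d < n * n`.
fn hilbert_d2xy(n: u32, d: u32) -> (u32, u32) {
    let (mut x, mut y) = (0u32, 0u32);
    let mut t = d;
    let mut s = 1;
    while s < n {
        // Two bits of the index select the quadrant at this scale.
        let rx = 1 & (t / 2);
        let ry = 1 & (t ^ rx);
        // Rotate or flip the sub-square so the curve stays continuous.
        if ry == 0 {
            if rx == 1 {
                x = s - 1 - x;
                y = s - 1 - y;
            }
            std::mem::swap(&mut x, &mut y);
        }
        x += s * rx;
        y += s * ry;
        t /= 4;
        s *= 2;
    }
    (x, y)
}
```

With the Hilbert mapping, two bytes that are neighbors in the file always land on neighboring pixels. Z-order does not guarantee this: index 1 maps to (1, 0) but index 2 maps to (0, 1), and the jumps get longer at larger scales.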

I haven't integrated this approach yet, but it's one of the most promising directions for revealing global structure at a glance.

Here are some visualizations by Aldo Cortesi using space-filling curves:

Aldo Cortesi's visualizations using space-filling curve methods

Dataset creation

The Binary Visualization Dataset was built with a simple pipeline:

Collect binaries (Kaggle datasets + executables from a personal ARM laptop)
Clean / preprocess (remove metadata when relevant, keep raw bytes)
Visualize using the Rust script
Annotate by directory-based labeling
Validate basic integrity and reproducibility

Covered categories include:

images (jpg, png, bitmap, …)
audio (wav, …)
executables (ARM and x86_64)

Example layout:

binary_visualization_dataset/
  audio/
    wav/
  executables/
    ARM/
    x86_64/
  images/
    bitmap/
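With a layout like this, directory-based labeling can be as simple as reading the folders between the dataset root and the file. The sketch below assumes exactly that convention; `label_from_path` is a hypothetical helper, not a function from the project.

```rust
use std::path::Path;

/// Derives a label such as "executables/ARM" from the directories
/// between `root` and `file`.
/// Returns `None` if `file` is not under `root` or sits directly in it.
fn label_from_path(root: &Path, file: &Path) -> Option<String> {
    let relative = file.strip_prefix(root).ok()?;
    let parent = relative.parent()?;
    let parts: Vec<String> = parent
        .components()
        .map(|part| part.as_os_str().to_string_lossy().into_owned())
        .collect();
    if parts.is_empty() {
        None
    } else {
        Some(parts.join("/"))
    }
}
```

Under this assumption, a file stored in `binary_visualization_dataset/audio/wav/` gets the label `audio/wav`.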

To preserve provenance, visualizations keep the original filename and add a suffix like:

original.bin.bvtool.png
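A naming rule like this one is easy to reproduce. The sketch below appends the suffix to the full original filename, extension included; `visualization_path` is a hypothetical helper name.

```rust
use std::path::{Path, PathBuf};

/// Builds the output path for a visualization by appending
/// ".bvtool.png" to the original filename, keeping its extension.
fn visualization_path(input: &Path) -> PathBuf {
    let mut name = input
        .file_name()
        .map(|name| name.to_os_string())
        .unwrap_or_default();
    name.push(".bvtool.png");
    input.with_file_name(name)
}
```

Because the original name is kept intact, stripping the suffix is enough to find the source binary of any image.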

The dataset is publicly available via the project repository.

Conclusion

This project reinforced a simple lesson: visualization can turn an expert-only task into pattern recognition, which humans are very good at!

But it also exposed the main challenge: large binaries are hard to summarize without losing meaning. Chunking helps, and locality-preserving mappings (like Hilbert curves) look like the next step.

References

C. Domas, The Future of RE, REcon 2013 (Jun. 2013).
A. Cortesi, Visualizing binaries with space-filling curves (Dec. 2011).
A. Cortesi, Visualizing entropy in binary files (May 2016).