User guide¶
Quickstart¶
unblob has a very simple command line interface with sensible defaults. You just need to pass it a file you want to extract:
$ unblob alpine-minirootfs-3.16.1-x86_64.tar.gz
2022-07-30 06:33.07 [info ] Start processing file file=openwrt-21.02.2-x86-64-generic-ext4-combined.img.gz pid=7092
It will make a new directory with the original filename appended with _extract
:
$ ls -l
total 2656
drwxrwxr-x 3 walkman walkman 4096 Jul 30 08:43 alpine-minirootfs-3.16.1-x86_64.tar.gz_extract
-rw-r--r-- 1 walkman walkman 2711958 Jul 30 08:43 alpine-minirootfs-3.16.1-x86_64.tar.gz
And will extract all known file formats recursively until the specified recursion depth level (which is 10 by default):
$ tree -L 2
alpine-minirootfs-3.16.1-x86_64.tar.gz_extract
├── alpine-minirootfs-3.16.1-x86_64.tar
└── alpine-minirootfs-3.16.1-x86_64.tar_extract
├── bin
├── dev
├── etc
├── home
├── lib
├── media
├── mnt
├── opt
├── proc
├── root
├── run
├── sbin
├── srv
├── sys
├── tmp
├── usr
└── var
18 directories, 1 file
Features¶
Metadata extraction¶
unblob can generate a metadata file about the extracted files in a JSON format
by using the --report
CLI option:
$ unblob --report alpine-report.json alpine-minirootfs-3.16.1-x86_64.tar.gz
2022-07-30 07:06.59 [info ] Start processing file file=alpine-minirootfs-3.16.1-x86_64.tar.gz pid=13586
2022-07-30 07:07.00 [info ] JSON report written path=alpine-report.json pid=13586
$ cat alpine-report.json
[
{
"task": {
"path": "/home/walkman/Projects/unblob/demo/alpine-minirootfs-3.16.1-x86_64.tar.gz",
"depth": 0,
"chunk_id": "",
"__typename__": "Task"
},
"reports": [
{
"path": "/home/walkman/Projects/unblob/demo/alpine-minirootfs-3.16.1-x86_64.tar.gz",
"size": 2711958,
"is_dir": false,
"is_file": true,
"is_link": false,
"link_target": null,
"__typename__": "StatReport"
},
{
"magic": "gzip compressed data, max compression, from Unix, original size modulo 2^32 5816320\\012- data",
"mime_type": "application/gzip",
"__typename__": "FileMagicReport"
},
{
"id": "13590:1",
"handler_name": "gzip",
"start_offset": 0,
"end_offset": 2711958,
"size": 2711958,
"is_encrypted": false,
"extraction_reports": [],
"__typename__": "ChunkReport"
}
],
"subtasks": [
{
"path": "/home/walkman/Projects/unblob/demo/alpine-minirootfs-3.16.1-x86_64.tar.gz_extract",
"depth": 1,
"chunk_id": "13590:1",
"__typename__": "Task"
}
],
"__typename__": "TaskResult"
},
...
]
Entropy calculation¶
If you are analyzing an unknown file format, it might be useful to know the entropy of the contained files, so you can quickly see for example whether the file is encrypted or contains some random content.
Let's make a file with fully random content at the start and end:
$ dd if=/dev/random of=random1.bin bs=10M count=1
$ dd if=/dev/random of=random2.bin bs=10M count=1
$ cat random1.bin alpine-minirootfs-3.16.1-x86_64.tar.gz random2.bin > unknown-file
A nice ASCII entropy plot is drawn on verbose level 3:
$ unblob -vvv unknown-file | grep -C 15 "Entropy distribution"
2022-07-30 07:58.16 [debug ] Ended searching for chunks all_chunks=[0xa00000-0xc96196] pid=19803
2022-07-30 07:58.16 [debug ] Removed inner chunks outer_chunk_count=1 pid=19803 removed_inner_chunk_count=0
2022-07-30 07:58.16 [warning ] Found unknown Chunks chunks=[0x0-0xa00000, 0xc96196-0x1696196] pid=19803
2022-07-30 07:58.16 [info ] Extracting unknown chunk chunk=0x0-0xa00000 path=unknown-file_extract/0-10485760.unknown pid=19803
2022-07-30 07:58.16 [debug ] Carving chunk path=unknown-file_extract/0-10485760.unknown pid=19803
2022-07-30 07:58.16 [debug ] Calculating entropy for file path=unknown-file_extract/0-10485760.unknown pid=19803 size=0xa00000
2022-07-30 07:58.16 [debug ] Entropy calculated highest=99.99 lowest=99.98 mean=99.98 pid=19803
2022-07-30 07:58.16 [warning ] Drawing plot pid=19803
2022-07-30 07:58.16 [debug ] Entropy chart chart=
Entropy distribution
┌---------------------------------------------------------------------------┐
100┤•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••│
90┤ │
80┤ │
70┤ │
60┤ │
50┤ │
40┤ │
30┤ │
20┤ │
10┤ │
0┤ │
└┬---┬---┬---─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬┘
1 4 7 12 16 20 24 29 33 37 41 46 50 54 59 63 67 71 76 80
[y] entropy % [x] mB
pid=19803
2022-07-30 07:58.16 [info ] Extracting unknown chunk chunk=0xc96196-0x1696196 path=unknown-file_extract/13197718-23683478.unknown pid=19803
2022-07-30 07:58.16 [debug ] Carving chunk path=unknown-file_extract/13197718-23683478.unknown pid=19803
2022-07-30 07:58.16 [debug ] Calculating entropy for file path=unknown-file_extract/13197718-23683478.unknown pid=19803 size=0xa00000
2022-07-30 07:58.16 [debug ] Entropy calculated highest=99.99 lowest=99.98 mean=99.98 pid=19803
2022-07-30 07:58.16 [warning ] Drawing plot pid=19803
2022-07-30 07:58.16 [debug ] Entropy chart chart=
Entropy distribution
┌---------------------------------------------------------------------------┐
100┤•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••│
90┤ │
80┤ │
70┤ │
60┤ │
50┤ │
40┤ │
30┤ │
20┤ │
10┤ │
0┤ │
└┬---┬---┬---─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬┘
1 4 7 12 16 20 24 29 33 37 41 46 50 54 59 63 67 71 76 80
[y] entropy % [x] mB
Skip extraction with file magic¶
The extraction process can be faster and produce fewer false positives if we just ignore some files, which we know will not contain meaningful results, or it makes no sense to extract them. Examples of such file formats are SQLite, images, fonts, or PDF documents.
We have a default for the skip list,
but you can change it with the --skip-magic
CLI option. Here is a silly example:
$ unblob --skip-magic "POSIX tar archive" alpine-minirootfs-3.16.1-x86_64.tar.gz
2022-07-30 07:18.09 [info ] Start processing file file=alpine-minirootfs-3.16.1-x86_64.tar.gz pid=14971
$ tree .
├── alpine-minirootfs-3.16.1-x86_64.tar.gz
└── alpine-minirootfs-3.16.1-x86_64.tar.gz_extract
└── alpine-minirootfs-3.16.1-x86_64.tar
Here gzip has been extracted, but we skipped the tar extraction, so no other files have been extracted further.
Full Command line interface¶
Usage: unblob [OPTIONS] FILE
A tool for getting information out of any kind of binary blob.
You also need these extractor commands to be able to extract the supported
file types: 7z, debugfs, jefferson, lz4, lziprecover, lzop, sasquatch,
sasquatch-v4be, simg2img, ubireader_extract_files, ubireader_extract_images,
unar, zstd
NOTE: Some older extractors might not be compatible.
Options:
-e, --extract-dir DIRECTORY Extract the files to this directory. Will be
created if doesn't exist.
-f, --force Force extraction even if outputs already
exist (they are removed).
-d, --depth INTEGER RANGE Recursion depth. How deep should we extract
containers. [default: 10; x>=1]
-n, --entropy-depth INTEGER RANGE
Entropy calculation depth. How deep should
we calculate entropy for unknown files? 1
means input files only, 0 turns it off.
[default: 1; x>=0]
-P, --plugins-path PATH Load plugins from the provided path.
-S, --skip-magic TEXT Skip processing files with given magic
prefix [default: BFLT, JPEG, GIF, PNG,
SQLite, compiled Java class, TrueType Font
data, PDF document, magic binary file, MS
Windows icon resource, PE32+ executable (EFI
application)]
-p, --process-num INTEGER RANGE
Number of worker processes to process files
parallelly. [default: 12; x>=1]
--report PATH File to store metadata generated during the
extraction process (in JSON format).
-k, --keep-extracted-chunks Keep extracted chunks
-v, --verbose Verbosity level, counting, maximum level: 3
(use: -v, -vv, -vvv)
--show-external-dependencies Shows commands needs to be available for
unblob to work properly
-h, --help Show this message and exit.