Reference Sample Archive

While there is some test coverage via unit tests, the bulk of testing is done via integration tests over a sample set.

What is considered a sample set

Here and onwards, a sample set is simply a directory containing samples and two special files. There should be a timestamp.txt containing a Unix time (presumably, of when the set was last updated). Most importantly, the top-level directory must also contain a filelist.sha1 file, which serves as a digest of the contents of the sample set. This file must be valid sha1sum output, with one line per file in the format:

<40-char SHA1><space><asterisk><filename>
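To illustrate that format, here is a minimal sketch of how the two special files could be (re)generated for a sample set you maintain yourself. It assumes GNU coreutils and findutils (`find -printf`, `sort -z`, `xargs -r`). The function name `make_sample_set_index`, using "now" as the timestamp, and leaving the two special files out of the listing are all assumptions, not necessarily how the canonical set is produced. The asterisk before each file name is what `sha1sum -b` (binary mode) emits.

```shell
# Hypothetical helper: regenerate timestamp.txt and filelist.sha1 for a set.
# Assumes GNU coreutils/findutils.
make_sample_set_index() {
  (
    cd "$1" || exit 1

    # Assumption: the timestamp is simply the Unix time of this update.
    date +%s > timestamp.txt

    # One "<sha1> *<path>" line per sample, paths relative to the set root
    # (%P strips the leading "./"), sorted bytewise for a stable listing.
    # Assumption: the two top-level special files are not listed themselves.
    find . -type f ! -path ./filelist.sha1 ! -path ./timestamp.txt -printf '%P\0' \
      | LC_ALL=C sort -z \
      | xargs -0 -r sha1sum -b > filelist.sha1
  )
}
```

Note that running it on a downloaded copy of the canonical set would overwrite both special files there, which defeats the purpose of the digest.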

Canonical Sample Set

The canonical raw sample data set is . It is freely licensed: all new samples are in the Public Domain under the CC0 1.0 license (85+% of samples and counting), but some older samples are still under the more restrictive CC BY-NC-SA 4.0 license.

Please read this for more info on how to contribute samples!

Full sample set

The complete set includes every sample available, and thus gives the best coverage we can get; the downside is that it is quite bulky.

It is accessible at:


There is also a masterset: just a handful of hand-picked samples that provide reasonable coverage while taking up only about \(1/22\) of the disk footprint and about \(1/44\) of the sample count of the full set.


Unless you want to perform rigorous regression testing, the masterset is strongly recommended!


The masterset only contains samples that are in the Public Domain.

It is accessible at:

Acquiring Canonical Sample Set

Pick which sample set you want to acquire. Be wary of the disk footprint! Probably the easiest way to fetch it is via rsync, for example:

$ rsync -vvrLtW --preallocate --delete --compress --compress-level=1 --progress \
        rsync:// ~/raw-camera-samples/
$ # it might be a good idea to verify consistency afterwards:
$ cd ~/raw-camera-samples/ && sha1sum -c --strict filelist.sha1
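A somewhat more thorough consistency check could be wrapped in a small shell function. This is only a sketch: the name `verify_sample_set` is hypothetical, it assumes the paths listed in filelist.sha1 are relative to the top-level directory of the set (so the check runs from there), and it relies on GNU coreutils (`sha1sum --strict --quiet`, `date -d @SECONDS`).

```shell
# Hypothetical helper: sanity-check a downloaded sample set.
# Assumes GNU coreutils.
verify_sample_set() {
  (
    # Assumption: paths in filelist.sha1 are relative to the set root.
    cd "$1" || exit 1

    # filelist.sha1 is mandatory.
    if [ ! -f filelist.sha1 ]; then
      echo "error: no filelist.sha1 in $1" >&2
      exit 1
    fi

    # timestamp.txt is expected but treated as optional here; if present,
    # it must hold a plain non-negative integer (Unix time in seconds).
    if [ -f timestamp.txt ]; then
      ts=$(cat timestamp.txt)
      case $ts in
        '' | *[!0-9]*)
          echo "error: timestamp.txt is not a Unix time: $ts" >&2
          exit 1
          ;;
      esac
      echo "set timestamp: $(date -u -d "@$ts")"
    fi

    # --strict: fail on malformed lines; --quiet: report only mismatches.
    sha1sum -c --strict --quiet filelist.sha1
  )
}
```

For example, `verify_sample_set ~/raw-camera-samples/ && echo OK`. The function runs in a subshell so that the `cd` does not leak into the caller's shell.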