Reference Sample Archive
While there is some test coverage via unit tests, the bulk of testing is done via integration tests over a sample set.
What is considered a sample set
Here and onwards, a sample set is just a directory with samples and two special files. There should be a timestamp.txt file containing a Unix time (presumably of when the set was last updated). Most importantly, it must also contain a filelist.sha1 file in the top-level directory, which serves as a digest of the contents of the sample set. This file must be valid sha1sum output: one line per sample file, consisting of the hexadecimal SHA-1 digest, two spaces, and the file path.
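If you want to assemble your own sample set, both special files can be produced with standard tools. The following is a minimal sketch rather than the project's own tooling: make_sample_set_metadata is a hypothetical name, it assumes GNU coreutils and findutils, and it assumes the paths in filelist.sha1 are relative to the set's top-level directory (written with a leading ./, the way find prints them).

```shell
# Hypothetical helper: (re)generate timestamp.txt and filelist.sha1 in DIR.
# Usage: make_sample_set_metadata DIR
make_sample_set_metadata() {
  (
    cd "$1" || exit 1
    # Current Unix time in seconds.
    date +%s > timestamp.txt
    # Digest every regular file except the two special files, in a stable
    # byte-wise order. sha1sum prints "<digest>  ./<path>" per file.
    # -r (GNU xargs) avoids running sha1sum on stdin when there are no files.
    find . -type f ! -path ./filelist.sha1 ! -path ./timestamp.txt -print0 \
      | LC_ALL=C sort -z \
      | xargs -0 -r sha1sum > filelist.sha1
  )
}
```

Because the paths are relative, the resulting filelist.sha1 can be checked with sha1sum -c from inside that directory.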
Canonical Sample Set
The canonical raw sample data set is raw.pixls.us. It is freely licensed: all new samples are released into the public domain under the CC0 1.0 license (85+% of samples and counting), but some older samples remain under the more restrictive CC BY-NC-SA 4.0 license.
Please see raw.pixls.us for more info on how to contribute samples!
Full sample set
It is accessible at: https://raw.pixls.us/data/
But there is also a masterset: a handful of hand-picked samples that provide reasonable-ish coverage while taking up only ~1/22 of the disk footprint and ~1/44 of the sample count of the full set.
Unless you want to perform rigorous regression testing, the masterset is strongly recommended!
The masterset only contains samples that are in the public domain.
It is accessible at: https://raw.pixls.us/data-unique/
Acquiring Canonical Sample Set
Pick which sample set you want to acquire, and be wary of its disk footprint! Probably the easiest way to fetch it is via rsync, for example:
    $ rsync -vvrLtW --preallocate --delete --compress --compress-level=1 --progress \
        rsync://raw.pixls.us/data-unique/ ~/raw-camera-samples/raw.pixls.us-unique/
    $ # it might be a good idea to verify consistency afterwards
    $ # (sha1sum -c resolves the listed paths relative to the current directory):
    $ cd ~/raw-camera-samples/raw.pixls.us-unique/ && sha1sum -c --strict filelist.sha1
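The two requirements from the definition of a sample set (a Unix time in timestamp.txt, and a filelist.sha1 that verifies) can also be bundled into one reusable check. This is only a sketch: verify_sample_set is a hypothetical name, it assumes GNU sha1sum, it assumes the paths in filelist.sha1 are relative to the set's top-level directory, and it treats anything other than a plain non-negative integer in timestamp.txt as an error.

```shell
# Hypothetical helper: check that DIR looks like a valid sample set.
# Usage: verify_sample_set DIR
verify_sample_set() {
  (
    cd "$1" || exit 1
    # timestamp.txt should hold a Unix time, i.e. a non-negative integer.
    ts=$(cat timestamp.txt) || exit 1
    case $ts in
      ''|*[!0-9]*) echo "timestamp.txt: not a Unix time: $ts" >&2; exit 1 ;;
    esac
    # Assumed: listed paths are relative to DIR, so verify from inside it.
    # --quiet reports only failures; --strict rejects malformed lines.
    sha1sum -c --strict --quiet filelist.sha1
  )
}
```

For example, verify_sample_set ~/raw-camera-samples/raw.pixls.us-unique/ && echo OK prints OK only if both checks pass. Running the check in a subshell keeps the caller's working directory unchanged.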