Cold-storage and live-storage of ROM and ISO sets, or "compression vs deduplication"

This weekend I’ve been thinking about cold storage of my software archive, which includes some ROM and ISO sets. The idea was to find the best way to store them on recordable Blu-ray discs.

Two sets often contain almost the same data (for example, “Tomb Raider” for Saturn and PlayStation may share the same audio, movies and maps, differing only in the small executables), and so do entries within a single set (all of the “Super Mario World” clones differ only in a small part of the ROM). Both solid-mode compression and deduplication therefore look very promising.

So, to check how well those promises hold up, I took two systems, Nintendo Game Boy and Nintendo Game Boy Color, along with all the sets I have for them: Cowering (aka GoodTools), TOSEC, no-intro and NonGood.

For compression I used torrent7z, which uses 7zip (and therefore the LZMA algorithm) and always produces exactly the same compressed file when given the same files and filenames, and GoodMerge, a tool that takes all the clones from the Cowering set and compresses them together (merges them) with 7zip at maximum settings for the greatest space savings.

For deduplication, I first got theoretical estimates using an application I developed specifically for that (DedupStat, you can get it here, GPL, open source), and then created ZFS pools with both deduplication and compression enabled. ZFS is a highly complex, feature-rich filesystem available on Solaris, FreeBSD, Linux and Mac OS X, but not on Windows at all.
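To make the idea of a theoretical estimate concrete, here is a minimal sketch of fixed-size block deduplication counting. It is not DedupStat's actual code, which is not shown in this post; the function name `estimate_dedup` and the choice of SHA-256 are assumptions made for illustration. It reads every file in fixed-size blocks, hashes each block, and counts how many distinct blocks exist.

```python
import hashlib


def estimate_dedup(paths, block_size=4096):
    """Return (total_bytes, deduplicated_bytes) for fixed-size blocks.

    Hypothetical helper, not part of DedupStat.
    """
    seen = set()        # digests of the blocks already counted
    total = 0           # bytes read across all files
    unique_blocks = 0   # number of distinct blocks found
    for path in paths:
        with open(path, "rb") as handle:
            while True:
                block = handle.read(block_size)
                if not block:
                    break
                total += len(block)
                # A short final block still occupies a whole block,
                # so pad it with zeros before hashing.
                padded = block.ljust(block_size, b"\0")
                digest = hashlib.sha256(padded).digest()
                if digest not in seen:
                    seen.add(digest)
                    unique_blocks += 1
    # Space needed if every distinct block is stored exactly once.
    return total, unique_blocks * block_size
```

Calling `estimate_dedup(list_of_rom_paths, 2048)` would give an estimate for the CD/DVD/BD block size. Keeping one digest per distinct block in memory is what makes deduplication RAM usage grow with the number of blocks.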

The Cowering set contains 7,930 ROMs, TOSEC contains 135 ROMs, no-intro contains 2,980 ROMs and NonGood contains 138 ROMs.

They make 11,183 files for a total of 7,598,579,214 bytes (~7,246Mb).

The test computer is an Intel Core 2 E6400 @ 2.13GHz running Linux 3.9.0-server (Sabayon) with 8Gb DDR2-800 RAM, torrent7z 0.9.1beta (7zip 4.65) and ZFSonLinux 0.6.2-r3, on a Maxtor/Seagate STM3320820A drive.

First of all, let’s test compression.

Cowering (GoodMerge7z + torrent7z) + TOSEC (torrent7z) + no-intro (torrent7z): 5,394 files for a total of 1,407,450,696 bytes (~1,342Mb, 18.52%), took 4,813.013 seconds, approx. 0.28 Mb/sec

Cowering (GoodMerge7z + torrent7z) + TOSEC (torrent7z) + no-intro (torrent7z), deduplicated on 4096 bytes/block (typical filesystem): 5,394 files for a total of 946,782,208 bytes (~902Mb, 12.46%), took 4,972.690 seconds, approx. 0.27 Mb/sec

This of course gives a lot of space savings: compression alone shrinks the sets to about 18.5% of the original size. But it takes a lot of time, about an hour and twenty minutes (only compression speed was tested, but decompression is not blazing fast either). Deduplication on top of the compressed sets can still save more, because data is repeated between sets (compression is per-ROM) and within the same set (only Cowering merges clones and hacks).

The next test covered deduplication estimates only, using my tool.

Cowering + TOSEC + no-intro, deduplicated on 2048 bytes/block (CD/DVD/BD blocksize): 11,186 files for a total of 1,746,548,736 bytes (~1,665Mb, 22.99%), took 480.059 seconds, approx. 15.09 Mb/sec

Cowering + TOSEC + no-intro, deduplicated on 4096 bytes/block (typical filesystem blocksize): 11,186 files for a total of 1,871,421,440 bytes (~1,784Mb, 24.63%), took 513.864 seconds, approx. 14.10 Mb/sec

Cowering + TOSEC + no-intro, deduplicated on 131072 bytes/block (typical ZFS blocksize): 11,186 files for a total of 3,480,354,816 bytes (~3,319Mb, 45.80%), took 425.702 seconds, approx. 17.02 Mb/sec

The smallest blocksize finds the most duplicate blocks and gives the biggest savings, though still quite far from compression. In every case, however, it is about 10 times faster.
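The "about 10 times faster" figure can be checked from the elapsed times alone. The sketch below does that and also computes throughput against the original, uncompressed size for every run. That last part is an assumption about how best to compare them: the Mb/sec reported for the compression run (about 0.28) appears to be compressed size over time, while the deduplication figures match original size over time, so the two are not directly comparable as printed. The helper names are made up for this example.

```python
MIB = 1024 * 1024                  # the "Mb" figures in this post match 2**20 bytes
ORIGINAL_BYTES = 7_598_579_214     # uncompressed size of all sets

# (label, elapsed seconds) as reported above
RUNS = [
    ("GoodMerge + torrent7z", 4813.013),
    ("dedup estimate, 2048 bytes/block", 480.059208),
    ("dedup estimate, 4096 bytes/block", 513.864285),
    ("dedup estimate, 131072 bytes/block", 425.701966),
]


def input_throughput(seconds, input_bytes=ORIGINAL_BYTES):
    """Source data processed per second, in units of 2**20 bytes."""
    return input_bytes / MIB / seconds


def speedup(slow_seconds, fast_seconds):
    """How many times faster the second run finished than the first."""
    return slow_seconds / fast_seconds


for label, seconds in RUNS:
    print(f"{label}: {input_throughput(seconds):.2f} Mb/sec of source data")

compression_seconds = RUNS[0][1]
for label, seconds in RUNS[1:]:
    print(f"{label}: {speedup(compression_seconds, seconds):.1f}x faster than compression")
```

Measured this way, compression processes roughly 1.5 Mb of source data per second, so the gap to deduplication is the tenfold one visible in the elapsed times rather than the larger one the raw Mb/sec figures suggest.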

As Cowering contains lots of hacks, clones and bad dumps that may not be interesting to some people, I tested both compression and deduplication with the other sets alone.

TOSEC + no-intro, uncompressed: 3,256 files for a total of 2,685,937,612 bytes (~2,561Mb)

TOSEC (torrent7z) + no-intro (torrent7z): 3,256 files for a total of 795,382,502 bytes (~758Mb, 29.61%)

TOSEC + no-intro, deduplicated on 2048 bytes/block (CD/DVD/BD): 3,256 files for a total of 1,412,812,800 bytes (~1,347Mb, 52.60%), took 181.490 seconds, approx. 14.11 Mb/sec

TOSEC + no-intro, deduplicated on 4096 bytes/block (typical filesystem): 3,256 files for a total of 1,491,755,008 bytes (~1,422Mb, 55.54%), took 168.831 seconds, approx. 15.17 Mb/sec

TOSEC + no-intro, deduplicated on 131072 bytes/block (typical ZFS): 3,256 files for a total of 2,127,953,920 bytes (~2,029Mb, 79.23%), took 248.103 seconds, approx. 10.32 Mb/sec

Things look worse for deduplication here: even at the smallest block size, the result is almost twice as big as the compressed sets.

But all of this is highly theoretical, so it is better to test a real-life scenario. I created three ZFS pools, all of them with deduplication and compression enabled, one for each of the three block sizes (2048, as in CD/DVD/BD; 4096, typical of other filesystems and the real sector size of Advanced Format hard disks; and 131072, the default ZFS blocksize). Note: ZFS calls its block size “recordsize” and treats the configured value as an upper limit, using smaller blocks when the stored data calls for it.
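The exact commands used to create the pools are not recorded here, so the following is only a sketch of how three such pools could be set up. The pool names and device paths are hypothetical placeholders, and the helper only builds and prints the command lines without running anything. It relies on `zpool create -O property=value`, which sets a filesystem property on the pool's root dataset at creation time.

```python
def zpool_create_command(pool, device, recordsize):
    """Build a `zpool create` command with dedup and gzip-9 enabled.

    The command is returned as an argument list and is not executed.
    """
    return [
        "zpool", "create",
        "-O", "dedup=on",
        "-O", "compression=gzip-9",
        "-O", f"recordsize={recordsize}",
        pool, device,
    ]


# Hypothetical pool names and devices, one pool per tested recordsize.
POOLS = [
    ("roms2k", "/dev/sdX1", 2048),
    ("roms4k", "/dev/sdX2", 4096),
    ("roms128k", "/dev/sdX3", 131072),
]

for pool, device, recordsize in POOLS:
    print(" ".join(zpool_create_command(pool, device, recordsize)))
```

The same properties could also be applied to an existing dataset with `zfs set`, but they only affect data written after the change, so setting them at creation time keeps the test clean.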

Cowering + TOSEC + no-intro, ZFS device, dedup=on, compression=gzip-9, recordsize=2048: 11,188 files for a total of 1,904,936,848 bytes (~1,817Mb, 25.07%), took 10,986.604 seconds, approx. 0.66 Mb/sec

Cowering + TOSEC + no-intro, ZFS device, dedup=on, compression=gzip-9, recordsize=4096: 11,188 files for a total of 1,515,458,176 bytes (~1,445Mb, 19.94%), took 8,865.023 seconds, approx. 0.82 Mb/sec

Cowering + TOSEC + no-intro, ZFS device, dedup=on, compression=gzip-9, recordsize=131072: 11,188 files for a total of 1,493,678,664 bytes (~1,424Mb, 19.66%), took 1,185.156 seconds, approx. 6.11 Mb/sec

Although the compression is called gzip, it should really be called DEFLATE, as that is the name of the algorithm, which is also used in ZIP files. In theory it is faster and less powerful than LZMA (the 7zip algorithm). “-9” selects maximum compression.

Clearly ZFS does not behave very well when the recordsize is made smaller than the default, getting worse deduplication, worse compression, and much lower speeds (roughly 7 to 9 times slower).

Before reaching a conclusion, RAM usage must be noted. 7-zip RAM usage depends on the dictionary size (and so on compression power), and the memory is only needed while an archive is being compressed; decompression needs considerably less. Deduplication RAM usage, on the contrary, depends on the number of blocks tracked (each group of duplicates counts as only one block), and it is more or less permanent for as long as the deduplicated volume is attached.

In the case of ZFS, each block takes 320 bytes of RAM. For this set, that means:

3,710,244 blocks of 2048 bytes each, taking about 1.1Gb of RAM.

1,855,122 blocks of 4096 bytes each, taking 566Mb of RAM.

57,972 blocks of 131072 bytes each, taking 18Mb of RAM.

Fortunately you can use an SSD to store the deduplication table, so instead of 1.1Gb/566Mb/18Mb of RAM it would take 1.1Gb/566Mb/18Mb of SSD.
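The arithmetic behind those figures is just block count times 320 bytes, and can be sketched as follows. The function name is made up for this example. Note that the block counts quoted above match the total size of the sets divided by the block size, so they read as an upper bound that assumes no duplicate blocks at all.

```python
DDT_ENTRY_BYTES = 320   # per-block cost quoted above
MIB = 1024 * 1024


def ddt_bytes(block_count, entry_bytes=DDT_ENTRY_BYTES):
    """Memory needed to track `block_count` blocks in the dedup table."""
    return block_count * entry_bytes


# (block size in bytes, number of blocks) for this set
CASES = [
    (2048, 3_710_244),
    (4096, 1_855_122),
    (131072, 57_972),
]

for block_size, blocks in CASES:
    needed = ddt_bytes(blocks)
    print(f"{block_size} bytes/block: {needed:,} bytes (~{needed / MIB:.0f}Mb)")
```

Halving the block size doubles the number of blocks and therefore the table size, which is why the 2048-byte pool needs about 64 times more memory than the default 131072-byte one.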

More than a conclusion, a list of advantages and disadvantages should be noted:

Advantages of 7zip

  • Biggest space savings
  • RAM usage is not permanent, only when working with the archives

Disadvantages of 7zip

  • Slow as floppy drives
  • Merged archives need to be traversed and fully decompressed before one of the ROMs they contain can be accessed
  • Not supported by any emulator as far as I know, and not completely supported by any ROM manager (Romcenter does not support them, Romulus says it does but it does not work, and CLRMAME Pro does only if they are not solid-compressed, which defeats the benefits of merging)

Advantages of deduplication

  • Faster
  • Depending on deduplication software used (if it provides a live volume) it can allow ROMs to be accessed by emulators and managers

Disadvantages of deduplication

  • RAM usage MAY be permanent, depending on deduplication software

Advantages of ZFS with deduplication and compression enabled

  • Best of both worlds at the default recordsize: space savings almost as high as 7zip, at speeds much closer to deduplication alone
  • Supported in practically every operating system (but Windows)
  • Allows live access, so supported by emulators and ROM managers
  • RAM usage can be moved to an SSD, which is several times cheaper and bigger.

Disadvantages of ZFS with deduplication and compression enabled

  • Contrary to what deduplication alone can give, smaller blocks give less space savings and speed, and higher RAM and CPU usage.
  • Not supported at all by Windows
  • RAM usage is permanently 320 bytes per block

Comparing advantages and disadvantages, ZFS seems the best solution for archival right now. While other filesystems (btrfs, for example) may get the same features in the future, currently they either don’t support them or are experimental or vendor-locked (btrfs is Linux-only).
