Neal, thanks for the feedback. After taking your comments into consideration, here's version 2.

+-+-+-+-+-+------------------+-+-+-+-+-+-+-+-+
|   ID    | Compression type |  Index size   |
+-+-+-+-+-+------------------+-+-+-+-+-+-+-+-+

+==================+=================+
| Compressed Index | Compressed Dict |
+==================+=================+

+===========+===========+
|   Chunk   |   Chunk   | ==> More chunks
+===========+===========+

ID
    '\0ZCK1', identifies the file as a zchunk version 1 file.

Compression type
    Type of compression used to compress the dict and chunks.

    Current values:
        0 - Uncompressed
        2 - zstd

Index size
    A 64-bit unsigned integer containing the size of the compressed index.

Compressed Index
    The index, which is described in the next section. The index is compressed without a custom dictionary.

Compressed Dict (optional)
    A custom dictionary used when compressing each chunk. Because each chunk is compressed completely separately from the others, the custom dictionary gives us much better overall compression. The dictionary itself is compressed without a custom dictionary (for obvious reasons).

Chunk
    A chunk of data, compressed with the custom dictionary provided above.

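
To make the fixed-size part of the header concrete, here is a minimal Python sketch of reading it. It is an illustration only, not a reference implementation. Two things are assumptions that the description above does not state: that the compression type occupies a single byte, and that the index size is stored little-endian. The name parse_header is hypothetical.

```python
import struct

MAGIC = b"\0ZCK1"  # 5-byte ID from the header diagram
COMPRESSION = {0: "uncompressed", 2: "zstd"}

# ASSUMPTION: 5-byte ID, 1-byte compression type, 8-byte little-endian
# index size. Field width and byte order are not given in the proposal.
HEADER = struct.Struct("<5sBQ")


def parse_header(buf):
    """Return (compression name, index size, header length in bytes)."""
    if len(buf) < HEADER.size:
        raise ValueError("truncated header")
    magic, comp_type, index_size = HEADER.unpack_from(buf)
    if magic != MAGIC:
        raise ValueError("not a zchunk version 1 file")
    if comp_type not in COMPRESSION:
        raise ValueError(f"unknown compression type {comp_type}")
    return COMPRESSION[comp_type], index_size, HEADER.size
```

Under these assumptions the header is 14 bytes long, so the compressed index would occupy the index_size bytes that follow it.
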
The index:

+---------------+======================+
| Checksum type | Checksum of all data |
+---------------+======================+

+================+-+-+-+-+-+-+-+-+
| Dict checksum  |  End of dict  |
+================+-+-+-+-+-+-+-+-+

+================+-+-+-+-+-+-+-+-+
| Chunk checksum | End of chunk  | ==> More
+================+-+-+-+-+-+-+-+-+

Checksum type
    The type of checksum used to generate the checksums in the index.

    Current values:
        0 - SHA-256

Checksum of all data
    The checksum of the compressed dict and all the compressed chunks, used to verify that the file is actually the same, even in the unlikely event of a hash collision for one of the chunks.

Dict checksum
    The checksum of the compressed dict, used to detect whether two dicts are identical. If there is no dict, the checksum must be all zeros.

End of dict
    The location of the end of the dict, measured from the end of the index. This gives us the information we need to find and decompress the dict. If there is no dict, this value must be zero.

Chunk checksum
    The checksum of the compressed chunk, used to detect whether any two chunks are identical.

End of chunk
    The location of the end of the chunk, measured from the end of the index. This gives us the information we need to find and decompress each chunk.

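
As a rough illustration of how the index could be walked once it has been decompressed, here is a Python sketch that turns the "end of" offsets into (start, length) pairs relative to the end of the index. It is a sketch under assumptions, not the definitive layout: the checksum type is taken to be one byte, the offsets are taken to be 64-bit little-endian, and the name parse_index is hypothetical.

```python
import struct

# Digest length in bytes for each checksum type (0 = SHA-256).
DIGEST_SIZE = {0: 32}


def parse_index(index):
    """Parse a decompressed index.

    Returns (checksum of all data, dict entry, list of chunk entries),
    where each entry is (checksum, start, length) and start is measured
    from the end of the index.
    """
    if not index or index[0] not in DIGEST_SIZE:
        raise ValueError("missing or unknown checksum type")
    size = DIGEST_SIZE[index[0]]  # ASSUMPTION: checksum type is one byte
    pos = 1
    total = index[pos:pos + size]
    pos += size

    # ASSUMPTION: each entry is a digest plus a little-endian 64-bit offset.
    entry = struct.Struct(f"<{size}sQ")
    body = index[pos:]
    if len(total) != size or not body or len(body) % entry.size:
        raise ValueError("malformed index")

    entries = []
    start = 0
    for digest, end in entry.iter_unpack(body):
        if end < start:
            raise ValueError("offsets must not decrease")
        entries.append((digest, start, end - start))
        start = end  # the next item begins where this one ends

    # The first entry is always the dict (length 0 when there is none).
    return total, entries[0], entries[1:]
```

Because only end offsets are stored, the start of each item is simply the end of the previous one, with the dict starting at offset 0.
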
The index is designed so that it can be extracted from the file on the server and downloaded separately, which makes it possible to download only the parts of the file that are needed. It must then be re-embedded when the file is assembled, so that the user only needs to keep one file.
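
To sketch how a separately downloaded index might drive a partial download, the Python below compares the remote chunk checksums against the checksums already held locally and returns the byte ranges still needed. Everything here is illustrative: the function name ranges_to_fetch, the (checksum, start, length) tuple shape, and the use of inclusive byte ranges (the form an HTTP Range request would take) are assumptions, since the transport is not specified above.

```python
def ranges_to_fetch(remote_chunks, local_digests, data_start):
    """Return inclusive (first, last) byte ranges of chunks to download.

    remote_chunks: list of (checksum, start, length), with start measured
                   from the end of the index in the remote file.
    local_digests: set of chunk checksums already available locally.
    data_start:    absolute offset of the end of the index in the remote
                   file (header length plus compressed index size).
    """
    ranges = []
    for digest, start, length in remote_chunks:
        if length == 0 or digest in local_digests:
            continue  # nothing to fetch, or we already have this chunk
        first = data_start + start
        last = first + length - 1
        if ranges and ranges[-1][1] + 1 == first:
            # Adjacent to the previous range: merge into one request.
            ranges[-1] = (ranges[-1][0], last)
        else:
            ranges.append((first, last))
    return ranges
```

For example, with four remote chunks of which only the second is held locally, the first chunk becomes one range and the third and fourth are merged into a second range.
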