Encryption and Compression

General Discussion

Blubox uses a set of user-configurable compression filters to encode data stored within Blubox Archives. When data is imported into an Archive, Blubox detects the type of data being imported and compresses it using the best-suited compression technique available. The compressed data is then encrypted and written to the storage file, where it remains compressed and encrypted until it is exported again. Data is always compressed before it is encrypted because encrypting first would counteract the compression: compression works by identifying patterns within data, while the purpose of encryption is to remove patterns from data, so encrypted data leaves a compressor almost nothing to work with.
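The effect of the ordering can be sketched in a few lines of Python. This is an illustration only: zlib stands in for the compression filter, and toy_keystream_xor is a hypothetical, insecure stand-in for the cipher (it is not the algorithm Blubox uses). All function names here are assumptions made for the example.

```python
import hashlib
import zlib


def toy_keystream_xor(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream. Illustration only, NOT secure.

    Applying the function twice with the same key restores the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def compress_then_encrypt(data: bytes, key: bytes) -> bytes:
    # The order described above: patterns are removed by the compressor first.
    return toy_keystream_xor(zlib.compress(data), key)


def encrypt_then_compress(data: bytes, key: bytes) -> bytes:
    # The wrong order: the compressor sees pattern-less data.
    return zlib.compress(toy_keystream_xor(data, key))


def decrypt_then_decompress(stored: bytes, key: bytes) -> bytes:
    # Export reverses the steps: decrypt first, then decompress.
    return zlib.decompress(toy_keystream_xor(stored, key))


sample = b"The quick brown fox jumps over the lazy dog. " * 200
key = b"example key"
good = compress_then_encrypt(sample, key)
bad = encrypt_then_compress(sample, key)
print(len(sample), len(good), len(bad))
```

With the repetitive sample text, the compress-then-encrypt result should be a small fraction of the input size, while the encrypt-then-compress result stays roughly as large as the input.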

When importing a file, Blubox determines whether the file is an image file. If it is, Blubox rasterizes the image and then applies Image Compression according to the settings supplied by the user. All other types of files are compressed using binary compression.
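How Blubox recognizes an image file is not described here. One common approach, shown below purely as an assumption, is to compare the first bytes of the file against well-known format signatures. The names IMAGE_SIGNATURES, detect_image_type, and choose_compression are hypothetical.

```python
from typing import Optional

# Leading "magic" bytes of a few common image formats.
IMAGE_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
}


def detect_image_type(header: bytes) -> Optional[str]:
    """Return a format name if the header matches a known signature, else None."""
    for magic, name in IMAGE_SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def choose_compression(header: bytes) -> str:
    """Pick the compression path: 'image' for image files, 'binary' otherwise."""
    return "image" if detect_image_type(header) is not None else "binary"


print(choose_compression(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
print(choose_compression(b"Plain text document"))
```

A two-byte signature such as the one for BMP can match unrelated files by accident, so a real detector would likely validate more of the header before committing to the image path.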

Binary Compression

By default, Blubox compresses file data using a binary compression algorithm provided by Xceed Software, Inc. (http://xceed.com). This type of compression attempts to identify recurring patterns within the data and replace them with unique, shorter patterns. The compressed data contains a dictionary of the original patterns and their corresponding replacement patterns. Decompression is then simply a matter of reversing the replacements recorded in the dictionary.
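The dictionary idea can be illustrated with byte-pair encoding, a deliberately simple scheme chosen for this sketch. It is not the Xceed algorithm, whose internals are not described here. The functions bpe_compress and bpe_decompress are hypothetical names.

```python
from collections import Counter


def bpe_compress(data: bytes, max_rules: int = 64):
    """Repeatedly replace the most frequent byte pair with an unused byte value.

    Returns the packed bytes and the dictionary (list of rules) needed to undo it.
    """
    symbols = list(data)
    present = set(data)
    unused = [b for b in range(256) if b not in present]
    rules = []  # (replacement byte, (left, right)) in the order applied
    while unused and len(rules) < max_rules and len(symbols) > 1:
        pair, count = Counter(zip(symbols, symbols[1:])).most_common(1)[0]
        if count < 2:
            break  # no recurring pattern left to replace
        token = unused.pop()
        rules.append((token, pair))
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                merged.append(token)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return bytes(symbols), rules


def bpe_decompress(packed: bytes, rules) -> bytes:
    """Undo the replacements in reverse order using the stored dictionary."""
    symbols = list(packed)
    for token, (left, right) in reversed(rules):
        expanded = []
        for s in symbols:
            if s == token:
                expanded.extend((left, right))
            else:
                expanded.append(s)
        symbols = expanded
    return bytes(symbols)


text = b"the cat and the hat and the bat " * 20
packed, rules = bpe_compress(text)
print(len(text), len(packed), len(rules))
```

Note that the dictionary itself must be stored alongside the packed bytes, so the scheme only pays off when the patterns recur often enough to outweigh that overhead.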

Since compressed data is made up of short, unique patterns, compressing it a second time usually yields a negligible gain and can even increase its size. The main benefit of binary compression is that it can be applied to any type of data; however, some types of data compress better than others. Text and document files compress well because they typically contain highly redundant sequences of values (e.g. words and sentences). A shortcoming of binary compression is that multimedia data (images, videos, music) usually contains a wide range of seemingly random values, which negates the effects of this type of compression. A truly random set of values would not compress at all, as there would be no patterns for the compression mechanism to replace.
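These differences are easy to observe with a general-purpose lossless compressor. The sketch below uses Python's zlib module as a stand-in for the binary compression filter (an assumption; it is not the Xceed library), with word-based text and seeded pseudo-random bytes as hypothetical test data.

```python
import random
import zlib

rng = random.Random(0)
words = [b"archive", b"data", b"file", b"import", b"export", b"compress",
         b"pattern", b"storage", b"image", b"key", b"the", b"and", b"of"]

# Text-like data: a long sequence of words drawn from a small vocabulary.
text = b" ".join(rng.choice(words) for _ in range(6000))

# Random data of the same length: no patterns to exploit.
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, 9)) / len(data)


once = zlib.compress(text, 9)
twice = zlib.compress(once, 9)  # re-compressing already compressed data

print(f"text ratio:  {ratio(text):.2f}")
print(f"noise ratio: {ratio(noise):.2f}")
print(f"compressed once: {len(once)} bytes, twice: {len(twice)} bytes")
```

The text should shrink substantially, the random bytes should stay at about their original size, and the second compression pass should gain little or nothing over the first.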

Image Compression

Different types of compression have been developed to exploit the unique properties of multimedia data. Two primary methods of compression for image data are DCT (Discrete Cosine Transform) and Wavelet, both of which can achieve more effective compression than is possible with binary compression. Essentially, image compression takes advantage of the fact that an image can be altered to a certain extent and still be recognizable to a viewer as the same image. The altered/compressed image therefore conveys the same information or meaning as the original image.

Blubox combines Wavelet and DCT-based image compression, which operates by reducing a closely grouped set of pixels of a similar color to a set of pixels of exactly the same color. Repeating this throughout an image forms long patterns of same-colored pixel blocks, thereby making the image more suitable for compression. Image compression mechanisms are often configurable to control the point at which the algorithm decides that a group of pixels is similar enough to lump together into a single block. A very relaxed setting allows the compression to form larger, more compressible blocks, while stricter settings tend to produce smaller blocks. One can actually see this effect in certain highly compressed images: the decompressed image will display “artifacts” appearing as blocky edges and seemingly random splotches of color.
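The effect of lumping similar pixels together can be sketched without a full DCT or Wavelet transform. The example below is a strong simplification and an assumption of this sketch: it snaps 8-bit grayscale values directly to coarse buckets, where real codecs quantize transform coefficients instead. The step parameter plays the role of the relaxed-versus-strict setting, and zlib measures how compressible the result becomes.

```python
import random
import zlib


def quantize(pixels: bytes, step: int) -> bytes:
    """Snap each 8-bit value down to the start of its step-wide bucket (lossy)."""
    return bytes((p // step) * step for p in pixels)


rng = random.Random(1)
width, height = 256, 64

# Hypothetical grayscale test image: a horizontal gradient with slight noise,
# so that neighboring pixels are similar but rarely identical.
pixels = bytes(
    min(255, max(0, x + rng.randint(-3, 3)))
    for _ in range(height)
    for x in range(width)
)

original_size = len(zlib.compress(pixels, 9))
strict = quantize(pixels, 8)     # stricter setting: small buckets
relaxed = quantize(pixels, 32)   # relaxed setting: large buckets
strict_size = len(zlib.compress(strict, 9))
relaxed_size = len(zlib.compress(relaxed, 9))

print(original_size, strict_size, relaxed_size)
```

The larger the bucket, the smaller the compressed result, but the quantized pixels no longer match the originals, which is where visible banding and blockiness come from.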

This type of image compression requires that a certain amount of information be discarded from the image. It is called lossy compression because information is lost from the image during the compression process. In contrast, binary compression is inherently lossless, as the decompressed data is always exactly the same as the original data.

Binary Encryption

Encryption is almost the opposite of compression. Compression relies upon repeated patterns within the data, while encryption tries to remove all patterns from the data. The whole point of encryption is to produce data which conveys no immediate information and defies interpretation. The success of an encryption algorithm depends on its ability to distort data into an unrecognizable and pattern-less series of values.

An encryption algorithm typically operates by applying a mathematical transformation to the data it is encrypting. A unique password or encryption key is supplied to the algorithm and incorporated into the transformation. Given the same password or key, the process can be reversed through decryption to recover the original data. Because these transformations are deterministic and therefore open to mathematical analysis, an encryption algorithm usually undergoes years of professional and academic scrutiny before it is accepted as a viable encryption mechanism.

The size of the encryption key used by an algorithm is often used to gauge the strength or protectiveness of that algorithm. The reasoning behind this is that there are more combinations of key values in longer keys than in shorter keys, thus making any single encryption key more difficult to guess using the brute force method (trying every possible key until the correct one is found). It is important to remember that the size of an encryption key is irrelevant if the underlying algorithm itself is faulty.
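The arithmetic behind this reasoning is straightforward: an n-bit key has 2^n possible values, so each additional bit doubles the work of an exhaustive search. The sketch below computes the worst-case search time for a few key sizes. The rate of one trillion guesses per second is a hypothetical figure chosen only for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def keyspace(bits: int) -> int:
    """Number of distinct keys of the given length in bits."""
    return 2 ** bits


def years_to_exhaust(bits: int, guesses_per_second: float) -> float:
    """Worst-case time, in years, to try every key at the given rate."""
    return keyspace(bits) / guesses_per_second / SECONDS_PER_YEAR


RATE = 1e12  # hypothetical attacker speed, guesses per second

for bits in (40, 56, 128, 256):
    print(f"{bits:>3}-bit key: {years_to_exhaust(bits, RATE):.3g} years")
```

These figures only bound a brute force search. They say nothing about attacks that exploit a flaw in the algorithm itself, which is why key size alone is not a measure of security.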

Blubox uses the TEA algorithm with a 256-bit (32-character) key.
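For readers unfamiliar with TEA (Tiny Encryption Algorithm), the sketch below shows the block transformation in its published reference form, which encrypts one 64-bit block with a 128-bit key over 32 cycles. How Blubox applies a 256-bit key is not described here, so this sketch does not attempt to reproduce it. The big-endian byte order and the function names are assumptions of this example, and no padding or chaining across blocks is shown.

```python
import struct

MASK = 0xFFFFFFFF    # keep arithmetic within 32 bits
DELTA = 0x9E3779B9   # TEA's round constant
ROUNDS = 32          # number of cycles in reference TEA


def tea_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Encrypt one 8-byte block with a 16-byte key (reference TEA)."""
    v0, v1 = struct.unpack(">2I", block)
    k0, k1, k2, k3 = struct.unpack(">4I", key)
    total = 0
    for _ in range(ROUNDS):
        total = (total + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
    return struct.pack(">2I", v0, v1)


def tea_decrypt_block(block: bytes, key: bytes) -> bytes:
    """Reverse tea_encrypt_block by running the rounds backwards."""
    v0, v1 = struct.unpack(">2I", block)
    k0, k1, k2, k3 = struct.unpack(">4I", key)
    total = (DELTA * ROUNDS) & MASK
    for _ in range(ROUNDS):
        v1 = (v1 - (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
        v0 = (v0 - (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        total = (total - DELTA) & MASK
    return struct.pack(">2I", v0, v1)


key = bytes(range(16))     # example 128-bit key
plain = b"8 bytes!"        # exactly one 64-bit block
cipher = tea_encrypt_block(plain, key)
print(cipher.hex(), tea_decrypt_block(cipher, key))
```

Decrypting with the same key restores the original block, while a different key yields unrelated bytes, which matches the key-dependent, reversible transformation described earlier in this section.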