The Basic Principles of Data Compression

Data compression reduces file size by encoding data more efficiently. One type of data compression is referred to as lossless compression.

Lossless means the compressed file is restored exactly to its original state, with no data lost during decompression. This matters because many files, such as programs and documents, would be corrupted and unusable if any data were lost. Lossless compression algorithms use statistical modeling techniques to reduce repetitive information in a file. Methods may include removing redundant spacing characters, representing a run of repeated characters with a single character and a count, or replacing frequently recurring characters with shorter bit sequences.

The other category is referred to as “lossy” compression, which is often used for multimedia files such as music and images (e.g. JPEG files). In this class of data encoding methods some data is discarded, and inexact approximations are used to represent the content. The aim is to reduce data size for storing, handling and transmitting content.
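To make the two categories concrete, here is a minimal Python sketch. It is an illustration only, not how any particular tool works: `rle_encode` and `rle_decode` are hypothetical helper names showing run-length encoding (a run of repeated characters stored as one character plus a count), and `quantize` is a hypothetical stand-in for the kind of approximation a lossy method makes.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Lossless: collapse each run of repeated characters into (char, count)."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Rebuild the exact original string from the (char, count) pairs."""
    return "".join(char * count for char, count in pairs)


def quantize(samples: list[int], step: int) -> list[int]:
    """Lossy: round each sample down to a multiple of `step`, discarding detail."""
    return [(sample // step) * step for sample in samples]


original = "AAAABBBCCD"
encoded = rle_encode(original)
print(encoded)                          # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print(rle_decode(encoded) == original)  # True: nothing was lost

samples = [101, 103, 98, 150, 152]
print(quantize(samples, 10))            # [100, 100, 90, 150, 150]
```

The run-length round trip returns the original string exactly, while the quantized samples cannot be turned back into the original values. That difference is the trade-off between the two categories.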

The advantages of compressing files can be significant, as the number of bits used to store the information can be greatly reduced. Compressed files therefore take up much less storage space. File compression tools can also bundle several small files into a single archive for more convenient email transmission, and smaller files take less time to transfer over the internet.
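As a rough illustration of bundling several files into one smaller archive, the sketch below uses Python's standard `zipfile` module entirely in memory. The file names and contents are made up for the example, and `bundle` and `unbundle` are hypothetical helper names; real savings depend heavily on how repetitive the data is.

```python
import io
import zipfile


def bundle(files: dict[str, bytes]) -> bytes:
    """Pack several named byte strings into one DEFLATE-compressed zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def unbundle(archive_bytes: bytes) -> dict[str, bytes]:
    """Read every member back out of the archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


# Made-up sample files with deliberately repetitive content.
files = {
    "report.txt": b"quarterly figures unchanged\n" * 500,
    "log.txt": b"status=OK\n" * 1000,
}

archive_bytes = bundle(files)
total_input = sum(len(content) for content in files.values())
print(total_input, "bytes of input ->", len(archive_bytes), "bytes in one archive")
print(unbundle(archive_bytes) == files)  # the files come back unchanged
```

Because the sample content repeats so heavily, the archive here is far smaller than the combined input. Already-compressed data such as JPEG images would shrink very little.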

Compression also has disadvantages. When a large number of files are involved, it can be a computationally intensive and time-consuming process. With so many compression algorithm variants, a user downloading a compressed file may not have the program needed to decompress it. Some compression algorithms offer varying levels of compression, with the higher levels achieving a smaller file size but taking longer to compress. Compression is also system intensive, taking up valuable resources, and can sometimes result in “out of memory” errors.
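The level and memory trade-offs can be explored with Python's standard `zlib` module. This is a sketch under assumptions: the sample data is invented, `compare_levels` and `compress_in_chunks` are hypothetical helper names, and timings will vary from machine to machine. Level 0 stores the data without compressing it, and 9 is the highest level `zlib` offers.

```python
import time
import zlib


def compare_levels(data: bytes, levels=(0, 1, 6, 9)) -> dict[int, int]:
    """Compress `data` at each level and report the size and time taken."""
    sizes = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed_ms = (time.perf_counter() - start) * 1000
        # Every level is lossless, so decompression must restore the input.
        assert zlib.decompress(compressed) == data
        sizes[level] = len(compressed)
        print(f"level {level}: {len(compressed)} bytes in {elapsed_ms:.2f} ms")
    return sizes


def compress_in_chunks(chunks, level: int = 6):
    """Yield compressed pieces so the whole input never sits in memory at once."""
    compressor = zlib.compressobj(level)
    for chunk in chunks:
        piece = compressor.compress(chunk)
        if piece:
            yield piece
    yield compressor.flush()


# Invented sample data: many similar text records.
data = b"".join(
    f"record {i % 50}: value={i * 7 % 13}\n".encode() for i in range(20000)
)
sizes = compare_levels(data)

# Feeding the same data in 4 KB chunks gives output that decompresses identically.
chunks = [data[i:i + 4096] for i in range(0, len(data), 4096)]
streamed = b"".join(compress_in_chunks(chunks))
print(zlib.decompress(streamed) == data)
```

Processing the input in chunks is one common way to limit memory use, since only one chunk and the compressor's internal state need to be held at a time.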

Data compression is extremely important in the computing world and is used by many applications. By providing a brief overview of how compression works in general, this blog aims to help users of data compression weigh the advantages and disadvantages when working with it.

Author: Dan Myers