This section describes the evaluation protocol and the experimental results of the compression of simulated data streams for PLANCK-LFI.
The first tests were performed on an HP-UX workstation with four compressors (Maris et al. 1998), but given the limited number of off-the-shelf compression codes for that platform, we migrated the compression pipeline to a Pentium III based Windows/NT workstation.
As described in Sect. 2, the signal is composed of many components, both astrophysical and instrumental in origin. In particular, it is important to understand how each component or instrumental parameter, introducing deviations from pure white noise statistics, affects the final compression rate.
To scan systematically all the relevant combinations of signal compositions and off-the-shelf compressors, a Compression Pipeline was created. The pipeline is based on five main components: the signal quantization pipeline, the signal database, the compression pipeline, the compression database, and the post-processing pipeline. The signal quantization pipeline performs the operations described in the upper part of Fig. 1. The simulated astrophysical signals are held in a dedicated section of the signal archive; they are processed by the quantization pipeline and stored back in a reserved section of the signal archive. In this way quantized data streams are generated for each relevant combination of quantization parameters, signal composition and sky pointing.
Each compressor is then applied by the compression pipeline to the full set of quantized signals in the signal archive. The results, expressed as compression efficiency as a function of the quantization parameters, are stored in the compression database. The statistical analyses of Sect. 5 are performed with a similar pipeline.
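As a purely illustrative sketch of this loop (not the actual PERL/IDL pipeline), the following Python fragment runs a small set of stand-in compressor commands over the quantized streams of a hypothetical signal archive and records the resulting compression rates; all paths, commands and the results layout are assumptions.

```python
# Minimal sketch of the compression-pipeline loop: each off-the-shelf
# compressor is run on every quantized data stream and the compression
# rate Cr = (original size) / (compressed size) is stored.
# Paths, compressor commands and the database layout are illustrative only.
import os
import subprocess
import csv

COMPRESSORS = {            # hypothetical command templates
    "gzip9": ["gzip", "-9", "-k", "-f"],
    "bzip":  ["bzip2", "-k", "-f"],
}
SIGNAL_ARCHIVE = "signal_archive/quantized"   # hypothetical directory
RESULTS_DB = "compression_db/results.csv"     # hypothetical results table

with open(RESULTS_DB, "w", newline="") as db:
    writer = csv.writer(db)
    writer.writerow(["stream", "compressor", "Cr"])
    for stream in sorted(os.listdir(SIGNAL_ARCHIVE)):
        path = os.path.join(SIGNAL_ARCHIVE, stream)
        original_size = os.path.getsize(path)
        for macro, cmd in COMPRESSORS.items():
            subprocess.run(cmd + [path], check=True)
            compressed = path + (".gz" if macro.startswith("gzip") else ".bz2")
            cr = original_size / os.path.getsize(compressed)
            writer.writerow([stream, macro, f"{cr:.2f}"])
            os.remove(compressed)             # keep the archive clean
```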
Finally, the post-processing pipeline scans the compression database in order to produce plots, statistics, tables and synthetic fits. Its results are usually stored in one of the two databases.
The pipeline is managed by PERL 5.004 script files which drive FORTRAN, C and IDL programs or off-the-shelf utilities, gluing and coordinating their activities. The large amount of repetitive code required for each simulation run is generated by a specifically designed Automated Program Generator (APG) written in IDL (Maris et al. 1998).
The APG takes as input a table which specifies: the set of compressors to be tested, the set of quantization parameters to be used, the order in which to scan each parameter/compressor, the list of supporting programs to be used, and other servicing parameters. The program linearizes the resulting parameter space and generates the PERL simulation code or, alternatively, performs other operations, such as scanning the results database to produce statistics, plots, tables, and so on. The advantage of this method is that a large amount of repetitive code may be quickly produced, maintained or replaced with minor effort each time a new object (compressor, parameter or analysis method) is added to the system.
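The fragment below is a minimal sketch of the code-generation idea under an assumed toy parameter table; it is not the actual IDL APG, and both the emitted PERL driver and the run_one_case command are hypothetical.

```python
# Illustrative sketch of what an automated program generator of this kind
# does: the parameter table is linearized into one run per combination and
# repetitive driver code is emitted. All names and the output format are
# hypothetical; the actual APG is written in IDL and emits PERL.
from itertools import product

param_table = {                        # hypothetical input table
    "compressor": ["arith-n1", "gzip9", "uses064rr"],
    "quant_step": [0.25, 0.5, 1.0],    # quantization steps (arbitrary units)
    "signal":     ["white_noise", "wn_dipole", "full_signal"],
}

names = list(param_table)
with open("run_all.pl", "w") as script:
    script.write("#!/usr/bin/perl\n")
    # one driver line per point of the linearized parameter space
    for values in product(*(param_table[n] for n in names)):
        args = " ".join(f"--{n}={v}" for n, v in zip(names, values))
        script.write(f"system('run_one_case {args}');\n")
```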
The purpose of these compression tests is to set an upper limit to the lossless compression efficiency for LFI data and to look for an optimal compressor to be proposed to the LFI consortium.
A decision about the final compression scheme for PLANCK-LFI has not been taken yet, and only future studies will be able to decide whether the best performing compressor will be compatible with on-board operations (constrained by packet independence and DPU capabilities) and will be accepted by the PLANCK-LFI collaboration.
For this reason only off-the-shelf compressors and hardware have been considered so far. To test any reasonable compression scheme, a wide selection of lossless compression algorithms, covering all the known methods, was applied to our simulated data. Lacking comprehensive criteria, such as memory and CPU constraints, to fix a final compressor, we report in a compact form the results for all the tested compressors. We are confident that the experience which will be gained inside the CMB community developing ground based experiments (e.g. Lasenby et al. 1998), long duration balloon experiments (e.g. De Bernardis & Masi 1998) such as BOOMERANG (De Bernardis et al. 2000; Lange et al. 2000) and MAXIMA (Hanany et al. 2000; Balbi et al. 2000), and specialized space missions such as the Microwave Anisotropy Probe (MAP) (Bennet et al. 1996), together with the experience which will be gained inside the PLANCK collaboration developing on-board electronics prototypes, will provide a more solid base to test and improve the final compression algorithms applied to real data.
Tables 4 and 5 list the selected compression programs.
Table 4:
Macro | Code | Parameters | Note
ahuff-c | ahuff-c | | Adaptive Huffman (Nelson & Gailly 1996)
AR | ar | |
arc | arc | |
arha | arhangel | | http://www.geocities.com/SiliconValley/Lab/6606
arhaASC | " | -1 | ASC method
arhaHSC | " | -2 | HSC method
arith-c | arith-c | | Arithmetic coding (Nelson & Gailly 1996)
arith-n | arith-n | | Adaptive Arithmetic Coding (AC) (Nelson & Gailly 1996)
arith-n0 | " | | Zeroth order Arithmetic coding
arith-n1 | " | | First order AC
arith-n2 | " | | Second order AC
arith-n3 | " | | Third order AC
arith-n4 | " | | Fourth order AC
arith-n5 | " | | Fifth order AC
arith-n6 | " | | Sixth order AC
arith-n7 | " | | Seventh order AC
arj | arj | |
arj0 | " | | method 0 (no compression)
arj1 | " | | method 1
arj2 | " | | method 2
arj3 | " | | method 3
arj4 | " | | method 4
boa | boa | |
bzip | bzip2090 | |
bziprb | " | -repetitive-best | best compression of repetitive blocks
bziprf | " | -repetitive-fast | fast compression of repetitive blocks
gzip1 | gzip | -1 | fast compression
gzip9 | " | -9 | best compression
huff-c | huff-c | | Huffman (Nelson & Gailly 1996)
jar | jar32 | |
jar1 | " | | method 1
jar2 | " | | method 2
jar3 | " | | method 3
jar4 | " | | method 4
lha | lha | |
lzss | lzss | |
lzw12 | lzw12 | |
lzw15v | lzw15v | |
Table 5:
Macro | Code | Parameters | Note
pkzip | pkzip | | from PKWARE
pkzip-ef | " | -ef | fast compression
pkzip-en | " | -en | normal compression
pkzip-es | " | -es | super fast compression
pkzip-ex | " | -ex | extra compression
rar-m0 | rar | -m0 | level 0 compression
rar-m1 | " | -m1 | level 1 compression
rar-m2 | " | -m2 | level 2 compression
rar-m3 | " | -m3 | level 3 compression
rar-m4 | " | -m4 | level 4 compression
rar-m5 | " | -m5 | level 5 compression
splint | splint | |
SZIP00 | szip | | Rice Algorithm and Rice compression chip simulator
szip0ec | " | -ec | entropy coding compression mode
szip0nu | " | -nn | nearest neighbor compression mode
szipc0 | " | -chip | compress exactly as chip
SZIPCEC | " | -chip -ec | as szip0ec + chip compression
SZIPCNU | " | -chip -nn | as szip0nu + chip compression
uses | uses | -n 16 -s 64 -rr | Universal Source Encoder for Space: 16 bits per sample, 64 samples per scanline, correlates near samples (CNS)
uses008 | " | -n 16 -s 8 -j 8 | 8 samples, 8 samples per block
uses008rr | " | -n 16 -s 8 -rr -j 8 | as uses008 + CNS
uses016 | " | -n 16 -s 16 | 16 samples per block
uses016rr | " | -n 16 -s 16 -rr | 16 samples per block + CNS
uses032 | " | -n 16 -s 32 | 32 samples per block
uses032rr | " | -n 16 -s 32 -rr | 32 samples per block + CNS
uses064 | " | -n 16 -s 64 | 64 samples per block
uses064rr | " | -n 16 -s 64 -rr | 64 samples per block + CNS
uses320 | " | -n 16 -s 320 | 320 samples per block
uses320rr | " | -n 16 -s 320 -rr | 320 samples per block + CNS
uses960 | " | -n 16 -s 960 | 960 samples per block
uses960rr | " | -n 16 -s 960 -rr | 960 samples per block + CNS
zoo | zoo | |
To evaluate the performance of each compressor, figures of merit are drawn like the one in Fig. 5, which shows the results for the best performing compressor: arith-n1. The only noticeable effect due to an increase in the signal complexity occurs when the cosmic dipole is added. In the present signal the dipole amplitude is comparable with the white noise amplitude (a few mK), so its effect is to distort the sample distribution, making it leptokurtic. As a consequence compressors, which usually work best for a normally distributed signal, become less effective. Since the dipole introduces correlations over one full scan circle, i.e. a few thousand samples, while compressors establish the proper coding table by observing the data stream over a small set of consecutive samples (from some tens to some hundreds of samples), even a self-adaptive compressor will likely lose the correlation introduced by the dipole. A proper solution to this problem is suggested in Sect. 9. The other signal components do not introduce any noticeable systematic effect. The small differences shown by the figures of merit may be due to the compression variance and depend strongly on the compressor of choice. As an example, a given compressor may be more effective at compressing the simulated data stream with the full signal than the associated simpler data stream containing only white noise, 1/f noise, CMB and dipole, while another compressor may show the opposite behaviour.
As shown by Fig. 6, and as expected from Eq. (9), increasing the quantization step increases the compression rate (a simple numerical illustration is given after the figure).
Figure 6: Compression rates for arith-n1 as a function of the quantization step.
The linear dependency of the compression rate on the quantization step is a direct consequence of Eq. (9), and is confirmed by a set of tests performed over the full set of our numerical results for the compression efficiency: the rms residual between the best fit (12) and the simulated data is small in the large majority of the cases. The dependencies of the fit parameters on the quantization step were obtained by a trial-and-error method performed on our data set, and we did not investigate their nature further. For all practical purposes our analysis shows that these functions are well approximated by the low-order series expansions of relations (13) and (14).
Since an accuracy of a few percent in determining the free parameters of the fit is sufficient, the fitting procedure was simplified as follows. For a given compressor, signal composition, swap status and quantization step, the two fit parameters were determined by a fitting procedure. The lists of fitted parameters as a function of the quantization step were then fitted using relations (13) and (14) respectively. The fitting algorithm tests increasing degrees of the polynomial in the aforementioned relations (up to 2 for the first parameter, up to 5 for the second), stopping when the maximum deviation of the fitted relation with respect to the data is smaller than a fixed tolerance for the first parameter, or 0.0001 for the second, or when the maximum degree is reached.
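A minimal sketch of this incremental-degree fitting strategy is given below; the variable names, example data and tolerance are illustrative only and do not reproduce the actual fits of relations (13) and (14).

```python
# Sketch of the simplified fitting step: a polynomial of increasing degree
# is fitted to a list of values versus the quantization step, and the search
# stops as soon as the maximum deviation drops below a threshold (or the
# maximum degree is reached). Data and tolerance are illustrative.
import numpy as np

def fit_lowest_degree(x, y, max_degree, tolerance):
    """Return the lowest-degree polynomial whose maximum residual is below
    `tolerance`, falling back to `max_degree` otherwise."""
    for degree in range(1, max_degree + 1):
        coeffs = np.polyfit(x, y, degree)
        residual = np.max(np.abs(np.polyval(coeffs, x) - y))
        if residual < tolerance or degree == max_degree:
            return coeffs, residual

# example: a parameter tabulated against the quantization step (hypothetical)
quant_step = np.array([0.1, 0.2, 0.4, 0.8, 1.6, 3.2])
param      = np.array([1.02, 1.05, 1.11, 1.24, 1.52, 2.10])
coeffs, dev = fit_lowest_degree(quant_step, param, max_degree=5,
                                tolerance=1e-4)
print("degree used:", len(coeffs) - 1, " max deviation:", dev)
```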
Tables 6-9 report the results of the compression exercise, ordered by decreasing compression rate.
Many compressors are sensitive to the ordering of the Least and Most Significant Bytes of a 16-bit word in computer memory and files. Two ordering conventions are in use: the little Endian memory layout, in which the Least Significant Byte is stored first, and the big Endian memory layout, in which the Most Significant Byte is stored first; Intel processors, such as the PENTIUM III, work with the little Endian memory layout. For this reason each test was repeated twice, once with the original data stream file, i.e. with the little Endian configuration, and once after swapping bytes, so as to simulate the big Endian configuration. If the gain in compression rate after byte swapping is bigger than a few percent, the big Endian result is reported, otherwise the little Endian result is reported. These two cases are distinguished by the second column of Tables 6-9, which is marked with a y if swapping is applied before compressing. It is interesting to note that not only 16-bit compressors, such as uses, are sensitive to swapping: many 8-bit compressors are sensitive to it as well, probably because, if the most probable 8-bit symbol is presented first to the compressor, a slightly better balanced coding table is built.
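The following fragment sketches the byte-swapping test on a synthetic 16-bit stream, using zlib as a stand-in for the compressors of Tables 4 and 5; the data and the resulting rates are purely illustrative.

```python
# Minimal illustration of the byte-swap test: the same 16-bit data stream is
# compressed in its native little Endian layout and after swapping to big
# Endian, and the better of the two compression rates would be retained.
import numpy as np
import zlib

rng = np.random.default_rng(1)
samples = rng.normal(0, 200, 100_000).astype('<i2')   # little Endian int16

little = samples.tobytes()
big = samples.byteswap().tobytes()                     # big Endian layout

cr_little = len(little) / len(zlib.compress(little, 9))
cr_big = len(big) / len(zlib.compress(big, 9))
print(f"Cr little Endian = {cr_little:.2f}, Cr big Endian = {cr_big:.2f}")
```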
It should be noted that the coefficients reported here are obtained by compressing one or more full scan circles at a time, so their use to extrapolate the compression rate when each scan circle is divided into small chunks which are separately compressed has to be done carefully, especially for those quantization steps (in V/K) where some extrapolated compression rates grow for a decreasing quantization step, instead of decreasing as in most of the cases. However, we did not investigate the problem further, because the time required to perform all the tests over all the compressors increases as the quantization step decreases, and because a final decision about the packet length has not been made yet. Moreover, short data chunks introduce other constraints which are not accounted for by Eq. (9) but which are discussed in Sect. 8.
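The fragment below illustrates, on synthetic 16-bit noise and with zlib as a stand-in compressor, why independently compressed short chunks tend to yield a lower overall compression rate than a full scan circle compressed at once; the circle and chunk lengths are arbitrary assumptions.

```python
# Sketch of the chunking caveat: compressing a scan circle in short,
# independently compressed packets gives a lower overall compression rate
# than compressing the full circle at once, because each chunk restarts the
# coding table and carries its own overhead. Sizes are illustrative.
import numpy as np
import zlib

rng = np.random.default_rng(2)
circle = rng.normal(0, 200, 8640).astype('<i2').tobytes()  # one "scan circle"

full = len(zlib.compress(circle, 9))
chunked = sum(len(zlib.compress(circle[i:i + 512], 9))
              for i in range(0, len(circle), 512))          # 256-sample packets
print(f"Cr full circle = {len(circle)/full:.2f}, "
      f"Cr chunked = {len(circle)/chunked:.2f}")
```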
Apart from the choice of the best compressor, Tables 6 to 9 allow some interesting comparisons. The performance of the arithmetic compression arith is very sensitive to changes in the coding order n = 0, ..., 7. The computational weight grows with n, while the compression rate is minimal at n = 0, maximal for n = 1, and decreases as n increases further.
Both the non-adaptive Huffman (huff-c) and the Adaptive Huffman (ahuff-c) coders are among the worst performing compressors, for both the pure white noise signal and the full signal.
We tested the space-qualified uses compressor with a wide selection of combinations of its control parameters: the number of coding bits, the number of samples per block, and the possibility to search for correlations between neighbouring samples. We report the tests for 16-bit coding only, changing the other parameters. Uses is very sensitive to byte swapping: when the swap is not performed, uses does not compress at all. On the other hand, contrary to arith, the sensitivity of the final compression rate to the various control parameters is small or negligible. In most cases the compression rate differs by less than 0.01 when changing the combination of control parameters; such changes are not displayed by the two-digit approximation in the tables, but they are accounted for by the sorting procedure which fixes the table ordering. At 30 GHz most of the tested compressors cluster around the same compression rate, and at this level arith-n3 is as good as uses. At 100 GHz the best uses macros cluster around a common value, equivalent to the arith-n2 performance. In our tests uses performs worst at 8 samples per block without correlation search; apart from this case, the correlation search does not significantly improve the compression performance. Some commercial programs, such as boa and bzip, compress better than uses.