A realistic estimate of the compression efficiency must be based on a quantitative analysis of the signal statistics, which includes the statistics of the binary representation (Sect. 5.1), the entropy (Sect. 5.2) and normality tests (Sect. 5.3).
Figure 3 shows the frequency distribution of symbols when the full data stream of 60 scan circles is divided into 8-bit words. Since for most of the samples the range spans only about 32 levels (5 bits), the bytes corresponding to the MSB words assume a limited range of values, producing the narrow spike in the figure.
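As a minimal illustration (not the authors' code: the white-noise rms, the offset and the 16-bit coding are assumptions made for the example), the byte statistics of such a stream can be reproduced as follows; the MSB bytes take only a couple of distinct values, producing the narrow spike, while almost all of the variability is carried by the LSB bytes.

```python
# Illustrative sketch: byte-frequency statistics of a simulated 16-bit stream.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 60 * 8640                       # 60 scan circles of 8640 samples each
noise = rng.normal(0.0, 5.0, n_samples)     # white noise, rms of a few quantization steps
codes = np.clip(np.round(noise) + 2**15, 0, 2**16 - 1).astype(np.uint16)

msb = (codes >> 8).astype(np.uint8)         # most significant byte of each 16-bit word
lsb = (codes & 0xFF).astype(np.uint8)       # least significant byte

# Frequency distribution of the 8-bit symbols, as in Fig. 3.
freq_msb = np.bincount(msb, minlength=256) / msb.size
freq_lsb = np.bincount(lsb, minlength=256) / lsb.size
print("distinct MSB values:", np.count_nonzero(freq_msb))
print("distinct LSB values:", np.count_nonzero(freq_lsb))
```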
From the distribution in Fig. 3 one may wonder whether a more effective compression could be obtained by splitting the data stream into two substreams: the MSB substream (with compression rate $C_{\rm r,MSB}$) and the LSB substream (with compression rate $C_{\rm r,LSB}$). Since the two components are so different in their statistics, with the MSB substream having a higher level of redundancy than the original data stream, it would be reasonable to expect the final compression rate of the split scheme, $C_{\rm r,split}$, to be greater than the rate $C_{\rm r}$ obtained by compressing the original data stream directly. We tested this procedure with some of the compressors considered for the final test. From these tests it is clear that $C_{\rm r,MSB} > C_{\rm r}$, but since most of the redundancy of the original data stream is contained in the MSB substream, the LSB substream cannot be compressed effectively; as a result $C_{\rm r,LSB} < C_{\rm r}$ and $C_{\rm r,split} \leq C_{\rm r}$. So the best way to perform an efficient compression is to apply the compressor to the full stream without performing the MSB/LSB separation. Apart from these theoretical considerations, we performed some tests with our simulated data stream confirming this result.
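The splitting test can be sketched as follows (a hedged illustration only: zlib is used as a stand-in for the compressors actually considered, and the quantized white-noise stream is a crude model of the simulated data):

```python
# Sketch of the MSB/LSB splitting test, with zlib as an example compressor.
import zlib
import numpy as np

rng = np.random.default_rng(1)
codes = (np.round(rng.normal(0.0, 5.0, 60 * 8640)) + 2**15).astype(np.uint16)

full = codes.tobytes()
msb = (codes >> 8).astype(np.uint8).tobytes()
lsb = (codes & 0xFF).astype(np.uint8).tobytes()

def c_r(raw: bytes) -> float:
    """Compression rate: original size / compressed size."""
    return len(raw) / len(zlib.compress(raw, 9))

cr_full, cr_msb, cr_lsb = c_r(full), c_r(msb), c_r(lsb)
# Overall rate of the split scheme: total input bytes / total compressed bytes.
cr_split = len(full) / (len(msb) / cr_msb + len(lsb) / cr_lsb)
print(f"C_r(full)={cr_full:.2f}  C_r(MSB)={cr_msb:.2f}  "
      f"C_r(LSB)={cr_lsb:.2f}  C_r(split)={cr_split:.2f}")
```

In such a toy case the MSB substream compresses extremely well while the LSB substream retains most of the entropy, so the split scheme does not improve on compressing the full stream, in line with the behaviour described above.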
Equation (9) is valid in the limit of a continuous distribution of quantization levels. Since in our case the quantization step is about one tenth of the signal rms, this is no longer true. To properly estimate the maximum compression rate attainable from these data, we evaluate the entropy of the discretized signal for different values of the quantization offset and of the chunk length.
Our entropy evaluation code takes the input data stream, determines the frequency $f_s$ of each symbol $s$ in the quantized data stream, and computes the entropy as
$$ H = -\sum_s f_s \log_2 f_s, $$
where $s$ is the symbol index. In our simulation we consider both 8-bit and 16-bit symbols ($s$ spanning 0, ..., 255 and 0, ..., 65535, respectively). Since in our scheme the ADC output is 16 bits, we considered the 8-bit symbol entropy both for the LSB and the MSB 8-bit words, as well as the 8-bit entropy after merging the LSB and MSB significant bit sets.
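A minimal sketch of this entropy evaluation (our own illustration, with an assumed quantized white-noise stream in place of the actual simulated data) is:

```python
# Empirical entropy H = -sum_s f_s log2 f_s of an integer symbol stream.
import numpy as np

def entropy_bits(symbols: np.ndarray) -> float:
    counts = np.bincount(symbols.astype(np.int64))
    freq = counts[counts > 0] / symbols.size   # frequencies f_s of the observed symbols
    return float(-np.sum(freq * np.log2(freq)))

rng = np.random.default_rng(2)
adc = (np.round(rng.normal(0.0, 5.0, 60 * 8640)) + 2**15).astype(np.uint16)

h16 = entropy_bits(adc)            # 16-bit symbols, s = 0, ..., 65535
h8_msb = entropy_bits(adc >> 8)    # 8-bit MSB symbols, s = 0, ..., 255
h8_lsb = entropy_bits(adc & 0xFF)  # 8-bit LSB symbols, s = 0, ..., 255
print(f"H(16 bit) = {h16:.3f}   H(MSB) = {h8_msb:.3f}   H(LSB) = {h8_lsb:.3f} bits/symbol")
```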
As expected, since the offset merely shifts the quantized signal distribution, the entropy does not depend on it. For this reason we take an offset of 0 V, i.e., no shift. Table 2 reports the 16-bit entropy as a function of the chunk length, of the signal composition (pure white noise or full signal) and of the frequency channel.
Table 2:

30 GHz, White Noise
Samples per chunk | Entropy (bits): Total | Mean | RMS | $C_{\rm r}$: Total | Mean | RMS
16    | 5.1618 | 3.5596 | 0.1989 | 3.10 | 4.49 | 0.251
32    | 5.1618 | 4.1815 | 0.1658 | 3.10 | 3.83 | 0.152
64    | 5.1618 | 4.6108 | 0.1262 | 3.10 | 3.47 | 0.095
135   | 5.1618 | 4.8791 | 0.0890 | 3.10 | 3.28 | 0.060
8640  | 5.1618 | 5.1561 | 0.0114 | 3.10 | 3.10 | 0.007
17280 | 5.1618 | 5.1589 | 0.0061 | 3.10 | 3.10 | 0.004

30 GHz, Full Signal
Samples per chunk | Entropy (bits): Total | Mean | RMS | $C_{\rm r}$: Total | Mean | RMS
16    | 5.5213 | 3.5602 | 0.1982 | 2.90 | 4.49 | 0.250
32    | 5.5213 | 4.1849 | 0.1664 | 2.90 | 3.82 | 0.152
64    | 5.5213 | 4.6162 | 0.1278 | 2.90 | 3.47 | 0.096
135   | 5.5213 | 4.8885 | 0.0893 | 2.90 | 3.27 | 0.060
8640  | 5.5213 | 5.5119 | 0.0176 | 2.90 | 2.90 | 0.009
17280 | 5.5213 | 5.5157 | 0.0118 | 2.90 | 2.90 | 0.006

100 GHz, White Noise
Samples per chunk | Entropy (bits): Total | Mean | RMS | $C_{\rm r}$: Total | Mean | RMS
16    | 5.7436 | 3.6962 | 0.1740 | 2.79 | 4.33 | 0.204
32    | 5.7436 | 4.4174 | 0.1521 | 2.79 | 3.62 | 0.125
64    | 5.7436 | 4.9627 | 0.1230 | 2.79 | 3.22 | 0.080
135   | 5.7436 | 5.3354 | 0.0875 | 2.79 | 3.00 | 0.049
8640  | 5.7436 | 5.7352 | 0.0115 | 2.79 | 2.79 | 0.006
17280 | 5.7436 | 5.7394 | 0.0063 | 2.79 | 2.79 | 0.003

100 GHz, Full Signal
Samples per chunk | Entropy (bits): Total | Mean | RMS | $C_{\rm r}$: Total | Mean | RMS
16    | 5.8737 | 3.6970 | 0.1734 | 2.72 | 4.33 | 0.203
32    | 5.8737 | 4.4186 | 0.1526 | 2.72 | 3.62 | 0.125
64    | 5.8737 | 4.9655 | 0.1224 | 2.72 | 3.22 | 0.079
135   | 5.8737 | 5.3419 | 0.0887 | 2.72 | 3.00 | 0.050
8640  | 5.8737 | 5.8604 | 0.0180 | 2.72 | 2.73 | 0.008
17280 | 5.8737 | 5.8655 | 0.0127 | 2.72 | 2.73 | 0.006
The entropy distribution per chunk is approximately described by a normal distribution (see Fig. 4), so the mean entropy and its rms are enough to characterize the results. The mean entropy measured over one scan circle (8640 samples) coincides with the entropy measured for the full set of 60 scan circles, the entropy rms being of the order of $10^{-2}$ bits. Consequently the expected rms of $C_{\rm r}$, when compressing one or more circles at a time, will be correspondingly small.
The mean entropy and its rms are not independent quantities. The averaged entropy decreases as the chunk length decreases, but correspondingly the entropy rms increases. As a consequence, the averaged $C_{\rm r}$ increases for decreasing chunk length, but the fraction of chunks in which the compressor performs significantly worse than the average increases as well. The overall compression rate, i.e. the $C_{\rm r}$ referred to the full mission, is affected by both effects.
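The trade-off can be illustrated with the following sketch (illustrative only: a quantized white-noise stream stands in for the simulated data, and 16/<H> is taken as an indicative upper limit on the compression rate): for shorter chunks the mean entropy drops while its rms grows.

```python
# Per-chunk entropy statistics as a function of the chunk length (cf. Table 2).
import numpy as np

def entropy_bits(symbols):
    # A constant shift does not change the entropy, so work relative to the minimum code.
    counts = np.bincount(symbols.astype(np.int64) - int(symbols.min()))
    freq = counts[counts > 0] / symbols.size
    return -np.sum(freq * np.log2(freq))

rng = np.random.default_rng(3)
adc = (np.round(rng.normal(0.0, 5.0, 60 * 8640)) + 2**15).astype(np.uint16)

for chunk_len in (16, 32, 64, 135, 8640, 17280):   # chunk lengths used in Table 2
    n_chunks = adc.size // chunk_len
    h = np.array([entropy_bits(adc[i * chunk_len:(i + 1) * chunk_len])
                  for i in range(n_chunks)])
    print(f"L = {chunk_len:5d}   <H> = {h.mean():.4f}   rms(H) = {h.std():.4f}   "
          f"16/<H> = {16.0 / h.mean():.2f}")
```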
Since a normal distribution of the signals is assumed in Sect. 4, it is interesting to quantify how much the digitized signal distribution deviates from normality. It is also important to characterize the influence of the 1/f noise and of the other signal components, especially the cosmic dipole, in generating such deviations. To obtain an efficient compression it is important that the samples are as statistically uncorrelated and as normally distributed as possible. In addition, one should make sure that the detection chain does not introduce any systematic effect producing spurious, non-normally distributed components. This is relevant not only for the compression problem itself, which is among the data processing operations the least sensitive to small deviations from the normal distribution, but also in view of the future data reduction, calibration and analysis. For these operations the hypothesis of normality of the signal distribution is very important in order to allow a good separation of the foreground components. Last but not least, the hypothesis that normality is preserved along the detection chain is important for the scientific interpretation of the results, since the accuracy expected from the PLANCK-LFI experiment should allow one to verify whether the distribution of the CMB fluctuations is truly normal, as predicted by the standard inflationary models, or deviates from normality, as seems to be suggested by the recent four-year COBE/DMR results (Bromley & Tegmark 1999; Ferreira et al. 1999).
For this reason a set of normality tests was applied to the different components of the simulated signal, before and after digitization, in order to characterize the signal statistics and its variation along the detection process. Of course, this work may be regarded as a first step in this direction: a true calibration of the signal statistics will be possible only when the front-end electronics simulator becomes available. These tests also have value as a preparation for the study of the real signal.
Normality tests were applied to the same data streams used for data compression. Given the on-board memory limits, it is unlikely that more than a few circles at a time can be stored before compression, so the statistical tests were performed regarding each data stream for a given pointing as a collection of 60 independent realizations of the same process. Of course this is only approximately true: the 1/f noise correlates subsequent scan circles, but since its rms amplitude per sample is typically about one tenth of the white-noise rms or less, these correlations can be neglected in this analysis.
Starting from the folded data streams, a given normality test was applied to each set of 60 realizations of each of the 8640 samples, transforming the stream of samples into a stream of test results for the given test. The cumulative frequency distribution was then computed over the 8640 test results. Since 60 realizations do not represent a large statistical sample, significant deviations from theoretically evaluated confidence levels are expected, resulting in excessive rejection or acceptance rates. For this reason each test was calibrated by applying it to the undigitized white-noise data stream. Moreover, in order to analyze how the normality evolves as the signal complexity increases, the tests were repeated while increasing the information content of the generated data stream.
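A sketch of this per-sample testing scheme (our own illustration: scipy's Kolmogorov-Smirnov routine is used as a stand-in for the actual test code, and the quantized white-noise model with a step of one tenth of the rms is an assumption) is:

```python
# Per-sample normality testing: each of the 8640 sample positions is tested
# against normality using its 60 realizations (one per scan circle).
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(4)
n_circles, n_samples = 60, 8640
step = 0.1                                   # assumed quantization step, in units of the rms
folded = np.round(rng.normal(0.0, 1.0, (n_circles, n_samples)) / step) * step

def ks_d(column: np.ndarray) -> float:
    """KS distance of the 60 realizations from a Gaussian with matched mean and rms."""
    return kstest(column, "norm", args=(column.mean(), column.std(ddof=1))).statistic

# The stream of samples becomes a stream of test results (one D value per sample).
d_values = np.array([ks_d(folded[:, j]) for j in range(n_samples)])
print("median D over the 8640 sample positions:", np.median(d_values))
```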
To simplify the discussion we consider as a reference test the usual Kolmogorov-Smirnov D test from Press et al. (1986), and we fix a 95% acceptance level. The test was "calibrated" using the Monte-Carlo white-noise generator of our mission simulator in order to fix the threshold level $D_{\rm th}$ as the D value for which more than 95% of our samples show $D < D_{\rm th}$.
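The calibration step can be sketched as follows (a hedged illustration, assuming the 95% acceptance level quoted above and the same KS statistic; the number of Monte-Carlo trials is arbitrary):

```python
# Monte-Carlo calibration of the threshold D_th on undigitized white noise.
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(5)
n_circles, n_trials = 60, 8640

d_null = np.empty(n_trials)
for j in range(n_trials):
    col = rng.normal(0.0, 1.0, n_circles)    # one set of 60 undigitized realizations
    d_null[j] = kstest(col, "norm", args=(col.mean(), col.std(ddof=1))).statistic

d_th = np.quantile(d_null, 0.95)             # 95% of the null trials satisfy D < D_th
print(f"calibrated D_th = {d_th:.4f}")
```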
From Table 3 the quantization effect is evident: at twice the nominal quantization step, the distribution of realizations deviates from a normal distribution ($D > D_{\rm th}$) in 30% of the samples (i.e. 2592 out of the 8640 samples).
Table 3:
Quantization step (V/K)                                 | 1.220  | 0.610  | 0.406
Fraction of samples with $D < D_{\rm th}$ (white noise) | 0.28   | 0.70   | 0.84
Fraction of samples with $D < D_{\rm th}$ (full signal) | 0.27   | 0.71   | 0.86
$D_{\rm th}$                                            | 0.2449 | 0.1851 | 0.1678
Calibration acceptance level                            | 0.95   | 0.95   | 0.95
As for the entropy distribution and the binary statistics, even in this case most of the differences between the results obtained for a pure white-noise signal and for the full signal are explained by the presence of the cosmological dipole. These simulations are not accurate enough to draw quantitative conclusions about the distortion of the sampling statistics induced by digitization, but they suggest that approximating the instrumental signal as quantized white noise plus a cosinusoidal term associated with the cosmic dipole is more than adequate for understanding the optimal lossless compression rate achievable in the case of the PLANCK-LFI mission.