
9 Proposed coding and compression scheme

The basic principle of the first method, named Least Significant Bits Packing (LSBP), is to send only those bits of the 16-bit ADC output which are affected by the signal and the noise. This is effective for the nominal mission: with the planned quantization step of 0.3 mK/adu, the noise at one sigma will fill about 21 levels, which requires at least 5 of the 16 bits, so it is reasonable to expect a final data flow equivalent to $\mbox{$C_{{\rm r},1}$ } < 3$. The compression rate cannot be improved much by further compressing the resulting 5-bit data stream, since its entropy would be H < 5.4 bits and $\mbox{$C_{{\rm r}}$ } \lesssim 1.08$.
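The bit count quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are hypothetical, and the ratio it computes is an idealized bound that ignores spike samples and packet fields, which is assumed here to be why the text quotes the more conservative $\mbox{$C_{{\rm r},1}$ } < 3$.

```python
import math

ADC_BITS = 16       # ADC word length, from the text
NOISE_LEVELS = 21   # levels filled by the noise at one sigma, from the text

# Smallest whole number of bits able to index NOISE_LEVELS distinct levels.
n_bits = math.ceil(math.log2(NOISE_LEVELS))

# Idealized rate (assumption): no spike samples, no packet header fields.
cr_ideal = ADC_BITS / n_bits

print(n_bits, cr_ideal)
```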

In order to ensure that the compression is lossless, all the samples exceeding the [$-\sigma$, $+\sigma$] (5 bits) range have to be sent separately, coding at the same time their position (address) in the stream vector and their value. So, for $\mbox{$N_{{\rm bits}}$ } < 16$ bits, corresponding to a threshold $\mbox{$x_{{\rm th}}$ } = 2^{\scriptsize\mbox{$N_{{\rm bits}}$ }}$, each group of samples stored in a packet is partitioned into two classes according to their value x:

Regular Samples (RS): all those samples for which $\vert x\vert \leq \mbox{$x_{{\rm th}}$ }$,
Spike Samples (SS): all those samples for which $\vert x\vert > \mbox{$x_{{\rm th}}$ }$.

The coding process then consists of two main steps: i) splitting the data stream into Regular and Spike Samples, preserving the original ordering in the stream of Regular Samples; ii) storing (sending) the $N_{{\rm bits}}$ least significant bits of the Regular Samples and, in a separate area, the 16-bit value and the location in the original data stream of each Spike Sample. Spike Samples will therefore require more space to be stored than Regular Samples. The decoding process is the reverse of this packing process.
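The two coding steps and their inverse can be sketched as follows. This is a minimal illustration, not the on-board implementation: the names `lsbp_encode` and `lsbp_decode` are hypothetical, the two areas are kept as Python lists rather than packed bit fields, and a sample is treated as regular when it fits in an $N_{{\rm bits}}$-wide two's complement word, which is an assumption made here so that the sign survives the truncation.

```python
def lsbp_encode(samples, n_bits):
    """Split signed samples into regular codes and (position, value) spikes."""
    # Assumed regular range: values representable in n_bits two's complement.
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    mask = (1 << n_bits) - 1
    regular = []   # n_bits-wide codes, original ordering preserved
    spikes = []    # (position in the stream, full 16-bit value)
    for pos, x in enumerate(samples):
        if lo <= x <= hi:
            regular.append(x & mask)   # keep only the least significant bits
        else:
            spikes.append((pos, x))
    return len(samples), regular, spikes


def lsbp_decode(n_samples, regular, spikes, n_bits):
    """Reverse of lsbp_encode: merge both areas back into one stream."""
    sign_bit = 1 << (n_bits - 1)
    spike_at = dict(spikes)
    codes = iter(regular)
    out = []
    for pos in range(n_samples):
        if pos in spike_at:
            out.append(spike_at[pos])
        else:
            code = next(codes)
            # Sign-extend the n_bits code back to a signed integer.
            out.append(code - (1 << n_bits) if code & sign_bit else code)
    return out


stream = [0, -3, 15, -16, 16, -17, 1000, 2, -32768, 32767]
n, rsa, ssa = lsbp_encode(stream, 5)
restored = lsbp_decode(n, rsa, ssa, 5)
```

In a real packet the spike positions and values would occupy fixed-width fields, so each spike costs 16 bits plus an address, which is the extra space mentioned above.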

In this scheme each packet will be divided into two main areas: the Regular Samples Area (RSA), which holds the stream of Regular Samples, and the Spike Samples Area (SSA), which holds the stream of Spike Samples, plus a number of fields which will contain packing parameters such as the number of samples, the number of regular samples, the offset, etc. Since the number of samples in each area will change randomly, it will not be possible to fill a packet completely. The filling process will leave an empty area in the packet that is on average smaller than $N_{{\rm bits}}$.

In Maris (1999a) a first evaluation for the 30 GHz channel is given, assuming that the signal is composed only of white noise plus the CMB dipole. As noted in Sect. 7.2, the cosmological dipole reduces the compression efficiency by a small amount. A possible way to deal with it would be to subdivide each data stream into packets, subtract from each measurement of a given packet the integer average of the samples (computed as a 16-bit integer number) and then compress the residuals. Each integer average will be sent to Earth together with the related packet, and the operation will be reversed on the ground. Since all the numbers are coded as 16-bit integers, all the operations are fully reversible and no round-off error occurs. However, it cannot be excluded that the computational cost of such an operation will offset the gain in $C_{{\rm r}}$.
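The reversibility of the integer average subtraction can be illustrated with a short sketch. The function names and the sample values are hypothetical, and floor division is assumed as the way the integer average is formed; any integer rule works as long as the same value is sent with the packet.

```python
def remove_integer_mean(packet):
    """Subtract the integer average of a packet; return (mean, residuals)."""
    mean = sum(packet) // len(packet)   # integer average (floor division assumed)
    residuals = [x - mean for x in packet]
    return mean, residuals


def restore_integer_mean(mean, residuals):
    """Ground-side inverse: add the transmitted integer average back."""
    return [r + mean for r in residuals]


packet = [1203, 1198, 1210, 1195, 1201]   # made-up ADC values
mean, residuals = remove_integer_mean(packet)
recovered = restore_integer_mean(mean, residuals)
```

Because only integer additions and subtractions are involved, the recovered packet is identical to the original one, whatever the values of the samples.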

Two schemes are proposed to perform the self-adaptation to the cosmological dipole. In Scheme A the average of the samples in the packet is subtracted before coding and then sent separately. In Scheme B $x_{{\rm th}}$ is varied proportionally to the dipole contribution. Both assume that the dipole contribution is approximately constant over a packet length. This assumption requires $L_{\rm packet} \lesssim 200$ samples, i.e. $L_{\rm packet} < 512$ bytes, since for $L_{\rm packet} > 512$ bytes the cosmic dipole contribution cannot be considered constant in time. For larger packets a better modeling (i.e. more parameters) will be required in order not to degrade the compression efficiency.

A critical point is to fix the best $x_{{\rm th}}$, i.e. $N_{{\rm bits}}$, for a given signal statistics, coding scheme and packet length $L_{{\rm p}}$. Here too $C_{{\rm r}}$ grows with the packet length, but it does not change monotonically with $x_{{\rm th}}$. An increase in $x_{{\rm th}}$ ($N_{{\rm bits}}$) decreases the number of spike samples but increases the size of each regular sample, while the opposite occurs when $x_{{\rm th}}$ is decreased, and for $\mbox{$N_{{\rm bits}}$ }<4$ bits $\mbox{$C_{{\rm r}}$ } < 1$. For both schemes the optimum is reached at $\mbox{$N_{{\rm bits}}$ } = 6$ bits, but Scheme A is better than Scheme B, with $\mbox{$C_{{\rm r}}$ }(\mbox{{\em Scheme A}}, L_{\rm packet} = 512 \mbox{ bytes}) = 2.61$ and $\mbox{$C_{{\rm r}}$ }(\mbox{{\em Scheme B}}, L_{\rm packet} = 512 \mbox{ bytes}) = 2.29$.
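The trade-off between the number of spike samples and the size of each regular sample can be explored with a toy cost model. Everything in the sketch below is an assumption made for illustration: zero-mean Gaussian noise with a standard deviation of 10 adu, 200 samples per packet, 48 header bits per packet, an 8-bit address per spike, and a regular range equal to the $N_{{\rm bits}}$-wide two's complement range. The function name `compression_rate` is hypothetical, and the rates it returns are not the values of Maris (1999a).

```python
import random


def compression_rate(samples, n_bits, packet_len=200, header_bits=48, addr_bits=8):
    """Toy estimate of C_r for an LSBP-like packing of 16-bit samples."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    n_regular = sum(1 for x in samples if lo <= x <= hi)
    n_spike = len(samples) - n_regular
    n_packets = -(-len(samples) // packet_len)   # ceiling division
    packed_bits = (n_regular * n_bits               # regular samples area
                   + n_spike * (16 + addr_bits)     # spike value plus address
                   + n_packets * header_bits)       # assumed packet fields
    return 16 * len(samples) / packed_bits


rng = random.Random(0)   # fixed seed so the scan is repeatable
samples = [round(rng.gauss(0.0, 10.0)) for _ in range(20000)]

rates = {n: compression_rate(samples, n) for n in range(2, 11)}
best = max(rates, key=rates.get)
```

With these assumed parameters the scan peaks at 6 bits and drops below 1 at 3 bits, which reproduces the qualitative behaviour described above. A different noise level or field layout would shift the optimum.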

Compared with arith-n1, this compression rate is worse by about $14 - 30\%$. This is due to two reasons: i) coding by a threshold cut is less effective than applying an optimized compressor; ii) the results reported in Tables 6-9 refer to the compression of a full circle of data instead of a small packet, resulting in a higher efficiency. However, the efficiency of this coding method is similar to the efficiency of the bulk of the other true lossless compressors tested up to now and, when the need to send a decoding table is considered, is even higher. A compression scheme based on the same principle, but with a different organization of fields, has also been proposed by Guzzi & Silvestri (1999), who report a similar compression efficiency.

The second possible solution to the packeting problem is to use one or more standardized coding tables for the compression scheme of choice (Maris 1999b). In this case the coding table would be loaded into the on-board computer before launch, or from time to time in flight, and the table would be known in advance on Earth. Major advantages would be:
1. the coding table does not have to be sent to Earth;
2. the compression operator will be reduced to a mapping operator which may be implemented as a tabular search, driven by the input 8-bit or 16-bit word to be compressed;
3. any compression scheme (Huffman, arithmetic, etc.) may be implemented by replacing the coding table, without changes to the compression program;
4. the compression procedure may be easily written in C or in the native assembler language of the on-board computer or, alternatively, simple dedicated hardware may be implemented and interfaced to the on-board computer.

The disadvantages of this scheme are:
1. each table must reside permanently in the central computer memory, unless dedicated hardware is interfaced to it;
2. it is difficult to use adaptive schemes in order to tune the compressor to the input signal; as a consequence $C_{{\rm r}}$ may be somewhat smaller than in the case of a true self-adapting compressor code.

The first problem may be circumvented by limiting the length of the words to be compressed. In our case the data streams may be divided into chunks of 8 bits, and the typical table size would be $\lesssim 1$ Kbyte. Precomputed coding tables may be accurately optimized by Monte-Carlo simulations on the ground or by using signals from ground tests of the real hardware.
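A precomputed table for 8-bit words can be sketched as follows. The example assumes a Huffman code as the scheme of choice, builds the table from a made-up training sequence standing in for simulated or ground-test data, and represents codes as strings of '0' and '1' for readability; the function names are hypothetical. Every byte value receives a count of at least one, so that the fixed table can code any input word.

```python
import heapq
from collections import Counter


def build_code_table(training_bytes):
    """Ground-side step: Huffman table mapping each byte value to a bit string."""
    counts = Counter(range(256))      # one count each, so all 256 words get a code
    counts.update(training_bytes)
    # Heap items: (weight, smallest symbol as tie-breaker, {symbol: partial code}).
    heap = [(w, sym, {sym: ""}) for sym, w in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, t0, left = heapq.heappop(heap)
        w1, t1, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, min(t0, t1), merged))
    return heap[0][2]


def encode(data, table):
    """On-board step: pure table lookup, one entry per input byte."""
    return "".join(table[b] for b in data)


def decode(bits, table):
    """Ground-side step: invert the same table (the code is prefix-free)."""
    inverse = {code: sym for sym, code in table.items()}
    out, code = [], ""
    for bit in bits:
        code += bit
        if code in inverse:
            out.append(inverse[code])
            code = ""
    return bytes(out)


training = bytes([0] * 50 + [1] * 20 + [255] * 20 + [2] * 5 + [254] * 5)
table = build_code_table(training)
message = bytes([0, 1, 255, 0, 2, 254, 0, 0, 77])
decoded = decode(encode(message, table), table)
```

Only `encode` would have to run on board, and replacing `table` changes the code without touching the program, as noted among the advantages above.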

The second problem may be overcome by using a preconditioning stage, which reduces the statistics of the input signal to the statistics for which the precalculated table is optimized. In addition, several tables may reside in the computer memory and be selected according to the signal statistics. With a simple reversible statistical preconditioner, about ten tables per frequency channel would be stored in the computer memory, so that the total memory occupation would be less than about 40 Kbytes. It cannot be excluded that the two methods just outlined can be merged.


Copyright The European Southern Observatory (ESO)