Notwithstanding this huge success, there is good reason for dissatisfaction. Indeed, the selfcal algorithm is a scalar one and therefore fundamentally incompatible with the vector nature of the electromagnetic radiation field. Observers have nonetheless found their way through the polarization landscape, following a narrow path marked by four guide-posts:

The time has come to leave the quasi-scalar trail and find a less restrictive way of navigating in polarization land. A suitable type of vehicle has been proposed in Paper I of this series ([Hamaker et al. 1996]). Its principle is to abandon the notion of scalar electromagnetic signals and visibilities in favour of signal 2-vectors and coherency 4-vectors, and to represent their transformations by multiplications with matrices. This results in a simple, modular and complete description of the system without the need for simplifying assumptions.

In matters of calibration, Paper I looks backward, showing how the traditional methods can be described and justified in terms of the matrix/vector formalism. Paper II ([Sault et al. 1996]) takes a first step forward by showing how our treatment can be used to view an interferometer array as a single imaging instrument, and discussing the fundamental limitations to which such an array and its calibration are necessarily subject. Here, I zoom in on that part of the calibration process that Paper II takes for granted: my purpose is to develop a comprehensive matrix-based theory of self-calibration and to find out in what respects matrix selfcal resembles, and differs from, the scalar selfcal that we know.

For this particular purpose I find that a representation of coherency and brightness in the form of matrices is more convenient than the vector representation of the preceding papers. However, such matrices are difficult to visualise. A third equivalent representation, that of Stokes parameters, is much more enlightening and therefore widely used. Outside its physical context, the Stokes representation is valid for any $2 \times 2$ matrix; this leads to the mathematical concept of quaternions: "hypercomplex" numbers composed of a scalar and a three-vector. Quaternions have multiplication rules of their own which can be used in analysing matrix products in more detail. This proves to be a powerful tool for studying the effects represented by the matrix equations.
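As a concrete illustration of the scalar-plus-three-vector idea, the following minimal sketch decomposes an arbitrary $2 \times 2$ complex matrix on the basis of the identity and the three Pauli matrices, and then multiplies two matrices through their components. The choice and ordering of the basis matrices, the absence of any normalisation factor and all function names are assumptions made for this example only; the convention actually used in this paper is defined in Sect. 2 and the appendices and may differ.

```python
# Sketch: M = a0*I + ax*sx + ay*sy + az*sz, with sx, sy, sz the Pauli matrices.
# The basis ordering is an assumption of this example, not the paper's convention.

def to_components(m):
    """Return the (scalar, 3-vector) components of a 2x2 matrix."""
    (m00, m01), (m10, m11) = m
    a0 = (m00 + m11) / 2
    ax = (m01 + m10) / 2
    ay = 1j * (m01 - m10) / 2
    az = (m00 - m11) / 2
    return a0, (ax, ay, az)

def from_components(a0, a):
    """Rebuild the 2x2 matrix from its scalar and 3-vector components."""
    ax, ay, az = a
    return ((a0 + az, ax - 1j * ay),
            (ax + 1j * ay, a0 - az))

def multiply_components(a0, a, b0, b):
    """(a0 + a.s)(b0 + b.s) = (a0*b0 + a.b) + (a0*b + b0*a + i a x b).s"""
    dot = sum(x * y for x, y in zip(a, b))  # bilinear: no complex conjugation
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    c0 = a0 * b0 + dot
    c = tuple(a0 * bi + b0 * ai + 1j * ci for ai, bi, ci in zip(a, b, cross))
    return c0, c

def matmul(p, q):
    """Ordinary 2x2 matrix product, used as the reference."""
    return tuple(tuple(sum(p[i][k] * q[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

A = ((1 + 2j, 0.5), (-1j, 3.0))
B = ((0.2, 1 - 1j), (2.0, -0.5j))

direct = matmul(A, B)
via_components = from_components(
    *multiply_components(*to_components(A), *to_components(B)))
max_diff = max(abs(direct[i][j] - via_components[i][j])
               for i in range(2) for j in range(2))
print(max_diff)  # agreement to rounding error
```

Swapping the two factors changes only the sign of the cross-product term, which is where the non-commutativity of the matrix product resides.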

In its essence, the quaternion concept entails little more than a simple extension of undergraduate-level linear algebra, but it is new to radio astronomy and may take some effort to digest. Therefore, although derivations and proofs are an essential part of this paper, I have relegated them to appendices. The main text concentrates on the results and their interpretation. For some of the mathematical effects and properties that the analysis uncovers, I have chosen to introduce suitable polarimetry-specific terminology.

The layout of the paper is briefly as follows: Section 2 establishes the basic mathematical components: coherency, Jones and brightness matrices and the Stokes brightness vector/quaternion.

Sections 3 to 5 develop the matrix form of self-calibration by exploiting the close analogies between scalar and matrix algebras. It turns out that the matrix form provides a "calibration" that is seriously incomplete: self-alignment describes more accurately what the algorithm actually achieves. An arbitrary poldistortion is left undefined; it is an in-place transformation of the brightness, composed of a polrotation of the polvector (whose components are the Stokes Q, U and V brightnesses), and a polconversion between the polvector and total intensity I.
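Why an in-place transformation of the brightness can remain undetermined may be seen from the coherency transfer relation $\vec{E}'_{jk} = \vec{J}_j \vec{E}_{jk} \vec{J}_k^{\dagger}$ of Table 1. The sketch below is a numerical check under simplifying assumptions that are mine, not part of the formal development: the true coherency on every baseline is taken to be a single brightness matrix `B` (as for a point source at the phase centre), `X` is an arbitrary invertible matrix, and all names and numerical values are hypothetical. It shows that replacing `B` by `X B X†` and every Jones matrix `J` by `J X⁻¹` leaves the observed coherency unchanged, so that the observations alone cannot fix `X`.

```python
# Sketch: the observed coherency is invariant under a common transformation X.
# All matrices are hypothetical test values; elements are floats or complex.

def matmul(p, q):
    return tuple(tuple(sum(p[i][k] * q[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def dagger(m):
    """Hermitian transpose: transposition plus complex conjugation."""
    return tuple(tuple(m[j][i].conjugate() for j in range(2))
                 for i in range(2))

def inverse(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def observe(jones_j, brightness, jones_k):
    """Coherency transfer E' = J_j E J_k^dagger with E = brightness."""
    return matmul(matmul(jones_j, brightness), dagger(jones_k))

B = ((1.3, 0.1 + 0.05j), (0.1 - 0.05j, 0.7))       # hermitian brightness
J_j = ((1.0 + 0.1j, 0.02), (-0.03j, 0.9))          # Jones matrix, antenna j
J_k = ((0.95, 0.01j), (0.04, 1.05 - 0.2j))         # Jones matrix, antenna k
X = ((1.1, 0.2j), (-0.1, 0.9 + 0.1j))              # arbitrary invertible matrix

# Alternative solution: distorted brightness, compensated Jones matrices.
B_alt = matmul(matmul(X, B), dagger(X))
J_j_alt = matmul(J_j, inverse(X))
J_k_alt = matmul(J_k, inverse(X))

E_obs = observe(J_j, B, J_k)
E_alt = observe(J_j_alt, B_alt, J_k_alt)
max_diff = max(abs(E_obs[i][j] - E_alt[i][j])
               for i in range(2) for j in range(2))
print(max_diff)  # agreement to rounding error
```

Both sets of matrices reproduce the same observed coherencies, although `B_alt` differs from `B`.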

Section 6 considers the elimination of poldistortion through the use of unpolarized calibrators, supplemented with prior knowledge about the feed and/or additional observations. It confirms, reinterprets and extends the results of Paper II.

Section 7 discusses quasi-scalar calibration methods in the perspective of the matrix approach. It is shown that most of the concepts revealed by the latter also appear in one form or another in the quasi-scalar context. Recent attempts at calibrating without recourse to an unpolarized calibrator are also discussed and evaluated.

A new option that the matrix formalism offers is the use of heterogeneous arrays, i.e. arrays combining antennas with non-identical feeds. It is explored in Sect. 8. In such arrays, feed errors and receiver phases are coupled in such a way that constraining the former in the usual way also fixes the latter; no additional phase measurement is needed to complete the calibration.

Section 9 discusses the general problem of calibrating an observation of a completely unknown source. In this case, one depends entirely on a priori knowledge of the instrument and/or ground-based measurements. My analysis is at present inconclusive and the problem needs further study.

Section 10 makes a comparison between quasi-scalar and matrix approaches, summarises the results of the latter and speculates on its practical application.

The Appendix provides a brief summary of the mathematical background as well as proofs of the assertions in the body of the paper. It also contains a few small digressions related to polarimetry that would not fit elsewhere.

Table 1: Analogies between scalars and $2 \times 2$ matrices, their algebraic properties and their application in interferometry. Particulars are to be found in the sections listed.

| Scalar form | Matrix form | Section(s) |
| --- | --- | --- |
| Arbitrary scalar $a$ | Arbitrary $2 \times 2$ matrix $\vec{A}$ | |
| Unity $= 1$ | Identity $2 \times 2$ matrix $= \vec{I}$ | |
| Phase factor $\exp i\alpha$ | Unitary $2 \times 2$ matrix $\vec{X}$; unimodular unitary $2 \times 2$ matrix $\vec{Y}$ | Appendix B.1 |
| Positive real number $\vert a \vert$ | Positive hermitian $2 \times 2$ matrix $\vec{G}$; unimodular pos.-herm. $2 \times 2$ matrix $\vec{H}$ | Appendix B.4 |
| Polar representation $a = \vert a \vert \exp i\alpha$ | Polar representation $\vec{A}= a\, \exp i\alpha\, \vec{H}\vec{Y}$ | 5.1, Appendix B.6 |
| Complex conjugation $a^*$ | Hermitian transposition $\vec{A}^{\dagger}\equiv \vec{A}^{\rm *T}$ | |
| $d(\psi) = \vert a {\rm e}^{i\psi} - 1\vert^2$ minimal for $\psi = -\arg a$ | Minimum-variance theorem | Appendix C.5 |
| Multiplication $c = ab = ba$ | Multiplication $\vec{C}=\vec{A}\vec{B}\neq \vec{B}\vec{A}$ | |
| Field or voltage transfer $e'_{j} = g_{j} e_{j}$, with $g_{j}$ the (complex) gain | Field or voltage vector transfer $\vec{e}'_{j} = \vec{J}_{j}\vec{e}_{j}$, with $\vec{J}_{j}$ the (complex) Jones matrix | 2.3 |
| Visibility $e_{jk} = \langle e_{j} e_{k}^{*} \rangle$ | Coherency $\vec{E}_{jk} = \langle \vec{e}_{j}\vec{e}_{k}^{\dagger} \rangle$ | 2.2 |
| Visibility transfer $e'_{jk} = g_{j} e_{jk} g_{k}^{*}$ | Coherency transfer $\vec{E}'_{jk} = \vec{J}_{j}\vec{E}_{jk}\vec{J}_{k}^{\dagger}$ | 2.3 |
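The last row of Table 1 follows from the two rows preceding it. As a short worked step, and under the assumption (made here for the illustration) that the Jones matrices are constant over the averaging interval so that they can be taken outside the average:

$$
\vec{E}'_{jk}
= \langle \vec{e}'_{j}\, \vec{e}'^{\dagger}_{k} \rangle
= \langle \vec{J}_{j}\vec{e}_{j}\, (\vec{J}_{k}\vec{e}_{k})^{\dagger} \rangle
= \vec{J}_{j}\, \langle \vec{e}_{j}\vec{e}_{k}^{\dagger} \rangle\, \vec{J}_{k}^{\dagger}
= \vec{J}_{j}\vec{E}_{jk}\vec{J}_{k}^{\dagger} .
$$

The second step uses $(\vec{A}\vec{B})^{\dagger} = \vec{B}^{\dagger}\vec{A}^{\dagger}$. The scalar visibility transfer is recovered by replacing each matrix with a complex number, in which case the order of the factors no longer matters.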

1.1 Terminology and notation

Since I will be discussing scalar selfcal and its full-polarization analogue side by side, it is necessary to put a precise terminology in place.

The analogue of the scalar visibility is the coherency. It consists of four components and can be represented in various coordinate systems in the form of either a vector or a matrix. Each of these forms contains four scalar elements that I shall occasionally refer to as visibilities.

I shall use "matrix" as an antonym of "scalar", as in "matrix selfcal". This is not strictly correct, because alternative formulations of my methods are possible that rely on other representations, e.g. using vector or tensor forms. But within the context of this paper it is the most convenient word to describe the antithesis.

The device that converts the electromagnetic field vector into a pair of voltages is called a feed; it consists of two receptors that are usually (but not necessarily, cf. Appendix B.3) sensitive to nominally opposite polarizations. In a homogeneous array all feeds are nominally identical; in a heterogeneous array they differ.

The imaging process that I consider consists of the observations proper followed by a process of self-calibration. In the latter, models of the instrumental errors and the source brightness distribution are developed jointly in an iterative procedure. It is assumed to have converged when the models together correctly represent the observed coherencies within the noise. The image is the pictorial representation of the source model; the two words are almost synonymous. The model itself may take various forms; its essential property is that it can be used to "predict" model coherency values that can be compared to those actually observed in order to estimate instrumental errors.
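The step of comparing predicted with observed values in order to estimate instrumental errors can be sketched for the familiar scalar case. The example below is an illustration under explicit assumptions, not the procedure developed in this paper: the source model is held fixed (a hypothetical point source of unit flux at the phase centre, so that there is no model update), the data are noise-free, and the gains are found with a damped alternating least-squares update that I have chosen merely because it is short. All names and numerical values are hypothetical.

```python
import cmath

def predict(gains, model):
    """Predicted visibilities g_j * m_jk * conj(g_k) for all j != k."""
    n = len(gains)
    return {(j, k): gains[j] * model[(j, k)] * gains[k].conjugate()
            for j in range(n) for k in range(n) if j != k}

def solve_gains(observed, model, n, iterations=200):
    """Fit scalar gains to observed visibilities for a fixed source model."""
    gains = [1 + 0j] * n
    for _ in range(iterations):
        updated = []
        for j in range(n):
            num, den = 0j, 0.0
            for k in range(n):
                if k == j:
                    continue
                # Least-squares fit of observed_jk = g_j * z_k over all k
                z = model[(j, k)] * gains[k].conjugate()
                num += observed[(j, k)] * z.conjugate()
                den += abs(z) ** 2
            updated.append(num / den)
        # Damping: average old and new estimates to suppress oscillation
        gains = [(g + u) / 2 for g, u in zip(gains, updated)]
    return gains

n_ant = 4
true_gains = [cmath.rect(1.1, 0.2), cmath.rect(0.9, -0.1),
              cmath.rect(1.0, 0.3), cmath.rect(1.2, -0.25)]
model = {(j, k): 1 + 0j
         for j in range(n_ant) for k in range(n_ant) if j != k}

observed = predict(true_gains, model)
fitted = solve_gains(observed, model, n_ant)
predicted = predict(fitted, model)
residual = max(abs(observed[b] - predicted[b]) for b in observed)
print(residual)
```

The fitted gains reproduce the observed visibilities, but they agree with the true gains only up to a common phase factor, which cancels in every product $g_j g_k^*$.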

It is important to correctly understand the word "intensity" used as an adjective, as in "intensity calibration". I use it to say that the calibrator source is characterised by its intensity alone, i.e. it is unpolarized. It does not imply the exclusive calibration of receiver parameters that one might associate with intensity, e.g. voltage gains.

Vectors are denoted by bold lowercase symbols; bold uppercase represents matrices. Constant scalars and vectors are shown in roman, variables in italic font. A unit vector in the direction ${\vec{x}}$ will be denoted by $\vec{1}_{{\vec{x}}}$. The "dagger" superscript $\dagger$ stands for hermitian transposition or conjugation, i.e. transposition plus complex conjugation.

Primes are generally used to distinguish observed or fitted values from true ones; occasionally they will also be used to distinguish input and output of a transformation or values of one variable under different conditions.

I follow the subscript notation of Paper II: quantities in the signal domain carry a single antenna subscript j or k; those in the coherency domain get an interferometer subscript jk. An additional subscript t will be used to indicate successive integration intervals or "time slices". The array consists of N antennas and an observation comprises M integration intervals.


