Up: The VizieR database of



  
5 VizieR organisation

VizieR is a natural extension of the use of the metadata stored in the ReadMe files: it implements these metadata as tables managed by a relational database management system (RDBMS).

The first prototype of VizieR was the result of a fruitful collaboration between ESIS (European Space Information System, a project managed by ESRIN, a department of the European Space Agency) and the CDS; VizieR has been under the full responsibility of the CDS since January 1996. It was presented at the 1996 AAS meeting (Ochsenbein et al. [1996]) and became fully operational in February 1996. This prototype was significantly upgraded in May 1997, just in time for the implementation of the final catalogues of the Hipparcos mission. The number of catalogues accessible within VizieR has since grown to 2 374 (Table 6).

The core of VizieR is the organisation of the meta-dictionary, i.e. the set of metadata extracted from the standardized ReadMe descriptions discussed in Sect. 4. Two main problems, however, had to be solved: the generation of links connecting two related pieces of information, such as other tables of the same catalogue, or spectra and images from remote services; and the access to very large catalogues (more than a few million rows), for which the RDBMS proved inefficient, so that dedicated search methods were required.


 

 
Table 6: Summary of the VizieR contents (November 1999)

                                   All              Catalogues dealing with
                                   catalogues       objects having positions
  Catalogues                       2 374            1 247
  Tables                           6 071            1 929
  Columns                          77 260           30 261
  Rows                             $1.17\ 10^9$     $1.16\ 10^9$
  Rows, excluding megacatalogues   $40.3\ 10^6$     $31.6\ 10^6$


  
5.1 META dictionary

The meta-dictionary consists of three main tables, detailed below, and about 20 annex tables, all stored in a relational database:

1. METAcat describes the catalogues, a catalogue being defined as a set of related tables published together: typically, a catalogue gathers a table of observations, a table of mean values, a table of references, a list of related images, etc. METAcat details the authors, reference, title and explanations of each stored catalogue. This table currently contains 2 374 rows (Table 6);
2. METAtab describes each data table stored in VizieR: table caption, number of rows, how to access the actual data, equinox and epoch of the coordinates, etc. This table currently contains 6 071 rows (Table 6), i.e. the average catalogue consists of 2.6 tables;
3. METAcol details each of the 77 260 columns (Table 6) currently stored in VizieR: column name or label, textual explanation of the column contents, datatype (numeric or character) and storage mode within the database (integer or floating-point, maximal length of strings, etc.), units in which the data are stored in the database and units in which they are presented to the user, output formats, and a few flags used for searches (e.g. column used as primary key) or for data presentation (e.g. column displayed in the default presentation of the result). The average table therefore has $\sim 12.7$ columns, or in fact $\sim 11.7$, because each table contains an identification column in addition to the original set of columns.
Note that, since the set of META tables is itself described in VizieR, the meta-dictionary can be viewed and queried like any other catalogue stored in VizieR. This makes it easy to locate, e.g., tables with a large number of rows, or catalogues having the words "mass loss" in the description of one of their columns.
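Because the META tables are ordinary relational tables, the two example queries just mentioned reduce to plain SQL. The following Python sketch uses an in-memory SQLite database with a drastically simplified schema; the column names (catid, tabid, nrows, description, ...) and all catalogue identifiers are hypothetical and do not reproduce the actual VizieR layout:

```python
import sqlite3

# Hypothetical, heavily simplified versions of the three main META tables.
SCHEMA = """
CREATE TABLE METAcat (catid TEXT PRIMARY KEY, title TEXT);
CREATE TABLE METAtab (tabid TEXT PRIMARY KEY,
                      catid TEXT REFERENCES METAcat(catid),
                      caption TEXT, nrows INTEGER);
CREATE TABLE METAcol (tabid TEXT REFERENCES METAtab(tabid),
                      label TEXT, description TEXT, unit TEXT);
"""

def build_demo_db():
    """Create an in-memory meta-dictionary filled with made-up entries."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO METAcat VALUES (?, ?)", [
        ("demo/1", "Hypothetical astrometric survey"),
        ("demo/2", "Hypothetical study of evolved stars"),
    ])
    db.executemany("INSERT INTO METAtab VALUES (?, ?, ?, ?)", [
        ("demo/1/main", "demo/1", "Main catalogue", 2_500_000),
        ("demo/2/table1", "demo/2", "Stellar parameters", 120),
        ("demo/2/refs", "demo/2", "References", 15),
    ])
    db.executemany("INSERT INTO METAcol VALUES (?, ?, ?, ?)", [
        ("demo/1/main", "RAdeg", "Right ascension (J2000)", "deg"),
        ("demo/2/table1", "Mdot", "Mass loss rate", "solMass/yr"),
        ("demo/2/refs", "Ref", "Bibliographic reference", ""),
    ])
    return db

def large_tables(db, min_rows):
    """Tables having at least min_rows rows, largest first."""
    sql = "SELECT tabid FROM METAtab WHERE nrows >= ? ORDER BY nrows DESC"
    return [tabid for (tabid,) in db.execute(sql, (min_rows,))]

def catalogues_mentioning(db, phrase):
    """Catalogues having `phrase` in the description of one of their columns."""
    sql = """SELECT DISTINCT t.catid
             FROM METAcol AS c JOIN METAtab AS t ON t.tabid = c.tabid
             WHERE lower(c.description) LIKE '%' || lower(?) || '%'
             ORDER BY t.catid"""
    return [catid for (catid,) in db.execute(sql, (phrase,))]
```

With `db = build_demo_db()`, `large_tables(db, 10**6)` returns `["demo/1/main"]` and `catalogues_mentioning(db, "mass loss")` returns `["demo/2"]`: the meta-dictionary is searched exactly like a data table.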

The annex tables of the meta-dictionary contain definitions, such as the list of known datatypes (METAtypes) and keywords (METAkwdef), as well as other details: the acronyms used to designate well-known catalogues such as HIP, GSC, ... (METAcro), the keywords associated with each catalogue (METAkwd), detailed notes and remarks (METAnot), and the list of objects individually quoted in the ReadMe files (METAobj). A special indexing scheme (METAcell), briefly explained in Sect. 5.5, was built to locate the existing objects in all catalogues in a single pass. The rules used to generate links are stored in the METAmor table.

  
5.2 Links in VizieR

The value of a link, or anchor in HTML terms, becomes obvious when a table contains a column representing a reference to an original paper, as for example in Véron and Véron's compilation of quasars: once the rules transforming the contents of this column into an actual link to, e.g., the ADS bibliographic service are set up, details about the authors and references, or even the full article, can be displayed with a simple mouse click. Another frequent example is the expansion of a footnote symbol into the full note stored in another table.

The links existing in VizieR may be classified into the following categories:

1. hard-wired links, which are part of the standard description presented in Sect. 4, such as the existence of notes (stored in the METAnot table), or the r_ prefix (Table 5), which indicates a reference that may be detailed in a table of references;
2. internal links, which connect tables of the same catalogue: such links may be expressed in terms of keys in RDBMS terminology (definitions of columns as primary and/or foreign keys), by the existence of note flags, or by more complex relations stored in the METAmor table. Another type of internal link allows one to retrieve the spectra or images which are part of the catalogue but are stored as separate files;
3. VizieR links, which refer to another catalogue within the VizieR system;
4. external links, which refer to any other service, such as bibliographic services, external databases or archives, image servers, etc.

While links in the first three categories are easily maintained, external links depend on modifications completely outside VizieR's control. These external links are therefore managed by the GLU system (Fernique et al. [1998]), which (i) allows one to use symbolic names instead of hard-coded URLs, and (ii) translates these symbolic names with the help of a distributed dictionary, in which each service provider maintains the description of its own services, i.e. their URL addresses and the actual presentation of the query parameters.
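The principle of symbolic links can be illustrated by a short Python sketch. The dictionary entries, the symbolic names `Bib.abstract` and `Img.cutout`, the example.org URLs and the rule table are all hypothetical; the real GLU dictionary is distributed and has its own syntax, which is not reproduced here. A rule attached to a column (in the spirit of METAmor) names a service symbolically, and the dictionary alone knows the concrete URL:

```python
from string import Template
from urllib.parse import quote

# Hypothetical GLU-like dictionary: symbolic name -> URL template.
# In GLU each provider maintains its own entries; here it is a plain dict.
GLU_DICT = {
    "Bib.abstract": Template("http://bib.example.org/abs/${bibcode}"),
    "Img.cutout": Template("http://img.example.org/cut?ra=${ra}&dec=${dec}"),
}

# Hypothetical link rules in the spirit of the METAmor table:
# (table, column) -> (symbolic name, parameter filled by the cell value)
LINK_RULES = {
    ("demo/qso/refs", "Bibcode"): ("Bib.abstract", "bibcode"),
}

def resolve(symbol, **params):
    """Translate a symbolic service name plus parameters into a concrete URL."""
    template = GLU_DICT[symbol]  # raises KeyError for an unknown service
    encoded = {k: quote(str(v), safe="") for k, v in params.items()}
    return template.substitute(encoded)

def cell_link(table, column, value):
    """Return the URL attached to a table cell, or None if the column has no link."""
    rule = LINK_RULES.get((table, column))
    if rule is None:
        return None
    symbol, param = rule
    return resolve(symbol, **{param: value})
```

If the bibliographic service moves, only the `Bib.abstract` entry has to change; every column mapped to that symbol follows automatically, without any modification of the tables themselves.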

  
5.3 VizieR feeding pipeline

On average, about one new catalogue, i.e. 2.6 tables, is added to VizieR every day. This rate imposes the following constraints on the addition of new tables:
1. no human intervention is required to populate the database (the meta-dictionary and the data tables): all metadata related to a catalogue can be found or computed from the documentation and configuration files read by the VizieR feeding pipeline;
2. we rely as much as possible on the standardized description of the catalogues presented in Sect. 4: the configuration file associated with each catalogue should be kept minimal, i.e. as few ad-hoc details as possible should be needed besides the ReadMe files.
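To give an idea of how a standardized description makes unattended ingestion possible, here is a minimal Python sketch that reads the column layout from a simplified "Byte-by-byte Description" section of a ReadMe file and uses it to decode a fixed-width record. The regular expression covers only the simplest form of these lines (single-line explanations, format letters A, I, F and E); it is an illustrative assumption, not the actual VizieR pipeline code:

```python
import re

# Simplified pattern for one line of a "Byte-by-byte Description":
#   start[-end]  Format  Units  Label  Explanations
LINE_RE = re.compile(
    r"^\s*(\d+)(?:\s*-\s*(\d+))?\s+([AIFE])(\d+)(?:\.\d+)?\s+(\S+)\s+(\S+)\s+(.*)$"
)

# Python converter for each Fortran-like format letter.
CONVERTERS = {"A": str, "I": int, "F": float, "E": float}

def parse_byte_description(text):
    """Extract column definitions; lines that do not match are ignored."""
    columns = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # title, header, separator or continuation line
        start = int(m.group(1))
        end = int(m.group(2) or m.group(1))
        # Consistency check: the byte range must match the format width.
        if end - start + 1 != int(m.group(4)):
            raise ValueError(f"byte range and format width disagree: {line!r}")
        columns.append({
            "start": start, "end": end, "type": m.group(3),
            "unit": m.group(5), "label": m.group(6),
            "explanation": m.group(7).strip(),
        })
    return columns

def read_record(columns, record):
    """Decode one fixed-width record; blank fields become None."""
    row = {}
    for col in columns:
        raw = record[col["start"] - 1:col["end"]].strip()  # bytes are 1-based, inclusive
        row[col["label"]] = CONVERTERS[col["type"]](raw) if raw else None
    return row
```

The consistency check illustrates the kind of verification that can replace human inspection: an error in the description is detected before any data reach the database.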

The time required to ingest a new catalogue is currently estimated at between a few minutes and several days for the preparation of the ReadMe description file, depending on the initial presentation supplied by the authors and on the complexity of the catalogue (occasionally longer when problems arise that require interaction with the authors), plus a few seconds to an hour for the actual ingestion into VizieR from the standardized files.


 

 
Table 7: Large catalogues currently implemented in VizieR

  Acronym     Rows ($\times 10^6$)   Catalogue designation
  USNO-A1.0   488.0                  The USNO-A1.0 Catalog (Monet 1997)
  USNO-A2.0   526.3                  The USNO-A2.0 Catalog (Monet 1998), calibrated against Tycho data
  GSC1.1       25.2                  HST Guide Star Catalog, 1992 version
  GSC1.2       25.2                  HST Guide Star Catalog, 1996 version
  GSC-ACT      25.2                  HST Guide Star Catalog, calibrated against Tycho data†
  2MASS        20.2                  $2~\mu$m All Sky Survey, Spring 1999 release (Skrutskie et al. [1997])
  DENIS        17.5                  Deep Near-IR Survey, first release (Epchtein et al. [1999])

  † Calibration made by the Pluto project (http://www.projectpluto.com/gsc_act.htm)


  
5.4 Access to very large catalogues

The second challenge is to provide fast access to the megacatalogues introduced in Sect. 2. This designation was somewhat arbitrarily assigned to catalogues having $10^7$ or more rows. Such large catalogues are essentially surveys used as reference catalogues, typically to find all objects detected in some region of the sky under given conditions of wavelength, time, object structure, etc. The set of such catalogues currently implemented is summarized in Table 7, but it will grow rapidly in the near future with the continuation of the infrared surveys and the emergence of surveys now in preparation (SLOAN, GSC-II, NVSS, ...).

The limit of $10^7$ rows corresponds to a practical limit of the relational database, both in query performance and in the time required to ingest the tables; the largest table, in terms of number of rows, currently stored in the relational database of VizieR is the AC2000 catalogue (Urban et al. [1997]), with $4.62\ 10^6$ rows.

The method used to access these very large catalogues consists of grouping the objects into carefully designed groups, based essentially on their location in the sky, followed by a lossless compression in which the actual values are replaced by offsets within the group; details about the actual results and performance are given in another paper (Derriere & Ochsenbein [1999]). Each very large catalogue currently has its own organisation, which depends on its column contents and therefore requires a dedicated access program. The META dictionary of VizieR (see Sect. 5.1) stores which program must be called to access the catalogue, and the description of the columns as they are returned by that program.
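The principle of grouping by sky region and storing offsets can be illustrated with a toy Python sketch. It assumes, purely for illustration, positions stored as integers in units of 10 mas and fixed 1° × 1° cells in right ascension and declination; the actual groups and encodings used in VizieR are catalogue-specific and are described by Derriere & Ochsenbein [1999]. Because all arithmetic is done on integers, the transformation is exactly reversible, while each offset needs fewer bits than the full coordinate:

```python
from collections import defaultdict

# Hypothetical storage precision: positions kept as integers in units of 10 mas.
UNITS_PER_DEG = 360_000
# Hypothetical cell size: 1 deg x 1 deg in (RA, Dec).
CELL = 1 * UNITS_PER_DEG

def compress(sources):
    """Group integer (ra, dec) positions by sky cell and keep only offsets.

    ra in [0, 360 deg) and dec in [-90, +90 deg], both in integer units.
    """
    cells = defaultdict(list)
    for ra, dec in sources:
        dec_shifted = dec + 90 * UNITS_PER_DEG  # make declination non-negative
        key = (ra // CELL, dec_shifted // CELL)
        cells[key].append((ra % CELL, dec_shifted % CELL))
    return dict(cells)

def decompress(cells):
    """Exact inverse of compress(): cell origin + offset gives back each value."""
    out = []
    for (i, j), offsets in cells.items():
        for dra, ddec in offsets:
            out.append((i * CELL + dra, j * CELL + ddec - 90 * UNITS_PER_DEG))
    return out

# Bits needed for a full right ascension vs an offset inside one cell.
FULL_RA_BITS = (360 * UNITS_PER_DEG - 1).bit_length()  # 27
OFFSET_BITS = (CELL - 1).bit_length()                  # 19
```

With these assumed parameters an offset fits in 19 bits instead of the 27 needed for a full right ascension; the gain comes from storing each cell origin once per group rather than once per object.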

  
5.5 Accessing all catalogues from a position in the sky

To answer quickly the question "which objects, in all available catalogues, lie around a given target position?", an indexing mechanism is necessary. The total number of object positions currently stored in VizieR, excluding the megacatalogues, is about $32\ 10^6$ (Table 6). A classical index in a relational DBMS performs very poorly, especially in the update phase: adding a new catalogue can require up to 4.6 million modifications or additions, which becomes dramatically slow.

The method adopted for this index first maps the celestial coordinates into a set of boxes using a hierarchical spherical-cube projection similar to the technique used by SIMBAD (Wenger et al. [2000]), but down to level 8, which corresponds to a granularity of about 20', i.e. $6\times 4^8$ ($\simeq 4\ 10^5$) individual boxes. For each of these boxes, the list of catalogues having sources in the region of the sky covered by the box is then stored, which gives a fast answer to the question: "which catalogues have a fair chance of containing at least one source close to a specified target?" The final step consists of searching each matching catalogue in turn.
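A minimal Python sketch of this box index follows. It uses a plain gnomonic projection of the sphere onto the six faces of a cube, with a quadtree numbering of the boxes on each face; this is an assumed stand-in, not necessarily the exact projection used by SIMBAD or VizieR. With 256 × 256 boxes on each 90° face, a box spans about 0.35° (roughly 21') near the centre of a face, and there are $6\times 4^8 = 393\,216$ boxes in total. The sketch looks up only the box containing the target; a real search within a radius would also have to examine neighbouring boxes:

```python
import math
from collections import defaultdict

def radec_to_vec(ra_deg, dec_deg):
    """Unit vector for equatorial coordinates given in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def cube_face(x, y, z):
    """Face 0-5 (dominant axis and its sign) and gnomonic coordinates u, v in [-1, 1]."""
    comps = (x, y, z)
    axis = max(range(3), key=lambda k: abs(comps[k]))
    face = 2 * axis + (0 if comps[axis] > 0 else 1)
    u, v = (comps[k] / abs(comps[axis]) for k in range(3) if k != axis)
    return face, u, v

def box_index(ra_deg, dec_deg, level=8):
    """Global box number in [0, 6 * 4**level); the parent box is index >> 2."""
    face, u, v = cube_face(*radec_to_vec(ra_deg, dec_deg))
    n = 1 << level                          # n x n boxes on each face
    i = min(int((u + 1) / 2 * n), n - 1)    # clamp u == 1 into the last box
    j = min(int((v + 1) / 2 * n), n - 1)
    morton = 0
    for bit in range(level):                # interleave the bits of i and j
        morton |= ((i >> bit) & 1) << (2 * bit)
        morton |= ((j >> bit) & 1) << (2 * bit + 1)
    return face * (1 << (2 * level)) + morton

def build_index(catalogues, level=8):
    """{catalogue id: [(ra, dec), ...]} -> {box number: set of catalogue ids}."""
    index = defaultdict(set)
    for cat_id, positions in catalogues.items():
        for ra, dec in positions:
            index[box_index(ra, dec, level)].add(cat_id)
    return dict(index)

def candidate_catalogues(index, ra_deg, dec_deg, level=8):
    """Catalogues having at least one source in the box containing the target."""
    return index.get(box_index(ra_deg, dec_deg, level), set())
```

Interleaving the bits of the two face coordinates makes the numbering hierarchical: dropping the last two bits of a box number gives the number of the enclosing box one level up.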

The method is hierarchical: 6 boxes are defined at level 0, 24 at level 1, and so on, going down one step in the hierarchy consisting of dividing each box into four parts. The indexing mechanism recursively replaces groups of contiguous non-empty boxes by the single box covering them at the upper level, so that a dense survey covering the whole sky is represented in this index by just the 6 boxes of level 0. In practice, the 1 247 catalogues with positions are summarized in this index by $3.9\ 10^6$ elements (to be compared with the $31.6\ 10^6$ sources in Table 6), i.e. an average of about 3 000 elements per catalogue.
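The recursive grouping can be sketched in Python as follows. The sketch assumes a box numbering in which the four children of box p are 4p, ..., 4p + 3 at the next level (so that the parent of box b is b >> 2); this is a common quadtree convention, assumed here rather than taken from VizieR. Whenever all four sub-boxes of a box are non-empty, they are replaced by that box, and the process is repeated level by level:

```python
def compact(boxes, level):
    """Represent a set of non-empty boxes at `level` by as few boxes as possible.

    Assumes the children of box p are 4p .. 4p+3 at the next level down,
    so the parent of box b is b >> 2. Returns a set of (level, box) pairs.
    """
    result = set()
    current = set(boxes)
    for lev in range(level, 0, -1):
        children = {}
        for b in current:
            children.setdefault(b >> 2, set()).add(b)
        merged = set()
        for parent, kids in children.items():
            if len(kids) == 4:
                merged.add(parent)                     # all four sub-boxes non-empty
            else:
                result.update((lev, k) for k in kids)  # keep them at this level
        current = merged
    result.update((0, b) for b in current)             # surviving level-0 boxes
    return result
```

A catalogue filling a whole cube face at level 8 thus collapses from $4^8 = 65\,536$ boxes to a single level-0 element, which is why a dense all-sky survey ends up represented by only 6 entries.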

  
5.6 Current contents


Figure 2: Histogram of the number of rows among VizieR tables (the darker bars correspond to tables containing celestial coordinates)

The status of the VizieR contents is presented in Table 6, where we distinguish the tables containing data about actual astronomical objects, which can be accessed by their position in the sky. In terms of number of records, these tables represent over 78% of the total even when the megacatalogues are omitted, although they make up only 32% of the tables. In other words, the average table dealing with actual astronomical objects contains around 16 000 rows. This is only a theoretical mean, as can be seen from the histogram of table sizes in VizieR (Fig. 2), whose modal value lies around tables of 100 objects.
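For reference, the percentages and the mean quoted above follow directly from the figures of Table 6 (rows excluding the megacatalogues):

```latex
\frac{31.6\ 10^6}{40.3\ 10^6} \simeq 0.784, \qquad
\frac{1\,929}{6\,071} \simeq 0.318, \qquad
\frac{31.6\ 10^6}{1\,929} \simeq 1.6\ 10^4 \ \text{rows per table.}
```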



Copyright The European Southern Observatory (ESO)