Up: The NASA Astrophysics Data


3 Description of the data

The original set of data from STI contained several basic fields (author, title, keywords, and abstract) to be indexed and made searchable. All records were keyed on STI's accession number, a nine-character code consisting of a letter prefix (A or N), a two-digit publication year, a hyphen, and a five-digit identifier (e.g. A95-12345). Data were stored in files named by accession number.

With the inclusion of data from other sources, primarily the journal publishers and SIMBAD, we extended STI's concept of the accession number to cover other abstracts as well. Since the ADS may receive the same abstract from multiple sources, we originally adopted a scheme in which copies of an abstract received from different sources share the same accession number except for the prefix letter. Thus, the same abstract as the STI record above would be listed as J95-12345 from the journal publisher and S95-12345 from SIMBAD. This allowed the indexing routines to consider only one instance of each record when indexing. Recently, limitations in the format of accession numbers and the desire to index data from multiple sources (rather than just STI's version) have prompted us to move to a data storage system based entirely on the bibliographic code.
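A minimal Python sketch of the prefix-letter scheme follows. The function names are hypothetical, and the sketch assumes, as the scheme implies, that the part after the prefix letter identifies the abstract, so that records sharing it are copies of one abstract from different sources.

```python
import re

# One prefix letter, a two-digit year, a hyphen and a five-digit identifier,
# as in A95-12345.
ACCESSION_RE = re.compile(r"^([A-Z])(\d{2})-(\d{5})$")


def parse_accession(acc: str) -> tuple[str, str, str]:
    """Split an accession number into (prefix, year, identifier)."""
    match = ACCESSION_RE.match(acc)
    if match is None:
        raise ValueError(f"not an accession number: {acc!r}")
    return match.group(1), match.group(2), match.group(3)


def with_prefix(acc: str, prefix: str) -> str:
    """Same abstract, different source: swap only the prefix letter."""
    _, year, ident = parse_accession(acc)
    return f"{prefix}{year}-{ident}"


def dedupe(accessions: list[str]) -> list[str]:
    """Keep the first record seen for each year/identifier pair, so that
    only one instance of each abstract reaches the indexer."""
    seen: set[tuple[str, str]] = set()
    kept = []
    for acc in accessions:
        _, year, ident = parse_accession(acc)
        if (year, ident) not in seen:
            seen.add((year, ident))
            kept.append(acc)
    return kept


print(with_prefix("A95-12345", "J"))  # J95-12345
print(dedupe(["A95-12345", "J95-12345", "S95-12345", "N95-00007"]))
# ['A95-12345', 'N95-00007']
```

The example also hints at one limitation mentioned above: the scheme relies entirely on the shared remainder of the number, so two unrelated abstracts that happen to share it could not be told apart.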

3.1 Bibliographic codes

The concept of a unique bibliographic code to identify an article was originally conceived by SIMBAD and NED (NASA's Extragalactic Database, [Helou & Madore 1988]). The original specification is detailed in [Schmitz et al. 1995]. Since then, the ADS has adopted their definition and expanded it to describe references outside the scope of those projects.

The bibliographic code is a 19-character string composed of several fields, which usually enables a user to identify the full reference from that string alone. It is defined as follows:

YYYYJJJJJVVVVMPPPPA

where the fields are defined in Table 1.


Table 1: Bibliographic code definition (e.g. 1996A&AS..115....1S)
Field Definition Example
YYYY Publication Year 1997
JJJJJ Journal Abbreviation ApJ, A&A, MNRAS, etc.
VVVV Volume Number 480
M Qualifier for Publication L (for Letter), P (for Pink Page)
    Q, R, S, etc. for unduplicating
    a, b, c, etc. for issue number
PPPP Page Number 129
A First Letter of the First Author's Surname N
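Because every field has a fixed width (4 + 5 + 4 + 1 + 4 + 1 = 19 characters), a bibliographic code can be split by position alone. The following is a hypothetical Python sketch, not ADS code; it strips the period padding described below and folds a digit in the M field back into the page number.

```python
from typing import NamedTuple

# Field widths from Table 1: YYYY JJJJJ VVVV M PPPP A (4+5+4+1+4+1 = 19).
FIELD_WIDTHS = (4, 5, 4, 1, 4, 1)


class Bibcode(NamedTuple):
    year: str
    journal: str
    volume: str
    qualifier: str
    page: str
    initial: str


def parse_bibcode(code: str) -> Bibcode:
    """Split a 19-character bibliographic code into its fixed-width fields."""
    if len(code) != sum(FIELD_WIDTHS):
        raise ValueError(f"expected 19 characters, got {len(code)}: {code!r}")
    fields, pos = [], 0
    for width in FIELD_WIDTHS:
        fields.append(code[pos:pos + width])
        pos += width
    year, journal, volume, qualifier, page, initial = fields
    # Undo the period padding: the journal is left-justified, volume and page
    # are right-justified, and an unused qualifier is a single period.
    journal, volume, page = journal.rstrip("."), volume.lstrip("."), page.lstrip(".")
    qualifier = "" if qualifier == "." else qualifier
    # For pages above 9999 the M field holds the page's first digit.
    if qualifier.isdigit():
        qualifier, page = "", qualifier + page
    return Bibcode(year, journal, volume, qualifier, page, initial)


print(parse_bibcode("1996A&AS..115....1S"))
# Bibcode(year='1996', journal='A&AS', volume='115', qualifier='', page='1', initial='S')
```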

The journal field is left-justified and the volume and page fields are right-justified. Blank spaces and leading zeroes are replaced by periods. For articles with page numbers greater than 9999, the M field contains the first digit of the page number.
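These rules are enough to assemble a code from its parts. The sketch below is a hypothetical illustration (the function name and argument order are our own); it takes volume and page as integers, so leading zeros never arise and the period padding covers both blanks and leading zeros.

```python
def make_bibcode(year: int, journal: str, volume: int, page: int,
                 initial: str, qualifier: str = "") -> str:
    """Assemble a 19-character bibliographic code from its fields."""
    if len(journal) > 5:
        raise ValueError(f"journal abbreviation too long: {journal!r}")
    vol = str(volume)
    if len(vol) > 4:
        raise ValueError(f"volume too long: {volume}")
    if len(qualifier) > 1:
        raise ValueError("qualifier must be a single character")
    pg = str(page)
    if len(pg) == 5:
        # Pages above 9999: the first digit goes into the M field, which
        # therefore cannot also carry a qualifier such as "L".
        if qualifier:
            raise ValueError("M field is needed for the page number")
        qualifier, pg = pg[0], pg[1:]
    elif len(pg) > 5:
        raise ValueError(f"page number too long: {page}")
    return (
        f"{year:04d}"
        + journal.ljust(5, ".")    # journal: left-justified
        + vol.rjust(4, ".")        # volume: right-justified
        + (qualifier or ".")       # M: qualifier, or a period if unused
        + pg.rjust(4, ".")         # page: right-justified
        + initial[0].upper()       # first letter of first author's surname
    )


print(make_bibcode(1996, "A&AS", 115, 1, "S"))  # 1996A&AS..115....1S
```

Keeping the length checks explicit makes a violation of the 19-character layout fail loudly instead of silently producing a shifted code.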

Creating bibliographic codes for the astronomical journals is uncontroversial. Each journal typically has a commonly used abbreviation, and the volume and page are easily assigned (e.g. 1999PASP..111..438F). Page numbering usually runs continuously through each volume, and in those cases where more than one article appears on a page (such as errata), a "Q", "R", "S", etc. is used as the qualifier for publication to make the bibliographic codes unique. When page numbering restarts with each issue (as in Sky & Telescope), several articles in one volume may start on the same page number, so the issue number is represented by a lowercase letter in the qualifier for publication (e.g. "a" for issue 1).
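The two uses of the qualifier described above can be sketched as follows. The mapping of issue numbers to letters follows the text; continuing the unduplicating sequence past "S", and giving a letter to every colliding article, are assumptions of this hypothetical sketch.

```python
import string


def issue_qualifier(issue: int) -> str:
    """Lowercase qualifier for journals whose page numbering restarts
    with each issue: issue 1 -> 'a', issue 2 -> 'b', and so on."""
    if not 1 <= issue <= 26:
        raise ValueError(f"issue number out of range: {issue}")
    return string.ascii_lowercase[issue - 1]


def unduplicating_qualifiers(count: int) -> list[str]:
    """Qualifiers for `count` articles that would otherwise share a code
    (same year, journal, volume, page and initial), e.g. errata on one page.
    Assumed policy: every colliding article gets a letter, in order."""
    letters = "QRSTUVWXYZ"
    if not 1 <= count <= len(letters):
        raise ValueError(f"cannot unduplicate {count} articles")
    return list(letters[:count])


print(issue_qualifier(3))           # c
print(unduplicating_qualifiers(3))  # ['Q', 'R', 'S']
```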

Creating bibliographic codes for the "grey" literature, such as conference proceedings and technical reports, is more difficult. Expanding the ADS to include these additional types of data required us to modify the original prototype definition of the bibliographic code so that it yields identifiers easily recognizable to the user. The prototype definition suggested using a single letter in the second place of the volume field to identify non-standard references (catalogs, PhD theses, reports, preprints, etc.) and using the third and fourth places of that field to unduplicate and report volume numbers (e.g. 1981CRJS..R.3...14W). Because we felt that this produced codes the typical user could not recognize, and because NED and SIMBAD did not consider it necessary for users to identify books directly from their bibliographic codes, the ADS adopted different rules for creating codes for the grey literature.

It is straightforward to create bibliographic codes for conference proceedings which are part of a numbered series. For example, the IAU Symposia Series (IAUS) has volume numbers and therefore fits the journal model for bibliographic codes. Other conference proceedings, books, colloquia, and reports in the ADS typically contain a four-letter word in the volume field, such as "conf", "proc", "book", "coll", or "rept". In that case, the journal field typically consists of the first letters of the important words in the title, which can allow the user to identify a conference proceeding at a glance (e.g. a journal field formed from the initials of "Information and On-Line Data in Astronomy"). We often leave the fifth place of the journal field as a dot for readability. For most proceedings which are also published as part of a series (e.g. ASP Conference Series, IAU Colloquia, AIP Conference Series), we include two bibliographic codes in the system: one as described above and one which contains the series name and the volume (see Sect. 5.1). We do this so that users can see, for example, that a paper published in one of the "Astronomical Data Analysis Software and Systems" volumes is clearly labelled as "adass", whereas a typical user might not remember which volume of ASPC contained those ADASS papers. This makes bibliographic codes easier for users to read.
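The title-initials rule can be sketched as below. The list of skipped words is an assumption (the text only says "important words"), as are the lowercasing, which matches codes such as "adass", and the choice to keep at most four initials so that the fifth place stays a dot. The output is what this hypothetical sketch produces, not necessarily the code used in the ADS.

```python
# Hypothetical list of words treated as unimportant; the text does not say
# which words are skipped.
STOP_WORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to"}

# Four-letter volume-field words named in the text.
VOLUME_WORDS = {"conf", "proc", "book", "coll", "rept"}


def proceedings_prefix(year: int, title: str, kind: str,
                       max_initials: int = 4) -> str:
    """Return the year, journal and volume fields (13 characters) for a
    proceedings volume that has no series volume number."""
    if kind not in VOLUME_WORDS:
        raise ValueError(f"unknown volume word: {kind!r}")
    if not 1 <= max_initials <= 5:
        raise ValueError("journal field holds at most 5 characters")
    initials = "".join(
        word[0].lower()
        for word in title.split()
        if word.lower() not in STOP_WORDS and word[0].isalpha()
    )
    # Truncate, then pad with periods; with 4 initials the fifth place is a dot.
    journal = initials[:max_initials].ljust(5, ".")
    return f"{year:04d}{journal}{kind}"


print(proceedings_prefix(1995, "Information and On-Line Data in Astronomy", "book"))
# 1995ioda.book
```

Hyphenated words such as "On-Line" are kept whole here, so they contribute a single initial.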

In the STI data, it was often unclear whether an article came from a conference proceeding, a meeting, a colloquium, etc. We assigned those codes as best we could, making no significant distinction between them. For conference abstracts submitted by the editors of proceedings prior to publication, we often do not have page numbers. In this case, we use a counter in lieu of a page number and an "E" (for "Electronic") in the fourteenth column, the qualifier for publication. If these conference abstracts are later published, their bibliographic codes are replaced by bibliographic codes complete with page numbers. If the conference abstracts are published only on-line, they retain their electronic bibliographic codes with the E and counter number.

There are several other datasets whose bibliographic codes are non-standard. PhD theses in the system use "PhDT" as the journal abbreviation, contain no volume number, and contain a counter in lieu of a page number. Since every bibliographic code must be unique across all of the databases, the counter ensures that each code identifies only one thesis. IAU Circulars also use a counter instead of a page number. Current Circulars are electronic, and although it is not technically a new page, the second item of an IAU Circular is the electronic equivalent of a second page. Using the page number as a counter enables us to minimize the use of the M qualifier in the fourteenth place of a bibliographic code for unduplicating. This is desirable because codes containing such qualifiers are essentially impossible to create a priori, either by the journals or by users.
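Counter-based codes, as used for PhD theses and for electronic conference abstracts, can be sketched as follows. The allocation policy (take the lowest counter not already in use), the function names, and the journal field "xyz" in the example are hypothetical.

```python
def counter_bibcode(year: int, journal: str, counter: int, initial: str,
                    volume: str = "", electronic: bool = False) -> str:
    """Bibliographic code with a counter in place of the page number."""
    if not 1 <= counter <= 9999:
        raise ValueError(f"counter out of range: {counter}")
    return (
        f"{year:04d}"
        + journal.ljust(5, ".")
        + volume.rjust(4, ".")
        + ("E" if electronic else ".")  # "E" marks electronic abstracts
        + str(counter).rjust(4, ".")
        + initial[0].upper()
    )


def allocate(existing: set[str], year: int, journal: str, initial: str,
             volume: str = "", electronic: bool = False) -> str:
    """Take the lowest counter whose code is not already in `existing`,
    record it, and return the new code."""
    for counter in range(1, 10000):
        code = counter_bibcode(year, journal, counter, initial, volume, electronic)
        if code not in existing:
            existing.add(code)
            return code
    raise RuntimeError("all counters in use")


codes: set[str] = set()
print(allocate(codes, 1998, "PhDT", "Smith"))
print(allocate(codes, 1998, "PhDT", "Stone"))  # same initial, next counter
print(allocate(codes, 2000, "xyz", "K", volume="conf", electronic=True))
```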

The last set of data currently in the ADS with non-standard bibliographic codes consists of the "QB" book entries from the Library of Congress. QB is the Library of Congress classification for astronomy, and we have put approximately 17 000 of these references in the system. Because the QB numbers are identifiers in their own right, we have made an exception to the bibliographic code format and use the QB number (complete with any series or part numbers), prepended with the publication year, as the bibliographic code. Such an entry is easily identifiable as a book, and these codes enable users to locate the books in most libraries.

It is worth noting that while the bibliographic code makes identification simple for the vast majority of references in the system, we are aware of two instances where the bibliographic code definition breaks down. First, the use of the fourteenth column for a qualifier such as "L" for ApJ Letters makes it impossible to use that column for unduplicating. Therefore, if two errata with the same author initial appear on the same page, there is no way to create unique bibliographic codes for them. We are aware of only one such instance in the 33 years of publication of ApJ Letters. Second, with the electronic publishing of an increasing number of journals, page numbers are no longer required to locate articles. The journal Physical Review D currently uses 6-digit article identifiers as page numbers. Since the bibliographic code allows for page numbers of at most 5 digits (the M field plus the four-character page field), we are currently converting these 6-digit identifiers to their 5-digit hexadecimal equivalents. Both of these anomalies indicate that over the next few years we will likely need to alter the current bibliographic code definition in order to allow consistent identification of articles for all journals.
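Converting 6-digit identifiers into a 5-character base-16 (hexadecimal) form works because they run from 100000 (0x186A0) to 999999 (0xF423F), all of which need exactly 5 hexadecimal digits. A minimal sketch, assuming uppercase letters A to F (the case is not stated) and that the first character occupies the M field, just as for page numbers above 9999:

```python
def article_id_to_page(article_id: int) -> str:
    """Encode a 6-digit article identifier as 5 hexadecimal digits."""
    if not 100_000 <= article_id <= 999_999:
        raise ValueError(f"not a 6-digit identifier: {article_id}")
    # 100000 = 0x186A0 and 999999 = 0xF423F, so the result is always 5 digits.
    return format(article_id, "X")


def page_to_article_id(page: str) -> int:
    """Invert the encoding."""
    return int(page, 16)


page = article_id_to_page(123456)
print(page, page[0], page[1:])  # 1E240 1 E240  (M field, then PPPP field)
```

The encoding is reversible, so the original article identifier can always be recovered from the code.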

3.2 Data fields

The databases are set up such that some data fields are searchable and others are not. The searchable fields (title, author, and text) contain the bulk of the important data, and these fields are indexed so that a query to the database returns the maximum number of meaningful results (see [Accomazzi et al. 2000], hereafter ARCHITECTURE). The text field is the union of the abstract, title, keywords, and comments. Thus, if a user requests a particular word in the text field, all papers are returned which contain that word in the abstract OR in the title OR in the keywords OR in the comments.

Appendix A shows version 1.0 of the Extensible Markup Language (XML, see Sect. 3.4) Document Type Definition (DTD) for text files in the ADS Abstract Service. The DTD lists fields currently used or expected to be used in text files in the ADS (see Sect. 5.2 for details on the text files). We intend to reprocess the current journal and affiliation fields in order to extract some of these fields.

Since STI ceased abstracting the journal literature, we have decided, for the time being, to make keywords no longer searchable as a separate entity; they are searchable only through the text field. STI used a standard set of keywords different from that of the AAS journals, which in turn differs from the set used by the AIP journals (e.g. AJ prior to 1998). In addition, the keywords of a single journal such as the Astrophysical Journal (ApJ) have evolved over the years, so that keywords in early ApJ volumes are not consistent with those in later volumes. In order to build one coherent set of keywords, an equivalence or synonym table for these different keyword sets must be created. We are investigating different schemes for doing this, and currently plan to reintroduce a searchable keyword field which encompasses all keywords in the system and equates similar keywords from different keyword systems ([Lee et al. 1999]).
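The OR semantics of the text field can be illustrated with a toy inverted index. This is a hypothetical simplification in Python with invented record keys; the actual indexing is described in ARCHITECTURE and is considerably more elaborate.

```python
import re
from collections import defaultdict

# The four parts that make up the searchable text field.
TEXT_PARTS = ("abstract", "title", "keywords", "comments")


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_text_index(records: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Map each word to the records whose text field contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for key, record in records.items():
        for part in TEXT_PARTS:
            for word in tokenize(record.get(part, "")):
                index[word].add(key)
    return index


def search_text(index: dict[str, set[str]], word: str) -> set[str]:
    """A word matches if it occurs in ANY of the four parts."""
    return set(index.get(word.lower(), set()))


records = {
    "p1": {"title": "Dark matter halos"},
    "p2": {"abstract": "We model galaxy halos.", "comments": "erratum"},
    "p3": {"keywords": "galaxies: clusters"},
}
index = build_text_index(records)
print(sorted(search_text(index, "halos")))  # ['p1', 'p2']
```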

The current non-searchable fields in the ADS databases include the journal field, author affiliation, category, abstract copyright, and abstract origin. Although we may decide to create an index and search interface for some of these (such as category), others will remain unsearchable because searching them is of little use to the typical user. Author affiliations would be useful to search; however, this information is so inconsistently formatted that it is virtually impossible to collect all variations of a given institution's name for coherent indexing. Furthermore, we have author affiliations for only about half of the entries in the Astronomy database, so we have decided to keep this field non-searchable. For researchers wishing to analyze affiliations on a large scale, we can provide this information on a collaborative basis.

3.3 Data sources

The ADS currently receives abstracts or table of contents (ToC) references from almost two hundred journal sources. Tables 2, 3, and 4 list these journals, along with their bibliographic code abbreviation, source, the frequency with which we receive the data, the kind of data received, and any links we can create to the data. ToC references typically contain only author and title, although keywords are sometimes included as well. The data are contributed via email or ftp, or retrieved from web sites around the world, at a frequency ranging from once a week to approximately once a year. In the "How Often" column, "often" means that we receive data more frequently than once a month, but not necessarily on a regular basis; "occasionally" is used for journals that submit data to us infrequently.


Table 2: The ADS astronomy database
Journal Source Full Name How Often Kind of Data Links a
A&A Springer Astronomy & Astrophysics 3$\times$ month abstracts E, F
A&ARv Springer Astronomy & Astrophysics Review occasionally abstracts F
A&AS EDP Sciences Astronomy & Astrophysics Supplement 2$\times$ month abstracts E, F, R
AcA AcA Acta Astronomica 4$\times$ year abstracts G
ADIL NCSA ADIL Astronomy Data Image Library occasionally abstracts D
AdSpR Elsevier Advances in Space Research often abstracts  
AGAb AG b Astronomische Gesellschaft Abstracts occasionally abstracts  
AJ UCP c Astronomical Journal monthly abstracts E, F, R
AN AN Astronomische Nachrichten bimonthly abstracts  
Ap&SS Kluwer Astrophysics and Space Science often abstracts  
APh Elsevier Astroparticle Physics bimonthly abstracts E
ApJ UCP Astrophysical Journal 3$\times$ month abstracts E, F, R
ApJL UCP Astrophysical Journal Letters 3$\times$ month abstracts E, F, R
ApJS UCP Astrophysical Journal Supplement monthly abstracts E, F, R
ARA&A AnnRev Annual Review of Astronomy and Astrophysics 1$\times$ year abstracts E, F
AREPS AnnRev Annual Review of Earth and Planetary Sciences 1$\times$ year abstracts E, F
ARep AIP d Astronomy Reports bimonthly abstracts M
AstL AstL Astronomy Letters bimonthly abstracts M
ATel ATel The Astronomer's Telegram often abstracts E
AVest AVest Astronomicheskii Vestnik bimonthly abstracts  
BAAS AAS Bulletin of the American Astronomical Society 2$\times$ year (AAS) abstracts  
BAAS AAS Bulletin of the American Astronomical Society 1$\times$ year (DDA) abstracts  
BAAS AAS Bulletin of the American Astronomical Society 1$\times$ year (DPS) abstracts  
BAAS AAS Bulletin of the American Astronomical Society 1$\times$ year (HEA) abstracts  
BAAS AAS Bulletin of the American Astronomical Society 1$\times$ year (SPD) abstracts  
BaltA BaltA Baltic Astronomy 4$\times$ year abstracts R
BeSN BeSN BE Star Newsletter occasionally ToC G
BOBeo BOBeo Bulletin Astronomique de Belgrade occasionally abstracts G
Books CUP Cambridge University Press occasionally ToC  
Books LOC Library of Congress occasionally abstracts L
Books Springer Springer Verlag occasionally ToC  
Books USci University Science Books Publishers occasionally abstracts M
Books Wiley Wiley Publishers occasionally abstracts  
CeMDA Kluwer Celestial Mechanics and Dynamical Astronomy often abstracts  
ChA&A Elsevier Chinese Astronomy and Astrophysics 4$\times$ year ToC  
CoKon Konkoly Communications of the Konkoly Observatory occasionally abstracts G
Conferences Boon Priscilla Boon, Conference Proceedings occasionally ToC  
Conferences Editors Conference Proceeding Editor Submissions often abstracts  
Conferences ESO European Southern Observatory Library monthly ToC  
Conferences LPI Lunar and Planetary Institute Proceedings occasionally abstracts F
Conferences STSci Space Telescope Science Institute Library monthly ToC  
Conferences UTAL University of Toronto Library weekly ToC  
CoSka CoSka Contributions from the Ast. Obs. Skalnate Pleso occasionally abstracts G
DSSN DSSN Delta Scuti Star Newsletter occasionally abstracts E
DyAtO Elsevier Dynamics of Atmospheres and Oceans occasionally ToC  
E&PSL Elsevier Earth & Planetary Science Letters occasionally ToC E, D
EM&P Kluwer Earth, Moon, and Planets often abstracts  
ESRv Elsevier Earth Science Reviews occasionally ToC  
ExA Kluwer Experimental Astronomy occasionally abstracts  
FCPh OPA e Fundamentals of Cosmic Physics occasionally abstracts  
GeCoA Elsevier Geochimica et Cosmochimica Acta often ToC  
GeoRL AGU f Geophysical Research Letters 2$\times$ month ToC E, F
GeoJI Blackwell Geophysical Journal International 2$\times$ month abstracts E, F, R
GReGr Plenum General Relativity and Gravitation monthly abstracts  
IAUC CBAT g IAU Circulars weekly abstracts E
IBVS Konkoly Information Bulletin on Variable Stars often abstracts E, F
Icar AP h Icarus monthly abstracts E, F, R

Table 2: continued
Journal Source Full Name How Often Kind of Data Links a
IrAJ IrAJ Irish Astronomical Journal 2$\times$ year abstracts  
JASS JASS Journal of Astronomy and Space Sciences occasionally abstracts F
JAVSO AAVSO Journal of the A.A.V.S.O. occasionally abstracts  
JBAA BAA Journal of the British Astronomical Association bimonthly abstracts  
JIMO IMO Journal of the International Meteor Organization occasionally abstracts  
JGR AGU Journal of Geophysical Research A (Space Physics) monthly ToC  
JGR AGU Journal of Geophysical Research E (Planets) monthly ToC  
JKAS KAS Journal of the Korean Astronomical Society occasionally abstracts  
JRASC RASC Journal of the Royal Astronomical Society of Canada occasionally abstracts  
M&PS M&PS Meteoritics & Planetary Science bimonthly abstracts  
MNRAS Blackwell Monthly Notices of the Royal Astronomical Society 3$\times$ month abstracts E, F, R
MPEC CBAT Minor Planet Electronic Circulars weekly abstracts E
Nature Nature Nature weekly abstracts  
NewA Elsevier New Astronomy often abstracts E
NewAR Elsevier New Astronomy Reviews (formerly VA) occasionally abstracts  
OAP OAP Odessa Astronomical Publications occasionally abstracts  
Obs Obs The Observatory occasionally ToC  
P&SS Elsevier Planetary and Space Science monthly ToC  
PASA PASA Publications of the Astronomical Society of Australia 2$\times$ year abstracts E, F
PASJ PASJ Publications of the Astronomical Society of Japan bimonthly abstracts R
PASP UCP Publications of the Astronomical Society of the Pacific monthly abstracts E, F, R
PDS PDS Planetary Data System occasionally abstracts P
PEPI Elsevier Physics of the Earth and Planetary Interiors monthly ToC  
PhDT UMass University of Massachusetts occasionally abstracts D
PhDT UMI University Microfilm, Inc. occasionally abstracts M
PKAS KAS Publications of the Korean Astronomical Society occasionally abstracts  
RvMA AG Reviews of Modern Astronomy occasionally ToC  
RMxAC UNAM i Revista Mexicana Conference Series occasionally ToC  
S&T Sky Publishing Sky & Telescope 2$\times$ year ToC  
Sci Science Science weekly ToC E
SoPh Stet. Solar Physics often abstracts  
SSRv Kluwer Space Science Reviews often abstracts  
VA Elsevier Vistas in Astronomy occasionally ToC  
Various ARI j Veröffentlichungen ARI occasionally ToC L
Various Authors Author Submissions often abstracts  
Various Knudsen Helen Knudsen's Monthly Index of Astronomy occasionally ToC  
Various NED NASA Extragalactic Database occasionally ToC N
Various SIMBAD SIMBAD 2$\times$ month ToC D, S
Various STI NASA's Science and Technical Index 2$\times$ month abstracts  
Various USNO United States Naval Observatory Library occasionally ToC  

Note a: Letter codes describing what da...
...l Autonoma de Mexico.
Note j: Astronomisches Rechen-Institut.


Table 3: The ADS instrumentation database
Journal Source Full Name How Often Kind of Data Links a
ACAau Elsevier Acta Astronautica often ToC  
ApOpt OSA b Applied Optics often abstracts M
ApScR Kluwer Applied Scientific Research occasionally ToC  
ChJLB OSA Chinese Journal of Lasers B occasionally abstracts  
IJQE OSA Journal of Quantum Electronics often abstracts  
JBO SPIE c Journal of Biomedical Optics occasionally abstracts  
JEI SPIE Journal of Electronic Imaging occasionally abstracts  
JEnMa Kluwer Journal of Engineering Mathematics occasionally ToC  
JMiMi IOP d Journal of Micromechanics & Microengineering often ToC E
JO IOP Journal of Optics often ToC E
JOptT OSA Journal of Optical Technology often abstracts M
JVST AIP Journal of Vacuum & Science Technology often ToC M
OptCo Elsevier Optics Communications often abstracts  
OptEn SPIE Optical Engineering often abstracts  
OptFT AP Optical Fiber Technology often abstracts  
OptL OSA Optics Letters often ToC M
OptLE Elsevier Optics and Lasers in Engineering bimonthly ToC  
OptLT Elsevier Optics & Laser Technology occasionally ToC  
OptPN AIP Optics & Photonics News often abstracts M
OptSp OSA Optics and Spectroscopy often abstracts M
OSAJ AIP Journal of the Optical Society of America A often abstracts M
OSAJB AIP Journal of the Optical Society of America B often abstracts M
PApO IOP Pure Applied Optics often ToC E
PrAeS Elsevier Progress in Aerospace Sciences occasionally ToC  
RScI AIP Review of Scientific Instruments often ToC M
SPIE SPIE SPIE Proceedings often abstracts M

Note a: Letter codes describing what da...
...for Optical Engineering (SPIE).
Note d: Institute of Physics.


Table 4: The ADS physics database
Journal Source Full Name How Often Kind of Data Links a
AcPhy AIP Acoustical Physics occasionally ToC M
ADNDT AP Atomic Data and Nuclear Data Tables occasionally abstracts  
AnPhy AP Annals of Physics often abstracts  
ApPhL AIP Applied Physics Letters often ToC M
ASAJ AIP Journal of the Acoustical Society of America often ToC M
Chaos AIP Chaos occasionally ToC M
ComPh AIP Computers In Physics occasionally ToC M
CQGra IOP Classical Quantum Gravity often ToC  
Cryo Elsevier Cryogenics occasionally ToC  
CryRp AIP Crystallography Reports occasionally ToC M
CTM IOP Combustion Theory Modelling often ToC  
DokPh AIP Physics - Doklady occasionally ToC  
EJPh IOP European Journal of Physics often ToC  
InfPh Elsevier Infrared Physics and Technology often abstracts  
JAP AIP Journal of Applied Physics often ToC M
JATP Elsevier Journal Atmospheric and Terrestrial Physics occasionally ToC  
JChPh AIP Journal of Chemical Physics often ToC M
JCoPh AP Journal of Computational Physics occasionally abstracts  
JETP AIP JETP occasionally ToC M
JETPL AIP JETP Letters occasionally ToC M
JFS AP Journal of Fluids and Structures occasionally ToC  
JGP Elsevier Journal of Geometry and Physics occasionally ToC  
JLTP OSA Journal of Low Temperature Physics occasionally ToC M
JLwT OSA Journal of Lightwave Technology occasionally ToC  
JMagR AP Journal of Magnetic Resonance occasionally abstracts  
JMMM Elsevier Journal of Magnetism and Magnetic Materials occasionally abstracts  
JMPS AIP Journal of Mathematical Physics often ToC M
JMoSp AP Journal of Molecular Spectroscopy occasionally abstracts  
JNM Elsevier Journal of Nuclear Materials occasionally ToC  
JPCM IOP Journal of the Physics of Condensed Matter often ToC  
JPCRD AIP Journal of Physical and Chemical Reference Data occasionally ToC M
JPCS Elsevier Journal of Physics and Chemistry of Solids occasionally ToC  
JPhA IOP Journal of Physics A: Mathematical General often ToC  
JPhB IOP Journal of Physics B: Atomic Molecular Physics often ToC  
JPhD IOP Journal of Physics D: Applied Physics often ToC  
JPhG IOP Journal of Physics G: Nuclear Physics often ToC  
JRheo AIP Journal of Rheology often ToC M
JSSCh AP Journal of Solid State Chemistry occasionally abstracts  
JSV AP Journal of Sound and Vibration often ToC  
JTePh AIP Journal of Technical Physics occasionally ToC M
MedPh AIP Medical Physics often ToC M
MSMSE IOP Modelling Simul. Mater. Sci. Eng. often ToC  
MSSP AP Mechanical Systems & Signal Processing occasionally abstracts  
NIMPA Elsevier Nuclear Instruments/Methods Physics Research A often abstracts  
NIMPB Elsevier Nuclear Instruments/Methods Physics Research B often abstracts  
Nanot IOP Nanotechnology often ToC  
NDS AP Nuclear Data Sheets occasionally abstracts  
Nonli IOP Nonlinearity often ToC  
NuGeo Elsevier Nuclear Geophysics occasionally ToC  
NuPhA Elsevier Nuclear Physics A weekly abstracts E
NuPhB Elsevier Nuclear Physics B weekly abstracts E
NuPhS Elsevier Nuclear Physics B Proceedings Supplements monthly abstracts E
PAN AIP Physics of Atomic Nuclei occasionally ToC M
PCEB Elsevier Physics and Chemistry of the Earth Part B occasionally ToC  
PCEC Elsevier Physics and Chemistry of the Earth Part C occasionally ToC  
PhFl AIP Physics of Fluids often ToC M
PhLA Elsevier Physics Letters A often abstracts  
PhLB Elsevier Physics Letters B often abstracts  

Table 4: continued
Journal Source Full Name How Often Kind of Data Links a
PhPl AIP Physics of Plasmas often ToC M
PhR Elsevier Physics Reports often ToC  
PhRvA AIP Physical Review A often ToC M
PhRvB AIP Physical Review B often ToC M
PhRvC AIP Physical Review C often ToC M
PhRvD AIP Physical Review D often ToC M
PhRvE AIP Physical Review E often ToC M
PhRvL AIP Physical Review Letters often ToC M
PhSS AIP Physics of the Solid State occasionally ToC M
PhT AIP Physics Today occasionally ToC M
PhyA Elsevier Physica A often ToC  
PhyB Elsevier Physica B often abstracts  
PhyC Elsevier Physica C often ToC  
PhyD Elsevier Physica D often abstracts  
PhyE Elsevier Physica E occasionally abstracts  
PhyEd IOP Physics Education often ToC  
PMB IOP Physics Medicine and Biology often ToC  
PPCF IOP Plasma Physics and Controlled Fusion often ToC  
PPN AIP Physics of Particles and Nuclei occasionally ToC M
PPNP Elsevier Progress in Particle and Nuclear Physics occasionally ToC  
PQE Elsevier Progress in Quantum Electronics occasionally ToC  
PSST IOP Plasma Sources Science Technology often ToC  
QuSOp IOP Quantum Semiclassical Optics often ToC  
RaPC Elsevier Radiation Physics and Chemistry often abstracts  
RPPh IOP Reports on Progress in Physics often ToC  
RvMP AIP Reviews of Modern Physics occasionally ToC M
Semic AIP Semiconductors occasionally ToC M
SeScT IOP Semiconductor Science Technology often ToC  
SMaS IOP Smart Material Structures often ToC  
SuScT IOP Superconductor Science Technology often ToC  
SuMi AP Superlattices and Microstructures occasionally abstracts  
TePhL AIP Technical Physics Letters occasionally ToC M
PhDT UMI University Microfilm, Inc. occasionally abstracts  
WRM IOP Waves Random Media often ToC  

Note a: Letter codes describing what data are available.

Updates to the Astronomy and Instrumentation databases occur approximately every two weeks, or more often if logistically possible, in order to keep the databases current. Recent enhancements to the indexing software have enabled us to perform instantaneous updates, triggered by an email containing new data (see ARCHITECTURE). Updates to the Physics database occur approximately once every two months. As stated earlier, the Preprint database is updated nightly.

3.4 Data formats

The ADS benefits from several standards followed in the writing and submission of astronomical literature. The journals share common abbreviations and text formatting routines, which astronomers use as well. The use of TeX ([Knuth 1984]) and LaTeX ([Lamport 1986]), and of their extensions BibTeX ([Lamport 1986]) and AASTeX ([American Astronomical Society 1999]), results in common formats among some of our data sources. This enables us to reuse parsing routines to convert these formats to our standard format. Other variants of TeX used by journal publishers also allow us to use common parsing routines, which greatly facilitates data loading.

TeX is a public domain typesetting program designed especially for math and science. It is a markup system, which means that formatting commands are interspersed with the text in the TeX input file. In addition to commands for formatting ordinary text, TeX includes many special symbols and commands with which you can format mathematical formulae with both ease and precision. Because of its extraordinary capabilities, TeX has become the leading typesetting system for science, mathematics, and engineering. It was developed by Donald Knuth at Stanford University.

LaTeX is a simplified document preparation system built on TeX. Because LaTeX is available for just about any type of computer and because LaTeX files are ASCII, scientists are able to send their papers electronically to colleagues around the world in the form of LaTeX input. This is also true for other variants of TeX, although the astronomical publishing community has largely centered its publishing standards on LaTeX or on software packages based on LaTeX, such as BibTeX or AASTeX. BibTeX is a program and file format designed by Oren Patashnik and Leslie Lamport in 1985 for the LaTeX document preparation system, and AASTeX is a LaTeX-based package for marking up manuscripts for American Astronomical Society (AAS) journals.

Similar to the widespread acceptance of TeX and its variants, the extensive use of SGML (Standard Generalized Markup Language, [Goldfarb & Rubinsky 1991]) by members of the publishing community has allowed us to standardize many of our parsing routines. All data gleaned from the World Wide Web share features due to the use of HTML (HyperText Markup Language, [Powell & Whitworth 1998]), an application of SGML. Furthermore, the trend towards using XML (Extensible Markup Language, [Harold 1999]) to describe text documents will enable us to share standard document attributes with other members of the astronomical community. XML is a subset of SGML intended to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. The ADS parsing routines benefit from these standards in several ways: we can reuse routines designed around these systems; we are able to preserve original text representations of entities such as embedded accents so that they are displayed correctly in the user's browser; and we are able to capture value-added features such as URLs and email addresses for use elsewhere in our system.
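As an illustration of preserving accents for display, the sketch below converts the common TeX accent commands into named HTML character entities. It is a hypothetical example, not the ADS parser: it handles only the five single-character accent commands, and leaves unchanged any combination for which HTML defines no entity.

```python
import re
from html.entities import name2codepoint

# TeX accent commands and the HTML entity suffix each corresponds to.
ACCENT_NAMES = {'"': "uml", "'": "acute", "`": "grave", "^": "circ", "~": "tilde"}

_ACC = r"""["'`^~]"""
TEX_ACCENT = re.compile(
    r"\{\\(" + _ACC + r")([A-Za-z])\}"      # {\"o}
    + r"|\\(" + _ACC + r")\{([A-Za-z])\}"   # \"{o}
    + r"|\\(" + _ACC + r")([A-Za-z])"       # \"o
)


def _to_entity(match: re.Match) -> str:
    accent = match.group(1) or match.group(3) or match.group(5)
    letter = match.group(2) or match.group(4) or match.group(6)
    name = letter + ACCENT_NAMES[accent]
    # Only emit entities that HTML actually defines; otherwise keep the TeX.
    return f"&{name};" if name in name2codepoint else match.group(0)


def tex_accents_to_html(text: str) -> str:
    return TEX_ACCENT.sub(_to_entity, text)


print(tex_accents_to_html(r'Sch\"onberg, Universit{\'e} de Li\`ege'))
# Sch&ouml;nberg, Universit&eacute; de Li&egrave;ge
```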

In order to facilitate data exchange between different parts of the ADS, we make use of a tagged format similar to the "Refer" format ([Jacobsen 1996]). Refer is a preprocessor for the text formatters nroff and troff that finds and formats references. While our tagged format shares some common fields with Refer (%A, %T, %J, %D), the Refer format is not specific enough for our purposes: items such as objects, URLs, and copyright notices are beyond the scope of its syntax. Details on our tagged format are provided in Table 5. Reading and writing routines for this format are shared by the loading and indexing routines, and a number of our data sources submit abstracts to us in this format.
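The exact tags are listed in Table 5; the reader below is a hypothetical sketch that assumes only the general Refer conventions: each field starts on a line beginning with "%" and a one-letter tag, records are separated by blank lines, any other line continues the previous field, and tags such as %A may repeat.

```python
def parse_tagged(text: str) -> list[dict[str, list[str]]]:
    """Read Refer-style records into dicts mapping tag -> list of values."""
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    last_tag = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record.
            if current:
                records.append(current)
            current, last_tag = {}, None
        elif line.startswith("%") and len(line) > 1:
            tag, value = line[1], line[2:].strip()
            # Repeatable tags such as %A (one line per author) accumulate.
            current.setdefault(tag, []).append(value)
            last_tag = tag
        elif last_tag is not None:
            # Any other line continues the previous field.
            current[last_tag][-1] += " " + line.strip()
        else:
            raise ValueError(f"text outside any field: {line!r}")
    if current:
        records.append(current)
    return records


sample = """%A Smith, J.
%A Jones, K.
%T A title that is long enough
to need a second line
%J ApJ
%D 1999

%A Doe, R.
%T Another paper
"""
records = parse_tagged(sample)
print(records[0]["A"])  # ['Smith, J.', 'Jones, K.']
print(records[0]["T"])  # ['A title that is long enough to need a second line']
```

A matching writer would emit one "%X value" line per entry, which is why the same routines can serve both loading and indexing.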


Copyright The European Southern Observatory (ESO)