3 Application to bibliographic classification: Creation of a bibliographic map

We applied the method just described to the classification of the articles published in Astronomy and Astrophysics from 1994 to 1996 (3325 articles). The descriptors were based on the bibliographic keywords. For this journal, and some others in astronomy, there is a uniquely defined list of keywords. A necessary preliminary step was to homogenize them, since they were assigned by one of the Editors but not in a completely systematic fashion. We kept only the keywords that appear in at least 5 different articles, which limited our descriptors to the 250 most frequent keywords. The documents characterized in this way constitute a set of 3325 stimuli to be applied to the network. Learning over 20 iterations (a number determined heuristically) gives good results for the primary map ( $15 \times 15$ units) and requires about 1 hour of processing on a Sparcstation 10 (the exact time depends on the load from other users). For the secondary maps, the learning time is much shorter, since fewer documents are processed (200 at most) and the network is smaller (25 units).
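The keyword filtering step can be illustrated with a minimal sketch. It assumes that each article is represented by a binary vector marking the presence or absence of each retained keyword; the text above only states that the descriptors are based on keywords, so this encoding, the function name `build_descriptors`, and the toy keywords are illustrative assumptions rather than the actual implementation.

```python
from collections import Counter


def build_descriptors(docs_keywords, min_docs=5):
    """Keep keywords found in at least `min_docs` articles and encode each
    article as a binary presence vector (assumed encoding, see lead-in)."""
    # Document frequency: each keyword is counted once per article.
    doc_freq = Counter(kw for kws in docs_keywords for kw in set(kws))
    vocabulary = sorted(kw for kw, n in doc_freq.items() if n >= min_docs)
    index = {kw: i for i, kw in enumerate(vocabulary)}

    vectors = []
    for kws in docs_keywords:
        vec = [0] * len(vocabulary)
        for kw in kws:
            if kw in index:  # keywords below the threshold are ignored
                vec[index[kw]] = 1
        vectors.append(vec)
    return vocabulary, vectors


# Toy data with hypothetical keywords (not taken from the journal's list).
docs = [["stars", "galaxies"], ["stars"], ["stars", "dust"], ["galaxies"]]
vocab, vecs = build_descriptors(docs, min_docs=2)
```

With the threshold of 5 articles used here, the resulting vocabulary would be the 250 retained keywords, and each of the 3325 articles would yield one 250-component stimulus.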

When the training of all the maps, primary and secondary, is finished, the next task is to make these maps accessible to the user.

3.1 Density maps

At the end of the training of a map, the number of documents assigned to each node is known, which gives a table of numbers. Because colours in an image are much easier to take in than a matrix of numbers, we transformed this table into an image whose colour scale indicates qualitatively the number of documents per node. The primary map is of dimension $15 \times 15$ , and each of the secondary maps is $5 \times 5$ . These images are then scaled up by a factor of 40 (chosen for aesthetic reasons and to fit the default window sizes of the most common Web browsers). The scaling uses linear interpolation, since otherwise the map would show clear discontinuities between nodes.
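A minimal sketch of this step follows, assuming that the interpolation is bilinear and that output pixels are mapped back to node coordinates by their centres; the text specifies only "linear interpolation", and the function names are hypothetical. With a factor of 40, a $15 \times 15$ table becomes a $600 \times 600$ image and a $5 \times 5$ table a $200 \times 200$ image.

```python
def density_table(assignments, rows, cols):
    """Count the documents assigned to each node; `assignments` is a list of
    (row, col) node coordinates, one per document."""
    counts = [[0] * cols for _ in range(rows)]
    for r, c in assignments:
        counts[r][c] += 1
    return counts


def upscale_bilinear(table, factor):
    """Enlarge a 2-D table by `factor` using bilinear interpolation."""
    rows, cols = len(table), len(table[0])
    image = []
    for y in range(rows * factor):
        # Map the pixel centre back to node coordinates, clamped to the grid.
        fy = min(max((y + 0.5) / factor - 0.5, 0.0), rows - 1)
        y0 = int(fy)
        y1 = min(y0 + 1, rows - 1)
        wy = fy - y0
        line = []
        for x in range(cols * factor):
            fx = min(max((x + 0.5) / factor - 0.5, 0.0), cols - 1)
            x0 = int(fx)
            x1 = min(x0 + 1, cols - 1)
            wx = fx - x0
            top = table[y0][x0] * (1 - wx) + table[y0][x1] * wx
            bottom = table[y1][x0] * (1 - wx) + table[y1][x1] * wx
            line.append(top * (1 - wy) + bottom * wy)
        image.append(line)
    return image


# Toy 2 x 2 density table enlarged by a factor of 2.
counts = density_table([(0, 1)] * 4 + [(1, 0)] * 8 + [(1, 1)] * 12, 2, 2)
image = upscale_bilinear(counts, 2)
```

The interpolated values would then be passed through a colour scale to produce the displayed image; that mapping is not shown here.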

3.2 Indexing the maps

For the maps to be interpretable, the themes associated with the documents assigned to the nodes have to be indicated. Our maps have a relatively limited number of units (by comparison, the maps proposed by the team of T. Kohonen (WEBSOM 1997) have about 8 times more neurons), yet it is still impossible to characterize every node without the annotations overlapping. It is therefore preferable to select a limited number of nodes for characterization.

These nodes are selected according to the frequent occurrence of a keyword, which is then written on the map. This was done manually, but could later be automated. The strategy is as follows:

-: determine the density peaks;
-: examine the keywords associated with the documents assigned to the peaks;
-: write the most significant one beside the peak.
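Since the selection was done manually, the following is only a sketch of how the three steps might be automated. It assumes that a density peak is a node whose document count is strictly greater than that of all its neighbours (so flat plateaus are not detected), and that the "most significant" keyword is simply the most frequent one among the documents at that node. Both criteria, and the names `density_peaks` and `label_peaks`, are assumptions made for illustration.

```python
from collections import Counter


def density_peaks(counts):
    """Return the (row, col) nodes whose count is strictly greater than the
    counts of all (up to 8) neighbouring nodes."""
    rows, cols = len(counts), len(counts[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                counts[i][j]
                for i in range(max(0, r - 1), min(rows, r + 2))
                for j in range(max(0, c - 1), min(cols, c + 2))
                if (i, j) != (r, c)
            ]
            if all(counts[r][c] > n for n in neighbours):
                peaks.append((r, c))
    return peaks


def label_peaks(counts, node_keywords):
    """Associate each peak with the most frequent keyword of its documents.
    `node_keywords` maps a node to the keyword lists of its documents."""
    labels = {}
    for node in density_peaks(counts):
        freq = Counter(kw for kws in node_keywords.get(node, []) for kw in kws)
        if freq:
            labels[node] = freq.most_common(1)[0][0]
    return labels


# Toy 3 x 3 map with hypothetical keywords.
toy_counts = [[1, 0, 0], [0, 0, 0], [0, 0, 3]]
toy_keywords = {
    (0, 0): [["stars"]],
    (2, 2): [["galaxies", "dust"], ["galaxies"], ["galaxies", "stars"]],
}
labels = label_peaks(toy_counts, toy_keywords)
```

A raw frequency count tends to favour keywords that are common across the whole collection; a weighting that compares the frequency at the node with the global frequency might match the manual choice more closely.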
