The FictionMags Index Family
The v2 Index Project

Introduction

At the moment it is not possible to regenerate the FictionMags Indexes with the original suite of programs written by Bill Contento, and it is unclear when it might again be possible to do so. In an attempt to address this, Phil Stephensen-Payne has developed a new suite of programs (v2), and this document attempts to provide some background on the project and to highlight some of the differences between the indexes generated by the two sets of software.

A couple of points should probably be emphasized at the outset:

The current intention is to generate a set of indexes that, in general, look and feel as close to the v1 indexes as possible, partly because that format is familiar to users, but mainly because it has proven to be an excellent format, so there seems little need to mess with it.

The remainder of this document is divided into the following sections for ease of reference:

The documentation of the Index Structure also gives some more specific details on the two implementations. For a discussion of recent changes to the software, as well as a list of known problems and possible future enhancements, see the Change Log.

As always, all comments are welcome and should be sent to


New Features in the v2 Indexes

There are a number of new/different features in the v2 Indexes:

Inclusion of Book Files

Although an earlier version of the v1 programs was capable of handling both books and magazines (cf. The Locus Index to Science Fiction: 1984-1998), the current version cannot, because of changes made to support the FictionMags Index Family properly. The v2 programs have been written to handle both books and magazines, and the intention is that the v2 Indexes will include both.

However, the data files for the books are in a rather poor state, partly because they are an amalgam of multiple, disparate sources (e.g. Al Hubin's Index to Crime Fiction IV and the Miscellaneous Anthologies Index) and partly because little work has been done over the years to bring them up to the same standard as the magazine data files. From time to time I have tried to do something about this, but after five or more years I have only managed to "clean up" about 5% of the data. At the moment I am not sure how to address this.

In addition, there are various questions about how book files should best be handled, as the old Locus index had a number of problems. For that reason, although an early "proof of concept" version of the Book Author Index was demonstrated, I intend to concentrate on the magazine indexes (which are all that will be published for the time being) and then revisit the book index when things quieten down.

Inclusion of Pseudonymous Stories in Author Lists

There was an anomaly in the v1 indexes: if Author A wrote a story under house name B, the story would be listed under both A and B; however, if Author A wrote a story under private pseudonym C, the story would be listed only under the pseudonym. This has been addressed in the v2 indexes so that all items known (or suspected) to be by Author A are listed as by Author A (though this has raised its own issues, as discussed under Known Issues below). One side-effect of this is that the "also as xxx" clauses in v1 are no longer needed, as the details of such alternate titles/authors are listed in full. See, for example, Edward Aarons in v1 & v2.
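The difference between the two listing rules can be expressed as a small function. This is a minimal sketch, not the v2 code: the pseudonym table, the "house"/"private" labels and the function name are hypothetical, and it assumes (the text does not say so) that both versions also list the story under the name actually credited in the magazine.

```python
from typing import Dict, Set, Tuple

# Hypothetical pseudonym table: credited name -> (real author, kind),
# where kind is "house" (a shared house name) or "private" (one author's pseudonym).
Pseudonyms = Dict[str, Tuple[str, str]]


def listed_under(credited: str, pseudonyms: Pseudonyms, version: int) -> Set[str]:
    """Return the author-index names under which a story credited to `credited` appears."""
    names = {credited}  # assumption: the credited name is always listed
    if credited not in pseudonyms:
        return names
    real, kind = pseudonyms[credited]
    if version == 1:
        # v1 anomaly: only house-name stories were also listed under the real author
        if kind == "house":
            names.add(real)
    else:
        # v2: known or suspected real authorship is always listed
        names.add(real)
    return names
```

With B as a house name and C as a private pseudonym of A, v1 lists a C story only under C, while v2 also lists it under A.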

Portability and Flexibility

In an attempt to avoid a repeat of the current situation, the v2 programs have been written so that they should be usable on any computer (though this has yet to be tested). The generation of each index is controlled by an Index Configuration File, and page layouts are based on external boilerplate files, which should allow a degree of flexibility without requiring program changes.

Minor implementation differences

The following minor differences have been implemented deliberately (but are still open to debate):


Current Implementation Progress

There are 15 main groups of files (potentially) generated by the index generation programs, plus 21 index levels, making a total of 36 distinct entities as follows:

The above names will be used where appropriate in the rest of this document, and each has been linked to a representative sample page in the current indexes for illustration purposes.

Other than the points listed under Known Issues, everything should be in place, so please let me know if you spot anything missing (as well as anything that is malformed, links to the wrong place, or is just downright wrong).


Known Differences

There are a handful of features in the current index that I am thinking of not implementing in the new software and would welcome feedback on:


Discussion Points

Semi-capitalization of Author Names

In the Story Author and Book Author Indexes, author names are partially capitalized (e.g. ABBE, GEORGE (Bancroft)), but the algorithm does throw up some anomalies (such as AARD-vark’sson, AVRAM and ABREN, KATH’anth). For some reason the author names in the Biographical Notes aren't capitalized and, to my untrained eye, they look just as good (if not better), so I'm inclined to leave them uncapitalized in all the indexes.
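One rule that would reproduce all three examples above is to upper-case only the leading run of letters in each space-separated word, leaving everything from the first hyphen, apostrophe, parenthesis or comma onward as entered. The sketch below is a guess at that behaviour, not the actual v1 algorithm; the function name is hypothetical.

```python
import re

# Leading run of Unicode letters (no digits or underscores) at the start of a word.
_LEADING_LETTERS = re.compile(r"^[^\W\d_]+")


def semi_capitalize(name: str) -> str:
    """Semi-capitalize an index-form name such as 'Abbe, George (Bancroft)'.

    Hypothetical rule: in each space-separated word, upper-case only the
    leading letters; anything from the first non-letter onward is unchanged.
    """
    words = name.split(" ")
    return " ".join(
        _LEADING_LETTERS.sub(lambda m: m.group(0).upper(), word, count=1)
        for word in words
    )
```

Under this rule "(Bancroft)" is untouched because it starts with a parenthesis, and "Aard-vark’sson" stops being capitalized at the hyphen, which is exactly the kind of anomaly described.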

How should "Artists" be handled?

One issue that has been unresolved for some time is the distinction between authors and artists. At the moment, anybody listed as the illustrator of a book/magazine or story, or as the "author" of an item with an item type of "il" (I think), is treated as an artist. Some questions that arise include:

Aggregation of [unknown items]

As discussed above, both versions of the index attempt to aggregate repeated items such as columns or serial parts. This approach is also used for multiple stories with the same title (typically when there is a series of stories about a given character which all have the same title). In the v1 indexes this also applies to "unknown" stories, as with Myrtle Juliette Corey. I'm not convinced this makes a lot of sense, as it implies a commonality among the stories that doesn't exist, so in v2 these are listed separately rather than aggregated, but I would welcome views on the matter.
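The two aggregation policies can be sketched as follows for a single author's listing. The `Item` record, the placeholder title "[unknown]" and the `aggregate_unknown` switch are hypothetical stand-ins for however the real data marks unknown items; groups are kept in order of first appearance.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

UNKNOWN_TITLE = "[unknown]"  # hypothetical placeholder for an unidentified item


@dataclass(frozen=True)
class Item:
    title: str
    issue: str


def aggregate(items: List[Item], aggregate_unknown: bool = False) -> List[Tuple[str, List[str]]]:
    """Group items with the same title into (title, [issues]) entries.

    aggregate_unknown=True mimics the v1 behaviour; the default (False)
    mimics v2, where each unknown item gets its own entry.
    """
    groups: List[Tuple[str, List[str]]] = []
    position: Dict[str, int] = {}  # title -> index of its group in `groups`
    for item in items:
        if item.title == UNKNOWN_TITLE and not aggregate_unknown:
            groups.append((item.title, [item.issue]))  # never merged
            continue
        if item.title in position:
            groups[position[item.title]][1].append(item.issue)
        else:
            position[item.title] = len(groups)
            groups.append((item.title, [item.issue]))
    return groups
```

Repeated real titles are merged under either policy; only the treatment of unknown items differs.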

Normalisation of volume/issue numbers

In general, when indexing magazines, we try to capture the information (e.g. author names and titles) as it is listed in the magazine rather than normalising it (beyond capitalisation): e.g. if one issue had a story by Robert Heinlein and another by Robert A. Heinlein, then we record them as such, rather than changing the former to Robert A. Heinlein, say. (Actually, in the very early days this wasn't the case, so some of the very old data in the database is incorrect in this regard.)

One area this has never applied to is the recording of volume and issue numbers, which, for the v1 index programs, are always in the format vnn #mm (a hangover from an early version of the code that relied on these fields to sort the data into order). For some years I have been recording these as specified in the magazine and simply converting them to the standard format when I pass the data to Bill.
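The conversion to the standard format might look something like the sketch below, which handles a few common printed forms (Roman or Arabic volume numbers, "Vol."/"Volume" and "No."/"Number"/"#"). The pattern and function names are hypothetical, and the output is not zero-padded, because the exact field widths implied by vnn #mm are an assumption not settled here.

```python
import re

_ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

# Hypothetical pattern: "v", "Vol." or "Volume", a Roman or Arabic volume,
# optional comma, then "#", "No." or "Number" and an Arabic issue number.
_VOL_ISSUE = re.compile(
    r"\bv(?:ol(?:ume)?)?\.?\s*([ivxlcdm]+|\d+)\s*,?\s*(?:#|no\.?|number)\s*(\d+)",
    re.IGNORECASE,
)


def roman_to_int(s: str) -> int:
    """Convert a Roman numeral such as 'XII' to an integer."""
    total, prev = 0, 0
    for ch in reversed(s.upper()):
        value = _ROMAN[ch]
        if value < prev:
            total -= value  # subtractive pair, e.g. the I in IX
        else:
            total += value
            prev = value
    return total


def normalise_vol_issue(text: str) -> str:
    """Convert an as-printed volume/issue string to the 'v<vol> #<issue>' form."""
    m = _VOL_ISSUE.search(text)
    if m is None:
        raise ValueError(f"unrecognised volume/issue: {text!r}")
    vol_text, issue_text = m.groups()
    volume = int(vol_text) if vol_text.isdigit() else roman_to_int(vol_text)
    return f"v{volume} #{int(issue_text)}"
```

Input that matches none of the patterns raises ValueError rather than guessing, leaving the awkward cases to be resolved by hand.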

It will never be possible to use the "correct" volume/issue number format for the whole of the FMI, partly because of the time involved in correcting 150,000 issues, but mainly because we simply don't have the information: much of the data in the FMI comes from secondary sources that don't list such things "correctly".

The question is whether the "mixed mode" is acceptable with a degree of tidying up (e.g. in most pulps you can extrapolate the format of a whole batch of issues if you know the format of the first and last in the range) or whether people feel it is so "untidy" that we should stick with vnn #mm ad infinitum.

After some discussion, the current approach is to normalise them in the Magazine Issue Bottom Level Index but leave them as entered in the Magazine Issue Listings (analogous to the way we handle author names).