IdxGen - Generate a set of Index Files

Introduction

IdxGen generates a set of Index Files based on the contents of an Index Configuration File. As well as a (brief) discussion of each module in IdxGen, this file discusses some of the general background issues.


Basic Index Structure

While the individual indexes look quite different one from the other, there is an over-riding structure that is common to each index, although not all indexes use all parts. The core level is what might be called the Index 1 Level which contains a list of all the "things" being indexed (e.g. Artist Names, Magazine Names, Story Titles, etc.).

In most cases there is a level below this which we could call the Listings Level which lists all the items for each "thing" (e.g. issues for each Magazine, items by each author, items in each series, etc. etc.). Note that this level is not used for the Story and Book Title Indexes as these do not have an expansion. Conversely, for Magazine Issues and for Book Authors there is a further level below which we could call the Contents Level which lists the contents of each magazine issue or book.

Above the Index 1 Level there then might be up to two higher, hierarchical, levels which provide index ranges into the lower-level indexes. The intention is that these should be dynamic based upon the number of items in the Index 1 Level. As each level will contain MAXPAGESIZE entries of the next level, we can see that with a modest page size of, say, 300 lines:
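As a rough check of that arithmetic, the sketch below multiplies out the capacity for one, two and three index levels. The function names are hypothetical and are not part of IdxGen; the only value taken from the text is the example page size of 300.

```c
#include <stdio.h>

/* Hypothetical helper: number of bottom-level entries reachable through
   `levels` index levels when every page holds `page_size` entries. */
long long index_capacity(int page_size, int levels)
{
    long long capacity = 1;
    for (int i = 0; i < levels; i++)
        capacity *= page_size;
    return capacity;
}

/* Print the capacity for one, two and three index levels. */
void print_capacities(int page_size)
{
    for (int levels = 1; levels <= 3; levels++)
        printf("%d level(s): %lld entries\n", levels,
               index_capacity(page_size, levels));
}
```

With a page size of 300 this gives 300 entries for one level, 90,000 for two and 27,000,000 for three.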

Currently (June 2021) the largest index (the Story Title Index) has about 2,000,000 items in it so this should do for the foreseeable future. The table below summarises the situation as of June 2021. Note that one complication is that the top level of the index is always a single page with a fixed name, so a small index will only have the Index 3 Level; a medium-sized one will have Index 3 and Index 1 Levels and large indexes would have all three.

Note also that, while the formats of Listings and Contents Levels vary quite widely from index to index, the lowest Index level will always be simply a name or a continuation line and all the levels above that will be a range of names.

Index                Index 3/Top   Index 2       Index 1       Listings       Contents        Notes
Artists              A01           BBBnnnn       BBnnnn        Bnnnn          -
Biographical Notes   A02           CCCnnnn       CCnnnn        Cnnnn          -
Book Authors         A03           DDDnnnn       DDnnnn        Dnnnn          Ennnn
Book Titles          A04           FFnnnn        Fnnnn         -              -               This index has no listings level
Chronological        A05           HHnnnn        Hnnnn         Innnn          -
Magazine Issues      A06           JJJnnnn       JJnnnn        Jnnnn          Knnnn
Series               A07           LLLnnnn       LLnnnn        Lnnnn          -
Story Authors        A08           MMnnnn        Mnnnn         Nnnnn          -
Story Titles         A09           Onnnn         Pnnnn         -              -               This index has no listings level
Full Text Links      A10           QQQnnnn       QQnnnn        Qnnnn          -
Names                A11           Rnnn          Snnn          -              -               This index has no listings level
Defined by           topfilnam_ptr idxlv2pre_ptr idxlv1pre_ptr lstingspre_ptr contentspre_ptr

The creation and the structuring of the index levels is all handled by WRITE_INDEX_LINE which depends on a set of static tables as listed in the "Defined by" line above. These are accessed by the index type which is held in the curr_index_type field in the Configuration Data structure. There are also four other static tables similarly referenced:

Using WRITE_INDEX_LINE

Using the routine for normal indexes is very straightforward:

Note that the code generated for the Level 1 index is of the form:

<LI><A HREF="xnn.htm#Annn">formatted name</LI> (for IDXLIN_NORMAL)
<LI>________: <A HREF="xnn.htm#Annn">continuation text</LI> (for IDXLIN_CONT)
<LI>special text</LI> (for others)

It is up to the caller to ensure that formatted_name and continuation text contain "</A>" at the appropriate point to terminate the hyperlink - this allows just the first part of the text to be hyperlinked if so desired.
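A minimal sketch of how the three line forms above might be assembled is shown here. It is not the code of WRITE_INDEX_LINE: the function name, its parameters, the numeric values given to the line types and the assumption that an anchor is the letter A followed by a number are all illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Line types named in the text; the numeric values are assumptions. */
enum { IDXLIN_NORMAL = 0, IDXLIN_CONT = 1, IDXLIN_OTHER = 2 };

/* Hypothetical helper: build one Level 1 index line in `out`.
   For the two hyperlinked forms `text` must already contain "</A>"
   at the point where the hyperlink should end.
   Returns the value returned by snprintf. */
int format_level1_line(char *out, size_t outsize, int line_type,
                       const char *page, int anchor, const char *text)
{
    assert(out != NULL && page != NULL && text != NULL);
    switch (line_type) {
    case IDXLIN_NORMAL:
        return snprintf(out, outsize,
                        "<LI><A HREF=\"%s.htm#A%d\">%s</LI>",
                        page, anchor, text);
    case IDXLIN_CONT:
        return snprintf(out, outsize,
                        "<LI>________: <A HREF=\"%s.htm#A%d\">%s</LI>",
                        page, anchor, text);
    default:
        /* Special text: no hyperlink is generated. */
        return snprintf(out, outsize, "<LI>%s</LI>", text);
    }
}
```

Because the caller supplies the "</A>", a name such as "Smith, John</A> (pseudonym)" would hyperlink only the part before the closing tag.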


Calculating Bookmarks and Page Breaks

The indexes are linked together by hyperlinks, which requires the creation of unique anchors for anything that might be the target of a link. This obviously presents a minor challenge as we need to know what the anchors in Index 1 are when creating Index 2, but also know what the anchors are in Index 2 when creating Index 1. Handling this always requires two passes through the indexes - one to set up the anchors and the other to implement them. There are various ways of doing this, but the approach IdxGen uses is to have one set of modules (SETUP_xxx_IDX) whose primary purpose is to calculate the anchors and a second set (BUILD_xxx_IDX) which generates each index in turn. A key part of the first pass is also to decide where to insert a page break in the index so that pages do not get too large - depending on the index, these may or may not be in the middle of a section, as discussed below.

There are two approaches to handling the anchors in the two passes. Each requires the anchor to be stored in the first pass, but the second pass can then take one of two approaches:

  1. Look up the anchor in the table, throwing a new page when the anchor says one is needed;
  2. Calculate the anchors and page breaks again and periodically check they match those stored in the table.

The first of these is potentially the easiest, but the second provides a belts-and-braces sanity check - i.e., if the anchors/page numbers don't match then the code in either the SETUP or BUILD module (or both) is incorrect. For this reason the program currently adopts the second approach, although it can take quite a bit of time to track down an anomaly if the program detects the two don't match.
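The check made in the second approach can be sketched as follows. The structure and function names are hypothetical; the sketch only shows the comparison between the values stored by the first pass and those recalculated in the second, not how IdxGen actually stores or reports them.

```c
#include <stdio.h>

/* Hypothetical record of what the SETUP pass stored for one entry. */
struct stored_anchor {
    int pageno;
    int anchor;
};

/* Compare the page/anchor recalculated in the BUILD pass with the values
   stored by the SETUP pass. Returns 1 if they match; otherwise writes a
   diagnostic (if a diagnostics file was supplied) and returns 0. */
int check_anchor(const struct stored_anchor *stored, int calc_pageno,
                 int calc_anchor, const char *name, FILE *prtfil_ptr)
{
    if (stored->pageno == calc_pageno && stored->anchor == calc_anchor)
        return 1;
    if (prtfil_ptr != NULL)
        fprintf(prtfil_ptr,
                "Anchor mismatch for \"%s\": stored %d/%d, calculated %d/%d\n",
                name, stored->pageno, stored->anchor,
                calc_pageno, calc_anchor);
    return 0;
}
```

Reporting the name alongside both pairs of values is one way to shorten the search when the two passes disagree.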

The anchors are either stored in an in-memory table or added to the sorted files and the page breaks vary from index to index, as follows:


Boilerplate Files

Boilerplate files were first used in the programs that generate the GCP Website. They provide a means of having standard, potentially fairly complex, HTML page layouts that can be easily changed without requiring programmatic changes. The principle is very straightforward - the file is a standard HTML file that can be created with any HTML editor and which contains a number of special flags of the form <!x>, <!x+> and <!x-> where x is some special code known to the program (often just a single letter).

A line that starts with <!x> indicates that the whole line should be replaced with whatever x represents to the program while <!x+> and <!x-> delimit an entire section that should be omitted under certain conditions (e.g. the "Previous Page" link on the first page).
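Recognising the three flag forms might look something like the sketch below. The enumeration, the function name and the assumption that a code consists only of letters and digits are illustrative and not taken from IdxGen itself.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Kinds of boilerplate line; these names are hypothetical. */
enum flag_kind { FLAG_NONE, FLAG_REPLACE, FLAG_BEGIN, FLAG_END };

/* Classify a boilerplate line. If it starts with <!x>, <!x+> or <!x->
   the code x is copied into `code` and the kind of flag is returned.
   Anything else (ordinary HTML, including <!-- ... --> comments and
   <!DOCTYPE ...>) returns FLAG_NONE. */
enum flag_kind classify_line(const char *line, char *code, size_t codesize)
{
    size_t len = 0;
    const char *p;

    if (strncmp(line, "<!", 2) != 0)
        return FLAG_NONE;
    p = line + 2;
    while (isalnum((unsigned char)p[len]))   /* assumed: codes are alphanumeric */
        len++;
    if (len == 0 || len >= codesize)
        return FLAG_NONE;
    memcpy(code, p, len);
    code[len] = '\0';
    p += len;
    if (strncmp(p, "+>", 2) == 0)
        return FLAG_BEGIN;                   /* start of a conditional section */
    if (strncmp(p, "->", 2) == 0)
        return FLAG_END;                     /* end of a conditional section */
    if (*p == '>')
        return FLAG_REPLACE;                 /* whole line is substituted */
    return FLAG_NONE;
}
```

A caller would copy FLAG_NONE lines through unchanged, substitute FLAG_REPLACE lines, and skip everything between a FLAG_BEGIN and its FLAG_END when the section does not apply.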

There are three boilerplate files used by IdxGen:

A consistent set of substitution variables is used across the three files, even though no single file uses them all:


Sorted Files

While much of the data needed by the program resides in structures in memory it has proved impossible to hold everything in memory as discussed below. As such, a number of sorted data files are created and used by the program as follows:


Data File Naming Convention

The program supports the input of a mixture of magazine data files, book data files and additional reference files. At the moment, at least, these files need to be distinct (i.e. you can't have a single file with both magazine and book data, other than where the book is being listed as part of a magazine) and need to follow a strict naming convention:


External Sort Command File

The data in the indexes is too large for in-memory sorts with the 32-bit compiler, and a brief experiment with the 64-bit compiler resulted in 8 hours for a single full sort of the data (as opposed to 5 minutes for an external sort program) so, at various points, IdxGen calls an external command file (specified in the Index Configuration File) to sort a pair of files. As the sort order varies from instance to instance the command file is also passed a parameter indicating the type of sort that is required. Thus the formatted command that is executed might be:

   sortfil BOOKDATA IdxGen.tm1 IdxGen.Bks

where IdxGen.tm1 is the input file and IdxGen.Bks is the output file. The possible command parameters are:

The current tool used for sorting is CMSort where the key switches are:

So, for instance, a sort key could be:

    /V=$1F /SV=1,1,0 /SV=2,1,0 /SV=3,1,0 /SV=4,1,0 /SV=5,1,0 /SV=6,1,0 /SV=7,1,0 /SV=11,1,0 /SV=8,1,0 /SV=9,1,0 /SV=10,1,0

However, experimentation shows that sorting with a key like this takes 4-5 times longer than a straight sort so, instead, the files are created in a different order depending on the way we want to sort them, as described in the documentation of SETUP_SCANITEM.
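The call to the external command file described earlier in this section could be sketched as below. The function names are hypothetical, and the sketch assumes that the command file is run through the C library's system() call and signals success with a zero return; the source does not say how IdxGen actually launches it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format the sort command line: command file, sort type, input file and
   output file, separated by single spaces (names containing spaces would
   need quoting, which this sketch does not attempt).
   Returns 1 if the command fitted in `cmd`, else 0. */
int format_sort_command(char *cmd, size_t cmdsize, const char *sortfile,
                        const char *sort_type, const char *infile,
                        const char *outfile)
{
    int n = snprintf(cmd, cmdsize, "%s %s %s %s",
                     sortfile, sort_type, infile, outfile);
    return n > 0 && (size_t)n < cmdsize;
}

/* Run the formatted command. Treating a zero return from system() as
   success is an assumption about the command file. */
int run_sort(const char *cmd)
{
    return system(cmd) == 0;
}
```

Called with "sortfil", "BOOKDATA", "IdxGen.tm1" and "IdxGen.Bks", format_sort_command produces the example command line shown above.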


Configuration Data Structure

To allow the code to be broken into smaller sections without huge parameter lists, the bulk of the data that the program uses is held in a structure called config_data which is passed as an argument to most routines. It contains the following groups of fields:

Settings from the Index Configuration File:

	char	idxnam[128];			/* Name of Index */
char editor[128]; /* Name of Editor(s) */
char idxdir[128]; /* Folder to generate indexes into */
char boiler[128]; /* Folder containing boilerplates */
char sortfile[128]; /* Name of the file containing the sort commands */
char ctrlfile[128]; /* Name of the file defining the files to include in the index */
char ablkfile[128]; /* Name of the file containing the about links for the index */
char intrfile[128]; /* Name of the file containing the Introduction */
char missfile[128]; /* Name of the file containing a list of missing issues */
char omitfile[128]; /* Name of the file containing a list of magazines deliberately omitted from the index */
char nxtufile[128]; /* Name of the file containing a list of items for the next update */
char toctext[128]; /* Name of the file containing additional text to insert in Table of Contents */
char lastupdate[10]; /* Date of last update */
int subfolders; /* Set to PSP_TRUE if files should be generated in sub-folders */
int pubindex; /* Set to PSP_TRUE if we want a "by Publisher" index */
int sortnames; /* Set to PSP_TRUE if the magazine files should be sorted by name */
int fullimages; /* Set to PSP_TRUE if cover scans are to be displayed full-size */
int minpagesize; /* The minimum number of lines to display on a page (default 200) */
int maxpagesize; /* The maximum number of lines to display on a page (default 1000) */
int permlinks; /* Set to PSP_TRUE if permanent links should be output */
int report_diags; /* Set to PSP_TRUE if extended diagnostics should be output */
char special1[128]; /* First internal special flag */
char special2[128]; /* Second internal special flag */
char special3[128]; /* Third internal special flag */
char special4[128]; /* Fourth internal special flag */
char special5[128]; /* Fifth internal special flag */
char special6[128]; /* Sixth internal special flag */
char special7[128]; /* Seventh internal special flag */
char special8[128]; /* Eighth internal special flag */
char special9[128]; /* Ninth internal special flag */

The scandata structure used by SCAN_FILE:

	struct	scandata *scandata_ptr;		/* Pointer to scandata structure */

The lists of magazine files (in a sub-structure so that they can be sorted by magazine sort name) and the consolidated magazine file name:

	struct magfile {
char *magnam_ptr; /* Pointers to Magazine Names */
char filnam[MAXFILENAME]; /* Magazine File Name */
};
struct magfile magfiles[MAXMAGFILES]; /* Magazine File & Sort Names */
int magfile_cnt; /* Count of Magazine Files */
char magfilnam[128]; /* Magazine File Name */

The name of the consolidated book file and a flag to indicate that we have some books:

	int	got_books;			/* Flag to say we have some books */
	char	bookfilnam[128];		/* Book File Name */

A number of fields related to the index type currently being built:

	char	curr_index_type;		/* Current index type */
FILE *topfil_ptr; /* Top-Level Index Output File Pointer */
FILE *midfil_ptr; /* Middle-Level Index Output File Pointer */
int midpage_count; /* Page and line counts for the middle-level index */
int midline_count;
int lstpage_count; /* Page, line & anchor counts for the listings */
int lstline_count;
int lstanchor_count;
char first_name[1024]; /* First name for top-level index */
char last_name[1024]; /* Last name for top-level index */
char curr_name[1024]; /* Current name */
char formatted_name[4096]; /* formatted version of same for index headings (can be huge for house names) */

And some miscellaneous (useful) data:

	char	uplink[4];			/* Set to "../" if we have subfolders; to "" otherwise */
char toplink[6]; /* Set to "../a/" if we have subfolders; to "" otherwise */
char tmpdir[256]; /* Folder to use for temporary files */
FILE *namfil_ptr; /* Names Link File */
FILE *dmpfil_ptr; /* Diagnostic Dump File (if needed) */
FILE *csvfil_ptr; /* CSV file for Names Link Database (if needed) */

Note that, as we store key information related to the current index type in the structure, it is critical that the index types are processed sequentially, not concurrently (see BUILD_FULLTEXT_IDX below).


Other Global Data

We also need to expose some of the data globally so that we can sort them efficiently via qsort. This includes:

The Issue Link Table containing the list of magazine issue IDs and book IDs and the associated links and a list of subscripts into that array:

static	char	*isslink_det_ptr[MAXISSUES];	/* Issue Details Table */
static char *isslink_exp_ptr[MAXISSUES]; /* Expanded Details Table (also used for cover scan links in READ_MAG_FILES) */
static char *isslink_txt_ptr[MAXISSUES]; /* Full Text/About Link Table (in READ_MAG_FILES) */
/* Book Title Link (elsewhere) */
static char *isslink_pre_ptr[MAXISSUES]; /* Issue Link Page/Anchor Prefix Table */
static int isslink_pageno[MAXISSUES];
static int isslink_anchor[MAXISSUES]; /* Issue Link Page/Anchor Table */
static char isslink_edition[MAXISSUES]; /* and edition */
static int isslink_idx[MAXISSUES]; /* and Indexes into Table(s) */
static int isslink_cnt; /* Count of Issue Links */
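The reason for exposing the tables at file scope is that a qsort comparison routine receives only pointers to the two elements being compared, so it needs some other way to reach the table. A minimal sketch of sorting the subscripts rather than the table is shown below. The array names follow the declarations above, but the tiny MAXISSUES value, the function names and the choice of a plain strcmp on the details string as the sort order are assumptions.

```c
#include <stdlib.h>
#include <string.h>

#define MAXISSUES 8                        /* tiny value for this sketch only */

static char *isslink_det_ptr[MAXISSUES];   /* Issue Details Table */
static int   isslink_idx[MAXISSUES];       /* Indexes into the table */
static int   isslink_cnt;                  /* Count of Issue Links */

/* qsort comparison routine: compares two subscripts by the details
   strings they refer to. It can reach the table only because the table
   is declared at file scope. */
static int compare_isslink(const void *a, const void *b)
{
    int ia = *(const int *)a;
    int ib = *(const int *)b;
    return strcmp(isslink_det_ptr[ia], isslink_det_ptr[ib]);
}

/* Sort the subscripts, leaving the table itself untouched. */
void sort_isslink_idx(void)
{
    for (int i = 0; i < isslink_cnt; i++)
        isslink_idx[i] = i;
    qsort(isslink_idx, (size_t)isslink_cnt, sizeof isslink_idx[0],
          compare_isslink);
}
```

Sorting the subscripts means the parallel arrays (page numbers, anchors and so on) never need to be reordered together.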

The Names Link Table containing the list of authors in the Story Author, Book Author, Artist, Chronological & Biographical Notes Indexes:

static	char	*nameslink_nrmaut_ptr [MAXNAMES]; /* Normalised author names */
static char *nameslink_auth_ptr [MAXNAMES]; /* Standard author names */
static char nameslink_namtyp[MAXNAMES]; /* Name Type Table: Bitmap indicating which indexes the name appears in: */
/* 1=Story Author; 2=Artist; 4=Book Author */
static int nameslink_pseudsub[MAXNAMES]; /* Index into PSEUD.CVT */
static int nameslink_stypag[MAXNAMES];
static int nameslink_styanc[MAXNAMES];
static int nameslink_stylin[MAXNAMES]; /* Story Author Index Page/Anchor/Line Table */
static int nameslink_artpag[MAXNAMES];
static int nameslink_artanc[MAXNAMES]; /* Artist Index Page/Anchor Table */
static int nameslink_bokpag[MAXNAMES];
static int nameslink_bokanc[MAXNAMES]; /* Book Author Index page/anchor table */
static int nameslink_crnpag[MAXNAMES];
static int nameslink_crnanc[MAXNAMES]; /* Chronological Index Page/Anchor Table */
static int nameslink_biopag[MAXNAMES];
static int nameslink_bioanc[MAXNAMES];
static int nameslink_biotyp[MAXNAMES]; /* Biographical Notes page/anchor/type table */
static int nameslink_idx[MAXNAMES]; /* and indexes into table */
static int nameslink_cnt; /* and count of Links */
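As a small illustration of the Name Type bitmap, the sketch below sets and tests the three bit values given in the comment on nameslink_namtyp. The macro and function names are hypothetical.

```c
/* Bit values taken from the nameslink_namtyp comment; the macro names
   are hypothetical. */
#define NAMTYP_STORY_AUTHOR 1
#define NAMTYP_ARTIST       2
#define NAMTYP_BOOK_AUTHOR  4

/* Record that a name appears in one more index. */
char add_name_type(char namtyp, int bit)
{
    return (char)(namtyp | bit);
}

/* Test whether a name appears in a given index. */
int has_name_type(char namtyp, int bit)
{
    return (namtyp & bit) != 0;
}
```

A name that appears as both a story author and a book author would therefore hold the value 5.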

Similar information for the book titles in the Book Title Index:

static	int	bokttllink_recnum[MAXBOOKS];	/* Record number in Books File */
static int bokttllink_pageno[MAXBOOKS];
static int bokttllink_anchor[MAXBOOKS]; /* Book Title page/anchor table */
static char *bokttllink_pubdet[MAXBOOKS]; /* Abbreviated publication details */
static int bokttllink_idx[MAXBOOKS]; /* and indexes into table */
static int bokttllink_cnt; /* and count of Links */

Similar information for the series names in the Series Index:

static	char	*serieslink_nam_ptr [MAXSERIES]; /* Series Names Table */
static int serieslink_pageno[MAXSERIES];
static int serieslink_anchor[MAXSERIES]; /* Series Page/Anchor Table */
static int serieslink_idx[MAXSERIES]; /* and indexes into table */
static int serieslink_cnt; /* and count of Links */

The array of column headers that should be suppressed when encountered:

static	char	*colhdr_arr[MAXCOLHDRS]; 	/* Array of column headers to be suppressed */
static int colhdr_cnt; /* and count thereof */

Key information (mainly static) related to the different index types:

static char* topfilnam_ptr[]; /* File names for the top-level index for each index type */
static char* idxlv2pre_ptr[]; /* Prefix letter(s) for the level 2 index for each index type */
static char* idxlv1pre_ptr[]; /* Prefix letter(s) for the level 1 index for each index type */
static char* lstingspre_ptr[]; /* Prefix letter(s) for the listings level for each index type ("" if none) */
static char* contentspre_ptr[]; /* Prefix letter(s) for the contents level for each index type ("" if none) */
static char* idxtyp_ptr[]; /* Index Type Name to be used when linking to top-level index for each index type */
static char* idxpgttl_ptr[]; /* Page Title to be used in Index Headers for each index type */
static int idxemphchr[]; /* Type of emphasis to be used in index levels: 'B' = Bold; 'I' = Italics; ' ' = None */
static int idxlinecnt[]; /* Number of lines in the bottom-level index */
static int idxsinglvl[]; /* Set to true if this index is wholly contained in the top-level index page */
static char* idxbgcolor_ptr[]; /* RGB colour value to set as background colour for this index */

Lastly, some odd counts for the Statistics File and for Terminal Diagnostics:

static	int	author_cnt=0;			/* Number of authors in the index */
static int artist_cnt=0; /* Number of artists in the index */
static int magtitles_cnt=0; /* Number of magazine titles in the index */
static int magissues_cnt=0; /* Number of magazine issues in the index */
static int books_cnt=0; /* Number of books in the index */
static int fiction_cnt=0; /* Number of fiction items in the index */
static int poems_cnt=0; /* Number of poems and plays in the index */
static int nonfiction_cnt=0; /* Number of non-fiction items in the index */
static int covers_cnt=0; /* Number of cover images in the index */
static int pages_cnt=0; /* Number of pages in the index */
static int ftlinks_cnt=0; /* Number of full-text links in the index */
static int biog_cnt=0; /* Number of biographical notes in the index */
static int nonfatal_diags=0; /* Number of non-fatal diagnostics (check idxgen.prt) */
static int max_html_files=0; /* Maximum number of HTML files for any index type (compared to MAXHTMLFILES) */
static int multgrp_max=0; /* Maximum number of different filnam/byline groups (compared to MAX_MULTGRP) */
static int multitm_max=0; /* Maximum number of occurrences for a single item (compared to MAX_MULTITM) */
static int multrep_max=0; /* Maximum number of entries with reprint author/title details for an item (compared to MAX_MULTREP) */
static int multxtr_max=0; /* Maximum number of groups with distinct bylines/original titles (compared to MAX_MULTXTR) */
static int do_diag=PSP_FALSE; /* Diagnostic flag; this can be used anywhere to create a line on which a breakpoint */
/* can be set when a particular condition occurs */

Program Modules

The program consists of the following modules:

which communicate via the Configuration Data structure and the Other Global Data. There are also a number of significant support routines:

Note that several of the BUILD_xxx routines revolve around the use of Boilerplate Files and the whole program relies on a sequence of Sorted Files. Note also that the program relies on a naming convention for the data files, as discussed under Data File Naming Convention above, and on an external sort command file.

Currently IdxGen only supports UK 7-bit format input files - it wouldn't be rocket science to extend this to handle US format files as well, but it isn't a priority at present.


The main program

The main program:


PARSE_CONFIG - Read and Parse the Index Configuration File

This module simply reads the Index Configuration File, checking that all essential fields have been specified and defaulting any unspecified, optional, fields.


SETUP_FILES - Set up the list of Files

This module reads the list of files specified for the index and handles these differently depending on which type of file they are:

When all the files have been read it checks to see if this index wants the magazine names to be sorted and, if so, calls qsort to sort the magfiles array and READ_MAG_FILES to read all the magazine files and create the consolidated magazine data file (with added cover scan, full text and about link information).

It then checks to see if we had some books and, if so, calls the sort routine to sort it into order (IdxGen.tm2 in the temporary folder). It then opens the sorted file and copies the contents to the real consolidated book data file (IdxGen.bks in the temporary folder) stripping off the sort header.


SETUP_BOKAUT_IDX - Set up all the anchors for the Book Author Index

This module is only called if some books have been detected and, if so, reads all the records in the consolidated book data file (IdxGen.bks) and:

Note that:


SETUP_BOKTTL_IDX - Set up all the anchors for the Book Title Index

This routine ...


SETUP_ISSUE_IDX - Read all the files and set up the Issue Index

This module reads all the records in the consolidated magazine data file (IdxGen.mgs) and:

Note that:


WRITE_LINKS_FILE - Create the files for links from the GCP Website

This routine creates the Magazine Links File from the feature record entries in the Issue Link Table added by SETUP_ISSUE_IDX. It simply creates the top level file and then loops through the Issue Link Table looking for entries that start with four spaces and have an embedded ^^. As discussed above, there may be multiple entries for a given abbreviation, distinguished by a suffix after ^^ so the routine just strips off the suffix from the first such and writes an entry to the appropriate Links File for it.
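The test for a feature record entry and the removal of its suffix might be sketched as follows. Both function names are hypothetical, and the sketch assumes that the "^^" marker is removed along with the suffix, which the text does not state.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `entry` looks like a feature record entry, i.e. it starts
   with four spaces and has an embedded "^^"; otherwise returns 0. */
int is_feature_entry(const char *entry)
{
    return strncmp(entry, "    ", 4) == 0 && strstr(entry, "^^") != NULL;
}

/* Copy `entry` into `out` up to, but not including, the "^^" marker.
   (Dropping the marker itself is an assumption.) If there is no marker
   the whole entry is copied. Returns 1 if the result fitted, else 0. */
int strip_suffix(const char *entry, char *out, size_t outsize)
{
    const char *marker = strstr(entry, "^^");
    size_t len = (marker != NULL) ? (size_t)(marker - entry) : strlen(entry);

    if (len >= outsize)
        return 0;
    memcpy(out, entry, len);
    out[len] = '\0';
    return 1;
}
```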


SETUP_STYTTL_IDX - Set up the Title Index

This module sorts the raw SCANITEM file (IdxGen.tmp) into title-order (as IdxGen.tm3). It then reads through the file setting up the structure of the Story Title Index, rewriting the file (as IdxGen.tm4) with the relevant links to the Story Title Index embedded.


SETUP_STYAUT_IDX - Set up the Story Author Index

This module sorts the title-order SCANITEM file into author-order. It then reads through the file setting up the structures of the Story Author and Artist Indexes, storing links for each author/artist in the internal author structures.


SETUP_ARTIST_IDX - Set up all the anchors for the Artist Index

This module ...


SETUP_CHRON_IDX - Set up the Chronological Index

This module sorts the title-order SCANITEM file into chronological-order. It then reads through the file setting up the structures of the Chronological Index, storing links for each author in the internal author structures.


SETUP_SERIES_IDX - Set up the Series Index

This module first needs to read through the file generated by SETUP_STYAUT_IDX (IdxGen.Ttl), which was in Story Title Index order and rewrite the entries that belong to a series (as IdxGen.tm8) in the order needed by the series index. This is complicated by a number of factors.

Firstly, if an item in a series also has a series prefix it will appear in the input file twice (once with the prefix and once without) but we only want a single entry in the series index so we ignore any titles that contain a "|" divider. A similar thing happens with articles about a series that quote the author as a subject which will have entries both for the author of the article and the subject so we want explicitly to ignore the latter.

Secondly, if the item in question is part of a serial, it will appear in the input file for each part of the serial, and the convention is that we only list the first (known) part with ", etc." appended if there are multiple parts (currently this also applies to multiple stories with the same name, as per v1, but we might want to consider changing this). We use the val_level field in the scanitem structure to indicate this - it is set to '0' if it is not part of a serial and to '1' otherwise. However, we won't know until we read the "next" record whether the flag needs to be set for the "current" record so the routine has to play around with two records at the same time.

Thirdly, if the item in question is by multiple authors, it will appear in the input file for both authors. In simple cases we just pass both through to the sorted file and then only list the first instance of each item. A complication occurs if the item is by a house name or joint pseudonym where the author(s) is known. In this instance the item will appear in the input file for the house name/joint pseudonym as well as the real author(s). For reasons that aren't currently clear the former does not contain the information needed to create a proper entry (listing house name/pseudonym as well as real author(s)) which is a problem if it sorts before the other entries. To address this the code checks whether the current item is for a pseudonym (nonempty secnam_ptr) and the next is for a real name (empty secnam_ptr); if so, the current one is ignored in favour of the next. (This problem was noted for Petra Christian items authored by Christopher Priest on his own.)

Fourthly, we want the items sorted in order of original publication date. If the original publication is in the file then it will appear first, but in some cases all we have in the file is a reprint so we need to sort out the original publication date, which might involve decoding a book abbreviation to see if there is a more precise date in ABBREV.CVT. In all cases it attempts to create a numeric version of the date so that we can sort by it.
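The two-record handling described under "Secondly" above can be illustrated with a cut-down sketch. The real routine works on records read from a file and presumably compares more than the title; here an array stands in for the file, and the structure and function names are hypothetical (only val_level and its '0'/'1' values come from the text).

```c
#include <string.h>

/* Cut-down, hypothetical stand-in for the scanitem structure. */
struct item {
    char title[64];
    char val_level;     /* '0' = not part of a serial, '1' = otherwise */
};

/* Keep the first record of each run of records with the same title and
   set its val_level to '1' if the run contained more than one record.
   The array is compacted in place; the number of records kept is
   returned. items[kept - 1] plays the part of the "current" record and
   items[i] the part of the "next" one. */
int collapse_serials(struct item *items, int count)
{
    int kept = 0;

    for (int i = 0; i < count; i++) {
        if (kept > 0 &&
            strcmp(items[kept - 1].title, items[i].title) == 0) {
            items[kept - 1].val_level = '1';   /* a later part exists */
            continue;                          /* drop the later part */
        }
        items[kept] = items[i];
        items[kept].val_level = '0';           /* until proved otherwise */
        kept++;
    }
    return kept;
}
```

The flag on a record can only be settled once the following record has been examined, which is why two records have to be in hand at once.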

Any item may belong to multiple series simultaneously so the routine next needs to split the series field into individual series. Then, for each series:

The module then processes the internal series records array, setting up the structure of the Series Index, and storing a pointer to each series entry in the internal series structures.


SETUP_BIOG_NOTES - Set up all the anchors for the Biographical Notes

This module ...


WRITE_TABLE_OF_CONTENTS - Write the Table of Contents File


BUILD_BOKAUT_IDX - Build the Book Author Index

This module ...


BUILD_BOKTTL_IDX - Build the Book Title Index

This module ...


BUILD_ISSUE_IDX - Build the Magazine Issue and Book Author Indexes

This module reads all the files defined in the (sorted) magazine file array followed by the consolidated book data file and creates the Magazine Issue and Book Author Indexes. This gets potentially complex as we are building four levels of index simultaneously:

These are controlled by parallel sets of variables as follows:

For simplicity (?) the code is potentially executed twice - once for magazines and then again for books - on the assumption that the two passes have more in common than they have in differences (though it is debatable). For each pass, the code first creates (and writes a header to) the top-level file but, for simplicity, all the other files are created as needed based on the associated line_count and the values specified in the Configuration Data structure for minimum and maximum page sizes. Put simply:

Note that, in the above, books edited by an author are treated as being the same as those written by the author.

One particular issue that arises with both sets of files is the handling of 'D' records for two reasons:

To handle this, whenever we encounter an 'A' record we call FLUSH_DREC to read all following 'D' records and the first (real) 'E' record if there is one.


BUILD_FULLTEXT_IDX - Build the Full-Text Index

This module ...


BUILD_STYAUT_IDX - Build the Story Author Index

This module reads the author-order SCANITEM data file and creates the Story Author Index.


BUILD_STYTTL_IDX - Build the Story Title Index

This module reads the title-order SCANITEM data file (IdxGen.ttl) and creates the Story Title Index. This is somewhat different to most of the build routines as the bottom level of the index is actually the Level 1 Index rather than the Listings Level.


BUILD_ARTIST_IDX - Build the Artist Index

This module ...


BUILD_CHRON_IDX - Build the Chronological Index

This module reads the chronological-order SCANITEM data file and creates the Chronological Index.


BUILD_SERIES_IDX - Build the Series Index

This module reads the internal series record array and creates the Series Index.


BUILD_BIOG_NOTES - Build the Biographical Notes

This module ...


WRITE_STATISTICS - Write the Statistics Page

This module ...



CREATE_NAME_LINK - Create a new nameslink entry

/************************************************************************/
/* */
/* CREATE_NAME_LINK - Create a new nameslink entry */
/* */
/* Calling Format: */
/* */
/* linksub = CREATE_NAME_LINK (scanitem_ptr, prtfil_ptr); */
/* */
/* Where: */
/* */
/* linksub = Subscript into nameslink table of (main) item */
/* = -1 if an error occurred */
/* scanitem_ptr = Pointer to SCANITEM structure */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/* This routine allocates an entry in the Names Link table for the */
/* name specified in the supplied scanitem. */
/* */
/************************************************************************/

This routine allocates a new entry in the Names Link Table, setting up the name from the normalised name in the scanitem passed as a parameter. It calls FIND_NAME to set up the PSEUD.CVT entry (if any) and initialises all the other fields.


CREATE_NEW_PAGE - Create a new page for the specified file

/************************************************************************/
/* */
/* CREATE_NEW_PAGE - Create a new page for the specified file */
/* */
/* Calling Format: */
/* */
/* status = CREATE_NEW_PAGE (config_ptr, outfil_ptr, page_count, */
/* prefix_ptr, pagttl_ptr, prtfil_ptr); */
/* */
/* Where: */
/* */
/* status = Result of operation: */
/* = PSP_TRUE if OK; else PSP_FALSE */
/* config_ptr = Pointer to CONFIG_DATA structure */
/* outfil_ptr = Pointer to file pointer for creating new page */
/* page_count = Current page count */
/* prefix_ptr = File/Folder prefix letter */
/* pagttl_ptr = Title of page */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/************************************************************************/

All the HTML pages in the index follow the same basic format so this routine is called whenever we want to create a new page in an index level. If we have a current page (i.e. page_count > 1) it calls WRITE_PAGE_TRAILER to write the trailer to it and closes the old file. It then opens a new one and calls WRITE_PAGE_HEADER to write the page header to it. Note that, as the routine opens a new file (and hence creates a new file handle) the file parameter passed to it is a pointer to the file handle rather than the file handle itself. Note also that this routine depends on the value of config_ptr->curr_index_type when writing the page headers and trailers to indicate the home link.
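The reason for passing a pointer to the file handle can be seen in the cut-down sketch below. The function name and its parameters are hypothetical, and the calls to WRITE_PAGE_TRAILER and WRITE_PAGE_HEADER are represented only by comments.

```c
#include <stdio.h>

/* Hypothetical, cut-down version of the page-switching step.
   `outfil_ptr` points at the caller's file handle so that the caller
   sees the handle of the newly opened file after the call.
   Returns 1 if the new file was opened, else 0. */
int switch_page(FILE **outfil_ptr, const char *new_filename)
{
    if (*outfil_ptr != NULL) {
        /* The real routine writes the page trailer here. */
        fclose(*outfil_ptr);
        *outfil_ptr = NULL;
    }
    *outfil_ptr = fopen(new_filename, "w");
    if (*outfil_ptr == NULL)
        return 0;
    /* The real routine writes the page header here. */
    return 1;
}
```

Had the handle been passed by value, the caller would be left holding the closed handle of the old page.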


FLUSH_DREC - Read all following 'D' records and first 'E' record (if any)

/************************************************************************/
/* */
/* FLUSH_DREC - Read all 'D' records and 1st 'E' record */
/* */
/* Calling Format: */
/* */
/* status = FLUSH_DREC (inpfil_ptr, linecnt_ptr, note_ptr_ptr, */
/* cvartist_ptr_ptr, series_ptr_ptr, */
/* flags_ptr_ptr, nxtrec_ptr_ptr, prtfil_ptr) */
/* */
/* Where: */
/* */
/* status = Routine return status: */
/* = PSP_TRUE if everything OK */
/* = PSP_FALSE if we hit an error */
/* inpfil_ptr = Pointer to input file */
/* linecnt_ptr = Pointer to line count to update */
/* note_ptr_ptr = Pointer to buffer to hold pointer to notes */
/* cvartist_ptr_ptr = Pointer to buffer to hold pointer to artist */
/* series_ptr_ptr = Pointer to buffer to hold pointer to series */
/* flags_ptr_ptr = Pointer to buffer to hold pointer to flags */
/* nxtrec_ptr_ptr = Pointer to buffer to hold pointer to record */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/************************************************************************/

There are a number of places where we need to know what the contents of any following 'D' records are when we process an 'A' record. It is also important to concatenate all the text in the 'D' records into a single buffer before calling CVT_TXT_TO_HTML so that it can correctly check for mismatched commands like "Bold On". This routine is called whenever we encounter a new 'A' record to read through and parse all the following 'D' records. As it won't know when we've run out of 'D' records it also has to read the first record after the 'D' records and return that to the caller. Because of this we are also able to parse any front cover records specified on a 'cv' record (see below). Some notes to bear in mind are:


FORMAT_AUTHOR - Format an Author Name as heading (with scanitem as input)

/************************************************************************/
/* */
/* FORMAT_AUTHOR - Format an Author Name as heading */
/* */
/* Calling Format: */
/* */
/* outbuf_ptr = FORMAT_AUTHOR (config_ptr, scanitem_ptr, */
/* auth_flag, link_typ, got_items, */
/* prtfil_ptr) */
/* */
/* Where: */
/* */
/* outbuf_ptr = Pointer to buffer containing formatted name */
/* = NULL if an error */
/* = "" if we don't want this one */
/* config_ptr = Pointer to CONFIG_DATA structure */
/* scanitem_ptr = Pointer to SCANITEM structure */
/* auth_flag = FMTAUT_NEW if author name has changed */
/* = FMTAUT_CNT if we want a continuation name */
/* link_typ = Type of link required */
/* = LINK_UNKNOWN if unknown */
/* = LINK_STYAUT if story authors */
/* = LINK_BOKAUT if book authors */
/* = LINK_ARTIST if artists */
/* = LINK_CHRON if chronological index */
/* = LINK_BIOG if biographical notes */
/* = LINK_BIOG2 if biographical notes OR external */
/* got_items = PSP_TRUE if author has items in this index */
/* = PSP_FALSE otherwise */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/* This formats a story author into the format used for the headers */
/* in the listing file and for the middle-level index. */
/* */
/* Note that the returned string has an embedded </A> to terminate */
/* the <A HREF or <A NAME that will precede it. */
/* */
/************************************************************************/

This routine ...


FORMAT_AUTHOR2 - Format an Author Name as heading (with AUTH structure as input)

/************************************************************************/
/* */
/* FORMAT_AUTHOR2 - Format an Author Name as heading */
/* */
/* Calling Format: */
/* */
/* outbuf_ptr = FORMAT_AUTHOR2 (config_ptr, auth_ptr, link_typ, */
/* prtfil_ptr) */
/* */
/* Where: */
/* */
/* outbuf_ptr = Pointer to buffer containing formatted name */
/* = NULL if an error */
/* config_ptr = Pointer to CONFIG_DATA structure */
/* auth_ptr = Pointer to author name */
/* link_typ = Type of link required */
/* = LINK_UNKNOWN if unknown */
/* = LINK_STYAUT if story authors */
/* = LINK_BOKAUT if book authors */
/* = LINK_ARTIST if artists */
/* = LINK_CHRON if chronological index */
/* = LINK_BIOG if biographical notes */
/* = LINK_BIOG2 if biographical notes OR external */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/* This is a shell around FORMAT_AUTHOR that is called when we don't */
/* have a scanitem to hand. */
/* */
/************************************************************************/

This routine simply sets up the Normalised Author field in a local scanitem structure from the passed parameter and then calls FORMAT_AUTHOR.


FORMAT_AUTHTYPE - Format an Author Type as subheading

/************************************************************************/
/* */
/* FORMAT_AUTHTYPE - Format an Author Type as subheading */
/* */
/* Calling Format: */
/* */
/* outbuf_ptr = FORMAT_AUTHTYPE (auth_type) */
/* */
/* Where: */
/* */
/* outbuf_ptr = Pointer to formatted buffer */
/* auth_type = Author type */
/* */
/* This formats a story author type into the format used for the */
/* subheaders in the Story Author Listings File. */
/* */
/************************************************************************/

This routine checks to see if the SCANAUT_MAYBE flag is set and, if so, starts with ", [?]"; it then appends to the buffer a suffix such as ", after." that corresponds to the author type.
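
As a rough illustration of that two-step build, the sketch below uses stand-in constants: SKETCH_MAYBE, SKETCH_TYPE_MASK and SKETCH_AFTER are hypothetical, and the real values of SCANAUT_MAYBE and the author type codes are not shown in this document. Only the ", [?]" prefix and the ", after." suffix are taken from the description above.

```c
#include <string.h>

#define SKETCH_MAYBE     0x80   /* stand-in for SCANAUT_MAYBE (value assumed) */
#define SKETCH_TYPE_MASK 0x7f   /* hypothetical mask for the type code */
#define SKETCH_AFTER     1      /* hypothetical code for the "after" type */

/* Returns a pointer to an internal static buffer. */
const char *sketch_format_authtype(int auth_type)
{
    static char outbuf[64];
    const char *suffix;

    outbuf[0] = '\0';
    if (auth_type & SKETCH_MAYBE)
        strcpy(outbuf, ", [?]");            /* doubtful attribution */

    switch (auth_type & SKETCH_TYPE_MASK) {
    case SKETCH_AFTER:
        suffix = ", after.";
        break;
    default:
        suffix = "";                        /* other types omitted here */
        break;
    }
    strncat(outbuf, suffix, sizeof outbuf - strlen(outbuf) - 1);
    return outbuf;
}
```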


FORMAT_BOOK_DETAILS - Format a set of book details

/************************************************************************/
/* */
/* FORMAT_BOOK_DETAILS - Format book details */
/* */
/* Calling Format: */
/* */
/* FORMAT_BOOK_DETAILS (outbuf_ptr, fld_ptr, bufsiz) */
/* */
/* Where: */
/* */
/* outbuf_ptr = Pointer to output buffer */
/* fld_ptr = Array of field pointers */
/* bufsiz = Size of output buffer */
/* */
/************************************************************************/

This routine formats the contents of an 'A' record into the book detail format generated by FORMAT_PUBDET for a real abbreviation and is called from SETUP_BOKAUT_IDX for books without a real ID. It generates a string along the lines of:

Twelve Tales by Grant Allen, G. Richards, 1899

where the string has been made HTML-safe, the title and publisher have had any extraneous bits stripped off and the author/editor name(s) are formatted via BUILD_AUTH.


FORMAT_FULLTEXT_LINK - Format a Full Text link field

/************************************************************************/
/* */
/* FORMAT_FULLTEXT_LINK - Format a Full Text link field */
/* */
/* Calling Format: */
/* */
/* buff_ptr = FORMAT_FULLTEXT_LINK (link_ptr) */
/* */
/* Where: */
/* */
/* buff_ptr = Returned pointer to formatted field */
/* link_ptr = Pointer to link field */
/* */
/* This routine formats the contents of a full text link field. */
/* */
/************************************************************************/

This routine ...


FORMAT_ISSUE_LINK - Format link to specified issue (if possible)

/************************************************************************/
/* */
/* FORMAT_ISSUE_LINK - Format link to specified issue (if possible) */
/* */
/* Calling Format: */
/* */
/* link_ptr = FORMAT_ISSUE_LINK (config_ptr, pubdet_ptr, */
/* prtfil_ptr) */
/* */
/* Where: */
/* */
/* link_ptr = Pointer to Issue Details with link */
/* config_ptr = Pointer to CONFIG_DATA structure */
/* pubdet_ptr = Issue details to find link for */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/************************************************************************/

This routine calls GET_PUBDET_LINK to see if the specified issue/book is in our index and, if so, to return the page and anchor references for it. It then calls FORMAT_PUBDET to format the publication details into the format we want and returns a pointer to an internal static buffer containing the formatted publication details, surrounded (if appropriate) by an A HREF link to the associated part of the index.


FORMAT_MAGPUB - Format Magazine Issue Details

/************************************************************************/
/* */
/* FORMAT_MAGPUB - Format magazine issue details */
/* */
/* Calling Format: */
/* */
/* FORMAT_MAGPUB (pubdet_ptr, fld_ptr, bufsiz) */
/* */
/* Where: */
/* */
/* pubdet_ptr = Pointer to output buffer */
/* fld_ptr = Array of field pointers */
/* bufsiz = Size of output buffer */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/************************************************************************/

FORMAT_NAMES - Format a names field.

/************************************************************************/
/* */
/* FORMAT_NAMES - Format names field */
/* */
/* Calling Format: */
/* */
/* buff_ptr = FORMAT_NAMES (config_ptr, names_ptr, link_typ, */
/* format_typ, newtxt_ptr, prtfil_ptr) */
/* */
/* Where: */
/* */
/* buff_ptr = Returned pointer to formatted names */
/* = NULL if we had an error */
/* config_ptr = Pointer to CONFIG_DATA structure */
/* names_ptr = Pointer to names field */
/* link_typ = Type of link required */
/* = LINK_UNKNOWN if unknown */
/* = LINK_STYAUT if story authors */
/* = LINK_BOKAUT if book authors */
/* = LINK_ARTIST if artists */
/* = LINK_CHRON if chronological index */
/* = LINK_BIOG if biographical notes */
/* = LINK_BIOG2 if biographical notes OR external flag */
/* format_typ = FMTNAM_NORMAL if we want all names */
/* = FMTNAM_HOUSE if we only want names in the index */
/* and want dates added to each */
/* = FMTNAM_PSEUD if we only want names in the index */
/* newtxt_ptr = Replacement text to display */
/* (this is only valid if there is only a single primary */
/* name and no secondary names) */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/************************************************************************/

This routine formats the contents of a names field, adding links to the associated section for the name. This is a fairly complex routine that attempts to parse a string of names into the format required by the indexes. It is (seriously) complicated because the basic mechanism is common to a wide range of differing circumstances that need careful handling. For example:

A general complication is that we want to handle dividers neatly so that we have ", " separating all pairs of authors except the last pair which is separated by " & ", and we need to do this for the primary authors as a group, and for each set of (local or global) authors as a group. As it isn't easy to know when we've reached the last pair of authors (for primary authors because some might be suppressed; for secondary authors because there are multiple types held in the same structures) we handle this by inserting ", " dividers everywhere and remembering in each batch where we put them; then, at the end of the batch, we check to see if we had at least one divider and, if so, replace the last one with " & ".
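
The divider trick can be sketched as follows. This is only an illustration of the technique described above, not the code in FORMAT_NAMES: the NAME_BATCH structure and the batch_* helpers are hypothetical names. Note that " & " is one character longer than ", ", so the tail of the buffer has to be shifted when the last divider is replaced.

```c
#include <string.h>

typedef struct {
    char  *buf;        /* caller-owned output buffer (already a valid string) */
    size_t size;       /* total size of buf */
    long   last_div;   /* offset of the latest ", " in this batch, or -1 */
    int    count;      /* names added to this batch so far */
} NAME_BATCH;

void batch_start(NAME_BATCH *b, char *buf, size_t size)
{
    b->buf = buf;
    b->size = size;
    b->last_div = -1;
    b->count = 0;
}

/* Append a name, preceded by ", " if it is not the first in the batch. */
int batch_add(NAME_BATCH *b, const char *name)
{
    size_t len = strlen(b->buf);
    size_t need = strlen(name) + (b->count > 0 ? 2 : 0);

    /* One byte for the NUL plus one spare for the later " & " expansion. */
    if (len + need + 2 > b->size)
        return 0;
    if (b->count > 0) {
        b->last_div = (long)len;          /* remember where the divider went */
        strcpy(b->buf + len, ", ");
        len += 2;
    }
    strcpy(b->buf + len, name);
    b->count++;
    return 1;
}

/* At the end of the batch, turn the last ", " (if any) into " & ". */
void batch_end(NAME_BATCH *b)
{
    if (b->last_div >= 0) {
        char *div = b->buf + b->last_div;
        memmove(div + 3, div + 2, strlen(div + 2) + 1);   /* shift tail + NUL */
        memcpy(div, " & ", 3);
    }
    b->last_div = -1;
    b->count = 0;
}
```

A suppressed name is simply never passed to batch_add, so the routine does not need to know in advance which name will turn out to be the last one.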

In addition to the above, the routine is called in two different contexts. In general we have the whole author string to be formatted, but in STYAUT the author itself has already been output as the heading, so all we have to parse are the secondary names (which could be local or global). If we have both author name and secondary name then these are identical. Thus, for example:

	Maupassant, Guy de ,(tr:Wilson, Mary W.)

is translated as:

	Guy de Maupassant; translated by Mary W. Wilson

while:

	 ,(tr:Wilson, Mary W.)

is translated as:

	translated by Mary W. Wilson

However, complications arise if one of the names is specified as "Anon." or "[Unknown Author]" and we need to distinguish between the first case with a primary name of "Anon." and the second case where we don't have the primary name. For example:

	Anon. ,(by:Wilson, Mary W.)

is translated as:

	Mary W. Wilson, uncredited.

while:

	 ,(by:Wilson, Mary W.)

is translated as:

	(by Mary W. Wilson)

This is handled by two flags: got_anon, which is set in the former case, and no_primary_name, which is set in the latter. Note that got_anon is only set if there are some secondary names, as otherwise we just output the name as [uncredited].


FORMAT_NOTES - Format a Notes Field

/************************************************************************/
/* */
/* FORMAT_NOTES - Format a notes field */
/* */
/* Calling Format: */
/* */
/* status = FORMAT_NOTES (config_ptr, outbuf_ptr, notes_ptr, */
/* outbuflen, link_typ, prtfil_ptr); */
/* */
/* Where: */
/* */
/* status = PSP_TRUE if notes formatted successfully */
/* = PSP_FALSE otherwise */
/* config_ptr = Pointer to CONFIG_DATA structure */
/* outbuf_ptr = Pointer to output buffer */
/* notes_ptr = Pointer to input buffer */
/* outbuflen = Size of output buffer */
/* link_typ = Type of link required */
/* = LINK_UNKNOWN if unknown */
/* = LINK_STYAUT if story authors */
/* = LINK_BOKAUT if book authors */
/* = LINK_ARTIST if artists */
/* = LINK_CHRON if chronological index */
/* = LINK_BIOG if biographical notes */
/* = LINK_BIOG2 if biographical notes OR external flag */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/* This routine formats the contents of a notes field, translating */
/* any [@ or [% hyperlinks. */
/* */
/* Note that this routine APPENDS to the output buffer. */
/* */
/************************************************************************/

This routine ...


FORMAT_PUBDATE - Format Pub. Info. date field

/************************************************************************/
/* */
/* FORMAT_PUBDATE - Format Pub. Info. date field */
/* */
/* Calling Format: */
/* */
/* buff_ptr = FORMAT_PUBDATE (date_ptr) */
/* */
/* Where: */
/* */
/* buff_ptr = Returned pointer to formatted date */
/* = NULL if we had an error */
/* date_ptr = Pointer to date field */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/* This routine formats the contents of a date field in the Pub. Info. */
/* section. These are of the form [cc]yy/mm[/dd]. */
/* */
/************************************************************************/

This routine ...


FORMAT_SERIES - Format series field

/************************************************************************/
/* */
/* FORMAT_SERIES - Format series field */
/* */
/* Calling Format: */
/* */
/* buff_ptr = FORMAT_SERIES (config_ptr, series_ptr, include_flg, */
/* prtfil_ptr) */
/* */
/* Where: */
/* */
/* buff_ptr = Returned pointer to formatted names */
/* = NULL if we had an error */
/* = "" if no matching series */
/* config_ptr = Pointer to CONFIG_DATA structure */
/* series_ptr = Pointer to series field */
/* include_flg = PSP_TRUE if we want all series */
/* = PSP_FALSE if we only want series in the index */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/* This routine formats the contents of a series field, adding links */
/* to the associated section for the series. */
/* */
/************************************************************************/

FORMAT_URL - Format a URL specified in an FM data file

/************************************************************************/
/* */
/* FORMAT_URL - Format a URL specified in an FM data file */
/* */
/* Calling Format: */
/* */
/* outbuf_ptr = FORMAT_URL (url_ptr, text_flg, prtfil_ptr) */
/* */
/* Where: */
/* */
/* outbuf_ptr = Pointer to formatted buffer */
/* url_ptr = Pointer to input URL */
/* text_flg = PSP_FALSE if formatting the URL as hyperlink */
/* = PSP_TRUE if formatting the URL for display */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/* This formats a URL into the format required for use as a hyperlink */
/* or for display to the user. */
/* */
/************************************************************************/

This routine ...


GET_ANCHOR - Get the HTML to link to a specific anchor

/************************************************************************/
/* */
/* GET_ANCHOR - Get the HTML to link to a specific anchor */
/* */
/* Calling Format: */
/* */
/* link_ptr = GET_ANCHOR (config_ptr, prefix_ptr, page_count, */
/* anchor_count) */
/* */
/* Where: */
/* */
/* link_ptr = Pointer to HTML link */
/* config_ptr = Pointer to CONFIG_DATA structure */
/* prefix_ptr = Filename/folder prefix */
/* page_count = Page number */
/* anchor_count = Anchor number */
/* = -1 if just page link wanted */
/* */
/************************************************************************/

Depending on the setting in the Index Configuration File all the files in an index might be in a single folder or in multiple subfolders. To avoid checking this all over the place (and to allow for future flexibility) all links are set up via this routine which simply constructs the relevant HTML link.
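
A minimal sketch of the idea is given below. The file naming scheme ("<prefix><page>.htm"), the subfolder layout, the "#A<n>" anchor form and the use_subfolders parameter are all assumptions made for illustration; the real layout is determined by the Index Configuration File and is not described here. The sketch also builds only the link target rather than the surrounding HTML.

```c
#include <stdio.h>

/* Hypothetical: build a link target for a page (and, optionally, an anchor).
 * Returns a pointer to an internal static buffer, or NULL if it won't fit. */
const char *sketch_get_anchor(int use_subfolders, const char *prefix_ptr,
                              int page_count, int anchor_count)
{
    static char link[256];
    int n;

    if (use_subfolders)     /* assumed layout: one subfolder per prefix */
        n = snprintf(link, sizeof link, "../%s/%s%04d.htm",
                     prefix_ptr, prefix_ptr, page_count);
    else                    /* assumed layout: everything in one folder */
        n = snprintf(link, sizeof link, "%s%04d.htm", prefix_ptr, page_count);
    if (n < 0 || (size_t)n >= sizeof link)
        return NULL;

    if (anchor_count >= 0) {    /* -1 means "just the page link" */
        size_t left = sizeof link - (size_t)n;
        int m = snprintf(link + n, left, "#A%d", anchor_count);
        if (m < 0 || (size_t)m >= left)
            return NULL;
    }
    return link;
}
```

Keeping the folder decision in one place means a change of layout only touches this routine, which is the point made above.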


GET_ANCHOR2 - Get the HTML to link to a name index anchor

/************************************************************************/
/* */
/* GET_ANCHOR2 - Get the HTML to link to a name index anchor */
/* */
/* Calling Format: */
/* */
/* link_ptr = GET_ANCHOR2 (config_ptr, link_typ, page_count, */
/* anchor_count) */
/* */
/* Where: */
/* */
/* link_ptr = Pointer to HTML link */
/* config_ptr = Pointer to CONFIG_DATA structure */
/* link_typ = Type of link required */
/* = LINK_STYAUT if story authors */
/* = LINK_BOKAUT if book authors */
/* = LINK_ARTIST if artists */
/* = LINK_CHRON if chronological index */
/* = LINK_BIOG or LINK_BIOG2 if biographical notes */
/* page_count = Page number */
/* = -1 if just directory link wanted */
/* anchor_count = Anchor number */
/* = -1 if just page link wanted */
/* */
/* This is just a shell around GET_ANCHOR */
/* */
/************************************************************************/

This routine ...


GET_NAME_INDEX - Get index of specified name (if possible)

/************************************************************************/
/* */
/* GET_NAME_INDEX - Get index of specified name (if possible) */
/* */
/* Calling Format: */
/* */
/* index = GET_NAME_INDEX (name_ptr, link_typ, prtfil_ptr) */
/* */
/* Where: */
/* */
/* index = Index into tables for specified name */
/* = -1 if link not found */
/* name_ptr = Name to find link for */
/* link_typ = Type of link required */
/* = LINK_UNKNOWN if unknown */
/* = LINK_STYAUT if story authors */
/* = LINK_BOKAUT if book authors */
/* = LINK_ARTIST if artists */
/* = LINK_CHRON if chronological index */
/* = LINK_BIOG if biographical notes */
/* = LINK_BIOG2 if biographical notes OR external flag */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/************************************************************************/

This routine ...


GET_NAME_LINK - Get link to specified name (if possible)

/************************************************************************/
/* */
/* GET_NAME_LINK - Get link to specified name (if possible) */
/* */
/* Calling Format: */
/* */
/* GET_NAME_LINK (name_ptr, link_typ, pageno_ptr, anchor_ptr, */
/* prtfil_ptr) */
/* */
/* Where: */
/* */
/* name_ptr = Name to find link for */
/* link_typ = Type of link required */
/* = LINK_UNKNOWN if unknown */
/* = LINK_STYAUT if story authors */
/* = LINK_BOKAUT if book authors */
/* = LINK_ARTIST if artists */
/* = LINK_CHRON if chronological index */
/* = LINK_BIOG if biographical notes */
/* = LINK_BIOG2 if biographical notes OR external flag */
/* pageno_ptr = Pointer to int to hold page number */
/* = -1 if link not found */
/* = -2 if LINK_BIOG2 & we have external link */
/* anchor_ptr = Pointer to int to hold anchor */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/************************************************************************/

GET_PUBDET_LINK - Get offset of link to specified item (if possible)

/************************************************************************/
/* */
/* GET_PUBDET_LINK - Find set of publication details in issue links */
/* table (if possible) */
/* */
/* Calling Format: */
/* */
/* index = GET_PUBDET_LINK (pubdet_ptr) */
/* */
/* Where: */
/* */
/* index = Index into issue links table */
/* = -1 if no link found */
/* pubdet_ptr = Publication details to look for */
/* */
/************************************************************************/

This routine simply looks up the specified publication details in the isslinks array (first converting them to the "old" format if necessary) and returns the offset into the arrays (or -1 if the details can't be found).


GET_SERIES_LINK - Get link to specified series (if possible)

/************************************************************************/
/* */
/* GET_SERIES_LINK - Get link to specified series (if possible) */
/* */
/* Calling Format: */
/* */
/* GET_SERIES_LINK (series_ptr, pageno_ptr, anchor_ptr, prtfil_ptr)*/
/* */
/* Where: */
/* */
/* series_ptr = Series name to find link for */
/* pageno_ptr = Pointer to int to hold page number */
/* = -1 if link not found */
/* anchor_ptr = Pointer to int to hold anchor */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/************************************************************************/

READ_BOOK_FILE - Read book data file into consolidated book data file

/************************************************************************/
/* */
/* READ_BOOK_FILE - Read book data file & add to consolidated file */
/* */
/* Calling Format: */
/* */
/* status = READ_BOOK_FILE (filnam_ptr, outfil_ptr, prtfil_ptr); */
/* */
/* Where: */
/* */
/* status = Result of operation: */
/* = PSP_TRUE if OK; else PSP_FALSE */
/* filnam_ptr = Name of book data file */
/* outfil_ptr = Consolidated data file to write to */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/* This routine opens the specified book data file and copies all the */
/* contents to the output file, adding a sort prefix. */
/* */
/************************************************************************/

While each individual book data file will probably be sorted into order (thanks to ASORT) we will be combining multiple book data files so we need to sort them all into a consistent order. This routine is called for each book data file and simply copies the contents to the consolidated data file with a sort prefix for each record consisting of the output from LOCUS_COMPACT for the most recent/current 'A' record followed by a six-digit record number (to keep items in order within a book and books with the same compacted form in the same order in the data file) followed by a second '~' divider, followed by the input record.
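
The shape of the sort prefix can be sketched as below. The compacted key is passed in as a ready-made string (LOCUS_COMPACT itself is not reproduced), and the position of the first '~' divider, between the key and the record number, is an assumption inferred from the reference to a "second" divider. The name sketch_add_sort_prefix is hypothetical.

```c
#include <stdio.h>

/* Hypothetical: write "<key>~<6-digit record number>~<record>" into out.
 * Returns 1 on success, 0 if the number or the result does not fit. */
int sketch_add_sort_prefix(char *out, size_t outsiz, const char *compact_key,
                           unsigned long recno, const char *record)
{
    int n;

    if (recno > 999999UL)       /* would no longer be six digits */
        return 0;
    n = snprintf(out, outsiz, "%s~%06lu~%s", compact_key, recno, record);
    return n >= 0 && (size_t)n < outsiz;
}
```

Zero-padding the record number is what lets a plain text sort keep records in their original order within a book.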

Note that this routine:

The main complication is introduced by the use of DQN records to suppress an entire book. Early versions of the code left it up to the later code to handle this, but that led to page-sizing problems (as well as being inelegant). The next version tried to use fgetpos & fsetpos on the output file to reset it to just before the preceding 'A' record if we hit a DQN record, but that presented possible portability problems and, more to the point, didn't seem to work for obscure reasons. Rather than try to sort out the problem the next version got rid of fgetpos and fsetpos by using an array of buffers to read the records into:


READ_MAG_FILES - Read all magazine files into a consolidated magazine data file

/************************************************************************/
/* */
/* READ_MAG_FILES - Read magazine data files, cover scan file and */
/* full-text link file & create consolidated file */
/* */
/* Calling Format: */
/* */
/* status = READ_MAG_FILES (config_ptr, prtfil_ptr); */
/* */
/* Where: */
/* */
/* status = Result of operation: */
/* = PSP_TRUE if OK; else PSP_FALSE */
/* config_ptr = Pointer to CONFIG_DATA structure */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/************************************************************************/

This routine sets up sorted lists of cover scan, full text and about links and then reads through the magazine data files creating a consolidated data file with the relevant links added. Note that these are added as "extra" fields (FLD_IMGLNK & FLD_TXTLNK) at the end of the 'A' record. In an ideal world we could just read the files into memory and access them when needed (as we do for PSEUD.CVT and such-like) but I'm getting paranoid about memory usage and we only need the links when processing the 'A' records and can then free up the allocated memory. It also means we only have to deal with two input files in SETUP_ISSUE_IDX and BUILD_ISSUE_IDX rather than reading through all the magazine files each time.

Note that we check the cover scans first because there are many more of them (130,000 vs. 9,000) and, when adding the full-text links, we do a binary chop on the cover links part of what we've set up to save time (we initially did a sequential search and this module took 15 seconds or more). The about links are associated only with (main) feature records so are just added as separate entries. Note that the file containing the about links is not in a standard folder and hence is specified via the Index Configuration File.
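
The binary chop itself can be illustrated with the standard bsearch routine. The SKETCH_LINK structure, its field names and the idea of keying on a single issue string are assumptions made for this sketch; the real tables may be organised differently.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry in the sorted cover-scan list. */
typedef struct {
    const char *issue_key;    /* sort key identifying the issue */
    const char *cover_link;   /* cover scan link */
    const char *text_link;    /* full-text link, filled in later (may be NULL) */
} SKETCH_LINK;

/* bsearch comparator: the key is a string, the element is a SKETCH_LINK. */
static int sketch_cmp_link(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const SKETCH_LINK *)elem)->issue_key);
}

/* Find the entry for an issue in a list sorted by issue_key, or NULL. */
SKETCH_LINK *sketch_find_link(SKETCH_LINK *links, size_t count,
                              const char *issue_key)
{
    return bsearch(issue_key, links, count, sizeof links[0], sketch_cmp_link);
}
```

With a list of around 130,000 entries a binary chop needs at most 17 comparisons per lookup, against an average of some 65,000 for a sequential search.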


REPORT_DIAGS - Report Extended Diagnostics (if required)

/************************************************************************/
/* */
/* REPORT_DIAGS - Report Extended Diagnostics (if required) */
/* */
/* Calling Format: */
/* */
/* REPORT_DIAGS (config_ptr, report_type); */
/* */
/* Where: */
/* config_ptr = Pointer to CONFIG_DATA structure */
/* report_type = Type of data to report: */
/* = DMP_MAGLIST for Magazine File List */
/* = DMP_CONFIG for Configuration Data */
/* = IDX_ARTIST for Artist Index */
/* = IDX_BIOG for Biographical Notes */
/* = IDX_BOKAUT for Book Author Index */
/* = IDX_CHRON for Chronological Index */
/* = IDX_ISSUE for Issue Index */
/* = IDX_SERIES for Series Index */
/* = IDX_STYAUT for Story Author Index */
/* = IDX_FULLTXT for Full Text Index */
/* */
/************************************************************************/

This routine simply checks to see if extended diagnostics are required and, if so, dumps out the array(s) associated with the report type specified.


SPLIT_TITLE - Split title field into constituent parts

/************************************************************************/
/* */
/* SPLIT_TITLE - Split title field into constituent parts */
/* */
/* Calling Format: */
/* */
/* SPLIT_TITLE (title_ptr, titlad_ptr, itemad_ptr, colhdr_ptr_ptr, */
/* divider_ptr_ptr, title_ptr_ptr) */
/* */
/* Where: */
/* */
/* buff_ptr = Returned pointer to formatted title */
/* title_ptr = Pointer to title field */
/* titlad_ptr = Pointer to title additional field */
/* itemad_ptr = Pointer to item additional field */
/* colhdr_ptr_ptr = Pointer to hold pointer to column title */
/* divider_ptr_ptr = Pointer to hold pointer to divider */
/* title_ptr_ptr = Pointer to hold pointer to item title */
/* */
/* This routine combines the title and item additional fields with */
/* the title, intelligently handling sort titles, series dividers */
/* and story numbers, and returns pointers to the three elements of */
/* the title - the column/series name, the string to use as a divider */
/* and the item title (including any series sequence). The first */
/* two fields are returned as an empty string if not relevant. */
/* */
/************************************************************************/

This routine ...


TIDY_DNOTES - Tidy up contents of any 'D' Notes

/************************************************************************/
/* */
/* TIDY_DNOTES - Tidy up contents of any 'D' Notes */
/* */
/* Calling Format: */
/* */
/* outbuf_ptr = TIDY_DNOTES (config_ptr, inpbuf_ptr, prtfil_ptr); */
/* */
/* Where: */
/* */
/* outbuf_ptr = Pointer to buffer with tidied notes. */
/* config_ptr = Pointer to CONFIG_DATA structure */
/* inpbuf_ptr = Pointer to input buffer */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/* This routine checks to see if the 'D' notes need any special */
/* formatting (e.g. "--- see under xxx"). */
/* */
/************************************************************************/

This routine first checks to see if the notes start with the special prefix "--- see under " and, if so, checks the next character to see what sort of link is required:

Publication details and magazine names are converted via FORMAT_ISSUE_LINK (with a ':' prefixed in the latter case); names are converted to internal format via TRANSLATE_AUTH and then converted via FORMAT_NAMES. Magazine links are output in italics; book links in bold and author links normally.

If the notes do not start with the special prefix then the routine simply calls the shared FORMAT_NOTES routine.
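
The prefix test is straightforward; a minimal sketch follows. The helper name sketch_see_under_target is hypothetical, and the sketch deliberately stops at returning the text after the prefix, leaving the choice of link type to the caller.

```c
#include <string.h>

#define SEE_UNDER_PREFIX "--- see under "

/* Returns a pointer to the text following the special prefix,
 * or NULL if the notes do not start with it. */
const char *sketch_see_under_target(const char *notes_ptr)
{
    size_t plen = strlen(SEE_UNDER_PREFIX);

    if (notes_ptr != NULL && strncmp(notes_ptr, SEE_UNDER_PREFIX, plen) == 0)
        return notes_ptr + plen;    /* first character selects the link type */
    return NULL;
}
```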


WRITE_INDEX_LINE - Write a line to the intermediate index(es)

/************************************************************************/
/* */
/* WRITE_INDEX_LINE - Write a line to the intermediate index(es) */
/* */
/* Calling Format: */
/* */
/* status = WRITE_INDEX_LINE (config_ptr, reqtype, text_ptr, */
/* prtfil_ptr); */
/* */
/* Where: */
/* */
/* status = Result of operation: */
/* = PSP_TRUE if OK; else PSP_FALSE */
/* config_ptr = Pointer to CONFIG_DATA structure */
/* reqtype = Request Type: */
/* = IDXLIN_NORMAL for normal index line */
/* = IDXLIN_CONT for continuation line */
/* = IDXLIN_SPECIAL for special index lines */
/* = IDXLIN_TRAIL for trailers */
/* text_ptr = text to use for continuation or for special */
/* (if IDXLIN_CONT or IDXLIN_SPECIAL) */
/* = text to use for continuation otherwise */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/* This routine outputs a line to the Level 1 Index, throwing a new */
/* page and adding a line to the Level 2 Index, etc., if necessary. */
/* */
/************************************************************************/

This routine is called when a line is to be output to the intermediate (Level 1 to 3) Indexes and implements the 3-level index structure described in the Basic Index Structure section. See the section on using WRITE_INDEX_LINE for an outline description of how to use the routine.

It outputs the relevant line to the Level 1 Index; if necessary it also throws a new page and adds a line to the Level 2 Index and, if also necessary, throws a new page in the Level 2 Index and adds a line to the Level 3 Index. In addition to the formal parameters, it relies on the following fields in config_ptr:

It operates in one of four modes depending on the contents of the reqtype parameter. By default this is set to IDXLIN_NORMAL indicating that a normal index line is to be output. If so:

If reqtype is set to IDXLIN_SPECIAL the routine continues exactly as above except that the line for the level 1 index is passed in the text_ptr parameter rather than being set up automatically (this is used for the Story and Book Title Indexes).

If reqtype is set to IDXLIN_CONT then we have a continuation line. In this case we still count a line in the level 1 index and output the continuation line, but do not throw any new pages.
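
The page-throwing cascade for normal and continuation lines can be modelled with a few counters. This is only a sketch of the counting logic under stated assumptions: the SKETCH_IDX structure and its fields are hypothetical (the real state lives in config_ptr), SKETCH_MAXPAGESIZE is set artificially low, and no output is actually written. It assumes each line in a higher level stands for one page of the level below, and that the top level is a single page that is never split.

```c
#define SKETCH_LEVELS      3
#define SKETCH_MAXPAGESIZE 3    /* artificially small for illustration */

typedef struct {
    int lines[SKETCH_LEVELS];   /* lines on the current page; [0] = Level 1 */
    int pages[SKETCH_LEVELS];   /* pages started so far at each level */
} SKETCH_IDX;

void sketch_idx_init(SKETCH_IDX *s)
{
    int i;
    for (i = 0; i < SKETCH_LEVELS; i++) {
        s->lines[i] = 0;
        s->pages[i] = 0;
    }
}

/* Count one line at the given level (0 = Level 1), starting a new page
 * first if needed and recording that page in the level above. */
void sketch_add_line(SKETCH_IDX *s, int level, int is_continuation)
{
    int first_line = (s->pages[level] == 0);
    int page_full  = (s->lines[level] >= SKETCH_MAXPAGESIZE);
    int top_level  = (level == SKETCH_LEVELS - 1);

    /* Continuation lines are counted but never throw a new page. */
    if (first_line || (page_full && !is_continuation && !top_level)) {
        s->pages[level]++;
        s->lines[level] = 0;
        if (!top_level)
            sketch_add_line(s, level + 1, 0);   /* one entry per new page */
    }
    s->lines[level]++;
}
```

With a page size of 3, ten normal lines produce four Level 1 pages, which in turn need two Level 2 pages and two lines on the single Level 3 page.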

If reqtype is set to IDXLIN_TRAIL then we need to clean up for this index and reset for the next index:


WRITE_PAGE_HEADER - Write a standard header to a file

/************************************************************************/
/* */
/* WRITE_PAGE_HEADER - Write a standard header to the specified file */
/* */
/* Calling Format: */
/* */
/* status = WRITE_PAGE_HEADER (config_ptr, outfil_ptr, pagttl_ptr, */
/* prvlnk_ptr, homlnk_ptr, prtfil_ptr);*/
/* */
/* Where: */
/* */
/* status = Result of operation: */
/* = PSP_TRUE if OK; else PSP_FALSE */
/* config_ptr = Pointer to CONFIG_DATA structure */
/* outfil_ptr = File to write header to */
/* pagttl_ptr = Title of page */
/* prvlnk_ptr = Previous link (if NULL, none created) */
/* homlnk_ptr = Index home link (if NULL, none created) */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/************************************************************************/

This routine simply reads through the Page Header boilerplate file (index_hdr.htm), making the necessary substitutions and outputting the result to the specified file. Note that, for performance reasons, the whole boilerplate file is stored in memory when it is first read to avoid unnecessary file I/O.


WRITE_PAGE_TRAILER - Write a standard trailer to a file

/************************************************************************/
/* */
/* WRITE_PAGE_TRAILER - Write a standard trailer to the specified file */
/* */
/* Calling Format: */
/* */
/* status = WRITE_PAGE_TRAILER (config_ptr, outfil_ptr, */
/* nxtlnk_ptr, homlnk_ptr, */
/* prtfil_ptr); */
/* */
/* Where: */
/* */
/* status = Result of operation: */
/* = PSP_TRUE if OK; else PSP_FALSE */
/* config_ptr = Pointer to CONFIG_DATA structure */
/* outfil_ptr = File to write trailer to */
/* nxtlnk_ptr = Next link (if NULL, none created) */
/* homlnk_ptr = Index home link (if NULL, none created) */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/************************************************************************/

This routine simply reads through the Page Trailer boilerplate file (index_trl.htm), making the necessary substitutions and outputting the result to the specified file. Note that, for performance reasons, the whole boilerplate file is stored in memory when it is first read to avoid unnecessary file I/O.


WRITE_PUB_INFO - Format and output all "Pub Info" records

/************************************************************************/
/* */
/* WRITE_PUB_INFO - Format and output all "Pub Info" records */
/* */
/* Calling Format: */
/* */
/* status = WRITE_PUB_INFO (config_ptr, inpfil_ptr, outfil_ptr, */
/* nxtrec_ptr_ptr, prtfil_ptr) */
/* */
/* Where: */
/* */
/* status = Result of operation: */
/* = PSP_TRUE if OK; else PSP_FALSE */
/* config_ptr = Pointer to CONFIG_DATA structure */
/* inpfil_ptr = Pointer to input file */
/* outfil_ptr = File to write output to */
/* nxtrec_ptr_ptr = Pointer to buffer to hold pointer to record */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/************************************************************************/

This routine ...


WRITE_STORY_ITEM - Write a single story item group

/************************************************************************/
/* */
/* WRITE_STORY_ITEM - Write a single story item group */
/* */
/* Calling Format: */
/* */
/* status = WRITE_STORY_ITEM (config_ptr, outfil_ptr, fldbuf_ptr, */
/* fldbufsiz, scanitem_ptr, newpage_flg,*/
/* prtfil_ptr) */
/* */
/* Where: */
/* */
/* status = Result of operation: */
/* = PSP_TRUE if OK; else PSP_FALSE */
/* config_ptr = Pointer to CONFIG_DATA structure */
/* outfil_ptr = File to write output to */
/* fldbuf_ptr = Pointer to buffer for current record */
/* fldbufsiz = Size of buffer for current record */
/* scanitem_ptr = Pointer to SCANITEM structure for current record */
/* newpage_flg = PSP_TRUE if we've just thrown a new page */
/* = PSP_FALSE otherwise */
/* prtfil_ptr = Pointer to diagnostics file (may be NULL) */
/* */
/************************************************************************/

This routine formats and outputs all records which have the same author, author type, title, coauthors, secondary names and ED notes. While this seems fairly simple, it has turned out to be fiendishly complex for a number of reasons, primarily to do with aggregation (see the discussion under Notes on Some (Past) Problem Areas for some examples). For simple, uncomplicated, items the intent is simply to display the first appearance of an item, followed by any reprint information on subsequent lines (in chronological order) as in:

In this simple case, we have entries in the sorted data for the original appearance as well as the reprints, each of which identifies the original appearance. However, there are cases where all we have is an instance defining the reprint appearance which may or may not identify the original appearance, as in (in the WFI):

Clearly, in this case, we need to output the first line (even if the first appearance is unknown) as part of processing the second line, while in the previous case above we want to suppress the first line for the reprints as we have already displayed it. We handle this by checking the pubdet_ptr field in each scanitem structure to see if it is the same as the previous one.
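As a rough illustration of that check, the sketch below decides whether the original-appearance line needs to be written for a record. The cut-down structure, the function name need_first_line and the use of a string comparison (rather than a pointer comparison) are assumptions; only the pubdet_ptr field name comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-down SCANITEM: only the field discussed above. */
typedef struct {
    const char *pubdet_ptr;   /* original publication details (may be NULL) */
} scanitem_sketch;

/* Return 1 if the original-appearance line must be written for cur,
   or 0 if it would repeat the line already written for prev. */
static int need_first_line(const scanitem_sketch *prev,
                           const scanitem_sketch *cur)
{
    assert(cur != NULL);
    if (prev == NULL)
        return 1;                       /* first record in the set */
    if (prev->pubdet_ptr == NULL || cur->pubdet_ptr == NULL)
        return prev->pubdet_ptr != cur->pubdet_ptr;   /* both NULL = same */
    return strcmp(prev->pubdet_ptr, cur->pubdet_ptr) != 0;
}
```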

There are even (obscure) cases where we have (vague) original appearance data without any reprint information, as in:

in which case we need to output the first line but not the second line even though we are dealing with a reprint record (in this case magid_ptr is empty).

However, if the item is a serial (say) then we don't want to list each part separately so we try to aggregate all the instances into a single line so that, if the above were serialised over three issues, it might appear as:

There are several points to note about this simple case:

To make life even more interesting, items can appear as a single item in their original appearance but as multiple items when reprinted, or vice versa (note that the first of these is particularly common if the original appearance lies outside the current index), as in (in the WFI):

or:

This latter highlights yet another problem with aggregation, albeit one new to the v2 indexes, i.e. bylines. Although this example is fairly straightforward, when we are talking about a long column it is possible that the column will appear under multiple bylines (the most common example known being when an editor uses their initials for some instances). In a perfect world we would list these separately, but that would mean sorting by byline_ptr before dtpubl_ptr, which would disrupt the "normal entries". We might be able to "put aside" any with a different byline while we process the current set, or come up with an alternative (e.g. append the byline to only those issues that differ, which is fine as long as there isn't an overall byline to display as well). All this has been left as an exercise for later: for now we just (try to) ignore such differences.

To try to manage the above, the routine has a bewildering number of flags it sets and checks:

The first part of the routine simply sets up the item header that we need whether we have one item or a thousand:

Note that this potentially causes problems with book reviews (see the discussion under Notes on Some (Past) Problem Areas below) as we might want to change the byline in the item header within a single set of results. We bodge this by saving the offsets in the item header before and after we add the byline so that we can reset it when needed.
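The offset-saving bodge can be pictured roughly as below. This is a sketch under assumptions, not the actual routine: the structure, the fixed 256-byte buffer and the function names are invented for the example, and the real item header is built differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical item header with the two saved offsets. */
typedef struct {
    char   text[256];
    size_t byline_start;   /* offset just before the byline was added */
    size_t byline_end;     /* offset just after the byline was added  */
} item_header_sketch;

/* Append s to the header text, truncating if the buffer is full. */
static void header_append(item_header_sketch *h, const char *s)
{
    size_t len = strlen(h->text);
    strncat(h->text, s, sizeof h->text - 1 - len);
}

/* Add the byline for the first time, remembering where it sits. */
static void header_add_byline(item_header_sketch *h, const char *byline)
{
    h->byline_start = strlen(h->text);
    header_append(h, byline);
    h->byline_end = strlen(h->text);
}

/* Swap in a different byline, keeping whatever followed the old one. */
static void header_reset_byline(item_header_sketch *h, const char *byline)
{
    char tail[sizeof h->text];

    assert(h->byline_start <= h->byline_end);
    assert(h->byline_end <= strlen(h->text));
    strcpy(tail, h->text + h->byline_end);   /* save text after the byline */
    h->text[h->byline_start] = '\0';         /* cut back to before it      */
    header_append(h, byline);
    h->byline_end = strlen(h->text);
    header_append(h, tail);
}
```

Because both offsets are kept, the header text before and after the byline is left untouched when the byline changes part-way through a set of results.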

As we read them in we organise them into groups using a set of interlinked structures:

The routine first sets up the entries for the first item (i.e. the one in scanitem_ptr when the routine was first called) - in the vast majority of cases this will be the only item in the set. It then reads the next item and checks to see if it:

If all these are true then it aggregates the item with the previous one(s). It first checks to see if it can be added to an existing group, i.e.:

If not, it creates a new group. It then creates an entry in the Item Array (and possibly the Item Extras Array) and links it to the matched or new group. It then goes back to read the next item and repeats the above.

Once we have all the arrays set up, we need to tidy them up and sort them into the order necessary.

First we look at each reprint group (edition=2 or book_type=3) and try to find a prior group with an item with matching publication details. Generally there will always be one such because the code in SETUP_STYAUT_IDX always tries to create one, and the presence of the edition in the sort order will sort it before the reprints (and hence the group will be set up earlier). If it finds a match then it saves the Group ID as the Parent ID.

It then runs through the groups again, looking for original groups (edition=1 or book_type=1) or an orphaned reprint (i.e. parent_sub = -1) adding each to the ordered list of groups. For each group it finds it then checks the remaining groups for which it is parent and adds them next before moving on. Thus, for a complex set-up we should end up with:
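The ordering pass described above might be sketched as follows. The structure and function names are hypothetical, the is_reprint flag stands in for the edition/book_type tests, and the sketch handles only the single parent/child level described here (a reprint whose parent is itself a parented reprint would not be placed).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down group record. */
typedef struct {
    int is_reprint;   /* 0 = original group, 1 = reprint group          */
    int parent_sub;   /* index of the parent group, or -1 if none found */
} group_sketch;

/* Fill order[] with group indexes: each original group or orphaned
   reprint, followed immediately by the reprint groups that name it as
   parent. Returns the number of entries written to order[]. */
static size_t order_groups(const group_sketch *grp, size_t count,
                           size_t *order)
{
    size_t n = 0, i, j;

    assert(grp != NULL && order != NULL);
    for (i = 0; i < count; i++) {
        if (grp[i].is_reprint && grp[i].parent_sub != -1)
            continue;                   /* placed after its parent instead */
        order[n++] = i;
        for (j = 0; j < count; j++) {
            if (grp[j].is_reprint && grp[j].parent_sub == (int)i)
                order[n++] = j;         /* child reprint follows parent */
        }
    }
    return n;
}
```

For example, with two original groups (0 and 1), a reprint of each (2 and 4) and an orphaned reprint (3), the resulting order is 0, 2, 1, 4, 3.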

We then step through each group in turn. If the original title


Notes on Some (Past) Problem Areas

Aggregation Problems

Many of the items in the database effectively contain two different pieces of publication data – details of the current publication and of the original publication – although obviously in many cases these are the same and/or the second one is missing. In most cases there is also a record in the database for the original publication so we don't need to worry about it, but there are many instances where this is not the case.

If the original publication was under a different byline or title then this has always been catered for in FLUSH_SORT, as we would need records in different areas of the sort hierarchy. However, if this was not the case, the issue was originally postponed until WRITE_STORY_ITEM, where the code checked to see if we already had the original publication and, if not, output it. This works fine for simple cases, but led to problems in aggregation for two reasons:

A brief attempt was made to fix this by creating new records in FLUSH_SORT for such records but this seemed to introduce more problems (not least a major increase in the size of the files) so a second attempt (in 2024) focussed on SETUP_STYAUT_IDX. Having sorted the data into the order required for the STYAUT (i.e. Names) index, the routine then read through the data sequentially and for each set of records for a given item checked that we had a record for the original publication and, if not, added one. This worked pretty well, but there were three immediate complications:

Even with the above fix there was still a problem with book reviews. Initially there was an additional check in the aggregation check to see if the next item "was either not about the specified author or had the same byline on the piece about the author". This works fine when the only reviews of a given book are by the same author. However, if the same book is reviewed by multiple different authors, and those reviews are then reprinted elsewhere (very common in British Reprint Editions of SF magazines) this breaks as all the original editions are sorted before all the reprint editions so that for each review the original and the reprint are probably not adjacent and hence end up in separate groups. Both before and after the above fix this meant the same review was listed twice, separated by other reviews of the same book by other people.

This was fixed by removing the check in WRITE_STORY_ITEM that, when aggregating items for a "subject", they were all by the same byline, and instead handling this via the group mechanism as discussed under WRITE_STORY_ITEM.