Desirable Attributes for Interactive Identification Programs

Dallwitz, M. J., Paine, T. A. and Zurcher, E. J. (1998). Interactive keys. In ‘Information Technology, Plant Pathology and Biodiversity’, pp. 201-212. (Eds P. Bridge, P. Jeffries, D. R. Morse, and P. R. Scott.) (CAB International: Wallingford.)
Error tolerance. The ability to reach a correct identification after errors have been made, or if there are errors in the data.
Unrestricted character use. The absence of restrictions on the order in which characters can be used (apart from restrictions imposed by character dependencies - see below).
Numeric characters. Whether numeric characters can be used directly (without dividing them into ranges).
Multiple state selection. Whether the user can specify uncertainty by entering more than one state value, or a range of numeric values.
‘Best’ characters. Whether the program can advise on the most suitable characters for use at any stage of an identification. (Undesirable limitations: a lack of flexibility in this area. Examples: Inability to handle numeric characters. Recommendations built into the data, as in a conventional key or a rule-based expert system.)
Character weighting. Whether character weights can be used in the calculation of ‘best’ characters. (Undesirable limitations: higher weights always implying ‘better’ characters, regardless of other considerations.)
Character deletion/changing. Whether characters used in an identification can be removed, or their values changed. (Undesirable limitations: removal only in the reverse order of use.)
Locating errors. Whether the program can locate user and/or data errors which were circumvented by the error tolerance mechanism.
Gaps for integer numeric characters. Whether recorded values for integer numeric characters can contain gaps, e.g. ‘5 or 10’ distinguishable from ‘5 to 10’.
Character dependencies. Whether the program is aware of character dependencies, i.e., characters which are inapplicable when other characters take certain values.
No dependency restrictions. Whether there are restrictions on the order in which dependent/controlling characters may be used.
Characters for separating a particular taxon. Whether characters can be ranked according to how well they separate a given taxon from the rest.
Inapplicable/unknown. Whether inapplicable values, including those not resulting from character dependencies, are distinguished from unknown values.
Text characters. Whether free-text information about taxa can be stored and searched.
Expanded ranges for numeric characters. Whether single numeric values in the original data can be treated as ranges for identification purposes. (Undesirable limitations: the transformation not being under the control of the user of the key (as in the ABSOLUTE/PERCENTAGE ERROR mechanisms in Confor/Intkey).)
Flagging of misapplied character values. Whether it is possible to flag character values which are likely to be assigned to a taxon in error. (Undesirable limitations: the use of the flagged values not being under the control of the user of the key.)
Probabilistic identification. Whether the program can use probabilistic identification methods.
Global restriction to subsets. Whether it is possible to specify subsets of characters and taxa to which all subsequent operations will be restricted.
Local restriction to subsets. Whether it is possible to specify subsets of characters and taxa for the operation of a single command.
Fixing character values. Whether it is possible to specify character values which are not to be cleared when a new identification is started.
Named subsets of the characters and taxa. Whether there is a mechanism for referring to subsets of the characters and taxa. (Undesirable limitations: subsets being built into the identification package, and not definable by the user.)
Character notes. Whether extensive text to aid interpretation of characters can be conveniently available within the system.
Glossaries. Whether definitions of terminology can be conveniently available within the system.
Information retrieval. Whether the program can be used for information retrieval (i.e. finding all taxa which have certain combinations of attributes).
Descriptions. Whether the program can display descriptions generated from the data used in identification.
Differences between taxa. Whether the program can find the differences between members of a set of taxa. (Undesirable limitations: restrictions on the size of the set of taxa.)
Similarities between taxa. Whether the program can find the similarities between members of a set of taxa.
Diagnostic descriptions. Whether the program can find diagnostic descriptions. (Undesirable limitations: inability to distinguish between taxon and specimen diagnostic descriptions; inability to restrict the choice of characters to those not used in the current identification; inability to set the strength of the descriptions (e.g. DiagLevel in Intkey).)
Control of value matching. Whether the user has control over whether overlapping, unknown, and inapplicable values are deemed to match other values. (Undesirable limitations: limited control, e.g. ‘identification’ vs. ‘information retrieval’ settings.)
Character-value distributions. Whether the program can display the distribution of character values within a set of taxa.
Searching the character list. Whether the program can find text strings in the character list.
Searching for taxon names. Whether the program can search the taxon names and synonyms. (Undesirable limitation: separate searching for correct taxon names and synonyms.)
Character illustrations. Whether illustrations of characters can be displayed.
State selection from character illustrations. Whether character state values can be selected from illustration screens during identification.
Taxon illustrations. Whether illustrations of taxa can be displayed.
Flexible display of illustrations. Whether illustrations of any size can be scaled, scrolled, repositioned, and displayed simultaneously.
Text on illustrations. Whether text can be superimposed on illustrations (instead of being built into the illustrations). (Legible text after scaling, possibility of multiple languages.)
Running without illustrations. Whether a package containing illustrations can be used without them.
Import DELTA format. Whether DELTA-format data can be used to create the interactive system.
Export DELTA format. Whether DELTA-format data can be exported from the interactive system.
Links with description writing. Whether publication-quality descriptions can be generated from the same data that are used to construct the identification system.
Links with key generation. Whether conventional keys can be generated from the same data that are used to construct the identification system.
Links with classification. Whether cladistic and phenetic analyses can be carried out from the same data that are used to construct the identification system.
Command files or macros. Whether there is a mechanism for storing and repeating a series of operations.
Log files. Whether it is possible to create a file showing the history (input and output) of a session.
Data output. Whether it is possible to output program results in forms suitable for input to other programs.
Online help. Whether the program has complete, built-in help. (Undesirable limitations: help is not context sensitive.)
External program text. Whether the program text (commands, help, messages, etc.) is external to the program, allowing easy creation and use of different language versions.
Unlimited field lengths. Whether the lengths of text and other fields (e.g. taxon names, text of characters, character notes, number of character states) are unlimited.
Unlimited data size. Whether the numbers of characters and taxa are unlimited.
No special memory requirements. Whether the program will run with the minimum amount of memory normally needed to run the operating system (including dependence on data size, if applicable).
Execution speed. Execution times of representative operations on a reasonably large data set (e.g. 200 characters, 400 taxa).
Internet capability. Whether the program can access data and images over the Internet.
Simple user interface. Whether the user interface is simple, efficient, and consistent, and provides an easy transition from use of basic features to full functionality.
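
As an illustration of how three of the attributes above (error tolerance, multiple state selection, and gaps for integer numeric characters) might interact, here is a minimal Python sketch. It is not taken from Intkey or any other package: the names Taxon, count_mismatches and remaining_taxa, the sample data, and the choice to treat unrecorded characters as matching are all assumptions made for the example. A taxon stays in contention as long as its number of mismatching characters does not exceed the tolerance.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Taxon:
    """Hypothetical record: character name -> set of recorded integer values."""
    name: str
    states: Dict[str, Set[int]] = field(default_factory=dict)


def count_mismatches(taxon: Taxon, observations: Dict[str, Set[int]]) -> int:
    """Count the observed characters whose values share nothing with the taxon's."""
    mismatches = 0
    for character, chosen in observations.items():
        recorded = taxon.states.get(character)
        if recorded is None:
            # Assumption: an unrecorded (unknown) character never counts against a taxon.
            continue
        if not (recorded & chosen):
            mismatches += 1
    return mismatches


def remaining_taxa(taxa: List[Taxon], observations: Dict[str, Set[int]],
                   tolerance: int = 0) -> List[str]:
    """Keep the taxa whose mismatch count does not exceed the tolerance."""
    return [t.name for t in taxa if count_mismatches(t, observations) <= tolerance]


taxa = [
    # '5 or 10' spines: the recorded set has a gap, so 7 does not match.
    Taxon("A", {"petals": {5}, "leaf_shape": {1}, "spines": {5, 10}}),
    # '5 to 10' spines: every integer in the range is recorded, so 7 matches.
    Taxon("B", {"petals": {4}, "leaf_shape": {1}, "spines": set(range(5, 11))}),
    # Spines not recorded for this taxon.
    Taxon("C", {"petals": {5}, "leaf_shape": {2}}),
]

# Multiple state selection: the user is unsure whether leaf_shape is state 1 or 2.
observations = {"petals": {5}, "leaf_shape": {1, 2}, "spines": {7}}

strict = remaining_taxa(taxa, observations, tolerance=0)
tolerant = remaining_taxa(taxa, observations, tolerance=1)
print(strict)
print(tolerant)
```

With a tolerance of 0, only taxon C survives; raising the tolerance to 1 brings A and B back, each having exactly one mismatch. A program that supports 'locating errors' could go on to report which characters produced those mismatches.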