ICTVdB - A Universal Virus Database

Descriptions of ICTV Approved Virus Taxa

C. Büchen-Osmond and M. J. Dallwitz

(cf. Arch. Virol. (1996). "Towards a universal virus database - progress in ICTVdB" 141: 392-399, for the full-length version of this introduction to the ICTVdB project.)




Origins and Goals

In 1991, the Executive Committee of the ICTV committed itself to develop a universal virus database (ICTVdB), in response to, among other things, a petition brought forward by the American Type Culture Collection (ATCC). In the petition, drafted during a workshop in March 1990, leading virologists and database experts agreed on the need for a database for all viruses, which would be in conformity with the ICTV Reports and prepared under the auspices of the Committee. This paper concentrates on progress since that reported at the CODATA conference, Chambéry, September 1994 (Büchen-Osmond et al., 1996).

The goal of the ICTVdB is to describe all viruses of animals (vertebrates, invertebrates, protozoa), plants (higher plants and algae), bacteria, fungi, and archaea from the family level down to strains and isolates. The lower levels of classification have important applications in medicine and agriculture, but also give insight into evolutionary trends. The database will thus benefit research and applications at all levels of expertise.

The Language

The ICTV decided that the database should use the DELTA system, a DEscription Language for TAxonomy developed by Dallwitz (1980). The DELTA system is an integrated set of programs based on the DELTA format. The DELTA format is a flexible and powerful method of recording taxonomic descriptions for computer processing.

The DELTA system is capable of producing high-quality printed descriptions as well as descriptions in HyperText Markup Language (HTML) for display on the Web. DELTA data can include any amount of text to qualify or amplify the coded information, and this text can be carried through into the descriptions. Common features can be omitted from the data and the descriptions, while remaining available for identification and analysis. These attributes are exemplified in books such as Viruses of Plants in Australia (Büchen-Osmond et al., 1988) and The Grass Genera of the World (Watson & Dallwitz, 1992), which were generated automatically from DELTA databases.

The DELTA system is particularly useful in the international context envisaged by ICTV. For example, Intkey packages can be prepared in different spoken languages simply by translating the character list. Intkey itself is particularly easy to translate into other languages, as all of the program text (menus, commands, prompts, diagnostic messages, and help) is in simple text files separate from the program files. English, French, German, Malay, Portuguese, and Spanish versions are currently available.

Decimal code - identifiers in virus classification

Early in the development of ICTVdB it became clear that it is often difficult to identify a virus uniquely, because names used at the level of family, genus and species can be very similar, and the vernacular name usually does not indicate the family or genus to which the virus belongs. Thus, for the purposes of the database it was found to be convenient to use a decimal numbering system similar to that used for enzyme nomenclature (Büchen-Osmond et al., 1996). The families have been sorted in alphabetical order and each has been assigned a number which represents a particular family, or genus, if the genus is not yet assigned to a family (Table 1). The system can carry more levels to accommodate strains and isolates, and in common with present practice in genomic and protein databases, new families, revisions, etc., are added as they appear without alphabetical consideration.
Table 1: Decimal classification illustrated from family to species in the Parvoviridae


This type of numbering system allows the user to follow the path linking the various features of one genus or family, and permits the presentation of similarities between different groups at more than one level. The numbering system also gives an internal structure to the database that indicates the descriptors needed for completing the coding of a specific virus or data set. With this numbering system the database is structured such that we do not need to repeat the same information between levels. When calling up a description of a strain we can supply the pointers to the next higher level where the full description of the species is stored as illustrated in Table 2.


Table 2: Natural language excerpt from ICTVdB illustrating the economic, non-redundant accumulation of information using the decimalised virus nomenclature.

00.050. Parvoviridae

00.050.1. Parvovirinae

00.050.1.01. Parvovirus

00.050.1.01.001. Minute virus of mice


The numbers assigned to each virus also serve as locator numbers within the database and as an unchanging reference. The locator number is easily transformed into a file accession number by appending htm. In the family Parvoviridae, for example, the family level becomes 00.050.htm, the genus level (the genus Parvovirus) becomes 00.050.1.01.htm, and the type species Minute virus of mice becomes 00.050.1.01.001.htm. These accession numbers are used throughout the ICTVdB as file names to access the computer-generated virus descriptions. Thus they can also be used as pointers to link from other databases to a particular virus description at this server.
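The conversion from locator number to file name, and the pointers from a species or strain to its higher levels, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the family level is assumed to have two components (as in 00.050), and the "example_character" value is a placeholder rather than ICTVdB data.

```python
def locator_to_filename(locator):
    """Append 'htm' to a decimal locator: '00.050' -> '00.050.htm'."""
    return locator.strip().rstrip(".") + ".htm"


def higher_levels(locator, min_parts=2):
    """List the locators of all higher levels, nearest level first.

    Assumption: the family level has two components (e.g. '00.050'),
    as in the Parvoviridae example in the text.
    """
    parts = locator.strip().rstrip(".").split(".")
    return [".".join(parts[:n]) for n in range(len(parts) - 1, min_parts - 1, -1)]


def full_description(locator, records):
    """Merge the characters stored at each level, family first, so that
    information recorded once at a higher level need not be repeated."""
    merged = {}
    for level in reversed(higher_levels(locator)):
        merged.update(records.get(level, {}))
    merged.update(records.get(locator, {}))
    return merged


# Placeholder records: the taxon names come from the text; the
# 'example_character' entry is invented purely to show the mechanism.
records = {
    "00.050": {"family": "Parvoviridae",
               "example_character": "recorded once at family level"},
    "00.050.1": {"subfamily": "Parvovirinae"},
    "00.050.1.01": {"genus": "Parvovirus"},
    "00.050.1.01.001": {"species": "Minute virus of mice"},
}

species = "00.050.1.01.001"
print(locator_to_filename(species))   # 00.050.1.01.001.htm
print(higher_levels(species))
print(full_description(species, records))
```

In this sketch the species record itself holds only its own name; everything else is reached by following the locator upwards, which is the non-redundant arrangement described above.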

The Character Lists

Present State of Database

Standardised characters for particle morphology and genome properties have been used to code all virus families and genera including their type species. The description of the morphology has been tackled first because the shape of the virion is the most discriminatory character for identifying the genus to which a virus belongs. At this stage of development, the descriptions also contain genome sequence accession numbers and reference lists. Thus far, only a few descriptions accommodate images (mainly electron micrographs of the virus particle), but it is planned to include many more images, such as gene maps, distribution maps, images of host and vector, and images of symptoms and histopathology, as the database grows. Primary lists of characters exist for most other particle and biological properties and are being culled, amalgamated, sorted and reviewed.

ICTVdB on the World Wide Web

Recognising that all aspects of the generation and maintenance of the ICTVdB will be greatly facilitated by using the World Wide Web (WWW), over the last year efforts have been directed to make the database available on the WWW. The natural language translations of the descriptions of all families and genera, including type species treated thus far in ICTVdB, are available on index.htm. They are presented with a few examples of electron micrographs of virus particles. In cases where genome sequence data are available, links to EMBL/GenBank have been established. Links to other databases on the Web relevant to the ICTVdB are listed and as the project progresses many more links to other databases will be established. In the process of its construction, ICTVdB on the WWW will be a fully federated database (Wertheim, 1995).

Layout and WWW access of the ICTVdB

WWW browser. The ICTVdB Web site has been set up using Netscape. Other viewing software can be used, although the positioning of images and the layout might not be optimal.

Home Page - Introduction to the virus databases on-line. The user can best access the ICTVdB Web site through the Home Page, which also introduces other virus databases on-line that have been developed by researchers in the Molecular Evolution and Systematics Group, Research School of Biological Sciences, Australian National University. The information belonging to the ICTVdB project is marked with the ICTV logo, and is contained in about 800 files. Other databases on the Web relevant to the ICTVdB and which have been used to establish links, for example, to electron micrographs or genomic sequence data, are also listed on the Home Page.

Indexes. The ability to select records is a quintessential feature of a functional database. As shown in Figure 1, the ICTVdB Web site facilitates this by making data accessible both alphabetically, in the Index of Viruses (the complete lists of virus families, genera and species from the VIth ICTV Report), and deci-numerically, in the ICTVdB Index. For example, the Index of Viruses may be searched from a species name or synonym in alphabetical order, from a family name in alphabetical order, by host range, or from nucleic acid composition. As pointed out above, classification beyond family in the ICTVdB is numerically based following alphabetic listing. It is clear in Figure 1 that a variety of paths all lead to the current natural language translation of the ICTVdB descriptions.

The Index of Viruses is the main entry point to the Classification and Nomenclature section of the ICTVdB. It provides easy access to the records describing an individual virus, as well as to information on its relatives. In addition to the Index of Viruses there is an index to the virus descriptions that have been generated from DELTA records. This index is automatically regenerated and updated every time new information is added to the database and translated into hypertext.

Figure 1: ICTVdB on the Web. This diagram displays the various index files that guide the browser through the different parts of the ICTVdB to the virus descriptions.

Thus Figure 1 summarises the present functional basis of ICTVdB, which will be greatly expanded by future data input and improved interoperability with other related databases. Already, in cases where the genomic sequence accession number for a virus species is listed, links to either GenBank (NCBI) or EMBL (EBI) have been established. It is planned to have mirror sites in Europe and North America, so that the access time can be reduced for the user; the mirror sites will have links to the appropriate genome databank. Figure 1 further indicates that it is planned to add a search engine to improve the speed and flexibility with which a particular virus can be retrieved, even if the user does not know the correct name.

Intkey. In future, Intkey will be available to interrogate the database on the Web. Using Intkey, a virus can be identified by comparing its attributes with stored descriptions of taxa. At present, the required data files and images must be downloaded to the user's PC (via ftp, gopher, WWW), but a future version of Intkey will be able to access these files directly from the WWW. Images which are part of the database are used by Intkey. All images in the database can be accessed via hyperlinks from other databases. By the same token, images from other databases can be linked to the ICTVdB and will thus become accessible through the database, without becoming physically a part of it.
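The idea of identifying a virus by comparing its attributes with stored descriptions can be illustrated with a simple elimination sketch in Python. This is not Intkey's actual algorithm or data format; the function name, the taxa (taxon_A to taxon_C) and the characters are all hypothetical placeholders. A taxon is kept as a candidate unless one of the observed attributes conflicts with its stored description, and a character that is not coded for a taxon never excludes it.

```python
def remaining_taxa(descriptions, observed):
    """Return the taxa whose stored descriptions do not conflict
    with the observed attributes of the specimen."""
    candidates = []
    for taxon, characters in descriptions.items():
        conflict = False
        for character, state in observed.items():
            allowed = characters.get(character)
            # An uncoded character (None) cannot exclude the taxon.
            if allowed is not None and state not in allowed:
                conflict = True
                break
        if not conflict:
            candidates.append(taxon)
    return candidates


# Hypothetical stored descriptions: character -> set of permitted states.
descriptions = {
    "taxon_A": {"character_1": {"state_a"}, "character_2": {"state_x"}},
    "taxon_B": {"character_1": {"state_a", "state_b"}, "character_2": {"state_y"}},
    "taxon_C": {"character_1": {"state_b"}},  # character_2 not coded
}

# Each additional observation narrows the list of candidates.
print(remaining_taxa(descriptions, {"character_1": "state_a"}))
print(remaining_taxa(descriptions, {"character_1": "state_a",
                                    "character_2": "state_y"}))
```

Because the observations can be supplied in any order, a sketch of this kind behaves like an interactive key rather than a fixed dichotomous one.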

The Tasks Ahead

The progress in assembly of the ICTVdB from now on is critically dependent on data input from the virological community. The ICTV Data Subcommittee and others, meeting in Utrecht February 1994, decided that the ICTVdB should be maintained by a coordinator (C. Büchen-Osmond). Although the coordinator has taken the initiative to prove the concept of ICTVdB on the Web, this role must now change. In future, all those submitting or reviewing data will become authors of those descriptions. Data and images must be provided by the virology community.

The WWW format should now greatly facilitate the formerly cumbersome process of data acquisition by printed questionnaires, posted to the potential suppliers of data. This method, used to construct the VIDE plant virus database (e.g. Büchen-Osmond et al., 1988), often failed to attract the enthusiastic cooperation of colleagues because of the sheer complexity of the questionnaires and the repetitive handling of data.

The first task will be to devise a new way of data acquisition. To this end, an electronic questionnaire based on standardised characters will be devised as the primary data sheet. It will contain a highly structured index of keywords and headings to the different sections within the character list, which will be collapsible or expandable. From previous experience we know that it helps greatly if the questionnaire contains already available data, the expert recipient being invited to fill in the gaps and review existing data. It also helps if expert opinion is restricted to characters appropriate to the particular virus family. The data will be transformed into DELTA format and will go through the same reviewing process engaged for any description prepared for the ICTV Reports. Only after the reviewing committee is satisfied with the new submission will the coded description be placed permanently into the database. The Web accessibility of the virus descriptions at all stages of preparation will also facilitate the reviewing process by the Study Groups of the ICTV.

The second task is to maintain and coordinate data acquisition and entry. Data entries must be checked and regularly updated, to take account of developments in the field. However, the coordinator will not be able to keep up with all the latest movements in virus research. New findings must be provided to the ICTVdB from the virological community regularly. This is essential if the database is to provide the community with a reliable up-to-date source of information.

A third task is to use ICTVdB to generate future ICTV Reports on the Nomenclature and Classification of Viruses, now laboriously compiled by the ICTV Study Groups. It is envisaged that the future Reports will be generated from the DELTA database. In future, most of the descriptions in the database will be of species and strains, as a DELTA program can summarise the characteristics of all species of one genus, for example, thus generating an accurate summary that will reflect much more objectively the features of a genus. The description of genera and families in the future reports can be based on these summaries.
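How a genus summary might be derived from species records can be sketched as follows. This is an illustrative Python sketch, not the DELTA program that performs the summary; the function name, the locators (99.999.9...) and the characters are hypothetical placeholders. Species are grouped by the genus part of their decimal locator, and for each character the summary lists every state recorded among the species of that genus.

```python
from collections import defaultdict


def summarise_genus(species_records, genus_locator):
    """Collect, for each character, the states recorded in all
    species whose locator falls under the given genus locator."""
    summary = defaultdict(set)
    prefix = genus_locator + "."
    for locator, characters in species_records.items():
        if locator.startswith(prefix):
            for character, state in characters.items():
                summary[character].add(state)
    # Sort the states so the summary is reproducible.
    return {character: sorted(states) for character, states in summary.items()}


# Hypothetical species records keyed by decimal locator.
species_records = {
    "99.999.9.01.001": {"character_1": "state_a", "character_2": "state_x"},
    "99.999.9.01.002": {"character_1": "state_b", "character_2": "state_x"},
    "99.999.9.02.001": {"character_1": "state_c"},  # belongs to another genus
}

print(summarise_genus(species_records, "99.999.9.01"))
```

A character with a single state in the summary is uniform across the genus, while one with several states shows where the species differ.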

Judging from recent comments on other microbial databases (Wertheim, 1995), the ICTVdB may already be one of the most advanced, interoperable databases in biology, in structural terms at least. A major effort is now required to complete the database, drawing on the expertise of the virological community as a whole. The future success of ICTVdB thus depends heavily on the help and goodwill of all virologists, and the continued enthusiasm of the experts in the ICTV Study Groups.

Acknowledgement

Progress thus far has depended heavily on the sponsorship of this project by Lois Blaine at the ATCC, and Adrian Gibbs at ANU. The preparation of a standardised character list has been supported by an NSF Grant (DIR-91-07464) to Lois Blaine.

References

Büchen-Osmond C, Crabtree K, Gibbs A, Maclean G (1988). Viruses of Plants in Australia (ISBN 0 7315 0460 7) Australian National University, Canberra, 590 pp.

Büchen-Osmond C, Blaine L, Horzinek MC (1996). In: Data and Knowledge in a Changing World: The Quest for a Healthier Environment; Chambéry, 94 CODATA Conference, Ed PS Glaeser. CODATA, Paris 8 pp (in press).

Büchen-Osmond C (1995) http://life.anu.edu.au/viruses/welcome.htm

Dallwitz MJ (1980). Taxon 29, 41-46.

Dallwitz MJ, Paine TA, Zurcher EJ (1993). DELTA User's Guide: a general system for processing taxonomic descriptions. 4th edition, CSIRO, Canberra. 136 pp.

Murphy FA, Fauquet CM, Mayo MA, Jarvis AW, Ghabrial SA, Summers MD, Martelli GP, Bishop DHL (1995). Sixth Report of the International Committee on Taxonomy of Viruses. Springer Verlag, Wien, New York.

Watson L, Dallwitz MJ (1992). Grass Genera of the World. CABI, Wallingford 1083 pp.

Wertheim M (1995). Science 269, 1516.



© 1995-1999 Cornelia Büchen-Osmond. All rights reserved. Created: April 1995 Last updated: 20 July 1999.