Information Infrastructure for Vegetation Science

An information infrastructure for Vegetation Science:
Project overview and progress report

Project Directors:

Robert K. Peet, University of North Carolina at Chapel Hill
John Harris, National Center for Ecological Analysis and Synthesis
Dennis Grossman, Association of Biological Information
Michael Jennings, USGS Gap Analysis Program, Univ. California Santa Barbara
Marilyn D. Walker, USDA Forest Service, University of Alaska

1. Project overview
2. Data flow and project objectives
3. The process
4. Plants database
            4.1 Concept-based taxonomy
            4.2 The NCEAS taxonomic data model
5. Classification database
6. Plots database
            6.1 Plot attributes
            6.2 Plot observations
            6.3 Plot contributors
            6.4 Plot classification
            6.5 Supporting functions
            6.6 Physical design and implementation
7. Interface tools
            7.1 Plot interpretation
            7.2 Plot extraction
            7.3 Plot submission
8. Demonstration datasets
            8.1 Plants database
            8.2 Plots database
            8.3 Classification database

Notes:

This document has been written with the expectation that the text will form the core of a new project website to be hosted by NCEAS.
In the following discussion, database tables are indicated by bold italics in blue, table fields by lower-case standard type, and hotlinks by underlined blue.
Certain linked tables and figures require a userid and password. If prompted, login = vegclass & password = veg2data

1. Project overview

Vegetation classification is of central importance to biological conservation for planning and inventory, to resource management for monitoring and planning, and to basic scientific research as a tool for organizing and interpreting ecological information. All of these activities require that ecological units be defined and that their distributions on the landscape be known and understood. Vegetation classification contributes significantly to analysis of ecological problems that vary in scale from persistence of tiny populations of endangered species to global projections of the impact of climate change. Technological advances have made practical large-scale analyses that cross agency jurisdictions or geographic regions and address applied ecological issues as diverse as global change, ecosystem management, and conservation planning. However, all such efforts depend on having available a common set of well defined and broadly accepted classification units.

Through the combined efforts of The Association of Biological Information (ABI, an offshoot of The Nature Conservancy), the Ecological Society of America Vegetation Panel (ESA-VP), and the Federal Geographic Data Committee (FGDC), the United States is on the verge of having its first fully functional, widely-applied vegetation classification system. The federal government has declared the need for a single standard, and on October 22, 1997, the Secretary of Interior, acting as Chair of the Federal Geographic Data Committee, approved the Vegetation Information and Classification Standard, which is now the standard vegetation classification for U.S. federal agencies and their cooperators. Yet, there are still major obstacles to overcome to make such a system operational and broadly accepted. ESA-VP is working in close collaboration with ABI and FGDC to draft standards for field data acquisition, type definition, and peer review of proposed additions and changes. The current draft standards are available for public comment. A core component still missing is an information infrastructure to manage the anticipated 10⁷ plots and 10⁴ plant associations required for a national system, and to distribute this information across the web in a continually revised but perfectly archived format. This represents a major intellectual and practical obstacle to the realization of the system. The major goal of this NCEAS-based project is to create a working prototype of the requisite information system for potential adoption by federal agencies and partner organizations.

2. Data flow and project objectives

To understand the functions of the required information infrastructure for vegetation science, it is necessary to understand the flow of information through the system.

Vegetation scientists observe and record vegetation in the field. The fundamental unit of vegetation observation is the plot. At a minimum, a plot contains information on location, spatial extent, and the species present with cover values as a measure of importance. A Vegetation Plots Database is needed to store, preserve and distribute plot data that meet recognized minimum standards.

Plots are used to classify vegetation. Eventually, each vegetation type will be documented in a Vegetation Classification Database, which will refer to the plots used to develop the types. Proposals to change the classification will be required to be based on plots. All of these plots will be referenced in the Vegetation Classification Database and will be available from the Vegetation Plots Database.
The resultant Vegetation Classification Database will be viewable over the web, will be continuously updated, and will be constructed so that the content on any given date will be viewable so as to allow citation in literature and legal contracts.
All stages in this information flow contain references to plant names, but use of different standards for different plots weakens our ability to compare and synthesize plot data. A concept-based Taxon Database (embedding some of the principles espoused in Berendsohn 1995) is needed to avoid these problems.

We are working to design, construct and test prototypes of the three core components of the information infrastructure necessary to support the U.S. National Vegetation Classification (US-NVC). A stand-alone Vegetation Plots Database with internet access tools will be developed, as will a Windows 2000-based desktop version for private/institutional use. A prototype of a linked concept-based Taxon Database compliant with anticipated FGDC, ITIS, and IOPI standards will be developed and populated with data drawn from the ITIS database. A prototype of a concept-based Vegetation Classification Database compliant with anticipated FGDC standards will be developed and populated with data from ABI. The databases will be designed so as to allow continual revision while simultaneously being perfectly archived so as to allow citation in literature and legal documents. We anticipate that once developed and peer reviewed, the modules of the working prototype will be adopted and maintained by some combination of ABI, USGS (NBII), FGDC and ESA.

3. The process

We have engaged in a four-stage process for database development. The first stage (July 1999-June 2000) largely consisted of a period of scoping and organizing. This included a major meeting held at NCEAS in October 1999 for the purpose of introducing the major participants to the project’s background and goals, and for refining the major objectives of the project, the lists of participants, the budget, and the schedule. The kickoff meeting was followed by a search for a staff programmer, which successfully concluded with John Harris starting employment at NCEAS in early July 2000.

The second stage of the project (July 2000-January 2001) focused on design and construction of the core plots database. Toward this end back-to-back meetings were held at NCEAS in September 2000. The first, attended primarily by active field ecologists, focused on refinement of the database requirements first articulated in Chapter 3 of the Vegetation Panel draft standards document. The second meeting was attended primarily by data modelers and overlapped the first meeting for a day, thereby facilitating communication between the two groups. The plots data model, initially developed in September and subsequently revised through an iterative process involving various participants from the September meetings, is summarized in section 6 of this document.

Simultaneous with stage two, we conducted a parallel design study for a concept-based taxonomic database. Although it is not our plan for the NCEAS working group to fully develop and implement a national taxonomic database, we consider creation of such a database essential for our planned vegetation information infrastructure, and especially for the process of incorporating legacy plot data collected at different times by different parties using different taxonomic standards. Accordingly, we are collaborating with partner organizations, aiding in the design and implementation of this key database. Toward this end, potential database architecture was discussed at the September NCEAS meetings, and Steve Taswell (ABI) and Robert Peet subsequently collaborated to design a data architecture for possible implementation by both ABI and ITIS. Peet and Taswell then joined Janet Gomon (representing ITIS and the FGDC subcommittee on Biological Nomenclature) to host a meeting at the Smithsonian (November 2000) to review the Taswell-Peet model as a candidate for adoption by ITIS and FGDC as a federal standard. Four products are anticipated from this initiative. Peet and Taswell will draft a formal proposal for an FGDC standard for biological nomenclature, and will write for publication a synthetic treatment that compares competing data models for taxonomic databases. ABI will build a nation-wide, concept-based database for organisms as a component of their new HDMS database system (Heritage Data Management System). Finally, Harris and Peet will complete development of a demonstration prototype national plants database employing the proposed FGDC standard.

Stage three (February - September 2001) will focus on design and development of the database interface tools (including the internet-based toolbox and the desktop-resident tools) for both loading data into and retrieving data from the Vegetation Plots Database. An initial meeting in January 2001 will specify the primary interface requirements. This will probably be followed by two small workshops where vegetation scientists critique and refine prototype interface tools.

The fourth stage of the project (October 2001 - June 2002) will consist of testing and refinement of the database and iterative developmental improvements of tools for data preparation, data loading, database querying, and manipulation/analysis of extracted data.

4. Plants database

4.1 Concept-based taxonomy
Use of a plant name does not necessarily convey accurate information on the taxonomic concept employed by the user of that name. As a simple example, when a taxon is split into two taxa, one of the two retains the original name with the consequence that use of that name in the future is ambiguous unless additional information is given. A more complex situation derives from the fact that different works recognize different sets of taxa and the concepts behind those taxa are often deducible only by knowledge of what other taxa are recognized by the author for the region treated, as is the case with the current North American checklists of Kartesz & Meacham (1999), the USDA Plants Database, and ITIS.

In order for complete information to be conveyed about the identification of a taxon, it is necessary to list both a name and a reference, which together convey the necessary information about the application of the name. For example, one might write Aristida stricta sensu Peet 1993 to distinguish the narrow concept of that species from the broader version conveyed by Aristida stricta sensu Hitchcock 1950 or its synonym Aristida stricta sensu Radford, Ahles & Bell 1968. This linkage of name and use has been variously called “potential taxon” (Berendsohn 1995) and “assertion” (Pyle 2000), among other appellations, and will be an essential component of any database system that links datasets containing references to taxa recognized by multiple workers at multiple times in multiple places using multiple taxonomic standards (such as the Vegetation Plots Database).

The Vegetation Plots Database and the Vegetation Classification Database will both need to contain species names and concepts drawn from a national standard database. ABI expects to implement the necessary data architecture in HDMS, and ITIS (together with USDA Plants) and FGDC are both interested in a U.S. federal standard implementation. We will implement a stripped-down version of this general model as a module in our database system. Local implementation will be necessary for local stand-alone versions of the database, as well as to substitute for a national standard until such a standard is available. Once the national standard becomes available, we will refresh our taxonomic tables routinely using downloads from or queries to that standard, be it HDMS, ITIS, or some other database. Until that time we will populate our database with information extracted from the 1996 and 1999 implementations of the ITIS list.

4.2 The NCEAS taxonomic data model
This section will be completed as soon as ABI completes its ongoing reevaluation of the database architecture to be used in the first version of HDMS. Meanwhile, we have a working prototype of the database based on the architecture as anticipated in early October. We have populated the prototype with complete data for the genus Carya in North America representing the perspectives of two parties, ITIS-Kartesz and Flora North America, with the ITIS-Kartesz component representing four distinct temporal views (Kartesz 1980, Kartesz 1994, ITIS 1996 and ITIS 1999).

The October version of the plant taxonomy database architecture is shown in Table 4. Plant names are stored in a table called name. Taxon concepts are stored in circumscription, which points to a reference, which in turn points to both a literature citation and the speciesName used in the reference. This couplet is sufficient to identify a taxon concept. The name that currently applies to a concept (in the perception of a party) is mapped in the usage table, which contains start and stop dates for that name application. A particular party's view as to the status of a particular taxon concept (e.g., recognized, nonstandard) and the start and stop dates for that status are recorded in status. The standard taxon concept that a nonstandard concept maps to (in the perspective of a party) is identified through the contents of correlation, together with an indication of the degree of convergence (equal, larger, smaller, overlapping).
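
The name-to-concept mapping described above can be sketched as a small relational schema. What follows is a minimal illustration using Python's built-in sqlite3 module, not the project's actual implementation. The table names name, reference, circumscription and usage come from the text; every column name other than speciesName, the placeholder rows, and the helper name_on are assumptions introduced here. Treating the stop date as exclusive is also an assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE name (name_id INTEGER PRIMARY KEY, speciesName TEXT);
CREATE TABLE reference (reference_id INTEGER PRIMARY KEY,
                        citation TEXT,
                        name_id INTEGER REFERENCES name(name_id));
CREATE TABLE circumscription (circum_id INTEGER PRIMARY KEY,
                              reference_id INTEGER REFERENCES reference(reference_id));
-- usage: which name a party applies to a concept, and over what period
CREATE TABLE usage (circum_id INTEGER REFERENCES circumscription(circum_id),
                    name_id INTEGER REFERENCES name(name_id),
                    party TEXT, startDate TEXT, stopDate TEXT);

-- Placeholder rows, invented for illustration only
INSERT INTO name VALUES (1, 'Genus alpha'), (2, 'Genus beta');
INSERT INTO reference VALUES (1, 'Author 1990', 1);
INSERT INTO circumscription VALUES (1, 1);
INSERT INTO usage VALUES
    (1, 1, 'Party A', '1990-01-01', '1996-01-01'),
    (1, 2, 'Party A', '1996-01-01', NULL);
""")

def name_on(con, circum_id, party, on_date):
    """Name that `party` applied to a concept on an ISO date, or None."""
    row = con.execute("""
        SELECT n.speciesName
          FROM usage u JOIN name n ON n.name_id = u.name_id
         WHERE u.circum_id = ? AND u.party = ?
           AND u.startDate <= ?
           AND (u.stopDate IS NULL OR ? < u.stopDate)
    """, (circum_id, party, on_date, on_date)).fetchone()
    return row[0] if row else None

print(name_on(con, 1, "Party A", "1994-06-01"))
print(name_on(con, 1, "Party A", "1999-01-01"))
```

Because each usage row carries its own start and stop dates, the same concept can be queried as of any past date, which is the property needed for a database that is continually revised yet citable.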

5. Classification database

The Vegetation Classification Database will be a component of the new HDMS database system of ABI and will be managed and maintained by ABI for the foreseeable future. Our NCEAS-based group is collaborating with ABI in development of requirements for that database system.

In order for our prototype information system consisting of independent, linked databases to be fully functional prior to the implementation of HDMS, and in order for local implementations of our database system to be functional in isolation, it is necessary to include a basic version of the Vegetation Classification Database in our information system (see Figure 5). All Alliances and Associations recognized as occurring in the U.S. will be included so that the plots database can be searched for representatives of those types, and so that summary reports can reference those types. Little of the supporting data will be included; instead, those data will be drawn from HDMS (or a mirror of HDMS) as needed. We anticipate that the Vegetation Classification Database will be a concept-based system of the same general design as the Taxon Database. The tables in our implementation will be periodically refreshed by downloads from or queries to HDMS (or its mirror).

The fundamental units of vegetation classification will be stored in the communityConcept table (Figure 5). Unlike the situation with plant taxonomy, community concepts are not available in traditional primary literature. Instead, the detailed descriptions will reside in HDMS. We expect to place a small portion of this core information in a table called CommunityDescription, which will be a child of the communityConcept table.

Community concepts will be listed in a communityConcept table, which will include fields for conceptStatus (recognized, nonstandard), start and stop dates, and party. Relationships among community concepts will be tracked by a communityCorrelation table and a communityLineage table. All levels of the classification hierarchy will be stored in communityConcept, with hierarchy level tracked via a field called level, and the hierarchical relationships tracked via a recursive loop defined by a foreign key to the next higher level in the hierarchy. The community names will be stored in a communityName table, and mapped to concept through a communityUsage table (which, as with the taxonomy module, will track party-specific nameStatus and associated start and stop dates).
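
The recursive loop on communityConcept can be illustrated with a self-referencing foreign key and a recursive query. This is a sketch under stated assumptions rather than the HDMS design: only the table name communityConcept and the field level come from the text, while concept_id, label, parent_id, the level values, the example rows, and the helper ancestors are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE communityConcept (
    concept_id INTEGER PRIMARY KEY,
    label      TEXT,   -- placeholder; the text keeps names in communityName
    level      TEXT,
    parent_id  INTEGER REFERENCES communityConcept(concept_id)
);
-- Invented rows: one chain of three hierarchy levels
INSERT INTO communityConcept VALUES
    (1, 'Example formation',   'formation',   NULL),
    (2, 'Example alliance',    'alliance',    1),
    (3, 'Example association', 'association', 2);
""")

def ancestors(con, concept_id):
    """Return (level, label) pairs from the given concept up to the top level."""
    return con.execute("""
        WITH RECURSIVE up(concept_id, label, level, parent_id, depth) AS (
            SELECT concept_id, label, level, parent_id, 0
              FROM communityConcept WHERE concept_id = ?
            UNION ALL
            SELECT c.concept_id, c.label, c.level, c.parent_id, up.depth + 1
              FROM communityConcept c JOIN up ON c.concept_id = up.parent_id
        )
        SELECT level, label FROM up ORDER BY depth
    """, (concept_id,)).fetchall()

for level, label in ancestors(con, 3):
    print(level, "-", label)
```

Storing every level in one table keeps the number of hierarchy levels open-ended; the cost is that walking the hierarchy requires a recursive query or repeated lookups.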

The architecture of the Vegetation Classification Database is sufficiently general that alternative classifications will be able to reside here. They could include simple alternative classifications such as the ecological groups recognized by ABI and broadly different classifications such as the Society of American Foresters cover types. (We have not yet included, but perhaps should, the option for associations to be members of multiple hierarchies simultaneously, such as FGDC, ABI ecological groups, and the Braun-Blanquet hierarchy.)

6. Plots database

The plots database is intended primarily to hold, maintain, and provide public access to information on vegetation plots that might be helpful in the development, refinement and documentation of the US National Vegetation Classification. Minimum data required for this purpose are defined in the draft standards document (http://esa.sdsc.edu/vegwebpg.htm) prepared by the ESA Vegetation Panel, and are indicated in red in the entity relationship diagram or ERD (Figure 6.0). However, the plots database has been designed to hold diverse plot types and to provide functionality beyond that which is useful for vegetation classification. This additional functionality is provided to encourage greater use of the database, which in turn should lead to submission of more plots useful for classification purposes.

6.1 Plot attributes
Relationships among tables containing information on essential plot attributes are shown in Figure 6.1. The roles of these tables and their critical fields are described below.

Project. A set of related plots can (but is not required to) constitute a project wherein the plots have various attributes in common, such as purpose, geographic scope, and primary investigator. Information about a project can be found in or linked to project. A plot can belong to no more than one project. Other groups of plots must be identified through shared attributes such as plot contributor, place name, and date range.

What is a plot? At the most basic level, a plot includes the geocoordinates of a point (the plot origin), the plot surface area, and a species list with associated species cover estimates. Basic, unchanging attributes of plots are recorded in plot, including especially plot identification and location information. Information that might change between replicate sampling events is stored in a child table of plot called plotObservation (see section 6.2) or its child tables. Observations on species cover and abundance are stored in child tables of plotObservation.

Plot identification. Plots are uniquely identified by the primary key of plot (called PLOT_ID). Plots can also be identified by the author’s plot code (authorCode, stored in plot) combined with the name of the plot author (stored in plotContributor).

Subplots & nested plots. Each member of a hierarchical family of subplots has its own record in plot. The field PARENT is a foreign key that links the subplot to its immediate parent, indicated by a recursive loop on plot within the ERD.

Plot location. Location is recorded as the latitude (origLat) and longitude (origLong) of a specific point called the "plot origin" (UTM coordinates are not used because they can be derived with interface tools and because they are less effective at high latitudes). Accuracy of plot location is recorded as a restricted domain variable called horizPosAccuracy (e.g., within 10m, 100m, 1km, 10km, 100km, >100km), whereas precision is always to the nearest meter. Notes on how to relocate the plot can be recorded in locationNarrative.

Plot shape. Plot shape is recorded as a restricted domain variable (e.g. square, rectangle, circle, irregular). Specific shape and location of plot edges can be recorded as a series of way points relative to the plot origin (stored in dsgpolyo). These are recorded in meters in a local coordinate system where the origin equals the plot origin and the x-axis is parallel with the plot azimuth. The method used for locating the plot in a stand can be recorded as a restricted domain variable called placementMethod (e.g., random, systematic, subjective). Additional information on plot layout and placement can be recorded in layoutNarrative, or if generic across multiple plots in sampleMethod (a child table of plotObservation), which provides details of sampling methodology.
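
As a worked illustration of the local coordinate system, the sketch below converts a way point given in plot-local meters into east and north offsets from the plot origin. The text does not state how azimuth is measured or which way the y-axis points, so two conventions are assumed here: azimuth is in degrees clockwise from north, and the y-axis lies 90 degrees clockwise from the x-axis. The function name and the example plot are hypothetical.

```python
import math

def local_to_east_north(x, y, azimuth_deg):
    """Convert a way point (x, y), in meters in the plot's local system,
    to (east, north) offsets in meters from the plot origin.

    Assumptions (not from the source): azimuth is in degrees clockwise
    from north, and the local y-axis lies 90 degrees clockwise from the
    x-axis.
    """
    a = math.radians(azimuth_deg)
    east = x * math.sin(a) + y * math.cos(a)
    north = x * math.cos(a) - y * math.sin(a)
    return east, north

# Corners of a hypothetical 20 m x 50 m plot whose x-axis points due east
corners = [(0, 0), (20, 0), (20, 50), (0, 50)]
offsets = [local_to_east_north(x, y, 90.0) for x, y in corners]
for (x, y), (e, n) in zip(corners, offsets):
    print(f"local ({x}, {y}) -> east {e:.1f} m, north {n:.1f} m")
```

If the database adopts a different axis convention, only the two trigonometric expressions change; the stored way points remain valid.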

Plot area. The surface area of a plot is recorded in area; this is the area for which the complete species list was determined and the cover values determined. However, sometimes plot species lists, cover values, and stem counts are made for different areas or with a set of subplots. Species richness assessments are biased if the species count is not complete or if it is based on a set of dispersed (not contiguous) subplots. If the species list and cover values are based on subplots that do not entirely cover the area they are intended to represent, the total area within which the subplots occurred and about which an inference is being made is called subPlotInferenceArea. If the species list is based on a set of subplots totaling plotArea (< subPlotInferenceArea), the dispersion of the subplots is reported as a restricted domain variable subplotDispersion (e.g. contiguous, dispersed and regular, dispersed and random, single large subplot). If tree stems are recorded for an area different than plotArea, this area is recorded in the table plotObservation as treeStemArea.

Place names. Required or frequently referenced place names are stored as dedicated fields in plot (country, state, county, USGS quadrangle). Any number of additional place names can be recorded for each plot, with the names stored in the table namedPlace with the linkage to plots mapped in plotPlace.

Attributes. Each plot can have site-specific attributes that are expected to never change, such as topographic position, elevation, and surficial geology. Attributes required for FGDC compliance, or which are expected to be used frequently, have dedicated fields (e.g., elevation, elevation accuracy, slope aspect, slope gradient, surficial geology). Other variables can be added as user-defined fields as described below in section 6.5. Attributes that might change between observations (e.g., soil chemistry) are stored in plotObservation.

Graphics. Graphic records such as photographs and maps can be stored in the table graphic. Rules need to be established as to acceptable formats for graphics and total space that can be dedicated to graphics.

6.2 Plot observations
Many plots are established as permanent plots with the expectation that they will be resampled. To accommodate the need to measure and record some plots more than once, those aspects of plots that may change between observation events are split off from plot and placed in plotObservation and its children. The sequential observations are tracked via a recursive loop with the previous observation being referenced as a foreign key (PREVIOUSOBS). The relationships among the child tables of plotObservation in the National Plots Database model are represented in Figure 6.2. The roles of these tables and their critical fields are described below.
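
The split between plot and plotObservation, and the chain of observations linked through PREVIOUSOBS, might look roughly like the following. This is an illustrative sketch only: PLOT_ID, authorCode, origLat, origLong, soilPH and PREVIOUSOBS are named in this document, whereas OBS_ID, obsDate, the sample rows, and the helper history are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Unchanging attributes live in plot
CREATE TABLE plot (PLOT_ID INTEGER PRIMARY KEY, authorCode TEXT,
                   origLat REAL, origLong REAL);
-- Attributes that may change between sampling events live in plotObservation
CREATE TABLE plotObservation (
    OBS_ID      INTEGER PRIMARY KEY,
    PLOT_ID     INTEGER NOT NULL REFERENCES plot(PLOT_ID),
    obsDate     TEXT,
    soilPH      REAL,
    PREVIOUSOBS INTEGER REFERENCES plotObservation(OBS_ID)
);
-- Invented sample data: one plot sampled three times
INSERT INTO plot VALUES (1, 'DEMO-001', 35.0, -79.0);
INSERT INTO plotObservation VALUES
    (10, 1, '1990-07-01', 5.1, NULL),
    (11, 1, '1995-07-01', 5.3, 10),
    (12, 1, '2000-07-01', 5.2, 11);
""")

def history(con, obs_id):
    """Walk PREVIOUSOBS links backwards; return observation dates, newest first."""
    dates = []
    while obs_id is not None:
        obs_date, obs_id = con.execute(
            "SELECT obsDate, PREVIOUSOBS FROM plotObservation WHERE OBS_ID = ?",
            (obs_id,)).fetchone()
        dates.append(obs_date)
    return dates

print(history(con, 12))
```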

Plot observations. Environmental and conditional observations related to a single plot observation are stored in plotObservation - a table that is a child to plot. Examples include soil attributes (e.g., soilPH), landcover attributes (e.g., percentSoil, percentLitter, percentWater), and site history (e.g., landUseFormer, timeSinceFire). The number of plot observations within a given plot is not constrained by the database; however, the database does enforce that all observations be at the same location as the parent plot.

Cover estimation methods. Although an observation event for a plot may have many similarities with previous observation events, not all observations of a plot are necessarily carried out using the same methodology. For this reason sampleMethod and coverMethod are children of plotObservation rather than plot. The indices used to represent the cover classes are stored in the coverMethod table, whereas the meanings of the component values of the specific indices are stored in coverIndex. This allows any number of different cover class scales to be defined and stored within the database. Similarly, attributes related to sampling method are stored in sampleMethod, which has a foreign key referencing the citation table where references related to the sampling methods are stored. Details of the method are embedded in a text field called sampleMethodDescription.
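
To make the coverMethod and coverIndex relationship concrete, the sketch below stores one invented cover-class scale and translates a recorded class code into the midpoint of its percent range. The three-class scale, the field names code, lowerPercent and upperPercent, and the helper midpoint_percent are all hypothetical; only the two table names come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CoverIndex:
    """One row of coverIndex: the meaning of one code within one method."""
    coverMethod_id: int
    code: str
    lowerPercent: float
    upperPercent: float

# coverMethod: one row per cover-class scale (this scale is invented)
cover_methods = {1: "demo three-class scale"}

cover_index = [
    CoverIndex(1, "1", 0.0, 10.0),
    CoverIndex(1, "2", 10.0, 50.0),
    CoverIndex(1, "3", 50.0, 100.0),
]

def midpoint_percent(method_id, code):
    """Translate a recorded cover code into the midpoint of its percent range."""
    for row in cover_index:
        if row.coverMethod_id == method_id and row.code == code:
            return (row.lowerPercent + row.upperPercent) / 2
    raise KeyError((method_id, code))

print(midpoint_percent(1, "2"))
```

Keeping the code definitions in data rather than in program logic is what lets plots recorded on different scales coexist and be compared on a common percent basis.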

Taxon observations. Multiple taxa are typically observed in a plot and various attributes of those taxa are recorded. The taxon list is stored in taxonObservation. Typically, all interpretations of a taxon’s identification (that is, the taxonomic concept) will be tracked in taxonInterpretation, including that of the original author. The total species cover in the plot (across all strata) is recorded in taxonObservation (cumStrataCoverage), but all other attributes of taxa observed in the plot are stored in child tables discussed below.

Taxon identifications. Identifications of taxa are open to multiple interpretations. Various parties will choose to follow different taxon concepts as standard, and will have differing interpretations of which names to assign to the taxon concepts recognized. At the very least, it will be necessary to track the taxon concepts used by the author of the plot and by authors who have generated publications using the plot data. Interpretations of which taxon concepts apply to taxa observed in a plot are stored in the taxonInterpretation table. The mapping of names to concepts is handled in the taxonomy module of the database discussed in section 4. However, if a specific name were applied by the interpreter (as in a publication), that name is tracked as taxonName_id.

Taxon concepts (circumscriptions) are uniquely identified in taxonInterpretation by reference to a citation (taxonRef_id) and the taxon name used in that reference (taxonCircumName; which need not be the same as the name the identifying party applies to the taxon). The database also tracks the date the interpretation was made (interpretationDate, in part to record which name was applied to the concept at that time by a particular party), and the party making the interpretation. The original author’s interpretation is required, although the associated reference would be just the author’s name in the event no authority was listed in the published record of a legacy plot. Business rules will be required to establish which database users have the privilege of adding interpretations. Certainly the original plot author, the management team, and authors of derivative publications will be eligible.

Strata designation. Definitions of vertical strata differ among plot authors and studies. The database allows users to define strata as appropriate to their plot methods. The table stratumType contains the name and description of each stratum recognized. For a given study, each stratum used is defined in stratum, along with summary values for stratum cover and stratum height.

Strata composition. The cover value for a taxon in a stratum is recorded in stratumComposition, which contains foreign keys that point to both the taxon (taxonObservation) and the stratum definition (stratum).
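
The two foreign keys in stratumComposition can be sketched as follows. The example simplifies the model by placing a stratum name directly in stratum rather than in stratumType, and the columns obs_id, authorTaxonName, stratumName and percentCover, the sample rows, and the helper cover_by_stratum are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE taxonObservation (taxonObs_id INTEGER PRIMARY KEY,
                               obs_id INTEGER, authorTaxonName TEXT);
CREATE TABLE stratum (stratum_id INTEGER PRIMARY KEY,
                      obs_id INTEGER, stratumName TEXT);
CREATE TABLE stratumComposition (
    taxonObs_id  INTEGER REFERENCES taxonObservation(taxonObs_id),
    stratum_id   INTEGER REFERENCES stratum(stratum_id),
    percentCover REAL
);
-- Invented sample data for a single plot observation (obs_id = 100)
INSERT INTO taxonObservation VALUES (1, 100, 'Taxon A'), (2, 100, 'Taxon B');
INSERT INTO stratum VALUES (1, 100, 'tree'), (2, 100, 'herb');
INSERT INTO stratumComposition VALUES (1, 1, 60.0), (1, 2, 5.0), (2, 2, 40.0);
""")

def cover_by_stratum(con, obs_id):
    """Return (stratumName, taxon, percentCover) rows for one plot observation."""
    return con.execute("""
        SELECT s.stratumName, t.authorTaxonName, sc.percentCover
          FROM stratumComposition sc
          JOIN taxonObservation t ON t.taxonObs_id = sc.taxonObs_id
          JOIN stratum s          ON s.stratum_id  = sc.stratum_id
         WHERE t.obs_id = ?
         ORDER BY s.stratumName, t.authorTaxonName
    """, (obs_id,)).fetchall()

for row in cover_by_stratum(con, 100):
    print(row)
```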

Tree stem data. Individual stems of woody plants are often recorded for plots. Stem counts will be recorded in treeStems. Fields are provided for stem height, stem diameter, and spatial coordinates, all to be used as needed. In addition, stem number and size accuracy can be recorded, which makes for efficient recording when tallies by size classes are used.

Inference area. The area from which certain attributes of a plot are inferred sometimes differs from the absolute area of the plot itself. For example, stem counts for taxa of dense shrubs might be made for a subset of the plot, whereas the stem count for a sparse canopy dominant in a savanna might be inferred from a superset of the plot. Similarly, cover for the highest stratum might be assessed for an area differing from that of the plot as a whole. We use the inferenceAreaIndex table to track such deviations for woody plant stems and cover estimates for strata, with the critical field being inferenceArea. By allowing the inferenceAreaIndex to reference taxonObservation, plotObservation, and strataComposition by foreign key, we allow inference area to vary independently for individual stems, size classes, and strata.

6.3 Plot contributors
Various parties have contributed information, interpretation, or effort to plots archived in the plots database. Those contributions are tracked through a series of relationships with the table vegPlotParty (Figure 6.3).

Participating parties. Information about parties (either individuals or organizations) is stored in the vegPlotParty table and its child tables. Within the vegPlotParty table itself, information such as name (givenName and surName, or organizationName) and contact instructions (contactInstructions) is stored and linked to a plot through the plotContributor table. Notice that there exist other contributor tables (e.g., citationContributor, projectContributor) that are used to link a party with a citation or project.

Roles. Individuals and organizations listed in vegPlotParty can play many roles. These roles are identified in intersection tables such as plotContributor, projectContributor, and citationContributor. Role constitutes a closed domain field that includes author, field assistant, land owner, and principal investigator, among many more.

Contact information. The child tables of vegPlotParty that hold contact information are address, email and telephone; they store past and present addresses, the current email address, and the current telephone number.

Citations. The citation table stores information about various references, and is linked to entities where references are used, such as plotObservation, sampleMethod, and taxonObservation. Some of the key attributes stored in the citation table are: title, year, edition, series and pageNumber.

6.4 Plot classification
Plot users and authors will both assign plots to alliances or associations in the Vegetation Classification Database, and these assignments will range from tentative field identification to assignment via quantitative tools, to definitive assignment as typal plots that form the definition of a vegetation type. In addition, many users will wish to query the plot database by association or other level in the vegetation classification. Finally, some parties will wish to assign names from different classification schemes to selected plots, as might be the case with a Forest Service employee assigning habitat types or SAF dominance types to plots.

As mentioned in the Vegetation Classification Database section of this document (section 5), we intend to track plot assignments to communities through a database architecture similar to that of the HDMS system for managing the community classification. The key table is communityAssignment (see Table 6.4), which contains a foreign key to the communityConcept table in the community classification module (section 5). Classification is made at the level of the plotObservation because classifications generally refer to existing vegetation. Thus, the table also holds a foreign key back to plotObservation, a foreign key to plotParty to indicate who made the assignment, a foreign key to citation (for published assignments), start and stop dates for that assignment by the party, and a quality indicator of fixed domain (e.g., field observation, expert assignment, quantitative assignment, published, type plot).
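A rough sketch of the communityAssignment structure described above follows, again in Python with SQLite purely for illustration. The key column names, column types, date format, and sample rows are assumptions, and the quality domain lists only the example values given in the text; Table 6.4 remains the authoritative description.

```python
import sqlite3

# Illustrative sketch only. Key names, types, and sample rows are assumptions;
# table names and the example quality values follow the text above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE communityConcept (conceptId INTEGER PRIMARY KEY, conceptName TEXT);
CREATE TABLE plotObservation  (observationId INTEGER PRIMARY KEY);
CREATE TABLE plotParty        (partyId INTEGER PRIMARY KEY, surName TEXT);
CREATE TABLE citation         (citationId INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE communityAssignment (
    assignmentId  INTEGER PRIMARY KEY,
    observationId INTEGER NOT NULL REFERENCES plotObservation(observationId),
    conceptId     INTEGER NOT NULL REFERENCES communityConcept(conceptId),
    partyId       INTEGER NOT NULL REFERENCES plotParty(partyId),
    citationId    INTEGER REFERENCES citation(citationId),  -- published assignments only
    startDate     TEXT NOT NULL,
    stopDate      TEXT,                                      -- NULL while the party still holds it
    quality       TEXT NOT NULL CHECK (quality IN (
        'field observation', 'expert assignment', 'quantitative assignment',
        'published', 'type plot'))
);
INSERT INTO communityConcept VALUES
    (1, 'Hypothetical alliance A'),
    (2, 'Hypothetical association B');
INSERT INTO plotObservation VALUES (100);
INSERT INTO plotParty VALUES (7, 'Example');
-- The same party first makes a tentative field call, then replaces it.
INSERT INTO communityAssignment VALUES
    (1, 100, 1, 7, NULL, '2000-06-01', '2000-09-01', 'field observation'),
    (2, 100, 2, 7, NULL, '2000-09-01', NULL, 'quantitative assignment');
""")


def current_assignments(observation_id):
    """Return (conceptName, quality) for assignments that have no stop date."""
    rows = conn.execute(
        """
        SELECT c.conceptName, a.quality
        FROM communityAssignment AS a
        JOIN communityConcept AS c ON c.conceptId = a.conceptId
        WHERE a.observationId = ? AND a.stopDate IS NULL
        ORDER BY a.startDate
        """,
        (observation_id,),
    )
    return rows.fetchall()


print(current_assignments(100))
```

Keeping superseded rows with a stop date, rather than overwriting them, preserves the history of how each party classified an observation.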

6.5 Support functions
Relationships among tables holding information of a general supporting nature are shown in Figure 6.5. The roles of these tables and their critical fields are described below.

User-defined Variables. For several tables (especially plot and plotObservation), it will be necessary to allow users to define additional fields that were not anticipated in the plot design or that are not used frequently enough to be routinely provided. For example, soil chemistry variables tend to be idiosyncratic to a small set of projects because methods differ among projects. Such variables are defined in userDefined, which records the name of the variable (userDefinedName), the table the variable is assigned to (tableName), its assignment to a category (such as “soil” or “disturbance”) used for search purposes, and a narrative field for metadata describing what the variable is and how it was measured or determined (userDefinedMetadata). The actual values are stored in definedValue as a specific value (value) together with the primary key of the relevant table (tableRecord), which identifies the record to which the value belongs.
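The userDefined and definedValue pair described above can be illustrated with a short sketch, written in Python with SQLite for brevity. The linking key userDefinedId, the category column name userDefinedCategory, the column types, and the sample soil variable are assumptions; the remaining field names are those given in the text.

```python
import sqlite3

# Illustrative sketch only. userDefinedId, userDefinedCategory, the column
# types, and the sample rows are assumptions made for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userDefined (
    userDefinedId       INTEGER PRIMARY KEY,
    userDefinedName     TEXT NOT NULL,
    tableName           TEXT NOT NULL,   -- table the variable extends
    userDefinedCategory TEXT,            -- e.g. 'soil', used for searching
    userDefinedMetadata TEXT             -- how the variable was measured
);
CREATE TABLE definedValue (
    userDefinedId INTEGER NOT NULL REFERENCES userDefined(userDefinedId),
    tableRecord   INTEGER NOT NULL,      -- primary key of the extended record
    value         TEXT
);
INSERT INTO userDefined VALUES
    (1, 'soilPH', 'plotObservation', 'soil',
     'Hypothetical example: pH of the surface horizon');
INSERT INTO definedValue VALUES (1, 100, '5.4'), (1, 101, '6.1');
""")


def user_defined_values(table_name, table_record):
    """Return {variable name: value} for one record of one table."""
    rows = conn.execute(
        """
        SELECT u.userDefinedName, v.value
        FROM definedValue AS v
        JOIN userDefined AS u ON u.userDefinedId = v.userDefinedId
        WHERE u.tableName = ? AND v.tableRecord = ?
        ORDER BY u.userDefinedName
        """,
        (table_name, table_record),
    )
    return dict(rows.fetchall())


print(user_defined_values("plotObservation", 100))
```

In this layout a new variable costs one row in userDefined rather than a schema change, at the price of storing every value as text and interpreting it through the metadata.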

User & management notations. We provide the capability of placing notations on individual entries in the database through the tables notation and notationIntersection. Business rules dictate that there are three types of notations (notationType): “databaseManagement,” “databaseUser” and “internal.” “databaseManagement” notations are added by the data manager and may be queried by the public. “databaseUser” notations are entered by users of the database and are also publicly viewable (they are generally regarded as permanent, though they can be deleted by the database manager). “internal” notations are for management personnel to use internally and are not made public.
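The visibility rule for the three notation types might be sketched as follows, in Python with SQLite for illustration. The columns of notationIntersection (tableName, tableRecord), the notationText column, and the sample notes are assumptions; the table names and the three notationType values come from the text.

```python
import sqlite3

# Illustrative sketch only. Column names other than notationType, and all
# sample rows, are assumptions made for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE notation (
    notationId   INTEGER PRIMARY KEY,
    notationType TEXT NOT NULL CHECK (notationType IN
        ('databaseManagement', 'databaseUser', 'internal')),
    notationText TEXT NOT NULL
);
-- Attaches a notation to one record of one table.
CREATE TABLE notationIntersection (
    notationId  INTEGER NOT NULL REFERENCES notation(notationId),
    tableName   TEXT NOT NULL,
    tableRecord INTEGER NOT NULL
);
INSERT INTO notation VALUES
    (1, 'databaseManagement', 'Coordinates verified against source sheet.'),
    (2, 'databaseUser',       'Cover values look high for this stratum.'),
    (3, 'internal',           'Awaiting reply from plot author.');
INSERT INTO notationIntersection VALUES
    (1, 'plot', 10), (2, 'plot', 10), (3, 'plot', 10);
""")

# Types that may be shown to the public; 'internal' is deliberately excluded.
PUBLIC_TYPES = ("databaseManagement", "databaseUser")


def public_notations(table_name, table_record):
    """Return (notationType, notationText) pairs visible to the public."""
    placeholders = ", ".join("?" for _ in PUBLIC_TYPES)
    rows = conn.execute(
        f"""
        SELECT n.notationType, n.notationText
        FROM notation AS n
        JOIN notationIntersection AS i ON i.notationId = n.notationId
        WHERE i.tableName = ? AND i.tableRecord = ?
          AND n.notationType IN ({placeholders})
        ORDER BY n.notationId
        """,
        (table_name, table_record, *PUBLIC_TYPES),
    )
    return rows.fetchall()


print(public_notations("plot", 10))
```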

Revisions and revision records. Generally, data placed in the database will be viewed as permanently archived, so changes will not be allowed. However, there will inevitably be exceptions to this rule, especially when the changes are not substantive. For example, spelling mistakes might be corrected. All such changes will be logged in the table revisions. We do not anticipate making this table or the associated roll-back functionality available to the public, but it will be available for management use. Substantive changes in plot data will be handled through submission of new versions of a plot (plotVersion) or plotObservation (observationVersion) by the plot author. Proposed substantive changes not formally introduced in a new version may be posted as notations by the database manager or by database users.
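The logging and roll-back of non-substantive corrections could work along the following lines. This is a sketch under stated assumptions, in Python with SQLite for illustration: the text names only the revisions table, so every column in it (tableName, tableRecord, fieldName, previousValue, newValue, revisionDate), the EDITABLE whitelist, and the helper functions are hypothetical.

```python
import sqlite3

# Illustrative sketch only. All revisions columns, the whitelist, and the
# helper functions are hypothetical; only the table name is from the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE citation (citationId INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE revisions (
    revisionId    INTEGER PRIMARY KEY,
    tableName     TEXT NOT NULL,
    tableRecord   INTEGER NOT NULL,
    fieldName     TEXT NOT NULL,
    previousValue TEXT,
    newValue      TEXT,
    revisionDate  TEXT NOT NULL
);
INSERT INTO citation VALUES (1, 'Vegetaton of the Example Mountains');
""")

# (table, field) pairs that may be corrected, mapped to the table's key column.
# Identifiers are taken only from this whitelist, never from user input.
EDITABLE = {("citation", "title"): "citationId"}


def correct_field(table, field, record_id, new_value, revision_date):
    """Apply a correction and log old and new values; return the revision id."""
    key = EDITABLE[(table, field)]  # KeyError for anything not whitelisted
    with conn:  # the update and its log entry commit together or not at all
        old = conn.execute(
            f"SELECT {field} FROM {table} WHERE {key} = ?", (record_id,)
        ).fetchone()[0]
        conn.execute(
            f"UPDATE {table} SET {field} = ? WHERE {key} = ?",
            (new_value, record_id),
        )
        cur = conn.execute(
            "INSERT INTO revisions (tableName, tableRecord, fieldName,"
            " previousValue, newValue, revisionDate) VALUES (?, ?, ?, ?, ?, ?)",
            (table, record_id, field, old, new_value, revision_date),
        )
    return cur.lastrowid


def roll_back(revision_id, revision_date):
    """Restore the value recorded before a revision, logging that change too."""
    table, record_id, field, previous = conn.execute(
        "SELECT tableName, tableRecord, fieldName, previousValue"
        " FROM revisions WHERE revisionId = ?",
        (revision_id,),
    ).fetchone()
    return correct_field(table, field, record_id, previous, revision_date)


revision_id = correct_field(
    "citation", "title", 1, "Vegetation of the Example Mountains", "2000-11-01"
)
```

In this sketch a roll-back is itself recorded as a revision, so the log stays a complete account of every change made to archived data.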

6.6 Physical design and implementation
A general understanding of the access and analytical tools used for inserting, updating, deleting, querying, and transforming (using XML) vegetation data associated with the National Plots Database may be gained by examining the broad architectural diagram of the software system. Communication with the Vegetation Plots Database will be implemented using Java Servlet technology, which gives the end user access over HTTP and makes it easy to integrate multiple front-end tools (including a web interface and desktop client tools). The stand-alone (desktop) version will be developed using the same servlet classes as the centralized version, but will be run from a Java application that communicates with an embedded database instead of a centralized RDBMS. Java was chosen for the bulk of the programming in this project because it is platform-independent and runs on all common platforms. In addition, all data loaded into or extracted from the database will be converted to an XML format parallel to the database schema, which allows efficient transformation into many data formats for use in other analytical tools.
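The idea of an XML format that parallels the database schema can be illustrated with a small sketch. It is written in Python for brevity, although the project itself will use Java. The element-naming rule used here (one element per table row, one child element per field) and the fields plotId and authorPlotCode are assumptions for the example, not the project's actual XML design.

```python
import xml.etree.ElementTree as ET


def row_to_xml(table_name, row):
    """Build an element named after the table, with one child per field."""
    elem = ET.Element(table_name)
    for field, value in row.items():
        child = ET.SubElement(elem, field)
        child.text = str(value)
    return elem


def xml_to_row(elem):
    """Invert row_to_xml; every value comes back as a string."""
    return {child.tag: child.text for child in elem}


# Hypothetical plot record; field names are illustrative only.
plot_row = {"plotId": 10, "authorPlotCode": "EX-001"}
plot_xml = ET.tostring(row_to_xml("plot", plot_row), encoding="unicode")
print(plot_xml)
```

Because element names mirror table and field names, a generic transformation step can map the same document to several export formats without knowing about each table in advance.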

The plots, plant taxa and community classification databases described separately herein will be implemented and maintained separately. This decision derives from the assumption that at some point in the future the plots database will be a stand-alone implementation that will communicate directly with a plant taxa and/or community classification database(s) maintained by a party independent from the party that maintains the plots database. In the meantime, these three databases will be integrated using JDBC for all inter-database communication.

7. Interface tools

Easy-to-use interface tools will be essential if the prototype database system is to be widely accepted and used. Development of interface tools is time-consuming and cannot be completed in full with available resources. For this project, interface development will focus primarily on the plots database so as to increase the likelihood that the product will be widely used. For the plant taxonomy and vegetation classification database modules, only those tools essential for system function and demonstration will be implemented. Development of other interface tools for these latter two modules will be left to our partner organizations, ABI and ITIS.

Design of interface tools for the plots database has not yet been initiated. We anticipate a workgroup meeting in January 2001 to initiate this effort.

7.1 Plot interpretation
This work has not yet been initiated. The plots database design allows for annotations of certain tables by the user community, interpretations of taxon identifications by users of plots, recording of vegetation classification assignments by users and authors, and recording of literature citations of plots. Tools will be needed to facilitate use of these capabilities.

7.2 Plot extraction
This work has not yet been initiated. Web-based tools are needed for querying the database to identify and extract plots of interest. Common queries (e.g., by community type, geography, species occurrence, investigator, or methodology) will be coded directly, and users will also be able to design more complex queries. Data may be exported in total (all data on selected plots) or in part (e.g., the essential information needed for classification development or refinement). Export options will include formats specific to commonly used vegetation analysis programs (e.g., TurboVeg, Canoco, PC-ORD).

We are considering developing a lightweight, stand-alone version of the plots database that would run on desktop computers. This would allow users to download blocks of data from the central database based on a rather general query and later refine their searches or supplement the data with their own proprietary plot data.

7.3 Plot submission
This work has not yet been initiated. At a minimum, a stand-alone program is needed for preparing data for submission to the plots database. This would perform all necessary error and consistency checks and aid in formatting the data. Management tools will be needed for the central database managers to verify the integrity of data before embedding it in the database.

8. Demonstration datasets

8.1 Plants database
We will implement two demonstration datasets in the plant taxonomy database. The first will be based on two substantially different versions (1996, 1999) of the ITIS plants database, with partial tracking of changes in concepts between the versions. This will be sufficient to implement concept-based taxonomy for future submissions to the plots database, but will not provide a detailed documentation of the historical treatment of the taxa (historical usages will need to be added on an ad hoc basis as legacy data are added). A historically detailed, fully referenced implementation will be provided for the family Juglandaceae in North America for demonstration purposes.

8.2 Plots database
To assess and demonstrate the efficacy of the database system, we will populate the plots database with a selection of plots from four sources representing substantially different plot types:

Plots collected by TNC/ABI as part of the National Park Service Vegetation Mapping project for the Greater Yosemite Ecosystem;
Circum-Arctic plots collected using the Braun-Blanquet methodology and assembled for a biome-wide synthesis by Marilyn Walker;
North Carolina Vegetation Survey data collected using the methodology of Peet et al. (1998, Castanea 63:262-274);
Forest habitat-type data collected by the U.S. Forest Service.

8.3 Vegetation classification database

We will populate the prototype vegetation classification database with vegetation types recognized in either the NatureServe or the EcoArt database of ABI. We anticipate that information on type names, position in the classification hierarchy, current acceptance, and lineage will be loaded. Maintenance of detailed information on the types recorded in the vegetation classification will be the institutional responsibility of ABI.
