TDM ICIS 6.0 Schemata Structural Revisions

From ICISWiki


ICIS 6.0 Schemata Structural Revisions

Introduction

Since its inception in the early 1990s, the ICIS schemata has evolved organically, periodically through large additions of new modules but mostly through incremental refinements to tables and fields based on end-user experience with the system. In some cases, table names reflect the nomenclature typical at the time within the relatively modest ICIS community. Moreover, table and field names are typically terse, and in some cases cryptic, owing to early database platform limitations.

Most innovations in the ICIS schemata were internal to ICIS, borrowing little from outside sources of inspiration and instead reflecting the internal expertise of the community, with a few notable exceptions (such as the adaptation of the basic DMS design from CGIAR forestry biometric data management experience).

In more recent years, a flurry of activity in allied fields such as non-crop bioinformatics has produced other communities of practice in the area of biological data management. The experiences and design principles of these communities could conceivably benefit the ICIS schemata design.

In addition, significant CGIAR-hosted informatics initiatives, most notably the Generation Challenge Programme (GCP), have brought together sizeable teams of crop informatics experts to review current semantic and data integration standards in the field. In particular, the GCP Demeter domain model covers a wide range of crop information data types. The Germplasm and Study models are in fact directly inspired by the experience and data types of the ICIS community, but in addition leverage the semantics of a shared set of core domain meta-data models.

Other initiatives such as the CGIAR World Bank funded upgrade of CG genebanks included efforts to develop genetic resources information systems. The decision at IRRI was to apply such resources toward the merging of IRRI genebank accession data into ICIS, an activity which has resulted in the current ICIS Genetic Resources Information System (GRIMS). Presently, the Global Crop Diversity Trust has commissioned the USDA GRIN team to create a new, portable (open source?) genetic resources information system (GRIN-Global). Some discussions are underway between the ICIS community and the GRIN team on how to share expertise and collaborate in this task, leveraging the ICIS community crop informatics design experience.

Finally, experience with the ICIS schemata 5.0 suggests that refactoring portions of the schemata could lead to a more flexible, maintainable system with increased performance.

The above suggests that a rational global review and revision of the ICIS schemata for the next major ICIS release (6.0) could give the global crop research community a superior crop information management system, even though it presents major challenges relating to backwards compatibility, which could trigger widespread software engineering changes.

This document will attempt to summarize community discussions relating to a general review of the ICIS schemata and on the general design principles that might guide a possible refactoring of the ICIS schemata.

Proposed general design principles

If it ain't broken, don't fix it...

The inherent core strengths of the ICIS schemata with respect to crop information management should be maintained. Generally speaking, this means that the bulk of the semantics of the topology, tables and fields of the ICIS schemata should be maintained. A permissible refactoring, upon community discussion, and careful planning, is to rationalize and improve the descriptive naming of ICIS tables and fields. Explicit table and field mappings to facilitate ICIS 5.* to ICIS 6.0 database conversion should be systematically spelled out.
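One way to spell out such mappings systematically is to keep them as data and check them mechanically before any conversion is run. The following is a minimal sketch only: the terse 5.x-style names are illustrative and should be checked against the actual schema, every 6.0 name is hypothetical, and the generated `ALTER TABLE` statements assume a database engine that supports `RENAME TO` and `RENAME COLUMN`.

```python
# Hypothetical rename maps. The 5.x-style names are illustrative only and
# none of the 6.0 names come from the ICIS documentation.
TABLE_MAP = {
    "GERMPLSM": "germplasm",
    "ATRIBUTS": "germplasm_attribute",
}

FIELD_MAP = {
    ("GERMPLSM", "GID"): ("germplasm", "germplasm_id"),
    ("GERMPLSM", "METHN"): ("germplasm", "method_id"),
    ("ATRIBUTS", "AVAL"): ("germplasm_attribute", "value"),
}


def check_mapping(table_map, field_map):
    """Return a list of human-readable problems found in the mapping."""
    problems = []
    seen_targets = {}
    for (old_table, old_field), (new_table, new_field) in field_map.items():
        source = f"{old_table}.{old_field}"
        if old_table not in table_map:
            problems.append(f"{source}: table has no 6.0 mapping")
        elif table_map[old_table] != new_table:
            problems.append(f"{source}: field and table mappings disagree")
        target = (new_table, new_field)
        if target in seen_targets:
            problems.append(f"{source}: collides with {seen_targets[target]}")
        seen_targets[target] = source
    return problems


def rename_statements(table_map, field_map):
    """Build table renames first, then column renames, in a stable order."""
    statements = [
        f'ALTER TABLE "{old}" RENAME TO "{new}";'
        for old, new in sorted(table_map.items())
    ]
    for (_, old_field), (new_table, new_field) in sorted(field_map.items()):
        statements.append(
            f'ALTER TABLE "{new_table}" RENAME COLUMN "{old_field}" TO "{new_field}";'
        )
    return statements


if __name__ == "__main__":
    for problem in check_mapping(TABLE_MAP, FIELD_MAP):
        print("PROBLEM:", problem)
    for statement in rename_statements(TABLE_MAP, FIELD_MAP):
        print(statement)
```

Keeping the mapping as plain data means the same table can drive the conversion scripts, the documentation, and any compatibility layer for existing ICIS 5.* applications.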

Compliance with public biological semantics and schemata standards

The above commentary notwithstanding, consideration should be given to refactoring the schema to better reflect recently developed shared public models for crop (plant, biological) informatics semantics.

In particular, the structure of the ICIS schemata should be compared and contrasted against the core domain models of the Generation Challenge Programme (GCP) to guide the rationalization of the ICIS schemata.

For biological scientific standards immediately adjacent to existing crop information, some consideration should be given to the issue of interoperability. In particular, the modular Generic Model Organism Database Chado molecular biology schemata is proposed as a key complementary schema to be coupled with ICIS, to provide schemata support for genomics data including sequence data.

Shallow versus Deep Semantics

ICIS, the GCP domain model and Chado share one inherent characteristic which is to be preserved: extensive parameterization of a core data model with controlled vocabulary and ontology (CVO). In the case of ICIS, the management of such CVO is somewhat hard coded into a series of tables for germplasm method CV (in the GMS); property, scale and method CV (in the DMS). The GCP domain model makes extensive use of a SimpleOntologyTerm class which is a proxy identifier for ontology terms. The Chado database schemata provides a relatively simple "controlled vocabulary" (or CV) module.

In ICIS 6.0, it is proposed that this design philosophy be extended and progressively rationalized following the design patterns of the Chado schema and the spirit of the GCP domain model. In essence, centralized management of terms in the Chado CV module schema is proposed to replace the germplasm method, property/trait, scale and method vocabulary, and the "discrete scale" value concept in ICIS. In addition, the need for controlled vocabulary in new contexts (e.g. GEMS and LDMS) should be provided by the CV schema module.
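The idea of one central term store can be sketched with a much-simplified version of the Chado `cv` and `cvterm` tables. This is an illustration under stated assumptions, not the actual Chado DDL: the real `cvterm` table carries further columns (for example `dbxref_id` and `is_obsolete`), SQLite stands in for the production database, and the vocabulary names and terms below are hypothetical.

```python
import sqlite3

# Simplified stand-in for the Chado cv and cvterm tables (assumption: the
# real Chado definitions have additional columns and constraints).
SCHEMA = """
CREATE TABLE cv (
    cv_id INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE
);
CREATE TABLE cvterm (
    cvterm_id  INTEGER PRIMARY KEY,
    cv_id      INTEGER NOT NULL REFERENCES cv (cv_id),
    name       TEXT NOT NULL,
    definition TEXT,
    UNIQUE (cv_id, name)
);
"""


def add_term(conn, cv_name, term_name, definition=None):
    """Insert a term into the named vocabulary (idempotent); return its id."""
    conn.execute("INSERT OR IGNORE INTO cv (name) VALUES (?)", (cv_name,))
    cv_id = conn.execute(
        "SELECT cv_id FROM cv WHERE name = ?", (cv_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO cvterm (cv_id, name, definition) VALUES (?, ?, ?)",
        (cv_id, term_name, definition),
    )
    return conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE cv_id = ? AND name = ?",
        (cv_id, term_name),
    ).fetchone()[0]


def terms_in(conn, cv_name):
    """List the term names of one vocabulary in alphabetical order."""
    rows = conn.execute(
        "SELECT t.name FROM cvterm AS t JOIN cv ON cv.cv_id = t.cv_id "
        "WHERE cv.name = ? ORDER BY t.name",
        (cv_name,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical vocabularies, one per ICIS concept named in the text.
add_term(conn, "germplasm_method", "single cross")
add_term(conn, "trait", "plant height")
add_term(conn, "scale", "cm")
add_term(conn, "trait_method", "ruler measurement")
add_term(conn, "discrete_scale_value", "resistant")
add_term(conn, "discrete_scale_value", "susceptible")
```

With this layout, a new context such as GEMS or LDMS would only need a new row in `cv` and its terms in `cvterm`, rather than a new set of dedicated vocabulary tables.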

Generic Schemata as Core Set

It is proposed that shared semantics across ICIS modules be consolidated into a Generic Core Schemata ("GSC"). The design of the GSC is proposed to rely heavily on the core modules of the GMOD Chado schemata (see below) to allow for wider interoperability of ICIS with the databases and tools of the GMOD community.


Discussions for specific modules

It is generally assumed that the more recent utility ICIS modules, such as the inventory management system, already reflect current and recent "best practices" in their design. This could be reviewed later, but the current discussion will initially focus on the core ICIS modules.

Generic Core Schemata

A Generic Core Schema (GSC) module is new to ICIS 6.0 and is intended to reflect the design principles discussed above, in particular, the experience and designs of the Generation Challenge Programme core models and the Generic Model Organism Database (GMOD) community.

It is generally proposed here that, insofar as possible, ICIS embed the following schemata modules from GMOD as a subset of its own schemata, and adjust the usage of these concepts in the ICIS 5.0 schemata accordingly:

Adoption and ICIS integration of this generic core schemata will leverage all the genomic software engineering efforts of the GMOD community and may encourage expansion of the ICIS community of practice into the GMOD community, in particular for model and GMOD-hosted crop plant research. The modules listed are considered part of the generic core modules. Review of the GCP domain model may suggest additional components of the core schemata. Adoption of additional Chado modules is considered in the context of specific ICIS modules (below).

Genealogy Management System

The Germplasm model of the GCP domain model was largely inspired by the design of the ICIS GMS. Fewer changes are anticipated here, and in fact ICIS has driven the evolving design of the GCP Germplasm model. The main suggested changes here are twofold:

  1. General renaming of tables and fields to be more descriptive and meaningful, and perhaps, more closely related to GCP domain model names
  2. Possible migration of generic functionality of the GMS, such as ontology management, upwards to the Generic Core Schemata (i.e. Chado CV table)

Data Management System (DMS)

The Study model of the GCP domain model was largely inspired by the design of the ICIS DMS. Fewer changes are anticipated here, and in fact ICIS has driven the evolving design of the GCP Study model. The main suggested changes here are twofold:

  1. General renaming of tables and fields to be more descriptive and meaningful
  2. Possible migration of generic functionality of the DMS, such as ontology management, upwards to the Generic Core Schemata (i.e. Chado CV table)

Gene Management System (GEMS)

A recently proposed design, ICIS 6.0 Design of the Gene Management System (GEMS), reflects the topology and design patterns of the GCP genetic domain model. It was already attempted this past year and is fully documented.

In addition, it is proposed that the GEMS_GENOMIC_FEATURE table be closely coupled to, if not completely subsumed into, the GMOD Chado Sequence module Feature table.

Location Data Management System (LDMS)

The design of the Location Data Management System (LDMS) is proposed to reflect the topology and design patterns of the GCP Location domain model. Although the Chado schemata doesn't model location data very extensively, some of its design principles should be propagated into the design of the LDMS. A first draft data model is presented in ICIS 6.0 Design of the Location Data Management System (LDMS).

Genetic Resources Information Management System (GRIMS)

The proposal here is more diffuse: it is recommended that ICIS initiate significant negotiations with the Global Crop Diversity Trust-funded GRIN-Global project to ensure interoperability and coupling of the ICIS schemata with that initiative and, perhaps, the adoption of compatible design principles such as those being proposed for ICIS 6.0.

First Steps toward ICIS 6.0: Specific Recommendations
