GMS Setup

From ICISWiki

Jump to: navigation, search

Contents

SETTING UP A GMS DATABASE

C.G. McLaren, W. Eusebio, A. Portugal

INTRODUCTION

The ICIS GMS database stores and manages information on genesis, genealogy, nomenclature and chronology of germplasm for a particular crop. Several approaches have been done to populate the databases from historical data of genealogy. Uploading a local GMS database to central database involves several steps that sometimes require knowledge about the crop genealogy and the GMS database schema.

POPULATING AND CORRECTING GMS DATABASES

GMS database population generally requires parsing historical data from different sources into the GMS structure. There are as many ways to do this as there are historical datasets. Some examples can be found in the original TDM document: Setting up a GMS database. Another example is in the problems and solutions section: Uploading pedigree data.

It is inevitable that many errors will be discovered in pedigrees onece they are loaded and available for examination with the GMS appllications. Correcting pedigree information is complicated by the number of dependencies which may be present. Algorithms are presented in the TDM section on CORRECTING RECORDS IN THE GMS DATABASE and these are implemented in the BROWSE application and to a lesser extent in the SetGen application. An example of correcting pedigree information is given in the problems and solutions section: Correcting pedigree data.

CHECK LOCAL DATABASE INTEGRITY

  • Check that the local database is registered (InstlnID) and all users are registered (the UIDs). If not register the local and update all local UID fields.
  • Check that the local database has not been uploaded. Check that the local database structure corresponds with the template. Check that UID fields and DATE fields have been completed wherever possible. Check that all fields have values (0 for missing).
  • Link all GMS tables from the central database to the local database.
  • Check that all derivative methods have GNPGS=-1 and generative methods have GNPGS>=0. (CHKL01A and CHKL01B require a link to central METHODS).
  • Check GPID1 <> GID and GPID2 <> GID for any record. [CHKL01C and CHKL01D]
  • Check that all derivatives with known source (GPID2 <> 0) have known group (GPID1 <> 0) [CHKL01E].
  • Check for circularities. [CHK04D2 (one step), CHK04D4 (two step) and CHK04D6 (three step)]
  • Check that replacements in the central database have been reflected in the local GMS, i.e. No replaced central GID should be referenced in the local GMS [CHKL01G1 &2].
  • Check that no replacements in the CHANGES table have already been replaced or deleted in the central database [CHKL01H].
  • Check that all local replacements and changes have been implemented [CHKL01I1 … CHKL01I8]
  • Check that germplasm in a management group (MGID<>0) has the same group as ist founding germplasm (MGID) [CHKL01J1 … CHKL01J3]. This is not strictly necessary because management groups can cross generative groups, but none used so far in IRIS and IWIS have done so.

UPDATING THE CENTRAL DATABASE

The local installation administrator can periodically send information from the local GMS to the administrator to be included in central GMS. In return they receive an updated copy of central GMS including data from all users who have updated central GMS since their last installation.

Users can elect to send subsets of their data defined by a cut-off date according to the GERMPLASM_DATE field of the GERMPLASM table. All data records from the previous update to the cut-off date will be transferred to the central GMS after checking for duplicates from other users and for inconsistencies with prior updates. Records from local supporting tables that were referenced by the updated data records must also be transferred, and all negative key-field values are assigned new, unique positive values. The origin of each updated data record in terms of USER_ID and local GERMPLASM_ID is recorded on the central GMS in the USER_ID and LOCAL_GERMPLASM_ID fields of the GERMPLASM table.

Once the update is completed a new copy of central GMS may be installed at the local site. The installation process deletes all updated data records and all records from the support tables that have been transferred to central GMS. All references to those records in remaining non-updated data records are changed to the corresponding new, positive key values from the central GMS. New records on the local GMS always have decreasing negative keys (-1, -2, -3 …), those for deleted records cannot be reused. An attempt to access a deleted data record triggers a search in central GMS for the corresponding values of the USER_ID and LOCAL_GERMPLASM_ID fields.

The process of updating the central GMS from local databases is one of the key administrative functions of GMS management involving both the local and central database administrators.

There are two main problems associated with the update process which are technically difficult to automate:

  • Local databases contain replacement and change requests for central germplasm in the form of CHANGE records with positive CRECORD field values. The central database administrator needs to verify that these changes are correct, and have not already been made by other updates.
  • Local databases may also contain germplasm records for germplasm which has been updated to the central from other sources. These must be identified to avoid duplications. (This process can be facilitated by adding a local transfer attribute to any genotype transferred from one local database to another).

To keep these problems manageable, it is recommended that local databases are updated regularly wherever possible. The following steps describe an algorithm for the update processes. They represent a combination of manual and automatic database processes and have so far been executed by direct manipulation of ACCESS databases. Steps 1-8 should be done by the Local Database Administrator. Steps 9 and 10 must be done by the Central Database Administrator. SQL queries to execute many of the steps are saved in :\ICIS\Database\Templates\Access\ GMS databases The names of queries in the template databases corresponding to different steps are given in square braces in the text. The SQL statements could be copied and executed under another DBMS.

Resolve Duplicate Germplasm Records

When germplasm records are entered into a local database, they should be checked against records already in central GMS. If matching records in central GMS are incorrect or incomplete, either changes are added to the local database correcting the central record or a new local germplasm record is added and a replacement is made. However, germplasm records may be updated to the central database from a different source after they have been added to a given local database, but before that database is updated. This results in duplicate germplasm records which must be resolved by the update process. There are currently no set queries for checking for duplicates and users must employ any useful strategies. There are three categories of duplicates as follows:

  • Complete duplicates
    • Select records with matching preferred name, GPID1, GPID2 and GNPGS fields in both central and local databases, GRPLCE=0 in both databases and central GID is not in REPLACEMENTS.GID.
    • If other germplasm fields are different, accept known values over unknown values and resolve differences between non-matching fields with known values. If these differences cannot be resolved leave two distinct records.
    • Some of the matches may be collections, releases or other non-breeding derivatives of the same line (or from the same group with unknown source) which we would like to retain as separate germplasm entries. These usually differ in location, but may only be discernable by other names or attributes such as accession numbers or release attributes. These should be identified and kept separate.
    • For matches which are true duplicates, replace GIDl with GIDc (3.9.7).
    • This creates new matches so return to the first step until no new matches occur.
  • Partial duplicates - complete data in Central
    • Select records with matching preferred name, and GNPGS fields, central GPID1 and GIPD2 > 0, local GPID1 and GPID2 zero or match central values, GRPLCE both zero and central GID not in REPLACEMENTS.GID.
    • Resolve differences in other fields as in (a).
    • Take note of possible non-breeding derivatives as in (a).
    • For resolved matches, replace local GID with central GID as in 3.9.7.
    • This could create new matches so return to (a) until no new matches occur.
  • Partial Duplicates - complete data in Local
    • Select records with matching preferred name and GNPGS fields, local GPID1 and GPID2 >= 0 and central GPID1 and GPID2 zero or matching local values, GRPLCE = 0 on both records and central GID not in REPLACEMENTS.GID.
    • Resolve differences in other germplasm fields.
    • Replace central GID with local GID and add replacements to REPLACEMENTS.

Prepare local update environment

  • Set GERMPLSM.LGID = GERMPLSM.GID [UPD03A].
  • Make a table, MAXTABLE, with fields GID, NID, LOCID, AID, CID and LDID [UPD03B]. Add one record to this table containing the maximum central GERMPLSM.GID value in field GID, the maximum central NAMES.NID value in NID, the maximum central LOCATION.LOCID value in LOCID, the maximum central ATRIBUTS.AID value in AID, the maximum central CHANGES.CID in CID and the maximum central LOCDES.LDID in LDID . This must be done manually, but queries UPD03B1 to UPD03B6 will extract the maximum values for you.
  • For a new crop implementation the value of MAXTABLE.GID should be zero (0) because there are no germplasm records entered in the central database.
  • Define a new table, UPDATEGID, with two fields GID and LGID. GID is auto-increment, long integer with base set to the maximum central GID value. [UPD03C1 AND UPD03C2. C2 uses MAXTABLE].
  • Define table UPDATELOC with fields LOCID and LLOCID. LOCID is auto-increment, long integer with base set to the maximum central LOCATION.LOCID [UPD03D1 and UPD03D2].
  • Define table UPDCHNG with fields CID and LCID. CID is auto-increment, long integer with base set to the maximum central CHANGES.CID [UPD03E1 and UPD03E2].
  • Define table UPDNAMES with fields NID and LNID. NID is auto-increment, long integer with base set to the maximum central NAMES.NID read from MAXTABLE [UPD03F1 and UPD03F2].
  • Define table UPDATRIBUTS with fields AID and LAID. AID is auto-increment, long integer with base set to the maximum central ATRIBUTS.AID read from MAXTABLE [UPD03G1 and UPD03G2].
  • Set INSTLN.UDATE to the current date, set INSTLN.UGID to the minimum local GID, set INSTLN.ULOCN to minimum local location number set INSTLN.UCHNG to the minimum local CHANGES.CID. Similarly set fields UMETHN, UFLDNO, UREFNO, UPID and ULRECID.

Update Locations

  • Generate new central LOCIDs [UPD04A]
  • Update LOCATION.LOCID [UPD04B].
  • Update LOCDES.LOCID [UPD04C].
  • Update GERMPLSM.GLOCN [UPD04D].
  • Update NAMES.NLOCN [UPD04E].
  • Update ATRIBUTS.ALOCN [UPD04F].
  • Update sub-national division fields in LOCATION [UPD04G, UPD04H and UPD04I]

Update Support Tables

  • Update METHODS, UDFLDS and BIBREFS in the same way as LOCATIONS. No set queries are currently available for this, but they can be set up following the pattern for locations: UPD03D1, UPD03D2, UPD04A – F.

Extract all Local Germplasm without Local References

  • Append GERMPLSM.GID to UPDATEGID.LGID where GPID1 >0, GPID2 >0 and GRPLCE= 0 (should also check PROGNTRS.PID > 0). UPDATEGID.GID will be automatically incremented [UPD06A].
  • Set GERMPLSM.GID = UPDATEGID.GID where GERMPLSM.GID = UPDATEGID.LGID [UPD06B1 & UPD06B2].
  • Set GERMPLSM.GPID1 = UPDATEGID.GID where GERMPLSM.GPID1 = UPDATEGID.LGID. Do the same for GPID2 and for PROGNTRS.PID [UPD06C1-UPD06C3].
  • This process may create more germplasm records without local references. If so return to the first step in this section and repeat until no such new records are created.
    • Note 1: For new crop implementation after running the queries [UPD04A to UPD06C3] for the first cycle, delete the first record in the UPDATE table that has GID = 0 and LGID = 0 to avoid replacement circularity.
    • Note 2: The queries from UPD06A to UPD06C3 can be executed by running the macro in the Macro tab of MS Access named Macro1.

Update all Remaining Local Germplasm References

  • Set NAMES.GID = UPDATEGID.GID where NAMES.GID = UPDATEGID.LGID [UPD07A].
  • Repeat for ATRIBUTS [UPD07B].
  • Set LISTDATA.GID to UPDATEGID.GID where LISTDATA.GID=UPDATEGID.LGID[UPD07C].

Check Update

  • There should be no records in GERMPLSM with negative GIDs and GRPLCE = 0 [UPD08A].
  • The only negative GIDs in the NAMES and ATRIBUTS table should belong to replaced or deleted records [UPD08B and UPD08C].
  • All GIDs and PIDs in PROGENITORS table should be positive [UPD08D].

Execute Central Changes and Restore the Local Database

  • Move all records with positive IDs from local to central tables. To use the ACCESS queries the corresponding central tables must be linked as NAMES1, ATRIBUTS1, etc.
    • LOCATIONS [UPD09A1a] followed by [UPD09A1b]
    • LOCDES Queries not available yet
    • METHODS Queries not available yet
    • UDFLDS Queries not available yet
    • BIBREF Queries not available yet
    • NAMES [UPD09A5a] followed by [UPD09A5b]
    • ATRIBUTS [UPD09A6a] followed by [UPD09A6b]
    • GERMPLSM [UPD09A7]
    • PROGNTRS [UPD09A8]
  • Update local INSTLN table with the current date and fields UGID, UNID, ULOCID, UAID, UCID and ULDID set to the new minimum values in the local database.
  • Update central INSTLN table, i.e. replace the central INSTLN record for the current installation with the updated local record.
  • Replace local target GIDs in central change and replacement records with new GIDs [UPD09C4].
  • Execute replacement and deletion changes in central [UPD09C5 and UPD09C6].
  • Apply changes to GPID1, GPID2, GDATE, GLOCN, METHN and GNPGS. Check the old value (CFROM) before applying the new value (CTO) and, if successful, also change CSTATUS to 2. Changes with CRECORD>0 and CSTATUS=0 remaining in the local CHANGES table either contain local GIDs or were not applied because their CFROM value did not match the target field value in the central database [UPD09D5-UPD09D10]. Repeat each command until 0 results.
  • Apply changes to NAMES table fields. Only the queries for changes to NSTAT, NTYPE and NLOCN are available at the moment [UPD09D11 and UPD09D13]. Repeat each command until 0 results.
  • Apply changes to ATRIBUTS table fields. Queries for ATYPE and ALOCN are available, [UPD09D15-UPD09D16]. Repeat each command until 0 results.
  • Allocate positive CID values and move all change records to the central CHANGES table [UPD09E3–UPD09E5].
  • Update negative GID values in LISTDATA using the UPDATEGID table and move lists which should be published to central.
  • Empty local GMS tables (except INSTLN). Remove uploaded records from LISTNMS and LISTDATA but leave records for non-published lists.
  • Delete tables MAXTABLE, UPDATEGID, UPDATELOC, UPDCHNG.
  • Update the central INSTLN1 record for the Central Database (INSTLN1.INSTLID=1) with the current date and fields UGID, UNID, ULOCID, UAID, UCID and ULDID set to the new maximum values in the central which can be obtained by running queries UPD03B1 to UPD03B6.
  • Factors (and possibly variates) in the local DMS with SCALE=GID may have negative GIDs. You should check for replacements and update to the new central GIDs.
  • Transaction records in IMS may have RECORDIDs pointing to LISTDATA records which have been uploaded. These should be updated with the new central LRECID values.
  • Compress the local database.

CHECK CENTRAL DATABASE INTEGRITY

  • Check that all derivative germplasm have GNPGS = -1 [CHK01A].
  • Check that all generative germplasm have GNPGS >= 0 [CHK01B].
  • Check that GPID1 <> GID [CHK01C].
  • Check that GPID2 <> GID [CHK01D].
  • Check that all derivatives with known source (GPID2 <> 0) have known group (GPID1 <> 0) [CHK01E].
  • Check that there are no references to deleted or replaced germplasm in central [CHK10L2] for GPID1, [CHK10L3] for GPID2 and :CHK10L4 (progenitors). (These queries use CHK10L1 to get a list of all replaced or deleted germplasm)
  • Check for one-step circularities [CHK04D2].
This first calls CHK04D1 to get a list of all non replaced derivative germplasm (NRDG), GIDx with source GPID2x. Then it finds all sources of these (GIDy with GIDy=GPID2x) which have their sources GIPD2y=GIDx.
  • Check for two-step circularities [CHK04D4]
This calls CHK04D1 to find all NRDG, GIDx, and then finds the sources of those with query CHK04D3, GIDy, and finally finds the sources of those, GIDz, which have GPID2z=GIDx
  • Check for three-step circularities [CHK04D6]
This calls CHK04D1 to find all NRDG, GIDx, then finds the sources of those with query CHK04D3, GIDy, and then finde the sources of those with query CHK04D5, GIDz, and finally finds the sources of those, GIDw, which have GPID2w=GIDx
  • Check that all group IDs are generative or have unknown source (-1, 0, 0) [CHK01G1]
This finds non-generative group sources with non-zero GPID1 or GPID2. (This calls CHK04D1 first to find NRDG).
  • Check that all source germplasm which are not also the group sources are derivative [CHK01I2]
Does this by finding all NRDG with non-zero sources and then finding those sources which are not also group sources (GPID1) but are generative.
  • Check that NRDG has the same group ID as its derivative source [CHK01I3]
Find NRDG which have different groups (CHK01I1.GPID1) to their derivative sources (GERMPLSM.GPID1).
  • Check special name relationships, e.g., Group source for IR nnn- - - should have name IR nnn. CHK04F1 – CHK04F3 (somewhat IRIS specific but can be adapted to similar situations).


SOURCE CODE FOR THE ACCESS QUERIES

The source code for the Access queries can be seen here: Access source Codes

The source code when using in SQL Server can be seen here: SQL Server source Codes

Personal tools