Data Validation Tool 5.4.2

From ICISWiki




Introduction

The ICIS Data Validation Tool is an application that searches an ICIS database for data errors that could render the data meaningless. It helps ensure that published data are of consistently high quality.

Image:Icis-validate5.4.JPG

The tool is simple to use: choose the tests you want to execute and click the "Run" button.


ICIS - GMS (i) Queries

Checks the Genealogy Management System (GMS) central database (applicable to all ICIS implementations).

Image:icisgms1.JPG



Invalid parent references [2 checks]

Error messages:

DataError-0001: Unknown group source

DataError-0002: Germplasm with non-generative group source



Circular references [3 checks]

If germplasm A has germplasm B as one of its parents and germplasm B has germplasm A as one of its parents, then we have a circular reference. This option also checks for second- and third-level circularity, where the cycle passes through three or four germplasm.

Error messages:

A references B and B references A:

DataError-0003: 1st level circular reference 


A references B and B references C and C references A:

DataError-0004: 2nd level circular reference


A references B and B references C and C references D and D references A:

DataError-0005: 3rd level circular reference
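
The three circularity levels can be illustrated with a minimal Python sketch. It assumes the pedigree has already been loaded into a dictionary that maps each GID to its parent GIDs, with 0 meaning "unknown parent". The `pedigree` structure, the `circular_level` function and the `CODES` mapping are hypothetical names used only for illustration; the tool itself runs SQL queries against the database.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory pedigree: GID -> parent GIDs (0 = unknown parent).
Pedigree = Dict[int, Tuple[int, ...]]

# Number of parent links in the cycle -> error code listed above.
CODES = {2: "DataError-0003", 3: "DataError-0004", 4: "DataError-0005"}


def circular_level(pedigree: Pedigree, gid: int, max_depth: int = 4) -> Optional[int]:
    """Return the number of parent links after which `gid` reappears
    among its own ancestors, or None if it does not within `max_depth`."""
    frontier = {gid}
    for depth in range(1, max_depth + 1):
        # Move one generation up, ignoring unknown parents.
        frontier = {p for g in frontier for p in pedigree.get(g, ()) if p != 0}
        if gid in frontier:
            return depth
    return None


pedigree = {
    1: (2, 0), 2: (1, 0),                # A -> B -> A
    3: (4, 0), 4: (5, 0), 5: (3, 0),     # A -> B -> C -> A
    10: (0, 0),                          # no known parents
}
for gid in sorted(pedigree):
    depth = circular_level(pedigree, gid)
    if depth in CODES:
        print(gid, CODES[depth])
```

Expanding one generation at a time keeps the search bounded at four links, which matches the deepest case described above. A germplasm listed as its own parent would return 1, a case this section does not describe, so the sketch leaves it unreported.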



Invalid Method

Error message:

DataError-0006: Invalid germplasm methods



Deleted parent references

Suppose germplasm A is replaced with germplasm B, as recorded in the GERMPLSM.GRPLCE database column. All germplasm that reference germplasm A should be corrected to reference germplasm B.

Error message:

DataError-0007: Germplasm with deleted parent references



Foreign Key References [4 checks]

The ICIS database has no foreign key constraints enabled so we need to check manually if the foreign key definition is violated.

Error messages:

DataError-0008: Invalid LOCATION.LOCID references

DataError-0009: Invalid GERMPLASM.GID references
DataError-0010: Invalid METHODS.MID references
DataError-0011: Invalid BIBREFS.REFID references
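
A manual foreign key check of this kind is commonly written as an anti-join. The sketch below uses Python's built-in `sqlite3` module with a tiny in-memory database. The choice of GERMPLSM.GLOCN as the referencing column, the treatment of 0 as "unknown", and the sample rows are assumptions made for illustration; they are not taken from the tool's actual SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LOCATION (LOCID INTEGER PRIMARY KEY);
    CREATE TABLE GERMPLSM (GID INTEGER PRIMARY KEY, GLOCN INTEGER);
    INSERT INTO LOCATION VALUES (1), (2);
    -- GID 12 points to a location that does not exist;
    -- GID 13 uses 0, assumed here to mean "unknown".
    INSERT INTO GERMPLSM VALUES (10, 1), (11, 2), (12, 99), (13, 0);
""")

# Anti-join: keep germplasm rows whose GLOCN has no matching LOCATION.LOCID.
orphans = conn.execute("""
    SELECT g.GID, g.GLOCN
    FROM GERMPLSM g
    LEFT JOIN LOCATION l ON l.LOCID = g.GLOCN
    WHERE g.GLOCN <> 0 AND l.LOCID IS NULL
    ORDER BY g.GID
""").fetchall()

for gid, glocn in orphans:
    print(f"DataError-0008: GID {gid} references missing LOCID {glocn}")
```

The same pattern would apply to the other three checks by swapping in the referenced table and key.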



ICIS - GMS (ii) Queries

Checks the Genealogy Management System (GMS) central database (applicable to all ICIS implementations).

Image:icisgms2.JPG


Progenitor germplasm dates [3 checks]

CropForge Feature Request # 160 and/or Version 5.3 Discussion Article on ICISWiki


The GDATE of a GID must not predate the GDATE of any of its progenitors. Beware of missing links in the chain of dates: if the test checks only the GDATE of the direct progenitors (GPID1, GPID2 and MGID), it will not detect the case where a non-zero GPID1, GPID2 or MGID has GDATE=0 but its own GPID1, GPID2 or MGID is younger than the target GID. Therefore, if a GPID1, GPID2 or MGID has GDATE=0, iterate to check its own GPID1, GPID2 and MGID.

Error messages:

DataError-0012: Germplasm with germplasm date (GDATE) earlier than GDATE of GPID1
DataError-0013: Germplasm with germplasm date (GDATE) earlier than GDATE of GPID2
DataError-0014: Germplasm with germplasm date (GDATE) earlier than GDATE of MGID
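
The iteration through undated progenitors can be sketched in Python as follows. The `Germplasm` record, the dictionary `db` and both functions are hypothetical; the sketch also assumes that GDATE is stored as an integer that sorts chronologically (for example YYYYMMDD) and that 0 means "unknown". For brevity it does not record whether a violation came through GPID1, GPID2 or MGID, so it does not distinguish the three error codes.

```python
from typing import Dict, List, NamedTuple


class Germplasm(NamedTuple):
    gdate: int  # assumed integer date, 0 = unknown
    gpid1: int
    gpid2: int
    mgid: int


def nearest_dated(db: Dict[int, Germplasm], gid: int) -> Dict[int, int]:
    """Collect the closest ancestors of `gid` that have a non-zero GDATE,
    walking through any progenitor whose GDATE is 0."""
    found: Dict[int, int] = {}
    seen = {gid}          # guards against circular references
    stack = [gid]
    while stack:
        rec = db.get(stack.pop())
        if rec is None:
            continue
        for pid in (rec.gpid1, rec.gpid2, rec.mgid):
            if pid == 0 or pid in seen:
                continue
            seen.add(pid)
            parent = db.get(pid)
            if parent is None:
                continue
            if parent.gdate == 0:
                stack.append(pid)      # undated: keep climbing
            else:
                found[pid] = parent.gdate
    return found


def date_violations(db: Dict[int, Germplasm], gid: int) -> List[int]:
    """Return ancestor GIDs whose GDATE is later than that of `gid`."""
    target = db[gid].gdate
    if target == 0:
        return []
    return sorted(p for p, d in nearest_dated(db, gid).items() if d > target)


db = {
    1: Germplasm(20050101, 0, 0, 0),
    2: Germplasm(0, 1, 0, 0),          # undated link in the chain
    3: Germplasm(20000101, 2, 0, 0),   # older than its grandparent, GID 1
}
print(date_violations(db, 3))
```

A check limited to direct progenitors would pass GID 3 here, because its only progenitor (GID 2) has no date; following the chain exposes the conflict with GID 1.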



Progenitor ID1 unknown, Progenitor ID2 known

CropForge Feature Request # 160 and/or Version 5.3 Discussion Article on ICISWiki


A GID can’t have GPID2>0 and GPID1=0

Error message:

DataError-0015: Germplasm with Progenitor ID1 unknown ( GPID=0 ), Progenitor ID2 known ( GPID2 > 0 )




Name inheritance from GPID2: check NDATE and NLOCN [2 checks]

CropForge Feature Request # 160 and/or Version 5.3 Discussion Article on ICISWiki


If a GID inherits a name from its source (GPID2) then that name record must also inherit the name date (NDATE) and name location (NLOCN)

Error messages:

DataError-0016: Germplasm with inherited name from GPID2 but NDATE not inherited
DataError-0017: Germplasm with inherited name from GPID2 but NLOCN not inherited


IRIS - GMS Queries

Checks the Genealogy Management System (GMS) central database (applicable to International Rice Information System (IRIS)).

Image:irisgms.JPG



Preferred Names [2 checks]

CropForge Feature Request # 160 and/or Version 5.3 Discussion Article on ICISWiki


No more than one name of a GID may have NSTAT=1 (preferred English name)

DataError-0018: Germplasm with more than one preferred English name (NSTAT=1)


Names eligible to be the preferred name are: CRSNM, RELNM, DRVNM, CVNAM, ELITE

DataError-0019: Germplasm with invalid name type as preferred name
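
Both preferred-name rules can be expressed over a list of name records, as in this Python sketch. The `(gid, ntype, nstat)` tuple layout and the `check_preferred_names` function are hypothetical, and name types are written as their short codes for readability, which may differ from how they are stored in the database.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Name types eligible to be the preferred name, as listed above.
ELIGIBLE = {"CRSNM", "RELNM", "DRVNM", "CVNAM", "ELITE"}


def check_preferred_names(
    names: Iterable[Tuple[int, str, int]]
) -> Tuple[List[int], List[int]]:
    """Return (GIDs with more than one NSTAT=1 name,
               GIDs whose NSTAT=1 name has an ineligible type)."""
    preferred = [(gid, ntype) for gid, ntype, nstat in names if nstat == 1]
    counts = Counter(gid for gid, _ in preferred)
    multiple = sorted(gid for gid, n in counts.items() if n > 1)
    invalid = sorted({gid for gid, ntype in preferred if ntype not in ELIGIBLE})
    return multiple, invalid


names = [
    (1, "CRSNM", 1), (1, "RELNM", 1),   # two preferred names
    (2, "ACCNO", 1),                    # ineligible type as preferred name
    (3, "CVNAM", 1), (3, "ACCNO", 0),   # fine
]
multiple, invalid = check_preferred_names(names)
print("DataError-0018:", multiple)
print("DataError-0019:", invalid)
```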

Preferred IDs [3 checks]

CropForge Feature Request # 160 and/or Version 5.3 Discussion Article on ICISWiki


No more than one name of a GID can have a "preferred ID" status (NSTAT =8)

DataError-0020: Germplasm with more than one preferred ID (NSTAT=8) 


The following name types are eligible to be the preferred ID: DRVNM, COLNO, ACCNO (if present), GACC (if present), ITEST (if present), CIATGB (if present)

DataError-0021: Germplasm with invalid name type as preferred ID


The preferred ID must be unique for the given name type – that is, two accessions must not share the same name of the same type if one or both is a preferred ID.

DataError-0022: Germplasm with preferred ID not unique for the given name type


Location (GLOCN) of germplasm

CropForge Feature Request # 160 and/or Version 5.3 Discussion Article on ICISWiki

For all germplasm whose creation method is not Method ID 62 (Import), the germplasm location (GLOCN) of a GID should be the same as the GLOCN of its GPID2

DataError-0023: Germplasm with GLOCN different from GLOCN of GPID2



Method - name type combinations [3 checks]

CropForge Feature Request # 160 and/or Version 5.3 Discussion Article on ICISWiki


Only certain combinations of name type and germplasm creation method are acceptable. Namely:


Method type GEN: valid name types are: CRSNM, UNCRS, UNRES

DataError-0024: Invalid method-name type combination for method GEN 


Method type DER: valid name types are: RELNM, DRVNM, CVNAM, CVABR, NTEST, LNAME, ADVNM, ACVNM, AABBR, OLDMUT1, OLDMUT2, ELITE, UNRES

DataError-0025: Invalid method-name type combination for method DER 


Method type MAN: valid name types are: ACCNO, RELNM, CVNAM, CVABR, COLNO, FACCN, ITEST, NTEST, LNAME, TACC, ADVNM, ACVNM, ELITE, GACC, DACCN, LCNAM, CIATGB

DataError-0026: Invalid method-name type combination for method MAN
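
The three lists above amount to a lookup table keyed by method type. A minimal Python sketch, in which the `VALID_NAME_TYPES` table copies the lists above and the `check_combination` function is a hypothetical helper rather than the tool's implementation:

```python
from typing import Optional

# Valid name types per method type, copied from the lists above.
VALID_NAME_TYPES = {
    "GEN": {"CRSNM", "UNCRS", "UNRES"},
    "DER": {"RELNM", "DRVNM", "CVNAM", "CVABR", "NTEST", "LNAME", "ADVNM",
            "ACVNM", "AABBR", "OLDMUT1", "OLDMUT2", "ELITE", "UNRES"},
    "MAN": {"ACCNO", "RELNM", "CVNAM", "CVABR", "COLNO", "FACCN", "ITEST",
            "NTEST", "LNAME", "TACC", "ADVNM", "ACVNM", "ELITE", "GACC",
            "DACCN", "LCNAM", "CIATGB"},
}
ERROR_CODES = {"GEN": "DataError-0024", "DER": "DataError-0025",
               "MAN": "DataError-0026"}


def check_combination(method_type: str, name_type: str) -> Optional[str]:
    """Return the error code for an invalid combination, or None if the
    combination is valid or the method type is not one of the three above."""
    valid = VALID_NAME_TYPES.get(method_type)
    if valid is None or name_type in valid:
        return None
    return ERROR_CODES[method_type]


print(check_combination("GEN", "RELNM"))   # RELNM is not listed for GEN
print(check_combination("DER", "UNRES"))   # valid combination
```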



Name type occurrence [3 checks]

CropForge Feature Request # 160 and/or Version 5.3 Discussion Article on ICISWiki


Some name types must not occur more than once for a single GID. These are ACCNO, CRSNM, UNCRS, COLNO, ITEST, GACC, CIATGB, RELNM, DRVNM, CVNAM

DataError-0027: Germplasm with certain name types occurring more than once 


Cross-names (CRSNM) and Line-names (LNAME) cannot occur together

DataError-0028: Germplasm with both CRSNM and LNAME name types 


Release names (RELNM) and Collector's Numbers (COLNO) cannot occur together

DataError-0029: Germplasm with both RELNM and COLNO name types
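
The three occurrence rules can be checked in one pass over the name records, as in the Python sketch below. The `(gid, ntype)` tuple layout and the `check_name_types` function are hypothetical and only illustrate the logic of the rules.

```python
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple

# Name types that must not occur more than once per GID (from the list above).
SINGLE = {"ACCNO", "CRSNM", "UNCRS", "COLNO", "ITEST",
          "GACC", "CIATGB", "RELNM", "DRVNM", "CVNAM"}
# Pairs of name types that cannot occur together, with their error codes.
EXCLUSIVE = [("CRSNM", "LNAME", "DataError-0028"),
             ("RELNM", "COLNO", "DataError-0029")]


def check_name_types(names: Iterable[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Return (gid, error code) pairs for every rule a GID violates."""
    per_gid = defaultdict(Counter)
    for gid, ntype in names:
        per_gid[gid][ntype] += 1

    errors = []
    for gid in sorted(per_gid):
        counts = per_gid[gid]
        if any(counts[t] > 1 for t in SINGLE):
            errors.append((gid, "DataError-0027"))
        for a, b, code in EXCLUSIVE:
            if counts[a] and counts[b]:
                errors.append((gid, code))
    return errors


names = [
    (1, "ACCNO"), (1, "ACCNO"),   # ACCNO occurs twice
    (2, "CRSNM"), (2, "LNAME"),   # cross-name together with line-name
    (3, "RELNM"), (3, "COLNO"),   # release name together with collector's number
]
for gid, code in check_name_types(names):
    print(gid, code)
```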



Name type RELNM (Release Name)

CropForge Feature Request # 160 and/or Version 5.3 Discussion Article on ICISWiki

A Release Name (RELNM) cannot occur more than once for a GID-GLOCN-COUNTRY combination. Two GIDs can share the same RELNM only if their GLOCNs are in different countries

DataError-0030: Germplasm sharing a RELNM (release name) with another germplasm in the SAME country
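
One way to express this rule is to group release names by country and flag any group containing more than one GID. The Python sketch below assumes the country of each germplasm has already been resolved from its GLOCN; the `(gid, relnm, country)` tuple layout, the `shared_release_names` function and the sample values are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def shared_release_names(
    rows: Iterable[Tuple[int, str, str]]
) -> List[Tuple[str, str, List[int]]]:
    """Return (relnm, country, gids) for each release name that is used by
    more than one GID within the same country."""
    groups = defaultdict(set)
    for gid, relnm, country in rows:
        groups[(relnm, country)].add(gid)
    return sorted(
        (relnm, country, sorted(gids))
        for (relnm, country), gids in groups.items()
        if len(gids) > 1
    )


rows = [
    (1, "REL-A", "Country X"),
    (2, "REL-A", "Country X"),   # same name, same country: flagged
    (3, "REL-A", "Country Y"),   # same name, different country: allowed
    (4, "REL-B", "Country X"),
]
for relnm, country, gids in shared_release_names(rows):
    print(f"DataError-0030: {relnm} shared in {country} by GIDs {gids}")
```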


Miscellaneous Features

  • Option to output query results to MS Excel files. Usually one file for each error code.


  • Removed "Local Database Queries" tabsheet


  • Removed "IRRI-GRC Queries" tabsheet


  • "Central Database" tabsheets (from Version 5.3) renamed to "ICIS - GMS" (checks are applicable to all ICIS implementations).


  • "About" form containing more information, plus the GNU General Public License. CRIL and IRRI logos also included.

Image:AboutIcisValidate.JPG


  • Application icon

Image:Icon_validate5.4.JPG

What's New in Version 5.4.2

  • Option to specify/change INI file to use
  • Changed error code messaging from "Error-xxxx" to "DataError-xxxx" (prefixed with "Data" to distinguish from error codes of Installation Diagnostic Tool)
  • Neater printing of messages in memobox
  • In flagging of internal errors (bugs), the dataerror code is displayed instead of the SQL string
  • Modified SQL for check DataError-0007 (deleted parent references): exclude replacements; strictly for deleted parents only.