Data Validation Tool 5.4.2
Introduction
The ICIS Data Validation Tool is an application that searches an ICIS database for data errors that could make the data unreliable or meaningless. Running it before publication helps ensure that published data are of consistently high quality.
To use it, choose the tests you want to execute and click the "Run" button.
ICIS - GMS (i) Queries
Checks the Genealogy Management System (GMS) central database (applicable to all ICIS implementations).
Invalid parent references [2 checks]
Error messages:
DataError-0001: Unknown group source
DataError-0002: Germplasm with non-generative group source
Circular references [3 checks]
If germplasm A has germplasm B as one of its parents and germplasm B has germplasm A as one of its parents, then we have a circular reference. This option also checks for two- and three-level circularity.
Error messages:
A references B and B references A:
DataError-0003: 1st level circular reference
A references B and B references C and C references A:
DataError-0004: 2nd level circular reference
A references B and B references C and C references D and D references A:
DataError-0005: 3rd level circular reference
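The three levels of circularity described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual query: the pedigree is held in a hypothetical in-memory dictionary mapping each GID to its parent GIDs (0 meaning unknown), and the function name `find_circular_references` is made up for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory pedigree: GID -> tuple of parent GIDs (0 = unknown).
Parents = Dict[int, Tuple[int, ...]]


def find_circular_references(parents: Parents, max_level: int = 3) -> List[Tuple[int, int]]:
    """Return (gid, level) pairs for every GID that is its own ancestor.

    Level 1 means A -> B -> A, level 2 means A -> B -> C -> A, and
    level 3 means A -> B -> C -> D -> A, mirroring DataError-0003
    to DataError-0005.
    """
    hits = []
    for gid in parents:
        # Start with the direct parents, ignoring unknown (0) entries.
        frontier = {p for p in parents.get(gid, ()) if p}
        for level in range(1, max_level + 1):
            # Move one generation further up the pedigree.
            frontier = {gp for p in frontier for gp in parents.get(p, ()) if gp}
            if gid in frontier:
                hits.append((gid, level))
                break  # report only the shortest cycle for this GID
    return hits


if __name__ == "__main__":
    demo = {
        1: (2, 0), 2: (1, 0),             # A <-> B
        3: (4, 0), 4: (5, 0), 5: (3, 0),  # A -> B -> C -> A
        10: (0, 0),                       # no known parents
    }
    for gid, level in find_circular_references(demo):
        print(f"GID {gid}: level {level} circular reference")
```

Every member of a cycle is reported, since each of them references itself through the loop.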
Invalid Method
Error message:
DataError-0006: Invalid germplasm methods
Deleted parent references
Suppose germplasm A has been replaced by germplasm B, as recorded in the GERMPLSM.GRPLCE database column. All germplasm that reference germplasm A should be corrected to reference germplasm B.
Error message:
DataError-0007: Germplasm with deleted parent references
Foreign Key References [4 checks]
The ICIS database has no foreign key constraints enabled, so the tool must check explicitly whether any foreign key definition is violated.
Error messages:
DataError-0008: Invalid LOCATION.LOCID references
DataError-0009: Invalid GERMPLASM.GID references
DataError-0010: Invalid METHODS.MID references
DataError-0011: Invalid BIBREFS.REFID references
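Because the database does not enforce these constraints, a check of this kind usually comes down to an anti-join: list the rows whose reference has no match in the target table. The sketch below shows the idea for the LOCATION.LOCID case using Python's built-in sqlite3 module. The two-column tables are a deliberately simplified assumption, treating GLOCN=0 as "unknown" rather than as an error is also an assumption, and the SQL is not the tool's own.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE LOCATION (LOCID INTEGER PRIMARY KEY, LNAME TEXT);
        CREATE TABLE GERMPLSM (GID INTEGER PRIMARY KEY, GLOCN INTEGER);
        INSERT INTO LOCATION VALUES (1, 'Site A'), (2, 'Site B');
        INSERT INTO GERMPLSM VALUES (1, 1), (2, 2), (3, 99), (4, 0);
        """
    )
    return conn


def find_invalid_location_refs(conn: sqlite3.Connection):
    """Return (GID, GLOCN) rows whose GLOCN has no matching LOCATION.LOCID."""
    sql = """
        SELECT g.GID, g.GLOCN
        FROM GERMPLSM AS g
        LEFT JOIN LOCATION AS l ON l.LOCID = g.GLOCN
        WHERE g.GLOCN <> 0        -- assumption: 0 means "unknown", not an error
          AND l.LOCID IS NULL     -- no matching location row was found
        ORDER BY g.GID
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for gid, glocn in find_invalid_location_refs(build_demo_db()):
        print(f"DataError-0008: GID {gid} references missing LOCID {glocn}")
```

The same LEFT JOIN pattern would apply to the GID, MID and REFID references by swapping the table and column names.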
ICIS - GMS (ii) Queries
Checks the Genealogy Management System (GMS) central database (applicable to all ICIS implementations).
Progenitor germplasm dates [3 checks]
CropForge Feature Request # 160 and/or Version 5.3 Discussion Article on ICISWiki
The GDATE of a GID must not predate the GDATE of any of its progenitors. Beware of missing links in the chain of dates: if the test checks only the GDATE of the immediate progenitors (GPID1, GPID2 and MGID), it will miss the case where a non-zero GPID1, GPID2 or MGID has GDATE=0 but its own GPID1, GPID2 or MGID is younger than the target GID. Therefore, whenever a GPID1, GPID2 or MGID has GDATE=0, the check should iterate upward through that record's GPID1, GPID2 and MGID.
Error messages:
DataError-0012: Germplasm with germplasm date (GDATE) earlier than GDATE of GPID1
DataError-0013: Germplasm with germplasm date (GDATE) earlier than GDATE of GPID2
DataError-0014: Germplasm with germplasm date (GDATE) earlier than GDATE of MGID
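The iteration through undated progenitors can be illustrated with a short recursive lookup. The following is a minimal sketch, assuming records are held in a hypothetical in-memory dictionary and that GDATE is an integer that sorts chronologically (for example YYYYMMDD) with 0 meaning unknown. The `Germplasm` type and both function names are invented for this example.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class Germplasm(NamedTuple):
    gid: int
    gdate: int      # assumed chronologically sortable integer, 0 = unknown
    gpid1: int = 0
    gpid2: int = 0
    mgid: int = 0


ERROR_CODES = {
    "gpid1": "DataError-0012",
    "gpid2": "DataError-0013",
    "mgid": "DataError-0014",
}


def latest_known_date(db: Dict[int, Germplasm], gid: int,
                      seen: Optional[Set[int]] = None) -> int:
    """Latest GDATE found at or above gid, looking past undated records."""
    seen = set() if seen is None else seen
    if gid == 0 or gid not in db or gid in seen:
        return 0  # unknown link, missing record, or already visited
    seen.add(gid)
    rec = db[gid]
    if rec.gdate:
        return rec.gdate
    # GDATE=0: keep climbing through this record's own progenitors.
    return max(latest_known_date(db, p, seen)
               for p in (rec.gpid1, rec.gpid2, rec.mgid))


def check_progenitor_dates(db: Dict[int, Germplasm]) -> List[Tuple[int, str]]:
    """Return (gid, error code) for GIDs dated earlier than a progenitor."""
    errors = []
    for rec in db.values():
        if not rec.gdate:
            continue  # an undated target cannot be compared
        for field, code in ERROR_CODES.items():
            ancestor_date = latest_known_date(db, getattr(rec, field))
            if ancestor_date and rec.gdate < ancestor_date:
                errors.append((rec.gid, code))
    return errors
```

In this sketch a GID dated 20010101 whose GPID1 is undated, but whose grandparent is dated 20050101, is still flagged with DataError-0012. A check limited to immediate progenitors would miss it.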
Progenitor ID1 unknown, Progenitor ID2 known
CropForge Feature Request # 160 and/or Version 5.3 Discussion Article on ICISWiki
A GID cannot have GPID2>0 and GPID1=0.
Error message:
DataError-0015: Germplasm with Progenitor ID1 unknown ( GPID=0 ), Progenitor ID2 known ( GPID2 > 0 )
Name inheritance from GPID2: check NDATE and NLOCN [2 checks]
CropForge Feature Request # 160 and/or Version 5.3 Discussion Article on ICISWiki
If a GID inherits a name from its source (GPID2) then that name record must also inherit the name date (NDATE) and name location (NLOCN)
Error messages:
DataError-0016: Germplasm with inherited name from GPID2 but NDATE not inherited
DataError-0017: Germplasm with inherited name from GPID2 but NLOCN not inherited
IRIS - GMS Queries
Checks the Genealogy Management System (GMS) central database (applicable to International Rice Information System (IRIS)).
Preferred Names [2 checks]
CropForge Feature Request # 160 and/or Version 5.3 Discussion Article on ICISWiki
No more than one name of a GID may have NSTAT=1 (preferred English name)
DataError-0018: Germplasm with more than one preferred English name (NSTAT=1)
Names eligible to be the preferred name are: CRSNM, RELNM, DRVNM, CVNAM, ELITE
DataError-0019: Germplasm with invalid name type as preferred name
Preferred IDs [3 checks]
CropForge Feature Request # 160 and/or Version 5.3 Discussion Article on ICISWiki
No more than one name of a GID can have a "preferred ID" status (NSTAT=8)
DataError-0020: Germplasm with more than one preferred ID (NSTAT=8)
The following name types are eligible to be the preferred ID: DRVNM, COLNO, ACCNO (if present), GACC (if present), ITEST (if present), CIATGB (if present)
DataError-0021: Germplasm with invalid name type as preferred ID
The preferred ID must be unique for the given name type – that is, two accessions must not share the same name of the same type if one or both is a preferred ID.
DataError-0022: Germplasm with preferred ID not unique for the given name type
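The two "no more than one" rules above (DataError-0018 for NSTAT=1 and DataError-0020 for NSTAT=8) share the same shape: group the name records by GID and keep the groups with more than one row. The sketch below uses Python's built-in sqlite3 module against a simplified NAMES table. The column set, the text-valued NTYPE column and the function names are assumptions made for illustration, and the SQL is not taken from the tool.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory NAMES table with a simplified, assumed layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE NAMES (
            NID INTEGER PRIMARY KEY, GID INTEGER,
            NTYPE TEXT, NSTAT INTEGER, NVAL TEXT
        );
        INSERT INTO NAMES VALUES
            (1, 1, 'CRSNM', 1, 'X'),
            (2, 1, 'RELNM', 1, 'Y'),
            (3, 2, 'DRVNM', 8, 'Z'),
            (4, 2, 'ACCNO', 8, 'W'),
            (5, 3, 'CVNAM', 1, 'V');
        """
    )
    return conn


def find_multiple_status(conn: sqlite3.Connection, nstat: int):
    """Return (GID, count) for GIDs with more than one name at this NSTAT."""
    sql = """
        SELECT GID, COUNT(*) AS n
        FROM NAMES
        WHERE NSTAT = ?
        GROUP BY GID
        HAVING COUNT(*) > 1
        ORDER BY GID
    """
    return conn.execute(sql, (nstat,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for gid, n in find_multiple_status(conn, 1):
        print(f"DataError-0018: GID {gid} has {n} preferred English names")
    for gid, n in find_multiple_status(conn, 8):
        print(f"DataError-0020: GID {gid} has {n} preferred IDs")
```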
Location (GLOCN) of germplasm
CropForge Feature Request # 160 and/or Version 5.3 Discussion Article on ICISWiki
For all germplasm whose creation method is not Method ID 62 (Import), the germplasm location (GLOCN) of a GID should be the same as the GLOCN of its GPID2
DataError-0023: Germplasm with GLOCN different from GLOCN of GPID2
Method - name type combinations [3 checks]
CropForge Feature Request # 160 and/or Version 5.3 Discussion Article on ICISWiki
Only certain combinations of name type and germplasm creation method are acceptable. Namely:
Method type GEN: valid name types are: CRSNM, UNCRS, UNRES
DataError-0024: Invalid method-name type combination for method GEN
Method type DER: valid name types are: RELNM, DRVNM, CVNAM, CVABR, NTEST, LNAME, ADVNM, ACVNM, AABBR, OLDMUT1, OLDMUT2, ELITE, UNRES
DataError-0025: Invalid method-name type combination for method DER
Method type MAN: valid name types are: ACCNO, RELNM, CVNAM, CVABR, COLNO, FACCN, ITEST, NTEST, LNAME, TACC, ADVNM, ACVNM, ELITE, GACC, DACCN, LCNAM, CIATGB
DataError-0026: Invalid method-name type combination for method MAN
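The three lists above amount to a lookup table from method type to permitted name types. A minimal Python sketch of that lookup follows. The name type codes and error codes are copied from the lists above, while the function name and the choice to ignore unlisted method types are assumptions for this example.

```python
from typing import Optional

# Valid name types per method type, as listed in this section.
VALID_NAME_TYPES = {
    "GEN": {"CRSNM", "UNCRS", "UNRES"},
    "DER": {"RELNM", "DRVNM", "CVNAM", "CVABR", "NTEST", "LNAME", "ADVNM",
            "ACVNM", "AABBR", "OLDMUT1", "OLDMUT2", "ELITE", "UNRES"},
    "MAN": {"ACCNO", "RELNM", "CVNAM", "CVABR", "COLNO", "FACCN", "ITEST",
            "NTEST", "LNAME", "TACC", "ADVNM", "ACVNM", "ELITE", "GACC",
            "DACCN", "LCNAM", "CIATGB"},
}

ERROR_CODES = {
    "GEN": "DataError-0024",
    "DER": "DataError-0025",
    "MAN": "DataError-0026",
}


def check_method_name_type(method_type: str, name_type: str) -> Optional[str]:
    """Return the error code for an invalid combination, or None if valid."""
    allowed = VALID_NAME_TYPES.get(method_type)
    if allowed is None:
        return None  # assumption: method types not listed here are not checked
    if name_type in allowed:
        return None
    return ERROR_CODES[method_type]


if __name__ == "__main__":
    print(check_method_name_type("GEN", "RELNM"))  # DataError-0024
    print(check_method_name_type("DER", "DRVNM"))  # None
```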
Name type occurrence [3 checks]
CropForge Feature Request # 160 and/or Version 5.3 Discussion Article on ICISWiki
Some name types must not occur more than once for a single GID. These are ACCNO, CRSNM, UNCRS, COLNO, ITEST, GACC, CIATGB, RELNM, DRVNM, CVNAM
DataError-0027: Germplasm with certain name types occurring more than once
Cross-names (CRSNM) and Line-names (LNAME) cannot occur together
DataError-0028: Germplasm with both CRSNM and LNAME name types
Release names (RELNM) and Collector's Numbers (COLNO) cannot occur together
DataError-0029: Germplasm with both RELNM and COLNO name types
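The three occurrence rules above can be expressed as a count per name type for one GID. The sketch below assumes the name types of a single GID are already available as a list of codes. The function name and the input shape are hypothetical.

```python
from collections import Counter
from typing import List

# Name types that must not occur more than once for a single GID.
SINGLE_OCCURRENCE = {"ACCNO", "CRSNM", "UNCRS", "COLNO", "ITEST",
                     "GACC", "CIATGB", "RELNM", "DRVNM", "CVNAM"}

# Pairs of name types that cannot occur together on one GID.
EXCLUSIVE_PAIRS = {
    ("CRSNM", "LNAME"): "DataError-0028",
    ("RELNM", "COLNO"): "DataError-0029",
}


def check_name_types(name_types: List[str]) -> List[str]:
    """Return the error codes raised by the name types of one GID."""
    counts = Counter(name_types)
    errors = []
    if any(counts[t] > 1 for t in SINGLE_OCCURRENCE):
        errors.append("DataError-0027")
    for (first, second), code in EXCLUSIVE_PAIRS.items():
        if counts[first] and counts[second]:
            errors.append(code)
    return errors


if __name__ == "__main__":
    print(check_name_types(["CRSNM", "CRSNM", "LNAME"]))
```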
Name type RELNM (Release Name)
CropForge Feature Request # 160 and/or Version 5.3 Discussion Article on ICISWiki
A Release Name (RELNM) cannot occur more than once for a GID-GLOCN-COUNTRY combination. Two GIDs can share the same RELNM only if their GLOCNs are in different countries
DataError-0030: Germplasm sharing a RELNM (release name) with another germplasm in the SAME country
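One way to picture this rule is to group release names by country and look for groups containing more than one GID. The sketch below assumes the country of each GLOCN has already been resolved and that names are compared exactly as stored. The record layout and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_shared_release_names(
    records: Iterable[Tuple[int, str, str]]
) -> Dict[Tuple[str, str], List[int]]:
    """Map (RELNM, country) to the GIDs sharing it, for shared names only.

    Each record is (gid, relnm, country), where country is assumed to be
    the already-resolved country of the germplasm's GLOCN.
    """
    by_key = defaultdict(set)
    for gid, relnm, country in records:
        by_key[(relnm, country)].add(gid)
    return {key: sorted(gids) for key, gids in by_key.items() if len(gids) > 1}


if __name__ == "__main__":
    demo = [
        (1, "Release A", "PH"),
        (2, "Release A", "PH"),  # same name, same country: DataError-0030
        (3, "Release A", "IN"),  # same name, different country: allowed
    ]
    for (relnm, country), gids in find_shared_release_names(demo).items():
        print(f"DataError-0030: {relnm} in {country} shared by GIDs {gids}")
```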
Miscellaneous Features
- Option to output query results to MS Excel files. Usually one file for each error code.
- Removed "Local Database Queries" tabsheet
- Removed "IRRI-GRC Queries" tabsheet
- "Central Database" tabsheets (from Version 5.3) renamed to "ICIS - GMS" (checks are applicable to all ICIS implementations).
- "About" form containing more information, plus the GNU General Public License. CRIL and IRRI logos also included.
- Application icon
What's New in Version 5.4.2
- Option to specify/change INI file to use
- Changed error code messaging from "Error-xxxx" to "DataError-xxxx" (prefixed with "Data" to distinguish from error codes of Installation Diagnostic Tool)
- Neater printing of messages in memobox
- When flagging internal errors (bugs), the DataError code is displayed instead of the SQL string
- Modified SQL for check DataError-0007 (deleted parent references): exclude replacements; strictly for deleted parents only.