ACQ131

From ICISWiki



Checking Duplicates - Soundex algorithm


Overview

Soundex is a phonetic algorithm for indexing names by their sound as pronounced in English. The basic aim is for names with the same pronunciation to be encoded to the same string. [1]
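
To make the encoding concrete, here is a minimal sketch of the standard American Soundex rules in Python. It is an illustration only: this page refers to a modified Soundex algorithm whose details are not given here, so the function below should not be read as the GRIMS implementation.

```python
# Digit assigned to each consonant group in standard American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character standard Soundex code of name ("" if no letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = [letters[0]]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")      # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                      # H and W do not break a run of equal codes
        code = CODES.get(ch, "")          # vowels and Y have no code
        if code and code != prev:
            result.append(code)
            if len(result) == 4:
                break
        prev = code                       # a vowel resets the run
    return "".join(result).ljust(4, "0")  # pad short codes with zeros


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("IR 64"), soundex("IR 36"))    # I600 I600
```

Note that digits and spaces are ignored, so designations such as "IR 64" and "IR 36" receive the same code; plain Soundex cannot tell them apart.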

The Soundex algorithm is used to check whether an incoming sample possibly duplicates one already in the collection. It aids the Genebank curators in telling whether a newly received sample already exists in the Genebank. If it does, further seed processing and initial seed increase are not performed for that sample. This UI processes the data by set of incoming batch(es), so the user must supply the necessary information before clicking the process button. It checks for duplicates within the batch(es) supplied and against registered accessions.
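
The two comparisons described above (within the supplied batch and against registered accessions) can be sketched as a grouping step. The function and record layout below are hypothetical and only illustrate the idea; the key function is a parameter, and in the process described here it would be the Soundex code of the name. A simple normalising key stands in for it so that the example stays self-contained.

```python
from collections import defaultdict


def find_possible_duplicates(batch, registered, key):
    """Group (identifier, name) records by key(name).

    Returns only the groups that contain more than one record and at least
    one record from the incoming batch, so matches among already registered
    accessions alone are not reported.
    """
    groups = defaultdict(list)
    for source, records in (("batch", batch), ("registered", registered)):
        for ident, name in records:
            groups[key(name)].append((source, ident, name))
    return {
        k: members
        for k, members in groups.items()
        if len(members) > 1 and any(src == "batch" for src, _, _ in members)
    }


def simple_key(name):
    """Stand-in key (an assumption): case-insensitive, alphanumerics only."""
    return "".join(ch for ch in name.casefold() if ch.isalnum())


# Made-up identifiers and names, for illustration only.
batch = [("TMP-1", "Azucena"), ("TMP-2", "azucena "), ("TMP-3", "Dular")]
registered = [("ACC-1", "AZUCENA"), ("ACC-2", "Basmati"), ("ACC-3", "basmati")]

dups = find_possible_duplicates(batch, registered, simple_key)
for code, members in dups.items():
    print(code, members)
```

Here only the "azucena" group is reported: it contains two batch samples and one registered accession. The two "basmati" records are both registered, so they are outside the scope of the check.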


User interface form

Image:GRIMS frmACQ131.PNG



User input fields

Label                      Description
Current Duplicate(s)       Number of samples that have the same Soundex name in the GRC collection
Batches                    The range of batches included in the checking process
Check Duplicates           Criteria that filter which name types are selected
Update Soundex Codes       Updates the Soundex codes of the name type(s) supplied by the user within the specified batch
Check Possible Duplicates  Compares the Soundex names of every germplasm in the selected batch(es) against the entire collection and within the batch(es) themselves


Use Case Definition

Use Case Name 1.3.1 Check possible duplication through Soundex Algorithm
Use Case Definition Checks whether newly acquired germplasm already exists in the IRGC collection, using a modified Soundex algorithm
User Contacts
Actors Genebank Manager (GM) / Genebank Technician (GBT)
Location Genebank
Priority 1
Typical Course of Events
Actor Action System Response
Step 1: User supplies the batch ID of the seed(s) and/or the name types that need to be checked Step 2: System performs the necessary comparison
Step 3: GBT requests a proof list Step 4: System outputs the report
Assumption/s Germplasm list already exists. Temporary ID has been assigned.
Pre-condition/s The batch's seed information has already been created.
Post-condition/s List of possible duplicates
Primary Pathway/s
Alternative Pathway/s Check possible duplicates through soundex algorithm

Check possible duplicates through edit-distance algorithm

Exception Pathway/s
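
The edit-distance pathway named in the use case above is not described further on this page. As an illustration only, the sketch below computes the classic Levenshtein distance (the minimum number of single-character insertions, deletions and substitutions); whether GRIMS uses this exact variant is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, keeping one row in memory."""
    if len(a) < len(b):
        a, b = b, a                      # keep the shorter string as the row
    previous = list(range(len(b) + 1))   # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        current = [i]                    # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


print(edit_distance("Azucena", "Asusena"))  # 2
print(edit_distance("IR 64", "IR 36"))      # 2
```

Unlike a phonetic code, the distance takes digits into account, so it can separate designations that differ only in their numbers.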

Process/ Data Flow

This user interface references the tables TBL_SEED_INFO and NAMES.


TBL_SEED_INFO. TBL_SEED_INFO is like an expanded LISTDATA of ICIS. It contains the GID, LISTID, and ENTRYID. Additionally, it stores the TEMP_ID and the ACCNO. This table should be complete: all records should be present regardless of the seed registration status (INCOMING or REGISTERED IRGC ACCESSION). The field ISREGISTEREDACCNO flags whether a particular sample already has an assigned accession number. The table stores the IRGC accession number if ISREGISTEREDACCNO is set.


LOCAL_NAMES. The NAMES table stores all the names given to the germplasm with a specific GID. Although a new GID is given to a newly assigned accession number, no additional record is added to the NAMES table.
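
As a rough illustration of how the two tables relate, the sketch below builds them in an in-memory SQLite database and joins them on GID. The TBL_SEED_INFO column names come from the description above; the column types, the NAMES columns NTYPE and NVAL, the 0/1 encoding of ISREGISTEREDACCNO, and all sample values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TBL_SEED_INFO (
    GID INTEGER, LISTID INTEGER, ENTRYID INTEGER,
    TEMP_ID TEXT, ACCNO TEXT,
    ISREGISTEREDACCNO INTEGER          -- assumed: 1 = registered, 0 = incoming
);
CREATE TABLE NAMES (GID INTEGER, NTYPE INTEGER, NVAL TEXT);  -- assumed columns
""")
conn.executemany(
    "INSERT INTO TBL_SEED_INFO VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 10, 1, "TMP-1", None, 0),     # incoming sample, temporary ID only
     (2, 10, 2, "TMP-2", None, 0),
     (3, 5, 1, None, "ACC-1", 1)],     # registered accession (made-up number)
)
conn.executemany(
    "INSERT INTO NAMES VALUES (?, ?, ?)",
    [(1, 1, "Azucena"), (2, 1, "Dular"), (3, 1, "Azucena"), (3, 2, "Asusena")],
)


def names_by_status(conn, registered):
    """Return (GID, name) pairs for incoming or registered records."""
    rows = conn.execute(
        """SELECT s.GID, n.NVAL
           FROM TBL_SEED_INFO AS s
           JOIN NAMES AS n ON n.GID = s.GID
           WHERE s.ISREGISTEREDACCNO = ?
           ORDER BY s.GID, n.NVAL""",
        (1 if registered else 0,),
    )
    return rows.fetchall()


print(names_by_status(conn, registered=False))  # names of incoming samples
print(names_by_status(conn, registered=True))   # names of registered accessions
```

The two result sets correspond to the two sides of the duplicate check: names from incoming samples on one side, names from registered accessions on the other.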


Figure 2. frmACQ131 Data Flow

Image:ACQ131.png

Percentage of work completed

60% Interface design and initial testing have been completed.

40% Other name types such as Cross, Mutant, etc. should be handled by this UI. Additionally, the proper data entry code needs to be established, since strings are compared by Name Type.
