ACQ131
Checking Duplicates - Soundex algorithm
Overview
Soundex is a phonetic algorithm for indexing names by their sound as pronounced in English. The basic aim is for names with the same pronunciation to be encoded to the same string. [1]
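As an illustration of the encoding idea, the following is a minimal sketch of the standard American Soundex, assuming plain ASCII names. GRIMS uses a modified Soundex whose changes are not described on this page, so this is not the GRIMS implementation; the function name `soundex` and the handling of digits and punctuation (they are simply ignored) are assumptions made for the example.

```python
# Letter groups of standard American Soundex; vowels, H, W and Y carry no digit.
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character standard Soundex code, or "" if name has no letters."""
    # Assumption: anything that is not an ASCII letter (digits, spaces) is ignored.
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    encoded = [first]
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W do not separate two letters that share a code.
            continue
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            encoded.append(digit)
        # Vowels reset prev to "", so a repeated code after a vowel is kept.
        prev = digit
    # Truncate or pad with zeros to one letter plus three digits.
    return "".join(encoded)[:4].ljust(4, "0")


print(soundex("Robert"), soundex("Rupert"))  # both names map to the same code
```

Names that sound alike in English, such as "Robert" and "Rupert", receive the same code, which is what makes the code usable as a key for finding possible duplicates.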
The Soundex algorithm is used to check whether an incoming sample possibly duplicates one already in the collection. It helps the Genebank curators tell whether a newly received sample already exists in the Genebank. If it does, further seed processing and initial seed increase are not performed for that sample. This UI processes the data by set of incoming batches, so the user must supply the necessary information before clicking the process button. It checks for duplicates within the supplied batches and against registered accessions.
User interface form
User input fields
label | description |
---|---|
Current Duplicate(s) | Number of samples that have the same Soundex name in the GRC collection |
Batches | The range of batches included in the checking process |
Check Duplicates | Criteria that filter the name types to be selected |
Update Soundex Codes | Updates the Soundex codes of the name types supplied by the user within the specified batch |
Check Possible Duplicates | Compares the Soundex names of every germplasm in the selected batches against the entire collection and within the selected batches themselves |
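The comparison behind "Check Possible Duplicates" can be sketched as a grouping by Soundex code. This is a minimal sketch, not the GRIMS code: it assumes the codes have already been stored (as "Update Soundex Codes" does), and the `(identifier, code)` pair layout, the function name `possible_duplicates`, and the sample identifiers are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (sample identifier, stored Soundex code).
Sample = Tuple[str, str]


def possible_duplicates(
    batch: Iterable[Sample], collection: Iterable[Sample]
) -> Dict[str, List[str]]:
    """Map each batch sample to the other samples that share its Soundex code.

    Matches are looked for both in the registered collection and within
    the batch itself. Samples without a code are skipped.
    """
    batch_list = list(batch)
    by_code = defaultdict(list)
    for ident, code in collection:
        if code:
            by_code[code].append(ident)
    for ident, code in batch_list:
        if code:
            by_code[code].append(ident)

    result: Dict[str, List[str]] = {}
    for ident, code in batch_list:
        if not code:
            continue
        matches = [other for other in by_code[code] if other != ident]
        if matches:
            result[ident] = matches
    return result


# Hypothetical data: three incoming samples and two registered accessions.
incoming = [("TEMP-1", "R163"), ("TEMP-2", "R163"), ("TEMP-3", "A261")]
registered = [("ACC-10", "A261"), ("ACC-11", "H555")]
report = possible_duplicates(incoming, registered)
```

Here `TEMP-1` and `TEMP-2` are flagged against each other (within the batch), and `TEMP-3` is flagged against `ACC-10` (against the collection). Grouping by code avoids comparing every pair of names directly.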
Use Case Definition
Use Case Name | 1.3.1 Check possible duplication through Soundex Algorithm |
---|---|
Use Case Definition | Checks whether newly acquired germplasm already exists in the IRGC collection through the use of a modified Soundex algorithm |
User Contacts | |
Actors | Genebank Manager (GM) / Genebank Technician (GBT) |
Location | Genebank |
Priority | 1 |
Typical Course of Events | |
Assumption/s | Germplasm list already exists. Temporary ID has been assigned. |
Pre-condition/s | The batch's seed information has already been created. |
Post-condition/s | List of possible duplicates |
Primary Pathway/s | |
Alternative Pathway/s | Check possible duplicates through Soundex algorithm; check possible duplicates through edit-distance algorithm |
Exception Pathway/s | |
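The use case lists an edit-distance check as an alternative pathway but does not say which edit distance is meant. As a hedged illustration, the sketch below assumes the common Levenshtein distance (insertions, deletions, and substitutions each cost 1); the function name `edit_distance` and any threshold used to call two names "possible duplicates" are assumptions, not part of GRIMS as documented here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed with two rolling rows."""
    # Keep the shorter string in b so the rows stay as small as possible.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute, or keep if equal
                )
            )
        previous = current
    return previous[-1]
```

Unlike Soundex, this comparison does not depend on English pronunciation, so it can also catch names that differ by a digit or a single mistyped character. It compares two names at a time, so it costs more than grouping by a precomputed code.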
Process/ Data Flow
This user interface references the tables TBL_SEED_INFO and NAMES.
TBL_SEED_INFO. TBL_SEED_INFO is like an expanded LISTDATA table of ICIS. It contains the GID, LISTID, and ENTRYID, and additionally stores the TEMP_ID and the ACCNO. This table should be complete: all records should be present regardless of the seed registration status (INCOMING or REGISTERED IRGC ACCESSION). The field ISREGISTEREDACCNO flags whether a particular sample already has an assigned accession number. It stores the IRGC ACCESSION NO is ISREGISTEREDACCNO.
LOCAL_NAMES. The NAMES table stores all the names given to the germplasm with a specific GID. Although a new GID is given to a newly assigned accession number, no additional record is added to the NAMES table.
Figure 2. FMRACQ131 Data Flow
Percentage of work completed
60%: Interface design and initial testing have been conducted.
40%: Other name types such as Cross, Mutant, etc. should be handled by this UI. Additionally, the proper data entry code needs to be established, since string comparison is by Name Type.