IWIS2 to IWIS3

From ICISWiki

Introduction

This document is concerning the change from IWIS2 to IWIS3 (e.g. ICIS).

In the Breeders’ course held in CIMMYT January 7-11 it became clear that Hans Braun wants to give 100% priority to the deployment of ICIS (IWIS3) for the wheat program, so that workbooks can be created in ICIS instead of fieldbooks in IWIS2. But it also became clear in the course that important functionality of IWIS2 were not easy to obtain in ICIS. For instance IWIS2 is automatically creating the BCID for each cross that is made with the fieldbooks program, while that is not so in ICIS.

A task force has been assembled with the purpose of overseeing the IWIS2 to IWIS3 deployment. Its first task will be to make a parallel run between IWIS2 and IWIS3 to compare the results. It should determine if the generation of data can be handled in IWIS3 with the same or comparable ease as it is in IWIS2. At present the taskforce consists of María Luisa Gómez, Sergio González, Martín Rodríguez Velázquez, Jesper Norgaard, Juan Carlos Alarcón, Vicente Morales, Francisco López, and Leopoldo Arteaga.

This document will not analyze each functionality of IWIS2 and determine if we have the same functionality as in IWIS3. That might be the purpose of a later analysis. Instead this document only concerns certain functionality in IWIS2 that is used in the pedigree management, fieldbook production and data capture for the Wheat program, that is not handled the same way or is not handled in IWIS3, which means that the parallel testers would need to make extra efforts to obtain the same results as with IWIS2.

It needs to be determined for each of these functionalities what to recommend. The first (and simple) option is that we discontinue that functionality. However, if we have to discontinue a lot of functionalities of IWIS2, there might be discontent with working with the new system. The second more viable option would be to introduce some reprogramming in ICIS that replicates this functionality. However, that will cost programming time and will add to the general size of the ICIS package for all users (not just for CIMMYT Wheat). This document will give some alternative suggestions for each item, but this is not necessarily the final answer. These decisions need to be taken at a later stage.

The standard BCID

When crosses are made with the IWIS2 fieldbook system, BCIDs are created automatically as the following examples: GRSS98SH00550S or CMBW89Y00020S where the full string is created at the moment of creating the fieldbook with the crosses. The string corresponds to individual fields (that are stored in IWIS2, not in IWIS3) Program, Year, Free, Organization and Location. For instance, if you put "CM" as organization, "SW" as Free, "2007" as cross year, "Y" as location, and the cross which is generated is a female retro cross, the BCID created will then be "CMSW07Y000234F" after the insert. The number 234 is consecutive within the prefix of CMSW07Y

The field Cross Type will be generated automatically, both at Insert and at Update of the Cross. If it is a female retro cross it will be "F", if it is a male retro cross it will be "M", and otherwise it will be "S" for simple cross. See Cross Type for coding. Also the types "D" for Double Cross or "T" for Top Cross are used.

In the IWIS2 to IWIS3 conversion, this has been quite simple: The BCID has been stored as a name in the names table with type "CIMMYT BCID" which is one of the new name types defined. This means that we have converted all existing IWIS2 BCIDs to IWIS3.

But to generate new BCIDs for new germplasm, new functionality is needed. The alternative that the Data Capture will key in these BCIDs manually for thousands of germplasm records, I don’t think is realistic. If we choose to discontinue the CIMMYT BCID generation, users will feel we are taking steps backwards.

Selections made with fieldbooks

Selections of germplasm can be made in IWIS2 using existing selections. The same is the case for IWIS3. However, the way to make them is different. To make new selections in IWIS3 you need to have the original selections collected in a single list, and then activate an option to add new selections to a new list, where you provide a prefix, a body and a postfix (all three are optional). If you have the selection CM33027-1M and want to create 7 more selections CM33027-1M-3Y to CM33027-1M-9Y, you need to select the single selection CM33027-1M from the original list, and then activate the function with prefix 3, body or postfix like the location Y, and ask to add+1 for each, performed 7 times. But then to produce the next 11 selections for Toluca, you need to manually select another selection from the original list, perform the function manually again etc.

In IWIS2 this is different, and (unless I have misunderstood something about IWIS3) considerably easier. Here you need to define an Excel file with all the selections you want which specifies from which trial(s) where the original selections came from. For each record in Excel, it will generate the 7 selections we saw above. You need to specify in each record the Source Trial, the Source Occurrence and the Source Entry. Based on these three the system will automatically find CM33027-1M. In the same record you define how many selections you want, and the location code Y we saw above is taken from the Occurrence. When the file is completed, with possibly thousands of selections defined, you just read it into IWIS2 and the selections will be created according to the specification. Compared to IWIS3 this saves you making manual steps essentially for each Excel record as you need to in IWIS3.

In our example note that we created CM33027-1M-3Y as the first selection. This was based on the knowledge that IWIS2 has and is using, and which IWIS3 doesn’t have, that there were already created two selections CM33027-1M-1Y and CM33027-1M-2Y before these new selections, thus continuing an ordered sequence and making sure that no duplicate selections are created. First of all IWIS3 is not assuring this non-duplication of selection strings. Second, if we were to implement this functionality, we would still risk the same thing when using IWIS3 in a non-central way, e.g. if a CIMMYT breeder takes the laptop to the field and create the selections there, duplication might happen. When importing a local database to central, these things can be solved, but still only by the Data Base Administrator in a manual way, not by any ICIS functionality.

Top and Double crosses

In IWIS2, if the Female Source Trial and the Male Source Trial are both of type F1 (at present identified by that the trial name starts with ’F1’) the crosses created will be Double crosses, identified in the CIMMYT standard BCID with a D in the end. If either Female Source Trial or Male Source Trial are of type F1, but not both, the cross will be a Top cross, identified with a T in the CIMMYT standard BCID. This functionality does not exist in IWIS3.

If the cross is a Female retro cross the letter ’F’ is put in the end of the CIMMYT standard BCID. Likewise ’M’ for a male retrocross. If none of these are fulfilled, the cross will be a simple cross, putting an ’S’ in the CIMMYT standard BCID.

If the selections are made from a Source Trial that is ’F1TOP’ then the selection created will be of the form <BCID>-A (for the first one), <BCID>-B for the second one, … <BCID>-AA for the 64st selection etc.

These functionalities are also not in IWIS3.

Backcrosses

Handling backcrosses or retrocrosses as they are sometimes called, is an essential functionality both in IWIS2 and IWIS3. It seems that 90% of functionality in IWIS2 of these works well in IWIS3. There is just the small consideration that in IWIS2 this is managed completely automatic without user intervention or possibility of user manipulation, while in IWIS3 it is possible to trick the system by both specifying for a germplasm that it is a backcross when it is not, and that it is not a backcross even though it is. At the moment point 4 does not seem to merit big concern.

Pedigree representation

In some cases the pedigrees shown in ICIS and the ones in IWIS2 are not equal. We still have to make a formal test of this once the first corrections of different ICIS modules gets through. At the moment using GID 4919801 of IWIS3, these are the results:

Using fill-column function in Setgen (Fill with Cross Expansion at the specified level)

CD98578-F-1Y-040M-040YRC-8M-0Y/SOMBRA 20///STOT//ALTAR 84/ALD

Using Edit Germplasm in Setgen (the correct pedigree)

CHEN/ALTAR 84/3/HUI/POC//BUB/RUFO/4/FNFOOT/5/SOMBRA 20/6/STOT//ALTAR 84/ALD

Using Browse

CD26406-?/ALTAR 84/3/HUI/POC//BUB/RUFO/4/FNFOOT/5/SOMBRA 20/6/STOT//ALTAR 84/ALD

Using GMS Search

CDSS96Y00129S-2M-0Y-0M-0Y-0B-3Y-0B/CD91Y636-1Y-040M-030Y-1M-0Y-0B-1Y-0B

So the conclusion is that we need some more testing in this area after program changes (possibly version 5.5 of ICIS), to see if there are still discrepancies between the IWIS2 and IWIS3 pedigrees. At the moment there are discrepancies between the ICIS programs themselves.

Fieldbook printing

According to Graham this can be done printing either directly from Excel (workbook), or directly from Access (raw data). However, this is inadequate for a parallel test where you would want to compare output results. Perhaps it would be possible to try and implement a prototype of printing via Crystal Report taking the data from the workbook. However, it should be programmed and tested. I have heard some arguments that fieldbook *printing* is not really a necessary functionality, while fieldbook generation of course is essential (the ICIS workbook). These arguments might be valid, but then we should say so and confirm that every operation that is currently taking place with printed fieldbooks in CIMMYT Wheat, can in fact be done using saved Excel sheets, handheld computer use etc.

WINS link via Trial ID and Occurrence

There is no equivalent to WINS in ICIS. WINS currently works as an add-on to IWIS2 where the fundamental elements are the cooperator, the trials and the occurrences that this cooperator is assigned. If we want to detach WINS from IWIS2 and link it to IWIS3 it can be done via the study ID. Given the heavy use of stored procedures (that are not available in Access as a technology), big parts of the program would have to be rewritten to be able to be converted to e.g. Delphi or other programming language which could work on the IWIS3 DMS central database directly. In essence the IWIS3 part that is used in such a possible converted WINS program is not very big, however, converting the functionalities themselves is likely to be time-consuming. An alternative to a full rewrite of the program might be to extract periodically the needed information from IWIS3 (Access) and load it into SQL server, to make WINS run directly on this conglomerate, but this would first of all make necessary to continue SQL server, licenses etc. and second of all make necessary continuous transferal of data from one database platform to the other.

Names linked with abbreviation

In IWIS2 the name and the abbreviation of the name for a given germplasm, are included in the same record. This means that the link is there automatically, so to say by design. In IWIS3 we have converted all names to name type 2 and all abbreviations to name type 7. This works almost to give the same effect where there is one single name and one abbreviation for a germplasm. But if there are several names for one single germplasm (which there are around 7000 cases of) then it becomes confusing to manage the list of names and abbreviations, because there is no definition of which name corresponds to which abbreviations. Perhaps it would be possible

Conversion of national trials

The international trials have been converted and are included in IWIS3. But the national trials have not yet been converted. An extra effort has to be done to convert these, by Juan Carlos Alarcón. It hasn’t been estimated how much time this will take. At the moment therefore it is not so easy to obtain lists of germplasm in IWIS3 of trials that already exist in IWIS2. It is possible to get the same lists using the Cross Info program in IWIS2, export to Excel, and then import to Setgen to create a new list of germplasm. But to do this for each list of interest manually is something I believe we should not ask the task force to do, it should be in place before the parallel testing is started.

0 Handling of local database vs. central

In IWIS3 all adding and updating of data takes place primarily in the local database, and not in the central. In IWIS2 there is one single database that everybody updates to. Essentially there is no difference since in IWIS3 it is assumed that using more or less frequent uploads from the local to the central database it will be possible to nourish one single database the same way. We need to analyze what extra efforts are needed to keep this vision alive. One option would be instead of having one single local database for each user, there could be a local database common to all of the users inside El Batan to not have to upload so many local databases during time. At the moment there is no single program that uploads a local database to a central one, but there probably will be soon because there is work in progress for this task (an ICIS Administrators’ tool).

Parsers

When receiving material from foreign stations and/or breeders, often there is a task of parsing the pedigrees/cross names or selection histories to obtain the desired material. In IWIS2 it is possible with the program Cross Info to parse a pedigree string, work out the possible name of each element in the pedigree string, and return the best possible guess. If there is a single typo or spelling mistake, the full pedigree will normally be retrieved. However, often some or all of the material is still not in IWIS2 when parsing, in these cases the guesses are less valid. It is possible to parse material while loading an Excel file to a list in Setgen, but the parsing seems to be far slower than the IWIS2 equivalent. More testing is needed to determine if this had to do more with the input data quality or it is to be blamed on SETGEN or other parts of ICIS.

System-wide change of designation (selection history)

In IWIS2 the selections are coded in a hierarchical manner with each part of the selection history coded with the necessary information taken from the cross record or the subsequent selection records. Perhaps this is not a very clear explanation so let us take 2 examples.

Example 1:

Suppose we have CMH82A.1062-1B-3Y as a specific selection history for a given selection. This means that the originating cross has BCID "CMH82A.1062", then that was selected in El Batán (B), and then that one selected in Yaqui (Y). In IWIS3 the same information is stored for three GIDs, one with a name for name type BCID with value "CMH82A.1062", one with a designation value (equivalent to selection history) as "CMH82A.1062-1B" and the last with a designation value as "CMH82A.1062-1B-3Y". Now suppose you figure out that the BCID should be changed to "CMH82B.1062". In this case in IWIS2 you would just change the BCID of the cross to that value, and then all selection histories would now automatically reflect the change, for instance the third selection history mentioned would be "CMH82B.1062-1B-3Y" without user intervention. Instead in IWIS3 this has to be done differently, either by changing each name individually in this example "CMH82A.1062" to "CMH82B.1062" and "CMH82A.1062-1B" to "CMH82B.1062-1B" and "CMH82A.1062-1B-3Y" to ""CMH82B.1062-1B-3Y". The other way to do it in this case in IWIS3 would be to replace all strings (possibly only for name type BCID and for name type designation) with "%CMH82A.1062%" to "%CMH82B.1062%". Of course there is a slight risk when doing this kind of operation because a BCID could be part of another name since we are just processing unrelated name types. In IWIS2 you don’t have to worry about these things as mentioned.

Example 2:

Suppose we have "-0AUS-1M-2Y" as a specific selection history for a given germplasm. In this case there is no BCID for the originating piece of germplasm. If we find out that there was an intermediate step with selection in El Batán, so the selection history should really be "-0AUS-1M-1B-2Y" then we would need to do these changes (to possibly hundreds of selection designations) manually. If we can do it in IWIS3 semi-automatic, I would like to know how, but for the moment it seems that the changes would have to be done manually and individually. In IWIS2 currently the change consists of inserting a new SID for the selection "-0AUS-1M-1B" and then correct the previous SID (PSID) of the selection ""-0AUS-1M-2Y" so that it points to this new germplasm. Then all relevant selection histories are automatically corrected.

In these examples I have not mentioned the data integrity check of selection history that is an integral part of IWIS2 selection handling. There is no way in IWIS2 to put a selection history "-0WAMG" if the location WAMG has not been created, and thus a consistent location list is being maintained, and it is made sure that locations exist. In IWIS3 you can make any designation string as you want, which introduces a number of possibilities to make errors.

Handling of randomizations

For the randomizations in CIMMYT, one single big randomization file is created to represent all the different occurrences within the specific trial, possibly containing thousands of definitions, which can then be loaded to create all the necessary randomization records (which are separate from the entry records by design). In IWIS3 there is no separate handling of the design of the randomization with the actual resulting fieldbook entries, e.g. the design is retrieved once to create entries (from which it is not automatic as I understand it to specify which of the lines are checks). Later in time it will be difficult to determine which randomization design was used etc. Also you can only apply one single randomization to one single study, not to a range of studies (again, this could be based on inadequate knowledge about ICIS).

Database platform

We have not tried to make a test in CIMMYT with Access using all users of IWIS2 in a performance test. As a rule of thumb with Access running on a network shared directory, it works best with 10 concurrent users or less. It is not known whether we will get performance problems with all possible 30+ users but it is a threat that we should try to assess before performing the actual IWIS2 to IWIS3 deployment. Also the backup and recovery philosophy should be checked and verified.

Concerning space requirements Access has a serious limitation in space that you can only maintain databases of size 2 Gb. Since the central GMS database of IWIS3 is already 1.6 Gb this is a serious consideration. In fact, some germplasm information like Program (BW, DW) and Free field (SS,SW) that is held in IWIS2, has not been converted to IWIS3 because the resulting database would exceed the limit of 2 Gb.

As I understand it there are successful implementations of ICIS using Postgresql (although not all functionality has been verified), Oracle, MySQL etc. but these options have not been confirmed in a production environment (perhaps with the exception of Oracle, which Nunhems Int. has verified). All of these systems (as well as SQL Server which is the DB platform for IWIS2) has no upper limit in space, and are only limited in the actual available disk space.

Fast retrieval of pedigrees

One of the performance bottlenecks in FIBOS, IWIS in VAX (two previous systems to the actual Windows NT implementation IWIS2) was the continuous recreation of pedigrees and selection histories, based on the hierarchical data behind. Since the selection history in IWIS3 is a simple string, there is no longer a retrieval problem because the string is not calculated. This does give other maintenance problems, see chapter 12. However for pedigrees the concern is there, every time a pedigree is needed it needs to be calculated based on the parents, the grandparents, great-grandparents etc. and their names. If a big list is produced with a lot of pedigrees, IWIS3 will essentially have to recalculate. The solution in IWIS2 is instead to generate and store both pedigree and selection history, based on the hierarchical data behind, when the germplasm is born, and then maintain this when the hierarchical data behind changes. It sounds perhaps complicated, but the benefit is that the retrieval of data is usually as fast as the fixed strings can be retrieved from the tables. This means that printing and handling big fieldbooks becomes a breeze in work areas that would previously be bottled up.

Better and faster computers means that the above is less and less of a problem, but we need to confirm that, especially in areas we have notoriously seen IWIS3 perform badly. Can we handle well fieldbooks with 5000, 10000, 32000 lines as we have done without major problems in IWIS2?

Naming conventions in IWIS3

Graham has asked for a number of naming conventions for transferring names from IWIS2 to IWIS3. These are the conventions imposed:

1. "<any>_<any>" replaced with "<any> <any>" . Example "POHO_1" replaced with "POHO 1"

2. "<digit>_<digit>" replaced with "<digit>-<digit>". Example "8112_2" replaced with "8112-2"

3. "," replaced with ":" Example "FANE,AUS" replaced with "FANE:AUS"

4. ",(" replaced with " (" Example "HOPE,(1D)" replaced with "HOPE (1D)"

This might not be a problem except for the short term, where some generated Excel files of IWIS2 (past or present) will not completely match Cross names in IWIS3. Should we just accept this as is?

Outside updates to IWIS3

IWIS3 GMS database has been modified in a couple of ways that IWIS2 hasn’t. New germplasm that is not currently in IWIS3 has been introduced. The count of germplasm that has no CID and SID (meaning it did not come from IWIS2) is 910,695 at this moment, out of a total of 5,836,870 or more or less 16%. I think current IWIS2 users will pretty quickly come into situations where they don’t know if they should use this material or introduce new with the same name etc. All names introduced in IWIS2 has been through a process of "general consensus" with CIMMYT, while these materials might not. I am not insisting there is a problem here. I just want to mention this as one area where we have to be conscious what we instruct users to do.

There are also a number of changes to IWIS2 germplasm where one parent directly from IWIS2 has been replaced with a GID which is part of the 16% GIDs with no CID. These changes have not been put to the same scrutiny that was applied to IWIS2 names/pedigrees, and not by the same people. Again I am not sure if we have a problem, but I want to mention this issue because we have to take care not to lose control of the updates of DMS and GMS.

DMS data validation

Concerning DMS I am not sure we have included all traits to IWIS3 that are present in IWIS2, and confirmed that all scales and possible data values are equivalent. The data validation of DMS data in IWIS2 is a bit more batch oriented than the IWIS3 version which is based on the point of upload by the breeder to the local database. I don’t know if there is a problem in this area, but I recognize that there are some fundamental differences in the way data is handled. Perhaps the IWIS3 way is quite adequate, but we should determine if that is the case or not.

Reporting and querying

From the taskforce came the recommendation that we should take each program or input screen from IWIS2 and ask ourselves: what is the equivalent in ICIS? If we can answer each one satisfactorily, we will be closer to the deployment of IWIS3. So in contrast to points 1 to 14, point 15 does not have to be addressed *before* the deployment and parallel test, but *during*. This task will possibly lead to another set of functionalities apart from the above points, that is not handled currently by IWIS3, and which needs addressing.