ICIS Workshop 2007

The original presentations are available in PDF format. An Excel template with a (draft) comparison of the three presentations is also available.

Nunhems uses various tools to gather the data (for details, see Paul's presentation). They store marker information in their NunGEMS system, which is based on a LIMS (Nautilus) in Oracle, and they store their marker data (results) in the ICIS DMS database in Oracle. NunGEMS and ICIS are linked via the MarkerName and UniqueID, so queries can be written across both systems. The concept has been proven, but more flexible query tools are needed at the end-user level. Projects can have up to 1500 markers; typical projects have 200-300 markers and more than 1000 individuals to test.
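The cross-system link described above can be illustrated with a small SQLite sketch. It is only an assumption about how such a join might look: the table and column names (`nungems.marker`, `marker_score`, `unique_id`) are hypothetical, UniqueID is assumed here to identify the marker record in NunGEMS and to be stored alongside the marker name with each DMS score, and the real systems run on Oracle rather than SQLite.

```python
import sqlite3

def build_demo():
    """Create an in-memory 'ICIS DMS' with an attached 'NunGEMS' schema (hypothetical layout)."""
    con = sqlite3.connect(":memory:")
    con.execute("ATTACH DATABASE ':memory:' AS nungems")
    con.executescript("""
        CREATE TABLE nungems.marker (
            marker_name TEXT, unique_id INTEGER, marker_type TEXT, position REAL,
            PRIMARY KEY (marker_name, unique_id));
        CREATE TABLE main.marker_score (
            sample TEXT, marker_name TEXT, unique_id INTEGER, score TEXT);
        INSERT INTO nungems.marker VALUES ('M001', 7, 'SNP', 12.5), ('M002', 8, 'AFLP', 40.0);
        INSERT INTO main.marker_score VALUES
            ('P1', 'M001', 7, 'A'), ('P2', 'M001', 7, 'B'), ('P1', 'M002', 8, 'H');
    """)
    return con

def scores_by_marker_type(con, marker_type):
    """Join DMS scores with NunGEMS marker information on MarkerName + UniqueID."""
    sql = """
        SELECT s.sample, s.marker_name, s.score, m.position
        FROM main.marker_score AS s
        JOIN nungems.marker AS m
          ON m.marker_name = s.marker_name AND m.unique_id = s.unique_id
        WHERE m.marker_type = ?
        ORDER BY s.sample, s.marker_name
    """
    return con.execute(sql, (marker_type,)).fetchall()
```

With the demo data, `scores_by_marker_type(build_demo(), "SNP")` returns the two SNP scores together with the marker position held in the other schema.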

IRRI has developed a GEMS database that is accessible via the ICISWorkBook and a new interface in MS Access. New functions have been developed in the ICISWorkBook for GEMS. Some characteristics:

Markers and alleles: multiple names are allowed per ID. New markers/alleles can be added via the ICISWorkBook and maintained by GEMS.
Currently one protocol per marker can be stored in GEMS; in the future, storage of multiple protocols per marker is desired.
Desktop tool: currently in MS Access
Web application: to be developed with AJAX

SPARC does not yet have a LIMS in place. In the near future SPARC will have to store large amounts of marker data, and it has been decided to store these data in DMS. Data are mainly retrieved with the DataComp tool, which will need to be expanded. The main goals are to increase the number of lines that can be compared and to compare phenotypic and genotypic data.

SPARC prefers to develop the GEMS system within the ICIS community.

Some discussion:

Ron thinks we need to be careful because the Bp value is only an estimate and needs to be shown with the trait to be useful.
Richard said that within a dataset it should be clear what is going on, but across datasets it is a problem.
Ron thinks we should make sure it is clear which protocol the Bp was read with.
Where does GEMS end and LIMS start?

5. Developments in ICIS for the management of the genetic marker and genotyping data 10:20 - 2:00 (Chair: Sandra, Reporter: Arllet)

- Nunhems (Casper)

Summary:

Nunhems manages multiple crops, each with multiple markers. These will be managed together in NunGEMS, but the applications to retrieve them are crop-specific. An ICIS View Crop Schema will integrate the databases (ICIS central, ICIS local databases and the NunGEMS schema). Two data retrievers will be built (one for breeding and one for research). The label and barcoding tools are part of the Breeding Retriever.

What is needed is a flexible tool that allows end users to create their own queries. The tool should also export the data in file formats that can be used by visualization or statistical tools. The Retriever is built in MS Access, but are there other tools, such as Business Objects or Rubik?

Requirements:

Input

- Germplasm entries and tree information (including neighbourhoods)

- Lists or studies

- Projects

- Observation results, which can be marker scores and other properties

- Marker information, which can be description, position and marker type

Thomas: You mentioned a flexible tool. My work at IRRI is developing standard data management. For data retrieval, however, I use a cookbook approach in which a list of recipes or examples is developed; they are not a textbook. I started developing recipes for managing research data, such as finding duplicate records. We have already taught some people, and the response was good. If you have a good recipe, it can be useful.

Casper: Which type of user? Breeders or lab people? The Retriever has tables (Oracle views) that users can use to create queries, but in our experience some users find it too complicated.

Thomas: Field people and technical staff, those who collect data, need immediate feedback on their data.

Casper: We were able to get people to move from storing data in Excel to storing it in a database.

Thomas: The least the database can do is minimize copy-and-paste.

Shawn: Does Microsoft still support development of MS Access?

Thomas: Whatever principles people learn from MS Access can be applied to another database that Microsoft supports later on.

- IRRI GEMS (Thomas)

Summary

The presentation is meant to initiate discussion on the future development of GEMS.
Genotyping data can be stored as studies in DMS, which can be loaded using an (extended) Workbook or through scripts for different database platforms. We have several experiences in SSR … Genotyping data can be considered primary data/outputs.
Marker data are stored in ICIS GEMS and can be loaded in batch through the Workbook. There is also a prototype standalone interface for marker data (SSR). However, complete sets of marker data are not available in GEMS. This might be due to the way marker data are collected: they are more often inputs/secondary data for genotyping data.
Maybe that is why people don't spend much time entering complete information about them. Who cares enough about these data to complete the information needed for the marker data?
GEMS: we have a schema, data loading tools for genotyping and marker data, and a data curation tool for marker data.
GEMS Schema
Forward and backward primers that define the marker detector
Protocol information, which is a kind of experimental set-up under which the experiment is done. We want a system where a protocol is used for some time and then refined and used in another experiment for the same marker, without changing the marker. So we combined the marker detector and the protocol into what we call the polymorphic detector.
Molecular variant, which is basically the alleles
Name table, which looks like the GMS Names table but is a more generic table in which the object can be a marker detector, an allele, etc.
Components that make up the entire set of conditions under which a particular marker detector is run. We break the conditions up into different components, which have properties.
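The relationship between marker detector, protocol and polymorphic detector can be sketched as plain Python data classes. This is an illustration of the concept only, not the GEMS schema: the class names, fields and the `refine_protocol` helper are hypothetical, and primer sequences used with it would be placeholders.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class MarkerDetector:
    """Biological definition of a marker: the forward and backward primers."""
    name: str
    forward_primer: str
    backward_primer: str

@dataclass(frozen=True)
class Protocol:
    """Experimental set-up, broken down into named components (e.g. PCR conditions)."""
    name: str
    components: tuple = ()

@dataclass(frozen=True)
class PolymorphicDetector:
    """A marker detector run under one specific protocol."""
    pd_id: int
    detector: MarkerDetector
    protocol: Protocol

@dataclass(frozen=True)
class MolecularVariant:
    """An allele observed with a given polymorphic detector."""
    pd: PolymorphicDetector
    allele: str

def refine_protocol(pd, new_protocol, new_pd_id):
    """Refining the protocol yields a new polymorphic detector; the marker detector is unchanged."""
    return replace(pd, pd_id=new_pd_id, protocol=new_protocol)
```

The design choice mirrors the notes: a refined protocol produces a new polymorphic detector, so alleles stay tied to the exact assay that produced them while the marker itself is never modified.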

Shawn: How about references?

Thomas: We have them in gems_names and gems_mv.

Graham requested a sample database with the GEMS schema implemented, and Weng showed the database and the GEMS tables.

- The component table is linked to the protocol, which can have many components.

- The components have properties, stored in gems_properties.

- The properties can have a scale and a method, stored in the gems_scale and gems_method tables.

- gems_pdcomp is the link between the component and the polymorphic detector.
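To make the table relationships above concrete, here is a minimal SQLite sketch. Only gems_properties, gems_scale, gems_method and gems_pdcomp are named in the notes; the other tables (gems_protocol, gems_component, gems_pd), all column names and the demo values are assumptions for illustration. The query lists the conditions (component, property, value, scale) behind one polymorphic detector.

```python
import sqlite3

# Hypothetical layout: table names other than gems_properties, gems_scale,
# gems_method and gems_pdcomp, and all columns, are assumptions.
SCHEMA = """
CREATE TABLE gems_protocol   (protocol_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE gems_component  (comp_id INTEGER PRIMARY KEY,
                              protocol_id INTEGER REFERENCES gems_protocol(protocol_id),
                              name TEXT);
CREATE TABLE gems_scale      (scale_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE gems_method     (method_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE gems_properties (prop_id INTEGER PRIMARY KEY,
                              comp_id INTEGER REFERENCES gems_component(comp_id),
                              name TEXT, value TEXT,
                              scale_id INTEGER REFERENCES gems_scale(scale_id),
                              method_id INTEGER REFERENCES gems_method(method_id));
CREATE TABLE gems_pd         (pd_id INTEGER PRIMARY KEY, marker_name TEXT);
CREATE TABLE gems_pdcomp     (pd_id INTEGER REFERENCES gems_pd(pd_id),
                              comp_id INTEGER REFERENCES gems_component(comp_id),
                              PRIMARY KEY (pd_id, comp_id));
"""

DEMO_DATA = """
INSERT INTO gems_protocol   VALUES (1, 'SSR PCR');
INSERT INTO gems_component  VALUES (10, 1, 'PCR cycle'), (11, 1, 'Gel');
INSERT INTO gems_scale      VALUES (1, 'degrees C'), (2, 'percent');
INSERT INTO gems_method     VALUES (1, 'thermocycler setting');
INSERT INTO gems_properties VALUES
    (100, 10, 'annealing temperature', '55', 1, 1),
    (101, 11, 'agarose concentration', '3', 2, NULL);
INSERT INTO gems_pd         VALUES (5, 'M1');
INSERT INTO gems_pdcomp     VALUES (5, 10), (5, 11);
"""

def pd_conditions(con, pd_id):
    """List (component, property, value, scale) rows for one polymorphic detector."""
    sql = """
        SELECT c.name, p.name, p.value, s.name
        FROM gems_pdcomp AS pc
        JOIN gems_component AS c ON c.comp_id = pc.comp_id
        JOIN gems_properties AS p ON p.comp_id = c.comp_id
        LEFT JOIN gems_scale AS s ON s.scale_id = p.scale_id
        WHERE pc.pd_id = ?
        ORDER BY c.name, p.name
    """
    return con.execute(sql, (pd_id,)).fetchall()

con = sqlite3.connect(":memory:")
con.executescript(SCHEMA + DEMO_DATA)
```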

Graham asked what happens if we have multiple scales for a property. Weng answered that this is handled in GEMS.

Paul commented that the GEMS being developed is not generic because it only contains information about the marker itself, not the protocol to create the marker. Graham clarified that it is not about creating the protocol but about the protocol that is applied or used.

Paul: Most labs have already set up their own sets of protocols.

Ron: We are still at an early stage in applying markers, but as more markers become available from different labs, we will need information whenever we use markers from another lab. Right now, that information is available in publications. I think this kind of database will eventually be needed.

Graham:

It needs to link the protocol with the output data (allele data).
Although this is available in publications, once a lab makes adjustments it is good to document them.
I think what Paul implied is that we only showed the SSR marker system. Although we only showed the SSR marker system, the challenge is to design a system that accommodates different systems.
My question to Thomas is who will be responsible for managing these data. The point is that curation only happens when people finally find a need for it, but by then they have moved on to another experiment. By the time they curate it, the information is already gone.

Thomas: The LIMS of different labs may differ depending on their focus. So which applications in ICIS relate to LIMS? For sample tracking we have GMS (GIDs and germplasm methods), e.g. the DNA, the leaf it was extracted from and the plant it came from. We don't have the set-up of the wells, but you can use IMS and treat wells as locations. Protocols can be stored in GEMS.

Paul: My idea of GEMS is a system for the information that cannot be stored in the current LIMS (Nautilus). Nunza is storing it in NunGEMS.

Richard: Let us say I used this marker on this genotype with this particular protocol. The marker detector contains the biological information (primers etc.). You are detecting whatever amplifies in that genome, and that differs from one genome to another. You have to separate the different components of the methodology: what piece of DNA, under what conditions, and the assay (PD). It is the assay you apply that gives you the molecular variant.

Graham: Our primary information in the breeding database is the PDID.

Paul: I don't want the AFLP to be together. In a genetic distance, all these markers created will not have added value. I don't see the added value … GEMS should just consist of a subset of markers with enough information. Why put information in a database that might not be needed?

Graham: It is like what we do in GMS. We enter the different generations of a line in GMS, even though the breeder is only concerned with the stable line. Take DArT data: we collected massive data for DArT lines with the intention of doing association analysis. As we get more information about those lines

Paul: ...

Graham: We are thinking of quite a diverse set of users. If you are using DArT, you will eventually be interested in how other labs are using it. But in the case of Nunhems, you will only be interested in a marker when other people in the lab become interested in it.

Thomas: Part of the question is coverage: do we want to put published data for a crop in GEMS, or is it more self-contained within a particular lab, project, etc.? Another question is what level of detail we need and how to make it flexible. As Graham mentioned, in GCP we are required to look at more complicated situations.

- IRRI Retrieval Scripts (Thomas)

Summary: Thomas mentioned that at this point we have batch loading of information in the Workbook and a standalone application for marker data curation. One issue is whether to re-implement it in Delphi, with a separate interface for each marker technology, or as a web-based interface. He also mentioned a discussion in the Wheat CRC, which Sandra added is available on the cropwiki.

It is a challenge to have a one-to-one link between our phenotyping and genotyping data.

Scripts for retrieving information

The advantage of having the script on the server side is that it runs faster than on the client side.
The disadvantage is that it is database-specific.
Some tests of script execution were done. The scripts created were get_dataset(phenotype study), get_dataset(genotype study) and create_aray_dataset(genotype table).
To display the datasets in parallel, the alleles found for the markers are stored in a text field.
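The idea of storing each line's alleles in a single text field so that phenotype and genotype data can be shown side by side can be sketched in a few lines of Python. This is not the IRRI server-side script: the function names `build_allele_text` and `side_by_side`, the comma separator and the row layout are all hypothetical.

```python
from collections import defaultdict

def build_allele_text(genotype_rows, markers, sep=","):
    """Collapse (gid, marker, allele) rows into one allele text field per GID.

    Alleles are written in the order given by `markers`; a missing score becomes
    an empty string so every text field has the same number of positions.
    """
    by_gid = defaultdict(dict)
    for gid, marker, allele in genotype_rows:
        by_gid[gid][marker] = allele
    return {gid: sep.join(alleles.get(m, "") for m in markers)
            for gid, alleles in by_gid.items()}

def side_by_side(phenotype, allele_text):
    """Pair each GID's phenotype value with its allele text field (GIDs present in both)."""
    return [(gid, phenotype[gid], allele_text[gid])
            for gid in sorted(phenotype) if gid in allele_text]
```

Because the collapsed table is precomputed, displaying a subset is a simple lookup; the trade-off raised in the discussion is that it has to be rebuilt whenever the underlying studies change.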

Martin: Will this give me a subset of the information at the same speed?

Thomas: Retrieving the subset from the result of the query is faster than assembling the subset.

Martin: I now understand that you are creating new storage on the server. But how will you keep those stores synchronized?

Thomas: You can recreate it every day.

Thomas: We created it with one study but we can create it for several studies.

- MoSel (Graham)

6. Discussions on how to create a genotyping database for the ICIS community 2:00 - 4:00 (Chair: Shawn, Reporter: Fran and Weng)

Summary of presentation: The problem is that there is no tool that integrates genotype, phenotype and pedigree data to help breeders in their breeding programs. I had discussions with breeders. They are using Excel sheets that are filled in by hand.

Graham showed a sample sheet that Tom Hash is using at ICRISAT. IRRI and some other institutes can still manage with this kind of “tool”, since they use 10 or fewer markers. This is not manageable with high-throughput technology, which handles 1500-2000 markers.

There must be some way to do this automatically. Can it be done with Excel macros alone?

Graham and Guy had some discussions and came up with a concept for MoSel. To illustrate it, Graham presented what MoSel would look like. The purpose of the presentation was to stimulate ideas from others who might want the same kind of application.

Fran: For phenotypic traits, are we thinking in terms of quality?

Guy: We can show categorical and continuous data... whatever we have.

Graham: We can categorize, use colours...

Guy: heat maps, etc

Graham: Vivek, is this going to be useful for maize breeding?

Vivek: We don't have it right now. We are just beginning to discuss something along those lines.

Graham: Can you still get away with spreadsheets?

Vivek: yeah...

Ron: MoSel is like doing DArT, which looks at the whole genome. We should consider where we are at...

Graham: I've had discussions with Ian Delacy. In DArT technology, all markers are anonymous. All selections are made using trait values, but later you find new information as you see changes...

Richard: Do we have enough data for this kind of tool?

Some discussions on an association tool...

Graham: The main objective of this tool is to select which of these lines you want to keep, but association can be a by-product of this tool.

It's harder to identify the target markers in the foreground than in the background. A simulation tool is good for deciding which lines will have the trait we want.

Vivek: The collapsing tool could be very useful.

Casper: Is it one of the output tools for DMS, GEMS?

Graham: It could be: a MoSel analysis tool where you specify the data input into MoSel. Queries like: what is the list for your study? Then press go.

Nunhems: To be able to sort, you need data in it...

Paul: Normal users were able to collect data and combine it with the marker data. There are dozens of visualization tools on the internet that produce the same output as in the slide. It's open source, and you can ask the developer for customization. Effort should not go into creating a new visualization tool. We can give an example of a large marker set. Data are collected and can be input into the software (phenomap?).

Graham: The objective of MoSel is different from that software. The question is: what is the coverage? That has already been stated. Genetically speaking, you can calculate the genetic distance between any of the lines. We'd be interested to know about the visualization tools.

Paul: The real problem is getting the data out. It does not really matter whether we use GEMS or NunGEMS. We just need to make sure we create a generic output.

Graham: Paul, can you create a proposal with Jesper regarding this?

Ron: We have the same situation as others (we use Excel), but we need to concentrate on how to target populations. If we start small, it will ultimately lead to this more sophisticated tool.

Guy: What we need is to create a prototype from GCP data.

NEW TOPIC: LIMS DISCUSSION

Shawn: I think we should discuss what the lab people need in a round-table discussion. We need a LIMS for the lab before GEMS can be filled in.

Ron: Can you briefly discuss the different databases?

Shawn: What's the next step after GEMS?

Arllet: We are using it and will be using it for mapping data.

Paul: We are using a LIMS and linking it a little to the marker data. The real value of these systems is the interface between the databases. We decided that the lab staff do the molecular selection.

Shawn: Do we start developing our own system, since we are all using different systems? Do we start from GEMS and worry about the LIMS later?

Graham: The LIMS component is very laboratory-specific. What we decided to do is use a LIMS being developed at ICRISAT. They have built quality controls for all the steps. It produces genotyping data, which we want to flow straight into ICIS. We would like the protocol to be captured in GEMS, but we haven't got our hands on that yet. We have a leaf tracking system in ICIS, and we want to upgrade from that situation.

Shawn: If we want to get something up and running here, we should know how the technicians create the sheets, but they should not need to know where the data go.

Shawn: We need a system where lab people only need to learn how to enter data, not how to use ICIS. If a researcher has a question, they should not have to go back to the spreadsheets. Should we go with IRRI's LIMS? The GEMS approach?

Thomas: LIMS and GEMS are often mentioned together. We have to separate them; they are two different things.

Ron: A LIMS is basically a lab book: a template on a laptop instead of a notebook. In the lab book we set up the protocols. You set up the grid for the samples to keep track of where the samples go, and record anything unusual or slightly different. It's nice to have all those properties in one computer system: type in a marker and all its protocols are available.

Graham: Management of protocols in the laboratory is part of the LIMS; it is one of the LIMS processes that happens before any genotyping is done. A description of the protocols is also needed in the database, which is part of both LIMS and GEMS. GEMS comes in where data leave that process. How far outside the lab will you push the LIMS?

Shawn: Casper, what sorts of issues do you have?

Casper: How do we get data into it? Some issues were presented during the presentation.

Paul: Did you look at a commercial LIMS?

Shawn/Rob: no

Paul: There are hundreds available that can fit your lab. Evaluate them and choose one. It's not that expensive, and you will have it right away rather than building it from scratch.

Shawn: Our goal is to incorporate ICIS into our breeding program.

Graham: Whatever LIMS you choose, you decide where it interfaces with ICIS.

Casper: We need a link that binds all the databases together.

Graham: I think the databases are well represented in the diagram. Some of the data in the DMS are not interpretable without GEMS.

Thomas: Same thing for GMS and DMS.

Ron: You will never see the phenotypic data again. Genotypic and phenotypic data go hand in hand. There are two processes in breeding data: (1) parental mapping and (2) evaluation data from the cross. Setting up the cross is an extremely important process. I am looking for a tool that allows us to take the phenotypic and genotypic data and display them so that we can evaluate them. Basically there is a standard, and you select a line based on that standard. We do it like this now, but with spreadsheets. Bring up a series of lines: do you have markers? Do you have polymorphism? For the traits, we look at those genes and see whether there is a marker associated with them.

Shawn: Another problem is having many local databases. How do we manage that?

Graham: You can design a tool that connects to the databases.

Shawn: Can we decide on GEMS, decide on the schema? Are protocols stored in GEMS? For us, it's about getting the phenotypic and genotypic data together.

Graham: You should see GEMS as being like GMS with DMS.

Shawn: Can we agree on a GEMS-like schema that works, whether or not we use all the tables and fields?

Graham: We've moved away from the LIMS database, right?

Shawn: Yes. My worry is what happens five years from now, when we need to get back into that...

Wednesday, June 6

Tour of SPARC and biotech facilities 8:00 - 10:00

7. Integration of External Tools with ICIS 10:20-12:00 (Chair: Rob, Reporter: Shawn)

- Graphical GenoTyper (Paul)

See also the presentation of GGT made by its author, Ralph van Berloo, in 2006.

For more information about GGT on the Internet, see: http://www.dpw.wau.nl/pv/pub/ggt.

Summary:

- The application is freeware and is used in the Nunhems biotech lab

- Tabbed browsing of the output

- Takes the data loaded into it and creates a genetic map

- Colours represent marker scores and those colours can be customized

- Each bar represents a plant

- Data can be loaded from an Excel spreadsheet

- Changes can be made to the data within the application, but those changes are not written to the database

- Data can be sorted; individual lines can be looked at

- Some statistics can be calculated on the marker dataset, e.g. map distance

- A pie chart can be created from the dataset, but only useful if the map distance is included

- Allows for different views of the genetic maps

- Comes with a help manual

- More information can be found on the presentation Casper did at CIMMYT in 2006

- Fran: Does the data have to be in a precise format?

- Paul: Yes. The system loads its own file format, but that file comes from an Excel sheet which has to be set up properly

- Images can be exported

- Paul: The lab at Nunhems works differently from your lab here at SPARC. The lab at Nunhems handles the selection process because they are considered the experts. Here at SPARC, your breeders do the selection.

- Ron: Yes, the lab here offers advice, but the ultimate decision rests with the breeders. This software would be very useful in the backcrossing programs

- Paul: The developer of this software really likes to receive feedback and would probably be accommodating to changes

- PediTree (Arllet)

For more information about PediTree on the Internet, see: http://www.dpw.wageningen-ur.nl/pv/pub/Peditree/index.htm.

Summary:

- Creates a tree-shaped visual of pedigrees

- Computes basic estimates of the coefficient of parentage

- Values can be pulled out of ICIS and loaded into PediTree

- The Retriever is used to set up the correct information

- Select the list, study and traits that you need

- Change the databases specified in Peditree.ini

- Set the keys so it connects to the Retriever

- Put PediTree into your Launcher.txt file

- Arllet still has to check on how to get data next to the pedigrees in the tree

- Will create an ancestry chart from the pedigree tree, but needs a controlled zoom to make viewing it easier

- Graham: Works well for bi-parental crosses, but does not contain any selfing generations.

- Casper: The application was made for Potato breeding programs and still needs some work to be adapted to other crops
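How PediTree computes its coefficient of parentage (COP) is not described here. As an illustration of what a basic COP estimate involves, the sketch below uses the standard recursive calculation under two common simplifying assumptions: every line is fully inbred (COP of a line with itself is 1) and distinct founders are unrelated (COP 0). The function `make_cop` and the pedigree dictionary format are hypothetical.

```python
from functools import lru_cache

def make_cop(pedigree):
    """Return cop(x, y) for a pedigree {line: (parent1, parent2) or None for founders}.

    Assumes all lines are fully inbred (cop(x, x) == 1) and that distinct founders
    are unrelated (cop == 0). A selfed line can be written as (parent, parent).
    """
    depth_cache = {}

    def depth(x):
        # Generation depth: founders are 0, offspring are 1 + deepest parent
        if x not in depth_cache:
            parents = pedigree.get(x)
            depth_cache[x] = 0 if not parents else 1 + max(depth(p) for p in parents)
        return depth_cache[x]

    @lru_cache(maxsize=None)
    def cop(x, y):
        if x == y:
            return 1.0
        # Expand the deeper line: it cannot be an ancestor of the other one
        if depth(x) < depth(y):
            x, y = y, x
        parents = pedigree.get(x)
        if not parents:
            return 0.0  # two distinct founders
        p1, p2 = parents
        return 0.5 * (cop(p1, y) + cop(p2, y))

    return cop
```

For example, with founders A and B, C = A x B and D = C x A, the sketch gives 0.5 for C with A and 0.75 for D with A.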

- HandHeld Applications (Warren)

Summary:

- The goal is to break down the DMS Workbook into modules that handle data entry from handhelds

- There have been many requests for handheld and web-based data entry tools

- Handheld Workbook

- Tabs for each section (Variate, Scale, Study, Data)

- Can only store 1 variate at a time

- Casper would like to see an import/export option

- The data for each variate are stored as text files

- Graham: Not sure you need to define your variates and scales on the handheld; this should be done before loading the workbook onto the handheld

- Casper: Would like to be able to step through the variates while at the plot

- Warren: It does allow you to choose variates from a combo box

- Selwyn: Having to choose different variates while at a plot takes too long. You want to enter your data and move on. He would also like to see previous data from the experiment.

- Graham: There needs to be more user-driven development for this tool. Unless someone is willing to sit down and explain exactly what is needed, Warren should stop.

- Warren also demonstrated a comparison tool to query marker data by marker name

- Choose your pedigree list

- Select the study you are interested in, and the marker names associated with that study are displayed in a new box

- Choose a trait and get the phenotypic data in another box

- Casper: We need to decide which approach we are going to take in querying this data.

- Nunhems External Tools (Casper)

Summary:

CodeSoft:

- Commercial label-design application which Nunhems links to the Retriever via an ODBC connection

- Datasets are created in the Nunhems Retriever, and sent via macros to CodeSoft to create labels

- Labels are printed on ToshibaTec printers

2L:

- Commercial form design software for handhelds

- Takes a dataset (text file) created by Excel or the Retriever, from which forms can be made to enter data or view historical data in the field

- One design can contain many datasets

- Very flexible, but not cheap to buy

Note: There is also a LITE version with limited functionality. In this case the PC developer module is free; only a license fee has to be paid for the software on your handheld computer (about 400 US$?). More info at: http://www.sw4hh.com

- WorkAboutPro handhelds are used at Nunhems

TOAD:

- Used for maintenance of Oracle databases, also for development and maintenance of views (queries)

Remote Controls:

- DameWare

- Shadow in CITRIX

- WebEx - remote control of applications outside of network

We recently found on the web that there is also an open-source solution for remote control; see http://www.crossloop.com/index.html

Help & Manual:

- Commercial software to create help files

- Can make both browser help files and .pdf files

Screen Capture Tools:

- SnagIt - Captures and saves video clips to your computer

- HauteCapture - Captures images on handhelds in real time

- University of Queensland Pedigree Download Tool (Sandra)

- UofQ depends heavily on a web interface for users

- You can currently see the pedigrees, but cannot download them

- SetGen will allow users to export information, but they need to have ICIS installed

8. GRIMS developments 1:00-3:00 (Chair: Selwyn, Reporter: Graham)

- Maize Genebank (Juan Carlos)

- Wheat Genebank (Jesper)

- Rice Genebank (Roniela [Ella])

- Standard Material Transfer Agreement (Grace)

- Web Access to GRIMS Data (Selwyn)

9. Database Backend Issues 3:20 - 4:00 (Chair: Arllet, Reporter: Thomas)

- ICIS Schema Maintenance and Database Conversion (pdf) (Corina [Ching])

Casper: For the schema upgrades, and in general, consider a more consistent naming convention for tables and fields that can be maintained in the long term.

Thomas: SQL upgrade patches assume that users have the most current version of the database. Consider making this assumption more explicit in the patch name and/or in the release notes for the patch.

Selwyn: Consider supporting partial upgrades by making dependencies between schema and applications more explicit, e.g. release notes.

Casper: The entire new ICIS package has good upgrade notes and instructions.

Thomas: Investigate the technology and/or approach used by the different cross-backend synchronization tools to see whether they are really more efficient and effective than the current conversion, dump, and reload procedure.

- IRIS in PostgreSQL (Corina [Ching])

Graham, Jesper: PostgreSQL needs to be evaluated as a local ICIS database, since that is where data are changed.

Graham et al.: Differences in the execution times of the various tests were noted. PostgreSQL was significantly slower in some of the queries, but not consistently so. PostgreSQL was running on a different server, whereas MS Access was running on the same machine as the client. Several explanations for the observed differences are possible, but the differences don't rule out the use of PostgreSQL as a backend for ICIS.

- IWIS3 in PostgreSQL (Jesper) => moved to Thursday

- Future Directions for the ICIS Schema (Arllet/Richard) => moved to Thursday

Thursday, June 7

- IWIS3 in PostgreSQL (pdf) (Jesper)

Arllet: The full install of ICIS was running correctly against the IWIS3 central database in PostgreSQL.

Jesper: Synchronization is fast, ~ 10 min for the extraction of data from SQL Server, and ~ 20 min for the creation of indices.

There are still open issues, especially the use of PostgreSQL for an ICIS local database, since it is in a local database that ICIS applications make changes to the data. A priority should be to test PostgreSQL as a backend for ICIS local databases.

PostgreSQL is not as easy to install, use and support as MS Access. Its role is more for shared databases that are accessed via ODBC. However, further testing and user experience of PostgreSQL on a local machine should be gathered.

- Future Directions for the ICIS Schema (Richard)

No PowerPoint presentation available.

Richard started with the question of what the ICIS schema 6.0 might look like.

Rather than adding an ICIS genomics schema, it may be advisable to use the Generic Model Organism Database schema [GMOD] instead. CHADO is modular, e.g. covering sequence data, controlled vocabulary, etc. [CHADO]. The CHADO schema is similar in generality to the ICIS schema, and Richard suggested we should try to learn from it and adopt relevant parts in the further development of the ICIS schema.

Graham: Should we change the ICIS metadata schema (property, scale, method) to the CHADO schema? Richard showed some of the similarities between the two systems, e.g. the CHADO names and dbxref tables.

Additional information on IP status, uploading status and access control for germplasm in ICIS is needed. Three options were discussed: an additional field in the germplasm table, attributes, or ICIS lists. It is clear that these are different problems that might need different solutions. At IRRI, IP status is already managed as an attribute. Information on uploading status may be maintained in ICIS lists, and differential access control may require additional fields in the germplasm table (GMS) and the study table (DMS), which the applications would need to use when retrieving data.

On locational data management in IRIS, Richard reported on the central concept of the location ID and other pertinent aspects. The schema is still under development and needs further refinement. Linking the locational data curated by Isaiah Mukema over the last 3 years still presents a problem, as the location IDs currently used in IRIS do not seem to link to the curated data.

Selwyn: If the scope and complexity of the location data becomes too large, it becomes too expensive to maintain, and the question also arises as to who should do it.

Richard proposed setting up a section on the ICISWiki to review the schema and the consequences of schema changes during the next year, and to discuss it at the next ICIS annual workshop.

10. Issues With Uploading Local Pedigree Data to the Central Database 8:00 - 10:00 (Chair: Jesper, Reporter: Graham)

- Separating sensitive (private) data from non-sensitive (public) data in local databases (Shawn, Sandra)

Discussion:

- Should we have a field in GERMPLSM to indicate which germplasm can be uploaded, or should we use an attribute? For the moment, let's use an attribute and get agreement from the breeders about the stage at which pedigrees are published. This stage should coincide with the making of a list, such as entry into a yield trial stage.

- Another problem is the ability to unload a study from central. If a study is uploaded in error, how can it be replaced? What we need is a tool that allows the removal of a study or set of studies and allows the upload of selected studies. This should be the basis of an administrators’ tool for managing GMS and DMS uploading.

- Dealing with duplicate entries (Shawn, Jesper)

Discussion:

- Often lines are tested at locations before their GIDs are available in central. These lines need to be identified and replaced before uploading. A fuzzy name search would be a useful application in the administrators’ tool. This check only needs to be done for lines imported into the local database, since those developed at the location will likely not be known to central.
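A fuzzy name search of the kind suggested above can be prototyped with Python's standard `difflib` module. This is one possible approach, not an existing ICIS feature: the name normalization rule, the 0.8 similarity cutoff and the `candidate_gids` helper are assumptions that would need tuning against real germplasm names.

```python
import difflib
import re

def normalize(name):
    """Hypothetical normalization: ignore case, whitespace, dots and hyphens."""
    return re.sub(r"[\s.\-]", "", name).upper()

def candidate_gids(local_name, central_names, cutoff=0.8, n=3):
    """Return central GIDs whose normalized names closely match `local_name`.

    central_names maps GID -> preferred name; several GIDs may share a name.
    """
    index = {}
    for gid, name in central_names.items():
        index.setdefault(normalize(name), []).append(gid)
    matches = difflib.get_close_matches(normalize(local_name), list(index), n=n, cutoff=cutoff)
    return [gid for match in matches for gid in index[match]]
```

A higher cutoff reduces false matches between similar cultivar codes (with 0.8, "IR36" is not proposed for "ir-64"), at the cost of missing more heavily misspelled names; the results are candidates for an administrator to confirm, not automatic replacements.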

- Keeping the DMS and GMS Central Databases synchronized (Shawn)

Discussion:

- Dealing with replaced GIDs associated with DMS records: it would be nice to have a tool that synchronizes the GIDs referenced in DMS with changes in GMS. A short-term solution would be to add a query to the local database that periodically makes this update from the changed table.
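The synchronization step described above amounts to following each replaced GID to its current replacement and rewriting the DMS references. The sketch below assumes the replacements are available as a simple old-GID to new-GID mapping and that DMS records are dictionaries with a `gid` key; both are hypothetical stand-ins for the actual GMS and DMS tables. It follows replacement chains and refuses to loop on a cycle.

```python
def resolve_gid(gid, replacements):
    """Follow replaced-GID -> replacement-GID links to the current GID."""
    seen = set()
    while gid in replacements:
        if gid in seen:
            raise ValueError(f"replacement cycle involving GID {gid}")
        seen.add(gid)
        gid = replacements[gid]
    return gid

def sync_dms_gids(dms_rows, replacements):
    """Return copies of DMS rows with current GIDs, plus the number of rows changed."""
    updated, changed = [], 0
    for row in dms_rows:
        current = resolve_gid(row["gid"], replacements)
        if current != row["gid"]:
            changed += 1
        updated.append({**row, "gid": current})
    return updated, changed
```

Returning copies and a change count makes it easy to review what a periodic run would update before writing anything back.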

11. ICIS security access/issues 10:20-12:00 (Chair: Shawn, Reporter: Casper)

Security of ICIS data in AAFC (Shawn)
Security of ICIS data in CAGE (Sandra)
Security of ICIS data at Nunhems (Casper)

The presentations and discussion addressed how to handle different security access levels in ICIS. AAFC and UQ face the problem that users can work on several projects, and the members of each project may have different access rights to ICIS data, both GMS and DMS.

For example, AAFC currently uses a separate local database per project to prevent users from seeing too much data. This is inconvenient for users working on multiple projects (they must switch local databases constantly) and for administrators (e.g. handling duplicate data and duplicate IDs across multiple local databases). AAFC data are released to the central GWIS3 database only after 5 years.

Through the UQ website, wheat data are available at two levels. Without a password, only public data are available. With a password, additional breeders' data are available, which breeders need at an earlier stage. Currently this is implemented with two separate databases in the background, which is difficult to maintain.

At Nunhems it is organized slightly differently. A breeding team has full access to all data (read and write). Other users have read-only access to parts of the database. Access is differentiated by creating different Oracle views and different versions of the Retriever. Read-only users have NO access to SetGen and GMSSearch.

The conclusion is that security and access restrictions in ICIS should be improved, with a distinction made between GMS and DMS. After some discussion, the tentative conclusion was:

For GMS, a security/access level is needed at GID level. This probably requires an extra field in the Germplasm table (another solution might be an additional table recording which GIDs are PRIVATE and which SEC_GROUPS may access them; GIDs that do NOT appear in this table are PUBLIC). Only needed in the local database?
For DMS, a security/access level is needed at STUDY level. This probably requires an extra field in the STUDY table (here too, an extra table might be more flexible). Only needed in the local database?

Since users can be members of multiple projects, a SEC_GROUP table is probably needed to record which users belong to which groups and which access rights each group has at GID and STUDY level.

(A first rough idea of the new table structures is given at the end of this section.)

To set the access level, extra functionality is needed in SetGen and the WorkBook to mark (or unmark) selected data as PRIVATE for other users. The idea is to set a flag on selected GIDs via:

Fill column with flag (Is the flag chosen from the groups available in the SEC_GROUP table?)
Save Flag as Lock key

This tool could work in the same way as the existing saving of NAMES and ATTRIBUTES.

Extra functionality will also be needed to mark (or unmark) studies as PRIVATE.

In addition, these functions should become available in the administrator’s tool that Jesper will start developing.

Still to be investigated is how the DLL functions should be adapted, and what action should be taken when, for example, GIDs marked PRIVATE appear in a tree in SetGen or GMSSearch. The first impression is that the changes to the GMS DLL will be easy, whereas the DMS may need more changes. Arllet will investigate this.

Questions that arise are:

Do we need multiple levels of “PRIVATE”? For example, in some cases only the germplasm may be shown (NOT its parents), while in other cases the parents may be shown as well.
Can this solution also solve Bayer's problem of showing only selected names of a germplasm to certain users? Bayer would like to add multiple names to a germplasm but, depending on the user, show only one or a few of them.

Possible implementation idea

Add Security Groups to the UDFLDS table in the local database. It should be possible to add these groups ONLY through the administrator’s tool that Jesper will develop.

New Table 1: SEC_GRP_GMS

SECGRPID (N): ID for security groups (Links to UDFLDS.FLDNO)
GID (N): GID (Links to GERMPLSM.GID)
USERID (N): Links to USERS.USERID (Only users listed in this table may be able to see all information of this GID)
SECLVL (N): Security level: either only the parents are set to PRIVATE, or the whole GID is set to PRIVATE
GRPSTAT (N): Status of security (e.g. 0 for active, 9 for removed, i.e. PUBLIC again). This makes it possible to trace over time which GIDs were once private, and for which groups

New Table 2: SEC_GRP_DMS

SECGRPID (N): ID for security groups (Links to UDFLDS.FLDNO)
STUDYID (N): STUDYID (Links to STUDY.STUDYID)
USERID (N): Links to USERS.USERID (Only users listed in this table may see all information of this STUDY)
GRPSTAT (N): Status of security (e.g. 0 for active, 9 for removed, i.e. PUBLIC again). This makes it possible to trace over time which STUDIES were once private, and for which groups

12. Break Out Sessions 1:00 - 4:00

- Storing of Passport data

- Pedigree Management (Shawn, Sandra, Jesper)

- Jesper, Shawn and Sandra met to discuss how to handle updates to IWIS3:

- Sandra brought up concerns that her changes from last year were not reflected in the new version of IWIS3. Jesper explained that if the changes are not listed in the CHANGES table in the local GMS, then they are not uploaded to IWIS3.

- It was decided that if conflicts in pedigree information arise, Canadian lines will be resolved using Shawn's updates, Australian lines using Sandra's updates, and for any other lines Jesper will notify us to double-check before uploading.

- Due to a large addition of American pedigree information and some errors that need to be sorted out, Shawn asked Jesper to discard the previous Canadian update databases from 2005; Shawn will create a new one and send it by the end of summer.

- Jesper explained that not all of the tables from the local GMS are currently being uploaded. Graham later explained that this is because queries have not yet been developed for all tables. He encouraged Jesper to include queries for all local GMS tables in the new administration tool he is creating.

- Shawn, Sandra and Jesper agreed to work closely on pedigree management to ensure the updates to IWIS3 run smoothly.
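Jesper's explanation that only changes recorded in the CHANGES table of the local GMS reach IWIS3 can be illustrated with a minimal sketch. The `Change` record and the `(table, record_id)` keying are hypothetical simplifications, not the actual layout of the ICIS CHANGES table; the sketch only shows why an edit made without a matching change entry is silently skipped during upload.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Change:
    table: str      # e.g. "GERMPLSM" or "NAMES"
    record_id: int  # primary key of the changed record

def records_to_upload(local_records, changes):
    """Select only the local records that appear in the changes log.

    local_records maps (table, record_id) -> record. Edits made without a
    matching change entry are skipped, so they never reach the central database.
    """
    keys = {(c.table, c.record_id) for c in changes}
    return {k: v for k, v in local_records.items() if k in keys}
```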

- Structure of Genotyping Data (Graham)

- Ontology (Richard)

Workshop Supper - Chinook Golf Course 7:00 - 9:00

Friday, June 8

13. ICIS Software development and collaboration tools 8:00 - 10:00 (Chair: Fran, Reporter: Martin)

- ICISWiki and CropForge (pdf) (Thomas)

CropForge:

- Suggestion: open bug reports (trackers) to non-members as well

- Suggestion: a filtering system for bugs (e.g. by module)

- Conclusion: there are several tracking systems; the functionality of Jira and the CropForge tracker is similar, but the CropForge tracker lacks a mandatory field for the version of the system the bug is reported against

- Suggestion: add category filter to the bug tracker
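The two tracker suggestions above (a mandatory version field and a category filter) can be sketched as a simple record type. This is purely illustrative: the `BugReport` fields, the category values and the version strings are hypothetical and do not describe CropForge's or Jira's actual data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BugReport:
    title: str
    category: str  # e.g. the module, such as "SetGen" or "GMSSearch"
    version: str   # mandatory: the release the bug was observed in

    def __post_init__(self):
        # Reject reports that do not say which version they are about.
        if not self.version.strip():
            raise ValueError("version is required")

def filter_by_category(reports, category):
    """Return the reports filed under the given category, in original order."""
    return [r for r in reports if r.category == category]
```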

ICIS Wiki:

- Suggestion: all users should have a non-empty home page (better for patrolling)

- Suggestion: find out what options exist for externalization (export) of cropwiki pages

- ICIS software development and release management (Arllet)

- Some uncertainty about the versioning system

- Suggestion: Release notes should be readily available on the CropForge site, without having to download anything (e.g. a zip file containing CHANGES.txt and RELEASENOTES.txt)

- Suggestion: As the software development group gets bigger, it would be a good idea to prepare a "proposed changes" document & publish this document for review/comment by the ICIS community before any development takes place.

- Managing Code (Lilibeth, Clarissa)

- Suggestion: Add to best practices: Commit often

- Suggestion: Add to best practices: Commit only compilable code

- Suggestion: Add to best practices: In-code documentation (document as you go)

- Suggestion: Add a more formal way of testing code (Graham: don't expect your users to test everything. We need stricter standards of testing to be done by the developer)

- Intellectual Property and Licensing issues (pdf) (Thomas)

14. Integration of ICIS with GCP Platform and Progress on the ICIS web interface 10:20 - 1:30 (Chair: Graham, Reporter: Thomas)

- Generation Challenge Program Platform and ICIS as GCP Data source (Richard)

- ICIS Web - The Next Generation

- Towards Crop Information Network

15. Future Planning 1:30 - 3:00 (Chair: Graham, Reporter: Shawn)

Development and Consolidation of new modules
- (CRIL/GCP) -Linking ICIS database to the GCP domain model (Mylah, Weng, Martin, Richard)
- (CRIL)- Genotyping module
  - data management (Graham, Weng)
  - Integration of existing tools (structure, GDPC tools) (Guy)
  - Develop a prototype visualization tool (Guy)
- (UQ)-storage of multivariate data (evaluation estimates and their precision) (Ian)
- (GRC/CRIL) GRIMS
  - Backend conversion of Oracle to PostgreSQL (Ella)
  - managing mission information (Ella)
  - display of preferred id (Arllet)
- (CIMMYT/CRIL)-look at the transfer of wheat genetic resources data to ICIS, building on the experience with rice (Jesper)
- (CRIL)- adoption of specific RGMIMS components (Richard)
- (CRIL/CIP)-data warehousing, Mondrian/Rubik, released cultivar warehouse, Crop Finder and views (Edwin, Juan Carlos)
- (CRIL/AAFC)-record-level access rights to the GERMPLSM table, STUDY-level access to DMS (Arllet, Ching, Shawn)
- (CRIL/Bayer) – Restricted access to Names (Beth, Sebastien)
- (CRIL/Bayer) – linkage between Workbook and CROPSTAT (Beth, John)

Stand-alone Applications
- (CRIL)-cross finding facility in standalone applications (Beth)
- (CRIL)-combining ability analysis in CROPSTAT (Graham)
- (CRIL/Bayer)-additional functionality in Workbook for hybrid breeding (Beth, Warren, John)
- (CRIL)- administrator’s tool (Jesper)
  - Loading/Deleting Central Studies
  - Partial upload of Germplasm records
  - Synchronization of GMS and DMS GIDs
  - Duplicate names search – Validation Tool (Ching)
  - Managing user groups
- (CRIL/AAFC)-implementation of naming conventions (Candy, Shawn)
- (AAFC/CRIL)-development of Data Comparison Tool (Shawn)
- (CRIL) – display CID, SID and CIMMYT Cross expansion in GMS Search (Jesper, Beth)
- (CRIL/Bayer/Nunhems) – Investigate/implement alternative naming and ordering of list data columns in SetGen (Candy, Casper, Sebastien)

Web Interface
- (GRC/CRIL/ATFCC/GPG2) - Web interface for Genetic Resources data (Ruaraidh, Ella, Mylah & ?)
- (CRIL)-integration of Scalable Vector Graphics to produce high quality visualization in web interface (Supat)
- (UQ)-make a pedigree download tool, compatibility with Peditree (Sandra)
- (UQ/CRIL)-List management, germplasm and trait list management (Richard, Mylah)
- (CRIL/PBGB) – Seed request module which connects inventory with evaluation data (Mylah, Grace, Beth)
- (CRIL/GCP/UQ) -Web interface for DArT Data (Weng, Sandra, Richard, Mylah)

Utilities
- (CRIL)-Validation of coding and documentation standards (Arllet, Thomas)
- (ICIS community)-user documentation and help, wiki/multimedia (Thomas)
- (CRIL/CIMMYT)-finalizing version 8 of Fieldbook with links to ICIS (Vivek, Graham)

ICIS implementations and conversion
- (CRIL)-integration of Maize Finder with ICIS (Juan Carlos)
- (UASB)-establishment of Sorghum Information Systems (Sashidah)
- (IRRI/CIAT/WARDA)-cross referencing of CG rice germplasm (Ruaraidh)
- (CIMMYT/ICARDA)-cross referencing of CG wheat/barley germplasm (Jesper)
- (CIMMYT/IITA)-cross referencing of CG maize germplasm (William)
- (ATFCC) – Construction of Central DB for Lentil, Chickpea and pea (Selwyn)
- (ATFCC) – Publication of Genetic Resources Characterization on the web (Selwyn)
- (ATFCC) – Loading of the observational data into the Central DMS (Selwyn)
- (UQ/AAFC/ICARDA/CRIL) – Uploading pedigree updates into IWIS3 and IBIS (Jesper, Shawn, Sandra, Akin)

ICIS Structure
- (CRIL) – Tree structure for Study Names (Arllet, Warren)
- (CRIL) – Changes table for DMS (Arllet)
- (CRIL) – Naming of datasets (Warren)
- (CRIL) – Make Study Type UD Field (Arllet)
- (CRIL) – Identify management methods in the Germplasm table (GNPGS) (Arllet)
- (CRIL/Nunhems) – Extra columns in the list data (Candy)
- (CRIL) – Full implementation of PostgreSQL as a backend (Ching)

- Wrap-up 3:20 - 4:00

Registered Participants

**ICIS 2007 Developers Workshop Participants**
Name	Affiliation
Shawn Yates	Agriculture and Agri-Food Canada
Fran Clarke	Agriculture and Agri-Food Canada
John Clarke	Agriculture and Agri-Food Canada
Ron Knox	Agriculture and Agri-Food Canada
Jay Ross	Agriculture and Agri-Food Canada
Devin Dahlman	Agriculture and Agri-Food Canada
Graham McLaren	IRRI-CIMMYT
Richard Bruskiewich	IRRI
Thomas Metz	IRRI
Arllet Portugal	IRRI
Warren Vincent Constantino	IRRI
Lilibeth Sison (Beth)	IRRI
Rowena Valerio (Weng)	IRRI
Clarissa Pimentel (Candy)	IRRI
Maria Corina Habito (Ching)	IRRI
Mylah Rystie Anacleto	IRRI
Roniela Prantilla (Ella)	IRRI
Grace Lee Capilit	IRRI
Casper aan den Boom	Nunhems Netherlands BV
Paul Buddiger	Nunhems Netherlands BV
Stefan van Lier	Nunhems Netherlands BV
Rob Aerts	Nunhems Netherlands BV
Jesper Nørgaard	CIMMYT
Bindiganavile Vivek	CIMMYT
Juan Carlos Alarcon	CIMMYT
Sandra Micallef	University of Queensland
Sebastien Frade	Bayer Cropscience - BioScience
Selwyn Ellis	Australian Temperate Field Crops Collection
Martin Senger	GCP/EBI
