
Multiple field curation

Full movie: BioloMICS desktop: Automated curation tool - Multiple fields (see below).

*Make sure there is an F-link field in the target table that groups the records (in this case, the Taxonomy table). This field will be used to store the statistics. This is only needed when multiple fields are used for the analysis.

Select one or several criteria to compare all records in a given cluster and mark the ones that are outliers.

The goal is to check whether each strain really belongs in its cluster and whether all of its data are present. This makes it easy to see what data are missing.

Do a search for the wanted records in the main grid of BioloMICS.

The tool will use all records currently in the grid (all pages are taken).

If only part of the data should be used, first run a query to get the wanted records.
For example, search for all records that have an ITS sequence AND a Beta tubulin sequence linked AND have a value in the field Growth media.
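The logic of this example query can be sketched in Python. This is only an illustration of the AND condition, not how BioloMICS runs queries internally; the record layout, strain names and the helper `has_all_fields` are hypothetical.

```python
# Hypothetical in-memory records; the field names mirror the example query only.
records = [
    {"name": "Strain A", "ITS": ["seq1"], "Beta tubulin": ["seq2"], "Growth media": "MEA"},
    {"name": "Strain B", "ITS": ["seq3"], "Beta tubulin": [], "Growth media": "PDA"},
    {"name": "Strain C", "ITS": ["seq4"], "Beta tubulin": ["seq5"], "Growth media": ""},
]

REQUIRED_FIELDS = ("ITS", "Beta tubulin", "Growth media")


def has_all_fields(record, fields=REQUIRED_FIELDS):
    """Return True when every required field holds a non-empty value (logical AND)."""
    return all(record.get(field) for field in fields)


# Only records that satisfy all three conditions would end up in the grid.
selected = [r["name"] for r in records if has_all_fields(r)]
print(selected)
```

Strain B is dropped because it has no Beta tubulin sequence linked, and Strain C because Growth media is empty.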
Go to Other Tools and click on Automated curation.
A popup window appears with a short explanation.
Click Next to continue.

The 'one field' option is explained in this chapter; it points out the outliers at the unit level (sequence level in this case).
3. Select the second option: ‘Select several fields with, less curation resolution’.

This option uses all selected fields to create solid clusters with associated statistics and trees, but offers fewer curation possibilities.
Click Next to continue.
In the next step, select the fields to be used for the comparisons, curation, analyses, statistics and trees.
On the right, select the algorithm that will be used for the comparisons, as well as some other options such as weight.

Here we will select:
the sequence link-field for ITS sequences - weight 2
the sequence link-field for Beta tubulin sequences - weight 2
Growth media - weight 1
Click Next to continue.
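To give an idea of what the weights 2, 2 and 1 could mean, the sketch below combines per-field similarities into one score with a weighted mean. This is an assumption made for illustration: the manual does not state how BioloMICS combines the fields, and the function name `combined_similarity` and the similarity values are hypothetical.

```python
def combined_similarity(similarities, weights):
    """Weighted mean of per-field similarities (each expected in the range 0..1).

    Assumption: fields are combined with a weighted average; the actual
    formula used by the tool is not documented here.
    """
    total_weight = sum(weights[field] for field in similarities)
    if total_weight == 0:
        raise ValueError("at least one field needs a non-zero weight")
    weighted_sum = sum(similarities[field] * weights[field] for field in similarities)
    return weighted_sum / total_weight


# Weights as selected in this example.
weights = {"ITS": 2, "Beta tubulin": 2, "Growth media": 1}

# Hypothetical similarities between two records, one value per field.
pair = {"ITS": 0.98, "Beta tubulin": 0.90, "Growth media": 0.50}

score = combined_similarity(pair, weights)
print(round(score, 3))
```

With these numbers the two sequence fields dominate the result, so a mismatch in Growth media lowers the combined score less than a mismatch in one of the sequences would.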
Then select the options for the analysis:

Hover the mouse on top of the number to see the details.
Click Next to continue.
Start analyzing records.

Analyzing step. All records having data for the clustering and fields to be analyzed will be used.

Progress will be displayed at the bottom of the popup window.

Detailed results can be accessed by double clicking on the records of the provided list and are also available at the provided path (#13).
Click Start.

Now it will do the following:
Loading selected records.
Checking for each sequence whether it is long enough to be part of the analysis; otherwise it is rejected and gets a score of -2.
Analyzing cluster per species name (in this case).
All found species names (in this case) are listed together with extra information in separate columns:
Cluster: Species name (in this case).
Medoid: Name of the strain that is the medoid of the given species.
Medoid = the record with the smallest distance to all other points in the same species (in this case).
Cluster number: Number of cluster(s) found. Note that the bad and short sequences are excluded.
Record number: Number of records found.
Average similarity to Medoid: Average similarity to medoid record.
Average similarity: Average similarity between all records.
Minimum similarity: Similarity between the furthest pair of points within the species.
Maximum similarity: Similarity between the closest pair of points within the species.
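The columns above can be illustrated with a minimal Python sketch for one cluster. It assumes that distance is the complement of similarity (distance = 1 - similarity), so the medoid is the record with the highest total similarity to the others; the function name `cluster_statistics`, the strain names and the similarity values are hypothetical.

```python
from itertools import combinations


def cluster_statistics(names, similarity):
    """Compute medoid and similarity statistics for one cluster.

    `similarity` maps frozenset({a, b}) to a similarity in the range 0..1.
    Assumption: distance = 1 - similarity, so the smallest summed distance
    corresponds to the largest summed similarity.
    """
    if len(names) < 2:
        raise ValueError("at least two records are needed")

    def sim(a, b):
        return similarity[frozenset((a, b))]

    # Medoid: record with the highest total similarity to all other records.
    medoid = max(names, key=lambda n: sum(sim(n, o) for o in names if o != n))
    to_medoid = [sim(medoid, o) for o in names if o != medoid]
    all_pairs = [sim(a, b) for a, b in combinations(names, 2)]
    return {
        "medoid": medoid,
        "record_number": len(names),
        "average_similarity_to_medoid": sum(to_medoid) / len(to_medoid),
        "average_similarity": sum(all_pairs) / len(all_pairs),
        "minimum_similarity": min(all_pairs),  # furthest pair
        "maximum_similarity": max(all_pairs),  # closest pair
    }


# Hypothetical pairwise similarities for three strains of one species.
names = ["S1", "S2", "S3"]
similarity = {
    frozenset(("S1", "S2")): 0.9,
    frozenset(("S1", "S3")): 0.8,
    frozenset(("S2", "S3")): 0.7,
}
stats = cluster_statistics(names, similarity)
print(stats["medoid"])
```

Here S1 is the medoid because it is, in total, the most similar to the other two strains. The pair S2/S3 gives the minimum similarity and the pair S1/S2 the maximum.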
Double click on a given species name to open the HTML document and to see all the details. Note that when there is only 1 record in the group then the report is not generated.
For more details about the report, see Results automated curation explained.
Click Next to continue and to complete the analysis.
Completion of the analysis.
The statistics are stored in the file link-field that was selected in #14.

https://youtu.be/z5VnyChe8Oc

The movie can also be found on YouTube: BioloMICS desktop: Automated curation tool - Multiple fields.

This movie shows how to use the Automated curation tool using multiple fields.

1. Search for wanted records (0:26)

2. Start Automated curation (0:59)

3. Select multiple fields (1:08)

4. Select fields to be used for the analysis (1:15)

5. Set options (1:37)

6. Start analyzing records (2:16)

7. Read results (2:51)

8. Complete wizard (3:57)

9. Check statistics (4:03)

All needed information is given above this line.

_________________________________________________________________________________________

Field to group records: Select the field containing the value that will be used to cluster the records (species name for example).

This criterion will be used to analyze together all the records that share the same value for it (species name for example).

Here we will select Taxon name. This means that the data in the 3 selected fields (selected in the previous step) will be grouped based on the taxon name value (species name) in the MIRRI taxonomy link-field. So all Candida albicans records are grouped together, all Cryptococcus neoformans records are grouped together, etc.
Higher level grouping (items will be grouped at genus level instead of species): If checked, the items will be grouped based on the level above. So if the record name contains the species name, then in this case the genus will be used for the grouping.
For Synlink fields: the text before the first space is taken as the higher level, and the records are grouped based on that text.
For Olink fields: the parent is taken.
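The grouping rule for Synlink fields can be sketched as follows. This is an illustration under assumptions, not the tool's implementation: the strain identifiers and the functions `group_key` and `group_records` are hypothetical, and Olink fields (where the parent record is used) are not covered.

```python
from collections import defaultdict


def group_key(taxon_name, higher_level=False):
    """Return the grouping value for a taxon name.

    With higher_level=True the text before the first space is used,
    which for a species name such as "Candida albicans" is the genus.
    """
    if higher_level:
        return taxon_name.split(" ", 1)[0]
    return taxon_name


def group_records(records, higher_level=False):
    """Group (strain, taxon name) pairs by species or by genus."""
    groups = defaultdict(list)
    for strain, taxon_name in records:
        groups[group_key(taxon_name, higher_level)].append(strain)
    return dict(groups)


# Hypothetical strains with their taxon names.
records = [
    ("Strain 1", "Candida albicans"),
    ("Strain 2", "Candida albicans"),
    ("Strain 3", "Candida glabrata"),
    ("Strain 4", "Cryptococcus neoformans"),
]

by_species = group_records(records)
by_genus = group_records(records, higher_level=True)
print(by_species)
print(by_genus)
```

Without the option, the four strains fall into three species groups. With higher level grouping, Strain 1 to Strain 3 are all placed in one Candida group.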