O - Algorithms

OFields are compared by counting how many parents they have in common and dividing that count by a path length. Which path length is used depends on the algorithm selected.

The root item (always with Id = 0 in the database) is not taken into account.

Example:

Using the following data set:

Id	Record name	OField full path, including the record Id
0	Root (not in the database, but the root of all records in memory)	-
1	Europe	0.1
2	North America	0.2
3	South America	0.3
4	Asia	0.4
7	Belgium	0.1.7
8	Germany	0.1.8
10	USA	0.2.10
43	Brussels region	0.1.7.43
44	Walloon region	0.1.7.44
52	Brussels city	0.1.7.43.52
65	Namur province	0.1.7.44.65
148	Gesves	0.1.7.44.148

Comparing Europe with North America gives a similarity of 0. The path of Europe is "0.1", whereas the path of North America is "0.2". Once the root value of zero is removed, no common value remains.
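The comparison above can be sketched in a few lines of Python. This is only an illustration, not BioloMICS code: the function names `strip_root` and `common_values` are hypothetical, and the sketch assumes that common parents always form a shared leading part of the two paths, which holds when every record has a single parent.

```python
def strip_root(full_path: str) -> list[int]:
    """Split a dotted OField path and drop the leading root Id (0)."""
    ids = [int(part) for part in full_path.split(".")]
    # The root always has Id = 0 and is shared by every record, so it is ignored.
    return ids[1:] if ids and ids[0] == 0 else ids


def common_values(path_a: list[int], path_b: list[int]) -> int:
    """Count the leading Ids shared by both paths (assumption: shared prefix)."""
    count = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        count += 1
    return count


europe = strip_root("0.1")
north_america = strip_root("0.2")
print(common_values(europe, north_america))  # 0: nothing in common once the root is removed

brussels_city = strip_root("0.1.7.43.52")
gesves = strip_root("0.1.7.44.148")
print(common_values(brussels_city, gesves))  # 2: Europe (1) and Belgium (7)
```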

There are three algorithms available:

Classification algorithm (Similarity=common values/longest path)
Proportional identification algorithm (Similarity=common values/shortest path)
Identification algorithm (Similarity=presence of most detailed source value in path of reference)
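As a rough sketch of how the three formulas differ, the Python below applies each of them to Gesves and Belgium from the data set above. The function names are hypothetical and the code is not taken from BioloMICS. It assumes paths are given without the root, are never empty, and that "common values" means the shared leading Ids of the two paths.

```python
def parse(path_no_root: str) -> list[int]:
    """Turn a dotted path without the root, such as '1.7.44.148', into a list of Ids."""
    return [int(part) for part in path_no_root.split(".")]


def shared(source: list[int], ref: list[int]) -> int:
    """Number of leading Ids that the two paths have in common."""
    count = 0
    for s, r in zip(source, ref):
        if s != r:
            break
        count += 1
    return count


def classification(source: list[int], ref: list[int]) -> float:
    # Common values divided by the length of the longest path.
    return shared(source, ref) / max(len(source), len(ref))


def proportional_identification(source: list[int], ref: list[int]) -> float:
    # Common values divided by the length of the shortest path.
    return shared(source, ref) / min(len(source), len(ref))


def identification(source: list[int], ref: list[int]) -> float:
    # 1.0 if the most detailed (last) source value appears in the reference path.
    return 1.0 if source[-1] in ref else 0.0


gesves = parse("1.7.44.148")
belgium = parse("1.7")

print(classification(gesves, belgium))               # 0.5
print(proportional_identification(gesves, belgium))  # 1.0
print(identification(gesves, belgium))               # 0.0
print(identification(belgium, gesves))               # 1.0
```

Only the identification formula depends on which record is the source: Gesves is not part of the path of Belgium, but Belgium is part of the path of Gesves.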

The table below shows a few possible comparisons. The part of the path used to compute the similarity is given in bold.

Source record	Ref record	Source path (no root)	Ref path (no root)	Classification algorithm (Similarity=common values/longest path)	Proportional identification algorithm (Similarity=common values/shortest path)	Identification algorithm (Similarity=presence of most detailed source value in path of reference)
Gesves	Belgium	1.7.44.148	1.7	2/4=0.5	2/2=1.0	148 not in 1.7 so similarity=0.0
Belgium	Gesves	1.7	1.7.44.148	2/4=0.5	2/2=1.0	7 is in 1.7.44.148 so similarity =1.0
USA	Europe	2.10	1	0	0	0
Belgium	Germany	1.7	1.8	1/2 = 0.5	1/2 = 0.5	7 is not in 1.8 so similarity = 0.0
Village 2	Village 2	1.7.44.108.215.98.132	1.7.44.108.117	4/7 = 0.57	4/5 = 0.8	132 is not in 1.7.44.108.117 so similarity = 0.0

We see that the similarity is the number of identical parents divided by the possible number of parents. That divisor is the length of the longer of the two paths for the classification algorithm, and the length of the shorter path for the proportional identification algorithm.

The root is always ignored in the comparison, as it exists for all records.