BioloMICS menu

W - Algorithms

All points are compared pairwise, and a Pearson correlation coefficient is computed between the two series under comparison.

The coefficient of correlation is then transformed into a local similarity coefficient.

In the current version, the only algorithm available performs neither alignment nor stretching.

This means that the data must already be aligned and monotonic.

Comparison logic:

--- spectrum comparison ---

W_ByIndex

Compare srce.m_Values[i].Y with ref.m_Values[i].Y

The number of values in each field should be the same

For each index, the distance between the source (s) and reference (r) is given by: abs(s - r) / max(abs(s), abs(r))
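
The per-index distance above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and two points are assumptions not stated in this page: a pair of zero values is given distance 0, and the per-index distances are combined by averaging.

```python
def by_index_distances(source, reference):
    """Per-index distance abs(s - r) / max(abs(s), abs(r)) between two Y series."""
    if len(source) != len(reference):
        raise ValueError("source and reference must have the same number of values")
    distances = []
    for s, r in zip(source, reference):
        denom = max(abs(s), abs(r))
        # Assumption: when both values are 0 they are treated as identical.
        distances.append(0.0 if denom == 0 else abs(s - r) / denom)
    return distances


def by_index_mean_distance(source, reference):
    """Average of the per-index distances (the averaging step is an assumption)."""
    distances = by_index_distances(source, reference)
    return sum(distances) / len(distances) if distances else 0.0
```

For example, comparing the Y values [1.0, 2.0, 0.0] with [1.0, 4.0, 0.0] gives per-index distances of 0, 0.5 and 0.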

W_Interpolate

Compare the Y values from the source spectrum with the reference spectrum

For each srce.m_Values[i].X and ref.m_Values[i].X, the corresponding reference or source Y value is interpolated.

The distance between the source (s) and reference (r) values is given by: fabs(s - r) / max(fabs(s), fabs(r))

Optimistic means that values that cannot be interpolated are simply ignored. Only the range of values existing in both waves is taken into account.
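
A minimal Python sketch of the interpolated comparison, under stated assumptions: the interpolation is linear (this page does not say which kind is used), spectra are lists of (X, Y) pairs sorted by increasing X, and the per-point distances are averaged. All function names are hypothetical.

```python
from bisect import bisect_left


def interpolate(points, x):
    """Linearly interpolate Y at x from (X, Y) points sorted by X; None outside the range."""
    xs = [p[0] for p in points]
    if not xs or x < xs[0] or x > xs[-1]:
        return None
    i = bisect_left(xs, x)
    if xs[i] == x:
        return points[i][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def interpolated_pairs(source, reference):
    """(s, r) Y pairs at every X found in either spectrum, restricted to the shared range."""
    xs = sorted({x for x, _ in source} | {x for x, _ in reference})
    pairs = []
    for x in xs:
        s, r = interpolate(source, x), interpolate(reference, x)
        if s is None or r is None:
            continue  # "optimistic": values that cannot be interpolated are ignored
        pairs.append((s, r))
    return pairs


def interpolate_distance(source, reference):
    """Mean of abs(s - r) / max(abs(s), abs(r)) over the pairs (averaging is an assumption)."""
    pairs = interpolated_pairs(source, reference)
    total = 0.0
    for s, r in pairs:
        denom = max(abs(s), abs(r))
        total += 0.0 if denom == 0 else abs(s - r) / denom
    return total / len(pairs) if pairs else 0.0
```

With source [(0, 0), (2, 2)] and reference [(1, 2), (3, 2)], only X = 1 and X = 2 lie in both ranges, so X = 0 and X = 3 are skipped.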

W_Correlation

Compare the Y values from the source spectrum with the reference spectrum

For each srce.m_Values[i].X and ref.m_Values[i].X, the corresponding reference or source Y value is interpolated.

The source and the reference interpolated values are used to compute the Pearson correlation coefficient.
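
As an illustration, the Pearson correlation coefficient of two already interpolated Y series can be computed as in the sketch below. The function name is hypothetical, and returning 0 for a constant series (where the coefficient is undefined) is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption: r is undefined for a constant series; report 0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)
```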

--- W peaks comparisons ---

W_sym

Scan the source and the reference values and increase the similarity every time a source value is in the range [ref.x - tolerance, ref.x + tolerance]

The final similarity = number of similar lanes / (number of source lanes + number of reference lanes - number of similar lanes)
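
A short Python sketch of this computation, assuming that a source lane counts as similar when it falls within the tolerance of at least one reference lane, and that each source lane is counted at most once. The function names are hypothetical.

```python
def count_similar_lanes(source, reference, tolerance):
    """Number of source values lying in [r - tolerance, r + tolerance] for some reference r."""
    count = 0
    for s in source:
        # Assumption: one match is enough, and a source lane is counted only once.
        if any(r - tolerance <= s <= r + tolerance for r in reference):
            count += 1
    return count


def sym_similarity(source, reference, tolerance):
    """similar / (source lanes + reference lanes - similar)."""
    similar = count_similar_lanes(source, reference, tolerance)
    denom = len(source) + len(reference) - similar
    return similar / denom if denom else 0.0
```

With source [10.0, 20.0, 30.0], reference [10.2, 25.0, 30.1, 40.0] and a tolerance of 0.5, two lanes are similar, giving 2 / (3 + 4 - 2) = 0.4.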

W_sympro

Scan the source and the reference values and increase the similarity every time a source value is in the range [ref.x * (1.0 - tolerance), ref.x *(1.0 + tolerance)]

The final similarity = number of similar lanes / (number of source lanes + number of reference lanes - number of similar lanes)

W_id

Scan the source and the reference values and increase the similarity every time a source value is in the range [ref.x - tolerance, ref.x + tolerance]

The final similarity = number of similar lanes / number of source lanes

W_idpro

Scan the source and the reference values and increase the similarity every time a source value is in the range [ref.x * (1.0 - tolerance), ref.x *(1.0 + tolerance)]

The final similarity = number of similar lanes / number of source lanes
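
The absolute and proportional tolerance ranges, and the normalization by the number of source lanes, can be sketched as below. This is an illustration only: the function names and the `proportional` flag are hypothetical, the proportional range is assumed to apply to non-negative reference positions, and a source lane is assumed to count once as soon as any reference lane matches.

```python
def similar_lanes(source, reference, tolerance, proportional=False):
    """Count source values falling inside the tolerance range of some reference value."""
    count = 0
    for s in source:
        for r in reference:
            if proportional:
                # Range [r * (1 - tolerance), r * (1 + tolerance)]; assumes r >= 0.
                low, high = r * (1.0 - tolerance), r * (1.0 + tolerance)
            else:
                # Range [r - tolerance, r + tolerance].
                low, high = r - tolerance, r + tolerance
            if low <= s <= high:
                count += 1
                break  # Assumption: each source lane is counted at most once.
    return count


def id_similarity(source, reference, tolerance, proportional=False):
    """similar lanes / number of source lanes (not symmetric in source and reference)."""
    if not source:
        return 0.0
    return similar_lanes(source, reference, tolerance, proportional) / len(source)
```

Because the result is divided by the number of source lanes only, swapping the source and the reference generally changes the value.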

W_close

For each source band, divide the distance to the closest p_Ref band by the greater of the two band values

At the end, sum all best links and divide by the number of comparisons (= the number of m_Values[i])

W_pearson

Compute the Pearson correlation coefficient r between the source and the reference.

The final similarity is: sim = max(0.0, r)

W_pearson_reverse

Same as W_pearson, but with the sign of r reversed. The final similarity is: sim = max(0.0, -r)
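
The two clamping rules can be written as a tiny Python sketch. The function name and the `reverse` flag are hypothetical; r is assumed to be a correlation coefficient already computed elsewhere.

```python
def pearson_similarity(r, reverse=False):
    """Clamp a correlation coefficient r to a similarity in [0, 1].

    reverse=False keeps positive correlations; reverse=True keeps negative ones.
    """
    return max(0.0, -r if reverse else r)
```

A correlation of -0.8 therefore gives a similarity of 0 with the direct rule and 0.8 with the reversed rule.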

W_closesym

Identical to the Close algorithm, but commutative: sim = (sim(srce, ref) + sim(ref, srce)) / 2.0
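
A hedged Python sketch of the Close computation and its symmetric variant. It follows the description literally: for each band, the distance to the closest band of the other set is divided by the greater of the two values, and the results are averaged. The function names are hypothetical, and whether the program converts this value further (for instance to 1 minus the value) is not stated in this page, so no such step is applied here.

```python
def close_score(source, reference):
    """Mean over source bands of |s - nearest| / max(|s|, |nearest|)."""
    if not source or not reference:
        return 0.0
    total = 0.0
    for s in source:
        nearest = min(reference, key=lambda r: abs(s - r))  # closest reference band
        greatest = max(abs(s), abs(nearest))
        # Assumption: two zero-valued bands contribute 0.
        total += 0.0 if greatest == 0 else abs(s - nearest) / greatest
    return total / len(source)  # divide by the number of comparisons


def closesym_score(source, reference):
    """Symmetric variant: average of both comparison directions."""
    return (close_score(source, reference) + close_score(reference, source)) / 2.0
```

Averaging both directions makes the result independent of which profile is taken as the source.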

W_neili

Distance equation: Dxy = 2 * Nxy / (Nx + Ny), where:

Nxy is the number of shared lanes between the source and the reference,

Nx is the number of source lanes,

Ny is the number of reference lanes.

Example 1:

Source 1010100011

Reference 1010111100

Nx = 5

Ny = 6

Nxy = 3

Dxy = 2 * 3 / (5 + 6) = 0.5455

Example 2:

Source 1110011000

Reference 1110000001

Nx = 5

Ny = 4

Nxy = 3

Dxy = 2 * 3 / (5 + 4) = 0.6667

Because we compare double values rather than binary values, the tolerance is used to decide whether a source lane and a reference lane are similar, as in the SYM algorithm.
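
The equation and the two binary examples can be checked with the Python sketch below, which also includes a tolerance-based variant for double values. The function names are hypothetical, and the rule that a source lane is shared when it lies within the tolerance of at least one reference lane is an assumption modeled on the SYM description.

```python
def nei_li_binary(source_bits, reference_bits):
    """Dxy = 2 * Nxy / (Nx + Ny) for two equally long strings of '0' and '1'."""
    nx = source_bits.count("1")
    ny = reference_bits.count("1")
    nxy = sum(1 for s, r in zip(source_bits, reference_bits) if s == r == "1")
    return 2 * nxy / (nx + ny) if nx + ny else 0.0


def nei_li(source, reference, tolerance):
    """Same equation for double values; lanes are shared when within the tolerance."""
    nxy = sum(
        1
        for s in source
        if any(r - tolerance <= s <= r + tolerance for r in reference)
    )
    total = len(source) + len(reference)
    return 2 * nxy / total if total else 0.0
```

Calling nei_li_binary("1010100011", "1010111100") reproduces example 1 (6 / 11), and nei_li_binary("1110011000", "1110000001") reproduces example 2 (6 / 9).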