W - Algorithms
All points are compared in a pairwise way and a coefficient of correlation (Pearson) is computed between the two series under comparison.
The coefficient of correlation is then transformed into a local similarity coefficient.
In the current version of the unique algorithm available, neither alignment nor stretching is performed.
It means that data should be aligned and monotonous.
Comparison logic:
--- spectrum comparison ---
'''W_ByIndex
'''Compare srce.m_Values[i].Y with ref.m_Values[i].Y
'''The number of values in each field should be the same
'''For each index, the distance between the source (s) and reference (r) is given by: abs(s - r) / max(abs(s), abs(r))
'''W_Interpolate
'''Compare the Y values from the source spectrum with the reference spectrum
'''For each srce.m_Values[i].X and ref.m_Values[i].X, the corresponding reference or source Y value is interpolated.
'''The distance between the source (s) and reference (r) values is given by: fabs(s - r) / max(fabs(s), fabs(r))
'''Optimistic means that values that cannot be interpolated are simply ignored. Only the range of values existing in both waves are taken into account.
W_Correlation
Compare the Y values from the source spectrum with the reference spectrum
For each srce.m_Values[i].X and ref.m_Values[i].X, the corresponding reference or source Y value is interpolated.
The source and the reference interpolated values are used to compute the Pearson correlation coefficient.
--- W peaks comparisons ---
W_sym
Scan the source and the reference values and increase the similarity every time a source value is in the range [ref.x - tolerance, ref.x + tolerance]
The final similarity = sum of the similar lanes / (source lane n° + ref lane n° - similar lane n°)
W_sympro
Scan the source and the reference values and increase the similarity every time a source value is in the range [ref.x * (1.0 - tolerance), ref.x *(1.0 + tolerance)]
The final similarity = sum of the similar lanes / (source lane n° + ref lane n° - similar lane n°)
W_id
Scan the source and the reference values and increase the similarity every time a source value is in the range [ref.x - tolerance, ref.x + tolerance]
The final similarity = sum of the similar lanes / source lane n°
W_idpro
Scan the source and the reference values and increase the similarity every time a source value is in the range [ref.x * (1.0 - tolerance), ref.x *(1.0 + tolerance)]
The final similarity = sum of the similar lanes / source lane n°
W_close
Divide the distance to the closest p_Ref band by the greatest of the two bands
At the end, sum all best links and divide by the number of comparisons (= the number of m_Values[i])
W_pearson
The correlation coefficient:
the final similarity is: sim = max_(0.0, r)
W_pearson_reverse
idem. The final similarity is: sim = max_(0.0, -r)
W_closesym
Identical to the Close algorithm, but commutative: sim = (sim(srce, ref) + sim(ref, srce)) / 2.0
W_neili
// Distance equation: Dxy = 2 * Nxy / (Nx + Ny) where:
Nxy is the number of shared lanes between the source and the reference,
Nx is the number of source lanes,
Ny is the number of reference lanes.
example 1 : Source 1010100011
Reference 1010111100
Nx = 5
Ny = 6
Nxy = 3
Dxy = 2 * 3 / (5 + 6) = 0.5455
example 2 : Source 1110011000
Reference 1110000001
Nx = 5
Ny = 4
Nxy = 3
Dxy = 2 * 3 / (5 + 4) = 0.6666
As we compare double values instead of binary values, we use the tolerance to known if the source and the reference are similar, as in the SYM algorithm.