Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\).
Let \(p_w(n)\) be the cardinality of \(P_w(n)\).
Let \(f(c)\) be the FASTA sequence of the Protein Data Bank entry with 4-symbol code \(c\).
For example, \(|P_{f(8WPU_1)}(2) \setminus P_{f(7CDF_1)}(2)|=85\) and
\(|P_{f(7CDF_1)}(2) \setminus P_{f(8WPU_1)}(2)|=45\).
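As an illustration of the definitions above, here is a minimal Python sketch of \(P_w(n)\), \(p_w(n)\) and the two set differences being counted. The names `subwords` and `subword_count` and the toy words are hypothetical, chosen only for this example; they are not the protein sequences.

```python
def subwords(w: str, n: int) -> set:
    """Return P_w(n): the set of distinct length-n subwords (intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_count(w: str, n: int) -> int:
    """Return p_w(n), the cardinality of P_w(n)."""
    return len(subwords(w, n))


# Toy words (assumptions for illustration, not real FASTA data).
x, y = "ABABC", "BBAAB"
only_in_x = subwords(x, 2) - subwords(y, 2)  # P_x(2) \ P_y(2)
only_in_y = subwords(y, 2) - subwords(x, 2)  # P_y(2) \ P_x(2)
print(subword_count(x, 2), len(only_in_x), len(only_in_y))
```

Using a Python `set` makes the subwords distinct automatically, and a word shorter than \(n\) simply yields the empty set.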
Let
\(
Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)|
\)
be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).
Hydrophobic-polar version of Sequence 1:
10011110011011110000000001010101100010001011111111101111100001000100100100010110110111111001000111110101100110000010011010101110001001010010000001100111111010110011101111101101001000011000001001100110000010111011001010111011100001011100100010000101010011000000001001101100001011111001101011100110001010111100111000111110010111101111101101111001100101000100111001100010001001101111100110100001001000001101100100010010010100001010001011100110110010001110111001001010010110110010010100011001010001011100011010101001011100110001010010011100001110110001110000000111000111010100010010010100000001010000100110000000011001011010011111101111111110111111110100011101000010011110110010001111101001000100111110111010011100001111101011001000111101011111100110111011110011100000001000111100001011111111100011111011111000011001001011010111111111011110100010110110111111101111101110010111101000010010000110110111010100010011110111100111010011101000110010011001110011001110101100110010011010101111001101001101001101101100001011110101110110101100110100111110100101010110100110001101010111010100
| Pair | \(Z_2\) | Length of longest common subsequence |
|---|---|---|
| 8WPU_1, 7CDF_1 | 130 | 5 |
| 8WPU_1, 7WTQ_1 | 319 | 4 |
| 7CDF_1, 7WTQ_1 | 283 | 3 |
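The \(Z_2\) column can be reproduced from the subword sets: \(Z_k\) is the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), so the first row is \(85+45=130\). The sketch below uses hypothetical helper names and toy inputs. It also divides by \(|P_x(k)\cup P_y(k)|\) to obtain the usual Jaccard distance; that normalization is the standard definition and is an assumption here, since the text only uses the numerator.

```python
def subwords(w: str, k: int) -> set:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    # The ^ operator on sets is the symmetric difference.
    return len(subwords(x, k) ^ subwords(y, k))


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by |P_x(k) union P_y(k)| (assumed normalization)."""
    union = subwords(x, k) | subwords(y, k)
    return z(x, y, k) / len(union) if union else 0.0


# Toy words, not the protein sequences.
print(z("ABABC", "BBAAB", 2), jaccard_distance("ABABC", "BBAAB", 2))
```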
Newick tree:
(7WTQ_1:17,(8WPU_1:65,7CDF_1:65):10);
Let \(d\) be the distance called \(d\) by Otu and Sayood.
Let \(d_1\) be their distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
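For orientation, here is a sketch of how these two distances can be computed. It assumes the definitions as they are commonly stated from Otu and Sayood, \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=(c(xy)-c(x))+(c(yx)-c(y))\), where \(c\) is the number of components in the LZ76 exhaustive history; these should be checked against the original paper. The function names are hypothetical, and the parsing loop is a naive version suited only to short inputs.

```python
def lz76_complexity(s: str) -> int:
    """Number of components in the LZ76 exhaustive history of s (naive)."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the component while it already occurs in the text
        # preceding its last character.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def directed_costs(x: str, y: str):
    """Return (c(xy) - c(x), c(yx) - c(y)); definitions assumed, see lead-in."""
    a = lz76_complexity(x + y) - lz76_complexity(x)
    b = lz76_complexity(y + x) - lz76_complexity(y)
    return a, b


def otu_sayood_d(x: str, y: str) -> int:
    """Assumed form of d: the larger of the two directed costs."""
    return max(directed_costs(x, y))


def otu_sayood_d1(x: str, y: str) -> int:
    """Assumed form of d_1: the sum of the two directed costs."""
    return sum(directed_costs(x, y))


print(lz76_complexity("AACGTACCATTG"), lz76_complexity("AAAAAA"))
```

Under these assumptions a constant word such as AAAAAA has complexity 2 (components A and AAAAA), however long it is.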
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1745}{\log_{20}1745}-\frac{669}{\log_{20}669}\right)\approx 266.\)
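The arithmetic in this estimate can be checked directly. The sketch below only evaluates the expression as written; the helper name `length_term` is hypothetical, and the factors 0.85 and 0.8 are taken from the text without further interpretation. The expression evaluates to about 266.7, which the text reports as 266.

```python
import math


def length_term(n: int, alphabet_size: int = 20) -> float:
    """Evaluate n / log_A(n) for alphabet size A (20 in the text's formula)."""
    return n / math.log(n, alphabet_size)


# Constants copied from the formula in the text.
estimate = 0.85 * 0.8 * (length_term(1745) - length_term(669))
print(estimate)
```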
| Status | Protein1 | Protein2 | \(d\) | \(d_1/2\) |
|---|---|---|---|---|
| Query variables | 8WPU_1 | 7CDF_1 | 341 | 275 |
In notation analogous to
[Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[
\delta=
\alpha \min + (1-\alpha) \max=
\begin{cases}
d &\alpha=0,\\
d_1/2 &\alpha=1/2
\end{cases}
\]
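The interpolation can be sketched numerically. The code assumes that min and max are taken over two one-directional quantities \(a\) and \(b\), so that \(\alpha=0\) gives \(\max(a,b)\) and \(\alpha=1/2\) gives \((a+b)/2\); the function name `delta` is hypothetical. The value 209 is not in the text: it is inferred from the tabulated \(d=341\) and \(d_1/2=275\) under this assumption, and only if those tabulated values are exact.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# Inferred pair (assumption): max = 341, and (min + max) / 2 = 275 gives min = 209.
a, b = 209, 341
print(delta(0, a, b), delta(0.5, a, b))
```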