Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\).
Let \(p_w(n)\) be the cardinality of \(P_w(n)\).
Let \(f(c)\) be the FASTA sequence with Protein Data Bank identifier \(c\): a 4-symbol code followed by an entity number, as in 8OHD_1.
For example,
\(|P_{f(8OHD_1)}(2) \setminus P_{f(5VFW_1)}(2)|=16\) and
\(|P_{f(5VFW_1)}(2) \setminus P_{f(8OHD_1)}(2)|=24\).
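The definitions of \(P_w(n)\) and \(p_w(n)\) can be illustrated with a minimal Python sketch. The function names `subwords` and `subword_count` and the toy words are assumptions made for illustration; they are not the actual FASTA sequences.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_count(w: str, n: int) -> int:
    """p_w(n): the cardinality of P_w(n)."""
    return len(subwords(w, n))


# Toy words (hypothetical, not real protein sequences).
x = "ABABC"
y = "BCBCA"

only_in_x = subwords(x, 2) - subwords(y, 2)  # P_x(2) \ P_y(2)
only_in_y = subwords(y, 2) - subwords(x, 2)  # P_y(2) \ P_x(2)

print(sorted(subwords(x, 2)), subword_count(x, 2))
print(sorted(only_in_x), sorted(only_in_y))
```

Applied to two real sequences, the two set differences would give one-sided counts of the same kind as the 16 and 24 reported above.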
Let
\(
Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)|
\)
be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).
Hydrophobic-polar version of Sequence 1:
01011000011100111010110110001001110001110101001100110111111111111100110011110000000110110110111011101111111110001101001110000010000101101111010111101010110101011111100010000001101001000101111110001110000000110011110001100010111011010111100110110110000011010100111000111000000011110011100100011111010110001111011101101110000100011110011101001101011110011011001101110100101111111110011111111000011111111110001111111010111100100111111011101110111100010101100010001111110001100011011011100011001010011011000110111000000001000000100000000110000000100010000000000000010010000000000000000001111111101110000110111010111110111011101111001111101111001101111110010000001100110110011001001001110101000001001011011010100101100110000111101100111111100011011111111011000111111000010001000100010001000000000000000100000100000011000010100000000011111110101011100111101101101101101101101101101101101101101101111001111000000001110100101100000001101101101000100111000011110011111110111100010010010100000000000000110100010000010111111000000101111111100000000101111101010011010000000101111111001110010000000010110101100100000001000000000000101000001000011011011111111010010101011100111111011110111001000001101010000111011100101001001110001111111100000001111001010101010000001111111111101101111011101010111100110110110100110010001000110001000011110101110011111100011010101010111001111100010101111100100101101011011111011111001101010001001100111101111000011110000000110001001111101010010011000100001000100101001111111011110101110101010100111100011111101101110010100011101111011110011111111000011011111000101101100001101010111001100100011000111010111110111111001100111001000110110011000000001111000000001111011001101000001011100011010100000100101011000010001101111011101100111110000111100111101100001100010000011100001110111011111100011000100110101111001110101111010111010001101110010000011011101111001101001011
11011100111010011100111101000110100110100010011100001111111101001100110101110110111101101100101111100111100010011111101010110110001000100111001100110000111110111011010011110100111000101000110010010011011001111101110111110110110110110101010101010101010110101010011111101101101101101101101111101011110000000000100000000000101000000000000000001000101000010000001000001111000010111010010100101101110111111100100101101110000111100011110101110001110111100100101110101110000110110110110111010001110111110000111110011110111111111000010101110110110011101011100110011000011111101110111010010000111111101110110110000010010000011001100111111111001110001110000011100011110110111110111010010111101000110101101101011001100001111111001101111100001111111100000000000001011111101111010000111101110001000011111111110001010000111111010010110000110110100011011100000100110000011111000111111111110101110000101001110010100010100010110111000001111011101100000110101001111011010111011111111001101110011100010110000111101111100110000111110011100110011100111101011110111100111010101001011001110111101001001000000001010001111010000000010110000000001000010000101010100100010000000000100001010000000000000000000000010000001000000000000000111111101001010111110110110111111111111100111101101111100110110110001001011110000110110111110101100000010111111110001110100011111100110110110110110000111010111001110000000010111001000011001011011101001011001000001111110001101110100110101000000000001000010100001001010101010001001111101111110110011101101101100110111011011110111101100010000000100001000000011000010001000000100000000000000001101010110110110110110111011011111110010111001100000000100111000100000111100101100001010110100001000011001101000110110011000111100110101110011111110001100100011001111011110100101111100010110111010011010110101100000100011010000111010011110111111100011011110101110111011011111011001011000000011110110011101000010010001100110110101010111011101110111100000100100000100010010001101111001011001111111011100011
01111001101111111111110000100111000110000110001101011011111110101111110101111011101111110000011010000000110100000101111110001111011110001001100001011100100110111101001001000011001000000010011000110111101111111011100001111110000010000011010011101000110010101001100111010110001000011111011010011101111110001100111101101010001001110110110101110100001111011100011111111011111000000101111011111110111110001000110000110000011010111010111001011111011110000101100000001100000011100001110111111010011111110010010111110110011000101101100111010001011011010010000001100000011010011000000001001001011110111100010011101001110010001000100110111111010111001110001110010010111101110011000010000100110110101001001001011011000010001101011111111001011100011101000110101010100011001111110011011110111100100100010111100101100111010000011100111100001000111011110110101101101001011110000110011000011101100110000001000100000100110111001000000000000101010000101010101111111010101000010010101001111001111000110101111010000001000011111101111010110011111110110010000000100010010101001010100010111111000110100111001000101110110001000001110011110000101010110111101100000001001011000100111110011000001101011111000100
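The quantity \(Z_k(x,y)\) defined above is the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). A minimal Python sketch follows; the names `z_k` and `jaccard_distance` and the short binary words are assumptions chosen for illustration. Dividing by the size of the union, which gives the full Jaccard distance, is shown only for context.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by |P_x(k) union P_y(k)|; defined as 0.0 when both sets are empty."""
    union = subwords(x, k) | subwords(y, k)
    return z_k(x, y, k) / len(union) if union else 0.0


# Toy hydrophobic-polar style words (hypothetical, not the sequences above).
x = "0101100"
y = "1100011"
print(z_k(x, y, 3), jaccard_distance(x, y, 3))
```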
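\[
\begin{array}{lrr}
\text{Pair} & Z_2 & \text{Length of longest common subsequence}\\
\hline
\mathrm{8OHD\_1,\ 5VFW\_1} & 40 & 1\\
\mathrm{8OHD\_1,\ 4RFE\_1} & 144 & 4\\
\mathrm{5VFW\_1,\ 4RFE\_1} & 146 & 2
\end{array}
\]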
Newick tree
(4RFE_1:82.91,(8OHD_1:20,5VFW_1:20):62.91);
Let \(d\) denote the Otu--Sayood distance \(d\).
Let \(d_1\) denote the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
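As a rough illustration, the two distances can be sketched in Python on top of the LZ76 complexity \(c\), meaning the number of components in the exhaustive history. The sketch assumes the commonly cited Otu--Sayood definitions \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=(c(SQ)-c(S))+(c(QS)-c(Q))\). The function names and the toy sequences are hypothetical, and the quadratic-time parsing is for clarity rather than speed.

```python
def lz76_complexity(s: str) -> int:
    """Number of components in the LZ76 exhaustive history of s."""
    i, count = 0, 0
    while i < len(s):
        length = 1
        # Grow the component while it can still be copied from earlier text
        # (the copied occurrence may overlap the component itself).
        while i + length <= len(s) and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def otu_sayood(s: str, q: str) -> tuple[int, int]:
    """Return (d, d1) under the definitions assumed in the lead-in."""
    s_then_q = lz76_complexity(s + q) - lz76_complexity(s)  # c(SQ) - c(S)
    q_then_s = lz76_complexity(q + s) - lz76_complexity(q)  # c(QS) - c(Q)
    return max(s_then_q, q_then_s), s_then_q + q_then_s


# Toy sequences (hypothetical); AACGTACCATTG parses as A.AC.G.T.ACC.AT.TG.
s, q = "AACGTACCATTG", "AAAAAA"
print(lz76_complexity(s), lz76_complexity(q))
print(otu_sayood(s, q))
```

Since \(d\) is the larger of the two one-sided terms and \(d_1\) is their sum, \(d \le d_1 \le 2d\) always holds under these definitions.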
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{5095}{\log_{20}5095}-\frac{25}{\log_{20}25}\right)\approx 1200\).
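The arithmetic in this estimate can be checked with a few lines of Python. The function and parameter names are assumptions made for illustration; the text does not say what the factors 0.85 and 0.8 represent, so they are passed through as plain multipliers.

```python
import math


def n_over_log(n: int, alphabet_size: int = 20) -> float:
    """n / log_b(n), with b the alphabet size (20 in the estimate above)."""
    return n / math.log(n, alphabet_size)


def rough_expected_distance(n_long: int, n_short: int,
                            factor_a: float = 0.85,
                            factor_b: float = 0.8) -> float:
    """Evaluate factor_a * factor_b * (n_long/log n_long - n_short/log n_short)."""
    return factor_a * factor_b * (n_over_log(n_long) - n_over_log(n_short))


# Lengths 5095 and 25 are the values used in the estimate above.
print(round(rough_expected_distance(5095, 25), 1))
```

The expression evaluates to slightly more than 1200, so the figure in the text is a rounded value.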
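\[
\begin{array}{lllrr}
\text{Status} & \text{Protein1} & \text{Protein2} & d & d_1/2\\
\hline
\text{Query variables} & \mathrm{8OHD\_1} & \mathrm{5VFW\_1} & 747 & 381.5
\end{array}
\]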
In notation analogous to
[Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[
\delta=
\alpha \mathrm{min} + (1-\alpha) \mathrm{max}=
\begin{cases}
d &\alpha=0,\\
d_1/2 &\alpha=1/2
\end{cases}
\]
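The interpolation can be sketched in Python, assuming that min and max denote the smaller and larger of the two one-sided quantities that make up the distance. The function name `delta` is hypothetical. The values 747 and 381.5 are the \(d\) and \(d_1/2\) reported for 8OHD_1 and 5VFW_1. The smaller term, 16, is not stated in the text; it is inferred here from \(2(381.5)-747\).

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# Larger term taken from d = 747; smaller term inferred from d1/2 = 381.5
# (an assumption based on the case formula above, not a reported value).
larger = 747
smaller = 2 * 381.5 - larger

print(delta(smaller, larger, 0))    # alpha = 0 recovers d
print(delta(smaller, larger, 0.5))  # alpha = 1/2 recovers d1/2
```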