Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\).
Let \(p_w(n)=|P_w(n)|\) be the cardinality of \(P_w(n)\).
Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
\(|P_{f(9NWD_1)}(2) \setminus P_{f(5VGY_1)}(2)|=222\),
\(|P_{f(5VGY_1)}(2) \setminus P_{f(9NWD_1)}(2)|=3\).
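As a minimal sketch of these definitions (the function names `subwords` and `subword_complexity` are illustrative, not taken from the page), the sets \(P_w(n)\) and the two one-sided differences can be computed as follows on toy words:

```python
def subwords(w, n):
    """Return P_w(n): the set of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w, n):
    """Return p_w(n), the cardinality of P_w(n)."""
    return len(subwords(w, n))


# Toy words standing in for f(c); real input would be FASTA sequences.
x = "ABABC"
y = "BCBC"

print(sorted(subwords(x, 2)))                   # ['AB', 'BA', 'BC']
print(subword_complexity(x, 2))                 # 3
print(len(subwords(x, 2) - subwords(y, 2)))     # 2  (AB and BA occur only in x)
print(len(subwords(y, 2) - subwords(x, 2)))     # 1  (CB occurs only in y)
```

The two printed differences play the role of the counts 222 and 3 reported above for the actual sequences.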
Let
\(
Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)|
\)
be a LZ76 style (set of subwords) Jaccard distance numerator for \(P(k)\).Hydrophobic-polar version of Sequence 1:11001100111111111011011000111011101110100011010011011101100000110000000110001111000010010011100010011110011101011010010010110000111110110010001000011010111001011001001001000001101101010000101011001001101001100110000110100000000001011000011110011010011100011010101101100100100111100111111011010110011001110101110010111101101001001011010010111011001111011010010010000000000001101100010100111011000001110000010111110110011111010101110010000111110100110000011101011110100111110111100110110011001010110011000111111011100001001001100111101110110000001011000001010001010000000000010000000000000011110110001010000111111111111000101001000111001011100001011111100110110001100000110001010100001101101100100011010000011111001000110001001010001100111111001111101010010110011111000101010101100101100110010001101111000001010010111111001000100111011101100101010101011111110111110011000001110110010001101111110100000000100110011001000100010001110101001101010000100100110011100110000101110111111001001000000010110101100011111011111110000100101001010000110010100010100010110001100110100100110110000000100010110001100100100100001101001101001001011100101010001001110001000001000111101011000101001011001000100001000010111111111000001001110110011001001000100100001101101001110001100001011011011010000111110100011011010101111000101101100111111000000110100001001111100001110100011010001110010100110001100010010001000010011001100010110111101000101010001101100110100001010110101011011010110101110010001100000101100000110110001100000110110111110101110011101010111011111101101101110101001110110000001000011001010110100111100000110011010011000010110010101000110100011001110000001000000010001001010000110001000000011011110010101000000100100101100011000100111100010011000100011000101000110010000110010101001011000010000100010100001000101011111101101110110001001011100001001100100100110100011110110001110010100010010010
01101011001110110010100001110000101011010111001000000101001101111101101010100000111011000011010001010001110101101011101111110000111101011010010101101010111100010010111000100111110011010001100100100111010011010000100000011111101000011011110000100111010000101101111010000110000111001001100111100100001111111101001110010011101010011110001000000001111000101010110100000110101010011011011000001010000000101110110000010010111001101001001000100011011000111101010000001110110101100110011001011100101010000110111000011010001011111010111101101101010000011110011001101010010100100001010000111000010110011100101100011111110000000110011011101111101000000111010000010000000111001100100000010010101100111010011110100110100001101000110010010001010000110011001101010010011111101111001010101110110100000100100100101011101011101000011101101000000101100100001110000000001000100011101101000000000100101011000111001001001010110111111011111101110100001101111101000001000011110011101011000010110100001011100001001100100100011001101100011011001110001010001010101010111110100001110000100010100101010001010000010110111100110011010011110111010111110001010000001110011001110111000010000000110010111101101110000010000100000110010111110011100010110011001000000001110001101000001101011110001010110110100011001110110010010000001111110001101100011100011100010011111010000000100100100010110011000111101011010010110000110110010101011100010100101000011011101011100110111101100110100111111100100010000111110010100000000000000000000010001000001001110010011000011011001110000001010100101010000000000111011101110111010011011011101010010000010000001101100000110001000100010111010100100010110001011100101001010000000001101110001001010110100001100101000000101110100011010010010101100010101111111001110110100000100001001000101110111010010010000000010000001110010110001010111010100110110000000011001001100100100011100101001100100111001000010111100001010001101100001000001001001100111000011000100001100000001010101000011011101000000001010110000101
1011100111001110011100110001001111100010011011000010100010011110100110101101011001000111100010000001010100110111111010011110010110101100110111100000001110110010100001010101110001010001100011101101010110000100101000011010011000100001101011000110011101100110011001101110110000011011000100101110011001110001100101010111011110110110001101111001010001001011001011100110100100010001110110101010011100001100000111011001001000000111110100100001000001111100100110100001001110100010000110101110100000111111100100010000011111000011011100011010111101000110000010110110010111101000110010000000000001001111110011100110011110010010011011101100010101000011010100101111010111110000000111111001101101110000101100001011101000011111001000110001011011101110101101001011100101000100000000100011100100111110000010010011100110001100100011010010101100110011111110110111100110011110001101001001000011101100110110001010001011000001000011111000111011100000101100011100100110011100010001001010011110010001110010001000010001001011000001111011010001001110010000011111111011001110011000001000010001000101001011110111000101001111000010111011001101100001000000010111001000110011010110010111101111001010010110011100010111111100100011000010000111111101100110011000001110001100100001110011001100100011110010011011111001001001100110011
Pair | \(Z_2\) | Length of longest common subsequence
9NWD_1, 5VGY_1 | 225 | 4
9NWD_1, 5URF_1 | 124 | 5
5VGY_1, 5URF_1 | 187 | 4
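The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets, so the first row agrees with the one-sided counts given earlier (222 + 3 = 225). A minimal sketch, assuming plain Python strings as input; the names `Z` and `jaccard_distance` are illustrative, and the normalization by the union is the usual Jaccard convention rather than something stated on the page:

```python
def subwords(w, k):
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def Z(k, x, y):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(k, x, y):
    """Z_k divided by |P_x(k) union P_y(k)| (assumed normalization)."""
    union = subwords(x, k) | subwords(y, k)
    return Z(k, x, y) / len(union) if union else 0.0


print(Z(2, "ABABC", "BCBC"))                 # 3
print(jaccard_distance(2, "ABABC", "BCBC"))  # 0.75
```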
Newick tree:
(5VGY_1:11.94,(9NWD_1:62,5URF_1:62):51.94);
Let \(d\) and \(d_1\) be the Otu--Sayood distances of the same names. (This makes the 4TYN sequence AAAAAA a close match...)
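For orientation, here is a hedged sketch of how such distances can be computed. It assumes the usual Otu--Sayood definitions in terms of the LZ76 complexity \(c\), namely \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\); the implementation behind this page may differ in details, and the function names are illustrative.

```python
def lz76(s):
    """Number of components in the LZ76 exhaustive history of s.

    Each component is the shortest block starting at position i that
    cannot be copied from the text preceding its own last symbol.
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend while the block is still reproducible from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def otu_sayood_d(s, q):
    """Assumed definition: max of the two directional complexity increases."""
    return max(lz76(s + q) - lz76(s), lz76(q + s) - lz76(q))


def otu_sayood_d1(s, q):
    """Assumed definition: sum of the two directional complexity increases."""
    return (lz76(s + q) - lz76(s)) + (lz76(q + s) - lz76(q))


print(lz76("0001101001000101"))        # 6  (components 0|001|10|100|1000|101)
print(otu_sayood_d("AAAA", "AAAA"))    # 0
print(otu_sayood_d1("ABAB", "CDCD"))   # 4
```

This quadratic-time version is only meant for short toy inputs; real sequences of a few thousand residues would call for a faster parser.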
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{5443}{\log_{20} 5443}-\frac{260}{\log_{20} 260}\right)\approx 1193\).
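The arithmetic in this estimate can be checked directly. The sketch below takes the constants 0.85, 0.8, 5443, 260 and the base 20 verbatim from the formula; the helper name `n_over_log` is illustrative.

```python
import math


def n_over_log(n, base=20):
    """n / log_base(n), the term appearing twice in the estimate."""
    return n / math.log(n, base)


estimate = 0.85 * 0.8 * (n_over_log(5443) - n_over_log(260))
print(int(estimate))  # 1193
```

The value is slightly below 1194, so the figure 1193 quoted in the text is the truncated rather than the rounded result.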
Status | Protein1 | Protein2 | \(d\) | \(d_1/2\)
Query variables | 9NWD_1 | 5VGY_1 | 1523 | 797
In notation analogous to
[Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[
\delta=
\alpha \min + (1-\alpha) \max=
\begin{cases}
d &\alpha=0,\\
d_1/2 &\alpha=1/2.
\end{cases}
\]
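The two cases can be illustrated numerically. The sketch below assumes that min and max range over the two directional quantities \(a=c(SQ)-c(S)\) and \(b=c(QS)-c(Q)\), so that \(d=\max(a,b)\) and \(d_1=a+b\); this reading is an assumption, and the values 3 and 7 are made up for the example.

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


a, b = 3, 7  # hypothetical directional values, not taken from the table

print(delta(a, b, 0))    # 7    = max(a, b), i.e. d under the assumption
print(delta(a, b, 0.5))  # 5.0  = (a + b) / 2, i.e. d_1 / 2 under the assumption
```

Intermediate values of \(\alpha\) interpolate linearly between the two distances.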