Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\).
Let \(p_w(n)\) be the cardinality of \(P_w(n)\).
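As a minimal illustration, the Python sketch below computes \(P_w(n)\) and \(p_w(n)\) for a word stored as a string. The function names `subwords` and `complexity` are hypothetical and do not come from the tool behind this page.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct length-n subwords (contiguous intervals) of w."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # One slice per starting position; the set removes repeated subwords.
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def complexity(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|, the number of distinct length-n subwords of w."""
    return len(subwords(w, n))


# The length-2 subwords of "ACACG" are AC, CA, AC, CG, so only three are distinct.
print(sorted(subwords("ACACG", 2)))  # ['AC', 'CA', 'CG']
print(complexity("ACACG", 2))        # 3
```

If \(n\) exceeds the length of \(w\), the set is empty and \(p_w(n)=0\).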
Let \(f(c)\) be the FASTA sequence with 4-character Protein Data Bank code \(c\).
We have \(|P_{f(\mathrm{9BLZ\_1})}(2) \setminus P_{f(\mathrm{1PVG\_1})}(2)|=166\) and \(|P_{f(\mathrm{1PVG\_1})}(2) \setminus P_{f(\mathrm{9BLZ\_1})}(2)|=1\).
Let \(Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)|\) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
10011111100101110101100110101100010011111100110111110111000011001001100101001110000100011001000001100010101001100001111000111010011000101101000010001001100111111000100010100010011101000110101111010001011010111011100110000001001010011001001011001001100110010010010001101011001011101001100100000010111010110010010101010000110011001000011100111001101001001001111110010010000011001101101100010001101110001101100010011110101100100000010111001100000001011101011000101010010010000001011110110101011100001011010010110111011010110010110001001011010001001101110000001001000101010001101001001101100101111010101110000001100100010010001010010001001001001111010111100100010101001001110110001010010001001010100001100110010000111010110100001010010110101011101101000100101111011111100100100101111011001000000000100000101111110001011110111111000010101001100110100010011110001010100100010000010011001001100101000001111100101010011110101110110011110100010101000110100011101010011001010001101011100000010001111011110110100000011100010000010001100110111110000011111100100010111000011010100100011001001011110100101010010000011111100101000101000010001100110111001001000100000010000100100001101100100100010010001010001001100001011101101001010111100110000011000110101011000011000000110010000110101010011011010010110100000001010011010001110100001011100100101110010011001001000111010100100010111001001110100010001100110101010111101000110000100110010101110010110110101000011100111110101110011001001100001011000000011011001100100010010110101000110001101000100111110111010001101011101010100111100001001000111110010001111011010110001001101110100111001000000110101110001101110000110100010011111001110000011111000010011100110100010100110110001010110111001001011101001010001011000010111101011100010011001111101111001100101010111001110011100001001100110000100011000100100101100101010100001100101011010100110011100011001100
00010100110101110111111010000101110011011111000001010111011111001111101001001000110110001001001100000100000011100011000101010111110101101100011001001100111001000111011100011001011100111110100001000000011101100111011010000100100000001011001011001100011100100011101110011111011001111100001010110001001000101001010011111100110100100100111111101010011101110110010110111011010110000101010100001001110011001100101010000111101010101100100110000110110100101110101110100100101101000111110001100011100111010011100100010000010000100110111010001101101010001110011001101001101001001101101100100011000100101110100100010001101111010100010101011001001001111011011110001010101011010110101000011110111101001000111001110001111011110100101101101110101111010010010111001000000000101111111011011111000101101000100011011001100111000000011010010111100110011001100011001111010011110100101010011101110100010110111101001000010000010010010010011011101101100111011101110011011000110000001000010011100110100001100111000110000111000010001010101100001011111100110011010011001010111111011100010011111011010010100000100100010011000100000111110000110011100100111010111110100010110000011000111100000100110001100101110101000110001100111100011011101000110011001000101001001110011110001101100001110001110001001010110011001110100010110001011000000100001010111001000100100100010100001010011100010011000001000011000100010000011100010100010010111101001100100001101001101111101110010111100000100100111000111011010100100110001000010010000011001011011110111101001011001011000100100010000001001001100101011000000111100101101011110101000011100101000010000001000100111001101111101101000100011001000100101010001100001001000101010011100100001111001000111101010100111000000010000110011000100110110111100100001110111000100011011101100010101011111000010101110100010110101000010000100110100101000000110101010101001000110110010101100001100100100011010001000011100100100001110010001010100100101100001011101000110001010110000001011000110111001101110000101111
1101010101101000101001101001110110010101101001011101001111001110101000111110000100011011000011011101100111101101001111101110001100110110011010011100101001111001110010101001110000010011110101100100110011001011110010111111101000100101010101110101010111011011011110111110101100100111001000100010101111110111000100111110000011000100100010011000101000101001110110011100101101000100011001100110000100010110010100010110110000110110111000010111110010011100011011001101011000001101000000000000010111100100010011011100100100010010011101100010111011001000110110100100000001001100110111100100001111101101100100010010010111101110010010101111111010101000011010010100101010100001101010011101101011000000101001100111100101100000000101101110101001011101010110000100100011111000
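The quantity \(Z_k\) defined above is the size of the symmetric difference of the two subword sets, and dividing it by the size of their union gives a Jaccard distance. The sketch below is a minimal illustration; the names `z_k` and `jaccard_distance` are hypothetical, and the same functions apply unchanged to hydrophobic-polar (0/1) strings.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by |P_x(k) union P_y(k)|; 0.0 if both sets are empty."""
    union = subwords(x, k) | subwords(y, k)
    return z_k(x, y, k) / len(union) if union else 0.0


# P_x(2) = {AC, CA, CG}, P_y(2) = {AC, CG, GT}: one subword unique to each side.
print(z_k("ACACG", "ACGT", 2))               # 2
print(jaccard_distance("ACACG", "ACGT", 2))  # 0.5
print(z_k("1001", "0110", 2))                # 2  (HP-style binary words)
```

With the two one-sided counts reported above for \(k=2\), this gives \(Z_2(f(\mathrm{9BLZ\_1}), f(\mathrm{1PVG\_1})) = 166 + 1 = 167\).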
Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
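A minimal sketch of the two distances, assuming the definitions usually attributed to Otu and Sayood: with \(c(\cdot)\) the LZ76 phrase count, \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\). The tool behind this page may count phrases with a different convention or normalize; the function names are hypothetical.

```python
def lz76_complexity(s: str) -> int:
    """Number of phrases in the LZ76 exhaustive parsing of s.

    Each phrase is the longest prefix of the unparsed remainder that already
    occurs starting at an earlier position (overlap allowed), plus one symbol.
    """
    n = len(s)
    i = 0          # start of the current phrase
    phrases = 0
    while i < n:
        length = 1
        # Extend while s[i:i+length] is reproducible from s[:i+length-1].
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def otu_sayood_d(s: str, q: str) -> int:
    """d(S, Q) = max{c(SQ) - c(S), c(QS) - c(Q)}."""
    return max(lz76_complexity(s + q) - lz76_complexity(s),
               lz76_complexity(q + s) - lz76_complexity(q))


def otu_sayood_d1(s: str, q: str) -> int:
    """d1(S, Q) = c(SQ) - c(S) + c(QS) - c(Q)."""
    return (lz76_complexity(s + q) - lz76_complexity(s)
            + lz76_complexity(q + s) - lz76_complexity(q))


print(lz76_complexity("0001101001000101"))  # 6  (0.001.10.100.1000.101)
print(otu_sayood_d("aaaa", "bbbb"), otu_sayood_d1("aaaa", "bbbb"))  # 1 2
```

The substring tests make this roughly quadratic or worse in the sequence length, which is acceptable for a sketch but not tuned for long inputs.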
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{5064}{\log_{20}5064}-\frac{418}{\log_{20}418}\right)\approx 1068\).
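The arithmetic of this estimate can be checked directly. The sketch assumes (the text does not say so) that 5064 and 418 are the lengths of the two sequences and that \(n/\log_{20} n\) stands for the usual LZ76 phrase count of a random length-\(n\) word over the 20 amino acids; the factors 0.85 and 0.8 are taken from the text as given. The helper name is hypothetical.

```python
import math


def lz_random_phrases(n: int, alphabet_size: int = 20) -> float:
    """n / log_k(n): approximate LZ76 phrase count of a random length-n word over k letters."""
    return n / math.log(n, alphabet_size)


# 0.85 and 0.8 are the factors used in the text; 5064 and 418 are taken as given.
estimate = 0.85 * 0.8 * (lz_random_phrases(5064) - lz_random_phrases(418))
print(round(estimate))  # 1068
```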
Status          | Protein 1 | Protein 2 | \(d\) | \(d_1/2\)
Query variables | 9BLZ_1    | 1PVG_1    | 1363  | 738.5
In notation analogous to
[Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[
\delta=\alpha\min+(1-\alpha)\max=
\begin{cases}
d, & \alpha=0,\\
d_1/2, & \alpha=1/2.
\end{cases}
\]
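A small sketch of this interpolation, under the assumption (a reading of the notation, not stated in the text) that \(\min\) and \(\max\) range over the two one-sided increments \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\). Then \(d\) is the larger increment and \(d_1\) their sum, so the smaller one is \(d_1-d\). With the tabulated \(d=1363\) and \(d_1/2=738.5\), this gives \(d_1=1477\) and \(\min=114\). The function name `delta` is hypothetical.

```python
def delta(alpha: float, d: float, d1: float) -> float:
    """delta = alpha*min + (1 - alpha)*max, recovered from d (the max) and d1 (the sum)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    hi = d          # assumed: max{c(SQ) - c(S), c(QS) - c(Q)}
    lo = d1 - d     # the other increment, since d1 is their sum
    return alpha * lo + (1 - alpha) * hi


# Values from the 9BLZ_1 / 1PVG_1 row: d = 1363 and d1/2 = 738.5, so d1 = 1477.
d, d1 = 1363, 2 * 738.5
print(delta(0.0, d, d1))  # 1363.0  (= d)
print(delta(0.5, d, d1))  # 738.5   (= d1/2)
print(delta(1.0, d, d1))  # 114.0   (= min)
```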