Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\).
Let \(p_w(n)\) be the cardinality of \(P_w(n)\).
Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
\(|P_{f(8XVV_1)}(2) \setminus P_{f(3NQH_1)}(2)|=136\),
\(|P_{f(3NQH_1)}(2) \setminus P_{f(8XVV_1)}(2)|=17\).
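The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the counts above; the function names `subwords` and `count_only_in_first` are hypothetical, and the toy words stand in for the actual FASTA sequences.

```python
def subwords(w: str, n: int) -> set[str]:
    """Return P_w(n): the set of distinct length-n subwords (intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def count_only_in_first(x: str, y: str, n: int) -> int:
    """Return |P_x(n) \\ P_y(n)|: length-n subwords of x that do not occur in y."""
    return len(subwords(x, n) - subwords(y, n))


if __name__ == "__main__":
    # Toy words (assumption: any strings over a common alphabet work the same way).
    x, y = "ABBA", "ABAB"
    print(sorted(subwords(x, 2)))        # distinct 2-subwords of x
    print(len(subwords(x, 2)))           # p_x(2), the cardinality
    print(count_only_in_first(x, y, 2))  # 2-subwords of x missing from y
```

With real data, `x` and `y` would be the two sequences \(f(c)\) read from their FASTA records.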
Let
\(
Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)|
\)
be a LZ76 style (set of subwords) Jaccard distance numerator for \(P(k)\).Hydrophobic-polar version of Sequence 1:10010110010001000010110010001101011101111111010101100100010011000110100101010011111110001011111110100111010101001001010010100110001000010010111101100101111111111000010111101011100101010011011100101100110110101001101101000010010000110100111001111111110111010010010111010011010110000110111110111100011111111111111110011110010001111110011100110011001100100111101011010000100000010110011000110111100101011011110000011101010000101000000000000000110000011110111111001000110000111001110000010101100111100111011111001110101111101011011100000000010011110011010011101110111111101110011011001011010001011100001011111110001111100110111011011010101001001101110001110111101011100000011110101111011000110010010001101100001111101100100110010000000000100010100010001101001111000011010011010001001100101110011000010111100110011000000010000100000001001110010010011001001101010101000000110100100010010101101100001001101000010101000010000001000010011100000100110010111101101001111100011010101000010001000110000000111100111100101100101101010011010111011110101010001010110110111000001110111010000101111001111001011111101100010111011110000110101010010111011001100001010000110100101010000011011011001010011100100101100001011101000001111001100011011011011111100101001101100000000001110100100111100000010001000000110001000001100011101100011001011011011101001000111101001100011111001010111011000110010101101111000100001011000011001100100011111011110101001101100100101001111000110011100101110111010111101010101011111100001010110001101000110110110101000101001101110101100111001110011110010101101011011011010110001101001101010100101000010010110110101011011011101010111111100100111101111001111110111101011111011101111101101001101111100000001100010010110000000111010011010111001010101010100100111100000000010011101000100100110011111111111110101101111000010110011000111010010000110110110101101001010111111001000100111100
1111101101110100100101000100000001100100000110111000000011011010011100001011101010010001100001010011010010001100100011001110100001111000010011010011001111101001111000101000111011011101100100100010001001111100011000000110000100100110110010110001100101100010000000000111011011011010010000101010000101100000010010010001010001111101101100000101001101100101110101111010000000000101110000000101111100110010111101000100000011100011110111011010101100010111000111101100110111010110110010101100110000010000000000000111100010000001100001010000100001000010110101100011101111101100010010111001100001111101101010011000011100001001111011110101111100111111010111001111111101010101000101101110101110011011111101000101001101110101110011111110100100011011111110011101110110001110111011010101101100011011001010111000101100110011101001011011100101101011000000000000000000000000000000000000101101010100110101110101001101000010111011110100111010101010000110000101001011011111010011010110100110001010101000110110111101111011111100111001110111101111010011100111011010011010001100001101011011111000101001001110001101101110001100110010011110010001011011011000011101001101000101001110010111001011111010101100011000110010111001111011011100001000101010111101001001100
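The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets, so with the two counts reported earlier, \(Z_2 = 136 + 17 = 153\) for the amino-acid sequences. The sketch below computes \(Z_k\) and, as an assumption about the intended normalization, divides by the size of the union to obtain a Jaccard distance. The names `z_k` and `jaccard_distance` are illustrative only; the same code applies to binary hydrophobic-polar words.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return P_w(k): the set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)| (symmetric difference size)."""
    return len(subwords(x, k) ^ subwords(y, k))


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by |P_x(k) ∪ P_y(k)| (assumed normalization); 0.0 if both sets are empty."""
    px, py = subwords(x, k), subwords(y, k)
    union = px | py
    return len(px ^ py) / len(union) if union else 0.0


if __name__ == "__main__":
    # Short binary toy words, not the sequences above.
    print(z_k("0011", "0101", 2))
    print(jaccard_distance("0011", "0101", 2))
```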
Let \(d\) be the Otu--Sayood distance \(d\).
Let \(d_1\) be the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
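For concreteness, here is a hedged sketch of how these two distances can be computed. It assumes the unnormalized definitions \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=(c(xy)-c(x))+(c(yx)-c(y))\), where \(c\) is the number of components in the LZ76 exhaustive history; the text does not spell these out, so treat them as assumptions. The parser is deliberately naive and the function names are hypothetical.

```python
def lz76_complexity(s: str) -> int:
    """Number of components in the LZ76 exhaustive history of s (naive parser)."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend while the candidate component can be copied from a start
        # position before i (overlap with the component itself is allowed).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def otu_sayood_d(x: str, y: str) -> int:
    """Assumed definition: max of the two one-sided complexity increases."""
    c = lz76_complexity
    return max(c(x + y) - c(x), c(y + x) - c(y))


def otu_sayood_d1(x: str, y: str) -> int:
    """Assumed definition: sum of the two one-sided complexity increases."""
    c = lz76_complexity
    return (c(x + y) - c(x)) + (c(y + x) - c(y))


if __name__ == "__main__":
    print(lz76_complexity("AAAAAA"))  # parses as A | AAAAA
    print(otu_sayood_d("ABAB", "BBBA"), otu_sayood_d1("ABAB", "BBBA"))
```

Because neither quantity is normalized by sequence length, a very short, low-complexity word adds few new components when appended to a long one.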
A rough expected distance is \((0.85)(0.8)\left(\frac{3564}{\log_{20} 3564}-\frac{441}{\log_{20} 441}\right)\approx 740.\)
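The arithmetic behind this estimate can be checked directly. The snippet below only evaluates the expression as written; the factors 0.85 and 0.8 and the lengths 3564 and 441 are taken from the text, and the helper name `n_over_log` is hypothetical.

```python
import math


def n_over_log(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_a(n), the term used in the estimate for a word of length n."""
    return n / math.log(n, alphabet_size)


# Evaluate (0.85)(0.8)(3564/log_20 3564 - 441/log_20 441).
estimate = 0.85 * 0.8 * (n_over_log(3564) - n_over_log(441))

if __name__ == "__main__":
    print(round(estimate, 1))
```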
Status           Protein1  Protein2  d    d1/2
Query variables  8XVV_1    3NQH_1    911  522.5
In notation analogous to
[Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[
\delta=
\alpha \min + (1-\alpha) \max=
\begin{cases}
d, &\alpha=0,\\
d_1/2, &\alpha=1/2.
\end{cases}
\]
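As a worked check, read \(\max\) and \(\min\) as the larger and smaller of the two one-sided terms, so that \(d=\max\) and \(d_1=\min+\max\) (an assumption consistent with the display above). The values reported for 8XVV_1 and 3NQH_1, \(d=911\) and \(d_1/2=522.5\), then give
\[
\max = d = 911,\qquad
\min = d_1 - d = 1045 - 911 = 134,
\]
\[
\delta\big|_{\alpha=0} = 911,\qquad
\delta\big|_{\alpha=1/2} = \tfrac12\cdot 134 + \tfrac12\cdot 911 = 522.5 .
\]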