CoV2D Browser™

Parikh vectors
8PIY_1 7ACQ_1 6PKU_1 Letter Amino acid
9 9 24 A Alanine
2 3 6 C Cysteine
11 11 8 I Isoleucine
9 8 23 S Serine
4 4 14 H Histidine
12 12 3 K Lysine
6 6 12 F Phenylalanine
16 15 25 V Valine
11 11 21 L Leucine
0 0 4 W Tryptophan
4 4 14 N Asparagine
9 9 13 Q Glutamine
13 13 17 E Glutamic acid
10 11 34 G Glycine
12 12 19 T Threonine
8 8 4 Y Tyrosine
10 10 25 R Arginine
15 16 12 D Aspartic acid
4 4 4 M Methionine
4 4 18 P Proline
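
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch follows, assuming the one-letter codes are listed in the row order of the table above; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and do not come from the CoV2D code.

```python
from collections import Counter

# One-letter codes in the row order of the table above (assumed ordering).
AMINO_ACIDS = "ACISHKFVLWNQEGTYRDMP"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in sequence."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# The first ten residues of 8PIY_1, used here as a small worked example.
prefix = "MTEYKLVVVG"
vector = parikh_vector(prefix)
```

The entries of a Parikh vector sum to the sequence length, which is a quick consistency check on each column of the table.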

>8PIY_1|Chain A|RASK GTPase (Fragment)|Homo sapiens (9606)
>7ACQ_1|Chains A, B, C|GTPase KRas|Homo sapiens (9606)
>6PKU_1|Chains A, B, C, D|N-acetylglucosamine-1-phosphodiester alpha-N-acetylglucosaminidase (NAGPA)|Cavia porcellus (10141)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8PIY , Knot 82 169 0.83 38 132 167
MTEYKLVVVGAVGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHHYREQIKRVKDSEDVPMVLVGNKSDLPSRTVDTKQAQDLARSYGIPFIETSAKTRQGVDDAFYTLVREIRKHKEK
7ACQ , Knot 83 170 0.83 38 134 168
GMTEYKLVVVGADGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHHYREQIKRVKDSEDVPMVLVGNKCDLPSRTVDTKQAQDLARSYGIPFIETSAKTRQGVDDAFYTLVREIRKHKEK
6PKU , Knot 130 300 0.82 40 178 280
DRHHHHHHGSGPYPRARLRPVRDSTPVHTGSLKHENWPPPPAAPGAGPPAVRTFVSHFGGRAVSGHLTRAAAPLRTFSVLEPGGPGGCSQKRRATVEETAQAAACRIAQNGGFFRMNTGECLGNVVSDGRRVSSSGGLQNAQFGIRRDGTLVTGYLSEEEVLDTENPFVQLLSGVVWLIRNGSIYINESQATESDETQETGSFSKFVNVMSARTAIGHDRDGQLVLFHADGQTEQRGINLWEMAEFLLRQGVVNAINLDGGGSATFVLNGTLASYPSDHCQDNMWRCPRRVSTVVCVHEP

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
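
As an illustration of these definitions, the sketch below reads \(P_w(n)\) as the set of distinct contiguous subwords of length \(n\). The function names are hypothetical, and the page's own computation may differ in details.

```python
def subwords(w, n):
    """Set of distinct contiguous subwords of length n in the word w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Cardinality p_w(n) of the set of length-n subwords of w."""
    return len(subwords(w, n))

# Toy example: "abab" contains the 2-letter subwords "ab" and "ba".
example = subword_complexity("abab", 2)
```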

\(|P_{f(8PIY_1)}(2) \setminus P_{f(7ACQ_1)}(2)|=2\), \(|P_{f(7ACQ_1)}(2) \setminus P_{f(8PIY_1)}(2)|=4\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1000011111111110011010110001100001010000000111010001101100110000011000010010111011110000010010000001001000001111111000011000100001001100011111000100001100110011001000000
Pair \(Z_2\) Length of longest common substring
8PIY_1,7ACQ_1 6 105
8PIY_1,6PKU_1 186 4
7ACQ_1,6PKU_1 190 4
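
Both columns of this table can be sketched in a few lines. The code assumes that \(Z_k\) is the size of the symmetric difference of the two sets of length-\(k\) subwords, as in its definition, and that the last column measures the longest contiguous block shared by the two sequences (the value 105 for the first pair is far shorter than either sequence, which suggests a contiguous match). All function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): subwords of length k found in exactly one of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length ending at y[j-1]
    for cx in x:
        cur = [0] * (len(y) + 1)
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy example: "abba" contains "bb", which "abab" lacks.
toy_z = z_distance("abab", "abba", 2)
toy_lcs = longest_common_substring_length("abab", "abba")
```

The dynamic program uses time proportional to the product of the two lengths, which is ample for sequences of a few hundred residues.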

Newick tree

 
(6PKU_1:10.53,(8PIY_1:3,7ACQ_1:3):10.53);

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{339}{\log_{20} 339}-\frac{169}{\log_{20} 169}\right)=51.4\).
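
The arithmetic behind this estimate, and behind the normalized column \(\mathrm{LZ}(w)/(n/\log_{20} n)\) in the sequence table, can be checked with a short sketch. Here 339 appears to be the combined length 169 + 170 of the two queried sequences; the factors 0.85 and 0.8 are taken from the page as given, and the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The normalizing quantity n / log_k(n) for a word of length n over k letters."""
    return n / math.log(n, alphabet_size)

# Normalized LZ-complexity of 8PIY: LZ(w) = 82 and n = 169 in the table.
ratio_8piy = 82 / lz_scale(169)

# Rough expected distance: scaled growth from length 169 to length 339.
expected_distance = 0.85 * 0.8 * (lz_scale(339) - lz_scale(169))

print(f"{ratio_8piy:.2f} {expected_distance:.1f}")
```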
Status Protein1 Protein2 d d1/2
Query variables 8PIY_1 7ACQ_1 3 3
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
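
The formula can be made concrete with a small sketch. It assumes that the complexity is the number of components in the LZ76 exhaustive-history parse, that \(\min\) and \(\max\) range over the two complexity gains \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), and hence that \(d\) is their maximum and \(d_1\) their sum. The page's actual implementation is not shown, so its parse may differ and the function names below are illustrative.

```python
def lz76(s):
    """Number of components in the LZ76 exhaustive-history parse of s."""
    i, components, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the component while it still occurs starting at an earlier
        # position; the earlier occurrence may overlap the current one.
        while i + length <= n and s.find(s[i:i + length], 0, i + length - 1) != -1:
            length += 1
        components += 1
        i += length
    return components

def complexity_gains(x, y):
    """The two gains LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two complexity gains."""
    low, high = sorted(complexity_gains(x, y))
    return alpha * low + (1 - alpha) * high

# Toy example: d is delta at alpha = 0, and d1 / 2 is delta at alpha = 1/2.
d = delta("ab", "cd", 0)
half_d1 = delta("ab", "cd", 0.5)
```

Under these assumptions a sequence concatenated with itself adds no new components, so a word is at distance 0 from itself.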