CoV2D Browser™

Parikh vectors
4OIA_1 5UMJ_1 1UIY_1 Letter Amino acid
42 5 19 R Arginine
10 3 1 C Cysteine
2 3 5 M Methionine
1 3 12 K Lysine
30 8 13 P Proline
15 4 10 D Aspartic acid
28 9 21 G Glycine
5 6 4 I Isoleucine
22 10 7 S Serine
4 1 1 W Tryptophan
3 5 4 Y Tyrosine
41 12 40 A Alanine
16 6 2 Q Glutamine
15 4 9 F Phenylalanine
48 11 38 L Leucine
27 4 9 T Threonine
25 8 25 V Valine
13 8 7 N Asparagine
29 3 23 E Glutamic acid
2 1 3 H Histidine
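
A Parikh vector records how many times each letter occurs in a word. The following is a minimal Python sketch of that count; the names `parikh_vector` and `AMINO_ORDER` are illustrative assumptions (not taken from the CoV2D code), and the letter order is simply copied from the table above.

```python
from collections import Counter

# Letter order copied from the table above (an illustrative choice).
AMINO_ORDER = "RCMKPDGISWYAQFLTVNEH"

def parikh_vector(w, alphabet=AMINO_ORDER):
    """Return the occurrence count of each alphabet letter in the word w."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# Toy example: 'A' occurs twice, 'R' once, all other letters zero times.
vec = parikh_vector("ARA")
print(dict(zip(AMINO_ORDER, vec)))
```

The entries of a Parikh vector sum to the length of the word, so each column of the table adds up to the corresponding sequence length (378, 114 and 253).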

>4OIA_1|Chains A, B|Intercellular adhesion molecule 5|Homo sapiens (9606)
>5UMJ_1|Chains A, B, C|Macrophage migration inhibitory factor|Homo sapiens (9606)
>1UIY_1|Chain A|Enoyl-CoA Hydratase|Thermus thermophilus (274)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4OIA , Knot 147 378 0.77 40 162 331
EPFWADLQPRVAFVERGGSLWLNCSTNCPRPERGGLETSLRRNGTQRGLRWLARQLVDIREPETQPVCFFRCARRTLQARGLIRTFQRPDRVELMPLPPWQPVGENFTLSCRVPGAGPRASLTLTLLRGAQELIRRSFAGEPPRARGAVLTATVLARREDHGANFSCRAELDLRPHGLGLFENSSAPRELRTFSLSPDAPRLAAPRLLEVGSERPVSCTLDGLFPASEARVYLALGDQNLSPDVTLEGDAFVATATATASAEQEGARQLVCNVTLGGENRETRENVTIYSFPAPLLTLSEPSVSEGQMVTVTCAAGAQALVTLEGVPAAVPGQPAQLQLNATENDDRRSFFCDATLDVDGETLIKNRSAELRVLYAPR
5UMJ , Knot 60 114 0.83 40 96 111
PMFIVNTNVPRASVPDGFLSELTQQLAQATGKPPQYIAVHVVPDQLMAFGGSSEPCALCSLASIGKIGGAQNRSYSKLLCGLLAERLRISPDRVYINYYDMNAANVGWNNSTFA
1UIY , Knot 108 253 0.78 40 141 229
MVQVEKGHVAVVFLNDPERRNPLSPEMALSLLQALDDLEADPGVRAVVLTGRGKAFSAGADLAFLERVTELGAEENYRHSLSLMRLFHRVYTYPKPTVAAVNGPAVAGGAGLALACDLVVMDEEARLGYTEVKIGFVAALVSVILVRAVGEKAAKDLLLTGRLVEAREAKALGLVNRIAPPGKALEEAKALAEEVAKNAPTSLRLTKELLLALPGMGLEDGFRLAALANAWVRETGDLAEGIRAFFEKRPPRF
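
The LZ-complexity column and the normalized ratio can be illustrated with a short Python sketch. It assumes the standard Lempel-Ziv (1976) exhaustive-history parsing, in which each component is extended until it no longer occurs starting at an earlier position; whether the site uses exactly this variant is not stated on the page. The function names are hypothetical.

```python
import math

def lz76_complexity(w):
    """Count the components of a Lempel-Ziv (1976) parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        k = 1
        # Extend the component while w[i:i+k] also occurs starting at an
        # earlier position (it may overlap the component itself).
        while i + k <= n and w[i:i + k] in w[:i + k - 1]:
            k += 1
        count += 1
        i += k
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """Ratio LZ(w) / (n / log_q n) for an alphabet of size q."""
    return lz / (n / math.log(n, alphabet_size))

# Classic example: 0|001|10|100|1000|101 has six components.
print(lz76_complexity("0001101001000101"))  # 6
# Values from the 4OIA row of the table: LZ = 147, n = 378.
print(round(normalized_lz(147, 378), 2))    # 0.77
```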

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
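
As a small illustration of these definitions, the sketch below collects the distinct subwords of a given length with a sliding window. The function names `subwords` and `subword_complexity` are assumptions made for this example.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct subwords of length k in w."""
    return len(subwords(w, k))

# "abab" has only two distinct subwords of length 2: "ab" and "ba".
print(sorted(subwords("abab", 2)))    # ['ab', 'ba']
print(subword_complexity("abab", 2))  # 2
```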

\(|P_{f(4OIA_1)}(2) \setminus P_{f(5UMJ_1)}(2)|=107\), \(|P_{f(5UMJ_1)}(2) \setminus P_{f(4OIA_1)}(2)|=41\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
011110101011110011011100000010100111000100010001101110011010010001101100100010101110010010010111111101110010100011111101010101101100110001110110101111010111000001101000101010101111100001100100101010110111101101100011000101111100101011110001010101010111101010101000110011001011100000000101001111110100101001011010011110111010111111110110101010000000011001010101001100001010110110
Pair \(Z_2\) Length of longest common subword
4OIA_1,5UMJ_1 148 4
4OIA_1,1UIY_1 135 4
5UMJ_1,1UIY_1 147 3
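
The two columns of this table can be sketched in Python as follows. \(Z_k\) is the size of the symmetric difference of the two subword sets, so for the first pair \(107+41=148\). The last column is read here as the length of the longest common contiguous subword; this is an assumption, made because values of 3 and 4 are far too small for a non-contiguous common subsequence of sequences this long. All function names are hypothetical.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y (row-by-row DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common block that ended just before both letters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy example: only "bb" occurs in one word but not the other.
print(z_k("abab", "abba", 2))                           # 1
print(longest_common_subword_length("abcde", "xbcdy"))  # 3
```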

Newick tree

 
[
	5UMJ_1:75.71,
	[
		4OIA_1:67.5,1UIY_1:67.5
	]:8.21
]

Let \(d\) denote the Otu–Sayood distance \(d\).
Let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{492}{\log_{20} 492}-\frac{114}{\log_{20}114}\right)\approx 112\), where \(492=378+114\) is the combined length of the two sequences.
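
The arithmetic behind this estimate can be checked with a few lines of Python. The factors 0.85 and 0.8 are taken from the formula as given on the page; the variable names are illustrative.

```python
import math

def log20(x):
    """Logarithm to base 20 (the size of the amino-acid alphabet)."""
    return math.log(x) / math.log(20)

n_both, n_short = 492, 114  # 492 = 378 + 114
expected = 0.85 * 0.8 * (n_both / log20(n_both) - n_short / log20(n_short))
print(expected)
```

The result lies between 112 and 113, in line with the value 112 quoted above.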
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 4OIA_1 5UMJ_1 130 86.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
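
A hedged Python sketch of this interpolation follows. It assumes, following the usual statement of the Otu–Sayood distances, that min and max are taken over the two directed complexity gains \(c(xy)-c(x)\) and \(c(yx)-c(y)\), where \(c\) is an LZ complexity; the page itself does not spell this out. The function names and the particular LZ76 variant are assumptions of this example.

```python
def lz76_complexity(w):
    """Count the components of a Lempel-Ziv (1976) parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        k = 1
        # Extend while the component also occurs starting at an earlier position.
        while i + k <= n and w[i:i + k] in w[:i + k - 1]:
            k += 1
        count += 1
        i += k
    return count

def interpolate(lo, hi, alpha):
    """delta = alpha * min + (1 - alpha) * max."""
    return alpha * lo + (1 - alpha) * hi

def delta(x, y, alpha, c=lz76_complexity):
    """Interpolated distance between words x and y (assumed definition)."""
    gain_xy = c(x + y) - c(x)  # new components needed for y after seeing x
    gain_yx = c(y + x) - c(y)  # new components needed for x after seeing y
    return interpolate(min(gain_xy, gain_yx), max(gain_xy, gain_yx), alpha)

# alpha = 0 gives d (the max); alpha = 1/2 gives d_1/2 (the average).
print(interpolate(43, 130, 0))    # 130
print(interpolate(43, 130, 0.5))  # 86.5
```

Under this reading, the reported values \(d=130\) and \(d_1/2=86.5\) for the pair 4OIA_1, 5UMJ_1 would correspond to a maximum of 130 and a minimum of 43.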