CoV2D Browser™

Parikh vectors
6GGM_1 6PFK_1 5JJM_1 Letter Amino acid
15 26 19 V Valine
19 26 13 A Alanine
20 19 14 R Arginine
5 8 11 N Asparagine
17 20 11 D Aspartic acid
14 12 8 H Histidine
9 7 17 F Phenylalanine
4 3 6 C Cysteine
15 9 15 Q Glutamine
20 46 9 G Glycine
8 18 14 K Lysine
3 5 13 M Methionine
14 7 12 P Proline
5 30 15 I Isoleucine
21 22 28 L Leucine
22 19 6 T Threonine
9 1 4 W Tryptophan
26 20 13 E Glutamic acid
16 12 17 S Serine
12 9 7 Y Tyrosine
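
A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch of that count, not the site's own code; the names `ALPHABET` and `parikh_vector` are hypothetical, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# Letter order copied from the rows of the table above (an illustrative choice).
ALPHABET = "VARNDHFCQGKMPILTWESY"


def parikh_vector(sequence: str) -> list[int]:
    """Return the number of occurrences of each letter, in ALPHABET order."""
    counts = Counter(sequence)
    return [counts[letter] for letter in ALPHABET]


if __name__ == "__main__":
    toy = "GSHSLKYFHT"  # first ten residues of the 6GGM_1 sequence on this page
    for letter, count in zip(ALPHABET, parikh_vector(toy)):
        if count:
            print(letter, count)
```

The entries of a Parikh vector sum to the sequence length; for example, the 6GGM_1 column above sums to 274.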

>6GGM_1|Chains A, C|MHC class I antigen|Homo sapiens (9606)
>6PFK_1|Chains A, B, C, D|PHOSPHOFRUCTOKINASE|Geobacillus stearothermophilus (1422)
>5JJM_1|Chains A, E[auth C], G[auth D]|Androgen receptor|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6GGM , Knot 119 274 0.81 40 177 261
GSHSLKYFHTSVSRPGRGEPRFISVGYVDDTQFVRFDNDAASPRMVPRAPWMEQEGSEYWDRETRSARDTAQIFRVNLRTLRGYYNQSEAGSHTLQWMHGCELGPDGRFLRGYEQFAYDGKDYLTLNEDLRSWTAVDTAAQISEQKSNDASEAEHQRAYLEDTCVEWLHKYLEKGKETLLHLEPPKTHVTHHPISDHEATLRCWALGFYPAEITLTWQQDGEGHTQDTELVETRPAGDGTFQKWAAVVVPSGEEQRYTCHVQHEGLPEPVTLRW
6PFK , Knot 140 319 0.84 40 176 301
MKRIGVLTSGGDSPGMNAAIRSVVRKAIYHGVEVYGVYHGYAGLIAGNIKKLEVGDVGDIIHRGGTILYTARCPEFKTEEGQKKGIEQLKKHGIEGLVVIGGDGSYQGAKKLTEHGFPCVGVPGTIDNDIPGTDFTIGFDTALNTVIDAIDKIRDTATSHERTYVIEVMGRHAGDIALWSGLAGGAETILIPEADYDMNDVIARLKRGHERGKKHSIIIVAEGVGSGVDFGRQIQEATGFETRVTVLGHVQRGGSPTAFDRVLASRLGARAVELLLEGKGGRCVGIQNNQLVDHDIAEALANKHTIDQRMYALSKELSI
5JJM , Knot 121 252 0.88 40 186 244
ECQPIFLNVLEAIEPGVVCAGHDNNQPDSFAALLSSLNELGERQLVHVVKWAKALPGFRNLHVDDQMAVIQYSWMGLMVFAMGWRSFTNVNSRMLYFAPDLVFNEYRMHKSRMYSQCVRMRHLSQEFGWLQITPQEFLCMKALLLFSIIPVDGLKNQKFFDELRMNYIKELDRIIACKRKNPTSCSRRFYQLTKLLDSVQPIARELHQFTFDLLIKSHMVSVDFPEMMAEIISVQVPKILSGKVKPIYFHTQ
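
The LZ-complexity and normalized-ratio columns can be sketched as follows. This is a plain LZ76 (exhaustive-history) phrase count; the page does not say which parsing variant it uses, so the counts from this sketch may not reproduce the values in the table. The names `lz76` and `normalized_lz` are hypothetical.

```python
import math


def lz76(w: str) -> int:
    """Count phrases in the LZ76 exhaustive-history parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from an occurrence
        # that starts before position i (overlap with the phrase is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Compute LZ(w) / (n / log_alphabet_size(n)), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # Classic example: parsed as 0 | 001 | 10 | 100 | 1000 | 101.
    print(lz76("0001101001000101"))
    # Ratio for the 6GGM row, using the tabulated LZ(w) = 119 and n = 274.
    print(round(normalized_lz(119, 274), 2))
```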

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
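
As a small illustration of these definitions, the sketch below collects the distinct length-\(n\) subwords of a word and counts them. The function names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w: str, n: int) -> set[str]:
    """Return P_w(n): the set of distinct contiguous subwords of length n."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w: str, n: int) -> int:
    """Return p_w(n), the number of distinct subwords of length n."""
    return len(subwords(w, n))


if __name__ == "__main__":
    w = "GSHSLKYFHT"  # first ten residues of the 6GGM sequence
    print(sorted(subwords(w, 2)))
    print(subword_complexity(w, 2))
```

A word of length \(|w|\) has at most \(|w|-n+1\) subwords of length \(n\), so \(p_w(n)\) can never exceed that bound.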

\(|P_{f(6GGM_1)}(2) \setminus P_{f(6PFK_1)}(2)|=93\), \(|P_{f(6PFK_1)}(2) \setminus P_{f(6GGM_1)}(2)|=92\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1 (6GGM_1):
1000100100010011010101101101000011010001101011101111000100010000001000101101010010100000011000101101001110101101000110010001010001001011001101000000010010000101000010110001001000110101100010001100001010011111011010101000101000000110001110101001111111010000000010001110110101
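
The hydrophobic-polar encoding can be sketched as a letter-by-letter mapping to 1 (hydrophobic) or 0 (polar). The page does not state which letters it treats as hydrophobic; the set below is inferred by aligning the start of the 6GGM_1 sequence with the start of the binary string, so treat it as an assumption. The names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Assumed hydrophobic class, inferred from the first residues of 6GGM_1
# and the matching bits of the string above; not stated on the page.
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(sequence: str) -> str:
    """Map each residue to '1' if assumed hydrophobic, else '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


if __name__ == "__main__":
    prefix = "GSHSLKYFHTSVSRPGRGEP"  # first twenty residues of 6GGM_1
    print(hp_encode(prefix))
```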
Pair \(Z_2\) Length of longest common subword
6GGM_1,6PFK_1 185 4
6GGM_1,5JJM_1 183 4
6PFK_1,5JJM_1 188 3
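
The two columns of this table can be sketched as below. \(Z_k\) is computed as the size of the symmetric difference of the two subword sets. The last column is read here as the longest common contiguous subword; that reading is an assumption, made because values of 3 to 4 are too small for a non-contiguous common subsequence of sequences this long. The function names are hypothetical.

```python
def z_distance(x: str, y: str, k: int) -> int:
    """Return Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px = {x[i:i + k] for i in range(len(x) - k + 1)}
    py = {y[i:i + k] for i in range(len(y) - k + 1)}
    return len(px ^ py)  # symmetric difference of the two subword sets


def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y (row-wise DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z_distance("abab", "abba", 2))
    print(longest_common_subword("abcde", "zbcdy"))
```

For the first row, the tabulated value 185 agrees with the two one-sided counts given earlier, 93 + 92.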

Newick tree

 
(6PFK_1:93.83,(6GGM_1:91.5,5JJM_1:91.5):2.33);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{593 }{\log_{20} 593}-\frac{274}{\log_{20}274})=89.7\), where \(593=274+319\) is the combined length of the two sequences.
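
The arithmetic in this estimate can be checked with a few lines. The sketch below only re-evaluates the formula as written; the function name `expected_distance` and its parameter names are hypothetical, and the factors 0.85 and 0.8 are taken from the text without further interpretation.

```python
import math


def expected_distance(n_x: int, n_y: int, a: float = 0.85, b: float = 0.8,
                      alphabet_size: int = 20) -> float:
    """Evaluate a * b * (N / log_q N - n_x / log_q n_x) with N = n_x + n_y."""
    total = n_x + n_y
    bound_total = total / math.log(total, alphabet_size)
    bound_x = n_x / math.log(n_x, alphabet_size)
    return a * b * (bound_total - bound_x)


if __name__ == "__main__":
    # Lengths of 6GGM_1 (274) and 6PFK_1 (319) from the table on this page.
    print(expected_distance(274, 319))
```

The result is close to the quoted 89.7, which is somewhat below the tabulated values of 114 and 107 for this pair.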
Status Protein1 Protein2 d d1/2
Query variables 6GGM_1 6PFK_1 114 107
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
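
A minimal sketch of this interpolation follows. It assumes the standard Otu–Sayood definitions, in which \(d\) is the larger and \(d_1\) the sum of the two complexity increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). The function names and the `complexity` parameter are hypothetical; any LZ-complexity function can be passed in.

```python
from typing import Callable


def increments(x: str, y: str, complexity: Callable[[str], int]) -> tuple[int, int]:
    """Return (c(xy) - c(x), c(yx) - c(y)) for a given complexity function c."""
    return complexity(x + y) - complexity(x), complexity(y + x) - complexity(y)


def delta(alpha: float, a: float, b: float) -> float:
    """Interpolate: alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    # Toy complexity (number of distinct letters), used only to keep the
    # example self-contained; it is not an LZ-complexity.
    toy_complexity = lambda s: len(set(s))
    a, b = increments("ab", "bcd", toy_complexity)
    print(delta(0.0, a, b))  # equals d = max(a, b)
    print(delta(0.5, a, b))  # equals d_1 / 2 = (a + b) / 2
```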