CoV2D Browser™

Parikh vectors
6ZGY_1 7NSY_1 7LUE_1 Letter Amino acid
1 4 5 W Tryptophan
11 8 20 Y Tyrosine
31 14 45 V Valine
31 10 18 R Arginine
13 13 21 D Aspartic acid
8 2 14 C Cysteine
20 10 22 Q Glutamine
22 20 19 P Proline
29 6 28 E Glutamic acid
15 11 11 H Histidine
6 10 43 K Lysine
9 7 8 M Methionine
9 12 35 I Isoleucine
47 12 52 L Leucine
11 8 18 F Phenylalanine
51 19 24 A Alanine
5 6 41 N Asparagine
33 22 29 G Glycine
30 14 55 S Serine
17 9 40 T Threonine
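A Parikh vector records how many times each letter occurs in a word and ignores the order of the letters. The following is a minimal Python sketch. It assumes the 20 one-letter amino-acid codes appear in the row order of the table above. The names `AMINO_ORDER` and `parikh_vector` are illustrative and do not come from the CoV2D site.

```python
from collections import Counter

# One-letter amino-acid codes in the row order of the table above.
AMINO_ORDER = "WYVRDCQPEHKMILFANGST"


def parikh_vector(w: str) -> list[int]:
    """Count each amino-acid letter of w, ignoring order (the Parikh vector)."""
    counts = Counter(w)  # missing letters count as 0
    return [counts[a] for a in AMINO_ORDER]


# Example: a short prefix of the 6ZGY_1 sequence.
vec = parikh_vector("MAHHHHHHAALRQPQ")
print(dict(zip(AMINO_ORDER, vec)))
```

Each column of the table sums to the length of its sequence: 399 for 6ZGY_1, 217 for 7NSY_1 and 548 for 7LUE_1.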

>6ZGY_1|Chains A, B, C[auth D], D[auth E]|Galactokinase|Homo sapiens (9606)
>7NSY_1|Chains A[auth AAA], B[auth BBB]|Isoform A of Peptidoglycan-recognition protein LB|Drosophila melanogaster (7227)
>7LUE_1|Chains A, B, C|Fusion glycoprotein F0|Respiratory syncytial virus A2 (1972429)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6ZGY , Knot 163 399 0.81 40 205 366
MAHHHHHHAALRQPQVAELLAEARRAFREEFGAEPELAVSAPGRVNLIGEHTDYNQGLVLPMALELMTVLVGSPRKDGLVSLLTTSEGADEPQRLQFPLPTAQRSLEPGTPRWANYVKGVIQYYPAAPLPGFSAVVVSSVPLGGGLSSSASLEVATYTFLQQLCPDSGTIAARAQVCQQAEHSFAGMPCGIMDQFISLMGQKGHALLIDCRSLETSLVPLSDPKLAVLITNSNVRHSLASSEYPVRRRQCEEVARALGAASLREVQLEELEAARDLVSKEGFRRARHVVGEIRRTAQAAAALRRGDYRAFGRLMVESHRSLRDDYEVSCPELDQLVEAALAVPGVYGSRMTGGGFGGCTVTLLEASAAPHAMRHIQEHYGGTATFYLSQAADGAKVLCL
7NSY , Knot 102 217 0.84 40 155 209
GPMQQANLGDGVATARLLSRSDWGARLPKSVEHFQGPAPYVIIHHSYMPAVCYSTPDCMKSMRDMQDFHQLERGWNDIGYSFGIGGDGMIYTGRGFNVIGAHAPKYNDKSVGIVLIGDWRTELPPKQMLDAAKNLIAFGVFKGYIDPAYKLLGHRQVRDTESPGGRLFAEISSWPHFTHINDTEGVSSTTAPVVPHVHPQAAAPQKPHQSPPAAPKV
7LUE , Knot 219 548 0.84 40 248 505
QNITEEFYQSTCSAVSKGYLSALRTGWYTSVITIELSNIKEIKCNGTDAKVKLIKQELDKYKNAVTELQLLMQSTPATNNRARRELPRFMNYTLNNAKKTNVTLSKKRKRRFLGFLLGVGSAIASGVAVSKVLHLEGEVNKIKSALLSTNKAVVSLSNGVSVLTSKVLDLKNYIDKQLLPIVNKQSCSIPNIETVIEFQQKNNRLLEITREFSVNAGVTTPVSTYMLTNSELLSLINDMPITNDQKKLMSNNVQIVRQQSYSIMSIIKEEVLAYVVQLPLYGVIDTPCWKLHTSPLCTTNTKEGSNICLTRTDRGWYCDNAGSVSFFPQAETCKVQSNRVFCDTMNSLTLPSEVNLCNVDIFNPKYDCKIMTSKTDVSSSVITSLGAIVSCYGKTKCTASNKNRGIIKTFSNGCDYVSNKGVDTVSVGNTLYYVNKQEGKSLYVKGEPIINFYDPLVFPSDEFDASISQVNEKINQSLAFIRKSDELLSAIGGYIPEAPRDGQAYVRKDGEWVLLSTFLGSLEVLFQGPGHHHHHHHHSAWSHPQFEK
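The \(\mathrm{LZ}(w)\) column counts the phrases in a Lempel–Ziv (1976) parsing of \(w\). The ratio column divides that count by \(n/\log_{20} n\). The sketch below computes LZ76 complexity with the Kaspar–Schuster scan. Whether the site uses this exact variant is an assumption, so its counts may differ slightly from the table. The function names are illustrative.

```python
import math


def lz76_complexity(s: str) -> int:
    """Number of phrases in the LZ76 parsing of s (Kaspar-Schuster scan)."""
    n = len(s)
    if n < 2:
        return n
    # i: start of the earlier copy being tried, k: 1 + length of current match,
    # l: start of the phrase being built, k_max: longest match found for it.
    i, k, l, c, k_max = 0, 1, 1, 1, 1  # c = 1: the first letter is a phrase
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:        # the phrase copies all the way to the end
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:           # no earlier start extends further: close phrase
                c += 1
                l += k_max
                if l + 1 > n:    # nothing left to parse
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(lz: int, n: int) -> float:
    """LZ(w) / (n / log_20 n), as in the ratio column (requires n >= 2)."""
    return lz / (n / math.log(n, 20))


print(lz76_complexity("0001101001000101"))  # 6, parsed as 0|001|10|100|1000|101
print(round(normalized_lz(163, 399), 4))    # 0.8167
```

For 6ZGY_1 the ratio is about 0.817. The table shows 0.81, so the site appears to truncate to two decimals instead of rounding.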

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-character Protein Data Bank code \(c\).
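The definitions of \(P_w(n)\) and \(p_w(n)\) translate directly into code. The following is a minimal Python sketch. The helper names `subwords` and `subword_count` are illustrative.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_count(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


# Prefix of 6ZGY_1: the 2-subwords are MA, AH, HH, HA, AA, AL.
print(subword_count("MAHHHHHHAAL", 2))  # 6
```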

\(|P_{f(6ZGY_1)}(2) \setminus P_{f(7NSY_1)}(2)|=109\) and \(|P_{f(7NSY_1)}(2) \setminus P_{f(6ZGY_1)}(2)|=59\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(6ZGY_1),f(7NSY_1))=109+59=168\).

Hydrophobic-polar version of Sequence 1 (6ZGY_1):
110000001110010110111010011000111010111011101011100000001111111101101111010001110110000110010010111101000101101011001011100011111111011110011111110001010110001100101001011101010001000111110111001101110010111100001000111100101111100001000110000110000000110111110100101001011001100011001001110100010111110010001110111000001000001001010011011111111010010111111001011010111011001000011010101001101101101
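The sketch below computes \(Z_k\) as the size of the symmetric difference of the two subword sets. It also computes the full Jaccard distance, which divides \(Z_k\) by the size of the union, and the 0/1 hydrophobic-polar encoding. The hydrophobic set `AFGILMPVW` is an assumption: it was inferred by matching the 0/1 string against the 6ZGY_1 sequence, because the page does not state its definition. All function names are illustrative.

```python
# Assumption: letters mapped to 1 (hydrophobic), inferred from the 6ZGY_1
# string on this page; every other letter maps to 0 (polar).
HYDROPHOBIC = frozenset("AFGILMPVW")


def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by |P_x(k) union P_y(k)|."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0


def hydrophobic_polar(w: str) -> str:
    """Replace each residue by 1 (hydrophobic) or 0 (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)


print(hydrophobic_polar("MAHHHHHHAALRQPQ"))  # 110000001110010
print(z_distance("ABAB", "ABBA", 2))          # 1 (only BB is unshared)
```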
Pair \(Z_2\) Length of longest common substring
6ZGY_1,7NSY_1 168 4
6ZGY_1,7LUE_1 155 6
7NSY_1,7LUE_1 193 4
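The last column measures the longest contiguous block shared by the two sequences. A non-contiguous common subsequence would be much longer: 7NSY_1 has 19 alanines and 6ZGY_1 has 51, so those alanines alone give a common subsequence of length 19. The value 6 for 6ZGY_1 and 7LUE_1 is consistent with their shared His tag: 6ZGY_1 begins MAHHHHHH and 7LUE_1 contains HHHHHHHH. The following is a minimal dynamic-programming sketch; the function name is illustrative.

```python
def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y, O(|x||y|) time."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1], y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(longest_common_substring("MAHHHHHHAAL", "GHHHHHHHHSAW"))  # 6
```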

Newick tree

(7NSY_1:94.39,(6ZGY_1:77.5,7LUE_1:77.5):16.89);
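Leaf-to-leaf path lengths can be read off the tree by summing branch lengths. The sketch below parses the tree written in standard parenthesized Newick form and computes these path lengths. The parser is a minimal illustration: it handles no quoted labels or comments, and the function names are hypothetical. For 6ZGY_1 and 7LUE_1 the path length is 2 × 77.5 = 155, which equals the \(Z_2\) value listed for that pair. The page does not say how the tree was built.

```python
import re

TOKEN = re.compile(r"[(),:;]|[^(),:;]+")
PUNCT = {"(", ")", ",", ":", ";"}


def parse_newick(s: str):
    """Parse Newick into nested (name, branch_length, children) tuples."""
    tokens = TOKEN.findall("".join(s.split()))  # labels must not contain spaces
    pos = 0

    def peek() -> str:
        return tokens[pos] if pos < len(tokens) else ";"

    def node():
        nonlocal pos
        children = []
        if peek() == "(":
            pos += 1
            children.append(node())
            while peek() == ",":
                pos += 1
                children.append(node())
            pos += 1  # consume ")"
        name = ""
        if peek() not in PUNCT:
            name = tokens[pos]
            pos += 1
        length = 0.0
        if peek() == ":":
            length = float(tokens[pos + 1])
            pos += 2
        return (name, length, children)

    return node()


def leaf_distances(tree) -> dict:
    """Sum branch lengths along the path between every pair of leaves."""
    dist = {}

    def walk(node) -> dict:
        name, _, children = node
        if not children:
            return {name: 0.0}
        # Distance from this node to each leaf, grouped by child subtree.
        groups = [{leaf: d + child[1] for leaf, d in walk(child).items()}
                  for child in children]
        for i, gi in enumerate(groups):
            for gj in groups[i + 1:]:
                for a, da in gi.items():
                    for b, db in gj.items():
                        dist[frozenset((a, b))] = da + db
        return {leaf: d for g in groups for leaf, d in g.items()}

    walk(tree)
    return dist


tree = parse_newick("(7NSY_1:94.39,(6ZGY_1:77.5,7LUE_1:77.5):16.89);")
for pair, d in leaf_distances(tree).items():
    print(sorted(pair), round(d, 2))
```

The result is 155 for 6ZGY_1–7LUE_1 and 188.78 (= 2 × 94.39) for each pair involving 7NSY_1, since the tree is ultrametric.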

Let \(d\) be the Otu--Sayood distance \(d\).
Let \(d_1\) be the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
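The sketch below assumes the usual Otu–Sayood construction. Here \(c\) is LZ76 complexity and \(xy\) is concatenation. The distance \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) is the larger one-sided increment. The distance \(d_1(x,y)\) is the sum of the two increments, which is consistent with the formula for \(\delta\) at the end of this section. The LZ76 routine is the Kaspar–Schuster scan and may not match the site's implementation exactly. The function names are illustrative.

```python
def lz76_complexity(s: str) -> int:
    """Number of phrases in the LZ76 parsing of s (Kaspar-Schuster scan)."""
    n = len(s)
    if n < 2:
        return n
    i, k, l, c, k_max = 0, 1, 1, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:        # last phrase runs to the end of s
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:           # close the current phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1) from the two one-sided LZ76 increments (assumed definitions)."""
    a = lz76_complexity(x + y) - lz76_complexity(x)  # increase when y is appended to x
    b = lz76_complexity(y + x) - lz76_complexity(y)  # increase when x is appended to y
    return max(a, b), a + b


print(otu_sayood("ab", "ab"))  # (1, 2): c("abab") = 3, c("ab") = 2
```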
A rough estimate of the expected distance between 6ZGY_1 and 7NSY_1 is \((0.85)(0.8)\left(\frac{616}{\log_{20} 616}-\frac{217}{\log_{20} 217}\right)\approx 113\), where \(616=399+217\) is their combined length.
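The following sketch checks the arithmetic of the rough estimate. It takes the constants 0.85 and 0.8 as given, and uses the same normalizer \(n/\log_{20} n\) as the ratio column of the LZ table. The helper name `lz_scale` is illustrative.

```python
import math


def lz_scale(n: int) -> float:
    """n / log_20 n, the normalizer used in the LZ ratio column."""
    return n / math.log(n, 20)


# 616 = 399 + 217: combined length of the 6ZGY_1 and 7NSY_1 sequences.
estimate = 0.85 * 0.8 * (lz_scale(399 + 217) - lz_scale(217))
print(round(estimate))  # 113 (unrounded: about 113.19)
```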
Status Protein1 Protein2 d d1/2
Query variables 6ZGY_1 7NSY_1 142 110
Was not able to put for d
Was not able to put for d1

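In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max = \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2, \end{cases} \]
where \(\min\) and \(\max\) are taken over the two one-sided complexity increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) of the Otu--Sayood distances.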
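The interpolation for \(\delta\) is a one-liner. In this sketch, `a` and `b` stand for the two one-sided increments; these are hypothetical names. The values 78 and 142 used below are not listed on the page. They follow from the table for 6ZGY_1 and 7NSY_1: \(d=142\) is the larger term, and \(d_1/2=110\) makes the two terms sum to 220.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two one-sided increments a, b."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


print(delta(78, 142, 0.0))  # 142.0, the d column
print(delta(78, 142, 0.5))  # 110.0, the d1/2 column
```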