CoV2D Browser™

Parikh vectors
4ZLT_1 8GXM_1 5TNY_1 Letter Amino acid
23 10 16 D Aspartic acid
16 16 15 Q Glutamine
8 13 6 M Methionine
16 5 9 F Phenylalanine
17 5 7 Y Tyrosine
20 5 29 A Alanine
20 6 25 R Arginine
7 1 10 N Asparagine
12 0 0 C Cysteine
35 22 18 E Glutamic acid
29 13 25 G Glycine
18 5 12 H Histidine
22 15 6 K Lysine
32 6 19 T Threonine
4 4 0 W Tryptophan
16 3 22 I Isoleucine
42 13 28 L Leucine
21 3 23 P Proline
28 5 25 S Serine
34 4 37 V Valine
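
The table above lists, for each sequence, how many times each amino acid letter occurs (its Parikh vector). A minimal sketch of that count follows; the function name `parikh_vector` and the constant `ALPHABET` are illustrative and not taken from the CoV2D code, and the letter order simply copies the row order of the table.

```python
from collections import Counter

# Row order of the table above (one-letter amino acid codes).
# This ordering is copied from the page; it is an assumption that
# nothing else depends on it.
ALPHABET = "DQMFYARNCEGHKTWILPSV"

def parikh_vector(w: str, alphabet: str = ALPHABET) -> dict[str, int]:
    """Map each alphabet letter to its number of occurrences in w."""
    counts = Counter(w)
    # Counter returns 0 for letters that never occur, e.g. C in 8GXM_1.
    return {letter: counts[letter] for letter in alphabet}

# Toy input: the first eight residues of 4ZLT_1.
vector = parikh_vector("GPVGEPVA")
print(vector["G"], vector["P"], vector["C"])
```

The entries of a Parikh vector sum to the sequence length, which gives a quick sanity check: the 4ZLT_1 column sums to 420 and the 8GXM_1 column to 154.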

>4ZLT_1|Chains A, B|Putative uncharacterized protein|Cricetid herpesvirus 2 (1605972)
>8GXM_1|Chains A, B|SURP and G-patch domain-containing protein 1|Homo sapiens (9606)
>5TNY_1|Chain A|Serine protease HTRA2, mitochondrial|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4ZLT , Knot 173 420 0.83 40 233 400
GPVGEPVASEINEASKVSSRLLTQDILFRKDRQATISLPIKLPVEDIITQTCDKITYGPLKFLDLLEKETAVLPLSTDITCPACLGRAVLVGKWECPAHVAVNESDLTVFGPNKEEHVPQFVTVQQPSDGKMQRLFFAKFLGTEESLAVLRVPGPDGHLCIQEALIHFKELSGAGVCSLWKANDSREEGLEMKQVDCLETTVLENQTCIATTLSKKIYHRLYCGERLMTGGQVSTRVLLTALGFYKRQPYTFHRVPKGMVYVHLIDSGSEDYMEYSECEEVTPGRYEDKQISYTFYTDLFQTADGEPVLASVWGTSGLKDSAYESCAFVIPTDGEEDLVPRRIMSKCYPFRLTYHPSTMTVRLDVRVEKHHGATDQGFVFLKMESGTYSEGREYYLDRVLWGEDSSTNNVLQHHHHHHHH
8GXM , Knot 73 154 0.79 38 109 149
GVTELSDAQKKQLKEQQEMQQMYDMIMQHKRAMQDMQLLWEKAVQQHQHGYDSDEEVDSELGTWEHQLRRMEMDKTREWAEQLTKMGRGKHFIGDFLPPDELEKFMETFKALKEGREPDYSEYKEFKLTVENIGYQMLMKMGWKEGEGLGSEGQ
5TNY , Knot 139 332 0.81 36 177 307
MAVPSPPPASPRSQYNFIADVVEKTAPAVVYIEILDRHPFLGREVPISNGSGFVVAADGLIVTNAHVVADRRRVRVRLLSGDTYEAVVTAVDPVADIATLRIQTKEPLPTLPLGRSADVRQGEFVVAMGSPFALQNTITSGIVSSAQRPARDLGLPQTNVEYIQTDAAIDFGNSGGPLVNLDGEVIGVNTMKVTAGISFAIPSDRLREFLHRGEKKNSSSGISGSQRRYIGVMMLTLSPSILAELQLREPSFPDVQHGVLIHKVILSSPAHRAGLRPGDVILAIGEQMVQNAEDVYEAVRTQSQLAVQIRRGRETLTLYVTPEVTEHHHHHH
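
The fourth column normalizes the LZ-complexity by \(n/\log_{20} n\). The sketch below recomputes that ratio from the LZ-complexity and length columns only; it does not compute \(\mathrm{LZ}(w)\) itself, since the exact parsing variant used by the page is not stated. The helper name `normalized_lz` is hypothetical, and truncating to two decimals (rather than rounding) is an assumption chosen because it reproduces the displayed values.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n) for a word of length n over b letters."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"4ZLT": (173, 420), "8GXM": (73, 154), "5TNY": (139, 332)}

for code, (lz, n) in rows.items():
    ratio = normalized_lz(lz, n)
    # Truncate to two decimals (assumed display convention of the page).
    print(code, math.floor(ratio * 100) / 100)
```

For 8GXM the untruncated ratio is about 0.797, so rounding would give 0.80 while truncation gives the 0.79 shown in the table.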

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
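
Reading \(P_w(n)\) as the set of distinct contiguous subwords of length \(n\) (an interpretation consistent with the \(p_w(2)\) and \(p_w(3)\) columns), the definitions can be sketched as follows. The names `subwords` and `subword_complexity` are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous length-k subwords of w."""
    # range() is empty when k > len(w), so the result is the empty set.
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy example: ABAB has only two distinct subwords of length 2.
print(sorted(subwords("ABAB", 2)), subword_complexity("ABAB", 2))
```

Over a 20-letter alphabet \(p_w(2)\) is at most 400, a bound that no sequence in the table reaches at length 2.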

\(|P_{f(4ZLT_1)}(2) \setminus P_{f(8GXM_1)}(2)|=160\), \(|P_{f(8GXM_1)}(2) \setminus P_{f(4ZLT_1)}(2)|=36\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1:
111101110010010010001100011100000101011101110011000000100111011011000011111000100110110111110100110111000010111100000110110100100101001111011100001111011110101010011101001011110011010000001101001001000110000011001000100010010011011010001110111100001001001101110101100100001000000010110000001000100011001010111101110011000100001111100100011100110000110100010010101010100001100011111010010000100001001111000000011000000000
Pair \(Z_2\) Length of longest common subsequence
4ZLT_1,8GXM_1 196 4
4ZLT_1,5TNY_1 174 6
8GXM_1,5TNY_1 176 3
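
The \(Z_2\) column is the sum of the two set differences defined above; for the first pair, \(160+36=196\). A minimal sketch of \(Z_k\) follows, with illustrative function names; it is equivalent to the size of the symmetric difference of the two subword sets.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: AAB has {AA, AB}, ABB has {AB, BB}; they differ in AA and BB.
print(z_distance("AAB", "ABB", 2))
```

Dividing by the size of the union of the two sets would turn this numerator into a Jaccard distance in \([0,1]\).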

Newick tree

 
[
	8GXM_1:95.09,
	[
		4ZLT_1:87,5TNY_1:87
	]:8.09
]

Let d denote the distance called d by Otu and Sayood.
Let d1 denote the distance called d1 by Otu and Sayood. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{574 }{\log_{20} 574}-\frac{154}{\log_{20}154})\approx 121.\)
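
The arithmetic behind the expected-distance estimate can be checked directly. In the sketch below, 574 is read as \(420+154\), the combined length of 4ZLT_1 and 8GXM_1; that reading is an inference from the length column, and the helper name `capacity` is hypothetical.

```python
import math

def capacity(n: int, alphabet_size: int = 20) -> float:
    """n / log_b n, the normalizing term used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

# Lengths of 4ZLT_1 and 8GXM_1 from the table of sequences.
n_x, n_y = 420, 154

# The factors 0.85 and 0.8 are taken verbatim from the formula above.
expected = 0.85 * 0.8 * (capacity(n_x + n_y) - capacity(n_y))
print(expected)
```

The result lies between 121 and 122, so the value 121 on the page appears to be truncated rather than rounded.
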
Status Protein1 Protein2 d d1/2
Query variables 4ZLT_1 8GXM_1 154 103.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
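
The interpolation can be illustrated numerically. In the sketch below the larger term is the reported \(d=154\); the smaller term, 53, is not shown on the page and is inferred here from \(d_1/2=103.5\) as \(2(103.5)-154\), so it should be treated as an assumption. The function name `delta` is illustrative.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# 154 is the reported d; 53 is inferred from d1/2 = 103.5 (assumption).
larger, smaller = 154, 53

print(delta(larger, smaller, 0))    # alpha = 0 recovers the max, i.e. d
print(delta(larger, smaller, 0.5))  # alpha = 1/2 gives the average, i.e. d1/2
```

At \(\alpha=0\) only the maximum contributes, and at \(\alpha=1/2\) the result is the mean of the two terms, matching the two cases of the formula.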