CoV2D BrowserTM

Parikh vectors
3GPB_1 2GQE_1 3HVM_1 Letter Amino acid
12 1 4 W Tryptophan
62 3 15 V Valine
63 0 6 R Arginine
65 2 22 E Glutamic acid
38 0 13 F Phenylalanine
36 3 22 T Threonine
51 2 21 I Isoleucine
21 1 8 M Methionine
46 1 20 N Asparagine
49 2 22 D Aspartic acid
9 4 8 C Cysteine
48 2 18 G Glycine
36 0 13 Y Tyrosine
63 2 19 A Alanine
78 1 41 L Leucine
48 3 30 K Lysine
29 0 12 S Serine
30 1 12 Q Glutamine
22 1 11 H Histidine
36 3 13 P Proline
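
As a small illustration of how a column of the table above can be obtained, the following sketch counts letter occurrences in one sequence. The function name `parikh_vector` and the alphabet ordering are illustrative assumptions, not taken from the CoV2D code; the sequence is the 2GQE_1 sequence shown on this page.

```python
from collections import Counter

# Standard one-letter codes for the 20 amino acids (ordering is arbitrary here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Count occurrences of each amino-acid letter, keeping zeros."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


seq_2gqe = "GHMVIGTWDCDTCLVQNKPEAIKCVACETPKP"  # 2GQE_1, copied from this page
vector = parikh_vector(seq_2gqe)
print(vector["C"], vector["W"], vector["R"], sum(vector.values()))  # 4 1 0 32
```

The printed counts agree with the 2GQE_1 column (C: 4, W: 1, R: 0), and the entries sum to the sequence length 32.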

>3GPB_1|Chain A|GLYCOGEN PHOSPHORYLASE B|Oryctolagus cuniculus (9986)
>2GQE_1|Chain A|Nuclear pore complex protein Nup153|Homo sapiens (9606)
>3HVM_1|Chain A|AGMATINE DEIMINASE|Helicobacter pylori (85963)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3GPB , Knot 321 842 0.85 40 316 771
SRPLSDQEKRKQISVRGLAGVENVTELKKNFNRHLHFTLVKDRNVATPRDYYFALAHTVRDHLVGRWIRTQQHYYEKDPKRIYYLSLEFYMGRTLQNTMVNLALENACDEATYQLGLDMEELEEIEEDAGLGNGGLGRLAACFLDSMATLGLAAYGYGIRYEFGIFNQKICGGWQMEEADDWLRYGNPWEKARPEFTLPVHFYGRVEHTSQGAKWVDTQVVLAMPYDTPVPGYRNNVVNTMRLWSAKAPNDFNLKDFNVGGYIQAVLDRNLAENISRVLYPNDNFFEGKELRLKQEYFVVAATLQDIIRRFKSSKFGCRDPVRTNFDAFPDKVAIQLNDTHPSLAIPELMRVLVDLERLDWDKAWEVTVKTCAYTNHTVIPEALERWPVHLLETLLPRHLQIIYEINQRFLNRVAAAFPGDVDRLRRMSLVEEGAVKRINMAHLCIAGSHAVNGVARIHSEILKKTIFKDFYELEPHKFQNKTNGITPRRWLVLCNPGLAEIIAERIGEEYISDLDQLRKLLSYVDDEAFIRDVAKVKQENKLKFAAYLEREYKVHINPNSLFDVQVKRIHEYKRQLLNCLHVITLYNRIKKEPNKFVVPRTVMIGGKAAPGYHMAKMIIKLITAIGDVVNHDPVVGDRLRVIFLENYRVSLAEKVIPAADLSEQISTAGTEASGTGNMKFMLNGALTIGTMDGANVEMAEEAGEENFFIFGMRVEDVDRLDQRGYNAQEYYDRIPELRQIIEQLSSGFFSPKQPDLFKDIVNMLMHHDRFKVFADYEEYVKCQERVSALYKNPREWTRMVIRNIATSGKFSSDRTIAQYAREIWGVEPSRQRLPAPDEKIP
2GQE , Knot 23 32 0.83 32 30 30
GHMVIGTWDCDTCLVQNKPEAIKCVACETPKP
3HVM , Knot 142 330 0.83 40 192 310
MKRMLAEFEKIQAILMAFPHEFSDWAYCIKEARESFLNIIQTIAKHAKVLVCVHTNDTIGYEMLKNLPGVEIAKVDTNDTWARDFGAISIENHGVLECLDFGFNGWGLKYPSNLDNQVNFKLKSLGFLKHPLKTMPYVLEGGSIESDGAGSILTNTQCLLEKNRNPHLNQNGIETMLKKELGAKQVLWYSYGYLKGDDTDSHTDTLARFLDKDTIVYSACEDKNDEHYTALKKMQEELKTFKKLDKTPYKLIPLEIPKAIFDENQQRLPATYVNFLLCNDALIVPTYNDPKDALILETLKQHTPLEVIGVDCNTLIKQHGSLHCVTMQLY
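
The LZ-complexity and normalized columns can be sketched as follows. This is a minimal illustration assuming an LZ76-style left-to-right parsing in which each phrase is the shortest block not occurring at an earlier starting position, with a trailing incomplete phrase counted; the page does not state the exact variant, and the function names are hypothetical.

```python
import math


def lz76_phrase_count(w: str) -> int:
    """Number of phrases in a left-to-right LZ76-style parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it also occurs starting at an earlier
        # position (the earlier occurrence may overlap the phrase itself).
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))


seq_2gqe = "GHMVIGTWDCDTCLVQNKPEAIKCVACETPKP"  # 2GQE_1, copied from this page
lz = lz76_phrase_count(seq_2gqe)
print(lz, round(normalized_lz(lz, len(seq_2gqe)), 2))  # 23 0.83
```

Under these assumptions the sketch reproduces the 2GQE row (23 phrases, ratio 0.83). The quadratic-time search is adequate for sequences of a few hundred residues.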

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
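
The definitions above translate directly into a short sketch; the names `subwords` and `subword_complexity` are illustrative, not taken from the CoV2D code.

```python
def subwords(w: str, k: int) -> set[str]:
    """The set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))


seq_2gqe = "GHMVIGTWDCDTCLVQNKPEAIKCVACETPKP"  # 2GQE_1, copied from this page
print(subword_complexity(seq_2gqe, 2), subword_complexity(seq_2gqe, 3))  # 30 30
```

For 2GQE_1 this gives 30 distinct subwords of length 2 (the block KP occurs twice among the 31 windows) and 30 of length 3.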

\(|P_{f(3GPB_1)}(2) \setminus P_{f(2GQE_1)}(2)|=292\), \(|P_{f(2GQE_1)}(2) \setminus P_{f(3GPB_1)}(2)|=6\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1 (3GPB_1):
00110000000010101111100100100010001010110000110100001111001000111011000000000010010010101011001000110111001000100011101001001000111101111011101100110111110101100011110001011101001001100101100101010111010101000001101100011111100011110000110010110101100101001011101011100011001001101000110100101000011111010011001000011000110001011100111010000101111011011101001010011010100010000011101100111011001110010110010001100111111101001001011001110010110101110011011101000110001100100101001000001101001111001111011100110001001001001100100011100110100000101110100000101010011010100100000011001011010001000100111100111110111100110111011011101100011110010111100001011001111101000100110010101010111011101101011010110011000111111010010010001001000000110100110010011101001011001101110000101110000010000010110001001001110011001010000011001001111010000111100011
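
The hydrophobic-polar encoding can be sketched as a letter-by-letter map. The hydrophobic set used here is an assumption: it was chosen so that the first residues of 3GPB_1 reproduce the opening of the binary string above, and the page itself does not state the classification. The names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Assumed hydrophobic residues (encoded as 1); all other residues are 0.
# This set is inferred from the data on this page, not documented by it.
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


prefix_3gpb = "SRPLSDQEKRKQISVRGLAGVENVTELKKNFNRHLHFTLV"  # first 40 residues of 3GPB_1
print(hp_encode(prefix_3gpb))  # 0011000000001010111110010010001000101011
```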
Pair \(Z_2\) Length of longest common subword (substring)
3GPB_1,2GQE_1 298 3
3GPB_1,3HVM_1 176 5
2GQE_1,3HVM_1 188 3
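
A minimal sketch of \(Z_k\), using short toy strings rather than the protein sequences; the function names are illustrative. The last line checks the first table row against the two set differences reported above (292 and 6).

```python
def subwords(w: str, k: int) -> set[str]:
    """The set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|: size of the symmetric difference."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Toy example: P_x(2) = {AB, BC, CA}, P_y(2) = {BC, CD}.
print(z_k("ABCAB", "BCD", 2))  # 3

# The 3GPB_1, 2GQE_1 row is the sum of the two differences reported above.
print(292 + 6)  # 298
```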

Newick tree

 
(
	2GQE_1:13.57,
	(
		3GPB_1:88,3HVM_1:88
	):46.57
);

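Let d be the Otu--Sayood distance \(d\), the larger of \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) for sequences \(x\) and \(y\).
Let d1 be the Otu--Sayood distance \(d_1\), the sum of those two quantities. (This makes the 4TYN sequence AAAAAA a close match...)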
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{874}{\log_{20} 874}-\frac{32}{\log_{20}32})\approx 244\), where \(874=842+32\) is the length of the concatenation of the two sequences.
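
The arithmetic in this estimate can be checked with a few lines. The helper name `capacity` is hypothetical, and the factors 0.85 and 0.8 are simply copied from the formula.

```python
import math


def capacity(n: int, alphabet_size: int = 20) -> float:
    """n / log_{alphabet_size}(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)


expected = 0.85 * 0.8 * (capacity(874) - capacity(32))
print(round(expected))  # 244
```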
Status Protein1 Protein2 d d1/2
Query variables 3GPB_1 2GQE_1 312 161.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
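
As a worked check of the interpolation, the sketch below recovers the two underlying quantities from the table row for 3GPB_1 and 2GQE_1 (d = 312, d1/2 = 161.5) and evaluates \(\delta\) at both values of \(\alpha\). It assumes only the relation displayed above; the variable names are illustrative.

```python
def delta(alpha: float, smaller: float, larger: float) -> float:
    """alpha * min + (1 - alpha) * max."""
    return alpha * smaller + (1 - alpha) * larger


d, half_d1 = 312, 161.5       # values from the table row for 3GPB_1, 2GQE_1
larger = d                    # alpha = 0 gives max = d
smaller = 2 * half_d1 - d     # (min + max) / 2 = d1/2, so min = 11.0

print(smaller)                                                 # 11.0
print(delta(0, smaller, larger), delta(0.5, smaller, larger))  # 312.0 161.5
```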