CoV2D BrowserTM

Parikh vectors
3IEE_1 3VGJ_1 7WKY_1 Letter Amino acid
25 33 14 L Leucine
9 13 5 M Methionine
9 14 5 F Phenylalanine
22 13 5 A Alanine
8 13 4 R Arginine
17 22 6 D Aspartic acid
17 32 8 E Glutamic acid
12 35 7 I Isoleucine
9 12 13 P Proline
0 5 3 W Tryptophan
13 15 7 Y Tyrosine
13 13 9 Q Glutamine
22 17 6 V Valine
2 11 2 C Cysteine
21 24 2 S Serine
15 8 8 T Threonine
15 29 12 N Asparagine
11 16 4 G Glycine
2 6 7 H Histidine
28 42 14 K Lysine
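A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch, not the browser's own code; the names `AMINO_ACIDS` and `parikh_vector` are hypothetical, and the alphabet order is copied from the rows of the table above.

```python
from collections import Counter

# One-letter codes in the row order of the table above (an assumption
# made for readability; any fixed order of the 20 letters would do).
AMINO_ACIDS = "LMFARDEIPWYQVCSTNGHK"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in sequence."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# First 16 residues of the 7WKY_1 sequence, used as a small example.
prefix = "MKKGHHHHHHLVPRGS"
vector = parikh_vector(prefix)
print(dict(zip(AMINO_ACIDS, vector)))
```

Applied to a full FASTA sequence, the same function should give one column of the table.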

>3IEE_1|Chain A|Putative exported protein|Bacteroides fragilis NCTC 9343 (272559)
>3VGJ_1|Chains A, B|Tyrosyl-tRNA synthetase, putative|Plasmodium falciparum (36329)
>7WKY_1|Chains A, B, C|Bromodomain-containing protein 4|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3IEE , Knot 122 270 0.84 38 176 258
GASCSGGDKSKAPVVSTADIENAAEVIKYYNTSLGVLKDMVKEKDVNAVLDYMEQKGKTPALSAIVPPAVVSKDSAIVLNPGNCFNEETRRNLKQNYTGLFQARTEFYANFDTYLSYLKKKDVTNAKKLLDVNYQLSTQMSEYKQNIFDILSPFTEQAELVLLVDNPLKAQIMSVRKMSSTMQSILNLYARKHRMDGPRIDLKVAELTKQLDAAKKLPVVNGHEGEMKSYQAFLSQVETFIKQVKKVREKGEYSDADYDMLTSAFETSII
3VGJ , Knot 162 373 0.85 40 221 352
METTDTKREEQEIEEKKAQEESKIEDVDKILNDILSISSECIQPDELRVKLLLKRKLICYDGFEPSGRMHIAQGLLKSIIVNKLTSNGCTFIFWIADWFAHLNNKMSGDLKKIKKVGSYFIEVWKSCGMNMENVQFLWASEEINKKPNEYWSLVLDISRSFNINRMKRCLKIMGRSEGEENYCSQILYPCMQCADIFFLNVDICQLGIDQRKVNMLAREYCDIKKIKKKPVILSHGMLPGLLEGQEKMSKSDENSAIFMDDSESDVNRKIKKAYCPPNVIENNPIYAYAKSIIFPSYNEFNLVRKEKNGGDKTYYTLQELEHDYVNGFIHPLDLKDNVAMYINKLLQPVRDHFQNNIEAKNLLNEIKKYKVTK
7WKY , Knot 70 141 0.82 40 118 133
MKKGHHHHHHLVPRGSNPPPPETSNPNKPKRQTNQLQYLLRVVLKTLWKHQFAWPFQQPVDAVKLNLPDYYKIIKTPMDMGTIKKRLENNYYWNAQECIQDFNTMFTNCYIYNKPGDDIVLMAEALEKLFLQKINELPTEE
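The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. This sketch only redoes that arithmetic; the function name `normalized_lz` is hypothetical, and the rounding convention used by the page is not known, so the last digit may differ from the tabulated value.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the alphabet size (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"3IEE": (122, 270), "3VGJ": (162, 373), "7WKY": (70, 141)}

for code, (lz, n) in rows.items():
    print(code, f"{normalized_lz(lz, n):.3f}")
```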

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
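As an illustration of these definitions, the sketch below collects the distinct subwords of a given length and counts them. The function names `subwords` and `subword_complexity` are hypothetical, and the example word is just the first seven residues of the 3IEE sequence; the output has not been checked against the tabulated values.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords (intervals) of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct subwords of length k in w."""
    return len(subwords(w, k))

word = "GASCSGG"  # first seven residues of the 3IEE sequence
for k in (1, 2, 3):
    print(k, sorted(subwords(word, k)), subword_complexity(word, k))
```

A word of length \(n\) has \(n-k+1\) windows of length \(k\), so the count can never exceed that number.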

\(|P_{f(3IEE_1)}(2) \setminus P_{f(3VGJ_1)}(2)|=62\), \(|P_{f(3VGJ_1)}(2) \setminus P_{f(3IEE_1)}(2)|=107\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110001100001111001010011011000000111100110000101110010001001110111111110000111101100100000001000001110100010101000100100001001001101000100010000001101101100010111110011010110100100010011010100001011010101101000101100111101001010000111001001100100100010000100011001100011
Pair \(Z_2\) Length of longest common subword
3IEE_1,3VGJ_1 169 4
3IEE_1,7WKY_1 168 3
3VGJ_1,7WKY_1 195 4
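The \(Z_k\) column is the size of the symmetric difference of the two subword sets. A minimal sketch follows, with hypothetical function names and short toy words rather than the full sequences; the final line only rechecks the arithmetic \(62+107=169\) for the pair 3IEE_1, 3VGJ_1.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|, the symmetric difference size."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

x, y = "GASCSGG", "GGASK"  # toy words, not full protein sequences
print(z_k(x, y, 2))        # subwords only in x: SC, CS, SG; only in y: SK

# The two one-sided counts reported for 3IEE_1 and 3VGJ_1 add up to Z_2.
print(62 + 107)
```

Python's set operator `^` computes the same symmetric difference in one step, so `len(px ^ py)` is an equivalent formulation.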

Newick tree

 
(
	3VGJ_1:93.51,
	(
		3IEE_1:84,7WKY_1:84
	):9.51
);

Let \(d\) denote the Otu--Sayood distance \(d\).
Let \(d_1\) denote the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{643}{\log_{20} 643}-\frac{270}{\log_{20}270})\approx 104\), where \(643=270+373\) is the combined length of the two sequences.
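The arithmetic behind this estimate can be rechecked directly. The sketch below only evaluates the stated expression; the factors 0.85 and 0.8 are taken from the text as given, and the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The quantity n / log_b(n) for alphabet size b."""
    return n / math.log(n, alphabet_size)

n1, n2 = 270, 373  # lengths of the 3IEE_1 and 3VGJ_1 sequences
expected = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n1))
print(round(expected))  # rounds to 104
```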
Status Protein1 Protein2 d d1/2
Query variables 3IEE_1 3VGJ_1 131 111.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
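The interpolation above can be tried out numerically. The sketch below assumes, following the usual statement of the Otu--Sayood distances, that min and max range over the two complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), so that \(d\) is their maximum and \(d_1\) their sum. The function `lz76` is a simple quadratic-time LZ76-style parsing and may not be the exact variant used by the page; all function names are hypothetical. The value 92 for the minimum is inferred from the tabulated \(d=131\) and \(d_1/2=111.5\) under that assumption.

```python
def lz76(s):
    """Number of phrases in a simple LZ76-style exhaustive parsing of s."""
    i, phrases = 0, 0
    while i < len(s):
        length = 1
        # Grow the phrase while it already occurs in the text that
        # precedes its last symbol (overlap with the phrase is allowed).
        while i + length <= len(s) and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def increments(x, y, c=lz76):
    """The two complexity increments c(xy) - c(x) and c(yx) - c(y)."""
    return c(x + y) - c(x), c(y + x) - c(y)

def delta(alpha, lo, hi):
    """alpha * min + (1 - alpha) * max."""
    return alpha * lo + (1 - alpha) * hi

# Toy example on two short words.
a, b = increments("GASCSGG", "GGASK")
print(delta(0, min(a, b), max(a, b)), delta(0.5, min(a, b), max(a, b)))

# Check against the table: max = d = 131 and min + max = d_1 = 223.
print(delta(0, 92, 131), delta(0.5, 92, 131))
```

With \(\alpha=0\) the formula returns the maximum, and with \(\alpha=1/2\) it returns the average of the two increments, which is \(d_1/2\).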