CoV2D BrowserTM

Parikh vectors
9DHS_1 3CPK_1 4YSB_1 Letter Amino acid
29 25 20 A Alanine
11 4 9 N Asparagine
17 8 18 D Aspartic acid
39 4 13 S Serine
9 1 0 W Tryptophan
8 7 11 Q Glutamine
27 4 9 I Isoleucine
34 1 4 K Lysine
14 4 3 M Methionine
24 8 19 T Threonine
19 3 6 Y Tyrosine
33 15 22 V Valine
16 10 17 R Arginine
7 0 2 C Cysteine
36 16 23 L Leucine
13 12 10 P Proline
32 9 12 E Glutamic acid
37 13 17 G Glycine
3 3 10 H Histidine
22 3 8 F Phenylalanine
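
A Parikh vector simply records how often each letter occurs in a word, so each column above can be recomputed from the corresponding sequence. The following is a minimal sketch, not the site's own code; the function name `parikh_vector` and the alphabetical letter order are assumptions made for illustration.

```python
from collections import Counter

# The 20 standard amino-acid letters. The alphabetical order is an
# arbitrary choice for this sketch; the table above uses another order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(w: str) -> dict[str, int]:
    """Count occurrences of each amino-acid letter in w, zeros included."""
    counts = Counter(w)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


# Toy word, not one of the sequences on this page.
vector = parikh_vector("MKLHTDPATA")
print(vector["A"], vector["T"], sum(vector.values()))
```

The entries of a Parikh vector sum to the length of the word: the 9DHS_1 column sums to 430 and the 3CPK_1 column to 150, matching the lengths listed for those sequences.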

>9DHS_1|Chains A, B, C, D|Isoform Flip of Glutamate receptor 2|Rattus norvegicus (10116)
>3CPK_1|Chain A|Uncharacterized protein Q7W7N7_BORPA|Bordetella parapertussis 12822 (257311)
>4YSB_1|Chains A, B|Metallo-beta-lactamase family protein|Myxococcus xanthus (strain DK 1622) (246197)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9DHS , Knot 179 430 0.84 40 229 406
EQKTVVVTTILESPYVMMKKNHEMLEGNERYEGYCVDLAAEIAKHCGFKYKLTIVGDGKYGARDADTKIWNGMVGELVYGKADIAIAPLTITLVREEVIDFSKPFMSLGISIMIKKPQKSKPGVFSFLDPLAYEIWMCIVFAYIGVSVVLFLVSRFSPYEWHTEEFEDGRETQSSESTNEFGIFNSLWFSLGAFMQQGCDISPRSLSGRIVGGVWWFFTLIIISSYTANLAAFLTVERMVSPIESAEDLSKQTEIAYGTLDSGSTKEFFRRSKIAVFDKMWTYMRSAEPSVFVRTTAEGVARVRKSKGKYAYLLESTMNEYIEQRKPCDTMKVGGNLDSKGYGIATPKGSSLGTPVNLAVLKLSEQGVLDKLKNKWWYDKGECGAKDSGSKEKTSALSLSNVAGVFYILVGGLGLAMLVALIEFCYKSRA
3CPK , Knot 71 150 0.79 38 101 142
MKLHTDPATALNTVTAYGDGYIEVNQVRFSHAIAFAPEGPVASWPVQRPADITASLLQQAAGLAEVVRDPLAFLDEPEAGAGARPANAPEVLLVGTGRRQHLLGPEQVRPLLAMGVGVEAMDTQAAARTYNILMAEGRRVVVALLPDGDS
4YSB , Knot 105 233 0.81 38 154 225
MIFRQLFDSESSTYTYLIGDEATRQAVLIDPVLEQVDRDLQMVAELDLTLTHVFDTHVHADHITASGALRERTQATVVGSVNGASCANVQVRHGDEVRVGQLVFQVLATPGHTDDSISYLLGDRVFTGDALLVRGNGRTDFQNGNASQLYDSLTRVLFTLPDETLVYPGHDYKGRTVTSIAEEKRHNPRVAGKSREEFIHIMENLNLPRPKLIDAAVPANRACGHTAPSPQGA
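
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. This sketch takes the tabulated values as given and does not compute \(\mathrm{LZ}(w)\) itself; the function name `normalized_lz` is an assumption for illustration.

```python
import math


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))


# (code, LZ-complexity, length) as listed in the table above.
rows = [("9DHS", 179, 430), ("3CPK", 71, 150), ("4YSB", 105, 233)]
for code, lz, n in rows:
    print(code, round(normalized_lz(lz, n), 4))
```

The results agree with the tabulated values up to rounding in the last digit.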

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
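
As an illustration of these definitions, here is a minimal sketch that collects the distinct contiguous subwords of a given length and counts them. The names `subwords` and `subword_complexity` are assumptions for this example, not identifiers used by the site.

```python
def subwords(w: str, n: int) -> set[str]:
    """Return P_w(n): the distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w: str, n: int) -> int:
    """Return p_w(n), the cardinality of P_w(n)."""
    return len(subwords(w, n))


# Toy word: "abab" has the length-2 subwords "ab" and "ba".
print(sorted(subwords("abab", 2)), subword_complexity("abab", 2))
```

Since a word of length \(|w|\) has only \(|w|-n+1\) positions where a subword of length \(n\) can start, \(p_w(n)\le |w|-n+1\).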

\(|P_{f(9DHS_1)}(2) \setminus P_{f(3CPK_1)}(2)|=161\), \(|P_{f(3CPK_1)}(2) \setminus P_{f(9DHS_1)}(2)|=33\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
0000111001100101110000011010000010010111011000110001011101001100100011011110110101011111101011000110100111011101110010000111101101110011101111011101111110010100100001001000000000001111001110111110010010100101011111111101111000010111110100110110010010000011010100100001100001111001100100101011100010111010000100101100010001000010001011101000101110101001101101111010001110010001100010011000100000011010011111011111111111111101000001
Pair \(Z_2\) Length of longest common subword
9DHS_1,3CPK_1 194 4
9DHS_1,4YSB_1 179 3
3CPK_1,4YSB_1 139 4
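
The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords; for the first pair, \(161+33=194\). A minimal sketch of that computation follows, with the function names `subwords` and `z` chosen for this example only.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x: str, y: str, k: int) -> int:
    """Return Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Toy words: "abba" has the subword "bb", which "abab" lacks.
print(z("abab", "abba", 2))
```

Dividing \(Z_k(x,y)\) by \(|P_x(k)\cup P_y(k)|\) would give the usual Jaccard distance between the two subword sets.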

Newick tree

 
[
	9DHS_1:10.01,
	[
		4YSB_1:69.5,3CPK_1:69.5
	]:30.51
]

Let \(d\) denote the distance called \(d\) by Otu and Sayood.
Let \(d_1\) denote their distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{580}{\log_{20} 580}-\frac{150}{\log_{20}150}\right)\approx 124.\)
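
The arithmetic in this estimate can be checked directly. The sketch below only evaluates the stated expression; the helper name `lz_scale` is an assumption, and the reading of 580 as the combined length \(430+150\) of the two query sequences is an inference from the table of lengths, not something the page states.

```python
import math


def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_b n, the normalizing scale for LZ-complexity."""
    return n / math.log(n, alphabet_size)


# Factors 0.85 and 0.8 are taken from the text as given.
estimate = 0.85 * 0.8 * (lz_scale(580) - lz_scale(150))
print(round(estimate, 1))
```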
Status Protein1 Protein2 d d1/2
Query variables 9DHS_1 3CPK_1 161 106.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
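
The interpolation can be sketched numerically. In this example the two arguments stand for the quantities whose minimum and maximum enter the formula. The value 52 is not stated on the page: it is inferred from the tabulated \(d=161\) and \(d_1/2=106.5\) under the assumption that the formula above holds, since \(2(106.5)-161=52\). The function name `delta` is likewise an assumption.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# Inferred pair (52, 161); see the assumption stated above.
print(delta(52, 161, 0))    # alpha = 0 selects the maximum, i.e. d
print(delta(52, 161, 0.5))  # alpha = 1/2 gives the average, i.e. d_1 / 2
```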