CoV2D Browser™

Parikh vectors
7GSL_1 3NXZ_1 7KZD_1 Letter Amino acid
23 18 44 K Lysine
25 12 21 S Serine
17 14 24 V Valine
18 10 24 D Aspartic acid
4 1 7 C Cysteine
14 5 16 Q Glutamine
13 5 18 T Threonine
6 1 1 W Tryptophan
11 6 34 N Asparagine
30 15 33 E Glutamic acid
14 7 24 G Glycine
11 2 10 M Methionine
14 9 17 F Phenylalanine
9 3 17 Y Tyrosine
18 13 45 I Isoleucine
29 19 48 L Leucine
22 6 15 P Proline
13 11 20 A Alanine
19 9 16 R Arginine
11 4 11 H Histidine
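
The Parikh vector of a sequence records how many times each letter occurs in it. The following is a minimal sketch of that count; the function name `parikh_vector` and the alphabet ordering are illustrative assumptions, not part of the site.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes.
# Assumption: this alphabetical ordering is arbitrary and chosen for illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# Small illustration on the first ten residues of 7GSL_1.
example = parikh_vector("MEMEKEFEQI")
print(example["E"], example["M"], example["C"])
```

Applied to a full FASTA sequence, each entry of the returned dict corresponds to one row of the table above.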

>7GSL_1|Chain A|Tyrosine-protein phosphatase non-receptor type 1|Homo sapiens (9606)
>3NXZ_1|Chains A, B, C, D|Urease accessory protein ureE|Helicobacter pylori (210)
>7KZD_1|Chains A, B, C, D, E, F, G, H|Aminotransferase class I/II-fold pyridoxal phosphate-dependent enzyme|Bacillus cereus (1396)
Protein code \(c\) | LZ-complexity \(\mathrm{LZ}(w)\) | Length \(n=\lvert w\rvert\) | \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) | \(p_w(1)\) | \(p_w(2)\) | \(p_w(3)\) | Sequence \(w=f(c)\)
7GSL | 141 | 321 | 0.84 | 40 | 205 | 309
MEMEKEFEQIDKSGSWAAIYQDIRHEASDFPSRVAKLPKNKNRNRYRDVSPFDHSRIKLHQEDNDYINASLIKMEEAQRSYILTQGPLPNTVGHFWEMVWEQKSRGVVMLNRVMEKGSLKCAQYWPQKEEKEMIFEDTNLKLTLISEDIKSYYTVRQLELENLTTQETREILHFHYTTWPDFGVPESPASFLNFLFKVRESGSLSPEHGPVVVHCSAGIGRSGTFCLADTCLLLMDKRKDPSSVDIKKVLLEMRKFRMGLIQTADQLRFSYLAVIEGAKFIMGDSSVQDQWKELSHEDLEPPPEHIPPPPRPPKRILEPHN
3NXZ | 83 | 170 | 0.83 | 40 | 120 | 163
MIIERLVGNLRDLNPLDFSVDHVDLEWFETRKKIARFKTRQGKDIAIRLKDAPKLGLSQGDILFKEEKEIIAVNILDSEVIHIQAKSVAEVAKICYEIGNRHAALYYGESQFEFKTPFEKPTLALLEKLGVQNRVLSSKLDSKERLTVSMPHSEPNFKVSLASDFKVVVK
7KZD | 183 | 445 | 0.83 | 40 | 235 | 419
GAMDMKTLTTISGHSKDNLALLKCLQGETKEKEFEISNVLPNHKMKEKLFRENKLKIDIDIEKDIFNYSRKNIQKIEFMPVNRLISQSEIDGIIGTLKEVLPTGQFTSGPFSKKLEEVIGDYLNKKYVIATSSGTDALMVSLLSIGIQPGDEVIMPANSFAATENAVLAIGAKPVFVDIDHKSYCIDPLKIEEAITQKTKCILPVHLYGKQCDMKRIREIADVYQLRIIEDACQAIGSSNLGEYGDIIILSFNPYKNFGVCGKAGAIVTNNENLAIRCNQYSYHGFEVDKKNKKVLDFGFNSKIDNLQAAIGLERIKFLSYNNLKRVFLAQRYIRNLKELEDRELIKLPRMTEDNVWHLFPIRIINGRRDEVKNKLYQLYNIETDIYYPVLSHKHNTKLVKKNYMQDTLLNTEQVHKEILHLPLHPNMLLEEQNFVLEGLINVNK
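
The normalized column divides the LZ-complexity by \(n/\log_{20} n\), where 20 is the size of the amino acid alphabet. The sketch below recomputes that ratio from the tabulated \(\mathrm{LZ}(w)\) and \(n\); the function name `normalized_lz` is a hypothetical helper, and the tabulated values appear to be cut to two decimals.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n), with b the alphabet size (20 for proteins)."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ-complexity, length) taken from the table above.
rows = [("7GSL", 141, 321), ("3NXZ", 83, 170), ("7KZD", 183, 445)]
ratios = {code: normalized_lz(lz, n) for code, lz, n in rows}
for code, value in ratios.items():
    print(code, round(value, 3))
```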

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
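
As an illustration of these definitions, the set of distinct length-\(k\) subwords and its cardinality can be computed by sliding a window over the word. This is a minimal sketch; the names `subwords` and `subword_complexity` are assumptions introduced here.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w, i.e. P_w(k)."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Cardinality p_w(k) of the set of distinct length-k subwords."""
    return len(subwords(w, k))

w = "MEMEKEFEQI"  # first ten residues of 7GSL_1, used only as a small example
print(subword_complexity(w, 1), subword_complexity(w, 2))
```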

\(|P_{f(7GSL_1)}(2) \setminus P_{f(3NXZ_1)}(2)|=127\), \(|P_{f(3NXZ_1)}(2) \setminus P_{f(7GSL_1)}(2)|=42\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
101000100100010111100010001001100110110000000000101100001010000000101011010010000110011110011011011100000111110011001010010011000000111000010101100010000010010100100000001101000011011110011011011101000101010011111000111100101011000111100000100101001110100101111001001010011110110111100010001001000010111001111101100110100
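
The hydrophobic-polar string replaces each residue by 1 (hydrophobic) or 0 (polar). The sketch below shows such an encoding. The hydrophobic set `AFGILMPVW` is an assumption inferred by comparing the opening residues of Sequence 1 with the displayed string; the site does not state its classification, and `hp_encode` is a hypothetical name.

```python
# Assumption: residues treated as hydrophobic (1). Inferred from the displayed
# string, not documented by the source; all other residues map to polar (0).
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' if hydrophobic and '0' otherwise."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

print(hp_encode("MEMEKEFEQI"))  # first ten residues of 7GSL_1
```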
Pair | \(Z_2\) | Length of longest common subsequence
7GSL_1,3NXZ_1 | 169 | 5
7GSL_1,7KZD_1 | 172 | 5
3NXZ_1,7KZD_1 | 185 | 4
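
The quantity \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets. For the pair 7GSL_1, 3NXZ_1 the two one-sided counts 127 and 42 add up to the tabulated \(Z_2 = 169\). A minimal sketch, with `subwords` and `z_distance` as illustrative names:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: P_x(2) = {AB, BA}, P_y(2) = {AB, BC, CA}.
print(z_distance("ABAB", "ABCA", 2))
```

Passing two full FASTA sequences with `k=2` gives the value in the \(Z_2\) column for that pair.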

Newick tree

 
(
	7KZD_1:90.85,
	(
		7GSL_1:84.5,3NXZ_1:84.5
	):6.35
);

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
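
For orientation, here is a minimal sketch of both distances. It assumes the usual definitions \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=(c(xy)-c(x))+(c(yx)-c(y))\), with \(c\) the number of components in the LZ76 exhaustive history. Whether the site uses exactly this parsing variant is an assumption, and the names `lz76` and `otu_sayood` are hypothetical.

```python
def lz76(s):
    """Number of components in the LZ76 exhaustive history of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the component while it can still be copied from earlier text;
        # the copy source may overlap the component itself.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(x, y):
    """Return (d, d1) under the definitions assumed in the lead-in."""
    a = lz76(x + y) - lz76(x)  # new components contributed by y after x
    b = lz76(y + x) - lz76(y)  # new components contributed by x after y
    return max(a, b), a + b

print(lz76("0001101001000101"))  # classic example: 0|001|10|100|1000|101
print(otu_sayood("MEMEKEFEQI", "MIIERLVGNL"))
```

A short, highly repetitive word such as AAAAAA has only two components under this parsing, so appending it to another sequence adds few new components.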
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{491}{\log_{20} 491}-\frac{170}{\log_{20}170})\approx 93.9\)
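
The arithmetic in this estimate can be checked directly. In the sketch below, 491 is read as the combined length 321 + 170 of 7GSL_1 and 3NXZ_1, which is an observation from the table rather than something the page states. The factors 0.85 and 0.8 are taken from the formula as given, and `lz_scale` is a hypothetical helper name.

```python
import math

def lz_scale(n, alphabet_size=20):
    """Return n / log_b n, the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

n_x, n_y = 321, 170  # lengths of 7GSL_1 and 3NXZ_1; n_x + n_y = 491
estimate = 0.85 * 0.8 * (lz_scale(n_x + n_y) - lz_scale(n_y))
print(round(estimate, 2))
```

The result agrees with the quoted 93.9 up to the last displayed digit.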
Status | Protein1 | Protein2 | \(d\) | \(d_1/2\)
Query variables | 7GSL_1 | 3NXZ_1 | 120 | 89

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
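
As a worked check of this interpolation against the query row, assume that min and max denote the smaller and larger of two directed quantities whose maximum is \(d\) and whose sum is \(d_1\). With \(d=120\) and \(d_1/2=89\) this gives max = 120 and min = 2·89 − 120 = 58. The function name `delta` and this reading of min and max are assumptions.

```python
def delta(alpha, smaller, larger):
    """Interpolate: alpha * min + (1 - alpha) * max."""
    return alpha * smaller + (1 - alpha) * larger

# Values derived from the query row under the assumption stated above.
larger = 120               # d, the alpha = 0 case
smaller = 2 * 89 - larger  # from d1/2 = (min + max) / 2 = 89

print(delta(0, smaller, larger), delta(0.5, smaller, larger))
```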