CoV2D Browser™

Parikh vectors
3HBK_1 7RGR_1 1JLK_1 Letter Amino acid
16 10 11 D Aspartic acid
17 17 22 L Leucine
24 14 10 K Lysine
18 13 12 V Valine
5 4 1 W Tryptophan
7 8 6 Q Glutamine
22 9 12 E Glutamic acid
9 12 9 I Isoleucine
13 5 3 F Phenylalanine
13 5 10 S Serine
14 10 7 T Threonine
18 14 8 A Alanine
0 1 1 C Cysteine
8 5 3 Y Tyrosine
5 11 6 R Arginine
11 9 8 N Asparagine
27 12 4 G Glycine
8 2 3 H Histidine
2 5 3 M Methionine
8 2 8 P Proline
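
The counts in the table above can be reproduced with a few lines of Python. This is a minimal sketch, not the site's own code: the names `AMINO_ACIDS` and `parikh_vector` are hypothetical, and the alphabet string simply follows the row order of the table.

```python
from collections import Counter

# Letters in the same order as the rows of the table above.
AMINO_ACIDS = "DLKVWQEIFSTACYRNGHMP"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each letter of `alphabet` in `sequence`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# Illustration on a short fragment (the first 13 residues of 7RGR_1).
fragment = "MAKVVDEFDMLRV"
for letter, count in zip(AMINO_ACIDS, parikh_vector(fragment)):
    if count:
        print(letter, count)
```

Applying `parikh_vector` to a full FASTA sequence should give one column of the table, assuming the table counts single-letter occurrences as described.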

>3HBK_1|Chain A|putative glycosyl hydrolase|Parabacteroides distasonis ATCC 8503 (435591)
>7RGR_1|Chains A, B|Artificial protein L056|synthetic construct (32630)
>1JLK_1|Chains A, B|Response regulator RCP1|Synechocystis sp. (1148)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3HBK , Knot 113 245 0.84 38 166 234
GTEETAKVVPMAVITPAINQLTDQEKAEGWALLFDGKTTKGWRGAHKDAFPDHGWMVKDGELIVQKSDGSESTNGGDIVTEGEYSAFEFSVDFKITEGANSGIKYFVTEQEKQKGSAYGLEFQLLDDAKHPDAKLYTTFPGSRTLGSLYDLKKSENIHFNGVGEWNTAVVKVFPNNHVEHWLNGVKVLEYERGSKEFRDLVKGSKYADPSYNAGGAFGEAPKGHILLQDHGDEVAFRNIKVKELK
7RGR , Knot 79 168 0.80 40 128 160
MAKVVDEFDMLRVDEGLKLTVYQDHLGYWTVGIGHLLTKIKDKAKAIQILDNLLGRKTNGVITEKEARQIFEGDVKKAIQGILSNATLSPIYDILDEVRRCALINMVFQMGVAGVAGFNNSLRMLQEKRWDEAAVNLAQSRWYRQTPNRAKRVISTFKTGTWKAYENL
1JLK , Knot 72 147 0.81 40 112 143
MSDESNPPKVILLVEDSKADSRLVQEVLKTSTIDHELIILRDGLAAMAFLQQQGEYENSPRPNLILLDLNLPKKDGREVLAEIKQNPDLKRIPVVVLTTSHNEDDVIASYELHVNCYLTKSRNLKDLFKMVQGIESFWLETVTLPAA
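
As an illustration of the LZ-complexity and normalized columns, here is a sketch that counts the components of the exhaustive history in the sense of Lempel and Ziv (1976). It is an assumption that the page uses exactly this variant; `lz76` and `normalized` are hypothetical names, and the alphabet size 20 is taken from the \(\log_{20}\) in the column header.

```python
import math

def lz76(w):
    """Count components of the exhaustive history of w (Lempel-Ziv 1976).

    Each component is the shortest prefix of the unparsed remainder that
    cannot be copied from the text that precedes its own last symbol.
    """
    n, i, count = len(w), 0, 0
    while i < n:
        length = 1
        # Extend while w[i:i+length] already occurs inside w[0:i+length-1].
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def normalized(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_a(n), the ratio shown in the table."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76("0001101001000101"))        # 6: 0|001|10|100|1000|101
print(round(normalized(113, 245), 3))  # 0.847, shown as 0.84 in the table
```

The second line of output uses the table's own values for 3HBK (LZ-complexity 113, length 245), so it checks the normalization rather than the parsing.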

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
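
A minimal sketch of these definitions, reading \(P_w(n)\) as the set of contiguous subwords of length \(n\) (the function names `subwords` and `p` are hypothetical):

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

# Illustration on a short fragment (the first 13 residues of 7RGR_1).
fragment = "MAKVVDEFDMLRV"
print(p(fragment, 1), p(fragment, 2))  # 9 12
```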

\(|P_{f(3HBK_1)}(2) \setminus P_{f(7RGR_1)}(2)|=94\), \(|P_{f(7RGR_1)}(2) \setminus P_{f(3HBK_1)}(2)|=56\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (3HBK_1):
10000101111111011100100000101111110100001101100011100111100101110000100000110110010001101010101001100110011000000010101101011001001010100011100011010010000010101110100111011100010011011011000010001001101000101000111111011010111000100111001010010
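
The hydrophobic-polar string can be approximated with a simple residue classification. The hydrophobic set below is an assumption inferred by comparing the opening residues of 3HBK_1 with the binary string above; it is not stated on the page, and cysteine (absent from 3HBK_1) is left in the polar class only by default. The names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Assumed hydrophobic residues, inferred from the start of 3HBK_1.
# Cysteine (C) does not occur in 3HBK_1, so its class is not determined here.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)

# First 20 residues of 3HBK_1.
print(hp_encode("GTEETAKVVPMAVITPAINQ"))  # 10000101111111011100
```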
Pair \(Z_2\) Length of longest common subword
3HBK_1,7RGR_1 150 4
3HBK_1,1JLK_1 150 3
7RGR_1,1JLK_1 152 3
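
The two columns of this table can be sketched as follows. \(Z_k\) is the size of the symmetric difference of the two subword sets. The last column is computed here as the longest common contiguous subword, which is an interpretation based on the small values (3 and 4) reported. All function names are hypothetical.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1], y[:j]
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Opening fragments of 3HBK_1 and 7RGR_1; both contain the block AKVV.
print(longest_common_subword("GTEETAKVVPMAV", "MAKVVDEFDMLRV"))  # 4
print(z("ABAB", "ABBA", 2))  # 1: only BB is unshared
```

On the full sequences, `z(x, y, 2)` should give 94 + 56 = 150 for the first pair, matching the set differences stated above.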

Newick tree

 
(1JLK_1:75.66,(3HBK_1:75,7RGR_1:75):0.66);

Let \(d\) and \(d_1\) be the Otu–Sayood distances of the same names. (The latter makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{413}{\log_{20} 413}-\frac{168}{\log_{20}168}\right)\approx 72.9\), where \(413=245+168\) is the length of the concatenation of 3HBK_1 and 7RGR_1.
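
The arithmetic in this estimate can be checked directly. The sketch below only evaluates the stated expression; the factors 0.85 and 0.8 are taken from the text as given, and `length_scale` is a hypothetical helper name.

```python
import math

def length_scale(n, alphabet_size=20):
    """n / log_a(n), the growth scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

# 413 = 245 + 168: lengths of 3HBK_1 and 7RGR_1 from the table above.
expected = 0.85 * 0.8 * (length_scale(245 + 168) - length_scale(168))
print(round(expected, 2))
```

The expression evaluates to about 72.88.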
Status Protein1 Protein2 d d1/2
Query variables 3HBK_1 7RGR_1 92 76
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
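
A small numerical illustration of this interpolation, under the assumption that \(\min\) and \(\max\) range over the two one-sided complexity increments, so that \(d\) is their maximum and \(d_1\) their sum. With the table's values \(d=92\) and \(d_1/2=76\), the two increments would then be 92 and 60. The value 60 is derived here from that assumption and is not reported on the page.

```python
def delta(a, b, alpha):
    """Interpolate between max(a, b) at alpha = 0 and min(a, b) at alpha = 1."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Hypothetical one-sided increments consistent with d = 92 and d1/2 = 76.
a, b = 92, 60

print(delta(a, b, 0))    # 92, equal to d = max(a, b)
print(delta(a, b, 0.5))  # 76.0, equal to d1/2 = (a + b) / 2
```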