CoV2D Browser™

Parikh vectors
6MRL_1 6PPA_1 5PIR_1 Letter Amino acid
24 6 20 N Asparagine
13 7 14 D Aspartic acid
22 10 20 I Isoleucine
17 11 24 L Leucine
43 2 14 V Valine
18 3 18 R Arginine
36 2 27 G Glycine
4 8 11 M Methionine
17 7 19 F Phenylalanine
4 0 8 W Tryptophan
16 9 15 Q Glutamine
15 13 23 K Lysine
19 6 18 P Proline
35 10 18 S Serine
41 5 20 T Threonine
14 5 19 Y Tyrosine
33 7 25 A Alanine
2 1 7 C Cysteine
10 9 26 E Glutamic acid
3 2 18 H Histidine
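A Parikh vector simply counts how often each letter occurs in a word. Here is a minimal Python sketch that uses `collections.Counter` and lists the counts in the row order of the table above. The names `ORDER` and `parikh_vector` are illustrative and are not taken from the site. The example applies it to 6PPA_1:

```python
from collections import Counter

# Row order of the table above (one-letter amino-acid codes).
ORDER = "NDILVRGMFWQKPSTYACEH"

def parikh_vector(seq: str) -> list[int]:
    """Count occurrences of each amino acid, in ORDER; absent letters count as 0."""
    counts = Counter(seq)
    return [counts[letter] for letter in ORDER]

seq_6ppa = ("ESEVEQTPLQEALNQLMRQLQRKDPSAFFSFPVTDFIAPGYSMIIKHPMDFSTMKEKIKNND"
            "YQSIEELKDNFKLMCTNAMIYNKPETIYYKAAKKLLHSGMKILSQERIQSLKQSIDFMADL")
vec = parikh_vector(seq_6ppa)
print(dict(zip(ORDER, vec)))
```

Letters absent from the sequence, such as W in 6PPA_1, come out as 0 because `Counter` returns 0 for missing keys.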

>6MRL_1|Chains A, B, C[auth D]|p41|Cucumber leaf spot virus (165432)
>6PPA_1|Chain A|Bromodomain-containing protein 7|Homo sapiens (9606)
>5PIR_1|Chain A|Lysine-specific demethylase 4D|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6MRL , Knot 156 386 0.80 40 191 353
MEIARTNKNSVVKYVPAAVGAAYQMGKSIVPYAPTIVDALGNVVSRATGRKKKSKGKEVQNQIVGGIGAIAAPVSITKRVRGMRPSFRQTKGKVHIVHRELVTSVINLVGNFRVNNNVSAQIGQFRINPSNSSLFTWLPTIASNFDSYRFTSIRFVYVPLCATTETGRVSLFWDKDSQDPLPVDRAALSSYGHSNEGPPWAETTLNVPTDGKQRFVTDSNTTDRKLVDLGQFAFATYAGGSNNQIGDIYVEYGVEFSEAQPAGGLTQYITKSVGATASTTGPSYVVDANINVNATTANVEFFSPGTFLITAVVYGSTIASPSMAGGNGTLIGDLPVVGGSNASIWTCVFSTTGVSTSVPTFTQAGTGLTRVQYTITRVNSQTAYQV
6PPA , Knot 65 123 0.84 38 104 120
ESEVEQTPLQEALNQLMRQLQRKDPSAFFSFPVTDFIAPGYSMIIKHPMDFSTMKEKIKNNDYQSIEELKDNFKLMCTNAMIYNKPETIYYKAAKKLLHSGMKILSQERIQSLKQSIDFMADL
5PIR , Knot 160 364 0.86 40 229 347
MHHHHHHSSGVDLGTENLYFQSMETMKSKANCAQNPNCNIMIFHPTKEEFNDFDKYIAYMESQGAHRAGLAKIIPPKEWKARETYDNISEILIATPLQQVASGRAGVFTQYHKKKKAMTVGEYRHLANSKKYQTPPHQNFEDLERKYWKNRIYNSPIYGADISGSLFDENTKQWNLGHLGTIQDLLEKECGVVIEGVNTPYLYFGMWKTTFAWHTEDMDLYSINYLHLGEPKTWYVVPPEHGQRLERLARELFPGSSRGCGAFLRHKVALISPTVLKENGIPFNRITQEAGEFMVTFPYGYHAGFNHGFNCAEAINFATPRWIDYGKMASQCSCGEARVTFSMDAFVRILQPERYDLWKRGQDR
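The ratio column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). That quantity is roughly the LZ complexity expected of a random sequence of length \(n\) over a 20-letter alphabet, so values near 1 indicate little repetition. The short sketch below recomputes the column from the LZ and length columns. `lz_ratio` is a hypothetical helper name:

```python
import math

def lz_ratio(lz: int, n: int) -> float:
    """LZ(w) / (n / log_20 n): LZ complexity relative to n / log_20 n."""
    return lz / (n / math.log(n, 20))

# (code, LZ(w), n) copied from the table above
for code, lz, n in [("6MRL", 156, 386), ("6PPA", 65, 123), ("5PIR", 160, 364)]:
    print(code, f"{lz_ratio(lz, n):.3f}")
```

This prints 0.803, 0.849 and 0.865. The table's two-decimal entries (0.80, 0.84, 0.86) appear to be truncated rather than rounded.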

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
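Both definitions translate directly into a set comprehension over sliding windows of length \(n\). Here is a minimal sketch; the helper names `subwords` and `p` are ours:

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|; 0 when n exceeds the length of w."""
    return len(subwords(w, n))

print(sorted(subwords("ABAB", 2)))          # ['AB', 'BA']
print([p("ABAB", n) for n in range(1, 5)])  # [2, 2, 2, 1]
```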

\(|P_{f(6MRL_1)}(2) \setminus P_{f(6PPA_1)}(2)|=137\) and \(|P_{f(6PPA_1)}(2) \setminus P_{f(6MRL_1)}(2)|=50\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). It is the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), so \(Z_2(f(6MRL_1),f(6PPA_1))=137+50=187\).

Hydrophobic-polar version of Sequence 1 (6MRL_1; 1 = hydrophobic, 0 = polar):
10110000001100111111110011001110110110111011001010000001001000111111111111010001011010100001010110001100110111010100010101101010100001101110110010000100101101110100001010111000000111100111000100001111100010110010001100000000011011011110011100001101010011010010111110001000111010001100110101010100101011011011101110100110101111010111011111100101100110001100011010011011001000100100001001
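This sketch covers both constructions on this line:

* \(Z_k\) as the size of a symmetric difference of subword sets.
* Its Jaccard-distance normalization by \(|P_x(k)\cup P_y(k)|\).
* The hydrophobic-polar encoding.

The hydrophobic set {A, V, I, L, M, F, W, P, G} is an assumption, inferred by checking the binary string against the 6MRL_1 sequence. The function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y): number of length-k subwords occurring in exactly one of x, y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by |P_x(k) union P_y(k)|; 0.0 when both sets are empty."""
    union = subwords(x, k) | subwords(y, k)
    return z(x, y, k) / len(union) if union else 0.0

# Assumption: these residues map to 1 (hydrophobic); all others map to 0 (polar).
HYDROPHOBIC = frozenset("AVILMFWPG")

def hp(seq: str) -> str:
    """Hydrophobic-polar encoding: '1' for HYDROPHOBIC residues, '0' otherwise."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

print(z("ABAB", "ABBA", 2))  # 1: only "BB" is unshared
print(jaccard_distance("ABAB", "ABBA", 2))
print(hp("MEIARTNKNS"))      # 1011000000
```

For "ABAB" and "ABBA" the union of 2-subwords is {AB, BA, BB}, so the Jaccard distance is 1/3.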
Pair \(Z_2\) Length of longest common substring
6MRL_1,6PPA_1 187 3
6MRL_1,5PIR_1 188 4
6PPA_1,5PIR_1 197 4
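Values of 3 and 4 for sequences of 123 to 386 residues are lengths of contiguous common subwords. A common subsequence would be far longer: 6MRL_1 and 6PPA_1 each contain at least 13 K's, for instance. The standard dynamic program for the longest common substring runs in \(O(|x||y|)\) time. Here is a sketch; the function name is ours:

```python
def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    # prev[j] = length of the longest common suffix of x[:i-1] and y[:j]
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # extend the common suffix ending at x[i-1] and y[j-1]
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(longest_common_substring("ABCDEF", "XBCDY"))  # 3 ("BCD")
```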

Newick tree

 
(5PIR_1:97.18,(6MRL_1:93.5,6PPA_1:93.5):3.68);
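In standard Newick notation, the tree reads `(5PIR_1:97.18,(6MRL_1:93.5,6PPA_1:93.5):3.68);`. This uses parentheses, branch lengths after colons, and a terminating semicolon. Since \(93.5 + 3.68 = 97.18\), every leaf sits at the same distance from the root. In other words, the tree is ultrametric, as an average-linkage (UPGMA) tree would be; the page does not say which method built it.

The small parser sketch below checks this property. `leaf_depths` is a hypothetical helper and ignores quoted labels and comments:

```python
import math

def leaf_depths(newick: str) -> dict[str, float]:
    """Map each leaf name to its root-to-leaf path length in a Newick string."""
    s = newick.strip().rstrip(";")
    pos = 0

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def branch_length() -> float:
        nonlocal pos
        if pos < len(s) and s[pos] == ":":
            pos += 1  # consume ":"
            return float(read_until(",()"))
        return 0.0  # no length given (e.g. the root)

    def subtree() -> list[tuple[str, float]]:
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # consume "("
            leaves = subtree()
            while s[pos] == ",":
                pos += 1
                leaves += subtree()
            pos += 1  # consume ")"
            read_until(":,()")  # skip an optional internal-node label
        else:
            leaves = [(read_until(":,()"), 0.0)]
        length = branch_length()
        # depths so far are relative to this node; add the branch above it
        return [(name, depth + length) for name, depth in leaves]

    return dict(subtree())

depths = leaf_depths("(5PIR_1:97.18,(6MRL_1:93.5,6PPA_1:93.5):3.68);")
print(depths)
print("ultrametric:", all(math.isclose(d, 97.18) for d in depths.values()))
```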

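Let \(d\) be the Otu--Sayood distance \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\), where \(c\) denotes LZ complexity.
Let \(d_1\) be the Otu--Sayood distance \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\). (This makes the 4TYN sequence AAAAAA a close match...)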
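Both distances measure how many extra Lempel–Ziv phrases one sequence needs once the other has already been parsed. The sketch below assumes \(c(\cdot)\) is the LZ76 phrase count from exhaustive-history parsing. The page does not spell out which LZ variant or implementation it uses, and `lz76` and `otu_sayood` are hypothetical names:

```python
def lz76(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) exhaustive-history parsing of s.

    Each phrase is the shortest extension at the current position that cannot
    be copied from earlier text (overlapping copies allowed).
    """
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # grow the phrase while s[i:i+length] already occurs in s[:i+length-1]
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def otu_sayood(s: str, q: str) -> tuple[int, int]:
    """Return (d, d1) with d = max(a, b) and d1 = a + b, using lz76 as c."""
    a = lz76(s + q) - lz76(s)  # extra phrases when q is appended to s
    b = lz76(q + s) - lz76(q)  # extra phrases when s is appended to q
    return max(a, b), a + b

print(lz76("0001101001000101"))  # 6: 0.001.10.100.1000.101
print(otu_sayood("AB", "BA"))    # (1, 2)
```

Identical sequences get distance 0, for example `otu_sayood("AAAA", "AAAA")` returns `(0, 0)`.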
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{509}{\log_{20} 509}-\frac{123}{\log_{20}123}\right)\approx 114\), where \(509=386+123\) is the combined length of 6MRL_1 and 6PPA_1.
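The estimate appears to treat \(n/\log_{20} n\) as the LZ complexity scale of a length-\(n\) sequence, with 509 the length of the two sequences concatenated. The factors 0.85 and 0.8 are close to the table's ratios for 6PPA_1 and 6MRL_1. Here is a quick check of the arithmetic; `scale` is a hypothetical helper name:

```python
import math

def scale(n: int) -> float:
    """n / log_20 n, the normalizing length used in the table's ratio column."""
    return n / math.log(n, 20)

# 509 = 386 + 123: lengths of 6MRL_1 and 6PPA_1 concatenated
estimate = 0.85 * 0.8 * (scale(509) - scale(123))
print(round(estimate))  # 114
```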
Status Protein1 Protein2 d d1/2
Query variables 6MRL_1 6PPA_1 140 93.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
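Write \(a=c(SQ)-c(S)\) and \(b=c(QS)-c(Q)\) for the two one-sided terms of the Otu–Sayood construction, so that \(\max\{a,b\}=d\) and \(a+b=d_1\). The two cases then follow directly. The query row for 6MRL_1 and 6PPA_1 gives \(d=140\) and \(d_1/2=93.5\), which also recovers the smaller term:

```latex
\begin{aligned}
\delta\big|_{\alpha=0}   &= \max\{a,b\} = d,\\
\delta\big|_{\alpha=1/2} &= \tfrac12\bigl(\min\{a,b\}+\max\{a,b\}\bigr) = \tfrac12(a+b) = \tfrac{d_1}{2},\\
\min\{a,b\}              &= d_1 - d = 2(93.5) - 140 = 47.
\end{aligned}
```

For other values of \(\alpha\in[0,1]\), \(\delta\) interpolates linearly between \(\max\{a,b\}\) and \(\min\{a,b\}\).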