CoV2D Browser™

Parikh vectors
3BQJ_1 4KPI_1 1HDJ_1 Letter Amino acid
10 10 8 R Arginine
19 11 1 N Asparagine
5 4 1 C Cysteine
24 19 5 L Leucine
16 11 3 S Serine
17 7 2 Q Glutamine
11 18 10 E Glutamic acid
10 14 1 H Histidine
31 18 1 T Threonine
17 11 7 A Alanine
14 10 6 D Aspartic acid
6 11 1 M Methionine
26 11 3 P Proline
4 3 0 W Tryptophan
29 26 8 G Glycine
8 9 3 I Isoleucine
7 18 8 K Lysine
16 10 2 F Phenylalanine
10 10 6 Y Tyrosine
24 14 1 V Valine
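
Each column of the table above counts how often each letter occurs in one sequence. A minimal Python sketch of that count follows; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative assumptions, not names used by the page. The 1HDJ_1 sequence is copied from the sequence table on this page.

```python
from collections import Counter

# Letters in the row order of the table above (an assumption made for display only).
AMINO_ACIDS = "RNCLSQEHTADMPWGIKFYV"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

seq_1hdj = ("MGKDYYQTLGLARGASDEEIKRAYRRQALRYHPDKNKEPGAEEKFKEIAEAYDVLSDPRKREIFDRYGEEGLKGSGC")
vector = parikh_vector(seq_1hdj)
print(vector["R"], vector["W"], vector["C"])  # counts of R, W and C in 1HDJ_1
```

The counts sum to the sequence length, which gives a quick consistency check against the Length column of the sequence table.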

>3BQJ_1|Chain A|va387 polypeptide|Norovirus isolates (150080)
>4KPI_1|Chain A|Reversibly photoswitchable red fluorescent protein rsTagRFP|synthetic construct (32630)
>1HDJ_1|Chain A|HUMAN HSP40|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3BQJ , Knot 133 304 0.83 40 186 292
APFTGPILTVEEMSNSRFPIPLEKLYTGPSGAFVVQPQNGRCTTDGVLLGTTQLSAVNICTFRGDVTGVAGSHDYIMNLASQNWNNYDPTEEIPAPLGTPDFVGKIQGMLTQTTREDGSTRAHKATVSTGSVHFTPKLGSVQYTTDTNNDLQTGQNTKFTPVGVIQDGNHQNEPGGWVLPNYSGRTGHNVHLAPAVAPTFPGEQLLFFRSTMPGCSGYPNMNLDCLLPQEWVQHFYQEAAPAQSDVALLRFVNPDTGRVLFECKLHKSGYVTVAHTGPHDLVIPPNGYFRFDSWVNQFYTLAPM
4KPI , Knot 115 245 0.86 40 173 234
MRGSHHHHHHGSMSELIKENMHMKLYMEGTVNNHHFKCTSEGEGKPYEGTQTMRIKVVEGGPLPFAFDILATSFMYGSRTFINHTQGIPDFWKQSFPEGFTWERVTTYEDGGVLTATQDTSLQDGCLIYNVKIRGVNFPSNGPVMQKKTLGWEAATEMLYPADGGLEGRGDMALKLVGGGHLICNLKTTYRSKNPAKNLKMPGVYFVDHRLERIKEADKETYVEQHEVAVARYCDLPSKLGHKLN
1HDJ , Knot 44 77 0.82 38 64 75
MGKDYYQTLGLARGASDEEIKRAYRRQALRYHPDKNKEPGAEEKFKEIAEAYDVLSDPRKREIFDRYGEEGLKGSGC
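
The fourth column normalizes the LZ-complexity by \(n/\log_{20} n\). As a sketch, the ratio can be recomputed from the tabulated LZ-complexity and length values; the helper name `normalized_lz` is an assumption, and the LZ values are taken from the table rather than recomputed.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b(n)) for alphabet size b."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ-complexity, length) as listed in the table above
rows = [("3BQJ", 133, 304), ("4KPI", 115, 245), ("1HDJ", 44, 77)]
for code, lz, n in rows:
    print(code, normalized_lz(lz, n))
```

The recomputed ratios agree with the tabulated values to within 0.01.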

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
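
A small sketch of these definitions, reading \(P_w(k)\) as the set of distinct contiguous subwords of length \(k\); the function names `subwords` and `subword_complexity` are assumptions chosen for illustration.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w (empty if k > len(w))."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Cardinality p_w(k) of the set P_w(k)."""
    return len(subwords(w, k))

w = "ABAB"
print([subword_complexity(w, k) for k in (1, 2, 3)])  # [2, 2, 2]
```

For `"ABAB"` the sets are {A, B}, {AB, BA} and {ABA, BAB}, so each has two elements.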

\(|P_{f(3BQJ_1)}(2) \setminus P_{f(4KPI_1)}(2)|=92\), \(|P_{f(4KPI_1)}(2) \setminus P_{f(3BQJ_1)}(2)|=79\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1110111101001000011111001001101111101001000001111100010110100101010111100001101100010000100011111101011101011100000001000100101001010101011010000000001001000010111110010000011111110001001001011111110111001111000111001010101001110011001000111100011110110100101110001000101011001100111110101010011001001111
Pair \(Z_2\) Length of longest common subword
3BQJ_1,4KPI_1 171 3
3BQJ_1,1HDJ_1 184 3
4KPI_1,1HDJ_1 183 3
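
The quantity \(Z_k\) is the size of the symmetric difference of two subword sets. For the first pair, the two one-sided counts quoted above give \(92+79=171\), the \(Z_2\) entry in the table. A minimal sketch on toy strings follows; the names `subwords` and `z_distance` are assumptions, not taken from the page.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: P_x(2) = {AB, BA}, P_y(2) = {AB, BB, BA}, so only BB differs.
print(z_distance("ABAB", "ABBA", 2))  # 1
```

Calling `z_distance` with \(k=2\) on two of the protein sequences listed above would be the analogous computation for the table; the result is not asserted here.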

Newick tree

 
[
	1HDJ_1:93.74,
	[
		3BQJ_1:85.5,4KPI_1:85.5
	]:8.24
]

Let \(d\) denote the Otu–Sayood distance \(d\).
Let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{549}{\log_{20} 549}-\frac{245}{\log_{20}245})=86.5\).
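
The arithmetic in this estimate can be checked directly. In the sketch below, the two factors 0.85 and 0.8 are taken verbatim from the formula; the parameter names `c1` and `c2` and the function name `expected_distance` are assumptions. Note that \(549 = 304 + 245\), the combined length of the two query sequences.

```python
import math

def expected_distance(n_concat, n_single, c1=0.85, c2=0.8, alphabet_size=20):
    """Evaluate c1 * c2 * (n_concat / log_b(n_concat) - n_single / log_b(n_single))."""
    def term(n):
        return n / math.log(n, alphabet_size)
    return c1 * c2 * (term(n_concat) - term(n_single))

value = expected_distance(549, 245)
print(value)
```

The result agrees with the quoted 86.5 to within 0.1.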
Status Protein1 Protein2 d d1/2
Query variables 3BQJ_1 4KPI_1 109 98.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
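
The interpolation can be sketched in Python under two loudly stated assumptions: that the min and max are taken over the two one-sided complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), and that \(c\) is an exhaustive-history (LZ76-style) phrase count. The page does not specify its exact LZ variant, so `lz76_complexity`, `increments`, `interpolate` and `delta` are illustrative names, not the page's implementation.

```python
def lz76_complexity(s):
    """Number of phrases in an exhaustive-history (LZ76-style) parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it already occurs in the text before its last symbol.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(x, y, c=lz76_complexity):
    """One-sided increments c(xy) - c(x) and c(yx) - c(y) (assumed inputs to min/max)."""
    return c(x + y) - c(x), c(y + x) - c(y)

def interpolate(low, high, alpha):
    """delta = alpha * min + (1 - alpha) * max."""
    return alpha * low + (1 - alpha) * high

def delta(x, y, alpha):
    a, b = increments(x, y)
    return interpolate(min(a, b), max(a, b), alpha)

# Values from the query row: d = 109 and d1/2 = 98.5, so min = 2 * 98.5 - 109 = 88 (derived).
print(interpolate(88, 109, 0), interpolate(88, 109, 0.5))
```

With \(\alpha=0\) the sketch returns the maximum, matching \(d\), and with \(\alpha=1/2\) it returns the average of the two increments, matching \(d_1/2\).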