CoV2D Browser™

Parikh vectors
7MLH_1 5UUM_1 6VED_1 Letter Amino acid
10 12 14 E Glutamic acid
13 12 9 G Glycine
6 9 7 I Isoleucine
5 14 18 R Arginine
4 4 1 H Histidine
14 14 7 L Leucine
14 10 8 K Lysine
1 3 5 M Methionine
23 10 6 T Threonine
14 13 18 V Valine
14 6 4 Q Glutamine
5 7 4 F Phenylalanine
4 3 3 W Tryptophan
16 8 10 A Alanine
8 4 10 N Asparagine
6 11 15 D Aspartic acid
5 1 2 C Cysteine
14 2 10 P Proline
27 11 9 S Serine
9 2 8 Y Tyrosine

>7MLH_1|Chains A, D|IgE Light Chain|Homo sapiens (9606)
>5UUM_1|Chains A, B|Induced myeloid leukemia cell differentiation protein Mcl-1|Homo sapiens (9606)
>6VED_1|Chain A|E3 ubiquitin-protein ligase UHRF1|Homo sapiens (9606)
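The Parikh vector of a sequence simply counts how often each letter occurs, ignoring order. A minimal sketch of that count follows; the function name `parikh_vector` and the constant `ALPHABET` are illustrative choices, not part of the CoV2D site, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
ALPHABET = "EGIRHLKMTVQFWANDCPSY"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each letter of ALPHABET in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in ALPHABET}


if __name__ == "__main__":
    # First 20 residues of the 7MLH_1 sequence listed on this page.
    prefix = "DIQMTQSPSTLSASVGDRVT"
    vector = parikh_vector(prefix)
    # Show only the letters that actually occur in the prefix.
    print({letter: count for letter, count in vector.items() if count})
```

The entries of a Parikh vector always sum to the sequence length; for 7MLH_1 the first column of the table sums to 212, matching the length reported for that sequence.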
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7MLH , Knot 98 212 0.82 40 145 205
DIQMTQSPSTLSASVGDRVTITCRASQNINNWLAWYQQKPGKAPNLLIYKASTLETGVPSRFSGSGSGTEFTLTISSLQPDDLATYYCQQYHSHRTFGQGTKVEVKGQPKANPTVTLFPPSSEELQANKATLVCLISDFYPGAVTVAWKADGSPVKAGVETTKPSKQSNNKYAASSYLSLTPEQWKSHRSYSCQVTHEGSTVEKTVAPTECS
5UUM , Knot 80 156 0.86 40 122 153
GSDELYRQSLEIISRYLREQATGAKDTKPMGRSGATSRKALETLRRVGDGVQRNHETAFQGMLRKLDIKNEDDVKSLSRVMIHVFSDGVTNWGRIVTLISFGAFVAKHLKTINQESCIEPLAESITDVLVRTKRDWLVKQRGWDGFVEFFHVEDLE
6VED , Knot 82 168 0.83 40 130 162
GLYKVNEYVDARDTNMGAWFEAQVVRVTRKAPSRDEPCSSTSRPALEEDVIYHVKYDDYPENGVVQMNSRDVRARARTIIKWQDLEVGQVVMLNYNPDNPKERGFWYDAEISRKRETRTARELYANVVLGDDSLNDCRIIFVDEVFKIERPGEGSPMVDNPMRRKSGP
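The page does not say which variant of Lempel-Ziv complexity it uses. The sketch below assumes the common exhaustive-history (LZ76) parsing, so its output is not guaranteed to reproduce the values 98, 80 and 82 above. The normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) is taken directly from the table header; the function names are hypothetical.

```python
import math


def lz76_complexity(w: str) -> int:
    """Count the phrases in an exhaustive-history (LZ76-style) parsing of `w`.

    Assumption: each phrase is the shortest string starting at the current
    position that does not occur in the text preceding its own last symbol.
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Ratio LZ(w) / (n / log_alphabet_size(n)) from the table header."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # Normalization applied to the tabulated values for 7MLH (LZ = 98, n = 212).
    print(normalized_lz(98, 212))
```

For the three rows above the ratio evaluates to roughly 0.827, 0.864 and 0.835, which agrees with the tabulated 0.82, 0.86 and 0.83 if the page truncates to two decimals.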

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
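As an illustration of these definitions, the following sketch computes the set of length-\(n\) subwords and its size. The names `subwords` and `subword_complexity` are hypothetical, and the reading "subwords of length \(n\)" is an assumption based on how \(P(k)\) is used elsewhere on the page.

```python
def subwords(w: str, n: int) -> set[str]:
    """Set of distinct contiguous subwords of length `n` in `w`."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w: str, n: int) -> int:
    """Cardinality of the set of length-`n` subwords of `w`."""
    return len(subwords(w, n))


if __name__ == "__main__":
    # First six residues of the 7MLH_1 sequence listed on this page.
    w = "DIQMTQ"
    for n in (1, 2, 3):
        print(n, sorted(subwords(w, n)), subword_complexity(w, n))
```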

\(|P_{f(7MLH_1)}(2) \setminus P_{f(5UUM_1)}(2)|=93\), \(|P_{f(5UUM_1)}(2) \setminus P_{f(7MLH_1)}(2)|=70\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for this pair, \(Z_2=93+70=163\).

Hydrophobic-polar version of Sequence 1 (7MLH_1):
01010001001010110010100010001001111000011011011100100100111001010101001010100101001100000000000011010010101010101010111100001010010110110010111101110101011011100001000000001100010101001000000000100010010001110000
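The page does not state which residues it treats as hydrophobic. The sketch below assumes the nonpolar set A, F, G, I, L, M, P, V, W maps to 1 and every other letter to 0. This assumption was inferred by comparing the start of the binary string above with the start of the 7MLH_1 sequence; it is not a documented rule of the site, and the names `HYDROPHOBIC` and `hydrophobic_polar` are illustrative.

```python
# Assumed hydrophobic (nonpolar) residues; inferred, not documented by the page.
HYDROPHOBIC = frozenset("AFGILMPVW")


def hydrophobic_polar(sequence: str) -> str:
    """Encode each residue as '1' (assumed hydrophobic) or '0' (everything else)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


if __name__ == "__main__":
    # First 20 residues of the 7MLH_1 sequence listed on this page.
    print(hydrophobic_polar("DIQMTQSPSTLSASVGDRVT"))
```

Under this assumption the encoding of the first 20 residues coincides with the first 20 symbols of the binary string shown above.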
Pair \(Z_2\) Length of longest common subword (contiguous)
7MLH_1,5UUM_1 163 3
7MLH_1,6VED_1 175 3
5UUM_1,6VED_1 142 3
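The two columns of this table can be sketched as follows. \(Z_k\) is computed as the size of the symmetric difference of the two length-\(k\) subword sets, which equals the sum in its definition. The last column is read here as the length of the longest common contiguous subword; that reading is an assumption, made because a value of 3 is too small for a non-contiguous common subsequence of sequences this long. All function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length `k` in `w`."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|, i.e. the symmetric difference size."""
    return len(subwords(x, k) ^ subwords(y, k))


def longest_common_subword_length(x: str, y: str) -> int:
    """Length of the longest contiguous string occurring in both `x` and `y`."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    x, y = "ABAB", "ABBA"
    print(z_k(x, y, 2), longest_common_subword_length(x, y))
```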

Newick tree

 
(7MLH_1:88.61,(5UUM_1:71,6VED_1:71):17.61);

Let d be the Otu–Sayood distance of that name.
Let d1 be the Otu–Sayood distance of that name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{368 }{\log_{20} 368}-\frac{156}{\log_{20}156})\approx 63.9\)
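The arithmetic in this estimate can be checked with a few lines. Note that 368 = 212 + 156 is the combined length of 7MLH_1 and 5UUM_1. The page does not explain the factors 0.85 and 0.8, so they are passed through as given, and the function names are illustrative.

```python
import math


def size_term(n: int, alphabet_size: int = 20) -> float:
    """The quantity n / log_alphabet_size(n) used in the estimate."""
    return n / math.log(n, alphabet_size)


def expected_distance(n_concat: int, n_single: int,
                      c1: float = 0.85, c2: float = 0.8) -> float:
    """c1 * c2 * (size_term(n_concat) - size_term(n_single)).

    c1 and c2 are copied from the page; their meaning is not stated there.
    """
    return c1 * c2 * (size_term(n_concat) - size_term(n_single))


if __name__ == "__main__":
    print(expected_distance(368, 156))
```

The result is just under 64, in line with the quoted 63.9.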
Status Protein1 Protein2 d d1/2
Query variables 7MLH_1 5UUM_1 80 70.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
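To make the interpolation concrete, the sketch below assumes that "min" and "max" refer to the two complexity increments (complexity of a concatenation minus complexity of its first factor, in either order). It further assumes d is their maximum and d1 their sum, following the usual statement of the Otu–Sayood distances. Under those assumptions the tabulated d = 80 and d1/2 = 70.5 for 7MLH_1 and 5UUM_1 correspond to increments 80 and 61. These increments are back-computed from the table, not read from the page, and the function name `delta` is hypothetical.

```python
def delta(alpha: float, inc_xy: float, inc_yx: float) -> float:
    """alpha * min + (1 - alpha) * max of the two complexity increments."""
    lo, hi = min(inc_xy, inc_yx), max(inc_xy, inc_yx)
    return alpha * lo + (1 - alpha) * hi


if __name__ == "__main__":
    # Increments back-computed from d = 80 and d1/2 = 70.5 (assumption, see above).
    inc_xy, inc_yx = 80, 61
    print(delta(0.0, inc_xy, inc_yx))   # corresponds to d
    print(delta(0.5, inc_xy, inc_yx))   # corresponds to d1 / 2
```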