CoV2D Browser™

Parikh vectors
6EZA_1 7UIK_1 2MXU_1 Letter Amino acid
11 0 1 M Methionine
16 6 0 T Threonine
10 0 1 Y Tyrosine
18 0 3 D Aspartic acid
17 0 1 Q Glutamine
12 0 3 H Histidine
17 11 6 G Glycine
21 0 3 I Isoleucine
30 0 2 L Leucine
7 0 3 F Phenylalanine
1 0 0 W Tryptophan
30 8 4 A Alanine
8 0 1 N Asparagine
9 13 0 C Cysteine
23 0 3 E Glutamic acid
13 0 0 P Proline
27 0 6 V Valine
25 0 1 R Arginine
15 0 2 K Lysine
17 0 2 S Serine
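
The Parikh vector of a sequence counts how often each letter occurs. A minimal Python sketch, assuming the standard 20-letter amino-acid alphabet; the name `parikh_vector` is illustrative and not taken from the site:

```python
from collections import Counter

# Standard one-letter amino-acid codes (an assumption about the alphabet used).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return {letter: count} for every letter of the alphabet."""
    counts = Counter(seq)
    return {letter: counts[letter] for letter in alphabet}

seq_7uik = "ACCGGAGGACAGTCCTCCCGACTGACTGACGTCGTACG"
vec = parikh_vector(seq_7uik)
# Show only the letters that actually occur.
print({letter: n for letter, n in vec.items() if n})
```

For 7UIK_1 this gives A=8, C=13, G=11 and T=6, in agreement with the 7UIK_1 column. That entry is DNA, so its letters are nucleotides that happen to coincide with amino-acid codes.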

>6EZA_1|Chains A, B|tRNA-dihydrouridine(20) synthase [NAD(P)+]-like|Homo sapiens (9606)
>7UIK_1|Chain A[auth X]|DNA (38-MER)|Saccharomyces cerevisiae (559292)
>2MXU_1|Chains A, B, C, D, E, F, G, H, I, J, K, L|Amyloid beta A4 protein|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6EZA , Knot 139 327 0.82 40 194 310
MHHHHHHLILAPMVRVGTLPMRLLALDYGADIVYCEELIDLKMIQCKRVVNEVLSTVDFVAPDDRVVFRTCEREQNRVVFQMGTSDAERALAVARLVENDVAGIDVNMGCPKQYSTKGGMGAALLSDPDKIEKILSTLVKGTRRPVTCKIRILPSLEDTLSLVKRIERTGIAAIAVHGRKREERPQHPVSCEVIKAIADTLSIPVIANGGSHDHIQQYSDIEDFRQATAASSVMVARAAMWNPSIFLKEGLRPLEEVMQKYIRYAVQYDNHYTNTKYCLCQMLREQLKSPQGRLLHAAQSSREICEAFGLGAFYEETTQELDAQQAR
7UIK , Knot 13 38 0.41 8 12 24
ACCGGAGGACAGTCCTCCCGACTGACTGACGTCGTACG
2MXU , Knot 26 42 0.77 32 39 40
DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA
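
The LZ-complexity and normalized columns can be reproduced approximately with the sketch below. It assumes that \(\mathrm{LZ}(w)\) is the number of components in the exhaustive-history parsing of Lempel and Ziv (1976) and that the normalization uses base-20 logarithms as in the column header; the function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Number of components in the exhaustive-history (LZ76) parsing of w."""
    n, i, count = len(w), 0, 0
    while i < n:
        length = 1
        # Grow the candidate while it can be copied from an earlier start
        # position (overlap with the candidate itself is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), where b is the alphabet size."""
    n = len(w)
    return lz76_complexity(w) / (n / math.log(n, alphabet_size))

seq_7uik = "ACCGGAGGACAGTCCTCCCGACTGACTGACGTCGTACG"
print(lz76_complexity(seq_7uik), normalized_lz(seq_7uik))
```

For 7UIK_1 this parsing has 13 components, matching the table, and the ratio is about 0.415, which the table lists as 0.41.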

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)=|P_w(n)|\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
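
As an illustration of these definitions, the following Python sketch builds \(P_w(k)\) as a set of length-\(k\) slices and takes its size. The helper names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w, k):
    """P_w(k): the set of distinct length-k subwords (contiguous) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

seq_7uik = "ACCGGAGGACAGTCCTCCCGACTGACTGACGTCGTACG"
seq_2mxu = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA"
for k in (2, 3):
    print(k, subword_complexity(seq_7uik, k), subword_complexity(seq_2mxu, k))
```

For \(k=2\) and \(k=3\) this yields 12 and 24 for 7UIK_1 and 39 and 40 for 2MXU_1, as listed in the table.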

\(|P_{f(6EZA_1)}(2) \setminus P_{f(7UIK_1)}(2)|=187\), \(|P_{f(7UIK_1)}(2) \setminus P_{f(6EZA_1)}(2)|=5\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100000011111110110111011110011011000011010110000110011001011110001110000000001110110001001111101100011110101101000000111111110010010011001101000110001011101000101100100011111110100000010011000110111001011111011000010000010010010110011110111101011100110110011000100110000000000001001100010010101101100000100111111100000001010010
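
The hydrophobic-polar string can be generated by mapping each residue to 1 (hydrophobic) or 0 (polar). The sketch below assumes the hydrophobic class is {A, F, G, I, L, M, P, V, W}; that set is inferred by comparing the binary string with the 6EZA sequence and is not a documented definition from the site.

```python
# Hydrophobic residues, inferred from the binary string above (an assumption).
HYDROPHOBIC = set("AFGILMPVW")

def hydrophobic_polar(seq):
    """Encode a protein sequence as a 0/1 string: 1 = hydrophobic, 0 = polar."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in seq)

# First 30 residues of 6EZA_1.
prefix = "MHHHHHHLILAPMVRVGTLPMRLLALDYGA"
print(hydrophobic_polar(prefix))
```

Applied to the first 30 residues of 6EZA_1, this reproduces the first 30 symbols of the string.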
Pair \(Z_2\) Length of longest common substring
6EZA_1,7UIK_1 192 2
6EZA_1,2MXU_1 185 3
7UIK_1,2MXU_1 47 2
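
The pair table can be checked with a short Python sketch. It computes \(Z_k\) as the size of the symmetric difference of the two subword sets. It also computes the length of the longest common contiguous substring; reading the last column that way is an assumption, though it agrees with the listed values. All function names are illustrative.

```python
def subwords(w, k):
    """P_w(k): the set of distinct length-k subwords (contiguous) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring_length(x, y):
    """Largest k such that x and y share a subword of length k."""
    k = 0
    # A shared subword of length k+1 implies one of length k,
    # so the first k+1 with no shared subword ends the search.
    while k < min(len(x), len(y)) and subwords(x, k + 1) & subwords(y, k + 1):
        k += 1
    return k

seq_7uik = "ACCGGAGGACAGTCCTCCCGACTGACTGACGTCGTACG"
seq_2mxu = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA"
print(z_distance(seq_7uik, seq_2mxu, 2),
      longest_common_substring_length(seq_7uik, seq_2mxu))
```

For the pair 7UIK_1, 2MXU_1 the two sequences share only the 2-letter subwords GA and GG, so \(Z_2 = (12-2)+(39-2) = 47\) and the longest common substring has length 2.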

Newick tree

 
[
	6EZA_1:10.00,
	[
		2MXU_1:23.5,7UIK_1:23.5
	]:84.50
]

Let \(d\) denote the Otu--Sayood distance \(d\).
Let \(d_1\) denote the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{365 }{\log_{20} 365}-\frac{38}{\log_{20}38})\approx 104.\)
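
The arithmetic in this estimate can be reproduced directly. In the sketch below, the helper `n_over_log` is a hypothetical name, and 365 appears to be the combined length \(327+38\) of the two query sequences.

```python
import math

def n_over_log(n, base=20):
    """n / log_base(n), the normalizing term used in the estimate."""
    return n / math.log(n, base)

# Factors 0.85 and 0.8 are taken from the formula as stated on the page.
expected = 0.85 * 0.8 * (n_over_log(365) - n_over_log(38))
print(expected)
```

The result is about 104.7, which the page reports as 104.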
Status Protein1 Protein2 d d1/2
Query variables 6EZA_1 7UIK_1 134 72.5
Was not able to put for d
Was not able to put for d1
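
For orientation, here is a sketch of how the two distances could be computed from LZ complexities. It assumes the usual statement of the Otu--Sayood measures, \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=c(xy)-c(x)+c(yx)-c(y)\), with \(c\) the LZ76 exhaustive-history complexity. Whether the site's values of 134 and 72.5 come from exactly this variant is not verified here, and the function names are illustrative.

```python
def lz76_complexity(w):
    """Number of components in the exhaustive-history (LZ76) parsing of w."""
    n, i, count = len(w), 0, 0
    while i < n:
        length = 1
        # Extend while the candidate can be copied from an earlier start.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(x, y):
    """Return (d, d1) under the assumed definitions stated above."""
    gain_xy = lz76_complexity(x + y) - lz76_complexity(x)
    gain_yx = lz76_complexity(y + x) - lz76_complexity(y)
    return max(gain_xy, gain_yx), gain_xy + gain_yx

d, d1 = otu_sayood("AAAA", "ACGT")
print(d, d1 / 2)
```

On the toy pair AAAA, ACGT both complexity gains equal 2, so \(d=2\) and \(d_1/2=2\).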

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d & \text{if } \alpha=0,\\ d_1/2 & \text{if } \alpha=1/2. \end{cases} \]
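
As a worked check with the query values above, assume that \(\max\) and \(\min\) denote the larger and smaller of the two one-sided complexity gains, so that \(d=\max\) and \(d_1=\min+\max\). Then the tabulated \(d=134\) and \(d_1/2=72.5\) give
\[ \max = 134, \qquad \min = 2(72.5) - 134 = 11, \qquad \delta\big|_{\alpha=1/2} = \tfrac{1}{2}(11) + \tfrac{1}{2}(134) = 72.5 . \]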