CoV2D Browser™

Parikh vectors
8EBR_1 4RLM_1 8QLA_1 Letter Amino acid
6 6 19 K Lysine
9 10 23 S Serine
18 6 34 V Valine
7 3 16 Q Glutamine
25 12 36 G Glycine
27 8 32 L Leucine
40 12 36 A Alanine
3 2 10 M Methionine
15 2 20 P Proline
23 7 29 T Threonine
23 7 27 D Aspartic acid
4 1 13 H Histidine
10 6 27 I Isoleucine
11 2 37 E Glutamic acid
6 3 20 F Phenylalanine
3 6 4 W Tryptophan
9 3 19 Y Tyrosine
19 11 21 R Arginine
6 14 16 N Asparagine
3 8 12 C Cysteine
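Each column of the table above is the Parikh vector of one sequence, that is, the number of occurrences of each letter. A minimal sketch of that count follows; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative names, not part of the site.

```python
from collections import Counter

# Letter order copied from the rows of the table above (assumed ordering,
# chosen only so that the output lines up with the table).
AMINO_ACIDS = "KSVQGLAMPTDHIEFWYRNC"


def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return the occurrence count of each alphabet letter in seq."""
    counts = Counter(seq)
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    # Toy input: two K, one S, no other letters.
    print(parikh_vector("KKS"))
```

The entries of a Parikh vector sum to the sequence length whenever every letter of the sequence belongs to the alphabet, which gives a quick consistency check against the Length column further down.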

>8EBR_1|Chains A, B, C, D|Beta-lactamase|Mycobacterium tuberculosis (1773)
>4RLM_1|Chain A|Lysozyme C|Gallus gallus (9031)
>8QLA_1|Chain A|Tubulin alpha-1B chain|Bos taurus (9913)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8EBR 118 267 0.82 40 155 249
GADLADRFAELERRYDARLGVYVPATGTTAAIEYRADERFAFCSTFKAPLVAAVLHQNPLTHLDKLITYTSDDIRSISPVAQQHVQTGMTIGQLCDAAIRYSDGTAANLLLADLGGPGGGTAAFTGYLRSLGDTVSRLDAEEPELNRDPPGDERDTTTPHAIALVLQQLVLGNALPPDKRALLTDWMARNTTGAKRIRAGFPADWKVIDKTGTGDYGRANDIAVVWSPTGVPYVVAVMSDRAGGGYDAEPREALLAEAATCVAGVLA
4RLM 66 129 0.82 40 104 127
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
8QLA 192 451 0.86 40 258 426
MRECISIHVGQAGVQIGNACWELYCLEHGIQPDGQMPSDKTIGGGDDSFNTFFSETGAGKHVPRAVFVDLEPTVIDEVRTGTYRQLFHPEQLITGKEDAANNYARGHYTIGKEIIDLVLDRIRKLADQCTGLQGFLVFHSFGGGTGSGFTSLLMERLSVDYGKKSKLEFSIYPAPQVSTAVVEPYNSILTTHTTLEHSDCAFMVDNEAIYDICRRNLDIERPTYTNLNRLISQIVSSITASLRFDGALNVDLTEFQTNLVPYPRIHFPLATYAPVISAEKAYHEQLSVAEITNACFEPANQMVKCDPRHGKYMACCLLYRGDVVPKDVNAAIATIKTKRSIQFVDWCPTGFKVGINYQPPTVVPGGDLAKVQRAVCMLSNTTAIAEAWARLDHKFDLMYAKRAFVHWYVGEGMEEGEFSEAREDMAALEKDYEEVGVDSVEGEGEEEGEEY
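The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below assumes that \(\mathrm{LZ}\) is the Lempel-Ziv (1976) exhaustive-history parsing; the site does not say which variant it uses, so the counts it reports may differ. The names `lz76` and `normalized_lz` are illustrative.

```python
import math


def lz76(s):
    """Number of components in the LZ76 exhaustive-history parsing of s.

    Each component is the shortest prefix of the remaining text that does
    not already occur starting at an earlier position (overlap allowed).
    """
    i, components, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend while the candidate still occurs earlier in the text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


def normalized_lz(lz, n, alphabet_size=20):
    """Ratio LZ(w) / (n / log_b n) for alphabet size b."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # Values taken from the 8EBR row of the table above.
    print(normalized_lz(118, 267))
```

With the 8EBR row (LZ 118, length 267) the ratio evaluates to about 0.824, in line with the 0.82 shown in the table.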

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
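As a small illustration of these definitions, reading \(P_w(n)\) as the set of contiguous subwords of length \(n\), the following sketch computes the set and its size. The function names are illustrative only.

```python
def subwords(w, n):
    """Set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))


if __name__ == "__main__":
    # "ABAB" has exactly two distinct subwords of length 2: "AB" and "BA".
    print(sorted(subwords("ABAB", 2)), subword_complexity("ABAB", 2))
```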

\(|P_{f(8EBR_1)}(2) \setminus P_{f(4RLM_1)}(2)|=110\), \(|P_{f(4RLM_1)}(2) \setminus P_{f(8EBR_1)}(2)|=59\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for example, \(Z_2(f(8EBR_1),f(4RLM_1))=110+59=169\).

Hydrophobic-polar version of Sequence 1:
110110011010000010111011101001110001000111000101111111100011001001100000010010111000100110110100111000010110111101111111011101010011001001010010100011100000001011111100111101111000111001110000110010111110101100010100101001111101011101111100011110010100111101100111111
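The page does not state which residues it treats as hydrophobic. Comparing the binary string with Sequence 1 position by position suggests that G, A, L, F, V, P, I, M and W map to 1 and all other residues to 0. The sketch below uses that inferred set, which is an assumption rather than a documented rule, and `HYDROPHOBIC` and `hp_encode` are illustrative names.

```python
# Residues mapped to 1; inferred from the string above, not documented by the site.
HYDROPHOBIC = set("GALFVPIMW")


def hp_encode(seq, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in hydrophobic else "0" for residue in seq)


if __name__ == "__main__":
    # First ten residues of Sequence 1 (8EBR_1).
    print(hp_encode("GADLADRFAE"))
```

Applied to the first ten residues of Sequence 1, `GADLADRFAE`, this gives `1101100110`, which matches the start of the string above.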
Pair \(Z_2\) Length of longest common subword
8EBR_1,4RLM_1 169 3
8EBR_1,8QLA_1 181 4
4RLM_1,8QLA_1 222 4
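Both columns of this table can be sketched in a few lines. The first function implements \(Z_k\) as the size of the symmetric difference of the two subword sets. The second reads the last column as the length of the longest common contiguous subword, which is an assumption: a non-contiguous common subsequence would be far longer than 3 or 4 for sequences of this size. Function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x, y, k):
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    return len(subwords(x, k) ^ subwords(y, k))  # symmetric difference


def longest_common_subword_length(x, y):
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common suffix ending at x[i-1] and y[j-1].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z_k("ABC", "BCD", 2), longest_common_subword_length("ABCD", "XBCY"))
```

The dynamic program uses quadratic time and linear memory, which is ample for sequences of a few hundred residues.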

Newick tree

 
(8QLA_1:10.27,(8EBR_1:84.5,4RLM_1:84.5):21.77);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{396 }{\log_{20} 396}-\frac{129}{\log_{20}129})=80.7\), where \(396=267+129\).
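The arithmetic in this estimate can be checked directly. The sketch below only evaluates the expression as written; the two constant factors are copied from the formula and passed through as plain parameters, since the page does not explain them. The function name is illustrative.

```python
import math


def expected_distance(n_concat, n_single, factor_a=0.85, factor_b=0.8, base=20):
    """Evaluate factor_a * factor_b * (N / log_b N - n / log_b n)."""
    def scaled(n):
        return n / math.log(n, base)
    return factor_a * factor_b * (scaled(n_concat) - scaled(n_single))


if __name__ == "__main__":
    # 396 = 267 + 129, the lengths of 8EBR_1 and 4RLM_1 from the table above.
    print(expected_distance(267 + 129, 129))
```

The result is approximately 80.79, so the 80.7 quoted above appears to be truncated rather than rounded.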
Status Protein1 Protein2 d d1/2
Query variables 8EBR_1 4RLM_1 102 76

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
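A sketch of how \(d\), \(d_1\) and \(\delta\) fit together follows. It assumes the usual Otu–Sayood definitions, \(d=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1\) equal to the sum of the same two terms, with \(c\) taken to be LZ76 complexity. Neither the definitions nor the choice of \(c\) is stated on this page, and all function names are illustrative.

```python
def lz76(s):
    """Number of components in the LZ76 exhaustive-history parsing of s."""
    i, components, n = 0, 0, len(s)
    while i < n:
        length = 1
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


def otu_sayood(x, y, c=lz76):
    """Return (d, d1) under the assumed definitions (max and sum)."""
    a = c(x + y) - c(x)  # extra complexity from appending y to x
    b = c(y + x) - c(y)  # extra complexity from appending x to y
    return max(a, b), a + b


def delta(a, b, alpha):
    """alpha * min + (1 - alpha) * max of the two one-sided terms."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    # d = 102 and d1/2 = 76 in the table imply one-sided terms 102 and 50.
    print(delta(102, 50, 0), delta(102, 50, 0.5))
```

Under these definitions, the table values \(d=102\) and \(d_1/2=76\) correspond to one-sided terms of 102 and \(2\cdot 76-102=50\), and `delta` returns 102 at \(\alpha=0\) and 76 at \(\alpha=1/2\).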