CoV2D Browser™

Parikh vectors
2GFW_1 4CHA_1 7DMZ_1 Letter Amino acid
10 1 16 Q Glutamine
2 0 4 W Tryptophan
10 0 19 Y Tyrosine
53 1 36 A Alanine
18 0 21 R Arginine
16 0 13 H Histidine
28 1 27 I Isoleucine
27 2 34 V Valine
16 0 16 N Asparagine
20 0 27 D Aspartic acid
15 2 20 P Proline
34 1 23 S Serine
23 0 29 T Threonine
5 1 12 C Cysteine
14 0 20 F Phenylalanine
29 2 32 L Leucine
17 0 19 K Lysine
14 0 10 M Methionine
22 0 37 E Glutamic acid
54 2 36 G Glycine
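A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of that count, not the browser's own code; the names `AMINO_ACIDS` and `parikh_vector` are hypothetical, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# One-letter amino-acid codes, in the row order of the table above.
AMINO_ACIDS = "QWYARHIVNDPSTCFLKMEG"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Count the occurrences of each amino-acid letter in the sequence."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# Sequence displayed on this page for 4CHA_1.
seq_4cha = "CGVPAIQPVLSGL"
vector = parikh_vector(seq_4cha)
for letter in AMINO_ACIDS:
    print(letter, vector[letter])
```

For the 13-letter 4CHA_1 sequence the counts sum to 13 and agree with the middle column of the table.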

>2GFW_1|Chain A|3-oxoacyl-[acyl-carrier-protein] synthase 2|Escherichia coli (562)
>4CHA_1|Chains A, D[auth E]|ALPHA-CHYMOTRYPSIN A|Bos taurus (9913)
>7DMZ_1|Chains A, B, C[auth D]|Tubulin alpha-1B chain|Sus scrofa (9823)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2GFW , Knot 174 427 0.82 40 219 396
MRGSHHHHHHGSACVSKRRVVVTGLGMLSPVGNTVESTWKALLAGQSGISLIDHFDTSAYATKFAGLVKDFNCEDIISRKEQRKMDAFIQYGIVAGVQAMQDSGLEITEENATRIGAAIGSGIGGLGLIEENHTSLMNGGPRKISPFFVPSTIVNMVAGHLTIMYGLRGPSISIATACTSGVHNIGHAARIIAYGDADVMVAGGAEKASTPLGVGGFGAARALSTRNDNPQAASRPWDKERDGFVLGDGAGMLVLEEYEHAKKRGAKIYAELVGFGMSSDAYHMTSPPENGAGAALAMANALRDAGIEASQIGYVNAHGTSTPAGDKAEAQAVKTIFGEAASRVLVSSTKSMTGHLLGAAGAVESIYSILALRDQAVPPTINLDNPDEGCDLDFVPHEARQVSGMEYTLCNSFGFGGTNGSLIFKKI
4CHA , Knot 11 13 0.72 18 12 11
CGVPAIQPVLSGL
7DMZ , Knot 192 451 0.86 40 258 426
MRECISIHVGQAGVQIGNACWELYCLEHGIQPDGQMPSDKTIGGGDDSFNTFFSETGAGKHVPRAVFVDLEPTVIDEVRTGTYRQLFHPEQLITGKEDAANNYARGHYTIGKEIIDLVLDRIRKLADQCTGLQGFLVFHSFGGGTGSGFTSLLMERLSVDYGKKSKLEFSIYPAPQVSTAVVEPYNSILTTHTTLEHSDCAFMVDNEAIYDICRRNLDIERPTYTNLNRLISQIVSSITASLRFDGALNVDLTEFQTNLVPYPRIHFPLATYAPVISAEKAYHEQLSVAEITNACFEPANQMVKCDPRHGKYMACCLLYRGDVVPKDVNAAIATIKTKRSIQFVDWCPTGFKVGINYQPPTVVPGGDLAKVQRAVCMLSNTTAIAEAWARLDHKFDLMYAKRAFVHWYVGEGMEEGEFSEAREDMAALEKDYEEVGVDSVEGEGEEEGEEY
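The LZ-complexity and normalized columns can be illustrated with a short sketch. It assumes the column uses the Lempel-Ziv 1976 exhaustive-history parsing, in which each new phrase is the shortest block not already occurring earlier in the word; that assumption is consistent with the 4CHA row but is not confirmed by the page. The function names are hypothetical.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w (assumed variant)."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs in the word before its own last letter.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(w: str, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_alphabet_size(n), as in the table header."""
    n = len(w)
    return lz76_complexity(w) / (n / math.log(n, alphabet_size))

seq_4cha = "CGVPAIQPVLSGL"
print(lz76_complexity(seq_4cha))          # 11, as in the 4CHA row
print(round(normalized_lz(seq_4cha), 2))  # 0.72, as in the 4CHA row
```

The parsing of the 4CHA sequence is C, G, V, P, A, I, Q, PV, L, S, GL, which gives the 11 phrases listed in the table.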

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
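As an illustration of these definitions, here is a minimal sketch that collects the distinct length-\(k\) subwords of a word and counts them. The function names `subwords` and `subword_complexity` are hypothetical and not taken from the browser.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

seq_4cha = "CGVPAIQPVLSGL"
print(subword_complexity(seq_4cha, 2))  # 12, matching p_w(2) in the 4CHA row
print(subword_complexity(seq_4cha, 3))  # 11, matching p_w(3) in the 4CHA row
```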

\(|P_{f(2GFW_1)}(2) \setminus P_{f(4CHA_1)}(2)|=209\), \(|P_{f(4CHA_1)}(2) \setminus P_{f(2GFW_1)}(2)|=2\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1:
1010000000101010000111011111011100100010111110011011001000101001111100100001100000001011100111111011000110100001001111110111111110000001101110010111110011011110101101101101011010001100110110111010101111111001001111111111011000000101100110000011111011111110000010001101010111111000100100110011111111101100111010011010101000111001010110011101100111000001010111111110010011110001111010100100100101110010010110001000111110010111001
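The hydrophobic-polar string can be reproduced with a simple letter-to-bit map. The page does not state which residues count as hydrophobic; the set used below (A, F, G, I, L, M, P, V, W) is an assumption inferred by comparing the first 50 residues of Sequence 1 with the first 50 bits shown, and the names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Assumption: residues encoded as 1, inferred from the start of the displayed string.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence: str) -> str:
    """Replace each residue by 1 (hydrophobic) or 0 (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 30 residues of the 2GFW_1 sequence shown on this page.
prefix = "MRGSHHHHHHGSACVSKRRVVVTGLGMLSP"
print(hp_encode(prefix))  # 101000000010101000011101111101
```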
Pair \(Z_2\) Length of longest common subword
2GFW_1,4CHA_1 211 3
2GFW_1,7DMZ_1 155 4
4CHA_1,7DMZ_1 248 3
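The quantity \(Z_k\) in the table is the size of a symmetric difference of two subword sets, which Python's set operators express directly. This is a sketch under that reading of the definition; the helper names are hypothetical, and the toy words stand in for the protein sequences.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: P_x(2) = {AB, BA}, P_y(2) = {AB, BB, BA}, so only BB is unshared.
print(z_distance("ABAB", "ABBA", 2))  # 1
```

Applied to the full sequences with \(k=2\), the two terms for the pair 2GFW_1, 4CHA_1 are the values 209 and 2 reported above, which sum to the tabulated 211.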

Newick tree

 
[
	4CHA_1:12.17,
	[
		2GFW_1:77.5,7DMZ_1:77.5
	]:47.67
]

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{440}{\log_{20} 440}-\frac{13}{\log_{20}13}\right)\approx 137.\)
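The arithmetic in this estimate can be checked with a few lines. The constants 0.85, 0.8, 440 and 13 are copied from the formula above; the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """The scale n / log_alphabet_size(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

estimate = 0.85 * 0.8 * (lz_scale(440) - lz_scale(13))
print(round(estimate, 1))  # approximately 136.9
```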
Status Protein1 Protein2 d d1/2
Query variables 2GFW_1 4CHA_1 170 87.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
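The interpolation can be checked against the query row above. The sketch assumes that \(\min\) and \(\max\) range over two nonnegative terms whose maximum is \(d\) and whose sum is \(d_1\). Under that assumption the tabulated values 170 and 87.5 give the terms 170 and 5. The function name `delta` and its argument names are hypothetical.

```python
def delta(term_a: float, term_b: float, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two terms."""
    return alpha * min(term_a, term_b) + (1 - alpha) * max(term_a, term_b)

# Assumed terms for the pair 2GFW_1, 4CHA_1: max = d = 170, sum = d1 = 2 * 87.5 = 175.
term_a, term_b = 170, 5

print(delta(term_a, term_b, 0))    # 170, the value of d
print(delta(term_a, term_b, 0.5))  # 87.5, the value of d1/2
```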