CoV2D Browser™

Parikh vectors
9NNO_1 3DPY_1 3EYK_1 Letter Amino acid
14 21 22 I Isoleucine
37 25 17 R Arginine
26 26 11 Q Glutamine
56 37 24 L Leucine
27 26 17 P Proline
25 22 35 S Serine
24 8 24 T Threonine
6 10 5 W Tryptophan
35 22 24 V Valine
4 2 9 C Cysteine
31 12 28 G Glycine
35 36 10 E Glutamic acid
13 9 7 H Histidine
21 27 19 D Aspartic acid
27 15 19 K Lysine
15 6 2 M Methionine
28 9 8 F Phenylalanine
12 18 9 Y Tyrosine
29 27 13 A Alanine
9 19 20 N Asparagine
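
A column of the table above can be reproduced by counting letters: the Parikh vector of a sequence is its vector of letter counts. The following is a minimal sketch; the function name `parikh_vector` and the fixed alphabet ordering (taken from the row order of the table) are illustrative assumptions, not part of the CoV2D code.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
ALPHABET = "IRQLPSTWVCGEHDKMFYAN"

def parikh_vector(w, alphabet=ALPHABET):
    """Return the letter counts of w, one entry per letter of `alphabet`."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# First 20 residues of the 9NNO_1 sequence shown on this page.
prefix = "MHHHHHHSRYEHIPGPPRPS"
vec = parikh_vector(prefix)
print(dict(zip(ALPHABET, vec)))
```

The entries of a Parikh vector always sum to the length of the sequence, which gives a quick consistency check against the Length column further down the page.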

>9NNO_1|Chain A|Cholesterol 24-hydroxylase|Homo sapiens (9606)
>3DPY_1|Chain A|Protein farnesyltransferase/geranylgeranyltransferase type-1 subunit alpha|Rattus norvegicus (10116)
>3EYK_1|Chain A|Hemagglutinin HA1 chain|Influenza A virus (11320)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9NNO , Knot 193 474 0.83 40 238 445
MHHHHHHSRYEHIPGPPRPSFLLGHLPCFWKKDEVGGRVLQDVFLDWAKKYGPVVRVNVFHKTSVIVTSPESVKKFLMSTKYNKDSKMYRALQTVFGERLFGQGLVSECNYERWHKQRRVIDLAFSRSSLVSLMETFNEKAEQLVEILEAKADGQTPVSMQDMLTYTAMDILAKAAFGMETSMLLGAQKPLSQAVKLMLEGITASRNTLAKFLPGKRKQLREVRESIRFLRQVGRDWVQRRREALKRGEEVPADILTQILKAEEGAQDDEGLLDNFVTFFIAGHETSANHLAFTVMELSRQPEIVARLQAEVDEVIGSKRYLDFEDLGRLQYLSQVLKESLRLYPPAWGTFRLLEEETLIDGVRVPGNTPLLFSTYVMGRMDTYFEDPLTFNPDRFGPGAPKPRFTYFPFSLGHRSCIGQQFAQMEVKVVMAKLLQRLEFRLVPGQRFGLQEQATLKPLDPVLCTLRPRGWQPA
3DPY , Knot 156 377 0.81 40 213 348
MAATEGVGESAPGGEPGQPEQPPPPPPPPPAQQPQEEEMAAEAGEAAASPMDDGFLSLDSPTYVLYRDRAEWADIDPVPQNDGPSPVVQIIYSEKFRDVYDYFRAVLQRDERSERAFKLTRDAIELNAANYTVWHFRRVLLRSLQKDLQEEMNYIIAIIEEQPKNYQVWHHRRVLVEWLKDPSQELEFIADILNQDAKNYHAWQHRQWVIQEFRLWDNELQYVDQLLKEDVRNNSVWNQRHFVISNTTGYSDRAVLEREVQYTLEMIKLVPHNESAWNYLKGILQDRGLSRYPNLLNQLLDLQPSHSSPYLIAFLVDIYEDMLENQCDNKEDILNKALELCEILAKEKDTIRKEYWRYIGRSLQSKHSRESDIPASV
3EYK , Knot 139 323 0.82 40 192 312
GNPIICLGHHAVENGTSVKTLTDNHVEVVSAKELVETKHTDELCPSPLKLVDGQDCDLINGALGSPGCDRLQDTTWDVFIERPTAVDTCYPFDVPDYQSLRSILASSGSLEFIAEQFTWNGVKVDGSSSACLRGGRNSFFSRLNWLTKATNGNYGPINVTKENTGSYVRLYLWGVHHPSSDNEQTDLYKVATGRVTVSTRSDQISIVPNIGSRPRVRNQSGRISIYWTLVNPGDSIIFNSIGNLIAPRGHYKISKSTKSTVLKSDKRIGSCTSPCLTDKGSIQSDKPFQNVSRIAIGNCPKYVKQGSLMLATGMRNIPGKQAK
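
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns alone. The sketch below does only that arithmetic; the helper name `normalized_lz` is hypothetical, and the observation that the displayed values are truncated (not rounded) to two decimals is an inference from the three rows, not something the page states.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ(w), n) pairs copied from the table above.
rows = {"9NNO": (193, 474), "3DPY": (156, 377), "3EYK": (139, 323)}

for code, (lz, n) in rows.items():
    value = normalized_lz(lz, n)
    # Truncating to two decimals appears to match the displayed column.
    print(code, math.floor(value * 100) / 100)
```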

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
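
As a small illustration of these definitions, \(P_w(n)\) can be built as a set of slices and \(p_w(n)\) read off as its size. The function names below are illustrative assumptions.

```python
def subwords(w, n):
    """P_w(n): the set of distinct length-n contiguous subwords of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

print(subwords("abab", 2))
print(subword_complexity("abab", 2))
```

For a word of length \(m\) there are at most \(m-n+1\) subwords of length \(n\), so \(p_w(n)\le m-n+1\).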

\(|P_{f(9NNO_1)}(2) \setminus P_{f(3DPY_1)}(2)|=93\), \(|P_{f(3DPY_1)}(2) \setminus P_{f(9NNO_1)}(2)|=68\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100000000000111110101111011011000011101100111011000111101011000011100100100111000000000100110011100111011100000001000001101110000110110010001001101101010100110100110001101110111110001111100110011011101101000011011110000100100010110011001100000110010011101100110100110000111001101111100001001110110100010111010101001110000101001101001001100010101111101011000011011011100111100011101000100110101001111110101001110110000110011010101111011001010111100111000101011011100101011011
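
The hydrophobic-polar string can be reproduced by mapping each residue to 1 or 0. The page does not state which residues count as hydrophobic; the set used below was inferred by aligning the start of the 9NNO_1 sequence with the start of the binary string, so it should be read as an assumption.

```python
# Residues mapped to 1. Inferred from the displayed binary string, not
# documented by the page, so treat this set as an assumption.
HYDROPHOBIC = set("AFGILMPVW")

def hydrophobic_polar(w):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)

# First 20 residues of the 9NNO_1 sequence.
print(hydrophobic_polar("MHHHHHHSRYEHIPGPPRPS"))  # 10000000000011111010
```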
Pair \(Z_2\) Length of longest common subword
9NNO_1,3DPY_1 161 4
9NNO_1,3EYK_1 178 3
3DPY_1,3EYK_1 191 5
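
The \(Z_k\) column is the size of the symmetric difference of the two subword sets. For the first row this is \(93+68=161\), the sum of the two one-sided counts given earlier. A minimal sketch follows; the function names are illustrative, and `jaccard_distance` adds the usual denominator \(|P_x(k)\cup P_y(k)|\), which the page itself does not report.

```python
def subwords(w, k):
    """Set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Z_k divided by the size of the union (0.0 if both sets are empty)."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0

print(z_distance("abc", "bcd", 2))        # 2
print(jaccard_distance("abc", "bcd", 2))
```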

Newick tree

 
[
	3EYK_1:95.92,
	[
		9NNO_1:80.5,3DPY_1:80.5
	]:15.42
]
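
The tree above is displayed with square brackets, whereas standard Newick syntax uses parentheses and a terminating semicolon. Assuming the brackets simply stand in for parentheses, the conversion can be sketched as follows (the function name is hypothetical).

```python
import re

def brackets_to_newick(text):
    """Strip whitespace, swap brackets for parentheses, append ';'."""
    compact = re.sub(r"\s+", "", text)
    return compact.replace("[", "(").replace("]", ")") + ";"

tree = """[
	3EYK_1:95.92,
	[
		9NNO_1:80.5,3DPY_1:80.5
	]:15.42
]"""

newick = brackets_to_newick(tree)
print(newick)  # (3EYK_1:95.92,(9NNO_1:80.5,3DPY_1:80.5):15.42);
```

The branch lengths are consistent with an ultrametric tree: \(80.5+15.42=95.92\), so all three leaves are at the same distance from the root.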

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{851}{\log_{20} 851}-\frac{377}{\log_{20}377}\right)\approx 127\), where \(851=474+377\) is the length of the concatenation of the two sequences.
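
The arithmetic behind this estimate can be checked directly. The sketch below only evaluates the displayed expression; the factors 0.85 and 0.8 are copied from the page as-is, and the helper name `n_over_log` is hypothetical.

```python
import math

def n_over_log(n, base=20):
    """n / log_base(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, base)

n1, n2 = 474, 377  # lengths of 9NNO_1 and 3DPY_1
estimate = 0.85 * 0.8 * (n_over_log(n1 + n2) - n_over_log(n2))
print(estimate)  # about 127.5, shown on the page as 127
```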
Status Protein1 Protein2 d d1/2
Query variables 9NNO_1 3DPY_1 160 142.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
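
To make the interpolation concrete, here is a hedged sketch. It assumes that \(\min\) and \(\max\) are taken over the two one-sided complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), as in Otu and Sayood's definitions of \(d\) (the maximum) and \(d_1\) (the sum), and it uses a plain LZ76 exhaustive-history parsing for \(c\). The page's own \(\mathrm{LZ}(w)\) may use a different variant, so this code is not claimed to reproduce the table values.

```python
def lz76(w):
    """Number of components in the LZ76 exhaustive-history parsing of w."""
    i, c = 0, 0
    while i < len(w):
        k = 1
        # Extend the component while it already occurs in the text
        # preceding its last symbol (overlap with itself is allowed).
        while i + k <= len(w) and w[i:i + k] in w[:i + k - 1]:
            k += 1
        c += 1
        i += k
    return c

def increments(x, y):
    """The two one-sided increments c(xy) - c(x) and c(yx) - c(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def delta(lo, hi, alpha):
    """delta = alpha * min + (1 - alpha) * max."""
    return alpha * lo + (1 - alpha) * hi

def otu_sayood(x, y):
    """Return (d, d1 / 2) for sequences x and y."""
    a, b = increments(x, y)
    lo, hi = min(a, b), max(a, b)
    return delta(lo, hi, 0), delta(lo, hi, 0.5)

# Values implied by the table row above: max = d = 160 and
# d1/2 = 142.5 together give min = 125.
print(delta(125, 160, 0), delta(125, 160, 0.5))  # 160 142.5
```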