CoV2D Browser™

Parikh vectors
8SCV_1 3IKT_1 7NDG_1 Letter Amino acid
9 11 29 P Proline
17 7 26 T Threonine
9 5 15 Y Tyrosine
10 17 37 R Arginine
14 4 22 N Asparagine
9 3 18 H Histidine
12 9 16 F Phenylalanine
21 22 22 V Valine
22 18 35 A Alanine
21 7 28 D Aspartic acid
11 6 17 Q Glutamine
27 27 23 L Leucine
20 8 30 S Serine
25 16 18 E Glutamic acid
20 22 31 G Glycine
22 9 20 K Lysine
11 2 7 M Methionine
5 1 34 C Cysteine
19 11 9 I Isoleucine
1 2 4 W Tryptophan
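
A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch, not the code behind this page; the function name `parikh_vector` and the constant `ALPHABET` are hypothetical, and the letter order is assumed to follow the rows of the table above.

```python
from collections import Counter

# Letter order taken from the rows of the table above (an assumption
# made for illustration; any fixed order of the 20 letters would do).
ALPHABET = "PTYRNHFVADQLSEGKMCIW"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the number of occurrences of each letter of `alphabet`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# Toy input, not one of the three proteins.
vector = parikh_vector("GAMGV")
print(dict(zip(ALPHABET, vector)))
```

The entries of a Parikh vector sum to the sequence length, which gives a quick sanity check against the Length column further down.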

>8SCV_1|Chain A|Interleukin-1 receptor-associated kinase 4|Homo sapiens (9606)
>3IKT_1|Chains A, B|Redox-sensing transcriptional repressor rex|Thermus thermophilus HB27 (262724)
>7NDG_1|Chains A, F[auth D], I[auth G]|Netrin-1|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8SCV , Knot 134 305 0.83 40 192 292
GAMGVSDTRFHSFSFYELKNVTNNFDERPISVGGNKMGEGGFGVVYKGYVNNTTVAVKKLAAMVDITTEELKQQFDQEIKVMAKCQHENLVELLGFSSDGDDLCLVYVYMPNGSLLDRLSCLDGTPPLSWHMRCKIAQGAANGINFLHENHHIHRDIKSANILLDEAFTAKISDFGLARASEKFAQTVMTSRIVGTTAYMAPEALRGEITPKSDIYSFGVVLLEIITGLPAVDEHREPQLLLDIKEEIEDEEKTIEDYIDKKMNDADSTSVEAMYSVASQCLHEKKNKRPDIKKVQQLLQEMTAS
3IKT , Knot 94 207 0.80 40 128 200
GMKVPEAAISRLITYLRILEELEAQGVHRTSSEQLGELAQVTAFQVRKDLSYFGSYGTRGVGYTVPVLKRELRHILGLNRKWGLCIVGMGRLGSALADYPGFGESFELRGFFDVDPEKVGRPVRGGVIEHVDLLPQRVPGRIEIALLTVPREAAQKAADLLVAAGIKGILNFAPVVLEVPKEVAVENVDFLAGLTRLSFAILNPKWR
7NDG , Knot 188 441 0.86 40 240 418
GPGLSMFAGQAAQPDPCSDENGHPRRCIPDFVNAAFGKDVRVSSTCGRPPARYCVVSERGEERLRSCHLCNASDPKKAHPPAFLTDLNNPHNLTCWQSENYLQFPHNVTLTLSLGKKFEVTYVSLQFCSPRPESMAIYKSMDYGRTWVPFQFYSTQCRKMYNRPHRAPITKQNEQEAVCTDSHTDMRPLSGGLIAFSTLDGRPSAHDFDNSPVLQDWVTATDIRVAFSRLHTFGDENEDDSELARDSYFYAVSDLQVGGRCKCNGHAARCVRDRDDSLVCDCRHNTAGPECDRCKPFHYDRPWQRATAREANECVACNCNLHARRCRFNMELYKLSGRKSGGVCLNCRHNTAGRHCHYCKEGYYRDMGKPITHRKACKACDCHPVGAAGKTCNQTTGQCPCKDGVTGITCNRCAKGYQQSRSPIAPCIKGSGTETSQVAPA
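
The LZ-complexity and ratio columns can be illustrated with a short sketch. It assumes the exhaustive-history (LZ76) parsing, in which each new phrase is the shortest block that cannot be copied from the text before its last symbol; whether the page uses exactly this variant is not stated, so the phrase counts it produces may differ from the table. The names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of phrases in an exhaustive-history (LZ76-style) parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        # The copy source may overlap the phrase, except for its last symbol.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_alphabet_size(n)), as in the ratio column."""
    return lz / (n / math.log(n, alphabet_size))

# Standard example: 0 | 001 | 10 | 100 | 1000 | 101 gives 6 phrases.
print(lz76_complexity("0001101001000101"))
# Ratio for the 8SCV row (LZ = 134, n = 305); the table lists 0.83.
print(normalized_lz(134, 305))
```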

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with four-symbol Protein Data Bank code \(c\).
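
As a minimal sketch of these definitions (the function names `subwords` and `subword_complexity` are hypothetical, and the input is a toy word rather than one of the proteins):

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct subwords of length k in w."""
    return len(subwords(w, k))

# "GAGAG" contains the length-2 subwords GA and AG only.
print(sorted(subwords("GAGAG", 2)))
print(subword_complexity("GAGAG", 2))
```

For a word of length \(m\) over 20 letters, the count of length-\(k\) subwords is at most \(\min(20^k, m-k+1)\).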

\(|P_{f(8SCV_1)}(2) \setminus P_{f(3IKT_1)}(2)|=110\), \(|P_{f(3IKT_1)}(2) \setminus P_{f(8SCV_1)}(2)|=46\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11111000010010100100100010001101110011011111100101000011100111110100001000100010111000000110111100010010110101101011001001010111010100011011101101100000100010010111001101010011110100011001100011100101110110101010001001111110110111110000010111010001000000100010001001000010110011000100000001010010011001010

Pair \(Z_2\) Length of longest common subword
8SCV_1,3IKT_1 156 3
8SCV_1,7NDG_1 190 3
3IKT_1,7NDG_1 190 3
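
The \(Z_k\) column is the size of the symmetric difference of two subword sets; for the first pair, \(110+46=156\). A minimal sketch on toy words follows (the names `subwords` and `z_distance` are hypothetical, and this is not the code behind the page):

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# P_x(2) = {GA, AG, AV} and P_y(2) = {GA, AV, VV, VA}:
# AG is only in x, while VV and VA are only in y.
print(z_distance("GAGAV", "GAVVA", 2))
```

Dividing by the size of the union of the two sets would turn this numerator into the usual Jaccard distance.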

Newick tree

 
[
	7NDG_1:10.02,
	[
		8SCV_1:78,3IKT_1:78
	]:22.02
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{512 }{\log_{20} 512}-\frac{207}{\log_{20}207})=88.1\), where \(512=305+207\) is the combined length of the two sequences.
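
The arithmetic in this estimate can be checked directly. In the sketch below the helper name `lz_scale` is hypothetical, and reading 512 as 305 + 207 (the combined length of 8SCV_1 and 3IKT_1) is an assumption based on the lengths in the table above; the factors 0.85 and 0.8 are copied from the formula as given.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_alphabet_size(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

n_combined = 305 + 207   # assumed reading of 512
n_second = 207

expected = 0.85 * 0.8 * (lz_scale(n_combined) - lz_scale(n_second))
print(round(expected, 1))   # 88.1
```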
Status Protein1 Protein2 d d1/2
Query variables 8SCV_1 3IKT_1 112 92

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
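
A small numerical check of this interpolation, as a sketch only. The page does not say which two quantities min and max range over; the code assumes they are the two directed quantities whose maximum is \(d\) and whose sum is \(d_1\). The function name `delta` is hypothetical, and the smaller quantity (72) is derived from the table values \(d=112\) and \(d_1/2=92\) rather than stated in the source.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

larger = 112                       # d from the table (alpha = 0)
half_sum = 92                      # d1/2 from the table (alpha = 1/2)
smaller = 2 * half_sum - larger    # 72, derived, not stated in the source

print(delta(0, smaller, larger))     # 112
print(delta(0.5, smaller, larger))   # 92.0
```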