CoV2D Browser™

Parikh vectors
1JJJ_1 7PUL_1 3VBR_1 Letter Amino acid
6 1 1 C Cysteine
11 28 14 G Glycine
14 24 11 K Lysine
4 18 13 F Phenylalanine
1 12 20 P Proline
10 22 15 V Valine
6 12 12 R Arginine
4 29 11 N Asparagine
8 31 9 D Aspartic acid
6 13 8 Q Glutamine
12 12 9 E Glutamic acid
6 24 7 I Isoleucine
6 31 18 A Alanine
10 19 13 L Leucine
5 9 10 M Methionine
4 24 15 S Serine
2 6 4 W Tryptophan
1 5 3 H Histidine
17 12 20 T Threonine
2 23 12 Y Tyrosine
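Each column of the table is the Parikh vector of one sequence: the number of occurrences of each amino-acid letter. The Python sketch below counts letters in the table's row order. The names `parikh_vector` and `TABLE_ORDER` are hypothetical and are not taken from the CoV2D site.

```python
from collections import Counter

# One-letter amino-acid codes in the row order of the Parikh table above.
TABLE_ORDER = "CGKFPVRNDQEIALMSWHTY"


def parikh_vector(seq: str, alphabet: str = TABLE_ORDER) -> list[int]:
    """Count how many times each letter of `alphabet` occurs in `seq`."""
    counts = Counter(seq)
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    seq_1jjj = (
        "MATVQQLEGRWRLVDSKGFDEYMKELGVGIALRKMGAMAKPDCIITCDGKNLTIKTESTLKTTQFSCTLG"
        "EKFEETTADGRKTQTVCNFTDGALVQHQEWDGKESTITRKLKDGKLVVECVMNNVTCTRIYEKVE"
    )
    print(parikh_vector(seq_1jjj))
```

Applied to the 1JJJ_1 sequence, this prints the first column of the table: [6, 11, 14, ..., 17, 2]. The entries sum to the sequence length, 135.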

>1JJJ_1|Chain A|EPIDERMAL-TYPE FATTY ACID BINDING PROTEIN (E-FABP)|Homo sapiens (9606)
>7PUL_1|Chain A|Beta-N-acetylhexosaminidase|Enterococcus faecalis (1351)
>3VBR_1|Chain A|Genome Polyprotein, capsid protein VP1|Human enterovirus 71 (39054)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1JJJ, Knot 69 135 0.83 40 100 130
MATVQQLEGRWRLVDSKGFDEYMKELGVGIALRKMGAMAKPDCIITCDGKNLTIKTESTLKTTQFSCTLGEKFEETTADGRKTQTVCNFTDGALVQHQEWDGKESTITRKLKDGKLVVECVMNNVTCTRIYEKVE
7PUL, Knot 147 355 0.81 40 200 335
GGSGQTQPLKSVFSIDAGRKYFSVEQLEELVAKASQNGYTDVQLILGNDGLRFILDDMSVNVNGKKYNHNRVSKAIQRGNNAYYNDPNGNALTQKEMDRLLAFAKARNINIIPVINSPGHMDALLVAMEKLAIKNPAFDGSKRTVDLGNQKAVNFTKAIISKYVAYFSAHSEIFNFGGDEYANDVDTGGWAKLQSSGRYKDFVAYANDLAKIIKDAGMQPMSFNDGIYYNSDDSFGTFDPEIIISYWTAGWSGYDVAKPEYFVQKGHKIFNTNDAWYWVAGNVDSGIYQYDDALANMSKKAFTDVPAGSPNLPIIGSIQCVWYDDPRRDYDFERIYTLMDTFSENYREYMVVKNH
3VBR, Knot 107 225 0.85 40 157 217
HSTAETTLDSFFSRAGLVGEIDLPLKGTTNPNGYANWDIDITGYAQMRRKVELFTYMRFDAEFTFVACTPTGEVVPQLLQYMFVPPGAPKPDSRESLAWQTATNPSVFVKLSDPPAQVSVPFMSPASAYQWFYDGYPTFGEHKQEKDLEYGAMPNNMMGTFSVRTVGTSKSKYPLVVRIYMRMKHVRAWIPRPMRNQNYLFKANPNYAGNSIKPTGASRTAITTL
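The page does not say which Lempel–Ziv variant it computes. The sketch below assumes the classical LZ76 complexity, counted with the Kaspar–Schuster scan, so its LZ values may differ from the site's. The function names `lz76` and `lz_ratio` are hypothetical.

The normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) uses base 20, matching the 20-letter amino-acid alphabet. The ratios in the table are consistent with truncation, not rounding, to two decimals. For example, \(107/(225/\log_{20}225)\approx 0.8598\) is listed as 0.85.

```python
import math


def lz76(s: str) -> int:
    """LZ76 complexity of s (number of phrases), via the Kaspar-Schuster scan."""
    n = len(s)
    if n < 2:
        return n  # empty word: 0 phrases; single letter: 1 phrase
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1  # the current phrase is still a copy starting at position i
            if l + k > n:  # the copy reaches the end of s: count the last phrase
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1  # try the next earlier starting point
            if i == l:  # no earlier start copies further: close the phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def lz_ratio(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_k n) with k = alphabet_size."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    print(lz76("abab"))                              # a|b|ab -> 3
    print(math.floor(100 * lz_ratio(69, 135)) / 100)  # 0.83, as in the 1JJJ row
```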

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
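These definitions translate directly into Python. The helper names `subwords` and `p` are illustrative, not from the site.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


if __name__ == "__main__":
    print(sorted(subwords("GAGAG", 2)))  # ['AG', 'GA']
    print(p("GAGAG", 3))                 # 2: 'GAG' and 'AGA'
```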

\(|P_{f(1JJJ_1)}(2) \setminus P_{f(7PUL_1)}(2)|=47\) and \(|P_{f(7PUL_1)}(2) \setminus P_{f(1JJJ_1)}(2)|=147\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). This is an LZ76-style (set-of-subwords) numerator of the Jaccard distance \(1-|P_x(k)\cap P_y(k)|/|P_x(k)\cup P_y(k)|\). For instance, \(Z_2(f(1JJJ_1),f(7PUL_1))=47+147=194\).

Hydrophobic-polar (HP) version of Sequence 1 (1JJJ_1):
110100101010110001100010011111110011111010011000100101000001000010001100100001010000010010011110000101000010001001011100110010000100010
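The page does not state its hydrophobic/polar classification. The displayed string for 1JJJ_1 is reproduced exactly by mapping A, F, G, I, L, M, P, V and W to 1 and every other residue to 0. The sketch below therefore assumes that set, which is inferred from the output rather than documented by the site. The name `hp_string` is hypothetical.

```python
# Assumed hydrophobic residues, inferred from the displayed 1JJJ_1 HP string.
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp_string(seq: str) -> str:
    """Encode each residue as '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)


if __name__ == "__main__":
    print(hp_string("MATVQQLEGR"))  # 1101001010, the first ten HP symbols of 1JJJ_1
```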
Pair \(Z_2\) Length of longest common subword (contiguous substring)
1JJJ_1,7PUL_1 194 3
1JJJ_1,3VBR_1 165 3
7PUL_1,3VBR_1 167 3
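The sketch below computes both columns. It assumes \(Z_k\) as defined in the text and reads the last column as the longest common contiguous subword. A common subsequence would be much longer: by the Parikh table, 1JJJ_1 has 11 G's and 7PUL_1 has 28, so the two already share a common subsequence of length 11. All function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y): sizes of the two set differences P_x(k) - P_y(k) and P_y(k) - P_x(k), summed."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by the size of the union of P_x(k) and P_y(k) (0.0 if both are empty)."""
    union = subwords(x, k) | subwords(y, k)
    return z(x, y, k) / len(union) if union else 0.0


def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest string occurring contiguously in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: longest common suffix of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

For 1JJJ_1 and 7PUL_1, the figures above give \(|P(2)|\) values of 100 and 200, of which 53 are shared. The union therefore has 247 elements, and the Jaccard distance is \(194/247\approx 0.79\).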

Newick tree

(7PUL_1:93.01,(1JJJ_1:82.5,3VBR_1:82.5):10.51);

Let \(d\) and \(d_1\) denote the Otu–Sayood LZ-complexity distances of those names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
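The page does not restate the definitions of these distances. The sketch below assumes the forms given by Otu and Sayood (2003):

- \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\,\mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\)
- \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\)

It takes LZ to be the Kaspar–Schuster LZ76 count. Because the site's LZ variant is not specified, values such as \(d=132\) for 1JJJ_1 and 7PUL_1 need not be reproduced exactly. All names are hypothetical.

```python
def lz76(s: str) -> int:
    """LZ76 complexity (number of phrases) via the Kaspar-Schuster scan."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:  # last phrase is a copy running to the end
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:  # close the current phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def increments(x: str, y: str) -> tuple[int, int]:
    """Extra LZ phrases when y is appended to x, and when x is appended to y."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)


def otu_sayood_d(x: str, y: str) -> int:
    """Assumed d: the larger of the two increments."""
    return max(increments(x, y))


def otu_sayood_d1(x: str, y: str) -> int:
    """Assumed d_1: the sum of the two increments."""
    return sum(increments(x, y))
```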
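A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{490}{\log_{20} 490}-\frac{135}{\log_{20} 135}\right)\approx 105\), where \(490=135+355\) is the combined length of sequences 1JJJ_1 and 7PUL_1.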
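This arithmetic can be checked directly. The sketch takes the factors 0.85 and 0.8 as given, since the text does not say where they come from. It uses \(N = 135 + 355 = 490\). The function name is hypothetical.

```python
import math


def rough_expected_distance(n_x: int, n_y: int, a: float = 0.85, b: float = 0.8) -> float:
    """a * b * (N / log_20 N - n_x / log_20 n_x), where N = n_x + n_y."""
    def scaled(n: int) -> float:
        return n / math.log(n, 20)

    return a * b * (scaled(n_x + n_y) - scaled(n_x))


if __name__ == "__main__":
    print(round(rough_expected_distance(135, 355), 1))  # 105.1
```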
Status Protein1 Protein2 d d1/2
Query variables 1JJJ_1 7PUL_1 132 91
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
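The sketch below assumes that min and max are taken over the two increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), as in Otu and Sayood's definitions. Under that assumption, the interpolation is a one-liner; the function name is hypothetical. On this reading, the table row with \(d=132\) and \(d_1/2=91\) corresponds to increments of 132 and \(2\cdot 91-132=50\).

```python
def delta(inc_a: float, inc_b: float, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two LZ increments."""
    return alpha * min(inc_a, inc_b) + (1 - alpha) * max(inc_a, inc_b)


if __name__ == "__main__":
    print(delta(50, 132, 0.0))  # 132.0, i.e. d
    print(delta(50, 132, 0.5))  # 91.0, i.e. d_1 / 2
```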