CoV2D Browser™

Parikh vectors
8QWW_1 7XJB_1 7YGV_1 Letter Amino acid
1 13 2 Q Glutamine
0 10 12 E Glutamic acid
0 14 12 K Lysine
1 13 2 Y Tyrosine
0 15 4 V Valine
0 16 11 A Alanine
0 6 2 N Asparagine
0 4 0 C Cysteine
0 9 6 M Methionine
0 14 7 S Serine
0 11 4 T Threonine
0 9 7 R Arginine
0 16 13 G Glycine
0 2 2 H Histidine
1 2 0 W Tryptophan
1 9 4 I Isoleucine
0 4 12 F Phenylalanine
0 11 4 P Proline
0 18 9 D Aspartic acid
2 27 17 L Leucine
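The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch follows, assuming the 20-letter alphabet in the row order of the table above; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the CoV2D code.

```python
from collections import Counter

# Letter order of the table rows above (assumed alphabet of 20 amino acids).
AMINO_ACIDS = "QEKYVANCMSTRGHWIFPDL"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count in w."""
    counts = Counter(w)
    return {a: counts.get(a, 0) for a in alphabet}

# 8QWW_1 is the six-letter peptide LYIQWL.
pv = parikh_vector("LYIQWL")
print([pv[a] for a in AMINO_ACIDS])
```

For LYIQWL this gives a count of 2 for L, 1 for each of Q, Y, W and I, and 0 elsewhere, which agrees with the first column of the table.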

>8QWW_1|Chains A, B|Peptide LYIQWL from Tc5b|synthetic construct (32630)
>7XJB_1|Chains A, B, C, D|Catechol O-methyltransferase|Rattus norvegicus (10116)
>7YGV_1|Chain A|EF-hand domain-containing protein D1|Mus musculus (10090)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8QWW , Knot 6 6 0.59 10 5 4
LYIQWL
7XJB , Knot 103 223 0.83 40 150 216
GSMGDTKEQRILRYVQQNAKPGDPQSVLEAIDTYCTQKEWAMNVGDAKGQIMDAVIREYSPSLVLELGAYCGYSAVRMARLLQPGARLLTMEMNPDYAAITQQMLNFAGLQDKVTILNGASQDLIPQLKKKYDVDTLDMVFLDHWKDRYLPDTLLLEKCGLLRKGTVLLADNVIVPGTPDFLAYVRGSSSFECTHYSSYLEYMKVVDGLEKAIYQGPSSPDKS
7YGV , Knot 63 130 0.78 36 100 128
GAMGSGTARPGRSKVFNPYTEFPEFSRRLLKDLEKMFKTYDAGRDGFIDLMELKLMMEKLGAPQTHLGLKSMIKEVDEDFDGKLSFREFLLIFHKAAAGELQEDSGLLALAKFSEIDVALEGVRGAKNFF
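The complexity column and the normalized ratio \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be sketched as follows. This assumes the Lempel-Ziv 1976 exhaustive-history parsing; the page does not say which LZ variant it uses, so the function names and the parsing rule here are assumptions rather than the site's implementation.

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parsing of w."""
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the candidate while it already occurs in the text that
        # precedes its own last character (overlaps are allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the alphabet size."""
    n = len(w)
    if n < 2:
        raise ValueError("normalization needs a word of length at least 2")
    return lz76_complexity(w) / (n / math.log(n, alphabet_size))

print(lz76_complexity("LYIQWL"), normalized_lz("LYIQWL"))
```

Under this assumption LYIQWL parses into six components, matching the LZ-complexity of 6 reported for 8QWW.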

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
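The subword sets and their cardinalities are straightforward to compute with a sliding window. A minimal sketch, with illustrative function names:

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous factors) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

print(sorted(subwords("LYIQWL", 2)))
print(subword_complexity("LYIQWL", 2), subword_complexity("LYIQWL", 3))
```

For LYIQWL the length-2 subwords are LY, YI, IQ, QW and WL, and the length-3 subwords are LYI, YIQ, IQW and QWL, in agreement with the values 5 and 4 in the table.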

\(|P_{f(8QWW_1)}(2) \setminus P_{f(7XJB_1)}(2)|=5\), \(|P_{f(7XJB_1)}(2) \setminus P_{f(8QWW_1)}(2)|=150\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \), the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).
Hydrophobic-polar version of Sequence 1: 101011
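The hydrophobic-polar encoding replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not say which residues it treats as hydrophobic, so the set below is an assumption, chosen only to be consistent with the one example shown (L, I and W map to 1; Y and Q map to 0). Other classifications exist and may differ on residues such as C, G or P.

```python
# ASSUMPTION: this hydrophobic set is illustrative, not taken from the page.
HYDROPHOBIC = set("ACFILMVW")

def hp_encode(w, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' if hydrophobic and '0' otherwise."""
    return "".join("1" if a in hydrophobic else "0" for a in w)

encoded = hp_encode("LYIQWL")
print(encoded)
```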
Pair \(Z_2\) Length of longest common subword
8QWW_1,7XJB_1 155 1
8QWW_1,7YGV_1 105 1
7XJB_1,7YGV_1 156 3
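Both columns of this table can be derived from the subword sets. For the first pair, \(Z_2 = 5 + 150 = 155\), which means the two sequences share no subword of length 2. The sketch below computes \(Z_k\) as the size of a symmetric difference and finds the longest common contiguous subword by brute force; the function names are illustrative and the short test strings are made up for the example.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous factors) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(k, x, y):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|, i.e. the symmetric difference size."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous factor shared by x and y (0 if none)."""
    for k in range(min(len(x), len(y)), 0, -1):
        if subwords(x, k) & subwords(y, k):
            return k
    return 0

print(z_distance(2, "LYIQWL", "AAAAAA"))         # 5 subwords vs. {"AA"}
print(longest_common_subword("LYIQWL", "QWLA"))  # shared factor "QWL"
```

The brute-force search is quadratic or worse in the sequence lengths, which is acceptable for proteins of a few hundred residues; a suffix-automaton or dynamic-programming approach would be the usual choice for longer inputs.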

Newick tree

 
[
	7XJB_1:84.50,
	[
		8QWW_1:52.5,7YGV_1:52.5
	]:32.00
]
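The tree above is displayed with square brackets, whereas the standard Newick format uses parentheses and a terminating semicolon. A minimal sketch of serializing the same tree into that form follows; the nested-tuple representation and the name `to_newick` are assumptions made for this example. Branch lengths are kept as strings so that they are reproduced exactly as shown.

```python
def to_newick(node):
    """Serialize a node given as (label_or_children, branch_length).

    A leaf has a string label; an internal node has a list of child nodes.
    branch_length may be None (used for the root).
    """
    content, length = node
    if isinstance(content, str):
        text = content
    else:
        text = "(" + ",".join(to_newick(child) for child in content) + ")"
    return text if length is None else f"{text}:{length}"

tree = (
    [
        ("7XJB_1", "84.50"),
        ([("8QWW_1", "52.5"), ("7YGV_1", "52.5")], "32.00"),
    ],
    None,
)
newick = to_newick(tree) + ";"
print(newick)
```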

Let d be the Otu–Sayood distance \(d\).
Let d1 be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{229}{\log_{20} 229}-\frac{6}{\log_{20} 6}\right)=79.0\)
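The arithmetic in this estimate can be checked directly. The factors 0.85 and 0.8 are taken from the page as given; 229 is presumably the length of the concatenation (223 + 6), which is an inference rather than something the page states.

```python
import math

def lz_scale(n, base=20):
    """The normalizing quantity n / log_base(n)."""
    return n / math.log(n, base)

expected = 0.85 * 0.8 * (lz_scale(229) - lz_scale(6))
print(round(expected, 1))
```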
Status Protein1 Protein2 d d1/2
Query variables 8QWW_1 7XJB_1 100 51
Was not able to put for d
Was not able to put for d1
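The distances in this table can be sketched under two assumptions that the page does not spell out: that the underlying complexity \(c\) is the LZ76 component count, and that the unnormalized Otu–Sayood distances are \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=(c(xy)-c(x))+(c(yx)-c(y))\). These forms are consistent with the interpolation formula on this page, where \(\alpha=0\) gives \(d\) and \(\alpha=1/2\) gives \(d_1/2\). All function names are illustrative.

```python
def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parsing of w."""
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while the candidate occurs before its own last character.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def one_sided_gains(x, y):
    """Return (c(xy) - c(x), c(yx) - c(y))."""
    c = lz76_complexity
    return c(x + y) - c(x), c(y + x) - c(y)

def otu_sayood_d(x, y):
    """Assumed form of d: the larger of the two one-sided gains."""
    return max(one_sided_gains(x, y))

def otu_sayood_d1(x, y):
    """Assumed form of d1: the sum of the two one-sided gains."""
    return sum(one_sided_gains(x, y))

print(otu_sayood_d("LYIQWL", "AAAAAA"), otu_sayood_d1("LYIQWL", "AAAAAA"))
```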

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
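The interpolation can be written as a one-line function. Here `a` and `b` stand for the two quantities whose minimum and maximum are combined; the page does not name them, and they are assumed to be the two one-sided complexity gains. The example values 100 and 2 are inferred from the table entries \(d=100\) and \(d_1/2=51\), not read from the page.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# ASSUMPTION: 100 and 2 are inferred values, used only for illustration.
print(delta(0, 100, 2))    # alpha = 0 recovers the maximum, i.e. d
print(delta(0.5, 100, 2))  # alpha = 1/2 gives the average, i.e. d_1 / 2
```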
