CoV2D Browser™

Parikh vectors
4DGW_1 8TIT_1 3BLQ_1 Letter Amino acid
20 0 13 Y Tyrosine
10 0 21 V Valine
17 7 20 A Alanine
26 0 21 D Aspartic acid
36 0 28 K Lysine
44 0 43 L Leucine
29 0 19 R Arginine
19 0 12 Q Glutamine
14 8 19 G Glycine
7 0 7 H Histidine
23 0 18 I Isoleucine
23 0 11 F Phenylalanine
11 0 16 P Proline
31 0 15 S Serine
20 0 15 N Asparagine
4 11 7 C Cysteine
46 0 20 E Glutamic acid
13 5 12 T Threonine
7 0 9 M Methionine
2 0 5 W Tryptophan
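
Each column of the table above is a Parikh vector: the number of occurrences of each letter in one sequence. A minimal sketch of that count follows; the function name `parikh_vector` and the explicit `alphabet` argument are illustrative assumptions, not part of the page.

```python
from collections import Counter

def parikh_vector(w, alphabet):
    """Return the letter counts of w, listed in the order given by alphabet."""
    counts = Counter(w)
    # Letters that never occur in w get a count of 0.
    return [counts[a] for a in alphabet]

# The 8TIT_1 sequence as listed on this page.
dna = "CCCGGACCTGTGACAAATTGCCCTCAGACGG"
print(parikh_vector(dna, "ACGT"))
```

For the 8TIT_1 sequence this gives 7, 11, 8 and 5 for A, C, G and T, which agrees with the nonzero entries of the 8TIT_1 column.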

>4DGW_1|Chain A|Pre-mRNA-splicing factor PRP9|Saccharomyces cerevisiae (559292)
>8TIT_1|Chain A|DNA (5'-D(CP*CP*CP*GP*GP*AP*CP*CP*TP*GP*TP*GP*AP*CP*AP*AP*AP*TP*TP*GP*CP*CP*CP*TP*CP*AP*GP*AP*CP*GP*G)-3')||Escherichia coli (562)
>3BLQ_1|Chain A|Cell division protein kinase 9|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4DGW , Knot 166 402 0.82 40 205 381
GSPGISGGGGILEMNLLETRRSLLEEMEIIENAIAERIQRNPELYYHYIQESSKVFPDTKLPRSSLIAENKIYKFKKVKRKRKQIILQQHEINIFLRDYQEKQQTFNKINRPEETQEDDKDLPNFERKLQQLEKELKNEDENFELDINSKKDKYALFSSSSDPSRRTNILSDRARDLDLNEIFTRDEQYGEYMELEQFHSLWLNVIKRGDCSLLQFLDILELFLDDEKYLLTPPMDRKNDRYMAFLLKLSKYVETFFFKSYALLDAAAVENLIKSDFEHSYCRGSLRSEAKGIYCPFCSRWFKTSSVFESHLVGKIHKKNESKRRNFVYSEYKLHRYLKYLNDEFSRTRSFVERKLAFTANERMAEMDILTQKYEAPAYDSTEKEGAEQVDGEQRDGQLQEE
8TIT , Knot 12 31 0.44 8 15 24
CCCGGACCTGTGACAAATTGCCCTCAGACGG
3BLQ , Knot 141 331 0.82 40 207 319
GPAKQYDSVECPFCDEVSKYEKLAKIGQGTFGEVFKARHRKTGQKVALKKVLMENEKEGFPITALREIKILQLLKHENVVNLIEICRTKASPYNRCKGSIYLVFDFCEHDLAGLLSNVLVKFTLSEIKRVMQMLLNGLYYIHRNKILHRDMKAANVLITRDGVLKLADFGLARAFSLAKNSQPNRYTNRVVTLWYRPPELLLGERDYGPPIDLWGAGCIMAEMWTRSPIMQGNTEQHQLALISQLCGSITPEVWPNVDNYELYEKLELVKGQKRKVKDRLKAYVRDPYALDLIDKLLVLDPAQRIDSDDALNHDFFWSDPMPSDLKGMLST
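
The LZ-complexity and normalized columns can be sketched as follows. This assumes the exhaustive-history parsing of Lempel and Ziv (1976); the page may compute \(\mathrm{LZ}(w)\) differently, and the function names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of phrases in the LZ76 exhaustive-history parsing of w (assumed definition)."""
    n = len(w)
    i = 0
    count = 0
    while i < n:
        length = 1
        # Grow the phrase while it already occurs starting at an earlier
        # position (overlap with the phrase itself is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

dna = "CCCGGACCTGTGACAAATTGCCCTCAGACGG"  # 8TIT_1 sequence from this page
c = lz76_complexity(dna)
print(c, round(normalized_lz(c, len(dna)), 2))
```

On the 8TIT_1 sequence this sketch returns 12 phrases and a normalized value of about 0.44, matching that row of the table. The substring search is quadratic, which is adequate for sequences of a few hundred letters.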

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(4DGW_1)}(2) \setminus P_{f(8TIT_1)}(2)|=202\), \(|P_{f(8TIT_1)}(2) \setminus P_{f(4DGW_1)}(2)|=12\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
101110111111010110000011001011001110010001010000100000111000110001110001001001000000111000010111000000000100100100000000011010001001000100000010101000000011100000100000110001001010011000000100101001001110110010001101101101110000011011100000001111101000100111000111011110011000100000010100010110011000110000110001110100000000001100000100010010001000001100011101000110101100000111000000011001010000101000

Pair \(Z_2\) Length of longest common subsequence
4DGW_1,8TIT_1 214 3
4DGW_1,3BLQ_1 152 4
8TIT_1,3BLQ_1 208 2
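
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two sets of length-\(k\) subwords. A minimal sketch under that reading is given here; the names `subwords` and `z_k` and the toy strings are illustrative assumptions, and the output has not been checked against the values tabulated on this page.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: "ABAB" has subwords {AB, BA}; "ABBA" has {AB, BB, BA}.
print(z_k("ABAB", "ABBA", 2))
```

In the toy example only BB separates the two sets, so the result is 1.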

Newick tree

 
[
	8TIT_1:11.65,
	[
		4DGW_1:76,3BLQ_1:76
	]:37.65
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{433 }{\log_{20} 433}-\frac{31}{\log_{20}31})\approx 126.9.\)
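
The arithmetic in this estimate can be checked with a short sketch. Note that 433 = 402 + 31 is the combined length of the two query sequences; the helper name `lz_scale` is hypothetical, and the factors 0.85 and 0.8 are taken from the formula as given.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The scale n / log_20(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

# (0.85)(0.8)(433/log_20 433 - 31/log_20 31)
expected = 0.85 * 0.8 * (lz_scale(433) - lz_scale(31))
print(round(expected, 1))
```

The two scale terms are about 213.7 and 27.0, so the product comes to roughly 126.9.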
Status Protein1 Protein2 d d1/2
Query variables 4DGW_1 8TIT_1 164 87.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
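
The interpolation can be illustrated with the query row above. From \(d=164\) and \(d_1/2=87.5\) the two cases of the formula give \(\mathrm{max}=164\) and \(\mathrm{min}=2(87.5)-164=11\). The sketch below assumes exactly that reading; the function name `delta` and its argument names are hypothetical.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Values derived from the query row: max = d = 164, min = 2 * 87.5 - 164 = 11.
print(delta(0, 11, 164))    # alpha = 0 recovers d
print(delta(0.5, 11, 164))  # alpha = 1/2 recovers d1 / 2
```

Intermediate values of \(\alpha\) interpolate linearly between the maximum and the average of the two quantities.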