CoV2D Browser™

Parikh vectors
8WWB_1 1XCX_1 1ALB_1 Letter Amino acid
1 11 2 C Cysteine
7 14 2 Q Glutamine
7 11 0 H Histidine
4 9 4 M Methionine
18 36 16 V Valine
9 34 11 D Aspartic acid
9 22 14 K Lysine
8 17 2 W Tryptophan
14 19 8 E Glutamic acid
9 28 7 R Arginine
24 50 11 G Glycine
11 28 8 I Isoleucine
3 23 1 P Proline
17 21 11 T Threonine
20 27 7 A Alanine
29 25 6 L Leucine
13 26 6 F Phenylalanine
20 32 9 S Serine
8 21 2 Y Tyrosine
2 42 4 N Asparagine
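
A Parikh vector, as tabulated above, records how often each letter occurs in a sequence. The following is a minimal sketch of that count; the names `AMINO_ACIDS` and `parikh_vector` are illustrative choices and are not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes (ordering is arbitrary here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Count occurrences of each amino-acid letter; absent letters map to 0."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


if __name__ == "__main__":
    # First 20 residues of the 8WWB_1 sequence shown on this page.
    prefix = "GGGGRSVDTMALWLGLRAVL"
    vec = parikh_vector(prefix)
    # The entries of a Parikh vector always sum to the sequence length.
    print(vec["G"], vec["H"], sum(vec.values()) == len(prefix))
```

Applied to a full sequence, the entries sum to its length, which is a quick consistency check on each column of the table.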

>8WWB_1|Chain A|Sigma non-opioid intracellular receptor 1|Xenopus laevis (8355)
>1XCX_1|Chain A|Alpha-amylase|Homo sapiens (9606)
>1ALB_1|Chain A|ADIPOCYTE LIPID-BINDING PROTEIN|Mus musculus (10090)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8WWB 104 233 0.81 40 150 223
GGGGRSVDTMALWLGLRAVLVVAGLAVLLQLIRGWLSSKSYVFNREEIARLAKEHSGLDYEVAFSKIIVELRKKHPGHILQDEDLQWVFVNAGGWMGSMCLLHASLTEYVLLFGTAVDTGGHSGRYWAEISDTILSGTFRQWKEGTTKSEIFYPGDTIVHEVGEATSVQWSSGTWMVEYGRGFIPSTLAFALADTIFSTQDFLTLFYTVKVYSKALLLEASTHLSQLGFFAAA
1XCX 212 496 0.88 40 258 473
QYSPNTQQGRTSIVHLFEWRWVDIALECERYLAPKGFGGVQVSPPNENVAIYNPFRPWWERYQPVSYKLCTRSGNEDEFRNMVTRCNNVGVRIYVDAVINHMCGNAVSAGTSSTCGSYFNPGSRDFPAVPYSGWDFNDGKCKTGSGDIENYNDATQVRDCRLTGLLDLALEKDYVRSKIAEYMNHLIDIGVAGFRLDASKHMWPGDIKAILDKLHNLNSNWFPAGSKPFIYQEVIDLGGEPIKSSDYFGNGRVTEFKYGAKLGTVIRKWNGEKMSYLKNWGEGWGFVPSDRALVFVDNHDNQRGHGAGGASILTFWDARLYKMAVGFMLAHPYGFTRVMSSYRWPRQFQNGNDVNDWVGPPNNNGVIKEVTINPDTTCGNDWVCEHRWRQIRNMVIFRNVVDGQPFTNWYDNGSNQVAFGRGNRGFIVFNNDDWSFSLTLQTGLPAGTYCDVISGDKINGNCTGIKIYVSDDGKAHFSISNSAEDPFIAIHAESKL
1ALB 64 131 0.79 38 97 127
CDAFVGTWKLVSSENFDDYMKEVGVGFATRKVAGMAKPNMIISVNGDLVTIRSESTFKNTEISFKLGVEFDEITADDRKVKSIITLDGGALVQVQKWDGKSTTIKRKRDGDKLVVECVMKGVTSTRVYERA
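
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does only that arithmetic; the function name `normalized_lz` is hypothetical, and the LZ values are copied from the table rather than recomputed.

```python
import math


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n) for a word of length n over b letters."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # (code, LZ-complexity, length) as listed in the table above.
    rows = [("8WWB", 104, 233), ("1XCX", 212, 496), ("1ALB", 64, 131)]
    for code, lz, n in rows:
        print(code, f"{normalized_lz(lz, n):.4f}")
```

Each result lies within 0.01 of the tabulated value; the table appears to truncate rather than round to two decimals.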

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
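
The subword sets and their cardinalities can be sketched in a few lines. This is an illustration of the definition, not the site's implementation; the code calls the subword length `k`, and the function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k occurring in w."""
    # The range is empty when k exceeds len(w), giving an empty set.
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))


if __name__ == "__main__":
    # "ABAB" contains the length-2 subwords AB and BA only.
    print(sorted(subwords("ABAB", 2)), subword_complexity("ABAB", 2))
```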

\(|P_{f(8WWB_1)}(2) \setminus P_{f(1XCX_1)}(2)|=47\), \(|P_{f(1XCX_1)}(2) \setminus P_{f(8WWB_1)}(2)|=155\).
Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).
Hydrophobic-polar version of Sequence 1:
11110010011111110111111111111011011100000110000110110000110001110011101000011011000010111101111110101101010001111101100110010011010001101010010010000011011001100110100101001011100101111001111110011000011011001010001111010001001111111
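
The hydrophobic-polar string above can be reproduced by a per-residue mapping. The page does not state which residues count as hydrophobic; the set used below is an assumption inferred by aligning the binary string with Sequence 1 (1 for A, F, G, I, L, M, P, V, W and 0 for everything else). The names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Assumed hydrophobic set, inferred from the string shown for Sequence 1;
# the site's actual classification is not documented on this page.
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (any other letter)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


if __name__ == "__main__":
    # First 20 residues of 8WWB_1; compare with the start of the string above.
    print(hp_encode("GGGGRSVDTMALWLGLRAVL"))  # 11110010011111110111
```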
Pair \(Z_2\) Length of longest common subword
8WWB_1,1XCX_1 202 4
8WWB_1,1ALB_1 143 3
1XCX_1,1ALB_1 207 3
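
The quantity \(Z_k\) is the size of the symmetric difference of two subword sets, so for the first pair \(Z_2 = 47 + 155 = 202\), matching the table. The sketch below computes \(Z_k\) and, as an added illustration that is not taken from the page, the Jaccard distance obtained by dividing by the size of the union. Function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k occurring in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y): number of length-k subwords in exactly one of x and y."""
    return len(subwords(x, k) ^ subwords(y, k))


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by the size of the union (standard Jaccard distance)."""
    union = subwords(x, k) | subwords(y, k)
    # Two empty sets are treated as identical to avoid dividing by zero.
    return z_distance(x, y, k) / len(union) if union else 0.0


if __name__ == "__main__":
    # ABAB has {AB, BA}; ABBA has {AB, BB, BA}; only BB is unshared.
    print(z_distance("ABAB", "ABBA", 2), jaccard_distance("ABAB", "ABBA", 2))
```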

Newick tree

 
(1XCX_1:11.62,(8WWB_1:71.5,1ALB_1:71.5):39.12);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{729 }{\log_{20} 729}-\frac{233}{\log_{20}233})\approx 138\), where \(729=233+496\) is the combined length of the two sequences.
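
The arithmetic in this estimate can be checked directly. The sketch below takes the factors 0.85 and 0.8 from the text as given, without interpreting them, and assumes that 729 is the combined length 233 + 496 of the two query sequences. The helper name `lz_scale` is hypothetical.

```python
import math


def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """The scale n / log_b n used to normalize LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)


# Lengths of 8WWB_1 and 1XCX_1 from the table of sequences.
n1, n2 = 233, 496

# Factors 0.85 and 0.8 are copied from the text; their origin is not stated there.
estimate = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n1))

if __name__ == "__main__":
    print(f"{estimate:.1f}")  # about 138
```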
Status Protein1 Protein2 d d1/2
Query variables 8WWB_1 1XCX_1 180 130

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
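
The interpolation \(\delta\) can be illustrated with a small sketch. It rests on two assumptions that the page does not confirm: that the complexity is the exhaustive-history (LZ76) component count, and that the Otu–Sayood distances are \(d=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1=(c(SQ)-c(S))+(c(QS)-c(Q))\). Those definitions are consistent with the cases \(\alpha=0\) and \(\alpha=1/2\) above. All function names are hypothetical.

```python
def lz76(s: str) -> int:
    """Number of components in the exhaustive-history (LZ76) parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the component while it still occurs starting at an earlier
        # position (the earlier copy may overlap the current component).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def increments(s: str, q: str) -> tuple[int, int]:
    """Return (c(SQ) - c(S), c(QS) - c(Q)) for the assumed complexity c."""
    return lz76(s + q) - lz76(s), lz76(q + s) - lz76(q)


def otu_sayood(s: str, q: str) -> tuple[int, int]:
    """Return (d, d1) under the assumed definitions in the lead-in."""
    a, b = increments(s, q)
    return max(a, b), a + b


def delta(s: str, q: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two complexity increments."""
    a, b = increments(s, q)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    s, q = "ABABABBA", "BBABAABB"
    d, d1 = otu_sayood(s, q)
    print(d, d1, delta(s, q, 0.0), delta(s, q, 0.5))
```

The substring search makes this cubic in the worst case, which is acceptable for sequences of a few hundred residues but not for genome-scale input.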