CoV2D Browser™

Parikh vectors
7BYT_1 3WIH_1 1KBV_1 Letter Amino acid
18 6 10 Q Glutamine
12 2 1 W Tryptophan
2 1 1 C Cysteine
20 5 15 I Isoleucine
23 4 15 L Leucine
17 8 21 P Proline
24 7 21 T Threonine
22 3 13 Y Tyrosine
33 12 35 V Valine
45 10 35 G Glycine
9 5 22 E Glutamic acid
13 4 20 K Lysine
5 1 10 M Methionine
15 3 15 F Phenylalanine
56 8 11 S Serine
22 4 18 D Aspartic acid
10 2 10 R Arginine
33 5 13 N Asparagine
8 1 8 H Histidine
40 6 33 A Alanine
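
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch, using the 3WIH_1 sequence as listed on this page; the helper name `parikh_vector` and the alphabet ordering are illustrative choices, not something the page prescribes.

```python
from collections import Counter

# Chain 3WIH_1 as listed on this page (97 residues).
SEQ_3WIH = (
    "APPQGVTVSKNDGNGTAILVSWQPPPEDTQNGMVQEYKVWCLGNETRYHINKTVDGSTFSVVIPFLVPGIRY"
    "SVEVAASTGAGSGVKSEPQFIQLDA"
)

# The 20 standard amino-acid letters; the ordering here is arbitrary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(word, alphabet=AMINO_ACIDS):
    """Map each letter of the alphabet to its number of occurrences in word."""
    counts = Counter(word)
    return {letter: counts[letter] for letter in alphabet}


pv = parikh_vector(SEQ_3WIH)
print(pv["Q"], pv["W"], pv["C"])  # 6 2 1, as in the 3WIH_1 column
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick sanity check on each column of the table.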

>7BYT_1|Chain A|Galactan 1,3-beta-galactosidase|Phanerochaete chrysosporium (2822231)
>3WIH_1|Chains A, D[auth B]|Roundabout homolog 1|Homo sapiens (9606)
>1KBV_1|Chains A, B, C, D, E, F|Major outer membrane protein PAN 1|Neisseria gonorrhoeae (485)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7BYT , Knot 178 427 0.84 40 226 405
NQIVSGAAWTDTAGNTIQAHGAGILQVGSTFYWFGEDKSHNSALFKAVSCYTSSDLVNWSRQNDALSPIAGTMISTSNVVERPKVIFNQKNSEYVMWFHSDSSNYGAAMVGVATAKTPCGPYTYKGSFKPLGADSRDESIFQDDDSAQTAYLLYASDNNQNFKISRLDANYYNVTAQVSVMNGATLEAPGIVKHNGEYFLIASHTSGWAPNPNKWFSASSLAGPWSAQQDIAPSATRTWYSQNAFDLPLGSNAIYMGDRWRPSLLGSSRYIWYPLDFSSGAPQIVHADVWSVNVQAGTYSVASGTSYEAENGQRGGSSTILSGSGFSGGKAVGYLGHGGTVTINNVQSNGGSHWVALYFANGDSTYRNVTVSVNGGPSVLVDQPDSGGGNVVISVPVKLNLNSGENSITFGSGQSNYAADLDKIIVY
3WIH , Knot 51 97 0.80 40 82 95
APPQGVTVSKNDGNGTAILVSWQPPPEDTQNGMVQEYKVWCLGNETRYHINKTVDGSTFSVVIPFLVPGIRYSVEVAASTGAGSGVKSEPQFIQLDA
1KBV , Knot 137 327 0.80 40 179 312
MAAQATAETPAGELPVIDAVTTHAPEVPPAIDRDYPAKVRVKMETVEKTMKMDDGVEYRYWTFDGDVPGRMIRVREGDTVEVEFSNNPSSTVPHNVDFHAATGQGGGAAATFTAPGRTSTFSFKALQPGLYIYHCAVAPVGMHIANGMYGLILVEPKEGLPKVDKEFYIVQGDFYTKGKKGAQGLQPFDMDKAVAEQPEYVVFNGHVGALTGDNALKAKAGETVRMYVGNGGPNLVSSFHVIGEIFDKVYVEGGKLINENVQSTIVPAGGSAIVEFKVDIPGNYTLVDHSIFRAFNKGALGQLKVEGAENPEIMTQKLSDTAYAVPR
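
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns alone. The sketch below does only that arithmetic; the function name `normalized_lz` is hypothetical, and the LZ values are copied from the table rather than recomputed.

```python
import math


def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n) for alphabet size b (20 amino acids here)."""
    return lz / (n / math.log(n, alphabet_size))


# (code, LZ-complexity, length) taken from the table above.
rows = [("7BYT", 178, 427), ("3WIH", 51, 97), ("1KBV", 137, 327)]
for code, lz, n in rows:
    print(code, round(normalized_lz(lz, n), 3))
```

This gives roughly 0.84, 0.80 and 0.81. The last value differs from the tabulated 0.80 in the second decimal, which would be consistent with the page truncating rather than rounding, though that is an inference.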

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
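
As a minimal sketch of these definitions, the set of distinct subwords of a given length and its cardinality can be computed with a sliding window. The function names `subwords` and `subword_count` are illustrative. The sketch follows the definition as stated and makes no claim to reproduce the tabulated columns.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_count(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))


print(sorted(subwords("abab", 2)))  # ['ab', 'ba']
print(subword_count("abab", 3))     # 2
```

A word of length \(n\) has only \(n-k+1\) windows of length \(k\), so the count can never exceed that number.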

\(|P_{f(7BYT_1)}(2) \setminus P_{f(3WIH_1)}(2)|=169\), \(|P_{f(3WIH_1)}(2) \setminus P_{f(7BYT_1)}(2)|=25\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
0011011110001100101011111011001011100000001110110000000110100000110111101100001100101110000000111100000001111111101001011000010101111000000110000010010110100000010100101000010101011011010111110001001111000011110100110100111110100011101000100001101111001101100101011100001101101001110110101101010110001101000010010011000110101101101110110110101001000110011110110100000010101011101110010011101110111010100100010110100001101001110
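
The page does not state which residues it treats as hydrophobic. Aligning the binary string with the start of Sequence 1 suggests that A, F, G, I, L, M, P, V and W map to 1 and every other residue to 0. The sketch below encodes that inferred assignment; the set `HYDROPHOBIC` and the function `hp_string` are assumptions reconstructed from the data, not a documented convention.

```python
# Inferred from the binary string above; not stated on the page.
HYDROPHOBIC = set("AFGILMPVW")


def hp_string(sequence):
    """Encode a protein sequence as 1 (hydrophobic) / 0 (polar) per residue."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


# First 40 residues of 7BYT_1.
prefix = "NQIVSGAAWTDTAGNTIQAHGAGILQVGSTFYWFGEDKSH"
print(hp_string(prefix))  # 0011011110001100101011111011001011100000
```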
Pair \(Z_2\) Length of longest common subword
7BYT_1,3WIH_1 194 4
7BYT_1,1KBV_1 167 4
3WIH_1,1KBV_1 153 4
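
The quantity \(Z_k\) is the size of the symmetric difference of two subword sets. A minimal sketch on toy words follows; the function names are illustrative.

```python
def subword_set(w, k):
    """Distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Size of the symmetric difference of the length-k subword sets of x and y."""
    px, py = subword_set(x, k), subword_set(y, k)
    return len(px - py) + len(py - px)


# Toy example: {ab, ba} versus {ab, bb, ba} differ only in 'bb'.
print(z_distance("abab", "abba", 2))  # 1
```

Because \(|A\setminus B| = |A| - |A\cap B|\), the reported values can be cross-checked against the \(p_w(2)\) column: for 7BYT_1 and 3WIH_1, \(226-169 = 82-25 = 57\) shared 2-letter subwords, and \(169+25=194\) matches the first row of the table.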

Newick tree

 
[
	7BYT_1:94.71,
	[
		1KBV_1:76.5,3WIH_1:76.5
	]:18.21
]

Let \(d\) denote the distance that Otu and Sayood call \(d\).
Let \(d_1\) denote the distance that Otu and Sayood call \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{524 }{\log_{20} 524}-\frac{97}{\log_{20}97})\approx 127\), where \(524=427+97\) is the combined length of the two query sequences.
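
The arithmetic behind this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are taken from the formula as given; the helper name `lz_scale` is hypothetical.

```python
import math


def lz_scale(n, alphabet_size=20):
    """Return n / log_b(n), the normalizing scale used for LZ-complexity."""
    return n / math.log(n, alphabet_size)


n_short, n_long = 97, 427  # lengths of 3WIH_1 and 7BYT_1
estimate = 0.85 * 0.8 * (lz_scale(n_short + n_long) - lz_scale(n_short))
print(round(estimate))  # 127
```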
Status Protein1 Protein2 d d1/2
Query variables 7BYT_1 3WIH_1 160 96
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
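
To make the interpolation concrete, here is a sketch assuming the usual Otu–Sayood definitions \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=(c(SQ)-c(S))+(c(QS)-c(Q))\), with \(c\) an LZ-complexity. The page does not say which LZ variant it uses; the naive LZ76 exhaustive-history count below is one common choice and is an assumption here, as are all function names.

```python
def lz76(s):
    """Number of components in the LZ76 exhaustive history of s (naive scan)."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the component while it still occurs starting at an earlier position.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def directional_terms(s, q, c=lz76):
    """Return (c(sq) - c(s), c(qs) - c(q))."""
    return c(s + q) - c(s), c(q + s) - c(q)


def delta(alpha, s, q, c=lz76):
    """alpha * min + (1 - alpha) * max of the two directional terms."""
    a, b = directional_terms(s, q, c)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


def otu_sayood(s, q, c=lz76):
    """Return (d, d1) under the assumed definitions (max and sum of the terms)."""
    a, b = directional_terms(s, q, c)
    return max(a, b), a + b


d, d1 = otu_sayood("a", "b")
print(d, d1, delta(0, "a", "b"), delta(0.5, "a", "b"))  # 1 2 1 1.0
```

With \(\alpha=0\) only the maximum survives, giving \(d\); with \(\alpha=1/2\) the result is the mean of the two terms, which is \(d_1/2\). If the tabulated values \(d=160\) and \(d_1/2=96\) are exact, the smaller directional term for 7BYT_1 and 3WIH_1 would be \(2\cdot 96-160=32\).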