CoV2D Browser™

Parikh vectors
6GPA_1 3HWX_1 1JDP_1 Letter Amino acid
13 36 13 Q Glutamine
21 31 35 E Glutamic acid
24 40 32 G Glycine
24 35 32 V Valine
20 16 13 N Asparagine
25 26 28 D Aspartic acid
7 7 6 C Cysteine
21 13 20 K Lysine
22 69 40 L Leucine
15 8 12 M Methionine
9 14 20 F Phenylalanine
12 9 18 Y Tyrosine
27 64 38 A Alanine
5 19 14 H Histidine
15 32 16 P Proline
8 30 34 S Serine
12 27 18 T Threonine
10 17 5 W Tryptophan
9 35 28 R Arginine
15 28 19 I Isoleucine
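
The Parikh vector of a sequence records how many times each letter occurs, ignoring order. A minimal sketch of how such a column could be computed is given here; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative and not taken from the CoV2D code base. The letter order follows the rows of the table above.

```python
from collections import Counter

# Letter order copied from the rows of the Parikh vector table.
AMINO_ACIDS = "QEGVNDCKLMFYAHPSTWRI"


def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in sequence."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]


# Illustration on the first ten residues of the 6GPA_1 sequence.
prefix = "VVKEEGFARG"
for letter, count in zip(AMINO_ACIDS, parikh_vector(prefix)):
    if count:
        print(letter, count)
```

Applied to a full sequence, the entries of the vector sum to the sequence length.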

>6GPA_1|Chains A, B|Arabinogalactan endo-beta-1,4-galactanase|Bacteroides thetaiotaomicron (strain ATCC 29148 / DSM 2079 / NCTC 10582 / E50 / VPI-5482) (226186)
>3HWX_1|Chains A[auth 1], B[auth A], C[auth B], D[auth I], E[auth J], F[auth R], G[auth S], H[auth Z]|2-succinyl-5-enolpyruvyl-6-hydroxy-3-cyclohexene-1-carboxylate synthase|Escherichia coli (83333)
>1JDP_1|Chains A, B|ATRIAL NATRIURETIC PEPTIDE CLEARANCE RECEPTOR|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6GPA , Knot 139 314 0.84 40 199 300
VVKEEGFARGADVSWLTQMEAEGLKFYTPDENRQEMECMDLLRDYCGVNSIRLRVWVNPKDGWNNMNDVIVKAKRAERLGLRTMIDFHFSDTWADPGHQEMPEAWKELSFDDLKIALSEHVKSVLTALKAVGVTPEWVQVGNETTPGMMLPVGSVDNPEQLTALNNAGYDAVKAICPDAKVIVHLDAGNDQWVYNRMFDILQANGGKYDMIGMSLYPYWAEQEGKTGGWLKVADDCIANIKHVKQKYNKPVMICEIGMPYDQAEACKQLITKMMQADVEGIFYWEPQAPNGYNDGYNLGCFDNNAPTIALDAFK
3HWX , Knot 227 556 0.86 40 250 531
MSVSAFNRRWAAVILEALTRHGVRHICIAPGSRSTLLTLAAAENSAFIHHTHFDERGLGHLALGLAKVSKQPVAVIVTSGTAVANLYPALIEAGLTGEKLILLTADRPPELIDCGANQAIRQPGMFASHPTHSISLPRPTQDIPARWLVSTIDHALGTLHAGGVHINCPFAEPLYGEMDDTGLSWQQRLGDWWQDDKPWLREAPRLESEKQRDWFFWRQKRGVVVAGRMSAEEGKKVALWAQTLGWPLIGDVLSQTGQPLPCADLWLGNAKATSELQQAQIVVQLGSSLTGKRLLQWQASCEPEEYWIVDDIEGRLDPAHHRGRRLIANIADWLELHPAEKRQPWCVEIPRLAEQAMQAVIARRDAFGEAQLAHRICDYLPEQGQLFVGNSLVVRLIDALSQLPAGYPVYSNRGASGIDGLLSTAAGVQRASGKPTLAIVGDLSALYDLNALALLRQVSAPLVLIVVNNNGGQIFSLLPTPQSERERFYLMPQNVHFEHAAAMFELKYHRPQNWQELETAFADAWRTPTTTVIEMVVNDTDGAQTLQQLLAQVSHL
1JDP , Knot 185 441 0.85 40 233 421
EREALPPQKIEVLVLLPQDDSYLFSLTRVRPAIEYALRSVEGNGTGRRLLPPGTRFQVAYEDSDCGNRALFSLVDRVAAARGAKPDLILGPVCEYAAAPVARLASHWDLPMLSAGALAAGFQHKDSEYSHLTRVAPAYAKMGEMMLALFRHHHWSRAALVYSDDKLERNCYFTLEGVHEVFQEEGLHTSIYSFDETKDLDLEDIVRNIQASERVVIMCASSDTIRSIMLVAHRHGMTSGDYAFFNIELFNSSSYGDGSWKRGDKHDFEAKQAYSSLQTVTLLRTVKPEFEKFSMEVKSSVEKQGLNMEDYVNMFVEGFHDAILLYVLALHEVLRAGYSKKDGGKIIQQTWNRTFEGIAGQVSIDANGDRYGDFSVIAMTDVEAGTQEVIGDYFGKEGRFEMRPNVKYPWGPLKLRIDENRIVEHTNSSPCKSCGLEESAVT
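
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below only does this arithmetic and does not compute the LZ-complexity itself; the helper name `normalized_lz` is illustrative. Truncating to two decimals (an assumption about how the page formats its output) reproduces the three values listed in the table.

```python
import math


def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))


# (LZ-complexity, length) pairs copied from the table above.
rows = {"6GPA": (139, 314), "3HWX": (227, 556), "1JDP": (185, 441)}

for code, (lz, n) in rows.items():
    ratio = normalized_lz(lz, n)
    # Truncate (not round) to two decimals; assumed display convention.
    print(code, math.floor(ratio * 100) / 100)
```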

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
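
As an illustration of these definitions, the following sketch collects the length-\(n\) subwords of a word with a sliding window and counts them. The function names `subwords` and `subword_complexity` are illustrative choices, not names used by the project.

```python
def subwords(w, n):
    """Return P_w(n): the set of distinct contiguous subwords of length n."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w, n):
    """Return p_w(n), the cardinality of P_w(n)."""
    return len(subwords(w, n))


# Small example: the word "abab" has two distinct subwords of length 2.
print(sorted(subwords("abab", 2)))
print(subword_complexity("abab", 2))
```

For a word of length \(m\) there are at most \(m-n+1\) subwords of length \(n\), so \(p_w(n)\le m-n+1\).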

\(|P_{f(6GPA_1)}(2) \setminus P_{f(3HWX_1)}(2)|=56\), \(|P_{f(3HWX_1)}(2) \setminus P_{f(6GPA_1)}(2)|=107\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11000111011010110010101101001000000100101100001100101011101001100100111010010011100110101000110110001101100101001011100010011011011110101101100001111111101001001011001100110110101011101011000110001101101011000111101010110001001111011000110100100000011110011110001010001100110101011101010110100010011010001101110110
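
The page does not state which residues are treated as hydrophobic. Comparing the first 90 residues of the 6GPA_1 sequence with the first 90 bits of the string above is consistent with the classification used in the sketch below, where A, F, G, I, L, M, P, V and W map to 1 and every other residue maps to 0. This mapping is an inference from that prefix, not a documented rule, and the names `HYDROPHOBIC` and `hydrophobic_polar` are illustrative.

```python
# Assumed hydrophobic class, inferred from the first 90 residues only.
HYDROPHOBIC = set("AFGILMPVW")


def hydrophobic_polar(sequence):
    """Encode hydrophobic residues as '1' and all other residues as '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


# First ten residues of 6GPA_1 and the first ten bits of the string above.
print(hydrophobic_polar("VVKEEGFARG"))
```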
Pair \(Z_2\) Length of longest common subsequence
6GPA_1,3HWX_1 163 4
6GPA_1,1JDP_1 178 4
3HWX_1,1JDP_1 147 4
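
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets. The first row of the table agrees with the two set differences reported earlier, since \(56+107=163\). A minimal sketch of the computation follows; the function names are illustrative and the example words are toy inputs, not protein sequences.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Return Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Toy example: P("abab", 2) = {ab, ba} and P("abba", 2) = {ab, bb, ba},
# so only "bb" is unshared.
print(z_distance("abab", "abba", 2))
```

Dividing \(Z_k(x,y)\) by \(|P_x(k)\cup P_y(k)|\) would give the Jaccard distance itself, which is why the text calls \(Z_k\) a numerator.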

Newick tree

 
[
	6GPA_1:88.92,
	[
		3HWX_1:73.5,1JDP_1:73.5
	]:15.42
]

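Let \(d\) denote the Otu–Sayood distance of the same name.
Let \(d_1\) denote the Otu–Sayood distance of the same name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{870}{\log_{20} 870}-\frac{314}{\log_{20}314}\right)\approx 150\), where \(870=314+556\) is the combined length of the two query sequences.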
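
The arithmetic behind this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are taken from the text as given; the page does not explain them, and reading 0.85 as a typical value of the normalized LZ ratio is an assumption. The helper name `expected_lz` is illustrative.

```python
import math


def expected_lz(n, ratio=0.85, alphabet_size=20):
    """Estimated LZ-complexity of a length-n word: ratio * n / log_20(n)."""
    return ratio * n / math.log(n, alphabet_size)


# 870 = 314 + 556 is the combined length of 6GPA_1 and 3HWX_1.
estimate = 0.8 * (expected_lz(870) - expected_lz(314))
print(round(estimate, 1))
```

The result is close to 150, in line with the value quoted in the text.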
Status Protein1 Protein2 d d1/2
Query variables 6GPA_1 3HWX_1 194 151

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
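
The interpolation can be sketched numerically. Under the assumption that \(d\) is the larger and \(d_1\) the sum of two one-sided quantities, the table values \(d=194\) and \(d_1/2=151\) would correspond to a smaller term of \(2\cdot 151-194=108\). That value is back-solved here for illustration and does not appear on the page; the function name `delta` is likewise illustrative.

```python
def delta(alpha, a, b):
    """Interpolate between the max (alpha = 0) and the min (alpha = 1) of a and b."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# 194 is d from the table; 108 is a back-solved, hypothetical smaller term.
smaller, larger = 108, 194
print(delta(0, smaller, larger))    # alpha = 0 gives the max term, d
print(delta(0.5, smaller, larger))  # alpha = 1/2 gives the average, d_1/2
```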