CoV2D BrowserTM

Parikh vectors
8XML_1 3DNC_1 6BPE_1 Letter Amino acid
29 1 16 F Phenylalanine
28 4 13 T Threonine
12 6 14 R Arginine
10 8 8 H Histidine
30 8 29 I Isoleucine
30 2 37 K Lysine
12 3 6 M Methionine
6 0 0 W Tryptophan
31 11 9 A Alanine
17 3 26 N Asparagine
24 2 19 D Aspartic acid
60 5 25 L Leucine
15 9 21 E Glutamic acid
12 3 5 P Proline
23 2 21 Y Tyrosine
38 15 12 V Valine
14 0 4 C Cysteine
22 1 13 Q Glutamine
14 9 12 G Glycine
34 7 17 S Serine
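
The Parikh vector of a word simply counts how often each letter occurs. A minimal sketch of how such a column could be computed follows; the function name `parikh_vector` and the constant `ALPHABET` are illustrative names, not part of the page, and the letter order is copied from the table rows above.

```python
from collections import Counter

# Letter order taken from the rows of the table above (F, T, R, ..., S).
ALPHABET = "FTRHIKMWANDLEPYVCQGS"

def parikh_vector(word, alphabet=ALPHABET):
    """Return the number of occurrences of each letter of `alphabet` in `word`."""
    counts = Counter(word)
    # Counter returns 0 for letters that never occur (e.g. W in 3DNC_1).
    return [counts[a] for a in alphabet]

if __name__ == "__main__":
    # Toy word for illustration only, not one of the three proteins.
    toy = "MSIAVGMIET"
    for letter, count in zip(ALPHABET, parikh_vector(toy)):
        if count:
            print(letter, count)
```

The entries of a Parikh vector sum to the length of the word, which gives a quick consistency check against the Length column further down.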

>8XML_1|Chain A[auth R]|Soluble cytochrome b562,C-C chemokine receptor type 8|Escherichia coli (562)
>3DNC_1|Chain A|Carbon dioxide-concentrating mechanism protein ccmK homolog 2|Synechocystis sp. (1148)
>6BPE_1|Chains A, D, G, J|Reticulocyte binding protein 2, putative|Plasmodium vivax (strain Salvador I) (126793)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8XML , Knot 191 461 0.84 40 246 438
ADLEDNWETLNDNLKVIEKADNAAQVKDALTKMRAAALDAQKATPPKLEDKSPDSPEMKDFRHGFDILVGQIDDALKLANEGKVKEAQAAAEQLKTTRNAYIQKYLMDYTLDLSVTTVTDYYYPDIFSSPCDAELIQTNGKLLLAVFYCLLFVFSLLGNSLVILVLVVCKKLRSITDVYLLNLALSDLLFVFSFPFQTYYLLDQWVFGTVMCKVVSGFYYIGFYSSMWFITLMSVDRYLAVVHAVYALKVRTIRMGTTLCLAVWLTAIMATIPLLVFYQVASEDGVLQCYSFYNQQTLKWKIFTNFKMNILGLLIPFTIFMFCYIKILHQLKRCQNHNKTKAIRLVLIVVIASLLFWVPFNVVLFLTSLHSMHILDGCSISQQLTYATHVTEIISFTHCCVNPVIYAFVGEKFKKHLSEIFQKSCSQIFNYLGRQMPRESCEKSSSCQQHSSRSSSVDYIL
3DNC , Knot 48 99 0.74 36 78 92
MSIAVGMIETRGFPAVVEAADSMVKAARVTLVGYEKIGSGRVTVIVRGDVSEVQASVSAGIEAANRVNGGEVLSTHIIARPHENLEYVLPILEHHHHHH
6BPE , Knot 133 307 0.82 38 182 296
GAMGSTNTTDNIDYFDISDESNYYLISQLRPHFSNIYFFDEFKRYASYHTEIKRYEDIHKTKVNSLLNEASRAIGICNRAKNTVKGLINILENPQKFKTQRESYDVKLRQYEEKKEAFRGCLLNKNRKNLDQIKKINNEIRDLLEKLKCSQDCQTNVYFDMIKIYLVDFKKMPYENYDTFIKQYKNSYLSGVDMIRKIEKQIDNPVTINAIKFTQKEMGYIIDRFEYHLQKVKHSIDQVTALSDGVKPKQVTKNRLKEYYFNIGNYYSIFKFGKDSLNMLNKALIHKEKIVHNLLGELFGHLEERIS
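
The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below recomputes that ratio from the tabulated values and also includes one common way to count LZ76 phrases (the exhaustive-history parse). Which LZ variant the page actually uses is not stated, so `lz76` is an assumption and is not claimed to reproduce the LZ-complexity column; both function names are illustrative.

```python
import math

def lz76(word):
    """Number of phrases in the LZ76 exhaustive-history parse of `word`.

    Each phrase is the shortest block starting at position i that does not
    already occur in the text preceding its own last character.
    """
    i, phrases, n = 0, 0, len(word)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        while i + length <= n and word[i:i + length] in word[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_q n) for alphabet size q."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    # (code, LZ-complexity, length) as listed in the table above.
    for code, lz, n in [("8XML", 191, 461), ("3DNC", 48, 99), ("6BPE", 133, 307)]:
        print(code, normalized_lz(lz, n))
```

The recomputed ratios agree with the tabulated 0.84, 0.74 and 0.82 once truncated to two decimals.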

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence of the entry with 4-symbol Protein Data Bank code \(c\).
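
As an illustration of these definitions, the set of distinct subwords of a given length and its cardinality can be computed with a sliding window. This is a minimal sketch; the names `subwords` and `subword_complexity` are hypothetical, and the example word is a toy string rather than one of the proteins.

```python
def subwords(word, k):
    """Set of distinct contiguous subwords of length k in `word`."""
    # range() is empty when k > len(word), so the result is the empty set.
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def subword_complexity(word, k):
    """Number of distinct subwords of length k, i.e. the cardinality of the set."""
    return len(subwords(word, k))

if __name__ == "__main__":
    toy = "abab"
    for k in (1, 2, 3):
        print(k, sorted(subwords(toy, k)), subword_complexity(toy, k))
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), which bounds the \(p_w(2)\) and \(p_w(3)\) columns of the table.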

\(|P_{f(8XML_1)}(2) \setminus P_{f(3DNC_1)}(2)|=198\), \(|P_{f(3DNC_1)}(2) \setminus P_{f(8XML_1)}(2)|=30\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), that is, the numerator of an LZ76-style (set of subwords) Jaccard distance for \(P(k)\). For example, \(Z_2(f(8XML_1),f(3DNC_1))=198+30=228\).

Hydrophobic-polar version of Sequence 1:
10100010010001011001001101001100101111010010110100001001010010011011110100110110010100101110010000010100011000101010010000010110010010110001011111100111110111001111111100010010010110111001111101110000110011110110011011001110001111011010001111011011010010110010111110111101111110011000111000010000010101100101011111111011110010110010000000000110111111110111111101111100100101101001000100100100110100001011101111001000100110000001100110011000000000000000000010011

Pair \(Z_2\) Length of longest common subword
8XML_1,3DNC_1 228 3
8XML_1,6BPE_1 180 4
3DNC_1,6BPE_1 190 3
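
The two columns of this table can be sketched as follows. \(Z_k\) is computed directly from its definition as the size of a symmetric difference. For the last column, the small values (3 or 4) suggest that matches are contiguous, so the sketch computes the longest common contiguous subword; that reading is an assumption, as are all function names.

```python
def subwords(word, k):
    """Set of distinct contiguous subwords of length k in `word`."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length for the previous letter of x
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    # Toy words for illustration only.
    print(z_distance("abab", "abba", 2))
    print(longest_common_subword_length("abab", "abba"))
```

The dynamic program takes time proportional to the product of the two lengths, which is unproblematic for sequences of a few hundred residues.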

Newick tree

 
(
	3DNC_1:10.45,
	(
		8XML_1:90,6BPE_1:90
	):19.45
);

Let \(d\) be the Otu–Sayood distance of the same name.
Let \(d_1\) be the Otu–Sayood distance of the same name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{560 }{\log_{20} 560}-\frac{99}{\log_{20}99})\approx 136.\)
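
The arithmetic behind this estimate can be checked with a few lines. The factors 0.85 and 0.8 and the lengths 560 and 99 are taken from the formula as written; the helper name `n_over_log_n` is illustrative.

```python
import math

def n_over_log_n(n, alphabet_size=20):
    """The scale n / log_q n against which LZ-complexity is normalized."""
    return n / math.log(n, alphabet_size)

# Factors and lengths copied from the formula above.
expected = 0.85 * 0.8 * (n_over_log_n(560) - n_over_log_n(99))

if __name__ == "__main__":
    print(round(expected))  # 136
```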
Status Protein1 Protein2 d d1/2
Query variables 8XML_1 3DNC_1 177 106

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
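
The interpolation can be sketched numerically. Here `x` and `y` stand for the two quantities whose maximum is \(d\) and whose sum is \(d_1\); the page does not spell them out, so this reading is an assumption. The value 35 below is inferred from the tabulated \(d=177\) and \(d_1/2=106\) under that assumption (\(2\cdot 106-177=35\)) and is not listed on the page.

```python
def delta(x, y, alpha):
    """Convex combination alpha * min(x, y) + (1 - alpha) * max(x, y)."""
    return alpha * min(x, y) + (1 - alpha) * max(x, y)

if __name__ == "__main__":
    x, y = 35, 177  # 35 is inferred, not tabulated (see the note above)
    print(delta(x, y, 0))    # alpha = 0 gives the maximum, i.e. d
    print(delta(x, y, 0.5))  # alpha = 1/2 gives the average, i.e. d_1 / 2
```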