CoV2D Browser™

Parikh vectors
6XAQ_1 1YXH_1 3CAW_1 Letter Amino acid
5 6 14 P Proline
19 10 23 D Aspartic acid
8 7 13 Q Glutamine
20 9 23 G Glycine
3 5 29 K Lysine
13 8 12 Y Tyrosine
2 8 13 R Arginine
10 11 11 N Asparagine
2 1 8 H Histidine
12 5 41 L Leucine
1 2 12 M Methionine
8 3 13 F Phenylalanine
20 5 14 S Serine
2 3 8 W Tryptophan
0 14 2 C Cysteine
8 2 18 E Glutamic acid
20 8 15 T Threonine
11 4 22 V Valine
16 10 27 A Alanine
9 5 12 I Isoleucine
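As a minimal sketch of how one column of the table above can be obtained, the following counts the occurrences of each amino-acid letter in a sequence. The names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the CoV2D code; the letter order simply mirrors the row order of the table.

```python
from collections import Counter

# Letter order copied from the rows of the table above (an illustrative choice).
AMINO_ACIDS = "PDQGKYRNHLMFSWCETVAI"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in `sequence`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# Toy input: the first ten residues of the 6XAQ_1 sequence.
toy = "AQDDSTPDSL"
print(dict(zip(AMINO_ACIDS, parikh_vector(toy))))
```

The entries of a Parikh vector sum to the sequence length; the 6XAQ_1 and 1YXH_1 columns above sum to 189 and 126, respectively.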

>6XAQ_1|Chain A|Antifreeze protein|Marinomonas primoryensis (178399)
>1YXH_1|Chain A|phospholipase A2|Naja sagittifera (195058)
>3CAW_1|Chains A, B|o-succinylbenzoate synthase|Bdellovibrio bacteriovorus HD100 (264462)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6XAQ , Knot 83 189 0.76 38 130 184
AQDDSTPDSLFAGLVGEYYGTNSQLNNISDFRALVDSKEADATFEAANISYGRGSSDVAKGTHLQEFLGSDASTLSTDPGDNTDGGIYLQGYVYLEAGTYNFKVTADDGYEITINGNPVATVDNNQSVYTVTHASFTISESGYQAIDMIWWDQGGDYVFQPTLSADGGSTYFVLDSAILSSTGETPYTT
1YXH , Knot 66 126 0.84 40 106 124
SNRPMPLNIYQFKNMIQCTVPSRSWWDFADYGCYCGRGGSGTPVDDLDRCCQVHDNCYNQAQEITGCRPKWKTYTYECSQGTLTCKGRNNACAATVCDCDRLAAICFAGAPYNDNNYNIDLKARCQ
3CAW , Knot 141 330 0.82 40 197 314
MIKISYSPYTLKPVQSLNAATAATAREGVLLKVEWNDGLYGFADLHPWPELGDLSLEEQLSDLRMGRMTTQIEQSIWLARRDALLRKEKKHVFDGGEKIKNNYLLSHFQDLKPGFLDGLKNEGYNTVKVKMGRDLQKEADMLTHIAASGMRMRLDFNALGSWQTFEKFMVNLPLTVRPLIEYVEDPFPFDFHAWGEARKLAKIALDNQYDKVPWGKIASAPFDVIVIKPAKTDVDKAVAQCQKWNLKLAVTSYMDHPVGVVHAVGVAMELKDKYGDMILESGCLTHRLYQMDSFAAELSTQGPYLLKNKGTGVGFDKLLEALTWYQLKVR
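The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be reproduced from the LZ-complexity and length columns. The sketch below also includes one common way to count LZ76 phrases; it is an assumption that this matches the variant used for the table, and the function names `lz76_complexity` and `normalized` are illustrative.

```python
import math

def lz76_complexity(w):
    """Count phrases in a left-to-right LZ76-style parsing of w.

    Each phrase is the shortest prefix of the remaining text that does not
    already occur in the text preceding its own last symbol (overlap allowed).
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it is still a copy of earlier material.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized(lz, n, alphabet_size=20):
    """Divide an LZ-complexity by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76_complexity("0001101001000101"))  # classic binary example, 6 phrases
print(normalized(83, 189))                  # 6XAQ row of the table
```

For the 6XAQ row, `normalized(83, 189)` is about 0.768, which the table lists as 0.76; the other two rows agree with the table in the same way if the table values are read as truncated rather than rounded.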

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
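The subword sets and their cardinalities defined above can be sketched in a few lines. The function names and the parameter name `k` (the subword length) are illustrative choices, and the input is a toy word rather than one of the protein sequences.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

toy = "AABAB"
print(sorted(subwords(toy, 2)))    # ['AA', 'AB', 'BA']
print(subword_complexity(toy, 2))  # 3
```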

\(|P_{f(6XAQ_1)}(2) \setminus P_{f(1YXH_1)}(2)|=82\), \(|P_{f(1YXH_1)}(2) \setminus P_{f(6XAQ_1)}(2)|=58\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (6XAQ_1):
100000100111111100010000100100101110000101010110100101000110100100111001001000110000111010101010110001010100100101010111010000010010010101000100110111100110011010101011000111001110001001000
Pair \(Z_2\) Length of longest common subword
6XAQ_1,1YXH_1 140 3
6XAQ_1,3CAW_1 165 4
1YXH_1,3CAW_1 201 3
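The \(Z_k\) column is the size of the symmetric difference of two subword sets. A minimal sketch on toy words follows; the helper names are illustrative and not taken from the CoV2D code.

```python
def subword_set(w, k):
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subword_set(x, k), subword_set(y, k)
    return len(px - py) + len(py - px)

# Toy example: P_x(2) = {AA, AB, BA}, P_y(2) = {AB, BB}.
print(z_distance("AABAB", "ABBB", 2))  # 2 + 1 = 3
```

For the pair 6XAQ_1, 1YXH_1 the two one-sided counts reported above are 82 and 58, which add up to the table entry 140.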

Newick tree

 
(3CAW_1:98.17,(6XAQ_1:70,1YXH_1:70):28.17);

Let \(d\) be the Otu–Sayood distance \(d\), and let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{315 }{\log_{20} 315}-\frac{126}{\log_{20}126})\approx 58.4\), where \(315=189+126\) is the combined length of the two query sequences.
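The arithmetic in this estimate can be checked directly. In the sketch below the two factors 0.85 and 0.8 are copied verbatim from the formula (their derivation is not given here), and the function names are illustrative.

```python
import math

def typical_lz(n, alphabet_size=20):
    """The reference quantity n / log_{alphabet_size}(n)."""
    return n / math.log(n, alphabet_size)

def rough_expected_distance(n_concat, n_single, factors=(0.85, 0.8)):
    """Scaled difference of the reference quantities for two lengths."""
    scale = math.prod(factors)
    return scale * (typical_lz(n_concat) - typical_lz(n_single))

# 315 = 189 + 126 (both query sequences together), 126 = length of 1YXH_1.
print(rough_expected_distance(315, 126))
```

The result is about 58.47, in line with the 58.4 quoted above.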
Status Protein1 Protein2 d d1/2
Query variables 6XAQ_1 1YXH_1 72 61

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
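The interpolation can be illustrated numerically. In this sketch the arguments `a` and `b` are hypothetical names for the two quantities whose minimum and maximum enter the formula; the value 50 is not reported on this page but is implied by the query row, since \(d=72\) is the maximum and \(d_1/2=61\) is the average, giving a minimum of \(2\cdot 61-72=50\).

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    lo, hi = min(a, b), max(a, b)
    return alpha * lo + (1 - alpha) * hi

# Values implied by the query row for 6XAQ_1 and 1YXH_1.
print(delta(50, 72, 0))    # 72, the value of d
print(delta(50, 72, 0.5))  # 61.0, the value of d1/2
```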