CoV2D BrowserTM

Parikh vectors
3TEX_1 3JUO_1 6OFA_1 Letter Amino acid
31 10 5 Q Glutamine
32 17 0 G Glycine
8 13 0 H Histidine
57 4 2 K Lysine
7 6 0 W Tryptophan
37 11 2 V Valine
67 7 2 N Asparagine
46 7 1 D Aspartic acid
53 8 0 I Isoleucine
8 4 1 M Methionine
21 10 1 F Phenylalanine
56 7 0 T Threonine
28 17 1 R Arginine
29 9 3 P Proline
66 13 3 S Serine
28 3 3 Y Tyrosine
0 3 4 C Cysteine
48 17 2 E Glutamic acid
57 12 1 L Leucine
36 7 2 A Alanine
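
A minimal sketch of how a column of the table above can be computed, assuming the Parikh vector of a sequence is simply its per-letter occurrence count. The names `ALPHABET` and `parikh_vector` are illustrative and do not come from the page. The letter order follows the "Letter" column.

```python
from collections import Counter

# Letter order copied from the "Letter" column of the Parikh vector table.
ALPHABET = "QGHKWVNDIMFTRPSYCELA"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the occurrence count of each alphabet letter, in table order."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

if __name__ == "__main__":
    # 6OFA_1 sequence as printed on this page.
    w = "ASPQQAKYCYEQCNVNKVPFDQCYQMCSPLERS"
    for letter, count in zip(ALPHABET, parikh_vector(w)):
        print(letter, count)
```

The counts sum to the sequence length, which gives a quick consistency check against the Length column.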

>3TEX_1|Chain A|Protective antigen|Bacillus anthracis (1392)
>3JUO_1|Chains A, B|Phenazine biosynthesis protein A/B|Burkholderia sp. (482957)
>6OFA_1|Chain A|Wasabi Receptor Toxin|Urodacus manicatus (1330407)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3TEX , Knot 267 715 0.81 38 262 643
EVKQENRLLNESESSSQGLLGYYFSDLNFQAPMVVTSSTTGDLSIPSSELENIPSENQYFQSAIWSGFIKVKKSDEYTFATSADNHVTMWVDDQEVINKASNSNKIRLEKGRLYQIKIQYQRENPTEKGLDFKLYWTDSQNKKEVISSDNLQLPELKQKSSNSRKKRSTSAGPTVPDRDNDGIPDSLEVEGYTVDVKNKRTFLSPWISNIHEKKGLTKYKSSPEKWSTASDPYSDFEKVTGRIDKNVSPEARHPLVAAYPIVHVDMENIILSKNEDQSTQNTDSQTRTISKNTSTSRTHTSEPGSNSNSSTVAIDHSLSLAGERTWAETMGLNTADTARLNANIRYVNTGTAPIYNVLPTTSLVLGKNQTLATIKAKENQLSQILAPNNYYPSKNLAPIALNAQDDFSSTPITMNYNQFLELEKTKQLRLDTDQVYGNIATYNFENGRVRVDTGSNWSEVLPQIQETTARIIFNGKDLNLVERRIAAVNPSDPLETTKPDMTLKEALKIAFGFNEPNGNLQYQGKDITEFDFNFDQQTSQNIKNQLAELNATNIYTVLDKIKLNAKMNILIRDKRFHYDRNNIAVGADESVVKEAHREVINSSTEGLLLNIDKDIRKILSGYIVEIEDTEGLKEVINDRYDMLNISSLRQDGKTFIDFKKYNDKLPLYISNPNYKVNVYAVTKENTIINPSENGDTSTNGIKKILIFSKKGYEIG
3JUO , Knot 89 185 0.83 40 139 178
MGSSHHHHHHSSGLVPRGSHMSDVESLENTSENRAQVAARQHNRKIVEQYMHTRGEARLKRHLLFTEDGVGGLWTTDSGQPIAIRGREKLGEHAVWSLQCFPDWVWTDIQIFETQDPNWFWVECRGEGAIVFPGYPRGQYRNHFLHSFRFENGLIKEQREFMNPCEQFRSLGIEVPEVRRDGLPS
6OFA , Knot 22 33 0.77 30 29 31
ASPQQAKYCYEQCNVNKVPFDQCYQMCSPLERS
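
The LZ-complexity and normalized columns can be sketched as follows. This assumes the page uses the Lempel–Ziv (1976) exhaustive-history phrase count, where each new phrase is the shortest prefix of the remaining text that does not already occur in what precedes its last symbol. The function names are illustrative. On the 6OFA sequence this parsing gives 22, in agreement with the table; the other rows were not checked.

```python
import math

def lz76_complexity(w):
    """Count phrases in an LZ76-style parsing of w.

    Each phrase is extended while it still occurs as a substring of the
    text preceding its final symbol (overlapping copies are allowed).
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(w, alphabet_size=20):
    """Return LZ(w) / (n / log_A n) for n = |w| > 1 and alphabet size A."""
    n = len(w)
    return lz76_complexity(w) / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    w = "ASPQQAKYCYEQCNVNKVPFDQCYQMCSPLERS"  # 6OFA_1
    print(lz76_complexity(w), normalized_lz(w))
```

The substring test makes this quadratic or worse in the sequence length, which is acceptable for sequences of a few hundred residues.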

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
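
As an illustration of these definitions, the following sketch collects the distinct length-\(k\) subwords of a word and counts them. The names `subwords` and `subword_complexity` are illustrative, and subwords are read as contiguous intervals.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

if __name__ == "__main__":
    w = "ASPQQAKYCYEQCNVNKVPFDQCYQMCSPLERS"  # 6OFA_1
    print(subword_complexity(w, 2), subword_complexity(w, 3))
```

For the 6OFA sequence this yields 29 and 31 for lengths 2 and 3, matching the corresponding table entries.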

\(|P_{f(3TEX_1)}(2) \setminus P_{f(3JUO_1)}(2)|=163\), \(|P_{f(3JUO_1)}(2) \setminus P_{f(3TEX_1)}(2)|=40\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
0100000110000000011110010010101111100000101011000100110000010011101110100000001100100010111000011001000001010010100101000000100011010101000000001100001011010000000000000011101100000111001010100101000001101110010000110000001001001001000100101010001010100111110111010100111000000000000000001000000000000011000000011100010111000110011100100101010100100101110011100011110000110101000010011110000100011111101000100011010000110100000101000010101100010010101001001001110100001011101001011000111101001100001010100110111110010101000100100101010000000100011010100100110010101010111000010000001111100011001000110000011110100010011010110100001100110000011010010001001101000000111010010001010110000011010001000001100111100010011
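
The hydrophobic-polar string above appears to map each residue to 1 (hydrophobic) or 0 (polar). The page does not state which residues count as hydrophobic, so the set used below is an assumption inferred from the start of the 3TEX_1 sequence and its binary encoding. Residues that do not occur in that stretch are treated as polar by assumption, and cysteine cannot be checked at all because 3TEX_1 contains none.

```python
# Assumed hydrophobic residues, inferred from the first residues of 3TEX_1
# and the printed binary string; not stated on the page.
HYDROPHOBIC = frozenset("AVLIMFWGP")

def hydrophobic_polar(sequence, hydrophobic=HYDROPHOBIC):
    """Encode a protein sequence as a 0/1 string (1 = hydrophobic)."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)

if __name__ == "__main__":
    prefix = "EVKQENRLLNESESSSQGLLGYYFSDLNFQAPMVV"  # start of 3TEX_1
    print(hydrophobic_polar(prefix))
```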
Pair \(Z_2\) Length of longest common subword (contiguous)
3TEX_1,3JUO_1 203 4
3TEX_1,6OFA_1 251 3
3JUO_1,6OFA_1 158 2
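
The \(Z_2\) column can be sketched as the size of the symmetric difference between the two sets of length-2 subwords. The function names are illustrative. The `jaccard_distance` helper shows the ratio for which \(Z_k\) is the numerator, assuming the usual definition with the size of the union as denominator.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Return Z_k(x, y) divided by |P_x(k) union P_y(k)| (0.0 if both are empty)."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0

if __name__ == "__main__":
    print(z_distance("abab", "babc", 2), jaccard_distance("abab", "babc", 2))
```

For the first pair in the table, the two one-sided counts 163 and 40 given earlier add up to the listed value 203.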

Newick tree

(3TEX_1:12.64,(3JUO_1:79,6OFA_1:79):44.64);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{900}{\log_{20} 900}-\frac{185}{\log_{20}185}\right)\approx 197.\)
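
The arithmetic in the estimate above can be checked with a short sketch. The constants 0.85, 0.8, 900 and 185 are copied verbatim from the text; the page does not explain where they come from, and the helper name `n_over_log20` is illustrative.

```python
import math

def n_over_log20(n):
    """Return n / log_20(n), the normalizing term used for LZ-complexity."""
    return n / math.log(n, 20)

# Constants taken as-is from the estimate in the text.
expected = 0.85 * 0.8 * (n_over_log20(900) - n_over_log20(185))

if __name__ == "__main__":
    print(round(expected, 1))
```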
Status Protein1 Protein2 d d1/2
Query variables 3TEX_1 3JUO_1 246 155.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
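
A minimal sketch of how \(d\), \(d_1\) and \(\delta\) relate. It assumes the unnormalized Otu–Sayood quantities, with \(d\) the larger and \(d_1\) the sum of the two directed complexity gains \(c(xy)-c(x)\) and \(c(yx)-c(y)\), and an LZ76-style phrase count for \(c\). The exact variants used on the page are not stated, so all function names and these choices are assumptions.

```python
def lz76_complexity(w):
    """Count phrases in an LZ76-style parsing of w (assumed complexity c)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def directed_gains(x, y, c=lz76_complexity):
    """Return (c(xy) - c(x), c(yx) - c(y))."""
    return c(x + y) - c(x), c(y + x) - c(y)

def otu_sayood_d(x, y):
    """Unnormalized d: the larger of the two directed gains."""
    return max(directed_gains(x, y))

def otu_sayood_d1(x, y):
    """Unnormalized d1: the sum of the two directed gains."""
    return sum(directed_gains(x, y))

def delta(x, y, alpha):
    """Return alpha * min + (1 - alpha) * max of the directed gains."""
    a, b = directed_gains(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    x = "ASPQQAKYCYEQCNVNKVPFDQCYQMCSPLERS"  # 6OFA_1
    y = "MGSSHHHHHHSSGLVPRGSH"               # first 20 residues of 3JUO_1
    print(otu_sayood_d(x, y), otu_sayood_d1(x, y) / 2, delta(x, y, 0.5))
```

With these definitions, \(\alpha=0\) recovers \(d\) and \(\alpha=1/2\) recovers \(d_1/2\), as in the case distinction above.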