CoV2D Browser™

Parikh vectors
5TWH_1 2IQF_1 2YZL_1 Letter Amino acid
7 14 4 M Methionine
8 27 12 R Arginine
10 17 3 Q Glutamine
16 46 28 K Lysine
3 2 1 C Cysteine
16 30 26 E Glutamic acid
9 32 15 G Glycine
16 30 8 F Phenylalanine
11 30 8 P Proline
12 31 15 A Alanine
10 22 9 N Asparagine
10 39 21 D Aspartic acid
11 28 5 T Threonine
10 32 16 V Valine
13 19 26 I Isoleucine
15 25 6 S Serine
8 25 10 Y Tyrosine
8 21 2 H Histidine
20 24 26 L Leucine
3 11 1 W Tryptophan
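A Parikh vector just counts how often each letter occurs in a sequence. Here is a minimal sketch that lists the counts in the same row order as the table above. The names `AMINO_ACIDS` and `parikh_vector` are hypothetical and are not part of CoV2D.

```python
from collections import Counter

# One-letter amino-acid codes in the row order of the table above.
AMINO_ACIDS = "MRQKCEGFPANDTVISYHLW"


def parikh_vector(seq: str) -> list[int]:
    """Return the number of occurrences of each amino acid in seq, in AMINO_ACIDS order."""
    counts = Counter(seq)
    # Counter returns 0 for letters that never occur.
    return [counts[a] for a in AMINO_ACIDS]


if __name__ == "__main__":
    print(dict(zip(AMINO_ACIDS, parikh_vector("MKKW"))))
```

For a sequence over these 20 letters the entries sum to its length. Each column of the table does this: 216 for 5TWH_1, 505 for 2IQF_1 and 242 for 2YZL_1, matching the lengths reported below.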

>5TWH_1|Chain A|MOB kinase activator 1A|Homo sapiens (9606)
>2IQF_1|Chains A, B|Catalase|Helicobacter pylori (210)
>2YZL_1|Chain A|Phosphoribosylaminoimidazole-succinocarboxamide synthase|Methanocaldococcus jannaschii (2190)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5TWH , Knot 102 216 0.84 40 156 211
MSFLFSSRSSKTFKPKKNIPEGSHQYELLKHAEATLGSGNLRQAVMLPEGEDLNEWIAVNTVDFFNQINMLYGTITEFCTEASCPVMSAGPRYEYHWADGTNIKKPIKCSAPKYIDYLMTWVQDQLDDETLFPSKIGVPFPKNFMSVAKTILKRLFRVYAHIYHQHFDSVMQLQEEAHLNTSFKHFIFFVQEFNLIDRRELAPLQELIEKLGSKDR
2IQF , Knot 208 505 0.85 40 260 476
MVNKDVKQTTAFGAPVWDDNNVITAGPRGPVLLQSTWFLEKLAAFDRERIPERVVHAKGSGAYGTFTVTKDITKYTKAKIFSKVGKKTECFFRFSTVAGERGSADAVRDPRGFAMKYYTEEGNWDLVGNNTPVFFIRDAIKFPDFIHTQKRDPQTNLPNHDMVWDFWSNVPESLYQVTWVMSDRGIPKSFRHMDGFGSHTFSLINAKGERFWVKFHFHTMQGVKHLTNEEAAEIRKHDPDSNQRDLFDAIARGDYPKWKLSIQVMPEEDAKKYRFHPFDVTKIWYTQDYPLMEVGIVELNKNPENYFAEVEQAAFTPANVVPGIGYSPDRMLQGRLFSYGDTHRYRLGVNYPQIPVNKPRCPFHSSSRDGYMQNGYYGSLQNYTPSSLPGYKEDKSARDPKFNLAHIEKEFEVWNWDYRADDSDYYTQPGDYYRSLPADEKERLHDTIGESLAHVTHKEIVDKQLEHFKKADPKYAEGVKKALEKHQKMMKDMHGKDMHHTKKKK
2YZL , Knot 108 242 0.81 40 148 235
MEIKLEEILKKQPLYSGKAKSIYEIDDDKVLIEFRDDITAGNGAKHDVKQGKGYLNALISSKLFEALEENGVKTHYIKYIEPRYMIAKKVEIIPIEVIVRNIAAGSLCRRYPFEEGKELPFPIVQFDYKNDEYGDPMLNEDIAVALGLATREELNKIKEIALKVNEVLKKLFDEKGIILVDFKIEIGKDREGNLLVADEISPDTMRLWDKETRDVLDKDVFRKDLGDVIAKYRIVAERLGLL
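This sketch shows how an LZ-complexity value and the normalized ratio in the table can be computed. It counts phrases with the Kaspar–Schuster algorithm for Lempel–Ziv (1976) complexity. Whether CoV2D uses exactly this LZ variant is an assumption, and it has not been checked against the LZ column. The ratio divides by \(n/\log_{20} n\), as in the table header. Applied to the reported values, 102/(216/log₂₀ 216) ≈ 0.847, 208/(505/log₂₀ 505) ≈ 0.856 and 108/(242/log₂₀ 242) ≈ 0.818. These are displayed as 0.84, 0.85 and 0.81, so the page appears to truncate rather than round. The function names are hypothetical.

```python
import math


def lz76_complexity(s: str) -> int:
    """Number of phrases in the LZ76 parsing of s (Kaspar-Schuster algorithm)."""
    n = len(s)
    if n <= 1:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            # The current phrase still copies the substring starting at i: extend it.
            k += 1
            if l + k > n:
                # Hit the end of s while copying: count the final phrase.
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                # Every start i < l was tried: the new phrase is the longest
                # match found plus one fresh symbol.
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(lz: int, n: int) -> float:
    """LZ(w) divided by n / log_20(n)."""
    return lz / (n / math.log(n, 20))


if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))  # 6 phrases: 0|001|10|100|1000|101
    print(math.floor(normalized_lz(102, 216) * 100) / 100)  # 0.84
```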

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
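These definitions translate directly into set comprehensions. Here is a minimal sketch; `subwords` and `subword_count` are hypothetical names standing for \(P_w(n)\) and \(p_w(n)\).

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_count(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


if __name__ == "__main__":
    print(sorted(subwords("ABAB", 2)))  # ['AB', 'BA']
    print(subword_count("ABAB", 3))  # 2: 'ABA' and 'BAB'
```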

\(|P_{f(5TWH_1)}(2) \setminus P_{f(2IQF_1)}(2)|=41\) and \(|P_{f(2IQF_1)}(2) \setminus P_{f(5TWH_1)}(2)|=145\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). This is an LZ76-style (set-of-subwords) numerator of the Jaccard distance between \(P_x(k)\) and \(P_y(k)\). For example, \(Z_2(f(5TWH_1),f(2IQF_1))=41+145=186\).

Hydrophobic-polar version of Sequence 1 (5TWH_1), with 1 for a hydrophobic and 0 for a polar residue:
101110000000101000110100000110010101101010011111010010011110010110010110101001000100111011100000110100100110001100100110110001000011100111111001101100110011010101000010011010001010001001111100101100001111001100110000
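The binary string is a two-letter reduction of the amino-acid alphabet. The hydrophobic set used below, {A, F, G, I, L, M, P, V, W}, is an assumption. It is inferred from the string shown (for example M→1, G→1, S→0, Y→0, C→0) and is not stated on this page. `hp_string` is a hypothetical name.

```python
# Assumed hydrophobic residues, inferred from the displayed string (mapped to "1").
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp_string(seq: str) -> str:
    """Map each residue to '1' if it is in HYDROPHOBIC, else '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)


if __name__ == "__main__":
    # First 20 residues of 5TWH_1.
    print(hp_string("MSFLFSSRSSKTFKPKKNIP"))  # 10111000000010100011
```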
Pair \(Z_2\) Length of longest common substring
5TWH_1,2IQF_1 186 4
5TWH_1,2YZL_1 148 3
2IQF_1,2YZL_1 172 4
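Both columns can be computed from the two sequences. This sketch assumes the last column is the length of the longest common contiguous subword (substring), found here with the standard dynamic-programming recurrence. The function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) - P_y(k)| + |P_y(k) - P_x(k)|, the symmetric-difference size."""
    return len(subwords(x, k) ^ subwords(y, k))


def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common suffix ending at x[i-1] and y[j-1].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z("ABCA", "BCD", 2))  # 3: {AB, CA} versus {CD}
    print(longest_common_substring("ABCDE", "XBCDY"))  # 3: "BCD"
```

For 5TWH_1 and 2IQF_1 the two set differences have 41 and 145 elements, which gives \(Z_2=186\) as in the table.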

Newick tree

 
(
	2IQF_1:94.18,
	(
		5TWH_1:74,2YZL_1:74
	):20.18
);

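Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)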
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{721}{\log_{20} 721}-\frac{216}{\log_{20}216}\right)\approx 141.\)
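The arithmetic of this estimate can be checked directly. Here 721 = 216 + 505 is presumably the length of the concatenated pair, and the factors 0.85 and 0.8 are taken as given from the formula. `n_over_log20` is a hypothetical helper name.

```python
import math


def n_over_log20(n: int) -> float:
    """n / log_20(n), the normalizing term used for LZ complexity."""
    return n / math.log(n, 20)


# 721 = 216 + 505: the combined length of 5TWH_1 and 2IQF_1.
estimate = 0.85 * 0.8 * (n_over_log20(721) - n_over_log20(216))
print(round(estimate))  # 141
```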
Protein 1 Protein 2 \(d\) \(d_1/2\)
5TWH_1 2IQF_1 180 128

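In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], let \(\min\) and \(\max\) denote the smaller and the larger of \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). Then
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]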
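The interpolation can be sketched in a few lines. This sketch assumes the Otu–Sayood definitions \(d=\max\) and \(d_1=\) the sum of the two increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). It uses the Kaspar–Schuster LZ76 phrase count as a stand-in for \(\mathrm{LZ}\). The site's exact LZ variant is not stated here, so results on real sequences may not reproduce the table. All function names are hypothetical.

```python
from fractions import Fraction


def lz76_complexity(s: str) -> int:
    """Number of phrases in the LZ76 parsing of s (Kaspar-Schuster algorithm)."""
    n = len(s)
    if n <= 1:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1  # keep copying from position i
            if l + k > n:
                c += 1  # final phrase reaches the end of s
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                c += 1  # close the phrase: longest match plus one new symbol
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def lz_increments(x: str, y: str) -> tuple[int, int]:
    """The two terms LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    return (lz76_complexity(x + y) - lz76_complexity(x),
            lz76_complexity(y + x) - lz76_complexity(y))


def otu_sayood_d(x: str, y: str) -> int:
    """d: the larger of the two increments."""
    return max(lz_increments(x, y))


def otu_sayood_d1(x: str, y: str) -> int:
    """d1: the sum of the two increments."""
    return sum(lz_increments(x, y))


def delta(x: str, y: str, alpha: Fraction) -> Fraction:
    """alpha * min + (1 - alpha) * max of the two increments."""
    smaller, larger = sorted(lz_increments(x, y))
    return alpha * smaller + (1 - alpha) * larger


if __name__ == "__main__":
    print(otu_sayood_d("ab", "ab"), otu_sayood_d1("ab", "ab"))  # 1 2
```

Under these definitions, the values \(d=180\) and \(d_1/2=128\) for 5TWH_1 and 2IQF_1 mean that the two increments are 180 and 2·128 − 180 = 76.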