CoV2D Browser™

Parikh vectors
4AMQ_1 4RBU_1 5SEB_1 Letter Amino acid
15 4 17 Q Glutamine
13 10 16 G Glycine
3 4 12 M Methionine
17 0 15 F Phenylalanine
20 5 22 T Threonine
17 3 14 R Arginine
44 5 36 L Leucine
32 1 12 Y Tyrosine
30 2 14 N Asparagine
25 4 15 D Aspartic acid
3 0 12 C Cysteine
52 6 21 I Isoleucine
40 6 18 K Lysine
10 3 14 P Proline
23 2 24 S Serine
19 14 16 V Valine
10 17 21 A Alanine
23 6 24 E Glutamic acid
8 8 13 H Histidine
3 0 7 W Tryptophan
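
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch follows; the helper name `parikh_vector` and the constant `LETTERS` are illustrative and not part of the CoV2D site. The letter order is copied from the table above, and the sample sequence is the 4RBU_1 entry listed on this page.

```python
from collections import Counter

# Letter order as it appears in the Parikh vector table on this page.
LETTERS = "QGMFTRLYNDCIKPSVAEHW"

def parikh_vector(sequence, letters=LETTERS):
    """Return the number of occurrences of each letter, in table order."""
    counts = Counter(sequence)
    return [counts[a] for a in letters]

# 4RBU_1 sequence, copied from the sequence table on this page.
SEQ_4RBU = (
    "MHHHHHHQQEALGMVETKGLTAAIEAADAMVASANVMLVGYEKIGQGLVT"
    "VIVRGDVGAVKAATDAGAAAARNVGEVKAVHVIPRPHTDVEKILPKGISQ"
)

vector_4rbu = parikh_vector(SEQ_4RBU)
for letter, count in zip(LETTERS, vector_4rbu):
    print(letter, count)
```

The entries of a Parikh vector sum to the sequence length, which gives a quick consistency check against the Length column.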

>4AMQ_1|Chain A|L544|ACANTHAMOEBA POLYPHAGA MIMIVIRUS (212035)
>4RBU_1|Chains A, B, C, D, E, F, G, H, I|Propanediol utilization protein PduA|Salmonella enterica serovar Typhimurium (99287)
>5SEB_1|Chains A, B, C, D|cAMP and cAMP-inhibited cGMP 3',5'-cyclic phosphodiesterase 10A|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4AMQ 162 407 0.79 40 201 376
SYYHHHHHHLESTSLYKKAGLRMLIFTYKLERYIKNKILPKILVVPDRDKYQIKGSFRRRIPYITDIDIVNNVHPEYDDTNIYQRIVDLINSFTNDNQIKLIYVICGTDDRFLLTEYSDEEIEKIKILLNPTELVELNNVLSKYQDDLNKKVFYINEIIWDLYKLRWTSSEVLAGKKILRGGIEVSFQDVVKNNSILLLQYFVKIEYYPIGFDIAVRYKPINLITAYQNAAFYQLKLANYSKEYYFMLFPLRFYFKNDPTISKQLEYIIETKFGLYKQLLVRIDSYRTIYESGNLDLDTAKSIIISIIKDIRKLNGIDMNIIDKIQEVSNNSAGQDKIIAWNTLLTQLYTNINKSVNKQSKKYFTRYINIIPKEDRKLCCLEEEHVLQSGGINFESTNFLTKKKLIY
4RBU 47 100 0.72 34 71 92
MHHHHHHQQEALGMVETKGLTAAIEAADAMVASANVMLVGYEKIGQGLVTVIVRGDVGAVKAATDAGAAAARNVGEVKAVHVIPRPHTDVEKILPKGISQ
5SEB 152 343 0.86 40 221 334
GSSICTSEEWQGLMQFTLPVRLCKEIELFHFDIGPFENMWPGIFVYMVHRSCGTSCFELEKLCRFIMSVKKNYRRVPYHNWKHAVTVAHCMYAILQNNHTLFTDLERKGLLIACLCHDLDHRGFSNSYLQKFDHPLAALYSTSTMEQHHFSQTVSILQLEGHNIFSTLSSSEYEQVLEIIRKAIIATDLALYFGNRKQLEEMYQTGSLNLNNQSHRDRVIGLMMTACDLCSVTKLWPVTKLTANDIYAEFWAEGDEMKKLGIQPIPMMDRDKKDEVPQGQLGFYNAVAIPCYTTLTQILPPTEPLLKACRDNLSQWEKVIRGEETATWISSPSVAQKAAASED
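
The LZ-complexity column and the normalized ratio can be sketched as follows. This is only an illustration: it assumes the exhaustive-history parsing of Lempel and Ziv (1976), and the page does not state which variant it uses, so `lz76` may not reproduce the tabulated complexities exactly. The function names `lz76` and `normalized_lz` are hypothetical.

```python
import math

def lz76(w):
    """Count the components of the LZ76 exhaustive-history parsing of w.

    Assumption: each component is the shortest block starting at position i
    that does not already occur starting at an earlier position.
    """
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while the block w[i:i+length] has an occurrence that
        # starts before position i (overlap with the block is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n) for alphabet size b."""
    return lz / (n / math.log(n, alphabet_size))

# Classic example from Lempel and Ziv: 0|001|10|100|1000|101
print(lz76("0001101001000101"))  # 6

# Ratio for 4AMQ from the tabulated LZ(w) = 162 and n = 407.
ratio_4amq = normalized_lz(162, 407)
print(ratio_4amq)  # about 0.798; the table shows 0.79
```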

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
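
A minimal sketch of these definitions, assuming that \(P_w(n)\) collects the contiguous subwords of length \(n\). The names `subwords` and `subword_complexity` are illustrative only.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return the number of distinct subwords of length k in w."""
    return len(subwords(w, k))

toy_word = "abab"
for k in (1, 2, 3):
    print(k, sorted(subwords(toy_word, k)), subword_complexity(toy_word, k))
```

For a word of length \(m\) there are at most \(m-k+1\) subwords of length \(k\), so the complexity can never exceed that bound.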

\(|P_{f(4AMQ_1)}(2) \setminus P_{f(4RBU_1)}(2)|=161\), \(|P_{f(4RBU_1)}(2) \setminus P_{f(4AMQ_1)}(2)|=31\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (4AMQ_1):
00000000010000100011101111000100010001110111110000001010100011010010110010100000010001101100100000101101101000011100000001001011101001101001100000010001101001110100101000011110011011101010011000011110011010001111011100011011010001110010110000000111111010100010100010011000111000111010000010001010100100111011001001011010110010010000110001111001100100010001000000010001011100000100100001100111010000110000110
Pair \(Z_2\) Length of longest common subword (contiguous)
4AMQ_1,4RBU_1 192 6
4AMQ_1,5SEB_1 180 4
4RBU_1,5SEB_1 208 3
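
Both columns of the table above can be sketched in a few lines. This is an illustration under two assumptions: \(Z_k\) is the size of the symmetric difference of the two subword sets, and the last column counts contiguous common subwords (a non-contiguous common subsequence of 4AMQ_1 and 4RBU_1 would be longer than 6, since both contain at least ten copies of A). The function names are hypothetical, and the inputs are only the first 12 residues of 4AMQ_1 and 4RBU_1.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length ending at y[j-1]
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# First 12 residues of 4AMQ_1 and 4RBU_1; both start with a His-tag.
head_4amq = "SYYHHHHHHLES"
head_4rbu = "MHHHHHHQQEAL"
print(z_distance(head_4amq, head_4rbu, 2))
print(longest_common_subword(head_4amq, head_4rbu))
```

On these prefixes the shared block is the six-residue run HHHHHH, which agrees with the value 6 reported for the pair 4AMQ_1, 4RBU_1.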

Newick tree

 
[
	4RBU_1:10.22,
	[
		4AMQ_1:90,5SEB_1:90
	]:13.22
]

Let \(d\) denote the distance that Otu and Sayood call \(d\).
Let \(d_1\) denote the distance that Otu and Sayood call \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{507 }{\log_{20} 507}-\frac{100}{\log_{20}100})\approx 121.\)
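
The arithmetic behind this estimate can be checked directly. The sketch below only evaluates the formula as written; the factors 0.85 and 0.8 are taken verbatim from the text, 507 is the combined length 407 + 100, and the function names are illustrative.

```python
import math

def lz_scale(n, alphabet_size=20):
    """Return n / log_b(n), the normalizing term used for LZ complexity."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_total, n_short, factor=0.85 * 0.8):
    """Evaluate factor * (n_total / log n_total - n_short / log n_short)."""
    return factor * (lz_scale(n_total) - lz_scale(n_short))

estimate = expected_distance(407 + 100, 100)
print(estimate)  # about 121.6
```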
Status Protein1 Protein2 d d1/2
Query variables 4AMQ_1 4RBU_1 152 93

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
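
A small numeric check of this interpolation, using the row for 4AMQ_1 and 4RBU_1. It assumes that min and max range over the two one-sided quantities being compared, so that \(d\) is their maximum and \(d_1\) is their sum; the smaller quantity is then recovered from the tabulated values as \(2\cdot 93-152=34\). The function name `delta` is illustrative.

```python
def delta(alpha, a, b):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Values from the "Query variables" row: d = 152 and d1/2 = 93.
larger = 152
smaller = 2 * 93 - larger  # 34, assuming d1 = min + max

print(delta(0, smaller, larger))    # equals d
print(delta(0.5, smaller, larger))  # equals d1 / 2
```

Setting \(\alpha=1\) would give the minimum alone, which is the other end of the same family.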