CoV2D Browser™

Parikh vectors
9ESU_1 5OGF_1 6CZL_1 Letter Amino acid
16 12 19 I Isoleucine
15 6 26 S Serine
17 28 17 T Threonine
8 14 12 N Asparagine
3 1 4 C Cysteine
9 7 18 Q Glutamine
23 33 29 V Valine
10 15 6 H Histidine
39 24 43 L Leucine
20 22 15 P Proline
21 17 19 K Lysine
17 21 24 E Glutamic acid
18 33 29 G Glycine
5 9 8 M Methionine
16 13 9 F Phenylalanine
4 4 3 W Tryptophan
19 29 27 A Alanine
16 12 19 R Arginine
16 20 17 D Aspartic acid
10 13 8 Y Tyrosine
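
A Parikh vector simply counts how often each letter occurs in a word, ignoring order. The following is a minimal sketch of how a column of the table above could be computed; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative and not taken from the CoV2D code.

```python
from collections import Counter

# One-letter amino acid codes in the row order of the table above.
AMINO_ACIDS = "ISTNCQVHLPKEGMFWARDY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the letter counts of `sequence`, one entry per letter of `alphabet`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# First ten residues of the 9ESU_1 sequence.
print(parikh_vector("GPGSMENFQK"))
```

Letters that do not occur get a count of 0, so every vector has length 20.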

>9ESU_1|Chains A, C|Cyclin-dependent kinase 2|Homo sapiens (9606)
>5OGF_1|Chain A|Copper-containing nitrite reductase|Achromobacter cycloclastes (223)
>6CZL_1|Chains A, B, C, D, E, F|ATP phosphoribosyltransferase catalytic subunit|Medicago truncatula (3880)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9ESU , Knot 136 302 0.85 40 193 288
GPGSMENFQKVEKIGEGTYGVVYKARNKLTGEVVALKKIRLDTETEGVPSTAIREISLLKELNHPNIVKLLDVIHTENKLYLVFEFLHQDLKKFMDASALTGIPLPLIKSYLFQLLQGLAFCHSHRVLHRDLKPQNLLINTEGAIKLADFGLARAFGVPVRTYTHEVVTLWYRAPEILLGCKYYSTAVDIWSLGCIFAEMVTRRALFPGDSEIDQLFRIFRTLGTPDEVVWPGVTSMPDYKPSFPKWARQDFSKVVPPLDEDGRSLLSQMLHYDPNKRISAKAALAHPFFQDVTKPVPHLRL
5OGF , Knot 141 333 0.82 40 194 313
DISTLPRVKVDLVKPPFVHAHDQVAKTGPRVVEFTMTIEEKKLVIDREGTEIHAMTFNGSVPGPLMVVHENDYVELRLINPDTNTLLHNIDFHAATGALGGGALTQVNPGEETTLRFKATKPGVFVYHCAPEGMVPWHVTSGMNGAIMVLPRDGLKDEKGQPLTYDKIYYVGEQDFYVPKDEAGNYKKYETPGEAYEDAVKAMRTLTPTHIVFNGAVGALTGDHALTAAVGERVLVVHSQANRDTRPHLIGGHGDYVWATGKFRNPPDLDQETWLIPGGTAGAAFYTFRQPGVYAYVNHNLIEAFELGAAGHFKVTGEWNDDLMTSVVKPASM
6CZL , Knot 153 352 0.85 40 207 335
SNATHHQVLNGNTVSRQEIRLGLPSKGRMSSDTLDLLKDCQLSVKQVNPRQYVAQIPQISNLEVWFQRPKDIVRKLLSGDLDLGIVGLDVLTEFGQGNEDLIVVHEALEYGDCRLSIAIPQYGIFENVNSLEELAKMPQWTEDKPLRVATGFTYLGPKFMKDNGIKHVAFSTADGALEAAPAMGIADAILDLVSSGTTLKENNLKEIEGGTVLESQAALVASRRSMIGRKGVLETTHEMLERLEAHLRAMGQFTVVANMRGSSAEEVAERVLSQPSLAGLQGPTVSPVFCKRDGKVSADYYAIVICVPKKALYKSIQQLRAIGGSGVLVSPLTYIFDEETPRWRQLLSKLGL
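
The page does not state which Lempel-Ziv variant the LZ-complexity column counts. As a sketch, assuming the LZ76 (exhaustive history) parsing, the complexity and the normalized ratio \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) could be computed as follows. The function names are illustrative, and the parsing has not been checked against the LZ values in the table.

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parsing of w (assumed variant)."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the component while it can be copied from earlier text
        # (the copy may overlap the component itself, as in LZ76).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_alphabet_size(n)."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76_complexity("0001101001000101"))  # parses as 0|001|10|100|1000|101
print(normalized_lz(136, 302))              # values from the 9ESU row
```

With the table's values for 9ESU the ratio comes out near 0.858, which agrees with the displayed 0.85 if the page truncates to two decimals.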

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
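
As an illustration of these definitions, the set of distinct subwords of a fixed length and its cardinality can be sketched in a few lines. The names `subwords` and `subword_complexity` are hypothetical helpers, not part of the CoV2D code.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

print(subword_complexity("abab", 2))  # the subwords are "ab" and "ba"
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), which bounds the values in the subword-count columns of the table.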

\(|P_{f(9ESU_1)}(2) \setminus P_{f(5OGF_1)}(2)|=78\), \(|P_{f(5OGF_1)}(2) \setminus P_{f(9ESU_1)}(2)|=79\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11101001001001101001110010001010111100101000001110011001011001001011011011000001011101100010011010110111111100011011011110000011000101001110001110110111101111110000001101100110111100000011011011011101100011111000100110110011010011111100110001011011000100111110001001100110001000101011110111001001110101
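
The hydrophobic-polar encoding replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not list which residues it treats as hydrophobic; the set below is inferred by aligning the start of the binary string with the 9ESU_1 sequence, so treat it as an assumption. The function name `hydrophobic_polar` is illustrative.

```python
# Assumed hydrophobic residues, inferred from the encoded string of Sequence 1.
HYDROPHOBIC = set("AFGILMPVW")

def hydrophobic_polar(sequence):
    """Encode a protein sequence as a binary string: 1 = hydrophobic, 0 = polar."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First thirty residues of the 9ESU_1 sequence.
print(hydrophobic_polar("GPGSMENFQKVEKIGEGTYGVVYKARNKLT"))
```

The output for this prefix matches the first thirty digits of the string shown for Sequence 1.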
Pair \(Z_2\) Length of longest common subword
9ESU_1,5OGF_1 157 4
9ESU_1,6CZL_1 158 3
5OGF_1,6CZL_1 171 4
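
The two columns of this table can be sketched as follows. \(Z_k\) is the size of the symmetric difference of the two subword sets. For the last column, the sketch assumes that it counts the longest contiguous common subword, which values as small as 3 and 4 suggest. All function names are hypothetical.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("abab", "abba", 2))
print(longest_common_subword_length("abcde", "xbcdy"))
```

For the pair 9ESU_1, 5OGF_1 the two set differences given earlier, 78 and 79, add up to the tabulated \(Z_2=157\).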

Newick tree

 
[
	6CZL_1:83.54,
	[
		9ESU_1:78.5,5OGF_1:78.5
	]:5.04
]

Let \(d\) and \(d_1\) be the Otu--Sayood distances of the same names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{635 }{\log_{20} 635}-\frac{302}{\log_{20}302})\approx 92.7\).
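
The arithmetic in this estimate can be checked directly. In the sketch below, 635 is the combined length 302 + 333 of the two sequences; the factors 0.85 and 0.8 are taken as given from the text, and the helper name `scale` is illustrative.

```python
import math

def scale(n, alphabet_size=20):
    """n / log_alphabet_size(n), the normalizing scale for LZ-complexity."""
    return n / math.log(n, alphabet_size)

expected = 0.85 * 0.8 * (scale(635) - scale(302))
print(round(expected, 1))  # 92.7
```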
Status Protein1 Protein2 d d1/2
Query variables 9ESU_1 5OGF_1 117 112

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
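
The interpolation can be sketched numerically. The sketch assumes the usual Otu--Sayood definitions, in which \(d\) is the maximum and \(d_1\) the sum of the two complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\); the page does not spell these out. Under that assumption, the tabulated values \(d=117\) and \(d_1/2=112\) for 9ESU_1 and 5OGF_1 imply increments of 107 and 117. The function name `delta` is illustrative.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b) for two complexity increments."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Increments implied by d = 117 and d1/2 = 112 under the assumed definitions.
a, b = 107, 117
print(delta(0, a, b))    # corresponds to d
print(delta(0.5, a, b))  # corresponds to d1 / 2
```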