CoV2D Browser™

Parikh vectors
5MLG_1 8IMW_1 3TKQ_1 Letter Amino acid
16 5 13 K Lysine
34 6 14 V Valine
20 3 14 P Proline
29 5 15 T Threonine
4 1 4 W Tryptophan
14 4 4 N Asparagine
20 7 18 E Glutamic acid
37 8 21 G Glycine
23 10 15 I Isoleucine
9 6 1 M Methionine
18 11 12 A Alanine
13 7 15 R Arginine
14 10 7 Q Glutamine
20 4 13 F Phenylalanine
32 4 15 S Serine
17 11 16 D Aspartic acid
6 0 4 C Cysteine
13 8 13 H Histidine
33 22 22 L Leucine
17 3 10 Y Tyrosine
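
Each column of the table above counts how often each amino-acid letter occurs in one sequence. A minimal sketch of that count, assuming the one-letter codes are listed in the same order as the table rows (the names `ALPHABET` and `parikh_vector` are illustrative, not taken from the page):

```python
from collections import Counter

# One-letter amino-acid codes, in the row order used by the table above.
ALPHABET = "KVPTWNEGIMARQFSDCHLY"

def parikh_vector(w, alphabet=ALPHABET):
    """Return the number of occurrences of each letter of `alphabet` in `w`."""
    counts = Counter(w)
    return [counts.get(letter, 0) for letter in alphabet]

# Toy example: the entries of a Parikh vector always sum to the word length
# when every symbol of the word belongs to the alphabet.
toy = "MGHHHHHHTK"
vec = parikh_vector(toy)
print(dict(zip(ALPHABET, vec)))
print(sum(vec) == len(toy))
```

The same sanity check applies to the table: each column sums to the length of its sequence (389, 135 and 246).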

>5MLG_1|Chain A|Renin|Rattus norvegicus (10116)
>8IMW_1|Chains A, B|Transcriptional regulator|Acinetobacter baumannii (strain ATCC 19606 / DSM 30007 / JCM 6841 / CCUG 19606 / CIP 70.34 / NBRC 109757 / NCIMB 12457 / NCTC 12156 / 81) (575584)
>3TKQ_1|Chains A, B, C, D, E|Peroxiredoxin-4|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5MLG , Knot 165 389 0.84 40 222 373
LPTDTASFGRILLKKMPSVREILEERGVDMTRISAEWGEFIKKSSFTNVTSPVVLTNYLDTQYYGEIGIGTPSQTFKVIFDTGSANLWVPSTKCGPLYTACEIHNLYDSSESSSYMENGTEFTIHYGSGKVKGFLSQDVVTVGGIIVTQTFGEVTELPLIPFMLAKFDGVLGMGFPAQAVDGVIPVFDHILSQRVLKEEVFSVYYSRESHLLGGEVVLGGSDPQHYQGNFHYVSISKAGSWQITMKGVSVGPATLLCEEGCMAVVDTGTSYISGPTSSLQLIMQALGVKEKRANNYVVNCSQVPTLPDISFYLGGRTYTLSNMDYVQKNPFRNDDLCILALQGLDIPPPTGPVWVLGATFIRKFYTEFDRHNNRIGFALARAAHHHHHH
8IMW , Knot 63 135 0.76 38 102 127
MGHHHHHHTKILMIEDDFMIAESTITLLQYHQFEVEWVNNGLDGLAQLAKTKFDLILLDLGLPMMDGMQVLKQIRQRAATPVLIISARDQLQNRVDGLNLGADDYLIKPYEFDELLARIHALLRRSGVEAQLASQ
3TKQ , Knot 113 246 0.84 40 166 235
MRGSHHHHHHGSWETEERPRTREEECHFYAGGQVYPGEASRVSVADHSLHLSKAKISKPAPYWEGTAVIDGEFKELKLTDYRGKYLVFFFYPLDFTFVCPTEIIAFGDRLEEFRSINTEVVACSVDSQFTHLAWINTPRRQGGLGPIRIPLLSDLTHQISKDYGVYLEDSGHTLRGLFIIDDKGILRQITLNDLPVGRSVDETLRLVQAFQYTDKHGEVCPAGWKPGSETIIPDPAGKLKYFDKLN
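
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be reproduced from the LZ-complexity and length columns. The sketch below also includes one common way to compute an LZ76-style complexity (exhaustive-history parsing); the page does not say which Lempel-Ziv variant it uses, so `lz76` is an assumption and may not reproduce the tabulated values exactly. The function names are illustrative.

```python
import math

def lz76(w):
    """Number of phrases in a Lempel-Ziv (1976) style parsing of w.

    Each phrase is the shortest block starting at position i that does not
    already occur (overlap allowed) in the text preceding its last symbol.
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_alphabet_size(n)."""
    return lz / (n / math.log(n, alphabet_size))

# Values taken from the table above: (LZ-complexity, length).
for code, (lz, n) in {"5MLG": (165, 389), "8IMW": (63, 135), "3TKQ": (113, 246)}.items():
    print(code, round(normalized_lz(lz, n), 2))
```

With the tabulated pairs this gives 0.84, 0.76 and 0.84, matching the normalized column.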

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
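
As a minimal sketch of these definitions, the set of distinct subwords of a given length and its cardinality can be computed by sliding a window over the word. The names `subwords` and `subword_complexity` are illustrative, not from the page.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct contiguous subwords of length k in w."""
    return len(subwords(w, k))

# Toy example on a short word.
print(sorted(subwords("ABAB", 2)))      # ['AB', 'BA']
print(subword_complexity("ABAB", 2))    # 2
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), which bounds the \(p_w(2)\) and \(p_w(3)\) columns in the table.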

\(|P_{f(5MLG_1)}(2) \setminus P_{f(8IMW_1)}(2)|=153\), \(|P_{f(8IMW_1)}(2) \setminus P_{f(5MLG_1)}(2)|=33\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11000101101110011010011000110100101011011000010010011110001000001011110100010111001010111100001110010010010000000001001001010010101011100011011111100011010011111111101011111111101101111110011000110001101000000011110111110010000101001010011010101011011110110001011110010001011000101110111100001000110000110110101011100001001001000110000101111011011110111111110110010001000000111111011000000
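
The page does not state which residues it treats as hydrophobic. The sketch below assumes the set A, F, G, I, L, M, P, V, W, which was inferred by comparing the start of the 5MLG_1 sequence with the binary string above; both the set and the name `hp_encode` are assumptions for illustration.

```python
# Assumed hydrophobic residues (inferred from the page's output, not documented there).
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(w, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' if hydrophobic and '0' otherwise (polar)."""
    return "".join("1" if residue in hydrophobic else "0" for residue in w)

# First 32 residues of the 5MLG_1 sequence.
prefix = "LPTDTASFGRILLKKMPSVREILEERGVDMTR"
print(hp_encode(prefix))  # 11000101101110011010011000110100
```

The printed string agrees with the first 32 symbols of the hydrophobic-polar version shown above.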
Pair \(Z_2\) Length of longest common subword
5MLG_1,8IMW_1 186 6
5MLG_1,3TKQ_1 180 6
8IMW_1,3TKQ_1 158 6
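
The two columns above can be sketched as follows: \(Z_k\) is the size of the symmetric difference of the two subword sets, and the second column is read here as the length of the longest common contiguous subword (the value 6 is consistent with the HHHHHH tag that appears in all three sequences). That reading and the function names are assumptions.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|: size of the symmetric difference."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y (row-by-row DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy examples; the second uses the first few residues of 8IMW_1 and 3TKQ_1.
print(z_k("ABAB", "BABC", 2))                                         # 1
print(longest_common_subword_length("MGHHHHHHTK", "MRGSHHHHHHGS"))    # 6
```

For the first pair in the table, the two one-sided counts 153 and 33 quoted earlier add up to \(Z_2 = 186\).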

Newick tree

 
(5MLG_1:95.31,(3TKQ_1:79,8IMW_1:79):16.31);

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{524 }{\log_{20} 524}-\frac{135}{\log_{20}135})\approx 114\), where \(524=389+135\) is the combined length of the two query sequences.
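
The arithmetic in this estimate can be checked directly. The sketch below simply evaluates the displayed expression; the constants 0.85 and 0.8 are taken from the formula as given, and the function names are illustrative.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_alphabet_size(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_x, n_y, c1=0.85, c2=0.8):
    """Evaluate c1 * c2 * (scale(n_x + n_y) - scale(n_y))."""
    return c1 * c2 * (lz_scale(n_x + n_y) - lz_scale(n_y))

# Lengths of 5MLG_1 (389) and 8IMW_1 (135).
print(round(expected_distance(389, 135)))  # 114
```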
Status Protein1 Protein2 d d1/2
Query variables 5MLG_1 8IMW_1 147 96.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
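
As a worked instance, assume that \(\min\) and \(\max\) denote the smaller and larger of the two quantities being interpolated (an interpretation; the page does not spell this out). The tabulated values \(d=147\) and \(d_1/2=96.5\) for the pair 5MLG_1, 8IMW_1 then determine both:

\[ \max = d = 147, \qquad \tfrac{1}{2}(\min+\max) = \tfrac{d_1}{2} = 96.5 \;\Longrightarrow\; \min = 2(96.5)-147 = 46. \]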