CoV2D Browser™

Parikh vectors
1TOA_1 8EYF_1 6MNL_1 Letter Amino acid
1 6 0 C Cysteine
26 30 3 G Glycine
3 6 0 W Tryptophan
35 38 2 A Alanine
19 59 1 D Aspartic acid
15 83 2 K Lysine
14 27 0 Q Glutamine
15 23 0 H Histidine
13 54 0 I Isoleucine
12 48 0 F Phenylalanine
10 32 2 P Proline
20 62 1 S Serine
16 52 0 T Threonine
27 51 1 V Valine
14 29 3 R Arginine
6 63 1 N Asparagine
7 22 0 M Methionine
14 55 0 Y Tyrosine
18 65 0 E Glutamic acid
28 93 0 L Leucine
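
Each numeric column of the table above is the Parikh vector of one sequence, that is, the number of occurrences of each amino-acid letter. A minimal sketch of that count, assuming the 20-letter alphabet in the order listed in the table (the names `ALPHABET` and `parikh_vector` are illustrative and not taken from the CoV2D code):

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
ALPHABET = "CGWADKQHIFPSTVRNMYEL"

def parikh_vector(w):
    """Return a dict mapping each letter of ALPHABET to its count in w."""
    counts = Counter(w)
    return {letter: counts.get(letter, 0) for letter in ALPHABET}

# 6MNL_1 (FOXO3a peptide), as listed on this page.
v = parikh_vector("NPDGGKSGKAPRRRAV")
print(v["G"], v["R"], v["C"])  # 3 3 0
```

The printed counts agree with the 6MNL_1 column (G: 3, R: 3, C: 0).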

>1TOA_1|Chains A, B|PROTEIN (PERIPLASMIC BINDING PROTEIN TROA)|Treponema pallidum (160)
>8EYF_1|Chain A|M1 family aminopeptidase|Plasmodium falciparum (5833)
>6MNL_1|Chain A|FOXO3a peptide|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1TOA , Knot 133 313 0.81 40 183 299
SYYHHHHHHDYDIPTTENLYFQGAMGSFGSKDAAADGKPLVVTTIGMIADAVKNIAQGDVHLKGLMGPGVDPHLYTATAGDVEWLGNADLILYNGLHLETKMGEVFSKLRGSRLVVAVSETIPVSQRLSLEEAEFDPHVWFDVKLWSYSVKAVYESLCKLLPGKTREFTQRYQAYQQQLDKLDAYVRRKAQSLPAERRVLVTAHDAFGYFSRAYGFEVKGLQGVSTASEASAHDMQELAAFIAQRKLPAIFIESSIPHKNVEALRDAVQARGHVVQIGGELFSDAMGDAGTSEGTYVGMVTHNIDTIVAALAR
8EYF , Knot 337 898 0.85 40 306 811
MEPKIHYRKDYKPSGFIINQVTLNINIHDQETIVRSVLDMDISKHNVGEDLVFDGVGLKINEISINNKKLVEGEEYTYDNEFLTIFSKFVPKSKFAFSSEVIIHPETNYALTGLYKSKNIIVSQCEATGFRRITFFIDRPDMMAKYDVTVTADKEKYPVLLSNGDKVNEFEIPGGRHGARFNDPPLKPCYLFAVVAGDLKHLSATYITKYTKKKVELYVFSEEKYVSKLQWALECLKKSMAFDEDYFGLEYDLSRLNLVAVSDFNVGAMENKGLNIFNANSLLASKKNSIDFSYARILTVVGHEYFHQYTGNRVTLRDWFQLTLKEGLTVHRENLFSEEMTKTVTTRLSHVDLLRSVQFLEDSSPLSHPIRPESYVSMENFYTTTVYDKGSEVMRMYLTILGEEYYKKGFDIYIKKNDGNTATCEDFNYAMEQAYKMKKADNSANLNQYLLWFSQSGTPHVSFKYNYDAEKKQYSIHVNQYTKPDENQKEKKPLFIPISVGLINPENGKEMISQTTLELTKESDTFVFNNIAVKPIPSLFRGFSAPVYIEDQLTDEERILLLKYDSDAFVRYNSCTNIYMKQILMNYNEFLKAKNEKLESFQLTPVNAQFIDAIKYLLEDPHADAGFKSYIVSLPQDRYIINFVSNLDTDVLADTKEYIYKQIGDKLNDVYYKMFKSLEAKADDLTYFNDESHVDFDQMNMRTLRNTLLSLLSKAQYPNILNEIIEHSKSPYPSNWLTSLSVSAYFDKYFELYDKTYKLSKDDELLLQEWLKTVSRSDRKDIYEILKKLENEVLKDSKNPNDIRAVYLPFTNNLRRFHDISGKGYKLIAEVITKTDKFNPMVATQLCEPFKLWNKLDTKRQELMLNEMNTMLQEPQISNNLKEYLLRLTNKLHHHHHH
6MNL , Knot 10 16 0.57 18 13 14
NPDGGKSGKAPRRRAV
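
The columns \(\mathrm{LZ}(w)\) and \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be sketched as follows. This assumes the LZ76 parsing in which each new component is the shortest prefix of the remaining text that has not occurred earlier (overlap allowed); the page does not state which variant it uses, and the function names are hypothetical. The sketch reproduces the table value 10 for 6MNL_1; the two longer sequences were not checked here.

```python
import math

def lz76(w):
    """Number of components in an LZ76-style parsing of w (assumed variant)."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while w[i:i+length] already occurs starting before position i.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), with n = len(w)."""
    n = len(w)
    return lz76(w) / (n / math.log(n, alphabet_size))

w = "NPDGGKSGKAPRRRAV"  # 6MNL_1
print(lz76(w))  # 10
print(normalized_lz(w))  # about 0.578
```

The table shows 0.57 for this sequence, which is consistent with the value being truncated rather than rounded.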

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
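
As an illustration of these definitions, here is a small sketch that builds \(P_w(n)\) as a set of contiguous length-\(n\) subwords and returns its size (the helper names `subwords` and `p` are illustrative only):

```python
def subwords(w, n):
    """The set P_w(n) of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w, n):
    """The subword complexity p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))

w = "NPDGGKSGKAPRRRAV"  # 6MNL_1
print(p(w, 2), p(w, 3))  # 13 14
```

For 6MNL_1 this gives the \(p_w(2)\) and \(p_w(3)\) entries of the table, 13 and 14.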

\(|P_{f(1TOA_1)}(2) \setminus P_{f(8EYF_1)}(2)|=21\), \(|P_{f(8EYF_1)}(2) \setminus P_{f(1TOA_1)}(2)|=144\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
0000000000001100001010111101100011101011110011111011001101010101111111010100101101011101011100110100011011001010011111000111000101001010101110101100010110001001111000010000010000100101010001001110001110100111010010110101101100100101001001111110001111110001100010110011010101101110110011101100010011110001001111110
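
The hydrophobic-polar string can be reproduced by mapping each residue to 1 or 0. The page does not say which residues count as hydrophobic; the set used below is an assumption inferred by comparing the first 160 residues of Sequence 1 with the binary string above (in particular, C is encoded as 0 there). The names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Assumed hydrophobic set, inferred from the page's own encoding of 1TOA_1.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(w):
    """Map hydrophobic residues to '1' and all other residues to '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in w)

# First 26 residues of 1TOA_1.
print(hp_encode("SYYHHHHHHDYDIPTTENLYFQGAMG"))  # 00000000000011000010101111
```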
Pair \(Z_2\) Length of longest common subword (contiguous)
1TOA_1,8EYF_1 165 6
1TOA_1,6MNL_1 182 2
8EYF_1,6MNL_1 295 3
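
The two columns of this table can be sketched as follows. `z` follows the definition of \(Z_k\) given above. The second function computes the longest common contiguous subword; reading the last column that way is an assumption, suggested by the value 6 for the first pair, where both sequences contain the run HHHHHH. All function names are illustrative.

```python
def subwords(w, k):
    """The set P_w(k) of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common run ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("AAB", "ABB", 2))  # 2
# Short fragments of 1TOA_1 (start) and 8EYF_1 (end), both with a His tag.
print(longest_common_subword("SYYHHHHHHDYD", "NKLHHHHHH"))  # 6
```

With the full sequences, `z(x, y, 2)` for the first pair should equal 21 + 144 = 165, the value in the table.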

Newick tree

 
[
	6MNL_1:13.25,
	[
		1TOA_1:82.5,8EYF_1:82.5
	]:50.75
]

Let \(d\) denote the distance called \(d\) by Otu and Sayood, and let \(d_1\) denote their distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{1211}{\log_{20} 1211}-\frac{313}{\log_{20} 313}\right)\approx 236\), where \(1211=313+898\) is the combined length of the two sequences.
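
The arithmetic in this estimate can be checked directly. The sketch below only evaluates the expression as written on the page; the factors 0.85 and 0.8 are taken from the formula without further interpretation, and the helper name `scale` is illustrative.

```python
import math

def scale(n, base=20):
    """n / log_base(n), the normalizing term used on this page."""
    return n / math.log(n, base)

n1, n2 = 313, 898  # lengths of 1TOA_1 and 8EYF_1
estimate = 0.85 * 0.8 * (scale(n1 + n2) - scale(n1))
print(estimate)  # about 236.5
```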
Status Protein1 Protein2 d d1/2
Query variables 1TOA_1 8EYF_1 300 200
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
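
A minimal sketch of this interpolation, under two labeled assumptions: the two terms are \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), so that \(d\) is their maximum and \(d_1\) their sum (which is what the case formula above implies), and LZ is the LZ76-style parsing count sketched in `lz76` below. Neither the parsing variant nor the function names come from the page.

```python
def lz76(w):
    """Number of components in an LZ76-style parsing of w (assumed variant)."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while w[i:i+length] already occurs starting before position i.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood_terms(x, y):
    """Return (min, max) of LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    a = lz76(x + y) - lz76(x)
    b = lz76(y + x) - lz76(y)
    return min(a, b), max(a, b)

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max; alpha=0 gives d, alpha=1/2 gives d1/2."""
    lo, hi = otu_sayood_terms(x, y)
    return alpha * lo + (1 - alpha) * hi

x, y = "NPDGGKSGKAPRRRAV", "AAAAAA"
lo, hi = otu_sayood_terms(x, y)
print(delta(x, y, 0) == hi, delta(x, y, 0.5) == (lo + hi) / 2)  # True True
```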