CoV2D Browser™

Parikh vectors
3ZCX_1 3CGO_1 6KFP_1 Letter Amino acid
22 23 24 I Isoleucine
10 15 12 M Methionine
32 29 19 V Valine
35 20 20 A Alanine
5 7 5 C Cysteine
6 12 15 F Phenylalanine
6 4 3 W Tryptophan
18 16 20 N Asparagine
16 23 42 E Glutamic acid
10 17 16 Q Glutamine
29 16 23 G Glycine
15 28 33 K Lysine
13 20 14 P Proline
25 19 23 S Serine
21 14 25 T Threonine
18 17 10 R Arginine
19 22 18 D Aspartic acid
9 18 17 Y Tyrosine
18 10 7 H Histidine
27 35 32 L Leucine
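
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch, assuming the one-letter FASTA alphabet in the row order of the table above (the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the site):

```python
from collections import Counter

# Row order of the table above: I, M, V, A, C, F, W, N, E, Q, G, K, P, S, T, R, D, Y, H, L
AMINO_ACIDS = "IMVACFWNEQGKPSTRDYHL"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the letter counts of `sequence`, one entry per letter of `alphabet`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# First ten residues of 3ZCX_1: one each of M, R, G, S and six H.
example = parikh_vector("MRGSHHHHHH")
```

The entries of a Parikh vector sum to the sequence length; for instance, the 3ZCX_1 column above sums to 354.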

>3ZCX_1|Chains A, B|GLYCERALDEHYDE-3-PHOSPHATE DEHYDROGENASE|THERMOSYNECHOCOCCUS ELONGATUS (146786)
>3CGO_1|Chain A|Mitogen-activated protein kinase 10|Homo sapiens (9606)
>6KFP_1|Chain A|MavC|Legionella pneumophila (446)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3ZCX , Knot 155 354 0.85 40 214 339
MRGSHHHHHHGLVPRGSMVRVAINGFGRIGRNFMRCWLQRKANSKLEIVGINDTSDPRTNAHLLKYDSMLGIFQDAEITADDDCIYAGGHAVKCVSDRNPENLPWSAWGIDLVIEATGVFTSREGASKHLSAGAKKVLITAPGKGNIPTYVVGVNHHTYDPSEDIVSNASCTTNCLAPIVKVLHEAFGIQQGMMTTTHSYTGDQRLLDASHRDLRRARAAAMNIVPTSTGAAKAVGLVIPELQGKLNGIALRVPTPNVSVVDFVAQVEKPTIAEQVNQVIKEASETTMKGIIHYSELELVSSDYRGHNASSILDASLTMVLGGNLVKVVAWYDNEWGYSQRVLDLAEHMAAHWA
3CGO , Knot 157 365 0.84 40 229 353
MGSKSKVDNQFYSVEVGDSTFTVLKRYQNLKPIGSGAQGIVCAAYDAVLDRNVAIKKLSRPFQNQTHAKRAYRELVLMKCVNHKNIISLLNVFTPQKTLEEFQDVYLVMELMDANLCQVIQMELDHERMSYLLYQMLCGIKHLHSAGIIHRDLKPSNIVVKSDCTLKILDFGLARTAGTSFMMTPYVVTRYYRAPEVILGMGYKENVDIWSVGCIMGEMVRHKILFPGRDYIDQWNKVIEQLGTPCPEFMKKLQPTVRNYVENRPKYAGLTFPKLFPDSLFPADSEHNKLKASQARDLLSKMLVIDPAKRISVDDALQHPYINVWYDPAEVEAPPPQIYDKQLDEREHTIEEWKELIYKEVMNSE
6KFP , Knot 162 378 0.84 40 218 365
EKTGLHVHEKIKHMVKNYGTMITGIPAEILGQNEAEISVGYVKKMGNMKENIAEVVRKSEMTQPTNSAGKASNEVCDLLLGTEGASEFEKSSYQVLSGDGSNLKGSLPNKNLLVRVEMDRFNAPQKYQKIKREEFNPETAEKNKIYLLEDQLVYLDIFGKVIDLGQTSDTCHRLFNAITTPFYQNYILYDEYIDPEESAEEAAMFEMGEIVKAKMKNIDCWTATHSFTIFVPESDSEDTRTLYPYQAYWTSHTLQQWFSGDKDEKLSRLGIDGYIEKLALLGTTTDSKIRSSIYGELFSPPGKEHVFCTGMNEKFSPLRVKFKVTEVNPEIALQNLEEVQEFIDTNYPGENAKDQCELYKIKAQEAMTKQLEMRLLIE
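
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns alone. A small sketch (the function name `normalized_lz` is hypothetical; the LZ values themselves are taken from the table, not recomputed here):

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ-complexity divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ(w), n) pairs read off the table above.
rows = {"3ZCX": (155, 354), "3CGO": (157, 365), "6KFP": (162, 378)}
ratios = {code: normalized_lz(lz, n) for code, (lz, n) in rows.items()}
```

The tabulated values 0.85, 0.84, 0.84 appear to be these ratios truncated, rather than rounded, to two decimals.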

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
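
As an illustration of these definitions, the following sketch collects the distinct contiguous subwords of a fixed length, written `k` here to keep it apart from the sequence length. The function names are illustrative only.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct contiguous subwords of length k in w."""
    return len(subwords(w, k))

# Toy example: "ABAB" has subwords AB, BA of length 2 and ABA, BAB of length 3.
toy = [subword_complexity("ABAB", k) for k in (1, 2, 3)]
```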

\(|P_{f(3ZCX_1)}(2) \setminus P_{f(3CGO_1)}(2)|=68\), \(|P_{f(3CGO_1)}(2) \setminus P_{f(3ZCX_1)}(2)|=83\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
101000000011110101101110111011001100110001000101111000001000101100001111100101010000101110110010000100111011110111010111000011000101110011101110101100111100000010001100100000011111011001111001110000000100011010000100101111011100011101111111010101011110110101011011101001011001001100100001011100001011000001001001101010111110110111100001100001101100111011
Pair \(Z_2\) Length of longest common subword
3ZCX_1,3CGO_1 151 4
3ZCX_1,6KFP_1 166 4
3CGO_1,6KFP_1 161 4
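
The two columns of this table can be sketched as follows. \(Z_k\) is the size of the symmetric difference of the two subword sets; for the first pair, the one-sided counts 68 and 83 given earlier add up to the tabulated 151. The last column is read here as the length of the longest common contiguous subword, which is an assumption suggested by the small values. All function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): subwords of length k occurring in exactly one of x, y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy example: "ABAB" vs "ABCA" differ in the 2-subwords BA, BC, CA and share "AB".
toy_z = z_distance("ABAB", "ABCA", 2)
toy_common = longest_common_subword_length("ABAB", "ABCA")
```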

Newick tree

 
[
	6KFP_1:83.74,
	[
		3ZCX_1:75.5,3CGO_1:75.5
	]:8.24
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{719 }{\log_{20} 719}-\frac{354}{\log_{20}354})=99.8\)
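
The arithmetic in this estimate can be checked directly. In the sketch below, 719 = 354 + 365 is the length of the concatenation of the two query sequences; the factors 0.85 and 0.8 are taken as given from the formula, and the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The normalizing quantity n / log_b(n) used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

# Factors 0.85 and 0.8 are copied from the formula above, not derived here.
expected_distance = 0.85 * 0.8 * (lz_scale(719) - lz_scale(354))
```

Rounded to one decimal place this gives 99.8, in agreement with the stated value.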
Status Protein1 Protein2 d d1/2
Query variables 3ZCX_1 3CGO_1 124 122

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
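
The interpolation can be sketched in a few lines. The sketch assumes that min and max range over the two directed quantities whose maximum is \(d\) and whose sum is \(d_1\); this reading of the Otu-Sayood definitions is an assumption and is not spelled out on this page. The inputs 120 and 124 are chosen to be consistent with the query row (\(d=124\), \(d_1/2=122\)); the value 120 is inferred, not reported.

```python
def delta(forward, backward, alpha):
    """alpha * min + (1 - alpha) * max of the two directed quantities."""
    lo, hi = min(forward, backward), max(forward, backward)
    return alpha * lo + (1 - alpha) * hi

# Hypothetical directed values consistent with d = 124 and d1/2 = 122.
d = delta(120, 124, alpha=0)          # the maximum
half_d1 = delta(120, 124, alpha=0.5)  # the average of the two
```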