CoV2D Browser™

Parikh vectors
9CFT_1 2JKC_1 3RZG_1 Letter Amino acid
6 9 1 C Cysteine
15 12 10 H Histidine
38 25 8 I Isoleucine
32 23 12 T Threonine
7 12 4 W Tryptophan
27 25 5 N Asparagine
28 40 11 D Aspartic acid
6 21 6 Y Tyrosine
36 28 17 V Valine
35 36 15 G Glycine
55 52 19 L Leucine
8 12 3 M Methionine
15 32 11 F Phenylalanine
25 28 11 P Proline
33 37 18 S Serine
49 44 13 A Alanine
13 37 18 R Arginine
25 20 4 Q Glutamine
33 23 10 E Glutamic acid
24 22 13 K Lysine
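
The counts in this table can be recomputed from a sequence. The following is a minimal sketch, assuming the Parikh vector is simply the per-letter count over the 20 standard amino-acid letters. The name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative and not taken from the CoV2D code.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "CHITWNDYVGLMFPSARQEK"


def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter in seq."""
    counts = Counter(seq)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


# Usage on the first ten residues of the 9CFT_1 sequence.
vec = parikh_vector("MHHHHHHSSG")
print(vec["M"], vec["H"], vec["S"], vec["G"])
```

The entries of a Parikh vector sum to the sequence length, which gives a quick consistency check against the Length column further down the page.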

>9CFT_1|Chain A|Importin subunit alpha-1|Mus musculus (10090)
>2JKC_1|Chain A|Flavin-dependent tryptophan halogenase PrnA|Pseudomonas fluorescens (294)
>3RZG_1|Chain A|Alpha-ketoglutarate-dependent dioxygenase alkB homolog 2|Homo sapiens (9606)
Columns, in order: protein code \(c\); LZ-complexity \(\mathrm{LZ}(w)\); length \(n=|w|\); \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\); \(p_w(1)\); \(p_w(2)\); \(p_w(3)\). The sequence \(w=f(c)\) follows each row on its own line.
9CFT , Knot 206 510 0.84 40 248 467
MHHHHHHSSGLVPRGSGMLETAAALFERNHMDSPDLGTDDDDLAMADIGSNQGTVNWSVEDIVKGINSNNLESQLQATQAARKLLSREKQPPIDNIIRAGLIPKFVSFLGKTDCSPIQFESAWALTNIASGTSEQTKAVVDGGAIPAFISLLASPHAHISEQAVWALGNIAGDGSAFRDLVIKHGAIDPLLALLAVPDLSTLACGYLRNLTWTLSNLCRNKNPAPPLDAVEQILPTLVRLLHHNDPEVLADSCWAISYLTDGPNERIEMVVKKGVVPQLVKLLGATELPIVTPALRAIGNIVTGTDEQTQKVIDAGALAVFPSLLTNPKTNIQKEATWTMSNITAGRQDQIQQVVNHGLVPFLVGVLSKADFKTQKEAAWAITNYTSGGTVEQIVYLVHCGIIEPLMNLLSAKDTKIIQVILDAISNIFQAAEKLGETEKLSIMIEECGGLDKIEALQRHENESVYKASLNLIEKYFSVEEEEDQNVVPETTSEGFAFQVQDGAPGTFNF
2JKC , Knot 214 538 0.83 40 273 498
MNKPIKNIVIVGGGTAGWMAASYLVRALQQQANITLIESAAIPRIGVGEATIPSLQKVFFDFLGIPEREWMPQVNGAFKAAIKFVNWRKSPDPSRDDHFYHLFGNVPNCDGVPLTHYWLRKREQGFQQPMEYACYPQPGALDGKLAPCLSDGTRQMSHAWHFDAHLVADFLKRWAVERGVNRVVDEVVDVRLNNRGYISNLLTKEGRTLEADLFIDCSGMRGLLINQALKEPFIDMSDYLLCDSAVASAVPNDDARDGVEPYTSSIAMNSGWTWKIPMLGRFGSGYVFSSHFTSRDQATADFLKLWGLSDNQPLNQIKFRVGRNKRAWVNNCVSIGLSSCFLEPLDSTGIYFIYAALYQLVKHFPDTSFDPRLSDAFNAEIVHMFDDCRDFVQAHYFTTSRDDTPFWLANRHDLRLSDAIKEKVQRYKAGLPLTTTSFDDSTYYETFDYEFKNFWLNGNYYCIFAGLGMLPDRSLPLLQHRPESIEKAEAMFASIRREAERLRTSLPTNYDYLRSLRDGDAGLSRGQRGPKLAAQESL
3RZG , Knot 97 209 0.82 40 147 204
GSMSWRHIRAEGLDSSYTVLFGKAEADEIFQELEKEVEYFTGALARVQVFGKWHSVPRKQATYGDAGLTYTFSGLTLSPKPWIPVLERIRDHVSGVTGQTFNFVLINRYKDGSDHICEHRDDERDLAPGSPIASVSFGASRDFVFRHKDSRGKSPSRRVAVVRLPLAHGSLLMMNHPTNTHWYHSLPVRKKVLAPRVNLTFRKILLTKK
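
The LZ-complexity column and its normalization can be sketched as follows. This assumes the classical Lempel-Ziv (1976) exhaustive parsing, in which each new component is the shortest prefix of the remaining text that has not occurred earlier (overlaps allowed). Whether the page uses exactly this variant is an assumption, and the names `lz76` and `normalized_lz` are illustrative. The tabulated ratios agree with \(\mathrm{LZ}(w)/(n/\log_{20} n)\) when the value is truncated to two decimals.

```python
import math


def lz76(s: str) -> int:
    """Number of components in the LZ76 exhaustive parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the candidate while it already occurs in the text that
        # precedes its last symbol (earlier occurrences may overlap it).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_alphabet_size(n)."""
    return lz / (n / math.log(n, alphabet_size))


# Ratio for the 9CFT row, using the tabulated LZ(w) = 206 and n = 510.
print(normalized_lz(206, 510))
```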

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
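
As an illustration of these definitions, the set of distinct subwords of a given length and its cardinality can be computed with a sliding window. This is a minimal sketch; the function names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))


# Toy example on a four-letter word.
print(sorted(subwords("ABAB", 2)))
print(subword_complexity("ABAB", 3))
```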

\(|P_{f(9CFT_1)}(2) \setminus P_{f(2JKC_1)}(2)|=59\), \(|P_{f(2JKC_1)}(2) \setminus P_{f(9CFT_1)}(2)|=84\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (9CFT_1):
100000000111101011100111110000100101100000111101100010101010011011000010001010011001100000111001101111101101110000011010011110011010000001110111111110111010101000111111011101011001110011101111111110100110101001010100100000111110110011101101100001011100011100100110001011100111101101111001111011101110110100000001101111111101100100010001010100101100001001100111111111100101000001111100000110100110110011101110110100001101110110011011001100001011100011100101100000001001010110001010000000111000001111010011110101
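
The hydrophobic-polar encoding can be sketched as a letter-wise map to 1 (hydrophobic) or 0 (polar). The hydrophobic set below, A, F, G, I, L, M, P, V, W, is inferred from the opening residues of the 9CFT_1 sequence and the binary string above. Treating every other letter, in particular Y, as polar is an assumption, and the names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Letters mapped to 1. Inferred from the start of the encoded sequence;
# the classification of letters not seen there (such as Y) is an assumption.
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(seq: str) -> str:
    """Map each residue to '1' if hydrophobic and '0' otherwise."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in seq)


# First 18 residues of the 9CFT_1 sequence.
print(hp_encode("MHHHHHHSSGLVPRGSGM"))
```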
Pair \(Z_2\) Length of longest common subword (contiguous)
9CFT_1,2JKC_1 143 4
9CFT_1,3RZG_1 177 4
2JKC_1,3RZG_1 190 5
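
Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets, so for the first pair \(59+84=143\). The last column's small values (4 and 5) are consistent with contiguous common subwords rather than gapped subsequences, so the sketch assumes contiguity. The function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Size of the symmetric difference of the length-k subword sets."""
    return len(subwords(x, k) ^ subwords(y, k))


def longest_common_subword_length(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Toy example on two four-letter words.
print(z_distance("ABAB", "ABBA", 2))
print(longest_common_subword_length("ABAB", "ABBA"))
```

The dynamic program uses time proportional to the product of the two lengths, which is adequate for sequences of a few hundred residues.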

Newick tree

 
[
	3RZG_1:97.64,
	[
		9CFT_1:71.5,2JKC_1:71.5
	]:26.14
]

Let d be the Otu-Sayood distance d.
Let d1 be the Otu-Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{1048}{\log_{20} 1048}-\frac{510}{\log_{20}510}\right)\approx 140\).
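
The arithmetic behind this estimate can be checked directly. In the sketch below, 1048 is the combined length \(510+538\) of the two query sequences, and the factors 0.85 and 0.8 are taken as given from the formula. The function names are illustrative.

```python
import math


def log20(x: float) -> float:
    """Logarithm to base 20, the size of the amino-acid alphabet."""
    return math.log(x) / math.log(20)


def expected_distance(n_concat: int, n_first: int,
                      a: float = 0.85, b: float = 0.8) -> float:
    """a * b * (n_concat / log20(n_concat) - n_first / log20(n_first))."""
    return a * b * (n_concat / log20(n_concat) - n_first / log20(n_first))


# 1048 = 510 + 538 (lengths of 9CFT_1 and 2JKC_1).
print(round(expected_distance(1048, 510)))
```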
Status Protein1 Protein2 d d1/2
Query variables 9CFT_1 2JKC_1 180 173.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
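
A minimal sketch of this interpolation, under two assumptions that the page does not spell out: that min and max range over the two increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), and that \(c\) is the LZ76 complexity. Under that reading, \(\alpha=0\) gives the maximum (d) and \(\alpha=1/2\) gives half the sum of the increments (d1/2). The names `lz76` and `delta` are hypothetical.

```python
def lz76(s: str) -> int:
    """Number of components in the LZ76 exhaustive parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two complexity increments."""
    inc_xy = lz76(x + y) - lz76(x)  # cost of appending y to x (assumed reading)
    inc_yx = lz76(y + x) - lz76(y)  # cost of appending x to y (assumed reading)
    return alpha * min(inc_xy, inc_yx) + (1 - alpha) * max(inc_xy, inc_yx)


# Toy example: d corresponds to alpha = 0, d1/2 to alpha = 1/2.
print(delta("aaaa", "bbbb", 0), delta("aaaa", "bbbb", 0.5))
```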