CoV2D Browser™

Parikh vectors
8YIG_1 1BGJ_1 8QMH_1 Letter Amino acid
98 36 0 A Alanine
84 39 0 R Arginine
89 19 0 D Aspartic acid
32 4 4 C Cysteine
69 15 0 Q Glutamine
122 28 0 V Valine
101 4 0 N Asparagine
124 34 0 E Glutamic acid
15 6 0 W Tryptophan
66 16 0 Y Tyrosine
75 34 2 G Glycine
181 46 0 L Leucine
45 6 0 M Methionine
81 17 0 P Proline
86 20 0 S Serine
99 20 0 T Threonine
43 8 0 H Histidine
103 18 0 I Isoleucine
118 12 0 K Lysine
90 12 0 F Phenylalanine
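
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch, assuming the row order of the table above (the names `LETTERS` and `parikh_vector` are illustrative, not part of the site):

```python
from collections import Counter

# Letter order taken from the rows of the table above (an assumption about
# the intended ordering; any fixed order of the 20 letters would do).
LETTERS = "ARDCQVNEWYGLMPSTHIKF"

def parikh_vector(w, letters=LETTERS):
    """Return a dict mapping each letter to its number of occurrences in w."""
    counts = Counter(w)
    return {a: counts.get(a, 0) for a in letters}

pv = parikh_vector("GGCCCC")  # the 8QMH_1 sequence
print(pv["C"], pv["G"])       # 4 2
```

For 8QMH_1 this gives 4 for C and 2 for G, in agreement with the third column of the table.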

>8YIG_1|Chains A, E|Dicer-2, isoform A|Drosophila melanogaster (7227)
>1BGJ_1|Chain A|P-HYDROXYBENZOATE HYDROXYLASE|Pseudomonas fluorescens (294)
>8QMH_1|Chains A, B, C, D|RNA (5'-R(*GP*GP*CP*CP*CP*C)-3')|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8YIG , Knot 598 1721 0.86 40 366 1476
EDVEIKPRGYQLRLVDHLTKSNGIVYLPTGSGKTFVAILVLKRFSQDFDKPIESGGKRALFMCNTVELARQQAMAVRRCTNFKVGFYVGEQGVDDWTRGMWSDEIKKNQVLVGTAQVFLDMVTQTYVALSSLSVVIIDECHHGTGHHPFREFMRLFTIANQTKLPRVVGLTGVLIKGNEITNVATKLKELEITYRGNIITVSDTKEMENVMLYATKPTEVMVSFPHQEQVLTVTRLISAEIEKFYVSLDLMNIGVQPIRRSKSLQCLRDPSKKSFVKQLFNDFLYQMKEYGIYAASIAIISLIVEFDIKRRQAETLSVKLMHRTALTLCEKIRHLLVQKLQDMTYDDDDDNVNTEEVIMNFSTPKVQRFLMSLKVSFADKDPKDICCLVFVERRYTCKCIYGLLLNYIQSTPELRNVLTPQFMVGRNNISPDFESVLERKWQKSAIQQFRDGNANLMICSSVLEEGIDVQACNHVFILDPVKTFNMYVQSKGRARTTEAKFVLFTADKEREKTIQQIYQYRKAHNDIAEYLKDRVLEKTEPELYEIKGHFQDDIDPFTNENGAVLLPNNALAILHRYCQTIPTDAFGFVIPWFHVLQEDERDRIFGVSAKGKHVISINMPVNCMLRDTIYSDPMDNVKTAKISAAFKACKVLYSLGELNERFVPKTLKERVASIADVHFEHWNKYGDSVTATVNKADKSKDRTYKTECPLEFYDALPRVGEICYAYEIFLEPQFESCEYTEHMYLNLQTPRNYAILLRNKLPRLAEMPLFSNQGKLHVRVANAPLEVIIQNSEQLELLHQFHGMVFRDILKIWHPFFVLDRRSKENSYLVVPLILGAGEQKCFDWELMTNFRRLPQSHGSNVQQREQQPAPRPEDFEGKIVTQWYANYDKPMLVTKVHRELTPLSYMEKNQQDKTYYEFTMSKYGNRIGDVVHKDKFMIEVRDLTEQLTFYVHNRGKFNAKSKAKMKVILIPELCFNFNFPGDLWLKLIFLPSILNRMYFLLHAEALRKRFNTYLNLHLLPFNGTDYMPRPLEIDYSLKRNVDPLGNVIPTEDIEEPKSLLEPMPTKSIEASVANLEITEFENPWQKYMEPVDLSRNLLSTYPVELDYYYHFSVGNVCEMNEMDFEDKEYWAKNQFHMPTGNIYGNRTPAKTNANVPALMPSKPTVRGKVKPLLILQKTVSKEHITPAEQGEFLAAITASSAADVFDMERLEILGNSFLKLSATLYLASKYSDWNEGTLTEVKSKLVSNRNLLFCLIDADIPKTLNTIQFTPRYTWLPPGISLPHNVLALWRENPEFAKIIGPHNLRDLALGDEESLVKGNCSDINYNRFVEGCRANGQSFYAGADFSSEVNFCVGLVTIPNKVIADTLEALLGVIVKNYGLQHAFKMLEYFKICRADIDKPLTQLLNLELGGKKMRANVNTTEIDGFLINHYYLEKNLGYTFKDRRYLLQALTHPSYPTNRITGSYQELEFIGNAILDFLISAYIFENNTKMNPGALTDLRSALVNNTTLACICVRHRLHFFILAENAKLSEIISKFVNFQESQGHRVTNYVRILLEEADVQPTPLDLDDELDMTELPHANKCISQEAEKGVPPKGEFNMSTNVDVPKALGDVLEALIAAVYLDCRDLQRTWEVIFNLFEPELQEFTRKVPINHIRQLVEHKHAKPVFSSPIVEGETVMVSCQFTCMEKTIKVYGFGSNKDQAKLSAAKHALQQLSKCDA
1BGJ , Knot 163 394 0.82 40 215 371
MKTQVAIIGAGPSGLLLGQLLHKAGIDNVILERQTPDYVLGRIRAGVLEQGMVDLLREAGVDRRMARDGLVHEGVEIAFAGQRRRIDLKRLSGGKTVTVYGQTEVTRDLMEAREASGATTVYQAAEVRLHDLQGERPYVTFERDGERLRLDCDYIAGCDGFRGISRQSIPAERLKVFERVYPFGWLGLLADTPPVSHELIYANHPRGFALCSQRSATRSRYYVQVPLTEKVEDWSDERFWTELKARLPAEVAEKLVTGPSLEKSIAPLRSFVVEPMQHGRLFLAGDAAHIVPPTGAKGLNLAASDVSTLYRLLLKAYREGRGELLERYSAICLRRIWKAERFSWWMTSVLHRFPDTDAFSQRIQQTELEYYLGSEAGLATIAENYVGLPYEEIE
8QMH , Knot 3 6 0.29 4 3 3
GGCCCC
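
The LZ-complexity column and the normalized ratio can be sketched as follows. This assumes the classical Lempel-Ziv (1976) exhaustive-history parsing, in which each new component is the longest word that can be copied from an earlier starting position, extended by one letter; whether the site uses exactly this variant is an assumption. The function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Number of components in a Lempel-Ziv (1976) style parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the component while w[i:i+length] already occurs starting at
        # some position before i (overlapping the current position is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(w, alphabet_size=20):
    """LZ(w) / (n / log_b n) with b = alphabet_size; requires len(w) >= 2."""
    n = len(w)
    return lz76_complexity(w) / (n / math.log(n, alphabet_size))

print(lz76_complexity("GGCCCC"))  # 3, parsed as G | GC | CCC
```

On GGCCCC the sketch returns 3, the value listed for 8QMH, and the ratio evaluates to roughly 0.299.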

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
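
As a small illustration of these definitions, the set of subwords of a given length and its cardinality can be computed directly. This is a sketch with illustrative names (`subwords`, `subword_count`), not the site's code:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_count(w, k):
    """Cardinality of the set of length-k subwords of w."""
    return len(subwords(w, k))

print(sorted(subwords("GGCCCC", 2)))  # ['CC', 'GC', 'GG']
print(subword_count("GGCCCC", 3))     # 3
```

For GGCCCC this yields 3 subwords of length 2 and 3 of length 3, as in the 8QMH row of the table.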

\(|P_{f(8YIG_1)}(2) \setminus P_{f(1BGJ_1)}(2)|=162\), \(|P_{f(1BGJ_1)}(2) \setminus P_{f(8YIG_1)}(2)|=11\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
0010101010010110010000111011010100111111100100010011001100111100010110001111000001011101100110010011100010000111101011101100001110010111100000101001100110110110000110111101111010010011001001010001011010000010011101001001110110000110100110101001010101101110110000010010010000110011001100100011011011110111010100001001010110001101000100111001001000000001000011101001010011101010110001001001111000000001011110010001010011010111100010101001100010001100100101011100011001101010001111011001010100010100001011110100000001001000001000110010001100001010010101000101100001111110011111000000110011111111101100000001111010100110101110011000100011001001010111010011001101000111001000110110101001000100101010010000000000001101001110110100100111010100000000101010010001111000110110111100010101011011101110000010110010111100110110111110000000001111111111000010101100100110001001000000111010010101100101000011110010001011001000000000001010001001101100001110100100010101000101010001010111110101010111011101111101100101110101100010001010111101000110110100010001011101110001001001101110001010110101001001100010110100011000110100000101101001001010000011000101101010100011000101111110010101010111110001000010110010111110100110110100101110011010101011000001001010010001100001110110101100100101010001111110110011111000101101111001001111000011010000100001101001010010111010001010111101100111001011111110001100110110010100101001100110101110010101000010111100001000110010000011011001001000101000010111011101110101100000101111001001110000110101000101111100101001100110100001001000101110010101011010001010011010001000100111101010100010110111011011111101000010001011101101010010001110010011000010111001110100111000100100010101110000010101
1001100100001
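
The hydrophobic-polar encoding replaces each residue by 1 (hydrophobic) or 0 (polar). The sketch below assumes the hydrophobic set A, F, G, I, L, M, P, V, W; this set is inferred by comparing the opening residues of Sequence 1 with the start of the binary string above and is not stated on the page.

```python
# Assumed hydrophobic residues (inferred from the data, not documented).
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(w, hydrophobic=HYDROPHOBIC):
    """Encode w as a binary string: 1 for hydrophobic, 0 for polar residues."""
    return "".join("1" if a in hydrophobic else "0" for a in w)

# First 20 residues of the 8YIG_1 sequence.
print(hp_string("EDVEIKPRGYQLRLVDHLTK"))  # 00101010100101100100
```

The output reproduces the first 20 digits of the binary string shown above.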
Pair \(Z_2\) Length of longest common subword
8YIG_1,1BGJ_1 173 4
8YIG_1,8QMH_1 363 2
1BGJ_1,8QMH_1 214 2
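
Both columns of this table can be sketched in a few lines. The first function computes \(Z_k\) as the size of the symmetric difference of the two subword sets; the second finds the length of the longest contiguous word shared by two sequences, which is what the small values in the last column suggest is being reported (an assumption). The example strings are made up for illustration.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): subwords of length k in exactly one of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common word ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("GGCCCC", "AGCCT", 2))          # 3
print(longest_common_subword("GGCCCC", "AGCCT"))  # 3
```

With the values quoted earlier, \(Z_2\) for the pair 8YIG_1, 1BGJ_1 is 162 + 11 = 173, matching the first row.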

Newick tree

 
[
	8QMH_1:16.62,
	[
		8YIG_1:86.5,1BGJ_1:86.5
	]:78.12
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{2115}{\log_{20} 2115}-\frac{394}{\log_{20} 394}\right)=428.\)
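
The arithmetic in this estimate can be checked with a short sketch. The constants 0.85 and 0.8 and the lengths 2115 and 394 are taken as given from the line above; the function name and parameter names are illustrative.

```python
import math

def expected_distance(n1, n2, alphabet_size=20, c1=0.85, c2=0.8):
    """c1 * c2 * (n1 / log_b n1 - n2 / log_b n2) with b = alphabet_size."""
    def scaled(n):
        return n / math.log(n, alphabet_size)
    return c1 * c2 * (scaled(n1) - scaled(n2))

print(round(expected_distance(2115, 394)))  # 428
```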
Status Protein1 Protein2 d d1/2
Query variables 8YIG_1 1BGJ_1 554 337.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
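
The interpolation can be illustrated numerically. In the sketch below, `a` and `b` stand for the two quantities whose minimum and maximum enter the formula. The value 121 is not shown on the page: it is the minimum implied by the formula together with the query values \(d=554\) and \(d_1/2=337.5\), since \(2\cdot 337.5-554=121\).

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

a, b = 121, 554          # 121 is derived, not reported (see the note above)
print(delta(0, a, b))    # 554, i.e. d
print(delta(0.5, a, b))  # 337.5, i.e. d1/2
```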