CoV2D Browser™

Parikh vectors
5AZV_1 2AWZ_1 1NOJ_1 Letter Amino acid
6 24 22 H Histidine
16 48 29 S Serine
17 54 62 A Alanine
8 39 63 R Arginine
12 34 48 G Glycine
11 29 37 P Proline
0 9 12 W Tryptophan
9 19 46 N Asparagine
19 26 65 E Glutamic acid
39 57 78 L Leucine
24 32 48 K Lysine
9 12 21 M Methionine
15 14 38 F Phenylalanine
13 38 36 T Threonine
8 23 36 Y Tyrosine
18 26 49 D Aspartic acid
1 19 9 C Cysteine
16 16 30 Q Glutamine
21 24 51 I Isoleucine
14 37 62 V Valine
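A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch of that count, not the page's own implementation; the function name `parikh_vector` and the alphabetical letter order in `AMINO_ACIDS` are assumptions made for illustration (the table above lists the letters in a different order).

```python
from collections import Counter

# Assumed ordering of the 20 standard amino-acid letters (hypothetical choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter, in alphabet order."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# First 20 residues of the 5AZV_1 sequence, used only as a short example.
example = "GALNPESADLRALAKHLYDS"
vector = parikh_vector(example)
print(dict(zip(AMINO_ACIDS, vector)))
```

The entries of a Parikh vector sum to the sequence length, which gives a quick sanity check: the 5AZV_1 column above sums to 276.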

>5AZV_1|Chains A, B|Peroxisome proliferator-activated receptor gamma|Homo sapiens (9606)
>2AWZ_1|Chains A, B|Genome polyprotein|Hepatitis C virus (3052230)
>1NOJ_1|Chain A|GLYCOGEN PHOSPHORYLASE|Oryctolagus cuniculus (9986)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5AZV , Knot 121 276 0.82 38 174 264
GALNPESADLRALAKHLYDSYIKSFPLTKAKARAILTGKTTDKSPFVIYDMNSLMMGEDKIKFKHITPLQEQSKEVAIRIFQGCQFRSVEAVQEITEYAKSIPGFVNLDLNDQVTLLKYGVHEIIYTMLASLMNKDGVLISEGQGFMTREFLKSLRKPFGDFMEPKFEFAVKFNALELDDSDLAIFIAVIILSGDRPGLLNVKPIEDIQDNLLQALELQLKLNHPESSQLFAKLLQKMTDLRQIVTEHVQLLQVIKKTETDMSLHPLLQEIYKDLY
2AWZ , Knot 232 580 0.84 40 276 536
SMSYTWTGALITPCAAEESKLPINALSNSLLRHHNMVYATTSRSAGLRQKKVTFDRLQVLDDHYRDVLKEMKAKASTVKAKLLSVEEACKLTPPHSAKSKFGYGAKDVRNLSSKAVNHIHSVWKDLLEDTVTPIDTTIMAKNEVFCVQPEKGGRKPARLIVFPDLGVRVCEKMALYDVVSTLPQVVMGSSYGFQYSPGQRVEFLVNTWKSKKNPMGFSYDTRCFDSTVTENDIRVEESIYQCCDLAPEARQAIKSLTERLYIGGPLTNSKGQNCGYRRCRASGVLTTSCGNTLTCYLKASAACRAAKLQDCTMLVNGDDLVVICESAGTQEDAASLRVFTEAMTRYSAPPGDPPQPEYDLELITSCSSNVSVAHDASGKRVYYLTRDPTTPLARAAWETARHTPVNSWLGNIIMYAPTLWARMILMTHFFSILLAQEQLEKALDCQIYGACYSIEPLDLPQIIERLHGLSAFSLHSYSPGEINRVASCLRKLGVPPLRAWRHRARNVRARLLSRGGRAAICGKYLFNWAVKTKLKLTPIAAAGRLDLSSWFTAGYSGGDIYHGVSHARPRHHHHHHHHHH
1NOJ , Knot 321 842 0.85 40 317 770
SRPLSDQEKRKQISVRGLAGVENVTELKKNFNRHLHFTLVKDRNVATPRDYYFALAHTVRDHLVGRWIRTQQHYYEKDPKRIYYLSLEFYMGRTLQNTMVNLALENACDEATYQLGLDMEELEEIEEDAGLGNGGLGRLAACFLDSMATLGLAAYGYGIRYEFGIFNQKICGGWQMEEADDWLRYGNPWEKARPEFTLPVHFYGRVEHTSQGAKWVDTQVVLAMPYDTPVPGYRNNVVNTMRLWSAKAPNDFNLKDFNVGGYIQAVLDRNLAENISRVLYPNDNFFEGKELRLKQEYFVVAATLQDIIRRFKSSKFGCRDPVRTNFDAFPDKVAIQLNDTHPSLAIPELMRVLVDLERLDWDKAWEVTVKTCAYTNHTVIPEALERWPVHLLETLLPRHLQIIYEINQRFLNRVAAAFPGDVDRLRRMSLVEEGAVKRINMAHLCIAGSHAVNGVARIHSEILKKTIFKDFYELEPHKFQNKTNGITPRRWLVLCNPGLAEIIAERIGEEYISDLDQLRKLLSYVDDEAFIRDVAKVKQENKLKFAAYLEREYKVHINPNSLFDVQVKRIHEYKRQLLNCLHVITLYNRIKKEPNKFVVPRTVMIGGKPAPGYHMAKMIIKLITAIGDVVNHDPVVGDRLRVIFLENYRVSLAEKVIPAADLSEQISTAGTEASGTGNMKFMLNGALTIGTMDGANVEMAEEAGEENFFIFGMRVEDVDRLDQRGYNAQEYYDRIPELRQIIEQLSSGFFSPKQPDLFKDIVNMLMHHDRFKVFADYEEYVKCQERVSALYKNPREWTRMVIRNIATSGKFSSDRTIAQYAREIWGVEPSRQRLPAPDEKIP
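
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does only that arithmetic; the helper names are hypothetical, and truncating (rather than rounding) to two decimals is an assumption chosen because it reproduces the three values listed above.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_{alphabet_size} n)."""
    return lz / (n / math.log(n, alphabet_size))

def truncate2(x):
    """Truncate to two decimals (assumed display convention, not confirmed)."""
    return math.floor(x * 100) / 100

# (LZ-complexity, length) pairs copied from the table above.
rows = {"5AZV": (121, 276), "2AWZ": (232, 580), "1NOJ": (321, 842)}
ratios = {code: truncate2(normalized_lz(lz, n)) for code, (lz, n) in rows.items()}
print(ratios)
```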

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
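
As an illustration of these definitions, here is a small sketch that builds \(P_w(n)\) as a Python set and takes its size. The function names `subwords` and `subword_complexity` are hypothetical, and the toy word is not taken from the data.

```python
def subwords(w, n):
    """Set of distinct contiguous subwords of length n in w, i.e. P_w(n)."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Cardinality p_w(n) of P_w(n)."""
    return len(subwords(w, n))

w = "ABABC"
print(sorted(subwords(w, 2)))    # ['AB', 'BA', 'BC']
print(subword_complexity(w, 2))  # 3
```

For a word of length \(|w|\), at most \(|w|-n+1\) subwords of length \(n\) exist, so \(p_w(n)\) can never exceed that bound.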

\(|P_{f(5AZV_1)}(2) \setminus P_{f(2AWZ_1)}(2)|=46\), \(|P_{f(2AWZ_1)}(2) \setminus P_{f(5AZV_1)}(2)|=148\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for this pair, \(Z_2=46+148=194\).

Hydrophobic-polar version of Sequence 1:
111010010101110010000100111001010111010000001111001001111000101001011000000111011010010010110010001001111101010001011001100110011101100011110010111000110010011101101010111010110100001111111111010011110101100100011011010101001000011101100100100110001011011000000101011100100010

Pair \(Z_2\) Length of longest common subsequence
5AZV_1,2AWZ_1 194 4
5AZV_1,1NOJ_1 181 4
2AWZ_1,1NOJ_1 139 4
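
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets. A minimal sketch, using hypothetical function names and a toy pair of words rather than the protein sequences:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: P_x(2) = {AB, BA}, P_y(2) = {AB, BC, CA}.
print(z_distance("ABAB", "ABCA", 2))  # 1 + 2 = 3
```

Because set difference is taken in both directions, \(Z_k\) is symmetric in its arguments and is zero when both words have the same set of length-\(k\) subwords.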

Newick tree

 
[
	5AZV_1:10.61,
	[
		1NOJ_1:69.5,2AWZ_1:69.5
	]:31.11
]

Let d and d1 denote the Otu--Sayood distances of those names. (Using d1 makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{856 }{\log_{20} 856}-\frac{276}{\log_{20}276})=158.\)
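
The arithmetic in the expected-distance estimate can be checked directly. This sketch takes the factors 0.85 and 0.8 and the lengths 856 and 276 exactly as given; note that 856 equals 276 + 580, the sum of the two sequence lengths, though reading it as the length of a concatenation is an inference. The helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """Return n / log_{alphabet_size} n, the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

expected = 0.85 * 0.8 * (lz_scale(856) - lz_scale(276))
print(round(expected))  # 158
```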
Status Protein1 Protein2 d d1/2
Query variables 5AZV_1 2AWZ_1 202 147
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
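
To make the interpolation concrete, here is a hedged sketch under two assumptions: that LZ-complexity means the number of components in an LZ76 exhaustive-history parsing, and that d and d1 are the max and sum forms \(\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\,\mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\) and \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). Whether the page computes its values with this exact variant is not confirmed here, and all function names are hypothetical.

```python
def lz76(s):
    """Number of components in an LZ76-style exhaustive-history parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the component while it still occurs starting at an earlier
        # position (overlap with the component itself is allowed).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(x, y):
    """Return (LZ(xy) - LZ(x), LZ(yx) - LZ(y))."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "ABRACADABRA", "CADABRACAD"  # toy words, not protein data
a, b = increments(x, y)
d, d1 = max(a, b), a + b
print(d, d1 / 2, delta(x, y, 0), delta(x, y, 0.5))
```

With these definitions, \(\alpha=0\) gives the max form d and \(\alpha=1/2\) gives the average d1/2, matching the two cases in the displayed equation.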