CoV2D Browser™

Parikh vectors
1JRP_1 5WIX_1 2XUJ_1 Letter Amino acid
36 10 37 R Arginine
27 22 27 D Aspartic acid
27 25 27 E Glutamic acid
8 19 18 Y Tyrosine
73 11 50 A Alanine
50 38 58 L Leucine
12 37 8 K Lysine
13 9 8 M Methionine
16 28 27 F Phenylalanine
20 41 32 S Serine
7 9 10 H Histidine
17 5 22 Q Glutamine
22 46 15 I Isoleucine
26 12 45 P Proline
3 7 13 W Tryptophan
20 25 45 V Valine
8 51 17 N Asparagine
42 25 51 G Glycine
23 20 27 T Threonine
12 5 6 C Cysteine
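A Parikh vector is the letter-count vector of a sequence. The following is a minimal Python sketch. It assumes the 20 standard one-letter codes in the row order of the table above, and the name `parikh_vector` is illustrative rather than taken from the page. Summing a column of the table gives 462, 445 and 543, which match the sequence lengths listed below.

```python
from collections import Counter

# Row order of the Parikh-vector table above (20 standard amino acids).
AMINO_ACIDS = "RDEYALKMFSHQIPWVNGTC"


def parikh_vector(seq: str) -> list[int]:
    """Count occurrences of each standard amino-acid letter in seq."""
    counts = Counter(seq)
    return [counts[letter] for letter in AMINO_ACIDS]


# Example on the first ten residues of 1JRP_1.
vec = parikh_vector("MEIAFLLNGE")
print(dict(zip(AMINO_ACIDS, vec)))
```

Letters outside the 20 standard codes (for example X) are simply not counted in this sketch.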

>1JRP_1|Chains A, C, E, G|xanthine dehydrogenase, chain A|Rhodobacter capsulatus (1061)
>5WIX_1|Chain A|p-47 protein|Clostridium botulinum E1 str. 'BoNT E Beluga' (536233)
>2XUJ_1|Chains A, B|ACETYLCHOLINESTERASE|MUS MUSCULUS (10090)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1JRP , Knot 181 462 0.80 40 216 417
MEIAFLLNGETRRVRIEDPTQSLLELLRAEGLTGTKEGCNEGDCGACTVMIRDAAGSRAVNACLMMLPQIAGKALRTIEGIAAPDGRLHPVQQAMIDHHGSQCGFCTPGFIVSMAAAHDRDRKDYDDLLAGNLCRCTGYAPILRAAEAAAGEPPADWLQADAAFTLAQLSSGVRGQTAPAFLPETSDALADWYLAHPEATLIAGGTDVSLWVTKALRDLPEVAFLSHCKDLAQIRETPDGYGIGAGVTIAALRAFAEGPHPALAGLLRRFASEQVRQVATIGGNIANGSPIGDGPPALIAMGASLTLRRGQERRRMPLEDFFLEYRKQDRRPGEFVESVTLPKSAPGLRCYKLSKRFDQDISAVCGCLNLTLKGSKIETARIAFGGMAGVPKRAAAFEAALIGQDFREDTIAAALPLLAQDFTPLSDMRASAAYRMNAAQAMALRYVRELSGEAVAVLEVMP
5WIX , Knot 176 445 0.80 40 201 406
MRGSHHHHHHGSLVPRGSNTYGWDIVYGCSNRVVNKHLKNYIDENKIEFLYSDINKKQEIKMIFDNWEIINGGTSNFLRIKIFIKEGYFKFRNTTVDLSGVIPILEIKLDFFNDASNPHIKELKFSFGNKTNDDIKVIVSDLSGKLYEEDEFYFNKLLISAFINNEKQVSYIFASLNVTSNIVWMNPKQFKFVYYSPTDNNDGYLCILSVVTNRDISKLSTNVDSSILSENSEVGLLISEKLFMENLLLPKLSSNMGSNITSNNFNVINTSDTTGIIKNKNTLNWYGIKVAALYYYPEINDFSMELFEGNKLKTRLSGIVKLTGYERIYSKLNMECITKFIYDNKNKKVSFEIYSTPIMECRPIFGLLDGIPAAVAKSVGNWSLKSFRDSLAFELANNFTDIINDIVNWNNLKISEVTNIILNVGFCIQGNMNPGSAWSHPQFEK
2XUJ , Knot 217 543 0.84 40 249 505
EGREDPQLLVRVRGGQLRGIRLKAPGGPVSAFLGIPFAEPPVGSRRFMPPEPKRPWSGVLDATTFQNVCYQYVDTLYPGFEGTEMWNPNRELSEDCLYLNVWTPYPRPASPTPVLIWIYGGGFYSGAASLDVYDGRFLAQVEGAVLVSMNYRVGTFGFLALPGSREAPGNVGLLDQRLALQWVQENIAAFGGDPMSVTLFGESAGAASVGMHILSLPSRSLFHRAVLQSGTPNGPWATVSAGEARRRATLLARLVGCPPGGAGGNDTELIACLRTRPAQDLVDHEWHVLPQESIFRFSFVPVVDGDFLSDTPEALINTGDFQDLQVLVGVVKDEGSAFLVYGVPGFSKDNESLISRAQFLAGVRIGVPQASDLAAEAVVLHYTDWLHPEDPTHLRDAMSAVVGDHNVVCPVAQLAGRLAAQGARVYAYIFEHRASTLTWPLWMGVPHGYEIEFIFGLPLDPSLNYTTEERIFAQRLMKYWTNFARTGDPNDPRDSKSPQWPPYTTAAQQYVSLNLKPLEVRRGLRAQTCAFWNRFLPKLLSAT
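The ratio column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The following Python sketch assumes that \(\mathrm{LZ}(w)\) is the Lempel-Ziv (1976) phrase count, computed here with the Kaspar-Schuster scan. The page does not state its exact parsing convention, so this sketch is not checked against the table's LZ values. The function names are illustrative.

```python
import math


def lz76_complexity(s: str) -> int:
    """Phrase count of the Lempel-Ziv (1976) exhaustive history of s,
    computed with the Kaspar-Schuster scanning algorithm."""
    n = len(s)
    if n <= 1:
        return n
    i, k, l = 0, 1, 1   # i: start of candidate match, k: match length, l: start of current phrase
    c, k_max = 1, 1     # c: phrases so far (the first symbol is one), k_max: best match for this phrase
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:          # match reaches the end of s: count the last phrase
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:             # every earlier start tried: close the phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def lz_ratio(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_alphabet_size n), the normalized column of the table."""
    return lz / (n / math.log(n, alphabet_size))


print(lz76_complexity("0001101001000101"))  # 6 phrases: 0.001.10.100.1000.101
print(lz_ratio(181, 462))
```

With the page's LZ values, `lz_ratio` gives about 0.802, 0.805 and 0.840. The displayed 0.80, 0.80 and 0.84 therefore look truncated to two decimals rather than rounded.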

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\). Let \(f(c)\) be the FASTA sequence with 4-character Protein Data Bank code \(c\).
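If \(P_w(n)\) is read as the set of length-\(n\) intervals, the definition translates directly into Python. The sketch below uses illustrative function names, and it does not fetch \(f(c)\) from the PDB.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n contiguous subwords of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_count(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


print(sorted(subwords("ABABA", 2)))  # ['AB', 'BA']
```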

\(|P_{f(1JRP_1)}(2) \setminus P_{f(5WIX_1)}(2)|=96\) and \(|P_{f(5WIX_1)}(2) \setminus P_{f(1JRP_1)}(2)|=81\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\); for example, \(Z_2(1JRP_1,5WIX_1)=96+81=177\).

Hydrophobic-polar version of Sequence 1 (1JRP_1):
101111101000010100100011011010110100010001001100111001110011010111110111011001011111010101100111000100011001111101111000000000011110100001011110110111101110110101110110100110100111111000011101011010101111100101110011001101111000001101000101011111101111011101101111111001100010011011101101011101111111111010100100000111001110000000011011001011001111000010001000101101010101010010010111111111100111101111100100001111111110010110010101100101101111001001010111110111
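The page does not list which residues count as hydrophobic. The Python sketch below assumes the set A, F, G, I, L, M, P, V, W. That set is inferred from the binary string itself: it reproduces the string's opening, with C and Y encoded as 0 and W as 1. Treat it as an assumption, not the tool's documented alphabet. The name `hp_string` is illustrative.

```python
# Assumed hydrophobic set, inferred from the string above rather than stated on
# the page: it reproduces that string's opening, with C and Y mapped to 0, W to 1.
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp_string(seq: str) -> str:
    """Hydrophobic-polar encoding: '1' for a hydrophobic residue, '0' otherwise."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)


# First 60 residues of 1JRP_1.
prefix = "MEIAFLLNGETRRVRIEDPTQSLLELLRAEGLTGTKEGCNEGDCGACTVMIRDAAGSRAV"
print(hp_string(prefix))
```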
Pair \(Z_2\) Length of longest common subword (contiguous substring)
1JRP_1,5WIX_1 177 4
1JRP_1,2XUJ_1 149 4
5WIX_1,2XUJ_1 170 4
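\(Z_k\) is the size of the symmetric difference of the two subword sets, and dividing it by the size of their union gives the Jaccard distance. The following is a Python sketch with illustrative names. Here the last column is read as the longest common *contiguous* subword. That reading is an assumption, supported by one observation: any two of these sequences share a non-contiguous common subsequence far longer than 4. For example, 1JRP_1 has 50 L's and 5WIX_1 has 38, so L repeated 38 times is a common subsequence.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) - P_y(k)| + |P_y(k) - P_x(k)| (symmetric-difference size)."""
    a, b = subwords(x, k), subwords(y, k)
    return len(a - b) + len(b - a)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by the size of the union of the subword sets; 0.0 if both are empty."""
    union = subwords(x, k) | subwords(y, k)
    return z_k(x, y, k) / len(union) if union else 0.0


def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y (O(|x||y|) DP)."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: longest common suffix of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(z_k("ABAB", "ABBA", 2), longest_common_subword("XABCDY", "ZABCDW"))
```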

Newick tree

(
	5WIX_1:90.48,
	(
		1JRP_1:74.5,2XUJ_1:74.5
	):15.98
);

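Let \(d\) be the Otu--Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu--Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)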
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{907}{\log_{20} 907}-\frac{445}{\log_{20}445}\right)\approx 122.7\), where \(907=462+445\) is the combined length of 1JRP_1 and 5WIX_1.
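The estimate is plain arithmetic, and the Python check below reproduces it. The constants are taken as written on the page: 0.8 is presumably the normalized LZ ratio from the table, and the page does not explain 0.85. The helper name `lz_scale` is illustrative.

```python
import math


def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_alphabet_size(n): the normalizing length used in the LZ ratio column."""
    return n / math.log(n, alphabet_size)


# Constants as written on the page; 0.8 is presumably the normalized LZ ratio,
# the origin of 0.85 is not stated there.
estimate = 0.85 * 0.8 * (lz_scale(462 + 445) - lz_scale(445))
print(round(estimate, 1))  # 122.7
```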
Status Protein1 Protein2 d d1/2
Query variables 1JRP_1 5WIX_1 157 156
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min\{a,b\} + (1-\alpha) \max\{a,b\}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2, \end{cases} \]
where \(a=\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(b=\mathrm{LZ}(yx)-\mathrm{LZ}(y)\).
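The two Otu--Sayood distances are the cases \(\alpha=0\) and \(\alpha=1/2\) of \(\delta\). The Python sketch below takes \(a=\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(b=\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), with LZ assumed to be the Lempel-Ziv (1976) phrase count. The page's exact LZ variant is not stated, and the function names are illustrative. For the query pair, \(d=157\) and \(d_1/2=156\) force the two increments to be 157 and 155, and `delta_from_increments` reproduces both table values from them.

```python
def lz76_complexity(s: str) -> int:
    """Lempel-Ziv (1976) phrase count via the Kaspar-Schuster scan."""
    n = len(s)
    if n <= 1:
        return n
    i, k, l, c, k_max = 0, 1, 1, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:          # match reaches the end: count the last phrase
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:             # no earlier start extends further: close the phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def increments(x: str, y: str) -> tuple[int, int]:
    """(a, b) = (LZ(xy) - LZ(x), LZ(yx) - LZ(y))."""
    return (lz76_complexity(x + y) - lz76_complexity(x),
            lz76_complexity(y + x) - lz76_complexity(y))


def delta_from_increments(a: int, b: int, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max: d at alpha = 0, d_1 / 2 at alpha = 1/2."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


def delta(x: str, y: str, alpha: float) -> float:
    return delta_from_increments(*increments(x, y), alpha)


print(delta_from_increments(157, 155, 0), delta_from_increments(157, 155, 0.5))
```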