CoV2D Browser™

Parikh vectors
4JTW_1 8QXL_1 5JGJ_1 Letter Amino acid
54 30 20 A Alanine
20 13 4 C Cysteine
28 45 20 E Glutamic acid
50 33 17 S Serine
9 7 6 W Tryptophan
42 40 7 R Arginine
15 26 12 N Asparagine
26 44 15 D Aspartic acid
18 29 14 Q Glutamine
31 36 19 G Glycine
25 45 18 I Isoleucine
32 37 18 V Valine
18 19 6 H Histidine
58 54 27 L Leucine
31 36 15 P Proline
41 23 21 T Threonine
29 46 19 K Lysine
12 12 7 M Methionine
14 26 14 F Phenylalanine
23 25 10 Y Tyrosine
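The Parikh vector of a sequence simply records how many times each letter occurs. A minimal sketch of that count follows; the names `AMINO_ACIDS` and `parikh_vector` are hypothetical and not taken from the CoV2D code, and the example uses only the first ten residues of the 4JTW_1 sequence rather than the full chain.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (illustrative constant).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino acid letter."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


# First ten residues of the 4JTW_1 sequence, used as a toy input.
prefix = "SMSYTWTGAL"
vector = parikh_vector(prefix)
print(vector["S"], vector["T"], vector["C"])
```

Applying `parikh_vector` to a full FASTA sequence should give one column of the table above, assuming the page counts plain letter occurrences.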

>4JTW_1|Chains A, B|Genome polyprotein|HEPATITIS C VIRUS (420174)
>8QXL_1|Chains A, B, C, D|Deoxynucleoside triphosphate triphosphohydrolase SAMHD1|Homo sapiens (9606)
>5JGJ_1|Chain A|UbiE/COQ5 family methyltransferase, putative|Aspergillus fumigatus Z5 (1437362)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4JTW , Knot 231 576 0.85 40 274 533
SMSYTWTGALITPCAAEESKLPINPLSNSLLRHHNMVYATTSRSASLRQKKVTFDRLQVLDDHYRDVLKEMKAKASTVKAKLLSIEEACKLTPPHSAKSKFGYGAKDVRNLSSRAVNHIRSVWEDLLEDTETPIDTTIMAKSEVFCVQPEKGGRKPARLIVFPDLGVRVCEKMALYDVVSTLPQAVMGSSYGFQYSPKQRVEFLVNTWKSKKCPMGFSYDTRCFDSTVTESDIRVEESIYQCCDLAPEARQAIRSLTERLYIGGPLTNSKGQNCGYRRCRASGVLTTSCGNTLTCYLKATAACRAAKLQDCTMLVNGDDLVVICESAGTQEDAAALRAFTEAMTRYSAPPGDPPQPEYDLELITSCSSNVSVAHDASGKRVYYLTRDPTTPLARAAWETARHTPINSWLGNIIMYAPTLWARMILMTHFFSILLAQEQLEKALDCQIYGACYSIEPLDLPQIIERLHGLSAFTLHSYSPGEINRVASCLRKLGVPPLRTWRHRARSVRAKLLSQGGRAATCGRYLFNWAVRTKLKLTPIPAASQLDLSGWFVAGYSGGDIYHSLSRARPRHHHHHH
8QXL , Knot 245 626 0.84 40 282 586
MQRADSEQPSKRPRCDDSPRTPSNTPSAEADWSPGLELHPDYKTWGPEQVCSFLRRGGFEEPVLLKNIRENEITGALLPCLDESRFENLGVSSLGERKKLLSYIQRLVQIHVDTMKVINDPIHGHIELHPLLVRIIDTPQFQRLRYIKQLGGGYYVFPGASHNRFEHSLGVGYLAGCLVHALGEKQPELQISERDVLCVQIAGLCHDLGHGPFSHMFDGRFIPLARPEVKWTHEQGSVMMFEHLINSNGIKPVMEQYGLIPEEDICFIKEQIVGPLESPVEDSLWPYKGRPENKSFLYEIVSNKRNGIDVDKWDYFARDCHHLGIQNNFDYKRFIKFARVCEVDNELRICARDKEVGNLYDMFHTRNSLHRRAYQHKVGNIIDTMITDAFLKADDYIEITGAGGKKYRISTAIDDMEAYTKLTDNIFLEILYSTDPKLKDAREILKQIEYRNLFKYVGETQPTGQIKIKREDYESLPKEVASAKPKVLLDVKLKAEDFIVDVINMDYGMQEKNPIDHVSFYCKTAPNRAIRITKNQVSQLLPEKFAEQLIRVYCKKVDRKSLYAARQYFVQWCADRNFTKPQDGDVIAPLITPQKKEWNDSTSVQNPTRLREASKSRVQLFKDDPM
5JGJ , Knot 128 289 0.83 40 191 275
GHMSKSDYIQNMFQTKSFVDRYKYTEKLTGLYAQTLVDYSGVANTSQKPLIVLDNACGIGAVSSVLNHTLQDEAKKTWKLTCGDLSEGMLETTKRRLQDEGWVNAETKIVNALDTGLPDGHYTHVFVAFGFQSFPDANAALKECFRILASGGILASSTWQNFNWIPIMKAAIETIPGNLPFPTQKEFIALHNAGWDSESYIQSELEKLGFRDVKVIPVPKETSIPIDEFFEVCMMIIPYLLPKFWTEEQRESHEKDVPMVLRQYLQDTYGANGQVPLEAVALITTGLKP
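The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below recomputes that ratio from the tabulated values and, as an assumption, pairs it with a straightforward LZ76 (exhaustive history) parser. The page does not state which Lempel-Ziv variant it uses, so `lz76_complexity` is illustrative only and is not claimed to reproduce the tabulated complexities.

```python
import math


def lz76_complexity(w: str) -> int:
    """Number of components in the LZ76 exhaustive history of w (assumed variant)."""
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the component while it can still be copied from earlier text.
        # The copy source may overlap the component, hence the bound i+length-1.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_alphabet_size(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))


# (LZ-complexity, length) pairs copied from the table above.
for code, lz, n in [("4JTW", 231, 576), ("8QXL", 245, 626), ("5JGJ", 128, 289)]:
    print(code, normalized_lz(lz, n))
```

The computed ratios agree with the tabulated 0.85, 0.84 and 0.83 if the page truncates to two decimals rather than rounding, which is a guess based on these three rows.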

Let \(P_w(k)\) be the set of distinct subwords (contiguous intervals) of length \(k\) of a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
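As a small illustration of these definitions, the set of subwords of a fixed length and its cardinality can be computed with a sliding window. The function names are hypothetical, and the toy word is chosen only to keep the result easy to check by hand.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_count(w: str, k: int) -> int:
    """Cardinality of the set of length-k subwords of w."""
    return len(subwords(w, k))


word = "abab"
print(sorted(subwords(word, 2)))  # the length-2 subwords are "ab" and "ba"
print(subword_count(word, 3))     # "aba" and "bab"
```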

\(|P_{f(4JTW_1)}(2) \setminus P_{f(8QXL_1)}(2)|=75\), \(|P_{f(8QXL_1)}(2) \setminus P_{f(4JTW_1)}(2)|=83\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
010001011110101100001110110001100001101000001010000101001011000000110010101001010110100100101100100011011001001000110010011001100000110001110001101010011001101111101110100011100110011011110001100010001011100100000111100000010001000010100010000011101001100100010111110000100010000010111000010010001010110011010000111010011110001100001111011001100001111011010001011000000101100101001001000100111011100100011001110111011011101111001101111000100110001011000101101101100101101101000011010011001001111110010001001010110011011001001101110001010111110010101111110011010001001010000000
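The hydrophobic-polar string replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not list which residues it treats as hydrophobic; the set used below (A, F, G, I, L, M, P, V, W) is an assumption that matches the opening residues of Sequence 1, and `HYDROPHOBIC` and `hydrophobic_polar` are hypothetical names.

```python
# Assumed hydrophobic residues; every other letter is treated as polar.
HYDROPHOBIC = set("AFGILMPVW")


def hydrophobic_polar(sequence: str) -> str:
    """Map each residue to '1' if assumed hydrophobic, otherwise '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


# First twenty residues of the 4JTW_1 sequence.
prefix = "SMSYTWTGALITPCAAEESK"
print(hydrophobic_polar(prefix))  # 01000101111010110000
```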
Pair \(Z_2\) Length of longest common substring
4JTW_1,8QXL_1 158 4
4JTW_1,5JGJ_1 171 4
8QXL_1,5JGJ_1 175 4
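The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets, so for the first pair \(Z_2 = 75 + 83 = 158\). A minimal sketch with hypothetical function names, using toy words rather than the full protein sequences:

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Number of length-k subwords occurring in exactly one of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# "abab" has subwords {ab, ba}; "abba" has {ab, bb, ba}; only "bb" is unshared.
print(z_distance("abab", "abba", 2))  # 1
```

Dividing `z_distance` by the size of the union of the two sets would give the Jaccard distance itself.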

Newick tree

 
[
	5JGJ_1:88.86,
	[
		4JTW_1:79,8QXL_1:79
	]:9.86
]
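The bracketed tree above can be written in standard parenthesized Newick notation. The sketch below assumes a simple nested representation, where each node is a pair of a label or child list and a branch length; this representation and the name `to_newick` are illustrative and not taken from the CoV2D code.

```python
# Each node is (label_or_children, branch_length); the root has length None.
tree = (
    [
        ("5JGJ_1", 88.86),
        ([("4JTW_1", 79), ("8QXL_1", 79)], 9.86),
    ],
    None,
)


def to_newick(node) -> str:
    """Serialize a nested (label_or_children, length) node without the final ';'."""
    label, length = node
    if isinstance(label, str):
        text = label  # leaf: just the taxon name
    else:
        text = "(" + ",".join(to_newick(child) for child in label) + ")"
    return text if length is None else f"{text}:{length:g}"


newick = to_newick(tree) + ";"
print(newick)  # (5JGJ_1:88.86,(4JTW_1:79,8QXL_1:79):9.86);
```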

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1202}{\log_{20} 1202}-\frac{576}{\log_{20}576}\right)\approx 160\), where \(1202=576+626\) is the combined length of the two sequences.
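The arithmetic behind this estimate can be checked directly. The factors 0.85 and 0.8 are taken from the formula as given, and 1202 is the combined length 576 + 626 of the two sequences; `lz_scale` is a hypothetical helper name.

```python
import math


def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """The normalizing quantity n / log_alphabet_size(n)."""
    return n / math.log(n, alphabet_size)


n1, n2 = 576, 626  # lengths of 4JTW_1 and 8QXL_1
expected = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n1))
print(expected)  # slightly above 160
```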
Status Protein1 Protein2 d d1/2
Query variables 4JTW_1 8QXL_1 208 198

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
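A sketch of how \(\delta\) interpolates between the two distances, under the assumption that min and max range over the two complexity gains \(c(xy)-c(x)\) and \(c(yx)-c(y)\), as in Otu and Sayood's definitions of \(d\) (the maximum) and \(d_1\) (the sum). The page does not spell this out, and the LZ76 parser used for \(c\) is likewise an assumed choice; all function names are hypothetical.

```python
def lz76_complexity(w: str) -> int:
    """Number of components in the LZ76 exhaustive history of w (assumed variant)."""
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


def gains(x: str, y: str) -> tuple[int, int]:
    """Complexity gained by appending y to x, and x to y."""
    c = lz76_complexity
    return c(x + y) - c(x), c(y + x) - c(y)


def delta_from_gains(g1: float, g2: float, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two gains."""
    return alpha * min(g1, g2) + (1 - alpha) * max(g1, g2)


def delta(x: str, y: str, alpha: float) -> float:
    return delta_from_gains(*gains(x, y), alpha)


# With d = 208 and d1/2 = 198 as tabulated, the two gains would be 188 and 208.
print(delta_from_gains(188, 208, 0))    # 208
print(delta_from_gains(188, 208, 0.5))  # 198.0
```

Setting `alpha=0` recovers the maximum gain and `alpha=0.5` the average of the two gains, matching the two cases in the display above.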