CoV2D Browser™

Parikh vectors
8URQ_1 3RDN_1 6XOL_1 Letter Amino acid
5 5 23 G Glycine
15 7 38 L Leucine
22 2 17 S Serine
8 8 15 V Valine
9 2 17 Q Glutamine
12 4 23 E Glutamic acid
12 5 23 I Isoleucine
4 7 12 M Methionine
13 7 22 T Threonine
8 7 11 P Proline
1 0 8 W Tryptophan
7 1 12 Y Tyrosine
7 6 31 A Alanine
8 1 19 R Arginine
13 6 23 N Asparagine
13 2 28 D Aspartic acid
4 0 3 C Cysteine
1 0 9 H Histidine
10 4 31 K Lysine
10 0 21 F Phenylalanine
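The Parikh vector of a sequence is its vector of letter counts. A minimal sketch of how one column of the table above could be recomputed, assuming the row order G, L, S, ... , F shown there; the names `ALPHABET` and `parikh_vector` are illustrative and not part of the CoV2D code:

```python
from collections import Counter

# One-letter codes in the row order of the table above (assumed ordering).
ALPHABET = "GLSVQEIMTPWYARNDCHKF"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the letter counts of `sequence`, one entry per letter of `alphabet`."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur.
    return [counts[letter] for letter in alphabet]

# 3RDN_1 sequence as listed on this page.
SEQ_3RDN = "MNKASVVANQLIPINTALTLIMMKAEVVTPMGIPAEEIPNLVGMQVNRAVPLGTTLMPDMVKNYEDGTTSPGLK"

vector = parikh_vector(SEQ_3RDN)
print(dict(zip(ALPHABET, vector)))
```

The entries sum to the sequence length (74 for 3RDN_1), which gives a quick consistency check on each column.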

>8URQ_1|Chain A[auth F]|Meiotic recombination protein REC104|Saccharomyces cerevisiae S288C (559292)
>3RDN_1|Chain A|ANTIFREEZE PROTEIN RD3 TYPE III|Lycodichthys dearborni (8201)
>6XOL_1|Chain A|Lysozyme, DCN1-like protein 1 chimera|Enterobacteria phage T4 (10665)
Protein code $c$  LZ-complexity $\mathrm{LZ}(w)$  Length $n=|w|$  $\mathrm{LZ}(w)/(n/\log_{20} n)$  $p_w(1)$  $p_w(2)$  $p_w(3)$  Sequence $w=f(c)$
8URQ , Knot 85 182 0.81 40 130 178
MSIEEEDTNKITCTQDFLHQYFVTERVSIQFGLNNKTVKRINKDEFDKAVNCIMSWTNYPKPGLKRTASTYLLSNSFKKSATVSLPFILGDPVCMPKRVESNNNDTCLLYSDTLYDDPLIQRNDQAGDEIEDEFSFTLLRSEVNEIRPISSSSTAQILQSDYSALMYERQASNGSIFQFSSP
3RDN , Knot 40 74 0.77 32 64 71
MNKASVVANQLIPINTALTLIMMKAEVVTPMGIPAEEIPNLVGMQVNRAVPLGTTLMPDMVKNYEDGTTSPGLK
6XOL , Knot 162 386 0.83 40 216 363
MHHHHHHSSGVDLGTENLYFQSNAMNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWAEAAVNLAKSRWYNQTPNRTKRVITTFATGTWDAYKNLRKKLEQLYNRYKDPQDENKIGIDGIQQFCDDLALDPASISVLIIAWKFRAATQCEFSKQEFMDGMTELGCDSIEKLKAQIPKMEQELKEPGRFKDFYQFTFNFAKNPGQKGLDLEMAIAYWNLVLNGRFKFLDLWNKFLLEHHKRSIPKDTWNLLLDFSTMIADDMSNYDEEGAWPVLIDDFVEFARPQIAGTKSTTV
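The fourth column of the table above appears to divide LZ(w) by n/log_20 n, where 20 is the amino acid alphabet size. A small sketch of that normalisation, taking LZ(w) and n from the table as given; the function name `normalized_lz` is illustrative, and the LZ-complexity itself is not recomputed here:

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ(w), n) triples copied from the table above.
for code, lz, n in [("8URQ", 85, 182), ("3RDN", 40, 74), ("6XOL", 162, 386)]:
    print(code, f"{normalized_lz(lz, n):.3f}")
```

The printed values agree with the table's 0.81, 0.77 and 0.83 to within 0.01; the exact rounding convention used by the page is not known.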

Let $P_w(n)$ be the set of distinct subwords (intervals) of length $n$ in a word $w$. Let $p_w(n)$ be the cardinality of $P_w(n)$. Let $f(c)$ be the sequence in FASTA format with 4-symbol Protein Data Bank code $c$.

$|P_{f(\mathrm{8URQ}_1)}(2) \setminus P_{f(\mathrm{3RDN}_1)}(2)| = 102$, $|P_{f(\mathrm{3RDN}_1)}(2) \setminus P_{f(\mathrm{8URQ}_1)}(2)| = 36$. Let $Z_k(x,y) = |P_x(k) \setminus P_y(k)| + |P_y(k) \setminus P_x(k)|$ be an LZ76-style (set of subwords) Jaccard distance numerator for $P(k)$.

Hydrophobic-polar version of Sequence 1:
10100000001000001100011000101011100001001000010011001101000101110001000110001000101011111101101100100000000110000100011100000110010001010110001001011000001011000001110000100101101001
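The hydrophobic-polar string replaces each residue by 1 (hydrophobic) or 0 (polar). A minimal sketch, assuming the hydrophobic set is A, F, G, I, L, M, P, V, W; that set is not stated on the page and was inferred by comparing the start of the binary string with the 8URQ_1 sequence, so it should be treated as an assumption:

```python
# Assumed hydrophobic residues (encoded as 1). Inferred from the page's
# binary string, not taken from any documented CoV2D definition.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' if assumed hydrophobic, else '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 23 residues of 8URQ_1.
print(hp_encode("MSIEEEDTNKITCTQDFLHQYFV"))
```

The output reproduces the first 23 symbols of the binary string for Sequence 1.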
Pair  $Z_2$  Length of longest common subword
8URQ_1,3RDN_1 138 4
8URQ_1,6XOL_1 194 3
3RDN_1,6XOL_1 200 3
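Both columns of this table can be sketched directly from the definitions. The code below assumes that Z_k counts the symmetric difference of the sets of length-k subwords, and that the last column measures the longest contiguous block shared by the two sequences (for example, PGLK occurs in both 8URQ_1 and 3RDN_1). The function names are illustrative:

```python
def subwords(word, k):
    """Set of distinct length-k subwords (contiguous factors) of `word`."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def z_k(x, y, k):
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length ending at y[j-1]
    for a in x:
        curr = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

# Toy words: the 2-subword sets differ only in "bb".
print(z_k("abab", "abba", 2))
# Toy words sharing the block "bcd".
print(longest_common_subword("abcde", "xbcdy"))
```

Applied to the FASTA sequences on this page with k = 2, `z_k` would be the quantity tabulated as Z2 under these assumptions.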

Newick tree

 
[
	6XOL_1:10.54,
	[
		8URQ_1:69,3RDN_1:69
	]:37.54
]
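The tree above is written with square brackets, which standard Newick reserves for comments. A small sketch of converting it to parenthesised Newick for use in other tools; the helper name `to_newick` is illustrative, and the branch lengths are copied from the page unchanged:

```python
# Bracketed tree as displayed above, flattened onto one line.
bracket_tree = "[ 6XOL_1:10.54, [ 8URQ_1:69,3RDN_1:69 ]:37.54 ]"

def to_newick(text):
    """Strip whitespace, swap brackets for parentheses, add the closing semicolon."""
    compact = "".join(text.split())
    return compact.replace("[", "(").replace("]", ")") + ";"

print(to_newick(bracket_tree))
```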

Let $d$ be the Otu–Sayood distance $d$.
Let $d_1$ be the Otu–Sayood distance $d_1$. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is $(0.85)(0.8)\left(\frac{256}{\log_{20} 256} - \frac{74}{\log_{20} 74}\right) = 59.0$
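The arithmetic behind this estimate can be checked directly, reading the expression as 0.85 · 0.8 · (256/log_20 256 − 74/log_20 74). Here 256 = 182 + 74 is the combined length of the two sequences and 74 is the length of the shorter one; the factors 0.85 and 0.8 are taken from the page as given, and the helper name `scaled_length` is illustrative:

```python
import math

def scaled_length(n, alphabet_size=20):
    """Return n / log_b n, the scale used to normalise LZ-complexity."""
    return n / math.log(n, alphabet_size)

n_concat, n_short = 256, 74  # 256 = 182 + 74
expected = 0.85 * 0.8 * (scaled_length(n_concat) - scaled_length(n_short))
print(round(expected, 1))
```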
Status  Protein1  Protein2  $d$  $d_1/2$
Query variables 8URQ_1 3RDN_1 75 52
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
$\delta = \alpha \min + (1-\alpha) \max = \begin{cases} d, & \alpha = 0, \\ d_1/2, & \alpha = 1/2. \end{cases}$
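The interpolation can be illustrated numerically. The sketch below assumes that min and max range over two directional terms whose maximum is d and whose sum is d_1; the values 75 and 29 are hypothetical, chosen only so that they reproduce d = 75 and d_1/2 = 52 from the query row above:

```python
def delta(alpha, a, b):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Hypothetical directional terms: max = 75 and (75 + 29) / 2 = 52.
a, b = 75, 29
print(delta(0, a, b))    # alpha = 0 recovers d
print(delta(0.5, a, b))  # alpha = 1/2 recovers d_1 / 2
```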