CoV2D Browser™

Parikh vectors
2DBB_1 8GMO_1 2JNI_1 Letter Amino acid
1 1 2 C Cysteine
0 4 0 H Histidine
16 23 1 L Leucine
3 21 0 F Phenylalanine
6 62 1 A Alanine
0 3 2 W Tryptophan
5 19 3 Y Tyrosine
12 31 4 V Valine
6 17 0 N Asparagine
12 29 0 D Aspartic acid
3 19 0 P Proline
13 18 6 R Arginine
3 24 0 Q Glutamine
11 26 0 E Glutamic acid
4 49 1 G Glycine
23 32 1 I Isoleucine
16 22 0 K Lysine
4 14 0 M Methionine
9 25 0 S Serine
4 26 0 T Threonine
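
Each column of the table above is a Parikh vector: the number of occurrences of every letter in the sequence. A minimal sketch of how such a column could be recomputed, assuming the 20-letter alphabet in the row order of the table; `parikh_vector` and `ALPHABET` are illustrative names, not taken from the CoV2D code:

```python
from collections import Counter

# Row order of the table above (chosen here for illustration only).
ALPHABET = "CHLFAWYVNDPRQEGIKMST"

def parikh_vector(w, alphabet=ALPHABET):
    """Return the number of occurrences in w of each letter of alphabet, in order."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

arenicin = "RWCVYAYVRIRGVLVRYRRCW"  # the 2JNI_1 sequence listed on this page
vector = parikh_vector(arenicin)
print(dict(zip(ALPHABET, vector)))
```

The entries sum to the sequence length, which gives a quick sanity check on any column.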

>2DBB_1|Chains A, B|Putative HTH-type transcriptional regulator PH0061|Pyrococcus horikoshii (70601)
>8GMO_1|Chains AA[auth 1], AC[auth 2], A[auth 0], BA[auth 4], BB[auth 5], BC[auth 6], B[auth 3], CA[auth 8], CB[auth 9], CC[auth AA], C[auth 7], DA[auth AC], DB[auth AD], DC[auth AE], D[auth AB], EA[auth AG], EB[auth AH], EC[auth AI], E[auth AF], FA[auth AK], FB[auth AL], FC[auth AM], F[auth AJ], GA[auth AO], GB[auth AP], GC[auth AQ], G[auth AN], HA[auth AS], HB[auth AT], HC[auth AU], H[auth AR], IA[auth AW], IB[auth AX], IC[auth AY], I[auth AV], JA[auth Aa], JB[auth Ab], JC[auth Ac], J[auth AZ], KA[auth Ad], KB[auth Ae], KC[auth Af], LA[auth Ah], LB[auth Ai], LC[auth Aj], L[auth Ag], MA[auth Al], MB[auth Am], MC[auth G], M[auth Ak], NA[auth J], NB[auth K], NC[auth L], N[auth H], OA[auth N], OB[auth O], OC[auth P], O[auth M], PA[auth R], PB[auth S], PC[auth T], P[auth Q], QA[auth V], QB[auth W], QC[auth X], Q[auth U], RA[auth Z], RB[auth b], RC[auth c], R[auth Y], SA[auth e], SB[auth f], SC[auth g], S[auth d], TA[auth i], TB[auth j], TC[auth k], T[auth h], UA[auth m], U[auth l], VA[auth o], VB[auth p], V[auth n], WA[auth r], WB[auth s], W[auth q], XA[auth u], XB[auth v], X[auth t], YB[auth x], Y[auth w], ZB[auth z], Z[auth y]|Mature major capsid protein|Tequatrovirus T4 (10665)
>2JNI_1|Chain A|Arenicin-2|Arenicola marina (6344)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2DBB 70 151 0.77 36 101 140
MDCMRKLDRVDMQLVKILSENSRLTYRELADILNTTRQRIARRIDKLKKLGIIRKFTIIPDIDKLGYMYAIVLIKSKVPSDADKVISEISDIEYVKSVEKGVGRYNIIVRLLLPKDIKDAENLISEFLQRIKNAENVEVILISEVRKFEII
8GMO 186 465 0.82 40 227 429
AEIGGDHGYNATNIAAGQTSGAVTQIGPAVMGMVRRAIPNLIAFDICGVQPMNSPTGQVFALRAVYGKDPVAAGAKEAFHPMYGPDAMFSGQGAAKKFPALAASTQTTVGDIYTHFFQETGTVYLQASVQVTIDAGDEDEDEDEDATDAAKLDAEIKKQMEAGALVEIAEGMATSIAELQEGFNGSTDNPWNEMGFRIDKQVIEAKSRQLKAAYSIELAQDLRAVHGMDADAELSGILATEIMLEINREVVDWINYSAQVGKSGMTLTPGSKAGVFDFQDPIDIRGARWAGESFKALLFQIDKEAVEIARQTGRGEGNFIIASRNVVNVLASVDTGISYAAQGLATGFSTDTTKSVFAGVLGGKYRVYIDQYAKQDYFTVGYKGPNEMDAGIYYAPYVALTPLRGSDPKNFQPVMGFKTRYGIGINPFAESAAQAPASRIQSGMPSILNSLGKNAYFRRVYVKGI
2JNI 13 21 0.62 18 19 19
RWCVYAYVRIRGVLVRYRRCW

Let \(P_w(k)\) be the set of distinct subwords (contiguous intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
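
The subword counts and the LZ-complexity column can be sketched in a few lines. This is a hedged illustration, not the site's implementation: it assumes that \(\mathrm{LZ}(w)\) is the number of phrases in the LZ76 (exhaustive history) parsing and that the normalization uses a 20-letter alphabet. The function names are hypothetical.

```python
import math

def subword_count(w, k):
    """Number of distinct length-k subwords (contiguous substrings) of w."""
    return len({w[i:i + k] for i in range(len(w) - k + 1)})

def lz76_complexity(w):
    """Number of phrases in the LZ76 parsing of w (assumed definition of LZ(w))."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs in the text before its last letter.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the table header."""
    n = len(w)
    return lz76_complexity(w) / (n / math.log(n, alphabet_size))

arenicin = "RWCVYAYVRIRGVLVRYRRCW"  # 2JNI sequence
print(lz76_complexity(arenicin), subword_count(arenicin, 2), subword_count(arenicin, 3))
```

For the 2JNI sequence this sketch gives an LZ76 complexity of 13 and 19 distinct subwords for both \(k=2\) and \(k=3\), in agreement with the corresponding table entries.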

\(|P_{f(2DBB_1)}(2) \setminus P_{f(8GMO_1)}(2)|=24\), \(|P_{f(8GMO_1)}(2) \setminus P_{f(2DBB_1)}(2)|=150\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For this pair, \(Z_2=24+150=174\).

Hydrophobic-polar version of Sequence 1:
1001001001010110110000010000110110000001100100100111100101110100110101111100011001001100100100100100111000111011110010010011001100100100101111001001011
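
The hydrophobic-polar string can be reproduced by mapping each residue to 1 or 0. The page does not state the mapping, so the set below is only inferred by aligning the displayed string with the start of the 2DBB sequence. H and W do not occur in that sequence, so treating them as 0 here is an assumption, and `hp_encode` is a hypothetical name.

```python
# Letters that appear as 1 when the displayed string is aligned with 2DBB.
# H and W are absent from 2DBB; mapping them to 0 is an assumption.
HYDROPHOBIC = set("AFGILMPV")

def hp_encode(w, hydrophobic=HYDROPHOBIC):
    """Return the binary word with 1 for hydrophobic letters and 0 otherwise."""
    return "".join("1" if a in hydrophobic else "0" for a in w)

prefix = "MDCMRKLDRVDMQLVKILSE"  # first 20 residues of 2DBB
print(hp_encode(prefix))
```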
Pair \(Z_2\) Length of longest common subsequence
2DBB_1,8GMO_1 174 4
2DBB_1,2JNI_1 98 2
8GMO_1,2JNI_1 216 3
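
The \(Z_k\) column is the size of the symmetric difference of two subword sets. A minimal sketch under that reading, shown on toy words rather than the sequences above; the function names are illustrative:

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous substrings) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): subwords of length k occurring in exactly one of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# "ABC" has subwords {AB, BC}; "BCD" has {BC, CD}; only AB and CD are unshared.
print(z_distance("ABC", "BCD", 2))
```

Equivalently, \(Z_k(x,y)=p_x(k)+p_y(k)-2\,|P_x(k)\cap P_y(k)|\), which is sometimes easier to check against the tabulated subword counts.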

Newick tree

 
(8GMO_1:10.64,(2DBB_1:49,2JNI_1:49):60.64);

Let \(d\) be the Otu–Sayood distance of that name.
Let \(d_1\) be the Otu–Sayood distance of that name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{616 }{\log_{20} 616}-\frac{151}{\log_{20}151})\approx 134.\)
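
The arithmetic in this estimate can be checked directly. In the sketch below, 616 is read as \(151+465\), the combined length of the two query sequences; that reading, like the helper name `scaled_length`, is an assumption rather than something stated on the page.

```python
import math

def scaled_length(n, alphabet_size=20):
    """n / log_20(n), the normalizing quantity used in the LZ-complexity ratio."""
    return n / math.log(n, alphabet_size)

# Constants 0.85 and 0.8 are copied from the formula above.
expected = 0.85 * 0.8 * (scaled_length(151 + 465) - scaled_length(151))
print(round(expected))
```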
Status Protein1 Protein2 d d1/2
Query variables 2DBB_1 8GMO_1 169 109.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
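
The interpolation \(\delta\) can be sketched as follows. This is a hedged illustration: it assumes that min and max range over the two complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), in line with Otu and Sayood's definitions of \(d\) as their maximum and \(d_1\) as their sum, and that \(c\) is the LZ76 phrase count. The names `increments` and `delta` are hypothetical.

```python
def lz76_complexity(w):
    """Number of phrases in the LZ76 parsing of w (assumed complexity measure c)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs in the text before its last letter.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def increments(x, y, c=lz76_complexity):
    """Return the pair (c(xy) - c(x), c(yx) - c(y))."""
    return c(x + y) - c(x), c(y + x) - c(y)

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "RWCVYAYVRIRGVLVRYRRCW", "AAAAAA"
print(delta(x, y, 0), delta(x, y, 0.5))
```

With \(\alpha=0\) the function returns the larger increment, and with \(\alpha=1/2\) it returns half the sum of the two, matching the two cases of the displayed formula.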