CoV2D Browser™

Parikh vectors
5WJZ_1 1JPH_1 2SGF_1 Letter Amino acid
22 17 7 D Aspartic acid
24 22 2 Q Glutamine
22 11 7 I Isoleucine
24 13 22 S Serine
14 26 14 V Valine
1 6 4 C Cysteine
4 21 2 H Histidine
29 43 7 L Leucine
39 39 15 A Alanine
16 27 2 E Glutamic acid
15 12 1 K Lysine
8 11 2 M Methionine
2 26 5 P Proline
17 17 28 T Threonine
14 25 8 R Arginine
27 6 10 N Asparagine
20 33 32 G Glycine
5 17 5 F Phenylalanine
0 5 2 W Tryptophan
1 11 10 Y Tyrosine
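A Parikh vector records how many times each letter occurs in a word. Each column of the table sums to its sequence's length (304, 388 and 185). The following is a minimal Python sketch. It assumes the 20 standard one-letter codes, listed in the row order of the table. The names `AMINO_ACIDS` and `parikh_vector` are illustrative and are not taken from the CoV2D browser.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes, in the row order of the table.
AMINO_ACIDS = "DQISVCHLAEKMPTRNGFWY"


def parikh_vector(seq: str, alphabet: str = AMINO_ACIDS) -> list[int]:
    """Return the number of occurrences of each letter of `alphabet` in `seq`."""
    counts = Counter(seq)
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    # First ten residues of 5WJZ_1; print only the nonzero entries.
    vector = parikh_vector("MRINHNIAAL")
    print({a: k for a, k in zip(AMINO_ACIDS, vector) if k})
```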

>5WJZ_1|Chains A, AA[auth B], BA[auth D], B[auth C], CA[auth F], C[auth E], DA[auth H], D[auth G], EA[auth J], E[auth I], FA[auth L], F[auth K], GA[auth N], G[auth M], HA[auth P], H[auth O], IA[auth R], I[auth Q], JA[auth T], J[auth S], KA[auth V], K[auth U], LA[auth X], L[auth W], MA[auth Z], M[auth Y], NA[auth b], N[auth a], OA[auth d], O[auth c], PA[auth f], P[auth e], QA[auth h], Q[auth g], RA[auth j], R[auth i], SA[auth l], S[auth k], TA[auth n], T[auth m], U[auth o], V[auth p], W[auth q], X[auth r], Y[auth s], Z[auth t]|Flagellin|Bacillus subtilis (1423)
>1JPH_1|Chain A|UROPORPHYRINOGEN DECARBOXYLASE|Homo sapiens (9606)
>2SGF_1|Chain A[auth E]|Streptogrisin B|Streptomyces griseus (1911)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5WJZ, Knot 127 304 0.79 38 163 285
MRINHNIAALNTLNRLSSNNSASQKNMEKLSSGLRINRAGDDAAGLAISEKMRGQIRGLEMASKNSQDGISLIQTAEGALTETHAILQRVRELVVQAGNTGTQDKATDLQSIQDGISALTDEIDGISNRTEFNGKKLLDGTYKVDTATPANQKNLVFQIGANATQQISVNIEDMGADALGIKEADGSIAALHSVNDLDVTKFADNAADCADIGFDAQLKVVDEAINQVSSQRAKLGAVQNRLEHTINNLSASGENLTAAESRIRDVDMAKEMSEFTKNNILSQASQAMLAQANQQPQNVLQLLR
1JPH, Knot 158 388 0.81 40 207 356
MGHHHHHHHHHHSSGHIEGRHMEANGLGPQGFPELKNDTFLRAAWGEETDYTPVWCMRQAGRYLPEFRETRAAQDFFSTCRSPEACCELTLQPLRRFPLDAAIIFSDILVVPQALGMEVTMVPGKGPSFPEPLREEQDLERLRDPEVVASELGYVFQAITLTRQRLAGRVPLIGFAGAPWTLMTYMVEGGGSSTMAQAKRWLYQRPQASHQLLRILTDALVPYLVGQVVAGAQALQLFESHAGHLGPQLFNKFALPYIRDVAKQVKARLREAGLAPVPMITFAKDGHFALEELAQAGYEVVGLDWTVAPKKARECVGKTVTLQGNLDPCALYASEEEIGQLVKQMLDDFGPHRYIANLGHGLYPDMDPEHVGAFVDAVHKHSRLLRQN
2SGF, Knot 82 185 0.77 40 117 171
ISGGDAIYSSTGRCSLGFNVRSGSTYYFLTAGHCTDGATTWWANSARTTVLGTTSGSSFPNNDYGIVRYTNTTIPKDGTVGGQDITSAANATVGMAVTRRGSTTGTHSGSVTALNATVNYGGGDVVYGMIRTNVCAEPGDSGGPLYSGTRAIGLTSGGSGNCSSGGTTFFQPVTEALSAYGVSVY
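The LZ-complexity column counts phrases in a Lempel-Ziv (1976) parsing. The sketch below assumes the common exhaustive-history convention, in which a final, possibly non-new phrase is also counted. The browser's exact convention is not stated here, so its counts could differ slightly from this sketch's.

The sketch also evaluates the ratio \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the table's own LZ and length values. The table appears to truncate rather than round: for 5WJZ the ratio is about 0.797, but it is shown as 0.79. The function names are hypothetical.

```python
import math


def lz76_complexity(s: str) -> int:
    """Count phrases in the Lempel-Ziv (1976) exhaustive-history parsing of s.

    A phrase is extended while it can still be copied from a start position
    earlier in s (the copy may overlap the phrase); the first symbol that
    breaks the copy ends the phrase. A final incomplete phrase also counts.
    """
    n = len(s)
    phrases = 0
    i = 0  # start of the current phrase
    while i < n:
        k = 1
        while i + k <= n and s[i:i + k] in s[:i + k - 1]:
            k += 1
        phrases += 1
        i += k
    return phrases


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_k n), the ratio column of the table."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))  # 6: 0|001|10|100|1000|101
    # (code, LZ(w), n) as reported in the table
    for code, lz, n in [("5WJZ", 127, 304), ("1JPH", 158, 388), ("2SGF", 82, 185)]:
        print(code, f"{normalized_lz(lz, n):.3f}")
```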

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-character Protein Data Bank code \(c\).
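The following is a direct Python rendering of these definitions, for illustration only; the function names are hypothetical. It collects every length-\(n\) window of \(w\) into a set, so \(p_w(n)\) is that set's size.

```python
def subword_set(w: str, n: int) -> set[str]:
    """P_w(n): the distinct subwords (contiguous intervals) of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_count(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subword_set(w, n))


if __name__ == "__main__":
    w = "MRINHNIAAL"  # first ten residues of 5WJZ_1
    print([subword_count(w, n) for n in (1, 2, 3)])
```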

\(|P_{f(5WJZ_1)}(2) \setminus P_{f(1JPH_1)}(2)|=70\) and \(|P_{f(1JPH_1)}(2) \setminus P_{f(5WJZ_1)}(2)|=114\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for the sets \(P_x(k)\) and \(P_y(k)\). For example, \(Z_2(f(5WJZ_1), f(1JPH_1)) = 70 + 114 = 184\).

Hydrophobic-polar version of Sequence 1 (5WJZ_1):
1010001111001001000001000010010011010011001111110001010101101100000011011001011100001110010011101100100001001001001101100010110000010100110100010010110000111011101000101010011101111001010111100100101001100110010111010101100110010000101111000100010010101001011000100101100100100001100100111101000100110110
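The hydrophobic-polar string for Sequence 1 (5WJZ_1) matches the following rule, checked residue by residue against the sequence:

- A, F, G, I, L, M, P and V are hydrophobic (1).
- Every other residue, including C and Y, is polar (0).

The sketch below uses that set. W does not occur in 5WJZ_1, so counting it as hydrophobic is an assumption. `HYDROPHOBIC` and `hp_string` are hypothetical names.

```python
# Residues written as 1 in the 5WJZ_1 HP string; including W is an assumption
# (W does not occur in 5WJZ_1).
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp_string(seq: str, hydrophobic: frozenset = HYDROPHOBIC) -> str:
    """Map each residue to '1' if hydrophobic, else '0' (polar)."""
    return "".join("1" if residue in hydrophobic else "0" for residue in seq)


if __name__ == "__main__":
    print(hp_string("MRINHNIAAL"))  # first ten residues of 5WJZ_1; prints 1010001111
```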
Pair \(Z_2\) Length of longest common subword
5WJZ_1,1JPH_1 184 4
5WJZ_1,2SGF_1 154 4
1JPH_1,2SGF_1 194 3
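This sketch computes both columns. The length column is read as the longest common *contiguous* subword. A common subsequence of two proteins this long would be far longer than 4.

\(Z_k\) is the size of a symmetric difference, so \(Z_k(x,y)=p_x(k)+p_y(k)-2|P_x(k)\cap P_y(k)|\). The tables are consistent with this:

- \(163-70=207-114=93\) bigrams are shared by 5WJZ_1 and 1JPH_1.
- Therefore \(163+207-2\cdot 93=184\), matching \(Z_2\) above.

The longest common subword uses the standard dynamic program over common-suffix lengths. The function names are hypothetical.

```python
def subword_set(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y): number of length-k subwords occurring in exactly one of x, y."""
    return len(subword_set(x, k) ^ subword_set(y, k))


def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous string occurring in both x and y.

    run[j] is the length of the longest common suffix of x[:i] and y[:j].
    """
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        run = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                run[j] = prev[j - 1] + 1
                best = max(best, run[j])
        prev = run
    return best


if __name__ == "__main__":
    x, y = "MRINHNIAAL", "SIQDGISALT"  # two fragments of 5WJZ_1
    print(z_distance(x, y, 2), longest_common_subword(x, y))
```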

Newick tree

 
(
	1JPH_1:99.69,
	(
		5WJZ_1:77,2SGF_1:77
	):22.69
);

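Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\), where \(xy\) is the concatenation of \(x\) and \(y\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)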
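A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{692}{\log_{20} 692}-\frac{304}{\log_{20} 304}\right)\approx 107\), where \(692 = 304 + 388\) is the length of the concatenation of the two sequences.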
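The estimate can be reproduced in a few lines of Python. The factors 0.85 and 0.8 are taken as given from the text. \(n/\log_{20} n\) is the same normalization used in the ratio column of the LZ table. The helper name `lz_scale` is hypothetical. The result is about 107.2.

```python
import math


def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_k n, the normalization used for LZ76 complexity."""
    return n / math.log(n, alphabet_size)


n_5wjz, n_1jph = 304, 388  # sequence lengths from the table
# Presumably estimates LZ(xy) - LZ(x) from the growth of n / log_20 n
# between |x| = 304 and |xy| = 692.
estimate = 0.85 * 0.8 * (lz_scale(n_5wjz + n_1jph) - lz_scale(n_5wjz))
print(f"{estimate:.1f}")
```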
Status Protein 1 Protein 2 \(d\) \(d_1/2\)
Query variables 5WJZ_1 1JPH_1 135 120
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
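This sketch of \(\delta\) assumes the Otu–Sayood definitions:

- \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\)
- \(d_1(x,y)\) is the sum of the same two terms.

The two cases above require exactly this: \(\alpha=0\) keeps the maximum, and \(\alpha=1/2\) gives half the sum. Under these definitions, the query row (\(d=135\), \(d_1/2=120\)) implies that the smaller term is \(2\cdot 120-135=105\).

The sketch reuses an exhaustive-history LZ76 phrase count, whose convention may differ slightly from the browser's. All function names are hypothetical.

```python
def lz76_complexity(s: str) -> int:
    """Phrase count of the Lempel-Ziv (1976) exhaustive-history parsing of s."""
    phrases, i, n = 0, 0, len(s)
    while i < n:
        k = 1
        # Extend while s[i:i+k] can be copied from a start position before i.
        while i + k <= n and s[i:i + k] in s[:i + k - 1]:
            k += 1
        phrases += 1
        i += k
    return phrases


def gains(x: str, y: str) -> tuple[int, int]:
    """The two terms LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    return (lz76_complexity(x + y) - lz76_complexity(x),
            lz76_complexity(y + x) - lz76_complexity(y))


def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two terms."""
    a, b = gains(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


def otu_sayood_d(x: str, y: str) -> int:
    return max(gains(x, y))


def otu_sayood_d1(x: str, y: str) -> int:
    return sum(gains(x, y))


if __name__ == "__main__":
    x, y = "MRINHNIAAL", "SIQDGISALT"  # two fragments of 5WJZ_1
    print(otu_sayood_d(x, y), otu_sayood_d1(x, y) / 2, delta(x, y, 0.5))
```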