CoV2D Browser™

Parikh vectors
5WJV_1 4IBK_1 6DDG_1 Letter Amino acid
22 10 1 D Aspartic acid
29 11 4 L Leucine
5 5 1 F Phenylalanine
0 1 2 W Tryptophan
15 7 7 V Valine
38 9 5 A Alanine
24 9 2 Q Glutamine
17 2 0 E Glutamic acid
4 5 1 H Histidine
8 2 1 M Methionine
27 2 4 N Asparagine
22 13 1 I Isoleucine
2 11 1 P Proline
24 8 5 S Serine
14 7 8 R Arginine
19 10 5 G Glycine
15 9 9 K Lysine
17 3 4 T Threonine
1 1 0 Y Tyrosine
1 4 1 C Cysteine
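Each column of the table is the Parikh vector of one sequence, that is, the number of occurrences of every letter. A minimal Python sketch, assuming the 20 standard one-letter codes listed in the table's row order (the function name `parikh_vector` is illustrative, not taken from the site):

```python
from collections import Counter

# Row order of the table above; any fixed order of the 20 residues would do.
ALPHABET = "DLFWVAQEHMNIPSRGKTYC"


def parikh_vector(seq: str) -> list[int]:
    """Return how often each letter of ALPHABET occurs in seq."""
    counts = Counter(seq)
    return [counts[letter] for letter in ALPHABET]


seq_6ddg = "MGKQCFVTGRKASTGNRRSHALNSTKRRWNANLQKVRILVDGKPKKVWVSARALKSGKVTRV"
vec = parikh_vector(seq_6ddg)
for letter, count in zip(ALPHABET, vec):
    print(letter, count)
```

For 6DDG_1 this reproduces the third column of the table, and the entries sum to the sequence length 62.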

>5WJV_1|Chains A, AA[auth B], BA[auth D], B[auth C], CA[auth F], C[auth E], DA[auth H], D[auth G], EA[auth J], E[auth I], FA[auth L], F[auth K], GA[auth N], G[auth M], HA[auth P], H[auth O], IA[auth R], I[auth Q], JA[auth T], J[auth S], KA[auth V], K[auth U], LA[auth X], L[auth W], MA[auth Z], M[auth Y], NA[auth b], N[auth a], OA[auth d], O[auth c], PA[auth f], P[auth e], QA[auth h], Q[auth g], RA[auth j], R[auth i], SA[auth l], S[auth k], TA[auth n], T[auth m], U[auth o], V[auth p], W[auth q], X[auth r], Y[auth s], Z[auth t]|Flagellin|Bacillus subtilis (1423)
>4IBK_1|Chains A, B|Polymerase cofactor VP35|Ebola virus (128952)
>6DDG_10|Chain J|50S ribosomal protein L28|Staphylococcus aureus (1280)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5WJV 128 304 0.80 19 164 286
MRINHNIAALNTLNRLSSNNSASQKNMEKLSSGLRINRAGDDAAGLAISEKMRGQIRGLEMASKNSQDGISLIQTAEGALTETHAILQRVRELVVQAGNTGTQDKATDLQSIQDEISALTDEIDGISNRTEFNGKKLLDGTYKVDTATPANQKNLVFQIGANATQQISVNIEDMGADALGIKEADGSIAALHSVNDLDVTKFADNAADCADIGFDAQLKVVDEAINQVSSQRVKLGAVQNRLEHTINNLSASGENLTAAESRIRDVDMAKEMSEFTKNNILSQASQAMLAQANQQPQNVLQLLR
4IBK 66 129 0.82 20 104 127
GHMGKPDISAKDLRNIMYDHLPGFGTAFHQLVQVICKLGKDSNSLDIIHAEFQASLAEGDSPQCALIQITKRVPIFQDAAPPVIHIRSRGDIPRACQKSLRPVPPSPKIDRGWVCVFQLQDGKTLGLKI
6DDG 34 62 0.75 18 52 60
MGKQCFVTGRKASTGNRRSHALNSTKRRWNANLQKVRILVDGKPKKVWVSARALKSGKVTRV
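The LZ-complexity column counts phrases in a Lempel–Ziv (1976) parsing, and the ratio column divides that count by \(n/\log_{20} n\). The sketch below assumes the exhaustive-history LZ76 parsing, in which each phrase is the shortest prefix of the remaining text that does not already occur earlier (overlaps allowed); the site does not state which variant it uses, so its counts may differ from this sketch. The names `lz76` and `normalized_lz` are hypothetical.

```python
import math


def lz76(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w.

    A phrase starting at i is extended while w[i:i + length] still occurs in
    w[:i + length - 1] (overlap allowed); the first extension that does not
    occur closes the phrase. A final phrase reaching the end is counted too.
    """
    n, i, c = len(w), 0, 0
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        c += 1  # phrase w[i:i + length], truncated at the end of w
        i += length
    return c


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_k n) with k = alphabet_size."""
    return lz * math.log(n, alphabet_size) / n


print(lz76("0001101001000101"))  # 6: 0 . 001 . 10 . 100 . 1000 . 101
print(round(normalized_lz(128, 304), 2))  # 0.8, the 5WJV row
```

The binary string is the classic Lempel–Ziv example. Depending on whether the ratio is rounded or truncated, its last digit may differ from the table.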

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
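The definitions of \(P_w(n)\) and \(p_w(n)\) translate directly into a set comprehension over all windows of length \(n\). A sketch with illustrative function names:

```python
def subword_set(w: str, n: int) -> set[str]:
    """P_w(n): the distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_count(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subword_set(w, n))


seq_6ddg = "MGKQCFVTGRKASTGNRRSHALNSTKRRWNANLQKVRILVDGKPKKVWVSARALKSGKVTRV"
print(subword_count(seq_6ddg, 1))  # 18: neither E nor Y occurs
print(sorted(subword_set("abab", 2)))  # ['ab', 'ba']
```

In particular \(p_w(1)\) is the number of distinct letters in \(w\), so it is at most the alphabet size.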

\(|P_{f(5WJV_1)}(2) \setminus P_{f(4IBK_1)}(2)|=110\) and \(|P_{f(4IBK_1)}(2) \setminus P_{f(5WJV_1)}(2)|=50\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\); for example, \(Z_2(f(5WJV_1),f(4IBK_1))=110+50=160\).

Hydrophobic-polar (HP) version of Sequence 1 (5WJV_1), with 1 = hydrophobic and 0 = polar:
1010001111001001000001000010010011010011001111110001010101101100000011011001011100001110010011101100100001001001000101100010110000010100110100010010110000111011101000101010011101111001010111100100101001100110010111010101100110010000101111000100010010101001011000100101100100100001100100111101000100110110
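\(Z_k\) is the size of the symmetric difference of the two subword sets; dividing it by the size of their union gives the Jaccard distance. The sketch also includes an HP mapping. Its hydrophobic class {A, C, F, G, I, L, M, P, V, W} is an assumption chosen to agree with the opening of the HP string above, because the site does not state its residue classes (C and W in particular are guesses). All function names are hypothetical.

```python
def subword_set(w: str, k: int) -> set[str]:
    """P_w(k): distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) - P_y(k)| + |P_y(k) - P_x(k)|."""
    px, py = subword_set(x, k), subword_set(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by the size of the union of P_x(k) and P_y(k)."""
    union = subword_set(x, k) | subword_set(y, k)
    return z_k(x, y, k) / len(union) if union else 0.0


# ASSUMPTION: hydrophobic residue class; not documented by the site.
HYDROPHOBIC = set("ACFGILMPVW")


def hp(seq: str) -> str:
    """Map each residue to '1' (assumed hydrophobic) or '0' (polar)."""
    return "".join("1" if r in HYDROPHOBIC else "0" for r in seq)


print(z_k("abcab", "abd", 2))  # 3: {bc, ca} versus {bd}
print(jaccard_distance("abcab", "abd", 2))  # 0.75
print(hp("MRINHNIAAL"))  # 1010001111
```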
Pair \(Z_2\) Length of longest common substring
5WJV_1,4IBK_1 160 3
5WJV_1,6DDG_1 152 3
4IBK_1,6DDG_1 112 3
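A common subsequence may skip letters, while a common substring may not, so the two lengths can differ greatly: for 6DDG_1 and 4IBK_1 the longest common subsequence already exceeds 3 (for example MGKRN), whereas both sequences share the contiguous run DGK. A sketch of the two standard dynamic programs (function names are illustrative):

```python
def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous run shared by x and y (O(|x||y|) DP)."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: longest common suffix of x[:i-1], y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def longest_common_subsequence(x: str, y: str) -> int:
    """Length of the longest common subsequence (letters may be skipped)."""
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


seq_4ibk = "GHMGKPDISAKDLRNIMYDHLPGFGTAFHQLVQVICKLGKDSNSLDIIHAEFQASLAEGDSPQCALIQITKRVPIFQDAAPPVIHIRSRGDIPRACQKSLRPVPPSPKIDRGWVCVFQLQDGKTLGLKI"
seq_6ddg = "MGKQCFVTGRKASTGNRRSHALNSTKRRWNANLQKVRILVDGKPKKVWVSARALKSGKVTRV"
print(longest_common_substring(seq_6ddg, seq_4ibk))
print(longest_common_subsequence(seq_6ddg, seq_4ibk))
```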

Newick tree

 
(
	5WJV_1:84.09,
	(
		6DDG_1:56,4IBK_1:56
	):28.09
);
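To read distances back off the tree, here is a minimal sketch that assumes standard Newick syntax (parenthesized clades, `name:length`, a terminating `;`) and handles neither quoted labels nor comments. It computes each leaf's depth from the root and the path length between two leaves; the function names are hypothetical.

```python
def parse_newick(text: str):
    """Parse Newick into nested (name, branch_length, children) tuples."""
    s = "".join(text.split()).rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1  # skip "("
            children.append(node())
            while s[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip ")"
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name, length = s[start:pos], 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return name, length, children

    return node()


def leaf_trails(tree):
    """Map each leaf to its root-to-leaf list of (address, cumulative depth)."""
    trails = {}

    def walk(node, address, depth, trail):
        name, length, children = node
        depth += length
        trail = trail + [(address, depth)]
        if not children:
            trails[name] = trail
        for i, child in enumerate(children):
            walk(child, address + (i,), depth, trail)

    walk(tree, (), 0.0, [])
    return trails


def patristic(trails, a, b):
    """Sum of branch lengths on the path between leaves a and b."""
    lca_depth = 0.0
    for (addr_a, depth_a), (addr_b, _) in zip(trails[a], trails[b]):
        if addr_a != addr_b:
            break
        lca_depth = depth_a  # deepest shared ancestor so far
    return trails[a][-1][1] + trails[b][-1][1] - 2 * lca_depth


trails = leaf_trails(parse_newick("(5WJV_1:84.09,(6DDG_1:56,4IBK_1:56):28.09);"))
for leaf, trail in trails.items():
    print(leaf, round(trail[-1][1], 2))  # each leaf at depth 84.09
print(round(patristic(trails, "6DDG_1", "4IBK_1"), 2))  # 112.0
```

All three leaves sit at depth 84.09, so the tree is ultrametric, and the path from 6DDG_1 to 4IBK_1 has length 56 + 56 = 112.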

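Let \(d(w,v)=\max\{\mathrm{LZ}(wv)-\mathrm{LZ}(w),\ \mathrm{LZ}(vw)-\mathrm{LZ}(v)\}\) be the Otu–Sayood distance \(d\).
Let \(d_1(w,v)=\mathrm{LZ}(wv)-\mathrm{LZ}(w)+\mathrm{LZ}(vw)-\mathrm{LZ}(v)\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)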
A rough estimate of the expected distance, with \(433=304+129\) the combined length of 5WJV_1 and 4IBK_1, is \((0.85)(0.8)\left(\frac{433}{\log_{20} 433}-\frac{129}{\log_{20}129}\right)\approx 91.2\).
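The arithmetic of this estimate can be checked directly. Here \(n/\log_{20} n\) is the same normalizing term as in the LZ-complexity table (the asymptotic LZ76 complexity of a random sequence over 20 letters), and the factors 0.85 and 0.8 are copied from the text as given. The helper name is hypothetical.

```python
import math


def random_lz_estimate(n: int, alphabet_size: int = 20) -> float:
    """n / log_k n, the normalizing term of the LZ-complexity table."""
    return n / math.log(n, alphabet_size)


n_pair = 304 + 129  # 5WJV_1 followed by 4IBK_1
estimate = 0.85 * 0.8 * (random_lz_estimate(n_pair) - random_lz_estimate(129))
print(round(estimate, 1))  # 91.2
```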
Status Protein 1 Protein 2 \(d\) \(d_1/2\)
Query variables 5WJV_1 4IBK_1 112 80.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
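\[ \delta= \alpha \min\{a,b\} + (1-\alpha) \max\{a,b\}= \begin{cases} d, &\alpha=0,\\ d_1/2, &\alpha=1/2, \end{cases} \]
where \(a=\mathrm{LZ}(wv)-\mathrm{LZ}(w)\) and \(b=\mathrm{LZ}(vw)-\mathrm{LZ}(v)\) for the two sequences \(w\) and \(v\).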
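A sketch of \(\delta\) as a weighted mix of the smaller and larger LZ increment, assuming the unnormalized Otu–Sayood definitions (\(d\) is the maximum of the two increments and \(d_1\) their sum). Under that assumption the table row for 5WJV_1, 4IBK_1 (\(d=112\), \(d_1/2=80.5\)) implies increments of 112 and \(161-112=49\). It reuses the same exhaustive-history LZ76 parsing assumed earlier; all function names are hypothetical.

```python
def lz76(w: str) -> int:
    """Phrase count of the LZ76 exhaustive-history parsing (overlap allowed)."""
    n, i, c = len(w), 0, 0
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        c += 1
        i += length
    return c


def increments(w: str, v: str) -> tuple[int, int]:
    """(LZ(wv) - LZ(w), LZ(vw) - LZ(v))."""
    return lz76(w + v) - lz76(w), lz76(v + w) - lz76(v)


def delta_from_increments(a: int, b: int, alpha: float) -> float:
    """alpha * min{a, b} + (1 - alpha) * max{a, b}."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


def otu_sayood_d(w: str, v: str) -> int:
    return max(increments(w, v))


def otu_sayood_d1(w: str, v: str) -> int:
    return sum(increments(w, v))


print(delta_from_increments(112, 49, 0.0))  # 112.0, i.e. d
print(delta_from_increments(112, 49, 0.5))  # 80.5, i.e. d1 / 2
```

Setting \(\alpha=0\) keeps only the larger increment, and \(\alpha=1/2\) averages the two, which is why \(d_1\) appears halved.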