CoV2D Browser™

Parikh vectors
5MSJ_1 6NBL_1 7LBO_1 Letter Amino acid
12 13 5 N Asparagine
12 23 4 Q Glutamine
12 26 5 I Isoleucine
26 42 11 L Leucine
9 20 7 T Threonine
1 5 3 W Tryptophan
15 32 13 A Alanine
31 33 20 E Glutamic acid
5 13 5 H Histidine
5 11 3 M Methionine
10 26 6 R Arginine
3 4 6 C Cysteine
11 25 7 G Glycine
14 30 10 P Proline
6 28 5 S Serine
18 24 2 V Valine
14 22 7 D Aspartic acid
30 11 16 K Lysine
8 18 11 F Phenylalanine
7 9 0 Y Tyrosine
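A Parikh vector simply records how many times each letter occurs in a word. The following is a minimal sketch of how one column of the table above could be computed; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the CoV2D code.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order used by the table above.
AMINO_ACIDS = "NQILTWAEHMRCGPSVDKFY"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter."""
    counts = Counter(sequence)
    # Letters that never occur get an explicit 0, as Y does for 7LBO_1.
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# Usage on the first 17 residues of 5MSJ_1.
vector = parikh_vector("MATLRVHPEAQAKVDVF")
print(vector["A"], vector["V"], vector["Y"])
```

Applied to a full FASTA sequence, the values of the returned dictionary sum to the sequence length.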

>5MSJ_1|Chains A, AA[auth B], BA[auth D], B[auth C], C[auth E], D[auth F], E[auth G], F[auth H], G[auth I], H[auth J], I[auth K], J[auth L], K[auth M], L[auth N], M[auth O], N[auth P], O[auth Q], P[auth R], Q[auth S], R[auth T], S[auth U], T[auth V], U[auth W], V[auth X], W[auth Y], X[auth Z], Y[auth a], Z[auth b]|Proteasome activator complex subunit 1|Mus musculus (10090)
>6NBL_1|Chains A, B|Camphor 5-monooxygenase|Pseudomonas putida (303)
>7LBO_1|Chains A, B|Baculoviral IAP repeat-containing protein 5|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5MSJ , Knot 114 249 0.84 40 158 237
MATLRVHPEAQAKVDVFREDLCSKTENLLGSYFPKKISELDAFLKEPALNEANLSNLKAPLDIPVPDPVKEKEKEERKKQQEKEEKEEKKKGDEDDKGPPCGPVNCNEKIVVLLQRLKPEIKDVTEQLNLVTTWLQLQIPRIEDGNNFGVAVQEKVFELMTNLHTKLEGFHTQISKYFSERGDAVAKAAKQPHVGDYRQLVHELDEAEYQEIRLMVMEIRNAYAVLYDIILKNFEKLKKPRGETKGMIY
6NBL , Knot 171 415 0.82 40 214 397
MTTETIQSNANLAPLPPHVPEHLVFDFDMYNPSNLSAGVQEAWAVLQESNVPDLVWTRSNGGHWIATRGQLIREAYEDYRHFSSESPFIPREAGEAYDFIPTSMDPPEQRQFRALANQVVGMPVVDKLENRIQELASSLIESLRPQGQCNFTEDYAEPFPIRIFMLLAGLPEEDIPHLKYLTDQMTRPDGSMTFAEAKEALYDYLIPIIEQRRQKPGTDAISIVANGQVNGRPITSDEAKRMCGLLLVGGLDTVVNFLSFSMEFLAKSPEHRQELIERPERIPAASEELLRRFSLVADGRILTSDYEFHGVQLKKGDQILLPQMLSGLDERENAAPMHVDFSRQCVSHTTFGHGSHLCLGQHLARREIIVTLKEWLTRIPDFSIAPGAQIQHKSGIVSGVQALPLVWDPATTKAV
7LBO , Knot 71 146 0.80 38 109 138
GSHEMGAPTLPPAWQPFLKDHRISTFKNWPFLEGCACTPERMAEAGFIHCPTENEPDLAQCFFCFKELEGWEPDDDPIEEHKKHSSGCAFLSVKKQFEELTLGEFLKLDRERAKNKIAKETNNKKKEFEETAKKVRRAIEQLAAMD
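The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below shows that normalization together with one common way of counting phrases in an LZ76-style exhaustive parsing. Whether the page uses exactly this parsing variant is an assumption, so the phrase counts it produces are not guaranteed to match the table; the function names are hypothetical.

```python
import math

def lz76_complexity(w: str) -> int:
    """Count phrases in an LZ76-style parsing (assumed variant)."""
    count, i, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it already occurs in the text that
        # precedes its last symbol (overlapping copies are allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_A n) for alphabet size A."""
    return lz / (n / math.log(n, alphabet_size))

# Normalization of the 5MSJ row, using the tabulated LZ(w) = 114, n = 249.
print(normalized_lz(114, 249))
```

With the tabulated values for 5MSJ the ratio comes out between 0.84 and 0.85, in line with the 0.84 shown in the table.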

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
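As an illustration of these definitions, the set of length-\(k\) subwords and its cardinality can be sketched as follows. The function names are hypothetical, and the sketch only follows the definition as stated here; it is not claimed to reproduce the page's tabulated \(p_w\) columns.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Usage on a toy word: "abab" has the length-2 subwords "ab" and "ba".
print(sorted(subwords("abab", 2)))
print(subword_complexity("MATLRV", 3))
```

A word of length \(m\) has at most \(m-k+1\) distinct subwords of length \(k\), which gives a quick sanity check on any computed value.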

\(|P_{f(5MSJ_1)}(2) \setminus P_{f(6NBL_1)}(2)|=48\), \(|P_{f(6NBL_1)}(2) \setminus P_{f(5MSJ_1)}(2)|=104\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110101010101010110001000000111001100100101110011100101001011101111011000000000000000000000010000011101110000011111001010100100010110011010110100100111110001101100100010110001000100010111011001011000011001001000010111101001011100111001001001010001110
Pair \(Z_2\) Length of longest common subword
5MSJ_1,6NBL_1 152 4
5MSJ_1,7LBO_1 159 3
6NBL_1,7LBO_1 197 3
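The two columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two subword sets, so for the first pair \(48+104=152\). The last column is treated here as the length of the longest common contiguous subword, which is an assumption suggested by the small values; all function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Return Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Usage on toy words.
print(z_distance("abc", "abd", 2))
print(longest_common_subword_length("MATLRV", "QATLK"))
```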

Newick tree

 
[
	7LBO_1:93.57,
	[
		5MSJ_1:76,6NBL_1:76
	]:17.57
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{664}{\log_{20} 664}-\frac{249}{\log_{20}249})\approx 116\).
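The arithmetic behind this estimate can be checked with a short sketch. Here 664 is read as \(249+415\), the summed lengths of 5MSJ_1 and 6NBL_1, which is an assumption; the constants 0.85 and 0.8 are taken from the text as given, and the function names are hypothetical.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_A(n), the normalizing scale for LZ-complexity."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_x: int, n_y: int, c1: float = 0.85, c2: float = 0.8) -> float:
    """Rough expected growth in LZ-complexity when y is appended to x."""
    return c1 * c2 * (lz_scale(n_x + n_y) - lz_scale(n_x))

# Lengths of 5MSJ_1 and 6NBL_1 from the table of sequences.
print(expected_distance(249, 415))
```

The result is approximately 116, matching the figure quoted in the text.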
Status Protein1 Protein2 d d1/2
Query variables 5MSJ_1 6NBL_1 147 117.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
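As a worked instance of the interpolation, the sketch below recovers the two terms from the query row (\(d=147\), \(d_1/2=117.5\)). It assumes, as the displayed cases imply, that \(d\) is the max term and that \(d_1\) is the sum of the min and max terms; the function and variable names are illustrative.

```python
def delta(min_term: float, max_term: float, alpha: float) -> float:
    """Interpolate between the max term (alpha = 0) and the min term (alpha = 1)."""
    return alpha * min_term + (1 - alpha) * max_term

# Values for the pair 5MSJ_1, 6NBL_1 from the query row.
d = 147.0
d1 = 2 * 117.5

# Assumption: d is the max term and d1 = min + max.
max_term = d
min_term = d1 - d

print(delta(min_term, max_term, 0.0))   # the alpha = 0 case, d
print(delta(min_term, max_term, 0.5))   # the alpha = 1/2 case, d1 / 2
```

Under these assumptions the min term for this pair is 88.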