CoV2D Browser™

Parikh vectors
5FJX_1 6CLM_1 5HZL_1 Letter Amino acid
19 0 26 A Alanine
7 0 1 R Arginine
21 0 29 I Isoleucine
24 0 18 D Aspartic acid
24 0 32 L Leucine
13 0 23 P Proline
30 1 17 S Serine
1 0 3 W Tryptophan
5 0 12 Y Tyrosine
19 0 12 V Valine
1 0 0 C Cysteine
22 0 23 E Glutamic acid
10 1 19 G Glycine
5 0 7 M Methionine
11 1 13 F Phenylalanine
16 0 22 T Threonine
11 3 27 N Asparagine
11 1 9 Q Glutamine
5 0 4 H Histidine
15 0 6 K Lysine
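
The Parikh vector of a sequence records how many times each letter occurs in it. A minimal Python sketch follows; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative choices, not names used by the page. Applied to the 6CLM_1 sequence GSNQNNF, it should reproduce the 6CLM_1 column of the table above.

```python
from collections import Counter

# Letters in the row order of the table above (illustrative constant).
AMINO_ACIDS = "ARIDLPSWYVCEGMFTNQHK"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return {letter: count} for every letter of the alphabet, zeros included."""
    counts = Counter(w)
    return {a: counts[a] for a in alphabet}

v = parikh_vector("GSNQNNF")
# Show only the nonzero entries: S, G, F and Q once each, N three times.
print({a: k for a, k in v.items() if k})
```

The counts sum to the sequence length, which gives a quick consistency check against the Length column reported further down.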

>5FJX_1|Chains A, B, C|COATOMER SUBUNIT DELTA|SACCHAROMYCES CEREVISIAE (4932)
>6CLM_1|Chain A|GSNQNNF|synthetic construct (32630)
>5HZL_1|Chain A[auth B]|Lmo2445 protein|Listeria monocytogenes EGD-e (169963)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5FJX 117 270 0.80 40 159 249
GPLGSEEDVPENNGILISIKEVINAEFSRDGTIHSSELKGVLELRINDHDLSHSNLKLADSIDVRDKSFQFKTHPNIDKQSFLSTKLISLRDKSKAFPANDQSLGVLRWRKVAPAEDDSLIPLTLTTAVSPSESQQGFDVIIEYESVLETELADVIFTIPVFPQEPVDINTESSTCSDAEVVNMDQEMGTSIKISKIAANDAGALAFTIEAPYEDALYPMTVSFQESTRDKLAKSFTGMAIQSVVMANDHDQELPYDVITSLKSDEYLVQ
6CLM 6 7 0.55 10 6 5
GSNQNNF
5HZL 125 303 0.78 38 168 283
MLFAPTIKAQADTVPLPAPIIEAFPVEAIAEAIAGELDKDSVNDTITQADLDTMTAIPLPSLGLTGEDLSVLNNEVFTNAIELAIWSNNIGELPDLSEALPALENIEANGANITVFPDANYPNLTNVDLSQNNFGFNIPKFVGMEGLVSINMENAGLSGYIAEDIWMNMPNLDSLILNENHLISIPEDIFLSQQLGTHSFANQTATYPPTTIKQGENLKVFVPFIYQALDFIAPSNHDLIIIKDNGRTLYEPPYPTYDGSYMYTIETAGLQPGEHLLEISLGYNSGEYTGWYDFPVTITESNA
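
The page does not say which Lempel-Ziv variant is behind \(\mathrm{LZ}(w)\). The sketch below assumes the LZ76 (exhaustive history) parsing, in which each new component is the shortest word not already occurring in the text that precedes its last letter. The names `lz76_complexity` and `normalized_lz` are hypothetical. Under this assumption, the 6CLM sequence GSNQNNF parses as G, S, N, Q, NN, F, giving 6 components as in the table, and a normalized value of about 0.557, where the table shows 0.55.

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the candidate while it already occurs in the text that
        # ends one letter before the candidate's own end (overlap allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_20(n); requires len(w) >= 2."""
    n = len(w)
    return lz76_complexity(w) / (n / math.log(n, alphabet_size))

print(lz76_complexity("GSNQNNF"))
print(normalized_lz("GSNQNNF"))
```

This quadratic-time version favours readability; for sequences of a few hundred residues, as here, that is unlikely to matter.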

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
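
As an illustration of these definitions, the following sketch collects the distinct contiguous subwords of a given length and counts them. The function names `subwords` and `p` are illustrative only. For the 6CLM sequence GSNQNNF it yields 6 subwords of length 2 and 5 of length 3, in agreement with the \(p_w(2)\) and \(p_w(3)\) entries of the table above.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def p(w, k):
    """Subword complexity: the number of distinct length-k subwords."""
    return len(subwords(w, k))

w = "GSNQNNF"
print(sorted(subwords(w, 2)))
print(p(w, 2), p(w, 3))
```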

\(|P_{f(5FJX_1)}(2) \setminus P_{f(6CLM_1)}(2)|=156\), \(|P_{f(6CLM_1)}(2) \setminus P_{f(5FJX_1)}(2)|=3\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
111100001100011110100110101000101000010111010100001000010110010100001010001010000110001101000001111000011110100111100001111010011010000011011100001100011011101111100110100000000010110100011001010011100111111010110001101101010000000110010111100111100000011001100100000110
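
The page does not state which residues count as hydrophobic. Comparing the binary string with Sequence 1 position by position suggests that A, F, G, I, L, M, P, V and W map to 1 and all other residues to 0. The sketch below encodes that inferred rule; the set `HYDROPHOBIC` and the function `hp_string` are assumptions made for illustration, not the page's own definition.

```python
# Inferred from the binary string above; an assumption, not stated on the page.
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(w):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)

# First 27 residues of Sequence 1 (5FJX_1).
prefix = "GPLGSEEDVPENNGILISIKEVINAEF"
print(hp_string(prefix))  # 111100001100011110100110101
```

The printed string coincides with the first 27 symbols of the hydrophobic-polar string shown above.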
Pair \(Z_2\) Length of longest common substring
5FJX_1,6CLM_1 159 2
5FJX_1,5HZL_1 133 4
6CLM_1,5HZL_1 162 4
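
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets, so for 5FJX_1 and 6CLM_1 the counts quoted above give \(156+3=159\), the first entry of the \(Z_2\) column. A small sketch, with illustrative function names and a hypothetical second word GSNF chosen only to keep the example short:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def Z(x, y, k):
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|, the symmetric difference size."""
    return len(subwords(x, k) ^ subwords(y, k))

# GSNQNNF is the 6CLM sequence; GSNF is a made-up comparison word.
print(Z("GSNQNNF", "GSNF", 2))  # 3: the subwords NQ, QN and NN occur only in the first word
```

Dividing by the size of the union of the two sets would turn this numerator into a Jaccard distance.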

Newick tree

 
[
	6CLM_1:84.34,
	[
		5FJX_1:66.5,5HZL_1:66.5
	]:17.84
]

Let \(d\) denote the Otu–Sayood distance \(d\).
Let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{277 }{\log_{20} 277}-\frac{7}{\log_{20}7})=93.0\).
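
The arithmetic in this estimate can be checked directly. In the sketch below the function and parameter names are hypothetical, and the two factors 0.85 and 0.8 are simply taken from the formula without interpretation. The value 277 appears to be \(270+7\), the combined length of the two sequences, although the page does not say so.

```python
import math

def expected_distance(n_long, n_short, alphabet_size=20,
                      factor_a=0.85, factor_b=0.8):
    """Evaluate factor_a * factor_b * (n/log_A(n) for n_long minus the same for n_short)."""
    def term(n):
        return n / math.log(n, alphabet_size)
    return factor_a * factor_b * (term(n_long) - term(n_short))

print(round(expected_distance(277, 7), 1))  # 93.0
```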
Status Protein1 Protein2 d d1/2
Query variables 5FJX_1 6CLM_1 113 58
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
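
The page does not spell out \(d\) and \(d_1\). The sketch below assumes the definitions of Otu and Sayood, \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=(c(xy)-c(x))+(c(yx)-c(y))\), with \(c\) taken to be an LZ76-style complexity. Under that reading, min and max in the formula above are the smaller and larger of the two increments, so \(\alpha=0\) gives \(d\) and \(\alpha=1/2\) gives \(d_1/2\). All function names are illustrative. The example reuses the 6CLM sequence together with the word AAAAAA mentioned above.

```python
def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while the candidate occurs in the text before its last letter.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(x, y):
    """Return (c(xy) - c(x), c(yx) - c(y))."""
    c = lz76_complexity
    return c(x + y) - c(x), c(y + x) - c(y)

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "GSNQNNF", "AAAAAA"
print(increments(x, y))   # (2, 5)
print(delta(x, y, 0))     # 5, the assumed distance d
print(delta(x, y, 0.5))   # 3.5, half of the assumed d1
```

As a consistency check on this reading, the reported query values \(d=113\) and \(d_1/2=58\) would correspond to increments of 113 and 3.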