CoV2D Browser™

Parikh vectors
1QDE_1 2BQO_1 3DSO_1 Letter Amino acid
16 3 6 E Glutamic acid
9 2 1 P Proline
4 6 1 Y Tyrosine
21 8 3 L Leucine
17 5 3 I Isoleucine
9 2 7 M Methionine
9 6 4 S Serine
18 8 6 V Valine
17 17 1 A Alanine
5 10 4 N Asparagine
15 8 6 D Aspartic acid
1 6 0 C Cysteine
15 6 1 Q Glutamine
13 11 10 G Glycine
3 1 2 H Histidine
12 5 10 K Lysine
12 14 3 R Arginine
16 5 3 T Threonine
0 5 0 W Tryptophan
12 2 3 F Phenylalanine
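A Parikh vector counts how often each letter occurs in a word. The following minimal Python sketch assumes plain one-letter amino-acid codes and the row order of the table above. It recomputes the 3DSO_1 column from the 3DSO sequence listed further down this page.

```python
from collections import Counter

# Row order of the table above (one-letter amino-acid codes).
ORDER = "EPYLIMSVANDCQGHKRTWF"


def parikh_vector(w: str, order: str = ORDER) -> list[int]:
    """Count the occurrences of each letter of `order` in the word w."""
    counts = Counter(w)
    return [counts[a] for a in order]  # Counter returns 0 for absent letters


seq_3dso = "VDMSNVVKTYDLQDGSKVHVFKDGKMGMENKFGKSMNMPEGKVMETRDGTKIIMKGNEIFRLDEALRKGHSEGG"
print(parikh_vector(seq_3dso))
# [6, 1, 1, 3, 3, 7, 4, 6, 1, 4, 6, 0, 1, 10, 2, 10, 3, 3, 0, 3]
```

The entries always sum to the sequence length, which is 74 for 3DSO_1.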

>1QDE_1|Chain A|TRANSLATION INITIATION FACTOR 4A|Saccharomyces cerevisiae (4932)
>2BQO_1|Chain A|LYSOZYME|Homo sapiens (9606)
>3DSO_1|Chain A|Putative uncharacterized protein copK|Ralstonia metallidurans (266264)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1QDE , Knot 101 224 0.81 38 158 217
EESQIQTNYDKVVYKFDDMELDENLLRGVFGYGFEEPSAIQQRAIMPIIEGHDVLAQAQSGTGKTGTFSIAALQRIDTSVKAPQALMLAPTRELALQIQKVVMALAFHMDIKVHACIGGTSFVEDAEGLRDAQIVVGTPGRVFDNIQRRRFRTDKIKMFILDEADEMLSSGFKEQIYQIFTLLPPTTQVVLLSATMPNDVLEVTTKFMRNPVRILVKKDELTLE
2BQO , Knot 67 130 0.83 40 106 128
KVFERCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRATNYNAGDRSTDYGIFQINSRYWCNDGKTPGAVNAAHLSCSALLQDNIADAVAAAKRAVRDPQGIRAWVAWRNRCQNRDVRQYVQGCGV
3DSO , Knot 41 74 0.79 36 65 72
VDMSNVVKTYDLQDGSKVHVFKDGKMGMENKFGKSMNMPEGKVMETRDGTKIIMKGNEIFRLDEALRKGHSEGG
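The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). Here is a quick Python check that uses only the LZ and length values from the table. The displayed two-decimal values appear to be truncated rather than rounded. For example, the ratio for 2BQO is about 0.837 but is shown as 0.83, and the ratio for 3DSO is about 0.796 but is shown as 0.79.

```python
from math import floor, log


def lz_ratio(lz: int, n: int) -> float:
    """LZ(w) / (n / log_20 n)."""
    return lz / (n / log(n, 20))


# (LZ(w), n) pairs copied from the table above.
rows = {"1QDE": (101, 224), "2BQO": (67, 130), "3DSO": (41, 74)}
for code, (lz, n) in rows.items():
    # Truncate (not round) to two decimals, matching the displayed column.
    print(code, floor(100 * lz_ratio(lz, n)) / 100)
# 1QDE 0.81
# 2BQO 0.83
# 3DSO 0.79
```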

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence of the entry with 4-character Protein Data Bank code \(c\).
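These definitions translate directly into Python. The sketch below is illustrated on a short toy word, a hypothetical example that is not one of the sequences above.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


w = "GKMGMEN"  # hypothetical toy word
print([p(w, n) for n in (1, 2, 3)])  # [5, 6, 5]
print(sorted(subwords(w, 2)))        # ['EN', 'GK', 'GM', 'KM', 'ME', 'MG']
```

A word of length \(|w|\) has at most \(|w|-n+1\) subwords of length \(n\), so \(p_w(n)\le |w|-n+1\).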

For example, \(|P_{f(1QDE_1)}(2) \setminus P_{f(2BQO_1)}(2)|=115\) and \(|P_{f(2BQO_1)}(2) \setminus P_{f(1QDE_1)}(2)|=63\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (1QDE_1): 00001000000110010010100011011110110010110001111110100111010010100101011110010001011011111100011101001111111010101010111001100101100101111011011001000010000101111001001100110001001101111000111101011001101000110011011100001010
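The page does not say which residues count as hydrophobic. The Python sketch below assumes the textbook nonpolar set A, F, G, I, L, M, P, V, W (1) versus everything else (0). This assumption is inferred from the binary string above and is not stated by the page. Tryptophan (W) does not occur in 1QDE_1, so its class here is a guess.

```python
# Assumption: hydrophobic = A, F, G, I, L, M, P, V, W; every other letter counts as polar.
# W does not occur in 1QDE_1, so its placement here is not confirmed by the data.
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp(w: str) -> str:
    """Hydrophobic-polar word: '1' for a hydrophobic residue, '0' otherwise."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)


seq_1qde = "EESQIQTNYDKVVYKFDDMELDENLLRGVFGYGFEEPSAIQQRAIMPIIEGHDVLAQAQSGTGKTGTFSIAALQRIDTSVKAPQALMLAPTRELALQIQKVVMALAFHMDIKVHACIGGTSFVEDAEGLRDAQIVVGTPGRVFDNIQRRRFRTDKIKMFILDEADEMLSSGFKEQIYQIFTLLPPTTQVVLLSATMPNDVLEVTTKFMRNPVRILVKKDELTLE"
print(hp(seq_1qde)[:20])  # 00001000000110010010
```

With this choice, cysteine and tyrosine map to 0, which is consistent with the positions where they occur in the displayed string.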
Pair \(Z_2\) Length of longest common subword
1QDE_1,2BQO_1 178 3
1QDE_1,3DSO_1 151 4
2BQO_1,3DSO_1 145 3
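\(Z_k\) is the size of the symmetric difference of the two subword sets. The full Jaccard distance would divide it by the size of their union. Here is a minimal Python sketch of both, shown on hypothetical toy words:

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)  # same as len(px ^ py)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by |P_x(k) union P_y(k)| (0 if both sets are empty)."""
    px, py = subwords(x, k), subwords(y, k)
    union = px | py
    return len(px ^ py) / len(union) if union else 0.0


print(z("ABAB", "BABC", 2))  # 1: only "BC" is not shared
```

For 1QDE_1 and 2BQO_1 the page's numbers give \(115+63=178\). They also give \(158-115=106-63=43\) shared 2-subwords, so the union has \(158+63=221\) elements and the Jaccard distance is \(178/221\approx 0.81\).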

Newick tree

(1QDE_1:85.60,(3DSO_1:72.5,2BQO_1:72.5):13.10);
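The branch lengths can be checked by parsing the tree in standard Newick syntax, with parentheses and a terminating semicolon, i.e. `(1QDE_1:85.60,(3DSO_1:72.5,2BQO_1:72.5):13.10);`. The Python sketch below uses a hand-written minimal parser. It handles only names and branch lengths, not quoting or comments.

```python
def parse_newick(text: str):
    """Parse a simple Newick string into nested (children, name, branch_length) tuples."""
    s = text.strip().rstrip(";")
    pos = 0

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while s[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
        name = read_until(":,()")
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1  # consume ":"
            length = float(read_until(",()"))
        return children, name, length

    return node()


def leaf_depths(tree, depth: float = 0.0, out=None) -> dict:
    """Distance from the root to every leaf (sum of branch lengths on the path)."""
    if out is None:
        out = {}
    children, name, length = tree
    depth += length
    if not children:
        out[name] = depth
    for child in children:
        leaf_depths(child, depth, out)
    return out


tree = parse_newick("(1QDE_1:85.60,(3DSO_1:72.5,2BQO_1:72.5):13.10);")
print({k: round(v, 2) for k, v in leaf_depths(tree).items()})
# {'1QDE_1': 85.6, '3DSO_1': 85.6, '2BQO_1': 85.6}
```

Every leaf sits at depth \(85.6 = 72.5 + 13.10\), so the tree is ultrametric. The path between 3DSO_1 and 2BQO_1 has length \(2\cdot 72.5 = 145\), which matches their \(Z_2\) value in the pair table.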

Let \(d\) and \(d_1\) denote the Otu–Sayood distances \(d\) and \(d_1\), respectively. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
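Following our reading of Otu and Sayood, both distances are built from the two increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). \(d\) is their maximum and \(d_1\) is their sum. The Python sketch below assumes that \(\mathrm{LZ}\) is the classic Lempel–Ziv (1976) complexity computed by the Kaspar–Schuster algorithm. The page's exact LZ variant is not stated, so this sketch may not reproduce the tabulated values.

```python
def lz76(s: str) -> int:
    """LZ76 complexity (number of phrases), via the Kaspar-Schuster algorithm."""
    n = len(s)
    if n < 2:
        return n
    i, k, l = 0, 1, 1  # i: start of earlier copy, k: match length, l: start of current phrase
    c, k_max = 1, 1    # the first letter is always a phrase
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:  # the current phrase runs to the end of s
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:  # every earlier start tried: close the phrase of length k_max
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def increments(x: str, y: str) -> tuple[int, int]:
    """The two quantities LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)


def otu_sayood_d(x: str, y: str) -> int:
    return max(increments(x, y))


def otu_sayood_d1(x: str, y: str) -> int:
    return sum(increments(x, y))


print(lz76("0001101001000101"))  # 6, from the parsing 0.001.10.100.1000.101
print(otu_sayood_d("ab", "abab"), otu_sayood_d1("ab", "abab"))  # 1 1
```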
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{354}{\log_{20} 354}-\frac{130}{\log_{20}130}\right)\approx 68.4\), where \(354=224+130\) is the combined length of 1QDE_1 and 2BQO_1.
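One possible reading of this estimate is that the complexity of the 354-letter concatenation and of the 130-letter 2BQO_1 are each approximated by \(0.8\cdot n/\log_{20} n\). This interpretation is ours and is not stated on the page, but the factor 0.8 is in line with the normalized ratios of about 0.8 in the table above. The extra factor 0.85 is taken as given. The arithmetic itself can be checked in Python:

```python
from math import log


def normalized_length(n: int) -> float:
    """n / log_20 n, the same normalizer as in the LZ ratio column."""
    return n / log(n, 20)


# 354 = 224 + 130, the combined length of 1QDE_1 and 2BQO_1.
estimate = 0.85 * 0.8 * (normalized_length(354) - normalized_length(130))
print(round(estimate, 2))  # 68.46
```

The page shows this value truncated to 68.4, as it does for the ratio column.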
Status Protein1 Protein2 d d1/2
Query variables 1QDE_1 2BQO_1 88 70
Was not able to put for d
Was not able to put for d1

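In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], with \(\min\) and \(\max\) taken over the two increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\),
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]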
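For the pair 1QDE_1, 2BQO_1, the table above gives \(d=88\) and \(d_1/2=70\). Assume these values are exact and that \(d\) is the larger and \(d_1\) the sum of the two increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). Then the larger increment is 88 and the smaller is \(140-88=52\), and \(\delta\) can be evaluated directly with a minimal Python sketch:

```python
def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b) for two LZ increments a, b."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# Increments inferred from the table (assumes d = 88 and d1 = 2 * 70 = 140 exactly).
larger, smaller = 88, 140 - 88
print(delta(0, larger, smaller))    # 88   (d)
print(delta(0.5, larger, smaller))  # 70.0 (d1 / 2)
print(delta(1, larger, smaller))    # 52   (the smaller increment)
```

The formula interpolates linearly between the larger increment at \(\alpha=0\) and the smaller one at \(\alpha=1\).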