CoV2D BrowserTM

Parikh vectors
7CHN_1 4LLD_1 7RDG_1 Letter Amino acid
12 9 14 G Glycine
9 10 2 T Threonine
18 2 6 Q Glutamine
15 4 5 N Asparagine
31 1 5 I Isoleucine
5 3 8 F Phenylalanine
15 3 1 Y Tyrosine
15 12 15 V Valine
7 0 13 R Arginine
15 2 9 E Glutamic acid
8 8 6 H Histidine
23 8 12 L Leucine
23 20 10 S Serine
4 1 1 W Tryptophan
9 6 8 A Alanine
5 3 1 C Cysteine
28 8 3 K Lysine
8 0 0 M Methionine
13 9 10 P Proline
19 2 6 D Aspartic acid
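
As an illustration of how a column of the table above can be obtained, here is a minimal sketch that counts letter occurrences in a sequence. The names `ALPHABET` and `parikh_vector` are hypothetical helpers introduced for this example, and the letter order simply copies the row order of the table.

```python
from collections import Counter

# Row order of the table above (G, T, Q, ..., D); the ordering is an assumption
# taken from the table, not a standard one.
ALPHABET = "GTQNIFYVREHLSWACKMPD"

def parikh_vector(w, alphabet=ALPHABET):
    """Return the number of occurrences of each letter of `alphabet` in `w`."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# 4LLD_1 sequence as listed on this page.
seq_4lld = ("ASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTV"
            "PSSSLGTQTYICNVNHKPSNTKVDKKVEPKSCGSHHHHHH")

vec = dict(zip(ALPHABET, parikh_vector(seq_4lld)))
print(vec["H"], vec["W"], vec["C"], vec["M"], vec["R"])  # 8 1 3 0 0
```

The printed counts agree with the 4LLD_1 column (H 8, W 1, C 3, M 0, R 0).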

>7CHN_1|Chain A|Dual specificity protein kinase TTK|Homo sapiens (9606)
>4LLD_1|Chain A|Ig gamma-1 chain C region|Homo sapiens (9606)
>7RDG_1|Chains A, B, C, D|Galectin-7|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7CHN 129 282 0.86 40 183 272
NECISVKGRIYSILKQIGSGGSSKVFQVLNEKKQIYAIKYVNLEEADNQTLDSYRNEIAYLNKLQQHSDKIIRLYDYEITDQYIYMVMECGNIDLNSWLKKKKSIDPWERKSYWKNMLEAVHTIHQHGIVHSDLKPANFLIVDGMLKLIDFGIANQMQPDTTSVVKDSQVGTVNYMPPEAIKDMSSSRENGKSKSKISPKSDVWSLGCILYYMTYGKTPFQQIINQISKLHAIIDPNHEIEFPDIPEKDLQDVLKCCLKRDPKQRISIPELLAHPYVQIQTL
4LLD 54 111 0.76 36 81 103
ASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKVEPKSCGSHHHHHH
7RDG 69 135 0.83 38 106 131
SNVPHKSSLPEGIRPGTVLRIRGLVPPNASRFHVNLLCGEEQGSDAALHFNPRLDTSEVVFNSKEQGSWGREERGPGVPFQRGQPFEVLIIASDDGFKAVVGAAQYHHFRHRLPLARVRLVEVGGDVQLDSVRIF
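
The LZ-complexity and normalized columns above can be sketched as follows. This assumes the LZ76 "exhaustive history" parsing (each new component is the shortest block that has not already occurred earlier in the word); the page does not state which variant it uses, so the function may not reproduce the tabulated values exactly. `lz76_complexity` and `normalized_lz` are hypothetical names.

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parsing of w."""
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the block while it can be copied from an earlier start
        # position (overlap with the block itself is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# Classic example: 0 . 001 . 10 . 100 . 1000 . 101
print(lz76_complexity("0001101001000101"))  # 6
# Normalization of the 7CHN row (LZ = 129, n = 282).
print(round(normalized_lz(129, 282), 2))    # 0.86
```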

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
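
A minimal sketch of these definitions, reading \(P_w(n)\) as the set of contiguous length-\(n\) subwords; the function names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

print(sorted(subwords("abab", 2)))      # ['ab', 'ba']
print(subword_complexity("aaaa", 3))    # 1
```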

\(|P_{f(7CHN_1)}(2) \setminus P_{f(4LLD_1)}(2)|=138\), \(|P_{f(4LLD_1)}(2) \setminus P_{f(7CHN_1)}(2)|=36\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
000101010100110011011000110110000010110010100100001000000110100100000011010000100001011100101010011000001011000001001101100100011100010110111101110110111100101000011000011010011101100100000010000010100011011011001001001100110010010111010001011011000100110001000100010110111010101001
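
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets; with the counts quoted above, \(Z_2 = 138 + 36 = 174\) for 7CHN_1 and 4LLD_1. A minimal sketch, with the hypothetical helper names `subwords` and `z_k`:

```python
def subwords(w, k):
    """Set of distinct contiguous length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|, i.e. the symmetric difference size."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# "abab" has {ab, ba}; "abba" has {ab, bb, ba}; only "bb" is unshared.
print(z_k("abab", "abba", 2))  # 1
```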
Pair \(Z_2\) Length of longest common subword
7CHN_1,4LLD_1 174 3
7CHN_1,7RDG_1 189 3
4LLD_1,7RDG_1 117 3
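
For the last column, the following sketch assumes that the value is the length of the longest common contiguous subword (an assumption: a non-contiguous common subsequence of sequences this long would typically be far longer than 3). The function name is hypothetical.

```python
def longest_common_subword_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1], y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(longest_common_subword_length("abcde", "xbcdy"))  # 3
```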

Newick tree

 
(
	7CHN_1:99.29,
	(
		4LLD_1:58.5,7RDG_1:58.5
	):40.79
);

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{393}{\log_{20} 393}-\frac{111}{\log_{20}111}\right)=86.0\), where \(393=282+111\) is the combined length of the two sequences.
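
The arithmetic in this estimate can be checked directly. The factors 0.85 and 0.8 are taken from the page as given; `n_over_log` is a hypothetical helper name.

```python
import math

def n_over_log(n, base=20):
    """n / log_base(n), the normalizing quantity used on this page."""
    return n / math.log(n, base)

# 393 = 282 + 111 (lengths of 7CHN_1 and 4LLD_1).
expected = 0.85 * 0.8 * (n_over_log(393) - n_over_log(111))
print(round(expected, 1))  # 86.0
```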
Status Protein1 Protein2 d d1/2
Query variables 7CHN_1 4LLD_1 112 75

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
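
The interpolation above can be sketched as follows, assuming the usual Otu–Sayood definitions \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=(c(xy)-c(x))+(c(yx)-c(y))\), with \(c\) taken to be LZ76 complexity. Both the choice of LZ variant and the function names are assumptions, so the values printed on this page may not be reproduced exactly.

```python
def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parsing of w."""
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def otu_sayood_delta(x, y, alpha=0.0, c=lz76_complexity):
    """alpha * min + (1 - alpha) * max of the two one-sided complexity gains."""
    a = c(x + y) - c(x)  # new components needed for y after seeing x
    b = c(y + x) - c(y)  # new components needed for x after seeing y
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "0001101001000101", "0101"
d = otu_sayood_delta(x, y, alpha=0.0)         # the distance d
half_d1 = otu_sayood_delta(x, y, alpha=0.5)   # the distance d1 / 2
print(d, half_d1)
```

With \(\alpha=0\) only the larger gain is kept, while \(\alpha=1/2\) averages the two gains, which matches the two cases in the displayed formula.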