CoV2D BrowserTM

Parikh vectors
6SPH_1 6MLB_1 7MMS_1 Letter Amino acid
3 6 17 Q Glutamine
8 1 5 H Histidine
9 4 21 I Isoleucine
9 9 35 L Leucine
0 2 6 M Methionine
5 0 13 P Proline
9 4 24 A Alanine
4 3 1 C Cysteine
8 12 18 T Threonine
11 13 24 D Aspartic acid
11 14 18 K Lysine
4 7 12 R Arginine
7 7 18 N Asparagine
10 11 25 E Glutamic acid
1 4 7 W Tryptophan
10 3 25 S Serine
0 4 18 Y Tyrosine
15 12 17 V Valine
25 9 17 G Glycine
4 8 21 F Phenylalanine
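
A Parikh vector counts how often each letter occurs in a sequence. As a minimal sketch, the columns above could be computed along these lines. The names `parikh_vector` and `AMINO_ACIDS` are illustrative and not taken from the CoV2D code; the letter order is copied from the table rows.

```python
from collections import Counter

# Row order of the table above (one-letter amino acid codes).
AMINO_ACIDS = "QHILMPACTDKRNEWSYVGF"

def parikh_vector(sequence: str, alphabet: str = AMINO_ACIDS) -> list[int]:
    """Return the number of occurrences of each alphabet letter in sequence."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# Example on the first ten residues of the 6SPH_1 sequence.
prefix = "ATKVVCVLKG"
vector = parikh_vector(prefix)
print({letter: c for letter, c in zip(AMINO_ACIDS, vector) if c})
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick sanity check on each column.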

>6SPH_1|Chains A, B[auth C], C[auth E], D[auth G], E[auth J], F[auth L]|Superoxide dismutase [Cu-Zn]|Homo sapiens (9606)
>6MLB_1|Chains A, B, C, D|Retinol-binding protein 2|Homo sapiens (9606)
>7MMS_1|Chains A, B, C, D|Ribonucleoside-diphosphate reductase|Aerococcus urinae (strain ACS-120-V-Col10a) (2976812)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6SPH 73 153 0.80 36 112 148
ATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
6MLB 65 133 0.79 38 103 128
TRDQNGTWEMESNENFEGYMKALDIDFATRKIAVRLTQTLVIDQDGDNFKFKTTSTFRNYDVDFTVGVEFDEYTKSLDNRHVKALVTWEGDVLVCVQKGEKENRGWKKWIEGDKLYLELTCGDQVCRQVFKKK
7MMS 153 342 0.87 40 214 331
SNILEMTKNYYDRSVSPVEYAYFDQSQNMRAINWNKIVDEKDLEVWNRVTQNFWLPENIPVSNDLPSWNELDDDWQQLITRTFTGLTLLDTVQSSIGDVAQIKNSLTEQEQVIYANFAFMVGVHARSYGTIFSTLCTSEQIEEAHEWVVDNEALQARPKALIPFYTADDPLKSKIAAALMPGFLLYGGFYLPFYLSARGKLPNTSDIIRLILRDKVIHNFYSGYKYQLKVAKLSPEKQAEMKQFVFDLLDKMIGLEKTYLHQLYDGFGLADEAIRFSLYNAGKFLQNLGYESPFTKEETRIAPEVFAQLSARADENHDFFSGSGSSYIIGTSEETLDEDWDF
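
The page does not say which Lempel-Ziv parsing variant produces the LZ-complexity column. The sketch below assumes the classical LZ76 exhaustive-history parsing and is not guaranteed to reproduce the tabulated values. The function names are hypothetical; the normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) follows the table header.

```python
import math

def lz76_complexity(s: str) -> int:
    """Number of components in the LZ76 exhaustive-history parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the component while it can still be copied from earlier text
        # (the copy may overlap the component but must start before position i).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_alphabet_size(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

# Classical textbook example: 0|001|10|100|1000|101 has six components.
print(lz76_complexity("0001101001000101"))
# Ratio for the tabulated values LZ = 73 and n = 153.
print(round(normalized_lz(73, 153), 2))
```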

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
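
As an illustration of these definitions, the set of distinct length-\(k\) subwords and its cardinality can be sketched as follows. The helper names `subwords` and `subword_complexity` are assumptions made for this example.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords (intervals) of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Example on the first ten residues of the 6SPH sequence.
prefix = "ATKVVCVLKG"
print(subword_complexity(prefix, 1), subword_complexity(prefix, 2))
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), so the complexity can never exceed that bound.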

\(|P_{f(6SPH_1)}(2) \setminus P_{f(6MLB_1)}(2)|=68\), \(|P_{f(6MLB_1)}(2) \setminus P_{f(6SPH_1)}(2)|=59\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100110110101110111010000001110111010110011011010011000110001110101100001110000001101101010001110101000110101000111001110001001101100000001011001101111110
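
The page does not state which residues count as hydrophobic. Comparing the binary string with the 6SPH sequence suggests that A, V, L, I, G, P, F and W map to 1 and the remaining residues that occur there map to 0. The sketch below uses that inferred set; including M is a further assumption, since M does not occur in 6SPH, and the name `hp_encode` is hypothetical.

```python
# Hydrophobic set inferred from the 6SPH example; M is an assumption.
HYDROPHOBIC = set("AVLIGPFW") | {"M"}

def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First twenty residues of 6SPH and the matching prefix of the binary string.
print(hp_encode("ATKVVCVLKGDGPVQGIINF"))
```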
Pair \(Z_2\) Length of longest common subword
6SPH_1,6MLB_1 127 3
6SPH_1,7MMS_1 180 3
6MLB_1,7MMS_1 187 3
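
Both columns of this table can be sketched with set operations on subwords. The function names are illustrative, and the last column is read here as the longest common contiguous subword, which is an assumption about what the page computes.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """Size of the symmetric difference of the length-k subword sets."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword_length(x: str, y: str) -> int:
    """Largest k such that x and y share a contiguous subword of length k."""
    k = 0
    while subwords(x, k + 1) & subwords(y, k + 1):
        k += 1
    return k

print(z_k("abab", "abba", 2))
print(longest_common_subword_length("abcde", "xbcdy"))
```

For the first pair, the two one-sided counts reported above, 68 and 59, add up to the tabulated \(Z_2 = 127\).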

Newick tree

 
(
	7MMS_1:99.41,
	(
		6SPH_1:63.5,6MLB_1:63.5
	):35.91
);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{286}{\log_{20} 286}-\frac{133}{\log_{20} 133}\right)=47.6\), where \(286=153+133\) is the combined length of 6SPH_1 and 6MLB_1.
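
The arithmetic in this estimate can be checked directly. The snippet below only evaluates the expression as written; the helper name `lz_scale` is hypothetical, and the factors 0.85 and 0.8 are taken from the formula without further interpretation.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """The quantity n / log_alphabet_size(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

# 286 and 133 are the lengths appearing in the formula above.
estimate = 0.85 * 0.8 * (lz_scale(286) - lz_scale(133))
print(round(estimate, 1))  # 47.6
```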
Status Protein1 Protein2 d d1/2
Query variables 6SPH_1 6MLB_1 59 55

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
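
To make the interpolation concrete, the sketch below assumes the unnormalized Otu–Sayood quantities \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), with \(d\) their maximum and \(d_1\) their sum. This matches the two cases in the display above, but it is not confirmed by the page. The LZ76 parsing and all function names are likewise assumptions.

```python
def lz76_complexity(s: str) -> int:
    """Number of components in the LZ76 exhaustive-history parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood_terms(x: str, y: str) -> tuple[int, int]:
    """Extra LZ components needed for y after x, and for x after y."""
    return (lz76_complexity(x + y) - lz76_complexity(x),
            lz76_complexity(y + x) - lz76_complexity(y))

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two terms."""
    a, b = otu_sayood_terms(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "ACGT", "ACGT"
print(delta(x, y, 0.0))  # plays the role of d
print(delta(x, y, 0.5))  # plays the role of d_1 / 2
```

With the reported values \(d=59\) and \(d_1/2=55\), the smaller of the two terms would be \(2\cdot 55-59=51\) under these assumptions.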