CoV2D Browser™

Parikh vectors
2ATV_1 5MBB_1 1LQB_1 Letter Amino acid
13 14 7 R Arginine
21 23 9 E Glutamic acid
11 34 5 I Isoleucine
5 9 3 M Methionine
11 18 8 S Serine
6 14 1 Y Tyrosine
4 5 2 C Cysteine
6 20 7 Q Glutamine
9 10 1 H Histidine
10 24 7 K Lysine
3 8 9 P Proline
13 24 9 T Threonine
12 19 9 A Alanine
15 23 6 V Valine
5 19 1 N Asparagine
11 14 11 D Aspartic acid
14 20 6 G Glycine
17 42 10 L Leucine
8 11 7 F Phenylalanine
2 2 0 W Tryptophan
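A Parikh vector counts how often each letter occurs in a word, so each column above sums to its sequence length (196, 353 and 118). Below is a minimal sketch that assumes only the 20 standard one-letter amino-acid codes from the table. The names `AMINO_ACIDS` and `parikh_vector` are hypothetical.

```python
from collections import Counter

# The 20 standard amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "REIMSYCQHKPTAVNDGLFW"

def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter in seq.

    Letters outside AMINO_ACIDS (e.g. X) are ignored.
    """
    counts = Counter(seq)
    # Letters that never occur (e.g. W in 1LQB_1) get an explicit 0.
    return {a: counts.get(a, 0) for a in AMINO_ACIDS}
```

For the 1LQB_1 sequence, `parikh_vector(seq)["W"]` is 0, matching the last row of the table.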

>2ATV_1|Chain A|RAS-like estrogen-regulated growth inhibitor|Homo sapiens (9606)
>5MBB_1|Chains A, B|Beta subunit of photoactivated adenylyl cyclase|Beggiatoa sp. PS (422289)
>1LQB_1|Chain A|Elongin B|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2ATV 90 196 0.80 40 135 185
MHHHHHHSSGVDLGTENLYFQSMAKSAEVKLAIFGRAGVGKSALVVRFLTKRFIWEYDPTLESTYRHQATIDDEVVSMEILDTAGQEDTIQREGHMRWGEGFVLVYDITDRGSFEEVLPLKNILDEIKKPKNVTLILVGNKADLDHSRQVSTEEGEKLATELACAFYECSACTGEGNITEIFYELCREVRRRRMVQ
5MBB 154 353 0.85 40 206 341
GSHMMKRLVYISKISGHLSLEEIQRIGKVSIKNNQRDNITGVLLYLQGLFFQILEGENEKVDKLYKKILVDDRHTNILCLKTEYDITDRMFPNWAMKTINLNENSELMIQPIKSLLQTITQSHRVLEKYMPARVIYLINQGINPLTVEPQLVEKIIFFSDILAFSTLTEKLPVNEVVILVNRYFSICTRIISAYGGEVTKFIGDCVMASFTKEQGDAAIRTSLDIISELKQLRHHVEATNPLHLLYTGIGLSYGHVIEGNMGSSLKMDHTLLGDAVNVAARLEALTRQLPYALAFTAGVKKCCQAQWTFINLGAHQVKGKQEAIEVYTVNEAQKYYDTLQITQLIRQTLENDK
1LQB 61 118 0.82 38 98 115
MDVFLMIRRHKTTIFTDAKESSTVFELKRIVEGILKRPPDEQRLYKDDQLLDDGKTLGECGFTSQTARPQAPATVGLAFRADDTFEALCIEPFSSPPELPDVMKPQDSGSSANEQAVQ
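This page does not define the LZ-complexity column. A common choice is the Lempel–Ziv (1976) complexity, which counts the phrases in the exhaustive-history parsing of \(w\). The sketch below assumes that definition and uses the Kaspar–Schuster scanning algorithm. It is not necessarily the code behind the table, and the names `lz76` and `normalized_lz` are hypothetical. The second function computes the ratio column \(\mathrm{LZ}(w)/(n/\log_{20} n)\).

```python
import math

def lz76(s: str) -> int:
    """Lempel-Ziv (1976) complexity: the number of phrases in the exhaustive
    history of s, computed with the Kaspar-Schuster algorithm."""
    n = len(s)
    if n <= 1:
        return n
    c = 1          # phrases found so far (the first letter is a phrase)
    l = 1          # start of the phrase currently being built
    i = 0          # start of the earlier substring being compared against
    k = 1          # length of the current match
    k_max = 1      # longest match found for the current phrase
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:          # match reaches the end: count the last phrase
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:             # no earlier copy extends further: close the phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(c: int, n: int, alphabet_size: int = 20) -> float:
    """Ratio LZ(w) / (n / log_q n) with q = alphabet_size."""
    return c * math.log(n, alphabet_size) / n
```

For example, `lz76("0001101001000101")` returns 6 (phrases 0·001·10·100·1000·101). `normalized_lz(90, 196)` is about 0.809, which the table shows truncated as 0.80.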

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence of the protein with 4-character Protein Data Bank code \(c\).
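These definitions translate directly into code. The minimal sketch below builds \(P_w(n)\) as a Python set of length-\(n\) slices. The function names are hypothetical.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def complexity(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|, the number of distinct length-n subwords of w."""
    return len(subwords(w, n))
```

For \(w=\) `abab`, this gives \(P_w(2)=\{ab, ba\}\), and \(p_w(n)=2\) for \(n=1,2,3\).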

\(|P_{f(2ATV_1)}(2) \setminus P_{f(5MBB_1)}(2)|=50\) and \(|P_{f(5MBB_1)}(2) \setminus P_{f(2ATV_1)}(2)|=121\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for the sets \(P_x(k)\) and \(P_y(k)\), i.e. the size of their symmetric difference. For example, \(Z_2(f(2ATV_1),f(5MBB_1))=50+121=171\).

Hydrophobic-polar version of Sequence 1 (2ATV_1; 1 = hydrophobic, 0 = polar):
1000000001101100010100110010101111101111001111011000111000101000000010100011010110011000010001010110111110010001010011110011001001001011111001010000010000100110011011000010010101001100100010000110
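The hydrophobic-polar string above can be reproduced letter by letter. This page does not list its hydrophobic set. Reading it off the 2ATV_1 sequence, the residues A, V, I, L, M, F, W, P and G map to 1, and all others (including C and Y) map to 0. The sketch below assumes exactly that set. The names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Assumed hydrophobic set, inferred from the 2ATV_1 string above (not stated on the page).
HYDROPHOBIC = set("AVILMFWPG")

def hp_encode(seq: str) -> str:
    """Map each residue to '1' (hydrophobic, per the assumed set) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)
```

For instance, `hp_encode("MHHHHHHSSG")` gives `1000000001`, the first ten symbols of the string above.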
Pair \(Z_2\) Length of longest common substring
2ATV_1,5MBB_1 171 6
2ATV_1,1LQB_1 145 2
5MBB_1,1LQB_1 178 3
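Here is a minimal sketch of the two columns. It assumes \(Z_k\) is the symmetric-difference count defined above and that the last column is the longest common contiguous subword. The function names are hypothetical. For 2ATV_1 and 5MBB_1, one shared subword of length 6 is YDITDR.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) - P_y(k)| + |P_y(k) - P_x(k)|, the size of the
    symmetric difference of the two subword sets."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: longest common suffix of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

Using sets keeps \(Z_k\) linear in the sequence lengths. The substring search is quadratic, which is still small for proteins of a few hundred residues.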

Newick tree

 
(5MBB_1:91.66,(2ATV_1:72.5,1LQB_1:72.5):19.16);

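Let \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\) be the Otu–Sayood distance \(d\).
Let \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)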
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{549}{\log_{20} 549}-\frac{196}{\log_{20}196}\right)\approx 101,\) where \(549=196+353\) is the combined length of 2ATV_1 and 5MBB_1.
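The estimate can be checked numerically. The sketch below only does arithmetic on the numbers in the formula, computing \(\log_{20}\) with `math.log(n, 20)`. The helper name is hypothetical.

```python
import math

def n_over_log20(n: int) -> float:
    """n / log_20 n, the normalizing term used in the ratio column."""
    return n / math.log(n, 20)

# 0.85 and 0.8 are the normalized LZ ratios listed for 5MBB_1 and 2ATV_1;
# 549 = 196 + 353 is the length of the concatenated pair.
estimate = 0.85 * 0.8 * (n_over_log20(549) - n_over_log20(196))
print(round(estimate, 1))
```

This evaluates to about 101.6, which the page reports as 101.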
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 2ATV_1 5MBB_1 129 98
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min\{a,b\} + (1-\alpha) \max\{a,b\}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2, \end{cases} \]
where \(a=\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(b=\mathrm{LZ}(yx)-\mathrm{LZ}(y)\).
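Below is a sketch of the interpolation. It assumes the Otu–Sayood increments \(a=\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(b=\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), so that \(d=\max\{a,b\}\) and \(d_1=a+b\). The function takes the four LZ values as inputs rather than computing them, and its name is hypothetical.

```python
def otu_sayood_delta(lz_x: int, lz_y: int, lz_xy: int, lz_yx: int, alpha: float) -> float:
    """delta = alpha * min(a, b) + (1 - alpha) * max(a, b), where
    a = LZ(xy) - LZ(x) and b = LZ(yx) - LZ(y)."""
    a = lz_xy - lz_x
    b = lz_yx - lz_y
    return alpha * min(a, b) + (1 - alpha) * max(a, b)
```

With `alpha=0` this returns \(d\). With `alpha=0.5` it returns \((a+b)/2=d_1/2\), and with `alpha=1` it returns \(\min\{a,b\}\).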