CoV2D Browser™

Parikh vectors
9IXQ_1 3PUZ_1 6IBM_1 Letter Amino acid
12 15 15 F Phenylalanine
7 8 16 W Tryptophan
6 15 15 Y Tyrosine
23 27 18 E Glutamic acid
21 37 17 K Lysine
8 21 19 P Proline
6 24 29 D Aspartic acid
10 9 22 Q Glutamine
3 3 7 H Histidine
18 30 41 L Leucine
5 6 15 M Methionine
23 11 23 S Serine
18 44 28 A Alanine
8 6 19 R Arginine
18 28 31 G Glycine
16 23 21 I Isoleucine
11 20 14 T Threonine
20 20 16 V Valine
13 21 20 N Asparagine
2 2 12 C Cysteine
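
A Parikh vector records how many times each letter occurs in a word, ignoring order. The following is a minimal Python sketch, assuming one-letter amino-acid codes and using the row order of the table above; the name `parikh_vector` is illustrative and not part of the CoV2D site.

```python
from collections import Counter

# One-letter amino-acid codes, in the row order of the table above.
ALPHABET = "FWYEKPDQHLMSARGITVNC"

def parikh_vector(sequence):
    """Return a dict mapping each amino-acid letter to its number of occurrences."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in ALPHABET}

if __name__ == "__main__":
    # First 30 residues of the 9IXQ_1 sequence listed on this page.
    prefix = "GHMSITENTSWNKEFSAEAVNGVFVLCKSS"
    for letter, count in parikh_vector(prefix).items():
        print(letter, count)
```

Applied to a full sequence, the entries sum to its length; for example, the 9IXQ_1 column above sums to 248.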

>9IXQ_1|Chains A, B|Beta-lactamase|Pseudomonas aeruginosa (287)
>3PUZ_1|Chain A[auth E]|Maltose transporter subunit; periplasmic-binding component of ABC superfamily|Escherichia coli (83333)
>6IBM_1|Chains A, B|Alpha-galactosidase A|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9IXQ , Knot 116 248 0.86 40 168 240
GHMSITENTSWNKEFSAEAVNGVFVLCKSSSKSCATNDLARASKEYLPASTFKIPSAIIGLETGVIKNEHQVFKWDGKPRAMKQWERDLTLRGAIQVSAVPVFQQIAREVGEVRMQKYLKKFSYGNQNISGGIDKFWLEGQLRISAVNQVEFLESLYLNKLSASKENQLIVKEALVTEAAPEYLVHSKTGFSGVGTESNPGVAWWVGWVEKETEVYFFAFNMDIDNESKLPLRKSIPTKIMESEGIIG
3PUZ , Knot 151 370 0.80 40 193 344
KIEEGKLVIWINGDKGYNGLAEVGKKFEKDTGIKVTVEHPDKLEEKFPQVAATGDGPDIIFWAHDRFGCYAQSGLLAEITPDKAFQDKLYPFTWDAVRYNGKLIAYPIAVEALSLIYNKDLLPNPPKTWEEIPALDKELKAKGKSALMFNLQEPYFTWPLIAADGGYAFKYENGKYDIKDVGVDNAGAKAGLTFLVDLIKNKHMNADTDYSIAEAAFNKGETAMTINGPWAWSNIDTSKVNYGVTVLPTFKGQPSKPFVGVLSAGINAASPNKELAKEFLENYLLTDEGLEAVNKDKPLGAVALKSYEEELAKDPRIAATMENAQKGEIMPNIPQMCAFWYAVRTAVINAASGRQTVDEALKDAQTRITK
6IBM , Knot 170 398 0.85 40 232 378
LDNGLARTPTMGWLHWERFMCNLDCQEEPDSCISEKLFMEMAELMVSEGWKDAGYEYLCIDDCWMAPQRDSEGRLQADPQRFPHGIRQLANYVHSKGLKLGIYADVGNKTCAGFPGSFGYYDIDAQTFADWGVDLLKFDGCYCDSLENLADGYKHMSLALNRTGRSIVYSCEWPLYMWPFQKPNYTEIRQYCNHWRNFADIDDSWKSIKSILDWTSFNQERIVDVAGPGGWNDPDMLVIGNFGLSWNQQVTQMALWAIMAAPLFMSNDLRHISPQAKALLQDKDVIAINQDPLGKQGYQLRQGDNFEVWERPLSGLAWAVAMINRQEIGGPRSYTIAVASLGKGVACNPACFITQLLPVKRKLGFYEWTSRLRSHINPTGTVLLQLENTMQMSLKDLL
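
The fourth column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). As a sanity check, it can be recomputed from the second and third columns. The sketch below is illustrative only, and the function name `normalized_lz` is hypothetical. Judging from the numbers, the tabulated values seem to be truncated rather than rounded to two decimals (for 3PUZ the ratio is about 0.806); this is an inference, not something the page states.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ(w), n) pairs copied from the table above.
rows = {"9IXQ": (116, 248), "3PUZ": (151, 370), "6IBM": (170, 398)}

for code, (lz, n) in rows.items():
    print(code, f"{normalized_lz(lz, n):.3f}")
```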

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
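
The subword sets and their cardinalities can be computed directly from the definition. This is a minimal Python sketch; the function names `subwords` and `subword_complexity` are illustrative.

```python
def subwords(w, k):
    """Return the set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

if __name__ == "__main__":
    w = "abab"
    for k in (1, 2, 3):
        print(k, sorted(subwords(w, k)), subword_complexity(w, k))
```

For \(w=\texttt{abab}\) the length-2 subwords are ab and ba, so the count is 2.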

\(|P_{f(9IXQ_1)}(2) \setminus P_{f(3PUZ_1)}(2)|=67\), \(|P_{f(3PUZ_1)}(2) \setminus P_{f(9IXQ_1)}(2)|=92\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(9IXQ_1),f(3PUZ_1))=67+92=159\).

Hydrophobic-polar version of Sequence 1:
10101000001000101011011111000000001000110100001110010110111110011100000110101010110010001010111010111110011001101010001001001000101110011101010101100101100101001010000011100111001110011000011011100001111111111000001011110101000001110001100110001111

Pair \(Z_2\) Length of longest common subword
9IXQ_1,3PUZ_1 159 4
9IXQ_1,6IBM_1 184 4
3PUZ_1,6IBM_1 173 4
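
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets; dividing it by the size of their union would give the Jaccard distance. This is a minimal Python sketch of both, with illustrative function names that are not taken from the CoV2D site.

```python
def subwords(w, k):
    """Return the set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Return |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Return Z_k(x, y) divided by the size of the union of the subword sets."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0

if __name__ == "__main__":
    # Toy example: "abba" has the extra 2-subword "bb".
    print(z_distance("abab", "abba", 2))
    print(jaccard_distance("abab", "abba", 2))
```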

Newick tree

 
[
	6IBM_1:92.32,
	[
		9IXQ_1:79.5,3PUZ_1:79.5
	]:12.82
]

Let \(d\) be the Otu--Sayood distance \(d\).
Let \(d_1\) be the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{618}{\log_{20} 618}-\frac{248}{\log_{20} 248}\right)\approx 104\), where \(618=248+370\) is the combined length of the two query sequences.
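
The arithmetic in this estimate can be reproduced in a few lines. In the sketch below the factors 0.85 and 0.8 are simply copied from the formula above, and the helper name `scale` is hypothetical.

```python
import math

def scale(n, alphabet_size=20):
    """Return n / log_b n, the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

n_short, n_long = 248, 370  # lengths of 9IXQ_1 and 3PUZ_1
estimate = 0.85 * 0.8 * (scale(n_short + n_long) - scale(n_short))

print(round(estimate))  # prints 104
```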
Status Protein1 Protein2 d d1/2
Query variables 9IXQ_1 3PUZ_1 127 108
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
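
The interpolation can be illustrated in code. The sketch below rests on assumptions the page does not confirm: that LZ-complexity means the number of phrases in the LZ76 exhaustive-history parse, that \(d=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1\) is the sum of the same two differences (the usual Otu--Sayood definitions), and that min and max in the formula refer to those two differences. All function names are illustrative.

```python
def lz76(s):
    """Count phrases in an LZ76-style parse of s (assumed variant)."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from earlier text
        # (the copy may overlap the phrase itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def differences(x, y):
    """Return (c(xy) - c(x), c(yx) - c(y))."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def otu_sayood(x, y):
    """Return (d, d1) under the assumed definitions."""
    a, b = differences(x, y)
    return max(a, b), a + b

def delta(x, y, alpha):
    """Return alpha * min + (1 - alpha) * max of the two differences."""
    a, b = differences(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    x, y = "GHMSITENTSWNKEF", "KIEEGKLVIWINGDK"  # prefixes of two sequences above
    d, d1 = otu_sayood(x, y)
    print(d, d1, delta(x, y, 0), delta(x, y, 0.5))
```

With these definitions, `delta(x, y, 0)` equals \(d\) and `delta(x, y, 0.5)` equals \(d_1/2\), which matches the two cases of the formula.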