CoV2D Browser™

Parikh vectors
8ETO_1 3QZQ_1 3ZZN_1 Letter Amino acid
66 7 21 E Glutamic acid
54 11 16 S Serine
39 3 10 Y Tyrosine
49 17 39 V Valine
30 9 8 Q Glutamine
82 18 37 L Leucine
45 8 5 K Lysine
11 6 3 M Methionine
35 9 7 F Phenylalanine
22 1 3 W Tryptophan
42 8 50 A Alanine
50 10 11 D Aspartic acid
70 19 32 G Glycine
45 8 15 P Proline
40 10 24 R Arginine
6 2 0 C Cysteine
24 7 5 H Histidine
41 11 10 I Isoleucine
38 14 12 T Threonine
30 9 2 N Asparagine
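
A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of that count, not the site's own code; the function name `parikh_vector` and the constant `ALPHABET` are hypothetical, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# Letter order taken from the rows of the table above (20 amino acids).
ALPHABET = "ESYVQLKMFWADGPRCHITN"

def parikh_vector(seq):
    """Return a dict mapping each amino-acid letter to its count in seq."""
    counts = Counter(seq)
    # Letters absent from seq get an explicit 0, as in the Cysteine row for 3ZZN_1.
    return {letter: counts.get(letter, 0) for letter in ALPHABET}

if __name__ == "__main__":
    # First eight residues of the 3QZQ_1 sequence, used as a toy input.
    print(parikh_vector("GSHMGPSL"))
```

The entries of a Parikh vector sum to the length of the word, which gives a quick sanity check against the length column of the complexity table.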

>8ETO_1|Chains A, B|Chaetomium alpha glucosidase|Thermochaetoides thermophila (759272)
>3QZQ_1|Chains A, B, C, D|3C protein|Human enterovirus 71 (39054)
>3ZZN_1|Chains A, B, C, D|LACTATE DEHYDROGENASE|THERMUS THERMOPHILUS (300852)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8ETO 311 819 0.85 40 312 757
MGILPSPGMPALLSLVSLLSVLLMGCVAETGVEGESILHSEIGRLNNQSLLWGPYRPNIYFGTRPRIGKSLMTGLMWGKIESYTDFQHTVRYTCEQNEGMKGYGWDEYDPRRGGIQSIHDIQNGLDITTSFVKIPGGAHGGSWAARIKGTLNDDAPKDQKTIVVFYVSQEGENSELEAVPSENEFGYEGDVILKGRSEALGNYKLVVTKGKGVIPQSDHDLSRLRGPGQTVVQSLTYPDEVLWQAKPILFQQLKAGIDWLVENKYDVADPPPPWQVYLLANKPGSGNVHIVQKVFEGDFEFDILFSSESAGKEVTSKDLEREVKQATEVFGERFARVFDLKAPFQGDNYKKFGKSMFSNLIGGIGYFYGHSLVDRSYAPEYDEENEGFWEDAAEARARHQEALEGPYELFTSIPSRPFFPRGFLWDEGFHLLPIADWDIDLALEIIKSWYNLMDEDGWIAREQILGAEARSKVPKEFQTQYPHYANPPTLFLVLDNFVERLRKNNASQPVVKDNLSLDETLSTASVDNPEVGLEYLRRLYPLLRRQFDWFRKTQAGDIKSYDREAYSTKEAYRWRGRTVSHCLTSGLDDYPRPQPPHPGELHVDLMSWVGVMVKSLISIGSLLGATEDVEFYTKVLDAIEHNLDDLHWSEKEGCYCDATIDEFEEHKLVCHKGYISLFPFLTGLLKPDSPKLGKLLALIGDESELWSPYGLRSLSKKDEFYGTAENYWRSPVWININYLAIVQLYNIATQDGPYKETARDLYTRLRKNIVETVYRNWEETGFAWEQYNPETGKGQRTQHFTGWTSLVVKIMSGHHHHHH
3QZQ 91 187 0.84 40 144 185
GSHMGPSLDFALSLLRRNVRQVQTDQGHFTMLGVRDRLAVLPRHSQPGKTIWIEHKLVNVLDAVELVDEQGVNLDLTLITLDTNEKFRDITKFIPENISTASDATLVINTEHMPSMFVPVGDVVQYGFLNLSGKPTHRTMMYNFPTKAGQCGGVVTSVGKIIGIHIGGNGRQGFCAGLKRSYFASEQ
3ZZN 121 310 0.74 38 146 283
MKVGIVGSGMVGSATAYALALLGVAREVVLVDLDRKLAQAHAEDILHATPFAHPVWVWAGSYGDLEGARAVVLAAGVAQRPGETRLQLLDRNAQVFAQVVPRVLEAAPEAVLLVATNPVDVMTQVAYALSGLPPGRVVGSGTILDTARFRALLAEYLRVAPQSVHAYVLGEHGDSEVLVWSSAQVGGVPLLEFAEARGRALSPEDRARIDEGVRRAAYRIIEGKGATYYGIGAGLARLVRAILTDEKGVYTVSAFTPEVAGVLEVSLSLPRILGAGGVAGTVYPSLSPEERAALRRSAEILKEAAFALGF
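
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be sketched as follows. This assumes that \(\mathrm{LZ}(w)\) is the number of components in the Lempel-Ziv (1976) exhaustive history, counting a trailing incomplete component as one; the page does not state which variant it uses, so `lz76` is an illustration rather than the site's implementation. Both function names are hypothetical.

```python
import math

def lz76(s):
    """Number of components in an LZ76-style exhaustive parsing of s."""
    i, components, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the candidate while it already occurs starting before position i
        # (occurrences may overlap the candidate itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    print(lz76("0001101001000101"))   # parsing: 0 | 001 | 10 | 100 | 1000 | 101
    print(normalized_lz(311, 819))    # LZ and length from the 8ETO row
```

With the 8ETO values from the table (311 and 819), `normalized_lz` gives roughly 0.850, in line with the tabulated 0.85.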

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
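
Read literally, the definition above describes the set of distinct length-\(k\) windows of \(w\). The sketch below follows that literal reading; the function names are hypothetical, and the tabulated values on this page may follow a different counting convention, so the code is not claimed to reproduce them.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    if k <= 0:
        return set()
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct contiguous subwords of length k in w."""
    return len(subwords(w, k))

if __name__ == "__main__":
    # "abab" has the length-2 subwords ab and ba only.
    print(subword_complexity("abab", 2))
```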

\(|P_{f(8ETO_1)}(2) \setminus P_{f(3QZQ_1)}(2)|=186\), \(|P_{f(3QZQ_1)}(2) \setminus P_{f(8ETO_1)}(2)|=18\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
111110111111101101101111101100110100110001101000011111001010110010110011011111010000010001000000001101011000010011100100100110100011011111011011101010100011000001111010001000010111000011001011101000111000111001011110000010010111001100100100111010111100101110111000001101111101011100110101011001101010101110000110010000100010010011100110110101110100000110011001111110101001100001100000001110011010100001101100110011001111011110011011111010101110110010011000111100011110100011001000010010110111110011001000010011100010100010010100101110010010111000101100001101000000100000100101001000100110001010110110101011011111100110110111100010100011011000100101000010000101001000011000101011111011101001011011111100001101011001000001010100010011110100111101001100011000010010001000110010001000111100001001010000010110011101101000000
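
The hydrophobic-polar string appears to map each residue of Sequence 1 to 1 (hydrophobic) or 0 (polar). The page does not list the classification, so the set used below (A, F, G, I, L, M, P, V, W as hydrophobic) is an assumption inferred by aligning the start of the 8ETO sequence with the start of the binary string; `HYDROPHOBIC` and `hp_encode` are hypothetical names.

```python
# Assumed hydrophobic residues, inferred from the first residues of 8ETO_1
# and their encoding; every other letter is treated as polar.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq):
    """Encode a protein sequence as a 0/1 string (1 = hydrophobic, 0 = polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in seq)

if __name__ == "__main__":
    # First ten residues of the 8ETO sequence.
    print(hp_encode("MGILPSPGMP"))
```

Applied to the first ten residues of 8ETO, this yields `1111101111`, which agrees with the first ten symbols of the string above.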
Pair \(Z_2\) Length of longest common subsequence
8ETO_1,3QZQ_1 204 5
8ETO_1,3ZZN_1 182 4
3QZQ_1,3ZZN_1 162 3
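
The \(Z_k\) column is the size of the symmetric difference of the two sets of length-\(k\) subwords, as defined above. A minimal sketch, with hypothetical function names and toy inputs rather than the protein sequences:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

if __name__ == "__main__":
    # P("abab", 2) = {ab, ba}; P("abba", 2) = {ab, bb, ba}; they differ in bb only.
    print(z_k("abab", "abba", 2))
```

For the pair 8ETO_1, 3QZQ_1 the two set differences stated above are 186 and 18, and their sum is the tabulated \(Z_2 = 204\).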

Newick tree

 
[
	8ETO_1:10.33,
	[
		3ZZN_1:81,3QZQ_1:81
	]:20.33
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1006}{\log_{20} 1006}-\frac{187}{\log_{20} 187}\right)\approx 223.\)
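
The arithmetic in this estimate can be checked with a short script. The two factors 0.85 and 0.8 and the lengths 1006 and 187 are taken from the formula as given; the function and parameter names are hypothetical.

```python
import math

def scale(n, alphabet_size=20):
    """n / log_{alphabet_size}(n), the normalizing quantity used for LZ(w)."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_long, n_short, ratio=0.85, factor=0.8):
    """ratio * factor * (scale(n_long) - scale(n_short)), as in the formula above."""
    return ratio * factor * (scale(n_long) - scale(n_short))

if __name__ == "__main__":
    print(expected_distance(1006, 187))
```

The result lies between 223 and 224, so the quoted value 223 corresponds to dropping the fractional part.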
Status Protein1 Protein2 d d1/2
Query variables 8ETO_1 3QZQ_1 282 171.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
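
The interpolation above can be illustrated with a short sketch. It assumes the usual Otu–Sayood definitions, \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=(c(xy)-c(x))+(c(yx)-c(y))\), with \(c\) an LZ76-style complexity; under that reading, min and max in the formula are the smaller and larger of the two differences. The page does not spell this out, so the definitions, the complexity variant, and all function names here are assumptions.

```python
def lz76(s):
    """Number of components in an LZ76-style exhaustive parsing of s."""
    i, components, n = 0, 0, len(s)
    while i < n:
        length = 1
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def complexity_gains(x, y, c=lz76):
    """Return (c(xy) - c(x), c(yx) - c(y))."""
    return c(x + y) - c(x), c(y + x) - c(y)

def otu_sayood(x, y, c=lz76):
    """Return (d, d1) under the assumed definitions: max and sum of the gains."""
    a, b = complexity_gains(x, y, c)
    return max(a, b), a + b

def delta(x, y, alpha, c=lz76):
    """alpha * min + (1 - alpha) * max of the two complexity gains."""
    a, b = complexity_gains(x, y, c)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    x, y = "GSHMGPSLDFALSLL", "MKVGIVGSGMVGSAT"  # toy prefixes, not full sequences
    d, d1 = otu_sayood(x, y)
    print(d, d1, delta(x, y, 0), delta(x, y, 0.5))
```

Under this reading, the query row above (d = 282, d1/2 = 171.5) would correspond to the two differences being 282 and 61, since 0.5 · 61 + 0.5 · 282 = 171.5.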