CoV2D Browser™

Parikh vectors
8TDH_1 7SOZ_1 2VWQ_1 Letter Amino acid
16 0 18 F Phenylalanine
16 0 18 S Serine
34 0 35 V Valine
12 0 6 Q Glutamine
21 0 35 E Glutamic acid
39 4 30 G Glycine
21 0 16 I Isoleucine
23 0 17 R Arginine
6 7 1 C Cysteine
17 0 10 Y Tyrosine
37 6 30 A Alanine
32 0 27 L Leucine
15 0 7 M Methionine
13 0 3 W Tryptophan
30 0 24 P Proline
23 5 20 T Threonine
19 0 7 N Asparagine
35 0 29 D Aspartic acid
11 0 12 H Histidine
31 0 12 K Lysine
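
Each column of the table above is a Parikh vector, that is, the number of occurrences of each letter in one sequence. As a minimal sketch, the following Python reproduces the 7SOZ_1 column from the sequence listed on this page. The function name `parikh_vector` and the constant `ALPHABET` are illustrative and are not part of the CoV2D site.

```python
from collections import Counter

def parikh_vector(word: str, alphabet: str) -> dict[str, int]:
    """Count occurrences of each alphabet letter in word (zero if absent)."""
    counts = Counter(word)
    return {letter: counts[letter] for letter in alphabet}

# Letter order follows the rows of the table above.
ALPHABET = "FSVQEGIRCYALMWPTNDHK"
seq_7soz = "CCTGTGACAAATTGCCCTCAGA"

vec = parikh_vector(seq_7soz, ALPHABET)
print({letter: n for letter, n in vec.items() if n})
```

The entries sum to the sequence length, which gives a quick consistency check against the Length column further down.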

>8TDH_1|Chains A, B, C, D|Predicted dehydrogenases and related proteins|Alistipes (239759)
>7SOZ_1|Chain A|DNA (5'-D(*CP*CP*TP*GP*TP*GP*AP*CP*AP*AP*AP*TP*TP*GP*CP*CP*CP*TP*CP*AP*GP*A)-3')|Escherichia coli (562)
>2VWQ_1|Chain A|GLUCOSE DEHYDROGENASE|HALOFERAX MEDITERRANEI (2252)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8TDH , Knot 188 451 0.85 40 245 431
DKLNILGVGIGGRGSSVLRGLESQNIIGLCDVDWKYADHVFKRYPAAKKYNDYRKMFDEMLKSADAVMVATADHTHAIIAADAMTAGKHVYVEKPLTHTVYESRLLTKLADKYKVATQMGNQGASDEGVRKVCEWIWNGEIGEVRKVETFTDRPIWPQGLSRPEDDQRIPKTLNWDAFIGPAPYRPYNAIYTPWNFRGWWDFGTGALGDMACHILHPVFKGLKLGYPTKVQGSSTLLLNESAPMAQTVKFVFPARDNMPKVAMPEVEVYWYDGGLKPARPEGLPAGKDLNMAGGGVIFYGTKDTLICGCYGVNPYLVSGRVPNAPKVLREIKESHQMDWVRACKEDADDRVPSASDFSEAGPFNEMVVMGVLAVRLQNLNRELLWDGPNMRFTNIPDDATISAVIKDGFHIKDGHPTFDKTWTDPVNAQQFAQELIKHTYRDGWKLPDMPR
7SOZ , Knot 10 22 0.46 8 13 19
CCTGTGACAAATTGCCCTCAGA
2VWQ , Knot 153 357 0.84 40 206 338
MKAIAVKRGEDRPVVIEKPRPEPESGEALVRTLRVGVDGTDHEVIAGGHGGFPEGEDHLVLGHEAVGVVVDPNDTELEEGDIVVPTVRRPPASGTNEYFERDQPDMAPDGMYFERGIVGAHGYMSEFFTSPEKYLVRIPRSQAELGFLIEPISITEKALEHAYASRSAFDWDPSSAFVLGNGSLGLLTLAMLKVDDKGYENLYCLGRRDRPDPTIDIIEELDATYVDSRQTPVEDVPDVYEQMDFIYEATGFPKHAIQSVQALAPNGVGALLGVPSDWAFEVDAGAFHREMVLHNKALVGSVNSHVEHFEAATVTFTKLPKWFLEDLVTGVHPLSEFEAAFDDDDTTIKTAIEFSTV

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
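
Reading \(P_w(n)\) as the set of distinct contiguous subwords of length \(n\), the counts \(p_w(2)\) and \(p_w(3)\) can be checked directly. The sketch below uses hypothetical helper names (`subwords`, `subword_complexity`) and the 7SOZ sequence from the table.

```python
def subwords(w: str, n: int) -> set[str]:
    """Return P_w(n): the distinct length-n contiguous subwords of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """Return p_w(n), the cardinality of P_w(n)."""
    return len(subwords(w, n))

seq_7soz = "CCTGTGACAAATTGCCCTCAGA"
print(subword_complexity(seq_7soz, 2))  # 13, as in the 7SOZ row
print(subword_complexity(seq_7soz, 3))  # 19, as in the 7SOZ row
```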

\(|P_{f(8TDH_1)}(2) \setminus P_{f(7SOZ_1)}(2)|=237\), \(|P_{f(7SOZ_1)}(2) \setminus P_{f(8TDH_1)}(2)|=5\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
0010111111110100110110000111100101001001100011100000000110011001011111010000111110110110010100110001000011001100001100110011000110010011101011010010010001111011001000001100101011111110010011001101011101101111011001101110110110100101000111000111100101111100011011110101010011101101011111001011111111010000110100110101101011011011001000001011010000100011010010011110011111111101001000111011010100110010101110011010010101000100110100110011000000110110110
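
The page does not say which residues it treats as hydrophobic. The split used in the sketch below was inferred by aligning the start and end of the 8TDH_1 sequence with the binary string above, so it is an assumption rather than a documented convention. The names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Assumed hydrophobic residues, inferred from this page's output (not documented there).
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in seq)

# First 16 residues of the 8TDH_1 sequence.
print(hp_encode("DKLNILGVGIGGRGSS"))  # 0010111111110100
```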
Pair \(Z_2\) Length of longest common subword (substring)
8TDH_1,7SOZ_1 242 4
8TDH_1,2VWQ_1 157 3
7SOZ_1,2VWQ_1 205 3
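
Both columns of the table above can be sketched in a few lines of Python. The last column is read here as the length of the longest common contiguous subword. That reading is an assumption based on the small values reported, since a non-contiguous common subsequence of two proteins this long would be far longer than 3. The function names and the 15-residue fragment are chosen for illustration only.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return P_w(k): the distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """Size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_substring_length(x: str, y: str) -> int:
    """Dynamic programme over common-suffix lengths, O(len(x) * len(y))."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

seq_7soz = "CCTGTGACAAATTGCCCTCAGA"
fragment_8tdh = "WWDFGTGALGDMACH"  # a short stretch of the 8TDH_1 sequence

print(z_k(seq_7soz, fragment_8tdh, 2))
print(longest_common_substring_length(seq_7soz, fragment_8tdh))  # 4, from "GTGA"
```

The shared subword GTGA in this fragment accounts for the value 4 reported for the pair 8TDH_1, 7SOZ_1.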

Newick tree

 
[
	7SOZ_1:12.28,
	[
		8TDH_1:78.5,2VWQ_1:78.5
	]:42.78
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
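
A minimal sketch of both distances follows. It assumes the unnormalised definitions of Otu and Sayood, with d the maximum and d1 the sum of the two one-sided complexity gains LZ(xy) - LZ(x) and LZ(yx) - LZ(y), and it assumes that LZ counts the components of the exhaustive LZ76 parsing, including a final incomplete component. The second assumption is consistent with the value 10 listed for 7SOZ. The function names are illustrative.

```python
def lz76_complexity(w: str) -> int:
    """Number of components in the exhaustive LZ76 parsing of w."""
    n, i, count = len(w), 0, 0
    while i < n:
        length = 1
        # Grow the component while it already occurs in the text
        # that precedes its last letter (overlaps are allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1): the max and the sum of the two one-sided gains."""
    gain_xy = lz76_complexity(x + y) - lz76_complexity(x)
    gain_yx = lz76_complexity(y + x) - lz76_complexity(y)
    return max(gain_xy, gain_yx), gain_xy + gain_yx

print(lz76_complexity("CCTGTGACAAATTGCCCTCAGA"))  # 10, as in the 7SOZ row
print(otu_sayood("CCTG", "AAAA"))
```

The substring search makes this quadratic or worse in the sequence length, which is acceptable for sequences of a few hundred residues but not for genome-scale input.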
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{473}{\log_{20} 473}-\frac{22}{\log_{20} 22}\right)\approx 141\), where \(473=451+22\) is the combined length of the two sequences.
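
The arithmetic in this estimate can be reproduced as follows. The constants 0.85 and 0.8 are copied from the formula as given, and the helper name `lz_scale` is illustrative.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_b(n), the normalising factor used in the LZ-complexity table."""
    return n / math.log(n, alphabet_size)

estimate = 0.85 * 0.8 * (lz_scale(473) - lz_scale(22))
print(round(estimate, 1))
```

The result falls between 141 and 142, so the value 141 quoted on the page corresponds to truncating rather than rounding.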
Status Protein1 Protein2 d d1/2
Query variables 8TDH_1 7SOZ_1 183 95.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
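
As a worked check of this interpolation, take the pair 8TDH_1, 7SOZ_1 from the table above. With \(d=183\) and \(d_1/2=95.5\), the larger one-sided quantity is 183 and the smaller is \(2\cdot 95.5-183=8\). The function name `delta` and its parameter names are illustrative.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# One-sided quantities recovered from the table row for 8TDH_1, 7SOZ_1.
larger, smaller = 183, 8

print(delta(0, smaller, larger))    # 183, the value of d
print(delta(0.5, smaller, larger))  # 95.5, the value of d1/2
```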