CoV2D Browser™

Parikh vectors
3EUH_1 6PBF_1 5HWT_1 Letter Amino acid
13 23 10 K Lysine
10 31 2 P Proline
23 40 6 S Serine
14 24 2 Y Tyrosine
35 43 6 R Arginine
39 31 4 D Aspartic acid
50 99 15 L Leucine
7 13 3 W Tryptophan
28 46 10 V Valine
41 54 12 A Alanine
14 26 5 N Asparagine
9 20 1 H Histidine
20 40 5 T Threonine
29 37 4 Q Glutamine
9 24 2 M Methionine
17 35 4 F Phenylalanine
26 39 8 I Isoleucine
1 15 2 C Cysteine
36 45 16 E Glutamic acid
19 45 13 G Glycine
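A Parikh vector records how many times each letter occurs in a word. The sketch below is a minimal illustration: the function name `parikh_vector` is hypothetical and not part of the CoV2D site. It recomputes the 5HWT_1 column from the 5HWT_1 sequence listed on this page.

```python
from collections import Counter

# The 20 standard amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "KPSYRDLWVANHTQMFICEG"

def parikh_vector(seq: str) -> dict[str, int]:
    """Count occurrences of each standard amino-acid letter in seq."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

# 5HWT_1 sequence as given on this page (130 residues).
seq_5hwt = ("GAMALYEFVGLLDAHGNVLEVNQVALEGGGITLEEIRGKPFWKARWWQISKKTEATQKRL"
            "VETASSGEFVRCDVEILGKSGGREVIAVDFSLLPICNEEGSIVYLLAEGRNITDKKKAEAMLALKNQELE")
vec = parikh_vector(seq_5hwt)
print(vec["L"], vec["W"], sum(vec.values()))  # 15 3 130
```

Each column of the table sums to the sequence length: 440, 730 and 130.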

>3EUH_1|Chains A, B|Chromosome partition protein mukF|Escherichia coli (83333)
>6PBF_1|Chains A, B, C, D|Transient receptor potential cation channel subfamily V member 5|Oryctolagus cuniculus (9986)
>5HWT_1|Chains A, B|Sensor histidine kinase TodS|Pseudomonas putida (strain F1 / ATCC 700007) (351746)
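These header lines use the RCSB PDB FASTA layout `>ID_entity|Chain(s) ...|description|organism (taxonomy id)`. The sketch below splits such a header into its fields. It rests on assumptions: the field names, the regular expressions, and the reading of the trailing number as an NCBI taxonomy id are illustrative choices, not an official parser.

```python
import re

def parse_pdb_fasta_header(header: str) -> dict:
    """Split an RCSB-style FASTA header into its four '|'-separated fields.

    Assumes the layout >ID|Chain(s) ...|description|organism (taxid).
    """
    entry, chains, description, organism = header.lstrip(">").split("|", 3)
    # "Chains A, B" or "Chain A" -> ["A", "B"] or ["A"]
    chain_ids = re.sub(r"^Chains?\s+", "", chains).split(", ")
    # The trailing parenthesised number is taken to be the taxonomy id.
    m = re.search(r"\s*\((\d+)\)$", organism)
    taxid = int(m.group(1)) if m else None
    species = organism[: m.start()] if m else organism
    return {"entry": entry, "chains": chain_ids, "description": description,
            "organism": species, "taxid": taxid}

h = ">5HWT_1|Chains A, B|Sensor histidine kinase TodS|Pseudomonas putida (strain F1 / ATCC 700007) (351746)"
r = parse_pdb_fasta_header(h)
print(r["entry"], r["chains"], r["taxid"])  # 5HWT_1 ['A', 'B'] 351746
```

Only the last parenthesised group is treated as the id, so strain annotations such as "(strain F1 / ATCC 700007)" remain part of the organism name.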
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3EUH , Knot 178 440 0.82 40 216 406
MSEFSQTVPELVAWARKNDFSISLPVDRLSFLLAVATLNGERLDGEMSEGELVDAFRHVSDAFEQTSETIGVRANNAINDMVRQRLLNRFTSEQAEGNAIYRLTPLGIGITDYYIRQREFSTLRLSMQLSIVAGELKRAADAAEEGGDEFHWHRNVYAPLKYSVAEIFDSIDLTQRLMDEQQQQVKDDIAQLLNKDWRAAISSCELLLSETSGTLRELQDTLEAAGDKLQANLLRIQDATMTHDDLHFVDRLVFDLQSKLDRIISWGQQSIDLWIGYDRHVHKFIRTAIDMDKNRVFAQRLRQSVQTYFDEPWALTYANADRLLDMRDEEMALRDEEVTGELPEDLEYEEFNEIREQLAAIIEEQLAVYKTRQVPLDLGLVVREYLSQYPRARHFDVARIVIDQAVRLGVAQADFTGLPAKWQPINDYGAKVQAHVIDKY
6PBF , Knot 279 730 0.84 40 305 668
MGACPPKAKGPWAQLQKLLISWPVGEQDWEQYRDRVNMLQQERIRDSPLLQAAKENDLRLLKILLLNQSCDFQQRGAVGETALHVAALYDNLEAATLLMEAAPELAKEPALCEPFVGQTALHIAVMNQNLNLVRALLARGASVSARATGAAFRRSPHNLIYYGEHPLSFAACVGSEEIVRLLIEHGADIRAQDSLGNTVLHILILQPNKTFACQMYNLLLSYDEHSDHLQSLELVPNHQGLTPFKLAGVEGNTVMFQHLMQKRKHVQWTCGPLTSTLYDLTEIDSWGEELSFLELVVSSKKREARQILEQTPVKELVSFKWKKYGRPYFCVLASLYILYMICFTTCCIYRPLKLRDDNRTDPRDITILQQKLLQEAYVTHQDNIRLVGELVTVTGAVIILLLEIPDIFRVGASRYFGQTILGGPFHVIIITYASLVLLTMVMRLTNMNGEVVPLSFALVLGWCSVMYFARGFQMLGPFTIMIQKMIFGDLMRFCWLMAVVILGFASAFHITFQTEDPNNLGEFSDYPTALFSTFELFLTIIDGPANYSVDLPFMYCITYAAFAIIATLLMLNLFIAMMGDTHWRVAQERDELWRAQVVATTVMLERKMPRFLWPRSGICGYEYGLGDRWFLRVENHHDQNPLRVLRYVEAFKCSDKEDGQEQLSEKRPSTVESGMLSRASVAFQTPSLSRTTSQSSNSHRGWEILRRNTLGHLNLGLDLGEGDGEEVYHF
5HWT , Knot 63 130 0.78 40 100 127
GAMALYEFVGLLDAHGNVLEVNQVALEGGGITLEEIRGKPFWKARWWQISKKTEATQKRLVETASSGEFVRCDVEILGKSGGREVIAVDFSLLPICNEEGSIVYLLAEGRNITDKKKAEAMLALKNQELE
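The normalized column divides LZ(w) by \(n/\log_{20} n\), which is the scale of the LZ76 complexity for a word of length \(n\) over a 20-letter alphabet. The sketch below recomputes that column from the (LZ, length) pairs in the table. The helper name `normalized_lz` is hypothetical. The tabulated values appear to be truncated, not rounded, to two decimals: for 5HWT the ratio is about 0.787.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_k n) with k = alphabet_size (20 for proteins)."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ(w), n) pairs copied from the table above.
rows = {"3EUH": (178, 440), "6PBF": (279, 730), "5HWT": (63, 130)}
for code, (lz, n) in rows.items():
    r = normalized_lz(lz, n)
    # Two-decimal truncation, followed by the value to three decimals.
    print(code, math.floor(r * 100) / 100, round(r, 3))
# 3EUH 0.82 0.822
# 6PBF 0.84 0.841
# 5HWT 0.78 0.787
```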

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-character Protein Data Bank code \(c\).
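The definitions translate directly into code. The sketch below uses hypothetical helper names `subwords` and `p`, and it collects every length-\(n\) window of \(w\) into a set.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct contiguous subwords of w of length n."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))

print(sorted(subwords("ABAAB", 2)))        # ['AA', 'AB', 'BA']
print([p("ABAAB", n) for n in (1, 2, 3)])  # [2, 3, 3]
```

The word "ABAAB" has four windows of length 2, but AB occurs twice, so \(p_w(2)=3\).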

\(|P_{f(3EUH_1)}(2) \setminus P_{f(6PBF_1)}(2)|=36\) and \(|P_{f(6PBF_1)}(2) \setminus P_{f(3EUH_1)}(2)|=125\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(3EUH_1),f(6PBF_1))=36+125=161\).

Hydrophobic-polar version of Sequence 1 (3EUH_1), with 1 for a hydrophobic residue and 0 for a polar one:
10010001101111100001010111001011111101010010101001011011001001100000011101001100110001100100001010110010111111000010000100101010101111010011011001100101000101110001101100101000110000001000110110001011100001110000101001000101110010101101001010000101100111010001001101100010111100001001100110100001110010001000100111100101001101000011100001010110010000100100011111000111000001110111110001000101001011011100110111101010111101011000110101011000
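The hydrophobic-polar string replaces each residue of 3EUH_1 by one bit. The page does not say which residues count as hydrophobic. The set `AFGILMPVW` used below is an assumption, inferred by matching the string above against the first residues of 3EUH_1. The function name `hp_encode` is also hypothetical.

```python
# Assumed hydrophobic residues, inferred from the 3EUH_1 string above;
# every other letter is treated as polar.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

prefix = "MSEFSQTVPELVAWARKNDFSISLPVDRLSFLLAVATLNGERLDGEMSEG"  # first 50 residues of 3EUH_1
print(hp_encode(prefix))
# 10010001101111100001010111001011111101010010101001
```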
Pair \(Z_2\) Length of longest common substring
3EUH_1,6PBF_1 161 4
3EUH_1,5HWT_1 182 3
6PBF_1,5HWT_1 243 3
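Both columns of this table can be computed with short routines. The last column is read here as the longest common contiguous substring. A longest common subsequence of words this long would be far longer than 4: 3EUH_1 and 6PBF_1 contain 50 and 99 leucines, so they share a subsequence of at least 50 letters. The sketch below uses hypothetical function names. `z` computes \(Z_k\) as the size of the symmetric difference of the two subword sets, and `longest_common_substring` uses the standard dynamic program in \(O(|x|\,|y|)\) time.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by |P_x(k) union P_y(k)|."""
    a, b = subwords(x, k), subwords(y, k)
    return len(a ^ b) / len(a | b)

def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1], y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("XABCY", "ZABCW", 2), longest_common_substring("XABCY", "ZABCW"))  # 4 3
```

For the first pair, the counts stated above give \(216-36=305-125=180\) shared 2-subwords, so the Jaccard distance is \(161/(161+180)=161/341\approx 0.47\).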

Newick tree

 
(
	5HWT_1:11.90,
	(
		3EUH_1:80.5,6PBF_1:80.5
	):34.40
);
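Standard Newick notation writes internal nodes in parentheses, puts each branch length after a colon, and ends the tree with a semicolon. The sketch below serializes the tree above from a nested-list representation. That representation and the name `to_newick` are hypothetical choices. Branch lengths are kept as strings so that they print exactly as shown (e.g. 11.90).

```python
from typing import Optional, Union

# A node is a leaf name, or a list of (child, branch_length) pairs.
Node = Union[str, list]

def to_newick(node: Node, length: Optional[str] = None) -> str:
    """Serialize a node (and optional branch length) in Newick syntax."""
    if isinstance(node, str):
        s = node
    else:
        s = "(" + ",".join(to_newick(child, bl) for child, bl in node) + ")"
    return s if length is None else f"{s}:{length}"

tree = [("5HWT_1", "11.90"),
        ([("3EUH_1", "80.5"), ("6PBF_1", "80.5")], "34.40")]
print(to_newick(tree) + ";")
# (5HWT_1:11.90,(3EUH_1:80.5,6PBF_1:80.5):34.40);
```

The path between 3EUH_1 and 6PBF_1 has length \(80.5+80.5=161\), which equals their \(Z_2\) value.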

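Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)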
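Both distances are built from the conditional complexities \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). \(d\) is taken here to be their maximum and \(d_1\) their sum. The sketch below assumes that LZ(w) is the Lempel–Ziv (1976) phrase count computed with the Kaspar–Schuster scan. The site does not state its exact LZ convention, so values on real proteins may differ slightly from the table. The names `lz76` and `otu_sayood` are hypothetical.

```python
def lz76(s: str) -> int:
    """Number of phrases in the LZ76 parsing of s (Kaspar-Schuster scan)."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:       # current phrase runs to the end of s
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:          # no earlier copy extends further: close the phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1) from a = LZ(xy) - LZ(x) and b = LZ(yx) - LZ(y)."""
    a = lz76(x + y) - lz76(x)
    b = lz76(y + x) - lz76(y)
    return max(a, b), a + b

print(lz76("0001101001000101"))  # 6, from the parse 0|001|10|100|1000|101
print(otu_sayood("AB", "AB"))    # (1, 2)
```

Note that \(d(x,x)\) is not always 0 under this definition, as the second example shows.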
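Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{1170}{\log_{20} 1170}-\frac{440}{\log_{20}440}\right)\approx 190\), where \(1170=440+730\) is the length of the concatenation of the two sequences.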
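The estimate can be checked directly. The constants 0.85 and 0.8 are copied from the formula and exposed as parameters; their derivation is not given here. The function name `expected_distance` is hypothetical.

```python
import math

def expected_distance(n_x: int, n_y: int, a: float = 0.85, b: float = 0.8,
                      k: int = 20) -> float:
    """a*b*((n_x+n_y)/log_k(n_x+n_y) - n_x/log_k(n_x)), as in the text."""
    n = n_x + n_y
    return a * b * (n / math.log(n, k) - n_x / math.log(n_x, k))

print(round(expected_distance(440, 730)))  # 190
```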
Status Protein1 Protein2 d d1/2
Query variables 3EUH_1 6PBF_1 242 191

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min\{a,b\} + (1-\alpha) \max\{a,b\}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2, \end{cases} \qquad \text{where } a=\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ b=\mathrm{LZ}(yx)-\mathrm{LZ}(y). \]
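The interpolation can be checked against the table. This assumes that min and max are taken over the two conditional complexities \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), so that \(d\) is their maximum and \(d_1\) their sum. From \(d=242\) and \(d_1/2=191\) it follows that \(d_1=382\) and the minimum is \(382-242=140\). The name `delta` is hypothetical.

```python
def delta(alpha: float, mn: float, mx: float) -> float:
    """delta = alpha * min + (1 - alpha) * max."""
    return alpha * mn + (1 - alpha) * mx

# 3EUH_1 vs 6PBF_1 from the table: d = max = 242, d1 = 2 * 191 = 382.
d, d1 = 242, 382
mn, mx = d1 - d, d
print(delta(0, mn, mx), delta(0.5, mn, mx))  # 242 191.0
```

Setting \(\alpha=1\) gives the minimum, 140.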