CoV2D BrowserTM

Parikh vectors
8ZYW_1 1CYX_1 4YAE_1 Letter Amino acid
26 7 19 L Leucine
16 13 13 M Methionine
2 6 4 Y Tyrosine
20 3 21 R Arginine
31 15 19 E Glutamic acid
21 12 28 G Glycine
23 15 12 S Serine
2 3 3 W Tryptophan
19 10 15 V Valine
3 2 2 C Cysteine
25 7 11 Q Glutamine
13 12 11 F Phenylalanine
10 8 12 N Asparagine
9 10 6 H Histidine
15 12 6 K Lysine
19 14 14 P Proline
13 10 17 T Threonine
17 24 37 A Alanine
20 10 18 D Aspartic acid
17 12 21 I Isoleucine
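
The per-letter counts in the table above can be recomputed from a sequence in a few lines. The following is a minimal sketch, not the site's own code; the function name `parikh_vector` and the alphabet ordering are assumptions made here for illustration.

```python
from collections import Counter

# The 20 standard amino-acid letters (ordering chosen here for illustration only).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

if __name__ == "__main__":
    # First 20 residues of the 8ZYW_1 sequence as listed on this page.
    prefix = "MDDEDNKCDCPPPGLPLWMG"
    vec = parikh_vector(prefix)
    # Show only the letters that actually occur in the prefix.
    print({letter: n for letter, n in vec.items() if n})
```

Applied to a full sequence, the entries of the vector sum to the sequence length; for example, the 1CYX_1 column of the table sums to 205.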

>8ZYW_1|Chains A, B|PomB|Vibrio alginolyticus (663)
>1CYX_1|Chain A|CYOA|Escherichia coli (562)
>4YAE_1|Chains A, B|C alpha-dehydrogenase|Sphingobium sp. SYK-6 (627192)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8ZYW , Knot 138 321 0.82 40 193 302
MDDEDNKCDCPPPGLPLWMGTFADLMSLLMCFFVLLLSFSEMDVLKFKQIAGSMKFAFGVQNQLEVKDIPKGTSIIAQEFRPGRPEPTPIDVIMQQTMDITQQTLEFHEGESDRAGGTKRDEGKLTGGQSPATSTQNNESAEADMQQQQSKEMSQEMETLMESIKKALEREIEQGAIEVENLGQQIVIRMREKGAFPEGSAFLQPKFRPLVRQIAELVKDVPGIVRVSGHTDNRPLDSELYRSNWDLSSQRAVSVAQEMEKVRGFSHQRLRVRGMADTEPLLPNDSDDNRALNRRVEISIMQGEPLYSEEVPVIQHHHHHH
1CYX , Knot 99 205 0.85 40 148 196
THALEPSKPLAHDEKPITIEVVSMDWKWFFIYPEQGIATVNEIAFPANTPVYFKVTSNSVMHSFFIPRLGSQIYAMAGMQTRLHLIANEPGTYDGICAEICGPGHSGMKFKAIATPDRAAFDQWVAKAKQSPNTMSDMAAFEKLAAPSEYNQVEYFSNVKPDLFADVINKFMAHGKSMDMTQPEGEHSAHEGMEGMDMSHAESAH
4YAE , Knot 128 289 0.83 40 178 276
MDIAGTTAFITGGASGIGFGIAQRLLANGARLVLADIRQDHLDEARQFFEERQQGRNVHTIRLDVSDRAQMAEAARECEAVMGGPDILINNAGIDPSGPFKDATYQDWDYGLAINLMGPINGIMAFTPGMRARGRGGHIVNTASLAGLTPMPSFMAIYATAKAAVITLTETIRDSMAEDNIGVTVLMPGPIKSRIHESGQNRPERFRAGSGLAETEQQLAKRVVADNWMEPTEVGDMIVDAIVHNKLYVSTHGNWRETCEARFQALLDSMPEARPFDFGASLAVPKEEA
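
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be checked directly from the LZ-complexity and length columns. The sketch below also includes one common way to count phrases in an LZ76-style left-to-right parse; which exact Lempel-Ziv variant this page uses is not stated, so `lz76_complexity` is an assumption and is not claimed to reproduce the table's counts. Both function names are hypothetical.

```python
import math

def lz76_complexity(w):
    """Count phrases in a left-to-right LZ76-style parse of `w` (assumed variant).

    Each phrase is the shortest prefix of the unparsed remainder that does not
    already occur starting at an earlier position (overlap allowed).
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while the candidate phrase can be copied from an earlier start.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_{alphabet_size} n)."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    # Rows of the table above: (LZ-complexity, length).
    for lz, n in [(138, 321), (99, 205), (128, 289)]:
        print(lz, n, normalized_lz(lz, n))
```

With the table's own LZ-complexity and length values, the ratios come out just above 0.82, 0.85 and 0.83, which is consistent with the displayed figures if they are truncated to two decimals.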

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
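
As an illustration of these definitions, here is a minimal sketch that reads \(P_w(n)\) as the set of distinct contiguous subwords of length \(n\). The function names `subwords` and `subword_complexity` are hypothetical, and the sketch is not claimed to reproduce the values shown in the table.

```python
def subwords(w, n):
    """Set of distinct contiguous subwords of length n in w (empty if n > len(w))."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Cardinality p_w(n) of the set of length-n subwords of w."""
    return len(subwords(w, n))

if __name__ == "__main__":
    w = "ABAB"
    print(sorted(subwords(w, 2)))
    print(subword_complexity(w, 1), subword_complexity(w, 2))
```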

\(|P_{f(8ZYW_1)}(2) \setminus P_{f(1CYX_1)}(2)|=106\), \(|P_{f(1CYX_1)}(2) \setminus P_{f(8ZYW_1)}(2)|=61\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100000000011111111110110110111011111101001011010011101011111000101001101001110010110101011011100010100001010010000111000001010110011000000001010100000001000100110010011000100111010011001110100011110101110101011100110110011111010100000110001000010100001101100100101100001010111000111100000001100010101101011000011110000000
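
The page does not state which residues count as hydrophobic. Comparing the binary string with Sequence 1 letter by letter suggests that A, F, G, I, L, M, P, V and W are sent to 1 and every other residue to 0; that assignment is an inference, and the names `HYDROPHOBIC` and `hp_encode` are hypothetical. A minimal sketch under that assumption:

```python
# Residues mapped to 1. Inferred from the displayed string, not from the
# site's source code, so treat this set as an assumption.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' (assumed hydrophobic) or '0' (everything else)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence)

if __name__ == "__main__":
    # First 20 residues of Sequence 1 (8ZYW_1).
    print(hp_encode("MDDEDNKCDCPPPGLPLWMG"))
```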
Pair \(Z_2\) Length of longest common subword
8ZYW_1,1CYX_1 167 3
8ZYW_1,4YAE_1 159 4
1CYX_1,4YAE_1 178 3
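
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two length-\(k\) subword sets. A minimal sketch, with hypothetical function names `subwords` and `z_distance`:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

if __name__ == "__main__":
    # Toy example: {"AA", "AB"} versus {"AB", "BB"} differ in two subwords.
    print(z_distance("AAB", "ABB", 2))
```

For the pair 8ZYW_1, 1CYX_1 the two one-sided counts reported above, 106 and 61, add up to the tabulated \(Z_2 = 167\). Dividing \(Z_k(x,y)\) by \(|P_x(k)\cup P_y(k)|\) would give the usual Jaccard distance.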

Newick tree

 
(1CYX_1:88.44,(8ZYW_1:79.5,4YAE_1:79.5):8.94);

Let \(d\) be the Otu--Sayood distance \(d\).
Let \(d_1\) be the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{526}{\log_{20} 526}-\frac{205}{\log_{20}205})\approx 92.5\)
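
The arithmetic in this estimate can be checked numerically. In the sketch below the helper name `capacity` is hypothetical, and the factors 0.85 and 0.8 are taken from the expression as given; note that 526 = 321 + 205 is the sum of the two sequence lengths.

```python
import math

def capacity(n, alphabet_size=20):
    """Return n / log_{alphabet_size} n, the normalizer used on this page."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_total=526, n_single=205, factors=(0.85, 0.8)):
    """Evaluate the rough estimate (0.85)(0.8)(capacity(526) - capacity(205))."""
    scale = factors[0] * factors[1]
    return scale * (capacity(n_total) - capacity(n_single))

if __name__ == "__main__":
    print(expected_distance())
```

The result is about 92.6, in line with the displayed 92.5 if that figure is truncated rather than rounded.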
Status Protein1 Protein2 d d1/2
Query variables 8ZYW_1 1CYX_1 115 93.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
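
To make the interpolation concrete, the sketch below assumes that min and max denote the smaller and larger of two directional quantities, which this page does not spell out. Under that reading, \(d\) is the larger one and \(d_1/2\) is their average, so the smaller one can be recovered from the two tabulated values. The names `delta` and `directional_pair` are hypothetical.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    lo, hi = min(a, b), max(a, b)
    return alpha * lo + (1 - alpha) * hi

def directional_pair(d, half_d1):
    """Recover (min, max) assuming d = max and d1/2 = (min + max) / 2."""
    return 2 * half_d1 - d, d

if __name__ == "__main__":
    # Values from the 8ZYW_1 / 1CYX_1 row: d = 115 and d1/2 = 93.5.
    lo, hi = directional_pair(115, 93.5)
    print(lo, hi)
    print(delta(0, lo, hi), delta(0.5, lo, hi))
```

With \(d = 115\) and \(d_1/2 = 93.5\), this reading gives a smaller quantity of 72 and hence \(d_1 = 187\).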