CoV2D Browser™

Parikh vectors
8XFW_1 1GAK_1 1QHN_1 Letter Amino acid
19 6 1 N Asparagine
27 4 26 A Alanine
9 2 3 C Cysteine
26 2 11 I Isoleucine
6 9 6 M Methionine
22 4 5 T Threonine
11 1 5 W Tryptophan
21 17 11 R Arginine
31 6 15 E Glutamic acid
10 2 3 H Histidine
53 10 10 L Leucine
28 5 5 P Proline
5 8 2 Y Tyrosine
36 13 20 V Valine
21 8 11 D Aspartic acid
32 10 22 G Glycine
24 15 4 K Lysine
23 9 5 F Phenylalanine
40 5 9 S Serine
15 5 4 Q Glutamine
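
Each column of the table above is a Parikh vector, that is, the number of occurrences of each letter in the sequence. A minimal Python sketch of that count follows; the name `parikh_vector` is illustrative and is not taken from the CoV2D site.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes

def parikh_vector(seq):
    """Count how often each amino-acid letter occurs in seq."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

# Sequence 1GAK_1 as listed on this page.
SEQ_1GAK = (
    "FDDVVVSRQEQSYVQRGMVNFLDEEMHKLVKRFRDMRWNLGPGFVFLLKK"
    "VNRERMMRYCMDYARYSKKILQLKHLPVNKKTLTKMGRFVGYRNYGVIRE"
    "LYADVFRDVQGFRGPKMTAAMRKYSSKDPGTFPCKNEKRRG"
)

pv = parikh_vector(SEQ_1GAK)
print(pv["R"], pv["K"], sum(pv.values()))
```

The entries sum to the sequence length, which gives a quick consistency check against the Length column of the sequence table.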

>8XFW_1|Chains A, B|UDP-glycosyltransferase 13|Mangifera indica (29780)
>1GAK_1|Chain A|FERTILIZATION PROTEIN|Haliotis fulgens (6456)
>1QHN_1|Chain A|CHLORAMPHENICOL PHOSPHOTRANSFERASE|Streptomyces venezuelae (54571)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8XFW , Knot 186 459 0.82 40 231 432
ALNSCPHVALLLSSGMGHLTPCLRFAATLVQHHCRVTIITNYPTVSVAESRAISLLLSDFPQITEKQFHLLPFDPSTANTTDPFFLRWEAIRRSAHLLNPLLSSISPPLSALVIDDSLVSSFVPVAANLDLPSYVLFTSSTRACSLEETFPAFVASKTNFDSIQLDDVIEIPGFSPVPVSSVPPTFLNLNHLFTTMLIQNGQSFRKANGILINTFEALEGGILPGINDKRAADGLPPYCSVGPLLPCKFEKTECSAPVKWLDDQPEGSVVYVSFGSRFALSSEQIKELGDGLIRSGCRFLWVVKCKKVDQEDEESLDELLGRDVLEKIKKYGFVIKNWVNQQEILDHRAVGGFVTHGGWNSSMEAVWHGVPMLVWPQFGDQKINAEVIERSGLGMWVKRWGWGTQQLVKGEEIGERIKDLMGNNPLRVRAKTLREEARKAIEVGGSSEKTLKELIENWK
1GAK , Knot 68 141 0.79 40 110 138
FDDVVVSRQEQSYVQRGMVNFLDEEMHKLVKRFRDMRWNLGPGFVFLLKKVNRERMMRYCMDYARYSKKILQLKHLPVNKKTLTKMGRFVGYRNYGVIRELYADVFRDVQGFRGPKMTAAMRKYSSKDPGTFPCKNEKRRG
1QHN , Knot 79 178 0.76 40 124 171
MTTRMIILNGGSSAGKSGIVRCLQSVLPEPWLAFGVDSLIEAMPLKMQSAEGGIEFDADGGVSIGPEFRALEGAWAEGVVAMARAGARIIIDDVFLGGAAAQERWRSFVGDLDVLWVGVRCDGAVAEGRETARGDRVAGMAAKQAYVVHEGVEYDVEVDTTHKESIECAWAIAAHVVP
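
The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below computes that ratio and, as an assumption, takes LZ-complexity to mean the number of phrases in the LZ76 exhaustive-history parse; the page does not state which variant it uses, so `lz76` may not reproduce the tabulated counts. Both function names are illustrative.

```python
import math

def lz76(w):
    """Number of phrases in the LZ76 exhaustive-history parse of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text;
        # the earlier occurrence is allowed to overlap the phrase itself.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_alphabet_size(n)."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76("0001101001000101"))           # parse 0|001|10|100|1000|101
print(normalized_lz(186, 459))            # row 8XFW
print(normalized_lz(68, 141))             # row 1GAK
print(normalized_lz(79, 178))             # row 1QHN
```

With the tabulated LZ and length values, the ratios come out near 0.829, 0.797 and 0.768, so the table's 0.82, 0.79 and 0.76 appear to be truncated rather than rounded.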

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
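
As an illustration of these definitions, the following Python sketch collects the distinct contiguous subwords of a given length and counts them. The names `subwords` and `subword_complexity` are hypothetical helpers, not part of the site.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct subwords of length k in w."""
    return len(subwords(w, k))

print(sorted(subwords("abcabc", 3)))
print(subword_complexity("abab", 2))
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), so values close to that bound indicate few repeated subwords.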

\(|P_{f(8XFW_1)}(2) \setminus P_{f(1GAK_1)}(2)|=155\), \(|P_{f(1GAK_1)}(2) \setminus P_{f(8XFW_1)}(2)|=34\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for example, \(Z_2(f(8XFW_1),f(1GAK_1))=155+34=189\).

Hydrophobic-polar version of Sequence 1:
110001011111001110101010111011000001011000101011000110111001101000010111101001000011110101100010110111001011101111000110011111101011001110000010010001111110000100101001101111011110011101101001100111001001001011110010110111111100001101111000111111001000000111011000101011010110011100001001101110010011111000010000000100111001100100011110011000011000111111001110001011101111111101100010101100011111100111100011010011001001110011010100100010011011100000100110010
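
The hydrophobic-polar string can be reproduced by mapping each residue to 1 or 0. The page does not state which residues it treats as hydrophobic; the set used below (A, F, G, I, L, M, P, V, W) is an assumption inferred by comparing the start of sequence 8XFW_1 with the start of the binary string, and `hp_encode` is a hypothetical name.

```python
# Assumed hydrophobic set, inferred from this page's output rather than documented.
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp_encode(seq, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in hydrophobic else "0" for aa in seq)

# First 18 residues of 8XFW_1.
print(hp_encode("ALNSCPHVALLLSSGMGH"))  # 110001011111001110
```

Passing a different `hydrophobic` set makes it easy to try other hydrophobicity classifications.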
Pair \(Z_2\) Length of longest common subword
8XFW_1,1GAK_1 189 3
8XFW_1,1QHN_1 185 4
1GAK_1,1QHN_1 160 3
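
The two columns of this table can be sketched in Python as follows: \(Z_k\) is the size of the symmetric difference of the two subword sets, and the last column is read here as the length of the longest common contiguous subword, which is an assumption based on the small tabulated values. The function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|, a symmetric-difference size."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common block that ended just before both positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_k("abab", "abba", 2))
print(longest_common_subword("ALNSC", "XXNSCY"))
```

The dynamic program uses time proportional to the product of the two lengths, which is ample for sequences of a few hundred residues.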

Newick tree

 
[
	8XFW_1:97.59,
	[
		1QHN_1:80,1GAK_1:80
	]:17.59
]

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{600}{\log_{20} 600}-\frac{141}{\log_{20}141}\right)\approx 133.\)
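
The arithmetic in this estimate can be checked with a few lines of Python. The helper name `n_over_log` is hypothetical; the constants 0.85, 0.8, 600 and 141 are taken from the formula as written.

```python
import math

def n_over_log(n, alphabet_size=20):
    """n / log_alphabet_size(n), the normalizing quantity used on this page."""
    return n / math.log(n, alphabet_size)

expected = 0.85 * 0.8 * (n_over_log(600) - n_over_log(141))
print(round(expected))  # 133
```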
Status Protein1 Protein2 d d1/2
Query variables 8XFW_1 1GAK_1 166 108.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
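
The interpolation can be illustrated with a short Python sketch. It assumes that min and max range over the two one-sided quantities from which d and d1 are built, which this page does not spell out; the inputs 166 and 51 are back-computed from the row above (d = 166, d1/2 = 108.5) under that assumption, and `delta` is an illustrative name.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Assumed one-sided values for the pair 8XFW_1, 1GAK_1: max = 166, min = 2 * 108.5 - 166 = 51.
print(delta(0, 166, 51))    # alpha = 0 recovers d
print(delta(0.5, 166, 51))  # alpha = 1/2 recovers d1/2
```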