CoV2D Browser™

Parikh vectors
9JDA_1 5XOJ_1 2QFC_1 Letter Amino acid
19 8 11 D Aspartic acid
13 3 21 Q Glutamine
6 4 12 H Histidine
43 12 9 F Phenylalanine
18 2 5 M Methionine
34 8 24 Y Tyrosine
19 8 11 R Arginine
26 10 32 E Glutamic acid
53 14 9 G Glycine
27 18 29 K Lysine
20 3 2 W Tryptophan
34 12 19 A Alanine
19 3 5 C Cysteine
42 6 13 S Serine
32 12 2 T Threonine
49 16 14 V Valine
17 11 13 N Asparagine
44 11 27 I Isoleucine
70 13 32 L Leucine
35 8 3 P Proline
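A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal Python sketch of how one column of the table could be recomputed; the name `parikh_vector` and the alphabet ordering are illustrative choices, not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (ordering is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return {letter: count} for every letter of the alphabet, zeros included."""
    counts = Counter(seq)
    return {a: counts.get(a, 0) for a in alphabet}

if __name__ == "__main__":
    v = parikh_vector("MSAPAANGEV")  # first ten residues of 5XOJ_1
    print(v["A"], v["M"], v["W"])    # prints: 3 1 0
```

Applying the same function to a full FASTA sequence would yield one column of counts.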

>9JDA_1|Chain A|Sodium- and chloride-dependent taurine transporter|Homo sapiens (9606)
>5XOJ_1|Chain A|GTP-binding nuclear protein|Saccharomyces cerevisiae (strain AWRI796) (764097)
>2QFC_1|Chains A, B|PlcR protein|Bacillus thuringiensis serovar israelensis ATCC 35646 (339854)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9JDA , Knot 245 620 0.84 40 280 577
MATKEKLQCLKDFHKDILKPSPGKSPGTRPEDEAEGKPPQREKWSSKIDFVLSVAGGFVGLGNVWRFPYLCYKNGGGAFLIPYFIFLFGSGLPVFFLEIIIGQYTSEGGITCWEKICPLFSGIGYASVVIVSLLNVYYIVILAWATYYLFQSFQKELPWAHCNHSWNTPHCMEDTMRKNKSVWITISSTNFTSPVIEFWERNVLSLSPGIDHPGSLKWDLALCLLLVWLVCFFCIWKGVRSTGKVVYFTATFPFAMLLVLLVRGLTLPGAGAGIKFYLYPDITRLEDPQVWIDAGTQIFFSYAICLGAMTSLGSYNKYKYNSYRDCMLLGCLNSGTSFVSGFAIFSILGFMAQEQGVDIADVAESGPGLAFIAYPKAVTMMPLPTFWSILFFIMLLLLGLDSQFVEVEGQITSLVDLYPSFLRKGYRREIFIAFVCSISYLLGLTMVTEGGMYVFQLFDYYAASGVCLLWVAFFECFVIAWIYGGDNLYDGIEDMIGYRPGPWMKYSWAVITPVLCVGCFIFSLVKYVPLTYNKTYVYPNWAIGLGWSLALSSMLCVPLVIVIRLCQTEGPFLVRVKYLLTPREPNRWAVEREGATPYNSRTVMNGALVKPTHIIVETMM
5XOJ , Knot 90 182 0.85 40 136 178
MSAPAANGEVPTFKLVLVGDGGTGKTTFVKRHLTGEFEKKYIATIGVEVHPLSFYTNFGEIKFDVWDTAGLEKFGGLRDGYYINAQCAIIMFDVTSRITYKNVPNWHRDLVRVCENIPIVLCGNKVDVKERKVKAKTITFHRKKNLQYYDISAKSNYNFEKPFLWLARKLAGNPQLEFVASP
2QFC , Knot 127 293 0.82 40 170 275
MQAEKLGSEIKKIRVLRGLTQKQLSENICHQSEVSRIESGAVYPSMDILQGIAAKLQIPIIHFYEVLIYSDIERKKQFKDQVIMLCKQKRYKEIYNKVWNELKKEEYHPEFQQFLQWQYYVAAYVLKKVDYEYCILELKKLLNQQLTGIDVYQNLYIENAIANIYAENGYLKKGIDLFEQILKQLEALHDNEEFDVKVRYNHAKALYLDSRYEESLYQVNKAIEISCRINSMALIGQLYYQRGECLRKLEYEEAEIEDAYKKASFFFDILEMHAYKEALVNKISRLEHHHHHH
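The complexity and normalized columns can be illustrated with a short Python sketch. It assumes that \(\mathrm{LZ}(w)\) is the number of phrases in the Lempel-Ziv (1976) exhaustive parsing, where each phrase is the shortest prefix of the remaining text that has not already occurred starting at an earlier position. Whether the site uses exactly this variant is an assumption, and the names `lz76` and `normalized_lz` are hypothetical.

```python
import math

def lz76(w):
    """Number of phrases in a Lempel-Ziv 1976 style parsing of w (assumed variant)."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while w[i:i+length] also occurs starting before i
        # (overlap with the phrase itself is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(c, n, alphabet_size=20):
    """c divided by n / log_{alphabet_size}(n), as in the table header."""
    return c / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    print(lz76("0001101001000101"))   # 6 phrases: 0|001|10|100|1000|101
    print(normalized_lz(245, 620))    # about 0.848
```

With the tabulated values for 9JDA, `normalized_lz(245, 620)` is about 0.848, which is consistent with the 0.84 shown if the table truncates to two decimals.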

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
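As a small illustration of these definitions, here is a Python sketch that collects the length-\(n\) subwords of a word with a sliding window. The function names `subwords` and `subword_complexity` are hypothetical, and subwords are taken to have length exactly \(n\).

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

if __name__ == "__main__":
    print(sorted(subwords("abab", 2)))    # ['ab', 'ba']
    print(subword_complexity("abab", 2))  # 2
```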

\(|P_{f(9JDA_1)}(2) \setminus P_{f(5XOJ_1)}(2)|=174\), \(|P_{f(5XOJ_1)}(2) \setminus P_{f(9JDA_1)}(2)|=30\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (9JDA_1):
11000010010010001101011001100100010101100001000101110111111111011011010000111111110111111011111110111100000111001001011101110101111011010011111110001100100011110000010010010001000001110100001001110110001101011100110101011101111111011011011000101101010111111111110110111111110101010100100101110110011100110111100110000000000000111101001001101111101111110001101101100111111110101101111101101111111111110001101010100110101011001000011111100100111101100111011011000110110111111100111111011001001100111001111100011110111011011101100111000000101011111110111001101111111010000111110100110100100111000110100000110111101001110011
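The hydrophobic-polar encoding can be sketched in Python as a per-residue lookup. The hydrophobic set used below (A, F, G, I, L, M, P, V, W mapped to 1, everything else to 0) is an assumption inferred by comparing the start of the binary string with the start of the 9JDA_1 sequence; the page does not state its classification, and `hp_encode` is a hypothetical name.

```python
# Assumed hydrophobic residues, inferred from the displayed string (not stated by the source).
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq):
    """Map each residue to '1' (hydrophobic) or '0' (polar) under the assumed classes."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

if __name__ == "__main__":
    # First 20 residues of 9JDA_1; reproduces the first 20 digits of the string above.
    print(hp_encode("MATKEKLQCLKDFHKDILKP"))  # 11000010010010001101
```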
Pair \(Z_2\) Length of longest common subword (contiguous)
9JDA_1,5XOJ_1 204 4
9JDA_1,2QFC_1 202 4
5XOJ_1,2QFC_1 170 3
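The \(Z_k\) column is the size of the symmetric difference of the two subword sets; for example, the first row is \(174+30=204\). A minimal Python sketch follows, with hypothetical function names; dividing by the size of the union gives the corresponding Jaccard distance.

```python
def subwords(w, k):
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Z_k(x, y) divided by |P_x(k) union P_y(k)|; defined as 0.0 if both sets are empty."""
    union = subwords(x, k) | subwords(y, k)
    return z(x, y, k) / len(union) if union else 0.0

if __name__ == "__main__":
    print(z("abab", "abba", 2))  # 1, because only "bb" is unshared
```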

Newick tree

 
(9JDA_1:10.43,(2QFC_1:85,5XOJ_1:85):21.43);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
A rough expected distance is \((0.85)(0.8)\left(\frac{802}{\log_{20} 802}-\frac{182}{\log_{20}182}\right)\approx 173\), where \(802=620+182\) is the sum of the lengths of 9JDA_1 and 5XOJ_1.
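The arithmetic in this estimate can be checked with a few lines of Python. The factors 0.85 and 0.8 are taken as given from the formula; `expected_lz` is a hypothetical helper name for \(n/\log_{20} n\).

```python
import math

def expected_lz(n, alphabet_size=20):
    """n / log_{alphabet_size}(n), the normalizing quantity used on this page."""
    return n / math.log(n, alphabet_size)

# 802 = 620 + 182 (lengths of 9JDA_1 and 5XOJ_1); 182 is the length of 5XOJ_1.
estimate = 0.85 * 0.8 * (expected_lz(802) - expected_lz(182))

if __name__ == "__main__":
    print(round(estimate))  # 173
```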
Status Protein1 Protein2 d d1/2
Query variables 9JDA_1 5XOJ_1 218 140
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
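To make the interpolation concrete, here is a Python sketch under two assumptions not confirmed by this page: that \(d\) and \(d_1\) follow the usual Otu–Sayood definitions, \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=c(xy)-c(x)+c(yx)-c(y)\), and that the complexity \(c\) is a Lempel-Ziv 1976 phrase count. The names `lz76` and `delta` are hypothetical.

```python
def lz76(w):
    """Number of phrases in a Lempel-Ziv 1976 style parsing of w (assumed variant)."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while the candidate phrase also occurs starting before position i.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two one-sided complexity increments."""
    a = lz76(x + y) - lz76(x)  # new phrases needed for y after x
    b = lz76(y + x) - lz76(y)  # new phrases needed for x after y
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    print(delta("aaaa", "b", 0))    # 2, the max, playing the role of d
    print(delta("aaaa", "b", 0.5))  # 1.0, the average, playing the role of d_1/2
```

With \(\alpha=0\) only the larger increment counts, while \(\alpha=1/2\) averages the two increments, which is why the second case equals \(d_1/2\).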