CoV2D BrowserTM

Parikh vectors
6OPC_1 2APJ_1 7JUT_1 Letter Amino acid
40 19 19 S Serine
18 5 10 Y Tyrosine
11 8 17 H Histidine
49 14 23 K Lysine
41 14 14 P Proline
39 7 14 T Threonine
23 5 11 M Methionine
28 4 17 F Phenylalanine
69 15 17 D Aspartic acid
4 6 4 C Cysteine
59 25 23 G Glycine
64 17 15 V Valine
81 19 19 A Alanine
51 12 21 R Arginine
73 18 25 E Glutamic acid
51 17 25 I Isoleucine
69 27 34 L Leucine
3 5 8 W Tryptophan
42 14 10 N Asparagine
20 9 16 Q Glutamine
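
The Parikh vector of a sequence counts how often each letter occurs, so its components sum to the sequence length (the 6OPC_1 column above sums to 835). A minimal Python sketch of that count follows; the function name `parikh_vector` and the constant `AMINO_ACIDS` are hypothetical names chosen for illustration, not taken from the CoV2D code.

```python
from collections import Counter

# One-letter amino-acid codes, in the row order of the table above.
# (Illustrative constant; the ordering is only a presentation choice.)
AMINO_ACIDS = "SYHKPTMFDCGVAREILWNQ"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the letter counts of `sequence`, one entry per alphabet letter."""
    counts = Counter(sequence)
    return [counts.get(letter, 0) for letter in alphabet]

if __name__ == "__main__":
    # First six residues of the 6OPC_1 sequence, used as a toy input.
    vector = parikh_vector("MGEEHK")
    print(dict(zip(AMINO_ACIDS, vector)))
```

Applying `parikh_vector` to a full FASTA sequence should give one column of the table, assuming the sequence contains only the 20 standard letters.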

>6OPC_1|Chains A, B, C, D, E, F|Cell division control protein 48|Saccharomyces cerevisiae (4932)
>2APJ_1|Chains A, B, C, D|Putative Esterase|Arabidopsis thaliana (3702)
>7JUT_1|Chain A[auth B]|Kinase suppressor of Ras 2|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6OPC , Knot 305 835 0.82 40 274 729
MGEEHKPLLDASGVDPREEDKTATAILRRKKKDNMLLVDDAINDDNSVIAINSNTMDKLELFRGDTVLVKGKKRKDTVLIVLIDDELEDGACRINRVVRNNLRIRLGDLVTIHPCPDIKYATRISVLPIADTIEGITGNLFDVFLKPYFVEAYRPVRKGDHFVVRGGMRQVEFKVVDVEPEEYAVVAQDTIIHWEGEPINREDEENNMNEVGYDDIGGCRKQMAQIREMVELPLRHPQLFKAIGIKPPRGVLMYGPPGTGKTLMARAVANETGAFFFLINGPEVMSKMAGESESNLRKAFEEAEKNAPAIIFIDEIDSIAPKRDKTNGEVERRVVSQLLTLMDGMKARSNVVVIAATNRPNSIDPALRRFGRFDREVDIGIPDATGRLEVLRIHTKNMKLADDVDLEALAAETHGYVGADIASLCSEAAMQQIREKMDLIDLDEDEIDAEVLDSLGVTMDNFRFALGNSNPSALRETVVESVNVTWDDVGGLDEIKEELKETVEYPVLHPDQYTKFGLSPSKGVLFYGPPGTGKTLLAKAVATEVSANFISVKGPELLSMWYGESESNIRDIFDKARAAAPTVVFLDELDSIAKARGGSLGDAGGASDRVVNQLLTEMDGMNAKKNVFVIGATNRPDQIDPAILRPGRLDQLIYVPLPDENARLSILNAQLRKTPLEPGLELTAIAKATQGFSGADLLYIVQRAAKYAIKDSIEAHRQHEAEKEVKVEGEDVEMTDEGAKAEQEPEVDPVPYITKEHFAEAMKTAKRSVSDAELRRYEAYSQQMKASRGQFSNFNFNDAPLGTTATDNANSNNSAPSGAGAAFGSNAEEDDDLYS
2APJ , Knot 114 260 0.81 40 172 252
MEGGSITPGEDKPEIQSPIPPNQIFILSGQSNMAGRGGVFKDHHNNRWVWDKILPPECAPNSSILRLSADLRWEEAHEPLHVDIDTGKVCGVGPGMAFANAVKNRLETDSAVIGLVPCASGGTAIKEWERGSHLYERMVKRTEESRKCGGEIKAVLWYQGESDVLDIHDAESYGNNMDRLIKNLRHDLNLPSLPIIQVAIASGGGYIDKVREAQLGLKLSNVVCVDAKGLPLKSDNLHLTTEAQVQLGLSLAQAYLSNFC
7JUT , Knot 149 342 0.84 40 215 326
MSYYHHHHHHDYDIPTTENLYFQGAEMNLSLLSARSFPRKASQTSIFLQEWDIPFEQLEIGELIGKGRFGQVYHGRWHGEVAIRLIDIERDNEDQLKAFKREVMAYRQTRHENVVLFMGACMSPPHLAIITSLCKGRTLYSVVRDAKIVLDVNKTRQIAQEIVKGMGYLHAKGILHKDLKSKNVFYDNGKVVITDFGLFSISGVLQAGRREDKLRIQNGWLCHLAPEIIRQLSPDTEEDKLPFSKHSDVFALGTIWYELHAREWPFKTQPAEAIIWQMGTGMKPNLSQIGMGKEISDILLFCWAFEQEERPTFTKLMDMLEKLPKRNRRLSHPGHFWKSAEL
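
The LZ-complexity and normalized-ratio columns can be sketched as follows. This assumes that \(\mathrm{LZ}(w)\) is the number of phrases in the Lempel-Ziv 1976 exhaustive-history parsing, which is one common reading of "LZ-complexity"; the page's own implementation may differ in details, so the sketch is not guaranteed to reproduce the tabulated counts. The function names are hypothetical.

```python
import math

def lz76_complexity(w):
    """Count the phrases in an LZ76-style exhaustive-history parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from the text seen so
        # far (the prefix that ends one symbol before the phrase's last symbol).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n) for alphabet size b (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    # Values from the 6OPC row of the table: LZ(w) = 305, n = 835.
    print(round(normalized_lz(305, 835), 2))
```

The substring test inside the loop is quadratic or worse in the sequence length, which is acceptable for sequences of a few hundred residues; a suffix-structure-based parser would be the natural replacement for longer inputs.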

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
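
As a small illustration of these definitions, the following Python sketch collects the distinct subwords of a given length and counts them, reading the argument of \(P_w\) and \(p_w\) as the subword length. The names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

if __name__ == "__main__":
    toy = "MGEEHKPLLD"  # first ten residues of the 6OPC sequence, as a toy input
    for k in (1, 2, 3):
        print(k, subword_complexity(toy, k))
```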

\(|P_{f(6OPC_1)}(2) \setminus P_{f(2APJ_1)}(2)|=132\), \(|P_{f(2APJ_1)}(2) \setminus P_{f(6OPC_1)}(2)|=30\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1100001110101101000000101110000000111100110000011110000100101101001110100000011111100010011001001100010101101101010101001001011111001011010110111010110100110010011101110010101101010001111000110101011000000001001100011100001101001101110010110111101101111011110100111011100011111110110110011100000100110010001111111001001110000001010001100110110110100011111100010010111001101000101111010101011010000101100101011110001011101101000111001000101101000010101100111010010111100010110001100101010011110010001000100111010000011101001111011110100111011100101011010110110110100000100110010111101111001001101011011011110001100110010110100011111100010010111101101001101111000101011010100011011101011101001101101101100110011000101000001000101010010100011010001010111010000110110010001001010000100001010010100101001111001000100000110111111100100000100
Pair \(Z_2\) Length of longest common subword
6OPC_1,2APJ_1 162 4
6OPC_1,7JUT_1 161 4
2APJ_1,7JUT_1 163 3
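
The two columns of this table can be sketched in Python as below. \(Z_k\) is the size of the symmetric difference of the two subword sets; for the first pair this is \(132+30=162\), matching the table. For the last column, values of 3 and 4 for sequences hundreds of residues long suggest a contiguous common subword rather than a gapped subsequence, so the sketch assumes the contiguous reading. All function names are hypothetical.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): number of length-k subwords occurring in exactly one of x, y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y (row-by-row DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common block that ends at x[i-2], y[j-2].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    print(z_distance("ABC", "BCD", 2))
    print(longest_common_subword_length("ABCDE", "XBCDY"))
```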

Newick tree

 
(
	2APJ_1:81.49,
	(
		6OPC_1:80.5,7JUT_1:80.5
	):0.99
);

Let \(d\) and \(d_1\) denote the Otu-Sayood distances of the same names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1095 }{\log_{20} 1095}-\frac{260}{\log_{20}260})\approx 223.\)
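
The arithmetic in this estimate can be checked with a few lines of Python. The factors 0.85 and 0.8 and the lengths 1095 and 260 are taken as given from the formula; the page does not explain where the factors come from, and the function name `expected_distance` is hypothetical.

```python
import math

def expected_distance(n1, n2, factor1=0.85, factor2=0.8, alphabet_size=20):
    """Evaluate factor1 * factor2 * (n1 / log_b n1 - n2 / log_b n2)."""
    def scaled_length(n):
        return n / math.log(n, alphabet_size)
    return factor1 * factor2 * (scaled_length(n1) - scaled_length(n2))

if __name__ == "__main__":
    # Lengths as they appear in the formula above; the result is roughly 223.
    print(expected_distance(1095, 260))
```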
Status Protein1 Protein2 d d1/2
Query variables 6OPC_1 2APJ_1 274 179.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
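
The interpolation can be sketched in Python under two assumptions that the page does not spell out. First, \(\min\) and \(\max\) are taken over the two complexity increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), so that \(d\) is their maximum and \(d_1\) their sum, as in the usual Otu-Sayood definitions. Second, \(\mathrm{LZ}\) is an LZ76-style phrase count. The function names are hypothetical, and the sketch is not guaranteed to reproduce the values in the table above.

```python
def lz76_complexity(w):
    """Count the phrases in an LZ76-style exhaustive-history parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def increments(x, y):
    """Complexity gained by appending y to x, and by appending x to y."""
    return (lz76_complexity(x + y) - lz76_complexity(x),
            lz76_complexity(y + x) - lz76_complexity(y))

def delta(x, y, alpha=0.0):
    """alpha * min + (1 - alpha) * max of the two increments (assumed reading)."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    x, y = "ABAB", "CDCD"
    print(delta(x, y, alpha=0.0))  # plays the role of d
    print(delta(x, y, alpha=0.5))  # plays the role of d_1 / 2
```

Since the maximum of two numbers is at least their average, this reading gives \(d \ge d_1/2\), which agrees with the query row above (274 versus 179.5).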