CoV2D Browser™

Parikh vectors
4WUY_1 1QUI_1 9IQN_1 Letter Amino acid
25 22 12 V Valine
21 4 5 R Arginine
21 19 14 D Aspartic acid
17 0 1 C Cysteine
20 17 9 I Isoleucine
35 31 9 K Lysine
14 0 6 M Methionine
13 21 7 T Threonine
42 15 8 E Glutamic acid
22 1 7 H Histidine
29 35 10 A Alanine
15 17 3 N Asparagine
15 14 2 Q Glutamine
17 12 6 F Phenylalanine
27 19 7 S Serine
4 8 1 W Tryptophan
20 12 1 Y Tyrosine
24 35 14 G Glycine
41 24 8 L Leucine
19 15 2 P Proline

>4WUY_1|Chain A|N-lysine methyltransferase SMYD2|Homo sapiens (9606)
>1QUI_1|Chain A|PHOSPHATE-BINDING PROTEIN|Escherichia coli (562)
>9IQN_1|Chain A|SnoaL-like domain-containing protein|Streptantibioticus cattleyicolor (strain ATCC 35852 / DSM 46488 / JCM 4925 / NBRC 14057 / NRRL 8057) (1003195)
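
As a minimal sketch of how one column of the Parikh vector table could be computed, the following Python counts each of the 20 standard amino acid letters in a sequence. The function name `parikh_vector`, the alphabet ordering, and the toy sequence are illustrative assumptions and are not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard amino acid one-letter codes (ordering is an arbitrary choice here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence):
    """Return a dict mapping each amino acid letter to its count in `sequence`."""
    counts = Counter(sequence)
    # Letters absent from the sequence are reported as 0.
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# Toy sequence (hypothetical, not one of the proteins above).
vector = parikh_vector("MKVLAAGK")
print(vector["A"], vector["K"], vector["C"])
```

For a sequence made only of these 20 letters, the entries of the vector sum to the sequence length, which gives a quick consistency check against the Length column.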
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4WUY , Knot 183 441 0.84 40 242 413
MRAEGLGGLERFCSPGKGRGLRALQPFQVGDLLFSCPAYAYVLTVNERGNHCEYCFTRKEGLSKCGRCKQAFYCNVECQKEDWPMHKLECSPMVVFGENWNPSETVRLTARILAKQKIHPERTPSEKLLAVKEFESHLDKLDNEKKDLIQSDIAALHHFYSKHLGFPDNDSLVVLFAQVNCNGFTIEDEELSHLGSAIFPDVALMNHSCCPNVIVTYKGTLAEVRAVQEIKPGEEVFTSYIDLLYPTEDRNDRLRDSYFFTCECQECTTKDKDKAKVEIRKLSDPPKAEAIRDMVRYARNVIEEFRRAKHYKSPSELLEICELSQEKMSSVFEDSNVYMLHMMYQAMGVCLYMQDWEGALQYGQKIIKPYSKHYPLYSLNVASMWLKLGRLYMGLEHKAAGEKALKKAIAIMEVAHGKDHPYISEIKQEIESHEGHHHHHH
1QUI , Knot 138 321 0.82 36 177 307
EASLTGAGATFPAPVYAKWADTYQKETGNKVNYQGIGSSGGVKQIIANTVDFGASDAPLSDEKLAQEGLFQFPTVIGGVVLAVNIPGLKSGELVLDGKTLGDIYLGKIKKWDDEAIAKLNPGLKLPSQNIAVVRRAGGSGTSFVFTSYLAKVNEEWKNNVGTGSTVKWPIGLGGKGNDGIAAFVQRLPGAIGYVEYAYAKQNNLAYTKLISADGKPVSPTEENFANAAKGADWSKTFAQDLTNQKGEDAWPITSTTFILIHKDQKKPEQGTEVLKFFDWAYKTGAKQANDLDYASLPDSVVEQVRAAWKTNIKDSSGKPLY
9IQN , Knot 65 132 0.80 40 105 123
MSAEVIDRFFKSSGAGDIETAVECFADDGQWITPDGDGLGTVHTKDQIGDLITSMNAMREKMIASGVDGKFESPIMFGENMGLVRATVETDDGKVVNRGVDLFILSDGKIVLKDVYRKVKLAAALEHHHHHH
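
The LZ-complexity and normalized columns can be illustrated with a short sketch. It assumes the LZ76 exhaustive-history parsing (the page does not state which Lempel-Ziv variant it uses, so values may differ from the table), and the function names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of phrases in the LZ76 exhaustive-history parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while w[i:i+length] already occurs starting before
        # position i (the earlier occurrence may overlap the current phrase).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_alphabet_size(n)), the normalization in the table header."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example; the phrases are 0, 001, 10, 100, 1000, 101.
print(lz76_complexity("0001101001000101"))
# Normalization applied to the 4WUY row (LZ = 183, n = 441).
print(round(normalized_lz(183, 441), 2))
```

The slicing-based search is quadratic or worse in the sequence length, which is adequate for proteins of a few hundred residues but not for long genomes.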

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-character Protein Data Bank code \(c\).
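
A minimal sketch of the subword set and its cardinality, assuming subwords are contiguous intervals of a fixed length. The names `subwords` and `subword_complexity` and the toy word are illustrative only.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy word (hypothetical): its length-2 subwords are AB and BA.
print(sorted(subwords("ABABA", 2)))
print(subword_complexity("ABABA", 3))
```

If `k` exceeds the length of `w`, the range is empty and the function returns the empty set, so the complexity is 0.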

\(|P_{f(4WUY_1)}(2) \setminus P_{f(1QUI_1)}(2)|=123\), \(|P_{f(1QUI_1)}(2) \setminus P_{f(4WUY_1)}(2)|=58\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
101011111001001101011011011011011100110101101000100000010000110001000011000100000011100100011111100101000101010111000101000100011110010001001000000110001111001000011110000111111010001101000010011011110111100000101110001011010110010110011000101101000000010000110000000000000010101001001101011001100100110010010000010011010010000100110000101101100111101010010111001001101000001100101101110110101110001110011001111101101000101001000100001000000
Pair \(Z_2\) Length of longest common subword (substring)
4WUY_1,1QUI_1 181 5
4WUY_1,9IQN_1 189 6
1QUI_1,9IQN_1 152 3
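
The two columns of this table can be sketched as follows. The sketch assumes that the second column counts the longest contiguous common subword, since values as small as 3 to 6 would be implausible for a non-contiguous subsequence of proteins this long. The function names and toy inputs are hypothetical.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring_length(x, y):
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common suffix ending at x[i-1] and y[j-1].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy inputs (hypothetical, not the proteins above).
print(z_distance("ABAB", "ABBA", 2))
print(longest_common_substring_length("ABCDEF", "ZCDEX"))
```

For the pair 4WUY_1, 1QUI_1, the two set differences reported above (123 and 58) add up to the tabulated \(Z_2\) value of 181.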

Newick tree

 
[
	4WUY_1:97.40,
	[
		1QUI_1:76,9IQN_1:76
	]:21.40
]

Let \(d\) denote the Otu--Sayood distance \(d\).
Let \(d_1\) denote the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{762 }{\log_{20} 762}-\frac{321}{\log_{20}321})\approx 120.\)
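
The arithmetic in this estimate can be checked with a few lines of Python. The sketch assumes that 762 is the length of the concatenated pair (441 + 321); the factors 0.85 and 0.8 are taken from the formula as given, and the helper name `n_over_log` is hypothetical.

```python
import math

def n_over_log(n, base=20):
    """n / log_base(n), the growth scale used to normalize LZ-complexity."""
    return n / math.log(n, base)

n_concat = 441 + 321  # assumed: length of the 4WUY_1 and 1QUI_1 concatenation
expected = 0.85 * 0.8 * (n_over_log(n_concat) - n_over_log(321))
print(round(expected, 1))
```

The result lies between 120 and 121, in line with the value of 120 quoted for the expected distance.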
Status Protein1 Protein2 d d1/2
Query variables 4WUY_1 1QUI_1 152 130

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
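
The interpolation can be made concrete with a small sketch. It assumes that \(d\) is the maximum and \(d_1\) the sum of the same two underlying terms, as the case distinction suggests. Under that assumption, the tabulated values \(d=152\) and \(d_1/2=130\) for 4WUY_1 and 1QUI_1 imply terms of 152 and 108. The function name `delta` and the argument names `a` and `b` are hypothetical.

```python
def delta(a, b, alpha):
    """Convex combination alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Terms inferred from the table under the stated assumption: max = 152, mean = 130.
a, b = 108, 152
print(delta(a, b, 0))    # alpha = 0 recovers d (the maximum)
print(delta(a, b, 0.5))  # alpha = 1/2 recovers d1/2 (the mean of the two terms)
```

Other values of \(\alpha\) between 0 and 1 slide between the maximum and the minimum of the two terms.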