CoV2D Browser™

Parikh vectors
2RDE_1 1TRG_1 7WLM_1 Letter Amino acid
3 9 9 Y Tyrosine
15 13 5 R Arginine
4 5 2 C Cysteine
9 16 6 Q Glutamine
13 12 7 K Lysine
13 14 14 P Proline
13 13 25 A Alanine
14 11 3 H Histidine
15 13 18 F Phenylalanine
19 10 12 S Serine
11 11 5 N Asparagine
9 21 10 D Aspartic acid
12 16 10 I Isoleucine
30 28 19 L Leucine
8 7 4 M Methionine
15 13 19 E Glutamic acid
18 18 28 G Glycine
18 14 8 T Threonine
1 7 6 W Tryptophan
11 13 13 V Valine
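
The columns of the table above are Parikh vectors: for each sequence, the number of occurrences of every letter. A minimal sketch of that count follows; the function name `parikh_vector` and the constant `ALPHABET` are illustrative choices, not names used by the CoV2D site, and the input is only the first 20 residues of the 2RDE_1 sequence listed on this page.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
ALPHABET = "YRCQKPAHFSNDILMEGTWV"

def parikh_vector(word, alphabet=ALPHABET):
    """Return the number of occurrences of each alphabet letter in `word`."""
    counts = Counter(word)
    # Counter returns 0 for letters that never occur.
    return [counts[letter] for letter in alphabet]

# First 20 residues of the 2RDE_1 sequence (a prefix, not the full chain).
prefix = "MGSSHHHHHHSSGLVPRGSH"
vector = parikh_vector(prefix)
print(dict(zip(ALPHABET, vector)))
```

Letters outside `ALPHABET`, such as the `X` that begins the 1TRG_1 sequence, are ignored by this sketch.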

>2RDE_1|Chains A, B|Uncharacterized protein VCA0042|Vibrio cholerae (345073)
>1TRG_1|Chain A|THYMIDYLATE SYNTHASE|Escherichia coli (562)
>7WLM_1|Chains A, B, C|siphonaxanthin chlorophyll a/b binding light-harvesting complex II|Codium fragile (3133)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2RDE , Knot 114 251 0.83 40 165 238
MGSSHHHHHHSSGLVPRGSHMTVSTINSTDALAMVEHSSELTLSITTPVGTKFVCRTPFIGTHTDKFLLVEMPKISADDLQYFFQEGFWMNIRAISPRGEGALIHFRSQLMHILQEPVPMAFLSIPNTMQVSQLRKEPRFELNLAGKVLFDEHRGDCELRDLSRSGCRFITPPLGKTYQVGDLVALEIFSDLRGTKTFPPLTGKICNLQRSLHHARYGLEFNEEGRNNAKNLLAQLKFNGTKLTLNAEKKA
1TRG , Knot 125 265 0.87 42 187 257
XMKQYLELMQKVLDEGTQKNDRTGTGTLSIFGHQMRFNLQDGFPLVTTKRCHLRSIIHELLWFLQGDTNIAYLHENNVTIWDEWADENGDLGPVYGKQWRAWPTPDGRHIDQITTVLNQLKNDPDSRRIIVSAWNVGELDKMALAPCHAFFQFYVADGKLSCQLYQRSCDVFLGLPFNIASYALLVHMMAQQCDLEVGDFVWTGGDTHLYSNHMDQTHLQLSREPRPLPKLIIKRKPESIFDYRFEDFEIEGYDPHPGIKAPVAI
7WLM , Knot 105 223 0.84 40 144 211
IEFYGPDRALWLGPYSEGAVPSYLTGEFPGDYGWDSAGLSADPETFAANRELELIHARWAMLGVVGCLTPEALEKYSGVEFGEATWFKAGSQIFAEGGIDYLGNPSLVHAQSILAIVWSQVVLMGLAEGYRVSGGPLGEATDPLYPGEAFDPFGFADDPETFSELKIKEIKNGRLAMFAMFGFFVQALQTGKGPVECWASHIEDPVANNGFVYATKFFGAQIF
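
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does that. It also includes a phrase counter for the Lempel-Ziv (1976) exhaustive parsing, which is an assumption: the page does not say which LZ variant produced its counts, so `lz76_complexity` need not reproduce the tabulated values. Both function names are hypothetical.

```python
import math

def lz76_complexity(word):
    """Count phrases in the Lempel-Ziv (1976) exhaustive parsing of `word`.

    Assumption: the page may use a different LZ variant, so the counts
    returned here are not guaranteed to match the table.
    """
    phrases, start, n = 0, 0, len(word)
    while start < n:
        length = 1
        # Grow the phrase while it already occurs earlier (overlap allowed).
        while (start + length <= n
               and word[start:start + length] in word[:start + length - 1]):
            length += 1
        phrases += 1
        start += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# Recompute the normalized column from the tabulated LZ and length values.
for code, lz, n in [("2RDE", 114, 251), ("1TRG", 125, 265), ("7WLM", 105, 223)]:
    print(code, normalized_lz(lz, n))
```

The recomputed ratios agree with the table if the tabulated values are truncated, rather than rounded, to two decimals.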

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
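
As a small illustration of these definitions, the following sketch enumerates the subwords of a given length and counts them. The names `subwords` and `subword_complexity` are illustrative, and the input is a 12-residue prefix of the 2RDE_1 sequence rather than a full chain.

```python
def subwords(word, k):
    """Return the set of distinct contiguous subwords of length k in `word`."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def subword_complexity(word, k):
    """Return the number of distinct length-k subwords of `word`."""
    return len(subwords(word, k))

prefix = "MGSSHHHHHHSS"
print(sorted(subwords(prefix, 2)))    # the distinct 2-letter subwords
print(subword_complexity(prefix, 1))  # number of distinct letters
```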

\(|P_{f(2RDE_1)}(2) \setminus P_{f(1TRG_1)}(2)|=79\), \(|P_{f(1TRG_1)}(2) \setminus P_{f(2RDE_1)}(2)|=101\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For this pair, \(Z_2 = 79 + 101 = 180\).

Hydrophobic-polar version of Sequence 1:
11000000000011110100101001000011111000001010100111001100011110000011110110101001001100111101011010101111010001101100111111101100101001000101010111011100001000100100010011011110000110111101100101000111101010010001001001101000100010011101010100101010001

Pair \(Z_2\) Length of longest common subword
2RDE_1,1TRG_1 180 4
2RDE_1,7WLM_1 157 3
1TRG_1,7WLM_1 163 3
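
The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords. A minimal sketch of \(Z_k\) under that reading follows; `subwords` and `z_distance` are hypothetical names, and the inputs are toy strings, not the full sequences.

```python
def subwords(word, k):
    """Return the set of distinct contiguous subwords of length k in `word`."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def z_distance(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy inputs: the two words share GS, SS, SH and HH, and differ in MG and HA.
print(z_distance("MGSSHH", "GSSHHA", 2))
```

Dividing by the size of the union \(|P_x(k) \cup P_y(k)|\) would turn this numerator into a Jaccard distance.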

Newick tree

 
(1TRG_1:88.17,(2RDE_1:78.5,7WLM_1:78.5):9.67);

Let \(d\) be the distance that Otu and Sayood denote \(d\).
Let \(d_1\) be the distance that Otu and Sayood denote \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{516}{\log_{20} 516}-\frac{251}{\log_{20} 251}\right)=75.7\), where \(516 = 251 + 265\) is the combined length of the two sequences.
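
The arithmetic in this estimate can be checked with a few lines. The sketch assumes that 516 is the length of the concatenation of 2RDE_1 (251) and 1TRG_1 (265), and treats the factors 0.85 and 0.8 as given constants. The function names are hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """Return n / log_b n, the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

def rough_expected_distance(n_x, n_y, c1=0.85, c2=0.8):
    """Return c1 * c2 * (scale of the concatenation minus scale of x).

    Assumption: the concatenation has length n_x + n_y; c1 and c2 are
    copied from the page without further interpretation.
    """
    return c1 * c2 * (lz_scale(n_x + n_y) - lz_scale(n_x))

print(rough_expected_distance(251, 265))  # close to the 75.7 quoted above
```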
Status Protein1 Protein2 d d1/2
Query variables 2RDE_1 1TRG_1 99 93
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
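
The interpolation can be sketched as follows. The function name `delta` mirrors the symbol, but the two inputs are an assumption: in Otu and Sayood's construction they would be the complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), which this page does not list. The values 87 and 99 are back-calculated from the reported \(d=99\) and \(d_1/2=93\) (max \(=99\), min \(=2\cdot 93-99=87\)), not read from the page.

```python
def delta(a, b, alpha):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Hypothetical increments, inferred from d = 99 and d1/2 = 93.
a, b = 87, 99
print(delta(a, b, 0))    # alpha = 0 gives the maximum, i.e. d
print(delta(a, b, 0.5))  # alpha = 1/2 gives the mean, i.e. d1 / 2
```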