CoV2D BrowserTM

Parikh vectors
6GBK_1 8DSS_1 2QKR_1 Letter Amino acid
9 2 15 Q Glutamine
19 13 18 E Glutamic acid
27 12 19 G Glycine
5 11 23 K Lysine
22 15 17 V Valine
22 5 19 S Serine
4 0 5 W Tryptophan
35 11 35 L Leucine
15 1 13 F Phenylalanine
5 4 10 Y Tyrosine
38 22 13 A Alanine
27 6 16 R Arginine
7 7 9 N Asparagine
2 0 5 C Cysteine
6 2 15 H Histidine
27 10 20 D Aspartic acid
23 7 25 I Isoleucine
5 4 8 M Methionine
14 9 17 P Proline
23 7 11 T Threonine
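
Each column above is the Parikh vector of one sequence: the number of occurrences of each amino-acid letter, ignoring order. A minimal Python sketch, assuming each sequence is a plain string of one-letter codes; the name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative and not taken from the site:

```python
from collections import Counter

# The 20 one-letter amino-acid codes, in the row order of the table above.
AMINO_ACIDS = "QEGKVSWLFYARNCHDIMPT"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Count how often each letter of `alphabet` occurs in `seq`."""
    counts = Counter(seq)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# Example on the first ten residues of the 2QKR_1 sequence.
prefix = "MHHHHHHSSG"
vector = parikh_vector(prefix)
print({k: v for k, v in vector.items() if v > 0})
```

The entries of a Parikh vector sum to the sequence length, which gives a quick consistency check against the length column reported for each protein.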

>6GBK_1|Chains A, B|Parathion hydrolase|Brevundimonas diminuta (293)
>8DSS_1|Chains A, B|ComE operon protein 1|Geobacillus stearothermophilus ATCC 7953 (937593)
>2QKR_1|Chain A|Cdc2-like CDK2/CDC28 like protein kinase|Cryptosporidium parvum Iowa II (353152)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6GBK , Knot 144 335 0.83 40 190 311
TNSGDRINTVRGPITISEAGFTLMHEHICGSSAGFLRAWPEFFGSRDALAEKAVRGLRRARAAGVRTIVDVSTFDLGRDVELLAEVSEAADVHIVAATGLWFDPPLSMRLRSVEELTQFFLREIQYGIEDTGIRAGIIKVATTGKATPFQERVLRAAARASLATGVPVTTHTDASQRDGEQQADIFESEGLDPSRVCIGHSDDTDDLDYLTALAARGYLIGLDGIPHSAIGLEDNASAAALLGLRSWQTRALLIKALIDQGYADQILVSNDWLFGFSSYVTNIMDVLDRVNPDGMAFIPLRVIPFLREKGVPDETLETIMVDNPARFLSPTLRAS
8DSS , Knot 69 148 0.77 36 110 145
ASKTAVVDVKGAVANPGVYEVAADARVRDAIALAGGLTDEADETKVNLAAKVHDEMMIYVPKKGEDAPASNAVSKSPSDGDRNGMQVAINTATEEELMQLPGIGPAKANAIIAYREEHGPFRRVEDLLNVTGIGEKTLEKLKPYLLVP
2QKR , Knot 139 313 0.85 40 201 299
MHHHHHHSSGRENLYFQGLMEKYQKLEKVGEGTYGVVYKAKDSQGRIVALKRIRLDAEDEGIPSTAIREISLLKELHHPNIVSLIDVIHSERCLTLVFEFMEKDLKKVLDENKTGLQDSQIKIYLYQLLRGVAHCHQHRILHRDLKPQNLLINSDGALKLADFGLARAFGIPVRSYTHEVVTLWYRAPDVLMGSKKYSTSVDIWSIGCIFAEMITGKPLFPGVTDDDQLPKIFSILGTPNPREWPQVQELPLWKQRTFQVFEKKPWSSIIPGFCQEGIDLLSNMLCFDPNKRISARDAMNHPYFKDLDPQIMI
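
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below only evaluates that formula on the tabulated values; it does not compute \(\mathrm{LZ}(w)\) itself, and the function name `normalized_lz` is an assumption for illustration:

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ(w), n) triples copied from the table above.
rows = [("6GBK", 144, 335), ("8DSS", 69, 148), ("2QKR", 139, 313)]
for code, lz, n in rows:
    print(code, round(normalized_lz(lz, n), 3))
```

The results agree with the tabulated 0.83, 0.77 and 0.85 up to rounding or truncation in the last digit.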

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
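
As an illustration of these definitions, the following Python sketch builds the set of distinct length-\(k\) subwords of a string and counts them. The names `subwords` and `subword_complexity` are hypothetical, and the sketch assumes a subword means a contiguous block of letters:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct subwords of length k in w."""
    return len(subwords(w, k))

# Toy example: "ABAB" has the length-2 subwords AB and BA.
print(sorted(subwords("ABAB", 2)), subword_complexity("ABAB", 2))
```

For a word of length \(n\) over a 20-letter alphabet the count is at most \(\min(20^k,\, n-k+1)\).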

\(|P_{f(6GBK_1)}(2) \setminus P_{f(8DSS_1)}(2)|=120\), \(|P_{f(8DSS_1)}(2) \setminus P_{f(6GBK_1)}(2)|=40\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
00010010010111010011101100010100111101110111000111001101100101111001101001011001011101001101011110111101110101001001001110010011000110111101100101011000110111010110111100000100001000101100011010010110000000100101111010111101110011110001011111110010001111011100101001110001111100010011011001010111111101111100011100010011100110110101010
Pair \(Z_2\) Length of longest common subword
6GBK_1,8DSS_1 160 3
6GBK_1,2QKR_1 161 3
8DSS_1,2QKR_1 179 3
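
The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords; for the first pair this is \(120+40=160\). A minimal Python sketch under that reading follows. The function names are illustrative, and `jaccard_distance` (dividing by the size of the union) is the standard Jaccard distance, added here as an assumption about how the numerator would be normalized:

```python
def subword_set(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subword_set(x, k), subword_set(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Z_k divided by the size of the union (0.0 if both sets are empty)."""
    union = subword_set(x, k) | subword_set(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0

# Toy example: only "BB" separates the two sets of 2-subwords.
print(z_distance("ABAB", "ABBA", 2), jaccard_distance("ABAB", "ABBA", 2))
```

Applied with `k=2` to the three sequences listed earlier, `z_distance` should reproduce the \(Z_2\) column, provided the site uses the same notion of subword.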

Newick tree

 
(
	2QKR_1:86.75,
	(
		6GBK_1:80,8DSS_1:80
	):6.75
);

Let \(d\) and \(d_1\) denote the Otu--Sayood distances of the same names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{483 }{\log_{20} 483}-\frac{148}{\log_{20}148})\approx 98.8\)
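
The arithmetic in this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are copied from the formula as given, without further interpretation. Reading 483 as the combined length \(335+148\) of the two query sequences and 148 as the length of 8DSS_1 is an inference from the table of lengths, not something the page states:

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_b n, the normalizing term used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

n_concat = 335 + 148   # assumed: length of the concatenated pair (483)
n_single = 148         # assumed: length of 8DSS_1

expected = 0.85 * 0.8 * (lz_scale(n_concat) - lz_scale(n_single))
print(round(expected, 2))
```

The printed value is close to 98.9, in line with the quoted 98.8 up to rounding or truncation.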
Status Protein1 Protein2 d d1/2
Query variables 6GBK_1 8DSS_1 125 89
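
For orientation, here is a hedged Python sketch of how \(d\) and \(d_1\) could be computed. It assumes the usual Otu--Sayood definitions, \(d(S,Q)=\max\{c(SQ)-c(S),\, c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\), where \(c\) counts the components of the exhaustive LZ76 history. The exact LZ variant used by the site is not stated, so the values in the table above need not match this sketch; all function names are illustrative:

```python
def lz76_complexity(s):
    """Number of components in the exhaustive LZ76 history of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the component while s[i:i+length] also occurs starting at an
        # earlier position (the earlier copy may overlap the component).
        while i + length <= n and s.find(s[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(s, q):
    """Return (d, d1) under the definitions assumed in the lead-in."""
    a = lz76_complexity(s + q) - lz76_complexity(s)
    b = lz76_complexity(q + s) - lz76_complexity(q)
    return max(a, b), a + b

# Textbook-style check: A.AC.G.T.ACC.AT.TG has 7 components.
print(lz76_complexity("AACGTACCATTG"))
print(otu_sayood("AACGTACCATTG", "ACGTTGCA"))
```

A highly repetitive sequence adds few new components when appended to another sequence, which is why low-complexity strings can appear close under such distances.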

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d, & \alpha=0,\\ d_1/2, & \alpha=1/2 \end{cases} \]
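
The interpolation can be illustrated with the query row above (\(d=125\), \(d_1/2=89\)). The sketch assumes that min and max refer to the smaller and larger of the two one-sided quantities entering \(d\) and \(d_1\); the smaller value, 53, is then inferred from \(d_1/2=(\min+\max)/2\) and is not reported on the page. The function name `delta` is illustrative:

```python
def delta(alpha, smaller, larger):
    """alpha * min + (1 - alpha) * max."""
    return alpha * smaller + (1 - alpha) * larger

larger = 125                 # d from the query row (alpha = 0)
smaller = 2 * 89 - larger    # inferred from d1/2 = 89, giving 53

print(delta(0, smaller, larger))     # recovers d
print(delta(0.5, smaller, larger))   # recovers d1/2
```

Values of \(\alpha\) between 0 and 1/2 move continuously from the max-based distance \(d\) to the averaged distance \(d_1/2\).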