CoV2D Browser™

Parikh vectors
6HQY_1 2HMX_1 1QLD_1 Letter Amino acid
17 3 0 H Histidine
26 3 3 P Proline
19 5 5 T Threonine
26 10 4 A Alanine
21 8 2 R Arginine
32 5 1 D Aspartic acid
8 1 1 M Methionine
19 1 1 F Phenylalanine
36 10 3 S Serine
24 4 2 Y Tyrosine
12 11 4 Q Glutamine
24 6 2 I Isoleucine
49 14 2 L Leucine
29 10 7 G Glycine
16 13 0 K Lysine
28 7 2 V Valine
8 2 3 W Tryptophan
20 6 3 N Asparagine
11 2 4 C Cysteine
22 12 1 E Glutamic acid
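
A Parikh vector simply counts how often each letter occurs in a word. As a minimal sketch (the function name `parikh_vector` and the alphabet ordering are illustrative choices, not taken from the page), the 1QLD_1 column above can be recomputed from its sequence:

```python
from collections import Counter

# The 20 standard amino-acid letters; the ordering here is arbitrary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

seq_1qld = "MGNQQCNWYGTLYPLCVTTTNGWGWEDQRSCIARSTCAAQPAPFGIVGSG"
vec_1qld = parikh_vector(seq_1qld)
print(vec_1qld["G"], vec_1qld["T"], vec_1qld["H"], vec_1qld["K"])  # 7 5 0 0
```

The entries sum to the sequence length (50 for 1QLD_1), which is a quick consistency check on any column of the table.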

>6HQY_1|Chains A, B, C, D|Histone deacetylase|Schistosoma mansoni (6183)
>2HMX_1|Chain A|HUMAN IMMUNODEFICIENCY VIRUS TYPE 1 MATRIX PROTEIN|Human immunodeficiency virus 1 (11676)
>1QLD_1|Chain A|XYLANASE|PSEUDOMONAS FLUORESCENS (294)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6HQY , Knot 190 447 0.86 40 247 428
HMSVGIVYGDQYRQLCCSSPKFGDRYALVMDLINAYKLIPELSRVPPLQWDSPSRMYEAVTAFHSTEYVDALKKLQMLHCEEKELTADDELLMDSFSLNYDCPGFPSVFDYSLAAVQGSLAAASALICRHCEVVINWGGGWHHAKRSEASGFCYLNDIVLAIHRLVSSTPPETSPNRQTRVLYVDLDLHHGDGVEEAFWYSPRVVTFSVHHASPGFFPGTGTWNMVDNDKLPIFLNGAGRGRFSAFNLPLEEGINDLDWSNAIGPILDSLNIVIQPSYVVVQCGADCLATDPHRIFRLTNFYPNLNLDSDCDSECSLSGYLYAIKKILSWKVPTLILGGGGYNFPDTARLWTRVTALTIEEVKGKKMTISPEIPEHSYFSRYGPDFELDIDYFPHESHNKTLDSIQKHHRRILEQLRNYADLNKLIYDYDQVYQLYNLTGMGSLVPR
2HMX , Knot 66 133 0.81 40 107 130
HMGARASVLSGGELDKWEKIRLRPGGKKQYKLKHIVWASRELERFAVNPGLLETSEGCRQILGQLQPSLQTGSEELRSLYNTIAVLYCVHQRIDVKDTKEALDKIEEEQNKSKKKAQQAAADTGNNSQVSQNY
1QLD , Knot 29 50 0.75 36 46 48
MGNQQCNWYGTLYPLCVTTTNGWGWEDQRSCIARSTCAAQPAPFGIVGSG
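
The LZ-complexity and normalized columns can be reproduced approximately as follows. This is a sketch that assumes the page uses the 1976 Lempel-Ziv exhaustive-history parsing; the function names `lz76_complexity` and `normalized_lz` are hypothetical. Under that assumption the parse of the 1QLD sequence has 29 components, matching the table.

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parse of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the component while it can be copied from a position
        # strictly before i (the copy may overlap the component itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

seq_1qld = "MGNQQCNWYGTLYPLCVTTTNGWGWEDQRSCIARSTCAAQPAPFGIVGSG"
lz_1qld = lz76_complexity(seq_1qld)
ratio_1qld = normalized_lz(lz_1qld, len(seq_1qld))
print(lz_1qld, ratio_1qld)
```

The ratio for 1QLD comes out near 0.757 and the one for 6HQY (190 and 447) near 0.866, so the table's 0.75 and 0.86 look truncated rather than rounded to two decimals.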

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
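
As a small illustration of these definitions, the sketch below collects the distinct subwords of a fixed length with a Python set (the helper names `subwords` and `subword_complexity` are illustrative). For the 1QLD sequence it reproduces the table entries \(p_w(2)=46\) and \(p_w(3)=48\).

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

seq_1qld = "MGNQQCNWYGTLYPLCVTTTNGWGWEDQRSCIARSTCAAQPAPFGIVGSG"
print(subword_complexity(seq_1qld, 2), subword_complexity(seq_1qld, 3))  # 46 48
```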

\(|P_{f(6HQY_1)}(2) \setminus P_{f(2HMX_1)}(2)|=179\), \(|P_{f(2HMX_1)}(2) \setminus P_{f(6HQY_1)}(2)|=39\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (6HQY_1):
010111101000001000010110001111011010011101001111010010010011011000001011001011000000101000111001010000111101100011110101111011100000111011111001000010110010011111001100011000100000110101010010110011100101101010010111111010101100001111101110101011011100110010100111111001011101001110011001100100110100101010100000000010101011001101011011111110011001011001011010010100101010110000100011010101001100000001001000000110010001010011000001001001011101110
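
The hydrophobic-polar string can be read as a letter-by-letter recoding of the sequence. The sketch below assumes that the residues written as 1 are A, F, G, I, L, M, P, V and W; this set is inferred by comparing the start of the 6HQY sequence with the start of the bit string, and is not stated on the page. The name `hp_encode` is likewise illustrative.

```python
# Assumed hydrophobic set, inferred from the page's bit string (not documented).
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map hydrophobic residues to '1' and all other residues to '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 50 residues of the 6HQY sequence.
prefix_6hqy = "HMSVGIVYGDQYRQLCCSSPKFGDRYALVMDLINAYKLIPELSRVPPLQW"
encoded_prefix = hp_encode(prefix_6hqy)
print(encoded_prefix)  # 01011110100000100001011000111101101001110100111101
```

The output agrees with the first 50 symbols of the string shown above.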
Pair \(Z_2\) Length of longest common subword
6HQY_1,2HMX_1 218 3
6HQY_1,1QLD_1 237 3
2HMX_1,1QLD_1 121 2
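
Both columns of this table can be recomputed directly from the sequences. The sketch below implements \(Z_k\) as the size of the symmetric difference of the two subword sets. It reads the last column as the length of the longest common contiguous subword, which is an assumption: a non-contiguous common subsequence of sequences this long would typically be much longer than 2 or 3. The function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

seq_2hmx = ("HMGARASVLSGGELDKWEKIRLRPGGKKQYKLKHIVWASRELERFAVNPGLLETSEGCRQILGQLQPSLQ"
            "TGSEELRSLYNTIAVLYCVHQRIDVKDTKEALDKIEEEQNKSKKKAQQAAADTGNNSQVSQNY")
seq_1qld = "MGNQQCNWYGTLYPLCVTTTNGWGWEDQRSCIARSTCAAQPAPFGIVGSG"
print(z_distance(seq_2hmx, seq_1qld, 2), longest_common_subword(seq_2hmx, seq_1qld))  # 121 2
```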

Newick tree

 
(6HQY_1:12.73,(2HMX_1:60.5,1QLD_1:60.5):66.23);

Let \(d\) and \(d_1\) denote the Otu--Sayood distances of the same names.
(Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{580 }{\log_{20} 580}-\frac{133}{\log_{20}133})\approx 130\), where \(580=447+133\) is the combined length of the two sequences.
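
The arithmetic behind this estimate can be checked in a few lines. The factors 0.85 and 0.8 are taken from the formula as given; the helper name `scaled_length` is illustrative.

```python
import math

def scaled_length(n, alphabet_size=20):
    """n / log_b(n), the normalizing term used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)

expected_distance = 0.85 * 0.8 * (scaled_length(580) - scaled_length(133))
print(round(expected_distance))  # 130
```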
Status Protein1 Protein2 d d1/2
Query variables 6HQY_1 2HMX_1 170 108.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
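
To make the interpolation concrete, here is a minimal sketch of \(\delta\) as a function of \(\alpha\) and two underlying quantities. What those two quantities are is not stated here; the sketch only assumes that \(d\) is their maximum and \(d_1\) their sum, as the case distinction implies. With the query values \(d=170\) and \(d_1/2=108.5\), that assumption gives a smaller quantity of \(2(108.5)-170=47\).

```python
def delta(alpha, a, b):
    """Convex combination alpha*min + (1 - alpha)*max of two quantities."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Values derived from the query row under the stated assumption.
larger, smaller = 170, 47
print(delta(0, smaller, larger), delta(0.5, smaller, larger))  # 170 108.5
```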