CoV2D BrowserTM

Parikh vectors
9ETC_1 1HKS_1 4DAN_1 Letter Amino acid
2 2 10 Y Tyrosine
7 6 20 A Alanine
6 8 16 D Aspartic acid
9 6 22 G Glycine
7 7 13 I Isoleucine
13 3 18 T Threonine
6 6 9 Q Glutamine
3 3 12 H Histidine
6 10 8 F Phenylalanine
5 7 9 R Arginine
4 8 7 N Asparagine
9 11 18 L Leucine
14 8 14 K Lysine
1 2 0 W Tryptophan
11 3 25 V Valine
1 1 3 C Cysteine
11 3 14 E Glutamic acid
4 2 8 M Methionine
3 4 6 P Proline
7 6 21 S Serine
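
The counts in the table above are Parikh vectors: for each amino-acid letter, the number of times it occurs in a sequence. A minimal sketch of that computation follows; the function name `parikh_vector` and the alphabetical ordering in `AMINO_ACIDS` are illustrative assumptions, not taken from the page.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(seq):
    """Return a dict mapping each amino-acid letter to its count in seq."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}


# Example on the first ten residues of the 9ETC_1 sequence.
example = parikh_vector("GSHMAFSGTW")
print(example["G"], example["S"], example["W"])
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick consistency check against the Length column further down the page.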

>9ETC_1|Chains A, B|Fatty acid-binding protein, liver|Gallus gallus (9031)
>1HKS_1|Chain A|HEAT-SHOCK TRANSCRIPTION FACTOR|Drosophila melanogaster (7227)
>4DAN_1|Chains A, B|Purine nucleoside phosphorylase deoD-type|Bacillus subtilis (1423)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9ETC , Knot 66 129 0.82 40 106 127
GSHMAFSGTWQVYAQENYEEFLKALALPEDLIKMARDIKPIVEIQQKGDDFVVTSKTPRQTVTNSFTLGKEADITTMDGKKLKCTVHLANGKLVTKSEKFSHEQEVKGNEMVETITFGGVTLIRRSKRV
1HKS , Knot 56 106 0.82 40 96 104
GSGVPAFLAKLWRLVDDADTNRLICWTKDGQSFVIQNQAQFAKELLPLNYKHNNMASFIRQLNMYGFHKITSIDNGGLRFDRDEIEFSHPFFKRNSPFLLDQIKRK
4DAN , Knot 115 253 0.83 38 165 241
MGSSHHHHHHSSGLVPRGSHMSVHIGAEKGQIADTVLLPGDPLRAKFIAETYLENVECYNEVRGMYGFTGTYKGKKISVQGTGMGVPSISIYVNELIQSYDVQNLIRVGSCGAIRKDVKVRDVILAMTSSTDSQMNRVAFGSVDFAPCADFELLKNAYDAAKDKGVPVTVGSVFTADQFYNDDSQIEKLAKYGVLGVEMETTALYTLAAKHGRKALSILTVSDHVLTGEETTAEERQTTFHDMIDVALHSVSQ
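
The LZ-complexity column and its normalization \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be sketched as follows. This is an assumption-laden illustration: `lz76_complexity` implements one common form of the Lempel-Ziv (1976) phrase count, and it has not been checked that this variant reproduces the LZ values listed in the table. The normalization function only restates the column header.

```python
import math


def lz76_complexity(s):
    """Count phrases in an LZ76-style parsing of s.

    Each phrase is the shortest prefix of the unparsed remainder that does
    not occur as a substring of the text preceding its last character.
    A trailing phrase that runs out of input still counts.
    """
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_alphabet_size(n)), as in the column header."""
    return lz / (n / math.log(n, alphabet_size))


# Normalized value from the tabulated LZ-complexity and length of 9ETC.
print(normalized_lz(66, 129))
```

With the tabulated pairs (66, 129), (56, 106) and (115, 253), this formula agrees with the displayed 0.82, 0.82 and 0.83 when the result is truncated to two decimals.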

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
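
As a sketch of these definitions, reading \(P_w(k)\) as the set of distinct contiguous subwords of length \(k\): the helper names `subword_set` and `subword_complexity` are illustrative, and the code follows only the definition as written here, without being checked against the tabulated \(p_w\) values.

```python
def subword_set(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subword_set(w, k))


# Toy example: "abab" has the length-2 subwords "ab" and "ba".
print(sorted(subword_set("abab", 2)))
```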

\(|P_{f(9ETC_1)}(2) \setminus P_{f(1HKS_1)}(2)|=70\), \(|P_{f(1HKS_1)}(2) \setminus P_{f(9ETC_1)}(2)|=60\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1: 100111010101010000001101111100110110010111010001001110000100010001011001010010100100010110101100000100000101001100101111011000001
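
The hydrophobic-polar string can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not state which residues it treats as hydrophobic. The set `HYDROPHOBIC` below is an assumption inferred by aligning the displayed binary string with the 9ETC_1 sequence, and other classifications exist.

```python
# Assumed hydrophobic residues, inferred from the displayed string
# (not stated on the page); every other letter is treated as polar.
HYDROPHOBIC = set("GAVLIMFWP")


def hp_string(seq):
    """Map a protein sequence to a 0/1 string: 1 = hydrophobic, 0 = polar."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)


# First ten residues of 9ETC_1.
print(hp_string("GSHMAFSGTW"))  # 1001110101
```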
Pair \(Z_2\) Length of longest common substring
9ETC_1,1HKS_1 130 3
9ETC_1,4DAN_1 173 4
1HKS_1,4DAN_1 163 4
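
Both columns of this table can be sketched in a few lines. `z_distance` follows the definition of \(Z_k\) given above. `longest_common_substring_length` assumes the second column counts contiguous matches, which the small values 3 and 4 suggest. All function names are illustrative.

```python
def subword_set(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    px, py = subword_set(x, k), subword_set(y, k)
    return len(px - py) + len(py - px)


def longest_common_substring_length(x, y):
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0] * (len(y) + 1)
        for j, cy in enumerate(y, 1):
            if cx == cy:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Toy example with two short words.
print(z_distance("abab", "abba", 2), longest_common_substring_length("abab", "abba"))
```

For the first pair, the two set differences reported above (70 and 60) add up to the tabulated \(Z_2 = 130\).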

Newick tree

 
[
	4DAN_1:89.48,
	[
		9ETC_1:65,1HKS_1:65
	]:24.48
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{235 }{\log_{20} 235}-\frac{106}{\log_{20}106})=41.3\)
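
The arithmetic in this estimate can be checked directly. In the sketch below, 235 is read as 129 + 106, the combined length of the two query sequences; that reading and the name `length_term` are assumptions, and the factors 0.85 and 0.8 are taken from the expression as given.

```python
import math


def length_term(n, alphabet_size=20):
    """Return n / log_alphabet_size(n)."""
    return n / math.log(n, alphabet_size)


n_concat = 129 + 106  # assumed: length of the two query sequences concatenated
n_second = 106        # length of the 1HKS_1 sequence

expected = 0.85 * 0.8 * (length_term(n_concat) - length_term(n_second))
print(expected)
```

The result is about 41.38, which matches the stated 41.3 when truncated to one decimal.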
Status Protein1 Protein2 d d1/2
Query variables 9ETC_1 1HKS_1 51 47.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
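
A minimal sketch of this interpolation follows. It assumes that the minimum and maximum are taken over the two complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), where \(c\) is an LZ-complexity and \(xy\) is concatenation, in the manner of Otu and Sayood's definitions. The page does not spell this out, so the function `delta` and its arguments are hypothetical.

```python
def delta(c_x, c_y, c_xy, c_yx, alpha):
    """Interpolate between the larger and smaller complexity increment.

    alpha = 0 gives the maximum (d); alpha = 1/2 gives the average (d_1 / 2).
    """
    inc_xy = c_xy - c_x  # extra phrases needed for y after x (assumed meaning)
    inc_yx = c_yx - c_y  # extra phrases needed for x after y (assumed meaning)
    return alpha * min(inc_xy, inc_yx) + (1 - alpha) * max(inc_xy, inc_yx)


# Hypothetical complexities, for illustration only (not values from the page).
print(delta(10, 12, 15, 20, 0), delta(10, 12, 15, 20, 0.5))
```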