CoV2D Browser™

Parikh vectors
1DTH_1 1HMV_1 7DLY_1 Letter Amino acid
11 16 32 R Arginine
14 42 23 I Isoleucine
6 13 22 F Phenylalanine
2 19 6 W Tryptophan
11 34 31 G Glycine
9 8 10 H Histidine
7 38 17 P Proline
14 19 35 S Serine
9 39 30 V Valine
14 20 27 N Asparagine
9 36 10 Q Glutamine
13 47 34 E Glutamic acid
25 49 42 L Leucine
8 61 20 K Lysine
7 6 13 M Methionine
10 39 21 T Threonine
9 26 28 A Alanine
14 24 25 D Aspartic acid
4 2 7 C Cysteine
7 22 14 Y Tyrosine
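
A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of how such a column could be computed; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the CoV2D code, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# Letter order as in the rows of the table above (an illustrative choice).
AMINO_ACIDS = "RIFWGHPSVNQELKMTADCY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each letter of `alphabet` in `sequence`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# Toy input: the first six residues of the 1DTH_1 sequence.
for letter, count in zip(AMINO_ACIDS, parikh_vector("QQNLPQ")):
    if count:
        print(letter, count)
```

The entries of a Parikh vector sum to the length of the word, which gives a quick sanity check against the Length column further down the page.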

>1DTH_1|Chains A, B|ATROLYSIN C|Crotalus atrox (8730)
>1HMV_1|Chains A, C, E, G|HIV-1 REVERSE TRANSCRIPTASE (SUBUNIT P66)|Human immunodeficiency virus 1 (11676)
>7DLY_1|Chains A, B|1-aminocyclopropane-1-carboxylate synthase 7|Arabidopsis thaliana (3702)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1DTH , Knot 97 203 0.84 40 158 199
QQNLPQRYIELVVVADHRVFMKYNSDLNTIRTRVHEIVNFINGFYRSLNIHVSLTDLEIWSNEDQINIQSASSDTLNAFAEWRETDLLNRKSHDNAQLLTAIELDEETLGLAPLGTMCDPKLSIGIVQDHSPINLLMGVTMAHELGHNLGMEHDGKDCLRGASLCIMRPGLTKGRSYEFSDDSMHYYERFLKQYKPQCILNKP
1HMV , Knot 222 560 0.83 40 252 513
PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFKKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEELRQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYPGIKVRQLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLKTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPIQKETWETWWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRETKLGKAGYVTNKGRQKVVPLTNTTNQKTELQAIYLALQDSGLEVNIVTDSQYALGIIQAQPDKSESELVNQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAGIRKIL
7DLY , Knot 182 447 0.82 40 236 420
MGLPLMMERSSNNNNVELSRVAVSDTHGEDSPYFAGWKAYDENPYDESHNPSGVIQMGLAENQVSFDLLETYLEKKNPEGSMWGSKGAPGFRENALYGNFRGGETFRQAMASFMEQIRGGKARFDPDRIVLTAGATAANELLTFILADPNDALLVPTPYYPGFDRDLRWRTGVKIVPIHCDSSNHFQITPEALESAYQTARDANIRVRGVLITNPSNPLGATVQKKVLEDLLDFCVRKNIHLVSDEIYSGSVFHASEFTSVAEIVENIDDVSVKERVHIVYSLSKDLGLPGFRVGTIYSYNDNVVRTARRMSSFTLVSSQTQHMLASMLSDEEFTEKYIRINRERLRRRYDTIVEGLKKAGIECLKGNAGLFCWMNLGFLLEKKTKDGELQLWDVILKELNLNISPGSSCHCSEVGWFRVCFANMSENTLEIALKRIHEFMDRRRRF
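
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be reproduced from the LZ-complexity and Length columns. The sketch below also includes one possible way to compute an LZ complexity, assuming the LZ76 exhaustive-history parsing; the exact variant used by the page is not stated in this table, so `lz76_complexity` is an assumption, and both function names are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parsing of w (assumed variant)."""
    n, i, components = len(w), 0, 0
    while i < n:
        length = 1
        # Extend the candidate component while it can be copied from a
        # start position before i (overlap with the component is allowed).
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76_complexity("0001101001000101"))   # parses as 0|001|10|100|1000|101
print(normalized_lz(97, 203))                # about 0.847; the table shows 0.84
```

Applying `normalized_lz` to the other two rows gives values slightly above 0.83 and 0.82, which suggests the table truncates to two decimals rather than rounding.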

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
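
As an illustration of these definitions, here is a small sketch that builds \(P_w(n)\) as a Python set and takes its size. The function names are hypothetical and chosen only for this example.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

w = "ABAB"
print(sorted(subwords(w, 2)))      # ['AB', 'BA']
print(subword_complexity(w, 3))    # 2, namely ABA and BAB
```

A word of length \(m\) has at most \(m-n+1\) distinct subwords of length \(n\), so \(p_w(2)\le 202\) for the 203-letter sequence in the table.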

\(|P_{f(1DTH_1)}(2) \setminus P_{f(1HMV_1)}(2)|=53\), \(|P_{f(1HMV_1)}(2) \setminus P_{f(1DTH_1)}(2)|=147\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), that is, an LZ76-style (set of subwords) numerator of the Jaccard distance for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
00011000101111100011100000100100010011011011000101010100101100000101001000010111010000110000000101101101000011111110100101011110000110111110110011001110001000101101011011100100001000010000011000010011001
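
The hydrophobic-polar string can be reproduced with a per-residue lookup. In the sketch below the set of residues written as 1 (A, F, G, I, L, M, P, V, W) is an assumption: it was inferred by aligning the start and end of the 0/1 string with Sequence 1, not taken from a stated hydrophobicity scale, and the names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Residues encoded as 1. Inferred from the 0/1 string on this page (assumption),
# so C and Y, which some scales treat as hydrophobic, are encoded as 0 here.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 27 residues of the 1DTH_1 sequence.
print(hp_encode("QQNLPQRYIELVVVADHRVFMKYNSDL"))  # 000110001011111000111000001
```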
Pair \(Z_2\) Length of longest common subword
1DTH_1,1HMV_1 200 4
1DTH_1,7DLY_1 188 3
1HMV_1,7DLY_1 160 4
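
The two columns of this table can be sketched directly from the subword sets. \(Z_k\) is the size of a symmetric difference, so for the first pair \(Z_2 = 53 + 147 = 200\). The last column is read here as the length of the longest common contiguous subword, which is an assumption based on how small the values are; all function names are hypothetical.

```python
def subwords(w, k):
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword_length(x, y):
    """Largest k such that x and y share a contiguous subword of length k."""
    k = 0
    # If a common subword of length k + 1 exists, so does one of length k,
    # so the first k + 1 with an empty intersection ends the search.
    while subwords(x, k + 1) & subwords(y, k + 1):
        k += 1
    return k

print(z("ABAB", "ABBA", 2))                             # 1: only BB is unshared
print(longest_common_subword_length("ABCDE", "XBCDY"))  # 3: BCD
```

Since \(Z_k(x,y) = p_x(k) + p_y(k) - 2\,|P_x(k)\cap P_y(k)|\), the first row is consistent with the \(p_w(2)\) values 158 and 252 listed earlier and 105 shared subwords of length 2.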

Newick tree

 
[
	1DTH_1:10.09,
	[
		7DLY_1:80,1HMV_1:80
	]:22.09
]

Let \(d\) be the Otu--Sayood distance \(d\).
Let \(d_1\) be the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{763 }{\log_{20} 763}-\frac{203}{\log_{20}203})\approx 156\), where \(763=203+560\) is the combined length of the two sequences.
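
The arithmetic behind this estimate can be checked with a few lines. The factors 0.85 and 0.8 are copied from the formula as given on the page; the helper name `n_over_log` is hypothetical.

```python
import math

def n_over_log(n, base=20):
    """n / log_base(n), the normalizing quantity used on this page."""
    return n / math.log(n, base)

n1, n2 = 203, 560   # lengths of 1DTH_1 and 1HMV_1
estimate = 0.85 * 0.8 * (n_over_log(n1 + n2) - n_over_log(n1))
print(round(estimate))   # 156
```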
Status Protein1 Protein2 d d1/2
Query variables 1DTH_1 1HMV_1 196 135

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
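
The interpolation can be sketched as follows. It assumes that min and max range over the two one-sided complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\) of the Otu--Sayood construction, so that \(d\) is their maximum and \(d_1\) their sum; this reading and the function name `delta` are assumptions. The value 74 is not printed on the page: it is what the table row \(d=196\), \(d_1/2=135\) implies for the smaller increment under this reading.

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    lo, hi = min(a, b), max(a, b)
    return alpha * lo + (1 - alpha) * hi

# Increments implied by the query row above: max = d = 196, min = 2 * 135 - 196 = 74.
a, b = 74, 196
print(delta(a, b, 0))     # 196, matching d
print(delta(a, b, 0.5))   # 135.0, matching d1/2
```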