CoV2D Browser™

Parikh vectors
1DTT_1 3AKF_1 6TLD_1 Letter Amino acid
37 11 12 Q Glutamine
19 40 36 S Serine
19 12 8 W Tryptophan
26 52 26 A Alanine
25 26 32 D Aspartic acid
6 5 8 M Methionine
37 25 26 P Proline
40 28 28 V Valine
41 20 24 I Isoleucine
58 17 16 K Lysine
47 20 22 E Glutamic acid
34 32 29 G Glycine
8 18 17 H Histidine
49 29 49 L Leucine
40 38 19 T Threonine
18 24 21 R Arginine
19 25 20 N Asparagine
22 26 24 Y Tyrosine
2 1 11 C Cysteine
13 19 19 F Phenylalanine
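
The Parikh vector of a sequence records how often each letter occurs in it. A minimal sketch of how such a count could be produced follows; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative and not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (alphabet assumed for this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count in sequence."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur, so every letter gets an entry.
    return {letter: counts[letter] for letter in alphabet}

# Usage: the first 20 residues of the 1DTT_1 sequence shown on this page.
prefix = "PISPIETVPVKLKPGMDGPK"
vector = parikh_vector(prefix)
print(vector["P"], vector["K"])
```

Applied to a full sequence, the resulting counts fill one column of the table above.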

>1DTT_1|Chain A|HIV-1 RT A-CHAIN|Human immunodeficiency virus 1 (11676)
>3AKF_1|Chain A|Putative secreted alpha L-arabinofuranosidase II|Streptomyces avermitilis (227882)
>6TLD_1|Chains A, B, C, D|Histone deacetylase|Schistosoma mansoni (6183)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1DTT , Knot 223 560 0.84 40 255 517
PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEELRQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYPGIKVRQLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLKTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPIQKETWETWWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRETKLGKAGYVTNRGRQKVVTLTDTTNQKTELQAIYLALQDSGLEVNIVTDSQYALGIIQAQPDQSESELVNQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAGIRKVL
3AKF , Knot 187 468 0.82 40 229 420
MTAPASPSVTFTNPLAEKRADPHIFKHTDGYYYFTATVPEYDRIVLRRATTLQGLATAPETTIWTKHASGVMGAHIWAPEIHFIDGKWYVYFAAGSTSDVWAIRMYVLESGAANPLTGSWTEKGQIATPVSSFSLDATTFVVNGVRHLAWAQRNPAEDNNTSLFIAKMANPWTISGTPTEISQPTLSWETVGYKVNEGPAVIQHGGKVFLTYSASATDANYCLGMLSASASADLLNAASWTKSSQPVFKTSEATGQYGPGHNSFTVSEDGKSDILVYHDRNYKDISGDPLNDPNRRTRLQKVYWNADGTPNFGIPVADGVTPVRFSSYNYPDRYIRHWDFRARIEANVTNLADSQFRVVTGLAGSGTISLESANYPGYYLRHKNYEVWVEKNDGSSAFKNDASFSRRAGLADSADGIAFESYNYPGRYLRHYENLLRIQPVSTALDRQDATFYAEKLAAALEHHHHHH
6TLD , Knot 190 447 0.86 40 247 428
HMSVGIVYGDQYRQLCCSSPKFGDRYALVMDLINAYKLIPELSRVPPLQWDSPSRMYEAVTAFHSTEYVDALKKLQMLHCEEKELTADDELLMDSFSLNYDCPGFPSVFDYSLAAVQGSLAAASALICRHCEVVINWGGGWHHAKRSEASGFCYLNDIVLAIHRLVSSTPPETSPNRQTRVLYVDLDLHHGDGVEEAFWYSPRVVTFSVHHASPGFFPGTGTWNMVDNDKLPIFLNGAGRGRFSAFNLPLEEGINDLDWSNAIGPILDSLNIVIQPSYVVVQCGADCLATDPHRIFRLTNFYPNLNLDSDCDSECSLSGYLYAIKKILSWKVPTLILGGGGYNFPDTARLWTRVTALTIEEVKGKKMTISPEIPEHSYFSRYGPDFELDIDYFPHESHNKTLDSIQKHHRRILEQLRNYADLNKLIYDYDQVYQLYNLTGMGSLVPR
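
The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below shows one common way to count phrases in an LZ76-style (exhaustive history) parsing, together with that normalization. The exact LZ variant used by this page is not stated, so `lz76_complexity` is an assumption and need not reproduce the table's LZ values. The normalization itself can be checked against the table, whose values appear to be truncated to two decimals.

```python
import math

def lz76_complexity(w):
    """Count phrases in an exhaustive-history (LZ76-style) parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the candidate phrase while it already occurs starting at some
        # position before i (the earlier occurrence may overlap the candidate).
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_alphabet_size(n))."""
    return lz / (n / math.log(n, alphabet_size))

# Usage with the LZ values and lengths listed in the table above.
for code, lz, n in [("1DTT", 223, 560), ("3AKF", 187, 468), ("6TLD", 190, 447)]:
    print(code, round(normalized_lz(lz, n), 3))
```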

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
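
As an illustration of the subword sets \(P_w\) and their cardinalities \(p_w\) defined above, here is a short sketch with the subword length written `k`. The function names are illustrative.

```python
def subwords(w, k):
    """Return the set of distinct length-k subwords (contiguous intervals) of w."""
    # range() is empty when k > len(w), so the result is then the empty set.
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Usage: "abab" has the length-2 subwords "ab" and "ba".
print(sorted(subwords("abab", 2)), subword_complexity("abab", 2))
```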

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). For example, \(|P_{f(1DTT_1)}(2) \setminus P_{f(3AKF_1)}(2)|=88\) and \(|P_{f(3AKF_1)}(2) \setminus P_{f(1DTT_1)}(2)|=62\), so \(Z_2=88+62=150\) for this pair.

Hydrophobic-polar version of Sequence 1:
11011001110101110110100111000010111010001000101001110010001111100000001001101001000000110101111011110000010110110101011100010000110110100001110000011101101011110001001101100001011100010010110010110000010010001101110010000000111111100101001010111110000101001001110101100101110100100110100110011110001010110000110011011000100011101000101010001000110010010010101100001001001100100001111100101011100001001100010101110101100111101100100011111001010111000001101101000100011010000000001011011100011010110000011111010100000011001100110000101111110011110001001101110011
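
The hydrophobic-polar string can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not state its classification. The set `AFGILMPVW` used below is an assumption inferred by comparing the first several dozen residues of Sequence 1 with the binary string above, and histidine, which does not occur in that stretch, is assumed to be polar.

```python
# Assumed hydrophobic residues, inferred from the page's output (not documented there).
HYDROPHOBIC = set("AFGILMPVW")

def hydrophobic_polar(sequence):
    """Encode a protein sequence as a string of 1 (hydrophobic) and 0 (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# Usage: the first 30 residues of the 1DTT_1 sequence.
print(hydrophobic_polar("PISPIETVPVKLKPGMDGPKVKQWPLTEEK"))
```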
Pair \(Z_2\) Length of longest common subword (substring)
1DTT_1,3AKF_1 150 4
1DTT_1,6TLD_1 166 4
3AKF_1,6TLD_1 152 4
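
The \(Z_2\) column can be computed directly from the definition of \(Z_k\) as the size of a symmetric difference of subword sets. The following is a minimal sketch with illustrative function names, not the page's own implementation.

```python
def subwords(w, k):
    """Return the set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Return |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Usage on two toy words: only "bb" separates their length-2 subword sets.
print(z_distance("abab", "abba", 2))
```

Calling `z_distance` on two of the full sequences with `k=2` would give the corresponding table entry, provided the page computes \(Z_2\) exactly as defined.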

Newick tree

 
[
	6TLD_1:81.04,
	[
		1DTT_1:75,3AKF_1:75
	]:6.04
]
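
The tree above is displayed with square brackets. Assuming these stand in for the parentheses of standard Newick notation, a small sketch for turning the display into a single-line Newick string is given below; the function name is illustrative.

```python
def brackets_to_newick(text):
    """Convert the bracketed display into a one-line Newick string."""
    compact = "".join(text.split())  # drop all whitespace, including tabs and newlines
    return compact.replace("[", "(").replace("]", ")") + ";"

displayed = """
[
	6TLD_1:81.04,
	[
		1DTT_1:75,3AKF_1:75
	]:6.04
]
"""
print(brackets_to_newick(displayed))
```

In this tree every leaf is at the same distance from the root, since 75 + 6.04 = 81.04.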

Let \(d\) denote the Otu–Sayood distance \(d\).
Let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1028 }{\log_{20} 1028}-\frac{468}{\log_{20}468})\approx 146,\) where \(1028=560+468\) is the length of the concatenation of the two sequences.
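
The arithmetic of this estimate can be checked with a few lines. The factors 0.85 and 0.8 and the lengths 560 and 468 are taken from the page; the function name is illustrative.

```python
import math

def n_over_log20(n):
    """Return n / log_20(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, 20)

len_x, len_y = 560, 468  # lengths of 1DTT_1 and 3AKF_1
expected = 0.85 * 0.8 * (n_over_log20(len_x + len_y) - n_over_log20(len_y))
print(expected)
```

The result lies between 146 and 147, in line with the figure quoted above if that figure is truncated rather than rounded.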
Status Protein1 Protein2 d d1/2
Query variables 1DTT_1 3AKF_1 188 170.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
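
A sketch of this interpolation, assuming the usual Otu–Sayood definitions in which \(d\) is the maximum and \(d_1\) the sum of the two directional complexity increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). The increments 188 and 153 in the usage lines are not listed on the page: they are back-calculated from the reported \(d=188\) and \(d_1/2=170.5\) under that assumption.

```python
def delta(inc_xy, inc_yx, alpha):
    """Interpolate between the smaller and larger directional increment.

    alpha = 0 gives the maximum (d); alpha = 1/2 gives the mean (d1 / 2).
    """
    smaller, larger = min(inc_xy, inc_yx), max(inc_xy, inc_yx)
    return alpha * smaller + (1 - alpha) * larger

# Hypothetical increments consistent with d = 188 and d1/2 = 170.5.
print(delta(188, 153, 0))
print(delta(188, 153, 0.5))
```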