CoV2D Browser™

Parikh vectors
6AMH_1 1DRR_1 3AMT_1 Letter Amino acid
12 0 20 F Phenylalanine
17 0 18 S Serine
16 0 36 R Arginine
13 0 13 N Asparagine
43 4 32 G Glycine
22 0 26 I Isoleucine
36 0 34 L Leucine
11 0 12 M Methionine
14 0 18 T Threonine
3 0 3 W Tryptophan
17 0 15 Y Tyrosine
39 5 23 A Alanine
1 1 9 C Cysteine
30 0 44 E Glutamic acid
30 0 26 K Lysine
15 0 21 P Proline
31 0 42 V Valine
17 0 26 D Aspartic acid
12 0 6 Q Glutamine
17 0 16 H Histidine
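
The Parikh vector of a word simply counts how often each letter occurs. A minimal sketch that reproduces the 1DRR_1 column of the table above; the function name `parikh_vector` and the constant `ALPHABET` are illustrative names, not part of the CoV2D code, and the letter order is copied from the table rows:

```python
from collections import Counter

# Letter order copied from the rows of the table above (illustrative constant).
ALPHABET = "FSRNGILMTWYACEKPVDQH"

def parikh_vector(w, alphabet=ALPHABET):
    """Return the number of occurrences of each letter of `alphabet` in `w`."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# 1DRR_1 is a DNA chain, so only the letters G, A and C occur in it.
vec = parikh_vector("GAAGAGAAGC")
print({a: c for a, c in zip(ALPHABET, vec) if c})
```

Because 1DRR_1 is DNA read over the amino-acid alphabet, its nonzero entries land on the Glycine, Alanine and Cysteine rows.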

>6AMH_1|Chains A, B, C, D|Tryptophan synthase beta chain 1|Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1) (186497)
>1DRR_1|Chain A|DNA (5'-D(*GP*AP*AP*GP*AP*GP*AP*AP*GP*C)-3')|
>3AMT_1|Chain A|Putative uncharacterized protein|Archaeoglobus fulgidus (2234)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6AMH 165 396 0.83 40 210 373
MWFGEFGGQYVPETLIGPLKELEKAYKRFKDDEEFNRQLNYYLKTWAGRPTPLYYAKRLTEKIGGAKVYLKREDLVHGGAHKTNNAIGQALLAKFMGKTRLIAETGAGQHGVATAMAGALLGMKVDIYMGAEDVERQKMNVFRMKLLGANVIPVNSGSRTLKDAINEALRDWVATFEYTHYLIGSVVGPHPYPTIVRDFQSVIGREAKAQILEAEGQLPDVIVACVGGGSNAMGIFYPFVNDKKVKLVGVEAGGKGLESGKHSASLNAGQVGVSHGMLSYFLQDEEGQIKPSHSIAPGLDYPGVGPEHAYLKKIQRAEYVAVTDEEALKAFHELSRTEGIIPALESAHAVAYAMKLAKEMSRDEIIIVNLSGRGDKDLDIVLKVSGNVLEHHHHHH
1DRR 5 10 0.38 6 4 5
GAAGAGAAGC
3AMT 180 440 0.83 40 226 411
MGSSHHHHHHSSGLVPRGSHMRVWVGIDDTDSSRGMCTTYLAVLAMERVERELGKVIGFPRLIRLNPTIPYKTRGNGAVSFLVEVDDVGELVDVVNEVIIEHAMLDDEKTNPGAVFVDEELAVKLKPFADKAIKDVLQIDEALFVIGKYFIPHLRHKKGRGLIGALAAVGAELEDFTLELIAYRYPERFGTEREYDEESFFDMDYELYPQTFDNVDWCNDVVVCIPNTPCPVLYGIRGESVEALYKAMESVKTEPVDRRMIFVTNHATDMHLIGEEEVHRLENYRSYRLRGRVTLEPYDIEGGHVFFEIDTKFGSVKCAAFEPTKQFRNVIRLLRKGDVVEVYGSMKKDTINLEKIQIVELAEIWVEKNPICPSCGRRMESAGRGQGFRCKKCRTKADEKLREKVERELQPGFYEVPPSARRHLSKPLIRMNVEGRHIFR
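
The LZ-complexity and normalized ratio columns can be sketched as follows. This assumes the page counts components of the Lempel-Ziv (1976) exhaustive parsing, in which each component is the shortest prefix of the remaining text that cannot be copied from earlier text; that assumption reproduces the 1DRR row (5 and 0.38) but the exact variant used by the browser is not stated. The names `lz76` and `normalized_lz` are illustrative.

```python
import math

def lz76(w):
    """Number of components in the LZ76 exhaustive parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        j = i + 1
        # Extend the component while w[i:j] already occurs in w[:j-1];
        # the earlier occurrence may overlap the component itself.
        while j <= n and w[i:j] in w[:j - 1]:
            j += 1
        count += 1
        i = j  # start of the next component (or past the end)
    return count

def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), where n = |w| and b = alphabet_size."""
    n = len(w)
    return lz76(w) / (n / math.log(n, alphabet_size))

w = "GAAGAGAAGC"  # f(1DRR)
print(lz76(w), round(normalized_lz(w), 2))
```

For this word the parsing is G, A, AG, AGAA, GC. The substring test makes the sketch quadratic or worse, which is acceptable for sequences of a few hundred letters.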

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence for the 4-symbol Protein Data Bank code \(c\).
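
As a small illustration of these definitions, the following sketch computes \(P_w(k)\) and \(p_w(k)\) directly; the function names `subwords` and `p` are illustrative. For \(w=f(1DRR)\) it gives 4 and 5 for lengths 2 and 3, matching the table. For length 1 it gives 3, whereas the \(p_w(1)\) column lists 6, so that column may follow a convention this sketch does not reproduce.

```python
def subwords(w, k):
    """P_w(k): the set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def p(w, k):
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

w = "GAAGAGAAGC"  # f(1DRR)
print(sorted(subwords(w, 2)), p(w, 2), p(w, 3))
```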

\(|P_{f(6AMH_1)}(2) \setminus P_{f(1DRR_1)}(2)|=208\), \(|P_{f(1DRR_1)}(2) \setminus P_{f(6AMH_1)}(2)|=2\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
111101110011001111100100100010000010001000100111010110010010001111010100001101110000011101111011100011100111001110111111111010101110010000101101011110111100100010011001100111010000011101111010101100100111001010110101011011110111100111110111000010111101110110010001010110111001110011000010101000111110011111001010010010011100001101100100001111110010111011011001000011110101010001011101010110000000
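
The hydrophobic-polar string can be sketched as a letter-by-letter recoding. The set `HYDROPHOBIC` below is an assumption inferred by comparing the start of Sequence 1 with the start of the binary string above; the placement of C (hydrophobic) and S (polar) is a guess that this comparison does not settle. The name `hp` is illustrative.

```python
# Assumed hydrophobic letters, inferred from the first residues of Sequence 1.
# The classification of C and S is a guess, not confirmed by the page.
HYDROPHOBIC = set("AFGILMPVWC")

def hp(w):
    """Recode a protein word as 1 (hydrophobic) / 0 (polar), letter by letter."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)

# First 20 residues of f(6AMH).
print(hp("MWFGEFGGQYVPETLIGPLK"))
```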
Pair \(Z_2\) Length of longest common subsequence
6AMH_1,1DRR_1 210 3
6AMH_1,3AMT_1 156 6
1DRR_1,3AMT_1 224 2
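
The \(Z_k\) column is the size of the symmetric difference of the two subword sets. A minimal sketch, with illustrative function names `subwords` and `Z`, and a short toy word standing in for a second sequence:

```python
def subwords(w, k):
    """P_w(k): the set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def Z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

x = "GAAGAGAAGC"  # f(1DRR)
y = "AGC"         # toy word, not a database entry
print(Z(x, y, 2))  # GA and AA occur only in x; AG and GC are shared
```

With the full sequences, the two one-sided counts 208 and 2 quoted above add up to the tabulated \(Z_2=210\) for the pair 6AMH_1, 1DRR_1.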

Newick tree

 
(
	1DRR_1:11.98,
	(
		6AMH_1:78,3AMT_1:78
	):38.98
);

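Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=(\mathrm{LZ}(xy)-\mathrm{LZ}(x))+(\mathrm{LZ}(yx)-\mathrm{LZ}(y))\). (This makes the 4TYN sequence AAAAAA a close match...)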
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{406}{\log_{20} 406}-\frac{10}{\log_{20}10}\right)\approx 128.\)
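
The arithmetic behind this estimate can be checked with a short sketch. The function name `scaled_gap` and its parameters are illustrative; the factors 0.85 and 0.8 are taken from the formula as given, and 406 appears to be the combined length 396 + 10 of the two query sequences. The expression evaluates to about 128.85, so the quoted 128 looks like a truncation rather than a rounding.

```python
import math

def scaled_gap(n_long, n_short, base=20, factors=(0.85, 0.8)):
    """Evaluate factors[0] * factors[1] * (n_long/log_b(n_long) - n_short/log_b(n_short))."""
    def term(n):
        return n / math.log(n, base)
    return factors[0] * factors[1] * (term(n_long) - term(n_short))

value = scaled_gap(406, 10)
print(round(value, 2), int(value))
```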
Status Protein1 Protein2 d d1/2
Query variables 6AMH_1 1DRR_1 163 83

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
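
A hedged sketch of how \(\delta\) interpolates between the two distances, assuming that min and max range over the two complexity increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) and that LZ is the LZ76 exhaustive-parsing count. The names `lz76`, `increments` and `delta` are illustrative, and the second word is the AAAAAA sequence mentioned above.

```python
def lz76(w):
    """Number of components in the LZ76 exhaustive parsing of w (assumed variant)."""
    i, count, n = 0, 0, len(w)
    while i < n:
        j = i + 1
        # Grow the component while it can still be copied from earlier text.
        while j <= n and w[i:j] in w[:j - 1]:
            j += 1
        count += 1
        i = j
    return count

def increments(x, y):
    """Return (LZ(xy) - LZ(x), LZ(yx) - LZ(y))."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "GAAGAGAAGC", "AAAAAA"
d = delta(x, y, 0)           # Otu-Sayood d (the max)
half_d1 = delta(x, y, 0.5)   # d_1 / 2 (the average of the two increments)
print(d, half_d1)
```

Since the average of two numbers never exceeds their maximum, \(d_1/2\le d\) always holds, which agrees with the query row above (83 versus 163).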