CoV2D BrowserTM

Parikh vectors
2IRU_1 1BDU_1 7SSG_1 Letter Amino acid
31 28 140 L Leucine
7 13 46 F Phenylalanine
8 9 27 Y Tyrosine
11 16 59 Q Glutamine
22 18 66 G Glycine
6 11 34 H Histidine
9 12 43 K Lysine
20 10 47 S Serine
26 14 58 T Threonine
19 21 69 D Aspartic acid
13 13 93 E Glutamic acid
8 16 61 I Isoleucine
6 7 10 W Tryptophan
24 13 67 V Valine
39 13 109 A Alanine
23 13 93 R Arginine
4 11 33 N Asparagine
0 5 7 C Cysteine
6 7 30 M Methionine
21 14 56 P Proline
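
As a minimal sketch of how one column of the table above could be computed: a Parikh vector just counts how often each letter occurs in a sequence. The function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative names, not part of the CoV2D project; the letter order follows the rows of the table. Letters outside the 20-letter alphabet (such as the leading `X` in 1BDU_1) are ignored here, which is an assumption about how the page treats them.

```python
from collections import Counter

# Row order of the table above (hypothetical constant name).
AMINO_ACIDS = "LFYQGHKSTDEIWVARNCMP"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in sequence."""
    counts = Counter(sequence)
    # Letters not in the alphabet are silently skipped (assumption).
    return [counts.get(letter, 0) for letter in alphabet]

# Toy input: the first ten residues of the 2IRU_1 sequence.
example = "GSHMGSASEQ"
for letter, count in zip(AMINO_ACIDS, parikh_vector(example)):
    if count:
        print(letter, count)
```

Passing a full FASTA sequence instead of the ten-residue prefix would give one complete column.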

>2IRU_1|Chains A, B|Putative DNA ligase-like protein Rv0938/MT0965|Mycobacterium tuberculosis (83332)
>1BDU_1|Chain A|THYMIDYLATE SYNTHASE|Escherichia coli (562)
>7SSG_1|Chain A|Transcription-repair-coupling factor|Escherichia coli (83333)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2IRU , Knot 128 303 0.80 38 173 282
GSHMGSASEQRVTLTNADKVLYPATGTTKSDIFDYYAGVAEVMLGHIAGRPATRKRWPNGVDQPAFFEKQLALSAPPWLSRATVAHRSGTTTYPIIDSATGLAWIAQQAALEVHVPQWRFVAEPGSGELNPGPATRLVFDLDPGEGVMMAQLAEVARAVRDLLADIGLVTFPVTSGSKGLHLYTPLDEPVSSRGATVLAKRVAQRLEQAMPALVTSTMTKSLRAGKVFVDWSQNSGSKTTIAPYSLRGRTHPTVAAPRTWAELDDPALRQLSYDEVLTRIARDGDLLERLDADAPVADRLTRY
1BDU , Knot 125 265 0.87 42 187 257
XMKQYLELMQKVLDEGTQKNDRTGTGTLSIFGHQMRFNLQDGFPLVTTKRCHLRSIIHELLWFLQGDTNIAYLHENNVTIWDEWADENGDLGPVYGKQWRAWPTPDGRHIDQITTVLNQLKNDPDSRRIIVSAWNVGELDKMALAPCHAFFQFYVADGKLSCQLYQRSCDVFLGLPFNIASYALLVHMMAQQCDLEVGDFVWTGGDTHLYSNHMDQTHLQLSREPRPLPKLIIKRKPESIFDYRFEDFEIEGYDPHPGIKAPVAI
7SSG , Knot 407 1148 0.83 40 317 956
MPEQYRYTLPVKAGEQRLLGELTGAACATLVAEIAERHAGPVVLIAPDMQNALRLHDEISQFTDQMVMNLADWETLPYDSFSPHQDIISSRLSTLYQLPTMQRGVLIVPVNTLMQRVCPHSFLHGHALVMKKGQRLSRDALRTQLDSAGYRHVDQVMEHGEYATRGALLDLFPMGSELPYRLDFFDDEIDSLRVFDVDSQRTLEEVEAINLLPAHEFPTDKAAIELFRSQWRDTFEVKRDPEHIYQQVSKGTLPAGIEYWQPLFFSEPLPPLFSYFPANTLLVNTGDLETSAERFQADTLARFENRGVDPMRPLLPPQSLWLRVDELFSELKNWPRVQLKTEHLPTKAANANLGFQKLPDLAVQAQQKAPLDALRKFLETFDGPVVFSVESEGRREALGELLARIKIAPQRIMRLDEASDRGRYLMIGAAEHGFVDTVRNLALICESDLLGERVARRRQDSRRTINPDTLIRNLAELHIGQPVVHLEHGVGRYAGMTTLEAGGITGEYLMLTYANDAKLYVPVSSLHLISRYAGGAEENAPLHKLGGDAWSRARQKAAEKVRDVAAELLDIYAQRAAKEGFAFKHDREQYQLFCDSFPFETTPDQAQAINAVLSDMCQPLAMDRLVCGDVGFGKTEVAMRAAFLAVDNHKQVAVLVPTTLLAQQHYDNFRDRFANWPVRIEMISRFRSAKEQTQILAEVAEGKIDILIGTHKLLQSDVKFKDLGLLIVDEEHRFGVRHKERIKAMRANVDILTLTATPIPRTLNMAMSGMRDLSIIATPPARRLAVKTFVREYDSMVVREAILREILRGGQVYYLYNDVENIQKAAERLAELVPEARIAIGHGQMRERELERVMNDFHHQRFNVLVCTTIIETGIDIPTANTIIIERADHFGLAQLHQLRGRVGRSHHQAYAWLLTPHPKAMTTDAQKRLEAIASLEDLGAGFALATHDLEIRGAGELLGEEQSGSMETIGFSLYMELLENAVDALKAGREPSLEDLTSQQTEVELRMPSLLPDDFIPDVNTRLSFYKRIASAKTENELEEIKVELIDRFGLLPDPARTLLDIARLRQQAQKLGIRKLEGNEKGGVIEFAEKNHVNPAWLIGLLQKQPQHYRLDGPTRLKFIQDLSERKTRIEWVRQFMRELEENAIA
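
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns alone. The sketch below does only that arithmetic; it does not compute \(\mathrm{LZ}(w)\) itself, and the function name `normalized_lz` is an illustrative choice. The results should land close to the tabulated 0.80, 0.87 and 0.83.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ-complexity divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ(w), n) taken from the table above.
rows = [("2IRU", 128, 303), ("1BDU", 125, 265), ("7SSG", 407, 1148)]
for code, lz, n in rows:
    print(code, f"{normalized_lz(lz, n):.4f}")
```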

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(2IRU_1)}(2) \setminus P_{f(1BDU_1)}(2)|=75\), \(|P_{f(1BDU_1)}(2) \setminus P_{f(2IRU_1)}(2)|=89\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100110100001010010011011010000011000111101111011101100001101100111100011101111100101100010000111001011111100111010110101110110101011110011101011011111011011011001110111101110010011010011001100011011100110010011111100010001011011101000010000111001010001011110011010011100100001100110010110010101111001000
Pair \(Z_2\) Length of longest common subsequence
2IRU_1,1BDU_1 164 4
2IRU_1,7SSG_1 180 5
1BDU_1,7SSG_1 190 4
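
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of two sets of length-\(k\) subwords, so for the first pair it is \(75+89=164\), matching the table. A minimal sketch, with hypothetical helper names `subwords` and `z_k` and a toy pair of words rather than the protein sequences:

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: P_x(2) = {AB, BA}, P_y(2) = {AB, BB, BA}.
print(sorted(subwords("ABAB", 2)))
print(z_k("ABAB", "ABBA", 2))
```

Dividing `z_k(x, y, k)` by the size of the union of the two subword sets would give the corresponding Jaccard distance; the page reports only the numerator.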

Newick tree

 
[
	7SSG_1:95.78,
	[
		2IRU_1:82,1BDU_1:82
	]:13.78
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{568}{\log_{20} 568}-\frac{265}{\log_{20} 265})=85.6\)
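
The arithmetic in this estimate can be checked directly. In the sketch below the constants 0.85 and 0.8 are taken from the formula as given, without assuming what they represent; 568 equals 303 + 265, so it is presumably the length of the two query sequences concatenated, but that reading is an assumption. The helper name `size_term` is illustrative.

```python
import math

def size_term(n, alphabet_size=20):
    """n / log_b(n), the scale against which LZ-complexity is normalized."""
    return n / math.log(n, alphabet_size)

# 568 = 303 + 265 (assumed: combined length of 2IRU_1 and 1BDU_1).
expected = 0.85 * 0.8 * (size_term(568) - size_term(265))
print(f"{expected:.2f}")
```

The printed value should agree with the quoted 85.6 up to the last digit.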
Status Protein1 Protein2 d d1/2
Query variables 2IRU_1 1BDU_1 106 102
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
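
A small sketch of the interpolation above. The table reports \(d=106\) and \(d_1/2=102\) for the pair 2IRU_1, 1BDU_1; these are consistent with a maximum of 106 and a minimum of 98, since \((98+106)/2=102\). The value 98 is inferred from that arithmetic and is not stated on the page, and the function name `delta` is illustrative.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Hypothetical pair of one-directional values: 106 is the reported d,
# 98 is inferred so that the average equals the reported d1/2 = 102.
a, b = 98, 106
print(delta(0, a, b))    # alpha = 0 picks the maximum, i.e. d
print(delta(0.5, a, b))  # alpha = 1/2 gives the average, i.e. d1/2
```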