CoV2D Browser™

Parikh vectors
2JGT_1 4KGS_1 3DGH_1 Letter Amino acid
20 0 21 R Arginine
21 6 29 D Aspartic acid
38 3 44 L Leucine
18 6 31 K Lysine
18 0 6 M Methionine
21 0 20 S Serine
25 11 30 T Threonine
10 3 13 N Asparagine
11 0 9 H Histidine
18 2 14 F Phenylalanine
19 0 21 P Proline
2 1 5 W Tryptophan
10 3 21 Y Tyrosine
49 6 41 A Alanine
5 0 7 C Cysteine
27 5 29 E Glutamic acid
43 4 50 G Glycine
20 1 26 I Isoleucine
14 1 15 Q Glutamine
33 3 51 V Valine
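
As a minimal sketch of how a column of the table above can be reproduced, the following counts each letter of a sequence in the table's row order. It assumes the Parikh vector is the plain vector of letter counts and that symbols outside the 20 listed letters (such as X) are ignored; the names `ALPHABET` and `parikh_vector` are illustrative and not taken from the page.

```python
from collections import Counter

# Letter order copied from the rows of the table above.
ALPHABET = "RDLKMSTNHFPWYACEGIQV"


def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the count of each alphabet letter in `sequence`, in alphabet order.

    Assumption: symbols outside `alphabet` (for example X) are ignored.
    """
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]


# 4KGS_1 sequence as listed further down this page.
seq_4kgs = "DTYKLILNGKTLKGETTTEAXDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEX"
print(dict(zip(ALPHABET, parikh_vector(seq_4kgs))))
```

For this sequence the counts agree with the 4KGS_1 column, for example T = 11 and D = 6.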

>2JGT_1|Chains A, B|SERINE PALMITOYLTRANSFERASE|PSEUDOMONAS PAUCIMOBILIS (13689)
>4KGS_1|Chains A, B|Streptococcal Protein GB1 Backbone Modified Variant: beta-3-Val21, beta-3-Asp40|null
>3DGH_1|Chains A, B|Thioredoxin reductase 1, mitochondrial|Drosophila melanogaster (7227)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2JGT , Knot 176 422 0.84 40 225 395
MTEAAAQPHALPADAPDIAPERDLLSKFDGLIAERQKLLDSGVTEPFAIVMEQVKSPTEAVIRGKDTILLGTYNYMGMTFDPDVIAAGKEALEKFGSGTNGSRMLNGTFHDHMEVEQALRDFYGTTGAIVFSTGYMANLGIISTLAGKGEYVILDADSHASIYDGCQQGNAEIVRFRHNSVEDLDKRLGRLPKEPAKLVVLEGVYSMLGDIAPLKEMVAVAKKHGAMVLVDEAHSMGFFGPNGRGVYEAQGLEGQIDFVVGTFSKSVGTVGGFVVSNHPKFEAVRLACRPYIFTASLPPSVVATATTSIRKLMTAHEKRERLWSNARALHGGLKAMGFRLGTETCDSAIVAVMLEDQEQAAMMWQALLDGGLYVNMARPPATPAGTFLLRCSICAEHTPAQIQTVLGMFQAAGRAVGVIGLE
4KGS , Knot 31 57 0.73 30 48 55
DTYKLILNGKTLKGETTTEAXDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEX
3DGH , Knot 195 483 0.83 40 239 459
MAPVQGSYDYDLIVIGGGSAGLACAKEAVLNGARVACLDFVKPTPTLGTKWGVGGTCVNVGCIPKKLMHQASLLGEAVHEAAAYGWNVDDKIKPDWHKLVQSVQNHIKSVNWVTRVDLRDKKVEYINGLGSFVDSHTLLAKLKSGERTITAQTFVIAVGGRPRYPDIPGAVEYGITSDDLFSLDREPGKTLVVGAGYIGLECAGFLKGLGYEPTVMVRSIVLRGFDQQMAELVAASMEERGIPFLRKTVPLSVEKQDDGKLLVKYKNVETGEESEDVYDTVLWAIGRKGLVDDLNLPNAGVTVQKDKIPVDSQEATNVANIYAVGDIIYGKPELTPVAVLAGRLLARRLYGGSTQRMDYKDVATTVFTPLEYACVGLSEEDAVKQFGADEIEVFHGYYKPTEFFIPQKSVRYCYLKAVAERHGDQRVYGLHYIGPVAGEVIQGFAAALKSGLTINTLINTVGIHPTTAEEFTRLAITKRSGLD
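
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does only that arithmetic; it takes the LZ-complexity values from the table as given and does not attempt to reproduce the LZ parsing itself. The function name `normalized_lz` is illustrative.

```python
import math


def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))


# (code, LZ-complexity, length) triples copied from the table above.
rows = [("2JGT", 176, 422), ("4KGS", 31, 57), ("3DGH", 195, 483)]
for code, lz, n in rows:
    print(code, round(normalized_lz(lz, n), 2))
```

Rounded to two decimals, this gives 0.84, 0.73 and 0.83, matching the table.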

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(2JGT_1)}(2) \setminus P_{f(4KGS_1)}(2)|=193\), \(|P_{f(4KGS_1)}(2) \setminus P_{f(2JGT_1)}(2)|=16\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For the pair above, \(Z_2=193+16=209\).

Hydrophobic-polar version of Sequence 1:
10011101011110110111000110010111100001100110011111100100100111010001111000011101010111110011001101001001101010001010011001010011111001011011110011101001110100010100100010101101000010010001101100110111101100111011110011111000111111001001111110101100101101010111101000110111111000101011011001011010111011101000100110100000011001011011101111011000000111111100000111110111011101011011101110111000101000110100111110111011111110
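
The hydrophobic-polar string can be sketched as a per-residue lookup. The page does not state which residues count as hydrophobic; the two sets below are an assumption, inferred by aligning the binary string above with sequence 1 (2JGT_1), and the names `HYDROPHOBIC`, `POLAR` and `hp_string` are illustrative.

```python
# Assumed classes, inferred from the binary string above (not stated on the page).
HYDROPHOBIC = set("AFGILMPVW")   # encoded as 1
POLAR = set("CDEHKNQRSTY")       # encoded as 0


def hp_string(sequence):
    """Encode a protein sequence as a 0/1 string under the assumed classes."""
    bits = []
    for residue in sequence:
        if residue in HYDROPHOBIC:
            bits.append("1")
        elif residue in POLAR:
            bits.append("0")
        else:
            # No class is assumed for non-standard symbols such as X.
            raise ValueError(f"no hydrophobic/polar class assumed for {residue!r}")
    return "".join(bits)


# First 20 residues of 2JGT_1.
print(hp_string("MTEAAAQPHALPADAPDIAP"))  # 10011101011110110111
```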
Pair \(Z_2\) Length of longest common subword
2JGT_1,4KGS_1 209 3
2JGT_1,3DGH_1 144 4
4KGS_1,3DGH_1 213 4
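
The subword sets and \(Z_k\) defined above can be sketched with Python sets. This is a direct reading of the definitions, not the site's own implementation; the function names are illustrative.

```python
def subwords(w, k):
    """Return the set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))


def z(x, y, k):
    """Return |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# 4KGS_1 sequence as listed on this page.
seq_4kgs = "DTYKLILNGKTLKGETTTEAXDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEX"
print(subword_complexity(seq_4kgs, 2), subword_complexity(seq_4kgs, 3))  # 48 55

# Toy example: P_x(2) = {AB, BA}, P_y(2) = {BA, AB, BB}.
print(z("ABAB", "BABB", 2))  # 1
```

The values 48 and 55 agree with the 4KGS row of the sequence table for subword lengths 2 and 3.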

Newick tree

 
[
	4KGS_1:11.51,
	[
		2JGT_1:72,3DGH_1:72
	]:42.51
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{479}{\log_{20} 479}-\frac{57}{\log_{20}57})\approx 129\), where \(479=422+57\) is the combined length of the two query sequences.
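
The arithmetic in the expected-distance estimate can be checked with a short sketch. The factors 0.85 and 0.8 are taken from the formula as given; the function and parameter names are illustrative.

```python
import math


def expected_distance(n_long, n_short, alphabet_size=20, factors=(0.85, 0.8)):
    """Evaluate factors * (n_long / log_b n_long - n_short / log_b n_short)."""
    def scale(n):
        return n / math.log(n, alphabet_size)

    return factors[0] * factors[1] * (scale(n_long) - scale(n_short))


print(round(expected_distance(479, 57)))  # 129
```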
Status Protein1 Protein2 d d1/2
Query variables 2JGT_1 4KGS_1 164 92
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
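
As a worked instance of the interpolation formula, the sketch below evaluates \(\delta\) for \(\alpha=0\) and \(\alpha=1/2\). The inputs are an inference rather than values stated on this page: if \(d=164\) is the maximum and \(d_1/2=92\) is the midpoint, the minimum would be \(2\cdot 92-164=20\).

```python
def delta(alpha, lo, hi):
    """Return alpha * min + (1 - alpha) * max for the given min (lo) and max (hi)."""
    return alpha * lo + (1 - alpha) * hi


# Inferred inputs (assumption): max = d = 164, min = 2 * 92 - 164 = 20.
lo, hi = 20, 164
print(delta(0, lo, hi))    # 164, the value reported as d
print(delta(0.5, lo, hi))  # 92.0, the value reported as d1/2
```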