CoV2D Browser™

Parikh vectors
3WZX_1 2JKA_1 6VCT_1 Letter Amino acid
13 25 3 H Histidine
21 38 8 I Isoleucine
14 28 4 F Phenylalanine
24 28 5 R Arginine
20 54 7 D Aspartic acid
6 6 2 C Cysteine
7 42 4 Y Tyrosine
22 38 9 V Valine
9 22 3 Q Glutamine
30 42 7 E Glutamic acid
11 23 1 M Methionine
37 45 6 L Leucine
16 29 6 P Proline
2 17 1 W Tryptophan
17 54 9 K Lysine
25 37 5 S Serine
12 54 7 T Threonine
42 52 4 A Alanine
10 45 4 N Asparagine
37 48 16 G Glycine
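A Parikh vector of a word counts how many times each letter occurs, ignoring order. Each column of the table above is one such vector. Here is a minimal Python sketch. It assumes the sequence is a plain string of uppercase one-letter codes. The names `parikh_vector`, `AMINO_ACIDS` and `SEQ_6VCT` are illustrative and do not come from the browser.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes, in the row order of the table above.
AMINO_ACIDS = "HIFRDCYVQEMLPWKSTANG"

def parikh_vector(seq: str) -> dict[str, int]:
    """Count occurrences of each standard amino acid in seq (order is ignored)."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

# 6VCT_1 sequence as listed on this page
SEQ_6VCT = ("GSHMGVTVERIAPGDGKNFPKKGDKVTIHYVGTLENGDKFDSSRDRGSPFQCTIGVGQVI"
            "KGWDEGVTQLSVGEKARLICTHDYAYGERGYPGLIPPKATLNFEVELIKIN")

if __name__ == "__main__":
    vec = parikh_vector(SEQ_6VCT)
    print(vec["G"], vec["K"], sum(vec.values()))  # 16 9 111
```

On the 6VCT_1 sequence it prints `16 9 111`. These values match the Glycine and Lysine rows and the sequence length.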

>3WZX_1|Chain A|3-isopropylmalate dehydrogenase|Shewanella oneidensis MR-1 (211586)
>2JKA_1|Chains A, B|ALPHA-GLUCOSIDASE (ALPHA-GLUCOSIDASE SUSB)|BACTEROIDES THETAIOTAOMICRON (226186)
>6VCT_1|Chain A|Peptidylprolyl isomerase|Mucor circinelloides (36080)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3WZX 154 375 0.81 20 190 349
MRGSHHHHHHGSSYQIAVLAGDGIGPEVMAEARKVLKAVEARFGLNIEYTEYDVGGIAIDNHGCPLPEATLKGCEAADAILFGSVGGPKWEKLPPNEQPERGALLPLRGHFELFCNLRPAKLHDGLEHMSPLRSDISARGFDVLCVRELTGGIYFGKPKGRQGEGESEEAFDTMRYSRREISRIARIAFEAARGRRKKVTSVDKANVLACSVLWRQVVEEVAVDFPDVELEHIYIDNATMQLLRRPDEFDVMLCSNLFGDILSDEIAMLTGSMGLLASASMNSTGFGLFEPAGGSAPDIAGKGIANPIAQILSAALMLRHSLKQEEAASAIERAVTKALNSGYLTGELLSSDQRHKAKTTVQMGDFIADAVKAGV
2JKA 280 727 0.84 20 303 673
MGSSHHHHHHQQKLTSPDNNLVMTFQVDSKGAPTYELTYKNKVVIKPSTLGLELKKEDNTRTDFDWVDRRDLTKLDSKTNLYDGFEVKDTQTATFDETWQPVWGEEKEIRNHYNELAVTLYQPMNDRSIVIRFRLFNDGLGFRYEFPQQKSLNYFVIKEEHSQFGMNGDHIAFWIPGDYDTQEYDYTISRLSEIRGLMKEAITPNSSQTPFSQTGVQTALMMKTDDGLYINLHEAALVDYSCMHLNLDDKNMVFESWLTPDAKGDKGYMQTPCNTPWRTIIVSDDARNILASRITLNLNEPCKIADAASWVKPVKYIGVWWDMITGKGSWAYTDELTSVKLGETDYSKTKPNGKHSANTANVKRYIDFAAAHGFDAVLVEGWNEGWEDWFGNSKDYVFDFVTPYPDFDVKEIHRYAARKGIKMMMHHETSASVRNYERHMDKAYQFMADNGYNSVKSGYVGNIIPRGEHHYGQWMNNHYLYAVKKAADYKIMVNAHEATRPTGICRTYPNLIGNESARGTEYESFGGNKVYHTTILPFTRLVGGPMDYTPGIFETHCNKMNPANNSQVRSTIARQLALYVTMYSPLQMAADIPENYERFMDAFQFIKDVALDWDETNYLEAEPGEYITIARKAKDTDDWYVGCTAGENGHTSKLVFDFLTPGKQYIATVYADAKDADWKENPQAYTIKKGILTNKSKLNLHAANGGGYAISIKEVKDKSEAKGLKRL
6VCT 54 111 0.76 20 85 107
GSHMGVTVERIAPGDGKNFPKKGDKVTIHYVGTLENGDKFDSSRDRGSPFQCTIGVGQVIKGWDEGVTQLSVGEKARLICTHDYAYGERGYPGLIPPKATLNFEVELIKIN
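The LZ-complexity column counts the phrases in a Lempel–Ziv (1976) exhaustive-history parsing of \(w\). The next column divides that count by \(n/\log_{20} n\), which is roughly the complexity expected for a long random word over a 20-letter alphabet.

Below is a minimal Python sketch of both quantities. It assumes the standard LZ76 parsing, in which each new phrase is the shortest prefix of the remaining text that does not already occur starting at an earlier position (overlap allowed). The browser's own implementation may differ in details, and this sketch has not been checked against the LZ values in the table. The ratio is truncated, not rounded, to two decimals, because the listed values 0.81, 0.84 and 0.76 are what truncation gives. The function names are hypothetical.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w."""
    n = len(w)
    count, pos = 0, 0
    while pos < n:
        length = 1
        # Extend the phrase while it still occurs starting at an earlier
        # position; the earlier copy may overlap the phrase (LZ76 convention).
        while pos + length <= n and w[pos:pos + length] in w[:pos + length - 1]:
            length += 1
        count += 1  # phrase closed by one new symbol, or by the end of w
        pos += length
    return count

def lz_ratio(lz: int, n: int) -> float:
    """LZ(w) / (n / log_20 n), truncated to two decimals; assumes n >= 2."""
    return math.floor(100 * lz / (n / math.log(n, 20))) / 100

if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))  # 6 phrases: 0|001|10|100|1000|101
    print(lz_ratio(154, 375), lz_ratio(280, 727), lz_ratio(54, 111))  # 0.81 0.84 0.76
```

Feeding the table's LZ values and lengths into `lz_ratio` reproduces the ratio column.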

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-character Protein Data Bank code \(c\).
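The definitions of \(P_w(n)\) and \(p_w(n)\) translate directly into set comprehensions. Here is a minimal Python sketch; the function names are hypothetical. Note that \(p_w(1)\) is simply the number of distinct letters occurring in \(w\).

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct length-k subwords (contiguous factors) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k) = |P_w(k)|; it is 0 when k exceeds the length of w."""
    return len(subwords(w, k))

if __name__ == "__main__":
    print(sorted(subwords("abab", 2)), subword_complexity("abab", 3))  # ['ab', 'ba'] 2
```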

\(|P_{f(3WZX_1)}(2) \setminus P_{f(2JKA_1)}(2)|=35\) and \(|P_{f(2JKA_1)}(2) \setminus P_{f(3WZX_1)}(2)|=148\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \). This is an LZ76-style (set-of-subwords) numerator of the Jaccard distance between \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1 (3WZX_1): 101000000010000111111011110111010011011010111010000001111110001011101010100110111110111101001110001001111110101011001011010011001011000101011011010010111011010100101000011001000000100110111011010000100100101110011100110011101101010010100101011001001011100011101100011110101111101010001111101111011011101110111011011111000100001101100110011001010101100000001000101101110110111
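The hydrophobic-polar (HP) string replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not state which residues count as hydrophobic. The sketch below assumes the set {A, F, G, I, L, M, P, V, W}. That set agrees with the first 90 symbols of the string above; in particular, C, T and Y map to 0. The names `hp_string` and `HYDROPHOBIC` are hypothetical.

```python
# ASSUMED hydrophobic residues: inferred from the page's HP string, not stated on the page.
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp_string(seq: str) -> str:
    """Map a protein sequence to a binary string: 1 = hydrophobic, 0 = polar."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First 90 residues of 3WZX_1
SEQ_3WZX_PREFIX = ("MRGSHHHHHHGSSYQIAVLAGDGIGPEVMAEARKVLKAVE"
                   "ARFGLNIEYTEYDVGGIAIDNHGCPLPEATLKGCEAADAILFGSVGGPKW")

if __name__ == "__main__":
    print(hp_string(SEQ_3WZX_PREFIX))
```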
Pair \(Z_2\) Length of longest common subword (contiguous)
3WZX_1,2JKA_1 183 7
3WZX_1,6VCT_1 177 3
2JKA_1,6VCT_1 230 4
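Both columns can be computed directly from the definitions. For example, \(Z_2\) for 3WZX_1 and 2JKA_1 is \(35+148=183\), which agrees with the first row.

In the minimal Python sketch below, the function names are hypothetical:

* `z_k` returns \(Z_k(x,y)\) as the size of a symmetric difference.
* `jaccard_distance` divides that size by \(|P_x(k)\cup P_y(k)|\).
* `longest_common_subword` finds the longest contiguous subword shared by two sequences, using the standard dynamic program.

For 3WZX_1 and 2JKA_1, one common subword of length 7 is the His-tag fragment SHHHHHH.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    return len(subwords(x, k) ^ subwords(y, k))

def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by |P_x(k) union P_y(k)| (0.0 if both sets are empty)."""
    px, py = subwords(x, k), subwords(y, k)
    union = px | py
    return len(px ^ py) / len(union) if union else 0.0

def longest_common_subword(x: str, y: str) -> str:
    """Longest contiguous subword of both x and y, in O(|x| * |y|) time."""
    best_len, best_end = 0, 0
    prev = [0] * (len(y) + 1)  # prev[j]: common-suffix length of x[:i-1], y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return x[best_end - best_len:best_end]

if __name__ == "__main__":
    # N-terminal His-tag regions of 3WZX_1 and 2JKA_1
    print(z_k("abab", "abba", 2),
          longest_common_subword("MRGSHHHHHHGSS", "MGSSHHHHHHQQ"))  # 1 SHHHHHH
```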

Newick tree

(2JKA_1:10.56,(3WZX_1:88.5,6VCT_1:88.5):20.06);

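Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\,\mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\), where \(xy\) denotes concatenation.
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)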
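Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{1102}{\log_{20} 1102}-\frac{375}{\log_{20} 375}\right)\approx 191\), where \(1102=375+727\) is the length of the concatenation of 3WZX_1 and 2JKA_1.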
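The estimate can be checked numerically. The factors 0.85 and 0.8 are taken as given from the formula. The product comes to about 191.6, which the text reports as 191, consistent with truncation.

```python
import math

def log20(x: float) -> float:
    """Logarithm base 20 (the alphabet size for proteins)."""
    return math.log(x, 20)

n1, n2 = 375, 727   # lengths of 3WZX_1 and 2JKA_1
n12 = n1 + n2       # length of the concatenation, 1102
estimate = 0.85 * 0.8 * (n12 / log20(n12) - n1 / log20(n1))

if __name__ == "__main__":
    print(round(estimate, 1), math.floor(estimate))  # 191.6 191
```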
Status Protein1 Protein2 d d1/2
Query variables 3WZX_1 2JKA_1 248 183.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
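This sketch assumes the unnormalized Otu–Sayood definitions. Under them, min and max range over the two one-sided increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), so \(\alpha=0\) picks out \(d\) and \(\alpha=1/2\) gives their average \(d_1/2\). With the table's values \(d=248\) and \(d_1/2=183.5\), the smaller increment for 3WZX_1 and 2JKA_1 would be \(2(183.5)-248=119\).

Here is a minimal Python sketch that reuses the LZ76 parsing assumed earlier. The function names are hypothetical.

```python
def lz76_complexity(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w."""
    n, count, pos = len(w), 0, 0
    while pos < n:
        length = 1
        while pos + length <= n and w[pos:pos + length] in w[:pos + length - 1]:
            length += 1
        count += 1
        pos += length
    return count

def increments(x: str, y: str) -> tuple[int, int]:
    """The one-sided increments LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    return (lz76_complexity(x + y) - lz76_complexity(x),
            lz76_complexity(y + x) - lz76_complexity(y))

def otu_sayood_d(x: str, y: str) -> int:
    """Otu-Sayood d: the larger increment."""
    return max(increments(x, y))

def otu_sayood_d1(x: str, y: str) -> int:
    """Otu-Sayood d1: the sum of both increments."""
    return sum(increments(x, y))

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    print(otu_sayood_d("ab", "ab"), otu_sayood_d1("ab", "ab"))  # 1 2
```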