CoV2D Browser™

Parikh vectors
6IPX_1 6UEX_1 5ZXA_1 Letter Amino acid
18 10 63 A Alanine
13 13 11 R Arginine
15 12 4 M Methionine
10 11 21 T Threonine
3 0 4 W Tryptophan
20 9 6 Y Tyrosine
17 19 16 D Aspartic acid
14 10 9 Q Glutamine
7 14 2 H Histidine
24 18 8 I Isoleucine
20 9 7 F Phenylalanine
15 20 21 V Valine
39 17 15 N Asparagine
54 20 16 L Leucine
33 25 8 K Lysine
32 17 19 S Serine
6 0 1 C Cysteine
35 12 9 E Glutamic acid
9 27 21 G Glycine
6 10 64 P Proline
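
Each column of the table above is a Parikh vector: for every amino-acid letter, the number of times it occurs in the sequence. A minimal sketch of that count follows; the function name `parikh_vector` and the letter ordering in `AMINO_ACIDS` are illustrative choices, not taken from the page.

```python
from collections import Counter

# The 20 standard amino-acid letters (ordering is arbitrary here).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the occurrence count of each amino-acid letter in `sequence`."""
    counts = Counter(sequence)
    # Letters that never occur get an explicit 0, as in the table.
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

# First ten residues of 6IPX_1, used only as a small example.
vec = parikh_vector("MELIRIAMKK")
print(vec["M"], vec["K"], vec["W"])
```

Applied to a full FASTA sequence, the same function should give one column of the table.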

>6IPX_1|Chains A, B|AimR transcriptional regulator|Bacillus phage SPbeta (66797)
>6UEX_1|Chain A|Regulatory protein MsrR|Staphylococcus aureus (strain N315) (158879)
>5ZXA_1|Chain A|Alanine and proline-rich secreted protein Apa|Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv) (83332)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6IPX , Knot 162 390 0.82 40 212 360
MELIRIAMKKDLENDNSLMNKWATVAGLKNPNPLYDFLNHDGKTFNEFSSIVNIVKSQYPDREYELMKDYCLNLDVKTKAARSALEYADANMFFEIEDVLIDSMISCSNMKSKEYGKVYKIHRELSNSVITEFEAVKRLGKLNIKTPEMNSFSRLLLLYHYLSTGNFSPMAQLIKQIDLSEISENMYIRNTYQTRVHVLMSNIKLNENSLEECREYSKKALESTNILRFQVFSYLTIGNSLLFSNYELAQENFLKGLSISVQNENYNMIFQQALCFLNNVWRKENKWINFESDSIMDLQEQAHCFINFNENSKAKEVLDKLDLLVHNDNELAMHYYLKGRLEQNKACFYSSIEYFKKSNDKFLIRLPLLELQKMGENQKLLELLLLLEYA
6UEX , Knot 115 273 0.78 36 166 255
MGSSHHHHHHHHHHSSGLVPRGSHMDGKISILVLGADKAQGGQSRTDSIMVVQYDFINKKMKMMSVMRDIYADIPGYGKHKINSAYALGGPELLRKTLDKNLGINPEYYAVVDFTGFEKMIDELMPEGVPINVEKDMSKNIGVSLKKGNHRLNGKELLGYARFRHDPEGDFGRVRRQQQVMQTLKKEMVNFRTVVKLPKVAGILRGYVNTNIPDSGIFQTGLSFGIRGEKDVKSLTVPIKNSYEDVNTNTDGSALQINKNTNKQAIKDFLDED
5ZXA , Knot 127 325 0.75 40 154 260
MHQVDPNLTRRKGRLAALAIAAMASASLVTVAVPATANADPEPAPPVPTTAASPPSTAAAPPAPATPVAPPPPAAANTPNAQPGDPNAAPPPADPNAPPPPVIAPNAPQPVRIDNPVGGFSFALPAGWVESDAAHFDYGSALLSKTTGDPPFPGQPPPVCNDTRIVLGRLDQKLYASAEATDSKAAARLGSDMGEFYMPYPGTRINQETVSLDANGVSGSASYYEVKFSDPSKPNGQIWTGVIGSPAANAPDAGPPQRWFVVWLGTANNPVDKGAAKALAESIRPLVAPPPAPAPAPAEPAPAPAPAGEVAPTPTTPTPQRTLPA
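
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be checked from the other two columns. The sketch below also includes a simple LZ76-style phrase counter; which Lempel-Ziv variant the page actually uses is an assumption here, and the names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(w: str) -> int:
    """Count phrases in the Lempel-Ziv (1976) exhaustive history of w.

    Each phrase is the shortest prefix of the remaining text that does not
    already occur earlier (overlap with the phrase itself is allowed).
    """
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from the text
        # seen so far (everything before its own last symbol).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# Table values for 6IPX_1: LZ = 162, n = 390.
print(normalized_lz(162, 390))
```

With the table's values the ratios come out just under 0.83, 0.79 and 0.76, which suggests the page truncates rather than rounds to two decimals.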

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
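
As a small illustration of these definitions, the sketch below collects the length-\(k\) subwords of a word and counts them. The function names `subwords` and `subword_complexity` are hypothetical, chosen for this example only.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    # range() is empty when k > len(w), so the result is then the empty set.
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# First ten residues of 6IPX_1, used only as a small example.
w = "MELIRIAMKK"
print(subword_complexity(w, 1), subword_complexity(w, 2))
```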

\(|P_{f(6IPX_1)}(2) \setminus P_{f(6UEX_1)}(2)|=104\), \(|P_{f(6UEX_1)}(2) \setminus P_{f(6IPX_1)}(2)|=58\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
101101110001000001100110111100101100110001001001001101100001000001100001010100011001100101011101001110011000010000010100100010001100101100110101001010010011110001001010111011001010010001010000000101110010100001000000000110000110101100101100111000011000110110101000000111001101100110000011010000110100010011010000010011001011100000111000101010000101000100100000011101111010011000011011111001
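
The binary string above can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not state which residues it treats as hydrophobic; the set used below is an assumption, inferred by comparing the opening residues of 6IPX_1 with the opening bits of the string, and the name `hp_encode` is hypothetical.

```python
# Assumed hydrophobic set, inferred from the page's own output; the page
# itself does not list it, so treat this as a guess.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence: str) -> str:
    """Map each residue to '1' if hydrophobic (per HYDROPHOBIC), else '0'."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence)

# First twenty residues of 6IPX_1.
print(hp_encode("MELIRIAMKKDLENDNSLMN"))
```

Under this assumed set, the first twenty residues of 6IPX_1 encode to the first twenty bits of the string shown above.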
Pair \(Z_2\) Length of longest common subsequence
6IPX_1,6UEX_1 162 4
6IPX_1,5ZXA_1 190 4
6UEX_1,5ZXA_1 170 4
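
The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords; for the first pair, \(104+58=162\) matches the two one-sided counts given earlier. A minimal sketch of \(Z_k\) follows, with hypothetical function names.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    # The symmetric difference holds exactly the subwords found in one
    # word but not the other, so its size is the sum of both terms.
    return len(px ^ py)

# Toy example: P_x(2) = {AA, AB}, P_y(2) = {AB, BB}.
print(z_distance("AAB", "ABB", 2))
```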

Newick tree

 
(5ZXA_1:92.98,(6IPX_1:81,6UEX_1:81):11.98);

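Let \(d\) be the Otu--Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu--Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)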
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{663}{\log_{20} 663}-\frac{273}{\log_{20}273}\right)\approx 108.\)
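
The arithmetic behind this estimate can be checked directly. Note that \(663 = 390 + 273\), the combined length of 6IPX_1 and 6UEX_1; reading it as the length of the concatenated sequence is an interpretation, not something the page states. The factors 0.85 and 0.8 are used exactly as given, and the function name `expected_distance` is hypothetical.

```python
import math

def expected_distance(n_concat: int, n_single: int,
                      factor: float = 0.85 * 0.8, base: int = 20) -> float:
    """factor * (n_concat / log_b(n_concat) - n_single / log_b(n_single))."""
    term = lambda n: n / math.log(n, base)
    return factor * (term(n_concat) - term(n_single))

# 663 = 390 + 273 (lengths of 6IPX_1 and 6UEX_1); 273 is 6UEX_1 alone.
print(expected_distance(663, 273))
```

The result lies between 108 and 109, consistent with the value 108 quoted in the text.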
Status Protein1 Protein2 d d1/2
Query variables 6IPX_1 6UEX_1 138 115
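
The two distances in the table can be sketched from LZ-complexities of the sequences and their concatenations. This assumes the standard Otu--Sayood definitions, \(d=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1=c(xy)-c(x)+c(yx)-c(y)\), with \(c\) an LZ76-style phrase count; the page's exact LZ variant is not stated, and all function names here are hypothetical.

```python
def lz76_complexity(w: str) -> int:
    """Count phrases in the Lempel-Ziv (1976) exhaustive history of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1) for sequences x and y."""
    a = lz76_complexity(x + y) - lz76_complexity(x)  # new phrases y adds to x
    b = lz76_complexity(y + x) - lz76_complexity(y)  # new phrases x adds to y
    return max(a, b), a + b

d, d1 = otu_sayood("AB", "CD")
print(d, d1 / 2)
```

Since \(d_1/2\) is the average of the two increments and \(d\) is their maximum, \(d_1/2 \le d\) always holds, as it does for the table row (115 and 138).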

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
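
To spell out the two cases, write \(a=\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(b=\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), and read min and max as \(\min(a,b)\) and \(\max(a,b)\). This reading is an assumption based on the standard Otu--Sayood definitions, not something stated on the page. Then
\[ \delta\big|_{\alpha=0}=\max(a,b)=d, \qquad \delta\big|_{\alpha=1/2}=\tfrac12\bigl(\min(a,b)+\max(a,b)\bigr)=\tfrac12(a+b)=d_1/2. \]
Under this reading, the query values \(d=138\) and \(d_1/2=115\) would correspond to \(\max(a,b)=138\) and \(\min(a,b)=2\cdot 115-138=92\).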