CoV2D Browser™

Parikh vectors
7OCU_1 7VHF_1 1DWJ_1 Letter Amino acid
22 9 26 P Proline
31 17 31 N Asparagine
32 24 18 R Arginine
9 2 8 C Cysteine
49 14 15 Q Glutamine
59 15 20 E Glutamic acid
85 22 36 L Leucine
30 31 34 T Threonine
3 2 10 W Tryptophan
52 20 22 A Alanine
29 10 37 Y Tyrosine
30 7 12 H Histidine
50 19 32 I Isoleucine
41 3 26 K Lysine
39 30 37 S Serine
38 25 20 V Valine
41 11 38 D Aspartic acid
25 5 8 M Methionine
28 15 27 F Phenylalanine
34 16 42 G Glycine
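The Parikh vector of a word counts how often each letter occurs, ignoring order. Here is a minimal Python sketch, assuming each sequence is a plain string of one-letter amino acid codes. `parikh_vector` and `AMINO_ACIDS` are hypothetical names, not part of the CoV2D code.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes, in the row order of the table above.
AMINO_ACIDS = "PNRCQELTWAYHIKSVDMFG"

def parikh_vector(seq: str) -> dict[str, int]:
    """Count the occurrences of each standard amino acid in seq; order is ignored."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

if __name__ == "__main__":
    v = parikh_vector("MVLIFHGKPVHGAIFDMDGT")  # first 20 residues of 7OCU_1
    print(v["G"], v["M"], sum(v.values()))     # 3 2 20
```

Each column of the table sums to the length of its sequence (727, 297 and 499), which gives a quick consistency check on the counts.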

>7OCU_1|Chains A, B|Mannitol-1-phosphate dehydrogenase/phosphatase MtlD|Acinetobacter baumannii ATCC 19606 = CIP 70.34 = JCM 6841 (575584)
>7VHF_1|Chain A|rRNA N-glycosylase|Escherichia coli (562)
>1DWJ_1|Chain A[auth M]|MYROSINASE MA1|SINAPIS ALBA (3728)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7OCU, Knot 279 727 0.84 40 294 671
MVLIFHGKPVHGAIFDMDGTMFDTERLRFQTLQQASQELIGQEFSHEYLMQCLGLSATTAEKLAQRLYGVDVPYKEIRKRADEMELEHIRKHGVPIKKGLVQVLERLRKSGLRMAVATSSRRAIAEEYLINANVYKFFDVITCGDEVEQGKPHPEIFLKAASQLHLDANQCLMFEDSENGLTSAHTSKGLTILLKDIKEPNDEMLEKAHFYYDQMYDFLTDLDQFIPVMDMPEMQEPFPQSLNQLTVGIHGFGAIGGGYIAQILSHWDGYTKPKRIIASTRNSLFREAVNAFGTYSIRYGQFSYDERIENMSIVDSDNEQQMLEMYTHSSLIALCLPEQAIESESKIIAKGLYARFNSQLETCIEPLTFLIILAKVGAKYLVMKHLKEALLELTNDEDVTEHILKEHYFCDTVVNRMVSKLSNQNLYRQLRIKHNFLEQHLEDVEQEDQIEIEDCNKLTPDQLNQASIYVDNMRRNFQPGHILQSMDLILFHSETDMPIYVEKGSPLLEKLRQVVLVDQITDIQLIKNRLWNGVHAMLAWYASLMGYESIGVAMGDHLVKAFAENLIAEVKQGLAIVLPNYAKDLDRMSQSFLDSCEYAFKDPCQRVARDPLRKLNHNERVMASIAVNIRHDLPYKNLLKGAALGYAYAIQFLEIEETKAVEHLQQQIQNLDLSTAQRRQLEAELVQLIQYLFSEQGKQPLDIKSNNTKTTSTQYVAAALEHHHHHH
7VHF, Knot 127 297 0.81 40 180 283
REFTIDFSTQQSYVSSLNSIRTEISTPLEHISQGTTSVSVINHTPPGSYFAVDIRGLDVYQARFDHLRLIIEQNNLYVAGFVNTATNTFYRFSDFTHISVPGVTTVSMTTDSSYTTLQRVAALERSGMQISRHSLVSSYLALMEFSGNTMTRDASRAVLRFVTVTAEALRFRQIQREFRQALSETAPVYTMTPGDVDLTLNWGRISNVLPEYRGEDGVRVGRISFNNISAILGTVAVILNCHHQGARSVRAVNEESQPECQITGDRPVIKINNTLWESNTAAAFLNRKSQFLYTTGK
1DWJ, Knot 204 499 0.84 40 257 477
EITCQENLPFTCGNTDALNSSSFSSDFIFGVASSAYQIEGTIGRGLNIWDGFTHRYPNKSGPDHGNGDTTCDSFSYWQKDIDVLDELNATGYRFSIAWSRIIPRGKRSRGVNEKGIDYYHGLISGLIKKGITPFVTLFHWDLPQTLQDEYEGFLDPQIIDDFKDYADLCFEEFGDSVKYWLTINQLYSVPTRGYGSALDAPGRCSPTVDPSCYAGNSSTEPYIVAHHQLLAHAKVVDLYRKNYTHQGGKIGPTMITRWFLPYNDTDRHSIAATERMKEFFLGWFMGPLTNGTYPQIMIDTVGERLPSFSPEESNLVKGSYDFLGLNYYFTQYAQPSPNPVNSTNHTAMMDAGAKLTYINASGHYIGPLFEKDKADSTDNIYYYPKGIYSVMDYFKNKYYNPLIYVTENGISTPGDENRNQSMLDYTRIDYLCSHLCFLNKVIKEKDVNVKGYLAWALGDNYEFNKGFTVRFGLSYIDWNNVTDRDLKKSGQWYQTFISP
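The LZ-complexity column counts the phrases in a Lempel-Ziv (1976) parsing. The next column divides it by \(n/\log_{20} n\), which is the scale of LZ complexity for a random sequence over 20 letters. The sketch below assumes the classic Kaspar-Schuster counting of LZ76 complexity. The page does not say which LZ variant it uses, so phrase counts from this code may differ slightly from the LZ(w) column. `lz76_complexity` and `normalized_lz` are hypothetical names.

```python
import math

def lz76_complexity(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) parsing of s,
    computed with the Kaspar-Schuster scanning algorithm (an assumed variant)."""
    n = len(s)
    if n <= 1:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            # The current phrase can still be copied from an earlier start i; extend it.
            k += 1
            if l + k > n:
                # Reached the end of s while copying: count the final phrase.
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                # No earlier start gives a longer copy: close the phrase of length k_max.
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_a n); tends to about 1 for long random sequences over a letters."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))  # Kaspar-Schuster's example: 6
    print(round(normalized_lz(279, 727), 3))    # 7OCU row of the table: 0.844
```

The table appears to truncate rather than round. For example, `normalized_lz(204, 499)` is about 0.848, yet the 1DWJ row shows 0.84.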

Let \(P_w(k)\) be the set of distinct subwords (contiguous intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the amino acid sequence given in FASTA format for the 4-character Protein Data Bank code \(c\).
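These definitions translate directly into code. The following is a minimal sketch that assumes words are Python strings. `subwords` and `subword_complexity` are hypothetical names.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k) = |P_w(k)|."""
    return len(subwords(w, k))

if __name__ == "__main__":
    print(sorted(subwords("ABAB", 2)))            # ['AB', 'BA']
    print(subword_complexity("MVLIFHGKPV", 1))    # 9 distinct letters (V repeats)
```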

\(|P_{f(7OCU_1)}(2) \setminus P_{f(7VHF_1)}(2)|=141\) and \(|P_{f(7VHF_1)}(2) \setminus P_{f(7OCU_1)}(2)|=27\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). For example, \(Z_2(f(7OCU_1),f(7VHF_1))=141+27=168\).

Hydrophobic-polar version of Sequence 1 (7OCU_1), with 1 for a hydrophobic and 0 for a polar residue:
1111101011011110101011000010100100100011100100001100111010010011001011011000100010010100100011110011101100100011011110000011100011010100110110010010010101011101100101010001110000011001000011011100100100011001010000100110010011111011010011100100101110111111110110110010100010011100000110011011100010010100000100101100000001101000001111011001100000111011010100010001011011111101110011100100111010000010001100001000110011001000010001010001100010010000010100000101001001010100100010110110010111100000111010010111001001111001001011000110110111110101110001111110011011100111010011111110010010010001100000110010001100110010000011101110100011000110111110101101101000011001000100101001000010101101100110001001101000000000000111110000000
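The hydrophobic-polar string can be produced by mapping each residue to one bit. This sketch assumes a specific hydrophobic set. A, F, G, I, L, M, P and V were checked against the first 60 residues of 7OCU_1, where C, Y, S, H, K, D, E, Q, R and T all come out as 0. W does not occur in that prefix, so treating it as hydrophobic is a guess. `HYDROPHOBIC` and `hp_encode` are hypothetical names.

```python
# Assumed hydrophobic residues (encoded as 1). A, F, G, I, L, M, P, V match the
# first 60 residues of 7OCU_1; W is a guess because it does not occur there.
# Every other residue is encoded as 0 (polar).
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq: str) -> str:
    """Map an amino acid sequence to its 0/1 hydrophobic-polar string."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

if __name__ == "__main__":
    prefix = "MVLIFHGKPVHGAIFDMDGTMFDTERLRFQTLQQASQELIGQEFSHEYLMQCLGLSATTA"
    print(hp_encode(prefix))
```

The string above has only been compared with this sketch on the 60-residue prefix. The sketch would reproduce the full string only if the assumed residue classes match those used by the page.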
Pair \(Z_2\) Length of longest common substring
7OCU_1,7VHF_1 168 4
7OCU_1,1DWJ_1 137 4
7VHF_1,1DWJ_1 177 4
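Both columns of the table can be computed from the definitions. In the sketch below, `z_k` returns the symmetric-difference size \(Z_k\), and `jaccard_distance` divides that size by the size of the union. `longest_common_substring` is a standard dynamic program for the longest contiguous common subword, which this sketch assumes is what the last column reports. All names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y): the number of length-k subwords in exactly one of x and y."""
    return len(subwords(x, k) ^ subwords(y, k))

def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by the size of the union of P_x(k) and P_y(k)."""
    union = subwords(x, k) | subwords(y, k)
    return z_k(x, y, k) / len(union) if union else 0.0

def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

The dynamic program takes time proportional to the product of the two lengths, which is negligible for proteins of a few hundred residues. The value 4 in the table fits a contiguous match. A non-contiguous common subsequence would be far longer, since 7VHF_1 has 20 alanines and 7OCU_1 has 52.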

Newick tree

 
(7VHF_1:91.44,(7OCU_1:68.5,1DWJ_1:68.5):22.94);

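Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)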
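Both distances depend only on four complexity values. This minimal sketch assumes `lz` is any LZ-complexity function, for example an LZ76 phrase counter. `otu_sayood` is a hypothetical name. Passing a trivial function such as `len` checks the arithmetic, but it does not produce a meaningful distance.

```python
from typing import Callable

def otu_sayood(x: str, y: str, lz: Callable[[str], int]) -> tuple[int, int]:
    """Return (d, d1) for sequences x and y under the complexity function lz.

    d  = max(lz(xy) - lz(x), lz(yx) - lz(y))
    d1 = (lz(xy) - lz(x)) + (lz(yx) - lz(y))
    """
    gain_y = lz(x + y) - lz(x)  # new phrases contributed by y after x
    gain_x = lz(y + x) - lz(y)  # new phrases contributed by x after y
    return max(gain_x, gain_y), gain_x + gain_y
```

When both gains are nonnegative, \(d \le d_1 \le 2d\), so \(d_1/2\) always lies between \(d/2\) and \(d\).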
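A rough estimate of the expected distance between 7OCU_1 and 7VHF_1 is \((0.85)(0.8)\left(\frac{1024}{\log_{20} 1024}-\frac{297}{\log_{20}297}\right)\approx 194.7\) (truncated to 194), where \(1024=727+297\) is the length of the concatenated sequence and \(297\) is the length of 7VHF_1.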
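The arithmetic can be checked directly. This sketch reads 1024 as 727 + 297, the length of the concatenation. The constants 0.85 and 0.8 are taken from the formula as given. `rough_distance_estimate` and its parameter names are hypothetical.

```python
import math

def rough_distance_estimate(n_concat: int, n_part: int,
                            lz_ratio: float = 0.85, factor: float = 0.8) -> float:
    """lz_ratio * factor * (n_concat/log_20 n_concat - n_part/log_20 n_part)."""
    def random_lz(n: int) -> float:
        # n / log_20 n: LZ-complexity scale for a random length-n word over 20 letters
        return n / math.log(n, 20)
    return lz_ratio * factor * (random_lz(n_concat) - random_lz(n_part))

if __name__ == "__main__":
    est = rough_distance_estimate(727 + 297, 297)
    print(round(est, 2), math.floor(est))  # 194.68 194
```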
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 7OCU_1 7VHF_1 251 174
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
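\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
where \(\min\) and \(\max\) are taken over the two quantities \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\).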
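The interpolation can be sketched as follows. This assumes that min and max range over the two gains \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). Under that assumption, the table values \(d=251\) and \(d_1/2=174\) for 7OCU_1, 7VHF_1 imply gains of 251 and \(2\cdot 174-251=97\). The code uses those derived numbers. `delta` is a hypothetical name.

```python
def delta(alpha: float, gain_xy: float, gain_yx: float) -> float:
    """alpha*min + (1 - alpha)*max of the two conditional LZ gains.

    alpha = 0 gives d (the max); alpha = 1/2 gives d_1/2 (the mean).
    """
    lo, hi = sorted((gain_xy, gain_yx))
    return alpha * lo + (1 - alpha) * hi

if __name__ == "__main__":
    # Gains derived from the table for 7OCU_1, 7VHF_1: max = d = 251 and
    # min = 2 * 174 - 251 = 97 (which gain is which is not stated on the page).
    for a in (0, 0.5, 1):
        print(a, delta(a, 97, 251))
```

Setting \(\alpha=1\) gives the smaller gain. This is the other endpoint of the family.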