CoV2D Browser™

Parikh vectors
7YPM_1 2QGM_1 2XYU_1 Letter Amino acid
35 33 23 I Isoleucine
23 44 20 K Lysine
25 19 13 P Proline
7 1 7 C Cysteine
52 27 22 G Glycine
26 9 4 H Histidine
14 30 10 T Threonine
14 18 7 Q Glutamine
15 19 9 F Phenylalanine
17 26 21 S Serine
25 19 18 D Aspartic acid
35 33 18 E Glutamic acid
10 14 13 M Methionine
5 6 5 W Tryptophan
14 19 10 Y Tyrosine
46 28 16 A Alanine
25 11 19 R Arginine
11 27 10 N Asparagine
34 21 20 V Valine
41 41 20 L Leucine
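
Each column of the table above is a Parikh vector: the number of occurrences of each amino-acid letter in one sequence. A minimal Python sketch of that count follows. The function name `parikh_vector` and the constant `AMINO_ACIDS` are assumptions of this example, not identifiers used by the site.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "IKPCGHTQFSDEMWYARNVL"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# First 20 residues of 7YPM_1, used only as a short illustration.
prefix = "MHHHHHHTAPLRNHDIAELK"
vector = parikh_vector(prefix)
print(vector["H"], sum(vector.values()))
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick consistency check against the Length column further down.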

>7YPM_1|Chains A, B, C, D|Aspartate aminotransferase family protein|Caulobacter sp. D5 (357400)
>2QGM_1|Chain A|Succinoglycan biosynthesis protein|Bacillus cereus ATCC 14579 (226900)
>2XYU_1|Chain A|EPHRIN TYPE-A RECEPTOR 4,|MUS MUSCULUS (10090)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7YPM , Knot 189 474 0.82 40 228 443
MHHHHHHTAPLRNHDIAELKRLDLAHHLPAQADHKVIAEQGGSRIITRAEGVYIHDGEGHQILDGMAGLWCVNVGYGREELAKAAYDQMLELPYYNTFFKTATPPPIELAAKIAQKMGGHLSHVFYNSSGSEANDTVFRLVRHFWKLKGEPSRTVFISRWNAYHGSTVAGVSLGGMKHMHKQGDLPIAGVEHVMQPYQFGDGFGEDPAAFRDRAVQAIEDKILEVGPENVAAFIGEPVQGAGGVIIPPDGYWPAVEALCRKYGILLVCDEVICGFGRLGQWFGHQHYGIKPDLIAMAKGLSSGYLPISAVGVADHIVAELREKGGDFIHGFTYSGHPTAAAVALKNIEIMEREGLVERTRDETGPYLAQALASLNDHPLVGEVRSLGLIGAVEIVREKGTNHRFLDKEGEAGPIVRDLCIKNGLMVRAIRDSIVCCPPLIITKAQIDELVGIIRKSLDEAEPVLRALKPKEGED
2QGM , Knot 178 445 0.81 40 218 420
MNKKRMIAMVSTALLVTGCAEVGNAQTVAVENSGQSVQKNIVKSIQSQANPLKTIEPSKPFEDLKPLKKMIGNAQYVGLGENTHGSSEIFTMKFRLVKYLVTEMGFTNFAMEEDWGNGLKLNEYIQTGKGNPREFLKLLYPTDEIIAMIEWMKDYNADPSNKKKIQFIGLDLKALDQGSFNKVIDYVRLHRPDLLAEVEENYKELSSFTGSIQEYMKLTPKLKEKFKANAERVARLLKDENEQANTEIIPSEYIWAKATASAIEKFTTMLLPNDYPSIIKLHEQYLADHAMWAQETFGGKTMVWAHNIHIAKGIIDEKLYPYVAGQFLKERLDNNYVTIGSTTTEGNFTLYSEYNPSTGGKITTDTIPQDVKSFNYTLGKVPYKMFLLDNRHLKGQAEKWVKAKRPLLSIGGQILPNSSVYFDTSLLEQFDIIFHIRKTSPSHIK
2XYU , Knot 126 285 0.83 40 188 278
FAKEIDASCIKIEKVIGVGEFGEVCSGRLKVPGKREICVAIKTLKAGYTDKQRRDFLSEASIMGQFDHPNIIHLEGVVTKCKPVMIITEYMENGSLDAFLRKNDGRFTVIQLVGMLRGIGSGMKYLSDMSYVHRDLAARNILVNSNLVCKVSDFGMSRVLEDDPEAAYTTRGGKIPIRWTAPEAIAYRKFTSASDVWSYGIVMWEVMSYGERPYWDMSNQDVIKAIEEGYRLPPPMDCPIALHQLMLDCWQKERSDRPKFGQIVNMLDKLIRNPNSLKRTGSESS
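
The LZ-complexity and normalized columns can be illustrated with a short Python sketch. It assumes the exhaustive-history parsing of Lempel and Ziv (1976), in which each new component is the shortest word not already seen earlier in the sequence. Whether the site uses exactly this variant is an assumption, and the names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(w: str) -> int:
    """Count the components of an LZ76-style exhaustive parsing of w."""
    n = len(w)
    i = 0            # start index of the current component
    components = 0
    while i < n:
        length = 1
        # Extend the component while it already occurs earlier in w;
        # the earlier occurrence may overlap the component itself.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_{alphabet_size} n)."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example, parsed as 0.001.10.100.1000.101
print(lz76_complexity("0001101001000101"))  # 6
# Normalized value for the 7YPM row, from its LZ-complexity and length.
print(round(normalized_lz(189, 474), 2))
```

The substring test makes this sketch quadratic or worse in the sequence length, which is acceptable for proteins of a few hundred residues.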

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
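
The set of distinct subwords of a given length, and its cardinality, can be sketched in a few lines of Python. The function names `subwords` and `subword_complexity` and the parameter `k` for the subword length are assumptions of this example.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Return the number of distinct subwords of length k in w."""
    return len(subwords(w, k))

print(sorted(subwords("abab", 2)))    # ['ab', 'ba']
print(subword_complexity("abab", 2))  # 2
```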

\(|P_{f(7YPM_1)}(2) \setminus P_{f(2QGM_1)}(2)|=92\), \(|P_{f(2QGM_1)}(2) \setminus P_{f(7YPM_1)}(2)|=82\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for example, \(Z_2(f(7YPM_1),f(2QGM_1))=92+82=174\).

Hydrophobic-polar version of Sequence 1:
100000001110000110100101100111010001110011001100101101001010011011111101011010001101100011011000011001011110111011001110100110000100100011011001101010100011100101001001111011110010001011111100110100110111001111000110110001101110011111101101111111110101111011000011111000110111011011100001101011111011001011101111100111010001101101100010101111110010110001110000000110110111010001111010011111110110001000011000101111100101001111011000110011111001010011111000100101110110100100
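
The binary string above maps each residue of Sequence 1 to 1 (hydrophobic) or 0 (polar). A minimal Python sketch of such an encoding follows. The hydrophobic set used here is an assumption: most of its letters were inferred by comparing the start of the string with the start of the 7YPM sequence, and the classification of C, F and W is a guess. The names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Assumed hydrophobic residues (hypothetical; C, F and W are guesses).
HYDROPHOBIC = frozenset("ACFGILMPVW")

def hp_encode(sequence: str) -> str:
    """Map each residue to '1' if hydrophobic under the assumed set, else '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 20 residues of 7YPM_1.
print(hp_encode("MHHHHHHTAPLRNHDIAELK"))  # 10000000111000011010
```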
Pair \(Z_2\) Length of longest common subword (substring)
7YPM_1,2QGM_1 174 4
7YPM_1,2XYU_1 160 3
2QGM_1,2XYU_1 178 4
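
The two columns of the table above can be sketched in Python as follows. The sketch assumes that the last column counts the longest common contiguous subword, which is suggested by the small values 3 and 4 but is not confirmed by the page. The function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring_length(x: str, y: str) -> int:
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length ending at y[j-1]
    for cx in x:
        cur = [0] * (len(y) + 1)
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("abab", "abba", 2))                      # 1
print(longest_common_substring_length("abcde", "xbcdy"))  # 3
```

`z_distance` is symmetric in its two arguments and is zero when both words have the same set of length-`k` subwords.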

Newick tree

 
(2QGM_1:90.51,(7YPM_1:80,2XYU_1:80):10.51);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{919}{\log_{20} 919}-\frac{445}{\log_{20} 445}\right)\approx 125.7\), where \(919=474+445\) is the combined length of the two query sequences.
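
The arithmetic in the estimate above can be checked with a few lines of Python. The helper name `lz_scale` is hypothetical, and the sketch reads 919 as 474 + 445, the combined length of the two query sequences. The factors 0.85 and 0.8 are taken from the formula as given.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_{alphabet_size} n, the normalizing term used above."""
    return n / math.log(n, alphabet_size)

# Length of the concatenation of 7YPM_1 (474) and 2QGM_1 (445).
combined = 474 + 445

expected = 0.85 * 0.8 * (lz_scale(combined) - lz_scale(445))
print(round(expected, 1))
```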
Status Protein1 Protein2 d d1/2
Query variables 7YPM_1 2QGM_1 159 154.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
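
The relation between \(d\), \(d_1\) and \(\delta\) can be sketched in Python. The sketch assumes the unnormalized Otu–Sayood definitions, with \(d\) the maximum and \(d_1\) the sum of \(c(xy)-c(x)\) and \(c(yx)-c(y)\), and it uses an LZ76-style parsing for the complexity \(c\). Both choices are assumptions about what the site computes, and all function names are hypothetical.

```python
def lz76_complexity(w: str) -> int:
    """Count the components of an LZ76-style exhaustive parsing of w."""
    i, components = 0, 0
    while i < len(w):
        length = 1
        # Extend while the component already occurs earlier (overlap allowed).
        while i + length <= len(w) and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def increments(x: str, y: str) -> tuple[int, int]:
    """Return (c(xy) - c(x), c(yx) - c(y))."""
    return (lz76_complexity(x + y) - lz76_complexity(x),
            lz76_complexity(y + x) - lz76_complexity(y))

def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1): the maximum and the sum of the two increments."""
    a, b = increments(x, y)
    return max(a, b), a + b

def delta(x: str, y: str, alpha: float) -> float:
    """Return alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "MHHHHHHTAPLRNHD", "FAKEIDASCIKIEKV"
d, d1 = otu_sayood(x, y)
print(d, d1 / 2, delta(x, y, 0.0), delta(x, y, 0.5))
```

With `alpha = 0` the interpolation returns \(d\), and with `alpha = 0.5` it returns \(d_1/2\), matching the two cases in the displayed formula.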