CoV2D BrowserTM

Parikh vectors
4AVZ_1 9DIV_1 5DMZ_1 Letter Amino acid
10 2 15 M Methionine
32 1 14 Y Tyrosine
48 52 16 A Alanine
12 6 11 H Histidine
62 4 18 S Serine
7 1 5 W Tryptophan
58 3 20 V Valine
51 1 19 N Asparagine
60 7 25 G Glycine
15 15 19 E Glutamic acid
20 3 20 P Proline
33 5 13 T Threonine
28 6 19 D Aspartic acid
14 0 8 C Cysteine
34 10 18 I Isoleucine
30 14 45 L Leucine
6 17 30 K Lysine
30 1 22 F Phenylalanine
23 1 10 R Arginine
27 3 18 Q Glutamine
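
As a minimal sketch of how one column of the table above can be recomputed, the following counts each letter of a sequence in the row order used on this page. The names `LETTERS`, `SEQ_9DIV` and `parikh_vector` are illustrative assumptions, not part of the page's own tooling; the sequence is the 9DIV_1 sequence as displayed further down this page.

```python
from collections import Counter

# Row order of the table above, read off its "Letter" column.
LETTERS = "MYAHSWVNGEPTDCILKFRQ"

# Sequence of 9DIV_1 as displayed on this page.
SEQ_9DIV = (
    "DDSRISSALQNLWTAAQAAMAAAVKAKAAEIAATKTPEEAKKVAEIAEKAIEIGKLAADAALGIAAAAGGKAVIAKMADGISPEKQAKYLAKFDAEAAAAKEGLAEAEKILKELLKEDPEAAKALTATALAAAAAAIAALLAAGLEHHHHHH"
)


def parikh_vector(word, alphabet=LETTERS):
    """Return the number of occurrences of each alphabet letter in word."""
    counts = Counter(word)
    return {letter: counts.get(letter, 0) for letter in alphabet}


vec = parikh_vector(SEQ_9DIV)
for letter in LETTERS:
    print(letter, vec[letter])
```

The entries of a Parikh vector sum to the length of the word, which gives a quick consistency check against the Length column reported below.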

>4AVZ_1|Chain A|TAIL SPIKE PROTEIN|SALMONELLA PHAGE HK620 (155148)
>9DIV_1|Chains A, B, C, D, E, F, G, H|De novo designed ChuA binding protein C8|synthetic construct (32630)
>5DMZ_1|Chains A, B|Mitotic checkpoint serine/threonine-protein kinase BUB1|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4AVZ , Knot 232 600 0.82 40 248 547
DPDQFRAIIESPEGAGHVGYQYRRNTGSTMRMVSDVLDERVSLWDFHCDPSGNVIQPGPNVDSRQYLQAAIDYVSSNGGGTITIPAGYTWYLGSYGVGGIAGHSGIIQLRSNVNLNIEGRIHLSPFFDLKPFQVFVGFDNGDPASSGNLENCHIYGHGVVDFGGYEFGASSQLRNGVAFGRSYNCSVTGITFQNGDVTWAITLGWNGYGSNCYVRKCRFINLVNSSVNADHSTVYVNCPYSGVESCYFSMSSSFARNIACSVQLHQHDTFYRGSTVNGYCRGAYVVMHAAEAAGAGSYAYNMQVENNIAVIYGQFVILGSDVTATVSGHLNDVIVSGNIVSIGERAAFSAPFGAFIDIGPDNSGASNVQDIQRVLVTGNSFYAPANITDSAAITLRANLNGCTFIANNFDCRYMVYNAPGTTSPVVQNLVWDKSNVIGGTHANQRAGQNLFDMQFASVVNSTIEVQLSCEDLSMFSCILFPASCQLSYSKITVDSAWTKSMSNTAVFEGNQQAGANVYVSYPATVNLTSYNTQGAVPFFSTDTNYAWVTSAYSLSINENLDFSPPATYTNKANGQLVGVGYNEIGGVRSVSVRLMLQRQV
9DIV , Knot 61 152 0.67 38 72 125
DDSRISSALQNLWTAAQAAMAAAVKAKAAEIAATKTPEEAKKVAEIAEKAIEIGKLAADAALGIAAAAGGKAVIAKMADGISPEKQAKYLAKFDAEAAAAKEGLAEAEKILKELLKEDPEAAKALTATALAAAAAAIAALLAAGLEHHHHHH
5DMZ , Knot 158 365 0.85 40 219 358
GPMDPSSLGTVDAPNFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLGEGAFAQVYEATQGDLNDAKNKQKFVLKVQKPANPWEFYIGTQLMERLKPSMQHMFMKFYSAHLFQNGSVLVGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEIIHGDIKPDNFILGNGFLEQDDEDDLSAGLALIDLGQSIDMKLFPKGTIFTAKCETSGFQCVEMLSNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLNIPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLECKRSRK
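
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does only that arithmetic; it does not compute \(\mathrm{LZ}(w)\) itself. The helper names are hypothetical, and truncating (rather than rounding) to two decimals is an assumption, chosen because it reproduces all three displayed values.

```python
import math


def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))


def truncate2(x):
    """Truncate a positive number to two decimal places (assumed display rule)."""
    return math.floor(x * 100) / 100


# (LZ-complexity, length) pairs copied from the table above.
rows = {"4AVZ": (232, 600), "9DIV": (61, 152), "5DMZ": (158, 365)}

ratios = {code: truncate2(normalized_lz(lz, n)) for code, (lz, n) in rows.items()}
print(ratios)
```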

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
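
A minimal sketch of these definitions, reading the argument of \(P_w\) as the subword length: `subwords` collects the distinct length-`k` intervals of a word and `p` returns how many there are. Both names are illustrative assumptions. The toy word is not taken from the page, and the sketch has not been checked against the \(p_w\) columns of the table above.

```python
def subwords(word, k):
    """Set of distinct contiguous subwords of length k in word."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}


def p(word, k):
    """Number of distinct subwords of length k in word."""
    return len(subwords(word, k))


toy = "ABAAB"
print(sorted(subwords(toy, 2)))  # the distinct length-2 subwords
print([p(toy, k) for k in (1, 2, 3)])
```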

\(|P_{f(4AVZ_1)}(2) \setminus P_{f(9DIV_1)}(2)|=205\), \(|P_{f(9DIV_1)}(2) \setminus P_{f(4AVZ_1)}(2)|=29\).
Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \), the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
010010111001011101100000001001011001100010110100010101101110100000101110010001110101111001011001111111001110100010101010101011101011011111001011001010000101011101110011100010011111000000101101001010111011101010000100001101100010100001010010011000010100011001100101000001001001010001101110110111110010010100011110101111100101010101001110101101100111011111110111000110010010011101001011101000111010101010011100100001100111000111001110000111100100011001101011011000101010000101100111110001000010100110001000111010001110101001101010000001111110000001110010010100010101110000010101111100011110010101110001
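
The hydrophobic-polar string can be approximated by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not state its classification, so the set below is an assumption inferred by aligning the first 60 residues of 4AVZ_1 with the first 60 bits above. Lysine (K) does not occur in that prefix, so treating it as polar is a further assumption. The names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Residues mapped to 1; inferred from the first 60 positions only (assumption).
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(sequence):
    """Map each residue to '1' if assumed hydrophobic, else '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


# First 60 residues of 4AVZ_1 as displayed on this page.
prefix = "DPDQFRAIIESPEGAGHVGYQYRRNTGSTMRMVSDVLDERVSLWDFHCDPSGNVIQPGPN"
print(hp_encode(prefix))
```

On this prefix the output agrees with the first 60 bits of the string above; agreement on the rest of the sequence has not been checked here.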
Pair \(Z_2\) Length of longest common subword
4AVZ_1,9DIV_1 234 4
4AVZ_1,5DMZ_1 165 4
9DIV_1,5DMZ_1 197 3
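
The quantity \(Z_k\) is the size of a symmetric difference of subword sets, so it can be sketched in a few lines. The function names are illustrative assumptions and the toy words are not from the page. For the first row above, the two reported set differences give \(205+29=234\), matching the tabulated \(Z_2\).

```python
def subwords(word, k):
    """Set of distinct contiguous subwords of length k in word."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}


def z(x, y, k):
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Toy example: P_x(2) = {AB, BA}, P_y(2) = {AB, BB, BA}, so only BB differs.
print(z("ABAB", "ABBA", 2))

# Arithmetic check using the two set-difference sizes reported on this page.
print(205 + 29)
```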

Newick tree

 
(9DIV_1:11.43,(4AVZ_1:82.5,5DMZ_1:82.5):32.93);

Let \(d\) be the Otu-Sayood distance of the same name.
Let \(d_1\) be the Otu-Sayood distance of the same name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{752}{\log_{20} 752}-\frac{152}{\log_{20}152}\right)\approx 169.\)
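
The arithmetic behind this estimate can be checked directly. In the sketch below, 752 is taken to be the combined length \(600+152\) of the two query sequences. The factors 0.85 and 0.8 are copied from the formula as given, since the page does not say how they were chosen. The helper name `capacity` is hypothetical.

```python
import math


def capacity(n, alphabet_size=20):
    """n / log_{alphabet_size}(n), the normalizer used for LZ-complexity."""
    return n / math.log(n, alphabet_size)


n1, n2 = 600, 152  # lengths of 4AVZ_1 and 9DIV_1
estimate = 0.85 * 0.8 * (capacity(n1 + n2) - capacity(n2))
print(round(estimate, 2))
```

The value lies between 169 and 170, consistent with the figure quoted above.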
Status Protein1 Protein2 d d1/2
Query variables 4AVZ_1 9DIV_1 221 134
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
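
As a worked instance of this interpolation, the sketch below evaluates \(\delta\) at \(\alpha=0\) and \(\alpha=1/2\). The page reports \(d=221\) and \(d_1/2=134\) for the query pair. Assuming, as the two cases displayed above suggest, that \(d\) is the larger of two terms and \(d_1\) is their sum, the smaller term would be \(2\cdot 134-221=47\). That value is inferred here and is not stated on the page. The function name `delta` is illustrative.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


larger = 221              # reported d
smaller = 2 * 134 - 221   # inferred from the reported d1/2 (assumption)

print(delta(0, smaller, larger))    # recovers d
print(delta(0.5, smaller, larger))  # recovers d1/2
```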