CoV2D Browser™

Parikh vectors
7BQX_1 2MSJ_1 4KAX_1 Letter Amino acid
8 0 9 H Histidine
22 0 4 F Phenylalanine
31 4 5 S Serine
15 2 13 D Aspartic acid
4 0 2 C Cysteine
40 3 13 G Glycine
6 1 9 K Lysine
5 5 4 M Methionine
17 5 13 T Threonine
11 3 8 N Asparagine
15 4 13 I Isoleucine
9 1 7 Y Tyrosine
34 10 10 V Valine
35 8 8 A Alanine
27 3 12 R Arginine
10 3 6 Q Glutamine
14 2 7 E Glutamic acid
33 6 17 L Leucine
22 6 5 P Proline
6 0 4 W Tryptophan
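
As an illustration of how such a table can be produced, the following minimal sketch counts the letters of one sequence. The names `ALPHABET` and `parikh_vector` are hypothetical and are not taken from the CoV2D code; the letter order is simply the row order of the table above.

```python
from collections import Counter

# Row order of the table above (one-letter amino acid codes).
ALPHABET = "HFSDCGKMTNIYVARQELPW"


def parikh_vector(seq, alphabet=ALPHABET):
    """Return the number of occurrences of each letter of `alphabet` in `seq`."""
    counts = Counter(seq)
    return [counts[a] for a in alphabet]


# Sequence of 2MSJ_1 as listed on this page.
seq_2msj = "ANQASVVANQLIPINTALTLVMMRSEVVTPVGIPAEDIPRLVSMQVSRAVPLGTTLMPDMVKGYAA"
vector_2msj = parikh_vector(seq_2msj)
print(dict(zip(ALPHABET, vector_2msj)))
```

The entries of a Parikh vector sum to the sequence length, which gives a quick consistency check on the 2MSJ_1 column (66 residues).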

>7BQX_1|Chains A[auth 5], Q[auth e]|Triplex capsid protein 1|Epstein-Barr virus (strain B95-8) (10377)
>2MSJ_1|Chain A|PROTEIN (ANTIFREEZE PROTEIN TYPE III)|Macrozoarces americanus (8199)
>4KAX_1|Chain A|ADP-ribosylation factor 6|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7BQX 149 364 0.80 40 195 338
MKVQGSVDRRRLQRRIAGLLPPPARRLNISRGSEFTRDVRGLVEEHAQASSLSAAAVWRAGLLAPGEVAVAGGGSGGGSFSWSGWRPPVFGDFLIHASSFNNAEATGTPLFQFKQSDPFSGVDAVFTPLSLFILMNHGRGVAARVEAGGGLTRMANLLYDSPATLADLVPDFGRLVADRRFHNFITPVGPLVENIKSTYLNKITTVVHGPVVSKAIPRSTVKVTVPQEAFVDLDAWLSGGAGGGGGVCFVGGLGLQPCPADARLYVALTYEEAGPRFTFFQSSRGHCQIMNILRIYYSPSIMHRYAVVQPLHIEELTFGAVACLGTFSATDGWRRSAFNYRGSSLPVVEIDSFYSNVSDWEVIL
2MSJ 34 66 0.72 32 57 63
ANQASVVANQLIPINTALTLVMMRSEVVTPVGIPAEDIPRLVSMQVSRAVPLGTTLMPDMVKGYAA
4KAX 79 169 0.80 40 122 156
HHHHHHGSMRILMLGLDAAGKTTILYKLKLGQSVTTIPTVGFNVETVTYKNVKFNVWDVGGLDKIRPLWRHYYTGTQGLIFVVDCADRDRIDEARQELHRIINDREMRDAIILIFANKQDLPDAMKPHEIQEKLGLTRIRDRNWYVQPSCATSGDGLYEGLTWLTSNYN

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
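
The subword counts can be sketched in a few lines. This is a direct transcription of the definition, under the assumption that subwords are contiguous blocks of a fixed length; the function names are hypothetical and this is not the code that produced the table.

```python
def subwords(w, k):
    """Set of distinct contiguous blocks of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Number of distinct subwords of length k in w."""
    return len(subwords(w, k))


# Sequence of 2MSJ_1 as listed on this page.
seq_2msj = "ANQASVVANQLIPINTALTLVMMRSEVVTPVGIPAEDIPRLVSMQVSRAVPLGTTLMPDMVKGYAA"
counts_2msj = [subword_complexity(seq_2msj, k) for k in (2, 3)]
print(counts_2msj)  # [57, 63]
```

For 2MSJ this agrees with the table entries for lengths 2 and 3. For length 1 the same function returns the number of distinct letters, which is 16 for this sequence, so the first count column of the table may follow a different convention.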

\(|P_{f(7BQX_1)}(2) \setminus P_{f(2MSJ_1)}(2)|=158\), \(|P_{f(2MSJ_1)}(2) \setminus P_{f(7BQX_1)}(2)|=20\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1010101000010001111111110010100100100010111000101001011111011111110111111101110101011011111011101001001010101110100001101101110110111110010111101011111001101100011011011101101110001001101111110010000100100110111100111000101011001110101110111111111011111110101101010111000011101011000010001101101000101100011101101001011111011010100110001100010011110100100010010111
Pair \(Z_2\) Length of longest common substring
7BQX_1,2MSJ_1 178 4
7BQX_1,4KAX_1 173 4
2MSJ_1,4KAX_1 141 2
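
A minimal sketch of the two columns of this table follows. It assumes that \(Z_k\) is the size of the symmetric difference of the two subword sets, and it reads the last column as the length of the longest contiguous block shared by both sequences; both function names are hypothetical.

```python
def subwords(w, k):
    """Set of distinct contiguous blocks of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_block(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length ending at y[j-1]
    for cx in x:
        cur = [0] * (len(y) + 1)
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(z_distance("abab", "abba", 2))         # 1, only "bb" is not shared
print(longest_common_block("abab", "abba"))  # 2
```

The first row can be checked against the counts given earlier: \(158+20=178\), and the number of shared 2-letter subwords is \(195-158=57-20=37\).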

Newick tree

 
(
	7BQX_1:92.80,
	(
		4KAX_1:70.5,2MSJ_1:70.5
	):22.30
);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
A rough estimate of the expected distance is \((0.85)(0.8)(\frac{430 }{\log_{20} 430}-\frac{66}{\log_{20}66})\approx 112.\)
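
The arithmetic of this estimate can be reproduced as follows. The constants 0.85 and 0.8 and the lengths 430 and 66 are taken as given from the formula above; the function name `expected_distance` is hypothetical.

```python
import math


def log20(x):
    """Logarithm to base 20 (the size of the amino acid alphabet)."""
    return math.log(x) / math.log(20)


def expected_distance(n_long, n_short, c1=0.85, c2=0.8):
    """c1 * c2 * (n_long / log20(n_long) - n_short / log20(n_short))."""
    return c1 * c2 * (n_long / log20(n_long) - n_short / log20(n_short))


estimate = expected_distance(430, 66)
print(round(estimate))  # 112
```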
Status Protein1 Protein2 d d1/2
Query variables 7BQX_1 2MSJ_1 135 79

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
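
A minimal sketch of how \(\delta\) could be computed is given below. It assumes that \(\mathrm{LZ}\) is the number of components in the Lempel-Ziv (1976) exhaustive parsing, and that min and max are taken over the two differences \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), which is how the distances \(d\) and \(d_1\) are recalled here from Otu and Sayood. The function names are hypothetical, and the page's own implementation may differ in detail.

```python
def lz76(w):
    """Number of components in the LZ76 exhaustive parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the component while it can be copied from an earlier
        # starting position (overlap with the component itself is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def delta(x, y, alpha=0.0):
    """alpha * min + (1 - alpha) * max of the two LZ76 increments."""
    a = lz76(x + y) - lz76(x)
    b = lz76(y + x) - lz76(y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


print(lz76("0001101001000101"))  # 6, parsed as 0|001|10|100|1000|101
print(delta("abab", "cdcd", 0.0), delta("abab", "cdcd", 0.5))
```

With `alpha=0.0` the function returns the larger increment, and with `alpha=0.5` it returns the average of the two increments, matching the two cases in the displayed formula.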