CoV2D Browser™

Parikh vectors
7VEJ_1 3TVB_1 3TIJ_1 Letter Amino acid
22 0 17 P Proline
28 0 27 S Serine
11 0 5 W Tryptophan
24 0 11 R Arginine
7 0 2 C Cysteine
12 0 18 M Methionine
35 0 14 K Lysine
11 0 8 Y Tyrosine
24 0 14 N Asparagine
32 4 56 G Glycine
48 0 56 L Leucine
19 0 14 T Threonine
39 0 56 A Alanine
19 0 9 D Aspartic acid
21 0 0 H Histidine
26 0 31 F Phenylalanine
29 0 31 V Valine
17 0 9 Q Glutamine
49 0 16 E Glutamic acid
32 0 30 I Isoleucine
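
The counts in the table above can be reproduced with a short sketch like the following. The name `parikh_vector` and the constant `ALPHABET` are illustrative, not taken from the page; the letter order simply follows the rows of the table.

```python
from collections import Counter

# Letter order copied from the rows of the Parikh table (an illustrative choice).
ALPHABET = "PSWRCMKYNGLTADHFVQEI"

def parikh_vector(w, alphabet=ALPHABET):
    """Return the number of occurrences of each letter of `alphabet` in `w`."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# The 3TVB_1 column: four G's and nothing else.
print(dict(zip(ALPHABET, parikh_vector("GGGG"))))
```

For `GGGG` every entry is 0 except the one for G, which is 4, matching the 3TVB_1 column.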

>7VEJ_1|Chains A, B|Glycosyltransferase|Phytolacca americana (3527)
>3TVB_1|Chains A, B|DNA (5'-D(*GP*GP*GP*G)-3')|null
>3TIJ_1|Chain A|NupC family protein|Vibrio cholerae (666)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7VEJ , Knot 202 505 0.83 40 260 469
MNHKVHHHHHHLQENLYFQGMGAEPQQLHVVFFPIMAHGHMIPTLDIARLFAARNVRATIITTPLNAHTFTKAIEMGKKNGSPTIHLELFKFPAQDVGLPEGCENLEQALGSSLIEKFFKGVGLLREQLEAYLEKTRPNCLVADMFFPWATDSAAKFNIPRLVFHGTSFFSLCALEVVRLYEPHKNVSSDEELFSLPLFPHDIKMMRLQLPEDVWKHEKAEGKTRLKLIKESELKSYGVIVNSFYELEPNYAEFFRKELGRRAWNIGPVSLCNRSTEDKAQRGKQTSIDEHECLKWLNSKKKNSVIYICFGSTAHQIAPQLYEIAMALEASGQEFIWVVRNNNNNDDDDDDSWLPRGFEQRVEGKGLIIRGWAPQVLILEHEAIGAFVTHCGWNSTLEGITAGVPMVTWPIFAEQFYNEKLVNQILKIGVPVGANKWSRETSIEDVIKKDAIEKALREIMVGDEAEERRSRAKKLKEMAWKAVEEGGSSYSDLSALIEELRGYHA
3TVB , Knot 2 4 0.23 2 1 1
GGGG
3TIJ , Knot 166 424 0.79 38 185 378
GPAVPRMSLFMSLIGMAVLLGIAVLLSSNRKAINLRTVGGAFAIQFSLGAFILYVPWGQELLRGFSDAVSNVINYGNDGTSFLFGGLVSGKMFEVFGGGGFIFAFRVLPTLIFFSALISVLYYLGVMQWVIRILGGGLQKALGTSRAESMSAAANIFVGQTEAPLVVRPFVPKMTQSELFAVMCGGLASIAGGVLAGYASMGVKIEYLVAASFMAAPGGLLFAKLMMPETEKPQDNEDITLDGGDDKPANVIDAAAGGASAGLQLALNVGAMLIAFIGLIALINGMLGGIGGWFGMPELKLEMLLGWLFAPLAFLIGVPWNEATVAGEFIGLKTVANEFVAYSQFAPYLTEAAPVVLSEKTKAIISFALCGFANLSSIAILLGGLGSLAPKRRGDIARMGVKAVIAGTLSNLMAATIAGFFLSF
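
As a rough illustration of the LZ-complexity and normalized columns, here is a minimal sketch that assumes \(\mathrm{LZ}(w)\) counts the components of the Lempel--Ziv (1976) exhaustive-history parsing. The page does not state which variant it uses, so this is an assumption, and the function names are hypothetical. Under that assumption the sketch agrees with the 3TVB row (\(\mathrm{LZ}=2\), ratio 0.23).

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parsing of w (assumed variant)."""
    n, i, count = len(w), 0, 0
    while i < n:
        l = 0
        # Extend while the candidate w[i:i+l+1] already occurs in the text
        # before its own last symbol (overlapping copies are allowed).
        while i + l < n and w[i:i + l + 1] in w[:i + l]:
            l += 1
        count += 1
        i += l + 1
    return count

def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_alphabet_size(n), as in the table header."""
    n = len(w)
    return lz76_complexity(w) / (n / math.log(n, alphabet_size))

print(lz76_complexity("GGGG"), round(normalized_lz("GGGG"), 2))
```

The naive substring search makes this quadratic or worse in the sequence length, which is acceptable for sequences of a few hundred residues.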

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
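
A minimal sketch of these definitions, reading a subword as a contiguous block of length \(n\); the function names are illustrative only.

```python
def subword_set(w, n):
    """P_w(n): the set of distinct length-n contiguous subwords of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subword_set(w, n))

print(subword_set("GGGG", 2), subword_complexity("ABAB", 2))
```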

\(|P_{f(7VEJ_1)}(2) \setminus P_{f(3TVB_1)}(2)|=259\), \(|P_{f(3TVB_1)}(2) \setminus P_{f(7VEJ_1)}(2)|=0\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1000100000010001010111101001011111111010111010110111100101011001101001001101100010101010110111001111010001001110011001101111100010101000010011101111110001101011011101001101011011010010001000001101111100101101011001100001010001011000010001111001001010010110001100110111101000000001001000010000010110000000110101100100111010011111010100111110000000000000111011000101011110111101111000111111000110001011011111101111100100001100110111111100100000100110001100110011110010000001001001110110011000001011100101001
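
The hydrophobic-polar string above can be approximated with a sketch like this one. The residue classification is an assumption inferred by comparing the 7VEJ sequence with its binary string (1 for A, F, G, I, L, M, P, V, W and 0 for everything else); the page itself does not state the rule, and `hp_encode` is a hypothetical name.

```python
# Assumed hydrophobic residues, inferred from the displayed string, not stated by the page.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First 24 residues of 7VEJ_1.
print(hp_encode("MNHKVHHHHHHLQENLYFQGMGAE"))
```

On the first 24 residues of 7VEJ_1 this yields `100010000001000101011110`, which matches the start of the string shown above.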
Pair \(Z_2\) Length of longest common substring
7VEJ_1,3TVB_1 259 2
7VEJ_1,3TIJ_1 159 4
3TVB_1,3TIJ_1 184 4
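
The two columns of this table can be sketched as follows. The first function follows the definition of \(Z_k\) given above. The second treats the last column as the longest common contiguous block, which is an assumption based on the value 2 for the pair involving `GGGG`. All names are illustrative.

```python
def subwords(w, k):
    """Set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_block(x, y):
    """Length of the longest contiguous block shared by x and y (row-by-row DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("GGGG", "AGGA", 2), longest_common_block("GGGG", "AGGA"))
```

In this toy example \(P(2)\) is `{GG}` for the first word and `{AG, GG, GA}` for the second, so \(Z_2 = 0 + 2 = 2\), and the longest shared block is `GG`.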

Newick tree

 
[
	3TVB_1:12.30,
	[
		7VEJ_1:79.5,3TIJ_1:79.5
	]:41.80
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{509 }{\log_{20} 509}-\frac{4}{\log_{20}4})\approx 160.\)
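
The arithmetic in this estimate can be checked directly. The factors 0.85 and 0.8 and the lengths 509 and 4 are copied from the formula as given; the page does not explain them, and the helper name is hypothetical.

```python
import math

def length_term(n, alphabet_size=20):
    """n / log_alphabet_size(n), the normalizing quantity used on this page."""
    return n / math.log(n, alphabet_size)

# Constants copied verbatim from the formula above.
estimate = 0.85 * 0.8 * (length_term(509) - length_term(4))
print(round(estimate))
```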
Status Protein1 Protein2 d d1/2
Query variables 7VEJ_1 3TVB_1 201 101
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
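
The relation between \(\delta\), \(d\) and \(d_1\) can be illustrated with the sketch below. It assumes the commonly cited Otu--Sayood definitions, \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\), with \(c\) taken to be an LZ76 component count. Both the choice of \(c\) and the function names are assumptions rather than something the page confirms.

```python
def lz76(w):
    """LZ76 exhaustive-history component count (assumed complexity measure c)."""
    n, i, count = len(w), 0, 0
    while i < n:
        l = 0
        while i + l < n and w[i:i + l + 1] in w[:i + l]:
            l += 1
        count += 1
        i += l + 1
    return count

def one_sided(s, q):
    """c(sq) - c(s): extra components needed to describe q after s."""
    return lz76(s + q) - lz76(s)

def delta(s, q, alpha):
    """alpha * min + (1 - alpha) * max of the two one-sided terms."""
    a, b = one_sided(s, q), one_sided(q, s)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def otu_sayood_d(s, q):
    return max(one_sided(s, q), one_sided(q, s))

def otu_sayood_d1(s, q):
    return one_sided(s, q) + one_sided(q, s)

s, q = "ACDEFGHIKLACDE", "GGGG"
print(delta(s, q, 0) == otu_sayood_d(s, q), 2 * delta(s, q, 0.5) == otu_sayood_d1(s, q))
```

With \(\alpha=0\) only the maximum survives, giving \(d\); with \(\alpha=1/2\) the two terms are averaged, giving \(d_1/2\).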