CoV2D BrowserTM

Parikh vectors
7VNT_1 2AIT_1 2JNC_1 Letter Amino acid
37 7 7 A Alanine
19 1 7 N Asparagine
36 4 8 L Leucine
19 0 4 F Phenylalanine
13 5 6 S Serine
5 1 2 W Tryptophan
37 8 2 V Valine
20 5 6 D Aspartic acid
7 3 5 Q Glutamine
12 0 2 M Methionine
17 8 12 T Threonine
12 2 4 H Histidine
38 2 4 I Isoleucine
33 1 4 K Lysine
24 3 7 P Proline
21 3 7 R Arginine
1 4 6 C Cysteine
45 4 8 E Glutamic acid
42 7 9 G Glycine
16 6 9 Y Tyrosine
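
The Parikh vector of a sequence records how often each letter occurs in it. As a minimal sketch in Python, the counts in the 2AIT_1 column can be reproduced from the 2AIT_1 sequence listed further down this page. The names `parikh_vector`, `AMINO_ACIDS` and `TENDAMISTAT` are illustrative choices and are not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard amino-acid letters (the ordering here is arbitrary).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Sequence of 2AIT_1 (tendamistat) as listed on this page.
TENDAMISTAT = (
    "DTTVSEPAPSCVTLYQSWRYSQADNGCAETVTVKVVYEDDTEGLCYAVAPGQITTVGDGYIGSHGHARYLARCL"
)


def parikh_vector(w):
    """Return a dict mapping each amino-acid letter to its count in w."""
    counts = Counter(w)
    # Counter returns 0 for letters that never occur, e.g. F in 2AIT_1.
    return {letter: counts[letter] for letter in AMINO_ACIDS}


vec = parikh_vector(TENDAMISTAT)
print(vec["C"], vec["F"], sum(vec.values()))
```

For 2AIT_1 this gives 4 cysteines, no phenylalanine, and a total of 74 residues, in agreement with the table.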

>7VNT_1|Chains A, B|454aa long hypothetical 4-aminobutyrate aminotransferase|Pyrococcus horikoshii (strain ATCC 700860 / DSM 12428 / JCM 9974 / NBRC 100139 / OT-3) (70601)
>2AIT_1|Chain A|TENDAMISTAT|Streptomyces tendae (1932)
>2JNC_1|Chain A|Corticotropin-releasing factor receptor 2|Mus musculus (10090)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7VNT , Knot 184 454 0.82 40 228 425
MELKPNVKEIPGPKARKVIEEHHKYMATTTNDPNEYFLVIERAEGVYWIDVDGNVLLDFSSGIGVMNVGLRNPKVIEAIKKQLDLVLHAAGTDYYNPYQVELAKKLVEIAPGDIERKVFLSNSGTEANEAALKIAKWSTNRKMFIAFIGAFHGRTHGTMSLTASKPVQRSRMFPTMPGVVHVPYPNPYRNPWGIDGYENPDELINRVIDYIEEYLFEHYVPAEEVAGIFFEPIQGEGGYVVPPKNFFKELKKLADKHGILLIDDEVQMGMGRTGRMWAIEHFDIVPDIVTVAKALGGGIPIGATIFRADLDFGVSGVHSNTFGGNTVAAAAALAVIEELQNGLIENAQKLEPLFRERLEEMKEKYEIIGDVRGLGLAWGVEFVKDRKTKEYATKERGEIVVEALKRGLALLGCGKSAIRLIPPLIISEEEAKMGLDIFEEAIKVVSERHGYKIH
2AIT , Knot 42 74 0.81 36 64 71
DTTVSEPAPSCVTLYQSWRYSQADNGCAETVTVKVVYEDDTEGLCYAVAPGQITTVGDGYIGSHGHARYLARCL
2JNC , Knot 60 119 0.80 40 101 115
GSGMKETAAAKFERQHMDSPDLGTTLLEQYCHRTTIGNFSGPYTYCNTTLDQIGTCWPQSAPGALVERPCPEYFNGIKYNTTRNAYRECLENGTWASRVNYSHCEPILDDFQRKYDLHY
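
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The page does not say which Lempel-Ziv variant it uses, so the sketch below assumes the exhaustive-history parsing of Lempel and Ziv (1976) and may not reproduce the tabulated LZ values exactly. The function names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math


def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive history of w (assumed variant)."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the candidate component while it can still be copied from
        # earlier text; the copy is allowed to overlap the component itself.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_alphabet_size(n))."""
    return lz / (n / math.log(n, alphabet_size))


# Classic binary example: parsed as 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))  # 6
print(normalized_lz(184, 454))
```

With the tabulated values \(\mathrm{LZ}(w)=184\) and \(n=454\) for 7VNT, the ratio comes out at roughly 0.83, close to the 0.82 shown in the table (which appears to truncate rather than round).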

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(7VNT_1)}(2) \setminus P_{f(2AIT_1)}(2)|=191\), \(|P_{f(2AIT_1)}(2) \setminus P_{f(7VNT_1)}(2)|=27\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1010101001111010011000000110000010001111001011011010101110100111110111001011011000101110111000001001011001101111010001110001001001110110100000111111111010001010101001100001110111110110101000111101000100110011001000110001110011111101101011011110011001001100011111000101111001011110010111011011011111111110110101011101100001110011111111110010011100100101110001001000001110101111111101100000000100001011101100111111010011011111110000101110110011011000010010
Pair \(Z_2\) Length of longest common subword
7VNT_1,2AIT_1 218 3
7VNT_1,2JNC_1 213 3
2AIT_1,2JNC_1 135 3
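
The quantity \(Z_k\) is the size of the symmetric difference of the two sets of length-\(k\) subwords. A small Python sketch of that definition follows; the helper names `subwords` and `z_distance` are illustrative and the toy inputs are not taken from the page.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# "AA" occurs only in the first word and "BB" only in the second.
print(z_distance("AAB", "ABB", 2))  # 2
```

Applied to the FASTA sequences above with \(k=2\), the two set differences correspond to the counts 191 and 27 quoted for the pair 7VNT_1, 2AIT_1, whose sum is the tabulated \(Z_2=218\).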

Newick tree

 
[
	7VNT_1:11.16,
	[
		2JNC_1:67.5,2AIT_1:67.5
	]:50.66
]

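Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=(\mathrm{LZ}(xy)-\mathrm{LZ}(x))+(\mathrm{LZ}(yx)-\mathrm{LZ}(y))\). (This makes the 4TYN sequence AAAAAA a close match...)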
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{528}{\log_{20} 528}-\frac{74}{\log_{20} 74}\right)\approx 136.\)
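
The arithmetic behind this estimate can be checked directly. In the sketch below, 528 is read as the combined length 454 + 74 of the two query sequences, which is an interpretation rather than something the page states; the factors 0.85 and 0.8 are taken from the formula as given. The helper name `lz_scale` is hypothetical.

```python
import math


def lz_scale(n, alphabet_size=20):
    """The scale n / log_alphabet_size(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)


n_concat = 454 + 74  # assumed reading of 528: the two lengths combined
n_short = 74         # length of 2AIT_1

estimate = 0.85 * 0.8 * (lz_scale(n_concat) - lz_scale(n_short))
print(estimate)
```

This evaluates to about 136.5, matching the quoted value of 136 up to rounding.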
Status Protein1 Protein2 d d1/2
Query variables 7VNT_1 2AIT_1 171 99.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
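
The interpolation can be illustrated with the tabulated values, assuming that min and max denote the smaller and larger of the two quantities being combined. The smaller quantity is not shown on the page; the sketch derives it from \(d=171\) and \(d_1/2=99.5\). The function name `delta` and the variable names are illustrative.

```python
def delta(alpha, smaller, larger):
    """delta = alpha * min + (1 - alpha) * max."""
    return alpha * smaller + (1 - alpha) * larger


d = 171.0        # tabulated d for 7VNT_1, 2AIT_1 (the alpha = 0 case)
half_d1 = 99.5   # tabulated d1/2 (the alpha = 1/2 case)

# Derived, not tabulated: if (min + max) / 2 = 99.5 and max = 171, then min = 28.
smaller = 2 * half_d1 - d

print(delta(0, smaller, d), delta(0.5, smaller, d))  # 171.0 99.5
```

Under this reading, \(\alpha=0\) recovers \(d\) and \(\alpha=1/2\) recovers \(d_1/2\), as in the case distinction above.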