CoV2D BrowserTM

Parikh vectors
7VAY_1 5GLX_1 7LVN_1 Letter Amino acid
6 0 3 H Histidine
47 6 2 L Leucine
23 26 2 S Serine
27 11 1 T Threonine
3 14 6 C Cysteine
53 3 2 E Glutamic acid
19 2 0 M Methionine
56 11 2 V Valine
55 26 4 G Glycine
24 5 2 K Lysine
17 15 0 Q Glutamine
35 14 2 P Proline
10 10 0 N Asparagine
26 13 3 D Aspartic acid
35 5 0 I Isoleucine
19 15 1 F Phenylalanine
8 6 4 W Tryptophan
21 6 0 Y Tyrosine
56 23 0 A Alanine
38 6 2 R Arginine
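
A Parikh vector just counts how often each letter occurs in a sequence. The following is a minimal Python sketch, assuming the letter order of the rows above and ignoring non-standard letters such as X. The name `parikh_vector` is illustrative and not part of the site.

```python
from collections import Counter

# Letter order taken from the rows of the table above.
ALPHABET = "HLSTCEMVGKQPNDIFWYAR"

def parikh_vector(seq, alphabet=ALPHABET):
    """Return the count of each alphabet letter in seq, in alphabet order."""
    counts = Counter(seq)
    return [counts[a] for a in alphabet]

# 7LVN_1 sequence as listed on this page (the trailing X is not counted).
seq_7lvn = "GDCHKFLGWCRGEPDPCCEHLSCSRKHGWCVWDWTVX"
print(parikh_vector(seq_7lvn))
```

For 7LVN_1 this reproduces the third column of the table.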

>7VAY_1|Chains A, B, C|V-type ATP synthase alpha chain|Thermus thermophilus HB8 (300852)
>5GLX_1|Chain A|Glycoside hydrolase family 45 protein|Thielavia terrestris NRRL 8126 (578455)
>7LVN_1|Chain A|Omega-Avsp1a|Avicularia sp. AVIC29FPM (2042175)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7VAY , Knot 224 578 0.82 40 243 526
MIQGVIQKIAGPAVIAKGMLGARMYDICKVGEEGLVGEIIRLDGDTAFVQVYEDTSGLKVGEPVVSTGLPLAVELGPGMLNGIYDGIQRPLERIREKTGIYITRGVVVHALDREKKWAWTPMVKPGDEVRGGMVLGTVPEFGFTHKILVPPDVRGRVKEVKPAGEYTVEEPVVVLEDGTELKMYHTWPVRRARPVQRKLDPNTPFLTGMRILDVLFPVAMGGTAAIPGPFGAGKSVTQQSLAKWSNADVVVYVGCGERGNEMTDVLVEFPELTDPKTGGPLMHRTVLIANTSNMPVAAREASIYVGVTIAEYFRDQGFSVALMADSTSRWAEALREISSRLEEMPAEEGYPPYLAARLAAFYERAGKVITLGGEEGAVTIVGAVSPPGGDMSEPVTQSTLRIVGAFWRLDASLAFRRHFPAINWNGSYSLFTSALDPWYRENVAEDYPELRDAISELLQREAGLQEIVQLVGPDALQDAERLVIEVGRIIREDFLQQNAYHEVDAYCSMKKAYGIMKMILAFYKEAEAAIKRGVSIDEILQLPVLERIGRARYVSEEEFPAYFEEAMKEIQGAFKALA
5GLX , Knot 98 217 0.81 38 142 208
AEFASGSGQSTRYWDCCKPSCAWPGKAAVSQPVYACDANFQRLSDFNVQSGCNGGSAYSCADQTPWAVNDNLAYGFAATSIAGGSESSWCCACYALTFTSGPVAGKTMVVQSTSTGGDLGSNQFDIAMPGGGVGIFNGCSSQFGGLPGAQYGGISSRDQCDSFPAPLKPGCQWRFDWFQNADNPTFTFQQVQCPAEIVARSGCKRNDDSSFPVFTPS
7LVN , Knot 22 37 0.71 30 34 34
GDCHKFLGWCRGEPDPCCEHLSCSRKHGWCVWDWTVX
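
The subword counts for \(k=2,3\) and the normalized LZ column can be rechecked with a few lines of Python. This is a sketch under stated assumptions: the helper names are hypothetical, the LZ values are taken from the table rather than recomputed, and the table entries appear to be truncated to two decimals.

```python
import math

def subword_count(w, k):
    """Number of distinct length-k subwords (contiguous intervals) of w."""
    return len({w[i:i + k] for i in range(len(w) - k + 1)})

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

seq_7lvn = "GDCHKFLGWCRGEPDPCCEHLSCSRKHGWCVWDWTVX"
print(subword_count(seq_7lvn, 2), subword_count(seq_7lvn, 3))
print(normalized_lz(224, 578), normalized_lz(98, 217), normalized_lz(22, 37))
```

For 7LVN the two subword counts both come out as 34, in agreement with the table.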

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(7VAY_1)}(2) \setminus P_{f(5GLX_1)}(2)|=155\), \(|P_{f(5GLX_1)}(2) \setminus P_{f(7VAY_1)}(2)|=54\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11011100111111110111110100100110011110110101001110100000110110111001111110111111011001100110010000110100111101100000111011101100101111110110111000111110101010010111000100111110010010100011100101100010100111011011011111111101111111111001000011010010111011010010010011101101001001111100011110000111110010101110110010001101111100000110110010001001110010110111011110001101101110011101111101111010011000010111111010101110001111010100011001101100001100010100110011000111001101111011001001110110110001100010001010001001011101111100010111001101001101111001101001000011101001100101110111
Pair \(Z_2\) Length of longest common subword
7VAY_1,5GLX_1 209 4
7VAY_1,7LVN_1 249 3
5GLX_1,7LVN_1 154 2
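
The quantity \(Z_k\) is the size of the symmetric difference of the two sets of length-\(k\) subwords. A minimal Python sketch follows, with hypothetical function names. The last function computes the length of the longest common contiguous subword. Reading the table's final column that way is an assumption, suggested by the small values.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common block ending just before both positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy example: P("abab", 2) = {ab, ba}, P("abba", 2) = {ab, bb, ba}.
print(z("abab", "abba", 2), longest_common_subword_length("abab", "abba"))
```

With the values reported above for 7VAY_1 and 5GLX_1, \(Z_2 = 155 + 54 = 209\), which is the first entry in the table.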

Newick tree

 
[
	7VAY_1:12.04,
	[
		5GLX_1:77,7LVN_1:77
	]:48.04
]
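
The nested brackets above can be serialized into a standard parenthesized Newick string. The sketch below assumes a hypothetical representation in which each node is a `(label_or_children, branch_length)` pair. It is not the site's own code.

```python
def to_newick(node):
    """Serialize a (label_or_children, branch_length) node to Newick text."""
    label, length = node
    if isinstance(label, list):
        # Internal node: recurse into the children.
        text = "(" + ",".join(to_newick(child) for child in label) + ")"
    else:
        text = label
    return text if length is None else f"{text}:{length:g}"

# Tree and branch lengths copied from the listing above.
tree = ([("7VAY_1", 12.04),
         ([("5GLX_1", 77), ("7LVN_1", 77)], 48.04)], None)
print(to_newick(tree) + ";")
```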

Let \(d\) and \(d_1\) be the Otu--Sayood distances \(d\) and \(d_1\), respectively. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{795 }{\log_{20} 795}-\frac{217}{\log_{20}217})\approx 160.\)
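
The arithmetic in this estimate can be checked directly. Here 795 = 578 + 217 is the length of the concatenation of the two sequences. The function name and the parameter names for the two constants are hypothetical, since the page does not name them.

```python
import math

def expected_distance(n_concat, n_single, c1=0.85, c2=0.8, base=20):
    """c1 * c2 * (N / log_b N - n / log_b n), as in the estimate above."""
    def term(n):
        return n / math.log(n, base)
    return c1 * c2 * (term(n_concat) - term(n_single))

print(round(expected_distance(795, 217)))
```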
Status Protein1 Protein2 d d1/2
Query variables 7VAY_1 5GLX_1 204 139.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
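
The interpolation can be illustrated numerically. If \(d\) is the larger and \(d_1\) the sum of two one-directional terms, as the formula implies, then the reported values \(d=204\) and \(d_1/2=139.5\) correspond to a maximum of 204 and a minimum of \(279-204=75\). The minimum is derived under that assumption and is not shown on the page.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# 204 is the reported d; 75 is inferred from d1 = 2 * 139.5 = 279.
print(delta(0, 75, 204), delta(0.5, 75, 204))
```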