CoV2D Browser™

Parikh vectors
7DUP_1 3BVZ_1 1KFO_1 Letter Amino acid
28 21 0 D Aspartic acid
32 5 0 Q Glutamine
41 13 0 E Glutamic acid
28 6 0 P Proline
24 14 0 T Threonine
21 16 0 S Serine
25 18 0 Y Tyrosine
3 2 7 C Cysteine
30 12 5 G Glycine
36 9 0 I Isoleucine
39 15 0 L Leucine
14 8 0 M Methionine
21 23 0 N Asparagine
47 30 0 K Lysine
21 10 0 F Phenylalanine
26 16 0 V Valine
45 6 4 A Alanine
23 3 0 R Arginine
17 7 0 H Histidine
13 3 0 W Tryptophan
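A Parikh vector is a letter-count vector. The sketch below is a minimal Python illustration, not the site's own code. The function name `parikh_vector` and the constant `ROW_ORDER` (the row order of the table above) are illustrative. Letters outside the chosen alphabet are ignored, which is why the three uracils (U) of the RNA chain 1KFO_1 have no row in the table.

```python
from collections import Counter

# Row order of the Parikh-vector table (one-letter amino acid codes).
ROW_ORDER = "DQEPTSYCGILMNKFVARHW"


def parikh_vector(seq: str, alphabet: str = ROW_ORDER) -> list[int]:
    """Count how often each letter of `alphabet` occurs in `seq`.

    Letters not in `alphabet` (e.g. U in an RNA chain) are ignored.
    """
    counts = Counter(seq)
    return [counts[letter] for letter in alphabet]


rna = "GAAUGCCUGCGAGCAUCCC"  # f(1KFO_1)
# Show only the nonzero entries, in table row order.
print({a: v for a, v in zip(ROW_ORDER, parikh_vector(rna)) if v})  # {'C': 7, 'G': 5, 'A': 4}
```

These are the nonzero entries of the 1KFO_1 column.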

>7DUP_1|Chain A|Beta-N-acetylhexosaminidase|Bacteroides thetaiotaomicron (818)
>3BVZ_1|Chain A|Enterotoxin type C-3|Staphylococcus aureus (1280)
>1KFO_1|Chain A|5'-R(*GP*AP*AP*UP*GP*CP*CP*UP*GP*CP*GP*AP*GP*CP*AP*(5BU)P*CP*CP*C)-3'|
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7DUP , Knot 216 534 0.84 40 257 502
MQEIALTPQPAHLTVKDGRFEFGNQLKAKVTPYQGDSIRMVFESFKKELQEATGIKVSSTQKEAKARIILDLNPQLPAEAYKLNVSKKQVRIEASRPAGFYYALQTLKQLMPRNVMAGVATSDHSQWSLPSVEIEDAPRFEWRGFMLDEGRHFFGKDEIKRVIDMMAIYKMNRFHWHLTEDQGWRIEIKKYPKLTETGAWRNSKVLAYGDVKPDGERYGGFYTQKDIKEIVAYAKKKFIEIIPEIDIPGHSQAAVAAYPEFLACDPRDKHEVWLQQGISTDVINVANPKAMQFAKEVIDELTELFPFNYIHLGGDECPTRKWQKNDECKKLLSEIGSSNFRDLQIYFYKQLKDYIATKPADQQRQLIFWNEVLHGNTSILGNDITIMAWIGANAAAKQAAKQGMNTILSPQIPYYINRKQSKLPTEPMSQGHGTETVEAVYNYQPLKDVDAALQPYYKGVQANFWTEWVTEPSVLEYLMLPRLAAVAEAGWTPQEKRNYEDFKERIRKDAELYDLKGWNYGKHIMKLEHHHHHH
3BVZ , Knot 107 237 0.82 40 160 227
ESQPDPMPDDLHKSSEFTGTMGNMKYLYDDHYVSATKVKSVDKFLAHDLIYNISDKKLKNYDKVKTELLNEDLAKKYKDEVVDVYGSNYYVNCYFSSKDNKWWHGKTCMYGGITKHEGNHFDNGNLQNVLVRVYENKRNTISFEVQTDKKSVTAQELDIKARNFLINKKNLYEFNSSPYETGYIKFIENNGNTFWYDMMPAPGDKFDQSKYLMMYNDNKTVDSKSVKIEVHLTTKNG
1KFO , Knot 10 19 0.51 8 11 16
GAAUGCCUGCGAGCAUCCC
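The LZ column can be reproduced with a Lempel–Ziv (1976) phrase count. This sketch assumes the Kaspar–Schuster formulation: each new phrase is the shortest continuation that cannot be copied from earlier text (the copy may overlap the phrase), and a final incomplete phrase is also counted. Under that convention the RNA sequence of 1KFO_1 gets LZ = 10, as in the table. The normalized column divides by \(n/\log_{20} n\). The function names are illustrative.

```python
import math


def lz76_complexity(s: str) -> int:
    """Number of phrases in the LZ76 (exhaustive history) parsing of s."""
    n = len(s)
    i = 0          # start of the current phrase
    phrases = 0
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text;
        # the copy source s[:i + length - 1] may overlap the phrase itself.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1   # a trailing incomplete phrase is counted as well
        i += length
    return phrases


def normalized_lz(s: str, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_k n) with k = 20, as in the table header (needs n >= 2)."""
    n = len(s)
    return lz76_complexity(s) / (n / math.log(n, alphabet_size))


w = "GAAUGCCUGCGAGCAUCCC"  # f(1KFO_1)
print(lz76_complexity(w))          # 10
print(round(normalized_lz(w), 3))  # 0.517
```

Implementations differ on whether the last, incomplete phrase is counted, so other tools may report a value one lower.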

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence of the entry with four-character Protein Data Bank code \(c\).
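One direct way to build \(P_w(n)\) is to slide a window of width \(n\) over \(w\) and collect the windows in a set. The helper names `subwords` and `p_count` below are illustrative. On the 1KFO_1 sequence this gives \(p_w(2)=11\) and \(p_w(3)=16\), the values in the table above.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n contiguous subwords of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def p_count(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


w = "GAAUGCCUGCGAGCAUCCC"  # f(1KFO_1)
print(p_count(w, 2), p_count(w, 3))  # 11 16
```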

\(|P_{f(7DUP_1)}(2) \setminus P_{f(3BVZ_1)}(2)|=137\) and \(|P_{f(3BVZ_1)}(2) \setminus P_{f(7DUP_1)}(2)|=40\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(7DUP_1),f(3BVZ_1))=137+40=177\).

Hydrophobic-polar version of Sequence 1 (7DUP_1), with 1 = hydrophobic and 0 = polar:
100111010110101001010110010101010010010111001000100101101000000101011101010111010010100001010100111100110010011100111111000000101101010011010101111001001110001001101111001001010100001101010001010001110000111010101010001110000010011101000110111010111000111110101110010000011100110001101101011011001100100111100101110001000100000000110011000100101010001000110011000001111001101000111001011111110111001100110011010110010000001100110010100010110000110010111010001101011001100101100111101111101110100000000100010001010010110010011010000000
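The hydrophobic-polar string can be reproduced with a fixed residue classification that writes 1 for a hydrophobic residue and 0 for a polar one. The hydrophobic set used here, {A, C, F, G, I, L, M, P, V, W}, is an assumption. It agrees with the 7DUP_1 string on spot-checked residues, including the W at position 127. Treating C as hydrophobic is a guess based on a common HP-model convention, not something stated on this page.

```python
# Assumed hydrophobic residues (encoded as 1); all others are polar (0).
HYDROPHOBIC = frozenset("ACFGILMPVW")


def hp_encode(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)


prefix = "MQEIALTPQPAHLTVKDGRFEFGNQLKAKVTPYQGDSIRMVFESFKKELQ"  # first 50 residues of 7DUP_1
print(hp_encode(prefix))  # 10011101011010100101011001010101001001011100100010
```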
Pair \(Z_2\) Length of longest common substring
7DUP_1,3BVZ_1 177 4
7DUP_1,1KFO_1 262 2
3BVZ_1,1KFO_1 171 1
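Both columns can be computed from the sequences directly. In this sketch, `z_k` takes \(Z_k\) as the size of the symmetric difference of the two length-\(k\) subword sets. `longest_common_substring` is the standard dynamic-programming method for the longest shared contiguous subword. Reading the last column as a contiguous match is an assumption, but the value 1 for 3BVZ_1 and 1KFO_1 supports it: both contain a G followed later by an A, so a non-contiguous common subsequence would have length at least 2. The function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) - P_y(k)| + |P_y(k) - P_x(k)|, i.e. the symmetric-difference size."""
    return len(subwords(x, k) ^ subwords(y, k))


def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y (O(|x||y|) DP)."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: longest common suffix of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(z_k("ABAB", "ABBA", 2))                       # 1 (only "BB" is unshared)
print(longest_common_substring("ABCDEF", "ZBCDQ"))  # 3 ("BCD")
```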

Newick tree

 
(7DUP_1:11.27,(3BVZ_1:85.5,1KFO_1:85.5):33.77);

Let \(d\) and \(d_1\) be the two Otu–Sayood distances; they are related by the formula for \(\delta\) at the end of this section.
(Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
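A rough estimate of the expected distance is \(\left\lfloor (0.85)(0.8)\left(\frac{771}{\log_{20} 771}-\frac{237}{\log_{20}237}\right)\right\rfloor=147\), where \(771=534+237\) is the combined length of the two sequences.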
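The estimate can be re-evaluated numerically. This sketch reads 771 as 534 + 237 (the lengths of 7DUP_1 and 3BVZ_1) and takes the factors 0.85 and 0.8 as given. `n_over_log` is an illustrative helper name. The unrounded value is just under 148.

```python
import math


def n_over_log(n: int, base: int = 20) -> float:
    """n / log_base(n), the normalizer used in the LZ table."""
    return n / math.log(n, base)


len_7dup, len_3bvz = 534, 237
estimate = 0.85 * 0.8 * (n_over_log(len_7dup + len_3bvz) - n_over_log(len_3bvz))
print(math.floor(estimate))  # 147
```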
Status Protein1 Protein2 d d1/2
Query variables 7DUP_1 3BVZ_1 187 133.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
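A sketch of \(\delta\) under one assumption: the min and max are taken over the two one-sided LZ increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), in the spirit of Otu and Sayood's LZ-based distances. With that reading, \(\alpha=0\) gives the larger increment (\(d\)) and \(\alpha=1/2\) gives half their sum (\(d_1/2\)). The LZ76 phrase count uses the Kaspar–Schuster convention, which is also an assumption. All function names are illustrative.

```python
def lz76_complexity(s: str) -> int:
    """Phrase count of the LZ76 (Kaspar-Schuster) parsing; a final partial phrase counts."""
    n, i, phrases = len(s), 0, 0
    while i < n:
        length = 1
        # Grow the phrase while it still occurs in the text before its last symbol.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def one_sided_increments(x: str, y: str) -> tuple[int, int]:
    """Assumed quantities behind d and d_1: LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    c = lz76_complexity
    return c(x + y) - c(x), c(y + x) - c(y)


def delta(x: str, y: str, alpha: float) -> float:
    """delta = alpha * min + (1 - alpha) * max over the two increments."""
    a, b = one_sided_increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


x, y = "AAAA", "AB"
print(one_sided_increments(x, y))        # (0, 2)
print(delta(x, y, 0), delta(x, y, 0.5))  # 2 1.0
```

Intermediate values of \(\alpha\) interpolate between the two distances.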