CoV2D BrowserTM

Parikh vectors
3VBN_1 4DQI_1 4URX_1 Letter Amino acid
18 25 9 S Serine
1 2 3 C Cysteine
3 27 12 Q Glutamine
10 61 14 E Glutamic acid
17 40 8 K Lysine
7 16 5 M Methionine
1 5 0 W Tryptophan
19 45 15 V Valine
5 38 11 R Arginine
19 25 15 G Glycine
23 31 11 I Isoleucine
11 66 12 L Leucine
8 21 6 F Phenylalanine
13 60 11 A Alanine
13 19 5 N Asparagine
6 31 14 D Aspartic acid
5 25 3 P Proline
11 19 10 Y Tyrosine
10 12 10 H Histidine
5 24 11 T Threonine
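
As a minimal sketch of how a table like the one above can be produced: the Parikh vector of a sequence simply counts how often each letter occurs. The function name `parikh_vector` and the alphabet ordering are assumptions made for illustration, not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard amino-acid letters (ordering chosen here for illustration).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}


# Small demo on the first ten residues of the 4URX_1 sequence.
demo = parikh_vector("MHHHHHHGGG")
print(demo["M"], demo["H"], demo["G"], demo["W"])
```

Letters that never occur, such as W in 4URX_1, get a count of 0 rather than being omitted.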

>3VBN_1|Chains A, B[auth C], C[auth E]|Galactoside O-acetyltransferase|Bacillus cereus (699184)
>4DQI_1|Chains A, B[auth D]|DNA polymerase I|Geobacillus kaustophilus (235909)
>4URX_1|Chain A[auth R]|GTPASE HRAS|HOMO SAPIENS (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3VBN , Knot 96 205 0.83 40 137 195
MGSHHHHHHENLYFQGHMNSFYSQEELKKIGFLSVGKNVLISKKASIYNPGVISIGNNVRIDDFCILSGKVTIGSYSHIAAYTALYGGEVGIEMYDFANISSRTIVYAAIADFSGNALMGPTIPNQYKNVKTGKVILKKHVIIGAHSIIFPNVVIGEGVAVGAMSMVKESLDDWYIYVGVPVRKIKARKRKIVELENEFLKSMNS
4DQI , Knot 226 592 0.81 40 252 548
MESPSSEEEKPLAKMAFTLADRVTEEMLADKAALVVEVVEENYHDAPIVGIAVVNEHGRFFLRPETALADPQFVAWLGDETKKKSMFDSKRAAVALKWKGIELCGVSFDLLLAAYLLDPAQGVDDVAAAAKMKQYEAVRPDEAVYGKGAKRAVPDEPVLAEHLVRKAAAIWELERPFLDELRRNEQDRLLVELEQPLSSILAEMEFAGVKVDTKRLEQMGKELAEQLGTVEQRIYELAGQEFNINSPKQLGVILFEKLQLPVLKKTKTGYSTSADVLEKLAPYHEIVENILHYRQLGKLQSTYIEGLLKVVRPATKKVHTIFNQALTQTGRLSSTEPNLQNIPIRLEEGRKIRQAFVPSESDWLIFAADYSQIELRVLAHIAEDDNLMEAFRRDLDIHTKTAMDIFQVSEDEVTPNMRRQAKAVNFGIVYGISDYGLAQNLNISRKEAAEFIERYFESFPGVKRYMENIVQEAKQKGYVTTLLHRRRYLPDITSRNFNVRSFAERMAMNTPIQGSAADIIKKAMIDLNARLKEERLQAHLLLQVHDELILEAPKEEMERLCRLVPEVMEQAVTLRVPLKVDYHYGSTWYDAK
4URX , Knot 85 185 0.80 38 141 179
MHHHHHHGGGENLYFQGSHMTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQH
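
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns alone. The sketch below does only that arithmetic; it does not compute LZ-complexity itself, and the function name `normalized_lz` is an assumption made for illustration.

```python
import math


def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_{alphabet_size} n) for a word of length n."""
    return lz / (n / math.log(n, alphabet_size))


# (LZ-complexity, length) pairs copied from the table above.
rows = {"3VBN": (96, 205), "4DQI": (226, 592), "4URX": (85, 185)}

for code, (lz, n) in rows.items():
    print(code, round(normalized_lz(lz, n), 2))
```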

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
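
A minimal sketch of these definitions, reading a subword as a contiguous block of a fixed length \(k\). The names `subwords` and `subword_complexity` are hypothetical and chosen for illustration.

```python
def subwords(w, k):
    """Return P_w(k): the set of distinct contiguous blocks of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Return p_w(k), the number of distinct length-k subwords of w."""
    return len(subwords(w, k))


# First ten residues of the 4URX_1 sequence: the length-2 subwords are
# MH, HH, HG and GG.
print(sorted(subwords("MHHHHHHGGG", 2)))
print(subword_complexity("MHHHHHHGGG", 2))
```

If \(k\) exceeds the length of \(w\), the range is empty and the function returns the empty set.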

\(|P_{f(3VBN_1)}(2) \setminus P_{f(4DQI_1)}(2)|=37\), \(|P_{f(4DQI_1)}(2) \setminus P_{f(3VBN_1)}(2)|=152\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) numerator of the Jaccard distance for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1100000000010101010010000010011110110011100010100111101100101001011010101100001110011011011101001101000011011110101011111011000001001011100011111001111011110111111101100010010101111100101000011010001100100

Pair \(Z_2\) Length of longest common subword (contiguous substring)
3VBN_1,4DQI_1 189 4
3VBN_1,4URX_1 144 7
4DQI_1,4URX_1 183 4
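
The two columns of this table can be sketched as follows. \(Z_k\) is the size of the symmetric difference of the two subword sets, so for the first pair \(Z_2 = 37 + 152 = 189\). The last column is read here as the length of the longest common contiguous block, which is an assumption based on the values: 3VBN_1 and 4URX_1 share the block ENLYFQG of length 7. All function names are hypothetical.

```python
def subwords(w, k):
    """Return the set of distinct contiguous blocks of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_subword_length(x, y):
    """Length of the longest contiguous block occurring in both x and y.

    Dynamic programming: cur[j] is the length of the longest common suffix
    of the prefix of x read so far and y[:j].
    """
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Prefixes of the 3VBN_1 and 4URX_1 sequences (tag regions only).
x = "MGSHHHHHHENLYFQGHM"
y = "MHHHHHHGGGENLYFQGSHM"
print(z_k(x, y, 2), longest_common_subword_length(x, y))
```

The dynamic program takes time proportional to the product of the two lengths, which is ample for sequences of a few hundred residues.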

Newick tree

 
[
	4DQI_1:99.03,
	[
		3VBN_1:72,4URX_1:72
	]:27.03
]

Let d and d1 be the two Otu--Sayood distances of those names.
(This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{797 }{\log_{20} 797}-\frac{205}{\log_{20}205})\approx 164\), where \(797=205+592\) is the combined length of the two sequences.
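
The arithmetic in this estimate can be checked directly. The sketch below only evaluates the displayed expression; the helper name `scaled_length` is an assumption made for illustration, and the factors 0.85 and 0.8 are taken from the formula as given.

```python
import math


def scaled_length(n, alphabet_size=20):
    """Return n / log_{alphabet_size} n."""
    return n / math.log(n, alphabet_size)


# 797 = 205 + 592 is the combined length of 3VBN_1 and 4DQI_1.
expected = 0.85 * 0.8 * (scaled_length(205 + 592) - scaled_length(205))
print(round(expected, 1))
```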
Status Protein1 Protein2 d d1/2
Query variables 3VBN_1 4DQI_1 206 137.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
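
A small numerical sketch of this interpolation, under the assumption that min and max range over two directed quantities whose maximum is d and whose sum is d1. For the pair 3VBN_1, 4DQI_1 the table gives d = 206 and d1/2 = 137.5, so the smaller quantity would be 2(137.5) - 206 = 69; that value is derived here and does not appear on the page. The function name `delta` is hypothetical.

```python
def delta(alpha, a, b):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# 206 is d from the table; 69 is derived from d1/2 = 137.5 (an assumption).
larger, smaller = 206, 69

print(delta(0, larger, smaller))    # alpha = 0 recovers the maximum, d
print(delta(0.5, larger, smaller))  # alpha = 1/2 gives the average, d1/2
```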