CoV2D BrowserTM

Parikh vectors
2BNI_1 3MYB_1 1REG_1 Letter Amino acid
1 6 5 N Asparagine
1 6 1 C Cysteine
1 8 5 Q Glutamine
2 9 3 H Histidine
5 10 14 K Lysine
0 10 5 F Phenylalanine
0 4 4 Y Tyrosine
3 18 7 R Arginine
1 13 6 M Methionine
0 23 8 V Valine
5 20 11 E Glutamic acid
2 20 6 G Glycine
5 26 12 L Leucine
0 1 3 W Tryptophan
1 47 2 A Alanine
1 14 7 D Aspartic acid
4 8 10 I Isoleucine
0 13 3 P Proline
1 15 3 S Serine
0 15 7 T Threonine
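
A Parikh vector counts how often each letter occurs in a word, ignoring order. As a minimal sketch (the helper name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative, not part of the CoV2D site), the 2BNI_1 column above can be recomputed from its FASTA sequence. The non-standard symbol X is not among the 20 tabulated letters and is skipped here by assumption.

```python
from collections import Counter

# Row order of the table above (one-letter amino acid codes).
AMINO_ACIDS = "NCQHKFYRMVEGLWADIPST"

def parikh_vector(word, alphabet=AMINO_ACIDS):
    """Return {letter: number of occurrences in word} for each letter of alphabet."""
    counts = Counter(word)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# FASTA sequence listed for 2BNI on this page.
seq_2bni = "XRMKQIEDKLEEILSKGHHICNELARIKKLLGER"
vec = parikh_vector(seq_2bni)
column = [vec[a] for a in AMINO_ACIDS]
print(column)
```

The counts sum to 33 rather than 34 because the leading X is not counted.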

>2BNI_1|Chains A, B, C, D|GENERAL CONTROL PROTEIN GCN4|SACCHAROMYCES CEREVISIAE (4932)
>3MYB_1|Chains A, B, C|Enoyl-CoA hydratase|Mycobacterium smegmatis (246196)
>1REG_1|Chains A[auth X], B[auth Y]|T4 REGA|Enterobacteria phage T4 (10665)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2BNI , Knot 24 34 0.83 30 32 32
XRMKQIEDKLEEILSKGHHICNELARIKKLLGER
3MYB , Knot 122 286 0.80 40 170 270
MAHHHHHHMGTLEAQTQGPGSMSEPLLLQDRDERGVVTLTLNRPQAFNALSEAMLAALGEAFGTLAEDESVRAVVLAASGKAFCAGHDLKEMRAEPSREYYEKLFARCTDVMLAIQRLPAPVIARVHGIATAAGCQLVAMCDLAVATRDARFAVSGINVGLFCSTPGVALSRNVGRKAAFEMLVTGEFVSADDAKGLGLVNRVVAPKALDDEIEAMVSKIVAKPRAAVAMGKALFYRQIETDIESAYADAGTTMACNMMDPSALEGVSAFLEKRRPEWHTPQPSTA
1REG , Knot 63 122 0.82 40 100 119
MIEITLKKPEDFLKVKETLTRMGIANNKDKVLYQSCHILQKKGLYYIVHFKEMLRMDGRQVEMTEEDEVRRDSIAWLLEDWGLIEIVPGQRTFMKDLTNNFRVISFKQKHEWKLVPKYTIGN
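
The normalized column divides the LZ-complexity by \(n/\log_{20} n\), where 20 is the alphabet size. The sketch below only redoes that arithmetic from the tabulated \(\mathrm{LZ}(w)\) and \(n\); it does not recompute \(\mathrm{LZ}(w)\) itself, since the exact parsing variant used by the site is not stated here. The function name `normalized_lz` is hypothetical.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ-complexity divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ(w), n) as listed in the table above.
rows = [("2BNI", 24, 34), ("3MYB", 122, 286), ("1REG", 63, 122)]
for code, lz, n in rows:
    print(code, normalized_lz(lz, n))
```

The results agree with the tabulated values 0.83, 0.80 and 0.82 to within rounding or truncation of the last digit.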

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
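
As an illustration of these definitions, here is a minimal sketch that collects the length-\(k\) subwords of a word into a set and counts them. The names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(word, k):
    """P_w(k): the set of distinct contiguous subwords of length k in word."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def subword_complexity(word, k):
    """p_w(k): the number of distinct subwords of length k."""
    return len(subwords(word, k))

# FASTA sequence listed for 2BNI on this page.
seq_2bni = "XRMKQIEDKLEEILSKGHHICNELARIKKLLGER"
print(subword_complexity(seq_2bni, 2), subword_complexity(seq_2bni, 3))
```

For 2BNI the only repeated 2-letter subword is KL, so 33 positions give 32 distinct subwords, in agreement with the \(p_w(2)\) entry in the table.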

\(|P_{f(2BNI_1)}(2) \setminus P_{f(3MYB_1)}(2)|=14\), \(|P_{f(3MYB_1)}(2) \setminus P_{f(2BNI_1)}(2)|=152\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1: 0010010001001100100100011010011100
Pair \(Z_2\) Length of longest common subword
2BNI_1,3MYB_1 166 3
2BNI_1,1REG_1 102 2
3MYB_1,1REG_1 186 3
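
The two columns can be sketched as follows. \(Z_k\) is the size of the symmetric difference of the two subword sets, so for the first pair \(Z_2 = 14 + 152 = 166\). The last column is read here as the length of the longest contiguous common subword, which is an interpretation: the small values (2 or 3) fit contiguous matches rather than scattered ones. The function names are hypothetical.

```python
def subwords(word, k):
    """Set of distinct contiguous subwords of length k."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1], y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy example: P("AAB", 2) = {AA, AB}, P("ABB", 2) = {AB, BB}.
print(z_distance("AAB", "ABB", 2), longest_common_subword_length("AAB", "ABB"))
```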

Newick tree

(3MYB_1:97.42,(2BNI_1:51,1REG_1:51):46.42);

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{320 }{\log_{20} 320}-\frac{34}{\log_{20}34})=93.3\)
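
The arithmetic in this estimate can be checked with a few lines. Note that \(320 = 34 + 286\) is the combined length of the two query sequences. The constants 0.85 and 0.8 are taken as given from the formula above, and the helper name `length_scale` is hypothetical.

```python
import math

def length_scale(n, alphabet_size=20):
    """n / log_b(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

# Constants and lengths exactly as they appear in the formula above.
expected = 0.85 * 0.8 * (length_scale(320) - length_scale(34))
print(expected)
```

The result agrees with the quoted 93.3 up to the last displayed digit.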
Status Protein1 Protein2 d d1/2
Query variables 2BNI_1 3MYB_1 113 63

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
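
Read literally, the formula interpolates between a minimum and a maximum: \(\alpha=0\) gives the maximum alone, and \(\alpha=1/2\) gives their average. A minimal sketch, with placeholder inputs `lo` and `hi` standing in for the min and max terms (which quantities these are is not spelled out on this page):

```python
def delta(alpha, lo, hi):
    """alpha * min + (1 - alpha) * max, for placeholder values lo <= hi."""
    return alpha * lo + (1 - alpha) * hi

# Made-up inputs for illustration only, not values from this page.
print(delta(0, 2, 10))    # the maximum alone (the d case)
print(delta(0.5, 2, 10))  # the average of min and max (the d1/2 case)
```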
