CoV2D Browser™

Parikh vectors
2BTT_1 3MUH_1 8CLA_1 Letter Amino acid
2 5 27 I Isoleucine
7 15 19 K Lysine
1 0 9 M Methionine
7 14 20 P Proline
4 16 36 A Alanine
1 7 16 N Asparagine
0 3 13 H Histidine
8 35 22 S Serine
4 21 29 T Threonine
2 3 4 W Tryptophan
4 18 32 V Valine
2 5 21 R Arginine
0 5 12 C Cysteine
4 10 30 E Glutamic acid
6 6 26 D Aspartic acid
3 9 18 Y Tyrosine
3 4 20 F Phenylalanine
0 11 16 Q Glutamine
6 17 33 G Glycine
5 12 32 L Leucine
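
Each column of the table above is a Parikh vector: the number of occurrences of each letter in one sequence. A minimal sketch of how such a column could be recomputed follows; the function name `parikh_vector` and the alphabet ordering are illustrative choices, not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard amino-acid one-letter codes (ordering chosen for illustration).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter in seq."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

# Sequence of 2BTT_1 as listed on this page.
seq_2btt = "KDPKFEAAYDFPGSGSSSELPLKKGDIVFISRDEPSGWSLAKLLDGSKEGWVPTAYMTPYKDTRNTVPV"
pv = parikh_vector(seq_2btt)
print(pv["K"], pv["S"], pv["H"])  # counts for lysine, serine, histidine
```

The entries of the vector sum to the sequence length, which gives a quick consistency check against the Length column further down.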

>2BTT_1|Chain A|MYOSIN-3 ISOFORM|SACCHAROMYCES CEREVISIAE (4932)
>3MUH_1|Chain A[auth L]|Antibody PG9 light chain|Homo sapiens (9606)
>8CLA_1|Chain A|Tubulin alpha-1B chain|Bos taurus (9913)
| Protein code \(c\) | LZ-complexity \(\mathrm{LZ}(w)\) | Length \(n=\lvert w\rvert\) | \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) | \(p_w(1)\) | \(p_w(2)\) | \(p_w(3)\) | Sequence \(w=f(c)\) |
|---|---|---|---|---|---|---|---|
| 2BTT, Knot | 38 | 69 | 0.77 | 34 | 60 | 67 | KDPKFEAAYDFPGSGSSSELPLKKGDIVFISRDEPSGWSLAKLLDGSKEGWVPTAYMTPYKDTRNTVPV |
| 3MUH, Knot | 96 | 216 | 0.79 | 38 | 142 | 207 | QSALTQPASVSGSPGQSITISCNGTSNDVGGYESVSWYQQHPGKAPKVVIYDVSKRPSGVSNRFSGSKSGNTASLTISGLQAEDEGDYYCKSLTSTRRRVFGTGTKLTVLGQPKAAPSVTLFPPSSEELQANKATLVCLISDFYPGAVTVAWKADSSPVKAGVETTTPSKQSNNKYAASSYLSLTPEQWKSHKSYSCQVTHEGSTVEKTVAPTECS |
| 8CLA, Knot | 187 | 435 | 0.87 | 40 | 256 | 416 | RECISIHVGQAGVQIGNACWELYCLEHGIQPDGQMPSDKTIGGGDDSFNTFFSETGAGKHVPRAVFVDLEPTVIDEVRTGTYRQLFHPEQLITGKEDAANNYARGHYTIGKEIIDLVLDRIRKLADQCTGLQGFLVFHSFGGGTGSGFTSLLMERLSVDYGKKSKLEFSIYPAPQVSTAVVEPYNSILTTHTTLEHSDCAFMVDNEAIYDICRRNLDIERPTYTNLNRLISQIVSSITASLRFDGALNVDLTEFQTNLVPYPRIHFPLATYAPVISAEKAYHEQLSVAEITNACFEPANQMVKCDPRHGKYMACCLLYRGDVVPKDVNAAIATIKTKRSIQFVDWCPTGFKVGINYQPPTVVPGGDLAKVQRAVCMLSNTTAIAEAWARLDHKFDLMYAKRAFVHWYVGEGMEEGEFSEAREDMAALEKDYEEVG |
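
The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below recomputes that ratio from the listed values and also includes one common way to count LZ76 components (exhaustive-history parsing). Whether the page uses exactly this parsing variant is an assumption, and the names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(s: str) -> int:
    """Count components of an exhaustive-history (LZ76-style) parsing of s.

    Assumption: each component is the shortest prefix of the remaining text
    that does not occur starting at an earlier position.
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the candidate while it still occurs starting before position i.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_alphabet_size(n))."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ(w), n) as listed in the table above.
for code, lz, n in [("2BTT", 38, 69), ("3MUH", 96, 216), ("8CLA", 187, 435)]:
    print(code, round(normalized_lz(lz, n), 3))
```

The recomputed ratios agree with the table if the page's two-decimal values are truncated rather than rounded.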

Let \(P_w(k)\) be the set of distinct subwords (contiguous intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
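
As a sketch of these definitions, reading the argument of \(p_w\) as the subword length, the set of subwords and its cardinality can be computed directly. The function names are illustrative only.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Sequence of 2BTT_1 as listed on this page.
seq_2btt = "KDPKFEAAYDFPGSGSSSELPLKKGDIVFISRDEPSGWSLAKLLDGSKEGWVPTAYMTPYKDTRNTVPV"
print(subword_complexity(seq_2btt, 2), subword_complexity(seq_2btt, 3))
```

For 2BTT_1 this gives 60 distinct subwords of length 2 and 67 of length 3, the latter being the maximum possible for a word of length 69.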

\(|P_{f(2BTT_1)}(2) \setminus P_{f(3MUH_1)}(2)|=33\), \(|P_{f(3MUH_1)}(2) \setminus P_{f(2BTT_1)}(2)|=115\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1: 001010110011101000011100101111000010110110110100011110101010000000111

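
The hydrophobic-polar string can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The residue classes below are an assumption inferred from the string shown for Sequence 1, not taken from the CoV2D source. C, H and Q do not occur in that sequence, so treating them as polar here is a guess.

```python
# Assumed hydrophobic class, inferred from the page's 0/1 string for 2BTT_1.
# C, H and Q are absent from that sequence; their class here is a guess.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq: str) -> str:
    """Map each residue to '1' if assumed hydrophobic, else '0'."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

seq_2btt = "KDPKFEAAYDFPGSGSSSELPLKKGDIVFISRDEPSGWSLAKLLDGSKEGWVPTAYMTPYKDTRNTVPV"
print(hp_encode(seq_2btt))
```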
Pair \(Z_2\) Length of longest common subword
2BTT_1,3MUH_1 148 3
2BTT_1,8CLA_1 226 3
3MUH_1,8CLA_1 176 4
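
A minimal sketch of the two quantities in this table follows. \(Z_k\) is the size of the symmetric difference of the two subword sets. For the last column, the code assumes the page reports the longest common contiguous subword, since values as small as 3 and 4 suggest contiguous matches. All function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|, i.e. the symmetric difference size."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y (row-by-row DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common suffix ending at the previous characters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_k("abab", "abba", 2), longest_common_subword("abcde", "xbcdy"))
```

With the values stated above for the first pair, \(Z_2 = 33 + 115 = 148\), which matches the first row.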

Newick tree

 
(
	8CLA_1:10.85,
	(
		2BTT_1:74,3MUH_1:74
	):34.85
);

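Let \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\) be the Otu–Sayood distance \(d\).
Let \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)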
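
A sketch of the two distances, assuming the usual unnormalized Otu–Sayood definitions: \(d\) is the larger and \(d_1\) the sum of the two complexity increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). The LZ76 parsing variant and all function names are assumptions for illustration, not the page's implementation.

```python
def lz76_complexity(s: str) -> int:
    """Count components of an exhaustive-history (LZ76-style) parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the candidate while it still occurs starting before position i.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(x: str, y: str) -> tuple[int, int]:
    """Complexity gained by appending y to x, and by appending x to y."""
    return (lz76_complexity(x + y) - lz76_complexity(x),
            lz76_complexity(y + x) - lz76_complexity(y))

def otu_sayood_d(x: str, y: str) -> int:
    return max(increments(x, y))

def otu_sayood_d1(x: str, y: str) -> int:
    return sum(increments(x, y))

print(otu_sayood_d("ACGT", "ACGT"), otu_sayood_d1("ACGT", "ACGT"))
```

Because these distances are not normalized, a low-complexity word such as AAAAAA adds few new components to any concatenation, so it tends to look close to many sequences.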
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{285}{\log_{20} 285}-\frac{69}{\log_{20}69}\right)=69.5\), where \(285=69+216\) is the length of the concatenation of the two query sequences.
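
The arithmetic of this estimate can be checked with a short sketch. The constants 0.85 and 0.8 are taken from the formula as given; the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_alphabet_size(n), the normalizer used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)

n_short, n_long = 69, 216  # lengths of 2BTT_1 and 3MUH_1
estimate = 0.85 * 0.8 * (lz_scale(n_short + n_long) - lz_scale(n_short))
print(round(estimate, 1))
```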
Status Protein1 Protein2 d d1/2
Query variables 2BTT_1 3MUH_1 88 57.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
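
The interpolation can be illustrated with a small sketch, assuming min and max are taken over the two complexity increments of a pair. The increment values 27 and 88 below are inferred from the query row (\(d=88\), \(d_1/2=57.5\), so the smaller increment is \(2\cdot 57.5-88=27\)); they are not listed on the page.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Increments inferred from d = 88 and d1/2 = 57.5 for the pair 2BTT_1, 3MUH_1.
small, large = 27, 88
print(delta(small, large, 0), delta(small, large, 0.5))
```

Setting \(\alpha=0\) recovers the maximum, and \(\alpha=1/2\) gives the average of the two increments.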
