CoV2D BrowserTM

Parikh vectors
6GTW_1 7KSC_1 7AIX_1 Letter Amino acid
13 13 25 A Alanine
2 8 9 C Cysteine
14 8 48 G Glycine
2 0 14 W Tryptophan
19 6 45 V Valine
14 2 36 N Asparagine
7 3 27 D Aspartic acid
5 2 17 Q Glutamine
2 0 35 E Glutamic acid
14 11 52 S Serine
14 7 24 T Threonine
1 0 15 H Histidine
8 6 25 I Isoleucine
10 5 60 L Leucine
4 4 24 R Arginine
4 6 26 K Lysine
0 1 16 M Methionine
6 0 37 F Phenylalanine
10 8 32 P Proline
10 3 19 Y Tyrosine
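
A Parikh vector records how many times each letter occurs in a word. As a minimal sketch (the name `parikh_vector` and the constant `ALPHABET` are assumptions made for this illustration, not the code behind the table), the counts can be obtained as follows, with letters in the row order of the table above:

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above (assumed ordering).
ALPHABET = "ACGWVNDQESTHILRKMFPY"

def parikh_vector(word, alphabet=ALPHABET):
    """Return the number of occurrences of each letter of `alphabet` in `word`."""
    counts = Counter(word)
    return [counts[letter] for letter in alphabet]

# First ten residues of 7KSC_1, used as a short example word.
example = "AVTCGQVASS"
print(dict(zip(ALPHABET, parikh_vector(example))))
```

The entries of a Parikh vector always sum to the length of the word, which gives a quick consistency check against the Length column further down.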

>6GTW_1|Chains A, B, C, D|FimH protein|Escherichia coli F18+ (488477)
>7KSC_1|Chains A, B, C, D|Non-specific lipid-transfer protein|Punica granatum (22663)
>7AIX_1|Chain A|Acetylcholinesterase|Tetronarce californica (7787)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6GTW , Knot 73 159 0.77 38 115 154
FACKTANGTAIPIGGGSANVYVNLAPAVNVGQNLVVDLSTQIFCHNDYPETITDYVTLQRGSAYGGVLSSFSGTVKYNGSSYPFPTTSETPRVVYNSRTDKPWPVALYLTPVSSAGGVAIKAGSLIAVLILRQTNNYNSDDFQFVWNIYANNDVVVPTG
7KSC , Knot 45 93 0.73 32 74 88
AVTCGQVASSLAPCIPYARSAGGAVPPACCSGIKTLDGMARTTPDRQATCKCLKSASTSISGINYGLVASLPAKCGVNIPYKISPSTDCARVK
7AIX , Knot 233 586 0.84 40 274 557
MNLLVTSSLGVLLHLVVLCQADDHSELLVNTKSGKVMGTRVPVLSSHISAFLGIPFAEPPVGNMRFRRPEPKKPWSGVWNASTYPNNCQQYVDEQFPGFSGSEMWNPNREMSEDCLYLNIWVPSPRPKSTTVMVWIYGGGFYSGSSTLDVYNGKYLAYTEEVVLVSLSYRVGAFGFLALHGSQEAPGNVGLLDQRMALQWVHDNIQFFGGDPKTVTIFGESAGGASVGMHILSPGSRDLFRRAILQSGSPNCPWASVSVAEGRRRAVELGRNLNCNLNSDEELIHCLREKKPQELIDVEWNVLPFDSIFRFSFVPVIDGEFFPTSLESMLNSGNFKKTQILLGVNKDEGSFFLLYGAPGFSKDSESKISREDFMSGVKLSVPHANDLGLDAVTLQYTDWMDDNNGIKNRDGLDDIVGDHNVICPLMHFVNKYTKFGNGTYLYFFNHRASNLVWPEWMGVIHGYEIEFVFGLPLVKELNYTAEEEALSRRIMHYWATFAKTGNPNEPHSQESKWPLFTTKEQKFIDLNTEPMKVHQRLRVQMCVFWNQFLPKLLNATACDGELSSSGTSSSKGIIFYVLFSILYLIF
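
The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below shows one common way to count LZ76 phrases (exhaustive-history parsing, in which a phrase may overlap its earlier occurrence) together with that normalization. The page does not state which LZ variant produced its values, so `lz76_complexity` is an assumption and may not reproduce the counts in the table; `normalized_lz` uses only the formula in the header.

```python
import math

def lz76_complexity(word):
    """Count the phrases in an LZ76-style parsing of `word` (assumed variant)."""
    i, phrases, n = 0, 0, len(word)
    while i < n:
        length = 1
        # Grow the phrase while it already occurs in the text
        # that precedes its final symbol (overlap allowed).
        while i + length <= n and word[i:i + length] in word[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example, parsed as 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))  # 6
# Values for 6GTW taken from the table: LZ(w) = 73, n = 159.
print(normalized_lz(73, 159))
```

For 6GTW the ratio evaluates to about 0.777, which the table appears to show truncated to 0.77.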

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
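
As a small illustration of these definitions, the set of length-\(k\) subwords and its cardinality can be computed directly. The function names `subwords` and `subword_complexity` are hypothetical and chosen only for this sketch.

```python
def subwords(word, k):
    """Return the set of distinct contiguous subwords of length k in `word`."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def subword_complexity(word, k):
    """Return the number of distinct length-k subwords of `word`."""
    return len(subwords(word, k))

print(sorted(subwords("ABAAB", 2)))    # ['AA', 'AB', 'BA']
print(subword_complexity("ABAAB", 2))  # 3
```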

\(|P_{f(6GTW_1)}(2) \setminus P_{f(7KSC_1)}(2)|=79\), \(|P_{f(7KSC_1)}(2) \setminus P_{f(6GTW_1)}(2)|=38\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the LZ76-style (set of subwords) numerator of the Jaccard distance for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1:
110001010111111101010101111101100111010001100000100100010100101011110010101000100011100000101100000001111110101100111111011011111110000000000101110101000111101
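
A hydrophobic-polar string replaces each residue by 1 if it is hydrophobic and by 0 if it is polar. The page does not state its classification; the set `HYDROPHOBIC` below is an assumption inferred by comparing residues of 6GTW_1 with the binary string above. M does not occur in 6GTW_1, so its membership is a guess.

```python
# Assumed hydrophobic residues, inferred from the 6GTW_1 string above.
# M is a guess: it does not occur in 6GTW_1, so it cannot be checked there.
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First twenty residues of 6GTW_1.
print(hp_string("FACKTANGTAIPIGGGSANV"))  # 11000101011111110101
```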
Pair \(Z_2\) Length of longest common subword
6GTW_1,7KSC_1 117 4
6GTW_1,7AIX_1 205 4
7KSC_1,7AIX_1 248 5
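
The two columns can be sketched as follows. \(Z_k\) is the size of the symmetric difference of the two subword sets. The last column is read here as the length of the longest common contiguous subword, which is an assumption about how the page computes it (for instance, SAGG occurs in both 6GTW_1 and 7KSC_1). All function names are hypothetical.

```python
def subwords(word, k):
    """Return the set of distinct contiguous subwords of length k in `word`."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def z_distance(x, y, k):
    """Size of the symmetric difference of the length-k subword sets of x and y."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        # curr[j] = length of the common block ending at a and at y[j - 1]
        curr = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

print(z_distance("ABAAB", "BABB", 2))                  # 2 (AA and BB)
print(longest_common_subword_length("ABAAB", "BABB"))  # 2
```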

Newick tree

 
[
	7AIX_1:12.94,
	[
		6GTW_1:58.5,7KSC_1:58.5
	]:68.44
]

Let \(d\) denote the Otu–Sayood distance \(d\).
Let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{252 }{\log_{20} 252}-\frac{93}{\log_{20}93})=51.0\)
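
The arithmetic in this estimate can be checked directly. Note that 252 = 159 + 93 is the combined length of the two sequences; the factors 0.85 and 0.8 are taken from the formula as given, and the helper name `length_term` is hypothetical.

```python
import math

def length_term(n, base=20):
    """Return n / log_base(n), the length scale used in the estimate."""
    return n / math.log(n, base)

# Factors 0.85 and 0.8 are copied from the formula above without interpretation.
expected = 0.85 * 0.8 * (length_term(252) - length_term(93))
print(round(expected, 1))  # 51.0
```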
Status Protein1 Protein2 d d1/2
Query variables 6GTW_1 7KSC_1 63 48.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
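
The interpolation can be sketched numerically. This assumes, following Otu and Sayood's definitions, that \(d\) is the larger and \(d_1\) the sum of the two complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\). The increments 34 and 63 used below are not listed on this page; they are inferred from \(d=63\) and \(d_1/2=48.5\) under that assumption.

```python
def delta(alpha, increment_a, increment_b):
    """Return alpha * min + (1 - alpha) * max of the two increments."""
    low = min(increment_a, increment_b)
    high = max(increment_a, increment_b)
    return alpha * low + (1 - alpha) * high

# Inferred increments for the pair 6GTW_1, 7KSC_1 (see the assumption above).
print(delta(0, 34, 63))    # 63, matching d
print(delta(0.5, 34, 63))  # 48.5, matching d1/2
```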