CoV2D Browser™

Parikh vectors
5TTY_1 7MLX_1 4XDR_1 Letter Amino acid
19 7 23 D Aspartic acid
4 1 2 M Methionine
25 6 17 F Phenylalanine
10 19 11 T Threonine
21 38 18 S Serine
15 21 37 V Valine
15 16 51 A Alanine
19 6 6 N Asparagine
18 8 11 Q Glutamine
25 12 10 K Lysine
9 12 13 P Proline
10 9 29 R Arginine
7 4 8 H Histidine
15 13 7 Y Tyrosine
3 4 2 W Tryptophan
2 5 6 C Cysteine
20 7 15 E Glutamic acid
11 23 26 G Glycine
18 6 11 I Isoleucine
34 16 37 L Leucine
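
The Parikh vector of a sequence simply records how many times each letter occurs in it. A minimal sketch of that count is given here; the function name `parikh_vector` and the short input string are illustrative assumptions, not part of the CoV2D site.

```python
from collections import Counter

# The 20 standard amino-acid letters, in the row order used by the table above.
AMINO_ACIDS = "DMFTSVANQKPRHYWCEGIL"


def parikh_vector(sequence):
    """Return a dict mapping each amino-acid letter to its count in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


# Hypothetical short input (the first eight residues of 5TTY_1), for illustration only.
example = parikh_vector("MIVNVIQK")
print(example["M"], example["I"], example["V"])  # prints: 1 2 2
```

For a sequence made up only of these 20 letters, the entries sum to the sequence length. This matches the table, where the three columns sum to 300, 233 and 340.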

>5TTY_1|Chain A|PagF prenyltransferase|Planktothrix agardhii NIES-596 (443922)
>7MLX_1|Chain A[auth H]|BL3-6 Fab Heavy Chain|Homo sapiens (9606)
>4XDR_1|Chain A|FAD:protein FMN transferase|Treponema pallidum (strain Nichols) (243276)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5TTY , Knot 131 300 0.83 40 180 290
MIVNVIQKDRLKEQKLQFIRNHQQAFDVEPIYPLPLFEDFVTSIEGDCSLEASCKIESDKLIASRFLLFFEDKTQEWQKYLHQSLTFFGLVENRVGVKINYSLLQQFLGSSFDFSKVTVLSAGIDLRNNLAESSLKMHIRIKDYPEKLDKAFALSDGAADGNYLKDFVNLIGFDFYFNGKSEIEIYAEVQEDDFFKPEINNLVWQHFPKTALQPLKASSLFFTGLSKANNNPVLYYHLKNRQDLTNYFKLNDTAQRVHSFYQHQDILPYMWVGTAQKELEKTRIENIRLYYYKSFKMESN
7MLX , Knot 101 233 0.78 40 141 224
EISEVQLVESGGGLVQPGGSLRLSCAASGFYISYSSIHWVRQAPGKGLEWVASISPYSGSTYYADSVKGRFTISADTSKNTAYLQMNSLRAEDTAVYYCARQGYRRRSGRGFDYWGQGTLVTVSSASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKVEPKSCDKTHT
4XDR , Knot 137 340 0.78 40 174 311
CGGRARVREYSRAELVIGTLCRVRVYSKRPAAEVHAALEEVFTLLQQQEMVLSANRDDSALAALNAQAGSAPVVVDRSLYALLERALFFAEKSGGAFNPALGAVVKLWNIGFDRAAVPDPDALKEALTRCDFRQVHLRAGVSVGAPHTVQLAQAGMQLDLGAIAKGFLADKIVQLLTAHALDSALVDLGGNIFALGLKYGDVRSAAAQRLEWNVGIRDPHGTGQKPALVVSVRDCSVVTSGAYERFFERDGVRYHHIIDPVTGFPAHTDVDSVSIFAPRSTDAAALATACFVLGYEKSCALLREFPGVDALFIFPDKRVRASAGIVDRVRVLDARFVLER
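
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns of the table above. The sketch below only does this arithmetic; it does not compute \(\mathrm{LZ}(w)\) itself, since the page does not say which Lempel-Ziv variant it uses. The function name `normalized_lz` is an assumption made for illustration.

```python
import math


def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_{alphabet_size} n)."""
    return lz / (n / math.log(n, alphabet_size))


# (code, LZ(w), n) triples copied from the table above.
rows = [("5TTY", 131, 300), ("7MLX", 101, 233), ("4XDR", 137, 340)]
for code, lz, n in rows:
    print(code, round(normalized_lz(lz, n), 3))
```

The three ratios come out at roughly 0.831, 0.789 and 0.784, which agrees with the tabulated 0.83, 0.78 and 0.78 if the page truncates to two decimals rather than rounding.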

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
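
As a sketch of these definitions, the set of subwords of a fixed length and its cardinality can be computed with a sliding window. The names `subwords` and `subword_complexity` are illustrative assumptions, and the input word is a toy example rather than a protein sequence.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))


# Toy word: its length-2 subwords are "AB" and "BA".
print(sorted(subwords("ABAB", 2)))    # prints: ['AB', 'BA']
print(subword_complexity("ABAB", 2))  # prints: 2
```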

\(|P_{f(5TTY_1)}(2) \setminus P_{f(7MLX_1)}(2)|=101\), \(|P_{f(7MLX_1)}(2) \setminus P_{f(5TTY_1)}(2)|=62\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). For example, \(Z_2(f(5TTY_1),f(7MLX_1))=101+62=163\).

Hydrophobic-polar version of Sequence 1:
111011000010000101100000110101101111100110010100010100010000111001111100000010001000101111100011101000110011100101001011011101000110001010101000100100111100111010010011011110101010001010101000011010100111001100110110100111011001000111000100000100010100010010010000011101111010001000010010100000101000
Pair \(Z_2\) Length of longest common subword
5TTY_1,7MLX_1 163 3
5TTY_1,4XDR_1 146 4
7MLX_1,4XDR_1 157 4
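
The \(Z_k\) column counts the length-\(k\) subwords that occur in exactly one of the two sequences. A minimal sketch follows, assuming subwords are contiguous windows; the function names and the toy inputs are illustrative and not taken from the site.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Return |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Toy words: "AB" occurs only in the first, "AD" only in the second.
print(z_distance("ABCA", "BCAD", 2))  # prints: 2
```

For the pair 5TTY_1, 7MLX_1 the two one-sided differences reported on the page are 101 and 62, which add up to the tabulated value 163.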

Newick tree

 
(7MLX_1:82.21,(5TTY_1:73,4XDR_1:73):9.21);

Let \(d\) be the Otu--Sayood distance \(d\).
Let \(d_1\) be the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{533}{\log_{20} 533}-\frac{233}{\log_{20}233})\approx 85.8\), where \(533=300+233\) is the combined length of the two sequences.
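
The arithmetic behind this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are copied from the page without their derivation, and the function and parameter names are assumptions made for illustration.

```python
import math


def lz_scale(n, alphabet_size=20):
    """Return n / log_{alphabet_size} n, the normalizing scale used on the page."""
    return n / math.log(n, alphabet_size)


def rough_expected_distance(n1, n2, c1=0.85, c2=0.8):
    """Return c1 * c2 * (scale of the combined length minus scale of n2)."""
    return c1 * c2 * (lz_scale(n1 + n2) - lz_scale(n2))


# Lengths of 5TTY_1 (300) and 7MLX_1 (233), taken from the table of sequences.
print(round(rough_expected_distance(300, 233), 2))
```

The result is about 85.86, in line with the 85.8 quoted on the page.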
Status Protein1 Protein2 d d1/2
Query variables 5TTY_1 7MLX_1 112 98
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
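
The interpolation between the minimum and the maximum can be sketched as follows. The page does not say which two quantities the min and max range over, so `a` and `b` are placeholders. The value 84 is not on the page: it is back-solved from \(d=112\) and \(d_1/2=98\) under the assumption that both table entries come from the same pair, since a maximum of 112 and an average of 98 force a minimum of 84.

```python
def delta(alpha, a, b):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# Assumed pair (84, 112); see the note above on where 84 comes from.
print(delta(0, 84, 112))    # prints: 112 (alpha = 0 picks the maximum, d)
print(delta(0.5, 84, 112))  # prints: 98.0 (alpha = 1/2 gives the average, d_1/2)
```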