CoV2D Browser™

Parikh vectors
2KRX_1 2RDK_1 5LAE_1 Letter Amino acid
5 0 28 P Proline
3 3 7 Y Tyrosine
1 3 23 R Arginine
6 5 22 D Aspartic acid
1 4 8 C Cysteine
8 6 21 Q Glutamine
6 4 18 K Lysine
3 3 28 F Phenylalanine
7 7 14 H Histidine
3 8 17 I Isoleucine
4 9 30 T Threonine
1 1 10 W Tryptophan
4 8 39 A Alanine
2 9 9 N Asparagine
17 9 58 L Leucine
4 12 39 S Serine
4 4 35 V Valine
10 6 41 E Glutamic acid
3 8 43 G Glycine
2 0 7 M Methionine
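
A Parikh vector simply counts how often each letter occurs in a sequence. The following minimal sketch recomputes the 2KRX_1 column of the table; the function name `parikh_vector` and the constant `ALPHABET` are illustrative and not part of the CoV2D code, and the letter order is copied from the table rows above.

```python
from collections import Counter

# Letter order taken from the rows of the table above (P, Y, R, ..., M).
ALPHABET = "PYRDCQKFHITWANLSVEGM"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the number of occurrences of each letter, in alphabet order."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# Sequence of 2KRX_1 as listed on this page.
seq_2krx = (
    "MPDPLMYQQDNFVVLETNQPEQFLTTIELLEKLKGELEKISFSDLPLELQKLDSLPAQAQHLIDTSCELDVGAGKYLQWYAVRLEKLEHHHHHH"
)

vector = parikh_vector(seq_2krx)
for letter, count in zip(ALPHABET, vector):
    print(letter, count)
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick consistency check on any column.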

>2KRX_1|Chain A|Asl3597 protein|Nostoc sp. (103690)
>2RDK_1|Chains A, B|Cyanovirin-N|Nostoc ellipsosporum (45916)
>5LAE_1|Chain A|Peroxisomal N(1)-acetyl-spermine/spermidine oxidase,Peroxisomal N(1)-acetyl-spermine/spermidine oxidase|Mus musculus (10090)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2KRX , Knot 47 94 0.75 40 72 86
MPDPLMYQQDNFVVLETNQPEQFLTTIELLEKLKGELEKISFSDLPLELQKLDSLPAQAQHLIDTSCELDVGAGKYLQWYAVRLEKLEHHHHHH
2RDK , Knot 55 109 0.79 36 83 103
LGNFSQACYNSAIQGSVLTSTCIRTNGGYNTSSIDLNSVIENVDGSLKWQGSNFIETCRNTQLAGSSELAAECKTRAQQFVSTKINLDDHIAAIDGTLKYELEHHHHHH
5LAE , Knot 199 497 0.82 40 240 463
GGPGPRVLVVGSGIAGLGAAQKLCSHRAAPHLRVLEATASAGGRIRSERCFGGVVELGAHWIHGPSQDNPVFQLAAEFGLLGEKELSEENQLVDTGGHVALPSMIWSSSGTSVSLELMTEMARLFYGLIERTREFLNESETPMASVGEFLKKEISQQVASWTEDDEDTRKRKLAILNTFFNIECCVSGTHSMDLVALAPFGEYTVLPGLDCILAGGYQGLTDRILASLPKDTVAFDKPVKTIHWNGSFQEAAFPGETFPVLVECEDGARLPAHHVIVTVPLGFLKEHQDTFFEPPLPAKKAEAIKKLGFGTNNKIFLEFEEPFWEPDCQFIQVVWEDTSPLQDTALSLQDTWFKKLIGFLVQPSFESSHVLCGFIAGLESEFMETLSDEEVLLSLTQVLRRVTGNPQLPAAKSVRRSQWHSAPYTRGSYSYVAVGSTGDDLDLMAQPLPGLQVLFAGEATHRTFYSTTHGALLSGWREADRLVSLWDSQVEQSRPRL
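
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below shows one way to compute both quantities, assuming the exhaustive-history parse of Lempel and Ziv (1976); the page does not say exactly which LZ variant it uses, so `lz76_complexity` may not reproduce the tabulated values. Both function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parse of w.

    Assumption: each component is the shortest block that has not
    occurred earlier (overlapping copies are allowed).
    """
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while w[i:i+length] already occurs starting before position i.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# Ratio for the first table row (LZ = 47, n = 94).
print(normalized_lz(47, 94))
```

With the tabulated values, `normalized_lz(47, 94)` is about 0.758 and `normalized_lz(55, 109)` about 0.790, which agrees with the 0.75 and 0.79 shown if the page truncates to two decimals.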

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
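
As a small illustration of these definitions, the set of distinct subwords of a given length and its cardinality can be computed with a set comprehension. The names `subwords` and `subword_complexity` are hypothetical helpers, not functions of the CoV2D code.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy example: "abcab" has the length-2 subwords ab, bc, ca.
print(sorted(subwords("abcab", 2)))
print(subword_complexity("abcab", 2))
```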

\(|P_{f(2KRX_1)}(2) \setminus P_{f(2RDK_1)}(2)|=47\), \(|P_{f(2RDK_1)}(2) \setminus P_{f(2KRX_1)}(2)|=58\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1101110000011110000100110010110010101001010011101001001110100110000010111100101011010010000000
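
The hydrophobic-polar string can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not state which classification it uses; the set `HYDROPHOBIC` below is an assumption inferred from the string shown for 2KRX_1 (A, F, G, I, L, M, P, V, W count as hydrophobic, everything else as polar), and `to_hp` is an illustrative name.

```python
# Assumption: hydrophobic residues inferred from the HP string on this page.
HYDROPHOBIC = set("AFGILMPVW")

def to_hp(sequence):
    """Map a protein sequence to a binary hydrophobic (1) / polar (0) string."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

seq_2krx = (
    "MPDPLMYQQDNFVVLETNQPEQFLTTIELLEKLKGELEKISFSDLPLELQKLDSLPAQAQHLIDTSCELDVGAGKYLQWYAVRLEKLEHHHHHH"
)
hp_2krx = to_hp(seq_2krx)
print(hp_2krx)
```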
Pair \(Z_2\) Length of longest common subword
2KRX_1,2RDK_1 105 8
2KRX_1,5LAE_1 204 3
2RDK_1,5LAE_1 207 4
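
Both columns of this table can be sketched in a few lines. The code below computes \(Z_k\) as the size of the symmetric difference of the two subword sets, and the length of the longest common contiguous subword by dynamic programming. That the second column is contiguous is an assumption, based on the small values in the table. All function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy example: subword sets {ab, ba} and {ab, bb, ba} differ only in "bb".
print(z_distance("abab", "abba", 2))
print(longest_common_subword_length("xabcy", "zabcw"))
```

Dividing `z_distance` by the size of the union of the two subword sets would give the Jaccard distance itself.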

Newick tree

 
(
	5LAE_1:11.71,
	(
		2KRX_1:52.5,2RDK_1:52.5
	):62.21
);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{203 }{\log_{20} 203}-\frac{94}{\log_{20}94})\approx 35.6\).
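
The arithmetic in this estimate can be checked directly. In the sketch below the constants 0.85 and 0.8 are taken from the formula as given, without further justification, and 203 is assumed to be the combined length 94 + 109 of the two sequences. The function names are illustrative.

```python
import math

def n_over_log(n, alphabet_size=20):
    """The quantity n / log_b(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

def rough_expected_distance(n_concat, n_single, c1=0.85, c2=0.8):
    """Evaluate c1 * c2 * (n_concat / log n_concat - n_single / log n_single)."""
    return c1 * c2 * (n_over_log(n_concat) - n_over_log(n_single))

# 203 is assumed to be 94 + 109, the length of the concatenated pair.
estimate = rough_expected_distance(203, 94)
print(estimate)
```

The result is close to 35.7, in line with the quoted 35.6 up to rounding or truncation.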
Status Protein1 Protein2 d d1/2
Query variables 2KRX_1 2RDK_1 43 39
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
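
The interpolation can be sketched as follows. It is an assumption here that the min and max are taken over the two one-sided quantities \(c(xy)-c(x)\) and \(c(yx)-c(y)\), with \(c\) the LZ-complexity, so that \(d\) is their maximum and \(d_1\) their sum. Under that reading, the example values 35 and 43 are back-computed from the table entries \(d=43\) and \(d_1/2=39\); they are not shown on the page itself.

```python
def delta(a, b, alpha):
    """Convex combination alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Hypothetical one-sided values, inferred from d = 43 and d1/2 = 39.
a, b = 35, 43
print(delta(a, b, 0))    # alpha = 0 gives the maximum, i.e. d
print(delta(a, b, 0.5))  # alpha = 1/2 gives the average, i.e. d1 / 2
```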
