CoV2D Browser™

Parikh vectors
1XIO_1 1NRP_1 5FMR_1 Letter Amino acid
3 1 1 C Cysteine
7 6 19 E Glutamic acid
4 0 6 H Histidine
23 1 13 I Isoleucine
6 3 22 K Lysine
8 1 15 P Proline
17 2 13 T Threonine
11 0 6 Q Glutamine
17 4 24 G Glycine
33 5 25 L Leucine
16 2 17 F Phenylalanine
13 0 2 W Tryptophan
11 1 7 Y Tyrosine
10 3 21 D Aspartic acid
8 0 7 M Methionine
17 3 19 S Serine
15 0 19 V Valine
17 1 21 A Alanine
15 3 12 R Arginine
10 0 11 N Asparagine
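
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch follows, assuming the row order of the table above as the alphabet order; the names `ALPHABET` and `parikh_vector` are illustrative and not part of the page.

```python
from collections import Counter

# Row order of the table above (assumed here to be the page's own ordering).
ALPHABET = "CEHIKPTQGLFWYDMSVARN"

def parikh_vector(seq, alphabet=ALPHABET):
    """Return the number of occurrences in seq of each letter of alphabet."""
    counts = Counter(seq)
    return [counts[letter] for letter in alphabet]

# The 1NRP_1 sequence as listed on this page.
nrp = "TFGSGEADCGLRPLFEKKSLEDKTERELLESYIDGR"
print(dict(zip(ALPHABET, parikh_vector(nrp))))
```

The counts for 1NRP_1 sum to 36, the length of the sequence.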

>1XIO_1|Chain A|ANABAENA SENSORY RHODOPSIN|Nostoc sp. PCC 7120 (103690)
>1NRP_1|Chain A[auth L]|ALPHA-THROMBIN (SMALL SUBUNIT)|Homo sapiens (9606)
>5FMR_1|Chains A, B, C|INTRAFLAGELLAR TRANSPORT PROTEIN COMPONENT IFT52|CHLAMYDOMONAS REINHARDTII (3055)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1XIO , Knot 120 261 0.85 40 175 255
MNLESLLHWIYVAGMTIGALHFWSLSRNPRGVPQYEYLVAMFIPIWSGLAYMAMAIDQGKVEAAGQIAHYARYIDWMVTTPLLLLSLSWTAMQFIKKDWTLIGFLMSTQIVVITSGLIADLSERDWVRYLWYICGVCAFLIILWGIWNPLRAKTRTQSSELANLYDKLVTYFTVLWIGYPIVWIIGPSGFGWINQTIDTFLFCLLPFFSKVGFSFLDLHGLRNLNDSRQTTGDRFAENTLQFVENITLFANSRRQQSRRRV
1NRP , Knot 24 36 0.79 28 34 34
TFGSGEADCGLRPLFEKKSLEDKTERELLESYIDGR
5FMR , Knot 124 280 0.83 40 182 270
GAASMEEPGAEEVRILFSTAKGESHTHKAGFKQLFRRLRSTYRPDKVDKDDFTLDTLRSAHILVLGGPKEKFTAPEVDMLKKFVKNGGSILILMSEGGEEKAGTNINYFLEQFGMSVNNDAVVRTTHYKYLHPKEVLISDGILNRAVITGAGKSLNSNDDDEFRVSRGPQAFDGTGLEYVFPFGATLSVQKPAVPVLSSGKIAYPMNRPVGAVWAQPGYGRIAVLGSCAMFDDKWLDKEENSKIMDFFFKFLEPHSKIQLNDIDAEEPDVSDLKLLPDTA
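
The LZ-complexity column and its normalization can be sketched as follows. Which Lempel-Ziv variant the page uses is not stated, so `lz76` below assumes the 1976 exhaustive-history parsing and may not reproduce the tabulated \(\mathrm{LZ}(w)\) values exactly. `normalized_lz` only evaluates the ratio \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the table header. Both function names are illustrative.

```python
import math

def lz76(s):
    """Number of phrases in the Lempel-Ziv (1976) exhaustive-history parsing of s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text;
        # the copy source may overlap the phrase, hence s[:i + length - 1].
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_alphabet_size(n))."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76("0001101001000101"))         # parsing 0|001|10|100|1000|101
print(normalized_lz(120, 261))          # 1XIO row of the table
```

With the tabulated values, the ratio comes out near 0.85 for 1XIO (120, 261) and near 0.83 for 5FMR (124, 280), in agreement with the fourth column.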

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
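
As an illustration of these definitions, the sketch below computes \(P_w(n)\) and \(p_w(n)\), reading a subword as a contiguous block of \(n\) letters. The function names are hypothetical.

```python
def subword_set(w, n):
    """P_w(n): the distinct contiguous blocks of n letters occurring in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the cardinality of P_w(n)."""
    return len(subword_set(w, n))

# The 1NRP_1 sequence as listed on this page.
nrp = "TFGSGEADCGLRPLFEKKSLEDKTERELLESYIDGR"
print([subword_complexity(nrp, n) for n in (2, 3)])  # [34, 34]
```

For 1NRP_1 this gives 34 for both \(n=2\) and \(n=3\), matching the table: of its 35 two-letter blocks only LE repeats, and its 34 three-letter blocks are all distinct.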

\(|P_{f(1XIO_1)}(2) \setminus P_{f(1NRP_1)}(2)|=154\), \(|P_{f(1NRP_1)}(2) \setminus P_{f(1XIO_1)}(2)|=13\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(1XIO_1),f(1NRP_1))=154+13=167\).

Hydrophobic-polar version of Sequence 1:
101001101101111011110110100010111000011111111101110111110010101110110010010111001111101010110110001011111100011110011110100001100110101101111111111011010000000011010001100101111101111111101111100010011101111100111011010110010000000100110001011001011100000000001
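
The hydrophobic-polar encoding can be sketched as a letter-by-letter map. The page does not state which residues it treats as hydrophobic; the set below is an assumption inferred by aligning the binary string with the start of the 1XIO_1 sequence, and `HYDROPHOBIC` and `hp_string` are illustrative names.

```python
# Assumed hydrophobic residues (mapped to 1), inferred from the string above;
# every other residue is treated as polar (mapped to 0).
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(seq, hydrophobic=HYDROPHOBIC):
    """Encode seq as a 0/1 string: 1 for hydrophobic residues, 0 otherwise."""
    return "".join("1" if residue in hydrophobic else "0" for residue in seq)

# First 20 residues of 1XIO_1.
print(hp_string("MNLESLLHWIYVAGMTIGAL"))  # 10100110110111101111
```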
Pair \(Z_2\) Length of longest common subword
1XIO_1,1NRP_1 167 3
1XIO_1,5FMR_1 181 3
1NRP_1,5FMR_1 178 3
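
The \(Z_2\) column is the size of the symmetric difference of the two subword sets. A minimal sketch, with the hypothetical name `z_distance`:

```python
def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) - P_y(k)| + |P_y(k) - P_x(k)|."""
    p_x = {x[i:i + k] for i in range(len(x) - k + 1)}
    p_y = {y[i:i + k] for i in range(len(y) - k + 1)}
    # The symmetric difference holds exactly the subwords found in one set only.
    return len(p_x ^ p_y)

# Toy example: {AA, AB} versus {AB, BB} differ in AA and BB.
print(z_distance("AAB", "ABB", 2))  # 2
```

Using a set-based count means repeated occurrences of a subword are ignored; only presence or absence in each sequence matters.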

Newick tree

 
(5FMR_1:91.74,(1XIO_1:83.5,1NRP_1:83.5):8.24);

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{297}{\log_{20} 297}-\frac{36}{\log_{20} 36}\right)=85.7\)
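
The arithmetic in this estimate can be checked directly. In the sketch below the helper name `scale` is hypothetical; note that 297 equals 261 + 36, the combined length of the two sequences, which is presumably where it comes from.

```python
import math

def scale(n, alphabet_size=20):
    """n / log_alphabet_size(n), the normalizing quantity used on this page."""
    return n / math.log(n, alphabet_size)

expected = 0.85 * 0.8 * (scale(297) - scale(36))
print(expected)
```

This evaluates to roughly 85.8, in line with the quoted 85.7 up to rounding.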
Status Protein1 Protein2 d d1/2
Query variables 1XIO_1 1NRP_1 110 61.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
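
The interpolation can be sketched as below, assuming (as in Otu and Sayood's definitions) that the min and max are taken over the two complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), so that \(d\) is their maximum and \(d_1\) their sum. The function name `delta` and the increments 110 and 13 are illustrative; the latter are merely values consistent with the table row \(d=110\), \(d_1/2=61.5\) under that assumption.

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b) for two increments a and b."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Assumed increments c(xy) - c(x) and c(yx) - c(y) for the pair 1XIO_1, 1NRP_1.
a, b = 110, 13
print(delta(a, b, 0))    # 110, the value of d
print(delta(a, b, 0.5))  # 61.5, the value of d1/2
```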