CoV2D Browser™

Parikh vectors
2CRF_1 4AVX_1 2ZWI_1 Letter Amino acid
10 9 25 T Threonine
9 20 14 V Valine
18 15 28 L Leucine
3 8 21 F Phenylalanine
1 6 15 P Proline
2 10 0 C Cysteine
9 13 18 Q Glutamine
7 10 44 I Isoleucine
10 16 15 A Alanine
3 9 37 N Asparagine
8 11 21 D Aspartic acid
5 6 8 M Methionine
19 10 30 S Serine
2 3 3 W Tryptophan
9 21 25 E Glutamic acid
3 7 4 H Histidine
9 21 28 K Lysine
10 13 8 R Arginine
12 11 14 G Glycine
1 7 15 Y Tyrosine
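
Each numeric column above is the Parikh vector of one sequence: the number of occurrences of each amino-acid letter, ignoring position. A minimal Python sketch of that count follows. The name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative assumptions, not the site's own code; the letter order is simply the row order of the table.

```python
from collections import Counter

# Letters in the row order of the table above (an illustrative choice).
AMINO_ACIDS = "TVLFPCQIANDMSWEHKRGY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the count of each letter of `alphabet` in `sequence`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# Toy input: the first seven residues of 2CRF_1.
vector = parikh_vector("GSSGSSG")
print(dict(zip(AMINO_ACIDS, vector)))
```

The entries of a Parikh vector sum to the sequence length; for instance, the 2CRF_1 column sums to 150, the length reported for that sequence.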

>2CRF_1|Chain A|RAN binding protein 3|Homo sapiens (9606)
>4AVX_1|Chain A|HEPATOCYTE GROWTH FACTOR-REGULATED TYROSINE KINASE SUBSTRATE|HOMO SAPIENS (9606)
>2ZWI_1|Chains A, B|Alpha-/beta-galactoside alpha-2,3-sialyltransferase|Photobacterium phosphoreum (659)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2CRF , Knot 72 150 0.80 40 110 143
GSSGSSGTARKCLLEKVEVITGEEAESNVLQMQCKLFVFDKTSQSWVERGRGLLRLNDMASTDDGTLQSRLVMRTQGSLRLILNTKLWAQMQIDKASEKSIHITAMDTEDQGVKVFLISASSKDTGQLYAALHHRILALRSRVESGPSSG
4AVX , Knot 106 226 0.84 40 163 223
SMGRGSGTFERLLDKATSQLLLETDWESILQICDLIRQGDTQAKYAVNSIKKKVNDKNPHVALYALEVMESVVKNCGQTVHDEVANKQTMEELKDLLKRQVEVNVRNKILYLIQAWAHAFRNEPKYKVVQDTYQIMKVEGHVFPEFKESDAMFAAERAPDWVDAEECHRCRVQFGVMTRKHHCRACGQIFCGKCSSKYSTIPKFGIEKEVRVCEPCYEQLNRKAEG
2ZWI , Knot 151 373 0.80 38 196 347
KNKTIEVYVDRATLPTIQQMTQIINENSNNKKLISWSRYPINDETLLESINGSFFKNRPELIKSLDSMILTNEIKKVIINGNTLWAVDVVNIIKSIEALGKKTEIELNFYDDGSAEYVRLYDFSRLPESEQEYKISLSKDNIQSSINGTQPFDNSIENIYGFSQLYPTTYHMLRADIFETNLPLTSLKRVISNNIKQMKWDYFTTFNSQQKNKFYNFTGFNPEKIKEQYKASPHENFIFIGTNSGTATAEQQIDILTEAKKPDSPIITNSIQGLDLFFKGHPSATYNQQIIDAHNMIEIYNKIPFEALIMTDALPDAVGGMGSSVFFSLPNTVENKFIFYKSDTDIENNALIQVMIELNIVNRNDVKLISDLQ
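
The fourth column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below recomputes that ratio from the LZ-complexity and length columns. `normalized_lz` is a hypothetical helper name, and the alphabet size of 20 is taken from the base of the logarithm in the column header.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_k n) for a word of length n over k letters."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"2CRF": (72, 150), "4AVX": (106, 226), "2ZWI": (151, 373)}
for code, (lz, n) in rows.items():
    print(code, normalized_lz(lz, n))
```

The 4AVX ratio falls between 0.84 and 0.85, so the two-decimal values shown in the table look truncated rather than rounded; that reading is an inference from the numbers, not something the page states.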

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with the 4-symbol Protein Data Bank code \(c\).
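
As an illustration of these definitions, the following Python sketch collects the distinct contiguous subwords of a given length and counts them. The names `subwords` and `subword_complexity` are assumptions made for this example.

```python
def subwords(word, n):
    """Return P_w(n): the set of distinct contiguous subwords of length n."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def subword_complexity(word, n):
    """Return p_w(n), the cardinality of P_w(n)."""
    return len(subwords(word, n))

# Toy word: the first seven residues of 2CRF_1.
w = "GSSGSSG"
for n in (1, 2, 3):
    print(n, sorted(subwords(w, n)), subword_complexity(w, n))
```

For a word of length \(|w|\) there are at most \(|w|-n+1\) subwords of length \(n\), which bounds \(p_w(n)\) from above.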

\(|P_{f(2CRF_1)}(2) \setminus P_{f(4AVX_1)}(2)|=56\), \(|P_{f(4AVX_1)}(2) \setminus P_{f(2CRF_1)}(2)|=109\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the numerator of an LZ76-style (set-of-subwords) Jaccard distance for \(P(k)\); for this pair, \(Z_2=56+109=165\).

Hydrophobic-polar version of Sequence 1 (2CRF_1): 100100101000110010110100100011010001111000000110010111010011000010100011100010101110001110101001000010101100000110111101000001010111000111100010011001
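
The hydrophobic-polar string can be reproduced by mapping each residue to 1 or 0. In the sketch below, the set `HYDROPHOBIC` is an assumption: it was inferred by comparing the binary string above with the 2CRF sequence, and is not stated by the page. The function name `hp_encode` is likewise illustrative.

```python
# Residues that map to 1; inferred from the displayed string (assumption).
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 22 residues of the 2CRF sequence.
print(hp_encode("GSSGSSGTARKCLLEKVEVITG"))
```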
Pair \(Z_2\) Length of longest common subword
2CRF_1,4AVX_1 165 3
2CRF_1,2ZWI_1 172 4
4AVX_1,2ZWI_1 175 3
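
Since \(Z_k(x,y)\) adds the sizes of the two one-sided set differences, it equals the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). A minimal Python sketch under that reading follows; `z_distance` is a hypothetical name.

```python
def subwords(word, k):
    """Return the set of distinct contiguous subwords of length k."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def z_distance(x, y, k):
    """Return Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    return len(subwords(x, k) ^ subwords(y, k))  # ^ is set symmetric difference

# Toy words: {AA, AB} versus {AB, BB} differ in two subwords.
print(z_distance("AAB", "ABB", 2))
```

For the pair 2CRF_1, 4AVX_1 the two one-sided counts are 56 and 109, giving the 165 listed in the first row.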

Newick tree

 
[
	2ZWI_1:88.12,
	[
		2CRF_1:82.5,4AVX_1:82.5
	]:5.62
]
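
The bracketed listing above describes a rooted tree with branch lengths. Standard Newick notation writes the nesting with parentheses and ends the string with a semicolon. The sketch below serializes the same tree that way; the tuple-based tree representation and the name `to_newick` are assumptions made for this example.

```python
def to_newick(node):
    """Serialize a node given as (label_or_children, branch_length)."""
    content, length = node
    if isinstance(content, str):
        text = content  # leaf: its name
    else:
        text = "(" + ",".join(to_newick(child) for child in content) + ")"
    return text if length is None else f"{text}:{length}"

# The tree shown above; the root has no branch length.
tree = ([("2ZWI_1", 88.12), ([("2CRF_1", 82.5), ("4AVX_1", 82.5)], 5.62)], None)
newick = to_newick(tree) + ";"
print(newick)
```

Every leaf sits at the same depth, since 82.5 + 5.62 = 88.12.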

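Let d be the Otu--Sayood distance d (the larger of the two complexity increments obtained by appending one sequence to the other).
Let d1 be the Otu--Sayood distance d1 (the sum of those two increments). (This makes the 4TYN sequence AAAAAA a close match...)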
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{376 }{\log_{20} 376}-\frac{150}{\log_{20}150})=68.1\)
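
The arithmetic in this estimate can be checked directly. The sketch below evaluates the expression with the constants exactly as they appear in the text (0.85, 0.8, 376 and 150); `growth_term` is a hypothetical helper name.

```python
import math

def growth_term(n, alphabet_size=20):
    """Return n / log_k n for a word of length n over k letters."""
    return n / math.log(n, alphabet_size)

# Constants copied from the formula above.
expected = 0.85 * 0.8 * (growth_term(376) - growth_term(150))
print(expected)
```

The result lies between 68.1 and 68.3, which agrees with the quoted 68.1 if that figure is truncated rather than rounded.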
Status Protein1 Protein2 d d1/2
Query variables 2CRF_1 4AVX_1 87 72
Was not able to put for d
Was not able to put for d1
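
For orientation, here is a minimal Python sketch of the two distances, assuming Otu and Sayood's definitions \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\), with \(c\) the number of phrases in a Lempel--Ziv 1976 parse. The function names are hypothetical, and the LZ variant used by the site may differ, so this sketch is not claimed to reproduce the values 87 and 72 above.

```python
def lz76_complexity(s):
    """Count the phrases of an LZ76 parse (assumed variant)."""
    count, i, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs in the text that
        # precedes its last symbol (overlapping occurrences allowed).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(s, q):
    """Return (d, d1) for sequences s and q under the assumed definitions."""
    grow_s = lz76_complexity(s + q) - lz76_complexity(s)
    grow_q = lz76_complexity(q + s) - lz76_complexity(q)
    return max(grow_s, grow_q), grow_s + grow_q

print(otu_sayood("AB", "CD"))
```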

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
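
The interpolation can be checked against the query row above (d = 87, d1/2 = 72). Assuming min and max denote the smaller and larger of the two complexity increments, max = 87 and (min + max)/2 = 72, so min = 57; that value is derived here and does not appear on the page. The function name `delta` is illustrative.

```python
def delta(smaller, larger, alpha):
    """Return alpha * min + (1 - alpha) * max."""
    return alpha * smaller + (1 - alpha) * larger

# Increments inferred from d = 87 and d1/2 = 72 (see the lead-in).
smaller, larger = 57, 87
print(delta(smaller, larger, 0))    # alpha = 0 recovers d
print(delta(smaller, larger, 0.5))  # alpha = 1/2 recovers d1/2
```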