CoV2D Browser™

Parikh vectors
4NPL_1 4FLJ_1 1BQO_1 Letter Amino acid
4 11 6 N Asparagine
14 19 14 P Proline
19 15 9 S Serine
1 3 3 W Tryptophan
8 14 8 Y Tyrosine
25 20 7 R Arginine
5 10 0 C Cysteine
5 11 3 Q Glutamine
15 24 13 G Glycine
20 17 8 I Isoleucine
10 10 10 F Phenylalanine
22 24 8 V Valine
18 22 13 A Alanine
17 18 15 D Aspartic acid
19 21 10 E Glutamic acid
9 20 8 H Histidine
16 14 6 K Lysine
0 9 2 M Methionine
16 25 16 L Leucine
7 19 14 T Threonine
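
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch in Python follows; the function name `parikh_vector` and the constant `LETTERS` are illustrative choices, not part of the CoV2D site, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# Row order of the table above (an assumption about presentation only).
LETTERS = "NPSWYRCQGIFVADEHKMLT"

def parikh_vector(sequence, letters=LETTERS):
    """Return the occurrence count of each letter, in the order of `letters`."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur.
    return [counts[a] for a in letters]

# Toy input, not one of the proteins above.
toy_vector = parikh_vector("MKKLAW")
print(dict(zip(LETTERS, toy_vector)))
```

Applied to one of the FASTA sequences listed further down, this should produce the corresponding column of the table, provided the sequence is passed as a plain string of one-letter codes.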

>4NPL_1|Chains A, B|RNA demethylase ALKBH5|Danio rerio (7955)
>4FLJ_1|Chain A|Methionine aminopeptidase 1|Homo sapiens (9606)
>1BQO_1|Chains A, B|STROMELYSIN-1|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4NPL , Knot 114 250 0.84 38 166 243
DESEYEERRDAEARRVKSGIKQASIFTLEECARIEAKIDEVVAKADKGLYREHTVDRAPLRNKYFFGEGYTYGAQLEKRGPGQERLYSKGEVDDIPDWVHELVIDRLVTHGVIPEGFVNSAVINDYQPGGCIVSHVDPIHIFERPIVSVSFFSDSALCFGCKFLFKPIRVSEPVLHLPVRRGSVTVLSGYAADDITHCIRPQDIKERRAVIILRKTRADAPRLDSNSLSPSIVSPKRRHILKAKRSHRKA
4FLJ , Knot 142 326 0.84 40 207 315
MYRYTGKLRPHYPLMPTRPVPSYIQRPDYADHPLGMSESEQALKGTSQIKLLSSEDIEGMRLVCRLAREVLDVAAGMIKPGVTTEEIDHAVHLACIARNCYPSPLNYYNFPKSCCTSVNEVICHGIPDRRPLQEGDIVNVDITLYRNGYHGDLNETFFVGEVDDGARKLVQTTYECLMQAIDAVKPGVRYRELGNIIQKHAQANGFSVVRSYCGHGIHKLFHTAPNVPHYAKNKAVGVMKSGHVFTIEPMICEGGWQDETWPDGWTAVTRDGKRSAQFEHTLLVTDTGCEILTRRLDSARPHFMSQFEFELVDKLAAALEHHHHHH
1BQO , Knot 83 173 0.82 38 128 167
FRTFPGIPKWRKTHLTYRIVNYTPDLPKDAVDSAVEKALKVWEEVTPLTFSRLYEGEADIMISFAVREHGDFYPFDGPGNVLAHAYAPGPGINGDAHFDDDEQWTKDTTGTNLFLVAAHEIGHSLGLFHSANTEALMYPLYHSLTDLTRFRLSQDDINGIQSLYGPPPDSPET
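
For orientation, the LZ-complexity and the normalized ratio in the table can be sketched as follows. This is a textbook version of the Lempel-Ziv (1976) exhaustive parsing, written under the assumption that the site uses the same variant; the names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of phrases in the Lempel-Ziv (1976) exhaustive parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from an earlier
        # starting position (the copy may overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the alphabet size (needs n > 1)."""
    return lz / (n / math.log(n, alphabet_size))

# Ratio for the first table row (LZ = 114, n = 250).
print(normalized_lz(114, 250))
```

The quadratic substring search is adequate for sequences of a few hundred residues; a suffix-based method would be preferable for much longer inputs.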

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
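
As an illustration of these definitions, the sketch below builds the set of length-\(n\) subwords of a string and counts it. The function names `subwords` and `subword_complexity` are hypothetical, and subwords are taken to be contiguous, as the word "intervals" suggests.

```python
def subwords(w, n):
    """Set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Number of distinct contiguous subwords of length n in w."""
    return len(subwords(w, n))

# Toy example: ABAB has the two 2-letter subwords AB and BA.
print(sorted(subwords("ABAB", 2)))
```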

\(|P_{f(4NPL_1)}(2) \setminus P_{f(4FLJ_1)}(2)|=62\), \(|P_{f(4FLJ_1)}(2) \setminus P_{f(4NPL_1)}(2)|=103\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for this pair, \(Z_2=62+103=165\).

Hydrophobic-polar version of Sequence 1:
0000000000101001001100101101000101010100111010011000001001110000111010001101000111000100010100110110011100110011110111001110000111011001011011001110101100011011001110110100111011100101011010110010001010010000111110000101101000010101101000011010000001
Pair \(Z_2\) Length of longest common subword
4NPL_1,4FLJ_1 165 5
4NPL_1,1BQO_1 166 3
4FLJ_1,1BQO_1 169 3
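
The two columns of this table can be sketched as follows. The function names are hypothetical. The last column is computed here as the longest contiguous common subword, which is an assumption: the values 3 and 5 are far too small to be longest (non-contiguous) common subsequences of sequences this long.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # common-suffix lengths for the previous row
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy strings, not the proteins above.
print(z_distance("ABC", "BCD", 2), longest_common_subword_length("ABCDE", "XBCDY"))
```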

Newick tree

 
(1BQO_1:84.16,(4NPL_1:82.5,4FLJ_1:82.5):1.66);

Let d denote the Otu–Sayood distance d.
Let d1 denote the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{576}{\log_{20} 576}-\frac{250}{\log_{20}250})=92.3\), where \(576=250+326\) is the combined length of the two sequences.
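
The arithmetic in this estimate can be reproduced with a few lines of Python. The function names are hypothetical, and the constants 0.85 and 0.8 are copied from the formula without any claim about their interpretation.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The quantity n / log_b(n) used to normalize LZ-complexity (needs n > 1)."""
    return n / math.log(n, alphabet_size)

def rough_expected_distance(n_x, n_y, c1=0.85, c2=0.8):
    """c1 * c2 * (scale of the concatenated length minus scale of the first length)."""
    return c1 * c2 * (lz_scale(n_x + n_y) - lz_scale(n_x))

# Lengths of 4NPL_1 (250) and 4FLJ_1 (326) from the table above.
print(round(rough_expected_distance(250, 326), 2))
```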
Status Protein1 Protein2 d d1/2
Query variables 4NPL_1 4FLJ_1 114 100.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
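
A small sketch of this interpolation follows. It assumes the usual Otu-Sayood definitions, in which \(d\) is the maximum and \(d_1\) the sum of the two one-sided terms \(c(xy)-c(x)\) and \(c(yx)-c(y)\) for a complexity measure \(c\); this matches the two cases displayed above. The function names and the stand-in complexity function are hypothetical.

```python
def otu_sayood_terms(c, x, y):
    """The two one-sided terms c(xy) - c(x) and c(yx) - c(y)."""
    return c(x + y) - c(x), c(y + x) - c(y)

def delta(a, b, alpha):
    """alpha * min + (1 - alpha) * max of the two one-sided terms."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Stand-in complexity for illustration only: number of distinct letters.
a, b = otu_sayood_terms(lambda s: len(set(s)), "AB", "ABC")
print(delta(a, b, 0), delta(a, b, 0.5))

# The table reports d = 114 and d1/2 = 100.5 for 4NPL_1 and 4FLJ_1, so the
# two one-sided terms would be 114 and 2 * 100.5 - 114 = 87.
print(delta(87, 114, 0), delta(87, 114, 0.5))
```

Setting \(\alpha=0\) recovers the maximum, and \(\alpha=1/2\) the average of the two terms, which is \(d_1/2\).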