CoV2D BrowserTM

Parikh vectors
2RAV_1 2BQJ_1 3MUO_1 Letter Amino acid
26 17 68 A Alanine
9 10 25 N Asparagine
13 8 48 D Aspartic acid
5 6 0 C Cysteine
12 6 46 S Serine
14 5 37 T Threonine
1 5 14 W Tryptophan
17 8 45 V Valine
26 3 33 E Glutamic acid
18 11 61 G Glycine
9 2 9 M Methionine
11 2 38 P Proline
6 14 41 R Arginine
11 8 64 L Leucine
20 5 25 K Lysine
15 2 29 F Phenylalanine
6 6 33 Q Glutamine
7 1 18 H Histidine
29 5 18 I Isoleucine
6 6 41 Y Tyrosine
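
The counts in the table above are letter frequencies of each sequence. A minimal sketch of how such a Parikh vector could be computed, assuming plain one-letter amino-acid strings; the names `parikh_vector` and `AMINO_ACIDS` are illustrative, and the alphabet order is copied from the table rows:

```python
from collections import Counter

# Letter order as it appears in the table rows (A, N, D, C, ..., Y).
AMINO_ACIDS = "ANDCSTWVEGMPRLKFQHIY"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in w."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

if __name__ == "__main__":
    # First ten residues of 2BQJ_1, used only as a short demonstration.
    print(dict(zip(AMINO_ACIDS, parikh_vector("KVFERCELAR"))))
```

The entries of a Parikh vector sum to the sequence length, which gives a quick consistency check against the Length column further down.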

>2RAV_1|Chain A|Putative uncharacterized protein|Bacteroides thetaiotaomicron (226186)
>2BQJ_1|Chain A|LYSOZYME|Homo sapiens (9606)
>3MUO_1|Chain A|Prolyl endopeptidase|Aeromonas punctata (648)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2RAV 112 261 0.79 40 170 244
MTKALFFDIAGTLVSFETHRIPSSTIEALEAAHAKGLKIFIATGRPKAIINNLSELQDRNLIDGYITMNGAYCFVGEEVIYKSAIPQEEVKAMAAFCEKKGVPCIFVEEHNISVCQPNEMVKKIFYDFLHVNVIPTVSFEEASNKEVIQMTPFITEEEEKEVLPSIPTCEIGRWYPAFADVTAKGDTKQKGIDEIIRHFGIKLEETMSFGDGGNDISMLRHAAIGVAMGQAKEDVKAAADYVTAPIDEDGISKAMKHFGII
2BQJ 68 130 0.84 40 108 128
KVFERCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRATNYNAGDRSTDYGIFQINSRYWCNDGKTPGAVNAAHLSCSALLQDNIADAVAAAKRVVRDPQGIRAWVAWRNRCQNRDVRQYAQGCGV
3MUO 268 693 0.84 38 276 639
GSHMSGKARLHYPVTRQGEQVDHYFGQAVADPYRWLEDDRSPETEAWVKAQNAVTQDYLAQIPYRAAIKEKLAASWNYAKEGAPFWWGRYHYFFKNDGLQNQNVLWRQQEGKPAEVFLDPNTLSPDGTTALDQLSFSRDGRILAYSLSLAGSDWREIHLMDVESKQPLETPLKDVKFSGISWLGNEGFFYSSYDKPDGSELSARTDQHKVYFHRLGTAQEDDRLVFGAIPAQHHRYVGATVTEDQRFLLISAANSTSGNRLYVKDLSQENAPLLTVQGDLDADVSLVDNKGSTLYLLTNRDAPNRRLVTVDAANPGPAHWRDLIPERQQVLTVHSGSGYLFAEYMVDATARVEQFDYEGKRVREVALPGLGSVSGFNGYWWDPALYFGFENYAQPPTLYRFEPKSGAISLYRASAAPFKPEDYVSEQRFYQSKDGTRVPLIISYRKGLKLDGSNPTILYGYGGFDVSLTPSFSVSVANWLDLGGVYAVANLRGGGEYGQAWHLAGTQQNKQNVFDDFIAAAEYLKAEGYTRTDRLAIRGGSNGGLLVGAVMTQRPDLMRVALPAVGVLDMLRYHTFTAGTGWAYDYGTSADSEAMFDYLKGYSPLHNVRPGVSYPSTMVTTADHDDRVVPAHSFKFAATLQADNAGPHPQLIRIETNAGHGAGTPVAKLIEQSADIYAFTLYEMGYRELPRQP
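
The LZ-complexity and normalized columns above can be illustrated as follows. This is a sketch under the assumption that \(\mathrm{LZ}(w)\) counts the components of the Lempel-Ziv (1976) exhaustive parsing; the page does not state which variant it uses, so the function may not reproduce the tabulated values exactly. The names `lz76_complexity` and `normalized_lz` are illustrative.

```python
import math

def lz76_complexity(w):
    """Number of components in the exhaustive (LZ76-style) parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the component while it can still be copied from a start
        # position strictly before i (overlap with itself is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_20 n), the normalized column of the table."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    # Classic example: 0|001|10|100|1000|101 has six components.
    print(lz76_complexity("0001101001000101"))
    print(normalized_lz(112, 261))
```

For the 2RAV row (LZ 112, length 261) the normalized value is about 0.797, which the table shows as 0.79.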

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
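
The definitions of \(P_w(n)\) and \(p_w(n)\) translate directly into a set comprehension. A minimal sketch, with `subwords` and `subword_complexity` as illustrative names:

```python
def subwords(w, n):
    """P_w(n): the set of distinct length-n contiguous subwords of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

if __name__ == "__main__":
    w = "ABAB"
    for n in (1, 2, 3):
        print(n, sorted(subwords(w, n)), subword_complexity(w, n))
```

A word of length \(|w|\) has at most \(|w|-n+1\) distinct subwords of length \(n\), so \(p_w(n)\) can never exceed that bound.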

\(|P_{f(2RAV_1)}(2) \setminus P_{f(2BQJ_1)}(2)|=131\), \(|P_{f(2BQJ_1)}(2) \setminus P_{f(2RAV_1)}(2)|=69\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For this pair, \(Z_2=131+69=200\).

Hydrophobic-polar version of Sequence 1 (2RAV_1):
100111101110110100001100010110110101101111010101110010010000110101010110011100110001110001011111000011101110000101001001100110011010111010100100001101011100000001110110001101011110101010000011001100111010001011011001011001111111101000101110010111000110011001111
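
The hydrophobic-polar string above replaces each residue by 1 or 0. The page does not state which residues it treats as hydrophobic; the set used in this sketch was inferred by comparing the binary string with the 2RAV_1 sequence and should be read as an assumption. The names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Assumed hydrophobic set, inferred from the 2RAV_1 example rather than
# taken from a documented definition. All other letters map to 0 (polar).
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(w, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in hydrophobic else "0" for a in w)

if __name__ == "__main__":
    # First twenty residues of 2RAV_1.
    print(hp_encode("MTKALFFDIAGTLVSFETHR"))
```

Other hydrophobicity classifications exist, so the set is passed as a parameter and can be swapped without changing the function.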
Pair \(Z_2\) Length of longest common subword (contiguous)
2RAV_1,2BQJ_1 200 3
2RAV_1,3MUO_1 196 4
2BQJ_1,3MUO_1 216 3
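
Both columns of the table above can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two subword sets. The third column is read here as the length of the longest common contiguous subword; this is an assumption, made because a non-contiguous common subsequence of these proteins would be far longer than 3 or 4. All function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword_length(x, y):
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0] * (len(y) + 1)
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    print(z("ABAB", "ABBA", 2), longest_common_subword_length("ABAB", "ABBA"))
```

Dividing \(Z_k(x,y)\) by \(|P_x(k)\cup P_y(k)|\) would give the full Jaccard distance, of which \(Z_k\) is the numerator.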

Newick tree

 
[
	2BQJ_1:10.02,
	[
		2RAV_1:98,3MUO_1:98
	]:8.02
]

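Let \(d\) be the Otu-Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu-Sayood distance \(d_1(x,y)=(\mathrm{LZ}(xy)-\mathrm{LZ}(x))+(\mathrm{LZ}(yx)-\mathrm{LZ}(y))\). (This makes the 4TYN sequence AAAAAA a close match...)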
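
A sketch of the two distances, assuming the usual Otu-Sayood construction: \(d\) is the larger, and \(d_1\) the sum, of the two directional terms \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). The complexity function is an assumed LZ76-style parsing and may differ from the one used to produce the numbers on this page; the names `lz76` and `otu_sayood` are illustrative.

```python
def lz76(w):
    """Assumed complexity: components in an LZ76-style exhaustive parsing."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(x, y, c=lz76):
    """Return (d, d1) for sequences x and y under complexity function c."""
    a = c(x + y) - c(x)  # new components needed for y after seeing x
    b = c(y + x) - c(y)  # new components needed for x after seeing y
    return max(a, b), a + b

if __name__ == "__main__":
    print(otu_sayood("AAAA", "CDEF"))
```

Passing the complexity function as a parameter keeps the distance definitions separate from the choice of parsing.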
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{391}{\log_{20} 391}-\frac{130}{\log_{20}130})=79.0\), where \(391=261+130\) is the length of the two sequences concatenated.
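
The arithmetic of this estimate can be checked directly. A minimal sketch; the factors 0.85 and 0.8 are taken from the formula as given, and the names `n_over_log` and `expected_distance` are illustrative:

```python
import math

def n_over_log(n, base=20):
    """n / log_base(n), the normalizing quantity used on this page."""
    return n / math.log(n, base)

def expected_distance(n_concat, n_single, f1=0.85, f2=0.8):
    """(f1)(f2)(n_concat/log n_concat - n_single/log n_single)."""
    return f1 * f2 * (n_over_log(n_concat) - n_over_log(n_single))

if __name__ == "__main__":
    # 391 = 261 + 130 (2RAV_1 and 2BQJ_1 concatenated), 130 = |2BQJ_1|.
    print(round(expected_distance(391, 130), 1))
```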
Status Protein1 Protein2 d d1/2
Query variables 2RAV_1 2BQJ_1 97 74

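In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], with \(\min\) and \(\max\) denoting the smaller and larger of the two directional terms whose maximum is \(d\) and whose sum is \(d_1\),
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]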
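
The interpolation can be checked against the query row (\(d=97\), \(d_1/2=74\)). In the sketch below the directional terms 51 and 97 are inferred from those two values (\(2\cdot 74-97=51\)); they are not displayed on the page. The name `delta` is illustrative.

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    a, b = 51, 97  # inferred directional terms for 2RAV_1 and 2BQJ_1
    print(delta(a, b, 0))    # recovers d
    print(delta(a, b, 0.5))  # recovers d1 / 2
```

At \(\alpha=1\) the same formula gives the smaller directional term, so \(\alpha\) moves between the maximum and the minimum.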