CoV2D Browser™

Parikh vectors
5ALJ_1 3CUX_1 3ETI_1 Letter Amino acid
24 16 1 M Methionine
41 43 11 A Alanine
10 15 2 H Histidine
33 36 14 K Lysine
62 50 20 L Leucine
34 22 5 P Proline
25 27 7 T Threonine
13 13 5 Y Tyrosine
26 32 7 R Arginine
21 15 3 Q Glutamine
24 35 9 I Isoleucine
35 27 20 V Valine
26 22 11 D Aspartic acid
31 20 8 S Serine
12 9 0 W Tryptophan
34 33 14 G Glycine
25 26 9 F Phenylalanine
19 30 11 N Asparagine
13 2 3 C Cysteine
41 55 8 E Glutamic acid
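A Parikh vector records how many times each letter occurs in a word. The minimal Python sketch below returns those counts in the row order of the table above. The names `TABLE_ORDER` and `parikh_vector` are illustrative only and are not part of CoV2D.

```python
from collections import Counter

# One-letter amino-acid codes in the same row order as the table above.
TABLE_ORDER = "MAHKLPTYRQIVDSWGFNCE"


def parikh_vector(w: str, alphabet: str = TABLE_ORDER) -> list[int]:
    """Return the number of occurrences of each letter of `alphabet` in `w`."""
    counts = Counter(w)  # missing letters count as 0
    return [counts[a] for a in alphabet]


if __name__ == "__main__":
    # Toy sequence, for illustration only.
    print(dict(zip(TABLE_ORDER, parikh_vector("MAAKW"))))
```

For each protein, the entries of its column add up to the sequence length: 549, 528 and 168.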

>5ALJ_1|Chain A|BIFUNCTIONAL EPOXIDE HYDROLASE 2|HOMO SAPIENS (9606)
>3CUX_1|Chain A|Malate synthase|Bacillus anthracis (1392)
>3ETI_1|Chains A, B, C, D, E, F, G, H|macro domain of Non-structural protein 3|Feline infectious peritonitis virus (33734)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5ALJ , Knot 219 549 0.83 40 258 507
GMTLRAAVFDLDGVLALPAVFGVLGRTEEALALPRGLLNDAFQKGGPEGATTRLMKGEITLSQWIPLMEENCRKCSETAKVCLPKNFSIKEIFDKAISARKINRPMLQAALMLRKKGFTTAILTNTWLDDRAERDGLAQLMCELKMHFDFLIESCQVGMVKPEPQIYKFLLDTLKASPSEVVFLDDIGANLKPARDLGMVTILVQDTDTALKELEKVTGIQLLNTPAPLPTSCNPSDMSHGYVTVKPRVRLHFVELGSGPAVCLCHGFPESWYSWRYQIPALAQAGYRVLAMDMKGYGESSAPPEIEEYCMEVLCKEMVTFLDKLGLSQAVFIGHDWGGMLVWYMALFYPERVRAVASLNTPFIPANPNMSPLESIKANPVFDYQLYFQEPGVAEAELEQNLSRTFKSLFRASDESVLSMHKVCEAGGLFVNSPEEPSLSRMVTEEEIQFYVQQFKKSGFRGPLNWYRNMERNWKWACKSLGRKILIPALMVTAEKDFVLVPQMSQHMEDWIPHLKRGHIEDCGHWTQMDKPTEVNQILIKWLDSDARN
3CUX , Knot 216 528 0.85 40 257 502
STQTSRVTLVGEMLPAYNEILTPEALSFLKELHENFNERRIELLQKRMKKQQKIDAGEFPKFLEETKRIREADWTIAKLPKDLEDRRVEITGPVDRKMVINALNSGAHLFMADFEDSNSPTWENAIEGQINLRDAVKGTISHKNENGKEYRLNSKTAVLIVRPRGWHLEEKHMQVDGKNMSGSLVDFGLYFFHNAKALLEKGSGPYFYLPKMESYLEARLWNDVFVFAQKYIGIPNGTIKATVLLETIHASFEMDEILYELKDHSAGLNCGRWDYIFSFLKAFRNHNEFLLPDRAQVTMTAPFMRAYSLKVIQTCHRRNAPAIGGMAAQIPIKNNPEANEAAFEKVRADKEREALDGHDGTWVAHPGLVPVAMEVFNHIMKTPNQIFRKREEIHVTEKDLLEVPVGTITEEGLRMNISVGIQYIASWLSGRGAAPIYNLMEDAATAEISRAQVWQWIRHEGGKLNDGRNITLELMEELKEEELAKIEREIGKEAFKKGRFQEATTLFTNLVRNDEFVPFLTLPGYEIL
3ETI , Knot 78 168 0.79 38 117 158
DLILPFYKAGKVSFYQGDLDVLINFLEPDVLVNAANGDLRHVGGVARAIDVFTGGKLTKRSKEYLKSSKAIAPGNAVLFENVLEHLSVMNAVGPRNGDSRVEGKLCNVYKAIAKCDGKILTPLISVGIFKVKLEVSLQCLLKTVTDRDLNVFVYTDQERVTIENFFNG
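The page does not say which Lempel–Ziv variant produces the LZ(w) column. The sketch below assumes the exhaustive-history parsing of Lempel and Ziv (1976), so it may not reproduce the listed values exactly. Each new phrase is built from the longest prefix of the unparsed suffix that can be copied from an earlier starting position, plus one fresh symbol. The ratio function follows the column header \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\). Function names are illustrative.

```python
import math


def lz76_complexity(w: str) -> int:
    """Phrase count of an LZ76 exhaustive-history parsing (assumed variant)."""
    n, pos, phrases = len(w), 0, 0
    while pos < n:
        length = 0
        # Grow the copied part while w[pos:pos+length+1] also starts before pos
        # (overlapping copies are allowed).
        while pos + length < n and w[pos:pos + length + 1] in w[:pos + length]:
            length += 1
        phrases += 1
        pos += length + 1  # copied part plus one fresh symbol
    return phrases


def normalized_lz(lz: int, n: int) -> float:
    """LZ(w) / (n / log_20 n); needs n >= 2."""
    return lz / (n / math.log(n, 20))


if __name__ == "__main__":
    # Parsed as 0 . 001 . 10 . 100 . 1000 . 101
    print(lz76_complexity("0001101001000101"))  # 6
```

The ratio column appears to be truncated rather than rounded. For example, \(219/(549/\log_{20} 549)\approx 0.840\) is listed as 0.83, and \(216/(528/\log_{20} 528)\approx 0.856\) is listed as 0.85.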

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the amino-acid sequence, as given in the FASTA file, of the entry with 4-character Protein Data Bank code \(c\).
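The definition of \(P_w(n)\) translates directly into a set comprehension over all windows of length \(n\). The sketch below is minimal, and the names are illustrative. Read literally, this definition gives \(p_w(1)\le 20\) for any protein sequence. The \(p_w(1)\) column above (40, 40, 38) is exactly twice the number of distinct letters in each sequence (3ETI_1 has no W). The site therefore seems to use a different counting convention for that column, and the sketch does not attempt to reproduce it.

```python
def distinct_subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(distinct_subwords(w, n))


if __name__ == "__main__":
    print(sorted(distinct_subwords("ABCAB", 2)))  # ['AB', 'BC', 'CA']
```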

\(|P_{f(5ALJ_1)}(2) \setminus P_{f(3CUX_1)}(2)|=75\) and \(|P_{f(3CUX_1)}(2) \setminus P_{f(5ALJ_1)}(2)|=74\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For instance, \(Z_2(f(5ALJ_1),f(3CUX_1))=75+74=149\).

Hydrophobic-polar version of Sequence 1 (5ALJ_1): 110101111010111111111111100001111101110011001110110001101010100111110000000000101011001010011001101001001110111110001100111000110001000111011001010101110000111101010100111001010100111100111010110011110111000001100100101101100111110000100100101010101010110110111101001110010010001111101100111101010100011101000010110001101100111001111100111111101111010010111010011111010101100101011100010100111101010001000100110100001101001001111110010010100110000101010010001101110100010001011000110011111111010001111101000100111010010100010100100100100111011000100
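The page does not say which residues count as hydrophobic (1) and which as polar (0). The set `HYDROPHOBIC` below is an assumption inferred from the data. It reproduces the first 100 characters of the string above for the first 100 residues of 5ALJ_1. H and Y do not occur in that prefix, so treating them as polar is a guess that may not match the site.

```python
# Assumed hydrophobic residues, inferred from the page rather than stated on it.
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp_encode(w: str) -> str:
    """Map each residue to '1' (assumed hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)


if __name__ == "__main__":
    print(hp_encode("GMTLRAAVFD"))  # 1101011110
```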
Pair \(Z_2\) Length of longest common subword (contiguous substring)
5ALJ_1,3CUX_1 149 4
5ALJ_1,3ETI_1 203 3
3CUX_1,3ETI_1 192 5
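Both columns of the pair table can be computed directly from two sequences. The sketch below is minimal. `z_distance` follows the definition of \(Z_k\) literally. `longest_common_subword_length` is the standard \(O(|x|\,|y|)\) dynamic program for the longest common contiguous substring. The lengths listed above (3 to 5) are far too short to be longest common subsequences of proteins this long, which is why the substring reading is used here. Function names are illustrative.

```python
def distinct_subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = distinct_subwords(x, k), distinct_subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_subword_length(x: str, y: str) -> int:
    """Length of the longest contiguous substring shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1], y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

Applied to \(f(5ALJ_1)\) and \(f(3CUX_1)\) with \(k=2\), `z_distance` should give \(75+74=149\), matching the first row.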

Newick tree

 
(3ETI_1:10.65,(5ALJ_1:74.5,3CUX_1:74.5):31.15);

Let \(d\) and \(d_1\) denote the Otu–Sayood distances of the same names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{1077}{\log_{20} 1077}-\frac{528}{\log_{20}528}\right)\approx 142.7\), where \(1077=549+528\) is the combined length of the two sequences.
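The arithmetic of the estimate can be checked in a few lines. The factors 0.85 and 0.8 are copied from the text as given. The helper name `n_over_log20` is illustrative.

```python
import math


def n_over_log20(n: int) -> float:
    """n / log_20 n, the scale used to normalize LZ complexity on this page."""
    return n / math.log(n, 20)


# 549 + 528 = 1077 is the combined length of 5ALJ_1 and 3CUX_1.
estimate = 0.85 * 0.8 * (n_over_log20(549 + 528) - n_over_log20(528))
print(f"{estimate:.1f}")  # 142.7
```

The value is about 142.66, which becomes 142 when truncated to an integer.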
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 5ALJ_1 3CUX_1 180 177

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
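The sketch below assumes the usual Otu–Sayood definitions \(d=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1=c(SQ)-c(S)+c(QS)-c(Q)\), where \(c\) is an LZ76 complexity. Under those definitions, the min and max in \(\delta\) range over the two complexity gains, and the cases \(\alpha=0\) and \(\alpha=1/2\) recover \(d\) and \(d_1/2\). The complexity variant is assumed here (exhaustive-history parsing), so the output need not equal the 180 and 177 reported above. Function names are illustrative.

```python
def lz76_complexity(w: str) -> int:
    """Phrase count of an LZ76 exhaustive-history parsing (assumed variant)."""
    n, pos, phrases = len(w), 0, 0
    while pos < n:
        length = 0
        while pos + length < n and w[pos:pos + length + 1] in w[:pos + length]:
            length += 1
        phrases += 1
        pos += length + 1
    return phrases


def complexity_gains(s: str, q: str) -> tuple[int, int]:
    """(c(sq) - c(s), c(qs) - c(q)): extra phrases when one sequence follows the other."""
    return (lz76_complexity(s + q) - lz76_complexity(s),
            lz76_complexity(q + s) - lz76_complexity(q))


def otu_sayood_d(s: str, q: str) -> int:
    return max(complexity_gains(s, q))


def otu_sayood_d1(s: str, q: str) -> int:
    return sum(complexity_gains(s, q))


def delta(s: str, q: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two gains."""
    a, b = complexity_gains(s, q)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)
```

For intermediate values of \(\alpha\), `delta` interpolates between the larger gain (\(\alpha=0\)) and the average of the two gains (\(\alpha=1/2\)).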