CoV2D Browser™

Parikh vectors
3WKD_1 5JAQ_1 4JYI_1 Letter Amino acid
26 30 16 D Aspartic acid
41 22 20 E Glutamic acid
16 8 12 H Histidine
24 26 19 I Isoleucine
41 37 14 A Alanine
19 14 5 N Asparagine
62 33 35 L Leucine
34 26 17 K Lysine
25 21 22 T Threonine
12 5 1 W Tryptophan
13 14 4 Y Tyrosine
13 6 6 C Cysteine
33 40 15 G Glycine
25 8 12 M Methionine
25 14 9 F Phenylalanine
36 16 14 P Proline
32 22 17 S Serine
37 30 6 V Valine
26 13 14 R Arginine
21 21 9 Q Glutamine
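
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch of how such a row could be computed, assuming the sequence is available as a plain string (the helper name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative, not part of the site):

```python
from collections import Counter

# One-letter codes in the row order of the table above (assumed ordering).
AMINO_ACIDS = "DEHIANLKTWYCGMFPSVRQ"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in w."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# First ten residues of the 3WKD_1 sequence, used as a small example.
prefix = "MTLRAAVFDL"
for letter, count in zip(AMINO_ACIDS, parikh_vector(prefix)):
    if count:
        print(letter, count)
```

Applied to a full sequence, the entries sum to its length; the 3WKD_1 column above sums to 561.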

>3WKD_1|Chain A|Bifunctional epoxide hydrolase 2|Homo sapiens (9606)
>5JAQ_1|Chain A|Enoyl-[acyl-carrier-protein] reductase [NADH]|Yersinia pestis (632)
>4JYI_1|Chains A, B|Retinoic acid receptor beta|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3WKD , Knot 222 561 0.83 40 261 515
MTLRAAVFDLDGVLALPAVFGVLGRTEEALALPRGLLNDAFQKGGPEGATTRLMKGEITLSQWIPLMEENCRKCSETAKVCLPKNFSIKEIFDKAISARKINRPMLQAALMLRKKGFTTAILTNTWLDDRAERDGLAQLMCELKMHFDFLIESCQVGMVKPEPQIYKFLLDTLKASPSEVVFLDDIGANLKPARDLGMVTILVQDTDTALKELEKVTGIQLLNTPAPLPTSCNPSDMSHGYVTVKPRVRLHFVELGSGPAVCLCHGFPESWYSWRYQIPALAQAGYRVLAMDMKGYGESSAPPEIEEYCMEVLCKEMVTFLDKLGLSQAVFIGHDWGGMLVWYMALFYPERVRAVASLNTPFIPANPNMSPLESIKANPVFDYQLYFQEPGVAEAELEQNLSRTFKSLFRASDESVLSMHKVCEAGGLFVNSPEEPSLSRMVTEEEIQFYVQQFKKSGFRGPLNWYRNMERNWKWACKSLGRKILIPALMVTAEKDFVLVPQMSQHMEDWIPHLKRGHIEDCGHWTQMDKPTEVNQILIKWLDSDARNPPVVSKMHHHHHH
5JAQ , Knot 170 406 0.83 40 234 391
RGSHMLEMIIKPRVRGFICVTAHPTGCEANVKKQIDYVTTEGPIANGPKRVLVIGASTGYGLAARITAAFGCGADTLGVFFERPGEEGKPGTSGWYNSAAFHKFAAQKGLYAKSINGDAFSDEIKQLTIDAIKQDLGQVDQVIYSLASPRRTHPKTGEVFNSALKPIGNAVNLRGLDTDKEVIKESVLQPATQSEIDSTVAVMGGEDWQMWIDALLDAGVLAEGAQTTAFTYLGEKITHDIYWNGSIGAAKKDLDQKVLAIRESLAAHGGGDARVSVLKAVVCQASSAIPMMPLYLSLLFKVMKEKGTHEGCIEQVYSLYKDSLCGDSPHMDQEGRLRADYKELDPEVQNQVQQLWDQVTNDNIYQLTDFVGYKSEFLNLFGFGIDGVDYDADVNPDVKIPNLIQG
4JYI , Knot 119 267 0.83 40 172 257
MGSSHHHHHHSSGLVPRGSHMESYEMTAELDDLTEKIRKAHQETFPSLCQLGKYTTNSSADHRVRLDLGLWDKFSELATKCIIKIVEFAKRLPGFTGLTIADQITLLKAACLDILILRICTRYTPEQDTMTFSDGLTLNRTQMHNAGFGPLTDLVFTFANQLLPLEMDDTETGLLSAICLICGDRQDLEEPTKVDKLQEPLLEALKIYIRKRRPSKPHMFPKILMKITDLRSISAKGAERVITLKMEIPGSMPPLIQEMMENSEGHE
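
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below assumes that \(\mathrm{LZ}(w)\) is the number of phrases in the Lempel-Ziv 1976 exhaustive-history parsing; the site may use a different variant, so `lz76_complexity` is an illustration rather than the site's implementation. The function names are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of phrases in the LZ76 exhaustive-history parsing of w.

    Assumption: each phrase is the shortest prefix of the remaining text
    that does not occur earlier (overlapping copies allowed).
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_ratio(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

# Classic textbook example: 0 | 001 | 10 | 100 | 1000 | 101
print(lz76_complexity("0001101001000101"))  # 6

# (LZ, n) pairs taken from the table above.
for lz, n in [(222, 561), (170, 406), (119, 267)]:
    print(lz, n, normalized_ratio(lz, n))
```

For the three rows above the ratio falls between 0.83 and 0.84, which agrees with the tabulated 0.83 if the site truncates to two decimals.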

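Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. (Here \(n\) is the subword length, not the sequence length \(n=|w|\) used in the table above.) Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).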
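
As a small illustration of these definitions, the following sketch collects the distinct subwords of a given length with a sliding window. The names `subwords` and `subword_complexity` are hypothetical, and the example uses only the first ten residues of the 4JYI_1 sequence.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

w = "MGSSHHHHHH"  # first ten residues of 4JYI_1
print(sorted(subwords(w, 2)))      # ['GS', 'HH', 'MG', 'SH', 'SS']
print(subword_complexity(w, 1))    # 4
print(subword_complexity(w, 3))    # 5
```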

\(|P_{f(3WKD_1)}(2) \setminus P_{f(5JAQ_1)}(2)|=91\), \(|P_{f(5JAQ_1)}(2) \setminus P_{f(3WKD_1)}(2)|=64\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \), the size of the symmetric difference of the two sets of length-\(k\) subwords; this is an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(3WKD_1),f(5JAQ_1))=91+64=155\).

Hydrophobic-polar version of Sequence 1:
101011110101111111111111000011111011100110011101100011010101001111100000000001010110010100110011010010011101111100011001110001100010001110110010101011100001111010101001110010101001111001110101100111101110000011001001011011001111100001001001010101010101101101111010011100100100011111011001111010101000111010000101100011011001110011111001111111011110100101110100111110101011001010111000101001111010100010001001101000011010010011111100100101001100001010100100011011101000100010110001100111111110100011111010001001110100101000101001001001001110110001001111001000000
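
The hydrophobic-polar string can be reproduced by mapping each residue to 1 or 0. The page does not state its classification; the set below (A, F, G, I, L, M, P, V, W as hydrophobic) is an assumption inferred by aligning the binary string with the 3WKD_1 sequence, and `HYDROPHOBIC` and `hp_string` are hypothetical names.

```python
# Assumed hydrophobic residues, inferred from the displayed binary string;
# every other amino acid is treated as polar.
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(w):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in w)

# First thirty residues of the 3WKD_1 sequence.
print(hp_string("MTLRAAVFDLDGVLALPAVFGVLGRTEEAL"))
# 101011110101111111111111000011
```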
Pair \(Z_2\) Length of longest common subword (contiguous)
3WKD_1,5JAQ_1 155 5
3WKD_1,4JYI_1 169 6
5JAQ_1,4JYI_1 160 5
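
Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two subword sets. The last column is read here as the length of the longest common contiguous subword, an assumption suggested by the small values 5 and 6 (the Parikh vectors alone force a common subsequence of length at least 33 for the first pair, since both sequences contain L at least 33 times). The function names are hypothetical and the inputs are toy strings, not the full sequences.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    return len(subwords(x, k) ^ subwords(y, k))  # ^ is symmetric difference

def longest_common_subword_length(x, y):
    """Length of the longest contiguous string occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common suffix ending at x[i-1], y[j-1].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

x, y = "MGSSHH", "RGSHML"  # toy inputs
print(z_distance(x, y, 2))                  # 6
print(longest_common_subword_length(x, y))  # 2
```

The dynamic program uses time proportional to \(|x|\cdot|y|\), which is ample for sequences of a few hundred residues.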

Newick tree

 
[
	4JYI_1:83.81,
	[
		3WKD_1:77.5,5JAQ_1:77.5
	]:6.31
]
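
The bracketed tree above can be written in standard Newick syntax, which uses parentheses and a terminating semicolon. The sketch below assumes a simple nested-list encoding of the tree (a hypothetical representation chosen for this example) and also checks that every leaf lies at the same depth from the root.

```python
# Each node is (label, branch_length); an internal node has a list of
# children in place of a label. This encoding is an assumption.
tree = [
    ("4JYI_1", 83.81),
    ([("3WKD_1", 77.5), ("5JAQ_1", 77.5)], 6.31),
]

def to_newick(children):
    """Serialize a list of child nodes as a parenthesized Newick group."""
    parts = []
    for label, length in children:
        name = to_newick(label) if isinstance(label, list) else label
        parts.append(f"{name}:{length}")
    return "(" + ",".join(parts) + ")"

def leaf_depths(children, depth=0.0):
    """Distance from the root to each leaf."""
    depths = {}
    for label, length in children:
        if isinstance(label, list):
            depths.update(leaf_depths(label, depth + length))
        else:
            depths[label] = depth + length
    return depths

newick = to_newick(tree) + ";"
print(newick)  # (4JYI_1:83.81,(3WKD_1:77.5,5JAQ_1:77.5):6.31);
print(leaf_depths(tree))
```

All three leaves sit at depth 83.81 (since 6.31 + 77.5 = 83.81), so the tree is ultrametric, and the height 77.5 of the inner pair is half of \(Z_2=155\) for 3WKD_1 and 5JAQ_1.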

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{967}{\log_{20} 967}-\frac{406}{\log_{20}406}\right)\approx 148\), where \(967=561+406\) is the combined length of the two query sequences.
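
The arithmetic in this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are taken from the formula as given, 967 is read as 561 + 406 (the lengths of 3WKD_1 and 5JAQ_1), and the helper name `scale` is hypothetical.

```python
import math

def scale(n, alphabet_size=20):
    """n / log_{alphabet_size}(n), the normalizing quantity for length n."""
    return n / math.log(n, alphabet_size)

len_x, len_y = 561, 406  # lengths of 3WKD_1 and 5JAQ_1
estimate = 0.85 * 0.8 * (scale(len_x + len_y) - scale(len_y))
print(estimate)       # a little under 149
print(int(estimate))  # 148
```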
Status Protein1 Protein2 d d1/2
Query variables 3WKD_1 5JAQ_1 188 161
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
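
A minimal sketch of this interpolation, assuming that min and max range over the two one-sided complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\) of the Otu-Sayood construction, so that \(d\) is their maximum and \(d_1\) their sum. The function name `delta` is hypothetical. The increment 134 is not reported on the page; it is inferred from \(d=188\) and \(d_1/2=161\) under the assumption that 161 is exact.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Assumed one-sided increments for the query pair 3WKD_1, 5JAQ_1:
# the larger is d = 188; the smaller follows from 2 * 161 - 188 = 134.
a, b = 134, 188

print(delta(0, a, b))    # 188, i.e. d
print(delta(0.5, a, b))  # 161.0, i.e. d1 / 2
```

Setting \(\alpha=1\) would instead give the smaller of the two increments.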