CoV2D Browser™

Parikh vectors
4UVA_1 5MDG_1 6HCB_1 Letter Amino acid
56 2 18 S Serine
22 2 13 Y Tyrosine
9 2 4 C Cysteine
70 6 20 E Glutamic acid
51 5 28 K Lysine
15 3 10 M Methionine
80 7 20 L Leucine
60 4 18 V Valine
35 4 13 D Aspartic acid
46 5 3 Q Glutamine
73 4 26 G Glycine
32 3 16 I Isoleucine
27 2 7 F Phenylalanine
66 4 7 P Proline
44 7 17 T Threonine
9 1 4 W Tryptophan
88 6 19 A Alanine
43 4 10 R Arginine
31 3 9 N Asparagine
15 0 2 H Histidine
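
The Parikh vector of a sequence simply records how often each letter occurs. As a minimal sketch (the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative choices, not part of the CoV2D code), the 5MDG_1 column can be recomputed from the sequence listed further down the page:

```python
from collections import Counter

# The 20 amino-acid letters, in the row order used by the table above.
AMINO_ACIDS = "SYCEKMLVDQGIFPTWARNH"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return the number of occurrences in w of each letter of the alphabet."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# Sequence f(5MDG) as listed on this page.
seq_5mdg = ("TSILDIRQGPKEPFRDYVDRFYKTLRAEQASQEVKNWMTETLLVQNANPDCKTILKALGPGATLEEMM"
            "TACQGV")

vec = parikh_vector(seq_5mdg)
print(dict(zip(AMINO_ACIDS, vec)))
```

The entries of a Parikh vector sum to the length of the sequence, which gives a quick consistency check against the length column \(n=|w|\).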

>4UVA_1|Chain A|LYSINE-SPECIFIC HISTONE DEMETHYLASE 1A|HOMO SAPIENS (9606)
>5MDG_1|Chains A, B, C, D, H[auth E], I[auth F], J[auth G], K[auth W]|Gag protein|Human immunodeficiency virus 1 (11676)
>6HCB_1|Chains A, B|Glutamate receptor 2|Rattus norvegicus (10116)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4UVA , Knot 321 872 0.83 40 294 772
MLSGKKAAAAAAAAAAAATGTEAGPGTAGGSENGSEVAAQPAGLSGPAEVGPGAVGERTPRKKEPPRASPPGGLAEPPGSAGPQAGPTVVPGSATPMETGIAETPEGRRTSRRKRAKVEYREMDESLANLSEDEYYSEEERNAKAEKEKKLPPPPPQAPPEEENESEPEEPSGQAGGLQDDSSGGYGDGQPSGVEGAAFQSRLPHDRMTSQEAACFPDIISGPQQTQKVFLFIRNRTLQLWLDNPKIQLTFEATLQQLEAPYNSDTVLVHRVHSYLERHGLINFGIYKRIKPLPTKKTGKVIIIGSGVSGLAAARQLQSFGMDVTLLEARDRVGGRVATFRKGNYVADLGAMVVTGLGGNPMAVVSKQVNMELAKIKQKCPLYEANGQAVPKEKDEMVEQEFNRLLEATSYLSHQLDFNVLNNKPVSLGQALEVVIQLQEKHVKDEQIEHWKKIVKTQEELKELLNKMVNLKEKIKELHQQYKEASEVKPPRDITAEFLVKSKHRDLTALCKEYDELAETQGKLEEKLQELEANPPSDVYLSSRDRQILDWHFANLEFANATPLSTLSLKHWDQDDDFEFTGSHLTVRNGYSCVPVALAEGLDIKLNTAVRQVRYTASGCEVIAVNTRSTSQTFIYKCDAVLCTLPLGVLKQQPPAVQFVPPLPEWKTSAVQRMGFGNLNKVVLCFDRVFWDPSVNLFGHVGSTTASRGELFLFWNLYKAPILLALVAGEAAGIMENISDDVIVGRCLAILKGIFGSSAVPQPKETVVSRWRADPWARGSYSYVAAGSSGNDYDLMAQPITPGPSIPGAPQPIPRLFFAGEHTIRNYPATVHGALLSGLREAGRIADQFLGAMYTLPRQATPGVPAQQSPSM
5MDG , Knot 43 74 0.83 38 66 72
TSILDIRQGPKEPFRDYVDRFYKTLRAEQASQEVKNWMTETLLVQNANPDCKTILKALGPGATLEEMMTACQGV
6HCB , Knot 122 264 0.86 40 173 254
GANKTVVVTTILESPYVMMKKNHEMLEGNERYEGYCVDLAAEIAKHCGFKYKLTIVGDGKYGARDADTKIWNGMVGELVYGKADIAIAPLTITLVREEVIDFSKPFMSLGISIMIKKGTPIESAEDLSKQTEIAYGTLDSGSTKEFFRRSKIAVFDKMWTYMRSAEPSVFVRTTAEGVARVRKSKGKYAYLLESTMNEYIEQRKPCDTMKVGGNLDSKGYGIATPKGSSLGNAVNLAVLKLSEQGLLDKLKNKWWYDKGECGSG
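
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be reproduced from the LZ-complexity and length columns. The sketch below also includes one common way of counting LZ76 components (exhaustive-history parsing); whether the page uses exactly this variant is an assumption, and the function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Count the components of an LZ76-style exhaustive-history parsing of w.

    Assumption: each component is the shortest prefix of the remaining text
    that cannot be copied from an earlier starting position.
    """
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the component while it still occurs starting at an earlier
        # position (the copied occurrence may overlap the component itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n) for alphabet size b."""
    return lz / (n / math.log(n, alphabet_size))

# Rows of the table above: (LZ-complexity, length).
for code, lz, n in [("4UVA", 321, 872), ("5MDG", 43, 74), ("6HCB", 122, 264)]:
    print(code, round(normalized_lz(lz, n), 2))
```

With the tabulated values this prints 0.83, 0.83 and 0.86, in agreement with the fourth column.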

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
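
A minimal sketch of these definitions, with illustrative function names and a toy word rather than a protein sequence:

```python
def subwords(w, n):
    """P_w(n): the set of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

w = "abracadabra"
print([subword_complexity(w, n) for n in (1, 2, 3)])
```

A word of length \(|w|\) has \(|w|-n+1\) intervals of length \(n\), so \(p_w(n)\) can never exceed that number.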

\(|P_{f(4UVA_1)}(2) \setminus P_{f(5MDG_1)}(2)|=239\), \(|P_{f(5MDG_1)}(2) \setminus P_{f(4UVA_1)}(2)|=11\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11010011111111111101001111011100010011101111011101111111000100001101011111101110111011101111010110011100101000000001010000100011010000000000001010000011111101110000000100101011110000011010101011011110001100010000110110110110000011111000010111001010101010100101100000111001000100011101110001011100001011111011011111001001110101101000111011010010011011111101111011111000101011010000110010101110000011000100110100010001010110001101101101110100001000010010011000001001100110100010010000001001011001010111000000101100000011000101000100101011001010000001101011010110101100101001000001010100101001000111111011010100110010001010011110000000011000011100111111000111101111110100011001111010011101001110101011101100010010111110100111111111101111100100011110011110111100111010001100101011101000011110010000111011011101111101110111110001000110101111011001101100111110011001011111000101
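
The quantity \(Z_k\) is the size of the symmetric difference of two subword sets, and the hydrophobic-polar string is a letter-by-letter recoding. The sketch below illustrates both. The set `HYDROPHOBIC` is an assumption: it agrees with the opening of the binary string above (M, L, G, A, P, V map to 1 and S, K, T, E, N map to 0), but the page does not state the full partition, so the remaining letters are a guess.

```python
def subwords(w, k):
    """The set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Assumed hydrophobic letters; the exact partition used by the page is not given.
HYDROPHOBIC = set("ACFGILMPVW")

def hydrophobic_polar(w):
    """Recode w as a binary string: 1 for hydrophobic, 0 for polar."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)

print(z_distance("abab", "abba", 2))   # {"ab","ba"} vs {"ab","bb","ba"}
print(hydrophobic_polar("MLSGKK"))     # first six residues of f(4UVA)
```

For the pair 4UVA_1, 5MDG_1 the two one-sided counts quoted above give \(Z_2 = 239 + 11 = 250\).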
Pair \(Z_2\) Length of longest common subword
4UVA_1,5MDG_1 250 4
4UVA_1,6HCB_1 193 3
5MDG_1,6HCB_1 165 3

Newick tree

 
(
	4UVA_1:11.81,
	(
		6HCB_1:82.5,5MDG_1:82.5
	):37.31
);

Let \(d\) be the Otu--Sayood distance of that name.
Let \(d_1\) be the Otu--Sayood distance of that name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{946 }{\log_{20} 946}-\frac{74}{\log_{20}74})\approx 246.\)
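
The arithmetic in this estimate can be checked directly. The sketch below takes the two constant factors and the two lengths exactly as they appear in the formula; the helper name `lz_scale` is illustrative.

```python
import math

def lz_scale(n, alphabet_size=20):
    """Return n / log_b(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

# Constants and lengths copied from the formula above.
expected = 0.85 * 0.8 * (lz_scale(946) - lz_scale(74))
print(round(expected, 1))
```

The result is about 246.2, which rounds to the quoted value 246.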
Status Protein1 Protein2 d d1/2
Query variables 4UVA_1 5MDG_1 305 165
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
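
A small sketch of this interpolation follows. It assumes that the minimum and maximum are taken over two one-sided quantities `a` and `b`, so that \(d\) is their maximum and \(d_1\) their sum. The input values 25 and 305 are hypothetical, chosen only so that the outputs match the query row above (\(d=305\), \(d_1/2=165\)).

```python
def delta(alpha, a, b):
    """Interpolate between max(a, b) at alpha = 0 and min(a, b) at alpha = 1."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Hypothetical one-sided values, chosen to reproduce d = 305 and d1/2 = 165.
a, b = 25, 305
print(delta(0, a, b))     # the maximum, playing the role of d
print(delta(0.5, a, b))   # the average (a + b) / 2, playing the role of d1/2
```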