CoV2D Browser™

Parikh vectors
4URF_1 8WKQ_1 6ASH_1 Letter Amino acid
12 41 15 S Serine
19 48 15 V Valine
8 19 19 K Lysine
8 44 9 Q Glutamine
31 35 24 G Glycine
19 22 8 I Isoleucine
19 51 12 L Leucine
7 7 13 Y Tyrosine
32 58 19 A Alanine
9 25 9 D Aspartic acid
1 0 8 C Cysteine
6 12 5 F Phenylalanine
5 33 8 P Proline
13 39 7 R Arginine
17 36 10 E Glutamic acid
5 8 2 H Histidine
3 10 4 M Methionine
20 33 5 T Threonine
4 7 4 W Tryptophan
10 32 19 N Asparagine
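A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch of that count; the helper name `parikh_vector` is hypothetical, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# Row order of the table above (one-letter amino acid codes).
ALPHABET = "SVKQGILYADCFPREHMTWN"

def parikh_vector(seq, alphabet=ALPHABET):
    """Return the letter counts of seq, listed in the order of alphabet."""
    counts = Counter(seq)
    return [counts[letter] for letter in alphabet]

# First 20 residues of the 4URF_1 sequence listed further down the page.
prefix = "MLLEGKTALVTGAGNGIGRT"
for letter, count in zip(ALPHABET, parikh_vector(prefix)):
    print(letter, count)
```

Applied to a full FASTA sequence, the same function should give one column of the table, assuming the site counts plain one-letter codes.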

>4URF_1|Chains A, B|CYCLOHEXANOL DEHYDROGENASE|AROMATOLEUM AROMATICUM EBN1 (76114)
>8WKQ_1|Chains AA[auth 1], AC[auth 2], A[auth 0], BA[auth 4], BC[auth 5], B[auth 3], CA[auth 7], CC[auth 8], C[auth 6], DA[auth AA], DC[auth AB], D[auth 9], EA[auth AD], EC[auth AE], E[auth AC], FC[auth AG], F[auth AF], GB[auth AI], GC[auth AJ], G[auth AH], HB[auth AL], HC[auth AM], H[auth AK], IB[auth AO], IC[auth AP], I[auth AN], JB[auth UI], JC[auth UJ], J[auth AQ], KB[auth UL], KC[auth UM], K[auth UK], LB[auth UO], LC[auth UP], L[auth UN], MB[auth WB], MC[auth WC], M[auth WA], NB[auth WE], NC[auth WF], N[auth WD], OB[auth WH], OC[auth WI], O[auth WG], PB[auth WK], PC[auth WL], P[auth WJ], QB[auth WN], QC[auth WO], Q[auth WM], RB[auth WQ], RC[auth WR], R[auth WP], SB[auth WT], SC[auth WU], S[auth WS], TB[auth WW], TC[auth b], T[auth WV], UB[auth d], UC[auth e], U[auth c], VB[auth g], VC[auth h], V[auth f], WB[auth j], WC[auth k], W[auth i], XB[auth t], XC[auth u], X[auth l], YB[auth w], YC[auth x], Y[auth v], ZB[auth z], Z[auth y]|Flagellar M-ring protein|Salmonella enterica subsp. enterica serovar Typhimurium str. LT2 (99287)
>6ASH_1|Chain A|Cathepsin K|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4URF , Knot 108 248 0.80 40 152 239
MLLEGKTALVTGAGNGIGRTIALTYAAEGANVVVSDISDEWGRETLALIEGKGGKAVFQHADTAHPEDHDELIAAAKRAFGRLDIACNNAGISGEFTPTAETTDAQWQRVIGINLSGVFYGVRAQIRAMLETGGGAIVNISSIAGQIGIEGITPYTAAKHGVVGLTKTVAWEYGSKGIRINSVGPAFINTTLVQNVPLETRRQLEQMHALRRLGETEEVANLVAWLSSDKASFVTGSYYAVDGGYLAR
8WKQ , Knot 220 560 0.82 38 233 518
MSATASTATQPKPLEWLNRLRANPRIPLIVAGSAAVAIVVAMVLWAKTPDYRTLFSNLSDQDGGAIVAQLTQMNIPYRFANGSGAIEVPADKVHELRLRLAQQGLPKGGAVGFELLDQEKFGISQFSEQVNYQRALEGELARTIETLGPVKSARVHLAMPKPSLFVREQKSPSASVTVTLEPGRALDEGQISAVVHLVSSAVAGLPPGNVTLVDQSGHLLTQSNTSGRDLNDAQLKFANDVESRIQRRIEAILSPIVGNGNVHAQVTAQLDFANKEQTEEHYSPNGDASKATLRSRQLNISEQVGAGYPGGVPGALSNQPAPPNEAPIATPPTNQQNAQNTPQTSTSTNSNSAGPRSTQRNETSNYEVDRTIRHTKMNVGDIERLSVAVVVNYKTLADGKPLPLTADQMKQIEDLTREAMGFSDKRGDTLNVVNSPFSAVDNTGGELPFWQQQSFIDQLLAAGRWLLVLVVAWILWRKAVRPQLTRRVEEAKAAQEQAQVRQETEEAVEVRLSKDEQLQQRRANQRLGAEVMSQRIREMSDNDPRVVALVIRQWMSNDHE
6ASH , Knot 99 215 0.82 40 146 208
APDSVDYRKKGYVTPVKNQGQCGSCWAFSSVGALEGQLKKKTGKLLNLAPQNLVDCVSENDGCGGGYMTNAFQYVQKNRGIDSEDAYPYVGQEESCMYNPTGKAAKCRGYREIPEGNEKALKRAVARVGPVSVAIDASLTSFQFYSKGVYYDESCNSDNLNHAVLAVGYGIQKGNKHWIIKNSWGENWGNKGYILMARNKNNACGIANLASFPKM
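The LZ-complexity column and its normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) can be sketched as follows. This assumes the exhaustive-history (LZ76) parse, in which each new component is the shortest block that cannot be copied from earlier text; whether it reproduces the table values exactly depends on the parsing variant the site uses. The function names are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parse of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the component while it still occurs in the text before
        # its own last symbol (the copy source may overlap the component).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_ratio(lz, n, alphabet_size=20):
    """LZ-complexity divided by n / log_alphabet_size(n)."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example: parses as 0 | 001 | 10 | 100 | 1000 | 101.
print(lz76_complexity("0001101001000101"))
# Ratio recomputed from the 4URF row of the table (LZ = 108, n = 248).
print(normalized_ratio(108, 248))
```

The quadratic substring search is kept for readability; for sequences of a few hundred residues it is fast enough.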

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
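As an illustration of these definitions, here is a small sketch that builds the set of distinct length-\(k\) subwords and returns its size. The names `subwords` and `subword_complexity` are hypothetical, not taken from the site.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# "banana" has the length-2 subwords ba, an, na.
print(sorted(subwords("banana", 2)))
print(subword_complexity("banana", 2))
```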

\(|P_{f(4URF_1)}(2) \setminus P_{f(8WKQ_1)}(2)|=35\), \(|P_{f(8WKQ_1)}(2) \setminus P_{f(4URF_1)}(2)|=116\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for this pair, \(Z_2=35+116=151\).

Hydrophobic-polar version of Sequence 1:
11101001110111011100111001101101110010001100011110101101110010010100000111110011101011000111010101010000101001111010111011010101110011111101001110111011010011001111100011100100110100111111000110011100000100101100110000110111110000101101000110110110
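The hydrophobic-polar string can be reproduced with a two-class residue map. The sketch below is an assumption: the hydrophobic set was inferred by aligning the displayed binary string with the 4URF_1 sequence, and the site's actual classification table is not shown here. The names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Residues mapped to 1 (hydrophobic); all others map to 0 (polar).
# Inferred from the displayed string, not taken from the site's code.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq):
    """Encode a protein sequence as a binary hydrophobic-polar string."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in seq)

# First 20 residues of 4URF_1 and the first 20 bits of the string above.
print(hp_encode("MLLEGKTALVTGAGNGIGRT"))
```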
Pair \(Z_2\) Length of longest common subword
4URF_1,8WKQ_1 151 5
4URF_1,6ASH_1 158 3
8WKQ_1,6ASH_1 181 4
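The \(Z_k\) column is the size of the symmetric difference of the two subword sets. A minimal sketch, with hypothetical helper names and a toy pair of words rather than the protein sequences:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# "abab" has {ab, ba}; "abba" has {ab, bb, ba}; only bb is unshared.
print(z_distance("abab", "abba", 2))
```

Passing two FASTA sequences with `k=2` should give the corresponding table entry, assuming the site compares plain one-letter sequences.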

Newick tree

 
(6ASH_1:87.86,(4URF_1:75.5,8WKQ_1:75.5):12.36);

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{808 }{\log_{20} 808}-\frac{248}{\log_{20}248})=154\), where \(808=248+560\) is the combined length of the two sequences.
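The arithmetic in this estimate can be checked directly. In the sketch below the constants 0.85 and 0.8 are taken as given from the formula, and 808 is 248 + 560, the two sequence lengths from the table; the variable names are hypothetical.

```python
import math

def log20(n):
    """Logarithm of n to base 20 (the amino acid alphabet size)."""
    return math.log(n, 20)

n_x, n_y = 248, 560          # lengths of 4URF_1 and 8WKQ_1
n_xy = n_x + n_y             # length of the concatenation, 808

expected = 0.85 * 0.8 * (n_xy / log20(n_xy) - n_x / log20(n_x))
print(round(expected))       # prints 154
```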
Status Protein1 Protein2 d d1/2
Query variables 4URF_1 8WKQ_1 190 134.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
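The interpolation above can be sketched in code. This assumes, consistently with the displayed cases, that d is the larger and d1 the sum of the two terms \(c(xy)-c(x)\) and \(c(yx)-c(y)\), where \(c\) is an LZ76-style complexity; the exact variant used by the site is not shown, and all function names are hypothetical.

```python
def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parse of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood_terms(x, y, c=lz76_complexity):
    """Return (c(xy) - c(x), c(yx) - c(y))."""
    return c(x + y) - c(x), c(y + x) - c(y)

def delta(alpha, a, b):
    """alpha * min + (1 - alpha) * max of the two terms."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Toy strings, not the protein sequences from this page.
a, b = otu_sayood_terms("ACGTACGT", "TTTTGGGG")
d, d1 = max(a, b), a + b
print(d, d1 / 2, delta(0, a, b), delta(0.5, a, b))

# Table row above: d = 190 and d1/2 = 134.5 imply terms 190 and 79.
print(delta(0, 79, 190), delta(0.5, 79, 190))
```

The value 79 is derived from the table row as \(2(134.5)-190\); it is not printed on the page itself.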