CoV2D Browser™

Parikh vectors
5LLU_1 8TWC_1 2EZN_1 Letter Amino acid
49 7 7 I Isoleucine
68 9 8 L Leucine
20 2 0 M Methionine
20 6 3 R Arginine
14 4 6 Q Glutamine
43 7 7 G Glycine
3 1 1 H Histidine
25 7 5 K Lysine
19 8 1 P Proline
28 13 12 S Serine
2 2 1 W Tryptophan
2 2 4 C Cysteine
31 6 6 E Glutamic acid
12 2 3 Y Tyrosine
21 3 3 F Phenylalanine
48 10 4 V Valine
48 14 6 A Alanine
19 9 9 N Asparagine
17 5 5 D Aspartic acid
33 12 10 T Threonine
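A Parikh vector records how often each letter occurs in a word, ignoring order. Each column above is one such vector over the 20 amino-acid letters, and each column sums to its sequence length (522, 129 and 101). A minimal Python sketch, assuming plain one-letter FASTA strings; the function name `parikh_vector` and the constant `ALPHABET` (the table's row order) are illustrative choices, not part of the page.

```python
from collections import Counter

# Row order of the table above (one-letter amino-acid codes).
ALPHABET = "ILMRQGHKPSWCEYFVANDT"

def parikh_vector(w: str) -> list[int]:
    """Count occurrences of each amino-acid letter in w, in ALPHABET order."""
    counts = Counter(w)
    return [counts[a] for a in ALPHABET]

if __name__ == "__main__":
    # First ten residues of Sequence 1 (5LLU_1).
    v = parikh_vector("MTKSNGEEPK")
    print(dict(zip(ALPHABET, v)))
    print(sum(v))  # equals len(w) when w uses only these 20 letters
```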

>5LLU_1|Chain A|Excitatory amino acid transporter 1,Neutral amino acid transporter B(0),Excitatory amino acid transporter 1|Homo sapiens (9606)
>8TWC_1|Chains AA[auth AC], AB[auth AD], AC[auth AE], AD[auth AF], AE[auth AG], AF[auth AH], A[auth AB], BA[auth AJ], BB[auth AK], BC[auth AL], BD[auth AM], BE[auth AN], BF[auth AO], B[auth AI], CA[auth AQ], CB[auth AR], CC[auth AS], CD[auth AT], CE[auth AU], CF[auth AV], C[auth AP], DA[auth AX], DB[auth AY], DC[auth AZ], DD[auth BA], DE[auth BB], DF[auth BC], D[auth AW], EA[auth BE], EB[auth BF], EC[auth BG], ED[auth BH], EE[auth BI], EF[auth BJ], E[auth BD], FA[auth BL], FB[auth BM], FC[auth BN], FD[auth BO], FE[auth BP], FF[auth BQ], F[auth BK], GA[auth BS], GB[auth BT], GC[auth BU], GD[auth BV], GE[auth BW], GF[auth BX], G[auth BR], HA[auth BZ], HB[auth CA], HC[auth CB], HD[auth CC], HE[auth CD], HF[auth CE], H[auth BY], IA[auth CG], IB[auth CH], IC[auth CI], ID[auth CJ], IE[auth CK], IF[auth CL], I[auth CF], JA[auth CN], JB[auth CO], JC[auth CP], JD[auth CQ], JE[auth CR], JF[auth CS], J[auth CM], KA[auth CU], KB[auth CV], KC[auth CW], KD[auth CX], KE[auth CY], KF[auth CZ], K[auth CT], LA[auth DB], LB[auth DC], LC[auth DD], LD[auth DE], LE[auth DF], LF[auth DG], L[auth DA], MA[auth DI], MB[auth DJ], MC[auth DK], MD[auth DL], ME[auth DM], MF[auth DN], M[auth DH], NA[auth DP], NB[auth DQ], NC[auth DR], ND[auth DS], NE[auth DT], NF[auth DU], N[auth DO], OA[auth DW], OB[auth DX], OC[auth DY], OD[auth DZ], OE[auth EA], OF[auth EB], O[auth DV], PA[auth ED], PB[auth EE], PC[auth EF], PD[auth EG], PE[auth EH], PF[auth EI], P[auth EC], QA[auth EK], QB[auth EL], QC[auth EM], QD[auth EN], QE[auth EO], QF[auth EP], Q[auth EJ], RA[auth ER], RB[auth ES], RC[auth ET], RD[auth EU], RE[auth EV], RF[auth EW], R[auth EQ], SA[auth EY], SB[auth EZ], SC[auth FA], SD[auth FB], SE[auth FC], SF[auth FD], S[auth EX], TA[auth FF], TB[auth FG], TC[auth FH], TD[auth FI], TE[auth FJ], TF[auth FK], T[auth FE], UA[auth FM], UB[auth FN], UC[auth FO], UD[auth FP], UE[auth FQ], UF[auth FR], U[auth FL], VA[auth FT], VB[auth FU], VC[auth FV], VD[auth FW], VE[auth FX], VF[auth FY], V[auth FS], WA[auth GA], WB[auth GB], WC[auth GC], WD[auth GD], WE[auth GE], WF[auth GF], W[auth FZ], XA[auth GH], XB[auth GI], XC[auth GJ], XD[auth GK], XE[auth GL], XF[auth GM], X[auth GG], YA[auth GO], YB[auth GP], YC[auth GQ], YD[auth GR], YE[auth GS], Y[auth GN], ZA[auth GU], ZB[auth GV], ZC[auth GW], ZD[auth GX], ZE[auth GY], Z[auth GT]|Coat protein|Acinetobacter phage AP205 (154784)
>2EZN_1|Chain A|CYANOVIRIN-N|Nostoc ellipsosporum (45916)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5LLU , Knot 205 522 0.82 40 227 466
MTKSNGEEPKMGGRMERFQQGVSKRTLLAKKKVQNITKEDVKSFLRRNALLLLTVLAVILGVVLGFLLRPYPLSPREVKYFAFPGELLMRMLKMLILPLIVSSLITGLASLDAKASGRLGMRAVVYYMSTTIIAVVLGIILVLIIHPGAASAAITASVGAAGSAENAPSKEVLDCFLDLARNIFPSNLVSAAFRSYSTTYEERTITGTRVKVPVGQEVEGMNILGLVVFSIVFGIALGKMGEQGQLLVDFFNSLNEATMKLVAIIMWYAPLGILFLIAGKIVEMEDLEVLGGQLGMYMVTVIVGLVIHGLIVLPLIYFLITRKNPFVFIAGILQALITALGTSSSSATLPITFKCLEENNGVDKRITRFVLPVGATINMDGTALYEAVAAIFIAQVNNYELDFGQIITISITATAASIGAAGIPQAGLVTMVIVLTAVGLPTDDITLIIAVDWLLDRFRTMVNVLGDALGAGIVEHLSRKELEKQDAELGNSVIEENEMKKPYQLIAQDNETEKPIDSETKM
8TWC , Knot 66 129 0.82 40 104 125
ANKPMQPITSTANKIVWSDPTRLSTTFSASLLRQRVKVGIAELNNVSGQYVSVYKRPAPKPEGCADACVIMPNENQSIRTVISGSAENLATLKAEWETHKRNVDTLFASGNAGLGFLDPTAAIVSSDTT
2EZN , Knot 53 101 0.80 38 84 99
LGKFSQTCYNSAIQGSVLTSTCERTNGGYNTSSIDLNSVIENVDGSLKWQPSNFIETCRNTQLAGSSELAAECKTRAQQFVSTKINLDDHIANIDGTLKYE
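The LZ-complexity column counts phrases in a Lempel–Ziv (1976) parsing of the sequence, and the next column divides that count by \(n/\log_{20} n\), the growth rate expected for a random word over 20 letters. The page does not say which LZ76 implementation it uses, so the sketch below is an assumption: the Kaspar–Schuster exhaustive-history parsing, in which each new phrase is the shortest block that cannot be copied from the text before its last letter (overlap allowed). It is not guaranteed to reproduce the table's LZ values. The names `lz76` and `normalized_lz` are hypothetical.

```python
import math

def lz76(w: str) -> int:
    """Number of phrases in the LZ76 (exhaustive-history) parsing of w."""
    n = len(w)
    count, p = 0, 0
    while p < n:
        length = 1
        # Grow the phrase while it still occurs in w[:p + length - 1],
        # i.e. while it can be copied from earlier text (overlap allowed).
        while p + length <= n and w[p:p + length] in w[:p + length - 1]:
            length += 1
        count += 1  # the phrase ends here (the last one may be incomplete)
        p += length
    return count

def normalized_lz(c: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_k n) with k = alphabet_size."""
    return c / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    # Classic binary example: 0 | 001 | 10 | 100 | 1000 | 101
    print(lz76("0001101001000101"))  # 6
    # Ratio for the 8TWC row, from its tabulated LZ-complexity and length.
    print(normalized_lz(66, 129))
```

With the tabulated pairs (66, 129) and (53, 101), `normalized_lz` gives about 0.830 and 0.808, which the table lists as 0.82 and 0.80, so the column appears to truncate rather than round to two decimals.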

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-character Protein Data Bank code \(c\).
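The definitions of \(P_w(n)\) and \(p_w(n)\) translate directly into a set comprehension over all windows of length \(n\). A short sketch; the function names `subwords` and `p` are illustrative, and the example word is the first 14 residues of Sequence 2 (8TWC_1).

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct length-n subwords (contiguous blocks) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))

if __name__ == "__main__":
    w = "ANKPMQPITSTANK"  # first 14 residues of 8TWC_1
    print(sorted(subwords(w, 3)))
    print(p(w, 1), p(w, 2), p(w, 3))  # 9 11 11
```

Repeated blocks such as ANK are counted once, which is why \(p_w(3)\) is smaller than the 12 windows of length 3.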

\(|P_{f(5LLU_1)}(2) \setminus P_{f(8TWC_1)}(2)|=154\) and \(|P_{f(8TWC_1)}(2) \setminus P_{f(5LLU_1)}(2)|=31\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\); thus \(Z_2(f(5LLU_1),f(8TWC_1))=154+31=185\).
Hydrophobic-polar version of Sequence 1: 100001001011101001001100001110001001000010011000111110111111111111110101101001001111101110110111111110011011101010101011101110010001111111111111101111011101011111010011000110011011001110011011100000000000101001011110010110111111101111111101100101110110010010101111111011111111111011010010111101110110111111101111111101110000111111111011101110000010111010010000110001001111111010101011001111111101000010110110101010110111111101111011111011111000101111101110010011011101111111001000010000101100110000100100111000000011000001
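\(Z_k\) is the size of the symmetric difference of the two subword sets; dividing it by \(|P_x(k)\cup P_y(k)|\) gives the Jaccard distance. The hydrophobic-polar string replaces each residue by 1 or 0. The page does not list its classification, so the set `HYDROPHOBIC` below (A, C, F, G, I, L, M, P, V, W) is an assumption: it agrees with the first 90 characters of the string above, but C, H and W do not occur in those positions, so their assignment is a guess. All function names are hypothetical.

```python
HYDROPHOBIC = set("ACFGILMPVW")  # assumed classification, see lead-in

def subwords(w: str, k: int) -> set[str]:
    """Distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by the size of the union of the two subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    union = px | py
    return len(px ^ py) / len(union) if union else 0.0

def hydrophobic_polar(w: str) -> str:
    """Map each residue to '1' if it is in HYDROPHOBIC, else '0'."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)

if __name__ == "__main__":
    print(z("ABC", "BCD", 2))                 # AB and CD are unshared
    print(jaccard_distance("ABC", "BCD", 2))  # 2 of 3 bigrams unshared
    print(hydrophobic_polar("MTKSNGEEPK"))    # first 10 residues of Sequence 1
```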
Pair \(Z_2\) Length of longest common substring
5LLU_1,8TWC_1 185 5
5LLU_1,2EZN_1 191 5
8TWC_1,2EZN_1 120 3
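The last column measures a contiguous match: a longest common subsequence (gaps allowed) between 8TWC_1 and 2EZN_1 would be at least 12, since each contains at least twelve S residues. The standard dynamic program for the longest common substring is sketched below; `longest_common_substring` is a hypothetical name, and the example strings are short windows taken from the two sequences.

```python
def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest block that occurs contiguously in both x and y."""
    best = 0
    # prev[j]: length of the longest common suffix of x[:i-1] and y[:j]
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    # "NVD" occurs in both 8TWC_1 (...RNVDTL...) and 2EZN_1 (...ENVDGS...).
    print(longest_common_substring("RNVDTL", "ENVDGS"))  # 3
```

The table runs in \(O(|x|\,|y|)\) time and keeps only two rows, which is ample for sequences of a few hundred residues.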

Newick tree

(
	5LLU_1:10.88,
	(
		8TWC_1:60,2EZN_1:60
	):42.88
);

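Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\,\mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)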
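Both Otu–Sayood distances are built from how much new LZ complexity one sequence adds when appended to the other. A sketch assuming the standard definitions \(d=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\) and \(d_1=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). The `lz76` phrase counter (Kaspar–Schuster parsing) is itself an assumption, since the page does not name its LZ implementation, so values need not match the table. Function names are hypothetical.

```python
def lz76(w: str) -> int:
    """Phrase count of the LZ76 exhaustive-history parsing (Kaspar-Schuster)."""
    count, p, n = 0, 0, len(w)
    while p < n:
        length = 1
        # Extend while the block can be copied from earlier text.
        while p + length <= n and w[p:p + length] in w[:p + length - 1]:
            length += 1
        count += 1
        p += length
    return count

def otu_sayood_terms(x: str, y: str) -> tuple[int, int]:
    """Extra complexity of y appended to x, and of x appended to y."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def d(x: str, y: str) -> int:
    """Otu-Sayood d: the larger of the two one-sided terms."""
    return max(otu_sayood_terms(x, y))

def d1(x: str, y: str) -> int:
    """Otu-Sayood d1: the sum of the two one-sided terms."""
    return sum(otu_sayood_terms(x, y))

if __name__ == "__main__":
    print(d("AB", "AB"), d1("AB", "AB"))  # 1 2
```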
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{651}{\log_{20} 651}-\frac{129}{\log_{20}129}\right)\approx 150,\) where \(651=522+129\) is the combined length of the two sequences.
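The arithmetic of the estimate can be checked directly. The factors 0.85 and 0.8 and the lengths 522 and 129 are copied from the formula; the variable names are illustrative.

```python
import math

def log20(n: float) -> float:
    """Logarithm base 20 (the amino-acid alphabet size)."""
    return math.log(n, 20)

# 651 = 522 + 129, the lengths of 5LLU_1 and 8TWC_1.
n_x, n_y = 522, 129
estimate = 0.85 * 0.8 * ((n_x + n_y) / log20(n_x + n_y) - n_y / log20(n_y))
print(round(estimate, 1))
```

The result is about 150.6, reported above as 150.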
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 5LLU_1 8TWC_1 185 113.5
Was not able to put for d
Was not able to put for d1

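In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], with \(\min\) and \(\max\) taken over the two quantities \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\),
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]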
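The interpolation \(\delta\) can be checked numerically. In this sketch `delta` is a hypothetical helper, and the terms 42 and 185 are illustrative values chosen so that their maximum is 185 and their sum is 227, matching the 5LLU_1/8TWC_1 row (\(d=185\), \(d_1/2=113.5\)) if the formula above holds.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two one-sided terms a and b."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    a, b = 42, 185  # illustrative terms: max 185, sum 227
    print(delta(a, b, 0.0))  # alpha = 0 gives the max, i.e. d
    print(delta(a, b, 0.5))  # alpha = 1/2 gives the mean, i.e. d_1/2
```

Setting \(\alpha=1\) would instead return the smaller term.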