CoV2D Browser™

Parikh vectors
5LQP_1 9FHA_1 6ATY_1 Letter Amino acid
6 12 1 E Glutamic acid
1 4 1 H Histidine
6 4 2 R Arginine
9 3 0 N Asparagine
2 1 6 C Cysteine
4 0 1 Q Glutamine
7 8 2 K Lysine
2 1 0 M Methionine
12 12 2 T Threonine
2 5 2 Y Tyrosine
14 12 0 A Alanine
5 5 2 D Aspartic acid
7 10 6 G Glycine
9 7 1 L Leucine
7 5 5 I Isoleucine
13 11 6 S Serine
3 5 1 F Phenylalanine
8 8 1 P Proline
2 2 0 W Tryptophan
10 12 0 V Valine
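
The counts in the table above can be reproduced with a few lines of Python. This is a minimal sketch rather than the site's own code: the name `parikh_vector` and the constant `ALPHABET` are illustrative assumptions, and the 6ATY_1 sequence is copied from the sequence table on this page.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
ALPHABET = "EHRNCQKMTYADGLISFPWV"

def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter in seq."""
    counts = Counter(seq)
    # Counter returns 0 for letters that never occur.
    return {letter: counts[letter] for letter in ALPHABET}

# 6ATY_1 (venom protein 51.1), as listed in the sequence table on this page.
seq_6aty = "GSISIGIKCSPSIDLCEGQCRIRKYFTGYCSGDTCHCSG"
vec = parikh_vector(seq_6aty)
print(vec["C"], vec["G"], vec["S"], vec["A"])  # 6 6 6 0
```

The entries of the vector sum to the sequence length, which gives a quick sanity check on any row of the table.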

>5LQP_1|Chains AA[auth AC], AB[auth AD], AC[auth AE], AD[auth AF], AE[auth AG], AF[auth AH], A[auth AB], BA[auth AJ], BB[auth AK], BC[auth AL], BD[auth AM], BE[auth AN], BF[auth AO], B[auth AI], CA[auth AQ], CB[auth AR], CC[auth AS], CD[auth AT], CE[auth AU], CF[auth AV], C[auth AP], DA[auth AX], DB[auth AY], DC[auth AZ], DD[auth BA], DE[auth BB], DF[auth BC], D[auth AW], EA[auth BE], EB[auth BF], EC[auth BG], ED[auth BH], EE[auth BI], EF[auth BJ], E[auth BD], FA[auth BL], FB[auth BM], FC[auth BN], FD[auth BO], FE[auth BP], FF[auth BQ], F[auth BK], GA[auth BS], GB[auth BT], GC[auth BU], GD[auth BV], GE[auth BW], GF[auth BX], G[auth BR], HA[auth BZ], HB[auth CA], HC[auth CB], HD[auth CC], HE[auth CD], HF[auth CE], H[auth BY], IA[auth CG], IB[auth CH], IC[auth CI], ID[auth CJ], IE[auth CK], IF[auth CL], I[auth CF], JA[auth CN], JB[auth CO], JC[auth CP], JD[auth CQ], JE[auth CR], JF[auth CS], J[auth CM], KA[auth CU], KB[auth CV], KC[auth CW], KD[auth CX], KE[auth CY], KF[auth CZ], K[auth CT], LA[auth DB], LB[auth DC], LC[auth DD], LD[auth DE], LE[auth DF], LF[auth DG], L[auth DA], MA[auth DI], MB[auth DJ], MC[auth DK], MD[auth DL], ME[auth DM], MF[auth DN], M[auth DH], NA[auth DP], NB[auth DQ], NC[auth DR], ND[auth DS], NE[auth DT], NF[auth DU], N[auth DO], OA[auth DW], OB[auth DX], OC[auth DY], OD[auth DZ], OE[auth EA], OF[auth EB], O[auth DV], PA[auth ED], PB[auth EE], PC[auth EF], PD[auth EG], PE[auth EH], PF[auth EI], P[auth EC], QA[auth EK], QB[auth EL], QC[auth EM], QD[auth EN], QE[auth EO], QF[auth EP], Q[auth EJ], RA[auth ER], RB[auth ES], RC[auth ET], RD[auth EU], RE[auth EV], RF[auth EW], R[auth EQ], SA[auth EY], SB[auth EZ], SC[auth FA], SD[auth FB], SE[auth FC], SF[auth FD], S[auth EX], TA[auth FF], TB[auth FG], TC[auth FH], TD[auth FI], TE[auth FJ], TF[auth FK], T[auth FE], UA[auth FM], UB[auth FN], UC[auth FO], UD[auth FP], UE[auth FQ], UF[auth FR], U[auth FL], VA[auth FT], VB[auth FU], VC[auth FV], VD[auth FW], VE[auth FX], VF[auth FY], V[auth FS], WA[auth GA], WB[auth GB], WC[auth GC], WD[auth GD], WE[auth GE], WF[auth GF], W[auth FZ], XA[auth GH], XB[auth GI], XC[auth GJ], XD[auth GK], XE[auth GL], XF[auth GM], X[auth GG], YA[auth GO], YB[auth GP], YC[auth GQ], YD[auth GR], YE[auth GS], Y[auth GN], ZA[auth GU], ZB[auth GV], ZC[auth GW], ZD[auth GX], ZE[auth GY], Z[auth GT]|Coat protein|Acinetobacter phage AP205 (154784)
>9FHA_1|Chains A, B|Transthyretin|Homo sapiens (9606)
>6ATY_1|Chain A|Venom protein 51.1|Lychas mucronatus (172552)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5LQP, Knot 66 129 0.82 40 104 125 ANKPMQPITSTANKIVWSDPTRLSTTFSASLLRQRVKVGIAELNNVSGQYVSVYKRPAPKPEGCADACVIMPNENQSIRTVISGSAENLATLKAEWETHKRNVDTLFASGNAGLGFLDPTAAIVSSDTT
9FHA, Knot 65 127 0.82 38 101 125 GPTGTGESKCPLMVKVLDAVRGSPAINVAVHVFRKAADDTWEPFASGKTSESGELHGLTTEEEFVEGIYKVEIDTKSYWKALGISPFHEHAEVVFTANDSGPRRYTIAALLSPYSYSTTAVVTNPKE
6ATY, Knot 21 39 0.65 30 33 36 GSISIGIKCSPSIDLCEGQCRIRKYFTGYCSGDTCHCSG
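
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) is simple arithmetic on the two columns before it, and the sketch below recomputes it from the tabulated \(\mathrm{LZ}(w)\) and \(n\). The helper names are illustrative. The tabulated values appear to be truncated to two decimals rather than rounded (for 5LQP the quotient is about 0.82999), so the sketch truncates as well; that reading is an assumption.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

def truncate2(x: float) -> float:
    """Truncate (not round) a non-negative number to two decimal places."""
    return math.floor(x * 100) / 100

# (code, LZ(w), n) taken from the table above.
rows = [("5LQP", 66, 129), ("9FHA", 65, 127), ("6ATY", 21, 39)]
for code, lz, n in rows:
    print(code, truncate2(normalized_lz(lz, n)))
```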

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
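
As an illustration of this definition, the following sketch collects the distinct contiguous subwords of a given length and counts them. The function names are illustrative assumptions, and the 6ATY sequence is copied from the table above.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

seq_6aty = "GSISIGIKCSPSIDLCEGQCRIRKYFTGYCSGDTCHCSG"
print(subword_complexity(seq_6aty, 2))  # 33
print(subword_complexity(seq_6aty, 3))  # 36
```

These two values agree with the \(p_w(2)\) and \(p_w(3)\) entries for 6ATY. For length 1 the same definition counts distinct letters, giving 15 for 6ATY; the tabulated 30 is twice that, so the site may compute that column differently.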

\(|P_{f(5LQP_1)}(2) \setminus P_{f(9FHA_1)}(2)|=70\), \(|P_{f(9FHA_1)}(2) \setminus P_{f(5LQP_1)}(2)|=67\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of sequence 1 (5LQP_1): 100110110001001110010010001010110001011110100101001010001110101010101111000001001101010011010101000000100111010111111010111100000
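
The page does not say which residues it treats as hydrophobic. The sketch below assumes the set A, F, G, I, L, M, P, V, W, which was inferred by aligning the binary string above with the 5LQP sequence (1 = hydrophobic, 0 = polar). The names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Assumed hydrophobic residues, inferred from the 5LQP_1 example on this page.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq: str) -> str:
    """Map each residue to '1' if hydrophobic (under the assumed set), else '0'."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

seq_5lqp = (
    "ANKPMQPITSTANKIVWSDPTRLSTTFSASLLRQRVKVGIAELNNVSGQYVSVYKRPAPKPEGCADACVI"
    "MPNENQSIRTVISGSAENLATLKAEWETHKRNVDTLFASGNAGLGFLDPTAAIVSSDTT"
)
print(hp_encode(seq_5lqp))
```

Because all 20 amino acids occur in 5LQP_1, this one example fixes the class of every letter under that reading.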
Pair \(Z_2\) Length of longest common subword
5LQP_1,9FHA_1 137 4
5LQP_1,6ATY_1 119 2
9FHA_1,6ATY_1 110 2
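
Both columns of this table can be computed directly from two sequences. The sketch below implements \(Z_k\) as defined above and, for the last column, the length of the longest common contiguous subword; reading that column as contiguous is an assumption suggested by the small values. Function names are illustrative and the inputs are toy strings, not the proteins on this page.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length ending at y[j-1]
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("ABAB", "ABC", 2))                      # 2: BA is only in x, BC only in y
print(longest_common_subword("ABCDE", "XBCDY"))  # 3: the shared subword BCD
```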

Newick tree

 
(
	5LQP_1:66.93,
	(
		6ATY_1:55,9FHA_1:55
	):11.93
);

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{256}{\log_{20} 256}-\frac{127}{\log_{20}127})=40.6\), where \(256=129+127\) is the combined length of the two sequences.
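
The arithmetic behind this estimate can be checked as follows. The factors 0.85 and 0.8 are taken verbatim from the formula; the page does not state where they come from. Taking 256 to be the sum of the two sequence lengths (129 + 127) is an assumption, and `lz_scale` is an illustrative name.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_b(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

n1, n2 = 129, 127  # lengths of 5LQP_1 and 9FHA_1
estimate = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n2))
print(round(estimate, 1))  # 40.6
```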
Status Protein1 Protein2 d d1/2
Query variables 5LQP_1 9FHA_1 53 51.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
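
A hedged sketch of how this interpolation could be computed. It assumes that min and max range over the two increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), so that \(d\) is their maximum and \(d_1\) their sum, as in Otu and Sayood's definitions. The function `lz76` is one common exhaustive-history formulation of LZ-complexity; it is not confirmed to be the variant used on this page, so its outputs need not match the table. All function names are illustrative.

```python
def lz76(w: str) -> int:
    """Number of components in an exhaustive-history (LZ76-style) parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the component while it can be copied from earlier text
        # (overlap with the component itself is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(x: str, y: str) -> tuple[int, int]:
    """Return (LZ(xy) - LZ(x), LZ(yx) - LZ(y))."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def otu_sayood_d(x: str, y: str) -> int:
    """d: the larger of the two increments (the case alpha = 0)."""
    return max(increments(x, y))

def otu_sayood_d1(x: str, y: str) -> int:
    """d1: the sum of the two increments (so d1 / 2 is the case alpha = 1/2)."""
    return sum(increments(x, y))

x, y = "ABABAB", "CCDCCD"  # toy inputs, not sequences from this page
print(otu_sayood_d(x, y), otu_sayood_d1(x, y) / 2)
print(delta(x, y, 0), delta(x, y, 0.5))
```

Writing both distances through a single `delta` makes the two cases of the displayed formula explicit: the second printed line repeats the first.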