CoV2D BrowserTM

Parikh vectors
5PYT_1 8PKH_1 4FYP_1 Letter Amino acid
19 21 20 E Glutamic acid
10 27 15 I Isoleucine
13 32 20 K Lysine
5 9 0 M Methionine
13 12 5 F Phenylalanine
6 17 15 T Threonine
3 20 12 A Alanine
6 13 17 G Glycine
11 32 30 L Leucine
11 5 4 R Arginine
6 4 10 H Histidine
8 13 13 P Proline
13 22 21 S Serine
3 3 7 W Tryptophan
5 15 17 Y Tyrosine
8 19 21 V Valine
11 20 18 N Asparagine
5 20 11 D Aspartic acid
12 1 2 C Cysteine
12 14 5 Q Glutamine
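
As a minimal sketch of how one column of the table above could be recomputed: the Parikh vector of a word records how many times each letter occurs in it. The function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative assumptions, not taken from the CoV2D code; the letter order simply follows the rows of the table.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above (assumed ordering).
AMINO_ACIDS = "EIKMFTAGLRHPSWYVNDCQ"

def parikh_vector(word, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `word`."""
    counts = Counter(word)
    return {letter: counts.get(letter, 0) for letter in alphabet}

if __name__ == "__main__":
    # A short prefix of a protein sequence, used only as a toy input.
    print(parikh_vector("ENSNICEVCNKW"))
```

Letters that do not occur get an explicit count of 0, which matches rows such as the 0 entry for Methionine in the 4FYP_1 column.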

>5PYT_1|Chains A, B|Nuclear autoantigen Sp-100|Homo sapiens (9606)
>8PKH_1|Chains AA[auth AB], AB[auth AC], AC[auth AD], A[auth AA], BA[auth AF], BB[auth AG], BC[auth AH], B[auth AE], CA[auth AJ], CB[auth AK], CC[auth AL], C[auth AI], DA[auth AN], DB[auth AO], DC[auth AP], D[auth AM], EA[auth AR], EB[auth AS], EC[auth AT], E[auth AQ], FA[auth AV], FB[auth AW], F[auth AU], GA[auth AY], GB[auth AZ], G[auth AX], HA[auth BB], HB[auth BC], H[auth BA], IA[auth BE], IB[auth BF], I[auth BD], JA[auth BH], JB[auth BI], J[auth BG], KA[auth BK], KB[auth BL], K[auth BJ], LA[auth BN], LB[auth BO], L[auth BM], MA[auth BQ], MB[auth BR], M[auth BP], NA[auth BT], NB[auth BU], N[auth BS], OA[auth BW], OB[auth BX], O[auth BV], PA[auth BZ], PB[auth CA], P[auth BY], QA[auth CC], QB[auth CD], Q[auth CB], RA[auth CF], RB[auth CG], R[auth CE], SA[auth CI], SB[auth CJ], S[auth CH], TA[auth CL], TB[auth CM], T[auth CK], UA[auth CO], UB[auth CP], U[auth CN], VA[auth CR], VB[auth CS], V[auth CQ], WA[auth CU], WB[auth CV], W[auth CT], XA[auth CX], XB[auth CY], X[auth CW], YA[auth DA], YB[auth DB], Y[auth CZ], ZA[auth DD], ZB[auth DE], Z[auth DC]|Major capsid protein|Borreliella burgdorferi B31 (224326)
>4FYP_1|Chains A, B|Vegetative storage protein 1|Arabidopsis thaliana (3702)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5PYT , Knot 87 180 0.83 40 137 176
ENSNICEVCNKWGRLFCCDTCPRSFHEHCHIPSVEANKNPWSCIFCRIKTIQERCPESQSGHQESEVLMRQMLPEEQLKCEFLLLKVYCDSKSCFFASEPYYNREGSQGPQKPMWLNKVKTSLNEQMYTRVEGFVQDMRLIFHNHKEFYREDKFTRLGIQVQDIFEKNFRNIFAIQETSK
8PKH , Knot 137 319 0.82 40 187 305
MELFDENYYAKAVANIIGEVKDPIMYKWFSPDQIEDVDLQMGYQKTVKWDAFLNANPTTIANEVNTISTIGFSSEVVRLNYLKLQYKFRHLKQTSEKFYTSDSYIGDINNNLLPFAQAYKLASSEIIKLINHFVLTGTVSIQKDGKNQKRLLPNMYGLLNMPEQIKEEVASGDKDKMDKIFEKIEAGLSKLELGDEFSTPMMVIVDPATSLKLVKPYAAAQGAASSCEKWEDVLIQTIKAINNREDVYIETSNLLKHKILIYPLNSELIKFKPSKYMLPTPNEQVDKDSTDVAHSYIDFVLGGLLATRKTILQVNIKQS
4FYP , Knot 116 263 0.82 38 167 247
VSHVQSSASVPGLIELLESNTIFGNEAELLEKEGLSINYPNCRSWHLGVETSNIINFDTVPANCKAYVEDYLITSKQYQYDSKTVNKEAYFYAKGLALKNDTVNVWIFDLDDTLLSSIPYYAKYGYGTENTAPGAYWSWLESGESTPGLPETLHLYENLLELGIEPIIISDRWKKLSEVTVENLKAVGVTKWKHLILKPNGSKLTQVVYKSKVRNSLVKKGYNIVGNIGDQWADLVEDTPGRVFKLPNPLYYVPSLEHHHHHH
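
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below is only a check of that arithmetic; the function name `normalized_lz` is an assumption. The tabulated values appear to be truncated to two decimals rather than rounded (for 5PYT the ratio is approximately 0.838, listed as 0.83).

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    # (LZ-complexity, length) pairs copied from the table above.
    rows = {"5PYT": (87, 180), "8PKH": (137, 319), "4FYP": (116, 263)}
    for code, (lz, n) in rows.items():
        print(code, f"{normalized_lz(lz, n):.4f}")
```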

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
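
A minimal sketch of these definitions, assuming words are plain Python strings; the names `subwords` and `subword_complexity` are illustrative and not part of the CoV2D code.

```python
def subwords(w, n):
    """P_w(n): the set of distinct length-n contiguous subwords of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

if __name__ == "__main__":
    w = "abab"
    print(sorted(subwords(w, 2)))    # the distinct subwords of length 2
    print(subword_complexity(w, 1))  # the number of distinct letters
```

For a word of length \(m\) there are at most \(m-n+1\) subwords of length \(n\), so \(p_w(n)\le m-n+1\).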

\(|P_{f(5PYT_1)}(2) \setminus P_{f(8PKH_1)}(2)|=64\), \(|P_{f(8PKH_1)}(2) \setminus P_{f(5PYT_1)}(2)|=114\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
000010010001101100000100100000110101000110011001001000010000100000111001110001000111101000000011100100000100110011110010001000100010111001011100000100000100111010011000100111100000
Pair \(Z_2\) Length of longest common subsequence
5PYT_1,8PKH_1 178 4
5PYT_1,4FYP_1 178 3
8PKH_1,4FYP_1 162 4
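
The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords; for the first pair, \(64+114=178\). A minimal sketch, with the hypothetical helper names `subwords` and `z_distance`:

```python
def subwords(w, k):
    """The set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

if __name__ == "__main__":
    # Toy words; for the proteins, pass the FASTA sequences and k=2.
    print(z_distance("abab", "abba", 2))
```

Since the two set differences are disjoint, the same number is obtained as `len(px ^ py)`.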

Newick tree

 
[
	5PYT_1:91.51,
	[
		8PKH_1:81,4FYP_1:81
	]:10.51
]
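
Read as nested groups with branch lengths, the bracketed tree above corresponds to the Newick string `(5PYT_1:91.51,(8PKH_1:81,4FYP_1:81):10.51);`. The sketch below shows one way to serialize such a structure; the tuple-based representation and the name `to_newick` are assumptions made for illustration only.

```python
def to_newick(node):
    """Serialize a (label_or_children, branch_length) pair to Newick text.

    A leaf is (name, length); an internal node is (list_of_children, length).
    A branch length of None (used for the root) is omitted.
    """
    label, length = node
    if isinstance(label, list):
        text = "(" + ",".join(to_newick(child) for child in label) + ")"
    else:
        text = label
    return text if length is None else f"{text}:{length}"

# Branch lengths copied from the tree above.
tree = ([("5PYT_1", 91.51), ([("8PKH_1", 81), ("4FYP_1", 81)], 10.51)], None)

if __name__ == "__main__":
    print(to_newick(tree) + ";")
```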

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{499 }{\log_{20} 499}-\frac{180}{\log_{20}180})=93.0\)
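
The arithmetic in this estimate can be checked directly. In the sketch below the function name `expected_distance` is an assumption, the constants 0.85 and 0.8 are copied from the formula, and 499 is presumably the combined length \(180+319\) of the two query sequences (an inference from the length column, not stated on the page).

```python
import math

def expected_distance(n_total, n_first, c1=0.85, c2=0.8, alphabet_size=20):
    """c1 * c2 * (n_total / log_20 n_total - n_first / log_20 n_first)."""
    def scaled(n):
        return n / math.log(n, alphabet_size)
    return c1 * c2 * (scaled(n_total) - scaled(n_first))

if __name__ == "__main__":
    print(round(expected_distance(499, 180), 1))
```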
Status Protein1 Protein2 d d1/2
Query variables 5PYT_1 8PKH_1 118 92.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
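
As a small worked check of this interpolation, suppose the two quantities entering \(\min\) and \(\max\) are 67 and 118. These are not given on the page: 118 is the tabulated \(d\), and 67 is inferred from \(d_1/2=92.5\) under the assumption that the formula above applies to the query pair. The name `delta` is likewise illustrative.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    a, b = 67, 118            # assumed values, inferred as described above
    print(delta(0, a, b))     # alpha = 0 gives the maximum, i.e. d
    print(delta(0.5, a, b))   # alpha = 1/2 gives the average, i.e. d1/2
```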