CoV2D Browser™

Parikh vectors
4DEE_1 4QIZ_1 6DZU_1 Letter Amino acid
3 2 3 M Methionine
12 11 13 F Phenylalanine
14 18 14 P Proline
21 11 14 R Arginine
9 9 13 N Asparagine
4 1 1 C Cysteine
10 12 3 H Histidine
34 26 11 L Leucine
19 29 13 S Serine
11 16 11 D Aspartic acid
10 11 7 Q Glutamine
21 16 8 K Lysine
14 16 14 V Valine
12 8 12 Y Tyrosine
17 14 6 A Alanine
15 17 11 G Glycine
13 14 10 I Isoleucine
12 11 17 T Threonine
4 7 3 W Tryptophan
24 14 5 E Glutamic acid
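
The Parikh vector of a sequence records how often each letter occurs in it. A minimal Python sketch on a short made-up peptide follows; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative assumptions, not taken from the CoV2D code, and the letter order simply mirrors the rows of the table above.

```python
from collections import Counter

# The 20 standard amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "MFPRNCHLSDQKVYAGITWE"

def parikh_vector(sequence: str) -> list[int]:
    """Return the letter counts of `sequence`, one entry per letter of AMINO_ACIDS."""
    counts = Counter(sequence)  # missing letters count as 0
    return [counts[letter] for letter in AMINO_ACIDS]

# Toy input (hypothetical peptide, not a PDB entry).
vector = parikh_vector("MKKLMF")
print({letter: n for letter, n in zip(AMINO_ACIDS, vector) if n})
```

The entries of a Parikh vector sum to the sequence length, which gives a quick sanity check on each column of the table.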

>4DEE_1|Chain A|Aurora kinase A|Homo sapiens (9606)
>4QIZ_1|Chains A, B|Carbonic anhydrase 13|Homo sapiens (9606)
>6DZU_1|Chains AA[auth A2], AB[auth A3], A[auth A1], BA[auth A5], BB[auth A6], B[auth A4], CA[auth A8], CB[auth A9], C[auth A7], DA[auth AB], DB[auth AC], D[auth AA], EA[auth AE], EB[auth AF], E[auth AD], FA[auth AH], FB[auth AI], F[auth AG], GA[auth AK], GB[auth AL], G[auth AJ], HA[auth AN], HB[auth AO], H[auth AM], IA[auth AQ], I[auth AP], JA[auth AS], J[auth AR], KA[auth AU], K[auth AT], LA[auth AW], L[auth AV], MA[auth AY], M[auth AX], NA[auth Aa], N[auth AZ], OA[auth Ac], O[auth Ab], PA[auth Ae], P[auth Ad], QA[auth Ag], Q[auth Af], RA[auth Ai], R[auth Ah], SA[auth Ak], S[auth Aj], TA[auth Am], T[auth Al], UA[auth Ao], U[auth An], VA[auth Aq], V[auth Ap], WA[auth As], W[auth Ar], XA[auth Au], X[auth At], YA[auth Aw], Y[auth Av], ZA[auth Ay], Z[auth Ax]|Putative capsid protein|Porcine circovirus 2 (85708)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4DEE , Knot 127 279 0.85 40 177 272
SKKRQWALEDFEIGRPLGKGKFGNVYLAREKQSKFILALKVLFKAQLEKAGVEHQLRREVEIQSHLRHPNILRLYGYFHDATRVYLILEYAPLGTVYRELQKLSKFDEQRTATYITELANALSYCHSKRVIHRDIKPENLLLGSAGELKIADFGWSVHAPSSRRDTLCGTLDYLPPEMIEGRMHDEKVDLWSLGVLCYEFLVGKPPFEANTYQETYKRISRVEFTFPDFVTEGARDLISRLLKHNPSQRPMLREVLEHPWITANSSKPSNCQNKESASK
4QIZ , Knot 116 263 0.82 40 175 257
MMSRLSWGYREHNGPIHWKEFFPIADGDQQSPIEIKTKEVKYDSSLRPLSIKYDPSSAKIISNSGHSFNVDFDDTENKSVLRGGPLTGSYRLRQVHLHWGSADDHGSEHIVDGVSYAAELHVVHWNSDKYPSFVEAAHEPDGLAVLGVFLQIGEPNSQLQKITDTLDSIKEKGKQTRFTNFDLLSLLPPSWDYWTYPGSLTVPPLLESVTWIVLKQPINISSQQLAKFRSLLCTAEGEAAAFLVSNHRPPQPLKGRKVRASFH
6DZU , Knot 94 189 0.87 40 152 186
GIFNTRLSRTFGYTIKRTTVKTPSWAVDMMRFNINDFLPPGGGSNPRSVPFEYYRIRKVKVEFWPCSPITQGDRGVGSSAVILDDNFVTKATALTYDPYVNYSSRHTITQPFSYHSRYFTPKPVLDSTIDYFQPNNKRNQLWLRLQTAGNVDHVGLGTAFENSIYDQEYNIRVTMYVQFREFNLKDPPL
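
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns alone. The sketch below does this arithmetic in Python; the helper name `normalized_lz` is an assumption made for illustration, and the results should agree with the table only up to rounding or truncation.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ(w), n) copied from the table above.
rows = [("4DEE", 127, 279), ("4QIZ", 116, 263), ("6DZU", 94, 189)]
for code, lz, n in rows:
    print(code, normalized_lz(lz, n))
```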

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
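
As a small illustration of these definitions, the following Python sketch collects the distinct subwords of a given length and counts them. The names `subwords` and `subword_complexity` are hypothetical, and the input is a toy word rather than a protein sequence.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct length-k subwords (contiguous substrings) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

w = "ABAAB"
print(sorted(subwords(w, 2)))     # ['AA', 'AB', 'BA']
print(subword_complexity(w, 2))   # 3
```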

\(|P_{f(4DEE_1)}(2) \setminus P_{f(4QIZ_1)}(2)|=82\), \(|P_{f(4QIZ_1)}(2) \setminus P_{f(4DEE_1)}(2)|=80\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \), the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
000001110010110111010110101100000011111011101010011100010001010001001011010101001001011100111101000100100100000100100110110000000110001010011110110101101110101100000010101001110110101000010110111100011110111010000000001001010110110011001100110001000111001100111010000100000000100
Pair \(Z_2\) Length of longest common subsequence
4DEE_1,4QIZ_1 162 3
4DEE_1,6DZU_1 181 3
4QIZ_1,6DZU_1 167 3
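
The \(Z_2\) column is the sum of the two set differences, for example \(82+80=162\) for the pair 4DEE_1, 4QIZ_1. A minimal Python sketch of \(Z_k\) on toy words follows; the function names are assumptions for illustration.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct length-k subwords (contiguous substrings) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy words: P_x(2) = {AB, BA, AA}, P_y(2) = {AB, BB, BA}.
x, y = "ABAAB", "ABBBA"
print(z_distance(x, y, 2))  # 2, from AA on one side and BB on the other
```

Because the two differences are disjoint, the same value is given by the size of the symmetric difference, `len(px ^ py)`.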

Newick tree

 
(
	6DZU_1:89.00,
	(
		4DEE_1:81,4QIZ_1:81
	):8.00
);

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{542 }{\log_{20} 542}-\frac{263}{\log_{20}263})=79.2\), where \(542=279+263\) is the combined length of the two sequences.
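
The arithmetic behind this estimate can be checked with a few lines of Python. The factors 0.85 and 0.8 are taken from the formula as given on this page without further interpretation, and the helper name `lz_scale` is an assumption for illustration.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_{alphabet_size}(n), the scale used to normalize LZ(w)."""
    return n / math.log(n, alphabet_size)

n_x, n_y = 279, 263  # lengths of 4DEE_1 and 4QIZ_1
estimate = 0.85 * 0.8 * (lz_scale(n_x + n_y) - lz_scale(n_y))
print(round(estimate, 1))
```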
Status Protein1 Protein2 d d1/2
Query variables 4DEE_1 4QIZ_1 103 99.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
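
The interpolation above can be sketched in Python under two stated assumptions. First, the Otu--Sayood distances are taken to be \(d=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1=(c(xy)-c(x))+(c(yx)-c(y))\) for an LZ-type complexity \(c\). Second, \(c\) is computed here with an exhaustive-history LZ76 parsing, which may differ from the variant behind the \(\mathrm{LZ}(w)\) column on this page. The names `lz76` and `delta` are hypothetical.

```python
def lz76(s: str) -> int:
    """Number of phrases in an LZ76 (exhaustive-history) parsing of s.

    Each phrase is the shortest block starting at position i that does not
    already occur in the text preceding its own last symbol.
    """
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two complexity increments."""
    a = lz76(x + y) - lz76(x)
    b = lz76(y + x) - lz76(y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "ACGTACGTAC", "AAAAAAGTGT"   # toy words, not PDB sequences
print(delta(x, y, 0.0))   # plays the role of d
print(delta(x, y, 0.5))   # plays the role of d1 / 2
```

With \(\alpha=0\) only the larger increment survives, and with \(\alpha=1/2\) the result is the average of the two increments, matching the two cases in the displayed formula.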