CoV2D BrowserTM

Parikh vectors
7ZTS_1 3LCX_1 8AVA_1 Letter Amino acid
36 13 23 R Arginine
18 21 26 Q Glutamine
42 20 31 I Isoleucine
60 35 69 L Leucine
7 1 13 W Tryptophan
45 22 38 V Valine
57 27 38 A Alanine
37 17 41 E Glutamic acid
36 31 30 G Glycine
7 5 11 C Cysteine
13 18 22 H Histidine
29 16 40 K Lysine
28 9 11 M Methionine
27 9 27 F Phenylalanine
51 17 45 S Serine
43 14 40 T Threonine
36 12 22 Y Tyrosine
48 15 34 D Aspartic acid
39 11 35 P Proline
38 6 21 N Asparagine
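
Each column of the table above is a Parikh vector, that is, the number of occurrences of each letter in the sequence. A minimal sketch of the computation follows; the function name `parikh_vector` and the constant `ALPHABET` are illustrative and not taken from the CoV2D code, and the letter order simply copies the row order of the table.

```python
from collections import Counter

# Row order of the table above (one-letter amino acid codes).
ALPHABET = "RQILWVAEGCHKMFSTYDPN"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the occurrence count of each letter, in alphabet order."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# First ten residues of the 7ZTS_1 sequence listed on this page.
prefix = "MSSLLNSLLP"
vector = parikh_vector(prefix)
print(dict(zip(ALPHABET, vector)))
```

The entries of a Parikh vector sum to the sequence length, which gives a quick consistency check: the 7ZTS_1 column sums to 697 and the 3LCX_1 column to 319.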

7ZTS_1|Chains AA[auth AB], AB[auth AC], AC[auth AD], AD[auth AE], A[auth AA], BA[auth AG], BB[auth AH], BC[auth AI], BD[auth AJ], B[auth AF], CA[auth AL], CB[auth AM], CC[auth AN], CD[auth AO], C[auth AK], DA[auth AQ], DB[auth AR], DC[auth AS], DD[auth AT], D[auth AP], EA[auth AV], EB[auth AW], EC[auth AX], ED[auth AY], E[auth AU], FA[auth BA], FB[auth BB], FC[auth BC], FD[auth BD], F[auth AZ], GA[auth BF], GB[auth BG], GC[auth BH], G[auth BE], HA[auth BJ], HB[auth BK], HC[auth BL], H[auth BI], IA[auth BN], IB[auth BO], IC[auth BP], I[auth BM], JA[auth BR], JB[auth BS], JC[auth BT], J[auth BQ], KA[auth BV], KB[auth BW], KC[auth BX], K[auth BU], LA[auth BZ], LB[auth CA], LC[auth CB], L[auth BY], MA[auth CD], MB[auth CE], MC[auth CF], M[auth CC], NA[auth CH], NB[auth CI], NC[auth CJ], N[auth CG], OA[auth CL], OB[auth CM], OC[auth CN], O[auth CK], PA[auth CP], PB[auth CQ], PC[auth CR], P[auth CO], QA[auth CT], QB[auth CU], QC[auth CV], Q[auth CS], RA[auth CX], RB[auth CY], RC[auth CZ], R[auth CW], SA[auth DB], SB[auth DC], SC[auth DD], S[auth DA], TA[auth DF], TB[auth DG], TC[auth DH], T[auth DE], UA[auth DJ], UB[auth DK], UC[auth DL], U[auth DI], VA[auth DN], VB[auth DO], VC[auth DP], V[auth DM], WA[auth DR], WB[auth DS], WC[auth DT], W[auth DQ], XA[auth DV], XB[auth DW], XC[auth DX], X[auth DU], YA[auth DZ], YB[auth EA], YC[auth EB], Y[auth DY], ZA[auth ED], ZB[auth EE], ZC[auth EF], Z[auth EC]|Major capsid protein|Saccharomyces cerevisiae BY4741 (1247190)
>3LCX_1|Chains A, B, C, D|N-acetylneuraminate lyase|Escherichia coli (83333)
>8AVA_1|Chain A|Leukotriene A-4 hydrolase|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7ZTS , Knot 269 697 0.84 40 291 650
MSSLLNSLLPEYFKPKTNLNINSSRVQYGFNARIDMQYEDDSGTRKGSRPNAFMSNTVAFIGNYEGIIVDDIPILDGLRADIFDTHGDLDMGLVEDALSKSTMIRRNVPTYTAYASELLYKRNLTSLFYNMLRLYYIKKWGSIKYEKDAIFYDNGHACLLNRQLFPKSRDASLESSLSLPEAEIAMLDPGLEFPEEDVPAILWHGRVSSRATCILGQACSEFAPLAPFSIAHYSPQLTRKLFVNAPAGIEPSSGRYTHEDVKDAITILVSANQAYTDFEAAYLMLAQTLVSPVPRTAEASAWFINAGMVNMPTLSCANGYYPALTNVNPYHRLDTWKDTLNHWVAYPDMLFYHSVAMIESCYVELGNVARVSDSDAINKYTFTELSVQGRPVMNRGIIVDLTLVAMRTGREISLPYPVSCGLTRTDALLQGTEIHVPVVVKDIDMPQYYNAIDKDVIEGQETVIKVKQLPPAMYPIYTYGINTTEFYSDHFEDQVQVEMAPIDNGKAVFNDARKFSKFMSIMRMMGNDVTATDLVTGRKVSNWADNSSGRFLYTDVKYEGQTAFLVDMDTVKARDHCWVSIVDPNGTMNLSYKMTNFRAAMFSRNKPLYMTGGSVRTIATGNYRDAAERLRAMDETLRLKPFKITEKLDFRVAAYAIPSLSGSNMPSLHHQEQLQISEVDAEPINPIGEDELPPDIE
3LCX , Knot 138 319 0.83 40 194 301
MGHHHHHHHHHHSSGHIEGRHMMATNLRGVMAALLTPFDQQQALDKASLRRLVQFNIQQGIDGLYVGGSTGEAFVQSLSERAQVLEIVAEEAKGKIKLIAHVGCVSTAESQQLAASAKRHGFDAVSAVTPFYYPFSLEEHCDHYRAIIDSADGLPMVVYNIPALSGVKLTLGQIYTLVTLPGVGALKQTSGDLYQMEQIRREHPDLVLYNGYDEIFASGLLAGADGGIGSTYNIMGWRYQGIVKALKEGDIQTAQKLQTECNKVIDLLIKTGIFRGLKTVLHYMDVISVPLCRKPFGPVDEKCLPELKALAQQLMQERG
8AVA , Knot 246 617 0.85 40 286 574
MHHHHHHPEIVDTCSLASPASVCRTKHLHLRCSVDFTRRTLTGTAALTVQSQEDNLRSLVLDTKDLTIEKVVINGQEVKYALGERQSYKGSPMEISLPIALSKNQEIVIEISFETSPKSSALQWLTPEQTSGKEHPYLFSQCQAIHCRAILPCQDTPSVKLTYTAEVSVPKELVALMSAIRDGETPDPEDPSRKIYKFIQKVPIPCYLIALVVGALESRQIGPRTLVWSEKEQVEKSAYEFSETESMLKIAEDLGGPYVWGQYDLLVLPPSFPYGGMENPCLTFVTPTLLAGDKSLSNVIAHEISHSWTGNLVTNKTWDHFWLNEGHTVYLERHICGRLFGEKFRHFNALGGWGELQNSVKTFGETHPFTKLVVDLTDIDPDVAYSSVPYEKGFALLFYLEQLLGGPEIFLGFLKAYVEKFSYKSITTDDWKDFLYSYFKDKVDVLNQVDWNAWLYSPGLPPIKPNYDMTLTNACIALSQRWITAKEDDLNSFNATDLKDLSSHQLNEFLAQTLQRAPLPLGHIKRMQEVYNFNAINNSEIRFRWLRLCIQSKWEDAIPLALKMATEQGRMKFTRPLFKDLAAFDKSHDQAVRTYQEHKASMHPVTAMLVGKDLKVD
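
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be reproduced from the LZ-complexity and length columns. The sketch below also includes one common LZ76-style phrase count; which exact LZ variant this page uses is not stated, so `lz76_complexity` is an assumption and may not reproduce the table's LZ values. Both function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Count phrases in a left-to-right LZ76-style parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs in the text to its left;
        # the earlier occurrence may overlap the phrase, except for its last symbol.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_alphabet_size(n)); requires n > 1."""
    return lz / (n / math.log(n, alphabet_size))

# Values taken from the 7ZTS row of the table above.
print(normalized_lz(269, 697))
```

For the three rows the ratio comes out near 0.84, 0.83 and 0.855, so the table appears to truncate to two decimals rather than round.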

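Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. (Here \(n\) is the subword length, not the sequence length \(n=|w|\) used in the table above.) Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).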
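
As an illustration of these definitions, the following sketch enumerates the distinct contiguous subwords of a given length and counts them. The names `subwords` and `subword_complexity` are illustrative only.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return the number of distinct subwords of length k in w."""
    return len(subwords(w, k))

word = "abab"
print(sorted(subwords(word, 2)))
print(subword_complexity(word, 3))
```

A word of length \(|w|\) has at most \(|w|-k+1\) distinct subwords of length \(k\), and at most \(20^k\) over the amino acid alphabet.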

\(|P_{f(7ZTS_1)}(2) \setminus P_{f(3LCX_1)}(2)|=133\), \(|P_{f(3LCX_1)}(2) \setminus P_{f(7ZTS_1)}(2)|=36\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1001100111001010001010000100110101010000001000100101110001111100011110011110110101100010101111001100001100011000101001100001001100110100100110100000111000101011000111000010100010110101111011101100011111101010001001110100011111110110001010001110111110100100000010011011101001000101101111001101110010101111011110110100101001110010100010010001001110101110001111000010110110100001100001001010101110011110101111001001011011001100001110100101111100101100001100011010001101001111101100011000010000100010101111001011100100100110110111001010011010010011000010110001000100111101001010000110110101010100010010111100001101011010011010000110010110001010110100010101110111010100110100000101001010110111000111010
Pair \(Z_2\) Length of longest common subword (substring)
7ZTS_1,3LCX_1 169 3
7ZTS_1,8AVA_1 129 5
3LCX_1,8AVA_1 170 6
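
Both columns of this table can be computed directly from the sequences. The sketch below computes \(Z_k(x,y)\) as the size of the symmetric difference of the two subword sets, and the length of the longest common contiguous subword by dynamic programming. The function names are illustrative, and reading the second column as a contiguous match is an assumption based on its small values (the 6 for 3LCX_1 and 8AVA_1 matches their shared run of H residues).

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Return |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Return the length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("abab", "abba", 2))
print(longest_common_subword("xHHHHy", "aHHHb"))
```

For the first pair, the two one-sided counts 133 and 36 add up to the tabulated \(Z_2=169\).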

Newick tree

 
[
	3LCX_1:90.49,
	[
		7ZTS_1:64.5,8AVA_1:64.5
	]:25.99
]
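
The bracketed listing above is a nested rendering of the tree rather than a Newick string. A minimal sketch of serializing the same tree in standard Newick syntax follows; the tuple-based tree representation and the name `to_newick` are assumptions made for this example.

```python
def to_newick(node):
    """Serialize (label_or_children, branch_length) recursively to Newick text."""
    label, length = node
    if isinstance(label, list):
        # Internal node: join the serialized children inside parentheses.
        text = "(" + ",".join(to_newick(child) for child in label) + ")"
    else:
        text = label
    return text if length is None else f"{text}:{length}"

# Branch lengths copied from the listing above; the root has no length.
tree = ([("3LCX_1", 90.49),
         ([("7ZTS_1", 64.5), ("8AVA_1", 64.5)], 25.99)], None)
newick = to_newick(tree) + ";"
print(newick)
```

Every leaf sits at the same depth (64.5 + 25.99 = 90.49), so the tree is ultrametric.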

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1016}{\log_{20} 1016}-\frac{319}{\log_{20}319})=186\), where \(1016=697+319\) is the combined length of the two sequences.
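
The arithmetic behind this estimate can be checked numerically. The sketch below only re-evaluates the displayed expression; the factors 0.85 and 0.8 are copied from it as given, and the helper name `scale` is illustrative.

```python
import math

def scale(n, alphabet_size=20):
    """Return n / log_alphabet_size(n)."""
    return n / math.log(n, alphabet_size)

n1, n2 = 697, 319  # lengths of 7ZTS_1 and 3LCX_1
estimate = 0.85 * 0.8 * (scale(n1 + n2) - scale(n2))
print(round(estimate))
```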
Status Protein1 Protein2 d d1/2
Query variables 7ZTS_1 3LCX_1 236 169.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
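
A small numeric sketch of this interpolation follows. It assumes that min and max range over the two one-sided complexity increments of the pair (as in the Otu–Sayood construction); that reading, and the name `delta`, are assumptions rather than something stated on this page. The inputs are derived from the table row for 7ZTS_1 and 3LCX_1: \(d=236\) is the larger increment, and \(d_1/2=169.5\) implies the smaller one is \(2\cdot 169.5-236=103\).

```python
def delta(a, b, alpha):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

smaller, larger = 103, 236  # inferred from d = 236 and d1/2 = 169.5
print(delta(smaller, larger, 0))    # recovers d
print(delta(smaller, larger, 0.5))  # recovers d1/2
```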