CoV2D Browser™

Parikh vectors
1JQI_1 8QQD_1 7ABK_1 Letter Amino acid
18 2 19 R Arginine
27 3 29 E Glutamic acid
6 1 13 H Histidine
12 4 4 F Phenylalanine
5 1 3 W Tryptophan
41 11 34 L Leucine
31 12 16 S Serine
23 11 7 T Threonine
10 5 4 Y Tyrosine
46 18 26 A Alanine
15 4 9 D Aspartic acid
31 5 10 G Glycine
13 2 4 M Methionine
15 5 8 P Proline
14 10 5 V Valine
12 9 6 N Asparagine
5 2 0 C Cysteine
18 4 24 Q Glutamine
24 11 9 I Isoleucine
22 5 15 K Lysine
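Each column of the table above is a Parikh vector: the number of occurrences of each amino-acid letter in one sequence. A minimal sketch of that count follows; the function name `parikh_vector` and the alphabetical letter order are illustrative assumptions, not taken from this page.

```python
from collections import Counter

# The 20 standard amino-acid letters. Alphabetical order is an arbitrary
# choice here and differs from the row order of the table.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# First 20 residues of the 8QQD_1 sequence listed on this page.
prefix = "FTLIELLIVIAIIAILAAVL"
vector = parikh_vector(prefix)

# Show only the letters that actually occur in the prefix.
print({letter: count for letter, count in vector.items() if count})
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick consistency check against the Length column reported further down.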

>1JQI_1|Chains A, B|short chain acyl-CoA dehydrogenase|Rattus norvegicus (10116)
>8QQD_1|Chains A, AA[auth B], BA[auth D], B[auth C], CA[auth F], C[auth E], DA[auth H], D[auth G], EA[auth J], E[auth I], FA[auth L], F[auth K], GA[auth N], G[auth M], HA[auth P], H[auth O], IA[auth R], I[auth Q], JA[auth T], J[auth S], KA[auth V], K[auth U], LA[auth X], L[auth W], MA[auth Z], M[auth Y], NA[auth b], N[auth a], OA[auth d], O[auth c], PA[auth f], P[auth e], QA[auth h], Q[auth g], RA[auth j], R[auth i], SA[auth l], S[auth k], T[auth m], U[auth n], V[auth o], W[auth p], X[auth q], Y[auth r], Z[auth s]|Type IV wide pilus major component PilA4|Thermus thermophilus (274)
>7ABK_1|Chain A|Chloroplast membrane-associated 30 kD protein|Synechocystis sp. (strain PCC 6803 / Kazusa) (1111708)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1JQI , Knot 163 388 0.83 40 210 373
LHSVYQSVELPETHQMLRQTCRDFAEKELVPIAAQLDKEHLFPTSQVKKMGELGLLAMDVPEELSGAGLDYLAYSIALEEISRGCASTGVIMSVNNSLYLGPILKFGSSQQKQQWITPFTNGDKIGCFALSEPGNGSDAGAASTTAREEGDSWVLNGTKAWITNSWEASATVVFASTDRSRQNKGISAFLVPMPTPGLTLGKKEDKLGIRASSTANLIFEDCRIPKENLLGEPGMGFKIAMQTLDMGRIGIASQALGIAQASLDCAVKYAENRHAFGAPLTKLQNIQFKLADMALALESARLLTWRAAMLKDNKKPFTKESAMAKLAASEAATAISHQAIQILGGMGYVTEMPAERYYRDARITEIYEGTSEIQRLVIAGHLLRSYRS
8QQD , Knot 65 125 0.83 40 98 120
FTLIELLIVIAIIAILAAVLIPNLLAARKRANDTVVTAYLNDAVKFQEMYQIDNNSYTSNQAALISLGLKSTPANVTFSIVSASANSYCMIAGHSGGTVWFAATPDKGVYKTNTAVTSSQPESCP
7ABK , Knot 101 245 0.75 38 130 220
MGHHHHHHHHHSSGHIDDDDKHMELFNRVGRVLKSQLTHWQQQQEAPEDLLERLLGEMELELIELRRALAQTIATFKSTERQRDAQQLIAQRWYEKAQAALDRGNEQLAREALGQRQSYQSHTEALGKSLGEQRALVEQVRGQLQKLERKYLELKSQKNLYLARLKSAIAAQKIEEIAGNLDNASASSLFERIETKILELEAERELLNPPPSPLDKKFEQWEEQQAVEATLAAMKARRSLPPPSS
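The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does that, and also includes one common LZ76-style phrase count. The page does not say exactly which parsing variant it uses, so `lz76_complexity` is an assumption and may not reproduce the tabulated LZ values; `normalized_lz` only uses numbers already in the table.

```python
import math

def lz76_complexity(w: str) -> int:
    """Count phrases in a left-to-right LZ76-style exhaustive parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs in the text that
        # precedes its last symbol (overlap with the phrase is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n) for alphabet size b."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example, parsed as 0|001|10|100|1000|101 (6 phrases).
print(lz76_complexity("0001101001000101"))

# Recompute the normalized column from the (LZ, length) pairs in the table.
for code, lz, n in [("1JQI", 163, 388), ("8QQD", 65, 125), ("7ABK", 101, 245)]:
    print(code, round(normalized_lz(lz, n), 3))
```

The recomputed ratios agree with the table if the tabulated values are read as truncated to two decimals.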

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
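As a small illustration of these definitions, the following sketch collects the distinct length-\(n\) subwords of a word and counts them. The names `subwords` and `subword_complexity` are illustrative, and the example word is just the first nine residues of 8QQD_1.

```python
def subwords(w: str, n: int) -> set[str]:
    """Return P_w(n): the set of distinct contiguous length-n subwords of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """Return p_w(n), the cardinality of P_w(n)."""
    return len(subwords(w, n))

word = "FTLIELLIV"  # first nine residues of 8QQD_1
for n in (1, 2, 3):
    print(n, subword_complexity(word, n), sorted(subwords(word, n)))
```

A word of length \(|w|\) has \(|w|-n+1\) windows of length \(n\), so \(p_w(n)\) can never exceed that number.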

\(|P_{f(1JQI_1)}(2) \setminus P_{f(8QQD_1)}(2)|=144\), \(|P_{f(8QQD_1)}(2) \setminus P_{f(1JQI_1)}(2)|=32\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1001000101100001100000011000111111010000111000100110111111011001011110011001110010010100111101000101111101100000001101100100110111001101001111000100010011101001110001010101111000000000110111111101110110000011101000101110000110001110111110111001011011110011111010100110010000111111001001010110111110010110101111000001100001110111001101100011011111101001110000001010010010001001111101100000
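The quantity \(Z_k(x,y)\) defined above is the size of the symmetric difference of the two subword sets. A minimal sketch, with illustrative function names and toy inputs rather than the full sequences:

```python
def subwords(w: str, k: int) -> set[str]:
    """Return P_w(k): the set of distinct contiguous length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Return Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: P_x(2) = {AB, BA}, P_y(2) = {AB, BB, BA}, so Z_2 = 0 + 1.
print(z_distance("ABAB", "ABBA", 2))
```

For the pair 1JQI_1, 8QQD_1 the two one-sided counts reported above, 144 and 32, add up to \(Z_2 = 176\).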
Pair \(Z_2\) Length of longest common subword (contiguous)
1JQI_1,8QQD_1 176 4
1JQI_1,7ABK_1 148 4
8QQD_1,7ABK_1 138 4
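The value 4 in the last column is what one would expect for a longest common contiguous subword. For example, LIEL occurs in both 8QQD_1 and 7ABK_1. The sketch below finds such a subword by standard dynamic programming. The function name is illustrative, and the inputs are short excerpts of the two sequences, not the full sequences.

```python
def longest_common_subword(x: str, y: str) -> str:
    """Return one longest contiguous subword shared by x and y."""
    best_len, best_end = 0, 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common suffix ending at x[i-1] and y[j-1].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return x[best_end - best_len:best_end]

excerpt_8qqd = "FTLIELLIV"    # start of the 8QQD_1 sequence
excerpt_7abk = "ELELIELRRA"   # excerpt from the 7ABK_1 sequence
print(longest_common_subword(excerpt_8qqd, excerpt_7abk))
```

The table only needs the length, so the two rolling rows `prev` and `curr` are enough. Memory is proportional to the shorter dimension rather than to the full \(|x| \times |y|\) grid.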

Newick tree

 
(
	1JQI_1:85.00,
	(
		7ABK_1:69,8QQD_1:69
	):16.00
);

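Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=(\mathrm{LZ}(xy)-\mathrm{LZ}(x))+(\mathrm{LZ}(yx)-\mathrm{LZ}(y))\). (This makes the 4TYN sequence AAAAAA a close match...)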
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{513}{\log_{20} 513}-\frac{125}{\log_{20}125}\right)\approx 114\), where \(513=388+125\) is the combined length of the two sequences.
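The arithmetic in this estimate can be checked directly. In the sketch below the helper name `capacity` is illustrative, and the factors 0.85 and 0.8 are copied from the formula as given, without any claim about where they come from.

```python
import math

def capacity(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_b(n), the normalizing term used on this page."""
    return n / math.log(n, alphabet_size)

combined_length = 388 + 125  # lengths of 1JQI_1 and 8QQD_1
estimate = 0.85 * 0.8 * (capacity(combined_length) - capacity(125))
print(round(estimate, 1))
```

The result falls between 114 and 115, so the quoted figure of 114 reads as a truncated value.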
Status Protein1 Protein2 d d1/2
Query variables 1JQI_1 8QQD_1 145 94

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
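The interpolation \(\delta\) can be sketched as a one-line function of the two directed terms. In the example, the larger term 145 is the value of \(d\) from the table. The smaller term 43 is not stated on the page. It is a hypothetical value back-derived from \(d_1/2 = 94\) under the assumption that \(d_1\) is the sum of the two terms.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# 145 is d from the table; 43 is a back-derived, hypothetical smaller term
# chosen so that (43 + 145) / 2 equals the tabulated d1/2 = 94.
smaller, larger = 43, 145
print(delta(smaller, larger, 0))    # alpha = 0 selects the maximum, i.e. d
print(delta(smaller, larger, 0.5))  # alpha = 1/2 gives the average, i.e. d1/2
```

Intermediate values of \(\alpha\) between 0 and 1/2 move continuously from the maximum toward the average of the two terms.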