CoV2D Browser™

Parikh vectors
4WWX_1 1QIG_1 4OHO_1 Letter Amino acid
20 17 8 Y Tyrosine
22 35 16 N Asparagine
27 26 27 D Aspartic acid
32 44 45 G Glycine
25 34 28 F Phenylalanine
44 43 47 S Serine
43 24 51 A Alanine
28 21 41 I Isoleucine
61 49 69 L Leucine
42 25 34 K Lysine
19 15 17 M Methionine
15 8 9 C Cysteine
21 14 21 H Histidine
27 22 38 T Threonine
6 14 6 W Tryptophan
36 40 46 V Valine
45 24 26 R Arginine
21 16 39 Q Glutamine
58 34 42 E Glutamic acid
26 32 28 P Proline
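
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch of that computation follows; the function name `parikh_vector` and the constant `ALPHABET` are illustrative choices, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# Letter order taken from the rows of the Parikh-vector table.
ALPHABET = "YNDGFSAILKMCHTWVRQEP"

def parikh_vector(seq, alphabet=ALPHABET):
    """Return the list of occurrence counts of each alphabet letter in seq."""
    counts = Counter(seq)
    return [counts[letter] for letter in alphabet]

# Example on the first 12 residues of 4WWX_1.
prefix = "APRQHLLSLTRR"
for letter, count in zip(ALPHABET, parikh_vector(prefix)):
    if count:
        print(letter, count)
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick consistency check against the Length column reported further down.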

>4WWX_1|Chains A[auth B], C[auth E]|V(D)J recombination-activating protein 1|Mus musculus (10090)
>1QIG_1|Chain A|ACETYLCHOLINESTERASE|Torpedo californica (7787)
>4OHO_1|Chains A, B|Glucokinase regulatory protein|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4WWX , Knot 244 618 0.84 40 272 577
APRQHLLSLTRRAQKHRLRELKIQVKEFADKEEGGDVKAVCLTLFLLALRARNEHRQADELEAIMQGRGSGLQPAVCLAIRVNTFLSCSQYHKMYRTVKAITGRQIFQPLHALRNAEKVLLPGYHPFEWQPPLKNVSSRTDVGIIDGLSGLASSVDEYPVDTIAKRFRYDSALVSALMDMEEDILEGMRSQDLDDYLNGPFTVVVKESCDGMGDVSEKHGSGPAVPEKAVRFSFTVMRITIEHGSQNVKVFEEPKPNSELCCKPLCLMLADESDHETLTAILSPLIAEREAMKSSELTLEMGGIPRTFKFIFRGTGYDEKLVREVEGLEASGSVYICTLCDTTRLEASQNLVFHSITRSHAENLQRYEVWRSNPYHESVEELRDRVKGVSAKPFIETVPSIDALHCDIGNAAEFYKIFQLEIGEVYKHPNASKEERKRWQATLDKHLRKRMNLKPIMRMNGNFARKLMTQETVDAVCELIPSEERHEALRELMDLYLKMKPVWRSSCPAKECPESLCQYSFNSQRFAELLSTKFKYRYEGKITNYFHKTLAHVPEIIERDGSIGAWASEGNESGNKLFRRFRKMNARQSKCYEMEDVLKHHWLYTSKYLQKFMNAHNA
1QIG , Knot 216 537 0.84 40 266 516
DDHSELLVNTKSGKVMGTRVPVLSSHISAFLGIPFAEPPVGNMRFRRPEPKKPWSGVWNASTYPNNCQQYVDEQFPGFSGSEMWNPNREMSEDCLYLNIWVPSPRPKSTTVMVWIYGGGFYSGSSTLDVYNGKYLAYTEEVVLVSLSYRVGAFGFLALHGSQEAPGNVGLLDQRMALQWVHDNIQFFGGDPKTVTIFGESAGGASVGMHILSPGSRDLFRRAILQSGSPNCPWASVSVAEGRRRAVELGRNLNCNLNSDEELIHCLREKKPQELIDVEWNVLPFDSIFRFSFVPVIDGEFFPTSLESMLNSGNFKKTQILLGVNKDEGSFFLLYGAPGFSKDSESKISREDFMSGVKLSVPHANDLGLDAVTLQYTDWMDDNNGIKNRDGLDDIVGDHNVICPLMHFVNKYTKFGNGTYLYFFNHRASNLVWPEWMGVIHGYEIEFVFGLPLVKELNYTAEEEALSRRIMHYWATFAKTGNPNEPHSQESKWPLFTTKEQKFIDLNTEPMKVHQRLRVQMCVFWNQFLPKLLNATAC
4OHO , Knot 252 638 0.85 40 270 590
MAHHHHHHDEVDMPGTKRFQHVIETPEPGKWELSGYEAAVPITEKSNPLTQDLDKADAENIVRLLGQCDAEIFQEEGQALSTYQRLYSESILTTMVQVAGKVQEVLKEPDGGLVVLSGGGTSGRMAFLMSVSFNQLMKGLGQKPLYTYLIAGGDRSVVASREGTEDSALHGIEELKKVAAGKKRVIVIGISVGLSAPFVAGQMDCCMNNTAVFLPVLVGFNPVSMARNDPIEDWSSTFRQVAERMQKMQEKQKAFVLNPAIGPEGLSGSSRMKGGSATKILLETLLLAAHKTVDQGIAASQRCLLEILRTFERAHQVTYSQSPKIATLMKSVSTSLEKKGHVYLVGWQTLGIIAIMDGVECIHTFGADFRDVRGFLIGDHSDMFNQKAELTNQGPQFTFSQEDFLTSILPSLTEIDTVVFIFTLDDNLTEVQTIVEQVKEKTNHIQALAHSTVGQTLPIPLKKLFPSIISITWPLLFFEYEGNFIQKFQRELSTKWVLNTVSTGAHVLLGKILQNHMLDLRISNSKLFWRALAMLQRFSGQSKARCIESLLRAIHFPQPLSDDIRAAPISCHVQVAHEKEQVIPIALLSLLFRCSITEAQAHLAAAPSVCEAVRSALAGPGQKRTADPLEILEPDVQG
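
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below recomputes that ratio from the LZ-complexity and length columns; the function name `normalized_lz` is hypothetical. The two-decimal values in the table appear to be truncated rather than rounded, which is an inference from the numbers and not something the page states.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ-complexity divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"4WWX": (244, 618), "1QIG": (216, 537), "4OHO": (252, 638)}
for code, (lz, n) in rows.items():
    print(code, round(normalized_lz(lz, n), 4))
```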

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
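
As a small illustration of these definitions, the set of distinct subwords of a fixed length can be collected by sliding a window over the word. The names `subwords` and `subword_count` are illustrative only.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_count(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# The word "abab" has only two distinct subwords of length 2: "ab" and "ba".
print(sorted(subwords("abab", 2)))
print(subword_count("abab", 2))
```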

\(|P_{f(4WWX_1)}(2) \setminus P_{f(1QIG_1)}(2)|=85\), \(|P_{f(1QIG_1)}(2) \setminus P_{f(4WWX_1)}(2)|=79\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110001101000100001001010100110000110101101011111101000000100101110101011011101110100110000000100010110100110110110010011111001101011100100000111101101110010001100110010000111011101000110110000100010111011100000111010000101111100110101011010100100010110010100010001101111000000010111011110001100001010111110010111010100001100101101010101001000001010001110010000100100001100010000100100010110101110011010110001101101001101011010001010000000101010001000101011101010110011000010110011100000011001101010101110000110001001000010000110110001000001010001000110110110001011111001000100110010010100000001001100011000001001101001
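
The page does not say which residues count as hydrophobic. Comparing the start of the binary string with the start of the 4WWX sequence suggests that A, F, G, I, L, M, P, V and W map to 1 and every other residue to 0. The sketch below encodes that inferred mapping; treat the set `HYDROPHOBIC` as an assumption, not as the site's documented definition.

```python
# Assumed hydrophobic set, inferred from the printed binary string.
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp_string(seq):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First 12 residues of 4WWX_1.
print(hp_string("APRQHLLSLTRR"))
```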
Pair \(Z_2\) Length of longest common subsequence
4WWX_1,1QIG_1 164 4
4WWX_1,4OHO_1 134 4
1QIG_1,4OHO_1 156 4
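
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets. For the first pair this matches the one-sided counts given earlier: 85 + 79 = 164. A minimal sketch, with hypothetical function names:

```python
def subwords(w, k):
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def one_sided(x, y, k):
    """Number of length-k subwords of x that do not occur in y."""
    return len(subwords(x, k) - subwords(y, k))

def z_k(x, y, k):
    """Z_k(x, y): subwords in exactly one of x and y."""
    return one_sided(x, y, k) + one_sided(y, x, k)

# "abba" contains "bb", which "abab" lacks; all other 2-subwords are shared.
print(z_k("abab", "abba", 2))
```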

Newick tree

 
[
	1QIG_1:83.91,
	[
		4WWX_1:67,4OHO_1:67
	]:16.91
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1155 }{\log_{20} 1155}-\frac{537}{\log_{20}537})\approx 159.\)
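
The estimate can be checked step by step. Note that 1155 = 618 + 537 is the combined length of 4WWX_1 and 1QIG_1; reading it as the length of the concatenated sequence is an assumption, and the factors 0.85 and 0.8 are taken as given.
\[ \log_{20} 1155 \approx 2.354, \qquad \frac{1155}{\log_{20} 1155} \approx 490.66, \qquad \log_{20} 537 \approx 2.098, \qquad \frac{537}{\log_{20} 537} \approx 255.92 \]
\[ (0.85)(0.8)(490.66-255.92) = 0.68 \times 234.74 \approx 159.6 \]
The reported value 159 corresponds to this result with the fractional part dropped.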
Status Protein1 Protein2 d d1/2
Query variables 4WWX_1 1QIG_1 205 194.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
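
The interpolation can be sketched in code under two assumptions that the page does not spell out. First, \(d\) and \(d_1\) are taken in their usual Otu–Sayood form, \(d=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1=c(xy)-c(x)+c(yx)-c(y)\), so that min and max refer to the two one-sided differences. Second, \(c\) is computed with a plain LZ76 exhaustive-history parse, which may differ in detail from the variant the site uses. All function names are illustrative.

```python
def lz76(s):
    """Number of components in an LZ76 exhaustive-history parse of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the component while it can be copied from earlier text
        # (the source may overlap the component itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def one_sided_costs(x, y):
    """The two differences c(xy) - c(x) and c(yx) - c(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two one-sided costs."""
    a, b = one_sided_costs(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "ACGTACGTTGCA", "AAAAAACCCCCC"
print(delta(x, y, 0))    # d
print(delta(x, y, 0.5))  # d_1 / 2
```

With \(\alpha=0\) only the larger one-sided cost survives, giving \(d\); with \(\alpha=1/2\) the two costs are averaged, giving \(d_1/2\).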