CoV2D BrowserTM

Parikh vectors
1KJU_1 9CHX_1 7HPQ_1 Letter Amino acid
33 14 2 M Methionine
86 26 15 A Alanine
23 18 3 Q Glutamine
68 26 13 G Glycine
93 43 19 L Leucine
45 9 5 P Proline
22 15 7 Y Tyrosine
82 44 22 V Valine
48 17 2 R Arginine
50 28 9 D Aspartic acid
78 22 6 E Glutamic acid
53 27 13 K Lysine
57 22 11 S Serine
62 20 6 T Threonine
36 19 16 N Asparagine
24 12 3 C Cysteine
37 33 5 F Phenylalanine
13 8 0 W Tryptophan
12 6 6 H Histidine
72 40 6 I Isoleucine
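
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch of that count follows; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative names chosen here, not taken from the CoV2D code.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "MAQGLPYVRDEKSTNCFWHI"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter."""
    counts = Counter(sequence)
    # Letters that never occur are reported with count 0 (e.g. W in 7HPQ_1).
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# Small demonstration on the first twelve residues of the 1KJU_1 sequence.
demo = parikh_vector("MEAAHSKSTEEC")
print({letter: n for letter, n in demo.items() if n > 0})
```

The entries of a Parikh vector always sum to the sequence length; for the 1KJU_1 column above the sum is 994.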

>1KJU_1|Chain A|Sarcoplasmic/endoplasmic reticulum calcium ATPase 1a|Oryctolagus cuniculus (9986)
>9CHX_1|Chain A|Beta-2 adrenergic receptor,Calcineurin subunit B type 1|Homo sapiens (9606)
>7HPQ_1|Chains A, B|Non-structural protein 3|Severe acute respiratory syndrome coronavirus 2 (2697049)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1KJU , Knot 367 994 0.85 40 320 881
MEAAHSKSTEECLAYFGVSETTGLTPDQVKRHLEKYGHNELPAEEGKSLWELVIEQFEDLLVRILLLAACISFVLAWFEEGEETITAFVEPFVILLILIANAIVGVWQERNAENAIEALKEYEPEMGKVYRADRKSVQRIKARDIVPGDIVEVAVGDKVPADIRILSIKSTTLRVDQSILTGESVSVIKHTEPVPDPRAVNQDKKNMLFSGTNIAAGKALGIVATTGVSTEIGKIRDQMAATEQDKTPLQQKLDEFGEQLSKVISLICVAVWLINIGHFNDPVHGGSWIRGAIYYFKIAVALAVAAIPEGLPAVITTCLALGTRRMAKKNAIVRSLPSVETLGCTSVICSDKTGTLTTNQMSVCKMFIIDKVDGDFCSLNEFSITGSTYAPEGEVLKNDKPIRSGQFDGLVELATICALCNDSSLDFNETKGVYEKVGEATETALTTLVEKMNVFNTEVRNLSKVERANACNSVIRQLMKKEFTLEFSRDRKSMSVYCSPAKSSRAAVGNKMFVKGAPEGVIDRCNYVRVGTTRVPMTGPVKEKILSVIKEWGTGRDTLRCLALATRDTPPKREEMVLDDSSRFMEYETDLTFVGVVGMLDPPRKEVMGSIQLCRDAGIRVIMITGDNKGTAIAICRRIGIFGENEEVADRAYTGREFDDLPLAEQREACRRACCFARVEPSHKSKIVEYLQSYDEITAMTGDGVNDAPALKKAEIGIAMGSGTAVAKTASEMVLADDNFSTIVAAVEEGRAIYNNMKQFIRYLISSNVGEVVCIFLTAALGLPEALIPVQLLWVNLVTDGLPATALGFNPPDLDIMDRPPRSPKEPLISGWLFFRYMAIGGYVGAATVGAAAWWFMYAEDGPGVTYHQLTHFMQCTEDHPHFEGLDCEIFEAPEPMTMALSVLVTIEMCNALNSLSENQSLMRMPPWVNIWLLGSICLSMSLHFLILYVDPLPMIFKLKALDLTQWLMVLKISLPVIGLDEILKFIARNYLEG
9CHX , Knot 184 449 0.83 40 237 423
DEVWVVGMGIVMSLIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGLAVVPFGAAHILMKMWTFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYFAITSPFKYQSLLTKNKARVIILMVWIVSGLTSFLPIQMHWYRATHQEAINCYAEETCCDFFTNQAYAIASSIVSFYVPLVIMVFVYSRVFQEAKRQLDADEIKRLGKRFKKLDLDNSGSLSVEEFMSLPELQQNPLVQRVIDIFDTDGNGEVDFKEFIEGVSQFSVKGDKEQKLRFAFRIYDMDKDGYISNGELFQVLKMMVGNNLKDTQLQQIVDKTIINADKDGDGRISFEEFCAVVGGLDIHKKMVVDVKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIQDNLIRKEVYILLNWIGYVNSGFNPLIYCRSPDFRIAFQELLCLRRDDLKAYGNGY
7HPQ , Knot 80 169 0.81 38 119 162
SMVNSFSGYLKLTDNVYIKNADIVEEAKKVKPTVVVNAANVYLKHGGGVAGALNKATNNAMQVESDDYIATNGPLKVGGSCVLSGHNLAKHCLHVVGPNVNKGEDIQLLKSAYENFNQHEVLLAPLLSAGIFGADPIHSLRVCVDTVRTNVYLAVFDKNLYDKLVSSFL
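
The fourth column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below computes that ratio and also includes one common exhaustive-history (LZ76-style) phrase count. The page does not say which LZ variant produced its numbers, so `lz76_complexity` is an assumption and is not claimed to reproduce the values 367, 184 and 80; both function names are illustrative.

```python
import math

def lz76_complexity(w: str) -> int:
    """Count phrases in an exhaustive-history (LZ76-style) parsing of w.

    ASSUMPTION: this is one standard variant, not necessarily the one
    used to produce the table above.
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from the text
        # that ends before the phrase's own last letter.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n) for alphabet size b (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))

# Ratios for the three rows of the table, from the listed LZ(w) and n.
for code, lz, n in [("1KJU", 367, 994), ("9CHX", 184, 449), ("7HPQ", 80, 169)]:
    print(code, normalized_lz(lz, n))
```

With the listed values the ratios agree with the table to within 0.01.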

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
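
As a small illustration of these definitions, the sketch below collects the contiguous subwords of a given length and counts them. It reads "subword" as a contiguous block of exactly \(n\) letters; the function names are illustrative.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

# Toy example: "ABAB" has two distinct subwords of length 2, "AB" and "BA".
print(sorted(subwords("ABAB", 2)), subword_complexity("ABAB", 2))
```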

\(|P_{f(1KJU_1)}(2) \setminus P_{f(9CHX_1)}(2)|=115\), \(|P_{f(9CHX_1)}(2) \setminus P_{f(1KJU_1)}(2)|=32\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1011000000001101110000110100100010001000111001001101110010011101111110101111110010001011101111111111011111100001001101100001011010010000100101001111011011110011101011010000101000110100101100001110101100000011101001111011111100110001101000111000000110001001100100110110111111011010011011011011100101111111111101111110001111000110001110011010011000110000010100001010011110010101001001010100011010110000110010101110110101100000101000011000110100011001100101100010010010010100011001100010101000000101000110000111100111011101110000010110001110111000110110011010001001111000011000011100000110000010111111110110001110101000111011110100010111100011111000011001001001001111000010001001101010000011001000001011010110011110010111111010111001001111000100111110010110001001100110001101101110111111011111011110110011110111101101011001100100111011111001111101111011111111101001111000010011000000101011000110110110111011101010011001000001101111101111101010101011110101111110101101001111101011111100110111000101
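
The hydrophobic-polar string replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not state which residues it treats as hydrophobic, so the set in this sketch is an assumption: it was inferred by comparing the start of the 1KJU_1 sequence with the start of the 0/1 string above.

```python
# ASSUMPTION: residues mapped to 1, inferred from the first residues of
# 1KJU_1 and the 0/1 string above; the page does not list this set.
HYDROPHOBIC = frozenset("AFGILMPVW")

def hydrophobic_polar(sequence: str) -> str:
    """Map each residue to '1' if assumed hydrophobic, otherwise '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 24 residues of the 1KJU_1 sequence.
print(hydrophobic_polar("MEAAHSKSTEECLAYFGVSETTGL"))
```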
Pair \(Z_2\) Length of longest common subword
1KJU_1,9CHX_1 147 4
1KJU_1,7HPQ_1 227 4
9CHX_1,7HPQ_1 190 3
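
The two columns of this table can be sketched directly from the subword sets. \(Z_k\) is the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\); for 1KJU_1 and 9CHX_1 this is \(115+32=147\). The second function reads the last column as the length of the longest common contiguous subword, which is an assumption of this sketch, as are the function names.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x: str, y: str) -> int:
    """Largest k such that x and y share a contiguous subword of length k."""
    k = 0
    # A shared subword of length k + 1 implies one of length k, so the
    # first k + 1 with no shared subword ends the search.
    while subwords(x, k + 1) & subwords(y, k + 1):
        k += 1
    return k

print(z_distance("ABAB", "ABCA", 2), longest_common_subword_length("ABCDE", "XBCDY"))
```

This quadratic search is adequate for sequences of a few hundred to a thousand residues; a suffix automaton would be the usual choice for much longer inputs.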

Newick tree

 
(
	7HPQ_1:11.15,
	(
		1KJU_1:73.5,9CHX_1:73.5
	):39.65
);

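Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=(\mathrm{LZ}(xy)-\mathrm{LZ}(x))+(\mathrm{LZ}(yx)-\mathrm{LZ}(y))\). (This makes the 4TYN sequence AAAAAA a close match...)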
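Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1443}{\log_{20} 1443}-\frac{449}{\log_{20} 449}\right)\approx 254\), where \(1443=994+449\) is the length of the concatenation of the two sequences.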
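
The arithmetic in this estimate can be checked numerically. The sketch below only evaluates the displayed expression; the helper name `length_scale` is illustrative, and the factors 0.85 and 0.8 are taken from the formula as given.

```python
import math

def length_scale(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_b(n), the growth scale of LZ-complexity for length n."""
    return n / math.log(n, alphabet_size)

# 1443 and 449 are the lengths appearing in the formula above.
expected = 0.85 * 0.8 * (length_scale(1443) - length_scale(449))
print(round(expected))
```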
Status Protein1 Protein2 d d1/2
Query variables 1KJU_1 9CHX_1 321 230
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
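
A minimal sketch of this interpolation, assuming \(\mathrm{min}\) and \(\mathrm{max}\) refer to the two complexity increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). The complexity function is passed in as a parameter because the page does not specify its LZ variant; all names are illustrative. The value 139 used in the example is inferred from the table as \(2\cdot 230-321\) and is not listed on the page.

```python
from typing import Callable

def increments(x: str, y: str, c: Callable[[str], int]) -> tuple[int, int]:
    """Return (c(xy) - c(x), c(yx) - c(y)) for a complexity function c."""
    return c(x + y) - c(x), c(y + x) - c(y)

def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min + (1 - alpha) * max of the two increments a and b."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# With increments 321 and 139 (139 is inferred, see above):
print(delta(0, 321, 139))    # alpha = 0 gives d
print(delta(0.5, 321, 139))  # alpha = 1/2 gives d1 / 2
```

Any callable that maps a string to an integer complexity can be supplied as `c`, for example an LZ76 phrase counter.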