CoV2D BrowserTM

Parikh vectors
8PXB_1 7HIB_1 5XXM_1 Letter Amino acid
22 8 35 P Proline
8 2 10 W Tryptophan
8 4 5 C Cysteine
76 8 67 L Leucine
24 12 40 K Lysine
44 17 62 A Alanine
18 9 44 D Aspartic acid
54 3 30 F Phenylalanine
23 7 25 Y Tyrosine
19 2 24 Q Glutamine
27 11 51 E Glutamic acid
36 6 42 I Isoleucine
17 2 12 H Histidine
19 5 30 M Methionine
33 11 48 S Serine
40 11 45 T Threonine
28 16 51 V Valine
14 9 39 R Arginine
25 8 33 N Asparagine
39 12 67 G Glycine
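
As an illustration of how a row of the table above could be reproduced, here is a minimal sketch that counts each amino-acid letter in a sequence. The names `AMINO_ACIDS` and `parikh_vector` are hypothetical and not taken from the CoV2D code base; the example input is the first ten residues of 7HIB_1.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# First ten residues of the 7HIB_1 sequence listed on this page.
example = "GAMAPSYRVK"
vec = parikh_vector(example)
```

The entries of a Parikh vector always sum to the length of the word, which gives a quick consistency check against the Length column further down the page.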

>8PXB_1|Chains A, B|Sodium/hydrogen exchanger 9|Equus caballus (9796)
>7HIB_1|Chains A, B, C, D|Non-structural protein 3|Chikungunya virus (37124)
>5XXM_1|Chains A, B|Periplasmic beta-glucosidase|Bacteroides thetaiotaomicron (strain ATCC 29148 / DSM 2079 / NCTC 10582 / E50 / VPI-5482) (226186)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8PXB , Knot 222 574 0.82 40 267 523
MSEKDEYQFQHQGAVELLVFNFLLILTILTIWLFKNHRFRFLHETGGAMVYGLIMGLILRYATAPTDIESGTVYDCGKLAFSPSTLLINITDQVYEYKYKREISQHNINPHLGNAILEKMTFDPEIFFNVLLPPIIFHAGYSLKKRHFFQNLGSILTYAFLGTAISCIVIGLIMYGFVKAMVYAGQLKNGDFHFTDCLFFGSLMSATDPVTVLAIFHELHVDPDLYTLLFGESVLNDAVAIVLTYSISIYSPKENPNAFDAAAFFQSVGNFLGIFAGSFAMGSAYAVVTALLTKFTKLCEFPMLETGLFFLLSWSAFLSAEAAGLTGIVAVLFCGVTQAHYTYNNLSLDSKMRTKQLFEFMNFLAENVIFCYMGLALFTFQNHIFNALFILGAFLAIFVARACNIYPLSFLLNLGRKHKIPWNFQHMMMFSGLRGAIAFALAIRDTESQPKQMMFSTTLLLVFFTVWVFGGGTTPMLTWLQIRVGVDLDEDLKERPSSHQEANNLEKSTTKTESAWLFRMWYGFDHKYLKPILTHSGPPLTTTLPEWCGPISRLLTSPQAYGEQLKEGENLYFQ
7HIB , Knot 82 163 0.85 40 130 161
GAMAPSYRVKRMDIAKNDEECVVNAANPRGLPGDGVCKAVYKKWPESFKNSATPVGTAKTVMCGTYPVIHAVGPNFSNYTESEGDRELAAAYREVAKEVTRLGVNSVAIPLLSTGVYSGGKDRLTQSLNHLFTAMDSTDADVVIYCRDKEWEKKISEAIQMRT
5XXM , Knot 288 760 0.83 40 284 686
MAAQKSPQDMDRFIDALMKKMTVEEKIGQLNLPVTGEITTGQAKSSDIAAKIKRGEVGGLFNLKGVEKIRDVQKQAVEQSRLGIPLLFGMDVIHGYETMFPIPLGLSCTWDMTAIEESARIAAIEASADGISWTFSPMVDISRDPRWGRVSEGSGEDPFLGAMIAEAMVLGYQGKDMQRNDEIMACVKHFALYGAGEGGRDYNTVDMSRQRMFNEYMLPYEAAVEAGVGSVMASFNEVDGVPATANKWLMTDVLRGQWGFNGFVVTDYTGISEMIDHGIGDLQTVSARAINAGVDMDMVSEGFVSTLKKSIQEGKVSMETLNTACRRILEAKYKLGLFDNPYKYCDLKRPARDIFTKAHRDAARRIAAESFVLLKNDNVTLRPGTPAEPLLPFNPKGNIAVIGPLADSRTNMPGTWSVAAVLDRCPSLVEGLKEMTAGKANILYAKGSNLISDASYEERATMFGRSLNRDNRTDEQLLNEALTVANQSDIIIAALGESSEMSGESSSRTDLNIPDVQQNLLKELLKTGKPVVLVLFTGRPLTLTWEQEHVPAILNVWFGGSEAAYAIGDALFGYVNPGGKLTMSFPKNVGQIPLYYAHKNTGRPLAQGKWFEKFRSNYLDVDNEPLYPFGYGLSYTTFSYGDIDLSRSTIDMTGELTAAVMVTNTGTWPGSEVVQLYIRDLVGSTTRPVKELKGFQKIFLEPGQSEIVRFKIAPEMLRYYNYDLQLVAEPGEFEVMIGTNSRDVKSARFTLKLEHHHHHH
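
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be checked with a few lines of code. The sketch below assumes that \(\mathrm{LZ}(w)\) is the phrase count of the LZ76 exhaustive-history parsing; the page does not state which variant it uses, so the phrase counter is an assumption, and the function names are hypothetical. The simple implementation is quadratic or worse in \(n\), which is acceptable for sequences of this size.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can be copied from a start position < i
        # (the copy may overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_q(n), with q = 20 for amino acids."""
    return lz / (n / math.log(n, alphabet_size))

# Values from the 8PXB row of the table: LZ(w) = 222, n = 574.
ratio_8pxb = normalized_lz(222, 574)
```

With the table's values for 8PXB, `ratio_8pxb` comes out close to the listed 0.82.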

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
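
A minimal sketch of these definitions, reading \(P_w(k)\) as the set of distinct contiguous subwords of length \(k\); the function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the number of distinct subwords of length k in w."""
    return len(subwords(w, k))

# First ten residues of 7HIB_1; the letter A occurs twice, so p_w(1) = 9.
example = "GAMAPSYRVK"
p1 = subword_complexity(example, 1)
p2 = subword_complexity(example, 2)
```

For a word of length \(n\) over a \(q\)-letter alphabet, \(p_w(k)\) is at most \(\min(q^k, n-k+1)\).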

\(|P_{f(8PXB_1)}(2) \setminus P_{f(7HIB_1)}(2)|=169\), \(|P_{f(7HIB_1)}(2) \setminus P_{f(8PXB_1)}(2)|=32\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for this pair, \(Z_2 = 169 + 32 = 201\).

Hydrophobic-polar version of Sequence 1: 1000000010001110111101111101101111000010110001111101111111100101100100101000101110100111010001000000001000010101101110010101011101111111101100100001100110110011110110011111110111011101101001010100011110110100110111110010101010011110011001111110001010010001011011111001101111111011110101110111001001001111001111110101110101111011111110110010000001010001000011011011100111001111110100011011111111111111010010110111011000011101001111011011111111100000010011100011111101111111001110110101110100010001000001001000000000111101101100001011100011110001101011100110010101001001001010
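
The hydrophobic-polar string can be reproduced with a simple letter mapping. The page does not state which residues it treats as hydrophobic; the set below is an assumption inferred by aligning the start of the bit string with the start of the 8PXB_1 sequence, and the names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Assumed hydrophobic residues (inferred from the page's output, not documented).
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First ten residues of 8PXB_1.
prefix_bits = hp_encode("MSEKDEYQFQ")
```

Under this assumed mapping, cysteine (C) and tyrosine (Y) are encoded as polar, which differs from some other hydrophobic-polar conventions.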
Pair \(Z_2\) Length of longest common subword (contiguous)
8PXB_1,7HIB_1 201 3
8PXB_1,5XXM_1 155 4
7HIB_1,5XXM_1 198 4
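
Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two subword sets. The last column is read here as the length of the longest common contiguous subword, which is an assumption: values of 3 or 4 are too small to be a non-contiguous common subsequence of sequences this long. All function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword_length(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

z_toy = z_distance("ABC", "BCD", 2)
lcs_toy = longest_common_subword_length("ABCDE", "XBCDY")
```

In the toy example, the subword sets are {AB, BC} and {BC, CD}, so `z_toy` is 2, and the longest shared block is BCD.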

Newick tree

 
[
	7HIB_1:10.13,
	[
		8PXB_1:77.5,5XXM_1:77.5
	]:28.63
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{737}{\log_{20} 737}-\frac{163}{\log_{20} 163}\right)\approx 162\), where \(737 = 574 + 163\) is the combined length of the two query sequences.
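
The arithmetic behind this estimate can be checked directly. The sketch below only re-evaluates the displayed expression; the factors 0.85 and 0.8 are taken from the page as given, and the helper name `n_over_log` is hypothetical.

```python
import math

def n_over_log(n: int, alphabet_size: int = 20) -> float:
    """n / log_q(n), the normalizer used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)

# 737 = 574 + 163 is the combined length of 8PXB_1 and 7HIB_1.
expected = 0.85 * 0.8 * (n_over_log(737) - n_over_log(163))
```

Rounded to the nearest integer, `expected` agrees with the value 162 quoted above.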
Status Protein1 Protein2 d d1/2
Query variables 8PXB_1 7HIB_1 199 128
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
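
To make the interpolation concrete, here is a sketch under two assumptions: that \(\min\) and \(\max\) are taken over the two complexity increments \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\), as in Otu and Sayood's definitions \(d=\max\) and \(d_1=\text{sum}\), and that \(c\) is the LZ76 phrase count. Neither is stated on this page, and the function names are hypothetical.

```python
def lz76_complexity(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can be copied from a start position < i.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def increments(s: str, q: str) -> tuple[int, int]:
    """The two complexity increments c(SQ) - c(S) and c(QS) - c(Q)."""
    c = lz76_complexity
    return c(s + q) - c(s), c(q + s) - c(q)

def delta(s: str, q: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(s, q)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

s, q = "AAAA", "ABAB"
d = max(increments(s, q))    # Otu-Sayood d
d1 = sum(increments(s, q))   # Otu-Sayood d1
```

With these definitions, `delta(s, q, 0)` equals `d` and `delta(s, q, 0.5)` equals `d1 / 2`, matching the two cases of the displayed formula.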