CoV2D Browser™

Parikh vectors
5WRZ_1 4BAC_1 1MXD_1 Letter Amino acid
25 18 25 I Isoleucine
10 4 4 M Methionine
14 31 16 T Threonine
16 20 21 N Asparagine
2 3 5 C Cysteine
14 20 7 Q Glutamine
38 45 27 L Leucine
1 7 26 W Tryptophan
20 20 29 A Alanine
23 20 45 G Glycine
10 13 12 H Histidine
10 10 20 F Phenylalanine
16 14 37 Y Tyrosine
33 28 19 K Lysine
16 32 20 P Proline
29 30 21 S Serine
21 27 31 V Valine
11 20 14 R Arginine
25 19 33 D Aspartic acid
18 15 23 E Glutamic acid
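The Parikh vector of a sequence records how many times each letter occurs, so each column above should sum to that sequence's length (352, 396 and 435). Below is a minimal Python sketch. It assumes only the 20 standard one-letter codes matter, so any other letter (for example X) is silently ignored. The names `AMINO_ACIDS` and `parikh_vector` are illustrative.

```python
from collections import Counter

# Row order of the Parikh-vector table above (the 20 standard amino acids).
AMINO_ACIDS = "IMTNCQLWAGHFYKPSVRDE"


def parikh_vector(seq: str) -> list[int]:
    """Occurrence count of each amino-acid letter, in AMINO_ACIDS order."""
    counts = Counter(seq)  # missing letters count as 0
    return [counts[a] for a in AMINO_ACIDS]


# Usage: the first ten residues of 5WRZ_1.
v = parikh_vector("HMKSKLPKPV")
print(dict(zip(AMINO_ACIDS, v)))
```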

>5WRZ_1|Chains A, B|Poly [ADP-ribose] polymerase 1|Homo sapiens (9606)
>4BAC_1|Chains A, B|INTEGRASE|HUMAN SPUMARETROVIRUS (11963)
>1MXD_1|Chain A|alpha amylase|Pyrococcus woesei (2262)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5WRZ , Knot 152 352 0.84 40 206 335
HMKSKLPKPVQDLIKMIFDVESMKKAMVEYEIDLQKMPLGKLSKRQIQAAYSILSEVQQAVSQGSSDSQILDLSNRFYTLIPHDFGMKKPPLLNNADSVQAKAEMLDNLLDIEVAYSLLRGGSDDSSKDPIDVNYEKLKTDIKVVDRDSEEAEIIRKYVKNTHATTHNAYDLEVIDIFKIEREGECQRYKPFKQLHNRRLLWHGSRTTNFAGILSQGLRIAPPEAPVTGYMFGKGIYFADMVSKSANYCHTSQGDPIGLILLGEVALGNMYELKHASHISKLPKGKHSVKGLGKTTPDPSANISLDGVDVPLGTGISSGVNDTSLLYNEYIVYDIAQVNLKYLLKLKFNFKT
4BAC , Knot 170 396 0.85 40 223 378
GSHMCNTKKPNLDAELDQLLQGHYIKGYPKQYTYFLEDGKVKVSRPEGVKIIPPQSDRQKIVLQAHNLAHTGREATLLKIANLYWWPNMRKDVVKQLGRCQQCLITNASNKASGPILRPDRPQKPFDKFFIDYIGPLPPSQGYLYVLVVVDGMTGFTWLYPTKAPSTSATVKSLNVLTSIAIPKVIHSDQGAAFTSSTFAEWAKERGIHLEFSTPYHPQSGSKVERKNSDIKRLLTKLLVGRPTKWYDLLPVVQLALNNTYSPVLKYTPHQLLFGIDSNTPFANQDTLDLTREEELSLLQEIRTSLYHPSTPPASSRSWSPVVGQLVQERVARPASLRPRWHKPSTVLKVLNPRTVVILDHLGNNRTVSIDNLKPTSHQNGTTNDTATMDHLEKNE
1MXD , Knot 181 435 0.84 40 240 423
AKYLELEEGGVIMQAFYWDVPGGGIWWDHIRSKIPEWYEAGISAIWLPPPSKGMSGGYSMGYDPYDYFDLGEYYQKGTVETRFGSKEELVRLIQTAHAYGIKVIADVVINHRAGGDLEWNPFVGDYTWTDFSKVASGKYTANYLDFHPNELHCCDEGTFGGFPDICHHKEWDQYWLWKSNESYAAYLRSIGFDGWRFDYVKGYGAWVVRDWLNWWGGWAVGEYWDTNVDALLSWAYESGAKVFDFPLYYKMDEAFDNNNIPALVYALQNGQTVVSRDPFKAVTFVANHDTDIIWNKYPAYAFILTYEGQPVIFYRDFEEWLNKDKLINLIWIHDHLAGGSTTIVYYDNDELIFVRNGDSRRPGLITYINLSPNWVGRWVYVPKFAGACIHEYTGNLGGWVDKRVDSSGWVYLEAPPHDPANGYYGYSVWSYCGVG
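The LZ-complexity column can be sketched with the Kaspar–Schuster scan for LZ76 complexity. It is an assumption that this matches the page's own implementation, because parsing conventions vary slightly between implementations. The ratio divides by \(n/\log_{20} n\), as in the column header.

```python
import math


def lz76(s: str) -> int:
    """Number of components in the LZ76 exhaustive parsing of s
    (Kaspar-Schuster scan)."""
    n = len(s)
    if n <= 1:
        return n
    i, k, l = 0, 1, 1  # i: earlier start being tried, k: match length, l: start of current component
    c, k_max = 1, 1    # the first symbol is always its own component
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:      # current component reaches the end of s
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:         # every earlier start tried: close the component
                c += 1
                l += k_max
                if l + 1 > n:  # nothing left to parse
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_{alphabet_size} n)."""
    return lz * math.log(n, alphabet_size) / n
```

For the classic binary example, `lz76("0001101001000101")` returns 6, from the parsing 0·001·10·100·1000·101. Plugging the table's LZ and \(n\) values into `normalized_lz` gives about 0.845, 0.857 and 0.844. The printed values 0.84, 0.85 and 0.84 are therefore consistent with truncation to two decimals rather than rounding.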

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be its cardinality. Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
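The definitions of \(P_w(n)\) and \(p_w(n)\) translate directly into code. A word \(w\) has only \(|w|-n+1\) length-\(n\) intervals, so \(p_w(n)\le\min(20^n,\,|w|-n+1)\). Below is a minimal Python sketch; the function names are illustrative.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): distinct subwords (contiguous intervals) of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_count(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


print(sorted(subwords("ABAB", 2)))  # ['AB', 'BA']
print(subword_count("ABAB", 3))     # 2
```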

\(|P_{f(5WRZ_1)}(2) \setminus P_{f(4BAC_1)}(2)|=69\) and \(|P_{f(4BAC_1)}(2) \setminus P_{f(5WRZ_1)}(2)|=86\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). This is the numerator of an LZ76-style (set-of-subwords) Jaccard distance on length-\(k\) subwords. For example, \(Z_2(f(5WRZ_1),f(4BAC_1))=69+86=155\).
Hydrophobic-polar version of Sequence 1 (5WRZ_1): 0100011011001101110100100111000101001111010000101100110010011001000001101000100111001110011110010010101011001101011001101100000001101000010001011000000101100010000100001001011011010001000000110010000111010000011111001101111011101011101101101100010000000101111111101111010010010010011010001011100010101010101101111011001100001100001100110101001101010100
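The binary string appears to encode each residue of 5WRZ_1 as 1 (hydrophobic) or 0 (polar). The sketch below assumes the nonpolar set G, A, V, L, I, P, F, M, W. With that set it reproduces the first 80 digits above. However, C and W do not occur in those 80 positions, so their classification here is an assumption and has not been checked against the page.

```python
# Assumed hydrophobic (nonpolar) residues; every other letter is encoded as polar.
HYDROPHOBIC = frozenset("GAVLIPFMW")


def hp_encode(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in seq)


print(hp_encode("HMKSKLPKPV"))  # 0100011011
```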
Pair \(Z_2\) Length of longest common subword
5WRZ_1,4BAC_1 155 3
5WRZ_1,1MXD_1 176 3
4BAC_1,1MXD_1 159 3
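Both columns of the pair table can be computed directly from the definitions. The sketch below assumes the last column counts contiguous subwords (intervals). A common subsequence of two sequences of this length would be far longer than 3, since 5WRZ_1 and 4BAC_1 each contain at least 38 leucines. The function names are illustrative and are not taken from the CoV2D code.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))


def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y.

    Dynamic programming over suffix matches, O(|x| * |y|) time."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(z_k("ABAB", "ABBA", 2))                     # 1
print(longest_common_subword("ABCDE", "XBCDY"))   # 3
```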

Newick tree

 
(
	1MXD_1:85.87,
	(
		5WRZ_1:77.5,4BAC_1:77.5
	):8.37
);

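Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\,\mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)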
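A rough estimate of the expected distance, where \(748=352+396\) is the length of the concatenation of 5WRZ_1 and 4BAC_1, is \((0.85)(0.8)\left(\frac{748}{\log_{20} 748}-\frac{352}{\log_{20}352}\right)\approx 108.\)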
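The estimate can be reproduced numerically. Here 748 is taken to be \(352+396\), the combined length of 5WRZ_1 and 4BAC_1. The factors 0.85 and 0.8 are copied from the formula as given. The result is about 108.

```python
import math


def log20(x: float) -> float:
    """Logarithm to base 20 (the amino-acid alphabet size)."""
    return math.log(x, 20)


n_x = 352          # |5WRZ_1|
n_xy = 352 + 396   # |5WRZ_1| + |4BAC_1| = 748
estimate = 0.85 * 0.8 * (n_xy / log20(n_xy) - n_x / log20(n_x))
print(estimate)
```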
Status Protein1 Protein2 d d1/2
Query variables 5WRZ_1 4BAC_1 136 129

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
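\[ \delta_\alpha(x,y)= \alpha \min\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\,\mathrm{LZ}(yx)-\mathrm{LZ}(y)\} + (1-\alpha) \max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\,\mathrm{LZ}(yx)-\mathrm{LZ}(y)\}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]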
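Below is a sketch of \(\delta_\alpha\) built on a Kaspar–Schuster LZ76 counter. Using that counter is an assumption, since the page's own LZ implementation may follow slightly different conventions. With the Otu–Sayood definitions (\(d\) is the maximum of the two LZ increments and \(d_1\) is their sum), \(\alpha=0\) recovers \(d\) and \(\alpha=1/2\) recovers \(d_1/2\).

```python
def lz76(s: str) -> int:
    """LZ76 complexity via the Kaspar-Schuster scan."""
    n = len(s)
    if n <= 1:
        return n
    i, k, l, c, k_max = 0, 1, 1, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def increments(x: str, y: str) -> tuple[int, int]:
    """The two LZ increments LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)


def otu_sayood_d(x: str, y: str) -> int:
    return max(increments(x, y))


def otu_sayood_d1(x: str, y: str) -> int:
    return sum(increments(x, y))


def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)
```

Interpolating between the minimum and the maximum makes \(\delta_\alpha\) symmetric in \(x\) and \(y\) for every \(\alpha\).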