CoV2D Browser™

Parikh vectors
5HWP_1 6XLH_1 7WGI_1 Letter Amino acid
32 27 29 R Arginine
23 46 26 D Aspartic acid
19 40 24 T Threonine
71 44 32 A Alanine
27 96 45 E Glutamic acid
41 25 19 G Glycine
13 73 34 K Lysine
4 34 28 F Phenylalanine
16 27 20 P Proline
23 45 17 S Serine
4 5 6 W Tryptophan
4 28 20 N Asparagine
7 0 11 C Cysteine
9 4 10 H Histidine
10 13 15 M Methionine
17 19 14 Y Tyrosine
13 22 18 Q Glutamine
13 47 26 I Isoleucine
41 68 50 L Leucine
32 42 26 V Valine
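
Each column of the table above is the Parikh vector of one sequence: the number of occurrences of each amino-acid letter, irrespective of order. A minimal Python sketch of that count follows. The name `parikh_vector` and the constant `ALPHABET` are illustrative choices rather than anything from the site's code, and the letter order is simply copied from the table rows.

```python
from collections import Counter

# Letter order copied from the rows of the table above (an illustrative choice).
ALPHABET = "RDTAEGKFPSWNCHMYQILV"

def parikh_vector(seq: str, alphabet: str = ALPHABET) -> dict[str, int]:
    """Return the number of occurrences of each alphabet letter in seq."""
    counts = Counter(seq)
    return {letter: counts.get(letter, 0) for letter in alphabet}

if __name__ == "__main__":
    # First ten residues of the 5HWP_1 sequence, used as a toy input.
    vec = parikh_vector("GHMKKRVGIE")
    print(vec["G"], vec["K"], vec["C"])  # prints: 2 2 0
```

The entries of a Parikh vector sum to the sequence length; for example, the 5HWP_1 column sums to 419.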

>5HWP_1|Chain A|Hydroxymethylglutaryl-CoA synthase|Myxococcus xanthus (strain DK 1622) (246197)
>6XLH_1|Chains A, B|ATP-dependent molecular chaperone HSC82|Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (559292)
>7WGI_1|Chain A|Squalene synthase|Aspergillus flavus (5059)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5HWP , Knot 164 419 0.78 40 198 370
GHMKKRVGIEALAVAVPSRYVDIEDLARARGVDPAKYTAGLGAREMAVTDPGEDTVALAATAAARLIRQQDVDPSRIGMLVVGTETGIDHSKPVASHVQGLLKLPRTMRTYDTQHACYGGTAGLMAAVEWIASGAGAGKVAVVVCSDIARYGLNTAGEPTQGGGAVALLVSEQPDLLAMDVGLNGVCSMDVYDFWRPVGRREALVDGHYSITCYLEALSGAYRGWREKALAAGLVRWSDALPGEQLARIAYHVPFCKMARKAHTQLRLCDLEDAADAAASTPESREAQAKSAASYDAQVATSLGLNSRIGNVYTASLYLALAGLLQHEAGALAGQRIGLLSYGSGCAAEFYSGTVGEKAAERMAKADLEAVLARRERVSIEEYERLMKLPADAPEAVAPSPGAFRLTEIRDHRRQYAEG
6XLH , Knot 266 705 0.82 38 250 607
MAGETFEFQAEITQLMSLIINTVYSNKEIFLRELISNASDALDKIRYQALSDPKQLETEPDLFIRITPKPEEKVLEIRDSGIGMTKAELINNLGTIAKSGTKAFMEALSAGADVSMIGQFGVGFYSLFLVADRVQVISKNNEDEQYIWESNAGGSFTVTLDEVNERIGRGTVLRLFLKDDQLEYLEEKRIKEVIKRHSEFVAYPIQLLVTKEVEKEVPIPEEEKKDEEKKDEDDKKPKLEEVDEEEEEKKPKTKKVKEEVQELEELNKTKPLWTRNPSDITQEEYNAFYKSISNDWEDPLYVKHFSVEGQLEFRAILFIPKRAPFDLFESKKKKNNIKLYVRRVFITDEAEDLIPEWLSFVKGVVDSEDLPLNLSREMLQQNKIMKVIRKNIVKKLIEAFNEIAEDSEQFDKFYSAFAKNIKLGVHEDTQNRAALAKLLRYNSTKSVDELTSLTDYVTRMPEHQKNIYYITGESLKAVEKSPFLDALKAKNFEVLFLTDPIDEYAFTQLKEFEGKTLVDITKDFELEETDEEKAEREKEIKEYEPLTKALKDILGDQVEKVVVSYKLLDAPAAIRTGQFGWSANMERIMKAQALRDSSMSSYMSSKKTFEISPKSPIIKELKKRVDEGGAQDKTVKDLTNLLFETALLTSGFSLEEPTSFASRINRLISLGLNIDEDEETETAPEASTEAPVEEVPADTEMEEVD
7WGI , Knot 198 470 0.86 40 257 451
MRATEVLYYMLRPSQLRSIVQWKVWHNPVHERNVNNETETQKACFKFLDLTSRSFSAVIKELHPELLLPVCVFYLVLRGLDTIEDDTSIPLKTKEPMLREFKDYLEQDGWTFDGNRPEEKDRELLVQFHNVITEFKNMKPAYREIVKDITDKMGNGMADYCRKAEFEDASVKTIEEYDLYCYYVAGLVGEGLTRLFVEAEFGNPALLSRPRLHKSMGLFLQKTNIIRDVREDHDDDRHFWPKEIWSKYVTEFEDLFKPENRETALNCGSEMVLNALEHAEECLFYLAGLREQSVFNFCAIPQAMAIATLELCFRNPDMFDRNIKITKGEACQLMMESTQNLHVLCDTFRRYARRIHKKNTPKDPNFLKISIVCGKIEKFIDTIFPQQTAAQAKLKVQGEKSEAEKEKARQEAETRQDLYFMLALMGVIVLIVSIIMLTAAWLLGARFDLAFQELKSGNFRPPAKQIPGEL
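
The fourth column of the table divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below recomputes that ratio from the tabulated \(\mathrm{LZ}(w)\) and \(n\). It also includes `lz76_phrases`, one common formulation of the Lempel-Ziv (1976) exhaustive parsing. That function is only an assumption about what \(\mathrm{LZ}(w)\) counts, since the exact variant behind the table is not stated here. Both function names are hypothetical.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_b n), with b the alphabet size (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))

def lz76_phrases(w: str) -> int:
    """Count phrases in a Lempel-Ziv (1976) style exhaustive parsing.

    Assumed variant: each phrase is the shortest prefix of the remaining
    text that has no occurrence starting earlier in the word (overlap with
    the phrase itself is allowed); a final incomplete phrase is counted too.
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while the candidate already occurs in w[: i + length - 1].
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

if __name__ == "__main__":
    for code, lz, n in [("5HWP", 164, 419), ("6XLH", 266, 705), ("7WGI", 198, 470)]:
        print(code, normalized_lz(lz, n))
    # Classic textbook example: parsed as 0|001|10|100|1000|101.
    print(lz76_phrases("0001101001000101"))
```

The recomputed ratios are roughly 0.789, 0.826 and 0.865, which agree with the tabulated 0.78, 0.82 and 0.86 if the table truncates to two decimals.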

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
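
The subword sets just defined can be sketched in a few lines of Python. The length parameter is called `k` in the code; the function names `subwords` and `subword_complexity` are illustrative only.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

if __name__ == "__main__":
    w = "ABAAB"  # toy word, not one of the protein sequences
    print([subword_complexity(w, k) for k in (1, 2, 3)])  # prints: [2, 3, 3]
```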

\(|P_{f(5HWP_1)}(2) \setminus P_{f(6XLH_1)}(2)|=51\), \(|P_{f(6XLH_1)}(2) \setminus P_{f(5HWP_1)}(2)|=103\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
10100011101111111000101001101011011000111110011100110001111101110110000101001111111000110000111001011101100100000001001101111111011101111101111100011001100110100111111111000101111011101100101001101110001110100010001011011001100011111110100111100110110011100110010001010010011011100100001010011000101100111000110100101011111110001111110011110010101101001011001100110101011110000101000001101110110111101111010010000000101
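
The hydrophobic-polar string above can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not state which residues it treats as hydrophobic. The set used below was inferred by comparing the displayed string with Sequence 1, so it should be read as an assumption; the names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Assumption: residues mapped to 1. This set was inferred from the displayed
# string, not taken from any stated definition on the page.
HYDROPHOBIC = frozenset("GAVLIMPWF")

def hp_encode(seq: str, hydrophobic: frozenset = HYDROPHOBIC) -> str:
    """Map each residue to '1' if hydrophobic, else '0'."""
    return "".join("1" if aa in hydrophobic else "0" for aa in seq)

if __name__ == "__main__":
    # First 50 residues of Sequence 1 (5HWP_1).
    prefix = "GHMKKRVGIEALAVAVPSRYVDIEDLARARGVDPAKYTAGLGAREMAVTD"
    print(hp_encode(prefix))
```

Passing a different `hydrophobic` set allows other hydrophobicity conventions to be tried without changing the function.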
Pair \(Z_2\) Length of longest common subword
5HWP_1,6XLH_1 154 5
5HWP_1,7WGI_1 167 4
6XLH_1,7WGI_1 131 4
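
The \(Z_k\) column is the size of the symmetric difference of two subword sets. A minimal sketch, with the hypothetical function names `subwords` and `z_distance`:

```python
def subwords(w: str, k: int) -> set[str]:
    """Distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y): subwords of x missing from y, plus subwords of y missing from x."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

if __name__ == "__main__":
    # Toy words: only "BB" separates the two sets of 2-letter subwords.
    print(z_distance("ABAB", "ABBA", 2))  # prints: 1
```

For the first pair in the table, the two one-sided counts given earlier are 51 and 103, and their sum is the tabulated \(Z_2 = 154\).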

Newick tree

 
[
	5HWP_1:84.68,
	[
		6XLH_1:65.5,7WGI_1:65.5
	]:19.18
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1124}{\log_{20} 1124}-\frac{419}{\log_{20} 419}\right)\approx 184\), where \(1124=419+705\) is the sum of the two sequence lengths.
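
The arithmetic behind this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are taken from the text as given (their meaning is not stated on the page), and the function name `expected_distance` is hypothetical.

```python
import math

def n_over_log20(n: int) -> float:
    """n / log_20 n, the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, 20)

def expected_distance(n_first: int, n_second: int,
                      c1: float = 0.85, c2: float = 0.8) -> float:
    """c1 * c2 * (N / log_20 N - n_first / log_20 n_first), N = n_first + n_second."""
    total = n_first + n_second
    return c1 * c2 * (n_over_log20(total) - n_over_log20(n_first))

if __name__ == "__main__":
    # Lengths of 5HWP_1 and 6XLH_1: 419 + 705 = 1124.
    print(expected_distance(419, 705))
```

The result is a little above 184, matching the figure quoted in the text.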
Status Protein1 Protein2 d d1/2
Query variables 5HWP_1 6XLH_1 228 180.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
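
The interpolation \(\delta\) can be written as a one-line function. In the example, the value 133 for the smaller term does not appear on the page: it is obtained by solving the two displayed cases with the query row's \(d=228\) and \(d_1/2=180.5\), so it is an inferred input. The function name `delta` is illustrative.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    # Inferred terms: max = d = 228, and min = 2 * 180.5 - 228 = 133.
    small, large = 133, 228
    print(delta(small, large, 0))    # prints: 228
    print(delta(small, large, 0.5))  # prints: 180.5
```

Setting \(\alpha=0\) recovers \(d\) and \(\alpha=1/2\) recovers \(d_1/2\), as in the displayed cases.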