CoV2D Browser™

Parikh vectors
5ONM_1 8WFU_1 9QBB_1 Letter Amino acid
8 23 161 V Valine
5 18 169 R Arginine
2 2 14 C Cysteine
12 34 188 G Glycine
7 35 202 K Lysine
6 28 118 Y Tyrosine
1 11 152 Q Glutamine
6 11 79 H Histidine
6 10 71 M Methionine
11 14 223 T Threonine
7 13 134 P Proline
6 21 241 S Serine
3 13 36 W Tryptophan
7 25 155 A Alanine
4 22 195 N Asparagine
13 29 211 E Glutamic acid
10 24 243 I Isoleucine
9 43 183 D Aspartic acid
12 42 341 L Leucine
3 26 107 F Phenylalanine
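The Parikh vector of a sequence counts how often each letter occurs. A minimal Python sketch, assuming sequences are plain strings over the 20 one-letter amino acid codes; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the CoV2D code:

```python
from collections import Counter

# Letter order of the rows in the table above (assumed only for display purposes).
AMINO_ACIDS = "VRCGKYQHMTPSWANEIDLF"


def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return the list of letter counts of w, one entry per letter of alphabet."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]


if __name__ == "__main__":
    # Toy word, not one of the proteins above.
    print(dict(zip(AMINO_ACIDS, parikh_vector("VVRCGG"))))
```

The entries of a Parikh vector sum to the sequence length; for example, the 5ONM_1 column above sums to 138.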

>5ONM_1|Chain A|L-ectoine synthase|Paenibacillus lautus (1401)
>8WFU_1|Chains A, B, C, D|Beta-glucosidase|Thermoanaerobacterium saccharolyticum (28896)
>9QBB_1|Chain A|Lymphostatin|Escherichia coli O127:H6 (168807)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5ONM 71 138 0.84 40 108 131
MIVKHLEEIVDTKDDIDTKTWNSRRLLLTKDGMGFSLNDTLIKAGTETLIWYKNHVEAVYCIEGEGEIEVIGGETYPITPGMMYALDGHEKHYLRARSQMRMVCVFNPPLTGAEVHDEEGTYPLLAPISDGSAWSHPF
8WFU 185 444 0.84 40 234 421
MKDFSKDFLFGVATASYQVEGAYNEDGRTMSIWDTFSRQDGKVYKSHNGDVACDHYHLYKDDVKMMKDLGIEAYRFSIAWPRIFPAKGQYNPKGMDFYKRLTDELLKNDIKPFATIYHWDLPQWADDLGGWLNREIVEWYGEYAEKLFSELGGYIKNWITLNEPWCSSFLSYFIGEHAPGHKDLGEALLVSHNLLLSHGKAVEIFRGLNLDDSKIGITLNLNEVFPASDSDDDKVAAQIADGFQNRWFLDPLFKGKYPQDMVEYFGKYAKVDFINDEDLKLISQKLDFLGVNYYTRAVVQKGNDGLLDAVQIDPGNERTEMGWEIYPESLYNILMRLKREYTYDMPLYITENGAAFNDVVEDDGRVHDEKRVEFLKQHFKEAKRFLNDGGNLKGYFVWSLMDNFEWAHGYSKRFGIVYVDYETEKRILKDSALWYKDLISTRTI
9QBB 1018 3223 0.85 40 371 2315
MRLPEKVLFPPVTSGLSGQEKQKKPKSITGFQENYQRNIRPIKTASEARLRFFDKMVSKENSLEDVVSLGEMIQKEIYGHEQRTFSPVHHTGNWKSSLLHNALLGLANVYNGLRETEYPNTFNRDGIKSTNSFRDNLLTKTRTPRDNFEEGIKHPEHATIPYDNDNESNKLLKAGKIAGNNNELLMEIKKESQSDHQIPLSDKFLKRKKRSPVAEDKVQNSLTPENFVQKISLSDELKTKYANEIIEIKRIMGEYNLLPDKNSRNGLKLLQKQADLLKIIMEDTSVTENTFKNIEIAITDIKREYYSHTVDIEKNIHAIWVAGSPPESISDYIKTFLKTYKEFTYYLWVDEKAFGAAKFTSVLKQIAFDLACRTIQQNTPQKNIDFINLYNEIRKKYNNNPSGQQEYLNKLRELYATYQKISTPLKHMFNSFFLENMIKLQDNFFNYCIVKGVTEINDELRINYLKNVIKLSDDDIGNYQKTINDNKDRVKKLILDLQKQFGENRISIKDVNSLTSLSKSENNHNYQTEMLLRWNYPAASDLLRMYILKEHGGIYTDTDMMPAYSKQVIFKIMMQTNGDNRFLEDLKLRRAISDGVLRYVNNQNIDEVNYNEISDADKNIIKKILTEISKMPEDSIFTKINTRIPRDTMPILRRYHLWPDGWNIRGLNGFMLSHKGSEVIDAVIAGQNQAYRELRRIRDNIHSEIYFKQTDELSSLPDTDKIGGILVKKYLSGSLFSKFRQDTIIPEALSTLQISGPDLIQRKMLQFFRSRGVLGEEFINERKLSDKAYIGVYKTTGTGKYDWLTPESIGVNDVTPADESTWCIGKGRCVDDFLFKDVSTLKTENLPELFLTKIDTDTFFSQWSTKTKKDLQKKIQDLTVRYNELIDSSTIDFKNLYEIDQMLHMIMLEMNDDIAKRSLFSLQVQISEKIRRMTIPVDNIINIYPDLHKKNDNDLSMSIKGFLASNPHTKINILYSNKTEHNIFIKDLFSFAVMENELRDIINNMSKDKTPENWEGRVMLQRYLELKMKDHLSLQSSQEANEFLEISTFIYENDFLREKIEAVKNKMNSHELYFEKIKKEQNTWQDLSTKEQKLQLIKALKEISGNTEKDSHYDRLLDAFFKKHNENIHNKIQRIKDEFKEYSRVAIHNIDKVIFKGQTLDRLYHEGYVFSDINTLSRYTLHGLGITGVHTEENLLPAPSSSLINILKEHYNEDEISAKLPLAYDYILNKKESSSIPVEILNKLSELPPHELLTPVLGQSVNPLGMGYSSDNGKITEQVIVSGADGFDNPISGLIYTYLEDLYNIHVRMREGTLNSQNLRQLLENSVSSCFLTEQSINKLLSEAEKRPYQSLTEIHQHLTGLPTIADATLSLLSVGLPGTGKLLRREQDYGRPPVTAIQDSTFVLPYNFKGIGFNDNIISSAPVASSLHFIAEHAKYTLLSWPEFYRHHAQRWFEMAKGYGSQNIDFHPQSLLVTQEGRCMGLALLYLQTEDTAHYSILQENLMTVSALHQTSNRDKLPLSKDDNSLMTRTYSLIEMLQYQGNKYITNESLLHKTAWNQERITLLFNEKGVKRALISTPNHTLVLQQLEDIYRLTDPNFGHADFLSPIDALKFIEAMIQLTPTLQEYYGLLNKDINKHIQVHYAESDMVWNKLLPENDAGLSTRIQHTTTDRLANLAEPVAVAGISLPVKTLYDIGATLDGRRITSPPTSEQIPSLRLNGDVLNDYLSRTVLTPEQADNIRKILHTQGIRSGTRPIDPEMIRGTQDDLVSSQTRLQRQATRVKQQLAGVLDTLQQHFQNIPRSSGRHLSVENIELADIGSGRFNLQIRDGETLHTTSVEVPEVVSRFQKLSTMLSALPASGIMDFDLGMSVVGVVQYARLLQQGHEDSTLAKINLAMDIKQLSEATLGSMIQIAGNKFLNTEGIQGFRLESAVAEGMRSVATRTGGTMGKALSASARVLELPVLETVLGTWNLYNSVIQLQQATSYSETMAARVQIAFDS
ISLGLTAASVAFPPLIIATGPIAAIGMGASSIARNVARKEERHTQWLEYKKFLTDGSKHIVVASPERGLLDFSGNKVFGKMVLDLRQSPPLLHGESSFNADRKIGHRPDLGDWQIREKVGYANSISPYSSLAHGYANSKWPRTIPKIPSGEYDTIILGYGHQYQANTEIEYLSNWIVWREAVPDSTSRHKRPPLEVLNSQCTVIAGERKTTVLPLRVLSDLTPECTEQAISLKDYKFILRGGSGGLAVQVGGAGYYDIDANLVAKENTLSFRGLPEEFPLTFDLSKQTQSVMLKTPDDEVPVMTITQKGINTLVGTAAGKDRLIGNDKDNTFHTSSGGGTVISGGGNNRYIIPRDLKTPLTLTLSSNSVSHEIFLPETTLAELKPVAFELSLIYWAGNNINVQPEDEAKLNHFAGNFRVHTRDGMTLEAVSRENGIQLAISLCDVQRWQAVYPEENNRPDAILDRLHDMGWSLTPEVRFQGGETQVSYDPLTRQLVYQLQARYSEFQLAGSRHHTTAVTGTPGSRYIIMKPVTTQILPTQIILAGDNDHPETIDLLEASPVLVEGKKDKNSVILTIATIQYSLQLTISGIEESLPETTRVAIQPQDTRLLGDVLRILPDNGNWVGIFRSGHTPTVNRLENLMALNQVMTFLPRVSGSAEQVLCLENLGGVRKKVEGELLSGKLKGAWKAEGEPTVPVNISDLSIPPYSRLYLIFEGKNNVLLRSKVHAAPLKITSAGEMQLSERQWQQQEHIIVKPDNEAPSLILSEFRRFTISSDKTFSLKLMCHQGMVRIDRRSLSVRLFYLREQPGIGSLRLTFRDFFTEVMDTTDREILEKELRPILIGDTHRFINAAYKNHLNIQLGDGVLNLADIVAEYARIQKEETSKILYQYQGAMKKKTDGPSVVEDAIMTTTVTTDSGELFPTFHPWYTDDLSGRYKSVPMARKADTLYHLTPKGDLQIIYQVATKMVNQAMIVSLPNYRHEWEKYNLSILSEIPQNNNTVVHSILRVNGPTMQVRTIDYRGTDENNPIVSFSDTTFINGEQMLSYDSHSSGRVYSREEYMMWELQQRVSEASSARTQDYWLMDAAVRNGEWKITPELLRHTPGYIRSTVSKWSRGWLKTGTILQTPEDRNTDVYLTTIQNNVFSRQGGGYQVYYRIDGMAGADIADNAPGETRCTLRPGTCFEVTSVDERHYEWNIIYVTLKTCGWSRNGQSKTPNGDNLFN
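The page does not state which Lempel-Ziv variant produces the LZ-complexity column. The sketch below assumes the LZ76 exhaustive-history parsing, so its counts may differ from the tabulated values. It also computes the normalized ratio \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the table header. The names `lz76` and `normalized_lz` are hypothetical.

```python
import math


def lz76(w):
    """Number of components in the LZ76 exhaustive-history parsing of w (assumed variant)."""
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the component while it also occurs starting at an earlier
        # position (the earlier occurrence may overlap the current one).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    print(lz76("0001101001000101"))   # parsed as 0|001|10|100|1000|101
    print(normalized_lz(71, 138))     # tabulated LZ value and length of 5ONM
```

Plugging the tabulated pair \(\mathrm{LZ}(w)=71\), \(n=138\) for 5ONM into `normalized_lz` gives roughly 0.846, in line with the 0.84 shown in the table.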

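Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Here \(n\) denotes the subword length, not the sequence length \(n=|w|\) used in the table above. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).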
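As an illustration of these definitions, here is a minimal Python sketch that enumerates the subwords of a given length and counts them. It assumes \(P_w(n)\) contains exactly the contiguous subwords of length \(n\); the names `subwords` and `subword_complexity` are illustrative.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))


if __name__ == "__main__":
    w = "MIVKHLEEIV"  # first ten letters of the 5ONM sequence
    print(sorted(subwords(w, 2)))
    print(subword_complexity(w, 2))
```

A set comprehension keeps the sketch short; for very long sequences a suffix-based structure would use less memory.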

\(|P_{f(5ONM_1)}(2) \setminus P_{f(8WFU_1)}(2)|=34\), \(|P_{f(8WFU_1)}(2) \setminus P_{f(5ONM_1)}(2)|=160\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), that is, the numerator of an LZ76-style (set of subwords) Jaccard distance for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
111001001100000100001000011100011110100011011000111000010110010101010111100011011110110100000101000101101101110110100001001111110010110011
Pair \(Z_2\) Length of longest common subword (contiguous)
5ONM_1,8WFU_1 194 4
5ONM_1,9QBB_1 267 4
8WFU_1,9QBB_1 145 5
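The two columns of this table can be sketched in Python as follows. The last column is read here as the length of the longest common contiguous run, since values of 4 and 5 are too small for a non-contiguous common subsequence of proteins this long. The function names are illustrative and not taken from the CoV2D code.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_subword_length(x, y):
    """Length of the longest contiguous run shared by x and y (row-by-row DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common run that ended just before both letters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z("abab", "abba", 2))
    print(longest_common_subword_length("abcde", "xbcdy"))
```

For the first pair, the two one-sided differences reported in the text, 34 and 160, add up to the tabulated \(Z_2 = 194\).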

Newick tree

 
(5ONM_1:12.07,(8WFU_1:72.5,9QBB_1:72.5):55.57);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{582}{\log_{20} 582}-\frac{138}{\log_{20}138}\right)\approx 129\), where \(582 = 138 + 444\) is the combined length of 5ONM_1 and 8WFU_1.
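The arithmetic behind this estimate can be checked with a short Python sketch. The constants 0.85 and 0.8 are taken from the formula as given; the source does not explain them. The names `capacity` and `expected_distance` are hypothetical.

```python
import math


def capacity(n, alphabet_size=20):
    """n / log_{alphabet_size}(n), the normalizing term used in the table header."""
    return n / math.log(n, alphabet_size)


def expected_distance(n_first, n_second, ratio=0.85, factor=0.8):
    """Rough expected distance: ratio * factor * (capacity of the
    concatenation minus capacity of the first sequence)."""
    return ratio * factor * (capacity(n_first + n_second) - capacity(n_first))


if __name__ == "__main__":
    # Lengths of 5ONM_1 and 8WFU_1 from the table above.
    print(expected_distance(138, 444))
```

With lengths 138 and 444 the result rounds to 129.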
Status Protein1 Protein2 d d1/2
Query variables 5ONM_1 8WFU_1 166 107

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
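A minimal Python sketch of this interpolation follows. It assumes the standard Otu–Sayood definitions \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=(c(SQ)-c(S))+(c(QS)-c(Q))\), and it assumes that the complexity \(c\) is the LZ76 exhaustive-history count. The page confirms neither choice, and all function names are illustrative.

```python
def lz76(w):
    """Number of components in the LZ76 exhaustive-history parsing (assumed complexity c)."""
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while the component also occurs starting at an earlier position.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


def otu_sayood_terms(s, q, c=lz76):
    """The two one-sided terms c(SQ) - c(S) and c(QS) - c(Q)."""
    return c(s + q) - c(s), c(q + s) - c(q)


def delta(s, q, alpha, c=lz76):
    """alpha * min + (1 - alpha) * max of the two one-sided terms."""
    a, b = otu_sayood_terms(s, q, c)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


def d(s, q, c=lz76):
    """Otu-Sayood d: the larger one-sided term (alpha = 0)."""
    return max(otu_sayood_terms(s, q, c))


def d1(s, q, c=lz76):
    """Otu-Sayood d1: the sum of both one-sided terms (twice delta at alpha = 1/2)."""
    return sum(otu_sayood_terms(s, q, c))


if __name__ == "__main__":
    s, q = "0001101001", "1110010110"  # toy binary words, not protein data
    print(d(s, q), d1(s, q) / 2, delta(s, q, 0.5))
```

Under these assumptions, the table's values \(d=166\) and \(d_1/2=107\) would correspond to a smaller one-sided term of \(2\cdot 107-166=48\).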