CoV2D Browser™

Parikh vectors
8VLX_1 5TFC_1 5JXM_1 Letter Amino acid
89 3 18 H Histidine
126 18 3 K Lysine
66 8 7 M Methionine
188 5 34 P Proline
170 13 23 T Threonine
232 12 25 V Valine
144 10 27 D Aspartic acid
70 4 4 C Cysteine
154 8 38 R Arginine
64 8 8 Y Tyrosine
140 20 40 G Glycine
96 11 13 F Phenylalanine
306 21 30 S Serine
91 6 3 N Asparagine
204 20 23 E Glutamic acid
149 18 3 I Isoleucine
419 24 49 L Leucine
32 2 3 W Tryptophan
239 12 49 A Alanine
208 6 5 Q Glutamine
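
The Parikh vector of a sequence simply records how many times each letter occurs. A minimal sketch of that count is given here; the function name `parikh_vector` and the alphabetical letter order are illustrative choices, not taken from the page.

```python
from collections import Counter

# The 20 standard amino-acid letters, in alphabetical order (illustrative order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Count how often each amino-acid letter occurs in the sequence."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# First 20 residues of the 5JXM_1 sequence shown on this page.
prefix = "MGSSHHHHHHSSGLVPRGSH"
vector = parikh_vector(prefix)
print(vector["H"], vector["S"], vector["G"])  # 7 5 3
```

Summing a full column of the table should give the length of the corresponding sequence (for example, the 5TFC_1 column sums to 229).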

>8VLX_1|Chain A|Huntingtin|Homo sapiens (9606)
>5TFC_1|Chain A|Cystic fibrosis transmembrane conductance regulator|Homo sapiens (9606)
>5JXM_1|Chain A|PriB|Streptomyces sp. RM-5-8 (1429103)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8VLX , Knot 987 3187 0.83 40 379 2157
MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQPLLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPEFQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAPRSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAVPKIMASFGNFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGLLVPVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYELTLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSIVELIAGGGSSCSPVLSRKQKGKVLLGEEEALEDDSESRSDVSSSALTASVKDEISGELAASSGVSTPGSAGHDIITEQPRSQHTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAVPSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLGLQIGQPQDEDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFVLRDEATEPGDQENKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSCVGAAVALHPESFFSKLYKVPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSILSRSRFHVGDWMGTIRTLTGNTFSLADCIPLLRKTLKDESSVTCKLACTAVRNCVMSLCSSSYSELGLQLIIDVLTLRNSSYWLVRTELLETLAEIDFRLVSFLEAKAENLHRGAHHYTGLLKLQERVLNNVVIHLLGDEDPRVRHVAAASLIRLVPKLFYKCDQGQADPVVAVARDQSSVYLKLLMHETQPPSHFSVSTITRIYRGYNLLPSITDVTMENNLSRVIAAVSHELITSTTRALTFGCCEALCLLSTAFPVCIWSLGWHCGVPPLSASDESRKSCTVGMATMILTLLSSAWFPLDLSAHQDALILAGNLLAASAPKSLRSSWASEEEANPAATKQEEVWPALGDRALVPMVEQLFSHLLKVINICAHVLDDVAPGPAIKAALPSLTNPPSLSPIRRKGKEKEPGEQASVPLSPKKGSEASAASRQSDTSGPVTTSKSSSLGSFYHLPSYLKLHDVLKATHANYKVTLDLQNSTEKFGGFLRSALDVLSQILELATLQDIGKCVEEILGYLKSCFSREPMMATVCVQQLLKTLFGTNLASQFDGLSSNPSKSQGRAQRLGSSSVRPGLYHYCFMAPYTHFTQALADASLRNMVQAEQENDTSGWFDVLQKVSTQLKTNLTSVTKNRADKNAIHNHIRLFEPLVIKALKQYTTTTCVQLQKQVLDLLAQLVQLRVNYCLLDSDQVFIGFVLKQFEYIEVGQFRESEAIIPNIFFFLVLLSYERYHSKQIIGIPKIIQLCDGIMASGRKAVTHAIPALQPIVHDLFVLRGTNKADAGKELETQKEVVVSMLLRLIQYHQVLEMFILVLQQCHKENEDKWKRLSRQIADIILPMLAKQQMHIDSHEALGVLNTLFEILAPSSLRPVDMLLRSMFVTPNTMASVSTVQLWISGILAILRVLISQSTEDIVLSRIQELSFSPYLISCTVINRLRDGDSTSTLEEHSEGKQIKNLPEETFSRFLLQLVGILLEDIVTKQLKVEMSEQQHTFYCQELGTLLMCLIHIFKSGMFRRITAAATRLFRSDGCGGSFYTLDSLNLRARSMITTHPALVLLWCQILLLVNHTDYRWWAEVQQTPKRHSLSSTKLLSPQMSGEEEDSDLAAKLGMCNREIVRRGALILFCDYVCQNLHDSEHLTWLIVNHIQDLISLSHEPPVQDFISAVHRNSAASGLFIQAIQSRCENLSTPTMLKKT
LQCLEGIHLSQSGAVLTLYVDRLLCTPFRVLARMVDILACRRVEMLLAANLQSSMAQLPMEELNRIQEYLQSSGLAQRHQRLYSLLDRFRLSTMQDSLSPSPPVSSHPLDGDGHVSLETVSPDKDWYVHLVKSQCWTRSDSALLEGAELVNRIPAEDMNAFMMNSEFNLSLLAPCLSLGMSEISGGQKSALFEAAREVTLARVSGTVQQLPAVHHVFQPELPAEPAAYWSKLNDLFGDAALYQSLPTLARALAQYLVVVSKLPSHLHLPPEKEKDIVKFVVATLEALSWHLIHEQIPLSLDLQAGLDCCCLALQLPGLWSVVSSTEFVTHACSLIYCVHFILEAVAVQPGEQLLSPERRTNTPKAISEEEEEVDPNTQNPKYITAACEMVAEMVESLQSVLALGHKRNSGVPAFLTPLLRNIIISLARLPLVNSYTRVPPLVWKLGWSPKPGGDFGTAFPEIPVEFLQEKEVFKEFIYRINTLGWTSRTQFEETWATLLGVLVTQPLVMEQEESPPEEDTERTQINVLAVQAITSLVLSAMTVPVAGNPAVSCLEQQPRNKPLKALDTRFGRKLSIIRGIVEQEIQAMVSKRENIATHHLYQAWDPVPSLSPATTGALISHEKLLLQINPERELGSMSYKLGQVSIHSVWLGNSITPLREEEWDEEEEEEADAPAPSSPPTSPVNSRKHRAGVDIHSCSQFLLELYSRWILPSSSARRTPAILISEVVRSLLVVSDLFTERNQFELMYVTLTELRRVHPSEDEILAQYLVPATCKAAAVLGMDKAVAEPVSRLLESTLRSSHLPSRVGALHGVLYVLECDLLDDTAKQLIPVISDYLLSNLKGIAHCVNIHSQQHVLVMCATAFYLIENYPLDVGPEFSASIIQMCGVMLSGSEESTPSIIYHCALRGLERLLLSEQLSRLDAESLVKLSVDRVNVHSPHRAMAALGLMLTCMYTGKEKVSPGRTSDPNPAAPDSESVIVAMERVSVLFDRIRKGFPCEARVVARILPQFLDDFFPPQDIMNKVIGEFLSNQQPYPQFMATVVYKVFQTLHSTGQSSMVRDWVMLSLSNFTQRAPVAMATWSLSCFFVSASTSPWVAAILPHVISRMGKLEQVDVNLFCLVATDFYRHQIEEELDRRAFQSVLEVVAAPGSPYHRLLTCLRNVHKVTTCAAAENLYFQGDYKDDDDK
5TFC , Knot 107 229 0.84 40 156 224
SLTTTEVVMENVTAFWEEGGTPVLKDINFKIERGQLLAVAGSTGAGKTSLLMMIMGELEPSEGKIKHSGRISFCSQFSWIMPGTIKENIIFGVSYDEYRYRSVIKACQLEEDISKFAEKDNIVLGEGGITLSGGQRARISLARAVYKDADLYLLDSPFGYLDVLTEKEIFESCVCKLMANKTRILVTSKMEHLKKADKILILHEGSSYFYGTFSELQNLQPDFSSKLMG
5JXM , Knot 157 405 0.77 40 170 362
MGSSHHHHHHSSGLVPRGSHMGGPMSGFHSGEALLGDLATGQLTRLCEVAGLTEADTAAYTGVLIESLGTSAGRPLSLPPPSRTFLSDDHTPVEFSLAFLPGRAPHLRVLVEPGCSSGDDLAENGRAGLRAVHTMADRWGFSTEQLDRLEDLFFPSSPEGPLALWCALELRSGGVPGVKVYLNPAANGADRAAETVREALARLGHLQAFDALPRADGFPFLALDLGDWDAPRVKIYLKHLGMSAADAGSLPRMSPAPSREQLEEFFRTAGDLPAPGDPGPTEDTGRLAGRPALTCHSFTETATGRPSGYTLHVPVRDYVRHDGEARDRAVAVLREHDMDSAALDRALAAVSPRPLSDGVGLIAYLALVHQRGRPTRVTVYVSSEAYEVRPPRETVPTRDRARARL
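
The page does not say which Lempel-Ziv variant produces \(\mathrm{LZ}(w)\). The sketch below assumes the LZ76 exhaustive-history parsing and counts its phrases; it also evaluates the normalized ratio \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the table header. The function names are hypothetical.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it still occurs starting somewhere before i
        # (the earlier occurrence may overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_A(n), as in the table's fourth column."""
    return lz / (n / math.log(n, alphabet_size))

# Classic LZ76 example: 0 | 001 | 10 | 100 | 1000 | 101
print(lz76_complexity("0001101001000101"))  # 6

# Ratios recomputed from the tabulated LZ(w) and n values.
for lz, n in [(987, 3187), (107, 229), (157, 405)]:
    print(lz, n, normalized_lz(lz, n))
```

Recomputing the ratio from the tabulated values gives roughly 0.834, 0.8475 and 0.777, which suggests the displayed figures (0.83, 0.84, 0.77) are truncated rather than rounded.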

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
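
Read literally, the definition above amounts to collecting every contiguous window of a fixed length into a set and taking its size. A short sketch, with hypothetical helper names, follows; it implements the definition as stated and does not attempt to reproduce the tabulated values.

```python
def subwords(w: str, k: int) -> set[str]:
    """All distinct contiguous subwords of w that have length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_count(w: str, k: int) -> int:
    """Cardinality of the set of length-k subwords of w."""
    return len(subwords(w, k))

# First 12 residues of the 5JXM_1 sequence shown on this page.
example = "MGSSHHHHHHSS"
print(sorted(subwords(example, 2)))  # ['GS', 'HH', 'HS', 'MG', 'SH', 'SS']
print(subword_count(example, 2))     # 6
```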

\(|P_{f(8VLX_1)}(2) \setminus P_{f(5TFC_1)}(2)|=229\), \(|P_{f(5TFC_1)}(2) \setminus P_{f(8VLX_1)}(2)|=6\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be a LZ76 style (set of subwords) Jaccard distance numerator for \(P(k)\).Hydrophobic-polar version of Sequence 1:11010011011001001000000000000000000000000000000000000000000000000111111111110110111010111101011111111111111100110010001010000010001010001110010001010011111101111000010001011100010011011100011010101000100011100101111011011011010000101101110100000010001000111111011101101100001011101111010000101000111011010000000001001110111111111000000111111110100111110001000010101110000101010100110100101000000000110111011001100111011001011111101011000011000010110111111000011100000101111000110000000001000110101000101011100110011011001100010000010100101100010001001000011000000101110011101001001001100000000011001101000001110100000111011010000001011110010011000011100101100100000100001001110001001100000100101011000000011110010110101110110011110001010101111001111111010011001001110000010000100110010010101011011101011001100001011011101001010010110011110001000001000110011000110100000001110111011010000011100011001101010110110101001001100001110100011001110111000101001111011011101100000101011111100000101011100001100101001001001001110100101000100111110001100000110110001101100111101101110011111010000000001111011101100111110101000111111011110110010001100001011100000111111001111110011001101101010110011111110111101001101011000100001100101110100100101100000001110000000110100110010100110100100010101000000111110011011001101101001100100111010001000111101010011001110011001011000100001010011000101110000111100010011101010011010000000111011001000100010010000100011000101101111011000000001010001101110110101000110000111111100100101101000011110111111110000000001111101101001111010011001111101110011110100010110010000011101110110000110111111000000000010010001101111111000101000011111001101111001011011100111010011010010111011111101
11000000111001001010101100011001001000001000001001001100010011101111110011000101010000001000011011101101100111001011100110001011010010010101001100011111110011111000000111010001000010000110101010000001110111000011001111110001000100000101111001001101000111001101100001101111011000000100101100010010110100011110101001100110111011011100010111110100011011100100100010001110000010011001010010001010111000110101010100101000101011000010000011101101100111001011110001010111101011100101100011101100101101010100111100110101110111010010011101110001101101110011110011001011100000110111101011010110001110101011100001110111110110000110010011001011101111011001101000000101100000010100001001011001110110010011111000001111110111001110110111100000111111011101011101101110111011000011001100100111000001000110111111001111000001100000000101111011001110110111110111001000100011011000110010110111000101110000011000100110111010110011110000111010100011010001101010011110010110000100000001011110011001100000011101000001110100011110001000111110011001111001100000101101010010010100001110011110001111111001110110011000100001100111101110110001100010011111000110010111001010000011110101101100011011101010110101111010000010110001101100111000100101001101010010100100111111111001001000101100001011110000111110010111001001110010111011101100111100110011101100001010111011001100100010001100111101001000111111010100111010001111111101100110100101011011100100001000100011001101111110100011001001001000111001010100000000
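
The page does not list which residues count as hydrophobic. Aligning the displayed sequence with the binary string above suggests that A, F, G, I, L, M, P, V and W map to 1 and all other residues map to 0; the sketch below uses that inferred set, so treat it as an assumption.

```python
# Letters mapped to 1. This set is inferred by aligning the displayed
# sequence with the displayed binary string; it is an assumption.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence: str) -> str:
    """Replace each residue by 1 (hydrophobic) or 0 (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence)

# First 18 residues of the 8VLX_1 sequence shown on this page.
print(hp_encode("MATLEKLMKAFESLKSFQ"))  # 110100110110010010
```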
Pair \(Z_2\) Length of longest common subword
8VLX_1,5TFC_1 235 5
8VLX_1,5JXM_1 213 4
5TFC_1,5JXM_1 168 4
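
\(Z_k\) is the size of the symmetric difference of the two subword sets; dividing it by the size of their union would give a Jaccard distance. The sketch below, with hypothetical function names, computes \(Z_k\) for two short strings and checks that the two one-sided counts quoted for 8VLX_1 and 5TFC_1 add up to the tabulated \(Z_2\).

```python
def subwords(w: str, k: int) -> set[str]:
    """All distinct contiguous subwords of w that have length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy strings chosen for illustration only.
# P_x(2) = {MG, GS, SS, SH, HH}, P_y(2) = {SS, SH, HH, HL, LV}
print(z_distance("MGSSHH", "SSHHLV", 2))  # 4

# One-sided counts quoted on this page for 8VLX_1 versus 5TFC_1.
print(229 + 6)  # 235, the tabulated Z_2 for that pair
```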

Newick tree

 
[
	8VLX_1:12.05,
	[
		5JXM_1:84,5TFC_1:84
	]:36.05
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{3416}{\log_{20} 3416}-\frac{229}{\log_{20}229})=769\), where \(3416=3187+229\) is the combined length of the two sequences.
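
The arithmetic in the estimate above can be reproduced directly. The factors 0.85 and 0.8 are taken from the page as given; the variable names are illustrative.

```python
import math

def log20(n: float) -> float:
    """Logarithm to base 20, the amino-acid alphabet size."""
    return math.log(n, 20)

n_concat = 3416  # 3187 + 229, the combined length of 8VLX_1 and 5TFC_1
n_short = 229    # length of 5TFC_1

expected = 0.85 * 0.8 * (n_concat / log20(n_concat) - n_short / log20(n_short))
print(round(expected))  # 769
```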
Status Protein1 Protein2 d d1/2
Query variables 8VLX_1 5TFC_1 949 506.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
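
A small sketch of this interpolation follows, assuming that min and max denote the smaller and larger of two directed quantities and that \(d_1\) is their sum. Under that assumption the query row (\(d=949\), \(d_1/2=506.5\)) corresponds to the pair 949 and 64; the value 64 is back-computed from \(2 \times 506.5 - 949\) and is not shown on the page.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """Interpolate between min(a, b) and max(a, b) with weight alpha on the min."""
    lo, hi = min(a, b), max(a, b)
    return alpha * lo + (1 - alpha) * hi

# 949 is the tabulated d; 64 is back-computed (an assumption, see above).
print(delta(0, 64, 949))    # 949, matches d
print(delta(0.5, 64, 949))  # 506.5, matches d1/2
```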