CoV2D Browser™

Parikh vectors
3HOZ_1 8XXA_1 3JRK_1 Letter Amino acid
115 49 18 G Glycine
143 58 28 L Leucine
115 32 16 T Threonine
44 15 6 M Methionine
58 33 15 F Phenylalanine
71 26 8 N Asparagine
35 18 2 H Histidine
97 31 13 I Isoleucine
93 18 28 K Lysine
124 46 11 P Proline
163 25 20 S Serine
13 17 4 W Tryptophan
58 41 11 Y Tyrosine
102 54 20 D Aspartic acid
21 3 3 C Cysteine
66 21 10 Q Glutamine
117 36 36 E Glutamic acid
117 39 24 V Valine
98 49 35 A Alanine
83 44 14 R Arginine
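A Parikh vector counts how often each letter occurs in a word, so each column above sums to the length of its sequence (1733, 655 and 322). A minimal Python sketch, assuming the sequence is a plain string of one-letter codes; the name `parikh_vector` is illustrative, and the letter order is copied from the table.

```python
from collections import Counter

# Letter order as in the table above.
AMINO_ACIDS = "GLTMFNHIKPSWYDCQEVAR"

def parikh_vector(seq: str) -> list[int]:
    """Return the number of occurrences of each amino-acid letter in seq."""
    counts = Counter(seq)
    return [counts[a] for a in AMINO_ACIDS]

if __name__ == "__main__":
    # First 22 residues of 3HOZ_1.
    v = parikh_vector("MVGQQYSSAPLRTVKEVQFGLF")
    print(dict(zip(AMINO_ACIDS, v)))
```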

>3HOZ_1|Chain A|DNA-directed RNA polymerase II subunit RPB1|Saccharomyces cerevisiae (4932)
>8XXA_1|Chain A|alpha-amylase|Rhodothermus marinus JCM 9785 (1295135)
>3JRK_1|Chains A, B, C, D, E, F, G, H|Tagatose 1,6-diphosphate aldolase 2|Streptococcus pyogenes serotype M1 (301447)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3HOZ , Knot 556 1733 0.79 40 343 1337
MVGQQYSSAPLRTVKEVQFGLFSPEEVRAISVAKIRFPETMDETQTRAKIGGLNDPRLGSIDRNLKCQTCQEGMNECPGHFGHIDLAKPVFHVGFIAKIKKVCECVCMHCGKLLLDEHNELMRQALAIKDSKKRFAAIWTLCKTKMVCETDVPSEDDPTQLVSRGGCGNTQPTIRKDGLKLVGSWKKDRATGDADEPELRVLSTEEILNIFKHISVKDFTSLGFNEVFSRPEWMILTCLPVPPPPVRPSISFNESQRGEDDLTFKLADILKANISLETLEHNGAPHHAIEEAESLLQFHVATYMDNDIAGQPQALQKSGRPVKSIRARLKGKEGRIRGNLMGKRVDFSARTVISGDPNLELDQVGVPKSIAKTLTYPEVVTPYNIDRLTQLVRNGPNEHPGAKYVIRDSGDRIDLRYSKRAGDIQLQYGWKVERHIMDNDPVLFNRQPSLHKMSMMAHRVKVIPYSTFRLNLSVTSPYNADFDGDEMNLHVPQSEETRAELSQLCAVPLQIVSPQSNKPCMGIVQDTLCGIRKLTLRDTFIELDQVLNMLYWVPDWDGVIPTPAIIKPKPLWSGKQILSVAIPNGIHLQRFDEGTTLLSPKDNGMLIIDGQIIFGVVEKKTVGSSNGGLIHVVTREKGPQVCAKLFGNIQKVVNFWLLHNGFSTGIGDTIADGPTMREITETIAEAKKKVLDVTKEAQANLLTAKHGMTLRESFEDNVVRFLNEARDKAGRLAEVNLKDLNNVKQMVMAGSKGSFINIAQMSACVGQQSVEGKRIAFGFVDRTLPHFSKDDYSPESKGFVENSYLRGLTPQEFFFHAMGGREGLIDTAVKTAETGYIQRRLVKALEDIMVHYDNTTRNSLGNVIQFIYGEDGMDAAHIEKQSLDTIGGSDAAFEKRYRVDLLNTDHTLDPSLLESGSEILGDLKLQVLLDEEYKQLVKDRKFLREVFVDGEANWPLPVNIRRIIQNAQQTFHIDHTKPSDLTIKDIVLGVKDLQENLLVLRGKNEIIQNAQRDAVTLFCCLLRSRLATRRVLQEYRLTKQAFDWVLSNIEAQFLRSVVHPGEMVGVLAAQSIGEPATQMTLNTFHFAGVASKKVTSGVPRLKEILNVAKNMKTPSLTVYLEPGHAADQEQAKLIRSAIEHTTLKSVTIASEIYYDPDPRSTVIPEDEEIIQLHFSLLDEEAEQSFDQQSPWLLRLELDRAAMNDKDLTMGQVGERIKQTFKNDLFVIWSEDNDEKLIIRCRVVRPKSLDAETEAEEDHMLKKIENTMLENITLRGVENIERVVMMKYDRKVPSPTGEYVKEPEWVLETDGVNLSEVMTVPGIDPTRIYTNSFIDIMEVLGIEAGRAALYKEVYNVIASDGSYVNYRHMALLVDVMTTQGGLTSVTRHGFNRSNTGALMRCSFEETVEILFEAGASAELDDCRGVSENVILGQMAPIGTGAFDVMIDEESLVKYMPEQKITEIEDGQDGGVTPYSNESGLVNADLDVKDELMFSPLVDSGSNDAMAGGFTAYGGADYGEATSPFGAYGEAPTSPGFGVSSPGFSPTSPTYSPTSPAYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPAYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPNYSPTSPSYSPTSPGYSPGSPAYSPKQDEQKHNENENSR
8XXA , Knot 259 655 0.85 40 291 611
GSGMKETAAAKFERQHMDSPDLGTDDDDKAMADIGIEDGINYDPNDPTRVTLSLYAPGKSFVYVIGDFTNWEVDPAYFMYRDAPRPDSVHWWITIEGLTPGQEYAFQYFIDGELRLADLFAHKVLDPWHDPFIPSSTYPNLKPYPTGKTEGIVAVLQPGAPQYQWQVTDFERPPAHELVIYELLIRDFVARHDYVTLIDTLDYLERLGVNAIELMPVAEFDGNISWGYNPAFHLALDKYYGPADDLKRFVDECHRRGIAVILDVVYNHATGNSPLVQLYGPTADNPFINIPARHPFNVFYDLNHEHPYIQYWLDRANRYWLEEFRVDGFRFALSKGFTQKYTDDDVGAWSAYDASRIRLLKRMADAIWAVDSTAYIILEHFADNQEEKELAAYGQDRGRAGMLLWHNLNRAFSQSVMGYLNDPNFSSDLTTIYYKNRGFPTPNLIAYMESHDEQWLMYRMRAYGARQGAYDVRSLPVALDRMKLAGAFFFTVPGPKMIWQFGELGYGYGERGEQCLEGTGDSCPSIAPGRIDPKPIRWDYRNDPLRMKLYRTWAELLRLRREHAVFRSPETQVRMRLQHGVPGRWISLTHPELSVVVVGNFGLEPLVVTPTFPQTGTWYDYFNGDSLVVDDPNTGIELLPGEFRLYTNRYVGQAE
3JRK , Knot 140 322 0.83 40 191 309
TENKRKSMEKLSVDGVISALAFDQRGALKRMMAQHQTKEPTVEQIEELKSLVSEELTPFASSILLDPEYGLPASRVRSEEAGLLLAYEKTGYDATTTSRLPDCLDVWSAKRIKEAGAEAVKFLLYYDIDGDQDVNEQKKAYIERIGSECRAEDIPFYLEILTYDEKIADNASPEFAKVKAHKVNEAMKVFSKERFGVDVLKVEVPVNMKFVEGFADGEVLFTKEEAAQAFRDQEASTDLPYIYLSAGVSAKLFQDTLVFAAESGAKFNGVLCGRATWAGSVKVYIEEGPQAAREWLRTEGFKNIDELNKVLDKTASPWTEKM
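The ratio column compares \(\mathrm{LZ}(w)\) with \(n/\log_{20} n\), the growth rate of the LZ76 complexity of a random sequence over a 20-letter alphabet. The sketch below assumes the LZ-complexity is the Lempel–Ziv (1976) phrase count computed with the Kaspar–Schuster algorithm; that choice and the function names are assumptions, so the tool's own variant may give slightly different counts. The table's ratios appear to be truncated rather than rounded to two decimals: for 3HOZ, \(556/(1733/\log_{20}1733)\approx 0.799\) is shown as 0.79.

```python
import math

def lz76_complexity(s: str) -> int:
    """Number of phrases in the LZ76 parsing of s (Kaspar-Schuster algorithm)."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            # The current phrase still copies a substring starting at i.
            k += 1
            if l + k > n:
                c += 1  # count the last, possibly incomplete, phrase
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                # No earlier start extends further: close the phrase.
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def normalized_lz(lz: int, n: int) -> float:
    """LZ(w) / (n / log_20 n), the ratio shown in the table."""
    return lz / (n / math.log(n, 20))

if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))  # 6: 0|001|10|100|1000|101
    print(math.floor(100 * normalized_lz(556, 1733)) / 100)
```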

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the amino-acid sequence, as given in FASTA format, of the protein with four-character Protein Data Bank code \(c\).
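\(P_w(n)\) can be computed by sliding a window of length \(n\) over \(w\) and collecting the distinct windows. A minimal Python sketch; the function names are illustrative, not taken from the tool.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct length-n subwords (intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p_w(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))

if __name__ == "__main__":
    print(sorted(subwords("ABABC", 2)))  # ['AB', 'BA', 'BC']
```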

\(|P_{f(3HOZ_1)}(2) \setminus P_{f(8XXA_1)}(2)|=85\) and \(|P_{f(8XXA_1)}(2) \setminus P_{f(3HOZ_1)}(2)|=33\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\); for example, \(Z_2(f(3HOZ_1),f(8XXA_1))=85+33=118\).

Hydrophobic-polar version of Sequence 1 (3HOZ_1):
11100000111001001011110100101101101011001000000101111001011010001000000011000110110101101110111110100100010100101110000011001111000000111110100001100001100001001100110100010100011011101000010101001010110000110110010100100111001100101111001111111101010100000100010101101101010100100011100110010011010110010001110101100010110010101010010101011100101010011010101010011110011001001011010010010011001100011100110001001010000011010100110100011000111100010100101110010111000101010100100101010010101100000010100101111011010000101111000101100101000110100110110111010111101111010111010011011110110100100100110100011111010111111000011000111101100001101010111010011011110011001110011011010010001101000110100010101101001101000100011011001000110110101001001001111100101101101010110001010011111100011010000001000111000010110100111011110011100110010010100011011001110000000001101101101001101101000010011100111000001011000001010110010011101010111000000110000110011101010111110100110010001010000100101001111100100011110100011001000110110011000110001100001000110111001010110011011011111110011011001010010111110001001110100110110010010101010110110000101100110000100101100100010100011100001101010110001000100001111010100111000010110110010001000111110000000111000110100101000100001100100011001010110010011110000011010100100101110001101001101111010010000110110111101101110001001110010010000111110110001110010001100000111100010001011101110101000011000111101111101110111000011001100010010010011101000001110101010001110111001000111111010111001010011110101100111110011101001000100110010010001001000100100010010001001000100100010010001001000100100010010001001000100100010010001001000100110010010001001000100100010010001001000100100010011001101100100000000000000
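The hydrophobic-polar string writes 1 for a hydrophobic residue and 0 for a polar one. Comparing it with the start and end of 3HOZ_1 shows A, F, G, I, L, M, P, V encoded as 1 and D, E, H, K, N, Q, R, S, T, Y encoded as 0. In the sketch below, placing W among the hydrophobic residues and C among the polar ones is an assumption, and the names are illustrative.

```python
# Residues encoded as '1'. A, F, G, I, L, M, P, V agree with the string above;
# including W (and leaving out C) is an assumption.
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

if __name__ == "__main__":
    # First 22 residues of 3HOZ_1; matches the first 22 characters above.
    print(hp_string("MVGQQYSSAPLRTVKEVQFGLF"))  # 1110000011100100101111
```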
Pair \(Z_2\) Length of longest common subword (substring)
3HOZ_1,8XXA_1 118 4
3HOZ_1,3JRK_1 170 4
8XXA_1,3JRK_1 164 4
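\(Z_k\) is the size of the symmetric difference of the two subword sets; dividing it by the size of their union gives the Jaccard distance. The sketch also computes the length of the longest common substring by dynamic programming. Reading the last column as a longest common substring (contiguous) rather than a longest common subsequence is an inference, since a common subsequence of sequences this long would be much longer than 4. Function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by the size of the union of P_x(k) and P_y(k)."""
    px, py = subwords(x, k), subwords(y, k)
    union = px | py
    return len(px ^ py) / len(union) if union else 0.0

def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y (O(|x||y|))."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    print(z("ABABC", "BABD", 2), longest_common_substring("ABABC", "BABD"))  # 2 3
```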

Newick tree

 
(
	3JRK_1:90.21,
	(
		3HOZ_1:59,8XXA_1:59
	):31.21
);

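Let \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\) be the Otu–Sayood distance \(d\).
Let \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)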
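Roughly speaking, the expected distance between 3HOZ_1 and 8XXA_1 is \((0.85)(0.8)\left(\frac{2388}{\log_{20} 2388}-\frac{655}{\log_{20}655}\right)\approx 419\), where \(2388=1733+655\) is the length of the concatenation of the two sequences.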
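The estimate is plain arithmetic and can be checked in a few lines. The factors 0.85 and 0.8 are copied from the formula as given, and the helper name `lz_bound` is illustrative. The exact value is about 419.6, so the 419 on this page appears to be truncated rather than rounded.

```python
import math

def lz_bound(n: int, k: int = 20) -> float:
    """n / log_k n, the normalizing term used in the LZ-complexity table."""
    return n / math.log(n, k)

n_x, n_y = 1733, 655  # lengths of 3HOZ_1 and 8XXA_1
expected = 0.85 * 0.8 * (lz_bound(n_x + n_y) - lz_bound(n_y))

if __name__ == "__main__":
    print(math.floor(expected))  # 419
```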
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 3HOZ_1 8XXA_1 493 346.5

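In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], with min and max taken over the two quantities \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\),
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d, &\alpha=0,\\ d_1/2, &\alpha=1/2. \end{cases} \]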
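Both Otu–Sayood distances and the interpolation \(\delta\) can be computed from any LZ-complexity routine. The sketch below uses a Kaspar–Schuster LZ76 routine; whether the tool uses exactly this LZ variant is an assumption, so its numbers need not reproduce the table above (\(d=493\), \(d_1/2=346.5\)). All names are illustrative.

```python
def lz76(s: str) -> int:
    """LZ76 phrase count of s (Kaspar-Schuster algorithm)."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def increments(x: str, y: str) -> tuple[int, int]:
    """The two Otu-Sayood terms LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def d(x: str, y: str) -> int:
    return max(increments(x, y))

def d1(x: str, y: str) -> int:
    return sum(increments(x, y))

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    x, y = "MVGQQYSSAPLRTVKEVQFGLF", "AAAAAA"
    print(d(x, y), d1(x, y) / 2)
```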