CoV2D Browser™

Parikh vectors
3GTQ_1 7NNQ_1 5EJY_1 Letter Amino acid
163 19 54 S Serine
115 35 13 G Glycine
44 2 4 M Methionine
124 22 17 P Proline
93 3 46 K Lysine
58 7 28 F Phenylalanine
13 3 2 W Tryptophan
58 9 22 Y Tyrosine
83 20 15 R Arginine
21 3 7 C Cysteine
143 37 48 L Leucine
66 8 28 Q Glutamine
117 15 29 E Glutamic acid
35 13 9 H Histidine
97 16 37 I Isoleucine
117 28 28 V Valine
98 62 20 A Alanine
71 4 36 N Asparagine
102 14 31 D Aspartic acid
115 24 26 T Threonine
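
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch of that count, assuming the sequence is a plain string of one-letter amino acid codes; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not part of the page:

```python
from collections import Counter

# Letters in the row order used by the table above.
AMINO_ACIDS = "SGMPKFWYRCLQEHIVANDT"


def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in `sequence`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    # First ten residues of the 3GTQ_1 sequence listed on this page.
    prefix = "MVGQQYSSAP"
    for letter, count in zip(AMINO_ACIDS, parikh_vector(prefix)):
        if count:
            print(letter, count)
```

Summing a column of the table gives the sequence length; for example, the 3GTQ_1 column sums to 1733.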

>3GTQ_1|Chain A|DNA-directed RNA polymerase II subunit RPB1|Saccharomyces cerevisiae (4932)
>7NNQ_1|Chains A, B, C, D|N-acetyl-gamma-glutamyl-phosphate reductase|Mycobacterium tuberculosis H37Rv (83332)
>5EJY_1|Chain A|Myosin-I heavy chain|Dictyostelium discoideum (44689)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3GTQ , Knot 556 1733 0.79 40 343 1337
MVGQQYSSAPLRTVKEVQFGLFSPEEVRAISVAKIRFPETMDETQTRAKIGGLNDPRLGSIDRNLKCQTCQEGMNECPGHFGHIDLAKPVFHVGFIAKIKKVCECVCMHCGKLLLDEHNELMRQALAIKDSKKRFAAIWTLCKTKMVCETDVPSEDDPTQLVSRGGCGNTQPTIRKDGLKLVGSWKKDRATGDADEPELRVLSTEEILNIFKHISVKDFTSLGFNEVFSRPEWMILTCLPVPPPPVRPSISFNESQRGEDDLTFKLADILKANISLETLEHNGAPHHAIEEAESLLQFHVATYMDNDIAGQPQALQKSGRPVKSIRARLKGKEGRIRGNLMGKRVDFSARTVISGDPNLELDQVGVPKSIAKTLTYPEVVTPYNIDRLTQLVRNGPNEHPGAKYVIRDSGDRIDLRYSKRAGDIQLQYGWKVERHIMDNDPVLFNRQPSLHKMSMMAHRVKVIPYSTFRLNLSVTSPYNADFDGDEMNLHVPQSEETRAELSQLCAVPLQIVSPQSNKPCMGIVQDTLCGIRKLTLRDTFIELDQVLNMLYWVPDWDGVIPTPAIIKPKPLWSGKQILSVAIPNGIHLQRFDEGTTLLSPKDNGMLIIDGQIIFGVVEKKTVGSSNGGLIHVVTREKGPQVCAKLFGNIQKVVNFWLLHNGFSTGIGDTIADGPTMREITETIAEAKKKVLDVTKEAQANLLTAKHGMTLRESFEDNVVRFLNEARDKAGRLAEVNLKDLNNVKQMVMAGSKGSFINIAQMSACVGQQSVEGKRIAFGFVDRTLPHFSKDDYSPESKGFVENSYLRGLTPQEFFFHAMGGREGLIDTAVKTAETGYIQRRLVKALEDIMVHYDNTTRNSLGNVIQFIYGEDGMDAAHIEKQSLDTIGGSDAAFEKRYRVDLLNTDHTLDPSLLESGSEILGDLKLQVLLDEEYKQLVKDRKFLREVFVDGEANWPLPVNIRRIIQNAQQTFHIDHTKPSDLTIKDIVLGVKDLQENLLVLRGKNEIIQNAQRDAVTLFCCLLRSRLATRRVLQEYRLTKQAFDWVLSNIEAQFLRSVVHPGEMVGVLAAQSIGEPATQMTLNTFHFAGVASKKVTSGVPRLKEILNVAKNMKTPSLTVYLEPGHAADQEQAKLIRSAIEHTTLKSVTIASEIYYDPDPRSTVIPEDEEIIQLHFSLLDEEAEQSFDQQSPWLLRLELDRAAMNDKDLTMGQVGERIKQTFKNDLFVIWSEDNDEKLIIRCRVVRPKSLDAETEAEEDHMLKKIENTMLENITLRGVENIERVVMMKYDRKVPSPTGEYVKEPEWVLETDGVNLSEVMTVPGIDPTRIYTNSFIDIMEVLGIEAGRAALYKEVYNVIASDGSYVNYRHMALLVDVMTTQGGLTSVTRHGFNRSNTGALMRCSFEETVEILFEAGASAELDDCRGVSENVILGQMAPIGTGAFDVMIDEESLVKYMPEQKITEIEDGQDGGVTPYSNESGLVNADLDVKDELMFSPLVDSGSNDAMAGGFTAYGGADYGEATSPFGAYGEAPTSPGFGVSSPGFSPTSPTYSPTSPAYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPAYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPNYSPTSPSYSPTSPGYSPGSPAYSPKQDEQKHNENENSR
7NNQ , Knot 140 344 0.79 40 172 321
ATKVAVAGASGYAGGEILRLLLGHPAYADGRLRIGALTAATSAGSTLGEHHPHLTPLAHRVVEPTEAAVLGGHDAVFLALPHGHSAVLAQQLSPETLIIDCGADFRLTDAAVWERFYGSSHAGSWPYGLPELPGARDQLRGTRRIAVPGCYPTAALLALFPALAADLIEPAVTVVAVSGTSGAGRAATTDLLGAEVIGSARAYNIAGVHRHTPEIAQGLRAVTDRDVSVSFTPVLIPASRGILATCTARTRSPLSQLRAAYEKAYHAEPFIYLMPEGQLPRTGAVIGSNAAHIAVAVDEDAQTFVAIAAIDNLVKGTAGAAVQSMNLALGWPETDGLSVVGVAP
5EJY , Knot 201 500 0.83 40 235 464
ELPQILNDEEISLYSFYDYANKNFNIEKLKQKDDIFSYQKSHIKSSLLVHSDAEQTKVAVEIFSKVLHYMNSNPLVSKKDPADFYSPVKFILTKGLAIESLRDEIYCQLIKQSTSNPIQDLNIRVWELIHFTCSTFPPTRKLIKYFAAYLKTTIQQSDVSKSVKDSAQASYFILQRFTLNGARKQVPSVTELESIKENRPIFVRITATDGSLKGLHIDSATTCQESSNDLSQRSRMRVNSKENGFTIIESFNGIERDIAPTDKLCDVLSKVENLQATLSSIQVNFKFVFKKKLFFDNITNNVPTTSINVENEFYYHQLFNDLFNSNYCKDQDYQISIGSLKLQFESSDYTDEIRAWLPGNGRGKYFTTDIEKNRFDDFINKYKSHKGLSPEDAKKQMVQLLEKHPLANCSLVVCEHQSESLPYPKNFVLALNVNGINIYDPATSKMLESVKYSNQSQQNLKSDDKSVSIILENKSTLQAFTGDVQKLVSLIKEYSLYLRN
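
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does only that arithmetic; the function name `normalized_lz` is illustrative, and the observation that the table seems to truncate rather than round to two decimals is an assumption based on the three rows shown.

```python
import math


def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), where b is the alphabet size (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # (code, LZ-complexity, length) taken from the table above.
    rows = [("3GTQ", 556, 1733), ("7NNQ", 140, 344), ("5EJY", 201, 500)]
    for code, lz, n in rows:
        ratio = normalized_lz(lz, n)
        # Truncating to two decimals is an assumption about how the page displays values.
        print(code, math.floor(ratio * 100) / 100)
```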

Let \(P_w(k)\) be the set of distinct subwords (contiguous intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
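
As an illustration of these definitions, the set of distinct subwords of a given length and its cardinality can be computed with a sliding window. This is a minimal sketch on toy words; the names `subwords` and `subword_complexity` are illustrative and not taken from the page.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in the word w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Number of distinct subwords of length k in w."""
    return len(subwords(w, k))


if __name__ == "__main__":
    w = "abab"
    print(sorted(subwords(w, 2)))      # the distinct 2-letter intervals of w
    print(subword_complexity(w, 1))
    print(subword_complexity(w, 2))
```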

\(|P_{f(3GTQ_1)}(2) \setminus P_{f(7NNQ_1)}(2)|=182\), \(|P_{f(7NNQ_1)}(2) \setminus P_{f(3GTQ_1)}(2)|=11\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be a LZ76 style (set of subwords) Jaccard distance numerator for \(P(k)\).Hydrophobic-polar version of Sequence 1:1110000011100100101111010010110110101100100000010111100101101000100000001100011011010110111011111010010001010010111000001100111100000011111010000110000110000100110011010001010001101110100001010100101011000011011001010010011100110010111100111111110101010000010001010110110101010010001110011001001101011001000111010110001011001010101001010101110010101001101010101001111001100100101101001001001100110001110011000100101000001101010011010001100011110001010010111001011100010101010010010101001010110000001010010111101101000010111100010110010100011010011011011101011110111101011101001101111011010010010011010001111101011111100001100011110110000110101011101001101111001100111001101101001000110100011010001010110100110100010001101100100011011010100100100111110010110110101011000101001111110001101000000100011100001011010011101111001110011001001010001101100111000000000110110110100110110100001001110011100000101100000101011001001110101011100000011000011001110101011111010011001000101000010010100111110010001111010001100100011011001100011000110000100011011100101011001101101111111001101100101001011111000100111010011011001001010101011011000010110011000010010110010001010001110000110101011000100010000111101010011100001011011001000100011111000000011100011010010100010000110010001100101011001001111000001101010010010111000110100110111101001000011011011110110111000100111001001000011111011000111001000110000011110001000101110111010100001100011110111110111011100001100110001001001001110100000111010101000111011100100011111101011100101001111010110011111001110100100010011001001000100100010010001001000100100010010001001000100100010010001001000100100010010001001000100100010011001001000100100010010001001000100100010010001001
1001101100100000000000000
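
The hydrophobic-polar string above can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not state which residues it treats as hydrophobic; the set used below (A, F, G, I, L, M, P, V, W) is an assumption inferred by comparing the start and end of the 3GTQ_1 sequence with the binary string, and the names `HYDROPHOBIC` and `hydrophobic_polar` are illustrative.

```python
# Assumed hydrophobic residues, inferred from the data on this page (not stated by it).
HYDROPHOBIC = set("AFGILMPVW")


def hydrophobic_polar(sequence):
    """Map each residue to '1' if it is in the assumed hydrophobic set, else '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


if __name__ == "__main__":
    # First ten and last twenty-five residues of the 3GTQ_1 sequence.
    print(hydrophobic_polar("MVGQQYSSAP"))
    print(hydrophobic_polar("GYSPGSPAYSPKQDEQKHNENENSR"))
```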
Pair \(Z_2\) Length of longest common subword (contiguous)
3GTQ_1,7NNQ_1 193 5
3GTQ_1,5EJY_1 142 4
7NNQ_1,5EJY_1 175 5
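
The \(Z_k\) column is the size of the symmetric difference of the two subword sets; for the first pair, \(182+11=193\). A minimal sketch of that computation on toy words, with illustrative names `subwords` and `z_k`:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in the word w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|, the size of the symmetric difference."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


if __name__ == "__main__":
    # Toy words: "abab" has 2-subwords {ab, ba}; "abcb" has {ab, bc, cb}.
    print(z_k("abab", "abcb", 2))
```

Dividing `z_k(x, y, k)` by the size of the union of the two sets would give the full Jaccard distance; the table reports only the numerator.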

Newick tree

 
[
	7NNQ_1:98.14,
	[
		3GTQ_1:71,5EJY_1:71
	]:27.14
]
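
The nested brackets above can be serialized into a standard Newick string. The sketch below assumes a simple tuple encoding, `(label_or_children, branch_length)`, chosen here only for illustration; the function name `to_newick` is likewise hypothetical.

```python
def to_newick(node):
    """Serialize a (label_or_children, branch_length) tuple into Newick text.

    A leaf is (name, length); an internal node is (list_of_children, length).
    A length of None (used for the root) is omitted from the output.
    """
    label, length = node
    if isinstance(label, list):
        text = "(" + ",".join(to_newick(child) for child in label) + ")"
    else:
        text = label
    return text if length is None else f"{text}:{length:g}"


if __name__ == "__main__":
    # Branch lengths copied from the tree shown above.
    tree = ([("7NNQ_1", 98.14), ([("3GTQ_1", 71), ("5EJY_1", 71)], 27.14)], None)
    print(to_newick(tree) + ";")
```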

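Let \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\) be the Otu–Sayood distance \(d\).
Let \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)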
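Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{2077}{\log_{20} 2077}-\frac{344}{\log_{20} 344}\right)\approx 433\), where \(2077=1733+344\) is the length of the concatenation of the two sequences.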
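
The arithmetic in this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are taken from the formula as given, without interpreting them; the function name `expected_distance` and its parameters are illustrative.

```python
import math


def expected_distance(n_concat, n_single, alphabet_size=20, factor=0.85 * 0.8):
    """factor * (n_concat / log_b(n_concat) - n_single / log_b(n_single))."""
    def scaled(n):
        return n / math.log(n, alphabet_size)
    return factor * (scaled(n_concat) - scaled(n_single))


if __name__ == "__main__":
    # 2077 = 1733 + 344, the combined length of 3GTQ_1 and 7NNQ_1.
    print(expected_distance(1733 + 344, 344))
```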
Status Protein1 Protein2 d d1/2
Query variables 3GTQ_1 7NNQ_1 508 306.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
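
To make the interpolation concrete, the sketch below computes a Lempel–Ziv (1976) style complexity and the two Otu–Sayood quantities on toy words. It assumes that min and max range over the two differences \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), and that complexity is the number of components in the exhaustive-history parsing. The page does not say which LZ variant it uses, so the numbers need not match its tables; all function names are illustrative.

```python
def lz76(s):
    """Number of components in the LZ76 exhaustive-history parsing of s.

    Each component is the shortest prefix of the unparsed part that does not
    occur as a substring of everything before its final symbol.
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two complexity increments."""
    a = lz76(x + y) - lz76(x)
    b = lz76(y + x) - lz76(y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


def otu_sayood_d(x, y):
    return max(lz76(x + y) - lz76(x), lz76(y + x) - lz76(y))


def otu_sayood_d1(x, y):
    return lz76(x + y) - lz76(x) + lz76(y + x) - lz76(y)


if __name__ == "__main__":
    x, y = "aaaa", "abab"
    print(otu_sayood_d(x, y), otu_sayood_d1(x, y))
    print(delta(x, y, 0), delta(x, y, 0.5))
```

The substring search makes this quadratic or worse in the sequence length, which is adequate for a sketch but slow for large inputs.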