CoV2D Browser™

Parikh vectors
5OQM_1 6QNL_1 5DEW_1 Letter Amino acid
44 4 11 M Methionine
58 12 7 F Phenylalanine
21 3 5 C Cysteine
117 15 27 E Glutamic acid
143 29 31 L Leucine
115 15 18 G Glycine
97 9 21 I Isoleucine
124 17 14 P Proline
13 4 3 W Tryptophan
117 12 20 V Valine
98 14 21 A Alanine
83 8 11 R Arginine
71 16 16 N Asparagine
163 26 16 S Serine
115 14 15 T Threonine
102 12 13 D Aspartic acid
66 16 14 Q Glutamine
93 10 21 K Lysine
35 13 4 H Histidine
58 14 9 Y Tyrosine
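A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch of that count, assuming the column order of the table above; the names `ALPHABET` and `parikh_vector` are illustrative and do not come from the CoV2D code base.

```python
from collections import Counter

# Letter order taken from the rows of the table above (an assumption about
# how the page orders the 20 amino acids).
ALPHABET = "MFCELGIPWVARNSTDQKHY"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the number of occurrences of each letter of `alphabet`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# Illustrative fragment: the first 20 residues of 5DEW_1.
fragment = "SDEEILEKLRSIVSVGDPKK"
vector = parikh_vector(fragment)
for letter, count in zip(ALPHABET, vector):
    if count:
        print(letter, count)
```

Applied to a full sequence, the entries of the vector sum to the sequence length, which is a quick sanity check against the Length column further down the page.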

>5OQM_1|Chain A|DNA-directed RNA polymerase II subunit RPB1|Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (559292)
>6QNL_1|Chains A, B, C, D|Carbonic anhydrase 12|Homo sapiens (9606)
>5DEW_1|Chains A, B|Serine/threonine-protein kinase PAK 1|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5OQM , Knot 556 1733 0.79 40 343 1337
MVGQQYSSAPLRTVKEVQFGLFSPEEVRAISVAKIRFPETMDETQTRAKIGGLNDPRLGSIDRNLKCQTCQEGMNECPGHFGHIDLAKPVFHVGFIAKIKKVCECVCMHCGKLLLDEHNELMRQALAIKDSKKRFAAIWTLCKTKMVCETDVPSEDDPTQLVSRGGCGNTQPTIRKDGLKLVGSWKKDRATGDADEPELRVLSTEEILNIFKHISVKDFTSLGFNEVFSRPEWMILTCLPVPPPPVRPSISFNESQRGEDDLTFKLADILKANISLETLEHNGAPHHAIEEAESLLQFHVATYMDNDIAGQPQALQKSGRPVKSIRARLKGKEGRIRGNLMGKRVDFSARTVISGDPNLELDQVGVPKSIAKTLTYPEVVTPYNIDRLTQLVRNGPNEHPGAKYVIRDSGDRIDLRYSKRAGDIQLQYGWKVERHIMDNDPVLFNRQPSLHKMSMMAHRVKVIPYSTFRLNLSVTSPYNADFDGDEMNLHVPQSEETRAELSQLCAVPLQIVSPQSNKPCMGIVQDTLCGIRKLTLRDTFIELDQVLNMLYWVPDWDGVIPTPAIIKPKPLWSGKQILSVAIPNGIHLQRFDEGTTLLSPKDNGMLIIDGQIIFGVVEKKTVGSSNGGLIHVVTREKGPQVCAKLFGNIQKVVNFWLLHNGFSTGIGDTIADGPTMREITETIAEAKKKVLDVTKEAQANLLTAKHGMTLRESFEDNVVRFLNEARDKAGRLAEVNLKDLNNVKQMVMAGSKGSFINIAQMSACVGQQSVEGKRIAFGFVDRTLPHFSKDDYSPESKGFVENSYLRGLTPQEFFFHAMGGREGLIDTAVKTAETGYIQRRLVKALEDIMVHYDNTTRNSLGNVIQFIYGEDGMDAAHIEKQSLDTIGGSDAAFEKRYRVDLLNTDHTLDPSLLESGSEILGDLKLQVLLDEEYKQLVKDRKFLREVFVDGEANWPLPVNIRRIIQNAQQTFHIDHTKPSDLTIKDIVLGVKDLQENLLVLRGKNEIIQNAQRDAVTLFCCLLRSRLATRRVLQEYRLTKQAFDWVLSNIEAQFLRSVVHPGEMVGVLAAQSIGEPATQMTLNTFHFAGVASKKVTSGVPRLKEILNVAKNMKTPSLTVYLEPGHAADQEQAKLIRSAIEHTTLKSVTIASEIYYDPDPRSTVIPEDEEIIQLHFSLLDEEAEQSFDQQSPWLLRLELDRAAMNDKDLTMGQVGERIKQTFKNDLFVIWSEDNDEKLIIRCRVVRPKSLDAETEAEEDHMLKKIENTMLENITLRGVENIERVVMMKYDRKVPSPTGEYVKEPEWVLETDGVNLSEVMTVPGIDPTRIYTNSFIDIMEVLGIEAGRAALYKEVYNVIASDGSYVNYRHMALLVDVMTTQGGLTSVTRHGFNRSNTGALMRCSFEETVEILFEAGASAELDDCRGVSENVILGQMAPIGTGAFDVMIDEESLVKYMPEQKITEIEDGQDGGVTPYSNESGLVNADLDVKDELMFSPLVDSGSNDAMAGGFTAYGGADYGEATSPFGAYGEAPTSPGFGVSSPGFSPTSPTYSPTSPAYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPAYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPNYSPTSPSYSPTSPGYSPGSPAYSPKQDEQKHNENENSR
6QNL , Knot 123 263 0.86 40 181 256
ASKWTYFGPDGENSWSKKYPSCGGLLQSPIDLHSDILQYDASLTPLEFQGYNLSANKQFLLTNNGHSVKLNLPSDMHIQGLQSRYSATQLHLHWGNPNDPHGSEHTVSGQHFAAELHIVHYNSDLYPDASTASNKSEGLAVLAVLIEMGSFNPSYDKIFSHLQHVKYKGQEAFVPGFNIEELLPERTAEYYRYRGSLTTPPCNPTVLWTVFRNPVQISQEQLLALETALYCTHMDDPSPREMINNFRQVQKFDERLVYTSFSQ
5DEW , Knot 126 297 0.80 40 185 281
SDEEILEKLRSIVSVGDPKKKYTRFEKIGQGASGTVYTAMDVATGQEVAIKQMNLQQQPKKELIINEILVMRENKNPNIVNYLDSYLVGDELWVVMEYLAGGSLTDVVTETCMDEGQIAAVCRECLQALEFLHSNQVIHRNIKSDNILLGMDGSVKLTDFGFCAQITPEQSKRSEMVGTPYWMAPEVVTRKAYGPKVDIWSLGIMAIEMIEGEPPYLNENPLRALYLIATNGTPELQNPEKLSAIFRDFLNRCLEMDVEKRGSAKELLQHQFLKIAKPLSSLTPLIAAAKEATKNNH
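The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below assumes that \(\mathrm{LZ}(w)\) is the number of components in the Lempel-Ziv (1976) exhaustive-history parsing; the page does not say which variant it uses, so `lz76` is an illustration rather than the site's implementation. It is a simple cubic-time version, which is adequate for sequences of a few thousand residues.

```python
import math

def lz76(s):
    """Count components of the LZ76 exhaustive-history parsing of s."""
    i, components, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the component while it can still be copied from the text
        # that precedes its last symbol.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_alphabet_size(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

# Classic textbook example: 0|001|10|100|1000|101 has six components.
print(lz76("0001101001000101"))
# Reproduce the normalized column from the tabulated LZ and n values.
for lz, n in [(556, 1733), (123, 263), (126, 297)]:
    print(lz, n, normalized_lz(lz, n))
```

The tabulated values 0.79, 0.86 and 0.80 appear to be these ratios truncated to two decimals.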

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
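These definitions translate directly into a few lines of code. The sketch below is only an illustration; the function names are not from the source.

```python
def subwords(w, n):
    """P_w(n): the set of distinct length-n subwords (intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

print(sorted(subwords("ABAB", 2)))      # the two subwords AB and BA
print(subword_complexity("ABAB", 3))    # ABA and BAB
```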

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) numerator of the Jaccard distance for \(P(k)\). For example, \(|P_{f(5OQM_1)}(2) \setminus P_{f(6QNL_1)}(2)|=177\) and \(|P_{f(6QNL_1)}(2) \setminus P_{f(5OQM_1)}(2)|=15\), so \(Z_2=177+15=192\) for this pair.

Hydrophobic-polar version of Sequence 1:
11100000111001001011110100101101101011001000000101111001011010001000000011000110110101101110111110100100010100101110000011001111000000111110100001100001100001001100110100010100011011101000010101001010110000110110010100100111001100101111001111111101010100000100010101101101010100100011100110010011010110010001110101100010110010101010010101011100101010011010101010011110011001001011010010010011001100011100110001001010000011010100110100011000111100010100101110010111000101010100100101010010101100000010100101111011010000101111000101100101000110100110110111010111101111010111010011011110110100100100110100011111010111111000011000111101100001101010111010011011110011001110011011010010001101000110100010101101001101000100011011001000110110101001001001111100101101101010110001010011111100011010000001000111000010110100111011110011100110010010100011011001110000000001101101101001101101000010011100111000001011000001010110010011101010111000000110000110011101010111110100110010001010000100101001111100100011110100011001000110110011000110001100001000110111001010110011011011111110011011001010010111110001001110100110110010010101010110110000101100110000100101100100010100011100001101010110001000100001111010100111000010110110010001000111110000000111000110100101000100001100100011001010110010011110000011010100100101110001101001101111010010000110110111101101110001001110010010000111110110001110010001100000111100010001011101110101000011000111101111101110111000011001100010010010011101000001110101010001110111001000111111010111001010011110101100111110011101001000100110010010001001000100100010010001001000100100010010001001000100100010010001001000100100010010001001100100100010010001001000100100010010001001000100100110011011001000000000000001001101100100000000000000
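The page does not state which residues count as hydrophobic. Comparing the bit string with the 5OQM_1 sequence position by position suggests that A, F, G, I, L, M, P, V and W map to 1 and all other residues map to 0; the sketch below hard-codes that inferred set, so treat `HYDROPHOBIC` as an assumption rather than the site's definition.

```python
# Inferred from the data on this page, not stated by the source.
HYDROPHOBIC = set("AFGILMPVW")

def hydrophobic_polar(sequence, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)

# First 25 residues of 5OQM_1 and the first 25 bits shown above.
prefix = "MVGQQYSSAPLRTVKEVQFGLFSPE"
print(hydrophobic_polar(prefix))
```

Passing a different `hydrophobic` set makes it easy to try other hydrophobicity conventions on the same sequence.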
Pair \(Z_2\) Length of longest common subword
5OQM_1,6QNL_1 192 3
5OQM_1,5DEW_1 170 5
6QNL_1,5DEW_1 182 5
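Both columns of this table can be computed with elementary string operations. The sketch below follows the definition of \(Z_k\) given earlier and treats the last column as the length of the longest common contiguous subword, which is an assumption suggested by how small the values are. All function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    previous = [0] * (len(y) + 1)
    for a in x:
        current = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best

print(z_distance("AAB", "ABB", 2))
print(longest_common_subword_length("ABCDE", "XBCDY"))
```

Dividing `z_distance` by the size of the union of the two subword sets would give the full Jaccard distance rather than just its numerator.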

Newick tree

 
(6QNL_1:96.20,(5OQM_1:85,5DEW_1:85):11.20);

Let \(d\) denote the Otu–Sayood distance \(d\).
Let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1996}{\log_{20} 1996}-\frac{263}{\log_{20}263})\approx 438.\)
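The estimate can be checked numerically. In the sketch below the factors 0.85 and 0.8 are taken from the text as given, and 1996 is read as the combined length 1733 + 263 of the two sequences, which is an interpretation rather than something the page states.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_alphabet_size(n), the normalizing quantity for LZ-complexity."""
    return n / math.log(n, alphabet_size)

combined_length = 1733 + 263   # 1996, assumed to be the concatenated length
estimate = 0.85 * 0.8 * (lz_scale(combined_length) - lz_scale(263))
print(estimate)
```

The result lies between 438 and 439, consistent with the rough figure quoted above.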
Status Protein1 Protein2 d d1/2
Query variables 5OQM_1 6QNL_1 518 300
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
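As a sketch of how this interpolation behaves, the code below assumes the usual Otu–Sayood definitions, in which \(d\) is the larger and \(d_1\) the sum of the two complexity increments \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\). The page itself does not spell these out, so the function names and the argument layout are illustrative only.

```python
def otu_sayood(c_s, c_q, c_sq, c_qs):
    """Return (d, d1) from the LZ-complexities of S, Q, SQ and QS."""
    first = c_sq - c_s     # new components needed for Q after seeing S
    second = c_qs - c_q    # new components needed for S after seeing Q
    return max(first, second), first + second

def delta(alpha, smaller, larger):
    """alpha * min + (1 - alpha) * max."""
    return alpha * smaller + (1 - alpha) * larger

# Values from the query row: d = 518 and d1/2 = 300, hence min = 2*300 - 518.
larger = 518
smaller = 2 * 300 - larger
print(delta(0, smaller, larger))      # recovers d
print(delta(0.5, smaller, larger))    # recovers d1/2
```

Intermediate values of `alpha` weight the two increments unequally, which is one way to explore distances between \(d\) and \(d_1/2\).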