CoV2D Browser™

Parikh vectors
5SXR_1 9JAK_1 6PRX_1 Letter Amino acid
67 11 35 G Glycine
17 2 16 H Histidine
18 7 14 I Isoleucine
14 2 15 M Methionine
22 3 9 W Tryptophan
100 7 20 A Alanine
43 5 23 R Arginine
66 15 46 L Leucine
40 21 16 T Threonine
28 20 10 N Asparagine
2 2 5 C Cysteine
14 8 9 Y Tyrosine
49 7 14 D Aspartic acid
41 5 23 E Glutamic acid
32 7 16 F Phenylalanine
37 10 26 P Proline
36 15 19 S Serine
42 11 31 V Valine
28 6 19 Q Glutamine
31 7 17 K Lysine
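The counts in the table above can be reproduced with a short Python sketch. The name `parikh_vector` and the constant `AMINO_ACIDS` are hypothetical; the alphabet order is assumed to follow the row order of the table, and letters outside the 20 standard amino acids (such as the X in the 5SXR_1 sequence) are simply not counted.

```python
from collections import Counter

# Assumed alphabet order: the row order of the Parikh vector table.
AMINO_ACIDS = "GHIMWARLTNCYDEFPSVQK"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in sequence."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur.
    return [counts[letter] for letter in alphabet]

# First twelve residues of the 6PRX_1 sequence, used as a small example.
print(dict(zip(AMINO_ACIDS, parikh_vector("MGGSHHHHHHGM"))))
```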

>5SXR_1|Chains A, B|Catalase-peroxidase|Burkholderia pseudomallei (strain 1710b) (320372)
>9JAK_1|Chain A|Outer capsid protein VP4|Rotavirus (10912)
>6PRX_1|Chains A, B|Branched-chain-amino-acid aminotransferase, mitochondrial|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5SXR, Knot 278 728 0.84 42 275 655
MSNEAKCPFHQAAGNGTSNRDWWPNQLDLSILHRHSSLSDPMGKDFNYAQAFEKLDLAAVKRDLHALMTTSQDWWPADFGHYGGLFIRMAXHSAGTYRTADGRGGAGEGQQRFAPLNSWPDNANLDKARRLLWPIKQKYGRAISWADLLILTGNVALESMGFKTFGFAGGRADTWEPEDVYWGSEKIWLELSGGPNSRYSGDRQLENPLAAVQMGLIYVNPEGPDGNPDPVAAARDIRDTFARMAMNDEETVALIAGGHTFGKTHGAGPASNVGAEPEAAGIEAQGLGWKSAYRTGKGADAITSGLEVTWTTTPTQWSHNFFENLFGYEWELTKSPAGAHQWVAKGADAVIPDAFDPSKKHRPTMLTTDLSLRFDPAYEKISRRFHENPEQFADAFARAWFKLTHRDMGPRARYLGPEVPAEVLLWQDPIPAVDHPLIDAADAAELKAKVLASGLTVSQLVSTAWAAASTFRGSDKRGGANGARIRLAPQKDWEANQPEQLAAVLETLEAIRTAFNGAQRGGKQVSLADLIVLAGCAGVEQAAKNAGHAVTVPFAPGRADASQEQTDVESMAVLEPVADGFRNYLKGKYRVPAEVLLVDKAQLLTLSAPEMTVLLGGLRVLGANVGQSRHGVFTAREQALTNDFFVNLLDMGTEWKPTAADADVFEGRDRATGELKWTGTRVDLVFGSHSQLRALAEVYGSADAQEKFVRDFVAVWNKVMNLDRFDLA
9JAK, Knot 80 171 0.80 40 126 166
TLDGPYQPTSFNPPINYWLLLSPTNAGVVMQGTNNTNRWLATLLVEPNVESTTRNYNLFGSSVDITVENTSSDKWKFIDVGKTSLNGSYVQHGTLISSTKLCAAMKHGGNLYTFSGTTPNALPKAYSTTNFDSVNVTTFADFYIISRDNEQKCRQYVNNGLPPIQNTRNLE
6PRX, Knot 161 383 0.83 40 218 362
MGGSHHHHHHGMASGSHMASSSFKAADLQLEMTQKPHKKPGPGEPLVFGKTFTDHMLMVEWNDKGWGQPRIQPFQNLTLHPASSSLHYSLQLFEGMKAFKGKDQQVRLFRPWLNMDRMLRSAMRLCLPSFDKLELLECIRRLIEVDKDWVPDAAGTSLYVRPVLIGNEPSLGVSQPRRALLFVILCPVGAYFPGGSVTPVSLLADPAFIRAWVGGVGNYKLGGNYGPTVLVQQEALKRGCEQVLWLYGPDHQLTEVGTMNIFVYWTHEDGVLELVTPPLNGVILPGVVRQSLLDMAQTWGEFRVVERTITMKQLLRALEEGRVREVFGSGTACQVAPVHRILYKDRNLHIPTMENGPELILRFQKELKEIQYGIRAHEWMFPV
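The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The following Python sketch does so for the three rows; the function name `normalized_lz` is hypothetical, and rounding to two decimals is an assumption about how the page formats the value.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n) for a word of length n over b letters."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table.
rows = {"5SXR": (278, 728), "9JAK": (80, 171), "6PRX": (161, 383)}
for code, (lz, n) in rows.items():
    print(code, round(normalized_lz(lz, n), 2))
```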

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
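As an illustration of these definitions, here is a minimal Python sketch of the subword set and its cardinality. The function names `subwords` and `subword_complexity` are hypothetical, and a subword is taken to be a contiguous block of a fixed length.

```python
def subwords(w, k):
    """Return the set of distinct contiguous blocks of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Small example: the length-2 subwords of "abab" are "ab" and "ba".
print(sorted(subwords("abab", 2)), subword_complexity("abab", 2))
```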

\(|P_{f(5SXR_1)}(2) \setminus P_{f(9JAK_1)}(2)|=178\), \(|P_{f(9JAK_1)}(2) \setminus P_{f(5SXR_1)}(2)|=29\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for this pair, \(Z_2=178+29=207\).

Hydrophobic-polar version of Sequence 1:
10001001100111010000011100101011000001001110010010110010111100010111000001111011001111101100011000010101111010001111001100101001001111100001011011011110101110011100111111010010100101100011101011100000100010011111011110101011010101111100100011011100000111111100110001111100111010111101011110010001011011001101010001001000110011100101000111100111011011110110100000101100010101011000100010001001101110111010000111010011101110111100111110011101101101010111011010011001111100101000011101101011100010100100111110010110011011001100101101111110111001100110110111111010100000010011110111011000101000111011110010110101101011111101111011000011101000110001110110110010101101011010001010101010010111100001011101010101000110011111001101001011

Pair \(Z_2\) Length of longest common subword
5SXR_1,9JAK_1 207 4
5SXR_1,6PRX_1 151 4
9JAK_1,6PRX_1 202 3
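Both columns of this table can be sketched in a few lines of Python. The names `subwords`, `z` and `longest_common_subword` are hypothetical. The last column is assumed here to report the longest common contiguous block, since values of 3 and 4 are too small for a non-contiguous common subsequence of sequences this long.

```python
def subwords(w, k):
    """Return the set of distinct contiguous blocks of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Return Z_k(x, y), the size of the symmetric difference of the subword sets."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Return the length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common block that ends just before both positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("abab", "abba", 2), longest_common_subword("abab", "abba"))
```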

Newick tree

 
[
	9JAK_1:10.73,
	[
		5SXR_1:75.5,6PRX_1:75.5
	]:34.23
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{899}{\log_{20} 899}-\frac{171}{\log_{20}171}\right)\approx 201\), where \(899=728+171\) is the combined length of the two sequences.
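The arithmetic behind this estimate can be checked with a short Python sketch. The helper name `lz_scale` is hypothetical; the factors 0.85 and 0.8 are taken from the formula as given, without any claim about how they were chosen.

```python
import math

def lz_scale(n, alphabet_size=20):
    """Return n / log_b n, the normalizing scale used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

# 899 = 728 + 171 is the combined length of 5SXR_1 and 9JAK_1.
expected = 0.85 * 0.8 * (lz_scale(899) - lz_scale(171))
print(expected)
```

The result lies between 201 and 202, which agrees with the stated value if the page truncates rather than rounds.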
Status Protein1 Protein2 d d1/2
Query variables 5SXR_1 9JAK_1 252 154.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
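The interpolation can be sketched in Python under two assumptions that the page does not confirm: that the complexity is an LZ76-style exhaustive-history phrase count, and that min and max range over the two one-sided quantities \(c(xy)-c(x)\) and \(c(yx)-c(y)\) of the unnormalized Otu–Sayood distances. The function names `lz76`, `delta`, `d` and `d1` are hypothetical.

```python
def lz76(s):
    """Count phrases in an LZ76-style exhaustive-history parsing of s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it still has an earlier (possibly overlapping) copy.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def delta(x, y, alpha):
    """Return alpha * min + (1 - alpha) * max of the one-sided complexity gains."""
    a = lz76(x + y) - lz76(x)  # new phrases needed for y after seeing x
    b = lz76(y + x) - lz76(y)  # new phrases needed for x after seeing y
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def d(x, y):
    """Unnormalized d: the case alpha = 0 (the maximum)."""
    return delta(x, y, 0)

def d1(x, y):
    """Unnormalized d1: twice the case alpha = 1/2 (the sum)."""
    return 2 * delta(x, y, 0.5)

print(d("a", "abab"), d1("a", "abab"))
```

With the tabulated values \(d=252\) and \(d_1/2=154.5\), the same formula implies that the smaller one-sided quantity is \(2(154.5)-252=57\).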