CoV2D Browser™

Parikh vectors
8IMO_1 7VYE_1 2QQF_1 Letter Amino acid
61 3 24 V Valine
1 0 7 C Cysteine
28 3 23 K Lysine
15 1 6 M Methionine
90 2 14 T Threonine
63 1 9 Y Tyrosine
96 4 19 A Alanine
95 1 11 R Arginine
53 3 18 D Aspartic acid
83 2 18 G Glycine
58 7 20 P Proline
82 0 21 S Serine
6 4 3 W Tryptophan
39 1 8 N Asparagine
84 3 11 Q Glutamine
16 1 15 H Histidine
110 0 27 L Leucine
60 1 19 F Phenylalanine
101 3 20 E Glutamic acid
41 0 15 I Isoleucine
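
The Parikh vector of a sequence records how many times each letter occurs. A minimal Python sketch, assuming the sequences are plain one-letter amino-acid strings; the names `parikh_vector` and `AMINO_ACIDS` are illustrative and not taken from the CoV2D site:

```python
from collections import Counter

# Letter order copied from the table above (V, C, K, ..., I).
AMINO_ACIDS = "VCKMTYARDGPSWNQHLFEI"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return the list of letter counts of seq, in the order given by alphabet."""
    counts = Counter(seq)
    return [counts[a] for a in alphabet]

# 7VYE_1 sequence as listed further down the page.
SEQ_7VYE = "ARQWQPDVEWAEQFGGAVMYPTKETAHWKPPPWNDVDPPK"

print(parikh_vector(SEQ_7VYE))
# [3, 0, 3, 1, 2, 1, 4, 1, 3, 2, 7, 0, 4, 1, 3, 1, 0, 1, 3, 0]
```

The printed list reproduces the 7VYE_1 column of the table, and its entries sum to the sequence length 40.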

>8IMO_1|Chain A[auth 5]|CpcN|Anthocerotibacter panamensis (2857077)
>7VYE_1|Chain A[auth Q]|NADH dehydrogenase [ubiquinone] iron-sulfur protein 2|Sus scrofa (9823)
>2QQF_1|Chain A|NAD-dependent deacetylase HST2|Saccharomyces cerevisiae (4932)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8IMO , Knot 289 1182 0.57 40 248 642
MSDGLNIFLEMARDLTQTQNKSPYAVSRLGVTPLQPTATPQVAGMDPFLDMARRMAGRPVPKGNPFLDMARELTDNRALTLVKEFTAPSPYQQTETYGQERIRALGTIEAPRVTLRAPFTDEQFQGALYAIYRHIFGNTYVMESERPTTAESQLKDGRITVRGFIKLLAKSEVYRSRFFQKTSQNRFIELNHKLLLGRAPYDQAEISAHLDLWNTQGYDAEIDSYVESEEYLENFGEDVIPYFRGFKYQTGQSAQGFNRLLDLYGGWAGSDTDRNQSGQVARLTNSLVRPGQVVEPPVAPPLEFTREAERAAWLAGALTLPSSLGHTETHGQERIRAVGALEAAQVTLRAPFTEEQFQGALYAIYKQVFGNTYVMESERPTTAESQLKDGRITVRGFIRLLAKTEAYKSRFLYTTSQNRFIELNHKLLLGRAPYDQAEIIRHLDLWNSQGYDAEIDSYIESEEYQEFFGEEVVPFFRGFKYQVGQNPLGFNGLVRLYDGYAGSDTERNQSGQVARLTDRLSRPVREQSSVDRIERLLRSYTSPSPLEQTNTYGQERVQANAVLETPQVTLRAPFTEEQFQGALYAIYKQVFGNTYVMESERPATAESQLRDGRITVRGFIRLLAKSDTYKARFFNPATQTRFIELNHKLLLGRAPYDQAEISRHVALYTSQGYEAEIDSYLDSEEYQECFGEDTVPFFRGFTSQPGQSTEAFNRMVTLYDGYATSDSEWDRGGQSARLTDSLARSTMDQDPEYRIGNLISSYTRPSPYGQPQGYGQERIQATAVLERPRATLRVGTTENLEGVIYAIYQQVLGNTHVMTSERLLFAESQLRDGKLTVRGFIRQLAKSEAYKTRFFYPSSQTRFIELNHKLLLGRAPYDQAEISHHVTLYTSQGYDLEIDSYLDSEEYQENFGEDTVPFLRGFTSQPGQVTEAFNRMVNLNDGYATSDSGWSQSAEVARLTESLSRPVREAGASVRVERLLNALTQPSSLGQSPTFGQEQIQATAVLEESPVTLRAPFTEEQLQGALYAIYKQVLGKTHVMESERPTFAESQLRDGKLTVRGFVRQVAQSEAYKARFFNPAAQTRFIELNHKLLLGRAPYDQAEISRHVALYTSQGYEAEIDSYLDSEEYQENFGEDTVPFLRGFTSQPGQSTEAFNRMVTLYDGYAASDGQTPRPTDSLNEP
7VYE , Knot 25 40 0.76 32 36 38
ARQWQPDVEWAEQFGGAVMYPTKETAHWKPPPWNDVDPPK
2QQF , Knot 135 308 0.83 40 197 297
MRGSHHHHHHGMASMSVSTASTEMSVRKIAAHMKSNPNAKVIFMVGAGISTSCGIPDFRSPGTGLYHNLARLKLPYPEAVFDVDFFQSDPLPFYTLAKELYPGNFRPSKFHYLLKLFQDKDVLKRVYTQNIDTLERQAGVKDDLIIEAHGSFAHCHCIGCGKVYPPQVFKSKLAEHPIKDFVKCDVCGELVKPAIVFFGEDLPDSFSETWLNDSEWLREKITTSGKHPQQPLVIVVGTSLAVYPFASLPEEIPRKVKRVLCNLETVGDFKANKRPTDLIVHQYSDEFAEQLVEELGWQEDFEKILTAQ
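
The LZ-complexity column and the normalized ratio can be sketched in Python as follows. This assumes that \(\mathrm{LZ}(w)\) is the number of phrases in the Lempel--Ziv (1976) exhaustive parsing, where each phrase is the shortest block not already occurring earlier in the word (overlaps allowed); the site's actual implementation is not shown, and the function names are illustrative:

```python
import math

def lz76_complexity(w):
    """Number of phrases in an LZ76-style parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while w[i:i+length] already occurs starting before i.
        # The search window w[0:i+length-1] permits overlapping copies.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the table header."""
    n = len(w)
    return lz76_complexity(w) / (n / math.log(n, alphabet_size))

SEQ_7VYE = "ARQWQPDVEWAEQFGGAVMYPTKETAHWKPPPWNDVDPPK"
print(lz76_complexity(SEQ_7VYE))  # 25
print(normalized_lz(SEQ_7VYE))
```

Under this assumption the 7VYE_1 sequence parses into 25 phrases, matching the table. The ratio comes out near 0.77, so the listed 0.76 looks truncated rather than rounded.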

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
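
As an illustration of these definitions, the following Python sketch collects the distinct length-\(k\) subwords of a word and counts them. The helper names `subwords` and `subword_complexity` are hypothetical:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

SEQ_7VYE = "ARQWQPDVEWAEQFGGAVMYPTKETAHWKPPPWNDVDPPK"
print(subword_complexity(SEQ_7VYE, 2))  # 36
print(subword_complexity(SEQ_7VYE, 3))  # 38
```

For 7VYE_1 the 39 length-2 windows contain the repeats DV (twice) and PP (three times), which leaves 36 distinct subwords; all 38 length-3 windows are distinct.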

\(|P_{f(8IMO_1)}(2) \setminus P_{f(7VYE_1)}(2)|=221\), \(|P_{f(7VYE_1)}(2) \setminus P_{f(8IMO_1)}(2)|=9\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100110111011001000000010110011101101010101111011101100111011101011101100100001101100101101000000010001011101011010101110000101110110001110001100001001000100101010111011100010000110000000110100011110110001010101011000100101000100000100110011101011000010010110011010111110000000010110100011011011011111110100010011111111011001100000100010111110110101011100001011101100011100011000010010001001010101110111000100001100000001101000111101100010110010110001001010001000000011100111110110001100111101110100101100000000101101000100110000010010011000001011000000100010101110010101011100001011101100011100011000011010001001010101110111000000101101100001101000111101100010100011100001001010001000000001100011110110001100001100110100101000001001100101000110001000100011011000001010101010100010101110010101011000010111011000111000110000111100010010101011100110001000011010000011010001111011000101000101000010010100010000000011000111101100011010011001101001010000110001011010001001100111010100110110010011001011000101011100011010111000010111011000111000110000101100010010101011100110001001011011100011010001111011000101000111000010010100010000000011000111101100011000011001101001011001001010001001
Pair \(Z_2\) Length of longest common subword
8IMO_1,7VYE_1 230 3
8IMO_1,2QQF_1 157 4
7VYE_1,2QQF_1 185 3
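
The two table columns can be sketched in Python as below. \(Z_k\) follows the definition given above. For the last column, the sketch assumes the lengths refer to contiguous common subwords, since values of 3 and 4 would be implausibly small for a non-contiguous common subsequence of sequences this long. The function names are illustrative:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y (row-by-row DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy example: P_x(2) = {AB, BC, CA}, P_y(2) = {BC, CA}.
print(z_k("ABCAB", "BCA", 2))                       # 1
print(longest_common_subword_length("ABCAB", "BCA"))  # 3
```

For the first pair in the table, the two set differences quoted earlier (221 and 9) add up to the listed \(Z_2 = 230\).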

Newick tree

 
[
	7VYE_1:11.65,
	[
		8IMO_1:78.5,2QQF_1:78.5
	]:33.15
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1222 }{\log_{20} 1222}-\frac{40}{\log_{20}40})\approx 328.\)
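
The arithmetic in this estimate can be checked with a few lines of Python. The factors 0.85 and 0.8 are taken as given from the text, and 1222 equals 1182 + 40, the two sequence lengths added together; the helper name `lz_scale` is hypothetical:

```python
import math

def lz_scale(n, alphabet_size=20):
    """The quantity n / log_20(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

expected = 0.85 * 0.8 * (lz_scale(1222) - lz_scale(40))
print(round(expected))  # 328
```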
Status Protein1 Protein2 d d1/2
Query variables 8IMO_1 7VYE_1 277 145.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d, &\alpha=0,\\ d_1/2, &\alpha=1/2 \end{cases} \]
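
A hedged Python sketch of this interpolation follows. It assumes the unnormalized Otu--Sayood definitions \(d=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1=(c(xy)-c(x))+(c(yx)-c(y))\), with \(c\) an LZ76-style phrase count, and it reads min and max as the smaller and larger of those two differences. None of the function names come from the site:

```python
def lz76_complexity(w):
    """Number of phrases in an LZ76-style parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it already occurs starting before position i.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases

def complexity_gains(x, y):
    """The two differences c(xy) - c(x) and c(yx) - c(y)."""
    c = lz76_complexity
    return c(x + y) - c(x), c(y + x) - c(y)

def otu_sayood(x, y):
    """Return (d, d1) under the assumed unnormalized definitions."""
    a, b = complexity_gains(x, y)
    return max(a, b), a + b

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two complexity gains."""
    a, b = complexity_gains(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "ARQWQPDVEW", "AAAAAA"
d, d1 = otu_sayood(x, y)
print(delta(x, y, 0) == d, delta(x, y, 0.5) == d1 / 2)  # True True
```

With these definitions, the table values \(d=277\) and \(d_1/2=145.5\) for 8IMO_1 and 7VYE_1 would correspond to complexity gains of 277 and 14.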