CoV2D Browser™

Parikh vectors
3NFH_1 7EEP_1 2PBL_1 Letter Amino acid
16 29 22 V Valine
17 66 37 A Alanine
15 42 19 D Aspartic acid
7 40 18 G Glycine
9 19 7 Y Tyrosine
22 24 8 K Lysine
4 19 7 M Methionine
19 32 16 S Serine
0 11 7 W Tryptophan
11 47 13 R Arginine
1 4 1 C Cysteine
14 54 19 E Glutamic acid
29 42 22 L Leucine
11 23 7 N Asparagine
2 4 8 H Histidine
8 24 10 F Phenylalanine
14 16 6 T Threonine
10 38 6 Q Glutamine
23 32 13 I Isoleucine
14 30 16 P Proline
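The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch, assuming the letter order of the table above; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the CoV2D code:

```python
from collections import Counter

# Letter order as listed in the table above (V, A, D, ..., P).
AMINO_ACIDS = "VADGYKMSWRCELNHFTQIP"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the letter counts of `sequence`, one entry per letter of `alphabet`."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur.
    return [counts[letter] for letter in alphabet]

# First 20 residues of the 3NFH_1 sequence shown further down the page.
prefix = "MTDSAIDIVDSVRTASKDLP"
for letter, count in zip(AMINO_ACIDS, parikh_vector(prefix)):
    if count:
        print(letter, count)
```

The entries of a Parikh vector add up to the sequence length; for 3NFH_1 the first column above sums to 246.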

>3NFH_1|Chains A, B|DNA-directed RNA polymerase I subunit RPA49|Saccharomyces cerevisiae (4932)
>7EEP_1|Chains A, B, C, D, E, F, G, H, I, J, K, L|Pam1 portal proteins|unidentified (32644)
>2PBL_1|Chains A, B, C, D|Putative esterase/lipase/thioesterase|Silicibacter sp. (292414)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3NFH , Knot 112 246 0.83 38 160 237
MTDSAIDIVDSVRTASKDLPTRAQLDEITSNDRPTPLANIDATDVEQIYPIESIIPKKELQFIRVSSILKEADKEKKLELFPYQNNSKYVAKKLDSLTQPSQMTKLQLLYYLSLLLGVYENRRVNNKTKLLERLNSPPEILVDGILSRFTVIKPGQFGRSKDRSYFIDPQNEDKILCYILAIIMHLDNFIVEITPLAHELNLKPSKVVSLFRVLGAIVKGATVAQAEAFGIPKSTAASYKIATMKV
7EEP , Knot 238 596 0.85 40 254 553
MEDTMTMPSHAQLKAYFEEARDANEEYRKEAFIDRDYFDGHQWTEEELQKLEARKQPATYFNEVKLSIRGLVGVFEQGDSDPRAWPRNPQDEDSADIATKALRYVKDYSEWSDERSRAALNYFVEGTCAAIVGVDENGRPEIEPIRFEEFFHDPRSRELDFSDARFKGVAKWRFADEVGMEYGIKGEIDGALDGDSEGLSIGGDTFGDRPDGKISSWIDSKLRRVFVVEMYVRWNGVWIRALFWGRGILEMSVSAYLDRNGKPTCPIEARSCYIDRENRRYGEVRDLRSPQDAINKRESKLLHMLNNRQAIATNPEYAYNSDAEMVRKEMSKPDGIIPPGWQPASMTDLANGQFALLSSAREFIQRIGQNPSVLAAQSASASGRAQLARQQAGMVDSAMALNGLRRFELAVYRQAWLRCRQFWKAPDYIRVTDDEGAPQFVGINQPIKGPPQPVLNEMGQVVIAEPILGYENALAELDVDINIDAVPDTANLAQEQFLQLTELARLYGPQEVPFDDLLELSSMPEKTKLIAKRRERSEQMAQVQAQQGQMQEQIAMQGAMAEIENTQADTAYLAARAQNEMLKPQIEAFKAGFGAA
2PBL , Knot 116 262 0.82 40 167 252
GMELDDAYANGAYIEGAADYPPRWAASAEDFRNSLQDRARLNLSYGEGDRHKFDLFLPEGTPVGLFVFVHGGYWMAFDKSSWSHLAVGALSKGWAVAMPSYELCPEVRISEITQQISQAVTAAAKEIDGPIVLAGHSAGGHLVARMLDPEVLPEAVGARIRNVVPISPLSDLRPLLRTSMNEKFKMDADAAIAESPVEMQNRYDAKVTVWVGGAERPAFLDQAIWLVEAWDADHVIAFEKHHFNVIEPLADPESDLVAVITA
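The LZ-complexity and normalized ratio columns can be illustrated as follows. This is a sketch under the assumption that \(\mathrm{LZ}(w)\) counts the components of the exhaustive-history (LZ76-style) parse; the page does not state which variant it uses, so the function below may not reproduce the tabulated values exactly. The names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of components in an exhaustive-history (LZ76-style) parse of w.

    Assumption: each component is the shortest prefix of the remaining text
    that has not occurred earlier (overlap with the component allowed).
    """
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while the candidate already occurs starting before position i.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the fourth column of the table."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example, parsed as 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))  # 6

# Ratio for 3NFH using the tabulated LZ(w) = 112 and n = 246.
print(normalized_lz(112, 246))
```

With the tabulated values the ratio for 3NFH comes out slightly below 0.84, which agrees with the 0.83 shown if the page truncates rather than rounds.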

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
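The set of distinct subwords of a fixed length, and its cardinality, can be computed directly with a sliding window. A minimal sketch; the function names are illustrative, and the output has not been checked against the \(p_w\) columns of the table:

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Small example: 'abab' has the length-2 subwords 'ab' and 'ba'.
print(sorted(subwords("abab", 2)))
print(subword_complexity("abab", 2))
```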

\(|P_{f(3NFH_1)}(2) \setminus P_{f(7EEP_1)}(2)|=39\), \(|P_{f(7EEP_1)}(2) \setminus P_{f(3NFH_1)}(2)|=133\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100011011001001000110010100100000101110101001001011001110001011010011001000001011100000001100100100100100101100101111100000100000110010011011101110010110110110000000110100000110011111101001110101110010101001101101111110110110101111100011000110101
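The hydrophobic-polar string can be reproduced by mapping each residue to 1 or 0. The sketch below assumes the hydrophobic class is A, F, G, I, L, M, P, V, W. That set is inferred by comparing the binary string with the 3NFH_1 sequence; W does not occur in 3NFH_1, so placing it in the hydrophobic class is an assumption. `HYDROPHOBIC` and `hp_encode` are illustrative names.

```python
# Assumed hydrophobic class, inferred from the 3NFH_1 string above;
# W is absent from 3NFH_1, so its classification here is a guess.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)

# First 20 residues of 3NFH_1.
print(hp_encode("MTDSAIDIVDSVRTASKDLP"))  # 10001101100100100011
```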
Pair            \(Z_2\)   Length of longest common subword
3NFH_1,7EEP_1   172       4
3NFH_1,2PBL_1   165       3
7EEP_1,2PBL_1   169       4
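Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two subword sets. For the last column, values of 3 or 4 are what one would expect for the longest common contiguous subword of two proteins, so the sketch computes that quantity; whether the page computes exactly this is an assumption. All function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1], y[:j]
    for a in x:
        curr = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

print(z_distance("abab", "abba", 2))                    # 1 (only 'bb' differs)
print(longest_common_subword_length("abcde", "xbcdy"))  # 3 ('bcd')
```

For the pair 3NFH_1, 7EEP_1 the two set differences reported above are 39 and 133, and \(39+133=172\) is the tabulated \(Z_2\).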

Newick tree

 
(7EEP_1:86.15,(3NFH_1:82.5,2PBL_1:82.5):3.65);

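Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{842 }{\log_{20} 842}-\frac{246}{\log_{20}246})\approx 163.6.\)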
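The arithmetic in the expected-distance estimate can be checked numerically. A minimal sketch: the factors 0.85 and 0.8 are taken from the page as given, 842 is \(246+596\) (the length of the concatenation of 3NFH_1 and 7EEP_1), and `lz_scale` is an illustrative helper name.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_20(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

# Growth in the scale when a word of length 246 is extended to length 842,
# multiplied by the two factors quoted on the page.
expected = 0.85 * 0.8 * (lz_scale(842) - lz_scale(246))
print(expected)
```

The result is a little above 163.6, compared with the observed \(d=208\) and \(d_1/2=143.5\) in the table that follows.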
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 3NFH_1 7EEP_1 208 143.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
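The interpolation can be sketched as follows, assuming that min and max range over the two complexity increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), and that \(\mathrm{LZ}\) is an exhaustive-history (LZ76-style) component count. Both are assumptions about the page's implementation, and all function names are illustrative.

```python
def lz76_complexity(w):
    """Component count of an exhaustive-history (LZ76-style) parse (assumed variant)."""
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def complexity_increments(x, y, c=lz76_complexity):
    """The two increments c(xy) - c(x) and c(yx) - c(y)."""
    return c(x + y) - c(x), c(y + x) - c(y)

def interpolate(a, b, alpha):
    """delta = alpha * min + (1 - alpha) * max for two increments a and b."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def delta(x, y, alpha):
    """alpha = 0 gives d (the max); alpha = 1/2 gives d_1 / 2 (the mean)."""
    a, b = complexity_increments(x, y)
    return interpolate(a, b, alpha)

# Table values for 3NFH_1 and 7EEP_1: d = 208 and d_1/2 = 143.5 imply
# increments of 208 and 2 * 143.5 - 208 = 79.
print(interpolate(79, 208, 0))    # 208
print(interpolate(79, 208, 0.5))  # 143.5
```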