CoV2D Browser™

Parikh vectors
6NBY_1 3ZPX_1 6KBZ_1 Letter Amino acid
16 20 19 E Glutamic acid
1 12 5 H Histidine
69 36 17 L Leucine
20 28 17 P Proline
12 21 5 Y Tyrosine
8 22 14 D Aspartic acid
28 25 13 I Isoleucine
21 23 12 F Phenylalanine
10 6 8 W Tryptophan
10 2 3 M Methionine
32 25 13 V Valine
39 58 25 A Alanine
9 10 15 R Arginine
11 25 5 N Asparagine
14 22 11 Q Glutamine
9 24 9 K Lysine
1 4 3 C Cysteine
31 38 17 G Glycine
21 36 10 S Serine
10 21 6 T Threonine
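
A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of how a column of the table above could be recomputed from a FASTA sequence; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard amino-acid letters (illustrative constant, not from the site).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count in sequence."""
    counts = Counter(sequence)
    # Letters that never occur get an explicit zero.
    return {letter: counts[letter] for letter in alphabet}


if __name__ == "__main__":
    # First ten residues of the 6NBY_1 sequence listed on this page.
    vector = parikh_vector("MESGIDLQGQ")
    print(vector["G"], vector["Q"], vector["W"])
```

Applying `parikh_vector` to each full sequence and reading off one letter at a time gives one row of the table.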

>6NBY_1|Chain A|NAD(P)H-quinone oxidoreductase subunit 1|Thermosynechococcus elongatus BP-1 (197221)
>3ZPX_1|Chains A, B|LIPASE|USTILAGO MAYDIS (5270)
>6KBZ_1|Chains A[auth B], C[auth D], E[auth F], G[auth H]|SOS response-associated protein|Escherichia coli (562)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6NBY , Knot 156 372 0.82 40 183 347
MESGIDLQGQFISALQSLGLSHDLAKLLWLPLPMLMMLIVATVGVLVAVWLERKISAAVQQRIGPEYIGPLGILAPLADGLKLIFKEDVLPANSDRWLFTLGPAVVVIPVFLSYIIVPFGQNLLISNLAMGVFLWIALSSIAPIGLLMAGYASNNKYSLLGGLRAAAQSISYEIPLALAVLAVAMMSNGLGTVEIVEQQSQYGILSWNVWRQPIGFLVFWIAALAECERLPFDLPEAEEELVAGYQTEYAGMKFALFYLGAYVNLVLSALLVSVLYFGGWSFPIPLETIANLLGVSETNPFLQIAFAVLGITMTLIKAYFFVFLAILLRWTVPRVRIDQLLDLGWKFLLPVGLVNLLLTAGLKLAFPVAFGG
3ZPX , Knot 189 458 0.84 40 223 434
AAFADPNDDLFYTTPDNINTYANGQVIQSRKADTDIGNSNKVEAFQLQYRTTNTQKEAQANVATVWIPNKPASPPKIFSYQVYQDSTQLNCAPSYSFLKGLDKPNKATTILEAPIIIGWALQQGFYVVSSDHEGPRSSFIAGYEEGMAILDGIRALKNYAKLPTDSAIGFYGYSGGAHATGWAANLAGSYAPEHNIIGAAYGGLPASARDTFNFLNKGAFAGFAIAGVSGLALAYPDVETYIQSRLNAKGEKVFKQVRSRGFCIGQVVLTYPFVDAYSLINDTNLLNEEPVASTLKSETLVQAEASYTVPVPKFPRFIWHALLDEIVPFHSAATYVKEQCSKGADINWNVYSFAEHISAELFGLLPGLDWLNKAYKGQAPKVPCGGGAQSVMGASGPPAQDVLGADLASQLRSLQGKPSAFGNKPFGSISPAAASFLEQKLISEEDLNSAVDHHHHHH
6KBZ , Knot 105 227 0.83 40 157 221
CGRFAQSQTREDYLALLAEDIERDIPYDPEPIGRYNVAPGTKVLLLSERDEHLHLDPVFWGYAPGWWDKPPLINARVETAATSRMFKPLWQHGRAICFADGWFEWKKEGDKKQPFFIYRADGQPIFMAAIGSTPFERGDEAEGFLIVTAAADQGLVDIHDRRPLVLSPEAAREWMRQEISGKEASEIAASGCVPANQFSWHPVSRAVGNVKNQGAELIQPVLEVLFQ
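
The LZ-complexity and normalized-ratio columns can be illustrated with a short sketch. It assumes the usual left-to-right LZ76 parsing (each phrase is the shortest block that cannot be copied from earlier text); whether the site uses exactly this variant is an assumption, and the function names are illustrative.

```python
import math


def lz76_complexity(w):
    """Number of phrases in a left-to-right LZ76-style parsing of w."""
    n = len(w)
    start = 0
    phrases = 0
    while start < n:
        length = 1
        # Extend the phrase while it can still be copied from a source
        # that begins before `start` (the source may overlap the phrase).
        while (start + length <= n
               and w[start:start + length] in w[:start + length - 1]):
            length += 1
        phrases += 1
        start += length
    return phrases


def normalized_lz(lz, n, alphabet_size=20):
    """Ratio LZ(w) / (n / log_sigma(n)) from the table header."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))
    print(normalized_lz(156, 372))
```

The normalization divides by \(n/\log_{20} n\), the growth rate of LZ-complexity for typical words over a 20-letter alphabet, so values near 1 indicate a sequence that is close to incompressible in this sense.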

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
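
These definitions translate directly into set operations. A minimal sketch, with the subword length passed as `k` and with illustrative function names:

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    # If k exceeds len(w) the range is empty and so is the set.
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))


if __name__ == "__main__":
    print(sorted(subwords("ABAB", 2)))
    print(subword_complexity("ABAB", 1))
```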

\(|P_{f(6NBY_1)}(2) \setminus P_{f(3ZPX_1)}(2)|=55\), \(|P_{f(3ZPX_1)}(2) \setminus P_{f(6NBY_1)}(2)|=95\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for this pair, \(Z_2=55+95=150\).

Hydrophobic-polar version of Sequence 1:
100110101011011001110001101111111111111110111111111000101110001110011111111111011011100011110000111011111111111100111111001110011111111111001111111111010000001111101110010001111111111111001110101100000011101011001111111111111000011101101000111100000111011110111010111011110110111101111100110111100001110111111110101101011111111101011010100110111011111111011101110111111111
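
A hydrophobic-polar encoding replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not state which partition it uses; the sketch below assumes the textbook nonpolar class A, F, G, I, L, M, P, V, W as hydrophobic, which is an assumption inferred from the start of the string above. The names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Assumed hydrophobic class; the site's actual partition is not documented here.
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(sequence):
    """Map each residue to '1' if assumed hydrophobic, otherwise '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


if __name__ == "__main__":
    # First ten residues of Sequence 1 (6NBY_1).
    print(hp_encode("MESGIDLQGQ"))
```

With this assumed partition, the first ten residues MESGIDLQGQ encode to 1001101010, which agrees with the first ten symbols of the string above.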
Pair \(Z_2\) Length of longest common subword (contiguous)
6NBY_1,3ZPX_1 150 4
6NBY_1,6KBZ_1 178 4
3ZPX_1,6KBZ_1 190 3
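
Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two subword sets. The last column is read here as the length of the longest common contiguous subword, an assumption based on how small the values are. Function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Z_k(x, y): number of length-k subwords occurring in exactly one of x, y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y (row-by-row DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z_distance("ABAB", "BABC", 2))
    print(longest_common_subword_length("ABCDE", "XBCDY"))
```

Because \(Z_k\) is symmetric in its arguments, each unordered pair needs to be computed only once.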

Newick tree

 
(6KBZ_1:97.06,(6NBY_1:75,3ZPX_1:75):22.06);

Let \(d\) denote the distance that Otu and Sayood call \(d\).
Let \(d_1\) denote the distance that Otu and Sayood call \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{830}{\log_{20} 830}-\frac{372}{\log_{20}372}\right)\approx 123.\)
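
The arithmetic in this estimate can be checked with a short sketch. The two leading factors 0.85 and 0.8 are taken verbatim from the formula, and 830 is read as 372 + 458, the combined length of the two sequences; that reading and the function names are assumptions.

```python
import math


def lz_scale(n, alphabet_size=20):
    """n / log_sigma(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)


def rough_expected_distance(n_x, n_y, c1=0.85, c2=0.8):
    """Evaluate c1 * c2 * (scale(n_x + n_y) - scale(n_x))."""
    return c1 * c2 * (lz_scale(n_x + n_y) - lz_scale(n_x))


if __name__ == "__main__":
    # Lengths of 6NBY_1 and 3ZPX_1 from the table on this page.
    print(rough_expected_distance(372, 458))
```

The result lies between 123 and 124, consistent with the value quoted above.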
Status Protein1 Protein2 d d1/2
Query variables 6NBY_1 3ZPX_1 154 138
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
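
A minimal sketch of this interpolation follows. It assumes that min and max range over the two one-sided complexity increases \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), and that LZ-complexity is computed by a left-to-right LZ76 parsing. Both are assumptions about the site's implementation, and the function names are illustrative.

```python
def lz76_complexity(w):
    """Number of phrases in a left-to-right LZ76-style parsing of w."""
    n, start, phrases = len(w), 0, 0
    while start < n:
        length = 1
        # Grow the phrase while it can be copied from a source starting earlier.
        while (start + length <= n
               and w[start:start + length] in w[:start + length - 1]):
            length += 1
        phrases += 1
        start += length
    return phrases


def one_sided_increases(x, y):
    """Return (LZ(xy) - LZ(x), LZ(yx) - LZ(y))."""
    return (lz76_complexity(x + y) - lz76_complexity(x),
            lz76_complexity(y + x) - lz76_complexity(y))


def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two one-sided increases."""
    a, b = one_sided_increases(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    x, y = "MESGIDLQGQ", "AAFADPNDDL"
    print(delta(x, y, 0), delta(x, y, 0.5))
```

Setting `alpha=0` returns the larger increase, matching \(d\), while `alpha=0.5` returns the average of the two, matching \(d_1/2\).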