CoV2D Browser™

Parikh vectors
2JGP_1 3WQK_1 8QKO_1 Letter Amino acid
5 3 6 W Tryptophan
2 4 9 C Cysteine
27 16 22 I Isoleucine
15 11 28 K Lysine
23 13 19 P Proline
33 22 37 S Serine
17 16 16 Y Tyrosine
13 9 8 H Histidine
53 26 34 L Leucine
32 9 18 Q Glutamine
48 19 18 E Glutamic acid
25 19 23 G Glycine
14 6 7 M Methionine
39 20 26 V Valine
45 18 26 A Alanine
27 21 17 R Arginine
30 21 21 F Phenylalanine
34 14 13 T Threonine
14 8 16 N Asparagine
24 21 18 D Aspartic acid
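The Parikh vector of a sequence counts how often each letter occurs. A minimal Python sketch follows; `parikh_vector` and `AMINO_ACIDS` are illustrative names (not taken from the CoV2D code), and the letter order is assumed to follow the rows of the table above.

```python
from collections import Counter

# Letter order copied from the rows of the table above (hypothetical constant).
AMINO_ACIDS = "WCIKPSYHLQEGMVARFTND"

def parikh_vector(sequence: str) -> list[int]:
    """Return the count of each amino-acid letter, in table order."""
    counts = Counter(sequence)
    return [counts[letter] for letter in AMINO_ACIDS]

# First ten residues of 2JGP_1.
prefix = "RSEYVAPRSV"
print(dict(zip(AMINO_ACIDS, parikh_vector(prefix))))
```

Summing a Parikh vector recovers the sequence length; for example, the 2JGP_1 column above sums to 520.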

>2JGP_1|Chain A|TYROCIDINE SYNTHETASE 3|BREVIBACILLUS BREVIS (1393)
>3WQK_1|Chain A|Diterpene synthase|Mycobacterium tuberculosis (1773)
>8QKO_1|Chains A, B, C, D, E, F, G, H, I, J, K, L|Gap junction alpha-1 protein|Homo sapiens (9606)
Columns: protein code \(c\); LZ-complexity \(\mathrm{LZ}(w)\); length \(n=|w|\); normalized complexity \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\); \(p_w(1)\); \(p_w(2)\); \(p_w(3)\). The sequence \(w=f(c)\) follows on the next line.
2JGP 208 520 0.83 40 245 480
RSEYVAPRSVWEARLAQVWEQVLNVPQVGALDDFFALGGHSLRAMRVLSSMHNEYQVDIPLRILFEKPTIQELAAFIEETAKGNVFSIEPVQKQAYYPVSSAQKRMYILDQFEGVGISYNMPSTMLIEGKLERTRVEAAFQRLIARHESLRTSFAVVNGEPVQNIHEDVPFALAYSEVTEEEARELVSSLVQPFDLEVAPLIRVSLLKIGEDRYVLFTDMHHSISDGVSSGILLAEWVQLYQGDVLPELRIQYKDFAVWQQEFSQSAAFHKQEAYWLQTFADDIPVLNLPTDFTRPSTQSFAGDQCTIGAGKALTEGLHQLAQATGTTLYMVLLAAYNVLLAKYAGQEDIIVGTPITGRSHADLEPIVGMFVNTLAMRNKPQREKTFSEFLQEVKQNALDAYGHQDYPFEELVEKLAIARDLSRNPLFDTVFTFQNSTEEVMTLPECTLAPFMTDETGQHAKFDLTFSATEEREEMTIGVEYSTSLFTRETMERFSRHFLTIAASIVQNPHIRLGEIDML
3WQK 130 296 0.83 40 198 286
MNLVSEKEFLDLPLVSVAEIVRCRGPKVSVFPFDGTRRWFHLECNPQYDDYQQAALRQSIRILKMLFEHGIETVISPIFSDDLLDRGDRYIVQALEGMALLANDEEILSFYKEHEVHVLFYGDYKKRLPSTAQGAAVVKSFDDLTISTSSNTEHRLCFGVFGNDAAESVAQFSISWNETHGKPPTRREIIEGYYGEYVDKADMFIGFGRFSTFDFPLLSSGKTSLYFTVAPSYYMTETTLRRILYDHIYLRHFRPKPDYSAMSADQLNVLRNRYRAQPDRVFGVGCVHDGIWFAEG
8QKO 164 382 0.85 40 232 366
MGDWSALGKLLDKVQAYSTAGGKVWLSVLFIFRILLLGTAVESAWGDEQSAFRCNTQQPGCENVCYDKSFPISHVRFWVLQIIFVSVPTLLYLAHVFYVMRKEEKLNKKEEELKVAQTDGVNVDMHLKQIEIKKFKYGIEEHGKVKMRGGLLRTYIISILFKSIFEVAFLLIQWYIYGFSLSAVYTCKRDPCPHQVDCFLSRPTEKTIFIIFMLVVSLVSLALNIIELFYVFFKGVKDRVKGKSDPYHATSGALSPAKDCGSQKYAYFNGCSSPTAPLSPMSPPGYKLVTGDRNNSSCRNYNKQASEQNWANYSAEQNRMGQAGSTISNSHAQPFDFPDDNQNSKKLAAGHELQPLAIVDQRPSSRASSRASSRPRPDDLEI
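The normalized complexity column can be recomputed from the LZ-complexity and length columns. The sketch below is only an illustration: `normalized_lz` is a hypothetical helper, and the choice to truncate (rather than round) to two decimals is an assumption made to match the displayed values.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"2JGP": (208, 520), "3WQK": (130, 296), "8QKO": (164, 382)}
for code, (lz, n) in rows.items():
    ratio = normalized_lz(lz, n)
    # Truncate rather than round, which is assumed to match the table.
    print(code, math.floor(ratio * 100) / 100)
```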

Let \(P_w(k)\) be the set of distinct subwords (contiguous intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
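As a small illustration of these definitions, the set of length-\(k\) subwords and its cardinality can be computed directly. The function names below are hypothetical and chosen only for this sketch.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

w = "ABABC"
print(sorted(subwords(w, 2)))     # ['AB', 'BA', 'BC']
print(subword_complexity(w, 2))   # 3
```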

\(|P_{f(2JGP_1)}(2) \setminus P_{f(3WQK_1)}(2)|=103\), \(|P_{f(3WQK_1)}(2) \setminus P_{f(2JGP_1)}(2)|=56\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
0000111001101011011001101101111001111110010110110010000010111011100101001111100010101101011000100110010001011001011110001100111010100001011100111000010001111010110010001111110001000010011001101101011111010110110000111001000100110011111011010010111010100001111000100011100001011001100111101100100100001110000111101100110011010100101111110011110011000111101101000101011111110011100010000010011001000110101000011001100111100100011100110100000011011000111110000100101010101000000101110000011000010010001101110110010101101011
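The hydrophobic-polar string replaces each residue by a single bit. The page does not state which residues count as hydrophobic, so the set below is an assumption inferred by aligning the 2JGP_1 sequence with the binary string above; other hydrophobic-polar conventions classify some residues differently. `hp_encode` is a hypothetical name.

```python
# Assumed hydrophobic set, inferred by aligning the 2JGP_1 sequence with its
# binary string; this is not a classification stated by the source page.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence)

# First twenty residues of 2JGP_1.
print(hp_encode("RSEYVAPRSVWEARLAQVWE"))  # 00001110011010110110
```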
Pair \(Z_2\) Length of longest common subword (contiguous)
2JGP_1,3WQK_1 159 3
2JGP_1,8QKO_1 167 4
3WQK_1,8QKO_1 172 4
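Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two subword sets. The last column is read here as the longest common contiguous subword, which is an assumption based on the small values 3 and 4 (a non-contiguous common subsequence of sequences this long would be far longer). All function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common block ending at x[i-2], y[j-2].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("ABAB", "ABCB", 2))              # 3
print(longest_common_subword("ABCDE", "XBCDY"))   # 3
```

For the first row, the two set differences reported above add up as expected: 103 + 56 = 159.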

Newick tree

 
(8QKO_1:86.44,(2JGP_1:79.5,3WQK_1:79.5):6.94);

Let \(d\) be the distance that Otu and Sayood call \(d\).
Let \(d_1\) be the distance that Otu and Sayood call \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
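A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{816}{\log_{20} 816}-\frac{296}{\log_{20} 296}\right)\approx 141\), where \(816 = 520 + 296\) is the combined length of the two query sequences.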
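The arithmetic in this estimate can be checked with a short script. The factors 0.85 and 0.8 are taken as given from the line above (their origin is not stated on the page), and `normalizer` is a hypothetical helper name.

```python
import math

def normalizer(n: int, alphabet_size: int = 20) -> float:
    """n / log_b(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

n_concat, n_single = 816, 296   # 816 = 520 + 296
estimate = 0.85 * 0.8 * (normalizer(n_concat) - normalizer(n_single))
print(round(estimate, 2))
```

The result is a little under 142, which agrees with the quoted 141 if the page truncates to an integer.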
Status Protein1 Protein2 d d1/2
Query variables 2JGP_1 3WQK_1 177 140

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
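The interpolation can be sketched as follows. This is an illustration under stated assumptions, not the page's implementation: it assumes that min and max range over the two complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), as in the Otu--Sayood distances (where \(d\) is their maximum and \(d_1\) their sum), and that \(c\) is the number of components in the LZ76 exhaustive history. The function names are hypothetical.

```python
def lz76_complexity(s: str) -> int:
    """Number of components in the LZ76 exhaustive history of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the component while it can be copied from earlier text
        # (the copied source may overlap the component itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two complexity increments."""
    a = lz76_complexity(x + y) - lz76_complexity(x)
    b = lz76_complexity(y + x) - lz76_complexity(y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# History A.AC.G.T.ACC.AT.TG has seven components.
print(lz76_complexity("AACGTACCATTG"))  # 7
```

With `alpha=0` the function returns the larger increment (the role of \(d\)), and with `alpha=0.5` it returns the average of the two increments (the role of \(d_1/2\)).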