CoV2D BrowserTM

Parikh vectors
3IXJ_1 6ANA_1 4NYE_1 Letter Amino acid
22 0 21 E Glutamic acid
29 0 24 L Leucine
15 0 12 N Asparagine
16 0 9 Q Glutamine
33 0 18 G Glycine
15 0 24 K Lysine
19 0 5 P Proline
23 0 10 T Threonine
19 0 8 Y Tyrosine
24 0 13 A Alanine
17 0 8 R Arginine
21 0 21 D Aspartic acid
6 0 1 C Cysteine
7 0 10 H Histidine
34 0 16 V Valine
23 0 20 I Isoleucine
8 0 3 M Methionine
20 0 15 F Phenylalanine
30 0 14 S Serine
7 0 3 W Tryptophan
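
As a minimal sketch of how such a table can be produced: the Parikh vector of a word records how often each letter occurs, ignoring order. The function name `parikh_vector` and the constant `ALPHABET` are illustrative assumptions, not names used by the page; the letter order simply follows the rows of the table above.

```python
from collections import Counter

# Row order of the table above (one-letter amino acid codes).
ALPHABET = "ELNQGKPTYARDCHVIMFSW"

def parikh_vector(sequence: str, alphabet: str = ALPHABET) -> list[int]:
    """Count occurrences of each alphabet letter, ignoring letter order."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# A sequence made only of the placeholder symbol X has an all-zero vector
# over the 20 standard letters, as in the 6ANA_1 column.
print(parikh_vector("X" * 122))

# Toy word (not from the page): two E, one L, one W.
print(parikh_vector("EELW"))
```

The entries of a Parikh vector sum to the sequence length whenever every symbol of the sequence belongs to the alphabet; for example, the 3IXJ_1 column sums to 388.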

>3IXJ_1|Chains A, B, C|Beta-secretase 1|Homo sapiens (9606)
>6ANA_1|Chain A[auth K]|anti Kappa VHH domain|Lama glama (9844)
>4NYE_1|Chains A, B|Phosphoribosylaminoimidazole-succinocarboxamide synthase|Streptococcus pneumoniae (170187)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3IXJ , Knot 166 388 0.85 40 226 372
SFVEMVDNLRGKSGQGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQLSSTYRDLRKGVYVPYTQGKWEGELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNLFSLQLCGAGFPLNQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQDLKMDCKEYNYDKSIVDSGTTNLRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLMGEVTNQSFRITILPQQYLRPVEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRTAAVEGPFVTLDMEDCGYN
6ANA , Knot 2 122 0.02 2 1 1
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
4NYE , Knot 114 255 0.82 40 162 243
MGSSHHHHHHSSGLVPRGSHMSKQLIYSGKAKDIYTTEDENLIISTYKDQATAFNGVKKEQIAGKGVLNNQISSFIFEKLNVAGVATHFVEKLSDTEQLNKKVKIIPLEVVLRNYTAGSFSKRFGVDEGIALETPIVEFYYKNDDLDDPFINDEHVKFLQIAGDQQIAYLKEETRRINELLKVWFAEIGLKLIDFKLEFGFDKDGKIILADEFSPDNCRLWDADGNHMDKDVFRRGLGELTDVYEIVWEKLQELK
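
The normalized column divides the phrase count by \(n/\log_{20} n\). The sketch below assumes an LZ76-style exhaustive-history parse in which a phrase may be copied from earlier text with overlap; the page does not state which variant it uses, so `lz76_complexity` and `normalized_lz` are illustrative names and may differ from the page's own computation.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of phrases in an LZ76-style exhaustive-history parse of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text
        # (the copied occurrence may overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(w: str, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_b(n), b = alphabet size; needs len(w) >= 2."""
    n = len(w)
    return lz76_complexity(w) / (n / math.log(n, alphabet_size))

w = "X" * 122  # the 6ANA_1 sequence as listed above
print(lz76_complexity(w), normalized_lz(w))
```

Under this assumption the run of 122 X symbols parses into two phrases, which agrees with the 6ANA row.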

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
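
A minimal sketch of these two definitions, reading "subword" as a contiguous substring of a fixed length. The names `subwords` and `subword_complexity` are illustrative, and the argument is called `k` here to avoid a clash with the sequence length.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of w of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the cardinality of P_w(k)."""
    return len(subwords(w, k))

# Toy word, not taken from the page.
print(sorted(subwords("abab", 2)))    # ['ab', 'ba']
print(subword_complexity("abab", 3))  # 2
```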

\(|P_{f(3IXJ_1)}(2) \setminus P_{f(6ANA_1)}(2)|=226\), \(|P_{f(6ANA_1)}(2) \setminus P_{f(3IXJ_1)}(2)|=1\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
0110110010100101001010110110010111001000111111101110000000100000010011011000101010110011011011010101011110000011101001011111101011010001011100110000110110101011111100001110111011111100010010110011000100011110101010010100000000001100100010110011011100101100000110111110011010110011011111010111010000101011100010110011000000001110000010111111101101110010001111101001000100111011110101000100

Pair \(Z_2\) Length of longest common subsequence
3IXJ_1,6ANA_1 227 0
3IXJ_1,4NYE_1 162 3
6ANA_1,4NYE_1 163 0
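
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets. A short sketch, with `subwords` and `z_distance` as illustrative names and toy inputs that are not from the page:

```python
def subwords(w: str, k: int) -> set[str]:
    """Distinct contiguous subwords of w of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# "abab" has {ab, ba}; "abb" has {ab, bb}; they differ in ba and bb.
print(z_distance("abab", "abb", 2))  # 2

# A run of X shares no 2-subword with an X-free word, so the two
# cardinalities simply add up: 1 + 2.
print(z_distance("X" * 122, "abab", 2))  # 3
```

The second case mirrors the rows involving 6ANA_1: its single 2-subword XX occurs in neither other sequence, so \(Z_2\) is the other sequence's \(p_w(2)\) plus one (226 + 1 = 227 and 162 + 1 = 163).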

Newick tree

 
[
	6ANA_1:10.06,
	[
		3IXJ_1:81,4NYE_1:81
	]:23.06
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{510}{\log_{20} 510}-\frac{122}{\log_{20}122})\approx 114.\)
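
The arithmetic in this estimate can be checked directly. In the sketch below, 510 is the combined length 388 + 122 of 3IXJ_1 and 6ANA_1, and the factors 0.85 and 0.8 are taken as given from the expression above; `length_scale` is an illustrative helper name.

```python
import math

def length_scale(n: int, base: int = 20) -> float:
    """n / log_base(n), the normalizing term used for LZ-complexity."""
    return n / math.log(n, base)

estimate = 0.85 * 0.8 * (length_scale(510) - length_scale(122))
print(estimate)
```

The result lies between 114 and 115, so the stated 114 appears to be a truncated value, in the same way the normalized column shows 0.02 for the 6ANA row.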
Status Protein1 Protein2 d d1/2
Query variables 3IXJ_1 6ANA_1 165 83.5
Was not able to put for d
Was not able to put for d1
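
The page does not spell out \(d\) and \(d_1\). The sketch below assumes the usual Otu and Sayood definitions, \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=(c(xy)-c(x))+(c(yx)-c(y))\), with \(c\) an LZ76-style phrase count. Both the definitions and the parsing variant are assumptions here, and the function names are illustrative.

```python
def lz76_complexity(w: str) -> int:
    """Number of phrases in an LZ76-style exhaustive-history parse of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it is still copyable from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1) under the assumed definitions above."""
    c = lz76_complexity
    gain_xy = c(x + y) - c(x)  # new phrases needed for y after seeing x
    gain_yx = c(y + x) - c(y)  # new phrases needed for x after seeing y
    return max(gain_xy, gain_yx), gain_xy + gain_yx

# Toy inputs, not taken from the page.
d, d1 = otu_sayood("X" * 10, "ab")
print(d, d1)  # 2 3
```

With these definitions \(d_1/2\) is the average of the two gains, which is why a highly repetitive sequence can look close to everything when \(d_1\) is used.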

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
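
As a worked check against the "Query variables" row, take \(\max=165\) and \(\min=2\). These two values are not displayed on the page; they are inferred from \(d=165\) and \(d_1/2=83.5\) under the formula above.
\[ \delta\big|_{\alpha=0} = 0\cdot 2 + 1\cdot 165 = 165 = d, \qquad \delta\big|_{\alpha=1/2} = \tfrac12\cdot 2 + \tfrac12\cdot 165 = 83.5 = d_1/2. \]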