CoV2D Browser™

Parikh vectors
9FDC_1 3TDS_1 5CTS_1 Letter Amino acid
13 33 55 L Leucine
1 2 9 W Tryptophan
9 3 19 R Arginine
14 16 17 N Asparagine
6 6 18 E Glutamic acid
5 22 22 T Threonine
13 21 24 V Valine
1 4 3 C Cysteine
2 11 14 M Methionine
6 21 30 S Serine
8 28 39 G Glycine
4 2 12 H Histidine
10 27 17 I Isoleucine
8 14 18 K Lysine
3 7 18 Y Tyrosine
6 26 45 A Alanine
7 5 17 D Aspartic acid
4 0 17 Q Glutamine
8 16 13 F Phenylalanine
10 4 22 P Proline
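A Parikh vector counts how often each letter occurs in a sequence. Below is a minimal Python sketch. It assumes the twenty standard one-letter codes in the table's row order, and it skips letters outside that alphabet, such as the unknown-residue symbol X in 5CTS_1. Skipping them matches the table: the 5CTS_1 column sums to 429, while the sequence length is 433 and the sequence contains four X symbols. The function name `parikh_vector` is illustrative.

```python
from collections import Counter

# Row order of the Parikh-vector table above (20 standard amino acids).
AMINO_ACIDS = "LWRNETVCMSGHIKYADQFP"


def parikh_vector(seq: str) -> dict[str, int]:
    """Count each standard amino-acid letter in seq.

    Letters outside AMINO_ACIDS (e.g. the unknown residue X) are ignored.
    """
    counts = Counter(seq)
    return {aa: counts[aa] for aa in AMINO_ACIDS}


# A short fragment of 5CTS_1 containing three X symbols.
print(parikh_vector("KAXXXAG"))
```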

>9FDC_1|Chain A|Galectin-3|Homo sapiens (9606)
>3TDS_1|Chains A, B, C, D, E|formate/nitrite transporter|Clostridium difficile (272563)
>5CTS_1|Chain A|CITRATE SYNTHASE|Gallus gallus (9031)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9FDC , Knot 68 138 0.81 40 112 136
PLIVPYNLPLPGGVVPRMLITILGTVKPNANRIALDFQRGNDVAFHFNPRFNENNRRVIVCNTKLDNNWGREERQSVFPFESGKPFKIQVLVEPDHFKVAVNDAHLLQYNHRVKKLNEISKLGISGDIDLTSASYTMI
3TDS , Knot 113 268 0.78 38 144 256
MGRAHKETLDKLTNAAINKINLLNTSKVKYLVSSAFAGLYVGIGILLIFTIGGLLTDAGSPMTKIVMGLSFAIALSLVIMTGTELFTGNNMVMSAGMLNKGVSIKDTSKIWAYSWVGNLIGALVLGIIFVGTGLVDKGPVAEFFANTAASKASMPFTALFFRGILCNILVCVSVLCSFRTNSDTAKIIMIFLCLFAIITSGFEHSVANMTIYSVSLFSPTISTVTIGGAIYNLVAVTLGNIVGGALFMGLGTYILGKEKLNAAAENLY
5CTS , Knot 182 433 0.85 42 236 410
ASSTNLKDVLAALIPKEQARIKTFRQQHGGTALGQITVDMSYGGMRGMKGLVYETSVLDPDEGIRFRGFSIPECQKLLPKGGXGGEPLPEGLFWLLVTGQIPTGAQVSWLSKEWAKRAALPSHVVTMLDNFPTNLHPMSQLSAAITALNSESNFARAYAEGILRTKYWEMVYESAMDLIAKLPCVAAKIYRNLYRAGSSIGAIDSKLDWSHNFTNMLGYTDAQFTELMRLYLTIHSDHEGGNVSAHTSHLVGSALSDPYLSFAAAMNGLAGPLHGLANQEVLGWLAQLQKAXXXAGADASLRDYIWNTLNSGRVVPGYGHAVLRKTDPRYTCQREFALKHLPGDPMFKLVAQLYKIVPNVLLEQGAAANPWPNVDAHSGVLLQYYGMTEMNYYTVLFGVSRALGVLAQLIWSRALGFPLERPKSMSTDGLIAL
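The LZ-complexity column counts phrases in a Lempel-Ziv (1976) style parsing. The sketch below implements the classic exhaustive-history production complexity. Each phrase is the shortest extension that cannot be copied from earlier text, and an incomplete final phrase still counts. The page does not state its exact convention, so this sketch may not reproduce the \(\mathrm{LZ}(w)\) column exactly. The second function computes the normalized ratio \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\). Both function names are illustrative.

```python
import math


def lz76_complexity(s: str) -> int:
    """Number of phrases in the exhaustive-history LZ76 parsing of s."""
    n = len(s)
    i = 0
    phrases = 0
    while i < n:
        length = 1
        # Grow the candidate while s[i:i+length] can be copied from a start
        # position earlier than i (i.e. it occurs inside s[:i+length-1]).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1  # the new phrase ends one symbol past the copyable part
        i += length
    return phrases


def normalized_lz(lz: int, n: int) -> float:
    """LZ(w) / (n / log_20 n), the ratio column of the table."""
    return lz / (n / math.log(n, 20))


# Textbook example: 0 | 001 | 10 | 100 | 1000 | 101 gives 6 phrases.
print(lz76_complexity("0001101001000101"))
print(round(normalized_lz(68, 138), 2))
```

The listed ratios appear to be truncated rather than rounded. For 3TDS_1, \(113/(268/\log_{20} 268)\approx 0.787\), yet the table shows 0.78.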

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence of the Protein Data Bank entry with 4-character code \(c\).
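The definition of \(P_w(n)\) maps directly onto a set of sliding windows. Here is a minimal sketch; `subwords` and `p` are illustrative names.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct contiguous subwords (intervals) of length n."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


print(sorted(subwords("abab", 2)), p("abab", 2))
```

A word has \(|w|-n+1\) windows of length \(n\), so \(p_w(n)\le\min(20^n,|w|-n+1)\) over a 20-letter alphabet. For 9FDC_1, \(p_w(3)=136=138-3+1\), which means all of its trigrams are distinct.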

\(|P_{f(9FDC_1)}(2) \setminus P_{f(3TDS_1)}(2)|=58\) and \(|P_{f(3TDS_1)}(2) \setminus P_{f(9FDC_1)}(2)|=90\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(9FDC_1),f(3TDS_1))=58+90=148\).
Hydrophobic-polar version of Sequence 1 (9FDC_1): 111110011111111101110111010101001110100100111010101000000111000010001100000011110010110101110100101110010110000010010010011101010100100011
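The binary string above appears to follow a fixed hydrophobic/polar split of the amino-acid alphabet. The sketch below assumes that A, F, G, I, L, M, P, V and W map to 1 and every other letter maps to 0. This set was inferred by matching the string against 9FDC_1 position by position; the site does not document it. `hp_string` is an illustrative name.

```python
# Assumed hydrophobic set, inferred by matching the string above against
# 9FDC_1 position by position; not taken from the site's documentation.
HYDROPHOBIC = set("AFGILMPVW")


def hp_string(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)


SEQ_9FDC = "PLIVPYNLPLPGGVVPRMLITILGTVKPNANRIALDFQRGNDVAFHFNPRFNENNRRVIVCNTKLDNNWGREERQSVFPFESGKPFKIQVLVEPDHFKVAVNDAHLLQYNHRVKKLNEISKLGISGDIDLTSASYTMI"
EXPECTED_HP = "111110011111111101110111010101001110100100111010101000000111000010001100000011110010110101110100101110010110000010010010011101010100100011"

print(hp_string(SEQ_9FDC) == EXPECTED_HP)
```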
Pair \(Z_2\) Length of longest common subword (contiguous interval)
9FDC_1,3TDS_1 148 3
9FDC_1,5CTS_1 202 4
3TDS_1,5CTS_1 176 4
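Both columns of this table can be computed directly. In the sketch below, `z_k` is the symmetric-difference count defined above, and `jaccard_distance` divides it by the size of the union. `longest_common_subword` uses the standard dynamic programme for the longest common contiguous substring; the small values in the table suggest that is the quantity being measured. All three names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)| (symmetric difference)."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by the size of the union; 0.0 if both sets are empty."""
    union = subwords(x, k) | subwords(y, k)
    return z_k(x, y, k) / len(union) if union else 0.0


def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous word occurring in both x and y.

    O(|x| * |y|) dynamic programme: cur[j] is the length of the longest
    common suffix of x[:i] and y[:j].
    """
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Fragments of 5CTS_1 and 9FDC_1 that share the subword "LLQY".
print(longest_common_subword("GVLLQYYGM", "DAHLLQYNHR"))
```

Take the first pair. 9FDC_1 has 112 distinct pairs, 58 of which are absent from 3TDS_1, so the two sequences share \(112-58=54\) pairs. The Jaccard distance is therefore \(148/(58+90+54)=148/202\approx 0.73\).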

Newick tree

(
	5CTS_1:10.68,
	(
		9FDC_1:74,3TDS_1:74
	):26.68
);
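Reading the tree as a Newick string shows how its branch lengths relate to the pairwise values. The sketch writes the tree in standard Newick syntax, with parentheses for clades and a closing semicolon. It then sums branch lengths along each leaf-to-leaf path. The parser is a minimal illustration: it does not handle quoted labels, bracket comments or support values. `parse_newick` and `leaf_distances` are illustrative names.

```python
def parse_newick(text: str):
    """Parse a small Newick string into (name, branch_length, children)."""
    s = "".join(text.split()).rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1
            children.append(node())
            while s[pos] == ",":
                pos += 1
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name = s[start:pos]
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return (name, length, children)

    return node()


def leaf_distances(tree) -> dict[frozenset, float]:
    """Path length (sum of branch lengths) between every pair of leaves."""
    result = {}

    def walk(node):
        name, _, children = node
        if not children:
            return {name: 0.0}
        # Distance from this node down to each leaf, one dict per child.
        below = []
        for child in children:
            child_len = child[1]
            below.append({leaf: d + child_len for leaf, d in walk(child).items()})
        # Leaves in different child subtrees meet at this node.
        for i in range(len(below)):
            for j in range(i + 1, len(below)):
                for a, da in below[i].items():
                    for b, db in below[j].items():
                        result[frozenset((a, b))] = da + db
        merged = {}
        for part in below:
            merged.update(part)
        return merged

    walk(tree)
    return result


TREE = "(5CTS_1:10.68,(9FDC_1:74,3TDS_1:74):26.68);"
for pair, dist in leaf_distances(parse_newick(TREE)).items():
    print(sorted(pair), round(dist, 2))
```

The 9FDC_1/3TDS_1 path length, \(74+74=148\), equals \(Z_2\) for that pair. The paths to 5CTS_1 (111.36 each) do not equal the corresponding \(Z_2\) values of 202 and 176.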

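Let \(d\) be the Otu--Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu--Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)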
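The two distances can be computed from any LZ76 routine. This sketch assumes the standard Otu and Sayood definitions: \(d\) is the larger of the two increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), and \(d_1\) is their sum. It embeds an exhaustive-history LZ76 parser as a stand-in for the site's own implementation, whose exact convention is not stated. All function names are illustrative.

```python
def lz76_complexity(s: str) -> int:
    """Number of phrases in the exhaustive-history LZ76 parsing of s."""
    n, i, phrases = len(s), 0, 0
    while i < n:
        length = 1
        # Extend while the candidate can be copied from an earlier start.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def otu_sayood_d(x: str, y: str) -> int:
    """d(x, y) = max{LZ(xy) - LZ(x), LZ(yx) - LZ(y)}."""
    return max(lz76_complexity(x + y) - lz76_complexity(x),
               lz76_complexity(y + x) - lz76_complexity(y))


def otu_sayood_d1(x: str, y: str) -> int:
    """d1(x, y) = LZ(xy) - LZ(x) + LZ(yx) - LZ(y)."""
    return (lz76_complexity(x + y) - lz76_complexity(x)
            + lz76_complexity(y + x) - lz76_complexity(y))


# "aaaabbbb" parses as a | aaab | bbb, so each increment is 3 - 2 = 1.
print(otu_sayood_d("aaaa", "bbbb"), otu_sayood_d1("aaaa", "bbbb"))
```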
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{406}{\log_{20} 406}-\frac{138}{\log_{20}138}\right)\approx 80.6\), where \(406=138+268\) is the combined length of 9FDC_1 and 3TDS_1.
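The arithmetic of this estimate can be reproduced directly. In the sketch, `n_over_log20` is an illustrative helper, and the factors 0.85 and 0.8 are copied from the expression as given.

```python
import math


def n_over_log20(n: int) -> float:
    """n / log_20 n, the growth rate used to normalize LZ complexity."""
    return n / math.log(n, 20)


# 406 = 138 + 268: the combined length of 9FDC_1 and 3TDS_1.
estimate = 0.85 * 0.8 * (n_over_log20(406) - n_over_log20(138))
print(round(estimate, 1))
```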
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 9FDC_1 3TDS_1 98 75
Was not able to put for d
Was not able to put for d1

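In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], with \(\min\) and \(\max\) taken over the two increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) for sequences \(x,y\),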
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
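The interpolation can be checked numerically. The sketch takes the two LZ increments as plain numbers. For the pair (9FDC_1, 3TDS_1) the increments are not listed on this page. Assuming \(d\) is their maximum and \(d_1\) their sum, the table values \(d=98\) and \(d_1/2=75\) imply a maximum of 98 and a minimum of \(150-98=52\). `delta` is an illustrative name.

```python
def delta(inc_xy: float, inc_yx: float, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two LZ increments."""
    lo, hi = min(inc_xy, inc_yx), max(inc_xy, inc_yx)
    return alpha * lo + (1 - alpha) * hi


# Assumed increments for (9FDC_1, 3TDS_1), inferred from d = 98, d1/2 = 75.
inc_a, inc_b = 98, 52
print(delta(inc_a, inc_b, 0.0), delta(inc_a, inc_b, 0.5))
```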