CoV2D Browser™

Parikh vectors
2DOS_1 4TPO_1 2PHG_1 Letter Amino acid
7 31 12 V Valine
7 29 16 R Arginine
4 13 1 H Histidine
23 47 16 L Leucine
6 9 15 K Lysine
4 6 6 M Methionine
11 17 9 F Phenylalanine
3 6 0 W Tryptophan
6 51 24 A Alanine
16 18 13 S Serine
16 19 9 E Glutamic acid
13 21 7 G Glycine
10 14 17 I Isoleucine
5 25 12 T Threonine
6 9 5 Y Tyrosine
9 16 7 N Asparagine
8 24 12 D Aspartic acid
3 3 6 C Cysteine
12 18 9 Q Glutamine
7 31 10 P Proline
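Each column above is a Parikh vector, that is, the number of occurrences of each amino-acid letter in the sequence. Below is a minimal Python sketch that returns the counts in the table's row order. The names `parikh_vector` and `ROW_ORDER` are illustrative and are not taken from the CoV2D code.

```python
from collections import Counter

# Amino-acid letters in the row order used by the table (V, R, H, ..., P).
ROW_ORDER = "VRHLKMFWASEGITYNDCQP"


def parikh_vector(seq: str, alphabet: str = ROW_ORDER) -> list[int]:
    """Count how many times each letter of `alphabet` occurs in `seq`."""
    counts = Counter(seq)
    # Counter returns 0 for letters that never occur (e.g. W in 2PHG_1).
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    # First ten residues of 2DOS_1.
    print(dict(zip(ROW_ORDER, parikh_vector("GPLGSMESIF"))))
```

The counts in each column sum to the sequence length (176, 407 and 206 for 2DOS_1, 4TPO_1 and 2PHG_1). This gives a quick consistency check against the length column of the next table.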

>2DOS_1|Chain A|Ataxin-3|Homo sapiens (9606)
>4TPO_1|Chain A|Putative P450-like protein|Streptomyces scabies (680198)
>2PHG_1|Chain A|Transcription initiation factor IIB|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2DOS , Knot 87 176 0.85 40 136 170
GPLGSMESIFHEKQEGSLCAQHCLNNLLQGEYFSPVELSSIAHQLDEEERMRMAEGGVTSEDYRTFLQQPSGNMDDSGFFSIQVISNALKVWGLELILFNSPEYQRLRIDPINERSFICNYKEHWFTVRKLGKQWFNLNSLLTGPELISDTYLALFLAQLQQEGYSIFVVKGDLPD
4TPO , Knot 170 407 0.83 40 222 386
HMTVPSPLADPSIVPDPYPVYADLAQRRPVHWVERLNAWAVLTYADCAAGLKDPRLTADRGTEVLAAKFPGQPLPPDNIFHRWTKNVVMYTDPPLHDALRRSVRAGFTRAAHQHYDQVLQKVAHDLVASIPAGATEIDAVPALAAELPVRSAVHAFGVPEEDLGFLIPRVNTIMTYHSGPKDQPVTQEIILEKLTDLHTYASELLQGMRGKVLPDTVIARLAAAQDGLTETTPEQTVHQLALVFIALFAPTTPGSLSSGTLAFARNPRQVERFLADQACVDNTANEVLRYNASNQFTWRVAAKDVEMGGVRIEAGQTLALFLGSANRDANMFERPNDFDLDRPNSARHLSFGQGVHACLAAQLISLQLKWFYVALLNRFPGIRTAGEPIWNENLEFRSLRSLPLSLR
2PHG , Knot 91 206 0.78 38 138 195
SRAMMNAFKEITTMADRINLPRNIVDRTNNLFKQVYEQKSLKGRANDAIASACLYIACRQEGVPRTFKEICAVSRISKKEIGRCFKLILKALETSVDLITTGDFMSRFCSNLCLPKQVQMAATHIARKAVELDLVPGRSPISVAAAAIYMASQASAEKRTQKEIGDIAGVADVTIRQSYRLIYPRAPDLFPTDFKFDTPVDKLPQL
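The page does not specify which LZ variant it uses. The Otu–Sayood distances discussed below are usually built on Lempel–Ziv (1976) complexity, so this sketch assumes that definition and may not reproduce the LZ(w) column exactly. Under that definition, the word is split from left to right into components. Each component is the shortest extension that cannot be copied from the text before it, with overlapping copies allowed. \(\mathrm{LZ}(w)\) is the number of components. The ratio \(\mathrm{LZ}(w)/(n/\log_{20} n)\) compares this count with \(n/\log_{20} n\), the asymptotic order of LZ76 complexity for a random word over 20 letters. Function names are illustrative.

```python
import math


def lz76(w: str) -> int:
    """Number of components in the Lempel-Ziv (1976) exhaustive parsing of w."""
    count = 0
    p = 0  # start of the current component
    while p < len(w):
        q = p + 1
        # Grow w[p:q] while it still occurs in w[:q-1], i.e. while it can be
        # copied from a source starting before p (the copy may overlap w[p:]).
        while q <= len(w) and w[p:q] in w[:q - 1]:
            q += 1
        count += 1  # the component is w[p:q], possibly cut off at the end of w
        p = q
    return count


def normalized_lz(lz: int, n: int) -> float:
    """Ratio LZ(w) / (n / log_20 n)."""
    return lz * math.log(n, 20) / n


if __name__ == "__main__":
    print(lz76("0001101001000101"))  # 6, the classic parse 0|001|10|100|1000|101
    print(normalized_lz(87, 176))    # approximately 0.853 (2DOS row)
```

The same formula gives \(170/(407/\log_{20}407)\approx 0.838\) for 4TPO_1 and \(91/(206/\log_{20}206)\approx 0.786\) for 2PHG_1. These are listed as 0.83 and 0.78, so the column appears to truncate to two decimals rather than round.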

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the amino-acid sequence, in FASTA format, of the Protein Data Bank entry with four-character code \(c\).
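These definitions translate directly into code: slide a window of length \(n\) over \(w\) and collect the distinct windows. A word of length \(m\) therefore has at most \(m-n+1\) distinct subwords of length \(n\). The function names below are illustrative.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


if __name__ == "__main__":
    w = "GPLGSMESIF"  # first ten residues of 2DOS_1
    print(sorted(subwords(w, 2)))
    print([subword_complexity(w, n) for n in (1, 2, 3)])  # [8, 9, 8]
```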

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(|P_{f(2DOS_1)}(2) \setminus P_{f(4TPO_1)}(2)|=48\) and \(|P_{f(4TPO_1)}(2) \setminus P_{f(2DOS_1)}(2)|=134\), so \(Z_2(f(2DOS_1),f(4TPO_1))=48+134=182\).

Hydrophobic-polar version of Sequence 1 (2DOS_1):
11110100110000010101000100110100101101001100100000101101110000000110010101000111010110011011110111100100001010110000110000001101001100110100110110110000111111010001001111010110
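The sketch below implements \(Z_k\) as the size of the symmetric difference of the two subword sets. Dividing it by \(|P_x(k)\cup P_y(k)|\) gives the Jaccard distance. It also includes the hydrophobic-polar encoding. The page does not state its hydrophobic set. The set A, F, G, I, L, M, P, V, W used here is inferred from the 0/1 string for Sequence 1 (2DOS_1), which contains all 20 amino acids, and it reproduces that string exactly. Treat it as an assumption for other uses. Names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) - P_y(k)| + |P_y(k) - P_x(k)| (set differences)."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by the size of the union of P_x(k) and P_y(k)."""
    union = subwords(x, k) | subwords(y, k)
    return z_k(x, y, k) / len(union) if union else 0.0


# Hydrophobic residues, inferred from the 0/1 string for 2DOS_1 (assumption).
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp_encode(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)


if __name__ == "__main__":
    print(z_k("ABAB", "ABBA", 2))   # 1: only "BB" is unshared
    print(hp_encode("GPLGSMESIF"))  # 1111010011
```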
Pair \(Z_2\) Length of longest common substring
2DOS_1,4TPO_1 182 5
2DOS_1,2PHG_1 172 3
4TPO_1,2PHG_1 192 3
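The letter L alone occurs 23 times in 2DOS_1 and 47 times in 4TPO_1, so these two sequences share a common subsequence of length at least 23. Values of 3 to 5 in the last column therefore only make sense as the length of the longest common contiguous subword (substring). Below is a minimal sketch of the standard dynamic program under that reading. The function name is illustrative.

```python
def longest_common_substring_length(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y, O(|x||y|)."""
    best = 0
    # prev[j] = length of the longest common suffix of x[:i-1] and y[:j]
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the common suffix by one letter
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(longest_common_substring_length("GPLGSMESIF", "QQSMESYY"))  # 4 ("SMES")
```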

Newick tree

 
(
	4TPO_1:95.91,
	(
		2DOS_1:86,2PHG_1:86
	):9.91
);
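Summing branch lengths along paths, with depth measured from the root, shows that all three leaves sit at depth 95.91. The tree is therefore ultrametric, and the 2DOS_1–2PHG_1 path length equals their \(Z_2\) value of 172:

\[
\begin{aligned}
\operatorname{depth}(\mathrm{4TPO\_1}) &= 95.91,\\
\operatorname{depth}(\mathrm{2DOS\_1}) = \operatorname{depth}(\mathrm{2PHG\_1}) &= 9.91 + 86 = 95.91,\\
\operatorname{dist}(\mathrm{2DOS\_1},\mathrm{2PHG\_1}) &= 86 + 86 = 172,\\
\operatorname{dist}(\mathrm{2DOS\_1},\mathrm{4TPO\_1}) &= 86 + 9.91 + 95.91 = 191.82.
\end{aligned}
\]

The path length 191.82 from 4TPO_1 to either other leaf lies between its two \(Z_2\) values, 182 and 192.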

Let \(d\) and \(d_1\) be the Otu–Sayood distances, both computed from the LZ-complexities of the two sequences and of their concatenations. (The distance \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{583}{\log_{20} 583}-\frac{176}{\log_{20}176}\right)\approx 117\), where \(583=176+407\) is the length of the concatenation of 2DOS_1 and 4TPO_1.
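The estimate can be reproduced numerically. In this sketch, 0.85 is the normalized LZ-complexity of 2DOS_1 from the table, and 0.8 is used exactly as the page gives it. The helper name `n_over_log20` is illustrative.

```python
import math


def n_over_log20(n: int) -> float:
    """n / log_20 n."""
    return n / math.log(n, 20)


# 583 = 176 + 407 is the length of the concatenation of 2DOS_1 and 4TPO_1;
# 0.85 is 2DOS_1's normalized LZ-complexity and 0.8 is taken from the page as-is.
estimate = 0.85 * 0.8 * (n_over_log20(583) - n_over_log20(176))

if __name__ == "__main__":
    print(round(estimate, 2))  # roughly 117
```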
Status Protein1 Protein2 d d1/2
Query variables 2DOS_1 4TPO_1 147 103.5
Was not able to put for d
Was not able to put for d1

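In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2, \end{cases} \]
where \(\min\) and \(\max\) are the smaller and the larger of \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) for the two sequences \(x\) and \(y\).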
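Below is a sketch of \(d\), \(d_1\) and \(\delta\). It assumes the standard Otu–Sayood definitions \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\) and \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). It also reuses an LZ76 parsing, which is itself an assumption about how the page computes LZ-complexity. From the query row, \(d_1=2\times 103.5=207\), so the two differences for 2DOS_1 and 4TPO_1 are 147 and \(207-147=60\). `delta_from_parts` checks \(\delta\) against those numbers. All function names are illustrative.

```python
def lz76(w: str) -> int:
    """Number of components in the Lempel-Ziv (1976) parsing of w."""
    count, p = 0, 0
    while p < len(w):
        q = p + 1
        # Extend while w[p:q] can be copied from a source starting before p.
        while q <= len(w) and w[p:q] in w[:q - 1]:
            q += 1
        count += 1
        p = q
    return count


def differences(x: str, y: str) -> tuple[int, int]:
    """The two quantities LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)


def delta_from_parts(a: float, b: float, alpha: float) -> float:
    """delta = alpha * min + (1 - alpha) * max of the two differences."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


def otu_sayood_d(x: str, y: str) -> int:
    """d: the larger of the two differences (assumed definition)."""
    return max(differences(x, y))


def otu_sayood_d1(x: str, y: str) -> int:
    """d1: the sum of the two differences (assumed definition)."""
    return sum(differences(x, y))


if __name__ == "__main__":
    print(delta_from_parts(147, 60, 0.0))  # 147.0, the d value in the query row
    print(delta_from_parts(147, 60, 0.5))  # 103.5, the d1/2 value in the query row
```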