CoV2D Browser™

Parikh vectors
3NSO_1 1NAB_1 5IKV_1 Letter Amino acid
4 0 24 R Arginine
11 0 33 E Glutamic acid
1 0 18 H Histidine
5 1 29 T Threonine
6 0 27 Y Tyrosine
6 0 32 V Valine
8 1 25 A Alanine
6 0 23 D Aspartic acid
1 0 31 I Isoleucine
4 0 29 S Serine
10 2 12 C Cysteine
2 2 36 G Glycine
10 0 46 L Leucine
5 0 38 F Phenylalanine
2 0 27 N Asparagine
5 0 31 Q Glutamine
7 0 32 K Lysine
2 0 14 M Methionine
5 0 38 P Proline
1 0 6 W Tryptophan
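
The counts in the table above can be reproduced with a few lines of Python. This is a minimal sketch, not the page's own code: the names `LETTERS` and `parikh_vector` are hypothetical, and the letter order is simply copied from the table rows.

```python
from collections import Counter

# Letter order copied from the table rows above (one-letter codes).
LETTERS = "REHTYVADISCGLFNQKMPW"

def parikh_vector(word, alphabet=LETTERS):
    """Return {letter: number of occurrences in word} for each letter of alphabet."""
    counts = Counter(word)
    return {a: counts[a] for a in alphabet}

# 1NAB_1 is the DNA hexamer CGATCG; only C, G, A, T have nonzero counts.
nab = parikh_vector("CGATCG")
print({a: n for a, n in nab.items() if n})  # {'T': 1, 'A': 1, 'C': 2, 'G': 2}
```

The nonzero entries agree with the 1NAB_1 column of the table.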

>3NSO_1|Chains A, B|Protein S100-A3|Homo sapiens (9606)
>1NAB_1|Chains A, B|5'-D(*CP*GP*AP*TP*CP*G)-3'|null
>5IKV_1|Chains A, B|Prostaglandin G/H synthase 2|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3NSO, Knot 53 101 0.80 40 88 97
MARPLEQAVAAIVCTFQEYAGRCGDKYKLCQAELKELLQKELATWTPTEFRECDYNKFMSVLDTNKDCEVDFVEYVRSLACLCLYCHEYFKDCPSEPPCSQ
1NAB, Knot 5 6 0.49 8 4 4
CGATCG
5IKV, Knot 223 551 0.85 40 279 527
NPCCSHPCQNRGVCMSVGFDQYKCDCTRTGFYGENCSTPEFLTRIKLFLKPTPNTVHYILTHFKGFWNVVNNIPFLRNAIMSYVLTSRSHLIDSPPTYNADYGYKSWEAFSNLSYYTRALPPVPDDCPTPLGVKGKKQLPDSNEIVEKLLLRRKFIPDPQGSNMMFAFFAQHFTHQFFKTDHKRGPAFTNGLGHGVDLNHIYGETLARQRKLRLFKDGKMKYQIIDGEMYPPTVKDTQAEMIYPPQVPEHLRFAVGQEVFGLVPGLMMYATIWLREHNRVCDVLKQEHPEWGDEQLFQTSRLILIGETIKIVIEDYVQHLSGYHFKLKFDPELLFNKQFQYQNRIAAEFNTLYHWHPLLPDTFQIHDQKYNYQQFIYNNSILLEHGITQFVESFTRQIAGRVAGGRNVPPAVQKVSQASIDQSRQMKYQSFNEYRKRFMLKPYESFEELTGEKEMSAELEALYGDIDAVELYPALLVEKPRPDAIFGETMVEVGAPFSLKGLMGNVICSPAYWKPSTFGGEVGFQIINTASIQSLICNNVKGCPFTSFSVP
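
For readers who want to recompute the LZ-complexity column, here is a sketch of the LZ76 (Lempel-Ziv 1976) exhaustive-history count and of the normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\). Whether the page uses exactly this parsing is an assumption; the function names `lz76` and `normalized_lz` are hypothetical. The sketch does reproduce the value 5 reported for CGATCG.

```python
import math

def lz76(w):
    """Number of components in the LZ76 exhaustive history of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the component while it can be copied from an occurrence that
        # starts before position i (the copy may overlap the component itself).
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_b(n); requires len(w) >= 2."""
    n = len(w)
    return lz76(w) / (n / math.log(n, alphabet_size))

print(lz76("CGATCG"))  # 5, parsed as C | G | A | T | CG
```

For CGATCG the normalized value is about 0.498, which the table shows truncated as 0.49.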

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
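
The subword sets are easy to enumerate directly. The sketch below uses hypothetical helper names `subwords` and `p` and reads the argument of \(P_w\) as the subword length.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def p(w, k):
    """Subword complexity: number of distinct length-k subwords of w."""
    return len(subwords(w, k))

print(sorted(subwords("CGATCG", 2)))  # ['AT', 'CG', 'GA', 'TC']
print(p("CGATCG", 2), p("CGATCG", 3))  # 4 4
```

The values 4 and 4 agree with the last two counts listed for 1NAB in the table.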

\(|P_{f(3NSO_1)}(2) \setminus P_{f(1NAB_1)}(2)|=86\), \(|P_{f(1NAB_1)}(2) \setminus P_{f(3NSO_1)}(2)|=2\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \), the size of the symmetric difference of the two subword sets, be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11011001111110010001100100001001010011000110101001000000011011000000010110010011010100000100010011000
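
The hydrophobic-polar string above can be regenerated from the 3NSO sequence. The page does not state which residues it treats as hydrophobic, so the set below is an assumption inferred by aligning the 0/1 string with the sequence; `HYDROPHOBIC` and `hp_encode` are hypothetical names.

```python
# Assumption: residues that map to "1" in the page's own output for 3NSO.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in seq)

NSO = ("MARPLEQAVAAIVCTFQEYAGRCGDKYKLCQAELKELLQKELATWTPTEFRECDYNKFMSVLDTNKDCEV"
       "DFVEYVRSLACLCLYCHEYFKDCPSEPPCSQ")

print(hp_encode(NSO[:10]))  # 1101100111
```

Under this mapping C, T, Y and H count as polar, which differs from some other hydrophobicity scales.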
Pair \(Z_2\) Length of longest common subword
3NSO_1,1NAB_1 88 2
3NSO_1,5IKV_1 237 3
1NAB_1,5IKV_1 279 2
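
The first row of this table can be checked with a short sketch. It assumes that the last column counts contiguous common blocks (subwords) rather than gapped subsequences, which is the reading consistent with the reported value 2; the function names are hypothetical.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    k = 0
    # A common block of length k+1 contains one of length k, so the first
    # empty intersection marks the maximum.
    while subwords(x, k + 1) & subwords(y, k + 1):
        k += 1
    return k

NSO = ("MARPLEQAVAAIVCTFQEYAGRCGDKYKLCQAELKELLQKELATWTPTEFRECDYNKFMSVLDTNKDCEV"
       "DFVEYVRSLACLCLYCHEYFKDCPSEPPCSQ")
NAB = "CGATCG"

print(z(NSO, NAB, 2))                    # 88
print(longest_common_subword(NSO, NAB))  # 2
```

The two sequences share exactly the 2-subwords CG and AT, and no 3-subword of CGATCG occurs in the 3NSO sequence.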

Newick tree

 
[
	5IKV_1:14.27,
	[
		3NSO_1:44,1NAB_1:44
	]:10.27
]

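Let \(d\) be the Otu--Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\), where \(xy\) denotes concatenation.
Let \(d_1\) be the Otu--Sayood distance \(d_1(x,y)=(\mathrm{LZ}(xy)-\mathrm{LZ}(x))+(\mathrm{LZ}(yx)-\mathrm{LZ}(y))\). (This makes the 4TYN sequence AAAAAA a close match...)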
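
A sketch of the two unnormalized Otu--Sayood distances follows, with \(d\) taken as the larger of the two one-sided complexity increases and \(d_1\) as their sum. The LZ76 parsing in `lz76` is an assumed implementation, not necessarily the one behind the page's numbers, and `otu_sayood` is a hypothetical name.

```python
def lz76(w):
    """Number of components in the LZ76 exhaustive history of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while the block can be copied from an earlier start position.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(x, y):
    """Return (d, d1) for the pair x, y."""
    a = lz76(x + y) - lz76(x)  # new components needed for y after seeing x
    b = lz76(y + x) - lz76(y)  # new components needed for x after seeing y
    return max(a, b), a + b

d, d1 = otu_sayood("CGATCG", "AAAAAA")
print(d, d1 / 2)  # 3 2.0
```

A constant word such as AAAAAA adds few components to any concatenation, so both one-sided terms stay small for short partners.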
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{107 }{\log_{20} 107}-\frac{6}{\log_{20}6})=39.8\)
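
The arithmetic in this estimate can be checked numerically. The factors 0.85 and 0.8 are copied from the formula as given; reading 107 as the combined length 101 + 6 of the two sequences is an interpretation, and `lz_scale` is a hypothetical helper name.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_b(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

expected = 0.85 * 0.8 * (lz_scale(107) - lz_scale(6))
print(round(expected, 1))  # 39.8
```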
Status Protein1 Protein2 d d1/2
Query variables 3NSO_1 1NAB_1 50 26.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
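
As a worked instance for the pair 3NSO_1, 1NAB_1, take the reported values \(d=50\) and \(d_1/2=26.5\), and read \(\min\) and \(\max\) as the smaller and larger of the two one-sided quantities (an assumption about the notation). Then \(\max=d=50\) and \(\min+\max=d_1=53\), so \(\min=3\):
\[ \delta = 3\alpha + 50(1-\alpha), \qquad \delta\big|_{\alpha=0}=50=d, \qquad \delta\big|_{\alpha=1/2}=\frac{3+50}{2}=26.5=\frac{d_1}{2}. \]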