CoV2D BrowserTM

Parikh vectors
2IYB_1 6XBS_1 8JFQ_1 Letter Amino acid
6 5 0 K Lysine
5 5 0 F Phenylalanine
5 10 2 T Threonine
7 9 0 R Arginine
5 13 0 D Aspartic acid
2 2 3 C Cysteine
6 8 0 H Histidine
4 11 0 L Leucine
12 14 6 A Alanine
3 12 0 E Glutamic acid
5 7 0 I Isoleucine
7 8 0 S Serine
2 4 0 P Proline
2 1 0 W Tryptophan
4 4 0 Y Tyrosine
8 3 0 N Asparagine
8 3 0 Q Glutamine
8 13 15 G Glycine
4 2 0 M Methionine
11 12 0 V Valine
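
A Parikh vector records how many times each letter occurs in a word. As a minimal sketch, the 8JFQ_1 column can be reproduced from the 8JFQ sequence listed on this page; the function name `parikh_vector` is an illustrative choice, not something the site defines.

```python
from collections import Counter

def parikh_vector(word, alphabet):
    """Return the occurrence count of each alphabet letter in word."""
    counts = Counter(word)
    # Counter returns 0 for letters that never occur.
    return [counts[a] for a in alphabet]

seq_8jfq = "TCGGGGAGGCAGGGCGGGAGGAAGAT"
print(parikh_vector(seq_8jfq, "TCAG"))  # [2, 3, 6, 15]
```

The result agrees with the T, C, A and G rows of the 8JFQ_1 column.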

>2IYB_1|Chains A, B, C, D|PROTEIN ENABLED HOMOLOG|HOMO SAPIENS (9606)
>6XBS_1|Chain A|Methylmalonyl-CoA epimerase|Streptomyces coelicolor (100226)
>8JFQ_1|Chain A[auth X]|26mer-DNA|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2IYB , Knot 60 114 0.83 40 98 111
RMSEQSICQARAAVMVYDDANKKWVPAGGSTGFSRVHIYHHTGNNTFRVVGRKIQDHQVVINCAIPKGLKYNQATQTFHQWRDARQVYGLNFGSKEDANVFASAMMHALEVLNS
6XBS , Knot 76 146 0.86 40 120 143
SLTRIDHIGIACHDLDATVEFYRATYGFEVFHTEVNEEQGVRQAMLKINDTSDGGASYLQLLEPTREDSAVGKWLAKNGEGVHHIAFGTADVDADAADIRDKGVRVLYDEPRRGSMGSRITFLHPKDCHGVLTELVTSAAVESPEH
8JFQ , Knot 9 26 0.37 8 9 14
TCGGGGAGGCAGGGCGGGAGGAAGAT
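
The page does not say how \(\mathrm{LZ}(w)\) is computed. One reading that reproduces the value 9 for the 8JFQ sequence is the exhaustive-history phrase count of Lempel and Ziv (1976). The sketch below assumes that reading; the function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Count phrases in the LZ76 exhaustive-history parsing of w.

    Each phrase is the shortest prefix of the remaining text that does not
    occur starting at an earlier position (overlap with the phrase allowed).
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # An earlier occurrence must lie entirely inside w[0 : i + length - 1].
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(w, alphabet_size=20):
    """LZ(w) / (n / log_b n) with n = len(w); assumes w is non-empty."""
    n = len(w)
    return lz76_complexity(w) * math.log(n, alphabet_size) / n

seq_8jfq = "TCGGGGAGGCAGGGCGGGAGGAAGAT"
print(lz76_complexity(seq_8jfq))          # 9
print(round(normalized_lz(seq_8jfq), 3))  # 0.376
```

The table lists this ratio as 0.37, which suggests the displayed values are truncated rather than rounded.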

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
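
As a minimal sketch of these definitions, the set of subwords and its cardinality can be computed with a sliding window. The names `subwords` and `subword_complexity` are illustrative, and `k` plays the role of the subword length.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

seq_8jfq = "TCGGGGAGGCAGGGCGGGAGGAAGAT"
print(subword_complexity(seq_8jfq, 2))  # 9
print(subword_complexity(seq_8jfq, 3))  # 14
```

These two values agree with the \(p_w(2)\) and \(p_w(3)\) entries listed for 8JFQ.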

\(|P_{f(2IYB_1)}(2) \setminus P_{f(6XBS_1)}(2)|=59\), \(|P_{f(6XBS_1)}(2) \setminus P_{f(2IYB_1)}(2)|=81\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For this pair, \(Z_2 = 59 + 81 = 140\).

Hydrophobic-polar version of Sequence 1:
010000100101111100010001111110011001010000100010111001000011100111011000010001001001001011011000010111011101101100
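
The page does not state which residues count as hydrophobic. The listed string is consistent with mapping A, F, G, I, L, M, P, V and W to 1 and every other residue to 0. That set is inferred from the output above, so treat it as an assumption. A minimal sketch:

```python
# Assumed hydrophobic set, inferred from the hydrophobic-polar string on this page.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 20 residues of the 2IYB sequence.
print(hp_encode("RMSEQSICQARAAVMVYDDA"))  # 01000010010111110001
```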
Pair \(Z_2\) Length of longest common subsequence
2IYB_1,6XBS_1 140 3
2IYB_1,8JFQ_1 97 3
6XBS_1,8JFQ_1 121 3
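
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets. A minimal sketch, with illustrative function names and toy inputs rather than the sequences in the table:

```python
def subwords(w, k):
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|: size of the symmetric difference."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: {AA, AB} versus {AB, BB} differ in two subwords.
print(z_distance("AAB", "ABB", 2))  # 2
```

Dividing by `len(px | py)` would turn this numerator into a Jaccard distance between 0 and 1.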

Newick tree

 
[
	6XBS_1:70.16,
	[
		2IYB_1:48.5,8JFQ_1:48.5
	]:21.66
]

Let \(d\) denote the Otu--Sayood distance \(d\).
Let \(d_1\) denote the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{260}{\log_{20} 260}-\frac{114}{\log_{20} 114}\right)\approx 46.2\), where \(260 = 114 + 146\) is the combined length of the two sequences.
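
The arithmetic in this estimate can be checked directly. The sketch below takes the factors 0.85 and 0.8 from the text as given; the helper name `lz_scale` is illustrative.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The normalizing quantity n / log_b(n) used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

len_2iyb, len_6xbs = 114, 146
expected = 0.85 * 0.8 * (lz_scale(len_2iyb + len_6xbs) - lz_scale(len_2iyb))
print(round(expected, 1))  # 46.2
```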
Status Protein1 Protein2 d d1/2
Query variables 2IYB_1 6XBS_1 62 54

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
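
The page does not spell out what \(\min\) and \(\max\) range over. The sketch below assumes the Otu--Sayood definitions, where \(d\) is the larger and \(d_1\) the sum of \(c(xy)-c(x)\) and \(c(yx)-c(y)\) for a complexity measure \(c\). It also assumes \(c\) is the LZ76 phrase count. All function names are illustrative.

```python
def lz76_complexity(w):
    """Count phrases in the LZ76 exhaustive-history parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it still occurs starting at an earlier position.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases

def delta(x, y, alpha, c=lz76_complexity):
    """alpha * min + (1 - alpha) * max of the two one-sided complexity gains."""
    gain_xy = c(x + y) - c(x)  # new phrases needed for y after seeing x
    gain_yx = c(y + x) - c(y)  # new phrases needed for x after seeing y
    return alpha * min(gain_xy, gain_yx) + (1 - alpha) * max(gain_xy, gain_yx)

# Toy example: the gains are 0 and 2, so d = 2 and d_1 / 2 = 1.
print(delta("AAAA", "AB", 0))    # 2
print(delta("AAAA", "AB", 0.5))  # 1.0
```

Under this reading, the table's values \(d = 62\) and \(d_1/2 = 54\) would correspond to one-sided gains of 46 and 62.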