CoV2D Browser™

Parikh vectors
6TBL_1 | 2XGR_1 | 6YLN_1 | Letter | Amino acid
9 | 16 | 17 | V | Valine
10 | 14 | 10 | A | Alanine
7 | 10 | 6 | R | Arginine
27 | 31 | 21 | L | Leucine
15 | 20 | 9 | S | Serine
2 | 5 | 13 | F | Phenylalanine
1 | 24 | 12 | N | Asparagine
4 | 14 | 19 | D | Aspartic acid
6 | 15 | 23 | G | Glycine
6 | 11 | 20 | K | Lysine
8 | 11 | 10 | P | Proline
3 | 1 | 2 | C | Cysteine
5 | 9 | 8 | Q | Glutamine
2 | 7 | 14 | H | Histidine
3 | 4 | 5 | M | Methionine
2 | 14 | 10 | Y | Tyrosine
5 | 11 | 16 | E | Glutamic acid
3 | 11 | 12 | I | Isoleucine
6 | 19 | 16 | T | Threonine
1 | 5 | 2 | W | Tryptophan
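
A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch of that count; the function name `parikh_vector` and the constant `ALPHABET` are hypothetical, and the letter order is assumed to follow the rows of the table above.

```python
from collections import Counter

# Letter order taken from the rows of the Parikh vector table
# (an assumption about the intended ordering of the alphabet).
ALPHABET = "VARLSFNDGKPCQHMYEITW"


def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the number of occurrences of each letter of `alphabet` in `sequence`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]


# Example on the first ten residues of the 6TBL_1 sequence.
print(parikh_vector("GGGRELPTLL"))
```

Applied to a full sequence, each column of the table should correspond to one such vector.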

>6TBL_1|Chains A, B|MMS19 nucleotide excision repair protein homolog|Mus musculus (10090)
>2XGR_1|Chain A|SPD1 NUCLEASE|STREPTOCOCCUS PYOGENES SEROTYPE M1 (301447)
>6YLN_1|Chain A|EGFP|Vaccinia virus (10245)
Protein code \(c\) | LZ-complexity \(\mathrm{LZ}(w)\) | Length \(n=\lvert w\rvert\) | \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) | \(p_w(1)\) | \(p_w(2)\) | \(p_w(3)\) | Sequence \(w=f(c)\)
6TBL, Knot | 57 | 125 | 0.73 | 40 | 90 | 118 | GGGRELPTLLSLLLEALSCPDSVVQLSTLSCLQPLLLEAPQIMSLHVDTLVTKFLNLSSSYSMAVRIAALQCMHALTRLPTSVLLPYKSQVIRALAKPLDDKKRLVRKEAVSARGEWFLLGSPGS
2XGR, Knot | 114 | 252 | 0.83 | 40 | 166 | 241 | MKLSKQKASLLTAVLLLLSLSITTITVDAARVRTYPNVSHANTHYKNTVSSKLLPFTANYQLQLGELDNLNRATFSHIQLQDRHETKDVRTKINYDPVGWHNYQFPYGDGSKSSWVMNRGHLVGYQFCGLNDEPRNLVAMTAWLNTGAYSGANDSNPEGMLYYENRLDSWLALHPDFWLDYKVTPIYSGNEVVPRQIELQYVGIDSSGELLTIRLNSNKESIDENGVTTVILENSAPNINLDYLNGTATPKN
6YLN, Knot | 111 | 245 | 0.83 | 40 | 161 | 229 | MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLSWGVQCFARYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYFSDNVYITADKQKNGIKANFKIRHNIEDGGVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHH
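
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does only that arithmetic; it does not compute \(\mathrm{LZ}(w)\) itself, and the function name `normalized_lz` is hypothetical.

```python
import math


def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n), where b is the alphabet size (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))


# (code, LZ-complexity, length) as listed in the table.
for code, lz, n in [("6TBL", 57, 125), ("2XGR", 114, 252), ("6YLN", 111, 245)]:
    print(code, normalized_lz(lz, n))
```

The three values come out at roughly 0.73, 0.83 and 0.83, in line with the table.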

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
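
As a minimal sketch of these definitions, reading a subword as a contiguous block of a fixed length, \(P_w(n)\) and \(p_w(n)\) can be computed as follows. The function names `subwords` and `subword_complexity` are hypothetical, and this sketch is not claimed to reproduce the tabulated values exactly.

```python
def subwords(w, n):
    """Return the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w, n):
    """Return the number of distinct contiguous subwords of length n in w."""
    return len(subwords(w, n))


# Example on the first ten residues of the 6TBL sequence.
prefix = "GGGRELPTLL"
print(subword_complexity(prefix, 1), subword_complexity(prefix, 2))
```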

\(|P_{f(6TBL_1)}(2) \setminus P_{f(2XGR_1)}(2)|=44\), \(|P_{f(2XGR_1)}(2) \setminus P_{f(6TBL_1)}(2)|=120\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1: 11100110110111011001001101001001011110110110101001100110100000111011110010110011001111000011011101100000110001101010111110110
Pair | \(Z_2\) | Length of longest common subsequence
6TBL_1,2XGR_1 | 164 | 4
6TBL_1,6YLN_1 | 171 | 4
2XGR_1,6YLN_1 | 147 | 3
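
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets; for the first pair, \(44+120=164\). A minimal sketch, assuming subwords are contiguous blocks of length \(k\) (the helper names `subwords` and `z_distance` are hypothetical):

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Toy example: "AAB" has subwords {AA, AB}, "ABB" has {AB, BB}.
print(z_distance("AAB", "ABB", 2))  # 2
```

Dividing by the size of the union \(|P_x(k)\cup P_y(k)|\) would turn this numerator into a Jaccard distance.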

Newick tree

 
[
	6TBL_1:86.92,
	[
		2XGR_1:73.5,6YLN_1:73.5
	]:13.42
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{377 }{\log_{20} 377}-\frac{125}{\log_{20}125})=76.7\)
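
The arithmetic in this estimate can be checked directly. In the sketch below, 377 is taken to be the combined length \(125+252\) of the two query sequences, which matches the numbers but is an assumption; the function name `lz_scale` is hypothetical, and the factors 0.85 and 0.8 are copied from the formula as given.

```python
import math


def lz_scale(n, alphabet_size=20):
    """Return n / log_b n, the normalizing term used for LZ-complexity."""
    return n / math.log(n, alphabet_size)


n_x, n_y = 125, 252  # lengths of 6TBL_1 and 2XGR_1
expected = 0.85 * 0.8 * (lz_scale(n_x + n_y) - lz_scale(n_x))
print(round(expected, 1))  # 76.7
```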
Status | Protein1 | Protein2 | d | d1/2
Query variables | 6TBL_1 | 2XGR_1 | 99 | 71

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
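
The interpolation can be illustrated with the query values \(d=99\) and \(d_1/2=71\). The formula implies that \(d\) is the max term and \(d_1\) is the sum of the min and max terms, so the min term is recovered here as \(2\cdot 71-99=43\); that derived value and the function name `delta` are assumptions of this sketch, not figures reported on the page.

```python
def delta(alpha, lo, hi):
    """Return alpha * min + (1 - alpha) * max for the two directed terms."""
    return alpha * lo + (1 - alpha) * hi


d, half_d1 = 99, 71      # values from the query row
hi = d                   # alpha = 0 gives the max term, which is d
lo = 2 * half_d1 - hi    # derived: d1 = min + max, so min = 2 * 71 - 99 = 43

print(delta(0, lo, hi))    # 99
print(delta(0.5, lo, hi))  # 71.0
```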