CoV2D Browser™

Parikh vectors
7MMT_1 6MIJ_1 9BJN_1 Letter Amino acid
7 0 1 W Tryptophan
18 12 0 Y Tyrosine
17 39 0 V Valine
21 11 2 F Phenylalanine
13 20 0 P Proline
25 13 1 S Serine
18 29 0 T Threonine
21 26 0 I Isoleucine
18 21 0 K Lysine
24 35 0 A Alanine
12 21 0 R Arginine
24 23 0 D Aspartic acid
17 40 0 G Glycine
18 9 0 N Asparagine
1 5 0 C Cysteine
5 11 0 H Histidine
6 11 0 M Methionine
17 8 0 Q Glutamine
25 35 0 E Glutamic acid
35 27 0 L Leucine
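A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch, assuming the row order of the table above (the name `LETTERS` and the function `parikh_vector` are illustrative, not part of the CoV2D code). Symbols outside the 20 standard amino acids, such as the `X` in 9BJN_1, are ignored, which appears to match the table.

```python
from collections import Counter

# Row order of the table above (an assumption read off the page).
LETTERS = "WYVFPSTIKARDGNCHMQEL"

def parikh_vector(seq, alphabet=LETTERS):
    """Return the letter counts of seq, in the order given by alphabet.

    Letters not in alphabet (for example 'X') are not counted.
    """
    counts = Counter(seq)
    return [counts[a] for a in alphabet]

vector = parikh_vector("XFFSW")
print(dict(zip(LETTERS, vector)))
```

For `XFFSW` the nonzero entries are W = 1, F = 2 and S = 1, in agreement with the 9BJN_1 column.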

>7MMT_1|Chains A, B|Ribonucleoside-diphosphate reductase|Aerococcus urinae (strain ACS-120-V-Col10a) (2976812)
>6MIJ_1|Chain A|Elongation factor Tu|Acinetobacter baumannii (strain ATCC 19606 / DSM 30007 / CIP 70.34 / JCM 6841 / NBRC 109757 / NCIMB 12457 / NCTC 12156 / 81) (575584)
>9BJN_1|Chain A|D-peptide ffspy|synthetic construct (32630)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7MMT , Knot 153 342 0.87 40 214 331
SNILEMTKNYYDRSVSPVEYAYFDQSQNMRAINWNKIVDEKDLEVWNRVTQNFWLPENIPVSNDLPSWNELDDDWQQLITRTFTGLTLLDTVQSSIGDVAQIKNSLTEQEQVIYANFAFMVGVHARSYGTIFSTLCTSEQIEEAHEWVVDNEALQARPKALIPFYTADDPLKSKIAAALMPGFLLYGGFYLPFYLSARGKLPNTSDIIRLILRDKVIHNFYSGYKYQLKVAKLSPEKQAEMKQFVFDLLDKMIGLEKTYLHQLYDGFGLADEAIRFSLYNAGKFLQNLGYESPFTKEETRIAPEVFAQLSARADENHDFFSGSGSSYIIGTSEETLDEDWDF
6MIJ , Knot 168 396 0.84 38 207 367
MAKAKFERNKPHVNVGTIGHVDHGKTTLTAAIATICAKTYGGEAKDYSQIDSAPEEKARGITINTSHVEYDSPTRHYAHVDCPGHADYVKNMITGAAQMDGAILVCAATDGPMPQTREHILLSRQVGVPYIIVFLNKCDLVDDEELLELVEMEVRELLSTYDFPGDDTPVIRGSALAALNGEAGPYGEESVLALVAALDSYIPEPERAIDKAFLMPIEDVFSISGRGTVVTGRVEAGIIKVGEEVEIVGIKDTVKTTVTGVEMFRKLLDEGRAGENCGILLRGTKREEVQRGQVLAKPGTIKPHTKFDAEVYVLSKEEGGRHTPFLNGYRPQFYFRTTDVTGAIQLKEGVEMVMPGDNVEMSVELIHPIAMDPGLRFAIREGGRTVGAGVVAKVTA
9BJN , Knot 4 5 0.42 8 4 3
XFFSW
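The LZ-complexity column can be reproduced in outline as follows. This is a sketch under the assumption that \(\mathrm{LZ}(w)\) counts the components of the Lempel–Ziv (1976) exhaustive history of \(w\); the page does not state which variant it uses, and the function names are illustrative. The normalization divides by \(n/\log_{20} n\), as in the table header.

```python
import math

def lz76(s):
    """Number of components in the LZ76 exhaustive history of s.

    Each component is the shortest word starting at position i that
    does not already occur starting at some earlier position
    (overlaps with the current component are allowed).
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # s[i:i+length] occurs earlier iff it is found in s[:i+length-1].
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(s, alphabet_size=20):
    """LZ(s) divided by n / log_b(n); requires len(s) >= 2."""
    n = len(s)
    return lz76(s) / (n / math.log(n, alphabet_size))

print(lz76("XFFSW"))
```

For `XFFSW` the parse is `X | F | FS | W`, giving 4 components, which agrees with the 9BJN row. This simple version is quadratic or worse in the length of the sequence, which is adequate for sequences of a few hundred residues.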

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
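The subword sets and their cardinalities can be computed directly with a sliding window. A minimal sketch, with the hypothetical helper names `subwords` and `subword_count`:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_count(w, k):
    """Cardinality of the set of length-k subwords of w."""
    return len(subwords(w, k))

for k in (2, 3):
    print(k, sorted(subwords("XFFSW", k)))
```

For `XFFSW` this gives 4 subwords of length 2 and 3 subwords of length 3, matching the \(p_w(2)\) and \(p_w(3)\) entries for 9BJN.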

\(|P_{f(7MMT_1)}(2) \setminus P_{f(6MIJ_1)}(2)|=85\), \(|P_{f(6MIJ_1)}(2) \setminus P_{f(7MMT_1)}(2)|=78\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for this pair, \(Z_2=85+78=163\).

Hydrophobic-polar version of Sequence 1 (7MMT_1):
001101000000001011001010000010110100110000101100100011110011100011010010001001100010110110010001101101000100000110101111111010001011001000001001001110001101010111110010011000111111111110111011101010101100001101110001100100100001011010100010100111011001111000010010011111001101010011011001100011000000111011101010100000110101000111000001000101
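The hydrophobic-polar string can be obtained by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not state which residues it treats as hydrophobic; the set below is an assumption, inferred by comparing the opening residues of the 7MMT sequence with the binary string above, and the names `HYDROPHOBIC` and `hp_string` are illustrative.

```python
# Assumed hydrophobic set, inferred from the displayed string;
# every other symbol (including Y, C and H) is treated as polar.
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(seq):
    """Map a protein sequence to a binary hydrophobic(1)/polar(0) string."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in seq)

# First 20 residues of the 7MMT sequence.
print(hp_string("SNILEMTKNYYDRSVSPVEY"))
```

The output `00110100000000101100` coincides with the first 20 symbols of the string above.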
Pair \(Z_2\) Length of longest common subword
7MMT_1,6MIJ_1 163 3
7MMT_1,9BJN_1 212 3
6MIJ_1,9BJN_1 209 2
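Both columns of this table can be sketched in a few lines. The code assumes that the last column counts the longest common contiguous subword (which is what the small values 3, 3, 2 suggest) and computes \(Z_k\) as the size of the symmetric difference of the two subword sets. Function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y): number of length-k subwords in exactly one of x, y."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common suffix ending just before a and b.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# 'ENHDFFSGSG' is a fragment of the 7MMT sequence containing 'FFS'.
print(longest_common_subword("XFFSW", "ENHDFFSGSG"))
```

The example prints 3, since `FFS` is the longest word shared by `XFFSW` and the fragment.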

Newick tree

 
[
	9BJN_1:11.05,
	[
		7MMT_1:81.5,6MIJ_1:81.5
	]:30.55
]

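Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{738}{\log_{20} 738}-\frac{342}{\log_{20}342}\right)\approx 108\), where \(738=342+396\) is the length of the concatenated sequence.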
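The arithmetic behind this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are taken from the page as given, without further interpretation, and `lz_scale` is a hypothetical helper name for \(n/\log_{20} n\).

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_b(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

n1, n2 = 342, 396  # lengths of the 7MMT and 6MIJ sequences
expected = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n1))
print(round(expected))
```

This prints 108, the value quoted above.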
Status Protein1 Protein2 d d1/2
Query variables 7MMT_1 6MIJ_1 134 128.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d, &\alpha=0,\\ d_1/2, &\alpha=1/2. \end{cases} \]
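The interpolation can be illustrated numerically. The sketch assumes that min and max refer to the smaller and larger of the two LZ-complexity increments for the pair. The value 134 is \(d\) from the table above; the value 123 is not shown on the page and is inferred here from \(d_1/2=128.5\) as \(2\cdot 128.5-134\).

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Assumed increments for the pair 7MMT_1, 6MIJ_1:
# 134 is taken from the table, 123 is inferred (see the text above).
larger, smaller = 134, 123

print(delta(0, larger, smaller))    # alpha = 0 recovers d
print(delta(0.5, larger, smaller))  # alpha = 1/2 recovers d1/2
```

With these inputs the two calls return 134 and 128.5, the \(d\) and \(d_1/2\) entries of the query row.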