CoV2D Browser™

Parikh vectors
8GFQ_1 9EOI_1 6VBP_1 Letter Amino acid
33 33 14 A Alanine
17 23 6 R Arginine
14 2 1 M Methionine
22 10 10 P Proline
50 12 9 N Asparagine
28 14 12 E Glutamic acid
15 21 13 G Glycine
42 6 7 I Isoleucine
3 7 3 W Tryptophan
2 1 5 C Cysteine
21 12 14 Q Glutamine
33 5 7 F Phenylalanine
42 6 33 S Serine
21 8 16 T Threonine
35 4 10 Y Tyrosine
13 9 17 V Valine
28 7 9 D Aspartic acid
14 2 4 H Histidine
55 26 16 L Leucine
56 1 14 K Lysine
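Each column of the table above is a Parikh vector: the number of occurrences of each of the 20 amino-acid letters in one sequence, ignoring order. A minimal Python sketch of that count follows; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative choices, not names used by the site.

```python
from collections import Counter

# The 20 one-letter amino-acid codes, in the row order of the table above.
AMINO_ACIDS = "ARMPNEGIWCQFSTYVDHLK"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Count how often each amino-acid letter occurs in `sequence`."""
    counts = Counter(sequence)
    # Letters that never occur are reported with a count of 0.
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


# First 20 residues of the 8GFQ_1 sequence, used as a small example.
example = parikh_vector("MGSSHHHHHHSSGLVPRGSH")
print(example["H"], example["S"], example["G"])
```

Summing the entries of a Parikh vector gives the sequence length, which is a quick consistency check against the length column reported further down the page.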

>8GFQ_1|Chain A|Lytic transglycosylase domain-containing protein|Campylobacter jejuni (197)
>9EOI_1|Chains A, B|Putative lytic enzyme|Pseudomonas aeruginosa UCBPP-PA14 (208963)
>6VBP_1|Chains A[auth B], C[auth D], E[auth I], J[auth L]|DH815 light chain|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8GFQ 210 544 0.81 40 235 497
MGSSHHHHHHSSGLVPRGSHMQYSIEKLKKEENSLAKDYYIYRLLEKNKISKKDAQDLNSHIFRYIGKIKSELEKIIPLKPYINPKYAKCYTYTANTILDANLTCQSVRLNSLVFIASLNSKDRTTLAQTFKNQRPDLTNLLLAFNTSDPMSYIVQKEDINGFFKLYNYSKKYDLDLNTSLVNKLPNHIGFKDFAQNIIIKKENPKFRHSMLEINPENVSEDSAFYLGVNALTYDKTELAYDFFKKAAQSFKSQSNKDNAIFWMWLIKNNEEDLKTLSQSSSLNIYSLYAKELTNTPFPKIESLNPSKKKNNFNMQDPFAWQKINKQIRDANASQLDVLAKEFDTQETLPIYAYILERKNNFKKHYFIMPYYDNIKDYNKTRQALILAIARQESRFIPTAISVSYALGMMQFMPFLANHIGEKELKIPNFDQDFMFKPEIAYYFGNYHLNYLESRLKSPLFVAYAYNGGIGFTNRMLARNDMFKTGKFEPFLSMELVPYQESRIYGKKVLANYIVYRHLLNDSIKISDIFENLIQNKANDLNKS
9EOI 96 209 0.81 40 127 200
MKLTEQQLLRIFPNARLVAGVFVVALQRAMDEREIDTPARCAAFLAQVGHESSQLTRLVENLNYSAQGLAATWPGRYLGPDGQPNALALRLARNPQAIADNTYATRNGNGDEASGDGWRFRGRGLLQITGRANYRLVGEALGEPLEAEPWRLEQPVPAARSAAWWWAGHGLNELADRGEFAAITRRINGGLNGQAERLALWQRARAVLS
6VBP 99 220 0.81 40 155 211
DIVMTQSPDSLAVSLGERATINCKSSQSVFHSSNNKNYLAWYQQKPGQSPKLLIHWASARESGVPERISGSGSGTDFTLTISSLQAEDVAVYYCQQYYSTPLTFGGGTKVEIKRTVAAPSVFIFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTLSKADYEKHKVYACEVTHQGLSSPVTKSFNRGEC
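The table reports an LZ-complexity and its normalization \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\). The sketch below assumes the LZ76 exhaustive-history parsing (each phrase is the shortest block that cannot be copied from text starting earlier); the page does not state which variant it uses, so `lz76_complexity` may not reproduce the tabulated counts exactly. The normalization in `normalized_lz` follows the column header directly. Both function names are illustrative.

```python
import math


def lz76_complexity(w: str) -> int:
    """Number of phrases in the LZ76 parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it still occurs starting before position i
        # (the earlier occurrence may overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))


# Textbook binary example, parsed as 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))

# (LZ, length) pairs copied from the table above.
for lz, n in [(210, 544), (96, 209), (99, 220)]:
    print(round(normalized_lz(lz, n), 3))
```

For the three tabulated (LZ, length) pairs the ratio falls between 0.81 and 0.82, consistent with the 0.81 shown in each row.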

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
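As a small illustration of these definitions, the following Python sketch collects the distinct contiguous subwords of a given length and counts them. The names `subwords` and `subword_complexity` are chosen here for illustration only.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    # If n exceeds len(w) the range is empty and the result is the empty set.
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w: str, n: int) -> int:
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))


print(sorted(subwords("abab", 2)))
print(subword_complexity("abab", 2))
```

A word of length \(m\) has at most \(m-n+1\) subwords of length \(n\), so \(p_w(n)\) is bounded by both that value and the number of possible words of length \(n\) over the alphabet.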

\(|P_{f(8GFQ_1)}(2) \setminus P_{f(9EOI_1)}(2)|=156\), \(|P_{f(9EOI_1)}(2) \setminus P_{f(8GFQ_1)}(2)|=48\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1100000000001111010010001001000000110000100110000100001001000110011010001001111010101001000000100110101000010100111110100000001100100001010011111000011001100001011101000000001010001100110011100110011100001010001101010010000110111011000000110011001100100000000111111110000001001000001010010100100011101001010000001010011110010001001010010111001000001110101100000100001111000010000000011111110000011101101001111101111110011000101101000111010110011000100100010011111010011111000111000110010101110101110000010100111001100011000101001100110001001000
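The hydrophobic-polar string can be reproduced with a simple per-residue mapping. The sketch below assumes that A, F, G, I, L, M, P, V and W map to 1 and every other residue maps to 0. This set was inferred by comparing the binary string with the 8GFQ_1 sequence; it is an assumption, not a documented setting of the site, and the names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Residues treated as hydrophobic (assumed set, inferred from the page's output).
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


# First 28 residues of the 8GFQ_1 sequence.
prefix = "MGSSHHHHHHSSGLVPRGSHMQYSIEKL"
print(hp_encode(prefix))
```

Other hydrophobic-polar conventions classify some residues differently (for example cysteine or tyrosine), so the set should be adjusted if a different convention is intended.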
Pair \(Z_2\) Length of longest common subword
8GFQ_1,9EOI_1 204 4
8GFQ_1,6VBP_1 180 3
9EOI_1,6VBP_1 156 3
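The quantity \(Z_k\) is the size of the symmetric difference of the two sets of length-\(k\) subwords. A minimal Python sketch, with illustrative function names `subwords` and `z_k`:

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|, the symmetric-difference size."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Toy example: "abab" has {ab, ba}; "abba" has {ab, bb, ba}.
print(z_k("abab", "abba", 2))
```

For the pair 8GFQ_1, 9EOI_1 the two one-sided differences quoted earlier, 156 and 48, add up to the tabulated \(Z_2 = 204\).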

Newick tree

 
[
	8GFQ_1:10.52,
	[
		6VBP_1:78,9EOI_1:78
	]:23.52
]

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{753 }{\log_{20} 753}-\frac{209}{\log_{20}209})\approx 151\), where \(753=544+209\) is the combined length of the two sequences.
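The arithmetic in this estimate can be checked directly. The sketch below takes the factors 0.85 and 0.8 from the formula as given (the page does not explain them) and uses illustrative names `n_over_log` and `expected_distance`.

```python
import math


def n_over_log(n: int, base: int = 20) -> float:
    """n / log_base(n), the growth scale used to normalize LZ-complexity."""
    return n / math.log(n, base)


def expected_distance(n_concat: int, n_single: int,
                      a: float = 0.85, b: float = 0.8) -> float:
    """a * b * (n_concat/log n_concat - n_single/log n_single)."""
    return a * b * (n_over_log(n_concat) - n_over_log(n_single))


# 753 = 544 + 209 is the length of the two sequences concatenated.
print(expected_distance(753, 209))
```

The result lies between 151 and 152, in line with the value of 151 quoted on the page.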
Status Protein1 Protein2 d d1/2
Query variables 8GFQ_1 9EOI_1 189 131.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
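The interpolation can be illustrated numerically. Reading the two cases off the formula, \(d\) equals the max term and \(d_1\) equals min plus max, so the min term can be recovered as \(d_1 - d\). The sketch below assumes that min and max are the smaller and larger of the two one-directional quantities behind the Otu–Sayood distances, which the page does not spell out; the function name `delta` is illustrative.

```python
def delta(smaller: float, larger: float, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max for the two directional quantities."""
    return alpha * smaller + (1 - alpha) * larger


# Values from the query row: d = 189 and d1/2 = 131.5, hence d1 = 263.
d, d1 = 189, 263
larger = d             # alpha = 0 gives delta = max = d
smaller = d1 - larger  # d1 = min + max, so min = 74

print(delta(smaller, larger, 0.0))  # reproduces d
print(delta(smaller, larger, 0.5))  # reproduces d1 / 2
```

Intermediate values of \(\alpha\) between 0 and 1/2 interpolate linearly between the two tabulated distances.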