CoV2D Browser™

Parikh vectors
5KDU_1 8BCU_1 1TRB_1 Letter Amino acid
46 7 12 K Lysine
12 5 7 M Methionine
20 9 9 P Proline
0 1 3 C Cysteine
15 10 10 Q Glutamine
32 13 22 E Glutamic acid
29 13 25 T Threonine
26 9 9 Y Tyrosine
32 7 32 A Alanine
41 10 18 D Aspartic acid
12 2 10 H Histidine
25 5 10 F Phenylalanine
9 1 1 W Tryptophan
19 9 16 R Arginine
40 10 35 G Glycine
27 11 22 I Isoleucine
33 11 18 V Valine
42 6 18 N Asparagine
34 12 29 L Leucine
36 10 14 S Serine
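Each column above is a Parikh vector, i.e. the count of every amino-acid letter in one sequence. The minimal Python sketch below performs that count. The alphabet string `TABLE_ORDER` just lists the letters in the table's row order; it and the function name are illustrative choices, not something the site defines.

```python
from collections import Counter

# Illustrative: the 20 amino-acid letters in the same row order as the table.
TABLE_ORDER = "KMPCQETYADHFWRGIVNLS"


def parikh_vector(seq: str, alphabet: str = TABLE_ORDER) -> list[int]:
    """Return how many times each letter of `alphabet` occurs in `seq`."""
    counts = Counter(seq)
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    for letter, n in zip(TABLE_ORDER, parikh_vector("MKKAC")):
        if n:
            print(letter, n)
```

As a consistency check, the three columns sum to 530, 161 and 320, which are the lengths \(n\) listed for 5KDU, 8BCU and 1TRB.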

>5KDU_1|Chain A|F5/8 type C domain protein|Clostridium perfringens (strain ATCC 13124 / DSM 756 / JCM 1290 / NCIMB 6125 / NCTC 8237 / Type A) (195103)
>8BCU_1|Chains A, B, D[auth C], E[auth D], G[auth E], H[auth F]|Tail tube terminator protein p142|Escherichia phage T5 (2695836)
>1TRB_1|Chain A|THIOREDOXIN REDUCTASE|Escherichia coli (562)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5KDU 215 530 0.84 38 261 505
MGSSHHHHHHSSGLVPRGSHMASVLELEMRGDSISEAKKRKVWNFQDWQITGLSARAGDKITVYVDVAEGDPTPTLLYKQSLTQHGGATSFQLKPGKNEITIPEINYESNGIPKDVIQGGDLFFTNYKSDSQKRAPKVRIEGASKYPVFILGKSDENEVMKELEAYVEKIKAEPKTTPNIFAVSSNKSLEFVQATYALDWYKKNNKTPKYTAEQWDQYIADAMGFWGFDNSKDVNSDFNFRIMPMVKNLSGGAFMNAGNGVIGIRPGNQDAILAANKGWGVAHELGHNFDTGGRTIVEVTNNMMPLFFESKYKTKTRITDQNIWENNTYPKVGLDDYSNNELYNKADSTHLAQLAPLWQLYLYDNTFYGKFERQFRERDFGNKNREDIYKSWVVAASDAMELDLTEFFARHGIRVDDKVKEDLAKYPKPDKKIYYLNDLAMNYKGDGFTENAKVSVSTSGSNGNIKLSFSVDDENKDNILGYEIRRDGKYVGFTSNDSFVDTKSNLDEDGVYVVTPYDRKLNTLNPIEVN
8BCU 82 161 0.86 40 126 159
MDHRTSIAQAMVDRISKQMDGSQPDEYFNNLYGNVSRQTYKFEEIREFPYVAVHIGTETGQYLPSGQQWMFLELPILVYDKEKTDIQEQLEKLVADIKTVIDTGGNLEYTVSKPNGSTFPCEATDMIITSVSTDEGLLAPYGLAEINVTVRYQPPRRSLRR
1TRB 135 320 0.81 40 189 308
GTTKHSKLLILGSGPAGYTAAVYAARANLQPVLITGMEKGGQLTTTTEVENWPGDPNDLTGPLLMERMHEHATKFETEIIFDHINKVDLQNRPFRLNGDNGEYTCDALIIATGASARYLGLPSEEAFKGRGVSACATSDGFFYRNQKVAVIGGGNTAVEEALYLSNIASEVHLIHRRDGFRAEKILIKRLMDKVENGNIILHTNRTLEEVTGDQMGVTGVRLRDTQNSDNIESLDVAGLFVAIGHSPNTAIFEGQLELENGYIKVQSGIHGNATQTSIPGVFAAGDVMDHIYRQAITSAGTGCMAALDAERYLDGLADAK
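The LZ-complexity and ratio columns can be approximated as shown below. This is a sketch that assumes the usual Lempel–Ziv (1976) exhaustive-history parse (as in the Kaspar–Schuster algorithm), with a final, possibly incomplete, component also counted. The site's exact implementation is not stated here, so `lz76` is an assumption and is checked only on a textbook binary example. `normalized_lz` computes the ratio \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the header.

```python
import math


def lz76(s: str) -> int:
    """Count components in the LZ76 exhaustive history of s (assumed variant).

    Each component is the shortest prefix of the unparsed remainder that is
    not a substring of everything before its own last letter (so copies may
    overlap the component itself). A final incomplete component is counted.
    """
    i, c, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the candidate while s[i:i+length] can still be copied from s[:i+length-1].
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        c += 1
        i += length
    return c


def normalized_lz(lz: int, n: int) -> float:
    """LZ(w) / (n / log_20 n), the ratio column of the table."""
    return lz / (n / math.log(n, 20))


if __name__ == "__main__":
    print(lz76("0001101001000101"))  # parse 0|001|10|100|1000|101, so 6
    for code, lz, n in [("5KDU", 215, 530), ("8BCU", 82, 161), ("1TRB", 135, 320)]:
        r = normalized_lz(lz, n)
        print(code, round(r, 4), math.floor(100 * r) / 100)
```

For 5KDU the ratio is about 0.8494, which the table lists as 0.84. All three listed ratios agree with truncation, rather than rounding, to two decimals.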

Let \(P_w(k)\) be the set of distinct subwords (contiguous intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
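The following is a direct Python rendering of \(P_w(k)\) and \(p_w(k)\). It writes \(k\) for the subword length so that it does not clash with \(n=|w|\). The function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the distinct length-k subwords (contiguous windows) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def p(w: str, k: int) -> int:
    """p_w(k) = |P_w(k)|."""
    return len(subwords(w, k))


if __name__ == "__main__":
    print([p("ABABA", k) for k in range(1, 6)])  # [2, 2, 2, 2, 1]
```

A word of length \(n\) has only \(n-k+1\) windows of length \(k\), so \(p_w(3)\le n-2\). For 8BCU the tabulated \(p_w(3)=159=161-2\) therefore means that every 3-letter window is distinct.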

\(|P_{f(5KDU_1)}(2) \setminus P_{f(8BCU_1)}(2)|=170\) and \(|P_{f(8BCU_1)}(2) \setminus P_{f(5KDU_1)}(2)|=35\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). In other words, \(Z_k(x,y)\) is the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), so \(Z_2(f(5KDU_1),f(8BCU_1))=170+35=205\).

Hydrophobic-polar (HP) version of sequence 1 (5KDU_1): 11000000000011110100110110101010010010000110100101011010110010101011010101011000010001110010101100010110100000111001101101110000000000110101011000111111000000110010101001010100010111100000101101001101000000010001001000110111111100000100010101111100101111101101111101100011111001111100110010011001101000111111000000000100001100000101110000000100010000110111110101000010101000100001100000010001111100110101001110011010001000110010100010010011100010110001010100010010101010100000001110010001001110000011000001000110110100001001011010
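The sketch below computes \(Z_k\) as a symmetric-difference count, together with the Jaccard distance it is the numerator of, \(|A\,\triangle\,B|/|A\cup B|\). The function names are illustrative, and `subwords` is redefined so that the block runs on its own.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by |P_x(k) union P_y(k)| (0.0 if both sets are empty)."""
    union = subwords(x, k) | subwords(y, k)
    return z(x, y, k) / len(union) if union else 0.0


if __name__ == "__main__":
    print(z("ABCD", "BCDE", 2), jaccard_distance("ABCD", "BCDE", 2))  # 2 0.5
```

With the tabulated values, \(|P_{f(5KDU_1)}(2)\cap P_{f(8BCU_1)}(2)|=261-170=91\). The union therefore has \(261+126-91=296\) elements, and the Jaccard distance is \(205/296\approx 0.69\).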
Pair \(Z_2\) Length of longest common substring (contiguous subword)
5KDU_1,8BCU_1 205 4
5KDU_1,1TRB_1 166 4
8BCU_1,1TRB_1 173 3
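The last column is most naturally read as the length of a longest common contiguous subword. The sketch below computes it with the standard \(O(|x||y|)\) dynamic program and keeps one row at a time. The function name is illustrative. A longest common subsequence (letters not necessarily adjacent) would be much longer: for 5KDU_1 and 8BCU_1 it has length at least 13, because E occurs 32 and 13 times in them.

```python
def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(longest_common_substring("ABCDEF", "XBCDY"))  # 3 ("BCD")
```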

Newick tree

 
(
	8BCU_1:98.46,
	(
		5KDU_1:83,1TRB_1:83
	):15.46
);

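Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\,\mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)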
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{691}{\log_{20} 691}-\frac{161}{\log_{20}161}\right)\approx 151\), where \(691=530+161\) is the combined length of 5KDU_1 and 8BCU_1.
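The arithmetic of this estimate can be reproduced in a few lines. The helper `scaled` is an illustrative name for \(n/\log_{20} n\), the normalizer also used in the ratio column of the LZ table.

```python
import math


def scaled(n: int) -> float:
    """n / log_20 n."""
    return n / math.log(n, 20)


estimate = 0.85 * 0.8 * (scaled(530 + 161) - scaled(161))
print(round(estimate, 2))
```

The printed value is about 150.75. That is in line with, though somewhat below, the observed \(d=193\) for this pair.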
Status Protein 1 Protein 2 \(d\) \(d_1/2\)
Query variables 5KDU_1 8BCU_1 193 126
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
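The interpolation \(\delta\) can be sketched in Python. This assumes Otu and Sayood's definitions: \(d\) is the larger of the two terms \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), and \(d_1\) is their sum. Because min plus max equals the sum, \(\alpha=1/2\) yields \(d_1/2\). `lz76` is the same assumed exhaustive-history parse as in the LZ table sketch, so values on real sequences may not reproduce the tabulated 193 and 126. All function names are illustrative.

```python
def lz76(s: str) -> int:
    """Assumed LZ76 exhaustive-history component count (last partial component counted)."""
    i, c, n = 0, 0, len(s)
    while i < n:
        length = 1
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        c += 1
        i += length
    return c


def terms(x: str, y: str) -> tuple[int, int]:
    """The two Otu-Sayood terms LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)


def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two terms."""
    a, b = terms(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


def d(x: str, y: str) -> int:
    return max(terms(x, y))


def d1(x: str, y: str) -> int:
    return sum(terms(x, y))


if __name__ == "__main__":
    x, y = "0001101001000101", "0110"
    print(d(x, y), delta(x, y, 0), d1(x, y), delta(x, y, 0.5))
```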