CoV2D Browser™

Parikh vectors
7PTM_1 5NFH_1 6UFW_1 Letter Amino acid
22 14 3 H Histidine
7 24 6 F Phenylalanine
14 29 4 R Arginine
18 34 10 D Aspartic acid
6 9 10 Q Glutamine
9 7 2 M Methionine
10 30 9 S Serine
5 11 3 W Tryptophan
20 36 9 A Alanine
2 5 2 C Cysteine
42 37 14 G Glycine
12 48 10 L Leucine
16 29 4 P Proline
21 29 13 T Threonine
11 23 8 Y Tyrosine
21 49 5 V Valine
10 18 14 N Asparagine
16 39 2 E Glutamic acid
14 30 7 I Isoleucine
8 35 14 K Lysine
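A Parikh vector records how many times each letter occurs in a word, ignoring order. The following is a minimal Python sketch, not the page's own code; the names `AMINO_ACIDS`, `parikh_vector` and `SEQ_6UFW_1` are illustrative. It reproduces the 6UFW_1 column above from the 6UFW sequence listed further down the page, using the table's row order.

```python
from collections import Counter

# Row order of the Parikh-vector table above (one-letter amino-acid codes).
AMINO_ACIDS = "HFRDQMSWACGLPTYVNEIK"


def parikh_vector(seq: str) -> list[int]:
    """Count how often each letter of AMINO_ACIDS occurs in seq."""
    counts = Counter(seq)
    return [counts[a] for a in AMINO_ACIDS]


# The 6UFW_1 sequence from the LZ-complexity table, split for readability.
SEQ_6UFW_1 = (
    "MASISVQYRAGDGSMNSNQIRPQLQIKNNGNTTVDLKDVTARYWYKAKNKGQNFDCDYAQ"
    "IGCGNVTHKFVTLHKPKQGADTYLELGFKNGTLAPGASTGNIQLRLHNDDWSNYAQSGDY"
    "SFFKSNTFKTTKKITLYDQGKLIWGTEPN"
)

if __name__ == "__main__":
    vec = parikh_vector(SEQ_6UFW_1)
    print(vec)       # the 6UFW_1 column of the table
    print(sum(vec))  # 149, the sequence length
```

Since every column sums to its sequence length (284, 536 and 149), each sequence uses only the 20 standard letters.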

>7PTM_1|Chains A, B, C, D, E, F|Two-domain laccase|Streptomyces griseoflavus (35619)
>5NFH_1|Chains A, B|Methionyl-tRNA synthetase, putative|Trypanosoma brucei brucei (185431)
>6UFW_1|Chain A|Endoglucanase|Bacillus subtilis (strain 168) (224308)
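Each header above has four `|`-separated fields: entity code, chains, molecule name and source organism, with a trailing number in parentheses that appears to be a taxonomy identifier. A small parsing sketch follows; the function name and field names are hypothetical, and reading the trailing number as a taxonomy ID is an assumption.

```python
import re


def parse_pdb_fasta_header(header: str) -> dict:
    """Split a header like the ones above into its four '|'-separated fields.

    Field names are illustrative. The trailing parenthesised number of the
    organism field is read as a taxonomy identifier (an assumption).
    """
    entity, chains, molecule, organism = header.lstrip(">").split("|")
    match = re.search(r"\((\d+)\)\s*$", organism)
    return {
        "entity": entity,
        "chains": chains,
        "molecule": molecule,
        "organism": organism,
        "taxon_id": int(match.group(1)) if match else None,
    }


if __name__ == "__main__":
    h = ">6UFW_1|Chain A|Endoglucanase|Bacillus subtilis (strain 168) (224308)"
    print(parse_pdb_fasta_header(h)["taxon_id"])  # 224308
```

The unpacking raises `ValueError` if a header does not have exactly four fields, which is a deliberate strictness rather than a property of the format.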
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7PTM, Knot 124 284 0.82 20 182 269
AAGAAPAGGEVRRVTMYAERLAGGQMGYGLEKGKASIPGPLIELNEGDTLHVEFENTMDVPVSLHVHGLDYEISSDGTKQNKSHVEPGGTRTYTWRTHEPGRRADGTWRAGSAGYWHYHDHVVGTEHGTGGIRNGLYGPVIVRRKGDVLPDATHTIVFNDGTINNRPAHTGPNFEATVGDRVEIVMITHGEYYHTFHMHGHHWADNRTGMLTGPDDPSQVIDNKICGPADSFGFQIIAGEGVGAGAWMYHCHVQSHSDMGMVGLFLVKKPDGTIPGYDPQEHAH
5NFH, Knot 218 536 0.85 20 254 501
GPGSMKVEKVFFVTSPIYYVNAAPHIGHVYSTLITDVIGRYHRVKGERVFALTGTDEHGQKVAEAAKQKQVSPYDFTTAVAGEFKKCFEQMDYSIDYFIRTTNEQHKAVVKELWTKLEQKGDIYLGRYEGWYSISDESFLTPQNITDGVDKDGNPCKVSLESGHVVTWVSEENYMFRLSAFRERLLEWYHANPGCIVPEFRRREVIRAVEKGLPDLSVSRARATLHNWAIPVPGNPDHCVYVWLDALTNYLTGSRLRVDESGKEVSLVDDFNELERFPADVHVIGKDILKFHAIYWPAFLLSAGLPLPKKIVAHGWWTKDRKKISKSLGNVFDPVEKAEEFGYDALKYFLLRESGFSDDGDYSDKNMIARLNGELADTLGNLVMRCTSAKINVNGEWPSPAAYTEEDESLIQLIKDLPGTADHYYLIPDIQKAIIAVFDVLRAINAYVTDMAPWKLVKTDPERLRTVLYITLEGVRVTTLLLSPILPRKSVVIFDMLGVPEVHRKGIENFEFGAVPPGTRLGPAVEGEVLFSKRST
6UFW, Knot 73 149 0.81 20 120 146
MASISVQYRAGDGSMNSNQIRPQLQIKNNGNTTVDLKDVTARYWYKAKNKGQNFDCDYAQIGCGNVTHKFVTLHKPKQGADTYLELGFKNGTLAPGASTGNIQLRLHNDDWSNYAQSGDYSFFKSNTFKTTKKITLYDQGKLIWGTEPN
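A sketch of the two complexity columns, assuming \(\mathrm{LZ}(w)\) is the phrase count of the Lempel-Ziv (1976) exhaustive-history parsing; the page does not say which implementation it uses, so values on the real sequences may differ. The function names are hypothetical. The ratio column appears to be truncated rather than rounded: for 6UFW, \(73/(149/\log_{20}149)\approx 0.818\), listed as 0.81.

```python
import math


def lz76_complexity(w: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) exhaustive-history parsing of w.

    Each phrase is the longest prefix of the remaining text that can be copied
    from an earlier start position (overlap allowed), plus one new symbol.
    Simple rather than fast: every step rescans the history.
    """
    n, i, phrases = len(w), 0, 0
    while i < n:
        length = 1
        # Grow the candidate while it still occurs in the history w[:i+length-1].
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_k n), truncated to two decimals (truncation is an
    assumption that matches the tabulated values)."""
    ratio = lz / (n / math.log(n, alphabet_size))
    return math.floor(ratio * 100) / 100


if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))  # 6 phrases: 0|001|10|100|1000|101
    print(normalized_lz(73, 149))               # 0.81
```

The test string is the classic Lempel-Ziv example; \(n/\log_{20} n\) is the order of growth of LZ complexity for a random word over 20 letters, which is why the ratio sits near 1.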

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
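The definitions of \(P_w(n)\) and \(p_w(n)\) translate directly into set comprehensions. A minimal sketch with hypothetical function names:

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct subwords (contiguous intervals) of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """p_w(k) = |P_w(k)|."""
    return len(subwords(w, k))


if __name__ == "__main__":
    # "abaab": P(1) = {a, b}, P(2) = {ab, ba, aa}, P(3) = {aba, baa, aab}
    print([subword_complexity("abaab", k) for k in (1, 2, 3)])  # [2, 3, 3]
```

Because a word of length \(n\) has only \(n-k+1\) intervals of length \(k\), and there are 20 amino acids, \(p_w(k)\le\min(20^k, n-k+1)\); in particular \(p_w(1)\le 20\).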

\(|P_{f(7PTM_1)}(2) \setminus P_{f(5NFH_1)}(2)|=51\) and \(|P_{f(5NFH_1)}(2) \setminus P_{f(7PTM_1)}(2)|=123\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the numerator of the Jaccard distance between \(P_x(k)\) and \(P_y(k)\), an LZ76-style (set-of-subwords) measure. For example, \(Z_2(f(7PTM_1),f(5NFH_1))=51+123=174\).

Hydrophobic-polar version of Sequence 1 (7PTM_1):
11111111101001010100111101101100101011111101001001010100010111010101100010001000000010111000001000011001010101101101000001110001011100110111110001011101000111001010001100110101011001011110010000010101001100001110110010011000101110011101111011111111000010000011111111100101011100100010
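A sketch of \(Z_k\), the corresponding Jaccard distance, and a hydrophobic-polar (HP) encoding. The page does not state which residues count as hydrophobic; the set {A, C, F, G, I, L, M, P, V, W} used below is an assumption that reproduces the start of the string above, and the treatment of C in particular is a guess. Function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k intervals of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by |P_x(k) union P_y(k)| (0.0 if both sets are empty)."""
    union = subwords(x, k) | subwords(y, k)
    return z_k(x, y, k) / len(union) if union else 0.0


# Assumed hydrophobic residues map to 1; every other letter is polar (0).
HYDROPHOBIC = frozenset("ACFGILMPVW")


def hp_encode(seq: str) -> str:
    return "".join("1" if a in HYDROPHOBIC else "0" for a in seq)


if __name__ == "__main__":
    print(z_k("abaab", "abba", 2))               # 2: {aa} vs {bb}
    print(jaccard_distance("abaab", "abba", 2))  # 0.5 = 2 / |{ab, ba, aa, bb}|
    print(hp_encode("AAGAAPAGGEVRRVTM"))         # 1111111110100101
```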
Pair \(Z_2\) Length of longest common subword (interval)
7PTM_1,5NFH_1 174 3
7PTM_1,6UFW_1 174 3
5NFH_1,6UFW_1 202 4
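The lengths 3 and 4 only make sense for contiguous subwords: any letter occurring in both sequences gives a common (non-contiguous) subsequence as long as the smaller of its two counts, and by the Parikh table G occurs 42 times in 7PTM_1 and 37 times in 5NFH_1, so their longest common subsequence has length at least 37. A standard dynamic-programming sketch for the longest common subword (hypothetical function name):

```python
def longest_common_subword_length(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y.

    cur[j] holds the length of the longest common suffix of x[:i] and y[:j];
    the answer is the largest such value. O(|x| * |y|) time, O(|y|) memory.
    """
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(longest_common_subword_length("abcde", "xbcdy"))  # 3 ("bcd")
```

If the longest common subword has length \(L\), the two sequences share no subword of length \(k>L\), so \(Z_k(x,y)=p_x(k)+p_y(k)\) for every such \(k\).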

Newick tree

 
(
	6UFW_1:96.55,
	(
		7PTM_1:87,5NFH_1:87
	):9.55
);
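Reading the branch lengths in the usual Newick way (the number after a colon is the length of the edge to the parent node), the root-to-leaf depths and leaf-to-leaf path lengths work out as follows. Both depths equal 96.55, so the tree is ultrametric, and the 7PTM_1 to 5NFH_1 path length equals their \(Z_2\) value of 174.

\[
\begin{aligned}
\operatorname{depth}(\mathrm{6UFW}) &= 96.55,\\
\operatorname{depth}(\mathrm{7PTM}) = \operatorname{depth}(\mathrm{5NFH}) &= 9.55 + 87 = 96.55,\\
D(\mathrm{7PTM},\mathrm{5NFH}) &= 87 + 87 = 174,\\
D(\mathrm{6UFW},\mathrm{7PTM}) = D(\mathrm{6UFW},\mathrm{5NFH}) &= 96.55 + 9.55 + 87 = 193.1.
\end{aligned}
\]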

Let \(d\) and \(d_1\) denote the Otu--Sayood distances of those names.
(The distance \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
A rough estimate of the expected distance, where \(820=284+536\) is the combined length of the two sequences, is \((0.85)(0.8)\left(\frac{820}{\log_{20} 820}-\frac{284}{\log_{20}284}\right)\approx 146.\)
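The estimate can be re-evaluated directly. This sketch only repeats the arithmetic of the formula above, treating 0.85 and 0.8 as given constants (the page does not explain them); `lz_scale` is a hypothetical helper name.

```python
import math


def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_k(n), the normalizer used in the LZ-complexity table."""
    return n / math.log(n, alphabet_size)


# Lengths from the table: 7PTM_1 has 284 residues, 5NFH_1 has 536.
n_7ptm, n_5nfh = 284, 536
estimate = 0.85 * 0.8 * (lz_scale(n_7ptm + n_5nfh) - lz_scale(n_7ptm))

if __name__ == "__main__":
    print(f"{estimate:.1f}")  # roughly 146.6
```

The exact value is about 146.56, which the text reports as 146.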
Status Protein 1 Protein 2 \(d\) \(d_1/2\)
Query variables 7PTM_1 5NFH_1 185 140
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
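A sketch of \(d\), \(d_1\) and \(\delta\), assuming the definitions usually attributed to Otu and Sayood: \(c\) is LZ76 complexity, \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\), \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\), and the min and max in \(\delta\) are taken over those two increments. Under these assumptions \(\delta\) reduces to \(d\) at \(\alpha=0\) and to \(d_1/2\) at \(\alpha=1/2\), as in the display above. The LZ routine is a simple phrase-counting parser, not necessarily the one behind the page's values 185 and 140; all function names are hypothetical.

```python
def lz76_complexity(w: str) -> int:
    """Phrase count of the Lempel-Ziv (1976) exhaustive-history parsing."""
    n, i, phrases = len(w), 0, 0
    while i < n:
        length = 1
        # Extend while the candidate phrase can be copied from the history.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def increments(s: str, q: str) -> tuple[int, int]:
    """(c(SQ) - c(S), c(QS) - c(Q)) with c = LZ76 complexity."""
    c = lz76_complexity
    return c(s + q) - c(s), c(q + s) - c(q)


def otu_sayood_d(s: str, q: str) -> int:
    return max(increments(s, q))


def otu_sayood_d1(s: str, q: str) -> int:
    return sum(increments(s, q))


def delta(s: str, q: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two increments."""
    low, high = sorted(increments(s, q))
    return alpha * low + (1 - alpha) * high


if __name__ == "__main__":
    s, q = "aaaa", "b"
    print(increments(s, q))                             # (0, 2)
    print(otu_sayood_d(s, q), otu_sayood_d1(s, q) / 2)  # 2 1.0
    print(delta(s, q, 0), delta(s, q, 0.5))             # 2 1.0
```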