CoV2D Browser™

Parikh vectors
5KVW_1 3WYR_1 1RGS_1 Letter Amino acid
16 11 23 A Alanine
4 4 9 Q Glutamine
6 9 26 E Glutamic acid
9 13 23 L Leucine
14 20 19 S Serine
16 5 2 C Cysteine
0 16 1 H Histidine
8 6 17 I Isoleucine
3 4 3 W Tryptophan
8 8 10 Y Tyrosine
10 14 26 V Valine
10 6 10 N Asparagine
24 17 21 G Glycine
11 5 16 K Lysine
12 21 10 P Proline
20 14 10 T Threonine
12 11 23 R Arginine
12 10 19 D Aspartic acid
1 1 6 M Methionine
11 11 14 F Phenylalanine
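
A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch of how the columns above could be recomputed; the names `ALPHABET` and `parikh_vector` are illustrative and not taken from the site's code.

```python
from collections import Counter

# One-letter amino acid codes, in the row order of the table above.
ALPHABET = "AQELSCHIWYVNGKPTRDMF"


def parikh_vector(w, alphabet=ALPHABET):
    """Return the number of occurrences of each letter of `alphabet` in `w`."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]


# First 12 residues of the 3WYR_1 sequence, used here as a short example.
example = "HHHHHHDDDDKH"
for letter, count in zip(ALPHABET, parikh_vector(example)):
    if count:
        print(letter, count)
```

Applied to a full FASTA sequence, the same function should yield one column of the table.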

>5KVW_1|Chain A|Thaumatin-1|Thaumatococcus daniellii (4621)
>3WYR_1|Chains A, B|Killer cell immunoglobulin-like receptor 2DL4|Homo sapiens (9606)
>1RGS_1|Chain A|CAMP DEPENDENT PROTEIN KINASE|Bos taurus (9913)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5KVW , Knot 99 207 0.85 38 150 203
ATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTNGGKIWARTDCYFDDSGSGICKTGDCGGLLRCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMNFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTA
3WYR , Knot 93 206 0.80 40 140 195
HHHHHHDDDDKHVGGQDKPFCSAWPSAVVPQGGHVTLRCHYRRGFNIFTLYKKDGVPVPELYNRIFWNSFLISPVTPAHAGTYRCRGFHPHSPTEWSAPSNPLVIMVTGLYEKPSLTARPGPTVRTGENVTLSCSSQSSFDIYHLSREGEAHELRLPAVPSINGTFQADFPLGPATHGETYRCFGSFHGSPYEWSDASDPLPVSVT
1RGS , Knot 122 288 0.80 40 177 271
RRRRGAISAEVYTEEDAASYVRKVIPKDYKTMAALAKAIEKNVLFSHLDDNERSDIFDAMFPVSFIAGETVIQQGDEGDNFYVIDQGEMDVYVNNEWATSVGEGGSFGELALIYGTPRAATVKAKTNVKLWGIDRDSYRRILMGSTLRKRKMYEEFLSKVSILESLDKWERLTVADALEPVQFEDGQKIVVQGEPGDEFFIILEGSAAVLQRRSENEEFVEVGRLGPSDYFGEIALLMNRPRAATVVARGPLKCVKLDRPRFERVLGPCSDILKRNIQQYNSFVSLSV
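
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be checked directly from the reported LZ-complexity and length. The sketch below also includes a simple quadratic-time LZ76 (exhaustive history) parser; that the page computes \(\mathrm{LZ}(w)\) with exactly this variant is an assumption, and the function names are illustrative.

```python
import math


def lz76(w):
    """Number of components in the LZ76 exhaustive-history parsing of w.

    Each component is the shortest prefix of the remaining text that does
    not already occur in the text preceding its last symbol.
    """
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the component while it can still be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_alphabet_size(n)."""
    return lz / (n / math.log(n, alphabet_size))


# Values (LZ-complexity, length) as reported in the table above.
for code, lz, n in [("5KVW", 99, 207), ("3WYR", 93, 206), ("1RGS", 122, 288)]:
    print(code, round(normalized_lz(lz, n), 2))
```

The three ratios printed this way agree with the table to two decimal places.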

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
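
As an illustration of these definitions, the set of length-\(n\) subwords and its cardinality can be computed with a sliding window. This is a small sketch with hypothetical function names, not the site's implementation.

```python
def subwords(w, n):
    """Set of distinct contiguous subwords (intervals) of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w, n):
    """Number of distinct length-n subwords of w."""
    return len(subwords(w, n))


w = "ABAB"
print(sorted(subwords(w, 2)))     # ['AB', 'BA']
print(subword_complexity(w, 1))   # 2
```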

\(|P_{f(5KVW_1)}(2) \setminus P_{f(3WYR_1)}(2)|=90\), \(|P_{f(3WYR_1)}(2) \setminus P_{f(5KVW_1)}(2)|=80\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
101011000000111110010111011100100100101010110011011100000100010110001001111000011011001101010001000101001011011101010001001100110111001101011111000100110000000001001100000110010101100110010010011000001010101
Pair \(Z_2\) Length of longest common subword
5KVW_1,3WYR_1 170 3
5KVW_1,1RGS_1 163 3
3WYR_1,1RGS_1 175 3
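
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets, so the reported \(90+80=170\) is the \(Z_2\) entry for the pair 5KVW_1, 3WYR_1. A minimal sketch, with illustrative function names, might look like this:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x, y, k):
    """Z_k(x, y): subwords of length k occurring in exactly one of x, y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x, y, k):
    """Z_k divided by the size of the union of the two subword sets."""
    union = subwords(x, k) | subwords(y, k)
    return z(x, y, k) / len(union) if union else 0.0


print(z("ABAB", "ABBA", 2))  # 1, since only "BB" is not shared
```

Dividing by the size of the union, as in `jaccard_distance`, gives the usual Jaccard distance, which is why \(Z_k\) is described as its numerator.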

Newick tree

 
[
	3WYR_1:87.78,
	[
		5KVW_1:81.5,1RGS_1:81.5
	]:6.28
]

Let \(d\) be the Otu-Sayood distance \(d\).
Let \(d_1\) be the Otu-Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{413}{\log_{20} 413}-\frac{206}{\log_{20} 206}\right)\approx 60.9\), where \(413=207+206\) is the sum of the two sequence lengths.
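
The arithmetic in this estimate can be reproduced in a few lines. The sketch below only evaluates the stated formula; the helper name `capacity` is illustrative.

```python
import math


def capacity(n, alphabet_size=20):
    """n / log_alphabet_size(n), the normalizing term used in the table."""
    return n / math.log(n, alphabet_size)


ratio_x, ratio_y = 0.85, 0.80   # normalized LZ ratios of 5KVW_1 and 3WYR_1
len_x, len_y = 207, 206         # sequence lengths

expected = ratio_x * ratio_y * (capacity(len_x + len_y) - capacity(len_y))
print(round(expected, 1))  # 60.9
```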
Status Protein1 Protein2 d d1/2
Query variables 5KVW_1 3WYR_1 78 76.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
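
To make the interpolation concrete, the sketch below treats \(\delta\) as a weighted combination of the smaller and larger of two directed quantities. It assumes that \(d\) is their maximum and \(d_1\) their sum, which is how the case distinction above reads. The inputs 78 and 75 are inferred from the query row (\(d=78\), \(d_1/2=76.5\)) rather than reported by the page.

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    lo, hi = min(a, b), max(a, b)
    return alpha * lo + (1 - alpha) * hi


# Hypothetical directed quantities consistent with d = 78 and d1/2 = 76.5.
a, b = 78, 75

print(delta(a, b, 0))    # 78, the distance d
print(delta(a, b, 0.5))  # 76.5, the distance d1 / 2
```

Intermediate values of \(\alpha\) move continuously between the two distances.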