CoV2D Browser™

Parikh vectors
7QRD_1 6ZTC_1 7BBJ_1 Letter Amino acid
18 15 41 D Aspartic acid
23 8 48 G Glycine
25 13 28 A Alanine
3 3 8 C Cysteine
20 9 16 Q Glutamine
19 14 34 I Isoleucine
33 23 49 L Leucine
17 12 33 K Lysine
12 6 12 M Methionine
24 8 21 F Phenylalanine
14 10 21 R Arginine
5 6 5 W Tryptophan
18 9 19 Y Tyrosine
18 10 26 P Proline
12 5 19 H Histidine
20 12 29 E Glutamic acid
32 5 39 S Serine
25 14 24 T Threonine
26 9 46 V Valine
14 9 28 N Asparagine
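
Each column of the table above is the Parikh vector of one sequence, that is, the count of each amino acid letter. A minimal sketch of that count follows; the function name `parikh_vector` and the constant `ALPHABET` are illustrative names, not taken from the CoV2D code, and the letter order simply follows the rows of the table.

```python
from collections import Counter

# Letter order of the table rows above (one-letter amino acid codes).
ALPHABET = "DGACQILKMFRWYPHESTVN"

def parikh_vector(seq, alphabet=ALPHABET):
    """Return the number of occurrences in `seq` of each letter of `alphabet`."""
    counts = Counter(seq)
    # Counter returns 0 for letters that never occur.
    return [counts[a] for a in alphabet]

# Toy example on a short word (not one of the sequences on this page).
print(parikh_vector("GAGA", alphabet="AG"))  # [2, 2]
```

The entries of a Parikh vector over the full alphabet sum to the sequence length, which is a quick consistency check against the Length column further down.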

>7QRD_1|Chains A, B, C, D|Histone-arginine methyltransferase CARM1|Mus musculus (10090)
>6ZTC_1|Chains A, B|Hematopoietic prostaglandin D synthase|Homo sapiens (9606)
>7BBJ_1|Chains A, B|5'-nucleotidase|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7QRD 166 378 0.87 40 221 372
GHTLERSVFSERTEESSAVQYFQFYGYLSQQQNMMQDYVRTGTYQRAILQNHTDFKDKIVLDVGCGSGILSFFAAQAGARKIYAVEASTMAQHAEVLVKSNNLTDRIVVIPGKVEEVSLPEQVDIIISEPMGYMLFNERMLESYLHAKKYLKPSGNMFPTIGDVHLAPFTDEQLYMEQFTKANFWYQPSFHGVDLSALRGAAVDEYFRQPVVDTFDIRILMAKSVKYTVNFLEAKEGDLHRIEIPFKFHMLHSGLVHGLAFWFDVAFIGSIMTVWLSTAPTEPLTHWYQVRCLFQSPLFAKAGDTLSGTCLLIANKRQSYDISIVAQVDQTGSKSSNLLDLKNPFFRYTGTTPSPPPGSHYTSPSENMWNTGSTYNLS
6ZTC 99 200 0.87 40 152 197
GMPNYKLTYFNMRGRAEIIRYIFAYLDIQYEDHRIEQADWPEIKSTLPFGKIPILEVDGLTLHQSLAIARYLTKNTDLAGNTEMEQCHVDAIVDTLDDFMSCFPWAEKKQDVKEQMFNELLTYNAPHLMQDLDTYLGGREWLIGNSVTWADFYWEICSTTLLVFKPDLLDNHPRLVTLRKKVQAIPAVANWIKRRPQTKL
7BBJ 223 546 0.85 40 263 511
MAHHHHHHVGTGSNDDDDKSPDPWELTILHTNDVHSRLEQTSEDSSKCVDASRCMGGVARLFTKVQQIRRAEPNVLLLDAGDQYQGTIWFTVYKGAEVAHFMNALRYDAMALGNHEFDNGVEGLIEPLLKEAKFPILSANISASGPLASQISGLYLPYKVLPVGDEVVGIVGYTSKETPFLSNPGTNLVFEDEITALQPEVDKLKTLNVNKIIALGHSGFEMDKLIAQKVRGVDVVVGGHSNTFLYTGNPPSKEVPAGKYPFIVTSDDGRKVPVVQAYAFGKYLGYLKIEFDERGNVISSHGNPILLDSSIPEDPSIKADINKWRIKLDDYSTQELGKTIVYLDGSSQSCRFRECNMGNLICDAMINNNLRHADEMFWNHVSMCILNGGGIRSPIDERNDGTITWENLAAVLPFGGTFDLVQLKGSTLKKAFEHSVHRYGQSTGEFLQVGGIHVVYDLSRKPGDRVVKLDVLCTSCRVPSYDPLKMDEVYKVILPNFLANGGDGFQMIKDELLRHDSGDQDINVVSTYISKMKVIYPAVEGRIKFS
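
The LZ-complexity and normalized columns above can be illustrated with a short sketch. It assumes the exhaustive-history parsing of Lempel and Ziv (1976): each new component is the shortest prefix of the remaining text that has not occurred earlier, overlaps allowed. Whether the page uses exactly this variant is an assumption, and the names `lz76` and `normalized_lz` are illustrative.

```python
import math

def lz76(w):
    """Count the components of the exhaustive-history (LZ76) parsing of w."""
    n, i, count = len(w), 0, 0
    while i < n:
        length = 1
        # Extend the candidate while it already occurs starting at an earlier
        # position (the earlier copy may overlap the candidate itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_k n) for alphabet size k (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))

# Binary example: parses as 0 | 001 | 10 | 100 | 1000 | 101.
print(lz76("0001101001000101"))            # 6
# Values from the 7QRD row of the table: LZ(w) = 166, n = 378.
print(round(normalized_lz(166, 378), 2))   # 0.87
```

Applying the same formula to the other two rows gives roughly 0.875 and 0.859, so the 0.87 and 0.85 shown there look truncated rather than rounded.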

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
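
A minimal sketch of these definitions, reading a subword as a contiguous block of letters; the names `subwords` and `subword_complexity` are illustrative only.

```python
def subwords(w, k):
    """Return P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return p_w(k), the number of distinct subwords of length k in w."""
    return len(subwords(w, k))

# Toy example: the length-2 subwords of ABAB are AB and BA.
print(sorted(subwords("ABAB", 2)))    # ['AB', 'BA']
print(subword_complexity("ABAB", 2))  # 2
```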

\(|P_{f(7QRD_1)}(2) \setminus P_{f(6ZTC_1)}(2)|=132\), \(|P_{f(6ZTC_1)}(2) \setminus P_{f(7QRD_1)}(2)|=63\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), that is, the numerator of an LZ76-style (set of subwords) Jaccard distance for \(P(k)\). Here \(Z_2(f(7QRD_1),f(6ZTC_1))=132+63=195\).

Hydrophobic-polar version of Sequence 1 (7QRD_1):
100100011000000001100101010100000110001001000011100000100011101101011101111011100101101001100101110000100011111101001011001011100111011100011000101000101010111011010111100001010010010110010101101011011110001001110010101111001000101101001010010111010110011101111110111110110111001100110010010011001111011001010011110000000101110100010000011010011100010010111100000100011001000010
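
The hydrophobic-polar string can be reproduced, at least on the stretch checked by hand, by writing 1 for the residues G, A, V, L, I, M, F, P, W and 0 for all others. That residue set is inferred here by aligning the 7QRD_1 sequence with the binary string; it is not stated on the page, so treat it as an assumption. The names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Assumed hydrophobic set, inferred from the data on this page (not documented).
HYDROPHOBIC = set("GAVLIMFPW")

def hp_encode(seq):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in seq)

# First ten residues of the 7QRD_1 sequence listed above.
print(hp_encode("GHTLERSVFS"))  # 1001000110
```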
Pair \(Z_2\) Length of longest common subword
7QRD_1,6ZTC_1 195 4
7QRD_1,7BBJ_1 154 3
6ZTC_1,7BBJ_1 215 4
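
Both columns of this table can be sketched in a few lines. The last column is read here as the length of the longest common contiguous subword, since values of 3 and 4 are far too small for a gapped subsequence of sequences this long; that reading is an assumption. The function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy example: only BB occurs in one word and not the other.
print(z("ABAB", "ABBA", 2))                        # 1
print(longest_common_subword("GATTACA", "ATTAC"))  # 5
```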

Newick tree

 
(
	6ZTC_1:10.84,
	(
		7QRD_1:77,7BBJ_1:77
	):32.84
);

Let d be the Otu–Sayood distance \(d\).
Let d1 be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{578 }{\log_{20} 578}-\frac{200}{\log_{20}200})\approx 108\), where \(578=378+200\) is the combined length of the two sequences.
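
The arithmetic behind this estimate can be checked directly. The factors 0.85 and 0.8 are taken from the formula as given, and the variable names are illustrative.

```python
import math

def log20(x):
    """Logarithm to base 20, the size of the amino acid alphabet."""
    return math.log(x, 20)

n_concat = 578   # 378 + 200, the two sequence lengths from the table above
n_single = 200

expected = 0.85 * 0.8 * (n_concat / log20(n_concat) - n_single / log20(n_single))
print(round(expected))  # 108
```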
Status Protein1 Protein2 d d1/2
Query variables 7QRD_1 6ZTC_1 143 107.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
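
To see why the two cases come out this way, assume the usual Otu–Sayood definitions in terms of the LZ-complexity \(c\) of concatenations (the page does not restate them, so the symbols \(u\) and \(v\) below are introduced only for illustration):
\[ u=c(xy)-c(x),\qquad v=c(yx)-c(y),\qquad d=\max(u,v),\qquad d_1=u+v. \]
Taking the min and max in \(\delta\) over \(u\) and \(v\), the case \(\alpha=0\) is \(\max(u,v)=d\), and
\[ \alpha=\tfrac12:\qquad \tfrac12\min(u,v)+\tfrac12\max(u,v)=\frac{u+v}{2}=\frac{d_1}{2}. \]
Under the same assumption, the query row above with \(d=143\) and \(d_1/2=107.5\) corresponds to \(\max(u,v)=143\) and \(\min(u,v)=2(107.5)-143=72\).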