CoV2D Browser™

Parikh vectors
6JGQ_1 8UVG_1 8PGL_1 Letter Amino acid
7 17 3 W Tryptophan
56 60 32 A Alanine
28 24 10 R Arginine
39 15 13 D Aspartic acid
63 37 24 G Glycine
12 6 11 H Histidine
18 21 3 M Methionine
23 23 9 N Asparagine
19 14 5 Q Glutamine
32 41 15 P Proline
52 38 30 V Valine
48 90 25 L Leucine
33 22 6 K Lysine
20 45 4 F Phenylalanine
30 36 23 S Serine
8 7 1 C Cysteine
24 27 18 E Glutamic acid
33 37 11 I Isoleucine
42 39 14 T Threonine
22 8 9 Y Tyrosine
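
Each column of the table above is the Parikh vector of one sequence, that is, the count of every amino-acid letter in it. A minimal sketch of that count, assuming the 20 standard one-letter codes; the name `parikh_vector` is illustrative and does not come from the CoV2D site:

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter in seq."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

# First ten residues of the 8PGL_1 sequence listed on this page.
example = parikh_vector("MLKVISSLLV")
print(example["L"], example["S"], example["V"])
```

The entries of a Parikh vector sum to the sequence length; for instance, the 8PGL_1 column above sums to 266.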

>6JGQ_1|Chain A|BETA-D-GLUCAN GLUCOHYDROLASE ISOENZYME EXO1|Hordeum vulgare subsp. vulgare (112509)
>8UVG_1|Chains A, B|Na(+)/dicarboxylate cotransporter 3|Homo sapiens (9606)
>8PGL_1|Chain A|Beta-lactamase VIM-1|Pseudomonas aeruginosa (287)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6JGQ , Knot 242 609 0.85 40 275 569
HHAADYVLYKDATKPVEDRVADLLGRMTLAEKIGQMTQIERLVATPDVLRDNFIGSLLSGGGSVPRKGATAKEWQDMVDGFQKACMSTRLGIPMIYGIDAVHGQNNVYGATIFPHNVGLGATRDPYLVKRIGEATALEVRATGIQYAFAPCIAVCRDPRWGRCYESYSEDRRIVQSMTELIPGLQGDVPKDFTSGMPFVAGKNKVAACAKHFVGDGGTVDGINENNTIINREGLMNIHMPAYKNAMDKGVSTVMISYSSWNGVKMHANQDLVTGYLKDTLKFKGFVISDWEGIDRITTPAGSDYSYSVKASILAGLDMIMVPNKYQQFISILTGHVNGGVIPMSRIDDAVTRILRVKFTMGLFENPYADPAMAEQLGKQEHRDLAREAARKSLVLLKNGKTSTDAPLLPLPKKAPKILVAGSHADNLGYQCGGWTIEYQGDTGRTTVGTTILEAVKAAVDPSTVVVFAENPDAEFVKSGGFSYAIVAVGEHPYTETKGDNLNLTIPEPGLSTVQAVCGGVRCATVLISGRPVVVQPLLAASDALVAAWLPGSEGQGVTDALFGDFGFTGRLPRTWFKSVDQLPMNVGDAHYDPLFRLGYGLTTNATKKY
8UVG , Knot 231 607 0.81 40 250 545
MAALAAAAKKVWSARRLLVLLFTPLALLPVVFALPPKEGRCLFVILLMAVYWCTEALPLSVTALLPIVLFPFMGILPSNKVCPQYFLDTNFLFLSGLIMASAIEEWNLHRRIALKILMLVGVQPARLILGMMVTTSFLSMWLSNTASTAMMLPIANAILKSLFGQKEVRKDPSQESEENTAAVRRNGLHTVPTEMQFLASTEAKDHPGETEVPLDLPADSRKEDEYRRNIWKGFLISIPYSASIGGTATLTGTAPNLILLGQLKSFFPQCDVVNFGSWFIFAFPLMLLFLLAGWLWISFLYGGLSFRGWRKNKSEIRTNAEDRARAVIREEYQNLGPIKFAEQAVFILFCMFAILLFTRDPKFIPGWASLFNPGFLSDAVTGVAIVTILFFFPSQRPSLKWWFDFKAPNTETEPLLTWKKAQETVPWNIILLLGGGFAMAKGCEESGLSVWIGGQLHPLENVPPALAVLLITVVIAFFTEFASNTATIIIFLPVLAELAIRLRVHPLYLMIPGTVGCSFAFMLPVSTPPNSIAFASGHLLVKDMVRTGLLMNLMGVLLLSLAMNTWAQTIFQLGTFPDWADMYSVNVTALPPTLANDTFRTLSGAGA
8PGL , Knot 117 266 0.81 40 152 253
MLKVISSLLVYMTASVMAVASPLAHSGEPSGEYPTVNEIPVGEVRLYQIADGVWSHIATQSFDGAVYPSNGLIVRDGDELLLIDTAWGAKNTAALLAEIEKQIGLPVTRAVSTHFHDDRVGGVDVLRAAGVATYASPSTRRLAEAEGNEIPTHSLEGLSSSGDAVRFGPVELFYPGAAHSTDNLVVYVPSANVLYGGCAVHELSSTSAGNVADADLAEWPTSVERIQKHYPEAEVVIPGHGLPGGLDLLQHTANVVKAHKNRSVAE
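
The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below assumes an LZ76 exhaustive-history parsing; the page does not state which Lempel-Ziv variant it uses, so phrase counts from this code may differ from the table. The names `lz76_phrases` and `normalized_lz` are illustrative.

```python
import math

def lz76_phrases(w: str) -> list[str]:
    """Parse w into LZ76 phrases: each phrase is the shortest prefix of the
    remaining text that cannot be copied from an earlier starting position
    (overlapping copies are allowed)."""
    phrases = []
    i, n = 0, len(w)
    while i < n:
        length = 1
        # Extend while w[i:i+length] already occurs inside w[0:i+length-1].
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases.append(w[i:i + length])  # the slice is clipped at the end of w
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet: int = 20) -> float:
    """Return LZ(w) / (n / log_alphabet(n))."""
    return lz / (n / math.log(n, alphabet))

# Values reported in the table above for 6JGQ: LZ(w) = 242, n = 609.
print(round(normalized_lz(242, 609), 3))
print(lz76_phrases("AAAAAA"))
```

With the reported complexities and lengths, `normalized_lz` agrees with the three table entries to within about 0.01.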

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(6JGQ_1)}(2) \setminus P_{f(8UVG_1)}(2)|=87\), \(|P_{f(8UVG_1)}(2) \setminus P_{f(6JGQ_1)}(2)|=62\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
001100110001001100011011101011001101001001110101100011101101110110011010010011011001010001111110110110100010110111001111100010110011010110101011001111011100010110000000000011001001111101011001001111111000111010011101101011000001100011101011100011001100111000010110101000110101000101011110010110010011100000010101111101111100000110110101011111100100110011010101111001010111100110000001100110001111001000001111111001101111100100110001110100010010001100110110111010011111001010110011100111111001000001001010110111001011011100101110101111011111001111111110010110011110111010110011001001110110100011101101100010000
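
The hydrophobic-polar string replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not list its classification, so the set below is an assumption inferred by comparing the opening residues of 6JGQ_1 with the start of the binary string; note that under this inferred set C and Y map to 0. The names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Assumed hydrophobic set, inferred from the opening residues of 6JGQ_1 and
# the corresponding bits shown on this page; it is not stated by the source.
HYDROPHOBIC = set("AVLIMFWPG")

def hp_encode(seq: str) -> str:
    """Map each residue to '1' if it is in HYDROPHOBIC, else '0'."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First twenty residues of 6JGQ_1.
print(hp_encode("HHAADYVLYKDATKPVEDRV"))
```

For these twenty residues the output matches the first twenty bits of the string above, `00110011000100110001`.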
Pair \(Z_2\) Length of longest common subsequence
6JGQ_1,8UVG_1 149 4
6JGQ_1,8PGL_1 183 4
8UVG_1,8PGL_1 198 4
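
The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords; for the first pair, \(87+62=149\). A minimal sketch of \(Z_k\), with illustrative function names that are not taken from the site:

```python
def subwords(w: str, k: int) -> set[str]:
    """Return P_w(k): the set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by the size of the union (0.0 if both sets are empty)."""
    union = subwords(x, k) | subwords(y, k)
    return z_k(x, y, k) / len(union) if union else 0.0

# Toy example: P_x(2) = {AB, BA}, P_y(2) = {AB, BB, BA}.
print(z_k("ABAB", "ABBA", 2))
```

Dividing by the size of the union, as in `jaccard_distance`, turns the numerator into a value between 0 and 1.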

Newick tree

 
[
	8PGL_1:10.31,
	[
		6JGQ_1:74.5,8UVG_1:74.5
	]:26.81
]
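
The bracketed structure above can be written as a standard parenthesized Newick string. A small sketch, assuming a hypothetical encoding of each node as a `(label_or_children, branch_length)` pair; this encoding and the name `to_newick` are illustrative and are not the site's own format.

```python
def to_newick(node) -> str:
    """Serialize a (label_or_children, branch_length) node to Newick text.
    A leaf carries a string label; an internal node carries a list of child
    nodes. A branch_length of None means no length is written."""
    label_or_children, length = node
    if isinstance(label_or_children, list):
        text = "(" + ",".join(to_newick(child) for child in label_or_children) + ")"
    else:
        text = label_or_children
    return text if length is None else f"{text}:{length}"

# The tree shown above, with the branch lengths copied from the page.
tree = ([("8PGL_1", 10.31),
         ([("6JGQ_1", 74.5), ("8UVG_1", 74.5)], 26.81)], None)
print(to_newick(tree) + ";")
```

This prints `(8PGL_1:10.31,(6JGQ_1:74.5,8UVG_1:74.5):26.81);`.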

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
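Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1216}{\log_{20} 1216}-\frac{607}{\log_{20} 607}\right)\approx 155\), where \(1216=609+607\) is the length of the concatenation of the two sequences.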
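
The arithmetic in this estimate can be checked directly. The sketch below only re-evaluates the expression as written; the factors 0.85 and 0.8 are copied from the formula and are not derived here, and the helper name `lz_scale` is illustrative.

```python
import math

def lz_scale(n: int, alphabet: int = 20) -> float:
    """Return n / log_alphabet(n), the scale used to normalize LZ complexity."""
    return n / math.log(n, alphabet)

n1, n2 = 609, 607  # lengths of 6JGQ_1 and 8UVG_1 from the table above
expected = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n2))
print(round(expected, 1))
```

The expression evaluates to a little under 156, in line with the value quoted above.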
Status Protein1 Protein2 d d1/2
Query variables 6JGQ_1 8UVG_1 198 198

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
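
The interpolation can be sketched as follows. The page does not say what \(\min\) and \(\max\) range over; the sketch assumes they are taken over the two Otu–Sayood complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), so that \(d\) is their maximum and \(d_1\) their sum. The name `delta` and its arguments are illustrative.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b).

    a and b are assumed to be the increments c(xy) - c(x) and c(yx) - c(y),
    where c is a Lempel-Ziv complexity (an assumption, not stated on the page).
    """
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# alpha = 0 gives the maximum (d); alpha = 1/2 gives the average (d1 / 2).
print(delta(0, 3, 5), delta(0.5, 3, 5))
```

When both increments are equal, every choice of \(\alpha\) returns the same value, which is consistent with the query row above reporting 198 for both \(d\) and \(d_1/2\).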