CoV2D Browser™

Parikh vectors
9CAM_1 2CEU_1 4XZD_1 Letter Amino acid
17 0 12 H Histidine
46 0 8 P Proline
95 2 13 L Leucine
42 0 9 F Phenylalanine
25 0 1 W Tryptophan
70 0 15 A Alanine
48 0 19 D Aspartic acid
40 2 8 Q Glutamine
61 2 9 E Glutamic acid
49 2 10 I Isoleucine
40 2 8 Y Tyrosine
40 0 4 R Arginine
71 2 11 N Asparagine
46 1 22 G Glycine
45 0 11 K Lysine
19 0 13 M Methionine
7 4 0 C Cysteine
78 2 26 S Serine
64 1 12 T Threonine
64 1 6 V Valine
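
As a rough illustration of how a column of the table above can be obtained, the following sketch counts letter occurrences in a sequence. The names `AMINO_ACIDS` and `parikh_vector` are hypothetical and not taken from the CoV2D code; the letter order simply follows the rows of the table.

```python
from collections import Counter

# Letter order copied from the rows of the table above (an arbitrary but fixed order).
AMINO_ACIDS = "HPLFWADQEIYRNGKMCSTV"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in the sequence."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# The 2CEU_1 sequence as listed on this page.
insulin_a = "GIVEQCCTSICSLYQLENYCN"
print(parikh_vector(insulin_a))
# [0, 0, 2, 0, 0, 0, 0, 2, 2, 2, 2, 0, 2, 1, 0, 0, 4, 2, 1, 1]
```

The entries sum to the sequence length, here 21.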

>9CAM_1|Chains A, B|Aminopeptidase N|Homo sapiens (9606)
>2CEU_1|Chains A, C|INSULIN|HOMO SAPIENS (9606)
>4XZD_1|Chains A, B[auth D]|Extracellular heme acquisition hemophore HasA|Yersinia pseudotuberculosis IP 32953 (273123)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9CAM , Knot 360 967 0.85 40 319 859
MAKGFYISKSLGILGILLGVAAVCTIIALSVVYSQEKNKNANSSPVASTTPSASATTNPASATTLDQSKAWNRYRLPNTLKPDSYRVTLRPYLTPNDRGLYVFKGSSTVRFTCKEATDVIIIHSKKLNYTLSQGHRVVLRGVGGSQPPDIDKTELVEPTEYLVVHLKGSLVKDSQYEMDSEFEGELADDLAGFYRSEYMEGNVRKVVATTQMQAADARKSFPCFDEPAMKAEFNITLIHPKDLTALSNMLPKGPSTPLPEDPNWNVTEFHTTPKMSTYLLAFIVSEFDYVEKQASNGVLIRIWARPSAIAAGHGDYALNVTGPILNFFAGHYDTPYPLPKSDQIGLPDFNAGAMENWGLVTYRENSLLFDPLSSSSSNKERVVTVIAHELAHQWFGNLVTIEWWNDLWLNEGFASYVEYLGADYAEPTWNLKDLMVLNDVYRVMAVDALASSHPLSTPASEINTPAQISELFDAISYSKGASVLRMLSSFLSEDVFKQGLASYLHTFAYQNTIYLNLWDHLQEAVNNRSIQLPTTVRDIMNRWTLQMGFPVITVDTSTGTLSQEHFLLDPDSNVTRPSEFNYVWIVPITSIRDGRQQQDYWLIDVRAQNDLFSTSGNEWVLLNLNVTGYYRVNYDEENWRKIQTQLQRDHSAIPVINRAQIINDAFNLASAHKVPVTLALNNTLFLIEERQYMPWEAALSSLSYFKLMFDRSEVYGPMKNYLKKQVTPLFIHFRNNTNNWREIPENLMDQYSEVNAISTACSNGVPECEEMVSGLFKQWMENPNNNPIHPNLRSTVYCNAIAQGGEEEWDFAWEQFRNATLVNEADKLRAALACSKELWILNRYLSYTLNPDLIRKQDATSTIISITNNVIGQGLVWDFVQSNWKKLFNDYGGGSFSFSNLIQAVTRRFSTEYELQQLEQFKKDNEETGFGSGTRALEQALEKTKANIKWVKENKEVVLQWFTENSK
2CEU , Knot 15 21 0.72 22 20 19
GIVEQCCTSICSLYQLENYCN
4XZD , Knot 96 217 0.79 38 149 205
MRGSHHHHHHGSMSTTIQYNSNYADYSISSYLREWANNFGDIDQAPAETKDRGSFSGSSTLFSGTQYAIGSSHSNPEGMIAEGDLKYSFMPQHTFHGQIDTLQFGKDLATNAGGPSAGKHLEKIDITFNELDLSGEFDSGKSMTENHQGDMHKSVRGLMKGNPDPMLEVMKAKGINVDTAFKDLSIASQYPDSGYMSDAPMVDTVGVMDSNDMLLAA
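
The LZ-complexity column can be sketched as follows, assuming the usual LZ76 exhaustive-history parsing (each phrase is the shortest prefix of the remaining text that does not occur earlier). The page does not state which variant it uses, so `lz76` and `normalized_lz` are illustrative names only; this rule does reproduce the value 15 listed for 2CEU.

```python
import math

def lz76(w):
    """Number of phrases in an LZ76-style parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        j = i + 1
        # Extend the phrase while it still occurs in the text before its last letter.
        while j <= n and w[i:j] in w[:j - 1]:
            j += 1
        phrases += 1
        i = j
    return phrases

def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_20(n); requires len(w) > 1."""
    n = len(w)
    return lz76(w) / (n / math.log(n, alphabet_size))

insulin_a = "GIVEQCCTSICSLYQLENYCN"  # 2CEU_1
print(lz76(insulin_a))                     # 15
print(round(normalized_lz(insulin_a), 3))  # 0.726
```

The substring test makes this cubic in the worst case, which is acceptable for sequences of the lengths shown here.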

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
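
A minimal sketch of these definitions, with hypothetical helper names `subwords` and `subword_count`, checked against the \(p_w(2)\) and \(p_w(3)\) entries listed for 2CEU:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_count(w, k):
    """Cardinality of the set of length-k subwords."""
    return len(subwords(w, k))

insulin_a = "GIVEQCCTSICSLYQLENYCN"  # 2CEU_1
print(subword_count(insulin_a, 2))  # 20
print(subword_count(insulin_a, 3))  # 19
```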

\(|P_{f(9CAM_1)}(2) \setminus P_{f(2CEU_1)}(2)|=302\), \(|P_{f(2CEU_1)}(2) \setminus P_{f(9CAM_1)}(2)|=3\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1101101000111111111111100111101100000000100011100010101000110100100001100001100101000010101010100011011010001010000100111100001000100100111011110011010000110100011101010110000001000101011001111000001010100111000101101000110100111010101011010010110011101100111001010100100010100011111100100100010011110111010111110100110101111011110000101110000111101011110011110000001110110000000001101110011001110110101100111001110010011100101010100111100100111101110001100110010011010011011000011011011001100011001110010011000010101100100110000101100100110010101111110100001010000111010001001001001111110010010000001110101000110001001111010101000100000010010001000001111100101100110110100111011100011110000011101110010010111000010111000100010111101000000100110011000001011001000111000011011100110010001101010001000111011000101110010010110010010111100001111000100010101100001000110100011101111011000100110001110101001101100010000010010010000000111010011001100001010110000011101100000
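
The hydrophobic-polar string can be sketched as a letter-by-letter recoding. The page does not state which residues count as hydrophobic; the set below (A, F, G, I, L, M, P, V, W mapped to 1, everything else to 0) is an assumption inferred from the first few dozen residues of 9CAM and the corresponding bits, and the names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Assumed hydrophobic class, inferred from the start of the listing above.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First ten residues of 9CAM_1.
print(hp_encode("MAKGFYISKS"))  # 1101101000
```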
Pair \(Z_2\) Length of longest common subword
9CAM_1,2CEU_1 305 3
9CAM_1,4XZD_1 204 4
2CEU_1,4XZD_1 159 2
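
The two columns of this table can be sketched as below. \(Z_k\) is the size of the symmetric difference of the two subword sets, so the first row is \(302+3=305\). The last column is computed here as the longest common contiguous subword, which is an assumption about what the page reports. All function names are hypothetical.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): number of length-k subwords occurring in exactly one of x, y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common block ending at x[i-2], y[j-2].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("abab", "abba", 2))                  # 1
print(longest_common_subword_length("abab", "abba"))  # 2
```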

Newick tree

 
(
	9CAM_1:14.59,
	(
		4XZD_1:79.5,2CEU_1:79.5
	):63.09
);

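Let \(c(w)=\mathrm{LZ}(w)\), and let \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) be the Otu–Sayood distance \(d\).
Let \(d_1(S,Q)=(c(SQ)-c(S))+(c(QS)-c(Q))\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{988}{\log_{20} 988}-\frac{21}{\log_{20}21})\approx 277\), where \(988=967+21\) is the length of the concatenation of the two sequences.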
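
The arithmetic in this estimate can be checked with a few lines. The factors 0.85 and 0.8 are copied verbatim from the formula, and the helper name `n_over_log` is hypothetical.

```python
import math

def n_over_log(n, alphabet_size=20):
    """The quantity n / log_20(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

# Factors copied from the formula above; their interpretation is not stated on the page.
estimate = 0.85 * 0.8 * (n_over_log(988) - n_over_log(21))
print(round(estimate, 1))
```

This evaluates to roughly 277.8, which the page reports as 277.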
Status Protein1 Protein2 d d1/2
Query variables 9CAM_1 2CEU_1 355 181
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
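
A minimal sketch of \(d\), \(d_1\) and \(\delta\), assuming that min and max refer to the two differences \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\), with \(c\) the LZ76 phrase count. This reading is consistent with the two cases of the formula, but the function names and the parsing variant are assumptions rather than the site's actual code.

```python
def lz76(w):
    """Number of phrases in an LZ76-style parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        j = i + 1
        # Extend the phrase while it still occurs in the text before its last letter.
        while j <= n and w[i:j] in w[:j - 1]:
            j += 1
        phrases += 1
        i = j
    return phrases

def complexity_gains(s, q):
    """The pair (c(SQ) - c(S), c(QS) - c(Q))."""
    return lz76(s + q) - lz76(s), lz76(q + s) - lz76(q)

def otu_sayood(s, q):
    """Return (d, d1): the max and the sum of the two complexity gains."""
    forward, backward = complexity_gains(s, q)
    return max(forward, backward), forward + backward

def delta(s, q, alpha):
    """alpha * min + (1 - alpha) * max of the two complexity gains."""
    forward, backward = complexity_gains(s, q)
    return alpha * min(forward, backward) + (1 - alpha) * max(forward, backward)

print(otu_sayood("AAAA", "AB"))   # (2, 2)
print(delta("AAAA", "AB", 0))     # 2
print(delta("AAAA", "AB", 0.5))   # 1.0
```

In the toy example, appending AB to AAAA adds no new phrase while appending AAAA to AB adds two, so \(d=2\) and \(d_1/2=1\).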