CoV2D Browser™

Parikh vectors
3QXW_1 7TUI_1 7PZY_1 Letter Amino acid
0 28 3 H Histidine
3 85 15 P Proline
10 118 20 V Valine
5 153 16 E Glutamic acid
14 137 12 G Glycine
8 158 23 L Leucine
7 69 6 Y Tyrosine
10 63 9 R Arginine
2 13 1 C Cysteine
2 113 14 I Isoleucine
4 146 39 K Lysine
2 39 3 M Methionine
3 64 6 F Phenylalanine
4 19 2 W Tryptophan
12 169 34 A Alanine
2 75 14 N Asparagine
16 140 20 S Serine
7 119 12 T Threonine
8 98 6 D Aspartic acid
7 79 7 Q Glutamine
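A Parikh vector just counts how often each letter occurs. The sketch below rebuilds one column of the table from a raw sequence. The name `parikh_vector` is hypothetical, and the letter order is simply copied from the table rows; letters outside the 20 standard amino acids are ignored.

```python
from collections import Counter

# Letter order copied from the table above (H, P, V, E, ..., D, Q).
AMINO_ACIDS = "HPVEGLYRCIKMFWANSTDQ"

def parikh_vector(seq: str) -> dict:
    """Count each of the 20 standard amino-acid letters in seq.

    Non-standard letters (e.g. X) are counted by Counter but not reported.
    """
    counts = Counter(seq)
    return {aa: counts[aa] for aa in AMINO_ACIDS}

# Sequence of 3QXW_1 as listed on this page.
seq_3qxw = ("GSQVQLVESGGGLVQAGGSLRLSCAASRRSSRSWAMAWFRQAPGKEREFVAKISGDGRLTTYGDSV"
            "KGRFTISRDNAEYLVYLQMDSLKPEDTAVYYCAADDNYVTASWRSGPDYWGQGTQVTVSS")
v = parikh_vector(seq_3qxw)
print(v["S"], v["W"], v["H"], sum(v.values()))  # 16 4 0 126
```

The printed values agree with the 3QXW_1 column (S = 16, W = 4, H = 0) and the column total equals the sequence length.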

>3QXW_1|Chains A, B, C, D, E|Anti-Methotrexate CDR1-4 Graft VHH|Lama glama (9844)
>7TUI_1|Chain A|Fatty acid synthase subunit alpha|Candida albicans (5476)
>7PZY_11|Chain K[auth p]|60S ribosomal protein L8|Candida albicans SC5314 (237561)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3QXW 60 126 0.76 19 95 123
GSQVQLVESGGGLVQAGGSLRLSCAASRRSSRSWAMAWFRQAPGKEREFVAKISGDGRLTTYGDSVKGRFTISRDNAEYLVYLQMDSLKPEDTAVYYCAADDNYVTASWRSGPDYWGQGTQVTVSS
7TUI 643 1885 0.85 20 352 1494
MKPEIEQELSHTLLTELLAYQFASPVRWIETQDVFLKQHNTERIIEIGPSPTLAGMANRTIKAKYESYDAALSLQRQVLCYSKDAKEIYYKPDPADLAPKETPKQEESTPSAPAAATPTPAAAAAPTPAPAPASAGPVESIPDEPVKANLLIHVLVAQKLKKPLDAVPMTKAIKDLVNGKSTVQNEILGDLGKEFGSTPEKPEDTPLEELAEQFQDSFSGQLGKTSTSLIGRLMSSKMPGGFSITTARKYLESRFGLGAGRQDSVLLMALTNEPANRLGSEADAKTFFDGIAQKYASSAGISLSSGAGSGAGAANSGGAVVDSAALDALTAENKKLAKQQLEVLARYLQVDLNKGSAKSFIKEKEASAVLQKELDLWEAEHGEFYAKGIQPTFSALKSRTYDSYWNWARQDVLSMYFDIIFGKLTSVDRETINQCIQIMNRANPTLIKFMQYHIDHCPEYKGETYKLAKRLGQQLIDNCKQVLTEDPVYKDVSRITGPKTKVSAKGNIEYEETQKDSVRKFEQYVYEMAQGGAMTKVSQPTIQEDLARVYKAISKQASKDSKLELQRVYEDLLKVVESSKEIETEQLTKDILQAATVPTTPTEEVDDPCTPSSDDEIASLPDKTSIIQPVSSTIPSQTIPFLHIQKKTKDGWEYNKKLSSLYLDGLESAAINGLTFKDKYVLVTGAGAGSIGAEILQGLISGGAKVIVTTSRFSKKVTEYYQNMYARYGAAGSTLIVVPFNQGSKQDVDALVQYIYDEPKKGGLGWDLDAIIPFAAIPENGNGLDNIDSKSEFAHRIMLTNLLRLLGAVKSKKTTDTRPAQCILPLSPNHGTFGFDGLYSESKISLETLFNRWYSEDWGSKLTVCGAVIGWTRGTGLMSANNIIAEGIEKLGVRTFSQKEMAFNILGLLTPEIVQLCQEEPVMADLNGGLQFIDNLKDFTSKLRTDLLETADIRRAVSIESAIEQKVVNGDNVDANYSKVMVEPRANMKFDFPTLKSYDEIKQIAPELEGMLDLENVVVVTGFAEVGPWGNSRTRWEMEAYGEFSLEGAIEMAWIMGFIKYHNGNLKGKPYSGWVDAKTQTPIDEKDIKSKYEEEILEHSGIRLIEPELFNGYDPKKKQMIQEVVVQHDLEPFECSKETAEQYKHEHGEKCEIFEIEESGEYTVRILKGATLYVPKALRFDRLVAGQIPTGWDARTYGIPEDTISQVDPITLYVLVATVEALLSAGITDPYEFYKYVHVSEVGNCSGSGMGGVSALRGMFKDRYADKPVQNDILQESFINTMSAWVNMLLLSSSGPIKTPVGACATAVESVDIGIETILSGKAKVVLVGGYDDFQEEGSYEFANMNATSNSIEEFKHGRTPKEMSRPTTTTRNGFMEAQGSGIQVIMTADLALKMGVPIHAVLAMTATATDKIGRSVPAPGKGILTTAREHHGNLKYPSPLLNIEYRKRQLNKRLEQIKSWEETELSYLQEEAELAKEEFGDEFSMHEFLKERTEEVYRESKRQVSDAKKQWGNSFYKSDPRIAPLRGALAAFNLTIDDIGVASFHGTSTVANDKNESATINNMMKHLGRSEGNPVFGVFQKYLTGHPKGAAGAWMLNGAIQILESGLVPGNRNADNVDKLLEQYEYVLYPSRSIQTDGIKAVSVTSFGFGQKGAQAVVVHPDYLFAVLDRSTYEEYATKVSARNKKTYRYMHNAITRNTMFVAKDKAPYSDELEQPVYLDPLARVEENKKKLVFSDKTIQSSQSYVGEVAQKTAKALSTLNKSSKGVGVDVELLSAINIDNETFIERNFTGNEVEYCLNTAHPQASFTGTWSAKEAVFKALGVESKGAGASLIDIEITRDVNGAPKVILHGEAKKAAAKAGVKNVNISISHDDFQATAVALSEF
7PZY 114 262 0.80 20 159 252
MAPKGKKVAPAPLATKSAKSSESKNPLFESTPKNFGIGQSIQPKRNLSRFVKWPEYVRLQRQKKILSLRLKVPPSIAQFSQTLDKNTAAQAFKLLNKYRPETSAEKKERLTKEAAAIAEGKTAKDVSPKPVVVKYGLNHVVSLIENKKAKLVLIANDVDPIELVVFLPALCKKMGVPYAIVKGKARLGTLVHKKTSAVAALTEVNSADEAELSKLISTINANYIEKYEENRKHWGGGIMGSKANDKIAKKAKAAAAAVSTSN
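The ratio column normalizes LZ(w) by \(n/\log_{20} n\). A minimal sketch of that arithmetic, using the LZ(w) and length values from the table; the helper names are hypothetical, and truncating to two decimals (rather than rounding) is an assumption that happens to reproduce all three listed ratios.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_b n) with b = alphabet_size (20 for amino acids)."""
    return lz / (n / math.log(n, alphabet_size))

def truncate2(x: float) -> float:
    """Keep two decimals by truncation, not rounding."""
    return math.floor(x * 100) / 100

# (code, LZ(w), n) copied from the table above
rows = [("3QXW", 60, 126), ("7TUI", 643, 1885), ("7PZY", 114, 262)]
for code, lz, n in rows:
    r = normalized_lz(lz, n)
    print(code, truncate2(r))
```

Before truncation the three ratios are about 0.769, 0.859 and 0.809, so ordinary rounding would give 0.77, 0.86 and 0.81 instead of the listed values.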

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
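The sets \(P_w(k)\) of length-\(k\) subwords and their sizes \(p_w(k)\) can be computed directly with string slicing. This is a straightforward sketch; the function names are hypothetical.

```python
def subword_set(w: str, k: int) -> set:
    """P_w(k): the distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_count(w: str, k: int) -> int:
    """p_w(k) = |P_w(k)|."""
    return len(subword_set(w, k))

print(sorted(subword_set("ABAB", 2)))  # ['AB', 'BA']
print(subword_count("AAAAAA", 2))      # 1
```

Since a word of length \(n\) has only \(n-k+1\) intervals of length \(k\), and there are at most \(20^k\) possible amino-acid words, \(p_w(k)\le\min(n-k+1,\,20^k)\); in particular \(p_w(1)\le 20\).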

For example, \(|P_{f(3QXW_1)}(2) \setminus P_{f(7TUI_1)}(2)|=5\) and \(|P_{f(7TUI_1)}(2) \setminus P_{f(3QXW_1)}(2)|=262\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\); for the pair above, \(Z_2=5+262=267\).

Hydrophobic-polar version of Sequence 1 (3QXW_1): 100101100111110111010100110000000111111001110000111010101010001001010101000010011010100101000110001100001010100110011010010100
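The page does not say which residues count as hydrophobic. A minimal sketch, assuming the hydrophobic class is A, F, G, I, L, M, P, V, W (the usual textbook "nonpolar" group); this set reproduces the 3QXW_1 string above exactly. Histidine does not occur in 3QXW_1, so classifying H as polar here is a guess. The names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Assumed hydrophobic set; inferred from the 3QXW_1 HP string, not stated on the page.
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp_encode(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar/other)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

print(hp_encode("GSQVQLVESG"))  # 1001011001
```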
Pair \(Z_2\) Length of longest common subword (contiguous substring)
3QXW_1,7TUI_1 267 5
3QXW_1,7PZY_1 160 4
7TUI_1,7PZY_1 203 5
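A sketch of both columns, with hypothetical function names. \(Z_k\) is the size of the symmetric difference of the two subword sets, and dividing it by the size of their union gives the Jaccard distance. The last column is computed here as a longest common *contiguous* subword with the standard dynamic program; values of 4 or 5 are far too small for a longest common subsequence (3QXW_1 alone contains 16 serines, and both other sequences contain at least that many), so the contiguous reading is assumed.

```python
def subword_set(w: str, k: int) -> set:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    return len(subword_set(x, k) ^ subword_set(y, k))

def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by |P_x(k) union P_y(k)| (0.0 if both sets are empty)."""
    px, py = subword_set(x, k), subword_set(y, k)
    union = px | py
    return len(px ^ py) / len(union) if union else 0.0

def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous string occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1], y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_k("ABAB", "ABC", 2))                       # 2
print(longest_common_subword("ABCDEF", "ZBCDY"))   # 3
```

The dynamic program uses \(O(|x|\,|y|)\) time and \(O(|y|)\) memory, which is fine for sequences of a few thousand residues.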

Newick tree

(
	7TUI_1:12.90,
	(
		3QXW_1:80,7PZY_1:80
	):48.90
);

Let \(d\) and \(d_1\) denote the two Otu–Sayood distances of those names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
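A minimal sketch, assuming the usual definitions \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=[c(xy)-c(x)]+[c(yx)-c(y)]\), where \(c\) is the LZ76 complexity. Here \(c\) is computed with the Kaspar–Schuster procedure; whether the page's LZ(w) uses exactly this parsing is an assumption, and all function names are hypothetical.

```python
def lz76(s: str) -> int:
    """LZ76 complexity (number of phrases), Kaspar-Schuster procedure."""
    n = len(s)
    if n < 2:
        return n
    i, k, l = 0, 1, 1      # i: candidate earlier start, k: match length, l: phrase start
    c, k_max = 1, 1        # first symbol is its own phrase
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:  # copy reaches the end: close the final phrase
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:     # every earlier start tried: new phrase of length k_max
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def otu_sayood_d(x: str, y: str) -> int:
    return max(lz76(x + y) - lz76(x), lz76(y + x) - lz76(y))

def otu_sayood_d1(x: str, y: str) -> int:
    return (lz76(x + y) - lz76(x)) + (lz76(y + x) - lz76(y))

print(lz76("0001101001000101"))  # 6  (0 | 001 | 10 | 100 | 1000 | 101)
print(otu_sayood_d("AB", "AB"), otu_sayood_d1("AB", "AB"))  # 1 2
```

Note that neither quantity is a metric in the strict sense: \(d(x,x)\) can be positive, as the "AB" example shows.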
A rough estimate of the expected distance, where \(2011=1885+126\) is the length of the concatenated sequences, is \((0.85)(0.8)\left(\frac{2011}{\log_{20} 2011}-\frac{126}{\log_{20}126}\right)\approx 485.\)
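The estimate can be checked numerically. This sketch only reproduces the arithmetic with the factors 0.85 and 0.8 exactly as given; it does not derive them. The helper name is hypothetical.

```python
import math

def n_over_log20(n: int) -> float:
    """n / log_20 n, the same normalizer as in the LZ ratio column."""
    return n / math.log(n, 20)

combined = 1885 + 126  # 7TUI_1 length plus 3QXW_1 length = 2011
estimate = 0.85 * 0.8 * (n_over_log20(combined) - n_over_log20(126))
print(math.floor(estimate))  # 485
```

The unrounded value is about 485.5.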
Status Protein1 Protein2 d d1/2
Query variables 3QXW_1 7TUI_1 618 327.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
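A minimal sketch of \(\delta\), assuming min and max range over the two one-sided terms \(c(xy)-c(x)\) and \(c(yx)-c(y)\), so that their sum is \(d_1\). Under that assumption the table row above can be back-solved: \(d=618\) and \(d_1=2\cdot 327.5=655\), so the smaller term would be \(655-618=37\). The function name `delta` is hypothetical.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b).

    alpha = 0 gives max(a, b) (d); alpha = 1/2 gives (a + b) / 2 (d_1 / 2).
    """
    lo, hi = min(a, b), max(a, b)
    return alpha * lo + (1 - alpha) * hi

print(delta(0, 618, 37), delta(0.5, 618, 37))  # 618 327.5
```

Both printed values match the \(d\) and \(d_1/2\) entries for the pair 3QXW_1, 7TUI_1; \(\alpha=1\) would instead return the smaller term.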