CoV2D Browser™

Parikh vectors
2NYN_1 5PSZ_1 3NGV_1 Letter Amino acid
37 4 15 N Asparagine
31 13 16 D Aspartic acid
6 1 8 C Cysteine
19 13 22 E Glutamic acid
42 5 10 I Isoleucine
28 5 12 T Threonine
50 11 26 A Alanine
33 8 13 Q Glutamine
18 6 29 K Lysine
16 7 4 M Methionine
33 10 17 V Valine
25 14 16 R Arginine
48 7 13 G Glycine
42 7 16 S Serine
20 5 22 Y Tyrosine
15 10 8 H Histidine
61 18 25 L Leucine
17 6 9 F Phenylalanine
23 6 10 P Proline
3 0 5 W Tryptophan
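The Parikh vector of a word records how often each letter occurs. Each column of the table sums to the length of its sequence (567, 156 and 296). Here is a minimal Python sketch. The alphabet order is copied from the table rows. The function name `parikh_vector` is only illustrative.

```python
from collections import Counter

# Alphabet in the row order of the Parikh-vector table above.
ALPHABET = "NDCEITAQKMVRGSYHLFPW"


def parikh_vector(w: str, alphabet: str = ALPHABET) -> list[int]:
    """Count occurrences of each letter of `alphabet` in w.

    Letters of w outside `alphabet` (e.g. X for an unknown residue) are ignored.
    """
    counts = Counter(w)  # missing letters count as 0
    return [counts[a] for a in alphabet]


if __name__ == "__main__":
    # Usage: the Parikh vector of the first 20 residues of 2NYN_1.
    for letter, count in zip(ALPHABET, parikh_vector("MKTLSQAQSKTSSQQFSFTG")):
        if count:
            print(letter, count)
```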

>2NYN_1|Chains A, B, C, D|Phenylalanine/histidine ammonia-lyase|Anabaena variabilis (264691)
>5PSZ_1|Chains A, B|Bromodomain-containing protein 1|Homo sapiens (9606)
>3NGV_1|Chain A|D7 protein|Anopheles stephensi (30069)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2NYN 225 567 0.83 40 268 532
MKTLSQAQSKTSSQQFSFTGNSSANVIIGNQKLTINDVARVARNGTLVSLTNNTDILQGIQASCDYINNAVESGEPIYGVTSGFGGMANVAISREQASELQTNLVWFLKTGAGNKLPLADVRAAMLLRANSHMRGASGIRLELIKRMEIFLNAGVTPYVYEFGSIGASGDLVPLSYITGSLIGLDPSFKVDFNGKEMDAPTALRQLNLSPLTLLPKEGLAMMNGTSVMTGIAANCVYDTQILTAIAMGVHALDIQALNGTNQSFHPFIHNSKPHPGQLWAADQMISLLANSQLVRDELDGKHDYRDHELIQDRYSLRCLPQYLGPIVDGISQIAKQIEIEINSVTDNPLIDVDNQASYHGGNFLGQYVGMGMDHLRYYIGLLAKHLDVQIALLASPEFSNGLPPSLLGNRERKVNMGLKGLQICGNSIMPLLTFYGNSIADRFPTHAEQFNQNINSQGYTSATLARRSVDIFQNYVAIALMFGVQAVDLRTYKKTGHYDARACLSPATERLYSAVRHVVGQKPTSDRPYIWNDNEQGLDEHIARISADIAAGGVIVQAVQDILPCLH
5PSZ 74 156 0.79 38 120 151
MHHHHHHSSGVDLGTENLYFQSMEQVAMELRLTELTRLLRSVLDQLQDKDPARIFAQPVSLKEVPDYLDHIKHPMDFATMRKRLEAQGYKNLHEFEEDFDLIIDNCMKYNARDTVFYRAAVRLRDQGGVVLRQARREVDSIGLEEASGMHLPERPA
3NGV 131 296 0.84 40 187 284
QPWKALDAEQALYVYKRCYEDHLPSGSDRKTYMTLWNAWRLEPNDAITHCYAKCVLTGLQIYDPQENAFKSDRIPVQYQAYKTITQSKQKEVTEYQKALAAANAKSGSCVDLYNAYLPVHNRFVNLSRQLYHGTVEGAAKIYAAMPEIKQKGESFHAYCEKRAWKGNKQSEWKNGRRYKLTGSPELKDAIDCIFRGLRYMDDTGLKVDEIVRDFNLINKSELEPEVRSVLASCKGSEAYDYYVCLVNSRLKQHFKNAFDFHELRSADYAYLLRGKVYENPEKVKEEMKKLNTTVHF
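The table does not say which algorithm produces LZ(w). The sketch below assumes LZ(w) is the Lempel–Ziv (1976) phrase count, computed with the common Kaspar–Schuster scan. It also computes the normalized ratio LZ(w)/(n/log_20 n).

Recomputing that ratio from the table's own LZ and n values gives about 0.8399, 0.7996 and 0.8407. The printed 0.83, 0.79 and 0.84 therefore appear to be truncated, not rounded, to two decimals.

```python
import math


def lz76_complexity(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) parsing of s.

    Kaspar-Schuster scanning algorithm; an assumption, since the page does
    not name its LZ implementation.
    """
    n = len(s)
    if n < 2:
        return n
    i, k, l = 0, 1, 1   # i: start of an earlier copy, k: match length, l: start of current phrase
    c, k_max = 1, 1     # c: phrases so far, k_max: longest match found for this phrase
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:          # current phrase runs to the end of s
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:             # no earlier copy extends further: close the phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_a n), the ratio column of the table (a = 20 for proteins)."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))  # phrases 0|001|10|100|1000|101 -> 6
    for code, lz, n in [("2NYN", 225, 567), ("5PSZ", 74, 156), ("3NGV", 131, 296)]:
        r = normalized_lz(lz, n)
        print(code, round(r, 4), math.floor(100 * r) / 100)  # raw vs truncated
```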

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence of the protein with four-character Protein Data Bank code \(c\).
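Here is a direct Python transcription of \(P_w(n)\) and \(p_w(n)\). The function names are illustrative.

Under this definition, \(p_w(1)\) is the number of distinct letters in \(w\), that is, the number of nonzero Parikh entries. That is 20, 19 and 20 for 2NYN_1, 5PSZ_1 and 3NGV_1 (5PSZ_1 contains no W). The \(p_w(1)\) column of the table lists exactly twice these values, so that column may follow a different convention.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_count(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


if __name__ == "__main__":
    w = "MHHHHHHSSG"  # first 10 residues of 5PSZ_1
    print(sorted(subwords(w, 2)))
    print([subword_count(w, n) for n in (1, 2, 3)])
```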

\(|P_{f(2NYN_1)}(2) \setminus P_{f(5PSZ_1)}(2)|=178\) and \(|P_{f(5PSZ_1)}(2) \setminus P_{f(2NYN_1)}(2)|=30\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). This is an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(2NYN_1),f(5PSZ_1))=178+30=208\).

Hydrophobic-polar version of Sequence 1 (2NYN_1):
100100100000000101010001011110001010011011001011010000011011010000100110010110110011111101110000100100011111001110011110101111101000101101101011001011101110101001101110101111001010111101010101010010110110010101101110011111010011011110010000110111111011010110100001011100001011011110011011100011000101000000001100000100110011111011001100101010010001110100010001101110011111001000111110010101111101010011110111000001011101101010011111010100110011001001000100010001011000101100011111111101101000000100010101011000100110011100100001011000001100011010101111111101100111010
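\(Z_k\) is the numerator of the Jaccard distance \(|A \triangle B| / |A \cup B|\) between the two subword sets. For 2NYN_1 and 5PSZ_1, the union of the 2-subword sets has \(268 + 30 = 298\) elements, using \(p_w(2)=268\) from the table. The full Jaccard distance would therefore be \(208/298 \approx 0.70\).

Here is a minimal sketch of both quantities. The function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) - P_y(k)| + |P_y(k) - P_x(k)| (symmetric difference size)."""
    return len(subwords(x, k) ^ subwords(y, k))


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by |P_x(k) | P_y(k)|; 0.0 when both sets are empty."""
    a, b = subwords(x, k), subwords(y, k)
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0


if __name__ == "__main__":
    # {AB, BC, CD} vs {BC, CD, DE}: symmetric difference {AB, DE}, union of size 4
    print(z_distance("ABCD", "BCDE", 2), jaccard_distance("ABCD", "BCDE", 2))
```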
Pair \(Z_2\) Length of longest common substring (contiguous subword)
2NYN_1,5PSZ_1 208 3
2NYN_1,3NGV_1 177 4
5PSZ_1,3NGV_1 175 4
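The values 3 and 4 are lengths of common contiguous subwords. Equivalently, each is the largest \(k\) with \(P_x(k)\cap P_y(k)\neq\emptyset\). A non-contiguous common subsequence would be far longer: 2NYN_1 and 5PSZ_1 contain 61 and 18 L's respectively, so they share a subsequence of length at least 18.

Here is a sketch that uses the subword sets directly. The function name is illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def longest_common_substring_length(x: str, y: str) -> int:
    """Largest k such that P_x(k) and P_y(k) share an element (0 if none).

    A common subword of length k contains common subwords of every shorter
    length, so the search can stop at the first k with no common subword.
    """
    k = 0
    while k < min(len(x), len(y)) and subwords(x, k + 1) & subwords(y, k + 1):
        k += 1
    return k


if __name__ == "__main__":
    print(longest_common_substring_length("ABCDEF", "ZBCDY"))  # common "BCD"
```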

Newick tree

 
(2NYN_1:99.39,(3NGV_1:87.5,5PSZ_1:87.5):11.89);
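In standard Newick notation, with parentheses and a terminating semicolon, the tree is `(2NYN_1:99.39,(3NGV_1:87.5,5PSZ_1:87.5):11.89);`.

The hand-rolled sketch below parses this restricted form: unquoted leaf labels, branch lengths, and no comments. It then sums branch lengths from the root. Every leaf lies at depth \(99.39 = 87.5 + 11.89\), so the tree is ultrametric. For real use, a library reader such as Biopython's `Bio.Phylo` would be the safer choice. The function names here are illustrative.

```python
import re

# One token: a punctuation character, or a run of other characters (label or length).
TOKEN = re.compile(r"\s*([(),:;]|[^(),:;\s]+)")
PUNCT = set("(),:;")


def parse_newick(text: str):
    """Parse a simple Newick string into nested (label, length, children) tuples."""
    tokens = TOKEN.findall(text)
    pos = 0

    def peek() -> str:
        return tokens[pos] if pos < len(tokens) else ""

    def expect(tok: str) -> None:
        nonlocal pos
        if peek() != tok:
            raise ValueError(f"expected {tok!r} at token {pos}, got {peek()!r}")
        pos += 1

    def node():
        nonlocal pos
        children = []
        if peek() == "(":               # internal node: comma-separated children
            pos += 1
            children.append(node())
            while peek() == ",":
                pos += 1
                children.append(node())
            expect(")")
        label = ""
        if peek() and peek() not in PUNCT:
            label = peek()
            pos += 1
        length = 0.0
        if peek() == ":":               # optional branch length
            pos += 1
            length = float(peek())
            pos += 1
        return label, length, children

    tree = node()
    expect(";")
    return tree


def leaf_depths(tree, above: float = 0.0) -> dict[str, float]:
    """Map each leaf label to the sum of branch lengths from the root."""
    label, length, children = tree
    depth = above + length
    if not children:
        return {label: depth}
    depths = {}
    for child in children:
        depths.update(leaf_depths(child, depth))
    return depths


if __name__ == "__main__":
    tree = parse_newick("(2NYN_1:99.39,(3NGV_1:87.5,5PSZ_1:87.5):11.89);")
    for leaf, depth in leaf_depths(tree).items():
        print(leaf, round(depth, 2))
```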

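Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\), where \(xy\) denotes concatenation.
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)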
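Otu and Sayood build both distances from the LZ complexities of the two sequences and of their concatenations. Let \(a=\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(b=\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). Then \(d=\max\{a,b\}\) and \(d_1=a+b\).

The sketch below assumes LZ is the LZ76 phrase count computed by a Kaspar–Schuster scan. The page does not name its implementation, so the numbers need not match the table. The function names are illustrative.

```python
def lz76_complexity(s: str) -> int:
    """LZ76 phrase count via the Kaspar-Schuster scan (assumed variant)."""
    n = len(s)
    if n < 2:
        return n
    i, k, l = 0, 1, 1
    c, k_max = 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1) for sequences x and y."""
    a = lz76_complexity(x + y) - lz76_complexity(x)  # extra phrases when y follows x
    b = lz76_complexity(y + x) - lz76_complexity(y)  # extra phrases when x follows y
    return max(a, b), a + b


if __name__ == "__main__":
    print(otu_sayood("MKTLSQAQSK", "MHHHHHHSSG"))
```

Both terms are nonnegative, because appending symbols never decreases the LZ76 phrase count. Hence \(d \le d_1 \le 2d\).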
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{723}{\log_{20} 723}-\frac{156}{\log_{20}156}\right)\approx 160.\)
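The estimate is plain arithmetic, and the sketch below reproduces it. It takes the factors 0.85 and 0.8 as given in the text and uses \(723 = 567 + 156\), the combined length of 2NYN_1 and 5PSZ_1. It evaluates to about 160.8.

```python
import math


def n_over_log(n: int, base: int = 20) -> float:
    """n / log_base(n), the normalizer used for LZ complexity above."""
    return n / math.log(n, base)


# Factors 0.85 and 0.8 taken as given; 723 = |2NYN_1| + |5PSZ_1|, 156 = |5PSZ_1|.
estimate = 0.85 * 0.8 * (n_over_log(723) - n_over_log(156))
print(round(estimate, 1))
```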
Status Protein1 Protein2 d d1/2
Query variables 2NYN_1 5PSZ_1 202 127

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
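\[ \delta= \alpha \min\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\} + (1-\alpha) \max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]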
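For 2NYN_1 and 5PSZ_1, the table gives \(d=202\) and \(d_1/2=127\). Since \(\min+\max=d_1\), the two terms are \(202\) and \(2\cdot 127-202=52\). Writing \(\delta_\alpha\) for the value of \(\delta\) at \(\alpha\), the formula reproduces both columns:

```latex
\[
\begin{aligned}
\max &= d = 202, & \min &= d_1 - d = 2\cdot 127 - 202 = 52,\\
\delta_0 &= 0\cdot 52 + 1\cdot 202 = 202 = d, & \delta_{1/2} &= \tfrac12(52 + 202) = 127 = d_1/2.
\end{aligned}
\]
```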