CoV2D Browser™

Parikh vectors
2DWQ_1 2WWW_1 2XCD_1 Letter Amino acid
7 17 5 Q Glutamine
29 30 12 G Glycine
10 19 11 I Isoleucine
15 26 13 K Lysine
9 4 4 N Asparagine
7 6 2 H Histidine
39 23 10 V Valine
12 10 6 F Phenylalanine
19 13 5 P Proline
14 19 9 T Threonine
50 25 7 A Alanine
33 23 7 R Arginine
17 18 11 D Aspartic acid
41 24 13 E Glutamic acid
44 43 9 L Leucine
1 3 1 C Cysteine
3 11 6 M Methionine
6 25 6 S Serine
4 4 2 W Tryptophan
8 6 5 Y Tyrosine
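
The Parikh vector of a sequence counts how often each letter occurs, which is what the table above lists for each chain. A minimal Python sketch follows; the function name `parikh_vector` and the alphabetical ordering of the 20 letters are illustrative assumptions, not something taken from the CoV2D site.

```python
from collections import Counter

# The 20 standard amino-acid one-letter codes. The alphabetical ordering is
# an arbitrary choice for this sketch (the table above uses a different order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


# First ten residues of 2XCD_1 as listed on this page.
example = parikh_vector("MTMQIKIKYL")
print({letter: count for letter, count in example.items() if count})
```

Applying the same function to a full sequence should give one column of the table, provided the site counts plain letter occurrences as assumed here.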

>2DWQ_1|Chains A, B|GTP-binding protein|Thermus thermophilus (300852)
>2WWW_1|Chains A, B, C, D|METHYLMALONIC ACIDURIA TYPE A PROTEIN, MITOCHONDRIAL|HOMO SAPIENS (9606)
>2XCD_1|Chains A, B, C, D, E, F|PROBABLE DEOXYURIDINE 5'-TRIPHOSPHATE NUCLEOTIDOHYDROLASE YNCF|BACILLUS SUBTILIS (1423)
| Protein code \(c\) | LZ-complexity \(\mathrm{LZ}(w)\) | Length \(n=\lvert w\rvert\) | \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) | \(p_w(1)\) | \(p_w(2)\) | \(p_w(3)\) | Sequence \(w=f(c)\) |
|---|---|---|---|---|---|---|---|
| 2DWQ | 150 | 368 | 0.80 | 40 | 169 | 326 | MLAVGIVGLPNVGKSTLFNALTRANALAANYPFATIDKNVGVVPLEDERLYALQRTFAKGERVPPVVPTHVEFVDIAGLVKGAHKGEGLGNQFLAHIREVAAIAHVLRCFPDPDVVHVMGRVDPLEDAEVVETELLLADLATLERRLERLRKEARADRERLPLLEAAEGLYVHLQEGKPARTFPPSEAVARFLKETPLLTAKPVIYVANVAEEDLPDGRGNPQVEAVRRKALEEGAEVVVVSARLEAELAELSGEEARELLAAYGLQESGLQRLARAGYRALDLLTFFTAGEKEVRAWTVRRGTKAPRAAGEIHSDMERGFIRAEVIPWDKLVEAGGWARAKERGWVRLEGKDYEVQDGDVIYVLFNA |
| 2WWW | 146 | 349 | 0.81 | 40 | 193 | 333 | SMKDHTEGLSDKEQRFVDKLYTGLIQGQRACLAEAITLVESTHSRKKELAQVLLQKVLLYHREQEQSNKGKPLAFRVGLSGPPGAGKSTFIEYFGKMLTERGHKLSVLAVDPSSCTSGGSLLGDKTRMTELSRDMNAYIRPSPTRGTLGGVTRTTNEAILLCEGAGYDIILIETVGVGQSEFAVADMVDMFVLLLPPAGGDELQGIKRGIIEMADLVAVTKSDGDLIVPARRIQAEYVSALKLLRKRSQVWKPKVIRISARSGEGISEMWDKMKDFQDLMLASGELTAKRRKQQKVWMWNLIQESVLEHFRTHPTVREQIPLLEQKVLIGALSPGLAADFLLKAFKSRD |
| 2XCD | 68 | 144 | 0.78 | 40 | 112 | 139 | MTMQIKIKYLDETQTRISKIEQGDWIDLRAAEDVTIKKDEFKLVPLGVAMELPEGYEAHVVPRSSTYKNFGVIQTNSMGVIDESYKGDNDFWFFPAYALRDTEIKKGDRICQFRIMKKMPAVELVEVEHLGNEDRGGLGSTGTK |
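
The fourth column above divides the LZ-complexity by \(n/\log_{20} n\). The sketch below shows one way to compute both quantities in Python. It assumes the exhaustive-history parse of Lempel and Ziv (1976); the page does not say which LZ variant it uses, so `lz76_complexity` may not reproduce the tabulated LZ values. Both function names are hypothetical.

```python
import math


def lz76_complexity(w: str) -> int:
    """Count phrases in a Lempel-Ziv (1976) exhaustive-history parse of w.

    Assumption: this is the variant meant by LZ(w); the page does not say.
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        # The copy source may overlap the phrase itself, as in LZ76.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n) for alphabet size b (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))


# Textbook binary example, parsed as 0 | 001 | 10 | 100 | 1000 | 101.
print(lz76_complexity("0001101001000101"))
# Normalization of the tabulated 2DWQ values (LZ = 150, n = 368).
print(normalized_lz(150, 368))
```

With the tabulated LZ = 150 and n = 368 the ratio comes out close to the 0.80 shown in the table.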

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
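
As a small illustration of these definitions, the following Python sketch builds the set of distinct contiguous subwords of a given length and returns its size. The names `subwords` and `subword_complexity` are chosen here for illustration only.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    # range() is empty when k > len(w), so the result is then the empty set.
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))


# "abab" has the length-2 windows ab, ba, ab, of which two are distinct.
print(subword_complexity("abab", 2))
```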

\(|P_{f(2DWQ_1)}(2) \setminus P_{f(2WWW_1)}(2)|=64\), \(|P_{f(2WWW_1)}(2) \setminus P_{f(2DWQ_1)}(2)|=88\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (2DWQ_1):
11111111110110001101100101111001110100011111100001011000110100111111001011011111011001011100111010011111011001101011011101011001011000111101101000100100010100001111011011010100101100111001110110001110101110110110001101010101011000110011011110101010110101001001111011000110011011001101101101100010110100100110111010001001110101111001101111101000111010100001001011011101
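
The hydrophobic-polar string can be reproduced with a simple letter-to-bit mapping. The page does not state which residues it treats as hydrophobic. The set used below (A, F, G, I, L, M, P, V, W) is an assumption inferred by comparing the listed bit string with the 2DWQ sequence, and `hp_encode` is a hypothetical name.

```python
# Assumption: residues written as "1" (hydrophobic). This set was inferred
# from the bit string on this page, not taken from any stated definition.
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


# First twenty residues of 2DWQ_1.
print(hp_encode("MLAVGIVGLPNVGKSTLFNA"))
```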
| Pair | \(Z_2\) | Length of longest common subword |
|---|---|---|
| 2DWQ_1,2WWW_1 | 152 | 4 |
| 2DWQ_1,2XCD_1 | 161 | 3 |
| 2WWW_1,2XCD_1 | 177 | 3 |
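
Both columns of this table can be sketched in a few lines of Python. `z_distance` follows the definition of \(Z_k\) given above. `longest_common_subword` assumes the last column reports the longest shared contiguous block, which is what values as small as 3 and 4 suggest for sequences of this length. Both names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block that ended at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(z_distance("abc", "bcd", 2))
print(longest_common_subword("abcde", "xbcdy"))
```

For the first pair, the two one-sided counts 64 and 88 quoted above add up to the tabulated \(Z_2 = 152\).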

Newick tree

 
[
	2XCD_1:87.27,
	[
		2DWQ_1:76,2WWW_1:76
	]:11.27
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{717}{\log_{20} 717}-\frac{349}{\log_{20}349}\right)\approx 100\), where \(717=368+349\) is the combined length of the two sequences.
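
The arithmetic in this estimate is easy to check numerically. The sketch below takes the factors 0.85 and 0.8 verbatim from the formula (their meaning is not explained on the page); the function names are hypothetical.

```python
import math


def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_b(n), the normalizing scale used for LZ-complexity."""
    return n / math.log(n, alphabet_size)


def expected_distance(n_total: int, n_second: int,
                      c1: float = 0.85, c2: float = 0.8) -> float:
    """Evaluate c1 * c2 * (n_total/log n_total - n_second/log n_second).

    c1 and c2 are the constants from the page's formula, used here as given.
    """
    return c1 * c2 * (lz_scale(n_total) - lz_scale(n_second))


# 717 = 368 + 349 is the combined length of 2DWQ_1 and 2WWW_1.
print(expected_distance(717, 349))
```

The value is close to 100, which can be compared with the observed distances of about 120 reported for this pair.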
| Status | Protein1 | Protein2 | \(d\) | \(d_1/2\) |
|---|---|---|---|---|
| Query variables | 2DWQ_1 | 2WWW_1 | 121 | 120 |
| Was not able to put for \(d\) | | | | |
| Was not able to put for \(d_1\) | | | | |

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
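
To make the two cases explicit, the block below spells out the interpolation. It assumes the unnormalized Otu–Sayood forms, with \(c\) a Lempel–Ziv complexity and \(xy\) the concatenation of \(x\) and \(y\); the symbols \(a\), \(b\) and \(\delta_\alpha\) are introduced here only for illustration.

```latex
\[
\begin{aligned}
a &= c(xy) - c(x), \qquad b = c(yx) - c(y),\\
d(x,y) &= \max\{a,b\}, \qquad d_1(x,y) = a + b,\\
\delta_\alpha &= \alpha\,\min\{a,b\} + (1-\alpha)\,\max\{a,b\},\\
\delta_0 &= \max\{a,b\} = d, \qquad
\delta_{1/2} = \tfrac{1}{2}\bigl(\min\{a,b\} + \max\{a,b\}\bigr) = \tfrac{a+b}{2} = \tfrac{d_1}{2}.
\end{aligned}
\]
```

Under that assumption, the tabulated values \(d = 121\) and \(d_1/2 = 120\) for the pair 2DWQ_1, 2WWW_1 would correspond to \(\max\{a,b\} = 121\) and \(\min\{a,b\} = 2\cdot 120 - 121 = 119\).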