CoV2D BrowserTM

Parikh vectors
6DVN_1 3HJP_1 9EJN_1 Letter Amino acid
18 8 32 P Proline
35 11 69 S Serine
16 7 70 T Threonine
32 11 70 A Alanine
21 13 52 E Glutamic acid
11 3 38 M Methionine
14 10 45 F Phenylalanine
4 1 8 W Tryptophan
11 5 18 Y Tyrosine
21 6 18 H Histidine
21 11 81 I Isoleucine
8 19 61 K Lysine
27 7 56 G Glycine
37 8 94 L Leucine
22 4 26 R Arginine
17 10 43 D Aspartic acid
8 1 23 Q Glutamine
17 8 40 N Asparagine
7 0 6 C Cysteine
17 21 71 V Valine
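A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of that computation; the function name `parikh_vector` and the constant `ALPHABET` are illustrative choices, with the letter order copied from the table above.

```python
from collections import Counter

# Letter order taken from the rows of the table above (illustrative constant).
ALPHABET = "PSTAEMFWYHIKGLRDQNCV"

def parikh_vector(w, alphabet=ALPHABET):
    """Return the list of occurrence counts of each alphabet letter in w."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# Example on the last eight residues of the 3HJP_1 sequence.
tail = "LEHHHHHH"
vector = parikh_vector(tail)
print(dict(zip(ALPHABET, vector)))
```

The entries of a Parikh vector sum to the length of the word, which gives a quick consistency check against the Length column of the sequence table.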

>6DVN_1|Chains A, B, C, D|Hdac6 protein|Danio rerio (7955)
>3HJP_1|Chains A, B, C, D|Peroxiredoxin, bacterioferritin comigratory protein homolog (Bcp-4)|Sulfolobus solfataricus (2287)
>9EJN_1|Chains A, B, C, D|Magnesium-transporting ATPase, P-type 1|Lactococcus lactis subsp. lactis CV56 (929102)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6DVN , Knot 156 364 0.84 40 210 351
SNAGGSSPITGLVYDQRMMLHHNMWDSHHPELPQRISRIFSRHEELRLLSRCHRIPARLATEEELALCHSSKHISIIKSSEHMKPRDLNRLGDEYNSIFISNESYTCALLAAGSCFNSAQAILTGQVRNAVAIVRPPGHHAEKDTACGFCFFNTAALTARYAQSITRESLRVLIVDWDVHHGNGTQHIFEEDDSVLYISLHRYEDGAFFPNSEDANYDKVGLGKGRGYNVNIPWNGGKMGDPEYMAAFHHLVMPIAREFAPELVLVSAGFDAARGDPLGGFQVTPEGYAHLTHQLMSLAAGRVLIILEGGYNLTSISESMSMCTSMLLGDSPPSLDHLTPLKTSATVSINNVLRAHAPFWSSLR
3HJP , Knot 78 164 0.80 38 118 156
MVEIGEKAPEIELVDTDLKKVKIPSDFKGKVVVLAFYPAAFTSVSTKEMSTFRDSMAKFNEVNAVVIGISVDPPFSNKAFKEQNKINFTIVSDFNREAVKAYGVAGELPILKGYVLAKRSVFVIDKNGIVRYKWVSEDPTKEPNYDEIKDVVTKLSLEHHHHHH
9EJN , Knot 332 921 0.82 40 293 810
MHHHHHHHHLEMKKIRKTLENTKRATTFVDNNEINARLEFAKTSTKEELFQKFKTSNKGLSEEQVEISREQYGDNTITRGKKSSLIKRLYQAFINPFTIILFVLALVSAFTDIILAAPGEKNPQGLIIITTMVLISGILRFVQETRSGNAAENLLKMITTTTNVHRLESGSQEIPIEEVLVGDIIHLSAGDMVPADLRIIQAKDLFISQASLTGESEPVEKLDLATAAAAASITESVNLAFMGSNVISGSAYGVVIATGDATIFGEMAKSVTEDSTKTTFEKGVNSVSWVLIRFMLVMVPFVLLINGFTKGDWMEAALFALAVAVGLTPEMLPMIVTTCLAKGAVTMSKEKTIIKNLNSIQNLGSMNILCTDKTGTLTQDKVVLMRHLDIHGQENIRVLRHGFLNSYYQTGLKNLMDLAIIEGAEAKQDKNPELGGLSSKYTKVDEIPFDFERRRMSVVVKSNTNGATSKTQMITKGAAEEMLDICTLVEDKGNVVHLTPELRAYILKKVDELNEEGMRVILVAQKTNPSPIDTFSVQDESEMVLMGYLAFLDPPKESTAKAIKALNKYGVSVKILTGDNDKVTRSVCKQVGLPVDKTILGSDIDQLDDNELAAVAAAASVFAKLSPQQKARIVTTLRNSGNSVGYMGDGINDAAAMKSSDVGISVDSAVDIAKESADVILLEKDLMVLEKGIIEGRKTYANMIKYIKMTASSNFGNMFSVLIASAFLPFIPMLSIHILLLNLIYDFSCTAIPWDNVDEEYLVVPRKWDASSVSKFMLWIGPTSSVFDITTYLLMFFVICPATFGPFSSLVPGSVAYIGFIALFHTGWFVESMWTQTLVIHMIRTPKIPFLQSRASAPLTILTFMGIIGLTIIPFTSFGHSIGLMALPINFFPWLILTVVMYMMLVTIFKKIFVSKYGELL
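The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below assumes the exhaustive-history parsing of Lempel and Ziv (1976); whether the page uses exactly this variant is an assumption, so the complexities in the table are taken as given rather than recomputed. The names `lz76` and `normalized_lz` are illustrative.

```python
import math

def lz76(w):
    """Number of phrases in the LZ76 exhaustive-history parsing of w (sketch)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs starting at an earlier
        # position (overlap with the current phrase is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# Values from the 6DVN row of the table: LZ(w) = 156, n = 364.
print(f"{normalized_lz(156, 364):.2f}")
```

The quadratic-time substring search is adequate for sequences of a few hundred residues; longer inputs would call for a suffix-based implementation.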

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with the 4-symbol Protein Data Bank code \(c\).
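As a small illustration of these definitions, the set of distinct subwords of a given length and its cardinality can be computed with a sliding window. This is a sketch only; the names `subword_set` and `subword_complexity` are illustrative.

```python
def subword_set(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subword_set(w, k))

# Toy example: the fragment "LEHHHHHH" from the end of the 3HJP_1 sequence.
fragment = "LEHHHHHH"
for k in (1, 2, 3):
    print(k, sorted(subword_set(fragment, k)), subword_complexity(fragment, k))
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), which bounds the values that can appear in the table.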

\(|P_{f(6DVN_1)}(2) \setminus P_{f(3HJP_1)}(2)|=143\), \(|P_{f(3HJP_1)}(2) \setminus P_{f(6DVN_1)}(2)|=51\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), an LZ76-style (set-of-subwords) numerator of a Jaccard distance. For this pair, \(Z_2=143+51=194\).

Hydrophobic-polar version of Sequence 1:
0011100110111000011100011000010110010011000001011000001110110000111000000101100000101001001100000111000000011111100100101110101001111101110010000101101100111010010010000101111010100101000110000011010100000111110000100001111010100101110110110100111100111111001110111101110110101111101010101010001101111011111011001001000101000111100110100101100010101001101011110010
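The hydrophobic-polar string maps each residue to 1 (hydrophobic) or 0 (polar). The page does not state which residues count as hydrophobic; the set used in the sketch below is an assumption, inferred by aligning the opening residues of the 6DVN sequence with the binary string shown above, and the name `hp_string` is illustrative.

```python
# Assumed hydrophobic residues, inferred from the start of the displayed
# binary string; the page itself does not list the classification.
HYDROPHOBIC = set("AGPILVMWF")

def hp_string(w):
    """Encode a protein sequence as a 0/1 hydrophobic-polar string."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in w)

# First 20 residues of the 6DVN sequence.
print(hp_string("SNAGGSSPITGLVYDQRMML"))  # 00111001101110000111
```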
Pair \(Z_2\) Length of longest common subword (substring)
6DVN_1,3HJP_1 194 3
6DVN_1,9EJN_1 159 4
3HJP_1,9EJN_1 201 6
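Both columns of this table can be reproduced from the sequences with elementary set and dynamic-programming operations. The sketch below computes \(Z_k\) as the size of the symmetric difference of the two subword sets, and the length of the longest contiguous block shared by two words; the function names are illustrative.

```python
def subword_set(w, k):
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """Number of length-k subwords occurring in exactly one of x and y."""
    px, py = subword_set(x, k), subword_set(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common block ending at x[i-2], y[j-2].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# End of the 3HJP_1 sequence against the start of the 9EJN_1 sequence:
# both contain the histidine tag HHHHHH.
print(longest_common_substring_length("LEHHHHHH", "MHHHHHHHHLEM"))  # 6
print(z_k("abab", "abba", 2))  # 1
```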

Newick tree

 
[
	3HJP_1:10.40,
	[
		6DVN_1:79.5,9EJN_1:79.5
	]:24.90
]

Let \(d\) denote the Otu–Sayood distance of the same name.
Let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{528}{\log_{20} 528}-\frac{164}{\log_{20}164}\right)\approx 106\), where \(528=364+164\) is the combined length of the two sequences.
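The arithmetic behind this estimate can be checked directly. The sketch below only evaluates the displayed expression; the factors 0.85 and 0.8 are taken from the text as given, and the function name `expected_distance` is illustrative.

```python
import math

def expected_distance(n_long, n_short, factor_a=0.85, factor_b=0.8, base=20):
    """Evaluate factor_a * factor_b * (n_long/log_b(n_long) - n_short/log_b(n_short))."""
    term_long = n_long / math.log(n_long, base)
    term_short = n_short / math.log(n_short, base)
    return factor_a * factor_b * (term_long - term_short)

value = expected_distance(528, 164)
print(round(value))  # 106
```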
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 6DVN_1 3HJP_1 136 98
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
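The interpolation can be made concrete as follows. The sketch assumes the usual Otu–Sayood definitions, in which \(d\) is the larger and \(d_1\) the sum of the two complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\); the page does not spell these out, so this reading is an assumption. The increments 136 and 60 used in the example are not stated on the page: they are the pair of values consistent with the query row (\(d=136\), \(d_1/2=98\)).

```python
def delta(increment_xy, increment_yx, alpha):
    """alpha * min + (1 - alpha) * max of the two complexity increments.

    increment_xy stands for c(xy) - c(x) and increment_yx for c(yx) - c(y)
    (assumed Otu-Sayood convention).
    """
    lo, hi = sorted((increment_xy, increment_yx))
    return alpha * lo + (1 - alpha) * hi

# Increments inferred from the query row above (assumption, see lead-in).
inc_xy, inc_yx = 136, 60

d = delta(inc_xy, inc_yx, 0)             # alpha = 0 gives the maximum
half_d1 = delta(inc_xy, inc_yx, 0.5)     # alpha = 1/2 gives the average
print(d, half_d1)
```

Intermediate values of \(\alpha\) move continuously between the maximum-based and the average-based distance.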