CoV2D Browser™

Parikh vectors
6ODG_1 6EVK_1 2PNV_1 Letter Amino acid
2 29 1 V Valine
1 24 0 Q Glutamine
1 50 5 I Isoleucine
0 25 1 P Proline
0 13 0 W Tryptophan
0 36 2 R Arginine
0 44 2 N Asparagine
1 24 1 Y Tyrosine
1 53 4 S Serine
0 35 3 T Threonine
0 19 2 H Histidine
0 57 2 K Lysine
0 25 3 M Methionine
0 41 1 F Phenylalanine
0 40 2 G Glycine
0 59 5 L Leucine
0 41 1 A Alanine
0 34 3 D Aspartic acid
0 12 0 C Cysteine
0 77 5 E Glutamic acid
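
The counts in the table above can be reproduced with a few lines of Python. This is a minimal sketch rather than the site's own code: the function name `parikh_vector` is hypothetical, and the alphabet order is simply copied from the table rows.

```python
from collections import Counter

# Amino-acid letters in the row order of the table above (V, Q, I, ..., E).
ALPHABET = "VQIPWRNYSTHKMFGLADCE"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the number of occurrences of each alphabet letter in `sequence`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

seq_6odg = "SVQIVY"
seq_2pnv = "GSHMNIMYDMISDLNERSEDFEKRIVTLETKLETLIGSIHALP"

print(parikh_vector(seq_6odg))  # column 6ODG_1
print(parikh_vector(seq_2pnv))  # column 2PNV_1
```

The entries of each vector sum to the length of the sequence, which gives a quick consistency check against the Length column further down.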

>6ODG_1|Chains A, B|Microtubule-associated protein tau|Homo sapiens (9606)
>6EVK_1|Chain A|Polymerase acidic protein|Influenza A virus (A/little yellow-shouldered bat/Guatemala/060/2010(H17N10)) (1129347)
>2PNV_1|Chains A, B|Small conductance calcium-activated potassium channel protein 2|Rattus norvegicus (10116)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6ODG , Knot 5 6 0.49 10 5 4
SVQIVY
6EVK , Knot 288 738 0.86 40 300 674
GSHHHHHHHHGSGSMENFVRTNFNPMILERAEKTMKEYGENPQNEGNKFAAISTHMEVCFMYSDFHFIDLEGNTIVKENDDDNAMLKHRFEIIEGQERNIAWTIVNSICNMTENSKPRFLPDLYDYKTNKFIEIGVTRRKVEDYYYEKASKLKGENVYIHIFSFDGEEMATDDEYILDEESRARIKTRLFVLRQELATAGLWDSFRQSEKGEETLEEEFSYPPTFQRLANQSLPPSFKDYHQFKAYVSSFKANGNIEAKLGAMSEKVNAQIESFDPRTIRELELPEGKFCTQRSKFLLMDAMKLSVLNPAHEGEGIPMKDAKACLDTFWGWKKATIIKKHEKGVNTNYLMIWEQLLESIKEMEGKFLNLKKTNHLKWGLGEGQAPEKMDFEDCKEVPDLFQYKSEPPEKRKLASWIQSEFNKASELTNSNWIEFDELGNDVAPIEHIASRRRNFFTAEVSQCRASEYIMKAVYINTALLNSSCTAMEEYQVIPIITKCRDTSGQRRTNLYGFIIKGRSHLRNDTDVVNFISLEFSLTDPRNEIHKWEKYCVLEIGDMEIRTSISTIMKPVYLYVRTNGTSKIKMKWGMEMRRCLLQSLQQVESMIEAESAVKEKDMTEPFFRNRENDWPIGESPQGIEKGTIGKVCRVLLAKSVFNSIYASAQLEGFSAESRKLLLLIQAFRDNLDPGTFDLKGLYEAIEECIINDPWVLLNASWFNSFLKAVQLSMGSGSGENLYFQ
2PNV , Knot 26 43 0.75 34 38 40
GSHMNIMYDMISDLNERSEDFEKRIVTLETKLETLIGSIHALP
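
The LZ-complexity and normalized-ratio columns can be sketched as follows. The parsing below is the exhaustive-history factorization of Lempel and Ziv (1976); that the site uses exactly this variant is an assumption, although it does reproduce the values 5 and 26 listed for the two short sequences. The function names are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of phrases in the LZ76 exhaustive-history parsing of `w`."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text
        # (the copy source may overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the table header."""
    n = len(w)
    return lz76_complexity(w) / (n / math.log(n, alphabet_size))

seq_6odg = "SVQIVY"
seq_2pnv = "GSHMNIMYDMISDLNERSEDFEKRIVTLETKLETLIGSIHALP"

print(lz76_complexity(seq_6odg), normalized_lz(seq_6odg))  # parsing: S|V|Q|I|VY
print(lz76_complexity(seq_2pnv), normalized_lz(seq_2pnv))
```

The ratios come out near 0.498 and 0.759, which agree with the tabulated 0.49 and 0.75 if the table truncates to two decimals.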

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
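
As a concrete illustration of these definitions, the following sketch computes \(P_w(n)\) as a Python set and \(p_w(n)\) as its size, taking subwords to be contiguous blocks of length \(n\). The helper names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

seq_6odg = "SVQIVY"
seq_2pnv = "GSHMNIMYDMISDLNERSEDFEKRIVTLETKLETLIGSIHALP"

for name, seq in [("6ODG", seq_6odg), ("2PNV", seq_2pnv)]:
    print(name, subword_complexity(seq, 2), subword_complexity(seq, 3))
```

For 6ODG this gives 5 and 4, and for 2PNV it gives 38 and 40, matching the \(p_w(2)\) and \(p_w(3)\) columns.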

\(|P_{f(6ODG_1)}(2) \setminus P_{f(6EVK_1)}(2)|=0\), \(|P_{f(6EVK_1)}(2) \setminus P_{f(6ODG_1)}(2)|=295\).
Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).
Hydrophobic-polar version of Sequence 1: 010110
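
The hydrophobic-polar string above can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not say which residues it treats as hydrophobic, so the set `HYDROPHOBIC` below is an assumption, chosen only to be consistent with the one example shown (V and I map to 1; S, Q and Y map to 0). The function name `hp_string` is likewise hypothetical.

```python
# ASSUMPTION: one common hydrophobic/polar split; the site's actual
# classification is not stated and may differ for other residues.
HYDROPHOBIC = set("AVLIMFWC")

def hp_string(sequence, hydrophobic=HYDROPHOBIC):
    """Encode a protein sequence as a binary string: 1 = hydrophobic, 0 = polar."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)

print(hp_string("SVQIVY"))  # 010110
```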
Pair \(Z_2\) Length of longest common subword
6ODG_1,6EVK_1 295 2
6ODG_1,2PNV_1 41 2
6EVK_1,2PNV_1 266 3
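
The 6ODG_1, 2PNV_1 row can be checked directly from the two short sequences. In this sketch \(Z_k\) is the size of the symmetric difference of the two subword sets, and the last column is computed as the length of the longest common contiguous subword; that reading of the column is an assumption, made because it yields the tabulated value 2 for this pair. All function names are hypothetical.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword_length(x, y):
    """Largest k such that x and y share a contiguous subword of length k."""
    for k in range(min(len(x), len(y)), 0, -1):
        if subwords(x, k) & subwords(y, k):
            return k
    return 0

seq_6odg = "SVQIVY"
seq_2pnv = "GSHMNIMYDMISDLNERSEDFEKRIVTLETKLETLIGSIHALP"

print(z_distance(seq_6odg, seq_2pnv, 2))                 # 41
print(longest_common_subword_length(seq_6odg, seq_2pnv))  # 2 (the shared subword is IV)
```

The brute-force search is quadratic in memory for long inputs; it is meant only to make the definitions concrete for sequences of this size.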

Newick tree

 
[
	6EVK_1:16.73,
	[
		6ODG_1:20.5,2PNV_1:20.5
	]:14.23
]
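
The bracketed tree above uses square brackets, whereas standard Newick syntax uses parentheses and a terminating semicolon. A minimal sketch of the conversion, assuming a hypothetical nested representation in which each node is a `(label_or_children, branch_length)` pair:

```python
def to_newick(node):
    """Serialize a (label_or_children, branch_length) node to Newick text."""
    label, length = node
    if isinstance(label, list):
        # Internal node: serialize the children inside parentheses.
        text = "(" + ",".join(to_newick(child) for child in label) + ")"
    else:
        text = label
    return text if length is None else f"{text}:{length}"

# The tree shown above; the root carries no branch length.
tree = (
    [
        ("6EVK_1", 16.73),
        ([("6ODG_1", 20.5), ("2PNV_1", 20.5)], 14.23),
    ],
    None,
)

print(to_newick(tree) + ";")
# (6EVK_1:16.73,(6ODG_1:20.5,2PNV_1:20.5):14.23);
```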

Let \(d\) and \(d_1\) denote the Otu–Sayood distances \(d\) and \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{744}{\log_{20} 744}-\frac{6}{\log_{20}6})\approx 222.\)
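
The arithmetic in this estimate can be checked numerically. The sketch below only evaluates the expression as written; the factors 0.85 and 0.8 are taken from the text without further interpretation, and 744 presumably corresponds to the combined length 738 + 6 of the two sequences, which is an assumption. The helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_b(n), the normalizing scale used for LZ-complexity above."""
    return n / math.log(n, alphabet_size)

expected = 0.85 * 0.8 * (lz_scale(744) - lz_scale(6))
print(round(expected))  # 222
```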
Status Protein1 Protein2 d d1/2
Query variables 6ODG_1 6EVK_1 283 142.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
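
A small numeric check of this interpolation, using the values from the table above. It assumes, following Otu and Sayood, that min and max are taken over the two directed quantities whose maximum is \(d\) and whose sum is \(d_1\). The smaller quantity, 2, is not shown on the page; it is inferred here from \(d_1 = 2 \cdot 142.5 = 285 = 283 + 2\). The function name `delta` is hypothetical.

```python
def delta(alpha, a, b):
    """Interpolate between min(a, b) (alpha = 1) and max(a, b) (alpha = 0)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

larger, smaller = 283, 2  # `smaller` is inferred, not read from the page

print(delta(0, larger, smaller))    # 283, the d column
print(delta(0.5, larger, smaller))  # 142.5, the d1/2 column
```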
