CoV2D Browser™

Parikh vectors
5DTE_1 2JXQ_1 4WRV_1 Letter Amino acid
25 0 17 V Valine
18 0 9 D Aspartic acid
22 0 10 E Glutamic acid
23 5 19 G Glycine
8 0 9 F Phenylalanine
1 0 6 W Tryptophan
38 3 33 A Alanine
7 0 10 H Histidine
22 0 8 N Asparagine
0 2 2 C Cysteine
12 0 16 S Serine
11 0 8 T Threonine
10 0 4 Y Tyrosine
7 0 6 M Methionine
11 0 23 P Proline
3 0 20 R Arginine
17 0 4 Q Glutamine
28 0 7 I Isoleucine
22 0 25 L Leucine
26 0 2 K Lysine
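
A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch, not the page's own code; the name `parikh_vector` and the constant `ALPHABET` (the row order of the table above) are assumptions made for illustration.

```python
from collections import Counter

# Row order of the table above (hypothetical constant, chosen for illustration).
ALPHABET = "VDEGFWAHNCSTYMPRQILK"

def parikh_vector(w, alphabet=ALPHABET):
    """Return the number of occurrences in w of each letter of alphabet."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# 2JXQ_1 as listed on this page.
vec = parikh_vector("GCAGAGAGCG")
print({a: c for a, c in zip(ALPHABET, vec) if c})
```

For 2JXQ_1 this gives G = 5, A = 3 and C = 2, in agreement with the middle column of the table.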

>5DTE_1|Chains A, B, C, D|Monosaccharide-transporting ATPase|Actinobacillus succinogenes (strain ATCC 55618 / 130Z) (339671)
>2JXQ_1|Chain A|RNA (5'-R(*GP*CP*AP*GP*AP*GP*AP*GP*CP*G)-3')|
>4WRV_1|Chain A|Uracil-DNA glycosylase|Mycobacterium tuberculosis (83332)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5DTE , Knot 129 311 0.79 38 167 290
MHHHHHHSSGVDLGTENLYFQSMADKPQIALLMKTLSNEYFISMRQGAEETAKQKDIDLIVQVAEKEDSTEQLVGLVENMIAKKVDAIIVTPNDSIAFIPAFQKAEKAGIPIIDLDVRLDAKAAEAAGLKFNYVGVDNFNGGYLEAKNLAEAIGKKGNVAILEGIPGVDNGEQRKGGALKAFAEYPDIKIVASQSANWETEQALNVTTNILTANPNINGIFAANDNMAIGAVTAVENAGLAGKVLVSGYDGIPLAIEYVKQGKMQNTIDQLPKKQVAIAIEHALKQINKQEIPSVYYVDPVVVDKEQSKNY
2JXQ , Knot 6 10 0.46 6 5 6
GCAGAGAGCG
4WRV , Knot 104 238 0.79 40 151 225
MHHHHHHGMASMTARPLSELVERGWAAALEPVADQVAHMGQFLRAEIAAGRRYLPAGSNVLRAFTFPFDNVRVLIVGQDPYPTPGHAVGLSFSVAPDVRPWPRSLANIFDEYTADLGYPLPSNGDLTPWAQRGVLLLNRVLTVRPSNPASHRGKGWEAVTECAIRALAARAAPLVAILWGRDASTLKPMLAAGNCVAIESPHPSPLSASRGFFGSRPFSRANELLVGMGAEPIDWRLP
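
The columns of this table can be sketched in a few lines. The page does not say which Lempel-Ziv variant it uses; the sketch below assumes the LZ76 exhaustive-history parsing in which a component may overlap its earlier copy. The function names are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of components in an LZ76-style parsing of w (assumed variant)."""
    n, i, count = len(w), 0, 0
    while i < n:
        length = 1
        # Extend the component while it can be copied from earlier text
        # (the copy may overlap the component itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

def subword_count(w, k):
    """p_w(k): number of distinct subwords of length k in w."""
    return len({w[i:i + k] for i in range(len(w) - k + 1)})

w = "GCAGAGAGCG"  # 2JXQ_1
print(lz76_complexity(w), subword_count(w, 2), subword_count(w, 3))
print(normalized_lz(lz76_complexity(w), len(w)))
```

On 2JXQ_1 this reproduces the LZ-complexity 6, the ratio 0.46 and the values \(p_w(2)=5\), \(p_w(3)=6\) from the table. For \(k=1\) the same definition gives 3 distinct letters, whereas the table lists 6, so that column may be computed differently on the page.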

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(5DTE_1)}(2) \setminus P_{f(2JXQ_1)}(2)|=165\), \(|P_{f(2JXQ_1)}(2) \setminus P_{f(5DTE_1)}(2)|=3\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (5DTE_1):
10000000011011000101001100101111100100001101001100010000101110110000000011111001110010111101000111111100100111111010101010110111101001110010110101001101110010111101111100100001111011100101011100010100001101000110101010111110001111110110011111011101001111110010010100010011000111110011001000011010010111100000000
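
The hydrophobic-polar string replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not state its classification; the sketch below uses a letter set inferred from the first residues of 5DTE_1 and the start of the binary string (M, G, V, L, F, A, P, I map to 1). Including W and C in the set is an assumption, and `HYDROPHOBIC` and `hp_encode` are hypothetical names.

```python
# Letters mapped to 1. M, G, V, L, F, A, P, I are inferred from the string
# above; W and C are assumptions, since they were not checked against it.
HYDROPHOBIC = set("AFGILMPVWC")

def hp_encode(w):
    """Replace each residue by '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)

# First 14 residues of 5DTE_1.
print(hp_encode("MHHHHHHSSGVDLG"))
```

The output for this prefix is `10000000011011`, which matches the first 14 digits of the string above.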
Pair \(Z_2\) Length of longest common substring
5DTE_1,2JXQ_1 168 2
5DTE_1,4WRV_1 156 7
2JXQ_1,4WRV_1 150 2
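
Both columns of this table can be sketched directly from the definitions. The last column appears to be the length of the longest common contiguous substring: 5DTE_1 and 4WRV_1 share the prefix MHHHHHH, of length 7. The function names below are hypothetical, and the code is an illustration rather than the page's implementation.

```python
def subwords(w, k):
    """P_w(k): the set of distinct subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy inputs: 2JXQ_1 against a short word, and the first ten residues
# of 5DTE_1 and 4WRV_1.
print(z("GCAGAGAGCG", "AGA", 2))
print(longest_common_substring_length("MHHHHHHSSG", "MHHHHHHGMA"))
```

Applied to the full sequences, `z(x, y, 2)` is the sum of the two set differences, for example 165 + 3 = 168 for the pair 5DTE_1, 2JXQ_1.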

Newick tree

 
[
	5DTE_1:82.97,
	[
		4WRV_1:75,2JXQ_1:75
	]:7.97
]
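
As a quick reading aid, the tree above can be written in standard parenthesized Newick syntax and checked for equal root-to-leaf distances. The nested-list encoding and the helper names below are assumptions made for this sketch.

```python
# The tree above as a list of (node, branch_length) pairs; a node is either
# a leaf name or a nested list of the same shape.
tree = [("5DTE_1", 82.97), ([("4WRV_1", 75.0), ("2JXQ_1", 75.0)], 7.97)]

def to_newick(children):
    """Render the nested structure with parentheses instead of brackets."""
    parts = []
    for node, length in children:
        label = node if isinstance(node, str) else to_newick(node)
        parts.append(f"{label}:{length:g}")
    return "(" + ",".join(parts) + ")"

def leaf_depths(children, depth=0.0):
    """Distance from the root to every leaf."""
    out = {}
    for node, length in children:
        if isinstance(node, str):
            out[node] = depth + length
        else:
            out.update(leaf_depths(node, depth + length))
    return out

newick = to_newick(tree) + ";"
depths = leaf_depths(tree)
print(newick)
print(depths)
```

Every leaf lies at distance about 82.97 from the root (75 + 7.97 = 82.97), so the tree is ultrametric.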

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{321 }{\log_{20} 321}-\frac{10}{\log_{20}10})=104\).
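
The arithmetic in this estimate can be checked directly. The sketch below only evaluates the formula as written; the helper name `lz_scale` is hypothetical, and the reading of 321 as the combined length 311 + 10 of the two sequences is an assumption.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_20(n), the normalizing quantity used on this page."""
    return n / math.log(n, alphabet_size)

# 321 is taken from the formula; it equals 311 + 10 (assumed: combined length).
expected = 0.85 * 0.8 * (lz_scale(321) - lz_scale(10))
print(round(expected))
```

The value is about 104.5, which the page reports as 104.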
Status Protein1 Protein2 d d1/2
Query variables 5DTE_1 2JXQ_1 128 66
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
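
The interpolation \(\delta\) can be sketched as follows. The page does not spell out \(d\) and \(d_1\); the sketch assumes the usual Otu--Sayood definitions, \(d=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1=(c(xy)-c(x))+(c(yx)-c(y))\), with \(c\) an LZ76-style complexity. Both the definitions and the function names are assumptions for illustration.

```python
def lz76_complexity(w):
    """Number of components in an LZ76-style parsing of w (assumed variant)."""
    n, i, count = len(w), 0, 0
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(x, y):
    """The two terms c(xy) - c(x) and c(yx) - c(y)."""
    c = lz76_complexity
    return c(x + y) - c(x), c(y + x) - c(y)

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def otu_sayood(x, y):
    """Return (d, d1) under the assumed definitions."""
    a, b = increments(x, y)
    return max(a, b), a + b

d, d1 = otu_sayood("AB", "AB")
print(d, d1, delta("AB", "AB", 0), delta("AB", "AB", 0.5))
```

Since min + max equals the sum of the two increments, \(\alpha=1/2\) gives \(d_1/2\) and \(\alpha=0\) gives \(d\), as in the case distinction above.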