CoV2D BrowserTM

Parikh vectors
7RJC_1 2RHD_1 2PNG_1 Letter Amino acid
0 2 0 C Cysteine
27 10 7 E Glutamic acid
28 15 5 I Isoleucine
48 12 7 S Serine
5 2 0 W Tryptophan
50 9 8 A Alanine
26 8 2 N Asparagine
23 15 8 D Aspartic acid
29 12 2 T Threonine
28 13 6 V Valine
24 12 7 G Glycine
12 3 2 H Histidine
5 3 2 M Methionine
9 6 4 Q Glutamine
39 13 15 L Leucine
8 1 2 P Proline
16 9 0 Y Tyrosine
16 10 7 R Arginine
32 13 5 K Lysine
14 7 0 F Phenylalanine
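
A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch of that count, not the site's own code; the name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative assumptions. It is checked against the 2PNG_1 column, using the 2PNG_1 sequence listed further down this page.

```python
from collections import Counter

# Letters in the same row order as the table above.
AMINO_ACIDS = "CEISWANDTVGHMQLPYRKF"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

seq_2png = ("GDGEAQRDLVKAVAHILGIRDLAGINLDSSLADLGLDSLMGVEVRQILEREHDLVLPIRE"
            "VRQLTLRKLQEMSSKAGSDTELAAPKSKN")
vec = parikh_vector(seq_2png)
print(vec["L"], vec["C"], sum(vec.values()))  # 15 0 89
```

The entries sum to the sequence length, which gives a quick sanity check on any column of the table.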

>7RJC_1|Chain A|Ubiquinol--cytochrome-c reductase subunit|Candida albicans (strain SC5314 / ATCC MYA-2876) (237561)
>2RHD_1|Chain A|Small GTP binding protein rab1a|Cryptosporidium parvum (5807)
>2PNG_1|Chain A|Fatty acid synthase (EC 2.3.1.85)|Rattus norvegicus (10116)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7RJC , Knot 177 439 0.81 38 215 410
MIRGSSALKSLTSRRLYSTGVKYTTLSNGVTVATETNPAAKTSSVGLFFGAGSRSEHSHSNGISALTTNVLASQSAKGSLLTAKNDREFNGIIAQTTNDNITEAGKLIASIASNAVDIVEKTDLTKHKQYLSAQASAVEADPKSKVLSHLYSSAFQGYSLALPTLGTTESVENLENQDSLRHLAKHLVNNNTVIAASGNFDHDKLADAIEANLKIAEGVKPEIKPASFLGSEVRMRDDTLPKAYISIAVHGEGLNSPNYYLAKVAAAIYGDFYLHSTIAKFTSPKLASIVQEYNIVESYNHYSKSFSDTGIWGYYAEIADKFTVDDFTHFSLKEWNRLSISISEAEVARAKAQVKTALAKELANSFAVTSDIAEKVLLVGHRQSLREAFEKIDAIKVNDVKEWGKSKVWDRDIVISGTGLIEDLLDYNRNRNEMAMMRW
2RHD , Knot 81 175 0.79 40 131 168
GMNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTDSYISTIGVDFKIRTISLENKTVKLQIWDTAGQERFRTITSSYYRGAHGIIIVYDVTDRDSFDNVKQWIQEIDRYAMENVNKLLVGNKCDLVSKRVVTSDEGRELADSHGIKFIETSAKNAYNVEQAFHTMAGEIKKRVQ
2PNG , Knot 45 89 0.75 32 66 82
GDGEAQRDLVKAVAHILGIRDLAGINLDSSLADLGLDSLMGVEVRQILEREHDLVLPIREVRQLTLRKLQEMSSKAGSDTELAAPKSKN
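
The page does not say which Lempel-Ziv parsing it uses for \(\mathrm{LZ}(w)\). The sketch below assumes the 1976 exhaustive-history parsing (LZ76), in which each new phrase is the shortest block that has not already occurred starting at an earlier position. It may therefore not reproduce the tabulated complexities exactly. The function names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of phrases in the LZ76 (exhaustive history) parsing of w."""
    n = len(w)
    i = 0          # start of the current phrase
    phrases = 0
    while i < n:
        length = 1
        # Grow the phrase while w[i:i+length] also occurs starting before i
        # (the earlier occurrence may overlap the current phrase).
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the fourth column of the table."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76_complexity("0001101001000101"))  # 6 phrases: 0|001|10|100|1000|101
print(normalized_lz(177, 439))              # about 0.819 for the 7RJC row
```

With the tabulated values 177 and 439, the ratio comes out near 0.819, which agrees with the 0.81 shown in the table if the last digit is truncated rather than rounded.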

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
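
To make the definitions concrete, here is a minimal sketch on a toy word, reading the argument of \(P_w\) as the subword length. The function names `subwords` and `subword_count` are illustrative assumptions.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_count(w, k):
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

print(sorted(subwords("ABAB", 2)))  # ['AB', 'BA']
print(subword_count("ABAB", 2))     # 2
```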

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the numerator of an LZ76-style (set-of-subwords) Jaccard distance for \(P(k)\). For example, \(|P_{f(7RJC_1)}(2) \setminus P_{f(2RHD_1)}(2)|=122\) and \(|P_{f(2RHD_1)}(2) \setminus P_{f(7RJC_1)}(2)|=38\), so \(Z_2=160\) for this pair.

Hydrophobic-polar version of Sequence 1:
1101001100100001000110000100110110000111000011111111000000000110110001110001010110100000101111000000100110111011001101100001000000101010110101000110010001101001111011000010010000010011001100001111010100001101101010110110101011011100101000011010101110101100100011011111010101000110100101101100001100000000010001111001011001010010010100100101010010110101010011100110011100011001111100001001100101101001001100011000111010111001100000000111101
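
The page does not state which residues count as hydrophobic. Comparing the binary string with the 7RJC sequence suggests that A, F, G, I, L, M, P, V and W map to 1 and the other residues that occur map to 0. The sketch below uses that inferred set. Cysteine does not occur in 7RJC, so treating it as hydrophobic is purely an assumption, and the name `hp_encode` is hypothetical.

```python
# Hydrophobic set inferred from the displayed string; "C" is an assumption
# because 7RJC contains no cysteine to check it against.
HYDROPHOBIC = set("AFGILMPVW") | {"C"}

def hp_encode(sequence, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)

print(hp_encode("MIRGSSALKS"))  # 1101001100, the first ten symbols above
print(hp_encode("EMAMMRW"))     # 0111101, the last seven symbols above
```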
Pair \(Z_2\) Length of longest common subword (contiguous)
7RJC_1,2RHD_1 160 4
7RJC_1,2PNG_1 181 4
2RHD_1,2PNG_1 135 3
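
Both columns can be computed directly from the sequences. The sketch below is illustrative only: it takes \(Z_k\) as the size of the symmetric difference of the two length-\(k\) subword sets, and it reads the last column as the length of the longest common contiguous subword, which is an assumption about what the page reports. The function names are hypothetical.

```python
def z_k(x, y, k):
    """Size of the symmetric difference of the length-k subword sets of x and y."""
    px = {x[i:i + k] for i in range(len(x) - k + 1)}
    py = {y[i:i + k] for i in range(len(y) - k + 1)}
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)   # prev[j]: common suffix length ending at y[j-1]
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_k("ABAB", "ABBA", 2))                  # 1 (only "BB" is unshared)
print(longest_common_subword("ABAB", "ABBA"))  # 2 ("AB" or "BA")
```

The dynamic program uses time proportional to the product of the two lengths, which is ample for proteins of a few hundred residues.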

Newick tree

 
[
	7RJC_1:90.59,
	[
		2RHD_1:67.5,2PNG_1:67.5
	]:23.09
]

Let \(d\) and \(d_1\) denote the Otu--Sayood distances of the same names. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{614 }{\log_{20} 614}-\frac{175}{\log_{20}175})\approx 125.\)
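
The arithmetic in this estimate can be checked in a few lines. Note that 614 equals 439 + 175, the two sequence lengths added together. The helper name `lz_scale` is hypothetical, and the factors 0.85 and 0.8 are taken from the formula as given.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The scale n / log_20(n) used to normalize LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)

expected = 0.85 * 0.8 * (lz_scale(614) - lz_scale(175))
print(round(expected, 1))  # about 125.8
```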
Status Protein1 Protein2 d d1/2
Query variables 7RJC_1 2RHD_1 156 108

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
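
As a small worked sketch of this interpolation, take min and max to be the smaller and larger of the two one-sided complexity increments of a pair. This reading assumes that \(d\) is the larger increment and that \(d_1\) is the sum of the two. Under it, the tabulated values \(d=156\) and \(d_1/2=108\) imply increments 156 and 60. These increments are inferred here, not reported by the page, and the function name `delta` is hypothetical.

```python
def delta(inc_xy, inc_yx, alpha):
    """alpha * min + (1 - alpha) * max of the two one-sided increments."""
    lo, hi = sorted((inc_xy, inc_yx))
    return alpha * lo + (1 - alpha) * hi

# Increments inferred from d = 156 and d1/2 = 108 (assumption: d1 = min + max).
print(delta(60, 156, 0))    # 156, i.e. d
print(delta(60, 156, 0.5))  # 108.0, i.e. d1/2
```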