CoV2D Browser™

Parikh vectors
4EXQ_1 2QCI_1 2IMJ_1 Letter Amino acid
8 4 9 N Asparagine
25 3 11 D Aspartic acid
11 5 4 Q Glutamine
37 10 13 L Leucine
21 8 5 T Threonine
5 2 8 W Tryptophan
10 1 10 H Histidine
25 6 3 V Valine
51 5 15 A Alanine
30 4 18 R Arginine
1 0 1 C Cysteine
18 4 16 E Glutamic acid
6 7 7 K Lysine
14 2 8 F Phenylalanine
21 6 9 P Proline
37 13 7 G Glycine
17 15 6 I Isoleucine
7 2 3 M Methionine
13 1 8 S Serine
11 1 5 Y Tyrosine
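Each column of the table above counts how often every amino-acid letter occurs in one sequence. A minimal Python sketch of that count follows; the function name `parikh_vector` and the alphabet ordering are illustrative assumptions, not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
# The ordering is an arbitrary choice made for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence):
    """Return a dict mapping each amino-acid letter to its number of occurrences."""
    counts = Counter(sequence)
    # Letters that never occur are reported with count 0.
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

if __name__ == "__main__":
    # First ten residues of the 2QCI_1 sequence, used only as a short demo input.
    vec = parikh_vector("PQITLWKRPL")
    print(vec["P"], vec["L"], vec["C"])
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick consistency check on each column.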

>4EXQ_1|Chain A|Uroporphyrinogen decarboxylase|Burkholderia thailandensis (271848)
>2QCI_1|Chains A, B|Protease|Human immunodeficiency virus 1 (11676)
>2IMJ_1|Chains A, B, C, D|Hypothetical protein DUF1348|Pseudomonas fluorescens Pf-5 (220664)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4EXQ , Knot 154 368 0.82 40 188 345
GPGSMAQTLINDTFLRALLREPTDYTPIWLMRQAGRYLPEYNATRARAGSFLGLAKHPDYATEVTLQPLERFPLDAAILFSDILTIPDAMGLGLDFAAGEGPKFAHPVRTEADVAKLAVPDIGATLGYVTDAVREIRRALTDGEGRQRVPLIGFSGSPWTLACYMVEGGGSDDFRTVKSMAYARPDLMHRILDVNAQAVAAYLNAQIEAGAQAVMIFDTWGGALADGAYQRFSLDYIRRVVAQLKREHDGARVPAIAFTKGGGLWLEDLAATGVDAVGLDWTVNLGRARERVAGRVALQGNLDPTILFAPPEAIRAEARAVLDSYGNHPGHVFNLGHGISQFTPPEHVAELVDEVHRHSRAIRSGTGS
2QCI , Knot 54 99 0.83 38 78 95
PQITLWKRPLVTIKIGGQLKEALLDTGADNTVIEEMSLPGRWKPKMIGGIGGFIKVRQYDQIIIEIAGHKAIGTVLVGPTPVNIIGRNLLTQIGATLNF
2IMJ , Knot 79 166 0.81 40 127 160
MSSNAQVRPPLPPFTRESAIEKIRLAEDGWNSRDPERVSLAYTLDTQWRNRAEFAHNREEAKAFLTRKWAKELDYRLIKELWAFTDNRIAVRYAYEWHDDSGNWFRSYGNENWEFDEQGLMARRFACINDMPIKAQERKFHWPLGRRPDDHPGLSELGLEHHHHHH
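The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below takes the tabulated LZ values as given rather than recomputing them; the function name `normalized_lz` and the parameter `alphabet_size` are assumptions of this sketch.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n), with b the alphabet size (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    # (LZ-complexity, length) pairs copied from the table above.
    for code, lz, n in [("4EXQ", 154, 368), ("2QCI", 54, 99), ("2IMJ", 79, 166)]:
        print(code, normalized_lz(lz, n))
```

Truncated to two decimals, the three results agree with the tabulated 0.82, 0.83 and 0.81.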

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
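As a small illustration of these definitions, the sketch below takes \(P_w(n)\) to contain the contiguous subwords of length \(n\) (an assumption of this sketch) and computes its cardinality. The function names are hypothetical.

```python
def subwords(w, n):
    """Return P_w(n): the set of distinct contiguous subwords of w of length n."""
    if n <= 0 or n > len(w):
        return set()
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Return p_w(n), the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

if __name__ == "__main__":
    w = "ABAAB"
    # Length-2 windows are AB, BA, AA, AB, so three distinct subwords.
    print(sorted(subwords(w, 2)), subword_complexity(w, 2))
```

For a word of length \(m\) there are \(m-n+1\) windows of length \(n\), so \(p_w(n)\le m-n+1\).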

\(|P_{f(4EXQ_1)}(2) \setminus P_{f(2QCI_1)}(2)|=138\), \(|P_{f(2QCI_1)}(2) \setminus P_{f(4EXQ_1)}(2)|=28\). Let \(Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)|\) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (4EXQ_1):
11101100110001101110010000111110011001100010010110111110010010010101100111011111001101101111110111101101101100010110111101110110100110010011001010001111110101101100110111000100100110101011001101010111101010101110111110011111101100010100100111010000011011111100111111001110110111101010110100011101110101010111111011010101110001001101101101100101100110110010000011001010
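The quantity \(Z_k(x,y)\) defined above is the size of the symmetric difference of the two length-\(k\) subword sets. A minimal sketch, assuming subwords are contiguous windows; the helper names `kmers` and `z_k` are hypothetical:

```python
def kmers(w, k):
    """Return the set of distinct contiguous length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = kmers(x, k), kmers(y, k)
    only_x = px - py   # subwords seen in x but not in y
    only_y = py - px   # subwords seen in y but not in x
    return len(only_x) + len(only_y)

if __name__ == "__main__":
    # P_x(2) = {AB, BA}, P_y(2) = {AB, BB, BA}; only BB is unshared.
    print(z_k("ABAB", "ABBA", 2))
```

With the two one-sided counts 138 and 28 reported for 4EXQ_1 and 2QCI_1, the same sum gives \(Z_2 = 138 + 28 = 166\).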
Pair \(Z_2\) Length of longest common subword
4EXQ_1,2QCI_1 166 5
4EXQ_1,2IMJ_1 163 4
2QCI_1,2IMJ_1 155 3
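The last column can be reproduced with a standard dynamic program, assuming it reports the longest contiguous block shared by both sequences (a common subsequence with gaps would typically be much longer for proteins of these lengths). This is a generic textbook sketch, not the CoV2D implementation, and the function name is hypothetical.

```python
def longest_common_substring_length(x, y):
    """Return the length of the longest contiguous block occurring in both x and y."""
    best = 0
    # prev[j] = length of the common suffix of x[:i-1] and y[:j]
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the match that ended one position earlier in both words.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

if __name__ == "__main__":
    print(longest_common_substring_length("GATTACA", "ATTAC"))
```

The table costs \(O(|x|\,|y|)\) time, which is negligible for sequences of a few hundred residues.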

Newick tree

 
[
	4EXQ_1:83.77,
	[
		2IMJ_1:77.5,2QCI_1:77.5
	]:6.27
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{467}{\log_{20} 467}-\frac{99}{\log_{20}99}\right)\approx 110.\)
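The arithmetic in this estimate can be checked directly. The sketch below takes the constants 0.85 and 0.8 and the lengths 467 and 99 exactly as they appear in the formula, without interpreting them; the function and parameter names are hypothetical.

```python
import math

def expected_distance(n_long, n_short, c1=0.85, c2=0.8, alphabet_size=20):
    """Evaluate c1 * c2 * (n_long / log_b n_long - n_short / log_b n_short)."""
    def scaled(n):
        # n / log_b(n): the length scale used to normalize LZ-complexity
        return n / math.log(n, alphabet_size)
    return c1 * c2 * (scaled(n_long) - scaled(n_short))

if __name__ == "__main__":
    print(expected_distance(467, 99))
```

The result is a little under 111, in line with the value of about 110 quoted above.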
Status Protein1 Protein2 d d1/2
Query variables 4EXQ_1 2QCI_1 138 87
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
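The interpolation can be checked numerically. In the sketch below, `a` and `b` are hypothetical placeholders for the two quantities whose minimum and maximum are taken, since the formula does not spell them out; the function name `delta` is likewise illustrative.

```python
def delta(a, b, alpha):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    a, b = 3, 5
    print(delta(a, b, 0))    # alpha = 0 picks the maximum (the d case)
    print(delta(a, b, 0.5))  # alpha = 1/2 gives the average (the d_1/2 case)
```

At \(\alpha=0\) only the maximum survives, and at \(\alpha=1/2\) the result is half the sum of the two quantities, matching the two cases in the display.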