CoV2D Browser™

Parikh vectors
5LRY_1 1IVR_1 5AMP_1 Letter Amino acid
14 15 19 Q Glutamine
15 9 5 H Histidine
22 24 14 I Isoleucine
16 32 24 L Leucine
8 13 7 M Methionine
6 7 9 W Tryptophan
33 36 30 A Alanine
9 17 35 N Asparagine
11 16 14 F Phenylalanine
21 17 17 P Proline
19 29 44 S Serine
20 17 50 T Threonine
22 20 33 D Aspartic acid
13 28 11 K Lysine
14 23 10 E Glutamic acid
19 24 8 R Arginine
13 5 16 C Cysteine
22 24 26 V Valine
28 30 47 G Glycine
10 15 19 Y Tyrosine
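
A Parikh vector simply records how often each letter occurs in a word. The following is a minimal sketch of that count for a protein sequence; the function name `parikh_vector` and the alphabetical ordering of the alphabet are illustrative choices, not taken from the page.

```python
from collections import Counter

# The 20 standard amino-acid letters (ordering chosen for illustration only).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `seq`."""
    counts = Counter(seq)
    return {letter: counts.get(letter, 0) for letter in alphabet}

if __name__ == "__main__":
    # First ten residues of the 5LRY_1 sequence listed on this page.
    print(parikh_vector("LENKPRIPVV"))
```

Applied to a full sequence, the counts sum to the sequence length; the 5LRY_1 column above, for instance, sums to 335.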

>5LRY_1|Chains A[auth S], C[auth T]|Hydrogenase-1 small chain|Escherichia coli O6:H1 (strain CFT073 / ATCC 700928 / UPEC) (199310)
>1IVR_1|Chain A|ASPARTATE AMINOTRANSFERASE|Gallus gallus (9031)
>5AMP_1|Chain A|CELLOBIOHYDROLASE I|GALACTOMYCES CANDIDUM (1173061)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5LRY , Knot 143 335 0.82 40 204 319
LENKPRIPVVWIHGLECTCCTESFIRSAHPLAKDVILSLISLDYDDTLMAAAGTQAEEVFEDIITQYNGKYILAVEGNPPLGEQGMFCISSGRPFIEKLKRAAAGASAIIAWGTCASWGCVQAARPNPTQATPIDKVITDKPIIKVPGCPPIPDVMSAIITYMVTFDRLPDVDRMGRPLMFYGQRIHDKCYRRAHFDAGEFVQSWDDDAARKGYCLYKMGCKGPTTYNACSSTRWNDGVSFPIQSGHGCLGCAENGFWDRGSFYSRVVDIPQMGTHSTADTVGLTALGVVAAAVGVHAVASAVDQRRRHNQQPTETEHQPGNEDKQARSHHHHHH
1IVR , Knot 171 401 0.85 40 235 385
SSWWSHVEMGPPDPILGVTEAFKRDTNSKKMNLGVGAYRDDNGKPYVLNCVRKAEAMIAAKKMDKEYLPIAGLADFTRASAELALGENSEAFKSGRYVTVQGISGTGSLRVGANFLQRFFKFSRDVYLPKPSWGNHTPIFRDAGLQLQAYRYYDPKTCSLDFTGAMEDISKIPEKSIILLHACAHNPTGVDPRQEQWKELASVVKKRNLLAYFDMAYQGFASGDINRDAWALRHFIEQGIDVVLSQSYAKNMGLYGERAGAFTVICRDAEEAKRVESQLKILIRPMYSNPPMNGARIASLILNTPELRKEWLVEVKGMADRIISMRTQLVSNLKKEGSSHNWQHITDQIGMFCFTGLKPEQVERLTKEFSIYMTKDGRISVAGVASSNVGYLAHAIHQVTK
5AMP , Knot 181 438 0.83 40 227 414
QQIGTLTTETHPPLTWQTCTSGGSCTTNNGKVVLDANWRWLHSTSGSTNCYTGNTWNTTLCPDDTTCAQNCALDGADYEGTYGITASGNSLRLNFVTNGSQKNVGSRTYLMKDDTHYQTFNLLNQEFTFDVDVSGLPCGLNGALYMVPMAADGGVSNEPNNKAGAQYGVGYCDSQCPRDLKFIAGSANVQGWEPASNSANSGLGGNGSCCAELDIWEANSISAALTPHSADTVTQTVCNGDDCGGTYSNDRYSGTTDPDGCDFNSYRQGDTSFYGPGKTVDTNSKFTVVTQFLTDSSGNLNEIKRFYVQNGVVIPNSQSTIAGISGNSITQDYCTAQKQVFGDTNTWEDHGGFQSMTNAFKAGMVLVMSLWDDYYADMLWLDSVAYPTDADPSTPGVARGTCSTTSGVPSDIESSAASAYVIYSNIKVGPINSTFSGT
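
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. This is a small sketch under the assumption that the logarithm base 20 reflects the 20-letter amino-acid alphabet; the function name `normalized_lz` is hypothetical.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ-complexity divided by n / log_b(n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    # (LZ-complexity, length) pairs copied from the table above.
    for code, lz, n in [("5LRY", 143, 335), ("1IVR", 171, 401), ("5AMP", 181, 438)]:
        print(code, normalized_lz(lz, n))
```

The results are roughly 0.828, 0.853 and 0.839, which agree with the tabulated 0.82, 0.85 and 0.83 if those entries are truncated rather than rounded to two decimals.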

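Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. (Here \(n\) denotes the subword length, not the sequence length \(|w|\) of the table above.) Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).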
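
As an illustration of these definitions, here is a short sketch that enumerates the distinct length-\(k\) subwords of a word. The names `subwords` and `subword_complexity` are hypothetical; the sketch follows the definition literally and makes no claim to reproduce every tabulated value on this page.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

if __name__ == "__main__":
    print(sorted(subwords("abab", 2)))    # distinct length-2 subwords
    print(subword_complexity("abab", 1))  # number of distinct letters
```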

\(|P_{f(5LRY_1)}(2) \setminus P_{f(1IVR_1)}(2)|=77\), \(|P_{f(1IVR_1)}(2) \setminus P_{f(5LRY_1)}(2)|=108\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). For example, \(Z_2(f(5LRY_1),f(1IVR_1))=77+108=185\).

Hydrophobic-polar version of Sequence 1 (5LRY_1):
10001011111101100000000110010111001110110100000111111001001100110000100111101011110011101001011100100111110111111001011010110101001011001100011101110111101101110011010011010011011110100100000001010110110010001100100100110011000010000010011011100101011010011100101000110110110000100111011111111111011101100000000010000001100000100000000
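
The hydrophobic-polar string can be reproduced by a letter-by-letter recoding. The sketch below assumes the hydrophobic class is {A, F, G, I, L, M, P, V, W}; this set is not stated on the page but was inferred by aligning the binary string with the 5LRY_1 sequence, so treat it as an assumption. The name `hp_encode` is illustrative.

```python
# Assumed hydrophobic class, inferred from the encoding shown on this page.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq):
    """Map hydrophobic residues to '1' and all other residues to '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in seq)

if __name__ == "__main__":
    # First 13 residues of 5LRY_1.
    print(hp_encode("LENKPRIPVVWIH"))
```

For the first 13 residues of 5LRY_1 this yields 1000101111110, matching the start of the binary string.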
Pair \(Z_2\) Length of longest common subword
5LRY_1,1IVR_1 185 4
5LRY_1,5AMP_1 185 4
1IVR_1,5AMP_1 182 5
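
Both columns of this table can be sketched in a few lines. The first function computes \(Z_k\) as the size of the symmetric difference of the two subword sets. The second computes the length of the longest common contiguous subword, which is what values as small as 4 or 5 suggest the last column reports (two proteins of this length share far longer non-contiguous subsequences). Function names are hypothetical.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|, i.e. the symmetric difference size."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous subword shared by x and y (row-wise DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common suffix ending at x[i-2], y[j-2].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    print(z("abab", "abba", 2), longest_common_subword("abab", "abba"))
```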

Newick tree

(5LRY_1:92.99,(1IVR_1:91,5AMP_1:91):1.99);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{736 }{\log_{20} 736}-\frac{335}{\log_{20}335})\approx 109\), where \(736=335+401\) is the length of the concatenation of the two sequences.
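
The arithmetic in this estimate is easy to check numerically. The sketch below only evaluates the displayed expression; the factors 0.85 and 0.8 are taken from the formula as given, and the variable names are illustrative.

```python
import math

def log20(x):
    """Logarithm to base 20 (the size of the amino-acid alphabet)."""
    return math.log(x, 20)

# 736 = 335 + 401 is the length of the concatenated pair; 335 is the length of 5LRY_1.
expected = 0.85 * 0.8 * (736 / log20(736) - 335 / log20(335))

if __name__ == "__main__":
    print(expected)
```

The value lies between 109 and 110.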
Status Protein1 Protein2 d d1/2
Query variables 5LRY_1 1IVR_1 143 129

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
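
A minimal sketch of \(\delta\), under two assumptions: that \(\mathrm{LZ}(w)\) is the number of components in the Lempel-Ziv (1976) exhaustive history of \(w\), and that min and max range over the two differences \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), as in the Otu–Sayood definitions of \(d\) and \(d_1\). The function names are hypothetical, and this straightforward parser is not claimed to be the one used to produce the numbers on this page.

```python
def lz76(w):
    """Number of components in the LZ76 exhaustive history of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the component while it can be copied from earlier text
        # (the copied source may overlap the component itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two LZ-complexity increments."""
    a = lz76(x + y) - lz76(x)
    b = lz76(y + x) - lz76(y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    x, y = "0001101001000101", "1110010110111010"
    print("d     =", delta(x, y, 0))    # alpha = 0 gives d
    print("d_1/2 =", delta(x, y, 0.5))  # alpha = 1/2 gives d_1 / 2
```

With \(\alpha=0\) only the larger increment survives, giving \(d\); with \(\alpha=1/2\) the two increments are averaged, giving \(d_1/2\).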