CoV2D Browser™

Parikh vectors
2BYY_1 6ZXC_1 5GUE_1 Letter Amino acid
12 8 12 P Proline
48 19 23 A Alanine
20 21 27 D Aspartic acid
15 13 15 H Histidine
13 11 21 F Phenylalanine
31 15 26 V Valine
18 19 23 R Arginine
13 9 8 N Asparagine
26 26 21 E Glutamic acid
16 20 14 K Lysine
18 8 10 M Methionine
34 29 19 S Serine
2 3 5 W Tryptophan
10 7 12 Y Tyrosine
6 1 9 C Cysteine
24 36 15 I Isoleucine
27 28 21 L Leucine
11 11 10 Q Glutamine
51 21 16 G Glycine
23 13 24 T Threonine
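
A Parikh vector records how many times each letter occurs in a word. As a minimal sketch (the name `parikh_vector` and the short demo string are illustrative and not part of this page), the counts in the table above could be recomputed from a FASTA sequence along these lines:

```python
from collections import Counter

# The 20 amino-acid letters, in the row order used by the table above.
AMINO_ACIDS = "PADHFVRNEKMSWYCILQGT"

def parikh_vector(sequence: str) -> dict:
    """Map each amino-acid letter to its number of occurrences in sequence."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# Demo on the first 12 residues of 2BYY_1 only, not the full sequence.
demo = "MRGSHHHHHHGS"
vec = parikh_vector(demo)
print({letter: count for letter, count in vec.items() if count})
```

Running the same function on a full sequence \(w=f(c)\) should give one column of the table, provided the sequence contains only the 20 standard letters.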

>2BYY_1|Chains A, B, C, D|3-OXOACYL-[ACYL-CARRIER-PROTEIN] SYNTHASE I|ESCHERICHIA COLI (562)
>6ZXC_1|Chains A, B, C, D|Putative GGDEF/response regulator receiver domain protein|Leptospira biflexa serovar Patoc (strain Patoc 1 / ATCC 23582 / Paris) (456481)
>5GUE_1|Chains A, B, C, D|Cyclooctat-9-en-7-ol synthase|Streptomyces melanosporofaciens (67327)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2BYY , Knot 168 418 0.80 40 215 385
MRGSHHHHHHGSMKRAVITGLGIVSSIGNNQQEVLASLREGRSGITFSQELKDSGMRSHVWGNVKLDTTGLIDRKVVRFMSDASIYAFLSMEQAIADAGLSPEAYQNNPRVGLIAGSGGGSPRFQVFGADAMRGPRGLKAVGPYVVTKAMASGVSACLATPFKIHGVNYSISSACATSAHCIGNAVEQIQLGKQDIVFAGGGEELCWEMACEFDAMGALSTKYNDTPEKASRTYDAHRDGFVIAGGGGMVVVEELEHALARGAHIYAEIVGYGATSDGADMVAPSGEGAVRCMKMAMHGVDTPIDYLNSEGTSTPVGDVKELAAIREVFGDKSPAISATKAMTGHSLGAAGVQEAIYSLLMLEHGFIAPSINIEELDEQAAGLNIVTETTDRELTTVMSNSFGFGGTNATLVMRKLKD
6ZXC , Knot 135 318 0.81 40 194 299
MGSSHHHHHHSSGLVPRGSHMPKGQRKILIIEDSELQRKLLSRWVSKNGYIAIEAESISVAREKIISESIDVVLLDWELPDGNGIDLISDILSTSPVGWLPIIMVTGHTEPEYFKIAIEAGATDYITKPAKEIELLARIFSALRIKALHDQLRETAIRDVMTGLYNRRYMEERIEQEFQRCKRHDSLLSMAMIDIDKFKNINDTYGHEIGDQVIKQLAHELKTSFAKSAIISRFGGEEFVILFPETGVVDATRILDRVRENVSKLEMKSDTDQIFHFTFSGGVAGGDLSDIQSNQELLKIADKNLYEAKSSGRNQIIS
5GUE , Knot 143 331 0.83 40 210 310
MKHHHHHHHHGGLVPRGSHGGSEFMTTGLSTAGAQDIGRSSVRPYLEECTRRFQEMFDRHVVTRPTKVELTDAELREVIDDCNAAVAPLGKTVSDERWISYVGVVLWSQSPRHIKDMEAFKAVCVLNCVTFVWDDMDPALHDFGLFLPQLRKICEKYYGPEDAEVAYEAARAFVTSDHMFRDSPIKAALCTTSPEQYFRFRVTDIGVDFWMKMSYPIYRHPEFTEHAKTSLAARMTTRGLTIVNDFYSYDREVSLGQITNCFRLCDVSDETAFKEFFQARLDDMIEDIECIKAFDQLTQDVFLDLIYGNFVWTTSNKRYKTAVNDVNSRIQ
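
The fourth column of the table normalizes the LZ-complexity by \(n/\log_{20} n\). The sketch below shows one way to reproduce it. The function `lz76` is a plain exhaustive-history (Lempel-Ziv 1976) parse written for illustration; whether the page uses exactly this parsing convention is an assumption. It is also an assumption that the page truncates, rather than rounds, the ratio to two decimals, although that reading agrees with all three rows.

```python
import math

def lz76(s: str) -> int:
    """Number of phrases in an LZ76 exhaustive-history parse of s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text;
        # the copy may overlap the phrase itself, hence the "- 1".
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_alphabet_size(n)."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ(w), n) copied from the table above.
rows = [("2BYY", 168, 418), ("6ZXC", 135, 318), ("5GUE", 143, 331)]
for code, lz, n in rows:
    # Truncate to two decimals (assumed display convention).
    print(code, math.floor(100 * normalized_lz(lz, n)) / 100)
```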

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
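
To make the definition concrete, here is a small sketch of the subword sets and their cardinalities for a fixed subword length. The function names are hypothetical, and the demo string is only the first 12 residues of 2BYY_1.

```python
def subwords(w: str, k: int) -> set:
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

demo = "MRGSHHHHHHGS"
print(subword_complexity(demo, 1), subword_complexity(demo, 2))
```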

\(|P_{f(2BYY_1)}(2) \setminus P_{f(6ZXC_1)}(2)|=83\), \(|P_{f(6ZXC_1)}(2) \setminus P_{f(2BYY_1)}(2)|=62\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). For example, \(Z_2(f(2BYY_1),f(6ZXC_1))=83+62=145\).

Hydrophobic-polar version of Sequence 1:
1010000000101001110111110011000001110100100110100010001100011101010001110001101100101011101001110111010100001011111101110101011110110110110111101100111011010110110101100010010100100110110010110001111111001010110010111110000000100100000100011111111111110010011101101010111011000110111101011100101110110011001000100011101001111001110001110100110100111111001100111100111110101001000111101100000001001100011111001011100100
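
The 0/1 string above encodes each residue as hydrophobic (1) or polar (0). The page does not state which residues count as hydrophobic, so the set used in this sketch is an inference: it was read off by aligning the start of the string with the start of the 2BYY_1 sequence, and it should be treated as an assumption rather than the page's definition.

```python
# Assumed hydrophobic set, inferred from the string above (not stated on the page).
HYDROPHOBIC = set("AFGILMPVW")

def hydrophobic_polar(sequence: str) -> str:
    """Encode a protein sequence as 1 (hydrophobic) / 0 (polar) per residue."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

prefix = "MRGSHHHHHHGSMKRAVITG"  # first 20 residues of 2BYY_1
print(hydrophobic_polar(prefix))
```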
Pair \(Z_2\) Length of longest common subword (substring)
2BYY_1,6ZXC_1 145 7
2BYY_1,5GUE_1 149 7
6ZXC_1,5GUE_1 174 8
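
The two columns of this table can be sketched as follows. The last column is read here as the length of the longest contiguous block shared by the two sequences, which is an assumption based on the small values (7 and 8) and on the tags shared near the start of the sequences. All function names are hypothetical, and the demo uses only the first 12 residues of each sequence, so its output does not reproduce the table entries.

```python
def subwords(w: str, k: int) -> set:
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

x = "MRGSHHHHHHGS"  # first 12 residues of 2BYY_1
y = "MGSSHHHHHHSS"  # first 12 residues of 6ZXC_1
print(z_distance(x, y, 2), longest_common_subword(x, y))
```

The dynamic program runs in time proportional to \(|x|\cdot|y|\), which is ample for sequences of a few hundred residues.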

Newick tree

 
[
	5GUE_1:83.63,
	[
		2BYY_1:72.5,6ZXC_1:72.5
	]:11.13
]

Let \(d\) denote the Otu–Sayood distance \(d\).
Let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{736}{\log_{20} 736}-\frac{318}{\log_{20}318}\right)\approx 114\).
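
The arithmetic in this estimate can be checked with a few lines. Here 736 = 418 + 318 is the combined length of 2BYY_1 and 6ZXC_1, and the factors 0.85 and 0.8 are taken from the formula as given; the function names are hypothetical.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """The normalizing quantity n / log_alphabet_size(n)."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_concat: int, n_single: int,
                      c1: float = 0.85, c2: float = 0.8) -> float:
    """c1 * c2 * (scale of the concatenation minus scale of one sequence)."""
    return c1 * c2 * (lz_scale(n_concat) - lz_scale(n_single))

value = expected_distance(418 + 318, 318)
print(round(value, 2))
```

The result lies between 114 and 115, in line with the value quoted above.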
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 2BYY_1 6ZXC_1 142 125.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
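
As a small numerical illustration of this interpolation, assume that min and max refer to the smaller and larger of two one-sided quantities being combined (the page does not spell them out). The value 109 below is not shown on the page: it is back-computed from the table row \(d=142\), \(d_1/2=125.5\) under that assumption.

```python
def delta(one_sided_a: float, one_sided_b: float, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two one-sided quantities."""
    lo, hi = sorted((one_sided_a, one_sided_b))
    return alpha * lo + (1 - alpha) * hi

d = 142          # alpha = 0 gives the max
half_d1 = 125.5  # alpha = 1/2 gives the average of min and max
smaller = 2 * half_d1 - d  # back-computed min, an inferred value

print(delta(smaller, d, 0), delta(smaller, d, 0.5))
```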