CoV2D Browser™

Parikh vectors
5DKZ_1 8TOU_1 5AOA_1 Letter Amino acid
48 18 19 R Arginine
28 27 9 H Histidine
76 35 15 K Lysine
51 28 15 F Phenylalanine
56 32 7 T Threonine
57 61 20 L Leucine
53 27 20 P Proline
39 28 8 Y Tyrosine
69 39 35 A Alanine
20 32 9 Q Glutamine
73 34 25 G Glycine
51 31 22 V Valine
28 21 4 W Tryptophan
34 40 8 N Asparagine
73 33 15 D Aspartic acid
4 8 4 C Cysteine
62 50 21 E Glutamic acid
54 24 12 I Isoleucine
19 21 6 M Methionine
56 36 12 S Serine
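
The Parikh vector of a sequence records how often each letter occurs. A minimal Python sketch, assuming the sequence is a plain string of one-letter amino acid codes; the name `parikh_vector` and the alphabet ordering are illustrative choices, not taken from the CoV2D code:

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (ordering is arbitrary here).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each standard amino acid in seq."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

# Example: the first ten residues of 5DKZ_1 as listed on this page.
example = parikh_vector("GSEFVKEHDW")
```

Applied to a full sequence, each table column above would correspond to one such dictionary.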

>5DKZ_1|Chain A|Alpha glucosidase-like protein|Chaetomium thermophilum (strain DSM 1495 / CBS 144.50 / IMI 039719) (759272)
>8TOU_1|Chain A|Angiotensin-converting enzyme 2|Homo sapiens (9606)
>5AOA_1|Chain A|ESTERASE|THERMOGUTTA TERRIFONTIS (1331910)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5DKZ , Knot 360 951 0.86 40 317 874
GSEFVKEHDWKKCDQSGFCRRNRAYADHALSAISWESPYKIAPETGSFKDGQYQAIILKTINDHGETVRLPLTVSFLESGTARVTIDEEKRQKGEIELRHDSKARKERYNEAEQWVIVGGMTLDKGAKVDYEDKTQMTVKYGPSSKFEATIKFAPFSIDFKRDGASHIKFNDQGLLNIEHWRPKIDPPPEPEKKEGEQQPDKKEEAPREDDSTWWEESFGGNTDSKPRGPESVGLDISFVGYEHVFGIPSHASPLSLKQTRGGEGNYNEPYRMYNADVFEYILDSPMTLYGSIPFMQAHRKDSSVGIFWLNAAETWVDITKGKDSKNPLALGVKSKITTRTHWFSESGLLDVFVFLGPTPKDIISKYAELTGTTAMPQEFSLGYHQCRWNYVSDEDVKDVDRKMDKFNMPYDVIWLDIEYTDEKKYFTWDKHSFKDPIGMGKQLEAHGRKLVTIIDPHIKNTNNYPVVDELKSKDLAVKTKDGSIFEGWCWPGSSHWIDAFNPAAREWWKGLFKYDKFKGTMENTFIWNAMNEPSVFNGPEVTMPKDNLHHGNWEHRDVHNLNGMTFQNATYHALLSRKPGEHRRPFVLTRAFFAGSQRLGAMWTGDNTADWGYLKASIPMVLSQGIAGFPFAGADVGGFFGNPDKDLLTRWYQTGIFYPFFRAHAHIDARRREPYLTGEPYNTIIAAALRLRYSLLPSWYTAFRHAHLDGTPIIKPMFYTHPSEEAGLPIDDQFFIGNTGLLAKPVTDKDRTSVDIWIPDSEVYYDYFTYDIISAAKSKTATLDAPLEKIPLLMRGGHVFARRDIPRRSSALMKWDPYTLVVVLGNDRKAEGDLYVDDGDSFDYEKGQYIHRRFIFDANTLTSADYEGRDDASIKEGEWLKKMRTVNVEKIIVVGAPAAWKGKKTVTVESEGKTWAAAIEYNPAEKSRAAFAVVKKVGVRVGADFKIVFG
8TOU , Knot 253 625 0.86 40 294 589
QSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQNMNNAGDKWSAFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTILNTMSTIYSTGKVCNPDNPQECLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLYEEYVVLKNEMARANHYEDYGDYWRGDYEVNGVDGYDYSRGQLIEDVEHTFEEIKPLYEHLHAYVRAKLMNAYPSYISPIGCLPAHLLGDMWGRFWTNLYSLTVPFGQKPNIDVTDAMVDQAWDAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKAVCHPTAWDLGKGDFRILMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKSIGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEMKREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLHKCDISNSTEAGQKLFNMLRLGKSEPWTLALENVVGAKNMNVRPLLNYFEPLFTWLKDQNKNSFVGWSTDWSPYAGSHHHHHHHHHHSGLNDIFEAQKIEWHE
5AOA , Knot 127 286 0.83 40 180 273
AEVGRLRYPPEMPGAEVKVYKKVDNVDLKLYIYKPADWKPADRRSAIVFFFGGGWQSGSPAQFRPQCEYFAGRGMVAMAADYRVGSRHNVKVADCVADAKSAIRWVRQHAAELGVDPQKIVASGGSAGGHLAACTVMVPDLEAPEEDHTISSQANAAILFNPVLILSREGLKDHVPRQDWEERLRERLGTEPKAVSPYHHIRAGLPPMIIFHGTADNTVPFETIRLFAEAMKKAGNRCELVPFEGAAHGFFNFGRGDNLAYQKTLELADEFLVEIGFLAPKGESQP
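
The page does not state which Lempel-Ziv variant produces the LZ-complexity column. The sketch below assumes the Lempel-Ziv (1976) exhaustive-history parsing, and the function names are hypothetical. It also evaluates the normalized ratio \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) from the table header.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) exhaustive-history parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        # The copy may overlap the phrase itself, hence the bound i+length-1.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_alphabet_size(n))."""
    return lz / (n / math.log(n, alphabet_size))
```

With the table's values for 5DKZ, `normalized_lz(360, 951)` evaluates to roughly 0.866, consistent with the 0.86 shown. This simple implementation is quadratic or worse in the sequence length, which is acceptable for sequences of this size.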

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
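
As an illustration of these definitions, here is a short Python sketch that computes \(P_w(k)\) and \(p_w(k)\) for a string. The function names are hypothetical, and subwords are taken to be contiguous.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the number of distinct subwords of length k in w."""
    return len(subwords(w, k))
```

For example, `subwords("abab", 2)` is `{"ab", "ba"}`, so `subword_complexity("abab", 2)` is 2.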

\(|P_{f(5DKZ_1)}(2) \setminus P_{f(8TOU_1)}(2)|=70\), \(|P_{f(8TOU_1)}(2) \setminus P_{f(5DKZ_1)}(2)|=47\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100110000100000011000001010011011010010011100101001000111100100010010111010110010101010000000101010000010000000100111111101001101000000010100110001010101111010100011001010001110100101010111010000100010000011000000110001110000010110011101011100011111001011010000110100001001001011001100110101011110100000011111101100110100100000111111000100000110001110111111101001100010101001110010110000010010000100100010010110011110100000000101000010011111001010100110110101000000111001000011100001011011011100011011011100110111000010101000111011001011011010110001001010000100101101001000111000110000111100111110001111101000101101010111110011111111110111111010001100100011101110101010100001010101000111111010001110100110010101011101110001000111110001111001111011000000010111100010000100011011000010101110011111011011100011000011101010011111100001010101001001000010010001110100100100010001010010110010010100111111111101000101000100111110001100001111110011101110101111
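
The hydrophobic-polar string can be reproduced with a simple letter mapping. The page does not document which residues count as hydrophobic. The set used below is an inference from aligning the first 130 residues of 5DKZ_1 with the binary string above; it is not a documented part of the CoV2D code, and the function name is hypothetical.

```python
# Residues mapped to '1'; inferred from the displayed string, not from a documented source.
HYDROPHOBIC = set("AFGILMPVW")

def hydrophobic_polar(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)
```

Under this assumed mapping, the first ten residues `GSEFVKEHDW` of 5DKZ_1 map to `1001100001`, which matches the start of the string shown.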
Pair \(Z_2\) Length of longest common subword
5DKZ_1,8TOU_1 117 4
5DKZ_1,5AOA_1 183 4
8TOU_1,5AOA_1 200 4
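
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets. A minimal Python sketch, assuming contiguous subwords and plain string inputs (function names are hypothetical):

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    # The symmetric difference contains exactly the subwords found in one set only.
    return len(subwords(x, k) ^ subwords(y, k))
```

For the pair 5DKZ_1, 8TOU_1 the two one-sided differences reported above are 70 and 47, and their sum 117 is the \(Z_2\) entry in the table.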

Newick tree

 
(
	5AOA_1:10.39,
	(
		5DKZ_1:58.5,8TOU_1:58.5
	):46.89
);

Let \(d\) and \(d_1\) denote the Otu–Sayood distances of the same names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1576}{\log_{20} 1576}-\frac{625}{\log_{20}625}\right)\approx 238\).
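
The arithmetic in the estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are taken from the formula as given. Reading 1576 as the combined length 951 + 625 of the two sequences is an interpretation, not something the page states. The helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_alphabet_size(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

# 1576 = 951 + 625; the factors 0.85 and 0.8 are copied from the formula above.
expected = 0.85 * 0.8 * (lz_scale(1576) - lz_scale(625))
```

The value of `expected` comes out at roughly 238.3, in agreement with the 238 quoted.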
Status Protein1 Protein2 d d1/2
Query variables 5DKZ_1 8TOU_1 306 252.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
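
To make the interpolation concrete, the sketch below assumes that min and max are taken over the two LZ-complexity increments of the Otu–Sayood construction, namely \(c(xy)-c(x)\) and \(c(yx)-c(y)\). The page does not spell this out. The table values \(d=306\) and \(d_1/2=252.5\) then imply that the smaller increment is \(2\cdot 252.5-306=199\). The function name `delta` is hypothetical.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """Interpolate between min(a, b) and max(a, b) with weight alpha on the min."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Increments inferred from the table: max = d = 306, min = 2 * 252.5 - 306 = 199.
d = delta(0, 199, 306)             # alpha = 0 recovers d
half_d1 = delta(0.5, 199, 306)     # alpha = 1/2 recovers d_1 / 2
```

Which of the two increments belongs to which ordering of the pair cannot be recovered from the table alone.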