CoV2D BrowserTM

Parikh vectors
1KIT_1 8BLF_1 5TVF_1 Letter Amino acid
67 17 10 S Serine
48 35 7 D Aspartic acid
48 16 1 Q Glutamine
10 1 4 H Histidine
49 32 6 I Isoleucine
58 42 6 L Leucine
30 14 2 P Proline
0 3 2 C Cysteine
31 7 3 F Phenylalanine
18 0 2 W Tryptophan
22 40 4 K Lysine
12 23 4 M Methionine
22 7 2 Y Tyrosine
48 58 8 V Valine
50 74 5 A Alanine
30 22 6 R Arginine
57 19 1 N Asparagine
31 46 7 E Glutamic acid
70 59 4 G Glycine
56 33 1 T Threonine
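
A Parikh vector records how many times each letter occurs in a word. As a minimal sketch (the name `parikh_vector` and the alphabet ordering are illustrative assumptions, not part of the CoV2D site), the 5TVF_1 column can be recomputed from the 5TVF_1 sequence listed on this page:

```python
from collections import Counter

def parikh_vector(word, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Return {letter: count} over the 20 standard amino-acid letters."""
    counts = Counter(word)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# 5TVF_1 sequence as listed on this page
seq_5tvf = ("MSSCKDSLSLMAMWGSIARFDPKHERSFEGPEKRLEVIMRVVDGTHVSGL"
            "LAHDDDVWQKVIDAICAHIVSREFNEYIRSYVLSE")

pv = parikh_vector(seq_5tvf)
print(pv["S"], pv["D"], pv["W"], sum(pv.values()))  # 10 7 2 85
```

The entries sum to the sequence length, which gives a quick consistency check for each column.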

>1KIT_1|Chain A|SIALIDASE|Vibrio cholerae (666)
>8BLF_1|Chains A, B, C, D, E, F, G, H, I, J, K, L, M, N|Chaperonin GroEL|Escherichia coli (562)
>5TVF_1|Chains A, C|S-adenosylmethionine decarboxylase beta chain|Trypanosoma brucei brucei (strain 927/4 GUTat10.1) (185431)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1KIT , Knot 285 757 0.83 38 280 696
ALFDYNATGDTEFDSPAKQGWMQDNTNNGSGVLTNADGMPAWLVQGIGGRAQWTYSLSTNQHAQASSFGWRMTTEMKVLSGGMITNYYANGTQRVLPIISLDSSGNLVVEFEGQTGRTVLATGTAATEYHKFELVFLPGSNPSASFYFDGKLIRDNIQPTASKQNMIVWGNGSSNTDGVAAYRDIKFEIQGDVIFRGPDRIPSIVASSVTPGVVTAFAEKRVGGGDPGALSNTNDIITRTSRDGGITWDTELNLTEQINVSDEFDFSDPRPIYDPSSNTVLVSYARWPTDAAQNGDRIKPWMPNGIFYSVYDVASGNWQAPIDVTDQVKERSFQIAGWGGSELYRRNTSLNSQQDWQSNAKIRIVDGAANQIQVADGSRKYVVTLSIDESGGLVANLNGVSAPIILQSEHAKVHSFHDYELQYSALNHTTTLFVDGQQITTWAGEVSQENNIQFGNADAQIDGRLHVQKIVLTQQGHNLVEFDAFYLAQQTPEVEKDLEKLGWTKIKTGNTMSLYGNASVNPGPGHGITLTRQQNISGSQNGRLIYPAIVLDRFFLNVMSIYSDDGGSNWQTGSTLPIPFRWKSSSILETLEPSEADMVELQNGDLLLTARLDFNQIVNGVNYSPRQQFLSKDGGITWSLLEANNANVFSNISTGTVDASITRFEQSDGSHFLLFTNPQGNPAGTNGRQNLGLWFSFDEGVTWKGPIQLVNGASAYSDIYQLDSENAIVIVETDNSNMRILRMPITLLKQKLTLSQN
8BLF , Knot 207 548 0.79 38 204 475
MAAKDVKFGNDARVKMLRGVNVLADAVKVTLGPKGRNVVLDKSFGAPTITKDGVSVAREIELEDKFENMGAQMVKEVASKANDAAGDGTTTATVLAQAIITEGLKAVAAGMNPMDLKRGIDKAVTAAVEELKALSVPCSDSKAIAQVGTISANSDETVGKLIAEAMDKVGKEGVITVEDGTGLQDELDVVEGMQFDRGYLSPYFINKPETGAVELESPFILLADKKISNIREMLPVLEAVAKAGKPLLIIAEDVEGEALATLVVNTMRGIVKVAAVKAPGFGDRRKAMLQDIATLTGGTVISEEIGMELEKATLEDLGQAKRVVINKDTTTIIDGVGEEAAIQGRVAQIRQQIEEATSDYDREKLQERVAKLAGGVAVIKVGAATEVEMKEKKARVEDALHATRAAVEEGVVAGGGVALIRVASKLADLRGQNEDQNVGIKVALRAMEAPLRQIVLNCGEEPSVVANTVKGGDGNYGYNAATEEYGNMIDMGILDPTKVTRSALQYAASVAGLMITTECMVTDLPKNDAADLGAAGGMGGMGGMGGMM
5TVF , Knot 46 85 0.80 40 77 83
MSSCKDSLSLMAMWGSIARFDPKHERSFEGPEKRLEVIMRVVDGTHVSGLLAHDDDVWQKVIDAICAHIVSREFNEYIRSYVLSE
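
The page does not say which Lempel-Ziv variant produces \(\mathrm{LZ}(w)\). The sketch below assumes the textbook LZ76 exhaustive-history phrase count, so it may not reproduce the tabulated values exactly. The normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) is taken from the table header. The function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Number of phrases in the LZ76 exhaustive-history parse of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can be copied from an earlier start
        # position (overlap with the phrase itself is allowed, as in LZ76).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_20 n), the ratio shown in the table."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76_complexity("0001101001000101"))  # 6 phrases: 0|001|10|100|1000|101
print(round(normalized_lz(285, 757), 2))    # 0.83, the 1KIT row
```

For the 8BLF row the same formula gives roughly 0.795, which agrees with the tabulated 0.79 if the page truncates rather than rounds.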

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
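
As a minimal sketch of these definitions, reading \(P_w(n)\) as the set of contiguous length-\(n\) subwords, the counts can be computed as follows. The helper names and the toy word are illustrative only.

```python
def subwords(w, n):
    """P_w(n): set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w, n):
    """p_w(n): number of distinct length-n subwords of w."""
    return len(subwords(w, n))

w = "abracadabra"
print(p(w, 1), p(w, 2), p(w, 3))  # 5 7 7
```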

\(|P_{f(1KIT_1)}(2) \setminus P_{f(8BLF_1)}(2)|=114\), \(|P_{f(8BLF_1)}(2) \setminus P_{f(1KIT_1)}(2)|=38\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), so that \(Z_2(f(1KIT_1),f(8BLF_1))=114+38=152\).

Hydrophobic-polar version of Sequence 1:
1110001010001001100111000000101110010111111101111010100010000010100111010001011011110000101000111110100010111010100100111010110000010111111001010101010110001010100001111101000001111000101010101110110011011100101111011100011110111100000110000001110100010100010100010100101100100001110010110011001001011110111001001101010111010001000010111111001000000100000100010101101110010110100001101010001111101011011111000010100100001000110000011101001001110100000101101010101010100111000100110101101100010100010011100100100101010101011110110100000101000101101111100111011010000110010010011111010000110010100101101001011101010100110110001000110001110101101001011001001010101001000010011110010101110010001111101001101011101101101000100100001111100000010110111011000101000
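
The page does not state the hydrophobic-polar encoding rule. Comparing the leading residues of 1KIT_1 with the bit string suggests that A, F, G, I, L, M, P, V and W map to 1 and D, E, H, K, N, Q, R, S, T and Y map to 0. The sketch below uses that inferred mapping. Cysteine does not occur in 1KIT_1, so treating it as hydrophobic is purely an assumption.

```python
# Mapping inferred from this page's data, not an official definition.
# C is absent from 1KIT_1; placing it in the hydrophobic class is an assumption.
HYDROPHOBIC = set("AFGILMPVW") | {"C"}

def hp_encode(seq):
    """Encode an amino-acid sequence as a 0/1 hydrophobic-polar string."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First 20 residues of 1KIT_1
print(hp_encode("ALFDYNATGDTEFDSPAKQG"))  # 11100010100010011001
```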
Pair \(Z_2\) Length of longest common subword
1KIT_1,8BLF_1 152 4
1KIT_1,5TVF_1 239 4
8BLF_1,5TVF_1 193 3
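
As a hedged sketch of how a \(Z_k\) column could be computed, the code below takes the symmetric difference of the two subword sets; dividing by the size of their union would give the corresponding Jaccard distance. The function names and toy inputs are illustrative assumptions.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Z_k(x, y) divided by the number of subwords in either word."""
    union = subwords(x, k) | subwords(y, k)
    return z(x, y, k) / len(union) if union else 0.0

print(z("abcab", "bcabd", 2))                 # 1 (only "bd" is unshared)
print(jaccard_distance("abcab", "bcabd", 2))  # 0.25
```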

Newick tree

 
[
	5TVF_1:11.48,
	[
		1KIT_1:76,8BLF_1:76
	]:41.48
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1305}{\log_{20} 1305}-\frac{548}{\log_{20}548}\right)\approx 193\), where \(1305=757+548\) is the combined length of 1KIT_1 and 8BLF_1.
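
The arithmetic in this estimate can be checked directly. The factors 0.85 and 0.8 are used exactly as given on the page, with no claim here about where they come from. The helper name `n_over_log` is illustrative.

```python
import math

def n_over_log(n, base=20):
    """n / log_base(n), the growth scale used for LZ-complexity on this page."""
    return n / math.log(n, base)

n1, n2 = 757, 548  # lengths of 1KIT_1 and 8BLF_1
expected = 0.85 * 0.8 * (n_over_log(n1 + n2) - n_over_log(n2))
print(round(expected, 1))  # about 193.5
```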
Status Protein1 Protein2 d d1/2
Query variables 1KIT_1 8BLF_1 253 211.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
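
As a worked check of this interpolation against the query row above (\(d=253\), \(d_1/2=211.5\)), the sketch below assumes that \(d\) is the larger and \(d_1\) the sum of the two underlying one-sided terms, which is what the two cases of the formula imply. The variable names are illustrative.

```python
def delta(alpha, mn, mx):
    """delta = alpha * min + (1 - alpha) * max."""
    return alpha * mn + (1 - alpha) * mx

d, half_d1 = 253, 211.5
mx = d                 # alpha = 0 recovers d, the max term
mn = 2 * half_d1 - mx  # from (min + max) / 2 = d1 / 2

print(mn, delta(0, mn, mx), delta(0.5, mn, mx))  # 170.0 253.0 211.5
```

Under this reading the smaller one-sided term for the pair 1KIT_1, 8BLF_1 would be 170.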