CoV2D Browser™

Parikh vectors
6LZG_1 7NQE_1 9IUK_1 Letter Amino acid
27 6 61 P Proline
20 1 24 W Tryptophan
47 19 91 E Glutamic acid
22 23 75 I Isoleucine
32 13 98 G Glycine
16 5 26 H Histidine
28 10 77 Y Tyrosine
31 10 96 V Valine
39 20 77 N Asparagine
32 12 67 D Aspartic acid
30 5 55 Q Glutamine
34 26 79 K Lysine
21 2 43 M Methionine
27 17 96 F Phenylalanine
34 8 112 S Serine
32 8 88 T Threonine
38 7 116 A Alanine
18 11 72 R Arginine
8 1 23 C Cysteine
60 18 125 L Leucine
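
A Parikh vector records how often each letter occurs in a word, so each column of the table above sums to the length of its sequence. A minimal Python sketch of that count, using a short toy prefix rather than the full FASTA records (the function name `parikh_vector` is illustrative and not part of the CoV2D site):

```python
from collections import Counter

# The 20 standard amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "PWEIGHYVNDQKMFSTARCL"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Count the occurrences of each amino-acid letter in the sequence."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# Toy example: the first 12 residues of 7NQE_1.
toy = "KAPSQNAIKRFM"
vector = parikh_vector(toy)
print(vector["K"], vector["A"], sum(vector.values()))
```

Applied to a full sequence, the sum of the vector's entries equals the sequence length \(n=|w|\).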

>6LZG_1|Chain A|Angiotensin-converting enzyme 2|Homo sapiens (9606)
>7NQE_1|Chain A|TPR_REGION domain-containing protein|Marinitoga sp. 1137 (1545835)
>9IUK_1|Chain A|Pleiotropic ABC efflux transporter of multiple drugs CDR1|Candida albicans SC5314 (237561)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6LZG , Knot 246 596 0.88 40 288 569
STIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQNMNNAGDKWSAFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTILNTMSTIYSTGKVCNPDNPQECLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLYEEYVVLKNEMARANHYEDYGDYWRGDYEVNGVDGYDYSRGQLIEDVEHTFEEIKPLYEHLHAYVRAKLMNAYPSYISPIGCLPAHLLGDMWGRFWTNLYSLTVPFGQKPNIDVTDAMVDQAWDAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKAVCHPTAWDLGKGDFRILMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKSIGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEMKREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLHKCDISNSTEAGQKLFNMLRLGKSEPWTLALENVVGAKNMNVRPLLNYFEPLFTWLKDQNKNSFVGWSTDWSPYA
7NQE , Knot 101 222 0.82 40 148 214
KAPSQNAIKRFMTLFSGREDVFSIQYEGGYRPIRRPLNFQDIKNHFSGKKTLGIYLLKKNDTVKFAAYDIDIKKHYLNREDKFVYEENSKKVAKRLSRELNLENITHYFEFTGNRGYHIWIFFDIPVSAYKIKYIMEKILDRIELEEGIDVEIFPKQTSLNGGLGNLIKVPLGVHKKTGKKCLFVDNDFNVIENQIEFLNNIKENKATEINKLFREIFNEND
9IUK , Knot 529 1501 0.86 40 358 1302
MSDSKMSSQDESKLEKAISQDSSSENHSINEYHGFDAHTSENIQNLARTFTHDSFKDDSSAGLLKYLTHMSEVPGVNPYEHEEINNDQLNPDSENFNAKFWVKNLRKLFESDPEYYKPSKLGIGYRNLRAYGVANDSDYQPTVTNALWKLATEGFRHFQKDDDSRYFDILKSMDAIMRPGELTVVLGRPGAGCSTLLKTIAVNTYGFHIGKESQITYDGLSPHDIERHYRGDVIYSAETDVHFPHLSVGDTLEFAARLRTPQNRGEGIDRETYAKHMASVYMATYGLSHTRNTNVGNDFVRGVSGGERKRVSIAEASLSGANIQCWDNATRGLDSATALEFIRALKTSAVILDTTPLIAIYQCSQDAYDLFDKVVVLYEGYQIFFGKATKAKEYFEKMGWKCPQRQTTADFLTSLTNPAEREPLPGYEDKVPRTAQEFETYWKNSPEYAELTKEIDEYFVECERSNTRETYRESHVAKQSNNTRPASPYTVSFFMQVRYGVARNFLRMKGDPSIPIFSVFGQLVMGLILSSVFYNLSQTTGSFYYRGAAMFFAVLFNAFSSLLEIMSLFEARPIVEKHKKYALYRPSADALASIISELPVKLAMSMSFNFVFYFMVNFRRNPGRFFFYWLMCIWCTFVMSHLFRSIGAVSTSISGAMTPATVLLLAMVIYTGFVIPTPSMLGWSRWINYINPVGYVFESLMVNEFHGREFQCAQYVPSGPGYENISRSNQVCTAVGSVPGNEMVSGTNYLAGAYQYYNSHKWRNLGITIGFAVFFLAIYIALTEFNKGAMQKGEIVLFLKGSLKKHKRKTAASNKGDIEAGPVAGKLDYQDEAEAVNNEKFTEKGSTGSVDFPENREIFFWRDLTYQVKIKKEDRVILDHVDGWVKPGQITALMGASGAGKTTLLNCLSERVTTGIITDGERLVNGHALDSSFQRSIGYVQQQDVHLETTTVREALQFSAYLRQSNKISKKEKDDYVDYVIDLLEMTDYADALVGVAGEGLNVEQRKRLTIGVELVAKPKLLLFLDEPTSGLDSQTAWSICKLMRKLADHGQAILCTIHQPSALIMAEFDRLLFLQKGGRTAYFGELGENCQTMINYFEKYGADPCPKEANPAEWMLQVVGAAPGSHAKQDYFEVWRNSSEYQAVREEINRMEAELSKLPRDNDPEALLKYAAPLWKQYLLVSWRTIVQDWRSPGYIYSKIFLVVSAALFNGFSFFKAKNNMQGLQNQMFSVFMFFIPFNTLVQQMLPYFVKQRDVYEVREAPSRTFSWFAFIAGQITSEIPYQVAVGTIAFFCWYYPLGLYNNATPTDSVNPRGVLMWMLVTAFYVYTATMGQLCMSFSELADNAANLATLLFTMCLNFCGVLAGPDVLPGFWIFMYRCNPFTYLVQAMLSTGLANTFVKCAEREYVSVKPPNGESCSTYLDPYIKFAGGYFETRNDGSCAFCQMSSTNTFLKSVNSLYSERWRNFGIFIAFIAINIILTVIFYWLARVPKGNREKKNKK
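
The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below recomputes that column from the tabulated values and also includes a simple LZ76-style phrase count. Whether the site's \(\mathrm{LZ}(w)\) uses exactly this parsing variant is an assumption, so the function is only demonstrated on a short binary string, not on the protein sequences.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of phrases in a simple LZ76-style (exhaustive history) parsing."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs in the text
        # that precedes its last letter.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

# Short binary example, parsed as 0 | 001 | 10 | 100 | 1000 | 101.
print(lz76_complexity("0001101001000101"))

# Recompute the normalized column from the tabulated LZ(w) and n.
for code, lz, n in [("6LZG", 246, 596), ("7NQE", 101, 222), ("9IUK", 529, 1501)]:
    print(code, round(normalized_lz(lz, n), 2))
```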

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
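
As an illustration of these definitions, here is a minimal Python sketch that collects the distinct length-\(k\) subwords of a word and counts them. The names `subwords` and `subword_complexity` are illustrative, and the word is a toy example rather than one of the protein sequences.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

toy = "ABABA"
print(sorted(subwords(toy, 2)))
print([subword_complexity(toy, k) for k in (1, 2, 3)])
```

For a word of length \(n\) there are only \(n-k+1\) windows of length \(k\), so the count can never exceed \(n-k+1\).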

\(|P_{f(6LZG_1)}(2) \setminus P_{f(7NQE_1)}(2)|=177\), \(|P_{f(7NQE_1)}(2) \setminus P_{f(6LZG_1)}(2)|=37\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
00100010011001000100110000110100000100001001001100101110000011010110010010101010110001001100000001001100100100010100100100011101110011100100000111100100011001011000011100011010000001001010001011010000010110010001001011000101010101101010010111011101110111011001001011110010101001110011010011001001110111101001110001100110100110010110110101011100010100110100011010001101101111001100110011101101011010010011110101000000010111001101110111001100101111010110001100110100011111011100000001101101000001100000010010100110011000111000010000011001101101100011011100111100101011100101110110000000111100010101
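
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets, so for 6LZG_1 and 7NQE_1 it is \(Z_2 = 177 + 37 = 214\). A minimal Python sketch on toy words follows; the function names are illustrative, and dividing by the size of the union to obtain a Jaccard distance is the standard definition rather than something stated on this page.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by the size of the union of the two subword sets."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0

# Toy words: P_x(2) = {AB, BA}, P_y(2) = {AB, BA, BB}.
x, y = "ABAB", "BABB"
print(z_distance(x, y, 2), jaccard_distance(x, y, 2))
```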
Pair \(Z_2\) Length of longest common subword
6LZG_1,7NQE_1 214 4
6LZG_1,9IUK_1 110 5
7NQE_1,9IUK_1 212 4
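
The reported lengths of 4 and 5 are of the size one would expect for a longest common contiguous subword of two proteins. Assuming that interpretation, which is not confirmed by the page, the last column can be sketched with a standard dynamic program. The function name and the toy inputs are illustrative.

```python
def longest_common_subword_length(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    # prev[j] = length of the common suffix of the previous prefix of x and y[:j].
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

# Toy words sharing the subword "TIEE".
print(longest_common_subword_length("STIEEQ", "KTIEEA"))
```

This runs in time proportional to \(|x|\cdot|y|\), which is adequate for sequences of a few thousand residues.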

Newick tree

 
[
	7NQE_1:11.80,
	[
		6LZG_1:55,9IUK_1:55
	]:63.80
]

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{818}{\log_{20} 818}-\frac{222}{\log_{20}222}\right)\approx 164\), where \(818=596+222\) is the combined length of 6LZG_1 and 7NQE_1.
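
The arithmetic in this estimate can be checked directly. The sketch below takes the factors 0.85 and 0.8 as given in the formula, without interpreting them, and uses the sequence lengths 596 and 222 from the table. The helper name `lz_scale` is illustrative.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """The scale n / log_{alphabet_size}(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

n_x, n_y = 596, 222  # lengths of 6LZG_1 and 7NQE_1
expected = 0.85 * 0.8 * (lz_scale(n_x + n_y) - lz_scale(n_y))
print(round(expected, 2))
```

The result lies between 164 and 165, which agrees with the stated value once the fractional part is dropped.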
Status Protein1 Protein2 d d1/2
Query variables 6LZG_1 7NQE_1 214 144.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
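
The interpolation can be illustrated with the query values above. In the sketch below, \(\min\) and \(\max\) are read as the smaller and larger of two underlying directional quantities, which is an assumption about the notation. Under that reading the formula gives \(d=\max\) and \(d_1=\min+\max\), so the tabulated \(d=214\) and \(d_1/2=144.5\) imply \(\min=75\).

```python
def delta(alpha: float, a: float, b: float) -> float:
    """Interpolate between the smaller and larger of two quantities."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

d = 214          # tabulated value of d (the case alpha = 0)
d1_half = 144.5  # tabulated value of d1/2 (the case alpha = 1/2)

# Under the stated assumption, d1/2 = (min + max) / 2 and d = max.
implied_min = 2 * d1_half - d

print(implied_min, delta(0, implied_min, d), delta(0.5, implied_min, d))
```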