CoV2D Browser™

Parikh vectors
9FPN_1 3FKQ_1 6GYO_1 Letter Amino acid
19 8 10 H Histidine
34 30 24 K Lysine
18 11 26 M Methionine
25 28 33 S Serine
4 1 9 W Tryptophan
55 24 37 V Valine
17 12 21 Q Glutamine
11 17 33 F Phenylalanine
30 4 17 P Proline
33 24 27 D Aspartic acid
31 28 29 E Glutamic acid
58 20 28 G Glycine
60 30 54 L Leucine
5 17 23 Y Tyrosine
27 23 19 N Asparagine
26 10 36 R Arginine
19 6 8 C Cysteine
43 38 39 I Isoleucine
41 17 24 T Threonine
80 25 24 A Alanine
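
As a minimal sketch of how the counts in this table can be obtained, the Parikh vector of a sequence simply records how often each letter occurs. The function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative choices, not names used by the site.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (illustrative ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Count the occurrences of each amino acid letter in the sequence."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


# First 19 residues of the 3FKQ_1 sequence listed on this page.
prefix = "MGSDKIHHHHHHENLYFQG"
vector = parikh_vector(prefix)
print(vector["H"], sum(vector.values()))
```

The entries of a full Parikh vector sum to the sequence length; for example, the 9FPN_1 column of the table sums to 636.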

>9FPN_1|Chain A[auth X]|Carbon monoxide dehydrogenase 2|Carboxydothermus hydrogenoformans Z-2901 (246194)
>3FKQ_1|Chain A|NtrC-like two-domain protein|Eubacterium rectale (39491)
>6GYO_1|Chains A, B, C, D|Potassium/sodium hyperpolarization-activated cyclic nucleotide-gated channel 4|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9FPN , Knot 245 636 0.83 40 256 566
MARQNLKSTDRAVQQMLDKAKREGIQTVWDRYEAMKPQCGFGETGLCCRHCLQGPCRINPFGDEPKVGICGATAEVIVARGLDRSIAAGAAGHSGHAKHLAHTLKKAVQGKAASYMIKDRTKLHSIAKRLGIPTEGQKDEDIALEVAKAALADFHEKDTPVLWVTTVLPPSRVKVLSAHGLIPAGIDHEIAEIMHRTSMGCDADAQNLLLGGLRCSLADLAGCYMGTDLADILFGTPAPVVTESNLGVLKADAVNVAVHGHNPVLSDIIVSVSKEMENEARAAGATGINVVGICCTGNEVLMRHGIPACTHSVSQEMAMITGALDAMILDYQCIQPSVATIAECTGTTVITTMEMSKITGATHVNFAEEAAVENAKQILRLAIDTFKRRKGKPVEIPNIKTKVVAGFSTEAIINALSKLNANDPLKPLIDNVVNGNIRGVCLFAGCNNVKVPQDQNFTTIARKLLKQNVLVVATGCGAGALMRHGFMDPANVDELCGDGLKAVLTAIGEANGLGGPLPPVLHMGSCVDNSRAVALVAALANRLGVDLDRLPVVASAAEAMHEKAVAIGTWAVTIGLPTHIGVLPPITGSLPVTQILTSSVKDITGGYFIVELDPETAADKLLAAINERRAGLGLPW
3FKQ , Knot 157 373 0.83 40 201 351
MGSDKIHHHHHHENLYFQGMKIKVALLDKDKEYLDRLTGVFNTKYADKLEVYSFTDEKNAIESVKEYRIDVLIAEEDFNIDKSEFKRNCGLAYFTGTPGIELIKDEIAICKYQRVDVIFKQILGVYSDMAANVATISGENDKSSVVIFTSPCGGVGTSTVAAACAIAHANMGKKVFYLNIEQCGTTDVFFQAEGNATMSDVIYSLKSRKANLLLKLESCIKQSQEGVSYFSSTKVALDILEISYADIDTLIGNIQGMDNYDEIIVDLPFSLEIEKLKLLSKAWRIIVVNDGSQLSNYKFMRAYESVVLLEQNDDINIIRNMNMIYNKFSNKNSEMLSNISIKTIGGAPRYEHATVRQIIEALTKMEFFEEILQ
6GYO , Knot 209 521 0.83 40 263 492
SMLPEAEVRLGQAGFMQRQFGAMLQPGVNKFSLRMFGSQKAVEREQERVKSAGFWIIHPYSDFRFYWDLTMLLLMVGNLIIIPVGITFFKDENTTPWIVFNVVSDTFFLIDLVLNFRTGIVVEDNTEIILDPQRIKMKYLKSWFMVDFISSIPVDYIFLIVETRIDSEVYKTARALRIVRFTKILSLLRLLRLSRLIRYIHQWEEIFHMTYDLASAVVRIVNLIGMMLLLCHWDGCLQFLVPMLQDFPDDCWVSINNMVNNSWGKQYSYALFKAMSHMLCIGYGRQAPVGMSDVWLTMLSMIVGATCYAMFIGHATALIQSLDSSRRQYQEKYKQVEQYMSFHKLPPDTRQRIHDYYEHRYQGKMFDEESILGELSEPLREEIINFNCRKLVASMPLFANADPNFVTSMLTKLRFEVFQPGDYIIREGTIGKKMYFIQHGVVSVLTKGNKETKLADGSYFGEICLLTRGRRTASVRADTYCRLYSLSVDNFNEVLEEYPMMRRAFETVALDRLDRIGKKNS
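
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the other two columns. The following is a small sketch of that arithmetic only; the helper name `normalized_lz` is hypothetical, and the LZ-complexity values are taken from the table rather than recomputed.

```python
import math


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b(n)) for alphabet size b."""
    return lz / (n / math.log(n, alphabet_size))


# (code, LZ-complexity, length) as listed in the table above.
rows = [("9FPN", 245, 636), ("3FKQ", 157, 373), ("6GYO", 209, 521)]
for code, lz, n in rows:
    print(code, normalized_lz(lz, n))
```

All three ratios come out within 0.01 of the tabulated 0.83.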

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
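
A minimal sketch of these definitions, assuming subwords are contiguous intervals of the word; the function names `subwords` and `subword_count` are illustrative.

```python
def subwords(w: str, n: int) -> set[str]:
    """Return P_w(n): the set of distinct length-n contiguous subwords of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_count(w: str, n: int) -> int:
    """Return p_w(n), the cardinality of P_w(n)."""
    return len(subwords(w, n))


print(sorted(subwords("abab", 2)))  # ['ab', 'ba']
print(subword_count("abab", 2))     # 2
```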

\(|P_{f(9FPN_1)}(2) \setminus P_{f(3FKQ_1)}(2)|=110\), \(|P_{f(3FKQ_1)}(2) \setminus P_{f(9FPN_1)}(2)|=55\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110001000001100110010001100110000110100111001100000101100101110010111011010111101100011111110010100110010011010110011000001001100111100100000111011011110100000111110011110010110101111111000110110000110010100111111000110111001100110111101111100001111010110111010011100111010001000101111011011110001001110011110000100011110111011110000101011011000100110010100101100101100111001001101110010000101101101000111110001110110010100110111001101010110111100010110000100110011000111110101111110011101101001010110111011101011111111110110010000111111111001110100111110110110001111101110111100111111101011100110001001011011101010011001111100001111111

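
The hydrophobic-polar string can be reproduced with a simple letter-to-bit mapping. The page does not state which residues it treats as hydrophobic; the set `HYDROPHOBIC` below is an assumption inferred by comparing the first 50 residues of the 9FPN sequence with the first 50 bits of the string, and `hp_encode` is a hypothetical helper name.

```python
# Assumption: residues mapped to 1, inferred from the start of the printed string.
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


# First 20 residues of the 9FPN sequence listed on this page.
prefix = "MARQNLKSTDRAVQQMLDKA"
print(hp_encode(prefix))  # 11000100000110011001
```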
Pair \(Z_2\) Length of longest common subword
9FPN_1,3FKQ_1 165 4
9FPN_1,6GYO_1 167 4
3FKQ_1,6GYO_1 164 4
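
The \(Z_k\) column can be sketched as the size of the symmetric difference of two subword sets. For the first row, the two one-sided counts given earlier add up as \(110+55=165\). The names `subwords` and `z_k` below are illustrative, not taken from the site.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Toy example: {ab, bc} versus {bc, cd} differ in two subwords.
print(z_k("abc", "bcd", 2))  # 2
```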

Newick tree

 
(9FPN_1:83.33,(3FKQ_1:82,6GYO_1:82):1.33);

Let \(d\) and \(d_1\) be the Otu--Sayood distances of the same names. (The latter makes the 4TYN sequence AAAAAA a close match...)
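
For orientation, here is a rough sketch of the two distances as they are usually stated: with \(c\) a Lempel-Ziv phrase count, \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=c(xy)-c(x)+c(yx)-c(y)\). The phrase-counting routine `lz76_phrases` is one common LZ76 parsing; whether the site uses exactly this variant is an assumption, so the sketch is not claimed to reproduce the tabulated values.

```python
def lz76_phrases(s: str) -> int:
    """Count phrases in an LZ76-style parsing of s (assumed variant)."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs in the text that
        # precedes its final symbol.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1) from the two directional complexity increments."""
    a = lz76_phrases(x + y) - lz76_phrases(x)
    b = lz76_phrases(y + x) - lz76_phrases(y)
    return max(a, b), a + b


d, d1 = otu_sayood("ACDE", "ACDE")
print(d, d1)  # 1 2
```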
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1009 }{\log_{20} 1009}-\frac{373}{\log_{20}373})\approx 168.\)
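
The arithmetic behind this estimate can be checked directly. Note that \(1009 = 636 + 373\), the combined length of the 9FPN_1 and 3FKQ_1 sequences. The function name `expected_distance` and its parameters are hypothetical; the factors 0.85 and 0.8 are taken from the formula as given.

```python
import math


def expected_distance(n_concat: int, n_single: int,
                      alphabet_size: int = 20,
                      factor: float = 0.85 * 0.8) -> float:
    """Evaluate factor * (n1 / log_b n1 - n2 / log_b n2)."""
    def term(n: int) -> float:
        return n / math.log(n, alphabet_size)

    return factor * (term(n_concat) - term(n_single))


value = expected_distance(636 + 373, 373)
print(value)
```

The result lies between 168 and 169, in line with the figure quoted above.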
Status Protein1 Protein2 d d1/2
Query variables 9FPN_1 3FKQ_1 208 166

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
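
A small sketch of this interpolation, under the assumption that "min" and "max" refer to the smaller and larger of the two directional quantities whose maximum is \(d\) and whose sum is \(d_1\). Under that assumption, the query values \(d=208\) and \(d_1/2=166\) imply a minimum of \(2\cdot 166-208=124\). The function name `delta` is illustrative.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# Assumed directional values: 208 is d from the table, 124 is derived from d1/2 = 166.
print(delta(0, 124, 208))    # 208, i.e. d
print(delta(0.5, 124, 208))  # 166.0, i.e. d1/2
```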