CoV2D Browser™

Parikh vectors
9GMA_1 2LWC_1 1ULF_1 Letter Amino acid
64 0 5 T Threonine
9 0 1 W Tryptophan
32 1 1 M Methionine
93 0 12 S Serine
36 1 6 Y Tyrosine
138 0 8 R Arginine
132 0 3 Q Glutamine
54 0 12 I Isoleucine
155 0 8 E Glutamic acid
61 2 9 G Glycine
33 0 4 H Histidine
176 0 11 L Leucine
55 0 7 K Lysine
140 0 12 A Alanine
47 0 14 N Asparagine
87 0 5 D Aspartic acid
87 0 10 V Valine
9 0 0 C Cysteine
41 1 10 F Phenylalanine
33 0 12 P Proline
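
As an illustration of how a Parikh vector like the columns above can be obtained, here is a minimal sketch that counts letter occurrences in a sequence. The function name `parikh_vector` and the alphabet ordering are assumptions made for this example, not something taken from the CoV2D code.

```python
from collections import Counter

# Standard 20-letter amino-acid alphabet in alphabetical order (an assumption;
# the table above lists the letters in a different order).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq, alphabet=ALPHABET):
    """Return {letter: number of occurrences in seq} for every letter of alphabet."""
    counts = Counter(seq)
    return {letter: counts.get(letter, 0) for letter in alphabet}

vec = parikh_vector("YGGFM")  # the 2LWC_1 sequence
# Show only the letters that actually occur.
print({letter: n for letter, n in vec.items() if n})
```

For `YGGFM` the nonzero entries are G = 2 and F, M, Y = 1 each, which agrees with the 2LWC_1 column.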

>9GMA_1|Chains A, B, K[auth O], L[auth P]|Chromosome partition protein MukB|Photorhabdus thracensis (230089)
>2LWC_1|Chain A|Met-enkephalin|Homo sapiens (9606)
>1ULF_1|Chains A, B|galectin-2|Coprinopsis cinerea (5346)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9GMA , Knot 503 1482 0.82 40 314 1151
MIERGKFRSLTLVNWNGFFARTFDLDELVTTLSGGNGAGKSTTMAAFVTALIPDLTLLHFRNTTEAGATSGSRDKGLHGKLRAGVCYSTLDVINSRHQRVVVGVRLQQVAGRDRKVDIKPFMIQGLPTAIQPTQLLTENVGERQARVLPLNELKDRLDEMEGVQFKQFNSITDYHAQMFDLGVIPKRLRSASDRSKFYRLIEASLYGGISSAITRSLRDYLLPENSGVRKAFQDMEAALRENRITLEAIRVTQSDRDLFKHLITEATSYVSADYMRHANERRTHLDEALALRGELFGSHKQLATEQYRHVEMARELAEQSGASSDLETDHQAASDHLNLVQTAMRQQEKIDRYQVDLEELSYRLEEQTDVVEEAGELQAEYEARTEATEQEVDELKSQLADYQQALDVQQTRAIQYQQALQALERARELCRLPDLSVDNAEEWLETFQAKEQQATEALLALEQKLSVADAAHNQFEQAYQLVKNIVGETSRSEAWQSARELLRDWPSQRHLADRVQPLRMRLSELEQRLNNQQNAERLLSEFCKRQGRQYQAEDLEALQNELEARQEALSLSVNEGGERRMEMRQELEQLKQKIQSLTARAPVWLAAQDTLNQLCEQSGETLASSNDVTEYMQQLLEREREATVERDEVAAQKRELEKQIERLSQPSGAEDSRMIALAERFGGVLLSEIYDDITIDDAPYFSALYGPARHGIVVPDLSLVRPHLETLEDCPEDLYLIEGDPQSFDDSVFNAEEQTNAVLVKSSDRQWRYSRYPELPLFGRAARENRLEALNLERDALAERYATLSFDVQKIQRAHQAFSQFVGKHLSVAFDTDPEAEIRELRQRHTELEREVSRFEDQTQQQRQQYAQAKESLTTLNRLIPQVTLLLDETLIDRVEEVREEMDEAQEAARFLQQHGSALTKLEPMVAVLQSDPQQHEQLQQDYETAKHSQHQAKQQAFALVEIVQRRVHFSYSDSAGMLSENADLNDKLRQRLEHAESDRSRAREQLRQQQAQYSQFNQVLASLKSSYETKQDMLKELLQEMKDIGVQADANAEMRARERRDRLHEALSVNRSRVNQLEKQIAFCEAEMENVQKKLRKLERDYYQIREQVVSAKAGWCAVMRMVKDNGVERRLHRRELAYMEGGALRSMSDKALGALRLAVADNEHLRDALRLSEDPKRPERKVQFFIAVYQHLRERIRQDIIRTDDPVDAIEQMEIELARLTEELTAREQKLAISSKSVANIIRKTIQREQNRIRMLNQGLQAVSFGQVRGVRLNVNVRESHAILLDVLSEQQEQHQDLFNSQRLTFSEAMAKLYQRLNPQVDMGQRLPQTIGEELLDYRNYLELDVEVNRGSDGWLKAESGALSTGEAIGTGMSILVMVVQSWEEESRRLRGKDISPCRLLFLDEAARLDAKSIATLFELCERLQMQLIIAAPENISPEKGTTYKLVRKVFKNHEHVHVVGLRGFGQDAPATQLISDVTA
2LWC , Knot 4 5 0.42 8 4 3
YGGFM
1ULF , Knot 71 150 0.79 38 114 143
MLYHLFVNNQVKLQNDFKPESVAAIRSSAFNSKGGTTVFNFLSAGENILLHISIRPGENVIVFNSRLKNGAWGPEERIPYAEKFRPPNPSITVIDHGDRFQIRFDYGTSIYYNKRIKENAAAIAYNAENSLFSSPVTVDVHGLLPPLPPA
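
The page does not say which Lempel-Ziv variant underlies \(\mathrm{LZ}(w)\). The sketch below assumes the LZ76 exhaustive-history parsing and then applies the normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the table header. The function names are hypothetical, and agreement with the table has only been checked here for the short sequence `YGGFM`.

```python
import math

def lz76_complexity(s):
    """Number of phrases in an LZ76-style parsing of s (assumed variant).

    Each phrase is the shortest prefix of the remaining text that does not
    already occur as a substring of the text preceding its last symbol.
    """
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

lz = lz76_complexity("YGGFM")   # parses as Y | G | GF | M
ratio = normalized_lz(lz, 5)
print(lz, ratio)
```

For `YGGFM` this gives 4 phrases and a ratio of about 0.43; the table appears to truncate rather than round, showing 0.42.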

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
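
A minimal sketch of the subword sets, reading the argument of \(P_w\) as the subword length. The helper names `distinct_subwords` and `subword_complexity` are made up for this example.

```python
def distinct_subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(distinct_subwords(w, k))

w = "YGGFM"  # the 2LWC sequence
print(sorted(distinct_subwords(w, 2)), subword_complexity(w, 3))
```

For `YGGFM` the length-2 subwords are YG, GG, GF and FM, and there are three subwords of length 3, in line with the values 4 and 3 in the 2LWC row.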

\(|P_{f(9GMA_1)}(2) \setminus P_{f(2LWC_1)}(2)|=310\), \(|P_{f(2LWC_1)}(2) \setminus P_{f(9GMA_1)}(2)|=0\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be a LZ76 style (set of subwords) Jaccard distance numerator for \(P(k)\).Hydrophobic-polar version of Sequence 1:110010100101101011110010100110010110111000011111011110101101000001110010000110101011100001011000000111110100111000010101111011101101001100011000101111001000100101101001001000010110111110010010000010011010101110011000100011100011001100101110000101011010000001100110010001010010010000001001111010111000011000000101100110001100010000011000101100110000010000101001000100000110011010100010001000010010001100001101000011000011011001001001101010010011001010000100111110001011011000100100110011100000011001001100110000110010110101001000100000100110010000100001001011000101000110101001100010100010010001001010111111100010010000100110000100010011000001010000111000010001001001011000011111001111110010001010011010110111001111101011010100100010010110101001000110100000111100000010000010111110110000101101000111000101010100100100110011100101110001010100100000010001001000000000001010001001001110101110001100100100010010011011000101100101111110001000001000000100000010001111101100010100000111100010100010001001000000100010000100001001110100000000011001100100111010101010100000010011010000100100011100101001000100100000010001101011101110110001100010000110101111001000111110111100001001101000100100010111110001000100011000011011001010110100010100001110000110110001000000101100110110110101101010100001111011000000000110000101001110100010101011001100110011000001010101001001110100111001011101101111110010000001010010100111100110101001101101000101011111100101001000011001100000101111011100111001100101
Pair \(Z_2\) Length of longest common subword
9GMA_1,2LWC_1 310 3
9GMA_1,1ULF_1 218 3
2LWC_1,1ULF_1 114 2
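
The two columns of the pair table can be sketched as follows. \(Z_k\) is computed as the size of the symmetric difference of the two length-\(k\) subword sets, and the longest-common column is read here as the longest common contiguous subword, which is an assumption based on the tabulated values. All function names are hypothetical.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|, the symmetric difference size."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous block shared by x and y (row-by-row DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block that ended just before both positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy inputs: YGGFM against a short fragment of 9GMA_1 that contains YGG.
print(z_distance("YGGFM", "SLYGGISS", 2), longest_common_subword("YGGFM", "SLYGGISS"))
```

The dynamic program uses time proportional to the product of the two lengths, which is adequate for sequences of a few thousand residues.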

Newick tree

 
(9GMA_1:15.17,(1ULF_1:57,2LWC_1:57):94.17);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{1487}{\log_{20} 1487}-\frac{5}{\log_{20} 5}\right)\approx 408.\)
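
The arithmetic behind the figure 408 can be checked with a few lines. The constants 0.85 and 0.8 and the lengths 1487 and 5 are taken from the formula above; the helper name `lz_scale` is made up for this example.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The quantity n / log_{alphabet_size}(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

expected = 0.85 * 0.8 * (lz_scale(1487) - lz_scale(5))
print(round(expected, 1))
```

This evaluates to roughly 408.4, so the value 408 quoted above is the result rounded to the nearest integer.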
Status Protein1 Protein2 d d1/2
Query variables 9GMA_1 2LWC_1 501 251.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
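
As a small numerical check of the interpolation, the sketch below plugs in the query values \(d=501\) and \(d_1/2=251.5\) for the pair 9GMA_1, 2LWC_1. The value 2 for the min term is not stated on the page; it is inferred here from those two numbers under the formula above, and the function name `delta` is an assumption.

```python
def delta(alpha, lo, hi):
    """alpha * min + (1 - alpha) * max, with lo = min term and hi = max term."""
    return alpha * lo + (1 - alpha) * hi

hi = 501             # d, the alpha = 0 case (max term)
lo = 2 * 251.5 - hi  # inferred min term, since d1/2 = (min + max) / 2

print(delta(0, lo, hi), delta(0.5, lo, hi))
```

With these inputs the two cases return 501 and 251.5, matching the \(d\) and \(d_1/2\) columns of the query row.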