CoV2D Browser™

Parikh vectors
6LCV_1 1ZVK_1 3UAC_1 Letter Amino acid
9 4 23 K Lysine
7 1 28 M Methionine
26 22 19 S Serine
58 28 34 V Valine
10 13 21 N Asparagine
60 24 30 D Aspartic acid
0 3 2 C Cysteine
21 11 8 Y Tyrosine
63 12 20 E Glutamic acid
24 5 23 H Histidine
25 17 21 I Isoleucine
15 15 23 Q Glutamine
84 28 48 L Leucine
36 19 26 T Threonine
19 10 16 F Phenylalanine
43 30 33 P Proline
17 5 8 W Tryptophan
107 56 38 A Alanine
79 8 17 R Arginine
67 47 51 G Glycine
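A Parikh vector counts how often each letter occurs in a word, ignoring order. Each column above sums to the length of its sequence (770, 358 and 489). The minimal Python sketch below returns the counts in the table's row order. The names parikh_vector and ROW_ORDER are illustrative and do not come from the CoV2D code.

```python
from collections import Counter

# One-letter amino-acid codes in the row order of the table above.
ROW_ORDER = "KMSVNDCYEHIQLTFPWARG"

def parikh_vector(seq: str) -> list[int]:
    """Count the occurrences of each amino acid, listed in ROW_ORDER."""
    counts = Counter(seq)
    return [counts[a] for a in ROW_ORDER]

v = parikh_vector("MKKA")
# Show only the nonzero entries.
print({a: n for a, n in zip(ROW_ORDER, v) if n})  # {'K': 2, 'M': 1, 'A': 1}
```

Summing any Parikh vector gives back the sequence length, which makes it a quick check on the table.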

>6LCV_1|Chain A|MTSase|Arthrobacter ramosus (1672)
>1ZVK_1|Chains A, B|kumamolisin-As|Alicyclobacillus sendaiensis (192387)
>3UAC_1|Chain A|Blue copper oxidase CueO|Escherichia coli (83333)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6LCV 281 770 0.80 38 244 637
MPASTYRLQISAEFTLFDAARIVPYLHRLGADWLYLSPLLESEPGSSHGYDVVDHSRVDAARGGPEGLAELSRAAHERGMGVVVDIVPNHVGVATPKANRWWWDVLARGQRSEYADYFDIDWEFGGGRLRLPVLGDGPDELDALRVDGDELVYYEHRFPIAEGTGGGTPREVHDRQHYELMSWRRADHDLNYRRFFAVNTLAAVRVEDPRVFDDTHREIGRWIAEGLVDGLRVDHPDGLRAPGDYLRRLAELAQGRPIWVEKIIEGDERMPPQWPIAGTTGYDALAGIDRVLVDPAGEHPLTQIVDEAAGSPRRWAELVPERKRAVARGILNSEIRRVARELGEVAGDVEDALVEIAAALSVYRSYLPFGREHLDEAVAAAQAAAPQLEADLAAVGAALADPGNPAALRFQQTSGMIMAKGVEDNAFYRYPRLTSLTEVGGDPSLFAIDAAAFHAAQRDRAARLPESMTTLTTHDTKRSEDTRARITALAEAPERWRRFLTEVGGLIGTGDRVLENLIWQAIVGAWPASRERLEAYALKAAREAGESTDWIDGDPAFEERLTRLVTVAVEEPLVHELLERLVDELTAAGYSNGLAAKLLQLLAPGTPDVYQGTERWDRSLVDPDNRRPVDFAAASELLDRLDGGWRPPVDETGAVKTLVVSRALRLRRDRPELFTAYHPVTARGAQAEHLIGFDRGGAIALATRLPLGLAAAGGWGDTVVDVGERSLRDELTGREARGAARVAELFADYPVALLVETKLAAALEHHHHHH
1ZVK 144 358 0.78 40 175 331
AAPTAYTPLDVAQAYQFPEGLDGQGQCIAIIELGGGYDEASLAQYFASLGVPAPQVVSVSVDGASNQPTGDPSGPDGHVELDIEVAGALAPGAKFAVYFAPNTDAGFLDAITTAIHDPTLKPSVVSISWGGPEDSWTSAAIAAMNRAFLDAAALGVTVLAAAGNSGSTDGEQDGLYHVDFPAASPYVLACGGTRLVASGGRIAQETVWNDGPDGGATGGGVSRIFPLPAWQEHANVPPSANPGASSGRGVPDLAGNADPATGYEVVIDGEATVIGGTSAVAPLFAALVARINQKLGKAVGYLNPTLYQLPADVFHDITEGNNDIANRAQIYQAGPGWDPCTGLGSPIGVRLLQALLPS
3UAC 200 489 0.84 40 246 459
AERPTLPIPDLLTTDARNRIQLTIGAGQSTFGGKTATTWGYNGNLLGPAVKLQRGKAVTVDIYNQLTEETTLHWHGLEVPGEVDGGPQGIIPPGGKRSVTLNVDQPAATCWFHPHQHGKTGRQVAMGLAGLVVIEDDEILKLMLPKQWGIDDVPVIVQDKKFSADGQIDYQLDVMTAAVGWFGDTLLTNGAIYPQHAAPRGWLRLRLLNGCNARSLNFATSDNRPLYVIASDGGLLPEPVKVSELPVLMGERFEVLVEVNDNKPFDLVTLPVSQMGMAIAPFDKPHPVMRIQPIAISASGALPDTLSSLPALPSLEGLTVRKLQLSMDPMLDMMGMQMLMEKYGDQAMAGMDHSQMMGHMGHGNMNHMNHGGKFDFHHANKINGQAFDMNKPMFAAAKGQYERWVISGVGDMMLHPFHIHGTQFRILSENGKPPAAHRAGWKDTVKVEGNVSEVLVKFNHDAPKEHAYMAHSHLLEHQDTGMMLGFTVG
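The normalized column divides LZ(w) by \(n/\log_{20} n\), which is roughly the LZ-complexity expected for a random word of length n over 20 letters. The Python sketch below recomputes that column from the LZ and length values in the table. The table's values appear to be truncated rather than rounded: 281/(770/\(\log_{20}770\)) ≈ 0.8097 is shown as 0.80. The function name normalized_lz is hypothetical.

```python
import math

def normalized_lz(lz: int, n: int) -> float:
    """LZ(w) divided by n / log_20(n), as in the table header."""
    return lz / (n / math.log(n, 20))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"6LCV": (281, 770), "1ZVK": (144, 358), "3UAC": (200, 489)}
for code, (lz, n) in rows.items():
    r = normalized_lz(lz, n)
    # Truncate (not round) to two decimals, which matches the table.
    print(code, f"{r:.4f}", math.floor(100 * r) / 100)
```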

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
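The minimal Python sketch below computes \(P_w(n)\) and \(p_w(n)\) for a short word. It assumes \(P_w(n)\) collects the distinct contiguous subwords of length exactly \(n\), which is the usual subword-complexity convention. The FASTA lookup \(f(c)\) is not implemented here. The names subwords and p are illustrative.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct contiguous subwords of w of length n."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|, the subword complexity of w at length n."""
    return len(subwords(w, n))

w = "ABAAB"
print(sorted(subwords(w, 2)), [p(w, n) for n in (1, 2, 3)])
# ['AA', 'AB', 'BA'] [2, 3, 3]
```

If \(n\) exceeds the word length, the range is empty and \(p_w(n)=0\).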

\(|P_{f(6LCV_1)}(2) \setminus P_{f(1ZVK_1)}(2)|=106\) and \(|P_{f(1ZVK_1)}(2) \setminus P_{f(6LCV_1)}(2)|=37\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(6LCV_1),f(1ZVK_1))=106+37=143\).

Hydrophobic-polar version of Sequence 1 (6LCV_1; 1 = hydrophobic, 0 = polar):
11100001010101011011011101001110110101110001100010011000010110111011101001100011111101110011110101001110111010000010010101011110101111101100101101010011000001111010111010010000000110100100010000111100111101001011000000110111011101101001011011100100110110101111001101000111011111001001111100111011100110011001110100110111000011101110001001100110111010011101111101000011110001001111101111010101111111110110111101000011111011000110001010010011101011110111101100001101100100100000000000010101110110010011001111110100110011101111111100001010110110011000011010111000100110111001110011001100101110001111011011111010100100010001101000011011110011001011101110001110011100110100001011010011010110100111100111111100111111111111001101100010001010010111011011100111111000111110000000
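The binary string above can be reproduced with a two-class alphabet map. Comparing the first residues of 6LCV_1 with the string shows that A, F, G, I, L, M, P, V and W map to 1, and D, E, H, Q, R, S, T and Y map to 0. The sketch below uses that inferred hydrophobic set and treats every other letter (C, K, N) as polar. That treatment is an assumption, not something the page states. The names HYDROPHOBIC and hp_string are hypothetical.

```python
# Hydrophobic class (1) inferred from the 6LCV_1 string above.
# Every other letter, including C, K and N, is assumed polar (0).
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp_string(seq: str) -> str:
    """Map an amino-acid sequence to its hydrophobic-polar 0/1 string."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in seq)

print(hp_string("MPASTYRLQI"))  # 1110000101
```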
Pair \(Z_2\) Length of longest common substring (contiguous subword)
6LCV_1,1ZVK_1 143 5
6LCV_1,3UAC_1 140 4
1ZVK_1,3UAC_1 161 4
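The sketch below computes both columns of the table for toy strings. It reads the second column as the longest common contiguous subword. A non-contiguous common subsequence would be far longer: 6LCV_1 and 1ZVK_1 share at least 56 A's. The table is also consistent with the \(p_w(2)\) column: 244 − 106 = 175 − 37 = 138, the number of length-2 subwords the two sequences share. All function names here are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the distinct contiguous subwords of w of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    # The symmetric difference has exactly these two disjoint parts.
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_substring_length(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common run ending at x[i-2], y[j-1]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_k("ABAB", "ABBA", 2), longest_common_substring_length("ABCDEF", "XBCDY"))
# 1 3
```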

Newick tree

 
(
	1ZVK_1:78.07,
	(
		6LCV_1:70,3UAC_1:70
	):8.07
);

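Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)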
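The Python sketch below pairs a textbook LZ76 phrase count (the Kaspar–Schuster style "exhaustive history" parsing) with the two Otu–Sayood distances. It is not claimed to reproduce the LZ values on this page exactly. LZ76 implementations differ in small details, such as how an unfinished final phrase is handled. The function names are hypothetical.

```python
def lz76(s: str) -> int:
    """Number of phrases in the LZ76 (exhaustive history) parsing of s.

    Each phrase is extended while it still occurs earlier in s
    (overlap allowed); an unfinished final phrase is counted too.
    """
    i, c, n = 0, 0, len(s)
    while i < n:
        l = 1
        while i + l <= n and s[i:i + l] in s[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

def otu_sayood_d(x: str, y: str) -> int:
    """d(x, y) = max(LZ(xy) - LZ(x), LZ(yx) - LZ(y))."""
    return max(lz76(x + y) - lz76(x), lz76(y + x) - lz76(y))

def otu_sayood_d1(x: str, y: str) -> int:
    """d1(x, y) = LZ(xy) - LZ(x) + LZ(yx) - LZ(y)."""
    return lz76(x + y) - lz76(x) + lz76(y + x) - lz76(y)

print(lz76("0001101001000101"))  # 6 phrases: 0|001|10|100|1000|101
print(otu_sayood_d("AAAA", "BBBB"), otu_sayood_d1("AAAA", "BBBB"))  # 1 2
```

Because the parsing of xy agrees with that of x up to x's last phrase, both increments are nonnegative. It follows that \(d \le d_1\).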
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{1128}{\log_{20} 1128}-\frac{358}{\log_{20}358}\right)\approx 202\), where \(1128=770+358\) is the combined length of 6LCV_1 and 1ZVK_1.
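The arithmetic behind this estimate is checked in the sketch below. It assumes the subtracted term always uses the shorter sequence, as 358 does here. The factor 0.8 plausibly matches the normalized LZ ratios in the table (0.78 to 0.84). The origin of the factor 0.85 is not stated, so both factors are kept as parameters. The names lz_scale and expected_distance are hypothetical.

```python
import math

def lz_scale(n: int) -> float:
    """n / log_20(n), the normalizer used for LZ complexity above."""
    return n / math.log(n, 20)

def expected_distance(n_x: int, n_y: int, a: float = 0.85, b: float = 0.8) -> float:
    """Rough estimate a * b * (scale(|xy|) - scale(shorter length))."""
    return a * b * (lz_scale(n_x + n_y) - lz_scale(min(n_x, n_y)))

est = expected_distance(770, 358)
print(round(est, 1), math.floor(est))  # 202.9 202
```

The exact value is about 202.9, so the figure 202 above is the truncated result.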
Status Protein1 Protein2 d d1/2
Query variables 6LCV_1 1ZVK_1 241 177
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min\{A,B\} + (1-\alpha) \max\{A,B\}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2, \end{cases} \]
where \(A=\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(B=\mathrm{LZ}(yx)-\mathrm{LZ}(y)\).
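The interpolation can be checked against the query row above. This reading treats \(d\) as the larger of the two one-sided LZ increments and \(d_1\) as their sum. Under it, \(d=241\) and \(d_1/2=177\) for (6LCV_1, 1ZVK_1) imply increments 241 and \(354-241=113\). The sketch below evaluates \(\delta\) at a few values of \(\alpha\). The name delta is illustrative.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# One-sided increments for (6LCV_1, 1ZVK_1) implied by d = 241, d1/2 = 177.
a, b = 241, 2 * 177 - 241  # b = 113
print(delta(a, b, 0), delta(a, b, 0.5), delta(a, b, 1))  # 241 177.0 113
```

With \(\alpha=1\) the same formula gives the smaller increment.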