CoV2D Browser™

Parikh vectors
8OJQ_1 6EOB_1 7PJO_1 Letter Amino acid
71 36 21 A Alanine
63 36 14 D Aspartic acid
9 2 1 C Cysteine
33 5 13 H Histidine
54 42 24 K Lysine
23 8 10 M Methionine
22 26 6 N Asparagine
54 44 13 G Glycine
48 23 10 P Proline
28 9 8 Y Tyrosine
57 47 6 V Valine
36 21 9 Q Glutamine
85 39 13 E Glutamic acid
52 35 10 I Isoleucine
36 23 6 F Phenylalanine
50 44 15 T Threonine
71 25 12 R Arginine
112 37 25 L Leucine
56 19 14 S Serine
14 1 7 W Tryptophan

>8OJQ_1|Chains A, B|Phosphoenolpyruvate carboxylase 1|Arabidopsis thaliana (3702)
>6EOB_1|Chain A|78 kDa glucose-regulated protein|Cricetulus griseus (10029)
>7PJO_1|Chains A[auth AAA], B[auth BBB]|CPR-C4|Candidate division CPR1 (1618338)
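The Parikh vector of a sequence records how many times each letter occurs, which is what each column of the table above lists. A minimal sketch, assuming the standard 20-letter amino acid alphabet; the function name `parikh_vector` is illustrative and not taken from the site:

```python
from collections import Counter

# Standard one-letter amino acid codes (assumed alphabet).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq, alphabet=ALPHABET):
    """Return a dict mapping each letter to its number of occurrences in seq."""
    counts = Counter(seq)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# First 12 residues of the 8OJQ_1 sequence shown on this page.
vec = parikh_vector("MHHHHHHMANRK")
print(vec["H"], vec["M"], vec["A"])
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick consistency check against the Length column.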
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8OJQ , Knot 360 974 0.84 40 315 863
MHHHHHHMANRKLEKMASIDVHLRQLVPGKVSEDDKLVEYDALLLDRFLDILQDLHGEDLRETVQELYEHSAEYEGKHEPKKLEELGSVLTSLDPGDSIVIAKAFSHMLNLANLAEEVQIAYRRRIKKLKKGDFVDESSATTESDLEETFKKLVGDLNKSPEEIFDALKNQTVDLVLTAHPTQSVRRSLLQKHGRIRDCLAQLYAKDITPDDKQELDEALQREIQAAFRTDEIKRTPPTPQDEMRAGMSYFHETIWKGVPKFLRRVDTALKNIGIEERVPYNAPLIQFSSWMGGDRDGNPRVTPEVTRDVCLLARMMAATMYFNQIEDLMFEMSMWRCNDELRARADEVHANSRKDAAKHYIEFWKSIPTTEPYRVILGDVRDKLYHTRERAHQLLSNGHSDVPVEATFINLEQFLEPLELCYRSLCSCGDRPIADGSLLDFLRQVSTFGLSLVRLDIRQESDRHTDVLDAITTHLDIGSYREWSEERRQEWLLSELSGKRPLFGSDLPKTEEIADVLDTFHVIAELPADSFGAYIISMATAPSDVLAVELLQRECRVKQPLRVVPLFEKLADLEAAPAAVARLFSVDWYKNRINGKQEVMIGYSDSGKDAGRLSAAWQLYKAQEELVKVAKEYGVKLTMFHGRGGTVGRGGGPTHLAILSQPPDTINGSLRVTVQGEVIEQSFGEEHLCFRTLQRFTAATLEHGMRPPISPKPEWRALLDEMAVVATEEYRSVVFQEPRFVEYFRLATPELEYGRMNIGSRPSKRKPSGGIESLRAIPWIFAWIQTRFHLPVWLGFGSAIRHVIEKDVRNLHMLQDMYQHWPFFRVTIDLIEMVFAKGDPGIAALYDKLLVSEELWPFGEKLRANFEETKKLILQTAGHKDLLEGDPYLKQRLRLRDSYITTLNVCQAYTLKRIRDPSYHVTLRPHISKEIAESSKPAKELIELNPTSEYAPGLEDTLILTMKGIAAGLQNTG
6EOB , Knot 210 522 0.84 40 230 484
GTVVGIDLGTTYSCVGVFKNGRVEIIANDQGNRITPSYVAFTPEGERLIGDAAKNQLTSNPENTVFDAKRLIGRTWNDPSVQQDIKFLPFKVVEKKTKPYIQVDIGGGQTKTFAPEEISAMVLTKMKETAEAYLGKKVTHAVVTVPAYFNDAQRQATKDAGTIAGLNVMRIINEPTAAAIAYGLDKREGEKNILVFDLGGGAFDVSLLTIDNGVFEVVATNGDTHLGGEDFDQRVMEHFIKLYKKKTGKDVRKDNRAVQKLRREVEKAKRALSSQHQARIEIESFFEGEDFSETLTRAKFEELNMDLFRSTMKPVQKVLEDSDLKKSDIDEIVLVGGSTRIPKIQQLVKEFFNGKEPSRGINPDEAVAYGAAVQAGVLSGDQDTGDLVLLDVCPLTLGIETVGGVMTKLIPRNTVVPTKKSQIFSTASDNQPTVTIKVYEGERPLTKDNHLLGTFDLTGIPPAPRGVPQIEVTFEIDVNGILRVTAEDKGTGNKNKITITNDQNRLTPEEIERMVNDAEKFA
7PJO , Knot 109 237 0.83 40 173 231
GHMASMTGGQQMGRGSMHYKAQLQKLLTTEEKKILARLSTPQKIQDFLDTIKNKDLAEGEHTMWSPRAVLKHKHAHCMEGAMLAALALAYHGHSPLLMDLQTTDEDEDHVVALFKIDGHWGAISKTNHPVLRYRDPIYKSVRELAMSYFHEYFIWWTKKNGGKKTLRAYSNPFDLTRYKPERWVIATGDLDWLAEALDDSKHFPILNKKMQKQLRPASRIETKAASLSEWPKRKTNS
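The LZ-complexity and normalized columns can be sketched as follows. This assumes the 1976 Lempel-Ziv exhaustive-history parsing, in which each phrase is the shortest prefix of the remaining text that has no earlier occurrence; the site's exact variant is not stated, so the function is not guaranteed to reproduce the tabulated values 360, 210 and 109. The names `lz76_complexity` and `normalized_lz` are illustrative.

```python
import math

def lz76_complexity(w):
    """Number of phrases in the LZ76 exhaustive parsing of w (simple quadratic version)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it already occurs starting before position i
        # (the earlier occurrence may overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ, n) pairs taken from the table above.
for lz, n in [(360, 974), (210, 522), (109, 237)]:
    print(round(normalized_lz(lz, n), 3))
```

The three ratios come out near 0.849, 0.840 and 0.839, so the tabulated 0.84, 0.84 and 0.83 appear to be truncated rather than rounded to two decimals.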

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence of the entry with 4-symbol Protein Data Bank code \(c\).
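As a small illustration of these definitions, the set of length-n subwords and its cardinality can be computed with a sliding window. The function names are hypothetical and the input is only a short prefix of the 8OJQ_1 sequence:

```python
def subwords(w, n):
    """Set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Number of distinct length-n subwords of w."""
    return len(subwords(w, n))

w = "MHHHHHHM"  # first 8 residues of 8OJQ_1
print(sorted(subwords(w, 2)))    # ['HH', 'HM', 'MH']
print(subword_complexity(w, 1))  # 2
```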

\(|P_{f(8OJQ_1)}(2) \setminus P_{f(6EOB_1)}(2)|=108\), \(|P_{f(6EOB_1)}(2) \setminus P_{f(8OJQ_1)}(2)|=23\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for example, \(Z_2(f(8OJQ_1),f(6EOB_1))=108+23=131\).

Hydrophobic-polar version of Sequence 1:
10000001100010011010101001111010000011000111100110110010100100010010000100010001001001101100101100111101100110110110010110000100100101100001000001000100111010001001101100001011101010001000110001010001101010010100000100110001011100001000110100010111001000110111011001001100111000110011110100111100010101010100010111011110101001001110101100000101010010100000110001011001100010011110100010000001001100100011101011010011011010000100010011101011011001001110110101000000000110110001011000010000000111001010011110011000011011001011101110011101101101100111101100000100110111110011010111111101101010000101000111100001001101011101001000110110001101011010110110111100111100110010101010101011000110001010010010110100110111010101011100111110000001110010110010110101001010110010000101110010111111111000101111111101100110001001011001000111101010110111101011111100011100011111001010100000111001100011010101000101000010010100100100100100010101010001100001100110101000011110001110101111110001
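The hydrophobic-polar string can be reproduced, at least on its opening residues, by mapping each residue to 1 (hydrophobic) or 0 (polar). The set used below is an assumption: the start of the string above is consistent with M, A, L, I, V, P, G and F being coded 1 and H, N, R, K, E, S, D, Q, Y and T being coded 0, while the coding of C and W is a guess based on the common HP convention.

```python
# Assumed hydrophobic residues; C and W are not confirmed by the checked prefix.
HYDROPHOBIC = set("ACFGILMPVW")

def hp_encode(seq):
    """Map a protein sequence to a binary hydrophobic (1) / polar (0) string."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in seq)

# First 31 residues of the 8OJQ_1 sequence.
prefix = "MHHHHHHMANRKLEKMASIDVHLRQLVPGKV"
print(hp_encode(prefix))  # 1000000110001001101010100111101
```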
Pair \(Z_2\) Length of longest common subword (contiguous)
8OJQ_1,6EOB_1 131 4
8OJQ_1,7PJO_1 188 6
6EOB_1,7PJO_1 171 4
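Both columns of the pair table can be sketched in a few lines. The first function follows the definition of \(Z_k\) given above. The second computes the longest common contiguous subword; reading the last column that way is an assumption, based on the small values (4 to 6) for sequences of several hundred residues. The inputs are toy strings, not the full proteins.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Size of the symmetric difference of the length-k subword sets of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("MHHHM", "HHMAN", 2))                          # 3
print(longest_common_subword_length("MHHHM", "HHMAN"))  # 3
```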

Newick tree

(7PJO_1:96.61,(8OJQ_1:65.5,6EOB_1:65.5):31.11);

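Let d be the Otu--Sayood distance \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\), where \(c\) is the LZ-complexity and \(SQ\) is the concatenation of \(S\) and \(Q\).
Let d1 be the Otu--Sayood distance \(d_1(S,Q)=(c(SQ)-c(S))+(c(QS)-c(Q))\). (This makes the 4TYN sequence AAAAAA a close match...)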
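A minimal sketch of the two distances, assuming the definitions from Otu and Sayood's 2003 paper (d is the larger of the two complexity increments obtained by appending one sequence to the other, d1 is their sum) and assuming an LZ76 exhaustive parsing for the complexity c. The site's actual implementation is not shown on this page, so the function names and the parsing variant are illustrative.

```python
def lz76_complexity(w):
    """Number of phrases in the LZ76 exhaustive parsing of w (simple quadratic version)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it already occurs starting before position i.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def otu_sayood(s, q, c=lz76_complexity):
    """Return (d, d1) for sequences s and q under the assumed definitions."""
    inc_sq = c(s + q) - c(s)  # new phrases needed for q after seeing s
    inc_qs = c(q + s) - c(q)  # new phrases needed for s after seeing q
    return max(inc_sq, inc_qs), inc_sq + inc_qs

print(otu_sayood("ACDEFG", "ACDEFG"))  # (1, 2)
print(otu_sayood("AAAAAA", "AAAAAA"))  # (0, 0)
```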
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1496 }{\log_{20} 1496}-\frac{522}{\log_{20}522})\approx 246\), where \(1496=974+522\) is the length of the concatenation of the two query sequences.
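The arithmetic in this estimate can be checked directly. The sketch below only evaluates the expression as written; the factors 0.85 and 0.8 are taken from the page without further interpretation, and the helper name `lz_scale` is illustrative.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The quantity n / log_20(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

# 1496 = 974 + 522 is the combined length of 8OJQ_1 and 6EOB_1.
expected = 0.85 * 0.8 * (lz_scale(1496) - lz_scale(522))
print(expected)
```

The result lies between 246 and 247, in line with the value 246 quoted above.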
Status Protein1 Protein2 d d1/2
Query variables 8OJQ_1 6EOB_1 317 240

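In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], with \(\min\) and \(\max\) denoting the smaller and larger of the two one-sided distances between the query proteins,
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]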
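The interpolation can be checked against the query row above. The sketch assumes that d is the larger of two one-sided quantities and d1 is their sum, so that d = 317 and d1/2 = 240 give a smaller quantity of 2 · 240 − 317 = 163. That value is derived under this assumption and is not printed on the page.

```python
def delta(alpha, a, b):
    """Convex combination alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

larger = 317                 # d from the query row
smaller = 2 * 240 - larger   # 163, derived assuming d1 = smaller + larger

print(delta(0, smaller, larger))    # 317, equal to d
print(delta(0.5, smaller, larger))  # 240.0, equal to d1/2
```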