CoV2D Browser™

Parikh vectors
3SAF_1 4OHO_1 4BED_1 Letter Amino acid
14 21 110 H Histidine
24 41 75 I Isoleucine
5 6 27 W Tryptophan
23 51 116 A Alanine
27 26 70 R Arginine
26 27 120 D Aspartic acid
17 28 98 F Phenylalanine
19 46 83 V Valine
32 39 59 Q Glutamine
25 34 89 K Lysine
11 17 30 M Methionine
15 38 89 T Threonine
16 8 69 Y Tyrosine
7 9 27 C Cysteine
30 28 80 P Proline
20 47 93 S Serine
56 69 149 L Leucine
18 16 78 N Asparagine
33 42 111 E Glutamic acid
10 45 91 G Glycine
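A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch of that count; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative names, not part of the CoV2D site, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# One-letter amino acid codes in the row order of the table above.
AMINO_ACIDS = "HIWARDFVQKMTYCPSLNEG"


def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in `sequence`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    # First twelve residues of the 3SAF_1 sequence, as a small example.
    prefix = "SIIRPQLKFREK"
    for letter, count in zip(AMINO_ACIDS, parikh_vector(prefix)):
        print(letter, count)
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick sanity check against the Length column reported for each protein.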

>3SAF_1|Chains A, B|Exosome component 10|Homo sapiens (9606)
>4OHO_1|Chains A, B|Glucokinase regulatory protein|Homo sapiens (9606)
>4BED_1|Chains A, C|HEMOCYANIN KLH1|MEGATHURA CRENULATA (55429)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3SAF , Knot 174 428 0.82 40 227 399
SIIRPQLKFREKIDNSNTPFLPKIFIKPNAQKPLPQALSKERRERPQDRPEDLDVPPALADFIHQQRTQQVEQDMFAHPYQYELNHFTPADAVLQKPQPQLYRPIEETPCHFISSLDELVELNEKLLNCQEFAVNLEHHSYRSFLGLTCLMQISTRTEDFIIDTLELRSDMYILNESLTDPAIVKVFHGADSDIEWLQKDFGLYVVNMFDTHQAARLLNLGRHSLDHLLKLYCNVDSNKQYQLADWRIRPLPEEMLSYARDDTHYLLYIYDKMRLEMWERGNGQPVQLQVVWQRSRDICLKKFIKPIFTDESYLELYRKQKKHLNTQQLTAFQLLFAWRDKTARREDESYGYVLPNHMMLKIAEELPKEPQGIIACCNPVPPLVRQQINEMHLLIQQAREMPLLKSEVAAGVKKSGPLPSAERLENVL
4OHO , Knot 252 638 0.85 40 270 590
MAHHHHHHDEVDMPGTKRFQHVIETPEPGKWELSGYEAAVPITEKSNPLTQDLDKADAENIVRLLGQCDAEIFQEEGQALSTYQRLYSESILTTMVQVAGKVQEVLKEPDGGLVVLSGGGTSGRMAFLMSVSFNQLMKGLGQKPLYTYLIAGGDRSVVASREGTEDSALHGIEELKKVAAGKKRVIVIGISVGLSAPFVAGQMDCCMNNTAVFLPVLVGFNPVSMARNDPIEDWSSTFRQVAERMQKMQEKQKAFVLNPAIGPEGLSGSSRMKGGSATKILLETLLLAAHKTVDQGIAASQRCLLEILRTFERAHQVTYSQSPKIATLMKSVSTSLEKKGHVYLVGWQTLGIIAIMDGVECIHTFGADFRDVRGFLIGDHSDMFNQKAELTNQGPQFTFSQEDFLTSILPSLTEIDTVVFIFTLDDNLTEVQTIVEQVKEKTNHIQALAHSTVGQTLPIPLKKLFPSIISITWPLLFFEYEGNFIQKFQRELSTKWVLNTVSTGAHVLLGKILQNHMLDLRISNSKLFWRALAMLQRFSGQSKARCIESLLRAIHFPQPLSDDIRAAPISCHVQVAHEKEQVIPIALLSLLFRCSITEAQAHLAAAPSVCEAVRSALAGPGQKRTADPLEILEPDVQG
4BED , Knot 554 1664 0.82 40 350 1298
ENLVRKSVEHLTQEETLDLQAALRELQMDSSSIGFQKIAAAHGAPASCVHKDTSIACCIHGMPTFPHWHRAYVVHMERALQTKRRTSGLPYWDWTEPITQLPSLAADPVYIDSQGGKAHTNYWYRGNIDFLDKKTNRAVDDRLFEKVKPGQHTHLMESVLDALEQDEFCKFEIQFELAHNAIHYLVGGKHDYSMANLEYTAYDPIFFLHHSNVDRIFAIWQRLQELRNKDPKAMDCAQELLHQKMEPFSWEDNDIPLTNEHSTPADLFDYCELHYDYDTLNLNGMTPEELKTYLDERSSRARAFASFRLKGFGGSANVFVYVCIPDDNDRNDDHCEKAGDFFVLGGPSEMKWQFYRPYLFDLSDTVHKMGMKLDGHYTVKAELFSVNGTALPDDLLPHPVVVHHPEKGFTDPPVKHHQSANLLVRKNINDLTREEVLNLREAFHKFQEDRSVDGYQATAEYHGLPARCPRPDAKDRYACCVHGMPIFPHWHRLFVTQVEDALVGRGATIGIPYWDWTEPMTHIPGLAGNKTYVDSHGASHTNPFHSSVIAFEENAPHTKRQIDQRLFKPATFGHHTDLFNQILYAFEQEDYCDFEVQFEITHNTIHAWTGGSEHFSMSSLHYTAFDPLFYFHHSNVDRLWAVWQALQMRRHKPYRAHCAISLEHMHLKPFAFSSPLNNNEKTHANAMPNKIYDYENVLHYTYEDLTFGGISLENIEKMIHENQQEDRIYAGFLLAGIRTSANVDIFIKTTDSVQHKAGTFAVLGGSKEMKWGFDRVFKFDITHVLKDLDLTADGDFEVTVDITEVDGTKLASSLIPHASVIREHARVKFDKVPRSRLIRKNVDRLSPEEMNELRKALALLKEDKSAGGFQQLGAFHGEPKWCPSPEASKKFACCVHGMSVFPHWHRLLTVQSENALRRHGYDGALPYWDWTSPLNHLPELADHEKYVDPEDGVEKHNPWFDGHIDTVDKTTTRSVQNKLFEQPEFGHYTSIAKQVLLALEQDNFCDFEIQYEIAHNYIHALVGGAQPYGMASLRYTAFDPLFYLHHSNTDRIWAIWQALQKYRGKPYNVANCAVTSMREPLQPFGLSANINTDHVTKEHSVPFNVFDYKTNFNYEYDTLEFNGLSISQLNKKLEAIKSQDRFFAGFLLSGFKKSSLVKFNICTDSSNCHPAGEFYLLGDENEMPWAYDRVFKYDITEKLHDLKLHAEDHFYIDYEVFDLKPASLGKDLFKQPSVIHEPRIGHHEGEVYQAEVTSANRIRKNIENLSLGELESLRAAFLEIENDGTYESIAKFHGSPGLCQLNGNPISCCVHGMPTFPHWHRLYVVVVENALLKKGSSVAVPYWDWTKRIEHLPHLISDATYYNSRQHHYETNPFHHGKITHENEITTRDPKDSLFHSDYFYEQVLYALEQDNFCDFEIQLEILHNALHSLLGGKGKYSMSNLDYAAFDPVFFLHHATTDRIWAIWQDLQRFRKRPYREANCAIQLMHTPLQPFDKSDNNDEATKTHATPHDGFEYQNSFGYAYDNLELNHYSIPQLDHMLQERKRHDRVFAGFLLHNIGTSADGHVFVCLPTGEHTKDCSHEAGMFSILGGQTEMSFVFDRLYKLDITKALKKNGVHLQGDFDLEIEITAVNGSHLDSHVIHSPTILFEAGTDSAHTDDGHTEP

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
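The subword sets and their cardinalities can be computed with a sliding window. This is a minimal sketch under the reading that the argument of \(P_w\) is the subword length; `subwords` and `subword_count` are illustrative names.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_count(w, k):
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))


if __name__ == "__main__":
    w = "abab"
    print(sorted(subwords(w, 2)))   # the distinct 2-letter intervals of w
    print(subword_count(w, 2))
```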

\(|P_{f(3SAF_1)}(2) \setminus P_{f(4OHO_1)}(2)|=57\), \(|P_{f(4OHO_1)}(2) \setminus P_{f(3SAF_1)}(2)|=100\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
01101010100010000011110111010100111011000000010001001011111101100000001000111010000100101101110010101001100010011001001101000110000111010000000111100110100000011100101000101100010011110110110001011000111011011000011011011000100110100010000000110101011100110010000001101000101011001010110101110000010100110111000001010000000100001011011111000010000000101110011101100110010111100011111100010010111001001111000111110001111010010011
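A hydrophobic-polar encoding replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not state which residues it treats as hydrophobic, so the set used below is an assumption: it was inferred by comparing the first 200 residues of the 3SAF_1 sequence with the binary string, and other hydrophobic-polar conventions classify some residues differently. The names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Assumed hydrophobic residues, inferred from the 3SAF_1 sequence and its
# binary string; this set is not documented by the page itself.
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp_encode(sequence):
    """Map each residue to '1' if assumed hydrophobic, else '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


if __name__ == "__main__":
    # First 40 residues of the 3SAF_1 sequence.
    print(hp_encode("SIIRPQLKFREKIDNSNTPFLPKIFIKPNAQKPLPQALSK"))
```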
Pair \(Z_2\) Length of longest common subword
3SAF_1,4OHO_1 157 4
3SAF_1,4BED_1 155 4
4OHO_1,4BED_1 120 4
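The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two sets of length-\(k\) subwords. A minimal sketch, with illustrative function names, is:

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x, y, k):
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|, i.e. the symmetric difference size."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


if __name__ == "__main__":
    # "abba" has the extra 2-letter subword "bb" compared with "abab".
    print(z_k("abab", "abba", 2))
```

For the pair 3SAF_1, 4OHO_1 the two one-sided differences reported on this page are 57 and 100, which add up to the tabulated \(Z_2\) value of 157.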

Newick tree

 
[
	3SAF_1:83.14,
	[
		4BED_1:60,4OHO_1:60
	]:23.14
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1066}{\log_{20} 1066}-\frac{428}{\log_{20} 428}\right)\approx 167\), where \(1066=428+638\) is the combined length of the two sequences.
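The arithmetic in this estimate can be reproduced as follows. The factors 0.85 and 0.8 are taken from the formula as given; their interpretation is not stated on the page, so the sketch only evaluates the expression. The helper name `lz_scale` is illustrative.

```python
import math


def lz_scale(n, alphabet_size=20):
    """n / log_b(n), the normalizing term used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)


n_first, n_second = 428, 638            # lengths of 3SAF_1 and 4OHO_1
n_total = n_first + n_second            # 1066, the length of the concatenation

expected = 0.85 * 0.8 * (lz_scale(n_total) - lz_scale(n_first))

if __name__ == "__main__":
    print(round(expected, 1))
```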
Status Protein1 Protein2 d d1/2
Query variables 3SAF_1 4OHO_1 214 178.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
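The relation between \(\delta\), \(d\) and \(d_1\) can be illustrated with a small sketch. It assumes the usual definitions from Otu and Sayood, namely \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\) and \(d_1\) as the sum of the same two terms, and it uses a straightforward quadratic-time LZ76 phrase count. The function names are illustrative, and this implementation is not guaranteed to reproduce the exact LZ-complexity values reported by the site.

```python
def lz76(s):
    """Number of phrases in the LZ76 exhaustive-history parsing of s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from earlier text
        # (the copy may overlap the phrase itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def increments(x, y):
    """The two complexity increments LZ(xy)-LZ(x) and LZ(yx)-LZ(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)


def otu_sayood(x, y):
    """Return (d, d1) under the assumed definitions: max and sum of increments."""
    a, b = increments(x, y)
    return max(a, b), a + b


def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    x, y = "ACGTACGT", "AAAAAA"
    print(otu_sayood(x, y), delta(x, y, 0), delta(x, y, 0.5))
```

With these definitions, `delta(x, y, 0)` equals \(d\) and `delta(x, y, 0.5)` equals \(d_1/2\), matching the two cases of the formula above.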