CoV2D Browser™

Parikh vectors
2JDS_1 4RYR_1 8FNE_1 Letter Amino acid
9 10 16 H Histidine
21 14 55 I Isoleucine
25 19 43 F Phenylalanine
14 13 34 Y Tyrosine
16 16 41 S Serine
6 7 14 W Tryptophan
20 9 63 V Valine
23 10 99 A Alanine
15 1 36 R Arginine
2 1 4 C Cysteine
22 5 80 G Glycine
32 23 87 L Leucine
18 8 64 D Aspartic acid
8 6 21 M Methionine
14 10 55 T Threonine
17 8 60 N Asparagine
14 4 60 Q Glutamine
27 2 54 E Glutamic acid
34 9 62 K Lysine
14 6 48 P Proline
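
The Parikh vector of a sequence records how often each letter occurs in it. A minimal Python sketch of that count follows; the function name `parikh_vector` and the alphabet ordering are illustrative choices, not taken from the CoV2D code.

```python
from collections import Counter

# Standard one-letter codes for the 20 amino acids (this ordering is arbitrary).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return {letter: count} for every letter of the alphabet."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

if __name__ == "__main__":
    # Toy input, not one of the sequences on this page.
    vec = parikh_vector("MKKHHA")
    print({letter: n for letter, n in vec.items() if n > 0})
```

As a sanity check on the table above, the entries of each column sum to the corresponding sequence length (351, 181 and 996).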

>2JDS_1|Chain A|CAMP-DEPENDENT PROTEIN KINASE|BOS TAURUS (9913)
>4RYR_1|Chain A|Integral membrane protein|Bacillus cereus (226900)
>8FNE_1|Chains A, B, C, D, E, F, G, H|Maltose/maltodextrin-binding periplasmic protein, PhuN|Escherichia coli K-12 (83333)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2JDS , Knot 151 351 0.84 40 210 336
MGNAAAAKKGSEQESVKEFLAKAKEDFLKKWENPAQNTAHLDQFERIKTLGTGSFGRVMLVKHMETGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVPGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWFATTDWIAIYQRKVEAPFIPKFKGPGDTSNFDDYEEEEIRVSINEKCGKEFSEF
4RYR , Knot 79 181 0.75 40 121 165
MDYKDDDDKHHHHHHHHHHENLYFQSYVMFMKKSSIIVFFLTYGLFYVSSVLFPIDRTWYDALEKPSWTPPGMTIGMIWAVLFGLIALSVAIIYNNYGFKPKTFWFLFLLNYIFNQAFSYFQFSQKNLFLATVDCLLVAITTLLLIMFSSNLSKVSAWLLIPYFLWSAFATYLSWTIYSIN
8FNE , Knot 361 996 0.83 40 302 886
HHHHHHMKIEEGKLVIWINGDKGYNGLAEVGKKFEKDTGIKVTVEHPDKLEEKFPQVAATGDGPDIIFWAHDRFGGYAQSGLLAEITPDKAFQDKLYPFTWDAVRYNGKLIAYPIAVEALSLIYNKDLLPNPPKTWEEIPALDKELKAKGKSALMFNLQEPYFTWPLIAADGGYAFKYENGKYDIKDVGVDNAGAKAGLTFLVDLIKNKHMNADTDYSIAEAAFNKGETAMTINGPWAWSNIDTSKVNYGVTVLPTFKGQPSKPFVGVLSAGINAASPNKELAKEFLENYLLTDEGLEAVNKDKPLGAVALKSYEEELAKDPRIAATMENAQKGEIMPNIPQMSAFWYAVRTAVINAASGRQTVDEALKDAQTGKPIPNPLLGLDSTENLYFQGMQQTQQGPKVQTQTLQGGAGNLNSIFQRSGRTDGGDARASEALAVFNKLKEEAIAQQDLHDDFLVFRFDRDQNRVGYSALLVVKRAAINGQQVIVTRPLVMPNDQITLPTKKLTIQNGMHQETIEAEADVQDVFTTQYWNRICDSIRQQTGKHDAMVINAGPTVIPADFDLKDELVLKQLLIKSVNLCDDMLAKRSGEQPFSVAMLKGTDETLAARLNFTGKPMHDSLGYPIRSDILVSLNRVKKPGQQENEFYEAEDKLNQVSCFVNLEYTPQPQQAIYGAPQQTQQLPPLTPAIVITDVRQAEWLKANTMELYLFALSNAFRVTANQSWARSLLPQLGKVKDMRDIGAIGYLSRLAARVETKTETFTDQNFAELLYNMVRPSPVFMSDLNRFGDNAAIENVFIDALGGVNQQRAVAAIIAGVNNLIGGGFEKFFDHNTMPIIQPYGTDIQLGYYLDGEGEKQDRRDLDVLGALNASDGNIQEWMSWYGTQCNVAVHPELRARQSKNFDRQYLGNSVTYTTRAHRGIWNPKFIEALDKAIASVGLTVAMDNVAQVFGAQRFSGNLAIADYAVTGTAQVSSGLVSNGGYNPQFGVGQGSGFY
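
The fourth column of the table above normalizes the LZ-complexity by \(n/\log_{20} n\). The sketch below recomputes that ratio from the tabulated values and shows one common way to count LZ76 phrases (exhaustive-history parsing). Whether the page uses exactly this LZ76 variant is an assumption, and the names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of phrases in a Lempel-Ziv (1976) style parsing of w.

    Each phrase is the shortest block starting at the current position
    that has no occurrence starting earlier (overlaps allowed); the last
    phrase may be a repeat. Assumed variant, not confirmed by the page.
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it still occurs starting strictly before i.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    # (code, LZ-complexity, length) as tabulated above.
    for code, lz, n in [("2JDS", 151, 351), ("4RYR", 79, 181), ("8FNE", 361, 996)]:
        print(code, round(normalized_lz(lz, n), 4))
```

The recomputed ratios are roughly 0.842, 0.757 and 0.835, so the tabulated 0.84, 0.75 and 0.83 appear to be truncated rather than rounded.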

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
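
The set of subwords and its cardinality can be sketched in a few lines of Python. This reads the argument of \(P_w\) as the subword length, which matches the \(p_w(1)\), \(p_w(2)\), \(p_w(3)\) columns; the function names are hypothetical.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

if __name__ == "__main__":
    # Toy word, not one of the sequences on this page.
    print(sorted(subwords("abab", 2)), subword_complexity("abab", 2))
```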

\(|P_{f(2JDS_1)}(2) \setminus P_{f(4RYR_1)}(2)|=135\), \(|P_{f(4RYR_1)}(2) \setminus P_{f(2JDS_1)}(2)|=46\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110111100100000100111010001100100110001010010010011010110111100100100011011000011010010001000011011011111010101000001011100111101100100110100101010110111010010010110001010011100010101001111001010010101010011101110010001101111111100111101111100110100011010101100100010011001101010001101001100100001110001111000010111110101110000100000001010100001001001
Pair \(Z_2\) Length of longest common subword (contiguous)
2JDS_1,4RYR_1 181 3
2JDS_1,8FNE_1 154 4
4RYR_1,8FNE_1 203 6
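
Both columns of the pair table can be sketched in Python. \(Z_k\) is the size of the symmetric difference of the two subword sets. The last column is read here as the length of the longest common contiguous subword, which is an assumption suggested by its small values (3, 4, 6). All function names are hypothetical.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|, a symmetric difference size."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    # Toy words, not the sequences on this page.
    print(z_k("abab", "abba", 2))
    print(longest_common_subword_length("abcde", "xbcdy"))
```

For the pair 2JDS_1, 4RYR_1 the two one-sided counts given above, 135 and 46, add up to the tabulated \(Z_2=181\).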

Newick tree

 
[
	4RYR_1:10.74,
	[
		2JDS_1:77,8FNE_1:77
	]:24.74
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{532 }{\log_{20} 532}-\frac{181}{\log_{20}181})\approx 101.\)
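
The arithmetic in the expected-distance estimate can be checked with a short Python sketch. The factors 0.85 and 0.8 are taken from the text as given. Note that 532 = 351 + 181, presumably the length of the two query sequences concatenated; that reading is an assumption, and the function names are hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_b n, the normalizing scale for LZ-complexity at length n."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_concat, n_single, c1=0.85, c2=0.8):
    """c1 * c2 * (scale at concatenated length - scale at single length)."""
    return c1 * c2 * (lz_scale(n_concat) - lz_scale(n_single))

if __name__ == "__main__":
    print(expected_distance(532, 181))
```

The result lies between 101 and 102, in line with the value 101 quoted above.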
Status Protein1 Protein2 d d1/2
Query variables 2JDS_1 4RYR_1 132 97.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
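
The interpolation can be made concrete with the query values reported above (\(d=132\), \(d_1/2=97.5\)). Taking the formula at face value, \(\alpha=0\) gives \(\max=132\), and \(\alpha=1/2\) gives \((\min+\max)/2=97.5\), hence \(\min=63\). The Python sketch below only replays that arithmetic; the name `delta` and the derived value 63 are not stated on the page.

```python
def delta(alpha, smaller, larger):
    """Convex combination alpha * min + (1 - alpha) * max."""
    return alpha * smaller + (1 - alpha) * larger

if __name__ == "__main__":
    d = 132.0        # reported d for 2JDS_1, 4RYR_1 (alpha = 0, i.e. max)
    d1_half = 97.5   # reported d1/2 (alpha = 1/2, i.e. the average)
    larger = d
    smaller = 2 * d1_half - larger  # derived here, not reported on the page
    print(smaller, delta(0, smaller, larger), delta(0.5, smaller, larger))
```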