CoV2D Browser™

Parikh vectors
6BII_1 7XPZ_1 6OOA_1 Letter Amino acid
25 34 22 R Arginine
15 29 34 P Proline
16 24 26 T Threonine
7 20 12 H Histidine
7 9 18 M Methionine
13 32 32 F Phenylalanine
5 48 30 S Serine
5 9 3 W Tryptophan
9 24 16 Y Tyrosine
10 11 18 N Asparagine
1 22 15 Q Glutamine
23 42 26 G Glycine
34 77 51 L Leucine
20 20 29 I Isoleucine
24 13 38 K Lysine
33 47 37 V Valine
35 45 18 A Alanine
16 13 25 D Aspartic acid
0 7 7 C Cysteine
35 18 30 E Glutamic acid
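
Each numeric column of the table above is the Parikh vector of one sequence, that is, the number of occurrences of each amino-acid letter. A minimal Python sketch of that count follows; the function name `parikh_vector` and the alphabet ordering are illustrative choices, not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard amino-acid letters; the ordering is an arbitrary choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return {letter: number of occurrences in seq} for every letter of alphabet."""
    counts = Counter(seq)  # a missing letter counts as 0
    return {letter: counts[letter] for letter in alphabet}

# Toy input: the first ten residues of the 6BII_1 sequence.
vec = parikh_vector("MKPKVLITRA")
print(vec["K"], vec["C"])  # 2 0
```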

>6BII_1|Chains A, B|Glyoxylate reductase|Pyrococcus yayanosii (strain CH1 / JCM 16557) (529709)
>7XPZ_1|Chain A|Reduced folate transporter|Homo sapiens (9606)
>6OOA_1|Chain A|Cytochrome P450 3A4|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6BII , Knot 141 333 0.82 38 179 314
MKPKVLITRAIPENGIELLREHFEVEVWEHEHEIPREVLLEKVKDVDALVTMLSEKIDREVFDAAPRLRIVANYAVGYDNIDIEEATKRGIYVTNTPDVLTDATADLAWALLLAAARHVVKGDKFVRSGEWKRRGIAWHPKMFLGYDVYGKTIGIVGFGRIGQAIAKRAKGFGMRILYTARSRKPEAEKELGAEFKPLEELLRESDFVVLAVPLTKETYHMINEERLRLMKPTAVLVNVARGKVVDTKALIRALKEGWIAAAGLDVFEEEPYYDEELFALDNVVLTPHIGSATFGAREGMAELVAKNLIAFKNGEVPPTLVNREVLKVRRPGF
7XPZ , Knot 214 544 0.82 40 240 494
MGSMVPSSPAVEKQVPVEPGPDPELRSWRHLVCYLCFYGFMAQIRPGESFITPYLLGPDKNFTREQVTNEITPVLSYSYLAVLVPVFLLTDYLRYTPVLLLQGLSFVSVWLLLLLGHSVAHMQLMELFYSVTMAARIAYSSYIFSLVRPARYQRVAGYSRAAVLLGVFTSSVLGQLLVTVGRVSFSTLNYISLAFLTFSVVLALFLKRPKRSLFFNRDDRGRCETSASELERMNPGPGGKLGHALRVACGDSVLARMLRELGDSLRRPQLRLWSLWWVFNSAGYYLVVYYVHILWNEVDPTTNSARVYNGAADAASTLLGAITSFAAGFVKIRWARWSKLLIAGVTATQAGLVFLLAHTRHPSSIWLCYAAFVLFRGSYQFLVPIATFQIASSLSKELCALVFGVNTFFATIVKTIITFIVSDVRGLGLPVRKQFQLYSVYFLILSIIYFLGAMLDGLRHCQRGHHPRQPPAQGLRSAAEEKAAQALSVQDKGLGGLQPAQSPPLSPEDKLGSENLYLEVLFQGPFQGGSGGSGHHHHHHHHHH
6OOA , Knot 203 487 0.86 40 248 467
MAYLYGTHSHGLFKKLGIPGPTPLPFLGNILSYHKGFCMFDMECHKKYGKVWGFYDGQQPVLAITDPDMIKTVLVKECYSVFTNRRPFGPVGFMKSAISIAEDEEWKRLRSLLSPTFTSGKLKEMVPIIAQYGDVLVRNLRREAETGKPVTLKDVFGAYSMDVITSTSFGVNIDSLNNPQDPFVENTKKLLRFDFLDPFFLSITVFPFLIPILEVLNICVFPREVTNFLRKSVKRMKESRLEDTQKHRVDFLQLMIDSQNSKETESHKALSDLELVAQSIIFIFAGYETTSSVLSFIMYELATHPDVQQKLQEEIDAVLPNKAPPTYDTVLQMEYLDMVVNETLRLFPIAMRLERVCKKDVEINGMFIPKGVVVMIPSYALHRDPKYWTEPEKFLPERFSKKNKDNIDPYIYTPFGSGPRNCIGMRFALMNMKLALIRVLQNFSFKPCKETQIPLKLSLGGLLQPEKPVVLKVESRDGTVSGAHHHH
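
The fourth column normalizes the LZ-complexity by \(n/\log_{20} n\). The sketch below computes that ratio, together with one common reading of LZ76 complexity (the number of phrases in the exhaustive-history parsing). Whether the site uses exactly this parsing is an assumption, and the names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of phrases in an LZ76-style exhaustive-history parsing of w.

    Each phrase is the shortest prefix of the unparsed remainder that does
    not already occur in the text preceding its last symbol (overlap with
    the current phrase is allowed). The final phrase may be incomplete.
    """
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz * math.log(n, alphabet_size) / n

# Textbook binary example, parsed as 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))  # 6
# Values from the 6BII row of the table: LZ(w) = 141, n = 333.
print(normalized_lz(141, 333))
```

For the 6BII row, `normalized_lz(141, 333)` evaluates to about 0.821, in line with the tabulated 0.82.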

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
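
These definitions can be illustrated with a short Python sketch. It assumes subwords are contiguous substrings of a fixed length \(n \ge 1\); the function names are illustrative only.

```python
def subwords(w, n):
    """P_w(n): the set of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

# Toy input: the first ten residues of the 6BII_1 sequence.
w = "MKPKVLITRA"
print(subword_complexity(w, 1), subword_complexity(w, 2))  # 9 9
```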

\(|P_{f(6BII_1)}(2) \setminus P_{f(7XPZ_1)}(2)|=49\), \(|P_{f(7XPZ_1)}(2) \setminus P_{f(6BII_1)}(2)|=110\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
101011100111001101100010101100000110011100100101110110001000110111010111001110001010010001101000101100101011111111110011010011001010001111010111100101001111111011011100101111011001000010100011101011001100001111111100000011000010110101111011010110001110110011111111011000100000111100111010110101110011101110011110010111011000110100111
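
The binary string above is consistent with mapping each residue to 1 (hydrophobic) or 0 (polar). A minimal sketch of such an encoding follows. The residue classes were inferred by comparing the string with the start and end of the 6BII_1 sequence; the assignments for C (absent from 6BII_1) and Q are assumptions, and `hp_encode` is a hypothetical name.

```python
# Classes inferred from the 6BII_1 string; C is an assumption (it does not occur there).
HYDROPHOBIC = set("AFGILMPVW") | {"C"}
# Q is assumed polar; the other letters were checked against the string.
POLAR = set("DEHKNRSTY") | {"Q"}

def hp_encode(seq):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    out = []
    for aa in seq:
        if aa in HYDROPHOBIC:
            out.append("1")
        elif aa in POLAR:
            out.append("0")
        else:
            raise ValueError(f"unexpected residue letter: {aa!r}")
    return "".join(out)

# First ten residues of 6BII_1; the result matches the start of the string above.
print(hp_encode("MKPKVLITRA"))  # 1010111001
```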
Pair \(Z_2\) Length of longest common subword
6BII_1,7XPZ_1 159 4
6BII_1,6OOA_1 165 4
7XPZ_1,6OOA_1 154 5
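
Both columns of this table can be sketched in a few lines of Python. `z_distance` follows the definition of \(Z_k\) given earlier (for 6BII_1 and 7XPZ_1 the text reports \(49+110=159\)). The second function measures the longest common contiguous subword, which is what values as small as 4 or 5 suggest for sequences of this length; that reading, and both function names, are assumptions.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y (row-by-row DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block that ends just before both positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy inputs, not the protein sequences from the table.
print(z_distance("ABAB", "BABC", 2))                    # 1
print(longest_common_subword_length("ABCDE", "XBCDY"))  # 3
```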

Newick tree

 
[
	6BII_1:82.30,
	[
		7XPZ_1:77,6OOA_1:77
	]:5.30
]
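
Standard Newick notation writes the same nested structure with parentheses and a terminating semicolon. The sketch below serializes a small nested-tuple representation into that form; the tuple layout and the name `to_newick` are illustrative assumptions, and branch lengths are kept as strings so they print exactly as listed above.

```python
def _format_node(node):
    """node = (label_or_children, branch_length); branch_length is None at the root."""
    label, length = node
    if isinstance(label, str):
        body = label  # leaf
    else:
        body = "(" + ",".join(_format_node(child) for child in label) + ")"
    return body if length is None else f"{body}:{length}"

def to_newick(root):
    """Serialize the nested-tuple tree as a Newick string."""
    return _format_node(root) + ";"

# The tree listed above, transcribed by hand.
tree = (
    [
        ("6BII_1", "82.30"),
        ([("7XPZ_1", "77"), ("6OOA_1", "77")], "5.30"),
    ],
    None,
)
print(to_newick(tree))  # (6BII_1:82.30,(7XPZ_1:77,6OOA_1:77):5.30);
```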

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{877}{\log_{20} 877}-\frac{333}{\log_{20}333}\right)\approx 146\), where \(877=333+544\) is the combined length of the two query sequences.
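
The arithmetic in this estimate can be checked with a short Python sketch. The constants 0.85 and 0.8 are taken as given from the formula; the helper name `lz_scale` is hypothetical, and 877 is read as 333 + 544, the combined length of 6BII_1 and 7XPZ_1.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_{alphabet_size}(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

n_x, n_y = 333, 544  # lengths of 6BII_1 and 7XPZ_1
expected = 0.85 * 0.8 * (lz_scale(n_x + n_y) - lz_scale(n_x))
print(expected)
```

The result is roughly 146.8, which agrees with the stated 146 if the page truncates rather than rounds.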
Status Protein1 Protein2 d d1/2
Query variables 6BII_1 7XPZ_1 182 147

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d, &\alpha=0,\\ d_1/2, &\alpha=1/2. \end{cases} \]
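
A small numeric sketch of this interpolation follows. Here `a` and `b` stand for the two quantities whose minimum and maximum are combined; in the Otu–Sayood setting these would be the two LZ-complexity increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), though that reading is an assumption since the page does not spell it out. The value 112 is inferred from the query row as \(2\cdot 147-182\) and would shift slightly if 147 is a rounded figure.

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Increments consistent with the query row: max = d = 182, inferred min = 112.
smaller, larger = 112, 182

d = delta(smaller, larger, 0)          # alpha = 0 recovers d
d1_half = delta(smaller, larger, 0.5)  # alpha = 1/2 recovers d1 / 2
print(d, d1_half)  # 182 147.0
```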