CoV2D Browser™

Parikh vectors
8YSS_1 5TUQ_1 6KYY_1 Letter Amino acid
26 48 39 L Leucine
24 60 17 K Lysine
17 13 15 F Phenylalanine
21 20 22 S Serine
32 26 39 A Alanine
8 8 16 H Histidine
6 7 14 M Methionine
16 38 19 P Proline
22 15 24 R Arginine
24 34 36 G Glycine
19 47 23 E Glutamic acid
16 39 27 T Threonine
6 19 2 W Tryptophan
11 22 8 Y Tyrosine
33 24 28 D Aspartic acid
17 36 17 Q Glutamine
13 40 36 I Isoleucine
22 40 37 V Valine
11 20 26 N Asparagine
4 1 3 C Cysteine
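The counts above can be reproduced with a few lines of Python. This is a minimal sketch, not the browser's own code: the names `ALPHABET` and `parikh_vector` are hypothetical, and the letter order is simply the row order of the table.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above
# (an assumption made for illustration; any fixed order works).
ALPHABET = "LKFSAHMPRGETWYDQIVNC"


def parikh_vector(seq, alphabet=ALPHABET):
    """Return how often each letter of `alphabet` occurs in `seq`."""
    counts = Counter(seq)
    # Counter returns 0 for letters that never occur.
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    # Toy input: two L, one K, one C.
    print(dict(zip(ALPHABET, parikh_vector("LLKC"))))
```

The entries of a Parikh vector sum to the sequence length; for example, the 8YSS_1 column sums to 348.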

>8YSS_1|Chain A|Nucleotidyltransferase|Deinococcus wulumuqiensis (980427)
>5TUQ_1|Chain A|HIV-1 REVERSE TRANSCRIPTASE|Human immunodeficiency virus type 1 group M subtype B (isolate BH10) (11678)
>6KYY_1|Chains A, B, C, D|Pyridine nucleotide-disulphide oxidoreductase dimerisation region|Escherichia coli BL21(DE3) (469008)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8YSS , Knot 152 348 0.85 40 213 331
MAPVQKQFREFHDRIKLAQYDENQTLRDERDAVLTAVREGLKKVFADRGEAAPTFTPFNQGSYAMNTGVKPLEGGEYDIDVGIILNIAKDDHDPVEVKKWIRDALKDYGNGAEIRRSCVTVFKPGYHVDLAVYADPELSGGTLCIAKGKENSGDEHRLWQISDPQGFQDRIASKLSGDDAAQFRRCIRYLKRWRDFRFSSDGNAAPLGIGLTAAAYWWFQVSKRTDPVSQNVTYDDRDALEQFVQTMLDNFHDTWDSKDQRSYPRLTVELPVQPYNDVFEKMTGMQMESFKSKLQALLNALKTAKSRLELHDACKALADHFGSEFPVPEKDKSAVHTAPAIVGSGSSG
5TUQ , Knot 220 557 0.83 40 252 512
MVPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFKKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEELRQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYPGIKVRQLSKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLKTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPIQKETWETWWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRETKLGKAGYVTNKGRQKVVPLTNTTNQKTELQAIYLALQDSGLEVNIVTDSQYALGIIQAQPDKSESELVNQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAG
6KYY , Knot 186 448 0.84 40 240 428
GAMDPEFMNKYQAVIIGFGKAGKTLAVTLAKAGWRVALIEQSNAMYGGTCINIGCIPTKTLVHDAQQHTDFVRAIQRKNEVVNFLRNKNFHNLADMPNIDVIDGQAEFINNHSLRVHRPEGNLEIHGEKIFINTGAQTVVPPIPGITTTPGVYDSTGLLNLKELPGHLGILGGGYIGVEFASMFANFGSKVTILEAASLFLPREDRDIADNIATILRDQGVDIILNAHVERISHHENQVQVHSEHAQLAVDALLIASGRQPATASLHPENAGIAVNERGATVVDKRLHTTADNIWAMGDVTGGLQFTYISLDDYRIVRDELLGEGKRSTDDRKNVPYSVFMTPPLSRVGMTEEQARESGADIQVVTLPVAAIPRARVMNDTRGVLKAIVDNKTQRMLGASLLCVDSHEMINIVKMVMDAGLPYSILRDQIFTHPSMSESLNDLFSLVK
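The normalized column can be checked numerically. The sketch below assumes that LZ-complexity means the number of phrases in an exhaustive-history (Lempel-Ziv 1976) parsing; the page does not spell out its exact variant, so `lz76_complexity` is an illustration only, and both function names are hypothetical.

```python
import math


def lz76_complexity(w):
    """Count phrases in an exhaustive-history (LZ76-style) parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text.
        # The copy may overlap the phrase itself, hence the "- 1".
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))
    print(normalized_lz(152, 348))
```

With the table's values, `normalized_lz(152, 348)` is about 0.853 and `normalized_lz(220, 557)` about 0.834, in line with the 0.85 and 0.83 shown. `normalized_lz(186, 448)` is about 0.846, so the 0.84 in the table looks truncated rather than rounded.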

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
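As a concrete reading of these definitions, the following sketch collects the subwords of a fixed length with a sliding window. The helper names `subwords` and `subword_complexity` are hypothetical and not taken from the project.

```python
def subwords(w, k):
    """Return the set of distinct length-k subwords (contiguous blocks) of w."""
    # range() is empty when k exceeds len(w), giving the empty set.
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))


if __name__ == "__main__":
    print(sorted(subwords("abab", 2)))
    print(subword_complexity("abab", 2))
```

A word of length \(m\) over a 20-letter alphabet has at most \(\min(20^k, m-k+1)\) distinct subwords of length \(k\).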

\(|P_{f(8YSS_1)}(2) \setminus P_{f(5TUQ_1)}(2)|=60\), \(|P_{f(5TUQ_1)}(2) \setminus P_{f(8YSS_1)}(2)|=99\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
111100010010001011000000010000011101100110011100101110101100100110011011011000101111101100000110100110011000101101000010110110010111010101011010110100001000011010010110001100101001101000100100100101000101111111101110111010000011000100000011001100110010001000000001010101110100011001011010010001011101100100010100100111001100111100000110011111101001
Pair \(Z_2\) Length of longest common subword
8YSS_1,5TUQ_1 159 4
8YSS_1,6KYY_1 157 4
5TUQ_1,6KYY_1 156 4
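Both columns of this table can be sketched in code. The example assumes that the last column counts the longest common contiguous block, which is the only reading consistent with values as small as 4 for sequences of this length. All function names are hypothetical.

```python
def subwords(w, k):
    """Return the set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_block(x, y):
    """Return the length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block that ended one position earlier.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z_distance("abab", "abba", 2))
    print(longest_common_block("abcde", "xbcdy"))
```

For the first row, the two set differences reported above are 60 and 99, and their sum is the tabulated \(Z_2 = 159\).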

Newick tree

 
(
	8YSS_1:79.33,
	(
		6KYY_1:78,5TUQ_1:78
	):1.33
);

Let \(d\) denote the Otu--Sayood distance \(d\).
Let \(d_1\) denote the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{905}{\log_{20} 905}-\frac{348}{\log_{20}348})\approx 149.\)
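The arithmetic of this estimate can be checked directly. In the sketch below, 905 is taken to be the combined length 348 + 557 of the two query sequences. The factors 0.85 and 0.8 are copied from the formula without interpretation, and the parameter names `c1` and `c2` are hypothetical.

```python
import math


def expected_distance(n1, n2, c1=0.85, c2=0.8, base=20):
    """Evaluate c1 * c2 * (g(n1 + n2) - g(n1)) with g(n) = n / log_base(n)."""

    def g(n):
        return n / math.log(n, base)

    return c1 * c2 * (g(n1 + n2) - g(n1))


if __name__ == "__main__":
    print(expected_distance(348, 557))
```

The value comes out at roughly 149.7, which agrees with the 149 quoted above if the page truncates rather than rounds.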
Status Protein1 Protein2 d d1/2
Query variables 8YSS_1 5TUQ_1 187 151
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
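The interpolation can be illustrated with the query values from the table (\(d = 187\), \(d_1/2 = 151\)). This is a sketch under stated assumptions: `a` and `b` stand for the two quantities whose minimum and maximum enter the formula, which the page does not define explicitly, and the smaller value 115 is inferred as \(2 \cdot 151 - 187\) on the assumption that the tabulated \(d_1/2\) is exact.

```python
def delta(alpha, a, b):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    a, b = 115, 187  # 115 is inferred, not read from the page
    print(delta(0, a, b))    # alpha = 0 selects the maximum, i.e. d
    print(delta(0.5, a, b))  # alpha = 1/2 gives the average, i.e. d_1 / 2
```

At \(\alpha = 0\) only the maximum survives, and at \(\alpha = 1/2\) the result is the mean of the two terms, matching the two cases of the formula.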