CoV2D Browser™

Parikh vectors
5IWT_1 2BLI_1 1CYE_1 Letter Amino acid
34 5 2 Q Glutamine
41 11 10 G Glycine
47 17 15 A Alanine
32 6 8 D Aspartic acid
39 14 11 E Glutamic acid
42 9 6 I Isoleucine
33 6 6 F Phenylalanine
27 4 3 P Proline
33 5 5 T Threonine
13 3 1 W Tryptophan
41 4 5 R Arginine
24 2 8 N Asparagine
46 8 10 V Valine
25 19 11 K Lysine
13 0 0 C Cysteine
16 12 0 H Histidine
27 6 5 S Serine
27 3 2 Y Tyrosine
88 17 15 L Leucine
24 2 6 M Methionine
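Each column above is the Parikh vector of one sequence: the number of occurrences of each amino-acid letter, ignoring order. Below is a minimal Python sketch that assumes the table's row order as the letter order. `ROW_ORDER` and `parikh_vector` are illustrative names, not part of the CoV2D Browser. Applied to 1CYE_1, it reproduces the 1CYE_1 column.

```python
from collections import Counter

# Letter order of the Parikh-vector table above (illustrative constant).
ROW_ORDER = "QGADEIFPTWRNVKCHSYLM"

def parikh_vector(seq: str, alphabet: str = ROW_ORDER) -> list[int]:
    """Count occurrences of each letter of `alphabet` in `seq`, in alphabet order."""
    counts = Counter(seq)
    return [counts[letter] for letter in alphabet]  # Counter yields 0 for absent letters

# 1CYE_1 (CheY, Escherichia coli), copied from the sequence table.
CHEY_1CYE = (
    "RSDKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMPNMDG"
    "LELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM"
)

print(parikh_vector(CHEY_1CYE))
# [2, 10, 15, 8, 11, 6, 6, 3, 5, 1, 5, 8, 10, 11, 0, 0, 5, 2, 15, 6]
```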

>5IWT_1|Chain A|Transient receptor potential cation channel subfamily V member 6|Rattus norvegicus (10116)
>2BLI_1|Chain A|MYOGLOBIN|PHYSETER CATODON (9755)
>1CYE_1|Chain A|CHEY|Escherichia coli (562)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5IWT , Knot 261 672 0.84 40 292 624
MGWSLPKEKGLILCLWNKFCRWFHRRESWAQSRDEQNLLQQKRIWESPLLLAAKENNVQALYKLLKFEGCEVHQKGAMGETALHIAALYDNNEAAQVLMEAAPELVFEPMTSELYEGQTALHIAVINQNVNLVRALLARGASVSARATGSVFHYRPHNLIYYGEHPLSFAACVGSEEIVRLLIEHGADIRAQDSLGNTVLHILILQPNKTFACQMYNLLLSYDGGDHLKSLELVPNNQGLTPFKLAGVEGNIVMFQHLMQKRKHIQWTYGPLTSTLYDLTEIDSSGDDQSLLELIVTTKKREARQILDQTPVKELVSLKWKRYGRPYFCVLGAIYVLYIICFTMCCVYRPLKPRITNRTNPRDNTLLQQKLLQEAYVTPKDDLRLVGELVSIVGAVIILLVEIPDIFRLGVTRFFGQTILGGPFHVIIVTYAFMVLVTMVMRLTNSDGEVVPMSFALVLGWCNVMYFARGFQMLGPFTIMIQKMIFGDLMRFCWQMAVVILGFASAFYIIFQTEDPDELGHFYDYPMALFSTFELFLTIIDGPANYDVDLPFMYSITYAAFAIIATLLMLNLLIAMMGDTHWRVAHERDELWRAQVVATTVMLERKLPRCLWPRSGICGREYGLGDRWFLRVEDRQDLNRQRIRRYAQAFQQQDDLYSEDLEKDSGEKLVPR
2BLI , Knot 74 153 0.81 38 112 148
VLSEGEWQLVLHVWAKVEADVAGHGQDIWIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGNFGADAQGAMNKALELFRKDIAAKYKELGYQG
1CYE , Knot 64 129 0.80 36 96 122
RSDKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMPNMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM
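The ratio column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below assumes that LZ-complexity means the Lempel-Ziv (1976) phrase count, computed by the Kaspar-Schuster scanning procedure. The page does not name its variant, so the hypothetical helper `lz76` is not guaranteed to reproduce the LZ(w) column. `normalized_lz` only re-derives the ratio from the reported LZ(w) and n.

```python
import math

def lz76(s: str) -> int:
    """Lempel-Ziv (1976) complexity via the Kaspar-Schuster algorithm.
    Assumption: this is the LZ variant behind the table (not stated on the page)."""
    n = len(s)
    if n < 2:
        return n
    i, k, l = 0, 1, 1   # i: candidate copy start, k: match length, l: current phrase start
    c, k_max = 1, 1     # c: phrase count (the first symbol is its own phrase)
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1                 # extend the current match
            if l + k > n:          # ran off the end while copying: count the last phrase
                c += 1
                break
        else:
            k_max = max(k_max, k)  # longest match found for this phrase so far
            i += 1
            if i == l:             # no earlier start copies further: close the phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1

    return c

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_{alphabet_size} n), as in the ratio column."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76("0001101001000101"))           # 6: 0 | 001 | 10 | 100 | 1000 | 101
print(round(normalized_lz(261, 672), 2))  # 0.84 (5IWT)
print(round(normalized_lz(74, 153), 2))   # 0.81 (2BLI)
```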

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-character Protein Data Bank code \(c\).
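One way to compute \(P_w(n)\) and \(p_w(n)\) is to slide a window of width \(n\) over \(w\) and collect the windows in a set. Here is a minimal sketch with hypothetical names `subwords` and `p`:

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}  # empty if n > len(w)

def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|, the subword complexity of w at length n."""
    return len(subwords(w, n))

print(sorted(subwords("abab", 2)), p("abab", 3))  # ['ab', 'ba'] 2
```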

\(|P_{f(5IWT_1)}(2) \setminus P_{f(2BLI_1)}(2)|=211\) and \(|P_{f(2BLI_1)}(2) \setminus P_{f(5IWT_1)}(2)|=31\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). This is an LZ76-style (set-of-subwords) numerator of the Jaccard distance on \(P(k)\). For example, \(Z_2(f(5IWT_1),f(2BLI_1))=211+31=242\).

Hydrophobic-polar version of Sequence 1 (5IWT_1), with 1 for a hydrophobic and 0 for a polar residue:
111011000111101100100110000011000000011000011001111110000101100110101001000111100110111100000110111011101110110001001001101111000101101111011010101010110001001100100110111011000110111001101010001100110111101000110010011100011001001011100011011011110101111001100000101001110001001001000100001101110000001001100011001101010001010101111101101101010010011010100000100001100011001010100010111011011111111110110110111001110011111101111001111110111010000101111011111110011011011011111011100111101101010111111111011011100001001101000111110010111011011100010111100100111111101111011111110001011000001101011100111000110011100110100011100111010000010000100010110000010000100001001110
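The sketch below computes three things: \(Z_k\) as the size of a symmetric difference, the corresponding Jaccard distance \(Z_k/|P_x(k)\cup P_y(k)|\), and a hydrophobic-polar encoding. The hydrophobic set {A, F, G, I, L, M, P, V, W} is an assumption inferred from the HP string above. It matches that string on the opening residues of 5IWT_1, but the page does not state its classification. All function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by the size of the union of P_x(k) and P_y(k)."""
    px, py = subwords(x, k), subwords(y, k)
    union = px | py
    return len(px ^ py) / len(union) if union else 0.0

# Assumed hydrophobic residues, inferred from the HP string above.
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp_encode(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

print(z_k("abc", "abd", 2))               # 2: {'bc', 'bd'}
print(hp_encode("MGWSLPKEKGLILCLWNKFC"))  # 11101100011110110010
```

For 5IWT_1 and 2BLI_1, the figures on this page give a union of size \(292+31=323\) for \(k=2\), so the Jaccard distance is \(242/323\approx 0.75\).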
Pair \(Z_2\) Length of longest common (contiguous) subword
5IWT_1,2BLI_1 242 4
5IWT_1,1CYE_1 226 3
2BLI_1,1CYE_1 132 4
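The last column cannot be an ordinary (non-contiguous) longest common subsequence. For instance, L occurs 88 times in 5IWT_1 and 17 times in 2BLI_1, so their longest common subsequence has length at least 17. The column therefore presumably gives the largest \(k\) with \(P_x(k)\cap P_y(k)\neq\emptyset\), that is, the longest common contiguous subword. Below is a standard dynamic-programming sketch of that quantity under this assumption. The name `longest_common_subword` is hypothetical.

```python
def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest string occurring contiguously in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: longest common suffix of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the common suffix ending here
                best = max(best, cur[j])
        prev = cur
    return best

print(longest_common_subword("MGWSL", "XWSLY"))  # 3 ("WSL")
```

This runs in \(O(|x|\,|y|)\) time, which is about \(10^5\) steps for a 672-residue and a 153-residue sequence.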

Newick tree

 
(
	5IWT_1:12.69,
	(
		1CYE_1:66,2BLI_1:66
	):63.69
);
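Standard Newick writes each clade in parentheses, follows a node with `:branch_length`, and ends with a semicolon. The minimal serializer below uses a hypothetical nested-tuple representation: a leaf is `(name, length)`, an internal node is `(children, length)`, and the root is a list of children. It writes the tree above in that form.

```python
def _fmt(node: tuple) -> str:
    """Format one node as 'label:length' or '(children):length'."""
    label, length = node
    if isinstance(label, list):  # internal node: recurse into children
        inner = "(" + ",".join(_fmt(child) for child in label) + ")"
    else:                        # leaf: its name
        inner = label
    return f"{inner}:{length:g}"

def to_newick(root: list) -> str:
    """Serialize a rooted tree (list of top-level children) to Newick."""
    return "(" + ",".join(_fmt(child) for child in root) + ");"

tree = [("5IWT_1", 12.69), ([("1CYE_1", 66), ("2BLI_1", 66)], 63.69)]
print(to_newick(tree))  # (5IWT_1:12.69,(1CYE_1:66,2BLI_1:66):63.69);
```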

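Let \(d\) be the Otu-Sayood distance \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\), where \(c\) denotes LZ complexity and \(SQ\) is the concatenation of \(S\) and \(Q\).
Let \(d_1\) be the Otu-Sayood distance \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\). (This makes the 4TYN sequence AAAAAA a close match...)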
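Below is a sketch of \(d\) and \(d_1\) built from the two one-sided terms \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\). It assumes \(c\) is the LZ76 phrase count from the Kaspar-Schuster procedure, since the page's exact variant is not stated. The function names are hypothetical. LZ76 parsing proceeds greedily from the left, so \(c(SQ)\ge c(S)\). Both terms are therefore nonnegative, and \(d\le d_1\le 2d\).

```python
def lz76(s: str) -> int:
    """LZ76 phrase count (Kaspar-Schuster); assumed to be the complexity c."""
    n = len(s)
    if n < 2:
        return n
    i, k, l, c, k_max = 0, 1, 1, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:          # last phrase reaches the end of s
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:             # close the current phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def one_sided_terms(s: str, q: str) -> tuple[int, int]:
    """(c(SQ) - c(S), c(QS) - c(Q))."""
    return lz76(s + q) - lz76(s), lz76(q + s) - lz76(q)

def otu_sayood_d(s: str, q: str) -> int:
    """d(S, Q): the larger one-sided term."""
    return max(one_sided_terms(s, q))

def otu_sayood_d1(s: str, q: str) -> int:
    """d1(S, Q): the sum of both one-sided terms."""
    return sum(one_sided_terms(s, q))

print(otu_sayood_d("ab", "ab"), otu_sayood_d1("ab", "ab"))  # 1 2
```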
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{825}{\log_{20} 825}-\frac{153}{\log_{20} 153}\right)\approx 188\), where \(825=672+153\) is the combined length of the two sequences.
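The arithmetic of the estimate can be checked directly. In this sketch, the factor \((0.85)(0.8)\) is taken as given from the formula. Using the shorter of the two sequences for the subtracted term (153 here) is an assumption that generalizes the formula. The function names are hypothetical.

```python
import math

def random_lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_{alphabet_size}(n), the normalizer used in the ratio column."""
    return n / math.log(n, alphabet_size)

def rough_expected_distance(n1: int, n2: int, factor: float = 0.85 * 0.8) -> float:
    """factor * (scale(n1 + n2) - scale(shorter length)), mirroring the estimate.
    Assumption: the subtracted term uses the shorter sequence."""
    return factor * (random_lz_scale(n1 + n2) - random_lz_scale(min(n1, n2)))

print(round(rough_expected_distance(672, 153)))  # 188
```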
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 5IWT_1 2BLI_1 240 146

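In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], let \(\min\) and \(\max\) denote the smaller and larger of the two one-sided terms \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\), where \(c\) is LZ complexity. Then
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d, & \alpha=0,\\ d_1/2, & \alpha=1/2. \end{cases} \]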
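The reported values for (5IWT_1, 2BLI_1) are \(d=240\) and \(d_1/2=146\), so the larger one-sided term is \(d=240\). Since \(d_1\) is the sum of both terms, the smaller is \(2\cdot 146-240=52\). A minimal sketch of \(\delta\) as a function of \(\alpha\) recovers both table entries. The name `delta` is hypothetical.

```python
def delta(alpha: float, term1: float, term2: float) -> float:
    """alpha * min + (1 - alpha) * max of the two one-sided terms."""
    lo, hi = min(term1, term2), max(term1, term2)
    return alpha * lo + (1 - alpha) * hi

d, half_d1 = 240, 146  # reported for (5IWT_1, 2BLI_1)
hi = d                 # d is the larger term
lo = 2 * half_d1 - d   # d1 = min + max, so min = d1 - d = 52
print(delta(0, lo, hi), delta(0.5, lo, hi))  # 240 146.0
```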