CoV2D Browser™

Parikh vectors
5HJE_1 7RFX_1 8DXE_1 Letter Amino acid
48 36 40 V Valine
36 25 38 P Proline
8 8 7 M Methionine
9 7 19 W Tryptophan
38 33 47 E Glutamic acid
21 13 15 R Arginine
62 35 34 G Glycine
58 55 48 L Leucine
28 15 13 F Phenylalanine
79 49 28 A Alanine
36 28 24 D Aspartic acid
3 6 1 C Cysteine
25 25 36 Q Glutamine
23 15 8 H Histidine
38 38 40 I Isoleucine
42 24 58 K Lysine
53 33 20 S Serine
24 27 20 N Asparagine
23 6 22 Y Tyrosine
43 32 39 T Threonine
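
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal Python sketch follows; the function name `parikh_vector` and the constant `ALPHABET` are illustrative and not taken from the CoV2D code base. The letter order is copied from the rows of the table above.

```python
from collections import Counter

# Letter order as in the rows of the Parikh vector table.
ALPHABET = "VPMWERGLFADCQHIKSNYT"

def parikh_vector(seq, alphabet=ALPHABET):
    """Return the letter counts of seq, one entry per letter of alphabet."""
    counts = Counter(seq)
    return [counts[a] for a in alphabet]

# First 21 residues of the 5HJE_1 sequence shown further down the page.
prefix = "MGSSHHHHHHSSGLVPRGSHM"
vector = parikh_vector(prefix)
print(dict(zip(ALPHABET, vector)))
```

For a sequence made up only of the 20 standard letters, the entries sum to the sequence length; the 5HJE_1 column of the table, for instance, sums to 697.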

>5HJE_1|Chain A|Transketolase|Scheffersomyces stipitis CBS 6054 (322104)
>7RFX_1|Chain A|Importin subunit alpha-1|Mus musculus (10090)
>8DXE_1|Chain A|Reverse transcriptase/ribonuclease H|Human immunodeficiency virus type 1 group M subtype B (isolate BH10) (11678)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5HJE , Knot 268 697 0.84 40 273 633
MGSSHHHHHHSSGLVPRGSHMSSVDQKAISTIRLLAVDAVAAANSGHPGAPLGLAPAAHAVFKKMRFNPKDTKWINRDRFVLSNGHACALLYSMLVLYGYDLTVEDLKKFRQLGSKTPGHPENTDVPGAEVTTGPLGQGICNGVGIALAQAQFAATYNKPDFPISDSYTYVFLGDGCLMEGVSSEASSLAGHLQLGNLIAFWDDNKISIDGSTEVAFTEDVIARYKSYGWHIVEVSDADTDITAIAAAIDEAKKVTNKPTLVRLTTTIGFGSLAQGTHGVHGAPLKADDIKQLKTKWGFNPEESFAVPAEVTASYNEHVAENQKIQQQWNELFAAYKQKYPELGAELQRRLDGKLPENWDKALPVYTPADAAVATRKLSEIVLSKIIPEVPEIIGGSADLTPSNLTKAKGTVDFQPAATGLGDYSGRYIRYGVREHAMGAIMNGIAAFGANYKNYGGTFLNFVSYAAGAVRLSALSEFPITWVATHDSIGLGEDGPTHQPIETLAHFRATPNISVWRPADGNETSAAYKSAIESTHTPHILALTRQNLPQLEGSSIEKASKGGYTLVQQDKADIIIVATGSEVSLAVDALKVLEGQGIKAGVVSLPDQLTFDKQSEEYKLSVLPDGVPILSVEVMSTFGWSKYSHQQFGLNRFGASGKAPEIFKLFEFTPEGVAERAAKTVAFYKGKDVVSPLRSAF
7RFX , Knot 206 510 0.84 40 248 467
MHHHHHHSSGLVPRGSGMLETAAALFERNHMDSPDLGTDDDDLAMADIGSNQGTVNWSVEDIVKGINSNNLESQLQATQAARKLLSREKQPPIDNIIRAGLIPKFVSFLGKTDCSPIQFESAWALTNIASGTSEQTKAVVDGGAIPAFISLLASPHAHISEQAVWALGNIAGDGSAFRDLVIKHGAIDPLLALLAVPDLSTLACGYLRNLTWTLSNLCRNKNPAPPLDAVEQILPTLVRLLHHNDPEVLADSCWAISYLTDGPNERIEMVVKKGVVPQLVKLLGATELPIVTPALRAIGNIVTGTDEQTQKVIDAGALAVFPSLLTNPKTNIQKEATWTMSNITAGRQDQIQQVVNHGLVPFLVGVLSKADFKTQKEAAWAITNYTSGGTVEQIVYLVHCGIIEPLMNLLSAKDTKIIQVILDAISNIFQAAEKLGETEKLSIMIEECGGLDKIEALQRHENESVYKASLNLIEKYFSVEEEEDQNVVPETTSEGFAFQVQDGAPGTFNF
8DXE , Knot 221 557 0.83 40 252 513
MVPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFAAQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEELRQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYPGIKVRQLSKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLKTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPIQKETWETWWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRETKLGKAGYVTNKGRQKVVPLTNTTNQKTELQAIYLALQDSGLEVNIVTDSQYALGIIQAQPDKSESELVNQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAG
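
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does only that arithmetic; the helper name `normalized_lz` is hypothetical. The values in the table appear to be truncated to two decimals rather than rounded (the 8DXE ratio comes out near 0.837).

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_alphabet_size(n)."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"5HJE": (268, 697), "7RFX": (206, 510), "8DXE": (221, 557)}
for code, (lz, n) in rows.items():
    print(code, f"{normalized_lz(lz, n):.4f}")
```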

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
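
As an illustration of these definitions, here is a minimal Python sketch that builds the set of length-\(k\) subwords with a sliding window and counts it. The names `subwords` and `subword_complexity` are illustrative only, and the sketch is not claimed to reproduce the \(p_w\) columns of the table.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

w = "banana"
for k in (1, 2, 3):
    print(k, sorted(subwords(w, k)), subword_complexity(w, k))
```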

\(|P_{f(5HJE_1)}(2) \setminus P_{f(7RFX_1)}(2)|=74\), \(|P_{f(7RFX_1)}(2) \setminus P_{f(5HJE_1)}(2)|=49\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1100000000001111010010010001100101111011111001011111111111011100101010000110000111001010111001111010010100100100110001101000011110100111101100111111101011100001011100000011110101101100010011101011011111000010101000111000111000001101101001000101111110010010001011010001111011010011011110100100100011101000111110101000001100001000100111100000101110100010101100100111100110111100010011100111011011110101010010010101010111011100010010011000111111011111110000011011011001111101011001110111000011110011000110011010101010110110100001100011000001011110000110101001001001100110000101111101001011101101101011011110110010100000000101110111110101100111000000011100111010110110110101011100110011100100110110011
Pair \(Z_2\) Length of longest common substring
5HJE_1,7RFX_1 123 15
5HJE_1,8DXE_1 133 4
7RFX_1,8DXE_1 142 4
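
Both columns of this table can be sketched in a few lines of Python. \(Z_k\) is the size of the symmetric difference of the two subword sets, and the last column is read here as the length of the longest common contiguous block, computed with a standard dynamic program. All function names are illustrative; this is a sketch under those assumptions, not the site's implementation.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring_length(x, y):
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending just before both positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# N-terminal prefixes of 5HJE_1 and 7RFX_1 (both start with a His-tag).
x = "MGSSHHHHHHSSGLVPRGSHM"
y = "MHHHHHHSSGLVPRGSGM"
print(z(x, y, 2), longest_common_substring_length(x, y))
```

The two prefixes share the block HHHHHHSSGLVPRGS, which has 15 letters, the same value the table lists for the pair 5HJE_1, 7RFX_1.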

Newick tree

 
(8DXE_1:71.04,(5HJE_1:61.5,7RFX_1:61.5):9.54);

Let \(d\) denote the Otu-Sayood distance of the same name.
Let \(d_1\) denote the Otu-Sayood distance of the same name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1207 }{\log_{20} 1207}-\frac{510}{\log_{20}510})\approx 179.\)
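
The arithmetic behind this estimate can be checked directly. In the sketch below, 1207 is taken to be the combined length 697 + 510 of the two query sequences, and the factors 0.85 and 0.8 are used exactly as given in the formula; the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The quantity n / log_alphabet_size(n) used in the estimate."""
    return n / math.log(n, alphabet_size)

n_s, n_q = 697, 510  # lengths of 5HJE_1 and 7RFX_1
expected = 0.85 * 0.8 * (lz_scale(n_s + n_q) - lz_scale(n_q))
print(f"{expected:.2f}")
```

The result is a little under 180, in line with the 179 quoted above if that figure is truncated to an integer.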
Status Protein1 Protein2 d d1/2
Query variables 5HJE_1 7RFX_1 223 191.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
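
The interpolation can be made concrete with a small Python sketch. It rests on two assumptions that the page does not spell out: that the complexity \(c\) is the LZ76 (exhaustive history) component count, and that, following the usual statement of the Otu-Sayood distances, \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\). The function names are illustrative, and the simple quadratic-time `lz76` is not claimed to match the site's implementation.

```python
def lz76(s):
    """Number of components in the LZ76 exhaustive history of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the component while it can still be copied from earlier text
        # (the copy may overlap the component itself, except its last letter).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(s, q):
    """The two one-sided terms c(SQ) - c(S) and c(QS) - c(Q)."""
    return lz76(s + q) - lz76(s), lz76(q + s) - lz76(q)

def otu_sayood(s, q):
    """Return (d, d1) under the definitions assumed above."""
    a, b = increments(s, q)
    return max(a, b), a + b

def delta(s, q, alpha):
    """alpha * min + (1 - alpha) * max of the two one-sided terms."""
    a, b = increments(s, q)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

s, q = "MGSSHHHHHHSSGLVPRGSHM", "MHHHHHHSSGLVPRGSGM"
d, d1 = otu_sayood(s, q)
print(d, d1 / 2, delta(s, q, 0), delta(s, q, 0.5))
```

With these definitions, `delta(s, q, 0)` equals \(d\) and `delta(s, q, 0.5)` equals \(d_1/2\), since the minimum and maximum of the two terms add up to \(d_1\).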