CoV2D Browser™

Parikh vectors
6PQO_1 2LGX_1 2YSK_1 Letter Amino acid
33 2 3 Y Tyrosine
41 5 10 R Arginine
71 5 16 E Glutamic acid
53 8 8 T Threonine
11 6 3 W Tryptophan
141 15 23 L Leucine
56 2 3 F Phenylalanine
67 2 4 S Serine
59 9 10 V Valine
66 2 0 N Asparagine
59 11 6 D Aspartic acid
61 8 9 G Glycine
78 3 2 I Isoleucine
39 4 8 P Proline
68 6 18 A Alanine
53 5 6 H Histidine
78 10 9 K Lysine
42 4 3 M Methionine
28 1 0 C Cysteine
48 4 4 Q Glutamine
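
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch of that count follows; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative names chosen here, not part of the CoV2D site.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "YRETWLFSVNDGIPAHKMCQ"


def parikh_vector(sequence: str, alphabet: str = AMINO_ACIDS) -> dict[str, int]:
    """Return the number of occurrences of each alphabet letter in `sequence`."""
    counts = Counter(sequence)
    # Letters that never occur are reported with count 0, as in the table.
    return {letter: counts.get(letter, 0) for letter in alphabet}


if __name__ == "__main__":
    # First ten residues of 2LGX_1, copied from the sequence table.
    vec = parikh_vector("GAMPDEFMAL")
    print({letter: n for letter, n in vec.items() if n})
```

Summing the entries of a Parikh vector recovers the sequence length, which is a quick consistency check against the length column of the sequence table.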

>6PQO_1|Chains A, B, C, D|Transient receptor potential cation channel subfamily A member 1|Homo sapiens (9606)
>2LGX_1|Chain A|Fermitin family homolog 2|Homo sapiens (9606)
>2YSK_1|Chain A|Hypothetical protein TTHA1432|Thermus thermophilus (300852)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6PQO , Knot 413 1152 0.84 40 338 992
MAKRSLRKMWRPGEKKEPQGVVYEDVPDDTEDFKESLKVVFEGSAYGLQNFNKQKKLKRCDDMDTFFLHYAAAEGQIELMEKITRDSSLEVLHEMDDYGNTPLHCAVEKNQIESVKFLLSRGANPNLRNFNMMAPLHIAVQGMNNEVMKVLLEHRTIDVNLEGENGNTAVIIACTTNNSEALQILLKKGAKPCKSNKWGCFPIHQAAFSGSKECMEIILRFGEEHGYSRQLHINFMNNGKATPLHLAVQNGDLEMIKMCLDNGAQIDPVEKGRCTAIHFAATQGATEIVKLMISSYSGSVDIVNTTDGCHETMLHRASLFDHHELADYLISVGADINKIDSEGRSPLILATASASWNIVNLLLSKGAQVDIKDNFGRNFLHLTVQQPYGLKNLRPEFMQMQQIKELVMDEDNDGCTPLHYACRQGGPGSVNNLLGFNVSIHSKSKDKKSPLHFAASYGRINTCQRLLQDISDTRLLNEGDLHGMTPLHLAAKNGHDKVVQLLLKKGALFLSDHNGWTALHHASMGGYTQTMKVILDTNLKCTDRLDEDGNTALHFAAREGHAKAVALLLSHNADIVLNKQQASFLHLALHNKRKEVVLTIIRSKRWDECLKIFSHNSPGNKCPITEMIEYLPECMKVLLDFCMLHSTEDKSCRDYYIEYNFKYLQCPLEFTKKTPTQDVIYEPLTALNAMVQNNRIELLNHPVCKEYLLMKWLAYGFRAHMMNLGSYCLGLIPMTILVVNIKPGMAFNSTGIINETSDHSEILDTTNSYLIKTCMILVFLSSIFGYCKEAGQIFQQKRNYFMDISNVLEWIIYTTGIIFVLPLFVEIPAHLQWQCGAIAVYFYWMNFLLYLQRFENCGIFIVMLEVILKTLLRSTVVFIFLLLAFGLSFYILLNLQDPFSSPLLSIIQTFSMMLGDINYRESFLEPYLRNELAHPVLSFAQLVSFTIFVPIVLMNLLIGLAVGDIAEVQKHASLKRIAMQVELHTSLEKKLPLWFLRKVDQKSTIVYPNKPRSGGMLFHIFCFLFCTGEIRQEIPNADKSLEMEILKQKYRLKDLTFLLEKQHELIKLIIQKMEIISETEDDDSHCSFQDRFKKEQMEQRNSRWNTVLRAVKAKTHHLEPSNSLEVLFQGPAADYKDDDDKAHHHHHHHHHH
2LGX , Knot 58 112 0.81 40 85 109
GAMPDEFMALDGIRMPDGCYADGTWELSVHVTDVNRDVTLRVTGEVHIGGVMLKLVEKLDVKKDWSDHALWWEKKRTWLLKTHWTLDKYGIQADAKLQFTPQHKLLRLQLPN
2YSK , Knot 69 145 0.79 36 97 137
MSATGLEVFDRTLHKTHAWLKAIMEELGTEDRHKAYLALRAVLHALRDRLTVEEVAQLAAQLPMLVRGLYYEGWDPTGKPLKERHKEAFLAHVAEELKTPSGPAVDPEAATRAVFKVLSREISQGELEDVLGLLPKELRALWPQG
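
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below computes that ratio and, as an assumption, pairs it with one common LZ76-style parse (each new phrase is the shortest block not already seen earlier in the text). The page does not say which LZ variant it uses, so `lz76_complexity` may not reproduce the tabulated values exactly; both function names are hypothetical.

```python
import math


def lz76_complexity(w: str) -> int:
    """Count phrases in an LZ76-style parse of `w` (assumed variant)."""
    i, count, n = 0, 0, len(w)
    while i < n:
        k = 1
        # Grow the candidate phrase while it already occurs in the text
        # that precedes its last character.
        while i + k <= n and w[i:i + k] in w[:i + k - 1]:
            k += 1
        count += 1
        i += k
    return count


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n) for alphabet size b."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # Values for 6PQO taken from the table: LZ(w) = 413, n = 1152.
    print(f"{normalized_lz(413, 1152):.4f}")
```

The ratio function only needs the two tabulated numbers, so it can be checked against the table independently of the choice of LZ variant.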

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
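
As an illustration of these definitions, the set \(P_w(n)\) can be collected by sliding a window of width \(n\) over \(w\). This is a minimal sketch; the names `subwords` and `subword_complexity` are chosen here for illustration.

```python
def subwords(w: str, n: int) -> set[str]:
    """Return P_w(n): the distinct contiguous subwords of length n in w."""
    # If n exceeds len(w), the range is empty and so is the set.
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w: str, n: int) -> int:
    """Return p_w(n), the cardinality of P_w(n)."""
    return len(subwords(w, n))


if __name__ == "__main__":
    # First four residues of 2LGX_1.
    print(sorted(subwords("GAMP", 2)))
```

A word of length \(m\) has at most \(m-n+1\) distinct subwords of length \(n\), which bounds \(p_w(n)\) from above.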

\(|P_{f(6PQO_1)}(2) \setminus P_{f(2LGX_1)}(2)|=270\), \(|P_{f(2LGX_1)}(2) \setminus P_{f(6PQO_1)}(2)|=17\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \), the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110001001101100001011100011000001000101110101011001000001000001001110011101010110010000010110010001001100110000100101110011010100101111101110110001101110000101010100100111110000000110111001101000001101110011101000010111011000100001010110010101101110010101101010011010110010001101110011001101110000101011000010000110010110000110011011101001000100111110101010110111001101010001100110101001011001010110100100111000001001100100011110100111101010000000001101110010100000110010000110010101101101110010001101110011111000011011001011100001011100010000010001001101110010101111110001011100001011011100000011101100001000101100001100011001100110010111010110000000000001000100100110100001000110011011011100001011001100001110111011010110110001111110111101011111000111000000001100000011000111111001110000110110000001101001101110001111111111011101010011111010110111010010001111111011100110001111111111110101110100110011101100101111010000011010100011011101101101011111111011111111011010001010011101010001000111111001000001101001001111101101110010100011010001010110000010010111000001101110010110000000000010001000010000001001101101000010100010111011110000000010000000000
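
The hydrophobic-polar string replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not state its classification; the set `HYDROPHOBIC` below is an assumption inferred by aligning the first 80 residues of 6PQO_1 with the first 80 digits of the string above, and the function name is hypothetical.

```python
# Assumed hydrophobic class, inferred from the displayed string; every
# other amino-acid letter is treated as polar.
HYDROPHOBIC = frozenset("AFGILMPVW")


def hydrophobic_polar(sequence: str) -> str:
    """Map each residue to '1' if it is in HYDROPHOBIC, else to '0'."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence)


if __name__ == "__main__":
    # First ten residues of 6PQO_1.
    print(hydrophobic_polar("MAKRSLRKMW"))  # -> 1100010011
```

Under this assumed classification cysteine and tyrosine fall in the polar class, which differs from some other hydrophobic-polar conventions.
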
Pair \(Z_2\) Length of longest common subword
6PQO_1,2LGX_1 287 3
6PQO_1,2YSK_1 265 4
2LGX_1,2YSK_1 124 3
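
Both columns of this table can be computed directly from a pair of sequences. The sketch below is one straightforward way to do so, reading the last column as the longest common contiguous subword (values as small as 3 and 4 are only consistent with contiguous matches). All function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Return Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_subword_length(x: str, y: str) -> int:
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length ending at y[j-1]
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z_distance("aab", "abb", 2))                      # {aa} and {bb} differ
    print(longest_common_subword_length("abcde", "xbcdy"))  # shared block "bcd"
```

The dynamic program uses time proportional to the product of the two lengths, which is ample for sequences of the sizes listed here.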

Newick tree

 
(6PQO_1:15.40,(2YSK_1:62,2LGX_1:62):93.40);

Let \(d\) denote the Otu–Sayood distance of that name.
Let \(d_1\) denote the Otu–Sayood distance of that name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1264}{\log_{20} 1264}-\frac{112}{\log_{20}112})\approx 312\), where \(1264=1152+112\) is the combined length of the two query sequences.
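
The arithmetic behind this estimate can be reproduced in a few lines. This is a sketch that takes the factors 0.85 and 0.8 from the formula as given, without interpreting them; the function and parameter names are hypothetical.

```python
import math


def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_b(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)


def expected_distance(longer: int, shorter: int,
                      factor_a: float = 0.85, factor_b: float = 0.8) -> float:
    """Evaluate the estimate for sequences of lengths `longer` and `shorter`."""
    combined = longer + shorter  # 1152 + 112 = 1264 for the query pair
    return factor_a * factor_b * (lz_scale(combined) - lz_scale(shorter))


if __name__ == "__main__":
    # Lengths of 6PQO_1 and 2LGX_1 from the sequence table.
    print(round(expected_distance(1152, 112)))  # -> 312
```
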
Status Protein1 Protein2 d d1/2
Query variables 6PQO_1 2LGX_1 392 214.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
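
As a worked check, assume the usual Otu–Sayood definitions in terms of an LZ-complexity \(c\). These definitions are supplied here and are not stated on this page:
\[ d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}, \qquad d_1(x,y)=\bigl(c(xy)-c(x)\bigr)+\bigl(c(yx)-c(y)\bigr)=\min+\max. \]
Under that assumption, \(\alpha=0\) gives \(\delta=\max=d\) and \(\alpha=1/2\) gives \(\delta=(\min+\max)/2=d_1/2\), in agreement with the two cases. For the query pair 6PQO_1, 2LGX_1 the tabulated values then determine both terms:
\[ \max=d=392, \qquad d_1=2\,(214.5)=429, \qquad \min=d_1-\max=429-392=37. \]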