CoV2D Browser™

Parikh vectors
7DCR_1 7UAR_1 3WOB_1 Letter Amino acid
68 48 6 E Glutamic acid
45 84 26 G Glycine
63 72 13 I Isoleucine
87 106 9 L Leucine
24 11 1 M Methionine
45 95 7 V Valine
42 76 4 A Alanine
43 88 4 N Asparagine
14 22 4 H Histidine
70 57 6 K Lysine
66 95 10 T Threonine
27 54 7 Y Tyrosine
50 43 5 R Arginine
44 61 1 D Aspartic acid
38 67 4 P Proline
56 98 10 S Serine
5 11 3 W Tryptophan
13 30 0 C Cysteine
47 63 6 Q Glutamine
29 75 10 F Phenylalanine
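Each column above is the Parikh vector of one sequence, that is, the number of occurrences of each amino-acid letter. A minimal Python sketch of that count follows; the function name `parikh_vector` and the 20-residue prefix of 3WOB_1 used as input are illustrative choices, not something taken from the site.

```python
from collections import Counter

# Letter order copied from the rows of the table above.
LETTERS = "EGILMVANHKTYRDPSWCQF"

def parikh_vector(seq, letters=LETTERS):
    """Return the letter counts of seq, in the order given by letters."""
    counts = Counter(seq)
    return [counts[ch] for ch in letters]

# First 20 residues of the 3WOB_1 sequence (a prefix, for brevity).
prefix = "GSGQMFGNGKGSYFITSKDN"
vec = parikh_vector(prefix)
print(dict(zip(LETTERS, vec)))
```

Applied to a full sequence, the entries sum to its length; the 3WOB_1 column above sums to 136.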

>7DCR_1|Chain A[auth x]|PRP2 isoform 1|Saccharomyces cerevisiae (4932)
>7UAR_1|Chains A, B, C|Spike glycoprotein|Severe acute respiratory syndrome coronavirus 2 (2697049)
>3WOB_1|Chain A|hypothetical protein|Sus scrofa (9823)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7DCR , Knot 328 876 0.84 40 297 798
MSSITSETGKRRVKRTYEVTRQNDNAVRIEPSSLGEEEDKEAKDKNSALQLKRSRYDPNKVFSNTNQGPEKNNLKGEQLGSQKKSSKYDEKITSNNELTTKKGLLGDSENETKYASSNSKFNVEVTHKIKNAKEIDKINRQRMWEEQQLRNAMAGQSDHPDDITLEGSDKYDYVFDTDAMIDYTNEEDDLLPEEKLQYEARLAQALETEEKRILTIQEARKLLPVHQYKDELLQEIKKNQVLIIMGETGSGKTTQLPQYLVEDGFTDQGKLQIAITQPRRVAATSVAARVADEMNVVLGKEVGYQIRFEDKTTPNKTVLKYMTDGMLLREFLTDSKLSKYSCIMIDEAHERTLATDILIGLLKDILPQRPTLKLLISSATMNAKKFSEFFDNCPIFNVPGRRYPVDIHYTLQPEANYIHAAITTIFQIHTTQSLPGDILVFLTGQEEIERTKTKLEEIMSKLGSRTKQMIITPIYANLPQEQQLKIFQPTPENCRKVVLATNIAETSLTIDGIRYVIDPGFVKENSYVPSTGMTQLLTVPCSRASVDQRAGRAGRVGPGKCFRIFTKWSYLHELELMPKPEITRTNLSNTVLLLLSLGVTDLIKFPLMDKPSIPTLRKSLENLYILGALNSKGTITRLGKMMCEFPCEPEFAKVLYTAATHEQCQGVLEECLTIVSMLHETPSLFIGQKRDAAASVLSEVESDHILYLEIFNQWRNSKFSRSWCQDHKIQFKTMLRVRNIRNQLFRCSEKVGLVEKNDQARMKIGNIAGYINARITRCFISGFPMNIVQLGPTGYQTMGRSSGGLNVSVHPTSILFVNHKEKAQRPSKYVLYQQLMLTSKEFIRDCLVIPKEEWLIDMVPQIFKDLIDDKTNRGRR
7UAR , Knot 446 1256 0.84 40 328 1087
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPASVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSPIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGPALQIPFPMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTPSALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPPEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQYIKWPSGRLVPRGSPGSGYIPEAPRDGQAYVRKDGEWVLLSTFLGHHHHHH
3WOB , Knot 66 136 0.79 38 99 127
GSGQMFGNGKGSYFITSKDNETGITGIRVFVGPVGLIKSIQVRYGSSWSEKYGIPGGKAHELILHPGEHIISIYGRYRTFLQHVTLITNQGRSASFGLETGKGFFAAPNLTGQVLEGVYGQFWLYGITGIGFTWGF
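The fourth column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below shows one way to compute an LZ76-style phrase count and that ratio in Python. It is an assumption that the site's \(\mathrm{LZ}(w)\) uses exactly this parsing (the 1976 Lempel-Ziv exhaustive history, with overlapping copies allowed); the function names are hypothetical.

```python
import math

def lz76_phrases(s):
    """Count phrases in an LZ76-style parse of s.

    Each phrase is the shortest prefix of the remaining text that does not
    occur starting at an earlier position (overlap allowed); a final phrase
    that runs off the end of s still counts as one phrase.
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # s[i:i+length] has an earlier start exactly when it occurs inside
        # the text with its last character removed.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """Return lz / (n / log_alphabet_size(n))."""
    return lz / (n / math.log(n, alphabet_size))

# Classic example from the LZ76 literature: 0 | 001 | 10 | 100 | 1000 | 101
print(lz76_phrases("0001101001000101"))
# Ratio for the 7DCR row, using the tabulated LZ(w) = 328 and n = 876.
print(normalized_lz(328, 876))
```

With the tabulated values, the ratio for 7DCR comes out near 0.847, consistent with the 0.84 shown if the table truncates rather than rounds.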

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
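As a small illustration of these definitions, the following Python sketch builds \(P_w(n)\) as a set of contiguous substrings and takes its size. The function names are hypothetical, and the sketch follows the definition as stated here; the tabulated \(p_w(k)\) values may use a different counting convention.

```python
def subword_set(w, n):
    """Return the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Return p_w(n), the number of distinct length-n subwords of w."""
    return len(subword_set(w, n))

w = "ABAB"
print(sorted(subword_set(w, 2)))
print([subword_complexity(w, n) for n in (1, 2, 3)])
```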

\(|P_{f(7DCR_1)}(2) \setminus P_{f(7UAR_1)}(2)|=36\), \(|P_{f(7UAR_1)}(2) \setminus P_{f(7DCR_1)}(2)|=67\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100100001000100000100000011010100110000001000001101000000100110000011000010100110000000000010000010000111100000000100000101010001001001001000011000010011110000100101010000001100011100000000111000100010110110000001101001001111000000110010000111111001010000110011001100010101110010011100111011001011110011001010000010001100100111100110000100000111001000011001111110011100101011100101010010011000111011100011010001010100101110011010000011101111101000100000010011001100000111011010110000101101010000011110011000101011001101111000001100110011011000101000110110111100101100100100101110101000010001111101110011011110010110100010010111110001010011011001100101101100110000001110001011011000101111000011101100100001101011001000010001000001010011010010001100000111100000101011011101010100011011110110111010001100011101010100111100000100100011000111000011000111100011101110110011000000100
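The hydrophobic-polar string above can be approximated with a simple letter-to-bit map, sketched here in Python. The hydrophobic set is an assumption: A, F, G, I, L, M, P and V are inferred from the first 70 residues of 7DCR_1, where they map to 1, while C and W are added by guesswork because they do not occur in that prefix. The site's actual classification may differ.

```python
# Assumed hydrophobic residues. AFGILMPV are inferred from the start of the
# 7DCR_1 string above; C and W are a guess and are not confirmed by the page.
HYDROPHOBIC = set("AFGILMPV") | set("CW")

def hydrophobic_polar(seq, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' if hydrophobic, else '0'."""
    return "".join("1" if ch in hydrophobic else "0" for ch in seq)

# First 30 residues of 7DCR_1.
prefix = "MSSITSETGKRRVKRTYEVTRQNDNAVRIE"
print(hydrophobic_polar(prefix))
```

On this prefix the output agrees with the first 30 symbols of the string above.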
Pair \(Z_2\) Length of longest common subword
7DCR_1,7UAR_1 103 4
7DCR_1,3WOB_1 230 4
7UAR_1,3WOB_1 247 4
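The two columns of this table can be sketched in Python as follows. \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets. The last column is interpreted here as the longest common contiguous subword, which is an assumption: a value of 4 would be implausibly small for a gapped subsequence of sequences this long. Function names are hypothetical.

```python
def subword_set(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subword_set(x, k), subword_set(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_k("AAB", "ABB", 2))
print(longest_common_subword_length("GATTACA", "TTAC"))
```

For the first row, the two one-sided counts reported above, 36 and 67, add up to the tabulated \(Z_2 = 103\).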

Newick tree

 
(3WOB_1:13.53,(7DCR_1:51.5,7UAR_1:51.5):83.03);

Let d and d1 denote the Otu–Sayood distances \(d\) and \(d_1\), respectively. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{2132}{\log_{20} 2132}-\frac{876}{\log_{20} 876}\right)\approx 303\), where \(2132 = 876 + 1256\) is the combined length of the two sequences.
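The arithmetic of this estimate can be checked with a few lines of Python. The factors 0.85 and 0.8 are taken from the formula as given; the function name and its parameter names are hypothetical.

```python
import math

def expected_distance(n_x, n_y, c1=0.85, c2=0.8, alphabet_size=20):
    """Evaluate c1 * c2 * (N / log_A N - n_x / log_A n_x) with N = n_x + n_y."""
    total = n_x + n_y
    scale = lambda n: n / math.log(n, alphabet_size)
    return c1 * c2 * (scale(total) - scale(n_x))

# Lengths of 7DCR_1 and 7UAR_1 from the table.
value = expected_distance(876, 1256)
print(round(value))
```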
Status Protein1 Protein2 d d1/2
Query variables 7DCR_1 7UAR_1 388 329.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
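To make the interpolation concrete, here is a small Python sketch that evaluates \(\delta\) for the query pair. It assumes the usual Otu–Sayood forms, with \(d\) the larger of two one-sided terms and \(d_1\) their sum; under that assumption the smaller term is recovered from the table as \(2(329.5) - 388 = 271\). The function name is hypothetical.

```python
def delta(a, b, alpha):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Values from the 7DCR_1 / 7UAR_1 row of the table.
d = 388
half_d1 = 329.5

# Assumption: d is the max of the two one-sided terms and d1 is their sum.
larger = d
smaller = 2 * half_d1 - d

print(delta(smaller, larger, 0))    # alpha = 0 gives d
print(delta(smaller, larger, 0.5))  # alpha = 1/2 gives d1 / 2
```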