CoV2D Browser™

Parikh vectors
7AFN_1 8XSC_1 5JTK_1 Letter Amino acid
0 7 0 N Asparagine
352 2 0 C Cysteine
0 4 2 Q Glutamine
0 7 1 H Histidine
0 12 4 K Lysine
0 5 1 M Methionine
389 12 8 A Alanine
0 5 10 R Arginine
0 8 1 F Phenylalanine
0 6 4 P Proline
0 5 4 T Threonine
0 5 8 D Aspartic acid
0 11 6 E Glutamic acid
0 0 0 W Tryptophan
0 6 6 Y Tyrosine
0 17 16 L Leucine
0 14 6 S Serine
0 11 7 V Valine
486 7 5 G Glycine
0 13 5 I Isoleucine
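
A Parikh vector simply records how often each letter of a fixed alphabet occurs in a word. The following is a minimal sketch, assuming the alphabet is the 20 amino-acid letters in the row order of the table above; the names `ALPHABET` and `parikh_vector` are illustrative and not taken from the CoV2D code.

```python
from collections import Counter

# Letter order copied from the rows of the table above (assumed to be arbitrary but fixed).
ALPHABET = "NCQHKMARFPTDEWYLSVGI"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the occurrence count of each alphabet letter in `sequence`, in alphabet order."""
    counts = Counter(sequence)
    # Letters outside the alphabet (for example U in an RNA sequence) are ignored.
    return [counts.get(letter, 0) for letter in alphabet]

if __name__ == "__main__":
    toy = "MKVLAAGLL"  # made-up toy sequence, not one of the entries above
    print(dict(zip(ALPHABET, parikh_vector(toy))))
```

Ignoring letters outside the alphabet would be consistent with the 7AFN_1 column, where only A, C and G have nonzero counts and the counts sum to less than the sequence length.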

>7AFN_1|Chain A[auth 1]|16SrRNA (head domain of the 30S ribosome)|Escherichia coli (562)
>8XSC_1|Chains A, B|Phosphopantetheine adenylyltransferase|Helicobacter pylori 26695 (85962)
>5JTK_1|Chain A|Uncharacterized protein|Pseudomonas aeruginosa (strain ATCC 15692 / PAO1 / 1C / PRS 101 / LMG 12228) (208964)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7AFN , Knot 283 1541 0.44 8 16 64
AAAUUGAAGAGUUUGAUCAUGGCUCAGAUUGAACGCUGGCGGCAGGCCUAACACAUGCAAGUCGAACGGUAACAGGAAGAAGCUUGCUUCUUUGCUGACGAGUGGCGGACGGGUGAGUAAUGUCUGGGAAACUGCCUGAUGGAGGGGGAUAACUACUGGAAACGGUAGCUAAUACCGCAUAACGUCGCAAGACCAAAGAGGGGGACCUUCGGGCCUCUUGCCAUCGGAUGUGCCCAGAUGGGAUUAGCUAGUAGGUGGGGUAACGGCUCACCUAGGCGACGAUCCCUAGCUGGUCUGAGAGGAUGACCAGCCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGCACAAUGGGCGCAAGCCUGAUGCAGCCAUGCCGCGUGUAUGAAGAAGGCCUUCGGGUUGUAAAGUACUUUCAGCGGGGAGGAAGGGAGUAAAGUUAAUACCUUUGCUCAUUGACGUUACCCGCAGAAGAAGCACCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGGUGCAAGCGUUAAUCGGAAUUACUGGGCGUAAAGCGCACGCAGGCGGUUUGUUAAGUCAGAUGUGAAAUCCCCGGGCUCAACCUGGGAACUGCAUCUGAUACUGGCAAGCUUGAGUCUCGUAGAGGGGGGUAGAAUUCCAGGUGUAGCGGUGAAAUGCGUAGAGAUCUGGAGGAAUACCGGUGGCGAAGGCGGCCCCCUGGACGAAGACUGACGCUCAGGUGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGUCGACUUGGAGGUUGUGCCCUUGAGGCGUGGCUUCCGGAGCUAACGCGUUAAGUCGACCGCCUGGGGAGUACGGCCGCAAGGUUAAAACUCAAAUGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGCAACGCGAAGAACCUUACCUGGUCUUGACAUCCACGGAAGUUUUCAGAGAUGAGAAUGUGCCUUCGGGAACCGUGAGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUUGUGAAAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUAUCCUUUGUUGCCAGCGGUCCGGCCGGGAACUCAAAGGAGACUGCCAGUGAUAAACUGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGGCCCUUACGACCAGGGCUACACACGUGCUACAAUGGCGCAUACAAAGAGAAGCGACCUCGCGAGAGCAAGCGGACCUCAUAAAGUGCGUCGUAGUCCGGAUUGGAGUCUGCAACUCGACUCCAUGAAGUCGGAAUCGCUAGUAAUCGUGGAUCAGAAUGCCACGGUGAAUACGUUCCCGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUGGGUUGCAAAAGAAGUAGGUAGCUUAACCUUCGGGAGGGCGCUUACCACUUUGUGAUUCAUGACUGGGGUGAAGUCGUAACAAGGUAACCGUAGGGGAACCUGCGGUUGGAUCACCUCCUUA
8XSC , Knot 77 157 0.82 38 123 154
MQKIGIYPGTFDPVTNGHIDIIHRSSELFEKLIVAVAHSSAKNPMFSLDERLKMIQLATKSFKNVECVAFEGLLANLAKEYHCKVLVRGLRVVSDFEYELQMGYANKSLNHELETLYFMPTLQNAFISSSIVRSIIAHKGDASHLVPKEIYPLISKA
5JTK , Knot 49 94 0.79 34 73 88
MTPIEYIDRALALVVDRLARYPGYEVLLSAEKQLQYIRSVLLDRSLDRSALHRLTLGSIAVKEFDETDPELSRALKDAYYVGIRTGRGLKVDLP
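
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns alone. The sketch below only does that arithmetic; it takes the LZ values from the table as given and does not recompute them. The function name `normalized_lz` is an assumption made for illustration.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    # (code, LZ-complexity, length) as listed in the table above
    for code, lz, n in [("7AFN", 283, 1541), ("8XSC", 77, 157), ("5JTK", 49, 94)]:
        print(code, normalized_lz(lz, n))
```

The results appear to match the tabulated 0.44, 0.82 and 0.79 if the table truncates to two decimals rather than rounding.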

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
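
As a minimal sketch of these definitions, the set \(P_w(k)\) can be built by sliding a window of width \(k\) over \(w\), and \(p_w(k)\) is its size. The helper names `subwords` and `subword_complexity` are illustrative only.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return the number of distinct subwords of length k in w."""
    return len(subwords(w, k))

if __name__ == "__main__":
    w = "ABAB"  # toy word
    print([subword_complexity(w, k) for k in (1, 2, 3)])
```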

\(|P_{f(7AFN_1)}(2) \setminus P_{f(8XSC_1)}(2)|=16\), \(|P_{f(8XSC_1)}(2) \setminus P_{f(7AFN_1)}(2)|=123\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be a LZ76 style (set of subwords) Jaccard distance numerator for \(P(k)\).Hydrophobic-polar version of Sequence 1:11100111111000110010110001110011101001101101110001101010101110011101101101111111110001000000010011011101101110111011101101000111111001000110111111111011001001111101101100110100101011010010111100111111111110000011100000010010011101010001110111100110011011101111011011000100011101101100000110011000111111101100110010100111100111101011000111000001011111101101101111110100101011011101011100011010110010100101010101111111100000111001011110100000110111111111111110111100110100000100010011010010001011111111010011001100001010011011001011011010111111010111010011001111001001110101111010101011101100010011100111010111100000111000110001111100101000110100110111000111000010111111111011110000111010110110111101010111110001111111010011011011111011000000111011111001101000111010111110101111110111011110011101000011011000101001011101101001100011111001010000011110101100000111100110101001110011001000111111010110010111100111100011101110011011111000101011101101111010101100011000110101101011111100001000110000110100010111110000011111011111010100000111110010111101110100101011001001001100010100101111010011100111000010110111010110000010000001001001101100011001111100011111111001001101101110011111111101111101101001110010010110000010110011110010101010100101101101010101111111110110000101111101110111000010111101010010110001110011110001011000110000101111001111001001101100101110011110100101101110101000001100001010101001000100101001011111011100101111111101110110001100000111111101000100100001011000101100111101111001011011110110010111111100010110011100100000001
Pair \(Z_2\) Length of longest common subword (contiguous)
7AFN_1,8XSC_1 139 1
7AFN_1,5JTK_1 89 1
8XSC_1,5JTK_1 124 3
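
The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords; for the first pair, \(16+123=139\). A minimal sketch follows, with a Jaccard-style normalization added on the assumption that the denominator is the size of the union. The function names are illustrative.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Return Z_k(x, y) / |P_x(k) union P_y(k)| (assumed normalization; 0.0 if both sets are empty)."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0

if __name__ == "__main__":
    print(z_distance("ABAB", "BABC", 2), jaccard_distance("ABAB", "BABC", 2))
```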

Newick tree

 
[
	8XSC_1:71.57,
	[
		7AFN_1:44.5,5JTK_1:44.5
	]:27.07
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1698}{\log_{20} 1698}-\frac{157}{\log_{20} 157})\approx 401.\)
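
The arithmetic in this estimate can be checked directly. In the sketch below, 1698 is read as the combined length \(1541+157\) of the two query sequences, and the factors 0.85 and 0.8 are copied from the formula without further interpretation. The function names are hypothetical.

```python
import math

def capacity(n, alphabet_size=20):
    """Return n / log_b n, the normalizing quantity used for LZ-complexity above."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_long, n_short, factor=0.85 * 0.8):
    """Return factor * (capacity(n_long + n_short) - capacity(n_short))."""
    return factor * (capacity(n_long + n_short) - capacity(n_short))

if __name__ == "__main__":
    print(expected_distance(1541, 157))  # lengths of 7AFN_1 and 8XSC_1
```

The value comes out between 401 and 402, so the quoted 401 corresponds to truncation.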
Status Protein1 Protein2 d d1/2
Query variables 7AFN_1 8XSC_1 282 179

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
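
To make the interpolation concrete, here is a minimal sketch under two assumptions that the page does not confirm: that the complexity \(c\) is the LZ76 phrase count (exhaustive history), and that min and max range over the two terms \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\) used in the Otu–Sayood distances. The function names are illustrative.

```python
def lz76_complexity(s):
    """Return the number of phrases in the LZ76 exhaustive-history parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it also occurs starting at an earlier position
        # (the earlier copy may overlap the phrase, but must end before the phrase's last character).
        while i + length <= n and s.find(s[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood_delta(s, q, alpha=0.0, c=lz76_complexity):
    """Return alpha*min + (1-alpha)*max of the terms c(sq)-c(s) and c(qs)-c(q)."""
    a = c(s + q) - c(s)
    b = c(q + s) - c(q)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    s, q = "ACGT", "ACGTACGT"  # toy words
    print(otu_sayood_delta(s, q, 0.0), otu_sayood_delta(s, q, 0.5))
```

Under this reading, the tabulated values \(d=282\) and \(d_1/2=179\) would give a smaller term of \(2\cdot 179-282=76\).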
