CoV2D Browser™

Parikh vectors
7UDE_1 7QIJ_1 2LWE_1 Letter Amino acid
28 16 3 A Alanine
8 8 3 N Asparagine
38 22 3 G Glycine
7 5 1 H Histidine
4 18 1 Y Tyrosine
12 25 5 R Arginine
2 3 2 W Tryptophan
26 23 6 S Serine
24 10 3 T Threonine
17 12 3 D Aspartic acid
8 24 3 Q Glutamine
24 24 9 I Isoleucine
25 48 15 L Leucine
9 8 3 M Methionine
20 12 3 P Proline
14 2 4 C Cysteine
21 41 16 E Glutamic acid
30 13 13 K Lysine
18 8 2 F Phenylalanine
39 28 2 V Valine
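The Parikh vector of a sequence is its vector of letter counts. A minimal sketch of how the 2LWE_1 column could be recomputed from the sequence listed on this page; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not part of the site.

```python
from collections import Counter

# Letters in the row order used by the table above.
AMINO_ACIDS = "ANGHYRWSTDQILMPCEKFV"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return the letter counts of seq, ordered as in `alphabet`."""
    counts = Counter(seq)
    return [counts[a] for a in alphabet]

# Sequence of 2LWE_1 as listed on this page.
seq_2lwe = (
    "GSHMKKIEKLEEYRLLLKRLQPEFKTRIIPTDIISDLSECLINQECEEILQICSTKGMMA"
    "GAEKLVECLLRSDKENWPKELKLALEKERNKFSELWIVEK"
)

vec_2lwe = parikh_vector(seq_2lwe)
for letter, count in zip(AMINO_ACIDS, vec_2lwe):
    print(letter, count)
```

The entries of a Parikh vector sum to the sequence length, which gives a quick consistency check against the Length column.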

>7UDE_1|Chains A, B|Alcohol dehydrogenase E chain|Equus caballus (9796)
>7QIJ_1|Chains A[auth AA], B[auth BA], C[auth CA], D[auth DA], E[auth EA], F[auth FA], G[auth GA], H[auth HA], I[auth IA], J[auth JA], K[auth KA], L[auth LA], M[auth MA], N[auth NA], O[auth OA], P[auth PA], Q[auth QA], R[auth RA]|Low calcium response locus protein D|Yersinia enterocolitica (630)
>2LWE_1|Chain A|Probable ATP-dependent RNA helicase DDX58|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7UDE , Knot 158 374 0.83 40 210 358
STAGKVIKCKAAVLWEEKKPFSIEEVEVAPPKAHEVRIKMVATGICRSDDHVVSGTLVTPLPVIAGHEAAGIVESIGEGVTTVRPGDKVIPLFTPQCGKCRVCKHPEGNFCLKNDLSMPRGTMQDGTSRFTCRGKPIHHFLGTSTFSQYTVVDEISVAKIDAASPLEKVCLIGCGFSTGYGSAVKVAKVTQGSTCAVFGLGGVGLSVIMGCKAAGAARIIGVDINKDKFAKAKEVGATECVNPQDYKKPIQEVLTEMSNGGVDFSFEVIGRLDTMVTALSCCQEAYGVSVIVGVPPDSQNLSMNPMLLLSGRTWKGAIFGGFKSKDSVPKLVADFMAKKFALDPLITHVLPFEKINEGFDLLRSGESIRTILTF
7QIJ , Knot 146 350 0.81 40 190 330
MANKGRLGEQEAFAMTVPLLIDVDSSQQEALEAIALNDELVRVRRALYLDLGVPFPGIHLRFNEGMGEGEYLISLQEVPVARGELKAGYLLVRESVSQLELLGIPYEKGEHLLPDQETFWVSVEYEERLEKSQLEFFSHSQVLTWHLSHVLREYAEDFIGIQETRYLLEQMEGGYGELIKEVQRIVPLQRMTEILQRLVGEDISIRNMRSILEAMVEWGQKEKDVVQLTEYIRSSLKRYICYKYANGNNILPAYLFDQEVEEKIRSRVRQTSAGSYLALDPAVTESLLEQVRKTIGDLSQIQSKPVLIVSMDIRRYVRKLIESEYYGLPVLSYQELTQQINIQPLGRVCL
2LWE , Knot 51 100 0.78 40 77 96
GSHMKKIEKLEEYRLLLKRLQPEFKTRIIPTDIISDLSECLINQECEEILQICSTKGMMAGAEKLVECLLRSDKENWPKELKLALEKERNKFSELWIVEK
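The page does not say which Lempel-Ziv variant produces the LZ-complexity column. The sketch below assumes an exhaustive-history (LZ76-style) parse and may not reproduce the tabulated values 158, 146 and 51 exactly. The normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) is taken from the table header; the function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Number of components in an exhaustive-history (LZ76-style) parse of w."""
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the component while it already occurs starting at an earlier
        # position of w (the earlier occurrence may overlap the component).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

# Normalize the tabulated LZ values by the tabulated lengths.
for code, lz, n in [("7UDE", 158, 374), ("7QIJ", 146, 350), ("2LWE", 51, 100)]:
    print(code, normalized_lz(lz, n))
```

The tabulated ratios 0.83, 0.81 and 0.78 appear to be these values truncated to two decimals.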

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
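A minimal sketch of these definitions, with the hypothetical helper names `subwords` and `subword_complexity`, shown on a toy word rather than a protein sequence:

```python
def subwords(w, k):
    """P_w(k): the set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

toy = "abab"
for k in (1, 2, 3):
    print(k, sorted(subwords(toy, k)), subword_complexity(toy, k))
```

A word of length \(|w|\) has only \(|w|-k+1\) intervals of length \(k\), so \(p_w(k)\) can never exceed that count.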

\(|P_{f(7UDE_1)}(2) \setminus P_{f(7QIJ_1)}(2)|=96\), \(|P_{f(7QIJ_1)}(2) \setminus P_{f(7UDE_1)}(2)|=76\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for this pair, \(Z_2=96+76=172\).

Hydrophobic-polar version of Sequence 1 (7UDE_1):
00110110001111100001101001011110100101011101100000011010110111111100111110011011001011001111101001000100010101010001011010100100010001011001110001000011001011010110110010111011001010110110100100011111111110111100111110111101000011010011100010100000110011001001110101011101001101100000101101111111000010101111101001011111110000011011101110011101110011110010011011001001001101
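The page does not state which residues count as hydrophobic. The sketch below assumes the nonpolar set A, G, V, L, I, M, F, W, P maps to 1 and every other residue to 0. That assumption reproduces the opening symbols of the binary string above, but it is inferred from them and not documented by the site; `HYDROPHOBIC` and `hp_encode` are illustrative names.

```python
# Assumed hydrophobic (nonpolar) residues; inferred, not documented by the page.
HYDROPHOBIC = set("AGVLIMFWP")

def hp_encode(seq):
    """Map each residue to '1' (assumed hydrophobic) or '0' (assumed polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First 40 residues of 7UDE_1.
prefix = "STAGKVIKCKAAVLWEEKKPFSIEEVEVAPPKAHEVRIKM"
print(hp_encode(prefix))
```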
Pair \(Z_2\) Length of longest common subword
7UDE_1,7QIJ_1 172 4
7UDE_1,2LWE_1 193 4
7QIJ_1,2LWE_1 165 4
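The \(Z_k\) column is the size of the symmetric difference of two subword sets. A minimal sketch with toy words follows; the helper names are illustrative, and the normalized Jaccard distance is included only to show what \(Z_k\) is the numerator of.

```python
def subwords(w, k):
    """P_w(k): the set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Z_k(x, y) divided by |P_x(k) union P_y(k)| (0.0 if both sets are empty)."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0

print(z_distance("abab", "abba", 2), jaccard_distance("abab", "abba", 2))
```

Applied to the three sequences on this page with \(k=2\), `z_distance` would be expected to return the tabulated values, provided the site computes \(Z_2\) exactly as defined above.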

Newick tree

 
(7UDE_1:94.18,(7QIJ_1:82.5,2LWE_1:82.5):11.68);

Let \(d\) and \(d_1\) denote the Otu--Sayood distances of those names. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{724}{\log_{20} 724}-\frac{350}{\log_{20}350})\approx 102.\)
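The arithmetic in this estimate can be checked directly. In the sketch below, 724 = 374 + 350 is the length of the two query sequences concatenated. The factors 0.85 and 0.8 are taken from the formula as given; their origin is not explained on the page. The function name is illustrative.

```python
import math

def expected_distance(n_concat, n_single, c1=0.85, c2=0.8, alphabet_size=20):
    """c1 * c2 * (n_concat / log_20 n_concat - n_single / log_20 n_single)."""
    def scaled(n):
        return n / math.log(n, alphabet_size)
    return c1 * c2 * (scaled(n_concat) - scaled(n_single))

estimate = expected_distance(374 + 350, 350)
print(round(estimate, 1))
```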
Status Protein1 Protein2 d d1/2
Query variables 7UDE_1 7QIJ_1 131 125.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
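A minimal sketch of this interpolation. It assumes, following the usual statement of the Otu--Sayood distances, that min and max range over the two one-sided increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\) of a complexity function \(c\), so that \(d\) is their maximum and \(d_1\) their sum. The page itself does not spell this out, and the function names are illustrative.

```python
def otu_sayood(x, y, c):
    """Return (d, d1) for sequences x, y and a complexity function c (assumed form)."""
    a = c(x + y) - c(x)  # extra complexity of y after seeing x
    b = c(y + x) - c(y)  # extra complexity of x after seeing y
    return max(a, b), a + b

def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# The query row reports d = 131 and d1/2 = 125.5, which under this reading
# corresponds to one-sided increments 131 and 120 (120 is derived, not listed).
print(delta(0, 131, 120), delta(0.5, 131, 120))
```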