CoV2D BrowserTM

Parikh vectors
2OQV_1 3CTJ_1 1XPC_1 Letter Amino acid
39 14 8 N Asparagine
30 8 9 Q Glutamine
14 9 15 M Methionine
20 4 3 W Tryptophan
56 9 5 Y Tyrosine
44 30 13 V Valine
35 16 15 A Alanine
12 6 4 C Cysteine
40 15 15 E Glutamic acid
19 14 13 H Histidine
40 21 10 G Glycine
47 16 12 I Isoleucine
45 16 11 T Threonine
31 17 7 F Phenylalanine
26 15 9 P Proline
64 19 17 S Serine
30 13 11 R Arginine
43 17 13 D Aspartic acid
54 34 47 L Leucine
37 21 11 K Lysine
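A Parikh vector simply records how many times each letter occurs in a word. A minimal sketch of how such a column could be computed is given below; the function name `parikh_vector` and the short example word are illustrative assumptions, not part of the browser itself.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Count how often each amino-acid letter occurs in the sequence."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# Tiny illustrative word (hypothetical input, not one of the full proteins).
vector = parikh_vector("SRKTYTLTDYLK")
print(vector["T"], vector["L"], vector["C"])
```

The entries of a Parikh vector sum to the length of the word; for example, the 2OQV_1 column above sums to 726.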

>2OQV_1|Chains A, B|Dipeptidyl peptidase 4 (Dipeptidyl peptidase IV) (DPP IV)|Homo sapiens (9606)
>3CTJ_1|Chain A|Hepatocyte growth factor receptor|Homo sapiens (9606)
>1XPC_1|Chain A|Estrogen receptor|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2OQV , Knot 279 726 0.84 40 306 684
SRKTYTLTDYLKNTYRLKLYSLRWISDHEYLYKQENNILVFNAEYGNSSVFLENSTFDEFGHSINDYSISPDGQFILLEYNYVKQWRHSYTASYDIYDLNKRQLITEERIPNNTQWVTWSPVGHKLAYVWNNDIYVKIEPNLPSYRITWTGKEDIIYNGITDWVYEEEVFSAYSALWWSPNGTFLAYAQFNDTEVPLIEYSFYSDESLQYPKTVRVPYPKAGAVNPTVKFFVVNTDSLSSVTNATSIQITAPASMLIGDHYLCDVTWATQERISLQWLRRIQNYSVMDICDYDESSGRWNCLVARQHIEMSTTGWVGRFRPSEPHFTLDGNSFYKIISNEEGYRHICYFQIDKKDCTFITKGTWEVIGIEALTSDYLYYISNEYKGMPGGRNLYKIQLSDYTKVTCLSCELNPERCQYYSVSFSKEAKYYQLRCSGPGLPLYTLHSSVNDKGLRVLEDNSALDKMLQNVQMPSKKLDFIILNETKFWYQMILPPHFDKSKKYPLLLDVYAGPCSQKADTVFRLNWATYLASTENIIVASFDGRGSGYQGDKIMHAINRRLGTFEVEDQIEAARQFSKMGFVDNKRIAIWGWSYGGYVTSMVLGSGSGVFKCGIAVAPVSRWEYYDSVYTERYMGLPTPEDNLDHYRNSTVMSRAENFKQVEYLLIHGTADDNVHFQQSAQISKALVDVGVDFQAMWYTDEDHGIASSTAHQHIYTHMSHFIKQCFS
3CTJ , Knot 141 314 0.86 40 204 305
GANTVHIDLSALNPELVQAVQHVVIGPSSLIVHFNEVIGRGHFGCVYHGTLLDNDGKKIHCAVKSLNRITDIGEVSQFLTEGIIMKDFSHPNVLSLLGICLRSEGSPLVVLPYMKHGDLRNFIRNETHNPTVKDLIGFGLQVAKGMKFLASKKFVHRDLAARNCMLDEKFTVKVADFGLARDMYDKEFDSVHNKTGAKLPVKWMALESLQTQKFTTKSDVWSFGVLLWELMTRGAPPYPDVNTFDITVYLLQGRRLLQPEYCPDPLYEVMLKCWHPKAEMRPSFSELVSRISAIFSTFIGEHYVHVNATYVNVK
1XPC , Knot 112 248 0.83 40 161 237
ALSLTADQMVSALLDAEPPILYSEYDPTRPFSEASMMGLLTNLADRELVHMINWAKRVPGFVDLTLHDQVHLLECAWLEILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLLATSSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRVLDKITDTLIHLMAKAGLTLQQQHQRLAQLLLILSHIRHMSNKGMEHLYSMKCKNVVPLYDLLLEMLDAHRLHAPTS
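The sketch below shows one way the LZ-complexity and the normalized ratio \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) could be computed. It assumes the Lempel-Ziv (1976) exhaustive-history parsing, in which each new phrase is extended as long as it can be copied from an occurrence starting earlier in the word. Whether the page uses exactly this variant is an assumption, and the function names are illustrative.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of phrases in a Lempel-Ziv (1976) style parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from an occurrence
        # that starts before position i (overlap with the phrase is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_a(n), with a = 20 for amino acids."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example with parsing 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))
# Ratio for the table row with LZ(w) = 279 and n = 726.
print(normalized_lz(279, 726))
```

Applied to the tabulated pairs (279, 726), (141, 314) and (112, 248), `normalized_lz` gives values within 0.01 of the ratios 0.84, 0.86 and 0.83 listed above.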

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
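As an illustration of these definitions, the following sketch computes \(P_w(n)\) and \(p_w(n)\) for a short word. The function names and the example word are assumptions chosen for illustration.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

# Short illustrative word (hypothetical input).
word = "ALSLTA"
print(sorted(subwords(word, 2)))
print(subword_complexity(word, 1), subword_complexity(word, 2))
```

For a word of length \(|w|\) there are at most \(|w|-n+1\) subwords of length \(n\), so \(p_w(n)\) can never exceed that bound.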

\(|P_{f(2OQV_1)}(2) \setminus P_{f(3CTJ_1)}(2)|=138\), \(|P_{f(3CTJ_1)}(2) \setminus P_{f(2OQV_1)}(2)|=36\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). For example, \(Z_2(f(2OQV_1), f(3CTJ_1)) = 138 + 36 = 174\).

Hydrophobic-polar version of Sequence 1:
000000100010000010100101100000100000011110100100011100001001100100001010101111000010010000010001001000011000011000011010111001101100010101010110001010100011001100110000110100111101010111010100001111000100000100100101101011110101011110000100100100101011101111000100101100001010110010000110100000001010011100010100011110101001010101001001100001000100101000000110010101111011000010010000011111001001010000010010001010000000101000100001000111111001000100011011000011001100101100010111100001100111110100000011110101110000100110101100110000111101010101001001101100011010100010110010011110000111111001101001111010111001111111001000001000001111010001000000011001001001001110101000101000101001110111010111000000111000100010001001100010
Pair \(Z_2\) Length of longest common subword (substring)
2OQV_1,3CTJ_1 174 5
2OQV_1,1XPC_1 217 4
3CTJ_1,1XPC_1 177 3
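The two columns of this table can be sketched as follows. `z_distance` implements \(Z_k\) as the size of the symmetric difference of the two subword sets. The last column is read here as the length of the longest common contiguous subword; this is an assumption, based on the observation that values of 3 to 5 are far too small for a non-contiguous subsequence of proteins of this length. All function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y): subwords of length k that occur in exactly one of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Short illustrative words (hypothetical inputs).
print(z_distance("ALSLTA", "SLTAD", 2))
print(longest_common_subword_length("ALSLTA", "SLTAD"))
```

The dynamic program uses time proportional to \(|x|\cdot|y|\), which is ample for sequences of a few hundred residues.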

Newick tree

 
(1XPC_1:10.69,(2OQV_1:87,3CTJ_1:87):15.69);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1040}{\log_{20} 1040}-\frac{314}{\log_{20}314}\right)\approx 193\), where \(1040=726+314\) is the combined length of 2OQV_1 and 3CTJ_1.
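The arithmetic behind this estimate can be checked with a short sketch. The factors 0.85 and 0.8 are taken from the text as given; the function names are illustrative assumptions.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_a(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

def rough_expected_distance(n_concat: int, n_single: int,
                            c1: float = 0.85, c2: float = 0.8) -> float:
    """c1 * c2 * (scale of the concatenated length minus scale of one sequence)."""
    return c1 * c2 * (lz_scale(n_concat) - lz_scale(n_single))

# 1040 = 726 + 314, the lengths of 2OQV_1 and 3CTJ_1 from the table above.
value = rough_expected_distance(726 + 314, 314)
print(round(value, 1))
```

The result lies between 193 and 194, in line with the value quoted in the text.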
Status Protein1 Protein2 d d1/2
Query variables 2OQV_1 3CTJ_1 248 175.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
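The interpolation can be sketched numerically. The code assumes, as the case distinction suggests, that \(d\) is the maximum of two one-sided quantities and \(d_1\) is their sum. Under that assumption the tabulated values \(d=248\) and \(d_1/2=175.5\) imply that the smaller quantity is \(2(175.5)-248=103\). The function name `delta` and this derived value are illustrative, not taken from the page.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# One-sided quantities inferred from the table (assumption: d = max, d1 = min + max).
smaller, larger = 103, 248

print(delta(0, smaller, larger))    # alpha = 0 recovers d
print(delta(0.5, smaller, larger))  # alpha = 1/2 recovers d1 / 2
```

Intermediate values of \(\alpha\) give distances between \(d_1/2\) and \(d\).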