CoV2D Browser™

Parikh vectors
9AYJ_1 1DQY_1 3CQU_1 Letter Amino acid
161 10 19 R Arginine
227 22 33 L Leucine
65 6 26 K Lysine
48 16 15 Y Tyrosine
39 3 10 H Histidine
87 9 13 I Isoleucine
33 11 3 W Tryptophan
153 14 20 V Valine
153 31 23 G Glycine
106 11 21 F Phenylalanine
152 22 15 P Proline
170 22 19 A Alanine
64 21 8 N Asparagine
98 14 23 D Aspartic acid
50 1 5 C Cysteine
135 8 31 E Glutamic acid
58 16 10 Q Glutamine
56 9 13 M Methionine
174 23 16 S Serine
87 14 19 T Threonine
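
The counts in the table above can be reproduced with a short sketch like the following. The function name `parikh_vector` and the constant `ALPHABET` are illustrative names, not taken from the site; the letter order simply follows the "Letter" column.

```python
from collections import Counter

# Letter order copied from the "Letter" column of the table (an illustrative choice).
ALPHABET = "RLKYHIWVGFPANDCEQMST"


def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the number of occurrences of each alphabet letter, in alphabet order."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    # Any protein sequence in one-letter code can be passed in.
    print(parikh_vector("MFSRPGLPVEYLQ"))
```

The entries of a Parikh vector sum to the length of the sequence; for example, the 1DQY_1 column above sums to 283.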

>9AYJ_1|Chain A|Voltage-dependent T-type calcium channel subunit alpha-1H|Homo sapiens (9606)
>1DQY_1|Chain A|PROTEIN (ANTIGEN 85-C)|Mycobacterium tuberculosis (1773)
>3CQU_1|Chain A|RAC-alpha serine/threonine-protein kinase|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9AYJ , Knot 697 2116 0.84 40 367 1576
MASWSHPQFEKGGGARGGSGGGSWSHPQFEKGFDYKDDDDKGTMTEGARAADEVRVPLGAPPPGPAALVGASPESPGAPGREAERGSELGVSPSESPAAERGAELGADEEQRVPYPALAATVFFCLGQTTRPRSWCLRLVCNPWFEHVSMLVIMLNCVTLGMFRPCEDVECGSERCNILEAFDAFIFAFFAVEMVIKMVALGLFGQKCYLGDTWNRLDFFIVVAGMMEYSLDGHNVSLSAIRTVRVLRPLRAINRVPSMRILVTLLLDTLPMLGNVLLLCFFVFFIFGIVGVQLWAGLLRNRCFLDSAFVRNNNLTFLRPYYQTEEGEENPFICSSRRDNGMQKCSHIPGRRELRMPCTLGWEAYTQPQAEGVGAARNACINWNQYYNVCRSGDSNPHNGAINFDNIGYAWIAIFQVITLEGWVDIMYYVMDAHSFYNFIYFILLIIVGSFFMINLCLVVIATQFSETKQRESQLMREQRARHLSNDSTLASFSEPGSCYEELLKYVGHIFRKVKRRSLRLYARWQSRWRKKVDPGWMGRLWVTFSGKLRRIVDSKYFSRGIMMAILVNTLSMGVEYHEQPEELTNALEISNIVFTSMFALEMLLKLLACGPLGYIRNPYNIFDGIIVVISVWEIVGQADGGLSVLRTFRLLRVLKLVRFLPALRRQLVVLVKTMDNVATFCTLLMLFIFIFSILGMHLFGCKFSLKTDTGDTVPDRKNFDSLLWAIVTVFQILTQEDWNVVLYNGMASTSSWAALYFVALMTFGNYVLFNLLVAILVEGFQAEGDANRSDTDEDKTSVHFEEDFHKLRELQTTELKMCSLAVTPNGHLEGRGSLSPPLIMCTAATPMPTPKSSPFLDAAPSLPDSRRGSSSSGDPPLGDQKPPASLRSSPCAPWGPSGAWSSRRSSWSSLGRAPSLKRRGQCGERESLLSGEGKGSTDDEAEDGRAAPGPRATPLRRAESLDPRPLRPAALPPTKCRDRDGQVVALPSDFFLRIDSHREDAAELDDDSEDSCCLRLHKVLEPYKPQWCRSREAWALYLFSPQNRFRVSCQKVITHKMFDHVVLVFIFLNCVTIALERPDIDPGSTERVFLSVSNYIFTAIFVAEMMVKVVALGLLSGEHAYLQSSWNLLDGLLVLVSLVDIVVAMASAGGAKILGVLRVLRLLRTLRPLRVISRAPGLKLVVETLISSLRPIGNIVLICCAFFIIFGILGVQLFKGKFYYCEGPDTRNISTKAQCRAAHYRWVRRKYNFDNLGQALMSLFVLSSKDGWVNIMYDGLDAVGVDQQPVQNHNPWMLLYFISFLLIVSFFVLNMFVGVVVENFHKCRQHQEAEEARRREEKRLRRLERRRRSTFPSPEAQRRPYYADYSPTRRSIHSLCTSHYLDLFITFIICVNVITMSMEHYNQPKSLDEALKYCNYVFTIVFVFEAALKLVAFGFRRFFKDRWNQLDLAIVLLSLMGITLEEIEMSAALPINPTIIRIMRVLRIARVLKLLKMATGMRALLDTVVQALPQVGNLGLLFMLLFFIYAALGVELFGRLECSEDNPCEGLSRHATFSNFGMAFLTLFRVSTGDNWNGIMKDTLRECSREDKHCLSYLPALSPVYFVTFVLVAQFVLVNVVVAVLMKHLEESNKEAREDAELDAEIELEMAQGPGSARRVDADRPPLPQESPGARDAPNLVARKVSVSRMLSLPNDSYMFRPVVPASAPHPRPLQEVEMETYGAGTPLGSVASVHSPPAESCASLQIPLAVSSPARSGEPLHALSPRGTARSPSLSRLLCRQEAVHTDSLEGKIDSPRDTLDPAEPGEKTPVRPVTQGGSLQSPPRSPRPASVRTRKHTFGQRCVSSRPAAPGGEEAEASDPADEEVSHITSSACPWQPTAEPHGPEASPVAGGERDLRRLYSVDAQGFLDKPGRADEQWRPSAELGSGEPGEAKAWGPEAEPALGARRKKKMSPPCISVEPPAEDEGSARPSAAEGGSTTLRRRTPSCEATPHRDSLEPTEG
SGAGGDPAAKGERWGQASCRAEHLTVPSFAFEPLDLGVPSGDPFLDGSHSVTPESRASSSGAIVPLEPPESEPPMPVGDPPEKRRGLYLTVPQCPLEKPGSPSATPAPGGGADDPV
1DQY , Knot 125 283 0.83 40 188 275
MFSRPGLPVEYLQVPSASMGRDIKVQFQGGGPHAVYLLDGLRAQDDYNGWDINTPAFEEYYQSGLSVIMPVGGQSSFYTDWYQPSQSNGQNYTYKWETFLTREMPAWLQANKGVSPTGNAAVGLSMSGGSALILAAYYPQQFPYAASLSGFLNPSESWWPTLIGLAMNDSGGYNANSMWGPSSDPAWKRNDPMVQIPRLVANNTRIWVYCGNGTPSDLGGDNIPAKFLEGLTLRTNQTFRDTYAADGGRNGVFNFPPNGTHSWPYWNEQLVAMKADIQHVLNG
3CQU , Knot 153 342 0.87 40 212 335
GAMDPRVTMNEFEYLKLLGKGTFGKVILVKEKATGRYYAMKILKKEVIVAKDEVAHTLTENRVLQNSRHPFLTALKYSFQTHDRLCFVMEYANGGELFFHLSRERVFSEDRARFYGAEIVSALDYLHSEKNVVYRDLKLENLMLDKDGHIKITDFGLCKEGIKDGATMKTFCGTPEYLAPEVLEDNDYGRAVDWWGLGVVMYEMMCGRLPFYNQDHEKLFELILMEEIRFPRTLGPEAKSLLSGLLKKDPKQRLGGGSEDAKEIMQHRFFAGIVWQHVYEKKLSPPFKPQVTSETDTRYFDEEFTAQMITITPPDQDDSMECVDSERRPHFPQFDYSASSTA
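
As a rough illustration of the two LZ columns above, the sketch below counts phrases in an LZ76-style exhaustive-history parsing and then applies the normalization \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\). The page does not state which LZ variant it uses, so `lz76_complexity` is an assumption and may not reproduce the tabulated LZ values exactly; `normalized_lz` only restates the formula in the table header. Both function names are hypothetical.

```python
import math


def lz76_complexity(w):
    """Count phrases in an LZ76-style parsing of w (assumed variant, see lead-in)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it already occurs starting at an earlier position
        # (the earlier occurrence may overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_alphabet_size(n)), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # Normalization applied to the tabulated (LZ, length) pairs.
    for lz, n in [(697, 2116), (125, 283), (153, 342)]:
        print(round(normalized_lz(lz, n), 2))
```

This direct implementation is slow on long sequences because it rescans the prefix for every extension; it is meant only to make the definition concrete.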

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
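
A minimal sketch of these definitions, with the hypothetical names `subwords` for \(P_w(k)\) and `subword_complexity` for \(p_w(k)\):

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Return the number of distinct subwords of length k in w."""
    return len(subwords(w, k))


if __name__ == "__main__":
    word = "abab"
    for k in (1, 2, 3):
        print(k, sorted(subwords(word, k)), subword_complexity(word, k))
```

For a word over 20 letters there are at most \(20^k\) possible subwords of length \(k\), and at most \(|w|-k+1\) positions where one can start.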

\(|P_{f(9AYJ_1)}(2) \setminus P_{f(1DQY_1)}(2)|=194\), \(|P_{f(1DQY_1)}(2) \setminus P_{f(9AYJ_1)}(2)|=15\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be a LZ76 style (set of subwords) Jaccard distance numerator for \(P(k)\).Hydrophobic-polar version of Sequence 1:1101001010011110110111010010100110000000010100110110010111111111111111101001111100100100111010001110011011100000110111110111011000010010101100111001011111100101111010001001000001101101111111110111011111111000011001001011111111100010100101011001011011011001101011101110011111011110111111111111011111100001100111000010110100000010001110000000110000011100010110011101000101011111001010100000100010001001110100110111111011010111011001101001001101111111101111010111110010000000001100001001000001101001100000110011011001000010101010001000101111101110101010011000010011111111001011100000100100110100111001111011101110111101001001101111110110111010111011001011011011011111000111110010011010011111111101111011100101000010011000010011111101101100001011100111000011110111110110011101111111011010101000000000001010001001001000010100111010101010101011111001101110100011101110110000100001011110001110100010111110111000000100110110100010010000110101010000010010111110101100100101011011111100000001011111001110100000011010000000001010011010010100000111101101000101000011000110011111111001011100101011000011101000110111110111011111110100101000101101111110110111111011110111110110110010110110011110111001100101110111100111111111110110101000011000010001000110001100000100110111011110000111011001101111000110000111110110111110111101111111001000000001001000000010010000000110101000100100010000100100000101110111010110101000001001001100000110111110111011111100110001001011111101111010010101111101011011011011011011011011011100110111011011111111111011111011101000000100110001010011111101101001001011100010000000000100111101101101111101111011111110010000001000101010101011011101001010011110001110011011100101001101100001101111101101
011001010001110111011010011100010101111100110010110110101010010100110000110000101010010001011011000110110011010011001011010000001100010001111110010100110001001000101101010101101011111000100100101011100110100010101011010110101111010111110000010110101011100010101011011000100001000101000010100101111011101001101000100101101110110111101011101000101000100011111101100011111101100001101011001100110101011111110011
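
The hydrophobic-polar string above can be approximated with a sketch like the one below. The page does not list which residues it treats as hydrophobic, so the set `HYDROPHOBIC` is an assumption: A, F, G, L, M, P, V and W are consistent with the first 60 symbols of the string, while C and I are guesses based on a common convention. The names `HYDROPHOBIC` and `to_hp` are illustrative.

```python
# Assumed hydrophobic residues (encoded as 1); all other letters are encoded as 0.
# A, F, G, L, M, P, V, W agree with the start of the string above; C and I are guesses.
HYDROPHOBIC = set("ACFGILMPVW")


def to_hp(sequence, hydrophobic=HYDROPHOBIC):
    """Encode a protein sequence as a binary hydrophobic (1) / polar (0) string."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)


if __name__ == "__main__":
    print(to_hp("MASWSHPQFEK"))
```

With this assumed set, the first eleven residues MASWSHPQFEK of 9AYJ_1 map to 11010010100, which matches the start of the string above.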
Pair \(Z_2\) Length of longest common subword
9AYJ_1,1DQY_1 209 5
9AYJ_1,3CQU_1 173 5
1DQY_1,3CQU_1 190 3
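
The quantity \(Z_k(x,y)\) in the table is the size of the symmetric difference of the two subword sets. A minimal sketch, using the hypothetical names `subwords` and `z_distance`:

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Return |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


if __name__ == "__main__":
    print(z_distance("abab", "abba", 2))
```

For the pair 9AYJ_1, 1DQY_1 the two one-sided counts reported earlier, 194 and 15, add up to the tabulated \(Z_2=209\).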

Newick tree

 
(
	1DQY_1:10.93,
	(
		9AYJ_1:86.5,3CQU_1:86.5
	):17.43
);

Let \(d\) be the Otu--Sayood distance \(d\).
Let \(d_1\) be the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{2399}{\log_{20} 2399}-\frac{283}{\log_{20}283})\approx 525\), where \(2399=2116+283\) is the combined length of the two sequences.
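
The arithmetic in the expected-distance estimate can be checked with a few lines. The helper name `length_over_log` is hypothetical, and the factors 0.85 and 0.8 are taken from the formula as given, without further interpretation.

```python
import math


def length_over_log(n, alphabet_size=20):
    """Return n / log_alphabet_size(n)."""
    return n / math.log(n, alphabet_size)


# Constants copied from the formula in the text.
estimate = 0.85 * 0.8 * (length_over_log(2399) - length_over_log(283))

if __name__ == "__main__":
    print(estimate)
```

The result lies between 525 and 526, in line with the value 525 quoted in the text.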
Status Protein1 Protein2 d d1/2
Query variables 9AYJ_1 1DQY_1 662 375.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
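
A small sketch of this interpolation, assuming that "min" and "max" denote the smaller and larger of the two quantities being combined. The function name `delta` and its parameter names are illustrative. The value 89 used below is not reported on the page; it is what the formula implies for the smaller quantity given the tabulated \(d=662\) and \(d_1/2=375.5\), since \(2\cdot 375.5-662=89\).

```python
def delta(alpha, smaller, larger):
    """Return alpha * min + (1 - alpha) * max for the given pair of values."""
    return alpha * smaller + (1 - alpha) * larger


if __name__ == "__main__":
    # 662 is the tabulated d; 89 is derived from d and d1/2 (an inferred value).
    print(delta(0, 89, 662))
    print(delta(0.5, 89, 662))
```

With these inputs, \(\alpha=0\) recovers \(d\) and \(\alpha=1/2\) recovers \(d_1/2\), matching the query row for 9AYJ_1 and 1DQY_1.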