CoV2D Browser™

Parikh vectors
7KWO_1 5PSI_1 7SWW_1 Letter Amino acid
73 4 22 N Asparagine
28 1 6 C Cysteine
140 7 17 G Glycine
146 18 28 L Leucine
87 6 23 F Phenylalanine
214 7 22 S Serine
102 11 13 A Alanine
76 14 11 R Arginine
93 13 14 D Aspartic acid
97 6 15 K Lysine
74 8 10 Q Glutamine
148 13 10 E Glutamic acid
56 10 6 H Histidine
79 5 13 I Isoleucine
32 0 4 W Tryptophan
77 5 15 Y Tyrosine
110 10 20 V Valine
46 7 2 M Methionine
141 6 15 P Proline
146 5 25 T Threonine
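The Parikh vector of a sequence is its vector of letter counts. A minimal sketch follows; it is not the site's own code, and the function name `parikh_vector` and the alphabet ordering are illustrative assumptions. The example word is the first 23 residues of the 5PSI_1 sequence.

```python
from collections import Counter

# The 20 standard amino-acid letters; the ordering is an arbitrary choice here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(word, alphabet=AMINO_ACIDS):
    """Return {letter: number of occurrences in word} for every letter of alphabet."""
    counts = Counter(word)
    return {a: counts.get(a, 0) for a in alphabet}

prefix = "MHHHHHHSSGVDLGTENLYFQSM"  # first 23 residues of 5PSI_1
vec = parikh_vector(prefix)
print(vec["H"], vec["S"], vec["W"])
```

The entries of a Parikh vector sum to the length of the word; in the table above the 5PSI_1 column sums to 156.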

>7KWO_1|Chain A|Coagulation factor FVIII-Fc-XTEN|Homo sapiens (9606)
>5PSI_1|Chains A, B|Bromodomain-containing protein 1|Homo sapiens (9606)
>7SWW_1|Chain A|Spike protein S1|Severe acute respiratory syndrome coronavirus 2 (2697049)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7KWO , Knot 603 1965 0.77 40 365 1455
MQIELSTCFFLCLLRFCFSATRRYYLGAVELSWDYMQSDLGELPVDARFPPRVPKSFPFNTSVVYKKTLFVEFTDHLFNIAKPRPPWMGLLGPTIQAEVYDTVVITLKNMASHPVSLHAVGVSYWKASEGAEYDDQTSQREKEDDKVFPGGSHTYVWQVLKENGPMASDPLCLTYSYLSHVDLVKDLNSGLIGALLVCREGSLAKEKTQTLHKFILLFAVFDEGKSWHSETKNSLMQDRDAASARAWPKMHTVNGYVNRSLPGLIGCHRKSVYWHVIGMGTTPEVHSIFLEGHTFLVRNHRQASLEISPITFLTAQTLLMDLGQFLLFCHISSHQHDGMEAYVKVDSCPEEPQLRMKNNEEAEDYDDDLTDSEMDVVRFDDDNSPSFIQIRSVAKKHPKTWVHYIAAEEEDWDYAPLVLAPDDRSYKSQYLNNGPQRIGRKYKKVRFMAYTDETFKTREAIQHESGILGPLLYGEVGDTLLIIFKNQASRPYNIYPHGITDVRPLYSRRLPKGVKHLKDFPILPGEIFKYKWTVTVEDGPTKSDPRCLTRYYSSFVNMERDLASGLIGPLLICYKESVDQRGNQIMSDKRNVILFSVFDENRSWYLTENIQRFLPNPAGVQLEDPEFQASNIMHSINGYVFDSLQLSVCLHEVAYWYILSIGAQTDFLSVFFSGYTFKHKMVYEDTLTLFPFSGETVFMSMENPGLWILGCHNSDFRNRGMTALLKVSSCDKNTGDYYEDSYEDISAYLLSKNNAIEPRSFSQNGTSESATPESGPGSEPATSGSETPGTSESATPESGPGSEPATSGSETPGTSESATPESGPGTSTEPSEGSAPGSPAGSPTSTEEGTSESATPESGPGSEPATSGSETPGTSESATPESGPGSPAGSPTSTEEGSPAGSPTSTEEGTSTEPSEGSAPGTSESATPESGPGTSESATPESGPGTSESATPESGPGSEPATSGSETPGSEPATSGSETPGSPAGSPTSTEEGTSTEPSEGSAPGTSTEPSEGSAPGSEPATSGSETPGTSESATPESGPGTSTEPSEGSAPASSEITRTTLQSDQEEIDYDDTISVEMKKEDFDIYDEDENQSPRSFQKKTRHYFIAAVERLWDYGMSSSPHVLRNRAQSGSVPQFKKVVFQEFTDGSFTQPLYRGELNEHLGLLGPYIRAEVEDNIMVTFRNQASRPYSFYSSLISYEEDQRQGAEPRKNFVKPNETKTYFWKVQHHMAPTKDEFDCKAWAYFSDVDLEKDVHSGLIGPLLVCHTNTLNPAHGRQVTVQEFALFFTIFDETKSWYFTENMERNCRAPCNIQMEDPTFKENYRFHAINGYIMDTLPGLVMAQDQRIRWYLLSMGSNENIHSIHFSGHVFTVRKKEEYKMALYNLYPGVFETVEMLPSKAGIWRVECLIGEHLHAGMSTLFLVYSNKCQTPLGMASGHIRDFQITASGQYGQWAPKLARLHYSGSINAWSTKEPFSWIKVDLLAPMIIHGIKTQGARQKFSSLYISQFIIMYSLDGKKWQTYRGNSTGTLMVFFGNVDSSGIKHNIFNPPIIARYIRLHPTHYSIRSTLRMELMGCDLNSCSMPLGMESKAISDAQITASSYFTNMFATWSPSKARLHLQGRSNAWRPQVNNPKEWLQVDFQKTMKVTGVTTQGVKSLLTSMYVKEFLISSSQDGHQWTLFFQNGKVKVFQGNQDSFTPVVNSLDPPLLTRYLRIHPQSWVHQIALRMEVLGCEAQDLYDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHYTQKSLSLSPG
5PSI , Knot 74 156 0.79 38 120 151
MHHHHHHSSGVDLGTENLYFQSMEQVAMELRLTELTRLLRSVLDQLQDKDPARIFAQPVSLKEVPDYLDHIKHPMDFATMRKRLEAQGYKNLHEFEEDFDLIIDNCMKYNARDTVFYRAAVRLRDQGGVVLRQARREVDSIGLEEASGMHLPERPA
7SWW , Knot 129 291 0.83 40 189 281
QCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLK
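The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below assumes the LZ76 (Lempel and Ziv, 1976) exhaustive-history parsing; the page does not state which variant it uses, so the phrase counts it produces need not match the table. The names `lz76` and `normalized_lz` are illustrative.

```python
import math

def lz76(s):
    """Number of phrases in an LZ76-style parsing of s.

    Each phrase is the shortest prefix of the remaining text that does not
    occur earlier (overlap with the phrase itself is allowed); a final
    incomplete phrase still counts as one.
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(complexity, n, alphabet_size=20):
    """LZ(w) / (n / log_b n), with b the alphabet size."""
    return complexity / (n / math.log(n, alphabet_size))

print(lz76("0001101001000101"))   # textbook example: 0, 001, 10, 100, 1000, 101
print(normalized_lz(74, 156))     # values from the 5PSI row of the table
```

With the table's values for 5PSI, the ratio is about 0.7996, which the table lists as 0.79; this suggests the displayed ratios are truncated rather than rounded.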

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
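As an illustration of these definitions, the following sketch collects the distinct subwords of a given length and counts them. The function names are illustrative assumptions, and the example word is the first 10 residues of the 5PSI_1 sequence.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous factors) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

w = "MHHHHHHSSG"  # first 10 residues of 5PSI_1
print(sorted(subwords(w, 2)))
print([subword_complexity(w, k) for k in (1, 2, 3)])
```

A word of length \(|w|\) has at most \(|w|-k+1\) subwords of length \(k\); for 5PSI_1 (length 156) the table's value 151 for length 3 is within the bound 154.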

\(|P_{f(7KWO_1)}(2) \setminus P_{f(5PSI_1)}(2)|=248\), \(|P_{f(5PSI_1)}(2) \setminus P_{f(7KWO_1)}(2)|=3\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\).
Hydrophobic-polar version of Sequence 1:
10101000111011010101000001111010100100011011101011101100111000110000111010001101101011111111101010100011101001100110101111001010011000000000000000111110000110110001111001101000010010110010011111111000101100000010011111111001001000000011000011010111010010101000111111000001010111110010100111010011100000101010110110100111011011110010000001101010100010010101000001000000100001011010000010110100110001001100111000010011111110000000001001100110000010111000001000011000011111110101100111110001001001010110010110000110110010011111101100010101001100001001000000110100011011111111000001000100110000011110110000010100010011101111010010101001100101011001010101001101011011100011011101001000110000101111010011101001111111000001000110111010000000100000000010101100001101001000100001010011100110010001100001010011100110010001100001010011100001001011101110100000100001010011100110010001100001010011101110100000101110100000100001001011100001010011100001010011100001010011100110010001100110010001101110100000100001001011100001001011100110010001100001010011100001001011100010000100000010000010101000010100000000100100000001111100110011000101100010010110100111001001010011001010001111110101010001110100010010010001100000000110100011010000001101000111000010001110100101000100111111110000010110100101001111101100000101000100000110010100101000001011010110011111110000101011011000010010101011010000000111001011110010111001111010011100101110011110000000111110101001010101001011101101000101011000011011010111111101100011000100101001111001010010000100010111111010001100011011111001010100001000101011100100001111100011001010100010011101010010101010001101010010011010100010101100011001100101001110000010010111001010110100001011100101111
0001010100110011101011100100100000001101110111110111111010001110001010011101000010101010101101001000100000000001101101100011010000001000111111000100101010010100111000010000101001101101001110100010100000001111000101110001010000100101100011001100000000101011
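The opening of the hydrophobic-polar string can be reproduced by the sketch below. The hydrophobic set `AFGILMPVW` is an assumption, inferred by comparing the first 80 residues of the 7KWO sequence with the first 80 binary digits above; the page does not state the classification it uses.

```python
# ASSUMPTION: residues treated as hydrophobic (1); all other letters are polar (0).
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

print(hp_encode("MQIELSTCFFLCLLRFCFSA"))  # first 20 residues of 7KWO
```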
Pair \(Z_2\) Length of longest common subword
7KWO_1,5PSI_1 251 4
7KWO_1,7SWW_1 198 4
5PSI_1,7SWW_1 169 3
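The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets, and dividing it by the size of their union gives a Jaccard distance. A minimal sketch on toy words; the function names are illustrative assumptions, not the site's code.

```python
def subwords(w, k):
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Size of the symmetric difference of the length-k subword sets of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """z_distance divided by the size of the union (0.0 if both sets are empty)."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0

print(z_distance("abab", "abba", 2))
print(jaccard_distance("abab", "abba", 2))
```

For the first pair in the table, the two one-sided counts 248 and 3 add up to \(Z_2 = 251\).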

Newick tree

 
(7KWO_1:12.05,(7SWW_1:84.5,5PSI_1:84.5):36.55);

Let \(d\) and \(d_1\) denote the Otu–Sayood distances of the same names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{2121}{\log_{20} 2121}-\frac{156}{\log_{20}156}\right)\approx 501\), where \(2121=1965+156\) is the combined length of the two sequences.
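The arithmetic of this estimate can be checked with a few lines. The factors 0.85 and 0.8 are taken from the formula as given; the helper name `lz_scale` is an illustrative assumption.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_b n, the length scale used to normalize LZ complexity."""
    return n / math.log(n, alphabet_size)

n_long, n_short = 1965, 156      # lengths of 7KWO_1 and 5PSI_1 from the table
n_concat = n_long + n_short      # 2121, the length that appears in the formula
estimate = 0.85 * 0.8 * (lz_scale(n_concat) - lz_scale(n_short))
print(round(estimate))
```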
Status Protein1 Protein2 d d1/2
Query variables 7KWO_1 5PSI_1 571 308.5
Was not able to put for d
Was not able to put for d1
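For orientation, here is a hedged sketch of how \(d\) and \(d_1\) can be computed. It assumes the unnormalized Otu–Sayood definitions \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\), with \(c\) an LZ76-style phrase count; the page does not confirm its exact variant, and the function names are illustrative.

```python
def lz76(s):
    """Number of phrases in an LZ76-style (exhaustive-history) parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(s, q, c=lz76):
    """Return (d, d1) for sequences s and q under the complexity measure c."""
    a = c(s + q) - c(s)   # extra phrases needed for q once s is known
    b = c(q + s) - c(q)   # extra phrases needed for s once q is known
    return max(a, b), a + b

d, d1 = otu_sayood("AAAAAA", "ACGT")
print(d, d1 / 2)
```

Under these definitions a low-complexity word such as AAAAAA adds few phrases in either direction, so its distances to other sequences stay small.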

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
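The interpolation can be checked against the query row. From \(d=571\) and \(d_1/2=308.5\) the formula gives \(\max=571\) and \(\min=2(308.5)-571=46\); the function name `delta` and its argument names are illustrative assumptions.

```python
def delta(alpha, first, second):
    """alpha * min + (1 - alpha) * max of the two one-sided differences."""
    return alpha * min(first, second) + (1 - alpha) * max(first, second)

# Values derived from the query row: max = d = 571, min = 2 * 308.5 - 571 = 46.
print(delta(0, 46, 571))
print(delta(0.5, 46, 571))
```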