CoV2D BrowserTM
Browse covid-19 proteins
6M3M
6VXX
7BWJ
6X6P
6VYB
6WZO
7C01
6VW1
6WPT
6WPS
6LXT
6WZQ
6XDC
7BYR
6W41
6WJI
6WKP
6YOR
6M0J
6Z97
6XC2
6XC4
6XC3
6XKL
6LZG
6XC7
7BV2
6WTT
7C2L
6WS6
6YM0
6M71
7C22
6YUN
6Y2G
6Y2E
6WIQ
6WQD
6W37
6ZCZ
6ZDH
6ZER
6YLA
6Y2F
7BW4
6WTC
6WX4
6ZGG
6M17
6M2N
6M2Q
6XDG
6X2A
6X2C
6X2B
6X29
6WUU
6W4H
6W75
6VWW
6WKQ
6WQF
6YZ5
6ZGE
6ZGI
6Z4U
6ZCO
6YI3
7C8V
7C8W
7CAN
6Z43
6ZFO
6XCN
6XCM
6XE1
7BQ7
5RGG
5RGI
5RGH
5RG3
5RG2
5RG1
5RGS
5RGR
5RGK
5RGJ
5RGM
5RGL
5RGO
5RGN
5RGQ
5RGP
6XCA
6W01
7BV1
6YYT
6M0K
6WKS
6LZE
6W6Y
7BUY
5R8T
5REA
5REC
5REB
5REE
5RED
5REG
5REF
5RE9
5RE8
5RE5
5RE4
5RE7
5RE6
5RFB
5RFA
5RFD
5RFC
5RFF
5RFE
5RFH
5RFG
5REY
5REX
5RF9
5REZ
5RF2
5REP
5RF1
5RES
5RF4
5RER
5RF3
5REU
5RF6
5RET
5RF5
5REW
5RF8
5REV
5RF7
5REI
5REH
5REK
5REJ
5REM
5REL
5REO
5RF0
5REN
5RFZ
5RFY
5RFR
5RFQ
5RFT
5RFS
5RFV
5RFU
5RFX
5RFW
5RFJ
5RFI
5RFL
5RFK
5RFN
5RFM
5RFP
5RFO
5RG0
5RHD
5RHC
5RGZ
5RHB
5RHA
5RH4
5RH3
5RGU
5RH6
5RGT
5RH5
5RGW
5RH8
5RGV
5RH7
5RGY
5RGX
5RH9
5RH0
5RH2
5RH1
6WNP
6Z2M
6YZ7
6YNQ
6YT8
6YVF
6X2G
6ZGH
6YHU
6WEY
6W7Y
6XOA
6Z2E
6YWM
6YWL
6YWK
6YZ1
5RHF
5RHE
6W9Q
6W9C
6WOJ
6WRH
6WTK
6WTM
6WTJ
6WXD
6WZU
6XB1
6XB0
6XB2
6XAA
6XDH
6XG3
6XIP
6XKF
6XKH
6M5I
6ZGF
7C2K
7BZF
6XKM
6M1D
6VSB
6M18
6YZ6
6XCH
6W02
6XMK
6W63
7C2I
7C2J
7C8T
7C8R
5R83
5R80
5R7Z
6VXS
6WCF
6WEN
6X1B
6X4I
6WXC
6XBI
6XBH
6XBG
6XFN
6XHU
6XA9
6W61
7BTF
6WJT
6WLC
6WQ3
6WRZ
6WVN
6LU7
6VYO
6YVA
6XA4
6ZCT
5R84
5R7Y
5R82
5R81
6YB7
6WAQ
6WAR
7BQY
6LVN
6Y84
6W4B
7BRR
7BRO
7BRP
6XHM
6M1V
7BZ5
6Y7M
7C8U
6M03
3R24
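Parikh vectors

| 5YEW_1 | 6WYV_1 | 9FRA_1 | Letter | Amino acid |
|---|---|---|---|---|
| 17 | 0 | 0 | T | Threonine |
| 32 | 0 | 0 | V | Valine |
| 17 | 0 | 0 | R | Arginine |
| 22 | 0 | 4 | N | Asparagine |
| 20 | 912 | 524 | G | Glycine |
| 33 | 762 | 320 | A | Alanine |
| 39 | 0 | 0 | E | Glutamic acid |
| 42 | 0 | 0 | L | Leucine |
| 34 | 0 | 0 | K | Lysine |
| 11 | 0 | 0 | M | Methionine |
| 33 | 0 | 0 | S | Serine |
| 4 | 0 | 0 | W | Tryptophan |
| 24 | 0 | 0 | D | Aspartic acid |
| 6 | 639 | 416 | C | Cysteine |
| 20 | 0 | 0 | I | Isoleucine |
| 23 | 0 | 0 | F | Phenylalanine |
| 11 | 0 | 0 | P | Proline |
| 4 | 0 | 0 | Y | Tyrosine |
| 17 | 0 | 0 | Q | Glutamine |
| 12 | 0 | 0 | H | Histidine |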
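A Parikh vector simply counts how often each letter of a fixed alphabet occurs in a word. The following is a minimal sketch, not the site's own code; the function name `parikh_vector` and the constant `ALPHABET` are hypothetical. It assumes the 20 one-letter amino-acid codes in the row order of the table above, and it ignores every other symbol, which would explain why the two RNA entries show nonzero counts only for G, A, C and N (U is not an amino-acid letter).

```python
from collections import Counter

# Assumption: the 20 one-letter amino-acid codes, in the row order of the table.
ALPHABET = "TVRNGAELKMSWDCIFPYQH"

def parikh_vector(sequence: str, alphabet: str = ALPHABET) -> dict[str, int]:
    """Count occurrences of each alphabet letter; symbols outside the alphabet are ignored."""
    counts = Counter(sequence)
    return {letter: counts[letter] for letter in alphabet}

# Toy input: the first 14 symbols of the 6WYV_1 sequence, spaces removed.
example = parikh_vector("GGUUAAGCGACUAA")
print(example["G"], example["A"], example["C"])
```

Because `U` is outside the assumed alphabet, it is dropped from the result rather than reported as zero.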
>5YEW_1|Chain A|Mitofusin-1,Mitofusin-1 fusion protein|Homo sapiens (9606)
>6WYV_1|Chain A[auth I]|23S ribosomal RNA|Escherichia coli (562)
>9FRA_1|Chain A[auth 2]|rRNA 16S|Saccharolobus solfataricus P2 (273057)
For each protein code \(c\), the entries below give the LZ-complexity \(\mathrm{LZ}(w)\), the length \(n=|w|\), the ratio \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\), the counts \(p_w(1)\), \(p_w(2)\), \(p_w(3)\), and the sequence \(w=f(c)\).
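5YEW, Knot: \(\mathrm{LZ}(w)=175\), \(n=421\), \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}=0.83\), \(p_w(1)=40\), \(p_w(2)=217\), \(p_w(3)=395\). Sequence: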
MA EPVSPLKHFVLA KKAITAIFDQLLEFVTEGSHFVEA TYKNPELDRIA TEDDLVEMQGYKDKLSIIG EVLSRRHMKVAFFGRTSSGKSSVINA MLWDKVLPSGIGHITNCFLSVEG TDGDKAYLMTEGSDEKKSVKVNNQLA HALHMDKDLKA GC LVRVFWPKA KCA LLRDDLVLVDSPGTDVTTELDSWIDKFCLDA DVFVLVANSESTLMNTEKHFFHKVNERLSKPNIFILNNRWDA SASEPEYMEDVRRQHMERCLHFLVEELKVVNALEAQNRIFFVSA KEVLSARKQKAQG MPESG VALAEG FHA RLQEFQNFEQIFEECISQSA VKTKFEQHTIRA KQILATVKNIMDSVNLAA EDGSG SGSGG SEIARLPKEIDQLEKIQNNSKLLRNKAVQLENELENFTKQFLPSSNEES
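6WYV, Knot: \(\mathrm{LZ}(w)=489\), \(n=2904\), \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}=0.44\), \(p_w(1)=10\), \(p_w(2)=18\), \(p_w(3)=67\). Sequence: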
GGUU AAG CGACUAA GCG UACACGGUG GAUG C CCU GG CA G U CAG AGGCGA UGA AG G ACG U GCU A AUC U G C GAUAAGCG UCGG UAAG G UG AUAUG A ACCG U UA UA ACCG G C GA UUUCCG AAU G GG G A AACCCA GUGUGUUUCG A CA C ACUAUCAUUAACUGAA U CCA UAGG UUAAUGAG G CGAACCGGGGGAACUGA A ACA U C UAAG UACCCCGAGGAAAAGAAA UC AAC CGA G AUU CCCCC A G UAG C GGCGAGCG AA CGGG G AG CAG CCCAG AGC C UGAA UCAGU GU GUGUG UU A G UG G AAG CG UCUG GAAAGG C GC GC GAU ACAG GGUGACA GC CCCG UAC ACA AAAAU GC ACAUGCUGUGA G CUCG AUGA G UAGGGC GGGAC AC G UGGUAUCCUG UCUG AAUAUGGG GG GAC CAUC CU CCAAG GCUAAAU AC UC CUGACUGA CCG AUAG UG AA CCA GUACCG UG AGGGAAAGG CGAAAA G AACC C CGG CGA GGGGAGUGAAAAAG A ACCUGAAACCGU GU ACGU ACAAGCA GUG GG AGC ACGCUUAG GC GUGUGA CU GCGU AC CU UUUG UAUAAUG GGUCA GC G ACUUA UAUUCUG U AGCA AGG U UAACC GA AUAG G GG AG CCGAAGGGAAACCGAGUC UUAA CU GG G CG UUA A GU UGCAGG GU AU AG ACCCG A AACC C GG U G AUCUA GCC AUGGG CAGGUUGAAGG UUGG GUAACACUAACUG GA GG AC CG AAC CG ACUA A UG UUGAA AA AUUAGC GG AUG AC UUG UGGCU GGGGGUGAAA G G C CA AUCAAACC GGG AG AU AGCUG GUUCU CCCCGAAAG C UAUUUA G GUAGCGCCUC GUGAAUUCAUCUCCGGGG G U AG AGCACUGUUUCGGCA A GG G GGU CAUC CCG ACUUACC AA C CCG AUG CAAA CUGCG AAUACC GGAGAAUG UUA UC AC GGG AGA CACACGGCG G G UGCUAACGUCCGU CGUG AAGA G G GAA A C AA C C CAG AC CG CC AGCUAAGG UCCCAAAG UCAUG GUUAAG UGGG AAACGAU GUGG GA A GG CC C AG ACA GCCAGG AUGUUG GCUUA G A AGC AGCC AUC AUUU AAAGA AAGCG UAAU AGCUCAC UG G UC GA GUC GGCCUG CGC GG AAG AUG U A A CGGG G CUAA ACCA UG CACC GA AG CUGCG GC A G C GACGC UUAUGCGUUGU UG G GUAGGGG AG CG U U CUGUAAG C CUG CGAA G GUG UGCU GUG A GGC AUGCUG G AG GUAUCA GAAGUGCGA A UGC U G AC AUA AGU AACGAUAA AG CGGG UGA AAAG C C C GC UCG CCGG A AG A CCAAGGGUU CCUGUC CAA CGUUAA UCGGG GCAGG GUGA GUCGAC CCC U AAGGCGA GGC CGAAAGGCGUAGU C GA UGGG A AACAG GUUAAUAU UCCUGUAC U U G GUGU U ACUGCG AAG G GGG GAC GGAGAAGGCUAUGUUG G CCGG GCG AC G GUUGUCCCGGUUUAAGCG UG UA GGCUGG UUU UCCAGGC 
AAAUCCGGAAAAUCAAGGCUGAGGCGUGAUGACGAGGCACUACGGUGCUGAAGCAACAAAUGCCCUGCUUCCAGGAAAAGCCUCUAAGCAUCAGGUAACAUCAAAUCGUACCCCAAACCGACACAGGUGGUCAGGUAGAGAAUACCAAGGCGCUUGAGAGAACUCGGGUGAAGGAACUAGGCAAAAUGGUGCCGUAACUUCGGGAGAAGGCACGCUGAUAUGUAGGUGAGGUCCCUCGCGGAUGGAGCUGAAAUCAGUCGAAGAUACCAGCUGGCUGCAACUGUUUAUUAAAAACACAGCACUGUGCAAACACGAAAGUGGACGUAUACGGUGUGACGCCUGCCCGGUGCCGGAAGGUUAAUUGAUGGGGUUAGCGCAAGCGAAGCUCUUGAUCGAAGCCCCGGUAAACGGCGGCCGUAACXAUAACGGUCCUAAGGUAGCGAAAUUCCUUGUCGGGUAAGUUCCGACCUGCACGAAUGGCGUAAUGAUGGCCAGGCUGUCUCCACCCGAGACUCAGUGAAAUUGAACUCGCUGUGAAGAUGCAGUGUACCCGCGGCAAGACGGAAAGACCCCGUGAACCUUUACUAUAGCUUGACACUGAACAUUGAGCCUUGAUGUGUAGGAUAGGUGGGAGGCUUUGAAGUGUGGACGCCAGUCUGCAUGGAGCCGACCUUGAAAUACCACCCUUUAAUGUUUGAUGUUCUAACGUUGACCCGUAAUCCGGGUUGCGGACAGUGUCUGGUGGGUAGUUUGACUGGGGCGGUCUCCUCCUAAAGAGUAACGGAGGAGCACGAAGGUUGGCUAAUCCUGGUCGGACAUCAGGAGGUUAGUGCAAUGGCAUAAGCCAGCUUGACUGCGAGCGUGACGGCGCGAGCAGGUGCGAAAGCAGGUCAUAGUGAUCCGGUGGUUCUGAAUGGAAGGGCCAUCGCUCAACGGAUAAAAGGUACUCCGGGGAUAACAGGCUGAUACCGCCCAAGAGUUCAUAUCGACGGCGGUGUUUGGCACCUCGAUGUCGGCUCAUCACAUCCUGGGGCUGAAGUAGGUCCCAAGGGUAUGGCUGUUCGCCAUUUAAAGUGGUACGCGAGCUGGGUUUAGAACGUCGUGAGACAGUUCGGUCCCUAUCUGCCGUGGGCGCUGGAGAACUGAGGGGGGCUGCUCCUAGUACGAGAGGACCGGAGUGGACGCAUCACUGGUGUUCGGGUUGUCAUGCCAAUGGCACUGCCCGGUAGCUAAAUGCGGAAGAGAUAAGUGCUGAAAGCAUCUAAGCACGAAACUUGCCCCGAGAUGAGUUCUCCCUGACCCUUUAAGGGUCCUGAAGGAACGUUGAAGACGACGACGUUGAUAGGCCGGGUGUGUAAGCGCAGCGAUGCGUUGAGCUAACCGGUACUAAUGAACCGUGAGGCUUAACCUU
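9FRA, Knot: \(\mathrm{LZ}(w)=268\), \(n=1497\), \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}=0.43\), \(p_w(1)=12\), \(p_w(2)=23\), \(p_w(3)=75\). Sequence: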
AA AU CCG GUUGAUCCUG CCGGACCCG ACCG C UAU CG GGG U GGG GCUAAGCCA UG G GAG U CGU A CGC U CC CGGGCAAG GGAG CGUG G CG GACGG CUGAG U AA CA CGUG G C UA ACCUACCCU G AG G A GGGAGA UAACCCCGGG A A A C UGGGGAUAA UC UCCCA U AGGCGA G GAGUCCUG G AACGGUUCCUCGCUGAA A GGCU C AUGG GCUAUUCCCCGCUCAUGA GC GCC UCA G GAU GGGGC UG CGG C CCAUCAGG UA GUUG G GG GGG UAAGG GCC C CCCA AGCCU A U AACGG GU A G GG G CCG UG AGAG CG GGA G C CC CC AGU UGGG CACUGAGAC AAGG GCC CAGGCCCU AC GGGGCGCACCA G GCGCGAAA CGUCCCC AAUGC GC G GAAGCGUGAG GGCG CCACCCCG AG UGC UCCC GU AAGGG AGCUUUU CC CC GCUCUACA AAG GCGG GG GA AUA AGCGGG GG GCAAGUCUG GUGUCA G CCGC C GCG GUA AUACCAGCCCCGCG A GUGGUCGGGACU CU UACU GGGCCUA AAG CG CCC GUAGCCGG CC CGACAA GU CACU CC UU AAAG ACCCCGG CUCAA CC G GGGGA AUGGGGG U GAUA CUG U CGGGC UA GGGG G CG GG AGAGGCCAGCGGUACUCC CGGA GU AG G GG CGA A AU CCUCAG AU CU CG GGAGG A CCAC C AG U G GCGAA AGC GGCUG GCUAGAACGCG CCCG ACGGUGAGGGGCG AA AG CC GG GGC AG CAAA A GG GAUUA GA UACCCC UG UAG UC CCG GCUGU AAACAAUGCA G G C UA GGUGUCAC AUG GG CU UAGAG CCCAU GUGGUGCCG C AGGGAA G CCGUUAAGCC UGCCGCCUGGGGAGUACG G U CG CAAGACUGAAACUUA A AG G AAU UGGC GGG GGAGCAC CA C AAG GGG UGGA ACCUG CGGCUC AAUUGGAG XCA AC GC CUG GAA UCUUACUAG G G GAGACCGCAGGAU GACG GCCA G G CUA A C GA C C UUG CC UG AC UCGCGGAG AGGAGGUG CAUGG CCGUCG CCAG CUCGUGU UGUG AA A UG UC C UG UUA AGUCAG GCAACG AGCGA G A CCC CCAC CAC UAGU UGGUA UCCUG GUCU CCGGGCC GG G AC CA CAC UAGUGG GAC UG CCG GCG U A A GCCG G AGGA GGGA GG GGGC CA CG GCAGG UC A G C AUGCC CCGAAACCCCU GG G CCGCACG CG GG U U ACAAUGG C AGG GACA G CGG GAUU CCG A CCC CGAAAG G GG AAGGUA AUCCCUUAA A CCC U G CC GCA GUU GGGAUCGA GG GCUG AAA CUCG C C C UC GUG AACG A GG A AUCCCUAGU AACCGC AGA UCAACA AUCUG CGGUG AAUA CGUCCC UGC U CCUUGCA CAC ACCGCCCGUCGCU C CA CCCG A GUAGG AGAGGGGU GAGGCCCC U U G CCUU U GGGUGG GGG G UCG AGC UUCUCUCCUGCAAGGG G GGAG AAG UC G UAACAAGGUAGNNGUAGG GG AA NNUGCG GCU GGAUCAC CUCC
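The LZ-complexity and its normalization by \(n/\log_{20} n\) can be illustrated as follows. The page does not say which Lempel-Ziv variant it uses, so this sketch assumes the LZ76 exhaustive-history parse, in which each phrase is extended until it no longer occurs starting at an earlier position. Its counts may therefore differ from the values listed above. The names `lz76_phrases` and `normalized_lz` are hypothetical.

```python
import math

def lz76_phrases(w: str) -> list[str]:
    """LZ76 exhaustive-history parse (assumed variant): each phrase is the shortest
    prefix of the unparsed text that does not occur starting at an earlier position."""
    phrases = []
    i, n = 0, len(w)
    while i < n:
        length = 1
        # Extend while the candidate already occurs with a start position before i.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases.append(w[i:i + length])  # slicing truncates the final phrase at the end of w
        i += length
    return phrases

def normalized_lz(complexity: int, n: int, alphabet_size: int = 20) -> float:
    """The ratio LZ(w) / (n / log_b n) with b = alphabet_size."""
    return complexity / (n / math.log(n, alphabet_size))

print(lz76_phrases("0001101001000101"))
print(normalized_lz(175, 421))
```

Applied to the listed values for 5YEW (complexity 175, length 421), `normalized_lz` gives roughly 0.84, in line with the 0.83 shown if that figure was truncated rather than rounded.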
Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\).
Let \(p_w(n)\) be the cardinality of \(P_w(n)\).
Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
\(|P_{f(5YEW_1)}(2) \setminus P_{f(6WYV_1)}(2)|=212\),
\(|P_{f(6WYV_1)}(2) \setminus P_{f(5YEW_1)}(2)|=13\).
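The two set differences above can be computed directly from the definition of \(P_w(n)\). The sketch below uses hypothetical helper names (`subwords`, `subword_complexity`) and, as toy inputs, only the first ten symbols of the 5YEW_1 and 6WYV_1 sequences with spaces removed, so its counts are not the 212 and 13 reported for the full sequences.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy inputs: ten-symbol prefixes of the two sequences, spaces removed.
x, y = "MAEPVSPLKH", "GGUUAAGCGA"
only_in_x = subwords(x, 2) - subwords(y, 2)
only_in_y = subwords(y, 2) - subwords(x, 2)
print(len(only_in_x), len(only_in_y))
```

For the full sequences, the two reported sizes add up to 212 + 13 = 225.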
Let
\(
Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)|
\)
be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).
Hydrophobic-polar version of Sequence 1: 1101101100111100110111001101100100110100001010011000011010100001011101100001011111000010001101111001110111010001101010010010110010000001010001101101000101101101111010011100011110011001000100110010101011111100000110000011001000100101111000101010010010010000100010111001011011010001111010011010000101110011111011010100100100110001000110001000010100111010011001011100101010110011011001001001000001100011010001001000111000000
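A hydrophobic-polar encoding replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not state which residues it treats as hydrophobic. The set used below is inferred by comparing the start of the 5YEW_1 sequence with the start of the binary string above (M, A, P, V, L, F, I, G map to 1, while E, S, K, H, T, D, Q, Y, N, R map to 0). Treating C and W as hydrophobic is an additional assumption following the common HP-model convention. The names `HYDROPHOBIC` and `hydrophobic_polar` are hypothetical.

```python
# Assumption: hydrophobic residues, partly inferred from the page's output;
# C and W are included by convention and were not checked against the page.
HYDROPHOBIC = set("ACGILMFPWV")

def hydrophobic_polar(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar); whitespace is skipped."""
    return "".join("1" if residue in HYDROPHOBIC else "0"
                   for residue in sequence if not residue.isspace())

# First 40 residues of the 5YEW_1 sequence as listed on the page.
prefix = "MA EPVSPLKHFVLA KKAITAIFDQLLEFVTEGSHFVEA TY"
print(hydrophobic_polar(prefix))
```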
| Pair | \(Z_2\) | Length of longest common subsequence |
|---|---|---|
| 5YEW_1,6WYV_1 | 225 | 3 |
| 5YEW_1,9FRA_1 | 226 | 3 |
| 6WYV_1,9FRA_1 | 9 | 11 |
Newick tree
[
5YEW_1:13.16,
[
6WYV_1:4.5,9FRA_1:4.5
]:12.66
]
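The tree above is written with square brackets, whereas the standard Newick format uses parentheses and a terminating semicolon. The following sketch shows one way to convert the page's notation so that ordinary phylogenetics tools can read it; `to_newick` and `leaf_lengths` are hypothetical helper names, and the regular expression assumes leaf labels consist only of letters, digits and underscores.

```python
import re

def to_newick(bracketed: str) -> str:
    """Convert the page's bracket notation into a standard one-line Newick string."""
    compact = "".join(bracketed.split())  # drop line breaks and spaces
    return compact.replace("[", "(").replace("]", ")") + ";"

def leaf_lengths(newick: str) -> dict[str, float]:
    """Extract the branch length attached to each named leaf."""
    pairs = re.findall(r"([A-Za-z0-9_]+):([0-9.]+)", newick)
    return {name: float(length) for name, length in pairs}

page_tree = """[
5YEW_1:13.16,
[
6WYV_1:4.5,9FRA_1:4.5
]:12.66
]"""
newick = to_newick(page_tree)
print(newick)
print(leaf_lengths(newick))
```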
Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{3325}{\log_{20}3325}-\frac{421}{\log_{20}421}\right)=693.\)
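The arithmetic in this estimate can be checked directly. In the sketch below, the factors 0.85 and 0.8 are taken from the formula as given, without any claim about their origin. The value 3325 equals 421 + 2904, the sum of the listed lengths of 5YEW_1 and 6WYV_1, which suggests it is the length of their concatenation. The function names are hypothetical.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """The quantity n / log_b(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

def rough_expected_distance(n_concat: int, n_single: int,
                            c1: float = 0.85, c2: float = 0.8) -> float:
    """Evaluate c1 * c2 * (n_concat / log_20 n_concat - n_single / log_20 n_single)."""
    return c1 * c2 * (lz_scale(n_concat) - lz_scale(n_single))

value = rough_expected_distance(3325, 421)
print(round(value))
```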
| Status | Protein1 | Protein2 | \(d\) | \(d_1/2\) |
|---|---|---|---|---|
| Query variables | 5YEW_1 | 6WYV_1 | 485 | 328 |
In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[
\delta=
\alpha \mathrm{min} + (1-\alpha) \mathrm{max}=
\begin{cases}
d &\alpha=0,\\
d_1/2 &\alpha=1/2
\end{cases}
\]
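The interpolation above can be sketched numerically. Here `a` and `b` stand for the two one-directional quantities whose maximum is \(d\) and whose sum is \(d_1\), which is what the two displayed cases imply. The function name `delta` and the value 171 are illustrative assumptions: 171 is not shown on the page but is inferred as \(2\cdot 328-485\) from the listed \(d=485\) and \(d_1/2=328\).

```python
def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Assumption: 485 is the larger directional quantity (d); 171 is inferred, not listed.
larger, smaller = 485, 171
print(delta(0, smaller, larger))    # the case alpha = 0, i.e. d
print(delta(0.5, smaller, larger))  # the case alpha = 1/2, i.e. d_1 / 2
```

Setting `alpha` to 1 would return the minimum alone, a case the displayed formula does not list.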