Hepatitis B virus reverse transcriptase sequence variant database for sequence analysis and mutation discovery
Introduction
Hepatitis B virus (HBV) infects more than 400 million people worldwide and is a leading cause of death from cirrhosis and hepatocellular carcinoma. Within the past 12 years, five nucleos(t)ide reverse transcriptase (RT) inhibitors (N(t)RTIs) have been licensed for HBV treatment: lamivudine (3TC), adefovir (ADV), entecavir (ETV), telbivudine (LdT), and tenofovir (TDF). Emtricitabine (FTC), which is structurally similar to 3TC, is also active against HBV and is frequently used to treat HBV because it is co-formulated with TDF (as Truvada) for HIV treatment. 3TC, FTC, and LdT are l-nucleoside analogs. ETV is a deoxyguanosine analog. ADV and TDF are acyclic nucleoside phosphonates (ANPs). HBV drug resistance is one of the obstacles to successful anti-HBV therapy. HBV RT is functionally and structurally similar to HIV-1 RT (Bartholomeusz et al., 2004; Das et al., 2001) and has an error rate similar to that of retroviral polymerases (Günther et al., 1999). The current HBV treatment guidelines recommend HBV genotypic resistance testing for patients who experience primary or secondary virological failure while receiving N(t)RTIs (Keeffe et al., 2008a; Keeffe et al., 2008b; Lok and McMahon, 2009; Lok et al., 2007). Although about 15 mutations at 10 positions are strongly associated with decreased HBV N(t)RTI susceptibility (Keeffe et al., 2008b; Lok et al., 2007), the relative frequencies of those drug-resistance mutations in N(t)RTI-treated and N(t)RTI-untreated individuals are not known. Moreover, the association of some less well-characterized possible drug-resistance mutations with N(t)RTI treatment is uncertain.
There are at least eight HBV genotypes that differ from one another by about 10% of their nucleotides (Kurbanov et al., 2010). Two HBV sequence databases allow users to download genotype-specific alignments of different HBV genes (Gnaneshan et al., 2007; Shin et al., 2008). However, these databases do not contain information on the N(t)RTIs received by the individuals from whom the sequenced viruses were obtained. In addition, the sequence data in these databases are not linked to the additional data in the references in which the sequences were reported. A third database contains proprietary sequence data and associated clinical information that are available only to registered users (Yuen et al., 2007).
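To make the "about 10% of their nucleotides" figure concrete, the following is a minimal sketch of the uncorrected pairwise distance (p-distance) between two pre-aligned nucleotide sequences. The function name `p_distance` and the toy sequences are illustrative assumptions; the excerpt does not say how divergence between genotypes was measured.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites between two pre-aligned nucleotide sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    This is an illustrative helper, not a method taken from the article.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Toy example: one mismatch in ten comparable sites.
distance = p_distance("ACGTACGTAC", "ACGTACGTAA")
```

Under this measure, two sequences from different genotypes would be expected to give a value near 0.10, whereas sequences from the same genotype would typically give a smaller value.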
We created an HBV RT variant database, HBVrtDB, to feature associations between HBV RT mutations and the N(t)RTI treatments of the individuals from whom the sequences were obtained. The database presents these associations within the context of viral genotype and geographic origin. We also created an interactive program, HBVseq, to enable users to identify mutations in submitted sequences and retrieve the prevalence of these mutations in HBVrtDB according to genotype and N(t)RTI treatment.
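The kind of lookup described for HBVseq can be sketched as follows. This is not the HBVseq implementation: the function names, the `PREVALENCE` table layout, and the counts in it are hypothetical and made up for illustration, and the query is assumed to be already aligned to the reference amino acid sequence.

```python
from typing import Dict, List, Optional, Tuple


def find_mutations(reference: str, query: str, start: int = 1) -> List[str]:
    """List differences between two pre-aligned amino acid sequences.

    Mutations are named 'rt<reference residue><position><query residue>'.
    `start` is the RT position of the first residue in the fragment.
    Gaps ('-') and unknown residues ('X') in the query are ignored.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    for pos, (ref_aa, qry_aa) in enumerate(zip(reference, query), start=start):
        if qry_aa != ref_aa and qry_aa not in "-X":
            mutations.append(f"rt{ref_aa}{pos}{qry_aa}")
    return mutations


# Hypothetical table: (mutation, genotype, treatment group) ->
# (sequences with the mutation, sequences examined). Counts are invented.
PREVALENCE: Dict[Tuple[str, str, str], Tuple[int, int]] = {
    ("rtM204V", "A", "treated"): (12, 100),
    ("rtM204V", "A", "untreated"): (0, 200),
}


def prevalence(mutation: str, genotype: str, treatment: str) -> Optional[float]:
    """Return the fraction of sequences carrying the mutation, or None if unknown."""
    with_mutation, total = PREVALENCE.get((mutation, genotype, treatment), (0, 0))
    return with_mutation / total if total else None


# Fragment covering RT positions 203-206 (the YMDD motif).
found = find_mutations("YMDD", "YVDD", start=203)
```

In this toy run `found` contains the single mutation `rtM204V`, which can then be passed to `prevalence` together with a genotype and a treatment group.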
Sequence retrieval and annotation
A local BLAST search of the GenBank viral sequence files was performed using an HBV RT amino acid sequence as the query. The BLAST search results were aggregated by the GenBank reference field and imported to seed a relational database that we call the HBV BLAST-Hits DB. Each reference in the HBV BLAST-Hits DB was annotated according to whether its set of sequences was obtained from one or more than one individual, which we refer to as the provenance of the sequence. Sequences from an
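The aggregation of BLAST hits by GenBank reference described in this section can be sketched as below. The `BlastHit` record, its fields, the placeholder accessions, and the normalization of reference titles are assumptions made for illustration; the excerpt does not describe the actual record layout or matching rule.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class BlastHit:
    """Hypothetical minimal record for one BLAST hit."""
    accession: str        # sequence identifier (placeholder values below)
    reference_title: str  # title taken from the record's reference field


def group_by_reference(hits: Iterable[BlastHit]) -> Dict[str, List[str]]:
    """Group hit accessions under a normalized reference title."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for hit in hits:
        # Assumed normalization: trim whitespace and ignore case.
        key = hit.reference_title.strip().lower()
        grouped[key].append(hit.accession)
    return dict(grouped)


hits = [
    BlastHit("ACC0001", "Study A"),
    BlastHit("ACC0002", "study a "),
    BlastHit("ACC0003", "Study B"),
]
by_reference = group_by_reference(hits)
```

Each key in `by_reference` then corresponds to one reference that a curator could annotate, for example with the provenance of its sequences.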
Database
Our filtered BLAST search program produced the seed HBV BLAST-Hits DB. This database contained 23,871 HBV RT sequences from 761 references. Fig. 1A shows the schema of the HBV BLAST-Hits DB that contains four tables with data from the GenBank BLAST search. Of the 761 references in the HBV BLAST-Hits DB, sufficient annotation for HBVrtDB inclusion was available for 281 (37%) references containing 6811 sequences from 3896 individuals. Fig. 1B shows the schema of HBVrtDB which contains the
Discussion
HBVrtDB was constructed by curating and annotating more than 250 studies in GenBank and was supplemented by the contribution of well-characterized HBV sequences from two large clinical populations. HBVrtDB provides novel data on the extent of polymorphism at each RT position according to HBV genotype and on the relative prevalence of each of the well-characterized drug-resistance mutations in individuals who received l-nucleoside and/or ETV but no ANPs and in individuals who received ANPs but
Conclusion
HBVrtDB demonstrates that the large numbers of sequences in GenBank, when properly annotated, provide a powerful tool for mutation discovery and sequence analysis. Analysis of the aggregate data in HBVrtDB makes it possible to test statistical associations between HBV RT mutations and treatment exposure and to generate hypotheses about mutations that can be further tested in vitro. HBVseq makes it possible for those performing HBV RT sequencing to identify previous reports of the RT
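As an illustration of testing an association between a mutation and treatment exposure, the sketch below computes a two-sided Fisher's exact test on a 2x2 table and applies a Benjamini-Hochberg adjustment across several mutations. The excerpt does not state which test the authors used, so Fisher's exact test is an assumption here; the Benjamini-Hochberg procedure is included because its original paper appears in the reference list. Function names and example counts are hypothetical.

```python
from math import comb
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Rows: treated / untreated. Columns: mutation present / absent.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more probable than the observed one.
    total = sum(
        prob(x) for x in range(low, high + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up counts: mutation seen in 3 of 3 treated and 0 of 3 untreated.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```

Testing many mutations at once inflates the number of chance findings, which is why an adjustment such as Benjamini-Hochberg is usually applied before a mutation is put forward as a candidate for in vitro testing.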
Acknowledgements
SYR, TFL and RWS were supported by an NIH grant, AI06858. SMT was supported by Hoffmann-La Roche.
Contribution. SYR and RWS designed the study and wrote the manuscript. SYR and TFL performed the computer programming required for the database and sequence analysis program. MHN, RMK, BB, JV, and RK collected, annotated, and contributed to the study. SMT and RWS reviewed and annotated the studies in the HBVrtDB. All authors reviewed the manuscript and approved its content.
References (19)
- et al. Naturally occurring variants of hepatitis B virus. Adv. Virus Res. (1999)
- et al. A treatment algorithm for the management of chronic hepatitis B virus infection in the United States: 2008 update. Clin. Gastroenterol. Hepatol. (2008)
- et al. Chronic hepatitis B: preventing, detecting, and managing viral resistance. Clin. Gastroenterol. Hepatol. (2008)
- et al. Characterization of potential antiviral resistance mutations in hepatitis B virus reverse transcriptase sequences in treatment-naive Chinese patients. Antiviral Res. (2010)
- et al. HBV drug resistance: mechanisms, detection and interpretation. J. Hepatol. (2006)
- et al. SeqHepB: a sequence analysis program and relational database system for chronic hepatitis B. Antiviral Res. (2007)
- et al. Comparisons of the HBV and HIV polymerase, and antiviral resistance mutations. Antivir. Ther. (2004)
- et al. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (1995)
- et al. Molecular modeling and biochemical characterization reveal the mechanism of hepatitis B virus polymerase resistance to lamivudine (3TC) and emtricitabine (FTC). J. Virol. (2001)