Molecular characterization of the rotavirus enterotoxin NSP4 gene of strains causing diarrhoea in children aged 0-5 years in northern India

Article history: Received on: 19/08/2015; Revised on: 03/09/2015; Accepted on: 30/09/2015; Available online: 27/11/2015

The non-structural protein 4 (NSP4) gene of rotavirus encodes a multifunctional protein that plays a significant role in viral multiplication and in the pathogenesis of the acute watery diarrhoea associated with rotaviral gastroenteritis. NSP4 was the first viral enterotoxin to be described, and mutations in its gene have been linked to altered pathogenesis. This study was planned to ascertain the genotypes and genetic variations of the NSP4 gene in the rotavirus strains prevalent in this area. We collected consecutive diarrhoeal stool samples, one per child, from children aged under five years hospitalized with diarrhoea between January 2010 and June 2012, and tested them for rotavirus antigen by ELISA. The NSP4 gene was amplified by RT-PCR, sequenced (BigDye terminator kit on an ABI 3130 Genetic Analyzer) and genotyped with the RotaC software. Of the 260 samples, 58 (22.3%) were positive by ELISA. The NSP4 gene was amplified by RT-PCR from 45 strains, of which 35 amplicons were selected for sequencing. In total, 25 (71.4%) strains belonged to genotype E1, 6 (17.1%) to genotype E2 and 4 (11.4%) to genotype E6. Sequence analysis revealed nucleotide changes causing point mutations in the conserved regions, the interspecies variable domain (ISVD) and the enterotoxin region (amino acids 114-135). On evolutionary analysis of 33 strains, the amino acid at position 131 was found to be under positive selection.


INTRODUCTION
Diarrhoea and gastroenteritis due to rotavirus afflict infants and young children worldwide, resulting in around half a million deaths annually (Tate et al., 2012). Rotaviruses, members of the family Reoviridae, are non-enveloped viruses with a unique segmented double-stranded RNA genome. The 11 genome segments of rotavirus encode six structural proteins, which make up the capsid, and five or six non-structural proteins, which are involved in replication and in the pathogenesis of rotavirus infection (Estes MK, 2013). The pathophysiology of rotaviral diarrhoea and the virulence factors contributing to it are still being researched and debated. One of the non-structural proteins, non-structural protein 4 (NSP4), has been found to have multiple roles in the rotavirus life cycle. This 175-aa protein was initially found to act as an intracellular receptor for immature double-layered particles entering the endoplasmic reticulum for final maturation (Bergmann et al., 1989). Later, Ball et al. (1996) demonstrated that purified NSP4 given intraperitoneally can induce diarrhoea in infant mice, thereby designating it the first viral enterotoxin. Subsequently, the effects of NSP4 or its cleavage peptide (aa 114-135) on calcium homeostasis, signal transduction pathways, serotonin secretion and the sodium-glucose symporter were studied in the Caco-2 cell line and other cell lines, and the findings further reinforced its role as a toxin and virulence factor (Tian et al., 1994; Dong et al., 1997; Hagbom et al., 2011; Halaihel et al., 2000). Some studies have associated mutations in the gene with altered virulence (Zhang et al., 1998; Ball et al., 2013; Tsugawa et al., 2014). NSP4 is also being considered as a vaccine candidate (Yu and Langridge, 2001). We planned this study to characterize the genetic variations of the NSP4 gene in strains causing diarrhoea in children under the age of five years.

Study population
This observational study was approved by the Institutional Ethics Committee. Infants and children aged 0-5 years with acute watery diarrhoea who were referred to the emergency unit of the Department of Pediatrics, King George's Medical University, from Lucknow and adjoining areas were enrolled consecutively. Acute diarrhoea was defined as the passage of three or more loose or liquid stools per day (or more frequent passage than is normal for the individual) (WHO fact sheet, 2009). Patients with dysentery (diarrhoea with blood in the stools, with or without mucus) or persistent diarrhoea (duration of more than 2 weeks), and cases whose parents did not give consent, were excluded from the study. A single stool sample was collected from each patient.
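
The inclusion and exclusion criteria above can be written as a simple screening function. This is a minimal sketch for illustration only: the `Patient` fields and the `is_eligible` name are hypothetical, and because the text gives the age group both as "0-5 years" and "under five years", the strict upper bound of 5 years used here is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record fields, chosen to mirror the criteria in the text.
    age_years: float
    loose_stools_per_day: int
    more_frequent_than_usual: bool  # WHO alternative: more frequent than normal for the child
    blood_in_stool: bool            # dysentery, with or without mucus
    duration_days: int
    consent_given: bool


def is_eligible(p: Patient) -> bool:
    """Apply the study's inclusion/exclusion criteria (simplified sketch)."""
    in_age_range = 0 <= p.age_years < 5  # assumed boundary, see lead-in
    acute_diarrhoea = p.loose_stools_per_day >= 3 or p.more_frequent_than_usual
    persistent = p.duration_days > 14    # "more than 2 weeks"
    return (in_age_range and acute_diarrhoea and not p.blood_in_stool
            and not persistent and p.consent_given)


print(is_eligible(Patient(1.5, 4, False, False, 3, True)))   # eligible
print(is_eligible(Patient(1.5, 4, False, True, 3, True)))    # dysentery, excluded
```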

RNA Extraction and cDNA construction
A 10% stool suspension in phosphate-buffered saline [2 mM] was centrifuged, and the supernatant (250 µl) was used to extract RNA with the QIAamp Viral RNA Mini Kit (QIAGEN, Hilden, Germany). Complementary DNA (cDNA) was synthesized with random hexamers [2 µM] (Invitrogen, Carlsbad, USA), Moloney murine leukaemia virus (M-MLV) reverse transcriptase [100-200 U], dNTPs [0.5 µM] and RNase inhibitor [20 U] (Sigma-Aldrich Corporation, St Louis, USA) in a final reaction volume of 30 µl. Briefly, RNA (2-5 µl) was mixed with the random hexamers and dNTPs, and the 15 µl mixture was denatured at 95 °C and then quenched on ice for 5 min. Reverse transcriptase and RNase inhibitor were added to the cooled mixture, which was incubated for 10 min at 25 °C followed by 60 min at 37 °C. The cDNA was stored at -20 °C (Banyai et al., 2009).
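
Reagent volumes for a reaction like this follow from C1V1 = C2V2 (or, for enzymes, units required divided by units per µl). The sketch below is illustrative only: the stock concentrations (50 µM random hexamers, 40 U/µl RNase inhibitor, 200 U/µl M-MLV), the 5x RT buffer and the use of 200 U of enzyme (the upper end of the stated range) are assumptions not reported in the text, and dNTPs are left out because their stock concentration is not given.

```python
def volume_for_concentration(stock, final, total_ul):
    """Solve C1*V1 = C2*V2 for V1 (stock and final in the same units)."""
    if stock <= final:
        raise ValueError("stock must be more concentrated than the final concentration")
    return final * total_ul / stock


def volume_for_units(units_needed, units_per_ul):
    """Volume of an enzyme stock that delivers the required number of units."""
    return units_needed / units_per_ul


TOTAL_UL = 30.0  # final reaction volume stated in the text
RNA_UL = 5.0     # upper end of the 2-5 µl template range

# Hypothetical stock concentrations; the text does not report them.
volumes = {
    "random hexamers (50 µM stock, 2 µM final)": volume_for_concentration(50.0, 2.0, TOTAL_UL),
    "5x RT buffer (1x final)": volume_for_concentration(5.0, 1.0, TOTAL_UL),
    "RNase inhibitor (40 U/µl stock, 20 U)": volume_for_units(20.0, 40.0),
    "M-MLV RT (200 U/µl stock, 200 U)": volume_for_units(200.0, 200.0),
}
# Water makes up the remainder (dNTPs omitted, so a real mix would use less water).
water_ul = TOTAL_UL - RNA_UL - sum(volumes.values())

for name, vol in volumes.items():
    print(f"{name}: {vol:.1f} µl")
print(f"template RNA: {RNA_UL:.1f} µl, nuclease-free water: {water_ul:.1f} µl")
```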

Sequencing
Amplicons were extracted from the gel using the PureLink gel extraction kit (Invitrogen, Carlsbad, USA). Sequencing was done by the dideoxynucleotide chain-termination method with the BigDye Terminator cycle sequencing kit (Applied Biosystems, Foster City, USA) on an automated sequencer (ABI Prism 3100xl, Applied Biosystems, Foster City, USA). The primers used for amplification were also used for sequencing.

DNA and Protein Sequence Submission and Analysis
Nucleotide and protein sequence similarity searches were performed with BLAST (Basic Local Alignment Search Tool) on the NCBI (National Center for Biotechnology Information) website (http://www.ncbi.nlm.nih.gov/blast/blast.cgi). Genotypes were assigned with the RotaC rotavirus genotyping tool (http://rotac.regatools.be/) (Maes P et al., 2009) following the recommendations of the Rotavirus Classification Working Group (RCWG) (Matthijnssens et al., 2008, 2011). Sequences were submitted to the GenBank database using the BankIt v3.0 tool, and accession numbers were obtained. Both the protein and nucleotide sequences were aligned among themselves and with the prototype strains Wa, KUN, DS-1, AU-1, N26, RV176 and EW (sourced from GenBank) using CLUSTAL W. Phylogenetic trees were constructed by the UPGMA method with Kimura 2-parameter distances and 1000 bootstrap replicates in MEGA 5.05 (Tamura et al., 2011). The protein sequences were analysed using BioEdit (Hall et al., 1999).
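
The two calculations named above, Kimura 2-parameter (K2P) distances and UPGMA clustering, can be sketched in a few lines. This is an illustrative sketch, not MEGA's implementation: it uses the standard K2P formula d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions, skips gapped or ambiguous sites, omits the bootstrap step, and runs on made-up toy sequences.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(s1, s2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Sites with a gap or ambiguous base in either sequence are skipped.
    """
    transitions = transversions = compared = 0
    for a, b in zip(s1.upper(), s2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / compared, transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def upgma(seqs):
    """Build a UPGMA tree (Newick string) from a dict of name -> aligned sequence."""
    d = {frozenset(pair): k2p_distance(seqs[pair[0]], seqs[pair[1]])
         for pair in combinations(seqs, 2)}
    # cluster label -> (Newick subtree, number of leaves, node height)
    clusters = {name: (name, 1, 0.0) for name in seqs}
    while len(clusters) > 1:
        # Merge the two closest clusters; the new node sits at half their distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2
        (ta, na, ha), (tb, nb, hb) = clusters.pop(a), clusters.pop(b)
        new = f"({ta}:{height - ha:.4f},{tb}:{height - hb:.4f})"
        # Average linkage: size-weighted mean distance to every remaining cluster.
        for c in clusters:
            d[frozenset((new, c))] = (na * d[frozenset((a, c))]
                                      + nb * d[frozenset((b, c))]) / (na + nb)
        clusters[new] = (new, na + nb, height)
    return next(iter(clusters.values()))[0] + ";"


# Toy aligned sequences for illustration only (not study strains).
toy = {"strainA": "ATGGCTAAGT", "strainB": "ATGGCTAAAT", "strainC": "ATAGCCAAGC"}
print(upgma(toy))
```

UPGMA assumes a constant rate of evolution across lineages, which is why every leaf ends up equidistant from the root; a bootstrap analysis would repeat `upgma` on alignments built by resampling columns with replacement.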

Evolutionary analysis (SLAC)
Aligned sequences were screened for evidence of positive selection at individual codons with the single-likelihood ancestor counting (SLAC) algorithm implemented in the HyPhy (Hypothesis testing using Phylogenies) package (Kosakovsky Pond et al., 2005; Kosakovsky Pond and Frost, 2005a). The general time-reversible nucleotide substitution model (GTR, model string 012345) and a tree constructed by the neighbour-joining method were used to calculate ω = dN/dS, the ratio of non-synonymous substitutions per non-synonymous site to synonymous substitutions per synonymous site. Results were interpreted at a significance level of p < 0.1.
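
SLAC reconstructs ancestral codons on the tree by maximum likelihood and then counts synonymous and non-synonymous changes. The sketch below illustrates only the underlying counting idea for a single pair of aligned coding sequences, using the Nei-Gojobori approach with a Jukes-Cantor correction; it is not the HyPhy implementation. Simplifications are assumptions: mutations to stop codons count as non-synonymous when computing sites, codons with gaps, ambiguities or stops are skipped, and multi-hit codons are averaged over pathways that avoid intermediate stop codons.

```python
import math
from itertools import permutations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """Number of synonymous sites in a codon (changes to stop count as non-synonymous)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_differences(c1, c2):
    """Synonymous and non-synonymous differences, averaged over mutational
    pathways between two codons that avoid intermediate stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False  # pathway passes through a stop codon
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            syn_total, non_total, n_paths = syn_total + syn, non_total + non, n_paths + 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits."""
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Pairwise omega = dN/dS by Nei-Gojobori counting."""
    S = N = sd = nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip codons with gaps/ambiguities (not in CODE) and stop codons.
        if CODE.get(c1, "*") == "*" or CODE.get(c2, "*") == "*":
            continue
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S, N = S + s, N + (3 - s)
        ds_i, dn_i = codon_differences(c1, c2)
        sd, nd = sd + ds_i, nd + dn_i
    if S == 0 or N == 0:
        raise ValueError("not enough informative codons")
    d_n, d_s = jukes_cantor(nd / N), jukes_cantor(sd / S)
    if d_s == 0:
        return math.inf if d_n > 0 else math.nan
    return d_n / d_s
```

An ω well above 1 at a codon suggests diversifying (positive) selection, ω near 1 neutrality, and ω below 1 purifying selection.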

Samples
From January 2010 to June 2012, a total of 260 children were enrolled, of whom 58 (22.3%) were positive for rotavirus antigen by ELISA. The NSP4 gene could be amplified from 45 samples. Products showing faint bands or more than one band on gel electrophoresis were excluded, and 35 amplicons were selected for sequencing and subsequent genotyping with the RotaC tool.

Genotypes and dendrogram
In this study, the E1 genotype was present in 25 (71.4%) samples and E2 in 6 (17.1%), while 4 (11.4%) samples matched the E6 genotype. The most common NSP4 genotypes in humans are E1 (Wa-like), E2 (KUN-like) and E3 (AU-1-like), previously designated genotypes B, A and C, respectively. E1 is the predominant genotype in most parts of the world, followed by E2. Other, uncommon genotypes such as E6 and E9 have been reported from a few geographical areas (Khamrin et al., 2007; Rehman et al., 2007).
The phylogenetic tree of the nucleotide sequences is shown in Fig. 1.
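
The reported proportions follow directly from the counts: genotype shares are taken over the 35 sequenced strains and ELISA positivity over the 260 enrolled children. A minimal check, using only counts stated in the text (the `percentages` helper is a hypothetical name):

```python
def percentages(counts, total=None):
    """Percentage of each category, rounded to one decimal place."""
    total = total if total is not None else sum(counts.values())
    return {key: round(100 * n / total, 1) for key, n in counts.items()}


# Genotype counts among the 35 sequenced strains, as reported in the text.
genotypes = {"E1": 25, "E2": 6, "E6": 4}
print(percentages(genotypes))                    # {'E1': 71.4, 'E2': 17.1, 'E6': 11.4}
print(percentages({"ELISA positive": 58}, 260))  # {'ELISA positive': 22.3}
```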

Genetic variations (mutations) in the protein
The amino acid (aa) substitutions in the study strains relative to the reference strains are shown by genotype (E1, E2 and E6) in Figs. 2a, 2b and 2c, respectively; only regions containing mutations are shown. The final analysis was done on 22 E1 strains. Structurally, NSP4 consists of a membrane-associated amino-terminal portion and a cytoplasmic carboxy-terminal portion. The amino-terminal region (aa 1 to 22) lies in the lumen of the endoplasmic reticulum (ER), aa 22-44 span the ER membrane, and the remaining part (aa 45-175), which ends at the carboxy terminus, lies in the cytoplasm. The secondary structure includes an amphipathic alpha helix in the cytoplasmic portion from aa 95 to 137, in which the polar aa face the aqueous side and the hydrophobic aa face the interior of the protein. Conserved aa lie at positions 8, 18, 63 and 71: the first two are N-linked glycosylation sites and the latter two are cysteine residues. NSP4 also has three hydrophobic domains (H1, H2 and H3), corresponding to aa 7 to 21, 28 to 47 and 67 to 85, respectively, which are likewise conserved (Bergmann et al., 1989; Chan et al., 1989).
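
Figs. 2a-2c show each study strain against a reference, with dots marking identity. A minimal sketch of that display and of listing substitutions in the usual "reference residue, position, new residue" notation is given below; the function names and the toy fragments are hypothetical, and numbering assumes the alignment has no gaps relative to the reference.

```python
def dot_notation(reference, query):
    """Render an aligned query as in Fig. 2: '.' where it matches the reference."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return "".join("." if q == r else q for r, q in zip(reference, query))


def substitutions(reference, query, start=1):
    """List substitutions as strings like 'T74A' (reference aa, position, new aa).

    Positions are numbered from `start` (the first position of the fragment)
    and assume no gaps in the reference.
    """
    return [f"{r}{start + i}{q}"
            for i, (r, q) in enumerate(zip(reference, query))
            if q != r and "-" not in (r, q)]


# Toy fragments for illustration only (not real NSP4 sequences).
ref = "KLTTREIEQV"
query = "KLTAREIEQV"
print(dot_notation(ref, query))             # ...A......
print(substitutions(ref, query, start=71))  # ['T74A']
```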

Mutations in the conserved region
In our study, these regions were all conserved except in three E2 strains, in which the polar amino acid threonine at position 74 is replaced by the hydrophobic amino acid alanine. This mutation could significantly modify the secondary structure of NSP4, because hydrophobic aa tend to be buried in the protein core, away from the aqueous cytoplasm. We also found genotype-specific variation at positions 68 and 76. Whether these changes are part of the natural evolution of genotypes needs further research.
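
One way to quantify how much a substitution such as threonine to alanine at position 74 shifts local hydrophobicity is the Kyte-Doolittle hydropathy scale, which is also commonly used in sliding windows to locate hydrophobic domains like H1-H3. This sketch is an illustration, not part of the study's analysis; the 9-residue window is an assumed, commonly used default.

```python
# Kyte-Doolittle hydropathy scale (positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_change(ref_aa, new_aa):
    """Change in hydropathy caused by a substitution (positive = more hydrophobic)."""
    return KYTE_DOOLITTLE[new_aa] - KYTE_DOOLITTLE[ref_aa]


def hydropathy_profile(seq, window=9):
    """Mean hydropathy in a sliding window; returns (centre position, score)
    pairs with 1-based positions."""
    half = window // 2
    return [(i + half + 1, sum(KYTE_DOOLITTLE[aa] for aa in seq[i:i + window]) / window)
            for i in range(len(seq) - window + 1)]


print(f"T74A: {hydropathy_change('T', 'A'):+.1f}")  # T74A: +2.5
```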

Mutations in the non-conserved region
Functionally, specific regions of the non-conserved cytoplasmic portion are associated with specific activities. The peptide spanning aa 114-135 has been shown to be enterotoxigenic, the interspecies variable domain (ISVD) maps to aa 131-141, and membrane-destabilizing activity is associated with aa 55-72. The stretch from aa 112 to 148 binds VP4, and the VP6-binding site lies in the cytoplasmic tail (aa 157-175) [2]. Genotype-specific amino acid variation is seen mostly in the ISVD, and our data also reflect this. Two other regions crucial for rotavirus replication, the VP4-binding and VP6-binding sites, also show genotype-specific variation.
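
With several overlapping regions, it helps to annotate each mutated position with every region it falls in. A minimal sketch, using only the region boundaries given in the text (the dictionary and function names are hypothetical):

```python
# NSP4 regions (amino acid positions, inclusive) as given in the text.
NSP4_REGIONS = {
    "H1 hydrophobic domain": (7, 21),
    "H2 hydrophobic domain": (28, 47),
    "membrane-destabilizing region": (55, 72),
    "H3 hydrophobic domain": (67, 85),
    "amphipathic alpha helix": (95, 137),
    "VP4-binding region": (112, 148),
    "enterotoxic peptide": (114, 135),
    "ISVD": (131, 141),
    "VP6-binding cytoplasmic tail": (157, 175),
}


def annotate(position):
    """Return the names of all listed regions that contain `position`."""
    return [name for name, (start, end) in NSP4_REGIONS.items() if start <= position <= end]


for pos in (74, 114, 131, 139, 150):
    print(pos, annotate(pos) or ["no listed region"])
```

Position 131, for example, falls in four overlapping regions at once, which is one reason substitutions there attract attention.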

E1 Strains
At position 114, one strain had asparagine (polar) instead of aspartic acid (negatively charged), and at position 150 another had lysine (positively charged) instead of glutamine (polar). Ten strains had asparagine instead of serine, both of which are polar. These changes may not be very significant, as both polar and charged aa can act as hydrogen-bond donors or acceptors.
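
The reasoning above rests on whether a substitution moves a residue into a different physicochemical class. A small sketch of that check follows; the grouping is one common textbook convention and is an assumption, since conventions differ (the text, for instance, treats histidine as polar, whereas here it is grouped with the positively charged residues).

```python
# One common textbook grouping; conventions differ (e.g. for His, Tyr, Cys, Gly).
AA_CLASS = {}
for residues, label in [("AVLIMFWP", "hydrophobic"), ("G", "glycine"),
                        ("STNQCY", "polar uncharged"), ("KRH", "positively charged"),
                        ("DE", "negatively charged")]:
    for aa in residues:
        AA_CLASS[aa] = label


def classify_substitution(ref_aa, new_aa):
    """Return (reference class, new class, whether the class changes)."""
    ref_class, new_class = AA_CLASS[ref_aa], AA_CLASS[new_aa]
    return ref_class, new_class, ref_class != new_class


for ref_aa, pos, new_aa in [("D", 114, "N"), ("Q", 150, "K"), ("T", 74, "A")]:
    print(f"{ref_aa}{pos}{new_aa}:", classify_substitution(ref_aa, new_aa))
```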

E2 Strains
At position 139, two strains had alanine in place of the isoleucine found in E2 and E6 strains (valine in E1 strains). The same strains also had glycine instead of cysteine at position 140. Glycine is the only amino acid without a side chain (its R group is a single hydrogen atom), so it confers flexibility on the protein backbone wherever it occurs, and its presence could lead to significant changes in the tertiary structure. These two positions appear to be hotspots for both intra- and inter-genotype variation, which may contribute to the evolution of new genotypes in the future. Another potentially important change was seen at position 144 in an E2 strain, where hydrophobic alanine replaces polar threonine.
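
Calling a position a "hotspot" can be made quantitative with per-column Shannon entropy over an alignment: 0 bits means every strain carries the same residue, and higher values mean more variation. This is an illustrative sketch, not an analysis performed in the study; it expects full residues rather than the dot notation of Fig. 2, ignores gaps, and uses a made-up toy alignment.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column; gaps ignored."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    # Sum of p * log2(1/p); each term is >= 0.
    return sum((c / n) * math.log2(n / c) for c in Counter(residues).values())


def variability_profile(alignment, start=1):
    """Entropy per position for equal-length aligned sequences (full residues, not dots)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return {start + i: column_entropy([seq[i] for seq in alignment]) for i in range(length)}


# Made-up toy alignment covering positions 139-141 (illustrative only, not study data).
toy = ["ICK", "ICK", "AGK", "AGK", "VCK"]
for pos, h in variability_profile(toy, start=139).items():
    print(pos, round(h, 2))
```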

E6 Strains
All four E6 strains were similar to the reference Bangladeshi strains, except for changes at positions 167 and 170 in one strain.

Positive selection at position 131
Evolutionary pressure acting on genetic variation can be estimated statistically by measuring synonymous (dS) and non-synonymous (dN) substitutions per site and testing whether they differ. One such approach is the SLAC algorithm, a counting-based method that reconstructs ancestral sequences by maximum likelihood and then counts synonymous and non-synonymous changes along the tree; it has performed consistently in detecting diversifying selection in virus genomes. We applied SLAC to 33 strains and found that the amino acid at position 131 was under positive selection: fifteen strains had tyrosine and eighteen had histidine at this position.
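
At each codon, SLAC asks whether the number of non-synonymous changes counted along the tree is larger than expected given the proportion of non-synonymous sites at that codon; the published method uses an extended binomial distribution so that non-integer counts can be handled. The sketch below simplifies this to an ordinary one-sided binomial test on integer counts. It is not the HyPhy implementation, the function names are hypothetical, and the example counts are invented; the study's own threshold was p < 0.1.

```python
from math import comb, inf, nan


def site_positive_selection_pvalue(nonsyn_changes, syn_changes, nonsyn_sites, syn_sites):
    """One-sided binomial p-value that a codon has more non-synonymous changes
    than expected from its site composition (integer counts only)."""
    trials = nonsyn_changes + syn_changes
    p = nonsyn_sites / (nonsyn_sites + syn_sites)  # expected non-synonymous proportion
    return sum(comb(trials, k) * p ** k * (1 - p) ** (trials - k)
               for k in range(nonsyn_changes, trials + 1))


def site_dn_ds(nonsyn_changes, syn_changes, nonsyn_sites, syn_sites):
    """Per-site dN/dS from counted changes and (tree-averaged) site numbers."""
    dn = nonsyn_changes / nonsyn_sites
    ds = syn_changes / syn_sites
    if ds == 0:
        return inf if dn > 0 else nan
    return dn / ds


# Hypothetical counts for one codon (not values from this study).
print(site_dn_ds(5, 1, 2.3, 0.7), site_positive_selection_pvalue(5, 1, 2.3, 0.7))
```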
How significant are the mutations in the cytoplasmic portion of NSP4? Ball et al. (1996) reported that a mutation at position 131 affects the enterotoxic properties of the protein: specifically, mutant strains carrying lysine instead of tyrosine at this position did not produce diarrhoea in infant mice. The same group later studied laboratory-generated mutants of the enterotoxic NSP4 peptide (aa 114-135) (Ball et al., 2013). In one mutant, three hydrophobic amino acids on the side of the helix facing the protein interior were replaced by charged amino acids, one of them being tyrosine at position 131 replaced by aspartic acid. This mutant was relatively inert biologically: it neither bound caveolin nor induced diarrhoea in newborn mice. The authors concluded that amino acid changes altering the secondary structure of the alpha helix may affect virulence. Some workers have found histidine at this position in genotype E1 and tyrosine in genotype E2 in human strains causing diarrhoea (Mascarendes et al., 2006; Tavares et al., 2008). We found no correlation between tyrosine and genotype, and the correlation reported in other studies may have been coincidental. Both tyrosine and histidine are polar aa, and their presence may be necessary for virulence. Although all strains in our study were from symptomatic patients, strains with tyrosine appeared to cause relatively severe diarrhoea in most patients; this observation was not evaluated statistically.
In a study comparing two standard strains in their virulent and avirulent (attenuated by serial passage) forms, changes were observed at six positions of NSP4 in the attenuated strains (Zhang et al., 1998).
Point mutations in the last 40 amino acids of this 175-aa protein have been reported previously (Gonsalez et al., 2013). These changes may not look significant at first glance, but studies of the protein's tertiary structure have found that even single mutations in the enterotoxigenic region, the ISVD or the N-terminal region change the properties of the protein. Conformation and multimerization are affected, and there are experimentally demonstrable changes in characteristics such as resistance to trypsin, binding to double-layered particles (DLPs) and to thioflavin T and, most importantly, the diarrhoea-inducing dose (Chacko et al., 2011; Deepa et al., 2008).
In contrast, studies of rotavirus strains from asymptomatic human infections have not supported these findings. Lee et al. (2000) compared strains isolated from children with and without diarrhoea and concluded that the inability of a strain to cause diarrhoea was not reflected in genetic variation of the NSP4 gene. In a study of a vaccine strain attenuated by repeated passage, the NSP4 gene had no mutation that could be attributed to the loss of pathogenicity (Ward et al., 1997). The pathogenesis of diarrhoea and of rotavirus infection per se is multifactorial, and although NSP4 plays a pivotal role in precipitating diarrhoea and in viral replication, the structural proteins (VPs) and other NSPs may also be important in asymptomatic human strains. It could also be hypothesized that conditions in the gut of experimental animals differ from those in human infants, so that mutations that are significant in animals are less effective in human strains.

CONCLUSION
In this study, we found the uncommon NSP4 genotype E6. Mutations were seen in both the conserved and non-conserved regions, and accumulation of these mutations could lead to further genotypic variation. The amino acid at position 131 was found to be under positive selection.

Fig. 1: Phylogenetic tree of the NSP4 gene constructed using the UPGMA method. Reference strains are shown in bold. Bootstrap values greater than 75% are shown at the branching points; branches corresponding to partitions reproduced in less than 50% of bootstrap replicates are collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches. The evolutionary distances were computed using the Kimura 2-parameter method and are in units of the number of base substitutions per site. The analysis involved 43 nucleotide sequences. Evolutionary analyses were conducted in MEGA 5.

Fig. 2a: Multiple alignment of the partial deduced amino acid sequence of the NSP4 protein of 22 human rotavirus strains of genotype E1. Identity with the reference sequence Wa is shown by dots.

Fig. 2b: Multiple alignment of the partial deduced amino acid sequence of the NSP4 protein of 6 human rotavirus strains of genotype E2. Identity with the reference sequence DS-1 is shown by dots. Brackets denote that aa 80-122 are not shown.

Fig. 2c: Multiple alignment of the partial deduced amino acid sequence of the NSP4 protein of 4 human rotavirus strains of genotype E6. Identity with the reference sequence N26 is shown by dots.