
Transcriptome sequencing and annotation of the polychaete Hermodice carunculata (Annelida, Amphinomidae)
Shaadi Mehr; Aida Verdes; Rob DeSalle; John Sparks; Vincent Pieribone; David F Gruber
BMC Genomics, vol. 16



The amphinomid polychaete Hermodice carunculata is a cosmopolitan and ecologically important omnivore in coral reef ecosystems, preying on a diverse suite of reef organisms and potentially acting as a vector for coral disease. While amphinomids are a key group for determining the root of the Annelida, their phylogenetic position has been difficult to resolve, and publicly available genomic data for the group have been scarce.


We performed deep transcriptome sequencing (Illumina HiSeq) and profiling on Hermodice carunculata collected in the Western Atlantic Ocean. We focused this study on 58,454 predicted Open Reading Frames (ORFs) of genes longer than 200 amino acids for our homology search, and Gene Ontology (GO) terms and InterPro IDs were assigned to 32,500 of these ORFs. We used this de novo assembled transcriptome to recover major signaling pathways and housekeeping genes. We also identify a suite of H. carunculata genes related to reproduction and immune response.


We provide a comprehensive catalogue of annotated genes for Hermodice carunculata and expand the knowledge of reproduction and immune response genes in annelids in general. Overall, this study vastly expands the available genomic data for H. carunculata, which previously consisted of only 279 nucleotide sequences in NCBI. This underscores the utility of Illumina sequencing for de novo transcriptome assembly in non-model organisms as a cost-effective and efficient tool for gene discovery and downstream applications, such as phylogenetic analysis and gene expression profiling.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1565-6) contains supplementary material, which is available to authorized users.


The amphinomid polychaete Hermodice carunculata (Annelida, Amphinomidae) is a cosmopolitan and ecologically important omnivore inhabiting coral reefs and other habitats throughout the Atlantic Ocean, including the Gulf of Mexico and the Caribbean Sea, as well as the Mediterranean and Red seas[1]. It is known to prey on a diverse suite of reef organisms such as zoanthids [2],[3], scleractinian corals [4]-[5], milleporid hydrocorals [6],[7], anemones [8] and gorgonians [6]. Hermodice carunculata is also a winter reservoir and spring-summer vector for the coral-bleaching pathogen Vibrio shiloi[9] and plays a complex and potentially ecologically important role in coral reef ecosystem health.

Amphinomidae is a well-delineated clade within aciculate polychaetes and it comprises approximately 200 described species from 25 genera [10]-[11]. Amphinomids are distributed worldwide and are known to inhabit intertidal, continental shelf and shallow reef communities, with a few species also recorded from the deep sea [11]. The clade is primarily identified by a series of morphological apomorphies including nuchal organs situated on a caruncle, a ventral muscular eversible proboscis with thickened cuticle on circular lamellae, and calcareous chaetae [12],[13]. Due to the lack of knowledge regarding their morphological variability (particularly within closely related genera), previous studies based mainly on morphology have failed to clarify the evolutionary history of the group, leading to taxonomic problems. In fact, several nominal species have been regarded as conspecifics, often without evaluation of molecular data, which might explain the common occurrence of cosmopolitan species within the clade [14]. Consequently, detailed revisions of species and even genera are needed [11], which incorporate molecular phylogenetic studies to clarify the affinities within the family [10],[15]. Additionally, amphinomids are a group with an unclear phylogenetic position within Annelida, as different studies find different evolutionary affinities for the group [15],[16], but they are regarded as morphologically primitive and considered of prime interest for determining the root of the annelid Tree of Life [17]. However, genomic data in public databases for Hermodice carunculata and other amphinomid species are particularly scarce. Previous to this study, only 279 sequences were accessible in NCBI for H. carunculata.

Furthermore, the annelid Hermodice carunculata is a representative of the Lophotrochozoa, a clade of protostome bilaterian animals that comprises about half of the extant animal phyla, including Mollusca, the second most diverse phylum [18]. Annelids in general are of interest within lophotrochozoans because they are among the first coelomates [19], and polychaetes in particular exhibit ancestral traits in body plan and embryonic development [19],[20]. Nevertheless, polychaete annelids and lophotrochozoans have been heavily underrepresented in sequencing efforts. Genomic resources for this key bilaterian clade are therefore still relatively poor compared to the other two major bilaterian clades (Ecdysozoa and Deuterostomia) [20]. A more complete representation of taxa in the genomic databases is needed to better understand animal evolution and unravel the origins of organismal diversity, especially of crucial clades such as the Lophotrochozoa [20],[21].

Here, we provide a de novo transcriptome assembly of Hermodice carunculata, a cosmopolitan Lophotrochozoan polychaete that inhabits coral reefs throughout the Atlantic Ocean. In this study we use the Illumina HiSeq platform to generate a cDNA library for H. carunculata. These Next-Generation Sequencing (NGS) libraries have an enormous sequencing depth and better effectiveness, producing at least 100 to 10,000 times higher throughput than classical Sanger sequencing[22]. This allows for the examination of thousands of transcripts from uncharacterized species and renders it useful for a wide range of biological applications including phylogenomics [23], regulatory gene discovery [24]-[25], molecular marker development [26], single nucleotide polymorphism (SNP) identification for trait adaptation [27],[28], haplotype detection [29],[30], and differential gene expression profiling [24],[29]. In this study we provide a reference set of mRNA sequences for H. carunculata, which will facilitate annotation of the genome and future studies of polychaete evolution, systematics and functional genomics. We specifically focused on major signaling pathways and housekeeping genes, as well as genes related to reproduction and immune response, and we provide a comprehensive list of genes related to these key processes in the annelid H. carunculata.

Results and discussion

Sequencing and de novo assembly

Total RNA was extracted from body-segment tissue of H. carunculata. The poly(A)+ RNA was isolated, sheared into smaller fragments, and reverse transcribed to make cDNA for sequencing on an Illumina HiSeq 1000. Four hundred million paired-end, strand-unspecific reads were obtained from one lane of one plate, generating 32.4 gigabase pairs (Gbp) of raw data that were uploaded to NCBI. Reads were checked for Phred-like quality scores above the Q30 level with FastQC [31]. We used the pipeline proposed in [32] to remove low-quality reads before de novo assembly. The HiSeq reads were assembled into 525,989 contigs longer than 200 bp, with an N50 of 1,095 bp and a mean length of 722.30 bp, using ABySS 1.3.1 [33], followed by Blat (with default parameters) [34] for redundancy removal. Eight k-mer values (21–55) were used for the ABySS runs, with the parameter q = 3 to trim low-quality bases from the ends of reads in each run. The final data set was filtered for contigs longer than 200 bp. Summary statistics for each k-mer assembly, as well as for the merged, redundancy-removed set of contigs, are outlined in Table 1. Paired-end reads and assembled contigs that do not contain ambiguous bases have been deposited into NCBI and can be downloaded at the NCBI Sequence Read Archive:
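The final length filter described above can be illustrated with a minimal Python sketch. It assumes FASTA-formatted contigs; the function names and toy records are hypothetical and are not part of the ABySS or Blat tooling used in the study.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: sequence}."""
    records: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the record ID.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def filter_by_length(records: Dict[str, str], min_len: int = 200) -> Dict[str, str]:
    """Keep contigs strictly longer than min_len bases."""
    return {n: s for n, s in records.items() if len(s) > min_len}


# Toy input (hypothetical contigs of 150, 240 and 200 bases).
toy = [">c1", "A" * 150, ">c2", "ACGT" * 60, ">c3", "G" * 200]
kept = filter_by_length(parse_fasta(toy))
print(sorted(kept))
```

Only the 240-base contig passes, because the threshold is exclusive ("longer than 200 bp").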

Table 1

"Summary Statistics for individual and merged assemblies"

| Assembly | Number of transcripts > 200 bp | N50 (bp) | Mean length (bp) | Max length (bp) | Total number of bp |
|---|---|---|---|---|---|
| K-mer 21 | 143,191 | 584 | 505.54 | 7,342 | 72,390,913 |
| K-mer 25 | 160,583 | 771 | 605.87 | 13,382 | 97,292,569 |
| K-mer 29 | 188,890 | 631 | 523.05 | 8,878 | 98,798,757 |
| K-mer 35 | 225,756 | 689 | 551.61 | 11,724 | 124,529,844 |
| K-mer 41 | 179,143 | 891 | 633.86 | 18,825 | 113,522,250 |
| K-mer 45 | 171,154 | 983 | 667.66 | 24,711 | 114,273,429 |
| K-mer 51 | 156,387 | 1,096 | 713.03 | 17,800 | 111,509,378 |
| K-mer 55 | 144,565 | 1,160 | 740.32 | 14,922 | 107,023,822 |

| Generated ORFs from assembly | Number of ORFs > 200 AA | N50 (AA) | Mean length (AA) | Max length (AA) | Total number of AA |
|---|---|---|---|---|---|
| ORFs > 200 AA | 58,454 | 490 | 443.92 | 8,167 | 25,948,636 |

For each k-mer, data from ABySS are shown. The final assembly is the result of merging the ABySS k-mer assemblies using BLAT to remove redundancies. Predicted ORFs longer than 200 AA from this final contig set were used for annotation. K-mer = required length of overlap match between two reads in ABySS; N50 = length-weighted median contig length; bp = base pair; AA = amino acid; ORF = Open Reading Frame.

Assemblies at higher k-mers (e.g. 41–55) had higher mean length and N50 than assemblies at lower k-mers (21–35) (Table 1). This is in agreement with summary statistics reported for other NGS de novo assemblies [35]. The lower N50 and mean length of the final merged dataset, compared with k-mer 51 and k-mer 55, are due to the addition of shorter sequences from lower k-mer assemblies. As outlined in Table 1, the N50 increased from 584 bp at k-mer 21 to 1,095 bp in the merged set of contigs, indicating an improvement in assembly contig length. Although the majority of contigs are between 200 and 600 bp long, we obtained 20,828 contigs longer than 3,563 bp (Figure 1). This result indicates that the data are of high quality for further annotation. Lastly, the assembled sequences were deposited in the Transcriptome Shotgun Assembly (TSA) database at NCBI.
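Because N50 is used throughout this comparison, a short Python sketch of how it is commonly computed may be helpful. This is a generic implementation of the usual definition (the contig length at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total), not the code used by ABySS; the example lengths are made up.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths: total = 54, half = 27, and 10 + 9 + 8 = 27.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Adding many short contigs to an assembly raises the total length without extending the longest contigs, which is why merging in lower k-mer assemblies can lower the N50.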

Figure 1. Assembled contig length distribution. The number on top of each bar represents the number of assembled contigs in that length category.

A six-frame translation (ORFs) from stop to stop for each assembled contig was generated using the EMBOSS package, version:[36]. This file contained 58,454 predicted ORFs longer than 200 AA, with an N50 of 490 AA and a mean length of 443.92 AA.
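The idea of a stop-to-stop six-frame translation can be sketched in plain Python as follows. This is an illustrative re-implementation under the standard genetic code, not the EMBOSS program used in the study; all function names are hypothetical, and codons containing ambiguous bases are translated as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_orfs(seq, min_aa=200):
    """Return stop-to-stop peptides longer than min_aa from all six frames."""
    seq = seq.upper()
    orfs = []
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            # Splitting on '*' yields the stretches between stop codons.
            for peptide in translate(strand[frame:]).split("*"):
                if len(peptide) > min_aa:
                    orfs.append(peptide)
    return orfs


# Toy contig with a lowered threshold so that the short example yields output.
toy_contig = "ATGGCC" * 3 + "TAA" + "ATGAAA"
print(six_frame_orfs(toy_contig, min_aa=5))
```

In this stop-to-stop scheme an ORF does not need to begin with a start codon, so the reported peptides may include upstream residues that are not part of the mature protein.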

Comparative sequence similarity with other annelids

For comparative annotation, all ORFs longer than 200 AA (58,454) were initially searched against two existing annelid genomic datasets, Capitella teleta and Helobdella robusta, and subsequently against Paramphinome jeffreysii and Eurythoe complanata, using BlastP [37] with an E-value threshold of 2e−15. The similarity search showed that 23,617 (40.5%) ORFs have similarity higher than 70% against C. teleta, while 20,468 (35%) ORFs have similarity higher than 70% against H. robusta (Figure 2). Thus, the proportion of sequences with matches in the proteome of C. teleta is greater than the proportion with matches in H. robusta. This is expected, as C. teleta and H. carunculata are both polychaete annelids, whereas H. robusta is a leech (Clitellata). In total, 15,841 transcripts had a significant hit (70% length homology) in both datasets. Furthermore, 29,819 of the ORFs showed homology to P. jeffreysii and 36,033 to E. complanata. Of these ORFs, 23,441 were homologous to both P. jeffreysii and E. complanata. These shared sequences can be used for future genome annotation of annelids and amphinomids, respectively (data available upon request).
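The set logic behind such a comparison can be sketched in Python. The sketch assumes BLAST tabular output (the 12-column "outfmt 6" layout, with the query ID in column 1, percent identity in column 3 and E-value in column 11). The helper names, toy hits and the optional identity filter are hypothetical and only stand in for the similarity criterion used in the study.

```python
def queries_with_hits(blast_lines, max_evalue=2e-15, min_identity=0.0):
    """Collect query IDs with at least one hit passing both thresholds."""
    hits = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        identity, evalue = float(fields[2]), float(fields[10])
        if evalue <= max_evalue and identity >= min_identity:
            hits.add(fields[0])
    return hits


def row(query, subject, identity, evalue):
    """Build a toy 12-column tabular line; unused columns are placeholders."""
    return "\t".join([query, subject, str(identity), "250", "0", "0",
                      "1", "250", "1", "250", str(evalue), "400"])


# Hypothetical hits against two reference proteomes.
capitella = [row("orf1", "ct_a", 82.0, 1e-50),
             row("orf2", "ct_b", 75.0, 1e-20),
             row("orf3", "ct_c", 71.0, 1e-5)]   # fails the E-value threshold
helobdella = [row("orf1", "hr_a", 78.0, 1e-40),
              row("orf4", "hr_b", 90.0, 1e-30)]

in_ct = queries_with_hits(capitella)
in_hr = queries_with_hits(helobdella)
shared = in_ct & in_hr          # ORFs with hits in both datasets
only_ct = in_ct - in_hr         # ORFs with hits in the first dataset only
print(sorted(shared), sorted(only_ct))
```

The intersection and the two set differences correspond to the three regions of a two-set Venn diagram.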

Figure 2. Venn diagram distribution of similarity search results for Hermodice carunculata. Based on 58,454 predicted Open Reading Frames (ORFs) of genes longer than 200 amino acids. The number of unique sequence-based annotations is the sum of unique best BlastP hits (E-value of 2e−15) from the Capitella teleta and Helobdella robusta proteomes, respectively.

Functional annotation and characterization

One of the important aspects of mining transcriptomic data is assigning function to individual transcripts. Functional annotation is an effective way to categorize genes into physiological classes, to assist in understanding the large quantity of transcripts, and to evaluate functional differences between subgroups of sequences. These data provide a tool for designing custom microarray experiments related to annotated functions [38]. Gene Ontology (GO) [39],[40] is an extensive scheme for this purpose. This framework covers a wide biological scope, and with its directed acyclic graph (DAG) structure, it accounts for biological dependencies. In addition, programs such as InterProScan [41],[42] provide an integrated platform for domain-based searches against databases such as PROSITE [43], PRINTS [44], Pfam [45], and SMART [46], among others. Over the past few years, resources have been developed for automatic GO term and InterPro ID assignment to unknown sequences. Blast2GO [47] was utilized for functional annotation, visualization and the associated statistics.
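To illustrate what the DAG structure implies in practice, the following Python sketch collects all ancestors of a term when a term may have more than one parent. The graph and the term labels are toy placeholders, not real GO identifiers, and the function is a generic graph traversal rather than part of Blast2GO.

```python
def ancestors(term, parents):
    """Return every ancestor of a term in a DAG given child -> parents."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


# Toy ontology: D has two parents (B and C), which share the root A.
toy_parents = {
    "toy:D": ["toy:B", "toy:C"],
    "toy:B": ["toy:A"],
    "toy:C": ["toy:A"],
}
print(sorted(ancestors("toy:D", toy_parents)))
```

A sequence annotated with a specific term is implicitly annotated with all of that term's ancestors, which is what allows annotations to be summarized at a coarse level of the hierarchy.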

As part of the Blast2GO pipeline, ORFs longer than 200 AA (58,454) were subjected to a sequence homology search against the non-redundant protein database (NR) at NCBI using BlastP (E-value 1e−10, cutoff = 55, GO weight = 5, HSP coverage = 0). This was followed by mapping to collect GO terms and by annotation to assign reliable information to each query sequence. Default values of the Blast2GO annotation parameters were chosen to optimize the ratio between annotation accuracy and coverage [48]. This provided a framework for categorizing genes into functional annotation groups, namely biological process (sets of molecular events or operations with a defined beginning and end), molecular function (the primary activities of a gene product at the molecular level, such as catalysis or binding), and cellular component. Furthermore, InterPro IDs (protein domain IDs) were assigned to sequences by running InterProScan (part of the Blast2GO pipeline).

Out of 58,454 predicted ORFs, 55.6% (32,500) received a definitive functional annotation. These sequences were classified into three categories (GOslim): biological process, cellular component and molecular function. The classification summary is reported at Level 2 of the GO categories. In the molecular function category, the clusters relating to “binding” and “catalytic activity” were enriched (21,089 and 12,443 sequences, respectively) (Figure 3A). In the biological process category, “metabolic process” with 14,272 sequences, “cellular processes” with 14,254 sequences, and “biological regulation” with 8,818 sequences were large compared to “regulation of anatomical structure size” and “cell growth” with about 200 sequences each (Figure 3B). This is expected, as these data were not collected from a developmental stage with a high rate of cell division. In the cellular component category, “cell” with 20,053 sequences and “organelle” with 11,413 sequences were highly represented compared to “microbody” or “extracellular matrix” with fewer than 100 sequences each (Figure 3C). This pattern is very similar to a recent functional annotation of the Lymnaea stagnalis (pond snail) transcriptome [49].

Figure 3. Functional annotation of Hermodice carunculata transcripts. The 30 most abundant GOslim terms based on (A) molecular function, (B) biological process, (C) cellular component.

In terms of the length distribution of annotated sequences, 70% to 90% of the sequences with lengths ranging from 200 AA to 1,500 AA were functionally annotated, while 100% of the sequences with lengths between 1,500 AA and 3,500 AA had a GO term assigned to them (Figure 4). This result indicates that longer sequences have a higher rate of annotation than shorter sequences. The annotated sequences and a table of sequence IDs with their assigned GO terms, InterPro IDs and enzyme codes are reported in Additional file 1.
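A length-binned annotation rate of this kind can be computed with a few lines of Python. The sketch below is only an illustration: the bin size of 500 AA, the function name and the toy records are assumptions and do not reproduce the binning used for Figure 4.

```python
from collections import defaultdict


def annotation_rate_by_bin(orfs, bin_size=500):
    """Percent of annotated ORFs per length bin.

    orfs: iterable of (length_aa, is_annotated) pairs.
    Returns {bin_start: percent_annotated}, ordered by bin start.
    """
    totals = defaultdict(int)
    annotated = defaultdict(int)
    for length, is_annotated in orfs:
        start = (length // bin_size) * bin_size
        totals[start] += 1
        annotated[start] += bool(is_annotated)
    return {b: 100.0 * annotated[b] / totals[b] for b in sorted(totals)}


# Hypothetical ORFs: two short ones (one annotated) and two long ones (both annotated).
toy_orfs = [(250, True), (300, False), (1600, True), (1700, True)]
print(annotation_rate_by_bin(toy_orfs))
```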

Figure 4. Percentage of functionally annotated transcripts relative to their length.

Identification of candidate genes and potential phylogenetic markers

Signaling pathway and housekeeping genes

We identified 21 homologs of housekeeping genes belonging to CAT, MAT, PFK and ATP synthase, and 4,450 homologs of signaling pathway genes belonging to Activin, Deltex, DPP, Fringe, Jagged, Notch, Notch2, SMAD and TGF-β (Additional file 2: Table S1). Riesgo and colleagues [50], in their analysis of ten newly sequenced invertebrate transcriptomes, found similar homologs in mollusk and annelid transcriptomes.

Immune response genes

We identified 172 orthologous sequences of 37 genes involved in immune response (Additional file 2: Table S1), including caspase, interleukin, toll-like receptors, IRF genes, ficolin, antistasin and angiopoietin among others.

Reproduction genes

We identified 46 sequences homologous to 17 genes involved in reproduction, including attractin, vasa, germ cell-less, piwi, smaug, nanos, zona pellucida, spermatogenesis-associated proteins and zonadhesin (Additional file 2: Table S1).

Potential phylogenetic markers

Using reciprocal BLAST searches between the Hermodice carunculata transcriptome and publicly available sequences, we have identified putative H. carunculata homologues of genes that have been previously used as phylogenetic markers in Annelida but were unavailable for H. carunculata and amphinomids in general, with a few exceptions. We identified 900 sequences homologous to EF-1α, 101 homologous to H3, 7 homologous to CytB, and 400 homologous to U2 snRNA. We chose the longest sequence in each category for downstream phylogenetic analysis. The alignments of each of these sequences with the five best hits retrieved by BLAST from the NCBI database are available in the supplementary materials (Additional files 3, 4, 5 and 6). Sequences were deposited in GenBank.
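The reciprocal search logic can be sketched as a reciprocal-best-hit filter in Python. This is a generic illustration that assumes hits have already been parsed into (subject, bit score) pairs; the identifiers and scores are made up, and the exact criteria applied in the study are not specified beyond "reciprocal BLAST searches".

```python
def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best-scoring hit.

    Each argument maps a query ID to a list of (subject_id, bitscore).
    """
    def best(hits):
        return {q: max(subs, key=lambda h: h[1])[0]
                for q, subs in hits.items() if subs}

    best_ab, best_ba = best(a_vs_b), best(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical search results: transcriptome contigs vs. reference markers, and back.
contigs_vs_markers = {
    "hc1": [("ef1a", 500.0), ("h3", 80.0)],
    "hc2": [("ef1a", 300.0)],
}
markers_vs_contigs = {
    "ef1a": [("hc1", 510.0), ("hc2", 290.0)],
    "h3": [("hc1", 75.0)],
}
print(reciprocal_best_hits(contigs_vs_markers, markers_vs_contigs))
```

Here only hc1 and ef1a select each other, so hc2 is not reported even though it has a hit to the same marker.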

Light production genes

A search for sequence homology in the transcriptome of Hermodice carunculata against 182 known bioluminescence-related proteins, such as the photoproteins obelin and aequorin and various luciferases, found eight transcripts with an average of 44.9% homology to the luciferase protein of the phylogenetically distant sea pansy Renilla reniformis (Cnidaria, Renillidae). An alignment of the putative H. carunculata luciferase with Renilla luciferase was generated (Figure 5), and the corresponding cDNA sequences are included (Additional file 7).

Figure 5. Overlapping region of the amino acid sequence alignment of proteins homologous to luciferase from the sea pansy, Renilla sp.

In silico quantification of the Hermodice carunculata transcriptome

In order to identify poor-quality and potentially misassembled transcripts, reads were mapped back onto the non-redundant set of transcripts [51]. The number of reads corresponding to each transcript ranged from 2 to 9,000, with an average of 1,644 reads, indicating a wide range of expression (Additional file 8). This shows that even transcripts expressed at very low levels are represented in our dataset. Furthermore, we analyzed the coverage of the functionally annotated transcripts. The minimum coverage was 2 FPKM and the maximum was 20,000 FPKM. Among these, 400 transcripts that had a mean coverage of less than 3 or contained gaps were removed from the dataset (Table 2).

Table 2

"Summary statistics of read counts and coverage"

| Statistic | Value |
|---|---|
| Total number of reads | 426,555,924 |
| Number of reads used for assembly | 141,684,860 (33.22%) |
| Number of unused reads | 284,871,064 (66.78%) |
| Number of non-redundant transcripts (>200 bp) | 525,989 |
| Number of non-redundant transcripts with back-aligned reads (>200 bp) | 525,939 |
| Number of transcripts with coverage FPKM >1 | 176,412 |
| Number of transcripts with coverage FPKM >5 | 49,690 |
| Average coverage for contigs from filtered dataset 2 (FPKM) | 15.279 |
| Average number of reads mapped per contig (with coverage FPKM >5) | 1,644 |

bp = base pair; FPKM = fragments (paired reads) per kilobase of transcript per million mapped reads; contig = contiguous overlapping sequence read from assembly.
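For readers unfamiliar with the coverage unit, the standard FPKM normalization can be written as a short Python function. This is the textbook formula, offered as a sketch; the example counts are hypothetical and the function is not taken from the mapping software used in the study.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 100 fragments on a 1,000 bp transcript,
# out of 1,000,000 mapped fragments in total.
print(fpkm(100, 1000, 1_000_000))
```

Dividing by transcript length and by sequencing depth makes values comparable between short and long transcripts and between libraries of different sizes.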


Relying on Next Generation Sequencing techniques and a thorough bioinformatics pipeline, we have generated a comprehensive list of major signaling pathways, housekeeping genes, and genes related to reproduction and immune response in a representative of the Lophotrochozoa, the polychaete annelid Hermodice carunculata, whose phylogenetic placement within Annelida has been difficult to resolve. Major signaling pathways are highly evolutionarily conserved across Metazoa and play an important role during embryonic and adult development, regulating many fundamental cellular processes such as proliferation, stem cell maintenance, differentiation, migration or apoptosis [52]. In addition, some genes such as those involved in Notch signaling might have a role in segment formation and adult regeneration in polychaetes [53]. Housekeeping genes are required for the maintenance of essential basal cellular functions and consequently, under normal conditions, they are expressed in all cells regardless of tissue type or developmental stage [54]. They are especially interesting because they represent the minimal set of genes required to sustain life and they can be used as comparative controls for experimental and computational studies [54], for example, to assess the suitability of transcriptome datasets for gene discovery [50]. Immune response genes are also of great interest, especially among invertebrates, because they represent an early model of the more highly evolved innate immune system of vertebrates [55]. Knowledge of the invertebrate immune system is based mainly on two ecdysozoan model organisms, Drosophila melanogaster and Caenorhabditis elegans, and although lophotrochozoan systems show some distinct differences [56], studies focusing on this group are very limited.
Lastly, characterization of the reproductive genes of polychaetes is of interest as they exhibit an astonishing diversity of reproductive strategies, including both sexual and asexual reproduction, and range from spawning and external fertilization to brooding or viviparism, often involving marked morphological, physiological and behavioral modifications [12]. For example, some amphinomids such as Eurythoe complanata or Cryptonome conclava exhibit both sexual and asexual reproduction, the latter accomplished by architomic scissiparity: the body fragments in two or more parts which regenerate head, tail or both[11],[57].

Sex pheromones have been postulated to drive cryptic speciation in oligochaetes[58]. Within polychaetes, there are several species known to use pheromones to attract the opposite sex and to control the release of gametes, such as the scale worm Harmothoe imbricata[59], the rag worms Nereis succinea and Platynereis dumerilii and the lugworm Arenicola marina[60]. The sex pheromone attractin has been suggested by previous authors as a potential phylogenetic marker [58]. As part of our annotation pipeline, we have identified seven sequences homologous to attractin in the transcriptome of Hermodice carunculata. A phylogenetic analysis was performed to evaluate the potential of the H. carunculata attractin protein as a reliable phylogenetic marker for polychaete systematics and evolutionary studies. Our analysis corroborates results by previous authors[58] suggesting that attractin represents an effective phylogenetic marker, recovering deep metazoan relationships (Figure 6; Additional file 9) and important clades such as Bilateria, its split into Deuterostomia and Protostomia, and the subdivision of the latter in Ecdysozoa and Spiralia (Lophotrochozoa). Attractin also recovers Annelida as a monophyletic group (Figure 6).

Figure 6. Maximum likelihood tree of 21 attractin proteins and one newly identified attractin sequence from Hermodice carunculata. The newly identified attractin is colored red.

Several so-called cosmopolitan species within amphinomids have proven to comprise various cryptic species[1]. Hermodice carunculata has a widespread distribution and has been reported throughout the Atlantic Ocean, Caribbean, Mediterranean and Red Sea[61],[62]. Despite its widespread distribution, its representation in NCBI consisted of only 359 nucleotide sequences and only a handful of studies have examined genetic aspects of H. carunculata. For example, in a species delineation study, two mitochondrial genes (COI and 16S rDNA) and the internal transcribed spacer 1 (ITS1) were used to test for cryptic speciation in H. carunculata[1]. This analysis showed that genetic divergence is low among samples across the Atlantic Ocean, and these particular three genes do not reflect any genetic basis for the observed morphological differences (e.g., variable filament abundance) among populations. Therefore, identification of informative loci for phylogeographic application is necessary. However, a different study using COI sequences has found that Eurythoe complanata represents a complex of three genetically distinct and morphologically indistinguishable lineages inhabiting the Atlantic and Pacific Oceans. Also, the deep-sea genus Archinome has been shown to comprise four genetically distinct lineages with no apparent morphological differences[63]. Therefore, the de novo assembled transcriptome presented herein for Hermodice carunculata, can also be used to develop additional molecular phylogenetic markers to aid forthcoming studies of species boundaries and evolutionary relationships within Amphinomidae. Furthermore, amphinomids are a morphologically plesiomorphic group of annelids, considered as a highly important taxon for reconstructing relationships at the base of the annelid tree[17]. Thus, the vast amount of molecular data provided herein can also help to elucidate the basal relationships of Annelida.

Within annelid polychaetes there are a number of bioluminescent species distributed in various families such as Acrocirridae (Swima), Chaetopteridae (Chaetopterus), Flabelligeridae (Poeobius, Flota), Polynoidae (Harmothoe, Polynoe), Syllidae (Odontosyllis, Eusyllis, Pionosyllis), Terebellidae (Polycirrus, Thelepus) and Tomopteridae (Tomopteris) [64]. To date, no bioluminescent protein sequence has been reported from this phylum, but we do report sequences homologous to a luciferase protein (Figure 5). The fact that the putative Hermodice carunculata luciferase shows highest homology to the luciferase of a phylogenetically distant cnidarian (Renilla reniformis) is probably attributable to the lack of publicly available luciferase sequences from more closely related organisms. The transcriptomic dataset presented herein can greatly help to identify and characterize this putative photoprotein and facilitate future studies investigating the genetic and biochemical basis of light production in annelids. In addition, we report both green and red biofluorescence in Hermodice carunculata (Figure 7), yet a search of the transcriptome showed no homology to any known fluorescent protein.

Figure 7. Fluorescent macro image of Hermodice carunculata using 450–500 nm excitation and 514 nm LP emission (A); white light image (B); fluorescent macro comparison (using 450–500 nm excitation and 514 nm LP emission) (C); confocal images (D-G) obtained with an Olympus Fluoview FV1000 (Olympus, Japan) confocal laser scanning microscope using an Olympus LUMFL 60×/1.10 W objective (a 488 nm Ar laser was used for excitation), illustrating the distribution of green and red fluorescence; (H) emission spectra obtained using an Ocean Optics USB2000+ miniature spectrometer (Dunedin, FL) equipped with a hand-held fiber optic probe (Ocean Optics ZFQ-12135).

An additional recent approach to estimating more accurate intergeneric and intrageneric relationships utilizes conserved blocks of homologous sequences shared between genomic regions of multiple species [65]. Our data provide a complementary resource for this kind of application in the future. Also, genome annotation relies on transcriptome data for the delimitation of exon-intron boundaries. Our data provide a base for future genomic and ecological research on Hermodice carunculata, as well as a resource for understanding the natural history of polychaetes and the evolution of annelids in general.


Sample collection

Research, collecting and export permits were obtained from the government of the Bahamas while working out of the Perry Institute for Marine Science on Lee Stocking Island during a December 2011 expedition. The sample was collected by scientific divers D. Gruber, J. Sparks and M. Lombardi from Norman’s Pond Cay Cave, Norman’s Pond Cay, Exumas, Bahamas (GPS N 23 47.181, W 076 08.428). The cave’s entrance is a 2 m by 8 m sinkhole located just above high tide level, and the cave extends approximately 50 m linearly and to a depth of 40 m. Divers searched the walls of the cavern zone for cryptic invertebrate specimens using compact LED lights. The Hermodice carunculata specimen was collected 30 m within the cave and transported back to the field station, where it was frozen in liquid nitrogen less than two hours after collection.

RNA extraction and transcriptome sequencing

Total RNA was extracted from dissected tail muscles. The muscle tissue was homogenized in TRIzol reagent (Life Technologies, NY) and the total RNA was precipitated with isopropanol and dissolved in ddH2O. The quality of the RNA was assessed on a 2100 Bioanalyzer and by agarose gel electrophoresis. The total RNA was pooled for library preparation. Libraries were prepared using a HiSeq RNA sample preparation kit (Illumina Inc, San Diego, CA) according to the manufacturer’s instructions. One lane was multiplexed for four samples and was sequenced as 80-bp PE reads. FASTQ file generation was performed by CASAVA version 1.8.2 (Illumina).

De novo assembly

All the assemblies were performed on a server with 50 cores and 250 GB random access memory. Obtained reads were de novo assembled, using ABySS[33] followed by Blat version: 34x12 [34], according to the proposed pipeline for merge and redundancy removal [32] in contigs generated by ABySS. In order to recover high and low expressed transcripts, a range of k-mers (21–55) was used prior to merge with Blat.
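The effect of merging several k-mer assemblies and removing redundancy can be illustrated with a deliberately simplified Python sketch. It drops contigs that are exact substrings, on either strand, of a longer retained contig. This exact-containment rule is a stand-in chosen for illustration and is much cruder than the Blat-based alignment used in the study; the function name and toy contigs are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def remove_contained(contigs):
    """Drop contigs fully contained (either strand) in a longer kept contig.

    Quadratic in the number of contigs, so suitable only for small examples.
    """
    kept = []
    # Longest first; ties broken alphabetically for reproducible output.
    for seq in sorted(set(contigs), key=lambda s: (-len(s), s)):
        rc = seq.translate(COMPLEMENT)[::-1]
        if not any(seq in k or rc in k for k in kept):
            kept.append(seq)
    return kept


# Toy contigs pooled from several hypothetical k-mer assemblies.
pooled = ["ACGTACGTTT", "CGTACG", "AAACGT", "GGGCCC", "CGTACG"]
print(remove_contained(pooled))
```

In this toy set, CGTACG is a substring of the longest contig and AAACGT is the reverse complement of one, so only two sequences remain.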

Phylogenetic analysis

Sequences for the sex pheromone attractin were downloaded from GenBank (accession number generation in progress) and aligned with the Hermodice carunculata translated sequence using MUSCLE in SEAVIEW 4.3.0[66]. A phylogenetic analysis using amino acid sequences was conducted with RAxML ver. 7.7.1 [67] using the maximum likelihood optimality criterion with a JTT amino acid substitution model. Support values were estimated using a rapid bootstrap algorithm with 1,000 replicates. The protozoan symbiont Capsaspora owczarzaki was specified as the outgroup.
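The principle behind bootstrap support values can be sketched in Python as the resampling of alignment columns with replacement. This shows the classical nonparametric bootstrap for a single replicate only; it is not RAxML's rapid bootstrap algorithm, and the alignment, taxon names and function name are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Columns are sampled with replacement, and the same columns are used
    for every taxon so that homology between rows is preserved.
    """
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n = lengths.pop()
    cols = [rng.randrange(n) for _ in range(n)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


# Toy protein alignment with a gap; a seeded generator keeps the run repeatable.
toy_alignment = {"taxon_a": "MKV-", "taxon_b": "MRVL"}
replicate = bootstrap_alignment(toy_alignment, random.Random(0))
print(replicate)
```

In a full analysis a tree would be inferred from each of the 1,000 replicates, and the support for a clade would be the fraction of replicate trees that contain it.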

Functional annotation

Gene ontology (GO) terms and InterPro IDs were assigned to ORF sequences longer than 200 AA, using Blast2GO[47].

Availability of supporting data

Hermodice carunculata paired-end reads and assembled contigs can be downloaded from the NCBI Sequence Read Archive. We have also made available at LabArchives: 1) a FASTA file of homologous contigs shared between Capitella teleta, Helobdella robusta and Hermodice carunculata; 2) a FASTA file of homologous contigs shared between Eurythoe complanata, Paramphinome jeffreysii and Hermodice carunculata; and 3) the functionally annotated Open Reading Frames generated from the Hermodice carunculata transcriptome.

Additional files

File:12864 2015 1565 MOESM1 ESM.docx
Table of assigned GO terms and InterPro IDs for Open Reading Frames generated from the Hermodice carunculata transcriptome.
File:12864 2015 1565 MOESM2 ESM.docx
List of Hermodice carunculata reproduction and immune response genes. The specific gene, number of found contigs and the contig identification tag are included.
File:12864 2015 1565 MOESM3 ESM.docx
Fasta file of annotated EF-α expressed isoforms.
File:12864 2015 1565 MOESM4 ESM.docx
Fasta file of annotated Histone expressed isoforms.
File:12864 2015 1565 MOESM5 ESM.docx
Fasta file of annotated Cytochrome B expressed isoforms.
File:12864 2015 1565 MOESM6 ESM.docx
Fasta file of annotated U2 snRNA expressed isoforms.
File:12864 2015 1565 MOESM7 ESM.docx
Fasta file of annotated luciferases expressed isoforms.
File:12864 2015 1565 MOESM8 ESM.docx
RPKM of the assembled contigs.
File:12864 2015 1565 MOESM9 ESM.docx
Multiple Sequence Alignment of annotated attractin protein from Hermodice carunculata along with 21 other attractin sequences from other species.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

DFG, RD and SFPM designed the study. DFG, JS, SFPM and VAP participated in sample collection and Illumina sequencing. SFPM and DFG carried out the molecular genetic studies. SFPM and AV performed sequence alignments and phylogenetic analysis. DFG, SFPM, RD and AV drafted the manuscript. All authors read and approved the final manuscript.


We thank Ana Reisgo and Jean Gaffney for helpful comments, Zhou Han for laboratory assistance, Mike Lombardi for assistance with sample collection, and the staff of the John H. Perry Caribbean Research Center for hosting our field visit to the Bahamas. This work was supported by City University of New York Collaborative Incentive Research Grant #2064, PSC-CUNY Research Award # 66474–00 44, National Geographic Society/Waitt Grant W101-10 and National Science Foundation Grant to DFG; by the Sackler Institute for Comparative Genomics and the Korein Foundation to RD; and by the American Museum of Natural History to JSS.


  1. Ahrens JB, Borda E, Barroso R, Paiva PC, Campbell AM, Wolf A, Nugues MM, Rouse GW, Schulze A. The curious case of Hermodice carunculata (Annelida: Amphinomidae): evidence for genetic homogeneity throughout the Atlantic Ocean and adjacent basins. Mol Ecol. 2013;22:2280–91. doi:10.1111/mec.12263. PMID: 23517352.
  2. Sebens KP. Intertidal distribution of zoanthids on the Caribbean coast of Panama: effects of predation and desiccation. Bull Mar Sci. 1982;32:316–35.
  3. Karlson RH. Disturbance and monopolization of a spatial resource by Zoanthus sociatus (Coelenterata, Anthozoa). Bull Mar Sci. 1983;33:118–31.
  4. Ott B, Lewis JB. The importance of the gastropod Coralliophila abbreviata (Lamarck) and the polychaete Hermodice carunculata (Pallas) as coral reef predators. Can J Zool. 1972;50:1651–6. doi:10.1139/z72-217.
  5. Marsden JR. The digestive tract of Hermodice carunculata (Pallas). Polychaeta: Amphinomidae. Can J Zool. 1963;41:165–84.
  6. Rylaarsdam KW. Life histories and abundance patterns of colonial corals on Jamaican reefs. Mar Ecol Prog Ser Oldend. 1983;13:249–60. doi:10.3354/meps013249.
  7. Lewis J, Crooks R. Foraging cycles of the amphinomid polychaete Hermodice carunculata preying on the calcareous hydrozoan Millepora complanata. Bull Mar Sci. 1996;58:853–6.
  8. Fauchald K, Jumars PA. The diet of worms: a study of polychaete feeding guilds. 1979.
  9. Sussman M, Loya Y, Fine M, Rosenberg E. The marine fireworm Hermodice carunculata is a winter reservoir and spring-summer vector for the coral-bleaching pathogen Vibrio shiloi. Environ Microbiol. 2003;5:250–5. doi:10.1046/j.1462-2920.2003.00424.x. PMID: 12662172.
  10. Wiklund H, Nygren A, Pleijel F, Sundberg P. The phylogenetic relationships between Amphinomidae, Archinomidae and Euphrosinidae (Amphinomida: Aciculata: Polychaeta), inferred from molecular data. J Mar Biol Assoc UK. 2008;88:509–13. doi:10.1017/S0025315408000982.
  11. Borda E, Kudenov JD, Bienhold C, Rouse GW. Towards a revised Amphinomidae (Annelida, Amphinomida): description and affinities of a new genus and species from the Nile Deep-sea Fan, Mediterranean Sea. Zool Scr. 2012;41:307–25. doi:10.1111/j.1463-6409.2012.00529.x.
  12. Rouse G, Pleijel F. Polychaetes. Oxford: Oxford University Press; 2001.
  13. Rouse GW, Fauchald K. Cladistics and polychaetes. Zool Scr. 1997;26:139–204. doi:10.1111/j.1463-6409.1997.tb00412.x.
  14. Yáñez-Rivera B, Salazar-Vallejo SI. Revision of Hermodice Kinberg, 1857 (Polychaeta: Amphinomidae). Sci Mar. 2011;75:251–62. doi:10.3989/scimar.2011.75n2251.
  15. Weigert A, Helm C, Meyer M, Nickel B, Arendt D, Hausdorf B, Santos SR, Halanych KM, Purschke G, Bleidorn C, Struck TH. Illuminating the Base of the Annelid Tree Using Transcriptomics. Mol Biol Evol. 2014;31:1391–401. doi:10.1093/molbev/msu080. PMID: 24567512.
  16. Struck TH, Paul C, Hill N, Hartmann S, Hosel C, Kube M, Lieb B, Meyer A, Tiedemann R, Purschke G, Bleidorn C. Phylogenomic analyses unravel annelid evolution. Nature. 2011;471:95–U113. doi:10.1038/nature09864. PMID: 21368831.
  17. Colgan DJ, Hutchings PA, Beacham E. Multi-gene analyses of the phylogenetic relationships among the Mollusca, Annelida, and Arthropoda. Zool Sci. 2008;47:338–51.
  18. Giribet G. Assembling the lophotrochozoan (=spiralian) tree of life. Philos Trans R Soc L B Biol Sci. 2008;363:1513–22. doi:10.1098/rstb.2007.2241.
  19. Salzet M, Tasiemski A, Cooper E. Innate immunity in lophotrochozoans: the annelids. Curr Pharm Des. 2006;12:3043–50. doi:10.2174/138161206777947551. PMID: 16918433.
  20. Gagniere N, Jollivet D, Boutet I, Brelivet Y, Busso D, Da Silva C, Gaill F, Higuet D, Hourdez S, Knoops B, Lallier F, Leize-Wagner E, Mary J, Moras D, Perrodou E, Rees J-F, Segurens B, Shillito B, Tanguy A, Thierry J-C, Weissenbach J, Wincker P, Zal F, Poch O, Lecompte O. Insights into metazoan evolution from Alvinella pompejana cDNAs. BMC Genomics. 2010;11:634. doi:10.1186/1471-2164-11-634. PMID: 21080938.
  21. Takahashi T, McDougall C, Troscianko J, Chen W-C, Jayaraman-Nagarajan A, Shimeld S, Ferrier D. An EST screen from the annelid Pomatoceros lamarckii reveals patterns of gene loss and gain in animals. BMC Evol Biol. 2009;9:240. doi:10.1186/1471-2148-9-240. PMID: 19781084.
  22. Metzker ML. Sequencing technologies—the next generation. Nat Rev Genet. 2009;11:31–46. doi:10.1038/nrg2626. PMID: 19997069.
  23. Dunn CW, Hejnol A, Matus DQ, Pang K, Browne WE, Smith SA, Seaver E, Rouse GW, Obst M, Edgecombe GD, Sorensen MV, Haddock SH, Schmidt-Rhaesa A, Okusu A, Kristensen RM, Wheeler WC, Martindale MQ, Giribet G. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature. 2008;452:745–9. doi:10.1038/nature06614. PMID: 18322464.
  24. Feng C, Chen M, Xu CJ, Bai L, Yin XR, Li X, Allan AC, Ferguson IB, Chen KS. Transcriptomic analysis of Chinese bayberry (Myrica rubra) fruit development and ripening using RNA-Seq. BMC Genomics. 2012;13:19. doi:10.1186/1471-2164-13-19. PMID: 22244270.
  25. Crawford JE, Guelbeogo WM, Sanou A, Traore A, Vernick KD, Sagnon N, Lazzaro BP. De novo transcriptome sequencing in Anopheles funestus using Illumina RNA-seq technology. PLoS One. 2010;5:e14202. doi:10.1371/journal.pone.0014202. PMID: 21151993.
  26. Franchini P, Van der Merwe M, Roodt-Wilding R. Transcriptome characterization of the South African abalone Haliotis midae using sequencing-by-synthesis. BMC Res Notes. 2011;4:59. doi:10.1186/1756-0500-4-59. PMID: 21396099.
  27. Salem M, Vallejo RL, Leeds TD, Palti Y, Liu S, Sabbagh A, Rexroad CE III, Yao J. RNA-Seq identifies SNP markers for growth traits in rainbow trout. PLoS One. 2012;7:e36264. doi:10.1371/journal.pone.0036264. PMID: 22574143.
  28. Renaut S, Nolte AW, Rogers SM, Derome N, Bernatchez L. SNP signatures of selection on standing genetic variation and their association with adaptive phenotypes along gradients of ecological speciation in lake whitefish species pairs (Coregonus spp.). Mol Ecol. 2011;20:545–59. doi:10.1111/j.1365-294X.2010.04952.x. PMID: 21143332.
  29. Yang SS, Tu ZJ, Cheung F, Xu WW, Lamb JFS, Jung H-JG, Vance CP, Gronwald JW. Using RNA-Seq for gene identification, polymorphism detection and transcript profiling in two alfalfa genotypes with divergent cell wall composition in stems. BMC Genomics. 2011;12:199. doi:10.1186/1471-2164-12-199. PMID: 21504589.
  30. Canovas A, Rincon G, Islas-Trejo A, Wickramasinghe S, Medrano JF. SNP discovery in the bovine milk transcriptome using RNA-Seq technology. Mamm Genome. 2010;21:592–8. doi:10.1007/s00335-010-9297-z. PMID: 21057797.
  31. Andrews S. A quality control tool for high throughput sequence data. 2010.
  32. Swaminathan K, Chae WB, Mitros T, Varala K, Xie L, Barling A, Glowacka K, Hall M, Jezowski S, Ming R. A framework genetic map for Miscanthus sinensis from RNAseq-based markers shows recent tetraploidy. BMC Genomics. 2012;13:142. doi:10.1186/1471-2164-13-142. PMID: 22524439.
  33. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol I. ABySS: a parallel assembler for short read sequence data. Genome Res. 2009;19:1117–23. doi:10.1101/gr.089532.108. PMID: 19251739.
  34. Kent WJ. BLAT–the BLAST-like alignment tool. Genome Res. 2002;12:656–64. doi:10.1101/gr.229202. PMID: 11932250.
  35. Surget-Groba Y, Montoya-Burgos JI. Optimization of de novo transcriptome assembly from next-generation sequencing data. Genome Res. 2010;20:1432–40. doi:10.1101/gr.103846.109. PMID: 20693479.
  36. Rice P, Longden I, Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000;16:276–7. doi:10.1016/S0168-9525(00)02024-2. PMID: 10827456.
  37. Altschul S. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402. doi:10.1093/nar/25.17.3389. PMID: 9254694.
  38. Nagaraj SH, Gasser RB, Ranganathan S. A hitchhiker’s guide to expressed sequence tag (EST) analysis. Brief Bioinform. 2007;8:6–21. doi:10.1093/bib/bbl015. PMID: 16772268.
  39. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT. Gene Ontology: tool for the unification of biology. Nat Genet. 2000;25:25–9. doi:10.1038/75556. PMID: 10802651.
  40. Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, Richter J, Rubin GM, Blake JA, Bult C, Dolan M, Drabkin H, Eppig JT, Hill DP, Ni L, Ringwald M, Balakrishnan R, Cherry JM, Christie KR, Costanzo MC, Dwight SS, Engel S, Fisk DG, Hirschman JE, Hong EL, Nash RS, et al. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004;32(Database issue):D258–61. PMID: 14681407.
  41. Zdobnov EM, Apweiler R. InterProScan–an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001;17:847–8. doi:10.1093/bioinformatics/17.9.847. PMID: 11590104.
  42. Mulder NJ, Apweiler R. The InterPro database and tools for protein domain analysis. Curr Protoc Bioinforma. 2008;Chapter 2:Unit 2.7.
  43. Hofmann K, Bucher P, Falquet L, Bairoch A. The PROSITE database, its status in 1999. Nucleic Acids Res. 1999;27:215–9. doi:10.1093/nar/27.1.215. PMID: 9847184.
  44. Attwood TK, Croning MDR, Flower DR, Lewis AP, Mabey JE, Scordis P, Selley JN, Wright W. PRINTS-S: the database formerly known as PRINTS. Nucleic Acids Res. 2000;28:225–7. doi:10.1093/nar/28.1.225. PMID: 10592232.
  45. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer ELL. The Pfam protein families database. Nucleic Acids Res. 2004;32(suppl 1):D138–D141. PMID: 14681378.
  46. Letunic I, Copley RR, Schmidt S, Ciccarelli FD, Doerks T, Schultz J, Ponting CP, Bork P. SMART 4.0: towards genomic data integration. Nucleic Acids Res. 2004;32(suppl 1):D142–D144. PMID: 14681379.
  47. Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21:3674–6. doi:10.1093/bioinformatics/bti610. PMID: 16081474.
  48. Conesa A, Götz S. Blast2GO: A comprehensive suite for functional analysis in plant genomics. Int J Plant Genomics. 2008;619832. doi:10.1155/2008/619832.
  49. Sadamoto H, Takahashi H, Okada T, Kenmoku H, Toyota M, Asakawa Y. De novo sequencing and transcriptome analysis of the central nervous system of mollusc Lymnaea stagnalis by deep RNA sequencing. PLoS One. 2012;7:e42546. doi:10.1371/journal.pone.0042546. PMID: 22870333.
  50. Riesgo A, Andrade SC, Sharma PP, Novo M, Perez-Porro AR, Vahtera V, Gonzalez VL, Kawauchi GY, Giribet G. Comparative description of ten transcriptomes of newly sequenced invertebrates and efficiency estimation of genomic sampling in non-model taxa. Front Zool. 2012;9:33. doi:10.1186/1742-9994-9-33. PMID: 23190771.
  51. Mehr SF, DeSalle R, Kao H-T, Narechania A, Han Z, Tchernov D, Pieribone V, Gruber DF. Transcriptome deep-sequencing and clustering of expressed isoforms from Favia corals. BMC Genomics. 2013;14:546. doi:10.1186/1471-2164-14-546. PMID: 23937070.
  52. Borggrefe T, Oswald F. The Notch signaling pathway: transcriptional regulation at Notch target genes. Cell Mol Life Sci. 2009;66:1631–46. doi:10.1007/s00018-009-8668-7. PMID: 19165418.
  53. Thamm K, Seaver EC. Notch signaling during larval and juvenile development in the polychaete annelid Capitella sp. I. Dev Biol. 2008;320:304–18. doi:10.1016/j.ydbio.2008.04.015.
  54. Eisenberg E, Levanon EY. Human housekeeping genes, revisited. Trends Genet. 2013;29:569–74. doi:10.1016/j.tig.2013.05.010. PMID: 23810203.
  55. Cooper EL. Comparative immunology. Integr Comp Biol. 2003;43:278–80. doi:10.1093/icb/43.2.278. PMID: 21680434.
  56. Nyholm SV, Graf J. Knowing your friends: invertebrate innate immunity fosters beneficial bacterial symbioses. Nat Rev Microbiol. 2012;10:815–27. doi:10.1038/nrmicro2894. PMID: 23147708.
  57. Kudenov JD. The reproductive biology of Eurythoe complanata (Pallas, 1766), (Polychaeta: Amphinomidae). Tucson: University of Arizona; 1974.
  58. Novo M, Riesgo A, Fernández-Guerra A, Giribet G. Pheromone evolution, reproductive genes, and comparative transcriptomics in Mediterranean earthworms (Annelida, Oligochaeta, Hormogastridae). Mol Biol Evol. 2013;30:1614–29. doi:10.1093/molbev/mst074. PMID: 23596327.
  59. Watson GJ, Langford FM, Gaudron SM, Bentley MG. Factors influencing spawning and pairing in the scale worm Harmothoe imbricata (Annelida: Polychaeta). Biol Bull. 2000;199:50–8. doi:10.2307/1542706. PMID: 10975642.
  60. Zeeck E, Hardege J, Bartels-Hardege H. Platynereis dumerilii. Mar Ecol Prog Ser. 1990;67:183–8. doi:10.3354/meps067183.
  61. Costello MJ, Bouchet P, Boxshall G, Fauchald K, Gordon D, Hoeksema BW, Poore GC, van Soest RW, Stohr S, Walter TC, Vanhoorne B, Decock W, Appeltans W. Global Coordination and Standardisation in Marine Biodiversity through the World Register of Marine Species (WoRMS) and Related Databases. PLoS One. 2013;8:e51629. doi:10.1371/journal.pone.0051629. PMID: 23505408.
  62. Barroso R, Paiva PC. Amphinomidae (Annelida: Polychaeta) from Rocas Atoll, Northeastern Brazil. Arq Mus Nac. 2007;65:357–62.
  63. Borda E, Kudenov JD, Chevaldonne P, Blake JA, Desbruyeres D, Fabri MC, Hourdez S, Pleijel F, Shank TM, Wilson NG, Schulze A, Rouse GW. Cryptic species of Archinome (Annelida: Amphinomida) from vents and seeps. Proceedings of the Royal Society of London B. 2013;280:20131876. doi:10.1098/rspb.2013.1876.
  64. Shimomura O. Bioluminescence: Chemical Principles and Methods. World Scientific Publishing Company; 2012.
  65. Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000;17:540–52. doi:10.1093/oxfordjournals.molbev.a026334. PMID: 10742046.
  66. Gouy M, Guindon S, Gascuel O. SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol. 2010;27:221–4. doi:10.1093/molbev/msp259. PMID: 19854763.
  67. Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22:2688–90. doi:10.1093/bioinformatics/btl446. PMID: 16928733.