Many areas of cephalopod biology, from neuronal function at the cellular and systems levels to cephalopod population dynamics to the evolution of gene regulatory elements mediating body plan variation, would benefit greatly from the molecular insight that high-quality cephalopod genomics would provide. Indeed, it is astonishing that in 2012, with the explosion of genome resources for so many life forms, not a single assembled cephalopod genome is yet available. The goal of the CephSeq Consortium is to provide organizational mechanisms for cephalopod biology to move from the pre-genomic to the post-genomic age.
In the last decade, sequencing technology has advanced to the point where large amounts of data can be generated rapidly, and many different massively parallel high-throughput sequencing platforms have emerged. The data generated by these platforms then have to be assembled into longer sequences and annotated using bioinformatics methods.
Annotation of novel genomes is a complex problem. Efforts at automated annotation of molluscan genomic sequences have demonstrated the challenge facing the future annotation of cephalopod genomes. Long branch lengths within the phylum, the taxonomic distances to well-annotated animal genomes, and the relatively low quantity of previous molecular and genetic work in molluscs will demand the generation of additional resources to assist and train automated gene detection programs. Of primary importance will be the generation of transcript inventories to identify genes, refine gene models, detect start points and intron-exon boundaries, and train automated gene identification algorithms. Transcriptome data such as those from RNA-seq are quick and relatively inexpensive to generate, and will be immensely useful. Systematic sequencing of nervous system tissues and embryonic stages can be combined with even relatively early-stage assemblies to generate gene models and exon structures. In addition, comparative sequence analysis of pairs of species (e.g. O. vulgaris and O. bimaculoides, I. notoides and I. paradoxus) may be critical for annotation.
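To illustrate how transcript evidence can mark intron-exon boundaries, the following minimal sketch assumes that a transcript has already been aligned to a genomic scaffold and that the alignment is available as a list of aligned blocks. The gaps between consecutive blocks are taken as candidate introns, and each is checked for the common GT...AG splice-site dinucleotides. The function names, the coordinate convention (0-based, half-open, forward strand only) and the toy sequence are assumptions made for this example and do not describe any particular annotation pipeline.

```python
from typing import List, Tuple

# A genomic interval, 0-based and half-open: [start, end). Assumed convention.
Block = Tuple[int, int]


def introns_from_blocks(blocks: List[Block]) -> List[Block]:
    """Return the gaps between consecutive aligned blocks as candidate introns."""
    ordered = sorted(blocks)
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start > prev_end  # skip touching or overlapping blocks
    ]


def has_canonical_splice_sites(genome: str, intron: Block) -> bool:
    """Check whether a forward-strand intron starts with GT and ends with AG."""
    start, end = intron
    seq = genome[start:end].upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Toy scaffold (hypothetical): exon, intron, exon.
TOY_GENOME = "ATGGCC" + "GTAAGTTTCAG" + "GGATAA"
TOY_BLOCKS = [(0, 6), (17, 23)]  # aligned transcript blocks on the scaffold

if __name__ == "__main__":
    for intron in introns_from_blocks(TOY_BLOCKS):
        ok = has_canonical_splice_sites(TOY_GENOME, intron)
        print(intron, "canonical" if ok else "non-canonical")
```

In practice, boundaries supported by many independent transcripts would be weighted more heavily than those seen once, and reverse-strand genes and non-canonical splice sites would need separate handling; the sketch omits these for brevity.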