Introduction
Advances in next-generation sequencing (NGS) technologies are creating new opportunities for omics data analysis, unlocking valuable insights that benefit precision medicine, clinical diagnostics, and drug discovery. To keep up with this high-throughput technology and accommodate fluctuating data volumes, Healthcare and Life Sciences (HCLS) customers are looking for a secure, reliable, and scalable environment for data storage and analysis. However, they often have limited in-house expertise and resources to set up this infrastructure and run analyses, creating obstacles to large-scale omics data analysis and collaborative research.
To address this challenge, AWS launched AWS HealthOmics, a purpose-built service that helps customers store, query, and analyze genomic and other omics data. By eliminating the undifferentiated heavy lifting of infrastructure provisioning and management, HCLS organizations can focus on scientific discovery and improving patient outcomes. Basepair, an AWS certified software partner, provides a next-generation bioinformatics platform that accelerates the migration, deployment, orchestration, and scaling of bioinformatics workflows on AWS. An intuitive point-and-click interface allows scientists with little computing experience to connect to and process data, and explore it with interactive reports customized for each data type.
In this blog post, we discuss how you can leverage the Basepair bioinformatics platform, powered by AWS HealthOmics, to access an easy-to-use, scalable, and flexible infrastructure for omics data analysis at a predictable cost. We provide an overview of the platform and discuss the key benefits of this integration. We show how this Software-as-a-Service (SaaS) solution allows you to use storage and compute resources in your own AWS account. Finally, we explain how integrated, ready-to-use interactive visualization tools can accelerate your time to scientific insights.
Basepair Overview
Basepair is designed to make bioinformatics on AWS easier, faster, and more cost-effective. Its SaaS platform democratizes not only access to omics data, but also its analysis and interpretation. The platform can be provisioned into a customer’s AWS account to use their own storage and compute resources, eliminating the overhead and risk of data movement. A point-and-click graphical user interface (GUI) enables end users with little to no programming experience to leverage existing industry-standard tools or build custom workflows while supporting reproducibility. Built-in visualization tools generate interactive reports that reveal valuable insights from data, improving collaboration across R&D teams and allowing bioinformaticians to focus on more advanced downstream analysis. Finally, it provides an application programming interface (API) and a powerful command line interface (CLI) for automation and integration with third-party applications, and it can be branded so that it looks and feels like an organization’s own web portal.
Ease of Use
Powered by AWS, Basepair’s platform improves the user experience by enabling scientists of all backgrounds to perform complex analyses and easily interpret the resulting data. The point-and-click interface allows end users, even those new to bioinformatics, to access and use the tools they need with minimal training. In addition, the results of the analyses are not a set of downloadable flat files or static web pages. Instead, users can quickly assess the quality of their data and interactively explore it through a set of dynamic, interactive reports optimized for each data type (Figure 1). Then, if they have questions about how to use the data or how to best analyze and interpret it, they can share their samples, analyses, and results with their organization’s bioinformatics team or Basepair’s technical support team to foster collaboration and accelerate support for their R&D projects.
Figure 1: Powered by AWS HealthOmics, the Basepair platform provides an easy-to-use graphical user interface (GUI) that enables customers to leverage the storage and workflow capabilities of HealthOmics. Built-in visualization tools generate interactive reports that accelerate time to scientific insight.
Connected Cloud
Traditional bioinformatics platforms typically use one of two deployment methods: they either move genomic data into a centralized, hosted platform environment, or they require installation within the customer’s AWS account, which can increase ongoing operations and maintenance effort. Basepair, on the other hand, can be configured to assume an AWS Identity and Access Management (IAM) role to interface with the customer’s existing environment. Through a series of API calls, it can perform read/write operations on the customer’s Amazon Simple Storage Service (Amazon S3) bucket or AWS HealthOmics data store. This architecture not only eliminates data movement, thus addressing most compliance, security, and data privacy concerns, but also allows customers to manage cloud costs while still connecting to other tools and resources within their AWS account, as shown in Figure 2.
Figure 2: High-level diagram of Basepair’s connected cloud architecture. The orchestration plane on the left is in Basepair’s AWS account, while customer AWS accounts hosting compute and storage resources are on the right and accessed via restricted IAM roles.
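To make the role-based access pattern above more concrete, here is a minimal sketch of the two policy documents a customer might attach to such a role. This is an illustration under stated assumptions, not Basepair’s actual configuration: the account ID, external ID, and bucket name are placeholders, the use of an external ID is a general AWS practice for third-party access rather than something described in this post, and the function names are hypothetical.

```python
import json


def build_trust_policy(platform_account_id: str, external_id: str) -> dict:
    """Trust policy allowing principals in the platform account to assume the role.

    Both arguments are placeholders; the external ID condition is an assumption.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{platform_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def build_permissions_policy(bucket_name: str) -> dict:
    """Permissions policy restricted to read/write on a single S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                "Sid": "ReadWriteObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


# Placeholder values for illustration only.
print(json.dumps(build_trust_policy("111122223333", "example-external-id"), indent=2))
print(json.dumps(build_permissions_policy("example-omics-bucket"), indent=2))
```

Scoping the permissions policy to one bucket keeps the role restricted, in line with the restricted IAM roles shown in Figure 2. Access to HealthOmics stores would need additional statements that are not shown here.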
Integrating the Basepair Platform with AWS HealthOmics
The Basepair platform consists of two fundamental components: Storage and Workflow Engine. The role of the Storage module is to efficiently store, retrieve, and organize customer omics data. Historically, Basepair has leveraged Amazon S3 as a storage layer, but is now extending this functionality to AWS HealthOmics. This extension includes establishing connectivity with the AWS HealthOmics sequence and reference stores using the HealthOmics API. The Workflow Engine component is dedicated to designing, monitoring, and executing customer workflows. This integration with HealthOmics allows customers to leverage Ready2Run and private workflow capabilities within Basepair.
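For readers curious what launching a Ready2Run or private workflow looks like at the HealthOmics API level, the sketch below assembles the arguments for a StartRun request. It is a simplified illustration, not Basepair’s implementation: the helper name, workflow ID, role ARN, output location, and parameter names are all hypothetical placeholders, and real parameter names depend on the chosen workflow.

```python
from typing import Any, Dict, Optional


def build_start_run_request(
    workflow_id: str,
    workflow_type: str,
    role_arn: str,
    output_uri: str,
    parameters: Dict[str, Any],
    name: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for the HealthOmics StartRun operation."""
    # HealthOmics distinguishes Ready2Run workflows from private ones.
    if workflow_type not in ("READY2RUN", "PRIVATE"):
        raise ValueError(f"unsupported workflow type: {workflow_type}")
    # Run outputs are written to an S3 location.
    if not output_uri.startswith("s3://"):
        raise ValueError("output_uri must be an s3:// URI")

    request: Dict[str, Any] = {
        "workflowId": workflow_id,
        "workflowType": workflow_type,
        "roleArn": role_arn,
        "outputUri": output_uri,
        "parameters": parameters,
    }
    if name:
        request["name"] = name
    return request


# All values below are placeholders for illustration.
request = build_start_run_request(
    workflow_id="1234567",
    workflow_type="READY2RUN",
    role_arn="arn:aws:iam::111122223333:role/ExampleOmicsRunRole",
    output_uri="s3://example-omics-bucket/results/",
    parameters={"example_input": "s3://example-omics-bucket/sample.fastq.gz"},
    name="example-run",
)
print(request)
```

With boto3 installed and credentials configured, a dictionary like this could be passed on as `boto3.client("omics").start_run(**request)`. Building the request separately keeps the validation logic testable without network access.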
Features supported by the integration:
The Basepair Console and Basepair CLI facilitate streamlined sample upload (Figure 3).
Direct, interactive visualization of HealthOmics read sets.
Archived HealthOmics read sets are automatically activated when utilized in a workflow run.
A user-friendly drag-and-drop interface enables custom workflow creation and supports workflows written in Nextflow, Workflow Description Language (WDL), or Common Workflow Language (CWL).
Interactive timeline charts are available during workflow execution, providing insight into resource utilization and run times.
Comprehensive management and sharing capabilities for samples, workflows, and analyses across multiple users.
Connected Cloud capabilities allow customers to integrate their own cloud infrastructure for sample storage and workflow execution.
Figure 3: Steps for uploading input data samples to the Basepair platform.
Figure 4: Steps to start your analysis by selecting one of the sample and Ready2Run workflows provided by AWS HealthOmics.
Figure 5: Steps to access Basepair’s interactive visualization dashboard for data analysis.
Figure 6: Steps to view workflow execution summary and monitor performance, including resource utilization and execution time.
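One of the features listed above, automatic activation of archived read sets, can be sketched roughly as follows. This is an assumption about how such a step might work, not a description of Basepair’s internals: the helper name and the sample records are hypothetical, and the records only mimic the `id` and `status` fields returned when listing read sets in a HealthOmics sequence store.

```python
from typing import Dict, Iterable, List


def select_archived_read_sets(
    read_sets: Iterable[Dict[str, str]], wanted_ids: Iterable[str]
) -> List[Dict[str, str]]:
    """Return activation sources for requested read sets that are archived.

    Read sets that are already ACTIVE need no activation and are skipped.
    """
    wanted = set(wanted_ids)
    return [
        {"readSetId": rs["id"]}
        for rs in read_sets
        if rs["id"] in wanted and rs["status"] == "ARCHIVED"
    ]


# Hypothetical read set records for illustration.
example_read_sets = [
    {"id": "1111111111", "status": "ACTIVE"},
    {"id": "2222222222", "status": "ARCHIVED"},
    {"id": "3333333333", "status": "ARCHIVED"},
]

sources = select_archived_read_sets(example_read_sets, ["1111111111", "2222222222"])
print(sources)
```

With boto3, the resulting list could be passed to `start_read_set_activation_job(sequenceStoreId=..., sources=sources)` on an `omics` client before the workflow run is started.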
Benefits of an Integrated Platform
One key benefit of the integration is a user-friendly, point-and-click GUI that allows scientists of all backgrounds to access AWS HealthOmics and its capabilities, such as storage and workflows. This enables organizations to build on their existing investments in AWS HealthOmics and take advantage of its inherent benefits, including up to 50% cost savings (compared to traditional object storage), pricing predictability, and improved scalability across the organization.
Basepair’s out-of-the-box visualization tools extend AWS HealthOmics to generate interactive reports that help researchers of all backgrounds explore data before collaborating with bioinformatics experts on informed questions. This ultimately improves collaboration across R&D teams and reduces the time to scientific and diagnostic insights by up to 50%, as reported by Nkarta Therapeutics.
The integrated platform offers customers an extensive list of Ready2Run workflows customized to meet diverse requirements for omics data analysis. In addition, customers have the flexibility to incorporate their own pipelines as private workflows defined in workflow languages such as Nextflow, WDL, and CWL. Through a seamless GUI, users can easily upload their code and define parameters for workflow execution.
An execution overview is provided, allowing bioinformatics professionals to monitor the performance of each workflow and identify failed steps through task-level logs (see Figure 6). Interactive timeline charts are also available during workflow execution, providing valuable insights into resource utilization and execution time. With these powerful features, the platform provides a comprehensive suite of tools required to design, develop, execute, and monitor bioinformatics workflows.
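As a rough illustration of the kind of data behind such an execution overview, the sketch below summarizes per-task duration and resources and picks out failed steps. It assumes task records shaped like those returned when listing the tasks of a HealthOmics run (`name`, `status`, `cpus`, `memory`, `startTime`, `stopTime`); the function names and sample tasks are hypothetical, and this is not how Basepair builds its charts.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def summarize_tasks(tasks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Compute a duration for each task and sort the longest-running first."""
    rows = []
    for task in tasks:
        start, stop = task.get("startTime"), task.get("stopTime")
        # Tasks that have not started or finished yet have no duration.
        duration = (stop - start).total_seconds() if start and stop else None
        rows.append(
            {
                "name": task["name"],
                "status": task["status"],
                "cpus": task.get("cpus"),
                "memory": task.get("memory"),
                "duration_s": duration,
            }
        )
    return sorted(rows, key=lambda r: r["duration_s"] or 0.0, reverse=True)


def failed_task_names(rows: List[Dict[str, Any]]) -> List[str]:
    """Names of tasks whose status is FAILED, to guide log inspection."""
    return [r["name"] for r in rows if r["status"] == "FAILED"]


# Hypothetical task records for illustration.
example_tasks = [
    {
        "name": "call_variants",
        "status": "FAILED",
        "cpus": 8,
        "memory": 32,
        "startTime": datetime(2024, 1, 1, 12, 10, tzinfo=timezone.utc),
        "stopTime": datetime(2024, 1, 1, 12, 15, tzinfo=timezone.utc),
    },
    {
        "name": "align_reads",
        "status": "COMPLETED",
        "cpus": 16,
        "memory": 64,
        "startTime": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
        "stopTime": datetime(2024, 1, 1, 12, 10, tzinfo=timezone.utc),
    },
]

summary = summarize_tasks(example_tasks)
for row in summary:
    print(row)
print("failed:", failed_task_names(summary))
```

Sorting by duration surfaces the steps that dominate run time, while the failed-task list points to the task-level logs worth checking first.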
Basepair’s connected cloud capabilities allow customers to provision compute and storage into their own AWS accounts, giving customers full control over not only their data, but also the resources required to store and analyze it. This federated approach improves data security and privacy by completely eliminating data movement that may be required with traditional commercial platforms. It also enhances connectivity to other tools and resources, compliance with local data residency laws, and economies of scale in terms of cloud consumption.
Finally, there is no need to commit the DevOps resources required to build, scale, support, and maintain the infrastructure. By eliminating undifferentiated heavy lifting, the Basepair platform, powered by AWS HealthOmics, helps customers significantly shorten their time to market or to production.
Conclusion
By making AWS HealthOmics storage and workflow capabilities available through Basepair’s Software-as-a-Service (SaaS) solution, HCLS organizations can now leverage low-cost, omics-optimized storage and take a push-button approach to deploying and running their NGS analysis pipelines. This reduces development delays, security complexities, and internal resource requirements, allowing them to focus on new scientific discoveries and delivering important therapies to patients. Additionally, this approach is more efficient and cost-effective than having these organizations build their own infrastructure to process omics data at scale. It also allows Basepair to focus on the differentiating aspects of its platform: enabling HCLS organizations to quickly, easily, and securely analyze large, complex omics data to accelerate scientific discovery and time to market.
“As healthcare and life science information moves to the cloud, there is an increasing need to create an environment where researchers can run workflows and interactively visualize data,” said Tehsin Syed, general manager of Health AI Services at AWS. “Basepair makes it easier for researchers to run their research by providing a simplified, GUI-based experience. Plus, because this happens within the customer’s own AWS account, customers retain control over their commitments around data governance, security, and usage.”
To learn more about the launch of the Basepair platform, powered by AWS HealthOmics, at the Bio-IT World Conference, watch this video. If you’re interested in evaluating the Basepair platform, powered by AWS HealthOmics, you can sign up for a free trial on AWS Marketplace. You can find more information about Basepair’s unique pay-per-sample licensing model here. More information and resources about the platform, and about Basepair, are also available.