BDE-NA: AI-Powered Optimization of mRNA Sequence Design for Vaccines

BDE-NA Nucleic Acid Drug Design

As one of the main product clusters of the BDE platform, BDE-NA uses AI models trained end-to-end on large-scale data to design and optimize nucleic acid drugs. It can be quickly adapted to different types of nucleic acid sequence design and is particularly suited to mRNA drug design, where it can significantly improve key sequence metrics, including ribosome loading, translation efficiency, and stability.

Product Introduction

The BDE-NA nucleic acid drug design system addresses the design and optimization problems that arise during nucleic acid drug development, and is particularly effective for mRNA drug design. Combined with BDE-Bio target analysis, it can generate new nucleic acid sequences that meet specified conditions from scratch, or optimize specific properties of a given nucleic acid sequence. Its capabilities include:

5' UTR sequence design and optimization: Predicts ribosome load, optimizes existing sequences, and generates sequences from scratch, increasing ribosome load by an average of 100%.

ORF coding sequence design and optimization: Optimizes the codons of the coding sequence (CDS), improving translation efficiency, measured by the codon adaptation index (CAI), by an average of 15 percentage points while maintaining high structural stability (lower minimum free energy, MFE).

3' UTR sequence design and optimization: Predicts half-life, optimizes existing sequences, and generates sequences from scratch, improving sequence stability by an average of 200%.
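To make the CAI figure concrete, the following is a minimal sketch of how CAI is commonly computed: each codon gets a weight equal to its usage frequency divided by that of the most frequent synonymous codon, and CAI is the geometric mean of those weights over the sequence. The codon frequencies in `TOY_USAGE` are illustrative placeholders, not measured values, and `greedy_recode` is a hypothetical naive baseline that ignores MFE entirely. It is not BDE-NA's method.

```python
import math

# Illustrative codon frequencies (NOT measured values); a real analysis would
# load a full codon-usage table for the target organism.
TOY_USAGE = {
    "L": {"CUG": 40.0, "CUC": 20.0, "UUA": 8.0},
    "K": {"AAG": 32.0, "AAA": 24.0},
    "E": {"GAG": 40.0, "GAA": 30.0},
}


def relative_adaptiveness(usage):
    """Weight of each codon = its frequency / frequency of the best synonym."""
    weights = {}
    for codons in usage.values():
        best = max(codons.values())
        for codon, freq in codons.items():
            weights[codon] = freq / best
    return weights


def split_codons(cds):
    """Split a coding sequence into complete triplets (trailing bases dropped)."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def cai(cds, weights):
    """Geometric mean of the weights of all codons present in the table."""
    scored = [weights[c] for c in split_codons(cds) if c in weights]
    if not scored:
        raise ValueError("no codons from the usage table found in sequence")
    return math.exp(sum(math.log(w) for w in scored) / len(scored))


def greedy_recode(cds, usage):
    """Naive baseline: swap each codon for its most frequent synonym.

    This maximizes CAI only; it does not consider MFE or any other constraint.
    """
    best_synonym = {}
    for codons in usage.values():
        top = max(codons, key=codons.get)
        for codon in codons:
            best_synonym[codon] = top
    return "".join(best_synonym.get(c, c) for c in split_codons(cds))


WEIGHTS = relative_adaptiveness(TOY_USAGE)
original = "UUAAAAGAA"  # Leu-Lys-Glu written with rare codons
recoded = greedy_recode(original, TOY_USAGE)
# CAI lies between 0 and 1, so a gain "in percentage points" is the difference x 100.
gain_points = (cai(recoded, WEIGHTS) - cai(original, WEIGHTS)) * 100
print(recoded, round(cai(original, WEIGHTS), 3), round(cai(recoded, WEIGHTS), 3))
```

Because CAI is bounded between 0 and 1, an improvement of 15 percentage points corresponds to, for example, a move from 0.70 to 0.85. A purely greedy recoding like the one above can change the folding energy of the transcript, which is why CAI and MFE need to be balanced together.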

Generation and Optimization of Nucleic Acid Sequences

BDE-NA designs and optimizes 5' UTR, coding (CDS), and 3' UTR sequences, addressing the poor expression efficiency and low stability of mRNA vaccines and drugs.

Generation of UTR Sequences

The multi-task LSTM model predicts MRL (mean ribosome load) and generates high-MRL UTR sequences at the same time. It supports both fixed-length and variable-length sequences.
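For readers unfamiliar with the metric, the sketch below shows one common way MRL is derived from polysome profiling data: a read-weighted average of the number of ribosomes associated with each fraction. The function names and the read counts are hypothetical, and the source does not state how the MRL labels used to train BDE-NA were computed.

```python
def mean_ribosome_load(fraction_counts):
    """Read-weighted average ribosome number for one UTR.

    fraction_counts[k] is the number of reads observed for this UTR in the
    polysome fraction associated with k ribosomes (k = 0, 1, 2, ...).
    """
    total = sum(fraction_counts)
    if total <= 0:
        raise ValueError("need at least one read to compute MRL")
    return sum(k * count for k, count in enumerate(fraction_counts)) / total


def percent_increase(baseline, improved):
    """Relative change of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical read counts per fraction (0, 1, 2, ... ribosomes).
baseline_mrl = mean_ribosome_load([10, 20, 40, 30])
optimized_mrl = mean_ribosome_load([0, 0, 10, 20, 50, 20])
print(baseline_mrl, optimized_mrl, percent_increase(baseline_mrl, optimized_mrl))
```

Under this definition, an average increase of 100% means the optimized UTR carries roughly twice as many ribosomes per transcript as the starting sequence.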

Prediction performance for fixed-length sequences (50 nt)

Prediction performance for variable-length sequences (25-100 nt)

5' UTR Sequence Optimization: to increase ribosome loading.

Codon Sequence (CDS) Optimization: to improve translation efficiency while keeping free energy low.

3' UTR Sequence Optimization: to improve stability.

The optimized sequence shows higher stability.

Effects of Sequence Generation

The MRL distribution predicted by an ensemble of several different prediction models is more concentrated and shifted toward larger values. For both fixed-length and variable-length generated sequences, the predicted MRL distribution is better than that of random sequences. The GC content distribution of the generated sequences is closer to the range considered reasonable for human sequences.

The generated sequences have high diversity (complexity), and their GC content is highly consistent with that of random sequences. The average MRL predicted by multiple models is more than 50% higher than that of random sequences.

A. The MRL of generated sequences, as predicted by multiple models, is on average more than 50% higher than that of random sequences;

B. The complexity and GC content of the generated sequences are highly consistent with those of random sequences; the sequences generated by the model, which is based on experimental testing of random sequences, are very realistic in their compositional properties;

C. The predicted structural energy of the generated sequences is on average lower than that of random sequences, indicating a more stable sequence structure;

D. For sequences generated with different CDSs, the predicted MRL is still more than 50% higher than that of random sequences.
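The composition checks in item B can be illustrated with a short sketch that compares GC content and a simple complexity proxy between two sets of sequences. The source does not say which complexity measure BDE-NA uses; the Shannon entropy of the 3-mer distribution used here is an assumption chosen for illustration, and all function names and the example "generated" sequences are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import mean


def gc_content(seq):
    """Fraction of G and C bases in an RNA or DNA sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_entropy(seq, k=3):
    """Shannon entropy (bits) of the k-mer distribution: one possible complexity proxy."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence shorter than k")
    n = len(kmers)
    counts = Counter(kmers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def random_rna(length, rng):
    """Uniformly random RNA sequence, used here as the comparison baseline."""
    return "".join(rng.choice("ACGU") for _ in range(length))


def summarize(seqs):
    """Mean GC content and mean 3-mer entropy for a collection of sequences."""
    return {
        "mean_gc": mean(gc_content(s) for s in seqs),
        "mean_entropy": mean(kmer_entropy(s) for s in seqs),
    }


rng = random.Random(0)  # fixed seed so the comparison is repeatable
random_set = [random_rna(50, rng) for _ in range(100)]
# Placeholder for model output: in practice these would be generated UTRs.
generated_set = [random_rna(50, rng) for _ in range(100)]

print(summarize(random_set))
print(summarize(generated_set))
```

Similar summary values for the two sets would support the claim that generated sequences resemble random library sequences in composition. Checking item C would additionally require a secondary-structure predictor to obtain folding energies, which is outside the scope of this sketch.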

Improve translation efficiency (high CAI) while ensuring a more stable structure (low MFE)