
«`html
µFormer: A Deep Learning Framework for Efficient Protein Fitness Prediction and Optimization
Protein engineering is essential for designing proteins with specific functions, but the fitness landscape of protein mutations is vast and complex, which makes optimal sequences hard to find. Zero-shot approaches, which predict mutational effects without relying on homologs or multiple sequence alignments (MSAs), remove some dependencies but fall short when predicting diverse protein properties. Learning-based models trained on deep mutational scanning (DMS) or multiplexed assays of variant effects (MAVE) data have been used to predict fitness landscapes, either alone or in combination with MSAs or language models. These data-driven models, however, often struggle when experimental data is sparse.
µFormer Approach
Researchers at Microsoft Research AI for Science introduced µFormer, a deep learning framework that integrates a pre-trained protein language model with specialized scoring modules to predict the effects of protein mutations. µFormer predicts high-order mutants, models epistatic interactions, and handles insertions. Combined with reinforcement learning, it efficiently explores vast mutant spaces to design enhanced protein variants. The model predicted mutants with a 2000-fold increase in bacterial growth rate, driven by improved enzymatic activity. µFormer also performs well in challenging scenarios such as multi-point mutations, and its predictions were validated through wet-lab experiments, highlighting its potential for optimizing protein design.
The µFormer model operates in two stages. First, a masked protein language model (PLM) is pre-trained on a large dataset of unlabeled protein sequences. Second, fitness scores are predicted by three scoring modules integrated into the pre-trained model. These modules (residual-level, motif-level, and sequence-level) capture different aspects of the protein sequence, and their outputs are combined to produce the final fitness score. The model is trained on known fitness data by minimizing the error between predicted and measured scores.
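As a rough illustration of the second stage, the sketch below combines three module outputs into one fitness score and computes a mean squared error against measured values. It is a minimal sketch under stated assumptions: the weighted sum, the function names `combine_scores` and `mse_loss`, and the choice of squared error are all hypothetical, since the article only says that the outputs are combined and that prediction error is minimized.

```python
from typing import Sequence


def combine_scores(
    residual_level: float,
    motif_level: float,
    sequence_level: float,
    weights: Sequence[float] = (1.0, 1.0, 1.0),
) -> float:
    """Combine the three module outputs into a single fitness score.

    Assumption: a weighted sum. The article does not state how the
    three outputs are actually combined.
    """
    w_res, w_motif, w_seq = weights
    return w_res * residual_level + w_motif * motif_level + w_seq * sequence_level


def mse_loss(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Mean squared error between predicted and measured fitness scores.

    Assumption: squared error stands in for the unspecified training loss.
    """
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(predicted)


# Toy module outputs for two hypothetical mutants.
predictions = [combine_scores(0.5, 0.25, 0.25), combine_scores(1.0, 0.5, 0.5)]
loss = mse_loss(predictions, [1.0, 4.0])
```

In a real model each of the three inputs would come from a neural network head on top of the pre-trained PLM, and the loss would be minimized by gradient descent rather than evaluated once.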
In addition, µFormer is combined with a reinforcement learning (RL) strategy to explore the vast space of possible mutations efficiently. In this framework, the protein engineering problem is modeled as a Markov Decision Process (MDP), and Proximal Policy Optimization (PPO) is used to optimize the mutation policy. Dirichlet noise is added during the mutation search to encourage exploration and avoid local optima. µFormer was compared against baselines such as ESM-1v and ECNet on benchmarks such as FLIP and ProteinGym.
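To make the search setup more concrete, here is a minimal sketch of one exploration step, assuming the state is the current sequence and an action is a single substitution at one position. The mixing rule `(1 - epsilon) * policy + epsilon * noise`, the values of `alpha` and `epsilon`, and every function name are illustrative assumptions rather than details from the article. The PPO update itself is not shown.

```python
import random
from typing import List, Optional, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def sample_dirichlet(alpha: float, size: int, rng: random.Random) -> List[float]:
    """Sample a symmetric Dirichlet by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0.0:  # guard against numerical underflow
        return [1.0 / size] * size
    return [d / total for d in draws]


def add_exploration_noise(
    policy: Sequence[float],
    alpha: float = 0.3,      # assumed concentration parameter
    epsilon: float = 0.25,   # assumed mixing weight
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Mix Dirichlet noise into a policy so low-probability actions get tried."""
    rng = rng or random.Random()
    noise = sample_dirichlet(alpha, len(policy), rng)
    return [(1.0 - epsilon) * p + epsilon * n for p, n in zip(policy, noise)]


def sample_action(
    policy: Sequence[float], sequence_length: int, rng: random.Random
) -> Tuple[int, str]:
    """Sample a (position, residue) action from a flat policy vector."""
    if len(policy) != sequence_length * len(AMINO_ACIDS):
        raise ValueError("policy must have one entry per (position, residue) pair")
    index = rng.choices(range(len(policy)), weights=policy, k=1)[0]
    position, residue_index = divmod(index, len(AMINO_ACIDS))
    return position, AMINO_ACIDS[residue_index]


def apply_mutation(sequence: str, position: int, residue: str) -> str:
    """MDP transition: substitute one residue at a 0-based position."""
    if residue not in AMINO_ACIDS:
        raise ValueError(f"unknown residue: {residue!r}")
    return sequence[:position] + residue + sequence[position + 1:]


rng = random.Random(0)
wild_type = "MKT"  # toy sequence, not a real protein
uniform_policy = [1.0 / (len(wild_type) * 20)] * (len(wild_type) * 20)
noisy_policy = add_exploration_noise(uniform_policy, rng=rng)
position, residue = sample_action(noisy_policy, len(wild_type), rng)
mutant = apply_mutation(wild_type, position, residue)
```

In a full loop, the mutant would be scored by the fitness model to produce a reward, and the policy would be a trained network updated with PPO instead of the uniform placeholder used here.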
µFormer is a hybrid model that combines a self-supervised protein language model with supervised scoring modules. Pre-trained on 30 million protein sequences from UniRef50 and fine-tuned with the three scoring modules, it outperformed ten methods on the ProteinGym benchmark, achieving a mean Spearman correlation of 0.703. It also predicts high-order mutations and epistasis, with strong correlations for multi-site mutations. In protein optimization, µFormer paired with reinforcement learning designed TEM-1 variants that significantly improved growth, with one double mutant outperforming a known quadruple mutant.
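For readers unfamiliar with the metric, the Spearman correlation reported above measures how well the ranking of predicted scores matches the ranking of measured scores. The following is a small standard-library sketch of that computation (Pearson correlation applied to average ranks); the function names and toy numbers are illustrative and are not taken from the benchmark.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(predicted), average_ranks(measured)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy example: predictions that rank four variants in the measured order.
rho = spearman([0.1, 0.4, 0.5, 0.9], [1.0, 2.0, 3.0, 4.0])
```

Because only ranks matter, a model can score well on this metric even when its raw outputs are on a different scale from the assay measurements.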
In conclusion, previous studies have shown the potential of sequence-based protein language models in tasks such as enzyme function prediction and antibody design. µFormer, a sequence-based model with three scoring modules, was developed to generalize across diverse protein properties. It achieved state-of-the-art performance in fitness prediction tasks, including those involving complex mutations and epistasis. µFormer also demonstrated its ability to optimize enzyme activity, particularly in predicting TEM-1 variants with improved activity against cefotaxime. Further gains could come from incorporating structural data, developing phenotype-aware models, and building models that handle longer protein sequences.
«`