Eden’s Reading List

Efficient protein structure generation with sparse denoising models | Nature Machine Intelligence

Michael Jendrusch, Jan O. Korbel
August 27, 2025
A new protein structure generative model family called "salad" (sparse all-atom denoising) offers significant advances for computational protein design, addressing major limitations of current diffusion-based models by enabling efficient, scalable generation of protein backbones up to 1,000 amino acids with improved runtime (19 seconds vs. >10 minutes for RFdiffusion on large proteins), a reduced parameter count (~8M vs. 200M in Proteina), and comparable or better designability and diversity. Salad's sparse transformer architecture with invariant point attention reduces computational complexity from cubic to near-linear, enabling high-throughput design of large and complex proteins relevant to biotech, enzyme optimization, and antibody and vaccine scaffold design. Notably, salad introduces a flexible structure-editing sampling strategy that enforces constraints (motif scaffolding, multi-state proteins, symmetric repeat proteins including screw symmetry) without retraining, matching or outperforming state-of-the-art models like Genie 2 and RFdiffusion on motif-scaffolding benchmarks and enabling multi-motif and multi-state protein design, a previously challenging task. Validation is computational, via designability metrics using ProteinMPNN and AlphaFold/ESMFold predictions, with design success rates exceeding prior ML methods for multi-state design; experimental validation remains to be done. This modular, efficient approach advances protein generative modeling with potential impact on enzyme, antibody, biosensor, and vaccine design workflows, offering a versatile, plug-and-play backbone generator that can integrate with sequence design and downstream experimental pipelines. A key limitation is training on PDB-only data without small molecules; the authors suggest future extensions using AlphaFold DB and ligand complexes, which would broaden applicability to enzyme and small-molecule binder design. For Eden Shochat’s portfolio, companies applying AI at scale, such as Anodot (AI analytics) and Windward (AI risk analytics), may find the efficiency and scalability gains instructive for their own computational pipelines; analogously, startups in synthetic biology or drug discovery could see salad’s approach as a competitor or collaborator in protein engineering tools.
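
The structure-editing idea, enforcing constraints at sampling time rather than through retraining, can be illustrated with a minimal sketch. This is schematic only: the `model.denoise_step` call, the linear noise schedule, and the motif clamping are hypothetical stand-ins, not salad's actual API.

```python
import numpy as np

def sample_backbone(model, motif_coords, motif_mask, n_residues, n_steps=200):
    """Minimal sketch of constraint-guided denoising sampling.

    At every denoising step, residues covered by `motif_mask` are reset
    to the (noised) motif coordinates, so the model in-paints the rest
    of the backbone around the fixed motif, with no retraining needed.
    """
    x = np.random.randn(n_residues, 3)  # start from pure noise
    for t in reversed(range(n_steps)):
        sigma = t / n_steps  # toy linear noise schedule (illustrative)
        # Overwrite motif positions with motif coords noised to level sigma,
        # so the constraint is enforced throughout the trajectory.
        noised_motif = motif_coords + sigma * np.random.randn(*motif_coords.shape)
        x[motif_mask] = noised_motif[motif_mask]
        x = model.denoise_step(x, sigma)  # hypothetical model call
    x[motif_mask] = motif_coords[motif_mask]  # exact motif at the end
    return x
```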

Chunking Strategies to Improve Your RAG Performance | Weaviate

Femke Plantinga, Victoria Slocum
August 27, 2025
Weaviate's in-depth guide on chunking strategies for Retrieval-Augmented Generation (RAG) underscores chunking—the process of splitting documents into smaller, semantically meaningful parts—as the critical factor for improving vector-search retrieval and the accuracy of generated answers in applications built on Large Language Models (LLMs). It presents multiple chunking techniques for text-heavy, structured documents—including fixed-size, recursive, document-structure-based (e.g., Markdown, PDF, source code), semantic-similarity, LLM-based, agentic (AI-driven dynamic), and late chunking—explaining the trade-offs between retrieval precision, context preservation, efficiency, and computational cost; a sketch of the recursive variant follows below. This advice is directly applicable to Aleph portfolio companies like LawGeex and Superlegal dealing with legal, structured, or complex document understanding; Panorays and Windward in cybersecurity and risk analysis, where the accuracy of extracted insights matters; Sequence and Ply working with knowledge retrieval; and Fabric, which may leverage RAG in commerce. The post also notes practical preprocessing needs such as OCR for scanned PDFs and encourages balancing chunk size to avoid LLM hallucinations and optimize cost-efficiency, relevant across Aleph’s AI and data-driven ventures. Furthermore, it references integrations and tooling around document-splitting strategies that competitors or partners like LangChain and LlamaIndex provide, illustrating ecosystem movement in RAG workflows.
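
As a concrete illustration, a recursive chunker of the kind the post describes can be written in a few lines of plain Python; the separator hierarchy and the size limit below are illustrative defaults, not Weaviate's implementation:

```python
def recursive_chunk(text, max_chars=1000, separators=("\n\n", "\n", ". ", " ")):
    """Split `text` on the coarsest separator that still yields chunks
    under `max_chars`, recursing on any piece that stays oversized."""
    if len(text) <= max_chars:
        return [text]
    for sep in separators:
        if sep in text:
            parts, chunks, buf = text.split(sep), [], ""
            for part in parts:
                candidate = buf + sep + part if buf else part
                if len(candidate) <= max_chars:
                    buf = candidate  # greedily merge small pieces
                else:
                    if buf:
                        chunks.append(buf)
                    buf = part
            if buf:
                chunks.append(buf)
            # Recurse with finer separators on anything still too large.
            return [c for chunk in chunks
                    for c in recursive_chunk(chunk, max_chars, separators)]
    # No separator found: fall back to a hard character split.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```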

Hallucination to Truth: A Review of Fact-Checking and Factuality Evaluation in Large Language Models

Subhey Sadi Rahman, Md. Adnanul Islam, Md. Mahbub Alam, Musarrat Zeba, Md. Abdur Rahman, Sadia Sultana Chowa, Mohaimenul Azam Khan Raiaan, Sami Azam
August 27, 2025
A comprehensive 2023–2025 review analyzes fact-checking and factuality evaluation methods for Large Language Models (LLMs) like GPT-4 and LLaMA, highlighting their frequent hallucinations—factually incorrect yet fluent outputs—due to training on noisy or outdated data; it underscores the limitations of traditional evaluation metrics (accuracy, F1, BLEU, ROUGE), which often fail to capture factual consistency, and advocates for specialized metrics (FactScore, TruthfulQA, NLI-based) and for using LLMs themselves as evaluators. The review further emphasizes the effectiveness of Retrieval-Augmented Generation (RAG)—combining LLMs with external knowledge retrieval—in reducing hallucinations and improving factual grounding, especially through domain-specific fine-tuning and instruction tuning, which are critical for accurate, explainable outputs in high-stakes industries such as legal, finance, and healthcare. For Aleph’s portfolio companies like LawGeex (legal AI), Grain Finance (financial services), and Windward (maritime risk analytics), whose products depend on domain-specific accuracy and the trustworthiness of automated reasoning or AI-generated insights, these findings reinforce the importance of integrating retrieval-augmented, fine-tuned LLMs with robust domain-specific fact-checking frameworks to mitigate misinformation risks and ensure compliance and reliability. Moreover, emerging metrics and multi-agent reasoning approaches can help improve AI explainability and reliability—key competitive factors against other AI providers targeting regulated sectors.
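
The NLI-based consistency checks the review surveys reduce, at their core, to asking whether retrieved evidence entails each generated claim. A minimal sketch with a stubbed-out entailment scorer; any NLI cross-encoder could back `entails`, and the claim decomposition and threshold here are illustrative:

```python
from dataclasses import dataclass

@dataclass
class FactCheckResult:
    claim: str
    support: float   # max entailment probability over evidence passages
    grounded: bool

def entails(premise: str, hypothesis: str) -> float:
    """Stub: return P(premise entails hypothesis). Replace with a real
    NLI model; this placeholder is deliberately pessimistic."""
    return 0.0

def check_answer(claims: list[str], evidence: list[str], threshold: float = 0.7):
    """FactScore-style check: decompose the answer into atomic claims
    upstream, then mark each claim grounded iff some retrieved passage
    entails it with probability above `threshold`."""
    results = []
    for claim in claims:
        support = max((entails(p, claim) for p in evidence), default=0.0)
        results.append(FactCheckResult(claim, support, support >= threshold))
    return results
```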

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

Akshit Sinha, Arvindh Arun, Shashwat Goel, Steffen Staab, Jonas Geiping (University of Cambridge; Institute for AI, University of Stuttgart; Max Planck Institute for Intelligent Systems; ELLIS Institute Tübingen; University of Southampton; Tübingen AI Center)
August 27, 2025
A new study analyzes the long-horizon execution capabilities of large language models (LLMs), showing that while marginal improvements in single-step accuracy appear to yield diminishing returns, these small gains compound exponentially into much longer executable tasks—a critical insight for economically valuable multi-step work such as software engineering and autonomous agents, relevant to Aleph portfolio companies like Ply (automation), Sequence (workflow), and Grain Finance (complex financial operations). The research isolates execution (carrying out known plans with provided knowledge) from planning and knowledge acquisition, finding that execution errors, not reasoning failures, cause LLMs to degrade over long tasks, partly due to a "self-conditioning" effect in which models increasingly condition on their own earlier errors, worsening accuracy—a phenomenon not fixed merely by scaling model size. However, "thinking" models that use sequential test-time compute and reinforcement learning (e.g., GPT-5 "Horizon," DeepSeek R1) overcome self-conditioning and dramatically extend the task length executed in a single turn, outperforming advanced competitors like Anthropic’s Claude-4-Sonnet and xAI’s Grok. This distinction highlights the importance of reasoning-before-acting frameworks (similar to ReAct prompting), pertinent for agentic and automation platforms like Unit and Workiz that require robust long-horizon reasoning and execution. The findings suggest that continued investment in scaling LLM compute remains justified despite apparent slowdowns in single-step metrics, since these gains enable economically valuable, extended multi-step task automation—an encouraging sign for Aleph’s portfolio in complex workflows, financial instruments, and legal-tech automation (e.g., LawGeex, Superlegal.ai). The paper also underscores the gap between open-weight models and advanced API-based models, pointing to opportunities for research and product differentiation in execution reliability and long-horizon planning across competitive industries such as legal AI, finance automation, and developer tools.
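
The compounding argument is simple arithmetic: if a model executes each step correctly with probability p and errors were independent, the longest task it completes at 50% reliability grows as ln(0.5)/ln(p), so apparently flat per-step gains buy superlinear horizon gains. A toy calculation (independence is an idealization; the paper's self-conditioning effect is precisely a violation of it):

```python
import math

def horizon(p_step: float, target: float = 0.5) -> float:
    """Longest task length completed with probability >= target,
    assuming independent per-step success probability p_step."""
    return math.log(target) / math.log(p_step)

for p in (0.90, 0.95, 0.99, 0.999):
    print(f"step accuracy {p:.3f} -> ~{horizon(p):,.0f} steps at 50% success")
# 0.90 -> ~7 steps; 0.95 -> ~14; 0.99 -> ~69; 0.999 -> ~693:
# a 'marginal' gain in step accuracy multiplies the usable horizon.
```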

An AI system to help scientists write expert-level empirical software

Eser Aygün (Google DeepMind), Anastasiya Belyaeva (Google Research), Gheorghe Comanici (Google DeepMind), Marc Coram (Google Research), Hao Cui (Google Research), Jake Garrison (Google Platforms and Devices), Renee Johnston (Google Research), Anton Kast (Google Research), Cory Y. McLean (Google Research), Peter Norgaard (Google Research), Zahra Shamsi (Google Research), David Smalling (Google DeepMind), James Thompson (Google Research), Subhashini Venugopalan (Google Research), Brian P. Williams (Google Research), Chujun He (Google Research; Massachusetts Institute of Technology), Sarah Martinson (Google Research; School of Engineering and Applied Sciences, Harvard University), Martyna Plomecka (Google Research; Google Cloud), Lai Wei (Google Research), Yuchen Zhou (Google Research), Qian-Ze Zhu (Google Research; School of Engineering and Applied Sciences, Harvard University), Matthew Abraham (Google Research), Erica Brand (Google Research), Anna Bulanova (Google DeepMind), Jeffrey A. Cardille (Google Research; Faculty of Agricultural and Environmental Sciences, McGill University), Chris Co (Google Research), Scott Ellsworth (Google Research), Grace Joseph (Google Research), Malcolm Kane (Google Research), Ryan Krueger (Google Research; School of Engineering and Applied Sciences, Harvard University), Johan Kartiwa (Google Research), Dan Liebling (Google Research), Jan-Matthis Lueckmann (Google Research), Paul Raccuglia (Google Research), Xuefei (Julie) Wang (Google Research; California Institute of Technology), Katherine Chou (Google Research), James Manyika (Google Research), Yossi Matias (Google Research), John C. Platt (Google Research), Lizzie Dorfman (Google Research), Shibl Mourad (Google DeepMind), Michael P. Brenner (Google Research; School of Engineering and Applied Sciences, Harvard University)
August 27, 2025
A new AI system combining a Large Language Model with tree search autonomously generates expert-level empirical software by systematically improving quality metrics across diverse scorable scientific tasks, outperforming state-of-the-art methods in areas including single-cell RNA-seq batch integration (bioinformatics), COVID-19 hospitalization forecasting (epidemiology), geospatial semantic segmentation, zebrafish neural activity prediction, general time-series forecasting, and numerical integration. Notable achievements include 40 novel batch-integration methods that surpass 9 major existing tools, forecasts outperforming the CDC COVID Forecast Hub ensemble and top epidemiological models, innovative architecture combinations (such as U-Nets with Transformers) for geospatial tasks, and faster, accurate neural activity predictors. The system demonstrates significant acceleration and optimization of scientific computational software development, relevant to Aleph portfolio sectors, such as Fabric and Panorays, that operate in software, data integration, security, and AI-driven analytics, and sets a new paradigm in AI-assisted automated research and model synthesis.
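
At its core, the system's loop is a best-first tree search over candidate programs: an LLM proposes rewrites, each candidate is scored on the task's quality metric, and the most promising nodes are expanded. A schematic sketch; `llm_rewrite` and `score` are hypothetical hooks standing in for the model call and the task scorer, not Google's actual implementation:

```python
import heapq

def tree_search_software(seed_code: str, llm_rewrite, score,
                         budget: int = 100, width: int = 4):
    """Best-first search over candidate programs.

    `llm_rewrite(code) -> list[str]` proposes mutated programs;
    `score(code) -> float` runs a candidate on the task and returns its
    quality metric (higher is better). Both are hypothetical hooks.
    """
    best_code, best_score = seed_code, score(seed_code)
    frontier = [(-best_score, best_code)]  # max-heap via negated scores
    evaluated = 1
    while frontier and evaluated < budget:
        _, code = heapq.heappop(frontier)  # expand most promising node
        for child in llm_rewrite(code)[:width]:
            s = score(child)
            evaluated += 1
            heapq.heappush(frontier, (-s, child))
            if s > best_score:
                best_code, best_score = child, s
    return best_code, best_score
```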

Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors

Aniket Didolkar, Nicolas Ballas, Sanjeev Arora, Anirudh Goyal
August 27, 2025
Recent Meta-led research introduces a metacognitive approach that enables large language models (LLMs) to extract and reuse recurring multi-step reasoning patterns as concise, named "behaviors" stored in a behavior handbook, improving token efficiency by up to 46% and reasoning accuracy by up to 10% on challenging math benchmarks (MATH, AIME), demonstrated via behavior-conditioned inference, self-improvement, and supervised fine-tuning; notably, behavior-conditioned supervised fine-tuning significantly enhances performance and efficiency when turning non-reasoning models into reasoning-capable ones. This procedural-memory approach differs from typical retrieval-augmented generation by focusing on how to think rather than on factual knowledge, aligning with recent trends in LLM reasoning and metacognition. The framework is model- and domain-agnostic, with potential applicability in programming, scientific reasoning, and dialogue systems. Given Aleph’s portfolio of AI and data-driven startups such as Anodot (anomaly detection), Q.ai (AI-powered finance), Sequence (customer communication automation), and Superlegal (AI for legal workflows), this advance could inspire optimizations that integrate efficient, scalable LLM reasoning to reduce computational costs and improve accuracy, especially in complex multi-step tasks such as legal document analysis (Superlegal), financial modeling (Q.ai, Grain Finance), or workflow automation (Sequence, Workiz). Furthermore, competitors employing large LLMs for multi-step reasoning might benefit from similar metacognitive behavior distillation, positioning this technique as a valuable lever for Aleph portfolio companies’ AI capabilities and cost-effectiveness.
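
Mechanically, the handbook acts as procedural memory: short, named reasoning moves distilled from past traces and prepended to the prompt at inference time. A minimal sketch of behavior-conditioned inference, where the handbook entries, the keyword-overlap retrieval, and the prompt format are all illustrative rather than the paper's exact pipeline:

```python
HANDBOOK = {
    # name -> concise reusable reasoning move (illustrative examples)
    "check_units": "Before finalizing, verify units and dimensions are consistent.",
    "casework_small_n": "For counting problems, enumerate small cases to find the pattern.",
    "modular_residues": "For divisibility questions, reduce the problem modulo small primes.",
}

def retrieve_behaviors(problem: str, k: int = 2) -> list[str]:
    """Stub retrieval: rank behaviors by keyword overlap with the problem.
    A real system would use embeddings or an LLM-based selector."""
    overlap = lambda name: sum(w in problem.lower() for w in name.split("_"))
    return sorted(HANDBOOK, key=overlap, reverse=True)[:k]

def behavior_conditioned_prompt(problem: str) -> str:
    """Prepend retrieved behaviors so the model reuses distilled reasoning
    moves instead of re-deriving them, saving reasoning tokens."""
    lines = "\n".join(f"- {n}: {HANDBOOK[n]}" for n in retrieve_behaviors(problem))
    return f"Useful behaviors:\n{lines}\n\nProblem: {problem}\nSolve step by step."
```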

Towards an AI-Augmented Textbook

LearnLM Team, Google: Alicia Martín, Amir Globerson, Amy Wang, Anirudh Shekhawat, Anisha Choudhury, Anna Iurchenko, Avinatan Hassidim, Ayça Çakmakli, Ayelet Shasha Evron, Charlie Yang, Courtney Heldreth, Diana Akrong, Gal Elidan, Hairong Mu, Ian Li, Ido Cohen, Katherine Chou, Komal Singh, Lev Borovoi, Lidan Hackmon, Lior Belinsky, Michael Fink, Niv Efron, Preeti Singh, Rena Levitt, Shashank Agarwal, Shay Sharon, Tracey Lee-Joe, Xiaohong Hao, Yael Gold-Zamir, Yael Haramaty, Yishay Mor, Yoav Bar Sinai, Yossi Matias
August 27, 2025
A new system called Learn Your Way uses generative AI (Gemini 2.5 Pro) to transform traditional textbooks into personalized, AI-augmented learning experiences: it adapts content to learners’ grade level and interests, provides multiple modalities (text, narrated slides, audio-graphic lessons, mind maps, timelines, mnemonics, visual illustrations), and embeds formative assessments such as quizzes and questions. Pedagogical expert evaluations and a randomized controlled study with teenage students showed that Learn Your Way improves learning efficacy and engagement compared to standard digital readers, demonstrating generative AI’s potential to enhance personalized education. This is relevant to Aleph’s interests in education tech and AI-powered platforms, and it offers insight into personalized learning, content transformation, and assessment techniques that might influence portfolio companies like Grain Finance (adaptive financial education) or Ply (content personalization in developer tools).

DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning | Nature

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Honghui Ding, Huazuo Gao, Hui Qu, Hui Li, Jianzhong Guo, Jiashi Li, Jingchang Chen, Jingyang Yuan, Jinhao Tu, Junjie Qiu, Junlong Li, J. L. Cai, Jiaqi Ni, Jian Liang, Jin Chen, Kai Dong, Kai Hu, Kaichao You, Kaige Gao, Kang Guan, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Liang Zhao, Litong Wang, Liyue Zhang, Lei Xu, Leyi Xia, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingxu Zhou, Meng Li, Miaojun Wang, Mingming Li, Ning Tian, Panpan Huang, Peng Zhang, Qiancheng Wang, Qinyu Chen, Qiushi Du, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, R. J. Chen, R. L. Jin, Ruyi Chen, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shengfeng Ye, Shiyu Wang, Shuiping Yu, Shunfeng Zhou, Shuting Pan, S. S. Li, Shuang Zhou, Shaoqing Wu, Tao Yun, Tian Pei, Tianyu Sun, T. Wang, Wangding Zeng, Wen Liu, Wenfeng Liang, Wenjun Gao, Wenqin Yu, Wentao Zhang, W. L. Xiao, Wei An, Xiaodong Liu, Xiaohan Wang, Xiaokang Chen, Xiaotao Nie, Xin Cheng, Xin Liu, Xin Xie, Xingchao Liu, Xinyu Yang, Xinyuan Li, Xuecheng Su, Xuheng Lin, X. Q. Li, Xiangyue Jin, Xiaojin Shen, Xiaosha Chen, Xiaowen Sun, Xiaoxiang Wang, Xinnan Song, Xinyi Zhou, Xianzu Wang, Xinxia Shan, Y. K. Li, Y. Q. Wang, Y. X. Wei, Yang Zhang, Yanhong Xu, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Wang, Yi Yu, Yichao Zhang, Yifan Shi, Yiliang Xiong, Ying He, Yishi Piao, Yisong Wang, Yixuan Tan, Yiyang Ma, Yiyuan Liu, Yongqiang Guo, Yuan Ou, Yuduan Wang, Yue Gong, Yuheng Zou, Yujia He, Yunfan Xiong, Yuxiang Luo, Yuxiang You, Yuxuan Liu, Yuyang Zhou, Y. X. Zhu, Yanping Huang, Yaohui Li, Yi Zheng, Yuchen Zhu, Yunxian Ma, Ying Tang, Yukun Zha, Yuting Yan, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhean Xu, Zhenda Xie, Zhengyan Zhang, Zhewen Hao, Zhicheng Ma, Zhigang Yan, Zhiyu Wu, Zihui Gu, Zijia Zhu, Zijun Liu, Zilin Li, Ziwei Xie, Ziyang Song, Zizheng Pan, Zhen Huang, Zhipeng Xu, Zhongyu Zhang & Zhen Zhang
August 27, 2025
The DeepSeek-R1 paper presents a novel reinforcement learning (RL) framework that incentivizes advanced reasoning in large language models (LLMs) without relying on human-annotated reasoning traces, achieving superior performance on complex, verifiable tasks such as math competitions (AIME 2024), coding contests, and STEM problems by emergently developing behaviors like self-reflection and verification. Notably, smaller distilled versions of DeepSeek-R1 also outperform instruction-tuned counterparts, offering research and practical benefits for AI reasoning relevant to Aleph portfolio companies like Anodot (AI analytics), Ply (developer tools), Q.ai (financial AI), Sequence (workflow automation), and Superlegal (legal AI) that operate in AI-enhanced decision-making and automation. The paper also highlights remaining challenges in structured output, tool use, language mixing, and model safety that overlap with concerns in AI-driven legal, financial, and software engineering applications.
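
The training signal is the notable part: rewards come from automatic verification (an answer checker, unit tests) rather than human-annotated reasoning traces, and DeepSeek-R1's GRPO optimizer normalizes rewards within a group of sampled completions instead of learning a value critic. A toy sketch of that reward and advantage computation; the string-matching verifier is deliberately simplistic:

```python
import statistics

def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Toy verifier: 1.0 if the completion's final answer after '='
    matches the reference exactly, else 0.0. Real verifiers parse math
    or coding outputs, or run unit tests."""
    final = completion.rsplit("=", 1)[-1].strip()
    return 1.0 if final == reference_answer.strip() else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: z-score each sampled completion's reward
    against its own group, so no learned value critic is needed."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # avoid divide-by-zero
    return [(r - mu) / sigma for r in rewards]

# Example: 4 sampled completions for one prompt, 1 verified correct.
rewards = [verifiable_reward(c, "42") for c in ["x = 41", "x = 42", "x = 7", "x = 40"]]
print(group_relative_advantages(rewards))  # correct sample gets the positive advantage
```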

Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR

Jiaming Li, Longze Chen, Ze Gong, Yukun Chen, Lu Wang, Wanwei He, Run Luo, Min Yang
August 27, 2025
Recent advances in Reinforcement Learning with Verifiable Rewards (RLVR) have empowered large language models (LLMs) to tackle challenging reasoning tasks such as mathematics and programming. RLVR leverages verifiable outcome rewards to guide policy optimization, enabling LLMs to progressively improve output quality in a grounded and reliable manner. Despite its promise, the RLVR paradigm poses significant challenges, as existing methods often suffer from sparse reward signals and unstable policy-gradient updates, particularly in RL-based approaches. To address these challenges, we propose PACS, a novel RLVR framework that achieves imPlicit Actor Critic coupling via a Supervised learning framework. By treating the outcome reward as a predictable label, we reformulate the RLVR problem into a supervised learning task over a score function parameterized by the policy model and optimized using cross-entropy loss. A detailed gradient analysis shows that this supervised formulation inherently recovers the classical policy-gradient update while implicitly coupling actor and critic roles, yielding more stable and efficient training. On challenging mathematical reasoning benchmarks, PACS outperforms strong RLVR baselines such as PPO and GRPO, achieving superior reasoning performance; for instance, PACS achieves 59.78% at pass@256 on AIME 2025, improvements of 13.32 and 14.36 points over PPO and GRPO, respectively. This simple yet powerful framework offers a promising avenue for post-training LLMs with verifiable rewards. Our code and data are available as open source at https://github.com/ritzz-ai/PACS.
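
The reformulation in the abstract can be sketched directly: treat the verifiable outcome reward r in {0, 1} as a label for a score s_theta(x, y) derived from the policy model, and minimize binary cross-entropy. A minimal PyTorch sketch under our reading of the abstract; using the summed token log-probabilities as the score is our assumption, and the paper's actual parameterization may differ:

```python
import torch
import torch.nn.functional as F

def pacs_loss(policy_logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """Supervised RLVR objective, per our reading of the PACS abstract.

    policy_logprobs: (batch,) summed token log-probs of each sampled
        response under the policy, used here as the score s_theta(x, y)
        (an assumption for this sketch).
    rewards: (batch,) verifiable outcome labels in {0, 1}.

    Cross-entropy pushes the score up on verified-correct responses and
    down on incorrect ones; its gradient takes a policy-gradient-like
    form weighted by (sigmoid(s) - r), the implicit critic term.
    """
    return F.binary_cross_entropy_with_logits(policy_logprobs, rewards.float())

# Toy usage with fake scores for 4 sampled responses.
scores = torch.tensor([1.2, -0.3, 0.8, -2.0], requires_grad=True)
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
loss = pacs_loss(scores, labels)
loss.backward()
print(loss.item(), scores.grad)  # grad_i = (sigmoid(s_i) - r_i) / batch_size
```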