Eden’s Reading List

Efficient protein structure generation with sparse denoising models | Nature Machine Intelligence

Michael Jendrusch, Jan O. Korbel
August 27, 2025
A new protein structure generative model family called "salad" (sparse all-atom denoising) offers significant advances for computational protein design, addressing major limitations of current diffusion-based models by enabling efficient, scalable generation of protein backbones up to 1,000 amino acids with improved runtime (19 seconds vs. >10 minutes for RFdiffusion on large proteins), a reduced parameter count (~8M vs. 200M in Proteina), and comparable or better designability and diversity. Salad's sparse transformer architecture with invariant point attention reduces computational complexity from cubic to near-linear, enabling high-throughput design of large and complex proteins relevant to biotech, enzyme optimization, and antibody and vaccine scaffold design. Notably, salad introduces a flexible structure-editing sampling strategy that enforces constraints (motif scaffolding, multi-state proteins, symmetric repeat proteins including screw symmetry) without retraining, matching or outperforming state-of-the-art models like Genie 2 and RFdiffusion on motif-scaffolding benchmarks and enabling multi-motif and multi-state protein design, a previously challenging task. Validation is computational, via designability metrics using ProteinMPNN and AlphaFold/ESMFold predictions, with design success rates exceeding prior ML methods for multi-state design; experimental validation remains to be done. This modular, efficient approach advances protein generative modeling with potential impact on enzyme, antibody, biosensor, and vaccine design workflows, offering a versatile, plug-and-play backbone generator that can integrate with sequence design and downstream experimental pipelines. A key limitation is training on PDB-only data without small molecules; the authors suggest future extensions using AlphaFold DB and ligand complexes, which would broaden applicability to enzyme and small-molecule binder design. For Eden Shochat’s portfolio, companies applying AI at scale, such as Anodot (AI analytics) and Windward (AI risk analytics), may find the efficiency and scalability gains instructive for their own computational pipelines; analogously, startups in synthetic biology or drug discovery could see salad’s approach as a competitor or collaborator in protein engineering tools.
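
The structure-editing idea, enforcing constraints at sampling time rather than through retraining, can be illustrated with a minimal sketch. This is schematic only: the `model.denoise_step` call, the linear noise schedule, and the motif clamping are hypothetical stand-ins, not salad's actual API.

```python
import numpy as np

def sample_backbone(model, motif_coords, motif_mask, n_residues, n_steps=200):
    """Minimal sketch of constraint-guided denoising sampling.

    At every denoising step, residues covered by `motif_mask` are reset
    to the (noised) motif coordinates, so the model in-paints the rest
    of the backbone around the fixed motif, with no retraining needed.
    """
    x = np.random.randn(n_residues, 3)  # start from pure noise
    for t in reversed(range(n_steps)):
        sigma = t / n_steps  # toy linear noise schedule (illustrative)
        # Overwrite motif positions with motif coords noised to level sigma,
        # so the constraint is enforced throughout the trajectory.
        noised_motif = motif_coords + sigma * np.random.randn(*motif_coords.shape)
        x[motif_mask] = noised_motif[motif_mask]
        x = model.denoise_step(x, sigma)  # hypothetical model call
    x[motif_mask] = motif_coords[motif_mask]  # exact motif at the end
    return x
```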

Chunking Strategies to Improve Your RAG Performance | Weaviate

Femke Plantinga, Victoria Slocum
August 27, 2025
Weaviate's in-depth guide on chunking strategies for Retrieval-Augmented Generation (RAG) underscores chunking—the process of splitting documents into smaller, semantically meaningful parts—as the critical factor for improving vector-search retrieval and the accuracy of generated answers in applications built on Large Language Models (LLMs). It presents multiple chunking techniques for text-heavy, structured documents—including fixed-size, recursive, document-structure-based (e.g., Markdown, PDF, source code), semantic-similarity, LLM-based, agentic (AI-driven dynamic), and late chunking—explaining the trade-offs between retrieval precision, context preservation, efficiency, and computational cost; a sketch of the recursive variant follows below. This advice is directly applicable to Aleph portfolio companies like LawGeex and Superlegal dealing with legal, structured, or complex document understanding; Panorays and Windward in cybersecurity and risk analysis, where the accuracy of extracted insights matters; Sequence and Ply working with knowledge retrieval; and Fabric, which may leverage RAG in commerce. The post also notes practical preprocessing needs such as OCR for scanned PDFs and encourages balancing chunk size to avoid LLM hallucinations and optimize cost-efficiency, relevant across Aleph’s AI and data-driven ventures. Furthermore, it references integrations and tooling around document-splitting strategies that competitors or partners like LangChain and LlamaIndex provide, illustrating ecosystem movement in RAG workflows.
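
As a concrete illustration, a recursive chunker of the kind the post describes can be written in a few lines of plain Python; the separator hierarchy and the size limit below are illustrative defaults, not Weaviate's implementation:

```python
def recursive_chunk(text, max_chars=1000, separators=("\n\n", "\n", ". ", " ")):
    """Split `text` on the coarsest separator that still yields chunks
    under `max_chars`, recursing on any piece that stays oversized."""
    if len(text) <= max_chars:
        return [text]
    for sep in separators:
        if sep in text:
            parts, chunks, buf = text.split(sep), [], ""
            for part in parts:
                candidate = buf + sep + part if buf else part
                if len(candidate) <= max_chars:
                    buf = candidate  # greedily merge small pieces
                else:
                    if buf:
                        chunks.append(buf)
                    buf = part
            if buf:
                chunks.append(buf)
            # Recurse with finer separators on anything still too large.
            return [c for chunk in chunks
                    for c in recursive_chunk(chunk, max_chars, separators)]
    # No separator found: fall back to a hard character split.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```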

Hallucination to Truth: A Review of Fact-Checking and Factuality Evaluation in Large Language Models

Subhey Sadi Rahman, Md. Adnanul Islam, Md. Mahbub Alam, Musarrat Zeba, Md. Abdur Rahman, Sadia Sultana Chowa, Mohaimenul Azam Khan Raiaan, Sami Azam
August 27, 2025
A comprehensive 2023–2025 review analyzes fact-checking and factuality evaluation methods for Large Language Models (LLMs) like GPT-4 and LLaMA, highlighting their frequent hallucinations—factually incorrect yet fluent outputs—due to training on noisy or outdated data; it underscores the limitations of traditional evaluation metrics (accuracy, F1, BLEU, ROUGE), which often fail to capture factual consistency, and advocates for specialized metrics (FactScore, TruthfulQA, NLI-based) and for using LLMs themselves as evaluators. The review further emphasizes the effectiveness of Retrieval-Augmented Generation (RAG)—combining LLMs with external knowledge retrieval—in reducing hallucinations and improving factual grounding, especially through domain-specific fine-tuning and instruction tuning, which are critical for accurate, explainable outputs in high-stakes industries such as legal, finance, and healthcare. For Aleph’s portfolio companies like LawGeex (legal AI), Grain Finance (financial services), and Windward (maritime risk analytics), whose products depend on domain-specific accuracy and the trustworthiness of automated reasoning or AI-generated insights, these findings reinforce the importance of integrating retrieval-augmented, fine-tuned LLMs with robust domain-specific fact-checking frameworks to mitigate misinformation risks and ensure compliance and reliability. Moreover, emerging metrics and multi-agent reasoning approaches can help improve AI explainability and reliability—key competitive factors against other AI providers targeting regulated sectors.
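
The NLI-based consistency checks the review surveys reduce, at their core, to asking whether retrieved evidence entails each generated claim. A minimal sketch with a stubbed-out entailment scorer; any NLI cross-encoder could back `entails`, and the claim decomposition and threshold here are illustrative:

```python
from dataclasses import dataclass

@dataclass
class FactCheckResult:
    claim: str
    support: float   # max entailment probability over evidence passages
    grounded: bool

def entails(premise: str, hypothesis: str) -> float:
    """Stub: return P(premise entails hypothesis). Replace with a real
    NLI model; this placeholder is deliberately pessimistic."""
    return 0.0

def check_answer(claims: list[str], evidence: list[str], threshold: float = 0.7):
    """FactScore-style check: decompose the answer into atomic claims
    upstream, then mark each claim grounded iff some retrieved passage
    entails it with probability above `threshold`."""
    results = []
    for claim in claims:
        support = max((entails(p, claim) for p in evidence), default=0.0)
        results.append(FactCheckResult(claim, support, support >= threshold))
    return results
```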

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

Akshit Sinha, Arvindh Arun, Shashwat Goel, Steffen Staab, Jonas Geiping (University of Cambridge; Institute for AI, University of Stuttgart; Max Planck Institute for Intelligent Systems; ELLIS Institute Tübingen; University of Southampton; Tübingen AI Center)
August 27, 2025
A new study analyzes the long-horizon execution capabilities of large language models (LLMs), showing that while marginal improvements in single-step accuracy appear to yield diminishing returns, these small gains compound exponentially into much longer executable tasks—a critical insight for economically valuable multi-step work such as software engineering and autonomous agents, relevant to Aleph portfolio companies like Ply (automation), Sequence (workflow), and Grain Finance (complex financial operations). The research isolates execution (carrying out known plans with provided knowledge) from planning and knowledge acquisition, finding that execution errors, not reasoning failures, cause LLMs to degrade over long tasks, partly due to a "self-conditioning" effect in which models increasingly condition on their own earlier errors, worsening accuracy—a phenomenon not fixed merely by scaling model size. However, "thinking" models that use sequential test-time compute and reinforcement learning (e.g., GPT-5 "Horizon," DeepSeek R1) overcome self-conditioning and dramatically extend the task length executed in a single turn, outperforming advanced competitors like Anthropic’s Claude-4-Sonnet and xAI’s Grok. This distinction highlights the importance of reasoning-before-acting frameworks (similar to ReAct prompting), pertinent for agentic and automation platforms like Unit and Workiz that require robust long-horizon reasoning and execution. The findings suggest that continued investment in scaling LLM compute remains justified despite apparent slowdowns in single-step metrics, since these gains enable economically valuable, extended multi-step task automation—an encouraging sign for Aleph’s portfolio in complex workflows, financial instruments, and legal-tech automation (e.g., LawGeex, Superlegal.ai). The paper also underscores the gap between open-weight models and advanced API-based models, pointing to opportunities for research and product differentiation in execution reliability and long-horizon planning across competitive industries such as legal AI, finance automation, and developer tools.
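
The compounding argument is simple arithmetic: if a model executes each step correctly with probability p and errors were independent, the longest task it completes at 50% reliability grows as ln(0.5)/ln(p), so apparently flat per-step gains buy superlinear horizon gains. A toy calculation (independence is an idealization; the paper's self-conditioning effect is precisely a violation of it):

```python
import math

def horizon(p_step: float, target: float = 0.5) -> float:
    """Longest task length completed with probability >= target,
    assuming independent per-step success probability p_step."""
    return math.log(target) / math.log(p_step)

for p in (0.90, 0.95, 0.99, 0.999):
    print(f"step accuracy {p:.3f} -> ~{horizon(p):,.0f} steps at 50% success")
# 0.90 -> ~7 steps; 0.95 -> ~14; 0.99 -> ~69; 0.999 -> ~693:
# a 'marginal' gain in step accuracy multiplies the usable horizon.
```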

An AI system to help scientists write expert-level empirical software

Eser Aygün (Google DeepMind), Anastasiya Belyaeva (Google Research), Gheorghe Comanici (Google DeepMind), Marc Coram (Google Research), Hao Cui (Google Research), Jake Garrison (Google Platforms and Devices), Renee Johnston (Google Research), Anton Kast (Google Research), Cory Y. McLean (Google Research), Peter Norgaard (Google Research), Zahra Shamsi (Google Research), David Smalling (Google DeepMind), James Thompson (Google Research), Subhashini Venugopalan (Google Research), Brian P. Williams (Google Research), Chujun He (Google Research; Massachusetts Institute of Technology), Sarah Martinson (Google Research; School of Engineering and Applied Sciences, Harvard University), Martyna Plomecka (Google Research; Google Cloud), Lai Wei (Google Research), Yuchen Zhou (Google Research), Qian-Ze Zhu (Google Research; School of Engineering and Applied Sciences, Harvard University), Matthew Abraham (Google Research), Erica Brand (Google Research), Anna Bulanova (Google DeepMind), Jeffrey A. Cardille (Google Research; Faculty of Agricultural and Environmental Sciences, McGill University), Chris Co (Google Research), Scott Ellsworth (Google Research), Grace Joseph (Google Research), Malcolm Kane (Google Research), Ryan Krueger (Google Research; School of Engineering and Applied Sciences, Harvard University), Johan Kartiwa (Google Research), Dan Liebling (Google Research), Jan-Matthis Lueckmann (Google Research), Paul Raccuglia (Google Research), Xuefei (Julie) Wang (Google Research; California Institute of Technology), Katherine Chou (Google Research), James Manyika (Google Research), Yossi Matias (Google Research), John C. Platt (Google Research), Lizzie Dorfman (Google Research), Shibl Mourad (Google DeepMind), Michael P. Brenner (Google Research; School of Engineering and Applied Sciences, Harvard University)
August 27, 2025
A new AI system combining a Large Language Model with tree search autonomously generates expert-level empirical software by systematically improving quality metrics across diverse scorable scientific tasks, outperforming state-of-the-art methods in areas including single-cell RNA-seq batch integration (bioinformatics), COVID-19 hospitalization forecasting (epidemiology), geospatial semantic segmentation, zebrafish neural activity prediction, general time-series forecasting, and numerical integration. Notable achievements include 40 novel batch-integration methods that surpass 9 major existing tools, forecasts outperforming the CDC COVID Forecast Hub ensemble and top epidemiological models, innovative architecture combinations (such as U-Nets with Transformers) for geospatial tasks, and faster, accurate neural activity predictors. The system demonstrates significant acceleration and optimization of scientific computational software development, relevant to Aleph portfolio sectors, such as Fabric and Panorays, that operate in software, data integration, security, and AI-driven analytics, and sets a new paradigm in AI-assisted automated research and model synthesis.
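
At its core, the system's loop is a best-first tree search over candidate programs: an LLM proposes rewrites, each candidate is scored on the task's quality metric, and the most promising nodes are expanded. A schematic sketch; `llm_rewrite` and `score` are hypothetical hooks standing in for the model call and the task scorer, not Google's actual implementation:

```python
import heapq

def tree_search_software(seed_code: str, llm_rewrite, score,
                         budget: int = 100, width: int = 4):
    """Best-first search over candidate programs.

    `llm_rewrite(code) -> list[str]` proposes mutated programs;
    `score(code) -> float` runs a candidate on the task and returns its
    quality metric (higher is better). Both are hypothetical hooks.
    """
    best_code, best_score = seed_code, score(seed_code)
    frontier = [(-best_score, best_code)]  # max-heap via negated scores
    evaluated = 1
    while frontier and evaluated < budget:
        _, code = heapq.heappop(frontier)  # expand most promising node
        for child in llm_rewrite(code)[:width]:
            s = score(child)
            evaluated += 1
            heapq.heappush(frontier, (-s, child))
            if s > best_score:
                best_code, best_score = child, s
    return best_code, best_score
```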

Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors

Aniket Didolkar, Nicolas Ballas, Sanjeev Arora, Anirudh Goyal
August 27, 2025
Recent Meta-led research introduces a metacognitive approach that enables large language models (LLMs) to extract and reuse recurring multi-step reasoning patterns as concise, named "behaviors" stored in a behavior handbook, improving token efficiency by up to 46% and reasoning accuracy by up to 10% on challenging math benchmarks (MATH, AIME), demonstrated via behavior-conditioned inference, self-improvement, and supervised fine-tuning; notably, behavior-conditioned supervised fine-tuning significantly enhances performance and efficiency when turning non-reasoning models into reasoning-capable ones. This procedural-memory approach differs from typical retrieval-augmented generation by focusing on how to think rather than on factual knowledge, aligning with recent trends in LLM reasoning and metacognition. The framework is model- and domain-agnostic, with potential applicability in programming, scientific reasoning, and dialogue systems. Given Aleph’s portfolio of AI and data-driven startups such as Anodot (anomaly detection), Q.ai (AI-powered finance), Sequence (customer communication automation), and Superlegal (AI for legal workflows), this advance could inspire optimizations that integrate efficient, scalable LLM reasoning to reduce computational costs and improve accuracy, especially in complex multi-step tasks such as legal document analysis (Superlegal), financial modeling (Q.ai, Grain Finance), or workflow automation (Sequence, Workiz). Furthermore, competitors employing large LLMs for multi-step reasoning might benefit from similar metacognitive behavior distillation, positioning this technique as a valuable lever for Aleph portfolio companies’ AI capabilities and cost-effectiveness.
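
Mechanically, the handbook acts as procedural memory: short, named reasoning moves distilled from past traces and prepended to the prompt at inference time. A minimal sketch of behavior-conditioned inference, where the handbook entries, the keyword-overlap retrieval, and the prompt format are all illustrative rather than the paper's exact pipeline:

```python
HANDBOOK = {
    # name -> concise reusable reasoning move (illustrative examples)
    "check_units": "Before finalizing, verify units and dimensions are consistent.",
    "casework_small_n": "For counting problems, enumerate small cases to find the pattern.",
    "modular_residues": "For divisibility questions, reduce the problem modulo small primes.",
}

def retrieve_behaviors(problem: str, k: int = 2) -> list[str]:
    """Stub retrieval: rank behaviors by keyword overlap with the problem.
    A real system would use embeddings or an LLM-based selector."""
    overlap = lambda name: sum(w in problem.lower() for w in name.split("_"))
    return sorted(HANDBOOK, key=overlap, reverse=True)[:k]

def behavior_conditioned_prompt(problem: str) -> str:
    """Prepend retrieved behaviors so the model reuses distilled reasoning
    moves instead of re-deriving them, saving reasoning tokens."""
    lines = "\n".join(f"- {n}: {HANDBOOK[n]}" for n in retrieve_behaviors(problem))
    return f"Useful behaviors:\n{lines}\n\nProblem: {problem}\nSolve step by step."
```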

Towards an AI-Augmented Textbook

LearnLM Team, Google: Alicia Martín, Amir Globerson, Amy Wang, Anirudh Shekhawat, Anisha Choudhury, Anna Iurchenko, Avinatan Hassidim, Ayça Çakmakli, Ayelet Shasha Evron, Charlie Yang, Courtney Heldreth, Diana Akrong, Gal Elidan, Hairong Mu, Ian Li, Ido Cohen, Katherine Chou, Komal Singh, Lev Borovoi, Lidan Hackmon, Lior Belinsky, Michael Fink, Niv Efron, Preeti Singh, Rena Levitt, Shashank Agarwal, Shay Sharon, Tracey Lee-Joe, Xiaohong Hao, Yael Gold-Zamir, Yael Haramaty, Yishay Mor, Yoav Bar Sinai, Yossi Matias
August 27, 2025
A new system called Learn Your Way uses generative AI (Gemini 2.5 Pro) to transform traditional textbooks into personalized, AI-augmented learning experiences: it adapts content to learners’ grade level and interests, provides multiple modalities (text, narrated slides, audio-graphic lessons, mind maps, timelines, mnemonics, visual illustrations), and embeds formative assessments such as quizzes and questions. Pedagogical expert evaluations and a randomized controlled study with teenage students showed that Learn Your Way improves learning efficacy and engagement compared to standard digital readers, demonstrating generative AI’s potential to enhance personalized education. This is relevant to Aleph’s interests in education tech and AI-powered platforms, and it offers insight into personalized learning, content transformation, and assessment techniques that might influence portfolio companies like Grain Finance (adaptive financial education) or Ply (content personalization in developer tools).

DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning | Nature

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Honghui Ding, Huazuo Gao, Hui Qu, Hui Li, Jianzhong Guo, Jiashi Li, Jingchang Chen, Jingyang Yuan, Jinhao Tu, Junjie Qiu, Junlong Li, J. L. Cai, Jiaqi Ni, Jian Liang, Jin Chen, Kai Dong, Kai Hu, Kaichao You, Kaige Gao, Kang Guan, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Liang Zhao, Litong Wang, Liyue Zhang, Lei Xu, Leyi Xia, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingxu Zhou, Meng Li, Miaojun Wang, Mingming Li, Ning Tian, Panpan Huang, Peng Zhang, Qiancheng Wang, Qinyu Chen, Qiushi Du, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, R. J. Chen, R. L. Jin, Ruyi Chen, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shengfeng Ye, Shiyu Wang, Shuiping Yu, Shunfeng Zhou, Shuting Pan, S. S. Li, Shuang Zhou, Shaoqing Wu, Tao Yun, Tian Pei, Tianyu Sun, T. Wang, Wangding Zeng, Wen Liu, Wenfeng Liang, Wenjun Gao, Wenqin Yu, Wentao Zhang, W. L. Xiao, Wei An, Xiaodong Liu, Xiaohan Wang, Xiaokang Chen, Xiaotao Nie, Xin Cheng, Xin Liu, Xin Xie, Xingchao Liu, Xinyu Yang, Xinyuan Li, Xuecheng Su, Xuheng Lin, X. Q. Li, Xiangyue Jin, Xiaojin Shen, Xiaosha Chen, Xiaowen Sun, Xiaoxiang Wang, Xinnan Song, Xinyi Zhou, Xianzu Wang, Xinxia Shan, Y. K. Li, Y. Q. Wang, Y. X. Wei, Yang Zhang, Yanhong Xu, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Wang, Yi Yu, Yichao Zhang, Yifan Shi, Yiliang Xiong, Ying He, Yishi Piao, Yisong Wang, Yixuan Tan, Yiyang Ma, Yiyuan Liu, Yongqiang Guo, Yuan Ou, Yuduan Wang, Yue Gong, Yuheng Zou, Yujia He, Yunfan Xiong, Yuxiang Luo, Yuxiang You, Yuxuan Liu, Yuyang Zhou, Y. X. Zhu, Yanping Huang, Yaohui Li, Yi Zheng, Yuchen Zhu, Yunxian Ma, Ying Tang, Yukun Zha, Yuting Yan, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhean Xu, Zhenda Xie, Zhengyan Zhang, Zhewen Hao, Zhicheng Ma, Zhigang Yan, Zhiyu Wu, Zihui Gu, Zijia Zhu, Zijun Liu, Zilin Li, Ziwei Xie, Ziyang Song, Zizheng Pan, Zhen Huang, Zhipeng Xu, Zhongyu Zhang & Zhen Zhang
August 27, 2025
The DeepSeek-R1 paper presents a novel reinforcement learning (RL) framework that incentivizes advanced reasoning in large language models (LLMs) without relying on human-annotated reasoning traces, achieving superior performance on complex, verifiable tasks such as math competitions (AIME 2024), coding contests, and STEM problems by emergently developing behaviors like self-reflection and verification. Notably, smaller distilled versions of DeepSeek-R1 also outperform instruction-tuned counterparts, offering research and practical benefits for AI reasoning relevant to Aleph portfolio companies like Anodot (AI analytics), Ply (developer tools), Q.ai (financial AI), Sequence (workflow automation), and Superlegal (legal AI) that operate in AI-enhanced decision-making and automation. The paper also highlights remaining challenges in structured output, tool use, language mixing, and model safety that overlap with concerns in AI-driven legal, financial, and software engineering applications.
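
The training signal is the notable part: rewards come from automatic verification (an answer checker, unit tests) rather than human-annotated reasoning traces, and DeepSeek-R1's GRPO optimizer normalizes rewards within a group of sampled completions instead of learning a value critic. A toy sketch of that reward and advantage computation; the string-matching verifier is deliberately simplistic:

```python
import statistics

def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Toy verifier: 1.0 if the completion's final answer after '='
    matches the reference exactly, else 0.0. Real verifiers parse math
    or coding outputs, or run unit tests."""
    final = completion.rsplit("=", 1)[-1].strip()
    return 1.0 if final == reference_answer.strip() else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: z-score each sampled completion's reward
    against its own group, so no learned value critic is needed."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # avoid divide-by-zero
    return [(r - mu) / sigma for r in rewards]

# Example: 4 sampled completions for one prompt, 1 verified correct.
rewards = [verifiable_reward(c, "42") for c in ["x = 41", "x = 42", "x = 7", "x = 40"]]
print(group_relative_advantages(rewards))  # correct sample gets the positive advantage
```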

Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR

Jiaming Li, Longze Chen, Ze Gong, Yukun Chen, Lu Wang, Wanwei He, Run Luo, Min Yang
August 27, 2025
Recent advances in Reinforcement Learning with Verifiable Rewards (RLVR) have empowered large language models (LLMs) to tackle challenging reasoning tasks such as mathematics and programming. RLVR leverages verifiable outcome rewards to guide policy optimization, enabling LLMs to progressively improve output quality in a grounded and reliable manner. Despite its promise, the RLVR paradigm poses significant challenges, as existing methods often suffer from sparse reward signals and unstable policy-gradient updates, particularly in RL-based approaches. To address these challenges, we propose PACS, a novel RLVR framework that achieves imPlicit Actor Critic coupling via a Supervised learning framework. By treating the outcome reward as a predictable label, we reformulate the RLVR problem into a supervised learning task over a score function parameterized by the policy model and optimized using cross-entropy loss. A detailed gradient analysis shows that this supervised formulation inherently recovers the classical policy-gradient update while implicitly coupling actor and critic roles, yielding more stable and efficient training. On challenging mathematical reasoning benchmarks, PACS outperforms strong RLVR baselines such as PPO and GRPO, achieving superior reasoning performance; for instance, PACS achieves 59.78% at pass@256 on AIME 2025, improvements of 13.32 and 14.36 points over PPO and GRPO, respectively. This simple yet powerful framework offers a promising avenue for post-training LLMs with verifiable rewards. Our code and data are available as open source at https://github.com/ritzz-ai/PACS.
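
The reformulation in the abstract can be sketched directly: treat the verifiable outcome reward r in {0, 1} as a label for a score s_theta(x, y) derived from the policy model, and minimize binary cross-entropy. A minimal PyTorch sketch under our reading of the abstract; using the summed token log-probabilities as the score is our assumption, and the paper's actual parameterization may differ:

```python
import torch
import torch.nn.functional as F

def pacs_loss(policy_logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """Supervised RLVR objective, per our reading of the PACS abstract.

    policy_logprobs: (batch,) summed token log-probs of each sampled
        response under the policy, used here as the score s_theta(x, y)
        (an assumption for this sketch).
    rewards: (batch,) verifiable outcome labels in {0, 1}.

    Cross-entropy pushes the score up on verified-correct responses and
    down on incorrect ones; its gradient takes a policy-gradient-like
    form weighted by (sigmoid(s) - r), the implicit critic term.
    """
    return F.binary_cross_entropy_with_logits(policy_logprobs, rewards.float())

# Toy usage with fake scores for 4 sampled responses.
scores = torch.tensor([1.2, -0.3, 0.8, -2.0], requires_grad=True)
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
loss = pacs_loss(scores, labels)
loss.backward()
print(loss.item(), scores.grad)  # grad_i = (sigmoid(s_i) - r_i) / batch_size
```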