
Almost Human AI Forum
This one caught my eye in re:invent announcements:
AWS now supports fine-tuning, both with RLVR (verifiable rewards) and RLAIF ("LLM-as-a-judge"). Currently it can only fine-tune their own model (Nova 2), but more models are expected to follow.
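For context on the two reward styles: RLVR means the reward comes from a programmatic check, while RLAIF asks a judge model to score the output. A minimal sketch of the difference (function names and the `judge` callable are my own illustration, not AWS's API):

```python
def verifiable_reward(model_answer: str, expected: str) -> float:
    """RLVR-style reward: check the answer programmatically
    (e.g. a math task with a single checkable final answer)."""
    return 1.0 if model_answer.strip() == expected.strip() else 0.0

def llm_as_judge_reward(model_answer: str, judge) -> float:
    """RLAIF-style reward: a judge LLM scores the answer.
    `judge` is a stand-in for any LLM call returning a 0-10 score."""
    score = judge(f"Rate this answer from 0 to 10:\n{model_answer}")
    return float(score) / 10.0
```

The practical difference: RLVR only works where correctness is mechanically checkable (math, code, extraction), while LLM-as-a-judge covers open-ended tasks at the cost of a noisier signal.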
Nice read: an AMA with the GTM engineer who built Vercel’s famous AI SDR (wow too many TLAs in one sentence…)
https://x.com/DBredvick/status/1995682667128652285?s=20
Especially interesting is the build-vs-buy discussion - sometimes it's just easier to "hard-code" stuff to fit your business like a glove instead of buying a SaaS
https://arxiv.org/abs/2512.04047
https://techcrunch.com/2025/12/05/meta-acquires-ai-device-startup-limitless/
LLMs find exploits in Smart Contracts ("just" a benchmark, but still)
Google DeepMind (UK team) released an ML-based weather-forecasting model (for the 15-day time range); they claim it beats 95% of ensemble models https://www.nature.com/articles/s41586-024-08252-9
https://iceberg.mit.edu/ did you see this? (Specifically https://github.com/AgentTorch/AgentTorch, on top of which this is built.) Could it be useful for you?
wow. been saying this for a while as a risk. didn’t expect it to materialize this quickly.
https://thehackernews.com/2025/11/chinese-ai-model-deepseek-r1-generates.html
https://x.com/sama/status/1990071287750729829?s=46&t=isvMih-bU2zIgFokCUvA4Q
AI scientists are coming
Notice: gemini-3-pro-preview now supports structured output + Google Search grounding + code execution. This is probably the best tool for "automated search" available right now (for data collection)
https://ai.google.dev/gemini-api/docs/structured-output?example=recipe#javascript_4
going to interview https://www.linkedin.com/in/assafe, head of AI at Monday and, before that, builder of GPT Researcher (25k GitHub stars).
our topic is leveraging AI to make SMBs perform as if they have the resources of enterprises.
what would you want me to ask him?
I'm slightly biased ;)
but this interview with Henry Ward, Carta's CEO, is super interesting - especially this section on how Carta uses AI internally: https://youtu.be/lt6reZVgwCk?si=3kpT1PQXI5PGCb3b&t=681 (timestamped it for you, but I think the whole thing is worth watching). Some interesting tidbits:
- Manual data entry has passed from this world
- LLMs can't do accounting. Accounting needs to be deterministic but with the right tools they can write code that does accounting - and it's done "on the fly" by the agent. (Just like a human won't compute things in their head - they would use a calculator or a spreadsheet)
- They don't believe in human-in-the-loop! This part was fascinating to me - treat AI agents like you would Amazon Mechanical Turk freelancers: just have 3 agents do the same work and cross-check!
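The cross-checking idea fits in a few lines: run the same task through three agents and keep the majority answer, escalating on disagreement (the `agents` here are stand-in callables for real LLM calls; the escalation behavior is my own illustration):

```python
from collections import Counter

def cross_check(task: str, agents) -> str:
    """Mechanical-Turk-style quality control: send the same task to
    N independent agents; a strict-majority answer wins, anything
    else gets escalated instead of guessed."""
    answers = [agent(task) for agent in agents]
    (top, count), = Counter(answers).most_common(1)
    return top if count > len(answers) // 2 else "ESCALATE"
```

With 3 agents, two matching answers are enough; three different answers go to escalation - the same redundancy trick crowdsourcing platforms have used for years.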
The term "Generative UI" has been thrown around a lot lately, but this work from Google Research shows that now it's actually very possible:
Ok who’s first to try Google’s Cursor competitor?
Has Gemini 3 + Sonnet inside
Finally - structured output is supported in Claude:
While not practical yet - it's exciting to see that we might get an architecture even better than Transformers in the near future:
https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/
Anthropic is far ahead of the other API companies in thinking through context management. There is a pretty confusing matrix right now between MCP, Agents, and Skills, which I expect them to organize over the coming months.
the most recent announcement is a big one: https://www.anthropic.com/engineering/code-execution-with-mcp
Effectively it's a filter before/after MCP calls - really a data-integration layer - so that raw MCP tool responses don't go into the context.
Interesting tie-in with Skills, insofar as the code written can be aggregated into Skills for reuse.
IMHO this is a facet of a broader trend where the LLM is a writer and an orchestrator of code and underlying simulations.
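A toy illustration of the pattern (the tool and field names are made up for the example): instead of dumping a full MCP tool response into the model's context, the agent-written code runs in a sandbox and only a tiny summary string reaches the LLM:

```python
def fetch_orders():
    """Stand-in for an MCP tool call that returns a huge payload."""
    return [{"id": i, "total": i * 10.0, "debug_blob": "x" * 1000}
            for i in range(500)]

def summarize_for_context():
    """Agent-written glue code: the raw payload stays in the sandbox;
    only a short summary string ever enters the LLM context."""
    orders = fetch_orders()
    revenue = sum(o["total"] for o in orders)
    return f"{len(orders)} orders, total revenue {revenue:.2f}"
```

Here ~500 KB of tool output collapses into one line of context - that's the "data integration layer" point: the model orchestrates the computation without ever reading the raw data.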
Has anyone tried Cursor’s “Composer 1” model? I heard a lot about it during development (if it’s the one I’m thinking of, it was built by Thinking Machines specifically for Cursor).
so far it’s super fast, and at least in planning mode, generates high quality output.
OpenAI launched a security bot that looks very interesting:
https://openai.com/index/introducing-aardvark/
It automatically scans code for vulnerabilities and suggests patches.
It's in private beta - if you're signing up, lmk
I recently wrote a Claude Code skill for CloudWatch analysis. It's built on top of the AWS CloudWatch MCP but teaches Claude:
- tracking through multiple lambdas via correlation/trace IDs
- analyzing performance of logs with no prior requirements other than @tracer.capture_method
- log patterns for errors and other failure modes
- cloudwatch query language
- a date/time helper script, rather than letting Claude Code write ad-hoc Python date manipulations
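On the date/time point - the idea is to hand Claude a deterministic helper instead of letting it improvise timezone math per session. A minimal sketch of such a script (my own version, not the actual skill's):

```python
from datetime import datetime, timedelta, timezone

def cloudwatch_window(hours_back, now=None):
    """Return a (start, end) pair in epoch milliseconds - the format
    CloudWatch Logs Insights query APIs take for time ranges.
    Always UTC, so there is no local-timezone ambiguity."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours_back)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)
```

The win is consistency: Claude calls one audited helper instead of re-deriving epoch math (and occasionally getting UTC offsets wrong) on every query.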
question to the crowd - is this something you’d find useful in your own company? has anyone started working on similar / other skills?
is this something we should collaborate on across companies?
I came across this great podcast episode - https://spotify.link/xbjtSohnZXb
It's an excellent introduction to automations. It's technically aimed at designers / UX people, but I think it's super relevant for anyone.
-
At this point I'm convinced that the next huge gains from AI will come from non-developers starting to build automations.
Automation platforms (e.g. Zapier, N8N, Make.com etc.) are becoming incredibly powerful!
Worth the listen!
that was quick:
https://x.com/cursor_ai/status/1983567619946147967
Cursor 2.0 - with an embedded browser for quick code-test loops
Hey! Going to speak with Nir Hemed of Daisy and would love your feedback re questions I should bring up. They have ±10 agents - most of them internal, but also actual externally facing product agents - that transform property management from a business with 2% margins that customers hate into one with >40% margins and great NPS.
The goal is to take techniques that are considered uncool - LLM evals, guardrails, model benchmarking & selection - and talk about them in the context of why they drive real $$$ impact. Some topics on my list:
- Moving from Model T production-line workflows (everything pre-defined in detail, using Temporal) to open-ended agentic flows while maintaining reliability
- What’s a “production-ready” agent? How do you know yours is (and no, it’s not like porn)?
- Creating moats over time, using feedback loops for ongoing performance improvement
Who’s working on externally facing multi-step agents here?
Turns out AI makes burnout worse in broken cultures. Why? AI doesn't fix your problems - it accelerates them. Broken processes fail 10x faster. Toxic dynamics spread 10x quicker. AI is a culture amplifier, not a culture fix.
Want to actually reduce burnout before AI amplifies it?
Research is clear:
- Set consistent goals that don't change weekly
- Build psychological safety - mistakes should teach, not punish
- Match demands to actual capacity
- Celebrate learning, not just shipping
Treat your AI agents like remote employees.
They can't overhear hallway conversations. They can't absorb tribal knowledge. They can't read between the lines.
Document everything or watch your AI productivity crater.
Remote-first practices aren't just for humans anymore.
Looking at Google's Veo 3.1 announcement
It seems like they really upped their game in making this actually useful (e.g. focus on character consistency, keyframe control, seamlessly stitching multiple videos together, and post-generation edits)
Anyone experimented with this yet?
I love the UX discussion here:
https://ideas.fin.ai/p/a-tale-of-two-agent-builders
The idea of having a "notebook-like" interface to describe a workflow is really novel IMO.
TIL about Differential Privacy - a method that allows you to train / fine-tune (among other things) LLMs while making sure they don't memorize any specific (potentially private) data samples.
Google recently released VaultGemma using this method. It's interesting to think about how it applies to fine-tuning an LLM on data potentially containing PII or other sensitive information.
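The core mechanism behind this (roughly, DP-SGD): clip each example's gradient so no single sample can dominate an update, then add noise calibrated to that clipping bound. A toy sketch in plain Python - real implementations live in libraries like Opacus or TensorFlow Privacy, and proper privacy accounting is a whole topic on its own:

```python
import math
import random

def dp_average_gradient(per_example_grads, clip_norm=1.0,
                        noise_mult=1.0, seed=0):
    """DP-SGD core step: clip each per-example gradient to `clip_norm`,
    average them, then add Gaussian noise scaled to the clip bound -
    bounding how much any one training sample can influence the model."""
    rng = random.Random(seed)
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])
    n, dim = len(clipped), len(clipped[0])
    avg = [sum(g[i] for g in clipped) / n for i in range(dim)]
    sigma = noise_mult * clip_norm / n
    return [a + rng.gauss(0.0, sigma) for a in avg]
```

The privacy/utility trade-off is all in `clip_norm` and `noise_mult`: more noise means stronger guarantees against memorization, at the cost of slower or worse convergence.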
Extracting structured data is a very useful application of LLMs.
Theoretically you could build a pipeline with a "traditional" OCR stack and get cheaper/faster results, but that's quickly becoming untrue. E.g., look at DeepSeek's open-source OCR:
https://github.com/deepseek-ai/DeepSeek-OCR/
Practically free, with high throughput. Still need to benchmark it on real-world data, though.
Claude announced "skills" (not to be confused with connectors/MCP or claude-code's sub-agents) -- an easy way to give Claude self-contained capabilities, including instructions + tools (e.g. scripts).
These are basically the building blocks that enable the "canvas" feature in Claude - now available to end users.
Simon Willison wrote a good article about it:
another great summary on how to boost AI adoption in software engineering:
- Clearly communicate that engineers are expected to use AI (and the policy around it)
- Your data is a strategic asset, organize it well and make it accessible to AI systems (at Aleph we like Notion for things like that)
- Have solid processes, guardrails and safety nets (linters, tests in every layer, good source control practices, rollout and rollback mechanisms, etc.)
- Work in small chunks (one small feature at a time)
- Invest in a good internal platform
for all of you on OpenAI - last week they released their own no-code platform:
scary
I'm 90% confident state-level actors already generate 0-days with LLMs
https://www.thefirstoperator.com/p/what-is-gtm-engineering
Interesting read on automating GTM. From my perspective this is a specific case of automating, well, everything.
Now with no code AI tools this approach becomes extra easy
Question - does anyone have people in their org specifically designated to maintain automation platforms (e.g. Zapier / n8n) and help others automate away processes?
Answer us: almosthuman@aleph.vc
You can now build apps inside ChatGPT
https://openai.com/index/introducing-apps-in-chatgpt/ @Omer Chehmer fyi (!)
And all the rest of you consumer-facing people - this is something we should be paying attention to
Who’s tinkering with fine tuning open models?
https://thinkingmachines.ai/blog/announcing-tinker/
Thinking Machines (Mira Murati’s company) releases an API for fine tuning (similar to OpenAI’s but for open models)
hey all. just posted the below on linkedin:
https://www.linkedin.com/feed/update/urn:li:activity:7380522735300411393/?originTrackingId=KgOIi3YOKmDTqUEaEcqbhw%3D%3D
the next episode of Almost Human will be about meta learning. this is the brief we created for it: https://alph.vc/ah-meta-learning
What questions would you want to see covered in the episode?
One line summary: AI that learns how to learn, enabling faster adaptation to new tasks with minimal data.
Founder’s lens: why it matters: Imagine you’re building an AI startup in oncology. Google has millions of CT scans, you have 42. Meta learning lets your model generalize from how it has learned in the past, so that those 42 scans are enough to get to a working product. This is the wedge: you don’t need to beat incumbents on raw data, you just need to adapt faster in niches they overlook.
Value proposition: Meta learning slashes the time and cost to train new models. Instead of collecting massive, labeled datasets, startups can enter markets that were previously “too small” or “too fragmented” for Big Tech — and still deliver AI performance that feels magical.
Why is it important for early stage entrepreneurs?
- Leveling the field: Big Tech wins on compute + data. Meta learning flips the game to speed + adaptability.
- Capital efficiency: You don’t need to raise $100M just to assemble training data.
- Go-to-market edge: You can win niche verticals quickly by adapting your model to each customer or workflow.
What type of startups should take note?
- Vertical AI SaaS (law, healthcare, logistics, insurance). E.g. healthcare: detect rare cancers with <100 cases available.
- Robotics/edge AI (factories, drones, retail automation): Adapt a warehouse robot to new inventory items in hours.
- Cyber/fraud startups (cat-and-mouse): Adjust to new scams or exploits in real time.
- B2B SaaS platforms with customer-specific customization needs. E.g. customer support: train on a company’s style guide overnight.
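To make the "42 scans" point concrete: the classic meta-learning evaluation is N-way K-shot classification, where a model that has learned *how* to learn adapts to new classes from a handful of examples. A toy nearest-centroid version in the style of prototypical networks (the embeddings here are fake stand-ins for what a meta-trained encoder would produce):

```python
def nearest_centroid_classify(support, query):
    """Few-shot classification: `support` maps label -> list of
    embedding vectors (the K 'shots' per class), `query` is one
    embedding; return the label of the closest class centroid."""
    def centroid(vecs):
        return [sum(dim_vals) / len(vecs) for dim_vals in zip(*vecs)]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    prototypes = {label: centroid(vecs) for label, vecs in support.items()}
    return min(prototypes, key=lambda lb: dist2(prototypes[lb], query))
```

All the heavy lifting in a real system is in meta-training the encoder so that "close in embedding space" means "same class" even for classes never seen in training - which is exactly why 42 examples can be enough.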
OpenAI Guide Context Engineering.pdf
good summary:
Here’s the summarised tech version, with 10 key insights you can apply now:
#1 Choose memory per task → trimming = short, summarization = long
#2 Trim cleanly → always by full turns, never mid-message
#3 Smarter summaries → structure prompts (env, steps, blockers) + check order/contradictions
#4 Context budgets → size max_turns by usage, keep latest N verbatim
#5 Async done right → don’t block summarization; re-check after ops complete
#6 Metadata hygiene → keep debug/timestamps out of messages
#7 Idempotency → repeated calls shouldn’t duplicate or distort summaries
#8 Progressive summarization → compress older content, mark boundaries clearly
#9 Evaluation harnesses → replay transcripts + LLM-as-judge to measure accuracy
#10 Watch for poisoning → track false info entry/propagation, log before/after token counts
Bottom line → Context engineering = knowing what to forget, not just what to remember.
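Insights #2 and #4 together in code: trim history by whole turns only, keeping the latest N verbatim. A minimal sketch (a real agent would summarize what it drops rather than discard it, per #8):

```python
def trim_history(messages, max_turns):
    """Keep only the last `max_turns` full user/assistant turns.
    Never cut mid-message: a turn boundary is a 'user' message,
    so everything kept starts cleanly at a user turn."""
    turn_starts = [i for i, m in enumerate(messages)
                   if m["role"] == "user"]
    if len(turn_starts) <= max_turns:
        return messages
    return messages[turn_starts[-max_turns]:]
```

Cutting mid-turn (e.g. keeping an assistant reply without the user message that prompted it) is exactly the failure mode #2 warns about - the model sees answers to questions that no longer exist.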
https://x.com/patrickc/status/1972716417280860391?s=46&t=isvMih-bU2zIgFokCUvA4Q
Payments in ChatGPT, powered by Stripe, are coming
Consumer folks pay attention
For everyone who was too distracted yesterday by other developments...
Anthropic announced Claude 4.5 and Claude Code 2.0
https://www.anthropic.com/claude/sonnet
From the way they describe Sonnet 4.5, it's clear that Claude Code is shaping up to be a general-purpose assistant for everyday tasks, not just coding
Finally - Cursor CLI (so us vim/jetbrains people can also have fun!)
this can be an important tool for problems that have a small number of samples.
curious if AI natives have an opinion on whether this should be considered meta learning
for all you frontend devs out there:
massive potential impact on education: https://arxiv.org/html/2509.13348v2
And in case you missed it - Notion AI is becoming Agentic: https://www.notion.com/blog/introducing-notion-3-0
-
I think Notion is one of the very few AI apps I actually enjoy using.
https://techblog.cloudkitchens.com/p/study-and-update-on-genai-devex?hide_intro_popup=true Very interesting writeup by CloudKitchens on their adoption of GenAI coding tools. TL;DR:
- There is no quality deterioration
- In some tasks LLMs give huge productivity gains, in others - less. Variance is extremely high
- Self-built tools (root cause analysis from observability tools, query helper, etc.) show great adoption if they "live" within existing systems
I strongly recommend reading the whole thing, though. These guys are very methodical.
Story time: in Aleph's Slack, we have a bunch of old "single-channel/multi-channel" guests - from before "Slack Connect" was a thing. We have almost 200 guests who don't really log in / connect with us via the newer Slack Connect features (with their own Slack workspace, like all of you here do). I wanted to deactivate them.
It turns out that Slack only provides API access for that on the "Enterprise" tier. For us commoners, it's the admin dashboard: click each user, select "deactivate account", then confirm.
Not wanting to waste time on this, I fired up Claude, connected it to my browser with browsermcp, and just told it to deactivate them one by one. Mission accomplished.
Actually, to be honest - making Claude loop over all the users took some effort. I had to ask Claude Code to write a script that executes Claude in a loop. Still, zero coding required on my end.
https://cloud.google.com/blog/products/ai-machine-learning/announcing-agents-to-payments-ap2-protocol Google announced Agent to Agent Payment Protocol - supporting crypto payments too!
Cool new feature in Claude Code: multi-directory support - https://x.com/claudeai/status/1967998401649447408 I've personally been waiting for it - I'm tired of telling it to cd into the correct frontend/backend directory in monorepo projects
And in case you missed it - OpenAI released a new model (GPT-5-codex) to be used, well... in Codex: https://openai.com/index/introducing-upgrades-to-codex/
I’ll just put this here: https://www.linkedin.com/pulse/fiverr-going-back-startup-mode-micha-kaufman-jfe6f
I came across Anchor Browser - https://app.anchorbrowser.io/playground - demo looks fantastic for automating browser stuff (perfect for automating data extraction <> form filling for places where there is no API) Anyone else tried them?
Geeking-out alert: an engineer from Thinking Machines (Mira Murati's company) found the bug that causes LLMs to be non-deterministic even with temperature=0
https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/
(I live for those tidbits)
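The root cause in one line: floating-point addition isn't associative, so when inference kernels change their reduction order (e.g. with batch size - the blog's "batch invariance" point), logits shift slightly, and at temperature=0 a different token can win the argmax. You can see the underlying non-associativity directly:

```python
# Floating-point addition is not associative: summing the same
# numbers in a different order gives a (slightly) different result.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)   # False
print(a, b)     # the two sums differ in the last bit
```

Scale that last-bit wobble up to billions of accumulations inside matmuls, and greedy decoding stops being reproducible unless the kernels are made batch-invariant - which is exactly what the blog post proposes.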
i love this: https://x.com/omarsar0/status/1965808896314155378
deep research on top of code bases + external content to get tips & tricks how to use the specific repo/tool.
Hey, just got around to bringing my thoughts (and Claude’s ;) ) about last week’s meetup to writing:
- The software engineer’s role is changing - engineers shift to writing technical PRDs + managing coding agents (and some even shift to prompt engineering, which in itself is a kind of product management)
- Task switching is hard! theoretically one engineer can “manage” 10 Cursor agents in parallel, but in practice it’s a challenge to keep context on each task + switch.
- Engineering-adjacent roles are also changing - manual QA people starting to build automations, designers starting to build “live” designs in lovable / inside the product
- The process becomes ever more important - strict code review, security review, integration tests — all become critical to maintain trust.
- Also here - there is AI assistance (e.g. automatically flag PRs that require a security review)
- Finally - although we all subjectively feel productivity gains, there is yet no clear measurable velocity gain
https://openai.com/index/why-language-models-hallucinate/
OpenAI claims that fixing hallucinations is basically as easy as changing the reward function to penalize wrong answers and encourage the model to say "I don't know"
check this out https://x.com/alterego_io/status/1965113585299849535
https://arxiv.org/html/2508.05004v1 reminds me of generative adversarial networks. Maybe the start of self-improvement loops.

