TYPE0//PAPERS

breaking papers · 58 analyzed

The most important papers, decoded.

AI-powered analysis of breakthrough research from arXiv and beyond. We surface the work that matters before it hits the news cycle.

  • arXiv:2606.28277·1h 32m ago

    An AI Peer-Reviewed 10,000 Research Papers. Now the Experiment Is Published.

    ICML (the International Conference on Machine Learning) and STOC (the ACM Symposium on Theory of Computing), two of computer science's most prominent conferences, ran a Google-built system called the Paper Assistant Tool (PAT). It returned feedback in roughly 30 minutes and caught 34% more math errors than a single-pass AI review, and the experiment is now on the public record.

  • arXiv:2606.18543·4h 29m ago

    A Rule-Based Script Beat All But Three Frontier LLMs in Princeton's 500-Day AI CEO Test

    CEO-Bench, a Princeton benchmark, gave 14 AI agents 500 simulated days to run a virtual subscription-software company from $1M and zero customers. A rule-based algorithm with no language model placed fourth on the leaderboard, ahead of every frontier LLM except three.

  • arXiv:2605.18055·4h 34m ago

    ICML 2026: a Shanghai team names the 'Gene Dimension Curse,' and builds a diffusion framework to fix AI's tissue-gene predictions

    Spatial transcriptomics, which reads gene activity while preserving each cell's location in tissue, is slow and expensive; the field has long tried to predict it from routine pathology slides. An ICML 2026 paper from a Shanghai-led team argues those predictions silently lose biological structure as more genes are modeled together, names the failure the 'Gene Dimension Curse,' and proposes a structure-aware diffusion framework called FLAG (Foundation-model representation with Latent diffusion Alignment via Graph).

  • arXiv:2606.27566·9h 14m ago

    A Pattern Built to Stay in Frame During the Last Meters of Spacecraft Docking

    An arXiv preprint proposes AstraTag, a recursive printed reference designed so a docking spacecraft's camera keeps recognizing the target at close range, where today's flat markers like AprilTag fall out of view.

  • arXiv:2606.27416·9h 14m ago

    A $450 AI research workflow caught four data leaks in its own winning submission

    Twelve parallel coding agents on a single laptop, with about $450 in API costs, ran the framework's submission to the British Council's BEA 2026 vocabulary-difficulty shared task. Its verifier—a deterministic Python script enforcing process invariants—stripped the four leaking features before submission and moved the headline score from 0.609 to 0.802 on the standard regression-error scale. The corrected entry won the closed track (training data only).

  • arXiv:2606.27397·9h 15m ago

    SidConArena Tests Whether Frontier AI Agents Can Actually Trade

    A new arXiv benchmark runs large language models through a simulated economy of negotiation, production, and sealed-bid auctions. The provisional result: stronger models earn more, yet consistently misvalue resources, bargain passively, and stumble on long-horizon planning.

  • arXiv:2606.27409·9h 17m ago

    When AI Fact-Checker Agents Arrive Too Late, the Whole System Starts to Oscillate

    A new arXiv preprint puts a closed-form number on the delay after which AI 'referee' agents flip a multi-agent LLM network from consensus to oscillation, and shows that grounding critics in retrieved facts suppresses the instability entirely.

  • arXiv:2602.15763·15h 31m ago

    Z.ai's GLM-5.2 nears Anthropic's Mythos on cybersecurity, and anyone can download it

    Chinese AI lab Z.ai released GLM-5.2, a freely downloadable model that its researchers say matches Anthropic's restricted Mythos on bug-finding while trailing US models on general tasks, exposing a limit in US chip-export controls.

  • arXiv:2605.05365·20h 2m ago

    Beyond DeepSeek: A Field Guide to the New Open-Model Makers

    Open AI model releases no longer come from one corner of the ecosystem. Frontier training labs, Big Tech chip sellers, state-backed "sovereign-AI" programs, and product companies now all publish the trained parameters behind their models, each for different reasons.

  • arXiv:2606.23050·1d ago

    Baidu Open-Sourced an OCR Model That Treats Long Documents Like a Human Reading a Thick Book

    The architecture keeps memory flat as pages pile up — no accuracy loss, no slowdown. The real test is whether the underlying pattern works beyond OCR.

  • arXiv:2605.11086·1d ago

    OpenAI's Sol Preview Makes Safety Part of the Reasoning Pitch

    GPT-5.6 Sol arrived with a same-day deployment-safety page — not weeks later as a compliance footnote. The bundling choice reframes what enterprise buyers are actually being asked to evaluate.

  • arXiv:2605.24220·2d ago

    How to Run a Coding Agent on Your Own Hardware

    Sebastian Raschka's new tutorial walks developers through building an on-machine alternative to paid AI coding assistants like Codex and Claude Code. It pairs a downloadable open-weight language model with local tooling that reads files, edits code, and runs commands, and frames the setup as a fallback for the day your provider throttles.

  • arXiv:2603.20897·2d ago

    A Data Center Near London Is Making Its Neighbors' Heat Wave Hotter

    Slough, home to a campus owned by Equinix and Digital Realty and hosting computing for Amazon, Google, Oracle, and Microsoft, ran several degrees hotter than its surroundings during Europe's record June heatwave, and Cambridge researchers warn the data-center heat-island effect could eventually expose up to 340 million people across the continent to similar risk.

  • arXiv:2508.12631·2d ago

    After a tokenizer change spiked its inference bill, Weave open-sources its model router

    The local proxy routes routine coding turns to cheaper open-weight models like DeepSeek or Kimi, reserving frontier models for harder work, and claims the top spot on RouterArena's accuracy-cost leaderboard at 76.09.

  • arXiv:2104.11203·2d ago

    Cursor Tab retrains every 90 minutes. That is what AI "learning on the job" actually looks like.

    The code editor's autocomplete model retrains itself from 400 million user requests a day. On-policy self-distillation (OPSD), a technique where a model updates its own weights from session predictions rather than from a graded benchmark, may be the mechanism underneath a wave of recent flagship models.

  • arXiv:2503.10118·2d ago

    A Top Chinese AI Insider Says the Cycle Looks Like 1998. He Expects Most AI Startups Not to Survive It.

    Zhang Yaqin, head of Tsinghua's Institute for AI Industry Research, draws a layered map of the 2026 cycle: AI infrastructure is in a generational build-out, but the startup layer above it is overcapitalized. He expects roughly 20 robotics winners to emerge from several hundred within four years.

  • arXiv:2408.03314·2d ago

    The AI leaderboard number is misleading, and OpenAI's Noam Brown wants a curve instead

    Static benchmark scores hide how much inference-time compute a model spends per question, conflating raw capability with reasoning budget. Brown proposes performance-versus-cost and performance-versus-time curves so cheaper models and frontier systems can be compared honestly.

  • arXiv:2606.18839·3d ago

    ICML 2026 paper hands vision AI a 'change tolerance certificate'

    Accepted at one of machine learning's three flagship research venues, a paper from the University of Melbourne and Australia's Defence Science and Technology Group extracts a closed-form prediction-invariant interval from CLIP-style vision-language models, the AI systems that classify images by matching them to text prompts. The result is a provable readout of how far an image can shift along a prompt-defined direction, such as 'more triangular,' before the top prediction flips.
