Wave 2: AI for Science

Wave 2 asked panelists to provide forecasts about the future role and limitations of AI across science and medicine.

First released on:
10 November 2025

The following report summarizes responses from 277 experts, 58 superforecasters, and 1,022 members of the public, collected between August 18 and September 15, 2025. Among the expert respondents were 63 computer scientists, 57 industry professionals, 60 economists, and 97 research staff at policy think tanks.

Our wider website contains more information about LEAP, our Panel, and our Methodology, as well as reports from other waves.

Questions

  • Millennium Prize: What is the probability that AI will solve or substantially assist in solving a Millennium Prize Problem in mathematics by the following resolution dates?

  • Diffusion of AI Across Sciences: What percent of publications in the fields of Physics, Materials Science, and Medicine in 2030 will be ‘AI-engaged’ as measured in a replication of this study?

  • Drug Discovery: What percent of sales of recently approved U.S. drugs will be from AI-discovered drugs and products derived from AI-discovered drugs in the years 2027, 2030 and 2040?

  • Electricity Consumption: What percent of U.S. electricity consumption will be used for training and deploying AI systems in the years 2027, 2030 and 2040?

  • Cognitive Limitations, Part II: By the end of 2030, what percent of LEAP expert panelists will agree that each of the following is a serious cognitive limitation of state-of-the-art AI systems?

For full question details and resolution criteria, see below.

Results

In this section, we present each question and summarize the forecasts made and the reasoning underlying those forecasts. More concretely, we present (1) background material, historical baselines, and resolution criteria; (2) graphs, results summaries, and results tables; and (3) rationale analyses and rationale examples. In the first three waves, experts and superforecasters wrote over 600,000 words supporting their beliefs. We analyze these rationales alongside the predictions to provide significantly more context than the forecasts alone on why experts believe what they believe and what drives their disagreements.

Millennium Prize

Question. What is the probability that AI will solve or substantially assist in solving a Millennium Prize Problem in mathematics by the following resolution dates?

Results. Experts estimate a 10% chance that AI will solve or substantially assist in solving a Millennium Prize Problem by 2027,1 rising to 20% by 2030,2 and 60% by 2040.3 All categories of experts, superforecasters, and the public largely predict similarly across timescales. However, there is wide disagreement among experts: the top quartile of experts think there’s at least an even (50%) chance of AI assistance by 2030, whereas the bottom quartile think there’s only a 10% chance. The disagreement for 2040 is even larger: the interquartile range for expert medians is 30%–81%, while the top decile of experts believe there’s a 95% chance and the bottom decile think there’s only a 10% chance.

Millennium Prize. The figure above shows the distribution of forecasts by participant group, illustrating the median (50th percentile) and interquartile range (25th–75th percentiles) of each forecast.

Rationale analysis:

  • DeepMind/Navier-Stokes: High-forecast respondents frequently cite the DeepMind CEO's January 2025 statement that, in partnership with a team of mathematicians, they're “close to solving” one of the problems (later identified as Navier-Stokes) within “a year or year and a half” (Ansede 2025). This is treated as strong, concrete evidence. Low-forecast respondents generally either don't mention this or dismiss it as a corporate pronouncement. One expert cites the Clay Institute president's June 2025 claim: “We're very far away from AI being able to say anything serious about any of those problems” (Heaven 2025).
  • Benchmarks: Many high-forecast respondents point to International Mathematical Olympiad gold medals (Luong and Lockhart 2025) and FrontierMath progress as evidence of rapid capability growth in mathematical reasoning that will likely continue. Low-forecast respondents tend to argue that these are fundamentally different challenges. One mathematician notes: “The Math Olympiad is targeted toward gifted high school students spending an afternoon on a problem solvable with known techniques…[whereas] Millennium Prize problems can consume entire careers without a solution.” Multiple forecasters note FrontierMath Tier 4 (which poses much harder problems than Tiers 1-3) has <10% solve rates.
  • The nature of Millennium problems: High-forecast respondents commonly emphasize that math is verifiable, has clear structure, and that some problems (Navier-Stokes, Birch–Swinnerton-Dyer) may be suited to AI-assisted numerical exploration or pattern recognition. Low-forecast respondents often express doubts that Millennium Problems are solvable with the current AI paradigm, emphasizing doing so requires “deep conceptual breakthroughs,” “developing new concepts and mathematical rules,” and “truly out of the box thinking.” One domain expert writes: “The current generation of AI does not seem to be able to do this sort of creative mathematical work at all. It can apply known techniques and get novel results, but these results would be very easy for top working mathematicians.”
  • Base rates and timelines: High-forecast respondents mostly don't engage with base rates, or they argue that AI changes the game fundamentally. By contrast, many low-forecast respondents emphasize that only one of the seven problems has been solved in the 25 years since the prize was announced, and that some of the problems have remained unsolved for more than a century. They also highlight Millennium Prize rules: upon the publication of a solution, a minimum of two years must pass before a prize can be awarded, to allow time for adequate verification. (In the case of the one prize that was awarded, the gap between the publication of the solution and the awarding of the prize was over seven years.) This, many low-forecast respondents point out, renders the 2027/2030 dates almost impossible regardless of technical progress.
  • "Substantially assist" interpretation: High-forecast respondents tend toward a broad interpretation—any meaningful acceleration of human-AI hybrid research counts, whereas low-forecast respondents tend toward restrictive interpretation. One notes the resolution criteria require contribution “likely not producible without AI,” which is a higher bar.
  • Architecture sufficiency: Most high-forecast respondents believe incremental improvements over current LLM capabilities will be sufficient, especially when paired with specialized tools (Lean, AlphaProof) and human collaboration. Low-forecast respondents frequently argue the current LLM paradigm fundamentally cannot do this. Multiple forecasters say we need “entirely new architectures” (neurosymbolic systems were mentioned several times) or that a “pattern matching paradigm doesn't extend to the deep creativity required.”
  • Difficulty of achieving superhuman performance: Although rarely discussed by high-forecast respondents, a few low-forecast respondents expressed doubts that this could be achieved, with one writing, “Training a model to do math at the level of human experts might be a qualitatively different ML problem from training a model to do math surpassing expert capabilities. RL training requires creating problems with reward functions...We haven't achieved that with reasoning post-training yet.”

Diffusion of AI Across Sciences

Question. What percent of publications in the fields of Physics, Materials Science, and Medicine in 2030 will be ‘AI-engaged’ as measured in a replication of this study?

Results. Experts predict a 10x increase (from 3% to ~30%) in AI-engaged papers in Physics,9 Materials Science,10 and Medicine,11 between 2022 and 2030. This means that experts predict roughly 5x faster year-by-year progress in the next 8 years than we saw from 1985 to 2022.12 However, experts disagree: the interquartile range for expert medians is 15–50%, the bottom decile of experts believe less than 10% of papers will be AI-engaged, and the top decile of experts believe more than 70% of papers will be AI-engaged. The public expects meaningfully less diffusion than experts and superforecasters, predicting about 21%,13 roughly two thirds as much as experts.
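The “roughly 5x” comparison can be reconstructed from the figures above if “year-by-year progress” is read as the compound annual growth rate of the AI-engaged share. The sketch below is our illustration under that assumption, using the predicted rise from 3% to ~30% over 2022–2030 and the 11x/8x/14x historical multipliers from footnote 12; it is not the report’s own calculation.

```python
# Rough reconstruction (our arithmetic, not the report's) of "roughly 5x faster
# year-by-year progress," treating progress as a compound annual growth rate.
historical_multipliers = {"Physics": 11, "Materials Science": 8, "Medicine": 14}
historical_years = 2022 - 1985              # 37 years (footnote 12)
predicted_multiplier = 30 / 3               # predicted ~10x increase in AI-engaged share
predicted_years = 2030 - 2022               # 8 years

predicted_rate = predicted_multiplier ** (1 / predicted_years) - 1   # ~33% per year
for field, multiplier in historical_multipliers.items():
    historical_rate = multiplier ** (1 / historical_years) - 1       # ~6-7% per year
    ratio = predicted_rate / historical_rate
    print(f"{field}: predicted annual growth is {ratio:.1f}x the 1985-2022 rate")
# Prints ratios of roughly 4.5x-5.8x, consistent with "roughly 5x faster".
```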

Diffusion of AI Across Sciences. The figure above shows the distribution of forecasts by participant group, illustrating the median (50th percentile) and interquartile range (25th–75th percentiles) of each forecast.

Rationale analysis:

  • Measurement concerns: A significant proportion of forecasters predict low numbers because they believe it will be difficult to determine AI-engagement by abstracts alone: “AI will become so prevalent that its use will be assumed--like the use of a computer--and not mentioned in abstracts.” But several high-forecast respondents point out that “major publishers (e.g., Nature, Elsevier) increasingly require AI/tool disclosure,” even if “the details of AI tools are still discussed in the main body of the papers.”
  • Trend extrapolation: Many high-forecast respondents emphasize that AI engagement across the academic fields under study increased by a combined 1293% since 1985 (Duede et al. 2024), that this “rapid increase is mostly associated with the seven years between 2015 and 2022,” and that “large increases in the percentage of AI-engaged have surely already occurred by Q3 of 2025 compared with the results of the 2022 study.” Low-forecast respondents often question the wisdom of anchoring off these historical trends.
  • AI capabilities: High-forecast respondents frequently emphasize that “AI will become a ubiquitous research tool” due to domain-specific models, agentic systems, and improved AI literacy among scientists. Low-forecast respondents tend to suspect that unreliability and a lack of interpretability will be a bottleneck. Some argue that to be truly useful in all aspects of research, a breakthrough in the underlying architecture is probably required.
  • Cultural resistance: A few low-forecast respondents consider that older researchers may be reluctant to adopt AI, and the introduction of AI-literate researchers into these fields will likely be gradual: “Integrating AI into fields that have barely engaged with it in the past will be naturally slow, both because of the lack of interdisciplinary knowledge and natural resistance to change from the existing body of researchers in these fields.”
  • Physics: Many high-forecast respondents highlight that extensive AI use is likely in data-rich, math-heavy fields of study where AI's mathematical and computational capabilities will be helpful, in particular: particle physics, astrophysics, high-energy physics, quantum systems, anomaly detection, and large-scale simulations. Some low-forecast respondents, however, express skepticism that theory-heavy subfields will see much engagement by 2030: “AI is way less useful for theoretical papers as there is black box problem of not being able to test the hypothesis against causal empirical findings.” Another writes that, in addition to lacking the requisite capabilities, “physics has a conservative publishing culture -- theorists in particular won't add AI to their abstracts unless the method is clearly central.”
  • Materials Science: As with physics, high-forecast respondents often emphasize that AI is well-suited to assist with data-rich, math-heavy subfields like molecular design and discovery, computational chemistry, energy storage, inverse design, high-throughput screening, and simulation AI coupling. One notes: “Materials science is...the one with the highest potential to be AI-engaged in the next decade due to its large dependence on knowledge coming from sophisticated combinatorics from a fixed set of elements.” Some also stress that industrial demand for accelerated discovery, and the financial rewards that might follow, will provide a strong push. One low-engagement forecaster stressed that “materials science likely [won’t] benefit from vast troves of data crossing disciplines unless new methods of AI accessible data collection are developed.”
  • Medicine: High-forecast respondents commonly note that AI is already used in imaging, diagnostics, genomics, drug discovery, medical natural language processing, and personalized medicine design: “Medicine shows the most rapid uptake, fueled by the pervasive use of AI in medical imaging diagnostic algorithms, and clinical decision support systems.” One emphasizes that “AI could enable the utilization of the enormous amount of medical data contained in electronic health records (EHRs).” Low-forecast respondents focus more on the possibility that regulatory, data privacy, validation, and ethical concerns could limit use, and that the number of observational clinical case studies and trials, which are unlikely to involve AI, will keep the percentage low: “A significant portion of papers are observational, often reporting causal effects. There isn't much room for AI in these sorts of papers, as current statistical methods are more reliable and bias-free, compared to AI.”

Drug Discovery

Question. What percent of sales of recently approved U.S. drugs will be from AI-discovered drugs and products derived from AI-discovered drugs in the years 2027, 2030 and 2040?

Results. Experts predict that 1.6% of recently approved drug sales in 2027 will come from AI-discovered drugs,13 5% by 2030,14 and 25% by 2040.15 Experts, superforecasters, and the public largely forecast similarly for 2027 and 2030, but sharply diverge in 2040. Experts predict almost double the sales relative to the public (25% vs 15%16), and superforecasters predict almost double relative to experts (45%17 vs 25%), with superforecasters in aggregate assigning a 25% probability to more than 70% of sales coming from AI-discovered drugs. Drug discovery is one of the few settings where superforecasters are meaningfully more optimistic than experts.

Drug Discovery. The figure above shows the median 50th percentile (as well as 25th and 75th percentiles when applicable) forecasts by participant group.
Drug Discovery. The figure above shows pooled probability distributions. We estimate each forecaster’s full probability distribution from their 25th, 50th, and 75th percentile forecasts by fitting the cumulative distribution function of an appropriate distribution (i.e., a beta or gamma distribution) to the observed forecasts using nonlinear least squares. We then sample from these fitted distributions and plot the aggregated distribution for each forecaster category.
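As an illustration of the pooling procedure described in the caption above, the sketch below fits a beta distribution to one hypothetical forecaster’s elicited quantiles via nonlinear least squares (using SciPy) and then samples from the fit. The quantile values, the beta-only choice, and the optimizer settings are our assumptions for illustration, not the report’s actual pipeline.

```python
# Minimal sketch: recover a full distribution from three elicited percentiles,
# then sample from it (pooling such samples across all forecasters in a group
# yields the aggregated distribution plotted in the figure).
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import beta

target_probs = np.array([0.25, 0.50, 0.75])   # cumulative probabilities elicited
forecasts = np.array([0.10, 0.25, 0.43])      # hypothetical 25th/50th/75th percentile forecasts (shares on a 0-1 scale)

def beta_cdf(x, a, b):
    # Beta CDF with shape parameters (a, b); curve_fit searches over these.
    return beta.cdf(x, a, b)

# Nonlinear least squares: choose (a, b) so the beta CDF passes as closely as
# possible through the three (forecast, cumulative probability) pairs.
(a_hat, b_hat), _ = curve_fit(beta_cdf, forecasts, target_probs, p0=[2.0, 2.0], bounds=(1e-3, 1e3))

# Draw samples from the fitted distribution for this forecaster.
draws = beta.rvs(a_hat, b_hat, size=10_000, random_state=0)
print(a_hat, b_hat, np.percentile(draws, [25, 50, 75]))
```

For quantities where a beta distribution is not appropriate, the caption indicates a gamma distribution is fit instead (presumably for outcomes not naturally bounded on a 0–1 scale); the same quantile-matching idea applies.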

Rationale analysis:

  • FDA approval timelines: Several high-forecast respondents believe that AI will accelerate discovery-to-market timelines through faster design-make-test-analyze loops and potentially AI-enabled pharmacodynamic simulations that could streamline clinical trials. Others note that drug discovery-to-market timelines can be shortened significantly during times of crisis via EUAs (emergency use authorizations). Low-forecast respondents commonly emphasize regulatory realities that may limit AI's impact on approval timelines: “Given that the median time it takes to get through the FDA approval process is over 10 years, and no AI-discovered drugs appear to have started Phase III trials yet,18 2027 is likely too soon for many, if any, new AI drugs to be approved.”
  • Phase I success rates: Many high-forecast respondents note that AI-discovered drugs already demonstrate significantly higher Phase I success rates, and that “extrapolating from current rates of increase in the number of proposed AI drugs, these will constitute a majority of new clinical trial submissions.” Several low-percentage forecasters, however, think that “the turnaround time between Phase I and approval will not speed up substantially for AI-invented drugs,” because “early entrants sped through Phase I but then quickly reverted to the mean in Phase II.”
  • AI ubiquity by 2040: Many high-forecast respondents expect that by 2040, AI will become “a standard discovery tool.” One writes, “by then, I expect AI to be fully embedded in how drug discovery is done” and another that, “as some companies invest in AI adoption and see payoffs, more competitors and startups will do the same in order to compete.” But several low-percentage forecasters argue institutional inertia will slow adoption: “While there are some pharma companies that have widely embraced the usage of deep learning models...much of the industry is fairly slow moving to adopt new techniques....[The] market shift will likely take a decade at least to fully change the nature of biomedical research.”
  • What qualifies as "AI-discovered": Most high-forecast respondents interpret the term broadly, with one expert noting, “It gets tricky to pin down what counts as an AI-discovered drug. Based on you using reports like the Boston Consulting Group… I'm assuming you go for a fairly broad sense, i.e., that earlier attempts at ML assistance in drug discovery count as AI.” Low-percentage forecasters tend to question whether narrow tools will count: “Traditional methods (including Bayesian methods and random forest) with simple computational features can already perform well on drug property prediction...do Bayesian methods and random forest count as AI?”
  • Power laws: Some high-forecast respondents hypothesize that AI-discovered blockbusters may dominate sales: “Sales will probably follow a power law of some sort (i.e., a small number of drugs will have a large number of sales). If AI invents one or more blockbusters, then sales might be highly skewed.” Echoing that sentiment, another writes, “there's a fat tail from the possibility that one or more AI-discovered wonder drugs gets rapid approval and huge sales.”​ Several low-forecast respondents point to the possibility that AI may instead “increase the availability of treatments for rare diseases, but these drugs would likely not make up a large part of total new drug sales.”
  • Regulatory environment: A few high-percentage forecasters cite the potential for deregulation under the current administration and emphasize what seems to be a “bipartisan recommendation to reduce [the] FDA drug timeline.” Several low-percentage forecasters see the current political climate as constraining: “I kept my estimates low in the near term because of the ongoing disruption to FDA approval and drug discovery pipelines under the current U.S. Government and Trump Administration.”

Electricity Consumption

Question. What percent of U.S. electricity consumption will be used for training and deploying AI systems in the years 2027, 2030 and 2040?

Results. The median expert predicts that 4% of U.S. electricity consumption will be used for training and deploying AI systems in 2027.19 That rises to 7% of all electricity consumption in 2030,20 and close to double that (12%) in 2040.21 For context: 7% is 1.5x today’s entire data-center load, and 12% is close to all of Texas’ electricity use. The top quartile of experts predict that AI training and deployment will account for more than 20% of total U.S. electricity consumption by 2040, and the top 10% believe it will account for more than 30%.22 20% is almost all of the industrial sector’s electricity use. Experts and the public predict similarly across all dates, while superforecasters are slightly less optimistic—predicting 3%,23 6%,24 and 10%.25

Electricity Consumption. The figure above shows the median 50th percentile (as well as 25th and 75th percentiles when applicable) forecasts by participant group.
Electricity Consumption. The figure above shows pooled probability distributions. We estimate each forecaster’s full probability distribution from their 25th, 50th, and 75th percentile forecasts by fitting the cumulative distribution function of an appropriate distribution (i.e., a beta or gamma distribution) to the observed forecasts using nonlinear least squares. We then sample from these fitted distributions and plot the aggregated distribution for each forecaster category.

Rationale analysis:

  • Infrastructure: Many high-forecast respondents emphasize that tech companies are making unprecedented capital commitments to AI infrastructure: “With the building of data centers and dramatic investments in large-scale infrastructure, such as Project Stargate, alongside policies and executive orders enabling the leasing of federal land for more AI data centers…I absolutely think electricity consumption for AI in the near term will skyrocket.” Others point to the possibility that competition between companies and nations for supremacy in AI may lead to “an explosion in energy usage.” Low-forecast respondents tend to focus more on potentially formidable constraints, particularly when considering “the material and political investments necessary to get significant growth—physical data centers, chips, permitting, water for cooling, transmission lines, etc.”
  • Projections: Many forecasters anchor on institutional projections: “The Boston Consulting Group says 7.5% in 2030 [for data centers]. Bloomberg says 8.6% [for data centers] by 2035…AI-specialized servers accounted for 15% of total global data center demand in 2024. Electricity usage by specialized AI hardware accounted for 11-20% in 2024. Demand implies that a share of AI-related U.S. data center electricity consumption of 50/180=27.8%.”
  • Efficiency gains: A common low-forecaster consideration was that “there is a huge amount of scope for efficiency gains [and] things will become more efficient as they scale up,” and several noted that “DeepSeek appears to consume considerably less power” than frontier U.S. models. High-forecast respondents, however, largely don’t believe efficiency improvements will meaningfully reduce overall consumption: “More efficient chips and algorithms will simply lead to more compute going into training and inference.” One argues, “Jevons paradox will surprise on the upside - i.e., even more efficient chips will mean more usage with little savings in terms of energy efficiency.”
  • AI bubble: Many low-forecast respondents emphasize business model concerns: “The current economics of all of this are obviously not sustainable: we're spending billions of dollars on these data centers in support of AI companies, or AI segments of hyperscalers, that are losing huge amounts of money off their core AI businesses.” Multiple forecasters note “AI will need to begin demonstrating economic and social utility rapidly to justify the investments in physical infrastructure necessary to sustain rapid growth in energy consumption.”
  • Geopolitical competition: Some high-forecast respondents emphasize that “China is also investing massive amounts in datacenters,” with one writing “there's a chance that we enter an arms race that is mostly determined by who can pump the most electricity into AI.” A low-forecast respondent suggests this dynamic, instead of resulting in U.S. growth, could drive infrastructure offshore: “Major developers will possibly respond by increasingly outsourcing the physical infrastructure of data processing to locales outside of the US—there's no particular reason why models need to be trained inside of U.S. borders where the economic and political expenses are potentially much higher.”
  • Scaling plateau: High-forecast respondents typically expect continued returns from scaling: “I think it's more likely than not that transformative superintelligence will have arrived by 2040, in which case almost all knowledge work …will be reliant on the use of AI models.” Whereas low-growth forecasters commonly express that they anticipate diminishing returns: “Unless further scaling leads to qualitative leaps in AI capabilities, the growth in electricity consumption for AI training and deployment is unlikely to be exceptionally rapid;” “We have already seen indications that the limits of scaling may be soon reached.”
  • Denominator effects: Rarely emphasized by high-forecasters, low-forecast respondents frequently pointed out that “while growth in power production will likely be necessary for much growth in AI systems, overall power consumption is the denominator in this calculation, and increased demand from electric vehicles could play a role in growing this denominator” and therefore “data centers' energy consumption will grow in absolute terms, but not so much in relative terms.”

Cognitive Limitations, Part II

Question. By the end of 2030, what percent of LEAP expert panelists will agree that each of the following is a serious cognitive limitation of state-of-the-art AI systems?

Results. Experts predict that memory, hallucination, reasoning, and inter-system collaboration are not likely to be judged as serious cognitive limitations for AI in 2030 (20%,26 35%,27 30%,28 and 30% respectively29), whereas metacognition, embodiment, and generalization are predicted to be more serious (55%,30 50%,31 50%32). There’s substantial disagreement between experts: for example, half of experts’ best guesses for whether Embodiment will be rated a serious limitation span 30-75%, whereas the bottom decile of experts predicts less than 20%, and the top decile predicts more than 90%. There’s no consensus that any particular limitation will turn out to be serious. The public predicts fairly uniformly across limitations,33 which means they predict substantially lower than experts on Metacognition, Generalization and Embodiment.

Cognitive Limitations, Part II. The figure above shows the distribution of forecasts by participant group, illustrating the median (50th percentile) and interquartile range (25th–75th percentiles) of each forecast.

Rationale analysis:

  • General. Forecasters disagree strongly on whether incremental improvements through scaling and post-training enhancements will suffice to overcome AI's cognitive limitations. Low-forecast respondents tend to believe that “reasoning, memory, and embodiment are all something that can be improved with more computing power, more parameters, and more data.” High-forecast respondents often emphasize that limitations are “intrinsically interlinked,” and that to truly solve them, “we need entirely new architectures.”
  • Hallucination/inaccuracy. Many low-forecast respondents emphasize that the frequency of hallucinations has already been substantially reduced, with some predicting that “productized retrieval, tool use, and verifiers [will] cut hallucinations enough that only a minority will still call them serious.” Some also predict that users will become increasingly adept at prompting to avoid hallucinations. Several high-forecast respondents maintain that “even with retrieval-augmented generation and tool use, hallucinations will remain a widely recognized constraint especially in high-stakes settings like medicine and law.”
  • Shallow reasoning. Low-forecast respondents often highlight recent progress, noting, “the advancements that have been made in just the last year to agentic systems and reasoning engines have [been] shocking levels of improvements.” They tend to think that test-time compute, tool-augmented reasoning, and other inference enhancements will lead to additional improvements. High-forecast respondents, however, tend to think that, “while AI can be expected to achieve superhuman performance in formal closed-system domains like math, code, and certain scientific domains, the ability to perform robust informal reasoning about the messy open world remains a frontier problem that will likely require new architectural breakthroughs.”
  • Long-term memory. There is relative consensus that the memory issue is solvable, with disagreements mainly about timeline and completeness. Low-forecast respondents point to technical solutions: “Memory systems (vector databases, stateful agents, persistent contexts) are improving rapidly.” High-forecast respondents frequently acknowledge progress but emphasize limitations: “How to enable AI to work continuously over years like a human expert remains a significant and currently challenging problem.” They focus more on technical challenges, including “context rot which limits the use of explicit in-context prompting as a tool for memory,” and “the general limitations of RAG [retrieval augmented generation].”
  • Limited ability to generalize. Low-forecast respondents often emphasize that transfer learning will help, along with “the explosion, and use, of synthetic data, greater progress in test-time learning, and programmatic reasoning.” Some high-forecast respondents acknowledge that “few-shot learning and transfer learning continue to improve,” but tend to think that an inability to generalize is inherent to the underlying architecture, and that “failures on out-of-distribution or truly novel tasks will still be seen as a serious barrier especially by economists and policymakers.”
  • Metacognition and Continual Learning. Many forecasters identify this as the core unsolved limitation. Low-forecast respondents are scarce and many high-forecast respondents are emphatic: “I deeply believe metacognition is the number one limitation today and that it will remain the case by 2030.” They emphasize that models “don't know what they don't know” in that they are unable to reliably self-correct, defer when appropriate, or guide themselves through the acquisition of new skills and knowledge—and there is no clear solution to address these issues on the horizon. One forecaster bluntly states: “I don't know how to solve this with RL [reinforcement learning]. Serious fundamental advances are needed.”
  • Embodiment/Robotics. Disagreements center on timelines. Low-forecast respondents commonly point to recent progress: “If you view a Waymo vehicle as basically a robot on wheels, it is apparent that the problem of navigating in physical space in the real world is manageable.” High-forecast respondents tend to emphasize fundamental constraints: “To generate training data we need to deploy robots in real-world settings to collect data. However, doing this on a large scale presents practical challenges.” A roboticist forecaster agrees: “Embodied AI is significantly more challenging than many in the AI community believe. It is severely data restricted.” Many high-forecast respondents think use cases in unstructured environments (agriculture, construction) are likely to be particularly constrained.
  • Inter-system collaboration. Many low-forecast respondents emphasize that this “appears to be in large part a systems integration and standardization problem that will see major progress in light of strong commercial incentives.” They note rapid progress has already been made via the deployment of tools that “permit a supervised semi-autonomous optimization framework wherein instruction sets, for submission to subordinate models, are able to be optimized, evaluated, and aligned with human experts.” Some low-forecast respondents, however, point out that “AI systems represent various individuals, companies, and organizations, each with their own interests,” leading to a “lack of interoperable standards.” One notes: “While short-term collaboration between AI systems is feasible and not a major obstacle, sustained, coordinated collaboration over extended periods remains challenging.”

Footnotes

  1. Raw data: IQR on the 50th percentile was (3.0%–20.0%). 90th percentile of median forecast: 44.5%.

  2. Raw data: IQR on the 50th percentile was (10.0%–50.0%). 90th percentile of median forecast: 65.4%.

  3. Raw data: IQR on the 50th percentile was (30.3%–80.8%). 90th percentile of median forecast: 95.0%.

  4. In some cases, the "aggregate" refers to the mean; in others, the median is used, depending on which is more appropriate for the distribution of responses.

  5. We occasionally elicit participants' quantile forecasts (estimates of specific percentiles of a continuous outcome) to illustrate the range and uncertainty of their predictions.

  6. The expert is referring to and quoting from Ansede (2025).

  7. The expert is referring to MacKenzie (1999).

  8. The expert appears to be referring to an August 2025 post by OpenAI researcher Sebastien Bubeck (Bubeck 2025).

  9. Raw data: Median 50th percentile forecast: 25%; IQR on the 50th percentile was (15.0%–50.0%).

  10. Raw data: Median 50th percentile forecast: 30%; IQR on the 50th percentile was (15.0%–60.0%).

  11. Raw data: Median 50th percentile forecast: 30%; IQR on the 50th percentile was (20.0%–50.0%).

  12. For comparison, from 1985 to 2022—37 years—these fields saw 11x, 8x, and 14x more AI-engaged papers.

  13. Medians for Physics: 20%; Materials Science: 20%; Medicine: 24%.

  14. Raw data: IQR on the 50th percentile was (2.2%–10.0%); median 25th and 75th percentile forecasts were 2.0% and 10.0% respectively.

  15. Raw data: IQR on the 50th percentile was (10.0%–50.0%); median 25th and 75th percentile forecasts were 10.0% and 43.1% respectively.

  16. Raw data: IQR on the 50th percentile was (6.9%–30.0%); median 25th and 75th percentile forecasts were 10.0% and 23.0% respectively.

  17. Raw data: IQR on the 50th percentile was (15.0%–70.0%); median 25th and 75th percentile forecasts were 21.0% and 80.0% respectively.

  18. This was true at the time this expert completed the survey.

  19. Raw data: IQR on the 50th percentile was (3.0%–6.4%); median 25th and 75th percentile forecasts were 2.0% and 7.0% respectively.

  20. Raw data: IQR on the 50th percentile was (5.0%–10.0%); median 25th and 75th percentile forecasts were 4.5% and 12.0% respectively.

  21. Raw data: IQR on the 50th percentile was (8.0%–19.7%); median 25th and 75th percentile forecasts were 6.0% and 20.0% respectively.

  22. Pooled distribution: IQR (6.31%–24.13%); variance decomposition: 46.37% between-forecaster disagreement, 53.63% within-forecaster uncertainty.

  23. Raw data: IQR on the 50th percentile was (2.0%–4.0%); median 25th and 75th percentile forecasts were 1.5% and 4.5% respectively.

  24. Raw data: IQR on the 50th percentile was (3.7%–8.0%); median 25th and 75th percentile forecasts were 3.0% and 8.0% respectively.

  25. Raw data: IQR on the 50th percentile was (6.0%–15.0%); median 25th and 75th percentile forecasts were 5.0% and 15.0% respectively.

  26. Raw data: IQR on the 50th percentile was (10.0%–40.0%).

  27. Raw data: IQR on the 50th percentile was (20.0%–60.0%).

  28. Raw data: IQR on the 50th percentile was (20.0%–57.0%).

  29. Raw data: IQR on the 50th percentile was (10.0%–40.0%).

  30. Raw data: IQR on the 50th percentile was (30.0%–70.0%).

  31. Raw data: IQR on the 50th percentile was (30.0%–75.0%).

  32. Raw data: IQR on the 50th percentile was (25.0%–60.0%).

  33. The median response for each limitation falls between 25% and 35%.

Cite Our Work

Please use one of the following citation formats to cite this work.

APA Format

Murphy, C., Rosenberg, J., Canedy, J., Jacobs, Z., Flechner, N., Britt, R., Pan, A., Rogers-Smith, C., Mayland, D., Buffington, C., Kučinskas, S., Coston, A., Kerner, H., Pierson, E., Rabbany, R., Salganik, M., Seamans, R., Su, Y., Tramèr, F., Hashimoto, T., Narayanan, A., Tetlock, P. E., & Karger, E. (2025). The Longitudinal Expert AI Panel: Understanding Expert Views on AI Capabilities, Adoption, and Impact (Working paper No. 5). Forecasting Research Institute. Retrieved 2025-12-17, from https://leap.forecastingresearch.org/reports/wave2

BibTeX

@techreport{leap2025,
    author = {Murphy, Connacher and Rosenberg, Josh and Canedy, Jordan and Jacobs, Zach and Flechner, Nadja and Britt, Rhiannon and Pan, Alexa and Rogers-Smith, Charlie and Mayland, Dan and Buffington, Cathy and Kučinskas, Simas and Coston, Amanda and Kerner, Hannah and Pierson, Emma and Rabbany, Reihaneh and Salganik, Matthew and Seamans, Robert and Su, Yu and Tramèr, Florian and Hashimoto, Tatsunori and Narayanan, Arvind and Tetlock, Philip E. and Karger, Ezra},
    title = {The Longitudinal Expert AI Panel: Understanding Expert Views on AI Capabilities, Adoption, and Impact},
    institution = {Forecasting Research Institute},
    type = {Working paper},
    number = {5},
    url = {https://leap.forecastingresearch.org/reports/wave2},
    urldate = {2025-12-17},
    year = {2025}
}