Insights from Waves 1, 2, and 3

An analysis of forecasts across Waves 1, 2, and 3: understanding expert views on headline AI indicators, applications of AI to science, and broad adoption.

First released on:
10 November 2025

Executive Summary

Background

There are major disagreements about the rate of AI progress and its future impact. Leaders of AI companies forecast a future where AI cures all diseases, replaces whole classes of jobs, and supercharges GDP growth by the 2030s. Others argue that AI’s impact will amount to little more than a modest boost in productivity—if anything at all. Existing surveys suggest that experts and the general public are similarly split on their beliefs about AI progress and its impact.

Despite these disparate narratives around AI, there has been little detailed work documenting forecasts from comprehensive groups of computer scientists, frontier AI company employees, economists, policy experts, and the general public. What exactly do experts believe about AI progress and impact, where do they disagree, and what rationales underpin their predictions?

The Longitudinal Expert AI Panel (LEAP) provides the most comprehensive understanding yet of expert forecasts on the future of AI and—crucially—the reasons for them. Launched in June 2025, LEAP is an in-depth, monthly survey of experts and the general public that tracks forecasts on key AI progress indicators including benchmarks, labor market impacts, and scientific discovery. LEAP elicits thousands of forecasts from superforecasters with proven track records of accurate forecasts, experts in industry, economics, policy, and computer science, as well as the general public. Forecasters also provide detailed rationales explaining how they arrived at their predictions—for the first three waves, experts and superforecasters submitted over 600,000 words supporting their views.

Based on Forecasting Research Institute’s (FRI) expertise in assembling expert panels and eliciting high-quality forecasts, we aim to make LEAP the most useful and widely-cited single source of expert and public opinions on the future of AI. LEAP is led by researchers at FRI, Stanford University, the Federal Reserve Bank of Chicago, and the University of Pennsylvania, with the support of Princeton AI Lab.

Sample

Invitees include top-cited AI and ML scientists, key technical staff at frontier AI companies, prominent economists, and influential policy experts from a broad range of NGOs.

Our respondent sample of 339 experts includes:

  • Top computer scientists. 41 of our 76 computer science experts (54%) are professors, and 30 of these 41 (73%) are from top-20 institutions (according to CSRankings.org). Ten (13%) are among the 200 top-cited authors in AI (according to OpenAlex). This category also includes researchers at academic and non-academic research institutions.
  • AI industry experts. 20 of our 76 industry respondents (26%) work for one of five leading AI companies: OpenAI, Anthropic, Google DeepMind, Meta, and Nvidia. Another 21 industry respondents (28% of the total) either work for a top AI company (a top-20 model provider by training compute, as measured by Epoch AI), were identified as contributors to a top-15 LLM (by training compute on Epoch AI or performance on Chatbot Arena), or work for one of the top 30 AI-related companies by fundraising (according to Crunchbase).
  • Top economists. 54 of our 68 economist respondents (79%) are professors, and 30 (44%) are from top-50 economics institutions (according to RePEc).
  • Policy and think tank group. We have 119 respondents working in the field of AI policy. We invited participants from leading institutions such as Brookings, RAND, AEI, Epoch AI, AI Now, and the Stanford Institute for Human-Centered Artificial Intelligence.
  • TIME 100. Our panel includes 12 honorees from TIME’s 100 Most Influential People in AI in 2023 and 2024.

Reweighting our expert sample to match a carefully constructed expert sampling frame leaves our headline conclusions unchanged.

For more details about the sample, see below.

Questions

Since the launch of LEAP in June 2025, respondents have completed the first three waves of monthly surveys, answering 18 total questions. These first waves focused on the broad societal impact of AI, AI for science, and the adoption of AI.

For the full set of questions asked in each wave, see here.

Results

Across the first three waves, five patterns stand out:

  1. Experts expect sizable near‑term societal effects from AI by 2040. (More)

    On frontier capability milestones, 23% of experts expect saturation of the FrontierMath benchmark by 2030, and the median expert judges it more likely than not (60%) that AI substantially assists in solving a Millennium Prize Problem by 2040. On broad impacts of AI, the average expert thinks that 23% of LEAP panelists will say the world most closely mirrors a “rapid” AI progress scenario by 2030, where AI writes Pulitzer Prize-worthy novels, collapses years-long research into days and weeks, outcompetes any human software engineer, and independently develops new cures for cancer. By 2040, experts give a 32% chance that AI will be at least as impactful as a "technology of the millennium," such as the printing press or the Industrial Revolution.

  2. Experts disagree and express substantial uncertainty about the trajectory of AI. (More)

  3. The median expert expects significantly less AI progress than leaders of frontier AI companies. (More)

    For example, the median expert predicts 2% growth in white-collar employment by 2030 (compared to a 6.8% trend extrapolation). While this implies a substantial potential impact, it contrasts with Dario Amodei's prediction of 10-20% overall unemployment within the next five years and Elon Musk's suggestion that all jobs might be replaced by 2030.

  4. Experts predict much faster AI progress than the general public. (More)

    For example, experts forecast that the 2030 autonomous vehicle share of all U.S. rideshare trips will be 20%, while the public forecasts 12%. For the proportion of work hours assisted by generative AI by 2030, experts forecast 18%, vs. a public median forecast of 10%.

  5. There are few differences in predictions between superforecasters and experts, but, where there is disagreement, experts tend to expect more AI progress. We don’t see systematic differences between the beliefs of computer scientists, economists, industry professionals, and policy professionals. (More)

For more details on each of these results, see below.

Sample rationale analysis. LEAP forecasters provide detailed written rationales explaining their forecasts. We expect that much of the value of this project comes from the argumentation and reasoning associated with the forecasts. We provide a summary of forecasters’ arguments about the likelihood of AI substantially assisting with solving a Millennium Prize Problem below. We have similar analyses for all of our survey questions in individual wave reports.

Next Steps

This document presents results from the first 3 waves of LEAP surveys, but much of the value of the project will come from future planned work. For example:

  • As questions resolve, we’ll identify the most accurate forecasters and any differences in accuracy among expert groups. We can then present forward-looking forecasts from the most accurate AI forecasters.
  • Every several months we will re-survey respondents on a variety of questions to see how their views on AI evolve over time.
  • We’ll identify short-term cruxes that are most indicative of long-run AI progress and impacts.

See more details on our future plans here.

Resources

For full methodological details behind LEAP, see our white paper.

The text below offers high-level and cross-cutting takeaways. For full results and rationale summaries for all questions, see the following reports:

Sample

LEAP targets prominent experts whom policymakers, business and nonprofit leaders, and other stakeholders would be most inclined to consult regarding the progression of AI capabilities and AI's technological impact. Specifically, we include four expert communities.

  1. We invite computer scientists researching topics in AI by including top-cited authors, stratified by age, and the authors of the top-rated papers at leading AI and ML conferences.
  2. We include industry professionals, identified via their contributions to frontier models or employment at AI-related companies with extensive fundraising.
  3. We identify leading economists, both across fields and within the subfield of economics focused on measuring the economic effects of AI and new technology. We include top-cited authors of papers on AI and technology, members of the US Economic Experts Panel (Clark Center Forum 2025), and attendees of economics conferences on AI.
  4. We identify research staff at think tanks and other institutions leading the discussion on AI development, policy, and impacts.

For more information on our sampling frame, see Appendix A of the LEAP white paper.

Expert respondents. Our respondent sample of 339 experts includes:

  • Top computer scientists. 41 of our 76 computer science experts (54%) are professors, and 30 of these 41 (73%) are from top-20 institutions (according to CSRankings.org). 23 (30%) had top-rated (top-40 or better) papers at NeurIPS or ICLR in recent years, and eight others (11%) are PhD students or postdocs who are highly cited according to our criteria. Ten (13%) are among the 200 top-cited authors in AI (according to OpenAlex). This category also includes researchers at academic and non-academic research institutions. Our CS respondents have a median of 7,100 citations (for the 95% of panelists for whom data is available).
  • AI industry experts. 20 of our 76 industry respondents (26%) work for one of five leading AI companies: OpenAI, Anthropic, Google DeepMind, Meta, and Nvidia. Another 21 industry respondents (28% of the total) either work for a top AI company (a top-20 model provider by training compute, as measured by Epoch AI), were identified as contributors to a top-15 LLM (by training compute on Epoch AI or performance on Chatbot Arena), or work for one of the top 30 AI-related companies by fundraising (according to Crunchbase). The remaining respondents were recategorized from our CS literature sampling pools, referral sampling, or other categories. Our industry respondents have a median of 9,100 citations (for the 59% of panelists for whom data is available).
  • Top AI economists. 54 of our 68 economist respondents (79%) are professors, and 30 (44%) are from top-50 economics institutions (according to RePEc). Our economist respondents have a median of 2,200 citations (for the 96% of panelists for whom data is available).
  • Policy and think tank group. Of our 119 AI policy respondents, 75 (60%) work for the following most-represented organizations (unordered): Brookings, RAND, Epoch AI, Federation of American Scientists, Center for Security and Emerging Technology, AI Now, Carnegie Endowment, Foundation for American Innovation, Stanford’s Institute for Human-Centered Artificial Intelligence and other related groups, GovAI, Institute for AI Policy and Strategy, Future of Life Institute, Institute for Law & AI, Center for a New American Security, Data & Society Research Institute, Abundance Institute, and the Centre for International Governance Innovation.
  • TIME 100. Our panel includes 12 honorees from TIME’s 100 Most Influential People in AI in 2023 and 2024. (TIME 100 honorees are categorized by their expertise and distributed among the above categories.)

In addition to domain experts, respondents included:

  • 60 highly accurate forecasters (“superforecasters”), identified based on performance in prior geopolitical forecasting tournaments.
  • 1,400 members of the public, largely consisting of especially engaged participants in previous research, reweighted to be nationally representative of the U.S.

In the first three waves, the median expert respondent spent 44 minutes on each survey, while the median member of the public and superforecaster spent 29 and 90 minutes, respectively.

Individuals with certain viewpoints might be disproportionately likely to respond to our survey conditional on receiving an invitation, skewing our results toward the viewpoints associated with a high propensity to respond. To address concerns about nonresponse bias, we use raking, a standard approach in public polling, to adjust aggregate statistics to be representative of the sampling frames. These adjustments do not substantially change any of our results.
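As a rough illustration of the mechanics, below is a minimal sketch of raking (iterative proportional fitting) in Python. The margins, categories, and target shares are hypothetical stand-ins, not the actual LEAP sampling-frame variables; see the white paper for the real procedure.

import numpy as np

def rake(indicators, targets, n_iter=100, tol=1e-10):
    """Minimal raking (iterative proportional fitting).

    indicators: list of (n,) integer arrays; each array codes every
                respondent's category on one margin (e.g., expert group).
    targets:    list of 1-D arrays giving the sampling frame's share of
                each category on the corresponding margin (each sums to 1).
    Returns respondent weights normalized to mean 1.
    """
    w = np.ones(len(indicators[0]))
    for _ in range(n_iter):
        max_shift = 0.0
        for cats, shares in zip(indicators, targets):
            factors = np.ones_like(w)
            total = w.sum()
            for k, share in enumerate(shares):
                mask = cats == k
                current = w[mask].sum() / total
                if current > 0:
                    factors[mask] = share / current
            w *= factors  # this margin now matches its target exactly
            max_shift = max(max_shift, np.abs(factors - 1).max())
        if max_shift < tol:  # every margin already matched: converged
            break
    return w / w.mean()

# Hypothetical example: six respondents, two margins (e.g., expert group
# and region), reweighted to frame shares of 50/50 and two-thirds/one-third.
group = np.array([0, 0, 0, 0, 1, 1])
region = np.array([0, 1, 0, 1, 0, 1])
weights = rake([group, region], [np.array([0.5, 0.5]), np.array([2 / 3, 1 / 3])])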

For more information on our approach to reweighting, see Appendix A of the LEAP white paper.

Survey Questions

Below is a summary of each question posed in each wave. Participants were also provided with additional context, background information, baselines, and resolution criteria, which are not reproduced here for brevity. For examples of additional question details, see our wave reports. For full text of all questions, see Appendix E of the LEAP white paper.

Wave 1: Headliners

July 2025

  1. FrontierMath. What will be the highest percentage accuracy achieved by an AI model on FrontierMath, by 2025, 2027, and 2030?
  2. Autonomous Vehicle Trips. What percentage of U.S. ride-hailing trips will be provided by autonomous vehicles that are classified SAE Level 4 or above in the years 2027 and 2030?
  3. Occupational Employment Index. What will the percent change in the number of jobs (compared to Jan 1, 2025) in the U.S. be for white-collar, blue-collar, and pink-collar occupations, by 2027 and 2030?
  4. General AI Progress. At the end of 2030, what percent of LEAP panelists will choose “slow progress”, “moderate progress”, or “rapid progress” as best matching the general level of AI progress? (See detailed scenario descriptions here.)
  5. Technological Richter Scale. At the end of 2040, how will experts rank the historical significance of artificial intelligence? Will its impact be more comparable to the internet, the Industrial Revolution, or the emergence of humans? (More details here.)
  6. Cognitive Limitations of AI. What do you see as the main cognitive limitations of AI systems in 2025?

For full results on each question, see our wave 1 report.

Wave 2: AI for Science

August 2025

  1. Millennium Prize. Will AI solve or substantially assist in solving a Millennium Prize Problem in mathematics by 2027, 2030, and 2040?
  2. Diffusion of AI Across Sciences. What percent of publications in the fields of Physics, Materials Science, and Medicine in 2030 will be ‘AI-engaged’ as measured in a replication of this study?
  3. Drug Discovery. What percent of sales of recently approved U.S. drugs will be from AI-discovered drugs and products derived from AI-discovered drugs in the years 2027, 2030, and 2040?
  4. Electricity Consumption. What percent of U.S. electricity consumption will be used for training and deploying AI systems in the years 2027, 2030, and 2040?
  5. Cognitive Limitations, Part II. By the end of 2030, what percent of LEAP expert panelists will agree that each of the following is a serious cognitive limitation of state-of-the-art AI systems?
  6. Barriers to Adoption, Part I. What do you see as the main barriers to adopting current or future state-of-the-art AI systems for broader use in society?

For full results on each question, see our wave 2 report.

Wave 3: Adoption

September 2025

  1. AI Investment. What will be the global private investment (in billion USD) in AI in 2027 and 2030?
  2. Generative AI Use Intensity. What percent of U.S. work hours will be estimated as assisted by generative AI in 2025, 2027, and 2030?
  3. Personalized Education. What percentage of weekly instructional hours on average will K-12 students in the United States spend using AI-powered tutoring or teaching tools in 2027 and 2030?
  4. Open vs Proprietary Polarity. What will be the mean benchmark performance of the best closed-weight AI models and the top open-weight AI models on the following set of benchmarks by 2025, 2027, and 2030?
  5. AI Companions. What proportion of U.S. adults will self-report using AI for companionship at least once daily by 2027, 2030, and 2040?
  6. Barriers to Adoption, Part II. By the end of 2030, what percent of LEAP expert panelists will say that each of the following factors has significantly slowed AI adoption relative to popular expectations?

For full results on each question, see our wave 3 report.

Key Insights

We summarize key insights below. For additional analysis of each insight, see our white paper.

1. Experts expect sizable societal effects from AI by 2040

In particular, the median expert expects substantial impacts on the ability of AI systems to solve difficult math problems, the use of AI for companionship and work, electricity usage from AI, and investment in AI. Even the lower end of the expert belief distribution still implies substantial impacts of AI:

  • Work: The median expert forecast is that 18% of work hours will be assisted by generative AI in 2030, up from approximately 4.1% in November 2024 (Bick et al. 2025), roughly a 4x increase.1 The bottom quartile of experts gives a forecast of 9%, while the top quartile gives a forecast of 30%. The median expert gives a 25% chance the value is 9% or lower (still a 2x increase) and a 25% chance it exceeds 28%.2
  • Private AI investment: The median expert predicts $260 billion of private AI investment by 2030, up from $130 billion in 2024. The median expert gives a 25% chance that investment will be at or below $175 billion, still nearly a third higher than the baseline value, and another 25% chance that investment matches or exceeds $400 billion, just over 3x the baseline level.
  • Electricity usage: The median expert predicts that 7% of U.S. electricity consumption will be used for training and deploying AI systems in 2030, and close to double that (12%) in 2040. For context, 7% is 1.5x today’s entire data-center load, 13% is all of Texas’ electricity use, 23% is almost all of the industrial sector’s electricity use, and 40% accounts for all residential electricity use. Even experts expecting less electricity consumption give substantial median forecasts: the bottom quartile of experts still predicts 5% in 2030 and 8% in 2040.
  • Math research: 23% of experts predict that the FrontierMath benchmark will be saturated by the end of 2030,3 meaning that AI can autonomously solve a set of math problems that resemble those a math PhD student might spend several days completing. The bottom quartile of experts expects 60% or less of these problems to be solved by 2030, still substantially more than the 19% baseline at the time of the survey. By 2040, experts predict it is more likely than not (60%) that AI will substantially assist in solving one of the Millennium Prize Problems, a set of seven problems that includes some of the most difficult unsolved problems in mathematics.
  • Companionship: The median expert predicts that by 2030, 15% of adults will self-report using AI for companionship, emotional support, social interaction, or simulated relationships at least once daily, up from 6% today. By 2040, that number doubles to 30% of adults.

To assess the broader scope of AI’s impacts, we asked experts to assess “slow,” “moderate,” and “rapid” scenarios for AI progress, and how AI will compare to other historically significant developments such as the internet, electricity, and the Industrial Revolution. We found:

  • Speed of AI progress: By 2030, the average expert thinks that 23%4 of LEAP panelists will say the world most closely mirrors a “rapid” progress scenario, which we described as: AI writes Pulitzer Prize-worthy novels, collapses years-long research into days and weeks, outcompetes any human software engineer, and independently develops new cures for cancer.5 Conversely, the average expert believes that 28% of panelists will indicate that reality is closest to a slow-progress scenario, in which AI is a useful assisting technology but falls short of transformative impact.
  • Societal impact: By 2040, the median expert predicts that the impact of AI will be comparable to a “technology of the century,” akin to electricity or automobiles. Experts also give a 32% chance that AI will be at least as impactful as a “technology of the millennium,” such as the printing press or the Industrial Revolution, and just a 16% chance that AI will be equally or less impactful than a “technology of the year” like the VCR.6

The median expert predicts 2%7 growth in white-collar jobs between January 2025 and December 2030. This is significantly slower than a recent linear trend, which would predict 6.8% growth. However, we did not collect forecasts on the causal effect of AI on white-collar employment.8 While some experts expect AI to cause white-collar job loss, this question does not allow for a clear understanding of that causal relationship.

We summarize the expert forecasts for these various indicators in Figures 1, 2, and 3.

Figure 1: Median expert forecasts for various questions. We display the 10th, 25th, 50th, 75th, and 90th percentiles of the median forecasts given by experts at each date. For example, if a quarter of experts give a median forecast of 5% or lower, the 25th percentile series in the graph will lie at 5%; these series are not confidence intervals (a short illustrative sketch follows the figure captions). Where available, we include a historical baseline.
Figure 2: Average expert forecasts on the General AI Progress question.
Figure 3: Average expert forecasts on the Technological Richter Scale question.
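To make these fan series concrete, here is a minimal sketch, with made-up numbers, of how a percentile-of-medians series can be computed: each plotted line is a percentile taken across experts’ median forecasts at each date, not an uncertainty band around a single estimate.

import numpy as np

# Made-up data: each row is one expert's median (50th percentile)
# forecast, in percent, for the ends of 2025, 2027, and 2030.
median_forecasts = np.array([
    [5.0,  9.0, 15.0],
    [3.0,  7.0, 10.0],
    [6.0, 12.0, 25.0],
    [4.0, 10.0, 18.0],
    [8.0, 16.0, 30.0],
])

# One plotted series per percentile, computed across experts at each date.
for q in (10, 25, 50, 75, 90):
    print(f"{q}th percentile of expert medians:",
          np.percentile(median_forecasts, q, axis=0))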

2. Experts disagree and express substantial uncertainty about the trajectory of AI

While the median expert predicts substantial AI progress, and a sizable fraction of experts predict fast progress, experts disagree widely. Notably, the top quartile of experts gives a median forecast that 50% of newly approved drug sales in the U.S. in 2040 will be from AI-discovered drugs, compared to a median forecast of just 10% for the bottom quartile of experts.9 Further, the top quartile of experts assigns a probability of at least 81% to AI solving or substantially assisting in solving a Millennium Prize Problem by 2040, compared to just 30% for the bottom quartile of experts.10 We use our pooled distributions to quantify the relative importance of within-forecaster uncertainty and between-forecaster disagreement. We find that, across all forecasting questions that allow forecasters to express their uncertainty, within-forecaster uncertainty explains 49% of the total variation in forecasts, compared to the 51% explained by between-forecaster disagreement.
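This decomposition follows the law of total variance: the variance of the pooled distribution equals the average variance within individual forecasters’ distributions (uncertainty) plus the variance of forecasters’ means (disagreement). The simulation below, with invented numbers, illustrates the split; it is a sketch of the idea, not the LEAP estimation code.

import numpy as np

rng = np.random.default_rng(0)

# Invented setup: each forecaster's subjective distribution is represented
# by draws around a personal mean (disagreement) with a personal spread
# (uncertainty). LEAP elicits quantiles; random draws stand in for them here.
n_forecasters, n_draws = 300, 2000
personal_means = rng.normal(20.0, 8.0, size=n_forecasters)
personal_sds = rng.uniform(4.0, 10.0, size=n_forecasters)
draws = rng.normal(personal_means[:, None], personal_sds[:, None],
                   size=(n_forecasters, n_draws))

within = np.var(draws, axis=1).mean()  # E[Var(X | forecaster)]: uncertainty
between = np.var(draws.mean(axis=1))   # Var(E[X | forecaster]): disagreement
total = within + between               # law of total variance

print(f"within-forecaster (uncertainty) share:   {within / total:.0%}")
print(f"between-forecaster (disagreement) share: {between / total:.0%}")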

In Figure 4 below, we plot the pooled distributions for expert forecasts on the share of work hours assisted by generative AI and FrontierMath scores by the ends of 2025, 2027, and 2030.

Figure 4: Pooled distributions for expert forecasts on Work Hours Assisted by Generative AI (top panels) and FrontierMath scores (bottom panels). These pooled distributions combine within-expert uncertainty and between-expert disagreement. Densities are normalized to the same peak for comparability. See the 'Uncertainty and Disagreement' section of the LEAP white paper for details.

3. The median expert expects significantly less AI progress than leaders of frontier AI companies

Leaders of frontier AI companies have made aggressive predictions about AI progress.

Dario Amodei, co-founder and CEO of Anthropic, predicts:

  • January 2025: “By 2026 or 2027, we will have AI systems that are broadly better than almost all humans at almost all things.”
  • March 2025: In a response to the US Office of Science and Technology Policy, Anthropic claimed that it anticipates that by 2027, AI systems will exist that equal the intellectual capabilities of "Nobel Prize winners across most disciplines—including biology, computer science, mathematics, and engineering."
  • May 2025: Amodei has stated that AI could increase overall unemployment to 10-20% in the next one to five years.

Sam Altman of OpenAI states:

  • January 2025: “I think AGI will probably get developed during [Donald Trump’s second presidential] term, and getting that right seems really important.”

Elon Musk, leader of xAI and Tesla, writes:

  • December 2024: “It is increasingly likely that AI will superset [sic] the intelligence of any single human by the end of 2025 and maybe all humans by 2027/2028. Probability that AI exceeds the intelligence of all humans combined by 2030 is ~100%.”
  • August 2025: When a user posted “By 2030, all jobs will be replaced by AI and robots,” Musk responded: “Your estimates are about right.”

Demis Hassabis, CEO and co-founder of Google DeepMind, predicts:

  • August 2025: “We'll have something that we could sort of reasonably call AGI, that exhibits all the cognitive capabilities humans have, maybe in the next five to 10 years, possibly the lower end of that.”
  • August 2025: “It’s going to be 10 times bigger than the Industrial Revolution, and maybe 10 times faster.”

While we cannot directly compare these claims to LEAP questions, LEAP forecasts offer clear evidence that the median expert expects substantially smaller effects of AI than frontier AI company leaders do.

These industry leader predictions diverge sharply from our expert panel's median forecasts:

  • General capabilities: Lab leaders predict human-level or superhuman AI by 2026-2029, while our expert panel indicates longer timelines for superhuman capabilities. By 2030, the average expert thinks that 23% of LEAP panelists will say the world most closely mirrors a “rapid” AI progress scenario that matches some of these claims.
  • White-collar jobs: The median expert predicts 2% growth in white-collar employment by 2030 (compared to a 6.8% trend extrapolation). This contrasts with Elon Musk's suggestion that all jobs might be replaced by 2030.11 Relatedly, Dario Amodei predicts 10-20% overall unemployment within the next five years.
  • Millennium Prize Problems: The median expert gives a 60% chance that AI will substantially assist in solving a Millennium Prize Problem by 2040 (and 20% by 2030). Amodei's prediction of general "Nobel Prize winner" level capabilities by 2026-2027 could imply a much more aggressive timeline, but the implications of Amodei’s predictions are somewhat unclear.12

4. Experts predict much faster AI progress than the general public

Of 68 total forecasts (across 14 questions with multiple time horizons and quantiles) with a clear valence of AI capabilities, the general public’s forecasts are statistically indistinguishable13 from experts’ in 9% of cases, indicate less progress14 than experts’ in a large majority (71%) of cases, and indicate more progress in the remaining 21%.

Where experts and the public disagree, the public predicts less progress over three times as often as more progress. Across these forecasts that exhibit a clear valence of AI capabilities, a randomly selected expert is 16% more likely than a randomly selected member of the public to predict faster progress than would be expected by random chance.
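The “more likely than would be expected by random chance” framing corresponds to Cliff’s δ (see footnotes 13 and 14): the probability that a randomly drawn forecast from one group exceeds a randomly drawn forecast from the other, minus the reverse. The sketch below uses made-up forecasts; the actual analysis uses the unweighted tests described in the white paper.

import numpy as np
from scipy.stats import mannwhitneyu

def cliffs_delta(x, y):
    """P(x_i > y_j) - P(x_i < y_j), over all cross-group pairs."""
    diff = np.subtract.outer(np.asarray(x, float), np.asarray(y, float))
    return (diff > 0).mean() - (diff < 0).mean()

# Made-up 50th-percentile forecasts (percent) on a progress-valenced question.
experts = np.array([14, 18, 20, 25, 30, 22, 16, 28])
public = np.array([8, 10, 12, 9, 15, 11, 13, 17])

stat, p_value = mannwhitneyu(experts, public, alternative="two-sided")
delta = cliffs_delta(experts, public)
# A delta of 0.16 would mean a randomly selected expert is 16 percentage
# points more likely to give the faster-progress forecast than the reverse.
print(f"Mann-Whitney p = {p_value:.3g}, Cliff's delta = {delta:.2f}")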

For full details behind our analysis, see our white paper.

We summarize some of the differences in aggregate forecasts in Figure 5. The questions selected in Figure 5 reflect the progress-valenced questions that easily map onto a percentage scale.

Figure 5: Differences between the expert and public median 50th percentile forecasts for several questions where the unit is a percentage. Points indicate the median of each group’s 50th percentile forecasts. We apply transformations to create valenced forecasts, where values closer to the left indicate slower progress and values to the right indicate faster progress.

Some of the major differences include:

  • Societal impact: On average, experts give a 63% chance that AI will be at least as impactful as a “technology of the century”—like electricity or automobiles—whereas the public gives this a 43% chance. Further, experts give a 32% chance that it will be at least as impactful as a “technology of the millennium” (akin to the printing press or the Industrial Revolution), while the public gives this a 22% chance.
  • Autonomous vehicles: The public predicts about 40% less autonomous vehicle progress than experts by 2030, as suggested by each group’s 50th percentile forecasts. The median expert in our sample predicts that usage of autonomous vehicles will grow dramatically—from a baseline of 0.27% of all U.S. rideshare trips in Q4 2024 to 20% by the end of 2030. In comparison, the general public predicts 12% (p < 0.001, Cliff’s δ = 0.26).
  • Generative AI use: The public predicts roughly half as much generative AI use in 2030. Experts on average predict that 18% of U.S. work hours will be assisted by generative AI in 2030, whereas the general public predicts 10% (p < 0.001, Cliff’s δ = 0.3).
  • Mathematics: 23% of experts predict that FrontierMath15 will be saturated by the end of 2030 in the median scenario,16 meaning that AI can autonomously solve a typical math problem that a math PhD student might spend multiple days completing. Only 6% of the public predict the same, about a quarter as many.
  • Diffusion into science: Experts predict a roughly 10x increase (from 3% to roughly 30%) in AI-engaged papers across Physics, Materials Science, and Medicine between 2022 and 2030. The general public predicts about two-thirds as much diffusion into science: roughly 20% of papers in these fields will be AI-engaged.
  • Drug discovery: By 2040, experts on average predict that 25% of sales from newly approved U.S. drugs will be from AI-discovered drugs, compared to 15% for the public (p < 0.001, Cliff’s δ = 0.26). The median expert also thinks there’s a 25% chance that AI-discovered drugs will account for more than 43% of recent drug sales, whereas the general public predicts there’s a 25% chance of a share greater than 23%—about half the expert forecast.

Contrary to this overall pattern, the public assigns more weight to the “Rapid Progress” scenario in the General AI Progress question: the average member of the public assigns a 26% chance to the rapid scenario, compared to 23% for experts.

5. There are few differences in predictions between superforecasters and experts, but, where there is disagreement, experts tend to expect more AI progress. We don’t see systematic differences between the beliefs of computer scientists, economists, industry professionals, and policy professionals.

There are no discernible differences between forecasts from different groups of experts. Across all pairwise comparisons of expert categories for each of the questions with a clear AI progress valence, only 32 out of 408 combinations (7.8%) show statistically significant differences (at a 5% threshold), similar to what you would expect by chance. This means that computer scientists, economists, industry professionals, and policy professionals largely predict similar futures as groups, despite significant disagreement about AI among individual experts. This raises questions about popular narratives that economists tend to be skeptical of AI progress and that employees of AI companies tend to be more optimistic about fast gains in AI capabilities. In other words, while we do see widespread disagreement among experts about the future of AI systems, capabilities, and diffusion, we fail to find evidence that this disagreement is explained by the domain in which experts work. As LEAP continues, we plan to study what factors most drive expert disagreement. However, the groups used for these comparisons are subsets of our expert sample, so these comparisons necessarily have less statistical power.

Superforecasters and expert groups predict similar futures. Superforecasters are statistically indistinguishable from experts in 69% of valenced forecasts, predict less progress than experts in 26% of forecasts, and more progress in 4% of forecasts. A randomly selected expert is 9.8% more likely than a randomly selected superforecaster to predict faster progress than would be expected by random chance.

Where superforecasters and experts disagree, superforecasters usually (in 86% of such cases) predict less progress. Further, some of these disagreements are quite large. For example, the median expert predicts that use of autonomous vehicles will grow dramatically—from 0.27% of all U.S. rideshare trips in 2024 to 20% by the end of 2030, whereas the median superforecaster predicts less than half that, 8% (p < 0.001). A randomly selected superforecaster predicts less 2030 AV penetration (in the median scenario) than a randomly selected expert 37% more often than would be expected by random chance. Superforecasters also predict less societal impact from AI and less AI-driven electricity use. Drug discovery is the only setting where superforecasters are more optimistic than experts: by 2040, experts on average predict that 25% of sales from recently approved U.S. drugs will be from AI-discovered drugs, while superforecasters predict 45%, almost double (p < 0.01). Here, a randomly selected superforecaster predicts a higher share, in the median scenario, than a randomly selected expert 23% more often than would be expected by chance.

This is consistent with the follow-up to our Existential Risk Persuasion Tournament, in which a small sample of experts was more optimistic about AI progress and capabilities than around 80 superforecasters in a 2022 (pre-ChatGPT) survey, although both experts and superforecasters significantly underestimated AI progress by 2025 (Kučinskas et al., 2025; Karger et al., 2025). We summarize some of the differences in aggregate forecasts in Figure 6. In Figure 7, we compare aggregate forecasts of the various expert groups.

Figure 6: Differences between the expert and superforecaster median 50th percentile forecasts for several questions where the unit is a percentage. Points indicate the median of each group’s 50th percentile forecasts. We apply transformations to create valenced forecasts, where values closer to the left indicate slower progress and values to the right indicate faster progress.
Figure 7: Expert category median 50th percentile forecasts for several questions where the unit is a percentage. Points indicate the median of 50th percentile forecasts for each category. We apply transformations to create valenced forecasts, where values closer to the left indicate slower progress and values to the right indicate faster progress.

Sample Rationale and Question-Level Analysis

For each question, we conduct an analysis like the Millennium Prize example below and present it in our individual wave reports. Each analysis summarizes the question and background information, summarizes the results, and analyzes rationales to uncover the core differences in view between low and high forecasts. Across the first three waves, experts and superforecasters wrote over 600,000 words supporting their beliefs. Analyzing these rationales alongside predictions provides significantly more context on why experts believe what they believe, and on the drivers of disagreement, than the forecasts alone.

Example: Millennium Prize

Question. What is the probability that AI will solve or substantially assist in solving a Millennium Prize Problem in mathematics by the following resolution dates?

Results. Experts estimate a 10% chance that AI will solve or substantially assist in solving a Millennium Prize Problem by 2027,17 rising to 20% by 2030,18 and 60% by 2040.19 Experts of all categories, superforecasters, and the public give broadly similar forecasts across timescales. However, there is wide disagreement among experts: the top quartile of experts thinks there’s at least an even (50%) chance of AI assistance by 2030, whereas the bottom quartile thinks there’s only a 10% chance. The disagreement by 2040 is even larger: the interquartile range for expert medians is 30%–81%, while the top decile of experts believes there’s a 95% chance and the bottom decile thinks there’s only a 10% chance.

Millennium Prize. The figure above shows the distribution of forecasts by participant group, illustrating the median (50th percentile) and interquartile range (25th–75th percentiles) of each group’s forecasts.

Rationale analysis:

  • DeepMind/Navier-Stokes: High-forecast respondents frequently cite the DeepMind CEO's January 2025 statement that, in partnership with a team of mathematicians, they're “close to solving” one of the problems (later identified as Navier-Stokes) within “a year or year and a half” (Ansede 2025). This is treated as strong, concrete evidence. Low-forecast respondents generally don't mention this or dismiss corporate pronouncements. One expert cites the Clay Institute president's June 2025 claim: “We're very far away from AI being able to say anything serious about any of those problems” (Heaven 2025).
  • Benchmarks: Many high-forecast respondents point to International Mathematical Olympiad gold medals (Luong and Lockhart 2025) and FrontierMath progress as evidence of rapid capability growth in mathematical reasoning that will likely continue. Low-forecast respondents tend to argue that these are fundamentally different challenges. One mathematician notes: “The Math Olympiad is targeted toward gifted high school students spending an afternoon on a problem solvable with known techniques…[whereas] Millennium Prize problems can consume entire careers without a solution.” Multiple forecasters note that FrontierMath Tier 4 (which poses much harder problems than Tiers 1-3) has solve rates below 10%.
  • The nature of Millennium problems: High-forecast respondents commonly emphasize that math is verifiable, has clear structure, and that some problems (Navier-Stokes, Birch–Swinnerton-Dyer) may be suited to AI-assisted numerical exploration or pattern recognition. Low-forecast respondents often express doubts that Millennium Problems are solvable with the current AI paradigm, emphasizing doing so requires “deep conceptual breakthroughs,” “developing new concepts and mathematical rules,” and “truly out of the box thinking.” One domain expert writes: “The current generation of AI does not seem to be able to do this sort of creative mathematical work at all. It can apply known techniques and get novel results, but these results would be very easy for top working mathematicians.”
  • Base rates and timelines: High-forecast respondents mostly don't engage with base rates, or they argue that AI changes the game fundamentally. By contrast, many low-forecast respondents emphasize that only one of the seven problems has been solved in the 25 years since the prize was announced, and that some have remained unsolved for more than a century. They also highlight Millennium Prize rules: upon the publication of a solution, a minimum of two years must pass before a prize can be awarded, to allow time for adequate verification. (In the case of the one prize that was awarded, the gap between the publication of the solution and the awarding of the prize was over seven years.) This, many low-forecast respondents point out, renders the 2027/2030 dates almost impossible regardless of technical progress.
  • "Substantially assist" interpretation: High-forecast respondents tend toward a broad interpretation—any meaningful acceleration of human-AI hybrid research counts, whereas low-forecast respondents tend toward restrictive interpretation. One notes the resolution criteria require contribution “likely not producible without AI,” which is a higher bar.
  • Architecture sufficiency: Most high-forecast respondents believe incremental improvements over current LLM capabilities will be sufficient, especially when paired with specialized tools (Lean, AlphaProof) and human collaboration. Low-forecast respondents frequently argue the current LLM paradigm fundamentally cannot do this. Multiple forecasters say we need “entirely new architectures” (neurosymbolic systems were mentioned several times) or that a “pattern matching paradigm doesn't extend to the deep creativity required.”
  • Difficulty of achieving superhuman performance: Although rarely discussed by high-forecast respondents, a few low-forecast respondents expressed doubts that this could be achieved, with one writing, “Training a model to do math at the level of human experts might be a qualitatively different ML problem from training a model to do math surpassing expert capabilities. RL training requires creating problems with reward functions...We haven't achieved that with reasoning post-training yet.”

Next Steps

We will elicit LEAP forecasts each month for the next three years, and many additional analyses will become possible over time.

Our next LEAP survey (Wave 4) will be focused on forecasts related to AI R&D. For example, we tentatively plan to ask for forecasts on how much AI would uplift frontier AI engineers in a hypothetical randomized trial, when AI could serve as a “drop-in” replacement for AI researchers, how long AI can autonomously complete tasks according to METR’s time horizon assessment, and more.

Future LEAP waves will focus on themes such as security and geopolitics, robotics, productivity and welfare, labor and automation, and more.

If you have suggestions for LEAP questions or wave themes, please let us know via this survey.

Potential future analyses and research directions include:

  • Assessing accuracy. As questions resolve, we will be able to assess the accuracy of experts’ forecasts. This will allow us to identify particularly accurate individual forecasters, assess the relative accuracy of different expert subgroups within our sample, and highlight forward-looking forecasts from the most accurate forecasters. We will also present forecasters with their past performance, assessing how this feedback translates into accuracy on new forecasting questions.
  • Re-surveying to see how views evolve. We plan to re-survey the LEAP sample on many of our questions so that we can see how their views evolve over time. For example, 6 months from now, how will experts’ estimates of the likelihood of AI assisting with the solution to a Millennium Prize Problem have changed?
  • “Schools of thought” analysis and crux-finding. We will do more analysis on which “schools of thought” emerge among our forecasters. Are we able to distinguish consistent differences in sets of forecasts among subsets of our sample, e.g. fast vs. slow AI progress groups? Are we able to identify “cruxes”—i.e., strongly differential forecasts between schools of thought in the near term that will enable faster assessment of which group is more likely to be accurate in the long term? This analysis involves not only a quantitative grouping of forecasts, but a look into the rationale data to see what arguments different schools of thought favor.

Footnotes

  1. Respondents were shown a historical baseline value of 2%, based on an earlier version of the cited paper. The most recent draft gives a range of 1.6% to 6.6%. We select the midpoint, 4.1%, as the historical baseline value.

  2. Sentences of the form, “The median expert gives an X% chance,” report the median of experts’ Xth percentile forecasts.

  3. This estimate reflects the fraction of experts whose median forecast is that AI systems will achieve performance of at least 90% (which we call saturation) on Tiers 1-3 of FrontierMath. We take the average of the proportions calculated under weak and strict inequality.

  4. Raw data: IQR on the 50th percentile was (10.0%–30.0%).

  5. See here for additional context. We ask participants what proportion of LEAP panelists will choose "slow progress," "moderate progress," or "rapid progress" as best matching the general level of AI progress.

  6. See here for background information.

  7. Raw data: IQR on the 50th percentile was (−4% to 5%).

  8. We have a complementary survey in the field exploring these topics which we plan to release results from in early 2026.

  9. The median forecast for this question was 25%. Pooled distribution: IQR (10%–49%); variance decomposition: 73% between-forecaster disagreement, 27% within-forecaster uncertainty. Raw data: median 25th and 75th percentile forecasts were 10% and 43% respectively.

  10. The median forecast for this question was 60%.

  11. In a future survey wave, we plan to collect forecasts of the predicted relationship between AI capabilities and employment growth in each sector by asking respondents to forecast employment growth conditional on low-, moderate-, and rapid-progress scenarios.

  12. The degree to which progress on Millennium Prize Problems is serial or parallel, as well as the general difficulty of the Problems, complicates this comparison. Eliciting forecasts from multiple experts on consistent forecasting questions with clear resolution criteria helps us bring clarity to debates often plagued by ambiguous definitions.

  13. We use Mann-Whitney U-tests for equality in distribution unless otherwise stated, with a 5% significance threshold. All Mann-Whitney U-tests and Cliff’s δ values are currently unweighted. For full details, see our white paper.

  14. We claim a group predicts statistically significantly less progress according to a Mann-Whitney U-test and a negative Cliff’s δ.

  15. "To gauge the difficulty of FrontierMath problems, we organized a competition at MIT involving around 40 exceptional math undergraduates and subject-matter experts. Participants formed eight teams of four or five members, each with internet access, and had four and a half hours to solve 23 problems. On a subset of 23 tier 1-3 problems, the average team scored 19%, while 35% of the problems were solved collectively across all teams." (Epoch.ai 2025).

  16. This estimate reflects the fraction of experts whose median forecast is that AI systems will achieve performance of at least 90% (which we call saturation) on Tiers 1–3 of FrontierMath. We take the average of the proportions calculated under weak and strict inequality.

  17. Raw data: IQR on the 50th percentile was (3.0%–20.0%). 90th percentile of median forecast: 44.5.

  18. Raw data: IQR on the 50th percentile was (10.0%–50.0%). 90th percentile of median forecast: 65.4.

  19. Raw data: IQR on the 50th percentile was (30.3%–80.8%). 90th percentile of median forecast: 95.0.

  20. The expert is referring to and quoting from Ansede (2025).

  21. The expert is referring to MacKenzie (1999).

  22. The expert appears to be referring to an August 2025 post by OpenAI researcher Sebastien Bubeck (Bubeck 2025).

Cite Our Work

Please use one of the following citation formats to cite this work.

APA Format

Murphy, C., Rosenberg, J., Canedy, J., Jacobs, Z., Flechner, N., Britt, R., Pan, A., Rogers-Smith, C., Mayland, D., Buffington, C., Kučinskas, S., Coston, A., Kerner, H., Pierson, E., Rabbany, R., Salganik, M., Seamans, R., Su, Y., Tramèr, F., Hashimoto, T., Narayanan, A., Tetlock, P. E., & Karger, E. (2025). The Longitudinal Expert AI Panel: Understanding Expert Views on AI Capabilities, Adoption, and Impact (Working paper No. 5). Forecasting Research Institute. Retrieved December 27, 2025, from https://leap.forecastingresearch.org/reports/waves-1-to-3-insights

BibTeX

@techreport{leap2025,
    author = {Murphy, Connacher and Rosenberg, Josh and Canedy, Jordan and Jacobs, Zach and Flechner, Nadja and Britt, Rhiannon and Pan, Alexa and Rogers-Smith, Charlie and Mayland, Dan and Buffington, Cathy and Kučinskas, Simas and Coston, Amanda and Kerner, Hannah and Pierson, Emma and Rabbany, Reihaneh and Salganik, Matthew and Seamans, Robert and Su, Yu and Tramèr, Florian and Hashimoto, Tatsunori and Narayanan, Arvind and Tetlock, Philip E. and Karger, Ezra},
    title = {The Longitudinal Expert AI Panel: Understanding Expert Views on AI Capabilities, Adoption, and Impact},
    institution = {Forecasting Research Institute},
    type = {Working paper},
    number = {5},
    url = {https://leap.forecastingresearch.org/reports/waves-1-to-3-insights},
    urldate = {2025-12-27},
    year = {2025}
}