Wave 1: Headliners

Wave 1 asked participants to forecast the broad speed of AI development, the broad societal impacts from AI, AI performance on a PhD-level math benchmark, autonomous vehicle usage, and how AI will affect jobs.

First released on: 10 November 2025

The following report summarizes responses collected between June 26 and August 16, 2025, from 330 experts, 59 superforecasters, and 1,360 members of the public. The expert respondents comprised 74 computer scientists, 75 industry professionals, 65 economists, and 116 research staff at policy think tanks.

Our wider website contains more information about LEAP, our Panel, and our Methodology, as well as reports from other waves.

Questions

  • FrontierMath: What will be the highest percentage accuracy achieved by an AI model on FrontierMath, by the following resolution dates?

  • Autonomous Vehicle Trips: What percentage of U.S. ride-hailing trips will be provided by autonomous vehicles that are classified as SAE Level 4 or above in the years 2027 and 2030?

  • Occupational Employment Index: What will be the percent change in the number of U.S. jobs (compared to Jan 1, 2025) for white-collar, blue-collar, and service-sector occupations, by the following resolution dates?

  • General AI Progress: At the end of 2030, what percent of LEAP panelists will choose "slow progress", "moderate progress", or "rapid progress" as best matching the general level of AI progress?

  • Technological Richter Scale: At the end of 2040, what is the probability of AI achieving the following levels of net impact on human society as compared to the impact of past technological events?

For full question details and resolution criteria, see below.

Results

In this section, we present each question and summarize the forecasts made and the reasoning behind them. More concretely, we present (1) background material, historical baselines, and resolution criteria; (2) graphs, results summaries, and results tables; and (3) rationale analyses and rationale examples. In the first three waves, experts and superforecasters wrote over 600,000 words supporting their beliefs. Analyzing these rationales alongside the predictions gives significantly more context than the forecasts alone on why experts believe what they believe and on what drives their disagreements.

FrontierMath

Question. What will be the highest percentage accuracy achieved by an AI model on FrontierMath, by the following resolution dates?

Results. Experts predict1 state-of-the-art (SOTA) accuracy on FrontierMath of 31% by the end of 2025,2 55% by the end of 2027,3 and 75% by the end of 2030.4 23% of experts predict that FrontierMath will be saturated (accuracy above 90%) by 2030, meaning that AI could autonomously solve almost any math problem resembling one a math PhD student might spend multiple days completing. While experts and superforecasters make similar predictions, the public expects much less progress, predicting 27% by 2025,5 38% by 2027,6 and 50% by 2030,7 a full 25 p.p. below the expert forecast for 2030. Since the forecasts were elicited, Gemini 2.5 surpassed o4-mini in August 2025,8 scoring 29% ± 3%, 10 p.p. higher than the previous SOTA of 19%.

FrontierMath. The figure above shows the median of participants’ 50th percentile forecasts (and, where applicable, the medians of their 25th and 75th percentile forecasts) by participant group.
FrontierMath. The figure above shows pooled probability distributions. We estimate each forecaster’s full probability distribution from their 25th, 50th, and 75th percentile forecasts by fitting the cumulative distribution function of an appropriate distribution (a beta or gamma distribution) to the observed forecasts using nonlinear least squares. We then sample from these distributions and plot the aggregated distribution for each forecaster category.
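As a rough illustration of the fitting step described above, the sketch below fits a beta distribution (a natural choice for accuracies bounded between 0% and 100%) to one forecaster's 25th, 50th, and 75th percentile forecasts by minimizing the squared gap between the fitted CDF and the target probabilities, then pools samples across forecasters. It is a minimal, dependency-free sketch under our own assumptions, not the report's code: the helper names (`beta_cdf`, `fit_beta`, `pooled_samples`), the grid-plus-compass-search optimizer (standing in for a library nonlinear least-squares solver), the equal number of draws per forecaster, and the example forecasts are all hypothetical, and the gamma case is omitted.

```python
import math
import random

TINY = 1e-30  # keeps Lentz's continued-fraction updates away from division by zero


def _beta_cf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > TINY else TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > TINY else TINY
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # converged after the odd step
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta I_x(a, b), i.e. the CDF of Beta(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Evaluate the continued fraction on whichever side converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def fit_beta(quantiles, probs=(0.25, 0.50, 0.75)):
    """Least-squares fit of Beta(a, b) so its CDF passes near the given quantiles."""
    def sse(log_a, log_b):
        a, b = math.exp(log_a), math.exp(log_b)
        return sum((beta_cdf(q, a, b) - p) ** 2 for q, p in zip(quantiles, probs))

    # Coarse grid over log(a), log(b) in [-2, 6] to get a starting point.
    grid = [i / 4 for i in range(-8, 25)]
    best, la, lb = min((sse(x, y), x, y) for x in grid for y in grid)
    # Compass search: accept an improving move, halve the step when none helps.
    step = 0.125
    for _ in range(10_000):
        if step < 1e-6:
            break
        for dx, dy in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            err = sse(la + dx, lb + dy)
            if err < best:
                best, la, lb = err, la + dx, lb + dy
                break
        else:
            step /= 2
    return math.exp(la), math.exp(lb)


def pooled_samples(forecasts, n_per_forecaster=2_000, seed=0):
    """Fit one Beta per forecaster, sample equally from each, and pool the draws."""
    rng = random.Random(seed)
    pooled = []
    for quantiles in forecasts:
        a, b = fit_beta(quantiles)
        pooled.extend(rng.betavariate(a, b) for _ in range(n_per_forecaster))
    return pooled


if __name__ == "__main__":
    # Hypothetical forecasts: (25th, 50th, 75th) percentiles as fractions of 100%.
    forecasts = [(0.38, 0.55, 0.70), (0.20, 0.35, 0.50), (0.50, 0.70, 0.85)]
    draws = sorted(pooled_samples(forecasts))
    print("pooled median:", round(draws[len(draws) // 2], 3))
```

With three quantiles and two parameters, the fit is generally not exact, so the least-squares step trades off the three residuals; a histogram of `draws` would then play the role of the pooled curve for one group.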

Rationale analysis:

  • Recent progress: High-forecast respondents often cite impressive recent trends: “Evaluated scores across benchmark types have tended to jump over 3 year periods. For example: MATH from 7% in 2021 to 72% in 2023 and 90% in 2024.” Another observes, “The historical data show a very rapid increase in the accuracy [on FrontierMath] over a time period just short of a year. In that time the accuracy increased from 0% to 19%.”11
  • Trajectory: High-forecast respondents frequently emphasize likely continued progress: “We've seen jumps of around 5 points on this benchmark every couple of months12 so far and these jumps will only accelerate as scores approach 50% (benchmark scores tend to be roughly sigmoid-shaped over time).” Among low-forecast respondents, a common sentiment is that “the fastest…progress is behind us and we are now approaching the flat/end point portion of the S-Curve of advancement.”
  • Test-time compute: High-forecast respondents often highlight potential gains from scaling inference compute: “The current top scorer is a reasoning model, and the reasoning model paradigm is relatively new; this suggests that rapid improvements are likely as the paradigm evolves.” Another noted, “with very large amounts of inference compute, it's possible that o3 or o4-mini could already get well over 30% today.” Many low-forecast respondents, however, are skeptical that inference scaling will be sufficient to overcome fundamental architectural limitations.
  • Data: High-forecast respondents tend to believe that labs will have the training data they need to drive improvements. One notes that an “increased use of synthetic data…will likely lead to broader problem-solving capabilities,” and another that, “by 2031, the frontier labs will have enormous amounts of math data if they believe it's valuable and the RL process will solve the problem.” Low-forecast respondents are more skeptical that training data will be suitable for the task: “One of the key bottlenecks is the scarcity of high-quality, human-generated mathematical data.”
  • Architecture suitability: Many high-forecast respondents highlight math's suitability for current architectures. One states, “Existing LLMs are well-suited to make progress in math because math is about logical pattern matching, can be well-represented by text, and the answers are typically verifiable allowing for easy rewards.” Low-forecast respondents express considerable doubt that the current architecture will be sufficient. One mathematician states: “I am deeply skeptical of LLMs ever being able to excel at complex math. LLMs are linguistic pattern-replication machines, and advanced math requires a high degree of conceptual creativity and flexibility that does not derive from linguistic patterns.”
  • Incentives: Some high-forecast respondents expect sustained investment due to prestige and R&D value. One notes, “Math is highly relevant to many R&D domains, so progress in math has been, and is highly likely to continue to be, a focus for leading AI companies.” But many low-forecast respondents question that assumption: “Performance from 2028 to 2031 entirely depends on whether or not the frontier AI labs believe it's economically useful to continue improving performance on this benchmark,” writes one, with another bluntly concluding, “I don’t think current companies are focused on solving math.”
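The footnoted claim that actual progress was marginally slower than “jumps of around 5 points … every couple of months” can be checked with simple arithmetic. The sketch below uses only the two data points given in footnote 12 (1.03% in June 2024 and 29% in August 2025) and assumes a straight-line average, which smooths over the step-like jumps benchmark scores actually show; the function names are illustrative.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def average_gain_pp(start_pct, end_pct, start, end, per_months=1):
    """Straight-line average gain in percentage points per `per_months` months."""
    return (end_pct - start_pct) / months_between(start, end) * per_months


# Data points from footnote 12: top Tier 1-3 accuracy of 1.03% (June 2024) and 29% (August 2025).
gain_per_two_months = average_gain_pp(1.03, 29.0, date(2024, 6, 1), date(2025, 8, 1), per_months=2)
print(f"~{gain_per_two_months:.1f} p.p. per two months")  # compare with the quoted ~5 points
```

Over those 14 months the average works out to about 4 p.p. every two months, consistent with the footnote's “marginally slower.”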

Autonomous Vehicle Trips

Question. What percentage of U.S. ride-hailing trips will be provided by autonomous vehicles that are classified as SAE Level 4 or above in the years 2027 and 2030?

Results. Experts predict that 7% of U.S. ride-hailing trips in 2027 will be provided by autonomous vehicles classified as SAE Level 4 or above,13 and 20% in 2030,14 up from our estimate of 0.27% in Q4 2024. Expert forecasts have a long right tail: the median expert assigns a 25% chance that AVs will provide more than 35% of trips in 2030. Superforecasters and the public forecast a much lower autonomous vehicle share, predicting 2%15 and 5%16 respectively in 2027, and 8%17 and 12%18 respectively in 2030.

Autonomous Vehicle Trips. The figure above shows the median of participants’ 50th percentile forecasts (and, where applicable, the medians of their 25th and 75th percentile forecasts) by participant group.
Autonomous Vehicle Trips. The figure above shows pooled probability distributions. We estimate each forecaster’s full probability distribution from their 25th, 50th, and 75th percentile forecasts by fitting the cumulative distribution function of an appropriate distribution (a beta or gamma distribution) to the observed forecasts using nonlinear least squares. We then sample from these distributions and plot the aggregated distribution for each forecaster category.

Rationale analysis:

  • Technology readiness: High-forecast respondents commonly emphasize that Level 4 technology “has a proven, real-world track record and is already being commercially deployed,” suggesting that the fundamental technical challenges have mostly been solved. In particular, many high-forecast respondents point to Waymo's demonstrated safety record, noting, “Studies have demonstrated that its vehicles had far fewer airbag deployment crashes and injury-causing crashes as compared to human drivers.” Conversely, a large number of low-forecast respondents emphasize that autonomous driving “is a tech that is absolutely notorious historically for overpromising and snail-like progress” and argue the current technology “is not scalable because it requires lots of case-by-case optimization for a particular region (down to individual intersections).”
  • Data flywheel: High-forecast respondents frequently highlight Waymo's exponential expansion, observing, “Waymo is currently more-than-doubling every year,” which they believe will result in a data flywheel: “Broader deployment will generate more data, which in turn enhances safety—creating a positive feedback loop.” Another states: “Historically, when a technology finally gets to be used in the wild, it improves very rapidly.” Low-forecast respondents often acknowledge progress but stress constraints, arguing that “progress in Phoenix or Miami does not generalize easily to New York, Boston, or Chicago.”
  • Economic viability: Many high-forecast respondents argue that cost advantages will drive adoption, stating that “once AV ride-hailing proves substantially safer and cheaper, adoption could accelerate sharply.” Low-forecast respondents tend to focus on higher near-term costs, with one noting that “Waymo cars are still 30-40% more expensive compared to Uber and Lyft,” and that “AVs still require high capital expenditure, vehicle downtime, and supervision costs.”
  • Tesla: Some high-forecast respondents see Tesla as a game-changer, arguing that “Tesla alone could drive explosive growth if their FSD [Full Self-Driving] vision model is successful in select markets, as their capex is just the mass-produced vehicle cost,” and that Tesla's manufacturing advantage over Waymo could enable rapid scaling. Low-forecast respondents largely remain skeptical, with one predicting, “it is well under 25% likely that Tesla will achieve its dream of all of their 2m cars/year being capable of serving as robotaxis straight out of the factory.”
  • Regulatory environment: Some high-forecast respondents believe regulatory barriers will diminish, with a few noting that the Trump administration's anti-regulatory stance may help to speed things up. Low-forecast respondents emphasize fragmented regulation, arguing “the U.S. will continue to have a patchwork of rules on autonomous vehicles,” and that “investors are less likely to invest in areas that present less regulatory certainty.”
  • Geographic and weather constraints: High-forecast respondents often downplay geographic limitations, with one noting that even cities like Boston have “more simple grid-like” streets than some might expect. By contrast, low-forecast respondents tend to emphasize both weather and geography constraints, noting initial deployment has been “in cities with no or little ice and snow,” and arguing this creates “a chicken and egg problem where they need to train in [icy and snowy] conditions to get better, but [doing so is] really risky and unpredictable.”
  • Social and labor resistance: Some high-forecast respondents believe “consumer acceptance is likely to rise with exposure,” while some low-forecast respondents warn that “the cities where taxis are most used are also likely to have strong unions and/or lobbying efforts to limit the damage of job losses.” They note growing concern about “the replacement of blue collar labor with more automation,” and predict that “ride share drivers aren't going to go down without a fight.”
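To put the expert medians next to the “more-than-doubling every year” argument, the sketch below computes the constant annual growth multiple in trip share implied by moving from the 0.27% Q4 2024 estimate to 7% in 2027 and 20% in 2030, and what doubling every year would produce. This is simple compounding under assumptions of our own: the horizons are approximated as 3 and 6 years, and Waymo's fleet or ride growth need not translate one-for-one into a share of all U.S. ride-hailing trips.

```python
def implied_annual_multiple(start_share, end_share, years):
    """Constant yearly growth factor that turns start_share into end_share."""
    return (end_share / start_share) ** (1 / years)


def project_share(start_share, annual_multiple, years):
    """Share of trips after compounding growth, capped at 100%."""
    return min(100.0, start_share * annual_multiple ** years)


BASE_SHARE = 0.27  # % of U.S. ride-hailing trips in Q4 2024 (the report's estimate)

# Assumption: Q4 2024 -> 2027 is treated as ~3 years and Q4 2024 -> 2030 as ~6 years.
for year, expert_median, years in (("2027", 7.0, 3), ("2030", 20.0, 6)):
    m = implied_annual_multiple(BASE_SHARE, expert_median, years)
    print(f"Expert median of {expert_median}% in {year} implies ~{m:.2f}x per year")

for years in (3, 6):
    print(f"Doubling every year for {years} years: {project_share(BASE_SHARE, 2.0, years):.2f}%")
```

Under these assumptions, the expert median implies roughly 3x growth per year through 2027 and roughly 2x per year averaged through 2030, while strict yearly doubling from 0.27% gives about 2.2% in 2027 and about 17% in 2030.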

Occupational Employment Index

Question. What will be the percent change in the number of U.S. jobs (compared to Jan 1, 2025) for white-collar, blue-collar, and service-sector occupations, by the following resolution dates?

Results. Experts, the public, and superforecasters generally do not expect significant job losses for blue-collar or service-sector workers by 2027 or 2030. However, the median expert predicts only 2% growth in white-collar jobs between January 2025 and December 2030, significantly slower than the 6.8% growth implied by the pre-existing trend. Only 25% of experts predict white-collar job growth above 5% (approaching the trend baseline), so at least 75% of experts predict slower growth than recent trends would imply. Moreover, 25% of experts predict a white-collar job loss of more than 4% by 2030.

Occupational Employment Index. The figure above shows the distribution of forecasts by participant group, illustrating the median (50th percentile) and interquartile range (25th–75th percentiles) of each forecast.

Rationale analysis:

  • White-collar jobs: Most high-forecast respondents (those predicting more growth) believe white-collar employment will adapt rather than collapse: “Revolutionary technologies rarely deliver the job losses that people fear,” writes one, while another argues that “AI adoption can engender task-level displacement, but the broader digital ecosystem…generates complementary roles that absorb displaced workers and create net employment gains.” Low-forecast respondents (those predicting more losses) frequently express the belief that AI's speed and cognitive focus make this disruption different: “I am most pessimistic about white-collar jobs due to two developments - the tech sector becoming an increasingly bigger proportion of the white collar sector and AI systems making sufficient progress to replace a small but significant proportion of jobs in the tech sector.”
  • Blue-collar jobs: Forecasters broadly agree blue-collar jobs will be the most resilient to automation. High-forecast respondents particularly emphasize physical complexity barriers: “Blue collar jobs involving complex physical manipulation will likely be among the last job categories to be fully automated by AI. Humans excel at intricate physical tasks requiring dexterity, problem-solving in unpredictable environments, and adaptability.” Even among forecasters who foresee long-term losses, many acknowledge that “blue-collar occupations are likely resilient to AI trends short of advanced post-AGI robotics.” Infrastructure investment is commonly thought to be another factor bolstering blue-collar work: “There will be significant investment as part of the AI revolution (data centers, research, new types of companies being spun out), which will create new other jobs across the spectrum.”
  • Service-sector jobs: Views diverge on service sector outcomes. Some emphasize that “people have a stronger intrinsic preference for many of these roles to be done by humans,” and that there are demographic tailwinds: “Pink collar will increase because of longevity and old-age care.” Others focus on automation vulnerability, particularly in customer service.
  • Demand elasticity: Experts are split on whether AI productivity gains will increase or decrease employment. High-forecast respondents invoke economic theory: “A correct prediction depends upon predicting the effects of the Jevons Paradox (where price declines lead to increased purchases) as well as wealth effects (where efficiency boosts overall wealth increasing consumption).” Several low-forecast respondents argue elasticity won't offset displacement: “Even for software, elasticity is probably not high enough to increase the number of SWEs if the number needed to produce software falls by 1000x.”
  • Recent trends: Low-forecast respondents often emphasize current evidence: “From Intel to Microsoft, many top executives and management staff were laid off to make room for other investments at the organization. Google laid off 10% of its managerial staff last December.” Another notes: “In October 2024, S&P Global claimed that one in every four American workers that lost their jobs last year had worked in professional and business services.”19
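Because the white-collar forecasts span roughly six years, annualizing them makes the gap with the trend easier to read. The sketch below is a minimal illustration under two assumptions of our own: January 2025 to December 2030 is treated as six years, and change compounds at a constant rate. The inputs are the 6.8% trend, the 2% expert median, and the roughly 4% loss predicted by the lower quartile of experts.

```python
def annualized_rate(total_change_pct, years):
    """Constant annual % change that compounds to total_change_pct over `years`."""
    return ((1 + total_change_pct / 100) ** (1 / years) - 1) * 100


YEARS = 6  # assumption: Jan 2025 -> Dec 2030 treated as roughly six years

for label, total in (("pre-existing trend", 6.8),
                     ("median expert", 2.0),
                     ("expert lower quartile", -4.0)):
    print(f"{label:>22}: {total:+.1f}% total ~ {annualized_rate(total, YEARS):+.2f}% per year")
```

On this basis the trend corresponds to roughly 1.1% per year, the expert median to roughly 0.3% per year, and a 4% loss to roughly -0.7% per year.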

General AI Progress

Question. At the end of 2030, what percent of LEAP panelists will choose "slow progress", "moderate progress", or "rapid progress" as best matching the general level of AI progress?

Results. Experts, superforecasters, and the public all modally expect21 a “moderate” progress scenario. Most of the difference between these groups lies in the weight they put on a “rapid” progress scenario: superforecasters estimate 14%,22 experts 23%,23 and the public 26%.24 This is one of the few cases where the public expects faster progress than experts.

General AI Progress. Participants estimated what proportion of LEAP panelists will choose “slow progress,” “moderate progress” or “rapid progress” as best matching the general level of AI progress in 2030 (and the standard deviation of responses). This figure shows the mean percent on each scenario (split by color) and by participant group (y axis).
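The results text reports group medians (per the first footnote), while this figure shows means, and for a skewed quantity such as the “rapid” share the two can differ noticeably. The sketch below illustrates this with five made-up forecasts (hypothetical data, not LEAP responses). It also counts each forecaster's modal scenario, i.e., the scenario to which they assign the largest share of panelists.

```python
from statistics import mean, median

SCENARIOS = ("slow", "moderate", "rapid")

# Hypothetical forecasts, NOT LEAP data: each tuple is one forecaster's predicted
# % of panelists choosing slow / moderate / rapid progress (rows sum to 100).
forecasts = [(30, 55, 15), (25, 60, 15), (20, 60, 20), (35, 50, 15), (10, 30, 60)]


def summarize(rows):
    """Return {scenario: (mean, median)} across forecasters."""
    for row in rows:
        if abs(sum(row) - 100) > 1e-9:
            raise ValueError(f"forecast {row} does not sum to 100%")
    columns = zip(*rows)  # one column of values per scenario
    return {s: (mean(col), median(col)) for s, col in zip(SCENARIOS, columns)}


def modal_counts(rows):
    """Count how many forecasters put the largest share on each scenario."""
    counts = dict.fromkeys(SCENARIOS, 0)
    for row in rows:
        counts[SCENARIOS[row.index(max(row))]] += 1  # ties go to the first scenario
    return counts


for scenario, (avg, mid) in summarize(forecasts).items():
    print(f"{scenario:>8}: mean {avg:.1f}%, median {mid:.1f}%")
print(modal_counts(forecasts))
```

In this toy data, the single outlier pulls the mean “rapid” share to 25% while the median stays at 15%. The three per-scenario medians also need not sum to 100%, which is one practical difference between the two summaries.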

Rationale analysis:

  • Scaling: Rapid-progress forecasters typically cite consistent capability improvements: “historically AI system development has followed a steep scaling curve and increases in model size-data and compute have led to rapid capability gains.” One points out that “METR [Model Evaluation and Threat Research] results imply a roughly 4 to 10x improvement in time horizon every year, which means that we'll have systems capable of doing weeks or months of work by the late 2020s.”25 Many rapid-progress forecasters also argue that AI progress has been consistently underestimated, and that “rapid progress is quite likely due to the [...] possibility of AI improving themselves.” Slow-progress forecasters often argue that current AI paradigms are likely insufficient: “For any of the moderate and rapid progress criteria to be met there would need to be a massive paradigm shift in AI technology. LLMs are unlikely to achieve these goals with the incremental progress made over the last 2-5 years.”
  • Physical world capabilities: Robotics is the strongest consensus limitation: “As a roboticist I have serious doubts about how quickly the embodied aspect of AI will make progress due to real-world implementation issues (sensors, communication delays, etc.) regardless of how good the brain is.” Many slow-progress forecasters also believe that historical precedent suggests caution is merited: “Having end to end task completion ability is hard as can be told from the development of autonomous vehicles (15+ years of R&D and 100+ billions of dollars invested and we are still scaling L4 robotaxis today).” Most moderate-progress forecasters also acknowledge this challenge.
  • Input constraints: Slow-progress forecasters often highlight the need for better training data, the cost of compute, and especially energy: “I expect energy to be the chief bottleneck to AI progress such that it will be a rate-limiter for progress in general.” Some forecasters expecting rapid progress, however, argue that such constraints can be eliminated by massive investment in AI, driven by corporate competition and national security incentives.
  • Reliability: A common consideration among slow-progress forecasters is that “[reliability] will be a difficult bar to clear for advanced-capability systems given current cognitive limitations.” They tend to emphasize that barriers like robust generalization, complex symbolic reasoning, and end-to-end safe autonomy for complex tasks will likely hold “superhuman across the board” scenarios back.
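The METR-based argument quoted above is a compounding claim, so it can be sanity-checked by solving for how long a 4x or 10x annual improvement takes to cover a given range. The starting horizon of one hour, the 40-hour work week, and the 160-hour work month below are hypothetical assumptions for illustration, not figures from the report or from Kwa et al. (2025).

```python
import math

WORK_WEEK_HOURS = 40.0    # assumption: one week of work = 40 hours
WORK_MONTH_HOURS = 160.0  # assumption: one month of work = 4 work-weeks
START_HOURS = 1.0         # hypothetical starting time horizon, not a measured value


def years_to_reach(start_hours, target_hours, annual_multiple):
    """Years of constant multiplicative growth needed to go from start to target."""
    return math.log(target_hours / start_hours) / math.log(annual_multiple)


for multiple in (4, 10):
    for label, target in (("work-week", WORK_WEEK_HOURS), ("work-month", WORK_MONTH_HOURS)):
        years = years_to_reach(START_HOURS, target, multiple)
        print(f"{multiple}x per year reaches a {label} horizon in ~{years:.1f} years")
```

From a one-hour start, 4x per year reaches a work-week in about 2.7 years and a work-month in about 3.7 years, and 10x per year takes about 1.6 and 2.2 years. The conclusion is sensitive mainly to the assumed starting horizon and to whether the annual multiple holds steady.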

Technological Richter Scale

Question. At the end of 2040, what is the probability of AI achieving the following levels of net impact on human society as compared to the impact of past technological events?

Results. Experts modally expect26 that the impact of AI by 2040 will be comparable to a “technology of the century” (akin to electricity or automobiles), while the public expects AI’s impacts to be closer to the “technology of the decade” (more like social media). Experts give a 32% chance that AI will be at least as impactful as a “technology of the millennium”—like the printing press or the Industrial Revolution—whereas the public gives this a 22% chance. Superforecasters sit between experts and the public.

Technological Richter Scale. In this question, participants estimate the probability of AI achieving various levels of societal impact. This figure shows the mean probability assigned to each level.

Rationale analysis:

  • Intelligence as a transformative force: Many high TRS forecasters (those forecasting higher levels on the Technological Richter Scale) argue that, because AI will augment and eventually surpass human intelligence, its transformative potential is unique and eclipses all prior transformative technologies. “Intelligence is one of the most disruptive phenomena known to history,” writes one. High TRS forecasters also tend to argue that progress in intelligence builds on itself: “People fundamentally don't think in exponentials. 2040 is a LONG time away, technologically. And AI will modify AI, at which point its improvement will go even more second-order.”
  • Current societal impact: Most high TRS forecasters believe AI has already surpassed levels 5 and 6. Many argue level 7 has already been met given its use in specialized fields (e.g., medical imaging, SWE, text editing) and its level of diffusion. Some even argue it can be credibly claimed that advanced capitalist societies have already reached level 8, given that “AI is strongly affecting geo-political considerations, changing energy demand and planning, disrupting education, [and] accelerating biomedical research.” Many argue that even a slow, linear extrapolation of current trendlines suggests level 8 or higher by 2040, especially given that much of the existing potential of this new technology has yet to be adopted by society. Low TRS forecasters often view current AI as roughly level 5 or 6, essentially “an extension of the Internet.”
  • Bottlenecks: The main argument from low TRS forecasters is that progress will be substantially slowed by implementation constraints that are difficult to overcome with software improvements. One notes, “There likely exist thousands (millions?) of potential bottlenecks in the economy which will only become legible as other processes are sped up by orders of magnitude.” Low TRS forecasters expect that many of these bottlenecks will be input bottlenecks, like energy, production capacity, and chip availability.
  • Diffusion speed: High TRS forecasters frequently emphasize AI's rapid adoption rate compared to historical technologies, noting that it is “much faster than it was for prior technologies.” Several low TRS forecasters, however, argue that historical precedent remains relevant: “The integrated circuit took about 15 years to change electronics. Computers took about 25 years. The Internet took 15 years to produce the WWW and another 10 or 15 years to change lives.” Some low TRS forecasters also argue that, even if AI is proceeding on a faster diffusion timeline than other transformative technologies, the higher TRS levels require not only technical breakthroughs, but also changes to our political, economic, regulatory, and cultural structures.
  • Economic transformation: Some high TRS forecasters see AI fundamentally restructuring society. One writes, “[Just as] the industrial revolution helped usher in capitalism, because it was an economic system that was compatible with that type of transformation, AI—being a technology that has the potential to replace human labor in most fields—might force societies to shift to a new economic model, which would place it at level 9.” But most low TRS forecasters question whether AI will deliver enough tangible benefits to lead to such a transformation: “The average citizen will not have much benefit to buy from AI. Improved games or art? Cheaper manufactured goods? A robot to clean your house? How does AI deliver things that humans want, like better, cheaper healthcare?”

Footnotes

  1. Unless otherwise stated, when stating what a group “predicts,” we are stating what the median member of that group predicts.

  2. Raw data: IQR on the 50th percentile was (27.3%–37.5%); median 25th and 75th percentile forecasts were 23.3% and 41.3% respectively.

  3. Raw data: IQR on the 50th percentile was (41.3%–70.0%); median 25th and 75th percentile forecasts were 38.0% and 70.0% respectively.

  4. Raw data: IQR on the 50th percentile was (60.0%–90.0%); median 25th and 75th percentile forecasts were 51.6% and 93.0% respectively.

  5. Raw data: IQR on the 50th percentile was (22.0%–35.0%); median 25th and 75th percentile forecasts were 20.0% and 35.0% respectively.

  6. Raw data: IQR on the 50th percentile was (28.0%–50.0%); median 25th and 75th percentile forecasts were 28.0% and 50.0% respectively.

  7. Raw data: IQR on the 50th percentile was (35.0%–70.0%); median 25th and 75th percentile forecasts were 36.0% and 65.0% respectively.

  8. https://epoch.ai/frontiermath

  9. In some cases, the "aggregate" refers to the mean; in others, the median is used, depending on which is more appropriate for the distribution of responses.

  10. We occasionally elicit participants' quantile forecasts (estimates of specific percentiles of a continuous outcome) to illustrate the range and uncertainty of their predictions.

  11. This was true at the time the expert completed the survey.

  12. Actual progress was marginally slower. The top Tier 1-3 accuracy rate rose from 1.03% in June of 2024 to 29% in August of 2025, where it remained as of the publication of this paper.

  13. Raw data: IQR on the 50th percentile was (3.0%–15.0%); median 25th and 75th percentile forecasts were 2.3% and 15.0% respectively.

  14. Raw data: IQR on the 50th percentile was (10.0%–40.0%); median 25th and 75th percentile forecasts were 8.0% and 35.0% respectively.

  15. Raw data: IQR on the 50th percentile was (1.0%–5.0%); median 25th and 75th percentile forecasts were 1.0% and 4.2% respectively.

  16. Raw data: IQR on the 50th percentile was (1.6%–13.3%); median 25th and 75th percentile forecasts were 2.0% and 8.6% respectively.

  17. Raw data: IQR on the 50th percentile was (3.0%–25.0%); median 25th and 75th percentile forecasts were 4.0% and 20.0% respectively.

  18. Raw data: IQR on the 50th percentile was (5.0%–29.0%); median 25th and 75th percentile forecasts were 5.0% and 20.0% respectively.

  19. The October 2024 S&P Global report referred to job losses over the course of the first nine months of 2024 (S&P Global 2024).

  20. As per the footnote above, the October 2024 S&P Global report referred to job losses over the course of the first nine months of 2024.

  21. That is, expect LEAP panelists to choose this scenario as best matching the general level of AI progress.

  22. Raw data: IQR on the 50th percentile was (5.0%–19.0%). 90th percentile of median forecast: 35.0%.

  23. Raw data: IQR on the 50th percentile was (10.0%–30.0%). 90th percentile of median forecast: 50.0%.

  24. Raw data: IQR on the 50th percentile was (12.0%–35.0%). 90th percentile of median forecast: 50.0%.

  25. Actual rate of improvement likely falls within this range, especially given recent acceleration trends, but there is considerable uncertainty and domain variability. See Kwa et al. (2025) for more information.

  26. That is, expect this level to best match the societal impact of AI.

Cite Our Work

Please use one of the following citation formats to cite this work.

APA Format

Murphy, C., Rosenberg, J., Canedy, J., Jacobs, Z., Flechner, N., Britt, R., Pan, A., Rogers-Smith, C., Mayland, D., Buffington, C., Kučinskas, S., Coston, A., Kerner, H., Pierson, E., Rabbany, R., Salganik, M., Seamans, R., Su, Y., Tramèr, F., Hashimoto, T., Narayanan, A., Tetlock, P. E., & Karger, E. (2025). The Longitudinal Expert AI Panel: Understanding Expert Views on AI Capabilities, Adoption, and Impact (Working paper No. 5). Forecasting Research Institute. Retrieved 2025-12-17, from https://leap.forecastingresearch.org/reports/wave1

BibTeX

@techreport{leap2025,
    author = {Murphy, Connacher and Rosenberg, Josh and Canedy, Jordan and Jacobs, Zach and Flechner, Nadja and Britt, Rhiannon and Pan, Alexa and Rogers-Smith, Charlie and Mayland, Dan and Buffington, Cathy and Kučinskas, Simas and Coston, Amanda and Kerner, Hannah and Pierson, Emma and Rabbany, Reihaneh and Salganik, Matthew and Seamans, Robert and Su, Yu and Tramèr, Florian and Hashimoto, Tatsunori and Narayanan, Arvind and Tetlock, Philip E. and Karger, Ezra},
    title = {The Longitudinal Expert AI Panel: Understanding Expert Views on AI Capabilities, Adoption, and Impact},
    institution = {Forecasting Research Institute},
    type = {Working paper},
    number = {5},
    url = {https://leap.forecastingresearch.org/reports/wave1},
    urldate = {2025-12-17},
    year = {2025}
  }