Whitepaper
36% Average Revenue Lift in A/B Tests of Product Genius Across Millions of Shopper Sessions and 25 Brands
Large Interaction Model (LIM) AI versus Standard E-Commerce Websites
Summary
We performed controlled, randomized A/B tests to compare brands’ original (OG) websites against the same websites amplified with Product Genius (PG). The result was a 36% average revenue lift across millions of shopper sessions and 25 different brands, including Betsey Johnson, a subsidiary of Steve Madden.
Technology
Growing out of $35 million of DARPA research and development, Product Genius created a new approach to training Large Language Models (LLMs) that we call a Large Interaction Model (LIM). LIMs improve upon today’s LLMs by learning continuously. Applied to e-commerce, LIMs optimize for each shopper as they scroll and click, and have continued to rapidly learn from over half a billion shopper interactions.
Because a LIM learns rapidly, it adapts to rapid changes in shopper interests, products, ads, and markets. Rapid learning while a shopper scrolls also means we have eliminated the need for third-party data, which is increasingly noncompliant with modern data privacy laws.
Rapid learning also means that the algorithm does not need to accumulate large quantities of historical user data. Brands can derive full benefit from the technology without heavy shopper traffic on the website. The system can be successfully applied to e-commerce websites with as few as ten thousand shopping sessions per month.
Problem Definition
In commerce, brands sell through marketplace platforms such as Amazon.com, but they give up a significant percentage of the profit margin on sales facilitated by a third party. To meet their gross margin targets, most brands are therefore forced to run their own Direct-to-Consumer (D2C) website.
Maintaining and increasing the website’s conversion rate and revenue per session is, for most teams, a constant fire-fight. This study shows that AI can help.
Figure: Large Interaction Model (LIM) interacting with multiple Betsey Johnson shoppers simultaneously on BetseyJohnson.com and also on Google and Meta Ads. The LIM learns continuously while it interacts by iteratively “pre-thinking”.¹
While using a website powered by Product Genius AI, shoppers experience a TikTok-like shopping feed where they see true, useful product and brand information curated by the AI. As they scroll, the AI can tell the shopper what it thinks their interests are so far, and the shopper can also directly tell the AI what they’d like. This demo link shows what shoppers experience. This demo link provides a peek into what the AI is thinking as the shopper interacts.
Betsey Johnson Results
Betsey Johnson turned out to have fast-moving changes in shopper interests, ad campaigns, marketplace dynamics (e.g., competitor pricing), and new product launches. The shifts were also highly abrupt. The AI would provide a consistent, significant revenue lift for a week, and then suddenly, within a few hours and without any change on our part, the revenue lift would completely disappear.
These abrupt changes seemed to happen faster than we could collect statistics. By the time a change was statistically significant, it was often too late to do much about it. To make matters worse, the lift at any given time was radically different across traffic segments (e.g., Facebook mobile, traffic from Google search). Tracking each of these lift silos separately reduced the number of sessions we could average together to estimate the lift, making lift tracking even more sluggish.
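The slowdown from segmenting traffic follows from basic sampling statistics: the standard error of an average shrinks only with the square root of the number of sessions. The sketch below illustrates this with hypothetical numbers (the per-session standard deviation and session counts are assumptions, not figures from the test).

```python
import math

def standard_error(per_session_std: float, n_sessions: int) -> float:
    """Standard error of the mean revenue per session for n independent sessions."""
    if n_sessions <= 0:
        raise ValueError("n_sessions must be positive")
    return per_session_std / math.sqrt(n_sessions)

# Hypothetical example: 10,000 sessions tracked together versus
# the same traffic split into 4 equal segments of 2,500 sessions each.
pooled = standard_error(50.0, 10_000)
per_segment = standard_error(50.0, 2_500)
ratio = per_segment / pooled  # each segment's estimate is twice as noisy
```

Under these assumptions, splitting traffic into four equal segments doubles the noise in each segment's estimate, so a lift of a given size takes roughly four times as many sessions per segment to detect.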
The “ah-ha” was that rather than track statistics and pull levers based on those signals, we needed to trust the AI to optimize its interactions with each shopper and reinforce its policies before there was any statistically significant feedback. This solved the problem.
We essentially had to modify our learning rate to adapt “faster than the speed of statistics”. The more we increased the learning rate, the more revenue lift we observed across all Product Genius-powered websites.
To plot a single curve for revenue lift, we take the total revenue from Product Genius (PG) sessions, subtract the total revenue from Original Website (OG) sessions, and divide by the OG revenue:
Revenue lift = (PG revenue - OG revenue) / OG revenue
Figure: During the first twenty-five days of the test on BetseyJohnson.com, as the algorithm learned, the revenue lift climbed from 0% to approximately 13%.
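The revenue lift calculation described above can be sketched in a few lines. This is a minimal illustration, not the production code; the function name and the optional per-session normalization (useful when the two branches do not receive equal traffic) are our assumptions.

```python
def revenue_lift(pg_revenue, og_revenue, pg_sessions=None, og_sessions=None):
    """Return (PG - OG) / OG as a fraction, e.g. 0.13 for a 13% lift.

    If both session counts are given, revenue is first normalized to revenue
    per session. That normalization is an assumption for unequal splits.
    """
    if pg_sessions is not None and og_sessions is not None:
        if pg_sessions <= 0 or og_sessions <= 0:
            raise ValueError("session counts must be positive")
        pg_revenue = pg_revenue / pg_sessions
        og_revenue = og_revenue / og_sessions
    if og_revenue <= 0:
        raise ValueError("OG revenue must be positive to compute a lift")
    return (pg_revenue - og_revenue) / og_revenue

# Hypothetical totals: $113,000 from PG sessions, $100,000 from OG sessions.
example_lift = revenue_lift(113_000.0, 100_000.0)  # about 0.13, i.e. 13%
```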
Overall Results: 36% Avg. Revenue Lift Across Millions of Sessions and 25 Brands
By performing controlled A/B tests across twenty-five brands, we found an average revenue lift of 36%.
The revenue lift generally starts lower and then increases as the trial proceeds. We used the average of the revenue lift across all days of each A/B test. We did not cherry-pick the revenue lift at the end of each test, when the AI had learned the most and the lift was highest.
A 36% revenue lift is therefore a conservative estimate of the benefit these brands can expect after the test concludes.
To average the revenue lift from multiple brands together, we started with the revenue lift for each brand and computed a weighted average, weighted by the amount of traffic (number of sessions) flowing through each brand’s website. Larger stores with more sophisticated original websites counted more heavily in the results than smaller stores.
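The traffic-weighted average described above can be sketched as follows. The function name and the example numbers are hypothetical and only illustrate the weighting; they are not results from the test.

```python
def weighted_average_lift(brands):
    """Average per-brand lifts, weighting each brand by its number of sessions.

    brands: iterable of (lift, sessions) pairs, with lift as a fraction.
    """
    brands = list(brands)
    total_sessions = sum(sessions for _, sessions in brands)
    if total_sessions <= 0:
        raise ValueError("total number of sessions must be positive")
    return sum(lift * sessions for lift, sessions in brands) / total_sessions

# Hypothetical example: a large store with a 10% lift and a small store
# with a 50% lift. The larger store dominates the weighted result.
example = weighted_average_lift([(0.10, 300_000), (0.50, 100_000)])  # about 0.20
```

In this example an unweighted mean would give 30%, while the traffic-weighted mean gives about 20%, which shows how larger stores pull the overall figure toward their own lift.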
Here are example revenue lift learning curves from the test:
Why Do the Revenue Lift Plots Fluctuate?
The general trend in the tests was “up and to the right.” During the trial, our AI continuously learned and adapted to shopper and merchant needs. That said, revenue lift fluctuates, much like the stock market. You may not win every hour, but the goal is to make smart trades that accumulate over time into an increasing advantage.
Insights
Figure: This dashboard shows merchants which content is driving engagement.
Figure: This dashboard shows merchants which content is driving sales.
Who is Product Genius?
Product Genius is built by a team of scientists, engineers, product leaders, and growth operators with deep experience turning advanced AI into real commercial systems.
The company is led by Ben Vigoda, founder and Chief Scientist — a pioneer in deep learning systems and AI hardware. Vigoda started at age 13 at Stanford University in David Rumelhart’s lab adding momentum terms to the back-propagation algorithm. He was an original inventor of the first Tensor Processing Units (deep learning microchips), created the first and only analog deep learning chips, and developed some of the first attention transformer language models. He holds a PhD from MIT, where he has served as a Post-Doctoral Fellow and Visiting Scientist, sits on the DARPA AI horizon scanning committee, and is a Kavli National Academy of Sciences Fellow.
Steve Papa, Board Chairman, brings decades of experience scaling category-defining technology companies. He founded Endeca, which powered more than half of the top e-commerce sites in the U.S. and exited to Oracle for $1.3B, co-founded Toast through its $20B IPO, and has founded and invested in multiple successful enterprise and commerce platforms.
Noah Maffitt, Chief Growth Officer, has repeatedly built and scaled digital commerce businesses. He led a company from $0 to $250M before its acquisition by OfficeDepot.com, grew Office Depot’s digital P&L to $5B post-acquisition, led global digital at Live Nation, and led Ogilvy’s Continuous Commerce division.
Together, the company’s core product and technology execution is driven by:
Ryan Healey, VP of Product Management & Customer Outcomes
Thomas Rochais, Senior Research Scientist and R&D Engineer
Matthew Barr, VP of Machine Learning Research
Hugh Morgenbesser, Principal Software Engineer
Jamie Lovette, Full-Stack Software Engineer
David Saginashvili, Software Engineer
This group combines backgrounds in theoretical physics, machine learning research, large-scale software systems, and product delivery. As a team, they are responsible for designing, building, and operating the Large Interaction Model systems that power Product Genius’ shopping experiences.
Product Genius grew out of a $35 million DARPA research investment and operates as an AI research-to-product lab, translating advanced modeling and learning systems into production commerce environments.
We are a team of scientists, creatives, and engineers energized by bringing advanced AI into real business impact.
Appendix 1: A/B Test Methodology
Overview
Figure: A/B test splitter
For A/B testing, traffic was divided into two comparable groups: the Product Genius feed (PG) and the original page (OG). The split was performed randomly and could be customized for different traffic categories (e.g., desktop vs. mobile), ensuring parity between the two experiences for visitors until the point the PG feed was displayed.
The key metric was Revenue per Session (pre-tax sales price of total goods). A session was defined by a unique visitor ID in local storage to capture cross-tab activity. Augmenting Shopify order metadata with session information ensured all conversions were accurately attributed to sessions without double-counting. The identical attribution method was applied to both the PG and OG branches for direct and fair comparability.
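As a rough illustration of the attribution step, the sketch below computes revenue per session for each branch while counting every order only once. The data shapes and field names (`order_id`, `visitor_id`, `pretax_total`) are hypothetical stand-ins, not the actual Shopify metadata schema.

```python
def revenue_per_session(sessions, orders):
    """Compute revenue per session for the PG and OG branches.

    sessions: dict mapping visitor ID -> branch ("PG" or "OG").
    orders: list of dicts with "order_id", "visitor_id", "pretax_total".
    """
    seen_orders = set()
    revenue = {"PG": 0.0, "OG": 0.0}
    for order in orders:
        if order["order_id"] in seen_orders:
            continue  # skip repeated records so no order is double-counted
        seen_orders.add(order["order_id"])
        branch = sessions.get(order["visitor_id"])
        if branch is None:
            continue  # order cannot be tied to a tracked session
        revenue[branch] += order["pretax_total"]

    counts = {"PG": 0, "OG": 0}
    for branch in sessions.values():
        counts[branch] += 1
    return {b: (revenue[b] / counts[b] if counts[b] else 0.0) for b in revenue}

# Hypothetical data: four sessions, one order reported twice.
sessions = {"v1": "PG", "v2": "PG", "v3": "OG", "v4": "OG"}
orders = [
    {"order_id": "o1", "visitor_id": "v1", "pretax_total": 60.0},
    {"order_id": "o1", "visitor_id": "v1", "pretax_total": 60.0},  # duplicate
    {"order_id": "o2", "visitor_id": "v3", "pretax_total": 40.0},
]
result = revenue_per_session(sessions, orders)
```

The same function is applied to both branches, mirroring the requirement that attribution be identical for PG and OG.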
Future work could include having the algorithm optimize not just for revenue, but for the most profitable revenue. For this experiment, our system did not have access to gross margin information for each product, so the reinforcement signal for the system was, by necessity, revenue per session.
Randomized Split of Traffic Into “PG” versus “OG”
We consider “PG” to be the Product Genius feed being added to a page, with “OG” as the original page without Product Genius treatment.
Our JavaScript tag in the front end fetches the config, first sending along vital contextual information, including but not limited to UTM sources and customer IDs for logged-in visitors.
Our backend uses this information to determine the specific configuration to send back to a visitor. The LIM model learns how PG should be configured for individuals coming from different contexts.
The first step is to randomly decide which experimental branch a visitor should experience for A/B testing purposes. This can be set up independently for distinct categories of traffic, e.g. desktop vs. mobile visitors.
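A minimal sketch of this first step is shown below, assuming a configurable PG fraction per traffic category. The category names, the 50/50 fractions, and the function name are hypothetical; the actual backend logic is not described in this paper.

```python
import random

# Hypothetical per-category share of traffic that should see the PG feed.
PG_FRACTION_BY_CATEGORY = {"desktop": 0.5, "mobile": 0.5}
DEFAULT_PG_FRACTION = 0.5

def assign_branch(category, rng=random, fractions=PG_FRACTION_BY_CATEGORY):
    """Randomly assign a visitor to "PG" or "OG" based on the traffic category."""
    fraction = fractions.get(category, DEFAULT_PG_FRACTION)
    # rng.random() is uniform on [0, 1), so a fraction of 0.0 never yields
    # "PG" and a fraction of 1.0 always does.
    return "PG" if rng.random() < fraction else "OG"

# Example: assign 10,000 hypothetical mobile visitors with a seeded generator.
rng = random.Random(0)
branches = [assign_branch("mobile", rng) for _ in range(10_000)]
pg_count = branches.count("PG")
```

A real deployment would presumably also store the assigned branch with the visitor ID so that a returning visitor stays in the same branch; that detail is an assumption and is left out of the sketch.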
Appendix 2: Deployment
Advanced AI shouldn’t be limited to big tech. We’re proud to be partnering with Google Cloud and the Shopify App Store to bring this technology to entrepreneurial brands of all sizes.
Display
For Shopify websites, adding Product Genius on the page is fully automated and requires zero coding or human effort.
For non-Shopify e-commerce websites, the front-end integration takes a web developer less than thirty minutes.
Product Genius can also display an algorithm-powered feed on any other website or in any app, such as an iOS or Android phone app. For websites, mobile apps, and other display surfaces (for example, we are deploying kiosks in Las Vegas properties), our integration partners provide streamlined integrations.
Content
Just like TikTok or Instagram, Product Genius feeds can contain content from any source, including creator content, user-generated content, and AI-generated content. Similarly, it can be just about any kind of content, including videos, text cards, image cards, products, and user posts.
Our integration partners provide the capability to pull information from virtually any data source.
Merchant Control
The brand’s staff can prompt the system with rules and policies for what to include in feeds and what to exclude. Policies can also control any other aspect of the feeds, for example who sees or does not see particular content items or types of content.
Shopper Identity
When a user returns to a digital surface powered by Product Genius, the experience picks up where they left off. Our identity tracking can be integrated with other sources of identity information.
——————————————————————————————————————————————————————————————————————
¹ Is this related to reinforcement learning (e.g., GRPO or GEPA)? The Large Interaction Model is more sample-efficient. When applied to e-commerce, it learns significant improvements that drive revenue lift in fewer than a dozen interactions. Approaches like GEPA operate on tasks like solving math problems, where (1) the AI gets an unlimited number of tries, and (2) there is a verifiable right answer. Interacting with online shoppers does not conform to either of these assumptions.
Customers Confirmed Revenue and Conversion Rate Lift
© Product Genius AI 2025