You can’t pick a good mobile UA agency from their case studies. They’re cherry-picked. You can’t tell luck from skill by looking at them. And the best agencies often don’t even publish their strongest work — clients won’t let them.
What actually predicts performance: weekly creative testing cadence, who owns the test plan, what reporting looks like before it’s prettied up, and whether the founder is still hands-on. The agency whose process you can describe in detail after one call is the one that will deliver.
Most agency vetting starts with those case studies anyway. The agencies worth hiring have a track record AND a system; most have one without the other, and the questions you ask reveal which.
I have been on both sides of this equation. At Bash Gaming, PuzzleSocial, and FreshPlanet — three companies I led that exited for over $200M combined — I hired agencies for specific countries, specific languages, specific channels.
Some were great. Some were terrible. The great ones had a tight lane and ran their corner of the work better than I could in-house. The bad ones tried to do everything and pattern-matched against case studies.
Now I run a mobile UA agency at RocketShip HQ. We have managed over $100M in client spend across mobile gaming and subscription apps.
I have watched founders run this evaluation well, and I have watched them run it badly. The difference is rarely about who they picked. It is about what they evaluated for.
This guide is what I would tell a founder calling me tomorrow asking how to vet the agencies they are interviewing — including how to vet us.
Why standard agency vetting fails
The fundamental problem with conventional agency vetting: you are trying to predict future performance from past artifacts that are partially or fully unreliable.
Industry rankings like the AppsFlyer Performance Index tell you about top-performing ad networks. They tell you nothing about whether a specific agency will perform on your specific account — which is the prediction you actually need to make.
Five things most founders look at, and why each one is a weaker signal than it appears:
| What you’re being shown | What it actually predicts |
|---|---|
| Case studies (logos, percentages, before/after) | That the agency has a marketing team that can write case studies. Often cherry-picked from the top 10% of accounts. |
| Client logos | That those brands paid them at some point. Tells you nothing about whether your account will perform. |
| Years in business / team size | Mostly nothing. Some great agencies are 4 people. Some terrible ones are 40. |
| Tools / tech stack | Commodity. Almost everyone uses similar attribution, MMP, and creative tools. |
| Reviews on third-party sites | Often gamed. Negative reviews get suppressed; positive ones come from clients trying to get a discount. |
What we see: the artifacts agencies use to sell are not the artifacts that predict whether they will deliver for you specifically.
The shift you need to make: stop evaluating what they have done. Start evaluating how they think and how they work. The first set of signals is curated by them. The second set is harder to fake.
First, get clear on what YOU are hiring for
Before you can evaluate any agency, you have to know what you actually want them to do. This is the step most founders skip, and it makes every later evaluation noisier.
The agencies I hired at Bash Gaming and PuzzleSocial were not generalists. Each had a tight lane. Each ran their lane to complement what we did in-house — not to replace our internal work. Specialization was the entire reason we hired them.
We had our hands full in-house. That is why we brought in agencies — to support us in the lanes we could not staff ourselves, not to do the entirety of our job.
Three honest framings of what you are hiring for, before you start evaluating agencies:
| Hiring framing | What you own | What agency owns | Spend level fit |
|---|---|---|---|
| Specific lane (channel / geo / function) | Strategy + measurement | One bounded execution lane | Any spend level |
| Full-stack mobile UA | Direction + evaluation (requires UA fluency) | Channels + creative + ops + reporting | $50K-$1.5M/month |
| Hybrid | Strategy + measurement | Creative production OR ops | $300K-$1.5M/month |
What we see: the framing you pick determines which questions matter. Vetting a creative-production specialist on the same dimensions as a full-stack agency makes neither evaluation sharp.
The full-stack option is realistic — but only if YOU have enough understanding of the discipline to evaluate them mid-engagement. If you are hiring full-stack because you do not know UA, you cannot tell good work from bad work.
Specific-lane hires are the easiest to evaluate. Each agency owns one bounded slice. Channel-specific specialists in particular tend to outperform generalists on their channel because the rep counts compound.
Get this framing right and the questions in the next section narrow naturally. Get it wrong and you will be evaluating every agency against a fuzzy bar that no agency can clearly pass or fail.
The 8 questions that actually predict performance
Each question + why it works + what good and bad answers sound like.
1. Is the founder still hands-on, or are they only the CEO?
Good answer: “I review every account weekly, sit in test plan reviews, write strategy briefs for new accounts. And I’m hands-on with our AI tooling — the workflows, the prompts, the integrations.”
Look for some specific version of doing the work AND owning the AI infrastructure. The agencies that scale in 2026 are the ones whose founders are building the AI layer themselves, not delegating it to a tools team.
Bad answer: “I am the CEO. I run the business. The team handles execution.” In the age of AI, this is a problem at any agency size — including 200-person shops.
A founder who is not doing some hands-on work is too distant from the front lines. It is unclear whether they will be able and willing to adapt their agency’s workflow to AI tooling, which is now the differentiating capability.
Founder-led without hands-on craft is a sales-led organization with the founder as the rainmaker. The work is delivered by whoever happens to staff your account — and the people building the AI tooling are not the people deciding strategy.
2. What AI tools have you BUILT — not just what AI tools do you use?
This question separates 2026 agencies from agencies that bolted ChatGPT onto a 2022 workflow.
Good answer: “We have built X for competitive ad teardowns, Y for creative variant generation, Z for weekly account reads. Here is how the tooling integrates with our test cadence.” Concrete, specific, defended by the workflow it sits inside.
Proprietary AI tooling is becoming the moat in agency economics. Agencies without their own infrastructure are competing on labor cost — which is a losing race in 2026.
Industry analysis at Mobile Dev Memo has tracked the post-ATT era’s structural shifts in mobile UA economics. The agencies adapting are the ones building tools, not just buying licenses to other people’s tools.
Bad answer: “We use ChatGPT for ideation” or “We use Cursor for the creative team.” Using off-the-shelf AI is table stakes.
Building AI infrastructure that compounds is the actual differentiator. If they cannot describe what they have built and why, they are running last decade’s workflow with a chatbot bolted on.
3. What is your specific lane?
Tied to your “what am I hiring for” framing in the section above.
Good answer: A clear, narrow specialization that they can defend with depth. “We are best at subscription apps spending $200K-$2M/month on Meta and TikTok in Tier 1 English markets.” Specific. Bounded.
Bad answer: “We do everything for everyone at every scale.” This is rarely true. When it is true, you are paying for capacity, not expertise.
4. Walk me through your weekly creative testing cadence.
This is the question that exposes whether they have a system or are winging it.
Good answer: Specific. “Every Monday we review last week’s results against the test plan, kill or scale based on these criteria, and brief next week’s tests by Wednesday. Production happens Thursday-Friday. New tests live Sunday.” With named meeting structures, named outputs, named owners.
Bad answer: “We test creatives continuously.” Or worse: “It depends on the client.” Both mean there is no system. The work happens when someone remembers to do it.
For context on what “good” testing cadence looks like at scale, the A/B testing framework we use describes the structural pieces: hypothesis design, kill criteria, learning capture, and weekly cadence rhythm.
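To make "named outputs, named owners" concrete, here is a minimal sketch of what one entry in a weekly test plan can look like as structured data. The field names and example values are illustrative assumptions, not a prescribed schema; the point is that every test has a stated hypothesis, a named owner, and pre-agreed kill and scale criteria before it launches.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CreativeTest:
    """One row in a hypothetical weekly test plan. Field names are illustrative."""
    concept_id: str            # a distinct concept, not a variant of an existing one
    hypothesis: str            # what we expect to learn and why
    owner: str                 # the named person who briefs, launches, and reads it
    launch_date: date
    kill_criteria: str         # pre-agreed condition for pausing the test
    scale_criteria: str        # pre-agreed condition for moving budget behind it
    result: str | None = None  # filled in at the Monday review; becomes captured learning

# Example entry -- hypothetical values
test = CreativeTest(
    concept_id="ugc_problem_first_v1",
    hypothesis="Problem-first UGC hook beats feature-first hook on CPI for cold audiences",
    owner="creative_lead",
    launch_date=date(2026, 1, 11),
    kill_criteria="CPI > 1.3x account average after $500 spend",
    scale_criteria="CPI <= account average AND D1 retention within 5% of baseline",
)
```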
5. How many distinct concepts per week are you producing for an account at our scale?
Volume of creative is rarely the constraint. Concept diversity is. Fifty ads from five concepts will saturate the same five audiences.
Good answer: A specific number anchored in the account’s spend level (e.g., “8-12 distinct concepts per week for accounts at $200K+/month, with 2-3 variants per concept”). They distinguish concept count from variant count.
Bad answer: A big-sounding asset count without distinguishing concepts from variants. “We produce 50 creatives a week” tells you nothing about diversity. Volume is cheap. Concept diversity is what compounds — and most agencies blur this distinction on purpose by reporting variant counts as if they were concept counts.
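One way to check this yourself: if the creative names in your account follow a convention that encodes the concept before a separator (a common but not universal practice, e.g. `concept-name_variant-tag`), you can count concepts and variants separately from any creative-level export. A minimal sketch under that naming assumption:

```python
from collections import Counter

def concept_vs_variant_counts(creative_names: list[str], sep: str = "_") -> dict:
    """Count distinct concepts vs total variants from creative names.

    Assumes names encode the concept before the first separator,
    e.g. 'ugc-problem-hook_v3' -> concept 'ugc-problem-hook'.
    Adapt to whatever naming convention your agency actually uses.
    """
    concepts = Counter(name.split(sep, 1)[0] for name in creative_names)
    return {
        "total_variants": len(creative_names),
        "distinct_concepts": len(concepts),
        "variants_per_concept": dict(concepts),
    }

# 50 creatives from 5 concepts looks very different from 50 creatives from 12
print(concept_vs_variant_counts(
    ["ugc-problem-hook_v1", "ugc-problem-hook_v2", "testimonial_v1", "testimonial_v2"]
))
# {'total_variants': 4, 'distinct_concepts': 2,
#  'variants_per_concept': {'ugc-problem-hook': 2, 'testimonial': 2}}
```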
6. When a campaign underperforms, what are your first three actions?
Tests their failure-mode response. Underperformance is when the quality of their work shows up most clearly.
Good answer: A diagnostic ladder. “First, check whether the underperformance is structural (audience saturation, attribution drop) or creative (the concept itself fatiguing). Second, run the next 3 candidates from the testing queue. Third, redesign the concept brief if the issue is concept-level.”
Bad answer: “We optimize the campaign.” Vague. Or: “We make new creatives.” Reactive, not diagnostic.
7. Show me what your weekly client report looks like — a real one, names redacted.
Tests transparency. The agency that hides behind summary metrics has something to hide.
Good answer: They send the actual artifact. Real numbers, full-funnel cuts, last week’s tests with results, this week’s plan, open questions. You can read it without clarification.
Bad answer: A pretty PDF deck with hand-curated highlights. Or refusal to share until you sign. The reporting they show in a sales process is the BEST version they have. If that version is theater, the working version is theater.
8. What would you NOT take on, and why?
Tests strategic honesty. An agency with a real strategy turns work down.
Good answer: Specific exclusions with reasons. “We don’t take on accounts under $10K/month that also have unclear goals and targets — without baseline clarity, neither side can tell if we’re succeeding.”
“We don’t run web-only attribution clients — we don’t have the depth there.”
Bad answer: “We work with anyone serious.” Or evasive non-answers.
An agency that takes any check that clears does not have a strategy. They have an inbox. Strategic discipline shows up most clearly in what gets refused.
Red flags during pitches and discovery calls
Pattern-match these against the agencies you are evaluating.
| Red flag | Underlying problem |
|---|---|
| Way too slick. The pitch is rehearsed, every objection has a polished response, the deck looks like it was made by a brand agency. | You are talking to the sales machine. The work is done by people you have not met yet. Slick pitches and craftsman work rarely come from the same shop. |
| Promises specific CPI / ROAS at signing | Either dishonest or naive. Performance depends on factors neither side controls (your product, attribution windows, market conditions). Numbers given before they have account access are sales theater. |
| Resists giving you full account access | They want you to depend on their reporting. You should own your Meta Ads Manager / Google Ads / TikTok Ads accounts and grant them access — not the other way around. |
| Talks brands more than methodology | “We worked with [big brand]” is the wrong sentence to lead with. The agency’s actual product is its system. If they can’t describe it, they don’t have one. |
| More salespeople than analysts on calls | Sales-led organization. The senior person you’re talking to will disappear after signing. |
| “All-inclusive” pricing with no breakdown | You can’t evaluate value without knowing what you’re paying for. Unbundled pricing is also a forcing function for the agency to know what’s worth what. |
| Wants long-term contract before work happens | If they were confident in the work, they would let it speak for itself. Long contracts before performance protect them, not you. |
What we see: the slickness signal is the most reliable single tell. A pitch that feels too polished usually is.
The best agencies I have worked with — both as a buyer and as a peer — are visibly imperfect in their pitches. They show their work instead of selling around it. They will admit the things they don’t know.
When agency is the wrong answer (and you should hire in-house)
The honest version: an agency is not always the right answer. Here are three thresholds where in-house wins:
- Spend level above $1.5M/month sustained. At this scale, the math on agency fees vs. building a 3-5 person internal team usually flips toward in-house. The exception: a specialty agency in a specific lane (creative production, a specific geo) on top of internal account ops.
- Vertical complexity that requires deep product knowledge. If your monetization, retention loops, or core mechanics are unusual enough that a UA team needs months to understand them, in-house is faster. Agencies amortize learning across clients; you don’t get that benefit if your account is the only one of its kind.
- You have or can hire a senior UA leader. The single biggest predictor of in-house success is whether you have a senior person who has run this work before. Without that person, in-house is a more expensive way to make the same mistakes an agency would have caught. Even with a senior leader, an agency can still complement and support — a specialty agency for creative production, channel ops, or specific geos works alongside a strong internal lead, not as a replacement for one.
The hybrid model — internal lead + agency for creative production or specific channels — is often the right answer between $300K/month and $1.5M/month spend.
Internal owns strategy and measurement. Agency owns specific execution lanes. Building in-house creative capabilities alongside an agency is increasingly common at scaled subscription apps. RocketShip’s consulting and advisory services are designed for this hybrid context — internal teams that need senior advisory without full agency lift.
| Spend level | Recommended structure | Why |
|---|---|---|
| Under $10K/month | Freelancer | Agency overhead is too heavy at this scale; goals also tend to be too undefined for an agency engagement |
| $10K-$50K/month | Light-structure agency engagement — only if there’s a clear path to $50K+ | Agency expertise without full-service overhead; bring an agency in only when the trajectory justifies it |
| $50K-$300K/month | Full-service agency OR internal generalist + creative specialist | Need consistent execution, can’t afford internal full-stack team |
| $300K-$1.5M/month | Hybrid: internal lead + agency for creative or specific channels | Internal lead can direct, agency executes lanes |
| Over $1.5M/month sustained | In-house team + specialty agency on top | Math on agency fees usually flips toward in-house build |
What we see: the right answer depends on your spend level and your team’s UA fluency, not on a generic “use an agency” or “go in-house” prescription.
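For a rough sense of why the math flips at the top of that table, here is a back-of-the-envelope sketch. The fee percentage and fully loaded team costs below are assumptions for illustration only; plug in your own numbers.

```python
def monthly_cost_comparison(monthly_spend: float,
                            agency_fee_pct: float = 0.12,      # assumed mid-range % of spend fee
                            team_size: int = 4,                # assumed 3-5 person internal team
                            cost_per_head: float = 18_000.0,   # assumed fully loaded monthly cost per person
                            ) -> dict:
    """Back-of-the-envelope: agency fee vs internal team payroll.

    All defaults are illustrative assumptions -- replace with your own numbers.
    Ignores recruiting, tooling, and ramp-up on the in-house side, and ignores
    creative production costs on both sides.
    """
    agency_fee = monthly_spend * agency_fee_pct
    in_house = team_size * cost_per_head
    return {"agency_fee_per_month": agency_fee,
            "in_house_team_per_month": in_house,
            "in_house_cheaper": in_house < agency_fee}

print(monthly_cost_comparison(1_500_000))
# {'agency_fee_per_month': 180000.0, 'in_house_team_per_month': 72000.0, 'in_house_cheaper': True}
print(monthly_cost_comparison(300_000))
# {'agency_fee_per_month': 36000.0, 'in_house_team_per_month': 72000.0, 'in_house_cheaper': False}
```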
Frequently asked questions
What should a UA agency cost in 2026?
Most reputable mobile UA agencies charge either a percentage of ad spend (typically 10-20% for full-service) or a fixed retainer ($15K-$60K/month depending on scope). Pure creative-production agencies often charge per asset or per concept.
Avoid agencies that won’t unbundle pricing — you can’t evaluate value without seeing what you’re paying for. See our mobile app UA cost breakdown for full benchmarks.
How long should a UA agency contract be?
Three to six months is standard. Beware of any agency that wants 12+ months before any work has happened. The work should speak for itself.
Build a clear exit clause into whatever contract you sign — usually 30 days' notice — so the relationship remains a mutual choice every month, not a hostage situation.
Should I hire one agency or split between specialists?
Depends on your spend level and internal capability. Below $300K/month, one full-service agency is usually right — coordinating multiple specialists is its own overhead cost.
Above $500K/month with a senior internal lead, splitting is often better: one creative-production specialist + one paid-channels specialist + your internal lead orchestrating.
How do I know if my current agency is performing?
Three signals. First: are CPIs and CACs trending in the direction you agreed on at the start of the engagement? Second: is creative diversity expanding (concept count, not variant count) week over week?
Third: are weekly reports diagnostic (test plan, results, next plan) or descriptive (here’s what happened)? Diagnostic reports compound. Descriptive reports do not.
Your attribution data — whether you are on AppsFlyer, Adjust, or another MMP — should be checked monthly against the agency’s reported numbers. If the agency dashboard and the MMP diverge, ask why.
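A minimal sketch of that monthly reconciliation, assuming you can export installs (or spend) by campaign from both your MMP and the agency's report as CSVs. The file names, column names, and the 10% divergence threshold are placeholders, not a standard:

```python
import csv

def load_by_campaign(path: str, campaign_col: str, value_col: str) -> dict[str, float]:
    """Sum a metric (installs, spend) per campaign from a CSV export."""
    totals: dict[str, float] = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row[campaign_col]] = totals.get(row[campaign_col], 0.0) + float(row[value_col])
    return totals

def flag_divergence(mmp: dict[str, float], agency: dict[str, float], threshold: float = 0.10):
    """Flag campaigns where the agency-reported number diverges from the MMP beyond the threshold."""
    flags = []
    for campaign in mmp.keys() | agency.keys():
        m, a = mmp.get(campaign, 0.0), agency.get(campaign, 0.0)
        if m == 0 and a == 0:
            continue
        if m == 0 or abs(a - m) / m > threshold:
            flags.append((campaign, m, a))
    return flags

# Hypothetical file and column names -- adapt to your MMP's export format
mmp_installs = load_by_campaign("mmp_export.csv", "campaign", "installs")
agency_installs = load_by_campaign("agency_report.csv", "campaign", "installs")
for campaign, mmp_n, agency_n in flag_divergence(mmp_installs, agency_installs):
    print(f"{campaign}: MMP={mmp_n:.0f}, agency={agency_n:.0f} -- ask why")
```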
Can I trust an agency with my Meta account access?
Yes, if it’s structured correctly. You own the Meta Business Manager, you own the ad account, you grant the agency admin access.
Never let an agency create the account for you. Never let them be the only admin. The account is your asset; their access is conditional.
What’s the difference between a UA agency and a creative agency?
UA agencies own paid acquisition end-to-end: campaigns, creative, measurement, reporting. Creative agencies produce ad assets but typically don’t run the account.
Many of the best modern setups split the work: a UA agency or in-house team running paid, plus a specialist creative agency producing a high volume of concepts. The integration matters — both sides need shared test plans and shared learning loops.
Should I work with an agency that handles a competitor?
Yes. The whole point of hiring an agency is access to their cross-client learnings — including learnings from accounts in your category.
An agency that has never run a competitor doesn’t have the pattern recognition you’re paying for. If you’re nervous about competitive data crossing over, the right tool is an NDA — not a blanket no-compete clause.
A well-scoped NDA protects the sensitive specifics — your exact numbers, your account structure, your roadmap — while still letting the agency share the high-level learnings and pattern recognition you hired them for. If you cannot tolerate any overlap at all, hire in-house instead.
How long until I see results from a new UA agency?
The honest answer: 4-8 weeks for first signal, 12+ weeks for trust. The first 4 weeks are setup, learning, and baseline tests.
Weeks 4-8 are when their actual creative testing should start producing results vs. baseline. Anyone promising results in 2 weeks is either buying cheap installs from a low-quality channel or being dishonest about the timeline.
Related reading
- Why your top-performing ad creative should worry you — concentration risk and parallel scaling
- How to build in-house creative capabilities for mobile UA — when in-house wins
- What does mobile app user acquisition cost in 2026? — pricing and benchmarks
- How to scale mobile UA from $1K to $100K per day — scaling structure
- Dynamic creative optimization for mobile apps — when DCO helps and when it hurts

