Customer Wallet and Opportunity Estimation: Analytical Approaches and Applications
Saharon Rosset, Tel Aviv University (work done at IBM T. J. Watson Research Center)
Collaborators: Claudia Perlich, Rick Lawrence, Srujana Merugu, et al.
© 2007 IBM Corporation, IBM Research

Project evolution and my roles (ranging from minor role to major role to leading contributor across the stages):
- Business problem definition: targeting, sales force management
- Modeling problem definition: wallet / opportunity estimation
- Statistical problem definition: quantile estimation, latent variable estimation
- Modeling methodology design: quantile estimation, graphical model
- Model generation & validation: programming, simulation, IBM wallets
- Implementation & application development: OnTarget, MAP

Outline
- Introduction: business motivation and different wallet definitions
- Modeling approaches for conditional quantile estimation: local and global models; empirical evaluation
- MAP (Market Alignment Program): description of application and goals; the interview process and the feedback loop; evaluation of Wallet models' performance in MAP

What is Wallet (AKA Opportunity)?
- The total amount of money a company can spend on a certain category of products.
- [Figure: nested sets — IBM sales ⊆ IT wallet ⊆ company revenue]

Why Are We Interested in Wallet?
- Customer targeting (OnTarget):
  - Focus on acquiring customers with high wallet
  - Evaluate customers' growth potential by combining wallet estimates and sales history
  - For existing customers, focus on high-wallet, low-share-of-wallet customers
- Sales force management (MAP):
  - Make resource assignment decisions; concentrate resources on untapped customers
  - Evaluate the success of sales personnel and sales channels by the share of wallet they attain

Wallet Modeling Problem
- Given:
  - customer firmographics x (from D&B): industry, employee number, company type, etc.
  - customer revenue r
  - IBM relationship variables z: historical sales by product
  - IBM sales s
- Goal: model customer wallet w, then use it to predict present/future wallets
- No direct training data on w, and no information about its distribution!

Historical Approaches within IBM
- Top down (the approach used by IBM Market Intelligence in North America, called ITEM):
  - Use econometric models to assign total opportunity to a segment (e.g., industry × geography)
  - Assign to companies in the segment proportionally to their size (e.g., D&B employee counts)
- Bottom up: learn a model for individual companies
  - Get true wallet values through surveys or appropriate data repositories (these exist, e.g., for credit cards)
- Many issues with both approaches (won't go into detail)
- We would like a predictive approach from raw data

Traditional Approaches to Model Evaluation
- Evaluate models based on surveys: cost and reliability issues
- Evaluate models based on high-level performance indicators:
  - Do the wallet numbers sum up to numbers that make sense at the segment level (e.g., compared to macroeconomic models)?
  - Does the distribution of differences between predicted Wallet and actual IBM Sales and/or Company Revenue make sense? In particular, are the percentages we expect bigger/smaller?
- Problem: no observation-level evaluation

Proposed Hierarchical IT Wallet Definitions
- TOTAL: total customer available IT budget
  - Probably not the quantity we want (IBM cannot sell all of it)
- SERVED: total customer spending on IT products covered by IBM
  - Share of wallet is the portion of this number spent with IBM?
- REALISTIC: IBM sales to the best similar customers
  - This can be concretely defined as a high percentile of P(IBM revenue | customer attributes)
  - Fits the typical definition of opportunity?

[Figure: nested sets — REALISTIC ⊆ SERVED ⊆ TOTAL]

An Approach to Estimating SERVED Wallets
- Graphical model: company firmographics and the historical relationship with IBM connect to the latent SERVED wallet, which in turn drives IT spend with IBM
- Wallet is unobserved; all other variables are observed
- The two families of variables (firmographics and IBM relationship) are conditionally independent given wallet
- We develop inference procedures and demonstrate them
- Theoretically attractive, practically questionable (will not discuss further)

REALISTIC Wallet: Percentile of Conditional Distribution
- The conditional distribution of IBM sales to the customer given customer attributes: s | r, x, z ~ f_{r,x,z}
- E.g., under the standard linear regression assumption: s | r, x, z ~ N(β_x·x + β_r·r + β_z·z, σ²)
- [Figure: conditional density with the mean E(s | r, x, z) and the REALISTIC percentile marked]
- What we are looking for is the pth percentile of this distribution

Estimating Conditional Distributions and Quantiles
- Assume for now that we know which percentile p we are looking for
- First observe that modeling the complete conditional distribution P(s|r,x,z) well is sufficient
- If we have a good parametric model and distribution assumptions, we can also use it to estimate quantiles
  - E.g., linear regression under a linear model with homoskedastic i.i.d. Gaussian errors
- Practically, however, it may not be a good idea to count on such assumptions
  - Especially not a Gaussian model, because of statistical robustness considerations

Modeling REALISTIC Wallet Directly
- REALISTIC defines wallet as the pth percentile of the conditional distribution of spending given customer attributes
- This implies that a fraction (1−p) of the customers are spending their full wallet with IBM
- Two obvious ways to get at the pth percentile:
  - Local: estimate the conditional distribution by integrating over a neighborhood of similar customers, and take the pth percentile of spending in the neighborhood
  - Global: build global regression models for the pth percentile

Local Models: K-Nearest Neighbors
- Start from the universe of IBM customers with D&B information
- Design a distance metric, e.g.: similar IBM relationship, similar employees/revenue, same industry
- Prediction for target company i: a quantile of the IBM sales of the firms in its neighborhood
- Neighborhood size (k) has a significant effect on prediction quality
- [Figure: neighborhood of a target company in the industry × employees × IBM-spend space, and the histogram of neighbors' IBM sales with the wallet estimate marked]

Global Estimation: the Quantile Loss Function
- Our REALISTIC wallet definition calls for estimating the pth quantile of P(s|x,z)
- Can we devise a loss function which correctly estimates the quantile on average?
- Answer: yes, the quantile loss function for quantile p:

  L_p(y, ŷ) = p·(y − ŷ)        if y ≥ ŷ
            = (1 − p)·(ŷ − y)  if y < ŷ

- This loss function is optimized in expectation when we correctly predict REALISTIC:

  argmin_ŷ E[L_p(y, ŷ) | x] = pth quantile of P(y|x)

[Figure: quantile loss L_p as a function of the residual (observed − predicted), shown for p = 0.8 and p = 0.5 (absolute loss)]

Quantile Regression

- Squared loss regression estimates the conditional expected value by minimizing the sum of squares:

  min_β Σ_{i=1}^n (s_i − f(z_i, x_i, β))²

- Quantile regression instead minimizes the quantile loss:

  min_β Σ_{i=1}^n L_p(s_i, f(z_i, x_i, β))

  where L_p is the quantile regression loss function defined above
- Implementation: assume a linear function in some representation, ŷ = βᵀf(x,z); the solution can be found by linear programming
- Linear quantile regression package in R (Koenker, 2001)

Quantile Regression Tree: Local or Global?
- Motivation: identify a locally optimal definition of neighborhood; inherently nonlinear
- Adjustments of M5/CART for quantile prediction:
  - Predict the quantile rather than the mean of the leaf
  - Empirically, the splitting/pruning criteria do not require adjustment
- [Figure: example tree with splits such as Industry = Banking, Sales < 100K, IBM Rev 2003 > 10K; each leaf holds a histogram of IBM sales with the wallet estimate marked as its quantile]
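The linear-programming formulation mentioned above can be sketched as follows. This is my own minimal LP translation of the standard quantile regression program (the deck's actual implementation used the R quantreg package), with β split into positive parts and the residual split into u − v:

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression(X, y, p):
    """Linear quantile regression as an LP:
    minimize p*1'u + (1-p)*1'v  s.t.  X(b+ - b-) + u - v = y,
    with b+, b-, u, v >= 0 and beta = b+ - b-."""
    n, k = X.shape
    # variable order: [b+ (k), b- (k), u (n), v (n)]
    c = np.concatenate([np.zeros(2 * k), p * np.ones(n), (1 - p) * np.ones(n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:k] - res.x[k:2 * k]

# toy check: an intercept-only model recovers the p-th sample quantile
rng = np.random.default_rng(1)
y = rng.exponential(size=500)
X = np.ones((500, 1))
beta = quantile_regression(X, y, p=0.8)
# beta[0] is close to np.quantile(y, 0.8)
```

The u and v variables carry the positive and negative parts of each residual, so the objective is exactly the summed quantile loss.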

Aside: Log-Scale Modeling of Monetary Quantities
- Monetary quantities (like Sales and Wallet) typically have very long-tailed, roughly exponential distributions, so it is typically impossible to model them on the original scale, because, e.g.:
  - The biggest companies dominate modeling and evaluation
  - Any implicit homoskedasticity assumption in using a fixed loss function is invalid
- Log scale is often statistically appropriate, for example if percentage changes are likely to be homoskedastic
- Major issue: models are ultimately judged in dollars, not log-dollars

Empirical Evaluation: Quantile Loss Setup
- Four domains with relevant quantile modeling problems: direct mailing, housing prices, income data, IBM sales
- Performance on a test set in terms of 0.9th-quantile loss
- Approaches: linear quantile regression, Q-kNN, quantile trees, bagged quantile trees, quanting (Langford et al., 2006 — reduces quantile estimation to averaged classification using trees)
- Baselines:
  - Best constant model
  - Traditional regression models for expected values, adjusted under a Gaussian assumption (+1.28σ)

Performance on Quantile Loss
- [Figure: 0.9th-quantile loss comparison of the approaches across the four domains]
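The Gaussian-adjusted baseline above shifts a mean prediction by 1.28 residual standard deviations, since Φ⁻¹(0.9) ≈ 1.28. A small sketch (toy data and variable names are my own, not from the evaluation):

```python
import numpy as np
from scipy import stats

# Under the Gaussian assumption, the 0.9 conditional quantile is
# E[y|x] + z_{0.9} * sigma, with z_{0.9} = Phi^{-1}(0.9).
z = stats.norm.ppf(0.9)   # ≈ 1.28

# Adjust a least-squares fit on synthetic homoskedastic data:
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 1000)
y = 2.0 * x + rng.normal(scale=3.0, size=1000)

slope, intercept = np.polyfit(x, y, 1)
resid = y - (slope * x + intercept)
sigma = resid.std(ddof=2)

q90_pred = slope * x + intercept + z * sigma   # Gaussian-adjusted 0.9-quantile baseline
coverage = np.mean(y <= q90_pred)              # ≈ 0.9 when the assumption holds
```

The deck's point stands: this baseline is only as good as the Gaussian, homoskedastic assumption, which long-tailed monetary data violate.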

Conclusions
- Standard regression models are not competitive
- If there is a time-lagged variable, linear quantile regression is best
- Otherwise, bagged quantile trees (and quanting) perform best
- Q-kNN is not competitive

Residuals for Quantile Regression
- Total positive holdout residuals: 90.05% (18009/20000), matching the nominal 90% for the 0.9th quantile

Market Alignment Project (MAP): Background
- MAP objective: optimize the allocation of the sales force
  - Focus on customers with growth potential
  - Set evaluation baselines for sales personnel
- MAP components:
  - Web interface with customer information
  - Analytical component: wallet estimates
  - Workshops with sales personnel to review and correct the wallet predictions
  - Shift of resources towards customers with lower wallet share

MAP Tool Captures Expert Feedback from the Client Facing Teams
- [Figure: MAP interview process — transaction data and D&B data feed data integration; wallet models produce predicted opportunity; the web interface delivers insight to the MAP interview team and Client Facing Unit (CFU) teams, which capture expert-validated opportunity; analytics, validation and post-processing drive resource assignments and integrated, aligned coverages]
- The objective here is to use expert feedback (i.e., validated revenue opportunity) from last year's workshops to evaluate our latest opportunity models

MAP Workshops Overview
- Calculated 2005 opportunity using the naive Q-kNN approach
- 2005 MAP workshops: displayed opportunity by brand

- Experts could accept or alter the opportunity
- Selected 3 brands for evaluation: DB2, Rational, Tivoli
- Built ~100 models for each brand using different approaches
- Compared expert opportunity to model predictions
  - Error measures: absolute, squared
  - Scales: original, log, root

Initial Q-kNN Model Used
- Universe: IBM customers with D&B information
- Distance metric: identical industry; Euclidean distance on size (revenue or employees)
- Neighborhood size: 20
- Prediction: median of the non-zero neighbors (alternatives: max, percentile)
- Post-processing: floor the prediction by the max of the last 3 years' revenue
- [Figure: neighborhood of the target company in the industry × employees × revenue space]

Expert Feedback (Log Scale) to Original Model (DB2)
- [Figure: scatter plot of expert feedback vs. model opportunity (MODEL_OPPTY), both on log scale]
- Experts accept the opportunity: 45%
- Experts change the opportunity: 40% (increase: 17%, decrease: 23%)
- Experts reduce the opportunity to 0: 15%
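The initial Q-kNN model described above (same-industry neighbors, median of non-zero spend, floored by recent revenue) might be sketched as follows; this is a minimal illustration with my own function and variable names, not the production implementation:

```python
import numpy as np

def qknn_opportunity(target_industry, target_size, industries, sizes,
                     ibm_sales, last3_revenue, k=20):
    """Q-kNN sketch: median IBM sales of the k nearest same-industry
    companies (nearest by the size variable), floored by the max of
    the customer's last 3 years of revenue."""
    same = np.flatnonzero(industries == target_industry)   # identical-industry filter
    nearest = same[np.argsort(np.abs(sizes[same] - target_size))[:k]]
    spend = ibm_sales[nearest]
    nonzero = spend[spend > 0]                             # median of non-zero neighbors
    pred = float(np.median(nonzero)) if nonzero.size else 0.0
    return max(pred, float(last3_revenue.max()))           # flooring post-processing

# toy universe: two industries; banking neighbors mostly spend 50
industries = np.array(["bank"] * 30 + ["retail"] * 30)
sizes = np.concatenate([np.arange(30.0), np.arange(30.0)])
sales = np.concatenate([np.full(30, 50.0), np.full(30, 10.0)])
sales[0] = 0.0                                             # a zero spender, ignored

pred = qknn_opportunity("bank", 5.0, industries, sizes, sales,
                        last3_revenue=np.array([20.0, 30.0, 40.0]))
# non-zero neighbor median is 50, floor is 40, so the prediction is 50
```

The flooring step encodes the business constraint that a customer's wallet cannot be below what they already recently spent with IBM.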

Observations
- Many accounts are set to zero for external reasons
  - Exclude these from evaluation, since no model can predict this
- Exponential distribution of opportunities
  - Evaluation on the original (non-log) scale suffers from huge outliers
- Experts seem to make percentage adjustments
  - Consider log-scale evaluation in addition to the original scale, with root scale as an intermediate
- Suspect a strong anchoring bias: 45% of opportunities were not touched

Evaluation Measures
- Different scales, to avoid outlier artifacts:
  - Original: e = model − expert
  - Root: e = √model − √expert
  - Log: e = log(model) − log(expert)
- Statistics on the distribution of the errors: mean of e², mean of |e|
- Total of 6 criteria

Model Comparison Results
We count how often a model scores within the top 10 and top 20 for each of the 6 measures (per brand: top 10 / top 20):

  Model                    Rational   DB2     Tivoli
  Displayed Model (kNN)    6 / 6      4 / 5   6 / 6    (anchoring)
  Max 03-05 Revenue        1 / 1      0 / 3   1 / 4
  Linear Quantile 0.8      5 / 6      2 / 4   3 / 5    (best)
  Regression Tree          1 / 3      2 / 4   1 / 2
  Q-kNN 50 + flooring      2 / 3      6 / 6   4 / 6
  Decomposition Center     0 / 0      3 / 5   0 / 4
  Quantile Tree 0.8        0 / 1      2 / 4   1 / 4

MAP Experiments Conclusions
- Q-kNN performs very well after flooring, but is typically inferior prior to flooring
- 80th-percentile linear quantile regression performs consistently well (flooring has a minor effect)
- Experts are strongly influenced by the displayed opportunity (and the displayed revenue of previous years)
- Models without last year's revenue don't perform well
- Decision: use linear quantile regression with q = 0.8 in MAP 06
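The six evaluation criteria above might be computed as in this sketch (the function name and toy numbers are my own):

```python
import numpy as np

def map_error_stats(model, expert):
    """The 6 criteria: mean e^2 and mean |e| of the model-vs-expert error
    on the original, root and log scales. Zeroed accounts are excluded,
    as the Observations slide suggests (log also needs positive values)."""
    keep = (expert > 0) & (model > 0)
    m, x = model[keep], expert[keep]
    out = {}
    for scale, f in [("original", lambda v: v), ("root", np.sqrt), ("log", np.log)]:
        e = f(m) - f(x)
        out[f"mean_sq_{scale}"] = float(np.mean(e ** 2))
        out[f"mean_abs_{scale}"] = float(np.mean(np.abs(e)))
    return out

# toy example: one exact prediction, one 4x over-prediction, one zeroed account
stats = map_error_stats(np.array([100.0, 400.0, 50.0]),
                        np.array([100.0, 100.0, 0.0]))
# mean_abs_original = (0 + 300) / 2 = 150; mean_abs_log = ln(4) / 2
```

The log-scale error of the 4x over-prediction is the same whether the expert said 100 or 100,000, which is exactly why the log scale tames the outlier problem noted above.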

MAP Business Impact
- MAP launched in 2005
- In 2006, 420 workshops were held worldwide, with teams responsible for most of IBM's revenue
- The most important use is segmentation of the customer base
  - Shift resources into "invest" segments with low wallet share
- Extensive anecdotal evidence of the success of the process
  - E.g., higher growth in invest accounts after resource shifts
- MAP recognized as a 2006 IBM Research Accomplishment
  - Awarded based on proven business impact

Summary
- The wallet estimation problem is practically important and under-researched
- Our contributions:
  - Propose wallet definitions: SERVED and REALISTIC
  - Offer corresponding modeling approaches: quantile estimation methods; graphical latent variable model
  - Evaluation on simulated, public and internal data
  - Implementation within the MAP project
- We are interested in extending both theory and practice to domains beyond IBM

Thank you!
[email protected]