Building a database

Distinguish between database of existing customers through normal transaction and a prospect database consisting of people who don’t yet buy your product or with whom you don’t yet have a relationship.

What do you want to know?

Mailing list
 

First Name Title
Last name  Suffix
Initial Apt. No.
Address Phone
City
Province
Postal Code

Indexes:

What additional information needs to be indexed for ease of locating, reporting, queries?
 

Geocode Age Sex
Telephone Area Code Total Purchases
Most Recent Purchase Earliest purchase Credit limit

What kinds of reports?

  • Sales by postal code
  • Sales by month
  • Sales by SKU
  • Customer frequency
  • Customer recency
  • Sales by customer income level

Where do you get names
 

In House registration cards
credit cards
smart cards
frequent buyer clubs
point-of-sale computer records (Future Shop, Radio Shack)
Outside Services credit card companies
Research firms (CompuSearch, Neilson)
Rent lists from list brokers

To create database from variety of input sources, need software to convert formats, merge, purge, edit check.

Build a database to build relationship with existing customers

Method 1 RFM Analysis
Determine who are your best customers
Concept 1

Those who have bought from you most recently Recency
Those who buy from you frequently Frequency
Those who spend most with you Monetary

Concept 2

The database is sorted e.g. by Recency and divided into quintiles: i.e. five equal parts so that 20% of your database is represented in each quintile; The first quintile being the most recent buyers, the last quintile being the least frequent buyers.

Each of  Recency, Frequency and Monetary fields is sorted and coded by quintile as we will see soon.

Creating RFM Codes

Recency

  • Keydate = Most recent purchase sort by most recent 
  • divide dB into 5 equal parts(quintile)
  • Code top 20% as code 5 (most recent purchase, next quintile 4, the next 3...

Do the same for frequency

  • Now the records have 2 numbers side by side (55 highest...11 lowest)

Do the same for monetary

Now the records are coded 555...111

Important Note:

The above method and the method used in the examples which follow, are very simplistic. It is more likely that you would use variable RFM calculations that more closely simulate regression analysis. A more sophisticated model might use the following:

Recency points

24 points Current quarter
12 points Last 6 months
6 points Last 9 months
3 points Last 12 months

Frequency points: Number of purchases x 4 points

Monetary Points: 10 percent of dollar purchase with a ceiling of 9 points. (The ceiling avoids distortion by an unusually large purchase.)

The number of points allotted varies among users but the principle is the same. Points are assessed and each customer is given a total RFM value. See the following example; an analysis of accounts by Recency, Frequency, and Monetary points. 

Account   Recency No. of Frequency Dollar Monetary Total Cumulative
Number Month Points Purchases Points Purchases Points Points Total Points
16,441 9 12 2 8 32.17 3.21 23 39
16,441 12 24 1 4 46.10 4.61 32 71
16,521 1 3 3 12 87.09 8.71 23 23
16,608 7 12 1 4 21.00 2.10 18 28
16,708 4 6 1 4 33.60 3.36 13 18
16,708 8 12 2 8 71.00 7.10 27 45
16,708 11 24 1 4 206.00 9.00 37 82
16,921               68

In reviewing the list of accounts, note that account number 16,708 Spent $206 in November but was given only 9 monetary points. This reflects the arbitrary decision of the marketer to give no more than 9 monetary points regardless of amount of purchase. Finally, note that account number 16,921 shows no activity for calendar year 1997 but has 68 points for 1996.

The opportunities for manipulating a database under R-F-M system are numerous. Each firm has to consider which variables have the most importance to their profitability. RFM systems become more accurate and valuable as they are fine-tuned over time.

Use RFM codes to predict response

1. Assume customer database of 1 million names, prepare a test mailing

2. Select a test group e.g. 30,000 assume that all records have been RFM coded

Determine Nth

To make sure that the 30,000 names selected are exactly representative of the total 1 million database, use a procedure known as Nth.

1 million / 30,000 = 33. To create an exact Nth, take every 33rd record from your database

3. Do a test mailing offering a product or service

Assume a response to your offer of 474 people . Response rate is 1.58%

Table 5-1 Results of mailing to 30,000
Cell

RFM

Percent 

Number

No. of

Response

Position

Cell

of File

Mailed 

Responses

Rate

A

B

C

D

E

F

1

555

0.79%

238

19

7.98%

2

554

0.81%

244

12

4.92%

3

553

0.83%

250

12

4.80%

4

552

0.77%

231

8

3.46%

5

551

0.81%

244

5

2.05%

6

545

0.80%

240

9

3.75%

7

544

0.74%

221

12

5.43%

8

543

0.81%

243

7

2.88%

9

542

0.89%

267

10

3.75%

10

541

0.96%

287

7

2.44%

...

...

...

...

...

...

39

432

0.84%

253

5

1.98%

40

431

0.77%

230

2

0.87%

41

425

0.63%

189

4

2.12%

42

424

0.64%

192

3

1.56%

43

423

0.84%

253

3

1.19%

44

422

0.81%

243

2

0.82%

45

421

0.75%

224

4

1.79%

46

415

0.84%

253

3

1.19%

...

. - -

· - -

..

-

· - -

87

234

0.89%

267

2

0.75%

88

233

0.96%

287

0

0.00%

89

232

0.66%

199

3

1.51%

90

231

0.78%

234

3

1.28%

91

225

0.84%

253

2

0.79%

...

...

...

...

...

121

115

0.84%

253

0

0.00%

122

114

0.81%

243

1

0.41%

123

113

0.78%

234

0

0.00%

124

112

0.84%

253

1

0.40%

125

111

0.92%

277

0

0.00%

Total

125

100.00%

30,000

474

1.58%

The Offer and the Profit from the Test

To take our example further, let's look at the financial side of this mailing

We will assume that

  • the mailing offered a product that sold for $100 of which
  • $65 was the cost of the product and the fulfillment.
  • The net profit on each unit sold was $35.
  • The mailing cost $0.55 per piece.
  • The overall results, therefore, looked something like Table 5-2.

As such, the test mailing was not much of a success. A $90 profit on an investment of $16,500 is hardly worth the effort. However, the test mailing has given us vital information about the offer and our customer's reaction which will enable us to turn this miserable test into a magnificent profit.

Table 5-2: Results of Test Mailing

  Qty.  Rate  Amount
Revenue      
Sales  474 $35.00 $16,590
Costs      
Mailing  30,000 $0.55 $16,500
Profit      $90
Since we now know the response of each RFM cell to the offer, we will design our rollout mailing
to drop the losing RFM cells and include all the profitable RFM cells so as to maximize our profit.
Let's see how we go about doing that.

Rollout Predictions Must Be Discounted

In planning our rollout mailing to selected prospects from the one million customer universe, we

will make the assumption that

  • the test response is representative of the rollout (major mailing) response for each RFM cell. Is this a reasonable assumption? Absolutely yes.
  • If the test names were selected based on an across-the-board Nth from the total universe, the behavior of the customers in each RFM cell in the rollout should be almost identical to the behavior of the customers in each RFM cell in the test, with one difference.
  • The overall response to the test was 1.58 percent. Will the rollout do as well?
  • Answer: probably not. While there are exceptions, of course, in general rollouts never do as well as the test. How much worse? To be safe, we will assume that the rollout response will be only 85 recent as good as the test (a 15 percent reduction in response rate).

To see how the rollout will look, let's use the test results, discounted by 15 percent, to predict our rollout response.

This is shown as Table 5-3.  

Cell

RFM

Percent

Rollout

Cum.mailed

Response

Rollout Resp.

Rollout

Position

Cell

of file

Universe

Rollout

Rate

Rate

Response

A

B

C

D

E

F

G

H

1

555

0.79%

7,933

7,933

7.98%

6.78%

538

2

554

0.81%

8133

16,066

4.92%

4.18%

340

3

553

0.83%

8333

24,399

4.80%

4.08%

340

4

552

0.77%

7,700

32,099

3.46%

2.94%

226

5

551

0.81%

8133

40,232

2.05%

1.74%

142

6

545

0.80%

8,000

48,232

3.75%

3.19%

255

7

544

0.74%

7367

55,599

5.43%

4.62%

340

8

543

0.81%

8100

63,699

2.88%

2.45%

198

9

542

0.89%

8,900

72,599

3.75%

3.19%

284

10

541

0.96%

9,567

82,166

2.44%

2.07%

198

39

432

0.84%

8,433

313,097

1.98%

1.68%

142

40

431

0.77%

7,667

320,764

0.87%

0.74%

57

41

425

0.63%

6,300

327,064

2.12%

1.80%

113

42

424

0.64%

6,400

333,464

1.56%

1.33%

85

43

423

0.84%

8433

341,897

1.19%

1.01%

85

44

422

0.81%

8,100

349,997

0.82%

0.70%

57

...

...

...

...

...

...

...

...

120

121

0.64%

6,400

957,992

1.04%

0.88%

56

121

115

0.84%

8,433

966,425

0.00%

0.00%

0

122

114

0.81%

8100

974,525

0.41%

0.35%

28

123

113

0.78%

7,800

982,325

0.00%

0.00%

0

124

112

0.84%

8,433

990,758

0.40%

0.34%

29

125

111

0.92%

9,233

999,991

0.00%

0.00%

0

Total

125

100.00%

999,991

999,991

1.58%

1.34%

13,432

If the test response rate is discounted by 15 percent, there will be 13,432 responses to the

rollout if we mail all the RFM cells. Let's explain some of the new columns shown below.

  • The Rollout Universe (Column D) is simply the number of customers in each RFM cell. We have not excluded the 30,000 mailed in the test mailing.
  • The Cumulative Mailed Rollout (Column E) is a column included by many marketers. It is based on the assumption that we will plan our final mailing from the top down, mailing the most responsive cells and working our way down through the RFM cells to less and less responsive ones until we decide to stop.

This is not the best way to use RFM, however, as we will soon see.

  • The Response Rate (Column F) is the rate for each cell derived from the test mailing.
  • The Rollout Response Rate (Column G) is the rate in Column discounted by 15 percent. This is the rate we can conservatively count for our rollout mailing.
  • The Rollout Response (Column H) then is the number of people from each cell who will buy our product if we mail to them.

How Many Should We Mail?

Knowing our response rate, the question becomes,
"How many of of universe of one million should we mail to?"
The answer has to be, "Mail in such a way that you maximize your profits." If
we were to mail to our entire one million names, the picture would look like Table 5-4:

Table 5-4: Results of Rollout Mailing

Assuming All Names Mailed
 

  Quantity  Rate  Amount
Revenue      
Sales  13,432  $35.00  $470,120
Costs      
Mailing  1,000,000  $0.55  $550,000
Loss      ($79,880)
Mailing to all one million names would be a disaster. What we want to do is to pick and choose
among the RFM cells to select those that are profitable and drop those that are not. Let's see
how we determine our maximum profit mailing plan.

Mailing Only to Profitable Cells

  • The secret to successful RFM marketing is to mail only to those cells that you know will be profitable. How can you know that?
  • By determining the break-even response rate for each cell, and mailing to all cells that do better than the break-even rate.

Figure 5-4 shows the number of profitable RFM cells versus the number of unprofitable cells-the ones that produce a response rate lower than the break-even rate.

Break-Even Response Rate

The break-even point occurs when the profit from a cell is exactly zero. Mailing to any cell that produces a positive profit adds to the total profit. To determine which cells are profitable, we need to determine the response rate that just breaks even. This occurs when the cost of mailing to the cell is equal to the net revenue from sales to members of the cell.

Break-even response rate occurs when:

Mailing costs to cell = net revenue from cell

Where NM = number mailed

MC = cost per piece

R = break-even response rate

NR = revenue per sale

NMxMC = (NMxR)xNR

Solving this formula for R (the break-even response rate) leads to:

  R=MC/NR

In our example:

R = $.55 ./$35 R = 1.57% This is the break-even response rate.

This calculation tells us that if the response rate to any cell is greater than 1.57 percent, we will add to our profit by mailing to that cell. We can use a spreadsheet, as shown in table 5-5, to examine each RFM cell and select those for which the response rate exceeds the break-even point.
 

Table 5-5: Profitable and Unprofitable Cells

Cell

RFM

Rollout

Resp.

Cell

RFM

Rollout 

Resp.

Position

Cell

Universe

Rate

Position

Cell

Universe

Rate

A

B

C

D

A

B

C

D

1

555

7933

6.78%

64

332

7,800

1.45%

2

554

8133

4.18%

65

331

8,433

0.67%

3

553

8,333

4.08%

66

325

7,667

1.48%

4

552

7,700

2.94%

67

324

6,300

2.25%

5

551

8,133

1.74%

68

323

6,400

1.33%

6

545

8,000

3.19%

69

322

8,433

2.01%

7

544

7,367

4.62%

70

321

8,100

1.40%

8

543

8,100

2.45%

71

315

7,467

1.90%

9

542

8,900

3.19%

72

314

8,433

1.01%

10

541

9,567

2.07%

73

313

9,233

1.22%

11

535

6,633

2.13%

74

312

7,033

0.81%

12

534

7,800

2.18%

75

311

7,700

1.11%

13

533

8433

2.69%

76

255

9,600

1.18%

14

532

7,667

1.84%

77

254

8,467

1.33%

15

531

6,300

3.60%

78

253

8,200

1.39%

...

...

...

...

...

56

345

7,700

1.11%

119

122

6,633

0.00%

57

344

8,133

1.74%

120

121

6,400

0.88%

58

343

8,000

0.71%

121

115

8,433

0.00%

59

342

7,367

1.16%

122

114

8,100

0.35%

60

341

8,100

0.35%

123

113

7,800

0.00%

61

335

8,900

1.28%

124

112

8,433

0.34%

63

333

6,633

1.28%

125

111

9,233

0.00%

Total

125

1,000,000

1.34%

As you can see from Table 5-5,

  • some cells have a rollout response rate of more than 1.57 percent and many have a lower predicted response rate. Our job is to select the winning cells, and mail to them, leaving out the remaining cells.
  • How can we pick out these winning cells rapidly?
  • If we have constructed a spreadsheet similar to the one shown here using Lotus 1-2-3 or Microsoft Excel, we merely have to add a column that identifies the profitable cells. The Lotus language for such a selection is:
  • Assuming that column D contains the response rate,
  • column C contains the quantity of that cell available to be mailed.

The result will be placed in new column E. This yields a table similar to Table 5-6.
 

Mailing only to profitable cells
Rollout
Cell RFM Rollout Response Profitable Rollout
Position Cell Universe Rate Cells Response
A B C D E F

555

7,933

6.78%

7,933

538

2

554

8,133

4.18%

8,133

340

3

553

8,333

4.08%

5,333

340

4

552

7,700

2.94%

7,700

226

5

551

8,133

1.74%

8,133

142

6

545

8,000

3.19%

8,000

255

7

544

7,367

4.62%

7,367

340

8

543

8,100

2.45%

8,100

198

9

542

8,900

3.19%

8.9

284

10

541

9,567

2.07%

9,567

198

11

535

6,633

2.13%

6,633

141

12

534

7,800

2.18%

7,800

170

13

533

8,433

2.69%

8,433

227

14

532

7,667

1.84%

7,667

141

15

531

6,300

3.60%

6,300

227

16

525

6,400

5.31%

6,400

340

17

524

8,433

3.36%

8,433

283

18

523

8,100

2.10%

8,100

170

19

522

7,467

1.90%

7,467

142

20

521

8,433

2.01%

8,433

170

21

515

9,233

1.84%

9.233

170

22

514

7,033

2.01%

7,033

141

23

513

7,700

1.47%

0

0

24

512

9,600

2.36%

9,600

227

25

511

8,467

1.67%

8,467

141

26

455

8,200

1.39%

0

0

27

454

7,933

0.00%

0

0

28

453

8,133

1.05%

0

0

29

452

8,333

1.70%

8,333

142

30

451

7,700

1.47%

0

0

31

445

8,133

1.74%

8,133

142

125

111

9233

0

0

0

Total

125

1,000,000

1.34%

290,763

7,394

In short, we will mail to only 290,763 people (29 percent of the file) make only 7,394 sales. Our profit, however, will far exceed that if mailed to the entire file as shown by Table 5-7.

Table 5-7: Comparison of rest, Full File, and RFM Selected Mailings
 

      Selected
  Test Full File By RFM
Revenue      
Response Rate  1.58%  1.34%  2.54%
Responses  474  13,432  7,394
Net Revenue  $16,590 $470,120  $258,790
Costs      
Mailing  30,000  1,000,000 290,763
Total Costs  $16,500 $550,000 $159,920
Profits  $90 ($79,880) $98,870

We have turned a $79,880 loss into a $98,870 profit by use of RFM break-even cell selection. Is this a fluke? A classroom example cooked up for this book, but not possible in real life? Not at all. It is an example that every reader of this can emulate, provided that:

  • You have a database that contains customer data, including recency frequency, and monetary amounts.
  • You create the necessary RFM codes from the data.
  • You do a test mailing to an Nth of the file. (It is surprising how difficult it is to get marketers to do test mailings. They always want to rush out too early with what they assume is a knockout mailing.)
  • You create a spreadsheet with the test mailing results, discounting rollout percentage by an appropriate amount.

Why Quintiles?

Dividing your database into five equal parts (quintiles) for the purpose of analysis seems rather arbitrary. Why not divide it into quartiles (four parts), or deciles (10 parts)? Wouldn't deciles, for example, be more accurate?

Actually, the answer is no. With deciles, accuracy tends to go down. Using RFM with deciles gives you a total of 1,000 RFM cells (10 x 10 x 10) ,read of the 125 cells you get with quintiles. With a test mailing to 30,000 Id an average response rate of 2 percent, you will get an average of only .6 respondents per cell with deciles. This is such a small number that the law of chance becomes much more important than the law of consumer behavior (the concept that underlies RFM analysis). Each person respond to your promotion makes his cell seem like a winner (since 1.0 is greater than the average of 0.6), when in reality, the cell may be a real loser.

The rollout.

To get a prediction with deciles that is as accurate as a 30,000 mailing using quintiles, you would have to send each test mailing to 240,000 people. This is such a large test mailing that it makes extensive testing uneconomical and impractical in most cases.

On the other hand, using a smaller division than quintiles, such as quartiles reduces the accuracy

in another way. With quartiles, you have only cells (4 x 4 x 4). The fewer cells you have, the

more you mix different ,consumer behaviors together and thereby lose the pinpointed predictive accuracy that you get with a larger number. · or these reasons, I suggest that you stick with quintiles and learn how to use them in your marketing.

LOOKING AT CUSTOMER5 BY RFM CELL

Once you have coded your file by RFM, you have an entirely different .way of looking at your customers. You can pick certain cells and give them special treatment. You can design promotions to get certain RFM w move up. By Keeping track of the previous RFM cell of each customer after an update, you can determine which way they moved.

·

How do we know with confidence that this prediction will work?

Confidence Level

Most marketers work with either a 95% or 99% confidence level

At the 95% confidence level, you accept the fact there is a 5% chance your rollout results will be better or worse than you forested. In Canadian dbM 95% is acceptable

Acceptable limits of error or levels of variance

A test mailing’s limit of error or level of variance is the range of possible deviation from accuracy

If the limits of error on a projected 2.0 percent response rate are +/- 2% then the actual results may vary from 1.8% to 2.2% (2% +/-0.2)

How to use probability tables

Using e.g of 1.58% response from the previous example

1. Go to the chart of 95% confidence
2. Select 1.6 R (response)
3. Go across to find 30,000 (test mailing size
4. Read up to find limit of error of .14%

This shows there is a 95% probability that a future mailing under identical conditions will produce a response between 1.51% and 1.8% (1.65-.14 1.64+.14)

Probability Rule of Thumb

There is a very useful rule of thumb for determining test results, and to determine whether test results are accurate or not. A properly conducted test which produces 200 orders will be accurate to within +/- 30 orders. If you have fewer than 200 orders, treat your results with caution. You will therefore have reasonably accurate front-end results with: