AT&T is a large telecommunications company with very good data about phone calls globally.
They would like to build new business services around these data assets, similar to a data broker. One especially rich data set is their Call Detail Records (CDRs), and they would like to build a data broker service around this internal data (google: call detail records).
There are privacy restrictions on the data they can sell, but they can derive new data fields from their rich proprietary data. If averaged over a large enough geographic region (zip9, zip5, zip3?), these derived fields can be sold. They can also sell new fields at the customer/address/phone number level that are derived from CDR fields, provided the new fields are of a different nature than the CDRs themselves. This would allow them to sell a segment label/description.
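To make the regional averaging concrete, here is a minimal Python sketch. The field name `avg_call_minutes`, the sample records, and the `min_customers` suppression threshold are all assumptions invented for this example; what actually counts as a "large enough" region is not specified in this assignment.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-customer derived fields (values invented for illustration).
customers = [
    {"zip9": "100011234", "avg_call_minutes": 4.0},
    {"zip9": "100011234", "avg_call_minutes": 6.0},
    {"zip9": "100015678", "avg_call_minutes": 2.0},
    {"zip9": "100025678", "avg_call_minutes": 8.0},
]

def aggregate_by_zip(records, field, digits, min_customers):
    """Average `field` over regions keyed by the first `digits` of zip9.

    Regions with fewer than `min_customers` records are dropped.
    The threshold is an assumed stand-in for "large enough region".
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["zip9"][:digits]].append(rec[field])
    return {
        region: mean(values)
        for region, values in groups.items()
        if len(values) >= min_customers
    }

zip9_avg = aggregate_by_zip(customers, "avg_call_minutes", 9, 2)  # {'100011234': 5.0}
zip5_avg = aggregate_by_zip(customers, "avg_call_minutes", 5, 2)  # {'10001': 4.0}
zip3_avg = aggregate_by_zip(customers, "avg_call_minutes", 3, 2)  # {'100': 5.0}
```

Moving from zip9 to zip5 to zip3 trades geographic detail for larger groups, so fewer regions are suppressed at the coarser levels.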
They see a company called Claritas and others that make a lot of money selling highly descriptive consumer segments (google: Claritas PRIZM segments).
An AT&T executive wants to build such a data broker business, selling demographic-like segments built using the CDRs. Let’s start with the U.S. only. Design a process to build PRIZM-like segments using CDR records.
Here are three examples (see attached files) of pretty good homework submissions. It’s best if you begin by stating what the business problem is, in your own words, so it’s clear to the reader that you understand the problem. Then describe the proposed solution approach. If appropriate, you can also describe any assumptions you made, particularly about data that is needed. You might describe several solution paths depending on what data might be available. Describe the input and output of the framework you designed.
The goal is for the reader to read it, easily understand it, and believe it will work well to solve the problem. It’s not enough to have a great idea – you also need to communicate it well.
Questions and Answers:
- From whom does AT&T collect CDRs? From AT&T customers only? In other words, is it the case that whenever an AT&T customer calls someone or receives a call from someone, there will be a CDR collected? If not, how were CDRs collected?
- They collect the data for the CDRs from their own internal network, and they only see their own customers’ behavior. AT&T sees about 40% of all US phone traffic.
- If the CDRs only include AT&T customers’ calls, what data do we have available on these customers other than the CDRs? (Demographic data?)
- We know their PII, so we can link them at a data broker. We don’t have our own demographic data about them.
- Do we match customers in the CDRs to other data through their phone numbers?
- We can match by phone number to any other data that we have or could get. We also have their names and addresses.
- Are the characteristics of each segment we have in the end based on the CDR information only (e.g. a segment in which the average call duration is above 30 mins)? Or on any information we can find on a customer?
- We could use anything we can gather about a customer to help define the segments.
- Could you please explain “average at a geographical region (zip9, zip5, zip3…)” again with examples?
- Example: Credit bureau data is very sensitive. Companies can’t buy CB data at the individual person level unless the person gives them permission (BTW – every time you apply for a credit product you give the company permission to get your CB for as long as you have their credit product.) BUT, a company can buy averaged CB data at geographic region levels, with the smallest region being zip9 (9 digit zip code). This is about 4 square city blocks.
- After we have our segments and say a company gives us a list of people to assign the segments, what information/data will the company provide to us on the list of people?
- Generally you’d get what one sends to a data broker, enough PII (name, address, phone number…) to link to your database.
- Where can we access the slides for class 2 and the notebook you shared with us during class?
- Should be on Blackboard now. They should appear after the class is over.
- Who will buy our data (demographic-like segments built using the CDRs), and how will they make use of this data?
- Companies that want to do marketing solicitation, just like we went to a data broker when we wanted to do installment loan solicitations.
- What information in the CDRs can we not use, or should we assume we cannot obtain? (For instance, can we use the destination number?)
- Internally, we can use anything on the CDR for analysis, model building, mapping to other data sets etc. We generally can’t sell the CDR info.
- Do we have information about content in SMS they receive? And what about browsing history? And geolocation data from cell towers?
- No, not content, but I bet we can get geolocation/tower identification data. That would be useful.
- Which PII fields do we have? Age, gender, etc.?
- PII means personally identifying data fields, most commonly name, address, phone number, SSN, date of birth… But all we have are a few fields, like name, address, phone number.
- Is there a preferred time scope for the data product to be valid for sale? For instance, data in the past one year.
- I’m flexible, so no preference on time scale. I just want the resulting segments that I can sell to be useful in consumer models so people will buy them.
- How many call destinations do we have? Do we have the name of the call destination if it is owned by a business or public institution rather than a person? If the destination number is owned by an individual, do we have their occupation?
- We know very little about the destination number, unless it’s a phone number in our customer base. We can do elaborate statistical analysis to make inferences on the destination numbers which might provide insight.
- Do CDRs include locations of the 2 persons having the phone call?
- Not explicitly, but we can frequently make inferences from the first 3 digits (the area code). For landlines, the first 6 digits (area code plus exchange) tell the geography.
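The last answer, about inferring location from the leading digits of a phone number, can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the input formats are assumed, and turning the resulting prefixes into actual places would require a prefix-to-region lookup table that is not part of this assignment.

```python
import re

def normalize_us_number(raw):
    """Return the 10-digit national number, or None if the input
    does not look like a US number (assumed formats only)."""
    digits = re.sub(r"\D", "", raw)  # strip everything except digits
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    return digits if len(digits) == 10 else None

def geography_keys(raw):
    """Extract the 3-digit and 6-digit prefixes used for geographic inference."""
    number = normalize_us_number(raw)
    if number is None:
        return None
    return {
        "area_code": number[:3],           # first 3 digits
        "area_code_exchange": number[:6],  # first 6 digits (landline geography)
    }

keys = geography_keys("+1 (212) 555-0147")
# {'area_code': '212', 'area_code_exchange': '212555'}
```

The same normalized 10-digit number could also serve as the join key when matching customers to outside data by phone number.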