Introduction, Activities, Data Mining Lab.


The objective of our research project, 'Practical science approaches for data mining business applications', is to construct a research system that comprehensively and seamlessly links several research processes: development of basic technology and applications for data mining, development of consumer behavior models, and validation of the entire structure through practical testing. The end goals are to apply the system to data mining tasks in the business field and to create a world-class hub for data mining research.

Our research can be broadly classified into the following three areas.

1) Models of customer management that use large-scale sales history data

By developing techniques for accumulation and analysis of large-scale consumer sales data sets spanning 1-2 million individuals, the Data Mining Laboratory conducts theoretical and practical research into customer management systems capable of effectively managing vast amounts of customer data.

2) Integration of customer path data and purchase models

The Data Mining Laboratory not only develops techniques and methodologies for analyzing streaming customer movement data collected by RFID, but also conducts theoretical research on integrating these techniques and methodologies with customer purchase models. Once created, the models are verified for utility through in-store tests.

3) Application of multidimensional time-series data modeling to advertising effect models

Using time-series modeling methods drawn from mathematical statistics and machine learning, we analyze data sets on advertising viewing that contain over 10,000 time-series attributes, and construct models that measure the effect of advertising.

In addition to the above, the Data Mining Laboratory works in the following three technical areas.

1) Development of stream data mining technology specialized for treatment of in-store customer paths

Although there are many forms of sensor network data (stream data) pertaining to diverse customer purchasing behavior, we mainly focus on in-store movement of customers when developing data mining technology.

2) Development of a data mining-oriented platform

In order to integrate the abovementioned stream data with customer sales history data and derive theoretical implications for marketing, we use MUSASHI, previously developed at the Data Mining Laboratory, as a base on which to build a data mining-oriented platform that seamlessly integrates data and simplifies the process of knowledge discovery.

3) Construction of new consumer behavior models

The Data Mining Laboratory constructs original consumer behavior models, contrasting them with existing marketing models throughout their development. The models are then validated scientifically on the basis of data obtained in in-store tests. For example, we perform theoretical investigations of exposure models in advertising theory and of their compatibility with stream data. To determine specific items for theoretical investigation in marketing, we design in-store experiments in collaboration with the Marketing Research Group at Columbia University.

What path does a customer take around a store when purchasing a given product? What can we learn about purchasing from previously unseen customer movements?

An example of customer path data

Attempts to understand consumer purchase behavior in the retail industry have so far relied on POS and similar purchase data, as well as POS data tagged with a user ID stored on a membership card or similar piece of identification. With the recent rapid advances in communications technology, however, attempts are now being made to attach radio-frequency identification (RFID) tags to shopping carts, which will allow customer paths through stores to be recorded. This data will clarify customer movement within stores, a factor that was difficult to determine using ID-tagged POS data alone.
At the Data Mining Laboratory, we are constructing a framework that will incorporate not only all previous ID-tagged POS data, but also data on customer paths in stores. Combining these two sets of data will make it possible to determine what kind of customer bought what kind of product, at what time and at what price, and by taking what path through the store. However, because no data mining has been carried out to date using customer path data, a host of issues remain unaddressed, including the construction of analysis techniques and the standardization of rules for use as indices.
By applying the various analysis methods and data mining technologies developed to date, and by creating novel techniques, the Data Mining Laboratory aims to gain insights into customer purchase behavior, including the previously analysis-resistant area of in-store movement. A further aim of the laboratory is to make new discoveries about in-store purchase behavior while establishing methods to analyze customer path data.
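As an illustration of what combining the two data sets could look like, the following is a minimal sketch and not the laboratory's actual framework. It assumes, hypothetically, that each checkout receipt and each RFID cart reading share a `visit_id`, and that cart positions are recorded as named store zones with a sequence number. All field names and sample values are invented for the example.

```python
from collections import defaultdict

# Hypothetical ID-tagged POS records (field names are assumptions).
pos_records = [
    {"customer_id": "C001", "visit_id": 1, "product": "milk", "price": 198, "time": "10:42"},
    {"customer_id": "C001", "visit_id": 1, "product": "bread", "price": 150, "time": "10:42"},
    {"customer_id": "C002", "visit_id": 7, "product": "coffee", "price": 498, "time": "11:05"},
]

# Hypothetical RFID cart readings, possibly arriving out of order.
path_records = [
    {"visit_id": 1, "seq": 2, "zone": "dairy"},
    {"visit_id": 1, "seq": 1, "zone": "entrance"},
    {"visit_id": 1, "seq": 3, "zone": "bakery"},
    {"visit_id": 7, "seq": 1, "zone": "entrance"},
    {"visit_id": 7, "seq": 2, "zone": "beverages"},
]


def build_paths(readings):
    """Group readings by visit and order each visit's zones by sequence number."""
    by_visit = defaultdict(list)
    for reading in readings:
        by_visit[reading["visit_id"]].append((reading["seq"], reading["zone"]))
    return {visit: [zone for _, zone in sorted(items)] for visit, items in by_visit.items()}


def join_purchases_with_paths(pos, readings):
    """Attach the in-store path of the matching visit to every purchase record."""
    paths = build_paths(readings)
    return [dict(purchase, path=paths.get(purchase["visit_id"], [])) for purchase in pos]


combined = join_purchases_with_paths(pos_records, path_records)
for row in combined:
    print(row["customer_id"], row["product"], row["price"], row["time"], row["path"])
```

Each combined row then answers, for one purchase, who bought what, when, at what price, and along which path. How a cart is actually matched to a receipt in a real store is left open here.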

By applying data mining techniques to link purchase behavior to 20,000-plus consumer attributes, we attempt to identify ways of placing advertisements that will maximize their effect.

As a result of the financial crisis, many companies are looking for ways to reduce their advertising expenditure and cut costs, and advertising effect is drawing ever more attention across advertising media. Because television advertising accounts for the major portion of advertising costs, cutting costs and maximizing advertising effect in this area is the primary objective of many companies. The Data Mining Laboratory applies data mining techniques to analyze the relationship between TV advertisements and consumer purchasing behavior, in an effort to find advertising budget allocation methods that maximize effect while minimizing expenses.

However, as data pertaining to TV advertisements and consumer purchase behavior is vast, containing some 20,000-plus attributes, conventional analysis techniques cannot easily be applied.

The laboratory is now amassing data on the time and variety of programs that consumers watch, which advertisements are watched at that time, and whether the purchase level for a given product changes after the advertisement is watched. However, tens of thousands of television programs are broadcast every month, with ten or more advertisements broadcast within each program. For every advertisement, we can consider the attribute of whether or not a consumer has seen it, as well as the various attribute branches for each condition leading up to the final purchase of a product. Because the attributes associated with the data for a given customer therefore run into several tens of thousands, analysis becomes a real challenge. By processing large volumes of data, such as customer attribute data, with the large-scale data mining tool MUSASHI developed at the Data Mining Laboratory, and by developing novel methods of analysis, we aim to establish methods for optimum advertising budget allocation.
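The "seen or not seen" attributes described above can be sketched as follows. This is only an illustration under stated assumptions: it does not use MUSASHI, and the log layout, identifiers, and sample values are hypothetical. Only the attributes that are switched on are stored for each customer, since most of the tens of thousands of possible (program, advertisement) attributes are off for any one person. The comparison at the end is a naive exposed-versus-unexposed purchase rate, not an estimate of causal advertising effect.

```python
# Hypothetical viewing log: (customer_id, program_id, ad_id).
viewing_log = [
    ("C001", "P10", "AD_A"),
    ("C001", "P11", "AD_B"),
    ("C002", "P10", "AD_A"),
    ("C003", "P12", "AD_C"),
]

# Hypothetical customers who bought the advertised product afterwards.
purchasers = {"C001", "C003"}
all_customers = ["C001", "C002", "C003", "C004"]


def exposure_attributes(log):
    """Map each customer to the set of (program, ad) attributes that are 'on'."""
    seen = {}
    for customer, program, ad in log:
        seen.setdefault(customer, set()).add((program, ad))
    return seen


def purchase_rate(group, buyers):
    """Share of the group that purchased, or None for an empty group."""
    if not group:
        return None
    return sum(1 for customer in group if customer in buyers) / len(group)


def purchase_rate_by_exposure(ad_id, seen, customers, buyers):
    """Compare purchase rates of customers who saw ad_id with those who did not."""
    exposed = [c for c in customers if any(ad == ad_id for _, ad in seen.get(c, ()))]
    unexposed = [c for c in customers if c not in exposed]
    return purchase_rate(exposed, buyers), purchase_rate(unexposed, buyers)


seen = exposure_attributes(viewing_log)
exposed_rate, unexposed_rate = purchase_rate_by_exposure(
    "AD_A", seen, all_customers, purchasers
)
print(exposed_rate, unexposed_rate)
```

Storing only the active attributes keeps memory proportional to what each customer actually watched rather than to the full attribute space, which is one common way to keep data of this shape tractable.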

PAKDD 2008

Data Mining for Design and Marketing (DMDM 2006)

Data Mining for Design and Marketing (DMDM 2008)

Invited Session in KES 2009

Special Session in SMC2007

Special Session in SMC2008

Special Session in SMC2009
