GUGGI for ORACLE - Tool For DBAs and Developers

 

 

 
 
 
 
 
 
 
 
 
 
 

Data Mining in eBusiness

                                                                                                                 

Shakti Goel 

Impact Solutions LLC

Waltham, MA 02453

                        

 

 

 

Table of Contents

 

1.  Abstract

2.  Introduction

3.  What is Data Mining?

 

        3.1  Myths about Data Mining

        3.2  Applications

 

4.  Relevance to eBusiness: The Future

 

        4.1  Information Portals

        4.2  Common Data Mining Techniques

        4.3  Data Mining Examples in eBusiness

 

5.  Summary

6.  References

 

1.  Abstract

Data mining is an emerging technology aimed at discovering patterns in the underlying historical data.  A new dimension has been added to data mining by extending this technique to the realm of eBusiness.  The mining of Web site and other transactional data using data mining techniques and tools is an attempt to recognize, anticipate and learn the buying habits and preferences of customers in the new economy.  Data mining will be a critical process impacting our clients’ long-term eBusiness success, where failure to quickly react, adapt, and evolve can translate into customer attrition in the click of a mouse.  This paper discusses the fundamentals of data mining, the desired characteristics of a data mining installation, the myths surrounding the technology, and its application to eBusiness.  It also discusses the techniques that offer the best opportunity for knowledge discovery, the industries that can derive the most significant competitive advantage from these technologies, and how these technologies will impact the new economy.

Key words: data warehouse, data mining, pattern recognition, e-business, Internet, information portal, clustering, cookies, ICS

 


2.  Introduction

The advent of fast computers, significant developments in database software, and the need to make business decisions in real time have paved the way for data mining technology.  A data warehouse can be seen as a technology originating from the amalgamation of strategy consulting and information technology (IT).  In this new economy, wherein business change is accelerated, it is imperative that business decisions be made in real time.  Corporations generally have large amounts of data and a multitude of legacy systems.  This data is not necessarily organized in a way that allows meaningful analysis without substantial reorganization and manipulation.

 

Besides carrying out standard data analysis, such as weekly or monthly sales by customer or product, it is sometimes necessary to develop business strategies that target a certain segment of the market.  Developing such a marketing strategy requires understanding the data patterns of the segment being targeted.  This understanding is most efficiently achieved through data mining technology.  While a data warehouse is a source of data, data mining is the technology that operates on it.  Data mining uses pattern recognition techniques such as neural networks [Ref. 1], probability theory, chaos theory, and wavelet theory [Ref. 2], to name a few.  These techniques take historical data as input and output the conditions that are most likely to generate the desired marketing results at a certain level of confidence.  Marketing strategies have traditionally been implemented through channels such as regular mail and magazines.  However, in this age of ever-changing technology, it is imperative that a strategy be cost-effective and that its impact be immediate and widespread.  This has been one of the driving forces behind the emergence and wide acceptance of Internet technology.  In other words, the desired technology solution is one that results in a significant reduction in cost, an increase in revenue, and a better understanding of the business.

 

This paper discusses the basics of data mining, the desired features of a data mining application, and the myths surrounding the technology.  It also discusses applications of the technology in various industries and in business conducted over the Internet, the relevance of data mining to eBusiness in profiling customers and developing direct, personalized marketing campaigns, and the use of information portals to Web-enable corporations for wider information dissemination.

 

3.  What is Data Mining?

Data mining is a sophisticated form of decision support that utilizes sheer computing power and software applications to identify previously unrecognized patterns in large data sets or data stores.  Moreover, data mining is the iterative process of discovering actionable and meaningful patterns, profiles and trends that can enable our clients to be more competitive, more productive and more efficient.  In some cases, the implementation of data mining applications has unearthed millions of dollars in savings and new revenue opportunities.  Byte Magazine reported that some companies have reaped returns on investment of as much as 1,000 times their initial investment on a single project.  For a number of years, artificial intelligence in the form of data mining has been in use by:

 

§         Cellular phone companies, to stop customer attrition

§         Financial services firms, for portfolio and risk management

§         Credit card companies, to detect fraud and set pricing

§         Mail catalogers, to lift their response rates

§         Retailers, for market basket analysis

 

 

 

 



 


Figure 1: Data mining technology in a business intelligence process.


 

 

Why mine data?  Simply, data has become too large and complex to be examined by humans.  In fact, some experts believe that mankind has captured more data in the past 15 years than in all previously recorded history.  The voluminous amounts of data generated from a Web site, for example, often hide patterns that reveal the conditions under which visitors are likely to make purchases or click on certain ads or banners.  Given that our clients are starting to operate at Internet speed, it is no longer appropriate to broadcast a marketing message to one and all without expecting diminishing returns.  Our clients’ customers want targeted, relevant information that speaks directly to them.  And in our target industries, especially the highly competitive banking, insurance, and telecommunications segments, most of our clients want the most profitable customers they can find.  How do you identify these customers?  How can you convert less profitable customers into more profitable ones?  The answers are buried, untapped, in the data.  In fact, Gartner Group estimates that only 2 percent of existing online data is currently being analyzed, a percentage that is rapidly falling towards zero as storage more than doubles each year.  Most online companies have not begun to exploit their customer data, which, in the new economy, is an intangible asset worth far more than most physical assets today.

 

Figure 1 shows the integration of data mining technology in a business intelligence process.  In eBusiness, a customer accesses the company’s Web site using an Internet browser.  Cookies and data are sent between the user’s machine and the Web server and are subsequently stored in the server log files.  Information about the user’s machine, such as the domain name or IP address, is also stored, along with the pages accessed and the order in which they were accessed.  If the customer places an order, the information is recorded in a back-end relational database (Online Transaction Processing, or OLTP).  Business users may analyze the data stored in the OLTP or data warehouse databases as part of their daily routine.  Since pure Online Analytical Processing (OLAP) is not enough for understanding the patterns in the data, other techniques such as data mining may be deployed.  This may involve developing a model, building a sample data set by collecting information from server logs and other corporate databases, and validating the model.  Once the model is validated, a full-fledged data mining approach can be used to discover the patterns in the underlying data.  The findings can be delivered, for example, through information portals (discussed later in the paper) or in the form of reports.  Finally, the discoveries can be integrated in the form of a solution, which may be a marketing strategy to retain customers or a fix for anomalies in the ordering system.  In summary, the entire process involves extracting actionable intelligence for product and financial analysis and developing a marketing strategy, with the view that customers are the most important assets that companies have.
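As an illustration of the step of building a sample data set from server logs, the sketch below groups the pages requested by each client host in the order they were requested.  It assumes the logs are in the NCSA Common Log Format and uses the client host (IP address) as a stand-in for a visitor.  The sample log lines and the function name `pages_by_visitor` are hypothetical; a production pipeline would also join cookie and order data.

```python
import re
from collections import defaultdict

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def pages_by_visitor(log_lines):
    """Group successfully served page paths by client host, in log order."""
    visits = defaultdict(list)
    for line in log_lines:
        match = CLF_PATTERN.match(line)
        if match is None:
            continue  # skip malformed lines
        if match.group("status").startswith("2"):  # keep 2xx responses only
            visits[match.group("host")].append(match.group("path"))
    return dict(visits)

# Hypothetical log excerpt.
sample_log = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /home.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:55:40 -0700] "GET /products.html HTTP/1.0" 200 1024',
    '10.0.0.1 - - [10/Oct/2000:13:56:02 -0700] "GET /products.html HTTP/1.0" 200 1024',
    '10.0.0.1 - - [10/Oct/2000:13:57:15 -0700] "GET /missing.html HTTP/1.0" 404 209',
]

print(pages_by_visitor(sample_log))
# {'10.0.0.1': ['/home.html', '/products.html'], '10.0.0.2': ['/products.html']}
```

Many visitors can share one IP address (for example, behind a proxy), which is one reason cookies are the more reliable visitor key.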

 

The term “data mining” is often confused with OLAP.  In OLAP, the user can carry out what-if analysis, make forecasts based on historical data, run weekly or monthly reports, and obtain the results within seconds.  OLAP analysis, while it may hint at customer behavior, generally does not provide the benchmarks needed to segment customers.  Data mining provides these benchmarks.  Data mining is data driven, whereas OLAP is user driven.  The current generation of data mining systems enables a user to launch a search process without knowing the answer to the query beforehand.  Figure 2 shows a graphical representation of a retail business with standard dimensions such as customer, product, geography, and time.  The primary factual information captured is sales dollars, but it may also include costs, discounts, and quantity sold.  A typical OLAP analysis may help in identifying the top 10 percent of the customers by sales.  It will not, however, indicate whether the best customers are better defined as the top 10 percent or the top 15 percent of the customer base.  OLAP analysis becomes very complex as the number of attributes describing a customer increases.  An efficient data mining tool should be able to take into account the entire set of customer-related attributes and develop benchmarks for those that describe the purchasing behavior of customers at a certain level of confidence.  A marketing strategy can then be devised around these benchmarks.  The general idea behind data mining is to use pattern recognition algorithms based on, to name a few, neural network theory, wavelet theory, probability theory, and likelihood theory.  Objective functions may be defined with sales as the dependent variable and the customer or product attributes as the independent variables, as described by Eq. 1.

 

$$\max F_{\text{Sales}}(\text{age},\ \text{annual income},\ \text{gender},\ \text{product type},\ \ldots) \qquad (1)$$

 



Figure 2: Graphical representation of a retail business.


 


where $F_{\text{Sales}}$ is an objective function of several customer and product attributes.  The nature of the objective function may depend on business operations.  A data mining tool will analyze the patterns of customers who contribute to maximum sales and link them to their attributes and the products that they purchase.  A data warehouse-like database design is not necessary for data mining, but having one will help reduce the analysis time.  Figure 1 shows the layout of a typical data mining scenario.
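A deliberately simple reading of Eq. 1 treats $F_{\text{Sales}}$ as average sales per customer segment, where a segment is a combination of attribute values, and searches for the segment that maximizes it.  The Python sketch below does exactly that on hypothetical customer records; the attribute names and figures are illustrative assumptions, and real data mining tools fit far richer models.

```python
from collections import defaultdict

# Hypothetical customer records: attributes plus total sales in dollars.
customers = [
    {"age_band": "25-34", "income_band": ">50K", "gender": "F", "sales": 1200.0},
    {"age_band": "25-34", "income_band": ">50K", "gender": "M", "sales": 900.0},
    {"age_band": "35-44", "income_band": "<=50K", "gender": "F", "sales": 300.0},
    {"age_band": "35-44", "income_band": ">50K", "gender": "M", "sales": 1500.0},
    {"age_band": "25-34", "income_band": "<=50K", "gender": "M", "sales": 250.0},
]

def rank_segments(records, attributes):
    """Average sales per combination of the given attributes, best first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        key = tuple(record[a] for a in attributes)  # the segment this customer falls in
        totals[key] += record["sales"]
        counts[key] += 1
    averages = {key: totals[key] / counts[key] for key in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

for segment, avg_sales in rank_segments(customers, ["age_band", "income_band"]):
    print(segment, round(avg_sales, 2))
```

Changing the `attributes` list changes the candidate segments, which is a rough analogue of re-running the analysis over a different set of customer attributes.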

 

The implementation of a data mining application requires business justification in addition to a technical feasibility study [Ref. 3].  Although implementing the application requires significant IT investment, the ultimate beneficiaries are the business users.  A data mining application should have the following eight business-related characteristics:

 

1.        The system and the results generated should be intuitive and should not require much statistical knowledge or over-dependence on the IT department.

2.         The system should be capable of carrying out what-if analysis so that follow-up questions can be answered in real time.

3.        The system should be able to support multiple users and cater to different business needs (slicing and dicing).

4.        The results generated should be accurate, consistent and must be presented at a certain level of confidence.

5.        The system should be scalable and should be able to handle incremental loads.

6.        Ideally, the system should automatically detect patterns in the data as it is loaded and make the user aware of any major trends.

7.        The system should not be limited to a particular class of databases and should be easy to integrate with other business-wide applications.

8.        The system should not require much maintenance.

 

From a technical perspective:

 

§         A system operating in client-server mode should depend minimally on the client; otherwise, the application may be severely limited when it comes to very large databases (VLDB).

§         The system should be Web based for a global reach.

§         If the system generates a series of flat files (instead of working directly against the RDBMS), then only minimal cleanup of those files should be required.

§         The system should have the capabilities of both the parallel and sequential task processing.

§         The system should not require a data warehouse-like database design, and should work against a normalized OLTP design (this may require large processing times).

§         The system should be able to handle multiple dimensions and attributes, generating a multivariate solution.

§         The system should have a self-learning feature, meaning that the end results should not be limited to a set of pre-defined pattern types and should require minimal user input.

§         It should be possible to automate the system.

§         The system should be able to filter out noise in the data, possibly using filtering techniques such as the Kalman filter [Ref. 4], and should work with low-quality data.
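To make the noise-filtering requirement concrete, here is a minimal one-dimensional Kalman filter in Python, applied to a hypothetical series of noisy daily sales figures.  It assumes a random-walk model for the underlying value, and the process and measurement variances are illustrative guesses that would need tuning on real data.  It is a sketch of the general technique, not the method used by any particular data mining product.

```python
def kalman_smooth(measurements, process_var=1.0, measurement_var=25.0):
    """Filter a noisy 1-D series with a scalar Kalman filter.

    Assumed model (illustrative): x_t = x_{t-1} + w with w ~ N(0, process_var),
    observed as z_t = x_t + v with v ~ N(0, measurement_var).
    """
    if not measurements:
        return []
    estimate = measurements[0]       # initialise with the first observation
    error_var = measurement_var      # uncertainty of that initial estimate
    filtered = [estimate]
    for z in measurements[1:]:
        # Predict: a random walk keeps the estimate, but uncertainty grows.
        error_var += process_var
        # Update: blend prediction and new measurement via the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        filtered.append(estimate)
    return filtered

daily_sales = [100, 130, 90, 115, 105, 140, 95, 110]  # hypothetical noisy data
smoothed = kalman_smooth(daily_sales)
print([round(x, 1) for x in smoothed])
```

A smaller `measurement_var` makes the filter trust each observation more and smooth less; a larger one does the opposite.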

 

Expecting a single data mining application (also referred to as “system” above) to have all the features described above would be utopian.  However, the application should strive to offer as many of these features as possible.

 

Having discussed the desired features of a data mining application, our next step is to shed some light on the common myths surrounding this technology.

 

 

3.1  Myths about Data Mining

 

Quite often data mining has been confused with OLAP, although both are required to make a sound business decision.  OLAP analysis does not discover patterns in the data but assists in making business decisions once the data patterns have been established.  Other misconceptions often encountered during client engagements are:

 

§         Data mining is a step after building a data warehouse.  Data mining can be carried out on a database that does not have a warehouse-like design, that is normalized in structure, and that is not necessarily relational in nature.  It is possible to discover patterns in data stored in flat files.

§         Data mining is an IT problem.  The end users of the system are business users and not the IT personnel.  The application should be fairly intuitive, and require minimal maintenance and support.  This can be achieved by having a strong training program.  A word of caution, however: some of the data mining tools are quite mathematically intensive and may require the user to have significant knowledge of statistics.

§         Data mining is a new technology.  Both data warehousing and data mining technologies are at least ten years old.  They have emerged recently because of the availability of faster computers, inexpensive disk space, and expertise, and the robustness of various development tools.

§         Data mining gives the client understandable and actionable output.  The answer to this myth is “it depends”.  If a business user who is well versed in the operation of his or her business also understands mathematical and statistical concepts, then the output of a data mining exercise can be considered operational.  However, unless the application is tied to an OLAP engine, the use of the tool may remain esoteric.

§         Data mining is risk-free; just implement the technology.  One should proceed with caution.  An intelligent tool is not necessarily a good tool, as it may produce erroneous results.  For example, the output may suggest that women who are less than 5 feet tall buy more tomatoes than others do.  This result may not make sense, as there is no plausible link between the two attributes (one customer related and one product related).  However, the finding reported in the Wall Street Journal that young, single fathers buy an alcoholic beverage when purchasing baby diapers is quite logical.

§         Data mining is always a great investment.  This depends on the expertise within the organization that deploys the technology.  Also, the data mining application must work with a line of business where data patterns exist.  If the database has a lot of attribute information but no factual information (a factless system), then the data mining application may not work.  Continuous dependence on outside sources for data mining expertise, and poor data quality, may result in serious data integrity issues that can severely limit the use of this technology.

§         Data mining application implementation costs a lot and does not really work.  If one has the technical know-how and can finish the project in a limited time period, then the cost of the project can be minimized.  Higher costs during implementation may be due to other, extraneous factors.  To make the technology workable and effective, its relevance to the business scenario, high data quality, and expertise within the company must be considered.  One principal reason behind the success of Wal-Mart is its successful deployment of a data warehouse and product affinity analysis, which led to an increase in revenue of around 20 percent quarter after quarter [Ref. WSJ, October 6, 1998].

 

3.2  Applications

 

Data mining technology has been deployed in industries with a significant customer focus.  These industries include general retail, health services, the credit card industry, retail banking, and crime prevention services.  Analyses for two industries, retail and health services, are discussed in this article.  Figure 2 shows a typical retail model with customer, product, geography, and time as the major dimensions.  Suppose that the immediate goal is to develop the criteria for defining “best” customers, or in other words, to profile our customers.  The available customer attributes stored in the warehouse are name, address, annual income, ethnicity, and occupation.  Transactional information may be stored as customer ID, product stock keeping unit (SKU), store ID, sales amount, cost of transaction, and quantity purchased.  There may be several transactions in the market basket of a given customer.  The customer-related attributes may now be fed as input to the data mining application along with the factual information.  If desired, the user may set the parameters of the algorithm used for data pattern recognition along with a statistical level of confidence.  After carrying out a series of complex analyses (of which the user may be unaware), the data mining tool may suggest that customers with an annual income greater than $50,000 who live within one mile of the store location, are bankers by profession, and are of Asian or European descent are the “best” customers at a confidence level of 90 percent.  If customer ethnicity is not an important criterion (per the policy of the organization), then re-running the exercise without it may adjust the thresholds on income and distance to the store.  Some data mining tools can take into account 300 or more variables.  The obvious question now is how the tool decided how customers should be evaluated.  The objective function may be pre-defined by the user, or the tool may assume that “sales” should be used as the basis.
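The profiling step above can be approximated in a few lines: label the top fraction of customers by sales as “best”, then measure what share of the customers matching a candidate profile fall into that group, with an approximate 90 percent confidence interval.  The `top_fraction` parameter is where the “top 10 percent or top 15 percent” question would be explored.  The records, the rule (income above $50,000 and within one mile of the store), and the normal-approximation interval are all assumptions; with only a handful of customers the approximation is rough, and commercial tools may compute their confidence levels quite differently.

```python
import math

# Hypothetical customer records: annual income ($), distance to store (miles),
# and annual sales ($).
customers = [
    {"income": 72000, "miles_to_store": 0.5, "sales": 4100},
    {"income": 65000, "miles_to_store": 0.8, "sales": 3900},
    {"income": 58000, "miles_to_store": 0.3, "sales": 3500},
    {"income": 91000, "miles_to_store": 0.9, "sales": 1200},
    {"income": 45000, "miles_to_store": 0.4, "sales": 800},
    {"income": 80000, "miles_to_store": 3.0, "sales": 950},
    {"income": 38000, "miles_to_store": 5.0, "sales": 400},
    {"income": 52000, "miles_to_store": 2.0, "sales": 3600},
]

def best_customer_ids(records, top_fraction):
    """Return the indices of the top `top_fraction` of customers by sales."""
    ranked = sorted(range(len(records)), key=lambda i: records[i]["sales"], reverse=True)
    cutoff = max(1, round(top_fraction * len(records)))
    return set(ranked[:cutoff])

def profile_hit_rate(records, rule, best, z=1.645):
    """Share of customers matching `rule` who are in `best`, with a
    normal-approximation interval (z = 1.645 gives roughly 90 percent, two-sided)."""
    matching = [i for i, record in enumerate(records) if rule(record)]
    if not matching:
        return 0.0, (0.0, 0.0)
    p = sum(1 for i in matching if i in best) / len(matching)
    half_width = z * math.sqrt(p * (1 - p) / len(matching))
    return p, (max(0.0, p - half_width), min(1.0, p + half_width))

# Hypothetical profile: income above $50,000 and living within one mile of the store.
def candidate_profile(record):
    return record["income"] > 50000 and record["miles_to_store"] <= 1.0

best = best_customer_ids(customers, top_fraction=0.5)
rate, (low, high) = profile_hit_rate(customers, candidate_profile, best)
print(f"{rate:.2f} of matching customers are 'best' (90% CI {low:.2f} to {high:.2f})")
```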

 

Another example from the retail industry is product affinity reports.  Some products sell more than others, and some have an affinity with others.  Consider a large retail chain like Wal-Mart, which carries thousands of items.  It is difficult for a shopper to survey the entire store to find the items on his or her shopping list.  It would be helpful if like products were placed in the same aisle.  This can be easily achieved by looking for patterns in the data on items that are bought together (and can be clustered) and items that are mutually exclusive.  A series of analyses may suggest that milk and bread, coffee and coffee cakes, and tissue paper and cold medicine are bought together [Ref. WSJ, October 6, 1998].  One may also find that a young, single parent buys beer with baby diapers.  It is also useful to know that cabbage and buffalo wings are rarely purchased together.  Product affinity reports help the business not only in arranging items in the store but also in managing inventory and retaining customers.  Products that exhibit high affinity should not be on sale at the same time (diapers and beer should not be on sale during the same week), which leads to enhanced revenues.  Products with zero affinity can be put on sale during the same week.  A sale on products with negative affinity carries a risk of product cannibalization (e.g., a sale on Coca-Cola™ may lead to a decrease in Pepsi-Cola™ sales).  Data mining conducted by Wal-Mart is a good example of how to exploit the technique to sustain high revenue growth quarter after quarter.
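Product affinity is commonly quantified with support and lift from market basket analysis; the paper does not name a measure, so using lift here is an assumption.  In the sketch below, a lift above 1 means two items are bought together more often than chance would predict (positive affinity), and a lift below 1 means less often (one possible proxy for negative affinity).  The baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_affinity(baskets):
    """Support and lift for every pair of items that co-occur in some basket.

    lift > 1: bought together more often than chance (positive affinity);
    lift < 1: bought together less often than chance (negative affinity).
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # sorted so each pair has one key
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    results = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        results[(a, b)] = (support, lift)
    return results

# Hypothetical market baskets.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "coffee"},
    {"coffee", "coffee cake"},
    {"coffee", "coffee cake", "milk"},
    {"diapers", "beer"},
    {"diapers", "beer", "milk"},
    {"bread"},
    {"cabbage"},
]
for pair, (support, lift) in sorted(pair_affinity(baskets).items()):
    print(pair, f"support={support:.2f}", f"lift={lift:.2f}")
```

Pairs that never appear together, such as cabbage with any other item here, produce no entry at all, which corresponds to the “rarely purchased together” case.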

 

Once we know who our best customers are, we can design campaigns to reward them.  One such strategy may be to give these customers gold cards to indicate that they are valued customers.  These gold cards may carry better discounts on select items.  This customer information can also be used to design customer retention campaigns in case another chain opens a store in the vicinity.  Based on the product affinity reports and customer purchasing patterns, coupons for select products can be sent out instead of a large-scale general mailing.  For example, customer data from Safeway’s Information Warehouse is segmented into hundreds of customer characteristics that can be used to tailor individual mailings, analyze product performance, or forecast shopping patterns [Ref. 5].  This leads to a significant reduction in mailing costs.  Since information on the products that are selling is also available, promotions can be devised on behalf of vendors, providing an additional source of revenue.

 

Data mining techniques have also been used in the health services industry for fraud detection.  Some major dimensions describing the health services business are recipients, providers, category of service, specialty of the provider, diagnosis, geography, and claim type (inpatient or outpatient).  A provider specializes in certain categories of services, treating a given set of diseases and prescribing certain treatments.  Patterns in the claim data, which may run into several terabytes for a state-run program like Medicaid, may reveal the characteristics of the providers that are most likely to file fraudulent claims.  Similar patterns can be analyzed for recipients.  Answers can also be found as to why some recipients do not avail themselves of state-run programs, resulting in excess program budget.  Diagnosis/treatment affinity reports can lead to strategies that reduce treatment costs.
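One simple way to surface providers whose claim patterns differ from those of their peers is to compare each provider’s average claim amount with those of other providers in the same specialty.  The sketch below uses a leave-one-out z-score with an illustrative threshold of 3.  The claim records, provider IDs, and threshold are hypothetical, and a flag is only a lead for investigation, not evidence of fraud.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical claims: (provider_id, specialty, claim_amount_in_dollars)
claims = [
    ("P1", "cardiology", 180), ("P1", "cardiology", 220),
    ("P2", "cardiology", 210), ("P3", "cardiology", 190),
    ("P4", "cardiology", 200), ("P4", "cardiology", 210),
    ("P5", "cardiology", 850), ("P5", "cardiology", 950),
    ("P6", "dermatology", 80), ("P7", "dermatology", 90),
    ("P8", "dermatology", 85), ("P9", "dermatology", 88),
]

def flag_outlier_providers(claims, threshold=3.0):
    """Flag providers whose average claim lies more than `threshold` standard
    deviations above the average of the *other* providers in their specialty."""
    per_provider = defaultdict(list)
    specialty_of = {}
    for provider, specialty, amount in claims:
        per_provider[provider].append(amount)
        specialty_of[provider] = specialty
    provider_avg = {p: mean(amounts) for p, amounts in per_provider.items()}
    flagged = []
    for provider, avg in provider_avg.items():
        # Leave-one-out peer group: same specialty, excluding this provider.
        peers = [a for q, a in provider_avg.items()
                 if q != provider and specialty_of[q] == specialty_of[provider]]
        if len(peers) < 2:
            continue  # too few peers to estimate a spread
        spread = pstdev(peers)
        if spread > 0 and (avg - mean(peers)) / spread > threshold:
            flagged.append(provider)
    return flagged

print(flag_outlier_providers(claims))  # ['P5']
```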

 

A crime prevention program may be interested in knowing the characteristics of the people who are most likely to develop criminal behavior.  Credit card companies would like to know who is most likely to default on their loans, which customers should be offered a low interest rate and what the interest rate should be.

 

Retail financial institutions can do householding, analyze branch sales, establish customer profiles, monitor credits, develop score cards, and design fraud and delinquency triggers.  Data from Bank of Montreal’s two-terabyte data warehouse is segmented into several categories such as loans, mortgage, credit cards, mutual funds, retail banking and online banking [Ref. 6].  The customers are segmented into homogeneous groups to retain customer loyalty and attract new customers.  The pilot study itself saved the bank 22.8 million US dollars.

 

Telecommunications industry customers can examine data patterns in the customer base and discover profitable segments in the population to effectively control churn arising from short-term promotions.  Oracle’s Darwin™ was able to predict churn for a European cellular company with 70-80 percent accuracy [Ref. 7].

 

Insurance industry customers can also reduce marketing expenditures and develop prediction models to be more effective in selling insurance products.  For example, Farmers Insurance Group found that covering a certain type of sports car was profitable if the owner had at least one other car [Ref. 8].  This enables Farmers to price insurance on sports cars more suitably and still make a profit.

 

Auditors, accountants and professional service organizations such as ours can capitalize on business opportunities around the world, isolate anomalies, and confirm and dispute previous conditions.

 

Having discussed the basics of data mining, its desired features (business and technical), the misconceptions about the technology, and some of its applications, our next step is to relate its use to eBusiness.

 

4.  Relevance to eBusiness: The Future

Business done over the Internet or World Wide Web is termed eBusiness.  The focus of a business may be to serve customers (B2C) or other businesses (B2B).  Internet technology represents a paradigm shift compared with the operations of traditional businesses, suggesting that eBusiness is no longer optional but a necessity for survival.  eBusiness is based on client-server technology and operates in a thin client environment.  There is a front-end Internet browser, such as Netscape Navigator™, and a back-end database supporting the OLTP architecture.  There may also be a middle-tier software layer that converts the data into a format that the traditional OLTP system can understand.

 

In the context of the retail model discussed earlier, a front-end user (typically a customer) may fill in a request for the items that he or she wants to purchase.  The entered data passes through the middle-tier software and is stored in the back-end OLTP system as a record.  The request is eventually processed and the ordered items are sent to the customer.  What is relevant in the context of this paper is how the data stored in the OLTP database and other associated systems can be mined to develop a marketing campaign in a thin client environment.

 

 

 



 


Figure 3: High level architecture of an Information Portal

 


 

 

4.1  Information Portals

 

It may be of little interest to the readers of this paper how a data mining tool can be Web-enabled, although the importance of doing so should not be underestimated.  What may be of more practical relevance is how data mining technology can be used to propel eBusiness further and to develop targeted campaigns that can be deployed over the Web site.  Not only should information be retrievable over the Web site, it should also be possible to post it on the Web site.  Therefore, data mining in eBusiness can be discussed in the context of corporate information portals.  Figure 3 shows a high-level architecture layout of an information portal.  In general, information portals allow users to access corporate information using a Web browser over the company’s intranet, which may have links to external sites.  There are three main components of an information portal: the business information directory (BID), the subscription and publishing features, and a front-end information assistant.  Indexed business information, or in other words, the metadata of an organization, is stored in the BID.  Metadata crawlers scan a group of servers for new business information.  The IT department then updates the BID, generally via flat files or a graphical user interface.  Not only is the business information contained in Web URLs, word processing documents, video images, database tables, and catalogs indexed, but the user queries used for analyses are also indexed.  If the user has the authority, these pre-defined queries can be executed over the Web.  The BID stores information available within the corporation as well as information available over the Internet.
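The business information directory is essentially an index over metadata.  A minimal sketch of that idea is an inverted index that maps each word to the entries containing it, so that a query returns the entries matching every query word.  The catalog entries, identifiers, and functions below are hypothetical and do not describe any specific portal product.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case alphanumeric words in `text`."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each word to the set of document IDs whose metadata contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())  # keep only documents matching all words
    return result

# Hypothetical metadata entries gathered by a crawler.
catalog = {
    "report-q3-sales": "Quarterly sales report by store and region",
    "query-best-customers": "Saved query: best customers by annual sales",
    "video-ceo-town-hall": "CEO town hall video, October",
}
idx = build_index(catalog)
print(search(idx, "sales report"))  # {'report-q3-sales'}
```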

 

The subscription features facilitate the dissemination of information accessed through the portal.  A user can subscribe to information so that it is delivered on a regular basis, just like a magazine subscription.  Alternatively, the user can forward the information (unsolicited information) to other company personnel, or publish it in the business information directory for indexing purposes.  This newly indexed information is then distributed to other users who have expressed interest in related information.
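The subscribe-and-publish behavior described above follows the familiar publish/subscribe pattern.  The sketch below is a hypothetical in-memory `InformationHub` class, not an actual portal API: publishing records an item in a directory list (standing in for indexing) and pushes it to every subscriber of the matching topic.

```python
from collections import defaultdict

class InformationHub:
    """Minimal topic-based publish/subscribe hub (hypothetical design)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> delivery callbacks
        self.directory = []                    # every published (topic, item)

    def subscribe(self, topic, deliver):
        """Register a callback that receives each item published on `topic`."""
        self._subscribers[topic].append(deliver)

    def publish(self, topic, item):
        """Record the item in the directory and push it to all subscribers."""
        self.directory.append((topic, item))
        for deliver in self._subscribers[topic]:
            deliver(item)

hub = InformationHub()
inbox = []
hub.subscribe("weekly-sales", inbox.append)
hub.publish("weekly-sales", "Sales by region, week 42")
hub.publish("hr-news", "New benefits enrollment")
print(inbox)  # ['Sales by region, week 42']
```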

 

An information assistant, which is what the user directly interacts with, often works in conjunction with a search engine to retrieve and publish business information.  The information assistant can be configured and designed to suit the needs of different users.  The interface, for example, may resemble a desktop file folder or a Web site search engine interface.  In any case, the information assistant should have drilling capabilities so that information can be drilled up or down (along a hierarchy, e.g., from store to region to division in the geography dimension) and across (to a related dimension, e.g., adding time fields such as month to a report displaying sales by store).
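Drilling up and across can be illustrated on a tiny fact table.  In the sketch below, the store-to-region-to-division hierarchy, the store codes, and the sales figures are hypothetical.  Drilling up re-aggregates sales at a coarser level of the geography hierarchy, and drilling across adds the month from the time dimension to the grouping key.

```python
from collections import defaultdict

# Hypothetical geography hierarchy: store -> region -> division.
STORE_REGION = {"S1": "Northeast", "S2": "Northeast", "S3": "Midwest"}
REGION_DIVISION = {"Northeast": "East", "Midwest": "Central"}

# Hypothetical fact rows: (store, month, sales in dollars).
facts = [
    ("S1", "Jan", 100.0), ("S1", "Feb", 120.0),
    ("S2", "Jan", 80.0), ("S3", "Jan", 50.0), ("S3", "Feb", 70.0),
]

def roll_up(rows, level, by_month=False):
    """Total sales at 'store', 'region' or 'division' level.

    by_month=True drills across to the time dimension.
    """
    totals = defaultdict(float)
    for store, month, sales in rows:
        key = store
        if level in ("region", "division"):
            key = STORE_REGION[store]       # drill up: store -> region
        if level == "division":
            key = REGION_DIVISION[key]      # drill up: region -> division
        totals[(key, month) if by_month else key] += sales
    return dict(totals)

print(roll_up(facts, "region"))
# {'Northeast': 300.0, 'Midwest': 120.0}
print(roll_up(facts, "division", by_month=True))
# {('East', 'Jan'): 180.0, ('East', 'Feb'): 120.0, ('Central', 'Jan'): 50.0, ('Central', 'Feb'): 70.0}
```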

 

Information portals should allow portal administrators to define user and user group profiles.  Users, in turn, should be able to change their profiles by using the information assistant.  The Web interface should be interactive, so that a diverse set of information, including but not limited to word processing documents, spreadsheets, video images, databases, XML and HTML pages, and e-mails, can be documented and published in the business information directory in an indexed manner.  Indexing of information should be automatic.  The publishing feature of the portal should facilitate the import and export of files.  XML is increasingly becoming an industry standard for storing and transferring information over the Web.  Figure 4 shows a sample high-level architecture for a corporation, depicting the flow of business information.  At an enterprise level, a corporation may have several OLTP systems and ERP applications.  Information stored in these systems is mined and analyzed on a regular basis, and eBusiness decisions impacting the future of the corporation are made regularly.  The data generated as a result of these decisions is in turn collected to inform future decisions.  The system acts like a feedback loop, and data mining plays an essential role in it.  Wider dissemination of information is achieved if these systems are accessible over the Web and information can also be posted back to the Web.

 

 


 


 


Figure 4: An information flow diagram at enterprise level


 

 

4.2  Common Data Mining Techniques

 

 

This section lists the data mining techniques used to answer key business questions and then discusses the main ones in some detail.  Some of the commonly used techniques are:

 

Classification 

§         Who will buy, what will they buy and how much will they buy?

Segmentation

§         What are the different types of visitors to the Web site?

Association

§         What relationships exist between the visitors and products?

§         What hidden associations exist between attributes such as gender, age, product, time, and domain?

Clustering

§         What are the groupings hidden in client Web data?

§         What product cross-selling patterns exist?

§         What distinct groupings of visitor traits exist?

Visualization

§         What are the distributions and patterns in the Web data?

Optimization

§         How can we help the client maximize their online presence and sales?

 

Clustering is the natural grouping of similar users, pages, and other items, whereas association identifies items that are requested together; such items need not belong to the same cluster.  Clustering and association are both exploratory methods for discovering previously unknown relationships.  A neural network is one of the first types of analysis that can be used to search for discrete clusters in the data.  Applied to Web data, a neural network may reveal previously unknown relationships between visitor attributes, such as gender, age, and income, and the number of purchases visitors make or the total amount they spend at the site.  Once the clusters are found, a visual depiction of each cluster is constructed and then evaluated using a machine learning algorithm, which explains the data further through descriptive rules that a marketer or other business professional can easily understand.
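To make "searching for discrete clusters" concrete, the sketch below groups visitors by a few numeric attributes. Note that it swaps in k-means, a simpler and widely used clustering algorithm, in place of the neural network the text describes. The visitor features (age, income in thousands of dollars, purchases per month) and the sample values are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of feature tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: dist2(p, centroids[j])) for p in points]


def kmeans(points, k, iterations=50):
    # Deterministic farthest-point initialisation: start from the first point, then
    # repeatedly add the point farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(iterations):
        labels = assign(points, centroids)
        # Move each centroid to the mean of its members (keep it in place if it has none).
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            updated.append(mean(members) if members else centroids[j])
        if updated == centroids:
            break  # converged: re-assignment no longer moves the centroids
        centroids = updated
    return assign(points, centroids), centroids


# Hypothetical visitors: (age, income in $1000s, purchases per month).
VISITORS = [(22, 30, 8), (25, 35, 9), (24, 32, 7), (55, 120, 2), (60, 110, 1), (58, 130, 2)]
labels, centroids = kmeans(VISITORS, k=2)
print(labels)
```

On the sample data the first three visitors form one cluster and the last three another. In practice the features should be standardized first; otherwise the attribute with the largest numeric range (here, income) dominates the distance.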

 

In cookie-based analysis, cookies are exchanged between the visitor's browser and the Web site, and the pages viewed are recorded in the Web server logs.  By collecting a visitor's cookie every time a page is requested, say for a different product or service on the site, specific patterns or paths can be mined to determine which parts of the site are most popular.  The Web site owner can then begin to profile customers and use this knowledge for further analysis and marketing applications.  Cookie data, combined with server logs and demographic and household data, serves as the initial input for classification, segmentation, or any other data mining analysis.  Knowledge is also acquired about the order in which different pages and URLs are accessed.  This forms the basis of sequential analysis, which looks for the logical order in which URLs are commonly accessed.
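The cookie-based path analysis described above reduces to three steps: group log records by cookie, order each visitor's requests by time, and count page views and page-to-page transitions. The Python sketch assumes the server log has already been parsed into `(cookie_id, timestamp, url)` tuples. That record layout and the sample URLs are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical server-log records, already parsed: (cookie_id, timestamp_seconds, url).
LOG = [
    ("c1", 10, "/home"), ("c1", 60, "/cart"), ("c1", 25, "/books/xml"),
    ("c2", 5, "/home"), ("c2", 40, "/books/xml"), ("c2", 90, "/peripherals"),
    ("c3", 12, "/home"), ("c3", 30, "/peripherals"),
]


def paths_by_visitor(log):
    """Group requests by cookie and order each visitor's pages by timestamp."""
    hits = defaultdict(list)
    for cookie, ts, url in log:
        hits[cookie].append((ts, url))
    return {cookie: [url for _, url in sorted(visits)] for cookie, visits in hits.items()}


def page_popularity(log):
    """Number of requests per URL: which parts of the site are most popular."""
    return Counter(url for _, _, url in log)


def transition_counts(paths):
    """Count consecutive page pairs (A, B): the raw material for sequential analysis."""
    counts = Counter()
    for path in paths.values():
        counts.update(zip(path, path[1:]))
    return counts


paths = paths_by_visitor(LOG)
print(page_popularity(LOG).most_common(1))
print(transition_counts(paths).most_common(1))
```

On the sample log, /home is the most requested page, and /home → /books/xml is the most frequent transition (two of the three visitors). Longer sequences can be counted the same way by zipping over more offsets.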

 

A business deploying data mining technology can now use the techniques described to achieve the following:

 

§         build unique market segments identifying the attributes of high-value prospects

§         identify the key attributes of Web customers for each client product

§         select promotional strategies that best reach the client’s Web customer segments

§         analyze online sales to improve targeting of the client’s high-value customers

§         test and determine which marketing activities have the greatest impact

§         identify the client’s customers most likely to be interested in its new products

§         improve the site’s product cross-selling and up-selling

§         identify the best online prospects for the offered services

§         understand the reasons for brand switching

§         improve the Web site advertisement and sales process

§         maximize the online advertisement click-through rate

§         optimize the arrangement of products and services on the site
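Several of the goals above, notably improving cross-selling, rest on the association technique. The Python sketch below counts how often pairs of products appear in the same order and derives support and confidence from those counts. This is the first step of Apriori-style association mining. The orders, product names, and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical orders, each the set of products bought together.
ORDERS = [
    {"xml_book", "db_book", "speakers"},
    {"xml_book", "speakers"},
    {"db_book", "mining_book"},
    {"xml_book", "speakers", "mining_book"},
    {"db_book", "mining_book", "speakers"},
]


def pair_rules(orders, min_support=0.4, min_confidence=0.6):
    """Return (A, B, support, confidence) for rules A -> B over product pairs.

    support(A, B)    = fraction of orders containing both A and B
    confidence(A->B) = orders with A and B / orders with A
    """
    n = len(orders)
    item_counts = Counter(item for order in orders for item in order)
    pair_counts = Counter(pair for order in orders for pair in combinations(sorted(order), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair too rare to be worth a cross-sell rule
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


for lhs, rhs, support, confidence in pair_rules(ORDERS):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With the default thresholds, the strongest rule on the sample orders is xml_book → speakers (confidence 1.0: every order with an XML book also contained speakers). A full Apriori implementation extends this to larger item sets and prunes candidates whose subsets are infrequent.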

 

The ultimate goal here is to achieve truly effective one-to-one marketing [Ref. 9], based on observed behavior patterns rather than on demographics, log files, and other traditional methods alone.

 

 

4.3  Data Mining Examples in eBusiness

 

In this section we consider how data mining can be applied to an e-tail (retail over the Web) business, using Amazon.com as an example.  Amazon.com is a pioneer in selling merchandise over the Web.  A Web user accesses the Web site (http://www.amazon.com) to search for items that may fall under a variety of product groupings.  Say the user buys books on XML, databases, and data mining technology, and also browses the site for computer peripherals.  When the user accesses the Amazon Web site, the browser sends its cookies to the remote server, where the cookie data is recorded along with that of other visitors.  The pages accessed by the user are recorded in a server log, along with the IP address or domain name of the user's machine, the search engine used, and the time the site was accessed.  Millions of users access the Amazon Web site in a similar manner every month.  Amazon management can develop a logical model of its business described along the time, product, customer, and search technology dimensions, with number of hits, pages accessed, sales dollars, cost of sale, and quantity purchased as the factual information.  The Web site users can be divided into two categories: existing and potential customers.  The data so collected can be fed into the data mining application.  Pattern recognition algorithms such as neural networks can search for patterns of product affinity, and may suggest that IT consultants under 30 years of age who purchase books on the latest computer technology generally also buy computer peripherals, such as Dolby™ speakers.  It may also be discovered that 53 percent of customers spend more than 25 dollars per purchase, whereas 2 percent spend more than a thousand dollars.  Based on this information, Amazon marketing may devise different promotions for different user groups.  Next-day shipping may be made free for purchases greater than 1000 dollars.
A free Web site design software CD may be given to customers who buy books on the latest computer technologies.  The purchasing behavior can also be stored in a CRM application, which can generate automatic e-mails to valuable customers with targeted promotional offers.  If the data mining tool detects any abnormalities in the shipment process or order system, the information can be passed to the Supply Chain Management system (see Fig. 4).  Any data mining related information, including the URLs accessed, queries run, and searches made, can be stored in the business information directory.  A business user can later access this information via the corporate information portal.
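The spend figures and promotion rules in the Amazon example can be expressed in a few lines of Python. This is a hypothetical sketch: the customer records, category labels such as `books:xml`, and the `promotions_for` rules are invented for illustration. They simply mirror the 25-dollar and 1000-dollar thresholds and the two promotions mentioned above.

```python
# Hypothetical customer records: (customer_id, purchase_total_in_dollars, categories_bought).
CUSTOMERS = [
    ("u1", 18.50, {"fiction"}),
    ("u2", 42.00, {"books:xml", "books:data-mining"}),
    ("u3", 1250.00, {"peripherals", "books:database"}),
    ("u4", 30.00, {"peripherals"}),
]

# Assumed category labels standing in for "books on the latest computer technology".
TECH_BOOKS = {"books:xml", "books:database", "books:data-mining"}


def share_above(customers, threshold):
    """Fraction of customers whose purchase total exceeds the threshold (dollars)."""
    return sum(1 for _, total, _ in customers if total > threshold) / len(customers)


def promotions_for(total, categories):
    """Illustrative promotion rules mirroring the Amazon example in the text."""
    offers = []
    if total > 1000:
        offers.append("free next-day shipping")
    if categories & TECH_BOOKS:
        offers.append("free Web site design software CD")
    return offers


print(share_above(CUSTOMERS, 25), share_above(CUSTOMERS, 1000))
for customer_id, total, categories in CUSTOMERS:
    print(customer_id, promotions_for(total, categories))
```

On the four sample customers, three spend more than 25 dollars and one spends more than 1000 dollars. A real analysis would run the same computation over the full purchase history.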

 

Data mining techniques can be applied to other businesses conducted over the Web or the Internet.  Consider a business-to-business scenario where a consortium of industry players meets over the Web to buy or sell products.  For example, Ford Motors may place a bid for car tires for its Taurus™ line of cars, and Michelin may set a price for these tires with a floor price (below which the product will not be sold).  In such a scenario, there may be several buyers, such as General Motors and Chrysler, and several other sellers, such as Bridgestone and Goodyear.  For a transaction to complete, the bid price of the buyer must match the ask price of the seller (similar to equity transactions).  These transactions are completed over the Internet without much human intervention.  The transaction data will be stored in a back-end database server, and the business user information may be stored in the server logs.  A business hosting these auctions, such as FreeMarket, may be interested in discovering the patterns behind successful bids.  For example, it may be discovered that successful bids are characterized by large orders (greater than 5 million dollars) and long lead times for product delivery.  In addition, products that do not require special processing are sold more easily.  A successful transaction has a low spread between the bid and ask prices, and most bids are executed within three days.  This new knowledge can be used in two ways.  First, the host of the auction site can design promotions to encourage successful buyers and sellers to sign long-term contracts to use the services of FreeMarket.  Depending on who the buyer or seller point of contact is, executive visits, mailing campaigns, or customer service calls may be used to retain these customers.  Such a marketing campaign is all the more useful when there is new competition [Ref. 10] to the business, for example, if CommerceOne, Oracle, or Ariba host Web sites for automakers.
Second, besides analyzing the data patterns for its own use, FreeMarket may develop strategies for constructing successful bids and sell them to, say, Ford Motors, creating an additional source of revenue.  Some of this information may also be placed on the intranet for various internal divisions to act upon.  Data mining will help FreeMarket be proactive instead of reactive.
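The matching rule in the auction scenario is that a trade completes only when a buyer's bid meets a seller's ask, and never below the seller's floor. It can be sketched as follows. The `Bid` and `Ask` classes, the per-unit prices, and the choice to execute at the ask price are assumptions for illustration. They do not describe how FreeMarket or any real exchange works. The recorded spread is the kind of attribute a mining tool could later relate to successful bids.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    buyer: str
    price: float  # highest price per unit the buyer will pay


@dataclass
class Ask:
    seller: str
    price: float  # asking price per unit
    floor: float  # seller's floor price: the product is never sold below this


def match(bids, asks):
    """Repeatedly pair the highest bid with the lowest valid ask while the bid meets the ask.

    Returns (buyer, seller, execution_price, spread) tuples, where spread = bid - ask.
    Executing at the ask price is an assumption of this sketch.
    """
    bids = sorted(bids, key=lambda b: b.price, reverse=True)
    # An ask below the seller's own floor is treated as invalid and ignored.
    asks = sorted((a for a in asks if a.price >= a.floor), key=lambda a: a.price)
    trades = []
    while bids and asks and bids[0].price >= asks[0].price:
        bid, ask = bids.pop(0), asks.pop(0)
        trades.append((bid.buyer, ask.seller, ask.price, bid.price - ask.price))
    return trades


# Hypothetical per-unit prices for a tire lot.
BIDS = [Bid("Ford", 52.0), Bid("GM", 49.0), Bid("Chrysler", 45.0)]
ASKS = [Ask("Michelin", 50.0, 48.0), Ask("Goodyear", 48.5, 47.0), Ask("Bridgestone", 47.0, 49.0)]
print(match(BIDS, ASKS))
```

In the sample, Ford's bid of 52.0 matches Goodyear's ask of 48.5 (spread 3.5). Bridgestone's ask is dropped because it is below its own floor. GM's bid of 49.0 does not reach Michelin's ask of 50.0, so matching stops.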

 

A plethora of tools is available in the market to carry out data mining and publish the results on the Web.  Some of these tools include Intelligent Miner™ for data and text by IBM, Darwin™ by Oracle Corporation, the Knowledge™ suite by Angoss, and Enterprise Miner™ by SAS.  Data in XML format can be published over the Web or transported between applications using Discoverer 3i™ and Reports 3i™, developed by Oracle.

 

Sagent Corporation’s eBusiness analytical applications have been used to perform click-stream analysis for PeopleFirst.com, a customer service provider in the online auto-loan category.  Click-stream analysis was performed in conjunction with a Web-based CRM application to track customer behavior, application submissions, and loan approvals.  This has enabled PeopleFirst.com to create customized one-to-one marketing campaigns to target potential car buyers, identify online cross-sell and up-sell opportunities, and better serve its clients [Ref. 11].  Similarly, drugstore.com, an online provider of health and beauty products, has used Sagent’s graphical data flow technology to analyze customer purchase transaction data [Ref. 12] and thus project revenues every quarter.  HomeGrocer.com has selected E.piphany’s E.4™ product to maintain and improve customer relationships in the new customer economy [Ref. 13].  The E.4™ system is an integrated software suite with data mining facilities that help companies profile and analyze individual customer characteristics and preferences.  This information is then leveraged to drive marketing campaigns and individually tailored products and services.  Using data mining on a CRM application, a business can develop rules governing customer interactions.  These business rules can help target marketing campaigns by modeling the likelihood of responses to promotions.  Customer acquisition, customer defection or attrition, and the reduction of revenue loss through fraud investigation can all be modeled using a data mining tool.  E-Chemicals is exploring the use of IBM’s Intelligent Miner™ so that its marketing team can customize promotions [Ref. 14].  Angoss, a Canada-based data mining firm, has applied its products (KnowledgeSTUDIO™ and Knowledge WebMiner™) to a variety of applications, including CRM, to determine customer behavior.
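Modeling "the likelihood of responses to promotions" is a classification task. The Python sketch below fits a logistic regression by batch gradient descent. Logistic regression is a stand-in chosen for brevity, since the text does not name a model. The two features (site visits in the last 30 days, promotions previously redeemed) and the training rows are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """Estimated response probability; w[0] is the bias, w[1:] the feature weights."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on the log-loss."""
    w = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = predict(w, x) - y  # derivative of the log-loss w.r.t. the linear score
            grad[0] += err
            for i, xi in enumerate(x, start=1):
                grad[i] += err * xi
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


# Hypothetical history: (site visits in last 30 days, promotions previously redeemed)
# and whether the customer responded to the last promotion (1) or not (0).
ROWS = [(1, 0), (2, 0), (0, 0), (8, 2), (9, 3), (7, 1)]
LABELS = [0, 0, 0, 1, 1, 1]

weights = train_logistic(ROWS, LABELS)
print(round(predict(weights, (8, 2)), 3), round(predict(weights, (1, 0)), 3))
```

Customers whose profiles resemble past responders receive a probability above 0.5 and could be prioritized for a campaign. A real model would be trained on far more history and validated on held-out data before its scores drove any campaign.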

 

5.  Summary

Data mining technology is mature and robust, and has proven its value in the marketplace.  In the burgeoning market, where new ideas are introduced at a rapid rate, companies find themselves in a more competitive environment than before.  The focus has shifted from product-based strategies to customer-based strategies.  The motto is “Know Your Customer”, and this has paved the way for one-to-one marketing.  Data mining has shown promise in the development of personalized direct marketing strategies.  Developing such strategies in the eBusiness environment is becoming all the more important, since a large number of people spend a considerable amount of time on the Internet.  The surfing patterns and purchasing behavior of Web users can be mined to segment customers, develop brand loyalty, enhance online advertisement revenues, and minimize customer attrition.

 

6.  References

  1. Parks, R., Levine, D., and Long, D. (eds.), Fundamentals of Neural Network Modeling: Neuropsychology and Cognitive Neuroscience, MIT Press, Cambridge, MA (1998).
  2. Young, R., Wavelet Theory and its Applications, Kluwer Academic Publishers, Boston, MA (1992).
  3. What Should You Know About Data Mining, Information Discovery, Inc. Web site, http://www.datamining.com.
  4. Kalman, R., Trans. ASME, J. Basic Engng, 82:35 (1960).
  5. Case Study: IBM’s Intelligent Miner Helps Safeway Focus on the Individual Shopper, http://www2.software.ibm.com/casestudies/.
  6. Case Study: Bank of Montreal’s Data Warehouse Yields Knowledge Worth Millions, http://www2.software.ibm.com/casestudies/.
  7. Reducing Customer Churn, http://www.oracle.com/datawarehouse/products/datamining/case-studies/index.html.
  8. Case Study: Driving Profitability: Farmers Insurance Group Teams with IBM to Gain Auto Coverage Insights, http://www2.software.ibm.com/casestudies/.
  9. Peppers, D., Rogers, M., and Dorf, B., The One to One Fieldbook: The Complete Toolkit for Implementing a 1to1 Marketing Program, http://www.1to1.com/tools/books/fieldbook.html.
  10. Porter, M., Competitive Strategy: Techniques for Analyzing Industries and Competitors, Free Press (1980).
  11. http://www.sagent.com/news/releases/pr122099.html.
  12. http://www.sagent.com/news/releases/pr1100499.html.
  13. http://biz.yahoo.com/prnews/000107/ca_e_pipha_1.html.
  14. Case Study: e-Chemicals Launches Chemical Industry’s First Online Store, http://www2.software.ibm.com/casestudies/.

 

 

 

 

 

 

 

 

 

 

 

 

Copyright (c) 2006 Impact Solutions LLC - All Rights Reserved

 

 

 

