Data mining is an emerging technology aimed at discovering patterns in
historical data. Extending the technique to eBusiness adds a new dimension
to it. Mining Web site and other transactional data with data mining
techniques and tools is an attempt to recognize, anticipate, and learn the
buying habits and preferences of customers in the new economy. Data mining
will be a critical process for our clients’ long-term eBusiness success,
where failure to react, adapt, and evolve quickly can translate into customer
attrition at the click of a mouse. This paper discusses the fundamentals of
data mining, the desired characteristics of a data mining installation, the
myths surrounding the technology, and its application to eBusiness. It also
discusses techniques that offer the best opportunity for knowledge discovery,
the industries that can derive the most significant competitive advantage
from these technologies, and how these technologies will affect the new
economy.
Key words: data
warehouse, data mining, pattern recognition, e-business, Internet, information
portal, clustering, cookies, ICS
The advent of fast computers, significant developments in database software,
and the need to make business decisions in real time have paved the way for
data mining technology. A data warehouse can be seen as a technology that
grew out of the amalgamation of strategy consulting and information
technology (IT). In this new economy, in which business change is
accelerated, it is imperative that business decisions be made in real time.
Generally, corporations have large amounts of data and a multitude of legacy
systems. The data is not necessarily organized in a way that allows
meaningful analysis without considerable reorganization and manipulation.
Besides carrying out standard data analysis, such as weekly or monthly sales
by customer or product, it is sometimes necessary to develop business
strategies that target a certain segment of the market. Developing such a
marketing strategy requires understanding the data patterns of the segment
being targeted. This understanding is most efficiently achieved through data
mining technology. While a data warehouse is a source of data, data mining is
the technology that operates on it. Data mining uses pattern recognition
algorithms based on, for example, neural networks [Ref. 1], probability
theory, chaos theory, and wavelet theory [Ref. 2]. These algorithms take
historical data as input and output the conditions that are most likely to
generate the desired marketing results at a certain level of confidence.
Marketing strategies have traditionally been developed and implemented
through channels such as regular mail and magazines. However, in this age of
ever-changing technology, it is imperative that a strategy be cost-effective
and that its impact be immediate and widespread. This has been one of the
driving forces behind the emergence and wide acceptance of Internet
technology. In other words, a desirable technology solution is one that
results in a significant reduction in cost, an increase in revenue, and a
better understanding of the business.
This paper discusses the basics of data mining, the desired features of a
data mining application, and the myths surrounding the technology. It then
discusses applications of the technology to various industries and to
business done over the Internet. Finally, it discusses the relevance of data
mining to eBusiness in profiling customers and developing direct,
personalized marketing campaigns, and in Web-enabling corporations through
information portals (for the purpose of wider information dissemination).
Data mining is a sophisticated form of decision support that uses sheer
computing power and software applications to identify previously unrecognized
patterns in large data sets or data stores. Moreover, data mining is the
iterative process of discovering actionable and meaningful patterns,
profiles, and trends that can enable our clients to be more competitive, more
productive, and more efficient. In some cases, the implementation of data
mining applications has unearthed millions of dollars in savings and new
revenue opportunities. Byte Magazine reported that some companies have reaped
returns of as much as 1,000 times their initial investment on a single
project. For a number of years, artificial intelligence in the form of data
mining has been in use by:
§Cellular phone companies, to stop customer attrition
§Financial services firms, for portfolio and risk management
§Credit card companies, to detect fraud and set pricing
§Mail catalogers, to lift their response rates
§Retailers, for market basket analysis
Figure 1: Data
mining technology in a business intelligence process.
Why mine data? Simply put, data has become too large and complex to be
examined by humans. In fact, some experts believe that mankind has captured
more data in the past 15 years than in all previously recorded history. The
voluminous data generated from a Web site, for example, often hides patterns
that reveal the conditions under which visitors are likely to make purchases
or click on certain ads or banners. Given that our clients are starting to
operate at Internet speed, it is no longer appropriate to broadcast a
marketing message to one and all without expecting a diminishing return. Our
clients’ customers want targeted, relevant information that speaks directly
to them. And in our target industries, especially the highly competitive
banking, insurance, and telecommunications segments, most of our clients want
the most profitable customers they can find. How do you identify these
customers? How can you convert less profitable customers into more profitable
ones? The answers are buried in the data, untapped. In fact, Gartner Group
estimates that only 2 percent of existing online data is currently being
analyzed, a percentage that is rapidly falling toward zero as storage more
than doubles each year. Most online companies have not begun to exploit their
customer data, which in the new economy is an intangible asset worth far more
than most physical assets.
Figure 1 shows the integration of data mining technology in a business
intelligence process. In eBusiness, a customer accesses the company’s Web
site using an Internet browser. Cookies and data are sent between the user’s
machine and the Web server and are subsequently stored in the server log
files. Information about the user’s machine, such as the domain name or IP
address, the pages accessed, and the order in which they were accessed, is
also stored. If the customer places an order, the information is recorded in
a back-end relational database (Online Transaction Processing, OLTP).
Business users may analyze the data stored in the OLTP or data warehouse
databases as part of their daily routine. Since pure Online Analytical
Processing (OLAP) is not enough for understanding the patterns in the data,
other techniques such as data mining may be deployed. This may involve
developing a model, building a sample data set by collecting information from
server logs and other corporate databases, and validating the model. Once the
model is validated, a full-fledged data mining approach can be used to
discover the patterns in the underlying data. The findings can be delivered,
for example, through information portals (discussed later in the paper) or in
the form of reports. Finally, the discoveries can be integrated into a
solution, which may be a marketing strategy to retain customers or a fix for
anomalies in the ordering system. In summary, the entire process involves
extracting actionable intelligence for product and financial analysis and
developing a marketing strategy, with the view that customers are the most
important assets a company has.
The term “data mining” is often confused with OLAP. In OLAP, the user can
carry out what-if analysis, make forecasts based on historical data, run
weekly or monthly reports, and obtain the results within seconds. OLAP
analysis, while it hints at, say, customer behavior, generally does not
provide the benchmarks needed to segment customers. Data mining provides
these benchmarks. Data mining is data driven, not user driven as OLAP is. The
current generation of data mining systems enables a user to launch a search
process without knowing the answer to the query beforehand. Figure 2 shows a
graphical representation of a retail business with standard dimensions such
as customer, product, geography, and time. The primary factual information
captured is sales dollars, but it may also include costs, discounts, and
quantity sold. A typical OLAP analysis may help identify the top 10 percent
of customers by sales. It will not, however, suggest whether the best
customers are better defined as the top 10 percent or the top 15 percent of
the customer base. OLAP analysis becomes very complex as the number of
attributes describing a customer increases. An efficient data mining tool
should be able to take into account the entire set of customer-related
attributes and develop benchmarks for those that describe the purchasing
behavior of customers at a certain level of confidence. A marketing strategy
can then be devised around these benchmarks. The general idea behind data
mining is to use pattern recognition algorithms based on, for example, neural
networks, wavelet theory, probability theory, and likelihood theory.
Objective functions may be defined with sales as the dependent variable and
customer or product attributes as the independent variables, as described by
Eq. 1.
max F_Sales(customer attributes: age, annual income, gender, product type, etc.)    (1)
Figure 2: Graphical representation of a retail business.
where F_Sales is an objective function, described here as a function of
several attributes. The nature of the objective function may depend on
business operations. A data mining tool will analyze the patterns of
customers who contribute to maximum sales and link them to their attributes
and the products that they purchase. A data warehouse-like database design is
not necessary to carry out data mining, but if one is available it will help
reduce the analysis time. Figure 1 shows the layout of a typical data mining
scenario.
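To make Eq. 1 concrete, the following is a minimal sketch rather than the method of any particular tool. It assumes a handful of made-up transaction records and stands in for the pattern recognition algorithm with a brute-force search: it computes mean sales for every observed combination of attributes and reports the combination with the highest mean. The record layout, the attribute bands, and the name best_segment are all hypothetical.

```python
from collections import defaultdict

# Hypothetical transaction records:
# (age_band, income_band, product_type, sales_dollars).
RECORDS = [
    ("18-34", "low", "grocery", 40.0),
    ("18-34", "high", "electronics", 220.0),
    ("35-54", "high", "electronics", 310.0),
    ("35-54", "high", "electronics", 290.0),
    ("35-54", "low", "grocery", 55.0),
    ("55+", "high", "grocery", 80.0),
]


def best_segment(records):
    """Return the attribute combination with the highest mean sales, and that mean."""
    totals = defaultdict(lambda: [0.0, 0])  # segment -> [sum of sales, row count]
    for *attributes, sales in records:
        entry = totals[tuple(attributes)]
        entry[0] += sales
        entry[1] += 1
    means = {seg: total / count for seg, (total, count) in totals.items()}
    winner = max(means, key=means.get)
    return winner, means[winner]


segment, mean_sales = best_segment(RECORDS)
print(segment, mean_sales)
```

An exhaustive search like this is workable only for a few attributes; with hundreds of attributes the number of combinations explodes, which is why the tools discussed here rely on pattern recognition algorithms instead.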
The implementation of a data mining application requires business
justification in addition to a technical feasibility study [Ref. 3]. Although
implementing the application requires significant IT investment, the ultimate
beneficiaries are the business users. A data mining application should have
the following eight business-related characteristics:
1. The system and the results generated should be intuitive and should not
require much statistical knowledge or over-dependence on the IT department.
2. The system should be capable of carrying out what-if analysis so that
follow-up questions can be answered in real time.
3. The system should be able to support multiple users and cater to
different business needs (slicing and dicing).
4. The results generated should be accurate and consistent and must be
presented at a certain level of confidence.
5. The system should be scalable and should be able to handle incremental
loads.
6. The system will be more credible if it can automatically detect the
patterns in the data as it is loaded and make the user aware of any major
trends.
7. The system should not be limited to a particular class of databases and
should be easily integrated with other business-wide applications.
8. The system should not require much maintenance.
From a technical perspective:
§The system operating in a client-server architecture should depend
minimally on the client; otherwise the application may be severely limited
when it comes to very large databases (VLDB).
§The system should be Web based for a global reach.
§If the system generates a series of flat files (instead of working directly
against the RDBMS), then minimal cleanup should be required.
§The system should be capable of both parallel and sequential task
processing.
§The system should not require a data warehouse-like database design and
should work against a normalized OLTP design (this may require long
processing times).
§The system should be able to handle multiple dimensions and attributes,
generating a multivariate solution.
§The system should have a self-learning feature, meaning that the end
results should not be limited to a set of predefined pattern types and should
require minimal user input.
§It should be possible to automate the system.
§The system should be able to filter noise out of the data, possibly using
filtering techniques such as the Kalman filter [Ref. 4], and work with
low-quality data.
Expecting a single data mining application (referred to as the “system”
above) to have all the features described above would be utopian. However,
the application should strive for as many of these features as possible.
Having discussed the desired features of a data mining
application, our next step is to shed some light on the common myths
surrounding this technology.
Data mining is quite often confused with OLAP, although both are required to
make a sound business decision. OLAP analysis does not discover patterns in
the data but assists in making business decisions once the data patterns have
been established. Other misconceptions often encountered during client
engagements are:
§Data mining is a step after building a data warehouse. Data mining can be
carried out on a database that does not have a warehouse-like design, is
normalized in structure, and is not necessarily relational in nature. It is
even possible to discover patterns in data stored in flat files.
§Data mining is an IT problem. The end users of the system are business
users, not IT personnel. The application should be fairly intuitive and
require minimal maintenance and support; a strong training program helps
achieve this. A word of caution, however: some data mining tools are quite
mathematically intensive and may require the user to have significant
knowledge of statistics.
§Data mining is a new technology. Both data warehousing and data mining
technologies are at least ten years old. Their recent emergence is due to the
availability of faster computers, inexpensive disk space, and expertise, and
to the robustness of various development tools.
§Data mining gives the client understandable and actionable output. The
answer to this myth is “it depends”. If a business user who is well versed in
the operation of his or her business also understands the relevant
mathematical and statistical concepts, then the output of a data mining
exercise can be considered operational. However, unless the application is
tied to an OLAP engine, the use of the tool may remain esoteric.
§Data mining is risk-free, just implement the technology. One should proceed
with caution. An intelligent tool is not necessarily a good tool, as it may
produce erroneous results. For example, the output may suggest that women who
are less than 5 feet in height buy more tomatoes than others do. This result
may not make sense, as the two attributes (one customer related, one product
related) are non sequiturs. However, a finding reported in the Wall Street
Journal, that young, single fathers buy an alcoholic beverage when purchasing
baby diapers, is quite logical.
§Data mining is always a great investment. This depends on the expertise
within the organization that deploys the technology. Also, the data mining
application must work with a line of business where data patterns exist. If
the database has a lot of attribute information but no factual information (a
factless system), then the data mining application may not work. Continuous
dependence on outside sources for data mining expertise, together with poor
data quality, may result in serious data integrity issues that severely limit
the use of this technology.
§Data mining application implementation costs a lot and does not really
work. If one has the technical know-how and can finish the project in a
limited time period, then the cost of the project can be minimized. Higher
costs during implementation may be due to extraneous factors. To make the
technology workable and effective, its relevance to the business scenario,
high data quality, and expertise within the company must be considered. One
principal reason behind the success of Wal-Mart is the successful deployment
of a data warehouse and product affinity analysis, which led to an increase
in revenue of around 20 percent quarter after quarter [Ref. WSJ, October 6,
1998].
Data mining technology has been deployed in industries with a significant
customer focus. These industries include general retail, health services,
credit cards, retail banking, and crime prevention services. Thought analyses
for two industries, retail and health services, are discussed in this
article. Figure 2 shows a typical retail model with customer, product,
geography, and time as the major dimensions. Suppose that the immediate goal
is to develop the criteria for defining “best” customers or, in other words,
to profile our customers. The customer attributes stored in the warehouse are
name, address, annual income, ethnicity, and occupation. Transactional
information may be stored as customer ID, product stock keeping unit (SKU),
store ID, sales amount, cost of transaction, and quantity purchased. There
may be several transactions in the market basket of a given customer. The
customer-related attributes may now be fed as input to the data mining
application along with the factual information. The user, if he or she so
desires, may set the parameters of the pattern recognition algorithm along
with a statistical level of confidence. The data mining tool, after carrying
out a series of complex analyses (of which the user may be unaware), may
suggest that customers who have an annual income greater than $50,000, live
within one mile of the store, are bankers by profession, and are of Asian or
European descent are the “best” customers at a confidence level of 90
percent. If customer ethnicity is not an important criterion (per the policy
of the organization), then a re-run of the exercise without it may adjust the
thresholds on income and distance to the store. Some data mining tools can
take into account more than 300 variables. The obvious question now is how
the tool decided how the customers should be evaluated. The objective
function may be predefined by the user, or the tool may assume that “sales”
should be used as the basis.
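The profile described above can be read as a rule of the form "if these attribute conditions hold, the customer is a best customer". The sketch below is an illustration under stated assumptions, not the algorithm of any specific tool: the customer records are invented, "best" is assumed to mean annual sales at or above a chosen threshold, and the tool's "confidence level" is approximated by the empirical fraction of rule-matching customers who meet that threshold. The name rule_stats is hypothetical.

```python
# Hypothetical customer records: (annual_income, miles_to_store, annual_sales).
CUSTOMERS = [
    (72000, 0.5, 5200.0),
    (65000, 0.8, 4800.0),
    (58000, 0.9, 1500.0),
    (51000, 3.0, 4100.0),
    (30000, 0.4, 900.0),
    (42000, 2.5, 700.0),
    (90000, 0.7, 6100.0),
    (38000, 5.0, 300.0),
]


def rule_stats(customers, min_income, max_miles, best_sales):
    """Support and confidence of the rule
    (income > min_income and distance <= max_miles) -> sales >= best_sales."""
    matches = [c for c in customers if c[0] > min_income and c[1] <= max_miles]
    hits = [c for c in matches if c[2] >= best_sales]
    support = len(matches) / len(customers)  # share of customers the rule covers
    confidence = len(hits) / len(matches) if matches else 0.0  # share of covered customers who are "best"
    return support, confidence


support, confidence = rule_stats(CUSTOMERS, min_income=50000, max_miles=1.0, best_sales=4000.0)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Re-running the function with different income and distance thresholds mirrors the re-run described above, in which one criterion is dropped and the remaining thresholds shift.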
Another example from the retail industry is product affinity reports. Some
products sell more than others, and some have an affinity with others.
Consider a large retail chain like Wal-Mart, which carries thousands of
items. It is difficult for a shopper to survey the entire store to find the
items on his or her shopping list. It would be helpful if like products were
placed in the same aisle. This can be achieved by looking for patterns in the
data for items that are bought together (and can be clustered) and for items
that are mutually exclusive. A series of analyses may suggest that milk and
bread, coffee and coffee cakes, and tissue paper and cold medicine are bought
together [Ref. WSJ, October 6, 1998]. One may also find that a young, single
parent buys beer with baby diapers. It is also useful to know that cabbage
and buffalo wings are rarely purchased together. Product affinity reports
help the business not only in arranging the items in the store but also in
managing inventory and retaining customers. Products that exhibit high
affinity should not be put on sale at the same time (diapers and beer should
not be on sale during the same week), leading to enhanced revenues. Products
with zero affinity can be put on sale during the same week. A sale on
products with negative affinity carries a risk of product cannibalization
(e.g., a sale on Coca-Cola™ may lead to a decrease in Pepsi-Cola™ sales).
Data mining conducted by Wal-Mart is a good example of how to exploit the
technique to sustain high revenue growth quarter after quarter.
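One common way to quantify product affinity is the lift of an item pair: how much more often the two items appear in the same basket than would be expected if they were bought independently. The sketch below assumes a few invented market baskets and a hypothetical function name, pair_lift; it is an illustration of the idea and not the analysis used by any retailer named in this paper. Reading lift above 1 as positive affinity, near 1 as no affinity, and below 1 as negative affinity is an interpretation added here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, one set of items per checkout.
BASKETS = [
    {"milk", "bread"},
    {"milk", "bread", "coffee"},
    {"diapers", "beer"},
    {"diapers", "beer", "milk"},
    {"coffee", "coffee cake"},
    {"milk", "bread", "diapers"},
    {"cola A"},
    {"cola B"},
]


def pair_lift(baskets):
    """Lift for every item pair that occurs together in at least one basket."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    lift = {}
    for (a, b), together in pair_counts.items():
        expected = (item_counts[a] / n) * (item_counts[b] / n)  # if independent
        lift[(a, b)] = (together / n) / expected
    return lift


lift = pair_lift(BASKETS)
for pair, value in sorted(lift.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 2))
```

Pairs that never share a basket, such as the two colas in this sample, do not appear in the result at all; their lift is effectively zero, which is the mutually exclusive case mentioned above.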
Once we know who our best customers are, we can design campaigns to reward
them. One such strategy may be to give these customers gold cards to indicate
that they are our valued customers. These gold cards may carry better
discounts on select items. This customer information can also be used to
design customer retention campaigns in case another chain opens a store in
the vicinity. Based on the product affinity reports and customer purchasing
patterns, coupons on select products can be sent out instead of a large-scale
general mailing. For example, customer data from Safeway’s Information
Warehouse is segmented into hundreds of customer characteristics that can be
used to tailor individual mailings, analyze product performance, or forecast
shopping patterns [Ref. 5]. Such targeting leads to a significant reduction
in mailing costs. Since information on the products that are selling is also
available, promotions can be devised on behalf of vendors, leading to an
additional source of revenue.
Data mining techniques have also been used in the health services industry
for fraud detection. Some major dimensions describing the health services
business are recipients, providers, category of service, specialty of the
provider, diagnosis, geography, and claim type (inpatient or outpatient). A
provider specializes in certain categories of service, treating a given set
of diseases and prescribing certain treatments. Patterns in the claim data,
which may run into several terabytes for a state-run program like Medicaid,
may reveal the characteristics of the providers that are most likely to file
fraudulent claims. Similar patterns can be analyzed for recipients. Answers
can also be found as to why some recipients do not avail themselves of
state-run programs, resulting in excess program budget. Diagnosis/treatment
affinity reports can lead to strategies that reduce treatment costs.
A crime prevention program may be interested in knowing the characteristics
of the people who are most likely to develop criminal behavior. Credit card
companies would like to know who is most likely to default on their loans,
which customers should be offered a low interest rate, and what that interest
rate should be.
Retail financial institutions can do householding, analyze branch sales,
establish customer profiles, monitor credit, develop scorecards, and design
fraud and delinquency triggers. Data from Bank of Montreal’s two-terabyte
data warehouse is segmented into several categories such as loans, mortgages,
credit cards, mutual funds, retail banking, and online banking [Ref. 6]. The
customers are segmented into homogeneous groups to retain customer loyalty
and attract new customers. The pilot study alone saved the bank 22.8 million
US dollars.
Telecommunications industry customers can examine data patterns in the
customer base and discover profitable segments in the population to
effectively control the churn arising from short-term promotions. Oracle’s
Darwin™ was able to predict churn for a European cellular company with 70-80
percent accuracy [Ref. 7].
Insurance industry customers can also reduce marketing expenditures and
develop prediction models to be more effective in selling insurance products.
For example, Farmers Insurance Group found that covering a certain type of
sports car was profitable if the owner had at least one other car [Ref. 8].
This will enable Farmers to price insurance on sports cars more suitably and
still make a profit.
Auditors, accountants and professional service
organizations such as ours can capitalize on business opportunities around the
world, isolate anomalies, and confirm and dispute previous conditions.
Having discussed the basics of data mining, its desired
features – business and technical, the misconceptions about the technology,
and some applications of data mining, our next step is to relate its use to
the eBusiness technology.
Business done over the Internet or World Wide Web is termed eBusiness. The
focus of a business may be to serve customers (B2C) or other businesses
(B2B). Internet technology represents a paradigm shift from the operations of
traditional businesses, suggesting that eBusiness is no longer optional but a
necessity for survival. eBusiness is based on client-server technology and
operates in a thin-client environment. There is a front-end Internet browser,
such as Netscape Navigator™, and a back-end database supporting the OLTP
architecture. There may also be middle-layer software that parses the data
into a format that the traditional OLTP system can understand.
In the context of the retail model discussed earlier, a front-end user
(typically a customer) may fill in a request for the items that he or she
wants to purchase. The entered data passes through the middle-layer software
and is stored in the back-end OLTP system as a record. The request is
eventually processed, and the ordered items are sent to the customer. What is
relevant in the context of this paper is how the data stored in the OLTP
database and other associated systems can be mined to develop a marketing
campaign in a thin-client environment.
Figure 3: High
level architecture of an Information Portal
How a data mining tool can be Web-enabled may be of little interest to the
readers of this paper, although its importance should not be underestimated.
What is of practical relevance is how data mining technology can be used to
propel eBusiness further and to develop targeted campaigns that can be
deployed over the Web site. Not only should the information be retrievable
over the Web site, it should also be possible to post it on the Web site.
Therefore, data mining in eBusiness can be discussed in the context of
corporate information portals. Figure 3 shows a high-level architecture of an
information portal. In general, information portals allow users to access
corporate information using a Web browser over the company’s intranet, which
may have links to external sites. There are three main components of an
information portal: the business information directory (BID), the
subscription and publishing features, and a front-end information assistant.
The indexed business information of an organization, in other words its
metadata, is stored in the BID. Metadata crawlers scan a group of servers for
new business information. The IT department then updates the BID, generally
via flat files or a graphical user interface. Not only is the business
information contained in Web URLs, word processing documents, video images,
database tables, and catalogs indexed, but the user queries used for analyses
are also indexed. If the user has the authority, these predefined queries can
be executed over the Web. The BID stores the information available within the
corporation as well as that available over the Internet.
The subscription features facilitate the dissemination of information
accessed through the portal. A user can subscribe to information so that it
is delivered on a regular basis, just like a magazine subscription.
Alternatively, the user can forward the information (unsolicited information)
to other personnel of the company, or publish it in the business information
directory for indexing. This newly indexed information is then distributed to
other users who have expressed interest in related information.
An information assistant, which is what the user directly interacts with,
often works in conjunction with a search engine to retrieve and publish
business information. The information assistant can be configured and
designed to suit the needs of different users. The interface, for example,
may resemble a desktop file folder or a Web search engine. In any case, the
information assistant should have drilling capabilities so that the
information can be drilled up or down (along a hierarchy, e.g., from store to
region to division in the geography dimension) and across (to a related
dimension, e.g., adding time fields such as month to a report displaying
sales by store).
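The drilling operations can be illustrated with a small aggregation routine. This is a sketch under stated assumptions: the store, region, and division names, the sales rows, and the function name roll_up are all invented for illustration. Drilling up moves the grouping key from store to region to division; drilling across adds the month from the time dimension to the grouping key.

```python
from collections import defaultdict

# Hypothetical geography hierarchy: store -> region -> division.
STORE_TO_REGION = {"S1": "North", "S2": "North", "S3": "South"}
REGION_TO_DIVISION = {"North": "East Division", "South": "East Division"}

# Hypothetical fact rows: (store, month, sales_dollars).
SALES = [
    ("S1", "Jan", 100.0),
    ("S1", "Feb", 120.0),
    ("S2", "Jan", 80.0),
    ("S3", "Jan", 50.0),
    ("S3", "Feb", 70.0),
]


def roll_up(rows, level="store", by_month=False):
    """Total sales at the requested geography level, optionally split by month."""
    totals = defaultdict(float)
    for store, month, sales in rows:
        key = store
        if level in ("region", "division"):
            key = STORE_TO_REGION[key]  # drill up one level
        if level == "division":
            key = REGION_TO_DIVISION[key]  # drill up a second level
        if by_month:
            key = (key, month)  # drill across to the time dimension
        totals[key] += sales
    return dict(totals)


print(roll_up(SALES, "store"))
print(roll_up(SALES, "region"))
print(roll_up(SALES, "division"))
print(roll_up(SALES, "region", by_month=True))
```

Drilling down is the reverse direction: starting from the division total and calling the same function at the region or store level.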
Information portals should allow the administrators of the portal to define
user and user group profiles. The users, in turn, should be able to change
their profiles using the information assistant. The Web interface should be
interactive in such a way that a diverse set of information, including but
not limited to word processing documents, spreadsheets, video images,
databases, XML and HTML pages, and e-mails, can be documented and published
in the business information directory in an indexed manner. Indexing of
information should be automatic. The publishing feature of the portal should
facilitate the import and export of files. XML is increasingly becoming an
industry standard for storing and transferring information over the Web.
Figure 4 shows a sample high-level architecture for a corporation, depicting
the flow of business information. At the enterprise level, a corporation may
have several OLTP systems and ERP applications. Information stored in these
systems is mined and analyzed on a regular basis. eBusiness decisions
affecting the future of the corporation are regularly made. Data generated as
a result of these decisions is collected to inform future decisions. The
system acts like a feedback loop, and data mining plays an essential role in
it. Wider dissemination of information is achieved if these systems are
accessible over the Web and the information can also be posted back to the
Web.
Figure 4: An
information flow diagram at enterprise level
This section describes the data mining techniques used in answering key
business questions; the main ones are then discussed in some detail. Some of
the commonly used techniques are:
Classification
§Who will buy, what will they buy, and how much will they buy?
Segmentation
§What are the different types of visitors to the Web site?
Association
§What relationships exist between the visitors and products?
§What hidden associations exist between various attributes like gender, age,
products, time, domains?
Clustering
§What are the groupings hidden in client Web data?
§What product cross-selling patterns exist?
§What distinct groupings of visitor traits exist?
Visualization
§What are the distributions and patterns in the Web data?
Optimization
§How can we help the client maximize their online presence and sales?
Clustering is the natural grouping of users, pages, and other like items,
whereas associations are groupings of items that are requested together.
Associated items need not belong to the same cluster. Clustering and
association are exploratory methods for discovering previously unknown
relationships. A neural network is one of the first types of analysis that
can be used to search for discrete clusters in the data. Using a neural
network on the Web data may reveal previously unknown relationships between
visitor attributes, such as gender, age, and income, and the number of
purchases they make or the total amount they spend at the site. Once this is
completed, a visual depiction of each cluster is constructed and then
evaluated using a machine learning algorithm. This explains the data further
using descriptive rules that are easily understood by a marketer or other
business professional.
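As a simple illustration of clustering, the sketch below groups visitors with plain k-means. This is a deliberate substitution: the text describes neural networks for this step, and k-means is used here only because it fits in a few lines. The visitor data, the choice of two clusters, and the starting centroids are assumptions made for the example; math.dist requires Python 3.8 or later.

```python
import math

# Hypothetical visitors: (age, total_purchases_in_dollars).
VISITORS = [(22, 40.0), (25, 55.0), (24, 35.0), (47, 400.0), (52, 380.0), (50, 420.0)]


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(v) / len(c) for v in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Start from the first and last visitor as initial centroids (an arbitrary choice).
centroids, clusters = kmeans(VISITORS, [VISITORS[0], VISITORS[-1]])
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

In practice the attributes would be rescaled before clustering, since here the dollar amounts dominate the distance, and each resulting cluster would then be described by rules that a marketer can read, as outlined above.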
Under cookie-based analysis, cookies are transferred between the client and
the Web site, and the locations and pages viewed are stored in the Web server
logs. By collecting a visitor's cookie every time a page is requested, say
for a different product or service in the site, specific patterns or paths can
be mined to determine what parts of the site are most popular. The Web site
owner can begin to profile customers and use this knowledge for additional
analysis and marketing applications. Cookies, together with server log,
demographic, and household data, serve as the initial input for
classification, segmentation, or any other data mining analysis. Knowledge is
also acquired about the order in which different pages and URLs are accessed.
This forms the basis of sequential analysis, where the URLs are accessed in a
certain logical order.
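The path and popularity analysis described above can be sketched in a few
lines of Python. The log layout (cookie identifier, timestamp, URL), the
sample records, and the function names are hypothetical; a real Web server
log would first have to be parsed into this form.

```python
from collections import Counter, defaultdict

# Hypothetical, already-parsed log records: (cookie_id, timestamp, url).
LOG = [
    ("c1", 1, "/home"), ("c1", 2, "/books"), ("c1", 3, "/books/xml"),
    ("c2", 1, "/home"), ("c2", 2, "/books"), ("c2", 3, "/checkout"),
    ("c3", 1, "/home"), ("c3", 2, "/peripherals"),
]


def paths_by_cookie(log):
    """Rebuild each visitor's ordered path through the site."""
    paths = defaultdict(list)
    for cookie, _, url in sorted(log, key=lambda rec: (rec[0], rec[1])):
        paths[cookie].append(url)
    return dict(paths)


def page_popularity(log):
    """Count how often each page was requested."""
    return Counter(url for _, _, url in log)


def transition_counts(paths):
    """Count page-to-page steps, the raw material of sequential analysis."""
    counts = Counter()
    for urls in paths.values():
        counts.update(zip(urls, urls[1:]))
    return counts


paths = paths_by_cookie(LOG)
popularity = page_popularity(LOG)
transitions = transition_counts(paths)
print(popularity.most_common(2))
print(transitions.most_common(1))
```

The transition counts show which pages tend to follow one another, which is
the kind of ordering information that sequential analysis builds on.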
A business deploying data mining technology can now use the techniques
described to achieve the following:
• build unique market segments that identify the attributes of high-value
prospects
• identify the key attributes of Web customers for each client product
• select promotional strategies that best reach the client’s Web customer
segments
• analyze online sales to improve targeting of the client’s high-value
customers
• test and determine which marketing activities have the greatest impact
• identify client customers most likely to be interested in their new products
• improve the site’s product cross-selling and up-selling
• identify the best online prospects for the offered services
• help understand the reasons for brand switching
• improve the Web site advertisement and sales process
• maximize the online advertisement click-through rate
• optimize the site arrangement of products and services
The ultimate goal of the
business here is to attain true one-to-one [Ref. 9] marketing effectiveness
based on observed behavior patterns rather than just demographics, log files,
and other traditional methods.
In this section we consider how data mining can be applied to an e-tail
(retail over the Web) business, using Amazon.com as an example. Amazon.com
is a pioneer in the field of selling merchandise over the Web. A Web user
will access the Web site (http://www.amazon.com) to search for items that may
fall under a variety of product groupings. Say the user buys books on XML,
database, and data mining technology. In addition, the user surfs the site
for computer peripherals. When the user accesses the Amazon Web site, cookies
are sent to the remote server, where they are saved and stored along with
other cookies. The Web site pages accessed by the user are recorded in a
server log along with the IP address or the domain name of the user machine,
the search engine, and the time the site was accessed. The Amazon Web site is
accessed in a similar manner by millions of users every month. Amazon
management can develop a logical model of their business, which can be
described along the time, product, customer, and search technology
dimensions, with number of hits, pages accessed, sale dollars, cost of sale,
and quantity purchased as the factual information. The Web site users can be
divided into two categories: existing and potential customers. The data so
collected can be entered into the data mining application. Pattern
recognition algorithms such as neural networks can search for patterns of
product affinity, and may suggest that IT consultants under 30 years of age
who purchase books on the latest computer technology generally also buy
computer peripherals, such as Dolby™ speakers. It may also be discovered that
53 percent of the customers spend more than 25 dollars per purchase, whereas
2 percent spend more than one thousand dollars. Based on this information,
Amazon marketing may devise different promotions for different user groups.
Next-day shipping may be made free for purchases greater than 1000 dollars. A
free Web site design software CD may be given to customers who buy books on
the latest computer technologies. The purchasing behavior can also be stored
in a CRM application, which can generate automatic emails to valuable
customers regarding targeted promotional offers. If the data-mining tool
detects any abnormalities in the shipment process or order system, then the
information can be downloaded to the Supply Chain Management system (see
Fig. 4). Any data-mining related information, including the URLs accessed,
queries run, and searches made, can be stored in the business information
directory. A business user can later access this information via the
corporate information portal.
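The product-affinity and spending findings in this example can be made
concrete with a small Python sketch. It uses simple support and confidence
counts rather than a neural network, and the order records, category names,
and function names are all invented for illustration; only the promotion
thresholds follow the text.

```python
# Hypothetical orders: (set of product categories, order total in dollars).
ORDERS = [
    ({"tech_book", "peripheral"}, 1200.0),
    ({"tech_book", "peripheral"}, 80.0),
    ({"tech_book"}, 30.0),
    ({"peripheral"}, 20.0),
    ({"fiction"}, 15.0),
]


def support(orders, items):
    """Fraction of orders that contain every category in `items`."""
    return sum(1 for basket, _ in orders if items <= basket) / len(orders)


def confidence(orders, antecedent, consequent):
    """Of the orders containing `antecedent`, the fraction that also
    contain `consequent`."""
    base = support(orders, antecedent)
    return support(orders, antecedent | consequent) / base if base else 0.0


def share_spending_over(orders, threshold):
    """Fraction of orders whose total exceeds `threshold` dollars."""
    return sum(1 for _, total in orders if total > threshold) / len(orders)


def promotions(basket, total):
    """Apply the two promotions described in the text to a single order."""
    offers = []
    if total > 1000:
        offers.append("free next-day shipping")
    if "tech_book" in basket:
        offers.append("free Web site design software CD")
    return offers


affinity = confidence(ORDERS, {"tech_book"}, {"peripheral"})
print(f"tech_book -> peripheral confidence: {affinity:.2f}")
print(f"share of orders over $25: {share_spending_over(ORDERS, 25):.0%}")
print(promotions(*ORDERS[0]))
```

A high confidence for the rule from technology books to peripherals is the
kind of affinity that would justify a joint promotion.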
Data mining techniques can be applied to other businesses conducted over the
Web or Internet. Consider a business-to-business scenario where a consortium
of industry players meets over the Web to buy or sell products. For example,
Ford Motors may place a bid for car tires for their Taurus™ line of cars, and
Michelin may set a price for these tires with a floor price (below which the
product will not be sold). In such a scenario, there may be several buyers
such as General Motors, Chrysler, etc., and several other sellers such as
Bridgestone, Goodyear, etc. For a transaction to complete, the bid price of
the buyer must match the ask price of the seller (similar to equity
transactions). These transactions are completed over the Internet without
much human intervention. The transaction data will be stored in a back-end
database server, and the business user information may be stored in the
server logs. The business hosting these auctions, such as FreeMarket, may be
interested in discovering patterns behind the successful bids. For example,
it may be discovered that successful bids are characterized by large orders
(greater than 5 million dollars) and long lead times for product delivery. In
addition, the products that do not require special processing are more easily
sold. A successful transaction has a low spread between the bid and ask
price. Most of the bids are executed within three days. This new knowledge
can now be used in two ways. The host of the auction site can design
promotions to encourage successful buyers and sellers to sign a long-term
contract to use the services of FreeMarket. Depending on who the buyer or
seller point of contact is, executive visits, mailing campaigns, or customer
service calls may be used to retain the customers. Such a marketing campaign
is all the more useful when there is new competition [Ref. 10] to the
business; for example, CommerceOne, Oracle, or Ariba may host Web sites for
auto-maker business. Besides analyzing the data patterns for its own use,
FreeMarket may develop strategies to construct successful bids and sell them
to, say, Ford Motors, for a price, resulting in an additional source of
revenue. Some of this information may also be placed on the intranet for
various internal divisions to act upon. Data mining will help FreeMarket to
be proactive instead of reactive.
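A minimal Python sketch of this auction analysis follows. It assumes that a
transaction completes when the bid is at or above the ask and the ask is at
or above the seller's floor price. The records, field names, and figures are
invented for illustration and do not come from any real auction site.

```python
from statistics import mean

# Hypothetical auction records; prices are per unit, order size is in
# millions of dollars, and lead time is in days.
AUCTIONS = [
    {"bid": 100.0, "ask": 100.0, "floor": 95.0, "order_musd": 6.0, "lead_days": 90},
    {"bid": 101.0, "ask": 100.0, "floor": 95.0, "order_musd": 8.0, "lead_days": 120},
    {"bid": 80.0, "ask": 100.0, "floor": 95.0, "order_musd": 1.0, "lead_days": 14},
    {"bid": 90.0, "ask": 100.0, "floor": 95.0, "order_musd": 2.0, "lead_days": 30},
]


def completes(auction):
    """Assumed rule: the bid covers the ask, and the ask respects the floor."""
    return auction["bid"] >= auction["ask"] >= auction["floor"]


def profile(auctions):
    """Summarize a group of auctions; returns None for an empty group."""
    if not auctions:
        return None
    return {
        "count": len(auctions),
        "avg_order_musd": mean(a["order_musd"] for a in auctions),
        "avg_lead_days": mean(a["lead_days"] for a in auctions),
        "avg_spread": mean(abs(a["ask"] - a["bid"]) for a in auctions),
    }


successful = profile([a for a in AUCTIONS if completes(a)])
failed = profile([a for a in AUCTIONS if not completes(a)])
print("successful:", successful)
print("failed:", failed)
```

Comparing the two profiles is a crude stand-in for the pattern discovery
described above: in this made-up data the successful bids have larger orders,
longer lead times, and a smaller bid-ask spread.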
A plethora of tools is available in the market to carry out data mining and
publish the results on the Web. Some of these tools include Intelligent
Miner™ for data and text by IBM, Darwin™ by Oracle Corporation, Knowledge™
suite by Angoss, and Enterprise Miner™ by SAS. Data in XML format can be
published over the Web or transported between applications using Discoverer
3i™ and Reports 3i™ developed by Oracle.
Sagent Corporation’s eBusiness analytical applications have been used to
perform click-stream analysis for PeopleFirst.com, a customer service
provider in the online auto-loan category. Click-stream analysis has been
performed in association with the Web-based CRM application to track customer
behavior, application submissions, and loan approvals. This has enabled
PeopleFirst.com to create customized one-to-one marketing campaigns to target
potential car buyers, identify online customer cross-sell and up-sell
opportunities, and better serve its clients [Ref. 11]. Similarly,
drugstore.com, an online provider of health and beauty products, has used
Sagent’s graphical data flow technology to analyze customer purchase
transaction data [Ref. 12] and thus project revenues every quarter.
HomeGrocer.com has selected E.piphany’s E.4™ product to maintain and improve
customer relationships for the new customer economy [Ref. 13]. The E.4™
system is an integrated suite of software with data mining facilities that
help companies profile and analyze individual customer characteristics and
preferences. This information is then leveraged to drive marketing campaigns
and individually tailored products and services. Using data mining on a CRM
application, a business can develop rules regarding customer interactions.
These business rules can help in targeting marketing campaigns by modeling
the likelihood of responses to promotions. Customer acquisition, customer
defection or attrition, and reduction of revenue loss through fraud
investigations can be modeled using a data-mining tool. E-Chemicals is
exploring the use of IBM’s Intelligent Miner™ so that its marketing team can
customize promotions [Ref. 14]. Angoss, a Canada-based data-mining firm, has
used its products (KnowledgeSTUDIO™ and Knowledge WebMiner™) against a
variety of applications, including CRM, to determine customer behavior.
Data mining technology is mature and robust, and has proven its value in the
marketplace. In the burgeoning market, where new ideas are being introduced
at a rapid rate, companies are finding themselves in a more competitive
environment than before. The focus has shifted from product-based strategies
to customer-based strategies. The motto is “Know Your Customer”. This has
paved the way for one-to-one marketing. Data mining has shown promise in the
development of personalized direct marketing strategies. Development of such
strategies in the eBusiness environment is becoming all the more important
since a large number of people spend a considerable amount of time on the
Internet. The surfing patterns and purchasing behavior of Web users can be
mined to segment customers, develop brand loyalty, enhance online
advertisement revenues, and minimize customer attrition.
Parks, R., Levine, D., and Long, D. (eds.), Fundamentals of Neural Network
Modeling: Neuropsychology and Cognitive Neuroscience, MIT Press, Cambridge MA
(1998).
Young, R., Wavelet Theory and its Applications, Kluwer Academic Publishers,
Boston MA (1992).
What Should You Know About Data Mining, Information Discovery, Inc. Web site
@ http://www.datamining.com.
Kalman, R., Trans. ASME, J. Basic Engng, 82:35 (1960).