St@tServ Data Mining page


What is Data Mining?

« Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques. » (Gartner Group).

« Data mining is the exploration and analysis, by automatic and semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules » (M.J.A. Berry, G. Linoff).

« Data mining is the use of advanced statistical tools to reach into a company’s existing databases to discover patterns and relationships that can be exploited in a business context. » (Trajecta lexicon).

« Data mining is a combination of powerful methods that help reduce costs and risks as well as increase revenues, by extracting strategic information from the available data. » (T. Fahmy).

Fifteen years ago, most of the information that companies stored went unused because of technical limits. Today these limits have disappeared, and several software packages can perform statistical analyses on databases containing millions of rows and hundreds of variables.

Although some of the data analysis and statistical techniques used to produce strategic information are more than 30 years old, the computational methods are much more recent. Data mining on huge databases is now possible because of advances in storage and microprocessors, and because of new algorithms designed to explore large data sets very quickly. The most advanced products let several users work on the same data at the same time from several terminals.


Requirements for doing data mining:

To do data mining, one needs to collect as much information as possible, either by transferring information from paper archives to a computer database or by reorganizing existing databases. These possibly heterogeneous data must then be restructured into one large information centre, the Data Warehouse. This stage often requires the help of a specialized consulting company.

A tool is then needed to link the data warehouse to software that can analyse the data and run several statistical analyses to extract crucial information that is not immediately visible. Several data mining products can import the data from the information centre and process them directly (see below). To avoid memory problems, some tools are very parsimonious when importing data, and very fast because they use parallel processing (for example, IBM Intelligent Miner can run analyses on hundreds of gigabytes, and even terabytes, of data).
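
As an illustration of what importing data parsimoniously can mean, the sketch below computes the mean and standard deviation of one column while reading one row at a time, so memory use does not grow with the size of the file. It is only a minimal sketch using Welford's online algorithm, not the method of any product named above; the function name, the column name "amount" and the sample data are hypothetical.

```python
import csv
import io
import math


def streaming_mean_std(rows, column):
    """Return (count, mean, sample std) of a numeric column, one row at a time."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for row in rows:
        x = float(row[column])
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    std = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
    return n, mean, std


# Hypothetical extract; in practice this would be an open file on a large export.
sample = io.StringIO("customer,amount\na,10\nb,20\nc,30\n")
count, mean, std = streaming_mean_std(csv.DictReader(sample), "amount")
```

Because only three running values are kept, the same function works unchanged on a file far larger than the available memory.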

To run the analyses with the data mining software, a person with solid experience in data analysis is highly recommended, because statistical analyses often involve a difficult interpretation stage. In addition, knowing how the tools work is a big advantage in avoiding misleading conclusions. Several companies, freelance consultants and university departments offer consulting services to help companies carry out the analyses and make the right decisions.


Major tools:

  • Classification
  • Prediction
  • Clustering
  • Link analysis
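
To make one of the tools listed above concrete, here is a minimal sketch of clustering on a single variable, using the classic k-means (Lloyd) iteration. It is a toy illustration under stated assumptions, not the implementation used by any particular product: the function name, the starting centers and the spending figures are all hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group 1-D values around the given starting centers (Lloyd's algorithm)."""
    centers = list(centers)
    groups = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical monthly spending per customer.
spend = [12, 15, 14, 80, 85, 90]
centers, groups = kmeans_1d(spend, centers=[0, 100])
```

On this toy data the low spenders and the high spenders end up in two separate groups, which is the kind of result a customer segmentation would then have to interpret.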

Major applications:

  • Market basket analysis
  • Customer segmentation
  • Customer scoring
  • Fraud detection
  • Sales forecasting
  • Pricing
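
As an illustration of the first application in the list, market basket analysis, the sketch below counts how often pairs of items are bought together and derives simple "if A then B" rules with their support and confidence. It is a minimal sketch restricted to pairs, not a full association-rule algorithm; the function name, the threshold and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support >= min_support:
            # Confidence: share of baskets with the antecedent that also hold the consequent.
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical purchase records.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread", "milk"],
    ["milk"],
]
rules = pair_rules(baskets)
```

Here every basket containing butter also contains bread, so the rule "butter, then bread" has a confidence of 1.0, while the reverse rule is weaker because bread is also bought without butter.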



    Data Mining Conferences


    Date                   Name, place                                                        Information
    September 13-16, 2000  PKDD-2000, Lyon, France                                            Jan Rauch
    April 5-7, 2001        First SIAM International Conference on Data Mining, Chicago, USA   Mohammed Zaki


    Data Mining Reports


    Editor            Title                                                              Comments
    META Group Inc.   DATA MINING: Trends, Technology, and Implementation Imperatives   Issued November 1997. Cost: $2,500
    Ovum              Ovum Evaluates: Data Mining                                        300 pages. Issued November 1997. Cost: £995 in Europe, US$1,850 in the rest of the world



    Links to other Data Mining pages


  • The Data Mining Institute
  • The Data Warehouse Institute
  • Geometry in Action : Data Mining by David Eppstein
  • Data mining in molecular biology, by Alvis Brazma
  • Graham Williams' page
  • Knowledge Discovery and Data Mining Resources, maintained by GTE Laboratories
  • University of Helsinki Data Mining Group
  • UCLA Data Mining Laboratory
  • The Data Mine by Andy Pryke
  • Data Mining and Database Marketing by Kurt Thearling
  • Microsoft Datamining Site
  • KD Mine: Data Mining and Knowledge Discovery
  • Datamation page on data mining
  • Companion page of the "Data Mining Techniques" book




    Copyright St@tServ 1997 - 2000, All rights reserved