by Mark Bardsley, MLIS Day
On Thursday, March 3rd, I attended a workshop led by Melody Ivory-Ndiaye entitled "Moving Beyond Databases to Data Warehouses and Data Mining." Melody struck a good balance between the details of data mining and the 'big picture.' To get everyone on the same page, she first helped us reach a consensus on definitions of terms such as data warehouse (a collection of databases) and data mining (the process of analyzing data to answer a question or search for interesting patterns). Both of those definitions are my own distillations after the workshop, so blame me if they are a bit off.
Several real-world examples were discussed to demonstrate both the advantages of data mining (commercial, personal, etc.) and the ethical issues it raises. For example, to set insurance premiums, insurers might mine databases to understand the statistics of auto accidents. One might find it unethical to charge an eighteen-year-old male more than a thirty-year-old female for insurance based on group behavior. Data mining is also frequently used online, where a bookstore might suggest books of interest based on the contents of your electronic shopping cart. To decide what to offer, a data mining process analyzes your choices and, among other things, looks at the other items bought by previous buyers of similar items.
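To make the bookstore example concrete, here is a minimal sketch in Python of the "previous buyers of similar items" idea. It is only an illustration, not how any particular store does it: the order history, the book titles, and the `recommend` function are all invented for this example.

```python
from collections import Counter

# Hypothetical past orders; each set is one buyer's basket (invented titles).
past_orders = [
    {"Intro to Statistics", "Data Mining Basics", "SQL Handbook"},
    {"Data Mining Basics", "SQL Handbook"},
    {"Data Mining Basics", "Gardening at Home"},
    {"Cooking for Two", "Gardening at Home"},
]

def recommend(cart, orders, top_n=2):
    """Suggest items that previous buyers bought alongside items in the cart."""
    counts = Counter()
    for order in orders:
        if cart & order:                 # this buyer shares at least one item with the cart
            counts.update(order - cart)  # count what else they bought
    # Sort by count (highest first), then by title, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

suggestions = recommend({"Data Mining Basics"}, past_orders)
print(suggestions)  # ['SQL Handbook', 'Gardening at Home']
```

A real system would presumably weigh many more signals, but the core step is the same: count what else showed up in baskets that overlap with yours.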
After a general discussion of data mining, Melody demonstrated it by analyzing a few datasets using software called SPSS. The statistical software is not free but is licensed to UW faculty, staff, and students. SPSS is feature-rich, and for someone like me who is statistically challenged, an example of how to use the software was beneficial, to say the least. As an exciting last example, we analyzed a dataset of criminal records with many factors. Each entry had information about the initial arrest, such as the individual's age, whether or not the crime was violent, how much time the individual spent in prison, how much time elapsed before their second arrest, and so forth. Using SPSS, Melody mined the data for clusters, or groups of related records, within the dataset.
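For readers without SPSS at hand, the idea of mining for clusters can be sketched in a few lines of Python. This uses k-means, one common clustering technique; I am not claiming it is the method run in the workshop. The records are invented toy values (age at first arrest, months until second arrest), not the workshop data, and the `kmeans` function and its starting centroids are assumptions made for the illustration.

```python
import math

# Invented toy records: (age at first arrest, months until second arrest).
records = [
    (18, 6), (19, 8), (20, 5),
    (40, 48), (42, 52), (45, 50),
]

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return groups, centroids

# Start from two of the records as initial guesses (an arbitrary choice).
groups, centroids = kmeans(records, centroids=[records[0], records[-1]])
for g, c in zip(groups, centroids):
    print(c, g)
```

On this toy data the records separate into a younger group with short gaps between arrests and an older group with long gaps. With real data one would normally rescale the factors first so that no single column dominates the distance.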
I left the workshop with a few thoughts about data mining. Whether you like it or not, your data is being mined. For those who are interested in competitive intelligence, I think data mining is, or will become, key to the process. Without analyzing the ethical details, one can see how data mining in a library setting could be advantageous yet frightening to patrons.