From the category archives:

Interviews

Here is an interview with one of the chief evangelists to data quality in the field of Business Intelligence, Jim Harris who has a renowned blog at http://www.ocdqblog.com/. I asked Jim about his experiences in the field on data quality messing up big budget BI projects, and some tips and methodologies to avoid them.

No one likes to feel blamed for causing or failing to fix the data quality problems- Jim Harris, Data Quality Expert.

Jim Harris Large Photo

Ajay- Why the name OCDQ? What drives your passion for data quality? Name any anecdotes where bad data quality really messed up a big BI project.

Jim Harris - Ever since I was a child, I have had an obsessive-compulsive personality. If you asked my professional colleagues to describe my work ethic, many would immediately respond: “Jim is obsessive-compulsive about data quality…but in a good way!” Therefore, when evaluating the short list of what to name my blog, it was not surprising to anyone that Obsessive-Compulsive Data Quality (OCDQ) was what I chose.

On a project for a financial services company, a critical data source was applications received by mail or phone for a variety of insurance products. These applications were manually entered by data entry clerks. Social security number was a required field and the data entry application had been designed to only allow valid values. Therefore, no one was concerned about the data quality of this field – it had to be populated and only valid values were accepted.

When a report was generated to estimate how many customers were interested in multiple insurance products by looking at the count of applications per social security number, it appeared as if a small number of customers were interested in not only every insurance product the company offered, but also thousands of policies within the same product type. More confusion was introduced when the report added the customer name field, which showed that this small number of highly interested customers had hundreds of different names. The problem was finally traced back to data entry.

Many insurance applications were received without a social security number. The data entry clerks were compensated, in part, based on the number of applications they entered per hour. In order to process the incomplete applications, the data entry clerks entered their own social security number.

On a project for a telecommunications company, multiple data sources were being consolidated into a new billing system. Concerns about postal address quality required the use of validation software to cleanse the billing address. No one was concerned about the telephone number field – after all, how could a telecommunications company have a data quality problem with telephone number?

However, when reports were run against the new billing system, a high percentage of records had a missing telephone number. The problem was that many of the data sources originated from legacy systems that only recently added a telephone number field. Previously, the telephone number was entered into the last line of the billing address.

New records entered into these legacy systems did start using the telephone number field, but the older records already in the system were not updated. During the consolidation process, the telephone number field was mapped directly from source to target and the postal validation software deleted the telephone number from the cleansed billing address.

Ajay- Data Quality – Garbage in, Garbage out for a project. What percentage of a BI project do you think gets allocated to input data quality? What percentage of final output is affected by the normalized errors?

Jim Harris- I know that Gartner has reported that 25% of critical data within large businesses is somehow inaccurate or incomplete and that 50% of implementations fail due to a lack of attention to data quality issues.

The most common reason is that people doubt that data quality problems could be prevalent in their systems. This “data denial” is not necessarily a matter of blissful ignorance, but is often a natural self-defense mechanism from the data owners on the business side and/or the application owners on the technical side.

No one likes to feel blamed for causing or failing to fix the data quality problems.

All projects should allocate time and resources for performing a data quality assessment, which provides a much needed reality check for the perceptions and assumptions about the quality of the data. A data quality assessment can help with many tasks including verifying metadata, preparing meaningful questions for subject matter experts, understanding how data is being used, and most importantly – evaluating the ROI of data quality improvements. Building data quality monitoring functionality into the applications that support business processes provides the ability to measure the effect that poor data quality can have on decision-critical information.

Ajay- Companies talk of paradigms like Kaizen, Six Sigma and LEAN for eliminating waste and defects. What technique would you recommend for a company just about to start a major BI project for a standard ETL and reporting project to keep data aligned and clean?

Jim Harris- I am a big advocate for methodology and best practices and the paradigms you mentioned do provide excellent frameworks that can be helpful. However, I freely admit that I have never been formally trained or certified in any of them. I have worked on projects where they have been attempted and have seen varying degrees of success in their implementation. Six Sigma is the one that I am most familiar with, especially the DMAIC framework.

However, a general problem that I have with most frameworks is their tendency to adopt a one-size-fits-all strategy, which I believe is an approach that is doomed to fail. Any implemented framework must be customized to adapt to an organization’s unique culture. In part, this is necessary because implementing changes of any kind will be met with initial resistance, but an attempt at forcing a one-size-fits-all approach almost sends a message to the organization that everything they are currently doing is wrong, which will of course only increase the resistance to change.

Starting with a framework as a reference provides best practices and recommended options of what has worked for other organizations. The framework should be reviewed to determine what can best be learned from it and to select what will work in the current environment and what simply won’t. This doesn’t mean that the selected components of the framework will be implemented simultaneously. All change comes gradually and the selected components will most likely be implemented in phases.

Fundamentally, all change starts with changing people’s minds. And to do that effectively, the starting point has to be improving communication and encouraging open dialogue. This means more of listening to what people throughout the organization have to say and less of just telling them what to do. Keeping data aligned and clean requires getting people aligned and communicating.

Ajay- What methods and habits would you recommend to young analysts starting in the BI field for a quality checklist?

Jim Harris- I always make two recommendations.

First, never make assumptions about the data. I don’t care how well the business requirements document is written or how pretty the data model looks or how narrowly your particular role on the project has been defined. There is simply no substitute for looking at the data.

Second, don’t be afraid to ask questions or admit when you don’t know the answers. The only difference between a young analyst just starting out and an expert is that the expert has already made and learned from all the mistakes caused by being afraid to ask questions or admitting when you don’t know the answers.

Ajay- What does Jim Harris do to have quality time when not at work?

Jim- Since I enjoy what I do for a living so much, it sometimes seems impossible to disengage from work and make quality time for myself. I have also become hopelessly addicted to social media and spend far too much time on Twitter and Facebook. I have also always spent too much of my free time watching television and movies. I do try to read as much as I can, but I have so many stacks of unread books in my house that I could probably open my own book store. True quality time typically requires the elimination of all technology by going walking, hiking or mountain biking. I do bring my mobile phone in case of emergencies, but I turn it off before I leave.

Biography-

Jim Harris Small PhotoJim Harris is the Blogger-in-Chief here at Obsessive-Compulsive Data Quality (OCDQ), which is an independent blog offering a vendor-neutral perspective on data quality.

He is an independent consultant, speaker, writer and blogger with over 15 years of professional services and application development experience in data quality (DQ), and business intelligence (BI),

Jim has worked with Global 500 companies in finance, brokerage, banking, insurance, healthcare, pharmaceuticals, manufacturing, retail, telecommunications, and utilities. Jim also has a long history with the product that is now known as IBM InfoSphere QualityStage. Additionally, he has some experience with Informatica Data Quality and DataFlux dfPower Studio.

Jim can be followed at twitter.com/ocdqblog and contacted at http://www.ocdqblog.com/contact/


{ 3 comments }

An interview with Eric Siegel, Ph.D.President of Prediction Impact, Inc. and founding chair of Predictive Analytics World.

eric_siegel_picture

Ajay- What does this round of Predictive Analytics World have —–which was not there in the edition earlier in the year.

Eric- Predictive Analytics World (pawcon.com) – Oct 20-21 in DC delivers a fresh set of 25 vendor-neutral presentations across verticals employing predictive analytics, such as banking, financial services, e-commerce, education, healthcare, high technology, insurance, non-profits, publishing, retail and telecommunications.

PAW features keynote speaker, Stephen Baker, author of The Numerati and Senior writer at BusinessWeek.  His keynote is described at www.predictiveanalyticsworld.com/dc/2009/agenda.php#day2-2

A strong representation of leading enterprises have signed up to tell their stories — speakers will present how predictive analytics is applied at Aflac, AT&T Bell South, Amway, The Coca-Cola Company, Financial Times, Hewlett-Packard, IRS, National Center for Dropout Prevention, The National Rifle Association, The New York Times, Optus (Australian telecom), PREMIER Bankcard, Reed Elsevier, Sprint-Nextel, Sunrise Communications (Switzerland), Target, US Bank, U.S. Department of Defense, Zurich — plus special examples from Anheuser-Busch, Disney, HSBC, Pfizer, Social Security Administration, WestWind Foundation and others.

To see the entire agenda at a glance: www.predictiveanalyticsworld.com/dc/2009/agenda_overview.php

We’ve added a third workshop, offered the day before (Oct 19), “Hands-On Predictive Analytics.  There’s no better way to dive in than operating real predictive modeling software yourself – hands-on.”  For more info: www.predictiveanalyticsworld.com/dc/2009/handson_predictive_analytics.php

Ajay- What do academics, corporations and data miners gain in this conference? list 4 bullet points for the specific gains.

Eric- A. First, PAW’s experienced speakers provide the “how to” of predictive analytics. PAW is a unique conference in its focus on the commercial deployment of predictive analytics, rather than research and development. The core analytical technology is established and proven, valuable as-is without additional R&D — but that doesn’t mean it’s a “cakewalk” to employ it successfully to ensure value is attained.  Challenges include data requirements and preparation, integration of predictive models and their scores into existing organizational systems and processes, tracking and evaluating performance, etc. There’s no better way to hone your skills and cultivate an informed plan for your organization’s efforts than hearing how other organizations did it.

B. Second, PAW covers the latest state-of-the-art methods produced by research labs, and how they provide value in commercial deployment. This October, almost all sessions in Track 2 are at the Expert/Practitioner-level.  Advanced topics include ensemble models, uplift modeling (incremental modeling), model scoring with cloud computing, predictive text analytics, social network analysis, and more.

PAW’s pre- and post-conference workshops round out the learning opportunities. In addition to the hands-on workshop mentioned above, there is a course covering core methods, “The Best and the Worst of Predictive Analytics: Predictive Modeling Methods and Common Data Mining Mistakes” (www.predictiveanalyticsworld.com/dc/2009/predictive_modeling_methods.php) and a business-level seminar on decision automation and support, “Putting Predictive Analytics to Work” (www.predictiveanalyticsworld.com/dc/2009/predictive_analytics_work.php).

C. Third, the leading predictive analytics software vendors and consulting firms are present at PAW as sponsors and exhibitors, available to provide demos and answer your questions.  What do the predictive analytics solutions do, how do they compare, and which is best for your? PAW is the one-stop-shop for selecting the tool or solution most suited to address your needs.

D. Fourth, PAW provides a unique, focused opportunity to network with colleagues and establish valuable contacts in the predictive analytics industry.  Mingle, connect and hang out with professionals facing similar challenges (coffee breaks, meals, and at the reception).

Ajay- How do you balance the interests of various competing softwares and companies who sponsor such event?

Eric- As a vendor-neutral event, PAW’s core program of 25 sessions is booked exclusively with enterprise practitioners, thought leaders and adopters, with no predictive analytics software vendors speaking or co-presenting. These sessions provide substantive content with take-aways which provide value that’s independent of any particular software solution — no product pitches!  Beyond these 25 sessions are three short sponsor sessions that are demarcated as such, and, despite being branded, generally prove to be quite substantive as well.  In this way, PAW delivers a high quality, unbiased program.

Supplementing this vendor-neutral program, the room right next door has an expo where attendees have all the access to software and solution vendors they could want (cf. in my answer to the prior question regarding software vendors, above).

Here are a couple more PAW links:

For informative PAW event updates:
www.predictiveanalyticsworld.com/notifications.php

To sign up for the PAW group on LinkedIn, see:
www.linkedin.com/e/gis/1005097

Ajay- Describe your career in science including research that you specialize in. How would you motivate students today to go for science careers

Eric- Well, first off, my work as a predictive analytics consultant, instructor and conference chair is in the application of established technology, rather than the research and development of new or improved methods.

But the Ph.D. next to my name reveals my secret past as an “academic”. Pure research is something I really enjoyed and I kind of feel like I had a brain transplant in order to change to “real world work”. I’m glad I made the change, although I see good sides to both types of work (really, they’re like two entirely different lifestyles).

In my research I focused on core predictive modeling methods. The ability for a computer to automatically learn from experience (data really is recorded experience, after all), is the best thing since sliced bread. Ever since I realized, as a kid, that space travel would in fact be a huge pain in the neck, nothing in science has ever seemed nearly as exciting.

Predictive analytics is an endeavor in machine learning. A predictive model is the encoding of a set of rules or patterns or regularities at some level. The model is the thing output by automated, number-crunchin’ analysis and, therefore, is the thing “learned” from the “experience” (data).  The “magic” here is the ability of these methods to find a model that performs not only over the historical data on your disk drive, but that will perform equally well for tomorrow’s new situations. That ability to generalize from the data at hand means the system has actually learned something.

And indeed the ability to learn and apply what’s been learned turns out to provide plenty of business value, as I imagined back in the lab.  The output of a predictive model is a predictive score for each individual customer or prospect.  The score in turn directly informs the business decision to be taken with that individual customer (to contact or not to contact; with which offer to contact, etc.) – business intelligence just doesn’t get more actionable than that.

For the impending student, I’d first point out the difference between applied science and research science. Research science is fun in that you have the luxury of abstraction and are usually fairly removed from the need to prove near-term industrial applicability. Applied science is fun for the opposite reason: The tangle of challenges, although less abstract and in that sense more mundane, are the only thing between you and getting the great ideas of the world to actually work, come to fruition, and have an irrefutable impact.

Ajay- What are the top five conferences in analytics and data mining in your opinion in the world including PAW.

Eric- KDD – The leading event for research and development of the core methods behind the commercial deployments covered at PAW (”Knowledge Discovery and Data Mining”).

ICML – Another long-standing research conference on machine learning (core data mining).

eMetrics.org – For online marketing optimization and web analytics

Text Analytics Summit – Text mining can leverage “unstructured data” (text) to augment predictive analytics; the chair of this conference is speaking at PAW on just that topic: www.predictiveanalyticsworld.com/dc/2009/agenda.php#day2-15

Predictive Analytics World, the business-focused event for predictive analytics professionals, managers and commercial practitioners – focused on the commercial deployment of predictive analytics: pawcon.com

Ajay- Would PAW 2009 have video archives, videos as well or podcasts for people not able to attend on site.

Eric- While the PAW conferences emphasize in-person participation, we are in the planning stages for future webcasts and other online content. PAW’s “Predictive Analytics Guide” has a growing list of online resources: www.predictiveanalyticsworld.com/predictive_analytics.php

Ajay- How do you think social media marketing can help in these conferences.

Eric- Like most events, PAW leverages social media to spread the word.

But perhaps most pertinent is the other way around: predictive analytics can help social media by increasing relevancy, dynamically selecting the content to which each reader or viewer is most likely to respond.

Ajay- Do you have any plans to take PAW more international? Any plans for a PAW journal for trainings and papers.

Eric- We’re in discussions on these topics, but for now I can only say, stay tuned!

Biographyy

The president of Prediction Impact, Inc., Eric Siegel is an expert in predictive analytics and data mining and a former computer science professor at Columbia University, where he won awards for teaching, including graduate-level courses in machine learning and intelligent systems – the academic terms for predictive analytics.He has published 13 papers in data mining research and computer science education, has served on 10 conference program committees, has chaired a AAAI Symposium held at MIT, and is the founding chair of Predictive Analytics World.

For more on Predictive Analytic World-

Predictive Analytics World Conference
October 20-21, 2009, Washington, DC
www.predictiveanalyticsworld.com
LinkedIn Group: www.linkedin.com/e/gis/1005097

{ 0 comments }

Here is an interview with Gary Miner, Phd who has been in the data mining business for almost 30 years and a pioneer in healthcare studies pertaining to Alzheimer’s diseases. He is also co author of “the Handbook of Statistical Analysis and  Data Mining Applications”. Gary writes on how he has seen data mining change over the years, health care applications as well as his book and quotes from his experience.

GaryMinersmall

Ajay- Describe your career in science starting from college till today. How would you interest young students in science careers today in the mid of the recession

Gary - I knew that I wanted to be in “Science” even before college days, taking all the science and math courses I could in high school. This continued in undergraduate college years at a private college [Hamline University, St. Paul, Minnesota……..older than the State of Minnesota, founded in 1854, and had the first Medical School, later “sold” to the University of Minnesota] as a Biology and Chemistry major, with a minor in education. From there is did a M.S. conducting a “Physiological genetics research project”, and then a Ph.D. at another institution where I worked on Genetic Polymorphisms of Mouse blood enzymes. So through all of this, I had to use statistics to analyze the data. My M.S. was analyzed before the time of even “electronic calculators”, so I used, if you can believe this, a “hand cranked calculator”, rented, one summer to analyze my M.S. dataset. By the time my Ph.D. thesis data was being analyzed, electronic calculators were available, but the big main-frame computers were on college campuses, so I punched the data into CARDS, walked down the hill to the computing center, dropped off the stack of cards, to come back the next day to get “reams of output” on large paper [about 15” by 18”, folded in a stack, if anyone remembers those days …]. I then spent about 30 years doing medical research in academic environments with the emphasis on genetics, biochemistry, and proteonomics in the areas of mental illness and Alzheimer’s Disease, which became my main area of study, publishing the first book in 1989 on the GENETICS OF ALZHEIMER’S DISEASE.

Today, in my “semi-retirement careers”, one side-line outreach is working with medical residents on their research projects, which I’ve been doing for about 7 or 8 years now. This involves design of the research project, data collection, and most importantly “effective and accurate” analysis of the datasets. I find this a way I can reach out to the younger generation to interest them not only in “science”, but in doing “science correctly”. As you probably know, we are in the arena of the “Duming of America”; anti-science, if you wish. I’ve seen this happening for at least 30 years, during the 1980’s, 1990’s, and continuing into this Century. Even the medical residents I get to work with each year have been going “downhill” yearly in their ability to “problem solve”. I believe this is an effect of this “dumning of America”.

There are several books coming out on this Dumning of America this summer; one the first week of June, another on July 12, and another in September [see the attached PPT for slides with the covers of these 3 books}. It is a real problem, as Americans over the past few decades have moved towards “wanting simple answers”, and most things in the “real world”, e.g. reality are not simple………..that’s where Science comes in.

A recent 2008 study done by the School of Public Health at Ohio University showed that up to 88% of the published scientific papers in a top respected cancer journal either used statistics INCORRECTLY, and/or the CONCLUSION was INCORRECT. When I and my wife both did Post-Docs in Psychiatric Epidemiology in 1980-82, basically doing an MPH, the first words out of the mouth of the “Biostats – Epidemiology” professor in the first lecture to the incoming MPH students was “We might as well through out most of the medical research literature of the past 25 years, as it has either not been designed correctly or statistics have been used incorrectly”!!! ……That caught my attention. And following medical research [and medicine in general] I can tell you that “not much has changed in the past 25 years since then”, and thus that puts us “50 years behind in medical research” and medicine. ANALOGY: If some of our major companies, that are successfully using predictive analytics to organize and efficiently run their organizations, took on the “mode of operation” of medicine and medical research, they’d be “bankrupt” in 6 months” …. That’s what I tell my students.

Ajay- Describe some of the exciting things data mining can do to lower health care costs and provide more people with coverage.

Gary- As mentioned above, my personal feeling is that “medicine / health care” is 50 years “behind the times”, compared to the efficiency needed to successfully survive in this Global Economy; corporations and organizations like Wal-Mart, INTEL, many of our Pharmaceutical Companies, have used data mining / predictive analytics to survive successfully. Wal-Mart especially: Wal-Mart has it’s own set of data miners, and were writing their own procedures in the early 1990’s ………..before most of us ever heard of data mining; that is why Wal-Mart can go into China today, and open a store in any location, and know almost to 99% accuracy 1) how many check out stand needed, 2) what products to stock, 3) where in the store to stock them, and 4) what their profit margin will be. They have done this through very accurate “Predictive Analytics” modeling.

Other “ingrained” USA corporations have NOT grabbed onto this “most accurate” technology [e.g. predictive analytics modeling], and reaping the “rewards” of impending bankruptcy and disappearance today. Examples in the news, of course, our our 3 – big automakers in Detroit. If they had engaged effective data mining / modeling in the late 1990’s they could have avoided their current problems. I see the same for many of our oldest and larges USA Insurance Companies………..they are “middle management fat”, and I’ve seen their ratings go down over the past 10 years from an A rating to even a C rating [for the company in which I have my auto insurance ? you might ask me why I stay? …. An agent who is a friend, BUT it is frustrating, and this companies “mode of operation” is completely “customer un-friendly”.], while new insurance companies have “grabbed” onto modern technology, and are rising stars.

So my influence on the younger generation is to have my students do research and DATA ANALYSIS correctly.

Ajay- Describe your book ” HANDBOOK OF STATISTICAL ANALYSIS & DATA MINING APPLICATIONS”. Who would be the target audience of this and can corporate data miners gain from it as well.

Gary- There are several target audiences: The main audience we were writing for, after our Publisher looked at what “niches” had been un-met in data mining literature, was for the professional in smaller and middle sized businesses and organizations that needed to learn about “data mining / predictive analytics” “fast”…..e.g. maybe situations where the company did not have a data anlaysis group using predictive analytics, but the CEO’s and Professionals in the company knew they needed to learn and start using predictive analytics to “stay alive”. This seemed like potentially a very large audience. The book is oriented so that one does NOT have to start at chapter 1, and read sequentially, but instead can START WITH A TUTORIAL. Working through a tutorial, I’ve found in my 40 years of being in education, is the fastest way for a person to learn something new. And this has been confirmed………..I;ve had newcomers to data mining, who have already gotten the HANDBOOK, write me and say: “I’ve gone through a bunch of tutorials, and finding that I am really learning ‘how to do this’……..I’ve ready other books on ‘theory’, but just didn’t get the ‘hang of it’ from those”. My data mining consultants at StatSoft, who travel and work in “real world” situations every day, and who wrote maybe 1/3 of the tutorials in the HANDBOOK, tell me: “A person can go through the TUTORIALS in the HANDBOOK, and know 70% of what we who are doing predictive analytics consulting every day know !!!”

But there are other audiences: Corporate data miners can find it very useful also, as a “way of thinking as a data miner” can be gained from reading the book, as was expressed by one of the Amazon.com 5-STAR reviews: “What I like about this book is that it embeds those methods in a broader context, that of the philosophy and structure of data mining, especially as the methods are used in the corporate world. To me, it was really helpful in thinking like a data miner, especially as it involves the mix of science and art.”

But we’ve had others who have told us they will use is as an extra textbook in their Business Intelligence and Data Minng courses, because of the “richness” of the tutorials. Here’s a comment on the Amazon reviews from a Head of Business School who has maybe over 100 graduate students doing data mining:

“5.0 out of 5 stars. At last, a useable data mining book”

This is one of the few, of many, data mining books that delivers what it promises. It promises many detailed examples and cases. The companion DVD has detailed cases and also has a real 90 day trial copy of Statistica. I have taught data mining for over 10 years and I know it is very difficult to find comprehensive cases that can be used for classroom examples and for students to actually mine data. The price of the book is also very reasonable expecially when you compare the quantity and quality of the material to the typical intro stat book that usually costs twice as much as this data mining book.

The book also addresses new areas of data mining that are under development. Anyone that really wants to understand what data mining is about will find this book infinitively useful.”

So, I think the HANDBOOK will see use in many college classrooms.

Ajay- A question I never get the answer to is which data mining tool is good for what and not so good for what. Could you help me out with this one? What in your opinion, among the data mining and statistical tools used by you in your 40 years in this profession would you recommend for some uses, and what would you not recommend for other uses ( eg SAS,SPSS,KXEN,Statsoft,R etc etc)

Gary- This is a question I can’t answer well; but my book co-author, Robert Nisbet, Ph.D. can. He has used most of these softwares, and in fact has written 2 reviews over the past 6 years in which most of these have been discussed. I like “cutting edge endeavors”, that has been the modus operandi of my ‘career’, so when I took this “semi-retirement postion” as a data mining consultant at StatSoft, I was introduced to DATA MINING, as we started developing STATISTICA Data Miner shortly after I arrived. So most of my experience is with STATISTICA Data Miner, which of course has always been rated NO 1 in all the reviews on data miner software done by Dr. Nisbet – I believe this is primarily due to the fact that STATISTICA was written for the PC from the beginning, thus dos not have any legacy “main frame computer” coding in its history, and secondly StatSoft has been able to move rapidly to make changes as business and government data analysis needs change, and thirdly and most importantly, STATISTICA products have very “open architecture”, “flexibility”, and “customization” with every “built together / workable together” as one package. And of course the graphical output is second to none – that is how STATISTICA originally got its reputation. So I find no need of any other software, as if I need a new algorithm, I can program it to work with the “off the shelf” STATISTICA Data Miner algorithms, and thus get anything I need with the full graphical and other outputs seamlessly available.

Ask Bob Nisbet to answer this question, as he has the background to do so.

Ajay- What are the most interesting trends to watch out for in 2009-2010 in data mining in your opinion.

Gary- Things move so rapidly in this 21st century world, that this is difficult to say. Let me answer this with “hindsight”:

In late October, 2008 I wrote the first draft of Chapter 21 for the HANDBOOK. This was the “future directions of data mining”. You can look in that chapter yourself to find the 4 main areas I decided to focus on. One was on “social networking”, and one of the new examples used was TWITTER. At that time, less than one year ago, no one knew if TWITTER was going to amount to much or not ??? big question? Well, on Jan 14 when the US-AIRWAYS A320 Airbus made an emergency landing in the Hudson River, I got an EMAIL automatic message from CNN [that I subscribe to] telling me that a “plane was down in the Hudson, watch it live” …………I click on the live video: The voice form the Helicopter overhead was saying: “We see a plane, half sunk into the water, but no people? What has happened to the people? Are they all dead?………” Well, as it turned out, the CNN Helicopters had spend nearly one hour searching the river for the plane, as had other news agencies. BUT THE “ENTIRE” WORLD ALREADY KNEW !!! … Why? A person on a ferry that was crossing the river close to the crash landing used his I-Phone, snaped a photo, uploaded it to TWIT-PIX and sent a TWITTER message, and this was re-tweeted around the world. The world knew in “seconds to minutes” to which the traditional NEWS MEDIA was 1 hour late on the scene, when ALL the PEOPLE had been rescued and were on-shore in a warm building within 45 minutes of the landing. THE TRADITIONAL NEWS MEDIA ARRIVED 15 MINUTES AFTER EVERYTHING HAD HAPPENED !!!! ………AT THIS POINT we ALL KNEW that TWITTER was a new phenomenon ……….and it started growing, with 10,00 people an hour joining at one point in last spring of this year, and who knows what the rate is today. TWITTER has become a most important part not only of “social networking” among friends, but for BUSINESS —- companies even sending out ‘Parts Availability” lists to their dealers, etc.

TWITTER affected Chapter 21…………..I immediately re-wrote Chapter 21, including this first photo of the Hudson Plane crash-landing with all the people standing on the wings. BUT, not the end of this story: By the time the book was about to go to press, TWITTER had decided that “ownership” of uploaded photos resided with the photographer, and the person who took this original US-AIRBUS – PEOPLE ON THE WINGS photo wanted $600 for us to publish it in the HANDBOOK. So, I re-wrote again [the chapter was already “set” in page proofs……….so we had to make the changes directly at the printer]………this time finding another photo uploaded to social media, but in this case the person had “checked” the box to put the photo in public domain.

So TWITTER is one that I predicted would become important, but I’d thought it would be months AFTER the HANDBOOK was released in May, not last January!!!

Other things we presented in Chapter 21 about the “future of data mining” involved “photo / image recognition”, among others. The “Image Recognition”, and more importantly “movement recognition / analysis” for things like Physical Therapy and other medical areas may be more slow to evolve and fully develop, but are immensely important. The ability to analyze such “Three-dimensional movement data” is already available in rudimentary form in our version 9 of STATISTICA [just released in June], and anyone could implement it fully with MACROS, but it probably will be some time before it is fully feasible from a business standpoint to develop it with fully automatic “point and click” functionality to make it readily accessible for anyone’s use.

Ajay What would your advice be to a young statistician just starting his research career.

Gary- Make sure you delve in / grab in FULLY to the subject areas……….you need to know BOTH the “domain” of the data you are working with, and “correct” methods of data analysis, especially when using the traditional p-value statistics. Today’s culture is too much on “superficiality”………..good data analysis requires “depth” of understanding. One needs to FOCUS ………good FOCUS can’t be done with elaborate “multi-tasking”. Granted, today’s youth [the “Technology-Inherited”] probably have their brains “wired differently” than the “Technology-Immigrants” like myself [e.g. the older generations], but never-the-less, I see ERRORS all over the place in today’s world, from “typos” in magazine and newspaper, to web page paragraphs, links that don’t work, etc etc ……….and I conclude that this is all do to NON-FOCUSED / MULTI-TASKING people. You can’t drive a car / bus / train and TEXT MESSAGE at the same time ……….the scientific tests that have been conducted show that it takes 20-times as long for a TEXT MESSAGING driver to stop, than a driver fully focused on the road, when given a “danger” warning. [Now, maybe this scientific experiment used ALL TECHNOLOGY-IMMIGRANTS as drivers?? If so, the scientific design was “flawed” ……..they should have used BOTH Technology-Immigrants and Technology-Inheritants as participants in the study. Then we’d have 2 dependent, or target variables: Age and TEXT MESSAGING…..]

Short Bio-

Professor, 30 years medical research in genetics, DNA, Proteins, Neuropsychology of Schizophrenia and Alzheimer’s Disease……….now semi-retired position as DATA MINING CONSULTANT – SENIOR STATISTICIAN

{ 0 comments }

Here is an interview with John F Moore, VP Engineering and Chief Technology Officer, Swimfish a provider of business solutions and CRM. A well known figure in Technology and CRM circles, John talks of Social CRM, Technology Offshoring, Community Initiatives and his own career.

Too many CRM systems are not usable. They are built by engineers that think of the system as a large database and the systems often look like a database making it difficult to use by the sales, support, and marketing people.

-John F Moore

JohnMoore

Ajay – Describe your career journey from college to CTO. What changes in mindset did you undergo along the journey? What advice would you give to young students to take up science careers ?

John- First, I wanted to take time to thank you for the interview offer. I graduated from Boston University in 1988 with a degree in Electrical Engineering. At the time of my graduation I found myself to be very interested in the advanced taking place on the personal computing front by companies like Lotus with their 1-2-3 product. I knew that I wanted to be involved with these efforts and landed my first job in the software space as a Software Quality Engineer working on 1-2-3 for DOS.

I spent the first few years of my career working at Lotus as a developer, a quality engineer, and manager, on products such as Lotus 1-2-3 and Lotus Notes. Throughout those early career years I learned a lot and focused on taking as many classes as possible.

From Lotus I sought out the start-up environment and by early 2000 and joined a startup named Brainshark (http://www.brainshark.com). Brainshark was, and is, focused on delivering an asynchronous communication platform on the web and was one of the early providers of SAAS. In my seven years at Brainshark I learned a lot about delivering an Enterprise class SAAS solution on top of the Microsoft technology stack. The requirements to pass security audits for Fortune 500 companies, the need to match the performance of in-house solutions, resulted in all of us learning a great deal. These were very fun times.

I now work as the VP of Engineering and CTO at Swimfish, a services and software provider of business solutions. We focus on the financial marketplace where we have the founder has a very deep background, but also work within other verticals as well. Our products are focused on the CRM, document management, and mobile product space and are built on the Microsoft technology stack. Our customers leverage both our SAAS and on-premise solutions which require us to build our products to be more flexible than is generally required for a SAAS-only solution.

The exciting thing for me is the sheer amount of opportunities I see available for science/engineering students graduating in the near future. To be prepared for these opportunities, however, it will be important to not just be technically savvy.

Engineering students should also be looking at:

* Business classes. If you want to build cool products they must deliver business value.

* Writing and speaking classes. You must be able to articulate your ideas or no one will be willing to invest in them.

I would also encourage people to take chances, get in over your head as often as possible.You may fail, you may succeed. Either way you will gain experiences that make it all worthwhile.

Ajay- How do you think social media can help with CRM. What are the basic do’s and don’ts for social media CRM in your opinion?

John- You touch upon a subject that I am very passionate about. When I think of Social CRM I think about a system of processes and products that enable businesses to actively engage with customers in a manner that delivers maximum value to all. Customers should be able to find answers to their questions with minimal friction or effort; companies should find the right customers for their products.

Social CRM should deliver on some of these fronts:

* Analyze the web of relationships that exists to define optimal pathways. These pathways will define relationships that businesses can leverage for finding their customers. These pathways will enable customers to quickly find answers to their questions. For example, I needed an answer to a question about SharePoint and project management. I asked the question on Twitter and within 3 minutes had answers from two different people. Not only did I get the answer I needed but I made two new friends who I still talk to today.

* Monitor conversations to gauge brand awareness, identify customers having problems or asking questions. This monitoring should not be stalking; however, it should be used to provide quick responses to customers to benefit the greater community.

* Usability. Too many CRM systems are not usable. They are built by engineers that think of the system as a large database and the systems often look like a database making it difficult to use by the sales, support, and marketing people.

Finally, when I think of social media I think of these properties:

* Social is about relationship building.

* You should always add more value to the community than you take in return.

* Be transparent and honest. People can tell when you’re not.

Ajay-  You are involved in some noble causes – like using blog space for out of work techies and separately for Alzheimer’s disease. How important do you think is for people especially younger people to be dedicated to community causes?

John- My mother-in-law was diagnosed with Alzheimer’s disease at the age 57. My wife and I moved into their two-family house to help her through the final years of her life. It is a horrible disease and one that it is easy to be passionate about if you have seen it in action.

My motivation on the job front is very similar. I have seen too many people suffer through these poor economic times and I simply want to do what I can to help people get back to work.

It probably sounds corny, but I firmly believe that we must all do what we can for each other. Business is competitive, but it does not mean that we cannot, or should not, help each other out. I think it’s important for everyone to have causes they believe in. You have to find your passions in life and follow them. Be a whole person and help change the world for the better.

Ajay- Describe your daily challenges as head of Engineering of Swimfish, Inc How important is it for the tech team to be integrated with the business and understand it as well.

John- The engineering team at Swimfish works very closely with the business teams. It is important for the team to understand the challenges our customers are encountering and to build products that help the customer succeed. I am not satisfied with the lack of success that many companies encounter when deploying a CRM solution.

We go as deep as possible to understand the business, the processes currently in use, the disparate systems being utilized, and then the underlying technologies currently in use. Only then do we focus on the solutions and deliver the right solution for that company.

On the product front it is the same. We work closely with customers on the features we are planning to add, trying to ensure that the solutions meet their needs as well as the needs of the other customers in the market that we are hoping to serve.

I do expect my engineers to be great at their core job, that goes without question. However, if they cannot understand the business needs they will not work for me very long.My weeks at Swimfish always provide me with interesting challenges and opportunities.

My typical day involves:

* Checking in with our support team to understand if there are any major issues being encountered by any of our customers.

* Challenging the support team to hit their targets. I love sales as without them I cannot deliver products.

* Checking in with my developers and test teams to determine how each of our projects is doing. We have a daily standup as well, but I try and personally check-in with as many people as possible.

* Most days I spend some time developing, mostly in C#. My current focus area is on our next release of our Milestone Tracking Matrix where I have made major revisions to our user interface.

I also spend time interacting on various social platforms, such as Twitter, as it is critical for me to understand the challenges that people are encountering in their businesses, to keep up with the rapid pace of technology, and just to check-in with friends. Keep it real.

Ajay-  What are your views on off shoring work especially science jobs which ultimately made science careers less attractive in the US- at the same time outsourcing companies ( in India) generally pay only 1/3 rd of billing fees to salaries. Do you think concepts like ODesk can help change the paradigm of tech out-sourcing.

John- I have mixed opinions on off-shoring. You should not offshore because of perceived cost savings only. On net you will generally break even, you will not save as much as you might originally think.

I am, however, close to starting a relationship with a good development provider in Costa Rica. The reason for this relationship is not cost based, it is knowledge based. This company has a lot of experience with the primary CRM system that we sell to customers and I have not been successful in finding this experience locally. I will save a lot of money in upfront training on this skill-set; they have done a lot of work in this area already (and have great references). There is real value to our business, and theirs.

Note that Swimfish is already working with a geographically dispersed team as part of the engineering team is in California and part is in Massachusetts. This arrangement has already helped us to better prepare for an offshore relationship and I know we will be successful when we begin.

Ajay- What does John Moore do to have fun when he is not in front of his computer or with a cause.

John- As the father of two teenage daughters I spend a lot of time going to soccer, basketball, and softball games. I also enjoy spending time running, having completed a couple of marathons, and relaxing with a good book. My next challenge will be skydiving as my 17 year old daughter and I are going skydiving when she turns 18.

Brief Bio:

For the last decade I have worked as a senior engineering manager for SAAS applications built upon the Microsoft technology stack. I have established the processes, and hired the teams that delivered hundreds of updates ranging from weekly patches to longer running full feature releases. My background as a hands-on developer combined with my strong QA background has enabled me to deliver high quality software on-time.

You can learn more about me, and my opinions, by reading my blog at http://johnfmoore.wordpress.com/ or joining me on Twitter at http://twitter.com/JohnFMoore

{ Comments on this entry are closed }

Here is an in depth interview with Peter J Thomas, one of Europe’s top Business Intelligence expert and influential thought leaders. Peter talks about BI tools, data quality, science careers, cultural transformation and BI and the key focus areas.

I am a firm believer that the true benefits of BI are only realised when it leads to cultural transformation. -Peter James Thomas

peter_thomas_005-h1000

Ajay- Describe about your early career including college to the present.

Peter -I was an all-rounder academically, but at the time that I was taking public exams in the 1980s, if you wanted to pursue a certain subject at University, you had to do related courses between the ages of 16 and 18. Because of this, I dropped things that I enjoyed such as English and ended up studying Mathematics, Further Mathematics, Chemistry and Physics. This was not because I disliked non-scientific subjects, but because I was marginally fonder of the scientific ones. In a way it is nice that my current blogging allows me to use language more.

The culmination of these studies was attending Imperial College in London to study for a BSc in Mathematics. Within the curriculum, I was more drawn to Pure Mathematics and Group Theory in particular, and so went on to take an MSc in these areas. This was an intercollegiate course and I took a unit at each of King’s College and Queen Mary College, but everything else was still based at Imperial. I was invited to stay on to do a PhD. It was even suggested that I might be able to do this in two years, given my MSc work, but I decided that a career in academia was not for me and so started looking at other options.

As sometimes happens a series of coincidences and a slice of luck meant that I joined a technology start-up, then called Cedardata, late in 1988; my first role was as a Trainee Analyst / Programmer. Cedardata was one of the first organisations to offer an Accounting system based on a relational database platform; something that was then rather novel, at least in the commercial arena. The RDBMS in question was Oracle version 5, running on VAX VMS – later DEC Ultrix and a wide variety of other UNIX flavours. Our input screens were written in SQL*Forms 2 – later Oracle Forms – and more complex processing logic and reports were in Pro*C; this was before PL/SQL. Obviously this environment meant that I had to become very conversant with SQL*Plus and C itself.

When I joined Cedardata, they had 10 employees, 3 customers and annual revenue of just £50,000 ($80,000). By the time I left the company eight years later, it had grown dramatically to having a staff of 250, over 300 clients in a wide range of industries and sales in excess of £12 million ($20 million). It had also successfully floatated on the main London Stock Exchange. When a company grows that quickly the same thing tends to happen to its employees.

Cedardata was probably the ideal environment for me at the time; an organisation that grew rapidly, offering new opportunities and challenges to its employees; that was fiercely meritocratic; and where narrow, but deep, technical expertise was encouraged to be rounded out by developing more general business acumen, a customer-focused attitude and people-management skills. I don’t think that I would have learnt as much, or progressed anything like as quickly in any other type of organisation.

It was also at Cedardata that I had my first experience of the class of applications that later became known as Business Intelligence tools. This was using BusinessObjects 3.0 to write reports, cross-tabs and graphs for a prospective client, the UK Foreign and Commonwealth Office (State Department). The approach must have worked as we beat Oracle Financials in a play-off to secure the multi-million pound account.

During my time at Cedardata, I rose to become an executive and filled a number of roles including Head of Development and also Assistant to the MD / Head of Product Strategy. Spending my formative years in an organisation where IT was the business and where the customer was King had a profound impact on me and has influenced my subsequent approach to IT / Business alignment.

Ajay- How would you convince young people to take maths and science more? What advice would you give to policy makers to promote more maths and science students?

Peter- While I have used little of my Mathematics directly in my commercial career, the approach to problem-solving that it inculcated in me has been invaluable. On arriving at University, it was something of a shock to be presented with Mathematical problems where you couldn’t simply look up the method of solution in a textbook and apply it to guarantee success. Even in my first year I had to grapple with challenges where you had no real clue where to start. Instead what worked, at least most of the time, was immersing yourself in the general literature, breaking down the problem into more manageable chunks, trying different techniques – sometimes quite recherché ones – to make progress, occasionally having an insight that provides a short-cut, but more often succeeding through dogged determination. All of that sounds awfully like the approach that has worked for me in a business context.

Having said that, I was not terribly business savvy as a student. I didn’t take Mathematics because I thought that it would lead to a career, I took it because I was fascinated by the subject. As I mentioned earlier, I enjoyed learning about a wide range of things, but Science seemed to relate to the most fundamental issues. Mathematics was both the framework that underpinned all of the Sciences and also offered its own world where astonishing an beautiful results could be found, independent of any applicability; although it has to be said that there are few braches of Mathematics that have not be applied somewhere or other.

I think you either have this appreciation of Science and Mathematics or you don’t and that this happens early on.

Certainly my interest was supported by my parents and a variety of teachers, but a lot of it arose from simply reading about Cosmology, or Vulcanism, or Palaeontology. I watched a YouTube of Steven Jay Gould recently saying that when he was a child in the 1950s all children were “in” to Dinosaurs, but that he actually got to make a career out of it. Maybe all children aren’t “in” to dinosaurs in the same way today, perhaps the mystery and sense of excitement has gone.

In the UK at least there appears to be less and less people taking Science and Mathematics. I am not sure what is behind this trend. I read pieces that suggest that Science and Maths are viewed as being “hard” subjects, and people opt for “easier” alternatives. I think creative writing is one of the hardest things to do, so I’m not sure where this perspective comes from.

Perhaps some things that don’t help are the twin images of the Scientist as a white-coated boffin and the Mathematician as a chalk-covered recluse, neither of whom have much of a grasp on the world beyond their narrow discipline. While of course there is a modicum of truth in these stereotypes, they are far from being wholly accurate in my experience.

Perhaps Science has fallen off of the pedestal that it was placed on in the 1950s and 1960s. Interest in Science had been spurred by a range of inventions that had improved people’s lives and often made the inventors a lot of money. Science was seen as the way to a better tomorrow, a view reinforced by such iconic developments as the discovery of the structure of DNA, our ever deepening insight about sub-atomic physics and the unravelling of many mysteries of the Universe. These advances in pure science were supported by feats of scientific / engineering achievement such as the Apollo space programme. The military importance of Science was also put into sharp relief by the Manhattan Project; something that also maybe sowed the seeds for later disenchantment and even fear of the area.

The inevitable fallibility of some Scientists and some scientific projects burst the bubble. High-profile problems included the Thalidomide tragedy and the outcry, however ill-informed, about genetically modified organisms. Also the poster child of the scientific / engineering community was laid low by the Challenger disaster. On top of this, living with the scientifically-created threat of mutually-assured destruction probably began to change the degree of positivity with which people viewed Science and Scientists. People arrived at the realisation that Science cannot address every problem; how much effort has gone into finding a cure for cancer for example?

In addition, in today’s highly technological world, the actual nuts and bolts of how things work are often both hidden and mysterious. While people could relatively easily understand how a steam engine works, how many have any idea about how their iPod functions? Technology has become invisible and almost unimportant, until it stops working.

I am a little wary of Governments fixing issues such as these, which are the result of major generational and cultural trends. Often state action can have unintended and perverse results. Society as a whole goes through cycles and maybe at some future point Science and Mathematics will again be viewed as interesting areas to study; I certainly hope so. Perhaps the current concerns about climate change will inspire a generation of young people to think more about technological ways to address this and interest them in pertinent Sciences such as Meteorology and Climatology.

Ajay-. How would you rate the various tools within the BI industry like in a SWOT analysis (briefly and individually)?

Peter- I am going to offer a Politician’s reply to this. The really important question in BI is not which tool is best, but how to make BI projects successful. While many an unsuccessful BI manager may blame the tool or its vendor, this is not where the real issues lie.

I firmly believe that successful BI rests on four mutually reinforcing pillars:

  • understand the questions the business needs to answer,
  • understand the data available,
  • transform the data to meet the business needs and
  • embed the use of BI in the organisation’s culture.

If you get these things right then you can be successful with almost any of the excellent BI tools available in the marketplace. If you get any one of them wrong, then using the paragon of BI tools is not going to offer you salvation.

I think about BI tools in the same way as I do the car market. Not so many years ago there were major differences between manufacturers.

The Japanese offered ultimate reliability, but maybe didn’t often engage the spirit.

The Germans prided themselves on engineering excellence, slanted either in the direction of performance or luxury, but were not quite as dependable as the Japanese.

The Italians offered out-and-out romance and theatre, with mechanical integrity an afterthought.

The French seemed to think that bizarrely shaped cars with wheels as thin as dinner plates were the way forward, but at least they were distinctive.

The Swedes majored on a mixture of safety and aerospace cachet, but sometimes struggled to shift their image of being boring.

The Americans were still in the middle of their love affair with the large and the rugged, at the expense of convenience and value-for-money.

Stereotypically, my fellow-countrymen majored on agricultural charm, or wooden-panelled nostalgia, but struggled with the demands of electronics.

Nowadays, the quality and reliability of cars are much closer to each other. Most manufacturers have products with similar features and performance and economy ratings. If we take financial issues to one side, differences are more likely to related to design, or how people perceive a brand. Today the quality of a Ford is not far behind that of a Toyota. The styling of a Honda can be as dramatic as an Alfa Romeo. Lexus and Audi are playing in areas previously the preserve of BMW and Mercedes and so on.

To me this is also where the market for BI tools is at present. It is relatively mature and the differences between product sets are less than before.

Of course this doesn’t mean that the BI field will not be shaken up by some new technology or approach (in-memory BI or SaaS come to mind). This would be the equivalent of the impact that the first hybrid cars had on the auto market.

However, from the point of view of implementations, most BI tools will do at least an adequate job and picking one should not be your primary concern in a BI project.

Ajay- SAS Institute Chief Marketing Officer, Jim Davis (interviewed with this blog) points to the superiority of business analytics rather than business intelligence as an over hyped term. What numbers, statistics and graphs would you quote rather than semantics to help re direct those perceptions?

I myself use SAS,SPSS, R and find the decision management capabilities as James Taylor calls Decision Management much better enabled than by simple ETL tools or reporting and aggregating graphs tools in many BI tools.

Peter- I have expended quite a lot of energy and hundreds of words on this subject. If people are interested in my views, which are rather different to those of Jim Davis, then I’d suggest that they read them in a series of articles starting with Business Analytics vs Business Intelligence [URL http://peterthomas.wordpress.com/2009/03/28/business-analytics-vs-business-intelligence/ ].

I will however offer some further thoughts and to do this I’ll go back to my car industry analogy. In a world where cars are becoming more and more comparable in terms of their reliability, features, safety and economy, things like styling, brand management and marketing become more and more important.

As the true differences between BI vendors narrow, expect more noise to be made by marketing departments about how different their products are.

I have no problem in acknowledging SAS as a leader in Business Analytics, too many people I respect use their tools for me to think otherwise. However, I think a better marketing strategy for them would be to stick to the many positives of their own products. If they insist on continuing to trash competitors, then it would make sense for them to do this in a way that couldn’t be debunked by a high school student after ten seconds’ reflection.

Ajay- In your opinion what is the average RoI that a small, large medium enterprise gets by investing in a business intelligence platform. What advice would you give to such firms (separately) to help them make their minds?

Peter- The question is pretty much analogous to “What are the benefits of opening an office in China?” the answer is going to depend on what the company does; what their overall strategy is and how a China operation might complement this; whether their products and services are suitable for the Chinese market; how their costs, quality and features compare to local competitors; and whether they have cracked markets closer to home already.

To put things even more prosaically, “How long is a piece of string?”

Taking to one side the size and complexity of an organisation, BI projects come in all shapes and sizes.

Personally I have led Enterprise-wide, all-pervasive BI projects which have had a profound impact on the company. I have also seen well-managed and successful BI projects targeted on a very narrow and specific area.

The former obviously cost more than the latter, but the benefits are commensurately greater. In fact I would argue that the wider a BI project is spread, the greater its payback. Maybe lessons can be learnt and confidence built in an initial implementation to a small group, but to me the real benefit of BI is realised when it touches everything that a company does.

This is not based on a self-interested boosting of BI. To me if what we want to do is take better business decisions, then the greater number of such decisions that are impacted, the better that this is for the organisation.

Also there are some substantial up-front investments required for BI. These would include: building the BI team; establishing the warehouse and a physical architecture on which to deliver your application. If these can be leveraged more widely, then costs come down.

The same point can be made about the intellectual property that a successful BI team develops. This is one reason why I am a fan of the concept of BI Competency Centres [URL http://peterthomas.wordpress.com/2009/05/11/business-intelligence-competency-centres/ ].

I have been lucky enough to contribute to an organisation turning round from losing hundreds of millions of dollars to recording profits of twice that magnitude. When business managers cite BI as a major factor behind such a transformation, then this is clearly a technology that can be used to dramatic effect.

Nevertheless both estimating the potential impact of BI and measuring its actual effectiveness are non-trivial activities. A number of different approaches can be taken, some of which I cover in my article:

Measuring the benefits of Business Intelligence [URL http://peterthomas.wordpress.com/2009/02/26/measuring-the-benefits-of-business-intelligence/ ]. As ever there is no single recipe for success.

Ajay-. Which BI tool/ code are you most comfortable with and what are its salient points?

Peter -Although I have been successful with elements of the IBM-Cognos toolset and think that this has many strong points, not least being relatively user-friendly, I think I’ll go back to my earlier comments about this area being much less important than many others for the success of a BI project.

Ajay -How do you think cloud computing will change BI? What percentage of BI budgets go to data quality and what is eventual impact of data quality on results?

Peter -I think that the jury is still out on cloud computing and BI. By this I do not mean that cloud computing will not have an impact, but rather that it remains unclear what this impact will actually be.

Given the maturity of the market, my suspicion is that the BI equivalent of a Google is not going to emerge from nowhere. There are many excellent BI start-ups in this space and I have been briefed by quite a few of them.

However, I think the future of cloud computing in BI is likely to be determined by how the likes of IBM-Cognos, SAP-BusinessObjects and Oracle-Hyperion embrace the area.

Having said this, one of the interesting things in computing is how easy it is to misjudge the future and perhaps there is a potential titan of cloud BI currently gestating in the garage so beloved of IT mythology.

On data quality, I have never explicitly split out this component of a BI effort. Rather data quality has been an integral part of what we have done. Again I have taken a four-pillared approach:

  • improve how the data is entered;
  • make sure your interfaces aren’t the problem;
  • check how the data has been entered / interfaced;
  • and don’t suppress bad data in your BI.

The first pillar consists of improved validation in front-end systems – something that can be facilitated by the BI team providing master data to them – and also a focus on staff training, stressing the importance to the organisation of accurately recording certain data fields.

The second pillar is more to do with the general IT Architecture and how this relates to the Information Architecture, again master data has a role to play, but so does ensuring that the IT culture is one in which different teams collaborate well and are concerned about what happens to data when it leaves “their” systems.

The third pillar is the familiar world of after-the-fact data quality reports and auditing, something that is necessary, but not sufficient, for success in data quality.

Finally there is what I think can be one of the most important pillars; ensuring that the BI system takes a warts-and-all approach to data. This means that bad data is highlighted, rather than being suppressed. In turn this creates pressure for the problems to be addressed where they arise and creates a virtuous circle.

For those who might be interested in this area, I expand on it more in Using BI to drive improvements in data quality [URL http://peterthomas.wordpress.com/2009/02/11/using-bi-to-drive-improvements-in-data-quality/ ].

Ajay- You are well known with England’s rock climbing and boulder climbing community. A fun question- what is the similarity between a BI implementation/project and climbing a big boulder.

Peter -I would have to offer two minor clarifications.

First it is probably my partner who is better known in climbing circles, via here blog [URL http://77jenn.blogspot.com/ ] and articles and reviews that she has written for the climbing press; though I guess I can take credit for most of the photos and videos.

Second, particularly given the fact that a lot of our climbing takes place in Wales, I should acknowledge the broader UK climbing community and also mention our most mountainous region of Scotland.

Despite what many inhabitants of Sheffield might think to the contrary, there is life beyond Stanage Edge [URL http://en.wikipedia.org/wiki/Stanage ].

I have written about the determination and perseverance that are required to get to the top of a boulder, or indeed to the top of any type of climb [URL http://peterthomas.wordpress.com/2009/03/31/perseverance/ ].

I think those same qualities are necessary for any lengthy, complex project. I am a firm believer that the true benefits of BI are only realised when it leads to cultural transformation. Certainly the discipline of change management has many parallels with rock climbing. You need a positive attitude and a strong belief in your ultimate success, despite the inevitable setbacks. If one approach doesn’t yield fruit then you need to either fine-tune or try something radically different.

I suppose a final similarity is the feeling that you get having completed a climb, particularly if it is at the limit of your ability and has taken a long time to achieve. This is one of both elation and deep satisfaction, but is quickly displaced by a desire to find the next challenge.

This is something that I have certainly experienced in business life and I think the feelings will be familiar to many readers.

Biography-

peter_thomas_015-w1000

Peter Thomas has led all-pervasive, Business Intelligence and Cultural Transformation projects serving the needs of 500+ users in multiple business units and service departments across 13 European and 5 Latin American countries. He has also developed Business Intelligence strategies for operations spanning four continents. His BI work has won two industry awards including “Best Enterprise BI Implementation”, from Cognos in 2006 and “Best use of IT in Insurance”, from Financial Sector Technology in 2005. Peter speaks about success factors in both Business Intelligence and the associated Change Management at seminars across both Europe and North America and writes about these areas and many other aspects of business, technology and change on his blog [URL http://peterthomas.wordpress.com ].

{ Comments on this entry are closed }