Jill Dyche on 2012

In part 3 of the series of predictions for 2012, here is Jill Dyche, Baseline Consulting/DataFlux.

Part 2 was Timo Elliott, SAP, at http://www.decisionstats.com/timo-elliott-on-2012/ and Part 1 was Jim Kobielus, Forrester, at http://www.decisionstats.com/jim-kobielus-on-2012/

Ajay: What are the top trends you saw happening in 2011?

 

Well, I hate to say I saw them coming, but I did. A lot of managers committed some pretty predictable mistakes in 2011. Here are a few we witnessed in 2011 live and up close:

 

1. In the spirit of “size matters,” data warehouse teams continued to trumpet the volumes of stored data on their enterprise data warehouses. But a peek under the covers of these warehouses reveals that the data isn’t integrated. Essentially this means a variety of heterogeneous virtual data marts co-located on a single server. Neat. Big. Maybe even worthy of a magazine article about how many petabytes you’ve got. But it’s not efficient, and hardly the example of data standardization and re-use that everyone expects from analytical platforms these days.

 

2. Development teams still didn’t factor data integration and provisioning into their project plans in 2011. So we saw multiple projects spawn duplicate efforts around data profiling, cleansing, and standardization, not to mention conflicting policies and business rules for the same information. Bummer, since IT managers should know better by now. The problem is that no one owns the problem. Which brings me to the next mistake…

 

3. No one’s accountable for data governance. Yeah, there’s a council. And they meet. And they talk. Sometimes there’s lunch. And then nothing happens because no one’s really rewarded, or penalized for that matter, on data quality improvements or new policies. And so the reports spewing from the data mart are still fraught and no one trusts the resulting decisions.

 

But all is not lost since we’re seeing some encouraging signs already in 2012. And yes, I’d classify some of them as bona-fide trends.

 

Ajay: What are some of those trends?

 

Job descriptions for data stewards, data architects, Chief Data Officers, and other information-enabling roles are becoming crisper, and the KPIs for these roles are becoming more specific. Data management organizations are being divorced from specific lines of business and from IT, becoming specialty organizations—okay, COEs if you must—in their own rights. The value proposition for master data management now includes not just the reconciliation of heterogeneous data elements but the support of key business strategies. And C-level executives are holding the data people accountable for improving speed to market and driving down costs—not just delivering cleaner data. In short, data is becoming a business enabler. Which, I have to just say editorially, is better late than never!

 

Ajay: Anything surprise you, Jill?

 

I have to say that Obama mentioning data management in his State of the Union speech was an unexpected but pretty powerful endorsement of the importance of information in both the private and public sector.

 

I’m also sort of surprised that data governance isn’t being driven more frequently by the need for internal and external privacy policies. Our clients are constantly asking us about how to tightly couple privacy policies with their applications and data sources. The need to protect PCI data and other highly sensitive data elements has made executives twitchy. But they’re still not linking that need to data governance.

 

I should also mention that I’ve been impressed with the people who call me who’ve had their “aha!” moment and realize that data transcends analytic systems. It’s operational, it’s pervasive, and it’s dynamic. I figured this epiphany would happen in a few years once data quality tools became a commodity (they’re far from it). But it’s happening now. And that’s good for all types of businesses.

 

About-

Jill Dyché has written three books and numerous articles on the business value of information technology. She advises clients and executive teams on leveraging technology and information to enable strategic business initiatives. Last year her company Baseline Consulting was acquired by DataFlux Corporation, where she is currently Vice President of Thought Leadership. Find her blog posts on www.dataroundtable.com.

Interview Scott Gidley CTO and Founder, DataFlux

Here is an interview with Scott Gidley, CTO and co-founder of leading data quality company DataFlux. DataFlux is a part of SAS Institute, and in 2011 it acquired Baseline Consulting and launched the latest version of its Master Data Management product.

Decisionstats Update

A List of All Decisionstats Interviews Till Now-

http://goo.gl/Xpo5y

 

Date Name of Interviewee Designation and Organization Url
6/26/2011 Elissa Fink VP Tableau Software http://decisionstats.com/interview-elissa-fink-vp-tableau-software/
6/14/2011 Gaurav Vohra Jigsaw Academy http://decisionstats.com/interview-gaurav-vohra-jigsaw-academy/
5/27/2011 Rob La Gesse Chief Disruption Officer, Rackspace Hosting http://decisionstats.com/interview-with-rob-la-gesse-chief-disruption-officer-rackspace/
5/24/2011 Sandro Saitta Data Mining Blog http://decisionstats.com/interview-top-data-mining-blogger-on-earth-sandro-saitta/
2/21/2011 Anne Milley Senior Director JMP http://decisionstats.com/interview-anne-milley-jmp/
2/11/2011 David Katz Senior Analyst, Dataspora http://decisionstats.com/interview-david-katz-dataspora-david-katz-consulting/
1/18/2011 Carole-Ann Matignon Sparkling Logic http://decisionstats.com/carole-ann%E2%80%99s-2011-predictions-for-decision-management/
1/12/2011 Luis Torgo Author, Data Mining with R http://decisionstats.com/interview-luis-torgo-author-data-mining-with-r/
1/8/2011 Ajay Ohri Decisionstats http://decisionstats.com/interview-ajay-ohri-decisionstats-com-with-dmr/
11/29/2010 Timo Elliott SAP, Business Objects http://decisionstats.com/brief-interview-timo-elliott/
11/27/2010 Jill Dyche Founder, Baseline Consulting (acquired by DataFlux/SAS Institute) http://decisionstats.com/short-interview-jill-dyche/
11/25/2010 James Kobielus Senior Analyst, Forrester. http://decisionstats.com/brief-interview-with-james-g-kobielus/
11/22/2010 Jamie Nunnelly Communications Director of National Institute of Statistical Sciences http://decisionstats.com/interview-jamie-nunnelly-niss/
11/14/2010 James Dixon Founder, Pentaho http://decisionstats.com/pentaho/
10/12/2010 John F Moore CEO The Lab http://decisionstats.com/interview-john-f-moore-ceo-the-lab/
10/5/2010 Michael J. A. Berry Data Miners, Inc http://decisionstats.com/interview-michael-j-a-berry-data-miners-inc/
9/30/2010 Dean Abbott Abbott Analytics http://decisionstats.com/interview-dean-abbott-abbott-analytics/
8/23/2010 Stephanie McReynolds Director Product Marketing, AsterData http://decisionstats.com/stephanie/
8/3/2010 David Smith VP, Revolution Analytics http://decisionstats.com/q-a-with-david-smith-revolution-analytics/
6/29/2010 Bob Muenchen Author, R For Stata http://decisionstats.com/interview-r-for-stata-users/
3/13/2010 Interview Jeanne Harris Co-Author -Analytics at Work and Competing with Analytics
1/12/2010 Interview Hadley Wickham R Project Data Visualization Guru
1/6/2010 Audio Interviews -Dr. Colleen McCue National Security Expert
12/30/2009 Interview Sarah Blow – Girly Geekdom Founder
12/7/2009 Interview Donald Farmer Microsoft
11/24/2009 M2009 Interview Peter Pawlowski AsterData
11/19/2009 Interview Phil Rack WPS Consultant and Developer
11/10/2009 Data Mining 2009 Interviews- Terry Whitlock, BlueCross BlueShield of TN
11/2/2009 Audio Interview Anne Milley , Part 1
10/21/2009 Interview Carole Jesse Experienced Analytics Professional
10/5/2009 Interview Michael Zeller,CEO Zementis on PMML
10/5/2009 Interview Ken O Connor Business Intelligence Consultant
10/1/2009 Interview Shawn Kung Sr Director Aster Data
9/28/2009 Interview Thomas C. Redman Author Data Driven
9/25/2009 Interview Augusto Albeghi (Straycat) —Founder Straysoft
9/20/2009 Interview Evan Levy Baseline Consulting
9/18/2009 Interview James Taylor Decision Management Expert (Updated)
9/16/2009 Interview Timo Elliott SAP
9/14/2009 Interview Professor John Fox Creator R Commander
9/10/2009 Interview Stephen Baker Author The Numerati
9/9/2009 Interview Jeff Bass, Bass Institute (Part 2)
9/9/2009 Interview Neil Raden Founder of Hired Brains Inc
8/29/2009 Interview Dylan Jones DataQualityPro.com
8/13/2009 Interview Gregory Piatetsky KDNuggets.com
8/13/2009 Interview Tasso Argyros CTO Aster Data Systems
8/13/2009 Interview Steve Sarsfield Author The Data Governance Imperative
8/11/2009 Interview Dr Usama Fayyad Founder Open Insights LLC
7/28/2009 Interview Karen Lopez Data Modeling Expert
7/28/2009 Interview John Sall Founder JMP/SAS Institute
7/16/2009 Jim Harris OCDQ Blog http://www.decisionstats.com/2009/07/16/interview-jim-harris-data-quality-evangelist/
7/14/2009 Eric Siegel Founder, Predictive Analytics World http://www.decisionstats.com/2009/07/14/interview_eric-siege/
7/10/2009 Gary D Miner Author ‘Handbook of Statistical Analysis and Data Mining Applications’ http://www.decisionstats.com/2009/07/10/interview-gary-d-miner-author-and-professor/
7/3/2009 John F Moore CTO, Swimfish http://www.decisionstats.com/2009/07/03/interview-john-moore-cto-swimfish/
7/2/2009 Peter J Thomas Award Winning BI Expert http://www.decisionstats.com/2009/07/02/peter-james-thomas-bi/
6/30/2009 Alison Bolen Editor-in-Chief, SAS.com http://www.decisionstats.com/2009/06/30/interview-alison-bolen-sas-com/
6/30/2009 Jill Dyche Co-Founder, Baseline Consulting http://www.decisionstats.com/2009/06/30/interview-jill-dyche-baseline-consulting/
6/18/2009 Gary Cokins Senior Leader, Performance Management SAS Institute http://www.decisionstats.com/2009/06/18/interview-gary-cokins-sas-institute/
6/9/2009 Karl Rexer President, Rexer Analytics http://www.decisionstats.com/2009/06/09/interview-karl-rexer-rexer-analytics/
6/5/2009 Jim Davis CMO, SAS Institute http://www.decisionstats.com/2009/06/05/interview-jim-davis-sas-institute/
6/4/2009 Paul van Eikeren President and CEO, Blue Reference http://www.decisionstats.com/2009/06/04/inference-for-r/
5/29/2009 David Smith Director of Community, REvolution Computing http://www.decisionstats.com/2009/05/29/interview-david-smith-revolution-computing/
5/17/2009 Dominic Pouzin CEO, Data Applied http://www.decisionstats.com/2009/05/17/interview-dominic-pouzin-data-applied/
5/11/2009 Bruno Delahaye VP, KXEN http://www.decisionstats.com/2009/05/11/interview-kxen-bruno-delahaye/
5/4/2009 Ron Ramos Director, Zementis http://www.decisionstats.com/2009/05/07/interview-ron-ramos-zementis/
4/30/2009 Olivier Jouve VP, SPSS Inc http://www.decisionstats.com/2009/04/30/interview-spss-olivier-jouve/
4/21/2009 Fabian Dill Co-Founder, Knime.com http://www.decisionstats.com/2009/04/21/interview-knime-fabian-dill/
4/18/2009 Alicia Mcgreevey Head Marketing, Visual Numerics http://www.decisionstats.com/2009/04/18/interview-visual-numerics-alicia-mcgreevey/
3/27/2009 Francoise Soulie Fogelman VP, KXEN http://www.decisionstats.com/2009/03/27/interview-franoise-soulie-fogelman-kxen/
3/17/2009 Jon Peck Principal Software Engineer, SPSS Inc http://www.decisionstats.com/2009/03/17/interview-jon-peck-spss/
3/6/2009 Anne Milley Director of product marketing, SAS Institute http://www.decisionstats.com/2009/03/06/interview-with-anne-milley-sas-ii/
3/4/2009 Anne Milley Director of product marketing, SAS Institute http://www.decisionstats.com/2009/03/04/interview-anne-milley-sas-part-1/
2/3/2009 Phil Rack Creator, Bridge to R,and CEO Minequest http://www.decisionstats.com/2009/02/03/interview-phil-rack/
2/3/2009 Michael Zeller CEO, Zementis http://www.decisionstats.com/2009/02/03/interview-michael-zeller-ceozementis/
1/31/2009 Richard Schultz CEO, Revolution Computing http://www.decisionstats.com/2009/01/31/interviewrichard-schultz-ceo-revolution-computing/
1/21/2009 Bob Muenchen Author, R for SAS and SPSS Users http://www.decisionstats.com/2009/01/21/r-for-sas-and-spss-users/
1/13/2009 Dr Graham Williams Creator, Rattle GUI for R http://www.decisionstats.com/2009/01/13/interview-dr-graham-williams/
1/5/2009 Roger Haddad CEO, KXEN http://www.decisionstats.com/2009/01/05/interview-roger-haddad-founder-of-kxen-automated-modeling-software/
9/26/2008 June Dershewitz VP, Semphonic http://www.decisionstats.com/2008/09/26/online-analytics-june-dershewitz/
9/4/2008 Vincent Granville Head, Analyticbridge http://www.decisionstats.com/2008/09/04/the-worlds-largest-analytics-networker/

 

 

Analytics 2011 Conference

From http://www.sas.com/events/analytics/us/

The Analytics 2011 Conference Series combines the power of SAS’s M2010 Data Mining Conference and F2010 Business Forecasting Conference into one conference covering the latest trends and techniques in the field of analytics. Analytics 2011 Conference Series brings the brightest minds in the field of analytics together with hundreds of analytics practitioners. Join us as these leading conferences change names and locations. At Analytics 2011, you’ll learn through a series of case studies, technical presentations and hands-on training. If you are in the field of analytics, this is one conference you can’t afford to miss.

Conference Details

October 24-25, 2011
Grande Lakes Resort
Orlando, FL

Analytics 2011 topic areas include:

  • Data Mining
  • Forecasting
  • Text Analytics
  • Fraud Detection
  • Data Visualization

Augustus- a PMML model producer and consumer. Scoring engine.

I just checked out this new software for making PMML models. It is called Augustus and is created by the Open Data Group (http://opendatagroup.com/), which is headed by Robert Grossman, who was the first proponent of using R on Amazon EC2.

Probably someone like Zementis (http://adapasupport.zementis.com/) can use this to further test, enhance or benchmark on EC2. They did have a joint webinar with Revolution Analytics recently.

https://code.google.com/p/augustus/

Recent News

  • Augustus v 0.4.3.1 has been released
  • Added a guide (pdf) for including Augustus in the Windows System Properties.
  • Updated the install documentation.
  • Augustus 2010.II (Summer) release is available. This is v 0.4.2.0. More information is here.
  • Added performance discussion concerning the optional cyclic garbage collection.

See Recent News for more details and all recent news.

Augustus

Augustus is a PMML 4-compliant scoring engine that works with segmented models. Augustus is designed for use with statistical and data mining models. The new release provides Baseline, Tree and Naive-Bayes producers and consumers.

There is also a version for use with PMML 3 models. It is able to produce and consume models with 10,000s of segments and conforms to a PMML draft RFC for segmented models and ensembles of models. It supports Baseline, Regression, Tree and Naive-Bayes.

Augustus is written in Python and is freely available under the GNU General Public License, version 2.

See the page Which version is right for me for more details regarding the different versions.

PMML

Predictive Model Markup Language (PMML) is an XML markup language to describe statistical and data mining models. PMML describes the inputs to data mining models, the transformations used to prepare data for data mining, and the parameters which define the models themselves. It is used for a wide variety of applications, including applications in finance, e-business, direct marketing, manufacturing, and defense. PMML is often used so that systems which create statistical and data mining models (“PMML Producers”) can easily inter-operate with systems which deploy PMML models for scoring or other operational purposes (“PMML Consumers”).
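
To make the description above concrete, here is a minimal sketch of what a PMML consumer has to do: read the declared inputs and the model parameters from the XML, then apply them to a record. It uses only the Python standard library and does not use Augustus or its API. The field names and coefficients in the document are invented for illustration; the element names follow the PMML 4.0 regression model as I understand it.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML 4.0 document: field names and coefficients are made up.
PMML_TEXT = """<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="toy linear regression"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="spend" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="income" coefficient="0.001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# PMML elements live in a versioned namespace, so lookups must be qualified.
NS = {"pmml": "http://www.dmg.org/PMML-4_0"}


def load_regression(pmml_text):
    """Return (declared field names, intercept, {predictor: coefficient})."""
    root = ET.fromstring(pmml_text)
    fields = [f.get("name")
              for f in root.findall("pmml:DataDictionary/pmml:DataField", NS)]
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {p.get("name"): float(p.get("coefficient"))
                    for p in table.findall("pmml:NumericPredictor", NS)}
    return fields, intercept, coefficients


def score(record, intercept, coefficients):
    """Apply the linear model: intercept + sum(coefficient * input value)."""
    return intercept + sum(c * record[name] for name, c in coefficients.items())


fields, intercept, coefficients = load_regression(PMML_TEXT)
prediction = score({"age": 40.0, "income": 50000.0}, intercept, coefficients)
print(fields)
print(prediction)
```

A real consumer also has to handle transformations, missing values and the other model types, which this sketch leaves out.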

Change Detection using Augustus

For information regarding using Augustus with Change Detection and Health and Status Monitoring, please see change-detection.

Open Data

Open Data Group provides management consulting services, outsourced analytical services, analytic staffing, and expert witnesses broadly related to data and analytics. It has experience with customer data, supplier data, financial and trading data, and data from internal business processes.

It has staff in Chicago and San Francisco and clients throughout the U.S. Open Data Group began operations in 2002.


Overview

The above example contains plots generated in R of scoring results from Augustus. Each point on the graph represents a use of the scoring engine and a chart is an aggregation of multiple Augustus runs. A Baseline (Change Detection) model was used to score data with multiple segments.

Typical Use

Augustus is typically used to construct models and score data with models. Augustus includes a dedicated application for creating, or producing, predictive models rendered as PMML-compliant files. Scoring is accomplished by consuming PMML-compliant files describing an appropriate model. Augustus provides a dedicated application for scoring data with four classes of models: Baseline (Change Detection) Models, Tree Models, Regression Models and Naive Bayes Models. The typical model development and use cycle with Augustus is as follows:

  1. Identify suitable data with which to construct a new model.
  2. Provide a model schema which prescribes the requirements for the model.
  3. Run the Augustus producer to obtain a new model.
  4. Run the Augustus consumer on new data to effect scoring.
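
The four steps above can be sketched in plain Python. This is not the Augustus producer or consumer, and the XML it writes is a made-up format (`ToyBaselineModel`), not PMML. It only illustrates the idea of a segmented baseline model: the producer stores one mean and standard deviation per segment, and the consumer scores new records as z-scores against the baseline of their segment. The data, field names and threshold are all hypothetical.

```python
import statistics
import xml.etree.ElementTree as ET

# Step 1: suitable training data (values invented for illustration).
TRAINING = [
    {"region": "east", "load": 10.0}, {"region": "east", "load": 12.0},
    {"region": "east", "load": 11.0}, {"region": "west", "load": 50.0},
    {"region": "west", "load": 54.0}, {"region": "west", "load": 52.0},
]

# Step 2: a stand-in "schema" stating what the model requires.
SCHEMA = {"segment_field": "region", "value_field": "load", "threshold": 3.0}


def produce(training, schema):
    """Step 3: fit one (mean, stdev) baseline per segment, serialize as XML."""
    by_segment = {}
    for row in training:
        by_segment.setdefault(row[schema["segment_field"]], []).append(
            row[schema["value_field"]])
    root = ET.Element("ToyBaselineModel",
                      segmentField=schema["segment_field"],
                      valueField=schema["value_field"],
                      threshold=repr(schema["threshold"]))
    for segment, values in sorted(by_segment.items()):
        ET.SubElement(root, "Segment", name=segment,
                      mean=repr(statistics.mean(values)),
                      stdev=repr(statistics.stdev(values)))
    return ET.tostring(root, encoding="unicode")


def consume(model_xml, records):
    """Step 4: score new records against the baseline of their segment."""
    root = ET.fromstring(model_xml)
    segment_field = root.get("segmentField")
    value_field = root.get("valueField")
    threshold = float(root.get("threshold"))
    baselines = {s.get("name"): (float(s.get("mean")), float(s.get("stdev")))
                 for s in root.findall("Segment")}
    results = []
    for row in records:
        mean, stdev = baselines[row[segment_field]]
        z = (row[value_field] - mean) / stdev
        results.append({"segment": row[segment_field], "score": z,
                        "alert": abs(z) > threshold})
    return results


MODEL_XML = produce(TRAINING, SCHEMA)
SCORES = consume(MODEL_XML, [{"region": "east", "load": 11.5},
                             {"region": "west", "load": 60.0}])
for result in SCORES:
    print(result)
```

Keeping the model as a file between the two functions mirrors the split described above: the producer and consumer share nothing except the serialized model.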

Separate consumer and producer applications are supplied for Baseline (Change Detection) models, Tree models, Regression models and for Naive Bayes models. The producer and consumer applications require configuration with XML-formatted files. The specification of the configuration files and model schema are detailed below. The consumers provide for some configurability of the output but users will often provide additional post-processing to render the output according to their needs. A variety of mechanisms exist for transmitting data but users may need to provide their own preprocessing to accommodate their particular data source.

In addition to the producer and consumer applications, Augustus is conceptually structured and provided with libraries which are relevant to the development and use of Predictive Models. Broadly speaking, these consist of components that address the use of PMML and components that are specific to Augustus.

Post Processing

Augustus can accommodate a post-processing step. While not necessary, it is often useful to

  • Re-normalize the scoring results or perform an additional transformation.
  • Supplement the results with global meta-data such as timestamps.
  • Format the results.
  • Select certain interesting values from the results.
  • Restructure the data for use with other applications.
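
As a rough illustration of the post-processing steps listed above, the sketch below takes a list of scoring results and re-normalizes, timestamps, filters and reformats them. The shape of the input (`segment` and `score` keys), the min-max normalization and the CSV layout are assumptions made for the example; they are not the Augustus output format.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical raw scoring results; the real consumer output will differ.
RAW = [
    {"segment": "east", "score": 2.0},
    {"segment": "west", "score": 8.0},
    {"segment": "north", "score": 5.0},
]


def post_process(results, min_normalized=0.5, now=None):
    """Re-normalize scores to [0, 1], add a timestamp, keep interesting rows."""
    if not results:
        return []
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    lo = min(r["score"] for r in results)
    hi = max(r["score"] for r in results)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all scores match
    rows = []
    for r in results:
        normalized = (r["score"] - lo) / span
        if normalized < min_normalized:
            continue  # select only the values of interest
        rows.append({"segment": r["segment"],
                     "normalized": round(normalized, 3),
                     "scored_at": stamp})
    return rows


def to_csv(rows):
    """Restructure the rows as CSV text for use with other applications."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer,
                            fieldnames=["segment", "normalized", "scored_at"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


ROWS = post_process(RAW, now=datetime(2012, 1, 1, tzinfo=timezone.utc))
print(to_csv(ROWS))
```

Passing `now` explicitly keeps the output reproducible; leaving it out stamps the rows with the current UTC time.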