Interview: Karen Lopez, Data Modeling Expert


Here is an interview with Karen Lopez, who has worked in data modeling for almost three decades and is a renowned expert in data management.

Data professionals need to know about the data domain in addition to the data structure domain – Karen Lopez

Ajay- Describe your career in science. How would you persuade younger students to take more science courses?

Karen- I’ve always had an interest in science and I attribute that to the great science teachers I had. I studied information systems at Purdue University through a unique program that focuses on systems analysis and computer technologies. I’m one of the few who studied data and process modeling in an undergraduate program 25+ years ago.

I believe that it is very important that we find a way of attracting more scientists to teach. In both the natural and computer sciences, it’s difficult for institutions to tempt scientists away from professional positions that offer much greater compensation. So I support programs that find ways to make that happen.

Ajay- If you had to give a young person starting their career in BI just three points of advice, what would they be?

Karen- Wow. It’s tough to think of just three things, but these are recommendations that I make often:

- Remember that every design decision should be made based on cost, benefit, and risk. If you can’t clearly describe these for every side of a decision, then you aren’t doing design; you are guessing.

- No one besides you is responsible for advancing your skills and keeping an eye on emerging practices. Don’t expect your employer to lay out a career plan that is in your best interest. That’s not their job. Data professionals need to know about the data domain in addition to the data structure domain. The best database or data warehouse design in the world is worse than useless if the way the data is processed is wrong. Remember to expand your knowledge about data, not just the data structures and tools.

- All real-world work involves collaboration and negotiation. There is no one right answer that works for every situation. Building your skills in these areas will pay off significantly.

Ajay- What do you think is the best way for a technical consultant and a client to get on the same page regarding requirements? Which methodologies or templates have you used, and which have given you the most success?

Karen- While I’m a huge fan of modeling (data modeling and other modeling), I still think that giving clients a prototype or mockup of something that looks real to them goes a long way. We need to build tools and competencies to develop these prototypes quickly. It’s a lost art in the data world.

Ajay- What special incentives make Canada a great place for tech entrepreneurs, rather than, say, the United States? (Disclaimer: I have family in Canada and study in the US.)

Karen- I prefer not to think of this as an either-or decision. I immigrated to Canada from the US about 15 years ago, but most of our business is outside of Canada. I have enjoyed special incentives here in Canada for small businesses as well as special programs that allowed me to work in Canada as a technical professional before I moved here permanently.

Overall, I have found Canadian employers more open to sponsoring foreign workers and it is easier for them to do so than what my US clients experience. Having said that, a significant portion of my work over the last few years has been on global projects where we leverage online collaboration tools to meet our goals. The advent of these tools has made it much easier to work from wherever I am and to work with others regardless of their visa statuses.

Where a company forms is less tied to where one lives or works these days.

Ajay- Could you tell us more about the Zachman Framework (apart from the Wikipedia reference)? A practical example of how you used it on an actual project would be great.

Karen- Of course the best resource for finding out about the Zachman framework is from John Zachman himself http://www.zachmaninternational.com/index.php/home-article/13 . He offers some excellent courses and does a great deal of public speaking at government and DAMA events. I highly recommend anyone interested in the Framework to hear about it directly from him.

There are many misunderstandings about John’s intent, such as the myth that he requires big upfront modeling (he doesn’t), that the Framework is a methodology (it isn’t), or that it can only be used to build computer systems (it can be used for more than that).

I have used the Zachman Framework to develop a joint Business-IT Strategic Information Systems Plan as well as to inventory and track progress of multi-project programs. One interesting use was a paper I authored for the Canadian Information Processing Society (CIPS) on how various educational programs, specializations, and certifications map to the Zachman Framework. I later developed a presentation about this mapping for a Zachman conference.

For a specific project, the Zachman Framework allows the business to understand where its enterprise assets are being managed, and how well they are managed. It’s not an IT thing; it’s an enterprise architecture thing.

Ajay- What does Karen Lopez do for fun when not working, traveling, speaking, or blogging?

Karen- Sometimes it seems that’s all I do. I enjoy volunteering for IT-related organizations such as DAMA and CIPS. I participate in the accreditation of college and university educational programs in Canada and abroad. As a member of data-related standards bodies, namely the Association for Retail Technology Standards and the American Dental Association, I help develop industry standard data models. I’ve also been a spokesperson for a CIPS program to encourage girls to take more math and science courses throughout their student careers so that they may have access to great opportunities in the future.

I like to think of myself as a runner; last year I completed my first half marathon, which I never thought was possible. I am studying Hindi and Sanskrit. I’m also addicted to reading and am thankful that I actually get paid to do some of it.

Biography

Karen López is a Senior Project Manager at InfoAdvisors, Inc. Karen is a frequent speaker at DAMA conferences and DAMA Chapters. She has 20+ years of experience in project and data management on large, multi-project programs. Karen specializes in the practical application of data management principles. Karen is also the ListMistress and moderator of the InfoAdvisors Discussion Groups at www.infoadvisors.com. You can reach her at www.twitter.com/datachick

Interview: Jill Dyche, Baseline Consulting

Here is an interview with Jill Dyche, co-founder of Baseline Consulting and one of the best business intelligence consultants and analysts. Her writing is read by a huge portion of the industry and has influenced many paradigms. She is also the author of e-Data, The CRM Handbook, and Customer Data Integration: Reaching a Single Version of the Truth.

BI tools are not recommended when they’re the first topic in a BI discussion.

Jill Dyche, Baseline Consulting

Ajay- What approximate return on investment would you attribute to the various vendors within business intelligence?

Jill- You don’t kid around, do you, Ajay? In general, the answer has everything to do with the problem BI is solving for a company. For instance, we’re working on deploying operational BI at a retailer right now. This new program is giving people in the stores more power to make decisions about promotions and in-store events. The projected ROI is $300,000 per store per year, and the retailer has over 1000 stores. In another example, we’re working with an HMO client on a master data management project that helps it reconcile patient data across hospitals, clinics, pharmacies, and home health care. The ROI could be life-saving. So, as they say in the Visa commercials: Priceless.

Ajay- What impact do you think third-party cloud storage and processing will have on business intelligence consulting?

Jill- There’s a lot of buzz about cloud storage for BI, but most of it is coming from the VC community at this point, not from our clients. The trouble is that BI systems really need control over their storage. There are companies out there (check out a product called RainStor) that do BI storage in the cloud very well and are optimized for it. But most “cloud” environments geared to BI are really just hosted offerings that provide clients with infrastructure and processing resources they don’t have in-house. Where the cloud really has benefits is when it provides significant processing power to companies that can’t easily build it themselves.

Ajay- What top writing tips would you give to young, struggling business bloggers, especially in this recession?

Jill- I’d advise bloggers to write like they talk, a standard admonishment by many a professor of business writing. So much of today’s business writing, especially in blogs, is stilted, overly formal, and pedantic. I don’t care if your grammar is accurate; if your writing sounds like the Monroe Doctrine, no one will read it. (Just give me one quote from the Monroe Doctrine. See what I mean?) Don’t use the word “leverage” when you can use the word “use.” Be genuine and conversational. And avoid clichés like the plague.

Ajay- How would you convince young people, especially women, to pursue science careers? Describe your own career journey.

Jill- As much as we need those role models in science, high-tech, and math careers, I’d tell them to only embrace it if they really love it. My career path to high-tech was unconventional and unintentional. I started as a technical writer specializing in relational databases just as they were getting hot. One thing I know for sure is if you want to learn about something interesting, be willing to roll up your sleeves and work with it. My technical writing about databases, and then data warehouses, led to some pretty interesting client work.

Sure, I’ve coded SQL in my career, and optimized some pretty hairy WHERE clauses. But the bigger issue is applying that work to business problems. Actually, I’m grateful that I wasn’t a very good programmer. I’d still be waiting for that infinite loop to finish running.

Ajay- What are the areas within an enterprise where implementing BI leads to the most gains? And when are BI tools not recommended?

Jill- The best opportunities for BI are for supporting business growth. And that typically means BI used by sales and marketing. Who’s the next customer and what will they buy? It’s answers to questions like these that can set a company apart competitively and contribute to both the top and bottom lines.

Not to be too heretical, but to answer your second question: BI tools are not recommended when they’re the first topic in a BI discussion. We’ve had several “Don’t go into the light” conversations with clients lately where they are prematurely looking at BI tools rather than examining their overall BI readiness. Companies need to be honest about their development processes, existing skill sets, and their data and platform infrastructures before they start phoning up data visualization vendors. Unfortunately, many people engage BI software vendors way before they’re ready.

Ajay- You and your partner Evan wrote what was really the first book on Master Data Management. But you’d been in the BI and data warehousing world before that. Why MDM?

Jill- We just kept watching what our clients couldn’t pull off with their data warehouses. We saw the effort they were going through to enforce business rules through ETL, and what they were trying to do to match records across different source systems. We also saw the amount of manual effort that went into things like handling survivor records, which leads to a series of conversations about data ownership.
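To make the matching and survivorship problem concrete, here is a minimal Python sketch. It is not Baseline’s method or any MDM product’s: the match rule (same email after trimming and lowercasing) and the survivorship rule (the most recently updated record wins) are hypothetical simplifications, and the field names are illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CustomerRecord:
    source: str    # originating system, e.g. "billing" (hypothetical)
    name: str
    email: str
    updated: date  # last time the source system touched this record


def match_key(rec: CustomerRecord) -> str:
    """Hypothetical match rule: same normalized email means same customer."""
    return rec.email.strip().lower()


def build_golden_records(records: list[CustomerRecord]) -> dict[str, CustomerRecord]:
    """Group matching records and keep one survivor per group.

    Hypothetical survivorship rule: the most recently updated record wins.
    """
    groups: dict[str, list[CustomerRecord]] = {}
    for rec in records:
        groups.setdefault(match_key(rec), []).append(rec)
    return {key: max(group, key=lambda r: r.updated) for key, group in groups.items()}
```

Real matching usually combines fuzzy name, address, and identifier comparisons, and survivorship often works attribute by attribute rather than picking a whole record; which rule applies is exactly the data ownership conversation Jill describes.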

Our book (Customer Data Integration: Reaching a Single Version of the Truth, Wiley) has as much to do with data management and data governance as it does with CDI and MDM. As Evan recently said in his presentation at the TDWI MDM Insight event, “You can’t master your data until you manage your data.” We really believe that, and our clients are starting to put it into practice too.

Ajay- Why did you and Evan choose to focus on customer master data (CDI) rather than a more general book on MDM?

Jill- There were two reasons. The first one was because other master data domains like product and location have their own unique sets of definitions and rules. Even though these domains also need MDM, they’re different and the details around implementing them and choosing vendor products to enable them are different. The second reason was that the vast majority of our clients started their MDM programs with customer data. One of Baseline’s longest legacies is enabling the proverbial “360-degree view” of customers. It’s what we knew.

Ajay- What’s surprised you most about your CDI/MDM clients?

Jill- The extent to which they use CDI and MDM as the context for bringing IT and the business closer together. You’d think BI would be ideal for that, and it is. But it’s interesting how MDM lets companies strip back a lot of the tool discussions and just focus on the raw conversations about definitions and rules for business data. Business people get why data is so important, and IT can help guide them in conversations about streamlining data quality and management. Companies like Dell have used MDM for nothing less than business alignment.

Ajay- Any plan to visit India and China for giving lectures?

Jill- I just turned down a trip to China this fall because I had a schedule conflict, which I’m really bummed about. As far as India is concerned, nothing yet, but if you’re looking for houseguests, let me know. (Ajay- Sure, I have a big, brand-new house just ready. And if I visit the USA, may I be a houseguest too?)

About Jill Dyche-

Jill blogs at http://www.jilldyche.com/, where she takes the perpetual challenge of business-IT alignment head on in her trenchant, irreverent style.

Jill Dyché is a partner and co-founder of Baseline Consulting. Her role at Baseline is a combination of best-practice expert, industry gadfly, key client advisor, and all-around thought leader. She is responsible for key client strategies and market analysis in the areas of data governance, business intelligence, master data management, and customer relationship management. Jill counsels boards of directors on the strategic importance of their information investments.

Author

Jill is the author of three books on the business value of IT. Jill’s first book, e-Data (Addison Wesley, 2000) has been published in eight languages. She is a contributor to Impossible Data Warehouse Situations: Solutions from the Experts (Addison Wesley, 2002), and her book, The CRM Handbook (Addison Wesley, 2002), is the bestseller on the topic.

Jill’s work has been featured in major publications such as Computerworld, Information Week, CIO Magazine, the Wall Street Journal, the Chicago Tribune and Newsweek.com. Jill’s latest book, Customer Data Integration (John Wiley and Sons, 2006) was co-authored with Baseline partner Evan Levy, and shows the business breakthroughs achieved with integrated customer data.

Industry Expert

Jill is a featured speaker at industry conferences, university programs, and vendor events. She serves as a judge for several IT best practice awards. She is a member of the Society of Information Management and Women in Technology, a faculty member of TDWI, and serves as a co-chair for the MDM Insight conference. Jill is a columnist for DM Review, and a blogger for BeyeNETWORK and Baseline Consulting.

Interview: Bruno Delahaye, KXEN

In my continuing coverage of KXEN, the plucky company that has managed to revolutionize analytics automation and social network analysis, here is an interview with KXEN’s Vice President Bruno Delahaye.


Ajay- What do you like best about KXEN, both as a company and as a product?

Bruno- Well, actually what I like most about KXEN is the will to make a difference. This is true at different levels, of course: each individual within the company is trying to make things happen. For employees at KXEN this is not just a job: they want to change the game! The product side naturally cascades from this. We are not simply recoding existing algorithms, as some of our competitors are doing; instead, we are looking in every domain of predictive and descriptive analytics for places where we can deliver higher value to our customers. When customers, thanks to the automation we provide, come back to us saying that they have managed to increase their modeling productivity by a factor of 10 or even 50 compared to their previous modeling process, we really think that what we provide is changing the game. Also, the fact that we have well over 500 customers globally today proves that our customers recognize this as well!

Ajay- For what areas has KXEN been most suitable? What has been your biggest success story so far?

Bruno- KXEN has been very successful with two types of customers. The first is companies with mature data mining practices that have realized they need to move from a fully hand-crafted approach to a more industrialized one in order to meet business requirements. For example, many large companies run tens of marketing campaigns per month but use data mining for only one or two of them at best… Once organizations have understood the power of data mining, they certainly want to target every campaign, and only KXEN can provide the level of automation required for this. The second is new data mining users (either new companies or new departments within a company), who are also very eager to use KXEN. The learning curve with KXEN is so short that they can use their existing team (the people who understand the business issues) and, within a few days, have successful churn management programs running or rebuild their customer segmentation reliably.

If you were expecting figures here, some Vodafone entities claim that they reduced churn in some customer segments by more than 10% by implementing KXEN. Unicredit in Austria mentioned that, thanks to KXEN, they gained an additional €50m per season… As you can guess, the success of our customers always brightens our days.

Ajay- In which areas would you rather not recommend KXEN? What other software would you recommend in those cases?

Bruno- Well, I would recommend using KXEN in every area, of course. Nevertheless, where we have been less successful so far is with companies where the time pressure to deliver analysis is lower. Basically, research departments tend to use software such as SAS EM or SPSS Clementine, which is oriented toward methods and algorithms rather than results.

Ajay- What is the biggest challenge you have faced while introducing KXEN to a wider audience?

Bruno- The biggest challenge is building domain expertise. It is very difficult to build our teams’ knowledge in Customer Lifecycle Analytics, HRM, SCM… all at the same time. That is why building a relationship of trust with the customer is so important: we have to prove to prospects very early in the discussions that KXEN will help them make significant steps forward. This is also where our partners are so important to us. KXEN works with international as well as local partners with specific expertise to help our customers make the best possible use of the KXEN data mining software and ensure a high and fast ROI.

Ajay- Do you think text mining, as well as the data fusion approach, can work for online web analytics, search engines, or ad targeting?

Bruno- The data fusion approach certainly makes sense for online web analytics. Analyzing the sequence of events, rather than just whether an event occurs, is a very powerful way to predict customer behavior, or in this case the next click or next action that is going to be made. I am not claiming that everything has to be done in real time, as this could lead to weak or even unreliable, unstable models. Instead, we recommend that our customers split the learning part, which can be done offline, from the deployment, which needs to be done in real time.
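Bruno’s two points, modeling the sequence of events and separating offline learning from real-time deployment, can be illustrated with a deliberately simple model. The Python sketch below is not KXEN’s algorithm: it swaps in a first-order Markov model of clicks, learned in batch from historical sessions, and reduces real-time deployment to a dictionary lookup. The page names are hypothetical.

```python
from collections import Counter, defaultdict


def learn_offline(sessions):
    """Batch step: learn the most likely next page from historical click sequences.

    Each session is an ordered list of page names. We count page-to-page
    transitions (a first-order Markov model) and freeze the result into a
    plain dict mapping each page to its most frequent successor.
    """
    transitions = defaultdict(Counter)
    for clicks in sessions:
        for current, nxt in zip(clicks, clicks[1:]):
            transitions[current][nxt] += 1
    return {page: counts.most_common(1)[0][0] for page, counts in transitions.items()}


def predict_next(model, current_page, default=None):
    """Real-time step: a constant-time lookup; no learning happens here."""
    return model.get(current_page, default)
```

Because `predict_next` never learns, it stays fast and its behavior is stable between releases; the model changes only when `learn_offline` is rerun on a new batch of sessions.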

Ajay- Describe KXEN’s relationships with other members of the business intelligence community in terms of alliances.

Bruno- KXEN is a very good complement to BI vendors. We are actually partnering with several data warehouse vendors. For data warehouses, the equation is quite simple: they allow customers to structure and store the data, but to provide real ROI, solutions need to be plugged in on top of them. Setting up a data warehouse without using the stored data is just another cost. What KXEN does is enable customers to take advantage of the data asset to build customer segments that you will use to define your marketing mix, or simply to target your customers for cross-selling, up-selling, or retention/loyalty purposes. The same holds for credit scoring, fraud detection…

Ajay- Case study: assume I have 50,000 leads daily on a car-buying website. How would KXEN help me score them, compared with other online scoring solutions? Is it technically possible for me to install KXEN on Windows or on remote computing instances such as Amazon EC2, rather than on a server sitting somewhere?

Bruno- The key difference, I believe, is that with KXEN you will be able to do this even if you are not a data mining expert, even if you want to use the results of yesterday’s campaigns to rebuild a model, and even if you can only afford to spend 10 minutes on this task every day. At the end of the day, we allow our users to answer their business questions within the time frame they have, rather than trying to convince them that they do not really need to do so many analyses for their business to run successfully.
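The daily-rebuild scenario can be sketched with a very small scorer. This is an assumption-laden illustration, not how KXEN works: the “model” is just a per-segment conversion rate computed from yesterday’s leads and shrunk toward a prior (`prior_rate` and `prior_weight` are hypothetical tuning values), and the segment labels are made up.

```python
from collections import defaultdict


def rebuild_model(yesterday, prior_rate=0.05, prior_weight=20):
    """Rebuild a per-segment conversion-rate model from yesterday's leads.

    `yesterday` is an iterable of (segment, converted) pairs. Each segment's
    rate is shrunk toward `prior_rate`, as if `prior_weight` extra leads had
    converted at that rate, so small segments do not get extreme scores.
    """
    leads = defaultdict(int)
    conversions = defaultdict(int)
    for segment, converted in yesterday:
        leads[segment] += 1
        conversions[segment] += int(converted)
    return {
        seg: (conversions[seg] + prior_rate * prior_weight) / (leads[seg] + prior_weight)
        for seg in leads
    }


def score_leads(model, todays_segments, prior_rate=0.05):
    """Score today's leads; segments unseen yesterday fall back to the prior."""
    return [model.get(seg, prior_rate) for seg in todays_segments]
```

Rebuilding this over 50,000 rows is a single pass, so it fits easily into a few minutes a day; a real lead-scoring model would use many more attributes than a single segment label.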

Ajay- And that was Bruno, VP, EMEA, KXEN. His profile can be seen here:

http://www.linkedin.com/in/brunodelahaye

Bruno Delahaye manages KXEN’s operations for Continental Europe, the Middle East, Africa, and South America. He is responsible for identifying and managing key partnership opportunities and developing the overall strategy for new partnerships.

For more on KXEN, please go to www.kxen.com. You may need to register to download their proprietary white papers on Structural Risk Management or Text Mining.

Conflict of interest disclaimer: I am a social media consultant to KXEN. Chairman Roger Hadaad was one of the first chairmen of a major corporation to agree to give an interview to this small blog.

Edith on GT: A BI Solution for Advanced Data Mining

 

    About the Author- Edith Ohri heads a pioneering data mining company in Israel dedicated to the application of GT, a new data mining (DM) solution for unsupervised and complex data. She holds an MSc in Industrial & Management Engineering. She started researching data mining in the early ’80s and has continued ever since. She created a new model (GT) that enables larger and more complex data analysis, and in 2002 she began developing the GT software at SMU in Singapore. She is involved in several areas of implementation, such as BI, Quality Control, Bio-med, and Research. She manages a DM forum with the Israel Engineering Association and another with the Data Warehouse site (Israel). She is a member of and active participant in a number of DM forums, gives presentations, and writes articles.

December 31, 2008

GT data mining of NYSE companies – example

This is an example of data mining with GT, based on free web data from http://www.ics.uci.edu.

The purpose is to demonstrate the ability to create a coherent explanation of complex, partial, incomplete, non-representative, and unsupervised data. In this case the data is also restricted to a single point in time and excludes information regarding shares, which makes it particularly difficult to analyze.

Given: two sets of 1000 records each about companies on the New York Stock Exchange in 2000 (just before the dotcom bubble burst). The records include 22 attributes describing the company’s field, its state of investments, assets, liabilities, expenses, R&D, sales, profits, dividends, and other major elements from the Public Report Statement, except information regarding shares.

The method:

1. Define clusters based on just half of the data, find their characteristics and drivers, and draw conclusions about the phenomena they may represent.

2. Validate the results by projecting them onto the other half of the data. Once the stability of the conclusions is reaffirmed, the last part of the analysis follows.

3. Interpretation. This is usually done in collaboration with the client; in this example it is shown only in outline, to give a sense of it.
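The first two steps above, clustering one half and projecting the other half onto those clusters, can be sketched as follows. GT itself is proprietary and is not shown here; this Python sketch swaps in ordinary k-means purely to illustrate the split-half validation idea, and it assumes each record is a tuple of numeric attributes.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length float tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # New centroid = mean of its members; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


def cluster_shares(points, centroids):
    """Fraction of `points` assigned to each centroid."""
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    return [c / len(points) for c in counts]


def split_half_check(records, k, seed=0):
    """Step 1: cluster a random half. Step 2: project the other half onto those clusters.

    Returns the per-cluster shares in each half; similar shares suggest the
    clusters are stable rather than artifacts of one sample.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    train, holdout = shuffled[:half], shuffled[half:]
    centroids = kmeans(train, k, seed=seed)
    return cluster_shares(train, centroids), cluster_shares(holdout, centroids)
```

If the share of records falling into each cluster differs sharply between the two halves, the clusters are more likely to be sample artifacts than stable phenomena worth interpreting.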

General observation

The "heart" of the analytics is the automatic clustering. Here the pattern splits in two, with an exception subgroup between them:

1. Financially intensive industries, such as Banking, Financial Services, Energy, and Real Estate, along with an exception subgroup in which some financial companies have an extremely high sales profit margin (see the discussion of Fig. 4).

2. The rest of the industries: Business Services, Transportation, Communication, Technology at large, Raw Materials, and Health Care. See the cluster map in Fig. 1.


Fig. 1 Cluster map: strong relations among record clusters are marked in red-purple, and the absence of relations in light green. The map shows polarized patterns: the financial one (in the low top) and the rest. Next to the financial pattern there is a small exception subgroup, titled in red. Note that the Technology pattern is much more diverse.

Conclusions and explanations

After the data is clustered, patterns and characteristics become easier to spot, and their typical behavior is more noticeable. The following is a description of the clusters found with GT and an interpretation of their typical behavior.

1. False profitability – a warning sign

GT finds that some Technology companies "behave" like financial companies rather than like the rest of their own industry. This may be explained by the ease of raising money on the "heated" stock exchange of 2000, and the practical option this opened for companies to use the excess funds for financial activities. In such a case, the reported high profits may be a symptom of a dangerously inflated market rather than a sign of sound companies; and while the graphs showing profitability encourage investors to continue that practice, they are racing toward a dead end: the dotcom crisis.

2. Investments in losing companies – high risk

In the Technology cluster, there are companies that have substantial losses yet manage to attract massive investments. Their characteristics are: low levels of long term liabilities and long term assets, and a high level of preferred stocks. An unlikely negative relation (instead of a positive one) is found to exist in these companies between Total Assets and Net Income. See Fig.2.


Fig. 2 Technology companies: special behavior. In the red part there is an irrational phenomenon, where losing companies seem to attract investments.

3. Conglomerates with "Banks" traits – need to be looked into

At the margins of the Financial and Technology clusters, GT identifies a number of conglomerates, all of which show exceptional "behavior". Although they are mainly industrial companies, their patterns resemble those of the Energy and Financial companies.

Remark: knowing the characteristics of the exceptional behavior enables the analyst to "comb" the entire database with a straightforward query and find additional companies that may show a similar irregularity, for close-up study.
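The "straight query" in the remark above could look like the following Python and SQLite sketch. The table layout, column names, company names, and thresholds are all hypothetical; the filter simply encodes the Technology exception traits described earlier (losses, low long-term liabilities and assets, high preferred stock).

```python
import sqlite3

# Hypothetical table holding a few of the report attributes mentioned above.
SCHEMA = """
CREATE TABLE companies (
    name TEXT,
    industry TEXT,
    long_term_liabilities REAL,
    long_term_assets REAL,
    preferred_stock REAL,
    net_income REAL
)
"""

# Thresholds are supplied as named parameters; they are illustrative, not GT output.
EXCEPTION_QUERY = """
SELECT name
FROM companies
WHERE net_income < 0
  AND long_term_liabilities < :max_ltl
  AND long_term_assets < :max_lta
  AND preferred_stock > :min_pref
ORDER BY name
"""


def find_similar(conn, max_ltl, max_lta, min_pref):
    """Return names of companies matching the exception profile."""
    params = {"max_ltl": max_ltl, "max_lta": max_lta, "min_pref": min_pref}
    return [name for (name,) in conn.execute(EXCEPTION_QUERY, params)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO companies VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("AlphaNet", "Technology", 5.0, 10.0, 80.0, -12.0),
            ("SteelCo", "Raw Materials", 400.0, 900.0, 0.0, 35.0),
            ("BetaSoft", "Technology", 2.0, 8.0, 60.0, -3.0),
        ],
    )
    print(find_similar(conn, max_ltl=50, max_lta=100, min_pref=20))
    # ['AlphaNet', 'BetaSoft']
```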

Fig. 3 In statistics, the special pattern of Technology does not show; it is indistinguishable from the general pattern of behavior.

GT Second edition note

The upgraded version of GT sheds more light on the 2000 phenomenon and reveals, among other things, an interesting exceptional behavior in a few financial organizations, which apparently found a different way to make money… Their profits appear to have a much larger net value than those of other companies. See the chart below. The organizations are HSBC Holdings PLC, Chase Manhattan Corp., and Societe Generale Group. This may reflect the kind of practices that led, eight years later in 2008, to the "credit crunch".


Fig. 4 An exceptionally high sales profit ratio is observed in a subgroup of financial organizations, such as HSBC Holdings, Chase Manhattan, and Societe Generale.

Final words

GT produces a fresh view of complex unsupervised data. It can track down even minute and rare phenomena (3 out of 1000 companies) and give financial managers and analysts early signals about things to come: their patterns, spread, drivers, and key indicators. This example belongs to a series of applications in which GT has consistently turned ordinary data into new revelations about "what makes it tick".


© Edith Ohri

Procedureware Ltd. POB 16558 Tel-Aviv 61165

Tel: 972-3-5232164 edit@actcom.co.il

More Analytics in the Cloud

Here is a company, www.birst.com, that does exactly this: upload data, crunch it, and share it.

 


 

Other software includes CloudBase, released by www.business.com and available at http://cloudbase.sourceforge.net/

"CloudBase is a data warehouse system for Terabyte and Petabyte scale analytics. It is built on top of Map-Reduce architecture. The current code has been developed to Hadoop‘s map-reduce implementation. CloudBase allows you to query flat log files using ANSI SQL. It comes with JDBC driver so you can use any JDBC database manager application (e.g Squirrel) as front end. CloudBase is developed by Business.com and is released to open source community under GNU General Public License 2.0." 
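CloudBase’s description says it runs ANSI SQL on top of Hadoop’s map-reduce. As a rough, single-process illustration of that idea (not CloudBase code or its API), the Python sketch below shows how `SELECT status, COUNT(*) FROM logs GROUP BY status` decomposes into map, shuffle, and reduce phases. The log line format (date, path, status) is a hypothetical assumption.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit (status, 1) per log line; assumes 'date path status' lines."""
    fields = line.split()
    if len(fields) >= 3:  # skip malformed lines
        yield fields[2], 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: aggregate one key's values, here COUNT(*)."""
    return key, sum(values)


def count_by_status(lines):
    """Single-process equivalent of SELECT status, COUNT(*) FROM logs GROUP BY status."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle_phase(pairs).items())
```

In a real Hadoop job, the map and reduce functions run in parallel on many machines and the framework performs the shuffle; the per-record logic stays the same, which is what lets a SQL layer compile queries into such jobs.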

 

A third product is Vertica, which can be seen here: http://www.vertica.com/cloud

 


The benefits are "

Vertica Analytic Database for the Cloud is an on-demand version of Vertica’s blazingly fast, grid-enabled columnar database hosted on Amazon’s Elastic Compute Cloud. The pay-as-you-go offering enables companies to create large, high-performance analytic data marts without upfront data center costs and delays.

Built for the Cloud
Vertica is the only cloud-based analytic database with the following innovations, which enable it to manage terabytes of data faster and more reliably than any other cloud database:

  • “Scale-out” grid architecture – handles changing workloads as elastically as the cloud
  • Aggressive data compression – keeps storage costs low
  • Automatic K-Safety – provides replication, failover and recovery in the cloud

New Business Intelligence Possibilities
Vertica for the Cloud completely changes the economics of BI, making it possible to rapidly
initiate a much broader spectrum of analytic projects and businesses:

  • Ad-hoc and short-lived business analytic projects
  • New analytic Software as a Service (SaaS) businesses
  • Vertica Analytic Database proof of concept projects

Benefits of Vertica for the Cloud:

  • Fastest “Time to Terabyte” – Fully provisioned and ready for loading within minutes
  • Fastest performance – 100x to 1000x faster than other cloud databases
  • Runs 24×7 – Automatic K-Safety makes Vertica the only failure-resilient analytic cloud DBMS
  • Lowest startup cost – No upfront hardware, data center or admin overhead. Just pay for database usage until you’re done, then stop paying for it
  • Painless scalability – Scales seamlessly as data volume changes
  • Smallest footprint – Compresses data up to 90% to lower costs and improve performance
  • Proven platform – Hosted by Amazon, within their proven data center

"

 

But if you want to start experimenting directly with Amazon EC2, the costs are not very high. Remember, it is a pay-as-you-go system. For an analytics supplier looking to cut costs, the cloud computing paradigm seems the fastest way to do so.
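The pay-as-you-go arithmetic is simple enough to sketch. The hourly rate and hardware cost below are placeholders, not Amazon’s prices; check the instance-type and pricing pages for real figures.

```python
def pay_as_you_go_cost(hourly_rate, instances, hours):
    """Rental cost: you pay only for the instance-hours you actually use."""
    return hourly_rate * instances * hours


def breakeven_hours(upfront_cost, hourly_rate, instances):
    """Hours of use at which renting costs as much as buying hardware up front.

    Ignores the power, space, and admin costs of owned hardware, which would
    push the breakeven point further out in favor of renting.
    """
    return upfront_cost / (hourly_rate * instances)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    rate = 0.50   # hypothetical dollars per instance-hour
    cluster = 4   # instances in a short-lived analytics project
    print(pay_as_you_go_cost(rate, cluster, hours=10))  # 20.0
    print(breakeven_hours(2000.0, rate, cluster))       # 1000.0
```

For short-lived or bursty analytics projects, the total hours stay well below the breakeven point, which is the economic argument for the cloud made above.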

http://aws.amazon.com/ec2/instance-types/
