Upcoming and Up and Coming Topics

It’s funny the different meanings words have when you put them in different order, a point which anyone who has imitated the dialectic of Yoda can tell you. I find words fascinating as they are not static but have meanings which change over time. For example the Iron Maiden meant something totally different before there were electric guitars. Thinking of works and things changing, as one year closes and another year begins, I start to evaluate past and future topics. Earlier this year, I held an informal poll on twitter to find out how long people tend to talk on the same topic. The answers were quite varied. Some people keep on talking about the same topic as long as there seems to be interest in hearing about it. That way you can get to be a really good speaker on that topic. Another feels obligated to create a new topic each time out to provide him a challenge. The answer that personally I related to, was keep on talking about the topic until you are tired of hearing about it, which takes about a year.

SQL Saturday Albuquerque

sqlsat358_ABQMy first upcoming engagement for 2015 will be as SQL Server Albuquerque where I will be talking about SSIS. I generally talk about things I am interested in or presently working on, and having working on a lot of ETL recently, I thought that it would be an interesting topic which I think most people would find helpful. As a consultant, I see a lot of code and wonder why parts of it were written that way. One big reason is someone thought the design was a good one. Since that is an objective decision, I thought it might be helpful to clarify design decisions with facts so that that people would be able to employ good logic for their design decisions.

Technology changes and their Impact on Data Development

Another topic which really interests me is the changes that new technologies are having on the database world. With the increased implementation of Hadoop and cloud things are really changing in the way data is being both stored and used. Predictive Analytics, Machine Learning, Cloud implementations, Interactive Data visualizations are changing what people are expecting from the way their data is stored and used. Expectations for data professionals are increasing as the business is looking away from HIPPO and towards the knowledge that they have gathered or integrated data from public sources.

Modern Data Warehouse

I have the pleasure of assisting in a day-long session to talk about Architecting the Modern Data Warehouse . During this one day session we will be showing how to use new technology such as HD Insight and Machine Learning to implement a modern data warehouse. Instead of just talking about new technologies we will be putting them to use to show how they can be used today. I’m really looking forward to it.

If you are able to attend any of these or any upcoming sessions, please stop by and introduce yourself as I would love to meet readers of my blog in person.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

When the Process matters nearly as much as Results

There are times when the results aren’t nearly as important as the process of doing as people learn things from figuring out what not to do. This applies if you are troubleshooting why your project isn’t working right or when your database crashes in the middle of the night. Knowing what not to waste your time with is the key to getting things back up and running. It’s the knowledge that you have gained doing the little things that add up to being the person who can fix things quickly. Often times, the little things add up to something big. It appears this has happened to me.

Finalist in Tribal Awards

I was most honored to be included in SQLServerCentral/Simple Talk Tribal Awards Finalist in the Best New Community Voice category. To be recognized among all the other people who speak and write about SQL Server is quite an honor. The only way that this could have happened is a myriad of people that I’ve met while speaking at various community events and reading this blog remembered my name when looking at a blank line on the Tribal Award. I was so surprised when I saw that my name was listed on the award I nearly fell out of my chair. To everyone who thought enough of me to enter my name in the Best New Community Voice thank you. I really appreciate it.

Nomination and Winning

Regardless of the outcome, especially given the other nominees in the category, I feel that I already won.  Being a finalist is a real honor. After all there are so many other people who are also really involved in the community. I’ve been fortunate to visit with many of you who I’ve met either online or at various events. Thank you so much for to the people who thought of me, as I really appreciate your kind thoughts.  This process mattered as much as the result, what ever it turns out to be.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Presentation Follow up to Data Quality Issues? There is a Service for that

PASSBIThank you to all of you who were able to attend my latest presentation. I received a number of great questions during the presentation and wanted to make sure that I was able to provide answers. I also wanted to say thank you for the people who provided feedback in the question and answer section as that was very thoughtful and most appreciated.

Questions and Answers

I assume DQS is included with the existing SQL Server license? Is DQS included with enterprise edition? Can you provide some idea of what the licensing costs are? What are the licensing cost for Master Data Services?

Because I don’t work for Microsoft, I hesitate to answer any but the most basic questions regarding licensing, but I can say DQS and MDS are included in the licensing costs for SQL Server. Neither are available in the standard edition either. For more information about what is included, see here http://technet.microsoft.com/en-us/library/cc645993#Other_Components or contact Microsoft directly.

Do you know of any testing of scalability that would provide some insight into the scalability of DQS?

The best place I can point you to for more information on DQS scalability is here http://www.microsoft.com/en-us/download/details.aspx?id=29075

It seems that DQS appears similar to Fuzzy Lookups and Fuzzy Grouping. Is that pretty much what DQS is using?

While the logic employed in the SSIS components Fuzzy Lookup and Fuzzy Grouping are similar, but not functionally equivalent. DQS uses logic much closer to that of Fuzzy Grouping, as both apply a rules to a set of data and set similarity thresholds to determine success. Where they differ is in DQS’ use and development of the knowledge base to continually improve the accuracy.

How do you see DQS and MDS working together? What gets done where? How does DQS integrate with MDS? Do you use MDS to fix the data?

One of Microsoft’s definition of Master Data Management is “we define Master Data Management (MDM) as the technology, tools, and processes required to create and maintain consistent and accurate lists of master data.” MDS contains a framework for identifying and implementing workflows, security, business rules and auditing to the business data. DQS is part of that process that can be used by the MDS product to accomplish these tasks. To best illustrate how they work together, I’d recommend downloading their DQS and MDS tutorial. At a high level MDS enforces rules for cleaning the data and DQS can be implemented as a part of that process. For example before adding data to MDS, you can use DQS to clean the data prior to evaluation, but it’s use isn’t required.

In your SSIS demo, what additional steps would you recommend to have the data steward process the invalid data so that it can be reprocessed under SSIS?

Ideally, when data fails a quality process, the data should be placed in a review table so that it can be reviewed by the data steward. Having a person resolve the issues and add them to the knowledgebase will continually improve the knowledgebase over time.

When building a knowledgebase within DQS, can you include multiple columns for evaluation within one table?

DQS is not limited to evaluating one column at a time. You can use combinations of columns within a table to determine what the correct values are.

 

If you missed my session, a recording is available here , so please feel free to review the video when you have a chance.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

DQS – DIY Guide to Getting Started with Data Quality Services

SQLServerDBListPower BI and SQL Server 2012 (and beyond) both have components Microsoft developed to shift the focus of Business Intelligence onto business users. Since there are more people who use Excel than have eaten at McDonald’s, you may know about Power BI and the data features in Excel, but have you heard of Data Quality Services [DQS]? If not, well now you have. To get started, if you don’t have Microsoft’s developer version of SQL Server 2012 or later, or access to an enterprise or BI server somewhere, you will need $59.95 to get a copy of the Developer Edition SQL Server 2014. Interestingly Microsoft won’t let you buy the Developer Edition SQL Server 2012, which I found out when tried earlier in the year. Unfortunately, although Visual Studio is now free, you still have to pay for the Developer Edition of SQL Server.

After you have installed SQL Server and selected that you wanted Data Quality Services, one would assume you had installed it. While that sounds like sterling logic, it is not correct. Here’s how to tell if you have DQS. Open up SSMS on your computer and look at the list of databases like I did here. Do you see any databases here which start with DQS? No. That is because it hasn’t been installed yet. It sure looks like it is if you look at the sql installer, which I have included below. I added the red boxes to highlight the fact that I really did select Data Quality Services when I installed.

If you don’t have the Data Quality Services and Data Quality Client installed in SQL Server like they are here, you will need to do that first, but this is only the first step. Once the install screen looks like the one pictured above, you need to go to the Data Quality Services folder in SQL Server and select the SQL Server Data Quality Server Installer. After this package is run, which takes a while, you will finally get a screen that lets you know the installation is finally completed.

DQSInstallSuccessfulScreenAfter DQS Server has installed, you will see that 3 databases have been added: DQS_Main, DQS_Projects and DQS_Staging_Data. Once these three databases are installed, you can then start using the DQS Client.

DQSDBList

The DQS client does not need to be installed on a server. Since I highly doubt most places will want their business users to be directly accessing their Server, most of the time it will not be installed on the server.

Once your environment is set up, it’s time to start using it. For more information on how to use DQS, please listen to my presentation on the PASS BI virtual chapter on November26. If you can’t make it, generally speaking it will be available on PASS BI’s You Tube Channel after about a week. I sincerely hope you can make it. Let me know what you think of my presentation by posting feeback to my blog.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

 

Introduction to Hadoop Presentation Follow-up

Thank you so much for everyone who was able to attend my webinar http://pragmaticworks.com/Training/FreeTraining/ViewWebinar/WebinarID/676 . (If you weren’t able to attend, you can always click on the link for a recording)

It’s always hard to talk about Hadoop as the subject is so broad that there were a lot of things that I had to leave out, so it is fortunate that I have this blog to discuss the topics I wasn’t able to cover. I thought that I would take this time to respond to the questions I received.

Presentation Q & A

Do you need to Learn Java in order to develop with Hadoop?

No. If you wish to develop Hadoop in the cloud with HD Insight, you have the option of developing with .net. If you are working in the Linux environments, which is where a lot of Hadoop is being developed, you will need to learn Java.

Do you know of any courses or sessions available where you can learn about Big Data or Hadoop?

My friend Josh Luedeman is going to be teaching an online class on Big Data next year.  If you don’t want to wait that long I recommend checking out a code camp in your area, such as Desert Code Camp where they are offering courses in Azure,  or SQL Saturday, especially the BI editions

How do you recommend a person with a BI background in SQL get started in learning Hadoop and where can I get the VMs?

The two ways I recommend for a person with a BI background to get involved with Hadoop is either through a Hortonworks VM or in the Microsoft’s Azure cloud with HD Insight.  Hortonworks provides a VM and Microsoft’s environment is hosted on their cloud. As the company that Microsoft partnered with to develop their Hadoop offerings, Hortonworks has very good documentation targeted to people who have more of a Microsoft BI stack background.  If you chose to go with HD Insight, there is a lot of really good documentation and video training available as well.

How do you compare Hadoop with the PDW?

While both Hadoop and Microsoft’s PDW, which they now call APS, were both designed to handle big data, but the approaches are wildly different. Microsoft built the APS to handle the larger data requirements of people who have structured data, mostly housed in SQL Server.  Hadoop was developed in an open source environment to handle unstructured data.

How can I transfer data into HD Insight?

This is a great question, which I promise to devote an entire blog post to very soon. I’ll give you the Reader’s Digest version here.  There are a number of ways you can transfer data into HD Insight.  The first step is to transfer the data into the Azure cloud, which you can do via SSIS, with a minor modification of the process I blogged about earlier here.  The other methods you could use to transfer data are via secured FTP or by using Powershell.  You will need to call the REST API which you use to provision an HDInsight Cluster.  There is also a UI you can use within HDInsight to transfer data as well.

I really appreciate the interest in the Webinar.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Powerful Stuff at SQL Saturday 331 – Denver

SQLSaturday 

This will be the fourth SQL Saturday this year that I have honored enough to have been selected as a presenter. I know it is an honor, because I’ve also been turned down, which makes me work harder and keeps me humble.  If you decided today that hanging out at the University of Denver next Saturday is something you want to do, you are out of luck. There is now a waitlist to attend as SQL Saturday Denver is filled to capacity.  It is really awesome to think that they have so many people interested in learning about SQL Server that they are going to have to turn some away. It says something about the great job that Steve Wake, Windy Martin and friends are doing in publicizing this event too.

Power BI

I am going to presenting on Power BI. As a lot of people haven’t yet seen it in action, I will be demonstrating the features of it and explaining the rather complicated licensing issues around Power BI.  Power BI is mostly Excel, and I’ll be explaining what you can do in Excel 2013 without having to buy anything and if you do decide to buy PowerBI what you get.  Because there are some things which won’t really fit in my presentation, I’ll be posting a few things here on Power BI as well.

SQL Community

I don’t get out as much as I would like, and I certainly don’t get to visit the places where a lot of people who read my blog are, like Brazil. If you are attending SQL Saturday Denver please come by and introduce yourself.  I will give you 10,000 reasons to come to my presentation, none of which I am going to say here as it will ruin the surprise.  I am looking forward to hearing presentations from the other speakers too as there are some great topics being covered. I hope to see you there.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Where to go to be in the Know

At the last SQL Server User’s group meeting, I got into a long conversation with someone about where to go to find good training materials online for people at various skill levels. Contrary to popular belief, I do spend time on the internet not on twitter or shoe shopping, and have come up with a list of sites where I think the training material is quite good, and also free. I included a lot of video content, as a lot people find that easy to use for learning new things.

This is not meant to be an exclusive list, just the places I’ve found helpful most recently or have found myself visiting a lot. If you have some suggestions of places you think I should add, please send them to me.

Big Data

MSBIAcademy – This is a really good way to get up to speed on Big Data and other topics.

Apache – If you are interested in Hadoop, you will make your way here to get the latest releases and see what wacky name the latest new tools has.

Hortonworks – I found the training information on Hortonwork’s site to be very good at explaining things.

Learning Map for HDInsight and Azure – SQL Server data is moving to the cloud too and this is a great place to get up to speed.

Mostly SQL Server

A lot of these sites have crossover information too.

SQL University – This site has information for those people just getting started with the Microsoft stack to more advanced topics.

Pragmatic Works Training on the Ts – Every week on Tuesdays and Thursdays, Pragmatic Works provides free webinars on SQL Server and Big data topics with some of the people who wrote the book, ok lots of SQL books about all kinds of SQL and Big Data Stuff. Flip through the archives if you are working during the day and don’t have time to view them live.

Microsoft Virtual Academy – This is Microsoft’s site where they offer free training and you get points. I didn’t know that I wanted points before, but I do now.

Microsoft’s SSIS Tech Net Videos – The audio on these is often very lousy, but the content is pretty good. I am not sure how often these are updated, but you can find good best practice material here.

Ola Hallengren’s Site – At any time you have anyone calling you a DBA, you should know about this site.

CBT Nuggets on YouTube – If you can find anything on YouTube from CBT Nuggets, it probably won’t be a waste of time. This link is for information on SSIS. Be wary of some of the things posted on YouTube, as not everything there is correct or best practices and the quality can be marginal.

Channel Nine – Microsoft has some random-ish videos out here, some of which are really helpful

SQLServerCentral – This is a great resource. Go create an account here as it is free and there is a treasure trove of information.

SQLPass – Last but certainly not least, check out all of the information archived on SQL Pass. They have a lot of virtual groups on a wide variety of SQL related topics. If you can’t attend when they are being held, the videos are available for later viewing on the website. The previous PASS Summit information is awesome. They also have a YouTube channel as well, where you can find interesting things to watch.

SQL Saturday

All of the other stuff I mentioned is archived and available when you have a chance, but there is nothing like being able to ask resident experts about various stuff and network with other SQL Server people. Check out SQL Saturday to see when and where there is going to be an event near you. These events have gone worldwide, so it is very likely there will be an event near you sometime this year.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur