I have been exploring MDS and DQS over the past few days, and I'm a bit frustrated by how little information is out there on the two products. SSIS, SSRS, and SSAS have thousand-page books dedicated to them, but when I look for reading material on MDS or DQS, all I can find are a few small books and some free PDFs. I'm working on an MDM project right now and could use some input or direction towards some heftier resources.
I think I can use MDS and DQS to do everything I want to do but I need someone to check my thought process and perhaps provide a better idea if you have one. I need to:
1. Ingest data from several different systems and integrate it into one unified customer model.
2. Take that data and identify both obvious and subtle duplicates.
3. Remove the duplicates and let the new records flow through to the rest of the data warehouse pipeline.
4. For the subtle cases where we’re not entirely certain whether a record is a duplicate, put a human in the loop to review the records that MIGHT be duplicates.
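To make the four steps concrete, here is a rough Python sketch of the routing I'm picturing. It assumes the matching step hands back a similarity score for each record. The `MatchResult` type, the `route` function, and both thresholds are made up for illustration; they are not part of DQS or MDS.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical cut-offs for illustration only; not DQS defaults.
AUTO_MERGE_MIN = 0.95  # at or above this, treat as an obvious duplicate
REVIEW_MIN = 0.80      # between the two cut-offs, send to a human


@dataclass
class MatchResult:
    record_id: str
    matched_id: Optional[str]  # None when no candidate match was found
    score: float               # similarity in the range 0.0 to 1.0


def route(results: List[MatchResult]) -> Dict[str, List[MatchResult]]:
    """Split match results into merge, review, and pass-through buckets."""
    buckets: Dict[str, List[MatchResult]] = {"merge": [], "review": [], "pass": []}
    for r in results:
        if r.matched_id is None or r.score < REVIEW_MIN:
            buckets["pass"].append(r)    # new record, flows on to the warehouse
        elif r.score >= AUTO_MERGE_MIN:
            buckets["merge"].append(r)   # obvious duplicate, removed automatically
        else:
            buckets["review"].append(r)  # uncertain, goes to the SME queue
    return buckets
```

The point is that only the `review` bucket would ever be shown to the SME, which is the part I don't know how to build with these tools.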
The fourth step is the one I’m not sure how to execute. I know I create my customer model in MDS and then use it as the source of customer master data for loading into the warehouse. I know that to de-duplicate that data, I create a DQS knowledge base and run the data against it. I then get back a result set containing the results of the matching and de-duplication, and I can forward the good records on.
What I don’t get is how to create something where a business SME can look at the records and see only the ones in question. I need them to fix just those, rather than having to deal with a spreadsheet of a million records. I’ve seen a lot of examples of people working with Excel, but every example involves only a few dozen records. That doesn’t seem very scalable.