Posts

Showing posts with the label data warehousing

How to Achieve Data Quality?

Like any worthwhile business endeavor, improving the quality and utility of your data is a multi-step, multi-method process. Here's how: Method 1: Big data scripting takes a huge volume of data and uses a scripting language that can communicate and combine with other existing languages to clean and process the data for analysis. Errors in judgment and execution can trip up the whole process. Method 2: Traditional ETL (extract, transform, load) tools integrate data from various sources and load it into a data warehouse, where it's then prepped for analysis. But this usually requires a team of skilled, in-house data scientists to manually scrub the data first to resolve any incompatibilities in schemas and formats. Even less convenient, these tools often process data in batches instead of in real time. Traditional ETL requires the type of infrastructure, on-site expertise, and time c...
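
To make the ETL flow from Method 2 a little more concrete, here is a minimal Python sketch of a single batch run. Everything in it is an assumption made for illustration: the two in-memory sources, their field names, the `orders` table, and the helper names `extract`, `transform`, `load`, and `run_etl` are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical source A: a CSV export with US-style dates and string amounts.
CSV_SOURCE = """customer,order_date,amount
Alice,01/15/2024,19.99
Bob,02/03/2024,5.00
"""

# Hypothetical source B: records with different field names and ISO dates.
DICT_SOURCE = [
    {"name": "Carol", "date": "2024-03-10", "total": 42},
]


def extract():
    """Read raw rows from both (assumed) sources without changing them."""
    rows_a = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    return rows_a, DICT_SOURCE


def transform(rows_a, rows_b):
    """Reconcile the two schemas into (customer, ISO date, float amount)."""
    unified = []
    for r in rows_a:
        iso_date = datetime.strptime(r["order_date"], "%m/%d/%Y").date().isoformat()
        unified.append((r["customer"].strip(), iso_date, float(r["amount"])))
    for r in rows_b:
        unified.append((r["name"].strip(), r["date"], float(r["total"])))
    return unified


def load(rows, conn):
    """Write the unified batch into the stand-in warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(customer TEXT, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run one batch: extract, then transform, then load."""
    conn = sqlite3.connect(":memory:")
    load(transform(*extract()), conn)
    return conn
```

The transform step is where the schema and format mismatches mentioned above get resolved, here by renaming fields and normalizing dates and amounts. Because `run_etl` handles one whole batch per call, it also reflects the batch-oriented (rather than real-time) behavior the excerpt describes.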

What is Data Quality in Big Data?

Data quality practices from BI and data warehousing are geared toward cleansing the data used for reporting, with the aim of improving its correctness and integrity. Correctness is difficult to determine when the data comes from external sources, and structural integrity can be difficult to test with unstructured and differently structured (non-relational) data. As the volume, sources, and velocity of data creation increase, businesses are grappling with what to do with it all and how to do it. And if your business hasn't determined the most effective way to use its own data, then you're missing out on critical opportunities to transform your business and gain a significant advantage. Of course, without good data, it's a heck of a lot harder to do what you want to do. Whether you're launching a new product or service, or simply responding to the moves of your biggest competitor, making smart, timely business decisions depends almost entirely on...