sql server - Data Warehouse design - Handling NULL and empty values in the OLTP -


i creating dw oltp creaking somewhat.

a problem i'm faced there isn't data integrity in oltp database. example suburb field.

this suburb field free text field on oltp ui means we've got values in field, plus we've got empty strings , we've got null values.

how handle this? scenarios i've come are:

  1. import data (not ideal)
  2. in etl process, treat empty string same null , replace word 'unknown' in dw
  3. import both empty strings , null's empty strings in dw

just fyi, i'm using microsoft bi stack (sql server, ssis, ssas, ssrs)

the short answer is, depends on null , empty strings mean in source system.

this general question (handling null) has been discussed lot, e.g. here, here, here etc. think important point remember data warehouse database; may have specific type of schema , designed 1 purpose, it's still database , general advice on null still applies.

(as side note, prefer talk "reporting database" rather "data warehouse", because keeps things in perspective. dbas , developers start making plans huge server farms , multi-year etl projects hear words "data warehouse", in end it's reporting database.)

anyway, isn't clear want use null looks may attribute on dimension.

i (probably) wouldn't use of 3 approaches, depends on meaning of data. importing data as-is not useful because part of value of data warehouse data has been cleaned , consistent, makes querying , comparing data along other dimensions easier.

replacing empty strings 'unknown' may or may not correct: empty string mean in source system? there's big difference between "it means there's no suburb" , "it means don't know if there's suburb". assuming empty string means "no suburb" , null means "unknown" import empty strings are, replace null 'unknown'. main reason doing if suburb field used filter condition in report, it's easier users (and possibly reporting tool) work non-null value 'unknown'. , if there no consistency in source system , don't know empty strings , nulls mean, need clarify first , ideally fix source system (another benefit of dwh helps identify inconsistencies , data handling errors in source systems).

your last idea convert nulls empty strings same issue: null mean in source system? if means "no suburb" replacing empty string idea, if means else should handle else.

so summarize, preference import empty strings as-is, , convert null 'unknown', can't sure makes sense in case. there's no single answer question because depends on specific data , means. there's no problem using null in data warehouse (or other database) long consistently , clear understanding of how source systems handle data.


Comments

Popular posts from this blog

Why does Ruby on Rails generate add a blank line to the end of a file? -

keyboard - Smiles and long press feature in Android -

node.js - Bad Request - node js ajax post -