- Home
- Interview Questions
- Data Warehousing
ETL is abbreviation of extract, transform, and load. ETL is software that enables businesses to consolidate their disparate data while moving it from place to place, and it doesn't really matter that that data is in different forms or formats.
The data can come from any source.ETL is powerful enough to handle such data disparities.
First, the extract function reads data from a specified source database and extracts a desired subset of data. Next, the transform function works with the acquired data ‐ using rules or lookup tables, or creating combinations with other data ‐ to convert it to the desired state. Finally, the load function is used to write the resulting data to a target database.
Dimensional Modelling is a design concept used by many data warehouse desginers to build thier datawarehouse. In this design model all the data is stored in two types of tables - Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measuremnets ie, the dimensions on which the facts are calculated.
Data modeling is probably the most labor intensive and time consuming part of the development process. Why bother especially if you are pressed for time? A common response by practitioners who write on the subject is that you should no more build a database without a model than you should build a house without blueprints.
The goal of the data model is to make sure that the all data objects required by the database are completely and accurately represented. Because the data model uses easily understood notations and natural language , it can be reviewed and verified as correct by the end-users.
The data model is also detailed enough to be used by the database developers to use as a "blueprint" for building the physical database. The information contained in the data model will be used to define the relational tables, primary and foreign keys, stored procedures, and triggers. A poorly designed database will require more time in the long-term. Without careful planning you may create a database that omits data required to create critical reports, produces results that are incorrect or inconsistent, and is unable to accommodate changes in the user's requirements.
SCD Stands for Slowly changing dimensions. SCD1: only maintained updated values. Ex: a customer address modified we update existing record with new address. SCD2: maintaining historical information and current information by using A) Effective Date B) Versions C) Flags or combination of these SCD3: by adding new columns to target table we maintain historical information and current information.
Since in OLTP,tables are normalised and hence query response will be slow for end user and OLTP doesnot contain years of data and hence cannot be analysed.