Data loading: Difference between revisions

Revision as of 08:02, 9 September 2024 by Sarleeng (no edit summary; tag: Reverted). Latest revision as of 14:02, 29 November 2024 by Citation bot (edit summary: Add: website, series. Suggested by Dominic3203; Category:Extract, transform, load tools).
(One intermediate revision by one other user not shown)
Full data refresh means that existing data in the target table is deleted first. All data from the source is then loaded into the target table, new indexes are created in the target table, and new [[Measure (data warehouse)|measures]] are calculated for the updated table. Full refresh is easy to implement, but it moves a large volume of data, which can take a long time, and it makes historical data difficult to retain.<ref name=":0">{{Cite web |date=2022-04-14 |title=Incremental Data Load vs Full Load ETL: 4 Critical Differences - Learn {{!}} Hevo |url=https://hevodata.com/learn/incremental-data-load-vs-full-load/ |access-date=2023-02-18 |language=en-US}}</ref>

=== Incremental update ===

=== Trickle feed ===
Trickle feed or trickle loading means that when the source system is updated, the changes are applied to the target system almost immediately.<ref>{{Cite encyclopedia |chapter=Near Real-Time Data Warehousing with Multi-stage Trickle and Flip |publisher=Springer Berlin Heidelberg |chapter-url=http://link.springer.com/10.1007/978-3-642-24511-4_6 |date=2011 |volume=90 |pages=73–82 |doi=10.1007/978-3-642-24511-4_6 |author=Zuters, Janis |editor1=Grabis, Janis |editor2=Kirikova, Marite |title=Perspectives in Business Informatics Research |series=Lecture Notes in Business Information Processing |quote=a data warehouse typically is a collection of historical data designed for decision support, so it is updated from the sources periodically, mostly on a daily basis. today's business however asks for fresher data. real-time warehousing is one of the trends to accomplish this, but there are a number of challenges to move towards true real-time. this paper proposes 'multi-stage trickle and flip' methodology for data warehouse refreshment. it is based on the 'trickle and flip' principle and extended in order to further insulate loading and querying activities, thus enabling both of them to be more efficient. |isbn=978-3-642-24510-7}}</ref><ref>{{Cite web |title=Trickle Loading Data |url=https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/TrickleLoading/TrickleLoadingData.htm |access-date=2023-02-18}}</ref>

== Loading to systems that are in use ==
{{main|Real-time computing}}
When loading data into a system that is currently in use by users or other systems, one must decide when the system should be updated and what happens to tables that are in use while the update runs. One possible solution is to use [[Shadow table|shadow tables]].<ref>{{Cite web |title=Create shadow tables for synchronization - Data Management - Alibaba Cloud Documentation Center |url=https://www.alibabacloud.com/help/en/data-management-service/latest/synchronize-shadow-tables |access-date=2023-02-18}}</ref><ref>{{Cite web |date=2015-08-10 |title=Shadow tables |website=[[IBM]] |url=https://www.ibm.com/docs/en/db2/10.5?topic=tables-shadow |access-date=2023-02-18 |language=en-us}}</ref>

== See also ==