Row Vs Column store :
The SAP HANA database supports two types of table: store data either
1) column-wise (column tables)
2) row-wise (row tables).
SAP HANA is optimized for column storage and this is the default table type.
Conceptually, a database table is a two dimensional data structure with cells organized in rows and columns. Computer memory however is organized as a linear sequence. For storing a table in linear memory, two options can be chosen as shown below. A row store stores a sequence of records that contains the fields of one row in the table. In a column store, the entries of a column are stored in contiguous memory locations.
Storage Type | When to Use |
---|
Column store | - Calculations are typically executed on individual or a small number of columns.
- The table is searched based on the values of a few columns.
- The table has a large number of columns.
- The table has a large number of rows and columnar operations are required (aggregate, scan, and so on)
- High compression rates can be achieved because the majority of the columns contain only a few distinct values (compared to the number of rows).
|
Row store | - The application needs to process only one single record at one time (many selects and /or updates of single records).
- The application typically needs to access the complete record.
- The columns contain mainly distinct values so compression rate would be low.
- Neither aggregations nor fast searching are required.
- The table has a small number of rows (for example, configuration tables).
|
Example of Compression :
The column store is optimized for read operations, but it also provides good performance for
write operations. This is achieved through two data structures: main storage and delta
storage
The main storage contains the main part of the data. Here, efficient data compression is
applied to save memory and to speed up searches and calculations. Write operations on
compressed data in the main storage are costly, however. Therefore, write operations do not
directly modify compressed data in the main storage. Instead, all changes are written to a
separate data structure called the delta storage. The delta storage uses only basic
compression and is optimized for write access. Read operations are performed on both
structures, while write operations only affect the delta
Delta Merge Operation:
The following steps are performed in the merge process:
1. Before the merge operation, all write operations go to Delta 1 storage, and all read
operations read from Main 1 and Delta 1 storages.
2. While the merge operation is running, the following events occur:
a. All write operations go to the second delta storage, Delta 2.
b. Read operations read from the original main storage, Main 1, and from both delta
storages, Delta 1 and Delta 2.
c. Uncommitted changes in Delta1 are copied to Delta2.
d. The content of Main 1 and the committed entries in Delta 1 are merged into the new
main storage, Main 2.
3. After the merge operation is completed, the following events occur
a. Main1 and Delta1 storages are deleted.
b. The compression of the new main storage (Main 2) is reevaluated and optimized. If
necessary, this operation reorders rows and adjust compression parameters. If
compression has changed, columns are immediately reloaded into memory.
c. The content of the complete main storage is persisted to disk.
The purpose of the delta merge operation is to move changes collected in the delta storage to
the read-optimized main storage. After the delta merge operation, the content of the main
storage is persisted to disk. Its compression is recalculated and optimized, if necessary.
A further result of the delta merge operation is the truncation of the delta log. The delta
storage structure itself exists only in-memory and is not persisted to disk. The column store
creates its logical redo log entries for all operations executed on the delta storage. This log is
called the delta log. If a system restart occurs, the delta log entries are replayed to rebuild the
in-memory delta storages. After the changes in the delta storage are merged into the main
storage, the delta log file is truncated by removing those entries that were written before the
merge operation.
Note:
Data that is in the delta storage can only be fully loaded or unloaded. Partial
loading is not possible. Therefore, if a delta merge has not been performed and the
table's entire data is in the delta storage, the table is either fully loaded or
unloaded
With this double buffer concept, the table only needs to be locked for a short time:
at the beginning of the process when open transactions are moved to Delta2, and
at the end of the process when the storages are “switched”
The performance of the delta merge depends on the size of the main storage. This size can
be reduced by splitting the table into multiple partitions, each with its own main and delta
storages. The delta merge operation is performed at partition level and only for partitions
that actually require it. This means that less data is merged and persisted
Merge Concepts:
Auto Merge :
The standard method for initiating a merge in SAP HANA is the auto merge. A system process
called mergedog periodically checks the column store tables that are loaded locally. For each
individual table or single partition of a split table, it determines whether or not a merge is
necessary based on certain criteria (for example, size of delta storage, available memory,
time since last merge, and others)
indexserver.ini -> [mergedog] -> check_interval =60000 ms =1 min
Smart Merge:
If an application starts loading relatively large data volumes, a delta merge during the load can have a negative impact both on the load performance and on other system users. Therefore, the application can disable the auto merge for those tables being loaded and send a hint to the database to merge once
the load has completed.
parameter smart_merge_enabled should be enabled in indexserver.ini file
Hard and Forced Merges:
the SQL statement MERGE DELTA OF "<table_name>". This is called a hard merge. It causes the
database to execute the delta merge for the table either immediately if sufficient system
resources are available
If you want the merge to take place immediately regardless of system resource availability
MERGE DELTA OF '<table_name>' WITH PARAMETERS
('FORCED_MERGE' = 'ON').
Critical Merge :
To keep the system stable, the database can trigger a critical merge. For example, when auto
merge is disabled and no smart merge hints are sent to the system, the size of the delta
storage can grow too large for a successful delta merge to occur. The system initiates a
critical merge automatically when a certain threshold is passed
Note:
Unlike system-triggered delta merge operations, all of the manually-executed delta merge operations listed here do not later trigger an optimization of the compression of the table's new main storage. If the table was compressed before the delta merge operation, it remains compressed with the same compression strategy afterward. If it was not compressed before the delta merge operation, it remains uncompressed afterward. After a manual delta merge, you must therefore trigger compression optimization manually
Comments
Post a Comment