SQL - Other

MAXDOP (Degree of Parallelism)

Adjusting MAXDOP can optimize parallel queries, balancing the load on the server.

You can use it directly in the query or change the settings in the DBMS (Recommended)

How to Set the MAXDOP Value at the Server Level

USE <database_name>; 
GO 
EXEC sp_configure 'show advanced options', '1'; 
GO 
RECONFIGURE WITH OVERRIDE; 
GO 
EXEC sp_configure 'max degree of parallelism', '16'; 
GO 
RECONFIGURE WITH OVERRIDE; 
GO

How to Set the MAXDOP Value at the Workload Group Level

CREATE WORKLOAD GROUP <group_name> 
WITH 
    (MAX_DOP = 4) 
USING <pool_name>; 
GO

How to Set the MAXDOP Value at the Database Level

USE <database_name>; 
GO 
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 16; 
GO

How to Set the MAXDOP Value at the Query Level

SELECT * 
FROM <table> 
OPTION (MAXDOP 2); 
GO

Below are Microsoft’s recommendations for MAXDOP values for SQL Server 2016 and on.

For a server with a single NUMA node

Set the MAXDOP value equal to or less than the number of logical processors if the server has less than or equal to eight logical processors.
Set the MAXDOP value to 8 if the number of logical processors is greater than eight.

For a server with multiple NUMA nodes

Set the MAXDOP value equal to or less than the number of logical processors per NUMA node if the server has less than or equal to sixteen logical processors per NUMA node.
Set the MAXDOP value to half the number of logical processors if the processors exceed sixteen per NUMA node. At maximum, the MAXDOP value should be 16.

Monitoring Active SQL Server Sessions

The sys.dm_exec_sessions view returns information about all active sessions connected to SQL Server. A session represents a single connection between the SQL Server and a client application.

Key Columns:

session_id: The unique identifier for the session.
login_name: The name of the user who initiated the session.
status: The status of the session (e.g., running, sleeping).
host_name: The name of the client machine connected to the SQL Server.
program_name: The name of the client program (e.g., SQL Server Management Studio).
login_time: The time the user logged in.

Use Case:

You can use sys.dm_exec_sessions to monitor all active connections and sessions on the server, including idle sessions that are not actively running queries.

Example Query:

SELECT session_id, login_name, status, host_name, program_name, login_time
FROM sys.dm_exec_sessions;

Monitoring System Memory Usage in SQL Server

sys.dm_os_sys_memory is a dynamic management view (DMV) in SQL Server that returns information about the overall memory state of the system. It includes details about how much memory is available, how much is being used, and how much is reserved for SQL Server.

Key Columns:

available_physical_memory_kb: Physical memory available to the system.
available_virtual_memory_kb: Virtual memory available to the system.
total_physical_memory_kb: Total physical memory installed on the system.
total_page_file_kb: Total size of the page file (swap file).
total_virtual_memory_kb: Total virtual memory available to the system.
physical_memory_in_use_kb: Physical memory currently used by SQL Server.

Use Case:

This view is useful for monitoring SQL Server’s memory usage at the system level. It helps you understand how much memory SQL Server is consuming relative to the physical and virtual memory of the operating system.

Example Query:

SELECT * 
FROM sys.dm_os_sys_memory;

Monitoring Transaction Log Space Usage in SQL Server

DBCC SQLPERF(LOGSPACE) is a system command in SQL Server that provides information about the transaction log usage for each database on the SQL Server instance. It shows how much of the transaction log is being used, which can help you monitor log growth and avoid running out of log space.

Output Columns:

Database Name: The name of the database.
Log Size (MB): The size of the transaction log in megabytes.
Log Space Used (%): The percentage of the log space currently used.
Status: The current status of the log file (0 indicates the log is available).

Use Case:

This command is particularly useful when managing database transaction log space and ensuring that logs do not fill up unexpectedly. It helps database administrators keep track of log file growth and usage.

Example Query:

DBCC SQLPERF(LOGSPACE);

Typical Output:

Database Name | Log Size (MB) | Log Space Used (%) | Status
----------------------------------------------------------------
MyDatabase    | 1000          | 55.00              | 0

NOLOCK

The NOLOCK hint is used in a SELECT statement by appending it to the FROM clause:

SELECT * 
FROM Orders WITH (NOLOCK);

Key Features of NOLOCK:

No Shared Locks: When using NOLOCK, SQL Server does not acquire shared locks on the data it is reading. This allows reads to happen without blocking ongoing transactions that are updating or inserting data.
Reads Uncommitted Data: It allows the query to read "dirty" data, meaning data that is currently being modified by other transactions but not yet committed. If the transaction modifying the data rolls back, the data you read could be incorrect.

Advantages:

Improved Performance: Since no locks are taken, it can significantly reduce the contention between read and write operations, especially in highly transactional environments.
Reduced Blocking: Queries with NOLOCK do not block other operations (such as inserts, updates, or deletes), and in turn, are not blocked by them.

Disadvantages:

Dirty Reads: Data that has not been committed yet may be read, which could lead to inconsistencies.
Non-repeatable Reads: The same query might return different results if re-executed, as data may change while the query is running.
Phantom Reads: Rows could be inserted or deleted by other transactions during the query, leading to inaccurate results.

Example of Potential Issues:

If a transaction is in progress that inserts a new row into an Orders table, but has not yet been committed, a NOLOCK query might still see that uncommitted row. If that transaction is later rolled back, the NOLOCK query will have read data that never actually existed.

Best Use Cases:

NOLOCK can be useful in scenarios where data accuracy is less critical than query performance, such as generating reports or reading large datasets for analytics where occasional inconsistencies are acceptable.

NULLIF()

The NULLIF() function returns NULL if two expressions are equal, otherwise it returns the first expression.

SELECT NULLIF(25, 25);

Output:
NULL

SELECT NULLIF(25, 26);

Output:
25

PARTITION

PARTITION is a more general term referring to the way tables or indexes are divided into smaller, manageable pieces.

Key Points:

Partitioned Table: A table that is split into partitions based on the defined partition function and scheme. Each partition is stored in a separate filegroup.
Partitioned Index: An index on a partitioned table, which is also divided according to the partitioning scheme.

Example of Viewing Partitions:

To view information about partitions and how data is distributed, you can use the following queries:

View Partition Information:

SELECT 
    partition_ordinal_position,
    partition_id,
    filegroup_id,
    rows
FROM sys.partitions
WHERE object_id = OBJECT_ID('Orders');

View Partition Function:

SELECT 
    name, 
    boundary_value_on_left
FROM sys.partition_functions
WHERE name = 'OrderDatePF';

View Partition Scheme:

SELECT 
    name, 
    partition_function_id
FROM sys.partition_schemes
WHERE name = 'OrderDatePS';

Summary:

Refers to the division of a table or index into smaller segments. In SQL Server, partitioning involves creating partition functions and schemes to manage how data is stored and queried.

PARTITION BY RANGE

PARTITION BY RANGE is used in partitioned tables and index partitioning to define how data should be distributed across partitions based on the values of a specific column.

How It Works:

Range Partitioning: Data is divided into partitions based on a specified range of values. For example, you might partition data based on date ranges or numerical ranges.
Partition Function: A partition function defines the ranges for the partitioning. It maps a range of values to a specific partition.
Partition Scheme: A partition scheme defines how the partitions are stored across filegroups.

Example:

Suppose you want to partition a table based on the OrderDate column, with each partition containing data for a specific year.

Create a Partition Function:

This function defines partitions for dates before January 1st, 2022, January 1st, 2023, and January 1st, 2024.

CREATE PARTITION FUNCTION OrderDatePF (DATE)
AS RANGE LEFT FOR VALUES ('2022-01-01', '2023-01-01', '2024-01-01');

Create a Partition Scheme:

This scheme maps partitions to filegroups.

CREATE PARTITION SCHEME OrderDatePS
AS PARTITION OrderDatePF
TO ([PRIMARY], [FG2022], [FG2023], [FG2024]);

Create a Partitioned Table:

The table Orders is now partitioned based on the OrderDate column.

CREATE TABLE Orders
(
    OrderID INT PRIMARY KEY,
    OrderDate DATE,
    Amount DECIMAL(10, 2)
)
ON OrderDatePS (OrderDate);

Summary:

Defines how data is distributed into partitions based on ranges of values. Useful for managing large datasets by dividing them into more manageable chunks.

Partitioning Tables

Table partitioning divides a large table into smaller, manageable pieces for better performance and maintainability.

CREATE TABLE sales_partitioned (
    sale_id INT,
    sale_date DATE,
    amount DECIMAL(10, 2)
) PARTITION BY RANGE (YEAR(sale_date)) (
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025)
);

Pivoting Data

Pivoting transforms row data into columns, making it easier to display aggregated data in a matrix format.

SELECT 
    product_id, 
    SUM(CASE WHEN month = 'January' THEN sales ELSE 0 END) AS January_Sales,
    SUM(CASE WHEN month = 'February' THEN sales ELSE 0 END) AS February_Sales
FROM 
    sales_data
GROUP BY 
    product_id;

Pivoting Data Dynamically 🔄

SELECT employee_id,
       MAX(CASE WHEN month = 'January' THEN sales END) AS january_sales,
       MAX(CASE WHEN month = 'February' THEN sales END) AS february_sales
FROM sales
GROUP BY employee_id;

Use CASE to pivot specific rows into columns.
MAX ensures unique values per column.

Query Profilers

Query profilers like pt-query-digest for MySQL or pg_stat_statements for PostgreSQL can help you capture and analyze query performance data. These tools record query execution details, including execution time, query text, and the number of times a query is executed.

# Use pt-query-digest to analyze slow query logs
pt-query-digest /var/log/mysql/slow.log