Database Engine Fundamentals
What is a Database Engine?
- Library that takes care of the on-disk storage and CRUD operations
- Can be as simple as a key-value store
- Or as rich and complex as full ACID support with transactions and foreign keys
- DBMS can use the database engine and build features on top (server, replication, isolation, stored procedures, etc.)
- Want to write a new database? Don’t start from scratch - use an engine
- Sometimes referred to as Storage Engine or embedded database
- Some DBMS gives you the flexibility to switch engines like MySQL & MariaDB
- Some DBMS comes with a built-in engine that you can’t change (PostgreSQL)
Popular Database Engines
MyISAM
- Stands for Indexed Sequential Access Method
- B-tree (Balanced tree) indexes point to the rows directly
- No transaction support
- Open Source & Owned by Oracle
- Inserts are fast, updates and deletes are problematic (fragments)
- Database crashes corrupt tables (have to manually repair)
- Table level locking
- MySQL, MariaDB, Percona (MySQL forks) supports MyISAM
- Used to be default engine for MySQL
Aria
- Created by Michael Widenius
- Very similar to MyISAM
- Crash-safe unlike MyISAM
- Not owned by Oracle
- Designed specifically for MariaDB (MySQL Fork)
- In MariaDB 10.4 all system tables are Aria
InnoDB
- B+tree - with indexes point to the primary key and the PK points to the row
- Replaces MyISAM
- Default for MySQL & MariaDB
- ACID compliant transactions support
- Foreign keys
- Tablespaces
- Row level locking
- Spatial operations
- Owned by Oracle
XtraDB
- Fork of InnoDB
- Was the default for MariaDB until 10.1
- In MariaDB 10.2 for InnoDB switched the default
- “XtraDB couldn’t be kept up to date with the latest features of InnoDB and cannot be used.” link
- System tables in MariaDB starting with 10.4 are all Aria
SQLite
- Designed by D. Richard Hipp in 2000
- Very popular embedded database for local data
- B-Tree (LSM as extension)
- PostgreSQL-like syntax
- Full ACID & table locking
- Concurrent read & writes
- Web SQL in browsers uses it
- Included in many operating systems by default
Berkeley DB
- Developed by Sleepycat Software in 1994 (owned by Oracle)
- Key-value embedded database
- Supports ACID transactions, locks, replications etc.
- Used to be used in Bitcoin Core (switched to LevelDB)
- Used in MemcacheDB
LevelDB
- Written by Jeff and Sanjay from Google in 2011
- Log structured merge tree (LSM) (great for high insert and SSD)
- No transactions
- Inspired by Google BigTable
- Levels of files:
- Memtable
- Level 0 (young level)
- Level 1 - 6
- As files grow, large levels are merged
- Used in Bitcoin Core blockchain, AutoCAD, Minecraft
RocksDB
- Facebook forked LevelDB in 2012 to become RocksDB
- Transactional
- High Performance, Multi-threaded compaction
- Many features not in LevelDB
- MyRocks for MySQL, MariaDB and Percona
- MongoRocks for MongoDB
- Many more projects use it!
Database Cursors
What are Database Cursors?
Database cursors are powerful tools that allow for row-by-row processing of result sets. In PostgreSQL, a cursor is a database object used to retrieve rows from a result set one at a time. This is particularly useful when working with large datasets or when you need to fetch data incrementally.
Cursor Capabilities:
- Efficient row-by-row processing: Fetch and process rows one at a time or in small chunks
- Multiple result sets: Keep a cursor open to fetch additional rows later, helpful for procedural processing
Key Steps for Using a Cursor in PostgreSQL:
- Declare a cursor – Define a cursor to point to a specific query or result set
- Open the cursor – Execute the query and generate the result set
- Fetch rows from the cursor – Retrieve individual or multiple rows
- Close the cursor – Release resources after processing
Example of Using Cursors in PostgreSQL:
Step 1: Declare the cursor
To declare a cursor, you need to use the DECLARE
statement, followed by the cursor name and the SQL query.
1
2
3
-- Declare a cursor
DECLARE my_cursor CURSOR FOR
SELECT id, name FROM employees WHERE department = 'Sales';
Here, my_cursor
is the name of the cursor, and the SELECT
statement fetches id
and name
columns from the employees
table where the department
is ‘Sales’.
Step 2: Fetch rows from the cursor
After declaring the cursor, you can use the FETCH
command to retrieve rows from the result set.
1
2
-- Fetch the first row
FETCH NEXT FROM my_cursor;
You can also specify how many rows to fetch at once, such as FETCH 5 FROM my_cursor
to fetch 5 rows at a time.
Step 3: Loop to fetch multiple rows
You can use a loop to fetch all rows one by one. Here is an example using a LOOP
in PL/pgSQL, PostgreSQL’s procedural language.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
DO $$
DECLARE
rec RECORD;
BEGIN
-- Declare the cursor
DECLARE my_cursor CURSOR FOR
SELECT id, name FROM employees WHERE department = 'Sales';
-- Open the cursor
OPEN my_cursor;
-- Fetch and process rows
LOOP
FETCH NEXT FROM my_cursor INTO rec;
EXIT WHEN NOT FOUND; -- Exit when no more rows
-- Process the row (For example, just outputting the data)
RAISE NOTICE 'Employee ID: %, Name: %', rec.id, rec.name;
END LOOP;
-- Close the cursor
CLOSE my_cursor;
END $$;
In this example:
rec
is aRECORD
type variable that holds the fetched row.- The
LOOP
continues fetching rows using theFETCH NEXT
statement until there are no more rows (EXIT WHEN NOT FOUND
). - After processing, the cursor is closed with
CLOSE my_cursor
.
Step 4: Close the cursor
1
CLOSE my_cursor;
Once you’ve finished working with the cursor, it is good practice to close it. This is done with the CLOSE statement, as shown above.
Important Notes:
Implicit Cursors: PostgreSQL automatically creates implicit cursors for SELECT queries outside of procedural code. For complex operations, explicit cursors are necessary.
- Cursor Types:
- Simple cursor: Basic cursor that fetches rows in order
- Scroll cursor: Allows fetching rows both forward and backward
- No scroll cursor: Can only fetch rows in one direction (default)
- Memory Considerations: Cursors are more memory-efficient than loading entire result sets, but should be closed when no longer needed.
Example in a Transaction Block:
1
2
3
4
5
6
7
8
9
10
11
12
13
BEGIN;
-- Declare and open the cursor
DECLARE my_cursor CURSOR FOR
SELECT id, name FROM employees WHERE department = 'HR';
-- Fetch and process rows
FETCH NEXT FROM my_cursor;
-- Close the cursor
CLOSE my_cursor;
COMMIT;
Pros and Cons of Database Cursors
Pros of Using Cursors
- Memory Efficiency:
- Process rows one at a time or in small batches
- Improved resource management for large result sets
- Avoids loading entire datasets into memory at once
- Better Performance for Large Datasets:
- Sequential processing can be more efficient for certain procedural operations
- Ability to pause and resume fetching process
- Advantageous for complex workflows or long-running tasks
- Control over Row Fetching:
- Control over order and frequency of row fetching
- Explicit iteration through result sets
- Scroll cursors provide flexibility for both forward and backward fetching
- Complex Query Handling:
- Useful for processing complex queries row-by-row
- Commonly used in stored procedures or functions
- Ideal for operations that need to be executed in steps
- Transactional Processing:
- Process data incrementally within transactions
- Useful for batch updates or long-running transactions
Cons of Using Cursors
- Performance Overhead:
- Multiple context switches between database and application code
- Opening, fetching, and closing a cursor can be more costly than executing a single query
- Performance issues can be exacerbated for unoptimized queries
- Complexity and Maintenance:
- Adds complexity to code
- Requires explicit handling of cursor lifecycle
- Forgetting to close cursors can lead to resource leaks
- Concurrency Issues:
- Can lock resources depending on usage
- May hold locks on rows during fetching
- Potential for deadlocks if not handled properly
- Limited Use Case:
- Not always necessary; simple SQL queries often suffice
- Set-based operations are usually more efficient
- Often overkill for read-heavy use cases
- Potential for Unintended Side Effects:
- Stateful nature can lead to unexpected results if data changes between fetches
- May cause long-running transactions
- Potential for transaction contention or deadlocks
- Resource Management:
- Requires explicit resource management
- Open cursors consume memory and resources
- Can cause performance issues in high-concurrency environments
When to Use Cursors:
- When row-by-row processing is necessary
- When memory constraints are a concern
- When implementing procedural logic in stored procedures or functions
- When performing complex updates or deletions with multiple operations per row
When to Avoid Cursors:
- For simple queries that don’t require row-by-row processing
- When performance is critical
- For read-heavy operations better handled by set-based SQL
- For simple CRUD operations where set-based operations are more efficient
Implementing Cursor-like Functionality in Spring Boot
In a Spring Boot application using Spring Data JPA, we typically work with repositories to perform CRUD (Create, Read, Update, Delete) operations on entities. However, if you need to implement cursor-like behavior (such as fetching results incrementally or processing rows one by one), you can use native SQL queries along with @Query annotation or JPA Criteria API in combination with pagination. While Spring Data JPA doesn’t directly provide a cursor concept like in PostgreSQL, you can simulate cursor-like behavior using pagination, streaming, or custom native queries.
1. Using @Query Annotation with Pagination
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
// Entity Class
@Entity
public class Employee {
@Id
private Long id;
private String name;
private String department;
// Getters and Setters
}
// Repository Interface
public interface EmployeeRepository extends JpaRepository<Employee, Long> {
// Custom query with pagination
@Query("SELECT e FROM Employee e WHERE e.department = :department")
Page<Employee> findEmployeesByDepartment(String department, Pageable pageable);
}
// Service Layer
@Service
public class EmployeeService {
@Autowired
private EmployeeRepository employeeRepository;
public void processEmployeesInBatches(String department) {
int page = 0;
int pageSize = 10; // Batch size
Pageable pageable = PageRequest.of(page, pageSize);
Page<Employee> employeePage;
// Fetch in batches (mimicking cursor behavior)
do {
employeePage = employeeRepository.findEmployeesByDepartment(department, pageable);
employeePage.getContent().forEach(employee -> {
// Process each employee
System.out.println("Processing Employee ID: " + employee.getId());
});
// Move to next page (mimicking cursor movement)
pageable = pageable.next();
page++;
} while (employeePage.hasContent());
}
}
2. Using Stream for Processing Large Results
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// Repository with Streaming Query
@Repository
public interface EmployeeRepository extends JpaRepository<Employee, Long> {
@Transactional
@Query("SELECT e FROM Employee e WHERE e.department = :department")
Stream<Employee> findEmployeesByDepartmentStream(String department);
}
// Service Layer (Streaming)
@Service
public class EmployeeService {
@Autowired
private EmployeeRepository employeeRepository;
public void processEmployeesInStream(String department) {
// Open a stream to process the data lazily
try (Stream<Employee> employeeStream = employeeRepository.findEmployeesByDepartmentStream(department)) {
employeeStream.forEach(employee -> {
// Process each employee (row by row)
System.out.println("Processing Employee ID: " + employee.getId());
});
}
}
}
3. Using Native SQL Queries with @Query and Streaming
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
// Repository with Native Query
@Repository
public interface EmployeeRepository extends JpaRepository<Employee, Long> {
@Query(value = "SELECT * FROM employees WHERE department = :department", nativeQuery = true)
Stream<Employee> findEmployeesByDepartmentNativeStream(String department);
}
// Service Layer (Using Native Stream)
@Service
public class EmployeeService {
@Autowired
private EmployeeRepository employeeRepository;
public void processEmployeesWithNativeQueryStream(String department) {
try (Stream<Employee> employeeStream = employeeRepository.findEmployeesByDepartmentNativeStream(department)) {
employeeStream.forEach(employee -> {
// Process each employee from the native query
System.out.println("Processing Employee ID: " + employee.getId());
});
}
}
}
Server-Side vs. Client-Side Cursors
Server-Side Cursor
A server-side cursor is managed by the database server, which handles the cursor state and row retrieval.
Characteristics:
- Cursor management is done by the database server
- Client only issues fetch commands
- Memory-efficient as rows are sent as needed
- Ideal for large datasets
- Stateful: server maintains cursor position between fetches
Advantages:
- Memory efficiency
- Better handling of large result sets
- Server-controlled optimizations
Disadvantages:
- Latency due to client-server communication
- Resource consumption on the server
Example in PostgreSQL:
1
2
DECLARE my_cursor CURSOR FOR
SELECT id, name FROM employees WHERE department = 'Sales';
Client-Side Cursor
A client-side cursor is managed by the client application, which retrieves the entire result set at once.
Characteristics:
- Cursor management is done by the client application
- Client fetches all rows at once
- Client controls cursor position
- Used for smaller result sets that fit in memory
Advantages:
- Simpler implementation
- Faster for small result sets
- Lower server load
Disadvantages:
- High memory consumption
- Inefficient for large result sets
- No incremental fetching
Example in Java (JDBC):
1
2
3
4
5
6
7
8
9
Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost:5432/mydb", "user", "password");
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT id, name FROM employees");
while (rs.next()) {
int id = rs.getInt("id");
String name = rs.getString("name");
// Process the row
}
Comparison: Server-Side vs. Client-Side Cursors
Aspect | Server-Side Cursor | Client-Side Cursor |
---|---|---|
Cursor Management | Database server | Client application |
Memory Usage | Efficient: only portions in memory | Inefficient: entire result set in memory |
Fetch Behavior | Retrieving rows in batches | All rows fetched at once |
Resource Consumption | Server resources | Client resources |
Use Case | Large result sets | Smaller result sets |
Performance | Overhead from multiple round trips | Faster for small datasets |
Latency | Some latency due to batch fetching | No latency once loaded |
Which One Should You Use?
- Server-Side Cursors: Best for large result sets where memory efficiency is crucial
- Client-Side Cursors: Appropriate for smaller result sets that fit comfortably in memory
Comments powered by Disqus.