If you lead database teams for long enough, the pattern becomes familiar. Your DBAs spend too much time firefighting. Complexity keeps rising. The time available for real improvement keeps shrinking.
The pressure shows up in the data. The 2025 State of Database Report puts numbers behind it: DBAs spend an average of 27 hours a week on reactive work, and only 40% say their monitoring environment is fully unified. Nearly three-quarters say alert fatigue affects how well they can prioritize and respond to incidents, and more than one-third are considering leaving their roles.
That should concern any leader who depends on database reliability.
There’s also a clear perception gap around visibility. Only 40% of DBAs say their monitoring environment is fully unified, while 50% of IT executives believe it already is. That disconnect helps explain why I/O bottlenecks, TempDB issues, and transaction log problems keep resurfacing as surprise incidents instead of getting fixed at the root.
DBA Burnout Is More Than a People Problem
This is why we opened the High-Performance DBA webinar series by looking at burnout, the tyranny of the urgent, and what the State of Database Report says about constant firefighting across database teams.
If you’ve not read that earlier post, it’s useful background. It explains why so many DBAs stay stuck in reactive mode and why unified visibility matters. The wider observability gap isn’t just a tooling problem. It’s often the reason database issues stay hidden until users feel the impact.
In this post, I want to stay at one of the most practical layers in SQL Server: storage.
Storage is where day-to-day pressure becomes visible. It’s where recurring I/O bottlenecks, TempDB contention, and log growth surprises tend to show up first. It’s also one of the few places where a handful of deliberate changes can reduce incident volume quickly.
Storage Is More Than Capacity
Many projects still treat storage as a capacity checkbox. If there’s enough free space, the conversation ends. But SQL Server just doesn’t work that way.
On busy systems, storage is often the first major bottleneck. Repeated incidents involving slow queries, blocked sessions, or missed SLAs often have storage symptoms nearby. Common examples include inconsistent I/O latency, undersized log files, erratic autogrowth, or a TempDB layout that no longer matches the workload.
Once those factors pile up, nobody cares whether there’s technically enough disk. Users see slower response times. Business stakeholders see instability. DBAs see another week consumed by symptoms instead of improvement work.
If that sounds familiar, storage isn’t a side issue. It’s one of the main pressure points in the job.
Maturity Matters More Than Tricks
Across environments, the gap between constant firefighting and steady improvement rarely comes down to one clever tuning move. More often, it comes down to maturity.
In low-maturity environments, storage decisions are ad hoc, settings stay at their defaults, TempDB grows until it hurts, and fixes happen under pressure and rarely get documented.
In higher-maturity environments, teams do a few simple things well:
- They define sensible defaults.
- They monitor the right signals.
- They treat configuration changes as planned decisions instead of emergency reactions.
Storage is a good place to start that shift because the payoff tends to show up quickly in both performance and day-to-day quality of life.
Why Storage Problems Keep Returning
Storage issues rarely exist in isolation. They interact with application design, indexing, memory pressure, and team habits.
Picture an environment where query patterns are poorly understood, indexes have accumulated over time, and TempDB shares a busy disk with several user databases. Add percentage-based autogrowth and a monitoring setup that only shows CPU and basic disk counters. In that environment, you may fix one incident today and watch a slightly different version return next week.
That cycle is exactly why fragmented monitoring is so costly. In siloed environments, DBAs spend more time stitching together alerts and dashboards just to understand where the problem lives. The survival move is obvious: patch the symptom and move on.
The way out isn’t flashy. It’s a set of stable practices that make system behavior more predictable.
Getting the Log Under Control
The transaction log is one of the most misunderstood parts of SQL Server storage. Under the hood, it’s sequential. It performs best when writes are steady and predictable.
In most environments, that means using one log file per database, giving it a realistic starting size, and using fixed growth increments instead of percentage-based growth.
Small and frequent autogrowth events create too many virtual log files over time. That can hurt write performance and slow recovery.
When I review an environment, I usually care less about the raw size of the log and more about what that size pattern reveals. A log file that keeps growing in tiny jumps during business hours tells you the system was never given enough room to operate cleanly.
Set more deliberate starting sizes and growth increments, and the result is often immediate. Log growth interruptions drop. Backup behavior becomes easier to predict. Recovery planning improves.
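A minimal T-SQL sketch of that approach, assuming a hypothetical database named `SalesDb` with the default log file naming; adjust the size targets to your own workload:

```sql
-- Give the log a realistic starting size and a fixed growth increment
-- instead of percentage-based growth.
ALTER DATABASE SalesDb
    MODIFY FILE (NAME = SalesDb_log, SIZE = 8GB, FILEGROWTH = 1GB);

-- Check how fragmented the log already is (SQL Server 2016 SP2 and later).
-- A VLF count in the thousands usually means the log grew in tiny jumps.
SELECT COUNT(*) AS vlf_count
FROM sys.dm_db_log_info(DB_ID('SalesDb'));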
Making Data Files Predictable
Data files are where you can benefit from parallelism. In most cases, start with either four equally sized data files per database if your server has four or fewer CPUs, or eight data files if it has more than four CPUs. Multiple data files per database can help reduce locking pressure, especially schema locks, as user activity increases. As with transaction log files, make sure the autogrowth setting uses identical fixed file growth increments. That benefit disappears when files drift into different sizes and different growth patterns.
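As a sketch of that layout, here is a hypothetical new database following the four-file guidance above, with all data files equally sized and sharing identical fixed growth increments; names and paths are placeholders:

```sql
-- Four equally sized data files with identical fixed growth, plus a
-- separately sized log file. Paths and names are illustrative only.
CREATE DATABASE SalesDb
ON PRIMARY
    (NAME = SalesDb_01, FILENAME = 'D:\Data\SalesDb_01.mdf', SIZE = 4GB, FILEGROWTH = 512MB),
    (NAME = SalesDb_02, FILENAME = 'D:\Data\SalesDb_02.ndf', SIZE = 4GB, FILEGROWTH = 512MB),
    (NAME = SalesDb_03, FILENAME = 'D:\Data\SalesDb_03.ndf', SIZE = 4GB, FILEGROWTH = 512MB),
    (NAME = SalesDb_04, FILENAME = 'D:\Data\SalesDb_04.ndf', SIZE = 4GB, FILEGROWTH = 512MB)
LOG ON
    (NAME = SalesDb_log, FILENAME = 'L:\Log\SalesDb_log.ldf', SIZE = 8GB, FILEGROWTH = 1GB);
```

The point of the identical sizes is SQL Server's proportional-fill algorithm: files of equal size and equal free space receive roughly even allocations, which is exactly the parallelism benefit described above.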
Consistency makes the storage layer easier to understand. It also makes troubleshooting easier. You don’t have to guess which file has become the outlier or which hidden growth setting is causing uneven behavior.
Over time, that uniformity gives you a believable baseline. Once normal file behavior is clear, true anomalies stand out much faster.
Why Boring Storage Usually Wins
It’s tempting to design an intricate layout with multiple abstraction layers, elaborate filegroup strategies, or lots of carve-outs for specific workloads. Those designs can look smart on day one. They’re often painful to live with over several years of change, team turnover, and shifting workloads.
The most reliable SQL Server environments I see are usually conservative. Their layouts are stable. Their growth patterns are deliberate. Exceptions are limited. That stability reduces noise, makes patterns easier to spot, and keeps surprises to a minimum.
TempDB Is Where the Real Cost Shows Up
If any one database exposes the combined effect of your storage, memory, and query patterns, it’s TempDB.
TempDB supports sorts, hashes, work tables, temporary objects, row versioning, and many other internal operations across all databases and all user activity. Even if developers never write an explicit #temp table, TempDB is still doing a large amount of work behind the scenes.
When TempDB is misconfigured, the symptoms are familiar: latch contention on specific pages, sudden space pressure, repeated autogrowth events, spills that show up under load, allocation-related waits, and intermittent slowdowns that are hard to explain at first glance.
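One quick way to catch the allocation-contention symptom in the act is to look for sessions waiting on page latches in database 2 (TempDB). This is a standard diagnostic sketch, not a complete check; the allocation pages show up as 2:&lt;file&gt;:1 (PFS), 2:&lt;file&gt;:2 (GAM), and 2:&lt;file&gt;:3 (SGAM):

```sql
-- Live check for TempDB allocation-page contention: any PAGELATCH waits
-- whose resource is a page in database 2 are worth a closer look.
SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGELATCH%'
  AND resource_description LIKE '2:%';
```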
I treat TempDB as a first-class workload. That means using multiple data files of equal size, assigning a realistic initial size, using fixed growth increments, and placing it on storage that can keep up with the workload.
Guidance has evolved across SQL Server versions, but the goal hasn’t changed. Reduce allocation contention. Give TempDB room to breathe. Match the configuration to the workload you actually run.
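A hedged configuration sketch of that treatment. The logical names `tempdev` and `temp2` are the defaults on many installs, but verify yours in `sys.master_files` first, and size the files to your own workload:

```sql
-- Equalize existing TempDB data files and give them identical fixed growth.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 2GB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = temp2,   SIZE = 2GB, FILEGROWTH = 512MB);

-- Add further files (commonly up to one per logical core, capped at eight)
-- if allocation contention persists. Path and name are illustrative.
ALTER DATABASE tempdb
    ADD FILE (NAME = temp3, FILENAME = 'T:\TempDB\temp3.ndf',
              SIZE = 2GB, FILEGROWTH = 512MB);
```

Size changes take effect immediately, but shrinking or re-laying-out TempDB files generally requires a service restart to apply cleanly.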
Separating Competing I/O
Another practical improvement is to separate I/O patterns that shouldn’t compete with each other.
When transaction logs, TempDB, and user data all share the same busy path to disk, read-heavy and write-heavy activity begin fighting for the same resources. Add shared infrastructure or a noisy SAN layer, and unpredictability increases again.
Perfect separation isn’t always possible, but useful separation often is. Moving TempDB to its own volume can help. Keeping transaction logs away from the noisiest random reads can help. Placing read-mostly archival data on a different tier can help.
The point isn’t theoretical purity. The goal is to stop your most sensitive operations from tripping over each other all day.
Using Monitoring to Replace Guesswork
None of these changes should happen in the dark.
This is where monitoring turns storage tuning into analysis instead of intuition. Rather than relying on one disk queue metric or a screenshot from another team, you combine SQL Server’s view with system-level telemetry and recent change history.
In practice, that means watching wait statistics, file-level I/O metrics such as sys.dm_io_virtual_file_stats, TempDB behavior over time, transaction log growth patterns, and recent deployments or configuration changes.
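A starting-point query for the file-level piece, using `sys.dm_io_virtual_file_stats`. Note that the DMV's counters are cumulative since instance startup, so the useful signal comes from comparing snapshots over time rather than reading it once:

```sql
-- Average per-file read and write latency, worst offenders first.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.name AS file_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;
```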
The observability gap matters here too. When teams lack end-to-end visibility, database issues stay hidden longer and are harder to connect to root causes. A unified view helps teams move faster from “something’s slow” to “this change altered file behavior.”
That linkage is what pushes teams away from one-off reactions and toward repeatable operating habits.
Cloud Changes the Bill, Not the Physics
Cloud platforms make storage easier to provision. They don’t change the fundamentals. You still have IOPS limits. You still have bandwidth limits. You still need to think carefully about TempDB placement and transaction log sizing. The difference is that poor decisions now show up in two places: your performance graphs and your monthly invoice.
When I work with teams running SQL Server in Azure or other cloud environments, the same principles still apply. Understand the workload. Match storage performance to that reality. Avoid needless contention. Measure the effect of the changes you make.
Cloud makes bad storage decisions easier to hide for a while. It doesn’t make them cheaper.
What High-Performance DBAs Actually Do
The DBAs I trust with business-critical systems share a few habits.
They know how their log and data files are sized and how they grow. TempDB is configured for the workload they run now, not the one they had years ago. And when something changes, they can explain it with evidence, not guesswork.
They also treat alerts differently. They don’t accept noise as part of the job. They tune it, reduce it, and make it useful. In practice, that means:
- Focusing on signals that indicate real risk
- Reducing alerts that do not change decisions
- Keeping only what stays actionable over time
Most importantly, they use storage decisions to buy back time.
Every incident that does not recur is time returned to query tuning, automation, architecture, or proactive planning. That matters when the average DBA is already spending 27 hours a week on reactive work. Reclaimed time is not just efficiency. It is how teams reduce burnout and retain skilled people.
If you need a place to start, skip the exotic tricks. Focus on the basics: log sizing, fixed autogrowth, consistent data files, TempDB configuration, and clear storage monitoring.
Those are the levers that move teams from constant firefighting to work that actually improves the environment.
Database Storage Optimization FAQ
How many TempDB data files should I use?
A practical starting point is one TempDB data file per logical CPU core up to eight, with all files the same size. After that, let contention and monitoring guide further changes.
Should I ever use more than one transaction log file for a database?
In nearly all cases, no. A single, well-sized log file with fixed growth increments is easier to manage and usually performs better. The main exception is when a transaction log fills an entire disk. If there’s no room left to expand the log file, and a full log file is stopping write activity for that database, adding a second log file on another disk can be a reasonable short-term move. Even then, I still try to rebuild the log later so the database returns to a single log file.
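That short-term move might look like the following, with hypothetical names and paths. Note that SQL Server only lets you remove a log file once it is empty, so take log backups (or otherwise clear the log) before the cleanup step:

```sql
-- Emergency only: the log disk is full and writes are blocked.
-- Add a temporary second log file on another volume.
ALTER DATABASE SalesDb
    ADD LOG FILE (NAME = SalesDb_log2, FILENAME = 'M:\Log\SalesDb_log2.ldf',
                  SIZE = 4GB, FILEGROWTH = 1GB);

-- Later, once the original log has free space and the new file is empty,
-- return to a single log file.
ALTER DATABASE SalesDb REMOVE FILE SalesDb_log2;
```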
Should I avoid percentage-based autogrowth?
Not always, but it creates unpredictability: the growth increments get larger as the file grows. Fixed-size growth is easier to reason about and avoids those oversized jumps.
How do I know whether storage is really the bottleneck?
Look at wait statistics and file-level I/O metrics over time. If latency and waits line up with specific files or operations, you’ve got a concrete place to investigate.
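A quick first look at the wait-statistics side, filtered to the I/O-related wait types most often tied to storage. Like the file-stats DMV, these counters are cumulative since startup, so trend them over time:

```sql
-- Sustained high averages on PAGEIOLATCH_* (data reads) or WRITELOG
-- (log writes) suggest the storage layer is worth investigating next.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGEIOLATCH%'
   OR wait_type IN ('WRITELOG', 'IO_COMPLETION')
ORDER BY wait_time_ms DESC;
```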
Where should I start if I’m already overloaded?
Start where the visibility and risk are highest: TempDB, transaction log sizing, and basic I/O monitoring. Those changes often pay off quickly in performance, stability, and time returned to the team.
More Resources
- State of Database Report: The report shows how to move toward proactive database performance management, so teams spend less time on reactive tickets and more on architecture and improvement.
- Unlock Peak Performance with Smarter Database Optimization: A practical guide for DBAs and IT pros. Learn best practices in indexing, SQL tuning, execution plans, and observability in today’s hybrid IT environments.
- Bridging the Observability Gap: A detailed guide to modernizing database management with the help of observability software.