If you lead database teams, the pattern is probably familiar. Your DBAs spend too much time firefighting. Complexity keeps rising. The time available for real improvement keeps shrinking.
The pressure shows up in the data. The State of Database Report puts numbers behind that reality: DBAs spend an average of 27 hours a week on reactive or routine work. Nearly three-quarters say alert fatigue affects how well they can prioritize and respond to incidents. More than a third are considering leaving their roles.
That should concern any leader who depends on database reliability.
There is also a clear perception gap around visibility. Only 40% of DBAs say their monitoring environment is fully unified, while 50% of IT executives believe it already is. That disconnect helps explain why I/O bottlenecks, TempDB issues, and transaction log problems keep resurfacing as “surprise” incidents instead of getting fixed at the root.
DBA Burnout Is More Than a People Problem
This is why we opened the series of High-Performance DBA webinars by looking at burnout, the tyranny of the urgent, and what the State of Database Report says about constant firefighting across database teams.
If you have not read that earlier post, it is useful background. It explains why so many DBAs stay stuck in reactive mode and why unified visibility matters. The wider observability gap is not just a tooling problem. It is often the reason database issues stay hidden until users feel the impact.
In this post, I want to stay at one of the most practical layers in SQL Server: storage.
Storage is where day-to-day pressure becomes visible. It is where recurring I/O bottlenecks, TempDB contention, and log growth surprises tend to show up first. It is also one of the few places where a handful of deliberate changes can reduce incident volume quickly.
Storage Is More Than Capacity
Many projects still treat storage as a capacity checkbox. If there is enough free space, the conversation ends. But SQL Server just does not work that way.
On busy systems, storage is often the first major bottleneck. Repeated incidents around slow queries, blocked sessions, or missed SLAs usually have storage symptoms close by. Common examples include inconsistent I/O latency, undersized log files, erratic autogrowth, or a TempDB layout that no longer matches the workload.
Once those factors pile up, nobody cares whether there is technically “enough disk.” Users see slower response times. Business stakeholders see instability. DBAs see another week consumed by symptoms instead of improvement work.
If that sounds familiar, storage is not a side issue. It is one of the main pressure points in the job.
Maturity Matters More Than Tricks
Across environments, the gap between constant firefighting and steady improvement rarely comes down to one clever tuning move. More often, it comes down to maturity.
In low-maturity environments, storage decisions are ad hoc. Settings stay at their defaults. TempDB grows until it hurts. Fixes happen under pressure and rarely get documented.
In higher-maturity environments, teams do a few simple things well. They define sensible defaults. They monitor the right signals. They treat configuration changes as planned decisions instead of emergency reactions.
At the lowest level of maturity, you react. As maturity improves, you add baselines, proactive monitoring, automation, and consistent configuration. Incidents still happen, but they are less frequent and easier to diagnose.
Storage is a good place to start that shift because the payoff tends to show up quickly in both performance and day-to-day quality of life.
Why Storage Problems Keep Returning
Storage issues rarely exist in isolation. They interact with application design, indexing, memory pressure, and team habits.
Picture an environment where query patterns are poorly understood, indexes have accumulated over time, and TempDB shares a busy disk with several user databases. Add percentage-based autogrowth and a monitoring setup that only shows CPU and basic disk counters. In that environment, you may fix one incident today and watch a slightly different version return next week.
That cycle is exactly why fragmented monitoring is so costly. In siloed environments, DBAs spend more time stitching together alerts and dashboards just to understand where the problem lives. The survival move is obvious: patch the symptom and move on.
The way out is not flashy. It is a set of stable practices that make system behavior more predictable.
Getting the Log Under Control
The transaction log is one of the most misunderstood parts of SQL Server storage. Under the hood, it is sequential. It performs best when writes are steady and predictable.
In most environments, that means using one log file per database, giving it a realistic starting size, and using fixed growth increments instead of percentage-based growth.
Small and frequent autogrowth events create too many virtual log files over time. That can hurt write performance and slow recovery.
When I review an environment, I usually care less about the raw size of the log and more about what that size pattern reveals. A log file that keeps growing in tiny jumps during business hours tells you the system was never given enough room to operate cleanly.
Set more deliberate starting sizes and growth increments, and the result is often immediate. Log growth interruptions drop. Backup behavior becomes easier to predict. Recovery planning improves.
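As a sketch, the log sizing approach above might look like this in T-SQL. The database and logical file names are hypothetical, and the sizes should be chosen from your own workload's history, not copied as-is:

```sql
-- Hypothetical database/file names; pick SIZE and FILEGROWTH from workload history.
-- Pre-size the log and use a fixed growth increment instead of a percentage.
ALTER DATABASE SalesDB
    MODIFY FILE (NAME = SalesDB_log, SIZE = 16GB, FILEGROWTH = 1GB);

-- Check how fragmented the log already is (SQL Server 2016 SP2 and later):
SELECT COUNT(*) AS vlf_count
FROM sys.dm_db_log_info(DB_ID('SalesDB'));
```

A log that reports thousands of virtual log files is usually a sign of the tiny, frequent autogrowth pattern described above; shrinking and re-growing it in large fixed increments during a maintenance window can reset that.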
Making Data Files Predictable
Data files are where you can benefit from parallelism, but that benefit disappears when files drift into different sizes and different growth patterns.
The better pattern is straightforward. Choose a reasonable number of data files for the workload, make them the same size, give them enough initial capacity, and grow them by the same fixed amount when growth is needed.
Consistency makes the storage layer easier to understand. It also makes troubleshooting easier. You do not have to guess which file has become the outlier or which hidden growth setting is causing uneven behavior.
Over time, that uniformity gives you a believable baseline. Once normal file behavior is clear, true anomalies stand out much faster.
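A minimal sketch of enforcing and verifying that uniformity, again with hypothetical database and file names:

```sql
-- Equal SIZE and equal fixed FILEGROWTH across all data files is the point.
ALTER DATABASE SalesDB MODIFY FILE (NAME = SalesDB_data1, SIZE = 50GB, FILEGROWTH = 4GB);
ALTER DATABASE SalesDB MODIFY FILE (NAME = SalesDB_data2, SIZE = 50GB, FILEGROWTH = 4GB);

-- Spot files that have drifted from the common size or growth setting.
-- size and growth are stored in 8 KB pages, so divide by 128 for MB.
SELECT name,
       size / 128   AS size_mb,
       growth / 128 AS growth_mb,
       is_percent_growth
FROM sys.database_files
WHERE type_desc = 'ROWS';
```

Run the second query inside the database in question; any row with `is_percent_growth = 1` or an outlier `size_mb` is a candidate for the next planned change window.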
Why Boring Storage Usually Wins
It is tempting to design an intricate layout with multiple abstraction layers, elaborate filegroup strategies, or lots of carve-outs for specific workloads. Those designs can look smart on day one. They are often painful to live with over several years of change, team turnover, and shifting workloads.
The most reliable SQL Server environments I see are usually conservative. Their layouts are stable. Their growth patterns are deliberate. Exceptions are limited.
That kind of stability is not glamorous, but it is one of the traits that separates high-performance DBAs from overwhelmed ones. It reduces noise. It makes patterns easier to spot.
Crucially, it keeps surprises to a minimum.
TempDB: Where the Real Cost Shows Up
If one database exposes the combined effect of your storage, memory, and query patterns, it is TempDB.
TempDB supports sorts, hashes, work tables, temporary objects, row versioning, and many other internal operations. Even if developers never create a #temp table, TempDB is still doing a large amount of work behind the scenes.
When TempDB is misconfigured, the symptoms are familiar: latch contention on specific pages, sudden space pressure, repeated autogrowth events, spills that show up under load, allocation-related waits, and intermittent slowdowns that are hard to explain at first glance.
I treat TempDB as a first-class workload. That means using multiple data files of equal size, assigning a realistic initial size, using fixed growth increments, and placing it on storage that can keep up with the workload.
Guidance has evolved across SQL Server versions, but the goal has not changed. Reduce allocation contention. Give TempDB room to breathe. Match the configuration to the workload you actually run.
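A sketch of that first-class treatment in T-SQL, assuming the default logical file names (`tempdev`, `temp2`, and so on) and placeholder sizes you would replace with values from your own monitoring:

```sql
-- Size every tempdb data file identically and use the same fixed growth.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 8GB, FILEGROWTH = 1GB);
ALTER DATABASE tempdb MODIFY FILE (NAME = temp2,   SIZE = 8GB, FILEGROWTH = 1GB);
-- ...repeat for each existing data file.

-- If more files are needed, add them at the same size and growth:
-- ALTER DATABASE tempdb ADD FILE
--     (NAME = temp5, FILENAME = 'T:\tempdb\temp5.ndf', SIZE = 8GB, FILEGROWTH = 1GB);
```

Note that `MODIFY FILE` size changes on tempdb take full effect after the next service restart, which is worth folding into a planned maintenance window rather than an incident response.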
Separating Competing I/O
Another practical improvement is to separate I/O patterns that should not compete with each other.
When transaction logs, TempDB, and user data all share the same busy path to disk, read-heavy and write-heavy activity begin fighting for the same resources. Add shared infrastructure or a noisy SAN layer, and unpredictability increases again.
Perfect separation is not always possible, but useful separation often is. Moving TempDB to its own volume can help. Keeping transaction logs away from the noisiest random reads can help. Placing read-mostly archival data on a different tier can help.
The point is not theoretical purity. Your goal is to stop your most sensitive operations from tripping over each other all day.
Using Monitoring to Replace Guesswork
None of these changes should happen in the dark.
This is where monitoring turns storage tuning into analysis instead of intuition. Rather than relying on one disk queue metric or a screenshot from another team, you combine SQL Server’s view with system-level telemetry and recent change history.
In practice, that means watching wait statistics, file-level I/O metrics such as sys.dm_io_virtual_file_stats, TempDB behavior over time, transaction log growth patterns, and recent deployments or configuration changes.
The observability gap matters here too. When teams lack end-to-end visibility, database issues stay hidden longer and are harder to connect to root causes. A unified view helps teams move faster from “something is slow” to “this change altered file behavior.”
That linkage is what pushes teams away from one-off reactions and toward repeatable operating habits.
Cloud Changes the Bill, Not the Physics
Cloud platforms make storage easier to provision. They do not change the fundamentals. You still have IOPS limits. You still have bandwidth limits. You still need to think carefully about TempDB placement and transaction log sizing. The difference is that poor decisions now show up in two places: your performance graphs and your monthly invoice.
When I work with teams running SQL Server in Azure or other cloud environments, the same principles still apply. Understand the workload. Match storage performance to that reality. Avoid needless contention. Measure the effect of the changes you make.
Cloud makes bad storage decisions easier to hide for a while. It does not make them cheaper.
What High-Performance DBAs Actually Do
The DBAs I would trust with a business-critical environment tend to have similar habits.
They know how their log and data files are sized. They know how those files grow. They have configured TempDB for the workload they run now, not the workload they had years ago. They can answer “what changed recently?” with evidence, not guesswork.
They also treat alerts differently. They do not accept alert noise as the cost of the job. They tune it. They reduce it. They make it useful.
Most importantly, they use storage decisions to buy back time.
Every incident that does not recur is an hour returned to query tuning, automation, architecture, or proactive planning. That matters even more when the average DBA is already spending more than half the week on reactive work. Reclaimed time is not just an efficiency win. It is part of how teams reduce burnout and retain skilled people.
If you need a place to start, do not chase the most exotic trick you saw online. Start with the basics: log sizing, fixed autogrowth settings, data file consistency, TempDB configuration, and clear storage monitoring.
Those are the levers that move a team from constant firefighting toward the kind of high-performance DBA work that actually improves the environment.
FAQ
How many TempDB data files should I use?
A practical starting point is one TempDB data file per logical CPU core up to eight, with all files the same size. After that, let contention and monitoring guide further changes.
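A quick way to check where you stand against that starting point:

```sql
-- Compare logical CPU count with the current number of tempdb data files.
SELECT (SELECT cpu_count FROM sys.dm_os_sys_info) AS logical_cpus,
       (SELECT COUNT(*)
        FROM tempdb.sys.database_files
        WHERE type_desc = 'ROWS') AS tempdb_data_files;
```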
Should I ever use more than one transaction log file for a database?
In nearly all cases, no. A single, well-sized log file with fixed growth increments is easier to manage and usually performs better.
Is percentage-based autogrowth always wrong?
Not always, but it creates unpredictability. Fixed-size growth is easier to reason about and avoids oversized jumps as files get larger.
How do I know whether storage is really the bottleneck?
Look at wait statistics and file-level I/O metrics over time. If latency and waits line up with specific files or operations, you have a concrete place to investigate.
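A minimal starting query for the wait-statistics side, filtered to a few I/O-related wait types; compare the numbers against a baseline rather than reading a single snapshot in isolation:

```sql
-- Cumulative I/O-related waits since the last restart.
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN ('PAGEIOLATCH_SH', 'PAGEIOLATCH_EX', 'WRITELOG', 'IO_COMPLETION')
ORDER BY wait_time_ms DESC;
```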
Where should I start if I am already overloaded?
Start where the visibility and risk are highest: TempDB, transaction log sizing, and basic I/O monitoring. Those changes often pay off quickly in performance, stability, and time returned to the team.
Sources
- State of Database Report: The report shows how to move toward proactive database performance management, so teams spend less time on reactive tickets and more on architecture and improvement.
- Whitepapers, Videos, & More: Head to our Resources site for content on database products. Information and insights to help you get more from SolarWinds.