SQL Interview Questions - Fail Over Cluster
Groups that contain an IP address resource and a network name resource (along with
other resources) are published to clients on the network under a unique server name.
Because these groups appear as individual servers to clients, they are called virtual
servers. Users access applications or services on a virtual server the same way they
access applications or services on a physical server. They do not need to know that
they are connecting to a cluster and have no knowledge of which node they are
connected to.
In Cluster Administrator, right-click the SQL Server group and choose Take Offline from the
pop-up menu.
After adding the shared disk in the storage, we can add disk to the respective SQL
Server Group.
Maximum 16.
Services and applications are managed as single units for configuration and recovery
purposes. If a resource depends on another resource, both resources must be a member
of the same service or application. For example, in a file share resource, the service or
application containing the file share must also contain the disk resource and network
resources (such as the IP address and NetBIOS name) to which clients connect to
access the share. All resources within a service or application must be online on the
same node in the cluster.
10. What kinds of permissions are required in the active directory to setup the
SQL Server cluster objects?
11. Why do we keep SQL services in manual mode on each of the instances?
SQL services should always be in manual mode on a cluster because they are managed
by the Cluster service, which brings them online on the current owner node after a
failover.
LooksAlive: Verifies that the SQL Server service runs on the online node every 5
seconds by default.
Validation test is a mechanism of verifying that all the components which are
participating in the Windows cluster are fine and failover is happening between the
nodes.
18. What are the basics tests done by the validation tests in Windows Cluster?
21. Will there be any downtime in Active\Active cluster in case of any failover?
Yes, there will be downtime while SQL Server fails over from one node to
another.
22. Can we use other SQL Server cluster nodes for reporting purposes, as we can
do with log shipping and database mirroring?
23. Can we place our non-critical SQL Server user databases on a clustered
instance on disks that are not clustered, to save money?
No, it is not possible. SQL Server 2012 and all previous versions of SQL Server
require that databases be created on clustered resources. Internal drives or drives that
are not part of the cluster group cannot hold user databases.
With the introduction of SQL Server 2012 Microsoft officially supports local disk
TempDB in SQL Server cluster configurations.
25. Can we configure Windows cluster between two servers which are having
different hardware and software configurations?
No it is not possible.
SMB stands for Server Message Block. Starting with SQL Server 2012, an SMB file
share can be used as a storage option for system databases (master, model, msdb,
and tempdb) and for Database Engine user databases.
27. How can we check the current node/host name where SQL Server is running?
SELECT SERVERPROPERTY('ComputerNamePhysicalNetBIOS')
C:\Windows\System32>cluster node
--or
C:\Windows\System32>cluster group
--or
C:\Windows\System32>cluster resource
--or
CLUSTERING
63) What is the difference between the SQL Server clustering methods Active/Active and
Active/Passive?
Active/Active means that both nodes are active and accessing the shared disk resources, but are
running independent instances. When a node fails, you need to be sure that the remaining node
has the resources available to handle the additional databases that fail over. You can think of it
like this. Node A has 1 database on it, and Node B has 1 database on it. Node A goes down, the
resources fail over to Node B, and now Node B has 2 databases running on it.
In an Active/Passive cluster, you would only have 1 database running on a single node at any
given time. Node A is active with 1 DB, Node B is passive with no DBs. Node A goes down, the
resources fail over to Node B. Node B is now active with 1 database running on it.
I'm sure more experienced cluster admins or SQL admins will savage me for my terminology, but
that's it in a nutshell.
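The capacity concern above (the surviving node must be able to carry everything that fails over to it) can be sketched as a simple check. This is only an illustration in Python: the function name, the choice of memory as the resource, and the instance names are assumptions, not part of any cluster tooling.

```python
def can_absorb_failover(surviving_node_capacity_gb, instance_memory_gb):
    """Return True if one node can host every instance at the same time.

    surviving_node_capacity_gb: memory available on the remaining node
        (hypothetical figure supplied by the caller).
    instance_memory_gb: dict of instance name -> memory that instance needs.
    """
    total_needed = sum(instance_memory_gb.values())
    return total_needed <= surviving_node_capacity_gb


# Node A runs SQL1, Node B runs SQL2; after a failure one node hosts both.
instances = {"SQL1": 24, "SQL2": 32}
fits_on_64gb_node = can_absorb_failover(64, instances)
fits_on_48gb_node = can_absorb_failover(48, instances)
```

The same check applies to CPU or disk throughput; memory is used here only because it is the easiest figure to total.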
LooksAlive check:
LooksAlive is a basic check in which the Cluster service queries the Windows Service Control Manager to
check whether the SQL Server service is still running. By default this check happens every 5 seconds.
When the LooksAlive test fails, the IsAlive test is called immediately.
IsAlive check: A more rigorous IsAlive function is called every 60 seconds. It monitors the health of
SQL Server by opening a connection to the instance and issuing a SELECT @@SERVERNAME query over
the connection. The Cluster service makes this connection with the help of
c:\windows\system32\sqsrvres.dll. If the check fails, the online thread reports the failure to the
Cluster service.
The LooksAlive and IsAlive polling intervals can be changed in Cluster Administrator or Failover
Cluster Manager from the Advanced tab for the SQL Server resource, or by using the cluster.exe
command prompt utility.
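The two-level polling described above can be sketched as a small simulation. This is an illustration in Python, not the Cluster service's implementation: the function names and the idea of counting seconds from zero are assumptions, while the 5-second and 60-second intervals and the immediate escalation from LooksAlive to IsAlive come from the text.

```python
def checks_due(second, looks_alive_every=5, is_alive_every=60):
    """Return the health checks that fall due at a given second.

    Hypothetical helper; the defaults are the intervals quoted in the text.
    """
    due = []
    if second % looks_alive_every == 0:
        due.append("LooksAlive")
    if second % is_alive_every == 0:
        due.append("IsAlive")
    return due


def run_checks(second, service_running, query_ok):
    """Simulate one polling tick and return the resulting resource state.

    service_running: would the Service Control Manager report the service up?
    query_ok: would SELECT @@SERVERNAME succeed over a new connection?
    """
    due = checks_due(second)
    if "LooksAlive" in due and not service_running:
        # A failed LooksAlive triggers IsAlive immediately.
        due.append("IsAlive")
    if "IsAlive" in due and not query_ok:
        return "failed"
    return "online"
```

Note that a hung instance whose service still appears to be running is only detected on an IsAlive tick, which is why the thorough check exists alongside the cheap one.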
65) What is meant by Active Passive and Active Active clustering setup?
An Active/Passive cluster is a failover cluster configured so that only one cluster node is
active at any given time. The other node, called the passive node, is always online but idle,
waiting for a failure of the active node. When that happens, the passive node takes over the
SQL Server services and becomes the active node, and the previous active node becomes the
passive node.
An Active/Active cluster is a failover cluster configured so that both cluster nodes are
active at any given time. That is, one instance of SQL Server is always running on each of the
nodes. When one of the nodes fails, both instances run on the one remaining node
until the failed node is brought back up (after fixing the issue that caused the failure). The
instance is then failed back to its designated node.
66) List out some of the requirements to setup a SQL Server failover cluster.
Virtual network name for SQL Server, virtual IP address for SQL Server, IP addresses for the
public network and private network (also referred to as the heartbeat) for each node in the failover
cluster, shared drives for SQL Server data and log files, a quorum disk, and an MSDTC disk.
67) On a Windows Server 2003 Active Passive failover cluster, how do you find the node
which is active?
Using Cluster Administrator, connect to the cluster and select the SQL Server group. Once you
have selected the group, the Owner column on the right-hand side of the console shows the
node on which the SQL Server group is currently active.
68) How do you open a Cluster Administrator?
From Start -> Run and type CluAdmin (case insensitive) and the Cluster Administrator console
is displayed OR you can also go to Start -> All programs -> Administrative Tools -> Cluster
Administrator.
69) How will you restart SQL Server on a cluster without failing over?
Take the SQL Server resource offline and bring it back online (right-click it and choose
Take Offline, then Bring Online).
70) What will you do if you want to add a disk to the SQL Server cluster group?
After adding the disk to the group, use the Add Dependency option in Cluster Administrator (or in
Failover Cluster Manager from Windows Server 2008 onward).
71) As a DBA, how will you design for an active/active cluster requirement, i.e., how will you
manage resources after a failover?
Please read the MSDN article on this topic for a better understanding.
73) What is the difference between SQL Server 2005 and SQL Server 2008 cluster installation?
In SQL Server 2005 we have the option of installing SQL Server on the remaining nodes from the
primary node. In SQL Server 2008 we need to log in to each node separately to install the SQL
Server cluster.
74) What is the status of services on passive node for failover cluster in SQL server?
SQL services will be in manual and stopped. Cluster service will be in automatic and started
mode on both the nodes.
75) Can you move the resources after pausing the node?
Yes, resources can be moved after pausing the node, but we cannot move them back to that node
until it is resumed.
77) How does the failover happen? What checks are performed to ensure that another
node is up?
LooksAlive - Verifies that the node (server) hosting the SQL Server resources is up.
IsAlive - Verifies that the SQL Server service on the node hosting the SQL Server resources is up,
basically by running SELECT @@SERVERNAME.
78) What will happen if you try to start the full-text service on the passive node?
It can be started on both nodes because it has no dependency on the SQL Server service or on
any resource owned by the active node.
86) How is the quorum information located on the system disk of each node kept in
synch?
The server cluster infrastructure ensures that all changes are replicated and updated on all
members in a cluster.
89) What is the difference between a geographically dispersed cluster and an MNS
cluster?
A geographic cluster refers to a cluster that has nodes in multiple locations, while an MNS-based
cluster refers to the type of quorum resources in use. A geographic cluster can use either a
shared disk or MNS quorum resource, while an MNS-based cluster can be located in a single
site, or span multiple sites.
95) What new functionality does failover clustering provide in Windows Server 2008?
New validation feature. With this feature, you can check that your system, storage, and network
configuration is suitable for a cluster.
Support for GUID partition table (GPT) disks in cluster storage. GPT disks can have partitions
larger than two terabytes and have built-in redundancy in the way partition information is stored,
unlike master boot record (MBR) disks.
96) What happens to a running cluster if the quorum disk fails in a Windows Server 2003
cluster?
In Windows Server 2003, the Quorum disk resource is required for the cluster to function. In your
example, if the Quorum disk suddenly became unavailable to the cluster then both nodes would
immediately fail and not be able to restart the clussvc.
In that light, the Quorum disk was a single point of failure in a Microsoft Cluster implementation.
However, it was usually a fairly quick workaround to get the cluster back up and operational.
There are generally two solutions to that type of problem.
1. Determine why the Quorum disk failed and repair it.
2. Reprovision a new LUN, present it to the cluster, assign it a drive letter and format it. Then start
one node with the /FQ switch and, through cluadmin, designate the new disk resource as the
Quorum. Then stop and restart the clussvc normally and bring the second node online.
97) What happens to a running cluster if the quorum disk fails in a Windows Server 2008
cluster?
The cluster continues to work, but failover will not happen if any other failure occurs on the
active node.
99) What are the standard settings for LooksAlive, IsAlive and Pending Timeout?
LooksAlive - 5 sec, IsAlive - 60 sec, Pending Timeout - 180 sec
Note: Do not modify Pending Timeout. The value, in seconds, is the amount of time a resource
in the Offline Pending or Online Pending state has to resolve its status before the Cluster
service puts the resource in Offline or Failed status.
25. Does MNS get rid of the need for shared disks?
It depends on the application. For example, clustered SQL Server 2000 requires shared
disk for data. Remember, MNS only removes the need for a shared disk quorum.
Services are expected to be available 24x7, so service providers need a technically robust
way to avoid depending on a single server. Failover clustering addresses this problem. A
failover cluster is a group of independent computers that work together to increase the
availability of applications and services. The clustered servers, called nodes, are
connected by physical cables and by software. If one cluster node fails, another node
begins to provide service.
To configure the failover policy, in the Threshold box, enter the number of times the
group is allowed to fail over within a set span of hours. In the Period box, enter the set
span of hours. For example, if Threshold is set to 10 and Period is set to 6, the Cluster
Service fails the group over a maximum of 10 times in a 6-hour period. At the 11th
failover in that 6-hour period, the server cluster leaves the group offline. This affects
only resources that were failed over; therefore, if the SQL Server resource failed 11
times, it would be left offline, but the IP could be left online.
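The Threshold and Period rule above can be sketched as a small check. This is an illustration in Python, not the Cluster service's actual algorithm: the function name is hypothetical, and treating the period as a sliding window that ends at each failure is an assumption made for the sketch.

```python
from datetime import datetime, timedelta


def group_left_offline(failure_times, threshold=10, period_hours=6):
    """Return True if the group would be left offline.

    failure_times: datetimes of group failures in chronological order.
    The group may fail over `threshold` times within `period_hours`;
    one more failure inside that span leaves it offline.
    """
    window = timedelta(hours=period_hours)
    for i, current in enumerate(failure_times):
        # Failures (including this one) inside the window ending now.
        recent = [t for t in failure_times[: i + 1] if current - t < window]
        if len(recent) > threshold:
            return True
    return False


start = datetime(2024, 1, 1)
# Eleven failures, 30 minutes apart: the 11th lands inside the 6-hour span.
eleven_close = [start + timedelta(minutes=30 * n) for n in range(11)]
# Eleven failures, one hour apart: never more than six in any 6-hour span.
eleven_spread = [start + timedelta(hours=n) for n in range(11)]
```

With Threshold 10 and Period 6, `eleven_close` trips the limit and `eleven_spread` does not.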
33. What is the status of the Cluster service and the SQL service on both
nodes? Would both be stopped on the passive node?
The Cluster service is set to automatic and is started on all nodes, but the SQL service
runs only on the active node.
34. Is it possible to put the Cluster Group and the SQL Group on different nodes?
Yes, it is possible. One group can run on one node while another group runs on another
node.
A:
Looks Alive check: The Looks Alive check is a basic resource health check to verify that the service
(the SQL service in our context) is running properly. To perform it, the Cluster service queries the
Windows Service Control Manager for the status of the service. By default the Looks Alive check
happens every five seconds.
Is Alive check: An exhaustive check to verify that a resource is running properly. If this check fails,
the resource is moved offline and the failover process is triggered. During the Is Alive check the
Cluster service connects to the SQL Server instance and executes SELECT @@SERVERNAME. It
checks only the availability of the SQL Server instance, not the availability of user databases.
You can specify two polling intervals and a timeout value for resources. The polling intervals affect
how often the MSCS Resource Monitor checks that the resource is available and operating. There are
two levels of polling; they are known in Cluster Administrator as "Looks Alive" and "Is Alive." These
values are named for the calls that the Resource Monitor makes to the resource to perform the
polling. In "Looks Alive" polling, MSCS performs a cursory check to determine if the resource is
available and running. In "Is Alive" polling, MSCS performs a more thorough check to determine if the
resource is fully operational. The timeout value specifies how many seconds MSCS waits before it
considers the resource failed.
A:
Quorum is the cluster's configuration file. This file (quorum.log) resides on the quorum disk (one
disk from the shared disk array). Quorum is the main interpreter between all nodes. It stores the
latest cluster configuration and resource data, which helps the other nodes take ownership when
one node goes down.
Setup.exe /ConfigurationFile=MyConfigurationFile.INI
Instead of specifying passwords inside the config file specify them explicitly as
below.
Setup.exe /SQLSVCPASSWORD=************ /AGTSVCPASSWORD=************
/ASSVCPASSWORD=************ /ISSVCPASSWORD=************
/RSSVCPASSWORD=************ /ConfigurationFile=MyConfigurationFile.INI
Q. What are the top performance counters to monitor in Performance
Monitor?
Ans:
Processor\%Processor Time: Monitoring CPU consumption allows you to check for
a bottleneck on the server (indicated by high sustained usage).
High percentage of Signal Wait: Signal wait is the time a worker spends waiting
for CPU time after it has finished waiting on something else (such as a lock, a latch
or some other wait). Time spent waiting on the CPU is indicative of a CPU
bottleneck. Signal wait can be found by executing DBCC SQLPERF (waitstats) on
SQL Server 2000 or by querying sys.dm_os_wait_stats on SQL Server 2005.
Physical Disk\Avg. Disk Queue Length: Check for disk bottlenecks: if the value
exceeds 2 then it is likely that a disk bottleneck exists.
MSSQL$Instance: Buffer Manager\Page Life Expectancy: Page Life Expectancy is
the number of seconds a page stays in the buffer cache. A low number indicates
that pages are being evicted without spending much time in the cache, which
reduces the effectiveness of the cache.
MSSQL$Instance: Plan Cache\Cache Hit Ratio: A low Plan Cache hit ratio means
that plans are not being reused.
MSSQL$Instance:General Statistics\Processes Blocked: Long blocks indicate
contention for resources.
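The signal wait figure mentioned in the list above is usually read as a share of total wait time. A minimal sketch in Python, assuming the rows have already been fetched from sys.dm_os_wait_stats as pairs of its wait_time_ms and signal_wait_time_ms columns; the function name and the sample numbers are made up for illustration.

```python
def signal_wait_percent(wait_rows):
    """Percentage of total wait time that was spent waiting for CPU.

    wait_rows: iterable of (wait_time_ms, signal_wait_time_ms) pairs,
    one pair per wait type.
    """
    rows = list(wait_rows)  # allow generators to be passed in
    total_wait = sum(wait for wait, _ in rows)
    total_signal = sum(signal for _, signal in rows)
    if total_wait == 0:
        return 0.0
    return 100.0 * total_signal / total_wait


# Made-up sample: two wait types with 1000 ms and 3000 ms of total wait.
sample_percent = signal_wait_percent([(1000, 100), (3000, 300)])
```

A rising percentage over time points toward CPU pressure; the sketch deliberately sets no threshold, since what counts as "high" depends on the workload.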
Q. Task manager is not showing the correct memory usage by SQL Server. How
to identify the exact memory usage from SQL Server?
Ans:
To find the exact memory usage, rely on the column physical_memory_in_use_kb
from the DMV sys.dm_os_process_memory.
Instance: sqlservr
Instance: sqlservr
The Private Bytes counter measures the memory that is currently committed. The
Working Set counter measures the physical memory that is currently occupied by
the process.
For 64-bit sql servers we can also check the current memory usage using the below
performance counter.
We must be very careful in dealing with this option. One can enable this after a
detailed analysis of current environment.
Following issues may rise when Lock Pages in Memory is not turned on:
Q. How do you know how much memory has been allocated to sql server using
AWE?
Ans:
We can use the DBCC MEMORYSTATUS command to get the memory allocation
information, but the results are tricky to interpret.
From 2008 onwards we can get all memory related information using DMV
sys.dm_os_process_memory.
Q. How to apply service pack on Active / Passive cluster on 2008 and 2012?
Ans:
1. Freeze the service groups on Node A (active node).
2. Confirm all SQL services are stopped on Node B.
4. Reboot node B.
7. After the service group comes online, freeze the service group on Node B.
a. RDP to the console is ok, but a standard RDP connection is not recommended.
6. Verify all users are logged out from all other nodes (RDP and Console sessions)
a. You should not need to perform the install on any other nodes, nor reboot them.
The service pack will update the passive nodes first.
Q. You find SP is not applied on all the nodes across the cluster. How to apply
SP only on required nodes?
Ans:
If you find that the product level is not consistent across all the nodes, you will
need to fool the 2005 patch installer into only patching the nodes that need
updating. To do so, you will have to perform the following steps:
Method2:
1. Offline the SQL resources
2. Update the service account at SSCM and restart the service as needed
Note: Don't forget to update the service account on the remaining nodes of the
cluster.
Method 3:
1. Node 2 (inactive node) change the SQL startup account in SQL Studio or SCM
2. To start with a clean slate and ensure any previous updates are completed, both
nodes should be restarted if possible. Choose the physical node that you want
to patch second and restart that node (in my example node2).
3. Restart the node you want to patch first (node1). This will mean that both
active SQL instances are now running on node2. Some restarts will be essential,
but you could avoid the first two restarts if you need to keep downtime to a
minimum and just fail SQL1 over to node2. The main point here is to always patch
a passive node.
4. In cluster administrator remove node1 from the possible owners lists of SQL1
and SQL2. This means that neither SQL instance can fail over to node1 while it is
being patched.
6. Restart node1.
7. Add node1 back into the possible owners lists of SQL1 and SQL2 and fail both
instances over to node1.
9. Add node2 back into the possible owners lists of SQL1 and SQL2 and fail both
instances over to node2. Check that the build level is correct and review the SQL
Server error logs.
10. Fail SQL1 over to node1. Check build levels and SQL Server error logs
TextData
ApplicationName
NTUserName
LoginName
CPU
Reads
Writes
Duration
SPID
StartTime
EndTime
Database Name
Error
HostName
LinkedServerName
NTDomainName
ServerName
SQLHandle
Not all of these columns are available for every event; choose the appropriate
columns for the events you select.
Filters:
ApplicationName
DatabaseName
DBUserName
Error
HostName
NTUserName
NTDomainName
For transactional replication, the behavior of log shipping depends on the sync
with backup option. This option can be set on the publication database and
distribution database; in log shipping for the Publisher, only the setting on the
publication database is relevant.
Setting this option on the publication database ensures that transactions are not
delivered to the distribution database until they are backed up at the publication
database. The last publication database backup can then be restored at the
secondary server without any possibility of the distribution database having
transactions that the restored publication database does not have. This option
guarantees that if the Publisher fails over to a secondary server, consistency is
maintained between the Publisher, Distributor, and Subscribers. Latency and
throughput are affected because transactions cannot be delivered to the
distribution database until they have been backed up at the Publisher.
Q. What are the best RAID levels to use with SQL Server?
Ans:
Before choosing the RAID (Redundant Array of Independent Disks) we should have a
look into usage of SQL Server files.
As a basic rule of thumb, data files need random access, log files need
sequential access, and TempDB must be on the fastest drive, separated
from data and log files.
We have to consider the below factors while choosing the RAID level:
Reliability
Storage Efficiency
Random Read
Random Write
Sequential Read
Sequential Write
Cost.
1. Replication monitor
2. Replication commands
3. Tracer Tokens
1. Replication Monitor: In replication monitor from the list of all subscriptions just
double click on the desired subscription. There we find three tabs.
Publisher to Distributor History
Distributor to Subscriber History
Undistributed commands
2. Replication Commands:
Publisher.SP_ReplTran: Checks the pending transactions at p
Distributor.MSReplCommands and MSReplTransactions: Gives the transactions
and commands details. Actual T_SQL data is in binary format. From the entry time
we can estimate the latency.
Distributor.SP_BrowseReplCmds: It shows the xact_seqno along with the
corresponding T-SQL command
sp_replmonitorsubscriptionpendingcmds: It shows the total number of pending
commands to be applied at subscriber along with the estimated time.
3. Tracer Tokens:
Available from Replication Monitor or via TSQL statements, Tracer Tokens are
special timestamp transactions written to the Publishers Transaction Log and
picked up by the Log Reader. They are then read by the Distribution Agent and
written to the Subscriber. Timestamps for each step are recorded in tracking
tables in the Distribution Database and can be displayed in Replication Monitor or
via TSQL statements.
When Log Reader picks up Token it records time in MStracer_tokens table in the
Distribution database. The Distribution Agent then picks up the Token and records
Subscriber(s) write time in the MStracer_history tables also in the Distribution
database.
Below is the T-SQL code to use Tracer tokens to troubleshoot the latency issues.
A SQL Agent JOB to insert a new Tracer Token in the publication database.
USE [AdventureWorks]
Go
Go
Go
publisher_commit
subscriber_commit
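Once the commit times for a tracer token have been read from the tracking tables, latency per hop is simple subtraction. A minimal sketch in Python, assuming the three timestamps (publisher_commit, distributor_commit and subscriber_commit) have already been fetched; the function name and the sample times are hypothetical.

```python
from datetime import datetime


def tracer_latency(publisher_commit, distributor_commit, subscriber_commit):
    """Split end-to-end replication latency into its two hops, in seconds."""
    to_distributor = (distributor_commit - publisher_commit).total_seconds()
    to_subscriber = (subscriber_commit - distributor_commit).total_seconds()
    return {
        "publisher_to_distributor": to_distributor,
        "distributor_to_subscriber": to_subscriber,
        "total": to_distributor + to_subscriber,
    }


# Made-up token: 2 s to reach the distributor, 5 s more to the subscriber.
example = tracer_latency(
    datetime(2024, 1, 1, 12, 0, 0),
    datetime(2024, 1, 1, 12, 0, 2),
    datetime(2024, 1, 1, 12, 0, 7),
)
```

Splitting the total this way shows whether the Log Reader Agent or the Distribution Agent is the slower hop.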
A typical tail-log backup has two options: 1. WITH NORECOVERY 2. CONTINUE
AFTER ERROR.
1. WITH NORECOVERY: Makes sure no transactions happen after the tail-log
backup.
2. CONTINUE AFTER ERROR: Makes sure the log backup happens even though
some metadata pages are corrupted.
Q. Consider a situation where the publisher database log file has been growing
and there are just a few MB available on disk. As an experienced professional,
how do you react to this situation? Remember, no disk space is available and
we can't create a new log file on another drive.
Ans:
Essentially we have to identify the bottleneck which is filling the log file.
Resolve if there are any errors in log reader agent / distribution agent
Fix any connectivity issues between the publisher and distributor or between the
distributor and subscriber
Fix any I/O issues at any level
Check whether a huge number of transactions is pending from the publisher
Check for a large number of VLFs (use DBCC LOGINFO), which slows
the Log Reader Agent's work.
Check that all database statistics are up to date at the distributor. Usually Auto
Update Stats is switched off there by default.
To find and resolve these issues we can use Replication Monitor, DBCC
Commands, SQL Profiler, System Tables / SP / Function.
If we can't resolve the problem with one of these simple fixes, we have to shrink the
transaction log file. Below are two methods.
To shrink the transaction log file:
1. Backup the log So transactions in vlfs are marked as inactive
2. Shrink the logfile using DBCC SHRINKFILE Inactive VLFs would be removed
Two publication properties need to be changed to False: 1. Immediate_Sync
and 2. Allow_Anonymous.
Both are set to ON by default. If Immediate_Sync is enabled, every
time you add a new article the entire snapshot is applied, not just
the snapshot for that particular article.
Steps:
1. Change the values to False for the publication properties Immediate_Sync and
Allow_Anonymous using SP_CHANGEPUBLICATION
2. Add a new article to the publication using SP_AddArticle. While executing this
procedure, along with the required parameters also specify the parameter
@force_invalidate_snapshot=1.
3. Add the subscriptions to the publication for the single table/article using
SP_ADDSUBSCRIPTION. While executing this procedure specify the parameter
@reserved = 'Internal'. Generate a new snapshot, which includes only the newly added
article.
For servers that use more than eight processors, use the following configuration:
MAXDOP=8
For servers that use eight or fewer processors, use the following configuration:
MAXDOP=0 to N
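The MAXDOP rule quoted above can be written as a small helper. A minimal sketch in Python; the function name is hypothetical, and it encodes only the two cases stated in the text, with N taken to be the processor count.

```python
def suggested_maxdop_range(processor_count):
    """Return (low, high) bounds for MAXDOP under the rule quoted above.

    More than eight processors: MAXDOP = 8.
    Eight or fewer processors: MAXDOP anywhere from 0 to N.
    """
    if processor_count < 1:
        raise ValueError("processor count must be at least 1")
    if processor_count > 8:
        return (8, 8)
    return (0, processor_count)


range_for_16 = suggested_maxdop_range(16)
range_for_4 = suggested_maxdop_range(4)
```

The helper returns a range rather than a single value because the source rule leaves the choice within 0 to N to the administrator.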
Q. How distributed transactions works in SQL Server?
Ans:
Distributed transactions are transactions that work across databases or
instances within a given session. The snapshot isolation level does not support
distributed transactions.
3. Turn on random options at linked server properties like RPC, RPC Out,
Data Access etc.
Q. Can you give some examples for One to One, One to Many and Many to Many
relationships?
Ans:
One to One: Citizen UID
A citizen can have only one UID A UID can represent only one citizen
Q. I want to know the max worker threads setting and the active
worker thread count on SQL Server. Can you tell me how to capture this info?
What's the default value for max thread count?
Ans:
We can check the current settings and thread allocation using the below queries.
Thread setting
Increasing the number of worker threads may actually decrease the performance
because too many threads causes context switching which could take so much of
the resources that the OS starts to degrade in overall performance.
Alternatively, a log shipped copy of the database could save your bacon (you have
a warm standby, and you know the log backups are definitely good).
Q. Full backup size is 300 GB, usually my diff backup size varies between 300
MB and 5 GB, one day unfortunately diff backup size was increased to 250 GB?
What might be the reason any idea?
Ans:
Are you the kind of DBA who rebuilds all indexes nightly? Your differential backups
can easily be nearly as large as your full backup. That means you're taking up
nearly twice the space just to store the backups, and even worse, you're talking
about twice the time to restore the database.
To avoid these issues with diff backups , ideally schedule the index maintenance to
happen right before the full backup.
Q. What is .TUF file? What is the significance of the same? Any implications if
the file is deleted?
Ans:
The .TUF file is the Transaction Undo File, which is created when performing log
shipping to a server in Standby mode.
When the database is in Standby mode, database recovery is done when the log
is restored. This mode also creates a file with the .TUF extension on the
destination server, which is the transaction undo file.
This file contains information on all the modifications in progress at the time
the backup was taken.
The file plays an important role in Standby mode. While the log backup is being
restored, all uncommitted transactions are recorded in the undo file and only
committed transactions are written to disk, which allows users to read the
database. When we restore the next transaction log backup, SQL Server fetches
the uncommitted transactions from the undo file and checks against the new
transaction log backup whether they have been committed.
If the .TUF file is deleted, there is no way to repair log shipping except by
reconfiguring it from scratch.