
Life in a Post TMG World – Is It As Scary As You Think?


Let’s start this post about Exchange with a common question: Now that Microsoft has stopped selling TMG, should I rip it out and find something else to publish Exchange with?

I have occasionally tried to answer this question with an analogy. Let’s try it.

My car (let’s call it Threat Management Gateway, or TMG for short) isn’t actively developed or sold any more. However, it works fine right now, it does what I need (publishes Exchange securely), and I can get parts for it and have it serviced as needed (extended support for TMG ends 2020), so I’m keeping it. When it eventually either doesn’t meet my requirements (I want to publish something it can’t do) or runs out of life (2020, or later if I’m OK accepting the risk of running without support), then I’ll replace it.

Now, it might seem odd to offer up a car analogy to explain why Microsoft no longer selling TMG is not a reason for Exchange customers to panic, but I hope you’ll agree it works: when something stops being sold, like your car, you don’t immediately replace it. Instead, you think about the situation and decide what to do next. You might well decide to go ahead and replace TMG simply based on our decision to stop selling or updating it. That’s fine, but just make sure you are thinking the decision through.

Of course, you might also decide not to buy another car. Your needs have changed. Think about that.

Here are some interesting Exchange-related facts to help further cement the idea I’m eventually going to get to.

  1. We do not require traffic to be authenticated prior to hitting services in front of Exchange Online.
  2. We do not do any form of pre-authentication of services in front of our corporate, on-premises messaging deployments either.
  3. We have spent an awfully large amount of time as a company working on securing our code, writing secure code, testing our code for security, and understanding the threats that exist to our code. This is why we feel confident enough to do #1 and #2.
  4. We have come to learn that adding layers of security often adds little additional security, but certainly lots of complexity.
  5. We have invested in getting our policies right and monitoring our systems.

This basically says we didn’t buy another car when ours didn’t meet our needs any more. We don’t use TMG to protect ourselves any more. Why did we decide that?

To explain that, you have to cast your mind back to the days of Exchange and Windows 2000. The first thing to admit is that our code was less ‘optimal’ (that’s a polite way of putting it), and there were security issues caused by anonymous access. So, how did we (Exchange) tell you to guard against them? By using something called ISA (Internet Security and Acceleration – which is an odd name for what it was, a firewall). ISA, amongst other things, did pre-authentication of connections. It forced users to authenticate to it, so it could then allow only authenticated users access to Exchange. It essentially stopped anonymous users getting to Windows and Exchange. Which was good for Windows and Exchange, because there were all kinds of things that they could do if they got there anonymously.

However, once authenticated users got access, they too could still do those bad things if they chose to. And so, of course, could anyone not coming through ISA, such as internal users. So why would you use ISA? So that you would know who these external users were, wouldn’t you?

But do you really think that’s true? Do you think most customers a) noticed something bad was going on and b) trawled logs to find out who did it? No, they didn’t. So it was a bit like an insurance policy. You bought it, you knew you had it, and you didn’t really check whether it covered what you were doing until you needed it. By then it was too late: you found out your policy didn’t cover that scenario, and you were in the deep doo doo.

Insurance alone is not enough. If you put any security device in front of anything, it doesn’t mean you can or should just walk away and call it secure.

So at around the same time as we were telling customers to use ISA, back in the 2000 days, the whole millennium bug thing was over, the PC was proliferating, and the Internet was continuing to expand. This is a very nice write up on the Microsoft view of the world.

Those industry changes ultimately resulted in something we called Trustworthy Computing, which was all about changing the way we develop software – “The data our software and services store on behalf of our customers should be protected from harm and used or modified only in appropriate ways. Security models should be easy for developers to understand and build into their applications.” There was also the Secure Windows Initiative. And the Security Development Lifecycle. And many other three letter acronyms I’m sure, because whatever it was you did, it needed a good TLA.

We made a lot of progress in the ten years that followed. We delivered on the goal that the security of the application can be better managed inside the OS and the application rather than at the network layer.

But of course most people still seem to think of security as being mainly at the network layer, so think for a moment about what your hardware/software/appliance based firewall does today. It allows connections from a source, on some configurable protocol/port, to a configured destination protocol/port.

If you have a load balancer, and you configure it to allow inbound connections to an IP on its external interface, to TCP 443 specifically, telling it to ignore everything else, and it takes those packets and forwards them to your Exchange servers, is that not the same thing as a firewall?

Your load balancer is a packet filtering firewall. Don’t tell your load balancing vendor that (they might want to charge you extra for it), but it is. And when you couple that packet level filtering firewall/load balancer with software behind it that has been hardened for 10 years against attacks, you have a pretty darn secure setup.

And that is the point. If you hang one leg of your load balancer on the Internet, and one leg on your LAN, and you operate a secure and well managed Windows/Exchange Server – you have a more secure environment than you think. Adding pre-authentication and layers of networking complexity in front of that buys you very little extra, if anything.

So let’s apply this directly to Exchange, and try and offer you some advice from all of this. What should YOU do?

The first thing to realize is that you now have a CHOICE. And the real goal of this post is to help you make an INFORMED choice. If you understand the risks, and know what you can and cannot do to mitigate them, you can make better decisions.

Do I think everyone should throw out that TMG box they have today and go firewall commando? No, not at all. I think they should evaluate what it does for them, and whether they need it going forward. If they do that and decide they still want pre-auth, then they should find something that can do it when the time to replace TMG comes.

You could consider it a sliding scale of choice. Something like this, perhaps:

[Diagram: a sliding scale of publishing options, from a plain load balancer at one end to TMG/UAG at the other]

So this illustrates that there are some options and choices:

  1. Just use a load balancer – as discussed previously, a load balancer that only allows in specified traffic is a packet filtering firewall. You can’t just put it there and leave it, though; you need to keep it up to date, keep your servers up to date, and possibly employ some form of IDS solution to tell you if there’s a problem. This is what Office 365 does.
  2. TMG/UAG – at the other end of the scale are the old school ‘application level’ firewall products. Microsoft has stopped selling TMG, but as I said earlier, that doesn’t mean you can’t use it if you already have it, and it doesn’t stop you using it if you buy an appliance with it embedded.

In the middle of these two extremes (though ARR is further to the left of the spectrum as shown in the diagram) are some other options.

Some load balancing vendors offer pre-authentication modules, if you absolutely must have pre-auth (but again, really… you should question the reason). Some use LDAP, some require domain joining the appliance and using Kerberos Constrained Delegation, and Microsoft has two options here too.

The first (and favored by pirates the world over) is Application Request Routing, or ARR! for short. ARR! (the ! is my own addition; marketing didn’t add that to the acronym, but if marketing were run by pirates, they would have) “is a proxy based routing module that forwards HTTP requests to application servers based on HTTP headers and server variables, and load balance algorithms” – read about it here, and in the series of blog posts we’ll be posting here in the not too distant future. It is a reverse proxy. It does not do pre-authentication, but it does let you put a non-domain joined machine in front of Exchange to terminate the SSL; if your 1990’s style security policy absolutely requires that, ARR is an option.

The second is WAP. Another TLA. Recently announced at TechEd 2013 in New Orleans is the upcoming Windows Server 2012 R2 feature – Web Application Proxy. It is focused on browser and device based access, has strong ADFS support, and is the direction the Windows team is investing in these days. It can currently offer pre-authentication for OWA access, but not for Outlook Anywhere or ActiveSync. See a video of the TechEd session here (the US session) and here (the Europe session).

Of course, all this does raise some tough questions. So let’s try and answer a few of those:

Q: I hear what you are saying, but Windows is totally insecure, my security guy told me so.

A: Yes, he’s right. Well he was right, in the yesteryear world in which he formed that opinion. But times have changed, and when was the last time he verified that belief? Is it still true? Do things change in this industry?

Q: My security guy says Microsoft keeps releasing security patches and surely that’s a sign that their software is full of holes?

A: Or is the opposite true? All software has the potential for bugs and exploits, and not telling customers about risks, or releasing patches for issues discovered is negligent. Microsoft takes the view that informed customers are safer customers, and making vulnerabilities and mitigations known is the best way of protecting against them.

Q: My security guy says he can’t keep up with the patches and so he wants to make the server ‘secure’ and then leave it alone. Is that a good idea?

A: No, it’s not. That’s not (I hope) what he does with his routers and hardware based firewalls, is it? Software is a point in time piece of code. Security software guards against the exploits and attacks it knows of today. What about tomorrow? None of us are saying Windows, or any other vendor’s solution, is secure forever, which is why a well-managed and secure network keeps machines monitored and patched. If he does not patch other devices in the chain, overall security is compromised. Patches are the reality of life today, and they are the way we keep up with the bad guys.

Q: My security guy says his hardware based firewall appliance is much more secure than any Windows box.

A: Sure. Right up to the point at which that device has a vulnerability exposed. Any security device is only as secure as the code that was written to counter the threats known at that time. After that, it’s all the same; they can all be exploited.

Q: My security guy says I can’t have traffic going all the way through his 2 layers of DMZ and multitude of devices, because it is policy. It is more secure if it gets terminated and inspected at every level.

A: Policy. I love it when I hear that. Who made the policy? And when? Was it a few years back? Have the business requirements changed since then? Have the risks they saw back then changed any? Sure, they have, but rarely does the policy get updated. It’s very hard to change the entire architecture for Exchange, but I think it’s fair to question the policy. If they must have multiple layers, for whatever perceived benefit that gives (ask them what it really does, and how they know when a layer has been breached), there are ways to do that, but one could argue that more layers doesn’t necessarily make it better, it just makes it harder. Harder to monitor, and to manage.

Q: My security guy says if I don’t allow access from outside except through a VPN, we are more secure.

A: But every client who connects via a VPN adds one more gateway/endpoint to the network, don’t they? And they have access to everything on the network rather than just a single port/protocol. How is that necessarily more secure? Plus, how many users like VPNs? Does making it harder to connect and get email, so people can do their job, make them more productive? No; it usually means they do less work, as they can’t be bothered to input a little code just so they can check email.

Q: My security guy says if we allow users to authenticate from the Internet to Exchange then we will be exposed to an account lockout Denial of Service (DoS).

A: Yes, he’s right. Well, he’s right only because account lockout policies are being used, something we’ve been advising against for years, as they invite account lockout DoS’s. These days, users typically have their SMTP address set to equal their User Principal Name (UPN) so they can log on with (what they think is) their email address. If you know someone’s email address, you know their account logon name. Is that a problem? Well, only if you use account lockout policies rather than using strong password/phrases and monitoring. That’s what we have been telling people for years. But many security people feel that account lockouts are their first line of defense against dictionary attacks trying to steal passwords. In fact, you could also argue that a bad guy trying out passwords and getting locked out now knows the account he’s trying is valid…
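
If you’re wondering how true that UPN-equals-email-address statement is in your own directory, a quick check from the Exchange Management Shell is below. This is just a sketch (property names as exposed by Get-Mailbox in Exchange 2010); it lists mailboxes whose UPN does not match their primary SMTP address:

  Get-Mailbox -ResultSize Unlimited |
      Where-Object { $_.UserPrincipalName -ne $_.PrimarySmtpAddress.ToString() } |
      Select-Object Name, UserPrincipalName, PrimarySmtpAddress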

Note the common theme in these questions: “the security guy said…”. It’s not that I have it in for security guys generally speaking, but they are the people who ask these questions, and in my experience some of them think their job is to secure access by preventing access. If you can’t get to it, it must be safe, right? Wrong. Their job is to secure the business requirements; or put another way, to allow their business to do its work, securely. After all, most businesses are not in the business of security. They make pencils. Or cupcakes. Or do something else. The job of the security folks working at those companies is to help them make pencils, or cupcakes, securely, and not to stop them from doing those things.

So there you go, you have choices. What should you choose? I’m fine with you choosing any of them, but only if you choose the one that meets your needs, based on your comfort with risk, based on your operational level of skill, and based on your budget.

Greg Taylor
Principal Program Manager Lead
Exchange Customer Adoption Team


A reminder on real life performance impact of Windows SNP features


I think we are due for a reminder on best practices related to Windows features collectively known as “Microsoft Scalable Networking Pack (SNP)”, as it seems difficult to counter some of the “tribal knowledge” on the subject. Please also see our previous post on the subject.

Recently we had a customer that called in on the subject, and this particular case stressed the importance of the Scalable Networking Pack features. The background of this case was that the customer was running Exchange 2010 SP2 RU6 on Windows 2008 R2 SP1, and they had multiple physical sites with a stretched DAG. This customer had followed our guidance from Windows 2003 times and disabled all of the relevant options on all of their servers, similar to below:

Receive-Side Scaling State : disabled
Chimney Offload State : disabled
NetDMA State : disabled
Direct Cache Acess (DCA) : disabled
Receive Window Auto-Tuning Level : disabled
Add-On Congestion Control Provider : ctcp
ECN Capability : disabled
RFC 1323 Timestamps : disabled

The current problem was that the customer was trying to add copies of all their databases to a new physical site, so that they could retire an old disaster recovery site. The majority of these databases were around 1 to 1.5TB in size, with some ranging up to 3TB. The customer stated that the databases took five days to reseed, which was unacceptable in his mind, especially since he had to decommission this site in two weeks. After digging into this case a little bit more and referencing this article, we first started by looking at the network drivers. With all latency or transport issues over a WAN or LAN, we should always make sure that the network drivers are updated. Since the majority of the servers in this customer’s environment were virtual servers running the latest version of the virtualization software, we switched our focus over to the physical machines. When we looked at the physical machines we saw they had a network driver with a publishing date of December 17, 2009.

At this point I recommended updating the network driver to a newer version, with a driver date of 2012 or newer. We then tested again, and still saw transfer speeds roughly similar to those before updating the drivers. At that point I asked the customer to change the Scalable Networking Pack items from above to:

Receive-Side Scaling State : enabled
Chimney Offload State : automatic
NetDMA State : enabled

(Here is how you change these items.)
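
For reference, on Windows 2008 R2 these values are changed with netsh from an elevated command prompt. A minimal sketch matching the three settings above (verify the result with the show command afterwards):

  netsh int tcp set global rss=enabled
  netsh int tcp set global chimney=automatic
  netsh int tcp set global netdma=enabled
  netsh int tcp show global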

The customer changed the SNP features and then rebooted the machines in question. At that time he started to reseed a 2.2TB database across their WAN at around 12pm. The customer sent me an email later that night stating that the database would now take around 12 hours to reseed. The next morning he sent me another email: the logs had copied over before he showed up for work at 7am. This time the reseed took 19 hours to complete, compared to 100+ hours with the SNP features disabled. The customer stated that he was very happy, and started planning how to upgrade network drivers on all other physical machines in his environment. Once that was done, he was going to change RSS, TCP Chimney, and NetDMA to the recommended values on all of his other Windows 2008 R2 SP1 machines.

The following two articles show the current recommendations for the Scalable Networking Pack features:

  1. Here is the document referenced above that shows the correct settings for each version
  2. Even though this article specifies SQL, it is still relevant to the operating system that Exchange sits on.

So, what exactly is our point?

Friends don’t let friends run their modern OS servers with old network drivers and SNP features turned off! As mentioned in our previous blog post on the subject, please make sure that you update network-level drivers first, as many vendors have made fixes in their driver stacks to ensure that SNP features function correctly. The above is just one illustration of the issues that incorrect settings in this area can bring to your environment.

David Dockter

Released: Update Rollups for Exchange 2007 & Exchange 2010 and Security Updates for Exchange 2013


Update 8/14/13: Due to an issue with the Exchange 2013 Security Update installation process, the Exchange 2013 updates have been removed from the Download Center. For more information, please see Exchange 2013 Security Update MS13-061 Status Update.

Today, Exchange Servicing released several updates for the Exchange product line to the Download Center:

  • Update Rollup 11 for Exchange Server 2007 SP3
  • Update Rollup 7 for Exchange Server 2010 SP2
  • Update Rollup 2 for Exchange Server 2010 SP3
  • Exchange Server 2013 RTM CU1 MSRC Security bulletin MS13-061
  • Exchange Server 2013 RTM CU2 MSRC Security bulletin MS13-061

Note: Some of the following KB articles may not be available at the time of this article’s publishing.

Exchange 2007 Rollups

The Exchange 2007 SP3 RU11 update contains two fixes in addition to the changes for MS13-061. For more details, including a list of fixes included in this update, see KB 2873746 and the MS13-061 security bulletin. We would like to specifically call out the following fixes which are included in this release:

  • 2688667 W3wp.exe consumes excessive CPU resources on Exchange Client Access servers when users open recurring calendar items in mailboxes by using OWA or EWS
  • 2852663 The last public folder database on Exchange 2007 cannot be removed after migrating to Exchange 2013 

Exchange 2010 Rollups

The Exchange 2010 SP2 RU7 update contains the changes for MS13-061.  For more details, see the MS13-061 security bulletin.

The Exchange 2010 SP3 RU2 update contains fixes for a number of customer-reported and internally found issues, as well as the changes for MS13-061. For more details, including a list of fixes included in this update, see KB 2866475 and the MS13-061 security bulletin. We would like to specifically call out the following fixes which are included in this release:

  • 2861118 W3wp.exe process for the MSExchangeSyncAppPool application pool crashes in an Exchange Server 2010 SP2 or SP3 environment
  • 2851419 Slow performance in some databases after Exchange Server 2010 is running continuously for at least 23 days
  • 2859596 Event ID 4999 when you use a disclaimer transport rule in an environment that has Update Rollup 1 for Exchange Server 2010 SP3 installed
  • 2873477 All messages are stamped by MRM if a deletion tag in a retention policy is configured in an Exchange Server 2010 environment
  • 2860037 iOS devices cannot synchronize mailboxes in an Exchange Server 2010 environment
  • 2854564 Messaging Records Management 2.0 policy can't be applied in an Exchange Server 2010 environment

Exchange Server 2013

MS13-061 is the first security update released for Exchange Server 2013 utilizing the new servicing model. MS13-061 is available as a security update for Exchange Server 2013 RTM CU1 and Exchange Server 2013 RTM CU2.

Important: If you have previously deployed CU2, you must ensure you are running build 712.24 in order to apply the security update. For more information about build 712.24, please see Now Available: Updated Release of Exchange 2013 RTM CU2.

Ross Smith IV
Principal Program Manager
Exchange Customer Experience

Hybrid Mailbox moves and EMC changes


As customers are upgraded or sign up for the latest version of Office 365 for business, they may notice some changes in their recipient and organization management experience. For instance, we now allow the Organization and Hybrid Migration management experience to be initiated from the Exchange Administration Center (EAC) in Exchange Online. This allows you as the administrator to take advantage of many of the enhancements that were added to the migration and organization management experience. The good news is you can do this without having to upgrade your on-premises Exchange 2010 Hybrid server to Exchange 2013.

Hybrid Mailbox Moves

In hybrid deployments with Exchange Online, you should consider using the EAC to perform your hybrid mailbox moves. Some of the main features that have been added or enhanced are listed in Mailbox Moves in Exchange 2013:

  • Ability to move multiple mailboxes in large batches
  • Email notification during move with reporting
  • Automatic retry and automatic prioritization of moves
  • Option for manual move request finalization, which allows you to review your move before you complete it
  • Periodic incremental syncs to update migration changes

The other benefit to using the EAC in Exchange Online - it's a web-based tool, and you have access to the latest enhancements as they are made available in Exchange Online. The Migration team is continually working to improve the migration experience and resolve issues that we may face along the way.

The Exchange Management Console (EMC) in Exchange 2010 SP3 is still a supported tool for performing hybrid mailbox moves. However, you'd be missing out on the enhancements that were built into the EAC and you could potentially run into some of the limitations that exist in the EMC when connecting to the new service.

Previous EMC move mailbox experience

Most customers with a hybrid deployment are currently running Exchange 2010 in their on-premises environment. Many customers are accustomed to performing their mailbox moves from the EMC on-premises. If there was an issue initiating the move, the customer would be notified with an error message such as the following:

Figure 1: A generic error when performing a remote mailbox move using the Exchange Management Console in Exchange 2010 SP3

This would let the administrator know that they needed to start investigating the moves. In many cases, when you get a generic error (such as the error above) or if you wanted to get a more verbose error message you would use PowerShell to connect to Exchange Online and initiate the move, as shown in the following steps:

  1. Connect to Exchange Online Using Remote PowerShell
  2. Initiate the move request:

    $OnPremAdmin = Get-Credential

    When prompted, enter the on-premises administrator credentials.

    New-MoveRequest -Identity "user@contoso.com" -Remote -RemoteHostName mail.contoso.com -RemoteCredential $OnPremAdmin -TargetDeliveryDomain "contoso.mail.onmicrosoft.com"

    (Here "user@contoso.com" is an illustrative mailbox to move; you can also pipe mailboxes in from Get-Mailbox instead of using -Identity.)
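
For completeness, step 1 usually looks something like the following sketch. The ConnectionUri shown here was the commonly documented endpoint for the service at the time of writing; follow the link above for the current steps:

    $CloudCred = Get-Credential
    $Session = New-PSSession -ConfigurationName Microsoft.Exchange -ConnectionUri https://ps.outlook.com/powershell/ -Credential $CloudCred -Authentication Basic -AllowRedirection
    Import-PSSession $Session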

Current EMC move mailbox experience

With the latest version of Office 365, if you were to use the EMC on-premises to initiate a mailbox move and there was an issue, you would not get ANY errors. That means you may think the move was submitted successfully even though it was never submitted to the Mailbox Replication Service (MRS). This could mean no move logs, no moves in the queue, and no record that a move was even attempted! Although using the EMC to move mailboxes to Exchange Online is (still) supported, not all of the move features are available in the EMC. You should really use the EAC in the cloud to get the best and most reliable administration experience for your mailbox moves.

Moving mailboxes to Exchange Online using the EAC

Here's a quick walkthrough of how to move a mailbox from Exchange 2013/2010/2007/2003 in a hybrid environment with the new service.

This is assuming you’ve successfully completed hybrid configuration and your on-premises Exchange organization happily coexists with your cloud-based Exchange organization. You can use the Microsoft Exchange Server Deployment Assistant (EDA), our web-based tool, to get step-by-step instructions for configuring a hybrid environment.

  1. Log into https://portal.MicrosoftOnline.com with your tenant administrator credentials

  2. In the top ribbon, click Admin and then select Exchange


  3. Click Migration > click + and then select the appropriate move mailbox option. In this example, we’re moving a mailbox to the cloud, so we select Migrate to Exchange Online.


  4. On the Select a migration type page, select Remote move migration as the migration type for a hybrid mailbox move


  5. On the Select the users page, select the mailboxes you want to move to the cloud.


  6. On the Enter on-premises account credentials page, provide your on-premises administrator credentials in the domain\user format.


  7. On the Confirm the migration endpoint page, ensure that the on-premises endpoint shown is the CAS with MRS Proxy enabled.

    Note: This will be the same endpoint regardless of whether you’re moving a mailbox to the cloud (an onboard request) or moving a mailbox from the cloud to your on-premises Exchange organization (an offboard request).


  8. Enter a name for the migration batch and initiate the move.
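
If you’d rather script the same onboarding move than click through the wizard, a rough remote PowerShell equivalent is sketched below. The batch name, endpoint name, and CSV path are illustrative, and the CSV needs an EmailAddress column listing the mailboxes to move:

  New-MigrationBatch -Name "CloudBatch01" -SourceEndpoint "mail.contoso.com" -TargetDeliveryDomain "contoso.mail.onmicrosoft.com" -CSVData ([System.IO.File]::ReadAllBytes("C:\Migration\users.csv"))
  Start-MigrationBatch -Identity "CloudBatch01"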

EMC Changes

In Exchange 2010, the EMC has three sections: 1) Organization Configuration 2) Server Configuration and 3) Recipient Configuration. These sections will be visible to administrators as long as they have permissions, via RBAC, to the objects that are in that container.

Before the service upgrade, if a customer was in a hybrid deployment and connected their Exchange 2010 SP2 EMC to Office 365, they’d see two sections (Organization Configuration and Recipient Configuration) in their cloud organization. The Tenant Administrator would not see the Server Configuration section because they have no control over, or view into, the Server Configuration settings.

Figure 2: Exchange 2010 SP2 EMC connected to Exchange Online before the service upgrade, showing the on-premises and cloud organizations

After the service upgrade, the customers will actually see only one container in the cloud organization in Exchange 2010 SP3 EMC - Recipient Configuration. We removed the Organization Configuration container because we don't support (or test) allowing an Exchange 2010 EMC to control or change organizational settings in a newer version of Exchange. To prevent issues, we've completely removed that container from the view in EMC. Configuration changes to the tenant’s Exchange organizational settings in Exchange Online should be accomplished by connecting to the EAC in Exchange Online.

Figure 3: Exchange 2010 SP3 EMC connected to Exchange Online after the service upgrade

Timothy Heeney

Released: The new Exchange Server Deployment Assistant


We’ve listened to your feedback for improving the on-premises and hybrid Exchange Server deployment experience, and we’re happy to announce the release of the new, consolidated Deployment Assistant!

The Exchange Server Deployment Assistant now combines all the on-premises and hybrid deployment scenarios from both the Exchange 2013 Deployment Assistant and the Exchange 2010 Deployment Assistant into a single tool. We’ve eliminated the need for the installation of Silverlight and provide guidance for all Exchange Server deployments in a true one-stop shop experience. We’ve also kept the same, convenient question-and-answer format to create a customized, step-by-step checklist with instructions to deploy Exchange 2013 or Exchange 2010.

Starting your Exchange deployment is familiar and convenient, whether you’re deploying Exchange 2013 or Exchange 2010. Just select one of the three basic deployment tracks: On-premises, Hybrid, or Cloud Only to get started, as shown in Figure 1.

Figure 1: The Exchange Server Deployment Assistant home page

After selecting your basic deployment track, you’ll be able to choose from either an Exchange 2013-based or Exchange 2010-based path for either on-premises or hybrid deployment scenarios. If you selected the Cloud Only scenario, the deployment path is the same for both Exchange 2013 and Exchange 2010 organizations and you’re on your way to getting started with an Exchange Online-only deployment.

Figure 2: Choose either the Exchange 2013 or Exchange 2010 deployment path

After you’ve chosen either the Exchange 2013 or Exchange 2010 deployment path (see Figure 2), you’ll answer a few questions about your deployment needs and you’re off to the races with your customized deployment checklist! For example, see Figure 3, which shows a checklist for an on-premises Exchange 2013 deployment.

Figure 3: Exchange 2013 on-premises deployment path

And, here’s some more good news! If you’ve bookmarked links to the Exchange 2013 and Exchange 2010 Deployment Assistants, there’s no action required to go to the new tool when using your bookmarks. You’ll be automatically redirected to the new tool when using the URL for the previous version of the Deployment Assistant.

We hope you enjoy the convenience of having all the Exchange 2013 and Exchange 2010 deployment scenario guidance in a single tool. We’d love your feedback and comments! Please feel free to leave a comment here, or send an email to edafdbk@microsoft.com directly or via the 'Feedback' link located in the header of every page of the Deployment Assistant.

Happy deploying!

The Deployment Assistant Team

Under The Hood: Exchange ActiveSync Mailbox Log Analysis


One of the best troubleshooting tools for Exchange ActiveSync (EAS) is mailbox logging. This logging allows us to see the incoming request sent by the device and the outgoing response from the Exchange server. Exchange ActiveSync Mailbox Logging provides the steps for enabling ActiveSync mailbox logging and breaks down the components of the log. Here we’re going to use one of these logs to analyze how a mobile device running an EAS client (for example, a Windows Phone) initializes a profile with Exchange, along with a few standard commands.

Provision

A device must be provisioned before it can synchronize with Exchange. The device sends the Provision command with the device settings contained within the request. The server response includes the security settings based on the ActiveSync mailbox policy associated with the mailbox. It is important to note that there are two status codes for most ActiveSync requests. The HttpStatus code only provides the IIS response to the request, and a 200 response does not mean the request was successful. The second status code is for the ActiveSync command and varies depending on the command sent by the device. A status code of 1 is most commonly a success.

[Screenshot: mailbox log entry showing the HttpStatus and command Status codes]

The following example shows the request and response from a Provision command:

[Screenshots: Provision command request and response]

The device sends another Provision command to complete the provisioning process. This request includes the policy key from the previous response. The following example shows the second Provision command sending the PolicyKey:

[Screenshot: second Provision request sending the PolicyKey]
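
Decoded from WBXML, that second request has roughly the following shape. This is a simplified sketch based on the MS-ASPROV protocol document (namespaces omitted, PolicyKey value illustrative), not a verbatim log entry:

  <Provision>
    <Policies>
      <Policy>
        <PolicyType>MS-EAS-Provisioning-WBXML</PolicyType>
        <PolicyKey>1307064903</PolicyKey>
        <Status>1</Status>
      </Policy>
    </Policies>
  </Provision>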

For a more detailed analysis of the EAS provisioning process and policies, see Provisioning, Policies, Remote Wipe, and ABQ in Exchange ActiveSync on The Exchange Dev Blog.

FolderSync

Once the device is provisioned, it will send a FolderSync command to obtain the folder hierarchy of the mailbox. If you capture this FolderSync request in the ActiveSync mailbox log, you will have the folder name that correlates to the CollectionId values in future ActiveSync requests. Alternatively, see ActiveSync - Mapping a Collection ID to a Mailbox Folder to determine which folder the CollectionId represents. The following example shows the response from a FolderSync request:

[Screenshot: FolderSync response]
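
For reference, the decoded FolderSync request itself is tiny. A simplified sketch per MS-ASCMD (namespaces omitted); the initial SyncKey of 0 asks the server for the complete hierarchy:

  <FolderSync>
    <SyncKey>0</SyncKey>
  </FolderSync>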

Sync

After the EAS client has obtained the folder hierarchy of the mailbox from Exchange, it can begin to populate folders on the device. Windows Phone leverages a hanging Sync request to retrieve data from these folders. We should, however, expect the first Sync request from this device to get an immediate response with new items for one or more folders. It is also important to notice that the SyncKey sent by the device in this first request is 0. This is because the folders have just been created on the device and currently have no synchronization state. The response for each Sync request will include a new SyncKey value that the subsequent Sync request should send. The following example shows the request and response for a Sync command:

[Screenshots: Sync command request and response]
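
The initial Sync request for a folder, decoded, looks roughly like the sketch below (again simplified per MS-ASCMD, namespaces omitted; the CollectionId value is illustrative and maps to a folder as described in the FolderSync section):

  <Sync>
    <Collections>
      <Collection>
        <SyncKey>0</SyncKey>
        <CollectionId>5</CollectionId>
      </Collection>
    </Collections>
  </Sync>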

We will typically either see no status code or a status code of 1 in the response for a Sync request. Any other status code would require further investigation by reviewing the protocol document. A status code of 1 represents a successful Sync request and no status code simply means there were no changes within the heartbeat interval for the request.

Typically you will see items being added to the device in the Sync response. You can see detailed information including the sender and subject if verbose logging has been enabled on the CAS servers. The following example shows a response sending a new item to the Inbox:

[Screenshot: Sync response adding a new item to the Inbox]

ItemOperations

There are several uses for the ItemOperations command; one of the most common requests is for downloading an attachment onto the device. The request will contain the FileReference value for the attachment, which can be seen in the Sync response if available. The following examples show two responses for the ItemOperations command. We can see the first response is a success, with the server sending the attachment. However, the second response throws an exception and has a different status code. Look up this status code in the protocol document for more information on the exception.

[Screenshots: two ItemOperations responses, one successful and one returning an exception]

Were you able to determine the issue? The exception within the ActiveSync mailbox log for this example does provide enough detail to know the attachment was too large. It is very important to know how to use the protocol document to look up status codes for the various ActiveSync commands.

Calendaring

Now that we have covered the most common command found in the ActiveSync mailbox log (the Sync command), it is time to dig a little bit deeper. One of the most common issues with ActiveSync devices is calendaring. Most ActiveSync users rely on their mobile device to have accurate calendar information so they do not miss an appointment. Calendar items can be added to a mailbox as either an appointment created by the mailbox owner or a meeting request sent by either the mailbox owner or another organizer.

We are going to review the life of an appointment as we find it within the ActiveSync mailbox log. This appointment will start as a meeting request sent by an organizer within the organization. The following example shows a Sync response where this appointment is first added to the device:

[Screenshots: Sync response adding the appointment to the device]

Here we can see the appointment has a unique ServerId value for this item on the device. We also know that the appointment is currently showing a status of tentative in the BusyStatus. This is the standard placeholder that Exchange creates in the calendar when a new meeting request is received. The following example shows the corresponding meeting request:

[Screenshot: the corresponding meeting request]

The complex part of this process begins when the user responds to the appointment on the device. This response results in several requests which include MeetingResponse, SendMail, MoveItems, and Sync commands. We are going to cover each of these steps to see how the commands impact the items on the device and within the mailbox.

[Screenshot: the sequence of requests that follow a meeting response]

The MeetingResponse command is the first ActiveSync command sent by the device to accept, decline, or tentatively accept the meeting. This request does not send the response to the organizer. The request identifies the meeting request item within the Inbox that the response is for, while the response from the Exchange server also includes the appointment item. The following examples show the request and response for a MeetingResponse command:

[Screenshots: MeetingResponse request and response]

The SendMail command is the response message sent back to the organizer. The following is an example of the request for a SendMail command:

[Screenshot: SendMail request]

The MoveItems command is sent by the device to move the meeting request item from the Inbox to the Deleted Items folder. The following example shows the request for a MoveItems command:

[Screenshot: MoveItems request]

The Sync command is sent by the device to update the calendar item on the mailbox. This Sync request is sending a Change for the appointment to update its status from Tentative to Busy. The following example shows the request sending a change for the BusyStatus:

[Screenshot: Sync request changing the BusyStatus]

You may also notice another Sync command sent by the device, where the response includes an Add for the Sent Items folder. Here we are getting the meeting acceptance message from the Sent Items and adding it onto the device.

[Screenshot: Sync response adding the acceptance message from Sent Items]

Meeting Updates

All that we just covered was the original meeting request being received by the device. That is the origin of the appointment for our example. Next we need to look at how this appointment changes as time moves forward. The next evolution in this appointment’s life is when the organizer sends an update for a single instance of the series.

A change to a recurring appointment is called an exception and that is exactly what we will see in the ActiveSync mailbox log. The first part of the response shows us that we have a Change for an item and further down within that response we will see the exceptions. The following example shows our appointment receiving a change and the exception includes a new start time:

[Screenshots: Sync response containing a Change whose exception includes the new start time]

Wait. Our appointment has not stopped experiencing life changes. The organizer has decided to cancel an instance of this recurring appointment. The following example once again shows a change for our appointment but this time the exceptions have grown. This example was done intentionally so we can see how difficult it becomes to read these logs when an appointment has a large number of exceptions.

[Screenshots: Sync response where the list of exceptions has grown]

The good news is these exceptions are sent in the order in which they were made, so the last exception is the most recent. In our example above, the last exception shows this instance of the meeting has been canceled.

The focus has intentionally been on the Calendar item and its changes. However we cannot forget that with each change to the appointment the user also gets an updated meeting request. This means we will also see a Sync request that includes a response adding the meeting request to the Inbox. The following example shows the response adding the updated meeting request:

[Screenshot: Sync response adding the updated meeting request to the Inbox]

Just like the original meeting request for the series, the user has the ability to accept and decline the changes from the device. If you do not remember the process, don’t hesitate to jump back and take a second look. That is exactly what this article is intended to show.

SendMail

The last topic we are going to cover is sending a message from a Windows Phone device. There are two commands that we may see from an ActiveSync device when a user is sending a message. The Search command is sent when a user types text into the To field, and it performs a search against the Global Address List. The following examples show a request and response for a Search command:

[Screenshots: Search command request and response]
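
Decoded, a GAL Search request is roughly the following (simplified sketch per MS-ASCMD, namespaces omitted; the query text and range are illustrative):

  <Search>
    <Store>
      <Name>GAL</Name>
      <Query>smith</Query>
      <Options>
        <Range>0-19</Range>
      </Options>
    </Store>
  </Search>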

Then the device will send a SendMail command when the user hits the Send icon. Unless an error is encountered during this request, there should be an empty response from the Exchange server. The following example shows a request for the SendMail command:

[Screenshot: SendMail request]

Conclusion

At this point you should have some understanding of how Exchange ActiveSync functions and what to look for in the ActiveSync mailbox log. Here are a few reminders:

  • Whenever the device initiates a new item or change, the request from the device will contain this data. Whenever the change is made on the mailbox, the response from the Exchange server will contain the data.
  • Windows Phone uses a hanging Sync command to wait for changes on the mailbox. This request contains a heartbeat interval which determines how long the server should wait before sending a response. A success will return a status code of 1 indicating there are changes. If there are no changes, then no status code is returned.
  • An updated meeting contains all of the exceptions for that appointment and the last exception is the most recent.
  • Accepting a meeting request on an EAS device is a complex process with multiple steps. It is recommended that you review this process if many users use their devices to accept meetings.
  • Current versions of Exchange require a minimum search length of four characters before Exchange will perform the query.
  • The SendMail command does not return a status code unless an error is encountered.

Jim Martin (EXCHANGE)

Analyzing Exchange Transaction Log Generation Statistics


Update 11/5/2013: added a section on firewall rules to try.

Overview

When designing a site resilient Exchange Server solution, one of the required planning tasks is to determine how many transaction logs are generated on an hourly basis. This helps figure out how much bandwidth will be required when replicating database copies between sites, and what the effects will be of adding additional database copies to the solution. If designing an Exchange solution using the Exchange Server Role Requirements Calculator, the percent of logs generated per hour is an optional input field.

Previously, the most common method of collecting this data involved taking captures of the files in each log directory on a scheduled basis (using dir, Get-ChildItem, or CollectLogs.vbs). Although the log number could be extracted by looking at the names of the log files, there was a lot of manual work involved in figuring out the highest log generation from each capture and getting rid of duplicate entries. Once cleaned up, the data still had to be analyzed manually using a spreadsheet or a calculator. Trying to gather data across multiple servers and databases further complicated matters.

To improve upon this situation, I decided to write an all-in-one script that could collect transaction log statistics, and analyze them after collection. The script is called GetTransactionLogStats.ps1. It has two modes: Gather and Analyze. Gather mode is designed to be run on an hourly basis, at the top of the hour. When run, it will take a single set of snapshots of the current log generation number for all configured databases. These snapshots will be sent, along with the time the snapshots were taken, to an output file, LogStats.csv. Each subsequent time the script is run in Gather mode, another set of snapshots will be appended to the file. Analyze mode is used to process the snapshots that were taken in Gather mode, and should be run after a sufficient number of snapshots have been collected (at least 2 weeks of data is recommended). When run, it compares the log generation number in each snapshot to the previous snapshot to determine how many logs were created during that period.
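
Because Gather mode wants to run at the top of every hour, a scheduled task is the natural way to drive it. A sketch using schtasks.exe (the task name and script path are illustrative):

  schtasks.exe /Create /TN "GatherLogStats" /SC HOURLY /ST 00:00 /TR "powershell.exe -NoProfile -ExecutionPolicy Bypass -File C:\Scripts\GetTransactionLogStats.ps1 -Gather"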

Script Features

Less Data to Collect

Instead of looking at the files within log directories, the script uses Perfmon to get the current log file generation number for a specific database or storage group. This number, along with the time it was obtained, is the only information kept in the output log file, LogStats.csv. The performance counters that are used are as follows:

Exchange 2013:

MSExchangeIS HA Active Database\Current Log Generation Number

Exchange 2007/2010:

MSExchange Database ==> Instances\Log File Current Generation

Note: The counter used for Exchange 2013 only contains the active databases on that server. The counter used for Exchange 2007/2010 contains all databases on that server, including passive copies. To only get data from active databases on an Exchange 2007/2010 server, make sure to manually specify the databases for that server in the TargetServers.txt file.
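
If you want to eyeball a single value before setting up scheduled collection, the same counters can be read ad hoc with Get-Counter (server names are illustrative):

  Get-Counter -ComputerName EX13-01 -Counter "\MSExchangeIS HA Active Database(*)\Current Log Generation Number"
  Get-Counter -ComputerName EX10-01 -Counter "\MSExchange Database ==> Instances(*)\Log File Current Generation"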

Multi Server/Database Support

The script takes a simple input file, TargetServers.txt, where each line in the file specifies the server, or server and databases, to process. If you want to get statistics for all databases on a server, only the server name is necessary. If you want to only get a subset of databases on a server (for instance, if you wanted to omit secondary copies on an Exchange 2007 or 2010 server), then you can specify the server name, followed by each database you want to process.

Built In Analysis Capability

The script has the ability to analyze the output log file, LogStats.csv, which was created when run in Gather mode. It does a number of common calculations for you, but also leaves the original data in case any other calculations need to be done. Output from running in Analyze mode is sent to multiple .CSV files, where one file is created for each database, and one more file is created containing the average statistics for all analyzed databases. The following columns are added to the CSV files:

  • Hour: The hour that log stats are being gathered for. Can be between 0 – 23.
  • TotalLogsCreated: The total number of logs created during that hour for all days present in LogStats.csv.
  • TotalSampleIntervalSeconds: The total number of seconds between each valid pair of samples for that hour. Because the script gathers Perfmon data over the network, the sample interval may not always be exactly one hour.
  • NumberOfSamples: The number of times that the log generation was sampled for the given hour.
  • AverageSample: The average number of logs generated for that hour, regardless of sample interval size. Formula: TotalLogsCreated / NumberOfSamples.
  • PercentDailyUsage: The percent of a full day’s worth of logs that the AverageSample value for that hour accounts for. Formula: (AverageSample / AverageNumberOfLogsPer24Hours) * 100.
  • AverageSamplePer60Minutes: Similar to AverageSample, but adjusts the value as if each sample had been taken exactly 60 minutes apart. Formula: (TotalLogsCreated / TotalSampleIntervalSeconds) * 3600.
  • PercentDailyUsagePer60Minutes: Similar to PercentDailyUsage, but adjusts the value as if each sample had been taken exactly 60 minutes apart. Formula: (AverageSamplePer60Minutes / AverageNumberOfLogsPer24Hours) * 100.

Parameters

The script has the following parameters:

  • -Gather: Switch specifying we want to capture current log generations. If this switch is omitted, the -Analyze switch must be used.
  • -Analyze: Switch specifying we want to analyze already captured data. If this switch is omitted, the -Gather switch must be used.
  • -ResetStats: Switch indicating that the output file, LogStats.csv, should be cleared and reset. Only works if combined with –Gather.
  • -WorkingDirectory: The directory containing TargetServers.txt and LogStats.csv. If omitted, the working directory will be the current working directory of PowerShell (not necessarily the directory the script is in).
  • -LogDirectoryOut: The directory to send the output log files from running in Analyze mode to. If omitted, logs will be sent to WorkingDirectory.
  • -MaxSampleIntervalVariance: The maximum number of minutes that the duration between two samples can vary from 60. If we are past this amount, the sample will be discarded. Defaults to a value of 10.
  • -MaxMinutesPastTheHour: How many minutes past the top of the hour a sample can be taken. Samples past this amount will be discarded. Defaults to a value of 15.
  • -MonitoringExchange2013: Whether there are Exchange 2013 servers configured in TargetServers.txt. Defaults to $true. If there are no 2013 servers being monitored, set this to $false to increase performance.

Usage

Run the script in Gather mode, taking a single snapshot of the current log generation of all configured databases:

PS C:\> .\GetTransactionLogStats.ps1 -Gather

Run the script in Gather mode, indicating that no Exchange 2013 servers are configured in TargetServers.txt:

PS C:\> .\GetTransactionLogStats.ps1 -Gather -MonitoringExchange2013 $false

Run the script in Gather mode, change the directory where TargetServers.txt is located and where LogStats.csv will be written, and reset LogStats.csv before gathering:

PS C:\> .\GetTransactionLogStats.ps1 -Gather -WorkingDirectory "C:\GetTransactionLogStats" -ResetStats

Run the script in Analyze mode:

PS C:\> .\GetTransactionLogStats.ps1 -Analyze

Run the script in Analyze mode, sending the output files for the analysis to a different directory. Specifies that only sample durations between 55-65 minutes are valid, and that each sample can be taken a maximum of 10 minutes past the hour before being discarded:

PS C:\> .\GetTransactionLogStats.ps1 -Analyze -LogDirectoryOut "C:\GetTransactionLogStats\LogsOut" -MaxSampleIntervalVariance 5 -MaxMinutesPastTheHour 10

Example TargetServers.txt

The following example shows what the TargetServers.txt input file should look like. For the server1 and server3 lines, no databases are specified, which means that all databases on the server will be sampled. For the server2 and server4 lines, we will only sample the specified databases on those servers. Note that no quotes are necessary for databases with spaces in their names.

[Screenshot: example TargetServers.txt]
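
Rendered as plain text, the file is along these lines (a hypothetical sketch; server and database names are illustrative, with the server name and any databases separated by commas):

  server1
  server2,DB1,DB2
  server3
  server4,Database Copy 3,Database Copy 4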

Output File After Running in Gather Mode

When run in Gather mode, the log generation snapshots that are taken are sent to LogStats.csv. The following shows what this file looks like:

[Screenshot: LogStats.csv after several Gather runs]
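
As described earlier, each snapshot row is just the database, its current log generation, and the time the sample was taken, along these lines (the values shown are illustrative):

  DatabaseName,LogGeneration,TimeCollected
  server1\DB1,381546,8/1/2013 7:00:03 AM
  server1\DB1,381701,8/1/2013 8:00:04 AM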

Output File After Running in Analyze Mode

The following shows the analysis for a single database after running the script in Analyze mode:

[Screenshot: analysis output for a single database]

Notes

By default, the Windows Firewall on an Exchange 2013 server running on Windows Server 2012 does not allow remote Perfmon access. I suspect this is also the case with Exchange 2013 running on Windows Server 2008 R2, but haven’t tested. If either of the below errors is logged, you may need to open the Windows Firewall on these servers to allow access from the computer running the script.

ERROR: Failed to read perfmon counter from server SERVERNAME

ERROR: Failed to get perfmon counters from server SERVERNAME

Update:

After noticing that multiple people were having issues getting this to work through the Windows Firewall, I tried enabling different combinations of built in firewall rules until I could figure out which ones were required. I only tested on an Exchange 2013 server running on Windows Server 2012, but this should apply to other Windows versions as well. The rules I had to enable were:

File and Printer Sharing (NB-Datagram-In)
File and Printer Sharing (NB-Name-In)
File and Printer Sharing (NB-Session-In)
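
On Windows Server 2012 these built in rules can be enabled from PowerShell as shown below (on Windows 2008 R2, the equivalent can be done in the Windows Firewall with Advanced Security console or with netsh advfirewall):

  Enable-NetFirewallRule -DisplayName "File and Printer Sharing (NB-Datagram-In)"
  Enable-NetFirewallRule -DisplayName "File and Printer Sharing (NB-Name-In)"
  Enable-NetFirewallRule -DisplayName "File and Printer Sharing (NB-Session-In)"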

Mike Hendrickson

Do you have a sleepy NIC?


I continue to run into this issue over and over in the field, so I wanted people to be aware of this possible problem. In a Database Availability Group (DAG), if your databases are randomly mounting or flipping from one server to another for no apparent reason (including across datacenters), you may be suffering from your network interface card (NIC) going to sleep. And that’s not a good thing.

Power Management on the NIC

In the power management settings for the NIC on Windows Server, make sure you are not allowing the NIC to go into power save mode. Why is this important? It seems like at least once a month I run into customers who have this power management setting turned on; more than one of them even had it turned on for their replication network. They were seeing some odd behavior - for example, their databases randomly flipping from one DAG node to another for no apparent reason. And yes, they were all on physical machines.

Here are the steps to look at this configuration: use Device Manager to change the power management settings for a network adapter.

To disable all Power Management settings in Device Manager, expand Network Adapters, right-click the adapter > Properties > Power Management, and then clear the Allow the computer to turn off this device to save power check box.

Figure 1: Disable power management for the network adapter from the Power Management tab

Some of your network adapters may not have the Power Management tab available. This is a good thing, as your NIC is not able to go to sleep. This means there is one less item to worry about in your setup!

CAUTION Be careful when you change this setting. If it's enabled and you decide to disable it, you must plan for this modification as it will likely interrupt network traffic. It may seem odd that by just making a seemingly non-impacting change that the NIC will reset itself, but it definitely can. Trust me; I had a customer ‘test’ this during the day by accident… oops!

PowerShell to the rescue

In addition, now that PowerShell can be used for just about everything, there is this page that has a PS script available to make this change. There are additional links and related forum threads with supplementary information near the bottom of the script download page.

This script will run against all physical adapters on the machines you deploy it to, and you can also modify the script to disable wireless NICs. With PS, don’t forget that you can use this script to blast down these changes to all of your Exchange servers in a single step.
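
If you’d rather see the moving parts than run the packaged script, the checkbox is exposed through WMI. A minimal sketch (run as administrator; the InstanceName filter is illustrative and should be adjusted to match your adapters):

  # Enable = $true means "Allow the computer to turn off this device to save power" is checked
  Get-WmiObject -Namespace root\wmi -Class MSPower_DeviceEnable |
      Where-Object { $_.InstanceName -like "*PCI*" } |
      ForEach-Object { $_.Enable = $false; $_.Put() }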

GPO and regedit

For those of you that are more comfortable with regedit and creating GPOs to help control these settings, that option is also available. This page has information both on ‘one off’ fixes, where you can download a .reg file and deploy it manually, and on using GPO Preferences to edit the values in a GPO and apply those changes to an Exchange Server OU (Organizational Unit).

The one thing to note with the regedit process is which NIC you are working with and how many NICs your server has; the registry only knows the NICs by number (first, second, third, and so on). If you have identical builds between all of your servers, then this option will ensure that all current and any future servers placed into an OU with the GPO applied adhere to the proper registry settings.
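
The commonly documented registry value here is PnPCapabilities, a DWORD of 24 (0x18), which hides and disables the power management options for the adapter. A sketch for the first NIC (the 0001 subkey is illustrative; confirm which subkey matches your adapter before changing anything):

  reg add "HKLM\SYSTEM\CurrentControlSet\Control\Class\{4D36E972-E325-11CE-BFC1-08002BE10318}\0001" /v PnPCapabilities /t REG_DWORD /d 24 /f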

Also don't forget, you can record all of your changes on a Windows Server 2008 R2 or higher OS by using the Problem Steps Recorder (PSR) tool.

There you have it: if your DAG databases are randomly becoming active from one server to another for no apparent reason, you may have a sleepy NIC. Please confirm that you have disabled this setting as you build out not only your DAG environment, but all Exchange-related servers. Thank you.

Mike O'Neill


Released: Update Rollup 3 For Exchange 2010 Service Pack 3


The Exchange team is announcing today the availability of Update Rollup 3 for Exchange Server 2010 Service Pack 3. Update Rollup 3 is the latest rollup of customer fixes available for Exchange Server 2010. The release contains fixes for customer reported issues and previously released security bulletins. Update Rollup 3 is not considered a security release as it contains no new previously unreleased security bulletins. A complete list of issues resolved in Exchange Server 2010 Service Pack 3 Update Rollup 3 may be found in KB2891587.

Note: The KB article may not be fully available at the time of publishing this post.

The release is now available on the Microsoft Download Center.

The Exchange Team

Under The Hood: Exchange ActiveSync Mailbox Log Analysis – Part 2


The previous post for Exchange ActiveSync mailbox log analysis gave an overview of the various commands a device may send. Now we want to dig just a little bit deeper and provide a way to link items within an EAS mailbox log to the items inside the mailbox.

Unless verbose logging is enabled, you do not see the full details of the item (subject, sender, etc.). This leads us to the question: how do you know which item in the mailbox an ActiveSync request/response was for? The next few sections will show you how to correlate an appointment, message, and attachment between the mailbox log and the mailbox contents.

Calendar items

The first step is locating the item within the mailbox and pulling the Global Object ID (GOID) property value for the item. We cannot do this using Outlook, so we will need to download MFCMAPI. Launch MFCMAPI, go to the Session menu and select Logon to select your Outlook profile. Open the mailbox and expand the Root Container and Top of Information Store. Right-click on the Calendar and select Open contents table.

Screenshot: MFCMAPI | Open contents table on the Calendar folder

Find your appointment inside the Calendar table. Then right-click on the tag 0x80000102 and select Edit property. In this example, we will use the appointment with the subject “Blog demo”.

Screenshot: MFCMAPI | Editing property 0x80000102 on the "Blog demo" appointment

Copy this binary value so you will have it available for a search against the mailbox log.

Screenshot: MFCMAPI property editor | Binary value of the Global Object ID

The Mailbox Log Parser utility allows you to search and review mailbox logs easily. Here we can use this tool to search for the GOID of the appointment. Launch Mailbox Log Parser, click Import Mailbox Logs to Grid, locate your mailbox log and click Open. Once the log is open, enter the binary value you copied from MFCMAPI into the Search raw log data for strings text box and click Search. The search results will filter the log entries so you only see log entries containing the GOID value of your appointment. Here you will notice the UID value within the mailbox log matches the GOID value from MFCMAPI:

Screenshot: Mailbox Log Parser | Search results showing the UID matching the GOID from MFCMAPI
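
If you only have the raw log files and not the GUI tool handy, the same search can be approximated in PowerShell (the path and value below are placeholders):

# Minimal sketch: find the GOID/UID value in saved EAS mailbox log files.
$goid = '<binary value copied from MFCMAPI>'
Get-ChildItem 'C:\EASLogs\*.txt' |
    Select-String -SimpleMatch $goid |
    Select-Object Filename, LineNumber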

Review each log entry to determine what action was taken against the appointment. The above image shows a log entry where a Sync request resulted in a change to the appointment. The details for the update can be found within the log entry on the far right. You may also want to consider performing a search using the ServerId value for the appointment found in the log entry. There may be responses that do not contain the UID such as a Delete.

Now let us look at how we can take the calendar item from the mailbox log and find the appointment within the mailbox. For our example we will use the UID value from the mailbox log we used earlier (in image above). We need to open the Calendar contents table using the steps outlined earlier using MFCMAPI. Inside the Calendar table, go to the Table menu and select Set columns.

Screenshot: MFCMAPI | Table menu | Set columns

Click OK on the Set Columns window. In the Column set window, click the Add button. In the Property Tag Editor window, enter the Property Tag value 0x80000102 and click OK twice. This will add the UID column to our table view.

Screenshot: MFCMAPI | Property Tag Editor adding tag 0x80000102 as a column

Sort your Calendar table by this Property tag column you just added and then scroll down until you find the matching UID from the mailbox log. Here you can see we found our appointment once again with the subject “Blog demo”.

Screenshot: MFCMAPI | Calendar table sorted by the added UID column, showing the "Blog demo" appointment

E-mail message

Launch MFCMAPI, go to the Session menu, and select Logon to select your Outlook profile. Open the mailbox and expand the Root Container and Top of Information Store. Right-click on the folder where the message resides and select Open contents table. This time we want to locate a message within the table. Next, right-click on the tag 0x00710102 and select Edit property. For this example, we will use the message with the subject “RE: Blog message #1”.

Screenshot: MFCMAPI | Editing property 0x00710102 on the "RE: Blog message #1" message

Copy the binary value and paste it into a tool like Notepad. This value is not as straightforward as the Global Object ID for an appointment. We need to break down this value into a few parts. The following example is from the third message in a conversation thread:

01CEC617632457F0D646F5744F4990165503AB61C52F00000CF610

The value is broken down as follows:

  1. Remove the first byte – 01
  2. The next five bytes (10 characters) represent the Conversation Index for the message, or the current system time – CEC6176324
  3. The next 16 bytes (32 characters) represent the Conversation Id for the message, or the globally unique identifier (GUID) – 57F0D646F5744F4990165503AB61C52F
  4. The remaining bytes are added to the Conversation Index (only present for additional messages within the thread)

Note: Additional information on tracking conversations can be found here.
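
To make the breakdown concrete, here is a minimal PowerShell sketch that splits the example value above into the parts just described:

# Split the example binary value into the parts described above.
$raw = '01CEC617632457F0D646F5744F4990165503AB61C52F00000CF610'
$firstByte      = $raw.Substring(0, 2)    # 1 byte, removed per step 1
$convIndexHead  = $raw.Substring(2, 10)   # 5 bytes: Conversation Index head
$conversationId = $raw.Substring(12, 32)  # 16 bytes: Conversation Id (GUID)
$childBlocks    = $raw.Substring(44)      # remaining bytes: reply blocks
"Search Mailbox Log Parser for ConversationId: $conversationId"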

Alright, so what does that mean to us? Once again we will use the Mailbox Log Parser tool to search for our item. This time enter the ConversationId value extracted from the previous step into the Search raw log data for strings window. In the results below, you can see we found two messages with this ConversationId value. Remember, this search will return all messages related to the conversation including messages in Sent Items.

Screenshot: Mailbox Log Parser | Two messages returned for the ConversationId search

Analysis of the log entry shows the item being added to the folder on the device.

Screenshot: Mailbox Log Parser | Log entry showing the item being added to the folder on the device

Keep in mind we have two results for this conversation. You need to use the Conversation Index value to locate the exact message in the log.

What about the reverse? Just make note of the ConversationId value from the mailbox log for your message. Then open MFCMAPI to open the content table for the folder where the message resides. Sort the table using the Conversation ID column and search for the ConversationId value from the mailbox log. You should find your message(s) for this conversation.

Screenshot: MFCMAPI | Contents table sorted by the Conversation ID column

We can see in this example there are two messages within this conversation using the Conversation ID. We would need to examine the property further for each item to obtain the Conversation Index value to locate the exact message.

Attachments

What about those attachment errors you see in the mailbox log? The mailbox log does give us the information we need to locate the attachment inside the mailbox. The following example shows the FileReference value of the attachment is 5%3a10%3a1 (URL-encoded). This equates to 5:10:1, or attachment 1 for ServerId 5:10.

Screenshot: Mailbox log entry showing the FileReference value 5%3a10%3a1
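
The encoding is simply URL escaping, so you can decode a FileReference quickly in PowerShell:

# Decode the URL-encoded FileReference from the mailbox log.
[System.Uri]::UnescapeDataString('5%3a10%3a1')   # returns 5:10:1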

First we have to search the mailbox log for this ServerId to determine the message if we do not already know it. Using the example attachment above, we can see the message being added to the folder:

Screenshot: Mailbox log entry showing the message for ServerId 5:10 being added to the folder

Now we can use the steps from the earlier section to locate the message within MFCMAPI using the ConversationId. Once we locate the message, right-click on it and select Attachments > Display attachment table.

Screenshot: MFCMAPI | Attachments | Display attachment table

We can determine which attachment the ActiveSync mailbox log references by matching the Num column to the value from the log. In our example, the attachment referenced was _Read~1.pdf.

Screenshot: MFCMAPI attachment table | Num column matching attachment 1 (_Read~1.pdf)

Conclusion

Each item that is synchronized to and from Exchange contains a unique identifier that we can use to locate the item in either the mailbox or ActiveSync client. Calendar items have a unique Global Object ID and mail items have a ConversationIndex and ConversationId value. Now you can review an Exchange ActiveSync mailbox log with more confidence, knowing that you can associate items within the log with items inside the mailbox.

Jim Martin

Released: Microsoft Security Bulletin MS13-105 for Exchange


Today the Exchange team released security bulletin MS13-105. Updates are being made available for the following versions of Exchange Server:

  • Exchange Server 2007 SP3
  • Exchange Server 2010 SP2
  • Exchange Server 2010 SP3
  • Exchange Server 2013 CU2
  • Exchange Server 2013 CU3

Customers who are not running one of these versions will need to upgrade to an appropriate version in order to receive the update.

Security bulletin MS13-105 contains details about the issues resolved, including download links.

For Exchange Server 2007/2010 customers, the update is being delivered via an Update Rollup per standard practice. Due to the timing of the release of our most recent Update Rollups, the only difference between the previously released Update Rollup and the Security Update Rollup released today is the inclusion of the security updates identified in MS13-105. We did not include updates for any other customer reported issues in these packages to ease their adoption.

For Exchange Server 2013 customers, security updates are always delivered as discrete updates and contain no other updates. Security updates for Exchange 2013 are cumulative in nature based upon a given Cumulative Update. This means customers who are running CU2 who have not deployed MS13-061 can move straight to the MS13-105 update because it will contain both security updates. Customers who are already running MS13-061 on CU2 may install MS13-105 on top of MS13-061 without removing the previous security update. If MS13-061 was previously deployed, Add/Remove Programs will indicate that both updates are installed. If MS13-061 was not previously deployed, only MS13-105 will appear in Add/Remove Programs.

These updates are being made available via Microsoft Update and on the Microsoft Download Center.

Exchange Team

Mysterious mail loop on Edge Transport server: Check your size limits!


I'm a support engineer in CSS. I was working with a customer who reported a mail loop error for a specific domain, like contoso.com. The error was only observed with large emails. Yeah, that's really mysterious until you figure out that the mail loop is due to a size restriction on the Send connector. I thought this was curious enough to share.

Understanding the configuration and root cause of the issue:

I initially thought that this might have been the outcome of the Edge server being configured to use an external DNS server (a DNS server that resolves external hosts). Usually, when the Edge Transport server is configured to use external DNS, it resolves the domain name to the public IP addresses (generally pointing to itself, the external firewall, or the service provider) instead of a Hub Transport server in the Active Directory site, causing a mail loop.

On reproducing the issue, I found that the Edge Transport server was not configured to use an external DNS server. The environment I set up to reproduce the issue looked like the diagram below:

Diagram: An Edge Transport server with two Send connectors - one for address space contoso.com to the Active Directory site (10 MB limit) and one for address space * to the Internet (30 MB limit)

Here's what happens in this scenario: When the Edge Transport server receives a 20 MB email from an Internet sender, it accepts it. The Edge Transport server has two connectors that match the address space - one for the address space contoso.com to the Active Directory site and one for the address space *. When making the routing decision based on all available connectors, the connector from the Edge to the Hub is not considered because of its size restriction (it has a 10 MB size limit). The best match is the * connector from the Edge to the Internet (please review the connector selection algorithm documented in Understanding Message Routing), which has a message size limit of 30 MB.

End result: The message is routed back to the Internet causing the message loop between the Internet and the Edge Server.

Based on whether the Send connector to the Internet is configured to use DNS or a smart host to deliver outbound mail, we will get one of the following NDRs:

If using DNS:

#554 5.4.4 SMTPSEND.DNS.MxLoopback; DNS records for this domain are configured in a loop ##

If using a Smart Host:

5.4.6 smtp;554 5.4.6 Hop count exceeded - possible mail loop> #SMTP#

The Solution

This behavior is by design and can be easily rectified by modifying the message size limit on the connector. Based on your requirement, you can choose either of the following options:

  • Set the MaxMessageSize parameter on the Receive connector (which receives inbound mail from the Internet) to 10 MB, so messages from the Internet are restricted to 10 MB.
  • Set the MaxMessageSize on the Send connector from the Edge to the Hub to 30 MB, which will allow you to receive 30 MB messages from external senders. (See the sketch below.)
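
In Exchange Management Shell terms, the two options look roughly like this (the connector names are hypothetical; substitute the names from your environment):

# Option 1: restrict inbound Internet mail to 10 MB on the Edge Receive connector.
Set-ReceiveConnector -Identity "EDGE01\Default internal receive connector EDGE01" -MaxMessageSize 10MB

# Option 2: raise the Edge-to-Hub Send connector limit to 30 MB instead.
Set-SendConnector -Identity "EdgeSync - Inbound to Default-First-Site-Name" -MaxMessageSize 30MB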

Mystery solved! Thanks to Arindam Thokder and Scott Landry, who helped me with getting this ready for the blog!

Suresh Kumar (XCON)

Sending common or canned responses from a shared or repository mailbox


As a Microsoft Premier Field Engineer (PFE), I assist companies with Lotus Notes and GroupWise migrations to Exchange/Outlook environments and have found that different applications act differently. One of the common questions is: Can Outlook/Exchange send canned responses from a common mailbox? The answer is yes, but it is done a little differently than other products.

While everyone here is familiar with the different tools and processes in this article, I usually find that the 'leap of faith' to put them together in this combination is not always made. So here we go: easy steps put together to create an end result worthy of solving a situation that may apply to your environment.

The first question is: in the Exchange/Outlook realm, is it possible to have a common response from a single mailbox (sometimes referred to as a 'repository') sent by multiple people? Yes, it is possible in Outlook, but several steps have to be set up for this to occur correctly and consistently.

Let's say you have an 'IT Helpdesk' mailbox that acts like a repository, and you have several people accessing this resource throughout the day. The first step is to remind ourselves where signatures are stored in Outlook. Most people's first thought is within the Outlook profile. That's not correct. The signatures are actually stored as .htm files on a user's local machine:

By default on Vista\Windows 7\8:

C:\Users\%UserName%\AppData\Roaming\Microsoft\Signatures

By default on Windows XP:

C:\Documents and settings\%UserName%\Application Data\Microsoft\Signatures

Why do I mention %UserName%? That is the wildcard entry for the alias or AD user account name of whoever is logged onto the computer. You'll see later that we reference this again.

Another question is, ‘What can you do with an HTML file?’ And of course the better question is, ‘What CAN’T you do with an HTML file?’ The beauty of having these files on the computer is that we can act upon them at any time and Outlook will use the current file. In fact, even with Outlook open, you could add a new file in this location and it will show up in the available signature list, which is one of the more dynamic options available in Outlook. Within this HTML file, you could put all kinds of interesting mechanisms: hyperlinks, formatted text, images, marching ant text, blinking text, all of the cool and possibly annoying functions that a web page allows us to accomplish.

The next step we have to do is to be able to get common files to end users’ desktops. This is accomplished by a simple GPO (Group Policy Object) via AD (Active Directory). Hence, if a file needs to be copied to a desktop, a simple call to a UNC path that acts upon a logon process can get the file to the proper location. Something like:

xcopy "\\ServerName\Share\*.*" "C:\Users\%UserName%\AppData\Roaming\Microsoft\Signatures" /Y

This will copy all files in the share to any user the GPO is applied to, every time they log onto any machine they have access to. This also ensures consistency when implementing updated files and ease of deploying new machines. For those of you who thought the signature files were in the Outlook profile, you can now see that you can have signatures set up for an end user before they ever launch Outlook on a newly provisioned client machine. Pretty cool stuff.

You can also set permissions on the network folder share to only allow, say, a manager to edit the files inside the folder. This allows only the proper people to have access that impacts many other users. For example, around December the files could mention holiday-specific information like: “Thank you for the contact, our staff is cycling through holiday time off and your request may take a little longer than usual to get to.” Or at the beginning of the year: “Welcome to the New Year, we are excited about deploying our new products this year.” Whatever the message is, it can easily be updated and controlled from a central location and deployed to the proper end users who send out the common responses.


Figure 1: Setting share permissions on a folder share.

The next problem we run into is that when someone reads a message in Outlook, it is marked as read. Other mail products allow you to read a message without marking it as read; Outlook/Exchange does not. Once a message is read by anyone, you have to go back and mark it as unread. Training end users is the only answer here; there are no Outlook settings available to change this behavior.


Figure 2: Right click a message and select ‘Mark as Unread’ or on the Ribbon, select Unread/Read option.

We're almost done. The last few steps are back on the client machines. After you've trained your end users to apply the appropriate signature file and select the correct mailbox account to send from, you have one more training step: you have to inform them which Sent Items folder the message ends up in. Yep, there's another issue here, but one that can also be easily resolved.


Figure 3: Selecting specific signature and which mailbox account to send from.

By default, when sending a message from within Outlook, the message is recorded in the Sent Items folder of the user sending the item. This is not ideal when we have a common resource mailbox whose sent items should be visible to everyone acting as a single voice from the same repository location. The fix is a registry edit on the client side. So once again we go back to our friend AD and edit a GPO for consistency. Remember that all registry settings can be edited via GPOs. Depending on the version of AD you're on, you may have to use a logon file that calls a .reg file, or, for 2008 and above DCs, use GPO Preferences to achieve the same process.

There is no universal fix; each user who can 'Send As' another mailbox must perform one of the following on their computer, or you can deploy the change using a patch management solution like System Center Configuration Manager (SCCM) and/or GPOs.

Outlook 2003 (KB317865 and KB953804), Outlook 2007 (KB972148 and KB970944), Outlook 2010 (KB2181579), and Outlook 2013

For Outlook 2003 only, first install the office2003-KB953803-GLB.exe hotfix (http://support.microsoft.com/kb/953803). No reboot is required, but Outlook has to be closed.

The registry steps are identical for every version of Outlook; only the version number in the subkey changes:

  • Outlook 2003: HKEY_CURRENT_USER\Software\Microsoft\Office\11.0\Outlook\Preferences
  • Outlook 2007: HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\Outlook\Preferences
  • Outlook 2010: HKEY_CURRENT_USER\Software\Microsoft\Office\14.0\Outlook\Preferences
  • Outlook 2013: HKEY_CURRENT_USER\Software\Microsoft\Office\15.0\Outlook\Preferences

1. Click Start, click Run, type regedit, and then click OK.

2. Locate and then click the subkey for your version of Outlook (listed above).

3. On the Edit menu, point to New, and then click DWORD Value.

4. Type DelegateSentItemsStyle, and then press ENTER.

5. Right-click DelegateSentItemsStyle, and then click Modify.

6. In the Value data box, type 1, and then click OK.

7. Exit Registry Editor.
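
If you prefer to script the change rather than walk through regedit, a minimal sketch for Outlook 2010 looks like this (adjust 14.0 in the path for other versions):

# Minimal sketch: set DelegateSentItemsStyle = 1 for Outlook 2010 (14.0).
$key = 'HKCU:\Software\Microsoft\Office\14.0\Outlook\Preferences'
New-ItemProperty -Path $key -Name DelegateSentItemsStyle -PropertyType DWord -Value 1 -Force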

The other alternative is to have the sender manually move the sent message to the Sent Items folder of the user named as the sender in the email.

Important: After you set the DelegateSentItemsStyle registry value to 1, the functionality is only available when the Microsoft Exchange account is set to Use Cached Exchange Mode. The DelegateSentItemsStyle registry value will not work consistently on an Exchange account that is configured in Online mode.


Figure 4: Creating a GPO, using ‘preferences’, to set which ‘sent items’ folder is allowed to be selected.

So there you have it: sending a consistent response from a commonly shared mailbox, using signature files, an Outlook client registry edit, GPOs, and a UNC share. Now go out and improve your commonly used shared resource mailboxes and present a stronger corporate image at the same time. Thank you, and happy improvements.

Mike O'Neill

Preserving Activation Blocks After Performing DAG Member Maintenance


In Exchange 2010, when a database availability group (DAG) member needs service, it can be placed into maintenance mode. Exchange 2010 includes the StartDagServerMaintenance.ps1 and StopDagServerMaintenance.ps1 scripts to place a DAG member into, and remove it from, maintenance mode. For a summary of what these scripts do, see this post.

Within a DAG, it is not uncommon to have one or more databases or servers blocked from automatic activation by the system. Some customers configure entire servers to be blocked from activation, some block individual copies, and some do a combination of both, based on their business requirements. Administrators using the maintenance mode scripts will find their configured activation blocks reset to the unblocked state. This behavior is not a problem with the scripts; in fact, the scripts are working as designed.

Here is an example of a database copy that has had activation suspended:

[PS] C:\>Get-MailboxDatabaseCopyStatus DAG-DB0\MBX-2 | fl name,activationSuspended

Name                             : DAG-DB0\MBX-2
ActivationSuspended              : True

Here is an example of a server that has activation blocked:

[PS] C:\>Get-MailboxServer MBX-2 | fl name,databasecopyautoactivationpolicy

Name                             : MBX-2
DatabaseCopyAutoActivationPolicy : Blocked

When the administrator executes the StopDagServerMaintenance.ps1 script, these states are reset back to their defaults. Here is an example of the states after StopDagServerMaintenance.ps1:

[PS] C:\Program Files\Microsoft\Exchange Server\V14\Scripts>Get-MailboxDatabaseCopyStatus DAG-DB0\MBX-2 | fl name,activationSuspended

Name                             : DAG-DB0\MBX-2
ActivationSuspended              : False

[PS] C:\Program Files\Microsoft\Exchange Server\V14\Scripts>Get-MailboxServer MBX-2 | fl name,databasecopyautoactivationpolicy

Name                             : MBX-2
DatabaseCopyAutoActivationPolicy : Unrestricted

Although the maintenance scripts' behavior is by design, it can lead to undesirable scenarios, such as lagged database copies being activated. Of course, to eliminate these issues, an administrator can record database and server settings before and after maintenance and reconfigure any reset settings as needed.

To help with this, here is a sample script I created that records database and server activation settings into a CSV file which can then be used with the maintenance scripts to adjust the states automatically.

What the script does

When you run the script, it creates two CSV files (on the server you run it on) along with a transcript that contains the results of the command executed. The first CSV file contains all database copies assigned to the server and their activation suspension status. Here's an example:

#TYPE Selected.Microsoft.Exchange.Management.SystemConfigurationTasks.DatabaseCopyStatusEntry
"Name","ActivationSuspended"
"DAG-DB0\DAG-3","True"
"DAG-DB1\DAG-3","True"

The second CSV file contains the database copy auto activation policy of the server. For example:

#TYPE Selected.Microsoft.Exchange.Data.Directory.Management.MailboxServer
"Name","DatabaseCopyAutoActivationPolicy"
"DAG-3","Blocked"

Using maintenanceWrapper.ps1 to start and stop maintenance

Because this script is unsigned, you'll need to relax the execution policy on the server to allow unsigned scripts to run.

IMPORTANT: Allowing unsigned PowerShell scripts to execute is a security risk. For details, see Running Windows PowerShell Scripts. If this option does not meet your organization's security policy, you can sign the script (requires a code-signing certificate).

This command sets the execution policy on a server to allow unsigned PowerShell scripts to execute:

Set-ExecutionPolicy Unrestricted

You can use the maintenanceWrapper.ps1 script to start and stop maintenance procedure on a DAG member.

  1. Use this command to start the maintenance procedure:

    maintenanceWrapper.ps1 -Server <SERVERNAME> -Action START

    The command creates the CSV files containing the original database states and then invokes the StartDagServerMaintenance.ps1 script to place the DAG member in maintenance mode.

  2. After maintenance is completed, you can stop the maintenance procedure using this command:

    maintenanceWrapper.ps1 -Server <SERVERNAME> -Action STOP

    The command calls the StopDagServerMaintenance.ps1 script to remove the DAG member from maintenance mode and then resets the database and server activation states from the states recorded in the CSV file.

Give the script a try and see if it makes maintenance mode for activation-blocked servers and databases easier for you. I hope you find this useful, and I welcome any and all feedback.

Special thanks to Scott Schnoll and Abram Jackson for reviewing this content, and to David Spooner for validating the script.

Tim McMichael

Troubleshooting Rapid Growth in Databases and Transaction Log Files in Exchange Server 2007 and 2010


A few years back, a very detailed blog post was released on Troubleshooting Exchange 2007 Store Log/Database growth issues.

We wanted to revisit this topic with Exchange 2010 in mind. While the troubleshooting steps needed are virtually the same, we thought it would be useful to condense the steps a bit, make a few updates and provide links to a few newer KB articles.

The below list of steps is a walkthrough of an approach that would likely be used when calling Microsoft Support for assistance with this issue. It also provides some insight as to what we are looking for and why. It is not a complete list of every possible troubleshooting step, as some causes are simply not seen quite as much as others.

Another thing to note is that these steps are commonly used when we are seeing “rapid” or unexpected growth in the database file on disk or in the number of transaction logs being generated. An example of this is when an administrator notices a transaction log file drive is close to running out of space but had several GB free the day before. Looking through historical records, the administrator notes that approximately 2 to 3 GB of logs have been backed up daily for several months, but we are currently generating 2 to 3 GB of logs per hour. This is obviously a red flag for the log creation rate. The same principle applies to the database in scenarios where the rapid log growth is associated with new content creation.

In other cases, the database size or transaction log file quantity may increase, but signal other indicators of things going on with the server. For example, if backups have been failing for a few days and the log files are not getting purged, the log file disk will start to fill up and appear to have more logs than usual. In this example, the cause wouldn’t necessarily be rapid log growth, but an indicator that the backups which are responsible for purging the logs are failing and must be resolved. Another example is with the database, where retention settings have been modified or online maintenance has not been completing, therefore, the database will begin to grow on disk and eat up free space. These scenarios and a few others are also discussed in the “Proactive monitoring and mitigation efforts” section of the previously published blog.

It should be noted that in some cases, you may run into a scenario where the database size is expanding rapidly, but you do not experience log growth at a rapid rate. (As with new content creation in rapid log growth, we would expect the database to grow at a rapid rate with the transaction logs.) This is often referred to as database “bloat” or database “space leak”. The steps to troubleshoot this specific issue can be a little more invasive as you can see in some analysis steps listed here (taking databases offline, various kinds of dumps, etc.), and it may be better to utilize support for assistance if a reason for the growth cannot be found.

Once you have established that the rate of growth for the database and transaction log files is abnormal, we would begin troubleshooting the issue by doing the following steps. Note that in some cases the steps can be done out of order, but the below provides general suggested guidance based on our experiences in support.

Step 1

Use Exchange User Monitor (Exmon) server side to determine if a specific user is causing the log growth problems.

  1. Sort on CPU (%) and look at the top 5 users that are consuming the most CPU inside the Store process. Check the Log Bytes column to verify the log growth for a potential user.
  2. If that does not show a possible user, sort on the Log Bytes column to look for any users that could be contributing to the log growth.

If the user shown in Exmon is a ?, this is representative of a Hub Transport-related problem generating the logs. Query the message tracking logs using the Message Tracking Log tool in the Exchange Management Console's Toolbox to check for any large messages that might be running through the system. See Step 14 for a PowerShell script to accomplish the same task.

Step 2

With Exchange 2007 Service Pack 2 Rollup Update 2 and higher, you can use KB972705 to troubleshoot abnormal database or log growth by adding the described registry values. The registry values will monitor RPC activity and log an event if the thresholds are exceeded, with details about the event and the user that caused it. (These registry values are not currently available in Exchange Server 2010)

Check for any excessive ExCDO warning events related to appointments in the application log on the server (examples are 8230 or 8264 events). If recurring meeting events are found, try to regenerate the calendar data server-side via a process called POOF. See http://blogs.msdn.com/stephen_griffin/archive/2007/02/21/poof-your-calender-really.aspx for more information on what this is.

Event Type: Warning
Event Source: EXCDO
Event Category: General
Event ID: 8230
Description: An inconsistency was detected in username@domain.com: /Calendar/<calendar item> .EML. The calendar is being repaired. If other errors occur with this calendar, please view the calendar using Microsoft Outlook Web Access. If a problem persists, please recreate the calendar or the containing mailbox.

Event Type: Warning
Event ID : 8264
Category : General
Source : EXCDO
Type : Warning
Message : The recurring appointment expansion in mailbox <someone's address> has taken too long. The free/busy information for this calendar may be inaccurate. This may be the result of many very old recurring appointments. To correct this, please remove them or change their start date to a more recent date.

Important: If 8230 events are consistently seen on an Exchange server, have the user delete/recreate that appointment to remove any corruption.

Step 3

Collect and parse the IIS log files from the CAS servers used by the affected Mailbox Server. You can use Log Parser Studio to easily parse IIS log files. In here, you can look for repeated user account sync attempts and suspicious activity. For example, a user with an abnormally high number of sync attempts and errors would be a red flag. If a user is found and suspected to be a cause for the growth, you can follow the suggestions given in steps 5 and 6.

Once Log Parser Studio is launched, you will see convenient tabs to search per protocol:

Screenshot: Log Parser Studio | Query tabs per protocol

Some example queries for this issue would be:

Screenshot: Log Parser Studio | Example queries for this issue

Step 4

If a suspected user is found via Exmon, the event logs, KB972705, or parsing the IIS log files, then do one of the following:

  • Disable MAPI access to the user's mailbox using the following steps (recommended):
    • Run

      Set-CASMailbox -Identity <Username> -MAPIEnabled $False

    • Move the mailbox to another mailbox store. Note: This is necessary to disconnect the user from the store due to the Store Mailbox and DSAccess caches. Otherwise, you could potentially be waiting over 2 hours and 15 minutes for this setting to take effect. Moving the mailbox effectively kills the user's MAPI session to the server, and after the move, the user's access to the store via a MAPI-enabled client will be disabled.
  • Disable the user's AD account temporarily.
  • Kill their TCP connection with TCPView.
  • Call the user to have them close Outlook or turn off their mobile device for immediate relief.

Step 5

If closing the client/devices, or killing their sessions seems to stop the log growth issue, then we need to do the following to see if this is OST or Outlook profile related:

Have the user launch Outlook while holding down the Control key, which will prompt whether you would like to run Outlook in safe mode. If launching Outlook in safe mode resolves the log growth issue, then concentrate on what add-ins could be contributing to this problem.

For a mobile device, consider a full resync or a new sync profile. Also check for any messages in the drafts folder or outbox on the device. A corrupted meeting or calendar entry is commonly found to be causing the issue with the device as well.

If you can gain access to the users machine, then do one of the following:

1. Launch Outlook to confirm the log file growth issue on the server.

2. If log growth is confirmed, do one of the following:

  • Check the user's Outbox for any messages.
    • If the user is running in cached mode, set the Outlook client to Work Offline. Doing this helps stop the message being sent from the Outbox and sometimes causes the message to NDR.
    • If the user is running in online mode, try moving the message to another folder to prevent Outlook or the Hub server from processing it.
    • After each of the steps above, check the Exchange server to see if log growth has ceased.
  • Call Microsoft Product Support to enable debug logging of the Outlook client to determine possible root cause.

3. Follow the Running Process Explorer instructions in the below article to dump out dlls that are running within the Outlook Process. Name the file username.txt. This helps check for any 3rd party Outlook Add-ins that may be causing the excessive log growth.
970920  Using Process Explorer to List dlls Running Under the Outlook.exe Process
http://support.microsoft.com/kb/970920

4. Check the Sync Issues folder for any errors that might be occurring

Let’s attempt to narrow this down further to see if the problem is truly in the OST or something possibly Outlook Profile related:

  • Run ScanPST against the user's OST file to check for possible corruption.
  • With the Outlook client shut down, rename the user's OST file to something else and then launch Outlook to recreate a new OST file. If the problem does not occur, we know the problem is within the OST itself.

If renaming the OST causes the problem to recur, then recreate the user's profile to see if this might be profile-related.

Step 6

Ask Questions:

  • Is the user using any other type of device besides a mobile device?
  • Question the end user if at all possible to understand what they might have been doing at the time the problem started occurring. It’s possible that a user imported a lot of data from a PST file which could cause log growth server side or there was some other erratic behavior that they were seeing based on a user action.

Step 7

Check to ensure File Level Antivirus exclusions are set correctly for both files and processes per http://technet.microsoft.com/en-us/library/bb332342(v=exchg.141).aspx

Step 8

If Exmon and the above methods do not provide the data necessary to get root cause, then collect a portion of the Store transaction log files (100 would be a good start) during the problem period and parse them following the directions in http://blogs.msdn.com/scottos/archive/2007/11/07/remix-using-powershell-to-parse-ese-transaction-logs.aspx to look for possible patterns, such as high pattern counts for IPM.Appointment. This will give you a high-level overview of whether something is looping or there is a high rate of messages being sent. Note: This tool may or may not provide any benefit depending on the data that is stored in the log files, but sometimes it will show MIME-encoded data that will help with your investigation.

Step 9

If nothing is found by parsing the transaction log files, we can check for a rogue, corrupted, and large message in transit:

1. Check current queues against all HUB Transport Servers for stuck or queued messages:

get-exchangeserver | where {$_.IsHubTransportServer -eq "true"} | Get-Queue | where {$_.DeliveryType -eq "MapiDelivery"} | Select-Object Identity, NextHopDomain, Status, MessageCount | export-csv HubQueues.csv

Review queues for any that are in retry or have a lot of messages queued:

Export out message sizes in MB in all Hub Transport queues to see if any large messages are being sent through the queues:

get-exchangeserver | where {$_.IsHubTransportServer -eq "true"} | get-message -ResultSize unlimited | sort-object -property Size -Descending | Select-Object Identity,Subject,Status,LastError,RetryCount,Queue,@{Name="Message Size MB";expression={$_.Size.ToMB()}} | export-csv HubMessages.csv

Export out message sizes in Bytes in all Hub Transport queues:

get-exchangeserver | where {$_.IsHubTransportServer -eq "true"} | get-message -ResultSize unlimited | sort-object -property Size -Descending | Select-Object Identity,Subject,Status,LastError,RetryCount,Queue,Size | export-csv HubMessages.csv

2. Check Users Outbox for any large, looping, or stranded messages that might be affecting overall Log Growth.

get-mailbox -ResultSize Unlimited | Get-MailboxFolderStatistics -FolderScope Outbox | Sort-Object FolderSize -Descending | select-object Identity,Name,FolderType,ItemsInFolder,@{Name="FolderSize MB";expression={$_.FolderSize.ToMB()}} | export-csv OutboxItems.csv

Note: This does not get information for users that are running in cached mode.

Step 10

Utilize the MSExchangeIS Client\Jet Log Record Bytes/sec and MSExchangeIS Client\RPC Operations/sec Perfmon counters to see if there is a particular client protocol that may be generating excessive logs. If a particular protocol mechanism is found to be higher than other protocols for a sustained period of time, then consider shutting down the service hosting that protocol. For example, if Outlook Web Access is the protocol generating potential log growth, stop the World Wide Web Service (W3SVC) to confirm that log growth stops. If it does, collecting IIS logs from the CAS/MBX Exchange servers involved will help provide insight into what action the user was performing that was causing this to occur.
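
As a quick alternative to setting up a full Perfmon collection, you can sample these counters from PowerShell (a sketch; requires PowerShell 2.0 or later):

# Sample the per-client Store counters and show the busiest instances.
$counters = '\MSExchangeIS Client(*)\Jet Log Record Bytes/sec',
            '\MSExchangeIS Client(*)\RPC Operations/sec'
Get-Counter -Counter $counters -SampleInterval 5 -MaxSamples 12 |
    ForEach-Object {
        $_.CounterSamples |
            Sort-Object CookedValue -Descending |
            Select-Object -First 5 Path, CookedValue
    }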

Step 11

Run the following command from the Management shell to export out current user operation rates:

To export to a CSV file:

get-logonstatistics | select-object UserName,Windows2000Account,Identity,MessagingOperationCount,OtherOperationCount,ProgressOperationCount,StreamOperationCount,TableOperationCount,TotalOperationCount | where {$_.TotalOperationCount -gt 1000} | sort-object TotalOperationCount -Descending | export-csv LogonStats.csv

To view realtime data:

get-logonstatistics | select-object UserName,Windows2000Account,Identity,MessagingOperationCount,OtherOperationCount,ProgressOperationCount,StreamOperationCount,TableOperationCount,TotalOperationCount | where {$_.TotalOperationCount -gt 1000} | sort-object TotalOperationCount -Descending | ft

Key things to look for:

In the below example, the Administrator account was storming the testuser account with email. You will notice that there are two users active here: one is the Administrator submitting all of the messages, and the other entry's Windows2000Account references a HUB server, with an Identity of testuser. The HUB server entry also has *no* UserName, so that is a giveaway right there. This can give you a better understanding of which parties are involved in these high rates of operations.

UserName : Administrator
Windows2000Account : DOMAIN\Administrator
Identity : /o=First Organization/ou=First Administrative Group/cn=Recipients/cn=Administrator
MessagingOperationCount : 1724
OtherOperationCount : 384
ProgressOperationCount : 0
StreamOperationCount : 0
TableOperationCount : 576
TotalOperationCount : 2684

UserName :
Windows2000Account : DOMAIN\E12-HUB$
Identity : /o= First Organization/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=testuser
MessagingOperationCount : 630
OtherOperationCount : 361
ProgressOperationCount : 0
StreamOperationCount : 0
TableOperationCount : 0
TotalOperationCount : 1091

Step 12

Enable Perfmon/Perfwiz logging on the server. Collect data through the problem times and then review it for any irregular activities. You can reference the Perfwiz for Exchange 2007/2010 data collection script here: http://blogs.technet.com/b/mikelag/archive/2010/07/09/exchange-2007-2010-performance-data-collection-script.aspx

Step 13

Run ExTRA (Exchange Troubleshooting Assistant) via the Toolbox in the Exchange Management Console to look for any possible functions (via FCL logging) that may be consuming excessive time within the Store process. This needs to be launched during the problem period. http://blogs.technet.com/mikelag/archive/2008/08/21/using-extra-to-find-long-running-transactions-inside-store.aspx shows how to use FCL logging only, but it would be best to include Perfmon, Exmon, and FCL logging via this tool to capture the most data. The steps shown are valid for Exchange 2007 and Exchange 2010.

Step 14

Export message tracking log data from the affected MBX server.

Method 1

Download the ExLogGrowthCollector script and place it on the MBX server that experienced the issue. Run ExLogGrowthCollector.ps1 from the Exchange Management Shell. Enter the MBX server name that you would like to trace and the start and end times, then click the Collect Logs button.

Screenshot: ExLogGrowthCollector | Server name, start and end times, and the Collect Logs button

Note: What this script does is export all mail traffic to/from the specified mailbox server across all HUB servers between the times specified. This helps provide insight into any large or looping messages that might have been sent that could have caused the log growth issue.

Method 2

Copy and paste the following into Notepad, save it as msgtrackexport.ps1, and then run it on the affected mailbox server. Open the output CSV in Excel for review. This is similar to the GUI version, but requires manual editing to get it to work.

#Export Tracking Log data from affected server specifying Start/End Times
Write-host "Script to export out Mailbox Tracking Log Information"
Write-Host "#####################################################"
Write-Host
$server = Read-Host "Enter Mailbox server Name"
$start = Read-host "Enter start date and time in the format of MM/DD/YYYY hh:mmAM"
$end = Read-host "Enter end date and time in the format of MM/DD/YYYY hh:mmPM"
$fqdn = $(get-exchangeserver $server).fqdn
Write-Host "Writing data out to csv file..... "
Get-ExchangeServer | where {$_.IsHubTransportServer -eq "True" -or $_.name -eq "$server"} | Get-MessageTrackingLog -ResultSize Unlimited -Start $start -End $end  | where {$_.ServerHostname -eq $server -or $_.clienthostname -eq $server -or $_.clienthostname -eq $fqdn} | sort-object totalbytes -Descending | export-csv MsgTrack.csv -NoType
Write-Host "Completed!! You can now open the MsgTrack.csv file in Excel for review"

Method 3

You can also use the Process Tracking Log Tool at http://blogs.technet.com/b/exchange/archive/2011/10/21/updated-process-tracking-log-ptl-tool-for-use-with-exchange-2007-and-exchange-2010.aspx to provide some very useful reports.

Step 15

Save off a copy of the application and system logs from the affected server and review them for any events that could contribute to this problem.

Step 16

Enable IIS extended logging for the CAS and Mailbox server roles to add the sc-bytes and cs-bytes fields, to track large messages being sent via IIS protocols and to also track usage patterns (Additional Details).

Step 17

Get a process dump of the store process during the time of the log growth. (Use this as a last measure once all prior activities have been exhausted, and prior to calling Microsoft for assistance. These issues are sometimes intermittent, and the quicker you can obtain data from the server, the better, as this will help provide Microsoft with information on what the underlying cause might be.)

  • Download the latest version of Procdump from http://technet.microsoft.com/en-us/sysinternals/dd996900.aspx and extract it to a directory on the Exchange server
  • Open the command prompt and change into the directory to which procdump was extracted in the previous step.
  • Type

    procdump -mp -s 120 -n 2 store.exe d:\DebugData

    This will dump the data to D:\DebugData. Change this to whatever directory has enough space to dump the entire store.exe process twice. Check Task Manager for the store.exe process and how much memory it is currently consuming for a rough estimate of the amount of space that is needed to dump the entire store dump process.
    Important: If procdump is being run against a store that is on a clustered server, then you need to make sure that you set the Exchange Information Store resource to not affect the group. If the entire store dump cannot be written out in 300 seconds, the cluster service will kill the store service ruining any chances of collecting the appropriate data on the server.

Open a case with Microsoft Product Support Services to get this data looked at.

Most current related KB articles

2814847 - Rapid growth in transaction logs, CPU use, and memory consumption in Exchange Server 2010 when a user syncs a mailbox by using an iOS 6.1 or 6.1.1-based device

2621266 - An Exchange Server 2010 database store grows unexpectedly large

996191 - Troubleshooting Fast Growing Transaction Logs on Microsoft Exchange 2000 Server and Exchange Server 2003

Kevin Carker
(based on a blog post written by Mike Lagase)


Exchange 2010 Database Availability Groups and Disk Sector Sizes


These days, some customers are deploying Exchange databases and log files on advanced format (4K) drives. Although these drives support a physical sector size of 4096 bytes, many vendors emulate 512-byte sectors in order to maintain backwards compatibility with applications and operating systems. This is known as 512-byte emulation (512e). Windows Server 2008 and Windows Server 2008 R2 support native 512-byte and 512-byte emulated advanced format drives. Windows Server 2012 supports drives of all sector sizes. The sector size presented to applications and the operating system, and how applications respond, directly affects data integrity and performance.

For more information on sector sizes see the following links:

When deploying an Exchange 2010 Database Availability Group (DAG), the sector sizes of the volumes hosting the databases and log files must be the same across all nodes within the DAG.  This requirement is outlined in Understanding Storage Configuration.

Support requires that all copies of a database reside on the same physical disk type. For example, it is not a supported configuration to host one copy of a given database on a 512-byte sector disk and another copy of that same database on a 512e disk. Also be aware that 4-kilobyte (KB) sector disks are not supported for any version of Microsoft Exchange and 512e disks are not supported for any version of Exchange prior to Exchange Server 2010 SP1.

Recently, we have noted that some customers have experienced issues with log file replication and replay as the result of sector size mismatch.  These issues occur when:

  • Storage drivers are upgraded resulting in the recognized sector size changing.
  • Storage firmware is upgraded resulting in the recognized sector size changing.
  • New storage is presented or existing storage is replaced with drives of a different sector size.

This mismatch can cause one or more database copies in a DAG to fail, as illustrated below. In my example environment, I have a three-member DAG with a single database that resides on a volume labeled Z that is replicated between each member.

[PS] C:\>Get-MailboxDatabaseCopyStatus *

Name             Status  CopyQueueLength ReplayQueueLength LastInspectedLogTime  ContentIndexState
----             ------  --------------- ----------------- --------------------  -----------------
SectorTest\MBX-1 Mounted 0               0                                       Healthy
SectorTest\MBX-2 Healthy 0               1                 3/19/2013 10:27:50 AM Healthy
SectorTest\MBX-3 Healthy 0               1                 3/19/2013 10:27:50 AM Healthy

If I use FSUTIL to query the Z volume on each DAG member, we can see that the volume currently reports 512 logical bytes per sector and 512 physical bytes per sector. Thus, the volume is currently seen by the operating system as having a native 512-byte sector size.

On MBX-1:

C:\>fsutil fsinfo ntfsinfo z:

NTFS Volume Serial Number :       0x18d0bc1dd0bbfed6
Version :                         3.1
Number Sectors :                  0x000000000fdfe7ff
Total Clusters :                  0x0000000001fbfcff
Free Clusters  :                  0x0000000001fb842c
Total Reserved :                  0x0000000000000000
Bytes Per Sector  :               512
Bytes Per Physical Sector :       512

Bytes Per Cluster :               4096
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x0000000000040000
Mft Start Lcn  :                  0x00000000000c0000
Mft2 Start Lcn :                  0x0000000000000002
Mft Zone Start :                  0x00000000000c0040
Mft Zone End   :                  0x00000000000cc840
RM Identifier:        EF486117-9094-11E2-BF55-00155D006BA1

On MBX-3:

C:\>fsutil fsinfo ntfsinfo z:

NTFS Volume Serial Number :       0x0ad44aafd44a9d37
Version :                         3.1
Number Sectors :                  0x000000000fdfe7ff
Total Clusters :                  0x0000000001fbfcff
Free Clusters  :                  0x0000000001fad281
Total Reserved :                  0x0000000000000000
Bytes Per Sector  :               512
Bytes Per Physical Sector :       512

Bytes Per Cluster :               4096
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x0000000000040000
Mft Start Lcn  :                  0x00000000000c0000
Mft2 Start Lcn :                  0x0000000000000002
Mft Zone Start :                  0x00000000000c0000
Mft Zone End   :                  0x00000000000cc820
RM Identifier:        B9B00E32-90B2-11E2-94E9-00155D006BA3
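
Rather than running FSUTIL on each node by hand, you can compare the reported sector sizes across all members in one pass (a sketch; assumes PowerShell remoting is enabled, and the server names are illustrative):

# Compare the Z: volume's reported sector sizes on every DAG member.
$dagMembers = 'MBX-1', 'MBX-2', 'MBX-3'
foreach ($server in $dagMembers) {
    $output = Invoke-Command -ComputerName $server -ScriptBlock { fsutil fsinfo ntfsinfo z: }
    "== $server =="
    $output | Select-String 'Bytes Per (Physical )?Sector' |
        ForEach-Object { $_.Line.Trim() }
}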

Effects of storage changes

But what happens if there is a change in the way storage is seen on MBX-3, so that the volume now reflects a 512e sector size? This can happen when upgrading storage drivers, upgrading firmware, or presenting new storage that implements advanced format storage.

C:\>fsutil fsinfo ntfsinfo z:

NTFS Volume Serial Number :       0x0ad44aafd44a9d37
Version :                         3.1
Number Sectors :                  0x000000000fdfe7ff
Total Clusters :                  0x0000000001fbfcff
Free Clusters  :                  0x0000000001fad2e7
Total Reserved :                  0x0000000000000000
Bytes Per Sector  :               512
Bytes Per Physical Sector :       4096

Bytes Per Cluster :               4096
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x0000000000040000
Mft Start Lcn  :                  0x00000000000c0000
Mft2 Start Lcn :                  0x0000000000000002
Mft Zone Start :                  0x00000000000c0040
Mft Zone End   :                  0x00000000000cc840
RM Identifier:        B9B00E32-90B2-11E2-94E9-00155D006BA3

When reviewing the database copy status, notice that the copy assigned to MBX-3 has failed.

[PS] C:\>Get-MailboxDatabaseCopyStatus *

Name             Status  CopyQueueLength ReplayQueueLength LastInspectedLogTime  ContentIndexState
----             ------  --------------- ----------------- --------------------  -----------------
SectorTest\MBX-1 Mounted 0               0                                       Healthy
SectorTest\MBX-2 Healthy 0               0                 3/19/2013 11:13:05 AM Healthy
SectorTest\MBX-3 Failed  0               8                 3/19/2013 11:13:05 AM Healthy

The full details of the copy status of MBX-3 can be reviewed to display the detailed error:

[PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\MBX-3 | fl

RunspaceId                       : 5f4bb58b-39fb-4e3e-b001-f8445890f80a
Identity                         : SectorTest\MBX-3
Name                             : SectorTest\MBX-3
DatabaseName                     : SectorTest
Status                           : Failed
MailboxServer                    : MBX-3
ActiveDatabaseCopy               : mbx-1
ActivationSuspended              : False
ActionInitiator                  : Service
ErrorMessage                     : The log copier was unable to continue processing for database 'SectorTest\MBX-3' because an error occurred on the target server: Continuous replication - block mode has been terminated. Error: the log file sector size does not match the current volume's sector size (-546) [HResult: 0x80131500]. The copier will automatically retry after a short delay.
ErrorEventId                     : 2152
ExtendedErrorInfo                :
SuspendComment                   :
SinglePageRestore                : 0
ContentIndexState                : Healthy
ContentIndexErrorMessage         :
CopyQueueLength                  : 0
ReplayQueueLength                : 7
LatestAvailableLogTime           : 3/19/2013 11:13:05 AM
LastCopyNotificationedLogTime    : 3/19/2013 11:13:05 AM
LastCopiedLogTime                : 3/19/2013 11:13:05 AM
LastInspectedLogTime             : 3/19/2013 11:13:05 AM
LastReplayedLogTime              : 3/19/2013 10:24:24 AM
LastLogGenerated                 : 53
LastLogCopyNotified              : 53
LastLogCopied                    : 53
LastLogInspected                 : 53
LastLogReplayed                  : 46
LogsReplayedSinceInstanceStart   : 0
LogsCopiedSinceInstanceStart     : 0
LatestFullBackupTime             :
LatestIncrementalBackupTime      :
LatestDifferentialBackupTime     :
LatestCopyBackupTime             :
SnapshotBackup                   :
SnapshotLatestFullBackup         :
SnapshotLatestIncrementalBackup  :
SnapshotLatestDifferentialBackup :
SnapshotLatestCopyBackup         :
LogReplayQueueIncreasing         : False
LogCopyQueueIncreasing           : False
OutstandingDumpsterRequests      : {}
OutgoingConnections              :
IncomingLogCopyingNetwork        :
SeedingNetwork                   :
ActiveCopy                       : False

Using the Exchange Server Error Code Look-up tool (ERR.EXE), we can verify the definition of the error code -546.

D:\Utilities\ERR>err -546

# for decimal -546 / hex 0xfffffdde
  JET_errLogSectorSizeMismatch                                   esent98.h
# /* the log file sector size does not match the current
# volume's sector size */
# 1 matches found for "-546"

In addition, the Application event log may contain the following entries:

Log Name:      Application
Source:        MSExchangeRepl
Date:          3/19/2013 11:14:58 AM
Event ID:      2152
Task Category: Service
Level:         Error
User:          N/A
Computer:      MBX-3.exchange.msft
Description:
The log copier was unable to continue processing for database 'SectorTest\MBX-3' because an error occurred on the target server: Continuous replication - block mode has been terminated. Error: the log file sector size does not match the current volume's sector size (-546) [HResult: 0x80131500]. The copier will automatically retry after a short delay.

The cause

Why does this issue occur?
Each log file records, in its header, the sector size of the disk where the log file was created.  For example, this is the header of a log file on MBX-1 with a native 512 byte sector size:

Z:\SectorTest>eseutil /ml E0100000001.log

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 14.02
Copyright (C) Microsoft Corporation. All Rights Reserved.
Initiating FILE DUMP mode... 
      Base name: E01
      Log file: E0100000001.log
      lGeneration: 1 (0x1)
      Checkpoint: (0x38,FFFF,FFFF)
      creation time: 03/19/2013 09:40:14
      prev gen time: 00/00/1900 00:00:00
      Format LGVersion: (7.3704.16.2)
      Engine LGVersion: (7.3704.16.2)
      Signature: Create time:03/19/2013 09:40:14 Rand:11019164 Computer:
      Env SystemPath: z:\SectorTest\
      Env LogFilePath: z:\SectorTest\
     Env Log Sec size: 512 (matches)
      Env (CircLog,Session,Opentbl,VerPage,Cursors,LogBufs,LogFile,Buffers)
          (    off,   1227,  61350,  16384,  61350,   2048,   2048,  44204)
      Using Reserved Log File: false
      Circular Logging Flag (current file): off
      Circular Logging Flag (past files): off
      Checkpoint at log creation time: (0x1,8,0) 
      Last Lgpos: (0x1,A,0)
Number of database page references:  0
Integrity check passed for log file: E0100000001.log
Operation completed successfully in 0.62 seconds.

The sector size that is chosen is determined through one of two methods:

  • If the log stream is brand new, read the sector size from disk and utilize this sector size.
  • If the log stream already exists, use the sector size of the given log stream.

In theory, since the sector size of disks should not be changing across nodes and the sector size of all disks must match, this should not cause a problem.  In our example, and in some customer environments, these sector sizes are actually changing.  Since most of these databases already exist, the existing sector size of the log stream is utilized, which in turn causes a mismatch between DAG members.

When a mismatch occurs, the issue only prevents the successful use of block mode replication.  It does not affect file mode replication.  Block mode replication was introduced in Exchange 2010 Service Pack 1.  For more information on block mode replication, see New High Availability and Site Resilience Functionality in Exchange 2010 SP1.

Why does this only affect block mode replication?
When a log file is addressed, locations within it are referenced by log file position (LGPOS): a combination of the log generation, the sector, and the offset within that sector.  For example, in the previous header dump the “Last Lgpos” of (0x1,A,0) happens to be the last log file position within the log.  Say we were creating a block for block mode replication within log generation 0x1A at sector 8, offset 1 – this would be reflected as an LGPOS of (0x1a,8,1).  When this block is transmitted to a host with an advanced format disk, the log position has to be translated: sector 8 of a 512-byte-sector log is byte offset 4096, which falls in sector 1 of a 4096-byte-sector log, so the same position becomes (0x1a,1,1).  As you can see, it could create significant problems if incorrect positions within a log file were written to or read from.
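
To make the translation concrete, here is the arithmetic from the example above as a small PowerShell sketch (the LGPOS values are the illustrative ones from the preceding paragraph):

# Translate LGPOS (generation 0x1A, sector 8, offset 1) from 512-byte sectors
# to its 4096-byte sector equivalent.
$oldSectorSize = 512
$newSectorSize = 4096
$sector = 8; $offset = 1

$byteOffset = ($sector * $oldSectorSize) + $offset        # 4097 bytes into the log
$newSector  = [math]::Floor($byteOffset / $newSectorSize) # 1
$newOffset  = $byteOffset % $newSectorSize                # 1
"(0x1a,{0},{1})" -f $newSector, $newOffset                # returns (0x1a,1,1)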

The resolution

How do I go about correcting this condition?
To fix this condition, first ensure that the same sector sizes exist on all disks across all nodes that host Exchange data, and then reset the log stream.

The following steps show how to do this with minimal downtime.

  1. Ensure that Exchange 2010 Service Pack 2 or later is installed on all DAG members.

    Note: Exchange 2010 Service Pack 1 and earlier do not support 512e volumes.

  2. Disable block mode replication on all hosts.  This step requires restarting the replication service on each node, which will temporarily cause all copies to fail on passive nodes when the service is restarted on the active node.  When the service is restarted on a passive node, only passive copies on that node will enter a failed state.  Databases that are mounted and client connections are not impacted by this activity.  Block mode replication should remain disabled until all steps have been completed on all DAG members.  (A scripted sketch of this step and the later sector-size verification appears after this list.)
    1. Launch Registry Editor.
    2. Navigate to HKLM\Software\Microsoft\ExchangeServer\V14\Replay\Parameters.
    3. Right-click the Parameters key and select New > DWORD.
    4. Name the DWORD DisableGranularReplication.
    5. Set its value to 1.
  3. Restart the Microsoft Exchange Replication service on each member using the Shell: Restart-Service MSExchangeRepl

  4. Validate that all copies of databases across DAG members are healthy at this time:

    [PS] C:\>Get-MailboxDatabaseCopyStatus *

    Name              Status    CopyQueue  ReplayQueue  LastInspectedLogTime   ContentIndex
                                Length     Length                              State
    ----              ------    ---------  -----------  --------------------   ------------
    SectorTest\MBX-1  Mounted   0          0                                   Healthy
    SectorTest\MBX-2  Healthy   0          0            3/19/2013 12:28:34 PM  Healthy
    SectorTest\MBX-3  Healthy   0          0            3/19/2013 12:28:34 PM  Healthy

  5. Apply the appropriate hotfix for Windows Server 2008 or Windows Server 2008 R2 and Advanced Format Disks.  Windows Server 2012 does not require a hotfix.

    • Windows 2008 R2: KB 982018 - An update that improves the compatibility of Windows 7 and Windows Server 2008 R2 with Advanced Format Disks is available
    • Windows 2008: KB 2553708 - A hotfix rollup that improves Windows Vista and Windows Server 2008 compatibility with Advanced Format disks
  6. Repeat the procedure that caused the disk sector size to change.  For example, if the issue arose as a result of upgrading drivers and firmware on a host, utilize your maintenance mode procedures to complete the driver and firmware upgrade on all hosts.

    Note: If your installation does not allow for you to use the same sector sizes across all DAG members, then the implementation is not supported.

  7. Utilize FSUTIL to ensure that the sector sizes match across all hosts for the log and database volumes. 

    On MBX-1:

    C:\>fsutil fsinfo ntfsinfo z:

    NTFS Volume Serial Number :       0x18d0bc1dd0bbfed6
    Version :                         3.1
    Number Sectors :                  0x000000000fdfe7ff
    Total Clusters :                  0x0000000001fbfcff
    Free Clusters  :                  0x0000000001fac6e6
    Total Reserved :                  0x0000000000000000
    Bytes Per Sector  :               512
    Bytes Per Physical Sector :       4096

    Bytes Per Cluster :               4096
    Bytes Per FileRecord Segment    : 1024
    Clusters Per FileRecord Segment : 0
    Mft Valid Data Length :           0x0000000000040000
    Mft Start Lcn  :                  0x00000000000c0000
    Mft2 Start Lcn :                  0x0000000000000002
    Mft Zone Start :                  0x00000000000c0040
    Mft Zone End   :                  0x00000000000cc840
    RM Identifier:        EF486117-9094-11E2-BF55-00155D006BA1

    On MBX-2

    C:\>fsutil fsinfo ntfsinfo z:

    NTFS Volume Serial Number :       0xfa6a794c6a790723
    Version :                         3.1
    Number Sectors :                  0x000000000fdfe7ff
    Total Clusters :                  0x0000000001fbfcff
    Free Clusters  :                  0x0000000001fac86f
    Total Reserved :                  0x0000000000000000
    Bytes Per Sector  :               512
    Bytes Per Physical Sector :       4096

    Bytes Per Cluster :               4096
    Bytes Per FileRecord Segment    : 1024
    Clusters Per FileRecord Segment : 0
    Mft Valid Data Length :           0x0000000000040000
    Mft Start Lcn  :                  0x00000000000c0000
    Mft2 Start Lcn :                  0x0000000000000002
    Mft Zone Start :                  0x00000000000c0040
    Mft Zone End   :                  0x00000000000cc840
    RM Identifier:        5F18A2FC-909E-11E2-8599-00155D006BA2

    On MBX-3

    C:\>fsutil fsinfo ntfsinfo z:

    NTFS Volume Serial Number :       0x0ad44aafd44a9d37
    Version :                         3.1
    Number Sectors :                  0x000000000fdfe7ff
    Total Clusters :                  0x0000000001fbfcff
    Free Clusters  :                  0x0000000001fabfd6
    Total Reserved :                  0x0000000000000000
    Bytes Per Sector  :               512
    Bytes Per Physical Sector :       4096

    Bytes Per Cluster :               4096
    Bytes Per FileRecord Segment    : 1024
    Clusters Per FileRecord Segment : 0
    Mft Valid Data Length :           0x0000000000040000
    Mft Start Lcn  :                  0x00000000000c0000
    Mft2 Start Lcn :                  0x0000000000000002
    Mft Zone Start :                  0x00000000000c0040
    Mft Zone End   :                  0x00000000000cc840
    RM Identifier:        B9B00E32-90B2-11E2-94E9-00155D006BA3
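
As referenced in step 2, here is a hedged sketch that combines the registry change (steps 2 and 3) with the sector-size check from step 7 across the example servers. The server names, the Z: volume, and the availability of PowerShell remoting are assumptions; treat this as an illustration rather than the only supported method.

# Sketch: disable block mode, restart replication, and report sector sizes on each member.
$servers = 'MBX-1','MBX-2','MBX-3'
foreach ($s in $servers) {
    Invoke-Command -ComputerName $s -ScriptBlock {
        # Step 2: disable block mode replication
        $key = 'HKLM:\Software\Microsoft\ExchangeServer\V14\Replay\Parameters'
        New-ItemProperty -Path $key -Name DisableGranularReplication -Value 1 -PropertyType DWord -Force | Out-Null
        # Step 3: restart the replication service
        Restart-Service MSExchangeRepl
        # Step 7: 'Bytes Per Sector' and 'Bytes Per Physical Sector' should agree across members
        fsutil fsinfo ntfsinfo z: | Select-String 'Bytes Per (Physical )?Sector'
    }
}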

At this point, the DAG should be stable, and replication should be occurring as expected between databases using file mode. In order to restore block mode replication and fully recognize the new disk sector sizes, the log stream must be reset.

IMPORTANT: Please note the following about resetting the log stream:

  • The log stream must be fully reset on all database copies.
  • All lagged database copies must be replayed to current log.
  • If backups are utilized as a recovery method, this will introduce a gap in the log file sequence, preventing a full roll-forward recovery from the last backup point.

You can use the following steps to reset the log stream:

  1. Validate the existence of a replay queue:

    [PS] C:\>Get-MailboxDatabaseCopyStatus *

    Name              Status    CopyQueue  ReplayQueue  LastInspectedLogTime   ContentIndex
                                Length     Length                              State
    ----              ------    ---------  -----------  --------------------   ------------
    SectorTest\MBX-1  Mounted   0          0                                   Healthy
    SectorTest\MBX-2  Healthy   0          0            3/19/2013 1:34:37 PM   Healthy
    SectorTest\MBX-3  Healthy   0          138          3/19/2013 1:34:37 PM   Healthy

  2. Set the replay and truncation lag time values to 0 on all database copies. This will ensure that logs replay to current while allowing the databases to remain online. In this example, MBX-3 is a lagged copy. When the configuration change is detected, log replay will occur, allowing the lagged copy to eventually catch up. Note that depending on the replay lag time, this could take several hours before you can proceed to the next steps.

    [PS] C:\>Set-MailboxDatabaseCopy SectorTest\MBX-3 -ReplayLagTime 0.0:0:0 -TruncationLagTime 0.0:0:0

    Validate that the replay queue has caught up and is near zero.

    [PS] C:\>Get-MailboxDatabaseCopyStatus *

    Name              Status    CopyQueue  ReplayQueue  LastInspectedLogTime   ContentIndex
                                Length     Length                              State
    ----              ------    ---------  -----------  --------------------   ------------
    SectorTest\MBX-1  Mounted   0          0                                   Healthy
    SectorTest\MBX-2  Healthy   0          0            3/19/2013 1:34:37 PM   Healthy
    SectorTest\MBX-3  Healthy   0          0            3/19/2013 1:34:37 PM   Healthy

  3. Dismount the database.

    CAUTION: Dismounting the database will cause a client interruption, which will continue until the database is mounted.

    [PS] C:\>Dismount-Database SectorTest

    Confirm
    Are you sure you want to perform this action?
    Dismounting database "SectorTest". This may result in reduced availability for mailboxes in the database.
    [Y] Yes  [A] Yes to All  [N] No  [L] No to All  [?] Help (default is "Y"): y
    [PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\*
    Name              Status      CopyQueue  ReplayQueue  LastInspectedLogTime   ContentIndex
                                  Length     Length                              State
    ----              ------      ---------  -----------  --------------------   ------------
    SectorTest\MBX-1  Dismounted  0          0                                   Healthy
    SectorTest\MBX-2  Healthy     0          0            3/25/2013 5:41:54 AM   Healthy
    SectorTest\MBX-3  Healthy     0          0            3/25/2013 5:41:54 AM   Healthy

  4. On each DAG member hosting a database copy, open a command prompt and navigate to the log file directory. Execute eseutil /r ENN to perform a soft recovery. This step is necessary to ensure that all log files are played into all copies.

    Z:\SectorTest>eseutil /r e01

    Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
    Version 14.02
    Copyright (C) Microsoft Corporation. All Rights Reserved.
    Initiating RECOVERY mode...
        Logfile base name: e01
                Log files: <current directory>
             System files: <current directory>
    Performing soft recovery...
                          Restore Status (% complete) 
              0    10   20   30   40   50   60   70   80   90  100
              |----|----|----|----|----|----|----|----|----|----|
              ...................................................
    Operation completed successfully in 0.203 seconds.

  5. On each DAG member hosting a database copy, open a command prompt and navigate to the database directory. Execute eseutil /mh <EDB> against the database to dump the header. You must validate that the following information is correct on all database copies:

    • All copies of the database show in clean shutdown.
    • All copies of the database show the same last detach information.
    • All copies of the database show the same last consistent information.

    Here is example output of a full /mh dump followed by a comparison of the data across our three sample copies.

    Z:\SectorTest>eseutil /mh SectorTest.edb

    Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
    Version 14.02
    Copyright (C) Microsoft Corporation. All Rights Reserved.
    Initiating FILE DUMP mode...
             Database: SectorTest.edb
    DATABASE HEADER:
    Checksum Information:
    Expected Checksum: 0x010f4400
      Actual Checksum: 0x010f4400
    Fields:
            File Type: Database
             Checksum: 0x10f4400
       Format ulMagic: 0x89abcdef
       Engine ulMagic: 0x89abcdef
    Format ulVersion: 0x620,17
    Engine ulVersion: 0x620,17
    Created ulVersion: 0x620,17
         DB Signature: Create time:03/19/2013 09:40:15 Rand:11009066 Computer:
             cbDbPage: 32768
               dbtime: 601018 (0x92bba)
    State: Clean Shutdown
         Log Required: 0-0 (0x0-0x0)
        Log Committed: 0-0 (0x0-0x0)
       Log Recovering: 0 (0x0)
      GenMax Creation: 00/00/1900 00:00:00
             Shadowed: Yes
           Last Objid: 3350
         Scrub Dbtime: 0 (0x0)
           Scrub Date: 00/00/1900 00:00:00
         Repair Count: 0
          Repair Date: 00/00/1900 00:00:00
    Old Repair Count: 0
    Last Consistent: (0x138,3FB,1A4)  03/19/2013 13:44:11
          Last Attach: (0x111,9,86)  03/19/2013 13:42:29
    Last Detach: (0x138,3FB,1A4)  03/19/2013 13:44:11
                 Dbid: 1
        Log Signature: Create time:03/19/2013 09:40:14 Rand:11019164 Computer:
           OS Version: (6.1.7601 SP 1 NLS ffffffff.ffffffff)

    Previous Full Backup:
            Log Gen: 0-0 (0x0-0x0)
               Mark: (0x0,0,0)
               Mark: 00/00/1900 00:00:00

    Previous Incremental Backup:
            Log Gen: 0-0 (0x0-0x0)
               Mark: (0x0,0,0)
               Mark: 00/00/1900 00:00:00

    Previous Copy Backup:
            Log Gen: 0-0 (0x0-0x0)
               Mark: (0x0,0,0)
               Mark: 00/00/1900 00:00:00

    Previous Differential Backup:
            Log Gen: 0-0 (0x0-0x0)
               Mark: (0x0,0,0)
               Mark: 00/00/1900 00:00:00

    Current Full Backup:
            Log Gen: 0-0 (0x0-0x0)
               Mark: (0x0,0,0)
               Mark: 00/00/1900 00:00:00

    Current Shadow copy backup:
            Log Gen: 0-0 (0x0-0x0)
               Mark: (0x0,0,0)
               Mark: 00/00/1900 00:00:00 

         cpgUpgrade55Format: 0
        cpgUpgradeFreePages: 0
    cpgUpgradeSpaceMapPages: 0 

           ECC Fix Success Count: none
       Old ECC Fix Success Count: none
             ECC Fix Error Count: none
         Old ECC Fix Error Count: none
        Bad Checksum Error Count: none
    Old bad Checksum Error Count: none 

      Last checksum finish Date: 03/19/2013 13:11:36
    Current checksum start Date: 00/00/1900 00:00:00
          Current checksum page: 0

    Operation completed successfully in 0.47 seconds.

    MBX-1:

    State: Clean Shutdown
    Last Consistent: (0x138,3FB,1A4)  03/19/2013 13:44:11
    Last Detach: (0x138,3FB,1A4)  03/19/2013 13:44:11

    MBX-2:

    State: Clean Shutdown
    Last Consistent: (0x138,3FB,1A4)  03/19/2013 13:44:12
    Last Detach: (0x138,3FB,1A4)  03/19/2013 13:44:12

    MBX-3:

    State: Clean Shutdown
    Last Consistent: (0x138,3FB,1A4)  03/19/2013 13:44:13
    Last Detach: (0x138,3FB,1A4)  03/19/2013 13:44:13

    In this case, the values match across all copies so further steps can be performed.

    If the values do not match across copies for any reason, do not continue and please contact Microsoft support.

  6. Reset the log file generation for the database.

    Note: Use Get-MailboxDatabaseCopyStatus to record database locations and status prior to performing this activity.

    Locate the log file directory for each ACTIVE (DISMOUNTED) database. Remove all log files from this directory first. Failure to remove log files from the ACTIVE (DISMOUNTED) database may result in the Replication service recopying log files, a failure of this procedure, and a subsequent need to reseed all database copies.

    IMPORTANT: If log files are located in the same location as the database and catalog data folder, take precautions to not remove the database or the catalog data folder.

    In our example MBX-1 hosts the ACTIVE (DISMOUNTED) copy.

    [PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\*

    Name              Status      CopyQueue  ReplayQueue  LastInspectedLogTime   ContentIndex
                                  Length     Length                              State
    ----              ------      ---------  -----------  --------------------   ------------
    SectorTest\MBX-1  Dismounted  0          0                                   Healthy
    SectorTest\MBX-2  Healthy     0          0            3/25/2013 5:41:54 AM   Healthy
    SectorTest\MBX-3  Healthy     0          0            3/25/2013 5:41:54 AM   Healthy

    Locate the log file directory for each PASSIVE database. Remove all log files from this directory. Failure to remove all log files could result in this procedure failing and the need to reseed this or all database copies. If log files are located in the same location as the database and catalog data folder, take precautions to not remove the database or the catalog data folder.

    In our example MBX-2 and MBX-3 host the passive database copies.

    [PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\*

    Name              Status      CopyQueue  ReplayQueue  LastInspectedLogTime   ContentIndex
                                  Length     Length                              State
    ----              ------      ---------  -----------  --------------------   ------------
    SectorTest\MBX-1  Dismounted  0          0                                   Healthy
    SectorTest\MBX-2  Healthy     0          0            3/25/2013 5:41:54 AM   Healthy
    SectorTest\MBX-3  Healthy     0          0            3/25/2013 5:41:54 AM   Healthy

  7. Mount the database using Mount-Database <DBNAME>, and verify it has mounted.

    [PS] C:\>Mount-Database SectorTest
    [PS] C:\>Get-MailboxDatabaseCopyStatus *

    Name              Status    CopyQueue  ReplayQueue  LastInspectedLogTime   ContentIndex
                                Length     Length                              State
    ----              ------    ---------  -----------  --------------------   ------------
    SectorTest\MBX-1  Mounted   0          0                                   Healthy
    SectorTest\MBX-2  Healthy   0          1            3/25/2013 5:57:28 AM   Healthy
    SectorTest\MBX-3  Healthy   0          1            3/25/2013 5:57:28 AM   Healthy

  8. Suspend and resume all passive database copies.

    Note: The error on suspending the active database copy is expected.

    [PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\* | Suspend-MailboxDatabaseCopy

    The suspend operation can't proceed because database 'SectorTest' on Exchange Mailbox server 'MBX-1' is the active mailbox database copy.
        + CategoryInfo          : InvalidOperation: (SectorTest\MBX-1:DatabaseCopyIdParameter) [Suspend-MailboxDatabaseCopy], InvalidOperationException
        + FullyQualifiedErrorId : 5083D28B,Microsoft.Exchange.Management.SystemConfigurationTasks.SuspendDatabaseCopy
        + PSComputerName        : mbx-1.exchange.msft

    Note: The error on resuming the active database copy is expected.

    [PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\* | Resume-MailboxDatabaseCopy

    WARNING: The Resume operation won't have an effect on database replication because database 'SectorTest' hosted on server 'MBX-1' is the active mailbox database.

  9. Validate replication health.

    [PS] C:\>Get-MailboxDatabaseCopyStatus *

    Name              Status    CopyQueue  ReplayQueue  LastInspectedLogTime   ContentIndex
                                Length     Length                              State
    ----              ------    ---------  -----------  --------------------   ------------
    SectorTest\MBX-1  Mounted   0          0                                   Healthy
    SectorTest\MBX-2  Healthy   0          0            3/19/2013 1:56:12 PM   Healthy
    SectorTest\MBX-3  Healthy   0          0            3/19/2013 1:56:12 PM   Healthy

  10. Using Set-MailboxDatabaseCopy, reconfigure any replay lag or truncation lag time on the database copy. This example implements a 7 day replay lag time.

    Set-MailboxDatabaseCopy -Identity SectorTest\MBX-3 -ReplayLagTime 7.0:0:0

  11. Repeat the previous steps for all databases in the DAG including those databases that have a single copy.

    IMPORTANT: DO NOT proceed to the next step until all databases have been reset.

  12. Enable block mode replication. Using Registry Editor, navigate to HKLM\Software\Microsoft\ExchangeServer\V14\Replay\Parameters, and then remove the DisableGranularReplication DWORD value.

  13. Restart the replication service on each DAG member.

    Restart-Service MSExchangeREPL

  14. Validate database health using Get-MailboxDatabaseCopyStatus.

    [PS] C:\>Get-MailboxDatabaseCopyStatus *

    Name              Status    CopyQueue  ReplayQueue  LastInspectedLogTime   ContentIndex
                                Length     Length                              State
    ----              ------    ---------  -----------  --------------------   ------------
    SectorTest\MBX-1  Healthy   0          0            3/19/2013 2:25:56 PM   Healthy
    SectorTest\MBX-2  Mounted   0          0                                   Healthy
    SectorTest\MBX-3  Healthy   0          230          3/19/2013 2:25:56 PM   Healthy

  15. Dump the header of a log file and verify that the new sector size is reflected in the log file stream. To do this, open a command prompt and navigate to the log file directory for the database on the active node. Run eseutil /ml against any log within the directory, and verify that the sector size reflects 4096 and shows “(matches)”.

    Z:\SectorTest>eseutil /ml E0100000001.log

    Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
    Version 14.02
    Copyright (C) Microsoft Corporation. All Rights Reserved.

    Initiating FILE DUMP mode... 
          Base name: E01
          Log file: E0100000001.log
          lGeneration: 1 (0x1)
          Checkpoint: (0x17B,FFFF,FFFF)
          creation time: 03/19/2013 13:56:11
          prev gen time: 00/00/1900 00:00:00
          Format LGVersion: (7.3704.16.2)
          Engine LGVersion: (7.3704.16.2)
          Signature: Create time:03/19/2013 13:56:11 Rand:2996669 Computer:
          Env SystemPath: z:\SectorTest\
          Env LogFilePath: z:\SectorTest\
         Env Log Sec size: 4096 (matches)
          Env (CircLog,Session,Opentbl,VerPage,Cursors,LogBufs,LogFile,Buffers)
              (    off,   1227,  61350,  16384,  61350,   2048,    256,  44204)
          Using Reserved Log File: false
          Circular Logging Flag (current file): off
          Circular Logging Flag (past files): off
          Checkpoint at log creation time: (0x1,1,0) 
          Last Lgpos: (0x1,2,0)
    Number of database page references:  0
    Integrity check passed for log file: E0100000001.log
    Operation completed successfully in 0.250 seconds.

If the above steps have been completed successfully, and the log file sequence recognizes a 4096-byte sector size, then this issue has been resolved.
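
To spot-check the end state on every copy without logging on to each server interactively, a loop like the following can help. The server names, the Z:\SectorTest path, and the use of PowerShell remoting are assumptions from this example; eseutil is assumed to be on the PATH, as it is on Exchange servers:

$servers = 'MBX-1','MBX-2','MBX-3'
foreach ($s in $servers) {
    Invoke-Command -ComputerName $s -ScriptBlock {
        # Dump the newest E01 log header and pull out the sector size line;
        # each copy should report 'Env Log Sec size: 4096 (matches)'
        $log = Get-ChildItem Z:\SectorTest\E01*.log | Sort-Object Name | Select-Object -Last 1
        eseutil /ml $log.FullName | Select-String 'Log Sec size'
    }
}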

This guidance was validated in the following configurations:

  • Windows 2008 R2 Enterprise with Exchange 2010 Service Pack 2
  • Windows 2008 R2 Enterprise with Exchange 2010 Service Pack 3
  • Windows 2008 SP2 Enterprise with Exchange 2010 Service Pack 3
  • Windows 2012 Datacenter with Exchange 2010 Service Pack 3

Tim McMichael

Troubleshoot your Exchange 2010 database backup functionality with VSSTester script


Frequently in support, we encounter backup-related calls for Exchange 2010 databases. Here is a sample of common issues we hear from our customers:

  • “My backup software is not able to take a successful snapshot of the databases”
  • “My backups have been failing for quite a while. I have several thousand log files consuming disk space and I will eventually run out of disk space”
  • “My backup software indicates that the backup is successful but at the end of my backup, logs do not truncate”
  • “The Exchange Writer /VSS writer is not in a stable state (state is listed as ‘Retryable‘, ’Waiting for completion‘ or ’Failed’)”
  • “We suspect that the Volume Shadow Copy Service (VSS) is failing on the server and hence there are no successful backups”

It is critical to understand how backups and log truncation work in Exchange 2010. If you haven't already done so, check out our three-part blog series by Jesse Tedoff on backups and log truncation in Exchange 2010, Everything You Need to Know About Exchange Backups.

When troubleshooting backups in Exchange 2010 we are interested in two writers: the Exchange Information Store Writer (utilized for active copy backups) and the Exchange Replica Writer (utilized for passive copy backups). The writers are responsible for providing the metadata information for databases to the VSS Requestor (a.k.a. the backup software). The VSS Provider is the component that creates and maintains shadow copies. At the end of a successful backup, when the Volume Shadow Copy Service signals backup complete, the writers initiate post-backup steps, which include updating the database header and performing log truncation. (For more details, see Exchange VSS Writers on MSDN.)

As explained above, it is the responsibility of the VSS Requestor to get metadata information from the Exchange writers; at the end of a successful backup, the VSS service signals backup complete to the Exchange writers so the writers can perform post-backup operations.
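
As a quick health check, the current state of both writers on a server can be listed from an elevated command prompt with the in-box vssadmin tool. The output below is a trimmed illustration of healthy writers; the GUID lines are omitted:

C:\>vssadmin list writers

Writer name: 'Microsoft Exchange Writer'
   ...
   State: [1] Stable
   Last error: No error

Writer name: 'Microsoft Exchange Replica Writer'
   ...
   State: [1] Stable
   Last error: No error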

The purpose of this blog is to discuss the VSSTester script, its functionality and how it can help diagnose backup problems.

What does the script do?

The script has two major functions:

  1. Performs a Diskshadow backup of a selected Exchange database to exercise the VSS framework on the system, so that at the end of a successful snapshot the database header is updated and log files are truncated. We will discuss in detail what Diskshadow is and what it does.
  2. Collects diagnostic data. For backup cases, there is a lot of data that needs to be collected, and to get it you may otherwise have to manually go to different places on the Exchange server and turn on logging. If that is not done correctly, we miss crucial logs from the time of the issue. The script makes the data collection process much easier.

 

Script requirements

  1. The current version of the script works only on Exchange 2010 servers.
  2. The script needs to be run on the Exchange server that is experiencing backup issues. If you are having issues with passive copy backups, go to the appropriate node in the DAG and run the script there. For example: Database A has copies on Server1, Server2 and Server3, with Server1 hosting the active copy. If backups of the active copy have previously failed, run the script on Server1; otherwise run the script on whichever of the remaining servers has failed previously when backing up the passive copy.
  3. Ensure that you have enough space on the drive where you save the configuration and output files. Exchange traces, VSS traces and diagnostic logs can occupy up to several GB of drive space, depending on how long the backup takes. For example: running the script in a lab environment consumed close to 25 MB of drive space a minute.
  4. The script is unsigned, so on the server where you run it you will have to set an execution policy that allows unsigned PowerShell scripts (see the sketch following this list).
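
For example, a minimal way to allow the unsigned script to run (adjust to your organization's policy; RemoteSigned is an assumption, not a requirement):

    Get-ExecutionPolicy
    Set-ExecutionPolicy RemoteSigned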

The script can be run on any DAG configuration. You can use this to troubleshoot Mailbox and Public folder database backup issues. Databases and log files can be on regular drives or mount points. Mix and match of the two will also work!

Let us discuss in detail the two main functionalities of the script.

Diskshadow functionality and how the script uses it

What is Diskshadow and why do we utilize it in VSSTester script?

Diskshadow.exe is a command line tool built into the Windows Server 2008 operating system family as well as Windows Server 2012. Diskshadow is an in-box VSS requestor, and it is utilized to test the functionality provided by the Volume Shadow Copy Service (VSS). For more details on Diskshadow please visit:

http://technet.microsoft.com/en-us/library/ee221016(v=ws.10).aspx

http://blogs.technet.com/b/josebda/archive/2007/11/30/diskshadow-the-new-in-box-vss-requester-in-windows-server-2008.aspx

The best part about Diskshadow is that it includes a script mode for automating tasks. This feature of Diskshadow is utilized in the VSSTester. The shadow copy done by Diskshadow is a snapshot of the entire volume at a given point in time. This copy is read-only.

For more details on how a shadow copy is created, please visit the following link: http://technet.microsoft.com/en-us/library/ee923636(v=ws.10).aspx

During the course of this blog post, I will be mentioning the term “Diskshadow backup”. It is very important to understand that the term “backup” is relative here. Diskshadow uses the VSS service to engage the appropriate writer for the snapshot. The writer provides the metadata information for the database and log files to Diskshadow, after which Diskshadow utilizes the VSS Provider to create a shadow copy.

After a successful shadow copy/snapshot of databases and log files, the VSS Provider signals an end-backup to the Exchange writers. To Exchange this looks like a full backup has been performed on the database. The key thing to understand here is that NO data is actually transferred to a device, tape, etc. This is only a test! You will see events in the application logs that usually show up when you take a regular backup, but NO data is actually backed up. Diskshadow has simply run all the backup APIs through the backup process without transferring any data.

The VSS Provider will take a snapshot of all the databases and logs (if present) on the volume. We will be doing a mirrored snapshot of the entire volume at the point in time when Diskshadow was run. Anything that is on the volume will be part of the snapshot. During the Diskshadow backup, we will be utilizing either the Information store writer (for active copy backup) or the Replica Writer (Passive copy backup) to provide the metadata information for the database.

When you use the VSSTester script, it prompts you to select a database to perform the Diskshadow backup against. When we take a snapshot of the volume, all other databases (if present on the same drive) will be part of the snapshot, but post-backup operations will happen only on the selected database. This is because we will be utilizing either the Information Store Writer (active copy backup) or the Replica Writer (passive copy backup) that is associated with the selected database. Database headers get updated based on VSS Requestor interaction with the Exchange writer that was utilized, which in turn leads to log truncation. Hence the header of only the selected database will be updated, and logs will be purged (only for the selected database) without being backed up.
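
To make the mechanics more concrete, here is a simplified sketch of the kind of Diskshadow script file the VSSTester script builds on the fly, driven from PowerShell. The volume, alias, exposed drive letter, and file paths are illustrative assumptions; the real generated file is tailored to the selected database and also excludes the writers for all other databases:

# Build a minimal Diskshadow script and run it in script mode (diskshadow /s).
$dsFile = 'C:\vsstesterlogs\diskshadow.dsh'
@"
set verbose on
set context persistent
begin backup
add volume z: alias vssTestVol
create
expose %vssTestVol% k:
end backup
"@ | Out-File -Encoding ASCII $dsFile
diskshadow /s $dsFile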

When would you be interested in utilizing this Diskshadow functionality of the script?

You might utilize this functionality in almost all of the scenarios discussed at the start of this blog post. In addition to those, another scenario that is not related to backups sometimes arises:

  • “I had an unexpected high transactional log growth issue in my Exchange 2010 environment and now I am on the verge of losing all disk space in the logs directory. I do not have the time to perform a backup to truncate logs and my goal is to safely remove all the log files”

In the scenario mentioned above (and, by the way, if you have that problem, please go here), Exchange administrators would like to avoid causing a service outage by dismounting the database, removing log files and remounting the database. Another downside to manually removing the log files is breaking replication if the database has replicas across Database Availability Group members.

If you are willing to forgo a backup of the log files you can use the Diskshadow functionality of the script to trigger the backup APIs and tell Exchange to truncate the log files. The truncation commands will replicate to the other database copies and purge log files there as well. If successful, the net result is that the database will not go offline for lack of disk space on the log drive, but you will not have the security of retaining those log files for a future restore.

A sample run of the VSSTester script (with Diskshadow functionality)

Let me demonstrate the Diskshadow functionality of the script.

The Script can be downloaded from TechNet gallery here.

The script initializes and gives us the following options.

[Screenshot: VSSTester script menu options]

We select option 1 to test backup using the built-in Diskshadow function.

[Screenshot: specifying the path for configuration and output files]

If the path does not exist, the script will create the folder for you.

We gather the server name and verify it is an Exchange 2010 server. The script then checks the VSS writer status on the local machine. If we detect that any of the writers are not in a “Stable” state, the script will exit. You will need to restart the service associated with the writer to get the writers to a stable state (the Replication service for the Replica Writer or the Information Store service for the Exchange Writer).

The script then gets a list of databases present on the local server and displays each database's name, whether it is mounted, and which server holds the active copy. You select a database by entering its number.

Note: If the user does not provide an input, the script will automatically select the last database in the list.

In my case, I selected database mdb5. The number to enter would be 8.

[Screenshot: database selection]

The next important check is ensuring that the database's replicas (if present) are healthy. If we detect that one of the copies is not healthy, the script will exit, noting that the database copies need to be in a healthy state before running the script.

[Screenshot: database copy health check]

The script next detects the location of the database file and log files. We create the Diskshadow configuration file on the fly every time a database is selected. This configuration file is saved to the location you specified earlier for configuration and output files (c:\vsstesterlogs in this example). In this case the log files are in a mount point and the database file is on a regular volume, so the script will add the appropriate volumes to the Diskshadow file.

[Screenshot: Diskshadow configuration file creation]

The script will then prompt you to provide the drive letters to expose the snapshots. A common question that arises is, do I need to initialize the drive before I specify a drive letter? The answer is no!

You will be specifying a drive letter that is currently not in use, so Diskshadow will create a virtual drive and expose the snapshot there. Remember, the virtual drive that exposes the shadow copy is a read-only volume. If the database and logs are in the same mount point or drive, only one drive letter is required to expose the snapshot; otherwise you will need to provide two drive letters, one for the database snapshot and another for the log files.

[Screenshot: specifying drive letters to expose the snapshots]

When you select the option to perform the Diskshadow backup, the script will automatically collect diagnostic logs, ExTRA traces and VSS traces, and verbose logging is turned on for Diskshadow. Whatever activity the script performs is also logged to a transcript log and saved in the output files directory (c:\vsstesterlogs in this example).

[Screenshot: diagnostic logging and tracing enabled]

Note: If you are performing a passive copy backup, ExTRA tracing will also be turned on on the active node. At the end of the script, we turn off ExTRA tracing on the active node and the trace will be automatically moved to the passive node. The active node ETL will be placed in the logs folder you specified at the start of the script.

Now, the main Diskshadow function will execute.

In the screenshots below we have excluded all other writers on the system that are associated with the other databases on the node (whether mounted or replicas), and we are ONLY utilizing the writer associated with the selected database. This node hosts the passive copy of the database MDB5; hence, the writer utilized will be the one associated with the Replication service, a.k.a. the Microsoft Exchange Replica Writer.

[Screenshots: Diskshadow execution output]

From the screenshot below, you can see that the VSS Provider has taken a successful snapshot of the database and signaled end backup to the Replica Writer.

[Screenshot: successful snapshot and end-backup signal]

Now that we have performed a successful snapshot of the database and log files, all the logging that was turned on will be turned off. The log files will be consolidated in the logs folder that you specified at the start of the script. The script checks the VSS writer status after the backup is complete.

[Screenshot: VSS writer status check after backup]

When the snapshot operation is complete, you will be prompted for an option to either remove the snapshot or leave the snapshots exposed in Windows Explorer.

[Screenshot: option to remove or expose the snapshots]

I selected the option to remove the snapshot; hence we will be invoking Diskshadow again to delete the snapshot created earlier.

Let us discuss in detail exposing and removing snapshot functionality:

  1. Remove snapshots - If the snapshot operation was successful, the snapshots that were taken earlier (database and log files) are exposed in Windows Explorer as the drive letter(s) you specified earlier. If you do not want to keep a copy of the log files, you may choose this option and the snapshot will be deleted. All the logs that were purged after post-backup are present only in this read-only volume, and when the volume is removed they are deleted forever.
  2. Expose snapshots - You may choose to leave the snapshots exposed. Later, if you want to delete the snapshot, do the following (see the example after this list):
    • Open Command prompt
    • Type in Diskshadow
    • Delete shadows exposed <volume>
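
For example, removing a snapshot that was exposed as drive K: might look like this (the drive letter is an assumption):

    C:\>diskshadow
    DISKSHADOW> delete shadows exposed k:
    DISKSHADOW> exit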

Note: It is highly recommended to take a full backup of the database using your regular backup software after utilizing Diskshadow.

After this, the script collects the application and system logs, filtering them to cover only the period from when you started the script to the present. The transcript log is also stopped. The logs are saved as text files in the output folder you specified earlier (c:\vsstesterlogs in this example).

[Screenshot: application and system log collection]

The most reliable method to verify that log truncation takes place is to get the log sequence before and after the backup. Hence, before running the script I ran eseutil /ml ENN (where ENN is the log generation prefix associated with the database).

[Screenshot: log sequence before the backup]

Post-backup, I ran the same command and can see:

[Screenshot: log sequence after the backup]

We can clearly see a difference in the start of the sequence, meaning log truncation has occurred for the database. One more verification that can be done is to check the database header. We can see that the database header got updated to the most recent time, where Diskshadow was run.

[Screenshot: updated database header]

I ran the script; what have I accomplished?

If the script finished successfully:

  • We were able to successfully test and exercise the underlying VSS framework on the server. The Volume Shadow Copy Service was able to successfully identify and utilize the Exchange writers on the box.
  • The Exchange writers were able to provide the metadata information to the VSS Requestor (Diskshadow).
  • The VSS Provider was able to successfully create a snapshot/shadow copy.
  • VSS successfully signaled backup complete to the Exchange writers.
  • The Exchange writers were able to perform the post-snapshot operations, which included log truncation.

Let us now look into the other major functionality of the script.

Enable logging to troubleshoot backup issues

Use this if you do not want to test backup using Diskshadow and you just want to collect diagnostic logs for troubleshooting backup issues.

You may collect the diagnostic logs and have them handy before calling Microsoft Support, saving a lot of time in the support incident because you can provide the files at the beginning of the case.

This time we will be selecting option 2 to enable logging.

[Screenshot: selecting option 2 to enable logging]

Selecting this option does the majority of the things that the script did earlier, EXCEPT Diskshadow of course!

After checking the writer status, you can select the database to back up. We will be enabling all the same logging as before (diagnostic logging, ExTRA tracing, VSS tracing). Remember that even though you select one database, diagnostic logging, ExTRA tracing and VSS tracing are not database specific and are turned on at the server level. When you are utilizing the script to troubleshoot backup issues you can select any one database on the server and it will turn on the appropriate logging for the server.

After the logging is turned on and traces enabled, you will see:

[Screenshot: logging enabled, waiting for the backup to run]

Now you will need to start your regular backup. After the backup completes (or fails), return to the PowerShell window where the script is running and press ENTER to terminate the data collection. The script then disables the diagnostic logging and tracing that was turned on earlier. If needed, it will copy diagnostic logs from the active node for that database copy as well.

The script will again check the writer status after the backup, then collect the application and system logs. It will stop the transcript log as well.

At this point, in order to troubleshoot the issue, you can open a case with Microsoft Support and upload the logs.

I hope this script helps you in better understanding the core concepts in Exchange 2010 backups, thus helping you troubleshoot backup issues! You can utilize Diskshadow to test Volume Shadow Copy Service and also check if the Exchange writers are performing as intended. If Diskshadow completes successfully without any error and you are still experiencing issues with backup software, you may need to contact the backup vendor to further troubleshoot the issue.

Your feedback and comments are most welcome.

Special thanks to Michael Barta for his contribution to the script, Theo Browning and Jesse Tedoff for reviewing the content.

Muralidharan Natarajan

Use Exchange Web Services and PowerShell to Discover and Remove Direct Booking Settings


Update 7/15/2013: we have made a few updates to the below blog post to adjust the instructions for the release of the version 2 of the script.

Prior to Exchange 2007, there were two primary methods of implementing automated resource scheduling: Direct Booking and the AutoAccept Agent (a store event sink released as a web download for Exchange 2003). In Exchange 2007, we changed how automated resource scheduling is implemented. The AutoAccept Agent is no longer supported, and the Direct Booking method, technically an Outlook function, has been replaced with a server-side calendar booking function called the Resource Booking Attendant.

Note: There are various terms associated with this new Resource Booking function, such as: Calendar Processing, Automatic Resource Booking, Calendar Attendant Processing, Automated Processing and Resource Booking Assistant. We will be using the “Resource Booking Attendant” nomenclature for this article.

While the Direct Booking method for resource scheduling can indeed work on Exchange Server 2007/2010/2013, we strongly recommend that you disable Direct Booking for resource mailboxes and use the Resource Booking Attendant instead. Specifically, we are referring to the “AutoAccept” Automated Processing feature of the Resource Booking Attendant, which can be enabled for a mailbox after it has been migrated to Exchange 2007 or later and upgraded to a Resource Mailbox.

Note: The published resource mailbox upgrade guidance on TechNet specifies to disable Direct Booking in the resource mailbox while still on Exchange 2003, move the mailbox, and then enable the AutoAccept functionality via the Resource Booking Attendant. This order of steps can introduce an unnecessary amount of time where the resource mailbox may be without automated scheduling capabilities.

We are currently working to update that guidance to reflect moving the mailbox first, and only then proceed with disabling the Direct Booking functionality, after which the AutoAccept functionality via the Resource Booking Attendant can be immediately enabled. This will shorten the duration where the mailbox is without automated resource scheduling capabilities.

This conversion to resource mailboxes utilizing the Resource Booking Attendant is sometimes overlooked, or even deliberately skipped, when migrating away from Exchange 2003 because of Direct Booking's ability to continue to work with newer versions of Exchange, even Exchange Online. This often results in resource mailboxes (or even user mailboxes!) with Direct Booking functionality remaining in place long after Exchange 2003 is ancient history in the environment.

Why not just leave Direct Booking enabled?

There are issues that can arise from leaving Direct Booking enabled, from simple administrative burden scenarios all the way to major calendaring issues. Additionally, Resource Booking Attendant offers advantages over Direct Booking functionality:

  1. Direct Booking capability, technically an Outlook function, has been removed from the product as of Outlook 2013. It was already on the deprecation list in Outlook 2010 and required a registry modification to reintroduce the functionality.
  2. Direct Booking and Resource Booking Attendant are conflicting technologies, and if simultaneously enabled, unexpected behavior in calendar processing and item consistency can occur.
  3. Outlook Web App (as well as any non-MAPI clients, like Exchange ActiveSync (EAS) devices) cannot use Direct Booking for automated resource scheduling. This is especially relevant for Outlook Web App-only environments where the users do not have Microsoft Outlook as a mail client.
  4. The Resource Booking Attendant AutoAccept functionality is a server-side solution, eliminating the need for client-side logic in order to automatically process meeting requests.

How do I check which mailboxes have Direct Booking Enabled?

How does one validate if Direct Booking settings are enabled on mailboxes in the organization, especially if mailboxes had previously been hosted on Exchange 2003?

Figure 1: Checking Direct Booking settings in Microsoft Outlook 2010

Unfortunately, the manual steps involve assigning permissions to all mailboxes, creating MAPI profiles for each mailbox, logging into each mailbox, checking Tools > Options > Calendar > Resource Scheduling, noting which of the three Direct Booking checkboxes are checked, clicking OK/Cancel a few times, and logging out of the mailbox. Whew! That can be a major undertaking even for a small to midsize company that has more than a handful of mailboxes! Having staff perform this type of activity manually can be a costly and tedious endeavor. Once you have discovered which mailboxes have the Direct Booking settings enabled, you would then have to repeat this entire process to disable these settings unless you removed them at the time of discovery.

Having an automated method to discover, track, and even disable Direct Booking settings would be nice right?

Look no further, we have the solution for you!

Using Exchange Web Services (EWS) and PowerShell, we can automate the discovery of Direct Booking settings that are enabled, track the results, and even disable them! We wrote Remove-DirectBooking.ps1, a sample script, to do exactly that and even more to aid in automating this manual effort.

After you've downloaded it, rename the file and remove the .txt extension.

IMPORTANT:  The previously uploaded script had the last line truncated to Stop-Tran (instead of Stop-Transcript). We've uploaded an updated version to TechNet Gallery. If you downloaded the previous version of the script, please download the updated version. Alternatively, you can open the previously downloaded version in Notepad or other text editor and correct the last line to Stop-Transcript.

Let’s break down the major tasks the PowerShell script does:

  1. Uses EWS Application Impersonation to tap into a mailbox (or set of mailboxes) and read the three MAPI properties where the Direct Booking settings are stored. It does this by accessing the localfreebusy item sitting in the NON_IPM_SUBTREE\FreeBusy Data folder, which resides in the root of the Information Store in the mailbox. The three MAPI properties and their equivalent Outlook settings the script looks at are:

    • 0x686d Automatically accept meeting requests and remove canceled meetings
    • 0x686f Automatically decline meeting requests that conflict with an existing appointment or meeting
    • 0x686e Automatically decline recurring meeting requests

    These three properties contain Boolean values mirroring the Resource Scheduling checkboxes found in Outlook (see Figure 1 above). A minimal EWS sketch for reading them appears after this list.

  2. For each mailbox processed, the script attempts to locate the corresponding free/busy message stored in the ‘Schedule+ Free/Busy’ system Public Folder representing the user.  This item must be updated just like the user’s local mailbox item – the two items must be consistent in their settings. We need to do this because Outlook only considers the settings in the Public Folder free/busy item when a user attempts to Direct Book a resource.  Therefore, it is critical that the script checks for the Public Folder item’s existence and its settings are in sync with the localfreebusy item stored in the mailbox itself.
  3. For mailboxes where Direct Booking settings were detected, it checks for conflicts by determining if the mailbox also has Resource Booking Attendant enabled with AutomateProcessing set to AutoAccept.
  4. Optionally, disables any enabled Direct Booking settings encountered.

    Note: It is important to understand that by default the script runs in a read-only mode. Additional command line switches are available to run the script to disable Direct Booking settings.

  5. Writes a detailed runtime processing log to console and log file.
  6. Creates a simple output text file containing a list of mailboxes that can be later leveraged as an input file to feed the script for disabling the Direct Booking functionality.
  7. Creates a CSV file containing statistics of the list of mailboxes processed with detailed information, such as what was discovered, any errors encountered, and optionally what was disabled. This is useful for performing analysis in the discovery phase and can also be used as another source to create an input file to feed into the script for disabling the Direct Booking functionality.
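
As referenced in task 1 above, here is a minimal sketch (not the full Remove-DirectBooking.ps1 logic) of reading the three Direct Booking MAPI properties with the EWS Managed API from PowerShell. The DLL path, the mailbox address, and matching the local free/busy item by its 'LocalFreebusy' subject are assumptions for illustration:

Add-Type -Path 'C:\Program Files\Microsoft\Exchange\Web Services\1.2\Microsoft.Exchange.WebServices.dll'

$mailbox = 'room1@contoso.com'   # hypothetical resource mailbox
$service = New-Object Microsoft.Exchange.WebServices.Data.ExchangeService(
    [Microsoft.Exchange.WebServices.Data.ExchangeVersion]::Exchange2010_SP1)
$service.UseDefaultCredentials = $true
$service.AutodiscoverUrl($mailbox)
$service.ImpersonatedUserId = New-Object Microsoft.Exchange.WebServices.Data.ImpersonatedUserId(
    [Microsoft.Exchange.WebServices.Data.ConnectingIdType]::SmtpAddress, $mailbox)

# The three Direct Booking property tags, all Boolean MAPI properties
$propAutoAccept       = New-Object Microsoft.Exchange.WebServices.Data.ExtendedPropertyDefinition(0x686D,
    [Microsoft.Exchange.WebServices.Data.MapiPropertyType]::Boolean)
$propDeclineConflict  = New-Object Microsoft.Exchange.WebServices.Data.ExtendedPropertyDefinition(0x686F,
    [Microsoft.Exchange.WebServices.Data.MapiPropertyType]::Boolean)
$propDeclineRecurring = New-Object Microsoft.Exchange.WebServices.Data.ExtendedPropertyDefinition(0x686E,
    [Microsoft.Exchange.WebServices.Data.MapiPropertyType]::Boolean)

# Find the 'Freebusy Data' folder under the non-IPM root of the mailbox
$folderFilter = New-Object Microsoft.Exchange.WebServices.Data.SearchFilter+IsEqualTo(
    [Microsoft.Exchange.WebServices.Data.FolderSchema]::DisplayName, 'Freebusy Data')
$fbFolder = $service.FindFolders([Microsoft.Exchange.WebServices.Data.WellKnownFolderName]::Root,
    $folderFilter, (New-Object Microsoft.Exchange.WebServices.Data.FolderView(1))).Folders[0]

# Find the localfreebusy item and load the three extended properties
$itemView = New-Object Microsoft.Exchange.WebServices.Data.ItemView(1)
$itemView.PropertySet = New-Object Microsoft.Exchange.WebServices.Data.PropertySet(
    [Microsoft.Exchange.WebServices.Data.BasePropertySet]::IdOnly)
$itemView.PropertySet.Add($propAutoAccept)
$itemView.PropertySet.Add($propDeclineConflict)
$itemView.PropertySet.Add($propDeclineRecurring)
$itemFilter = New-Object Microsoft.Exchange.WebServices.Data.SearchFilter+IsEqualTo(
    [Microsoft.Exchange.WebServices.Data.ItemSchema]::Subject, 'LocalFreebusy')
$item = $fbFolder.FindItems($itemFilter, $itemView).Items[0]

$value = $null
if ($item.TryGetProperty($propAutoAccept, [ref]$value))       { "AutoAccept (0x686D): $value" }
if ($item.TryGetProperty($propDeclineConflict, [ref]$value))  { "DeclineConflicts (0x686F): $value" }
if ($item.TryGetProperty($propDeclineRecurring, [ref]$value)) { "DeclineRecurring (0x686E): $value" }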

Example Scenarios

Here are a couple of example scenarios that illustrate how to use the script to discover and remove enabled Direct Booking settings.

Scenario 1

You've recently migrated from Exchange 2003 to Exchange 2010 and would like to disable Direct Booking for your company's conference room mailboxes as well as any user mailboxes that may have Direct Booking settings enabled. The administrator's logged-in account has Application Impersonation rights and the View-Only Recipients RBAC role assigned.

  1. On a machine that has the Exchange management tools & the Exchange Web Services API 1.2 or greater installed, open the Exchange Management Shell, navigate to the folder containing the script, and run the script using the following syntax:

    .\Remove-DirectBooking.ps1 –identity * -UseDefaultCredentials

  2. The script will process all mailboxes in the organization with detailed logging sent to the shell on the console. Note that depending on the number of mailboxes in the org, this may take some time to complete.
  3. When the script completes, open the Remove-DirectBooking_<timestamp>.txt file in Notepad, which will contain list of mailboxes that have Direct Booking enabled:

    Figure 2: Output file containing list of mailboxes with Direct Booking enabled

  4. After reviewing the list, rerun the script with the InputFile parameter and the RemoveDirectBooking switch:

    .\Remove-DirectBooking.ps1 –InputFile ‘.\Remove-DirectBooking_<timestamp>.txt’ –UseDefaultCredentials -RemoveDirectBooking

  5. The script will process all the mailboxes listed in the input file with detailed logging sent to the shell on the console. Because you specified the RemoveDirectBooking switch, it does not run in read-only mode and disables all currently enabled Direct Booking settings encountered.
  6. When the script completes, you can check the status of the removal operation by checking the Remove-DirectBooking_<timestamp>.csv file. A column called Direct Booking Removed? will record whether the removal was successful. You can also check the runtime processing log file RemoveDirectBooking_<timestamp>.log as well.

    Figure 3: Reviewing runtime log file in Excel

Note: The Direct Booking Removed? column now shows Yes where applicable, but the three Direct Booking settings columns still show their various values as “Yes”; this is because we record those three values pre-removal. If you were to run the script again in read-only mode against the same input file, those columns would reflect a value of N/A since there would no longer be any Direct Booking settings enabled. The Resource Room?, AutoAccept Enabled?, and Conflict Detected? columns all have a value of N/A regardless, because they are not relevant when disabling the Direct Booking settings.

Scenario 2

You're an administrator who's new to an organization. You know that they migrated from Exchange 2003 to Exchange 2007 in the distant past and are currently in the process of implementing Exchange 2010, having already migrated some users to Exchange 2010. You have no idea which resource mailboxes or even user mailboxes may be using Direct Booking, and would like to discover who has which Direct Booking settings enabled. You would then like to selectively choose which mailboxes to pilot for Direct Booking removal before taking action on the majority of the found mailboxes.

Here's how you would accomplish this using the Remove-DirectBooking.ps1 script:

  1. Obtain a service account that has Application Impersonation rights for all mailboxes in the org.
  2. Ensure the service account has at least the Exchange View-Only Administrator role (2007) and at least an RBAC role assignment of View-Only Recipients (2010/2013).
  3. On a machine that has the Exchange management tools & the Exchange Web Services API 1.2 or greater installed, preferably an Exchange 2010 server, open the Exchange Management Shell, navigate to the folder containing the script, and run the script using the following syntax:

    .\Remove-DirectBooking.ps1 -Identity *

  4. The script will prompt you for the domain credentials of the account you wish to use because no credentials were specified. Enter the service account’s credentials.
  5. The script will process all mailboxes in the organization, with detailed logging sent to the shell on the console. Note: depending on the number of mailboxes in the org, this may take some time to complete.
  6. When the script completes, open the Remove-DirectBooking_<timestamp>.csv in Excel, which will look something like:

    Figure 4: Reviewing the Remove-DirectBooking_<timestamp>.csv in Excel

  7. Filter or sort the table by the Direct Booking Enabled? column. This will provide a list that can be scrutinized to determine which mailboxes should be piloted for Direct Booking removal, such as those that conflict by also having the Resource Booking Attendant's Automated Processing set to AutoAccept (which you can also filter on using the AutoAccept Enabled? column).
  8. Once the list has been reviewed and the targeted mailboxes isolated, simply copy their email addresses into a text file (one address per line), save the text file, and use it as the input source for running the script to disable the Direct Booking settings:

    .\Remove-DirectBooking.ps1 -InputFile '.\<your pilot list>.txt' -RemoveDirectBooking

  9. As before, the script will prompt you for the domain credentials of the account you wish to use. Enter the service account’s credentials.
  10. The script will process all the mailboxes listed in the input file with detailed logging sent to the shell on the console. It will disable all enabled Direct Booking settings encountered.
  11. Use the same validation steps at the end of the previous example to verify the removal was successful.

Script Options and Caveats

Please see the script's help section (via "Get-Help .\Remove-DirectBooking.ps1 -Full") for full information on all the available parameters. Here are some additional options that may be useful in certain scenarios; a combined usage example follows the list:

  1. EWSURL parameter. By default, the script will attempt to retrieve the EWS URL for each mailbox via AutoDiscover. This is preferred, especially in complex multi-datacenter or hybrid Exchange Online/on-premises environments, where different EWS URLs may be in play for any given mailbox depending on where it resides in the org. However, there may be times when you want to supply an EWS URL manually, such as when AutoDiscover is having "issues", or when the response time for AutoDiscover requests is introducing delays in overall script execution (think a very large number of mailbox identities to churn through) and the EWS URL is the same across the org. In these situations, you can use the EWSURL parameter to feed the script a static EWS URL.
  2. UseDefaultCredentials switch. If the current user is the service account, or simply has both the Impersonation rights and the necessary Exchange admin rights per the script's requirements, and you don't wish to be prompted to type in a credential (scheduling the script to run as a job is another great example), you can use the UseDefaultCredentials switch to run the script under that security context.
  3. RemoveDirectBooking switch. By default, the script runs in read-only mode. In order to make changes and disable Direct Booking settings on the mailbox, you must specify the RemoveDirectBooking switch.
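
To illustrate how these options combine, here is a hedged example; the EWS URL and input file name are placeholders for your own values, not something the script requires:

    # Read-only discovery pass, skipping AutoDiscover with a static EWS URL
    # (the URL is illustrative) and using the logged-on account's credentials
    .\Remove-DirectBooking.ps1 -Identity * -EWSURL 'https://mail.contoso.com/EWS/Exchange.asmx' -UseDefaultCredentials

    # Removal pass against a reviewed input file (file name is a placeholder)
    .\Remove-DirectBooking.ps1 -InputFile '.\<your pilot list>.txt' -UseDefaultCredentials -RemoveDirectBooking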

The script does have several prerequisites and caveats to ensure proper operation and meaningful results:

  1. Application Impersonation rights and minimum Exchange Admin rights must be used
  2. Exchange Web Services Managed API 1.2 or later must be installed on the machine running the script
  3. Exchange management tools must be installed on the machine running the script
  4. Script must be executed from within the Exchange Management Shell
  5. The Shell session must have the appropriate execution policy to allow the script to be executed (by default, you can't execute unsigned scripts; see the example after this list)
  6. AutoDiscover must be configured correctly (unless the EWS URL is entered manually)
  7. Exchange 2003-based mailboxes cannot be targeted due to lack of EWS capabilities
  8. In an Exchange 2010/2013 environment that also has Exchange 2007 mailboxes present, the script should be executed from a machine running Exchange 2010/2013 management tools due to changes in the cmdlets in those versions
  9. Due to limitations in the EWS architecture, the Schedule+ Free/Busy System Folder subfolders must contain a replica on each version of Exchange that users are being processed for; additionally, the user's Exchange mailbox database 'Default public folder database' property must be set to a Public Folder database that is on the same version of Exchange as the mailbox database.
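
Regarding item 5, here is a minimal sketch of relaxing the execution policy for the current session only; RemoteSigned and the Process scope are example choices, not requirements:

    # Allow locally created scripts for this PowerShell session only;
    # downloaded scripts still need to be signed under RemoteSigned
    Set-ExecutionPolicy RemoteSigned -Scope Process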

Summary

The discovery and removal of Direct Booking settings can be a tedious and costly process to perform manually, but you can automate it using functions and features available via PowerShell and EWS in Microsoft Exchange Server 2007, 2010, & 2013. With careful use, the Remove-DirectBooking.ps1 script can be a valuable tool to aid Exchange administrators in maintaining automated resource scheduling capabilities in their Microsoft Exchange environments.

Your feedback and comments are welcome.

Thank you to Brian Day and Nino Bilic for their guidance in content review, and to our customers (you know who you are) for piloting the script.

Seth Brandes & Dan Smith

Using Exchange Web Services to Apply a Personal Tag to a Custom Folder


In Exchange 2010, we introduced Retention Tags, a Messaging Records Management (MRM) feature that allows you to manage email lifecycle. You can use retention policies to retain mailbox data for as long as it’s required to meet business or regulatory requirements, and delete items older than the specified period.

One of the design goals for MRM 2.0 was to simplify administration compared to Managed Folders, the MRM feature introduced in Exchange 2007, and allow users more flexibility. By applying a Personal Tag to a folder, users can have different retention settings apply to items in that folder than the default tag applied to the entire mailbox (known as a Default Policy Tag). Similarly, users can apply a different tag to a subfolder than the one applied to the parent folder. Users can also apply a Personal Tag to individual items, allowing them the freedom to organize messages based on their work habits and preferences, rather than forcing them to move messages, based on the retention requirement, to an admin-controlled Managed Folder.

You can still use Managed Folders in Exchange 2010, but they’re not available in Exchange 2013.

For a comparison of Retention Tags with Managed Folders and migration details, see Migrate Managed Folders.

If you like the Managed Folders approach of being able to create a folder in the user's mailbox and configure a retention setting for that folder, you can use Exchange Web Services (EWS) to accomplish something similar, with some caveats mentioned later in this post. You can write your own code or even a PowerShell script to create a folder in the user's mailbox and apply a Personal Tag to it. There are scripts available on the interwebs, including some code samples on MSDN, that accomplish this.

Note: Such scripts are examples for your reference. They're not written or tested by the Exchange product group.
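
To give a flavor of what such a script typically does, here is a minimal, untested sketch using the EWS Managed API from PowerShell. The DLL path, folder name, and the $mailbox/$tagGuid/$retentionDays variables are assumptions for illustration; the extended property IDs are the MAPI properties (PidTagPolicyTag 0x3019, PidTagRetentionFlags 0x301D, PidTagRetentionPeriod 0x301A) that carry retention stamping on a folder:

    # A minimal sketch, not production code. Assumes the EWS Managed API 2.0 is
    # installed at the default path, and that $mailbox (SMTP address), $tagGuid
    # (GUID of an existing Personal Tag), and $retentionDays are supplied by you.
    Add-Type -Path 'C:\Program Files\Microsoft\Exchange\Web Services\2.0\Microsoft.Exchange.WebServices.dll'

    $service = New-Object Microsoft.Exchange.WebServices.Data.ExchangeService([Microsoft.Exchange.WebServices.Data.ExchangeVersion]::Exchange2010_SP2)
    $service.UseDefaultCredentials = $true
    $service.AutodiscoverUrl($mailbox, {$true})

    # MAPI extended properties that stamp a retention tag on a folder
    $policyTag       = New-Object Microsoft.Exchange.WebServices.Data.ExtendedPropertyDefinition(0x3019, [Microsoft.Exchange.WebServices.Data.MapiPropertyType]::Binary)
    $retentionFlags  = New-Object Microsoft.Exchange.WebServices.Data.ExtendedPropertyDefinition(0x301D, [Microsoft.Exchange.WebServices.Data.MapiPropertyType]::Integer)
    $retentionPeriod = New-Object Microsoft.Exchange.WebServices.Data.ExtendedPropertyDefinition(0x301A, [Microsoft.Exchange.WebServices.Data.MapiPropertyType]::Integer)

    # Create the custom folder under the Inbox and stamp the Personal Tag on it
    $folder = New-Object Microsoft.Exchange.WebServices.Data.Folder($service)
    $folder.DisplayName = 'Custom Retention Folder'   # illustrative name
    $folder.SetExtendedProperty($policyTag, ([Guid]$tagGuid).ToByteArray())
    $folder.SetExtendedProperty($retentionFlags, 137) # flags value commonly used in published samples for an explicit personal tag
    $folder.SetExtendedProperty($retentionPeriod, $retentionDays)
    $folder.Save([Microsoft.Exchange.WebServices.Data.WellKnownFolderName]::Inbox)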

But is it supported?

We frequently get questions about whether this is supported by Microsoft. Short answer: Yes. Exchange Web Services (EWS) is a supported and documented API, which allows ISVs and customers to create custom solutions for Exchange.

When using EWS in your code or PowerShell script to apply a Personal Tag to a folder, it’s important to consider the following:

For Developers

  • EWS is meant for developers who can write custom code or scripts to extend Exchange’s functionality. As a developer, you must have a good understanding of the functionality available via the API and what you can do with it using your code/script.
  • Support for EWS API is offered through our Exchange Developer Support channels.

For IT Pros

  • If you’re an IT Pro writing your own code or scripts, you’re a developer too! The above applies to you.
  • If you’re an IT Pro using 3rd-party code or scripts, including the code samples & scripts available on MSDN, TechNet or elsewhere on the interwebs, we recommend that you follow the general best practices for using such code or scripts, including (but not limited to) the following:
    • Do not use code/scripts from untrusted sources in a production environment.
    • Understand what the script or code does. (This is easy for scripts – you can look at the source in a text editor.)
    • Test the script or code thoroughly in a non-production environment, including all command-line options/parameters available in it, before installing or executing it in your production environment.
    • Although it’s easy to change the PowerShell execution policy on your servers to allow unsigned scripts to execute, it’s recommended to allow only signed scripts in production environments. You can easily sign a script if it's unsigned before running it in a production environment (see the sketch after this list).
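
For example, a minimal sketch of signing a script, assuming a code-signing certificate is already in your personal store (the script name is a placeholder):

    # Pick the first code-signing certificate from the current user's store
    $cert = Get-ChildItem Cert:\CurrentUser\My -CodeSigningCert | Select-Object -First 1
    # Sign the third-party script before allowing it in production
    Set-AuthenticodeSignature -FilePath '.\<the script>.ps1' -Certificate $cert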

So should I do it?

If using EWS to apply a Personal Tag to custom folders helps you meet your business requirements, absolutely! However, do note and consider the following:

  • You’re replicating some of the functionality available via Managed Folders, but it doesn’t turn the folder into a Managed Folder.
  • Remember - it’s a Personal Tag! Users can remove the tag from the folder using Outlook or Outlook Web App.
  • If you have additional Personal Tags available in your environment, users can change the tag on the custom folder.
  • Users can tag individual items with a different Personal Tag. There is no way to enforce inheritance of the retention tag if Personal Tags have been provisioned and are available to the user.
  • Users can rename or delete custom folders. Unlike Managed Folders, which are protected from changes or deletion by users, custom folders created by users or by admin are just like any other (non-default) folder in the mailbox.

Provisioning custom folders with different retention settings (by applying Personal Tags) may help you meet your organization’s retention requirements. As an IT Pro, make sure you understand the above and follow the best practices.

Bharat Suneja

Ambiguous URLs and their effect on Exchange 2010 to Exchange 2013 Migrations


With the recent releases of Exchange Server 2013 RTM CU1, Exchange 2013 sizing guidance, the Exchange 2013 Server Role Requirements Calculator, and the updated Exchange 2013 Deployment Assistant, on-premises customers now have the tools needed to begin designing and performing migrations to Exchange Server 2013. Many of you have introduced Exchange 2013 RTM CU1 into your test environments alongside Exchange 2010 SP3 and/or Exchange 2007 SP3 RU10, and are readying yourselves for the production migrations.

There's one particular Exchange 2010 design choice some customers made that could throw a monkey wrench into your upgrade plans to Exchange 2013, and we want to walk you through how to mitigate it so you can move forward. If you're still in the design or deployment phase of Exchange Server 2010, we recommend you continue reading this article so you can make some intelligent design choices which will benefit you when you migrate to Exchange 2013 or later.

What is the situation we need to look for?

In Exchange 2010, all Outlook clients in the most typical configurations will utilize MAPI/RPC or Outlook Anywhere (RPC over HTTPS) connections to a Client Access Server. The MAPI/RPC clients connect to the CAS Array Object FQDN (also known as the RPC endpoint) for Mailbox access and the HTTPS based clients connect to the Outlook Anywhere hostname (also known as the RPC proxy endpoint) for all Mailbox and Public Folder access. In addition to these primary connections, other HTTPS based workloads such as EAS, ECP, OAB, and EWS may be sharing the same FQDN as Outlook Anywhere. In some environments you may also be sharing the same FQDN with POP/IMAP based clients and using it as an SMTP endpoint for internal mail submissions.

In Exchange 2010, the recommendation was to utilize split DNS and ensure that the CAS Array Object FQDN was only resolvable via DNS by internal clients. External clients should never be able to resolve the CAS Array Object FQDN. This was covered previously in item #4 of Demystifying the CAS Array Object - Part 2. If you put those two design rules together you come to the conclusion your ClientAccessArray FQDN used by the mailbox database RpcClientAccessServer property should have been an internal-only unique FQDN not utilized by any workload besides MAPI/RPC clients.

Take the following chart as an example of what a suggested split DNS configuration would have looked like.

FQDN                 Used By              Internal DNS resolves to   External DNS resolves to
mail.contoso.com     All HTTPS Workloads  Internal Load Balancer IP  Perimeter Network Device
outlook.contoso.com  MAPI/RPC Workloads   Internal Load Balancer IP  N/A

If you do not utilize split DNS, then a suggested configuration may have looked like this.

FQDN                  Used By                      DNS resolves to
mail.contoso.com      External HTTPS Workloads     Perimeter Network Device
mail-int.contoso.com  Internal HTTPS Workloads     Internal Load Balancer IP
outlook.contoso.com   Internal MAPI/RPC Workloads  Internal Load Balancer IP

In speaking with our Premier Field Engineers and MCS consultants, we learned that some of our customers did not choose to use a unique ClientAccessArray FQDN. This design choice may manifest itself in one of two ways. The MAPI/RPC and HTTPS workloads may both utilize the mail.contoso.com FQDN internally and externally, or a unique external FQDN of mail.contoso.com is used while internal MAPI/RPC and HTTPS workloads share mail-int.contoso.com. The shared FQDN in either situation is ambiguous because we can't look at it and immediately understand the workload type that's using it. Perhaps we were not clear enough in our original guidance, or customers felt fewer names would help reduce overall design complexity, since everything appeared to work with this configuration.
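
If you're not sure whether this applies to you, a quick check from the Exchange 2010 Management Shell compares the RPC endpoint your databases hand out against your Outlook Anywhere namespace; if the names match, your URLs are ambiguous:

# The RPC endpoint Outlook MAPI/RPC clients are told to connect to
Get-MailboxDatabase | Format-Table Name, RpcClientAccessServer -AutoSize

# The HTTPS (Outlook Anywhere) namespace in use
Get-OutlookAnywhere | Format-Table Identity, ExternalHostname -AutoSize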

Take a look at the figure below and the FQDNs in use for some of the different workloads. Shown are EWS, ECP, OWA, CAS Array Object, and Outlook Anywhere External Hostname. The yellow arrow specifically points out the CAS Array Object, the value used as the RpcClientAccessServer for Exchange 2010 mailbox databases, and seen in the Server field of an Outlook profile for an Exchange 2010 mailbox.

Figure: An Exchange 2010 deployment with a single ambiguous URL for all workloads.

Let us pause for a moment to visualize what we have talked about so far. If we were to compare an Exchange 2010 environment using ambiguous URLs to one not using ambiguous URLs, it would look like the following diagrams. Notice the first diagram below uses the same FQDN for Outlook MAPI/RPC based traffic and HTTPS based traffic.

[Diagram: Outlook MAPI/RPC and HTTPS traffic sharing the single mail.contoso.com FQDN]

If we were to then look at an environment not utilizing ambiguous URLs, we see the clients utilize unique FQDNs for MAPI/RPC based traffic and HTTPS based traffic. In addition, the FQDN utilized for MAPI/RPC based traffic is only resolvable via internal DNS.

[Diagram: HTTPS traffic using mail.contoso.com while MAPI/RPC traffic uses the internal-only outlook.contoso.com FQDN]

If your environment does not look like the one above using ambiguous URLs, then you can go hit the coffee shop for a while or play some XBOX 360. Tell your boss we gave the okay. If your environment does look similar to the first example using ambiguous URLs or you are in the planning stages for Exchange 2010, then please read on as we need you to perform some extra steps when migrating to Exchange 2013.

So what’s the big deal? It’s functional this way, isn’t it?

While this may be working for you today, it certainly will not work tomorrow if you migrate to Exchange 2013. In this scenario, where both the MAPI/RPC and HTTPS workloads are using the same FQDN, you cannot successfully move the FQDN to CAS 2013 without breaking your MAPI/RPC client connectivity entirely. I repeat: your MAPI/RPC clients will start failing to connect via MAPI/RPC once their DNS cache expires after the shared FQDN is moved to CAS 2013. The MAPI/RPC clients will fail to connect because CAS 2013 does not know how to handle direct MAPI/RPC connections; all Windows-based Outlook clients utilize MAPI over an RPC over HTTPS connection in Exchange 2013. There is a chance your Outlook clients may successfully fall back to HTTPS, but only if Outlook Anywhere is currently enabled for Exchange 2010 when the failure to connect via MAPI/RPC takes place. Either way, this article will help with the following:

  1. Ensure you are in full control of what will take place
  2. Ensure you are in full control of when #1 takes place
  3. Ensure you are in a supported server + client configuration
  4. Ensure environments with Outlook Anywhere disabled for Exchange 2010 know their path forward
  5. Help remove the possibility of any clients not automatically falling back to HTTPS
  6. Remove the potentially long delay when Outlook does fail to connect via MAPI/RPC, even though it can resolve the MAPI/RPC URL, and then falls back to HTTPS

Shoot… this looks like us. What should we do immediately?

First off, if you are still in the planning stages of Exchange 2010, you need to take our warning to heart and immediately change your design to use a specific internal-only FQDN for MAPI/RPC clients. If you are in the middle of a 2010 deployment using an ambiguous URL, we recommend you change your ClientAccessArray FQDN to a unique name and update the mailbox database RpcClientAccessServer values on all Exchange 2010 mailbox databases accordingly. Fixing this item mid-migration to Exchange 2010, or even in your fully migrated environment, will ensure any newly created or manually repaired Outlook profiles are protected, but it will not automatically fix existing Outlook clients with the old value in the server field.
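
As a sketch of that change from the Exchange 2010 Management Shell, assuming outlook.contoso.com is your new unique, internal-only FQDN and the array is currently named mail.contoso.com (both names are illustrative):

# Point the CAS array object at the new internal-only FQDN
Set-ClientAccessArray 'mail.contoso.com' -Fqdn 'outlook.contoso.com'

# Update every mailbox database to hand the new RPC endpoint to clients
Get-MailboxDatabase | Set-MailboxDatabase -RpcClientAccessServer 'outlook.contoso.com'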

While not necessary as long as you go through our mitigation steps below, any existing Outlook profiles could be manually repaired to reflect the new value. If you are curious why a manual repair is necessary you can refer to items #5 and #6 in Demystifying the CAS Array Object - Part 2. Again, forcing this update is not necessary if you follow our mitigation steps later in this article. However, if you were to choose to update some specific Outlook profiles we suggest you perform those steps in your test environment first to make sure you have the process down correctly.

Additionally, as we previously discussed in item #3 of Demystifying the CAS Array Object – Part 1, the ClientAccessArray FQDN is not needed in your SSL certificate, as it is not being used for HTTPS based traffic. Because of this, the only things you would need to do are create a new internal DNS record, update your ClientAccessArray FQDN, and finally update your Exchange 2010 mailbox database RpcClientAccessServer values. It bears repeating that you do not have to get a new SSL certificate only to fix an ambiguous URL situation.

Ok, fixed that… now what about the clients we don’t want to repair manually?

Our suggestion is to implement Outlook Anywhere internally for all users prior to introducing Exchange Server 2013 to the environment.

Many of our customers have already moved to Outlook Anywhere internally for all Windows Outlook clients. In fact, those of you reading this with OA in use internally are good to proceed to the coffee shop or go play XBOX 360 with the other folks if you’d like to.

Now for the rest of you… sit a little closer. Go ahead and fill in, there are plenty of seats in the front row like usual.

In Exchange Server 2013, all Windows Outlook clients operate in Outlook Anywhere mode internally. By following these mitigation steps you will be one step ahead of where you will end up after your migration to Exchange Server 2013 anyway.

If you do not have Outlook Anywhere enabled at all in your environment, please see Enable Outlook Anywhere on TechNet for steps on how to enable it in Exchange 2010. If your company does not wish to provide external access for Outlook Anywhere, that is OK. Simply enabling Outlook Anywhere will not provide remote access unless you also publish the /rpc virtual directory to the Internet.
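
For reference, enabling it on a single CAS might look like the following; the server name is a placeholder, and the authentication method should match your environment:

# Enable Outlook Anywhere on one Exchange 2010 CAS, without SSL offloading
Enable-OutlookAnywhere -Server 'CAS01' -ExternalHostname 'mail.contoso.com' -ClientAuthenticationMethod NTLM -SSLOffloading $false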

It is suggested that customers, especially very large ones, consider enabling Kerberos authentication to avoid potential performance issues with the default NTLM authentication. Information on how to configure Kerberos authentication can be found on TechNet for Exchange Server 2010; the steps for Exchange Server 2013 are similar, and we will have documentation for them in the near future. However, please keep in mind that Kerberos authentication with Outlook Anywhere is only supported with Windows Vista or later.

By default, with Outlook Anywhere enabled in the environment, your clients prefer RPC/TCP connections when on fast networks, as seen below.

[Screenshot: Outlook connection settings showing RPC/TCP preferred on fast networks]

The trick we use to force Outlook Anywhere to also be used internally is via Autodiscover. Using Autodiscover we can make Windows Outlook clients prefer RPC/HTTPS on both Fast and Slow networks as seen here.

[Screenshot: Outlook connection settings showing RPC/HTTPS preferred on both fast and slow networks]

The method used to make clients always prefer HTTPS is configuring the OutlookProviderFlags option via the Set-OutlookProvider cmdlet. The following commands are executed from the Exchange 2010 Management Shell.

Set-OutlookProvider EXPR -OutlookProviderFlags:ServerExclusiveConnect

Set-OutlookProvider EXCH -OutlookProviderFlags:ServerExclusiveConnect

If for any reason you need to put the configuration back to its default settings, issue the following commands and clients will no longer prefer HTTPS on fast networks.

Set-OutlookProvider EXPR -OutlookProviderFlags:None

Set-OutlookProvider EXCH -OutlookProviderFlags:None
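
Either way, you can confirm the flags currently in effect with Get-OutlookProvider:

# Verify the flags applied to the EXCH and EXPR providers
Get-OutlookProvider | Format-Table Name, OutlookProviderFlags -AutoSize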

You can prepare to introduce Exchange Server 2013 to your environment once all of your Windows Outlook clients prefer HTTPS on both fast and slow networks and are connecting through mail.contoso.com for RPC over HTTPS connections.

There are a small number of things we would like to call out as you plan this migration to enable Outlook Anywhere for all internal clients.

First, your front end infrastructure (CAS 2013, load balancer, etc.) must be ready to immediately handle the full production load of Windows Outlook clients when you re-point the mail.contoso.com FQDN in DNS.

Second, if your Exchange 2010 Client Access Servers were not scaled for 100% Outlook Anywhere connections then performance should be monitored when OA is enabled and all clients are moved from MAPI/RPC based to HTTPS based workloads. You should be ready to scale out your CAS 2010 infrastructure if necessary to mitigate any possible performance issues.

Lastly, Windows Outlook clients older than Outlook 2007 are not supported going through CAS 2013, even if their mailbox is on an older Exchange version. All Windows Outlook clients going through CAS 2013 have to be at least the minimum versions supported by Exchange 2013. Any unsupported clients, such as Outlook 2003, do not support Autodiscover and would have to be manually configured with a new MAPI/RPC-specific endpoint to ensure they continue communicating with Exchange 2010 until the client can be updated and the mailbox migrated to Exchange 2013.

Note: The easiest way to confirm what major/minor version of Outlook you have is to look at the version of OUTLOOK.EXE and EMSMDB32.DLL via Windows Explorer or to run an inventory report through Microsoft System Center Configuration Manager or similar software. The minimum version numbers Exchange Server 2013 supports for on-premises deployments are provided below.

  • Outlook 2007: 12.0.6665.5000 (SP3 + the November 2012 Public Update or any later PU)
  • Outlook 2010: 14.0.6126.5000 (SP1 + the November 2012 Public Update or any later PU)
  • Outlook 2013: 15.0.4420.1017 (RTM or later)

To visualize the mitigation steps from start to end, we need to compare the environment between phases.

First, the upper area of the below diagram depicts the start state of the environment with internal Windows Outlook clients utilizing MAPI/RPC and ambiguous URLs for their HTTPS based workloads. The lower area of the diagram depicts the same environment, but we have now forced Outlook Anywhere to be used by internal Windows Outlook clients. This change has forced all mailbox and public folder access traffic over HTTPS through the mail.contoso.com Outlook Anywhere FQDN.

[Diagram: Internal Windows Outlook clients forced from MAPI/RPC to Outlook Anywhere over mail.contoso.com]

We now have all Windows Outlook clients utilizing Outlook Anywhere internally by leveraging Autodiscover to force the preference of HTTPS. Now that all Windows Outlook traffic is routed through mail.contoso.com via HTTPS, the ambiguous URL problem has been mitigated. However, you may have other applications integrating with Exchange that are unable to utilize Outlook Anywhere and/or Autodiscover. These applications will also be affected if you update the mail.contoso.com DNS entry to point at Exchange 2013. Before moving on to the second step, it may be most efficient to add a HOSTS file entry on the servers hosting these external applications to force resolution of mail.contoso.com to the Layer-7 load balancer used by Exchange 2010. This allows you to temporarily continue routing external application traffic that needs to talk to Exchange 2010 via MAPI/RPC while you work on updating the applications to be Outlook Anywhere compatible, which they will need to be before they can ever connect to Exchange 2013.
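
As an illustration, one way to pin that name on an application server (the IP is a placeholder for your Exchange 2010 load balancer VIP; run from an elevated prompt):

# Pin mail.contoso.com to the Exchange 2010 Layer-7 load balancer on this server only
Add-Content -Path "$env:SystemRoot\System32\drivers\etc\hosts" -Value "10.0.0.10`tmail.contoso.com"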

Having dealt with both the Windows Outlook clients and the third-party applications that cannot utilize Outlook Anywhere, we can now move on to the second step. The second step is executed when you are ready to introduce Exchange 2013 to the environment.

The below diagram starts by showing where we finished after executing step one. The lower area of the diagram shows that we have updated DNS to point the mail.contoso.com entry at the IP of the new Exchange 2013 load balancer configuration. Because of the HOSTS entry we made, our application server continues talking to the old Layer-7 load balancer for its MAPI over RPC/TCP connections. Exchange 2013 CAS will now receive all client traffic and proxy traffic for users still on Exchange 2010 back to the Exchange 2010 CAS infrastructure. The redundant CAS was removed from the diagram to simplify the view and simply show traffic flow.

[Diagram: DNS re-pointed so mail.contoso.com resolves to the Exchange 2013 load balancer, while the application server's HOSTS entry still targets Exchange 2010]

In summary, we hope those of you in this unique configuration will be able to smoothly migrate from Exchange 2010 to Exchange 2013 now that you have these mitigation steps. Some of you may identify other potential methods and wonder why we are offering only a single mitigation approach. Many methods were investigated, but this approach came back every time as the most straightforward to implement, maintain, and support. Given the potential complexity of this change, we invite you to ask follow-up questions in the Exchange Server forum below, where we can often interact with you better than the comments format allows.

Exchange Server Forum: Exchange Server 2013 – Setup, Deployment, Updates, and Migration

Brian Day
Senior Program Manager
Exchange Customer Experience
