Skip to main content

How to import a large xml file into SQL Server


(Or how to import the StackOverflow database into SQL Server)

Introduction


NB  This process can be generalised to import any large (>2G) xml file into SQL Server.
Some SQL Server training you can find online including that by Brent Ozar uses the StackOverflow database for practice. The tables from it are available online for download in xml format. In the past it was possible to use the scripts found here, https://www.toadworld.com/platforms/sql-server/w/wiki/9466.how-to-import-the-stackoverflow-xml-into-sql-server, to import them but as each xml file is now over 2GB you will get an error like this when you try to execute them:


Brent Ozar, has a link to SODDI.exe, https://github.com/BrentOzarULTD/soddi, which can import the files (I haven’t tried it) but it means downloading and importing eight tables: Badges, Comments, PostHistory, PostLinks, Posts, Tags, Users, and Votes tables which amounts to >30GB of compressed xml increasing to ~200GB when decompressed. What if you only want to import one table?


By using SQLXML 4 and some custom xsd files, it is possible to import one table at a time. If you just want to import the Users table then a quick guide follows. If you want to know how I created the xsd files for all the tables then check out the next section. But if you want just want the links to the downloads, the PowerShell script, DML and xsd files then skip to the bottom.

Import the Users table

1.       Download and install SQLXML4 SP1 from https://www.microsoft.com/en-gb/download/details.aspx?id=30403
2.     Create a database called StackOverflow and a table called dbo.Users on your SQL Server:

CREATE DATABASE [StackOverflow]
GO

CREATE TABLE [dbo].[Users] (
       [Id] [int] NOT NULL
       ,[Reputation] [int] NULL
       ,[CreationDate] [char](23) NULL
       ,[DisplayName] [nvarchar](40) NULL
       ,[LastAccessDate] [char](23) NULL
       ,[WebsiteUrl] [nvarchar](500) NULL
       ,[Location] [nvarchar](100) NULL
       ,[Age] [int] NULL
       ,[AboutMe] [nvarchar](max) NULL
       ,[Views] [int] NULL
       ,[UpVotes] [int] NULL
       ,[AccountID] [int] NULL
       ,[ProfileImageUrl] [nvarchar](500) NULL
       ,[DownVotes] [int] NULL
       ,CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

1.       Download and unzip (7zip) to a convenient location the users table from here, https://archive.org/download/stackexchange/stackoverflow.com-Users.7z to C:\Temp, The 7zip file at the time of writing is 230MB and expands to well over 2GB. After it has extracted, which will take a while, rename it to users.xml.

In the location where you unzipped the above file (C:\Temp) create a text file called users.xsd and paste this in,
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "users" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name = "row" sql:relation = "users" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field = "ID" />
            <xsd:attribute name="Reputation" type="xsd:integer" sql:field="Reputation" />
            <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field="CreationDate" />
            <xsd:attribute name="DisplayName" type="xsd:string" sql:field="DisplayName" />
            <xsd:attribute name="LastAccessDate" type="xsd:dateTime" sql:field="LastAccessDate" />
            <xsd:attribute name="WebsiteUrl" type="xsd:string" sql:field="WebsiteUrl" />
            <xsd:attribute name="Location" type="xsd:string" sql:field="Location" />
            <xsd:attribute name="AboutMe" type="xsd:string" sql:field="AboutMe" />
            <xsd:attribute name="Views" type="xsd:integer" sql:field="Views" />
            <xsd:attribute name="UpVotes" type="xsd:integer" sql:field="UpVotes" />
            <xsd:attribute name="DownVotes" type="xsd:integer" sql:field="DownVotes" />
            <xsd:attribute name="AccountId" type="xsd:integer" sql:field="AccountId" />
            <xsd:attribute name="ProfileImageUrl" type="xsd:string" sql:field="ProfileImageUrl" />
            <xsd:attribute name="Age" type="xsd:integer" sql:field="Age" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

2.       Now in C:\Temp create a file called import.ps1 and paste these PowerShell commands, editing the ConnectionString value to match your server:

$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Error.log'
$objBL.Execute('C:\Temp\users.xsd','C:\Temp\Users.xml')
$objBL = $null  

Execute the PowerShell script and after 10 minutes or so (depending on the usual things) it should have imported. If not check the error log for help. That's it for the Users table but how did I create the xsd file? Read on for details...

Creating the .xsd file

Probably the trickiest part of this is creating the .xsd file but there are some tricks we can use to help.
This is how we import the badges table. In your new StackOverflow database create the Badges table:
CREATE TABLE [dbo].[Users](
       [Id] [int] NOT NULL,
       [Reputation] [int] NULL,
       [CreationDate] [char](23) NULL,
       [DisplayName] [nvarchar](40) NULL,
       [LastAccessDate] [char](23) NULL,
       [WebsiteUrl] [nvarchar](500) NULL,
       [Location] [nvarchar](100) NULL,
       [Age] [int] NULL,
       [AboutMe] [nvarchar](max) NULL,
       [Views] [int] NULL,
       [UpVotes] [int] NULL,
       [AccountID] [int] NULL,
       [ProfileImageUrl] [nvarchar](500) NULL,
       [DownVotes] [int] NULL,
 CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED
(
       [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

1.       Download and extract the badges archive to C:\Temp, https://archive.org/download/stackexchange/stackoverflow.com-Badges.7z

2.       As this is a very large file and notepad/notepad++ won’t open it, use HJSplit.exe, http://www.hjsplit.org/, to break it up into manageable pieces. This may take a while if we split the entire file but we only really need a very small subset of the data in the file. Say 500KB. Start the split process and as soon as the first file is created take a copy of it then stop the split. If you don’t take a copy then when you stop the split all the files it creates will be deleted.


1.       Now download and install Visual Studio 2017 Community Edition from here, https://www.visualstudio.com/thank-you-downloading-visual-studio/?sku=Community&rel=15,  if you don’t have a copy of Visual Studio available. What we are after is the xsd creating capabilities, I’m sure other applications can do this

2.       Once you have your 500KB file open it in Notepad++ or some such and copy out the first 10 lines or so, but remember to create an ending </badges> on the last line so the xml is properly delimited.
3.       Create an empty Visual Studio solution:


Right-click on Solution Explorer and select New Item…
And choose XML file:

Paste the 11 line xml file you created earlier and paste it into the empty XML file.


Then from the XML menu choose Create Schema. This will magically create a .xsd file from your xml file. Now all we have to do is some extra editing and we’re nearly there.


This is the .xsd file before we edited it:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="badges">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" name="row">
          <xs:complexType>
            <xs:attribute name="Id" type="xs:unsignedInt" use="required" />
            <xs:attribute name="UserId" type="xs:unsignedShort" use="required" />
            <xs:attribute name="Name" type="xs:string" use="required" />
            <xs:attribute name="Date" type="xs:dateTime" use="required" />
            <xs:attribute name="Class" type="xs:unsignedByte" use="required" />
            <xs:attribute name="TagBased" type="xs:string" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

This is it after:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
 <xsd:element name = "badges" sql:is-constant ="1">
   <xsd:complexType>
     <xsd:sequence>   
       <xsd:element name="badges">     
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name = "row" sql:relation = "badges" maxOccurs = "unbounded">
            <xsd:complexType>
              <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
              <xsd:attribute name="UserId" type="xsd:integer" sql:field="UserId" />
              <xsd:attribute name="Name" type="xsd:string" sql:field="Name" />
              <xsd:attribute name="Date" type="xsd:dateTime" sql:field="Date" />
              <xsd:attribute name="Class" type="xsd:integer" sql:field="Class" />
              <xsd:attribute name="TagBased" type="xsd:string" sql:field="TagBased" />         
            </xsd:complexType>
          </xsd:element>
        </xsd:sequence>
      </xsd:complexType>
     </xsd:element>
   </xsd:sequence>
  </xsd:complexType>
 </xsd:element>
</xsd:schema>

So what we did is replace every occurrence of xs with xsd, added mappings to match the column names in the database; and deleted this line:
<?xml version="1.0" encoding="utf-8"?>
Replaced this line
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">

With
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">

Changed
<xsd:element maxOccurs="unbounded" name="row">
To
<xsd:element name = "row" sql:relation = "badge" maxOccurs = "unbounded">
Added the line:
<xsd:element name = "badges" sql:is-constant ="1"> after the first.


Etc….

You can work out the rest by comparing the two different listings.

Now save your xsd file, along with our 2GB xml file plus the edited PowerShell script and run it! Voila!

I worked out the DML for the tables by looking at the xsd/xml elements, attributes and datatypes. 

For your convenience, here are the .xsd files and DML for all the tables:


Badges

https://archive.org/download/stackexchange/stackoverflow.com-Badges.7z


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Badges.log'
$objBL.Execute('C:\Temp\Badges.xsd','C:\Temp\Badges.xml')
$objBL = $null  

CREATE TABLE [dbo].[Badges] (
       [Id] [int] IDENTITY(1, 1) NOT NULL
       ,[UserId] [int] NULL
       ,[Name] [nvarchar](50) NULL
       ,[Date] CHAR(23) NULL
       ,Class INT
       ,TagBased bit
       ,CONSTRAINT [PK_Badges] PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY]


<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "badges" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name = "row" sql:relation = "Badges" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
            <xsd:attribute name="UserId" type="xsd:integer" sql:field="UserId" />
            <xsd:attribute name="Name" type="xsd:string" sql:field="Name" />
            <xsd:attribute name="Date" type="xsd:dateTime" sql:field="Date" />
            <xsd:attribute name="Class" type="xsd:integer" sql:field="Class" />
            <xsd:attribute name="TagBased" type="xsd:boolean" sql:field="TagBased" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Comments


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Comments.log'
$objBL.Execute('C:\Temp\Comments.xsd','C:\Temp\Comments.xml')
$objBL = $null 

CREATE TABLE [dbo].[Comments] (
       [Id] [int] NOT NULL
       ,[PostId] [int] NULL
       ,[Score] [int] NULL
       ,[Text] [varchar](max) NULL
       ,[CreationDate] [char](23) NULL
       ,[UserId] [int] NULL
       ,PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "comments" sql:is-constant ="1">   
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="row" sql:relation = "Comments" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
            <xsd:attribute name="PostId" type="xsd:integer" sql:field="PostId" />
            <xsd:attribute name="Score" type="xsd:integer" sql:field="Score" />
            <xsd:attribute name="Text" type="xsd:string" sql:field="Text" />
            <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field="CreationDate" />
            <xsd:attribute name="UserId" type="xsd:integer" sql:field="UserId" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

PostHistory


 $objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\PostHistory.log'
$objBL.Execute('C:\Temp\PostHistory.xsd','C:\Temp\PostHistory.xml')
$objBL = $null 

CREATE TABLE [dbo].[Posthistory] (
       [Id] [int] NOT NULL
       ,[PostHistoryTypeId] [int] NULL
       ,[PostId] [int] NULL
       ,[RevisionGUID] [char](36) NULL
       ,[CreationDate] [char](23) NULL
       ,[UserId] [int] NULL
       ,[Text] [varchar](max) NULL
       ,[UserDisplayName] [varchar](11) NULL
       ,PRIMARY KEY CLUSTERED ()
       )
       [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name="posthistory" sql:is-constant ="1">
      <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="row" sql:relation = "PostHistory" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
            <xsd:attribute name="PostHistoryTypeId" type="xsd:integer" sql:field="PostHistoryTypeId" />
            <xsd:attribute name="PostId" type="xsd:integer" sql:field="PostId" />
            <xsd:attribute name="RevisionGUID" type="xsd:string" sql:field="RevisionGUID" />
            <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field="CreationDate" />
            <xsd:attribute name="UserId" type="xsd:integer" sql:field="UserId" />
            <xsd:attribute name="Text" type="xsd:string" sql:field="Text" />
            <xsd:attribute name="UserDisplayName" type="xsd:string" sql:field="UserDisplayName" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

PostLinks


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\PostLinks.log'
$objBL.Execute('C:\Temp\PostLinks.xsd','C:\Temp\PostLinks.xml')
$objBL = $null 


CREATE TABLE [dbo].[PostLinks] (
       [Id] [int] NOT NULL
       ,[CreationDate] [char](23) NULL
       ,[PostId] [int] NULL
       ,[RelatedPostId] [int] NULL
       ,[LinkTypeId] [int] NULL
       ,PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY]

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "postlinks" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="row" sql:relation = "PostLinks" maxOccurs = "unbounded">
            <xsd:complexType>
              <xsd:attribute name="Id" type="xsd:integer" sql:field ="Id" />
              <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field = "CreationDate" />
              <xsd:attribute name="PostId" type="xsd:integer" sql:field = "PostId" />
              <xsd:attribute name="RelatedPostId" type="xsd:integer" sql:field ="RelatedPostId" />
              <xsd:attribute name="LinkTypeId" type="xsd:integer" sql:field ="LinkTypeId" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Posts


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Posts.log'
$objBL.Execute('C:\Temp\Posts.xsd','C:\Temp\Posts.xml')
$objBL = $null 

CREATE TABLE [dbo].[Posts] (
       [Id] [int] NOT NULL
       ,[PostTypeId] [int] NULL
       ,[AcceptedAnswerId] [int] NULL
       ,[CreationDate] [char](23) NULL
       ,[Score] [int] NULL
       ,[ViewCount] [int] NULL
       ,[Body] [varchar](max) NULL
       ,[OwnerUserID] [int] NULL
       ,[LastEditorUserId] [int] NULL
       ,[LastEditorDisplayName] [varchar](100) NULL
       ,[LastEditDate] [char](23) NULL
       ,[LastActivityDate] [char](23) NULL
       ,[Title] [varchar](500) NULL
       ,[Tags] [varchar](500) NULL
       ,[AnswerCount] [int] NULL
       ,[CommentCount] [int] NULL
       ,[FavoriteCount] [int] NULL
       ,[CommunityOwnedDate] [char](23) NULL
       ,[ParentID] [int] NULL
       ,[OwnerDisplayName] [varchar](100) NULL
       ,PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY]

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name="posts" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="row" sql:relation = "Posts" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
            <xsd:attribute name="PostTypeId" type="xsd:integer" sql:field="PostTypeId" />
            <xsd:attribute name="AcceptedAnswerId" type="xsd:integer" sql:field="AcceptedAnswerId" />
            <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field="CreationDate" />
            <xsd:attribute name="Score" type="xsd:integer" sql:field="Score" />
            <xsd:attribute name="ViewCount" type="xsd:integer" sql:field="ViewCount" />
            <xsd:attribute name="Body" type="xsd:string" sql:field="Body" />
            <xsd:attribute name="OwnerUserId" type="xsd:integer" sql:field="OwnerUserId" />
            <xsd:attribute name="LastEditorUserId" type="xsd:integer" sql:field="LastEditorUserId" />
            <xsd:attribute name="LastEditorDisplayName" type="xsd:string" sql:field="LastEditorDisplayName" />
            <xsd:attribute name="LastEditDate" type="xsd:dateTime" sql:field="LastEditDate" />
            <xsd:attribute name="LastActivityDate" type="xsd:dateTime" sql:field="LastActivityDate" />
            <xsd:attribute name="Title" type="xsd:string" sql:field="Title" />
            <xsd:attribute name="Tags" type="xsd:string" sql:field="Tags" />
            <xsd:attribute name="AnswerCount" type="xsd:integer" sql:field="AnswerCount" />
            <xsd:attribute name="CommentCount" type="xsd:integer" sql:field="CommentCount" />
            <xsd:attribute name="FavoriteCount" type="xsd:integer" sql:field="FavoriteCount" />
            <xsd:attribute name="CommunityOwnedDate" type="xsd:dateTime" sql:field="CommunityOwnedDate" />
            <xsd:attribute name="ParentId" type="xsd:integer" sql:field="ParentId" />
            <xsd:attribute name="OwnerDisplayName" type="xsd:string" sql:field="OwnerDisplayName" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Tags

https://archive.org/download/stackexchange/stackoverflow.com-Tags.7z



$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Tags.log'
$objBL.Execute('C:\Temp\Tags.xsd','C:\Temp\Tags.xml')
$objBL = $null 

CREATE TABLE [dbo].[Tags] (
       [Id] [int] NOT NULL
       ,[TagName] [nvarchar](100) NULL
       ,[Count] [int] NULL
       ,[ExcerptPostId] [int] NULL
       ,[WikiPostId] [int] NULL
       ,PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY]

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "tags" sql:is-constant ="1">
     <xsd:complexType>
      <xsd:sequence>
        <xsd:element name = "row" sql:relation = "tags" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:unsignedByte" sql:field="Id" />
            <xsd:attribute name="TagName" type="xsd:string" sql:field="TagName" />
            <xsd:attribute name="Count" type="xsd:unsignedInt" sql:field="Count" />
            <xsd:attribute name="ExcerptPostId" type="xsd:unsignedInt" sql:field="ExcerptPostId" />
            <xsd:attribute name="WikiPostId" type="xsd:unsignedInt" sql:field="WikiPostId" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Users


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Users.log'
$objBL.Execute('C:\Temp\Users.xsd','C:\Temp\Users.xml')
$objBL = $null 

CREATE TABLE [dbo].[Users] (
       [Id] [int] NOT NULL
       ,[Reputation] [int] NULL
       ,[CreationDate] [char](23) NULL
       ,[DisplayName] [nvarchar](40) NULL
       ,[LastAccessDate] [char](23) NULL
       ,[WebsiteUrl] [nvarchar](500) NULL
       ,[Location] [nvarchar](100) NULL
       ,[Age] [int] NULL
       ,[AboutMe] [nvarchar](max) NULL
       ,[Views] [int] NULL
       ,[UpVotes] [int] NULL
       ,[AccountID] [int] NULL
       ,[ProfileImageUrl] [nvarchar](500) NULL
       ,[DownVotes] [int] NULL
       ,CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]


<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "users" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name = "row" sql:relation = "users" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field = "ID" />
            <xsd:attribute name="Reputation" type="xsd:integer" sql:field="Reputation" />
            <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field="CreationDate" />
            <xsd:attribute name="DisplayName" type="xsd:string" sql:field="DisplayName" />
            <xsd:attribute name="LastAccessDate" type="xsd:dateTime" sql:field="LastAccessDate" />
            <xsd:attribute name="WebsiteUrl" type="xsd:string" sql:field="WebsiteUrl" />
            <xsd:attribute name="Location" type="xsd:string" sql:field="Location" />
            <xsd:attribute name="AboutMe" type="xsd:string" sql:field="AboutMe" />
            <xsd:attribute name="Views" type="xsd:integer" sql:field="Views" />
            <xsd:attribute name="UpVotes" type="xsd:integer" sql:field="UpVotes" />
            <xsd:attribute name="DownVotes" type="xsd:integer" sql:field="DownVotes" />
            <xsd:attribute name="AccountId" type="xsd:integer" sql:field="AccountId" />
            <xsd:attribute name="ProfileImageUrl" type="xsd:string" sql:field="ProfileImageUrl" />
            <xsd:attribute name="Age" type="xsd:integer" sql:field="Age" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Votes


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'$objBL.ErrorLogFile ='C:\Temp\Votes.log'
$objBL.Execute('C:\Temp\Votes.xsd','C:\Temp\Votes.xml')
objBL = $null 

CREATE TABLE [dbo].[Votes] (
       [Id] [int] NULL
       ,[PostId] [int] NULL
       ,[VoteTypeID] [int] NULL
       ,[CreationDate] [char](23) NULL
       ) ON [PRIMARY]

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name="votes" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="row" sql:relation = "Votes" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
            <xsd:attribute name="PostId" type="xsd:integer" sql:field="PostId" />
            <xsd:attribute name="VoteTypeId" type="xsd:integer" sql:field="VoteTypeId" />
            <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field="CreationDate" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Resources used in putting this guide together

Comments

Popular posts from this blog

How to move the Microsoft Assessment and Planning Toolkit (MAP) database to a different drive

The Microsoft Assessment and Planning Toolkit (MAP) is a very useful tool for scanning your network to find instances of SQL Server plus all manner of detailed information about the installed product, OS and hardware it sits on.


<Click image to enbiggen>
There is an issue with it the database it uses to store the data it collects, however. Assuming you don't have an instance called MAPS on your server, the product will install using LocalDB (a cut down version of SQL Server Express) and puts the databases on your C: drive. If you then scan a large network you could easily expand the database to 10GB which may cause issues on a server when that drive is often one of the smallest. However, there is a simple solution: connect to LocalDB using Management Studio, detach the databases, move to a different drive, set permissions on the new location if required and reattach the database. How do you connect to LocalDB? Here you go:

Connect to (localdb)\MAPTOOLKIT


The databases I move…

The Purge SQL Agent Job for MDW takes a long time to complete

I use the dbWarden alerts to inform me if a SQL job is taking longer to complete than normal and I got one this morning:
I noticed by looking at the history this purge job was gradually taking longer and longer to complete each day since I installed it again (see mylast post on this):
RunDate RunTime Duration ExecutionStatus JobName 04/10/2013 02:00:01 00:00:45 Succeded mdw_purge_data_[MDW] 05/10/2013 02:00:00 00:13:27 Succeded mdw_purge_data_[MDW] 06/10/2013 02:00:00 00:17:03 Succeded mdw_purge_data_[MDW] 07/10/2013 02:00:01

Generate scripts to attach multiple databases

There is a handy little "by product", if you like, when running queries which means you can quickly generate scripts to do different things. Below is an example of generating multiple "attach" commands that you can copy from the results pane into the main SSMS window for execution. I have found this very handy in the past:


SELECT 'CREATE DATABASE ['+name+'] ON ( FILENAME = N''F:\MSSQL\Data\'+name+'.mdf'' ), ( FILENAME = N''E:\MSSQL\Log\'+name+'_log.ldf'' )  FOR ATTACH GO ' FROMmaster.dbo.sysdatabases WHEREnamenotin('master','msdb','model','tempdb')
ORDERBYname