How to import a large xml file into SQL Server


(Or how to import the StackOverflow database into SQL Server)

Introduction


NB: This process can be generalised to import any large (>2GB) xml file into SQL Server.
Some SQL Server training you can find online, including that by Brent Ozar, uses the StackOverflow database for practice. Its tables are available online for download in xml format. In the past it was possible to import them using the scripts found here, https://www.toadworld.com/platforms/sql-server/w/wiki/9466.how-to-import-the-stackoverflow-xml-into-sql-server, but as each xml file is now over 2GB those scripts fail with an error when you try to execute them.


Brent Ozar has a link to SODDI.exe, https://github.com/BrentOzarULTD/soddi, which can import the files (I haven’t tried it), but it means downloading and importing all eight tables: Badges, Comments, PostHistory, PostLinks, Posts, Tags, Users and Votes. That amounts to >30GB of compressed xml, increasing to ~200GB when decompressed. What if you only want to import one table?


By using SQLXML 4 and some custom xsd files, it is possible to import one table at a time. If you just want to import the Users table then a quick guide follows. If you want to know how I created the xsd files for all the tables then check out the next section. But if you just want the links to the downloads, the PowerShell script, DDL and xsd files then skip to the bottom.

Import the Users table

1.       Download and install SQLXML 4.0 SP1 from https://www.microsoft.com/en-gb/download/details.aspx?id=30403
2.       Create a database called StackOverflow and a table called dbo.Users on your SQL Server:

CREATE DATABASE [StackOverflow]
GO

CREATE TABLE [dbo].[Users] (
       [Id] [int] NOT NULL
       ,[Reputation] [int] NULL
       ,[CreationDate] [char](23) NULL
       ,[DisplayName] [nvarchar](40) NULL
       ,[LastAccessDate] [char](23) NULL
       ,[WebsiteUrl] [nvarchar](500) NULL
       ,[Location] [nvarchar](100) NULL
       ,[Age] [int] NULL
       ,[AboutMe] [nvarchar](max) NULL
       ,[Views] [int] NULL
       ,[UpVotes] [int] NULL
       ,[AccountID] [int] NULL
       ,[ProfileImageUrl] [nvarchar](500) NULL
       ,[DownVotes] [int] NULL
       ,CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

3.       Download the Users table from https://archive.org/download/stackexchange/stackoverflow.com-Users.7z and extract it with 7-Zip to a convenient location such as C:\Temp. At the time of writing the 7z file is 230MB and expands to well over 2GB. After it has extracted, which will take a while, rename the file to users.xml.

4.       In the folder where you extracted the file (C:\Temp) create a text file called users.xsd and paste this in:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "users" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name = "row" sql:relation = "users" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field = "ID" />
            <xsd:attribute name="Reputation" type="xsd:integer" sql:field="Reputation" />
            <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field="CreationDate" />
            <xsd:attribute name="DisplayName" type="xsd:string" sql:field="DisplayName" />
            <xsd:attribute name="LastAccessDate" type="xsd:dateTime" sql:field="LastAccessDate" />
            <xsd:attribute name="WebsiteUrl" type="xsd:string" sql:field="WebsiteUrl" />
            <xsd:attribute name="Location" type="xsd:string" sql:field="Location" />
            <xsd:attribute name="AboutMe" type="xsd:string" sql:field="AboutMe" />
            <xsd:attribute name="Views" type="xsd:integer" sql:field="Views" />
            <xsd:attribute name="UpVotes" type="xsd:integer" sql:field="UpVotes" />
            <xsd:attribute name="DownVotes" type="xsd:integer" sql:field="DownVotes" />
            <xsd:attribute name="AccountId" type="xsd:integer" sql:field="AccountId" />
            <xsd:attribute name="ProfileImageUrl" type="xsd:string" sql:field="ProfileImageUrl" />
            <xsd:attribute name="Age" type="xsd:integer" sql:field="Age" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

5.       Now in C:\Temp create a file called import.ps1 and paste in these PowerShell commands, editing the ConnectionString value to match your server:

$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Error.log'
$objBL.Execute('C:\Temp\users.xsd','C:\Temp\Users.xml')
$objBL = $null  

Execute the PowerShell script and after 10 minutes or so (depending on the usual things) it should have imported. If not, check the error log for help. That's it for the Users table, but how did I create the xsd file? Read on for details...
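
One way to confirm that the import is complete is to count the <row> elements in the source file and compare the result with SELECT COUNT(*) FROM dbo.Users. The following is a minimal Python sketch rather than part of the process above; count_rows is a hypothetical helper name. It streams the file, so the 2GB xml never has to fit in memory.

```python
import xml.etree.ElementTree as ET


def count_rows(xml_path, tag="row"):
    """Count <row> elements in a large XML file without loading it all."""
    count = 0
    context = ET.iterparse(xml_path, events=("start", "end"))
    # The first event is the start of the root element (<users>, <badges>, ...).
    _event, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1
            # Drop the rows parsed so far so memory use stays flat.
            root.clear()
    return count
```

Calling count_rows(r"C:\Temp\users.xml") returns the number of rows the bulk load should have inserted, assuming the file is where the steps above put it.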

Creating the .xsd file

Probably the trickiest part of this is creating the .xsd file but there are some tricks we can use to help.
This is how I did it for the Badges table. In your new StackOverflow database create the Badges table:
CREATE TABLE [dbo].[Badges] (
       [Id] [int] IDENTITY(1, 1) NOT NULL
       ,[UserId] [int] NULL
       ,[Name] [nvarchar](50) NULL
       ,[Date] CHAR(23) NULL
       ,Class INT
       ,TagBased bit
       ,CONSTRAINT [PK_Badges] PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY]

1.       Download and extract the badges archive to C:\Temp, https://archive.org/download/stackexchange/stackoverflow.com-Badges.7z

2.       As this is a very large file and Notepad/Notepad++ won’t open it, use HJSplit.exe, http://www.hjsplit.org/, to break it up into manageable pieces. Splitting the entire file may take a while, but we only need a very small sample of the data, say 500KB. Start the split process and, as soon as the first file is created, take a copy of it and then stop the split. If you don’t take a copy, all the files it has created will be deleted when you stop the split.
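
If you would rather not use a file splitter, a few lines of Python can pull out a sample instead. This is a sketch only, and it assumes the dump puts the XML declaration on line 1, the opening root tag on line 2 and one <row> element on each following line; write_sample is a hypothetical helper name. It also appends the closing root tag so the sample is well-formed.

```python
from itertools import islice


def write_sample(src_path, dst_path, root_tag, rows=10):
    """Copy the two header lines plus the first `rows` rows of a large XML file.

    Assumes line 1 is the XML declaration, line 2 is the opening root tag
    and each later line holds one <row .../> element. Intended for files
    with far more than `rows` rows, so the real closing tag is never reached.
    """
    with open(src_path, "r", encoding="utf-8-sig") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        # Only the first rows + 2 lines are read; the rest is never touched.
        for line in islice(src, rows + 2):
            dst.write(line)
        # Close the root element so the sample parses as XML.
        dst.write("</%s>\n" % root_tag)
```

For example, write_sample(r"C:\Temp\Badges.xml", r"C:\Temp\badges_sample.xml", "badges") would produce a file small enough to open in any editor (the sample file name is made up for illustration).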


3.       Now download and install Visual Studio 2017 Community Edition from here, https://www.visualstudio.com/thank-you-downloading-visual-studio/?sku=Community&rel=15, if you don’t have a copy of Visual Studio available. What we are after is its xsd-creating capability; I’m sure other applications can do this too.

4.       Once you have your 500KB file, open it in Notepad++ or similar and copy out the first 10 lines or so, remembering to add a closing </badges> tag on the last line so the xml is well formed.
5.       Create an empty Visual Studio solution:


Right-click in Solution Explorer, select New Item… and choose XML file:

Paste the 11-line xml you created earlier into the empty XML file.


Then from the XML menu choose Create Schema. This will magically create a .xsd file from your xml file. Now all we have to do is some extra editing and we’re nearly there.


This is the .xsd file before we edited it:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="badges">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" name="row">
          <xs:complexType>
            <xs:attribute name="Id" type="xs:unsignedInt" use="required" />
            <xs:attribute name="UserId" type="xs:unsignedShort" use="required" />
            <xs:attribute name="Name" type="xs:string" use="required" />
            <xs:attribute name="Date" type="xs:dateTime" use="required" />
            <xs:attribute name="Class" type="xs:unsignedByte" use="required" />
            <xs:attribute name="TagBased" type="xs:string" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

This is it after:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "badges" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name = "row" sql:relation = "Badges" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
            <xsd:attribute name="UserId" type="xsd:integer" sql:field="UserId" />
            <xsd:attribute name="Name" type="xsd:string" sql:field="Name" />
            <xsd:attribute name="Date" type="xsd:dateTime" sql:field="Date" />
            <xsd:attribute name="Class" type="xsd:integer" sql:field="Class" />
            <xsd:attribute name="TagBased" type="xsd:boolean" sql:field="TagBased" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

So what we did was replace every occurrence of xs with xsd, add mappings to match the column names in the database, and delete this line:
<?xml version="1.0" encoding="utf-8"?>
Replaced this line:
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
With:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
Changed:
<xsd:element maxOccurs="unbounded" name="row">
To:
<xsd:element name = "row" sql:relation = "Badges" maxOccurs = "unbounded">
Changed:
<xsd:element name="badges">
To:
<xsd:element name = "badges" sql:is-constant ="1">
And so on: the xs:unsigned types become xsd:integer, TagBased becomes xsd:boolean, and each use="required" is replaced by a sql:field mapping.

You can work out the rest by comparing the two different listings.
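
The same edits can be generated rather than typed by hand. The sketch below is an illustration only and not the method used for the files in this post; build_mapping_xsd and its arguments are hypothetical names. Given the root element name, the target table and a list of (attribute, xsd type) pairs, it returns a mapping schema in the same shape as the listings here.

```python
def build_mapping_xsd(root_name, table_name, attributes):
    """Return a SQLXML mapping schema as a string.

    attributes is a list of (attribute_name, xsd_type) pairs,
    for example [("Id", "xsd:integer"), ("Name", "xsd:string")].
    """
    lines = [
        '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" '
        'xmlns:sql="urn:schemas-microsoft-com:mapping-schema">',
        '  <xsd:element name="%s" sql:is-constant="1">' % root_name,
        '    <xsd:complexType>',
        '      <xsd:sequence>',
        '        <xsd:element name="row" sql:relation="%s" maxOccurs="unbounded">'
        % table_name,
        '          <xsd:complexType>',
    ]
    for name, xsd_type in attributes:
        # Each XML attribute is mapped to the column of the same name.
        lines.append(
            '            <xsd:attribute name="%s" type="%s" sql:field="%s" />'
            % (name, xsd_type, name)
        )
    lines += [
        '          </xsd:complexType>',
        '        </xsd:element>',
        '      </xsd:sequence>',
        '    </xsd:complexType>',
        '  </xsd:element>',
        '</xsd:schema>',
    ]
    return "\n".join(lines)


# Example: a two-column version of the Badges schema.
print(build_mapping_xsd("badges", "Badges",
                        [("Id", "xsd:integer"), ("Name", "xsd:string")]))
```

Keeping the column list in one place like this makes it harder for the xsd and the table definition to drift apart, but the hand-edited files below work just as well.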

Now save your xsd file in the same folder as the 2GB xml file and the edited PowerShell script, then run the script. Voila!

I worked out the DDL for the tables by looking at the xsd/xml elements, attributes and datatypes.
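
If you want to check column sizes against the data rather than a small sample, something like the following can help. It is a rough Python sketch, with profile_attributes as a made-up helper name: it streams the file and records, for each attribute, the longest value seen and whether every value parsed as an integer. A timestamp such as 2008-07-31T14:22:31.287, for instance, is 23 characters long, which matches the char(23) date columns used here.

```python
import xml.etree.ElementTree as ET


def profile_attributes(xml_path, tag="row"):
    """Return {attribute: {"max_len": int, "all_int": bool}} for <row> elements."""
    stats = {}
    context = ET.iterparse(xml_path, events=("start", "end"))
    # The first event is the start of the root element.
    _event, root = next(context)
    for event, elem in context:
        if event != "end" or elem.tag != tag:
            continue
        for name, value in elem.attrib.items():
            entry = stats.setdefault(name, {"max_len": 0, "all_int": True})
            entry["max_len"] = max(entry["max_len"], len(value))
            if entry["all_int"]:
                try:
                    int(value)
                except ValueError:
                    entry["all_int"] = False
        # Drop the rows parsed so far so memory use stays flat.
        root.clear()
    return stats
```

An attribute whose all_int flag stays true is a candidate for an int column; for the rest, max_len gives a lower bound for the varchar or nvarchar length.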

For your convenience, here are the .xsd files and DDL for all the tables:


Badges

https://archive.org/download/stackexchange/stackoverflow.com-Badges.7z


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Badges.log'
$objBL.Execute('C:\Temp\Badges.xsd','C:\Temp\Badges.xml')
$objBL = $null  

CREATE TABLE [dbo].[Badges] (
       [Id] [int] IDENTITY(1, 1) NOT NULL
       ,[UserId] [int] NULL
       ,[Name] [nvarchar](50) NULL
       ,[Date] CHAR(23) NULL
       ,Class INT
       ,TagBased bit
       ,CONSTRAINT [PK_Badges] PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY]


<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "badges" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name = "row" sql:relation = "Badges" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
            <xsd:attribute name="UserId" type="xsd:integer" sql:field="UserId" />
            <xsd:attribute name="Name" type="xsd:string" sql:field="Name" />
            <xsd:attribute name="Date" type="xsd:dateTime" sql:field="Date" />
            <xsd:attribute name="Class" type="xsd:integer" sql:field="Class" />
            <xsd:attribute name="TagBased" type="xsd:boolean" sql:field="TagBased" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Comments


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Comments.log'
$objBL.Execute('C:\Temp\Comments.xsd','C:\Temp\Comments.xml')
$objBL = $null 

CREATE TABLE [dbo].[Comments] (
       [Id] [int] NOT NULL
       ,[PostId] [int] NULL
       ,[Score] [int] NULL
       ,[Text] [varchar](max) NULL
       ,[CreationDate] [char](23) NULL
       ,[UserId] [int] NULL
       ,PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "comments" sql:is-constant ="1">   
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="row" sql:relation = "Comments" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
            <xsd:attribute name="PostId" type="xsd:integer" sql:field="PostId" />
            <xsd:attribute name="Score" type="xsd:integer" sql:field="Score" />
            <xsd:attribute name="Text" type="xsd:string" sql:field="Text" />
            <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field="CreationDate" />
            <xsd:attribute name="UserId" type="xsd:integer" sql:field="UserId" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

PostHistory


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\PostHistory.log'
$objBL.Execute('C:\Temp\PostHistory.xsd','C:\Temp\PostHistory.xml')
$objBL = $null 

CREATE TABLE [dbo].[PostHistory] (
       [Id] [int] NOT NULL
       ,[PostHistoryTypeId] [int] NULL
       ,[PostId] [int] NULL
       ,[RevisionGUID] [char](36) NULL
       ,[CreationDate] [char](23) NULL
       ,[UserId] [int] NULL
       ,[Text] [varchar](max) NULL
       ,[UserDisplayName] [varchar](11) NULL
       ,PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY]

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name="posthistory" sql:is-constant ="1">
      <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="row" sql:relation = "PostHistory" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
            <xsd:attribute name="PostHistoryTypeId" type="xsd:integer" sql:field="PostHistoryTypeId" />
            <xsd:attribute name="PostId" type="xsd:integer" sql:field="PostId" />
            <xsd:attribute name="RevisionGUID" type="xsd:string" sql:field="RevisionGUID" />
            <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field="CreationDate" />
            <xsd:attribute name="UserId" type="xsd:integer" sql:field="UserId" />
            <xsd:attribute name="Text" type="xsd:string" sql:field="Text" />
            <xsd:attribute name="UserDisplayName" type="xsd:string" sql:field="UserDisplayName" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

PostLinks


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\PostLinks.log'
$objBL.Execute('C:\Temp\PostLinks.xsd','C:\Temp\PostLinks.xml')
$objBL = $null 


CREATE TABLE [dbo].[PostLinks] (
       [Id] [int] NOT NULL
       ,[CreationDate] [char](23) NULL
       ,[PostId] [int] NULL
       ,[RelatedPostId] [int] NULL
       ,[LinkTypeId] [int] NULL
       ,PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY]

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "postlinks" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="row" sql:relation = "PostLinks" maxOccurs = "unbounded">
            <xsd:complexType>
              <xsd:attribute name="Id" type="xsd:integer" sql:field ="Id" />
              <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field = "CreationDate" />
              <xsd:attribute name="PostId" type="xsd:integer" sql:field = "PostId" />
              <xsd:attribute name="RelatedPostId" type="xsd:integer" sql:field ="RelatedPostId" />
              <xsd:attribute name="LinkTypeId" type="xsd:integer" sql:field ="LinkTypeId" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Posts


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Posts.log'
$objBL.Execute('C:\Temp\Posts.xsd','C:\Temp\Posts.xml')
$objBL = $null 

CREATE TABLE [dbo].[Posts] (
       [Id] [int] NOT NULL
       ,[PostTypeId] [int] NULL
       ,[AcceptedAnswerId] [int] NULL
       ,[CreationDate] [char](23) NULL
       ,[Score] [int] NULL
       ,[ViewCount] [int] NULL
       ,[Body] [varchar](max) NULL
       ,[OwnerUserID] [int] NULL
       ,[LastEditorUserId] [int] NULL
       ,[LastEditorDisplayName] [varchar](100) NULL
       ,[LastEditDate] [char](23) NULL
       ,[LastActivityDate] [char](23) NULL
       ,[Title] [varchar](500) NULL
       ,[Tags] [varchar](500) NULL
       ,[AnswerCount] [int] NULL
       ,[CommentCount] [int] NULL
       ,[FavoriteCount] [int] NULL
       ,[CommunityOwnedDate] [char](23) NULL
       ,[ParentID] [int] NULL
       ,[OwnerDisplayName] [varchar](100) NULL
       ,PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY]

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name="posts" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="row" sql:relation = "Posts" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
            <xsd:attribute name="PostTypeId" type="xsd:integer" sql:field="PostTypeId" />
            <xsd:attribute name="AcceptedAnswerId" type="xsd:integer" sql:field="AcceptedAnswerId" />
            <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field="CreationDate" />
            <xsd:attribute name="Score" type="xsd:integer" sql:field="Score" />
            <xsd:attribute name="ViewCount" type="xsd:integer" sql:field="ViewCount" />
            <xsd:attribute name="Body" type="xsd:string" sql:field="Body" />
            <xsd:attribute name="OwnerUserId" type="xsd:integer" sql:field="OwnerUserId" />
            <xsd:attribute name="LastEditorUserId" type="xsd:integer" sql:field="LastEditorUserId" />
            <xsd:attribute name="LastEditorDisplayName" type="xsd:string" sql:field="LastEditorDisplayName" />
            <xsd:attribute name="LastEditDate" type="xsd:dateTime" sql:field="LastEditDate" />
            <xsd:attribute name="LastActivityDate" type="xsd:dateTime" sql:field="LastActivityDate" />
            <xsd:attribute name="Title" type="xsd:string" sql:field="Title" />
            <xsd:attribute name="Tags" type="xsd:string" sql:field="Tags" />
            <xsd:attribute name="AnswerCount" type="xsd:integer" sql:field="AnswerCount" />
            <xsd:attribute name="CommentCount" type="xsd:integer" sql:field="CommentCount" />
            <xsd:attribute name="FavoriteCount" type="xsd:integer" sql:field="FavoriteCount" />
            <xsd:attribute name="CommunityOwnedDate" type="xsd:dateTime" sql:field="CommunityOwnedDate" />
            <xsd:attribute name="ParentId" type="xsd:integer" sql:field="ParentId" />
            <xsd:attribute name="OwnerDisplayName" type="xsd:string" sql:field="OwnerDisplayName" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Tags

https://archive.org/download/stackexchange/stackoverflow.com-Tags.7z



$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Tags.log'
$objBL.Execute('C:\Temp\Tags.xsd','C:\Temp\Tags.xml')
$objBL = $null 

CREATE TABLE [dbo].[Tags] (
       [Id] [int] NOT NULL
       ,[TagName] [nvarchar](100) NULL
       ,[Count] [int] NULL
       ,[ExcerptPostId] [int] NULL
       ,[WikiPostId] [int] NULL
       ,PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY]

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "tags" sql:is-constant ="1">
     <xsd:complexType>
      <xsd:sequence>
        <xsd:element name = "row" sql:relation = "tags" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
            <xsd:attribute name="TagName" type="xsd:string" sql:field="TagName" />
            <xsd:attribute name="Count" type="xsd:integer" sql:field="Count" />
            <xsd:attribute name="ExcerptPostId" type="xsd:integer" sql:field="ExcerptPostId" />
            <xsd:attribute name="WikiPostId" type="xsd:integer" sql:field="WikiPostId" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Users


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Users.log'
$objBL.Execute('C:\Temp\Users.xsd','C:\Temp\Users.xml')
$objBL = $null 

CREATE TABLE [dbo].[Users] (
       [Id] [int] NOT NULL
       ,[Reputation] [int] NULL
       ,[CreationDate] [char](23) NULL
       ,[DisplayName] [nvarchar](40) NULL
       ,[LastAccessDate] [char](23) NULL
       ,[WebsiteUrl] [nvarchar](500) NULL
       ,[Location] [nvarchar](100) NULL
       ,[Age] [int] NULL
       ,[AboutMe] [nvarchar](max) NULL
       ,[Views] [int] NULL
       ,[UpVotes] [int] NULL
       ,[AccountID] [int] NULL
       ,[ProfileImageUrl] [nvarchar](500) NULL
       ,[DownVotes] [int] NULL
       ,CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]


<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "users" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name = "row" sql:relation = "users" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field = "ID" />
            <xsd:attribute name="Reputation" type="xsd:integer" sql:field="Reputation" />
            <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field="CreationDate" />
            <xsd:attribute name="DisplayName" type="xsd:string" sql:field="DisplayName" />
            <xsd:attribute name="LastAccessDate" type="xsd:dateTime" sql:field="LastAccessDate" />
            <xsd:attribute name="WebsiteUrl" type="xsd:string" sql:field="WebsiteUrl" />
            <xsd:attribute name="Location" type="xsd:string" sql:field="Location" />
            <xsd:attribute name="AboutMe" type="xsd:string" sql:field="AboutMe" />
            <xsd:attribute name="Views" type="xsd:integer" sql:field="Views" />
            <xsd:attribute name="UpVotes" type="xsd:integer" sql:field="UpVotes" />
            <xsd:attribute name="DownVotes" type="xsd:integer" sql:field="DownVotes" />
            <xsd:attribute name="AccountId" type="xsd:integer" sql:field="AccountId" />
            <xsd:attribute name="ProfileImageUrl" type="xsd:string" sql:field="ProfileImageUrl" />
            <xsd:attribute name="Age" type="xsd:integer" sql:field="Age" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Votes


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Votes.log'
$objBL.Execute('C:\Temp\Votes.xsd','C:\Temp\Votes.xml')
$objBL = $null

CREATE TABLE [dbo].[Votes] (
       [Id] [int] NULL
       ,[PostId] [int] NULL
       ,[VoteTypeID] [int] NULL
       ,[CreationDate] [char](23) NULL
       ) ON [PRIMARY]

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name="votes" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="row" sql:relation = "Votes" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
            <xsd:attribute name="PostId" type="xsd:integer" sql:field="PostId" />
            <xsd:attribute name="VoteTypeId" type="xsd:integer" sql:field="VoteTypeId" />
            <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field="CreationDate" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
