Skip to main content

How to import a large xml file into SQL Server


(Or how to import the StackOverflow database into SQL Server)

Introduction


NB  This process can be generalised to import any large (>2G) xml file into SQL Server.
Some SQL Server training you can find online including that by Brent Ozar uses the StackOverflow database for practice. The tables from it are available online for download in xml format. In the past it was possible to use the scripts found here, https://www.toadworld.com/platforms/sql-server/w/wiki/9466.how-to-import-the-stackoverflow-xml-into-sql-server, to import them but as each xml file is now over 2GB you will get an error like this when you try to execute them:


Brent Ozar, has a link to SODDI.exe, https://github.com/BrentOzarULTD/soddi, which can import the files (I haven’t tried it) but it means downloading and importing eight tables: Badges, Comments, PostHistory, PostLinks, Posts, Tags, Users, and Votes tables which amounts to >30GB of compressed xml increasing to ~200GB when decompressed. What if you only want to import one table?


By using SQLXML 4 and some custom xsd files, it is possible to import one table at a time. If you just want to import the Users table then a quick guide follows. If you want to know how I created the xsd files for all the tables then check out the next section. But if you want just want the links to the downloads, the PowerShell script, DML and xsd files then skip to the bottom.

Import the Users table

1.       Download and install SQLXML4 SP1 from https://www.microsoft.com/en-gb/download/details.aspx?id=30403
2.     Create a database called StackOverflow and a table called dbo.Users on your SQL Server:

CREATE DATABASE [StackOverflow]
GO

CREATE TABLE [dbo].[Users] (
       [Id] [int] NOT NULL
       ,[Reputation] [int] NULL
       ,[CreationDate] [char](23) NULL
       ,[DisplayName] [nvarchar](40) NULL
       ,[LastAccessDate] [char](23) NULL
       ,[WebsiteUrl] [nvarchar](500) NULL
       ,[Location] [nvarchar](100) NULL
       ,[Age] [int] NULL
       ,[AboutMe] [nvarchar](max) NULL
       ,[Views] [int] NULL
       ,[UpVotes] [int] NULL
       ,[AccountID] [int] NULL
       ,[ProfileImageUrl] [nvarchar](500) NULL
       ,[DownVotes] [int] NULL
       ,CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

1.       Download and unzip (7zip) to a convenient location the users table from here, https://archive.org/download/stackexchange/stackoverflow.com-Users.7z to C:\Temp, The 7zip file at the time of writing is 230MB and expands to well over 2GB. After it has extracted, which will take a while, rename it to users.xml.

In the location where you unzipped the above file (C:\Temp) create a text file called users.xsd and paste this in,
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "users" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name = "row" sql:relation = "users" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field = "ID" />
            <xsd:attribute name="Reputation" type="xsd:integer" sql:field="Reputation" />
            <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field="CreationDate" />
            <xsd:attribute name="DisplayName" type="xsd:string" sql:field="DisplayName" />
            <xsd:attribute name="LastAccessDate" type="xsd:dateTime" sql:field="LastAccessDate" />
            <xsd:attribute name="WebsiteUrl" type="xsd:string" sql:field="WebsiteUrl" />
            <xsd:attribute name="Location" type="xsd:string" sql:field="Location" />
            <xsd:attribute name="AboutMe" type="xsd:string" sql:field="AboutMe" />
            <xsd:attribute name="Views" type="xsd:integer" sql:field="Views" />
            <xsd:attribute name="UpVotes" type="xsd:integer" sql:field="UpVotes" />
            <xsd:attribute name="DownVotes" type="xsd:integer" sql:field="DownVotes" />
            <xsd:attribute name="AccountId" type="xsd:integer" sql:field="AccountId" />
            <xsd:attribute name="ProfileImageUrl" type="xsd:string" sql:field="ProfileImageUrl" />
            <xsd:attribute name="Age" type="xsd:integer" sql:field="Age" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

2.       Now in C:\Temp create a file called import.ps1 and paste these PowerShell commands, editing the ConnectionString value to match your server:

$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Error.log'
$objBL.Execute('C:\Temp\users.xsd','C:\Temp\Users.xml')
$objBL = $null  

Execute the PowerShell script and after 10 minutes or so (depending on the usual things) it should have imported. If not check the error log for help. That's it for the Users table but how did I create the xsd file? Read on for details...

Creating the .xsd file

Probably the trickiest part of this is creating the .xsd file but there are some tricks we can use to help.
This is how we import the badges table. In your new StackOverflow database create the Badges table:
CREATE TABLE [dbo].[Users](
       [Id] [int] NOT NULL,
       [Reputation] [int] NULL,
       [CreationDate] [char](23) NULL,
       [DisplayName] [nvarchar](40) NULL,
       [LastAccessDate] [char](23) NULL,
       [WebsiteUrl] [nvarchar](500) NULL,
       [Location] [nvarchar](100) NULL,
       [Age] [int] NULL,
       [AboutMe] [nvarchar](max) NULL,
       [Views] [int] NULL,
       [UpVotes] [int] NULL,
       [AccountID] [int] NULL,
       [ProfileImageUrl] [nvarchar](500) NULL,
       [DownVotes] [int] NULL,
 CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED
(
       [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

1.       Download and extract the badges archive to C:\Temp, https://archive.org/download/stackexchange/stackoverflow.com-Badges.7z

2.       As this is a very large file and notepad/notepad++ won’t open it, use HJSplit.exe, http://www.hjsplit.org/, to break it up into manageable pieces. This may take a while if we split the entire file but we only really need a very small subset of the data in the file. Say 500KB. Start the split process and as soon as the first file is created take a copy of it then stop the split. If you don’t take a copy then when you stop the split all the files it creates will be deleted.


1.       Now download and install Visual Studio 2017 Community Edition from here, https://www.visualstudio.com/thank-you-downloading-visual-studio/?sku=Community&rel=15,  if you don’t have a copy of Visual Studio available. What we are after is the xsd creating capabilities, I’m sure other applications can do this

2.       Once you have your 500KB file open it in Notepad++ or some such and copy out the first 10 lines or so, but remember to create an ending </badges> on the last line so the xml is properly delimited.
3.       Create an empty Visual Studio solution:


Right-click on Solution Explorer and select New Item…
And choose XML file:

Paste the 11 line xml file you created earlier and paste it into the empty XML file.


Then from the XML menu choose Create Schema. This will magically create a .xsd file from your xml file. Now all we have to do is some extra editing and we’re nearly there.


This is the .xsd file before we edited it:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="badges">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" name="row">
          <xs:complexType>
            <xs:attribute name="Id" type="xs:unsignedInt" use="required" />
            <xs:attribute name="UserId" type="xs:unsignedShort" use="required" />
            <xs:attribute name="Name" type="xs:string" use="required" />
            <xs:attribute name="Date" type="xs:dateTime" use="required" />
            <xs:attribute name="Class" type="xs:unsignedByte" use="required" />
            <xs:attribute name="TagBased" type="xs:string" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

This is it after:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
 <xsd:element name = "badges" sql:is-constant ="1">
   <xsd:complexType>
     <xsd:sequence>   
       <xsd:element name="badges">     
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name = "row" sql:relation = "badges" maxOccurs = "unbounded">
            <xsd:complexType>
              <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
              <xsd:attribute name="UserId" type="xsd:integer" sql:field="UserId" />
              <xsd:attribute name="Name" type="xsd:string" sql:field="Name" />
              <xsd:attribute name="Date" type="xsd:dateTime" sql:field="Date" />
              <xsd:attribute name="Class" type="xsd:integer" sql:field="Class" />
              <xsd:attribute name="TagBased" type="xsd:string" sql:field="TagBased" />         
            </xsd:complexType>
          </xsd:element>
        </xsd:sequence>
      </xsd:complexType>
     </xsd:element>
   </xsd:sequence>
  </xsd:complexType>
 </xsd:element>
</xsd:schema>

So what we did is replace every occurrence of xs with xsd, added mappings to match the column names in the database; and deleted this line:
<?xml version="1.0" encoding="utf-8"?>
Replaced this line
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">

With
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">

Changed
<xsd:element maxOccurs="unbounded" name="row">
To
<xsd:element name = "row" sql:relation = "badge" maxOccurs = "unbounded">
Added the line:
<xsd:element name = "badges" sql:is-constant ="1"> after the first.


Etc….

You can work out the rest by comparing the two different listings.

Now save your xsd file, along with our 2GB xml file plus the edited PowerShell script and run it! Voila!

I worked out the DML for the tables by looking at the xsd/xml elements, attributes and datatypes. 

For your convenience, here are the .xsd files and DML for all the tables:


Badges

https://archive.org/download/stackexchange/stackoverflow.com-Badges.7z


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Badges.log'
$objBL.Execute('C:\Temp\Badges.xsd','C:\Temp\Badges.xml')
$objBL = $null  

CREATE TABLE [dbo].[Badges] (
       [Id] [int] IDENTITY(1, 1) NOT NULL
       ,[UserId] [int] NULL
       ,[Name] [nvarchar](50) NULL
       ,[Date] CHAR(23) NULL
       ,Class INT
       ,TagBased bit
       ,CONSTRAINT [PK_Badges] PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY]


<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "badges" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name = "row" sql:relation = "Badges" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
            <xsd:attribute name="UserId" type="xsd:integer" sql:field="UserId" />
            <xsd:attribute name="Name" type="xsd:string" sql:field="Name" />
            <xsd:attribute name="Date" type="xsd:dateTime" sql:field="Date" />
            <xsd:attribute name="Class" type="xsd:integer" sql:field="Class" />
            <xsd:attribute name="TagBased" type="xsd:boolean" sql:field="TagBased" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Comments


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Comments.log'
$objBL.Execute('C:\Temp\Comments.xsd','C:\Temp\Comments.xml')
$objBL = $null 

CREATE TABLE [dbo].[Comments] (
       [Id] [int] NOT NULL
       ,[PostId] [int] NULL
       ,[Score] [int] NULL
       ,[Text] [varchar](max) NULL
       ,[CreationDate] [char](23) NULL
       ,[UserId] [int] NULL
       ,PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "comments" sql:is-constant ="1">   
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="row" sql:relation = "Comments" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
            <xsd:attribute name="PostId" type="xsd:integer" sql:field="PostId" />
            <xsd:attribute name="Score" type="xsd:integer" sql:field="Score" />
            <xsd:attribute name="Text" type="xsd:string" sql:field="Text" />
            <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field="CreationDate" />
            <xsd:attribute name="UserId" type="xsd:integer" sql:field="UserId" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

PostHistory


 $objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\PostHistory.log'
$objBL.Execute('C:\Temp\PostHistory.xsd','C:\Temp\PostHistory.xml')
$objBL = $null 

CREATE TABLE [dbo].[Posthistory] (
       [Id] [int] NOT NULL
       ,[PostHistoryTypeId] [int] NULL
       ,[PostId] [int] NULL
       ,[RevisionGUID] [char](36) NULL
       ,[CreationDate] [char](23) NULL
       ,[UserId] [int] NULL
       ,[Text] [varchar](max) NULL
       ,[UserDisplayName] [varchar](11) NULL
       ,PRIMARY KEY CLUSTERED ()
       )
       [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name="posthistory" sql:is-constant ="1">
      <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="row" sql:relation = "PostHistory" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
            <xsd:attribute name="PostHistoryTypeId" type="xsd:integer" sql:field="PostHistoryTypeId" />
            <xsd:attribute name="PostId" type="xsd:integer" sql:field="PostId" />
            <xsd:attribute name="RevisionGUID" type="xsd:string" sql:field="RevisionGUID" />
            <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field="CreationDate" />
            <xsd:attribute name="UserId" type="xsd:integer" sql:field="UserId" />
            <xsd:attribute name="Text" type="xsd:string" sql:field="Text" />
            <xsd:attribute name="UserDisplayName" type="xsd:string" sql:field="UserDisplayName" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

PostLinks


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\PostLinks.log'
$objBL.Execute('C:\Temp\PostLinks.xsd','C:\Temp\PostLinks.xml')
$objBL = $null 


CREATE TABLE [dbo].[PostLinks] (
       [Id] [int] NOT NULL
       ,[CreationDate] [char](23) NULL
       ,[PostId] [int] NULL
       ,[RelatedPostId] [int] NULL
       ,[LinkTypeId] [int] NULL
       ,PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY]

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "postlinks" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="row" sql:relation = "PostLinks" maxOccurs = "unbounded">
            <xsd:complexType>
              <xsd:attribute name="Id" type="xsd:integer" sql:field ="Id" />
              <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field = "CreationDate" />
              <xsd:attribute name="PostId" type="xsd:integer" sql:field = "PostId" />
              <xsd:attribute name="RelatedPostId" type="xsd:integer" sql:field ="RelatedPostId" />
              <xsd:attribute name="LinkTypeId" type="xsd:integer" sql:field ="LinkTypeId" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Posts


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Posts.log'
$objBL.Execute('C:\Temp\Posts.xsd','C:\Temp\Posts.xml')
$objBL = $null 

CREATE TABLE [dbo].[Posts] (
       [Id] [int] NOT NULL
       ,[PostTypeId] [int] NULL
       ,[AcceptedAnswerId] [int] NULL
       ,[CreationDate] [char](23) NULL
       ,[Score] [int] NULL
       ,[ViewCount] [int] NULL
       ,[Body] [varchar](max) NULL
       ,[OwnerUserID] [int] NULL
       ,[LastEditorUserId] [int] NULL
       ,[LastEditorDisplayName] [varchar](100) NULL
       ,[LastEditDate] [char](23) NULL
       ,[LastActivityDate] [char](23) NULL
       ,[Title] [varchar](500) NULL
       ,[Tags] [varchar](500) NULL
       ,[AnswerCount] [int] NULL
       ,[CommentCount] [int] NULL
       ,[FavoriteCount] [int] NULL
       ,[CommunityOwnedDate] [char](23) NULL
       ,[ParentID] [int] NULL
       ,[OwnerDisplayName] [varchar](100) NULL
       ,PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY]

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name="posts" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="row" sql:relation = "Posts" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
            <xsd:attribute name="PostTypeId" type="xsd:integer" sql:field="PostTypeId" />
            <xsd:attribute name="AcceptedAnswerId" type="xsd:integer" sql:field="AcceptedAnswerId" />
            <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field="CreationDate" />
            <xsd:attribute name="Score" type="xsd:integer" sql:field="Score" />
            <xsd:attribute name="ViewCount" type="xsd:integer" sql:field="ViewCount" />
            <xsd:attribute name="Body" type="xsd:string" sql:field="Body" />
            <xsd:attribute name="OwnerUserId" type="xsd:integer" sql:field="OwnerUserId" />
            <xsd:attribute name="LastEditorUserId" type="xsd:integer" sql:field="LastEditorUserId" />
            <xsd:attribute name="LastEditorDisplayName" type="xsd:string" sql:field="LastEditorDisplayName" />
            <xsd:attribute name="LastEditDate" type="xsd:dateTime" sql:field="LastEditDate" />
            <xsd:attribute name="LastActivityDate" type="xsd:dateTime" sql:field="LastActivityDate" />
            <xsd:attribute name="Title" type="xsd:string" sql:field="Title" />
            <xsd:attribute name="Tags" type="xsd:string" sql:field="Tags" />
            <xsd:attribute name="AnswerCount" type="xsd:integer" sql:field="AnswerCount" />
            <xsd:attribute name="CommentCount" type="xsd:integer" sql:field="CommentCount" />
            <xsd:attribute name="FavoriteCount" type="xsd:integer" sql:field="FavoriteCount" />
            <xsd:attribute name="CommunityOwnedDate" type="xsd:dateTime" sql:field="CommunityOwnedDate" />
            <xsd:attribute name="ParentId" type="xsd:integer" sql:field="ParentId" />
            <xsd:attribute name="OwnerDisplayName" type="xsd:string" sql:field="OwnerDisplayName" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Tags

https://archive.org/download/stackexchange/stackoverflow.com-Tags.7z



$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Tags.log'
$objBL.Execute('C:\Temp\Tags.xsd','C:\Temp\Tags.xml')
$objBL = $null 

CREATE TABLE [dbo].[Tags] (
       [Id] [int] NOT NULL
       ,[TagName] [nvarchar](100) NULL
       ,[Count] [int] NULL
       ,[ExcerptPostId] [int] NULL
       ,[WikiPostId] [int] NULL
       ,PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY]

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "tags" sql:is-constant ="1">
     <xsd:complexType>
      <xsd:sequence>
        <xsd:element name = "row" sql:relation = "tags" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:unsignedByte" sql:field="Id" />
            <xsd:attribute name="TagName" type="xsd:string" sql:field="TagName" />
            <xsd:attribute name="Count" type="xsd:unsignedInt" sql:field="Count" />
            <xsd:attribute name="ExcerptPostId" type="xsd:unsignedInt" sql:field="ExcerptPostId" />
            <xsd:attribute name="WikiPostId" type="xsd:unsignedInt" sql:field="WikiPostId" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Users


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'
$objBL.ErrorLogFile ='C:\Temp\Users.log'
$objBL.Execute('C:\Temp\Users.xsd','C:\Temp\Users.xml')
$objBL = $null 

CREATE TABLE [dbo].[Users] (
       [Id] [int] NOT NULL
       ,[Reputation] [int] NULL
       ,[CreationDate] [char](23) NULL
       ,[DisplayName] [nvarchar](40) NULL
       ,[LastAccessDate] [char](23) NULL
       ,[WebsiteUrl] [nvarchar](500) NULL
       ,[Location] [nvarchar](100) NULL
       ,[Age] [int] NULL
       ,[AboutMe] [nvarchar](max) NULL
       ,[Views] [int] NULL
       ,[UpVotes] [int] NULL
       ,[AccountID] [int] NULL
       ,[ProfileImageUrl] [nvarchar](500) NULL
       ,[DownVotes] [int] NULL
       ,CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED ([Id] ASC) WITH (
              PAD_INDEX = OFF
              ,STATISTICS_NORECOMPUTE = OFF
              ,IGNORE_DUP_KEY = OFF
              ,ALLOW_ROW_LOCKS = ON
              ,ALLOW_PAGE_LOCKS = ON
              ) ON [PRIMARY]
       ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]


<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name = "users" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name = "row" sql:relation = "users" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field = "ID" />
            <xsd:attribute name="Reputation" type="xsd:integer" sql:field="Reputation" />
            <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field="CreationDate" />
            <xsd:attribute name="DisplayName" type="xsd:string" sql:field="DisplayName" />
            <xsd:attribute name="LastAccessDate" type="xsd:dateTime" sql:field="LastAccessDate" />
            <xsd:attribute name="WebsiteUrl" type="xsd:string" sql:field="WebsiteUrl" />
            <xsd:attribute name="Location" type="xsd:string" sql:field="Location" />
            <xsd:attribute name="AboutMe" type="xsd:string" sql:field="AboutMe" />
            <xsd:attribute name="Views" type="xsd:integer" sql:field="Views" />
            <xsd:attribute name="UpVotes" type="xsd:integer" sql:field="UpVotes" />
            <xsd:attribute name="DownVotes" type="xsd:integer" sql:field="DownVotes" />
            <xsd:attribute name="AccountId" type="xsd:integer" sql:field="AccountId" />
            <xsd:attribute name="ProfileImageUrl" type="xsd:string" sql:field="ProfileImageUrl" />
            <xsd:attribute name="Age" type="xsd:integer" sql:field="Age" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Votes


$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad'
$objBL.ConnectionString = 'provider = SQLOLEDB;data source=DB-SERVER;database=StackOverflow; integrated security = SSPI'$objBL.ErrorLogFile ='C:\Temp\Votes.log'
$objBL.Execute('C:\Temp\Votes.xsd','C:\Temp\Votes.xml')
objBL = $null 

CREATE TABLE [dbo].[Votes] (
       [Id] [int] NULL
       ,[PostId] [int] NULL
       ,[VoteTypeID] [int] NULL
       ,[CreationDate] [char](23) NULL
       ) ON [PRIMARY]

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name="votes" sql:is-constant ="1">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="row" sql:relation = "Votes" maxOccurs = "unbounded">
          <xsd:complexType>
            <xsd:attribute name="Id" type="xsd:integer" sql:field="Id" />
            <xsd:attribute name="PostId" type="xsd:integer" sql:field="PostId" />
            <xsd:attribute name="VoteTypeId" type="xsd:integer" sql:field="VoteTypeId" />
            <xsd:attribute name="CreationDate" type="xsd:dateTime" sql:field="CreationDate" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Resources used in putting this guide together

Comments

Popular posts from this blog

How to create a custom Windows Event Log view and email trigger

The filtering on Windows event logs can be slow, clunky and although you can do it on fields like event ID, it seems that many event IDs are shared amongst many different errors – the event ID may match but the body of the error (therefore the actual error) may be completely unrelated. Fortunately, it is possible to filter on the contents of the body of the error message but it requires creating a custom XML query. Also, it would be handy to send out a notification email when this event gets logged. Read on to find out how to work this magic…. This example is looking for a  Warning  event  1309  for  ASP.NET 4.0.30319.0  on a web server. If you were to just filter the log on the criteria above today it would return 435 results because it is a fairly general error ID. If I filter it using XML for SqlException (what I’m really interested in) only 5 results are returned. So the first step is go to the Application Log and choose  Create Custom View…  Select the  XML  tab, check  Edit

How to configure the SSAS service to use a Domain Account

NB Updating SPNs in AD is not for the faint hearted plus I got inconsistent results from different servers. Do so at your own risk! If you need the SSAS account on a SQL Server to use a domain account rather than the local “virtual” account “NT Service\MSSQLServerOLAPService”. You may think you just give the account login permissions to the server, perhaps give it sysadmin SQL permissions too. However, if you try and connect to SSAS  remotely  you may get this error: Authentication failed. (Microsoft.AnalysisService.AdomdClient) The target principal name is incorrect (Microsoft.AnalysisService.AdomdClient) From Microsoft: “A Service Principle Name (SPN) uniquely identifies a service instance in an Active Directory domain when Kerberos is used to mutually authenticate client and service identities. An SPN is associated with the logon account under which the service instance runs. For client applications connecting to Analysis Services via Kerberos authentication, th

Fun and games with the Management Data Warehouse (MDW and Data Collectors)

The SQL Server Management Data Warehouse (when you first come across it) seems to promise so much if the verbiage from Microsoft and some other websites is to to believed. But when you install it you may find that it is not as useful as it could be. This is a shame but we are currently only on v2 of the product with SQL 2012 so one hopes it will improve in subsequent versions. However, it probably is worth playing with if you have never used it before - at least you can show your boss some reports on general server health when he asks for it and you have nothing else in place. There is one big problem with it though if you decide that you don't want to use it any more, uninstalling it is not supported! Mad, I know. But as usual some very helpful people in the community have worked out, what seems to me, a pretty safe way of doing it. I had a problem with my MDW. The data collector jobs were causing a lot of deadlocking on some production servers and impacting performance. I