Tuesday, September 8, 2009

Downloading Content From SharePoint (Let me count the ways)


This is a follow-up to the post “Uploading Content To SharePoint”. In that post I evaluated the performance and complexity of uploading content to SharePoint with three different methods. In this post I make the same comparison for downloading content from SharePoint. Once again, there are many ways to get content from SharePoint. By content I mean files and their metadata. You can also get a list item using the GetListItems method of the Lists web service; however, this returns only metadata.

Method      Complexity  Scalability  Metadata  Versions
Copy.asmx   2           4            yes       no
WebDav      5           5            yes*      yes**
Rpc         10          10           yes       yes

*must be used in conjunction with Lists web service

**must be used in conjunction with Versions web service

Copy Web Service

The Copy web service is the easiest way to get content from SharePoint because it is familiar web service programming and it returns the content and the metadata in one method call. It does not scale as well as the other methods because it uses the more verbose SOAP protocol. Its biggest disadvantage is that it does not support returning versions of a document. The Copy web service was made for migrating content between document libraries and sites within the same site collection, not for retrieving versions. I did try to retrieve an older version of a document by passing the older version’s URL as the url argument of the GetItem method.

The version url looks like this:

http://servername/site/_vti_history/512/documentlibrary/filename.tiff

Unfortunately, when sending in a version url the GetItem method executes successfully but both the metadata and binary are empty.

public static void DownloadDocument()
{
    copyservice.Copy c = new copyservice.Copy();
    c.Url = "http://basesmcdev2/sites/tester1/_vti_bin/copy.asmx";
    c.UseDefaultCredentials = true;

    // GetItem assigns both out parameters, so no initial values are needed.
    copyservice.FieldInformation[] info;
    byte[] myBinary;

    string docUrl = "http://basesmcdev2/sites/tester1/Shared%20Documents/+aaa/mama.bmp";

    // info receives the metadata fields and myBinary the file contents.
    uint result = c.GetItem(docUrl, out info, out myBinary);

    if (File.Exists("c:\\newfile.bmp")) File.Delete("c:\\newfile.bmp");
    using (Stream s = File.Create("c:\\newfile.bmp"))
    {
        s.Write(myBinary, 0, myBinary.Length);
        s.Flush();
    }
}

WebDav

Using WebDav to download a document is fairly simple if you have the full URL of the document. However, if you want the metadata as well, you must call the GetListItems method of the Lists web service. This becomes tricky when the only piece of information you have is the URL, because the Lists web service needs the name of the list and the ID of the item to retrieve the metadata. In the end, if you are going to call a SharePoint web service anyway, just call the Copy web service and get both the metadata and the file in one call.

public static void GetFileWithMetaData(string url)
{
    using (WebClient wc = new WebClient())
    {
        wc.UseDefaultCredentials = true;

        // Downloads only the file; metadata requires a separate Lists web service call.
        byte[] response = wc.DownloadData(url);

        if (File.Exists("c:\\default.txt")) File.Delete("c:\\default.txt");

        using (Stream s = File.Create("c:\\default.txt"))
        {
            s.Write(response, 0, response.Length);
            s.Flush();
        }
    }
}

So how does WebDav support versions? You can give the DownloadData method the URL of the version mentioned previously and it will return the binary for that version. So now the question is, how do you obtain the URLs of previous versions? Below is a code snippet that uses the Versions web service to retrieve the URL of a particular version of a document, given the file URL and the version number.

// Requires: using System.Globalization; using System.Linq; using System.Xml; using System.Xml.Linq;
public static string GetVersionUrl(string url, double version)
{
    versionservice.Versions vs = new versionservice.Versions();
    vs.Url = "http://basesmcdev2/sites/tester1/_vti_bin/versions.asmx";
    vs.UseDefaultCredentials = true;

    XmlNode versionsNode = vs.GetVersions(url);
    if (versionsNode == null) return string.Empty;

    XElement versionInfo = XElement.Parse(versionsNode.OuterXml);

    // Each <result> element describes one version of the document.
    foreach (XElement versionElement in versionInfo.Elements().Where(e => e.Name.LocalName == "result"))
    {
        XAttribute versionAttr = versionElement.Attribute("version");
        XAttribute urlAttr = versionElement.Attribute("url");
        if (versionAttr == null || urlAttr == null) continue;

        // The current version is in the form "@1.0"; strip the "@" so it parses.
        string versionNumber = versionAttr.Value.TrimStart('@');

        double versionNumeric;
        if (double.TryParse(versionNumber, NumberStyles.Float, CultureInfo.InvariantCulture, out versionNumeric)
            && versionNumeric == version)
        {
            return urlAttr.Value;
        }
    }

    return string.Empty;
}
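If you would rather not call the Versions web service, the version URL can sometimes be predicted. The number in the version URL shown earlier (512) appears to be SharePoint's internal version id, which is commonly described as major * 512 + minor, so 512 would correspond to version 1.0. The following is only a sketch under that assumption; the class name, the BuildVersionUrl helper and its parameters are hypothetical, and the Versions web service remains the safer source for the real URL.

```csharp
using System;

public static class VersionUrlSketch
{
    // Hypothetical helper. Assumes the internal version id is major * 512 + minor.
    // siteUrl:              for example "http://servername/site"
    // siteRelativeFilePath: for example "documentlibrary/filename.tiff"
    public static string BuildVersionUrl(string siteUrl, string siteRelativeFilePath, int major, int minor)
    {
        int versionId = major * 512 + minor;
        return siteUrl.TrimEnd('/')
            + "/_vti_history/" + versionId + "/"
            + siteRelativeFilePath.TrimStart('/');
    }
}
```

The resulting string could then be passed to WebClient.DownloadData in the same way as the URL returned by GetVersionUrl.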

FrontPage RPC (Remote Procedure Calls)

Most developers are not familiar with FrontPage remote procedure calls. They are the most efficient method, yet the most complex to code against. You could create your own wrapper classes to simplify coding, but you must understand the command structure and be able to parse the returned HTML correctly. Downloading a document is especially complex: you must come up with a way to parse the returned metadata out of the HTML. Below is an example of a successful return of a “get document” RPC call. Anything below the closing </html> tag is the file.

<html><head><title>vermeer RPC packet</title></head>
<body>
<p>method=get document:12.0.0.4518
<p>message=successfully retrieved document 'tester4/Code128.tif' from 'tester4/Code128.tif'
<p>document=
<ul>
<li>document_name=tester4/Code128.tif
<li>meta_info=
<ul>
<li>vti_rtag
<li>SWrt:FED298A3-6030-40DA-A984-D0A04A673741@00000000013
<li>vti_etag
<li>SW&#34;&#123;FED298A3-6030-40DA-A984-D0A04A673741&#125;,13&#34;
<li>vti_parserversion
<li>SR12.0.0.6318
<li>vti_modifiedby
<li>SRBASESMCDEV2&#92;steve.curran
<li>vti_filesize
<li>IR3387
<li>vti_timecreated
<li>TR22 Oct 2008 19:45:25 -0000
<li>ContentType
<li>SWDocument
<li>ContentTypeId
<li>SW0x01010087C42E0D80709D4CB61D6558C94571E4
<li>vti_title
<li>SW
<li>statechoice
<li>SWstateone
<li>vti_lastheight
<li>IX2200
<li>vti_timelastmodified
<li>TR13 Apr 2009 22:04:31 -0000
<li>vti_nexttolasttimemodified
<li>TR13 Apr 2009 22:02:20 -0000
<li>vti_candeleteversion
<li>BRtrue
<li>vti_canmaybeedit
<li>BXtrue
<li>vti_backlinkinfo
<li>VXLists/threeStateTasks/3_.000 Lists/threeStateTasks/4_.000
<li>myreq
<li>SWnbnbvnbv
<li>vti_author
<li>SRBASESMCDEV2&#92;steve.curran
<li>vti_lastwidth
<li>IX1696
<li>vti_sourcecontrolversion
<li>SRV1.0
<li>vti_sourcecontrolcookie
<li>SRfp_internal
<li>vti_level
<li>IR1
</ul>
</ul>
</body>
</html>
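One way to parse the metadata out of a response like the one above is to walk the <li> lines that follow meta_info=, treating them as alternating name and value entries. The sketch below assumes that layout holds, one entry per line, and that every value starts with a two-character prefix (such as SW, SR or IR, which seems to encode the value type and a read/write flag) that can simply be dropped. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class RpcMetaInfoSketch
{
    // Parses the <li> pairs that follow "meta_info=" in the HTML part of the response.
    // Assumes names and values alternate, one per <li> line, as in the sample above.
    public static Dictionary<string, string> ParseMetaInfo(string html)
    {
        var result = new Dictionary<string, string>();
        int start = html.IndexOf("<li>meta_info=", StringComparison.Ordinal);
        if (start < 0) return result;

        string[] lines = html.Substring(start).Split('\n');
        string pendingName = null;

        // i = 1 skips the "<li>meta_info=" line itself.
        for (int i = 1; i < lines.Length; i++)
        {
            string line = lines[i].Trim();
            if (line.StartsWith("</ul>", StringComparison.Ordinal)) break;    // end of the meta_info list
            if (!line.StartsWith("<li>", StringComparison.Ordinal)) continue; // skips the opening <ul>

            // Entities such as &#92; are decoded to the characters they stand for.
            string text = WebUtility.HtmlDecode(line.Substring(4));

            if (pendingName == null)
            {
                pendingName = text;
            }
            else
            {
                // Drop the two-character prefix (for example "SW" or "IR").
                result[pendingName] = text.Length >= 2 ? text.Substring(2) : string.Empty;
                pendingName = null;
            }
        }

        return result;
    }
}
```

Calling ParseMetaInfo on the HTML portion of the response gives a dictionary keyed by names such as vti_filesize or ContentType. Values are left as strings; converting them according to the prefix is a possible extension.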

Finally, below is an example of how to call the “get document” RPC method. You must work with the returned byte array and copy the section that represents the file to another byte array. This method supports retrieving older versions: just pass in the version number, where zero represents the current version. The fileUrl argument is the document library name plus the file name, for example “Shared Documents/FileName.tiff”.

public static void DownloadDocumentRPC(string fileUrl, int version)
{
    string method = "get document: 12.0.0.4518";
    string serviceName = "http://basesmcdev2/sites/tester1/_vti_bin/_vti_aut/author.dll";

    // An empty doc_version requests the current version; older versions are passed as "V<number>".
    string verstr = version > 0 ? "V" + version.ToString() : string.Empty;

    string fpRPCCallStr = String.Format(
        "method={0}&service_name={1}&document_name={2}&doc_version={3}&get_option={4}&timeout=0",
        HttpUtility.UrlEncode(method), serviceName, fileUrl, verstr, "none");

    try
    {
        // Add a line feed character to delimit the end of the command.
        byte[] fpRPCCall = Encoding.UTF8.GetBytes(fpRPCCallStr + "\n");

        HttpWebRequest wReq = (HttpWebRequest)WebRequest.Create(serviceName);
        wReq.Credentials = CredentialCache.DefaultCredentials;
        wReq.Method = "POST";
        wReq.ContentType = "application/x-vermeer-urlencoded";
        wReq.Headers.Add("X-Vermeer-Content-Type", "application/x-vermeer-urlencoded");
        wReq.ContentLength = fpRPCCall.Length;

        // Close the request stream before asking for the response.
        using (Stream requestStream = wReq.GetRequestStream())
        {
            requestStream.Write(fpRPCCall, 0, fpRPCCall.Length);
        }

        // Read the response in chunks; this works with or without a Content-Length header.
        byte[] returnBuffer;
        using (WebResponse response = wReq.GetResponse())
        using (Stream responseStream = response.GetResponseStream())
        using (MemoryStream ms = new MemoryStream())
        {
            byte[] chunk = new byte[81920];
            int bytesRead;
            while ((bytesRead = responseStream.Read(chunk, 0, chunk.Length)) > 0)
            {
                ms.Write(chunk, 0, bytesRead);
            }
            returnBuffer = ms.ToArray();
        }

        // Decode as ASCII (one char per byte) so the character index of "</html>"
        // is also a valid byte offset into returnBuffer.
        string returnStr = Encoding.ASCII.GetString(returnBuffer);
        int htmlEnd = returnStr.IndexOf("</html>");
        if (htmlEnd < 0)
            throw new InvalidOperationException("Unexpected RPC response: no closing </html> tag.");

        // The file starts after "</html>" (7 characters) plus the line feed that follows it.
        int startpos = Math.Min(htmlEnd + 8, returnBuffer.Length);
        byte[] fileBytes = new byte[returnBuffer.Length - startpos];
        Buffer.BlockCopy(returnBuffer, startpos, fileBytes, 0, fileBytes.Length);

        if (File.Exists("c:\\newfile.bmp")) File.Delete("c:\\newfile.bmp");
        using (Stream s = File.Create("c:\\newfile.bmp"))
        {
            s.Write(fileBytes, 0, fileBytes.Length);
            s.Flush();
        }
    }
    catch (Exception ex)
    {
        // error handling goes here; ex carries the failure details
    }
}
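The step of copying the file section out of the returned byte array can also be done directly on the raw bytes, without converting the whole response to a string first. This is a minimal sketch, assuming the file begins immediately after the closing </html> tag and a single line feed, as in the response shown earlier. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class RpcResponseSketch
{
    // Returns the index of the first occurrence of pattern in data, or -1 if absent.
    public static int IndexOf(byte[] data, byte[] pattern)
    {
        for (int i = 0; i <= data.Length - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }

    // Returns the bytes that follow "</html>\n", or null when the marker is missing.
    public static byte[] ExtractFileBytes(byte[] response)
    {
        byte[] marker = Encoding.ASCII.GetBytes("</html>\n");
        int pos = IndexOf(response, marker);
        if (pos < 0) return null;

        int start = pos + marker.Length;
        byte[] file = new byte[response.Length - start];
        Buffer.BlockCopy(response, start, file, 0, file.Length);
        return file;
    }
}
```

Searching the bytes avoids any mismatch between character positions and byte offsets, and a null result gives the caller a clear signal that the server returned something other than the expected packet.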

Once again, I hope this comparison of the different methods of downloading content from SharePoint will help you plan your next application’s SharePoint integration. Knowing the different methods will save you from having to write your own web service. Ultimately, taking advantage of SharePoint’s “out of the box” tools will make your applications easier to install, configure and maintain.

2 comments:

bharath said...

Actually, I am using RPC to connect to SharePoint to retrieve all the documents along with their metadata. It worked fine with SharePoint 2007, but when I try to use the FrontPage RPC method 'list versions', which is used to get all versions of a document, in SharePoint 2010, it results in a message saying 'list versions not supported'.

Is there any specific reason why the 'list versions' method of FrontPage RPC is not supported in SharePoint 2010 while other methods like 'get document' still work?

Unknown said...

Recently, I got stuck with the download of large files(~10GB) from SharePoint Online. Through this post, I learned a great deal about the RPC calls to SharePoint, which has eventually helped me to solve my issue. So, thanks a lot for sharing :)

I have blogged about my issue, on the following link:
https://realmpksharepoint.wordpress.com/2016/08/15/download-large-files-from-sharepoint-online/
