Gokul's Blog



XSD to table generator

http://stackoverflow.com/questions/2628327/how-to-build-a-database-from-an-xsd-schema-and-import-xml-data

http://stackoverflow.com/questions/138575/how-can-i-create-database-tables-from-xsd-files

http://alearningdeveloper.wordpress.com/2011/03/31/creating-database-from-xsd-file/

http://stackoverflow.com/questions/403420/convert-xsd-into-sql-relational-tables
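One possible approach in .NET, not necessarily the one taken in the threads above, is to let ADO.NET map the schema into a DataSet with DataSet.ReadXmlSchema and then emit one CREATE TABLE per DataTable. This is a minimal sketch that assumes SQL Server-style DDL. The XsdToSql class, the small type map and the bracket quoting are illustrative choices. Keys and relations that ReadXmlSchema creates are not turned into constraints; any generated link columns come out as ordinary columns.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Text;

public static class XsdToSql
{
    // Hypothetical, deliberately small CLR-to-SQL Server type map.
    private static readonly Dictionary<Type, string> SqlTypes = new Dictionary<Type, string>
    {
        { typeof(string), "NVARCHAR(MAX)" },
        { typeof(int), "INT" },
        { typeof(long), "BIGINT" },
        { typeof(decimal), "DECIMAL(18, 4)" },
        { typeof(double), "FLOAT" },
        { typeof(bool), "BIT" },
        { typeof(DateTime), "DATETIME2" },
    };

    // Reads an XSD into a DataSet and emits one CREATE TABLE per DataTable.
    public static string GenerateDdl(TextReader xsd)
    {
        var ds = new DataSet();
        ds.ReadXmlSchema(xsd);

        var sql = new StringBuilder();
        foreach (DataTable table in ds.Tables)
        {
            var columns = new List<string>();
            foreach (DataColumn column in table.Columns)
            {
                // Fall back to NVARCHAR(MAX) for types the map does not cover.
                string sqlType;
                if (!SqlTypes.TryGetValue(column.DataType, out sqlType))
                    sqlType = "NVARCHAR(MAX)";
                columns.Add("    [" + column.ColumnName + "] " + sqlType);
            }
            sql.Append("CREATE TABLE [").Append(table.TableName).AppendLine("] (");
            sql.AppendLine(string.Join("," + Environment.NewLine, columns));
            sql.AppendLine(");");
        }
        return sql.ToString();
    }
}
```

For a schema whose repeating `book` element has `title` (xs:string) and `price` (xs:decimal) children, this produces a `[book]` table with `NVARCHAR(MAX)` and `DECIMAL(18, 4)` columns. Treat the output as a starting point to review, not a finished database design.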




Generate XSD for an XML file

Original Link: http://www.gibmonks.com/c_sharp/csharpckbk2-CHP-15-SECT-13.html

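[sourcecode language="csharp"]
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaGenerator
{
    // Infers an XSD for every *.xml file in dir and writes it next to the
    // source file (books.xml -> books.xsd).
    public static void GenerateSchemaForDirectory(string dir)
    {
        // Make sure the directory exists.
        if (!Directory.Exists(dir))
        {
            return;
        }

        foreach (string file in Directory.GetFiles(dir, "*.xml"))
        {
            XmlSchemaSet schemaSet;

            // Set up a reader for the file and infer the schema from its content.
            using (XmlReader reader = XmlReader.Create(file))
            {
                XmlSchemaInference schemaInference = new XmlSchemaInference();
                schemaSet = schemaInference.InferSchema(reader);
            }

            // A document that uses several namespaces yields one schema per
            // namespace, so number the extra files instead of overwriting them.
            int index = 0;
            foreach (XmlSchema schema in schemaSet.Schemas())
            {
                string suffix = index == 0 ? "" : "_" + index;
                string schemaPath = Path.Combine(
                    Path.GetDirectoryName(file),
                    Path.GetFileNameWithoutExtension(file) + suffix + ".xsd");

                // FileMode.Create truncates an existing file; OpenOrCreate would
                // leave stale bytes behind when the new schema is shorter.
                using (FileStream fs = new FileStream(schemaPath, FileMode.Create))
                {
                    schema.Write(fs);
                }
                index++;
            }
        }
    }
}
[/sourcecode]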
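To see what the inference step produces without touching the file system, the same XmlSchemaInference.InferSchema call used in GenerateSchemaForDirectory can be pointed at an in-memory string. This is a small sketch. The InferXsd helper and its relaxedTypes switch are illustrative, not part of the original post. Setting TypeInference to Relaxed types all simple content as xs:string. That is often safer when a sample file happens to hold only digits in a field that is really text.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;

public static class SchemaPreview
{
    // Infers an XSD from an XML string and returns the schema text.
    public static string InferXsd(string xml, bool relaxedTypes)
    {
        var inference = new XmlSchemaInference();
        if (relaxedTypes)
        {
            // Relaxed: simple content becomes xs:string instead of the
            // narrowest type that fits the sample values.
            inference.TypeInference = XmlSchemaInference.InferenceOption.Relaxed;
        }

        XmlSchemaSet schemas;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            schemas = inference.InferSchema(reader);
        }

        // One XmlSchema per target namespace; concatenate them for display.
        var output = new StringBuilder();
        foreach (XmlSchema schema in schemas.Schemas())
        {
            using (var writer = new StringWriter())
            {
                schema.Write(writer);
                output.AppendLine(writer.ToString());
            }
        }
        return output.ToString();
    }
}
```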



Working with large XML files – part 2

http://blogs.msdn.com/b/xmlteam/archive/2007/03/24/streaming-with-linq-to-xml-part-2.aspx

For writing large XML files: http://www.codeproject.com/KB/XML/XStreamingElement_Merge.aspx?display=Mobile

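using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

// Lazily yields each element named elementName without loading the whole file.
private static IEnumerable<XElement> StreamElements(
  string fileName,
  string elementName)
{
  using (var rdr = XmlReader.Create(fileName))
  {
    rdr.MoveToContent();
    while (!rdr.EOF)
    {
      if (rdr.NodeType == XmlNodeType.Element && rdr.Name == elementName)
      {
        // ReadFrom consumes the whole element and leaves the reader on the
        // next node; calling Read() as well would skip an adjacent sibling.
        yield return (XElement)XNode.ReadFrom(rdr);
      }
      else
      {
        rdr.Read();
      }
    }
  }
}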
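The CodeProject link above pairs this kind of streaming reader with XStreamingElement for the writing side. Below is a rough sketch of that combination for a filter-and-copy job. XStreamingElement does not enumerate its content until it is saved. Combined with a streaming reader, this means only one input element has to be in memory at a time. The LargeXml class, the `results` root name, and this XmlReader-based variant of StreamElements are illustrative choices, so that the example can run on an in-memory string. The variant also avoids calling Read() after ReadFrom, so adjacent elements are not skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class LargeXml
{
    // Yields each element named elementName from the reader, one at a time.
    public static IEnumerable<XElement> StreamElements(XmlReader rdr, string elementName)
    {
        rdr.MoveToContent();
        while (!rdr.EOF)
        {
            if (rdr.NodeType == XmlNodeType.Element && rdr.Name == elementName)
                yield return (XElement)XNode.ReadFrom(rdr); // reader moves past it
            else
                rdr.Read();
        }
    }

    // Copies the elements that pass keep() under a new <results> root.
    // The Where() query only runs while Save() is writing the output.
    public static void FilterTo(XmlReader input, TextWriter output,
                                string elementName, Func<XElement, bool> keep)
    {
        var root = new XStreamingElement("results",
            StreamElements(input, elementName).Where(keep));
        root.Save(output);
    }
}
```

With file-based input you would pass `XmlReader.Create(path)` and a `StreamWriter`; the shape of the code stays the same.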

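XmlReader implementation: http://msdn.microsoft.com/en-us/library/ff647804.aspx

// reader is an XmlReader positioned at the start of the document.
XmlDocument doc = null;
while (!reader.EOF)
{
    if (reader.NodeType == XmlNodeType.Element
        && reader.Name == "patient"
        // GetAttribute returns null when the attribute is missing, so compare
        // with == instead of calling .Equals on the result.
        && reader.GetAttribute("number") == "25")
    {
        // ReadNode consumes the whole <patient> element and leaves the reader
        // on the next node, so there is no Read() on this path.
        doc = new XmlDocument();
        XmlNode node = doc.ReadNode(reader);
        doc.AppendChild(node);
    }
    else
    {
        reader.Read();
    }
}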
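XmlReader.ReadToFollowing can also do the scanning for the patient lookup, replacing the hand-written loop. The sketch below makes the same assumptions as the snippet above: a `patient` element with a `number` attribute. FindPatient is a hypothetical helper name. Unlike the loop above, it stops at the first match instead of continuing to the end of the file.

```csharp
using System.Xml;

public static class PatientLookup
{
    // Returns a document holding the first <patient> whose number attribute
    // equals number, or null if there is none. Only that subtree is loaded.
    public static XmlDocument FindPatient(XmlReader reader, string number)
    {
        // ReadToFollowing advances to the next start tag named "patient".
        while (reader.ReadToFollowing("patient"))
        {
            if (reader.GetAttribute("number") == number)
            {
                var doc = new XmlDocument();
                doc.AppendChild(doc.ReadNode(reader));
                return doc;
            }
        }
        return null;
    }
}
```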



UTF-8 Encoding

It’s always good to know how UTF-8 works:
http://research.swtch.com/2010/03/utf-8-bits-bytes-and-benefits.html

UTF-8 is a way to encode Unicode code points—integer values from 0 through 10FFFF—into a byte stream, and it is far simpler than many people realize. The easiest way to make it confusing or complicated is to treat it as a black box, never looking inside. So let’s start by looking inside. Here it is:

 

Unicode code points        UTF-8 encoding (binary)
00-7F (7 bits)             0tuvwxyz
0080-07FF (11 bits)        110pqrst 10uvwxyz
0800-FFFF (16 bits)        1110jklm 10npqrst 10uvwxyz
010000-10FFFF (21 bits)    11110efg 10hijklm 10npqrst 10uvwxyz
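The table translates almost line for line into code. Below is a sketch in C#, the language used elsewhere on this blog. EncodeCodePoint is a hypothetical helper and does no validation: surrogates (D800-DFFF) and values above 10FFFF are not rejected. In real code, Encoding.UTF8 does this work for you.

```csharp
public static class Utf8Table
{
    // Encodes one code point following the four rows of the table above.
    // No validation: surrogates and values above 0x10FFFF are not rejected.
    public static byte[] EncodeCodePoint(int cp)
    {
        if (cp <= 0x7F)            // 7 bits: 0tuvwxyz
            return new[] { (byte)cp };
        if (cp <= 0x7FF)           // 11 bits: 110pqrst 10uvwxyz
            return new[]
            {
                (byte)(0xC0 | (cp >> 6)),
                (byte)(0x80 | (cp & 0x3F)),
            };
        if (cp <= 0xFFFF)          // 16 bits: 1110jklm 10npqrst 10uvwxyz
            return new[]
            {
                (byte)(0xE0 | (cp >> 12)),
                (byte)(0x80 | ((cp >> 6) & 0x3F)),
                (byte)(0x80 | (cp & 0x3F)),
            };
        // 21 bits: 11110efg 10hijklm 10npqrst 10uvwxyz
        return new[]
        {
            (byte)(0xF0 | (cp >> 18)),
            (byte)(0x80 | ((cp >> 12) & 0x3F)),
            (byte)(0x80 | ((cp >> 6) & 0x3F)),
            (byte)(0x80 | (cp & 0x3F)),
        };
    }
}
```

To check it, compare the result with `Encoding.UTF8.GetBytes(char.ConvertFromUtf32(cp))`. For U+20AC (€), both give E2 82 AC.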

 

The convenient properties of UTF-8 are all consequences of the choice of encoding.

 

  1. All ASCII files are already UTF-8 files.
    The first 128 Unicode code points are the 7-bit ASCII character set, and UTF-8 preserves their one-byte encoding.
  2. ASCII bytes always represent themselves in UTF-8 files. They never appear as part of other UTF-8 sequences.
    All the non-ASCII UTF-8 sequences consist of bytes with the high bit set, so if you see the byte 0x7A in a UTF-8 file, you can be sure it represents the character z.
  3. ASCII bytes are always represented as themselves in UTF-8 files. They cannot be hidden inside multibyte UTF-8 sequences.
    The ASCII z 01111010 cannot be encoded as a two-byte UTF-8 sequence 11000001 10111010. Code points must be encoded using the shortest possible sequence. A corollary is that decoders must detect long-winded sequences as invalid. In practice, it is useful for a decoder to use the Unicode replacement character, code point FFFD, as the decoding of an invalid UTF-8 sequence rather than stop processing the text.
  4. UTF-8 is self-synchronizing.
    Let’s call a byte of the form 10xxxxxx a continuation byte. Every UTF-8 sequence is a byte that is not a continuation byte followed by zero or more continuation bytes. If you start processing a UTF-8 file at an arbitrary point, you might not be at the beginning of a UTF-8 encoding, but you can easily find one: skip over continuation bytes until you find a non-continuation byte. (The same applies to scanning backward.)
  5. Substring search is just byte string search.
    Properties 2, 3, and 4 imply that given a string of correctly encoded UTF-8, the only way those bytes can appear in a larger UTF-8 text is when they represent the same code points. So you can use any 8-bit-safe, byte-at-a-time search function, like strchr or strstr, to run the search.
  6. Most programs that handle 8-bit files safely can handle UTF-8 safely.
    This also follows from Properties 2, 3, and 4. I say “most” programs, because programs that take apart a byte sequence expecting one character per byte will not behave correctly, but very few programs do that. It is far more common to split input at newline characters, or split whitespace-separated fields, or do other similar parsing around specific ASCII characters. For example, Unix tools like cat, cmp, cp, diff, echo, head, tail, and tee can process UTF-8 files as if they were plain ASCII files. Most operating system kernels should also be able to handle UTF-8 file names without any special arrangement, since the only operations done on file names are comparisons and splitting at /. In contrast, tools like grep, sed, and wc, which inspect arbitrary individual characters, do need modification.
  7. UTF-8 sequences sort in code point order.
    You can verify this by inspecting the encodings in the table above. This means that Unix tools like join, ls, and sort (without options) don’t need to handle UTF-8 specially.
  8. UTF-8 has no “byte order.”
    UTF-8 is a byte encoding. It is not little endian or big endian. Unicode defines a byte order mark (BOM) code point, FEFF, which is used to determine the byte order of a stream of raw 16-bit values, like UCS-2 or UTF-16: a reader that sees FFFE instead knows the bytes are swapped. It has no place in a UTF-8 file. Some programs like to write a UTF-8-encoded BOM at the beginning of UTF-8 files, but this is unnecessary (and annoying to programs that don’t expect it).
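Properties 4 and 5 are easy to see in code. In the sketch below, the helper names are mine and do not come from the article. SequenceStart backs up over continuation bytes to the first byte of the character that contains an arbitrary offset. IndexOf is a plain byte-by-byte search with no knowledge of UTF-8. When both inputs are valid UTF-8, it can still only match at character boundaries.

```csharp
public static class Utf8Props
{
    // A continuation byte has the form 10xxxxxx.
    public static bool IsContinuation(byte b) => (b & 0xC0) == 0x80;

    // Property 4: from any offset, back up over continuation bytes to reach
    // the first byte of the UTF-8 sequence that contains that offset.
    public static int SequenceStart(byte[] text, int offset)
    {
        while (offset > 0 && IsContinuation(text[offset]))
            offset--;
        return offset;
    }

    // Property 5: naive byte-string search, exactly what you would write for
    // ASCII; returns the byte offset of the first match, or -1.
    public static int IndexOf(byte[] haystack, byte[] needle)
    {
        for (int i = 0; i + needle.Length <= haystack.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j])
                j++;
            if (j == needle.Length)
                return i;
        }
        return -1;
    }
}
```

For "aé€z", the bytes are 61 C3 A9 E2 82 AC 7A. Starting at offset 5 (AC), SequenceStart backs up to 3, where € begins. Related to property 8: in .NET, a StreamWriter given Encoding.UTF8 writes the EF BB BF byte order mark at the start of a stream, whereas `new UTF8Encoding(false)` does not.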

 

UTF-8 does give up the ability to do random access using code point indices. Programs that need to jump to the nth Unicode code point in a file or on a line—text editors are the canonical example—will typically convert incoming UTF-8 to an internal representation like an array of code points and then convert back to UTF-8 for output, but most programs are simpler when written to manipulate UTF-8 directly.
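In practice, giving up random access means that finding the nth code point requires walking the bytes from the start and counting only the bytes that begin a sequence. Below is a sketch. ByteOffsetOfCodePoint is a hypothetical helper, and it assumes the input is valid UTF-8.

```csharp
public static class Utf8Index
{
    // Returns the byte offset where code point number n (0-based) starts,
    // or -1 if the text has fewer than n + 1 code points. O(length), not O(1).
    public static int ByteOffsetOfCodePoint(byte[] text, int n)
    {
        int seen = 0;
        for (int i = 0; i < text.Length; i++)
        {
            // Every byte that is not 10xxxxxx starts a new code point.
            if ((text[i] & 0xC0) != 0x80)
            {
                if (seen == n)
                    return i;
                seen++;
            }
        }
        return -1;
    }
}
```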

 

Programs that make UTF-8 more complicated than it needs to be are typically trying to be too general, not wanting to make assumptions that might not be true of other encodings. But there are good tools to convert other encodings to UTF-8, and it is slowly becoming the standard encoding: even the fraction of web pages written in UTF-8 is nearing 50%. UTF-8 was explicitly designed to have these nice properties. Take advantage of them.

 

For more on UTF-8, see “Hello World or Καλημέρα κόσμε or こんにちは 世界,” by Rob Pike and Ken Thompson, and also this history.


Notes: Property 6 assumes the tools do not strip the high bit from each byte. Such mangling was common years ago but is very uncommon now. Property 7 assumes the comparison is done treating the bytes as unsigned, but such behavior is mandated by the ANSI C standard for memcmp, strcmp, and strncmp.
Posted by rsc on Friday, March 05, 2010

Working with XmlTextReader

MSDN has some good articles on how to work with XmlTextReader. The code below iterates through the books in an XML list of books.

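// Assumes this runs inside a WinForms form with a TextBox named txtStatus.
using (XmlTextReader reader = new XmlTextReader(@"C:\books.xml"))
{
    int i = 1; // running count of processed <ns0:book> elements
    reader.MoveToContent();
    while (!reader.EOF)
    {
        // Only start tags: ReadNode throws on an EndElement. Name includes the
        // prefix, so this only matches files that use the "ns0" prefix.
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "ns0:book")
        {
            XmlDocument doc = new XmlDocument();
            // ReadNode consumes the whole element and leaves the reader on the
            // next node, so don't call Read() again or a sibling may be skipped.
            XmlNode node = doc.ReadNode(reader);
            txtStatus.AppendText(node.OuterXml + Environment.NewLine
                                 + "---Processed Node No:" + i++ + "---"
                                 + Environment.NewLine);
            Application.DoEvents(); // keep the UI responsive during a long read
        }
        else
        {
            reader.Read();
        }
    }
}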
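The `ns0:book` test in the loop above depends on the prefix that the file happens to use. If the prefix can vary, matching on LocalName and NamespaceURI is sturdier. Below is a UI-free sketch of the same loop. ReadBooks and the `urn:example:books` namespace used in the check are placeholders, not values from the original file.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class BookReader
{
    // Returns the outer XML of every <book> element in namespace ns,
    // whatever prefix the document binds to that namespace.
    public static List<string> ReadBooks(XmlReader reader, string ns)
    {
        var books = new List<string>();
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element
                && reader.LocalName == "book"
                && reader.NamespaceURI == ns)
            {
                // ReadOuterXml returns the element's markup and moves the reader
                // past its end tag, so no extra Read() is needed here.
                books.Add(reader.ReadOuterXml());
            }
            else
            {
                reader.Read();
            }
        }
        return books;
    }
}
```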




Get XPath Value from XmlTextReader

The code below gets an XPath value from an XmlTextReader object.
Thanks to Naresh (my colleague) for this quick snippet.
using System.Xml;
using System.Xml.XPath;

        void TestOne(XmlTextReader reader, string xpathQuery)
        {
            System.Console.WriteLine("TestOne");
            XPathDocument xdoc = new XPathDocument(reader);
            XPathNavigator nav = xdoc.CreateNavigator();
            // Example query:
            // "STUFF/TYPE1/CENSUS[@COUNTRY='USA' and @YEAR='1930']/PAGE"
            XPathNodeIterator nodeItor = nav.Select(xpathQuery);
            // Only traverse if the query matched at least one node.
            if (nodeItor.MoveNext())
            {
                TraverseSiblings(nodeItor);
            }
            System.Console.WriteLine();
        }

        // Prints the current match and each sibling that follows it.
        void TraverseSiblings(XPathNodeIterator nodeItor)
        {
            // Work on a clone so moving it does not disturb the iterator.
            XPathNavigator sibling = nodeItor.Current.Clone();
            do
            {
                PrintNode(sibling);
            } while (sibling.MoveToNext());
        }

        void PrintNode(XPathNavigator nav)
        {
            System.Console.WriteLine(nav.Name + ":" + nav.Value +
                " Type : " + nav.NodeType.ToString());
        }
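To see what the sibling traversal prints, here is a self-contained variant. It works on a string and returns the lines instead of writing them to the console. SiblingLines is a hypothetical name. It follows the same idea as TraverseSiblings: it walks the first match and the siblings that follow it, not every node the query selects.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class XPathSiblings
{
    // Selects xpathQuery, then lists "Name:Value" for the first match and
    // each of its following siblings.
    public static List<string> SiblingLines(string xml, string xpathQuery)
    {
        var lines = new List<string>();
        var doc = new XPathDocument(new StringReader(xml));
        XPathNodeIterator it = doc.CreateNavigator().Select(xpathQuery);
        if (!it.MoveNext())
            return lines;

        // Clone so moving the navigator does not disturb the iterator.
        XPathNavigator nav = it.Current.Clone();
        do
        {
            lines.Add(nav.Name + ":" + nav.Value);
        } while (nav.MoveToNext());
        return lines;
    }
}
```

For `<root><a>1</a><b>2</b><c>3</c></root>` and the query `/root/b`, this returns `b:2` and `c:3`; element `a` is skipped because it comes before the match.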
 
 
Here is one more code sample that I implemented. It finds the first non-empty value selected by sXpathQuery and leaves it in rv.
            // Example query: sXpathQuery = @"//EmployeeName/attribute::SourceID";
            XPathDocument doc = new XPathDocument(reader);
            XPathNavigator nav = doc.CreateNavigator();
            string rv = string.Empty;

            // string(...) turns the node-set into the string value of its first node.
            // Pass it as an object: Debug.WriteLine(string, string) is the
            // (message, category) overload, not a format call.
            System.Diagnostics.Debug.WriteLine("Source Id:  {0}", nav.Evaluate("string(" + sXpathQuery + ")"));

            XPathNodeIterator nodeIter = nav.Select(sXpathQuery);
            while (nodeIter.MoveNext())
            {
                rv = nodeIter.Current.InnerXml;

                System.Diagnostics.Debug.WriteLine("------------item start------------------");
                System.Diagnostics.Debug.WriteLine(rv);
                System.Diagnostics.Debug.WriteLine("-------------item end-------------------");

                // Stop at the first match that has content.
                if (!string.IsNullOrEmpty(rv))
                    break;
            }
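The loop above stops at the first match with content. The same filter can instead be pushed into the XPath expression and passed to SelectSingleNode. This sketch assumes the query selects attributes, as in the SourceID example. For elements with child markup, InnerXml (used above) and Value (used here) can differ. FirstNonEmpty is a hypothetical helper.

```csharp
using System.IO;
using System.Xml.XPath;

public static class XPathValues
{
    // Returns the value of the first node selected by xpathQuery whose string
    // value is non-empty, or string.Empty if there is none.
    public static string FirstNonEmpty(TextReader xml, string xpathQuery)
    {
        XPathNavigator nav = new XPathDocument(xml).CreateNavigator();
        // (query)[string(.) != ''] keeps only non-empty matches; SelectSingleNode
        // returns the first of them in document order, or null.
        XPathNavigator match = nav.SelectSingleNode("(" + xpathQuery + ")[string(.) != '']");
        return match == null ? string.Empty : match.Value;
    }
}
```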
Other interesting links
