
C# Word - Extract Text from Word in C#

Visual C# Sample Code for Extracting Text from a Word Document in .NET

If this is your first time using our DocImageSDK, we strongly suggest reading How to Start first!

RasterEdge .NET Image SDK includes a dedicated Word document processing and editing library component that enables developers to load or create a Word document, add pages, annotations, or barcodes to it, and even duplicate, split, and merge docx documents.
Related .NET document controls:
asp.net pdf editor control: EdgePDF: ASP.NET PDF Editor Web Control: Online view, annotate, redact, edit, process, convert PDF documents
asp.net pdf viewer control: ASP.NET PDF Viewer Control: view, navigate, zoom Adobe PDF document in C# ASP.NET
asp.net edit pdf image: ASP.NET PDF Image Edit Control: online insert, edit PDF images in C#
This page focuses on the text extraction function of this Word processor. It lets you extract text from an MS Word document and produce a text data stream that can be processed later for storing, publishing, archiving, or searching. You can either output the text as a data stream or save it as a text file. All of this can be done with simple Visual C# programming.
  • Compatible with Microsoft Visual Studio 2005 / 2008 / 2010 versions
  • High-speed text extraction from a Microsoft Office Word document with C#
  • Simple conversion of an MS Word document into a text file with Visual C#
  • Document formatting such as spacing, paragraphs, and file appearance is preserved
  • Word content is retained as text in Visual C# for more efficient data storage
This page is a detailed guide on how to extract text from an MS Word document. Here is the page layout:
  • How to create a C# project in VS
  • How to extract text from a certain page in C# Word
  • How to extract text from a certain area in C# Word
  • More MS Word C# processing functions
How to Create a C# Project for Word Text Extraction
If you want to extract plain text from an MS Office Word document using the Visual C# programming language, you should first create a C# Windows application in Microsoft Visual Studio. Here is how:
  1. Create a Visual C# project in Visual Studio 2005. VS 2008 and 2010 are also supported;
  2. Download the RasterEdge .NET Image SDK evaluation package and unzip it;
  3. Add RasterEdge.Imaging.Basic.dll and RasterEdge.Imaging.MSWordDocx.dll from the unzipped folder to your project references;
  4. Activate the .NET Imaging SDK license and save "RasterEdgeLicense.txt" to your project folder, together with the two DLLs above;
  5. Import the RasterEdge .NET image processing library and Word add-on namespaces as demonstrated below.
using RasterEdge.Imaging.Basic;
using RasterEdge.Imaging.Basic.Core;
using RasterEdge.Imaging.Basic.Codec;
using RasterEdge.Imaging.MSWordDocx;
Extract Text from a Page in Word - C# Method
Shown below is the Visual C# method that you can use to extract text from a certain page or range of pages in an MS Word document. If you are worried about document quality after conversion, you can rest assured: this Word text extractor preserves both the plain text and the formatting data, so the appearance and legibility of the document are not degraded during extraction.
public void WordProcessorExtractTextPage(string WordInputFile, int WordPageNumberStart, int WordPageNumberStop, string WordOutputFile);
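Before calling the method, it can help to check the arguments in plain .NET so that a missing file or a reversed page range is caught early. The following is a minimal sketch, not part of the SDK: the class ExtractionArguments and the method TryValidate are hypothetical names, and the rule that page numbers are non-negative is an assumption, since this page does not state how pages are numbered.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the RasterEdge SDK.
public static class ExtractionArguments
{
    // Returns true when the arguments look usable; otherwise returns false
    // and describes the first problem found in 'error'.
    public static bool TryValidate(string inputFile, int pageStart, int pageStop,
                                   string outputFile, out string error)
    {
        error = "";
        if (string.IsNullOrEmpty(inputFile) || !File.Exists(inputFile))
        {
            error = "Input file not found: " + inputFile;
            return false;
        }
        if (!string.Equals(Path.GetExtension(inputFile), ".docx",
                           StringComparison.OrdinalIgnoreCase))
        {
            error = "Input file is not a .docx document.";
            return false;
        }
        // Assumption: page numbers are non-negative and start <= stop.
        if (pageStart < 0 || pageStop < pageStart)
        {
            error = "Invalid page range: " + pageStart + " to " + pageStop;
            return false;
        }
        if (string.IsNullOrEmpty(outputFile))
        {
            error = "Output file path is empty.";
            return false;
        }
        return true;
    }
}
```

A caller would run TryValidate first and invoke the extraction method only when it returns true, showing the error text to the user otherwise.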
Extract Text from a Page in Word - C# Sample Codes
If you have followed the guide above to create a C# Windows project, you can now copy the Visual C# sample code below into your project to extract text from an MS Word document on your local disk.
using System;
using System.Windows.Forms;
using RasterEdge.Imaging.Basic;
using RasterEdge.Imaging.Basic.Core;
using RasterEdge.Imaging.Basic.Codec;
using RasterEdge.Imaging.MSWordDocx;

namespace RE__Test
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        public static string FolderName = "c:/";

        private void button1_Click(object sender, EventArgs e)
        {
            RasterEdgeImaging word = new RasterEdgeImaging();

            // Arguments in the order of the method signature shown above.
            string wordInputFile = @"C:/1.docx";        // source Word document
            int wordPageNumberStart = 0;                // start page number
            int wordPageNumberStop = 4;                 // stop page number
            string wordOutputFile = @"C:/extract.txt";  // text file to write

            // Method name as declared in the signature above.
            word.WordProcessorExtractTextPage(wordInputFile, wordPageNumberStart, wordPageNumberStop, wordOutputFile);
        }
    }
}
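Once the text has been saved, the output file can be read back as a stream for later searching or archiving. The sketch below uses only standard .NET classes and is not part of the SDK: ExtractedTextSearch and FindLines are hypothetical names, and it assumes the extracted file is in an encoding that StreamReader detects by default (UTF-8 or a byte-order-marked Unicode encoding).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the RasterEdge SDK.
public static class ExtractedTextSearch
{
    // Streams the text file line by line and returns the 1-based numbers
    // of the lines that contain the keyword (case-insensitive).
    public static List<int> FindLines(string textFile, string keyword)
    {
        var hits = new List<int>();
        using (var reader = new StreamReader(textFile))
        {
            string line;
            int lineNumber = 0;
            while ((line = reader.ReadLine()) != null)
            {
                lineNumber++;
                if (line.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    hits.Add(lineNumber);
                }
            }
        }
        return hits;
    }
}
```

For example, calling FindLines with the output path used in the sample and a search term returns the line numbers where that term occurs, without loading the whole file into memory.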
Extract Text from Area in Word - C# Method
RasterEdge.Imaging.MSWordDocx.dll also lets developers build text extraction capability into a project such as a Windows application or a C# Class Library in MS Visual Studio. Here is another C# method, which allows you to extract text from a user-defined area in your MS Word document.
public void WordProcessorExtractTextArea(string WordInputFile, Rectangle ExtractedArea, string WordOutputFiles);
Extract Text from Area in Word - C# Sample Codes
Shown below is the complete Visual C# sample code for extracting text from a certain area in an MS Word document. You only need to load your local docx file into this program and define the area to be extracted by specifying its location with coordinates. You can then use the method above to start the extraction and save the output file to your system.
using System;
using System.Drawing;
using System.Windows.Forms;
using RasterEdge.Imaging.Basic;
using RasterEdge.Imaging.Basic.Core;
using RasterEdge.Imaging.Basic.Codec;
using RasterEdge.Imaging.MSWordDocx;

namespace RE__Test
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        public static string FolderName = "c:/";

        private void button1_Click(object sender, EventArgs e)
        {
            RasterEdgeImaging word = new RasterEdgeImaging();

            // Arguments in the order of the method signature shown above.
            string wordInputFile = @"C:/1.docx";                    // source Word document
            Rectangle extractedArea = new Rectangle(0, 0, 300, 300); // x, y, width, height
            string wordOutputFile = @"C:/extract.txt";              // text file to write

            // Method name as declared in the signature above.
            word.WordProcessorExtractTextArea(wordInputFile, extractedArea, wordOutputFile);
        }
    }
}
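The area is a System.Drawing.Rectangle, whose constructor takes the top-left corner followed by a width and a height. If the region is known by its two corners instead, it can be converted as in the sketch below. ExtractionAreas and FromCorners are hypothetical names, not SDK members, and this page does not state which unit or origin the SDK uses for the coordinates, so treat the numbers as illustrative.

```csharp
using System;
using System.Drawing;

// Hypothetical helper, not part of the RasterEdge SDK.
public static class ExtractionAreas
{
    // Builds a Rectangle from its top-left and bottom-right corners.
    public static Rectangle FromCorners(int left, int top, int right, int bottom)
    {
        if (right < left || bottom < top)
        {
            throw new ArgumentException(
                "The bottom-right corner must not lie above or to the left of the top-left corner.");
        }
        // Rectangle.FromLTRB computes width = right - left and height = bottom - top.
        return Rectangle.FromLTRB(left, top, right, bottom);
    }
}
```

With this helper, FromCorners(0, 0, 300, 300) yields the same value as new Rectangle(0, 0, 300, 300) used in the sample, and FromCorners(50, 100, 350, 400) yields a 300 by 300 rectangle whose top-left corner is at (50, 100).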
More Tutorials!
Find more user guides for RasterEdge .NET Image SDK with Visual C# sample code!




RasterEdge.com is a professional provider of ASP.NET MVC Document Viewer, ASP.NET PDF Viewer, MVC PDF Viewer, and document, content, and imaging solutions, available for ASP.NET AJAX, Silverlight, Windows Forms, and WPF. We are dedicated to providing powerful, professional imaging controls, PDF document and image-to-PDF tools, and components for capturing, viewing, processing, converting, compressing, and storing images, documents, and more.

©2000-2017 Raster Edge.com