For example, the following POST request uploads a batch of documents formatted in JSON to the domain
endpoint doc-movies-123456789012.us-east-1.cloudsearch.amazonaws.com.
curl -X POST --upload-file data1.json \
  doc-movies-123456789012.us-east-1.cloudsearch.amazonaws.com/2013-01-01/documents/batch \
  --header "Content-Type: application/json"
Bulk Uploads in Amazon CloudSearch
Document batches are limited to 5 MB per batch. However, you can upload batches in parallel to reduce
the amount of time it takes to upload all of your data.
To perform a bulk upload:
• Make sure your batches are as close to the 5 MB limit as possible. Uploading a larger number of smaller
batches slows down the upload and indexing process.
• Set your desired instance type to a larger instance type than the default search.m1.small. The
number of upload threads you can use depends on the type of search instance your domain is using
and the nature of your data and indexing options. Larger instance types have a higher upload capacity.
Attempting to upload batches in parallel to a search.m1.small instance usually results in a high rate
of 504 or 507 errors. For more information about setting the desired instance type, see Configuring
Scaling Options (p.39).
• Start uploading data once your configuration changes are active. If you encounter a high rate of 5xx
errors, you either need to reduce your upload rate or switch to a larger instance type. If you are already
using the largest instance type, you can increase the desired partition count to further increase upload
capacity.
Important
If you submit a large volume of updates while your domain is in the PROCESSING state, it
can increase the amount of time it takes for the updates to be applied to your search index.
To avoid this update lag, wait until your domain is in the ACTIVE state before starting your
bulk upload.
• When you are finished with your bulk upload, you can change the desired instance type back to a
smaller instance type. If your index fits on a smaller type, Amazon CloudSearch will automatically scale
your domain back down. Amazon CloudSearch will not scale to an instance type that's smaller than
the desired instance type configured for your domain.
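The advice to keep batches close to the 5 MB limit can be sketched as a simple greedy packer. This is an illustration only, not part of any AWS tool: the name pack_batches and its max_bytes parameter are hypothetical, the sketch assumes that 5 MB means 5 * 1024 * 1024 bytes, and it assumes each operation is a dictionary in the JSON batch format (type, id, fields).

```python
import json

# Assumption: "5 MB" is interpreted as 5 * 1024 * 1024 bytes.
MAX_BATCH_BYTES = 5 * 1024 * 1024


def pack_batches(operations, max_bytes=MAX_BATCH_BYTES):
    """Greedily group operations into JSON arrays no larger than max_bytes."""
    batches = []
    current = []      # serialized operations in the batch being filled
    current_size = 2  # bytes for the enclosing "[" and "]"
    for op in operations:
        encoded = json.dumps(op).encode("utf-8")
        if len(encoded) + 2 > max_bytes:
            raise ValueError("a single operation exceeds the batch limit")
        # A comma separates this operation from the previous one, if any.
        extra = len(encoded) + (1 if current else 0)
        if current and current_size + extra > max_bytes:
            # The batch is full: emit it and start a new one.
            batches.append(b"[" + b",".join(current) + b"]")
            current, current_size = [], 2
            extra = len(encoded)
        current.append(encoded)
        current_size += extra
    if current:
        batches.append(b"[" + b",".join(current) + b"]")
    return batches
```

Each returned item is a complete batch body that could be written to a file or posted directly. Packing by serialized size, rather than by document count, is what keeps batches near the limit when document sizes vary.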
For datasets of less than 1 GB of data or fewer than one million 1 KB documents, a small search instance
should be sufficient. To upload datasets between 1 GB and 8 GB, we recommend setting the desired
instance type to search.m3.large before you begin uploading. For datasets between 8 GB and 16 GB,
start with a search.m3.xlarge. For datasets between 16 GB and 32 GB, start with a
search.m3.2xlarge. If you have more than 32 GB to upload, select the search.m3.2xlarge instance
type and increase the desired partition count to accommodate your dataset. Each partition can contain
up to 32 GB of data. Submit a Service Limit Increase Request if you need more upload capacity or have
more than 500 GB to index.
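The sizing guidance above can be written as a small lookup. This is a sketch under stated assumptions: the function name recommend_instance_type is hypothetical, the handling of exact boundary values (for example, exactly 8 GB) is a guess because the text does not specify it, and the partition count is only a lower-bound estimate derived from the 32 GB per partition figure.

```python
GB = 1024 ** 3  # assumption: binary gigabytes


def recommend_instance_type(dataset_bytes):
    """Return (desired instance type, desired partition count) for a dataset size."""
    if dataset_bytes < 1 * GB:
        return ("search.m1.small", 1)
    if dataset_bytes <= 8 * GB:
        return ("search.m3.large", 1)
    if dataset_bytes <= 16 * GB:
        return ("search.m3.xlarge", 1)
    if dataset_bytes <= 32 * GB:
        return ("search.m3.2xlarge", 1)
    # Above 32 GB: use the largest type and estimate one partition per 32 GB,
    # rounded up (ceiling division).
    partitions = -(-dataset_bytes // (32 * GB))
    return ("search.m3.2xlarge", partitions)
```

For a 100 GB dataset this suggests search.m3.2xlarge with at least four partitions. Treat the result as a starting point and adjust it based on the 5xx error rate you observe during the upload.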
Uploading Data Using the Amazon CloudSearch Console
In the Amazon CloudSearch console, you can upload data from your local file system or Amazon S3 to
your domain from the domain dashboard. The console can automatically convert the following types of
files to document batches during the upload process:
• Comma Separated Value (.csv)
API Version 2013-01-01
Amazon CloudSearch Developer Guide
• Adobe Portable Document Format (.pdf)
• HTML (.htm, .html)
• Microsoft Excel (.xls, .xlsx)
• Microsoft PowerPoint (.ppt, .pptx)
• Microsoft Word (.doc, .docx)
• Text Documents (.txt)
You can also convert and upload items from a DynamoDB table. For more information, see Uploading
DynamoDB Data (p.110).
Note
To upload data from Amazon S3 or DynamoDB, you must have permission to access both the
service and the resources you want to upload. For more information, see Using Bucket Policies
and User Policies and Using IAM to Control Access to DynamoDB Resources.
CSV files are parsed row-by-row and a separate document is generated for each row. All other types of
files are treated as a single document. For more information about automatically generating document
batches, see Preparing Your Data (p.58).
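To make the row-by-row behavior concrete, the following sketch turns each CSV row into one add operation in a JSON document batch. It is not the console's actual converter, whose field naming and ID generation are not described here: the function name csv_to_batch and the id_column parameter are hypothetical, and all values are kept as strings.

```python
import csv
import io
import json


def csv_to_batch(csv_text, id_column):
    """Build a JSON batch with one "add" operation per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    operations = []
    for row in reader:
        # Every column except the ID column becomes a document field.
        fields = {name: value for name, value in row.items() if name != id_column}
        operations.append({"type": "add", "id": row[id_column], "fields": fields})
    return json.dumps(operations)
```

A two-row CSV file therefore yields a batch with two documents, whereas a PDF or Word file would yield a single document.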
Note
Uploading data to Amazon CloudSearch from an Amazon S3 bucket or DynamoDB table requires
access to those services and resources.
To send data to a domain for indexing
1. Sign in to the AWS Management Console and open the Amazon CloudSearch console at
https://console.aws.amazon.com/cloudsearch/home.
2. In the Navigation pane, click the name of the domain.
3. At the top of the domain dashboard, click Upload Documents.
4. Select the location of the data you want to upload to your domain:
• File(s) on my local disk
• Object(s) from Amazon S3
• Item(s) from DynamoDB
• Predefined data
If you upload data that isn't formatted as document batches, it will automatically be converted during
the upload process.
Note
If a batch is invalid, Amazon CloudSearch converts the content to a valid batch that contains
a single content field and generic metadata fields. Since these are not normally the fields
configured for the domain, you will get errors stating that the fields don't exist.
5. If you are uploading local files, click Browse to choose the file(s) to upload.
6. If you are uploading objects from Amazon S3, select the bucket you want to upload from. To upload
the entire contents of the bucket, leave the Prefix field empty and click Add. To upload selected
objects, enter a filter in the Prefix field and click Add. (You can add multiple prefixes.)
7. If you are uploading items from DynamoDB, select the table you want to upload from. To start reading
from a particular item, specify a start key. To limit the read capacity units that can be consumed while
reading from the table, enter the maximum percentage of read capacity units.
8. If you are uploading predefined sample data, choose the data set that you want to use.
9. Once you've selected the data you want to upload, click Continue.
10. In the Review Documents step, review the documents to be uploaded and click Upload Documents
to continue.
11. In the Document Summary step, if a document batch has been automatically generated from your
data, you can click Download the generated document batch to get it. Click Finish to return to
the domain dashboard.
Uploading Data Using the AWS CLI
Use the aws cloudsearchdomain upload-documents command to send document batches to your
search domain. For information about installing and setting up the AWS CLI, see the AWS Command
Line Interface User Guide.
Alternatively, you can use the standalone Amazon CloudSearch command line tools to generate document
batches and upload them to your domain in a single step with the cs-import-documents command.
The cs-import-documents command enables you to process and upload local data as well as data
stored in Amazon S3 and DynamoDB. For more information, see Processing Source Data Using the
CLTs (p.63).
To send document batches to a domain for indexing
• Run the aws cloudsearchdomain upload-documents command to upload your batches to your domain.
Specify the domain's document service endpoint with --endpoint-url, the batch format with
--content-type, and the batch file with --documents.
aws cloudsearchdomain --endpoint-url http://doc-movies-y6gelr4lv3jeu4rvoelunxsl2e.us-east-1.cloudsearch.amazonaws.com \
  upload-documents --content-type application/json --documents movie-data-2013.json
{
"status": "success",
"adds": 5000,
"deletes": 0
}
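The same upload can be made from Python. The sketch below assumes a client object that exposes the upload_documents operation of the AWS SDK for Python (boto3) cloudsearchdomain client. The helper name upload_batch is hypothetical, and the client is passed in so that the function does not depend on how it was created.

```python
def upload_batch(client, path):
    """Upload one JSON document batch and return (adds, deletes)."""
    with open(path, "rb") as batch_file:
        response = client.upload_documents(
            documents=batch_file,
            contentType="application/json",
        )
    # The response mirrors the CLI output: status, adds, and deletes.
    if response.get("status") != "success":
        raise RuntimeError(f"upload failed: {response}")
    return response["adds"], response["deletes"]
```

With boto3 installed, a suitable client could be created with boto3.client("cloudsearchdomain", endpoint_url=...), where the endpoint URL is your domain's document service endpoint.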
Posting Documents to an Amazon CloudSearch Domain's Document Service Endpoint via HTTP
You use the documents/batch (p. 232) resource to post document batches to your domain to add,
update, or remove documents. For example:
curl -X POST --upload-file movie-data-2013.json \
  doc-movies-123456789012.us-east-1.cloudsearch.amazonaws.com/2013-01-01/documents/batch \
  --header "Content-Type: application/json"
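For applications that post batches without curl, the request can be assembled with the Python standard library. This is a minimal sketch: the function name build_batch_request is hypothetical, and the use of https is an assumption, since the curl example above does not specify a scheme. The function only builds the request, so it can be inspected before anything is sent.

```python
import urllib.request


def build_batch_request(endpoint, batch_bytes, content_type="application/json"):
    """Build (but do not send) a POST request for the documents/batch resource."""
    # Assumption: the document service endpoint is reached over HTTPS.
    url = f"https://{endpoint}/2013-01-01/documents/batch"
    return urllib.request.Request(
        url,
        data=batch_bytes,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

To send the batch, pass the returned object to urllib.request.urlopen and read the JSON response body.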
Indexing Document Data with Amazon CloudSearch
When you send document updates to your domain, Amazon CloudSearch automatically updates the
domain's search index with the new data. You don't have to do anything for the updates to be indexed.
However, if you change the configuration of your domain's index fields or text options, you must explicitly
rebuild your search index for those changes to be visible in search results. Because rebuilding the index
can take a significant amount of time if you have a lot of data, you should finish making all of your
configuration changes before re-indexing your documents.
Important
If you change the type of a field and have documents in your index that contain data that is
incompatible with the new field type, all fields being processed are put in the FailedToValidate
state when you run indexing and the indexing operation fails. Rolling back the incompatible
configuration change will enable you to successfully rebuild your index. If the change is necessary,
you must update or remove the incompatible documents from your index to use the new
configuration.
When you make changes that require re-indexing, the domain status changes to NEEDS INDEXING.
While the index is being rebuilt, the domain's status is PROCESSING. You can continue to submit search
requests while indexing is in progress, but the configuration changes won't be visible in search results
until indexing completes and the domain's status changes to ACTIVE. You can also continue to upload
document batches to your domain. However, if you submit a large volume of updates while your domain
is in the PROCESSING state, it can increase the amount of time it takes for the updates to be applied to
your search index. If this becomes an issue, slow your update rate until the domain returns to the ACTIVE
state.
Note
Depending on the volume of data, building a full index can take a considerable amount of compute
power. Amazon CloudSearch automatically manages the resources needed to build the index
in a timely fashion. Most data updates and simple domain configuration changes are built and
deployed in minutes. Indexing large volumes of data and applying configuration changes that
require rebuilding the full index will take longer to complete.
You can initiate indexing from the Amazon CloudSearch console (p.92), using the aws cloudsearch
index-documents command, or through the AWS SDKs.
Topics
• Indexing Documents Using the Amazon CloudSearch Console (p.92)
• Indexing Documents Using the Amazon CloudSearch AWS CLI (p.93)
• Indexing Documents with the AWS SDK (p.93)
Indexing Documents Using the Amazon CloudSearch Console
When you make changes that require your domain's index to be rebuilt, the status shown on the domain
dashboard changes to NEEDS INDEXING. The console also displays a message at the top of the
configuration pages prompting you to run indexing when you are done making changes.
To run indexing
1. Sign in to the AWS Management Console and open the Amazon CloudSearch console at
https://console.aws.amazon.com/cloudsearch/home.
2. In the Navigation pane, click the name of the domain that needs indexing.
3. On the domain dashboard, click the Run Indexing button.
4. Click OK in the Starting Indexing dialog box to return to the domain dashboard.
Indexing Documents Using the Amazon CloudSearch AWS CLI
You use the aws cloudsearch index-documents command to rebuild your domain's search index.
For information about installing and setting up the AWS CLI, see the AWS Command Line Interface User
Guide.
Note
If you are using the 2.0.0.1 version of the Amazon CloudSearch command line tools, you can
use the cs-index-documents command to rebuild your index. However, we recommend that
you migrate to the AWS CLI, which provides a cross-service CLI with a simplified installation,
unified configuration, and consistent command line syntax.
To explicitly index your domain
• Run the aws cloudsearch index-documents command. The following example rebuilds the
index for a domain called movies.
aws cloudsearch index-documents --domain-name movies
Indexing Documents with the AWS SDK
The AWS SDKs (except the Android and iOS SDKs) support all of the Amazon CloudSearch actions
defined in the Amazon CloudSearch Configuration API, including IndexDocuments (p. 187). For more
information about installing and using the AWS SDKs, see AWS Software Development Kits.
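As an illustration of the SDK route, the sketch below starts an index rebuild and then polls until the domain stops processing. It assumes a client that exposes the index_documents and describe_domains operations of the AWS SDK for Python (boto3) cloudsearch client. The helper names rebuild_index and wait_while_processing are hypothetical, and treating a false Processing flag as the end of the rebuild is an assumption about how the PROCESSING state is reported.

```python
import time


def rebuild_index(client, domain_name):
    """Start rebuilding the domain's index; return the field names being indexed."""
    response = client.index_documents(DomainName=domain_name)
    return response.get("FieldNames", [])


def wait_while_processing(client, domain_name, poll_seconds=30, sleep=time.sleep):
    """Poll the domain status until it no longer reports Processing."""
    while True:
        result = client.describe_domains(DomainNames=[domain_name])
        status = result["DomainStatusList"][0]
        if not status["Processing"]:
            return status
        sleep(poll_seconds)
```

With boto3 installed, the client could be created with boto3.client("cloudsearch"). Waiting for processing to finish before starting a bulk upload follows the earlier advice about avoiding update lag.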
Searching Your Data with Amazon CloudSearch
You specify the terms or values you want to search for with the q parameter. How you specify the search
criteria depends on which query parser you use. Amazon CloudSearch supports four query parsers:
• simple: search all text and text-array fields for the specified string. The simple query parser
enables you to search for phrases, individual terms, and prefixes. You can designate terms as required
or optional, or exclude matches that contain particular terms. To search particular fields, you can specify
the fields you want to search with the q.options parameter. The simple query parser is used by
default if the q.parser parameter is not specified.
• structured: search specific fields, construct compound queries using Boolean operators, and use
advanced features such as term boosting and proximity searching.
• lucene: specify search criteria using the Apache Lucene query parser syntax. If you currently use
the Lucene syntax, using the lucene query parser enables you to migrate your search services to an
Amazon CloudSearch domain without having to completely rewrite your search queries in the Amazon
CloudSearch structured search syntax.
• dismax: specify search criteria using the simplified subset of the Apache Lucene query parser syntax
defined by the DisMax query parser. If you are currently using the DisMax syntax, using the dismax
query parser enables you to migrate your search services to an Amazon CloudSearch domain without
having to completely rewrite your search queries in the Amazon CloudSearch structured search syntax.
You can use additional search parameters to control how search results are returned (p.129) and include
additional information (p.114) such as facets, highlights, and suggestions with your search results.
For information about all of the Amazon CloudSearch search parameters, see the Search API
Reference (p.240).
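The relationship between the q and q.parser parameters can be sketched as a small URL builder. The function name build_search_url and the example endpoint used in the note below are hypothetical, and the use of https is an assumption. The function omits q.parser when the simple parser is requested, because that parser is the default.

```python
from urllib.parse import urlencode

# The four query parsers described above.
QUERY_PARSERS = ("simple", "structured", "lucene", "dismax")


def build_search_url(search_endpoint, query, parser="simple"):
    """Build a search request URL for the given query string and parser."""
    if parser not in QUERY_PARSERS:
        raise ValueError(f"unknown query parser: {parser}")
    params = {"q": query}
    if parser != "simple":
        # Only non-default parsers need to be named explicitly.
        params["q.parser"] = parser
    return f"https://{search_endpoint}/2013-01-01/search?{urlencode(params)}"
```

For example, calling build_search_url with a hypothetical endpoint such as search-movies-example.us-east-1.cloudsearch.amazonaws.com and the query "star wars" produces a URL ending in /2013-01-01/search?q=star+wars.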
Topics
• Submitting Search Requests to an Amazon CloudSearch Domain (p.95)
• Constructing Compound Queries in Amazon CloudSearch (p.97)
• Searching for Text in Amazon CloudSearch (p.99)
• Searching for Numbers in Amazon CloudSearch (p.103)
• Searching for Dates and Times in Amazon CloudSearch (p.104)
• Searching for a Range of Values in Amazon CloudSearch (p.104)
• Searching and Ranking Results by Geographic Location in Amazon CloudSearch (p.105)