percentage of their page copy composed of a few keyword phrases. Thus, there is
no magical page copy length that is best for all search engines.
The uniqueness of page content is far more important than the length. Page copy
has three purposes above all others:
• To be unique enough to get indexed and ranked in the search results
• To create content that people find interesting enough to want to link to
• To convert site visitors into subscribers, buyers, or people who click on ads
Not every page is going to make sales or be compelling enough to link to, but if, in
aggregate, many of your pages are of high quality over time, it will help boost the
rankings of nearly every page on your site.
Keyword Density, Term Frequency & Term Weight
Term Frequency (TF) is a weighted measure of how often a term appears in a
document. Terms that occur frequently within a document are thought to be some
of the more important terms of that document.
If a word appears in every (or almost every) document, then it tells you little about
how to discern value between documents. Words that appear frequently will have
little to no discrimination value, which is why many search engines ignore common
stop words (like the, and, and or).
Rare terms, which appear in only a limited number of documents, have a
much higher signal-to-noise ratio. They are much more likely to tell you what a
document is about.
Inverse Document Frequency (IDF) can be used to further discriminate the value
of term frequency to account for how common terms are across a corpus of
documents. Terms that are in a limited number of documents will likely tell you
more about those documents than terms that are scattered throughout many
documents.
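To make the relationship between the two measures concrete, here is a minimal sketch, assuming a toy three-document corpus and the common log-based weighting; real engines use far more elaborate (and undisclosed) formulas.

```python
# A toy TF-IDF calculation. The three-document corpus, whitespace tokenizer,
# and log-based IDF are illustrative assumptions, not any engine's real formula.
import math
from collections import Counter

corpus = [
    "search engines rank documents by relevance",
    "rare terms help discriminate between documents",
    "stop words appear in nearly all documents",
]
docs = [text.lower().split() for text in corpus]
n_docs = len(docs)

# Document frequency: in how many documents each term appears.
df = Counter(term for doc in docs for term in set(doc))

def tf_idf(term, doc):
    tf = doc.count(term) / len(doc)    # how often the term occurs in this document
    idf = math.log(n_docs / df[term])  # rarer terms across the corpus weigh more
    return tf * idf

# "documents" appears in every document, so its weight collapses to zero,
# while the rarer "discriminate" keeps a meaningful weight.
print(tf_idf("documents", docs[1]))     # 0.0
print(tf_idf("discriminate", docs[1]))  # ~0.18
```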
When people measure keyword density, they are generally missing some other
important factors in information retrieval such as IDF, index normalization, word
proximity, and how search engines account for the various element types. (Is the
term bolded, in a header, or in a link?)
Search engines may also use technologies like latent semantic indexing to
mathematically model the concepts of related pages. Google is scanning millions
of books from university libraries. As much as that process is about helping people
find information, it is also used to help Google understand linguistic patterns.
If you artificially write a page stuffed with one keyword or keyword phrase without
adding many of the phrases that occur in similar natural documents, you may not
show up for many of the related searches, and some algorithms may see your
document as being less relevant. The key is to write naturally, using various related
terms, and to structure the page well.
Multiple Reverse Indexes
Search engines may use multiple reverse indexes for different content. Most
current search algorithms tend to give more weight to page title and link text than
page copy.
For common broad queries, search engines may be able to find enough quality
matching documents using link text and page title without needing to spend the
additional time searching through the larger index of page content. Anything that
saves computer cycles without sacrificing much relevancy is something you can
count on search engines doing.
After the most relevant documents are collected, they may be re-sorted based on
interconnectivity or other factors.
Around 50% of search queries are unique, and with longer unique queries, there is
greater need for search engines to also use page copy to find enough relevant
matching documents (since there may be inadequate anchor text to display enough
matching documents).
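As a rough illustration of the idea, the sketch below assumes two hypothetical indexes (one built from page titles and anchor text, one from full page copy) and only falls back to the larger index when the first pass returns too few matches; the data and threshold are invented for the example.

```python
# A rough sketch of tiered lookup across two hypothetical reverse indexes:
# a small title/anchor-text index consulted first, and a much larger
# page-copy index used only when the first pass finds too few matches.
title_anchor_index = {
    "seo": {1, 2, 7},
    "marketing": {2, 3, 9},
}
page_copy_index = {
    "seo": {1, 2, 3, 4, 7},
    "marketing": {2, 3, 5, 9},
    "stemmer": {4},
}

def lookup(term, min_results=3):
    matches = title_anchor_index.get(term, set())
    if len(matches) >= min_results:
        return matches                  # enough quality matches; skip the big index
    # Long-tail terms with little anchor-text coverage fall back to page copy.
    return matches | page_copy_index.get(term, set())

print(lookup("seo"))       # satisfied from the small index: {1, 2, 7}
print(lookup("stemmer"))   # no anchor-text coverage, so page copy supplies {4}
```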
Search Interface
The search algorithm and search interface are used to find the most relevant
document in the index based on the search query. First the search engine tries to
determine user intent by looking at the words the searcher typed in.
These terms can be stripped down to their root level (e.g., dropping ing and other
suffixes) and checked against a lexical database to see what concepts they represent.
Terms that are a near match will help you rank for other similarly related terms.
For example, using the word swims could help you rank well for swim or
swimming.
Search engines can try to match keyword vectors with each of the specific terms in
a query. If the search terms occur near each other frequently, the search engine
may understand the phrase as a single unit and return documents related to that
phrase.
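One simple way to picture this, under the assumption of a toy corpus and an arbitrary co-occurrence threshold, is to check how often one query term is immediately followed by the other:

```python
# Hypothetical sketch: decide whether two query terms that frequently occur
# next to each other should be searched as a single phrase. The toy corpus
# and the 0.5 threshold are invented for illustration.
from collections import Counter

tokens = ("new york hotels cheap new york flights visiting new york "
          "a new book about york england").split()

unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

def treat_as_phrase(w1, w2, threshold=0.5):
    # Fraction of w1 occurrences that are immediately followed by w2.
    if unigrams[w1] == 0:
        return False
    return bigrams[(w1, w2)] / unigrams[w1] >= threshold

print(treat_as_phrase("new", "york"))     # True: "new" is usually followed by "york" here
print(treat_as_phrase("york", "hotels"))  # False in this toy corpus
```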
WordNet is the most popular lexical database. At the end of this chapter there is a
link to a Porter Stemmer tool if you need help conceptualizing how stemming
works.
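The Porter algorithm itself is a set of ordered rewrite rules; as a toy illustration of the general suffix-stripping idea (not the real algorithm), a sketch might look like this:

```python
# A deliberately naive suffix-stripping sketch. The real Porter stemmer applies
# ordered rule sets with extra conditions; this only hints at the idea.
def naive_stem(word):
    for suffix in ("sses", "ies", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Collapse a trailing doubled consonant ("swimm" -> "swim"), roughly as
    # Porter does for some endings.
    if len(word) > 3 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word

for w in ("swims", "swimming", "swim"):
    print(w, "->", naive_stem(w))   # all three map to "swim"
```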
Searcher Feedback
Some search engines, such as Google and Yahoo!, have toolbars and systems like
Google Search History and My Yahoo!, which collect information about a user.
Search engines can also look at recent searches, or what the search process was for
similar users, to help determine what concepts a searcher is looking for and what
documents are most relevant for the user’s needs.
As people use such a system it takes time to build up a search query history and a
click-through profile. That profile could eventually be trusted and used to
• aid in search personalization
• collect user feedback to determine how well an algorithm is working
• help search engines determine if a document is of decent quality (e.g., if many users visit a document and then immediately hit the back button, the search engines may not continue to score that document well for that query).
I have spoken with some MSN search engineers and examined a video about MSN
search. Both experiences strongly indicated a belief in the importance of user
acceptance. If a high-ranked page never gets clicked on, or if people typically
quickly press the back button, that page may get demoted in the search results for
that query (and possibly related search queries). In some cases, that may also flag a
page or website for manual review.
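None of the engines publish how they weigh this behavior, but a crude sketch of the idea, with invented log fields and an arbitrary dwell-time cutoff, might look like the following:

```python
# Hypothetical sketch of using click-through behavior as a quality signal.
# The log format, 10-second dwell cutoff, and scoring rule are all assumptions.
click_log = [
    # (query, url, seconds before the searcher returned to the results page;
    #  None means they never came back)
    ("seo book", "example.com/a", 4),
    ("seo book", "example.com/a", 2),
    ("seo book", "example.com/b", None),
    ("seo book", "example.com/b", 95),
]

def satisfied_click_ratio(query, url, min_dwell=10):
    dwells = [d for q, u, d in click_log if q == query and u == url]
    if not dwells:
        return None
    satisfied = [d for d in dwells if d is None or d >= min_dwell]
    return len(satisfied) / len(dwells)

# A page searchers immediately bounce from scores poorly for that query and
# might be demoted or flagged for review.
print(satisfied_click_ratio("seo book", "example.com/a"))  # 0.0
print(satisfied_click_ratio("seo book", "example.com/b"))  # 1.0
```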
As people give search engines more feedback and as search engines collect a larger
corpus of data, it will become much harder to rank well using only links. The more
satisfied users are with your site, the better your site will do as search algorithms
continue to advance.
Real-Time versus Prior-to-Query Calculations
In most major search engines, a portion of the relevancy calculations are stored
ahead of time. Some of them are calculated in real time.
Some processes that are computationally expensive and slow, such as calculating overall interconnectivity (Google calls this PageRank), are done ahead of time.
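PageRank is the classic example of such a prior-to-query calculation; a minimal power-iteration sketch over a toy link graph (ignoring the many refinements layered on top in practice) looks like this:

```python
# Minimal PageRank power iteration over a toy link graph, computed ahead of
# query time. The 0.85 damping factor follows the original paper; the graph
# and iteration count are illustrative.
links = {            # page -> pages it links out to
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}

def pagerank(links, damping=0.85, iterations=50):
    n = len(links)
    ranks = {page: 1.0 / n for page in links}
    for _ in range(iterations):
        new_ranks = {page: (1 - damping) / n for page in links}
        for page, outlinks in links.items():
            share = ranks[page] / len(outlinks)
            for target in outlinks:
                new_ranks[target] += damping * share
        ranks = new_ranks
    return ranks

print(pagerank(links))   # stored in the index and reused at query time
```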
Many search engines have different data centers, and when updates occur, they roll
from one data center to the next. Data centers are placed throughout the world to
minimize network lag time. Assuming it is not overloaded or down for
maintenance, you will usually get search results from the data centers nearest you.
If those data centers are down or if they are experiencing heavy load, your search
query might be routed to a different data center.
Search Algorithm Shifts
Search engines such as Google and Yahoo! may update their algorithm dozens of
times per month. When you see rapid changes in your rankings, it is usually due to
an algorithmic shift, a search index update, or something else outside of your
control. SEO is a marathon, not a sprint, and some of the effects take a while to
kick in.
Usually, if you change something on a page, it is not reflected in the search results
that same day. Linkage data also may take a while to have an effect on search
relevancy as search engines need to find the new links before they can evaluate
them, and some search algorithms may trust links more as the links age.
The key to SEO is to remember that rankings are always changing, but the more
you build legitimate signals of trust and quality, the more often you will come out
on top.
Relevancy Wins Distribution!
The more times a search leads to desired content, the more likely a person is to use
that search engine again. If a search engine works well, a person does not just
come back, they also tell their friends about it, and they may even download the
associated toolbar. The goal of all major search engines is to be relevant. If they
are not, they will fade (as many already have).
Search Engine Business Model
Search engines make money when people click on the sponsored advertisements.
In the search result below you will notice that advertisers for both Viagra and Levitra are bidding on the term Viagra. The area off to the right displays sponsored advertisements for
the term Viagra. Google gets paid whenever a searcher clicks on any of the
sponsored listings.
The white area off to the left displays the organic (free) search results. Google does
not get paid when people click on these. Google hopes to make it hard for search
engine optimizers (like you and me) to manipulate these results, both to keep relevancy as
high as possible and to encourage people to buy ads.
Later in this e-book we will discuss both organic optimization and pay-per-click
marketing.
[Image: search results for the term Viagra, with sponsored listings on the right and organic results on the left]
Origins of the Web
The Web started off behind the idea of the free flow of information as envisioned
by Tim Berners-Lee. He was working at CERN in Europe. CERN had a
somewhat web-like environment in that many people were coming and going and
worked on many different projects.
Tim created a site that described how the Web worked and placed it live on the
first server at info.cern.ch. Europe had very little backing or interest in the Web
back then, so U.S. colleges were the first groups to set up servers. Tim added links
to their server locations from his directory known as the Virtual Library.
Current link popularity measurements usually show that college web pages have higher value than most other pages do. This is simply a function of the
following:
• The roots of the WWW started in lab rooms at colleges. It was not until the mid to late 1990s that the Web became commercialized.
• The web contains self-reinforcing social networks.
• Universities are pushed as sources of authority.
• Universities are heavily funded.
• Universities have quality controls on much of their content.
Early Search Engines
The Web did not have sophisticated search engines when it began. The most
advanced information gatherers of the day primitively matched file names. You
had to know the name of the file you were looking for to find anything. The first
file that matched was returned. There was no such thing as search relevancy. It
was this lack of relevancy that led to the early popularity of directories such as
Yahoo!.
Search engines such as AltaVista, and later Inktomi, were industry leaders for a period of time, but the rush to market and the lack of sophistication in both search and online marketing prevented these primitive engines from developing functional business models.
Overture was launched as a pay-per-click search engine in 1998. While the
Overture system (now known as Yahoo! Search Marketing) was profitable, most
portals were still losing money. The targeted ads they delivered grew in popularity
and finally created a functional, profit-generating business model for large-scale
general search engines.
Commercialized Cat & Mouse
Web = Cheap Targeted Marketing
As the Internet grew in popularity, people realized it was an incredibly cheap
marketing platform. Compare the price of spam (virtually free) to direct mail (~ $1
each). Spam fills your inbox and wastes your time.
Information retrieval systems (search engines) must also fight off aggressive
marketing techniques to keep their search results relevant. Search engines market
their problems as spam, but the problem is that they need to improve their
algorithms.
It is the job of search engines to filter through the junk to find and return relevant
results.
There will always be someone out there trying to make a quick buck. Who can fault
some marketers for trying to find holes in parasitic search systems that leverage
others’ content without giving any kickback?
Becoming a Resource
Though I hate to quote a source I do not remember, I once read that one in three
people believe the top search result is the most relevant document relating to their
search. Imagine the power associated with people finding your view of the world
first. Whatever you are selling, someone is buying!
I have been quoted as a source of information on Islam simply because I wrote
about a conversation I had with a person from Kuwait who called me for help on
the web. I know nothing about Islam, but someone found my post in a search
engine…so I was quoted in their term paper. College professors sourced some
sites I am embarrassed to admit I own.
Sometimes good things happen to you and sometimes the competition gets lucky.
Generally the harder you work, and the more original and useful your site is, the
more often you will get lucky.
Business Links
As easy as it is to get syndicated with useful, interesting, and unique information, it is much harder to get syndicated with commercial ideas, especially if the site does not add significant value to a transaction. Oftentimes, links associated with commercial sites are business partnerships.
Many people do well to give information away and then attach a product to their
business model. You probably would have never read this e-book if I did not have
a blog associated with it. On the same note, it would also be significantly easier for
me to build links to SEOBook.com if I did not sell this e-book on it.
Depending on your skills, faults, and business model, sometimes it is best to make one site your official voice and then sell stuff on another, or to add the commercial elements to a site after it has gained notoriety and trust. Without knowing
you, it is hard to advise you which road to take, but if you build value before trying
to extract profits, you will do better than if you do it the other way around.
Ease of Reference
If my site were sold as being focused on search and I wrote an e-book or book about power searching, it would be far easier for me to get links than it is running a site about SEO. For many reasons, the concept of SEO is hated in many circles. The concept of search is much easier to link to.
Sometimes by broadening, narrowing, or shifting your topic it becomes far easier
for people to reference you.
Primitive Search Technology
As the Web grew, content grew faster than technology did. The primitive nature of
search engines promoted the creation of content, but not the creation of quality
content. Search engines had to rely on the documents themselves to state their
purpose. Most early search engines did not even use the full page content, relying instead on the page title and document name to match results. Then along came meta tags.
Meta Tags
Meta tags were used to help search engines organize the Web. Documents listed
keywords and descriptions that were used to match user queries. Initially these tags
were somewhat effective, but over time, marketers exploited them and they lost
their relevancy.
People began to stuff incredibly large amounts of data (which was frequently off
topic) into these tags to achieve high search engine rankings. Porn and other high-
margin websites published meta tags like “free, free, free, free, Disney, free.”
Getting a better ranking simply meant you repeated your keywords a few more
times in the meta tags.
Banners, Banners, Banners
It did not help that, during the first Web bubble, stocks were valued on eyeballs, not profits. That meant people were busy trying to buy any type of exposure they could, which made it exceptionally profitable to spam search engines in order to show random, off-topic banners on websites.
The Bubble Burst
The Internet bubble burst. What caused such a fast economic recovery was the
shift from selling untargeted ad impressions to selling targeted leads. This meant
that webmasters lost much of their incentive for trying to get any kind of traffic
they could. Suddenly it made far greater sense to try to get niche-targeted traffic.
In 1998, Overture pioneered the pay-per-click business model that almost all major search engines now rely on. Google AdWords enhanced the model by adding a few
more variables to the equation—the most important one is factoring ad click-
through rate (CTR) into the ad ranking algorithm.
Google extended the targeted advertisement marketing by delivering relevant
contextual advertisements on publisher websites via the Google AdSense program.
More and more ad spending is coming online because it is easy to track the return
on investment. As search algorithms continue to improve, the value of having
well-cited, original, useful content increases daily.
Advancing Search Technology
Instead of relying exclusively on page titles and meta tags, search engines now index
the entire page contents. Since search engines have been able to view entire pages,
the hidden inputs (such as meta tags) have lost much of their importance in
relevancy algorithms.
The best way for search engines to provide relevant results is to emulate a user and rank pages based on the same things users see and do (Do users like this website? Do they quickly hit the back button?), and on what other people are saying