Understanding URLs

URL stands for Uniform Resource Locator.

A URL is just the internet address for any given webpage:

A screenshot of a web browser. The URL is circled.

Understanding the component parts of a URL can be helpful in a variety of situations. Here are just a few reasons why understanding URLs is useful:

  • The URL often reveals key information about a site
  • An understanding of URLs provides the needed foundation for many advanced search strategies
  • A heightened attention to URLs helps searchers recognize fraudulent sites

Each section below focuses on a different part of the URL. At the end of the webtext is a quiz that you can take to test your understanding of URLs.

Locate the protocol

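The “protocol” is the first part of a URL; it appears before the colon and double slash (://). In http://writingcommons.org, for example, the protocol is http. Some browsers simplify how addresses are displayed by hiding the protocol: for example, in Chrome and Firefox, http://writingcommons.org displays as writingcommons.org.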

The protocol https indicates that information sent through the page will be encrypted, and therefore harder to read if some third party intercepts the information. (The next time you are entering a username and password on a page, check for the “https” protocol.)
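As a minimal sketch of the idea above, the protocol can be read off a URL with Python's standard urllib.parse module. The helper names protocol_of and is_encrypted are illustrative assumptions, not part of any library.

```python
from urllib.parse import urlsplit


def protocol_of(url: str) -> str:
    """Return the protocol (scheme) written in the URL, or "" if none is present."""
    return urlsplit(url).scheme


def is_encrypted(url: str) -> bool:
    """True only when the URL explicitly uses the https protocol."""
    return protocol_of(url) == "https"


print(protocol_of("http://writingcommons.org"))            # http
print(is_encrypted("https://accounts.google.com/Login"))   # True
print(repr(protocol_of("writingcommons.org")))             # '' (protocol hidden)
```

The last call shows why a hidden protocol matters: when the address bar shows only writingcommons.org, the text alone does not say whether the connection is encrypted.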

Locate the domain name

The “domain name” identifies the site that contains the page you are viewing. It appears just before the first single slash (/). If there is no single slash, then the domain name is whatever appears at the end of the URL.

For example, the following URLs all refer to pages on the Writing Commons site:

  • http://writingcommons.org/open-text/information-literacy
  • writingcommons.org/open-text/research-methods-methodologies/integrate-evidence/incorporate-evidence/1030-synthesizing-your-research-findings
  • www.writingcommons.org

If you look carefully, you will see that most browsers try to help users out by emphasizing the domain name in the address bar, typically by displaying it in a darker color than the rest of the URL.

Example

The URL from a web browser. The domain name "writingcommons.org" is in a darker color than the rest of the URL, which has been greyed out.

Being able to locate the domain name in a URL allows you to identify the entity that hosts the page you are viewing—a piece of information that is often crucial to understanding the nature of your source.
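The "just before the first single slash" rule can be sketched in Python as follows. The function name host_of is an illustrative assumption, and the placeholder protocol is added only because urlsplit does not recognize a host unless a protocol (or "//") is present. Note that the result is the full host, which may still carry a prefix such as www in front of the domain name.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the host: the text between the protocol and the first single slash."""
    # Addresses typed without a protocol get a placeholder one so that
    # urlsplit can tell the host apart from the rest of the URL.
    if "://" not in url:
        url = "http://" + url
    return urlsplit(url).hostname or ""


examples = [
    "http://writingcommons.org/open-text/information-literacy",
    "writingcommons.org/open-text/research-methods-methodologies",
    "www.writingcommons.org",
]
for url in examples:
    print(host_of(url))
# writingcommons.org
# writingcommons.org
# www.writingcommons.org
```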

Recognize sub-directories

Elements of the URL that appear after the domain indicate different sub-directories. For example:

A dissected URL. The domain name (writingcommons.org) is indicated as are several sub-directories: open-text, information-literacy, and rhetorical-analysis.

In the example above, “open-text,” “information-literacy,” and “rhetorical-analysis” are sub-directories of the domain writingcommons.org. Think of these as folders within folders.
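The "folders within folders" picture can be sketched by splitting the part of the URL that follows the domain. The full example URL below is an assumption reconstructed from the description of the figure, and sub_directories is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def sub_directories(url: str) -> list[str]:
    """Return the path segments that follow the domain, outermost folder first."""
    path = urlsplit(url).path
    # Splitting on "/" leaves empty strings at the edges; drop them.
    return [segment for segment in path.split("/") if segment]


# Assumed URL, pieced together from the figure description above.
example = "http://writingcommons.org/open-text/information-literacy/rhetorical-analysis"
print(sub_directories(example))
# ['open-text', 'information-literacy', 'rhetorical-analysis']
```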

Recognize subdomains

Subdomains are similar to sub-directories in that they provide a way for website developers to separate content, but subdomains appear before the domain name in the URL. Don’t let this trip you up. The domain name is still the content that appears pressed up against the first single slash (/) or, if there is no single slash, at the very end of the URL.

For example, the domain name in all of the following URLs is google.com:

  • www.google.com
  • books.google.com
  • https://accounts.google.com/Login

Pay attention to the placement of the dots. The following is not a Google page:

www.mgoogle.com

Here the domain is mgoogle.com, not google.com.
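One rough way to see why the dots matter is to split a host on its periods and keep the last two pieces as the domain. This is a simplified sketch that assumes a plain ending such as .com or .org; it does not handle hosts that end in a country code. The name split_host is hypothetical.

```python
def split_host(host: str) -> tuple[list[str], str]:
    """Split a host into (subdomains, domain), assuming a two-label domain."""
    labels = host.lower().split(".")
    domain = ".".join(labels[-2:])   # e.g. "google.com"
    subdomains = labels[:-2]         # everything in front of the domain
    return subdomains, domain


for host in ["www.google.com", "books.google.com",
             "accounts.google.com", "www.mgoogle.com"]:
    print(host, "->", split_host(host))
# www.google.com -> (['www'], 'google.com')
# books.google.com -> (['books'], 'google.com')
# accounts.google.com -> (['accounts'], 'google.com')
# www.mgoogle.com -> (['www'], 'mgoogle.com')
```

Because the split happens only at periods, the extra "m" in mgoogle.com stays attached to the domain and produces a different name.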

Recognize top-level domains

In the domain name writingcommons.org, the “top-level domain” is .org. The top-level domain .org was originally intended for use by non-profit organizations, and many non-profits continue to use it, but it is now open to anyone.

In the domain name amazon.com, the top-level domain is .com. Short for “commercial,” .com is the most common top-level domain in the world and is now used for a wide variety of sites, not just the sites of commercial enterprises.

Some top-level domains have retained their original meanings and are especially helpful to know:

domain   description       example
.edu     university site   http://www.nu.edu
.gov     government site   http://www.senate.gov
.mil     military site     http://www.army.mil

Newer top-level domains such as .museum, .bike, and .clothing are not yet widely used.

Some domains include a country domain extension, also called a “country code top-level domain.”

Here are some examples:

code   country          example
.in    India            indianrail.gov.in
.de    Germany          www.spiegel.de
.ca    Canada           www.cbc.ca
.jp    Japan            www.nicovideo.jp
.uk    United Kingdom   www.ima.org.uk

Pay attention to country domain extensions. When present in a URL, they represent a core component of the domain. Note, for example, that hydra.com and hydra.com.gr are different domains. The two are unrelated sites run by unrelated entities.
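The hydra.com versus hydra.com.gr distinction can be sketched by keeping an extra label whenever a host ends in a two-part ending such as com.gr. The small set below is a hand-picked assumption for illustration only; real-world tools typically rely on a maintained list such as the Public Suffix List. The name domain_name is hypothetical.

```python
# Illustrative, hand-picked endings; NOT a complete list.
TWO_PART_ENDINGS = {"com.gr", "gov.in", "org.uk"}


def domain_name(host: str) -> str:
    """Return the domain, keeping the country code as part of it when present."""
    labels = host.lower().split(".")
    # A two-part ending such as "com.gr" means the domain needs three labels.
    keep = 3 if ".".join(labels[-2:]) in TWO_PART_ENDINGS else 2
    return ".".join(labels[-keep:])


for host in ["hydra.com", "www.hydra.com.gr", "indianrail.gov.in",
             "www.ima.org.uk", "www.cbc.ca"]:
    print(host, "->", domain_name(host))
# hydra.com -> hydra.com
# www.hydra.com.gr -> hydra.com.gr
# indianrail.gov.in -> indianrail.gov.in
# www.ima.org.uk -> ima.org.uk
# www.cbc.ca -> cbc.ca
```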

For a comprehensive list of top-level domains, consult an authoritative registry such as the Root Zone Database maintained by the Internet Assigned Numbers Authority (IANA).

Use your understanding of URLs to enhance your web searching

Once you understand URLs, certain kinds of advanced search strategies become easier to conceptualize, remember, and implement—for example, filtering by domain and top-level domain.

Filter by top-level domain

If you know that the kind of information you are seeking is most likely to appear on a site with a particular type of top-level domain, you can restrict your search to this type of site using the site: search operator.

For example, if you are seeking government documents on the topic of student loans, then a search for student loans site:gov will return only results with the top-level domain gov, filtering out a large number of sites that are not relevant to your research needs.

Filter by domain

If you know the domain of the site on which your information will appear, you can use site: to search only that site.

For example, a search for sample tests site:dmv.ca.gov will return only pages located on the California Department of Motor Vehicles (DMV) website (the domain of which is dmv.ca.gov).

The site: operator works in all major search engines (Google, Bing, Baidu, DuckDuckGo, etc.).
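The two filtering strategies above use the same pattern: search terms followed by site: and either a top-level domain or a full domain. A small sketch of assembling such a query follows. The helper names are hypothetical, and the search address used in search_url (a q parameter on google.com/search) is an assumption for illustration; other engines use their own addresses.

```python
from urllib.parse import urlencode


def site_query(terms: str, site: str) -> str:
    """Combine search terms with the site: operator."""
    return f"{terms} site:{site}"


def search_url(query: str, base: str = "https://www.google.com/search") -> str:
    """Encode the query so it can be pasted into a browser address bar."""
    return base + "?" + urlencode({"q": query})


by_tld = site_query("student loans", "gov")
by_domain = site_query("sample tests", "dmv.ca.gov")

print(by_tld)                 # student loans site:gov
print(by_domain)              # sample tests site:dmv.ca.gov
print(search_url(by_tld))     # https://www.google.com/search?q=student+loans+site%3Agov
```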

Practice identifying deceptive URLs

The immediate benefit of the drill below will be to improve your ability to distinguish between real and fraudulent sites, but the exercise will also help you sharpen your overall URL-analysis skills by heightening your attention to the component parts of URLs.

A) Which of the following are eBay.com web pages? Do not go to the sites. (Some sites masquerading as legitimate sites may contain harmful underlying code.) Just examine the URLs.

  1. http://pages.ebay.com
  2. http://movies.half.ebay.com
  3. http://pages.ebey.com
  4. http://68.112.112.34:8866/ebay.htm
  5. http://signin.ebay.com@10.19.29.2
  6. http://pages.@ebay.com
  7. http://signin-ebay.com
  8. http://www.ebay.com/electronics/ipad
  9. http://www.ebay.deals.com
  10. http://www.ebay.pro
  11. http://www.ebay.com.bb/motors/motorcycles
  12. http://www.ebay.com/itm/A-Planet-of-Viruses-by-Carl-Zimmer-2011-Hardcover-/191063912359

B) Find the domain name in this URL:

http://www.bankofamerica.com.sas.signon.do.detect.2.signin.sessionid.rmrlfbqjlokcjpczgs.oxcvsvcpdsoeeseytje.yucfnjtidbvnujxrwjmsea.zydyilpnchtjrriiszti.zydyilpnchtjrriiszti.zydyilpnchtjrriiszti.zydyilpnchtjrriiszti.nuyovbuskl.bernadinec.com/index.php?pageType=708XeMWZamp;cust=redacted@redacted.redactedamp;l=lWXS3AlBXVShqAhQRfhgTDrf=nttps://sitekey.bnkofamerica.com/sas/signon.do?SignIn&SMSESSIONID=ASERTFGUY2I94O0389GYBH23JNMKUYH83JMN12I90U82HJNASDKOASD9AS8D&iv=90832yhIopOWjos

Answers

A) eBay page?

1. http://pages.ebay.com YES This is an eBay page. The domain name is ebay.com.
2. http://movies.half.ebay.com YES This is an eBay page. The domain name is ebay.com (“movies” and “half” indicate subdomains).
3. http://pages.ebey.com NO This is not an eBay page. Note that “ebay” is misspelled as ebey.
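4. http://68.112.112.34:8866/ebay.htm NO This is not an eBay page. The first single slash (/) is preceded by the IP address 68.112.112.34 and the port number 8866, not by the domain name ebay.com. The word “ebay” appears only in the file name after the slash, where a site owner can put any text.
5. http://signin.ebay.com@10.19.29.2 NO This is not an eBay page. In a URL, anything between the protocol and an @ is treated as user information, not as part of the host. The host here is the IP address 10.19.29.2; “signin.ebay.com” is only a decoy placed in front of the @.
6. http://pages.@ebay.com SUSPECT Treat this URL with caution. An @ is never part of a domain name, so the domain is not “@ebay.com.” As in item 5, the text before the @ (“pages.”) is user information and the host is whatever follows the @, which here is ebay.com. Because the same pattern is used to disguise other hosts, any URL with an @ before the first single slash deserves a second look.
7. http://signin-ebay.com NO This is not an eBay page. The domain name is signin-ebay.com. A hyphen is just another character in the name, so signin-ebay.com is as different from ebay.com as zebay.com or mebay.com. Only a period would make “signin” a subdomain of ebay.com.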
8. http://www.ebay.com/electronics/ipad YES This is an eBay page. The domain name is ebay.com. The first single slash (/) is directly preceded by .ebay.com
9. http://www.ebay.deals.com NO This is not an eBay page. The domain name is deals.com (not ebay.com).
10. http://www.ebay.pro NO This is not an eBay page. The domain name is ebay.pro (not ebay.com).
11. http://www.ebay.com.bb/motors/motorcycles NO This is not an eBay page. The domain name is ebay.com.bb (not ebay.com).
12. http://www.ebay.com/itm/A-Planet-of-Viruses-by-Carl-Zimmer-2011-Hardcover-/191063912359 YES This is an eBay page. The domain name is ebay.com. The first single slash (/) is directly preceded by .ebay.com
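The reasoning in the answers above can be approximated with a short check. This is a sketch, not a complete phishing detector: it accepts a URL only when the host is ebay.com or ends in .ebay.com. Rejecting every URL that carries user information before an @ is a cautious design choice assumed here, and the function name looks_like_ebay is hypothetical.

```python
from urllib.parse import urlsplit


def looks_like_ebay(url: str) -> bool:
    """Cautious check: is the host ebay.com or one of its subdomains?"""
    parts = urlsplit(url)
    # Text before an "@" is user information, a pattern often used to
    # disguise the real host, so such URLs are rejected outright here.
    if parts.username is not None:
        return False
    host = parts.hostname or ""
    return host == "ebay.com" or host.endswith(".ebay.com")


QUIZ_URLS = [
    "http://pages.ebay.com",
    "http://movies.half.ebay.com",
    "http://pages.ebey.com",
    "http://68.112.112.34:8866/ebay.htm",
    "http://signin.ebay.com@10.19.29.2",
    "http://pages.@ebay.com",
    "http://signin-ebay.com",
    "http://www.ebay.com/electronics/ipad",
    "http://www.ebay.deals.com",
    "http://www.ebay.pro",
    "http://www.ebay.com.bb/motors/motorcycles",
    "http://www.ebay.com/itm/A-Planet-of-Viruses-by-Carl-Zimmer-2011-Hardcover-/191063912359",
]

for number, url in enumerate(QUIZ_URLS, start=1):
    print(number, "YES" if looks_like_ebay(url) else "NO")
```

Under these assumptions the check prints YES for items 1, 2, 8, and 12 and NO for the rest. The leading period in ".ebay.com" is what keeps look-alikes such as signin-ebay.com from passing.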

B) The domain name in the following URL is bernadinec.com (not bankofamerica.com). Notice that bernadinec.com is what appears just before the first single slash (/):

http://www.bankofamerica.com.sas.signon.do.detect.2.signin.sessionid.rmrlfbqjlokcjpczgs.oxcvsvcpdsoeeseytje.yucfnjtidbvnujxrwjmsea.zydyilpnchtjrriiszti.zydyilpnchtjrriiszti.zydyilpnchtjrriiszti.zydyilpnchtjrriiszti.nuyovbuskl.bernadinec.com/index.php?pageType=708XeMWZamp;cust=redacted@redacted.redactedamp;l=lWXS3AlBXVShqAhQRfhgTDrf=nttps://sitekey.bnkofamerica.com/sas/signon.do?SignIn&SMSESSIONID=ASERTFGUY2I94O0389GYBH23JNMKUYH83JMN12I90U82HJNASDKOASD9AS8D&iv=90832yhIopOWjos
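The same "look just before the first single slash" step can be sketched in code. The URL below is a shortened stand-in for the one above, trimmed for readability (an assumption, not the original address), and last_two_labels is a hypothetical helper that assumes a plain .com-style ending.

```python
from urllib.parse import urlsplit


def last_two_labels(url: str) -> str:
    """Return the last two dot-separated pieces of the host."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


# Shortened stand-in for the long URL in the exercise.
shortened = ("http://www.bankofamerica.com.sas.signon.do.detect"
             ".nuyovbuskl.bernadinec.com/index.php?pageType=708")
print(last_two_labels(shortened))   # bernadinec.com
```

However many familiar-looking pieces are stacked in front, only the pieces pressed up against the first single slash identify the site.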