If you use spreadsheets to store data from the web or other external sources, both Excel and Google Sheets offer built-in functionality that lets you import data without having to copy and paste it from the web.
For example, with the IMPORTHTML function in Google Sheets you can import data from tables and lists on websites. Additionally, with the Google Sheets IMPORTDATA function you can import .csv and .tsv files into your spreadsheet from a URL.
In this article, you will learn what the IMPORTXML function in Google Sheets is, how the IMPORTXML formula syntax works, and how to use the formula through practical examples. At the end, you will also find tips on what to check if the function does not work.
Start layering today!
Share parts of your Google spreadsheets, track, review and approve changes, and sync data from different sources - all in seconds.
START FOR FREE
What is IMPORTXML in Google Sheets?
The IMPORTXML function imports data from multiple structured data types including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds. You can use the IMPORTXML function to import data from websites for data mining or web scraping.
While this function works with most websites, some website owners block it, along with other bots, from accessing their data. The last section covers what to check when this happens.
IMPORTXML syntax
The syntax for the IMPORTXML function is:
=IMPORTXML(url, xpath_query)
- url refers to the URL of the web page. You can enter it directly in the formula as text in double quotes, including the protocol ("http://" or "https://"), or reference a cell that contains the URL, for example =IMPORTXML(A2, "//h1/@title") if cell A2 holds the address.
- xpath_query is the parameter that tells the function what data to import. For example, if you set this parameter to "//h1/@title", you get the title attributes of all <h1> elements on the page.
XPath is a query language for selecting nodes, such as elements and attributes, in XML and HTML documents. If you want to learn more about the XPath language, read the XPath tutorial provided by W3Schools.
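To make the "//h1/@title" example concrete, here is a minimal Python sketch of what that query selects, run on a small made-up HTML snippet. It uses the standard library's html.parser rather than a real XPath engine, so it only imitates this one query and is not how Google Sheets evaluates XPath; the h1_titles helper and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class H1TitleCollector(HTMLParser):
    """Collects the title attribute of every <h1> element, imitating //h1/@title."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names; attrs is a list of (name, value) pairs
        if tag == "h1":
            for name, value in attrs:
                if name == "title" and value is not None:
                    self.titles.append(value)


def h1_titles(html):
    """Hypothetical helper: return the title attributes of all <h1> elements in document order."""
    parser = H1TitleCollector()
    parser.feed(html)
    parser.close()
    return parser.titles


page = '<h1 title="Welcome">Hi</h1><h2 title="Skip">x</h2><h1>No title</h1><h1 title="News">y</h1>'
print(h1_titles(page))  # ['Welcome', 'News']
```

Only the attribute values come back, not the heading text: the <h2> is ignored because of its tag, and the second <h1> is skipped because it has no title attribute.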
How to use IMPORTXML in Google Sheets?
The IMPORTXML function lets you import data from websites directly into your Google spreadsheets. There are many ways to use it; the practical examples below cover the most common ones, grouped by XPath query type.
Use //@href to scrape all links
Web scraping is a common practice for companies that want to grow using data from the web. For example, say you're analyzing a competitor's website and want a list of all the links it uses. Here is how you can use IMPORTXML to scrape all links on a website.
- 1. Go to Google Sheets, select a cell, and type "=IMPORTXML(" to start the built-in function.
How to use IMPORTXML in Google Sheets (examples) - trigger function
- 2. Enter the URL of the website from which you want to extract all links. Enclose the URL in quotation marks.
Using IMPORTXML in Google Sheets (Examples) - Include URL
- 3. Type a comma, then enter the XPath query that scrapes all links, "//@href", in quotation marks, and close the formula with a parenthesis. The complete formula looks like =IMPORTXML("https://example.com", "//@href").
Using IMPORTXML in Google Sheets (examples) - Include XPath query
- 4. Press Enter to get all links on the website, as shown below.
How to use IMPORTXML in Google Sheets (examples) - copied links
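To see which values "//@href" picks up, here is a minimal Python sketch that imitates the query on a small, made-up HTML snippet using only the standard library's html.parser. It is an illustration, not how Google Sheets evaluates XPath; the all_hrefs helper and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects every href attribute in a page, imitating the XPath //@href."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # //@href matches href on any element (<a>, <link>, ...), not only on <a> links
        for name, value in attrs:
            if name == "href" and value is not None:
                self.hrefs.append(value)


def all_hrefs(html):
    """Hypothetical helper: return all href values in document order, as written in the HTML."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


page = """
<head><link rel="canonical" href="https://example.com/"></head>
<body>
  <a href="/pricing">Pricing</a>
  <a href="https://other-site.org/blog">Partner blog</a>
  <a name="top">No href here</a>
</body>
"""
print(all_hrefs(page))
# ['https://example.com/', '/pricing', 'https://other-site.org/blog']
```

Note that the <link> tag in the page head is included as well, because "//@href" does not restrict the element type. To collect only hyperlinks in the page body, "//a/@href" is the narrower XPath query.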
Use "//a[not(contains(@href, 'example.com'))]/@href" to scrape external links
An external link is a hyperlink to a domain outside the original website. External links tend to have more impact on search engine rankings, as they act as a more objective seal of approval for the website they point to. Here's how you can scrape external links using the IMPORTXML function.
- 1. Repeat steps 1 and 2 from the previous section.
- 2. Type a comma, then enter the XPath query that scrapes external links, "//a[not(contains(@href, 'example.com'))]/@href", in quotation marks, and close the formula with a parenthesis. Replace example.com with the domain of the website you are scraping.
Using IMPORTXML in Google Sheets (examples) - XPath query for external links
- 3. Press Enter to get all the external links of the website, as shown in the image below.
Using IMPORTXML in Google Sheets (examples) - Copied external links
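The filter in this query is a plain substring test on the href text. The following minimal Python sketch imitates "//a[not(contains(@href, 'example.com'))]/@href" with the standard library's html.parser so you can see exactly what passes the filter; the external_links helper and the sample page are hypothetical, and this is not how Google Sheets runs the query.

```python
from html.parser import HTMLParser


class ExternalLinkCollector(HTMLParser):
    """Imitates //a[not(contains(@href, DOMAIN))]/@href for a given domain string."""

    def __init__(self, domain):
        super().__init__()
        self.domain = domain
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Like contains() in the XPath query, this only checks whether the text appears in href
        if href is not None and self.domain not in href:
            self.links.append(href)


def external_links(html, domain):
    """Hypothetical helper: href values of <a> elements whose href does not contain domain."""
    parser = ExternalLinkCollector(domain)
    parser.feed(html)
    parser.close()
    return parser.links


page = """
<a href="https://example.com/about">About</a>
<a href="/pricing">Pricing</a>
<a href="https://other-site.org/blog">Partner blog</a>
"""
print(external_links(page, "example.com"))
# ['/pricing', 'https://other-site.org/blog']
```

The relative link "/pricing" points to the same site but still appears in the result, because it does not contain the text "example.com". If a site uses relative links, a stricter query such as "//a[starts-with(@href, 'http') and not(contains(@href, 'example.com'))]/@href" should leave them out; treat it as a suggestion to test against your target page.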
Use "//link[@rel='canonical']/@href" to scrape the canonical link
A canonical link is the "preferred" version of a webpage that webmasters specify to avoid duplicate content. Here's how you can scrape a canonical link using the IMPORTXML function.
- 1. Repeat steps 1 and 2 from the first example.
- 2. Type a comma, then enter the XPath query that scrapes the canonical link, "//link[@rel='canonical']/@href", in quotation marks, and close the formula with a parenthesis.
Using IMPORTXML in Google Sheets (examples) - Include XPath query for canonical link
- 3. Press Enter to get the canonical link of the website, as shown below.
Using IMPORTXML in Google Sheets (examples) - canonical link copied
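The canonical query only matches a <link> element whose rel attribute is exactly "canonical". Here is a minimal Python sketch that imitates "//link[@rel='canonical']/@href" with the standard library's html.parser on a made-up page head; the canonical_links helper and the sample HTML are hypothetical, and this is not how Google Sheets evaluates the query.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Imitates //link[@rel='canonical']/@href."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Exact, case-sensitive comparison, like @rel='canonical' in XPath
        if tag == "link" and attributes.get("rel") == "canonical":
            href = attributes.get("href")
            if href is not None:
                self.canonicals.append(href)


def canonical_links(html):
    """Hypothetical helper: return the href of every canonical <link> element."""
    parser = CanonicalFinder()
    parser.feed(html)
    parser.close()
    return parser.canonicals


page = """
<head>
  <link rel="stylesheet" href="/style.css">
  <link rel="canonical" href="https://example.com/shoes">
</head>
"""
print(canonical_links(page))  # ['https://example.com/shoes']
```

The stylesheet link is skipped because its rel value differs. A page normally declares at most one canonical link, so the result is usually a single value or empty.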
If you are interested in importing financial data into your Google Sheets, you can combine the IMPORTXML function with the GOOGLEFINANCE function or with data from Yahoo Finance. However, this requires a somewhat more complex formula.
Google Sheets IMPORTXML not working
If you get errors when using the IMPORTXML function, such as "Could not fetch url" or "Imported content is empty", check the following:
- Is the URL correct? The data you're trying to import may have moved to a different URL, so it's worth checking again.
- Is the protocol https or http? The protocol could also have changed without you noticing. Some websites do not redirect from one protocol to the other, so you need to use the right one.
- Does the website block scraping? Some website owners put blocks in place to prevent others from reading or scraping their web content.
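If you want to run the first two checks outside Google Sheets, a small Python script can request both protocol variants of a URL and report the HTTP status and the final address after redirects. This is a minimal sketch using only the standard library; protocol_variants and check_url are hypothetical helper names. It runs from your own machine, while IMPORTXML fetches pages from Google's servers, so a site may respond differently to each, and a successful request here does not guarantee that IMPORTXML will work.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


def protocol_variants(url):
    """Return the https and http versions of a full URL (scheme://host/path...)."""
    parts = urlsplit(url)
    return [parts._replace(scheme=scheme).geturl() for scheme in ("https", "http")]


def check_url(url, timeout=10):
    """Fetch url; return (HTTP status, final URL after redirects).

    On connection problems the status is None and the second value describes the error.
    """
    request = Request(url, headers={"User-Agent": "importxml-url-check-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status, response.geturl()
    except HTTPError as err:  # the server answered with an error, e.g. 403 or 404
        return err.code, url
    except URLError as err:  # DNS failure, refused connection, TLS problem
        return None, str(err.reason)
    except OSError as err:  # other socket errors such as a timeout while reading
        return None, str(err)
```

For example, looping over protocol_variants("https://example.com/") and calling check_url on each entry shows whether only one protocol responds, or whether http redirects to https. A 403 status can be a hint that the site blocks automated requests.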
How to import data into Google Sheets using Layer?
Layer is an add-on that provides tools to increase efficiency and data quality in your Google Sheets processes. Share parts of your Google spreadsheets, track, review and approve changes, and sync data from different sources, all in seconds. See how it works.
With Layer you can:
- Share and collaborate: Automate your data collection and validation through user controls.
- Automate and schedule: Schedule recurring data collection and distribution tasks.
- Integrate and sync: Connect to your tech stack and sync all your data in one place.
- View and report: Create and share reports with real-time data and actionable decisions.
Limited time offer: Install the Layer Google Sheets add-on today and get free access to all paid features so you can manage, automate and scale your processes in Google Sheets!
Conclusion
As you can see, the IMPORTXML function in Google Sheets can be a powerful tool for importing data from external sources, especially websites. The examples provided here can be applied to any domain. If you own a website or work with data from the web, this function is a quick and easy way to perform web scraping without leaving your spreadsheet.
You learned what the IMPORTXML function is and the syntax behind the IMPORTXML formula. You also saw through a set of helpful examples how the IMPORTXML function can be applied to your data, and what to check when IMPORTXML doesn't seem to be working properly.
If you want to learn more about importing data, check out the following articles:
- How to use IMPORTDATA in Google Sheets
- How to use IMPORTHTML in Google Sheets
- How to use the IMPORTRANGE function in Google Sheets
- How to import CSV to Google Sheets automatically
Hady is Content Lead at Layer. Hady is passionate about technology, marketing and spreadsheets. In addition to his computer science studies, he has extensive experience in the development, introduction and scaling of content marketing processes at SaaS startups.