Google Sheets has a good range of functions to import data from external sources. Some of these include IMPORTRANGE, IMPORTFEED, IMPORTDATA, IMPORTXML, and IMPORTHTML.
In this tutorial, we will talk about the IMPORTHTML function of Google Sheets in particular. We will discuss what the function does, what its syntax is, as well as look at some simple examples of how it can be used.
This Article Covers:
What Does the Google Sheets IMPORTHTML Function Do?
The IMPORTHTML function is a useful data scraping function that helps us extract tables and lists from web pages. The function scans an HTML page (at a given URL) for tables or lists and copies the specific list or table that we want from the page into a Google Sheets workbook.
All tables in an HTML file are enclosed in <table></table> tags, while all lists are enclosed in <ul></ul> tags (for unordered lists) or <ol></ol> tags for ordered lists. This makes it very easy for an IMPORTHTML function to detect tables and lists in the webpage.
When we specify the index of the table or list we want to extract, the function simply finds the table or list at that index and extracts it onto our worksheet.
Syntax for the IMPORTHTML Function in Google Sheets
The syntax for the Google Sheets IMPORTHTML Function is as follows:
=IMPORTHTML(url, query_type, index)
Here,
- The url parameter specifies the URL of the webpage from which we want to extract the list or table. This should include the protocol (“http://” or “https://”).
- The query parameter specifies whether we want to extract a table or a list from the webpage.
- The index parameter specifies the index of the table or list that we need from the webpage. Indexes always start from 1, so if we want to extract the first table from the webpage, we specify this parameter as 1.
How to Use the IMPORTHTML Function to Import a List from a Webpage
Let us take an example to understand how to use the IMPORTHTML function to extract a specific list from a webpage.
Let’s say we want to extract the list of “Top 10 Python Machine Learning Libraries” from the page at the URL https://www.edureka.co/blog/python-libraries/. This is how the list looks in the original webpage:
To get this list, we first need to know the index of this particular list in the whole webpage. If you visit the website at this URL, you will notice that the page has a number of different lists. So how do we know what the index of this particular list is?
How to Get the List Index from a Webpage
To get the index of a list from a webpage, we need to write a small piece of javascript code into the browser’s console. Don’t worry, you don’t need to do much, simply copy the code given below into the console:
var index = 1;
[].forEach.call(document.getElementsByTagName(“table”), function(elements) { console.log(“Index: ” + index++, elements); });
Here are the steps to get the index of our required list from the given webpage:
- Open the Developer tools of your browser. If you’re on a Windows PC, press the F12 key on your keyboard. This works on Firefox, Google Chrome or Microsoft Edge. If you’re on a Mac, press Cmd+Opt+J to open Google Chrome and Cmd+Opt+C to open Safari (Make sure to enable the ‘Develop Menu’ first).
- Once your developer tool opens, click on the ‘Console’ tab.
- Paste or type the above code into the console script area
- Click Run.
- In the output area, you should see a series of indexes that look like this:
- If you hover your mouse over each index, you will see the list in your webpage that corresponds to that index highlighted. This will help you understand which index corresponds to the list that you want.
As you can see from the above screenshot, the list that we want to extract is at index 18 of the webpage.
Using IMPORTHTML Function to Import a List from a Webpage
Once you know the index of the list that you want to extract, you can go ahead and use it in the IMPORTHTML function, as follows:
=IMPORTHTML("https://www.edureka.co/blog/python-libraries/","list",18)
The above formula will import the list number 18 from the webpage at URL https://www.edureka.co/blog/python-libraries/.
To insert the list into a Google Sheets worksheet, simply type the above formula into a blank worksheet, press the return key and wait a few seconds for the list to appear on the spreadsheet.
How to use the IMPORTHTML Function to Import a Table from a Webpage
Besides lists, you can also import tables from web pages using the IMPORTHTML function.
Let’s say we want to import the current market prices of all cryptocurrencies from the CoinMarketCap website. The URL of the page is: https://coinmarketcap.com/all/views/all/
This is how the table looks in the original webpage:
To get this table, we again need to know its index in the webpage.
How to Get the Table Index from a Webpage
The same javascript code can be used to find the index of the table we want, but with a small change. Instead of using the querySelectorAll to extract all “ul” and “ol” elements, we use the getElementsByTagName function to extract all “table” elements from the webpage. Here’s the code that you can copy and paste into the console:
var index=1;
[].forEach.call(document.getElementsByTagName ("table"),function(elements) {console.log("Index: "+index++,elements);});
Here’s how the output of the above code looks in the console:
As you can see from the above screenshot, the table that we want to extract is at index 3 of the webpage.
Using IMPORTHTML Function to Import a Table from a Webpage
Once you know the index of the table that you want to extract, you can go ahead and use it in the IMPORTHTML function, as follows:
=IMPORTHTML("https://coinmarketcap.com/all/views/all/","table",3)
The above formula will import the table number 3 from the webpage at URL “https://coinmarketcap.com/all/views/all/”.
Here’s how the table looks in your Google Sheets worksheet:
Combining the IMPORTHTML Function with QUERY Functions to Extract Specific Parts of a Table
Now let us look at how you can fine-tune your extraction, so you get only specific columns or rows from a table.
We can combine the IMPORTHTML function with the Google Sheets QUERY function so that we can query the output and extract specifically what we need.
Extracting Specific Table Columns from a Webpage
From the cryptocurrency table, let’s say you only want the Name and Price of each cryptocurrency. In such cases, you can specify this in your QUERY function. The Name and Price columns are the 2nd and 5th columns in the table. So we specify this in the QUERY function as follows:
=QUERY(IMPORTHTML("https://coinmarketcap.com/all/views/all/","table",3), "SELECT Col2, Col5")
Here, the IMPORTHTML function will first get your table number 3 from the coinmarketcap webpage and then the QUERY function will take this table and extract just the second and fifth columns from it.
Extracting Specific Table Rows from a Webpage
Let us now further refine our search and narrow down our window. Let us say we only want to see the rows where the Price value is more than $1,000.
We can specify this condition within the QUERY function by adding a WHERE clause as follows:
=QUERY(IMPORTHTML("https://coinmarketcap.com/all/views/all/","table",3), "SELECT Col2, Col5 WHERE Col5>1000")
Since the Price is in column 5, we specified that we want to select only those rows of Col2 and Col5 where Col5>1000.
Here’s the result we get:
Automatically Refreshing the IMPORTHTML Function at a Given Interval
In most cases, we would like to see the freshest version of the data that we import from a webpage. The IMPORTHTML function automatically refreshes the imported data every hour.
However, if you want to speed up or slow down the refresh interval, you can do so by adding an additional refresh string as follows:
=IMPORTHTML("https://coinmarketcap.com/all/views/all/?refresh=1","table",3)
Notice that we added a “?refresh-1” condition at the end of the url string.
The above formula makes the IMPORTHTML function refresh itself every 1 minute.
In this tutorial we had a detailed look into the IMPORTHTML function in Google Sheets. We went over how the function works, its syntax and how to use it in different cases. We hope the tutorial was helpful and easy to follow.