Scraping Instagram Follower Information with Python
Disclaimer: This tutorial is only for demonstration purposes and does not encourage TOS violations through web scraping.
Recently I tried scraping Instagram follower information from public profiles for a personal project of mine. The project didn’t really amount to much, but I did get a nice Python script out of it. Here are the implementation details.
Instagram Private API
The implementation relies on a pretty comprehensive Instagram library written in Python called Instagram Private API . In the words of the library’s author,
The library was written to access Instagram’s API when they clamped down on developer access. The goal of the project is to achieve parity with the official public API.
Feel free to check out the documentation, which in my opinion is quite well done and covers a lot of very useful functionality. Right now our focus will be on a small subset of functions that let us to interact with the Instagram API.
Initial Setup
Create a directory for your project and add a scrape.py file. Next, clone this fork of the Instagram Private API into the root of your project folder. The fork includes a fix for an authentication bug in the original repository.
Your project structure should now look like this:
insta-followers-scraper
----scrape.py
----instagram_private_api
--------instagram_private_api
--------instagram_web_api
--------// other files
Note: The Instagram Private API functionality is grouped under two separate modules: instagram_private_api and instagram_web_api. We only need functionality from the instagram_web_api module for this script.
Authentication
To successfully retrieve the complete list of followers from a public profile, your request must be authenticated. This is pretty annoying but fortunately, the instagram_web_api module lets us create an authenticated client through a very simple interface.
from instagram_private_api.instagram_web_api import Client

AUTH_USERNAME = ''
AUTH_PASSWORD = ''

# authenticate client
try:
    client = Client(username=AUTH_USERNAME, password=AUTH_PASSWORD, authenticate=True)
except Exception as e:
    print('auth error', e)
    raise e

if client.is_authenticated:
    print('Client authenticated')
else:
    print('Client not authenticated')
Update AUTH_USERNAME and AUTH_PASSWORD with valid credentials and test the script using python3 scrape.py. If all goes well, you should see Client authenticated printed to your console.
Retrieving the User ID
Now let's get down to business! The client object we just created has a user_followers method that we can use to retrieve follower information. The method has the following signature:
@login_required
def user_followers(self, user_id, **kwargs):
If you look carefully, this method expects a user_id rather than a username. Fortunately, if we know the username, the user_id can be retrieved fairly easily like so:
result = client.user_info2('<your target account>')
user_id = result['id']
print('user id retrieved: {}'.format(user_id))
If all goes well, you should now have the user_id of your target account!
Scraping Follower Information
Now we can revisit the user_followers method with the following code snippet, which returns an object containing follower information for the specified user_id:

# extract=False skips a data sanitization step that would otherwise strip out useful meta information
results = client.user_followers(user_id, count=50, extract=False)
user = results['data']['user']
If you inspect the user object, it contains a whole bunch of useful information about the target profile, as well as the follower accounts we requested. A few of these are shown below:
# total follower count
edge_followed_by = user['edge_followed_by']
print('follower count: {}'.format(edge_followed_by['count']))

# followers array
followers = []
followers.extend(edge_followed_by.get('edges', []))

# print follower information
for follower in followers:
    print('username: {}'.format(follower['node']['username']))
    print('full name: {}'.format(follower['node']['full_name']))
    print('user id: {}'.format(follower['node']['id']))
    print('is private: {}'.format(follower['node']['is_private']))

# page info. This is required for pagination. More on this below!
print(edge_followed_by['page_info'])
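If you plan to do more with the data than print it, the field access above can be wrapped in a small helper. This is a hypothetical helper of my own, not part of the library; it just flattens the GraphQL-style edges structure into a list of plain dicts:

```python
# Hypothetical helper (not part of the library): flattens the
# 'edge_followed_by' structure into a list of simple follower dicts.
def parse_followers(edge_followed_by):
    followers = []
    for edge in edge_followed_by.get('edges', []):
        node = edge['node']
        followers.append({
            'username': node['username'],
            'full_name': node['full_name'],
            'id': node['id'],
            'is_private': node['is_private'],
        })
    return followers
```

With this in place, the loop above becomes a one-liner: followers = parse_followers(user['edge_followed_by']).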
Note that user_followers will retrieve at most 50 followers per request. However, the endpoint supports pagination, which we can use to our advantage.
To perform a paginated scrape, we can implement a simple loop that uses a few attributes from the page_info object, like so:
has_next_page = True
end_cursor = None

while has_next_page:
    # note the additional end_cursor parameter
    results = client.user_followers(user_id, count=50, extract=False, end_cursor=end_cursor)
    user = results['data']['user']
    edge_followed_by = user['edge_followed_by']

    # do what you want with the data!

    # pagination attributes
    end_cursor = edge_followed_by['page_info']['end_cursor']
    has_next_page = edge_followed_by['page_info']['has_next_page']
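The loop above can also be packaged as a reusable generator. This is a sketch of my own rather than library functionality: the fetch_page parameter stands in for a call shaped like client.user_followers, and is injected so the pagination logic can be exercised without touching the network. The optional delay is a crude way to stay under rate limits.

```python
import time

# Sketch: a reusable paginated scraper. fetch_page is any callable with the
# shape of client.user_followers; injecting it keeps the pagination logic
# independent of the actual network call.
def iter_followers(fetch_page, user_id, page_size=50, delay=0.0):
    end_cursor = None
    has_next_page = True
    while has_next_page:
        results = fetch_page(user_id, count=page_size, extract=False,
                             end_cursor=end_cursor)
        edge_followed_by = results['data']['user']['edge_followed_by']
        for edge in edge_followed_by.get('edges', []):
            yield edge['node']
        page_info = edge_followed_by['page_info']
        end_cursor = page_info['end_cursor']
        has_next_page = page_info['has_next_page']
        if has_next_page and delay:
            time.sleep(delay)  # simple throttling between pages
```

Usage would then look something like for node in iter_followers(client.user_followers, user_id, delay=2.0): ..., consuming followers lazily one page at a time.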
There you have it! A Python script that can help you analyze follower information for public Instagram profiles. The complete implementation of the script can be found at the project’s repository.
Closing Thoughts
- Instagram has very strict API throttling and bot detection mechanisms in place. It is always wise to respect these limits when scraping information.
- An unavoidable downside of this approach is the need for authentication. Performing repeated authentications is a surefire way to get a profile banned. However, the Instagram Private API does provide the capability to cache and reuse authenticated sessions. If this is something you are interested in, have a look at this example.
- High-volume scraping can in theory be performed by integrating proxies into your implementation. However, approaches like these are often unreliable, extremely fragile, and almost never worth your time and effort.
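To make the session-caching point concrete, here is a minimal sketch of persisting client settings to disk between runs. The file round-trip below is straightforward; the commented usage is an assumption modeled on the library's own examples (a settings dict on the client and a settings keyword on the constructor), so check the linked example for the exact interface your version exposes.

```python
import json
import os

# Sketch: persist client settings between runs so we authenticate once and
# reuse the session afterwards. These helpers only handle the file
# round-trip; the shape of the settings dict comes from the library.
SETTINGS_FILE = 'session.json'

def save_settings(settings, path=SETTINGS_FILE):
    with open(path, 'w') as f:
        json.dump(settings, f)

def load_settings(path=SETTINGS_FILE):
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)

# Hypothetical usage, modeled on the library's examples:
# cached = load_settings()
# if cached:
#     client = Client(username=AUTH_USERNAME, password=AUTH_PASSWORD,
#                     settings=cached)
# else:
#     client = Client(username=AUTH_USERNAME, password=AUTH_PASSWORD,
#                     authenticate=True)
#     save_settings(client.settings)
```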