Scraping Instagram Follower Information with Python

Priyath Gregory
4 min read · May 16, 2021

Disclaimer: This tutorial is only for demonstration purposes and does not encourage TOS violations through web scraping.

Recently I tried scraping Instagram follower information from public profiles for a personal project of mine. The project didn’t amount to much, but I did get a nice Python script out of it. Here are the implementation details.

Instagram Private API

The implementation relies on a pretty comprehensive Instagram library written in Python called Instagram Private API. In the words of the library’s author,

The library was written to access Instagram’s API when they clamped down on developer access. The goal of the project is to achieve parity with the official public API.

Feel free to check out the documentation, which in my opinion is quite well done and covers a lot of very useful functionality. Right now our focus will be on a small subset of functions that let us interact with the Instagram API.

Initial Setup

Create a directory for your project and create a scrape.py file. Next, clone this fork of the Instagram Private API into the root of your project folder. The fork has a fix for an authentication bug in the original repository.

Your project structure should now look like this:

insta-followers-scraper
----scrape.py
----instagram_private_api
--------instagram_private_api
--------instagram_web_api
--------// other files

Note: The Instagram Private API functionality is grouped under two separate modules: instagram_private_api and instagram_web_api. We only need functionality from the instagram_web_api module for this script.

Authentication

To successfully retrieve the complete list of followers from a public profile, your request should be authenticated. This is pretty annoying, but fortunately the instagram_web_api module allows us to create an authenticated client through a very simple interface.

from instagram_private_api.instagram_web_api import Client

AUTH_USERNAME = ''
AUTH_PASSWORD = ''

# authenticate client
try:
    client = Client(username=AUTH_USERNAME, password=AUTH_PASSWORD, authenticate=True)
except Exception as e:
    print('auth error', e)
    raise e

if client.is_authenticated:
    print('Client authenticated')
else:
    print('Client not authenticated')

Update AUTH_USERNAME and AUTH_PASSWORD with valid credentials and test the script using python3 scrape.py . If all goes well, you should see Client authenticated printed on your console.

Retrieving the User ID

Now let’s get down to business! If you look at the client object we just created, it has a user_followers method that we can use to retrieve follower information. The method has the following signature:

@login_required
def user_followers(self, user_id, **kwargs):

If you look carefully, this method expects a user_id rather than a username. Fortunately, if we know the username, the user_id can be retrieved fairly easily like so:

result = client.user_info2('<your target account>')
user_id = result['id']
print('user id retrieved: {}'.format(user_id))

If all goes well, you should now have the user_id of your target account!

Scraping Follower Information

Now we can revisit the user_followers method with the following code snippet, which returns an object containing follower information for the specified user_id:

# extract=False skips a data sanitization step that would otherwise strip out useful meta information
results = client.user_followers(user_id, count=50, extract=False)
user = results['data']['user']

If you inspect the user object, it contains a whole bunch of useful information related to the target profile as well as the follower accounts that we requested. A few of these are shown below:

# total follower count
edge_followed_by = user['edge_followed_by']
print('follower count: {}'.format(edge_followed_by['count']))

# followers array
followers = []
followers.extend(edge_followed_by.get('edges', []))

# print follower information
for follower in followers:
    print('username: {}'.format(follower['node']['username']))
    print('full name: {}'.format(follower['node']['full_name']))
    print('user id: {}'.format(follower['node']['id']))
    print('is private: {}'.format(follower['node']['is_private']))

# page info. This is required for pagination. More info on this below!
print(edge_followed_by['page_info'])

Now it should be noted that user_followers will only retrieve a maximum of 50 followers per request. However, the endpoint does support pagination, which we can use to our advantage.

To perform a paginated scrape we can implement a simple loop that uses a few attributes within the page_info object like so:

has_next_page = True
end_cursor = None

while has_next_page:
    # note the additional end_cursor parameter
    results = client.user_followers(user_id, count=50, extract=False, end_cursor=end_cursor)
    user = results['data']['user']
    edge_followed_by = user['edge_followed_by']

    # do what you want with the data!

    # pagination attributes
    end_cursor = edge_followed_by['page_info']['end_cursor']
    has_next_page = edge_followed_by['page_info']['has_next_page']

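The pagination loop above can also be factored into a small reusable helper. This is a sketch of my own, not part of the library: fetch_page stands in for the client.user_followers call, so the helper can be exercised (and tested) without hitting Instagram.

```python
import time

def collect_followers(fetch_page, delay=1.0):
    """Accumulate follower nodes across pages.

    fetch_page(end_cursor) must return the 'edge_followed_by' dict
    from a user_followers response: {'edges': [...], 'page_info': {...}}.
    A small delay between pages keeps the request rate modest.
    """
    followers = []
    end_cursor = None
    while True:
        edge_followed_by = fetch_page(end_cursor)
        followers.extend(edge['node'] for edge in edge_followed_by.get('edges', []))
        page_info = edge_followed_by['page_info']
        if not page_info['has_next_page']:
            return followers
        end_cursor = page_info['end_cursor']
        time.sleep(delay)  # be gentle with the endpoint
```

With the authenticated client from earlier, fetch_page could be something like lambda cursor: client.user_followers(user_id, count=50, extract=False, end_cursor=cursor)['data']['user']['edge_followed_by'].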
There you have it! A Python script that can help you analyze follower information for public Instagram profiles. The complete implementation of the script can be found at the project’s repository.

Closing Thoughts

  • Instagram has very strict API throttling and bot detection mechanisms in place. It is always wise to respect these limits when scraping information.
  • An unavoidable downside of this approach is the need for authentication. Performing repeated authentications is a surefire way to get a profile banned. However, the Instagram Private API does provide the capability to cache and reuse authenticated sessions. If this is something you are interested in, have a look at this example.
  • High-volume scraping can in theory be performed by integrating proxies into your implementation. However, such approaches are often unreliable, extremely fragile, and almost never worth your time and effort.
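On the throttling point: one common pattern is to wrap each request in a retry-with-backoff helper, so transient rate-limit errors pause the scrape instead of killing it. A minimal sketch (the helper and its parameters are my own, not part of the library):

```python
import time

def with_backoff(call, retries=3, base_delay=5.0):
    """Run call(); on failure, wait exponentially longer before retrying."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))  # 5s, 10s, 20s, ...
```

For example: results = with_backoff(lambda: client.user_followers(user_id, count=50, extract=False)).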

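On session reuse: the essential idea is to persist the client's settings to disk after the first login and pass them back on the next run, rather than authenticating from scratch. A hedged sketch of the save/load side (the file path is illustrative, and the exact constructor kwargs for reusing a session may vary by library version; check the library's own examples):

```python
import json
import os

SETTINGS_FILE = 'session.json'  # illustrative path

def load_cached_settings(path=SETTINGS_FILE):
    """Return previously saved client settings, or None on a first run."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)

def save_settings(settings, path=SETTINGS_FILE):
    """Persist the client's settings dict for reuse across runs."""
    with open(path, 'w') as f:
        json.dump(settings, f)
```

On subsequent runs you would check load_cached_settings() first and only fall back to a fresh username/password login (saving the resulting settings) when nothing is cached.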