StackOverflow – technology map

The number of technologies in use these days can be overwhelming. In this article, I’ll show you how to use StackOverflow to better understand which technologies are often connected to one another. You might find this topic interesting if you are a Sourcer/Recruiter (to find better search keywords or even potential candidates), a Manager (to gain a high-level understanding) or simply tech-savvy (to stay up to date). I’ve divided the article into two parts. Today, I’m going to focus on the less technical side, and we’ll go through 3 example tags. In the next post, I’ll describe the technical aspects and show how you can run my script.

StackOverflow

As you probably know, StackOverflow is a Q&A platform for developers. It’s part of the wider StackExchange network. There is a lot of discussion about how SO should be used. Sometimes people don’t think much about a posted answer and just copy-paste the code, which is not always correct. There are also many trivial questions. Nevertheless, it’s a very active community used by developers all over the world. That’s why it’s a great source of data about technologies.

Putting StackOverflow tags into a graph

I’ve written a Python script which does the following:

  1. Gathers Q&A threads with the tag you provide. From now on, we treat the term tag as equivalent to the term technology.
  2. Collects all the users who have provided answers in those threads.
  3. Builds a list of tags for each user. StackOverflow exposes, for each user, the tags he/she was active in.
  4. Transforms the relation: user -> tag into a graph.
  5. Plots the graph in Graphistry.
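To make step 4 more concrete, here is a minimal sketch of how a user -> tag relation could be turned into a list of graph edges. It is not the actual script: the sample users and tags are made up, and the helper names build_edges and users_per_tag are hypothetical. Steps 1-3 (downloading the data) are replaced by a hard-coded dictionary.

```python
from collections import defaultdict

# Made-up sample data standing in for steps 1-3: for each user who answered
# in threads with the chosen tag, the tags that user was active in.
user_tags = {
    "john": ["spark", "hadoop", "bigdata"],
    "anna": ["spark", "kafka"],
    "li": ["bigdata", "hadoop"],
}


def build_edges(user_tags):
    """Step 4: turn the user -> tags mapping into directed (user, tag) edges."""
    edges = []
    for user, tags in user_tags.items():
        for tag in sorted(set(tags)):  # de-duplicate tags per user
            edges.append((user, tag))
    return edges


def users_per_tag(edges):
    """Read the same edges from the tag side: which users point at each tag."""
    grouped = defaultdict(set)
    for user, tag in edges:
        grouped[tag].add(user)
    return dict(grouped)


edges = build_edges(user_tags)
by_tag = users_per_tag(edges)

for tag, users in sorted(by_tag.items()):
    print(tag, sorted(users))
```

An edge list in this shape (one source column, one destination column) is the kind of input that graph visualization tools usually expect.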

Tags don’t always represent technologies, but they very often do. Let’s assume we are a Sourcer/Recruiter looking for a Big Data Engineer. We have probably received a fairly typical list of requirements from the customer. We want to get to know the profile better and build better-suited search keywords.

Finding a Big Data Engineer!

I’ve run my script with the bigdata tag and built the graph. You can click on the image below to open the interactive mode.

At the beginning, there’s a bit of chaos, but you can filter the data (in the centre of the top toolbar) to get a more readable picture. Here are some starting hints:

  • Points, called nodes, represent tags/technologies (red) and users (blue).
  • Lines, called edges, represent the relationships between a user and a tag mentioned above. For example, if a user John has written an answer to a question tagged spark, there will be two nodes, John and spark, and a line (edge) from John to spark.
  • Nodes (points) and edges (lines) have attributes describing them. One useful attribute is the degree, which is the total number of incoming and outgoing edges of a node. If you want to filter out less important nodes (users and technologies), you can add a filter: point:degree >= 10. It will hide the tags related to fewer than 10 users and the accounts bound to fewer than 10 tags.
  • If you want to show only tag nodes that are related to at least a given number of users (let’s say 50), set the filter to: point:degree_in >= 50.
  • You can change the visual settings of the graph. To do this, click on the brush button. I recommend that you decrease “Edge size” and “Edge opacity” in bigger graphs. You can also increase “Max Points of interest” in the Label settings.
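If the degree filters feel abstract, the sketch below mimics them on a tiny, made-up edge list. It only illustrates the idea behind point:degree and point:degree_in, not how Graphistry computes them internally. The function names are hypothetical, and the code assumes that no user name is identical to a tag name.

```python
from collections import Counter

# Made-up edges, each pointing from a user to a tag.
edges = [
    ("john", "spark"), ("john", "hadoop"),
    ("anna", "spark"), ("anna", "kafka"),
    ("li", "spark"),
]


def degrees(edges):
    """Count outgoing edges (users), incoming edges (tags) and their sum."""
    degree_out = Counter(user for user, _ in edges)
    degree_in = Counter(tag for _, tag in edges)
    degree = degree_out + degree_in
    return degree, degree_in, degree_out


def filter_nodes(counts, minimum):
    """Keep the nodes whose count is at least `minimum`."""
    return {node for node, value in counts.items() if value >= minimum}


degree, degree_in, degree_out = degrees(edges)

# Like point:degree_in >= 3: edges only point at tags, so only tags can pass.
popular_tags = filter_nodes(degree_in, 3)

# Like point:degree >= 2: keeps both active users and well-connected tags.
busy_nodes = filter_nodes(degree, 2)

print(sorted(popular_tags))
print(sorted(busy_nodes))
```

Because every edge runs from a user to a tag, users never have incoming edges. That is why a degree_in filter leaves only tag nodes on the graph.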

I’ve also gathered data for the mlops and kubeflow tags.

Graph for tag “mlops”. Click to open interactive view on Graphistry Hub.
Graph for tag “kubeflow”. Click to open interactive view on Graphistry Hub.

Searching by a tag or user

Graphistry provides a useful feature: the Data Table. It lists all nodes and edges in two tables. To display it, click on the Data Table icon (top toolbar). You’ll see a table like this:

Graphistry – Data Table with search

It’s especially helpful when you have a large number of nodes and edges, and the first view is not very clear. You may have difficulty finding the kubeflow tag on the last graph. In such a case, open the Data Table, type kubeflow in the search field and click on the row. The point will be highlighted on the graph.

Selecting the right tag

The MLOps and Kubeflow datasets have a lot in common, but they are not the same. This is a good example of two approaches. When you want to find tags/technologies related to a specific one, like Kubeflow in this case, you can run my script for that specific technology. You can also start with a more general term, like MLOps, to find a set of similar technologies, and then look for more specific ones.
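One simple way to see how much two tag datasets overlap is to compare them as sets. The sketch below is only an illustration: the tag sets are invented for the example and are not the real results for mlops and kubeflow. The function name compare_tag_sets is hypothetical.

```python
def compare_tag_sets(first, second):
    """Return shared tags, tags unique to each side and the Jaccard similarity."""
    common = first & second
    only_first = first - second
    only_second = second - first
    union = first | second
    # Jaccard similarity: shared tags divided by all distinct tags.
    jaccard = len(common) / len(union) if union else 0.0
    return common, only_first, only_second, jaccard


# Invented example data, NOT the real graph contents.
mlops_tags = {"mlflow", "kubeflow", "docker", "python"}
kubeflow_tags = {"kubeflow", "kubernetes", "docker", "tensorflow"}

common, only_mlops, only_kubeflow, jaccard = compare_tag_sets(
    mlops_tags, kubeflow_tags
)

print("shared:", sorted(common))
print("only in the general dataset:", sorted(only_mlops))
print("only in the specific dataset:", sorted(only_kubeflow))
print("similarity:", round(jaccard, 2))
```

The shared tags tell you what both searches agree on. The tags unique to the more general dataset are candidates for further, more specific runs.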

How do you run the script? If you are familiar with Python, it’s really easy, and if you’re not a technical person, you won’t find it difficult either 🙂 We’ll focus on this in the next article. If you can’t wait and want to test the script right away, go to the repo: https://github.com/data-hunters/tech-skills-visualizer. Stay tuned!
