
This is a translation of an article originally written in Polish, as it is mainly intended for readers in my country. The first round of the election was held on June 28.
In a few days we are going to elect a president. Every candidate is, to a lesser or greater extent, active on social media. This activity takes many forms: from trying to reach people organically, through paid advertising campaigns, to campaigns of a quite different kind, in which a whole staff of committed people creates fake accounts to promote a particular candidate more or less subtly. In this post I am not going to chase trolls, although I will touch on the issue. First of all, I want to present the data and make a basic analysis of the activity of different accounts. The collection includes accounts that posted at least one tweet containing at least one of the hashtags below (in any combination of lower-case and upper-case letters):
- #biedron2020
- #bosak2020
- #duda2020
- #holownia2020
- #kidawa2020
- #kosiniak2020
- #trzaskowski2020
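The case-insensitive matching of the hashtags listed above can be sketched as follows. This is only an illustration, not the pipeline actually used for the collection: the function name `tracked_hashtags_in`, the regular expression and the sample tweet are assumptions made for the example.

```python
import re

# The seven hashtags from the list above, stored in lower case.
TRACKED_HASHTAGS = {
    "#biedron2020", "#bosak2020", "#duda2020", "#holownia2020",
    "#kidawa2020", "#kosiniak2020", "#trzaskowski2020",
}

# Assumption: a hashtag is "#" followed by word characters
# (letters, digits, underscore).
HASHTAG_PATTERN = re.compile(r"#\w+")


def tracked_hashtags_in(text):
    """Return the tracked hashtags found in a tweet, ignoring letter case."""
    found = {tag.lower() for tag in HASHTAG_PATTERN.findall(text)}
    return found & TRACKED_HASHTAGS


if __name__ == "__main__":
    # Hypothetical tweet text, used only to demonstrate the matching.
    sample = "Debate tonight! #Duda2020 vs #TRZASKOWSKI2020 #wybory"
    print(sorted(tracked_hashtags_in(sample)))
```

Lower-casing both sides before comparing means that #Duda2020, #DUDA2020 and #duda2020 all count as the same hashtag, while unrelated hashtags are ignored.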
Selection criteria
The analysis does not cover all tweets connected with the election. It could be extended with other, slightly less official hashtags, but I concluded that it is fair enough to start with the hashtags that can be regarded as official. The list leaves out candidates who did not rank near the top of the latest poll. Mrs Kidawa may be an exception, but her activity is worth analysing in relation to Rafał Trzaskowski.
The time frame matters as well. I considered tweets posted from the beginning of April through June 19 inclusive.
Engagement with hashtags distribution
To begin with, here are some very basic numbers showing how many tweets with a given hashtag appeared in the analysed period (01.04-19.06.2020).
The figures are not surprising. Most entries in the whole collection include the hashtags #duda2020 and #trzaskowski2020. Incidentally, given that Rafał Trzaskowski has been a candidate for only a short time, the activity around him is quite high. It is worth stressing that including a hashtag in a post does not, of course, imply a positive message. That is why the numbers above should be treated as general interest (negative and positive) in the candidate. The chart below uses the same numbers but shows tweets with each hashtag as a percentage of the whole collection.

The per-hashtag counts do not add up to the actual total number of tweets, because some posts include several of the analysed hashtags.
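A minimal sketch of why the per-hashtag counts exceed the number of tweets. The four sample tweets are made up, and dividing by the number of distinct tweets is an assumption about how the percentages were computed, not a description of the original Spark job.

```python
from collections import Counter


def hashtag_counts(tweets):
    """Count tweets per hashtag; a tweet with two hashtags is counted twice."""
    counts = Counter()
    for hashtags in tweets:
        counts.update(set(hashtags))  # each hashtag at most once per tweet
    return counts


def hashtag_shares(counts, total_tweets):
    """Percentage of all tweets that contain each hashtag."""
    return {tag: 100.0 * n / total_tweets for tag, n in counts.items()}


# Hypothetical data: each element is the set of tracked hashtags in one tweet.
SAMPLE_TWEETS = [
    {"#duda2020"},
    {"#duda2020", "#trzaskowski2020"},
    {"#trzaskowski2020"},
    {"#bosak2020"},
]

if __name__ == "__main__":
    counts = hashtag_counts(SAMPLE_TWEETS)
    print(sum(counts.values()), "hashtag occurrences in", len(SAMPLE_TWEETS), "tweets")
    print(hashtag_shares(counts, len(SAMPLE_TWEETS)))
```

With this definition the percentages can add up to more than 100%, for the same reason the counts do.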
Activity across months
Let’s break the numbers above down by month. We are analysing April, May and June. Let’s check which hashtag was the most popular.

The huge activity of #duda2020 and #trzaskowski2020 is clearly visible here. The rest are far behind. Although Rafał Trzaskowski's final decision to run was made around the middle of May, #trzaskowski2020 was significantly more active in May than #kidawa2020. The number of posts with #kidawa2020 in April is comparable to that of the other candidates, who are far behind #duda2020, and much lower than the number for #trzaskowski2020. As you can see, Andrzej Duda was the most active that month. There may be many reasons, but the COVID-19 situation surely played a part.
Dates of the accounts creation
Popularity is one thing, but let’s move on to more interesting aspects. Let’s analyse when the accounts that posted tweets with particular hashtags were created. There are a lot of accounts altogether, so we are not going to discuss all of them; instead we want to pick out the months in which the largest numbers of accounts were created. To do this, we group the accounts by the month of their creation. Of course, we only consider accounts that posted at least one tweet with a particular hashtag. The chart below presents the 20 largest groups. Let’s see which hashtags made it.
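The grouping described above can be sketched in plain Python. The function `largest_creation_groups` is a hypothetical helper, not the query used for the charts; the sample rows are taken from the top-5 tables later in this post, so they only illustrate the mechanics and do not reproduce the chart.

```python
from collections import Counter
from datetime import date


def largest_creation_groups(accounts, top_n=20):
    """Group accounts by (hashtag, creation year, creation month).

    `accounts` is an iterable of (hashtag, account_name, join_date) tuples
    for accounts that posted at least one tweet with that hashtag.
    Returns the `top_n` largest groups with their sizes.
    """
    groups = Counter()
    seen = set()
    for hashtag, account, joined in accounts:
        if (hashtag, account) in seen:  # count every account once per hashtag
            continue
        seen.add((hashtag, account))
        groups[(hashtag, joined.year, joined.month)] += 1
    return groups.most_common(top_n)


# A few rows copied from the tables in this post (join dates as listed there).
SAMPLE_ACCOUNTS = [
    ("#duda2020", "polskiidea", date(2020, 3, 5)),
    ("#duda2020", "krepacz", date(2020, 1, 30)),
    ("#holownia2020", "piotr_jancz", date(2020, 2, 4)),
    ("#holownia2020", "joanna07018177", date(2020, 2, 2)),
    ("#holownia2020", "dariopolo9", date(2020, 2, 11)),
]

if __name__ == "__main__":
    for (hashtag, year, month), size in largest_creation_groups(SAMPLE_ACCOUNTS):
        print(f"{hashtag} {year}-{month:02d}: {size} accounts")
```

Raising `top_n` to 30 or 40 corresponds to extending the chart by further groups.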

As you can see, #duda2020, #trzaskowski2020 and #bosak2020 made it into the top twenty. Interestingly, for #duda2020 we see very regular account creation in January 2016, 2017 and 2019. In January 2020 there is a significant increase. February brings a slight decrease in the number of new accounts, followed by another increase: in March, April and May about 220-230 accounts were created. A quite similar trend can be observed for #trzaskowski2020, where the number of new accounts grows steadily from January to May. Where do these numbers come from? It is hard to say definitively. On the one hand, they may point to people mobilising ahead of the upcoming election, or simply to the pandemic: in recent months the lives of many people moved online because of COVID-19, and Twitter is a potential source of the latest world news. On the other hand, as we will see in a moment, people who created an account in 2020 make up quite a large share of those who tweeted using the hashtags #duda2020 and #trzaskowski2020. This may indicate more organised actions.
Below is the same chart, extended by the next 10 groups (top 30).

As you can see, the same hashtags keep appearing. New groups of accounts appeared for #bosak2020 and #duda2020. Let’s check whether anyone new enters the game if we extend the list by another 10 groups.

We have a new player: #holownia2020. Only just, with a single group of 70 accounts created in April 2020, but still 🙂 Apart from that, most of the new groups are connected with #trzaskowski2020. They mainly come from previous years.
Activity of accounts created in 2020
If we start wondering whether some deliberate strategy might be behind the systematic creation of accounts, it is worth checking how active the accounts created in the election year are. The chart below presents this information.

As you can see, the share of accounts created in 2020 is especially significant for some hashtags:
- #holownia2020 – ~29%
- #bosak2020 – ~28%
- #biedron2020 – ~27%
- #duda2020 – ~21%
- #kosiniak2020 – ~20%
- #trzaskowski2020 – ~17%
- #kidawa2020 – ~14%
A percentage alone cannot be an oracle, of course; it is only one of many factors. This is especially apparent for #kosiniak2020, where overall activity was simply low, so each new account noticeably boosted the share. As you know, there are various bots on Twitter. Some of them are not connected with any candidate at all, or even with the election. To keep functioning without being blocked by Twitter, they retweet something from time to time. That is why this is worth keeping in mind when analysing a hashtag as unpopular as #kosiniak2020.
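A small sketch of how such a percentage can be computed, assuming it means the share of accounts (not tweets) created in 2020 among all accounts that used a hashtag. The helper name is hypothetical, and the sample uses only the five join dates from the #holownia2020 table further down, so its result is not comparable with the ~29% above, which covers all accounts.

```python
from datetime import date


def share_created_in_year(join_dates, year=2020):
    """Percentage of accounts whose join date falls in the given year."""
    join_dates = list(join_dates)
    if not join_dates:
        return 0.0
    created = sum(1 for joined in join_dates if joined.year == year)
    return 100.0 * created / len(join_dates)


# Join dates of the five most active #holownia2020 accounts from this post.
HOLOWNIA_TOP5_JOIN_DATES = [
    date(2019, 1, 28),
    date(2020, 2, 4),
    date(2020, 2, 2),
    date(2020, 2, 11),
    date(2010, 8, 21),
]

if __name__ == "__main__":
    print(share_created_in_year(HOLOWNIA_TOP5_JOIN_DATES))
```

The small-sample effect mentioned above is easy to see here: with only five accounts, a single new one shifts the share by 20 percentage points.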
The most active accounts
Let’s go into some detail and check whether particular hashtags have clear leaders. The chart below shows the top 10 accounts writing tweets with at least one of the analysed hashtags.

And here we have a leader 🙂 It is definitely the @aantypofrontb account, using the hashtag #duda2020. Interestingly, the account was more active than the official campaign account of Andrzej Duda. Even if we add up a few of the other most active accounts, @aantypofrontb still created more posts 😉 As for the other hashtags, #bosak2020, #kosiniak2020 and #trzaskowski2020 also made it into the top 10. There, the official accounts of the candidates lead. @konfederacja_ and @polskiidea also stand out among the users.
Summary of the most active accounts
The last chart showed that #duda2020 dominated the top 10. To account for activity under the less popular hashtags, let’s look at each hashtag separately, taking its 5 most active users. This time we will use tables presenting additional information:
- Number of tweets with the given hashtag
- All tweets of the account
- Join date
- Number of followers of the account
- Number of accounts the account follows
- Number of likes the account has given to tweets
- Tweets with the given hashtag as a percentage of all posts with this hashtag
I am not going to analyse all the users; I will leave that to the reader. However, it is worth paying attention to:
- The relationship between the number of followers and the number of followed accounts. If both values are similar, it may mean that the account is part of a bigger network of accounts managed by someone, because such accounts follow one another to look credible. Of course, the crucial word here is “may”, as such numbers can result from other things.
- The relationship between likes and other activity, e.g. tweets (it would be good to have more detailed information about how many replies, tweets and retweets there were. Maybe I will add it in the future 😉 ). If the number of likes is conspicuously high (especially relative to tweets that are not retweets or replies), it may indicate that we are dealing with a bot. A “like” is a much simpler mechanism than a substantive tweet.
- The share of tweets with the examined hashtag among all tweets of a particular account (this is not the last column; that one shows the share in all tweets with the hashtag).
- The account creation date. I encourage some creativity here 🙂 It may turn out that the creation dates of some accounts are not random. Apart from obvious dates, such as those in the months just before the election, others may matter too, e.g. a previous election or significant events that demanded greater activity (e.g. a controversial law, a scandal, etc.).
- It is always a good idea to look at a chosen account yourself, directly on Twitter. I have my own observations, but I will keep them to myself 😉
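The ratios from the list above can be computed directly from the table columns. The sketch below is only a convenience for the reader: the class and function names are hypothetical, the two sample rows are copied from the #duda2020 table, and no threshold is implied, since the post does not define one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccountRow:
    """One row of the per-hashtag tables in this post."""
    name: str
    hashtag_tweets: int
    all_tweets: int
    followers: int
    following: int
    likes: int


def ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def indicators(row: AccountRow) -> dict:
    """Compute the three ratios discussed in the list above."""
    own_share = ratio(row.hashtag_tweets, row.all_tweets)
    return {
        # Values close to 1 mean similar follower and following counts.
        "followers_to_following": ratio(row.followers, row.following),
        # Likes given per tweet written (all tweets, as the tables do not
        # separate original tweets from retweets and replies).
        "likes_per_tweet": ratio(row.likes, row.all_tweets),
        # Share of the account's own tweets that carry the hashtag, in percent.
        "hashtag_share_of_own_tweets_pct": (
            None if own_share is None else 100.0 * own_share
        ),
    }


# Two rows copied from the #duda2020 table in this post.
SAMPLE_ROWS = [
    AccountRow("polskiidea", 1117, 9325, 3123, 3192, 10981),
    AccountRow("andrzejduda2020", 1346, 3182, 19895, 152, 170),
]

if __name__ == "__main__":
    for row in SAMPLE_ROWS:
        print(row.name, indicators(row))
```

These numbers are only hints for a closer look at an account; none of them proves anything on its own.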
#biedron2020
Account | Tweets with hashtag | All tweets | Join date | Followers | Following | Likes | % of tweets with hashtag |
robertbiedron | 439 | 19011 | 17.01.2012 | 248334 | 1912 | 32317 | 12.71 |
b_maciejewska | 75 | 4155 | 27.01.2013 | 3528 | 960 | 4177 | 2.17 |
poselttrela | 65 | 4115 | 06.04.2012 | 3769 | 1219 | 13220 | 1.88 |
krutulpawel | 49 | 1469 | 25.03.2019 | 592 | 311 | 4604 | 1.42 |
lukkubiak | 45 | 9162 | 31.08.2014 | 725 | 300 | 84669 | 1.3 |
#bosak2020
Account | Tweets with hashtag | All tweets | Join date | Followers | Following | Likes | % of tweets with hashtag |
bosak2020 | 1861 | 5594 | 15.11.2019 | 9962 | 362 | 933 | 18.68 |
konfederacja_ | 912 | 21814 | 07.02.2019 | 46696 | 553 | 14609 | 9.15 |
ruchnarodowy | 284 | 14798 | 28.01.2013 | 32833 | 538 | 6904 | 2.85 |
true_poland | 134 | 10640 | 20.11.2018 | 3832 | 5000 | 21191 | 1.35 |
synrybaka56 | 87 | 6950 | 10.03.2018 | 89 | 97 | 9639 | 0.87 |
#duda2020
Account | Tweets with hashtag | All tweets | Join date | Followers | Following | Likes | % of tweets with hashtag |
aantypofrontb | 4858 | 32537 | 30.09.2019 | 4323 | 3493 | 43775 | 8.18 |
andrzejduda2020 | 1346 | 3182 | 22.02.2018 | 19895 | 152 | 170 | 2.27 |
polskiidea | 1117 | 9325 | 05.03.2020 | 3123 | 3192 | 10981 | 1.88 |
szczery2015 | 730 | 28811 | 12.02.2017 | 6746 | 4986 | 204201 | 1.23 |
krepacz | 672 | 4922 | 30.01.2020 | 1165 | 2636 | 6127 | 1.13 |
#holownia2020
Account | Tweets with hashtag | All tweets | Join date | Followers | Following | Likes | % of tweets with hashtag |
frankee_88 | 464 | 5513 | 28.01.2019 | 237 | 130 | 12234 | 5.81 |
piotr_jancz | 426 | 5902 | 04.02.2020 | 257 | 146 | 7056 | 5.34 |
joanna07018177 | 345 | 999 | 02.02.2020 | 106 | 203 | 9642 | 4.32 |
dariopolo9 | 279 | 2707 | 11.02.2020 | 193 | 171 | 1098 | 3.49 |
piotrasik | 238 | 2089 | 21.08.2010 | 271 | 269 | 6627 | 2.98 |
#kidawa2020
Account | Tweets with hashtag | All tweets | Join date | Followers | Following | Likes | % of tweets with hashtag |
astonhedge | 169 | 68855 | 28.05.2015 | 745 | 178 | 4853 | 3.77 |
arturostrowski6 | 86 | 11545 | 15.06.2019 | 107 | 125 | 12160 | 1.92 |
ewblomsilniraze | 71 | 29432 | 12.11.2013 | 781 | 1101 | 25676 | 1.58 |
rbodaszewski | 65 | 46718 | 24.07.2012 | 1339 | 4922 | 63349 | 1.45 |
sebastianprycza | 61 | 6835 | 30.10.2013 | 390 | 152 | 9094 | 1.36 |
#kosiniak2020
Account | Tweets with hashtag | All tweets | Join date | Followers | Following | Likes | % of tweets with hashtag |
kosiniakkamysz | 654 | 8606 | 18.11.2012 | 123052 | 1258 | 11094 | 28.46 |
aneta_marciszek | 336 | 8117 | 17.02.2018 | 224 | 415 | 17375 | 14.62 |
janparadowski_ | 116 | 1090 | 11.05.2017 | 86 | 93 | 1182 | 5.05 |
wiesci24pl | 52 | 13714 | 30.06.2017 | 3043 | 65 | 1 | 2.26 |
sondazep | 51 | 3688 | 08.04.2020 | 5194 | 370 | 838 | 2.22 |
#trzaskowski2020
Account | Tweets with hashtag | All tweets | Join date | Followers | Following | Likes | % of tweets with hashtag |
trzaskowski2020 | 711 | 1577 | 14.05.2020 | 36632 | 189 | 924 | 1.82 |
marioawario | 544 | 122481 | 26.04.2016 | 4262 | 3746 | 461376 | 1.39 |
ciotkaogda | 417 | 1756 | 13.12.2015 | 76 | 505 | 5765 | 1.07 |
po_podkarpackie | 370 | 49410 | 14.09.2016 | 3207 | 3359 | 55971 | 0.95 |
aniqua16 | 343 | 185827 | 29.05.2018 | 922 | 897 | 168864 | 0.88 |
Users’ activity in various hashtags
An interesting complement to the data above is information on how individual users used the hashtags of different candidates at the same time, in the same tweets. This is shown well by the graph below, where two accounts from each group can be found. I reduced the number of accounts per hashtag to 3 to keep the graph legible. Blue nodes are users and yellow nodes are the hashtags of particular candidates. The thickness of an edge (relation) reflects the number of tweets written by the given account.

The most active accounts mainly shared posts with one or two hashtags. @astonhedge stands out here, having used all the hashtags. Although the graph does not include many points, it still shows that the contest was largely between #kidawa2020, #trzaskowski2020 and #duda2020. If we excluded @astonhedge, it would turn out that #bosak2020, #holownia2020 and #kosiniak2020 have no connection with the accounts that mention Andrzej Duda and Rafał Trzaskowski. Of course, this is only a small, selected group, but at the same time it represents the most active users.
A little bonus to finish. Here is a visualization of the graph of all relations (user -> hashtag), prepared with the Graphistry tool, which makes it possible to display connections in large data sets.
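The user -> hashtag graph can be approximated with a weighted edge list. The sketch below uses plain Python instead of Neo4j and neovis.js, and the user names and tweets in it are invented for the example; only the idea of weighting an edge by the number of tweets comes from the description above.

```python
from collections import Counter, defaultdict


def build_edges(tweets):
    """Build weighted user -> hashtag edges.

    `tweets` is an iterable of (user, hashtags) pairs, one per tweet.
    The weight of an edge is the number of tweets by that user
    containing that hashtag (the edge thickness in the graph).
    """
    edges = Counter()
    for user, hashtags in tweets:
        for tag in set(hashtags):
            edges[(user, tag)] += 1
    return edges


def hashtags_per_user(edges):
    """Map each user to the set of hashtags they are connected to."""
    result = defaultdict(set)
    for user, tag in edges:
        result[user].add(tag)
    return dict(result)


# Hypothetical tweets: user_a uses two hashtags, user_b only one.
SAMPLE_TWEETS = [
    ("user_a", {"#duda2020", "#trzaskowski2020"}),
    ("user_a", {"#duda2020"}),
    ("user_b", {"#bosak2020"}),
]

if __name__ == "__main__":
    edges = build_edges(SAMPLE_TWEETS)
    for (user, tag), weight in sorted(edges.items()):
        print(f"{user} -> {tag}: {weight}")
    print(hashtags_per_user(edges))
```

Users connected to several hashtag nodes are the ones that link the candidates' clusters in the picture, as a single account does in the graph discussed above.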

The confrontation between the two main contenders, with the other candidates lagging behind, is apparent here. The graph itself could be the subject of another article, aimed at discovering various kinds of groups or key accounts. We will probably take up this topic in the future 😉
Summary
To gather the data and perform the analysis I used tools including Twint, NiFi, Spark, Neo4j, neovis.js and Zeppelin (with Highcharts).
The data above and this fairly superficial analysis reveal interesting relationships and bring out accounts that stand out in various respects. As a next step, the data set can be extended with other hashtags to check whether any significant changes appear. Most likely we would get more interesting results for Szymon Hołownia, as he seems to use other hashtags, such as #ZoltaRewolucja or #EkipaSzymona, at least as heavily, but then the data set would have to be extended considerably for the other candidates as well. Examining tweet sentiment or grouping tweets by the words used would certainly be interesting. Finally, the accounts could be checked with a Machine Learning model trained for bot detection. There are lots of possibilities 🙂 Soon I will be posting similar analyses that use the mechanisms mentioned.
Follow me on Twitter (@jca3s) for latest posts!