This is a follow-up to a short post I wrote on R Access to Twitter’s v2 API. In this post I’ll walk through a few more examples of pulling data from Twitter using a mix of Twitter’s v2 API and the {rtweet} package [1].
I’ll pull all Twitter users that I (brshallo) have recently been engaged by (e.g. they liked my tweet) or engaged with (e.g. I liked their tweet). I’ll lean towards using {rtweet} [2] but will use {httr} in cases where it’s more convenient to use Twitter’s v2 API [3].
For this post I’m not worried about optimizing my queries, minimizing API hits, etc. For example, when using {rtweet} I should authenticate through my project app, which has higher rate limits (see Authentication options), but instead I just use the default {rtweet} user authentication. Note also that the default {rtweet} authentication only works when running scripts interactively [4].
See the prior post for links on authentication mechanisms. I’m assuming you have “TWITTER_BEARER” [5] as well as “TWITTER_PAT” [6] in your .Renviron file.
library(rjson)
library(httr)
library(jsonlite)
library(dplyr)
library(purrr)
library(lubridate)
library(rtweet)
library(tidyr)
# bearer_token only used when using httr and twitter v2 API
bearer_token <- Sys.getenv("TWITTER_BEARER")
headers <- c(`Authorization` = sprintf('Bearer %s', bearer_token))
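An empty bearer token only shows up later as a failed request, so it can help to check for it when building the header. The following is a minimal sketch, not part of the original script; `make_bearer_headers()` is a hypothetical helper name and `"abc123"` is a placeholder token.

```r
# Hypothetical helper: build the Authorization header used with httr,
# stopping early when the token is missing.
make_bearer_headers <- function(token = Sys.getenv("TWITTER_BEARER")) {
  if (!nzchar(token)) {
    stop("TWITTER_BEARER is not set; add it to .Renviron and restart R.")
  }
  c(Authorization = sprintf("Bearer %s", token))
}

# Placeholder token, for illustration only
demo_headers <- make_bearer_headers("abc123")
demo_headers
```

In the script above, `headers <- make_bearer_headers()` would then take the place of the last two lines of the set-up chunk.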
GETting all engagements
In each sub-section I’ll pull a different kind of engagement.
- GET favorited users
- GET all tweets from user – starting point for most of the following sections
- From initial query GET references in those tweets
- Filter to only tweets with likes, GET favoriters
- Filter to only tweets with quotes, search URLs to GET quoters
- Filter to only tweets with retweets, GET retweeters
- GET repliers and mentions
I’ll finish by putting them together into a function. Note that not all of the queries capture every engagement [7].
GET favorited users
It’s often easiest to just let {rtweet} do the work.
# Twitter id for brshallo
user_id <- "307012324"
favorites <- rtweet::get_favorites(user = user_id)
GET all tweets from user
This pulls up to 100 of the most recent tweets from a user [8].
url_handle <- glue::glue("https://api.twitter.com/2/users/{user_id}/tweets?max_results=100", user_id = user_id)
params <- list(tweet.fields = "public_metrics,created_at,in_reply_to_user_id,referenced_tweets")
response <- httr::GET(
  url = url_handle,
  httr::add_headers(.headers = headers),
  query = params
)
obj <- httr::content(response, as = "text")
json_data <- jsonlite::fromJSON(obj, flatten = TRUE)$data %>%
  as_tibble()
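The `flatten = TRUE` argument is what produces dotted column names such as `public_metrics.like_count`, which the later sections filter on. Here is a minimal sketch using a made-up, trimmed-down response body (not real API output), so it runs without a token:

```r
library(jsonlite)

# Hypothetical response body, shaped like the v2 tweets payload
body <- '{"data":[
  {"id":"1","text":"a","public_metrics":{"like_count":2,"retweet_count":0}},
  {"id":"2","text":"b","public_metrics":{"like_count":0,"retweet_count":1}}
]}'

# The nested public_metrics object is flattened into dotted column names
tweets_demo <- jsonlite::fromJSON(body, flatten = TRUE)$data
names(tweets_demo)

# Same kind of filter as used later, written in base R
liked_demo <- tweets_demo[tweets_demo$public_metrics.like_count > 0, ]
liked_demo$id
```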
GET references
statuses_referenced <- bind_rows(json_data$referenced_tweets) %>%
  rename(status_id = id)
users_referenced <- rtweet::lookup_tweets(statuses_referenced$status_id)
GET favoriters
Filter the initial query of tweets to only those with more than 0 likes.
liked_tweets <- json_data %>%
  filter(public_metrics.like_count > 0)
Next, wrap the approach for getting favoriters from the prior post, R Access to Twitter’s v2 API, in a function and map the tweet ids through it.
tweet_ids <- liked_tweets$id
get_favoriters <- function(tweet_id){
  url_handle <- glue::glue("https://api.twitter.com/2/tweets/{status_id}/liking_users", status_id = tweet_id)
  response <- httr::GET(
    url = url_handle,
    httr::add_headers(.headers = headers)
  )
  obj <- httr::content(response, as = "text")
  x <- rjson::fromJSON(obj)
  # one row per liking user
  x$data %>%
    map_dfr(as_tibble)
}
tweet_favoriters <-
  map_dfr(tweet_ids, ~ bind_cols(tibble(liked_status_id = .x),
                                 get_favoriters(.x))) %>%
  rename(user_id = id)
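The parsing step above relies on the response containing a `data` element. As a hedged sketch, assuming a response can come back with only a `meta` element when there are no results, a small base R helper can return a zero-row data frame in that case. `records_to_df()` and both payloads below are hypothetical and only mimic the shape of `rjson::fromJSON()` output.

```r
# Hypothetical helper: turn the parsed `data` element (a list of records)
# into a data frame, or a zero-row data frame when `data` is absent.
records_to_df <- function(parsed) {
  records <- parsed$data
  if (is.null(records) || length(records) == 0) {
    return(data.frame(id = character(), name = character(),
                      username = character(), stringsAsFactors = FALSE))
  }
  rows <- lapply(records, function(r) {
    as.data.frame(r[c("id", "name", "username")], stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Made-up payloads (no API call)
with_likes <- list(data = list(
  list(id = "1", name = "A", username = "a"),
  list(id = "2", name = "B", username = "b")
))
no_likes <- list(meta = list(result_count = 0))

likers_demo <- records_to_df(with_likes)
empty_demo <- records_to_df(no_likes)
```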
GET quoters
Filter to only posts with quotes.
tweet_ids_quoters <- json_data %>%
  filter(public_metrics.quote_count > 0) %>%
  pull(id)
However, I am not positive the approach below actually picks up all quotes [9]. I also reviewed some other approaches [10].
search_tweets_urls <- function(tweet_id){
  rtweet::search_tweets(
    glue::glue("url:{tweet_id}", tweet_id = tweet_id)
  )
}
quoters <- map_dfr(tweet_ids_quoters, search_tweets_urls) %>%
  filter(is_quote) %>%
  as_tibble()
GET retweeters
Filter to only posts that were retweeted.
tweet_ids_rt <- json_data %>%
  filter(public_metrics.retweet_count > 0) %>%
  select(status_id = id)
I use a slightly different approach in this section than in the other, similar sections [11].
retweeters <- tweet_ids_rt %>%
  mutate(retweeters = map(status_id, get_retweeters)) %>%
  unnest(retweeters)
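To make the pattern concrete without hitting the API, here is a small sketch that swaps in a stub for `rtweet::get_retweeters()`. The function `fake_get_retweeters()` and all ids are made up for illustration. It also shows the `map_dfr()` variant used in the other sections producing the same rows.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical stand-in for rtweet::get_retweeters(): one row per
# retweeting user for a given status id (made-up ids, no API call).
fake_get_retweeters <- function(status_id) {
  lookup <- list("101" = c("u1", "u2"), "102" = c("u3"))
  tibble(user_id = lookup[[status_id]])
}

tweet_ids_demo <- tibble(status_id = c("101", "102"))

# mutate() + map() builds a list column; unnest() expands it to rows
retweeters_demo <- tweet_ids_demo %>%
  mutate(retweeters = map(status_id, fake_get_retweeters)) %>%
  unnest(retweeters)

# Equivalent map_dfr() form, as in the favoriters section
retweeters_alt <- map_dfr(
  tweet_ids_demo$status_id,
  ~ bind_cols(tibble(status_id = .x), fake_get_retweeters(.x))
)
```

Both objects hold one row per (status, retweeter) pair, so the choice between the two forms is mostly a matter of taste.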
GET repliers and mentions
Alternatively, you might just use rtweet::get_mentions(), but this only pulls mentions of the currently authenticated user. I also tried other approaches here [12].
get_mentions_v2 <- function(user_id){
  url_handle <- glue::glue("https://api.twitter.com/2/users/{user_id}/mentions", user_id = user_id)
  response <- httr::GET(
    url = url_handle,
    httr::add_headers(.headers = headers)
  )
  obj <- httr::content(response, as = "text")
  x <- rjson::fromJSON(obj)
  # one row per tweet mentioning the user
  x$data %>%
    map_dfr(as_tibble)
}
tweets_mentions <- get_mentions_v2(user_id)
# look up full tweet details (including authors) for the mentioning tweets
repliers_mentions <- lookup_tweets(tweets_mentions$id)
Putting them together into a function
The function at this gist returns the output from each of the above sections as a list.
# Twitter id for brshallo
user_id <- "307012324"
# load function get_engagements()
source("https://gist.githubusercontent.com/brshallo/119d6a1f858e0e5c20d77212dee8891a/raw/751d022c7bc2e2148292bb78a5178737d9914024/get-engagements.R")
brshallo_engagements <- get_engagements(user_id)
brshallo_engagements
## $favorites
## # A tibble: 10 x 91
## user_id status_id created_at screen_name text source
## * <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 248350998 151302361~ 2022-04-10 05:18:34 BuildABarr "Drop ~ Twitt~
## 2 368551889 151263551~ 2022-04-09 03:36:23 IsabellaGh~ "@elli~ Twitt~
## 3 1469531055736590337 151242047~ 2022-04-08 13:21:54 emkayco "Have ~ Twitt~
## 4 35794978 151196918~ 2022-04-07 07:28:38 _lionelhen~ "@brsh~ Twitt~
## 5 29916355 151194957~ 2022-04-07 06:10:44 jimjam_slam "@brsh~ Twitt~
## 6 29916355 151195192~ 2022-04-07 06:20:03 jimjam_slam "@brsh~ Twitt~
## 7 29916355 151189984~ 2022-04-07 02:53:06 jimjam_slam "@mdne~ Twitt~
## 8 3089027769 151189179~ 2022-04-07 02:21:09 gyp_casino "@mdne~ Twitt~
## 9 15772978 151132777~ 2022-04-05 12:59:55 jessicagar~ "@brsh~ Twitt~
## 10 144592995 151129000~ 2022-04-05 10:29:49 Rbloggers "R Acc~ r-blo~
## # ... with 85 more variables: display_text_width <dbl>,
## # reply_to_status_id <chr>, reply_to_user_id <chr>,
## # reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## # favorite_count <int>, retweet_count <int>, quote_count <int>,
## # reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## # urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## # media_t.co <list>, media_expanded_url <list>, media_type <list>, ...
##
## $favoriters
## # A tibble: 90 x 4
## liked_status_id user_id name username
## <chr> <chr> <chr> <chr>
## 1 1512295676004093955 117241741 Brett J. Gall brettjgall
## 2 1512295676004093955 2724597409 Peter Ellis ellis2013nz
## 3 1512294950905409543 274123666 Kristen Downs KristenDDowns
## 4 1512293864517750790 3656879234 <U+5F20><U+4EAE> psychelzh
## 5 1512293864517750790 703843771419484160 Ayush Patel ayushbipinpatel
## 6 1512293864517750790 419185498 Kevin Gilds Kevin_Gilds
## 7 1512293864517750790 127357236 Juan LB Juan_FLB
## 8 1512293864517750790 49451947 Luis Remiro LuisMRemiro
## 9 1512293864517750790 253175044 Nicholas Viau nicholasviau
## 10 1512293864517750790 2202983986 Stefania Klayn Ettti_20
## # ... with 80 more rows
##
## $references
## # A tibble: 12 x 90
## user_id status_id created_at screen_name text source
## <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 307012324 151115943~ 2022-04-05 01:50:59 brshallo "As an~ Twitt~
## 2 307012324 151229344~ 2022-04-08 04:57:09 brshallo "@mdne~ Twitt~
## 3 307012324 150969487~ 2022-04-01 00:51:20 brshallo "It al~ Twitt~
## 4 307012324 151229386~ 2022-04-08 04:58:49 brshallo "@mdne~ Twitt~
## 5 307012324 147233714~ 2021-12-18 22:45:04 brshallo "First~ Twitt~
## 6 29916355 151189984~ 2022-04-07 02:53:06 jimjam_slam "@mdne~ Twitt~
## 7 29916355 151194957~ 2022-04-07 06:10:44 jimjam_slam "@brsh~ Twitt~
## 8 144592995 151129000~ 2022-04-05 10:29:49 Rbloggers "R Acc~ r-blo~
## 9 248350998 151302361~ 2022-04-10 05:18:34 BuildABarr "Drop ~ Twitt~
## 10 3146735425 151226195~ 2022-04-08 02:52:00 mdneuzerling "Lovel~ Twitt~
## 11 983470194982088704 151182189~ 2022-04-06 21:43:22 R4DScommuni~ "The n~ Zapie~
## 12 2724597409 151226515~ 2022-04-08 03:04:44 ellis2013nz "@mdne~ Twitt~
## # ... with 84 more variables: display_text_width <dbl>,
## # reply_to_status_id <chr>, reply_to_user_id <chr>,
## # reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## # favorite_count <int>, retweet_count <int>, quote_count <int>,
## # reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## # urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## # media_t.co <list>, media_expanded_url <list>, media_type <list>, ...
##
## $quoters
## NULL
##
## $retweeters
## # A tibble: 11 x 2
## status_id user_id
## <chr> <chr>
## 1 1512293864517750790 296222670
## 2 1512293864517750790 307012324
## 3 1511869112401596423 4034079677
## 4 1511869112401596423 1306626901432324097
## 5 1511869112401596423 1011817655957893120
## 6 1511469730892156928 1011817655957893120
## 7 1511469730892156928 1306626901432324097
## 8 1511159434717761539 1448348827979747333
## 9 1511159434717761539 15772978
## 10 1511159434717761539 1011817655957893120
## 11 1511159434717761539 1306626901432324097
##
## $referencers
## # A tibble: 10 x 90
## user_id status_id created_at screen_name text source
## <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 61542689 150992063~ 2022-04-01 15:48:26 twelvespot "@brsh~ Twitt~
## 2 61542689 150994022~ 2022-04-01 17:06:17 twelvespot "@brsh~ Twitt~
## 3 18433005 151007180~ 2022-04-02 01:49:09 rcrdleitao "@brsh~ Twitt~
## 4 35794978 151196918~ 2022-04-07 07:28:38 _lionelhen~ "@brsh~ Twitt~
## 5 1346474633520824320 150985661~ 2022-04-01 11:34:03 markjrieke "@brsh~ Twitt~
## 6 29916355 151195192~ 2022-04-07 06:20:03 jimjam_slam "@brsh~ Twitt~
## 7 29916355 151195162~ 2022-04-07 06:18:51 jimjam_slam "@brsh~ Twitt~
## 8 29916355 151194957~ 2022-04-07 06:10:44 jimjam_slam "@brsh~ Twitt~
## 9 15772978 151132777~ 2022-04-05 12:59:55 jessicagar~ "@brsh~ Twitt~
## 10 15772978 151117782~ 2022-04-05 03:04:04 jessicagar~ "@brsh~ Twitt~
## # ... with 84 more variables: display_text_width <dbl>,
## # reply_to_status_id <chr>, reply_to_user_id <chr>,
## # reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## # favorite_count <int>, retweet_count <int>, quote_count <int>,
## # reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## # urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## # media_t.co <list>, media_expanded_url <list>, media_type <list>, ...
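Since the stated goal is the set of users involved in these engagements, one way to finish is to collect the `user_id` column from each element of the returned list. The sketch below uses a small made-up list in place of the real `get_engagements()` output and assumes, as in the printed output above, that each non-NULL element has a `user_id` column.

```r
# Hypothetical stand-in for the list returned by get_engagements()
engagements_demo <- list(
  favoriters = data.frame(user_id = c("1", "2"), stringsAsFactors = FALSE),
  quoters = NULL,
  retweeters = data.frame(user_id = c("2", "3"), stringsAsFactors = FALSE)
)

# Drop NULL elements, pull user_id from each, and de-duplicate
non_empty <- Filter(Negate(is.null), engagements_demo)
engaged_user_ids <- unique(
  unlist(lapply(non_empty, `[[`, "user_id"), use.names = FALSE)
)
engaged_user_ids
```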
[1] Which as of this writing uses the 1.1 API.
[2] As it takes less code.
[3] Or in cases where the field isn’t available in {rtweet}. V2 is not yet supported by {rtweet} but is actively being worked on, so this post may have a short shelf-life.
[4] You’ll need to authenticate through a Twitter developer portal app’s keys if you want to run those sections automatically. You’ll notice that in creating this script I actually don’t evaluate most of the sections and then use some hidden code chunks to return output.
[5] For the sections where I use {httr} in this post.
[6] For the sections where I use {rtweet}. This should be set up through the default {rtweet} set-up.
[7] This seemed to particularly be the case when it came to seeing all quotes and mentions.
[8] The reason I’m using {httr} and v2 instead of {rtweet} for this is that the 1.1 API (that {rtweet} currently uses) doesn’t pull quote count unless you have a premium or enterprise account (rtweet#640).
[9] Thread here seemed to suggest that just searching the URL was the way to go.
[10] This also seems to be a way to see quoters: https://twittercommunity.com/t/how-we-can-get-list-of-replies-on-a-tweet-or-reply-to-a-tweet-in-twitter-api/144958/7
get_quoters <- function(tweet_id){
  url_handle <- glue::glue("https://api.twitter.com/2/tweets/search/recent?tweet.fields=author_id&query=url:{status_id}", status_id = tweet_id)
  response <- httr::GET(
    url = url_handle,
    httr::add_headers(.headers = headers)
  )
  obj <- httr::content(response, as = "text")
  x <- rjson::fromJSON(obj)
  x$data %>%
    map_dfr(as_tibble)
}
quoters <- map(tweet_ids_quoters, get_quoters)
[11] rtweet::get_retweeters() returns far fewer columns than rtweet::search_tweets(), which is why I use select() above and a different method from the sections before and after this one, where I instead use pull() and then pass the ids directly to purrr::map*() statements rather than wrapping them in a mutate() verb, which would have worked just as well. The structures of the manipulation are nearly the same… maybe I should have stayed consistent here and written a function to make clear that the pattern is the same, c’est la vie.
[12] Another simple approach would be to just try rtweet::search_tweets("@brshallo"). I tried the approach below, but it really didn’t seem to work quite as expected…
tweet_ids_repliers <- json_data %>%
  filter(public_metrics.reply_count > 0) %>%
  pull(id)
# pulled from here: https://twittercommunity.com/t/how-to-fetch-retweets-and-quote-tweets-from-the-twitter-v2-search-api/156573
# but didn't really work as expected...
get_replies <- function(tweet_id){
  url_handle <- glue::glue("https://api.twitter.com/2/tweets/search/recent?tweet.fields=author_id&query=conversation_id:{status_id}", status_id = tweet_id)
  response <- httr::GET(
    url = url_handle,
    httr::add_headers(.headers = headers)
  )
  obj <- httr::content(response, as = "text")
  x <- rjson::fromJSON(obj)
  x$data %>%
    map_dfr(as_tibble)
}
repliers <- map(tweet_ids_repliers, get_replies)
# filter(is_quote)  # stray call in the original snippet, left commented out
repliers <- bind_rows(repliers)