### Statistical Analysis Help

Hello, I need some help analyzing my data.

I tracked the top 15 tweets for each day and made a scatter plot. My graph (see the attachment) seems to show large differences among the top tweets (Tweets 1-5) each day, with the points spread out, while among the lower-ranked tweets (Tweets 6-15) the differences shrink and the points sit close together. I'm trying to show that there is a consistent, significant difference between the top tweets and the lower-ranked tweets. Is there a statistical way to show or analyze this? I downloaded a trial version of the IBM SPSS program.

Also, what statistical test can I use to find the value that I would expect most of the data points to fall within (for example, what value would I expect the Tweet 15 datapoints to fall under)?

Thank you!
PhilY

### Re: Statistical Analysis Help

Hi PhilY,

It looks like you have collected a lot of interesting data for your Analysis of Twitter networks project. To better guide you in your statistical analysis, more information would be helpful. What hypothesis are you testing, and what kind of tweet data have you collected? Specifically, what are the axes on your scatter plot?

Cheers,
Emily
Limeybean
Expert

### Re: Statistical Analysis Help

Thank you. My question is: In a protest environment, how does the number of influencers and the magnitude of influence (amount of influence needed to recruit people) change as the protest develops?

Briefly, my hypothesis is that as the protest develops, there will be more influencers (people whose tweets are retweeted to others at a significant level above others); however, in the later stages of recruitment, the number of influencers will remain constant until the climax of the protest. Also, I think that over time, the amount of influence needed and shown will cycle up and down.

The graph is my raw data and the x-axis is the dates that I recorded data for, and the y-axis is the number of retweets for each of the top 15 tweets (the 15 tweets that had the greatest number of retweets) by any person for that day. I didn't show this but I also recorded the username of the tweeters.

I attached a new spreadsheet, with a line graph connecting the points as well, if it helps. I'm thinking of finding the range/mean/median of the number of retweets for each of the top 15 tweets and comparing them. Is there a way to compare this over time? Also, what test should I use to compare the groups (Tweet 1 to 2 to 3 etc.) to see if there is a significant difference among them (e.g. a lot of the bottom Tweet levels (6-15) seem to cluster together along the same line over time but the top ones separate out more)?

Thank you very much!
PhilY

### Re: Statistical Analysis Help

Hey PhilY,

I am sorry for the delay in responding but I needed to seek some help. I cannot take credit for the following, but I think that it should be helpful.

Let me check if I'm understanding your question correctly. It sounds like you want to know if the number of "highly influential" tweets is changing (and perhaps increasing) over the course of a protest. I'm not certain of an exact way to test this question but a potential strategy is to figure out on each day of the protest the average amount that tweets are retweeted and then identify which/how many tweets for that day are retweeted significantly more than that average (these would be your "highly influential" tweets). If you do that for each day of the protest you can then evaluate if the absolute number of influential tweets shifts over the course of the protest.

To do this for each day, I would want to look at more than just the top 15 tweets. Ideally, all the tweets related to the protest would make a better data set. But with whatever data you have, I would calculate the mean and standard deviation of the number of retweets per unique tweet. Then you can decide how far from the mean an influential tweet should be (for a roughly normal distribution, about 95% of values fall within 2 SDs of the mean and about 99.7% within 3 SDs) and add that many SDs to the mean to get the number of retweets such an outlier would receive. Then count how many tweets on that day have more than that threshold number of retweets.

You can then take that number of influential tweets and plot it against time to see if there is a temporal pattern.
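
As a rough illustration of the steps above, here is a minimal Python sketch. The function name `count_influential`, the cutoff `k = 2` SDs, and the retweet numbers are all assumptions made up for the example, not values from your spreadsheet; the same calculation can be done in Excel or SPSS.

```python
from statistics import mean, stdev

def count_influential(retweets_by_day, k=2.0):
    """For each day, count tweets whose retweets exceed mean + k * SD for that day."""
    counts = {}
    for day, retweets in retweets_by_day.items():
        # Threshold is computed separately for each day from that day's tweets.
        threshold = mean(retweets) + k * stdev(retweets)
        counts[day] = sum(1 for r in retweets if r > threshold)
    return counts

# Made-up retweet counts for the top 15 tweets on two days (illustration only).
sample = {
    "day 1": [900, 120, 110, 100, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45],
    "day 2": [150, 140, 130, 120, 110, 100, 95, 90, 85, 80, 75, 70, 65, 60, 55],
}
counts = count_influential(sample)
print(counts)
```

Keep in mind that retweet counts are usually strongly skewed, and a single very large tweet inflates both the mean and the SD for its day, so the cutoff `k` is a judgment call rather than a fixed rule.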

Cheers,
Emily
Limeybean
Expert

### Re: Statistical Analysis Help

Thank you for your reply! I appreciate all of your feedback; it's helping me a lot. Unfortunately, it's really hard for me to get every retweet, since I'm recording all the data manually into Excel and the process is time-consuming since the time frame is four months.

In addition to finding the number of significant influencers each day, I'm also trying to quantify the amount of influence each tweet has. Do you think I could do this by finding how many standard deviations a tweet is from that day's mean, to see who has more influence (e.g., a tweet that is 3 standard deviations above the mean would have more influence than one that is 2.5 standard deviations above it)? Do you think this measure is comparable over time? For example, if two tweets on different days are both 3 standard deviations above their day's mean, but one has more retweets than the other, I think they would still have the same degree of influence even though the number of retweets is different. Would you agree?
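
To make the idea concrete, here is a small Python sketch of that standardization (a z-score). The function name `z_scores` and the numbers are made up for illustration and are not from my data.

```python
from statistics import mean, stdev

def z_scores(retweets):
    """Return how many sample SDs each tweet's retweet count is from the day's mean."""
    m = mean(retweets)
    s = stdev(retweets)
    return [(r - m) / s for r in retweets]

# Two made-up days: day_b has exactly 10x the retweets of day_a for every tweet.
day_a = [10, 20, 30, 40, 100]
day_b = [100, 200, 300, 400, 1000]

z_a = z_scores(day_a)
z_b = z_scores(day_b)
print(z_a[-1], z_b[-1])  # the top tweet gets the same z-score on both days
```

In this example the top tweets have very different retweet counts but the same z-score, which is the situation I am asking about.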

Thank you again so much!
PhilY

