Sunday, February 24, 2013

Which stats REALLY matter for a position?

I have been wondering for a while which stats are the most important to target and what type of players are most likely to produce high quantities of those stats. My main purpose for this is to find individual players who underperformed for us in fantasy last season that might be able to give us better returns this season if they have similar performances. In other words, what players got screwed over last season because their teammates didn't live up to average?

What I've done is look at the correlation between PP90 and every statistic that provides fantasy points, plus a couple of stats that don't, such as shot conversion percentage and key pass conversion percentage. I've done this on a positional basis only, so we know which stats are most important for each position.
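The post doesn't say which correlation measure it uses. Assuming the standard Pearson coefficient, the per-position calculation might look like this minimal sketch. The field names (`position`, `pp90`, and the stat keys) and the three sample defenders are hypothetical, not the author's database, and three players are far too few for a meaningful correlation. They just show the mechanics.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return float("nan")  # a stat that never varies has no defined correlation
    return cov / (spread_x * spread_y)


def correlations_by_position(players, stats):
    """Correlate each stat with PP90, separately within each position group."""
    table = {}
    for pos in sorted({p["position"] for p in players}):
        group = [p for p in players if p["position"] == pos]
        pp90 = [p["pp90"] for p in group]
        table[pos] = {s: pearson([p[s] for p in group], pp90) for s in stats}
    return table


# Hypothetical per-90 numbers for three defenders, for illustration only.
defenders = [
    {"position": "D", "pp90": 3.0, "cbi": 10.0, "goals": 0.0},
    {"position": "D", "pp90": 4.0, "cbi": 12.0, "goals": 0.1},
    {"position": "D", "pp90": 5.0, "cbi": 14.0, "goals": 0.0},
]
print(correlations_by_position(defenders, ["cbi", "goals"]))
```

Splitting by position before correlating means each stat is judged only against players who play the same role, so a stat is not flagged as important just because one position as a whole scores differently from another.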


Defenders are the most inconsistent group of players in terms of finding a stat that correlates highly with PP90. CBI had the highest correlation at 0.34. Goals and assists follow with correlations of 0.29 and 0.28 respectively. The big shocker is that a team's clean sheets have a surprisingly low correlation - a mere 0.19 - with a defender's PP90.

So, what does this mean? It's tough to draw any firm conclusions about what defenders we should choose. Since CBI and goals are the two stats with the highest correlation with PP90, I am inclined to say that we want to pick center backs who get forward on set-pieces. This is fairly accurately reflected in the players found at the top of my defender projections. We see center backs like Collin, Bernardez, Ianni, Taylor, Berry, Olave, and McDonald all feature as good targets according to my PP90 projections.


A couple of stats have a much higher correlation with PP90 for our midfielders. Assists correlate with midfielder PP90 at a strong 0.77. However, several other stats also correlate at better than 0.5: key passes (0.72), goals (0.68), and crosses (0.62).

There are a couple of things to take away from this. First, the reason several statistics correlate so highly is that some midfielders (pretty much every defensive midfielder in the league) simply don't rack up any of the above categories. Even with the added defensive points available for CBI and recoveries, defensive midfielders simply don't score enough of them to make them viable alternatives to their attacking counterparts.

We know we want attacking midfielders, but do we want wingers or central midfielders? Both types of attacking midfielder have about equal opportunity for goals, assists, and key passes. It isn't until we get to the fourth most highly correlated statistic that we find our answer. A decently high correlation between crosses and PP90 means we want wingers. Zusi, Rosales, Davis, Pontius, Chavez - all are in the top level of my midfielder projections and all are wingers. It should be noted that several central playmakers who take free kicks, corners, and/or PKs (Morales, Ferreira, DeRosario) also rank high in my projections and are also valid targets.


Surprise, surprise, goals is the stat that best correlates with PP90 for forwards at 0.77. Assists (0.69) and key passes (0.64) also had high levels of correlation with PP90.

In our forwards, we are looking for goalscorers. I'm sure everyone who has even heard the term "fantasy soccer" could have told you that. However, I was surprised at how highly both assists and key passes correlated with PP90, which gives me second thoughts about choosing poachers (Bengtson, Bruin, Cooper and, to a lesser extent, Wondolowski) instead of more well-rounded players (Higuain, Henry, Keane).

Monday, February 18, 2013

Alternate method for finding value

This time, instead of looking at how overvalued or undervalued a player is, I looked at how much value they give in terms of projected season points per $million spent (PPM). Generally, this returns the same players near the top of the rankings as my previous method. This might be a simpler way for people to understand exactly how much return they can expect from their investment.
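PPM is just projected season points divided by price in millions. The minimal sketch below ranks players by it; the player names, point totals, and prices are made up for illustration.

```python
def points_per_million(season_points, price_millions):
    """Projected season points per $1 million of price (PPM)."""
    if price_millions <= 0:
        raise ValueError("price must be positive")
    return season_points / price_millions


def rank_by_ppm(players):
    """Return (name, PPM) pairs from best to worst value."""
    scored = [(p["name"], points_per_million(p["season_points"], p["price"]))
              for p in players]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical players: a cheap defender, a premium forward, a cheap midfielder.
squad = [
    {"name": "Defender A", "season_points": 120, "price": 5.0},
    {"name": "Forward B", "season_points": 180, "price": 10.0},
    {"name": "Midfielder C", "season_points": 90, "price": 4.5},
]
print(rank_by_ppm(squad))
# [('Defender A', 24.0), ('Midfielder C', 20.0), ('Forward B', 18.0)]
```

In this made-up example the forward projects the most total points but the lowest PPM, which is the kind of trade-off behind pairing cheap, high-PPM defenders with pricier attackers.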

Here are the takeaways:
  1. The best use of these charts is to find cheap players who have the potential to give us decent returns when they are called upon to come off the bench.
  2. Another use for these charts is to find the expensive players who offer a decent return on investment.
  3. Defenders tend to have a higher PPM than midfielders, who in turn have a higher PPM than forwards.
  4. It looks as though there are several cheap options in defense that have a good PPM. This leads me to believe this is the area we should look to save a bit of cash for some attackers with lower PPM but higher season points. Look to pick up at least two of Woolard, McCarthy*, Tierney*, Korb*, Hedges, Hurtado, and Parke.
*May not be starters




Saturday, February 16, 2013

Finding players who are undervalued: Part 4 - forwards

And finally, we look at pricing inefficiencies among our forwards. Again, I have grouped the top couple of prices into a single "elite" bracket. For those of you who contribute to r/mls, this highlights exactly why I was trying to convince people that Henry was a better choice for MVP than Wondo... but that's a different blog post. With the exception of Wondo, these players have all seen a lot of movement among their supporting players this offseason. Henry lost Cooper to Dallas, Saborio lost Espindola to New York, Johnson lost Montero to the allure of Libertadores soccer, and Keane lost Donovan to an existential crisis. It will be difficult to say exactly how all these moves will affect the elite strikers, but my guess is that every one of them except Johnson and Saborio has a worse season than last. That doesn't mean they're not worth owning, however, and we have to go with one or two of them simply to have good captaincy options when they have easy gameweeks.

Higuain - the most undervalued player in the entire game who is guaranteed to start (Gordon ranks above him, but is not a locked-in starter). Higuain - the player projected to score the most points of anyone in the entire game at any position. Higuain - the guy who has only one home game (against San Jose, no less) in the first five gameweeks. No player has been in and out of my lineup more. He's owned by a mere 10% of managers at the time of writing, so he has the potential for a significant price rise if he starts hot and not much danger of a significant price fall if he doesn't.

These aren't quite elite options, but they're still pricey enough that it feels like they should be producing more than these guys did last season. Espindola is an interesting option for me. Assuming he plays a similar role to the one Cooper played last season, he should get plenty of quality chances, though his historical shot conversion rate is lower. Hassli is another player who I think will benefit greatly from his transfer to Dallas.

These are the players we want to have that breakout season that makes our friends wonder, "how the hell did you know he was going to score so many goals this season!?" I actually made a bold prediction that Arrieta is going to win the Golden Boot this year so long as Higuain remains healthy. I would still pick Higuain and try to find somewhere else to save the money. Ryan Johnson and Juan Agudelo are two players I think will outperform these predictions simply due to changes in playing style and new coaches. At this point, Estrada looks to be Seattle's second striker, but I wouldn't be surprised to see Martinez eat into some of those minutes.

Again, the budget options here are not locked-in starters. MacDonald is the only one who seems to be the first choice for his team. Mwanga might be, given the injury to Dike, and Oduro has seen time with the first team in the preseason for Columbus. If Gordon does end up being a starter (because San Jose runs a 4-3-3 or Wondolowski drops into a midfield role), he becomes the best value in the entire game. It will be tough for him to be as efficient as last season, but he still has plenty of room to deliver adequate returns for his price.

Finding players who are undervalued: Part 3 - midfielders

Again, the key column we want to examine is the residuals column. The greater the number, the more undervalued that player is. For this first group, I decided to look at the "elite" options together instead of splitting them into individual price brackets. Here we see that most of these players are undervalued despite being the most expensive players in the midfield. My eye is on Ferreira because Dallas have signed Cooper, who has a great finishing rate, and Hassli, who has a great work rate and holds up the ball well so the midfielders can get into the attack.

The next group of not-quite-elite players is a bunch of players that just don't do it for me. Pontius and Morales are the only two who were on my shortlist before looking at the data and are the only two who are there after looking at the data.

The 8m bracket has a bunch of players who are priced there because of their name or because they had a really good season last year that I'm not sure they'll replicate. Le Toux is a midfielder in this season's game, but will likely be used as a forward and may have spot-kick duty.

Adu is the only player from this group who piques my interest, but he seems to be on his way out of Philly, so he is toxic to fantasy managers. Completely avoid this price bracket.

A bunch of defensive midfielders come in the 7m price bracket. Again, avoid everyone in this list.

Now we're starting to get into some interesting options. This is the group we want to take some risks in as these are generally attacking players who were on bad attacking teams last season or simply didn't get much playing time. Cascio and Bolanos are both currently in my side as cheap midfielders who are attacking players that have potential to have better seasons than last year by virtue of their teams having better seasons than last year.

There really aren't many players here that are guaranteed starters. Figuring out who has a decent chance of locking down that starting role is key to picking a 6m option. As I mentioned in a previous post, the defensive midfielders (Kitchen, Carroll, Simms, Nagamura, etc.) are overpriced, even at this price level.

And then there are the bargain basement options. I'm not sure any of these guys are starters; Pause and Saragosa are the most likely. I would probably spend the extra 0.5m to get a guy in the 6m bracket who is guaranteed to start.

Keeper rotation error

This was important enough that I felt the need to make a separate post about it. My post about keeper rotation highlighted the combination of DC and Chicago keepers, since they had surprisingly low predicted goals conceded... too low to be right. So, I revisited the data and realized there is a gameweek where neither of them plays. I now advise NOT going with a combination of DC and Chicago. I am looking at other potential options, and LA is likely the team I will focus on finding a pairing for. LA was a completely different team defensively when Omar Gonzalez returned from injury, and I think Cudicini will be an upgrade over Saunders to boot.

The following keeper combinations have a week where neither play, so their numbers are too low:
Chicago/DC (missing 2 gameweeks in fact)

Finding players who are undervalued: Part 2 - defenders

This time I ran the regression using only defenders. I categorized them by price and ranked them within each price bracket by how undervalued they are, based on last year's PP90. Note: the "rank" column shows the player's overall rank with only defenders in the regression.
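Once each defender has a residual from the regression, the bracket view is a group-and-sort. The minimal sketch below assumes residuals are already computed and uses hypothetical keys (`name`, `price`, `residual`) and made-up players. It sorts each price bracket from most to least undervalued and keeps the overall rank alongside, like the "rank" column described above.

```python
from collections import defaultdict


def rank_within_brackets(players):
    """Group players by price (most expensive bracket first) and sort each
    bracket by residual, most undervalued first. Each entry also carries the
    player's overall rank across all the players passed in.
    """
    overall = sorted(players, key=lambda p: p["residual"], reverse=True)
    overall_rank = {p["name"]: i + 1 for i, p in enumerate(overall)}
    brackets = defaultdict(list)
    for p in players:
        brackets[p["price"]].append(p)
    return {
        price: [(p["name"], p["residual"], overall_rank[p["name"]])
                for p in sorted(group, key=lambda p: p["residual"], reverse=True)]
        for price, group in sorted(brackets.items(), key=lambda kv: kv[0], reverse=True)
    }


# Hypothetical residuals (actual PP90 minus predicted PP90) for four defenders.
sample = [
    {"name": "X", "price": 6.0, "residual": 0.5},
    {"name": "Y", "price": 6.0, "residual": 0.8},
    {"name": "Z", "price": 5.0, "residual": 0.6},
    {"name": "W", "price": 5.0, "residual": -0.2},
]
for price, rows in rank_within_brackets(sample).items():
    print(price, rows)
```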

Collin comes in as the number 1 overall undervalued defender despite being in the top price bracket. He also has the highest expected PP90; the next closest, Bernardez, has a PP90 10 percent lower. I wasn't planning to have any defenders priced at 6m in my squad, but I just can't see how we can pass Collin up given that he is one of the most underpriced players in the game. Bernardez also stands a good chance of making my squad at this point.

There's no real standout undervalued player in the 5.5m price bracket. For me, examining the Houston defenders is the key point here. If we compare Taylor, Ashe, and Boswell, Taylor is the best value and also has the highest PP90. Ashe was originally in my team, but this has convinced me to find the extra cash to bring in Taylor instead.

The 5m bracket is where things start to get a bit more convoluted. Even though Burch is at the top of this bracket, and second overall, he shouldn't make anyone's team because he will not be the starter in Seattle unless Gonzalez gets injured. McCarthy and Tierney come from an interesting New England team that quietly kept nine clean sheets last season. I'm not exactly sure who will start for the Revs, but having one of those two in your side might be useful. There's a slew of useful options in Hedges, Hurtado, Gonzalez, Parke, Harvey, Williams... basically, there are so many decent choices in the 5m bracket that you should adjust these projections based on how you think each defense will fare relative to last season, and also look at the difficulty of upcoming fixtures.

And finally, we have our budget options. The trick here is finding players who will be starters. Woolard (who should be owned by far more than the mere 4.8% of managers who own him at the time of writing) and Jazic are the only ones at this point I have any confidence in starting. Korb could start in DC if the Riley transfer rumors turn out to be nothing more than rumors. New York has a couple of players who could end up as starters; I think Lade has a better chance than Kimura at this point.

Friday, February 15, 2013

Finding players who are undervalued: Part 1

Sorry for the long post, but I found this topic particularly fascinating and insightful for picking my team.

For this first post in a series, I am going to explore the idea of finding players who project to score more points per game than the average player in their price range. To do that, I used my database to run a regression of projected points per 90 minutes (PP90) against price. Again, I just want to remind my readers that these projections are 100% based on last year's performance and adjusted as if all players played every minute of the season.

This initial chart shows the results when I ran the regression using all players with 1,000+ minutes. The key column to look at here is the "residuals" column. Positive numbers represent players whose performance last season was better than the average for all players of the same price in this year's game. The larger the number, the more undervalued they are. The important takeaways are:

1. Gordon and Higuain are significantly more undervalued than any other player based on their PP90 output from last season and this season's pricing.
2. Forwards tend to be more overvalued than the other two positions.
3. Defenders tend to be undervalued.
4. Midfielders seem to be more accurately valued, on average, than the other two field positions.
5. Defensive midfielders tend to be overvalued. This was the question I was originally trying to answer - general fantasy soccer knowledge says that d-mids aren't worth owning. However, with the additional ways to earn points through CBI, recoveries, key passes, etc., that claim needed to be re-examined. It still holds true, in general, as many of the league's defensive midfielders have a negative residual.
6. Some of the most undervalued players also happen to be some of the most expensive in their positions - Henry, Collin, Donovan, Zusi, Bernardez. These are the types of players we want to spend our money on and find suitable cheap options for our other players (Woolard, MacDonald, Tierney, Saragosa).
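The post doesn't give the form of the regression. Assuming a simple straight-line least-squares fit of PP90 on price, the residuals could be computed as in this minimal sketch. The 1,000-minute cutoff comes from the post; the field names and the four sample players are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pp90_residuals(players, min_minutes=1000):
    """Residual = actual PP90 minus the PP90 the fitted line predicts at that price.

    Positive residuals mark players who outscored the average player at their
    price. Returns (name, residual) pairs, most undervalued first.
    """
    pool = [p for p in players if p["minutes"] >= min_minutes]
    slope, intercept = fit_line([p["price"] for p in pool], [p["pp90"] for p in pool])
    rows = [(p["name"], p["pp90"] - (intercept + slope * p["price"])) for p in pool]
    return sorted(rows, key=lambda r: r[1], reverse=True)


players = [
    {"name": "A", "price": 4.0, "pp90": 2.0, "minutes": 1500},
    {"name": "B", "price": 6.0, "pp90": 4.0, "minutes": 2000},
    {"name": "C", "price": 8.0, "pp90": 3.0, "minutes": 1200},
    {"name": "D", "price": 10.0, "pp90": 5.0, "minutes": 900},  # under 1,000 minutes: excluded
]
print(pp90_residuals(players))
# [('B', 1.0), ('A', -0.5), ('C', -0.5)]
```

Running the same function on a list containing only one position gives the position-specific version of the regression.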

I have several more posts related to this that I will hopefully find time for this weekend. I plan to run the same regression separately for each position and sort by price to find some potentially undervalued players in each price bracket. I also plan to explore which attributes (goals, assists, CBI, key passes, etc.) are most indicative of a high PP90 for each position.

Sunday, February 10, 2013

Keeper rotation

Editor's note (02/16/2013): The combination of DC and Chicago keepers was artificially low because there is a gameweek where neither team plays. Avoid this combination, despite previous advice that it was the best combination to select.

Keeper selection is an interesting issue that has been explored dozens of times for other fantasy soccer games. The general consensus is to find two cheap keepers whose good matchups alternate from week to week. I'm a big proponent of this strategy, as it allows us to upgrade our squads in other areas. The other strategy is to buy one premium keeper and one cheap keeper and play the premium keeper in the vast majority of gameweeks. In the official MLS game, my gut is telling me to choose the two-cheap-keeper rotation strategy because of the bonus points players receive for recoveries, CBI, etc.

I took each team's expected goals conceded for each of the first 10 gameweeks and compared them against every other team's. For each combination, I then took the lower of the two values in each gameweek, treating that team's keeper as my choice for the week, and simply added up those values over the 10 gameweeks. The values in red text indicate that there is at least one gameweek where neither team plays.
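The pairing method above can be sketched as follows. A gameweek where neither team plays contributes nothing to the sum, which is exactly why the DC/Chicago total came out artificially low (see the editor's note), so this version counts those blank weeks and sorts flagged pairs to the bottom. The team codes are real, but the projections are made up, and the dict-of-lists input format is an assumption.

```python
from itertools import combinations


def keeper_pair_totals(conceded):
    """Score every two-team keeper rotation over the planning window.

    conceded maps a team to its projected goals conceded per gameweek, with
    None for a blank week. Each week the pair starts whichever keeper is
    projected to concede less; lower totals are better.
    """
    rows = []
    for a, b in combinations(sorted(conceded), 2):
        total, blank_weeks = 0.0, 0
        for ga, gb in zip(conceded[a], conceded[b]):
            playing = [g for g in (ga, gb) if g is not None]
            if playing:
                total += min(playing)
            else:
                # Neither keeper plays: the week adds nothing, so the raw total
                # looks better than it really is. Count it instead of hiding it.
                blank_weeks += 1
        rows.append((a, b, round(total, 2), blank_weeks))
    # Pairs with no blank week first, then by lowest projected goals conceded.
    return sorted(rows, key=lambda r: (r[3] > 0, r[2]))


# Made-up projections for three gameweeks, for illustration only.
projections = {
    "CHI": [1.0, None, 1.2],
    "DC": [1.1, None, 0.9],
    "LA": [0.8, 1.0, None],
}
for row in keeper_pair_totals(projections):
    print(row)
```

Without the blank-week flag, CHI/DC would rank first here with the lowest raw total despite leaving a gameweek with no keeper.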

I also picked out the top 10 combinations for each possible price combination of starting keepers. There were a couple of teams (Vancouver, New York, Dallas) where I wasn't sure who would be the starting keeper, which would change the pricing, so I used my best guess for each team. Combinations in yellow indicate at least one missing gameweek.

I should also note that LA was a completely different team last year with Gonzalez in the lineup. However, even when I adjust the numbers based on their goals conceded with Gonzalez, they still have no more than one combination in the top 10 overall.

The following keeper combinations have a week where neither play, so their numbers are too low:
Chicago/DC (missing 2 gameweeks in fact)

Saturday, February 9, 2013

Player projections - defenders

We should approach defenders a bit differently than our attacking options. Since a large part of how defenders score points for us in fantasy is collecting clean sheets, we can essentially just compare which teams are likely to collect clean sheets. Though we are definitely selecting individual players as our defenders, we are also selecting defensive units. If I choose Corey Ashe, I am effectively buying into the Houston defensive unit as a whole. Because of this, I tend to choose the cheapest option in each unit who will be a starter, unless there is reason to believe one of the more expensive players will produce more bonus points or goals/assists.
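The default rule of "cheapest starter in each unit" is simple to write down. In this minimal sketch the keys (`name`, `team`, `price`, `starter`) are hypothetical, and `starter` is your own guess about who starts. The exception for a pricier defender who adds bonus points or goals/assists is left as a manual judgment call rather than coded.

```python
def cheapest_starters(defenders):
    """Pick the cheapest expected starter from each team's defensive unit."""
    picks = {}
    for d in defenders:
        if not d["starter"]:
            continue  # a cheap backup doesn't buy into the unit's clean sheets
        current = picks.get(d["team"])
        if current is None or d["price"] < current["price"]:
            picks[d["team"]] = d
    return {team: d["name"] for team, d in picks.items()}


# Placeholder players and prices, for illustration only.
unit = [
    {"name": "A", "team": "HOU", "price": 5.5, "starter": True},
    {"name": "B", "team": "HOU", "price": 5.0, "starter": True},
    {"name": "C", "team": "HOU", "price": 4.5, "starter": False},
    {"name": "D", "team": "LA", "price": 6.0, "starter": True},
]
print(cheapest_starters(unit))
```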

Sunday, February 3, 2013

Player projections - Midfielders

The midfielder situation isn't nearly as cut and dried as the forward situation. Midfielders tend to earn more of both the attacking and defending bonus points in the official MLS game, so the scores of the players we are actually considering fall in a narrower range. There aren't really any clear breaks that give an obvious way to group the top couple of players. The closest I can see comes around Shea Salinas, ranked 8th: if we ignore Salinas, we see a drop of about 20 points over the season.

I'm going to leave out Donovan, who is off being Sad Landon, since we don't know when he will return. That leaves us with Rosales, Zusi, Davis, Chavez, Bernier, and Pontius. I also think that Ferreira, DeRo, and Le Toux all have the potential to give us fantastic returns, as they are all coming off tumultuous seasons of injury or transfers.

Player projections - Forwards

As I mentioned in an earlier post, I created a database with almost all the data from last year's official MLS fantasy game so I could actually make some educated decisions in this year's game. I converted all the categories to a per-minute basis so I could then project player scores as if they played every minute of the season to try and find the best value for money players available based on 2012 stats and 2013 prices. I calculated how many points per game a player would have scored last season if they had played every minute for: goals, assists, clean sheets, CBI, recoveries, crosses, key passes, big chances created, big chances fluffed, and goals allowed.
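The per-minute conversion can be sketched like this: scale each season total to a rate per 90 minutes, multiply by the category's point value, and add everything up. The post doesn't list the game's scoring table, so the point values below are hypothetical placeholders. The only one taken from the post is the implied 1/3 point per cross. The sketch also assumes a 34-game season for the "every minute" projection.

```python
MLS_REGULAR_SEASON_GAMES = 34  # regular-season games per MLS team in 2013


def per_90(count, minutes):
    """Convert a season total into a rate per 90 minutes played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return count * 90.0 / minutes


def projected_pp90(totals, minutes, point_values):
    """Projected fantasy points per 90: each stat's per-90 rate times its point value.

    Fractions are kept, so one cross per 90 at 1/3 point per cross yields 1/3
    point. Negative events (e.g. big chances fluffed, goals allowed) would take
    negative point values.
    """
    return sum(per_90(totals.get(stat, 0), minutes) * value
               for stat, value in point_values.items())


def projected_season_points(pp90, games=MLS_REGULAR_SEASON_GAMES):
    """Season total if the player played every minute of every game."""
    return pp90 * games


# Placeholder scoring table: only the 1/3-point cross value is implied by the post.
HYPOTHETICAL_POINTS = {"goals": 5.0, "assists": 3.0, "crosses": 1 / 3}

pp90 = projected_pp90({"goals": 9, "assists": 6, "crosses": 30}, 2700, HYPOTHETICAL_POINTS)
print(round(pp90, 2), round(projected_season_points(pp90), 1))
```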

No, this does not include absolutely every scoring category. And no, my calculations aren't going to be perfect projections - partly because if a player averages exactly one cross per 90 minutes, my model gives them 1/3 point per game, but mostly because this projects future returns from previous returns and does not take into account actual playing time, a different role in the team, changes in team quality in general, etc. There are a million different variables that will make my projections wrong, but I'd rather base my picks on something other than pure speculation and gut feeling. So, here it is:

There's really Higuain, Henry, Gordon, then Wondo, then everyone else. If you don't have one of those four players, you are making a huge mistake. It is reasonable to assume that Gordon won't get enough minutes to be a reliable option. So we are really just deciding between Higuain, Henry, and Wondo for our big-money striker or maybe some of you even want to pick two of them - I see no problem with that. I am going with Wondo to start my season because San Jose are predicted to score the most goals over the first 3, 6, and 10 gameweeks (see my earlier posts for more on that) and they don't have any blank gameweeks to start the season.

Friday, February 1, 2013

Predicted goals allowed - first 10 weeks

I used the exact same methodology to predict goals allowed as I did for goals scored. Again, I want to emphasize that these are 100% based on last year's results. Adjust them as you see fit (I'd recommend rating LA higher than last season, especially with a healthy Gonzalez and an upgrade at keeper). Generally, teams playing at home concede fewer goals than they do on the road, so it is a good idea to play defenders playing at home and sit defenders playing away, especially against strong opposition.

Since having a low average goals conceded per game does not necessarily translate to having a high number of clean sheets, we need a different way of predicting who will have clean sheets. I have simply tabulated how many times each team is expected to concede fewer than 1 goal and also fewer than 1.2 goals.
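The clean-sheet tally is a threshold count over each team's projected goals conceded. This minimal sketch takes one projected value per fixture, with None for a blank week, and uses the post's two cutoffs, 1 and 1.2 goals. The sample projections are made up.

```python
def clean_sheet_chances(expected_conceded, thresholds=(1.0, 1.2)):
    """Count fixtures where a team is projected to concede fewer than each threshold.

    expected_conceded holds one projected goals-conceded value per fixture;
    None marks a blank gameweek and is skipped.
    """
    fixtures = [g for g in expected_conceded if g is not None]
    return {t: sum(1 for g in fixtures if g < t) for t in thresholds}


# Hypothetical projections for one team's first five gameweeks.
print(clean_sheet_chances([0.8, 1.1, None, 1.3, 0.95]))
# {1.0: 2, 1.2: 3}
```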

Predicted goals scored - first 10 weeks

The following tables are calculated from each team's 2012 goals scored and conceded per game, home and away. For the home team, I simply took the average of its goals scored per home game and the away team's goals allowed per away game (and the reverse for the away team). This is far from a precise prediction, especially in MLS where so much can change from one season to the next in terms of rosters, player roles, and team playing styles.
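The averaging described above fits in a few lines. In this minimal sketch the rate keys (`scored_home`, `conceded_home`, `scored_away`, `conceded_away`) are hypothetical names, and the team numbers are made up. The away side's projected goals double as the home side's projected goals conceded, which is how the same calculation can also feed a goals-allowed table.

```python
def expected_goals(home, away, rates):
    """Projected goals for each side of one fixture.

    rates[team] holds that team's 2012 per-game averages under the keys
    'scored_home', 'conceded_home', 'scored_away' and 'conceded_away'.
    """
    home_goals = (rates[home]["scored_home"] + rates[away]["conceded_away"]) / 2
    away_goals = (rates[away]["scored_away"] + rates[home]["conceded_home"]) / 2
    return home_goals, away_goals


# Made-up per-game averages, for illustration only.
rates = {
    "SJ": {"scored_home": 2.0, "conceded_home": 1.0, "scored_away": 1.6, "conceded_away": 1.4},
    "TOR": {"scored_home": 1.2, "conceded_home": 1.6, "scored_away": 0.8, "conceded_away": 2.0},
}
print(expected_goals("SJ", "TOR", rates))
```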

This first table shows how many goals each team is expected to score in each game for the first 10 weeks. Trying to plan any further out than this is absolutely pointless, since we can't know how much each team's quality has changed on either side of the ball. A blank space means a team doesn't have a game that week (or, in the second column of week 8 marked "double", doesn't have a second game). Teams that had a high-powered offense last season, like San Jose, will generally look better in these charts. Feel free to adjust based on your own expectations about how a team will do relative to how they fared last season.

This second chart simply adds up how many goals each team is expected to score over a given period of time to start the season. I tend to plan about half my team (generally the more expensive players who I expect will see their price rise) for the 6-week time-frame and the other half for the first couple gameweeks, planning out my first two or three weeks of transfers.
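The cumulative chart is a running sum over the first N gameweeks. This minimal sketch assumes a hypothetical input layout of one list per gameweek: empty for a blank week and two values for a double gameweek. It totals the 3-, 6-, and 10-week windows by default, and the sample numbers are made up.

```python
def window_totals(weekly, windows=(3, 6, 10)):
    """Sum a team's projected goals over the first N gameweeks for each N.

    weekly[i] lists the projected goals for gameweek i + 1: an empty list for
    a blank week, two values for a double gameweek.
    """
    return {n: round(sum(sum(week) for week in weekly[:n]), 2) for n in windows}


# Hypothetical six-week projection: week 3 is blank, week 5 is a double.
print(window_totals([[1.5], [1.0], [], [1.2], [0.9, 1.1], [2.0]], windows=(3, 6)))
```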

2013 MLS fantasy insight is on the way!

Hello readers!

First, I just want to say that I don't intend to run this as a weekly blog this time around. I simply don't have the time or commitment to do so. I will be doing a preseason breakdown based on last season's statistics. This year I will be looking at the official MLS game instead of the ESPN game.

I managed to enter all of the data (such as recoveries, crosses, key passes, etc.) for all players with at least 1,000 minutes played last year into a database. I've separated them out by position and will be using that data to project how many points a player could earn over this season.

I'll also do my typical projection of how many goals a team is projected to score and concede each game based on how many they scored and conceded home and away last season.

So, expect a flurry of posts over the next two or three weeks and then back to blackout for several weeks until I find the gumption to give my revised projections and suggestions.