Thomas Huang and Chase Stockwell
There has been a significant amount of research into predicting, from their collegiate careers, how significant basketball players will become in the NBA. Because the NBA is now a massive market and a centerpiece of entertainment in the US and around the world, the best agents and teams are constantly improving their predictive tools to make better use of their draft positions.
However, these tools are not widely available to the public for personal use. Although online discussion helps shape perspectives on player rankings and success, there has been less focus on the longevity of player performance in the NBA. Common methods for predicting player ability rely on the NBA's productivity formula, but most of those statistics are used to predict early-career success and draft position. Our study will instead look retroactively at longitudinal data to determine how long players last in the league and how that longevity correlates with aspects of their collegiate careers.
Using data science, we intend to create an algorithm that extracts the statistics that are most significant in indicating and/or predicting longitudinal success in the professional scene among college players. We define success with the following measures:
- Longevity of NBA career relative to position
- Amount of money earned on the player's contract
- Overall 'basketball statistics' that would be considered successful (points, assists, steals, blocks)
- Wins
- College statistics
- Physical qualities
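As a rough illustration of the first measure, career longevity could later be computed from per-season records like the ones loaded below. This is only a sketch; the df_NBA dataframe and its "Player" and "Year" columns are built later in this notebook.
# Sketch: count the number of distinct NBA seasons each player appears in,
# a simple proxy for career longevity. Assumes a dataframe of player-season
# rows with "Player" and "Year" columns, like the df_NBA built further below.
def career_longevity(df_seasons):
    return (df_seasons.groupby("Player")["Year"]
            .nunique()
            .rename("Seasons Played")
            .sort_values(ascending=False))
# Example usage (once df_NBA exists): career_longevity(df_NBA).head()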
Hoops Hype - salaries of individual players, 2000-2020
https://hoopshype.com/salaries/2000-2001/
Will require data scraping. Details the salaries of individual players in the NBA from 2000 to 2020. This is raw data, and the salary figures will be used as one factor of many to determine the success of players. Salaries will also be taken relative to the salary cap and other monetary statistics for the NBA that year, since inflation as well as the compensation of players has changed significantly over the years.
Bart Torvik - on-court stats of individual college players
http://barttorvik.com/trankpre.php
Will not require data scraping - we contacted Bart Torvik, who will send the raw data in csv files. Statistics of individual players in their college years; details include all on-court stats, school, physical characteristics, etc.
NBA Basketball Reference statistics - on-court stats of individual NBA players
https://www.basketball-reference.com/leagues/NBA_2001_totals.html
Does not require data scraping - the data was available as downloadable csv files that can be found in the data_NBA folder.
Data scraping - we want to scrape the databases above from their websites.
We intend to use methods learned in class to achieve this.
For more complicated extractions, we will use Stack Overflow and other online resources to learn how to compile the data.
We may also interact with software APIs to gain more information on the data, using methods we have learned in lecture.
Python + Pandas to interpret and analyze csv raw data
We intend to use the same techniques as performed in class to create DataFrames of the tables.
Some SQL may be used to join and merge tables for a clearer picture of whatever inferential statistic we are pursuing (see the sketch after this list).
Statistics Modeling to analyze data
General stats description: means, medians, etc., by quartile of the "success" groups.
Assignment of success scores of current NBA players based on various factors.
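As a concrete illustration of the SQL-style joins mentioned above, the sketch below loads two pandas DataFrames into an in-memory SQLite database and joins them on player and year. The table and column names here are placeholders, not our final schema.
import sqlite3
import pandas as pd

# Two tiny placeholder tables; the real stats and salary tables are built later in this notebook.
stats = pd.DataFrame({"Player": ["A", "B"], "Year": [2010, 2010], "Points": [500, 800]})
salaries = pd.DataFrame({"Player": ["A", "B"], "Year": [2010, 2010], "Salary": [2.0e6, 5.0e6]})

# Load both DataFrames into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
stats.to_sql("stats", conn, index=False)
salaries.to_sql("salaries", conn, index=False)

# Join on-court stats with salaries for the same player and season.
merged = pd.read_sql("""
SELECT s.Player, s.Year, s.Points, p.Salary
FROM stats AS s
JOIN salaries AS p ON s.Player = p.Player AND s.Year = p.Year
""", conn)
print(merged)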
The inspiration behind this project is that arguably the most influential college basketball player of the last 20 years was drafted by the New Orleans team this year, and we want to see whether the indicators of his NBA success look promising. But this project has much more breadth. It could serve:
- Coaches and scouts who use general success statistics to assess overall talent.
- Video game creators deciding how to quantify a player's skill in sports video games such as the NBA 2K series and the NBA Live series.
- Advertisers figuring out which potential superstars to target for a cheap price before they become superstars.
- Small-market teams who can elect to select value players with an effective skill-to-dollar-paid ratio.
There are no conflicts of interest. This project was not sourced nor paid for by any participating basketball organization nor any subjects of the study. This will be unbiased, objective research using only quantifiable statistics to make empirical assessments.
Coates, Dennis (University of Maryland). "The Length and Success of NBA Careers: Does College Production Predict Professional Outcomes?" ResearchGate, October 11, 2019.
The first step in our analysis of the various datasets is to import the libraries necessary for data collection and analysis, and to make the display of this notebook more suitable.
The first step of any analysis is getting your data. So what data will we be working with? We will be using four different datasets: two are scraped from the web, while the other two are available as csv files.
import pandas as pd
import numpy as np
%matplotlib inline
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:95% !important; }</style>"))
#The output of this cell below is hidden to allow the notebook to appear more organized. Press "o" while selecting this cell to show the output.
The first database we will be looking at is Hoops Hype. Hoops Hype gives us team-level salary information rather than information about individual players, so this dataset will actually be one of the smallest ones we work with. BeautifulSoup greatly helps with data scraping by easing the interpretation of the scraped HTML.
We want to look at the NBA seasons beginning in 2000 to 2018. We will not be using 2019 as part of the data set because the season is unfinished.
#Below are a few more import statements necessary for data scraping from a website.
import requests
from bs4 import BeautifulSoup
'''
The first year from which we will be collecting data from the hoopshype database is 2000.
For this analysis, the seasons will be labeled by the year in which the season began.
Thus, as an example, the NBA season 2019-2020 will be labeled as 2019.
'''
starting = 2000
#An empty list to be populated by team salaries is created.
team_salaries = []
#The beginning index is 0.
index = 0
#We do not want to obtain 2019 data, as the season is still unfinished.
while starting < 2019:
'''
Request formats the URL in which the data will be scraped from. The {} within the
URL will be replaced by the current year.
'''
request = "https://hoopshype.com/salaries/{}-{}/".format(str(starting), str(starting+1))
r = requests.get(request)
# BeautifulSoup allows us to parse the return data from the request.
root = BeautifulSoup(r.content, 'html.parser')
# Prettify then allows us to transform the large block of text into something legible so we can read it.
root.prettify()
# Use find() to save the aforementioned table as a variable
hh_table = root.find('table')
# Use pandas to read the HTML file
list_df = pd.read_html(str(hh_table) )
#print(list_df[0])
# If this iteration of the while loop is the first year scraped,
if starting == 2000:
# The dataframe team_salaries is created with the salary taken from list_df.
team_salaries = list_df[0]
# Set reasonable names for the table columns. Also adds the year of the dataset.
team_salaries.columns = ['Rank', 'NBA Team', 'Total Salary', 'Total Salary (adj.)']
yr = []
# Then iterate through team_salaries and append the current year to yr.
for i in range(len(team_salaries.index)):
yr.append(starting)
# Set the Year column of team_salaries equal to the list of years.
team_salaries['Year'] = yr
else:
team_salaries2 = list_df[0]
# Set reasonable names for the table columns. Also adds the year of the dataset.
team_salaries2.columns = ['Rank', 'NBA Team', 'Total Salary', 'Total Salary (adj.)']
yr = []
for i in range(len(team_salaries2.index)):
yr.append(starting)
team_salaries2['Year'] = yr
team_salaries = pd.concat([team_salaries, team_salaries2])
#Once we have reached the end of this iteration of the while loop, we move on to the next year.
starting = starting + 1
#We now have our working dataframe for NBA team salaries. It is tidy now: the year is stored in its own column
#and the data grows vertically (one row per team per year) instead of horizontally.
df_team_salaries = team_salaries
df_team_salaries.head()
In the table, you can see the Team, Total Salary, the adjusted salary, and the year. This information will be crucial later when we want to compare team salaries and how the pay is distributed among players.
Below are the unique NBA team names found in the Hoops Hype data. As we begin to accrue more data, it is important to have a good understanding of how you plan to merge tables. This is the only table that uses the full name of each team, so we use the map function to change each team name to its three-letter abbreviation.
#This displays every NBA team found in the hoops hype data.
df_team_salaries['NBA Team'].unique()
df_team_salaries['NBA Team'] = df_team_salaries['NBA Team'].map({
'Cleveland' : 'CLE',
'New York' : 'NYK',
'Detroit' : 'DET',
'LA Lakers' : 'LAL',
'Atlanta' : 'ATL',
'Dallas' : 'DAL',
'Philadelphia' : 'PHI',
'Milwaukee' : 'MIL',
'Phoenix' : 'PHO',
'Brooklyn' : 'NJN',
'Boston' : 'BOS',
'Portland' : 'POR',
'Golden State' : 'GSW',
'San Antonio' : 'SAS',
'Indiana' : 'IND',
'Utah' : 'UTA',
'Oklahoma City' : 'SEA',
'Houston' : 'HOU',
'Charlotte' : 'CHA',
'Denver' : 'DEN',
'LA Clippers' : 'LAC',
'Chicago' : 'CHI',
'Washington' : 'WAS',
'Sacramento' : 'SAC',
'Miami' : 'MIA',
'Minnesota' : 'MIN',
'Orlando' : 'ORL',
'Memphis' : 'VAN',
'Toronto' : 'TOT',
'New Orleans' : 'NOA'
})
df_team_salaries.head()
Good. Now that each NBA team has been renamed to its three-letter abbreviation rather than the full team name, merging and joining across different tables is much easier. Let's move on to the next database we will be looking at.
Our next dataset comes from Basketball Reference. This is a great resource for data scientists interested in sports, as it has data on almost every well-documented professional and amateur league. It contains on-court information for every player. Since this dataset was easily downloadable as csv files, we will use the os library to iterate through the data_NBA directory and read each file into a central dataframe named df_NBA.
import os
directory_in_str = "./data_NBA"
directory = os.fsencode(directory_in_str)
#Set the first year to be read in as 2000 and create an empty DataFrame.
yNum = 2000
df_NBA = pd.DataFrame()
# Now, we will iterate through each file in the directory.
# Sort the filenames so the files are read in year order rather than in arbitrary filesystem order.
for file in sorted(os.listdir(directory)):
filename = os.fsdecode(file)
# If the file is a csv file, then it is one we want to read in. The files are ordered by year, so we know exactly which year we are reading in.
if filename.endswith(".csv"):
# A temporary dataframe is created for each file.
df_temp = pd.read_csv((directory_in_str +"/"+ filename), sep=",", encoding='latin-1')
# The current dataframe has each observation set to its respective year.
df_temp["Year"] = yNum
# This current dataframe is then concatenated to the main dataframe and then discarded.
df_NBA = pd.concat([df_NBA, df_temp], ignore_index = True)
# Increase the current year by one.
yNum = yNum+1
else:
continue
# Remove the column Rk, which is the old index of the original table.
del df_NBA["Rk"]
df_NBA.head()
Now that we have the data loaded into a dataframe, we need to tidy it. Most of the column headings are abbreviations, which hurts readability. In the cell below, we take a look at the different columns. Most of this information is very important: these are all on-court stats for each player in each year.
df_NBA.columns
Looking at the columns above, we can probably discern some of these column headers: FT more than likely means free throws, and Tm probably means Team. However, let's change the column headings to increase the readability of our table.
df_NBA.columns = ["Player", "Position", "Age", "Team",
"Games", "Games Started", "Minutes", "Field Goals",
"FG Attempts", "FG Percent","3P FG", "3P Attempts",
"3P Percent", "2P FG", "2P Attempts", "2P Percent",
"Effective FGP", "FT made", "FT Attempts","FT Percent",
"Off Rebounds", "Def Rebounds", "Total Rebounds", "Assists",
"Steals", "Blocks", "Turnovers", "P Fouls", "Points",
"Year"]
df_NBA["Player"] = df_NBA["Player"].str.split("\\").str[0]
df_NBA.head()
Perfect. Now we have a table that is easily interpretable. In addition, the name of each player was stored in a way that combined the full name with a unique identifier. Since the unique identifier is a naming scheme used only by this dataset, it will not be useful for merging with other data. Thus, we used the string split function to remove the latter, coded identifier and leave just the player name.
One issue with the data is that players who were traded in the middle of the season appear as separate observations for each team, plus an additional row containing their total stats for the entire year. The main issue for merging with other dataframes is that salaries are based on what each team paid them. Thus, we will remove the summation rows and keep the individual team observations as separate rows in the table.
# If a record has Team = TOT, then it is the total summation of that player's stats and is thus a duplicate row.
df_NBA = df_NBA.loc[df_NBA["Team"] != "TOT"]
df_NBA["3P% of Total Attempts"] = df_NBA["3P Attempts"]/(df_NBA["3P Attempts"]+df_NBA["FG Attempts"])
df_NBA.head()
One statistic that isn't shown is the frequency of three-point attempts relative to total shot attempts for each player. This will be interesting to analyze due to the changing nature of the NBA. We calculated this stat in the cell above.
Our next dataset is salary data, this time for individual players. Our hypothesis is that salary reflects the projected success and impact a player can have. However, this is not always the case, as players can underperform. In addition, some players may be paid not simply for their skills on the court but also for factors such as their fanbase, presence in the locker room, and brand.
starting = 1999
player_salaries = []
index = 0
while starting < 2019:
for pages in range(1, 16):
#http://www.espn.com/nba/salaries/_/year/2000
request = "http://www.espn.com/nba/salaries/_/year/{}/page/{}".format(str(starting+1), str(pages) )
r = requests.get(request)
#4. Use BeautifulSoup to read and parse the data, as html or lxml
root = BeautifulSoup(r.content, 'html.parser')
#5. Use prettify to view the content and find the appropriate table
root.prettify()
#6. Use find() to save the aforementioned table as a variable
salary_table = root.find('table')
#7. Use pandas to read the HTML file
if salary_table is None:
continue
else:
df_salary = pd.read_html(str(salary_table) )
if starting == 1999:
player_salaries = df_salary[0]
#8 Set reasonable names for the table columns. Also adds the year of the dataset.
player_salaries.columns = ['Rank', 'Player', 'Team', 'Salary']
player_salaries['Year'] = starting
else:
player_salaries2 = df_salary[0]
#8 Set reasonable names for the table columns. Also adds the year of the dataset.
player_salaries2.columns = ['Rank', 'Player', 'Team', 'Salary']
player_salaries2['Year'] = starting
player_salaries = pd.concat([player_salaries, player_salaries2])
starting = starting + 1
#We now have our working dataframe for individual NBA player salaries. It is tidy now: the year is stored in its own column
#and the data grows vertically instead of horizontally.
player_salaries.head()
Our initial scrape of the player salary data from ESPN is now finished. However, just from the first five rows, we can see that the scrape was not clean: the table repeats its header row (RK, NAME, TEAM, SALARY) every ten rows for reference. Let's get rid of those.
player_salaries = player_salaries[player_salaries["Rank"] != "RK"]
player_salaries.head()
Lastly, the player names are again disorganized: the player's position is included in the Player column. We can drop it using the same string split function we used in the previous dataset.
player_salaries['Player'] = player_salaries['Player'].str.split(',').str[0]
player_salaries.head()
Our final dataset to read in from csv files is the ranking data as well as the regular season record for each year.
mega_dataframe = pd.DataFrame()
counter = 1999
while counter < 2019:
r = 'data_NBA/nba standings/{}.csv'.format(str(counter))
rotating_df = pd.read_csv(r, skiprows = [0])
rotating_df['Year'] = counter
mega_dataframe = pd.concat([mega_dataframe, rotating_df], sort=False, ignore_index = True)
counter = counter + 1
df_rank = mega_dataframe[["Rk", "Team", "Overall", "Year"]]
df_rank.head()
Again, one issue with this dataset is that each team is labeled differently from the other datasets. In order to allow merging across tables, we need to map each team name to its respective three-letter abbreviation.
print(df_rank["Team"].unique())
df_NBA["Team"].unique()
Above are the team names in the df_rank dataframe. We want to map all of these team names to the three letter abbreviations seen in the df_NBA teams.
df_rank["Team"] = df_rank["Team"].map({
'Los Angeles Lakers' : "LAL",
'Portland Trail Blazers' : "POR",
'Indiana Pacers' : "IND",
'Utah Jazz' : "UTA",
'Phoenix Suns' : "PHO",
'San Antonio Spurs' : "SAS",
'Miami Heat' : 'MIA',
'Minnesota Timberwolves' : "MIN",
'New York Knicks' : "NYK",
'Charlotte Hornets' : "CHH",
'Philadelphia 76ers' : "PHI",
'Seattle SuperSonics' : "SEA",
'Toronto Raptors' : "TOR",
'Sacramento Kings' : "SAC",
'Detroit Pistons' : "DET",
'Milwaukee Bucks' : "MIL",
'Orlando Magic' : "ORL",
'Dallas Mavericks' : "DAL",
'Boston Celtics' : "BOS",
'Denver Nuggets' : 'DEN',
'Houston Rockets' : "HOU",
'Cleveland Cavaliers' : "CLE",
'New Jersey Nets' : "NJN",
'Washington Wizards' : "WAS",
'Atlanta Hawks' : "ATL",
'Vancouver Grizzlies' : "VAN",
'Golden State Warriors' : "GSW",
'Chicago Bulls' : "CHI",
'Los Angeles Clippers' : "LAC",
'Memphis Grizzlies' : "MEM",
'New Orleans Hornets' : "NOH",
'Charlotte Bobcats' : "CHA",
'New Orleans/Oklahoma City Hornets' : "NOK",
'Oklahoma City Thunder' : "OKC",
'Brooklyn Nets' : "BRK",
'New Orleans Pelicans' : "NOP"
})
df_rank.head()
Okay. Now, we finally have all of our data imported. Let's take a look at all of it. We have four different datasets:
display(df_team_salaries.head())
display(df_NBA.head())
display(player_salaries.head())
display(df_rank.head())
Our next step is to explore our data a little bit. What interesting things can we see?
One main discussion around the meta of the NBA is the rise of the three-point shot. With the rise of Stephen Curry came a change in the offensive style of the game, and the three-pointer became known as the efficient shot to take. Worth 50% more than a shot made in the paint, it yields more points per possession given a decent three-point shot percentage. The increase in three-point shots has also improved spacing on the floor, giving players in the post more room to make plays.
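To make the points-per-possession argument concrete, here is a quick back-of-the-envelope comparison; the shooting percentages are purely illustrative and not taken from our data.
# Expected points per shot = make probability * point value.
two_point_pct = 0.50    # a solid shot near the paint (illustrative)
three_point_pct = 0.36  # a decent three-point shooter (illustrative)
print("Expected points, 2P attempt:", 2 * two_point_pct)    # 1.0
print("Expected points, 3P attempt:", 3 * three_point_pct)  # ~1.08
# Break-even: a three only needs to drop 2/3 as often as a two to be worth the same.
print("Break-even 3P percentage:", 2 * two_point_pct / 3)   # ~0.33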
In order to better understand the change in three-point attempts, a bar graph is a suitable plot for showing the shift in the NBA scoring meta.
#Grouping by each year, the percent of shots taken that were threes were averaged and then plotted as a bar per year.
attempts3_plot = df_NBA.groupby("Year")["3P% of Total Attempts"].mean().plot.bar(figsize = (10, 5))
attempts3_plot.set_title("Percent of total shot attempts that were 3P attempts over Time")
attempts3_plot.set_ylabel("Percent of total shot attempts that were 3P attempts")
attempts3_plot.set_xlabel("Year")
attempts3_plot
As expected, the percentage of shot attempts taken from three has increased over time as the culture of the game has changed. A significant rise in 3P attempts can be seen from 2010 onwards, and the increase becomes consistent from 2012 onwards, the start of what is commonly known as the golden age or dynasty of the Golden State Warriors. Daryl Morey, meanwhile, is well known for driving a statistics-based basketball program.
Morey became general manager of the Houston Rockets and began implementing his data-science-based approach in 2007. Let's take a look specifically at the Houston Rockets and see if there is a significant change.
from scipy import stats
#Create a dataframe that is just the Houston Rockets based on their Team abbreviation HOU.
df_rockets = df_NBA.loc[df_NBA["Team"] == "HOU"]
#Next, lets group by year and find the mean of percent of total field goal attempts that were 3 point attempts of all players in each year.
avg_3P_attempts_percent = df_rockets.groupby("Year")[["3P% of Total Attempts"]].mean()
avg_3P_attempts_percent = avg_3P_attempts_percent.reset_index()
#In order to visualize this, a scatter plot will be suitable.
hou_plot = avg_3P_attempts_percent.plot.scatter(x = "Year", y = "3P% of Total Attempts", figsize = (10, 5))
#A regression line is then fit to each plot using the linregress function.
slope, intercept, r_value, p_value, std_err = stats.linregress(avg_3P_attempts_percent["Year"], avg_3P_attempts_percent["3P% of Total Attempts"])
line = slope*avg_3P_attempts_percent["Year"] + intercept
hou_plot.plot(avg_3P_attempts_percent["Year"], line, 'r', label='fitted line')
hou_plot.set_title("The impact of Morey on the Houston Rockets and their 3P Shot Preference")
hou_plot.set_xlabel("Year")
hou_plot.set_ylabel("Percent of total shot attempts that were 3P attempts")
print(r_value**2)
From the graph, it is evident that the percent of field goal attempts that were three-point attempts has steadily increased over the years, from about 0.125 to 0.325 in the 2017-2018 season. The r-squared value is 0.716, indicating a fairly strong linear trend. Morey was the first general manager hired in the NBA without a traditional basketball background; he began his tenure in 2007. Interestingly enough, the percent of 3P attempts dipped from 2007 onwards but rose greatly when James Harden joined the team in 2012.
Next, lets take a look at the salary data across the years for each team in the NBA
#Next, we will create a graphic of the teams over time to see how their payroll increases.
import matplotlib.pyplot as plt
#We just want to initialize our x and y values here for the graphs. We will not use all of the years or all of the payrolls
x = df_team_salaries['Year']
y = df_team_salaries['Rank']
#There are 30 teams, so we want to make a grid of 30 subplots so we can follow each team's distribution.
fig, axes = plt.subplots(nrows = 6, ncols = 5, sharey = True, sharex = True, figsize = (20,12))
plt.xlim(2000, 2019)
#We want to add each distribution per team. We must first make sure we limit only values that are pertaining to the specific franchise.
index = 0
for row in axes:
for col in row:
x = df_team_salaries[df_team_salaries['NBA Team'] == df_team_salaries['NBA Team'].unique()[index]]
y = df_team_salaries[df_team_salaries['NBA Team'] == df_team_salaries['NBA Team'].unique()[index]]
#Then, we graph the team's payroll rank over time and add a title so the reader knows which team is which.
col.plot(x['Year'] , y['Rank'])
col.set_title(df_team_salaries['NBA Team'].unique()[index])
index = index + 1
plt.gca().invert_yaxis()
This graph demonstrates which teams in the NBA had the largest payroll relative to other NBA teams over the last 20 years; the rank of each of the 30 teams in total player spending has fluctuated over that span. In Milestone 1, we defined one aspect of player success as earning a large contract, and we also wanted to compare each player's salary to his team's total payroll, giving a ratio of how important his contract was to his team. This graph is the first part of solving that problem, showing which teams were spending the most money to satisfy the star players on their squad.
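The second part, sketched below, would divide each player's salary by his team's total payroll for the same season. This is only a sketch: it assumes both salary columns have already been converted to numeric dollars and that the player table carries the same three-letter abbreviations in an 'NBA Team' column, a mapping we have not built at this point. The payroll_share helper is hypothetical.
# Hypothetical sketch: each player's share of his team's payroll for a season.
# Assumes 'Salary' and 'Total Salary' are numeric, and that both tables share
# an 'NBA Team' abbreviation column and a 'Year' column.
def payroll_share(players, teams):
    merged = players.merge(teams[["NBA Team", "Year", "Total Salary"]],
                           on=["NBA Team", "Year"], how="inner")
    merged["Payroll Share"] = merged["Salary"] / merged["Total Salary"]
    return merged.sort_values("Payroll Share", ascending=False)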
The most commonly used efficiency score in the game is calculated using the formula:
(PTS + REB + AST + STL + BLK − Missed FG − Missed FT − TO) / Games Played
This is what you would expect to see in most chart comparisons and on a day-to-day ESPN show. Using our player data, let's calculate this efficiency score for each player.
df_NBA["Missed Field Goals"] = df_NBA["FG Attempts"] - df_NBA["Field Goals"]
df_NBA["Missed FT"] = df_NBA["FT Attempts"] - df_NBA["FT made"]
df_NBA["Standard Eff Score"] = (df_NBA["Points"] + df_NBA["Total Rebounds"] + df_NBA["Assists"] + df_NBA["Steals"] + df_NBA["Blocks"]
- df_NBA["Missed Field Goals"] - df_NBA["Missed FT"] - df_NBA["Turnovers"])/df_NBA["Games"]
df_NBA = df_NBA.sort_values(["Standard Eff Score"], ascending = False)
df_NBA.head()
However, this type of efficiency score limits our ability to see how effective bench players are. Bench players are inherently penalized in that they will not have nearly as much playing time as the stars do.
The next step in our analysis is to try to gain a better understanding of the value of each player. To do so, we want to create some weighted statistics that show how effective each player was in their time on the court. Let's begin by calculating each player's stats per minute. We will base the weighted scores on the number of minutes played rather than raw totals or even total games, due to the inherent bias in how benches are used. Someone like LeBron James with a high play rate will always have higher raw stats than someone like Kawhi Leonard, who is under constant load management. This system of evaluation is meant to measure players by their effectiveness in their given time on the floor.
df_NBA["Minutes"] = df_NBA["Minutes"].fillna(1)
df_NBA["PPM"] = df_NBA["Points"]/df_NBA["Minutes"] # Points per minute
df_NBA["RPM"] = df_NBA["Total Rebounds"]/df_NBA["Minutes"] # Rebounds per minute
df_NBA["APM"] = df_NBA["Assists"]/df_NBA["Minutes"]# Assists per minute
df_NBA["SPM"] = df_NBA["Steals"]/df_NBA["Minutes"]# Steals per minute
df_NBA["BPM"] = df_NBA["Blocks"]/df_NBA["Minutes"]# Blocks per minute
df_NBA["Missed FGPM"] = df_NBA["Missed Field Goals"]/df_NBA["Minutes"]
df_NBA["Missed FTPM"] = df_NBA["Missed FT"]/df_NBA["Minutes"]
df_NBA["TOPM"] = df_NBA["Turnovers"]/df_NBA["Minutes"]
df_NBA["Min_Eff_Score"] = df_NBA["PPM"] + df_NBA["RPM"] + df_NBA["APM"] + df_NBA["SPM"] + df_NBA["BPM"] - df_NBA["Missed FGPM"] - df_NBA["Missed FTPM"] - df_NBA["TOPM"]
df_NBA.head()
Next, let's create some standardized ("weighted") measurements that show how far each per-minute stat is from the league average, in standard deviations. The greater the positive deviation, the better the player is compared to the league average.
df_NBA = df_NBA.fillna(0)
df_NBA = df_NBA.replace([np.inf, -np.inf], 0)
df_NBA["wt_PPM"] = (df_NBA["PPM"]-df_NBA["PPM"].mean())/df_NBA["PPM"].std()
df_NBA["wt_RPM"] = (df_NBA["RPM"]-df_NBA["RPM"].mean())/df_NBA["RPM"].std()
df_NBA["wt_APM"] = (df_NBA["APM"]-df_NBA["APM"].mean())/df_NBA["APM"].std()
df_NBA["wt_SPM"] = (df_NBA["SPM"]-df_NBA["SPM"].mean())/df_NBA["SPM"].std()
df_NBA["wt_BPM"] = (df_NBA["BPM"]-df_NBA["BPM"].mean())/df_NBA["BPM"].std()
df_NBA["wt_Missed FGPM"] = (df_NBA["Missed FGPM"]-df_NBA["Missed FGPM"].mean())/df_NBA["Missed FGPM"].std()
df_NBA["wt_Missed FTPM"] = (df_NBA["Missed FTPM"]-df_NBA["Missed FTPM"].mean())/df_NBA["Missed FTPM"].std()
df_NBA["wt_TOPM"] = (df_NBA["TOPM"]-df_NBA["TOPM"].mean())/df_NBA["TOPM"].std()
df_NBA["wt_score"] = df_NBA["wt_PPM"] + df_NBA["wt_RPM"] + df_NBA["wt_APM"] + df_NBA["wt_SPM"] + df_NBA["wt_BPM"] - df_NBA["wt_Missed FGPM"] - df_NBA["wt_Missed FTPM"] - df_NBA["wt_TOPM"]
df_NBA = df_NBA.sort_values(["wt_score"], ascending = False)
df_NBA[:15]
Here, we suddenly see a completely different side of the NBA: the top 15 players are all young players, playing very few minutes but being very effective in their time on the court. Of course, just like all other metrics, it is important to understand that this pace is not sustainable for these players; they were simply very effective in their few minutes on the court. This alternate "efficiency score", which we are calling a weighted minutes score, seems to favor bench players with very low minutes over superstars. Let's see if there is a possible correlation between the standard efficiency score and the weighted score.
eff_wt_plot = df_NBA.plot.scatter(x = "Standard Eff Score", y = "Min_Eff_Score", figsize = (10, 5))
#A regression line is then fit to each plot using the linregress function.
slope, intercept, r_value, p_value, std_err = stats.linregress(df_NBA["Standard Eff Score"], df_NBA["Min_Eff_Score"])
line = slope*df_NBA["Standard Eff Score"] + intercept
eff_wt_plot.plot(df_NBA["Standard Eff Score"], line, 'r', label='fitted line')
eff_wt_plot.set_title("Relationship between Standard Effeciency Score and Minutes Based Effeciency Score")
eff_wt_plot.set_xlabel("Standard Efficiency Score")
eff_wt_plot.set_ylabel("Minute Based Efficiency Score")
print(r_value**2)
The scatter plot above compares the standard efficiency score to the minutes-based efficiency score. The R-squared value is around 0.409, which means these two metrics are only moderately similar in how they rate the abilities of NBA players. Next, let's make the same comparison using the standardized (weighted) versions of both scores.
df_NBA["wt_Standard Eff Score"] = (df_NBA["Standard Eff Score"] - df_NBA["Standard Eff Score"].mean())/df_NBA["Standard Eff Score"].std()
eff_wt_plot = df_NBA.plot.scatter(x = "wt_Standard Eff Score", y = "wt_score", figsize = (10, 5))
#A regression line is then fit to each plot using the linregress function.
slope, intercept, r_value, p_value, std_err = stats.linregress(df_NBA["wt_Standard Eff Score"], df_NBA["wt_score"])
line = slope*df_NBA["wt_Standard Eff Score"] + intercept
eff_wt_plot.plot(df_NBA["wt_Standard Eff Score"], line, 'r', label='fitted line')
eff_wt_plot.set_title("Relationship between Weighted Standard Efficiency Score and Weighted Minutes Based Efficiency Score")
eff_wt_plot.set_xlabel("Wt Standard Efficiency Score")
eff_wt_plot.set_ylabel("Wt Minute Based Efficiency Score")
print(r_value**2)
The weighted graph, however, has a much lower R-squared value.
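Since the two metrics clearly disagree on magnitude, a quick optional check (not part of our main analysis) is to ask whether they at least rank players similarly, using a Spearman rank correlation on the two weighted scores:
from scipy import stats
# Spearman correlation compares how the two scores rank players, ignoring the raw magnitudes.
rho, p = stats.spearmanr(df_NBA["wt_Standard Eff Score"], df_NBA["wt_score"])
print("Spearman rho:", rho, "p-value:", p)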
The next question we want to investigate is whether it is better to have a team assembled of similarly strong players (a team with great depth) or a few key players who can carry the rest of the team. We will use the standard efficiency score to measure the average and spread of each team.
avg_effscores = df_NBA.groupby(["Year", "Team"])[["Standard Eff Score"]].mean()
avg_effscores.head()
Now that we have the average standard efficiency score for each team, we can use it to compute a relative score for each player: how many times larger a player's efficiency score is than the average score on their team. If a team has invested in a weak bench but has one or two superstars, then those superstars will have a much higher relative score.
def team_effscore_comparison(score, year, team):
avg_score = avg_effscores.loc[(year, team)]
return score/avg_score
df_NBA["rel_score"] = df_NBA.apply(lambda x: team_effscore_comparison(x["Standard Eff Score"], x["Year"], x["Team"]), axis = 1)
df_NBA.head()
The next step is to group the relative scores by team. We want to look specifically at superstars and exceptionally good role players. Thus, we will look at the maximum relative score on each team: a high max rel_score indicates the presence of a star player and a general lack of depth, while a lower max rel_score indicates a deeper bench or a more evenly spread team.
In addition, we will need to take a look at the ranking data we scraped further up in the notebook. We will merge the data sets in an inner join.
top_eff = df_NBA.groupby(["Year", "Team"])[["rel_score"]].max()
top_eff = top_eff.reset_index()
top_eff = top_eff.merge(df_rank, how = "inner", on = ["Team", "Year"])
top_eff.head()
Now, let's see if having a superstar translates to greater success in the regular season.
rel_rk_plot = top_eff.plot.scatter(x = "Rk", y = "rel_score", figsize = (10, 5))
#A regression line is then fit to each plot using the linregress function.
slope, intercept, r_value, p_value, std_err = stats.linregress(top_eff["Rk"], top_eff["rel_score"])
line = slope*top_eff["Rk"] + intercept
rel_rk_plot.plot(top_eff["Rk"], line, 'r', label='fitted line')
rel_rk_plot.set_title("Does having a superstar on a team translate to success in the regular season?")
rel_rk_plot.set_xlabel("Rank of Team at end of Regular Season")
rel_rk_plot.set_ylabel("Highest Relative Score on a Team")
print(r_value**2)
Rank 1 means the highest rank in the season, so the negative slope in the graph indicates a relationship between a team's rank and whether it has a star player: teams with a superstar, rather than an evenly spread, deep roster, tend to perform better in the regular season. However, it is important to note that the R-squared value is quite low, at 0.171. So even though the linear relationship seems to indicate that having a superstar over a deep bench leads to regular season success, the relationship itself is not strong.
Associating Salary with NBA Players
The scores so far are all based on minutes; now, through a join, we will also be able to access how much each specific player made that season.
df_NBA = pd.merge(df_NBA, player_salaries, how = 'inner', on = ['Player', 'Year'])
df_NBA
Now that we have a player's salary, we can generate their salary score, calculated as:
Player Salary / League-wide median player salary in the same year
This way, we control for inflation and the league-wide growth in salaries over time.
#To perform this calculation, we need to clean salary up so that it is a float dtype.
df_NBA['Salary'] = df_NBA['Salary'].str[1:]
df_NBA['Salary'] = df_NBA['Salary'].str.replace("," , '')
df_NBA['Salary'] = df_NBA['Salary'].astype(float)
#Next, we must find the median salary of players for a specific year. We'll call this groupby object avg_sal
avg_sal = df_NBA.groupby(["Year"])[["Salary"]].median()
def player_wtscore_salary(salary, year, player):
avg_score = avg_sal.loc[(year)]
return salary/avg_score
#Finally, we return the salary score of each NBA Player using our function
df_NBA["sal_score"] = df_NBA.apply(lambda x: player_wtscore_salary(x["Salary"], x["Year"], x["Player"]), axis = 1)
df_NBA
Now we are able to calculate our efficiency score. To do so, we will use the following formula:
wt_score / sal_score
This formula gives the total 'efficiency' score of the player: specifically, how impactful that player is on the court, relative to the minutes he has played and the amount he is paid.
Now we will be able to see who the most efficient player in the NBA is, according to 'our' statistics. An efficient player by this measure performs better per minute on conventional NBA stats than the average player, and is also paid less than the average NBA player for that season.
df_NBA['efficiency_score'] = df_NBA['wt_score'] / df_NBA['sal_score']
df_NBA = df_NBA.sort_values(['efficiency_score'], ascending = False)
df_NBA
As you can see from our data, most of the most efficient players have played only a few minutes. To make this a little more interesting, let's limit the scope to players who have appeared in at least 41 games, or half an NBA regular season.
players_41_eff = df_NBA[df_NBA['Games'] >= 41]
players_41_eff
Now we are able to see the players who participated in a reasonable number of games within the season, along with their corresponding efficiency scores.
Machine Learning Component
No data science project is complete without a machine learning component. We elected to use a nearest-neighbors regression approach to make predictions from our dataset.
We wanted to ask: given a player's weighted score, how much should he be paid? General managers can use this information to estimate what a player is essentially worth, assuming his stats remain somewhat consistent. Below is our code for the problem:
Our machine learning example predicts pay based on the weighted score. We will use the average salary of the 5, 30, and 100 nearest neighbors and compare the resulting predictions to each other.
plt = df_NBA.plot.scatter(x = 'wt_score', y = 'Salary', figsize = (10, 10), title = "K Nearest Neighbors of Weighted Score Predicting Salary Pay")
def get_NN_prediction(x_new, k):
#Given new observation, returns the k-nearest neighbors prediction
dists = ((X_train - x_new) ** 2).sum(axis=1)
inds_sorted = dists.sort_values().index[:k]
return y_train.loc[inds_sorted].mean()
X_train = df_NBA[["wt_score"]]
y_train = df_NBA["Salary"]
X_new = pd.DataFrame()
X_new["wt_score"] = np.arange(0, 15, 1)
X_new
colors = ['red', 'green', 'blue']
for i,k in enumerate([5, 30, 100]):
y_new_pred = X_new.apply(get_NN_prediction, axis=1, args=(k,))
y_new_pred.index = X_new["wt_score"]
y_new_pred.plot.line(color = colors[i], label=str(k), legend=True)
The key takeaways from this graph are as follows:
1. The players that have the highest salary do not have overwhelming weighted scores. In fact, some of the players have a weighted score of about zero.
2. It does not appear that wt_score and Salary are positively correlated, for the most part. Over the interval of wt_score from 2 to 4, however, there does appear to be an increase in pay.
3. Players with extremely high wt_scores receive less than average salary. This may be due to high stats relative to a small number of minutes played, giving those players a generous wt_score.
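As an optional sanity check on the hand-rolled nearest-neighbor search above, the same style of prediction could be produced with scikit-learn's KNeighborsRegressor. This is only a sketch, assuming scikit-learn is installed; it is not part of our original pipeline, and the curves should roughly match ours.
from sklearn.neighbors import KNeighborsRegressor
import numpy as np

X_grid = np.arange(0, 15, 1).reshape(-1, 1)  # same grid of wt_score values as above
for k in [5, 30, 100]:
    knn = KNeighborsRegressor(n_neighbors=k)
    knn.fit(df_NBA[["wt_score"]].values, df_NBA["Salary"].values)
    # Predicted salary at wt_score = 0, 1, 2 for this choice of k.
    print(k, knn.predict(X_grid)[:3])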
In this notebook, we took a brief look at several aspects of basketball: player statistics, salary, and how much individual players can impact an entire team. The game has evolved significantly in the last decade, driven in large part by the shift toward a statistically focused approach. The three-point shot has been dubbed the "most efficient shot" by Morey, and with that the Houston Rockets, and the rest of the league, have increased their focus on the three-pointer.
The main focus of this project, however, was to investigate what team composition allows for the greatest success. Specifically, we looked into whether emphasizing team depth is more important than having one standout player who serves as a superstar. We found that having such a key player is correlated with a better end-of-season ranking.
We also showed a few different ways of quantifying players. ESPN, the NBA, and various other organizations often use an efficiency score built from basic in-game stats divided by the number of games played. We generated two other methods of quantifying players: a minute-based efficiency score (min_eff_score) and a minute-and-salary-based efficiency score, in which a player's score is compared to how much he is paid. This showed just how much some bench players and rookies are able to outperform their salary. In the current league environment, where most of the money goes to key players, it can be important to recognize these bench players who are able to outperform their expected salary worth.