Project 9: Analysis of NYC Public School Test Results for SAT

1. Inspecting the data

New York City schoolbus

Photo by Jannis Lucas on Unsplash.

Every year, American high school students take SATs, which are standardized tests intended to measure literacy, numeracy, and writing skills. There are three sections – reading, math, and writing, each with a maximum score of 800 points. These tests are extremely important for students and colleges, as they play a pivotal role in the admissions process.

Analyzing the performance of schools is important for a variety of stakeholders, including policy and education professionals, researchers, government, and even parents considering which school their children should attend.

In this notebook, we will take a look at data on SATs across public schools in New York City. Our database contains a single table:

schools

columntypedescription
school_namevarcharName of school
boroughvarcharBorough that the school is located in
building_codevarcharCode for the building
average_mathintAverage math score for SATs
average_readingintAverage reading score for SATs
average_writingintAverage writing score for SATs
percent_testednumericPercentage of students completing SATs

Let’s familiarize ourselves with the data by taking a looking at the first few schools!

In [228]:

%%sql
postgresql:///schools
    
-- Select all columns from the database
-- Display only the first ten rows

SELECT * FROM schools
LIMIT 10;
10 rows affected.

Out[228]:

school_nameboroughbuilding_codeaverage_mathaverage_readingaverage_writingpercent_tested
New Explorations into Science, Technology and Math High SchoolManhattanM022657601601None
Essex Street AcademyManhattanM44539541138778.9
Lower Manhattan Arts AcademyManhattanM44541842841565.1
High School for Dual Language and Asian StudiesManhattanM44561345346395.9
Henry Street School for International StudiesManhattanM05641040638159.7
Bard High School Early CollegeManhattanM09763464163970.8
Urban Assembly Academy of Government and LawManhattanM44538939538180.8
Marta Valle High SchoolManhattanM02543841339435.6
University Neighborhood High SchoolManhattanM44643735535269.9
New Design High SchoolManhattanM44538139637273.7

2. Finding missing values

It looks like the first school in our database had no data in the percent_tested column!

Let’s identify how many schools have missing data for this column, indicating schools that did not report the percentage of students tested.

To understand whether this missing data problem is widespread in New York, we will also calculate the total number of schools in the database.

In [230]:

%%sql

-- Count rows with percent_tested missing and total number of schools
SELECT (COUNT(*) - COUNT(percent_tested)) AS num_tested_missing , COUNT(school_name) as num_schools FROM schools;
 * postgresql:///schools
1 rows affected.

Out[230]:

num_tested_missingnum_schools
20375

3. Schools by building code

There are 20 schools with missing data for percent_tested, which only makes up 5% of all rows in the database.

Now let’s turn our attention to how many schools there are. When we displayed the first ten rows of the database, several had the same value in the building_code column, suggesting there are multiple schools based in the same location. Let’s find out how many unique school locations exist in our database.

In [232]:

%%sql

-- Count the number of unique building_code values
SELECT COUNT(DISTINCT(building_code)) AS num_school_buildings FROM schools;
 * postgresql:///schools
1 rows affected.

Out[232]:

num_school_buildings
233

4. Best schools for math

Out of 375 schools, only 233 (62%) have a unique building_code!

Now let’s start our analysis of school performance. As each school reports individually, we will treat them this way rather than grouping them by building_code.

First, let’s find all schools with an average math score of at least 80% (out of 800).

In [234]:

%%sql

-- Select school and average_math
-- Filter for average_math 640 or higher
-- Display from largest to smallest average_math

SELECT school_name, average_math FROM schools
WHERE average_math >= 640
ORDER BY average_math DESC;
 * postgresql:///schools
10 rows affected.

Out[234]:

school_nameaverage_math
Stuyvesant High School754
Bronx High School of Science714
Staten Island Technical High School711
Queens High School for the Sciences at York College701
High School for Mathematics, Science, and Engineering at City College683
Brooklyn Technical High School682
Townsend Harris High School680
High School of American Studies at Lehman College669
New Explorations into Science, Technology and Math High School657
Eleanor Roosevelt High School641

5. Lowest reading score

Wow, there are only ten public schools in New York City with an average math score of at least 640!

Now let’s look at the other end of the spectrum and find the single lowest score for reading. We will only select the score, not the school, to avoid naming and shaming!

In [236]:

%%sql

-- Find lowest average_reading
SELECT MIN(average_reading) AS lowest_reading FROM schools
 * postgresql:///schools
1 rows affected.

Out[236]:

lowest_reading
302

6. Best writing school

The lowest average score for reading across schools in New York City is less than 40% of the total available points!

Now let’s find the school with the highest average writing score.

In [238]:

%%sql

-- Find the top score for average_writing
-- Group the results by school
-- Sort by max_writing in descending order
-- Reduce output to one school
SELECT school_name, MAX(average_writing) AS max_writing FROM schools
GROUP BY school_name
ORDER BY max_writing DESC
LIMIT 1;
 * postgresql:///schools
1 rows affected.

Out[238]:

school_namemax_writing
Stuyvesant High School693

7. Top 10 schools

An average writing score of 693 is pretty impressive!

This top writing score was at the same school that got the top math score, Stuyvesant High School. Stuyvesant is widely known as a perennial top school in New York.

What other schools are also excellent across the board? Let’s look at scores across reading, writing, and math to find out.

In [240]:

%%sql

-- Calculate average_sat
-- Group by school_name
-- Sort by average_sat in descending order
-- Display the top ten results

SELECT school_name,(average_math+average_reading+average_writing) AS average_sat FROM schools
GROUP BY school_name
ORDER BY average_sat DESC
LIMIT 10;
 * postgresql:///schools
10 rows affected.

Out[240]:

school_nameaverage_sat
Stuyvesant High School2144
Staten Island Technical High School2041
Bronx High School of Science2041
High School of American Studies at Lehman College2013
Townsend Harris High School1981
Queens High School for the Sciences at York College1947
Bard High School Early College1914
Brooklyn Technical High School1896
Eleanor Roosevelt High School1889
High School for Mathematics, Science, and Engineering at City College1889

8. Ranking boroughs

There are four schools with average SAT scores of over 2000! Now let’s analyze performance by New York City borough.

We will build a query that calculates the number of schools and the average SAT score per borough!

In [242]:

%%sql

-- Select borough and a count of all schools, aliased as num_schools
-- Calculate the sum of average_math, average_reading, and average_writing, divided by a count of all schools, aliased as average_borough_sat
-- Organize results by borough
-- Display by average_borough_sat in descending order

SELECT borough,COUNT(*) AS num_schools, SUM(average_math+average_reading+average_writing)/COUNT(school_name) AS average_borough_sat FROM schools
GROUP BY borough
ORDER BY average_borough_sat DESC;
 * postgresql:///schools
5 rows affected.

Out[242]:

boroughnum_schoolsaverage_borough_sat
Staten Island101439
Queens691345
Manhattan891340
Brooklyn1091230
Bronx981202

9. Brooklyn numbers

It appears that schools in Staten Island, on average, produce higher scores across all three categories. However, there are only 10 schools in Staten Island, compared to an average of 91 schools in the other four boroughs!

For our final query of the database, let’s focus on Brooklyn, which has 109 schools. We wish to find the top five schools for math performance.

In [244]:

%%sql

-- Select school and average_math
-- Filter for schools in Brooklyn
-- Aggregate on school_name
-- Display results from highest average_math and restrict output to five rows

SELECT school_name, SUM(average_math) AS average_math FROM schools
WHERE borough = 'Brooklyn'
GROUP BY school_name
ORDER BY average_math DESC
LIMIT 5;
 * postgresql:///schools
5 rows affected.

Out[244]:

school_nameaverage_math
Brooklyn Technical High School682
Brooklyn Latin School625
Leon M. Goldstein High School for the Sciences563
Millennium Brooklyn High School553
Midwood High School550

Leave a comment

Hello !!! I’m Ram Rallabandi

an AI consultant and data scientist with a proven track record of delivering high-impact AI solutions. My mission is simple: transform complex AI strategies into measurable business outcomes.

→ What I Do:
I specialize in leading the end-to-end lifecycle of AI products, from ideation to technical development to successful market launch. My expertise lies in bridging the gap between technical innovation and business value to ensure that every AI initiative aligns with organizational goals.

→ How I Add Value:
☑ Drive the adoption of responsible AI practices to future-proof businesses
☑ Build scalable, high-quality AI solutions that directly address customer needs
☑ Streamline processes to connect AI demand with delivery teams efficiently
☑ Analyze market trends to position data-driven solutions for maximum impact

Why It Works:
With a strong foundation in AI, machine learning, and data science, I provide strategic insights that empower organizations to leverage cutting-edge technologies for growth. I collaborate with technical and business teams to:
↳ Define product roadmaps
↳ Manage budgets, timelines, and scopes
↳ Foster cross-functional alignment for AI-driven innovation

My Passion:
I thrive on leading AI storytelling for executive leadership and fostering collaboration across diverse teams. With deep experience in Fintech and Payments, I’m dedicated to unlocking AI’s potential to deliver business transformation and create widely adopted solutions.

Let’s connect to explore how AI can transform your business.