This post has been republished via RSS; it originally appeared at: Excel Blog articles.
Round 1 Post | Round 2 Post | Round 3 Post |
Link to Test 2 Workbook - WITHOUT answers
Link to Test 2 Workbook - WITH answers
Welcome Back! I know this round 2 post is long overdue, but I also know you are a forgiving crowd. I wanted to be sure this is out as everyone prepares for the upcoming WORLD FINALS! You will be able to participate in helping us pick the winner! Click here to learn more
Round 2 calls upon a few more features, like PivotTables and Get & Transform Queries. If you are not familiar with these tools, I hope this walkthrough can give you enough of a foothold to start doing some investigation of your own. I’ll say right now the videos are particularly long. This is not because the solutions require many complex steps, but because I try to provide detailed explanations for anyone new to these tools. Advice to the more experienced: jump to the provided timestamps in the videos to see the solutions. Advice to everyone: jump to the provided timestamps in the videos to minimize hearing my voice.
Before we dive in, here’s a quick refresher on the ground rules to these solution posts…
- There are many ways to solve a problem. I share a solution, not the only solution
- Data visualization is a subjective art, so I won’t spend too long the Chart Gymnastics challenge
- Videos and Files are included to help better demonstrate everything I type below
One last thing before we dive in – in the video I said I would provide links to function definitions – but I realized that would make this long post even longer. Instead, I am going to take the “teach a person to fish” approach and point out that you can quickly get to these function definitions from inside Excel
- Use Tell Me – I know I said it in the round one post, but that little box will give you help if you type a function name into it
- Links in the wizards – any pop-up or dialog box in Excel that helps you write a function, usually has a “Help on this function” link at the bottom
- Function Tool Tips have links too – you know when you start typing in a cell, and Excel has a little beige box floating by your cell that displays the function definition? Well that little helper also has links!
- When you are really stuck, USE THE COMMUNITY. I find getting tips from others to be one of the best ways to learn.
Challenge 01-01 (Data Relays part 1)
The purpose of this question was to introduce pivot tables, which can take many rows of data and summarize it as a smaller table with some useful data aggregation (like the sum or average of the data). Pivot Tables are incredibly valuable to understand because it is common to have more data than you can processes/understand at one time, often the value of large data sets is in their high-level trends, and sometimes you need to start at the high level to identify areas of concern that require drilling into.
Again, in the provided video I talk A LOT (SURPRISE SURPRISE)…I spend quite a bit of time talking about pivot tables and how to use them in a very general sense. If you want to just see the answer to the question, jump to the time stamps provided with each step
- Create a first Blank Pivot Table (1:50) – This is an obvious first step, but this is our starting point. We can set up a blank table and then pull in the data we need to answer the question posed
- Add “Round Number” field to the “Filters” area of Pivot Table (8:10) – we will eventually going need to limit ourselves to just round 2 competitors. Whether we put this field into the filters, rows, or columns of the Pivot we will be able to filter down to round 2. We will put the field here because the question does not require us to display round number.
- Add “Country” field to the “Rows” area of the Pivot Table (8:40) – We’ll need to see every country in round 2.
- Filter “Round Number” to only values of “2” (8:50) – now we’ve limited all data to round 2 only!
- Add “Result Time” to the “Values" area of the Pivot Table (9:30) – Pivot Tables summarize data by aggregating data points. By default, our numeric “Result Time” field will be summarized as a sum (which is what we want). At this point, the pivot is taking all the original data, apply all the filters (at this point: Round 2 for the whole Pivot Table and filter the country based on the row of the Pivot Table), and add up the “Result Time” from the filtered rows. Since this was a relay race, each row is a runner for a given round. The time for the country in a given round is the sum of each of its runners for that round.
- Ascending sort “Country” field based on “Sum of Result Time” (10:45) - Ascending sort puts the smallest number at the top. In other words, our countries are sorted fastest time first.
- Add “Race Position” field to the “Columns” area of the Pivot Table (11:35) – Now we can see the individual athlete times that made up the country’s total time. Because we have put Race Position in the Column area, we still have a single row showing the total time for a country.
- Add “Athlete Name” field to the “Rows” area of the Pivot Table (12:30) – now we can see the athlete names for each country. Because the Pivot Table is showing subtotals, we still have a single row showing each country’s time.
- Ascending sort “Athlete Name” field based on “Sum of Result Time” (14:30) – again an ascending sort puts the fastest times first. We have created 2 levels of sorting now. First the countries are sorted by total time. Within each country, the athletes are now sorted as well.
- Collapse the “Country” field to hide “Athlete Names” for countries not in the top 3 (16:30) – Expand/Collapse allows us to show/hide lower levels of data in the in Pivot. In this case, we keep the top three countries expanded to see the athlete names; we collapse the remaining countries to hide those athlete names. Our pivot is done!
- Create our second Blank Pivot Table (20:45) – the question asks for a second pivot – you got to do what you got to do.
- Add “Country” field to the “Rows” area of the Pivot Table (21:20) – again this will let us see the original data summarized by country with a single row representing each country.
- Add “Result Time” to the “Values” area of the Pivot Table (21:30) – there’s only one data point we need summarized, and the “Values” area is the place for data.
- Change “Sum of Result Time” to calculate as an Average rather than a SUM (22:00) – sum is not the only aggregation possible by pivot tables. Look at all the options available. COUNT and AVERAGE are popular ones, but there are plenty more.
- Ascending Sort “Country” field based on “Average of Result Time” (23:30) – just like the first pivot, this sort will give us the fastest times at the top of the pivot.
Challenge 01-02 (Data Relays part 2)
GET & TRANSFORM IS HERE!!!!!!! Don’t let my monotonous voice on the video fool you, I am pumped
Sorry I go on a bit in the video before getting to the meaty part of answering the question because I want to explain what Get & Transform is.
A quick note regarding the video is that we continue to modify Excel and those with O365 subscriptions get the latest version. The ribbon I am using in the video may look a little different than what you have on your computer depending on your version of Office.
If you are new to Get and Transform I recommend following in the video. Besides just answering the question I do a bit of an introduction to Get & Transform and show some features I really like. I’d also recommend checking out the tutorials mentioned in the Excel workbook. They are a great starting point for learning more.
If by the end, you don’t love Get & Transform, blame the messenger! Keep exploring the tool and try some more of the challenges. There is so much you can do with Get & Transform.
- Excel: Use Table 1 as the data source for your Get & Transform Query (5:28) – we need data to shape! You can do this by going to the “Data” ribbon, finding the “Get & Transform” section, and clicking the “From Table” button, because our data is in a table in our workbook. There are a ton of options on where to get your data and that list keeps growing as we make updates. These are mostly external sources. In the typical real world scenarios, your data probably won’t already be in the workbook.
- Query Editor: convert our round numbers from rows of data to new columns (8:45) – You do this in the query editor by selecting the “Round Number” column, going to the “Transform” ribbon, clicking the “Pivot” button, and choosing to Pivot the “Round Number” column using the “Result Time (s)” column for values. Like building a traditional pivot table, we are quickly shifting the layout of our data
- Query Editor: filter the “3” column (Round 3 data) to “Remove Empty” rows (10:40) – this will limit our data to only athletes with round 3 data. In other words, they competed in round 3.
- (Optional) Query Editor: Re-order Round 1, 2, 3 columns (11:30) – just my personal preference on seeing the progression through rounds when reading left to right. And this step shows that reordering columns is easy – just drag and drop!
- Query Editor: Ascending Sort “3” column (Round 3 result times) (12:20) – this gets our fastest round 3 times to the top of the list
- Query Editor: Ascending Sort “2” column (Round 2 result times) (12:45) – this is like the advanced sorting on tables we saw in the first round of #ExcelWorldChamp. We are doing a multi-level sorting, first by round 3 then by round 2. An important note is that we achieve this multi-level sorting because we do this step immediately after the prior sorting. Not having these steps back to back would yield different results.
- Query Editor: Close and Load To.. “Table” (13:20) – the question asks for a table in the in the response section for Challenge 01-02, so that’s where our data needs to go.
That’s the end of the challenge. After this step, the video shows a few other introductory aspects of Get & Transform for the new comers…
- What’s the Workbook Queries Pane (16:22)
- How to Change an Existing Query (16:30)
- Example of Refreshing a Query after your Data Changes (16:50)
Not a fan of Get & Transform yet? Keep playing around with it and check out the tests of the later rounds, we are just getting started ;)
Challenge 02 (Chart Gymnastics)
Just like my prior post, I won’t go into a lot of details here, because there is no right answer. If you are new to chart building, go check out some of the references I provided in the round 1 post. I do want to give credit to the community members that spotted my error: I botched the copy/paste so this data doesn’t exactly match the source data I provided. A perfect example of why you should use Get & Transform to gather data whenever possible: you minimize human error. That was not an option in this case, so you are left with my imperfect data set.
We selected this data set to throw a few wrenches into the visualizations (some of these issues also appeared in the round 1 data set):
- Many sets of data
- Data with different orders of magnitude
- Data covering different periods of time
Take a look at some of our most popular visualizations from this round. Shout out to….
link to post | link to post | link to post |
You’ll notice they all managed these issues by separating relevant data into different charts, but you can still quickly scan across charts to make comparisons. Now this is not the only approach, but sometimes it is the cleanest. I have to give a special shout out to Prerit for going the extra mile and hypothesizing revenue drivers :)
Challenge 03 (Formula Wrestling)
Round 2 takes formula writing up a notch! The questions require a little more than one quick formula to get the job done. I’ve decided to provide 2 approaches to each problem.
The first approach lays out the steps for processing the data across many cells. It makes it easier to see in the workbook how the data flows, and it’s just how my mind tends to work through these types of problems.
The second approach comes from various submissions from the community to provide a “efficient” solution. By “efficient” I mean the formulas are very compact and use the least amount functions and operations. These solutions relied heavily on array functions. If you are new to array functions, I recommend reading some array function documentation. 1 BIG disclaimer on the compact formulas: some of them depend on the data being ordered in particular ways (like consistently between tables). There was no explicit statement that this would hold true when the data refreshes. That said, I do think the solutions were incredibly resourceful and used some AWESOME mathematics principles, so I wanted to share them.
- Which Gymnast Had the 3rd Highest Score? (2:00)
- Approach 1
- Calculate everyone’s score (2:00) – The gymnasts score is sort of an “adjusted average” we can get by using SUM() to add scores, subtracting the MIN() to remove the lowest score, subtracting MAX() to remove the highest score and dividing the result by 3 (the COUNT() of remaining scores after removing the lowest and highest scores)
- Calculate how the scores compare to one another (5:10) - the RANK() function will gives us the relative size of each score (i.e. RANK() returns 1 for the largest score, 2 for the second largest score, 3 for the 3rd largest score, etc.….) or Identify the 3rd largest score of the group (6:50) – While RANK() will tell us the relative size of each score, the LARGE() function returns particular the corresponding score when given a particular rank.
- With the 3rd largest score identified, find the corresponding Athlete Number (7:40) – Since we have a list of athlete numbers, a list of their scores, and the 3rd largest score, we can use the combination of INDEX() and MATCH() to return the athlete number we want. In the video, I am quite insistent we went over this powerful formula combination in in Round 1 – we did not, sorry!
- I will take a detour and explain these functions because they should be in every Excel user’s arsenal. Use the summaries below with the video/solution workbook to see the dynamic duo in action
INDEX(array, row_num) returns the item in row number “row_num” of “array”. INDEX(A10:A20, 5) would return the 5th row of the cell range A10:A20, which is the value in cell A15. In our case, we want to return an item from our list of Athlete Numbers, but the row number will change depending on the athlete scores. Stated in pseudo-Excel formula writing, we want INDEX(athlete_numbers, ???)
MATCH(lookup_value, lookup_array, [match_type]) returns the location of “lookup_value” in “lookup_array” (for now I’ll just say we will use zero as the value for match_type to indicate we are looking for an exact match. Check the documentation on MATCH to see how you can expand your use of it) MATCH(“hello”, B10:B20, 0) would search through the range of cells B10:B20 and return the first row number of the range where “hello” is found. If cell B17 equaled “hello”, the formula would return 7 (B17 is in the 7th row of cell range B10:B20). Jumping back to our scenario with some more pseudo-Excel formula writing, if we did MATCH(3rd_hiighest_score, all_scores, 0) we would find the row number of the 3rd highest score. This replaces the ??? in our INDEX function above…
With a list of all athlete numbers and their scores, we can do INDEX(athlete_numbers, MATCH(3rd_highest_score, all_scores, 0)) This formula combination effectively says, find the row with the 3rd Highest Score in the list of scores and return the corresponding athlete number for that score. (And as we saw the prior step, we can use the LARGE() function to get that 3rd Highest Score) -
Approach 2: The Compact Formula (8:50) – This is effectively the same logic as above, but stuffed into 1 cell. The video steps through each portion in detail, but I’ll give the pseudo formula text version a crack. I’ll start from the inside and work our way out to the formula’s result. A few notes on this and the other compact formulas that follow:
- Some solutions use array formulas to get all the calculations into one cell. Where I have wrapped a formula in braces, e.g.{=formula}, you would need to press CTRL + SHIFT + ENTER if you type this into a cell (rather than just hitting ENTER with traditional formulas)
- I use a mix of pseudo formula writing here to make the calculations more readable
- Part of my pseudo writing involves “variables” intended to represent the detailed formula from the prior steps. For example, if you see sum_of_scores: =SUM(1,2,3,4) in one line and later see sum_of_scores / 4, that is intended to be a condensed, human friendly way of writing SUM(1,2,3,4)/4. I’m rarely accused of being human though, so I apologize if my writing style makes things more confusing.
-
Full Solution
{=INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5},OFFSET(F25,ROW(1:8),,,5)),{1;-1;-1})*10000+ROW(1:8),3)))} -
Breakdown pt 1 (11:10)
{=INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5},OFFSET(F25,ROW(1:8),,,5)),{1;-1;-1})*10000+ROW(1:8),3)))} array_of_athlete_scores: {=OFFSET(F25,ROW(1:8),,,5))} Starting with cell F25, create 8 arrays that are 5 cells wide. In other words, create an array for each of the 8 athletes that has all 5 of their scores
-
Breakdown pt 2 (12:10)
{=INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5},OFFSET(F25,ROW(1:8),,,5)),{1;-1;-1})*10000+ROW(1:8),3)))} sum_max_&_min_of_athlete_scores: {= SUBTOTAL({9,4,5}, array_of_athlete_scores} Using the SUBTOTAL() function calculate the sum (SUBTOTAL function number 9), the max (SUBTOTAL function number 4), and the min (SUBTOTAL function number 5) for each array of athlete scores
-
Breakdown pt 3 (13:35)
{=INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5},OFFSET(F25,ROW(1:8),,,5)),{1;-1;-1})*10000+ROW(1:8),3)))} proxy_scores_v1: {=MMULT(sum_max_&_min, {1;-1;-1})} Using the MMULT() function (matrix multiplication), for each athlete multiply the sum of scores by 1, the max score by -1, and the min score by -1. Then add those 3 numbers up. In short, for each athlete calculate sum minus max minus min (sound familiar?). This is each gymnast’s score * 3 (no division done yet in the calculate of the “adjusted average”). It is a mathematically sound proxy for the scores when we just need to know how big scores are relative to one another.
-
Breakdown pt 4 (15:30)
{=INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5},OFFSET(F25,ROW(1:8),,,5)),{1;-1;- 1})*10000+ROW(1:8),3)))} proxy_scores_v2: {=proxy_scores_v1*10000}
Multiplying each of these proxy scores by 10,000 effectively adds a bunch of zeroes to the right side of the score. For example, a proxy score of 26.38 becomes 263800. This does not impact the relative size of the gymnast scores, so we have another mathematically sound proxy score.
-
Breakdown pt 5 (16:23)
{=INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5},OFFSET(F25,ROW(1:8),,,5)),{1;-1;-1})*10000+ROW(1:8),3)))} proxy_scores_v3: {=proxy_scores_v2 + ROW(1:8)} Adding the array of row numbers produced by the ROW() function adds the row number to the ones place of our proxy score. So if score 263800 is in row number 1 of the array, it becomes 263801. Since the 5 most significant digits of these calculations still aligns with proxy scores, their relative sizes are maintained. So we have yet ANOTHER mathematically sound proxy score, but the right most digit of each proxy score also holds the row number where the score is found in the list of all scores!
-
Breakdown pt 6 (17:45) {=INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5},OFFSET(F25,ROW(1:8),,,5)),{1;-1;-1})*10000+ROW(1:8),3)))} 3rd_largest_proxy_score: {=LARGE(proxy_scores_v3, 3)} – like we saw in the first approach to the problem, we use the LARGE() function to get the 3rd largest proxy score.
-
Breakdown pt 7 (18:10) {=INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5},OFFSET(F25,ROW(1:8),,,5)),{1;-1;-1})*10000+ROW(1:8),3)))} row_num_for_3rd_largest_score: {=RIGHT(3rd_largest_proxy_score:)} The RIGHT function returns characters from a string, starting from the right side. Because the modifications previously made, getting the right most character of our 3rd largest proxy score returns the row number of the 3rd largest score
-
Breakdown pt 8 (19:00) {=INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5},OFFSET(F25,ROW(1:8),,,5)),{1;-1;-1})*10000+ROW(1:8),3)))} ANSWER: {=INDEX(all_athlete_numbers, row_num_for_3rd_largest_score)} As we saw with the first approach, we can use the INDEX function with the appropriate row number, to retrieve the athlete number we want.
- Approach 1
- What was the country and athlete number of the gymnast with the 4th highest score? (19:50)
- Approach 1
- Get the country for each athlete (20:30) - using the INDEX/MATCH formula combo we can pull countries from Table 2 based on athlete number
- Identify the 4th largest score and get the corresponding athlete number (22:15) – like question 1, we can use the LARGE function to find this 4th largest score and combine that with the INDEX/MATCH combo to find the athlete number. In the video I accomplish that by copying the formula from question 1 and changing =INDEX(…LARGE(…,3)) to =INDEX(… LARGE(…,4)). Aren’t formulas great?
- Add the corresponding country to our solution (22:40) - in part b we did INDEX(athlete_numbers, MATCH(…)) to get our 4th athlete number. We can make a small modification, and use INDEX(countries_of_each_athlete, MATCH(…)) to get the corresponding country. By using the “&” character we can combine our formulas into 1 text string. So INDEX(athlete_numbers, MATCH(…)) & “-“ & INDEX(countries_of_each_athlete, MATCH(…)) gives us our final answer.
- Approach 2: The Compact Formula (24:10) – Like approach 1, we will use logic we created in question 1, to help make this question easier to answer.
-
Helper Step (24:25)
4th_place_athlete_number:={ =INDEX(E26:E33,RIGHT(LARGE(MMULT(SUBTOTAL({9,4,5},OFFSET(F25,ROW(1:8),,,5)),{1;-1;-1})*10000+ROW(1:8),4)))} This is the exact same logic as we used for the compact answer to question 1, we are just finding the 4th largest score instead other 3rd largest score (i.e. LARGE(…,4) instead of LARGE(…,3) -
Full Solution
=LOOKUP(1,0/(R17=F38:F45),E38:E45)&-R17
-
Breakdown pt 1 (25:10) =LOOKUP(1,0/(R17=F38:F45),E38:E45)&-R17
4th_place_matches_in_country_table_v1: {=(4th_place_athlete_number = F38:F45)}
By comparing our the 4th place athlete number (our helper step) to every athlete number in Table 2, we produce a TRUE/FALSE array that shows TRUE for the slot corresponding to the 4th place athlete and FALSE for all others -
Breakdown pt 2 (25:57)
=LOOKUP(1,0/(R17=F38:F45),E38:E45)&-R17
4th_place_matches_in_country_table_v2: {=0/4th_place_matches_in_country_table_v1}
Because TRUE/FALSE is also represented as 1/0, the operation of dividing zero by our TRUE/FALSE array produces an array that shows 0 where we had a TRUE (the slot corresponding to the location of the 4th place athlete) and a dividing by zero error everywhere we had a FALSE -
Breakdown pt 3 (27:15)
=LOOKUP(1,0/(R17=F38:F45),E38:E45)&-R17
4th_place_country:=LOOKUP(1,4th_place_matches_in_country_table_v2, country_list)
LOOKUP(lookup_value, lookup_vector, result_vector) is similar to INDEX(). It looks for the lookup_value in lookup_vector and returns the corresponding value from the result_vector. For example, LOOKUP(“dog”, {“cat”,”cow”,”dog”}, {“meow”,”moo”, ”woof”}}) first looks for “dog” in {“cat”,”cow”,”dog”}, determines it is the 3rd item in lookup_vector, and returns “woof” (the 3rd item in result_vector). LOOKUP() assumes lookup_vector is sorted in ascending order. If LOOKUP() does not find lookup_value in lookup_vector, it uses the largest value in lookup_vector that is less than or equal to lookup_value (aka closest without going over, aka Price-Is-Right-style). Since our lookup_vector is a bunch of errors and a single zero, our LOOKUP function will use the location of the zero to return a value from the result_vector. In other words, we get the country that corresponds to the 4th place score -
Breakdow pt 4 (29:20)
=LOOKUP(1,0/(R17=F38:F45),E38:E45)&-R17
ANSWER: 4th_place_country&-4th_place_athlete_number
Again a simple “&” allows us to combine strings. Since the second part of the answer is a number, we can make it negative to produce the “-“ in our final answer.
-
- Approach 1
- Which judge tended to give the lowest scores to athletes? (30:35)
- Approach 1
-
Calculate the average score for each judge (31:15) – using the AVERAGE() function we can quickly calculate the average score for each judge
-
Find the lowest average score among the judges (32:45) – using the MIN() function we can find the lowest among all the averages
-
Find the appropriate judge number (33:30) – again using MATCH function we can find which judge corresponds to the lowest score
-
Format the judge number (34:15) – since the judge numbers are always “J#”, we can use string concatenation and say “J”&MATCH(…) to get our final answer. Alternatively we could use the INDEX/MATCH combo on the headers of Table 1 to get the judge number. The INDEX/MATCH approach is more sound, because the order of Judge numbers is not locked across the different tables. INDEX/MATCH allows you to do searches and lookups without having specific orders in your data sets. I’ve show the “J”&# approach to be consistent with the compact formula.
-
-
Approach 2: The Compact Formula – Again the logic flow is pretty similar to what we accomplished in approach 1, but with fewer function calls thanks to array functions
-
Full Solution
="J"&MATCH(1,FREQUENCY(0,SUBTOTAL(1,OFFSET(E26,,{1,2,3,4,5},8))),0) -
Breakdown pt 1
="J"&MATCH(1,FREQUENCY(0,SUBTOTAL(1,OFFSET(E26,,{1,2,3,4,5},8))),)
scores_by_judge_array: {=OFFSET(E26,,{1,2,3,4,5},8)}
The formula =OFFSET(E26,,1,8) says start from cell E26, go 1 column to the right and create a cell reference that is 8 cells high. This is effectively the range of cells holding Judge 1’s scores. By entering an array of column offsets ({1,2,3,4,5} rather than just 1), we produce an array of 5 cell ranges. Each cell range is 8 cells high, that is each range represents 1 judge’s scores across all athletes -
Breakdown pt 2
="J"&MATCH(1,FREQUENCY(0,SUBTOTAL(1,OFFSET(E26,,{1,2,3,4,5},8))),0)
avg_score_per_judge_array: {=SUBTOTAL(1, scores_by_judge_array)}
Function number 1 of the SUBTOTAL() function is average, so SUBTOTAL(1, ...) calculates the average of a set of numbers or range of cells. Since we are feeding this function an array of cell ranges, the output is an array holding the averages of those ranges. In other words, we get an array that has the average score for each judge -
Breakdown pt 3
="J"&MATCH(1,FREQUENCY(0,SUBTOTAL(1,OFFSET(E26,,{1,2,3,4,5},8))),0)
lowest_avg_score_location: {=FREQUENCY(0, avg_score_per_judge_array)}
Again some clever work from the community! The FRQUENCY() function takes an array of data points and an array of data bins and returns an array representing the number of data points (the frequency) falling into each bin. You might have gone through this process when building a histogram. In this case, we’ve passed a single data point (zero) and bins corresponding to the average scores for each judge. The zero data point lands in the smallest bin, which corresponds to the lowest average score. So FREQUENCY() returns an array of all zeros (no data points landing in the bins), EXCEPT for a 1 corresponding to the location of the lowest average score -
Breakdown pt 4
="J"&MATCH(1,FREQUENCY(0,SUBTOTAL(1,OFFSET(E26,,{1,2,3,4,5},8))),)
judge_number: {=MATCH(1, lowest_avg_score_location, 0)}As we said above, we produced an array of zeros with a single 1. MATCH() searches for the 1 in our array and returns its location. So if the 1 was the 4th item in the array, it returns 4.
-
Breakdown pt 5
="J"&MATCH(1,FREQUENCY(0,SUBTOTAL(1,OFFSET(E26,,{1,2,3,4,5},8))),)
ANSWER: {=”J” & judge_number} Simple string concatenation with “&” gives us the judge number appropriately formatted.
-
- Approach 1
- Which judge's scores were excluded the most? (35:28)
- Approach 1
- Determine which judges were excluded for each athlete (36:15) – we know we can use MAX() and MIN() to determine the high and low scores. We can test if a judges score equals the max score with the formula = judge_score = MAX(…). But we want to test for the high OR the low, so we use the OR() function to do multiple logical tests. OR(judge_score=MAX(…), judge_score = MIN(…)) will return TRUE if judge_score is either the high or low score. Otherwise it will return FALSE.
- Change our TRUE/FALSE tests into 1/0 (37:45) – the IF() function lets us do a logical test and take one action if the test is TRUE and another if it is FALSE. So =IF(OR(…), 1, 0) will turn all of our TRUEs from the prior step into 1s. In other words, we will show a 1 if judge_score was excluded (was the high or low for an athlete) otherwise it will show 0.
- Copy this calculation for all judges/athletes (38:46) – this will produce a table of 1s and 0s, showing us all score exclusions in the competition
- Calculate the total exclusions for all judges (39:20) – a simple SUM() of our 1s and 0s will produce a count of exclusions for each judge
- Find the highest number of exclusions (40:10) – a simple MAX() calculation on all of our SUM() can find this answer
- Find the judge corresponding to these exclusions (40:30) – we’ve already used MATCH() to find the location of a particular number in a list
- Convert this to a judge number (41:20) – again a simple string concatenation “J” & MATCH(…)
- Determine which judges were excluded for each athlete (36:15) – we know we can use MAX() and MIN() to determine the high and low scores. We can test if a judges score equals the max score with the formula = judge_score = MAX(…). But we want to test for the high OR the low, so we use the OR() function to do multiple logical tests. OR(judge_score=MAX(…), judge_score = MIN(…)) will return TRUE if judge_score is either the high or low score. Otherwise it will return FALSE.
-
Approach 2: The Compact Formula –
Side Note on Array Formulas (42:00) – A little late I know, but if you have never used array formulas before I discuss a little bit about how you actually type them into a cell
Side Note on writing about arrays - You may have noticed I’ll write an array as a data series wrapped in braces {}. We can call the array {1,2,3} 1-dimensional or say it is a 1x3 array (it has 1 “row” with 3 “columns”). We can also call the array {10;20} 1-dimensional or say it is a 2x1 array (it has 2 “rows” with 1 “column” each. The subtle notation difference is that I used commas (,) to separate my “columns” and semicolons (;) to separate my “rows”. If we added our 2 example arrays together, we would get the 2x3 array {11, 12, 13; 21, 22, 23}. I call this out, because understanding the various dimensions of an array factors into my walkthrough below.- Full Solution
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5))) -
Breakdown part 1
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))
modified_scores_2Darray_v1:{=F26:J33*10000}
F26:J33 represents the range of scores in Table 1. This is a 8x5 2-dimensional array. Multiplying all of these numbers by 10,000 uniformly removes the decimal point and adds a trailing zero. For example, J1’s score for athlete number 3150 goes from 8.973 to 89730. This modification means scores still have the same relative size (the highest original score becomes the highest modified score, the second highest original score becomes the second highest modified score, etc…), but scores are now whole numbers with the least significant digit open to modify further -
Breakdown part 2
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))
modified_scores_2Darray_v2:{=modified_scores_2Darray_v1 + {1,2,3,4,5}}
When our five column array of modified scores has the 1x5 array {1,2,3,4,5} added to it, the first column has 1 added to it, the second column has 2 added to it, etc… This means the least significant digit that was previously a zero, becomes a number 1-5 corresponding to its column number in the array (also corresponding to the scores Judge number). J1’s score for athlete number 3150 goes from 89730 to 89731 -
Breakdown part 3
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))
modified_scores_2Darray_v3:{=ROW(1:8)*1000000 + modified_scores_2Darray_v2}
A similar clever trick as we’ve seen on the prior questions. The ROW() formula produces an 8x1 array with numbers 1 through 8 as ROWS. They are then multiplied by 1,000,000 to produce the array {1M; 2M; 3M; 4M; 5M; 6M; 7M; 8M}. When this gets added to the 8x5 array of modified scores, 1M is added to all the scores in the first row, 2M is added to all the scores in the second row, etc… J1’s score for athlete number 3150 (which was in row 1) goes from 89731 to 189731. Now for each of our modified scores, the first digit is the row number (indicating the athlete), the next 5 digits represent the original score, and the last digit is the column number (indicating the judge). It also means the 1st row (where all 5 columns had 1M added) will have the 5 lowest modified scores (ranks 36 through 40), the 2nd row will have the next 5 lowest modified scores (ranks 31 through 35) … the 8th row will have the 5 highest modified scores (ranks 5 through 1). In effect we’ve created a pattern where the highest score for each row (the highest score for each athlete) have rankings 36, 31, 26, 21, 16, 11, 6, 1 when looking across the entire array of scores. Similarly, the lowest scores for each row (the lowest scores for each athlete) have rankings 40, 35, 30, 25, 20, 15, 10, 5. -
Breakdown part 4
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))
max_and_min_ranks:{= {41,45}-ROW(1:8)*5}
Again ROW(1:8) will produce the 8x1 array of numbers from 1 to 8. When this is multiplied by 5, the numbers go from 5 to 40 in increments of 5…{5; 10; 15; 20; 25; 35; 40}. When we take the 1x2 array {41,45} and subtract this 8x1 array with multiples of 5, the result is an 8x2 array where 41 and 45 have a different multiple of 5 subtracted in each row..{36,40; 31,35; 26,30; 21,25; 16,20; 11,15; 6,10; 1,5}. Do these rankings look familiar? :) -
Breakdown part 5
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))
max_min_scores_2Darray:{=LARGE(modified_scores_2Darray_v3, max_and_min_ranks}
By isolating to just the max and min ranks. We now have an 8x2 array with just the highest and lowest modified scores for each athlete -
Breakdown part 6
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))
judge_exclusions_txt:{=RIGHT(max_min_scores_2Darray)}
Remember the calculation in part 2 made the least significant digit of each score correspond to the judge that gave the score. By taking the right most character of every element of the 8x2 array of scores, we get a 8x2 array showing the judge numbers (as text) corresponding to the excluded high and low for each of the 8 athletes) -
Breakdown part 7
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))
judge_exclusions_num:{=--judge_exclusions_txt}
The “-- " quickly turns the judge numbers from text strings to numeric values. The first “-“ turns the text into a numeric value and multiples it by -1. The second “-“ multiplies by -1 again so that we are back to positive numbers (the actual judge number) -
Breakdown part 8
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))
most_excluded_judge_num: {=MODE(judge_exclusions_num)}
The MODE() function returns the most frequently occurring number in an array. In this case, the most frequently occurring number is the most excluded judge -
Breakdown part 9
="J"&MODE(--RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))
ANSWER: {=”J” & most_excluded_judge_num}
Of course we finish with some basic string concatenation to format our Judge Number correctly.
- Full Solution
- Approach 1
- In cases where the Judge and Athlete shared the same home country, how many times was the Judge's score thrown out? (43:50)
- Approach 1
-
Find the country for each judge (44:55) – we’ll use the INDEX/MATCH combination to find the country for each judge.
-
Determine instances where the judge and athlete are from the same country (46:15) – a simple test of athlete_country = judge_country will give us a TRUE when this is the case
-
Change our TRUE/FALSE test to 1/0 (47:00) – just like question 4, we will use IF() to change our TRUE/FALSE into 1 or 0
-
Copy this calculation for all judges/athletes (47:35)
-
Find out when score exclusions align with country matches (48:35) – since our exclusion cases are 1s, our country matches are 1s, and everything else are 0s, multiplying our exclusion tests with our country match test will return 1s when both are 1s (a score exclusion and a country match) and 0s otherwise
-
Find the total number of cases where score exclusion aligns with country matches (50:22)
-
- Approach 2: The Compact Formula – It’s even messier, but we’ve already done a lot of the heavy lifting
- Full Solution
=SUM(N(INDEX(E50:E54,T(IF(1,RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))))=E38:E45)) -
Breakdown part 1
=SUM(N(INDEX(E50:E54,T(IF(1,RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))))=E38:E45))
judge_exclusions_txt:{=T(IF(1,RIGHT(…)))}
See question 4 compact solution breakdown parts 1 through 6. This is almost the exact same calculation. Question 4 parts 1 through 6, get us up to the RIGHT(…) part of to the function. Without getting into detail, I’ll say that the T(IF(…)) wrapper functions force Excels order of operations to maintain the output we saw earlier. Our end result is a 2x8 array showing the judge numbers for all excluded scores (again same output from question 4 parts 1 through 6). Where the first row are the judges excluded for the first athlete in Table 1, the second row are the judges excluded for the second athlete, etc… -
Breakdown part 2
=SUM(N(INDEX(E50:E54,T(IF(1,RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))))=E38:E45))
judge_exclusion_countries:{=INDEX(E50:E54, judge_exclusions_txt)}
E50:E54 are the judges countries (again this solution depends on the ordering of the judges in Table 3 to match the column order of Table 1). So our INDEX() looks up all the countries based on judge number. We end up with a 2x8 array showing the judge country associated with each excluded score (as in part 1, the rows of this array align with the ordering of athletes in Table 1) -
Breakdown part 3
=SUM(N(INDEX(E50:E54,T(IF(1,RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))))=E38:E45))
country_matches_TRUE_FALSE:{=judge_exclusion_countries=E38:E45}
E38:E45 are the athlete countries (again this solution depends on the ordering of the athletes in Table 1 to match Table 2). Comparing our judge countries to the athlete countries produces a 2x8 array of TRUE/FALSE values. The TRUEs represent cases where the country of the judge with an excluded score matches the athlete country, FALSEs represent non-matches -
Breakdown part 4
=SUM(N(INDEX(E50:E54,T(IF(1,RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))))=E38:E45)) country_matches_1_0:{=N(country_matches_TRUE_FALSE)} The N() function converts certain values to numbers (you can read more in the documentation link on the N() function); in this cases our TRUEs become 1s and our FALSEs become 0s -
Breakdown part 5
=SUM(N(INDEX(E50:E54,T(IF(1,RIGHT(LARGE(ROW(1:8)*1000000+F26:J33*10000+{1,2,3,4,5},{41,45}-ROW(1:8)*5)))))=E38:E45))
ANSWER:{=SUM(country_matches_1_0)}
Now that we have 1s for country matches and 0s for everything else, a simple sum gets us the final count of matches.
- Full Solution
- Approach 1
If you’re reading this, you’ve gotten through the entire walkthrough and just completed my bonus endurance challenge. Thanks for going through the post. I know there was a lot to digest here, and if you’ve never seen array formulas before then the compact solutions may have you scratching your head. Again, I recommend going back and checking out some of array function documentation as a starting point. Also use the trick I showed you in the video (select a portion of the formula in the formula bar and click F9 to see how it calculates). Alternatively, select a cell that’s giving you trouble, go the “Formula” ribbon, and click the “Evaluate Formula” button. This will help you digest the arrays and see Excel’s order of operations. These can help bridge some gaps for you.
And of course, USE THIS COMMUNITY! Sometimes nothing beats learning from someone who has been in the same boat as you.
Posts on the remaining tests will be coming as quickly as I can churn them out. Thanks!
Round 1 Post | Round 2 Post | Round 3 Post |