Individual Contribution to Group Work (Help Needed)

As you point out, “this is America”

The goal is to teach, but we also want to be able to evaluate learning and ability. It seems like the ideal system would be one that chooses the best pedagogy for learning and maps an effective method of evaluation on top of it. I am under the impression that group work is a better pedagogy than individual work (or at least that mixed group and individual work is better than individual work alone).

I think people say this more as a way to motivate team members. Oneness with the group is a powerful mindset, and it enables many actions that aren’t justified on purely individualistic grounds. But there are still individuals, and the best players can still be more or less reliably identified.

I’m worried this might be the case, but I’m hopeful too. We have a few variables to play with: n (the size of the class), k (the size of the group), and p (the number of projects).
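A rough way to see how n, k and p interact, assuming the averaging model from the program below and that k divides n: each project's groups cover every student exactly once, so the "who was in which group" rows of any project add up to the same all-ones row. That means p projects give at most 1 + p(n/k - 1) independent equations, and we need at least n of them to pin down n contributions. This is only a necessary condition; whether a particular set of groupings actually determines everyone can be checked with the rank of the membership matrix. The helpers `min_projects` and `design_rank` are hypothetical names for this sketch, and the rank uses exact fractions so rounding can't fool it.

```python
from fractions import Fraction


def min_projects(n, k):
    """Lower bound on projects needed before n contributions can be pinned down.

    Assumes the averaging model and that k divides n. Each project's rows sum
    to the same all-ones row, so p projects give at most 1 + p * (n // k - 1)
    independent equations.
    """
    groups_per_project = n // k
    if n % k or groups_per_project < 2:
        raise ValueError("need k to divide n and at least two groups per project")
    return -(-(n - 1) // (groups_per_project - 1))  # ceiling division


def design_rank(n, projects):
    """Exact rank of the 0/1 'who was in which group' matrix.

    `projects` is a list of projects, each a list of groups, each a list of
    student indices 0..n-1. Rank n means the group scores determine every
    contribution under the averaging model; less than n means they cannot.
    """
    rows = []
    for project in projects:
        for group in project:
            row = [Fraction(0)] * n
            for student in group:
                row[student] = Fraction(1)
            rows.append(row)
    rank = 0
    for col in range(n):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no independent information about this column yet
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


if __name__ == "__main__":
    print(min_projects(25, 5))  # 6
    # 4 students in pairs, every possible pairing exactly once (3 projects).
    pairs = [[[0, 1], [2, 3]], [[0, 2], [1, 3]], [[0, 3], [1, 2]]]
    print(design_rank(4, pairs))  # 4
```

For the class in the program below (n = 25, k = 5), that works out to at least 6 projects before the group scores can even in principle separate every student.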

We also have the option of using non-random groups to get better information. One way might be to randomize with restrictions (e.g. random for the first project, no overlap between the first and second project groups, no overlap between the second and third project groups, etc.). Another might be to intentionally group together or keep apart certain students based on their grades on previous projects, in order to distinguish among them. It might be that putting the best students with the worst students will tend to differentiate them more, or that intentionally balancing the groups will tend to show which students are pulling extra weight and which are not.
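The "random with restrictions" option could be sketched like this, assuming the rule is "no two students who shared a group in the last project share one in this project". The function names are hypothetical, and the greedy build-and-restart approach is just one simple choice: it does not sample uniformly from all valid groupings.

```python
import random
from itertools import combinations


def random_partition(students, k, rng):
    """Shuffle the students and cut the list into groups of k."""
    pool = list(students)
    rng.shuffle(pool)
    return [pool[i:i + k] for i in range(0, len(pool), k)]


def shared_pairs(groups):
    """Every pair of students who were in the same group, as frozensets."""
    return {frozenset(pair) for group in groups for pair in combinations(group, 2)}


def restricted_partition(students, k, forbidden, rng, max_restarts=1000):
    """Random groups of k in which no pair from `forbidden` is together again.

    Builds groups greedily from a shuffled pool and starts over on a dead end.
    This is not a uniform sample over all valid groupings.
    """
    if len(students) % k:
        raise ValueError("class size must be a multiple of the group size")
    for _ in range(max_restarts):
        pool = list(students)
        rng.shuffle(pool)
        groups = []
        complete = True
        while pool:
            group = [pool.pop()]
            for candidate in list(pool):
                if len(group) == k:
                    break
                if all(frozenset((candidate, m)) not in forbidden for m in group):
                    group.append(candidate)
                    pool.remove(candidate)
            if len(group) < k:
                complete = False  # dead end: restart with a new shuffle
                break
            groups.append(group)
        if complete:
            return groups
    raise RuntimeError("no valid grouping found; try relaxing the restriction")


if __name__ == "__main__":
    rng = random.Random()
    students = list(range(25))
    first = random_partition(students, 5, rng)
    second = restricted_partition(students, 5, shared_pairs(first), rng)
    print("Project 0:", first)
    print("Project 1:", second)
```

To chain more projects, pass the union of `shared_pairs` for all earlier projects. The restriction tightens quickly: with 25 students in groups of 5, each project uses up 4 of a student's 24 possible partners, so at most 6 projects can avoid every repeat pairing, and the greedy builder may hit dead ends well before that.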

I’ve written a short program in Python to generate test data. I’ll add it here if anyone wants to play with it or mock me for how badly I suck at programming.
[tab][code]import random

random.seed()

# Create 25 'students', each a two-item list:
#   - an index number (as an identifier), and
#   - a random integer between 0 and 100 (their 'contribution').
students = []
for i in range(25):
    students.append([i, random.randint(0, 100)])

# Print the students, numbered 0 to 24 (Python starts list indexes at zero).
for student in students:
    print('Student %s: %s' % (student[0], student[1]))

# Create 10 projects, in which the students are randomly sorted into groups of 5.
projects = []
for i in range(10):
    projStudents = list(students)
    random.shuffle(projStudents)
    # Each project generates a list of groups with the scores they receive.
    grouplist = []
    while projStudents:
        # Pop 5 students off the shuffled list to form one group.
        group = [projStudents.pop() for _ in range(5)]
        # The group's grade is the average contribution of its members.
        grade = sum(member[1] for member in group) / 5
        # The grade is added to the group as the 6th value.
        group.append(grade)
        grouplist.append(group)
    # Add this grouplist to the projects list, and go to the next project.
    projects.append(grouplist)

print('\nFinal outcomes : \n')

for x, grouplist in enumerate(projects):
    # Print the projects, numbered 0 through 9.
    print('project %s:' % x)
    for group in grouplist:
        # The first 5 items are students; print only their index numbers.
        print([member[0] for member in group[:5]])
        print('Score = ' + str(group[5]))
    print('')[/code][/tab]
And here’s some sample output:
[tab][code]Student 0: 86
Student 1: 91
Student 2: 8
Student 3: 10
Student 4: 15
Student 5: 33
Student 6: 82
Student 7: 17
Student 8: 99
Student 9: 18
Student 10: 24
Student 11: 20
Student 12: 95
Student 13: 35
Student 14: 90
Student 15: 91
Student 16: 35
Student 17: 99
Student 18: 40
Student 19: 13
Student 20: 25
Student 21: 8
Student 22: 3
Student 23: 15
Student 24: 80

Final outcomes :

project 0:
[16, 8, 2, 7, 9]
Score = 35.4
[18, 14, 23, 6, 1]
Score = 63.6
[17, 15, 10, 24, 4]
Score = 61.8
[0, 19, 13, 12, 3]
Score = 47.8
[11, 5, 20, 21, 22]
Score = 17.8

project 1:
[21, 8, 13, 5, 3]
Score = 37.0
[16, 12, 6, 19, 17]
Score = 64.8
[0, 24, 1, 22, 15]
Score = 70.2
[10, 7, 11, 4, 9]
Score = 18.8
[18, 23, 2, 20, 14]
Score = 35.6

project 2:
[11, 2, 20, 18, 5]
Score = 25.2
[21, 14, 12, 10, 19]
Score = 46.0
[8, 24, 1, 6, 0]
Score = 87.6
[7, 22, 13, 23, 16]
Score = 21.0
[17, 9, 15, 4, 3]
Score = 46.6

project 3:
[11, 1, 20, 10, 2]
Score = 33.6
[14, 17, 22, 24, 12]
Score = 73.4
[0, 5, 13, 19, 23]
Score = 36.4
[16, 21, 15, 8, 7]
Score = 50.0
[18, 4, 6, 3, 9]
Score = 33.0

project 4:
[1, 0, 24, 21, 22]
Score = 53.6
[2, 8, 16, 18, 23]
Score = 39.4
[12, 6, 7, 9, 17]
Score = 62.2
[11, 15, 3, 10, 5]
Score = 35.6
[4, 19, 13, 14, 20]
Score = 35.6

project 5:
[11, 15, 13, 7, 10]
Score = 37.4
[16, 5, 9, 4, 12]
Score = 39.2
[2, 18, 22, 3, 0]
Score = 29.4
[24, 1, 20, 21, 8]
Score = 60.6
[14, 19, 23, 17, 6]
Score = 59.8

project 6:
[23, 12, 3, 6, 20]
Score = 45.4
[14, 15, 2, 24, 16]
Score = 60.8
[19, 9, 22, 11, 8]
Score = 30.6
[21, 13, 7, 0, 18]
Score = 37.2
[4, 10, 17, 1, 5]
Score = 52.4

project 7:
[16, 5, 17, 13, 3]
Score = 42.4
[23, 6, 21, 2, 1]
Score = 40.8
[7, 20, 12, 8, 14]
Score = 65.2
[15, 0, 19, 11, 4]
Score = 45.0
[10, 18, 24, 9, 22]
Score = 33.0

project 8:
[15, 24, 23, 21, 5]
Score = 45.4
[18, 19, 4, 14, 16]
Score = 38.6
[13, 8, 1, 3, 17]
Score = 66.8
[0, 12, 7, 10, 9]
Score = 48.0
[6, 11, 20, 22, 2]
Score = 27.6

project 9:
[23, 13, 21, 4, 19]
Score = 17.2
[24, 9, 12, 10, 5]
Score = 50.0
[15, 22, 16, 8, 6]
Score = 62.0
[17, 11, 18, 3, 1]
Score = 52.0
[2, 14, 0, 20, 7]
Score = 45.2[/code][/tab]
This output lists 25 students (numbered 0 through 24) and their ‘contribution’ scores. Grades are just the average of the contribution scores (that’s a gross simplification, but useful for present purposes). Then it lists the 10 projects, numbered 0-9, showing the students in each group and then the group score (which should be the average of those students’ contribution scores).

It’s an open question whether this models actual group interaction, but my approach right now is to see how much I can determine from a simple model, and then add complexity and tweak the solution to account for it.

The usefulness of this simple model is that we know each student’s contribution score, so we can see how well we can approximate that score looking only at their group scores. If that’s possible, it should alleviate some of the complaints voiced here so far that looking at group scores alone is ‘vulgar’ or ‘unfair’. If we can approximate an individual character trait well while looking only at group work, we can fairly evaluate individuals without sacrificing pedagogy.
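One way to test this on generated data is ordinary least squares: each group score is one equation ("the average of these k contributions equals this score"), and we solve for the contributions that best fit all of them at once. This is a sketch, not the method used in the program above: it assumes the same averaging model, uses only the standard library (a small Gauss-Jordan solver on the normal equations), and the names `simulate`, `solve` and `estimate_contributions` are hypothetical. With noise-free averaged scores and groupings of full rank, it should recover the contributions up to rounding.

```python
import random


def simulate(n=25, k=5, p=10, seed=None):
    """Same model as the program above: group score = mean member contribution."""
    rng = random.Random(seed)
    contribution = [rng.randint(0, 100) for _ in range(n)]
    observations = []  # (list of member indices, group score)
    for _ in range(p):
        order = list(range(n))
        rng.shuffle(order)
        for start in range(0, n, k):
            members = order[start:start + k]
            observations.append((members, sum(contribution[m] for m in members) / k))
    return contribution, observations


def solve(matrix, rhs):
    """Gauss-Jordan elimination with partial pivoting for a square system."""
    size = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design is rank-deficient: not enough distinct groupings")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, size + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def estimate_contributions(n, observations):
    """Least-squares fit of individual contributions to averaged group scores.

    Builds the normal equations (A^T A) x = A^T b, where each row of A has
    1/len(group) in the columns of that group's members.
    """
    ata = [[0.0] * n for _ in range(n)]
    atb = [0.0] * n
    for members, score in observations:
        weight = 1.0 / len(members)
        for i in members:
            atb[i] += weight * score
            for j in members:
                ata[i][j] += weight * weight
    return solve(ata, atb)
```

For example, `truth, obs = simulate(seed=1)` followed by `estimate_contributions(25, obs)` should reproduce `truth` whenever the random groupings happen to be of full rank; adding noise to the group scores is where it gets interesting, since the fit then only approximates `truth`.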

Based on what?

But the best players are not found by looking at the team result, but rather by looking at individual performance within a set of games. Looking at team results is much too coarse.

That would invalidate a statistical approach since it introduces a bias.

How did you model student ability and results?

One would expect students’ ability in a subject to follow a normal distribution.

One would also expect the students to display an ability to work within a team … which I tend to think is also normally distributed.

How ability and teamwork ability are related - that I don’t know. :-k

Let me propose this scenario :
Student A individually gets 50% in a subject.
Student B individually gets 30% in a subject.

They do a project together and the mark is a 50%.

What does this mean for their individual performance within the group?

Did Student A do all the work?
Did Student A tutor Student B and now Student B has a 50% level of understanding?
Was there effective teamwork? Is 50% a sign of good or bad teamwork? Student B achieved better than expected but Student A only achieved the usual.

How much should the marks of Student A and Student B be adjusted?

Mostly anecdote. It seems that most of adult life is more like group projects than individual assignments, and that the skills group work builds are more important than those that individual work builds. It also seems that group work requires most if not all of the skills that individual work requires, while individual work does not require the skills that group work requires.

I admit it’s a weak basis, but I’m OK with that and still find the problem of allowing for mostly-groupwork schooling to be worth solving.

This must be in part because of the way sports tend to work. Even in Little League, teams are fixed and the same group plays against many opponents. If we had data on how teams perform when individuals are swapped around within them, we could use that data to evaluate the individuals, and I believe this is done in professional sports, where players are regularly traded. It should also be possible to judge individual contribution by looking at whole-team performance while a player is on the field vs. when that player is off the field, in games where players are rotated, like soccer or basketball.

It would have to be done carefully, but I don’t think by itself it would invalidate any approach. Fortunately, with the mocked-up data set and data set generator program, we can test the hypothesis by first using a fully random group selection process, and then comparing it against a partially random or rule-based selection process.

The model is very simple right now, it simply takes the average of the contribution scores for a group. In reality, people’s contribution will depend in part on who else is in the group. A very individually capable person may lose it if they’re in a group with a bully or a crush, etc. etc. But that’s just noise. Again, at some number p of group projects, we know we’ll get reliable individual data, we just need to find out if p is a number that could be accomplished in a reasonable amount of time.

I don’t know. How do Student A and Student B do on the next project? I don’t think one can pull out individual information from a single group data point. But let’s say we have 50 projects, and we know that A never gets less than 50%, and B never gets less than 30%, and when A works with any other student who sometimes gets less than 50%, the group gets 50%, and when B works with any other student who sometimes gets less than 30%, the group gets 30%. Then can we reach a conclusion about their individual contributions?

Your scenario does raise the interesting point that the outcome will depend on the statistical model we use, and that a solution will likely fit the data better or worse depending on the model that’s producing the data. However, this seems to be a general problem with evaluation, it’s just not as apparent when it’s individual work (indeed, it might be that group work compounds the problem with the added uncertainty of contribution).

If you insist on a brute force method, I suggest that you start with 4 people working in groups of 2 - that’s only 6 unique teams and 3 projects.

For example, students 1 to 4 have the following individual performance out of 100%
1 : 25%
2 : 40%
3 : 60%
4 : 75%

If the teams produce a result which is the average of the individual scores then these are the results:
1,2 : 65/2=32.5
1,3 : 85/2=42.5
1,4 : 100/2=50
2,3 : 100/2=50
2,4 : 115/2=57.5
3,4 : 135/2=67.5

Then if those results are averaged for all the groups of a particular student :
1 : (32.5+42.5+50)/3 = 41.7
2 : (32.5+50+57.5)/3 = 46.7
3 : (42.5+50+67.5)/3 = 53.3
4 : (50+57.5+67.5)/3 = 58.3
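The arithmetic above is easy to reproduce, and doing so shows how the per-student averages relate to the starting scores in this particular round-robin setup: each student is paired exactly once with each of the other three, so their average is (3s + (T - s)) / 6 = (2s + T) / 6, where s is their own score and T is the total of all four (200 here). The averages keep the students in the same order but squeeze the spread to a third, and s can be read back as 3 × average - T/2. A minimal sketch; the variable names are my own, and the closed form only holds for this every-pair-once design.

```python
from itertools import combinations

# Individual scores from the example above (students 1 to 4).
scores = {1: 25, 2: 40, 3: 60, 4: 75}
total = sum(scores.values())  # T = 200

# Every pair works together once; a team scores the mean of its two members.
teams = {pair: (scores[pair[0]] + scores[pair[1]]) / 2
         for pair in combinations(sorted(scores), 2)}

# Average of all team results that include a given student (three teams each).
averages = {}
for student in scores:
    results = [result for pair, result in teams.items() if student in pair]
    averages[student] = sum(results) / len(results)

for student, s in scores.items():
    predicted = (2 * s + total) / 6                # closed form for this setup
    recovered = 3 * averages[student] - total / 2  # inverting the closed form
    print(student, round(averages[student], 1), round(predicted, 1), round(recovered, 6))
```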

How does this relate to the individual performance that we started with??

Carleas, you can test your own methods by merely inventing some proposed actual performance measurements and then apply your proposed theory to reveal individual performance and see if you come up with the same performance numbers.

But basically, you are doing the original sin thing that has cost all of Mankind ALL of its troubles: Over extending your reach in an effort to remotely control too much. You want to use a few simple high order group numbers to judge and peg individuals associated with the group. That is no different than England wanting to control the Colonies and every other travesty of justice for thousands of years. It is sociopathic.

You can rely on the biases built into standardized tests alone or you can complement or replace this with the biases in this kind of group evaluation model.

And that shifts the burden completely over to the biases in tests, and then also in testing in general as evaluation. Testing does measure certain things, and different kinds of tests measure different kinds of things. But tests discriminate against those who do less well in testing situations, certainly in the very artificial, time-limited, quiet, sitting-in-rows kind of standardized testing (often multiple choice or fill-in) that is common and that teachers often need to use because of their own time constraints. I think the problems will still show up across a broader range of testing types (say, including take-home essays and hands-on testing), but they are very severe with the usual test types.

I think a shift toward an emphasis on doing, à la Dewey (apprenticeship approaches, real-life problems as the basis for PBL, and as much interest-centered learning as possible), is the direction I would like to see education move in. Also, I think the current pedagogies are poor training for participation in society because they do not resemble it, and even more so they are poor training for being a free, mature agent in a democracy. This is not only due to the evaluation process, but that is certainly a significant part of the problem. Once you are seen as a container that the authority figure must fill with certain things (that, yes, include doings and not just facts), you are not learning to be an agent. And by the time you get to college, where you potentially have more freedom of choice, your brain is pretty well used to what its role is. Plasticity has had its major stage.

In this case, we know we can solve for the students’ individual abilities. I’ll replace the numbers with variables for ease of manipulation.

1: w
2: x
3: y
4: z

1,2: (w+x)/2 = a
1,3: (w+y)/2 = b
1,4: (w+z)/2 = c
2,3: (x+y)/2 = d
2,4: (x+z)/2 = e
3,4: (y+z)/2 = f

w = a - e + c
x = e - c + a
y = f - c + b
z = c - b + f

These solutions aren’t unique, but you can verify them on the data you provided. They do assume that we know how the individual contribution scores relate to group score, but that is again a question of modelling, and it’s not unique to this group work case (we assume, for example, that SAT score bears a certain relation to individual academic ability).
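These formulas can be checked against the example scores from earlier in the thread (25, 40, 60 and 75) in a few lines. This sketch builds the team averages a to f from those scores, assuming (as in that example) that a team scores the mean of its two members, and plugs them back in. It also shows one alternative combination for w, since the solutions aren't unique.

```python
# Individual scores from the earlier example: students 1 to 4 are w, x, y, z.
w, x, y, z = 25, 40, 60, 75

# Team results, assuming a team scores the mean of its two members.
a = (w + x) / 2  # team 1,2
b = (w + y) / 2  # team 1,3
c = (w + z) / 2  # team 1,4
d = (x + y) / 2  # team 2,3
e = (x + z) / 2  # team 2,4
f = (y + z) / 2  # team 3,4

# Recover each individual score from the team results alone.
recovered = {
    "w": a - e + c,
    "x": e - c + a,
    "y": f - c + b,
    "z": c - b + f,
}
print(recovered)  # {'w': 25.0, 'x': 40.0, 'y': 60.0, 'z': 75.0}

# The solutions aren't unique: another combination also gives w.
print(a + b - d)  # 25.0
```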

As I show above, that’s possible given a relation of the group numbers to the individuals. That that is not a given is a problem that plagues all evaluation of individuals. However, given that such evaluation is ever possible, finding the relationship between individual input and group output is an empirical question.

This isn’t an issue of control, it’s a matter of using better educational techniques without sacrificing the ability to evaluate individuals. I’m not advocating for such evaluation either, I’m just recognizing that we do in fact already evaluate individuals, and that the only academic systems likely to be implemented in real life are ones that allow for the evaluation of individuals. If group work is a better form of education, it behooves us to find a way to evaluate individuals based on their group work.

The projects presented to groups will have similar problems with implicit biases as do standardized tests, and the format similarly benefits specific traits that we might not want to be testing for e.g. extroversion and physical attractiveness. And the two biases don’t seem opposed; there’s no reason to think that subjective evaluation will cancel out the bias in the project design. Anonymous grading removes a bias, and leaves other biases untouched.

Otherwise, I agree with your comments on doing, real-life problems, and training people to be members of society. Increasing the amount of self-directed group work is part of that.

I’d be interested to know your thoughts on evaluation. In an ideal world, we might prefer not to evaluate people at all, or at least not so reductively, but in practice reductive evaluation is needed to communicate ability and potential to people who might not be able to observe intangible qualities, or who might need to review so many applicants that a holistic evaluation would be impossible. In those cases, numeric evaluation seems necessary, even in an otherwise student-centered environment.

Yes, you can solve them with algebra as long as the individual contribution is constant and directly related to the group mark. As soon as the subjectivity of marking is taken into account, the equations won’t be consistent.

That is the other problem: you are assuming that individual contribution is constant in all the group projects, which is extremely unlikely.
You are also assuming that the relationship of individual contribution to group mark is linear (your current model/assumption), and that is unlikely too.
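The inconsistency is easy to see in the four-student example. That system is over-determined (six team results, four unknowns), so a score can be recovered by more than one formula, e.g. w = a - e + c or w = a + b - d. With exact averages both agree; if one team's mark is off by a couple of points (a hypothetical +2 on team 2,3 below), they no longer do, and some rule such as least squares has to decide between them. A minimal sketch; `team_results` and `two_estimates_of_w` are names made up for illustration.

```python
def team_results(w, x, y, z, marking_error=None):
    """Mean-of-members team marks, with optional per-team marking errors."""
    results = {
        "a": (w + x) / 2,  # team 1,2
        "b": (w + y) / 2,  # team 1,3
        "c": (w + z) / 2,  # team 1,4
        "d": (x + y) / 2,  # team 2,3
        "e": (x + z) / 2,  # team 2,4
        "f": (y + z) / 2,  # team 3,4
    }
    for team, error in (marking_error or {}).items():
        results[team] += error
    return results


def two_estimates_of_w(r):
    """Two algebraically equivalent ways to recover student 1's score."""
    return r["a"] - r["e"] + r["c"], r["a"] + r["b"] - r["d"]


exact = team_results(25, 40, 60, 75)
noisy = team_results(25, 40, 60, 75, marking_error={"d": 2})
print(two_estimates_of_w(exact))  # (25.0, 25.0)
print(two_estimates_of_w(noisy))  # (25.0, 23.0)
```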

But you are trying to derive an individual score from a set of group scores. That’s an extra set of mathematical manipulations compared to an individual taking an SAT and getting a score, and those manipulations are based on specific assumptions.
Whether the SAT means anything or not is another question.

It is NOT at all “possible” and in fact, QM proves it to be truly impossible (not that simple logic wouldn’t have been enough).

It is a con game used to manipulate. You just don’t see how.

It is exactly the same as someone saying, "My answer was 32, so what was the question?"

“Better techniques” will NOT answer the question. But the belief that you can use tricky techniques allows for manipulation of belief.

Mathematically it is merely an issue of having too many unknowns and not enough equations. You can’t actually solve for ANY variable until you have as many or more equations as unknowns.

It is excessively unwise and demonic to propose otherwise.

Going about it the WRONG DIRECTION does NOT improve the system, merely makes it even worse.

In the long run such blind remote control is proven to be the WRONG DIRECTION. Even IBM proved it with their Big Blue vs Baby Blue issue in the late 70’s. Localized information processing (aka “individual supervision”) is required for maximum performance of the overall system. But that means that the local supervisor personally knows how to evaluate what SHE is watching (and they NEVER do). You are proposing to give her a formula to make her even MORE blind and heartless … a dumbed down bureaucratic mechanistic drone (much like Obama’s socialistic healthcare insurance drones they call “physicians” or “anti-doctors”).

Be it as petty as it is, it is evilness in action.

But that’s just noise. Since we’ll need to be doing statistical inference from incomplete sets of group scores, the noise will just affect how many projects we’ll need to get a statistically significant result.

The equations I provide show only that if we know the group scores and the way individuals contribute to group scores, we can find individual scores. It’s thus possible. The quality of our theory about how individual scores contribute, and how consistent project difficulty and scoring are, are all questions, but the first is empirical and the second two are statistical. We can still get an answer within error bars, which is what we get with individual assessments (we just don’t acknowledge the error bars).

I’m making specific assumptions here, because of the theory we built into the model. Given a different theory, we could use different assumptions with the same result. However we set up individual ability to contribute to group score, with enough data points we can extract individual contribution to a statistically significant degree (if we have the right theory).

But if what we want to know is something closer to ‘individual contribution to a group’ than to ‘ability to take an LSAT’, a statistically derived result based on group scores can produce a more useful result.

Here’s a hypothetical to spell it out: let’s say we decided to grade people for how high they can reach (for whatever reason, we value this skill), and so we use individual height to approximate it. What I read you to be doing is to compare my proposal here to grading people based on the average height of groups of three students. With enough groups of three, we could get a statistically significant approximation of individual height, but it won’t be as accurate as grading individuals until we’ve got a complete set of group averages.

But what if we put the students in groups of three and tell them, as a team, to reach as high as they can? The strong ones can lift the light ones to reach higher. In practice, reach is measured better this way, because in practice people work in teams like this when they reach. If we then measure the height the group could reach, then even though our statistical approximation isn’t very good at figuring out height, it’s better at measuring the complementary skills that add up to reach in practice. It’s thus still a better metric, even with the noise.

This is a silly hypothetical, but I hope it makes my point clear: Even if we can’t measure individual contribution precisely based on group scores, if group scores are closer to what we actually value, the imprecise measure may nevertheless be a better metric.

You can approximate the variable, though, using the unwise and demonic tools of statistics. And again, in practice we don’t have great information about how e.g. SAT scores relate to human worth, so there are always a lot of unknowns, and we’re always just approximating.

You don’t know what is signal and what is noise because you don’t have enough information about the nature of the relationships. You haven’t studied the most critical aspect - how does teamwork alter performance?

You don’t have a theory.

When I asked how ability and teamwork ability affect the group score, you said that you did not know.

You produced a program that arbitrarily assumes a fixed contribution value for each individual in each project. You have no grounds for doing that.

You seem to think that if you get enough data, you will be able to work backwards and calculate how individual contributions produce a group result. The relationships are almost certainly too complex to do that. You’re just going to end up with lots of data. You need to start at a more fundamental level, build a theory and test the theory. You need to look at the performance of teams first, and only after you understand it can you calculate backwards to get individual scores from a group score. (Although the problem may still be unsolvable, you should have a good idea about why it is unsolvable.)

Oh I see. So with current information and since everything is just a guess anyway, my best guess is that it is better to just kill off ALL Jews and Blacks, therefore…

Gyahd… :icon-rolleyes:

Sometimes I have to think that Homo sapiens is just too fucking dumb to be a species.

One can picture it already…

Two students work on a project and it gets a grade of 75%.

“We have analyzed the data and 65% fits our models better. The other 10% must have been noise. The correct grade is 65%”

“We have also analyzed all your projects and determined what must have been your contributions. Your contribution merits a 63% and yours merits a 68%”

How do you know that the project grade was inappropriate without looking into the details of this project?
How do you know how much effort each of us put into this particular project without looking into the details of this project?

“We have lots of data. We have a model. Software does not lie. You have been assigned your correct grades.”
:open_mouth: :astonished:

:smiley:

Trying to add resolution to an existing picture only helps you to see what you were expecting or desiring to see. You connect the dots and fill in between the lines with shades of preference.

There was a lot of detail present during the creation of the project but most of it was not recorded. The final mark is a gross sum of the effort. Now he wants to use it to rebuild the details. :confusion-shrug:

Exactly, like getting a thumbnail picture of something that you liked and trying to expand it into a full-size pristine photo (or trying to modify CCTV cam shots to show fine details … even if they are often false).

“It doesn’t matter if every convicted person was guilty as long as the probability and majority were guilty.”

Different biases, not the same ones. My point was that thinking that testing somehow removes bias is not correct. You raised the issue of biases. I pointed out that there are biases with testing.

Extroversion, I think, could be worked against by making the groups task-oriented and through teacher intervention. Not eliminated, but worked against. Of course someone could argue that these same traits match biases in the work world, so the evaluations will be good markers for future success and also attractiveness to employers. I wouldn’t want to run that line, but I think then we need to eliminate the conception of education as the production of good worker bees.

I think anonymous grading of tests is fine. I am not arguing against it. It just does not deal with the biases of testing as a whole.

Well, in a society that is rushing out of control into the future, we do have little time to evaluate, and so we measure. I can’t be critical of one piece of the whole without being critical of the whole. I think extending apprenticeships more generally into society would be a good thing. And I do not mean following the old guild model. I think students should be able to apprentice without committing, certainly when they are young. I live somewhere where everything is education/‘merit’ based. All security guards have gone through exhaustive educations. Even cleaners and people in other basic-skill jobs are getting educations. There is little sense that most jobs can be learned by smart enough people and that most of the main learning will take place on the job. I would like to see the opposite trend, where character, determination, and a personal sense of connection form the core for someone getting into different kinds of internship/apprentice situations that lead to long-term placements. Some professions need real focus on very specific skills, and there I can see dealing with grades in number form more. I think engineers could be given problem-based (real-life scenario) tests and get grades on their math etc., and the numbers would be valuable. But how well one does in many jobs is just not going to be reflected by how well you dealt with sitting in rows, cramming for tests, and being, in general, a passive intaker of knowledge. I believe there have been some studies showing that determination, the ability to deal with frustration and obstacles, and passion are much better markers for success in most fields. The gifted often end up not shining. And schooling trains a certain kind of relationship to learning and dealing with obstacles/problems that is not very lifelike. At the very least, I think evaluations should be there in addition.
And really it is not so hard to dip into even a rather big pile of evaluations of a person and learn much more about them in not much more time. Some businesses actually put people in group problem solving situations with other candidates and watch, and these companies are aware that introverts can be valuable members of a group, so they are not ‘fooled’ by the flash of the extroverts. Teachers and students could be taught this also. I mean, think about it.

If we are in school to learn and we just assume that students cannot learn about biases based on physical attractiveness and extroversion, then we have cut the world up, very oddly, into what is prioritized to be learned (facts and formulas) and wisdom, which is considered off the table. It might be a good idea to work on teaching wisdom and slow the whole thing down.

And by the way, I am not saying that we should have a kind of Big Brother education where we train people not to have biases, but there are ways to keep the light on such issues without jamming conclusions into people’s brains, such that people can come to see things in a more complex way.

Tests, on the other hand, treat everyone AS IF the best knower of history, or the best user of English or of math, fits the precise same ideal: they should be able to answer more questions correctly than the others.

In a workplace or group of any kind, very different kinds of learners, with very different kinds of skills and personalities can contribute and complexly formed groups do better than groups with, say, all extroverts. If testing is the measure, yes, introverts will not be biased against, though perhaps extroverts will. Some of them at least will tend to make more detail errors, on the other hand they may set the ball rolling and help groups towards solutions faster. This may not be the case about extroverts, but there are going to be character types who will do less well on tests, but add at least as much to professional or educational groups. And they may well flourish in apprentice type situations.

Some people like to deal with things as abstracted-out problems. Others cannot invest energy and focus until it is part of a project that is real. A lot of the people who get labelled ADHD do better when physical action is part of the learning and when actual projects are taking place. Tests, which they consider more boring than other students do, are severely biased against them.

I’ve gone a bit random and wandered.

Why am I jumping into this OP? … don’t know … feels like I’m going out on a limb here.

In another forum I suggested that intuition has a role to play in the learning process … seems to me our education system … and culture … suppresses our in born intuition capabilities.

Babies have intuition … I often make eye contact with babies/toddlers(strangers) … sometimes I get a smile … sometimes a frown … sometimes a scream … and so on. Seems to me there is an intuitive assessment of compatibility from even brief eye contact … without any verbal communication.

How about introducing 3 projects to a class … try to elicit some discussion of what is expected … and then allow the children to decide which project they want to join … by secret ballot … to avoid the natural group associations already existing in the class.

May take a couple of iterations to work out the wrinkles … results may be surprising … and then again maybe not so much.

There are at least two ways the idea could go. One is to continue to insist that the best way to test individual ability is with tests like the SAT and average individual-project scores (GPA). In that case, the best we could do is to correlate group performance with those individual indices. I suspect there would be a strong correlation, but there needn’t be. This is an empirical question, and I don’t have the data.

Another, the one I’d prefer, is to say that the SAT and individual-project scores aren’t especially relevant to what we’re really interested in, that in fact teamwork is performance, and so measuring how an individual affects group score is measuring the relevant individual ability directly, rather than by proxy. Then it is a statistical question to find out what an individual contributes to a group.

If it’s a given that we’ve gotten “enough” data, then of course we can calculate individual contribution to a group with high confidence (unless we’re assuming a degree of agnosticism about how evaluation relates to ability that would undermine any form of evaluation, including SATs and average individual-project scores). Even if the relationship is very complex, with enough data points we can find a model that fits. Each data point increases the confidence.

You are right that I have no idea what “enough” data is, though. But I think that if we take sample data generated by a simple model and see how reliably we can reverse-engineer the model, we can get an idea. I think we agree that the actual relationship between individual contribution and group outcome could be complex enough that it would be wildly impractical to get enough data for a reasonable approximation. My contention, and where we apparently disagree, is that there are some possible relationships between individual contribution and group outcome for which we could get enough data with relatively few data points (for example, the relationship I used in the program to generate the sample data).
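Along those lines, here is one possible experiment, sketched under stated assumptions: contributions are fixed, a group's score is the mean contribution plus Gaussian marking noise (standard deviation `noise_sd`, an invented parameter), and the estimate is simply each student's average group score. The Pearson correlation between that estimate and the true contributions then gives a rough read on how much signal survives as the number of projects p grows. All names and parameter values are hypothetical, and no particular results are claimed here.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def run_trial(n=25, k=5, p=10, noise_sd=10.0, seed=None):
    """Correlation between true contribution and mean group score after p projects."""
    rng = random.Random(seed)
    contribution = [rng.randint(0, 100) for _ in range(n)]
    totals = [0.0] * n  # running sum of the group scores each student received
    for _ in range(p):
        order = list(range(n))
        rng.shuffle(order)
        for start in range(0, n, k):
            members = order[start:start + k]
            score = sum(contribution[m] for m in members) / k + rng.gauss(0, noise_sd)
            for m in members:
                totals[m] += score
    estimate = [t / p for t in totals]
    return pearson(contribution, estimate)


if __name__ == "__main__":
    for p in (1, 3, 10, 30, 100):
        # Average over several simulated classes to smooth out luck.
        rs = [run_trial(p=p, seed=trial) for trial in range(20)]
        print("p = %3d: mean correlation %.2f" % (p, sum(rs) / len(rs)))
```

The same harness can be rerun with a different scoring rule inside `run_trial` (for example, contributions that vary from project to project) to see how quickly the correlation climbs under a messier model.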

And, I think the data required to establish the model and the data required to actually evaluate students should be treated differently. It could be the case that we need thousands of results to get a good idea of how individual contribution relates to group outcome, but that once we have that relationship, we can plug in a much smaller data set to get reliable scores for a group of students.

I worked in an academic admissions office that got close to 40,000 applications a year for a few hundred spots, and I can tell you that at that point, the admissions process will use any number it has to pare down the pile. We were concerned to do as holistic a review as we could, but with that many applications it’s just not possible, and GPA and test scores would effectively serve as a blunt filter to reduce the applicant pool by more than half.

And, it is my impression that the connection between admissions criteria and actual outcomes is intuition driven and not at all empirical. I would guess that companies that you mention who have moved to group- and problem-solving interviews are actually using empirical findings to improve their hiring processes, since the companies that I am aware of that do this are companies that are willing to buck the status quo to improve outcomes.

Unfortunately, there’s a lot of interest in not discovering whether or not e.g. the SAT correlates with future success, not to mention that much of the data necessary to really drill down into what predicts life outcomes is either privately owned or protected by regulations that prevent such analysis. It does seem like that’s an area begging for more research (though I am only passively watching for such research; there is likely more out there than I’m aware of).