TSDI - Unit 4 - Expert Panel #3

Video Transcript

I’m here with Dr. Susan Friel again. Earlier you heard

her mention some of her favorite data sets, and we’re going to actually talk about some of those

today. So, I know that in a lot of your work you use a software called TinkerPlots and we’re

also giving our participants an option of using that software as well in the course, and it’s a

popular one used in middle school settings. So, we’re going to talk about one of your favorite

data sets, it’s a cat data set that has, I think, 200 cases in it.

This one does, yeah.

This one has 200 cats in this data set. So, tell me a little bit about the data set, kind of where it

came from and what some of the attributes are in it.

Okay. Way back when, quite a few years ago when I first got into doing statistics for the

schools, I worked on an elementary project called Used Numbers, and the co-PI who worked with

me was a lover of cats. So one of the data sets we put in the 5th grade curriculum, 5th grade

curriculum I think, was a data set of 27 cats, and it was data from their friends and from

themselves in terms of what they were using, and actually when you…

That’s a lot of cats for a few people.

Yeah, like their friends too. It actually is still on TinkerPlots, that 27…

Yeah, it comes with it, it is a sample data set, yeah.

That 27, it comes with that data, yeah right, and so, whenever I’ve done workshops with teachers

around statistics; we had a big project in North Carolina called Teach Stat many years ago where

we did two weeks of professional development. One of the things we did was get a cat data base

and then we would use that to motivate statistics, so, what you see on my screen is a picture of a

data card. Every time I do this I make some data cards to go with a group of teachers I’m

working with, and you see a cat named Sasha, and Sasha was my cat and she, at the time, was a

female, was 13 years old, weighed 11 pounds, body length was 19”, tail length was 11”, her fur

color was white, ginger, and black, her eye color was yellow, although you can’t see it, and her

pad color was pink and black, and then she’s Ginger’s sister who also lives with Annie, and

Ginger and Annie were my two other cats at that time.

And so, first of all, you see I have a bunch of attributes. So when we think about a data set we

talk about, “Well, what are the attributes that we can collect data about?” And then, you can go

home and do this measurement, indeed, we have a little diagram that teaches kids how to go

home so that they can actually weigh a cat, and I don’t know if you know how to weigh a cat, but

the way you do that…

 Teaching Statistics Through Data Investigations Page 1

Without getting scratched.

Well, the way you do it is you step on the scale and you weigh yourself and then you step on the

scale with the cat in your arms, and hope that it doesn’t wiggle because it will wreck the weight

if it does. And then, or you can do it first, weigh the cat and then weigh yourself afterwards,

that’s how you get the pounds. The body length is measured from the tip of the nose to the base

of tail over the head, and the tail length, obviously, is from the tip of the tail attached to the body

to its end, and fur color you have that, so there’s numerical data we have, and then there’s what

we call categorical data that we have about each cat which makes it very nice.
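
The weighing method described above comes down to a single subtraction. A minimal sketch in Python, assuming a bathroom scale that reads in pounds; the function name and the readings are hypothetical and are not taken from the cat data set:

```python
def cat_weight(person_with_cat_lb: float, person_alone_lb: float) -> float:
    """Return the cat's weight as the difference of two scale readings."""
    # Round to tenths, since a household digital scale typically reports tenths.
    return round(person_with_cat_lb - person_alone_lb, 1)


# Hypothetical readings for illustration only.
print(cat_weight(161.0, 150.0))
```

The order of the two weighings does not matter; only the difference between the readings is used.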

Yeah, yeah. So with the pounds and the inches, when these measurements were taken or

students are going home and measuring on their own cats, are they directed to how they’re

supposed to round as far as pounds or as far as inches, is there…are there any other…are there

any particular measurement issues that might come up.

It could, typically, they’re using a scale in their house, so they could get tenths.

Yeah, it depends on the brand of the scale, if it’s digital.

Yeah, and so…and actually, I may have some that have like 7.5 or something like that in terms

of weight or…and some of them will measure 11 ½ inches or something like that, typically,

they’re measuring pretty quickly so they’re not often…go to…but some kids are occupied with

the parts, right.

Yeah, hope that that cat stays still so it won’t…

But it’s very simple tenths of an inch or something, we don’t…we have never used metric, we

always used the English system because of the weight and everything that we were doing. And why

I like this data set is you can use it to work with kids, even just the 27 cats, and if the kids want

to they can add their own data, which then helps them make it their own as well when you’re

doing it.

Yeah, that’s nice…that’s nice. All right, so let’s open up TinkerPlots, I see you getting this

open and tell us how the…like Sasha’s card or a card from a different cat would be represented

over here.

I’m not sure where Sasha is, but I’m pretty sure she’s in this set. I have used a lot of different

software and I…excuse me…I worked with Cliff Konold who developed TinkerPlots, he had

the middle school projects working with him, so I was quite actively giving him feedback. I

think this is the most exciting piece of software I’ve ever seen. And I’ve used it with kids, and

it’s very engaging. This is a pretty large data set, this is 200 cats, I probably would start with 27

with kids because they’re going to explore it first, but you’ll notice on the left side, you’ll see

that you see the same attributes listed; name, gender, age, weight, body length, tail length, eye

color, and pad color. And notice for this cat, Speedy, it belonged to somebody, I’m not sure

because I collect these from my teachers when I work with them, and at the time, you’ll see that

the unit of measure for age was years and they reported that he was .17 years old, which I don’t

know how many months that would be, but that’s probably how they got it.
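
The age of .17 years can be converted to months with one multiplication. A small sketch, assuming the owner reported a whole number of months and divided by 12; the function name is illustrative:

```python
def years_to_months(age_years: float) -> float:
    """Convert an age in years to an age in months."""
    return age_years * 12


# 0.17 years is about 2 months, and 2 / 12 rounds back to 0.17.
print(round(years_to_months(0.17)))
print(round(2 / 12, 2))
```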

Yeah, yeah.

This is one with weight where they had 3.5.

3.5, yeah.

Yes, body length is 15”, and tail length is 8”. And so, we have this array of data in here. With

the one with 27, you can do the same things I’m going to do with you, and it might be easier with

them too, so, part of it is the attributes signify with their coloring the kind of variable you’re

dealing with. So if I click on gender, what do you notice about the color of the icons?

Yeah, we only have two represented.

Two represented, and what’s neat about these is they move around the screen, if I do this little

circle down here, mix up…and what I should do is put my sound up.

So each of these dots is a case.

So let me…I’ll click on a dot here and watch the table over here. That’s Trooper, and here’s

Lucy, and here’s Pink Lady…so each of those is a case, and I also have a spreadsheet-like

table I can have open which will be linked, so there’s three things that link together when you’re

doing it.

Yeah, yeah, and I think it would be very important when students are first starting to use this

software, to make sure that they understand the links between all these different

representations.

Well, with the 27 cats, to be honest with you, because I still love the cards, you saw the cat card

with Sasha, I have a set of 30 cat cards that go with this data set, but with the 27 we actually have

all 27 of the cats as cards, as well, and typically, what we do with kids is we explore the data

cards first, then we come back and we look at the computer and we talk about how the data cards

actually connect with the representation on the computer, they don’t have much struggle with

this in terms of working through it so.

Oh yeah, yes…yes, I have worked with students and teachers with this software and it is

extremely easy to use.

Well, there’s this little thing down here that lets you mix up the icons and you hear the sound, I

once worked with a group of kids who really got into just doing this, they liked the sound and

seeing it mixed up first so.

And as a teacher you learn how to turn that off.

Right, you can just turn your sound off and you won’t hear it anymore. So, one of the first things

you can do on this, is you can separate the data and you just click on an icon and you drag it, so

you’ll notice that all the purple ones went to one section and all the green and yellow ones went

to the other section and lo and behold, one group is male, one group is female, and if we want,

we can stack them. So you see them stacked, looks like it might be 100 male and 100 female.

It sure looks that way.

But we can actually go up here and turn on, no, that’s not what I want to do, I want to label…

We want the N?

I want the N.

There we go.

Yes, there are 100 of each so, and this is sort of a fixed database.

Yeah, there were purposely chosen 100 males and 100 females.

And if you wanted to actually see the key to this, you can even put this little key up and it tells

you the colors that you’re working with, so it’s very nice.

That’s nice. So, I think, it’s interesting that at times when you’re first getting students to kind of

think about data, there’s this playful kind of period, and what I see you doing, I mean you’re

telling us a lot about the software, but you’re also just starting to play with different ways of kind

of organizing it, and that seems like it might be an interesting and important thing for students to

do.

Yeah, it is, it’s very nice. So you won’t get in the formal histograms or things like that on here,

but you can do box plots and a few other things, so, if I turn off the N and I mix everything up,

notice what happens when I click on the next variable, age.

Yeah, I’ve got a gradient.

Yeah, you get a gradient.

From like a white to dark red, or pink, it’s hard to tell.

Right, and actually if we turn on the key, you can see that it actually goes from .17 years old to

18 years old so that’s actually the ages that we’re talking about. Now again, I can mix them up.

So these are just randomized placement on the screen to show that there’s no order.

Right, and if I click on this one that’s Diva, and she’s 3.5, she’s 3.5 years old, and so that tells

you. And if I click on a dark one, there’s Bart and he’s 16 years old, so the darker colors are the

older ages. Now again, I can drag, now I can start to drag and you’ll notice that it starts to

section the data. I’m a big one for just dragging all the way. And so, you see it shows from 0-18

as the years but it’s all spaced out above, and of course, we want to stack it. So I just stack it like

that and sometimes I spread it out a little bit, I can go down here and make my icons a little

smaller, but some people get bothered by how they sort of lump over each other, but I don’t, it

sort of gives you the idea of where they are.

Right, right.

So this is actually the distribution of the ages of the cats that are in this database, and it goes all

the way to 18 years. And you’ll notice that I have a lot of young kittens, don’t I? A lot of young

cats.

Yeah.

But I don’t have a lot of data on old cats.

Yeah.

Cats actually live a pretty long time now, but these people, that’s what they had.

Right, right, and that seems like an interesting thing to point out, that this data set isn’t

necessarily a random sample of cats that we can then make any type of inference about all cats,

but this is just a collection of cats that we happen to be studying.

Right, it’s a convenience sample, like from people in the group.

Yeah, so we couldn’t infer necessarily from this data set that cats don’t live a long time.

Right, now one of the questions if you look at weight, so this naturally lends itself to some

comparative questions very quickly. So if you look at weight, you might wonder if the older cats

weigh more, right? So what I can do is, now I’m going to click on weight and you’ll notice that

the icons change color. So that’s showing me the weights, but it’s pretty hard to sort of tell by

just looking at the colors, you can see a lot of the light colors which is the low weights are down

here, but you start to see the…

All the orangey’s are kind of spread out in there.

Spread out, yeah.

It doesn’t look like a clear pattern to me.

No, and so what I absolutely love about this software is you can start to drag up so if I just

start…just click an icon and start to drag up.

Because you were thinking you want to see this in a two-dimensional.

Right, and I can actually drag it further if I want to. So we start to get even smaller intervals, so

you see…notice these cats here, and here’s one that’s very small at 6 years old. They’re only

between 0 and 3.99 pounds. And then we start to get up here, so our question was, “Do older

cats weigh more?” So the question is, “What do you think?” because here’s our weights and

here’s our ages, and so, you start to see the heavier weights are up here and the ages are over

here, we could actually put in a little reference line if we wanted to, and just say, “Well,

here’s…well, let’s just say 12 pounds and more is heavier”. So here’s where 12 pounds and

more is. No, I’m sorry, ages…

Ages.

Ages 12 are older, and here’s the weights, and you notice that, actually, the weights are variable

about the cats.

Right. Right, you have several of the different weight categories represented.

Right, so you can actually look at that.

Right, and what I find…interesting about what I just saw you do, and I have seen children do

this, as well as, teachers who are learning to play with data sets, is you start looking at one

variable and all of a sudden you have a question particularly about that variable, and then you

wonder how it might be related to another variable, and I think that’s a beauty of having data sets

that have multiple variables in it that you can explore.

Right, and I will tell you as teachers, this data set naturally leads to what we call co-variation,

which is how does one data vary with the other, and I actually can turn off this reference line and

pull this fully all the way through, and you see a kind of scatter plot, not a well related scatter

plot, the data’s all over the place and so it’s not fitting in a line or anything, and there’s no way

to draw a line of best fit here, but I can go all the way and look at the data in relation, age and

weight, and how does it…it’s very scattered so there doesn’t appear to be any relationship to it.

Yeah, yeah, and it’s just as important to give students opportunities to see data sets where there is

no relationship as where there is a relationship.

Right. And so, part of it we could check, I haven’t thought about this, but I haven’t looked at

this in a while, but we’ll mix up…here’s body length, and I’ll stack it, and then let’s just look at

tail length.

So, do long bodied cats also have long tails?

Right.

Perhaps.

Doesn’t look like it….maybe if I pull it down a little bit, then we can see the categories. Again,

here you’ve got longer cats. Well, you’ve got, well, I think it’s just…isn’t that interesting, yeah.

Because the longer bodies, so let’s say, you know, 24, there aren’t any…there aren’t any 24” cats

that have…

Tail lengths this length.

Yeah, they’re all 8 and above.

Right. And these little cats, they of course, couldn’t have that, the long lengths because there is a

proportion, like cats actually have a…once they get to a certain point they can be big, but their

tail lengths stop getting big-big.

And one of the things that we want students to do, is to be able to informally kind of describe

these relationships, that we don’t have to be rushing to formally describing everything with a

linear regression or with correlation and things like that.

Right.

We need these informal experiences to be able to just think about ‘what does it mean for two

variables to be related to each other?’

Right, and as I said, kids naturally ask questions about relationships with this data set, so this is

the data set for it. Now suppose that you wanna just look at weight, and we’ll just…let me get

weight again.

Yeah, I see…are you going to look at this last…this question?

Typical, yeah.

Yeah, yeah, so the question that she’s thinking about is, “What’s a typical weight for cats, and do

males tend to be longer or way longer…is it?”

Weigh more.

Weigh more.

Tend to be.

Be longer than females, but we could also talk about weight, “Do males tend to weigh more than

females?”

Right, so here’s a distribution we might talk about typical, and kids actually look at clumps of

data, so they would look at the middle and say, “Well, there’s a lot of cats between 8 and 12 lbs”.

Yep, I would have claimed that would be a nice middle too.

Right, but you can get…this software will mark the median, so it shows that 9 is the median, if

you want the numerical value of the median, you could show it, so it will show you the median

or I can turn the numerical value off, and then the little arrow marks the mean, which actually

turns out to be…

About the same.

Yeah, it’s 9…I’m going to turn the median off for a minute, it’s 9.0475, so that’s really

interesting, the mean and the median are almost identical in this other data which is not always

that common.
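
The two markers discussed here, the median and the mean, can be computed with Python's standard `statistics` module. The weights below are hypothetical stand-ins chosen for illustration, not the 200-cat data set, so the values differ from those on screen:

```python
from statistics import mean, median

# Hypothetical cat weights in pounds, for illustration only.
weights = [3.5, 7.5, 8, 9, 9, 10, 11, 12, 18]

print("median:", median(weights))          # the middle value of the sorted weights
print("mean:", round(mean(weights), 2))    # the balance point of the weights
```

In this made-up sample the one 18-pound cat pulls the mean above the median, which is the more common situation; a mean and median that nearly coincide, as in the transcript, suggest a roughly symmetric distribution.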

Right, right.

So the software lets you mark those and so you can mark the mode too, but I generally don’t

spend much time with the mode, but male female, let’s go back up here to gender. So again, we

can do this separate thing and separate them into two groups; so, here’s the males up here and

here’s the females up here. Here’s your median and your mean for your females, and here’s your

mean a little bit lower than the median for the males, between 10 and 11 pounds here, and this is

8 and 9 pounds. So what do you think, do females weigh less than males?

It seems like, in general, there tends to be that the females are slightly less. I mean, we certainly

see similar ranges. I mean, they’re…both the males and females have variability in their data

set, so…which would be expected since we knew the ages of our cats, so one of the things we’re

not looking for here is the age of the cats, we’re not comparing older cats only, but the male cats

tend to have a little bit higher weights.

Right, if you want to put on age for the color, you can see that.

Yeah, this is one of my favorite parts. This is one of my favorite parts of the software.

You can’t do a three-dimensional on this, but you can see weight and gender and age, and you

can see the younger cats at least have the lower weight in both categories.

Absolutely, so right now, we’re looking at three different variables in this data set, and I’m here

to tell you that middle school students are very capable of coordinating their reasoning with three

variables.

And look at the dark red, and those…watch the ones that are…

The dark red are the older ones? Yes, they are the older ones.

The older ones, right, right, and so they’re spread out, I mean, you do…older cats do have a

tendency to weigh less, but then you can have bigger cats that weigh more too.

Right, right, I had a male cat that was 20 lbs.

I have one right now that’s 18.

Yeah, yeah.

And so they weigh a lot, and so if we turn that off, what did we have? Do male cats tend to be

longer than females? So what we could do is body length.

Right, so we can leave the male.

Oh, I’m sorry.

That’s okay.

I can undo that, you’ll need to go back.

Yeah, so we have males and females already separated, I’m thinking we could just substitute

body length for weight down here.

We could, we could just drag it down here, you’re right…we could just drag it over here.

And we see now, the body length, the distribution.

For the females, and for the males. Now here’s the median and mean for the males, and the

females, it looks like it’s pretty close.

I think too, I see a lot of overlap in those distributions, whereas, before I was seeing it a little bit

more separated.

And I think this data up here might actually pull the mean a little bit this way.

Oh absolutely, I would think that that would be.

Right, so the medians are really close, it’s like 18 and this one’s 19.

So, let’s hide that data case.

There we go, hide selected case.

There we go, so we have hidden…so that…if we take that case out, now we only have 199 cases

being shown in our graph here, and that lowered our mean and our median for the males. Yeah,

nice.
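
The effect of hiding one unusually long cat can be imitated outside the software. This is a sketch with hypothetical male body lengths in inches; the values and the variable names are assumptions, not the actual cases:

```python
from statistics import mean, median

# Hypothetical male body lengths in inches; 36 plays the role of the unusual case.
male_lengths = [15, 17, 18, 19, 20, 21, 22, 36]

# "Hide" the largest case by leaving it out of the list that is summarized.
shown = sorted(male_lengths)[:-1]

print("all cases:   mean", mean(male_lengths), "median", median(male_lengths))
print("case hidden: mean", round(mean(shown), 2), "median", median(shown))
```

With these made-up numbers the mean drops by about two inches while the median drops by only half an inch, which illustrates why a single extreme case pulls the mean more than the median.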

So, this does a nice job of letting you explore some of the concepts. If you wanna see what a box

plot looks like, we can actually do this right now, I’ll show you how a box plot works on this. I

don’t want to do that, sorry.

Yeah, there we go, box plot.

And so, there’s a box plot, if you remember what they are, I’m going to shrink the icons so you

can’t see them.

Make them just a little…there you go.

Okay, so you can.

I actually like to see both.

Right, and what’s nice with a box plot, if you remember, it’s possible to do outliers, I don’t know

if there are outliers here or not….oop, there are.

Yes.

So, here we go with our box plot, the middle 50%, the interquartile range tells us where 50% of

the data are, and 50%, and kids will zero in on that as a way of describing it, and then you’ll

notice that some of these upper values both in body length are unusual when compared to the rest

of the data when using the guidelines for how to determine outliers when you do box plots,

there’s some unusual values there. The box plots themselves, the interquartile ranges are very

similar and the males are shifted only slightly to the left. So what was the other one we were

looking at, we were looking at, was it weight we were after?
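
The guideline for outliers mentioned here is commonly taken to be the 1.5 × IQR rule. The sketch below assumes that rule and uses hypothetical body lengths; the software may compute quartiles slightly differently, so treat this as an illustration rather than a reproduction of its box plot:

```python
from statistics import quantiles

# Hypothetical body lengths in inches, for illustration only.
lengths = [14, 15, 16, 17, 18, 18, 19, 19, 20, 21, 22, 30]

# Quartiles; the "inclusive" method treats the data as the whole population.
q1, q2, q3 = quantiles(lengths, n=4, method="inclusive")
iqr = q3 - q1  # width of the box, which holds the middle 50% of the data

# Assumed convention: values beyond 1.5 * IQR from the box are flagged.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in lengths if x < low_fence or x > high_fence]

print("box from", q1, "to", q3, "with median", q2)
print("outliers:", outliers)
```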

Yes, we were looking at weight before.

Let’s bring that one back down. All right, so there’s the weight one, there are some outliers,

there’s a few little outliers over here, but notice there’s a bigger shift on the males, and notice the

line in the middle of the median, the median has shifted higher too. So box plots are very handy

for doing that kind of thing.

They really are…they really are. So at the end of an investigation like this, when students are

starting to make kind of their final claim or their interpretations of the data, trying to make

a…what their final result might be, what do you hope that they’re going to do when they’re in

that final stage of trying to explain what they found out?

Well, actually I’ve done it and kids have turned in their graphs and things like that with this, but

so…what’s a typical weight for cats? I’ve asked them to talk to me about what they think is a

typical weight and they might start off with all the cats, and then they’ll go…they might split it

up just this way. Kids do have a tendency to not want to choose single values. For, you know,

the mean and the median, but then we ask, if we had to pick one number to try to describe what

might be the typical weight, why might we pick the median, why might we pick the mean? And

so, the idea of a single number describing a set of data is important, but typically, they look

where the data have a tendency to congregate and they try to do that as well.

Which I think is actually very valuable because as we were talking about like weight times

earlier, that a single value you can always expect some error around that, and so, being able to

describe kind of a central tendency both with an interval of, ‘okay, this is where most of the data

is, I could describe typical as 8 to 9 or 8.5’.

Right. And the other measure, which this one…this one does a plot for, but it doesn’t give you

the measure, that mean absolute deviation that crops up in 6th grade now, I have come to

appreciate it as a measure. Because the kids really, it’s much easier to look at this…

It’s computational mean here.

Right, but it’s also easier and I don’t know what it is here, but it is easier to talk about a measure

of center and then say, ‘let the data vary around it’ in this way and that’s what the mean absolute

deviation does, so that idea of weight time in a line is saying that typically the data vary around

the mean as much as this, high or low, and that’s really what you do, and once you know that, as

well as, the measure of center you sort of have a good picture of your data set.
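
The mean absolute deviation described here is the average distance of the data values from their mean. A minimal sketch with hypothetical weights; the function name is illustrative:

```python
from statistics import mean


def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    center = mean(values)
    return mean([abs(v - center) for v in values])


# Hypothetical cat weights in pounds, for illustration only.
weights = [6, 8, 9, 9, 10, 12]

print("mean:", mean(weights))
print("MAD:", round(mean_absolute_deviation(weights), 2))
```

For these made-up weights the summary reads as "typically about 9 pounds, give or take a little over a pound," which pairs a measure of center with a measure of how the data vary around it.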

Yeah, yeah, that’s a nice way of being able to describe that distribution.

Right.

So in your final claims, you want them to be using appropriate measures, kind of maybe using a

couple of different things to support their reasoning and making sure that it’s highly connected to

the context that, you know, they don’t just say, “Well, the mean….the typical weight for cats is

5”, you know, that they give, “it’s 5 because”, and this is “that most cats weigh”, and kind of

making sure that they are connecting that with the context.

And then they might come back and say, “But if you look at the males and females, it looks like

the males weigh a little bit more than the females because this is what the mean is”. They might not

do box plots because they might not even know them at this point.

Right, of course.

But they might look at that and say, “Use that” and they might look at some other things too, so

there’s a richer discussion.

Yeah, yeah. This was fun looking at this data set, and hopefully, it’s inspired some of you to

think about how you might get kids engaged in exploring an interesting data set that you find or that

you create yourself.
