13.4.11: Evaluating Data as Sources
Evaluating data for relevance and credibility is just as important as evaluating any other source. And as with any other source, there is never a 100% perfect source of data. So just as is pointed out in Evaluating Sources, you’ll have to make educated guesses (inferences) about whether the data are good enough for your purpose.
Critical thinking as you evaluate sources is something your professors will expect. But you’ll benefit in other ways, too, because you’ll be practicing a skill necessary for the rest of your life, both in the workplace and in your personal life. It’s those skills that will keep you from being duped by fake news and taken advantage of by posts that are ignorant or, sometimes, simply scams.
To evaluate data, you’ll need to find out how the data were collected. If the data are in another source, such as a book, a web page, or a newspaper, magazine, or research journal article, evaluate that source in the usual way (see Evaluating Sources). If that source got the data from somewhere else, do the same evaluation of the source from which it got the data. The article, book, or web page should cite where the data came from. If it doesn’t, that is a black mark against using the data. (The data in a research journal article are often the work of the authors of the article. But you’ll want to be sure they provide information about how they collected the data.)
In addition, if the data are in a research journal article, read the entire article, including the section called Methodology, which tells how the data were collected. Then determine the data’s relevance to your research question by considering such questions as:
- Were the data collected recently enough?
- Are the data cross-sectional (based on information from people at one point in time) or longitudinal (based on information from the same people over time)? If one is more appropriate for your research question than the other, is there information that you can still logically infer from the data you have?
- Were the types of people from whom the data were collected the same type of people your research question addresses? The more representative the study’s sample is of the group your research question addresses, the more confident you can be in using the data to make your argument in your final product.
- Was the data analysis done at the right level for your research question? For instance, it may have been done at the individual, family, business, state, or zip code level. But if that doesn’t relate to your research question, can you still logically make inferences that will help your argument? Here’s an example: Imagine that your research question asks whether participation in high school sports in Columbus City Schools is positively associated with enrolling in college. But the data you are evaluating is analyzed at the state level. So you have data about the whole state of Ohio’s schools and not Columbus in particular. In this case, ask yourself whether there is still any inference you can make from the data.
Research articles are sometimes difficult to read until you get used to them. Here’s a helpful PDF: https://violentmetaphors.files.wordp...-article.pdf
To evaluate the credibility of the data in a research journal article you have already read, take the steps recommended in Evaluating Sources, plus consider these questions:
- Is the article in a peer-reviewed journal? (Look at the journal’s instructions for authors, which are often located on the journal’s website, to see if it talks about peers reviewing the article and asking for changes [revisions] before publishing.) If it is a peer-reviewed journal, consider that a plus for the article’s credibility. Being peer reviewed doesn’t mean the article is perfect, just that it is more likely to be credible.
- Do the authors discuss causation or correlation? Be wary of claims of causation; it is very difficult to determine a causal effect. While research studies often find relationships (correlations) between variables in the data, correlation does not equal causation. For instance, let’s return to our example above: If the study of Ohio high school students’ sports participation showed a positive correlation between sports participation and college enrollment, the researcher cannot say that participation caused college enrollment. A study designed to show cause and effect would not simply have reported a correlation. Instead, it would have had to be designed as an experiment or quasi-experiment, would have used different statistical analyses, and would have supported or not supported its hypotheses.
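To make the idea of correlation concrete, here is a minimal Python sketch of how a correlation coefficient (Pearson's r) can be computed. The function name `pearson_r` and the five pairs of numbers are invented for illustration and are not real data about Ohio schools. Note what the result does and does not say: it measures how closely two sets of numbers move together, and nothing in the calculation can show that one caused the other.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How much the two variables vary together.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("correlation is undefined when a variable never varies")
    return co_variation / sqrt(variation_x * variation_y)

# Hypothetical, made-up percentages for five schools (NOT real data).
sports_participation_pct = [10, 20, 30, 40, 50]
college_enrollment_pct = [42, 48, 55, 61, 64]

r = pearson_r(sports_participation_pct, college_enrollment_pct)
print(f"r = {r:.2f}")  # a value near +1 means a strong positive correlation
```

A value of r close to +1 would only tell you that schools with more sports participation also tend to have more college enrollment. Some third factor, such as school funding, could be driving both.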
Data Visualization
Modern software can help you display your data in ways that are striking and often even beautiful. But the best criterion for judging whatever display you use is whether it helps you and your audience understand your data better than only text, maybe even noticing points that you would have otherwise missed.
Specific kinds of charts and graphs accomplish different things, which is important to keep in mind as you evaluate data and data sources. For instance:
- Line charts are usually used to show trends, comparing data over time.
- Scatter plots show how individual data points are distributed and whether two variables appear to be related.
- Bar graphs usually compare categories of data.
- Pie charts show proportions of a whole.
It’s important to decide what you want a display to do before making your final choice. Studying your data first so you know what you have will help you make that decision. It may also be conventional in your discipline to display your data in certain ways. Examining the sources you were assigned to read in your course or asking your professor will help you learn what’s considered conventional.
Your professors will be examining your visual display to make sure you did not misrepresent the data. For example, the proportions of slices in a pie chart all have to add up to 100%. If yours don’t, you’ve done something wrong.
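If you build a pie chart from raw counts, you can check the 100% rule before you draw anything. The following Python sketch is one possible way to do that; the function names, the half-point rounding tolerance, and the survey numbers are all assumptions made up for this illustration.

```python
def slice_percentages(counts):
    """Convert raw category counts into percentages of the whole."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must add up to a positive total")
    return {label: 100 * value / total for label, value in counts.items()}

def adds_up_to_100(percentages, tolerance=0.5):
    """Return True if the slices sum to 100% within a small rounding tolerance."""
    return abs(sum(percentages.values()) - 100) <= tolerance

# Hypothetical survey in which each person gave exactly one answer.
survey_counts = {"Yes": 45, "No": 30, "Unsure": 25}
percentages = slice_percentages(survey_counts)
print(percentages, adds_up_to_100(percentages))

# Hypothetical "check all that apply" results, already in percent.
# The categories overlap, so they are not parts of one whole.
overlapping = {"Email": 60, "Text": 55, "Phone": 20}
print(adds_up_to_100(overlapping))  # False
```

When the check fails, as in the second case, the categories probably overlap or some category is missing, and a bar graph is usually a safer choice than a pie chart.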
It’s easy to get overwhelmed by all the choices to be made between potential displays and what each can do. Here are two sites to help you sort them out once you know your data:
If you aren’t ready yet to use some of the specialized tools for display, make it a point to learn how to use the data display capabilities in Microsoft Word and/or Excel. You can find helpful tutorials on the Web. Good search statements to find those tutorials are:
- “Microsoft Word” (charts OR graphs)
- “Microsoft Excel” (charts OR graphs).
If you are OSU staff, students, or faculty, OSU Libraries’ Research Commons can help you choose a display, recommend a tool to accomplish it, and check out your finished data visualization before you have to turn it in. Contact the data visualization specialist.
If you are interested in displaying geospatial data on a map, consider how the Research Commons also helps OSU students, staff, and faculty find geospatial data and choose tools to display them.
Citing Data
Data is not copyrightable, but the expression of data is. So as with any other information source, you should cite any data you use from a source, whether it appeared in an article or you downloaded the data from a repository on the Web.
Unfortunately, data citation standards do not exist in many disciplines, although the DataCite initiative is working on them. Current workarounds include:
- Citing a “data paper,” where available.
- Citing a journal article that describes the dataset.
- Citing a book that includes the data.
- Citing the dataset as a website, where possible.
Examples: Citing Data
Data from a research database:
- APA: Department of Agriculture (USDA) (2008). “Crops Harvested”, Crop Production [data file]. Data Planet, (09/15/2009).
- MLA: “Crops Harvested”, Department of Agriculture (USDA) [data file] (2008). Data Planet, (09/15/2009).
Data from a file found on the open Web:
- APA: Center for Health Statistics, Washington State Department of Health. (2012, November). Mortality Table D1. Age-Adjusted Rates for Leading Causes of Cancer for Residents, 2002-2011. [Microsoft Excel file]. Washington State Department of Health. Retrieved from http://www.doh.wa.gov/
- MLA: Center for Health Statistics, Washington State Department of Health. Mortality Table D1. Age-Adjusted Rates for Leading Causes of Cancer for Residents, 2002-2011. Washington State Department of Health, Nov. 2012. Microsoft Excel file. Retrieved from http://www.doh.wa.gov/
Proper Use of Data
Once you have your data, you can examine them and make an interpretation. Sometimes, you can do so easily. But not always.
What if…
…you had a lot of information? Sometimes data can be very complicated and may include thousands (or millions…or billions…or more!) of data points. Suppose you only have a date and the high temperature for Columbus – but you have this for 20 years’ worth of days. Do you want to calculate the average highs for each month based upon 20 years’ worth of data by hand or even with a calculator?
…you want to be able to show that a relationship exists? Perhaps your theory is that social sciences students do better in a certain class than arts and humanities or life and physical sciences students. You may have a huge spreadsheet of data from 20 years’ worth of this course’s sections and would need to use statistical methods to see if a relationship between major and course grade exists.
You may find yourself using special software, such as Excel, SAS, or SPSS, in such situations.
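A general-purpose programming language can handle the same kind of job. As a rough illustration of the temperature example above, the Python sketch below averages daily highs by calendar month. The function name and the handful of readings are invented for the illustration; a real 20-year file would hold thousands of rows, but the code would not need to change.

```python
from collections import defaultdict
from datetime import date

def average_high_by_month(readings):
    """Average the daily highs for each calendar month (1-12) across all years."""
    highs_by_month = defaultdict(list)
    for day, high in readings:
        highs_by_month[day.month].append(high)
    return {
        month: sum(highs) / len(highs)
        for month, highs in sorted(highs_by_month.items())
    }

# Hypothetical (date, high temperature) pairs, NOT real Columbus readings.
readings = [
    (date(2003, 1, 1), 30.0),
    (date(2003, 1, 2), 34.0),
    (date(2004, 1, 1), 38.0),
    (date(2003, 7, 4), 86.0),
    (date(2004, 7, 4), 90.0),
]

monthly = average_high_by_month(readings)
for month, average in monthly.items():
    print(month, round(average, 1))
```

Grouping by `day.month` pools every January together regardless of year, which is what an average monthly high over many years requires.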
Many people may have a tendency to look for data to prove their hypothesis or idea, as opposed to really answering their research questions. However, you may find that the opposite happens: the data may actually disprove your hypothesis. You should never try to manipulate data so that it gives credence to your desired outcome. While it may not be the answer you wanted to find, it is the answer that exists. You may, of course, look for other sources of data – perhaps there are multiple sources of data for the same topic with differing results. Inconclusive or conflicting findings do happen and can be the answer (even if it’s not the one you wanted!).
Conflicting results on the same topic are common. This is the reality of research because, after all, the questions researchers are studying are complicated. When you have conflicting results you can’t just ignore the differences—you’ll have to do your best to explain why the differences occurred.