09 - Creating Meaningful Visuals

econ 490
pystata
visualization
scatter plot
histogram
bar chart
twoway
This notebook goes over how to make all sorts of visuals. We look at different types of graphs, like scatter plots and histograms, exporting figures, and how to edit the figure for clarity.
Author

Marina Adshade, Paul Corcuera, Giulia Lo Forte, Jane Platt

Published

29 May 2024

Prerequisites

  1. Be able to effectively use Stata do-files and generate log-files.
  2. Be able to change your directory so that Stata can find your files.
  3. Import datasets in .csv and .dta format.
  4. Save data files.

Learning Outcomes

  1. Know when to use the following kinds of visualizations to answer specific questions using a data set:
    • scatterplots
    • line plots
    • bar plots
    • histograms
  2. Generate and fine-tune visualizations using the Stata command twoway and its different options.
  3. Use graph export to save visualizations in various formats including .svg, .png and .pdf.

9.0 Intro

Note: The best approach to completing this module is to copy and paste these commands into a do-file in Stata. Because Stata produces graphs in a separate window, Jupyter Notebooks will not produce a graph that we can see when we execute the commands on this page. The most we can do is export image files to a directory on our computer. We will see these commands whenever a graph is produced below.

We’ll continue working with the fake data set we have been using as we work on developing our research skills. Recall that this data set is simulating information for workers in the years 1982-2012 in a fake country where a training program was introduced in 2003 to boost their earnings.

import stata_setup
stata_setup.config('D:/Stata', 'se')

  ___  ____  ____  ____  ____ ®
 /__    /   ____/   /   ____/      StataNow 19.5
___/   /   /___/   /   /___/       SE—Standard Edition

 Statistics and Data Science       Copyright 1985-2025 StataCorp LLC
                                   StataCorp
                                   4905 Lakeway Drive
                                   College Station, Texas 77845 USA
                                   800-782-8272        https://www.stata.com
                                   979-696-4600        service@stata.com

Stata license: Unlimited-user network, expiring 19 Aug 2026
Serial number: 401909301439
  Licensed to: Alex Ronczewski
               UBC

Notes:
      1. Unicode is supported; see help unicode_advice.
      2. Maximum number of variables is set to 5,000 but can be increased;
          see help set_maxvar.
>>> import sys
>>> sys.path.append('/Applications/Stata/utilities') # make sure this is the same as what you set up in Module 01, Section 1.3: Setting Up the STATA Path
>>> from pystata import config
>>> config.init('se')
%%stata
clear*
*cd "" 
use fake_data, clear 

. clear*

. *cd "" 
. use fake_data, clear 

. 

Data visualization is an effective way of communicating ideas to our audience, whether it’s for an academic paper or a business setting. It can be a powerful medium to motivate our research, illustrate relationships between variables, and provide some intuition behind why we applied certain econometric methods.

The real challenge is not understanding how to use Stata to create graphs. Instead, the challenge is figuring out which graph will do the best job of telling our empirical story. Before creating any graphs, we must identify the message we want the graph to convey. Try to answer these questions: Who is our audience? What question are we trying to answer?

9.1 Types of Graphs

9.1.1 Scatter Plot using twoway

What is it, and when should we use it?

Scatter plots are frequently used to demonstrate how two quantitative variables are related to one another. This plot works well when we are interested in showing relationships and groupings among variables from relatively large data sets.

Below is a nice example.

Scatter plot presenting the relationship of country religiosity vs wealth

Let’s say we want to plot the log-earnings by year using our fake data set. We begin by generating a new variable for log-earnings.

%%stata

generate log_earnings = log(earnings)

label var log_earnings "Log-earnings" // We are adding the label "log-earnings" to the variable log_earnings

. 
. generate log_earnings = log(earnings)

. 
. label var log_earnings "Log-earnings" // We are adding the label "log-earning
> s" to the variable log_earnings

. 

Now let’s create a new data set that includes a variable that is the log-earnings by year. We use the command preserve to save the data set that we are working on. We then include the command restore to bring back the original data set.

%%stata

preserve
collapse (mean) log_earnings, by(year)
describe

. 
. preserve

. collapse (mean) log_earnings, by(year)

. describe

Contains data
 Observations:            17                  
    Variables:             2                  
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
year            int     %8.0g                 Calendar Year
log_earnings    float   %9.0g                 (mean) log_earnings
-------------------------------------------------------------------------------
Sorted by: year
     Note: Dataset has changed since last saved.

. 

To create a graph of two numeric variables, we use the command twoway. The format for this command is twoway (type_of_graph y-axis_variable x-axis_variable). Note that the y-axis variable comes first.

In this case we want to create a graph that is a scatterplot that shows log-earnings as the dependent variable (y-axis) and year as the explanatory variable (x-axis variable).

%%stata

twoway (scatter log_earnings year)

graph export graph1.jpg, as(jpg) replace

. 
. twoway (scatter log_earnings year)

. 
. graph export graph1.jpg, as(jpg) replace
(file graph1.jpg not found)
file graph1.jpg written in JPEG format

. 

Note that no graph appears in the notebook when we execute this command. However, we can find the graph saved in our working directory under the name “graph1.jpg”. That graph will look like this:

[Figure: scatter plot of (mean) log_earnings (y-axis) against Calendar Year (x-axis)]

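A second way that we can create this graph is by replacing the graph type scatter with the graph type connected, which joins the points with a line. Note that the export command in this cell reuses the file name graph1.jpg, so it overwrites the scatter plot saved earlier; we can change the file name if we want to keep both.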

%%stata

twoway (connected log_earnings year)

graph export graph1.jpg, as(jpg) replace

. 
. twoway (connected log_earnings year)

. 
. graph export graph1.jpg, as(jpg) replace
file graph1.jpg written in JPEG format

. 

[Figure: connected scatter plot of Log-earnings (y-axis) against Year (x-axis)]

9.1.2 Line Plot using twoway

What is it, and when should we use it?

Line plots visualize trends with respect to an independent, ordered quantity (e.g., time). This plot works well when one of our variables is ordinal (time-like) or when we want to display multiple series on a common timeline.

Line plots can be generated using Stata’s twoway command we saw earlier. This time, instead of writing scatter for the type of graph, we write line.

Below we introduce something new. We have added options to the graph that change the title on the x-axis (xtitle) and on the y-axis (ytitle). Options for the graph as a whole appear at the end of the command, after the comma. As we will see, options that affect an individual plot appear inside the brackets where that plot is specified.

%%stata

twoway (line log_earnings year), xtitle("Year") ytitle("Log-earnings")

graph export graph3.jpg, as(jpg) replace

. 
. twoway (line log_earnings year), xtitle("Year") ytitle("Log-earnings")

. 
. graph export graph3.jpg, as(jpg) replace
(file graph3.jpg not found)
file graph3.jpg written in JPEG format

. 

It should look something like this:

[Figure: line plot of Log-earnings (y-axis) against Year (x-axis)]

Now, let’s try creating a line plot with multiple series on a common twoway graph. To create this graph we first need to restore our data to the original version of the “fake_data” data set.

%%stata

restore

. 
. restore

. 

Now that we have done that, we can collapse the data to create the mean of log_earnings by both year and treated.

%%stata

preserve

collapse (mean) log_earnings, by(treated year)

describe

. 
. preserve

. 
. collapse (mean) log_earnings, by(treated year)

. 
. describe

Contains data
 Observations:            34                  
    Variables:             3                  
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
year            int     %8.0g                 Calendar Year
treated         byte    %8.0g                 Treatment Dummy
log_earnings    float   %9.0g                 (mean) log_earnings
-------------------------------------------------------------------------------
Sorted by: treated  year
     Note: Dataset has changed since last saved.

. 

We can create a graph that separates the earnings between the treated and non-treated over time. We need to add each line separately to the graph. Within brackets, we can choose the observations we want included. We can also add line specific options, like color.

%%stata

twoway (connected log_earnings year if treated==1, color(orange)) (connected log_earnings year if treated==0, color(purple)), xtitle(Year) ytitle(Average Log Earnings)

graph export graph4.jpg, as(jpg) replace

. 
. twoway (connected log_earnings year if treated==1, color(orange)) (connected 
> log_earnings year if treated==0, color(purple)), xtitle(Year) ytitle(Average 
> Log Earnings)

. 
. graph export graph4.jpg, as(jpg) replace
(file graph4.jpg not found)
file graph4.jpg written in JPEG format

. 

[Figure: two connected line plots of Average Log Earnings (y-axis) against Year (x-axis), one per treatment group; both legend entries read "(mean) log_earnings"]

One final tip about working with scatterplots: sometimes we will want to draw a fit line on our graph that approximates the relationship between the two variables. We can do this by adding a second graph to the twoway plot that uses the graph type lfit.
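As a minimal sketch of the fit-line tip above, the command below overlays a linear fit on a scatter plot. It assumes the collapsed data from the previous cells (mean log_earnings by treated and year) is still in memory; the legend labels and the file name graph_lfit.jpg are illustrative choices, not part of the original module.

```stata
* Sketch: assumes the collapsed data (log_earnings by treated and year) is in memory
* Plot 1 is the scatter of yearly means; plot 2 is the linear fit (lfit) through them
twoway (scatter log_earnings year if treated==1)         ///
       (lfit log_earnings year if treated==1),           ///
       xtitle("Year") ytitle("Average Log Earnings")     ///
       legend(label(1 "Yearly mean") label(2 "Linear fit"))

* graph_lfit.jpg is a hypothetical file name
graph export graph_lfit.jpg, as(jpg) replace
```

Because both plots share the same if condition, the fitted line is estimated only on the treated group's yearly means.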

9.1.3 Histogram using twoway

What is it, and when should we use it?

Histograms visualize the distribution of one quantitative variable. This plot works well when we want to see which values a variable takes and how often they occur. For a continuous variable such as log-earnings, Stata groups the values into bins and each bar shows how much of the data falls in that bin.

Now let’s restore the original data set so that we can plot the distribution of log_earnings and draw a simple histogram.

%%stata

restore

histogram log_earnings

graph export graph5.jpg, as(jpg) replace

. 
. restore

. 
. histogram log_earnings
(bin=51, start=3.58887, width=.28193801)

. 
. graph export graph5.jpg, as(jpg) replace
(file graph5.jpg not found)
file graph5.jpg written in JPEG format

. 

It will look like this:

[Figure: histogram of Log-earnings (x-axis) with Density on the y-axis]

We can also draw two histograms on one plot. They won’t look very nice unless we change the plot colours though. But, if we execute the command below, it should create a nice graph that allows us to compare the distributions of log_earnings between the treatment and control groups.

%%stata

twoway (histogram log_earnings if treated==0, color(orange) lcolor(black))     ///
    (histogram log_earnings if treated==1, color(olive) lcolor(black)),        ///
    legend(label(1 "Untreated") label(2 "Treated"))

graph export graph6.jpg, as(jpg) replace

. 
. twoway (histogram log_earnings if treated==0, color(orange) lcolor(black))   
>   ///
>     (histogram log_earnings if treated==1, color(olive) lcolor(black)),      
>   ///
>     legend(label(1 "Untreated") label(2 "Treated"))

. 
. graph export graph6.jpg, as(jpg) replace
(file graph6.jpg not found)
file graph6.jpg written in JPEG format

. 

[Figure: two overlaid histograms of Log-earnings (x-axis, Density on the y-axis), one for the treated group and one for the untreated group]
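With solid fill colours, the histogram drawn second can hide parts of the one drawn first. One possible remedy, sketched below, is the opacity modifier (colorstyle%#) that Stata's colorstyle help describes. The sketch assumes the original fake_data is in memory with log_earnings already generated and a Stata version that supports opacity; the 50% value and the file name graph6_transparent.jpg are illustrative.

```stata
* Sketch: assumes fake_data is loaded and log_earnings has been generated
* color(orange%50) draws the bars at 50% opacity so both distributions stay visible
* Plot 1 is the untreated group (treated==0); plot 2 is the treated group (treated==1)
twoway (histogram log_earnings if treated==0, color(orange%50) lcolor(black)) ///
       (histogram log_earnings if treated==1, color(olive%50) lcolor(black)), ///
       legend(label(1 "Untreated") label(2 "Treated"))

* graph6_transparent.jpg is a hypothetical file name
graph export graph6_transparent.jpg, as(jpg) replace
```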

9.1.4 Bar Plot using graph

What is it, and when should we use it?

Bar plots visualize comparisons of amounts. They are useful when we are interested in comparing a few categories as parts of a whole, or across time. Bar plots should always start at 0. Starting bar plots at any number besides 0 is generally considered a misrepresentation of the data.

Let’s plot mean earnings by region. Note that the regions are numbered in our data set.

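To make a bar plot, we have to use the command graph instead of twoway. The syntax is similar: graph bar (statistic) variable, over(grouping_var). Here variable is the variable being summarized (for example, earnings), and grouping_var defines the categories, with one bar per category.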

See an example below:

%%stata

graph bar (mean) earnings, over(region)
graph export graph7.jpg, as(jpg) replace

. 
. graph bar (mean) earnings, over(region)

. graph export graph7.jpg, as(jpg) replace
(file graph7.jpg not found)
file graph7.jpg written in JPEG format

. 

[Figure: vertical bar chart of mean of earnings (y-axis) for regions 1 to 5]

We can also create a horizontal bar plot by using graph hbar instead of graph bar.

%%stata

graph hbar (mean) earnings, over(region)

graph export graph8.jpg, as(jpg) replace

. 
. graph hbar (mean) earnings, over(region)

. 
. graph export graph8.jpg, as(jpg) replace
(file graph8.jpg not found)
file graph8.jpg written in JPEG format

. 

[Figure: horizontal bar chart of mean of earnings for regions 1 to 5]

We can also group our bars over another variable (or “category”).

%%stata

graph hbar (mean) earnings,  over(treated) over(region)

graph export graph9.jpg, as(jpg) replace

. 
. graph hbar (mean) earnings,  over(treated) over(region)

. 
. graph export graph9.jpg, as(jpg) replace
(file graph9.jpg not found)
file graph9.jpg written in JPEG format

. 

[Figure: horizontal bar chart of mean of earnings, grouped by region (1 to 5) and, within each region, by treated (0 and 1)]

9.2 Exporting Format

So far, we have been exporting our graphs in .jpg format. However, we can also export graphs in other formats such as .svg, .png, and .pdf by changing the file extension and the as() option. This may be particularly helpful if we are using LaTeX to write a paper, as .svg files generally cannot be included directly in LaTeX PDF output.
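The following sketch exports one graph in several formats. It assumes fake_data is loaded and log_earnings has been generated; the file names are hypothetical, and the width(2000) setting for the .png file is only an illustrative resolution.

```stata
* Sketch: assumes fake_data is loaded and log_earnings has been generated
* Draw a graph so that there is something in memory to export
histogram log_earnings

* Export the same graph in different formats (file names are hypothetical)
graph export myhistogram.png, as(png) width(2000) replace   // raster image, 2000 pixels wide
graph export myhistogram.pdf, as(pdf) replace               // vector format, convenient for LaTeX
graph export myhistogram.svg, as(svg) replace               // vector format, convenient for web pages
```

The replace option overwrites an existing file with the same name, which is convenient when we re-run a do-file.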

9.3 Fine-tuning a Graph Further

In order to customize our graph further, we can use the tools in the Stata graph window or the graph option commands we have been using in this module. Namely, we can include and adjust the following:

  • title
  • axis titles
  • legend
  • axis
  • scale
  • labels
  • theme (i.e. colour, appearance)
  • adding lines, text or objects

Let’s see how to add some of these customizations to our graphs in practice. For example, let’s modify our latest bar graph such that:

  • the title is “Earnings by region and treatment”: we do this with the option title();
  • the axis title is “Earnings (average)”: we do this with the option ytitle();
  • the regions and the treatment status are labeled: we do this with the sub-option relabel within the over option, over(varname, relabel()). Remember that relabelling follows the order in which the values appear: e.g., for treated and untreated, the not treated group appears first and the treated group appears second, therefore we have to use 1 to indicate the non-treated group and 2 to indicate the treated group: over(treated, relabel(1 "Not treated" 2 "Treated"));
  • the background color is white: we do this with the option graphregion(color());
  • the color of the bars is dark green: we do this using the option bar and its suboptions. Remember that we need to specify this option for each variable we are plotting in the bars. In our case, we are only plotting variable earnings, which is by definition the first variable we are plotting, therefore all sub-options refer to 1: bar(1, fcolor(dkgreen)).
%%stata

graph hbar (mean) earnings, ///
    over(treated, relabel(1 "Not treated" 2 "Treated"))  ///
    over(region, relabel(1 "A" 2 "B" 3 "C" 4 "D" 5 "E")) ///
    title("Earnings by region and treatment") ytitle("Earnings (average)") ///
    graphregion(color(white)) bar(1, fcolor(dkgreen))

graph export graph10.jpg, as(jpg) replace

. 
. graph hbar (mean) earnings, ///
>     over(treated, relabel(1 "Not treated" 2 "Treated"))  ///
>     over(region, relabel(1 "A" 2 "B" 3 "C" 4 "D" 5 "E")) ///
>     title("Earnings by region and treatment") ytitle("Earnings (average)") //
> /
>     graphregion(color(white)) bar(1, fcolor(dkgreen))

. 
. graph export graph10.jpg, as(jpg) replace
(file graph10.jpg not found)
file graph10.jpg written in JPEG format

. 

[Figure: horizontal bar chart titled "Earnings by region and treatment"; axis title "Earnings (average)"; bars grouped by region (A to E) and, within each region, by "Not treated" and "Treated"]

These are just some of the customizations available to you. Other common options are:

  • adding a labelled legend to our graphs. To include the legend, we use the option legend( label(number_of_label "label"));
  • adding a vertical line, for example one indicating the year in which the treatment was administered (2003). To include the indicator line we use the option xline(). The line can also have different characteristics. For example, we can change its color and pattern by placing the sub-options lcolor() and lpattern() inside xline(), after a comma.
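The two options above can be combined in a single command. The sketch below redraws the treated and untreated series with a labelled legend and a dashed vertical line at 2003. It assumes the original fake_data is in memory with log_earnings already generated; the line colour gs8, the dash pattern, and the file name graph11.jpg are illustrative choices.

```stata
* Sketch: assumes fake_data is loaded and log_earnings has been generated
preserve
collapse (mean) log_earnings, by(treated year)

* Plot 1 is the treated group and plot 2 the untreated group, so the legend labels follow that order
* xline(2003, ...) marks the year the training program was introduced
twoway (connected log_earnings year if treated==1, color(orange))  ///
       (connected log_earnings year if treated==0, color(purple)), ///
       xline(2003, lcolor(gs8) lpattern(dash))                     ///
       legend(label(1 "Treated") label(2 "Untreated"))             ///
       xtitle("Year") ytitle("Average Log Earnings")

* graph11.jpg is a hypothetical file name
graph export graph11.jpg, as(jpg) replace
restore
```

Wrapping the collapse in preserve and restore leaves the original data set in memory once the graph has been exported.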

We can always go back to the Stata documentation to explore the options available based on what we need to do. We can also adjust many of these aspects in the Graph Editor, which we can open from the graph window that appears whenever we create a new graph (top right corner). Just don’t forget to save the graph when we are done, since changes made in the Graph Editor are not recorded in our do-file!

When thinking about colors, always make sure that your graphs are accessible to everyone. Run the code cell below to view the colorstyle options available in Stata. If the color you desire is not available, you can input its RGB code within quotes: for example, a red line would be lcolor("248 7 27"). You can learn more about accessible color combinations on this website.

%%stata

help colorstyle

. 
. help colorstyle

[G-4] colorstyle -- Choices for color
                    (View complete PDF manual entry)


Syntax
------

    Set color of <object> to colorstyle

        <object>color(colorstyle)


    Set color of all affected objects to colorstyle

        color(colorstyle)


    Set opacity of <object> to #, where # is a percentage of 100% opacity

        <object>color(%#)


    Set opacity for all affected objects colors to #

        color(%#)


    Set both color and opacity of <object>

        <object>color(colorstyle%#)


    Set both color and opacity of all affected objects

        <object>color(colorstyle%#)


    colorstyle            Description
    -------------------------------------------------------------------------
    black                 

    stc1                  color used by scheme stcolor
    stc2                  color used by scheme stcolor
    .                     
    .                     
    stc15                 color used by scheme stcolor
    stblue                blue used by scheme stcolor
    stgreen               green used by scheme stcolor
    stred                 red used by scheme stcolor
    styellow              yellow used by scheme stcolor

    gs0                   gray scale: 0 = black
    gs1                   gray scale: very dark gray
    gs2                   
    .                     
    .                     
    gs15                  gray scale: very light gray
    gs16                  gray scale: 16 = white

    white                 

    blue                  
    bluishgray            
    brown                 
    cranberry             
    cyan                  
    dimgray               between gs14 and gs15
    dkgreen               dark green
    dknavy                dark navy blue
    dkorange              dark orange
    eggshell              
    emerald               
    forest_green          
    gold                  
    gray                  equivalent to gs8
    green                 
    khaki                 
    lavender              
    lime                  
    ltblue                light blue
    ltbluishgray          light blue-gray, used by scheme s2color
    ltkhaki               light khaki
    magenta               
    maroon                
    midblue               
    midgreen              
    mint                  
    navy                  
    olive                 
    olive_teal            
    orange                
    orange_red            
    pink                  
    purple                
    red                   
    sand                  
    sandb                 bright sand
    sienna                
    stone                 
    teal                  
    yellow                

                          colors used by The Economist magazine:
    ebg                           background color
    ebblue                        bright blue
    edkblue                       dark blue
    eltblue                       light blue
    eltgreen                      light green
    emidblue                      midblue
    erose                         rose

    none                  no color; invisible; draws nothing
    background or bg      same color as background
    foreground or fg      same color as foreground

    "# # #"               RGB value; white = "255 255 255"

    "# # # #"             CMYK value; yellow = "0 0 255 0"

    "hsv # # #"           HSV value; white = "hsv 0 0 1"

    "#######"             hexadecimal value; red = "#FF0000"

    colorstyle*#          color with adjusted intensity; #'s range from 0 to
                            255

    colorstyle%#          color with adjusted opacity; #s range from 0 to 100

    *#                    default color with adjusted intensity
    %#                    default color with adjusted opacity
    -------------------------------------------------------------------------
    When you specify RGB, CMYK, HSV, or hexadecimal values, it is best to
      enclose the values in quotes; type "128 128 128" not 128 128 128.


Description
-----------

    colorstyle sets the color and opacity of graph components such as lines,
    backgrounds, and bars.  Some options allow a sequence of colorstyles with
    colorstylelist; see [G-4] stylelists.


Links to PDF documentation
--------------------------

        Remarks and examples

    The above sections are not included in this help file.


Remarks
-------

    colorstyle sets the color and opacity of graph components such as lines,
    backgrounds, and bars.  Colors can be specified with a named color, such
    as black, olive, and yellow, or with a color value in the RGB, CMYK, or
    HSV format.  colorstyle can also set a component to match the background
    color or foreground color.  Additionally, colorstyle can modify color
    intensity, making the color lighter or darker.  Some options allow a
    sequence of colorstyles with colorstylelist; see [G-4] stylelists.

    To see a list of named colors, use graph query colorstyle.  See [G-2]
    graph query.  For a color palette showing an individual color or
    comparing two colors, use palette color.  See [G-2] palette.

    Remarks are presented under the following headings:

        Adjust opacity
        Adjust intensity
        Specify RGB values
        Specify CMYK values
        Specify HSV values
        Specify hexadecimal values
        Export custom colors


Adjust opacity
--------------

    Opacity is the percentage of a color that covers the background color.
    That is, 100% means that the color fully hides the background, and 0%
    means that the color has no coverage and is fully transparent.  If you
    prefer to think about transparency, opacity is the inverse of
    transparency.  Adjust opacity with the % modifier.  For example, type

        green%50
        "0 255 0%50"
        %30

    Omitting the color specification in the command adjusts the opacity of
    the object while retaining the default color.  For instance, specify
    mcolor(%30) with graph twoway scatter to use the default fill color at
    30% opacity.

    Specifying color%0 makes the object completely transparent and is
    equivalent to color none.


Adjust intensity
----------------

    Color intensity (brightness) can be modified by specifying a color, *,
    and a multiplier value.  For example, type

        green*.8
        purple*1.5
        "0 255 255*1.2"
        "hsv 240 1 1*.5"

    A value of 1 leaves the color unchanged, a value greater than 1 makes the
    color darker, and a value less than 1 makes the color lighter.  Note that
    there is no space between color and *, even when color is a numerical
    value for RGB or CMYK.

    Omitting the color specification in the command adjusts the intensity of
    the object's default color.  For instance, specify bcolor(*.7) with graph
    twoway bar to use the default fill color at reduced brightness, or
    specify bcolor(*2) to increase the brightness of the default color.

    Specifying color*0 makes the color as light as possible, but it is not
    equivalent to color none.  color*255 makes the color as dark as possible,
    although values much smaller than 255 usually achieve the same result.

    For an example using the intensity adjustment, see Typical use in [G-2]
    graph twoway kdensity.


Specify RGB values
------------------

    In addition to specifying named colors such as yellow, you can specify
    colors with RGB values.  An RGB value is a triplet of numbers ranging
    from 0 to 255 that describes the level of red, green, and blue light that
    must be emitted to produce a given color.  RGB is used to define colors
    for on-screen display and in nonprofessional printing.  Examples of RGB
    values are

        red     =   255    0    0
        green   =     0  255    0
        blue    =     0    0  255
        white   =   255  255  255
        black   =     0    0    0
        gray    =   128  128  128
        navy    =    26   71  111


Specify CMYK values
-------------------

    You can specify colors using CMYK values.  You will probably only use
    CMYK values when they are provided by a journal or publisher.  You can
    specify CMYK values either as integers from 0 to 255 or as proportions of
    ink using real numbers from 0.0 to 1.0.  If all four values are 1 or
    less, the numbers are taken to be proportions of ink.  For example,

        red     =     0  255  255    0   or, equivalently,     0     1  1     0
        green   =   255    0  255    0   or, equivalently,     1     0  1     0
        blue    =   255  255    0    0   or, equivalently,     1     1  0     0
        white   =     0    0    0    0   or, equivalently,     0     0  0     0
        black   =     0    0    0  255   or, equivalently,     0     0  0     1
        gray    =     0    0    0  128   or, equivalently,     0     0  0    .5
        navy    =    85   40    0  144   or, equivalently,  .334  .157  0  .565


Specify HSV values
------------------

    You can specify colors with HSV (hue, saturation, and value), also called
    HSL (hue, saturation, and luminance) and HSB (hue, saturation, and
    brightness).  HSV is often used in image editing software.  An HSV value
    is a triplet of numbers.  So that Stata can differentiate them from RGB
    values, HSV colors must be prefaced with hsv.  The first number specifies
    the hue from 0 to 360, the second number specifies the saturation from 0
    to 1, and the third number specifies the value (luminance or brightness)
    from 0 to 1.  For example,

        red     =   hsv   0     1     1
        green   =   hsv 120     1  .502
        blue    =   hsv 240     1     1
        white   =   hsv   0     0     1
        black   =   hsv   0     0     0
        navy    =   hsv 209  .766  .435


Specify hexadecimal values
--------------------------

    You can specify colors with hexadecimal values. A hexidecimal value is a
    triplet of symbols ranging from 00 to FF that describes the level of red,
    green, and blue in the color. The symbols can include digits and letters
    A, B, C, D, E, and F in either uppercase or lowercase. For example,

       red     =   #FF0000
       green   =   #00FF00
       blue    =   #0000FF
       white   =   #FFFFFF
       black   =   #000000


Export custom colors
--------------------

    graph export stores all colors as RGB+opacity values, that is, RGB values
    0-255 and opacity values 0-1.  If you need color values from Stata in
    CMYK format, use the graph export command with the cmyk(on) option, and
    save the graph in one of the following formats: PostScript, Encapsulated
    PostScript, or PDF.

    You can set Stata to permanently use CMYK colors for PostScript export
    files by typing translator set Graph2ps cmyk on and for EPS export files
    by typing translator set Graph2eps cmyk on.

    The CMYK values returned in graph export may differ from the CMYK values
    that you entered.  This is because Stata normalizes CMYK values by
    reducing all CMY values until one value is 0.  The difference is added to
    the K (black) value.  For example, Stata normalizes the CMYK value 10 10
    5 0 to 5 5 0 5.  Stata subtracts 5 from the CMY values so that Y is 0 and
    then adds 5 to K.


Video example
-------------

        Transparency in Stata graphs

. 
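As a small sketch of the RGB option described above, the command below fills the histogram bars with a custom colour. It assumes fake_data is loaded and log_earnings has been generated. The triplet "0 114 178" is a blue often recommended in colour-blind-friendly palettes; treat it, and the file name graph12.jpg, as illustrative choices rather than recommendations from the original module.

```stata
* Sketch: assumes fake_data is loaded and log_earnings has been generated
* RGB values go inside quotes, as the colorstyle help recommends
histogram log_earnings, color("0 114 178") lcolor(black)

* graph12.jpg is a hypothetical file name
graph export graph12.jpg, as(jpg) replace
```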

9.4 Wrap Up

We have learned in this module how to create different types of graphs using the command twoway and how to adjust them with the multiple options which come with this command. However, the most valuable take-away from this module is understanding when to use a specific type of graph. Graphs are only able to tell a story if we choose them appropriately and customize them as necessary.

Remember to check the Stata documentation when creating graphs. It lists every option available for each graph type and is often the quickest way to find the customization we need.

9.5 Wrap-up Table

Command                     Function
--------------------------  --------------------------------------------------------------
twoway scatter              It creates a scatterplot.
twoway connected            It creates a scatterplot where points are connected by a line.
twoway line                 It creates a line graph.
twoway histogram            It creates a histogram.
graph bar, over(varname)    It creates a bar graph by category of varname.

9.6 Further Reading

  • Make your data speak for itself! Less is more (and people don’t read)


  • Creative Commons License. See details.
  • The COMET Project and the UBC Vancouver School of Economics are located on the traditional, ancestral and unceded territory of the xʷməθkʷəy̓əm (Musqueam) and Sḵwx̱wú7mesh (Squamish) peoples.