How might issues of sampling and threats to validity impact program evaluation?