A Study Supporting Intensity Over Volume (But Not Really)

Mangine, G. T., Hoffman, J. R., Gonzalez, A. M., Townsend, J. R., Wells, A. J., Jajtner, A. R., & … Stout, J. R. (2015). The effect of training volume and intensity on improvements in muscular strength and size in resistance-trained men. Physiological Reports, 3(8), e12472. doi:10.14814/phy2.12472

What’s thiiiiis? The article is accessible through PubMed. I’m able to access it through my university, so I don’t know if it’s freely available. Here’s a copy of the abstract (which is freely available, of course):

“This investigation compared the effect of high-volume (VOL) versus high-intensity (INT) resistance training on stimulating changes in muscle size and strength in resistance-trained men. Following a 2-week preparatory phase, participants were randomly assigned to either a high-volume (VOL; n = 14, 4 × 10-12 repetitions with ~70% of one repetition maximum [1 RM], 1-min rest intervals) or a high-intensity (INT; n = 15, 4 × 3-5 repetitions with ~90% of 1 RM, 3-min rest intervals) training group for 8 weeks. Pre- and posttraining assessments included lean tissue mass via dual energy x-ray absorptiometry, muscle cross-sectional area and thickness of the vastus lateralis (VL), rectus femoris (RF), pectoralis major, and triceps brachii muscles via ultrasound images, and 1 RM strength in the back squat and bench press (BP) exercises. Blood samples were collected at baseline, immediately post, 30 min post, and 60 min postexercise at week 3 (WK3) and week 10 (WK10) to assess the serum testosterone, growth hormone (GH), insulin-like growth factor-1 (IGF1), cortisol, and insulin concentrations. Compared to VOL, greater improvements (P < 0.05) in lean arm mass (5.2 ± 2.9% vs. 2.2 ± 5.6%) and 1 RM BP (14.8 ± 9.7% vs. 6.9 ± 9.0%) were observed for INT. Compared to INT, area under the curve analysis revealed greater (P < 0.05) GH and cortisol responses for VOL at WK3 and cortisol only at WK10. Compared to WK3, the GH and cortisol responses were attenuated (P < 0.05) for VOL at WK10, while the IGF1 response was reduced (P < 0.05) for INT. It appears that high-intensity resistance training stimulates greater improvements in some measures of strength and hypertrophy in resistance-trained men during a short-term training period.”

I was wondering whether we could discuss what this article actually shows vs. what it purports to show, at least with regard to muscle hypertrophy and strength.

The article appears to show that volume is not a superior training variable when compared to intensity. One group went through a higher-volume protocol and another went through a lower-volume but higher-intensity protocol, and the latter showed superior gains in muscular hypertrophy and strength. After digging into the article a bit, however, I think it’s clear that while the high-volume group is indeed doing more volume than the low-volume group, the low-volume group’s training falls more in line with what we would see in a properly programmed strength program, at least in terms of sets and reps (if not exercise variation, etc.). The conclusion this would actually support, then, is that there are more and less effective intensity ranges for muscular hypertrophy and strength, which is nothing new. A more informative study would have compared similar intensities, say around that 3-5 rep range, but with different volumes. (They label the 3-5 rep range as “~90%”, but I think we all know full well that with 3-minute rests, and given how ExSci studies tend to go in general, they’re not really working at 90%.)

Is this a decent partial analysis?

I scanned through this paper, and while I don’t have time to do a full analysis, it is interesting.

To summarize, they reportedly studied:

n = 15, 4 × 3-5 repetitions with ~90% of 1 RM, 3-min rest intervals

Vs.

n = 14, 4 × 10-12 repetitions with ~70% of one repetition maximum

This set/rep scheme was performed for six exercises, four days per week.
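To put rough numbers on the gap, here’s a back-of-the-envelope sketch. It takes the description above at face value (4 sets, six exercises, four days per week; I haven’t checked this against the paper’s actual split) and counts weekly working reps for each group.

```python
# Weekly working reps implied by each protocol, assuming the forum summary
# (4 sets x 6 exercises x 4 days per week) describes the actual program.
SETS = 4
EXERCISES = 6
DAYS_PER_WEEK = 4


def weekly_reps(rep_low: int, rep_high: int) -> tuple[int, int]:
    """Return the (min, max) working repetitions per week for a rep range."""
    sets_per_week = SETS * EXERCISES * DAYS_PER_WEEK  # 96 working sets per week
    return sets_per_week * rep_low, sets_per_week * rep_high


int_reps = weekly_reps(3, 5)    # high-intensity (INT) group
vol_reps = weekly_reps(10, 12)  # high-volume (VOL) group
print("INT reps/week:", int_reps)
print("VOL reps/week:", vol_reps)
```

Under those assumptions, INT does 288-480 working reps per week and VOL does 960-1152, even though both groups perform the same 96 working sets.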

Before I leave additional thoughts here, let’s see if anyone else wants to take a stab.


Could be off the mark here, but the 1-minute rest interval for the volume group is suspicious, especially as it pertains to compliance with the given program. The results seem incomplete if the researchers were unable to measure how well each group actually adhered to the program. For example, if the volume group didn’t rest long enough and had to drastically reduce the weight on the bar, or failed or missed reps, that would certainly affect the results.

From the study: “Progressive overload was achieved by increasing the load when all prescribed repetitions (for a particular exercise) were achieved on two consecutive workouts”
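Here’s a minimal sketch of that progression rule as I read it. It assumes “all prescribed repetitions” means hitting a fixed per-set rep target, and it uses a hypothetical 2.5 kg increment, since the quoted passage doesn’t give one. It illustrates the compliance worry: a lifter who keeps missing a rep or two on the last set never earns a load increase.

```python
from dataclasses import dataclass, field


@dataclass
class ExerciseProgression:
    """One exercise under a 'complete all reps on two consecutive workouts' rule.

    Assumptions (not from the paper): the prescription is a fixed rep target
    per set, and the load increment is a hypothetical 2.5 kg.
    """
    load: float
    target_reps: list[int]              # e.g. [10, 10, 10, 10] for 4 x 10
    increment: float = 2.5              # hypothetical increment
    successes_in_a_row: int = field(default=0, init=False)

    def record_workout(self, reps_done: list[int]) -> float:
        """Log one session's reps per set and return the load for the next session."""
        completed_all = len(reps_done) == len(self.target_reps) and all(
            done >= target for done, target in zip(reps_done, self.target_reps)
        )
        # A miss resets the streak; only back-to-back successes count.
        self.successes_in_a_row = self.successes_in_a_row + 1 if completed_all else 0
        if self.successes_in_a_row == 2:
            self.load += self.increment
            self.successes_in_a_row = 0
        return self.load


squat = ExerciseProgression(load=100.0, target_reps=[10, 10, 10, 10])
for session in ([10, 10, 10, 10], [10, 10, 10, 8], [10, 10, 10, 10], [10, 10, 10, 10]):
    print(squat.record_workout(session))
```

This prints 100.0 three times and then 102.5: the missed reps in the second session reset the streak, so the increase comes one session later than it otherwise would have.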

Being able to see a detailed view of how each participant’s training variables played out over the course of the program would help a lot in judging the validity of the study. It seems possible that the volume group was never able to progress, or even regressed in training volume over time.

The greater strength improvements in the intensity group don’t seem that surprising, but the hypertrophy outcomes seem to contradict our current understanding of what drives hypertrophy, which means the details of the methods are crucial here.

I suppose implicit in your post is the assertion that this study should worry those of us who believe volume is the primary driver of hypertrophy. And, perhaps, that it should worry us enough that we should feel compelled to search through the text of the study and attempt to “debunk” it by finding some “error” we can point to that explains away the results.

I don’t think this is the case. (And a quick scan of the text shows that while they do some things that appear dumb and weird, the methods are unfortunately no more dumb and weird than those of the studies you’d cite to support the pro-volume position.) Rather, I suggest relaxing and realizing that individual studies, especially those with small samples, are weak indicators of the truth. Even if a study is attempting to detect a true effect, there’s a lot of noise inherent to the process that might prevent that from happening, or even produce a “statistically significant” result in the opposite direction. For example, with a small number of subjects, uncontrolled inter-individual variability might affect the results. Randomization is supposed to prevent this, but with only 14 and 15 subjects per group that’s far from guaranteed.
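A quick simulation illustrates the randomization point. It’s a sketch under made-up assumptions: 30 subjects whose baseline “responsiveness” is standard normal (hypothetical units), randomly split into two groups of 15, with no treatment applied at all. Any gap between the group means is pure chance; in theory its standard deviation is sqrt(2/15), about 0.37 SD.

```python
import random
import statistics


def baseline_imbalance(n_per_group: int = 15, trials: int = 10_000, seed: int = 1) -> list[float]:
    """Chance differences in mean 'responsiveness' between randomized groups.

    Each trial draws 2 * n_per_group subjects with standard-normal
    responsiveness (hypothetical units), shuffles them, and splits them in
    half. No treatment is applied, so any difference is due to chance alone.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        subjects = [rng.gauss(0.0, 1.0) for _ in range(2 * n_per_group)]
        rng.shuffle(subjects)  # random assignment
        group_a, group_b = subjects[:n_per_group], subjects[n_per_group:]
        diffs.append(statistics.fmean(group_a) - statistics.fmean(group_b))
    return diffs


diffs = baseline_imbalance()
share_large = sum(abs(d) > 0.5 for d in diffs) / len(diffs)
print(f"SD of chance difference: {statistics.stdev(diffs):.2f}")
print(f"Trials with |difference| > 0.5 SD: {share_large:.1%}")
```

With these settings, a baseline gap of half a standard deviation between groups shows up in a noticeable fraction of randomizations, which is exactly the kind of imbalance that can masquerade as a treatment effect.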

Or, there could be some moderating variable particular to this set of methods that has gone undetected. Here’s what McShane and Böckenholt, in “You Cannot Step Into the Same River Twice: When Power Analyses Are Optimistic,” say about the same phenomenon in psychology:

In a recent article, Cumming (2014) discussed the “dance of the confidence intervals,” that is, how the point estimates and 95% confidence intervals from a set of replication studies tend to “bounce around”: [When studies] all estimate the same population mean, µ . . . the bouncing around . . . should match what we expect simply because of sampling variability. If there is notably more variability than this, we can say the set of studies is heterogeneous, and there may be one or more moderating variables that affect the effect size [µ]. (p. 22) Effect size heterogeneity—extra variability or bounce in the dance of the confidence intervals, to use the language of Cumming (2014)—has long been regarded as important for more general (i.e., systematic or conceptual) replications in psychological research. For instance, a meta-analysis of 17 general replications of the choice overload effect (see Appendix A for data) yields I^2 = 78% (i.e., more than three quarters of the variability in these 17 studies is due to heterogeneity—a large amount). Though substantial heterogeneity is unsurprising in the context of more general replications, there is mounting evidence of heterogeneity even under conditions that are nearly ideal for replication. For example, consider the Many Labs project that provides 16 estimates of 13 classic and contemporary effects in psychology from 36 independent samples totaling 6,344 subjects. Despite the fact that each of the 36 labs involved in the Many Labs project used identical materials and that these materials were administered through a web browser to minimize lab-specific effects, random effects meta-analyses conducted by the Many Labs authors yield nonzero estimates of heterogeneity for all 14 of the effects they found to be nonnull (they studied 16 effects in total, but 2 were found to be null).
Further, the average I^2 across these 14 studies was 40%: Lab-specific method factors account for nearly half of the total variability of the studies on average (see Table 3 of Klein et al., 2014). Given these results, it is clear that substantial heterogeneity can occur even under conditions that are nearly ideal for replication and without questionable research practices: In the Many Labs studies, it was caused exclusively by as yet unidentified (and potentially unidentifiable) method factors specific to each of the 36 labs participating in the project. Consequently, it seems reasonable to conclude that some degree of effect size heterogeneity is likely to be present in much psychological research. Effect size heterogeneity is caused by moderating variables (i.e., what we term method factors). When these moderating variables can be identified (e.g., large effect for male subjects and small effect for female subjects), heterogeneity can be explained and controlled for (e.g., by controlling for sex in the study design and analysis). However, moderators are often hard to identify—particularly when a research area is new or when a set of studies consists of close replications (e.g., the Many Labs studies). We therefore suggest that researchers explicitly account for heterogeneity in study planning—in particular in setting sample sizes to achieve adequate statistical power—rather than assuming, as is typical, that heterogeneity is zero.
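For reference, the I^2 in that passage is usually computed from Cochran’s Q. Here’s a minimal sketch, assuming a fixed-effect (inverse-variance) pooled mean; the effect sizes and standard errors in the example are invented for illustration, not taken from the choice-overload or Many Labs data.

```python
def i_squared(effects: list[float], std_errors: list[float]) -> float:
    """Higgins' I^2: share of between-study variability beyond sampling error.

    Q is Cochran's heterogeneity statistic around the inverse-variance
    (fixed-effect) pooled mean; I^2 = max(0, (Q - df) / Q) with df = k - 1.
    """
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= df:
        return 0.0  # no more spread than sampling error alone would produce
    return (q - df) / q


# Invented example: four equally precise studies with scattered estimates.
print(f"{i_squared([0.1, 0.6, -0.2, 0.9], [0.15, 0.15, 0.15, 0.15]):.0%}")
```

When the estimates bounce around only as much as their standard errors predict, Q stays near its degrees of freedom and I^2 is close to zero; the more extra “bounce,” the closer I^2 gets to 100%.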

Obviously exercise science is not psychology, but it still strikes me that such great heterogeneity was found even in tightly controlled replications. I don’t think it’s a stretch to say that methods across studies of volume and hypertrophy are not nearly as tightly controlled.

I am not trying to preach radical skepticism, but merely to point out that science is messy and complicated and an individual study provides only a weak signal about the truth. Refer to the attached image. Even if we’re interested in some intervention that has, on average, a weakly positive effect, the variability inherent in doing science means individual studies might rate the effect anywhere from negative to strongly positive.

[Attached image: onestudy.png]
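In case the image doesn’t load, here’s a rough simulation of the same idea with hypothetical numbers: a true effect of 0.2 SD (weakly positive), 15 subjects per group, and an outcome SD of 1. Each “study” reports the difference in group means.

```python
import random
import statistics


def simulate_studies(true_effect: float = 0.2, n_per_group: int = 15,
                     n_studies: int = 20, seed: int = 7) -> list[float]:
    """Effect estimates from repeated small two-group studies of the same true effect."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        estimates.append(statistics.fmean(treated) - statistics.fmean(control))
    return estimates


est = simulate_studies()
print("smallest estimate:", round(min(est), 2), "largest:", round(max(est), 2))
print("negative estimates:", sum(e < 0 for e in est), "of", len(est))
print("average across studies:", round(statistics.fmean(est), 2))
```

With a standard error of about 0.37 per study, roughly a quarter to a third of such studies would be expected to point in the wrong direction even though the true effect is positive.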

The way to get around this problem is, of course, to aggregate the individual studies into a meta-analysis. Then – assuming you have enough data – you can eliminate enough of the noise to have some certainty about the true effect. And indeed, if we check the meta-analyses, we find that volume (at the appropriate intensity, etc.) appears to be the main driver of hypertrophy.
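To make “eliminate enough of the noise” concrete, here’s a sketch of the simplest pooling method, a fixed-effect inverse-variance average. This is my choice of model for illustration, not necessarily what the published volume meta-analyses used, and the numbers are hypothetical.

```python
import math


def fixed_effect_pool(effects: list[float], std_errors: list[float]) -> tuple[float, float]:
    """Inverse-variance weighted mean of study estimates and its standard error."""
    weights = [1.0 / se**2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)


# Hypothetical: every small study has SE ~0.37; pooling k of them
# shrinks the standard error by a factor of sqrt(k).
for k in (1, 4, 16, 64):
    _, se = fixed_effect_pool([0.2] * k, [0.37] * k)
    print(f"{k:>2} studies -> pooled SE {se:.3f}")
```

One caveat ties back to the quoted passage: a fixed-effect model assumes zero heterogeneity. When I^2 is well above zero, a random-effects model, which adds a between-study variance term, gives a wider and more honest interval.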

Here’s another example, which might be clearer: periodization. Meta-analyses, and literally any powerlifting coach, will tell you that sensible periodization is more effective than constant loading (e.g. trying to LP on 3x5 for the rest of your life) for a non-novice trainee. And this is what the bulk of the studies on the topic say. However, if you check the studies included in those meta-analyses, you can always find a handful that purport to show that constant loading is superior. This noise is just the cost of doing business.

It seems like what they consider “high intensity” training would align more with a volume based approach in the strength world.

With access only to the abstract, I’m assuming this study was performed on an untrained group. In this sense, the training the INT group underwent most closely resembles the recommendation made here, which is still for untrained individuals to run an NLP. The argument that VOL is better than INT for driving strength and hypertrophy once novice gains are exhausted does not seem to be addressed by this study.

My 2 cents: they say they are testing resistance-trained men, but they only train them for 2 weeks before the start of an 8-week program. They could be grabbing people right off the street, training them for 2 weeks, and then for another 8 weeks, for a total of 10. That is a length of time over which a 3x5 program like an NLP still provides very good hypertrophy and strength gains, because the stimulus is still novel.

Part of the scientific method is to have your results peer reviewed and replicated by an independent party, so the jury is still out. The one-minute rest period adds another variable.

It seems what’s important to each of us is how our own body will respond to different methods of training. I stumbled onto high-volume work at ~70% intensity to improve how my joints were feeling, and as a happy side effect, I have been seeing good upper body hypertrophy for the first time. I can’t do a high-intensity program for very long because my muscle strength outpaces my ability to strengthen my connective tissues. My LP was a disaster even with 2.5 lb increments; I was in pain the whole time. I suspect this is due to my age, since I am 60. So what proves best for a large population may not be optimal for the individual.

First off, I’m glad this thread exists. I’d love to see discussions of specific entries in “the literature” be a regular part of forum activity here. Now, a few thoughts on the paper.

  1. But is that really your 1RM?

The 1RM was tested as follows: “For each exercise, a warm-up set of 5–10 repetitions was performed using 40–60% of the participant’s perceived 1RM. After a 1-min rest period, a set of 2–3 repetitions was performed at 60–80% of the participant’s perceived 1RM. Subsequently, 3–5 maximal trials (one repetition sets) were performed to determine the 1RM. For the bench press, proper technique was enforced by requiring all participants to maintain contact between their feet and the floor; their buttocks, shoulders, and head with the bench; and use a standard grip (slightly wider than shoulder width) on the bar. Upon lowering the bar to their chest, participants were required to pause briefly and wait for an “UP!” signal before initiating concentric movement. The purpose for this pause was to eliminate the influence of bouncing and distinguish eccentric from concentric muscle activation during electromyography analysis. Any trials that involved excessive arching of the back or bouncing of the weight were discarded. For the back squat, a successful attempt required the participant to descend to the “parallel” position, where the greater trochanter of the femur was aligned with the knee. At this point, an investigator located lateral to the participant, provided an “UP!” signal, indicating that proper range of motion had been achieved; no pause was required for the squat exercise. Rest periods between attempts were 2–3 min in length.”

This is apparently a standardized procedure (Hoffman 2006), but I’m skeptical of its accuracy in measuring a true 1RM. Confounding factors include: the “UP!” signal on the squat, short rest intervals between attempts, and lack of clarity about prior training history of the participants (noted earlier in the article as a major limitation of the study).

That said, I’m not sure how useful a comparison would be between the percentages assigned in the study and the percentages that a serious strength/power athlete might use, either relative to an E1RM or a PR single.
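For anyone who wants to sanity-check the study’s percentages, one common rule of thumb is the Epley estimate, 1RM ≈ w × (1 + reps / 30). This is my choice of formula, not something the paper uses, and it assumes the set is taken to failure. Inverted, it gives the fraction of 1RM that a given rep count implies.

```python
def epley_fraction_of_1rm(reps: int) -> float:
    """Fraction of 1RM implied by a set of `reps` taken to failure.

    Inverts the Epley estimate 1RM = w * (1 + reps / 30); it is only a
    rough population rule of thumb and assumes the set was a true max effort.
    """
    return 1.0 / (1.0 + reps / 30.0)


for reps in (3, 5, 10, 12):
    print(f"{reps:>2} reps to failure ~ {epley_fraction_of_1rm(reps):.1%} of 1RM")
```

By this estimate, 3-5 reps to failure sits around 86-91% of 1RM and 10-12 reps around 71-75%. If the sets stopped short of failure, the real percentages would be lower, which is part of why comparing the study’s labels to an E1RM or a PR single is tricky.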

  2. I need more rest!

Using different rest periods for the VOL (1 min) and INT (3 min) groups introduces an unnecessary confounding variable.

  3. How much more volume is it, really?

Clearly the VOL group is doing more total repetitions. But both groups are doing four sets of near-maximal work, as evidenced by the fact that they progress when they are able to complete the prescribed sets/reps at a given weight. As this pertains to strength training, it may be that both groups are doing “a sufficient amount of volume at a useful intensity” to drive hypertrophy, while the INT group is also working in a more useful intensity range for strength adaptations.
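How much more volume VOL does depends on which definition of volume you use. A small sketch comparing three common ones per exercise session, assuming the midpoint of each prescribed rep range and the study’s nominal ~90% and ~70% loads:

```python
# Per-exercise session volume under three definitions, using the midpoint
# of each prescribed rep range (an assumption) and the nominal %1RM.
SETS = 4
PROTOCOLS = {
    "INT": {"reps": 4, "pct_1rm": 0.90},   # 3-5 reps at ~90% 1RM
    "VOL": {"reps": 11, "pct_1rm": 0.70},  # 10-12 reps at ~70% 1RM
}


def volume_metrics(reps: int, pct_1rm: float, sets: int = SETS) -> dict[str, float]:
    """Return hard sets, total reps, and relative volume load (reps x fraction of 1RM)."""
    return {
        "hard_sets": sets,
        "total_reps": sets * reps,
        "relative_volume_load": sets * reps * pct_1rm,
    }


int_m = volume_metrics(**PROTOCOLS["INT"])
vol_m = volume_metrics(**PROTOCOLS["VOL"])
for metric in int_m:
    print(f"{metric}: VOL/INT = {vol_m[metric] / int_m[metric]:.2f}")
```

Counted as hard sets, the two programs are identical (ratio 1.00); counted as reps, VOL does 2.75 times as much; weighted by relative load, the gap is a little over 2x. If hard sets are what matter for hypertrophy, the “high-volume” label overstates the difference.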
