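On chunk sizes, ctf-wiki and every video I could find say much the same thing: the size depends on x86 versus x64 and must be a multiple of 2 * SIZE_SZ. If the requested size is not a multiple of 2 * SIZE_SZ, it is rounded up to the smallest multiple of 2 * SIZE_SZ that can hold it. SIZE_SZ is 4 on 32-bit systems and 8 on 64-bit systems. The comment in malloc/malloc.c explains it as follows.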
/* malloc/malloc.c
* Vital statistics:
Supported pointer representation: 4 or 8 bytes
Supported size_t representation: 4 or 8 bytes
Note that size_t is allowed to be 4 bytes even if pointers are 8.
You can adjust this by defining INTERNAL_SIZE_T
Alignment: 2 * sizeof(size_t) (default)
(i.e., 8 byte alignment with 4byte size_t). This suffices for
nearly all current machines and C compilers. However, you can
define MALLOC_ALIGNMENT to be wider than this if necessary.
Minimum overhead per allocated chunk: 4 or 8 bytes
Each malloced chunk has a hidden word of overhead holding size
and status information.
Minimum allocated size: 4-byte ptrs: 16 bytes (including 4 overhead)
8-byte ptrs: 24/32 bytes (including, 4/8 overhead)
When a chunk is freed, 12 (for 4byte ptrs) or 20 (for 8 byte
ptrs but 4 byte size) or 24 (for 8/8) additional bytes are
needed; 4 (8) for a trailing size field and 8 (16) bytes for
free list pointers. Thus, the minimum allocatable size is
16/24/32 bytes.
Even a request for zero bytes (i.e., malloc(0)) returns a
pointer to something of the minimum allocatable size.
The maximum overhead wastage (i.e., number of extra bytes
allocated than were requested in malloc) is less than or equal
to the minimum size, except for requests >= mmap_threshold that
are serviced via mmap(), where the worst case wastage is 2 *
sizeof(size_t) bytes plus the remainder from a system page (the
minimal mmap unit); typically 4096 or 8192 bytes.*/
Recently, however, I ran into a small discrepancy while studying this.
I. Problem statement
// malloc_test_1.c
#include<stdlib.h>
int main()
{
for (int i=0 ; i <= 0x30 ; i++)
{
malloc( i * 8 );
}
}//gcc -m32 malloc_test_1.c -o malloc_test_1_x86
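After compiling the code above, when execution reaches i = 2 the chunk returned by malloc(0x10) has size 0x20 rather than 0x18, which clearly contradicts the theory, as shown in the figure.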
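The same observation can be made without a debugger by asking the allocator how many bytes are usable in each allocation. This is a minimal sketch, assuming glibc's `malloc_usable_size` from `<malloc.h>`; the helper name `usable_size_for` is made up for this illustration. As far as I know, glibc reports the chunk size minus SIZE_SZ for a non-mmapped chunk, so on a 32-bit build a value of 0x1c would point to a 0x20 chunk and 0x14 to a 0x18 chunk.

```c
#include <assert.h>
#include <malloc.h>   /* malloc_usable_size (glibc-specific) */
#include <stdlib.h>

/* Hypothetical helper: allocate `req` bytes and report how many bytes
   the allocator says are usable in that allocation. */
static size_t usable_size_for(size_t req)
{
    void *p = malloc(req);
    if (p == NULL)
        return 0;
    size_t usable = malloc_usable_size(p);
    /* The usable size is never smaller than what was requested. */
    assert(usable >= req);
    free(p);
    return usable;
}
```

Calling `usable_size_for(i * 8)` inside the loop of malloc_test_1.c and printing the result with `%zx` gives one value per request size, which can then be compared across glibc versions.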
II. Debugging and analysis
1. Computing the theoretical value by hand
Based on malloc.c, malloc-internal.h and malloc-alignment.h, the theoretical chunk size for malloc(0x10) works out to 0x18.
/* malloc/malloc-internal.h
INTERNAL_SIZE_T is the word-size used for internal bookkeeping of
chunk sizes.
The default version is the same as size_t.
While not strictly necessary, it is best to define this as an
unsigned type, even if size_t is a signed type. This may avoid some
artificial size limitations on some systems.
On a 64-bit machine, you may be able to reduce malloc overhead by
defining INTERNAL_SIZE_T to be a 32 bit `unsigned int' at the
expense of not being able to handle more than 2^32 of malloced
space. If this limitation is acceptable, you are encouraged to set
this unless you are on a platform requiring 16byte alignments. In
this case the alignment requirements turn out to negate any
potential advantages of decreasing size_t word size.
Implementors: Beware of the possible combinations of:
- INTERNAL_SIZE_T might be signed or unsigned, might be 32 or 64 bits,
and might be the same width as int or as long
- size_t might have different width and signedness as INTERNAL_SIZE_T
- int and long might be 32 or 64 bits, and might be the same width
To deal with this, most comparisons and difference computations
among INTERNAL_SIZE_Ts should cast them to unsigned long, being
aware of the fact that casting an unsigned int to a wider long does
not sign-extend. (This also makes checking for negative numbers
awkward.) Some of these casts result in harmless compiler warnings
on some systems. */
#ifndef INTERNAL_SIZE_T
# define INTERNAL_SIZE_T size_t
#endif
/* The corresponding word size. */
#define SIZE_SZ (sizeof (INTERNAL_SIZE_T))
/* The corresponding bit mask value. */
#define MALLOC_ALIGN_MASK (MALLOC_ALIGNMENT - 1)
// sysdeps/generic/malloc-alignment.h
#define MALLOC_ALIGNMENT (2 * SIZE_SZ < __alignof__ (long double) ? __alignof__ (long double) : 2 * SIZE_SZ)
// malloc/malloc-internal.h
#define MALLOC_ALIGN_MASK (MALLOC_ALIGNMENT - 1)
// malloc/malloc.c
#define request2size(req) (((req) + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE) ? MINSIZE : ((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK)
// Working through the definitions above by hand, malloc(0x10) should yield a chunk of size 0x18.
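To make the hand calculation reproducible, here is a minimal sketch of request2size with SIZE_SZ and MALLOC_ALIGNMENT passed in as parameters rather than fixed at compile time. The function name `request2size_model` is my own, and the MINSIZE formula (MIN_CHUNK_SIZE rounded up to the alignment, with MIN_CHUNK_SIZE taken as four words) is written from my reading of malloc.c, not quoted above, so treat it as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Model of request2size; `alignment` plays the role of MALLOC_ALIGNMENT. */
static size_t request2size_model(size_t req, size_t size_sz, size_t alignment)
{
    /* The masking trick only works for power-of-two alignments. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    size_t mask = alignment - 1;                  /* MALLOC_ALIGN_MASK */
    /* Assumed MIN_CHUNK_SIZE: prev_size, size, fd and bk, i.e. four words
       when pointers are as wide as INTERNAL_SIZE_T. */
    size_t min_chunk = 4 * size_sz;
    size_t minsize = (min_chunk + mask) & ~mask;  /* assumed MINSIZE */
    size_t padded = req + size_sz + mask;

    return padded < minsize ? minsize : (padded & ~mask);
}
```

With SIZE_SZ = 4 and an alignment of 8, `request2size_model(0x10, 4, 8)` gives 0x18, which is the theoretical value; changing only the alignment to 16 gives 0x20.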
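2. Debugging results
Dynamic debugging
Following the principle of "when in doubt, debug it", I stepped into malloc. The result is shown in the figure below.
The part highlighted in red in the figure above is where request2size is computed. Compared against the glibc source: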
#define request2size(req) (((req) + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE) ? MINSIZE : ((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK)
/*
Comparing with the disassembly seen in the debugger:
( and esi, 0xfffffff0 ) corresponds to ( & ~MALLOC_ALIGN_MASK );
( lea esi, [eax+0x13] ) corresponds to ((req) + SIZE_SZ + MALLOC_ALIGN_MASK)
so MALLOC_ALIGN_MASK = 0xf and SIZE_SZ = 4.
From #define MALLOC_ALIGN_MASK (MALLOC_ALIGNMENT - 1) in malloc/malloc-internal.h, it follows that MALLOC_ALIGNMENT = 0x10.
This clearly disagrees with the definition in sysdeps/generic/malloc-alignment.h:
#define MALLOC_ALIGNMENT (2 * SIZE_SZ < __alignof__ (long double) ? __alignof__ (long double) : 2 * SIZE_SZ)
So where does the difference come from?
*/
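The deduction in the comment above can be written out as arithmetic. This is a small sketch: the two immediates are the ones read off the disassembly, the function names are mine, and the value 4 used for `__alignof__ (long double)` on i386 in the usage note is my assumption about the ABI, not something stated in the glibc sources quoted here.

```c
#include <assert.h>
#include <stdint.h>

/* Immediates read off the two instructions in the disassembly. */
static const uint32_t AND_IMMEDIATE = 0xfffffff0u;  /* and esi, 0xfffffff0 */
static const uint32_t LEA_DISPLACEMENT = 0x13u;     /* lea esi, [eax+0x13] */

/* The AND immediate is ~MALLOC_ALIGN_MASK, so the mask is its complement. */
static uint32_t recovered_align_mask(void)
{
    return (uint32_t)~AND_IMMEDIATE;
}

/* The LEA displacement is SIZE_SZ + MALLOC_ALIGN_MASK. */
static uint32_t recovered_size_sz(void)
{
    uint32_t mask = recovered_align_mask();
    assert(LEA_DISPLACEMENT > mask);
    return LEA_DISPLACEMENT - mask;
}

/* MALLOC_ALIGN_MASK is MALLOC_ALIGNMENT - 1. */
static uint32_t recovered_alignment(void)
{
    return recovered_align_mask() + 1;
}

/* The expression from sysdeps/generic/malloc-alignment.h, parameterised. */
static uint32_t generic_alignment(uint32_t size_sz, uint32_t alignof_long_double)
{
    return 2 * size_sz < alignof_long_double ? alignof_long_double
                                             : 2 * size_sz;
}
```

The recovered alignment is 0x10, while `generic_alignment(4, 4)` gives 8, which is exactly the mismatch being chased here.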
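Static analysis
Using ldd I located the corresponding libc.so.6 and loaded it into IDA. The part that matches the dynamic trace is shown below. (According to strings, this version is 2.30.)
The listing above shows that request2size turns malloc(0x10) into a 0x20-byte chunk, not the theoretical 0x18.
Testing multiple glibc versions
To rule out a problem with my local glibc, I then used glibc-all-in-one to analyze 2.23, 2.24, 2.26, 2.27 and 2.31 one by one. Versions before 2.26 match the theory; from 2.26 onward malloc(0x10) always becomes a 0x20-byte chunk. The individual versions are shown below.
1. glibc 2.23: as shown below, request2size(req) = (req + 4 + 7) & (~7), which matches the theory.
2. glibc 2.24: same as 2.23.
3. Starting with glibc 2.26, request2size(req) = (req + 4 + 15) & (~15).
4. glibc 2.27: same as 2.26.
5. glibc 2.31: same as 2.26.
Afterwards I also checked glibc 2.25. At this point it is fairly certain that from glibc 2.26 onward, chunk alignment (MALLOC_ALIGNMENT) for 32-bit programs is 0x10, not 0x8.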
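III. Getting to the bottom of it
1. Building glibc by hand
I wanted to find the root cause of this mismatch with the theoretical calculation, and the first step was to build glibc myself.
# Download glibc from the official site and build it, using 2.31 as an example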
wget http://ftp.gnu.org/gnu/libc/glibc-2.31.tar.gz
tar -zxvf glibc-2.31.tar.gz
cd glibc-2.31
mkdir build
cd build
# Configure a 32-bit glibc build
../configure --prefix=/root/glibc_2.31 CC="gcc -m32" CXX="g++ -m32" CFLAGS="-O2 -march=i686" CXXFLAGS="-O2 -march=i686" i686-linux-gnu
# Build and install into the prefix given above
make -j"$(nproc)"
make install
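Building glibc 2.23, 2.25, 2.26, 2.27 and 2.31 by hand confirmed the result above: from glibc 2.26 onward, chunk alignment (MALLOC_ALIGNMENT) for 32-bit programs is 0x10, not 0x8.
2. Checking the release changes
Going through the glibc 2.26 changes, I found the following entry.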
2017-06-30 H.J. Lu <hongjiu.lu@intel.com>
[BZ #21120]
* malloc/malloc-internal.h (MALLOC_ALIGNMENT): Moved to ...
* sysdeps/generic/malloc-alignment.h: Here. New file.
* sysdeps/i386/malloc-alignment.h: Likewise.
* sysdeps/generic/malloc-machine.h: Include <malloc-alignment.h>.
In other words, two copies of malloc-alignment.h were added, one in sysdeps/generic/ and one in sysdeps/i386/. The one we usually look at (the one the editor jumps to) is in sysdeps/generic/, because malloc-alignment.h is pulled in through the chain malloc/malloc.c -> malloc/malloc-internal.h -> sysdeps/generic/malloc-machine.h -> malloc-alignment.h, so the editor naturally assumes that sysdeps/generic/malloc-alignment.h is the file being included. Opening sysdeps/i386/malloc-alignment.h, I found the following.
/* Define MALLOC_ALIGNMENT for malloc. i386 version.
Copyright (C) 2017-2020 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
<https://www.gnu.org/licenses/>. */
#ifndef _I386_MALLOC_ALIGNMENT_H
#define _I386_MALLOC_ALIGNMENT_H
#define MALLOC_ALIGNMENT 16
#endif /* !defined(_I386_MALLOC_ALIGNMENT_H) */
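When I saw this, I realized I had probably found the key. Searching for malloc-alignment.h also showed that these are the only two files with that name in the glibc tree, so MALLOC_ALIGNMENT = 16 must come from including sysdeps/i386/malloc-alignment.h. That leaves one question: why is sysdeps/i386/malloc-alignment.h included instead of sysdeps/generic/malloc-alignment.h? The answer has to lie in the build process. Looking at the config.make generated by my manual build above, I found the following.
In glibc's Makeconfig: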
# Complete path to sysdep dirs.
# `configure' writes a definition of `config-sysdirs' in `config.make'.
sysdirs := $(foreach D,$(config-sysdirs),$(firstword $(filter /%,$D) $(..)$D))
Clearly sysdeps/i386/ takes precedence over sysdeps/generic/.
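The effect of that ordering can be modelled as a first-match search over an ordered directory list. The sketch below is a deliberately simplified illustration, not glibc's build logic: the struct, the function `resolve_header` and the two-entry example lists are hypothetical, and a real config-sysdirs contains many more directories. The only facts taken from the text are that both sysdeps/i386 and sysdeps/generic provide malloc-alignment.h and that no other directory does.

```c
#include <assert.h>
#include <stddef.h>

/* One directory on the search path and whether it provides the header
   (hard-coded here; a real build would look at the file system). */
struct sysdir {
    const char *path;
    int has_header;
};

/* Return the index of the first directory, in search order, that
   provides the header, or -1 if none does. */
static int resolve_header(const struct sysdir *dirs, size_t n)
{
    assert(dirs != NULL);
    for (size_t i = 0; i < n; i++)
        if (dirs[i].has_header)
            return (int)i;
    return -1;
}

/* Hypothetical, heavily shortened search orders. */
static const struct sysdir example_i386_order[] = {
    { "sysdeps/i386", 1 },
    { "sysdeps/generic", 1 },
};
static const struct sysdir example_x86_64_order[] = {
    { "sysdeps/x86_64", 0 },
    { "sysdeps/generic", 1 },
};
```

For the i386 list the search stops at index 0 (sysdeps/i386), so the generic file is never reached; for the other list it falls through to index 1 (sysdeps/generic).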
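IV. Summary
When glibc is built for a 32-bit target, sysdeps/i386/ ranks ahead of sysdeps/generic/ in config-sysdirs, so the build picks up sysdeps/i386/malloc-alignment.h and its #define MALLOC_ALIGNMENT 16. As a result, chunk alignment (MALLOC_ALIGNMENT) for 32-bit programs is 0x10 rather than 0x8. sysdeps/i386/malloc-alignment.h was added in 2.26 and is still present in glibc 2.32 as of this writing, so this holds from glibc 2.26 onward.
Test
Facts speak louder than words, so I changed #define MALLOC_ALIGNMENT 16 in sysdeps/i386/malloc-alignment.h to #define MALLOC_ALIGNMENT 8, rebuilt glibc by hand, used patchelf to switch the test program to the rebuilt libc.so, and debugged it. The result is shown below.
The figure shows that malloc(0x10) now yields a 0x18 chunk, so the experiment succeeded. (I did not use the heap command because my self-built glibc still has some issues and I did not bother installing the missing dependencies, so I simply used the x command to dump memory.)
Note: the alignment change does not break the use of chunks, because the bin macros are all computed from MALLOC_ALIGNMENT and MALLOC_ALIGN_MASK. The smallbin and largebin size ranges given in tutorials are not determined by whether the program is 32-bit or 64-bit but by MALLOC_ALIGNMENT, so from glibc 2.26 onward the bin ranges of 32-bit programs are the same as those of 64-bit programs. (Still to be tested.)
V. New questions
Re-examining the chunk macros confirmed that the alignment change does not break the use of chunks, since the bin macros are all computed from MALLOC_ALIGNMENT and MALLOC_ALIGN_MASK, and the smallbin and largebin size ranges follow MALLOC_ALIGNMENT rather than the program's bitness. The detailed changes are listed in the table below.
The table shows that x86 before glibc 2.26 and x64 both behave much as the online tutorials describe; only x86 from glibc 2.26 onward has chunk sizes that deviate.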
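1. Overlap between largebin and smallbin
The bins data structure itself does not distinguish smallbins from largebins; they are separated only by the indexing algorithm. Bins with index 2~63 are smallbins and bins with index 64~126 are largebins. On x86 from glibc 2.26 onward, however, the bin with index = 64 is classified as a largebin but in practice behaves like a smallbin, because it only ever holds chunks of one fixed size (0x3f0). The corresponding code is as follows.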
#define NBINS 128
#define NSMALLBINS 64
#define SMALLBIN_WIDTH MALLOC_ALIGNMENT
#define SMALLBIN_CORRECTION (MALLOC_ALIGNMENT > 2 * SIZE_SZ)
#define MIN_LARGE_SIZE ((NSMALLBINS - SMALLBIN_CORRECTION) * SMALLBIN_WIDTH)
#define in_smallbin_range(sz) \
((unsigned long) (sz) < (unsigned long) MIN_LARGE_SIZE)
#define smallbin_index(sz) \
((SMALLBIN_WIDTH == 16 ? (((unsigned) (sz)) >> 4) : (((unsigned) (sz)) >> 3))\
+ SMALLBIN_CORRECTION)
#define largebin_index_32(sz) \
(((((unsigned long) (sz)) >> 6) <= 38) ? 56 + (((unsigned long) (sz)) >> 6) :\
((((unsigned long) (sz)) >> 9) <= 20) ? 91 + (((unsigned long) (sz)) >> 9) :\
((((unsigned long) (sz)) >> 12) <= 10) ? 110 + (((unsigned long) (sz)) >> 12) :\
((((unsigned long) (sz)) >> 15) <= 4) ? 119 + (((unsigned long) (sz)) >> 15) :\
((((unsigned long) (sz)) >> 18) <= 2) ? 124 + (((unsigned long) (sz)) >> 18) :\
126)
// The next macro, largebin_index_32_big, is the algorithm that applies to x86 with glibc 2.26 and later. Curiously, this code already existed in 2.23.
#define largebin_index_32_big(sz) \
(((((unsigned long) (sz)) >> 6) <= 45) ? 49 + (((unsigned long) (sz)) >> 6) :\
((((unsigned long) (sz)) >> 9) <= 20) ? 91 + (((unsigned long) (sz)) >> 9) :\
((((unsigned long) (sz)) >> 12) <= 10) ? 110 + (((unsigned long) (sz)) >> 12) :\
((((unsigned long) (sz)) >> 15) <= 4) ? 119 + (((unsigned long) (sz)) >> 15) :\
((((unsigned long) (sz)) >> 18) <= 2) ? 124 + (((unsigned long) (sz)) >> 18) :\
126)
// XXX It remains to be seen whether it is good to keep the widths of
// XXX the buckets the same or whether it should be scaled by a factor
// XXX of two as well.
#define largebin_index_64(sz) \
(((((unsigned long) (sz)) >> 6) <= 48) ? 48 + (((unsigned long) (sz)) >> 6) :\
((((unsigned long) (sz)) >> 9) <= 20) ? 91 + (((unsigned long) (sz)) >> 9) :\
((((unsigned long) (sz)) >> 12) <= 10) ? 110 + (((unsigned long) (sz)) >> 12) :\
((((unsigned long) (sz)) >> 15) <= 4) ? 119 + (((unsigned long) (sz)) >> 15) :\
((((unsigned long) (sz)) >> 18) <= 2) ? 124 + (((unsigned long) (sz)) >> 18) :\
126)
#define largebin_index(sz) \
(SIZE_SZ == 8 ? largebin_index_64 (sz) \
: MALLOC_ALIGNMENT == 16 ? largebin_index_32_big (sz) \
: largebin_index_32 (sz))
#define bin_index(sz) \
((in_smallbin_range (sz)) ? smallbin_index (sz) : largebin_index (sz))
So from glibc 2.26 onward, the bin ranges of 32-bit programs are almost the same as those of 64-bit programs, with only a small difference at the boundary.
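The boundary behaviour can be checked by evaluating the macros above for concrete configurations. The following is a minimal sketch that re-expresses MIN_LARGE_SIZE, smallbin_index and largebin_index_32_big as functions; the function names and the idea of passing SIZE_SZ and MALLOC_ALIGNMENT as parameters are mine, and `bin_index_x86_align16` covers only the 32-bit, MALLOC_ALIGNMENT == 16 case.

```c
#include <assert.h>
#include <stddef.h>

/* MIN_LARGE_SIZE with SIZE_SZ and MALLOC_ALIGNMENT as parameters. */
static unsigned long min_large_size(size_t size_sz, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    unsigned long correction = alignment > 2 * size_sz; /* SMALLBIN_CORRECTION */
    return (64 - correction) * alignment;  /* NSMALLBINS = 64, width = alignment */
}

/* largebin_index_32_big written as a function. */
static unsigned largebin_index_32_big_model(unsigned long sz)
{
    if ((sz >> 6) <= 45)  return 49 + (unsigned)(sz >> 6);
    if ((sz >> 9) <= 20)  return 91 + (unsigned)(sz >> 9);
    if ((sz >> 12) <= 10) return 110 + (unsigned)(sz >> 12);
    if ((sz >> 15) <= 4)  return 119 + (unsigned)(sz >> 15);
    if ((sz >> 18) <= 2)  return 124 + (unsigned)(sz >> 18);
    return 126;
}

/* bin_index for SIZE_SZ = 4 and MALLOC_ALIGNMENT = 16 only. */
static unsigned bin_index_x86_align16(unsigned long sz)
{
    if (sz < min_large_size(4, 16))
        return (unsigned)(sz >> 4) + 1;    /* smallbin_index, correction = 1 */
    return largebin_index_32_big_model(sz);
}
```

MIN_LARGE_SIZE comes out as 0x200 for 32-bit with 8-byte alignment, 0x3f0 for 32-bit with 16-byte alignment, and 0x400 for 64-bit. In the 16-byte 32-bit case, 0x3e0 maps to bin 63, 0x3f0 to bin 64 and 0x400 to bin 65, so with chunk sizes in steps of 0x10 bin 64 can only ever contain 0x3f0 chunks.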
2. Empty fastbin lists
The fastbins are the bins most affected, because the fastbin macros are not computed from MALLOC_ALIGNMENT and MALLOC_ALIGN_MASK but use SIZE_SZ directly. The test code is as follows.
// malloc_test_2.c
#include<stdlib.h>
int main()
{
void * ptr[100];
for (int i = 0; i < 10;i++)
{
for (int j = 0;j < 10;j++)
{
ptr[ i*10 + j ] = malloc( i * 8 );
}
}
for(int i = 0; i < 10;i++)
{
for(int j = 0;j < 10;j++)
{
free(ptr[ i*10 + j ]);
}
}
}//gcc -m32 -g malloc_test_2.c -o malloc_test_2_x86
The result is shown in the figure below.
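Why some fastbin lists stay empty can be seen by computing which indexes are reachable. The sketch below is written under stated assumptions: the `fastbin_index` formula (shift by 4 when SIZE_SZ is 8, otherwise by 3, then subtract 2) is reproduced from my reading of malloc/malloc.c and is not quoted in this article, and the size limits used in the usage note (0x10 to 0x40 on 32-bit, 0x20 to 0x80 on 64-bit) are my assumption about the defaults. The function names are made up for the illustration.

```c
#include <assert.h>

/* Assumed form of fastbin_index, with SIZE_SZ passed in as a parameter. */
static unsigned fastbin_index_model(unsigned sz, unsigned size_sz)
{
    return (sz >> (size_sz == 8 ? 4 : 3)) - 2;
}

/* Count fastbin indexes, from 0 up to the index of max_size, that no
   chunk size between min_size and max_size can map to when chunk sizes
   advance in steps of `alignment`. */
static unsigned unused_fastbins(unsigned size_sz, unsigned alignment,
                                unsigned min_size, unsigned max_size)
{
    assert(alignment != 0);   /* a zero step would never terminate */
    unsigned top = fastbin_index_model(max_size, size_sz);
    unsigned unused = 0;

    for (unsigned idx = 0; idx <= top; idx++) {
        int reachable = 0;
        for (unsigned sz = min_size; sz <= max_size; sz += alignment)
            if (fastbin_index_model(sz, size_sz) == idx)
                reachable = 1;
        if (!reachable)
            unused++;
    }
    return unused;
}
```

Under these assumptions, `unused_fastbins(4, 16, 0x10, 0x40)` is 3: sizes 0x10, 0x20, 0x30 and 0x40 land on indexes 0, 2, 4 and 6, so indexes 1, 3 and 5 are never used. With 8-byte alignment on 32-bit, or with the 64-bit values, every index is reachable.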