
Custom Content Type Prober

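Click a link to a PDF file in a browser:

  • On site A, clicking a.pdf makes the browser download a.pdf
  • On site B, clicking b.pdf makes the browser open b.pdf directly

Both are PDF files, so why does the browser behave differently on site A and on site B? The answer is that the browser decides whether to download or open a file based on the Content-Type of the response (the Content-Type value is only a hint, of course; what actually happens depends on the browser's implementation).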

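There are a great many file types, so how do we find out a file's Content-Type? Java 1.7 provides java.nio.file.Files.probeContentType(Path path), which tries to determine a file's Content-Type, but the set of file types it recognizes turns out to be incomplete. The Javadoc for probeContentType says: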

This method uses the installed FileTypeDetector implementations to probe the given file to determine its content type. Each file type detector’s probeContentType is invoked, in turn, to probe the file type. If the file is recognized then the content type is returned. If the file is not recognized by any of the installed file type detectors then a system-default file type detector is invoked to guess the content type.

A given invocation of the Java virtual machine maintains a system-wide list of file type detectors. Installed file type detectors are loaded using the service-provider loading facility defined by the ServiceLoader class. Installed file type detectors are loaded using the system class loader. If the system class loader cannot be found then the extension class loader is used; If the extension class loader cannot be found then the bootstrap class loader is used. File type detectors are typically installed by placing them in a JAR file on the application class path or in the extension directory, the JAR file contains a provider-configuration file named java.nio.file.spi.FileTypeDetector in the resource directory META-INF/services, and the file lists one or more fully-qualified names of concrete subclass of FileTypeDetector that have a zero argument constructor. If the process of locating or instantiating the installed file type detectors fails then an unspecified error is thrown. The ordering that installed providers are located is implementation specific.

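The documentation shows that we only need to extend the abstract class FileTypeDetector, register the subclass through SPI, package it as a jar, and put the jar on the project's classpath. A call to Files.probeContentType then tries the custom FileTypeDetector first. Only if it cannot determine the Content-Type and returns null does the system default FileTypeDetector get its turn.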
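To check which detectors the JVM can actually discover, the same ServiceLoader lookup described in the Javadoc can be run by hand. The following is a minimal sketch, not part of the original project; the class name InstalledDetectors and the method list() are hypothetical names chosen for illustration.

```java
import java.nio.file.spi.FileTypeDetector;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical helper (not from the original project): lists the
// FileTypeDetector providers that ServiceLoader can find.
public class InstalledDetectors {
    /** Returns the class names of all providers visible to the system class loader. */
    public static List<String> list() {
        List<String> names = new ArrayList<>();
        ServiceLoader<FileTypeDetector> loader =
                ServiceLoader.load(FileTypeDetector.class, ClassLoader.getSystemClassLoader());
        // Iterating the loader instantiates each provider named in a
        // META-INF/services/java.nio.file.spi.FileTypeDetector file.
        for (FileTypeDetector detector : loader) {
            names.add(detector.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // The list may be empty when no provider jar is on the classpath.
        list().forEach(System.out::println);
    }
}
```

With the jar built from this project on the classpath, the output would be expected to include xtuer.ContentTypeProber. An empty list usually points to a missing or misnamed provider-configuration file.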

The project content-type-prober shows the implementation in detail. Its directory structure is:

content-type-prober
├── build.gradle
└── src
    ├── main
    │   ├── java
    │   │   └── xtuer
    │   │       └── ContentTypeProber.java
    │   └── resources
    │       ├── config
    │       │   └── content-type.properties
    │       └── META-INF
    │           └── services
    │               └── java.nio.file.spi.FileTypeDetector
    └── test
        └── java
            └── ContentTypeProberTest.java

Where:

  • main/java/xtuer/ContentTypeProber is the FileTypeDetector implementation and holds the core code
  • META-INF/services/java.nio.file.spi.FileTypeDetector is the SPI configuration
  • config/content-type.properties maps file types to Content-Types
  • ContentTypeProberTest is the test class

The files are described one by one below.

ContentTypeProber.java

The main logic: the class ContentTypeProber is loaded into the JVM automatically through the SPI mechanism, and its static initializer loads the contents of content-type.properties into the Properties object CONTENT_TYPE_PROPS.

A call to Files.probeContentType(path) first invokes ContentTypeProber.probeContentType(path), which uses the file's extension to look up the corresponding Content-Type in CONTENT_TYPE_PROPS.

package xtuer;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;
import java.util.Properties;

/**
 * Content type detector for files. It is registered with the JVM as a
 * FileTypeDetector through SPI and is used by calling Files.probeContentType(Path).
 */
public final class ContentTypeProber extends FileTypeDetector {
    /**
     * Logger.
     */
    private static final Logger log = LoggerFactory.getLogger(ContentTypeProber.class);

    /**
     * Content type properties file path.
     */
    private static final String CONTENT_TYPE_PROPS_PATH = "config/content-type.properties";

    /**
     * Content type properties loaded from file CONTENT_TYPE_PROPS_PATH.
     */
    private static final Properties CONTENT_TYPE_PROPS = new Properties();

    static {
        // Load the content types when the class is loaded.
        loadContentTypes();
    }

    /**
     * Loads the content types.
     */
    private static void loadContentTypes() {
        log.info("[Start] Loading content type file...");

        // The context class loader may be null; fall back to this class's loader.
        ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
        if (classLoader == null) {
            classLoader = ContentTypeProber.class.getClassLoader();
        }

        try (InputStream in = classLoader.getResourceAsStream(CONTENT_TYPE_PROPS_PATH)) {
            if (in != null) {
                CONTENT_TYPE_PROPS.load(in);
            } else {
                log.warn("[Error] Content type file not found");
            }
        } catch (IOException ex) {
            log.warn("[Error] Failed to load content type file: {}", ex.getMessage());
        }

        log.info("[Done] Loading content type file");
    }

    /**
     * Returns the filename extension of the given Path.
     *
     * @param path the path
     * @return the filename extension, or an empty string if there is none
     */
    public static String getFilenameExtension(Path path) {
        // getFileName() returns null for a path with no name elements, such as a root.
        Path fileName = path.getFileName();
        if (fileName == null) {
            return "";
        }

        String name = fileName.toString();
        int dot = name.lastIndexOf('.');
        return dot == -1 ? "" : name.substring(dot + 1);
    }

    /**
     * Looks up the content type by filename extension. If this returns null, the other
     * FileTypeDetectors registered in the system are tried next.
     *
     * @param path the file path
     * @return the content type of the file, or null if it is unknown
     */
    @Override
    public String probeContentType(Path path) {
        String ext = getFilenameExtension(path);
        return CONTENT_TYPE_PROPS.getProperty(ext);
    }
}
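The extension lookup has a few edge cases worth seeing in isolation. The sketch below copies the logic of getFilenameExtension into a hypothetical standalone class, ExtensionDemo, so it runs without the project. The normalizedExtension variant is an assumption added for illustration: the original class does a case-sensitive lookup, so an upper-case extension such as JPG would not match a lower-case key.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical demo class (not from the original project).
public class ExtensionDemo {
    /** Same logic as ContentTypeProber.getFilenameExtension. */
    public static String extension(Path path) {
        Path fileName = path.getFileName();
        if (fileName == null) {
            return ""; // a path without name elements has no extension
        }
        String name = fileName.toString();
        int dot = name.lastIndexOf('.');
        return dot == -1 ? "" : name.substring(dot + 1);
    }

    /** Assumed variant: lower-cases the extension before it is used as a key. */
    public static String normalizedExtension(Path path) {
        return extension(path).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(extension(Paths.get("foo/test.txt")));        // txt
        System.out.println(extension(Paths.get("archive.tar.gz")));      // gz (last dot only)
        System.out.println(extension(Paths.get("README")).isEmpty());    // true
        System.out.println(extension(Paths.get(".gitignore")));          // gitignore
        System.out.println(normalizedExtension(Paths.get("PHOTO.JPG"))); // jpg
    }
}
```

Only the part after the last dot is used, so a multi-part extension such as tar.gz is looked up as gz, and a dot-file such as .gitignore is treated as having the extension gitignore.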

java.nio.file.spi.FileTypeDetector

The file java.nio.file.spi.FileTypeDetector is the SPI configuration file. It must be placed in the META-INF/services directory, and it contains the fully-qualified name of the class that implements java.nio.file.spi.FileTypeDetector:

xtuer.ContentTypeProber

content-type.properties

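The mapping between file types and Content-Types uses the Java properties format: extension = Content-Type. The full file is too long to show, so only a sample is listed below (the complete file is available for download):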

3dm = x-world/x-3dmf
3dmf = x-world/x-3dmf
a = application/octet-stream
aab = application/x-authorware-bin
aam = application/x-authorware-map
aas = application/x-authorware-seg
abc = text/vndabc
acgi = text/html
afl = video/animaflex
ai = application/postscript
aif = audio/aiff
aif = audio/x-aiff
aifc = audio/aiff
aifc = audio/x-aiff
aiff = audio/aiff
aiff = audio/x-aiff
aim = application/x-aim
aip = text/x-audiosoft-intra
ani = application/x-navi-animation
aos = application/x-nokia-9000-communicator-add-on-software
aps = application/mime
arc = application/octet-stream
arj = application/arj
arj = application/octet-stream
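Some extensions in the sample (aif, aifc, aiff, arj) appear twice. When java.util.Properties loads such a file, each key keeps only the value that was read last. A small sketch to confirm this, using a hypothetical class DuplicateKeyDemo and an inline string in place of the real file:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class (not from the original project).
public class DuplicateKeyDemo {
    /** Parses properties text the same way Properties.load reads a file. */
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(
                "aif = audio/aiff\n"
              + "aif = audio/x-aiff\n"
              + "3dm     = x-world/x-3dmf\n");
        System.out.println(props.getProperty("aif")); // audio/x-aiff (last entry wins)
        System.out.println(props.getProperty("3dm")); // x-world/x-3dmf (padding around '=' is ignored)
        System.out.println(props.size());             // 2
    }
}
```

So for a duplicated extension, the Content-Type that probeContentType returns is the one listed last in the file.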

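Because content-type.properties is loaded with classLoader.getResourceAsStream(CONTENT_TYPE_PROPS_PATH), the lookup normally finds a copy in the project's own classpath first and falls back to the jars on the classpath only if none is found there. When the Content-Types provided above are not enough, put the ones you need in the project's own config/content-type.properties. That file replaces the one shipped in the jar (it overrides it; the two are not merged), so there is no need to modify the jar that contains ContentTypeProber.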
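When it is unclear which copy of the file is being picked up, every copy visible on the classpath can be listed with ClassLoader.getResources. This is a debugging sketch with a hypothetical class name, ResourceLocator. The first URL in the result is typically the one getResourceAsStream would read, though the exact order depends on the class loader and the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical debugging helper (not from the original project).
public class ResourceLocator {
    /** Returns the URLs of all classpath resources with the given path, in lookup order. */
    public static List<URL> findAll(String resourcePath) {
        ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
        if (classLoader == null) {
            classLoader = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(classLoader.getResources(resourcePath));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // Prints one line per copy found; prints nothing if the resource is absent.
        for (URL url : findAll("config/content-type.properties")) {
            System.out.println(url);
        }
    }
}
```

Two lines of output, one pointing into the project's resources and one into the jar, would confirm that both copies exist and that only one of them is loaded.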

ContentTypeProberTest.java

This tests whether the custom ContentTypeProber takes effect. If the program below runs without throwing an exception, it works.

import org.junit.Assert;
import org.junit.Test;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ContentTypeProberTest {
    @Test
    public void probeTest() throws IOException {
        Assert.assertEquals("text/plain", Files.probeContentType(Paths.get("foo/test.txt")));
        Assert.assertEquals("application/vndms-excel", Files.probeContentType(Paths.get("foo/test.xls")));
        Assert.assertEquals("chemical/x-pdb", Files.probeContentType(Paths.get("foo/test.xyz")));
        Assert.assertNull(Files.probeContentType(Paths.get("foo/test.bib")));
        Assert.assertNull(Files.probeContentType(Paths.get("foo/txt")));
    }
}
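As the last two assertions show, Files.probeContentType can return null. Code that sends the result as a response header usually needs a fallback for that case. The sketch below is one possible approach, not part of the original project. The class name ContentTypes and the choice of application/octet-stream as the default are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not from the original project).
public class ContentTypes {
    /** Assumed default for unknown files: the generic binary type. */
    public static final String DEFAULT_TYPE = "application/octet-stream";

    /** Returns the probed value, or the default when probing produced null. */
    public static String orDefault(String probed) {
        return probed != null ? probed : DEFAULT_TYPE;
    }

    /** Probes the path and never returns null. */
    public static String resolve(Path path) {
        try {
            return orDefault(Files.probeContentType(path));
        } catch (IOException ex) {
            // Treat an I/O failure during probing like an unknown type.
            return DEFAULT_TYPE;
        }
    }

    public static void main(String[] args) {
        // The result depends on the installed detectors and the platform.
        System.out.println(resolve(Paths.get("foo/test.bib")));
    }
}
```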