I want to read file paths regardless of whether they are HDFS or local. Currently, I pass local paths with the prefix file:// and HDFS paths with the prefix hdfs://, and write code like the following:
Configuration configuration = new Configuration();
FileSystem fileSystem = null;
if (filePath.startsWith("hdfs://")) {
    // returns the default file system named in the configuration
    fileSystem = FileSystem.get(configuration);
} else if (filePath.startsWith("file://")) {
    fileSystem = FileSystem.getLocal(configuration).getRawFileSystem();
}
From here I use the FileSystem APIs to read the file.
Could you please let me know whether there is a better way than this?
Does this make sense:
public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // load the cluster settings so that hdfs:// paths can be resolved
    conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
    conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));

    BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
    System.out.println("Enter the file path...");
    String filePath = br.readLine();

    // the Path selects the matching FileSystem from its own scheme
    Path path = new Path(filePath);
    FileSystem fs = path.getFileSystem(conf);
    try (FSDataInputStream inputStream = fs.open(path)) {
        // read from inputStream here
    }
}
You don't have to put in that check if you go this way. Get the FileSystem directly from the Path and then do whatever you need with it.
You can get the FileSystem in the following way:
Configuration conf = new Configuration();
Path path = new Path(stringPath);
FileSystem fs = FileSystem.get(path.toUri(), conf);
You do not need to determine whether the path starts with hdfs:// or file://. This API will do the work.
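To see why no manual prefix check is needed, here is a minimal sketch that uses only java.net.URI to show the kind of scheme lookup that FileSystem.get(path.toUri(), conf) relies on. The class SchemeSketch, the method schemeOf, and the address namenode:8020 are hypothetical names chosen for illustration; they are not part of Hadoop. Hadoop's Path parses strings more leniently than URI.create, so treat this as an approximation of the idea, not the library's actual code.

```java
import java.net.URI;

// Hypothetical helper, not part of Hadoop: it only illustrates the scheme
// lookup that makes a manual startsWith() check unnecessary.
final class SchemeSketch {

    // Returns the URI scheme of a path string, or "(default)" when the string
    // has no scheme. With no scheme and no authority, Hadoop falls back to the
    // default file system named in the Configuration.
    static String schemeOf(String pathString) {
        String scheme = URI.create(pathString).getScheme();
        return scheme == null ? "(default)" : scheme;
    }

    public static void main(String[] args) {
        String[] samples = {
            "hdfs://namenode:8020/data/in.txt", // assumed cluster address
            "file:///tmp/in.txt",
            "/tmp/in.txt"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + schemeOf(sample));
        }
    }
}
```

Running main reports hdfs, file, and (default) for the three samples, which is the information Hadoop uses to choose the FileSystem implementation.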
Please check the code snippet below, which lists the files under an HDFS path, that is, a path string that starts with hdfs://. If you provide the Hadoop configuration and a local path, it will also list files from the local file system, that is, a path string that starts with file://.
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

//helper method to get the list of files from the HDFS path
public static List<String> listFilesFromHDFSPath(Configuration hadoopConfiguration, String hdfsPath,
        boolean recursive)
{
    //resulting list of files
    List<String> filePaths = new ArrayList<String>();
    FileSystem fs = null;

    //try-catch-finally all possible exceptions
    try
    {
        //get path from string and then the filesystem
        Path path = new Path(hdfsPath); //throws IllegalArgumentException, all others will only throw IOException
        fs = path.getFileSystem(hadoopConfiguration);

        //resolve hdfsPath first to check whether the path exists => either a real directory or a real file
        //resolvePath() returns fully-qualified variant of the path
        path = fs.resolvePath(path);

        //if recursive approach is requested
        if (recursive)
        {
            //(heap issues with recursive approach) => using a queue
            Queue<Path> fileQueue = new LinkedList<Path>();

            //add the obtained path to the queue
            fileQueue.add(path);

            //while the fileQueue is not empty
            while (!fileQueue.isEmpty())
            {
                //get the file path from queue
                Path filePath = fileQueue.remove();

                //filePath refers to a file
                if (fs.isFile(filePath))
                {
                    filePaths.add(filePath.toString());
                }
                else //else filePath refers to a directory
                {
                    //list paths in the directory and add to the queue
                    FileStatus[] fileStatuses = fs.listStatus(filePath);
                    for (FileStatus fileStatus : fileStatuses)
                    {
                        fileQueue.add(fileStatus.getPath());
                    } // for
                } // else
            } // while
        } // if
        else //non-recursive approach => no heap overhead
        {
            //if the given hdfsPath is actually directory
            if (fs.isDirectory(path))
            {
                FileStatus[] fileStatuses = fs.listStatus(path);

                //loop all file statuses
                for (FileStatus fileStatus : fileStatuses)
                {
                    //if the given status is a file, then update the resulting list
                    if (fileStatus.isFile())
                        filePaths.add(fileStatus.getPath().toString());
                } // for
            } // if
            else //it is a file then
            {
                //return the one and only file path to the resulting list
                filePaths.add(path.toString());
            } // else
        } // else
    } // try
    catch(Exception ex) //will catch all exception including IOException and IllegalArgumentException
    {
        ex.printStackTrace();
        //if some problem occurs return an empty array list
        return new ArrayList<String>();
    } // catch
    finally
    {
        //close filesystem; no more operations
        //note: getFileSystem() returns a cached, shared instance by default,
        //so closing it here also closes it for other users in the same JVM
        try
        {
            if(fs != null)
                fs.close();
        } catch (IOException e)
        {
            e.printStackTrace();
        } // catch
    } // finally

    //return the resulting list; list can be empty if given path is an empty directory without files and sub-directories
    return filePaths;
} // listFilesFromHDFSPath
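If you really want to use the plain java.io API, the following method will help you list files from the local file system only, that is, for a path string that starts with file://. Note that java.io.File expects a plain path and does not parse URIs, so strip the file:// prefix before passing the string in.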
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

//helper method to list files from the local path in the local file system
public static List<String> listFilesFromLocalPath(String localPathString, boolean recursive)
{
    //resulting list of files
    List<String> localFilePaths = new ArrayList<String>();
    //get the Java file instance from local path string
    File localPath = new File(localPathString);

    //this case is possible if the given localPathString does not exist => which means neither file nor a directory
    if (!localPath.exists())
    {
        System.err.println("\n" + localPathString + " is neither a file nor a directory; please provide correct local path");
        //return with empty list
        return new ArrayList<String>();
    } // if

    //at this point localPath does exist in the file system => either as a directory or a file
    //if recursive approach is requested
    if (recursive)
    {
        //recursive approach => using a queue
        Queue<File> fileQueue = new LinkedList<File>();
        //add the file in obtained path to the queue
        fileQueue.add(localPath);

        //while the fileQueue is not empty
        while (!fileQueue.isEmpty())
        {
            //get the file from queue
            File file = fileQueue.remove();
            //file instance refers to a file
            if (file.isFile())
            {
                //update the list with file absolute path
                localFilePaths.add(file.getAbsolutePath());
            } // if
            else //else file instance refers to a directory
            {
                //list files in the directory and add to the queue
                File[] listedFiles = file.listFiles();
                //listFiles() returns null when the directory cannot be read
                if (listedFiles != null)
                {
                    for (File listedFile : listedFiles)
                    {
                        fileQueue.add(listedFile);
                    } // for
                } // if
            } // else
        } // while
    } // if
    else //non-recursive approach
    {
        //if the given localPathString is actually a directory
        if (localPath.isDirectory())
        {
            File[] listedFiles = localPath.listFiles();
            //loop all listed files; listFiles() returns null when the directory cannot be read
            if (listedFiles != null)
            {
                for (File listedFile : listedFiles)
                {
                    //if the given listedFile is actually a file, then update the resulting list
                    if (listedFile.isFile())
                        localFilePaths.add(listedFile.getAbsolutePath());
                } // for
            } // if
        } // if
        else //it is a file then
        {
            //return the one and only file absolute path to the resulting list
            localFilePaths.add(localPath.getAbsolutePath());
        } // else
    } // else

    //return the resulting list; list can be empty if given path is an empty directory without files and sub-directories
    return localFilePaths;
} // listFilesFromLocalPath
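As an alternative to the queue-based helper above, the same local listing can be sketched with java.nio.file.Files.walk, which is a substitute technique and not what the answer itself uses. The class LocalListingSketch and its methods toLocalPath and listFiles are hypothetical names. The sketch assumes that a URI input has the form file:///absolute/path (a URI with a host part is rejected by Paths.get), and unlike the helper above it lets an IOException propagate when the path does not exist instead of returning an empty list.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: lists regular files under a local path with NIO.
final class LocalListingSketch {

    // Accepts either a plain local path or a file:// URI string.
    static Path toLocalPath(String pathString) {
        return pathString.startsWith("file:")
                ? Paths.get(URI.create(pathString))
                : Paths.get(pathString);
    }

    // Returns the absolute paths of the regular files under the given path.
    // Depth 1 visits the path itself and its direct children only.
    static List<String> listFiles(String pathString, boolean recursive) throws IOException {
        Path root = toLocalPath(pathString);
        int depth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(root, depth)) {
            return paths.filter(Files::isRegularFile)
                        .map(p -> p.toAbsolutePath().toString())
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Lists the directory given as the first argument, or the current one.
        String target = args.length > 0 ? args[0] : ".";
        listFiles(target, true).forEach(System.out::println);
    }
}
```

When the given path is itself a regular file, Files.walk visits only that file, so the result is a one-element list, which matches the behaviour of the helper above.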