graphml writer #8

Open
geoHeil opened this issue Jan 17, 2017 · 8 comments

geoHeil commented Jan 17, 2017

I just found your great package and your GraphML loader (https://github.com/sparkling-graph/sparkling-graph/blob/master/loaders/src/main/scala/ml/sparkling/graph/loaders/graphml/GraphMLLoader.scala) and wonder whether a similar writer exists.

For GEXF I found the following code online, but its output does not play nicely with Gephi.

def toGexf[VD, ED](g: Graph[VD, ED]): String =
      "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
      "<gexf xmlns=\"http://www.gexf.net/1.2draft\" version=\"1.2\">\n" +
      "  <graph mode=\"static\" defaultedgetype=\"directed\">\n" +
      "    <nodes>\n" +
      g.vertices.map(v => "      <node id=\"" + v._1 + "\" label=\"" +
        v._2 + "\" />\n").collect.mkString +
      "    </nodes>\n" +
      "    <edges>\n" +
      g.edges.map(e => "      <edge source=\"" + e.srcId +
        "\" target=\"" + e.dstId + "\" label=\"" + e.attr +
        "\" />\n").collect.mkString +
      "    </edges>\n" +
      "  </graph>\n" +
      "</gexf>"

riomus commented Jan 17, 2017

Hello, currently only data reading is supported. Providing that feature will be quite easy if we only want to export nodes and edges without data. I can try to support that in the near future.


geoHeil commented Jan 17, 2017

It would be cool if the export also supported selected columns as data.


riomus commented Jan 17, 2017

OK, I will try to do it.


geoHeil commented Jan 17, 2017

I created the following function, which is "nearly there", but the XML it produces does not really work in Gephi. Maybe you can spot a problem.

def toGraphML(g: GraphFrame): String =
    s"""
       |<?xml version="1.0" encoding="UTF-8"?>
       |<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
       |         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       |         xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
       |         http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
       |<!-- Created by igraph -->
       |  <key id="v_name" for="node" attr.name="name" attr.type="string"/>
       |  <key id="v_fraud" for="node" attr.name="label" attr.type="int"/>
       |  <key id="e_edgeType" for="edge" attr.name="edgeType" attr.type="string"/>
       |  <graph id="G" edgedefault="directed">
       |${
      g.vertices.map {
        case Row(id, name, fraud) =>
          s"""
             |      <node id="${id}">
             |         <data key = "v_name">${name}</data>
             |         <data key = "v_fraud">${fraud}</data>
             |      </node>
           """.stripMargin
      }.collect.mkString.stripLineEnd
    }
       |${
      g.edges.map {
        case Row(src, dst, relationship) =>
          s"""
             |      <edge source="${src}" target="${dst}">
             |      <data key="e_edgeType">${relationship}</data>
             |      </edge>
           """.stripMargin
      }.collect.mkString.stripLineEnd
    }
       |  </graph>
       |</graphml>
  """.stripMargin

  val v = spark.createDataFrame(List(
    ("a", "Alice", 1),
    ("b", "Bob", 0),
    ("c", "Charlie", 0),
    ("d", "David", 0),
    ("e", "Esther", 0),
    ("f", "Fanny", 0),
    ("g", "Gabby", 0)
  )).toDF("id", "name", "fraud")
  val e = spark.createDataFrame(List(
    ("a", "b", "A"),
    ("b", "c", "B"),
    ("c", "b", "B"),
    ("f", "c", "B"),
    ("e", "f", "B"),
    ("e", "d", "A"),
    ("d", "a", "A"),
    ("a", "e", "A")
  )).toDF("src", "dst", "relationship")
  val g = GraphFrame(v, e)
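
One detail that may be worth checking in the function above: the interpolated string opens with a line break, so the output starts with an empty line before `<?xml ...?>`. An XML declaration is only allowed at the very start of a document, so strict parsers reject such a file; whether this is what Gephi trips over here is an assumption, not something confirmed in this thread. The sketch below is a Spark-free illustration using plain tuples in place of the `GraphFrame` rows. The names `GraphMLSketch` and `escape` are hypothetical helpers introduced for this example. It emits the declaration first and escapes attribute and text values.

```scala
object GraphMLSketch {
  // Escape XML special characters; '&' must be replaced first.
  def escape(s: String): String =
    s.replace("&", "&amp;")
      .replace("<", "&lt;")
      .replace(">", "&gt;")
      .replace("\"", "&quot;")

  // Assumption: vertices are (id, name, fraud) and edges are (src, dst, relationship),
  // mirroring the columns of the example DataFrames above.
  def toGraphML(
      vertices: Seq[(String, String, Int)],
      edges: Seq[(String, String, String)]): String = {
    // The declaration is the very first line: no leading newline or whitespace.
    val header = Seq(
      """<?xml version="1.0" encoding="UTF-8"?>""",
      """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">""",
      """  <key id="v_name" for="node" attr.name="name" attr.type="string"/>""",
      """  <key id="v_fraud" for="node" attr.name="label" attr.type="int"/>""",
      """  <key id="e_edgeType" for="edge" attr.name="edgeType" attr.type="string"/>""",
      """  <graph id="G" edgedefault="directed">""")

    val nodeXml = vertices.flatMap { case (id, name, fraud) =>
      Seq(
        s"""    <node id="${escape(id)}">""",
        s"""      <data key="v_name">${escape(name)}</data>""",
        s"""      <data key="v_fraud">$fraud</data>""",
        "    </node>")
    }

    val edgeXml = edges.flatMap { case (src, dst, relationship) =>
      Seq(
        s"""    <edge source="${escape(src)}" target="${escape(dst)}">""",
        s"""      <data key="e_edgeType">${escape(relationship)}</data>""",
        "    </edge>")
    }

    (header ++ nodeXml ++ edgeXml ++ Seq("  </graph>", "</graphml>")).mkString("\n")
  }
}
```

With the example data, something like `GraphMLSketch.toGraphML(Seq(("a", "Alice", 1)), Seq(("a", "a", "A")))` returns a string that begins directly with the XML declaration.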


riomus commented Jan 17, 2017

Unfortunately, we cannot do it this way. For large graphs, collecting all nodes and edges into a single string on the driver will cause memory problems.
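
To illustrate the memory concern, here is a minimal sketch of a writer that streams elements to a `java.io.Writer` one at a time instead of building one large string. It is not the approach the maintainers settled on. It assumes the rows can be consumed as iterators, for example via `RDD.toLocalIterator`, which pulls one partition at a time to the driver. The names `StreamingGraphML` and `write` are hypothetical, and the output is still a single file written from one machine.

```scala
import java.io.Writer

object StreamingGraphML {
  // Escape XML special characters; '&' must be replaced first.
  private def escape(s: String): String =
    s.replace("&", "&amp;")
      .replace("<", "&lt;")
      .replace(">", "&gt;")
      .replace("\"", "&quot;")

  // Assumption: vertices are (id, name, fraud) and edges are (src, dst, relationship).
  def write(
      out: Writer,
      vertices: Iterator[(String, String, Int)],
      edges: Iterator[(String, String, String)]): Unit = {
    def line(s: String): Unit = { out.write(s); out.write("\n") }

    line("""<?xml version="1.0" encoding="UTF-8"?>""")
    line("""<graphml xmlns="http://graphml.graphdrawing.org/xmlns">""")
    line("""  <key id="v_name" for="node" attr.name="name" attr.type="string"/>""")
    line("""  <key id="v_fraud" for="node" attr.name="label" attr.type="int"/>""")
    line("""  <key id="e_edgeType" for="edge" attr.name="edgeType" attr.type="string"/>""")
    line("""  <graph id="G" edgedefault="directed">""")

    // Each element is written as soon as it is read, so only one row is held at a time.
    vertices.foreach { case (id, name, fraud) =>
      line(s"""    <node id="${escape(id)}">""")
      line(s"""      <data key="v_name">${escape(name)}</data>""")
      line(s"""      <data key="v_fraud">$fraud</data>""")
      line("    </node>")
    }
    edges.foreach { case (src, dst, relationship) =>
      line(s"""    <edge source="${escape(src)}" target="${escape(dst)}">""")
      line(s"""      <data key="e_edgeType">${escape(relationship)}</data>""")
      line("    </edge>")
    }

    line("  </graph>")
    line("</graphml>")
    out.flush()
  }
}
```

Passing a `BufferedWriter` over a file keeps driver memory roughly bounded by the largest partition rather than the whole graph, at the cost of writing sequentially.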


geoHeil commented Jan 17, 2017

But when collecting the GraphML file to one node (i.e. for visualization purposes in Gephi), shouldn't the assumption hold that the data is not too big?

blacknred0 commented

That is an assumption, but I think if a feature like this were to go in, it would be nice to analyze/visualize the whole graph in Gephi rather than just a node.


riomus commented Jan 19, 2017

We should not assume that exporting data to GraphML is only for Gephi visualization. What is more, implementing it using the Databricks XML writer (spark-xml) or Hadoop XML support should not be too complex.

@riomus riomus added this to the 0.0.7 milestone Mar 31, 2017